XX Physical Bibliography

This module defines elements for encoding the physical structure of books and manuscripts. They can be used either to provide more bibliographic detail, or a more structured encoding of bibliographic facts, than the TEI Header or the Manuscript Description module allows, or to associate transcribed text or page images with an encoding of the physical structure of the book from which the transcription or images are taken. Two kinds of tags supplement the standard provisions of the <sourceDesc> section of the TEI header. The first kind encodes bibliographic formulae, that is, standard or project-specific systems for representing the physical facts of books or manuscripts, such as the "collation formula" refined by Fredson Bowers. The second kind encodes the physical facts themselves. In addition, tags are provided that allow book structure to serve as the primary hierarchy governing the encoding of the text itself. Finally, tags and a stand-off markup strategy are provided for users who must choose another TEI hierarchy as their primary one in order to capture the textual features that interest them, but who also wish to encode the physical structure of the source.

The collation element

The <collation> element appears within the <msDescription> or <bookDescription> element in the <sourceDesc> section of the TEI header. It can contain either a collation formula, together with the other elements that make up a full bibliographic description following the Bowers notation (or some other standard or project-specific collation formula), or a paragraph-form description of the structure of a book or manuscript. It can also contain a full formal representation of the structure of the book itself, using <codexStructure> and the other tags defined below.

The <collation> element has one or more of the following components:

The collationFormula element

The <collationFormula> element is designed to encode any of the standard kinds of collation formula, such as the one specified by Fredson Bowers in his influential book Principles of Bibliographical Description, those used by manuscript cataloguers, and the one employed in the Gesamtkatalog der Wiegendrucke; it can also be adapted to a project-specific style of collation. It contains the following elements, none of which is obligatory:

Sub-elements of <gatherings>:

Sub-element of <pagination>:

Example (showing the encoding for the formula for an Aphra Behn book given on page 471 of Fredson Bowers's Principles of Bibliographical Description):

<collation>
  <format>quarto</format>
  <collationFormula>
    <gatherings>
      <signatureAlphabet>23 letter</signatureAlphabet>
      <gatheringRange signed="no">
        <start>A</start>
        <end>A</end>
        <leaves>4</leaves>
      </gatheringRange>
      <gatheringRange signed="yes">
        <start>B</start>
        <end>L</end>
        <leaves>4</leaves>
      </gatheringRange>
      <gatheringRange>
        <start>M</start>
        <end>M</end>
        <leaves>4</leaves>
      </gatheringRange>
      <gatheringRange signed="no">
        <start>N</start>
        <end>N</end>
        <leaves>1</leaves>
      </gatheringRange>
      <signatureLeaves>
        <start>1</start>
        <end>2</end>
      </signatureLeaves>
      <anomSignature type="added">
        <gathering>B</gathering>
        <leaf>3</leaf>
      </anomSignature>
    </gatherings>
    <totalLeaves>49</totalLeaves>
    <pagination>
      <pageRange type="front matter" numbered="no">
        <start>1</start>
        <end>8</end>
      </pageRange>
      <pageRange numbered="yes">
        <start>1</start>
        <end>33</end>
      </pageRange>
      <pageRange numbered="yes">
        <start>26</start>
        <end>27</end>
      </pageRange>
      <pageRange numbered="yes">
        <start>36</start>
        <end>37</end>
      </pageRange>
      <pageRange numbered="yes">
        <start>30</start>
        <end>31</end>
      </pageRange>
      <pageRange numbered="yes">
        <start>40</start>
        <end>89</end>
      </pageRange>
      <pageRange numbered="no">
        <start>90</start>
        <end>90</end>
      </pageRange>
      <totalPages>90</totalPages>
      <paginationAppears>in parens centered in hdl.</paginationAppears>
    </pagination>
  </collationFormula>
</collation>
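Because the collation formula records gatherings as ranges of signature letters, the <totalLeaves> figure can be cross-checked mechanically. The following Python sketch is an illustration under stated assumptions, not part of this module: it assumes single-letter signatures in Bowers's 23-letter alphabet (A to Z without J, U, and W), does not handle doubled or numbered alphabets, and uses hypothetical helper names. It recomputes the leaf count from the <gatheringRange> elements of the example and counts page positions across its <pageRange> elements.

```python
import xml.etree.ElementTree as ET

# Assumption: signatures are single letters from Bowers's 23-letter
# alphabet, which omits J, U and W.
ALPHABET_23 = [c for c in "ABCDEFGHIJKLMNOPQRSTUVWXYZ" if c not in "JUW"]


def gathering_letters(start, end, alphabet=ALPHABET_23):
    """Return the signature letters from start to end, inclusive."""
    i, j = alphabet.index(start), alphabet.index(end)
    return alphabet[i:j + 1]


def count_leaves(collation):
    """Sum (number of letters x leaves) over every <gatheringRange>."""
    total = 0
    for rng in collation.iter("gatheringRange"):
        letters = gathering_letters(rng.findtext("start"), rng.findtext("end"))
        total += len(letters) * int(rng.findtext("leaves"))
    return total


def count_pages(collation):
    """Count page positions (end - start + 1) over every <pageRange>."""
    return sum(int(r.findtext("end")) - int(r.findtext("start")) + 1
               for r in collation.iter("pageRange"))


# Gatherings and pagination from the example above (other elements omitted).
SAMPLE = """<collation><collationFormula>
  <gatherings>
    <gatheringRange signed="no"><start>A</start><end>A</end><leaves>4</leaves></gatheringRange>
    <gatheringRange signed="yes"><start>B</start><end>L</end><leaves>4</leaves></gatheringRange>
    <gatheringRange><start>M</start><end>M</end><leaves>4</leaves></gatheringRange>
    <gatheringRange signed="no"><start>N</start><end>N</end><leaves>1</leaves></gatheringRange>
  </gatherings>
  <totalLeaves>49</totalLeaves>
  <pagination>
    <pageRange type="front matter" numbered="no"><start>1</start><end>8</end></pageRange>
    <pageRange numbered="yes"><start>1</start><end>33</end></pageRange>
    <pageRange numbered="yes"><start>26</start><end>27</end></pageRange>
    <pageRange numbered="yes"><start>36</start><end>37</end></pageRange>
    <pageRange numbered="yes"><start>30</start><end>31</end></pageRange>
    <pageRange numbered="yes"><start>40</start><end>89</end></pageRange>
    <pageRange numbered="no"><start>90</start><end>90</end></pageRange>
  </pagination>
</collationFormula></collation>"""

collation = ET.fromstring(SAMPLE)
print(count_leaves(collation), collation.findtext(".//totalLeaves"))  # 49 49
print(count_pages(collation))  # 98
```

For this example, B to L spans ten letters because J is skipped, giving 4 + 40 + 4 + 1 = 49 leaves, which matches <totalLeaves>. The 98 page positions (eight unnumbered preliminary pages plus 90 in the main sequence) are twice the leaf count; the example's <totalPages> value of 90 appears to count only the main sequence.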

The codexStructure element

The <codexStructure> element encloses elements such as <gathering>, <leaf>, and <page> that together describe the full physical form of a printed or handwritten book. In the case of multi-volume works, <codexStructure> may be repeated for each volume.

Sub-elements of <codexStructure>:

Sub-elements of <gathering> include:

Example (a representation of a gathering of common octavo, folded as in the illustration in Figure 50 of Gaskell's New Introduction to Bibliography, with all relationships explicitly represented in the encoding):

<gathering>
  <leaf xml:id="leaf1" conjunct="#leaf8"><page xml:id="p1" SheetSide="1" cutFromN="#p8" W="#p16"/><page xml:id="p2" SheetSide="2" cutFromN="#p7" E="#p15"/></leaf>
  <leaf xml:id="leaf2" conjunct="#leaf7"><page xml:id="p3" SheetSide="2" cutFromN="#p6" W="#p14"/><page xml:id="p4" SheetSide="1" cutFromN="#p5" E="#p13"/></leaf>
  <leaf xml:id="leaf3" conjunct="#leaf6"><page xml:id="p5" SheetSide="1" cutFromN="#p4" W="#p12"/><page xml:id="p6" SheetSide="2" cutFromN="#p3" E="#p11"/></leaf>
  <leaf xml:id="leaf4" conjunct="#leaf5"><page xml:id="p7" SheetSide="2" cutFromN="#p2" W="#p10"/><page xml:id="p8" SheetSide="1" cutFromN="#p1" E="#p9"/></leaf>
  <leaf xml:id="leaf5" conjunct="#leaf4"><page xml:id="p9" SheetSide="1" cutFromN="#p16" cutFromE="#p12" W="#p8"/><page xml:id="p10" SheetSide="2" cutFromN="#p15" cutFromW="#p11" E="#p7"/></leaf>
  <leaf xml:id="leaf6" conjunct="#leaf3"><page xml:id="p11" SheetSide="2" cutFromN="#p14" cutFromE="#p10" W="#p6"/><page xml:id="p12" SheetSide="1" cutFromN="#p13" cutFromW="#p9" E="#p5"/></leaf>
  <leaf xml:id="leaf7" conjunct="#leaf2"><page xml:id="p13" SheetSide="1" cutFromN="#p12" cutFromE="#p16" W="#p4"/><page xml:id="p14" SheetSide="2" cutFromN="#p11" cutFromW="#p15" E="#p3"/></leaf>
  <leaf xml:id="leaf8" conjunct="#leaf1"><page xml:id="p15" SheetSide="2" cutFromN="#p10" cutFromE="#p14" W="#p2"/><page xml:id="p16" SheetSide="1" cutFromN="#p9" cutFromW="#p13" E="#p1"/></leaf>
</gathering>
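Because every relationship in the octavo example is recorded from both sides, an encoding like this can be checked for internal consistency. The Python sketch below assumes, as the example appears to, that conjunct and cutFromN are symmetric, that cutFromE is the inverse of cutFromW, and that E is the inverse of W; the helper name check_reciprocity is hypothetical. It reports every link whose counterpart is missing or points elsewhere.

```python
import xml.etree.ElementTree as ET

XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Assumed inverse of each relationship attribute: if page X has
# cutFromE="#Y", page Y is expected to have cutFromW="#X", and so on.
INVERSE = {"conjunct": "conjunct", "cutFromN": "cutFromN",
           "cutFromE": "cutFromW", "cutFromW": "cutFromE",
           "E": "W", "W": "E"}


def check_reciprocity(gathering):
    """Return (id, attribute, value) for every link lacking its inverse."""
    by_id = {el.get(XML_ID): el for el in gathering.iter() if el.get(XML_ID)}
    problems = []
    for el_id, el in by_id.items():
        for attr, value in el.attrib.items():
            if attr not in INVERSE:
                continue  # xml:id, SheetSide and unknown names are skipped
            target = by_id.get(value.lstrip("#"))
            if target is None or target.get(INVERSE[attr]) != "#" + el_id:
                problems.append((el_id, attr, value))
    return problems


SAMPLE = """<gathering>
<leaf xml:id="leaf1" conjunct="#leaf8">
  <page xml:id="p1" SheetSide="1" cutFromN="#p8" W="#p16"/>
  <page xml:id="p2" SheetSide="2" cutFromN="#p7" E="#p15"/></leaf>
<leaf xml:id="leaf2" conjunct="#leaf7">
  <page xml:id="p3" SheetSide="2" cutFromN="#p6" W="#p14"/>
  <page xml:id="p4" SheetSide="1" cutFromN="#p5" E="#p13"/></leaf>
<leaf xml:id="leaf3" conjunct="#leaf6">
  <page xml:id="p5" SheetSide="1" cutFromN="#p4" W="#p12"/>
  <page xml:id="p6" SheetSide="2" cutFromN="#p3" E="#p11"/></leaf>
<leaf xml:id="leaf4" conjunct="#leaf5">
  <page xml:id="p7" SheetSide="2" cutFromN="#p2" W="#p10"/>
  <page xml:id="p8" SheetSide="1" cutFromN="#p1" E="#p9"/></leaf>
<leaf xml:id="leaf5" conjunct="#leaf4">
  <page xml:id="p9" SheetSide="1" cutFromN="#p16" cutFromE="#p12" W="#p8"/>
  <page xml:id="p10" SheetSide="2" cutFromN="#p15" cutFromW="#p11" E="#p7"/></leaf>
<leaf xml:id="leaf6" conjunct="#leaf3">
  <page xml:id="p11" SheetSide="2" cutFromN="#p14" cutFromE="#p10" W="#p6"/>
  <page xml:id="p12" SheetSide="1" cutFromN="#p13" cutFromW="#p9" E="#p5"/></leaf>
<leaf xml:id="leaf7" conjunct="#leaf2">
  <page xml:id="p13" SheetSide="1" cutFromN="#p12" cutFromE="#p16" W="#p4"/>
  <page xml:id="p14" SheetSide="2" cutFromN="#p11" cutFromW="#p15" E="#p3"/></leaf>
<leaf xml:id="leaf8" conjunct="#leaf1">
  <page xml:id="p15" SheetSide="2" cutFromN="#p10" cutFromE="#p14" W="#p2"/>
  <page xml:id="p16" SheetSide="1" cutFromN="#p9" cutFromW="#p13" E="#p1"/></leaf>
</gathering>"""

print(check_reciprocity(ET.fromstring(SAMPLE)))  # []
```

Misspelled attribute names are not in INVERSE and so are skipped, but the check still catches them indirectly: if cutFromE on p15 were misspelled, p14's cutFromW pointer to p15 would be reported as lacking a counterpart.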

"Milestone" tags for book structure

Note: these tags replace the <pb/>, <cb/>, and <lb/> tags included in previous editions of these Guidelines.

The following "milestone" tags may be used to indicate within a text the points at which the various articulations of the physical source occur:

A stand-off markup strategy using milestone tags

The physical structure of a book can be conceptualized as a series of hierarchically organized objects: gatherings contain leaves, leaves contain pages, and pages contain lines of text. For some encoders, especially those with strong bibliographic interests and those preparing electronic transcriptions of manuscript or print materials, the physical structure hierarchy will be the primary one, and tags are provided elsewhere in this chapter to support that choice. For many other encoders, the rich resources these Guidelines offer for encoding conceptual textual hierarchies such as chapters, sections, and paragraphs are important, so a primary hierarchy other than physical book structure must be chosen. Researchers who adopt another TEI hierarchy as their primary one so often also wish to encode the book structure hierarchy in the same file that special provision is made for this case here, in addition to the resources offered in Chapter 31, Multiple Hierarchies.

The mechanism described here creates a kind of within-file "stand-off markup": information about the book structure hierarchy is kept separate from the encoded text but is linked to the book-structure milestone tags within it. The encoded text refers to the description of book structure in the <codexStructure> section of <sourceDesc> by means of pointers: the pageID attribute of the empty milestone element <newPage> points to the xml:id attribute of a <page> element in <sourceDesc>.

The following example shows the use of this strategy:

<teiHeader> . . .
  <sourceDesc> . . .
    <msDescription>
      <collation>
        <codexStructure>
          . . .
          <leaf xml:id="leaf4"><page xml:id="p7"/><page xml:id="p8"/></leaf>
          <leaf xml:id="leaf5"><page xml:id="p9"/><page xml:id="p10"/></leaf>
          . . .
        </codexStructure>
      </collation>
    </msDescription>
  </sourceDesc> . . .
</teiHeader>
<text> . . .
  <newPage pageID="#p7"/>Text from page seven with associated markup.<newPage pageID="#p8"/>Text from page eight. . . .
</text>

In this example, <newPage/> tags within the text mark the places where pages begin in the physical book. Because <newPage/> is a milestone tag that contains no text and does not participate in the document hierarchy, elements that do participate, such as <div>, <p>, or <hi>, can be used even where the text they mark crosses a page boundary. The book structure hierarchy itself is specified in the <codexStructure> section of <sourceDesc>, and each <newPage/> tag within <text> is linked to that specification through its pageID attribute. The <newPage/> tags thus mark the points at which the book structure hierarchy specified in <codexStructure> intersects the running text into which they are inserted. Note that in this example only <newPage/> tags need to be inserted into the transcription, since leaves and gatherings, which are composed of pages, can be fully represented in <codexStructure>. In this sense the book structure hierarchy "stands off" from the encoded text: it exists as a hierarchy only in <codexStructure>.
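The pointer mechanism described above can be resolved with ordinary XML tools. The following Python sketch, using hypothetical helper names and a simplified document without the TEI namespace, first checks that every pageID on a <newPage> milestone points to a <page> declared in <codexStructure>, and then groups the transcribed text by the page on which it falls, walking through nested elements such as <hi> in document order. Text preceding the first milestone is ignored.

```python
import xml.etree.ElementTree as ET

XML_ID = "{http://www.w3.org/XML/1998/namespace}id"


def page_ids(root):
    """Collect the xml:id of every <page> inside <codexStructure>."""
    return {p.get(XML_ID) for cs in root.iter("codexStructure")
            for p in cs.iter("page")}


def text_by_page(text_elem):
    """Group transcribed text by the <newPage> milestone that precedes it."""
    chunks = {}
    current = None  # xml:id of the page currently being read

    def add(s):
        if s and current is not None:
            chunks.setdefault(current, []).append(s)

    def walk(el):
        nonlocal current
        if el.tag == "newPage":
            current = el.get("pageID").lstrip("#")  # milestone: switch page
        else:
            add(el.text)
            for child in el:
                walk(child)
                add(child.tail)  # text after a child follows it in reading order

    walk(text_elem)
    return {page: "".join(parts) for page, parts in chunks.items()}


DOC = """<TEI>
  <teiHeader><fileDesc><sourceDesc><msDescription><collation><codexStructure>
    <leaf xml:id="leaf4"><page xml:id="p7"/><page xml:id="p8"/></leaf>
    <leaf xml:id="leaf5"><page xml:id="p9"/><page xml:id="p10"/></leaf>
  </codexStructure></collation></msDescription></sourceDesc></fileDesc></teiHeader>
  <text><p><newPage pageID="#p7"/>Text from page seven with <hi>markup</hi> that <newPage pageID="#p8"/>continues on page eight.</p></text>
</TEI>"""

root = ET.fromstring(DOC)
known = page_ids(root)
text = root.find("text")
unresolved = [m.get("pageID") for m in text.iter("newPage")
              if m.get("pageID").lstrip("#") not in known]
print(unresolved)  # []
print(text_by_page(text))
```

Here the <p> element crosses the boundary between pages seven and eight, which is exactly the case the milestone approach is meant to allow; the page grouping is recovered without the paragraph being split.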

Physical structure as the primary hierarchy

Scholars creating book surrogates or electronic transcriptions, or those with a strong interest in representing bibliographic structures, may wish to make book structure the primary organizing principle of their encoding of a text. The following tags are provided for such encoding. Users should note that in most cases a book structure hierarchy means that other forms of TEI markup must be added with care, either by avoiding the creation of a competing hierarchy or by employing one or more of the techniques outlined in Chapter 31, Multiple Hierarchies. To signal this need for caution, the tags for recording the physical structure of the source document within the encoded text carry the prefix "phys":

  • <physPage> contains the text that occupies a physical page (that is, one side of a leaf) in the source material.
  • <physColumn> contains the text that occupies a physical column in the source material.
  • <physLine> contains the text that occupies a physical line in the source material.
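When physical structure is the primary hierarchy, page, column, and line positions follow from the nesting of the elements themselves, with no pointers to resolve. The Python sketch below assumes that <physLine> elements sit inside <physPage>, optionally within <physColumn> elements that are direct children of the page; this nesting, the <body> wrapper, and the helper name line_citations are illustrative assumptions. It yields ordinal positions in the encoded file, not the page numbers printed in the source.

```python
import xml.etree.ElementTree as ET


def line_citations(root):
    """Yield (page, column, line, text) ordinal positions for each <physLine>."""
    for page_no, page in enumerate(root.iter("physPage"), start=1):
        # Assumption: a page without <physColumn> children is one column.
        columns = page.findall("physColumn") or [page]
        for col_no, column in enumerate(columns, start=1):
            for line_no, line in enumerate(column.iter("physLine"), start=1):
                # itertext() flattens inline markup such as <hi>.
                yield page_no, col_no, line_no, "".join(line.itertext())


SAMPLE = """<body>
  <physPage>
    <physLine>First line of the first page</physLine>
    <physLine>Second <hi>line</hi></physLine>
  </physPage>
  <physPage>
    <physColumn><physLine>Column one, line one</physLine></physColumn>
    <physColumn>
      <physLine>Column two, line one</physLine>
      <physLine>Column two, line two</physLine>
    </physColumn>
  </physPage>
</body>"""

for page_no, col_no, line_no, text in line_citations(ET.fromstring(SAMPLE)):
    print(f"p{page_no} c{col_no} l{line_no}: {text}")
```

The trade-off is the one noted above: positions come for free, but any markup that crosses a <physPage> or <physLine> boundary, such as a paragraph continuing onto the next page, cannot be expressed as an ordinary element within this hierarchy.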