DRAFT 28/12/93

Revised 13 Jan 1994

[Nota bene: These protocols will require revision in the light of TEI P3.]

Markup Practice for Piers Plowman Electronic Archive

A. To Record Non-ASCII Characters

1) If you have access to an SGML reader/browser, use the appropriate entity references. See Appendix I for TEI-conformant entity references.

2) If you do not have access to an SGML reader/browser, it's probably best to make an initial transcription with easily read, and completely unambiguous, alternate characters. For the Piers Plowman transcriptions, let's use the following conventions:
"@" for thorn
"&T;" for capital thorn
"#" for yogh
"&#;" for capital yogh
"%" for eth
"&%;" for capital eth

Eventually, each will be converted to an entity reference.

3) Use an entity reference for paraph markers: &paraph;

4) Use a semi-colon to represent the punctus elevatus. That can later be changed with a macro to "&punctusElevatus;", or perhaps simply left with a note to the user?

5) In initial transcription, use "&" for manuscript versions of ampersand. Later, these will be converted to the correct entity reference. However, remember that you must convert this as it appears in all contexts: line beginning, line ending, before "c" in the expression "&c", before or after punctuation, etc. Would you prefer to use the correct entity reference from the beginning?
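
The conversions deferred in items 2, 4, and 5 could be run in a single pass once the initial transcription is finished. The following Python sketch shows one way to do that; it is not part of the Archive's tooling. The entity names &thorn;, &THORN;, &yogh;, &YOGH;, &eth;, &ETH;, and &amp; are assumptions standing in for whatever Appendix I specifies, and "&punctusElevatus;" is the name suggested in item 4. The sketch also assumes that an "&" followed by two or more letters or digits and a semi-colon is already an entity reference (e.g. "&re;"), so that "&c;" is read as ampersand, "c", punctus elevatus.

```python
import re

# Assumed entity names; substitute those listed in Appendix I.
PLACEHOLDERS = {
    "&T;": "&THORN;",
    "&#;": "&YOGH;",
    "&%;": "&ETH;",
    "@": "&thorn;",
    "#": "&yogh;",
    "%": "&eth;",
    "&": "&amp;",
    ";": "&punctusElevatus;",
}

# Order matters: capital placeholders first, then existing entity
# references (left untouched), then the single-character placeholders.
TOKEN = re.compile(r"&T;|&#;|&%;|&[A-Za-z][A-Za-z0-9]+;|[@#%&;]")


def to_entity_refs(text):
    """Replace transcription placeholders with entity references."""
    def swap(match):
        token = match.group(0)
        # Existing references such as &re; are not in the table,
        # so they are returned unchanged.
        return PLACEHOLDERS.get(token, token)
    return TOKEN.sub(swap, text)


print(to_entity_refs("& @e &T;e #e; &c; &re;"))
```

Because the text is scanned once from left to right, an inserted reference such as "&amp;" is never re-read as a bare ampersand followed by a semi-colon. The ampersand is converted wherever it occurs: at line beginning, at line ending, before "c", and next to punctuation.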

B. To Record Alternative Letter Forms

Editors must decide whether it is worthwhile to record different allographs of letters. For instance, the F scribe uses a long "s" as well as a sigma "s" and one other. He has three "r" forms. I concluded that these were in free variation and carried no information. Someone else might wish to record differences among letter forms. Entity references will be the most efficient way to do that.

C. To Record Abbreviations and Suspensions

1) Indicate resolution of standard abbreviations by placing the expanded material between parentheses; e.g. p(ro)p(er). The essential rule here is that one should record the interpreted graphic material in parentheses, leaving unambiguous graphs outside the parentheses. Eventually, each "(" will be replaced with "<expan>" and each ")" with "</expan>".

2) If you wish to leave the abbreviation or suspension itself in the text, use the following convention:
p<abbrev type=superscription resp=hnd expan="re">&re;</abbrev>st

That would indicate that "hnd" was responsible for interpreting this superscription to mean "re," and it would supply the entity reference (defined in your DTD or header). The SGML reader would print "prest" with whatever marker you've instructed it to use for abbreviations, e.g. italics or a color change.

3) Place non-standard abbreviations between parentheses, but each should be followed by a note in this form:
<note>Content of the note.</note>

4) To mark numbers written in roman numerals.
<expan type=num orig="lx">sixty</expan>

5) To indicate whole word brevigraphs of different form.
<expan rdg="Ihu˜">Iesu</expan>
<expan rdg="Ihc˜">Iesu</expan>
For the slightly different brevigraph for "Iesus," use the following:
<expan rdg="Ihs˜">Iesus</expan>
You may, if you wish, use a "~" and replace it with the entity reference later in the process.
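
The eventual replacement of parentheses described in item 1 might be done along the following lines. This is a minimal Python sketch, assuming that every pair of parentheses in the transcription marks an expansion and that parentheses are never nested; the function name is hypothetical.

```python
import re

# One parenthesised run; nested parentheses are not expected.
PAREN = re.compile(r"\(([^()]*)\)")


def parens_to_expan(text):
    """Wrap each parenthesised expansion in <expan>...</expan>."""
    return PAREN.sub(r"<expan>\1</expan>", text)


print(parens_to_expan("p(ro)p(er)"))  # p<expan>ro</expan>p<expan>er</expan>
```

Parentheses used for any other purpose would need to be marked differently before such a pass is run.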

D. To Treat Word Division

Users will wish to search for whole words, but scribal word divisions are not always consistent. To make such searches easier, use the OED to determine whether a word consisting of two or more morphs is to be treated as one word or more. For instance, "for euer" is one word in American usage, two in British. It should be written "for euer." Words like "believe" which appear inconsistently as "be leue" ~ "beleue," should be transcribed as "be-leue" to represent scribal use of spacing when there is a distinct break between the morphs and "beleue" when there is not. It is probably madness to attempt to adjudicate tiny differences between scribal words, but in general, the principle is to reflect possibly -emic separations of morphemes and to ignore word divisions that appear not to carry information.

Some word divisions carry information about scribal pronunciation and should be represented in transcription. For instance, "at ease" appears as "a tese". Transcribe it as
<expan rdg="a tese">at ese</expan>
Ignore erratic word divisions that lack significance, though it is perhaps worth recording them until such time as one can distinguish noise from information.
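
To see how these conventions serve whole-word searching, consider a search form derived from the transcription. The Python sketch below is only an illustration, not a description of any existing search tool. It assumes that tags are dropped so that the content of <expan> (the regularized form) is what gets searched, and that a hyphen between two letters always marks a scribal break of the "be-leue" kind.

```python
import re

TAG = re.compile(r"<[^>]*>")
# A hyphen with a letter on either side, as in "be-leue".
HYPHEN = re.compile(r"(?<=[A-Za-z])-(?=[A-Za-z])")


def search_form(text):
    """Return the text as a whole-word search would see it."""
    without_tags = TAG.sub("", text)
    return HYPHEN.sub("", without_tags)


print(search_form('be-leue <expan rdg="a tese">at ese</expan>'))  # beleue at ese
```

The scribal forms remain in the transcription itself; only the derived search form is normalized.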

E. To Record the Physical Condition of the Manuscript

The major categories listed here are not exhaustive, and editors will have to use ad hoc terms in quotation marks for problems not anticipated here.

1) To Record Damage to the MS.

<damage type="hole" length="4">

Possible attributes here include type, length, and resp[onsibility]. Type will include tears, rips, trimmed margins, overbinding, fading ink, rubbing, water or mildew stains, etc. For instance, in MS R, where the margins have been trimmed away with some loss of text in the first few folios, the loss should be marked as follows:

<damage type="margin trimmed away" length=2>

or if one can supply what is missing,

<rdg type="margin trimmed away" length=2>Wh</rdg>anne

Length can be measured in characters, inches, millimeters, etc.

Resp refers to the transcriber who decides on the existence, type, and length of the damage. {I think this is best left as a default, unless someone other than the initial editor makes the decisions?}

2) To indicate where characters are unclear or ambiguous for some reason other than damage to the manuscript.

<unclear type="ill-formed" length=2 resp=HND>

As with damage, this tag can stand alone or surround restored or guessed-at text.

3) To indicate where space is left vacant for characters.

<space type=horizontal length=6>

F. To Mark Scribal Corrections/Additions to the Text

1) To Mark Additions

<add place=interlinear resp="hand1">

means that the text-hand scribe had put the graphs into his text above the line.

Other place possibilities include "marginRight," "marginLeft," "marginTop," "marginBottom."

Resp[onsibility] is assigned where possible to the scribe who made the addition.

The <add> tag is used for shorter sequences of text, single words or phrases. For larger level additions, use <addSpan>Material</addSpan>. It has the same attributes as <add> with the addition of endpoint, which refers to an anchor placed at the end of the span of the added text.

2) To Mark Deletions

<del type=erasure length=7 resp=hand2>Plowman</del>

marks a still readable "Plowman" that had been erased or subpuncted (changing type=subpunction) by the revising scribe identified in the header as "hand2". Another attribute, status, may be added with the values "errTooFar" or "errShort" to mark erroneous deletions that went too far or not far enough. One may also use a certainty tag if there are doubts as to what is still readable, indicating the relative security of the reading tagged.

As with <add>, use <delSpan> for longer deleted stretches of text.
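
In the <del> example above, length=7 matches the seven characters of "Plowman". If length is taken to count characters, a transcription can be checked for slips in that attribute. The Python sketch below rests on that assumption and on two others: that the deleted text contains no nested tags or entity references, and that length values are whole numbers. The function name is hypothetical.

```python
import re

DEL = re.compile(r"<del\b([^>]*)>(.*?)</del>", re.S)
# The attribute value may be quoted or unquoted.
LENGTH = re.compile(r'\blength="?(\d+)"?')


def mismatched_deletions(text):
    """List (content, declared, actual) where length and content disagree."""
    problems = []
    for attrs, content in DEL.findall(text):
        found = LENGTH.search(attrs)
        if found and int(found.group(1)) != len(content):
            problems.append((content, int(found.group(1)), len(content)))
    return problems


sample = "<del type=erasure length=7 resp=hand2>Plowman</del>"
print(mismatched_deletions(sample))  # []
```

Where length is measured in inches or millimeters instead, a check of this kind does not apply.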

3) To Mark Corrections

<corr resp="hand1" sic="pardon" type=erasure>oligarchy</corr>

Such a tag would indicate that the original scribe erased the word "pardon" and replaced it with "oligarchy." {{I need to work more on this. The TEI chapter in P2 doesn't appear to include "type" as an attribute. P3 has just become available, and I haven't seen the new chapter.}}

One could also use the <sic> tag to mark a passage one takes to be in error but has not chosen to correct. e.g.

<sic resp=hnd corr=oligarchy>oleagenous</sic>

G. To Mark Changes in Hand, High-Lighting, Emphases, etc.

1) To Mark Ornamental Capitals

<hi rend="orncp8">N</hi>Ow

This would mark an ornamental capital "N" of 8 lines height followed by a capital "O" and lower case "w". This tag is to a degree cobbled, but it works in the SEENET.DTD.

2) To Mark Latin/French Text with a Change of Script

<foreign lang="latin"><hi rend="name of script">Latin text</hi></foreign>
or
<foreign lang="latin" rend="name of script">Latin text</foreign>
or
<hi lang="latin" rend="name of script">Latin text</hi>

Note that in the first example the interior <hi> tag is closed immediately following the Latin text and that the </foreign> goes outside it.

3) To Indicate Rubricated Words and Phrases or Otherwise Highlighted Text

<hi rend="rubrication">Dowel</hi>
or
<emphasis type="rubrication">Dowel</emphasis>

If it turns out that "Dowel" is not rubricated but merely in a different script size with red ink on the black letters, use the following tag:

<emphasis type="red ornament" hand="hand2">Dowel</emphasis>

4) To Mark Underlined Words

<hi rend="underlined">Underlined forms</hi>

5) To Mark Boxed Words and Phrases

<hi rend="boxed">Boxed forms</hi>

H. Structural Divisions

1) To Mark Passus Divisions

<div1 type="passus" n="Prol">

2) To Record Strophic Divisions

<lg type=strophe n=1> strophe </lg>

Strophes are marked in some MSS with paraphs or skipped lines or both. Record these, with <lb> for skipped lines and &paraph; for paraph markers. {In the MSS I have seen, these paraphs are in red and blue. Should the colors be recorded in view of the black and white microfilms from which the transcriptions are made?}

3) To Record Line Numbers

<l n="F P006">

Lines will each be numbered. Each MS will have its own lines numbered absolutely with Latin lines counted the same as English, but with additional tags for its line number in the archetype, in the critical text, and in Kane-Donaldson. Thornton Staples is writing a program for inserting line numbers semi-automatically.

4) To Indicate Differences from Canonical Divisions

When non-standard passus divisions occur, use the immediate scribe's passus divisions for <div>s and for absolute numbering, but indicate where passus divisions appear in the B archetype (using Kane-Donaldson's numeration until such time as the archetype is established) with milestone tags.

<mstone type="B passus" n="1">

The numbering program will ignore lines beginning with <mstone ....

I. Folio Markers

Between the bottom of each leaf and the top of the next, supply the following tag:

<mstone type="fol" n="36v">

where the transcription of fol. 36v follows the tag.
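
A missing or misnumbered folio marker is easy to overlook in a long transcription. The Python sketch below lists places where the sequence of folio milestones breaks. It assumes that every folio is labeled with a number followed by "r" or "v" and that the tags are written exactly as in the example above, with quoted attribute values; the function names are hypothetical.

```python
import re

FOLIO_TAG = re.compile(r'<mstone type="fol" n="(\d+)([rv])">')


def next_folio(number, side):
    """Recto is followed by the verso of the same leaf, verso by the next recto."""
    return (number, "v") if side == "r" else (number + 1, "r")


def folio_gaps(text):
    """Return pairs of consecutive folio markers that are out of sequence."""
    found = [(int(number), side) for number, side in FOLIO_TAG.findall(text)]
    gaps = []
    for previous, current in zip(found, found[1:]):
        if current != next_folio(*previous):
            gaps.append((previous, current))
    return gaps


sample = '<mstone type="fol" n="36r"> ... <mstone type="fol" n="36v"> ... <mstone type="fol" n="37v">'
print(folio_gaps(sample))  # [((36, 'v'), (37, 'v'))]
```

A reported gap may of course reflect a leaf genuinely missing from the manuscript and not an error in tagging.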

Copyright (c) 1994 by Hoyt N. Duggan, all rights reserved.
Last Modified: Monday, 27-Aug-2001 16:24:04 EDT