Overview


Elwood's search capabilities are accessed by clicking on the "Srch" button appearing on the Document Display's control bar. A search session initiated in this way will continue until the blue and red "Exit" button appearing on the Search Display is clicked to end the session.

Clicking the "Srch" button enables searches of both the document text and the tags that make up the document markup. The reader controls which version of the text is searched (diplomatic, scribal, critical, or alltags) as well as which types of material (lines, marginalia, headers/footers, formework, etc.) are included. The reader also sets the display format for search results by selecting from the options in the lower portion of the search inquiry box. Text searches may be conducted using words or regular expressions. Text searches ignore capitalization and any markup describing the appearance of the text, such as rubrication or underlining.

Dynamic indexing assists in formulating searches on document markup by presenting readers with lists of only those elements, attributes, and values that are actually present in the file being examined. Elwood's search display formats include two options for displaying line images as well as an option to display the text markup for each item found. All search criteria are entered in the box at the top of the Search Display screen. Every successful search displays the units of text in which the found items appear. In most cases these units are lines (<l> elements), but they can also be marginalia, headers and footers, or formework, provided that these have been included in the search.
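Dynamic indexing can be pictured as a single pass over the file that records which elements, attributes, and attribute values actually occur, so that only those are offered to the reader. The TypeScript sketch below is a hypothetical illustration, not Elwood's code. The function buildMarkupIndex and the sample fragment are invented. The regex-based scanner assumes simple, well-formed tags with double-quoted attribute values; a real implementation would use an XML parser.

```typescript
// Hypothetical sketch of dynamic indexing: record only the elements,
// attributes and attribute values that actually occur in the markup.
// A regex scanner stands in for a real XML parser and assumes simple,
// well-formed start tags with double-quoted attribute values.
type MarkupIndex = { [element: string]: { [attribute: string]: string[] } };

function buildMarkupIndex(markup: string): MarkupIndex {
  const index: MarkupIndex = {};
  const tagPattern = /<([A-Za-z][\w.-]*)([^>]*)>/g; // start tags; "</..." is skipped
  let tag: RegExpExecArray | null;
  while ((tag = tagPattern.exec(markup)) !== null) {
    const element = tag[1];
    if (!index[element]) index[element] = {};
    const attrs = index[element];
    const attrPattern = /([A-Za-z][\w.-]*)\s*=\s*"([^"]*)"/g;
    let attr: RegExpExecArray | null;
    while ((attr = attrPattern.exec(tag[2])) !== null) {
      if (!attrs[attr[1]]) attrs[attr[1]] = [];
      const values = attrs[attr[1]];
      if (values.indexOf(attr[2]) === -1) values.push(attr[2]); // list each value once
    }
  }
  return index;
}

// Invented sample fragment.
const sample =
  '<l n="1">In a somer seson</l>' +
  '<l n="2" rend="rubric">whan softe was þe sonne</l>' +
  '<marginalia place="left">nota</marginalia>';

const markupIndex = buildMarkupIndex(sample);
console.log(Object.keys(markupIndex)); // the elements present: l, marginalia
console.log(markupIndex["l"]["n"]);    // the values of n found on <l>: 1, 2
```

Because values are collected only from what the scanner sees, an attribute value that never occurs in the file never appears in the index. This is the property the description above attributes to Elwood's lists.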


Selecting Text to be Searched

The style sheet active when a search is invoked remains in force for all text searches conducted during that search session. This means that parts of the encoded text will not be included in searches run against a style-sheet version of the text in which they are not displayed. For example, the scribal error "mychief" for "myschief" has been encoded as follows: "<sic>mychief</sic><corr>my[s]chief</corr>". A search conducted against the text as formatted by the Scribal style sheet could find "mychief" but not "my[s]chief", since the content of a <corr> element is not displayed by this style sheet. Conversely, "my[s]chief" will be in the text as formatted by the Critical style sheet but not in the text formatted by the Scribal. Because of such variations, the AllTags style sheet may be the best choice for certain searches, since it makes all variants available to a search within one comprehensive version of the text. To continue our example, either "mychief" or "my[s]chief" could be found by a search conducted against the AllTags version of the text in question.
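The effect of the active style sheet on what a search can find can be modeled by treating each style sheet as a rule about which reading of a <sic>/<corr> pair it outputs. This TypeScript sketch is an assumed model based on the description above, not Elwood's style sheets. The functions render and found, and the rendering rules, are hypothetical.

```typescript
// Hypothetical model: each style sheet decides which reading of a
// <sic>/<corr> pair ends up in the text that a search sees.
type SheetName = "scribal" | "critical" | "alltags";

const encoded = "<sic>mychief</sic><corr>my[s]chief</corr>";

function render(markup: string, sheet: SheetName): string {
  const sic = /<sic>([^<]*)<\/sic>/.exec(markup);
  const corr = /<corr>([^<]*)<\/corr>/.exec(markup);
  const sicText = sic ? sic[1] : "";
  const corrText = corr ? corr[1] : "";
  if (sheet === "scribal") return sicText;   // the scribe's reading only
  if (sheet === "critical") return corrText; // the editorial correction only
  return sicText + " " + corrText;           // all tags: both readings
}

// A plain (non-regex) search: is the term present in the rendered text?
function found(term: string, sheet: SheetName): boolean {
  return render(encoded, sheet).toLowerCase().indexOf(term.toLowerCase()) !== -1;
}

console.log(found("mychief", "scribal"));     // true
console.log(found("my[s]chief", "scribal"));  // false
console.log(found("my[s]chief", "critical")); // true
console.log(found("mychief", "alltags") && found("my[s]chief", "alltags")); // true
```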


The style sheet in force during a given search session is noted on the search button, as in the illustration to the right, in which the label "Search Scribal Text" indicates that the text searched will be shaped by the conventions of the Scribal style sheet.

Elwood facilitates targeted searching by making it possible to limit the elements of the text being searched. Checkboxes (appearing at the bottom of the preceding illustration) for "lines," "marginalia," "heads," and "formework" permit the inclusion (or exclusion) of <l>, <marginalia>, <header> and <trailer>, and <fw> elements in a search. Note that the "lines" checkbox is checked by default.
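The checkbox-to-element mapping described above can be sketched as a lookup table used to filter units before any search term is applied. Only the element names come from the text. The Unit type, selectUnits, and the sample units are hypothetical.

```typescript
// Hypothetical sketch: each "Search Items" checkbox admits certain elements;
// only units whose element is admitted are passed on to the search.
type Checkbox = "lines" | "marginalia" | "heads" | "formework";

const elementsFor: { [box: string]: string[] } = {
  lines: ["l"],
  marginalia: ["marginalia"],
  heads: ["header", "trailer"],
  formework: ["fw"],
};

interface Unit { element: string; text: string; }

function selectUnits(units: Unit[], checked: Checkbox[]): Unit[] {
  const allowed: string[] = [];
  checked.forEach(box => {
    elementsFor[box].forEach(el => allowed.push(el));
  });
  return units.filter(unit => allowed.indexOf(unit.element) !== -1);
}

// Invented units standing in for parts of a transcription.
const units: Unit[] = [
  { element: "l", text: "a line of verse" },
  { element: "marginalia", text: "a marginal note" },
  { element: "fw", text: "a catchword" },
];

console.log(selectUnits(units, ["lines"]).length);              // 1 (the default)
console.log(selectUnits(units, ["lines", "formework"]).length); // 2
```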

Text Searches

Simple text searches can be carried out on single words or phrases entered in the uppermost line of the "Search Text" area. Note that if a phrase is entered, the search will be conducted on the precise phrase as entered, except that lettercase is not taken into account. Searching on the phrase "piers plowman", per the illustration to the right, will find all instances of "Piers Plowman" in the text, but it will not find lines containing the phrase "Piers þe plowman." Keep in mind also that intervening punctuation marks can affect the outcome of a search: "hem so" as a search term will yield different results than "hem · so".
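The matching rule for simple text searches (the exact phrase, with lettercase ignored) can be modeled as a case-insensitive substring test on each line. This is an assumed model that reproduces the outcomes described above, not Elwood's source. The function phraseFound and the sample lines are invented.

```typescript
// Hypothetical model of a simple text search: the phrase must occur exactly
// as typed (spaces and punctuation included), but lettercase is ignored.
function phraseFound(phrase: string, line: string): boolean {
  return line.toLowerCase().indexOf(phrase.toLowerCase()) !== -1;
}

// Invented sample lines.
console.log(phraseFound("piers plowman", "Thanne seide Piers Plowman"));    // true
console.log(phraseFound("piers plowman", "Thanne seide Piers þe plowman")); // false
console.log(phraseFound("hem so", "and bad hem · so"));                     // false
console.log(phraseFound("hem · so", "and bad hem · so"));                   // true
```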

Boolean Logic

In addition to simple searches on individual words and phrases, Elwood allows for more complex Boolean searches that expand the range of search strategies available to a reader. Initiating such a search requires entries on at least the first and second lines of the "Search Text" area, as well as the selection of a logical operator from a drop-down menu.

Elwood carries out such searches using the basic units of text (lines, marginalia, headers/footers, formework, etc.) to define the boundaries within which a Boolean condition must be met. For example, in a search confined to lines (<l> elements), a logical "AND" is fulfilled if and only if both search items linked by the "AND" are found within the same line.

The order of precedence for searches involving three terms, such as that in the above illustration, may be expressed as "term one" AND/OR/NOT ("term two" AND/OR "term three"), where the operation inside the parentheses is carried out first. So, for example, the search initiated by the illustration above may be described as follows: find the set of lines that contain either "plowman" or "ploughman", and then display the members of that set that also contain "piers."
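The grouping rule can be sketched as follows, together with the requirement that a Boolean condition be met within a single unit. The function booleanMatch, the reading of NOT as "and not," and the sample lines are assumptions for illustration. Each term is modeled as a case-insensitive substring test rather than Elwood's actual matching.

```typescript
// Hypothetical sketch: a three-term search, term1 OP1 (term2 OP2 term3),
// evaluated separately for each unit of text (here, each line).
type Op = "AND" | "OR" | "NOT";

// Each term is modeled as a case-insensitive substring test.
function has(line: string, term: string): boolean {
  return line.toLowerCase().indexOf(term.toLowerCase()) !== -1;
}

// "NOT" is assumed to mean "the left side holds and the right side does not".
function combine(left: boolean, op: Op, right: boolean): boolean {
  if (op === "AND") return left && right;
  if (op === "OR") return left || right;
  return left && !right;
}

function booleanMatch(line: string, t1: string, op1: Op, t2: string, op2: Op, t3: string): boolean {
  const inner = combine(has(line, t2), op2, has(line, t3)); // the parenthesized part first
  return combine(has(line, t1), op1, inner);
}

// Invented sample lines.
const lines = [
  "Piers the plowman preyed hem alle",
  "a ploughman that highte Piers",
  "the plowman laughed",
];

// "piers" AND ("plowman" OR "ploughman")
const found = lines.filter(l => booleanMatch(l, "piers", "AND", "plowman", "OR", "ploughman"));
console.log(found.length); // 2: the third line lacks "piers"
```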

Boolean searches provide one way to proceed when the key terms of a phrase are known but intervening words or punctuation are either unknown or known to be variable. Such searches do not maintain an order among the terms searched ("plowman" might precede "piers" in some of the lines found in the above example). Even so, they are ideal when the interest lies in lexical collocations regardless of the order in which their members appear. Note that a search for the repetition of an item within a line (without regard for intervening words) may NOT be carried out by supplying the same word in the first and second text search lines and linking them with a logical "AND." Elwood's search mechanism tests each term of a Boolean search separately, so the same instance of a word or phrase can satisfy both of the separate tests required by the search. If you wish to carry out such a search, you must use a regular expression (see below).
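An AND of a term with itself cannot detect repetition because each test is run independently, so one occurrence satisfies both. The TypeScript sketch below contrasts this with a single regular expression, of the kind explained under Sample searches, that requires two separate occurrences. The andSearch helper and the lines are invented.

```typescript
// Each term of a Boolean search is tested on its own, so a search for
// "of" AND "of" is satisfied by a single occurrence of "of".
function andSearch(line: string, a: RegExp, b: RegExp): boolean {
  return a.test(line) && b.test(line);
}

// Invented lines.
const oneOf = " the lond of heuene ";
const twoOfs = " the lond of heuene and of helle ";
const ofWord = /\sof\s/;

console.log(andSearch(oneOf, ofWord, ofWord)); // true, though "of" occurs once

// One regular expression can require two distinct occurrences.
const repeated = /\sof\s.*?\sof\s/;
console.log(repeated.test(oneOf));  // false
console.log(repeated.test(twoOfs)); // true
```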

Special Characters

Text searches may be conducted on words and phrases that contain non-keyboard characters. Since thorn ("þ") and yogh ("ȝ") are common in the texts published by the PPEA, a special convention is provided for entering them in the search text fields. To enter a thorn as part of a search term, type a lowercase "t" followed by a percent sign: "t%". To enter a yogh, type a lowercase "y" followed by a percent sign: "y%". Elwood converts these character strings into the appropriate codes for thorn and yogh respectively.
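The conversion can be pictured as a replacement step applied to the search term before the search runs. The name expandShortcuts is hypothetical, and the sketch assumes that only the two lowercase shortcuts described above are recognized. Thorn and yogh are written as the Unicode characters U+00FE and U+021D.

```typescript
// Hypothetical sketch: replace the keyboard shortcuts for thorn and yogh
// with the characters themselves before a search term is used.
function expandShortcuts(term: string): string {
  return term
    .replace(/t%/g, "\u00FE")  // þ  LATIN SMALL LETTER THORN
    .replace(/y%/g, "\u021D"); // ȝ  LATIN SMALL LETTER YOGH
}

console.log(expandShortcuts("t%e"));                 // "þe"
console.log(expandShortcuts("y%et"));                // "ȝet"
console.log(expandShortcuts("\\s(t%|th)e[iy]\\s"));  // "\s(þ|th)e[iy]\s"
```

The third call shows that the shortcut also works inside a regular expression, which is how Example 1 under Sample searches uses it.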

Other special characters needed for searching the text must be typed in a Unicode escape format consisting of four hexadecimal digits prefixed by the escape character "\" and a "u". Using this convention, the paragraph or pilcrow sign ("¶"), for example, would be entered as "\u00B6" (without the quotation marks). This character string will be treated as a single character when it is read by Elwood's search engine. All special characters, with the exception of the thorn and the yogh as discussed above, must be entered in this format for searches. Other "character entity" formats (e.g. "&para;" or "&#x00B6;" or "&#182;") will not be properly interpreted by Elwood. The table below lists commonly used special characters with their Unicode escape equivalents.

  Character      Name               Unicode escape
  ·              raised point       \u00B7
  (not shown)    punctus elevatus   \uF161
  ¶              pilcrow sign       \u00B6
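Elwood relies on Internet Explorer's JScript engine, and JavaScript's regex engine descends from it. In these engines, a pattern containing \u followed by four hexadecimal digits is read as one character, while an entity such as &para; is just literal text. The TypeScript sketch below shows both, assuming a modern JavaScript engine behaves like Internet Explorer's in this respect. The sample line is invented.

```typescript
// What the reader types: a backslash, "u", and four hexadecimal digits.
const typed = "\\u00B6"; // six characters in the search field
const pilcrow = new RegExp(typed);

// Invented line beginning with a pilcrow (U+00B6).
const line = "\u00B6 a rubricated line";

console.log(typed.length);       // 6
console.log(pilcrow.test(line)); // true: the engine reads \u00B6 as one character

// A character-entity spelling is just literal text to the engine.
console.log(new RegExp("&para;").test(line)); // false
```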

Regular Expressions [Advanced feature]

Regular expressions greatly extend Elwood's text search capacity by enabling searches based on complex pattern matching in addition to those carried out with actual words and phrases. A comprehensive introduction to the construction of regular expressions is outside the scope of this help file. Readers interested in learning more about the notation associated with this powerful tool may consult any of a number of online resources, including those listed below. Examples of regular expression searches are also provided, to give concrete illustration of the kinds of inquiry they can support.

Note that pattern-matching conventions can vary between implementations of regular expressions. Since Elwood makes use of the regular expression engine embedded in Internet Explorer, Microsoft's discussion of regular expressions has special importance. Clicking on the following URL will open a new window displaying Microsoft's introduction to regular expressions.
http://msdn.microsoft.com/library/en-us/script56/html/js56reconIntroductionToRegularExpressions.asp?frame=true
The following URL summarizes the pattern matching syntax used in Elwood's regular expressions.
http://msdn.microsoft.com/library/default.asp?url=/library/en-us/script56/html/js56jsgrpregexpsyntax.asp
The following URL echoes the contents of the preceding site.
http://netcoders.dk/docs/jscript/jsregexpsyntax.htm
A number of internet sites offer discussions of regular expression pattern matching. Since conventions in this area are not entirely uniform, it would be best to consult one of the preceding resources before branching out to more generic discussions or tutorials. That said, there are a number of thoughtful presentations available on the internet, of which the following is but one example.
http://www.evolt.org/RegEx_Basics

Sample searches


Example 1. Find all instances of the word "they" occurring in the text of Piers Plowman.

a. First, if you wish to confine the search to the body of the poem and exclude any occurrences of "they" in marginalia, headers, footers, and formework, confirm that the checkbox for "Lines" in the list of "Search Items:" is checked and that all others are blank.
b. Enter "\s(t%|th)e[iy]\s" (without the quotation marks) in the first line of the "Search Text:" display.
c. Click on the "Search ... Text" button.

If this search is performed on the text of the Laud MS of the B-version of Piers Plowman, Elwood will display the results shown below.


Discussion. The regular expression \s(t%|th)e[iy]\s has enabled the search to capture instances of "they" in spite of varied scribal spellings: they, thei, þey, and þei. Note that the expression is bounded on both sides by \s, the regular expression metacharacter that matches any whitespace character in a text. Bracketing the body of the regular expression with the whitespace metacharacter \s ensures that the match will include an entire word. The group in parentheses, (t%|th), following the initial \s signals that either "þ" or "th" will satisfy the search criteria. When the regular expression engine conducts a matching pass through the text, groupings in which individual characters or sets of characters are enclosed in parentheses and separated by pipes "|" are treated as collections of alternative valid patterns. The e in the middle of the regular expression is a required letter "e". Generally speaking, letters that are not preceded by the escape character "\" are to be taken at face value in regular expression pattern matching. Finally, in the syntax of regular expression pattern matching, a set of individual characters enclosed in square brackets identifies a set of valid alternative characters for a match. In this case, the [iy] notation indicates that the occurrence of either "i" or "y" will be valid and, by implication, that no other character will satisfy the pattern match.
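Example 1 can be tried outside Elwood in a JavaScript engine, assumed here to agree with Internet Explorer's JScript engine on the features used. In this TypeScript sketch, "t%" has already been converted to "þ". A space is prefixed to each line, as Elwood is described as doing in Example 2. The prepare helper and the sample lines are invented.

```typescript
// Example 1's pattern, with "t%" already converted to "þ".
const they = /\s(þ|th)e[iy]\s/;

// Mimic the leading space Elwood gives each line (see Example 2).
const prepare = (text: string): string => " " + text;

// Invented sample lines.
const sample = [
  "they seiden soth",  // they
  "and thei wenten",   // thei
  "þey hadden ynough", // þey
  "þei alle",          // þei
  "the eyr was cleer", // no match: "the" is followed by a space, not "i" or "y"
  "þie alle",          // no match: the pattern requires "e" right after "þ"
];

const hits = sample.filter(line => they.test(prepare(line)));
console.log(hits); // the first four sample lines
```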

An important point needs to be made regarding a convention occasioned by the "þ" that appears in the preceding search. Ordinarily, if a text contains only "keyboard" characters, the appropriate choice of word delimiter for a regular expression search would be the word boundary metacharacter, "\b". The addition of characters such as "þ" and "ȝ" to the text's alphabet requires that the whitespace metacharacter "\s" be used in its place. This is because "\b" marks a boundary between a word character (A-Z, a-z, 0-9, or underscore) and any other character, and "þ" and "ȝ" are not counted as word characters. You should always use "\s" as the metacharacter to mark the beginning and ending of a pattern intended to match entire words. Use of the "\b" metacharacter will produce anomalous results when "þ" and/or "ȝ" are used in a word's spelling.
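The behavior of \b can be checked directly in a JavaScript engine. Its definition of \w ([A-Za-z0-9_]) is assumed to match the one given in Microsoft's JScript documentation. The sample line below is invented.

```typescript
// Invented line.
const text = " þey seiden soth ";

// \w covers only A-Z, a-z, 0-9 and "_"; thorn and yogh fall outside it.
console.log(/\w/.test("þ"), /\w/.test("ȝ")); // false false

// So there is no word boundary between the space and "þ", and this fails:
console.log(/\bþey\b/.test(text)); // false

// Worse, "þ" itself acts as a boundary, so a fragment matches as a "word":
console.log(/\bey\b/.test(text));  // true

// Bracketing with \s gives the intended whole-word match.
console.log(/\sþey\s/.test(text)); // true
```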

Example 2. Find all occurrences in Piers Plowman of "such" when it is the first word in a line.

a. as above
b. Enter "^\ssuche?\s" (without the quotation marks) in the first line of the "Search Text:" display.
c. Click on the "Search ... Text" button.

If this search is performed on the text of the Laud MS of the B-version of Piers Plowman, Elwood will display the results shown below.

Discussion. Use of the regular expression ^\ssuche?\s has permitted the identification of all lines in the poem that begin with "such(e)". Whitespace metacharacters again bracket the search pattern, serving the same purpose of forcing a match on word boundaries as they did in the earlier example. Note, however, that the start-of-line metacharacter, ^, is placed at the beginning of the regular expression, thereby anchoring the pattern match to the start of each line. Note also that the whitespace metacharacter, \s, that follows it will successfully match the first character in the line because, for the purposes of word selection, Elwood ensures that each line begins with a space. Because the characters that make up the body of the regular expression, suche, are preceded by ^\s, they can take part in a successful match only at the start of a line. The ? that follows the final character of the word indicates that this character is optional, thereby enabling matches on both "such" and "suche" with the same regular expression.
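Example 2 can be checked the same way. As before, this TypeScript sketch prefixes a space to each line to mimic Elwood's behavior. The prepare helper and the sample lines are invented.

```typescript
const suchAtStart = /^\ssuche?\s/;

// Mimic the leading space Elwood gives each line.
const prepare = (text: string): string => " " + text;

// Invented sample lines.
const sample = [
  "suche wordes he seide", // match: "suche" is the first word
  "such was his wille",    // match: the final "e" is optional
  "and suche other",       // no match: not the first word in the line
  "suchlike thynges",      // no match: whitespace must follow "such"/"suche"
];

const hits = sample.filter(line => suchAtStart.test(prepare(line)));
console.log(hits.length); // 2

// Without the leading space, the \s after ^ has nothing to match.
console.log(suchAtStart.test("such was his wille")); // false
```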

Example 3. Find all occurrences in Piers Plowman of adverbs ending in -ly and -liche and their variant spellings.

a. as above
b. Enter "\s\w*?liche?\s" (without the quotation marks) in the first line of the "Search Text:" display. Select "OR" in the drop-down box at the end of the line and enter "\s\w*?l[iy]\s" (without the quotation marks) on the second line.
c. Click on the "Search ... Text" button.


Discussion. At first glance this search serves as a reminder that spelling conventions do not map neatly onto grammatical parts of speech, as "ferly," "bely," and "dedly" in the found set of lines illustrate. Nevertheless, a search such as this greatly eases the effort of identifying candidates for the desired class of words. The first regular expression, \s\w*?liche?\s, is crafted to match words ending in "-lich(e)". The function of the liche? portion of the expression should be clear from Example 2. liche is a sequence of regular alphabetic characters that must be present if the pattern is to be matched. The exception is the final "e," which is followed by the ? that signals its optional presence. The first portion of the expression, \w*?, follows the initial whitespace metacharacter \s that ensures a match on a word boundary, and it will match any number of word characters at the start of a word. The "\w" metacharacter in this sequence matches any word character, including underscore (A-Z, a-z, 0-9, _). The "*" metacharacter matches the preceding character zero or more times, thereby creating a variable-length pattern of word characters. The "?" metacharacter in this sequence has an entirely different significance from the second "?" in the regular expression. Because it follows a quantifier rather than a regular alphabetic letter, it qualifies the preceding "*" by limiting the sequence of word characters it matches to the fewest characters possible while still completing the match.

A similar analysis could be conducted for the second regular expression in this example. The key to its role in this search, however, is the Boolean "OR," which lets Elwood return lines that match the entry on either line of the search query display. Taken together, these entries combine the power of regular expression pattern matching with the capacity to overlay multiple regular expression searches in a single pass through the target text.
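The OR combination of Example 3 can be sketched by testing both patterns against each line and keeping the line if either matches. The helper names and sample lines are invented. The fifth sample also shows a consequence of how \w is defined: a word whose stem contains "ȝ" or "þ" is not matched by \w*?.

```typescript
const lichWords = /\s\w*?liche?\s/; // first line of the search: -lich(e)
const lyWords = /\s\w*?l[iy]\s/;    // second line of the search: -li, -ly

// Boolean "OR": a line is found if either pattern matches it.
const orSearch = (line: string): boolean => lichWords.test(line) || lyWords.test(line);

// Mimic the leading space Elwood gives each line.
const prepare = (text: string): string => " " + text;

// Invented sample lines.
const sample = [
  "he spak sothliche to hem",   // found: -liche
  "and trewly he tolde",        // found: -ly
  "me bifel a ferly of fairye", // found: -ly, though "ferly" is a noun
  "hendeliche he seide",        // found: -liche as the first word
  "ȝerneliche he lerned",       // not found: \w does not match "ȝ"
  "he lay and slepte",          // not found: no -li, -ly or -lich(e) word
];

const hits = sample.filter(line => orSearch(prepare(line)));
console.log(hits.length); // 4
```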

Example 4. Find all lines in Piers Plowman where "of" occurs at least two times in the line.

a. as above
b. Enter "\s\of\s" (without the quotation marks) in the first line of the "Search Text:" display. Select "AND" in the drop-down box at the end of the line and enter "\s\of\s.*?\s\of\s" (without the quotation marks) on the second line.
c. Click on the "Search ... Text" button.


Discussion. This search illustrates how to find a word when it occurs two or more times in the same line. The meaning of the regular expressions used here should be clear from the previous examples. The exception is the dot (".") metacharacter in the second regular expression, which matches any single character. The sequence .*?, coming between the two instances of "of" (each delimited by a whitespace character), serves as an expandable filler, signifying a sequence of any length made up of any kind of character (alphabetic, whitespace, or punctuation marks).

The unusual feature here is the use of the Boolean logic built into Elwood's search mechanism to ensure that only the word "of" is highlighted in the lines returned by the search. It also ensures that each instance of "of" is highlighted in red even when there are more than two occurrences of the word in a line. Strictly speaking, the second regular expression used here, \s\of\s.*?\s\of\s, is all that would be needed to locate all lines with two or more instances of the word "of". If it were applied as the sole search pattern, it would return only those lines that had two or more instances of "of". However, the highlighting in each case would extend from the beginning "o" of the first "of" to the ending "f" of the second, as it does in the illustration below.


This situation is remedied by the inclusion of \s\of\s as the first term of the search, an inclusion that serves formatting purposes only. Elwood applies highlighting to found text in the order in which the terms of a Boolean search appear. The formatting applied to each "of" in the found text by the first term blocks the formatting that would normally be applied for the second term. As a result, only "of" is highlighted, and not the text matching the second term, even though it is the second term that limits the search to lines in which there are two or more occurrences of "of". [Note: the "blocking" referred to above occurs because the attempt to highlight text matching the pattern of the second search term takes place upon text that has already been altered by the highlighting called for by the first search term. This alteration, in the form of hidden but nonetheless present markup, causes the attempted pattern match for the second term to fail.]
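The highlighting behavior described in Example 4 can be modeled by applying the terms in order to the same string, inserting markup after each. The <span class="hit"> markup and the highlight function are hypothetical stand-ins for whatever hidden markup Elwood inserts. The sketch assumes, as the description of the highlighting suggests, that delimiting whitespace stays outside the markup. The search terms are written as \sof\s, since JavaScript engines read \o in a pattern as a plain "o".

```typescript
// Hypothetical highlighter: wrap each match in markup, leaving any leading
// or trailing whitespace outside the markup (so only "of", not " of ", is marked).
function highlight(text: string, pattern: RegExp): string {
  return text.replace(pattern, (match: string) => {
    const core = match.replace(/^\s+|\s+$/g, "");
    const start = match.indexOf(core);
    return match.slice(0, start) + '<span class="hit">' + core + "</span>" +
      match.slice(start + core.length);
  });
}

// Invented line, with a leading space.
const line = " the lond of heuene and of helle ";
const first = /\sof\s/g;           // first search term
const second = /\sof\s.*?\sof\s/g; // second search term

// The second term on its own marks one span from the first "of" to the second.
console.log(highlight(line, second));
// ' the lond <span class="hit">of heuene and of</span> helle '

// Applied in order, the first term's markup now sits between the spaces and
// each "of", so the second term finds nothing to match and adds nothing.
const afterFirst = highlight(line, first);
const afterSecond = highlight(afterFirst, second);
console.log(afterSecond === afterFirst); // true
```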

The foregoing search strategy, which turns on a feature of Elwood rather than of regular expressions per se, can be applied in a number of useful ways. One of the most significant is the highlighting of alliterating words in alliterative poetry, as the following image illustrates. The foregoing examples provide the basis for understanding the search terms that are also present in the illustration.