
Connect, Spring 2004

Category: Humanities Computing

The Clerk's Tale

Digitizing Chaucer's Canterbury Tales

By Matthew Zimmerman, with Martha Rust, David Hoover, and Carlos Garcia

[Ed: Links to web pages and/or e-mail addresses which have become inactive since the publication of this article have been enclosed in curly brackets { }. Replacement links have been provided where possible.]

For the past four months, the Humanities Computing Group within ITS' Academic Computing Services has been providing technical support to Professors David Hoover and Martha Rust of NYU's Faculty of Arts and Sciences Department of English in the development of an electronic edition of Chaucer's "The Clerk's Tale." This edition is part of the Canterbury Tales Project, an international, multi-university endeavor based at De Montfort University, Leicester, England, and directed by Dr. Peter Robinson.

The Canterbury Tales Project is working to provide new transcriptions of all the extant manuscripts and early-print copies of Chaucer's Canterbury Tales on CD-ROM, with the ultimate goal of determining "as thoroughly as possible" the textual history of the Tales.[1] The work undertaken to achieve these linked goals may be divided into four stages:

  1. As the first step, each manuscript is transcribed and encoded. Transcription is done using BBEdit and the Canterbury Tales font, a character set designed to accommodate late-medieval English manuscripts. Encoding conforms to the guidelines of the TEI (Text Encoding Initiative), which is the standard for mark-up of electronic texts in the humanities. Once completed, each transcription is reviewed by two separate readers.
  2. Next, the transcriptions are compared to each other using a collation software program, Collate, first developed for the Project. This program creates a record of the agreements and disagreements among the manuscripts and encodes that record of variants in XML (Extensible Markup Language).
  3. Then, the body of variants is analyzed using the "cladistic" methods of evolutionary biologists; this type of analysis yields a working delineation of the "genetic" relations among the various copies of the text and facilitates queries against the data in order to refine that analysis further.
  4. In the final stage of this process, each tale is published on CD-ROM in a format that allows users to access black-and-white images of every page of every manuscript or early-print copy of that tale as well as full-text electronic transcriptions. The CD-ROM editions also allow users to compare differences among manuscripts or print copies at any given point in the text. These comparisons are facilitated by Anastasia, another software program that was developed for the Canterbury Tales Project. See figure 1 for an example from Caxton's Canterbury Tales.
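The collation step (stage 2) can be pictured with a toy example. The sketch below is not Collate and does not reproduce its algorithm; it simply aligns each witness word by word against a chosen base text using Python's standard difflib and records every disagreement as a TEI-style `<app>` entry holding a `<lem>` (base reading) and a `<rdg>` (variant). The function name `collate`, the `apparatus` wrapper element, the `loc` attribute, and the two sample readings are invented for illustration; the readings are made-up lines loosely modelled on Middle English spelling, not transcriptions.

```python
from __future__ import annotations

import difflib
from xml.etree import ElementTree as ET


def collate(base_sigil: str, witnesses: dict[str, str]) -> ET.Element:
    """Toy word-level collation (hypothetical; not the Collate program).

    Each non-base witness is aligned against the base text, and every
    point of disagreement becomes an <app> element with <lem> and <rdg>.
    """
    base = witnesses[base_sigil].split()
    root = ET.Element("apparatus", base=base_sigil)
    for sigil, text in witnesses.items():
        if sigil == base_sigil:
            continue
        words = text.split()
        matcher = difflib.SequenceMatcher(a=base, b=words, autojunk=False)
        for tag, i1, i2, j1, j2 in matcher.get_opcodes():
            if tag == "equal":
                continue  # agreement: nothing to record
            # loc gives the word positions in the base text (empty span = insertion)
            app = ET.SubElement(root, "app", loc=f"{i1}-{i2}")
            ET.SubElement(app, "lem", wit=base_sigil).text = " ".join(base[i1:i2])
            ET.SubElement(app, "rdg", wit=sigil).text = " ".join(words[j1:j2])
    return root


if __name__ == "__main__":
    # Made-up sample readings; "Hg" and "Cn" are used only as example sigils.
    sample = {
        "Hg": "Ther is at the west syde of Ytaille",
        "Cn": "Ther is right at the west syde of Itaille",
    }
    print(ET.tostring(collate("Hg", sample), encoding="unicode"))
```

A real apparatus would also need to align many witnesses against one another rather than each against a single base, which is one reason the project relies on dedicated software.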

 
Figure 1. The Canterbury Tales website at {http://www.cta.dmu.ac.uk/Caxtons/} Replacement URL: http://www.cts.dmu.ac.uk/Caxtons/.
 
 

To date, the Canterbury Tales Project has published "The Wife of Bath's Prologue," "The General Prologue," "The Hengwrt Chaucer Digital Facsimile," "Caxton's Canterbury Tales: The British Library Copies," and is working on "The Miller's Tale," "The Nun's Priest's Tale," "The Franklin's Tale," and "The Merchant's Tale," in addition to "The Clerk's Tale." The online editions are available at {http://www.cta.dmu.ac.uk/Caxtons/} Replacement URL: http://www.cts.dmu.ac.uk/Caxtons/. Since the scope of the Canterbury Tales is so large—consisting of 24 tales in addition to the "General Prologue"—Robinson has begun to form collaborative partnerships with scholars around the world to produce editions.

The Canterbury Tales Project has formed partnerships with Brigham Young University, Virginia Polytechnic Institute and State University, the Institut für Buchwissenschaft und Textforschung at the University of Münster, Germany, and now New York University. NYU's work on "The Clerk's Tale" marks the first time the group at De Montfort University has trained another group in the entire range of skills and tools used in this project, from first transcription, collation, and analysis right through to final electronic publication.

The completion of the edition of "The Clerk's Tale" should take about two years. The first step is to transcribe into electronic text the 53 existing manuscript and early-print copies of the "Tale." This task is currently being carried out by a team of five transcribers, including Professors David Hoover and Martha Rust and three graduate students from NYU's English Department: Adam Coccaro, Mark Hewitt, and Amanda Leff. (Funds for graduate student stipends have been supplied through a generous grant from the NYU Humanities Council.)

Initially, the transcribers are working from digitized images derived from microfilm copies of the manuscripts. Upon completion of the transcription process, one or more members of the transcription team will travel to England to consult the original manuscripts to verify any readings that are in question.

Technical Considerations and Solutions

While the transcription methods, Collate software, and Anastasia server were all developed by Peter Robinson at De Montfort, the Humanities Computing Group (HCG) has actively provided technical support to the project.

Access to three computers in the Studio for Digital Projects and Research has been provided for use as transcription stations. The Studio, housed in Bobst Library, is a joint project of the NYU Libraries and ITS. The computers have large, high-resolution monitors to view the scanned manuscripts and all of the software needed to produce the electronic transcriptions. In addition, the HCG provides server space for storage of the manuscript images and transcribed texts. This allows the manuscripts and texts to be accessed from any computer with web access and also provides data protection and backups.

Perhaps the most important contribution the HCG has made to the project, however, is the implementation of a CVS (Concurrent Versions System) repository for version control of the transcriptions. CVS is typically used by teams of computer programmers working on a shared project. It is a client-server system that allows a project member to "check out" a file from a project, make changes and additions to the file, and then return that file to the server.

CVS keeps track of the changes made and assigns a new version number each time the file is returned. At any time, a user can go back and see previous versions of the file. Thus, if a mistake was made in updating or changing a file, the previous version can be called up to replace the updated version. Also, if two people have checked out the same file at the same time, they can work on their copies separately, and CVS will merge both sets of changes into the new version. If the changes conflict (e.g., both people edited the same part of the file), CVS does not silently overwrite either version; it flags the conflicting passage so that it can be reconciled by hand.
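The merging described above can be illustrated with a simplified three-way merge, in which each editor's copy is compared line by line with the common ancestor (the "base") that both of them checked out. This is a sketch under a strong assumption: nobody inserts or deletes lines, so line *i* refers to the same place in all three copies. CVS itself merges with a diff3-style algorithm that also copes with insertions and deletions, and it marks conflicts inside the working file rather than returning a list. The function name `merge3` is hypothetical.

```python
from __future__ import annotations


def merge3(base: list[str], mine: list[str], theirs: list[str]) -> tuple[list[str], list[int]]:
    """Merge two edited copies of `base` line by line (simplified sketch).

    Returns the merged lines and the 0-based indices of conflicting lines.
    Assumes no lines were inserted or deleted in either copy.
    """
    if not (len(base) == len(mine) == len(theirs)):
        raise ValueError("this sketch only handles edits that keep the line count")
    merged: list[str] = []
    conflicts: list[int] = []
    for i, (b, m, t) in enumerate(zip(base, mine, theirs)):
        if m == t or t == b:
            merged.append(m)   # identical edits, or only "mine" changed this line
        elif m == b:
            merged.append(t)   # only "theirs" changed this line
        else:
            merged.append(m)   # both changed it differently: keep mine, flag it
            conflicts.append(i)
    return merged, conflicts


if __name__ == "__main__":
    base = ["line one", "line two", "line three"]
    print(merge3(base, ["LINE ONE", "line two", "line three"],
                 ["line one", "line two", "LINE THREE"]))
```

Non-overlapping edits combine automatically; only lines that both editors changed in different ways need human attention.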

The CVS repository has made life a bit easier for "The Clerk's Tale" editors and transcribers. Before this system was implemented, transcribers for previous Canterbury Tales projects saved files using a system of codes for the tale name, manuscript name, transcriber's name, and version number. For instance, for a Caxton version of "The Clerk's Tale," a transcriber could create a file called "CL-Cn-TS-1," where "CL" stood for "Clerk's Tale," "Cn" stood for "Caxton," "TS" stood for the transcriber's initials, and "1" meant it was the first version. Then, with subsequent changes to the file, the transcriber would save it as "CL-Cn-TS-2," "CL-Cn-TS-3," and so on.
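The old naming convention is mechanical enough to express in a few lines, which also shows where it breaks down: nothing enforces it, and plain alphabetical sorting puts version 10 ahead of version 2. The regular expression and the helper names `parse_name`, `next_name`, and `latest` are illustrative assumptions, not tools the project actually used.

```python
from __future__ import annotations

import re
from typing import NamedTuple

# Assumed pattern for names like "CL-Cn-TS-1": tale-manuscript-initials-version.
NAME = re.compile(r"^(?P<tale>[A-Za-z]+)-(?P<ms>[A-Za-z0-9]+)-(?P<who>[A-Za-z]+)-(?P<version>\d+)$")


class TranscriptName(NamedTuple):
    tale: str
    manuscript: str
    transcriber: str
    version: int


def parse_name(filename: str) -> TranscriptName:
    """Split a file name into its coded parts."""
    m = NAME.match(filename)
    if m is None:
        raise ValueError(f"not in tale-ms-initials-version form: {filename!r}")
    return TranscriptName(m["tale"], m["ms"], m["who"], int(m["version"]))


def next_name(filename: str) -> str:
    """The name the transcriber had to remember to use for the next save."""
    p = parse_name(filename)
    return f"{p.tale}-{p.manuscript}-{p.transcriber}-{p.version + 1}"


def latest(filenames: list[str]) -> str:
    """Pick the newest version numerically, not alphabetically."""
    return max(filenames, key=lambda f: parse_name(f).version)


if __name__ == "__main__":
    print(next_name("CL-Cn-TS-1"))
    print(latest(["CL-Cn-TS-1", "CL-Cn-TS-10", "CL-Cn-TS-2"]))
```

Every safeguard here depends on the transcriber calling the right helper at the right moment; under CVS, the repository does the numbering itself.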


 
Figure 2. BBEdit's CVS Menu.
 

In theory this system is sound, but in practice it is prone to problems. A transcriber might forget to update the file name before saving it, and therefore overwrite the previous version. Also, after months of transcribing, there could be hundreds of versions of one transcription, meaning hundreds of files, usually stored locally on the transcriber's machine. Lastly, there was no cataloging system to track what changes were made in each version of the file.

The CVS repository has solved these problems. Typically, CVS is run from a UNIX command line, which can be daunting for the casual computer user, but for the Clerk's Tale project the NYU team is using the built-in CVS menu in BBEdit 7.0, a text editor for the Macintosh.[2]

Now, when an editor wants to work on a file, he or she "checks out" the project from the CVS repository by choosing one of the UNIX checkout scripts the HCG has created, available directly in the BBEdit menu. Files in the project can then be opened, edited locally, and "committed" to the CVS repository as often as the editor wishes.
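Because BBEdit's CVS menu depends on the cvs client installed with Apple's Developer Tools (see footnote 2), a checkout script of the kind mentioned above might do little more than assemble and run a `cvs checkout` command. The following is a hypothetical sketch: the CVSROOT string, module name, and helper names are invented, and only the `cvs` options and subcommands themselves (`-d`, `checkout`, `commit -m`) are standard.

```python
from __future__ import annotations

import shlex
import subprocess

# Hypothetical values: a real script would use the actual repository location.
CVSROOT = ":ext:transcriber@cvs.example.edu:/home/cvs"
MODULE = "clerks-tale"


def checkout_cmd(cvsroot: str, module: str) -> list[str]:
    """Build `cvs -d ROOT checkout MODULE` (global -d precedes the subcommand)."""
    return ["cvs", "-d", cvsroot, "checkout", module]


def commit_cmd(path: str, message: str) -> list[str]:
    """Build `cvs commit -m MESSAGE PATH`, run from inside the working copy."""
    return ["cvs", "commit", "-m", message, path]


def run(cmd: list[str]) -> None:
    """Echo and execute a command; requires the cvs client on PATH."""
    print("$", shlex.join(cmd))
    subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Only print the command here; call run(...) to actually contact the server.
    print(shlex.join(checkout_cmd(CVSROOT, MODULE)))
```

Wrapping the command in a menu item spares transcribers from typing the repository path, which is the kind of detail the project wanted to keep out of their way.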


 
Figure 3. BBEdit's CVS allows editors to add comments to a version.
 

Every time a file is committed, it is saved to the CVS repository as a new version of the file, thereby preserving all previous versions. If an editor wants to go back and look at a previous version, he or she can simply check that version out of the system. These commands are all available in BBEdit's CVS menu (see figure 2). In addition to saving multiple versions of the file, the CVS system also saves certain administrative data about each version: the username of the person who committed the file, the date and time it was committed, and any comments about the file the person may have added.
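A toy in-memory model makes this bookkeeping concrete: every commit appends a new revision instead of overwriting the old one, records the committer's username, a timestamp, and an optional comment, and any earlier revision can be retrieved by number. The class and method names and the sample usernames are hypothetical. The trunk numbering 1.1, 1.2, ... follows CVS's convention, but real CVS stores revisions as deltas in RCS files on the server rather than as full copies in memory.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional


@dataclass
class Revision:
    number: str          # CVS-style trunk number: "1.1", "1.2", ...
    text: str
    author: str
    date: datetime
    comment: str


@dataclass
class VersionedFile:
    """Toy model of one file's history (not how CVS stores data on disk)."""
    revisions: list[Revision] = field(default_factory=list)

    def commit(self, text: str, author: str, comment: str = "") -> str:
        """Append a new revision; earlier revisions are never overwritten."""
        number = f"1.{len(self.revisions) + 1}"
        self.revisions.append(
            Revision(number, text, author, datetime.now(timezone.utc), comment))
        return number

    def checkout(self, number: Optional[str] = None) -> str:
        """Return the latest text, or the text of a named earlier revision."""
        if not self.revisions:
            raise KeyError("no revisions committed yet")
        if number is None:
            return self.revisions[-1].text
        for rev in self.revisions:
            if rev.number == number:
                return rev.text
        raise KeyError(f"no revision {number}")

    def log(self) -> list[tuple[str, str, str]]:
        """Newest-first (number, author, comment) summary, like a revision log."""
        return [(r.number, r.author, r.comment) for r in reversed(self.revisions)]


if __name__ == "__main__":
    f = VersionedFile()
    f.commit("first draft", "user1", "initial transcription")
    f.commit("second draft", "user2", "corrected a reading")
    print(f.log())
```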

Each time a file is committed, the person saving the file is given an opportunity to make comments about the file or the changes they have made (see figure 3). An editor can also choose to compare different revisions against one another (see figure 4).
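Comparing two revisions amounts to computing a line-based difference between their texts. The sketch below uses Python's standard `difflib.unified_diff` to produce output in the same unified-diff format that `cvs diff -u` can emit; the function name `compare_revisions` and the two sample texts are invented for illustration.

```python
import difflib


def compare_revisions(old: str, new: str, old_rev: str, new_rev: str) -> str:
    """Return a unified diff of two revision texts ("" if they are identical)."""
    lines = difflib.unified_diff(
        old.splitlines(),
        new.splitlines(),
        fromfile=f"revision {old_rev}",
        tofile=f"revision {new_rev}",
        lineterm="",  # splitlines() removed newlines, so don't append them per line
    )
    return "\n".join(lines)


if __name__ == "__main__":
    r11 = "Ther is at the west syde\nof Ytaille\n"
    r12 = "Ther is at the west syde\nof Itaille\n"
    print(compare_revisions(r11, r12, "1.1", "1.2"))
```

Lines prefixed with `-` appear only in the older revision and lines prefixed with `+` only in the newer one, so a single changed spelling stands out immediately.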

So now, instead of having to save hundreds of files under different names while knowing little about when those files were saved or what changes were made, a person can quickly see a list of all the versions of a file, when each was saved, who saved it, and, if comments were added, what was changed or added. And he or she has access to any of these versions at any time.


 
Figure 4. CVS makes it possible for editors to compare revisions.
 

Although the real work of the Clerk's Tale project is the scholarly editing and transcription being done by Professors Hoover, Rust, and their staff, the HCG has been pleased to facilitate the project through the technical assistance it has provided. The transcription stations and server space provide the project with a dependable work area and secure data backup. CVS has proved to be a very successful tool for managing the large number of files associated with the transcription of "The Clerk's Tale," and may be adopted by the other partner universities working on the project.

This project exemplifies how ITS can collaborate with professors like Dr. Rust and Dr. Hoover by providing the technical support needed to allow them to focus on the scholarly aims of a project and free them from concerns about day-to-day technical details.

Footnotes
  1. “The Canterbury Tales Project: About the Project,” {http://www.cta.dmu.ac.uk/projects/ctp/about.html} Replacement URL: http://www.canterburytalesproject.org/CTPhistory.html. This web page is also the source for the description of the Project’s stages of production and analysis.
  2. To access BBEdit’s CVS menu, Apple Developer Tools must also be installed (http://developer.apple.com/tools/download/).


Author Biographies

Matthew Zimmerman is a Humanities Computing Specialist in ITS' Academic Computing Services; Martha Rust and David Hoover are professors in NYU's FAS Department of English; Carlos Garcia is a student working with the ITS Humanities Computing Group.


Posted: April 17, 2004. Page last reviewed: May 16, 2006. All content © New York University.
Questions or comments about this site? Send e-mail to: its.connect@nyu.edu.