
 

35   University of Virginia, William Blake Archive

 

HATII interviewed Matthew Kirschenbaum, the Technical Editor of the William Blake Archive at the University of Virginia, and Andrea Laue, the Project Manager, on September 22, 2000. Sponsored by the Library of Congress, the William Blake Archive is a digital collection of work by and relating to Blake, in various forms from printed books to artifacts. Rather than restricting access to the University community, the Archive aims primarily to reach an international audience. Indeed, this was the most significant motivation for its initiation. Matthew Kirschenbaum completed the questionnaire section of the interview instrument, while Andrea Laue was interviewed for the remainder.

 

35.1    Organizational Digitization Program and Policy

The William Blake Archive developed in response to the burgeoning interest in Blake’s poetry and visual art amongst a dispersed and varied global audience. The aim of the Archive is to provide an international, unified, public resource. Active since 1996, the Archive has sought to unite visual and literary works that are highly disparate, widely dispersed, and often severely restricted because of their value, rarity, and fragility.

The William Blake Archive’s priorities for digitization were formalized in the original Getty Grant Program application, produced by the project editors. In the first phase (pre-1999) this prioritized a group of Blake’s illuminated books based on historical principles derived from recent revisionist scholarship. These works were also the best known and studied by the editors; however, gaining permission from institutions has presented difficulties from the outset. In the second phase (1999 to 2002), priority has moved to additional illuminated books and non-illuminated material, such as paintings, drawings, original and reproductive prints, manuscripts, and rare or unique typographical works. The aim behind this second phase is to expand the core of the Archive and exemplify the full range of Blake’s work. Ultimately it is hoped that the whole of Blake’s work can be incorporated.

The objectives this policy sought to achieve are to broaden access, enhance teaching and scholarly research through providing comparison copies and editions, and provide a durable foundation for future scholarship and a legacy of adaptable tools.

The biggest obstacles to planning the development and building of the digital deliverables have been obtaining permissions (the largest single budgetary expense) and the actual cost of the digitization process.

The criteria that have guided the project in selecting material are those that informed the original editorial principles. These principles have not changed over time, even after the shift to non-illustrated books as the corpus of illustrated books became digitized.

The project has co-operated with libraries, museums and academic institutions at the national and international levels. In particular the Archive hopes that the collaborative work of the three editors and the staff of the Institute for Advanced Technology in the Humanities will become a useful prototype of “distance editing”.

The Archive’s work began in 1995, when the Getty Grant Program underwrote the initial phase of the project, and has an anticipated end date of 2004.

The purposes in creating the digital deliverables are academic and public access, teaching, learning, research, experiment, and in response to previous demand. The program has produced an explicit statement of intent that covers its rationale, scope, significance, primary audience, long-term sustainability, level of faithfulness to the originals and suitability for different target audiences. This statement is available on the web at http://jefferson.village.virginia.edu/blake/public/about/

The type of source material digitized includes:

It is hoped that the digitization process will ultimately represent the entire body of material. The Archive is being built on the principle of creating a history of Blake’s artistic production in one edition. However, the first priority was to digitize one copy from every printing of every book, with the illuminated books as an editorial and archival backbone. The project does not intend to re-purpose these digital deliverables.

The following standards, guidelines or tools are used for representing content:

The following standards, guidelines or tools are used for describing content:

The program looked at other existing guidelines for digitizing particular document types when planning its digitization strategy. It rejected TEI as inadequate for editorial purposes.

The following standards, guidelines or tools are used for representing structure:

The DTDs were developed for the Archive by IATH. The principal of these is the Blake Archive Description (BAD) DTD. This is used to encode all works at the object and collection level. The TEI DTD is used for other materials in the Archive such as bibliographies and Erdman’s Complete Poetry and Prose of William Blake.

The program does not make any recommendations in relation to standards in general, nor in terms of navigating between the ideal and the realistic.

The intended target audiences for digital deliverables are:

The project carried out an evaluation of the target audiences. Various groups other than the primary target audience could use the digital deliverables. The profile of the users has been as anticipated.

 

35.2    Project Management and Planning

Experience in project management was available in-house. The program has led to changes in organizational relationships and procedures (no details available). There are formal project management procedures in place (no details provided). No project management procedure has failed. Quality assurance procedures take the form of annual meetings to set the year’s agenda, while grants require specific goals and program reports. These have been successful for grant applications.

Pilot studies have been carried out for training needs, technical feasibility, user needs, workflow analysis and technology forecasting. No significant changes have been made to the design of the project as a result. No time and motion or benchmarking studies have been carried out. Job descriptions are applied.

Digitization is carried out in-house using equipment that was already available and some that was bought in. The Archive uses Microtek Scanmaker III and Microtek Scanmaker V flatbed scanners (the latter with a transparent media adapter) and a Nikon LS-3510AF slide scanner. Guidelines for data capture procedures have been established, and color charts are used as benchmarks. The monitors and scanners are calibrated on a regular basis (using ProSense 1.8 software), and compressed air is used to remove dust and lint from the scanner bed and the transparency before every scan.

 

35.3    Human Resources and Training

The project employs three general editors (directors) plus a technical editor. One of the editors is also a digitizer; one or two graduate students are employed for approximately ten hours a week, and a further two graduate students work on markup, also for ten hours a week. The project manager and technical support person are graduate students and work 15-20 hours per week. IATH has two technical development staff who work 5% of their time on the project. All the project staff have a humanities background and the markup students are specialized in Blake. A combination of external and in-house advice on the technical aspects of digitization is used (e.g. Worthy Martin, Kodak and Xerox for JPEG 2000).

Areas where training needs were identified were project management, application of technical standards, preparation and handling of materials for digitization, technical operation of digitization equipment, and metadata creation. The project director, technical staff and equipment operators engaged in training and this has been organized in-house (own consultants), through external courses, independent study and learning on the job. The training has met the needs of the project.

 

35.4    Project Life-Cycle Processes and Procedures

The project is aware of the copyright position of the digital deliverables and does not own the copyright in the original materials. Material in copyright was digitized with the owners’ agreement (for which there are formal procedures). The program declares copyright or rights status. Users are allowed to make printouts of the digital deliverables on paper and download to a PC. Users can view texts marked up according to the Blake DTD and to TEI, but the actual markup is not available. Users can view lower and highest quality images. No electronic management systems, such as watermarking, are in use.

The project does not have a conservation procedure for the original materials (since this is not part of the project). The interviewee was uncertain whether the material’s condition is investigated. The project does not modify, degrade or compromise material to carry out digitization. No information was available on risk assessment.

For illustrated books, the cataloging or reference system in place is the Bentley number system. There is a variety of catalogs and reference aids for non-illustrated books. Copy/edition information, location, etching print date, provenance from the catalog or reference system, and editor’s information are used in the digitization process. The program does not alter originals for the digitization process or reject material. The Archive digitizes from three types of intermediary: 4x5 and 8x10 transparencies and, occasionally, 35mm slides. The project is developing an in-house system for non-illustrated books, possibly based on access accession numbers.

The Archive’s second DTD is the Blake Object Description (BOD). This is used to encode the textual metadata that makes up the Image Information Record, which every image in the Archive contains. The Image Information Record combines the technical data collected during the scanning process from the Image Production Record with additional bibliographic documentation of the image, as well as information pertaining to provenance, present location, and the contact information of the owning institution. These textual records are, at the most literal level, a part of the Archive’s image files. Image files are typically considered to be nothing but information about the images themselves (the composition of their pixelated bitmaps, essentially); but, in practice, an image file can be the container for several different kinds of information. The Blake Archive takes advantage of this by slotting its Image Information Records into that portion of the image file reserved for textual metadata. Because the textual content of the Image Information Record becomes a part of the image file itself, the record travels with the image even if it is downloaded and detached from the Archive’s infrastructure. The tools for controlling data values are a controlled vocabulary and a classification scheme.
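The idea of a record that travels inside the image file can be illustrated with a minimal sketch. The source does not say which container field the Archive uses, so this example assumes a JPEG comment (COM) segment; the function names and the sample record text are hypothetical, not the Archive's own.

```python
import struct

SOI = b"\xff\xd8"  # start-of-image marker that opens every JPEG stream
COM = 0xFE         # comment segment marker (0xFFFE); assumed container here
SOS = 0xDA         # start-of-scan: compressed image data follows
EOI = 0xD9         # end-of-image


def embed_record(jpeg: bytes, record: str) -> bytes:
    """Return a copy of `jpeg` with `record` stored in a COM segment after SOI."""
    if not jpeg.startswith(SOI):
        raise ValueError("not a JPEG stream")
    payload = record.encode("utf-8")
    if len(payload) > 65533:  # the 16-bit length field also counts itself
        raise ValueError("record too long for a single COM segment")
    segment = b"\xff" + bytes([COM]) + struct.pack(">H", len(payload) + 2) + payload
    return SOI + segment + jpeg[2:]


def read_records(jpeg: bytes) -> list[str]:
    """Collect the text of every COM segment that precedes the image data."""
    records = []
    pos = 2  # skip SOI
    while pos + 4 <= len(jpeg) and jpeg[pos] == 0xFF:
        marker = jpeg[pos + 1]
        if marker in (SOS, EOI):
            break
        (length,) = struct.unpack(">H", jpeg[pos + 2:pos + 4])
        if marker == COM:
            records.append(jpeg[pos + 4:pos + 2 + length].decode("utf-8"))
        pos += 2 + length  # marker bytes plus the segment body
    return records
```

Because the record sits in the file's own header segments, a copy saved to a user's disk still yields the text through `read_records`, with no access to the Archive's servers required.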

 

35.5    Format, Resolution and Compression of Digitized Materials

The formats for retroconverted text-based digital deliverables are:

Some texts contained non-Latin scripts. OCR is not used and keying in is the main conversion method for textual materials.

For image material the TIFF file format is used for capture and preservation and JPEG for delivery. The capture and preservation resolution is 600dpi. Delivery resolution is between 100 and 300dpi. Bit depth is 24 throughout. Image compression is TIFF for preserving and JPEG for delivery. The aim of compression was to improve access, enhance usability and decrease storage requirements. The project retains the uncompressed scans. The project carries out post-processing color correction on the images using Photoshop and hooded Radius Press View 17SR and 21SR monitors. Color correction takes between 30 minutes and several hours. This is a key step in establishing the scholarly integrity of the Archive. Each transparency has first been color corrected to the original before the digital image is color corrected to the transparency. In this way the digital image will match the original when viewed under optimal conditions.

Average file size for capture and preservation is too variable to state. The quality control procedures in place for the digital deliverables are spot checks and total checks. Metadata quality control procedures are spot checks on the parsed files for images, while text files are checked by the editor.
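Although no average file size is reported, the stated capture settings (600dpi, 24-bit) fix the uncompressed size of any given scan. The following sketch works through the arithmetic, assuming the 4x5 and 8x10 transparency formats are measured in inches and that the whole frame is scanned; the function name is illustrative.

```python
def uncompressed_bytes(width_in: float, height_in: float, dpi: int, bit_depth: int = 24) -> int:
    """Size in bytes of an uncompressed raster scan, ignoring file headers."""
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    return width_px * height_px * bit_depth // 8


# Capture at 600dpi, 24-bit color
small = uncompressed_bytes(4, 5, 600)    # 2400 x 3000 pixels -> 21,600,000 bytes
large = uncompressed_bytes(8, 10, 600)   # 4800 x 6000 pixels -> 86,400,000 bytes

# The same 4x5 frame at the two ends of the delivery range, before JPEG compression
low = uncompressed_bytes(4, 5, 100)      # 400 x 500 pixels -> 600,000 bytes
high = uncompressed_bytes(4, 5, 300)     # 1200 x 1500 pixels -> 5,400,000 bytes
```

Under these assumptions a single preservation scan runs from roughly 21.6 MB to 86.4 MB depending on the transparency format, which is consistent with the report that sizes are too variable to average.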

Users can access the open catalog plus materials. Searching and browsing facilities are keyword and text description, Boolean, portions of the SGML, transcriptions and image markup. The Image Information Record may be viewed using either the “Info” button located on the control panel of the Archive’s ImageSizer applet, or with the Text Display feature of standard software such as Adobe Photoshop or X-View. Users do not have to pay for use of the digital deliverables.

Potential users of the digital deliverables are informed about their availability through website announcements, press releases, articles in print media, print media coverage, conferences and meetings, email shots and registering with search engines.

Special software used is Java and Dynaweb. Apart from searching and browsing, users are able to manipulate images by using an image sizer developed in house in Java. Usage levels are 1700 hits a day monitored by automatic data capture.

 

35.6    Evaluation, Funding and Long-term Sustainability

The program did not carry out front-end or formative evaluation of users before the development of the project. Summative evaluation was undertaken using email and observation of users’ interaction with the system. The purpose of this evaluation was interface design. As a result, a revised interface was released in January 2001.

Funding sources were NEH, Getty, Mellon and sponsorship (Sun, IBM, INSO (for Dynaweb)). The program believes that the use of standards and guidelines in the digitization process has cost money. The program has been monitored by its funders through annual reports. Documents requested by funders include project management plans, cost models and workflow reports.

New material is added continually and the user interface updated regularly.

The Archive’s preservation strategy is the use of durable and migratory encoding structures, and robust backup and archive procedures. The Archive’s project office retains hard copies of all Image Production Records, as well as ledgers tracking electronic file transfers, consignment of TIFF images to tape and CD-ROM, and shipping of transparencies and slides. Hard copies of the DTD and the Archive’s SGML are also retained at the project office.

All of the Archive’s data is safeguarded via daily incremental backups on magnetic tape. In the event of a catastrophic disk failure or a server break-in, the Archive’s data could be quickly restored from the off-site backup system.

The digital deliverables will be available indefinitely and the project’s longer-term sustainability is not reliant on self-generating funds.

Access tracking to all three of the Archive’s servers is via Wusage 5.0.

The project does not have an exit strategy. Loss of the digital deliverables would be a matter of concern.




abp~04/02