
 

7       University of Chicago Library

 

HATII interviewed Kathleen Arthur, Head of Preservation Reformatting and Digital Initiatives at the University of Chicago Library, and Elizabeth M. Long, Co-director of the Digital Library Development Center, on January 9, 2001. The digitization activities at the University Library have resulted in the creation of several individual programs and projects. While these projects are designed to satisfy specific requirements, the Library also has the general intention to broaden the availability of resources to audiences other than University Library users, by means of Internet access. At the same time, the Library hopes to encourage resource-based learning and the use of technology in education and research both within and outside the University community.

 

7.1       Organizational Digitization Program and Policy

To date the University of Chicago’s digitization activities have been project based. Initially the preservation department began with a microfilming grant that included a small grayscale image capture element, although no OCR was used at this stage. At the same time the library as a whole had an interest in developing these areas. The library did not conduct a collection survey as part of its digitization planning process, but a condition survey was undertaken ten years ago. This survey is not used directly to establish priorities for digitization, but it raised questions and helped develop experience and understanding that in turn informed digitization priorities. These priorities have not been formalized in a strategic policy statement, but one is currently under development. The extent to which the whole institution and library staff participate in digitization differs between projects. The preservation and systems departments are always involved, and the subject bibliographer is frequently involved as well.

One obstacle to planning the development of digital collections identified by the University of Chicago Library has been staffing. As new areas of expertise have been required, more staff have had to be recruited. The greatest obstacle to the process of building digital collections has been the question of what to select for digitization. This is why a policy for implementation and coordination is under development.

Intellectual property rights are the make or break criterion for the selection of materials to digitize, as the Library has few resources to deal with this. Provided there are no intellectual property issues, the material’s teaching and learning potential, historical and cultural value and research significance all have a high priority. Enhanced access and improved functionality are secondary, but concomitant, priorities. Of particular significance is the notion that digitization adds something extra, for example in the case of microfilm. Research into digitization strategies and preservation also receives a high priority, but on a large scale or collection level.

For the reformatting department these criteria have not changed over time, although the Library’s electronic reserve would save staff for other duties.

In developing its digitization program the University of Chicago Library has co-operated with archives, academic institutions and foundations and charities at local, national and international levels. From this experience, a recommendation for other institutions thinking of starting collaboration would be to fully specify everything in advance – for example, what each partner would contribute – while at the same time remaining flexible.

The Library’s digitization program began formally in 1994, although projects such as ARTFL had existed prior to this and there had been other informal activities. These digitization activities are ongoing with no anticipated end date.

The digital deliverables are created for the purpose of a teaching and learning resource, public access, wider access and preservation. The Library has also tried to anticipate demand rather than respond to it. The Library produces explicit statements of intent for digital proposals that cover the rationale, scope, significance, primary audience, long-term sustainability, level of faithfulness and suitability for different target audiences. Future developments include putting these statements on the web.

The type of source material digitized includes:

The predominant medium of the materials to be digitized is paper based, but the Library has also digitized 4,500 glass lantern slides. The digital deliverables represent both a sample of collections and entire bodies of material depending on the project. It is the intention that the digital deliverables can be re-purposed. The Library currently maintains a website for K-12.

The following standards, guidelines or tools are used for representing content:

The following standards, guidelines or tools are used for describing content:

Intermediaries (i.e. a second generation medium derived from the original) used for image digitization are:

The program looked at other existing guidelines for digitizing particular document types when planning its digitization strategy. These included the Cornell benchmark, UVA, Making of America and NARA. The Library modified these guidelines as required by adding more fields, for example local Dublin Core fields.

The following standards, guidelines or tools are used for representing structure:

In relation to standards in general and navigating between the ideal and the realistic, the University of Chicago Library suggests it is critical to look to the future and not to subvert standards for convenience.

The primary intended audience for the digital deliverables is the university and research community (four-year college and graduate school). However, there is also secondary interest in K-12 and lifelong learning. The Library is also interested in public, specialist, government and private sector users, as the material becomes available to them.

An evaluation of the K-12 audience has been undertaken as part of the eCUIP project. This involved participation by public school teachers. The Library’s digital projects have not formally acknowledged the W3C’s guidelines on web accessibility but have adopted a standard approach and the use of conservative HTML to keep their sites available to as wide an audience as possible. The University of Chicago retains rights on all digital deliverables and permission is required for re-use. The profile of University of Chicago users has been as anticipated, but there has been a higher than expected number of international users and a wider range of uses for the library’s environmental photographs collection.

 

7.2       Project Management and Planning

Internal advice was used for managing the digitization program. The management of the Library’s digitization program is integrated into the Library’s structure. For the preservation department, this has meant a change in the type of activity rather than any fundamental re-organization. As more staff have been devoted to digitization activities, the Digital Library Development Center has been established to coordinate the Library’s digitization work. Some of these activities have now been centralized, such as metadata and cataloging, while others such as web authoring are decentralized. The Library believes institutions may need to evolve new structures but considers a named project manager to be essential. Several new groups have been created to aid digital development in the Library.

Managerial quality assurance procedures include project timelines, reports and accountable structures.

The Library has carried out very small-scale pilot studies as part of grant writing for scheduling and technical feasibility. Because these pilot studies were undertaken early, no significant changes have been made to project designs. The Library has found that it is other factors, such as technology, that cause changes in projects. The Library has also undertaken some benchmarking studies for other groups, again as part of external grant writing. Who does what work on a project is determined by staff expertise and availability, and on some occasions a new post is created.

So far all digitization has been carried out in-house, using equipment bought in for the purpose, with the exception of a very small amount of specialized material (e.g. reel to reel). Which digitization process to adopt is determined by the needs of the original material. Equipment used in digitization includes Agfa Ultra Horizon and Epson 816 flatbed scanners and a Minolta PS3000 scanner with grayscale output. Guidelines for data capture procedures have been established. They are 300dpi 8-bit grayscale, 600dpi 8-bit grayscale and 300dpi 24-bit color. Benchmarks selected for image digitization are: Kodak grayscale Q13 and color separation guide, Kodak Q60 color input target and an AIIM scanner test chart No. 2.
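
As a rough illustration of what these capture settings imply for storage, the sketch below computes the uncompressed pixel-data size of a single scan at each of the three settings. The 8.5 x 11 inch page size and the function names are assumptions made for the example, not figures from the interview; file headers and any compression are ignored.

```python
def uncompressed_size_bytes(width_in, height_in, dpi, bit_depth):
    """Raw pixel-data size of one scan, ignoring headers and compression."""
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    total_bits = width_px * height_px * bit_depth
    return total_bits // 8


# The three capture settings named in the guidelines: (label, dpi, bit depth).
CAPTURE_SETTINGS = [
    ("300dpi 8-bit grayscale", 300, 8),
    ("600dpi 8-bit grayscale", 600, 8),
    ("300dpi 24-bit color", 300, 24),
]


def size_table(width_in=8.5, height_in=11.0):
    """Map each capture setting to its size in bytes for one page.

    The default page size (US letter) is an assumption for illustration.
    """
    return {
        label: uncompressed_size_bytes(width_in, height_in, dpi, depth)
        for label, dpi, depth in CAPTURE_SETTINGS
    }


if __name__ == "__main__":
    for label, size in size_table().items():
        print(f"{label}: {size / 1_000_000:.1f} MB")
```

For a letter-size page this works out to roughly 8.4 MB, 33.7 MB and 25.2 MB respectively; larger originals scale with their area.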

 

7.3       Human Resources and Training

The Library employs two digital development co-directors (2 FT equivalent); full texts are now parsed by a vendor. Technical support and development staff equate to 2 FT equivalent. Within the digitization unit the Library appointed one person for scanning and data input for the first project. The unit now has one FT manager, with an additional 1-3 FT equivalent students for capture and data input.

Advice on the technical aspects of digitization was available in-house. There are no formal methods for assessing training needs. Training areas that have been identified include:

The team members that have received training include:

Training has been organized using the following:

The training has met the needs of the project.

 

7.4       Project Life-Cycle Processes and Procedures

The Library is aware of the copyright position of the digital deliverables and owns the copyright in some of the original materials. Copyrighted material is digitized for electronic reserve but is only available on the web to enrolled students and is not intended for long-term preservation. Where material is in copyright, the owner has been paid a fee. The copyright status of the digital deliverable is declared in a condition of use statement on the web site. Users are allowed to make printouts of the digital deliverables on paper and download them. For non-e-reserve digital materials, users can order high resolution versions delivered on CD.

For textual material users can view:

For digital image material users can download and view:

For digital audio material users can listen to:

Minimal electronic management systems are in use.

The University of Chicago Library has a conservation procedure for original materials. Prior to digitization, investigations are made into the binding (binding pressure), paper quality, size of material, operability and legibility. Conservation activities are built into the project workflow and include encasing in Mylar, disbinding, rebinding and boxing. A book’s structure is risk-assessed before and during digitization, and monitored by curatorial staff during digitization. Book cradles are used to minimize risk and sometimes material is rejected because of incompleteness, blurred pages or fragility. Once material is digitized there are no restrictions placed on access to the originals.

A full range of cataloging systems is in place before digitization, ranging from an inventory to a full electronic catalog. All the available information is used from these systems in the digitization process. Where a project has to locate additional reference material the key criterion is how much metadata would allow the material to be usable.

The Library has digitized from intermediaries, but the material often takes other forms. For example, sometimes only photocopies are used because the originals are too large and need to be reformatted.

Intermediaries used for image digitization are:

Original material is entered into the online catalog or a collection level record is created. If there is a record for the original collection material then one is created for the digital collection, but enhanced with, for example, a description for each image in a database.

The following standards or guidelines are used for cataloging the digital deliverables.

The Library has developed its own in-house database using Microsoft Access or MySQL.

Tools used for controlling data values are:

Metadata details are recorded about:

Metadata records are created either by the digitizer working with an archivist or information professional, or by a digitizer who is also an archivist or information professional. Collection level metadata records for the digital deliverables are included in the main catalog and sometimes in a separate catalog, and are available on the internet. The records in the two catalogs are mostly independent of each other, although some features are similar. The catalog and object are linked using the MARC 856 field.
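
To make the catalog-to-object link concrete, the following sketch formats a MARC 856 (Electronic Location and Access) field as a simple mnemonic string. It is an illustration only: the function name, the URL and the note text are hypothetical, the output is a human-readable display rather than an actual MARC record, and nothing here is taken from the Library's own records. It assumes the commonly used indicator values (first indicator 4 for HTTP access; second indicator 1 for a version of the resource, 0 for the resource itself) and subfields $u (URI) and $z (public note).

```python
def marc_856(url, note=None, is_version_of_original=True):
    """Return a mnemonic-style MARC 856 field linking a record to a URL.

    Hypothetical helper for illustration; not a full MARC serializer.
    """
    ind1 = "4"  # access method: HTTP
    # Second indicator: 1 = version of resource (e.g. a digitized copy of a
    # catalogued original), 0 = the record describes the online resource itself.
    ind2 = "1" if is_version_of_original else "0"
    subfields = [("u", url)]
    if note:
        subfields.append(("z", note))
    body = "".join(f"${code}{value}" for code, value in subfields)
    return f"856 {ind1}{ind2} {body}"


if __name__ == "__main__":
    # example.org is a placeholder, not a real Library address.
    print(marc_856("http://example.org/digital/item001", note="Digitized version"))
```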

 

7.5       Format, Resolution and Compression of Digitized Materials

The formats for retroconverted text-based digital deliverables are:

Some texts contained non-Latin scripts. OmniPage Pro OCR software is used to create the Library’s Centennial Catalog. As this process is ongoing, no reliable accuracy figures are currently available. No special treatment is carried out prior to OCR. Keying in is used for the Library’s eighteenth century manuscript full text projects. Full text materials acquired from vendors are parsed, indexed, and served through Philologic.

For image material TIFF and JPEG file formats are used for capture and preservation, while JPEG and GIF are used for delivery. PDF format is used for electronic reserve material. PSD (PhotoShop) format is used for oversize material, and the images are then stitched together. The capture and preservation resolution can be 300, 400 or 600dpi. Delivery resolution is 72, 300 or 400dpi. Capture, preservation and delivery bit depth is either 8 or 24 bit. Average file size for capture and preservation is in the range of 1-80 MB. Average file size for delivery is 100KB to 5/700KB. JPEG image compression is used for delivery to improve access. The project retains the original scans in uncompressed form.

The dynamic range of equipment is checked using Kodak targets and the info tool in PhotoShop for RGB values. The project carries out some post processing on images using SilverFast and Epson Twain Pro for gamma adjustment and PhotoShop for de-skewing. DeBabelizer Pro software is used to create access versions (GIF and JPEG).

For others starting work in digital imaging, the University of Chicago Library’s digital projects would recommend ensuring that all capture devices are calibrated to the same colorspace.

RealAudio file format is used for the capture and delivery of sound material. Sound material is streamed at several different rates for different connection speeds.

The quality control procedures in place for the digital deliverables are a total check on each image and a random check of files written to CD. There are metadata quality control procedures that ensure correct entry into the MS Access database.
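
A random check of files written to CD could be organized along the following lines. This is a minimal sketch under stated assumptions, not the Library's actual procedure: the 10% sampling fraction, the file names and the function name are all hypothetical, and the interview does not say how the random selection is made.

```python
import random


def pick_files_to_check(filenames, fraction=0.1, seed=None):
    """Choose a random subset of files for manual inspection.

    At least one file is chosen when the list is non-empty. Passing a seed
    makes the selection repeatable, which helps when logging what was checked.
    """
    names = list(filenames)
    if not names:
        return []
    rng = random.Random(seed)
    count = max(1, round(len(names) * fraction))
    return sorted(rng.sample(names, count))


if __name__ == "__main__":
    # Hypothetical contents of one CD.
    disc_contents = [f"img{i:04d}.tif" for i in range(50)]
    for name in pick_files_to_check(disc_contents, fraction=0.1, seed=2001):
        print("inspect:", name)
```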

Users have open access to the Library catalog and digitized materials. Searching and browsing facilities support keyword queries and Boolean operators. Users do not have to pay to use the digital deliverables; they pay only for the high quality TIFF images they order.

Potential users of the digital deliverables are informed about their availability through website announcements and local press releases.

 

7.6       Evaluation, Funding and Long-term Sustainability

The Library has not carried out any formal front-end, formative or summative user evaluations.

The main funding sources for the various digitization projects have been the NEH, Library of Congress/Ameritech, LSTA, Save America’s Treasures and the Women’s Board (of the University of Chicago Library). The Library is unsure if the use of standards has saved money but is convinced that it is the best long-term approach. Most funding organizations have monitored projects through regular reports. The Library of Congress/Ameritech was more involved and worked in detail on scheduling. Funding organizations have requested project management plans, cost models and workflow reports as parts of grants.

New material and metadata are added on an ongoing basis but no specific frequency is set. Metadata is changed as and when required, and the user interface and file formats may also change. For archiving purposes the Library archives to CD and University tape backups, but there is no procedure for checking file integrity as yet. A long-term preservation strategy has not yet been written.
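
Since no integrity-checking procedure exists yet, the sketch below shows one common approach that such a procedure might take: record a checksum for every archived file, then recompute and compare later. It is offered purely as an illustration; the choice of SHA-256, the function names and the directory layout are assumptions, and nothing here describes what the Library does or plans to do.

```python
import hashlib
from pathlib import Path


def file_sha256(path, chunk_size=1024 * 1024):
    """Compute the SHA-256 checksum of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(directory):
    """Map each file under the directory (relative path) to its checksum."""
    root = Path(directory)
    return {
        path.relative_to(root).as_posix(): file_sha256(path)
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def find_changed(directory, manifest):
    """List files from an earlier manifest that are now missing or altered."""
    current = build_manifest(directory)
    return sorted(
        name for name, checksum in manifest.items()
        if current.get(name) != checksum
    )
```

A manifest built when the files are first written can be stored alongside the archive; running `find_changed` against a later copy then reports any file whose contents no longer match.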

It is intended to keep the digital deliverables available indefinitely. The projects do not depend on self-generating funds for long-term sustainability. The Library does not have an exit strategy for its digital collections; however, it is agreed that the loss of the digital deliverables would matter.

 

7.7       Additional Comments

The Library believes that surveys such as this need to acquire information on the hidden costs of staffing and in-kind support that are common to the majority of institutional projects. The Library considers this a critical point: other projects or institutions looking to start digitization need those costs covered before they can begin. In addition, website design is important, and public service expertise is often needed for effective delivery of materials.

 




abp~04/02