Phil Michel, the Digital Conversion Specialist at the Prints and Photographs Division of the Library of Congress, was interviewed by HATII on September 19, 2000. The Division holds more than 13.6 million images in its collections, which prompted the creation of the Digital Library Group project to develop a digitization strategy. Preservation played a significant role in the decision to digitize the materials, and public access, research and education were also important incentives.
The Prints and Photographs Division (P&P) developed its digitization strategy from an exploration of electronic surrogates that began with an optical disc pilot project in the 1980s. The P&P Division did not conduct a collection survey as part of planning its digitization strategy.
Although no collection survey was used to set digitization priorities, the American Memory educational project shared overlapping interests with the Division (saving, handling, etc.), so the material chosen met several criteria at once. This overlap established the objective of serving education (K-12 and college) by digitizing popular or hard-to-serve material.
The project does not have any formal selection criteria or prioritization.
To date the project has not co-operated with other organizations in developing its digitization program.
The project is ongoing. The National Digital Library (NDL) pilot ends in 2000, but the project itself continues with no anticipated end date. The project started in the 1980s; the American Memory pilot began in 1990 and the NDL pilot in 1995.
The purposes in creating the digital deliverables include:
The project did not feel, however, that it could rank these aims in an order of priority. The program produced an informative statement of intent, which was explicit about its rationale.
The type of source material digitized includes:
Across the whole project, the materials digitized are predominantly negatives. The digital deliverables are intended to represent the entire body of material. There is no specific intention to re-purpose digital deliverables.
The following standards, guidelines or tools are used for representing content:
The following standards, guidelines or tools are used for describing content:
The following standards, guidelines or tools are used for controlling data values:
The program looked at other projects as well as existing guidelines for digitizing particular document types when planning its digitization strategy.
The following standards, guidelines or tools are used for representing structure:
In relation to standards in general, the project recommends an approach of adopting, modifying and emulating existing standards.
The intended audience for digital deliverables includes all of the teaching and learning group:
However, the project feels it also serves other groups (the general public, subject specialists, and the government and private sectors) and could not rank them in order of priority. The profile of actual users has been as expected.
The project undertook an evaluation of the target audiences, see: http://memory.loc.gov/ammem/usereval.html
The interviewee was unsure whether the project had taken account of W3C’s “Guidelines for Web Site Accessibility” for users with disabilities. As far as possible, the project identifies any restrictions on the use of the digital deliverables, but ultimately responsibility rests with the user. The most common restriction is copyright, which requires a letter of permission.
Advice on managing the program has come from a mixture of external and in-house sources but not on a systematic basis. The management of the project is becoming more and more integrated into the P&P Division. The project has led to changes in organizational procedures as policies for serving collections have altered and new positions have been created. Less formal project management procedures are in place as part of the Division. No particular project management procedure has failed; the project has felt its way through, been flexible and altered its workflow accordingly. The quality assurance procedures have been “find and fix” and developed on an incremental, maintenance basis in the same way as traditional collection management.
Neither pilot nor feasibility studies have been undertaken. It is not known if any benchmarking studies were undertaken. Gantt charts are used to track pictorial scanning. The project timing, and hence allocation of work, is geared to the contractors’ start and work is “ramped up” from there.
Approximately 90% of digitization is outsourced, but an increasing amount is done in-house. Outsourcing was chosen for reasons of efficiency and expertise, and because of the need to produce high volumes in a short time.
The project employs one part-time director, two part-time catalogers and five full-time technical support staff. The director and catalogers have library backgrounds, while most of the technical support staff came from the NDL. No staff members were redeployed from other areas. Advice on the technical aspects of digitization came from a combination of external and in-house sources.
The training needs of the project team were assessed in an ad hoc fashion and were identified as the technical aspects of digitization and post-digitizing processes. All team members engaged in training and this was organized through in-house consultants, external consultants, external courses, independent study and learning on the job.
The project is aware of the copyright position of the digital deliverables but does not own the copyright of the original materials. The copyright or rights status of each final digital deliverable is declared in a copyright statement at the collection level, and this has been effective. In-copyright material was digitized with the owners’ agreement.
Users of the digital deliverables are allowed to make printouts on paper and film, burn to CD, DVD etc., and download to a PC, LAN or WAN. For some of the NDL collections, users can download and view thumbnails, lower-quality images and the highest-quality images (up to TIFF). For the P&P collection, users can view and download thumbnails, but for some collections the lower- and higher-quality images are available only internally (on-site). No electronic rights management systems, such as watermarking, are in use.
The project has a conservation procedure for the original materials (part of P&P duties), and material is examined prior to digitization. If necessary, conservation staff carry out conservation work, and no materials are modified, degraded or compromised to carry out digitization. Risk prevention methods for material being prepared for digitization include safe handling practice from conservation through to the contractor and close attention to the work environment. Steps to minimize risks are built into the contractor plan. The materials are stabilized by curatorial or preservation staff prior to digitization but are not monitored by them during digitization. After digitization, access to some of the original materials is restricted; the digital surrogate is seen as the primary source for users.
Some cataloging systems are in place prior to digitization (such as lists or inventories), and any information available from these is used in the digitization process. Although the project does not have access to all the relevant cataloging information, staff know at least the quantities and material types, and the digitization process sometimes feeds back into the catalog record (e.g. negative size). Some core source material may be located, but this is unrelated to the digitization process. The only alteration made to material for digitization is occasionally lifting sleeves, Mylar or tissue. No material is rejected prior to digitization. The project sometimes uses intermediaries where these already exist (such as transparencies of posters, slides, 35mm, 4x5 transparencies and photographic prints).
The original material is cataloged in MARC and EAD records. The digital surrogates are referenced by bibliographic object using MARC, EAD and USMARC standards. Tools for controlling data values are a controlled vocabulary (MS name rules), a thesaurus (Thesaurus for Graphic Materials) and a classification scheme. The metadata recorded covers the original object, the digital object, the digitization process, technical details and administrative information.
All members of staff create different parts of the metadata records. The catalog for the digital deliverables is held both on paper and electronically on an intranet server, and technical details about the project are also available on the web: http://www.memory.loc.gov/ammem/techdocs/white.html.
For images, the TIFF file format is used for capture, preservation and delivery, while GIF and JPEG are additional delivery formats. Capture, preservation and delivery resolutions range from 50 to 3500 dpi. The bit depth used for capture, preservation and delivery was originally 8 bits but is now 12 to 16 bits. JPEG compression is used for delivery only. TIFF Group 4 compression is used for the best copies of texts and architectural drawings (bitonal images).
The project retains images in uncompressed form; only bitonal images are held compressed. Post-processing operations are limited to skew and gamma adjustment on negatives, as the project places strong emphasis on full capture across the range of material, with master files of up to 200 MB. The dynamic range of the scanning equipment is checked by the vendor and was a factor in selecting it. From its experience of digitizing images, the project recommends benchmarking, running tests, taking time at startup and not taking specifications at face value.
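Because masters are kept uncompressed, their size follows directly from scan resolution and bit depth. The following is a minimal Python sketch of that arithmetic, assuming a hypothetical 4x5 inch negative scanned in grayscale and ignoring TIFF header and metadata overhead. The function name and sample dimensions are illustrative, not taken from the project's specifications. Samples deeper than 8 bits (such as 12-bit data) are assumed to be padded to whole bytes, as is common for scanner output.

```python
import math


def uncompressed_size_bytes(width_in: float, height_in: float, dpi: int,
                            bits_per_sample: int, channels: int = 1) -> int:
    """Estimate the pixel-data size of an uncompressed scan.

    Assumes each sample is padded to whole bytes (e.g. 12-bit data stored
    in 16 bits) and ignores file header and metadata overhead.
    """
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    bytes_per_sample = math.ceil(bits_per_sample / 8)
    return width_px * height_px * channels * bytes_per_sample


# Hypothetical example: a 4x5 inch negative scanned in grayscale at 2000 dpi.
for bits in (8, 12, 16):
    size = uncompressed_size_bytes(4, 5, 2000, bits)
    print(f"{bits}-bit: {size / 1_000_000:.0f} MB")
```

At these assumed settings the scan is 8000 x 10000 pixels, so an 8-bit master is about 80 MB and a 12- or 16-bit master about 160 MB, within the 200 MB figure above. A three-channel color scan at the same settings would triple each figure.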
Quality control for the digital deliverables takes the form of a quality assurance plan from the vendor, who checks every item. The project also spot-checks the TIFF headers. The vendor also has some systems for metadata quality control, but these were not specified by the project. Quality control procedures have had a direct effect on the management of the production workflow: there are now more direct look-in sessions, and smaller batches are sent to the vendor.
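The interview does not say how the TIFF header spot checks are performed. As a minimal sketch, assuming a check simply compares a few baseline tags with expected values, the Python below reads the first image file directory (IFD) of a classic TIFF using only the standard library. The names `read_tiff_header`, `spot_check` and `build_demo_tiff` and the expected values are hypothetical. Tag numbers and compression codes (1 = uncompressed, 4 = CCITT Group 4) come from the TIFF 6.0 specification. A production check would more likely rely on an established imaging library.

```python
import struct
from typing import Dict, List

# Baseline tag IDs from the TIFF 6.0 specification.
TAG_NAMES = {
    256: "ImageWidth",
    257: "ImageLength",
    258: "BitsPerSample",
    259: "Compression",  # 1 = uncompressed, 4 = CCITT Group 4
    262: "PhotometricInterpretation",
}
SHORT, LONG = 3, 4  # TIFF field type codes


def read_tiff_header(data: bytes) -> Dict[str, int]:
    """Return selected single-valued SHORT/LONG tags from the first IFD.

    A spot-check helper only: multi-valued fields (e.g. RGB BitsPerSample)
    and other field types are skipped, and BigTIFF is not supported.
    """
    byte_order = data[:2]
    if byte_order == b"II":
        endian = "<"  # little-endian ("Intel")
    elif byte_order == b"MM":
        endian = ">"  # big-endian ("Motorola")
    else:
        raise ValueError("not a TIFF file: unknown byte-order mark")
    magic, ifd_offset = struct.unpack(endian + "HI", data[2:8])
    if magic != 42:
        raise ValueError("not a classic TIFF file: magic number is not 42")
    (entry_count,) = struct.unpack(endian + "H", data[ifd_offset:ifd_offset + 2])
    tags: Dict[str, int] = {}
    for i in range(entry_count):
        start = ifd_offset + 2 + 12 * i  # each IFD entry is 12 bytes
        tag, field_type, count = struct.unpack(endian + "HHI", data[start:start + 8])
        if tag not in TAG_NAMES or count != 1:
            continue
        # Values that fit in 4 bytes are stored inline, left-justified.
        if field_type == SHORT:
            (value,) = struct.unpack(endian + "H", data[start + 8:start + 10])
        elif field_type == LONG:
            (value,) = struct.unpack(endian + "I", data[start + 8:start + 12])
        else:
            continue
        tags[TAG_NAMES[tag]] = value
    return tags


def spot_check(tags: Dict[str, int], expected: Dict[str, int]) -> List[str]:
    """List every expected tag whose value is missing or different."""
    return [f"{name}: expected {want}, found {tags.get(name)}"
            for name, want in expected.items() if tags.get(name) != want]


def build_demo_tiff() -> bytes:
    """Build a header-only little-endian TIFF for demonstration (no pixel data)."""
    entries = [(256, SHORT, 1, 4000), (257, SHORT, 1, 5000),
               (258, SHORT, 1, 16), (259, SHORT, 1, 1)]
    ifd = struct.pack("<H", len(entries))
    for tag, field_type, count, value in entries:
        ifd += struct.pack("<HHIHH", tag, field_type, count, value, 0)
    ifd += struct.pack("<I", 0)  # offset of next IFD: none
    return b"II" + struct.pack("<HI", 42, 8) + ifd


tags = read_tiff_header(build_demo_tiff())
print(tags)
print(spot_check(tags, {"BitsPerSample": 16, "Compression": 1}))
print(spot_check(tags, {"Compression": 4}))
```

For the demonstration file, the first check returns an empty list, while the second reports `Compression: expected 4, found 1`, the kind of mismatch a spot check would flag if an uncompressed master turned up where a Group 4 bitonal file was expected.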
Users do not have to pay for the digital deliverables and have open access to both the catalog and the materials. Standard web browsing is offered, with links to the catalog. The system allows searching of metadata, but only in a limited way. Special software used by the project includes inquiry software and a relevance-ranking search engine, both of which required additional programming to control the display.
Usage levels are monitored by information technology services (ITS) to which the NDL has access.
Potential users of the digital deliverables are informed about their availability through website announcements, press releases, articles in print media, print and broadcast media coverage, conferences, meetings and electronic and conventional mail shots. The project does not know which medium has been the most effective.
No front-end or formative evaluation has been undertaken. A summative evaluation has been undertaken.
The project does not know how much it has cost to date. The main source of funding has been the NDL, which has received gift funds from various donors, with a small contribution from the LC. As to how much the project should have cost, the project feels it has been the right size, pace and scale for its funds. The project was less sophisticated five years ago; if it could start again with the same funding, it would now achieve higher productivity. The project believes that standards have saved money. It does not know how the funding organization monitors the project, or whether it asked the project to provide collection surveys, cost models, project management plans, etc.
Under the NDL, new material (and its associated metadata) is added two or three times a month. Metadata are changed weekly, and the user interface is not updated on a systematic schedule.
The project has a preservation strategy under development based on file format, storage media, storage conditions and updating metadata. Quality control procedures are in place for life-cycle management but the details are not known.
The digital deliverables are intended to remain available indefinitely, since the project does not rely on self-generated funds to sustain the resource. The project does not have an exit strategy.