
 

12   Library of Congress / National Digital Library Program (NDLP)

 

On January 22, 23, and 24, 2001, HATII interviewed the following members of the National Digital Library Program at the Library of Congress: Martha Anderson, Project Manager; Deborah Thomas, Digital Conversion Specialist; Tamara Swora-Gobel, Digital Conversion Projects Co-ordinator; Karen Lund, Digital Conversion Specialist; Marc Dudley, Computer Specialist; Thomas Bramel and Mary Ambrosio, Systems Analysts; and Dave Woodward, Computer Specialist. The core objective of the NDLP was to digitize five million objects within the Library’s collection of Americana, which was successfully achieved in September 2000. The American Memory project served as a pilot study, leading to improvements and an increased emphasis on standardization for future initiatives.

 

12.1    Organizational Digitization Program and Policy

The National Digital Library Program (NDLP) is a newer, flexible unit of the Library of Congress, set up specifically to run this program over a five-year period (1996-2000) with financial support from Congress and private funders. In September 2000 it achieved its target of digitizing 5 million objects, a target set in a rather arbitrary fashion at the program’s initiation by the Librarian and the political and funding influences, without real input from the NDLP team. The program is now moving towards a second phase, and many discussions are taking place about its future, its integration with the other existing departments of the LC, its funding, and its modus operandi. It is anticipated that the new phase will be an ongoing process and service, without an end date.

12.1.1   Collection Survey

Neither the LC nor the NDLP has a formal digitization strategy as such, but several activities led to the formulation of clearer objectives and digitization working practices. They carried out an optical disc pilot in the early 1990s that sought to capture the primary photographic collection in digital format. Then in 1995-6 there were discussions about the distribution of digital material. They began with what was technically possible (e.g. photographs) and what was free from copyright and other legal restrictions; within the limits of technical feasibility, they were also influenced by the Librarian’s and the custodians’ desire to provide access to unique materials. There was some consideration of what was heavily used in analog form, but this was not the primary motivation.

A collection survey was carried out as part of planning the digitization program on American history and culture. The ten custodial divisions were interviewed and 200 collections were selected. The types of questions asked in the survey were:

They also looked at educational recommendations, and material that was in high demand by researchers and the public. The institution as a whole was made aware that the collection survey was being conducted and staff were encouraged to participate. However, it seems that it was left to the individual custodial divisions to organize who would participate and the nature of their contribution. Three or four people in the NDL Task Force team carried out the collection survey. The survey produced good results, providing a body of work that the team could build on. There was some conflict with the custodial divisions, which were hoping that they would have more control of the prioritization and selection of the material. In some cases they thought that it would be a more scholarly exercise. Also, private funders sometimes pushed collections up the list because of their own agenda. There is more information available now about what is technically feasible and how material could be used, than was available when they started.

12.1.2   Priorities for Digitizing

The collection survey was used to establish priorities for digitizing holdings. As indicated above, these were established by the custodial divisions in conjunction with the NDL staff, with some influence from funders and their interests and priorities, and from practical aspects, such as the scheduling required by conservation. The NDL staff in conjunction with the custodial divisions would put together suggestions that were approved by the rest of the team, for example in their two- and three-year plans. These priorities were not formalized in a strategic policy statement, since the documents prepared by the NDLP were more production-oriented. There was goal-setting every year, rather than an overall strategic policy statement. It seems that the high stakes and the pressure to achieve the targets meant that the documents produced were those required to facilitate production and project management, rather than more general policy documents.

The objectives of the program were primarily to produce a large database of Americana, but also to provide access to the material (not preservation), and finally to help preserve the digital product. The program has been successful in achieving all of these.

The NDL project manager would recommend that other organizations attempting to formalize their selection criteria in strategic policy statements, or embarking on digitization projects in general, set a clear goal for the volume of material and work from the beginning. In this case, the fact that they knew how much material they had to digitize and within what time frame helped to motivate the team, cut short unnecessarily lengthy discussion, moved the whole team in one direction, and helped them to focus on content. Other projects often fail when they try to do too much. It is important to agree on a program with a specific goal, rather than leaving it to discussion.

12.1.3   Overall Obstacles

In planning the development of digital deliverables, it was difficult to get to the specific details from the start. Another obstacle was developing a unified perspective on how the work should progress. With a project of this size and type, accommodating the diversity of media types was also difficult. For the motion pictures, the limited amount that existed in the public domain also proved problematic. For folk tunes, the nature of the material posed difficulties in planning: there is often more than one title for a tune, and there are several performances.

In the process of actually building digital deliverables from the collections, the main problems were dealing with the technology (e.g. capture devices), getting enough storage, and trying to forecast accurately. Another challenge was blending the different collections into a coherent whole. The level of cataloging and the filenaming used by the various departments and staff varied, so it was often difficult for the technical team to link the descriptive records with the digital files. For example, the archivists related more to the physical location of the material, rather than using naming conventions that were consistent or appropriate to the digital world. Finally, finding physical space to accommodate the new workforce was problematic.

12.1.4   Selection and Prioritization of Materials

In both the selection and prioritization of materials for digitization the main criteria in order of importance were:

These did not really change over time, apart from a few cases where there was pressure from outside sources, e.g. funders. So, in some cases, although these were the criteria, the practice was different. In the case of conservation, they generally tried to ensure that the treatment the material required would not be so costly or time-consuming as to impede progress. They tried to select collections that would step through the workflow efficiently, since they had a target amount to reach. With the curatorial divisions more involved in the digital program in the future, it is likely that research significance will acquire more importance as a selection and prioritization criterion. Staff in large research libraries usually have their favorite collections, which tend to surface as selections even when they are not the best or easiest to convert.

12.1.5   Co-operation

Although the main activities related to the program were carried out by the in-house NDL team, there was wide-ranging co-operation with other organizations, including archives, other libraries, museums, academic institutions, corporations (for funding), foundations and charities (as donors), government agencies, and historical societies. These spread geographically across the whole range from local to international.

This experience and the sharing of information and data with other institutions reinforced their belief in the general rule of interoperability. Another important rule was to try not to bring extra work in-house, and to assess where the work should be done and by whom. Another recommendation would be to manage the expectations of donors carefully.

12.1.6   Purpose of the Program

The main purpose for the creation of the digital deliverables was public access, provision of a teaching and learning resource, preservation, and research. The program produced a public affairs announcement and also included information on its website that made explicit the rationale, the scope, and the primary audience of the program. There is also a mission statement on every collections page on the website.

 

Revenue generation was never intended, except to intrigue donors and raise consciousness. Other areas of the LC, e.g. duplication services, will probably generate revenue in the future and respond to user demand, and new material might be digitized in answer to customer demand. These issues still need to be explored: how this could be done, by whom, and whether any fees would be incurred. This is all presently in a state of transition.

12.1.7   Nature of the Source Materials and the Impact of Digital Deliverables

The program digitized a very wide range of source material with varied format and nature:

The materials selected for digitization included both the entire body of some collections and representative samplings of others. They initially intended to do much more of the former, but in practice ended up doing samplings more frequently.

12.1.8   Interoperability

For representing content they used TEI, JPEG, MPEG, and SGML. They are currently exploring the use of XML, but have not applied it yet. Only one Ameritech collection came with XML markup. For describing content they used MARC, Dublin Core, EAD, and TEI Header. Controlling data values was the area that presented most difficulties. Although they used Library of Congress subject headings, Library of Congress subject thesaurus, genre headings for films, name authority files and the Thesaurus for Graphic Materials, none of these were enforced. There was not as much consistency in this area as they would have preferred. The differences in the descriptive practices among the contributors proved to be a significant and common problem. For moving images, they tried to conform with the film division of the LC and what the Library as a whole used, but it was very difficult to catalog film using MARC. Nevertheless, it is important to keep some common guidelines. Whenever the cataloging deviated from MARC, it was more difficult for the technical staff to integrate it into the system.

For a public institution that tries not to incur any costs to the end user and has a goal of reaching as many people as possible, it is even more important to stay away from proprietary formats and use commonly agreed standards. It is important to try to provide easy access to the user. Also, it is best to try to set standards at the beginning, so there is no need to go back and fix things retrospectively.

12.1.9   Target Audience

The primary intended audience for the digital deliverables was K-12. Although the program was also relevant to and could be used by community and four-year colleges, graduate schools, lifelong, distance, and computer-mediated learning, public library and archive users, these were all secondary audiences.

There was no evaluation of the targeted audience, although the existing experience of the LC and feedback from User Services was very useful. The Learning and Visitors Center gives them a sense of who is looking at the digitized materials. They also get useful feedback at conferences.

They tried to acknowledge the needs of those with disabilities. For example, in the presentation and design of the user interface they did not use frames. It was not always easy to accommodate disabled audiences in relation to content, e.g. silent films.

Where there were limitations as to use of the digital deliverables, the reasons were clearly stated on the website (e.g. intellectual rights, material only for private use).

The profile of actual users was the same as the one anticipated, although there is more general public use than educational. There are also many foreign users – they reach a truly worldwide audience.

 

12.2    Project Management and Planning

Project management is conducted in-house. There is a Management Team and regular meetings of the project leaders. The project director is an accountant who was very successful in raising private funding and was not closely involved in the digitization process and production.

The NDL team is very different from other departments in the LC. Although the normal federal regulations apply, the program was “fast-tracked” and there is a certain distance from the other departments that will probably change in the future. Their structure is much more even and flexible - less hierarchical. They are very team and project-based and less bureaucratic. Some of the teams that work around specific tasks are very small. They have to work with the web design team, the programs division people, and often fit with other people’s schedules. Unfortunately, although they would have liked to be able to report that the NDLP led to changes or re-evaluation of organizational relationships and procedures in the rest of the LC, this has not really happened. This is currently an issue, with all the discussions about the next phase of the program.

They initially tried to create a central Technical Liaison Production Team to provide support to all content teams, but it seems that content teams actually preferred to do this themselves, and so the idea was abandoned.

To ensure quality, they developed guidelines for work with vendors and an editorial handbook about the quality of documents on the web, and they have put a great deal of material on the intranet with resources, examples, and models. This has proved valuable and grew out of the need to make resources available.

They did not carry out a feasibility or pilot study for project management purposes. The American Memory pilot provided the core framework, but was not conceived as a project management and planning study. That pilot did lead to changes, after observing the effects on the workflow. There was increased emphasis on standardizing the relationship with the vendors. For example, they developed a database about what was received from vendors and when it was delivered. Common quality document guidelines were also developed. The American Memory Pilot included three films and was useful in order to see what could be done. The videotape was sent to 3-4 different contractors and helped them to assess turnaround time and quality. Once they decided what worked best, they stuck to it. No specific time and motion or other benchmarking study was carried out at the start of the program. They developed the project team organization based on custodial divisions. (Later they developed a core support activity team.) This was the result of experience, wanting to make progress on numbers. Job descriptions were initially very general, but have become more specific, although a great degree of flexibility is still required. Performance indicators were not produced, as the structure of the LC does not encourage their use.

Some digitization was carried out in-house, but a greater amount was outsourced. The rapid change of technology led to a preference for outsourcing, but the maps are digitized in-house as the equipment was donated. They felt that it was not in their interests and goals to learn about the technology, so they outsourced wherever they could. Generally, the technology available outside is more cutting-edge and more standardized. However, for some materials, such as audio, in-house digitizing provided better quality control. For much of the folk audio material it was important to provide a faithful digital representation of the original archival material, whereas professional houses often “clean” the sound, deleting for example the sound of a dog barking in the background of a singer’s performance. Also, private companies use various staff members to process material, so the initial instructions are sometimes lost from one operator to the next. For these reasons, it was often more efficient to digitize in-house than to invest time and resources in reviewing outsourced material.

 

Technologies Used for Image Digitization    Manufacturers
Flatbed scanners                            Agfa, Tangent, UMAX, Pollenex, Kurtzweil
Film scanners                               SunRise
Digital cameras                             JJT
High-end professional cameras               Phase One
Other                                       Minolta Book-key Open-book scanner, satellite scanner

 

Guidelines for data capture procedures (e.g. calibration of equipment, handling dust, etc.) are part of the request for proposal (RFP) to vendors. They use grayscales and color charts as benchmarks.

 

12.3    Human Resources and Training

The numbers of people who worked on the program (not counting outside contractors) were:

 

Type of Staff                                      Number    % of time on the project
Director                                           1         100
Metadata specialist (1 per team)                   10        50
Curator                                            10-20     5-10
Digitizer – in maps only                           4         50
Photographer                                       -
Technical support staff                            1         100
Technical development staff                        2         75
Education specialist                               10-11     100
Evaluation specialist – quality review             40-50     25-50
Digital Conversion Specialists (image management   40-50     40-50
  and transformation, collation, preparation
  of material)
                                                   10        25-50

 

The background and profile of most people on the team was that of the “technical humanist”: usually a degree and a good background in the humanities with a high level of IT literacy. The team comprised approximately 25% trained librarians, 25% specialists (e.g. in folk life), and 25% historians, and for 25% this was their first professional job. A generalist/specialist is needed for these projects. The post requires flexibility and the ability to see the big picture, knowledge of the cultural picture, and the ability to focus on technical issues. The work involves a combination of mundane and creative activities. Work with interesting collections is important for job satisfaction. They have much more flexible procedures than the rest of the LC, and team members try out different tasks. Very few staff were re-deployed from other areas. The NDL team is seen as very talented, committed, technically aware and responsive.

Although advice was available in-house about technical aspects of digitization, they also used external advice. The training needs of the project team were assessed on the fly. Areas where training was needed were project management, application of technical standards, preparation and handling of materials for digitization, and post digitization processes such as editing. Until now training on preparation and handling of materials was carried out on the job, but they are about to embark on a more organized way of doing this. Training was received by curatorial staff, specialist technical staff and digital conversion specialists. This was organized in a variety of ways, in-house, using project staff, the Library’s own consultants, and external consultants, by attending external courses, with independent study, and a great deal by learning on the job.

The general approach is to try to identify the core basic level of training needed and offer it to everyone, then to encourage staff to fill extra needs as necessary. It is important to hire highly skilled people with curiosity and initiative to learn in the first place.

 

12.4    Project Life-Cycle Processes and Procedures

12.4.1   Reproduction and Copyright

The organization is aware of the copyright position or other rights status of the digital deliverables that the program has created. They do not own the copyright to the original materials. There is a huge variety in copyright status in American Memory materials. Writers, photographers, publishers, companies, and individuals are represented. Many unidentifiable creators such as authors of older, unpublished letters are represented. The vast majority of the materials are, however, in the public domain as works of the US government or because of their age and/or publication history. As a policy matter the organization has not declared the copyright or other rights status of the final digital deliverables, particularly where the underlying work is in the public domain. As a principle, they are trying to maximize access wherever possible. They certainly recognize the considerable skill and professional judgment required to make high quality scanned images but do not treat those “plain” scans as new copyrightable works. As an aside, if the underlying work were subject to copyright, the scan would be a reproduction or a derivative work and would require the permission of the original copyright owner to exploit in any manner beyond their Library functions (i.e. for revenue generation).

When material in copyright was digitized, this was done in a variety of ways:

Copyright issues are more complicated when dealing with folk material. Although there are no performance rights in federal copyright law, the Folk Center of the LC believed that they had to try to find the families of the performers.

Users of the digital deliverables can copy the material in multiple ways at their own terminals as long as they have the equipment and facilities, since there is no control over that. Users must make their own assessment of the legality of particular works in the context of their intended use. The Program provides no warranty that everything on the site is public domain or “rights free” – it could not make nearly the volume of material available if that threshold was used. Users coming to the American Memory site are still in a “Library space” and must act accordingly. (See the Copyright and Other Restrictions Notice on the website: http://memory.loc.gov/ammem/copyrit2.html). For example, special exemptions in the Copyright Act for libraries on which they rely may not extend to a third party’s use. Users must make that evaluation for themselves. The NDLP team provides as much information as possible where rights or permissions are obtained. As a practical matter, they strategically sought for American Memory to maximize the amount of public domain content to facilitate the broadest possible use. Other projects may not have the same focus. As they begin to grapple with preservation and access for “born digital” materials and other copyrighted content, they will probably consider using copyright management systems to assure copyright owners of maximal security. They strive to balance this with appropriate access over time. They also need to be able to “collect” materials subject to other electronic management systems, which is another layer on the access, preservation, migration chain of issues.

Users of American Memory can download a variety of digital deliverables. For texts, there are ASCII text files and TEI DTD marked-up text, while they are considering XML for the future. For digital images, users can download thumbnails, lower and highest quality images, and associated documents. For digital audio, users can download samples of less than 30 seconds, full-length compressed and high-fidelity sound files, as well as associated documents. For digital moving images, there are samples, lower (QuickTime) and highest quality (MPEG) digital video clips, RealMedia streaming files, as well as associated documents. No electronic management systems are used to control copying. They have experimented with the MrSID format, but have not used it extensively. In a few cases, to control copying, they do not provide the TIFF image, or they use the user’s IP address to restrict access to some prints and photographs to LC machines.
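
The IP-based restriction mentioned above can be illustrated with a minimal sketch. It is not the Library's implementation: the network range is a reserved documentation address block, the function names are hypothetical, and falling back to a thumbnail for off-site users is an assumption made only for illustration.

```python
import ipaddress

# Hypothetical on-site network; 192.0.2.0/24 is a reserved documentation
# range (RFC 5737), not one of the Library's real networks.
ONSITE_NETWORKS = [ipaddress.ip_network("192.0.2.0/24")]


def is_onsite(client_ip: str) -> bool:
    """Return True if the client address falls inside an on-site network."""
    address = ipaddress.ip_address(client_ip)
    return any(address in network for network in ONSITE_NETWORKS)


def choose_rendition(client_ip: str, restricted: bool) -> str:
    """Pick which rendition to serve; the thumbnail fallback is an assumption."""
    if restricted and not is_onsite(client_ip):
        return "thumbnail"  # off-site request for a restricted item
    return "full"  # unrestricted item, or a request from an on-site machine
```

A check of this kind only identifies the requesting machine's network, so it limits where an item can be viewed rather than what a user does with it afterwards.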

12.4.2   Preservation/Conservation

The NDLP team worked with Conservation for assessment of how handling would affect materials. The conservationists assessed the material and took steps to minimize risk. They were involved in all the RFPs and contracts, provided tips for handling during scanning, such as on the use of cradles, and gave specific instructions per type of material or collection. They were also consulted on the space and the environment, equipment and hardware. (More information can be found in “Conservation Implications of Digitization Projects”, a paper prepared by the Library of Congress National Digital Library Program and the Conservation Division in 1999 - http://memory.loc.gov/ammem/techdocs/conserv83199a.html.)

Images of original paper prints had to be transferred to 35mm film in the lab prior to digitization, although this was done more to improve access than for conservation reasons. Some strengthening of material was required prior to scanning and, in the case of disbound materials, some after scanning. The most common dangers identified during the preparation or digitization process were damage to the binding and potential chipping. Very few bound volumes could be inverted, so most required face-up scanning. They used the Minolta camera for this, but it lacked the option of scanning in grayscale. As a result a long queue has built up and the printed ephemera, for example, are still not completed. The materials were prepared by curatorial staff before digitization. Members of staff were always present with the material during scanning, partly for security reasons, which led to a significant drain on staff. They have no plans to restrict access to originals once the material has been digitized, although the Prints and Photographs division might consider it for some material.

12.4.3   Preparation and Sources

The cataloging and reference systems used at the Library are the LC catalog, MARC, and finding aids. Where available, all relevant information from these systems was used in digitization. For most materials, however, there was not enough information available (e.g. for manuscripts). The NDL team had access to all the relevant materials (e.g. cataloging information) for the digital deliverables collected before the digitization process, but had to locate core reference or source materials where information was missing. This required a great deal of work - about 60% of the effort. For example, for folk audio material information was often retrieved from finding aids, lists, data on record jackets, cards written by volunteers, and field notes.

Some of the material, though not much, had to be altered from its original form for the digitization process, e.g. by disbinding or removal from frames. They tried to co-ordinate with conservation and digitize material that was going to be disbound or altered anyway. In a few cases they rejected material before digital imaging, mainly due to the physical condition of microfilm (e.g. major variation of density in the same micro-image, micro-images of inappropriate density) and for conservation reasons, e.g. brittle books. They were surprised by how many of the problems the vendors were able to overcome. They digitized originals, as well as using reproductions and intermediaries. For images they used photocopies, 35 mm slides or 4 x 5 transparencies, photographic prints and microfilm. Most audio is 1-6 minutes long. The source formats for audio included variable speed discs, audio cassettes and wax cylinders, which were converted to the intermediaries, DAT tapes and ¾ inch U-matic videotapes. The intermediary used for moving images was Betacam SP videotape. From 35 mm film they would transfer to 16 mm film and then copy to Betacam SP tape. For some films, errors in the printing process were corrected (e.g. double imaging, upside down images that were not faithful to the original). This was documented in the “About the Collection” page.

12.4.4   Metadata

Each custodial division is responsible for cataloging the original (physical) materials, using MARC in most cases. For digital surrogates, the NDL staff add the digital ID to the MARC record or create an Access database record for the manuscripts. They also used Dublin Core (Ameritech), EAD, and TEI Header for cataloging the digital deliverables. For controlling data values they used controlled vocabularies, thesauri, and classification schemes, such as the Thesaurus for Graphic Materials (TGM), the LC Subject Headings, and Name Authorities. The metadata recorded information about the original object and the digital object, and they are currently in the process of adding information about the digitization process. Until now, information about the digitization process could be found on the website, but not in the individual records. In some cases, they also record staffing details. The metadata records were created by a range of staff, including digitizers, archivists/information professionals, digitizers who were also archivists/information professionals, or the digital conversion specialists. The metadata records for the digital deliverables are kept in a separate catalog, available in paper form and in electronic form on an Intranet server. The records for digital deliverables and the original digitized materials were in some cases the same, while in others independent of each other. The catalog and the objects are linked through the 856 field.

12.4.5   Format, Resolution and Compression of Digitized Materials

Technical notes available by type of material can be found at http://memory.loc.gov/ammem/dli2/

12.4.6   Text

The formats chosen for retroconverted text-based digital deliverables were HTML and SGML markup. They use OmniPack to convert SGML to HTML automatically. They have just started using OCR software (TextBridge) to convert the digital images to text. This was used for slave narratives with dialect, so the level of accuracy was low, at about 75%. It was still sufficient for their purpose, however, which was to be able to retrieve names of people and places through enhanced searching. This was achieved in a much more cost-effective way than keying in. Their experience shows that you need to be clear what you want from the documents. Re-keying in the case of the slave narratives would have been much more expensive with the same search result. They have also used “keying in”, which is expensive; that experience showed a need to simplify the DTD and reduce its requirements.

12.4.7   Images

The file formats used for images were GIF for delivery, TIFF for capture, preservation, and delivery, and JPEG for capture and delivery. The resolution used is 300-400 dpi for capturing and preserving and 72 or 300 dpi for delivering images. They used bit depths of 1 to 36 for capturing and preserving, and 1 to 24 for delivering. JPEG compression was used at the capture and delivery stages, while LZW was used by vendors. The aim of the compression was to improve access. The original scans are retained in uncompressed form. Some post-scanning processing is carried out on the images, mainly scaling and producing deliverables, for which they use Image Alchemy and Photoshop. The average file sizes created were 50-300 MB at all stages - capturing, preserving, and delivery. They did not check the dynamic range of the scanning equipment in-house, but the vendors were expected to do so using targets.
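
To give a feel for how resolution and bit depth drive uncompressed file size, the following is a rough sketch. The 8 x 10 inch original and the function name are assumptions chosen for illustration; they do not come from the interview, and real TIFF files carry some additional header overhead.

```python
def uncompressed_image_bytes(width_in: float, height_in: float,
                             dpi: int, bits_per_pixel: int) -> int:
    """Approximate size in bytes of an uncompressed raster scan."""
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    total_bits = width_px * height_px * bits_per_pixel
    return (total_bits + 7) // 8  # round up to whole bytes


# Hypothetical 8 x 10 inch original, at settings within the quoted ranges.
for dpi, depth in [(300, 1), (300, 24), (400, 36)]:
    size_mb = uncompressed_image_bytes(8, 10, dpi, depth) / 1_000_000
    print(f"{dpi} dpi, {depth}-bit: {size_mb:.1f} MB")
# prints 0.9 MB, 21.6 MB and 57.6 MB respectively
```

Under these assumptions a letter-sized page reaches the lower end of the reported 50-300 MB range only at the highest settings, so the larger files presumably correspond to larger originals such as maps.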

Their experience showed that you should try to capture at the highest quality you can afford. It is best to let the vendor do the job and judge only on the output. It is best to specify what you want and let them decide how to produce it – bear in mind that it is not photocopying and the rate of progress may be slow.

12.4.8   Sound

For the digitization of sound materials they usually convert from the original format to DAT tape. They make two DAT tapes and one ¾ inch U-matic videotape for safety. This process was initially outsourced, but is now carried out in-house. WAVE (.wav) files are derived by sampling from the DAT at 16 bits, 22,050 times per second (sampling rates of both 22.05 kHz and 44.1 kHz were used). The WAVE files are then converted in batches to RealAudio and MP3 (MPEG 2 Layer 3, a lossy compression format) for delivery. The aim of the compression was to improve access and enhance usability. Most audio files vary in length between 1 and 6 minutes. The file sizes created were approximately 2.6 MB per minute. In some cases post-capture editing was carried out in order to remove obvious pops and clicks.
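The reported figure of approximately 2.6 MB per minute is consistent with uncompressed 16-bit audio at 22.05 kHz on a single channel. The Python sketch below shows the arithmetic; the function name is hypothetical, and the assumption of mono recording is an inference from the figures rather than something stated by the interviewees.

```python
def pcm_bytes_per_minute(sample_rate_hz, bits_per_sample, channels=1):
    """Size of one minute of uncompressed PCM audio, ignoring the WAVE header."""
    bytes_per_sample = bits_per_sample // 8
    return sample_rate_hz * bytes_per_sample * channels * 60


if __name__ == "__main__":
    for rate in (22_050, 44_100):
        size_mb = pcm_bytes_per_minute(rate, 16) / 1_000_000
        print(f"{rate} Hz, 16-bit, mono: {size_mb:.2f} MB per minute")
```

At 44.1 kHz the same calculation gives roughly twice the size, so the 2.6 MB figure appears to describe the 22.05 kHz files.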

12.4.9   Moving Image

The workflow used to digitize moving images is as follows. Films are copied to Betacam SP videotape at a film production house, which creates the digital tags that are then sent to vendors. The vendors then deliver files on CD-ROM. The NDLP team takes these and creates catalog records on which the web presentation is based. They experimented with AVI in the beginning, but did not use it. They use MPEG (version 1), QuickTime, and RealVideo for capture and delivery. The MPEG version is created at 30 frames per second and at a spatial resolution (size) of 320x240 pixels, at a data rate of approximately 1.2 Mbits per second of playing time. The QuickTime version is created at 10-15 frames per second and a spatial resolution (size) of 160x120 pixels, at a data rate of approximately 640 Kbits per second of playing time, usually quoted as 80 Kbytes/sec. They have now also started delivering a streaming RealVideo version. QuickTime is used with the Cinepak compression algorithm, with some files using Sorenson compression. The reasons for using compression were, in order of priority, to improve access, to enhance usability, to decrease storage requirements, and finally to reduce cost. The average file sizes created were about 10 MB/min for the MPEGs, and 5 MB/min or less for QuickTime. The post-capture processing carried out on the moving images was mainly the conversion from QuickTime to RealVideo.
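The quoted data rates and average file sizes can be checked against each other. The Python sketch below converts a data rate into megabytes per minute of playing time; the function name is hypothetical, and decimal units (1 Kbit = 1,000 bits, 1 MB = 1,000,000 bytes) are assumed because the source does not say which convention was used.

```python
def megabytes_per_minute(kbits_per_second):
    """Convert a video data rate to storage per minute of playing time.

    Assumes decimal units: 1 Kbit = 1,000 bits and 1 MB = 1,000,000 bytes.
    """
    bytes_per_second = kbits_per_second * 1000 / 8
    return bytes_per_second * 60 / 1_000_000


if __name__ == "__main__":
    for label, rate in [("MPEG-1", 1200), ("QuickTime", 640)]:
        print(f"{label}: {rate} Kbit/s is {megabytes_per_minute(rate):.1f} MB per minute")
```

Under these assumptions the MPEG rate works out at 9 MB per minute and the QuickTime rate at 4.8 MB per minute, in line with the reported averages of about 10 MB and 5 MB or less.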

12.4.10   Quality Control

They used a variety of procedures for ensuring the quality of the digital deliverables: spot checks, random checks, checks on a stratified random sample, percentage checks on carriers (e.g. for corrupted disks), and total checks. For film material, they review all the files created by vendors, which is a very time-consuming but nonetheless vital step in the process. In a few instances they had to return material to the vendor, since the quality standards they demand are very high. The text presented about the films is approved by the film division and the NDL project manager. Recently they have hired specialized content editors, which has led to a more consistent and formalized presentation style. They have a receipt and review unit that runs all CD-Rs through diagnostic software to make sure that certain parameters are met. It is very important to keep statistics on the material received, and information on vendors and the quality they produce. They have a three-week limit for quality review or return of a batch for paper or microfilm.
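A check on a stratified random sample draws a fixed proportion of items from each group, so that small groups are not overlooked. The following Python sketch shows one way this could be done; it is not the NDLP's procedure, and the function name, the choice of vendor batch as the stratum, and the 10% fraction are all assumptions made for illustration.

```python
import random
from collections import defaultdict


def stratified_sample(items, stratum_of, fraction, seed=None):
    """Pick roughly `fraction` of the items from every stratum (at least one each).

    items: the delivered files (any objects)
    stratum_of: function returning the group an item belongs to
    fraction: proportion of each group to check, e.g. 0.1 for 10%
    seed: optional seed so a review can be repeated
    """
    rng = random.Random(seed)
    strata = defaultdict(list)
    for item in items:
        strata[stratum_of(item)].append(item)

    sample = []
    for key in sorted(strata):
        members = strata[key]
        count = max(1, round(len(members) * fraction))
        sample.extend(rng.sample(members, count))
    return sample


if __name__ == "__main__":
    # Hypothetical delivery: 20 files in one vendor batch, 5 in another.
    files = [("batch_a", n) for n in range(20)] + [("batch_b", n) for n in range(5)]
    for batch, number in stratified_sample(files, lambda f: f[0], 0.1, seed=1):
        print(batch, number)
```

A simple random sample of the same size could miss the smaller batch entirely, which is the case stratification guards against.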

There is no formal way of checking the quality of metadata recording; the best way is to see whether it works or not. In general the control of metadata varied more than the checking of the actual digital files. There was a lot of trial and error. Some of the automated procedures that the technical team has put into place pick up mistakes from the other departments. For audio material, subject specialists review all the records before they are released to the public. This process depends on scholars and their own timetables. Generally speaking, they found that for every minute of audio, ten minutes of work was required after the digitization was completed to find the file and review it. Quality control procedures affected workflow and policy decisions, which involved coaching new staff, revisiting collections, and establishing policy on handling vendors and contractual obligations.

12.4.11   Delivery, Access and Use of Digital Deliverables

Users have free open access to both the catalog and the digital deliverables for browsing, but not for manipulating in any way. The level of use of the material is very high and there is a steady increase every year. This is monitored via automatic data capture (http://www.loc.gov/stats/), although there is not much analysis. They record the browsers and platforms used. This is generally motivated by the Public Affairs Department. They also respond to queries and problem reports.

The system supports subject searching by keyword, as well as browsing by title, author, music performer, film title, genre, venue of jazz clubs, and list of tribes. When new data is entered, these browse lists must be re-run. Some browse lists are actually search results, but look like browse lists. All metadata fields are fed into the index, not just the subject field. Some collections allow searching in all the core fields, while others support full free-text searching of e-texts. Some researchers have asked for full-text searching across all the material and all collections. Although this is possible, it is not practical. There was experimentation with the Learning Center on tools to allow users to manipulate the collections, for example to create lesson plans, and on page viewers for young children, but this has not been followed up.

12.4.12   Dissemination and Publicity

The methods used for informing potential users about the digital deliverables created were: announcements on the website; press releases; articles in print media; some print and broadcast media coverage; announcements at conferences and meetings; and announcements on electronic listservs. There is no systematic way of evaluating the effectiveness of the various dissemination and publicity strategies.

 

12.5    Evaluation, Funding and Long-term Sustainability

Until now, very little evaluation has been done, since the emphasis was on production and creation of content.

An evaluation survey on the use of American Memory during the period 1991-93 was carried out in 1993. This was published on the web (ftp://rs7.loc.gov/pub/american.memory/user.eval/intro.txt). It included more than 40 libraries in institutions such as schools, colleges, universities, special, public, and state libraries and used 1,800 user questionnaires, 120 user interviews, and site visits. That survey clinched the argument about addressing a K-12 audience. An educational contractor was hired for Children in Technology to assess the subject areas, the content to be used, and how these related to the Curriculum, but this did not lead to any changes. Although an online study was carried out several years ago, there were methodological limitations, which resulted in a demographically limited return. LC is just embarking on a general survey that will include the NDLP.

Generally, informal feedback from the educational and user services was useful. For example, feedback from teachers about the platform most commonly used in schools led to the use of QuickTime for Macintosh computers. They also offered feedback about the presentation of material, as well as about broad themes that they would find useful. The moving images team also noted that whenever they offered more contextual information about the collections, they received very positive feedback. Consequently, whenever they can, they add some contextual material, such as essays, interviews with Edison, and articles from journals of the time, as simply “replicating the Library experience” and letting users make sense of it themselves can prove shortsighted. Even though they did not embark on extensive formal evaluation programs due to production pressures, the NDLP staff always value feedback from the public; without it they feel that they operate in a kind of vacuum.

Informal observation and queries from the public indicate the high level of use, the educational impact, the varied audience and the effect of the resource. For example, the BBC made a program on “Voices from the Dust Bowl”, based on the American Memory material from the migrant worker collection, documenting the everyday life of residents of Farm Security Administration (FSA) migrant work camps in central California in 1940 and 1941. The Omaha Indians use the resources to learn the language at their schools (http://memory.loc.gov/ammem/omhhtml/omhhome.html). American Memory has also received a lot of web awards, such as the Encyclopaedia Britannica’s Best of the Web and the Top 100 of 1999 of the Education Source (http://www.loc.gov/loc/awards.html).

The program cost $60 million in its first five-year phase (1996-2000). $15 million came from the appropriation of Congress funds, while the remaining $45 million was raised by the LC from private donations (Founding Sponsors with contributions of over $5 million; Mr John W. Kluge; The David and Lucile Packard Foundation; about 20 Charter Sponsors who contributed at least $1 million, such as AT&T, Ameritech, Bell Atlantic, Eastman Kodak, Reuters; about 13 smaller contributors; and three contributors in kind, such as IBM and Hewlett-Packard). The hidden costs were greater than that. The monitoring of the program from the funding organizations was fairly informal. They produced some cost models, Gantt charts and reports, but not much more. The reputation of the organization, the high profile of the program, and the support from Congress probably played an important role here.

The NDLP team feels that it was generally sufficiently equipped. They also think that using standards saved them money.

Most elements of the project will need to be updated fairly frequently. New materials will need to be digitized, and their metadata added, approximately every three months (probably at a lower rate of production than until now). Existing metadata will need to be changed at a fairly low frequency. The user interface will need to be changed about once a year. It will be necessary to change file formats, but the frequency for this is unknown at the moment. There is relative safety in basing choices on a recognized standard. For example, although MPEG 4 seems to be growing in popularity, users can still use MPEG 1. Adding a new format does not necessarily render the previous one obsolete.

They are currently in the process of formalizing a preservation strategy to ensure long-term access to documents and objects in digital form and are working on their Digital Repository. This involves the adoption of standards and enforcing rigid naming practices and is based on migration of data. They are also looking at the length of time required for carrying out backups of records and how often these are being used to inform their plans for creating archival copies for the collections. File formats are more stable for text and images, but change more often for audio and moving images. In terms of storage media and conditions and updating metadata, the same strategy is followed for all media types. Quality control procedures in the life-cycle management include a checksum system (currently used by curators on the content), colleague review, archiving and use of standards. However, they are currently under review and are subject to change.
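A checksum system of the kind mentioned above records a digest for each file and recomputes it later to confirm that the file has not changed. The Python sketch below illustrates the idea; the interviewees did not say which algorithm or tooling the LC uses, so the choice of SHA-256 and the function names are assumptions.

```python
import hashlib


def file_checksum(path, algorithm="sha256", chunk_size=1 << 20):
    """Return the hex digest of a file, read in chunks to cope with large files."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected_hex, algorithm="sha256"):
    """True if the file still matches the digest recorded when it was stored."""
    return file_checksum(path, algorithm) == expected_hex.lower()
```

In use, the digest would be recorded when a file enters the repository and checked again after each copy or migration; a mismatch indicates that the file should be restored from another copy.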

Their aim is to keep the digital deliverables available for as long as possible. The adoption of an exit strategy in the event of the institution being unable to meet the costs of sustaining the digital deliverables is an unthinkable question for the LC. On the contrary, the LC acts as the exit strategy for other institutions in the country. In their case, longer-term sustainability is not dependent on self-generating funds and moreover, the resource would not generate sufficient income to sustain it. The program has already secured partial resources for long-term sustainability.

 

12.6    Conclusion

The general advice for others considering digitization projects is to set clear goals and to ensure support from the highest level, taking the political aspect into account. In the case of the NDLP, the congressmen were very supportive, as the program provided a good argument to their constituents that their tax dollars were going towards something “good” and educational. This high-level support also helped internally, to secure collaboration from all departments, even when they were not enthusiastic about the project. Since the targets set have been met and the popularity of the program and the material created is obvious, even detractors cannot argue with the success of the project.

Few libraries actually have ongoing programs, rather than projects. Consequently, there is little comprehension of how complicated a procedure it is. Managers need to have some appreciation of the complexity of the digitization process. Due to the high costs and difficulties involved, extensive digitization projects are rare. There is now a great deal of information available on the web and elsewhere. There is no need to reinvent the wheel. Communication and collaboration are key in this field. With a project of this kind, knowledge of the material, the social history of the collections, library practices, and good technical and computer skills are all combined. It is necessary for staff to be flexible and willing to work in a team, and have good communication skills. Another point stressed was the importance of rapid generalization and standardization, trying to make all staff look beyond the specific great collection (especially for projects of this size and diversity). Consistency of metadata, for example, is very important. For smaller organizations, it is necessary to keep the project small and segmented, for example, choosing minimal cataloging or focusing on a small number of objects.

The LC and the NDLP are currently going through a period of transition. It seems that in the future, they will still carry out digital production but probably at a lower rate. They will carry out extensive work on the Digital Repository. The content has already shifted and will continue to do so - from solely Americana to a more international content (links with Spanish and Russian culture are already being explored).

Generally, this is a case where the mandate was to produce a very large number of digital resources in a limited time at high quality, to be available to as many people as possible. Although the target was very high, the fact that it was very specific from the beginning helped to move the team in one direction. The need for very clear goals shared by all in a project or program is a useful recommendation here. Quality control and use of standards in as many areas as possible were important principles of the program; a good example here is the way the Program handles and documents outsourcing and quality control of material returned from contractors. It is interesting to observe that the intensive production worked best with a flexible and informal management structure, with a group of committed and skilled staff working as a team, in a model that is different from, and less hierarchical than, the rest of the organization.




abp~04/02