HATII interviewed Elizabeth Roderick, Director of the Digital Library Program at the Library of Virginia, on September 26, 2000. The Program was initiated in 1995 by the Library’s Collection Management Services department, in an effort to preserve, digitize and enable access to important archival and library collections. Through digitization, the materials held by the Library are integrated into an accessible research environment. The project is responsible for the digitization of 2.2 million original documents, as well as the creation of eighty searchable databases, indexes and finding aids. Items of historical and cultural significance can thereby reach a wider audience, both in terms of educational usage and public access.
The Library of Virginia’s (LVA) digitization strategy developed alongside the growth of the web, the availability of web-based interfaces to bibliographic databases, and the development of the MARC 856 field. These developments coalesced around 1994/95 with LVA pilot projects, available finances and capital for equipment.
The project conducted an informal collection survey that interviewed staff members and convened ad hoc groups from within the library to brainstorm about the collections. This survey involved people from all branches of the Library and developed a shortlist of projects in the Library and archives. The biggest obstacle to overcome in carrying out the collection survey was the uncertainty that it prompted. If the project were to conduct the survey now, it would make the process more formal and would disseminate information more widely, because some staff felt excluded or did not understand the objectives.
The collection survey was used to establish priorities for digitization. The project director, who had worked towards a consensus and was free of political considerations, established these priorities in consultation with appropriate managers. The priorities were not formalized in a strategic policy statement but practicability was the biggest factor involved. The project director will develop a formalized policy in the next twelve months.
One of the objectives this policy sought to achieve was to establish some sort of institutional memory - a framework that would ensure that all staff would work towards a common goal. Another objective was one of validation for the Digital Library Program itself, in that it should be consultative and not ad hoc. The project believes that it has been successful in meeting these objectives. The recommendations the project would make to other organizations that are attempting to formalize their selection criteria are: to make sure there is one person in ultimate charge; to be clear about the institution’s mission and its primary customers; to gain support from higher administration; to make detailed documentation; to talk to collection experts; and to minimize open-ended or unresolved questions that will produce repercussions.
Obstacles to planning the development and building of the digital deliverables have included staff members who were uncomfortable with the project and were slow to support it, along with apprehension on the part of the library’s small technology staff at the time.
The criteria that have guided the project in selecting and prioritizing materials for digitization are: intellectual property rights (IPR); teaching and learning potential; conservation; historical and cultural value; research significance; provision of user services; enhanced access; research into digitization strategies; preservation; labor cost reduction; potential to reach disadvantaged groups; and infrastructure cost reductions. Of these, meeting user needs, especially on high use materials, has been top priority. The project is not a preservation strategy; material is microfilmed first and nothing is discarded. This saves cost in terms of delivery. The priority is then practical feasibility (such as housing and storage conditions). These criteria and their priority have not changed over time.
The project has co-operated with archives, libraries, museums, academic institutions, corporations, foundations and charities, and government agencies at the institutional, local, regional and national levels. Program staff regularly provide consulting services to these institutions. Co-operation with national bodies such as the Library of Congress has been on specific projects (e.g. religious petitions and national newspapers, which the LC digitized). Recommendations from this experience are to be clear about the approach, and to determine if or when it is appropriate or desirable for institutions other than the holding Library to digitize and/or host collections. The project would also suggest digitizing materials in their entirety, so as not to create artificial collections.
The current status of the program is ongoing; it started in 1994/95 with no anticipated end date.
The primary purposes in creating the digital deliverables are public access, provision of teaching and learning resources, wider access and response to previous demand. Secondary purposes are preservation and research. The program has produced an explicit statement of intent that covers its rationale, scope and significance.
The type of source material digitized includes:
The format of the materials to be digitized covers a wide range, up to large-scale maps (which could be delivered using MrSID, via LizardTech). In some cases secondary surrogates needed to be made (e.g. colonial records were photocopied). Digitization is intended to represent the entire body of material.
The following standards, guidelines or tools are used for representing content:
The following standards, guidelines or tools are used for describing content:
The following standards, guidelines or tools are used for controlling data values:
The program looked at other existing guidelines for digitizing particular document types when planning its digitization strategy. In particular the program’s vendor (VTLS Inc., Blacksburg, Virginia) was quite visionary and the program relied on its expertise, especially on early solutions for graphical material. Other early pioneers were also consulted to achieve the simplest, highest possible quality, non-proprietary master.
The following standards, guidelines or tools are used for representing structure:
In relation to standards in general and suggestions for navigating between the ideal and the realistic, the program would recommend making things as simple as possible by planning file names, structures, and common elements at the beginning, and also to make decisions about standards early, thereby reducing the burden on data entry staff.
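As a minimal sketch of what planning file names at the outset might look like, the Python below builds and validates names against one fixed pattern. The pattern (collection code, item number, page number, extension) and the function names `build_filename` and `parse_filename` are illustrative assumptions, not the program's actual convention.

```python
import re

# Hypothetical convention: <collection>_<item, 5 digits>_<page, 3 digits>.<ext>
FILENAME_PATTERN = re.compile(r"^([a-z]{2,8})_(\d{5})_(\d{3})\.(tif|jpg)$")


def build_filename(collection: str, item: int, page: int, ext: str = "tif") -> str:
    """Build a file name that follows the assumed convention."""
    name = f"{collection.lower()}_{item:05d}_{page:03d}.{ext}"
    # Reject anything that overflows the planned field widths.
    if not FILENAME_PATTERN.match(name):
        raise ValueError(f"name does not fit the convention: {name}")
    return name


def parse_filename(name: str) -> dict:
    """Split a conforming file name back into its parts."""
    match = FILENAME_PATTERN.match(name)
    if match is None:
        raise ValueError(f"name does not fit the convention: {name}")
    collection, item, page, ext = match.groups()
    return {"collection": collection, "item": int(item), "page": int(page), "ext": ext}


if __name__ == "__main__":
    name = build_filename("bible", 42, 3)
    print(name)
    print(parse_filename(name))
```

Fixing the pattern before scanning starts means a name can be generated at capture time and parsed again later without asking data entry staff to retype any of its parts.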
The primary intended audiences for digital deliverables are K-12, community college, four-year college, graduate school, lifelong learning, distance learning, museum users, public library users, archive users, government and private sector. The project has not carried out any evaluation of the target audiences. Groups other than the primary target audiences could use the digital deliverables.
The program has not acknowledged the needs of those with disabilities or the W3C’s “Guidelines for Web Site Accessibility”. There are no limitations to the use of the digital deliverables because the program has steered away from IPR issues and has attempted to operate in good faith in all cases. The profile of users has been as anticipated, as the aim was to make heavily used material available in digital form. However, the program has been surprised at the level of customer support requests (people with different browsers, errors, explaining collections so that expectations are managed, etc.). The program has since instituted an error tracking system.
There is considerable internal experience in project management in the person of the Director. The program was established as part of the Library (Collection Management Services Division). The program has led to changes in organizational relationships and procedures. It was appropriate for the program to become part of the Collection Management Services (CMS) Division because the latter houses technical services, archives processing and description, preservation services and special collections. One project management procedure that has not been successful is coordination with the Library’s IT Department. Most IT tasks are handled by the vendor, and on many occasions local Library IT staff have not been available to provide necessary on-site support for the Program. Quality assurance is achieved through the use of staff with a background and interest in the subject matter of the collections, and expertise and knowledge of cataloging and imaging, rather than computer expertise.
Regarding pilot and feasibility studies, the program undertakes sampling and testing on a case-by-case basis for technical feasibility, workflow analysis and workflow piloting. As a result of these studies, some projects have been scrapped as priorities have shifted. Cost benefit, time and motion and benchmarking studies have also been carried out. Work is assigned by the program’s three project managers who sequence and schedule each project. They then assign to part-time staff or outsource.
Approximately fifty percent of the work is done on-site at the LVA and the remaining fifty percent is completed by the vendor, particularly large-scale batch data processing projects. The factors in each decision are the resources available, technical expertise and what material can be moved. The program has a track record of five years with the vendor. The program purchased equipment, including flatbed scanners, drum scanners, film scanners, digital cameras and high-end professional cameras. Guidelines have been established for data capture procedures, and the benchmarks used are grayscales and color charts.
The project employs four full-time (classified) staff, namely one director (100%) and three project managers (100%), and ten part-time, 40-hours-per-week (non-classified) staff to digitize, catalog, analyze and re-house materials. Staff have a predominantly library background or content/subject expertise and none was redeployed from other areas. External advice was sought on the technical aspects of digitization through reading, discussion, workshops and self-study.
Identifying training needs was not a problem, but finding quality training was. Areas where training needs were identified are preparation and handling of materials for digitization, technical operation of digitization equipment, post-digitization and metadata creation/cataloging. The project director, full-time staff and part-time staff all took part in training, which was delivered in-house (via internal and external consultants) and through external courses, independent study and learning on the job. The training has so far met the needs of the project.
The project is aware of the copyright position of the digital deliverables and owns the copyright of the original materials. The program is working on a declaration of copyright or rights status to protect the property. No third party copyright material is digitized. Users are allowed to make printouts of the digital deliverables on paper. Users can view HTML, and compressed TIFF IV, JPEG and MrSID images. No electronic management systems, such as watermarking, are in use.
The project has a conservation procedure for the original materials whose condition is investigated by appropriate staff, such as archival or special collections staff. If necessary, the program carries out conservation procedures such as rebinding or mounting. The project modifies, degrades or compromises some material to carry out digitization (e.g. dis-binding, sometimes the original). Risk assessment of the material during the preparation for digitization is undertaken on an informal basis. Special equipment is available but the program does not digitize books. Materials are prepared by curatorial or preservation staff prior to digitization but are not monitored by them during digitization. Once the material has been digitized, restrictions are placed on the originals as a preventative measure to reduce handling and serving pressure.
Various cataloging and reference systems were in place prior to digitization, ranging from no finding aids or old finding aids, to index cards and databases. All of the available information from the catalog or reference systems is used in the digitization process, and staff members conduct extensive research to identify all relevant information and to locate core reference or source material. The program does alter some originals for the digitization process. Material has been rejected for incoherent pagination, incomplete works and blurred pages. The program has digitized from originals and intermediaries and some material has only existed in intermediary form. Intermediaries that have been used are photocopies, slides/35mm or 4x5 transparencies, photographic prints, microfilm and single frame microfiche.
The original material catalogs range from handlists to databases. The library and archive catalogs are both on the web and there are 35 different bibliographic databases. Cataloging is done at the item level wherever possible, with bibliographic records linked to individual images via the MARC 856 field. MARC and USMARC standards are used in cataloging digital deliverables. The program emphasizes the bibliographic description of the original item and provides only brief information about the digitized image in the 533 field. However, the digitization process and image format are standard for each type of material, consistent across collections, and documented at the program level.
Tools for controlling data values are LC subject headings, the Art and Architecture Thesaurus and the LC Thesaurus for Graphic Materials. Details recorded in the metadata include information about the original object and the digital object. The vendor also maps from a variety of catalog sources to MARC records (using MARC Maker). The digitizer and an archivist/information professional record metadata, and some skeletal MARC records are used. Scanning and cataloging are separate in the workflow but the filename is captured when scanning (this is set up prior to scanning). Metadata records for the digital deliverables are then held in separate bibliographic databases from the main online book catalog, which is available on the Internet. The database records and the digital objects are linked through the MARC 856 field. In the case of photographs, there is a one-to-one correspondence between the 856 field and the image; in the case of multi-page documents, the 856 field is linked to an intermediate HTML page that contains links to the individual document pages.
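The two linking patterns described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the host name, the directory layout, the function names and the choice of indicators "41" (HTTP access, version of the resource) are hypothetical and do not describe the Library's or the vendor's actual system.

```python
from html import escape
from typing import List

BASE_URL = "http://example.org/images"  # placeholder host, not the Library's


def link_target(record_id: str, page_files: List[str]) -> str:
    """Return the URL that a record's 856 field would point at."""
    if not page_files:
        raise ValueError("record has no digitized pages")
    if len(page_files) == 1:
        # Single image (e.g. a photograph): one-to-one link to the file.
        return f"{BASE_URL}/{page_files[0]}"
    # Multi-page document: link to an intermediate HTML index page.
    return f"{BASE_URL}/{record_id}/index.html"


def marc_856(url: str) -> str:
    """Format a human-readable 856 field with the URL in subfield $u."""
    # Indicators "41" are an assumption made for this sketch.
    return f"856 41 $u {url}"


def index_page(title: str, page_files: List[str]) -> str:
    """Build the intermediate page that links to each document page."""
    items = "\n".join(
        f'<li><a href="{escape(name, quote=True)}">Page {number}</a></li>'
        for number, name in enumerate(page_files, start=1)
    )
    safe_title = escape(title)
    return (
        f"<html><head><title>{safe_title}</title></head>\n"
        f"<body><h1>{safe_title}</h1>\n<ol>\n{items}\n</ol></body></html>"
    )


if __name__ == "__main__":
    pages = ["petition_00001_001.tif", "petition_00001_002.tif"]
    print(marc_856(link_target("petition_00001", pages)))
    print(index_page("Example petition", pages))
```

Keeping the branch between the two cases in a single function means the catalog record always carries exactly one 856 link, whatever the number of page images behind it.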
Format for retroconverted text-based digital deliverables includes:
None of the texts contained non-Latin scripts. OCR is used by the vendor but only very little on one or two projects (pre-published, typescript material, e.g. the published guide to the Board of Public Works collection). The aim of using OCR was to achieve searchable computer-based analysis. From this experience the key advice is to use clean text.
Keying-in and data-mapping are the main conversion methods for textual materials. The lesson learned here concerned data entry errors created by the vendor’s part-time staff due to inadequate supervision, which later had to be manually corrected.
The TIFF file format is used for capturing, preserving and delivery. Photo CD is also a capture and preservation format. JPEG is solely a delivery format.
The capture and preservation resolution is 300dpi (small images are captured at 600dpi and then reduced to 300dpi for archiving). Delivery resolution is between 72 and 100dpi. Bit-depth is 24 throughout. Image compression is CCITT TIFF for preservation and delivery, and JPEG for delivery. For the Camp Lee Soldiers Home Applications and Registers a grayscale JPEG presentation was also used. The aim of compression was to improve access, enhance usability, decrease storage requirements and reduce the risk of theft of the digital resources. The project retains the uncompressed scans.
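To make the relationship between these resolutions concrete, the following Python sketch works out pixel dimensions at capture, archive and delivery resolution. The 8x10 inch and 2x3 inch originals are example sizes chosen for the arithmetic, and the function names are hypothetical.

```python
from typing import Tuple


def pixel_dimensions(width_in: float, height_in: float, dpi: int) -> Tuple[int, int]:
    """Pixel size of a scan of an original measured in inches."""
    return round(width_in * dpi), round(height_in * dpi)


def resample(size: Tuple[int, int], from_dpi: int, to_dpi: int) -> Tuple[int, int]:
    """Pixel size after resampling an image from one resolution to another."""
    width, height = size
    return round(width * to_dpi / from_dpi), round(height * to_dpi / from_dpi)


if __name__ == "__main__":
    # An 8x10 inch original scanned at the 300dpi archive resolution.
    master = pixel_dimensions(8, 10, 300)
    print("master:", master)                               # (2400, 3000)
    print("delivery at 72dpi:", resample(master, 300, 72))   # (576, 720)
    print("delivery at 100dpi:", resample(master, 300, 100)) # (800, 1000)

    # A small 2x3 inch original captured at 600dpi, then reduced to 300dpi.
    capture = pixel_dimensions(2, 3, 600)
    print("capture:", capture)                             # (1200, 1800)
    print("archived:", resample(capture, 600, 300))        # (600, 900)
```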
The program carries out post processing on the images (and sometimes text) as well as rescanning from the originals. This is done when there is a poor horizontal or vertical crop. De-skew, re-size, grayscale to bitmap and rotation (from microfilm) are carried out.
Average file size for capture and preservation is 1MB (6MB for 8x10 photos); compressed Group 4 images are 1MB and up to 100K uncompressed. The dynamic range of the equipment is checked through the set-up calibration. From its experience of digitizing images, the program recommends planning, analyzing and going through the raw material and similar materials at the same time, looking at the physical organization of the resource to allow for re-housing and handling, and attempting the best capture from the outset.
The quality control procedures in place for the digital deliverables are spot checks on map records (subject-heading keywords can be checked easily). Total checks are undertaken on photograph and document images. Link checking is accomplished with spreadsheets. Metadata quality control procedures involve a double check (e.g. Bible records are under one accession and then by alphabetic character). An ASCII text file is then produced, which the vendor checks. Scanned images, and then a burned CD image, are also checked for legibility.
Users can access the open catalog plus materials. Searching and browsing facilities are offered through the Library’s integrated system (through MARC records). Facilities include keyword, Boolean and combination searches. Browsing is approached through the catalog, and the second stage is thumbnails. When there is no MARC or HTML information, there is a card index-like record with links. Apart from searching and browsing, users are able to manipulate images by zooming, rotating, etc., using MrSID. Users must employ a TIFF viewer as a browser helper application in order to view the TIFF images. The Library provides a generic TIFF viewer at no cost, and users may also obtain other viewers such as Netmanager’s Chameleon and Netview, Wang image, Alternatiff, and GraphicConverter for Mac. Usage is monitored by automatic data capture and formal data collection. Usage for a typical month will include users from more than 130 countries, with 120,000 hits on all product Home Pages, 45,000 database searches performed, and 147,000 digital images and 1 million electronic cards viewed.
Users do not have to pay for the use of the digital deliverables, but have the opportunity to order high quality images off-line.
Potential users of the digital deliverables are informed about their availability through website announcements, press releases, articles in print media, print and broadcast media coverage, conferences and meetings, email shots, registering with search engines, receptions and word of mouth. Many users are gained through search engines, word of mouth and list servers for target groups (e.g. Virginia and genealogy lists, the Scout Report).
The program carried out front-end evaluation of users before the development of the project. The techniques used were online questionnaires, email and focus group discussions. As a result of this valuable feedback, the program modified and enhanced several products, sometimes in several versions, to better serve the needs of the users. Formative evaluation with users was achieved through online questionnaires, email and focus group discussions. As a result, the program enhanced the interface and navigation and corrected errors. Summative evaluation will be undertaken with the Virginia Historical Inventory Project (Mellon funded). In addition, all feedback information is retained.
The lesson from this evaluation is simple: users do not read instructions and putting material online can lead to unforeseen problems, particularly with regard to remote user support.
Since 1995 the program has cost $4 million plus salaries. Funding was from federal and grant sources. The program believes that its $4 million funding is sufficient and has actually asked to have its budget reduced. The program believes that the use of standards and guidelines has saved money in the short and long term. The program has been monitored by its funders through annual reports to the Institute of Museum and Library Services (IMLS), Mellon and internal reporting. Mellon requested cost models and workflow reports from the program.
Additions of new material (and its associated metadata), metadata changes, user interface updates and file format changes are ongoing, but their frequency was not known.
The program’s preservation strategy is offsite storage and secure backup. Text and image materials have storage media and storage condition strategies, while images also have a file format strategy. These strategies are based on migration of data.
The digital deliverables will be available indefinitely and the project’s longer-term sustainability is not reliant on self-generating funds. The project has secured funding at a national level. The project does not have an exit strategy, but may start charging if this becomes necessary. Loss of the digital deliverables would be a matter of concern.