On September 25, 2000, HATII interviewed William G. Thomas, Director of the Virginia Center for Digital History at the University of Virginia. The project at the Center for Digital History began as a study of American history and culture, with the aim of then transforming how the field is taught. New technologies were employed to advance the discipline of historical study. To this end, the project produced high-quality, reliable resources available on the World Wide Web and therefore able to reach wider audiences, both within the education community and among the general public.
The most important criteria for selecting and prioritizing material for digitization are teaching and learning potential and research significance. Medium-priority criteria are historical and cultural value, enhanced access, potential to reach disadvantaged groups, and social inclusion. These selection and prioritization criteria have not changed over time.
Co-operation in developing the Center’s digitization program has involved libraries (UVA Special Collections), museums (Woodrow Wilson Birthplace), academic institutions (Virginia Tech, Norfolk State University), corporations (Public TV), foundations and charities (National Council for History Education) and government agencies (the state Department of Education).
A recommendation from this experience is that it is essential to build strong collaborations across disciplines and with schools both within and outside the institution.
The program was initiated in 1991/92, when IATH (the Institute for Advanced Technology in the Humanities) started work, and the Center was established in its own right in 1998. There is no anticipated end date.
The purposes for which the digital deliverables are created and their rank are:
The project produced an explicit statement of intent that covered its rationale, scope, significance and primary audience.
The type of source material digitized includes:
Most paper material is of relatively standard nature and format; at the other extreme, maps measure up to 6 ft by 6 ft.
Where possible, the program tries to represent an entire body of material. Some of the digital deliverables are intended to be re-purposed, for example for the Valley of the Shadow CD-ROM. This approach has had mixed results: the images worked reasonably well, but the maps and databases were challenging, and nothing transferred seamlessly. The problems stem from the discrepancy between the tools available for CD-ROM and those available for the web.
Standards, guidelines and tools used for representing content are:
Standards, guidelines and tools used for describing content are:
No standards, guidelines or tools are used for controlling data values, as none were appropriate for the project.
The Center consulted some existing guidelines for digitizing particular document types, possibly those in the Making of America white paper.
Standards, guidelines and tools used for representing structure are:
On navigating between ideal standards and realistic practice, the Center recommends that, although the balance depends on funding and staff ability, it is nevertheless important to aim for the highest quality at every level.
The highest-priority intended audiences are K-12, community college, four-year college and graduate school. Lifelong learning, distance learning, museum users, public library users, archive users, government, and the private sector are medium priority.
The target audience was evaluated through a 1996 NEH seminar for teachers at UVA. This evaluation was not conducted in a social-scientific way but through workshops and seminars; a report was posted on e-text. Groups other than the target audience can also use the digital deliverables. The Center is 75% compliant with the W3C’s guidelines for web accessibility, for example in supporting text-only browsers.
Some projects have restrictions on the use of the digital deliverables because of copyright. These restrictions are clearly stated.
Both external and in-house project management advice was available.
Within UVA’s structure, the Center for Digital History sits in the College of Arts and Sciences as a research center, and therefore has no curriculum.
The digitization program has led the university to change its allocation of resources; the Center’s very existence reflects this policy change. There are no formal project management procedures in place, but ideally every project would have a manager and one or more directors. The decision to eliminate the post of project manager has been acknowledged as a mistake. On distributed projects, site managers are also required; otherwise communication breaks down.
A large number of quality assurance procedures are in place, including style sheets for markup and data entry, and a team leader in each group who provides day-to-day leadership so that the manager is not solely responsible. In addition, a second opinion is required on all material.
Pilot studies were carried out to assess training needs, but the program underestimated its technical needs and, as a result, raised additional funds. No formal benchmarking or time-and-motion studies have been undertaken. Gantt charts are used to aid project management. Work is delegated according to students’ interests and specializations. Job descriptions and performance indicators are used, as required by the state.
Digitization combines in-house work and outsourcing; outsourced work includes some programming, web design, and the newspaper images for the Valley project. The decision to outsource is made project by project and by material type, and where the necessary skills are not available in-house. Equipment for in-house digitization was bought with a large university grant (ITC). Where the nature of the material rules out some digitization processes, the choice among the remaining ones is based on quality assurance and lowest error rate.
The following technologies are used:
Guidelines for data capture procedures have been established. Benchmarks are used mainly for data and text.
The Center employs one director (100%), one associate director (also metadata specialist, currently vacant), five digitizers/research assistants (10 hours per week), one technical development worker (50%), one educational specialist (25%) and one administrator (33%). Staff are predominantly from an arts, humanities or history background. The administrative support was redeployed from another area.
Advice on the technical aspects of digitization was available both externally (through manufacturer’s support) and in-house.
Training needs were assessed informally and areas identified were:
Specialist technical staff delivered training in-house, together with internal consultants. The training has met the needs of the project.
The Center is aware of the copyright position of the digital deliverables. It does not own the copyright in the original materials, which for the most part are out of copyright. The copyright status is declared in an overall statement, which has been effective. Material still in copyright is digitized with the owners’ agreement, on payment of a fee, or without formalities. There are no copying restrictions for users.
Users are able to view and download text:
Users can view and download images:
Audio resources are listen-only, delivered as full-length compressed (streaming) sound.
Moving images are view-only, delivered as digital video clips at both lower and highest quality.
No electronic management system is in use.
The Center does not have a conservation procedure for the original material, and little investigation into the condition of the material is made before digitization. Conservation activities are left in the hands of the collection holder. One set of newspapers was disbound to allow digitization. No risk to the materials was identified during preparation or digitization. No special processes or equipment are specified, but curatorial staff prepare material before digitization. There are no access restrictions on materials after digitization.
It is not known which cataloging and referencing systems are in place before digitization, but some information from these records is adopted. The Center has to locate some core reference material itself. Some material is altered from the original to complete the digitization process. No material is rejected. The Center uses reproductions or intermediaries where the material survives only in intermediary form.
The form of reproductions/intermediaries used includes:
The digital deliverables are cataloged as part of project management but the records are not transferred to the library.
No external standards or guidelines are applied to cataloging the digital deliverables. Instead, an in-house system built in MS Access is used to catalog digital material, and this information is later expressed in the TEI header.
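To illustrate what "later expressed in the TEI header" can mean in practice, here is a minimal sketch in Python that turns one catalog record into a bare-bones TEI `teiHeader` containing only the three required parts of `fileDesc` (`titleStmt`, `publicationStmt`, `sourceDesc`). It is not the Center's actual conversion: the record's field names and values are hypothetical stand-ins for a row exported from the MS Access database, and the TEI namespace is omitted for brevity.

```python
import xml.etree.ElementTree as ET

# Hypothetical catalog record standing in for one row exported from an
# in-house MS Access database; field names and values are illustrative only.
record = {
    "title": "Example letter (hypothetical)",
    "publisher": "Virginia Center for Digital History",
    "source": "Original manuscript held by the collection holder",
}


def tei_header(rec: dict) -> ET.Element:
    """Build a minimal TEI teiHeader (fileDesc only) from a catalog record."""
    header = ET.Element("teiHeader")
    file_desc = ET.SubElement(header, "fileDesc")
    # titleStmt: the title of the electronic text
    title_stmt = ET.SubElement(file_desc, "titleStmt")
    ET.SubElement(title_stmt, "title").text = rec["title"]
    # publicationStmt: who makes the electronic text available
    pub_stmt = ET.SubElement(file_desc, "publicationStmt")
    ET.SubElement(pub_stmt, "p").text = rec["publisher"]
    # sourceDesc: the original that was digitized
    source_desc = ET.SubElement(file_desc, "sourceDesc")
    ET.SubElement(source_desc, "p").text = rec["source"]
    return header


print(ET.tostring(tei_header(record), encoding="unicode"))
```

Keeping the catalog in a database and generating the header from it means one record can feed both the project's own catalog and the encoded text, which fits the description of catalog and object being linked through the database.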
Tools used for controlling data values are:
Metadata details recorded are:
The digitizer creates the metadata records. The records are held on an intranet server and are also available on the Internet. The records for the digital deliverables and for the original digitized materials are independent of each other. The catalog and the objects are linked through the database.
The format of retroconverted texts includes:
No texts contained non-Latin scripts.
No OCR software is used; textual materials are keyed in. The lesson drawn from this is to use style guides for documents.
Capture and preservation formats for images are TIFF, and delivery formats are GIF and JPEG. Capture and preservation resolution is between 300 and 600 dpi, and delivery resolution is 72 dpi or 100 to 150 dpi. Bit depth is not known. JPEG compression is used for delivery to reduce cost, improve access and decrease storage requirements. The Center retains the uncompressed scans. The program does not carry out any processing on the images.
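The storage pressure behind compressed delivery copies can be made concrete with some simple arithmetic. The Python sketch below computes pixel dimensions and uncompressed size at the capture and delivery resolutions quoted above. Because the bit depth is not known, it assumes 24-bit RGB; the letter-size page (8.5 by 11 inches) is likewise a hypothetical example, while the 72 by 72 inch map corresponds to the 6 ft by 6 ft maps mentioned earlier.

```python
def pixel_dimensions(width_in: float, height_in: float, dpi: int) -> tuple[int, int]:
    """Pixel width and height of a scan of a given physical size at a given resolution."""
    return round(width_in * dpi), round(height_in * dpi)


def uncompressed_bytes(width_px: int, height_px: int, bits_per_pixel: int = 24) -> int:
    """Raw (uncompressed) image size in bytes.

    24-bit RGB is an ASSUMPTION: the project's actual bit depth is not recorded.
    """
    return width_px * height_px * bits_per_pixel // 8


PAGE = (8.5, 11.0)  # hypothetical letter-size page, in inches
MAP = (72.0, 72.0)  # a 6 ft by 6 ft map, in inches

for label, size in (("page", PAGE), ("map", MAP)):
    # capture resolutions (300, 600) and delivery resolutions (150, 72)
    for dpi in (600, 300, 150, 72):
        w, h = pixel_dimensions(*size, dpi)
        mb = uncompressed_bytes(w, h) / 1_000_000
        print(f"{label} at {dpi:>3} dpi: {w} x {h} px, about {mb:,.1f} MB uncompressed")
```

Under these assumptions a letter-size page captured at 300 dpi comes to about 25.2 MB uncompressed, and a 6 ft by 6 ft map at the same resolution to roughly 1.4 GB. That gap between archival TIFF masters and what a web browser can reasonably load is why the lower-resolution, compressed delivery copies are kept separate.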
An outside consultant carried out sound digitization for the Presidential Tapes Project.
Capture formats are:
Preservation format is:
Delivery format is:
Quality control for the digital deliverables includes spot checks on likely error areas and complete checks on a second pass. Metadata quality control consists of second-pass checks. Quality control procedures have lengthened the workflow.
Users have open access to the catalog but restricted access to the materials, which are openly accessible only to in-house users. For web delivery, users can search by keyword, Boolean query and so on, including metadata. The specialist software the Center uses is Generator, Flash Generator and MrSID. Users can zoom in on MrSID images.
The digital deliverables receive 13,000 hits a day, monitored by automatic data capture using Webusage 7.0. Users do not have to pay to use the digital deliverables.
Potential users of the digital deliverables are informed by website announcements, press releases, articles in print media, print and broadcast media coverage, conferences, meetings, email shots and registering with web search engines. Print and broadcast coverage has proved the most effective.
The Center’s formative evaluation used the following techniques:
As a result of this evaluation, changes were made to the interface and to the nature of the materials.
The program has cost $1,500,000 so far, not including institutional support. The main sources of funding have been UVA, NEH and private donors. The main problem has been the uneven distribution of funding; if the Center were funded again, it would prefer a level of base support supplemented by project funding. The Center regards adopting standards as a cost, because it requires greater expertise.
Funding organizations monitor the Center through annual reports (plus quarterly reports for UVA and NEH). They also required project management plans, cost models and workflow reports from the project.
New materials and metadata are added daily and the user interface is changed every six months.
The Center does not have a preservation strategy beyond CD-ROM (server material is backed up by ITC). The Center intends to keep the digital deliverables available indefinitely. The long-term sustainability of the program is dependent on securing future grants.
The Center has a variety of options for an exit strategy, for example commercial opportunities. Loss of the digital deliverables would be a matter of concern.
The interviewee felt the questionnaire had a digital library focus.