
 

32   University of Virginia, Institute for Advanced Technology in the Humanities (IATH)

 

HATII interviewed John Unsworth, Director of the Institute for Advanced Technology in the Humanities at the University of Virginia, on September 11, 2000. IATH is a research unit of the University that aims to expand the use of IT within a humanities environment. In overseeing the digitization activities of the University, IATH is responsible for producing digital materials with specific relevance to research interests. While maintaining this focus on the activities of the University, the Institute also forms partnerships and participates in initiatives with other institutions, in order to further the development of IT in the field of cultural heritage.

 

32.1    Organizational Digitization Program and Policy

The Institute for Advanced Technology in the Humanities (IATH) does not have an organizational digitization strategy per se. Strategy is decided case by case for the various projects that IATH runs. These strategies are based on an analysis of the materials; image size and processing issues are factors in any strategy. The documentation associated with each project includes its digitization strategy. Much of IATH's advice to projects that are considering a digitization strategy is based on the realities of the materials and of the processes. IATH does not conduct a collection survey, as each project is different and the collections are not necessarily all local; in fact, they very often are not. While the institutions that control the artifacts will often carry out the digitization, this is not always the case with IATH. Each project differs: some carry out digitization in-house on locally controlled materials, while others send materials out to other institutions for digitization. The selection of the materials is driven by research rather than by the institutional collections.

As the digitization projects are research-driven, there is often a rush to production and pressure to produce the materials for research. These pressures can work against a solid digitization design process. If a project has succumbed to this pressure, then building the digital deliverables can be a more difficult process; often there is also a lack of training among faculty staff. A more formal structure may remedy this.

Selection of materials is highly research-based, with intellectual property rights, teaching and learning potential, historical and cultural value, enhanced access, research into digitization strategies, potential to reach disadvantaged groups, and social inclusion all important factors. The selection criteria have not changed over time, as research is the main impetus.

IATH has collaborated with archives, libraries, museums, academic institutions, corporations, foundations and charities, and government agencies at the institutional and national levels. This collaboration has been very successful, with the main criterion being to establish firmly who has responsibility for which task.

IATH has many projects, which have differing purposes, priorities, and objectives. Many projects are complete and as many again are ongoing. Generally, projects do not finish entirely and continue to receive additions at some level. Priorities for projects include preservation, public access, teaching and learning resources, research, wider access and response to previous demand, as well as project-specific criteria such as urban planning.

Projects do tend to produce websites with explicit statements of intent covering rationale, scope and significance.

As the projects cover a wide range of topics, the type of source material digitized can include:

The format of the materials to be digitized covers a wide range as specified by the project criteria and will include items such as drawings and photographs, schematic plans of water systems and large-scale maps. Depending on the project, materials may either be a representative sample or the entire collection.

Different projects have different views on the process and what may be changed, for example understanding digitization of maps, and using different specialist software.

Again, the range of projects will cover the use of the following standards, guidelines or tools for representing content:

The following standards, guidelines or tools have been consulted for describing content:

Projects will use a database schema that refers to these standards.

The following standards, guidelines or tools were consulted for controlling data values:

Different projects use standards in different ways, and will, for example, consult them for creating their own database schema.

When standards are not used, it can be for a variety of reasons. They may not yet be agreed, they may be inappropriate or not known, or there may be a lack of documentation or of training.

IATH projects will consult other existing guidelines for digitizing particular document types when planning their digitization strategy; an example would be standards for large-scale maps.

Projects use different standards, guidelines or tools for representing structure, including:

In relation to standards in general and recommendations for striking a balance between the ideal and the realistic, the program recommends that projects start with the standards and construct their own database, which can have better data ranges. However, it is advisable to use and consult standards wherever possible.

The range of projects ensures that the primary intended audience for digital deliverables will cover many communities such as K-12, community college, four-year college, graduate school, lifelong learning, distance learning, museum users, archive users, government and private sector. The main audience is research users, generally in the graduate school community. Some projects have carried out evaluation of the target audiences using methods such as focus groups of the main institutions and it would be expected that a wider scope of researchers would use the digital deliverables.

The program has not acknowledged the needs of those with disabilities through the W3C’s “Guidelines for Web Site Accessibility”. Some of the materials will have limited access for IPR reasons. It is not known if the profile of the users differs from the anticipated set.

 

32.2    Project Management and Planning

IATH uses in-house expertise when planning a digital project and talks to other centers at a technical level. IATH tends to be the funder of the projects and the project managers are recruited by Faculty. IATH will give advice and training where required and each project will differ in the reporting relationship but it tends to be very informal. Some projects have more formal procedures such as selection committees and annual reviews.

One project management procedure that has not worked is the flat project, where the structure is very democratic: it was difficult to establish intellectual responsibility. IATH encourages projects to develop the project management procedures best suited to individual needs. Quality assurance procedures depend on the project, varying from minimal to extremely high standards.

Just as each project varies on procedures, they also vary on planning, and occasionally pilot and feasibility studies will be undertaken. However, no project uses benchmarking for scheduling tasks. Planning tools will vary from project to project: IATH will encourage but not prescribe.

Digitization can be carried out in-house or outsourced depending on the project. The factors in each decision are the resources available, and permission for access. IATH has equipment already installed. These include flatbed scanners, film scanners, digital cameras and high-end professional cameras. Guidelines for data capture procedures have been established and benchmarks used are gray scales and color charts but, once again, this will vary from project to project.

 

32.3    Human Resources and Training

Each project will have a variety of staff but IATH has three directors, one metadata specialist, four technical support/development staff and two administration support staff. However, it is feasible that a hundred students may also be employed. Forty to fifty faculty staff may be involved in the various projects.

Staff have a wide variety of backgrounds, including library or computing science as well as architecture, literature and publishing. Initially, staff were redeployed from other areas. Internal and external advice was sought on the technical aspects of digitization.

Areas where training needs were identified are project management, application of technical standards, preparation and handling of materials for digitization, technical operation of digitization equipment, post-digitization processing, and metadata and markup. The project director, specialist technical staff, library cataloging staff, equipment operators and, in fact, everyone on the projects are engaged in training. This has been organized in-house (via IATH's own consultants) and through external courses, independent study and learning on the job from each other. It is considered that training can never fully meet the needs of the projects, as staff can never have enough training in this developing area.

 

32.4    Project Life-Cycle Processes and Procedures

The project is aware of the copyright position of the digital deliverables and the ownership of the copyright of the original materials will depend on the project. The projects declare the copyright on the page as text embedded in the object, for example as JPEG headers. This practice has proved effective. Some material in copyright is digitized and this is done under legal provisions for libraries, with the owner’s permission and by payment of a fee.
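As an illustration of declaring copyright as text embedded in a JPEG header, the following is a minimal sketch that writes a notice into a JPEG comment (COM) segment and reads it back. It assumes the COM segment is an acceptable carrier; the interview does not say which header field IATH projects use, and the function names and the sample notice are hypothetical.

```python
import struct

SOI = b"\xff\xd8"  # JPEG start-of-image marker


def add_jpeg_comment(jpeg: bytes, text: str) -> bytes:
    """Return a copy of the JPEG stream with a COM segment holding `text`."""
    if not jpeg.startswith(SOI):
        raise ValueError("not a JPEG stream")
    payload = text.encode("utf-8")
    if len(payload) > 65533:  # the 2-byte length field also counts itself
        raise ValueError("comment too long for one segment")
    pos = 2
    # Skip any leading APPn segments (markers FFE0..FFEF) so that
    # JFIF/Exif headers stay directly after the start-of-image marker.
    while pos + 4 <= len(jpeg) and jpeg[pos] == 0xFF and 0xE0 <= jpeg[pos + 1] <= 0xEF:
        (seg_len,) = struct.unpack(">H", jpeg[pos + 2:pos + 4])
        pos += 2 + seg_len
    segment = b"\xff\xfe" + struct.pack(">H", len(payload) + 2) + payload
    return jpeg[:pos] + segment + jpeg[pos:]


def read_jpeg_comments(jpeg: bytes) -> list:
    """Collect the text of every COM segment found before the image data."""
    comments = []
    pos = 2
    while pos + 4 <= len(jpeg) and jpeg[pos] == 0xFF:
        marker = jpeg[pos + 1]
        if marker in (0xDA, 0xD9):  # start of scan or end of image
            break
        (seg_len,) = struct.unpack(">H", jpeg[pos + 2:pos + 4])
        if marker == 0xFE:
            comments.append(jpeg[pos + 4:pos + 2 + seg_len].decode("utf-8"))
        pos += 2 + seg_len
    return comments


# A stub stream (SOI, a JFIF APP0 header, EOI) stands in for a real image file.
stub = (SOI + b"\xff\xe0" + struct.pack(">H", 16)
        + b"JFIF\x00\x01\x01\x00\x00\x01\x00\x01\x00\x00" + b"\xff\xd9")
notice = "Copyright (hypothetical notice) Example Project"
tagged = add_jpeg_comment(stub, notice)
```

Because the notice travels inside the file, it stays with the image when a user downloads it, without altering the visible picture in the way a watermark would.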

Users are allowed to make printouts of the digital deliverables on paper and to download them to a PC, LAN or WAN. Users can view and download text, image, audio and video objects in different ways on a project-to-project basis. Potentially these will be: ASCII text files, TEI DTD marked-up text; thumbnails, lower quality images, highest quality images, associated documents; full-length compressed sound and lower quality video clips. Watermarking is deliberately not used, as IATH believes that it defaces the original and misleads the users.

IATH does not have a conservation procedure for the original materials, as this is not applicable to the organization. The condition is investigated by appropriate staff, such as archival or special collections staff, but any conservation would be carried out by the institution in possession of the material. Risk assessment of the material during the preparation for digitization is undertaken by analysis. Objects are handled carefully and gloves are worn. Some projects will use special equipment. Potentially as each project varies, materials can be prepared by curatorial or preservation staff prior to digitization and may be monitored by them during digitization. Once the material has been digitized, restrictions can be placed on the originals as a preventative measure to reduce handling and serving pressure depending on the project.

As IATH does not own the material, the cataloging and reference systems vary depending on the institution that the materials came from. Projects often have to use the information from these systems, but often do not have access to all the relevant information and have to locate some core reference or source material themselves. It is possible that some originals are altered for the digitization process.

Various projects have digitized from originals and from intermediaries, and some material has only existed in intermediary form. Intermediaries used include photocopies, slides/35mm or 4x5 transparencies, photographic prints, microfilm, single frame microfiche and glass negatives. Audio intermediaries include 78 rpm discs, 33 rpm discs, audio cassettes and digital sound media. Moving images can be digitized from video copies, Betacam SP and 16mm.

The original materials are not cataloged by IATH. The digital surrogates are cataloged for internal project management systems. Projects have used Astoria document management, Unix and relational databases. Tools consulted for controlling data values are controlled vocabularies, a thesaurus (Getty) and classification schemes. There is no off-the-shelf solution for the projects. Details recorded in the metadata are information about the original object, the digital object, the digital process, technical details, staffing details and administrative information. The creation of metadata is project dependent.
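The six groups of metadata details listed above can be pictured as one record per digital surrogate. The sketch below is purely illustrative: the class name, field names and sample values are hypothetical and are not taken from any IATH project schema.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class SurrogateRecord:
    """One record per digital surrogate; each field holds one group of details."""
    original: dict = field(default_factory=dict)        # about the original object
    digital_object: dict = field(default_factory=dict)  # about the digital file
    process: dict = field(default_factory=dict)         # how it was digitized
    technical: dict = field(default_factory=dict)       # resolution, bit depth, etc.
    staffing: dict = field(default_factory=dict)        # who did the work
    administrative: dict = field(default_factory=dict)  # rights, dates, status

    def missing_groups(self) -> list:
        """Names of the groups that have not been filled in yet."""
        return [name for name, value in asdict(self).items() if not value]


# Hypothetical sample values, for illustration only.
record = SurrogateRecord(
    original={"holding_institution": "Example Archive", "type": "large-scale map"},
    digital_object={"master_file": "example-0001.tif"},
)
print(json.dumps(asdict(record), indent=2))
print(record.missing_groups())
```

A check such as `missing_groups` is one simple way a project could see which parts of a record still need attention, whatever database the project actually uses.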

 

32.5    Format, Resolution and Compression of Digitized Materials

Format for retroconverted text-based digital deliverables varies from project to project and can include:

Some of the texts may contain non-Latin scripts. Keying-in rather than OCR is used to convert the digital images to text. From this experience, the key advice is that it is more cost-effective to send out large quantities of text, and that it is worthwhile to combine projects for this purpose.

The TIFF file format is used for capturing and preserving; JPEG is solely a delivery format. JPEG 2000, Wavelet and MrSID are also used for delivery.

The capture and preservation resolution and bit-depth will depend on the project but capture is at the highest resolution appropriate to the subject. Images are compressed using Wavelet, JPEG and MrSID. This is done to improve access and decrease storage. The originals are retained in uncompressed format.

The program carries out post-processing on the materials using Photoshop; the processing techniques vary from project to project. Due to the wide range of projects there is no average size of file. The dynamic range of the equipment is checked through the set-up calibration.

Sound files are digitized at the digital media center. WAV, AIF and RealAudio are used for capture and delivery of sound. The sampling and bit-rate will depend on the project. Projects have used bit-rate compression and frequency compression but once again the details depend on the project. Compression is used to improve access and to decrease storage. Sound files will have noise cleaned in post-processing procedures.

Moving images are also digitized by sending them to the digital media center. File formats used are AVI, MPEG, QuickTime, RealVideo and Mov for capture and delivery. The resolution (frames per second) will depend on the project. Kodak video compression is the current favorite and this compression is used to improve access and to decrease storage. Video is post-processed to clean up noise and edit to bite-sized chunks.

IATH does not have a formal quality control procedure in place for all projects and each project will have its own. IATH has oversight of the projects and advises on these procedures. Rendering in markup is one method to ensure metadata quality control.

Users can access the materials generally on the web and access tends not to be restricted. Search facilities include DynaWeb, indexing HTML, Excite search engine and PostgreSQL. Users can search the metadata using DynaWeb and databases. No special hardware or software is required. Users can resize objects online. Each project varies on the level of use but these are recorded through web statistics project by project.

Users do not have to pay for the use of the digital deliverables for any project.

Potential users of the digital deliverables are informed about their availability through website announcements, articles in print media, print and broadcast media coverage, email shots, conventional mail shots, registering with search engines, receptions and bibliographic studies. Many users are gained through email and bibliographic studies.

 

32.6    Evaluation, Funding and Long-term Sustainability

IATH does not have formal evaluation procedures for the projects and any front-end evaluation is done in the selection process of the materials by speaking to researchers. In these cases comment forms are used. Some aspects of projects have been changed as a result of front-end evaluation, such as assessment of the time to start the project.

No formative evaluation is carried out and no project has really ended, so there is no summative evaluation.

IATH has funded approximately six to seven million dollars' worth of projects. Half of this funding comes from the university and the other half from private funds, federal funding and corporate donations.

New material (and its associated metadata), metadata change, and user interface updates are ongoing at various frequencies depending on the project. File formats are not changed if possible.

The variety of projects means a variety of approaches to preservation. However, each project will ensure that the strategy used is appropriate to the materials and the goals of the project.

IATH concentrates on the scholarly goals for each project and the scholar is the driver for all projects. They recommend using appropriate standards where possible. Projects should look at the goals and why they are using documents, and create their digital strategy based on these criteria.




abp~04/02