
 

6       University of Chicago, The Oriental Institute

 

On January 12, 2001, HATII interviewed Charles E. Jones, Research Archivist-Bibliographer at the University of Chicago’s Oriental Institute, and John C. Sanders, Head of the Computer Laboratory. The cultural and historical value of the materials held by the Institute was a significant factor in the decision to digitize the collections. Ideally, the process would enhance public access and, most importantly, increase the potential of the material for teaching, learning and research purposes. The work of the Institute primarily revolves around research, and a considerable shift of focus was required of those employed at the Institute for the digitization program to become a reality.

 

6.1       Organizational Digitization Program and Policy

The Oriental Institute has a long record of involvement with electronic and digital media. Experimental – and expensive – work was undertaken with entering texts into a mainframe computer in the 1960s. PCs have been used since their introduction in the 1970s on the initiative of individual faculty members. An organized institutional effort began in the early 1990s with the connection of the Institute’s building to the local area network. It was then that the Institute decided to purchase a PC for all its members of staff. Subsequently, the convenience of email and printing convinced people to use their computers.

However, the Oriental Institute’s digitization strategy was not directly related to infrastructure developments, even if these were a prerequisite. The strategy began in 1992, when Institute staff spoke with the University of Chicago’s Computing Service and realized that they could serve redundant digital material. The Institute has run a web service since 1994 and it is operated under the assumption that the service will be available indefinitely. However, the Institute does not have a formalized, strategic digitization policy. Rather, the Institute’s Research Archivist-Bibliographer, the Head of the Computer Laboratory and the Director of the Institute make such decisions. The interviewees were acutely aware of the need to institutionalize their digitization operations and avoid the yo-yo effect of periodic centralization and decentralization. However, any initiative for this is likely to come from the museum, rather than the research side of the Institute. Another possible locus for a digitization strategy may be the publications department. The Institute has not conducted a collection survey as part of its digitization program and priorities have not been formalized. However, an original principle that was used to support the idea of publishing material online was that it should have been published in print before it appeared on the website. Today many components of the Institute’s website have no print analog. The museum’s closure for refurbishment in 1996 was an opportunity to photograph the objects but this opportunity was not taken for fear it would slow down the renovation.

The main obstacles to planning and building the development of digital deliverables have been money, manpower and the lack of an institutional overview, although there has never been a philosophical objection to digitization.

The primary selection criteria used to prioritize material for digitization are teaching and learning potential and research significance. This is closely followed by the material’s cultural and historical value. Enhanced access was a primary criterion for selecting material when the Institute’s museum closed for refurbishment. There is also theoretical institutional support to digitize for preservation and the preservation department is fully supportive of this aim. Funded projects are the driving force behind the Institute’s digitization activities.

The Oriental Institute has co-operated with museums, academic institutions and professional societies at local, regional and national levels in the development of its digitization projects. For example, the museum’s education department is part of a project to deliver curricular materials over the web to Chicago schools. Based on this experience one observation is that collaboration often encourages institutional hesitation and it is vital to get beyond talking to action. It is also pointed out that projects, even when using the same people and working to the same deadline, can have different rates of progress.

The primary purposes in creating the digital deliverables are as a teaching and learning resource, research, and to increase accessibility. The Institute has produced explicit statements of intent for some aspects of project work.

The type of source material digitized includes:

The predominant medium of the materials to be digitized is paper, but some vellum has also been digitized. Some outsize material has been digitized by a vendor. The digitized deliverables have usually represented a sample of material rather than an entire body, but the Institute may digitize a whole season’s field drawings or photographs, for example. It is the Institute’s intention that the high-resolution material be re-purposed. For example, digitized vector lines from a map were built to provide a digital model on video on the web, but were also used to create architectural plans.

The following standards, guidelines or tools are used for representing content:

Modified MARC is used for describing content.

An in-house library system is used for controlling data values. Other standards, guidelines or tools for controlling data values were not appropriate to the projects.

The program looked at other existing guidelines for digitizing particular document types when planning its digitization strategy. These included XML and DTDs under development, which could be either used or modified. Guidelines were rejected as being unsuitable for archaeologists and philologists.

The following standards, guidelines or tools are used for representing structure:

The Oriental Institute is still in the process of determining how best to navigate between the ideal and the realistic in the application of standards. The Institute is trying to develop a DTD and deal with the overriding difficulty of how to represent scripts for which there are no standards.

The target audience for the digital deliverables is four-year colleges and graduate schools, with the material aimed very much at the teachers. Similarly, there is a current drive towards the K-12 audience, again aimed at teachers, through an ongoing project in the education office. General public museum users are also a target audience, served through the Institute’s general public display collection material rather than the more tailored or specific material geared toward other audiences.

The Oriental Institute’s education department has undertaken an evaluation of its target audience using a 40-member focus group of high school teachers. This was a full written evaluation and was a requirement of the funder. Aspects of the material were revised as a result.

The Institute has not explicitly taken account of the W3C’s web accessibility guidelines. However, it is represented on an accessibility group that discusses these issues and works out of the University’s networking services division.

Limitations are placed on some digitized material; these restrictions take the form of password-restricted areas of the website, a requirement that users be enrolled in a particular class, and material reserved for in-house use only. In all cases these limitations are in place because of copyright or IPR restrictions.

The profile of users has been surprisingly broad in terms of country and the types of files used.

 

6.2       Project Management and Planning

Internal advice was available on managing the Institute’s digitization program. The management of the digitization program has fitted into the existing departments and units of the library and computer laboratory. The digitization program has not led to any changes in organizational relationships and procedures although there is a degree of push and pull between departments.

Formal project management procedures are embodied in the Institute’s committees, one of which is computational and makes recommendations. The advisory board for publications also makes recommendations although it has an ad hoc existence. However, these have not been terribly effective policy making bodies. There are no managerial quality assurance procedures in place and staff have worked personally in an entrepreneurial way without formalized structures.

Several of the projects undertaken were conceived as pilot projects in themselves. The education office may have undertaken feasibility studies in their own right but the details are not known at present. No significant changes to the design of projects have resulted from these pilot studies. The Institute has not carried out any time and motion or benchmarking studies.

Work is allocated on an ad hoc basis depending on staff availability. Some staff have been hired specifically for digitization work; however, graduate students and especially volunteers have carried out the majority of work.

Digitization has been carried out in-house because of the cost and risk of transporting the original materials and costs of outsourcing. Equipment for digitization has been bought in and the Institute has tried to forecast its technology needs in purchasing equipment. The decision of which digitization process to adopt is made through discussion of the outputs required, equipment available and the nature of the original material. The equipment used for digitization is Apple, Epson, HP and Umax flatbed scanners, Nikon film scanner and Nikon and Olympus digital cameras. A lot of second stage digitization is done from high quality photography.

Data capture guidelines are in place but not for projects in the field, where the conditions make this impractical. However, the field station equipment is cleaned and calibrated before it goes out and includes capture instructions. Grayscale and color charts are used in the photographs as benchmarks so the color can be adjusted.

 

6.3       Human Resources and Training

The Institute’s digitization program employs one full-time director (Head of Computer Lab) who also undertakes technical support and development. One part-time digitizer is employed, plus a number of graduate students and volunteers who work anything from a few hours to a full week. A consultant is also employed who devotes approximately 25% of their time to digitization projects. The graduate students employed have a Near Eastern background whilst the volunteers have some intellectual interest in the subject. In-house expertise is available on the technical aspects of digitization and the University of Chicago has staff with a wide variety of expertise, for example in panoramic movies.

Training needs are assessed informally. Areas of training that have been undertaken include:

One area where further training is needed is metadata creation.

The team members who have received training include:

Training has been organized using the following:

The training has met the needs of the project.

 

6.4       Project Life-Cycle Processes and Procedures

The project is aware of the copyright position of the digital deliverables and owns the copyright in almost all the original materials. The copyright status of the digital deliverable is declared on every web page. In addition the Institute has a general statement of copyright and permissions linked from the banner navigation tool in its website. With the exception of a few isolated abuses this practice has been effective. Material in copyright was digitized with the owners’ agreement for which written permission was obtained. Users are allowed to make printouts of the digital deliverables on paper and film, burn to a CD and download to a PC, LAN or WAN providing they do not republish without permission.

For textual material users can download and view HTML files, with one exception of a text database that creates HTML on the fly from queries.

For digital image material, users can download and view:

For digital moving images users can only view QTVR clips.

No electronic management system, such as watermarking, is in use.

The Institute’s museum staff approve both the material and the process prior to digitization. Risk assessment and cleaning are undertaken prior to photography and curatorial staff make the final recommendations regarding risk. Curators monitor and handle materials during digitization to minimize risk; light limits are set and special cradles built. Once material has been digitized no access restrictions are placed on the originals. The museum may send the digital surrogate instead of the user having to visit in person; they may also serve the digital collection first to users who visit.

Some items in the museum collection are uncataloged, on paper or card catalogs or in electronic records; however, all the library material is now in an electronic catalog. If no catalog record exists prior to digitization the handlist is processed as the basic electronic catalog entry. Sometimes project staff have to locate reference or source material; this is a continuous and developing process, and the electronic catalog is updated accordingly.

No material is altered from its original state for digitization. Material has been rejected because scanning may cause damage or for procedural reasons. Intermediaries have been scanned, although the material did not exist only in this form. The collection index is one example where the material was to be displayed on the web and quality was less of an issue.

Intermediaries used for image digitization are:

Original materials are cataloged in the library electronic catalog, with each item accorded a separate entry using the fiche caption, not that affixed to the print. Some original material in the archive is not recorded in the electronic catalog.

Modified MARC is used for cataloging the digital deliverables.

An in-house classification scheme is used for controlling data values.

Metadata details are recorded about:

Metadata records are created by the digitizer and then checked by an archivist/information professional. The metadata records are included in the main library catalog but the Institute does not have a unified catalog to include material in the archives. This library catalog is available on the internet. The records for the digital deliverable and the original digitized material are the same. The question of how the catalog and the objects should be linked is still under development.

 

6.5       Format, Resolution and Compression of Digitized Materials

The formats for retroconverted text-based digital deliverables are:

Some documents contained non-Latin scripts. OmniPage was the main OCR software used, with TextBridge also used. Accuracy levels varied with print quality and font, but 90% and above was achieved. Pre-OCR treatment included enlarging and increasing the contrast. The aim of OCR was enhanced searching. Recommendations from this experience would be to enlarge the source text as much as possible and to ensure the OCR software has the right dictionary. Keying in was also used, and the recommendation from this experience would be to key the text in twice and carry out file comparisons.
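The double-keying recommendation above can be illustrated with a short sketch. It is not the Institute's own tooling: the function name `keying_discrepancies` and the sample text are hypothetical, and the comparison is line-based, using Python's standard `difflib` module.

```python
from difflib import SequenceMatcher


def keying_discrepancies(first_keying: str, second_keying: str):
    """Compare two independent keyings of the same text, line by line.

    Returns a list of (line_number, lines_in_first, lines_in_second) tuples,
    one for every region where the two keyings disagree. Line numbers are
    1-based and refer to the first keying.
    """
    a = first_keying.splitlines()
    b = second_keying.splitlines()
    # autojunk=False disables difflib's heuristic for very long inputs,
    # so that frequently repeated lines are still compared.
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    discrepancies = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            discrepancies.append((i1 + 1, a[i1:i2], b[j1:j2]))
    return discrepancies


if __name__ == "__main__":
    # Hypothetical sample text: the second typist mis-keyed one word.
    keyed_once = "the king of Uruk\nbuilt the wall\nof the city"
    keyed_twice = "the king of Uruk\nbuilt the wail\nof the city"
    for line_no, first, second in keying_discrepancies(keyed_once, keyed_twice):
        print(f"line {line_no}: {first!r} vs {second!r}")
```

Each reported discrepancy still has to be resolved by a person checking the source; the comparison only narrows down where to look.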

For image material the TIFF and PICT file formats are used for capture and preservation depending on the end goal. GIF, JPEG, PostScript and PDF file formats are used for delivery. The capture and preservation resolution is 300-600dpi or 1200dpi; 2400dpi is used for slides. Delivery resolution is 72dpi. Capture, preservation and delivery bit depth is 24-bit for color images. JPEG level 3 or 7 and GIF compression formats are used for delivery to improve access speed. The program retains the original uncompressed files. Post processing operations using Photoshop include cleaning and sharpening so long as they do not detract from the level of faithfulness to the original. The average file sizes vary but are approximately 5-9MB for archive TIFFs, 300-500KB for 8x10 images, 80-100KB for 4½-inch intermediary images and 18-25KB for thumbnail images.
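As a rough aid to reading these figures, the uncompressed size of a scan follows directly from its physical dimensions, resolution and bit depth. The sketch below is illustrative only: the function name is hypothetical, and the 4x5-inch original in the example is an assumption, not a size reported by the Institute.

```python
def uncompressed_bytes(width_in: float, height_in: float, dpi: int,
                       bit_depth: int = 24) -> int:
    """Estimate the uncompressed size in bytes of a scanned image.

    width_in, height_in: physical size of the original in inches.
    dpi: capture resolution in dots (pixels) per inch.
    bit_depth: bits per pixel (24 for the color images described above).
    File headers and any embedded metadata are ignored.
    """
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    return width_px * height_px * bit_depth // 8


if __name__ == "__main__":
    # Assumed example: a 4x5-inch original captured at 300dpi in 24-bit color.
    size = uncompressed_bytes(4, 5, 300)
    print(f"{size / 1_000_000:.1f} MB")
    # The same original at 600dpi needs four times the space.
    print(f"{uncompressed_bytes(4, 5, 600) / 1_000_000:.1f} MB")
```

Because size grows with the square of the resolution, doubling the capture resolution quadruples the storage required, which is why the choice of archival resolution has such a direct effect on disk space.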

For others starting work in digital imaging the Oriental Institute would recommend considering carefully the trade-off between scanning for the immediate purpose and scanning archival TIFF files once, in terms of the ramifications for disk space, money and re-scanning at a later date. A further recommendation would be never to touch the archival copy, and for any work whatsoever to be done on a second copy.
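One way to follow the advice about never touching the archival copy is to make every working copy through a small routine that verifies checksums. This is a minimal sketch under stated assumptions, not a description of the Institute's practice: the function names and the directory layout are hypothetical, and SHA-256 is simply one commonly used checksum.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 checksum of a file, read in 1 MB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def make_working_copy(master: Path, work_dir: Path) -> Path:
    """Copy an archival master into work_dir and verify the copy.

    The master is only ever opened for reading. All editing (cleaning,
    sharpening, resizing) should then be done on the returned working file.
    """
    work_dir.mkdir(parents=True, exist_ok=True)
    checksum_before = sha256_of(master)
    working = work_dir / master.name
    shutil.copy2(master, working)  # copy2 also preserves file timestamps
    # Both the copy and the untouched master must match the original checksum.
    if sha256_of(working) != checksum_before or sha256_of(master) != checksum_before:
        raise IOError(f"checksum mismatch while copying {master}")
    return working
```

A call such as `make_working_copy(Path("archive/scan_0001.tif"), Path("work"))` (hypothetical paths) would return the path of the copy to edit. Setting the archive directory to read-only at the file-system level is a complementary safeguard.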

The only quality control procedures in place for the digital deliverables are visual checks, although the Institute recognizes the need for a better editorial policy on the deliverables. Metadata quality control consists of more than one check on each record.

Users have a variety of access rights depending on their permissions, including open access to the catalog, open access to the catalog plus materials, and access restricted to in-house users. Users are also able to zoom in on QTVR movies. Searching and browsing facilities are keyword, Boolean, field operators and proximity. Metadata searching is handled in the same way. The level of usage is 300,000 hits per week on the website and usage goes up approximately 20% per annum, with additional sharp seasonal increases at the beginning of the academic year. Usage is monitored by automatic data capture.
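For capacity planning, the reported growth rate can be turned into a simple projection. The sketch below assumes that the 20% annual increase compounds steadily and ignores the seasonal peaks mentioned above; the function name is hypothetical and the results are illustrative, not forecasts made by the Institute.

```python
def projected_weekly_hits(current_hits: int, annual_growth: float, years: int) -> int:
    """Project weekly hits after a number of years of compound growth.

    current_hits: present weekly hits (300,000 in the interview).
    annual_growth: growth per year as a fraction (0.20 for 20%).
    years: whole number of years to project forward.
    """
    return round(current_hits * (1 + annual_growth) ** years)


if __name__ == "__main__":
    for year in range(0, 4):
        print(year, projected_weekly_hits(300_000, 0.20, year))
```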

 

6.6       Evaluation, Funding and Long-term Sustainability

The Oriental Institute carried out front-end evaluation of users for its web page front end using small focus group discussions.

It is impossible for the Institute to estimate how much its digitization activities have cost, but it is less than imagined and the value of the work in relation to the effort and cost should not be underestimated. The education office has external private foundation funding; otherwise sources have been a mixture of government and non-governmental organizations with a significant amount of funding coming from charities for specific components. If the Institute were able to set its own funding, then ten years ago a very worthwhile activity would have been to develop a unified database for all projects. The Oriental Institute’s view is that the use of standards will allow it to move to future formats and it does not perceive the benefits in monetary terms, although projects have anyway tried to avoid using experimental formats. Funding organizations have requested project management plans, cost models, and workflow reports as pro forma items in bids.

New material and metadata are not added to a regular timetable; however, a major update occurs annually. The user interface has been changed once but may change annually in the future. The Oriental Institute does not have a formal strategy for digital preservation to ensure long-term access. At present text and image material are archived in the same way on removable drives and CDs, and the website drives are RAID protected. There is no off site storage or backup. This archiving/preservation strategy is based on migration of data and there are quality controls in place for life-cycle management.

It is intended to keep the digital deliverables available indefinitely. Longer-term sustainability is dependent on self-generating funds although verbal assurances have been received regarding the web publishable material. The project does not have an exit strategy if the Institute could no longer afford to sustain the digital deliverables. In this eventuality, the relationship between the Institute and the University would be critical. Loss of the digital deliverables would be significant.

 

6.7       Conclusion

The Oriental Institute at the University of Chicago provides an excellent example of the problems facing organizations in developing a strategic view of digitization when a significant proportion of their activities are research based. Where there is significant research digitization it would appear that there is little or no motive for the researchers concerned to adopt the same standards or view their output as a deliverable for a wider audience. On the other hand, the Institute’s “public face”, in the form of the museum and publications department, does have motivation and a need for policies and procedures to guide the selection and prioritization of material. Add to this the fact that the driving force behind digitization is but a few individuals, and it is extremely difficult to gain sufficient momentum across the organization for an “institutional view” on digitization. As the interviewees rightly pointed out, the motivation for this is likely to come from the museum and publications department and it may be that procedures are formalized for these areas of the Institute’s activities while research based projects continue to operate independently. The Oriental Institute also highlights the problem that developments such as TEI DTDs do not yet adequately address more esoteric material. Whilst guidelines such as the TEI have user extensibility built into them, this would not appear to be sufficient in this case. Therefore, if one is making a recommendation in relation to standards of “adopt, adapt and emulate”, some projects must recognize that this may not be possible for their material, or that they will suffer redundancy or reformatting.



