
 

14   Digital Imaging and Media Technology Initiative (DIMTI)

 

On January 8, 2001, HATII interviewed Beth Sandore, Professor of Library Administration and Head of the Digital Imaging and Media Technology Initiative (DIMTI) at the University of Illinois. The primary objective of DIMTI is to create digital deliverables as resources for teaching, learning and research. The Initiative has liaised with other organizations both nationally and internationally, with a view to creating a fully comprehensive archive. While the intended audience for these resources is predominantly the University community, the Initiative recognizes that audiences outside this sector can also benefit from access to the material, and continues to make developments in this area.

 

14.1    Organizational Digitization Program and Policy

The digitization activities of the University of Illinois’s Digital Imaging and Media Technology Initiative (DIMTI) began in late 1994, when the University of Illinois Library was approached by two corporate sponsors about academic digital imaging. Kodak wished to promote its PhotoCD format, and the Follett Corporation, better known as a textbook distributor, was interested in collaborating to develop image databases through the digitization of special collections. Immediately following this, the Library joined the MESL project at its outset. This provided a ready-made database that familiarized the Library with issues related to large-scale repositories, some 10,000 images over 4GB of space. The two years of this project were the most useful and productive for DIMTI and entailed agreeing upon issues such as intellectual property, metadata and interface design amongst the seven universities involved. This project also carried out user surveys with 350 students and 20 faculty members, and Getty funded several very useful MESL meetings. Although there was no formal collection survey as part of this project, the Library interviewed local curators, special collections staff, archivists and librarians and made a call across faculty to suggest projects. The results of this call were also used for the Kodak PhotoCD project mentioned above. The Library learnt about guidelines and standards and suffered from a lack of these. As a result, they developed their own metadata standards early on, and have subsequently adopted a number of metadata and image file format guidelines.

Involvement in these projects has helped establish priorities for digitizing holdings. A formal strategic policy statement was developed in fall 2000, with initial objectives to be realized by 2003. The responsibility for the creation of this statement lies with the Library Information Technology (LIT) Committee, which advises the University Librarian and Library faculty about emerging information technologies. This policy has been approved by the University Librarian and two advisory groups (an Administrative Council and Executive Committee). The Library has a very flat administrative structure and these two bodies help build consensus across the Library. The establishment of digitization objectives by the LIT is the most tangible problem in the development of this policy and while most are optimistic about moving forward in this fashion, some feel that the Library needs a hierarchical structure, in particular with regard to creating a vision.

The Library is also involved in the University’s Advanced Humanities Technology Group, which discusses the application of advanced technology and has aided digitization initiatives on emblem books, historical maps and aerial photography. At present, the DIMTI has adopted a set of rolling goals that are revised yearly. In practice the DIMTI does not undertake a project unless there are faculty who want to do it or will use it.

One major obstacle to the development of a strategic policy has been that the institution has not fully addressed the need for an internal re-allocation or new funding, as it cannot rely solely upon external funding. A recommendation to other institutions that are attempting to build digital library programs would be to plan the infrastructure so that digitization operations become integrated into mainstream activities, although this may be implemented in different ways.

The main overall obstacle to planning the development of digital deliverables from the Library’s collection has been how to deliver these. A strong thread through DIMTI has been the undertaking of exploratory projects, but this has been relatively easy. The difficulty has been in determining intellectual access in terms of appropriate benchmarks and resolutions. Therefore, an obstacle to the process of building digital deliverables has been the need to develop robust databases and structures, and testing metadata and interoperability. DIMTI would strongly suggest that if you don’t know how you are going to deliver material, do not digitize it. To help in this process, DIMTI have under development a software applet that determines capture settings according to type of material, size, etc., using the Cornell quality index formula.
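The interview does not describe the applet further. As a rough illustration only, the sketch below assumes the commonly cited Cornell digital Quality Index relations for printed text: QI = (dpi x 0.039h) / 3 for bitonal capture and QI = (dpi x 0.039h) / 2 for grayscale or color capture, where h is the height in millimeters of the smallest significant character. The function name, its parameters and the target QI of 8 used in the example are assumptions made for this sketch, not DIMTI's implementation.

```python
import math

# 0.039 converts millimeters to inches (approximately 1 / 25.4).
MM_TO_INCH = 0.039


def required_dpi(char_height_mm, target_qi, bitonal=True):
    """Return the capture resolution (dpi) needed to reach target_qi.

    Assumes the Cornell digital Quality Index relations for text:
      bitonal:         QI = (dpi * 0.039 * h) / 3
      grayscale/color: QI = (dpi * 0.039 * h) / 2
    where h is the smallest significant character height in mm.
    """
    if char_height_mm <= 0 or target_qi <= 0:
        raise ValueError("character height and target QI must be positive")
    divisor = 3 if bitonal else 2
    # Solve the QI relation for dpi and round up to a whole number.
    return math.ceil(divisor * target_qi / (MM_TO_INCH * char_height_mm))


# Hypothetical example: 1 mm characters and a target QI of 8.
print(required_dpi(1.0, 8))                 # bitonal capture
print(required_dpi(1.0, 8, bitonal=False))  # grayscale or color capture
```

A calculation of this kind covers only legibility of text; material such as photographs or maps would still need the local benchmarking described elsewhere in this report.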

As a result of the above deliberations, DIMTI’s primary selection criteria are the material’s teaching and learning potential, as expressed through faculty interest, which in turn contributes to new research content. Another primary selection criterion is the material’s historical and cultural value expressed in terms of uniqueness; an example of this is the Library’s emblem book collection. Secondary selection criteria are enhanced access and preservation (against content loss). DIMTI places access criteria in the context of usage and scholarship. For the digitization of aerial photographs, preservation was the number one priority and here the cost projection of $40-$50 per image was not as expensive as cataloging a book. These cost projections represent one element of a set of further considerations for digitizing any collection: Does DIMTI have the tools? What is the cost? Do we have the expertise? What is the cost of metadata? All these criteria have stayed stable since 1994; however, DIMTI believes that digitization for archiving is going to come into the foreground.

The DIMTI has co-operated with other university archives, libraries, museums, historical societies, academic faculty and users and corporations at the local, regional, national and international levels. There are also a number of projects planned with further co-operation in mind.

The DIMTI has no anticipated end date, but the Head suggests that they may not exist in ten years’ time if the goal of integrating digitization activities into the mainstream activities of the Library is achieved. In this scenario there may be just one coordinator to provide an overview of the various projects and ensure they are managed in a complementary fashion.

The primary purpose in creating the digital deliverables reflects DIMTI’s selection criteria. Digital deliverables are primarily a teaching, learning and research resource. Revenue generation was a factor injected at the start of the initiative but has subsequently been put aside. However, DIMTI is beginning a small experimental project to publish some selected parts of collections with the aim of cost recovery. This material would also be made freely available by the Library on the web.

The DIMTI produced a two-paragraph mission statement, published on its website, regarding the significance of its activities.

The type of source material digitized includes:

The predominant medium of the materials to be digitized is paper, but for visual, graphical material the form varies from 6x7” to 3x4’ for oversize maps. Until now the digitized deliverables have usually represented a sample of material rather than an entire body. For the DIMTI the intention to re-purpose the digital deliverables has always been part of its remit. For example, they work with the educational technology group on campus and elementary school teachers to produce electronic guides using some material and metadata. This approach has worked well and the DIMTI has found that it has been most common for faculty members to point to material for web syllabi.

The following standards, guidelines or tools are used for representing content:

The following standards, guidelines or tools are used for describing content:

The following standards, guidelines or tools are used for controlling data values:

The program looked at other existing guidelines for digitizing particular document types when planning its digitization strategy. These included TEI recommended best practices at the University of Michigan, the IFLA Metadata Website, the Berkeley Digital Library SunSite, and the Library of Congress National Digital Library Program metadata sites. The DIMTI modified Dublin Core for a few projects where documents had to map to the basic metadata scheme held in the project database.

The following standards, guidelines or tools are used for representing structure:

In relation to standards in general and navigating between the ideal and the realistic, the DIMTI has drawn from the ideal metadata in the Making of America 2 Interim Report and found that providing at least 15 Dublin Core elements has worked well to identify static images of visual materials and text documents.
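As a minimal sketch of how a record might be checked against the 15 elements of the Dublin Core Metadata Element Set, the following lists any elements that are absent or empty. The element names are those of the published element set; the function name and the sample record are hypothetical and are not taken from DIMTI's database.

```python
# The 15 elements of the Dublin Core Metadata Element Set.
DC_ELEMENTS = (
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
)


def missing_elements(record):
    """Return the Dublin Core elements that are absent or empty in record."""
    missing = []
    for name in DC_ELEMENTS:
        value = record.get(name)
        if value is None or not str(value).strip():
            missing.append(name)
    return missing


# Hypothetical record, for illustration only.
sample = {
    "title": "Aerial photograph, sheet 12",
    "type": "Image",
    "format": "image/tiff",
}
print(missing_elements(sample))
```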

The intended audience for the digital deliverables depends on the project but the university and research community is at the heart. However, there is also significant K-12 and general public use. With regard to metadata, DIMTI think in terms of the university community and then try and gear the metadata to K-12 where necessary. This has not been as hard as one might think.

As part of the MESL project, an evaluation of the university target audience was undertaken. This established a baseline across the seven universities involved, for preferences in functionality and access to an art image database. This demonstrated that rich metadata was as important to users as the images themselves. In most cases the profile of users has been that anticipated, except for heavier general public use.

To an extent, DIMTI has recognized the needs of those with disabilities, but there are no text-only versions due to the exploratory nature of the projects. In only one case are there limitations to the use of the digital deliverables where access is restricted by IP address. This is a subscription to the AMICO database.

 

14.2    Project Management and Planning

Both internal and external advice was available on managing the digitization program. Internal advice included using the Survey Research Laboratory for statistical analysis and the Division of Academic Outreach Web Technology Group on campus for advice on the design of instructional material. External technical consultancy was used for aspects such as interface design and search engines, although this tended to be informal. The existence of DIMTI has led to changes in organizational relationships and procedures. The involvement of the LIT Committee in the development of a strategic plan is a recognition that DIMTI and other projects are viable and provide a locus of expertise. An ineffective project management procedure has been attempting institutional collaboration without ensuring the partner institution has sufficient human resources in place to support the work. Managerial quality assurance procedures include developing timelines and plans in advance, with specific milestones and events. These plans are revisited every quarter and made available on the web for all parties.

Feasibility and pilot studies have been carried out for scheduling, training needs, technical feasibility, user needs, workflow analysis, workflow piloting and technology forecasting. DIMTI have made changes to the design of projects as a result of preliminary studies. The projects have not departed from their original goals but the timelines have been changed based on technology changes and the ability of partners to deliver aspects of the project.

Benchmarking studies are undertaken, which include determining network throughput; however, all procedures are analyzed throughout. The delegation of work is determined by externally funded projects, where a project coordinator and dedicated staff are hired. There is more flexibility in internal work, where staff have been hired for their areas of expertise and to develop the skills set of the core staff.

Some digitization has been outsourced but there has not yet been a large enough project to justify this. In-house digitization has been the predominant method, since the student body represents a captive and cheap labor source and the University of Illinois Printing Service has a Xerox Digipath that can handle batch digitization of text documents. All other equipment has been bought in, as there was nothing available at the start. The decision of which digitization process to adopt depends on the type and volume of material and the conflict levels involved. For example, if the material is rare or original it is currently more acceptable to set up the digitization on site. For its digitization capture, DIMTI uses two Epson flatbed scanners for material up to 12x17” (800dpi, 36 or 48 bit color input/output), a large format color scanner, a 1500dpi color slide and film scanner and a 3 megapixel Nikon D3 digital camera. DIMTI also has a large format six-color inkjet printer and a 600dpi color laser printer. Guidelines for data capture procedures for this equipment have been established. Benchmarks used for image digitization are gray scales, color charts and reproduction charts. The slide scanner is calibrated using a printer color palette and the digital camera will use the vendor's calibration software as soon as it becomes available for the Windows platform.

 

14.3    Human Resources and Training

The unit employs three FTE professionals, including a full-time director, and two Visiting Special Projects Librarians who co-ordinate externally funded projects. It also employs one FTE graduate student who specializes in programming and database development, digitization benchmarking, and web page creation. The professionals have expertise in metadata, digitization, database design and development, instructional materials development and evaluation. For professional positions, staff have a library degree or equivalent degree experience. Graduate student employees have some interest in library studies or information retrieval. In addition, various Library staff and faculty participate in specific projects in a team environment, and receive necessary training.

 

Training needs are assessed through talking to project team members and asking about their skills. Areas of training that have been identified include:

The team members that have received training include:

Training has been organized using the following:

The training has met the needs of the project.

 

14.4    Project Life-Cycle Processes and Procedures

The project is aware of the copyright position of the digital deliverables and owns the copyright in most of the original materials. The copyright status of the digital deliverable is declared in a statement on the project’s website. DIMTI is comfortable with the effectiveness of this practice. Material in copyright was digitized with the owners’ agreement. Users are allowed to make printouts of the digital deliverables on paper and download to a PC or LAN.

For textual material users can download:

For digital image material users can download:

For digital image material users can only view highest quality images on-site.

For digital audio material users can download:

For digital audio material users can only listen to high fidelity sound on-site.

For digital moving images users can download:

For digital moving images users can only view highest quality digital video clips on-site.

The Digimarc electronic watermarking management system is in use for selected projects and this has proved effective.

DIMTI recognizes the need to develop a conservation procedure for original materials. The project works with curatorial staff to assess the condition of original materials prior to digitization. At present there is no head of preservation in the Library so conservation activities are incomplete. Previous work has included encapsulating material and ensuring that materials are stored in acid-free sleeves and boxes. An example of some of the problems identified prior to digitization was the aerial photography collection. These photographs had not been stored in ideal conditions: they had become curled, the emulsion had cracked and there were drawings and marks on the prints. Procedures to minimize risks include using gloves for handling, using acid-free photomounts and limiting light exposure. DIMTI has used cool lights but the deliverables were not as good; they have also stayed away from digitizing fragile books because they do not have a cradle. Materials are prepared and monitored by curatorial staff during digitization. Access has not yet been restricted to any original material because of digitization. Ultimately this depends on the fragility of the collection and the Library is conscious of its role as a public research institution.

Much of the original material in special collections is uncataloged or has a finding aid but most material is cataloged at the collection level in the Library system using MARC records. DIMTI uses all the information that is available from these records during the digitization process. The project sometimes must locate core reference material to complete the digital deliverables or alter material from its original form for digitization. Material has been rejected for digital imaging because of skewing, print skew or blurred pages.

Intermediaries (i.e. a second generation medium derived from the original) used for image digitization are:

Intermediaries used for audio digitization are audio cassettes.

Intermediaries used for moving image digitization are video copies from film.

In only one case did DIMTI catalog original material; in this case an inventory list was produced in collaboration with the curator concerned. This work was written into the grant for the project. An institutional decision on how to catalog digital deliverables has not been reached yet, for example on whether or not to include details about the original in the metadata. In its cataloging, DIMTI includes information on the original on the basis that users first want to correctly identify and find material and then decide in what format they want to receive it.

The following standards or guidelines are used for cataloging the digital deliverables:

 

DIMTI developed its own in-house system for holding metadata using off-the-shelf database and scripting software (SQL, MS-ASP).

Tools used for controlling data values are:

Metadata details are recorded about:

Metadata records are created by either an information professional or the digitizer who is also an information professional. The decision of who undertakes this work depends upon the project, size of the workload, specificity of the metadata and the qualifications of staff. The non-MARC metadata records are held in a separate catalog from the main Library catalog and are available on the internet. The records in the two catalogs are partly independent of each other, with some similar features.

MARC and Dublin Core records for digital resources were created as part of the OCLC CORC project. An online catalog system was implemented three years ago, which can use the MARC 856 field, although the Library does not locally have a convenient capacity to use this feature at the moment.

 

14.5    Format, Resolution and Compression of Digitized Materials

The formats for retroconverted text-based digital deliverables are:

No texts contained non-Latin scripts. OCR is not used a great deal as a conversion method for textual materials. It was used for scanning facsimiles and 65% accuracy was achieved (higher order ASCII character set). Initially the Paperport suite was used for OCR and then TextBridge and OmniPage. The aim of using OCR was automatic indexing, enhanced searching and computer-based analysis. No special treatment was carried out prior to OCR. From this experience, DIMTI would recommend thinking carefully about the need for full text capability. DIMTI is looking at using Adobe Distiller for on-the-fly OCR to create indexes. No major “keying in” conversion has been done, only where it has been important to save the intellectual content at 100% accuracy.

For image material, the TIFF file format is used for the capture and preservation of master images and JPEG or GIF for delivery. Photo-CD has also been used for capture and JPEG or GIF for delivery. The capture and preservation resolution varies, depending on the original material and the benchmark. Delivery resolution averages 72-150 dpi, and is rarely more than 200dpi. Capture and preservation bit depth for bi-tonal images is 1 bit, for grayscale it is 12 bit and for color 24 or 36 bit. Delivery bit depth is 1 bit for bi-tonal, 8 bit or lower for grayscale and 8-24 bit for color depending on the material (for example continuous tone material is delivered at 24-bit compressed and illustrations are at 8 bit). Average file size for capture and preservation depends on the project but is in the range of 15-20 MB, although with wide variation. Average file size for delivery is 200KB or less. JPEG image compression is used for delivery, to enhance usability. The project retains the original scans in uncompressed form. The project carries out post-processing on images using Photoshop for sharpening and unsharp masking. For this process, testing on samples is undertaken on similar groups of material and scripts are then written for batch processing. Occasionally the contrast or color balance is increased.
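The relationship between capture settings and master file size can be sketched with simple arithmetic: pixel width times pixel height times bits per pixel, divided by eight. The sketch below is an approximation that ignores file headers and row padding, and the 6x7 inch original scanned at 400 dpi is a hypothetical example, not a DIMTI benchmark.

```python
def uncompressed_bytes(width_in, height_in, dpi, bits_per_pixel):
    """Estimate the size in bytes of an uncompressed raster image.

    Ignores file headers and any row padding, so real TIFF files
    will be slightly larger than this estimate.
    """
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    total_bits = width_px * height_px * bits_per_pixel
    # Round up to a whole number of bytes.
    return (total_bits + 7) // 8


# Hypothetical example: a 6x7 inch original captured at 400 dpi.
color_master = uncompressed_bytes(6, 7, 400, 24)   # 24-bit color
bitonal_master = uncompressed_bytes(6, 7, 400, 1)  # 1-bit bi-tonal
print(color_master / 1_000_000, "MB")
print(bitonal_master / 1_000_000, "MB")
```

Under these assumptions the 24-bit color master comes to roughly 20 MB, which shows how quickly uncompressed masters reach the file sizes reported above.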

For others starting work in digital imaging, DIMTI would recommend resisting the temptation to draw conclusions from limited experience – these will change as the scope increases. It is sensible to take on larger scale projects because common factors can then be seen. DIMTI also recommends using the Cornell guidelines as minimum standards (taking care that they are based on text scanning), while balancing these with local benchmarking practice and knowledge of users’ needs and potential uses of the digital objects through pilot projects.

DIMTI has recently started working with sound digitization and uses WAV and RealAudio formats for capture, preservation and delivery. The audio sample bit depth is 20 bit for capture and preservation and 16 bit for delivery.

The quality control procedures in place for the digital deliverables are a random set of checks on materials and a percentage check on carriers. Metadata quality control procedures are question and answer checks on the database, field by field, using human and script review. In addition there is scripted review of the web pages. Quality control procedures have become part of the daily routine work of staff. As a result, problems are identified and corrected along with the routine workflow wherever possible.
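The interview does not specify how items are chosen for the random and percentage checks. One way such a selection could be scripted is sketched below; the function name, the 5% rate and the item identifiers are assumptions made for illustration.

```python
import math
import random


def pick_for_review(item_ids, percent, seed=None):
    """Pick a random percentage of items (at least one) for manual checking.

    A seed can be supplied so that the same selection can be reproduced.
    """
    items = list(item_ids)
    if not items:
        return []
    count = max(1, math.ceil(len(items) * percent / 100))
    return random.Random(seed).sample(items, min(count, len(items)))


# Hypothetical batch of 200 scanned images, 5% of which are reviewed.
batch = [f"img-{number:04d}" for number in range(1, 201)]
for item in pick_for_review(batch, 5, seed=2001):
    print(item)
```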

Users have open access to the catalog plus materials. Searching and browsing facilities are full text, fielded searches, simple and advanced modes with SQL employed server side. This same system handles metadata searching. Users are also able to save sets of selected items in some cases. The level of usage is in the region of 25,000 uses per month, monitored by automatic data capture and focus group interviews. Users do not have to pay to use the digital deliverables. Potential users of the digital deliverables are informed about their availability through website announcements, press releases, articles in print media, broadcast media coverage, conferences, meetings, email and conventional mail shots and registering with search engines. Press releases, registering with search engines, emails and conference presentations have been the most effective methods of dissemination.
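A fielded search backed by server-side SQL can be illustrated as follows. This is a sketch only: it uses SQLite from the Python standard library as a stand-in for the database software DIMTI used, and the table name, column names and sample rows are hypothetical.

```python
import sqlite3

# Only these columns may be searched; the whitelist keeps the column
# name, which cannot be passed as a query parameter, safe to interpolate.
ALLOWED_FIELDS = {"title", "creator", "subject"}


def fielded_search(conn, field, term):
    """Return (identifier, title) rows whose chosen field contains term."""
    if field not in ALLOWED_FIELDS:
        raise ValueError(f"unsupported field: {field}")
    sql = (
        f"SELECT identifier, title FROM records "
        f"WHERE {field} LIKE ? ORDER BY identifier"
    )
    # The search term itself is always passed as a bound parameter.
    return conn.execute(sql, (f"%{term}%",)).fetchall()


# Hypothetical in-memory catalog, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE records ("
    "identifier TEXT PRIMARY KEY, title TEXT, creator TEXT, subject TEXT)"
)
conn.executemany(
    "INSERT INTO records VALUES (?, ?, ?, ?)",
    [
        ("img-001", "Emblem book, plate 4", "Unknown", "emblems"),
        ("img-002", "Aerial photograph, sheet 12", "Unknown", "aerial photography"),
    ],
)
print(fielded_search(conn, "subject", "aerial"))
```

A simple search mode could call the same function once per field and merge the results, while an advanced mode would let the user choose the field.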

 

14.6    Evaluation, Funding and Long-term Sustainability

For the MESL project, the DIMTI carried out front-end evaluation of users with a pre- and post-development survey that included paper and online questionnaires, email, focus group discussion, user observation and logging of computer interaction. This led to changes in the interface and search options, metadata display and the underlying index.

Formative evaluation was carried out locally using focus group discussions and user observation, and this contributed to the changes mentioned above.

Summative evaluation was undertaken using paper and online questionnaires, email, focus group discussion, user observation and logging of computer interaction. This was undertaken to determine how the availability of digital images affected the established use of visual resources for coursework and research. If DIMTI were to repeat its evaluation work, it would follow up with focus groups. The main suggestion from the quantitative evaluation was that the more users learned about digital image availability, the less adequate they felt their skills were to effectively integrate images into their coursework and research. The results of this evaluation were disseminated through conference proceedings and are available on the web at the Getty Research Institute site.

DIMTI’s budget is approximately $150,000 per year, excluding the Head’s salary, and 75% of this is external funding. If DIMTI had had $200,000 a year they would have been able to hire a dedicated research programmer and a colleague to work on faculty liaison and content standards. The DIMTI’s view is that in this environment the use of standards has saved money in the short and long term. Funding organizations’ monitoring of DIMTI is built into the grants. Funding organizations have requested project management plans, cost models, workflow reports and user evaluation reports from DIMTI.

New material and metadata are added on an ongoing basis with periodic changes to metadata, user interface and file formats. DIMTI does not have a formal strategy for preservation to ensure long-term access. At present DIMTI checks data integrity annually and uses tape back-up and three sets of CDs stored in climate and light controlled conditions, but not cold storage. This is an informal stopgap strategy, based on original hardware and application software retention and migration of data, until the Library develops a formal plan.
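The interview does not say how the annual integrity check is performed. A common approach, shown here purely as an assumed sketch, is to record a checksum for each master file and later compare the stored values against freshly computed ones. The function names and the choice of SHA-256 are illustrative assumptions.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(folder):
    """Map each file's path (relative to folder) to its checksum."""
    root = Path(folder)
    return {
        str(path.relative_to(root)): sha256_of(path)
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def changed_or_missing(folder, manifest):
    """List files whose checksum differs from, or is absent against, manifest."""
    current = build_manifest(folder)
    return sorted(
        name for name, digest in manifest.items()
        if current.get(name) != digest
    )
```

In use, the manifest would be built once when the masters are written and stored alongside the back-up copies; a later check reports any file that has changed or disappeared since then.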

It is intended to keep the digital deliverables available for at least 25 years, or such time as their use is not deemed sufficient to make widespread access available. Digital preservation strategies are currently being explored. The resource will not generate sufficient income to sustain DIMTI and the project does not have an exit strategy. This is one major problem that all staff are working on; a revenue stream needs to be identified from a combination of internal re-allocation, new money and external support. Loss of the digital deliverables would matter.

 

14.7    Conclusion

DIMTI is an excellent example of “joined-up thinking” in relation to digitization. The benefits of establishing a body such as DIMTI at the outset of digitization are demonstrated in the strategic overview that DIMTI enjoys. Whereas other libraries are dealing with formalizing policy, they are doing so on top of largely autonomous, independent and entrepreneurial projects. Inevitably, trying to create a unified institutional vision or procedure is more difficult than if a single body had responsibility for dealing with generic issues from the outset. Furthermore, DIMTI demonstrates that such a body does not need to be large when supported by other appropriate administrative or managerial bodies. The fact that the University of Illinois Library has a relatively flat structure seems to have been beneficial in this case. Lastly, the goal of integrating digitization into the existing Library activities and structures commends itself for being both far-sighted and an effective mechanism to garner institutional support and maintain digitization activities in the long term.




abp~04/02