On January 26, 2001, HATII interviewed Martha Mahard, Curator of Visual Collections at the Fine Arts Library of Harvard University, and Ann Whiteside, Visual Resources Librarian at the Frances Loeb Library of Harvard’s Graduate School of Design. The Visual Information Access project is co-ordinated between Harvard’s Fine Arts Library and the Graduate School of Design. The aim of the project is to create a comprehensive catalogue containing information about the materials and artifacts housed in the University’s libraries, museums and archives. The materials are digitized for preservation and increased accessibility, and to serve as resources for teaching and learning for users both within and outside the University community.
The Visual Information Access (VIA) system is a union catalog of visual resources at Harvard University. It includes information about slides, photographs, objects and artifacts in the University’s libraries, museums, and archives. This system represents the first phase of an ongoing effort and is frequently updated. The program started in fall 1996, completed phase one in April 1998, and is ongoing. The participating libraries, museums, and archives are:
The institutions that participated in VIA conducted a collection survey which examined image holdings, how much information was available in automated databases, etc. This was useful to confirm their initial ideas; it gave a clearer indication of the breadth and scope of the initiative and showed which institutions wanted to be automated. The administration had a very limited understanding of the size of resources and the implications of the project. The survey could have been more detailed, but it focused more on data collection and management issues, which was still very useful.
The collection survey was also used to establish priorities for digitizing holdings. Initially those with automated collections moved up the prioritization list. After that, the Library Digital Initiative (LDI) forced prioritization with the grants they offered.
The individual repositories make their own prioritization decisions. Many people digitize the material that is likely to attract grants, rather than what they would have preferred. External forces, such as the LC/Ameritech grant, allowed them to digitize Americana at the Graduate School of Design. This would otherwise not necessarily have been high on the prioritization list. Selection is also influenced by what is taught in the curriculum.
The Task Group Report on Visual Resources at Harvard, produced in May 1997, acts as a kind of strategic document.
The objective of the digitization policy was to create a union catalog to help facilitate access to visual resources (whether there were images accompanying the catalog or not). The catalog was to act as a resource discovery tool. The project has been successful in achieving its aims. The time was right for VIA, as Harvard was exploring ways of offering access to visual resources beyond books. People now want to put their material on VIA.
One of the lessons learned from this process is that it is useful to make people aware that curatorial ideas and wishes do not always match what the funding will support.
Copyright is a serious impediment. One of the greatest obstacles was establishing a common language. Many participants did not have the technology at their fingertips. There was a variety of different needs, from the Museum, which wanted the highest possible resolution, to the Schlesinger, which wanted the lowest. There was an overload of technical information that they had to learn. Communication with technical staff was not always easy. Another difficulty was setting standards and realizing that they could not have a single benchmark across the board. No sufficient or appropriate guidelines were available when they started, so they did not know what constituted an acceptable deliverable.
The main criteria for selection, used by most of the individual repositories, were primarily intellectual property rights and enhanced access, followed by teaching and learning potential and conservation. These changed to some extent over time. For example, preservation was also an initial consideration, but they are now more concerned with the limitations of digital information.
There was co-operation with archives, other libraries, museums, and academic institutions; these were all local.
This experience showed that in any collaborative project, it is important to be receptive to others, in order to learn from each other. It is important to try to get managers and technical staff on board first and to allow time for collaboration, since building communication takes time. Email helped, especially to discuss sensitive issues and disagreements that could not be mentioned in face-to-face meetings.
The digital deliverables are created for preservation, public access, teaching and learning resources, research, and wider access. The program produced a statement that was explicit about its scope and rationale. This is available on the web (http://via.harvard.edu/html/VIAscope.html) and refers more to the type and size of material contributed by each partner. This complements the brief statement on the opening VIA web page (“The Visual Information Access (VIA) system is a union catalog of visual resources at Harvard … This system represents the first phase of an ongoing effort and additional information will be added on a regular basis.” http://via.harvard.edu/html/VIA.html).
The program digitized a wide range of source material with varied format and nature:
The materials selected for digitization were both the entire body of some collections, as well as representative samples. The digital deliverables were not intended to be re-used or re-purposed, but for some projects it was the other way round. Projects like MESL for the Museum and LC/Ameritech for the Design Library allowed them to digitize material that they brought to the program. This did not seem to work very well, with a great deal of massaging of data, rescanning and reformatting of older images. Another problem was the amount of material on Photo-CD that needed to be copied and that also demanded resources. If they had unlimited resources, they would have copied high-resolution images on the server. Problems such as these made them realize that it is usually not possible to find the perfect solution, for example to scan perfectly once. It is therefore useful to accept that you are scanning for today and not necessarily for the future, and that you might have to rescan in the future.
For representing content they used JPEG and SGML. For describing content, they used MARC, Dublin Core, Categories for the Description of Works of Art, and the Visual Resources Association Core Categories. For controlling data values they used AAT, Library of Congress subject headings, Library of Congress subject thesaurus, and name authority files, but these are not imposed.
They looked at existing guidelines for digitizing particular document and object types, but did not find any that were relevant or useful.
They realized the necessity of compromise and accepting from the beginning that there is no one right answer. That is acceptable as long as everyone agrees on one standard, so that everyone may claim joint ownership of the project.
The primary intended audience for the digital deliverables is: four-year college, graduate school, distance learning, public library and archive users. The vast majority of resources on VIA are accessible to all, so apart from the targeted audience, anyone who desires to use Harvard’s public catalogs and the linked objects that have not been restricted to the Harvard community, can do so. This may be of interest to school teachers.
The program did not acknowledge the needs of those with disabilities.
They are working on security of the system and access restrictions at the moment. Restrictions are due to copyright and sensitivity of material.
Evaluation of the targeted audience is going to be carried out as the next step of the program. Five or six people on the central team have been working in this area for many years. They need independent staff to carry out the evaluation, since those working on the program are too closely involved and lack the right specialization. Although they do not have any detailed hard data about the system’s use, there are more users than were anticipated.
They received project management advice from the wider University. The management of the program is carried out by the Steering Group with help from the staff of the Information Systems department of the central Library. The collaborative nature of the program has led them to explore other ways they can share funding. They have regular meetings of the Steering Group, an email list, and regular checks by the Systems Office. Initially they did things individually, without delegating, which did not work. These project management procedures were suggested to them by the University Library Council to which they report. They did not carry out any feasibility or pilot study to assist with the planning of the program.
Digitization was mainly carried out in-house because the central Library Digital Initiative (LDI) wanted to develop local infrastructure and skills. In some cases it was outsourced, because of historical legacy with the CD-ROMs.
Technologies Used for Image Digitization
| Flatbed scanners (various manufacturers) |
| Film scanners |
| High-end professional cameras |
They established a set of guidelines for document scanning and digital photography (see LDI report). They used grayscales, color charts, and targets as benchmarks (see LDI and Museum report).
The staff who work on the VIA program are as follows:
| Type of Staff | Number | % of time on the project |
| Director (Steering Committee) | | |
| Metadata specialist (1 per team) | 1 | 30 |
| Curator | 10 | 30 |
| Digitizer | 2 | 50 |
| Photographer | 2 | 50 |
| Technical support staff | 2 | 50 + 30 |
| Technical development staff | 3 | 30 |
The background and profile of most people on the project is curatorial and technical. Only one member of staff was re-deployed from another area.
Although advice was available in-house about technical aspects of digitization, they also used some external consultants (e.g. Eastman House Permanence Institute). Areas where training needs have been identified include application of technical standards, preparation and handling of materials for digitization, technical operation of digitizing equipment, post digitization processes, metadata creation, and digital preservation. Training was received by curatorial staff, specialist technical staff, library/cataloging staff, and equipment operators. This was organized in a variety of ways: in-house, using project staff, the Library’s own consultants, and external consultants, by attending external courses, with independent study, and by learning on the job. The training successfully met the needs of the organization.
They are aware of the copyright status of the digital deliverables that the program is creating. Due to the nature of the material (e.g. with artists often owning copyright of analog works), this is a complex area. Although they own the copyright for most of the materials that they digitize, there are some where they do not, but these can be used via the standard legal provisions made for libraries. The rights information is available, but they do not put it on VIA. There is a standard copyright statement on the VIA opening web page: “Most of these materials are owned, held, or licensed by the President and Fellows of Harvard College. They are being provided solely for the purpose of teaching or individual research. Any other use, including commercial reuse, mounting on other systems, or other forms of redistribution requires permission of the appropriate office of Harvard University”. Users of VIA can make printouts on paper or download to a PC both thumbnails and lower quality images. No electronic management systems are used to control copying.
There is a conservation procedure for original materials. The investigations carried out into the condition of the original materials prior to digitization vary according to the institution participating in the project. No risks to the material were identified during the preparation or digitization process. For some material they specified special equipment and digitization processes, e.g. special handling, use of cradles. The material is prepared/monitored by curatorial and conservation staff before and during digitization. In most cases, the organization will not restrict access to the original once it has been digitized, except for preservation reasons, e.g. lantern slides.
The cataloging and reference systems used by the VIA participating institutions before digitization were a mixture of manual and automated systems. For example, the Museum uses EmbARK, while the Library has created a customized version of EmbARK. Others use FileMaker Pro. In some cases material had to be altered from the original form for the digitization process. No material was rejected before digital imaging. They digitized originals, as well as reproductions and intermediaries (the majority). They used 35mm slides, 4x5 or 8x10 transparencies, and photographic prints.
The originals are in most cases cataloged. The metadata standards used to catalog originals are based on the Harvard core, their own in-house system, which is developed using the SiteSearch database system from OCLC (the Online Computer Library Center) and uses Access as a front-end. Some of the participating institutions also use Dublin Core, MARC, VRA, and CDWA. They use a controlled vocabulary, a thesaurus, and a classification scheme to control data values. The metadata records information about the original object, the digital object, the digitization process, technical details, staffing details, and administrative information. The metadata records were created by a range of staff, including digitizers and archivists/information professionals. The metadata records for digital deliverables are stored in a separate catalog, which is available on the web.
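As a rough illustration of the six categories of information that the metadata is said to record, the following Python sketch groups them into one record. The class name, the method, and every key inside the example dictionaries are hypothetical; the interview does not describe the actual field layout of the Harvard core.

```python
from dataclasses import dataclass, field, asdict


@dataclass
class VisualRecord:
    # Each attribute mirrors one category of information named in the text.
    # The keys stored inside each dict are hypothetical examples only.
    original_object: dict = field(default_factory=dict)
    digital_object: dict = field(default_factory=dict)
    digitization_process: dict = field(default_factory=dict)
    technical_details: dict = field(default_factory=dict)
    staffing_details: dict = field(default_factory=dict)
    administrative: dict = field(default_factory=dict)

    def missing_groups(self):
        """Return the names of categories with no values recorded yet."""
        return [name for name, value in asdict(self).items() if not value]


# Hypothetical record with only two categories filled in so far.
record = VisualRecord(
    original_object={"title": "Example lantern slide"},
    digital_object={"format": "TIFF"},
)
print(record.missing_groups())
```

A check of this kind could help a cataloger see which categories still need attention before a record is contributed to the union catalog.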
The file formats used for images were JPEG and TIFF for delivering, and TIFF, Photo-CD and JPEG for capturing. The resolution used is 300-600 dpi for capturing and preserving and 800 pixels (dpi varies) for delivering images. They used 1- to 48-bit depth for capturing and preserving, and 4- to 24-bit for delivering. Group 4 and PCD compression were used at the capturing and delivering stages, while JPEG and GIF/LZW compression were used for delivering. The aim of the compression was to reduce cost (by reducing file size, since storage in Harvard’s Digital Repository is charged annually by GB), to improve access, to enhance usability, and to decrease storage requirements. The original scans are retained in uncompressed form (except for 1-bit scanning and Photo-CD scanning projects). See LDI report for details.
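To make the link between resolution, bit depth and storage cost concrete, the sketch below estimates the uncompressed size of a scan and the pixel dimensions of a delivery image. It is a minimal illustration only: the function names are hypothetical, the per-GB rate is a placeholder rather than Harvard's actual charge, and applying the 800-pixel figure to the longer edge is an assumption, since the interview does not say which dimension it refers to.

```python
def uncompressed_bytes(width_in, height_in, dpi, bit_depth):
    """Size in bytes of an uncompressed raster scan (file headers ignored)."""
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    total_bits = width_px * height_px * bit_depth
    return (total_bits + 7) // 8  # round up to whole bytes


def delivery_dimensions(width_px, height_px, long_edge=800):
    """Scale pixel dimensions so the longer edge equals long_edge (assumed)."""
    scale = long_edge / max(width_px, height_px)
    return max(1, round(width_px * scale)), max(1, round(height_px * scale))


def annual_storage_cost(total_bytes, rate_per_gb):
    """Annual charge for total_bytes at a hypothetical per-GB rate."""
    return total_bytes / 1_000_000_000 * rate_per_gb


# A 4x5 inch transparency captured at 600 dpi in 24-bit color.
master = uncompressed_bytes(4, 5, 600, 24)
print(master)                           # 21600000 bytes, about 21.6 MB
print(delivery_dimensions(2400, 3000))  # (640, 800)
```

The same transparency scanned at 1-bit depth would need only 1/24 of that space, which suggests why bit depth and compression choices matter when storage is charged by the gigabyte.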
Responsibility for the quality of digital deliverables rests with each repository, but they are currently discussing the issues around this. They carry out a random check on materials. The quality control for metadata recording relies on checking with the catalogers.
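A random check of this kind could be sketched as follows. The interview does not say how items are chosen or how many are checked, so the 5% fraction, the function name and the identifier format are all assumptions made for illustration.

```python
import random


def pick_for_review(item_ids, fraction=0.05, seed=None):
    """Return a random subset of item_ids for manual quality review.

    fraction is a hypothetical sampling rate; at least one item is
    always selected when the input is not empty.
    """
    if not item_ids:
        return []
    rng = random.Random(seed)  # a fixed seed makes the selection repeatable
    count = max(1, round(len(item_ids) * fraction))
    return sorted(rng.sample(list(item_ids), count))


# Hypothetical batch of 200 image identifiers.
batch = [f"img{i:04d}" for i in range(200)]
to_check = pick_for_review(batch, fraction=0.05, seed=1)
print(len(to_check))  # 10
```

Recording the seed alongside the batch would allow the same selection to be reproduced later if the results of a check are questioned.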
Users have free open access to both the catalog and the digital deliverables for browsing, but not for manipulating them in any way. This is high on their list of priorities for exploring in the future. Access to some material is restricted to in-house users. Users can browse by 9-10 fields (e.g. name, title, subject, place, nationality, date, repository, ID). They monitor use via automatic data capture by the digital repository and other systems and web logs. The level of use has not been very high yet (under 2,000 hits per month), but is growing.
The methods used for informing potential users about the digital deliverables created are: announcement on website, press release, articles in print media, announcements at conferences and meetings. This is a very important area, which they will develop, as VIA is still underused.
They have not carried out any formal evaluation yet, but are about to organize one now, as mentioned above in the section on target audience.
Each repository contributed the cataloging and the staff. The Office for Information Systems of the Library contributed the rest. They produced an interim preliminary report.
All elements of the project will need to be updated. New materials will need to be digitized and new metadata created all the time, existing metadata will need changing, the user interface will probably need to be changed, and it is highly likely that they will have to change file formats. They are in the process of developing a preservation strategy. No exit strategy has been established.
General comments about the program included the need for curators to be able to communicate better with technical staff or try to find “interpreters” initially. People starting in this area should not think that they can do it alone. Administrative support and funds, but also belief in the project are all key. It is important to secure institutional commitment. They were fortunate that they did this at institutional level.
In this interview, it was interesting to see the perspective of two committed librarians / subject specialists who embarked on a digitization project that has since become a program. They had to learn a lot of technical information and a new language, and overcome a steep learning curve. The digitization activity was an add-on project for their institution and their own time. In this type of collaborative project where several institutions from a wide range of disciplines are working together, it was evident that communication was vital, as was stressed several times during the interview. Setting common standards across the field was also important, while accepting that no one solution will work for all materials. The wide-ranging nature of the content and format of the material that is being digitized is an interesting model for the provision of access to visual resources that other universities and similar institutions could benefit from, with serious implications for teaching and research.