
 

24   New York Public Library (NYPL)

 

Barbara Taranto, Director of the Digital Library Program at the New York Public Library, was interviewed by HATII on November 26, 2001.[7] The DLP is responsible for overseeing collection surveys of the NYPL’s research libraries. The Science, Business and Industry Library (SIBL) is one such department whose resources are being considered for inclusion in the digitization process. The materials are assessed in terms of their historical and cultural value, as well as their research significance. The Program hopes to allow a more inclusive audience to have access to the materials held by the research libraries, while broadening the overall vision of the NYPL.

 

24.1    Organizational Digitization Program and Policy

Over the past year, NYPL has developed its digitization strategy from having no system in place, to the creation of a robust digital library program suitable for many different materials. The impetus behind the establishment of this more formal strategy was $5 million worth of funding. This was originally in place to digitize 600,000 images over a period of five years, but it has now formed the foundation for the wide-ranging Program. There was an initial collection survey of sorts undertaken that prompted the original approach to donors. Now there is a more systematic survey to identify materials.

The NYPL is in two parts: the public branch libraries and the research libraries. The collection survey to identify material has been completed across the research libraries, of which the main ones are SIBL, Lincoln and the Schomburg Center for Research in Black Culture. This survey is managed by the Digital Library Program (DLP) and is advised upon by a committee of curators appointed by the library. The primary objectives are ease of access, ease of use, ease of navigation, and as quick a delivery as possible. A number of lessons have been learned from the first survey, such as the discovery of scattered collections, problems of absent metadata or metadata divorced from the collection, the ease, or otherwise, of scheduling scanning, and curators putting forward material for their own reasons. Collections on arrival at the library had been broken up and given to different locations within the library (depending upon, amongst other things, the rarity of the objects and the content of the materials). One of the major aims of the Program is to re-unite these scattered materials and create an intellectual landscape with complete collections. Similarly, the technical metadata now specifically refers to the objects, whereas in many cases there had been no associated technical metadata.

The collection survey was used to establish a preliminary priority of how much material could be contracted out (the standard material). The project also looked at other US digitization projects. The Deputy Director for Research Libraries established these priorities in conjunction with curators. These priorities were then formalized by the DLP in a strategic policy statement, which was approved by the DLP and the advisory group.

Rather than having production as a main objective, the library is concentrating on developing an infrastructure for archiving as a model for standardization and creating an OAIS model for ingest format. They are working towards an open source system in which metadata is attached to, but not embedded in, the actual object. They have taken this model from the Harvard system. For each session of creation all the metadata is bundled in XML with the object. The program is also working towards the creation of multiple search options designed around the varied needs of users. Once complete this will offer searching by text entry and by visual entry.
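The idea of metadata that travels with an object without being embedded in it can be illustrated with a small sidecar file. The following is a minimal sketch only: the element names (`captureSession`, `object`, `title`, `bitDepth`), the file name and the session identifier are all hypothetical, and it is not the NYPL or Harvard schema, which the interview does not describe.

```python
import xml.etree.ElementTree as ET


def build_sidecar(object_filename, session_id, fields):
    """Return XML text describing one digital object without modifying it.

    All element and attribute names are hypothetical placeholders.
    """
    root = ET.Element("captureSession", attrib={"id": session_id})
    obj = ET.SubElement(root, "object", attrib={"href": object_filename})
    for name, value in fields.items():
        # Each metadata field becomes a child element of the object entry.
        ET.SubElement(obj, name).text = str(value)
    return ET.tostring(root, encoding="unicode")


def read_sidecar(xml_text):
    """Parse sidecar XML back into (object filename, metadata dict)."""
    root = ET.fromstring(xml_text)
    obj = root.find("object")
    return obj.get("href"), {child.tag: child.text for child in obj}


# The image file itself is never opened or altered; only the XML refers to it.
sidecar = build_sidecar(
    "image_0001.tif",
    "session-001",
    {"title": "Example print", "bitDepth": 24},
)
```

Because the description lives beside the file rather than inside it, the metadata can be corrected or extended later without rewriting the archived object.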

Broadening access, social inclusion and reinforcing the NYPL’s presence in the new library forum are subsequent objectives. The project has not yet achieved these objectives because the funding came through more quickly than the original project envisaged, recruitment of qualified staff has been difficult and production schedules slower. A recommendation for organizations attempting to formalize their selection criteria in strategic policy statements is to adopt formal participation procedures.

The main obstacle to planning the development of the digital deliverables has been recruitment. For building the digital deliverables, the obstacles have been timing and establishing the proportion of in-house to external work.

The criteria that have guided the project in selecting and prioritizing materials for digitization are as follows.

Selection is restricted to pre-1923 material. Historical and cultural value is key. Research significance is important, as are enhanced access and social inclusion to reinforce the Library’s public reputation. Preservation against content loss is also a significant factor. The material’s teaching and learning potential is less of an issue and, although conservation is important for curators, it is a middle priority for the project, along with improved functionality and potential commercial exploitation (the latter only through reproduction). Publicity is important in the general sense and research into digitization strategies will become important from year three. Space rationalization and labor cost reductions have the least priority and will only be a factor through reduced handling. It is too early to say whether these criteria will change over time.

The project has not co-operated to date with other organizations in developing its digitization program. The project is ongoing. The DLP itself is of indefinite duration, and its establishment as a fifth research center/library of the NYPL is possible. This phase of the project started in 2000 and its anticipated end date is 2004.

The purpose in creating the digital deliverables is research for the NYPL and profession at large. The project has not made an explicit statement of intent.

The type of source material digitized includes:

The nature and format of the materials digitized is wide and varied; photographic negatives and postcard-sized prints are outsourced. Incunabula had been looked at for inclusion in the Program. This was in response to a research project in collaboration with Princeton University studying their typeface with mathematical models. Digital microfilm already exists at the library, but the project is working on a ‘page-turning’ program for it. Projects do sometimes lead the Program into new avenues; these can originate within, or outwith the Library. The digital deliverables represent neither a sample nor an entire body of material; the aim is to create an entirely new collection. At present there are no plans for the project to re-purpose the digital deliverables, but there are some ideas.

The following standards, guidelines or tools are used for representing content:

The following standards, guidelines or tools are used for describing content:

The Program is developing a system which involves traditional hierarchies of content description, but also allows resemblances across these hierarchies to be displayed, which is rarely available. To do this they are using a MARC system for hierarchies and a relational database to chart resemblances. This is in progress.
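One way to picture hierarchies combined with cross-hierarchy resemblances is a parent link on each record plus a separate link table. The sketch below is an assumption-laden illustration: it uses SQLite in place of whatever database the Program uses, a plain `record` table stands in for the MARC-based hierarchy, and every table name, column name and sample row is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE record (
        id        INTEGER PRIMARY KEY,
        parent_id INTEGER REFERENCES record(id),  -- hierarchy: collection > item
        title     TEXT NOT NULL
    );
    CREATE TABLE resemblance (
        a_id INTEGER NOT NULL REFERENCES record(id),
        b_id INTEGER NOT NULL REFERENCES record(id),
        kind TEXT NOT NULL,                       -- hypothetical label
        PRIMARY KEY (a_id, b_id, kind)
    );
    """
)
conn.executemany(
    "INSERT INTO record (id, parent_id, title) VALUES (?, ?, ?)",
    [
        (1, None, "Collection A"),
        (2, 1, "Print from Collection A"),
        (3, None, "Collection B"),
        (4, 3, "Postcard from Collection B"),
    ],
)
conn.execute("INSERT INTO resemblance VALUES (2, 4, 'same subject')")


def path_to_root(record_id):
    """Walk parent links upward and return titles from item to top level."""
    titles = []
    while record_id is not None:
        title, record_id = conn.execute(
            "SELECT title, parent_id FROM record WHERE id = ?", (record_id,)
        ).fetchone()
        titles.append(title)
    return titles


def resembling(record_id):
    """Return titles linked to the record in either direction."""
    rows = conn.execute(
        """
        SELECT r.title FROM resemblance x JOIN record r ON r.id = x.b_id
        WHERE x.a_id = ?
        UNION
        SELECT r.title FROM resemblance x JOIN record r ON r.id = x.a_id
        WHERE x.b_id = ?
        ORDER BY 1
        """,
        (record_id, record_id),
    ).fetchall()
    return [title for (title,) in rows]
```

Here `path_to_root(2)` follows the hierarchy within one collection, while `resembling(2)` reaches an item held in a different collection, which is the kind of cross-link the hierarchy alone cannot express.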

The following standards, guidelines or tools are used for controlling data values:

The program looked at other existing guidelines for digitizing particular document types when planning its digitization strategy, including LC and Berkeley.

The following standards, guidelines or tools are used for representing structure:

The recommendation in relation to standards in general and for navigating between the ideal and the realistic is to adopt some general standards but not necessarily all the details in practice.

The primary intended audiences for digital deliverables are graduate school, lifelong learning, distance learning and public library users. Secondary target audiences are K-12, community college and four-year college. The project has not carried out any evaluation of the target audiences but will do so when a research post is established.

The deliverables can be used by audiences other than the primary intended target audience. The project has acknowledged the needs of those with disabilities via the W3C’s “Guidelines for Web Site Accessibility” but is not sure to what extent. No limitations on the use of the digital deliverables are imposed directly. However, to protect the Library's rights in the digital deliverables the quality is limited for the freely available material, which prevents it being re-used for publishing. The Library is also protective of its reputation and name and wants both to safeguard possible income and not misuse sensitive material.

 

24.2    Project Management and Planning

There has been no external project management and one member of staff has a project management background. The management of the project is part of the federal structure of the NYPL, which means it can take longer than desired to reach decisions. There have not been any changes or re-evaluations of organizational relationships or procedures. There are not as many formal project management procedures in place as the project would like. There is an Advisory Group but it has no responsibility; ultimately the Director is responsible to the Director of Research Libraries. The managerial quality assurance procedure is through the Advisory Group.

Neither pilot nor feasibility studies have been undertaken to assist project management. Some time and motion studies have been carried out in-house by the digital imaging unit (with a background in film processing). The project is starting to use MS Project for scheduling. The allocation of work among staff was based on early, arbitrary decisions.

Digitization is carried out both in-house and by outsourcing. The more standard, bulk material is outsourced for efficiency, while the more specialized, rare and fragile material is digitized in-house. Some equipment was available in-house and other equipment was bought in. The decision on which digitization process to adopt is based on flow or batch processing wherever possible. Flatbed scanners (2xHP Deskscans), digital cameras (SLRs with digital backs) and high-end professional cameras (Kontron) are all used by the project. Guidelines for data capture procedures have been established (an external company for calibration and practice manual). The benchmarks used are Kodak gray scales and color charts.

 

24.3    Human Resources and Training

The project employs one director (100%); potentially four metadata specialists (two in post at 100%) plus three part-time (1.5 FTE, 50%); twelve curators (10% max); eight digitizers (five in post at 100%); and technical support/development staff (three in the NYPL information technology group), with one full-time (100%) on the project.

Almost all the staff have library backgrounds, the technicians are fully qualified, the digitization staff were trained on the job and the manager has a library photo background. About one third of staff were re-deployed from other areas. Both internal and external advice was available on the technical aspects of digitization (handling, light, heating).

The training needs of the project team have not been assessed but areas where training needs have been identified are project management, preparation and handling of materials for digitization and the technical operation of digitization equipment. All members of the project team have or will be engaged in training and this has been organized in-house (own and external consultants), and via external courses, independent study and learning on the job. The training has so far met the needs of the project.

The project has identified that training must take place to give coherence to the whole project. Production staff are being taught to see the wider picture of the project they are working for, and the importance of metadata and other aspects in giving full and meaningful delivery of the objects.

 

24.4    Project Life-Cycle Processes and Procedures

The project is aware of the copyright position of the digital deliverables and owns the copyright to the original materials. The copyright or rights status of the final digital deliverable is declared in a copyright statement on the web. Although the Library digitizes material in copyright (under the legal provision for libraries and with the owners’ agreement) the project does not do this.

Users of the digital deliverables are allowed to make printouts on paper and film, burn to CD, DVD etc., and download to a PC, LAN or WAN as long as it is for personal use. Users can download and view thumbnails and lower quality images, download and listen to less than 30 second sound samples and full length compressed sound, and download and view samples and lower quality digital video clips for moving images. There is much discussion in the project and Library at present about the use of electronic management systems, such as watermarking.

The project has a conservation procedure for the original materials whose condition is investigated by conservators. No conservation work has been undertaken yet. The project has not modified, degraded or compromised any material to carry out digitization but much was already dis-bound. Risk assessment of the material during preparation for digitization was undertaken by an external consultancy and made use of the Head of Conservation’s guidelines (which were based on Library standards). Special equipment and processes specified include book cradles and exposure limits. Materials are prepared by curatorial or preservation staffs prior to digitization and are sometimes monitored by them during digitization, but in most cases this is avoided. No restrictions are placed on the originals post-digitization but this is a proposal for future material.

Cataloging systems in place prior to digitization include catalog and reference systems (CATNYP) at the collection level as well as bibliographies and handlists. Identification and metadata information is used from these records in the digitization process. The project does not have access to all the relevant information (e.g. cataloging) and has to locate some core reference or source material. The project does not intend to alter the originals for the digitization process. Material has been rejected for incoherent paging, incomplete works, and micro-images of inappropriate density and blurred pages. Where possible, the project has digitized from originals. Intermediaries that have been used are slides/35mm or 4x5 transparencies, photographic prints (the project might use microfilm), 78rpm discs, wax cylinders and video copies from film.

The project does not catalog the original material per se. A record is created for the digital surrogate, now based less on Dublin Core, since the intention is to create a bibliographic record with rules of inheritance: a system of such hierarchical resemblance is not permitted by Dublin Core. Tools for controlling data values are LC subject headings and perhaps name authority files in the future. The metadata details are attached to the object within an open source system, so as not to embed them in the actual object. This is a model borrowed from Harvard, in which XML information is included with the object, and will take an estimated two years to be put into general practice. The digitizer and an information professional create the records. This metadata record for the digital deliverables is then held in a separate catalog in electronic form on an intranet server. It will eventually be available on the internet. The records for the digital deliverables and the original digitized materials are independent of each other.
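The interview does not spell out what the rules of inheritance are. One plausible reading, sketched here purely as an assumption, is that a record takes any field it does not define itself from the record above it in the hierarchy. The record identifiers, field names and values below are hypothetical.

```python
def resolve(record_id, records):
    """Return the effective fields for a record, inheriting from its parents.

    `records` maps an id to {"parent": id or None, "fields": {...}}.
    A field set on a child overrides the same field inherited from above.
    """
    chain = []
    seen = set()
    while record_id is not None:
        if record_id in seen:
            raise ValueError("cycle in parent links")
        seen.add(record_id)
        chain.append(records[record_id])
        record_id = records[record_id]["parent"]
    effective = {}
    # Apply from the top of the hierarchy down so that children win.
    for rec in reversed(chain):
        effective.update(rec["fields"])
    return effective


# Hypothetical three-level hierarchy: collection > album > page.
records = {
    "collection": {
        "parent": None,
        "fields": {"repository": "NYPL", "rights": "see statement"},
    },
    "album": {"parent": "collection", "fields": {"creator": "Unknown"}},
    "page-1": {
        "parent": "album",
        "fields": {"title": "Page 1", "creator": "Example Studio"},
    },
}
```

Under this reading the page record only needs to store what is specific to it; the repository and rights fields are supplied by the collection-level record.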

 

24.5    Format, Resolution and Compression of Digitized Materials

The project has so far had no requirements to adopt standards for handling textual materials. The TIFF file format is used for capturing and preserving images. Delivery is by TIFF, JPEG and SPIFF-file, from one compressed file and its sample, rather than from derivatives. The project is also considering JPEG 2000. MrSID is used for large images, while the Lunar image processing system (which can do what MrSID does in real time) is also being used for managing service copies.

Image, sound, moving image DPI, bit-depth, sampling rate and compression details are not known.

The quality control procedures in place for the digital deliverables are random checks that vary from material to material with a total check on problem material. Metadata quality control procedures involve a second person check and random samples on outsourced materials. These quality control procedures have had a knock-back effect on scheduling.
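A checking regime of this kind, with random checks at a rate that varies by material and a total check on problem material, could be sketched as follows. The batch names, sampling rates and the `problem` flag are hypothetical; the project's actual rates are not given in the interview.

```python
import random


def select_for_checking(batches, seed=None):
    """Pick items to inspect for each batch.

    `batches` maps a batch name to
    {"items": [...], "rate": float, "problem": bool}.
    Problem batches are checked in full; others are randomly sampled
    at their own rate, with at least one item when the batch is non-empty.
    """
    rng = random.Random(seed)
    selected = {}
    for name, batch in batches.items():
        items = batch["items"]
        if batch["problem"]:
            selected[name] = list(items)
        else:
            wanted = max(1, round(len(items) * batch["rate"]))
            selected[name] = rng.sample(items, min(len(items), wanted))
    return selected


# Hypothetical batches: one routine, one flagged as problem material.
batches = {
    "postcards": {"items": list(range(20)), "rate": 0.1, "problem": False},
    "faded negatives": {"items": list(range(5)), "rate": 0.1, "problem": True},
}
picked = select_for_checking(batches, seed=1)
```

Passing a fixed `seed` makes a given selection repeatable, which may help when a check has to be re-run or audited.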

Users do not have to pay for the use of the digital deliverables, but have the opportunity to order high quality images off-line. Users will eventually have access to the digital deliverables through an open-access catalog plus the materials, but at the moment it is restricted to in-house use. The system offers searching on creator and title, but is limited by data format. Searching and browsing are handled by metadata stored in an Informix database, plus SQL calls and Java.
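A creator and title search over metadata held in a relational database might look something like the sketch below. It is an illustration under stated assumptions, not the Library's implementation: SQLite and Python stand in for Informix and Java, and the `item` table, its columns and the sample rows are hypothetical.

```python
import sqlite3


def make_catalog(rows):
    """Create an in-memory catalog from (creator, title) pairs."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE item (id INTEGER PRIMARY KEY, creator TEXT, title TEXT)"
    )
    conn.executemany("INSERT INTO item (creator, title) VALUES (?, ?)", rows)
    return conn


def search(conn, creator=None, title=None):
    """Return (creator, title) rows matching the given substrings."""
    clauses, params = [], []
    if creator:
        clauses.append("creator LIKE ?")
        params.append(f"%{creator}%")
    if title:
        clauses.append("title LIKE ?")
        params.append(f"%{title}%")
    # With no criteria, match every row.
    where = " AND ".join(clauses) or "1"
    sql = f"SELECT creator, title FROM item WHERE {where} ORDER BY title"
    # User input is passed only as bound parameters, never spliced into SQL.
    return conn.execute(sql, params).fetchall()


catalog = make_catalog(
    [
        ("Example Studio", "Harbor view"),
        ("Another Maker", "Harbor at night"),
        ("Example Studio", "Street scene"),
    ]
)
```

For example, `search(catalog, title="harbor")` returns both harbor records, and adding `creator="Example"` narrows the result to the one made by that creator.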

Apart from searching and browsing, users are able to manipulate images by zooming in and out using MrSID and Lunar.

Usage levels are not applicable at the moment but will be monitored by automatic data capture.

Potential users of the digital deliverables are informed about their availability through website announcements, press releases, articles in print media, conferences and meetings (the project is obliged to hold a research conference each year). The project does not know which medium has been the most effective.

 

24.6    Evaluation, Funding and Long-term Sustainability

No front-end, formative or summative evaluation has been done but this will be undertaken when the research post comes in. In the meantime the project did not want to hold up digitization.

The project has funding of $5 million over five years, the main sources of which were private donors to the NYPL. The project is monitored by an Annual Report and a mid-year verbal report. The funding body requested a project management plan and cost models, and is expecting workflow reports as part of the Annual Report.

New material (and its associated metadata), metadata changes, user interface updates and file format changes will be ongoing but it was not known at what frequency.

The project is about to start work with the Mellon Foundation on a preservation strategy that will include the archive server as one of its components along with migration of data.

The digital deliverables will be available indefinitely and the project’s longer-term sustainability is reliant on library funds generally, rather than self-generating funds. Loss of the digital deliverable would be a matter of concern.


[7] An original interview had been carried out with the previous director Michael Alexander in September 2000, but it was felt that there had been sufficient change to warrant a new interview with Barbara Taranto. The original interview had focused on the digitization of the 60,000 images, but the emphasis is now on the creation and development of a Digital Library Program.




abp~04/02