In this section we look at the issues you need to examine when selecting material - whether selecting physical originals for digitization, or reviewing born digital materials for preservation or republication. We also show how you can ensure that this process takes into account the aims and characteristics of your organization, the profile and needs of your users, and the characteristics of your collections. Some central questions you'll need to consider include the following:
These are only some of the questions you will need to address when embarking upon a digitization project.
Collections are a vital component of the intellectual capital of cultural heritage institutions and, in most cases, their raison d’être. Successful digitization programs start from a strategic knowledge of the institution's collections and their relation to the institutional mission. A strategic assessment of this sort is also a vital step for regional, national, and international initiatives and collaborative or larger programs. Unfortunately, this prior analysis of existing holdings was omitted in many of the early digitization projects.
The analysis should include an assessment of the physical objects, their condition, characteristics, and educational, cultural, historical, and aesthetic value (Ross 1999).
Planning for digitization and digital publication should start from a study of the analog sources, the physical materials themselves, or, in the case of born digital material, the digital collections, rather than in response to the nature of the available technology or other pressures. As we are now passing from the first stage of experimental digitization projects to sustainable long-term programs, the technology itself holds fewer mysteries and attractions, and problems of scale and long-term planning come to the fore. It is thus important to carry out an overall assessment of institutional holdings before deciding on digitization priorities. In this way, even if digitization starts at a smaller scale with potential for expansion, its development will be well planned and will complement the institution's strategy and objectives.
Where strategic knowledge of the institutional holdings does not already exist, this type of assessment will require resources if it is to be carried out in a systematic and thorough way. It needs to be planned ahead, have institutional support at all levels, and include all team players. This can be demanding in terms of time and staff; however, the required resources will be well spent, as this analysis will be a valuable tool in planning all the organization's activities. It will clarify the strengths of the collection and will place them in a local, regional, national and international context.
When planning an evaluation of this type, physical access to the original materials is a fundamental consideration. This type of assessment needs to include both primary and secondary material and to examine the condition and size of both. A systematic assessment of the institution's collections will complement the resource inventory, as well as the institutional digitization policy we mentioned in Section II, highlighting particular strengths and resources in answer to the questions 'Who are you?' and 'What do you have?'
For cultural institutions that are in the process of examining their assets, it is important to first assess their intellectual value to the intended audience. The project team, which should include custodians of the material, should assess whether the analog materials have sufficient intrinsic value and quality to ensure interest in their digital surrogates and to sustain the levels of access made possible by digitization. The Question Box below contains examples of the kinds of questions you may wish to ask when considering whether material should be selected for digitization.
Question Box:
Questions to Ask Before Selecting Material for Digitization:
As is clear from these questions, there is an element of subjectivity when making these judgments and weighing the intellectual nature of the various collections. Our perceptions and evaluation depend on our perspective and are subject to change. It is therefore advisable to consult widely within your institution and peer group as well as with your users in order to reach a general consensus in your decisions. Establishing the needs of potential users (see Section XII, User Evaluation), and determining what collections in other institutions might be digitized (see Section IX, Working with Others), will further enhance this activity, and can also help inform the institutional collection policy overall.
While examining the current intellectual value of the original materials, it is also worth considering the advantages of digitization in this area. A good example is the digitization of the Anglo-Saxon Beowulf manuscript, which was carried out by the British Library in collaboration with Professor Kevin Kiernan of the University of Kentucky and with the close involvement of scholars, curators, conservators, photographers, and technical experts (Kiernan 1994; Prescott 1998). This is one of the greatest treasures of the Library and of paramount importance for scholars of Old English. The manuscript had been badly damaged in a fire in 1731 and, in order to protect the brittle and smoke-stained pages, each one was mounted in a protective paper frame in the mid-nineteenth century. In order to have a retaining edge for the frame, the letters around the edge of the verso of each leaf were covered, obscuring hundreds of letters from view. The digitization of the manuscript was one of the Library's early worldwide digitization projects, conceived in 1992 and completed with the production of a CD-ROM in 1997. Using a high-end Kontron digital camera (manufactured originally for medical imaging), while lighting each page with fiber-optic lighting, it was possible to capture digital images of the pages in which the hidden letters were revealed.
In this case, digitization offered tremendous added value to the original manuscript, bearing in mind that some of those hidden letters represent the only known record of some Old English words. The project subsequently expanded to include a collection of digital images of all the primary evidence of the Beowulf text held in different geographic locations, ranging from the Royal Library in Copenhagen to Houghton Library at Harvard University, thereby creating a new resource and tool that allows enhanced study of the manuscript.
Digitization can also enhance the intellectual value of your original collections by allowing new possibilities for education and access, for example:
While the intellectual significance and content of your materials are very important, their physical characteristics also influence selection for digitization since they directly affect the digital outcome. Therefore, the analysis of the physical characteristics of the collections is an important step that will define how to handle the material, while deciding the subsequent digitization process. The custodians of the materials should ideally consult with those responsible for the digitization program to decide on what information is relevant. We include here some suggestions that are intended as a starting point to guide you in this process of examining the material itself and recording information about it. You might want to devise your own categories or expand on those listed in the Checklist Box: Physical Properties of Source Material.
Checklist Box:
Physical Properties of Source Material
In addition to the steps already outlined (examining the intellectual value of the materials, how this could be enhanced by digitization, and taking into account the physical characteristics of the collection), guiding principles for the selection process are the aims of the digitization program itself. These will vary between institutions, but we include here some of the most common general aims for starting a digitization program and how they might affect the selection of the material. In order to prioritize the selection process, examine which collections would be good candidates for:
In the examples of the Beowulf manuscript and the digitized art works previously cited, it is obvious how digitization can significantly increase access to resources. Examining whether digitization would indeed substantially increase resource accessibility is another criterion that can guide you in the selection process. This might involve asking the following questions:
The answers to these questions will reveal good candidates for digitization or will at least assist in establishing priorities. For example, some of the easily accessible analog material might move further down the selection list in favor of materials that are in remote stacks or storage areas. On the other hand, you might still select easily accessible materials, if they are so important to your users that the current level of analog access that you provide is inadequate. Other candidates might be significant but under-utilized collections with high intellectual value and relevance to the interests of your users. In this case it is worth further examining the reasons behind their limited use. Could you do anything to remedy the situation using methods more affordable than digitization?
When trying to identify a user group that would benefit from access to digitized materials, most institutions start from information collected about the current use of analog materials. The number, information needs, characteristics and location of current users can be a useful starting point for estimating future use. However, this is not always accurate, as digitization can increase access to and use of material that was hitherto unknown or underused. Additionally, the potential of digital information and web access can be so powerful and difficult to predict that the actual users of digital resources are not always the same as those anticipated, especially where institutions are sharing digital surrogates of their assets with a worldwide community.
Digitization of analog materials is not a substitute for investment in their conservation and preservation, but can assist the preservation of the original. Heavily used materials can benefit from the creation of faithful digital copies to prevent the deterioration of the originals, as can materials that are fragile or at risk. In this case, you should assess whether the benefit of digitization is greater than the risk placed on the material during the process of digitization. Digitization should also be a priority for cases where the existing storage medium is no longer suitable, as, for example, with nitrate film.
Digital files are themselves vulnerable to obsolescence, chiefly as a result of changing file formats, but also through deterioration of the storage medium. There are a number of strategies for overcoming this problem: careful specification of settings for quality capture and regular quality assessment; consistent use of well-defined and detailed metadata; and use of widely recognized, standard formats. These are discussed in more detail in Section XIV on preservation.
Support for research activities is often an important incentive for digitization, although academics need to be aware that the resulting digital resources may not remain accessible indefinitely (see Section XIV: Preservation). Digitizing high quality, unique and original material can improve access for a wide community of researchers at all levels. It can also enable interdisciplinary collaboration and study from a wider range of scholars than was previously possible. For example, the creation of the Perseus database with its associated tools and resources has encouraged the study of ancient Greek and Latin culture in new ways, combining the analysis of texts, material culture, art, architecture, history, and geography.
While access to digital surrogates will never supersede the need for researchers to use the original objects, digitization offers added functionality that can assist research in other ways. It has some important shortcomings: it takes away the immediacy of the original and its impression of size and color (which are still not displayed accurately with the existing technology today), and it eliminates or obscures some of the original context. The program team that selects and prepares materials for digitization can counterbalance these effects by providing sufficient context and accompanying information to make the digital objects meaningful. For example, good interface design and appropriate metadata can ensure that digitized illustrations do not appear on the screen in isolation, without reference to the rest of the text or the other related volumes; or that early LP recordings will not be played without reference to the text and images on the original disk jacket. Considerable intellectual effort, knowledge, and resources are required to bring intelligence and context to the individual digital files, regardless of their quality, which are little more than an electronic likeness of the original object. The program team must address these issues: it must examine the type and level of context it wants to provide, which should reflect the available resources as well as the research needs the project aims to support.
We have only begun to explore the possibilities of research tools for manipulating, searching, and studying digital resources. The ability to carry out sophisticated searches and accurate retrieval through rich materials is assisting all areas of research. In the area of text retrieval and processing, for example, the availability of large electronic corpora has contributed significantly to fields such as lexicography, socio-linguistics and authorship attribution studies. Developments in automatic translation programs and multilingual thesauri will allow much wider use of hitherto unknown resources and will enable cross-lingual searching. Image retrieval presents greater challenges. At the moment, few effective content-based image retrieval (CBIR) products have reached the marketplace. Most CBIR tools retrieve images based on appearance and the numerical characteristics of color, texture, and shape, rather than intellectual content and image semantics (Lesk 1998), and even in these areas they produce poor quality search results. However, CBIR continues to be the focus of intensive academic and commercial research activity. Even the weak experimental applications that are available, including IBM's QBIC, Virage's VIR Image Engine, and Excalibur's Visual RetrievalWare, suggest that these tools hold considerable promise to improve researchers' access to digital resources. Although there is still no substitute for trained catalogers, librarians, and subject specialists tagging the images with keywords, this is an area where future developments might revolutionize the use of images.
Link Box:
CBIR Tools to Watch:
IBM's QBIC: http://wwwqbic.almaden.ibm.com/
VIR Image Engine: http://www.aa-lab.cs.uu.nl/cbirsurvey/cbir-survey/node41.html
Excalibur's Visual RetrievalWare: http://www.excalib.com/
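The appearance-based matching that most CBIR tools rely on can be illustrated with a minimal sketch. It is not the method used by QBIC or any of the products named above; it assumes a simple color-histogram comparison, and the function names and the toy pixel data are hypothetical. Each "image" is a list of RGB pixels, colors are grouped into coarse buckets, and two images are scored by how much of their color distribution they share.

```python
from collections import Counter

def color_histogram(pixels, bins_per_channel=4):
    """Return the share of pixels falling into each coarse RGB bucket."""
    step = 256 // bins_per_channel
    counts = Counter((r // step, g // step, b // step) for r, g, b in pixels)
    total = len(pixels)
    return {bucket: n / total for bucket, n in counts.items()}

def histogram_intersection(h1, h2):
    """Similarity between 0 and 1; higher means more shared color."""
    return sum(min(share, h2.get(bucket, 0.0)) for bucket, share in h1.items())

# Hypothetical toy images, invented for illustration only.
sunset = [(250, 120, 30)] * 6 + [(40, 20, 90)] * 4
orange_poster = [(240, 110, 20)] * 7 + [(255, 255, 255)] * 3
forest = [(20, 130, 40)] * 9 + [(90, 60, 30)] * 1

query = color_histogram(sunset)
candidates = {"orange_poster": orange_poster, "forest": forest}

# Rank candidates by color similarity to the query image.
ranked = sorted(
    candidates,
    key=lambda name: histogram_intersection(query, color_histogram(candidates[name])),
    reverse=True,
)
print(ranked)
```

In this sketch the orange poster ranks above the forest purely because it shares the sunset's dominant color, although it has nothing to do with sunsets. This is the gap between numerical appearance and intellectual content described above, and the reason keyword tagging by subject specialists remains necessary.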
A wider and richer collection of materials is also becoming available for research through advancements in even more challenging areas: development of next generation scanners for digitizing three-dimensional objects; the use of speech recognition for the indexing and retrieval of digitized speech archives; and research into video analysis and indexing systems that can parse and analyze hours of video, identify events as they occur, extract embedded textual data, employ continuous-speech recognition to convert spoken words into text and convert all this information into a searchable database. Collaboration between institutions (discussed in Section IX) also has a vital role to play in answer to the common current complaint that resources are not sufficiently large and comprehensive.
The digital environment continues to suffer from a number of limitations and restrictions that have a subsequent impact on research. For example, a large percentage of the material digitized thus far by institutions in the cultural and educational sector belongs in the public domain. The selection has been driven more by copyright restrictions and less by the material's intellectual significance and usefulness for research and education. The rights to moving image and recorded sound material can restrict access to some information sources. Although OCR accuracy has been continuously improving for the last few years, the poor results with non-Latin characters have meant that very few texts in non-Western languages are available online (Smith 1999). Keyboarding is an alternative in these situations, not only for non-Latin alphabets but also for manuscript materials where OCR is not an option in any case. Several projects (for instance, the Japanese Text Initiative at the University of Virginia Etext Center, http://etext.lib.virginia.edu/japanese/) are creating resources in this way and using Unicode to handle the character set encoding. However, the added cost of these methods necessarily limits their size and scope. These issues go beyond technological development and are associated with socioeconomic conditions and pressures; they will need to be addressed by the research community as they have serious implications for the availability of resources and the direction of scholarship.
User requirements should also guide the digitization selection process. Although it is difficult to predict future usage accurately in the electronic environment, current usage patterns of your collections can provide valuable pointers. Knowing the profile of your users and the ways they access information can highlight heavily used areas, limitations, and possibilities for development. Consider the following: What is the size of your user group? How are they distributed geographically? Is there a need for decentralized access to resources, e.g. from different institutions or from home? Will your users need special tools and facilities for using the digital resources? Do you have several different groups of users who may need different facilities and levels of access? If you do not already have sufficient information about your users to answer these kinds of questions, you will need a strategy for collecting that information and assessing whether digital resources can answer existing and future user demands. Digitization can also be used to bring in new types of audiences and help to open up specialized resources to a wider public. Digital collections can support lifelong education and learning, and they can also promote social and cultural inclusivity by opening up access for socially disadvantaged groups. If providing access to new user groups is one of the aims of the digitization program, it is important to try to involve the targeted users early on in this process. Consultation with these groups can bring new perspectives to your collections, assist in the selection of materials and provide information about new ways of using them. For example, providing schoolchildren with access to digitized resources from museums and galleries can be facilitated by early involvement with targeted user groups and discussions with schoolteachers about the ways that these can be used in the classroom.
Working with different communities and ethnic groups can result in a variety of different ways of looking at the same material and making associations and links that go beyond the Western European, white middle-class biases.
In order to provide digital resources that are relevant and useful for your target groups, you will need to address the need for further evaluation and research. Evaluation involves user needs analysis and assessment using some of the methods we discuss in Section XII on user evaluation. This can take place before a digitization program begins, or before developing an existing program further. Information about how digital collections are being used is very limited at present and includes very little qualitative, in-depth detail that goes beyond the standard web usage statistics. Although we know, for example, that the American Memory web pages are used by millions of users every year[1], we know very little about how these resources are being used by a variety of audiences, even though this was one of the few projects that included an early evaluation survey of its pilot in 1991-3 with various schools, colleges, universities, and special, public, and state libraries (American Memory Evaluation Team 1993).
The level of technology available to user groups for accessing the digital material is another important consideration when selecting materials and approaches for digitization. If users do not have sufficient bandwidth to download or view large still or moving image files, for example, this material cannot be successfully disseminated. On the other hand, the digital environment encourages new expectations from users that your organization might not always be ready to meet. You will need not only to decide which are the most important needs to meet, but also to develop a clear articulation of the priorities and strategies that led to these choices.
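The effect of bandwidth on dissemination is a matter of simple arithmetic, sketched below. The file sizes and connection speeds are hypothetical figures chosen for illustration, and the calculation ignores protocol overhead and network congestion, so real transfers will be slower.

```python
def download_seconds(file_megabytes, link_kilobits_per_second):
    """Ideal transfer time: file size in bits divided by link speed in bits/s."""
    file_bits = file_megabytes * 1_000_000 * 8
    return file_bits / (link_kilobits_per_second * 1_000)

# Hypothetical figures, for illustration only.
master_image_mb = 20     # a large archival-quality still image
access_image_mb = 0.2    # a compressed derivative for screen viewing

for label, kbps in [("56 kbit/s modem", 56), ("1.5 Mbit/s line", 1500)]:
    master = download_seconds(master_image_mb, kbps)
    access = download_seconds(access_image_mb, kbps)
    print(f"{label}: master {master:.0f} s, derivative {access:.1f} s")
```

Under these assumptions the 20 MB image needs about 48 minutes over the modem, while the 0.2 MB derivative needs under half a minute. Estimates of this kind can help decide which versions of a resource to offer to which user groups.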
As soon as you have identified appropriate material for digitization, you should tackle the intellectual property issues, since securing permission for materials you do not own can be a lengthy and often a costly process. If the materials you want to digitize are in the public domain, or if you own the copyright and control the digitization rights, then you can probably proceed without hindrance. Bear in mind that these issues—even concerning materials in your own collection—can be complex, for instance in cases where a photographer has been employed to take photographs of objects in the collection and these photographs are themselves to be digitized, or where gifts or bequests are concerned.
If the materials you want to digitize are not in the public domain, and if you do not control the copyright, you will need to identify the copyright holder and seek permission to digitize and publish the materials in question. You may already know who the copyright holder is, and may already have permission to use the materials for certain purposes, but be sure that this permission covers digitization and digital publication. Bear in mind as well that you may be asked to pay a fee or royalties for digitization or publication rights. In some cases, your needs may be covered by fair use (see more about fair use in Section IV on Rights Management) in which case specific permission from the copyright holder is not required.
It is possible that you may not be able to identify the copyright holders, or that they may not respond to your inquiry. If permission cannot be obtained even through good faith efforts, you need to proceed—if at all—very cautiously and assess the risks carefully. Again, in such cases, fair use may justify your inclusion of the materials in question. Restricting access to the materials may also limit your risk. Section IV includes more information on risk management, as well as links to resources on IPR.
Be very cautious, however, when considering publishing material on an unrestricted website. If what you plan to digitize cannot be justified as fair use, and if you cannot secure permission for digitization and electronic distribution of digital copies, you need to assess very carefully the risks of proceeding.
If, after assessing copyright status and fair use considerations, your institution finds that permission is required, you need to make a careful assessment of the value and costs that are at stake. Securing permission can be an arduous, time-consuming, and costly process, often with an uncertain outcome in cases where the copyright holder is either unknown or unwilling to give permission. The costs and uncertainty are frequently reasons why institutions decide not to digitize a particular work, despite its desirability. As indicated above, this raises the concern that some cultural materials will remain unavailable simply because their copyright status is uncertain or heavily restricted. Similarly, these materials may be available in digital form but only through commercial products, if copyright holders require fees that are beyond the reach of non-profit digitization efforts. Some have argued that cultural institutions with rich collections should avoid the effort and cost of securing permission to digitize materials they do not own, focusing instead on materials which they do control (Zorich 1999). While this approach makes practical sense, it means that the shape and scope of digital collections will tend to reproduce the boundaries of the individual collection, rather than bringing together all related materials in a given domain, regardless of location and ownership. Collaboration can play a very important role in overcoming this limitation, but requires careful negotiation and coordination between institutions (see Section IX, Working with Others).
Even in the cases where your institution has been granted copyright to the analog material, there are further issues to be considered. You should examine, first, whether the rights you have actually extend to the digitization and distribution of electronic copies, particularly in the cases of donations and bequests. Even when there are good reasons for using materials to which you do not own copyright, you should be careful about the permissions or licensing agreements you negotiate and ensure that you have taken proper legal advice. Museums and art institutions should also be careful to protect the rights of the artists whom they display. Furthermore, when commissioning photographic documentation of your collections, you should ensure that the photographer's contract takes into account the digital environment and that you have cleared the rights for digitization and distribution of digital copies.
When you own the copyright of the materials you are selecting for digitization, make sure you plan to manage your rights efficiently and beneficially. It is important to include information on copyright status in the relevant metadata, and it is useful for the institution to have a rights management policy that includes guidelines on constructing a copyright statement to be displayed with the digitized work. Apart from the legal reasons for attaching a copyright notice, outlined in Section IV on Rights Management, a clear statement can deter users from misusing the material and, it can be argued, creates an implied contract between the user and the content provider to work within the confines of the statement.
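As a minimal sketch of how copyright status recorded in metadata might drive the statement displayed with a digitized work, consider the following. The record structure, field names, status values, and wording are all hypothetical; they are not drawn from any metadata standard or from Section IV, and any real statement should follow your institution's rights management policy and legal advice.

```python
from dataclasses import dataclass

@dataclass
class RightsRecord:
    # Hypothetical fields, for illustration only.
    title: str
    status: str          # "in copyright", "public domain", or "undetermined"
    holder: str          # rights holder, or the holding institution
    permitted_uses: str  # e.g. "private study and classroom teaching"

def copyright_statement(record: RightsRecord) -> str:
    """Build the text to display alongside the digitized work."""
    if record.status == "in copyright":
        return (f"{record.title}. Copyright {record.holder}. "
                f"Permitted uses: {record.permitted_uses}.")
    if record.status == "public domain":
        return f"{record.title}. This work is in the public domain."
    # Any other value is treated as undetermined, the cautious default.
    return (f"{record.title}. Copyright status undetermined; "
            f"contact {record.holder} before reuse.")

example = RightsRecord(
    title="Harbor at Dusk (photograph)",
    status="in copyright",
    holder="Example Museum",
    permitted_uses="private study and classroom teaching",
)
print(copyright_statement(example))
```

Generating the statement from the metadata, rather than writing it by hand for each item, keeps the displayed notice consistent with the recorded copyright status.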
In the desire to protect intellectual property, institutions and copyright holders alike often resort to methods which either compromise user access, or challenge the current concepts of fair use. Although a lot of research and work has been invested worldwide in technologies for protecting intellectual property in the electronic environment (also outlined in Section IV), many institutions still protect their intellectual property by making only low resolution images available on the Internet. Since these are not of sufficient quality to be used in commercial professional printed products, they are less likely to be worth misappropriating. However, although they may be adequate for various online purposes, they cannot support the kinds of detailed research for which high-resolution images are so valuable. Similarly, many owners of copyright materials feel threatened by the possibility of uncontrolled distribution of digital information, and are challenging the concept of 'fair use' in the digital environment, where it is more difficult to control who has access to the material and for what purpose. Fair use lies at the heart of the work of cultural and educational institutions, and many are striving to maintain it in at least its current form. However, you should note that generally, fair use is a safer option within a secured environment (a password protected class website, for example).
Cultural institutions are exploring other mechanisms for managing the distribution of their intellectual property. Although none of these mechanisms offers a single uniform solution for the diversity of the cultural sector, some useful models are emerging. Licensing is the most popular option at the moment, but many institutions choose either to administer their materials directly, or to arrange for this to be done through some other means, whether through an external agency or through a consortium of rights holders (Zorich 1999). Some of these options have existed for a long time, but are relatively new to cultural heritage organizations. The increasing demand for digital copies of the institutions' cultural assets, and the much wider distribution networks that the electronic world has brought, require new strategies from the cultural sector and careful examination of the different options. An institution must give serious consideration to how much control of its digital assets it is willing to give up or pass to others and examine how the costs involved in licensing and clearing rights would affect the overall budget.
With the spread of digitization activity and the multitude of projects and programs around the world, it is important to ensure that your proposed digitization efforts will not duplicate other efforts elsewhere. When selecting material for digitization you should ask yourself if there are any similar or complementary projects in the same institution, country, or even further afield, to avoid a costly waste of effort and resources. By collaborating and coordinating digitization programs, cultural institutions can ensure that they build a critical mass of digital collections. The community needs authoritative, well-governed registries for all types of cultural heritage materials (texts, images, audio collections, etc.) and easy ways to use the registries to determine whether items under consideration have already been digitized. Such a registry already exists for preservation quality microfilm, but there is still no similar listing of digital resources. A number of initiatives in this area, however, indicate that the situation is likely to change in the future.
Link Box:
Register of Digitization Initiatives and Programs
With the recognition of the need to eliminate redundancy of effort, to maximize the available knowledge about digitization projects and resources, and to improve communication between digitization activities, a number of initiatives in the US are providing mechanisms for projects to register details of their activities and outputs. Among those that have particular value for the humanities are:
Until initiatives in this area expand and such a source of information is created, cultural heritage professionals need to use traditional professional and research channels to collect information about existing digitization efforts and future plans. Web searches and relevant portals, word of mouth from colleagues, special-interest professional email lists, and related professional and academic conferences and journals (e.g. D-Lib Magazine or RLG DigiNews) can provide a wealth of information and a good starting point to map activities in the area.
Link Box:
Web portals on humanities computing resources
When carefully thought out and planned, collaboration with other institutions (covered in greater depth in Section IX) can be part of an effective strategy for digitization, enabling the sharing of resources and expertise. In fact, digitization activities have often encouraged and led to collaboration with others, as cultural institutions found that this was a way that they could afford expensive specialized equipment, or take advantage of the expertise of computing science departments to create advanced tools for managing digital collections, or enhance their digital collection by collaborating with institutions with complementary analog sources. By taking care that the materials you select for digitization complement initiatives at a local, regional, national, or international level, you can increase the breadth and impact of your work, making a valuable contribution towards truly useful digital resources and tools that answer real needs and are widely used. To this end, we need national or even, in the case of the European Union, trans-national strategies for the digitization of cultural material, to coordinate activities and ensure that we invest in creating unique, complementary, interoperable and high quality digital assets (Ross & Economou 1998).
Another criterion for selection of material for digitization is the availability and quality of the related metadata. Digitization activities often reveal backlogs and documentation gaps or inconsistencies in the management of the analog collections and the related analog documentation (which is also a form of metadata, even if not in digital form). It is important that you have good knowledge of the state of documentation, cataloging, and metadata used for the analog materials in order to make informed decisions about selection and to establish realistic projections of project costs and timelines. You may want to give priority to materials that have extensive, high-quality documentation and analog metadata in order to carry out the digitization activities without further delays, adding a minimum amount of digital metadata. On the other hand, digitization can provide the impetus for tackling cataloging backlogs or problems that hinder access to parts of the collection. In any case, a good assessment of the situation is important and should be undertaken before the selection stage, in order to allow you to plan both the cataloging and the digitization process accurately.
Another issue that influences selection is the existence of metadata about the digital surrogate. By coordinating not only with existing metadata activities within your organization, but also with national and international initiatives, you can avoid duplicating other efforts, and will be able to take advantage of the extensive work that has been taking place in this area. Have any other departments or staff in your organization carried out digitization projects before? Did they devise any subject categories or a metadata scheme that you could use and adjust? You need to decide on the type and depth of information you will record about the digital surrogate and how this will relate to the information on the original source. This has staff and resource implications and needs to be taken into account in the program planning stage. Selecting a uniform collection with materials of the same format and type, for example, will require less investment in metadata planning and recording.
There are three main categories of metadata: descriptive, administrative, and structural. You need all three to manage your digital resources. They are discussed in greater detail in the appendix on metadata, but some discussion will be useful here. Descriptive metadata is the information you will need to record in order to identify the digital resource, its analog original if not born digital, or any analog or digital derivatives. Administrative metadata is essential to the management of the digital asset, and describes its creation, intellectual property status, provenance, and the like: for instance, detailed information about technical specifications and the digitization process which can ease future migration paths and ensure consistent results over time. Structural metadata describes the structure of the digital object and the relationships between its components; this information is crucial to assist navigation and ensure that complex objects that belong to a larger collection are linked meaningfully together. The recently developed METS (Metadata Encoding and Transmission Standard) encoding scheme, maintained at the Library of Congress, provides a robust way to record all three types of metadata, and allows the open use of multiple metadata standards simultaneously.
These metadata categories are vital for the longevity and preservation of the digital material, and in applying them you should work not only towards local consistency and completeness, but should also take into account the work being done on metadata creation at a national and international level. Furthermore, if your digitized collections are to be retrieved easily by search engines and used together with complementary collections, consistent application of widely accepted standards in metadata recording is essential. Although there is no general consensus on the most appropriate metadata scheme for all metadata categories, it is important to keep abreast of the intensive activities in this area as generally accepted models start to emerge (OCLC/RLG 2001). The Dublin Core metadata initiative (http://dublincore.org/), for example, has gained international recognition for its efforts on descriptive metadata to ensure resource discovery and accurate identification in the electronic environment. It has also demonstrated the importance, as well as the difficulties, of building general cross-sectoral consensus on metadata issues, which will be necessary in order to create resources that are interoperable across different user groups, disciplines, and institutional types.
(More information on metadata is available throughout the Guide, e.g. in Section V on Digitizing Texts, Section VI on Images, Section VIII on Quality Assurance, Section XIV on Preservation, and in the appendix on metadata.)
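To make the three metadata categories described above more concrete, the following is a minimal sketch of how a single digitized object might carry descriptive, administrative, and structural information side by side. It is not METS or Dublin Core: the class names, field names, file names, and sample values are all hypothetical and chosen only for illustration. A real program would map such fields onto whichever established scheme it adopts.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class DescriptiveMetadata:
    """Identifies the resource and, where one exists, its analog original."""
    title: str
    creator: str
    analog_source_id: Optional[str] = None  # None for born-digital material


@dataclass
class AdministrativeMetadata:
    """Supports management: capture details, rights, provenance."""
    capture_device: str
    resolution_dpi: int
    bits_per_pixel: int
    rights_statement: str


@dataclass
class StructuralMetadata:
    """Records how the component files of one object fit together."""
    page_files: List[str] = field(default_factory=list)  # in reading order


@dataclass
class DigitalObjectRecord:
    descriptive: DescriptiveMetadata
    administrative: AdministrativeMetadata
    structural: StructuralMetadata


def next_page(record: DigitalObjectRecord, current_file: str) -> Optional[str]:
    """Use the structural metadata to find the page after current_file."""
    pages = record.structural.page_files
    position = pages.index(current_file)  # raises ValueError if unknown
    if position + 1 < len(pages):
        return pages[position + 1]
    return None


# Hypothetical example record; every value here is invented for illustration.
letter = DigitalObjectRecord(
    descriptive=DescriptiveMetadata(
        title="Letter, two pages",
        creator="Unknown",
        analog_source_id="MS-0001",
    ),
    administrative=AdministrativeMetadata(
        capture_device="flatbed scanner",
        resolution_dpi=300,
        bits_per_pixel=8,
        rights_statement="Rights status not yet determined",
    ),
    structural=StructuralMetadata(page_files=["p001.tif", "p002.tif"]),
)

print(next_page(letter, "p001.tif"))
```

The navigation helper relies only on the structural part of the record, which is one way of seeing why that category matters for complex, multi-file objects. Deciding how many such fields to record per item is where the staff and resource implications discussed earlier arise.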
Having examined all the areas discussed so far—the analog objects, their intellectual and physical characteristics, rights management questions, the reasons why these objects would be digitized, and users' needs—you should now assess the issues that all these raise for digitization.
In this process you should determine what features would have to be retained in the digital surrogate. Some of the questions you might ask are listed in the Question Box below. You should also examine those features of the original objects that might cause problems during digitization, such as the objects' size, their physical material, and their state of preservation.
Question Box:
What Features of the Original Should Be Retained in the Digital Surrogate?
Examples of original features that may require special accommodation:
What are the technical and resource implications for digitization in order to retain these features or address these problems so that the results are of adequate quality to meet the aims of the program?
When selecting a digitization approach, you should always start from your sources and users and aim to match these with the most appropriate procedure. Thus, for example, it may not be appropriate to scan at the highest possible resolution or to use a certain piece of equipment simply because it is available. If the technology you need is not accessible or affordable, you may wish to explore the possibility of collaborating with another institution that could offer access to the equipment you need. Alternative strategies, such as digitization from intermediaries, may also be worth considering.
You also need to examine the long-term use and development of the collection and estimate how it will grow. This should already be part of the selection process at an early stage, as future development and scope may influence your selection priorities and digitization strategies. If you know from the start that you will be dealing with a very large collection and will need to accommodate subject retrieval, for instance, you can build in provision for robust keywording even though this may seem like overkill for your first small collections. Although you may be starting small, you will have prepared the ground for future expansion and growth.
Another important consideration to take into account is the relationship of projected costs to expected benefits. Institutions in the cultural and educational sectors traditionally operate with limited resources and try to satisfy many competing demands. Although cost-benefit analysis is not the only consideration, it is nevertheless an important one that cultural heritage institutions cannot afford to ignore. However, what we should not forget in the analysis is that benefits from digitization might be intangible, especially when we are dealing with programs in the educational and cultural sector.
As digitization is very resource-intensive, at both the creation and the maintenance stage, you should first examine whether there are lower cost alternatives to digitization for achieving your stated goals. Sometimes, for example, the publication of a traditional scholarly catalog might serve your specialized research community better, or traditional access might be sufficient for resources that are of mainly local interest to a small group.
Even if you do decide to digitize, there might be lower cost alternatives to the digitization approach you have selected. For example, is color scanning really necessary to serve your aims and users? This is the most expensive approach (creating the largest files), compared to grayscale (which records shades of gray using 8 bits per pixel, as explained in Section VI on Images) or bitonal scanning (which records information only in black or white using only one bit per pixel). Bitonal scanning, which creates the smallest file sizes and is generally the most affordable approach, might be sufficient for scanning text, although there are documented cases of unbound grayscale scanning with a flatbed or sheetfeed scanner being less expensive per page than bound bitonal scanning with overhead scanners or digital cameras. Similarly, in some cases, uncorrected OCR text might be sufficient for retrieval and indexing. Hand-correcting the output of OCR software improves retrieval accuracy, but significantly increases the costs of text conversion. It may even be worth asking whether you need to perform OCR in the first place, or whether it might be sufficient to provide an index with links to the page images.
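The difference in file size between these scanning modes can be estimated with simple arithmetic. The sketch below computes uncompressed image sizes from page dimensions, resolution, and bits per pixel. The 8 and 1 bits per pixel for grayscale and bitonal come from the discussion above. The 24 bits per pixel for color, the letter-size page, and the 300 dpi resolution are assumptions made for the example. Compression and file-format overhead, which change real file sizes considerably, are ignored.

```python
def uncompressed_size_bytes(width_in: float, height_in: float,
                            dpi: int, bits_per_pixel: int) -> int:
    """Estimate raw image size: pixel count times bits per pixel, in bytes."""
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    total_bits = width_px * height_px * bits_per_pixel
    return (total_bits + 7) // 8  # round up to whole bytes


# Assumed example: an 8.5 x 11 inch page scanned at 300 dpi.
MODES = {
    "bitonal": 1,     # black or white only
    "grayscale": 8,   # 256 shades of gray
    "color": 24,      # assumption: 8 bits for each of three color channels
}

for mode, bits in MODES.items():
    size = uncompressed_size_bytes(8.5, 11, 300, bits)
    print(f"{mode:10s} {size:>12,d} bytes")
```

Under these assumptions a grayscale page is eight times the size of a bitonal one and a color page twenty-four times, so the choice of mode multiplies storage needs across the whole collection.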
There are a number of challenges and even uncertainties in conducting a careful cost-benefit analysis. For instance, the labor costs associated with scanning are usually much lower than those related to selecting, preparing, inspecting, and indexing digital resources. Indeed, it is generally true that the intellectual overhead for digitization efforts—the cost of setting up infrastructure, making decisions, overseeing quality control, ensuring good quality metadata—is among the most significant costs. For this reason, it has been suggested that it is more economical to convert once at a high level in order to avoid the expense of duplicating the process at a later stage when more advanced technology requires better quality digital objects (Lesk 1990). However, these claims have not yet been borne out clearly in practice. Similarly, it can be extremely difficult to calculate expenses and compare the anticipated costs of new projects with those of existing projects. Again, the main costs are usually associated with the staff involved in digitization but this can vary enormously.
There also are many hidden costs that are often omitted in published reports. Cataloging, indexing, preparation of material for scanning, post-scanning processing of the material, quality assurance, and maintenance of digital resources are some of the activities that are not always calculated or indicated separately. “Though digitizing projects must calculate the likely costs and benefits, our ability to predict either of them is as yet rudimentary” (Hazen et al 1998). Another area of hidden digitization costs, ironically, is collaboration. Although, as we mentioned above, working with other institutions can potentially save money through the sharing of expensive equipment, it can also involve added commitments of time and resources—for instance, the overhead required in coordinating joint efforts, attending meetings, and the like. If not planned carefully, these may be missed in the project planning and create unforeseen cost overruns. Finally, in the calculation of costs and benefits other important factors are the levels of usage and the distribution mechanisms selected (the latter are discussed in greater depth in Section X on Distribution).
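One way to keep such hidden costs visible is to itemize every activity separately when estimating a project, rather than budgeting for scanning alone. The sketch below is only an illustration of that bookkeeping: the activity names follow the list above, but every figure is a hypothetical placeholder, not data from any project, and should be replaced with your own institution's estimates.

```python
from typing import Dict, Tuple

# Hypothetical cost per item, in cents. These are placeholders to be replaced
# with an institution's own estimates; they are not data from any project.
HYPOTHETICAL_CENTS_PER_ITEM: Dict[str, int] = {
    "selection": 50,
    "preparation": 70,
    "scanning": 40,
    "post_processing": 30,
    "quality_assurance": 40,
    "cataloging_indexing": 100,
    "maintenance": 40,
    "coordination": 30,  # e.g. overhead of collaborating with partners
}


def summarize_costs(cents_per_item: Dict[str, int],
                    item_count: int) -> Tuple[int, Dict[str, float]]:
    """Return the total cost in cents and each activity's share of it."""
    per_item = sum(cents_per_item.values())
    if per_item <= 0:
        raise ValueError("at least one activity must have a positive cost")
    total_cents = per_item * item_count
    shares = {activity: cents / per_item
              for activity, cents in cents_per_item.items()}
    return total_cents, shares


total_cents, shares = summarize_costs(HYPOTHETICAL_CENTS_PER_ITEM,
                                      item_count=1000)
print(f"Estimated total: {total_cents / 100:.2f}")
for activity, share in sorted(shares.items(),
                              key=lambda pair: pair[1], reverse=True):
    print(f"{activity:22s} {share:6.1%}")
```

Listing each activity as its own line makes it harder to overlook items such as coordination or maintenance, and shows at a glance how small a share of the total the scanning itself may be.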
Digitization might lead to some cost savings, though these are unlikely to offset the project costs fully. For example, digitization may reduce the need for storage space, as is the case with the JSTOR project (http://www.jstor.org). This project helps the participating academic libraries reduce the costs associated with the storage and care of journal collections by digitizing and providing easier access to backfiles of over one hundred and fifty journals. Of course, this is no solution for museums where holdings are unique and cannot be discarded after digitization. Cost savings might also be involved in reducing staff time spent in retrieving information in analog form, but on the other hand, digitization creates a whole series of new demands and costs, particularly for the management and long-term preservation of the digital resources.
If digitization is indeed the best option, but involves higher costs than you can afford, you should examine whether you can secure external funding (see also Section XI on Sustainability). The priorities of funding bodies have often influenced the selection of material for digitization, with grant applicants trying to find out what is more likely to receive funding and selecting projects and material with that in mind. This is also a consideration with sponsors and donors of material, who often have their own agenda and can influence selection priorities. In some cases, however, you might find that the areas supported by external funding organizations agree with your institutional goals. Having a strategic knowledge of the collections and development plans, as discussed at the beginning of this section, can help you take a constructive approach in seeking sponsors and donors, so that their priorities match those of your institution.
The issues discussed here represent some of the questions you would need to examine with your project team when defining the selection criteria for your program. The criteria you identify will depend on the particular characteristics of your institution and the aims of the program. Once you have agreed on some key principles and guidelines, it is very useful to document them and share them among the team members. The diagram below, prepared by Harvard University Libraries, is one possible model which summarizes some of the questions and decisions involved in selecting material for digitization.
http://www.clir.org/pubs/reports/hazen/matrix.html or http://preserve.harvard.edu/bibliographies/matrix.pdf
SELECTION FOR DIGITIZING: A Decision-Making Matrix from Dan Hazen, Jeffrey Horrell, Jan Merrill-Oldham, Selecting Research Collections for Digitization, Council on Library and Information Resources (August 1998), http://www.clir.org/pubs/reports/hazen/pub74.html
[1] The American Memory Project received over 40 million requests per month in 1999, increasing to over 50 million in 2000, and over 80 million in 2001, numbers that exceed by far the number of readers who visit the reading rooms of the Library of Congress.