The projects and programs interviewed for this Guide all acknowledge that digital materials are at risk of loss, both through physical damage to the storage medium and through technological changes that render the data unreadable. Although none of the projects we interviewed identified digitization as a chosen method for preserving analog material, digitization is starting to supplant microfilm as a preservation medium and hence the urgency of providing long-term preservation of digital materials becomes of paramount importance. For museums and institutions with two- and three-dimensional objects, moving images, and audio, the need is even greater, since microfilm has never been even a minimally adequate preservation medium for these. Overall, there is an urgent need for significant research and the development of affordable strategies for trustworthy digital preservation services. This Guide does not aim to cover the full range of technical and scientific developments in storage media and methodology, but explores some of the issues involved in preservation in order to explain the practical steps you can take to store and preserve your digital materials.
In this Guide we define ‘preservation’ as the actions taken to preserve digital objects, whether they are created from analog collections or are ‘born digital’ objects such as original digital art, and digital audio or video that are not re-recorded from analog originals. This domain includes two important and overlapping areas: first, the goal of preserving the content of analog items via digital reformatting; and second, preserving digital content regardless of its original form. ‘Conservation’ of the original analog objects (generally the domain of conservators, museum curators, librarians, and collection managers) is somewhat different, as it involves the care and repair required to keep the physical object in existence. However, there is some overlap between the two: objects are often digitized to help conserve the original object by reducing wear and tear on fragile and vulnerable materials, while still allowing access to them.
Preservation of digital objects can be thought of as a long-term responsibility to the digital file: a responsibility to ensure that the file retains the information it had when it was created. Keeping the content ‘alive’ over time entails much more than simply keeping the files intact. Issues such as metadata must be understood as well. These issues are explored more fully in the rest of the section. There are two distinct areas of intention here, although in practice they often merge: preserving the digital surrogate, so that its content remains accessible and usable over the long term; and using digitization as a method of preserving objects which are valuable, fragile or transient. For example:
Therefore the digital surrogate is a form of preservation (although not a substitute for any other form of preservation) and must itself be preserved to ensure future access and use (see Section XI on sustainability). The project should consider the two areas: digital reformatting and preserving content in digital form.
Digital preservation is an essential aspect of all digital projects: it is imperative that the digital objects created remain accessible for as long as possible both to intended users and the wider community. Decisions made throughout the digitization process will affect the long-term preservation of the digital content. While the processes involved in digital preservation are the same whether for digital surrogates or for born-digital objects, remember that born-digital objects are the only version of an artifact or object so are, in effect, the preservation copy. All these factors must be incorporated in your digital preservation strategies.
CEDARS (http://www.leeds.ac.uk/cedars/) has described the digital preservation process as follows:
Digital preservation is a process by which digital data is preserved in digital form in order to ensure the usability, durability and intellectual integrity of the information contained therein. A more precise definition is: the storage, maintenance, and accessibility of a digital object over the long term, usually as a consequence of applying one or more digital preservation strategies. These strategies may include technology preservation, technology emulation or data migration. There is a growing wealth of information on digital preservation and related issues available on the Web.
The term ‘accessibility’ as used above deserves further glossing. It is important to distinguish between what we might term ‘machine accessibility’, involving the data integrity of the original files and digital objects, and human accessibility, or the ability to render the object intelligible and usable by a human reader. The latter is the more complex challenge, since it requires not only that all the original bits be present in the right order, but also that there be software available that can interpret them and display them appropriately.
Digitization projects are costly and time consuming, so it is essential that the digitization process should not need to be repeated, incurring more expense and subjecting the originals to further stress. The process may not even be repeatable: some projects we interviewed do not keep the original object. The Library of Congress, for example, converts newspapers to microfilm in certain cases, discarding them when the process is complete. The University of Michigan and Cornell have both discarded original objects for a variety of reasons, such as maximizing shelf space. Now the digital files are the de facto preservation copy, so it is even more important that they be suitable for long-term use and remain accessible for future generations.
Preservation should not be considered as an ‘add-on’ to the whole digitization process; the decisions taken throughout — choosing file formats, metadata, and storage — should all consider the long-term sustainability of the digital objects.
The projects surveyed saw digitization not as a way of creating preservation surrogates from original analog materials, but rather as a reformatting exercise designed to enhance access to and usability of materials. However, as noted above, projects like those at the University of Michigan, Cornell, or the Library of Congress, which discard the analog originals following digitization, must treat the digital version as a preservation surrogate. Furthermore, any project dealing with unstable materials must have similar concerns, since their digital version may in time become the only record of the original. Nonetheless, all of the projects interviewed recognized that they were creating assets that need to be preserved in their own right. Ensuring their long-term availability was particularly important where scholarly work depended upon or referenced them. As the limitations of microfilm become more apparent, many in the cultural heritage community are recognizing digitization as a means of preservation of digital content as well as improving access. Remember that here we are not so much concerned with conservation of the original but with preservation of the digital surrogate itself.
The starting point in planning digital preservation is to identify what can actually be preserved. Various levels of digital preservation are possible, from the ‘bucket of bits’ or bitstore (the storage of files in their original form with no plan to further reformat them for different forms of accessibility or searchability) through to preserving the full searchable functionality of an image or audio system. Each project must decide what is crucial, manageable and affordable. Consider a range of options, from the absolute minimum up to the ideal. If you decide to preserve the minimum, or the bucket of bits, then migration of software and operational systems will be less of a concern. This option might be attractive if your goal is simply to preserve archived data created by others, where your role is merely to keep the data, not to provide an ongoing framework for active use. However, if you decide to preserve the whole system you will need further migration of data to ensure usability in the future. The basic minimum will depend on each project, its rationale and its long-term user requirements. There is no limit to what can be preserved in digital format, apart from the limitations imposed by migration, storage space and other resources, such as staff time.
There are four main issues to consider in formulating your preservation strategy.
The first is software/hardware migration. All products of digital preservation must be migrated at some point, at the very least to a file format that the latest technology can recognize. If you have chosen to preserve the whole system, then operating systems and functional software must be migrated as well. This can cause problems as upward compatibility is notoriously uncertain, even from one version of software to the next, and there is no guarantee that the version you are using will be compatible with releases in many years to come. Full system migration must be carried out frequently to ensure access and usability. You will need to formulate a migration policy that is implemented on a regular basis rather than as a reaction to new software or hardware. Regular testing after migration is also crucial to ensure that functionality has been preserved.
The second issue concerns the physical deterioration of digital media. All digital media deteriorate over time, but this process will be more rapid if they are stored in an inappropriate way, such as in a damp basement, or as a pile of CDs stacked one on top of another. Correct storage (e.g. in racks that enable the disks to be stored separately) and an environmentally controlled location will help to optimize their stability and protect them from loss. The Genealogical Society of Utah, for example, stores the archival digital master files in an underground vault in Utah as well as in another state in the USA. Tape storage is useful, as there is strong evidence for the extended lifetime of this medium when stored correctly; remember that all tapes must be spooled regularly to avoid sticking. As with software and hardware migration, digital media should be checked and refreshed regularly to ensure that the data are still readable, and this process should be part of your preservation policy. All institutions should draw up a policy that reflects the value of the digital material and therefore sets out how regularly the media should be checked and replaced. This aspect of the preservation policy is linked to hardware migration, as some media formats may no longer be readable by the latest computer hardware. Preserve your data on a medium where the hardware exists to transfer to later media if the original becomes obsolete. Remember that it is costly to use a data recovery agent to move files from an obsolete medium, so make sure your preservation policy will prevent this happening, and migrate while the process is still straightforward.
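The regular checking described above is often implemented by recording a checksum for every file and recomputing it later. The following is a minimal sketch under stated assumptions, not a prescribed procedure: the function names are hypothetical, SHA-256 is one reasonable choice of checksum, and the manifest is simply held in memory rather than stored alongside the media.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Map each file path (relative to root) to its checksum."""
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def verify_manifest(root: Path, manifest: dict[str, str]) -> list[str]:
    """Return the relative paths that are missing or whose contents have changed."""
    problems = []
    for rel_path, expected in manifest.items():
        target = root / rel_path
        if not target.is_file() or sha256_of(target) != expected:
            problems.append(rel_path)
    return problems
```

A manifest built when the masters are first written can be rerun against the same medium at each scheduled check; any path returned by `verify_manifest` signals that the copy should be restored from the second set.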
The third issue concerns metadata, which is crucial to preservation of digital resources. The level of metadata recorded and stored will, once again, depend upon what you choose to preserve, the minimum content or the fully functional system. The format of the metadata is also important, as the metadata should be as accessible as the data. ASCII will not need to be migrated, so it is the most durable format for metadata, although it lacks the functionality that a database might give. SGML/XML files (including HTML) are stored in ASCII format and provide a high level of structure and functionality without requiring proprietary software. The site structure and the relationships between the files should be recorded with a METS format XML file, and the other kinds of files (e.g. images, databases) that make up the site should, if possible, have preservation copies made. Word-processing files should be copied as ASCII text files, and if possible converted to XML to preserve their structure. All databases and Excel spreadsheets should be exported as comma-delimited or tab-delimited ASCII text files. PowerPoint files should be exported to HTML with gifs, with those gifs potentially re-saved as separate TIFFs. GIS datasets should be exported as comma-delimited ASCII.
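As an illustration of exporting a database to tab-delimited ASCII, the sketch below writes one table to a plain text file. It assumes the data happen to live in SQLite (chosen only because Python can read it without extra software); the function name and the example table are hypothetical. Because the output encoding is strict ASCII, any non-ASCII character raises an error rather than being silently altered, so a project holding such characters would need to decide on a different encoding.

```python
import csv
import sqlite3
from pathlib import Path


def export_table(conn: sqlite3.Connection, table: str, out_path: Path) -> int:
    """Write one table to a tab-delimited text file and return the row count."""
    # Table names cannot be bound as SQL parameters, so quote the identifier.
    quoted = '"' + table.replace('"', '""') + '"'
    cursor = conn.execute(f"SELECT * FROM {quoted}")
    header = [column[0] for column in cursor.description]
    rows = 0
    # encoding="ascii" makes non-ASCII characters fail loudly.
    with out_path.open("w", encoding="ascii", newline="") as handle:
        writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
        writer.writerow(header)  # keep column names with the data
        for row in cursor:
            writer.writerow(row)
            rows += 1
    return rows
```

Writing the column names as the first line keeps a minimum of structural metadata with the exported data.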
A fourth and more complex issue is the question of user needs and preferences, which may cause certain formats to become effectively obsolete even while they remain technically functional. For instance, user discontent with microform readers threatens to make microfilm obsolete even though it still fulfills the original goals of its creation. User acceptance—and its decline—will be one of the key “trigger events” that will compel migrations to new delivery versions of digital collections.
Careful consideration of the technical issues will help you to ensure the long-term accessibility of your digital material.
As outlined above, the physical media on which the digital material is stored need to be managed in monitored, controlled systems and environments to avoid physical deterioration through human mishandling and inappropriate storage. Magnetic media such as digital audio and video tape are highly vulnerable to fluctuations in temperature and humidity, and of course to magnetic fields. Optical media such as CD-ROMs and DVD-ROMs are more durable, but should still be stored with care. For all media, it is advisable where possible to store a second full set in an off-site location, to guard against the risk of theft, fire, and similar disasters.
Choice of medium is equally important. Formats change rapidly and obsolescence is a perennial problem; it may be very difficult in the future to find a drive that can read ZIP disks, just as now it may be difficult to find readers for certain kinds of magnetic tapes. No one format can be guaranteed to persist and remain accessible in years to come, so you need to anticipate continuing developments and new products. Among currently available media, DVD, DLT (Digital Linear Tape) tape and CD-ROM are the most popular. Tape has a tried and tested longevity, while DVD and CD-ROM are newer media and current research has not yet agreed on their likely lifespan. Best practice in preservation is to migrate data from one medium to another, for example from optical disc to tape, while the hardware and software are still available. Migration is an integral part of any digital project and should be set out in your preservation policy.
If projects choose to make back-up or archive copies on fixed media, then it is crucial to ensure that the fixed media are stored correctly and that migration to the latest type (e.g. DVD) is carried out regularly.
A similar obsolescence problem will have to be addressed with the file formats and compression techniques you choose. Do not rely on proprietary file formats and compression techniques, which may not be supported in the future as the companies which produce them merge, go out of business or move on to new products. In the cultural heritage community, the de facto standard formats are uncompressed TIFF for images and PDF, ASCII (SGML/XML markup) and RTF for text. Migration to future versions of these formats is likely to be well-supported, since they are so widely used. Digital objects for preservation should not be stored in compressed or encrypted formats.
While TIFF and PDF are de facto standards, it must be remembered that they are proprietary formats with multiple ‘flavors.’ Not all image software can represent TIFF files accurately and Acrobat is required to view PDF files. Projects should monitor the changes in these formats and ensure that users will be able to access the files over time, either by migration or emulation.
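Monitoring formats presupposes knowing which formats a repository actually holds. The sketch below counts files by type using the leading "signature" bytes of each file rather than its extension. The function names are hypothetical and the signature list covers only a few formats mentioned in this section; it does not distinguish between the ‘flavors’ of TIFF or PDF, which would require reading further into the file.

```python
from collections import Counter
from pathlib import Path

# Leading bytes that identify some common formats.
SIGNATURES = [
    (b"II*\x00", "TIFF"),          # little-endian TIFF
    (b"MM\x00*", "TIFF"),          # big-endian TIFF
    (b"%PDF-", "PDF"),
    (b"\xff\xd8\xff", "JPEG"),
    (b"\x89PNG\r\n\x1a\n", "PNG"),
    (b"GIF87a", "GIF"),
    (b"GIF89a", "GIF"),
]


def identify(path: Path) -> str:
    """Guess a file's format from its first eight bytes."""
    with path.open("rb") as handle:
        head = handle.read(8)
    for magic, name in SIGNATURES:
        if head.startswith(magic):
            return name
    return "unknown"


def format_inventory(root: Path) -> Counter:
    """Count the files under root by detected format."""
    return Counter(identify(p) for p in root.rglob("*") if p.is_file())
```

Files reported as "unknown" are the ones most worth a closer look, since they may be in formats that no one on the project is tracking.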
The readability of your files is dependent on application software and operating systems. Consideration of file format selection must take this into account. Access software, operating systems and user interfaces are subject to the same process of continuing development and change as storage media and file formats. New versions of operating systems will affect the usability of your files, even though the file format itself may still be accessible. Remember that your operating system may well be proprietary software; unless you are using a non-proprietary system such as Linux, you will have to purchase the more recent version and migrate the files.
It is advisable to separate the underlying content from the discovery and display systems.
Migration will be more complex for those projects that have chosen to preserve the complete integrated system, with full searchable functionality, as there are more features to capture and preserve. Backwards compatibility is often limited and so, as with media migration, these programs should be monitored closely to enable migration before obsolescence makes this impossible. For example, you may be using proprietary software which has regular (often yearly) upgrades. The versions may eventually change so much that backwards compatibility is no longer viable, resulting in, at best, a long and expensive process of data extraction and migration, and at worst, the loss of the data altogether. To ensure accessibility, migrate every file to every new version. The use of standard data formats such as SGML and XML can be an advantage here, since it keeps the data independent from the delivery software. Even if the latter becomes unsupported or if cost forces you to switch to a different delivery system, your data will remain intact and portable.
Preservation of Electronic Scholarly Journals: http://www.diglib.org/preserve/presjour.html
This project, a collaboration between CLIR, CNI and DLF working with the Andrew Mellon Foundation, aims to develop plans for affordable preservation strategies that will implement the OAIS reference model.
Monitor the user interfaces, too. Scripting languages, plug-ins, and applets may all become obsolete over time and users will not be able to access your materials.
In general, remember that standards are the key to creating a resource that is interoperable (the ability of the resource to be used by a different system) as well as sustainable over time. Relying on proprietary software or formats that are not accepted community standards may mean that you will have to migrate more frequently and in isolation. All projects will have to migrate their digital files at times, but by using standards this process will be less frequent and may take place when all projects are migrating data in response to a general change, such as that from SGML to XML, making it easier for everyone.
The long-term view taken by institutions will also influence the sustainability of digital objects. Projects that have established preservation policies and procedures, such as Cornell and the Online Archive of California, are identifying preservation issues and placing them at the heart of their institution’s ongoing procedures. Insufficient institutional commitment to long-term preservation can create digital resources with limited sustainability.
Similarly, human and financial resources play a huge role in implementing such policies and procedures. Staff turnover and variations in usage will influence the level of commitment to individual projects. Unsatisfactory record keeping and metadata, administrative as well as preservation and access, can also contribute to institutional neglect of digital materials. Moving from individual projects to organizational digital programs will facilitate the long-term preservation of digital materials, by providing greater organizational stability and the overhead necessary to maintain good records and documentation.
Whether you have opted to preserve the minimum content or the whole discovery and display system, policies should be in place to ensure the long-term sustainability and accessibility of the digital files you have chosen to be preserved.
We found that attitudes to digital preservation vary enormously. Many of the major digitization projects are using research sources, such as D-Lib Magazine (http://www.dlib.org/), to help them understand the trends and strategies in digital preservation. At the same time, we found that others did not know what digital preservation was and were uncertain of its importance for their projects. There are many questions and no single answer to the preservation issue. No one project is satisfied that it has found the best storage and migration policy to ensure longevity. The Genealogical Society of Utah project is taking a larger view of the issues through its research with Brigham Young University to identify storage media and standards that will help ensure longevity of digital surrogates. All the projects agreed that adopting standards in creating digital objects would enable their sustainability. Each project will choose what it will preserve based on the level of resources available for storage and migration as well as user needs.
Research is being carried out that will help you make the right decisions for your digital resources. Projects such as CEDARS have attempted to establish strategies for ensuring digital preservation through technology preservation, technology emulation and data migration. VidiPax and other companies are addressing the technical media side through research into restoring and archiving digital media, while organizations such as NARA are addressing both traditional concerns (records management) and new technical standards. Internationally, the Digital Preservation Coalition in the UK (http://www.jisc.ac.uk/dner/preservation/prescoalition.html) is working to foster joint action to address the urgent challenges of securing the preservation of digital resources in the UK and to work with others internationally, and Preserving Access to Digital Information (PADI), a project of the National Library of Australia, maintains a useful web site with international research results and recommendations (see Link Box below).
The key to understanding and implementing digital preservation is to view it as a series of tasks rather than as one process. These tasks, if properly carried out, will help ensure that digital objects are not only correctly stored but also adequately maintained and usable over time.
Although there are others, six methods of digital preservation are widely discussed among heritage professionals. They are not mutually exclusive, and you may decide on one or a mixture of methods.
Technology preservation relies on preserving the technical environment in which the digital objects were created, both hardware and software. This approach is more relevant to projects that have chosen to preserve the full system with functionality and interface as well as the actual content. The digital files must be preserved as well as the computer and software. Maintaining hardware for long-term use is problematic, and it is also costly to store the physical machinery as well as the digital files. This method would apply to projects that wish to maintain the environment for historical use (a computer museum, for example) and may not be appropriate for many cultural heritage projects.
Technology emulation also seeks to preserve the technical environment, but instead emulates the environment on current technology, mimicking the software and even the hardware. Everything has to be emulated, including operating systems, scripts, and applications. This is unlikely to be a viable approach for most digital imaging projects. HATII’s Digital Archaeology Report for the British Library (http://www.hatii.arts.gla.ac.uk/Projects/BrLibrary/index.html) has more detailed explanations of emulation.
Data migration focuses on maintaining the objects in a current format and may involve migrating all files to a newer format (for example in 2001, from JPEG to JPEG 2000) when current software no longer supports the earlier file format. Data migration is a viable option for cultural heritage programs and will involve making certain decisions, such as how often to migrate and what format to migrate to. It can be a time-consuming process, involving inevitable loss of data as well as sacrificing an element of the ‘look and feel’ of the original material. Using community standards at least gives the reassurance that every project will be migrating data of similar types at roughly the same point in time, making it more likely that tools and support will be available. The Archivo de Indias project has demonstrated that it is possible to migrate millions of documents from one file format to another, provided you design the work program effectively.
Enduring care encompasses all aspects of caring for the digital objects and the media they are stored on, including housing in secure locations and careful handling. This is short-term care and projects must consider migration and refreshing for longer-term preservation.
Refreshing is the process of moving data from one medium to another, e.g. from CD-R to DVD. It is not a total guard against obsolescence and should be seen as part of the whole digital preservation strategy.
‘Digital Archaeology’ describes what to do in a ‘worst case’ scenario, where a digital preservation policy has not been followed, or an unforeseen catastrophe has damaged the media. As described in the HATII report on Digital Archaeology, this is the process of rescuing content from damaged media or from obsolete or damaged hardware/software.
Data migration should not be confused with ‘data refreshing’. Data migration is the act of moving the actual content of the digital files to a newer file format (e.g. from Access 97 to Access 2000), while data refreshing is moving the files (no matter what format they are in) to a new physical medium, e.g. from CD to DLT tape, either because of obsolescence (e.g. laser discs) or because of physical deterioration.
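Because refreshing leaves the file format untouched, it can be checked mechanically: every file on the new medium should be byte-for-byte identical to its source. The following is a minimal sketch of that idea, assuming both media are mounted as ordinary directories; the function name and directory layout are hypothetical. Note that a comparison made immediately after copying may be satisfied from the operating system's cache, so a careful workflow might remount the new medium before verifying.

```python
import filecmp
import shutil
from pathlib import Path


def refresh(source_root: Path, target_root: Path) -> list[str]:
    """Copy every file to the new medium; return paths whose copy does not match."""
    mismatches = []
    for source in sorted(source_root.rglob("*")):
        if not source.is_file():
            continue
        relative = source.relative_to(source_root)
        target = target_root / relative
        target.parent.mkdir(parents=True, exist_ok=True)
        # copy2 copies the bytes and timestamps; the file format is not changed.
        shutil.copy2(source, target)
        # shallow=False forces a comparison of the file contents.
        if not filecmp.cmp(source, target, shallow=False):
            mismatches.append(relative.as_posix())
    return mismatches
```

Data migration, by contrast, changes the bytes by design, so it cannot be verified this way and needs format-specific checks of content and functionality.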
Each project will have to identify the best storage strategy for its digital masters. This will include the choice of medium and the choice of institution/organization entrusted with managing that medium. The benefit of choosing a management service is that you can forgo the responsibility and effort of choosing the medium. The benefit of investing the time to choose the medium is that you enjoy total control over your collection. Favored media vary from project to project and depend upon costs, access issues, physical storage space and technical equipment available. An in-depth examination of the technical issues of digital media is beyond the scope of this Guide. For a fuller explanation of these issues see the HATII report on Digital Archaeology cited above.
The formats used by the projects interviewed for this Guide include:
CD-ROM (including CD-R)
DAT Tape
DVD (both DVD-Video and DVD-ROM)
DLT Tape
RAID Server
A mixture of the above media
Projects using audio-visual material will need to use a medium that can deal with the large file sizes created when digitizing such materials as well as enable access to the stream of data. DVD and DV Stream are the most popular media for these formats.
CD-ROM is a popular storage medium because it is cheap and simple to process. Projects such as the Online Archive of California, the Digital Imaging and Media Technology Initiative of the University of Illinois and the Colorado Digitization Project use CD to archive their digital materials. CD writers are easy to install and can either be run from the machine that is temporarily storing the data or used to back up from the server that holds the access data. CD-R has less storage capacity than DLT tape or DVD: a CD-R can hold approximately 640 MB of data while a DLT tape holds 35-70 GB. Magnetic tape is known to be stable for approximately 30 years under good storage conditions. DLT tapes, for instance, should survive for a million head-passes over a 30-year period. Manufacturers claim that CD-Rs will last from 50 to 200 years, but experience indicates that even after as little as 10 years they are beginning to suffer from drop-outs. Media lifespans are an unknown quantity, so you should count on having to migrate and refresh the data regularly. These figures are based on climate- and temperature-controlled storage conditions and safe environments. Lifespan also assumes that the media are free from manufacturing errors and are not mishandled.
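As a rough illustration of what these capacities mean in practice, the arithmetic below estimates how many units of each medium one copy of a collection would need. The 500 GB collection size is a hypothetical figure, not one reported by any project, and the sketch assumes 1 GB = 1024 MB and ignores file-system overhead.

```python
import math

# Capacities quoted in the text above, in GB (assuming 1 GB = 1024 MB).
CAPACITY_GB = {
    "CD-R": 640 / 1024,
    "DLT (low estimate)": 35,
    "DLT (high estimate)": 70,
}


def units_needed(collection_gb: float, capacity_gb: float) -> int:
    """Smallest whole number of media units that can hold the collection."""
    return math.ceil(collection_gb / capacity_gb)


def estimate(collection_gb: float) -> dict[str, int]:
    """Units of each medium required for one copy of the collection."""
    return {
        name: units_needed(collection_gb, capacity)
        for name, capacity in CAPACITY_GB.items()
    }


if __name__ == "__main__":
    # Hypothetical 500 GB collection of digital masters.
    for medium, count in estimate(500).items():
        print(f"{medium}: {count}")
```

The number of units matters for preservation as well as cost, since every disc or tape is a separate item to store, check and eventually refresh.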
You can act to improve the storage life of your media. We recommend that you handle media with care and that you check it regularly. The following two tables give examples of how to improve the lifespan of various storage media.
Improving the lifespan of CDs and DVDs:
| Avoid | Never | Always |
|---|---|---|
| Damage to the upper and lower surfaces and edges of the disc | Attach or fix anything to the surface of the CDs | Store media in a jewel case or protective sleeve when not in use |
| Scratching and contact with surfaces that might result in grease deposits (e.g. human hands) | Write on any part of the disc other than the plastic area of the spindle | If using sleeves, use those that are of low-lint and acid-free archival quality |
| Exposing discs to direct sunlight | | Wear gloves when handling the master CDs |
Improving the lifespan of DLTs:
| Avoid | Never | Always |
|---|---|---|
| Placing the tapes near magnetic fields | Stack the tapes horizontally | Keep tape in its protective case when not in use |
| Moving the tapes about | Put adhesive labels on the top, side or bottom of cartridge | Move tapes in their cases |
| | Touch the surface of the tape | Store the tapes in appropriate |
| | Put a tape that has been dropped in a drive without first visually inspecting it to make certain that the tape has not been dislodged or moved | Store the tapes vertically |
There are two types of DVD: DVD-Video and DVD-ROM, which can be compared respectively to the audio CD and the CD-ROM. DVD-Video can contain a whole film with digital audio. The DVD-ROM stores computer-readable data.
Many larger projects store two copies of the media, in different storage facilities. Both the GSU and SHOAH use underground vaults with temperature controls and high security in two geographically remote locations. SHOAH has the added risk factor of earthquakes and has chosen its second site in Washington DC. Not all projects can afford such facilities and the additional security that they provide, and will simply store one copy on site. We recommend, however, that all projects store archived master files off-site where possible to guard against natural and human disasters.
We found that many projects did not check archived data for refreshing and did not have a preservation strategy at all, although many are developing them.
There is a need for better long-term storage formats. The GSU project is conducting very interesting research into media for digital preservation. Through their work with BYU and Norsam (http://www.norsam.com/), they are examining media such as the HD-ROM, which has the following manufacturer’s claims:
HD-ROM and the "ET" System are planned to have the following specification features: (ET is Electrical Transport Data Storage System)
The file formats you choose are crucial to ensuring data migration. In 2001, uncompressed TIFF is the preferred raster image format for digital masters. Migration from this format may be inevitable to ensure long-term access to the materials. However, TIFF is widely supported and software manufacturers may continue to support it for the foreseeable future. It should be monitored carefully. TIFF is platform-independent. Archival copies should be stored uncompressed.
JPEG, JPEG 2000, MrSid, and GIF formats—which may be well suited for delivery—are not considered to be as well suited as TIFF for preserving digital master images, for two main reasons. All of these formats incorporate compression algorithms that offer potential benefits for storage and networked delivery, but increase the requirements for both monitoring and migration. Compression is one more thing to worry about when tracking the availability of software that can read and write formats without introducing information loss; thus, it is possible that intervals of migration will be more frequent for compressed formats rather than for uncompressed TIFF. In addition, because these formats employ lossy compression algorithms, there are opportunities for information loss at the time of capture and in subsequent migration activities. Other formats with lossless compression, such as PNG, are not widely used yet because of their relatively limited support within software packages and image viewers. TIFF, while not an open standard in the strict sense, is the de facto standard format for digital masters within the community, but not necessarily for delivery, given the fact that standard web browsers will not display TIFF images.
Formats used for preservation (archival) copies of audio-visual material include WAV, AIFF, and MPEG (various versions). RealMedia (both audio and video) and QuickTime are used for streaming data to users but not as archival formats.
The structure of digital objects is just as important as media and format in ensuring their longevity.
Digital objects stored in a proprietary format, such as a proprietary database, are difficult to migrate forward; proprietary software often has a short lifespan and is vulnerable to changes in the company that owns and supports it. A proprietary database can be unreadable within five years. You can minimize this risk by exporting your data in a standard format (such as tab-delimited ASCII). Similarly, use standard encoding schemes to structure your digital objects in order to minimize the risks of data obsolescence. Encoding standards such as SGML or XML are platform- and software-independent, and many projects, including some of the larger ones such as those at the Library of Congress, Cornell, Michigan, the Online Archive of California, and Virginia, have adopted these formats.
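As an illustration of exporting data to a standard format, the following minimal sketch writes a database table out as tab-delimited text. It is not drawn from any of the projects interviewed. It assumes, purely for the example, that the data can be reached through Python's built-in sqlite3 module; the table name `items`, its columns, and the function name `export_table_to_tsv` are hypothetical.

```python
import csv
import io
import sqlite3


def export_table_to_tsv(conn, table, out):
    """Write every row of `table` to `out` as tab-delimited text with a header row."""
    # Quote the identifier so an unusual table name does not break the query.
    quoted = '"' + table.replace('"', '""') + '"'
    cursor = conn.execute(f"SELECT * FROM {quoted}")
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    # cursor.description holds one tuple per column; the first item is its name.
    writer.writerow([column[0] for column in cursor.description])
    writer.writerows(cursor)


if __name__ == "__main__":
    # Hypothetical sample data standing in for a real collection database.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE items (id INTEGER, title TEXT, master_file TEXT)")
    conn.executemany(
        "INSERT INTO items VALUES (?, ?, ?)",
        [(1, "Map of Ohio", "map001.tif"), (2, "Letter, 1862", "let002.tif")],
    )
    buffer = io.StringIO()
    export_table_to_tsv(conn, "items", buffer)
    print(buffer.getvalue())
```

The header row keeps the column names with the data, so the exported file remains interpretable even if the original database software is no longer available.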
One of the most important factors in ensuring digital preservation is the creation of robust metadata to describe the process of digitization as well as the actual digital objects themselves. Administrative metadata are particularly associated with digital preservation and the retrieval of the images from the archived media. This facilitates both short-term and long-term management and processing of digital collections and includes technical data on creation and quality control. Administrative metadata also handle issues such as rights management, access control and user requirements as well as preservation strategy information. The administrative metadata should include technical data that can be of potential use to systems (e.g., for machine processing tasks during migration), and to people, such as administrators monitoring the number and types of formats in the repository, or even to users inquiring about the provenance of a digital object. Having metadata in place will facilitate digital asset management, by helping to manage the digital files, their location and their content. Also see Section XIII on Digital Asset Management.
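The kind of technical data described above can be captured automatically at the time a file is archived. The sketch below is one possible approach under stated assumptions, not a recommended schema: the field names (`file_name`, `size_bytes`, `format`, `sha256`, `rights`, `recorded_at`) and the function name `technical_metadata` are hypothetical, and a real project would map them onto whatever metadata standard it has adopted.

```python
import hashlib
import json
import os
import tempfile
from datetime import datetime, timezone


def technical_metadata(path, file_format, rights_note):
    """Return a small administrative metadata record for the file at `path`."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read in chunks so large master files do not have to fit in memory.
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return {
        "file_name": os.path.basename(path),
        "size_bytes": os.path.getsize(path),
        "format": file_format,            # supplied by the operator, not detected
        "sha256": digest.hexdigest(),     # checksum for later integrity checks
        "rights": rights_note,
        "recorded_at": datetime.now(timezone.utc).isoformat(),
    }


if __name__ == "__main__":
    # Hypothetical stand-in for a digital master file.
    with tempfile.NamedTemporaryFile(suffix=".tif", delete=False) as tmp:
        tmp.write(b"example image bytes")
    try:
        record = technical_metadata(tmp.name, "TIFF (uncompressed)", "Rights held by the institution")
        print(json.dumps(record, indent=2))
    finally:
        os.remove(tmp.name)
```

Storing a checksum alongside the format and size gives both systems and administrators a way to confirm, after a migration or a media refresh, that the file is still the one that was originally described.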
Some projects are adopting an institutional approach to metadata.
While administrative metadata are crucial to the long-term usability of digital materials, structural and descriptive metadata are also critical to the digital preservation process. All projects recognize the importance of good, robust metadata and have implemented schemes that facilitate the maintenance and accessibility of the resources. The importance of high-quality metadata cannot be overestimated: it can ensure that when data migration occurs, the original information about the digital object is not lost and can be recreated if necessary.
Over the last few years, those involved in developing and delivering digital objects have realized that preservation issues go far beyond questions of preserving bits and dealing with obsolete media and file formats. Indeed, there is widespread recognition of the need to develop trusted digital repositories that can ensure not only migration of media and formats, but also the preservation of relevant information about digital objects so that we can preserve their behaviors as they are migrated forward in time and make meaningful use of them in the future. We need to preserve not just the byte stream, but also the structure of the digital object and its context. Discussions about the attributes of digital repositories have crystallized around the emerging ISO standard, “Reference Model for an Open Archival Information System,” originally developed by NASA to deal with the huge volume of digital material developed by the space program. (In May of 2002, RLG-OCLC issued a report based upon this model entitled “Trusted Digital Repositories: Attributes and Responsibilities.”) The Draft standard states:
An OAIS is an archive, consisting of an organization of people and systems, that has accepted the responsibility to preserve information and make it available for a Designated Community.... The model provides a framework for the understanding and increased awareness of archival concepts needed for long-term digital information preservation and access, and for describing and comparing architectures and operations of existing and future archives.
OAIS is a reference model, not an implementation plan. Libraries and archives are just now beginning to design digital repositories. [1] As they look to implement OAIS systems, metadata concerns loom large. Long-term preservation of digital objects in a repository requires that we develop and maintain structural and administrative as well as descriptive metadata about those objects. The METS initiative has gained much attention this past year as a flexible way to address these needs. The FEDORA project, which is complementary to METS, is aiming to develop behavior metadata as well. At this time, creators of digital objects and projects do not yet have full guidance on what digital repositories will need to secure their objects for the long term. In particular, much of the progress to date has focused on ingesting, storing, and preserving individual digital objects and much more research needs to be done on the long-term needs of complex digital creations. Nonetheless, attention to the emerging structure of repositories and their metadata needs will definitely increase the likelihood and ease with which digital objects can be incorporated into repositories and preserved for long-term access.
One of the major issues involved in creating trusted repositories is that of responsibility. To what extent will creators and local providers need to cede control over their digital objects to centralized repositories? The advantages that large, financially secure institutions can provide in terms of long-term preservation are unquestionable. Nonetheless, research efforts, especially amongst the digital library community, are also focused on developing distributed systems. Indeed, there is a growing sense that “lots of copies” will be an important preservation and authentication strategy for the foreseeable future. With the ever-declining cost of storage space, redundant distributed systems with shared responsibilities have many advantages that militate against centralized control.
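To make the idea of redundant copies concrete, the following sketch compares checksums of several copies of the same object and reports which copies disagree with the majority. It is a deliberately simplified illustration and does not reproduce the protocol of any particular distributed preservation system; the location names and the function name `audit_copies` are hypothetical.

```python
import hashlib
from collections import Counter


def audit_copies(copies):
    """Compare copies of one object held at different locations.

    `copies` maps a location name to the bytes stored there. Returns the
    SHA-256 digest shared by the majority and a sorted list of locations
    whose copy differs from it.
    """
    if not copies:
        raise ValueError("no copies supplied")
    digests = {name: hashlib.sha256(data).hexdigest() for name, data in copies.items()}
    consensus, votes = Counter(digests.values()).most_common(1)[0]
    # Without a strict majority there is no basis for trusting any one version.
    if votes <= len(copies) / 2:
        raise ValueError("no majority agreement among copies")
    damaged = sorted(name for name, value in digests.items() if value != consensus)
    return consensus, damaged


if __name__ == "__main__":
    # Hypothetical holdings: one copy has suffered a single corrupted byte.
    original = b"digital master content"
    holdings = {
        "library_a": original,
        "library_b": original,
        "library_c": b"digital master c0ntent",
    }
    digest, damaged = audit_copies(holdings)
    print("consensus digest:", digest)
    print("copies needing repair:", damaged)
```

With three or more independently held copies, a damaged copy can be both detected and repaired from the others, which is the practical advantage of redundancy over a single centralized store.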
Digital preservation is a discipline that is still in its infancy. Much of the preceding discussion provides a general description of various conceptual approaches to digital preservation, and also provides guidance for good custodianship (in the present) of existing digital files and digital media. However, it is expected that, in the near future, digital preservation will be executed within the context of enterprise-wide digital repository systems. This is an area of ongoing research and development. For example, in the United States, there is the new National Digital Information Infrastructure and Preservation Program (Friedlander 2002), under which the Library of Congress will lead a national planning effort for the long-term preservation of digital content and will work collaboratively with representatives of other federal, research, library, and business organizations.
Key factors in preservation are:
[1] Examples of projects using OAIS are CEDARS, NARA, NEDLIB and PANDORA. A chart for the deposit system for electronic publications can be found at: http://www.dlib.org/dlib/september99/vanderwerf/09vanderwerf.html. The CEDARS project aims to produce strategic frameworks for digital collection management policies, and to promote methods appropriate for long-term preservation of different classes of digital resources, including the creation of appropriate metadata.