This appendix brings together material from various sections of the Guide, in expanded form, to provide a detailed description of the kinds of metadata and metadata standards that are of greatest importance to the cultural heritage sector. It has already been observed, but is worth repeating, that metadata is a crucial part of any cultural heritage digitization project and should not be neglected. Without metadata identifying its source, contents, and details of creation at an absolute minimum, a digital object is useless. Capturing additional information to facilitate rights management, administrative tracking, preservation, and distribution can enable you to get much more powerful use from your digital materials with much less difficulty.
Metadata is literally “information about the data”: information created about the source material and the digital version to record the essentials of their identity, creation, use, and structure. Its purpose is to facilitate the discovery, use, management and reusability of digital material. Metadata can be usefully divided into three categories: descriptive, administrative and structural. These are not rigidly bounded groups and they frequently overlap.
Descriptive metadata describes and identifies information resources, to facilitate searching, retrieval, and management. It typically includes basic bibliographic information such as the creator, title, creation date; catalog information such as accession or other identifying numbers; and topic information such as keywording. Examples of descriptive metadata include Library of Congress Subject Headings, Categories for the Description of Works of Art (CDWA) (http://www.getty.edu/research/institute/standards/cdwa/), the Art and Architecture Thesaurus (AAT) (http://www.getty.edu/research/tools/vocabulary/aat/), and the Dublin Core metadata set.
Administrative metadata is used to facilitate management, tracking, migration and re-use of digital assets. It typically includes information on creation, quality control, rights and preservation. See Cornell University’s “Web Hub for Developing Administrative Metadata for Electronic Resource Management” (http://www.library.cornell.edu/cts/elicensestudy/home.html). The term “technical metadata” is also used in a similar sense to indicate metadata about data capture and the technical characteristics of the images.
Structural metadata describes the internal structure of digital resources and the relationships between their parts. It is used to enable navigation and presentation. Examples of structural metadata are included in the METS standard (http://www.loc.gov/standards/mets/) and SMIL (Synchronized Multimedia Integration Language) (http://www.w3.org/TR/REC-smil/).
Metadata for textual resources is in some ways the most straightforward to create, because it can be captured in the same format as the digital object itself, and can be included directly in the digital file, for instance as a header section in an SGML/XML-encoded document. EAD, TEI, and HTML all include a header element of varying scope; the HTML header provides for the inclusion of basic Dublin Core metadata terms, while the EAD and TEI headers provide for more extensive information about both the electronic file and the information it captures or describes.
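As a minimal sketch of how basic Dublin Core terms might be placed in an HTML header, the following Python function renders a record as <meta> tags. It assumes the common convention of prefixing element names with "DC." and declaring the prefix with a schema.DC link; the function name and the sample record are hypothetical and used only for illustration.

```python
from html import escape


def dublin_core_meta_tags(record):
    """Render a dict of Dublin Core element name -> value as HTML <meta> tags.

    Assumes the "DC." naming convention for the name attribute.
    """
    # Declare what the "DC." prefix refers to.
    lines = ['<link rel="schema.DC" href="http://purl.org/dc/elements/1.1/">']
    for element, value in record.items():
        name = escape(element, quote=True)
        content = escape(value, quote=True)  # protect quotes and ampersands
        lines.append(f'<meta name="DC.{name}" content="{content}">')
    return "\n".join(lines)


# Hypothetical record describing a digitized letter.
record = {"title": "Letter to a Friend", "creator": "Unknown", "date": "1850"}
html_head = dublin_core_meta_tags(record)
print(html_head)
```

The resulting lines would be placed inside the head element of the HTML file; richer headers such as those of EAD and TEI need a structured approach rather than flat name and value pairs.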
Metadata for still images may be stored in a file header or in a separate database or file. Because images themselves, unlike text, cannot currently be searched very effectively for content, metadata is doubly important for retrieval purposes, as well as for internal project tracking, management, and documentation. The METS standard can be used to bring together the different types of image metadata required for different project purposes. It can also be used not only to document individual images, but also to represent the relationships between multiple images that together constitute a single digital object (for instance, high-resolution archival images, thumbnails and delivery images at lower resolutions, images of particular details at higher magnification). The NISO IMG draft standard on metadata requirements for digital still images provides extremely detailed specifications for capturing technical metadata for images: http://www.niso.org/standards/dsftu.html and http://www.niso.org/standards/resources/Z39_87_trial_use.pdf
As with still images, metadata is crucial to digital audio or video, and the principles of metadata interoperability and documentation standards are as important to digital AV media as to still image and text media. Metadata for digital audio and visual resources can be used in much the same way as metadata for complex digital objects composed of still images. A metadata standard like METS (with the appropriate extension schema) can be used to describe the structure of an audio-visual digital object: for instance, a group of original taped interviews and a final version edited for delivery.
SMIL (Synchronized Multimedia Integration Language) can be used to describe the content and structure of time-based digital files such as audio and video. SMIL, for instance, can be used to describe structural metadata about a particular frame of video (frame 30, timecode 01:20:36.01) as well as to link the appropriate series of frames to alternate representations such as a transcription of the dialogue in that scene. As with image resources, this allows users to search for a particular bit of dialogue or the name of a character, and be taken directly to the video scene in which it appears.
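The idea of linking a transcription to positions in a video can be sketched in a few lines of Python. This is not SMIL itself, only an illustration of the lookup that such time-based structural metadata makes possible; the transcript segments, their timings, and the function names are hypothetical.

```python
def clock_to_seconds(clock):
    """Convert an hh:mm:ss(.fraction) clock value to seconds."""
    hours, minutes, seconds = clock.split(":")
    return int(hours) * 3600 + int(minutes) * 60 + float(seconds)


# Hypothetical transcript segments: (begin, end, spoken text).
segments = [
    ("01:20:30.00", "01:20:36.01", "Where did you grow up?"),
    ("01:20:36.01", "01:20:52.50", "I grew up in a small fishing village."),
]


def find_dialogue(segments, phrase):
    """Return the start times, in seconds, of segments containing the phrase."""
    phrase = phrase.lower()
    return [
        clock_to_seconds(begin)
        for begin, _end, text in segments
        if phrase in text.lower()
    ]


# A player could seek to each returned offset.
matches = find_dialogue(segments, "fishing village")
print(matches)
```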
A number of metadata standards, developed by different subcommunities to address particular needs, are now in use by the cultural heritage community. These standards are not mutually exclusive; on the contrary, some of them, such as METS, are specifically intended to be a way of bringing together various forms of metadata in a single place where they can be processed uniformly and predictably.
The Metadata Encoding and Transmission Standard (METS) is an XML-based encoding standard for digital library metadata. It is both powerful and inclusive, and makes provision for encoding structural, descriptive, and administrative metadata. It is designed not to supersede existing metadata systems such as Dublin Core or the TEI Header, but rather to provide a way of referencing them and including them in the METS document. As a result, it is an extremely versatile way of bringing together a wide range of metadata about a given digital object. Through its structural metadata section, it allows you to express the relationships between multiple representations of the digital object (for instance, encoded TEI files, scanned page images, and audio recordings), as well as relationships between multiple parts of a single digital representation (for instance, the sections of an encoded book). Its administrative metadata section supports the encoding of the kinds of information projects require to manage and track digital objects and their delivery: technical information such as file format and creation; rights metadata such as copyright and licensing information; information about the analog source; and information on the provenance and revision history of the digital objects, including any data migration or transformations which have been performed. METS is a very recently developed standard but is well worth watching and using.
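To make the file and structural sections more concrete, here is a rough Python sketch that builds a skeletal METS document relating several representations of one page image. It uses the METS namespace and the fileSec, fileGrp, file, FLocat, structMap, div and fptr elements; the file identifiers, URLs and the build_mets helper are hypothetical, and the output is a skeleton that has not been validated against the METS schema. A real document would also carry descriptive and administrative metadata sections.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
XLINK_NS = "http://www.w3.org/1999/xlink"
ET.register_namespace("mets", METS_NS)
ET.register_namespace("xlink", XLINK_NS)


def q(tag):
    """Qualify a tag name with the METS namespace."""
    return f"{{{METS_NS}}}{tag}"


def build_mets(files):
    """Build a skeletal METS tree from (file_id, use, mime_type, url) tuples."""
    root = ET.Element(q("mets"))
    file_sec = ET.SubElement(root, q("fileSec"))
    struct_map = ET.SubElement(root, q("structMap"), TYPE="physical")
    page = ET.SubElement(struct_map, q("div"), TYPE="page", LABEL="Page 1")
    groups = {}
    for file_id, use, mime_type, url in files:
        # One file group per intended use (archival master, thumbnail, ...).
        if use not in groups:
            groups[use] = ET.SubElement(file_sec, q("fileGrp"), USE=use)
        entry = ET.SubElement(groups[use], q("file"), ID=file_id, MIMETYPE=mime_type)
        ET.SubElement(entry, q("FLocat"), {"LOCTYPE": "URL", f"{{{XLINK_NS}}}href": url})
        # The structural map points back at every representation of the page.
        ET.SubElement(page, q("fptr"), FILEID=file_id)
    return root


# Hypothetical representations of a single scanned page.
files = [
    ("F1", "archive", "image/tiff", "http://example.org/p1.tif"),
    ("F2", "thumbnail", "image/jpeg", "http://example.org/p1_thumb.jpg"),
    ("F3", "reference", "image/jpeg", "http://example.org/p1.jpg"),
]
mets_root = build_mets(files)
print(ET.tostring(mets_root, encoding="unicode"))
```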
The Dublin Core Metadata Element Set defines a set of 15 essential metadata components (for instance, creator, title, format) which are broadly useful across disciplines and projects for resource discovery and retrieval. These components can be used to add metadata to HTML files (using the <meta> tag) but can also be used in other contexts to create basic metadata for a wide range of digital resources. Dublin Core does not provide for detailed administrative or technical metadata, and as such is best suited for exposing resources for search and retrieval, rather than for internal resource management and tracking. In addition, since its goal is to be simple and broadly applicable to a wide variety of resources, it does not provide for the kind of highly structured metadata about specific document types that TEI and EAD offer. Although projects using these encoding systems will probably not need to use the Dublin Core, they may find it useful to be aware of it as a possible output format for distributing metadata about their resources. One aspect of the work of the Consortium for the Interchange of Museum Information (CIMI) is research into SGML, XML, and metadata standards such as Dublin Core and RDF for museum collections.
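Because the element set is small and fixed, a project exporting Dublin Core records can cheaply check that it uses only the 15 defined element names. The Python sketch below shows one way this might be done; the unknown_elements helper and the sample record are hypothetical.

```python
# The 15 elements of the Dublin Core Metadata Element Set.
DC_ELEMENTS = frozenset({
    "contributor", "coverage", "creator", "date", "description",
    "format", "identifier", "language", "publisher", "relation",
    "rights", "source", "subject", "title", "type",
})


def unknown_elements(record):
    """Return the keys of a record that are not Dublin Core element names."""
    return sorted(set(record) - DC_ELEMENTS)


# Hypothetical export record; "author" is a common slip for "creator".
record = {"title": "Harbour at Dusk", "author": "Unknown", "format": "image/tiff"}
problems = unknown_elements(record)
print(problems)
```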
The TEI Header is a required component of any file conforming to the Text Encoding Initiative Guidelines, and is ordinarily used to document a text file encoded in TEI. However, it can also be used to describe other kinds of resources. It is designed to express a wide range of metadata about a digital file, whether that file is an encoded text, an image, a digital recording, or a group of any of these. It provides not only for standard bibliographic information about the file itself and about its source, but also more specialized metadata to record the details of classification schemes, encoding and sampling systems used, linguistic details, editorial methods, and administrative metadata such as the revision history of the file. It is designed to accommodate a wide range of metadata practices, and while it offers highly structured options for capturing detailed metadata, it also allows for briefer and more loosely organized headers which record only the most basic information.
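At the brief end of that range, a header needs little more than a file description containing a title statement, a publication statement and a source description. The following Python sketch assembles such a minimal header; it omits any XML namespace for simplicity, and the minimal_tei_header function and the sample values are hypothetical.

```python
import xml.etree.ElementTree as ET


def minimal_tei_header(title, publication_note, source_note):
    """Build a teiHeader holding only the basic file description."""
    header = ET.Element("teiHeader")
    file_desc = ET.SubElement(header, "fileDesc")
    # Bibliographic title of the electronic file.
    title_stmt = ET.SubElement(file_desc, "titleStmt")
    ET.SubElement(title_stmt, "title").text = title
    # How the electronic file is published or distributed.
    publication = ET.SubElement(file_desc, "publicationStmt")
    ET.SubElement(publication, "p").text = publication_note
    # The source from which the electronic file was derived.
    source = ET.SubElement(file_desc, "sourceDesc")
    ET.SubElement(source, "p").text = source_note
    return header


# Hypothetical values for a transcribed diary.
tei_header = minimal_tei_header(
    "Diary of a Sea Voyage: an electronic transcription",
    "Unpublished project file.",
    "Transcribed from the manuscript held in a private collection.",
)
print(ET.tostring(tei_header, encoding="unicode"))
```

A fuller header would add sibling sections for encoding practice, classification and language details, and revision history alongside the file description.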
In a sense, the Encoded Archival Description (EAD) bridges the realms of data and metadata. As a digital finding aid, it may stand on its own as metadata about an archival collection. As a digital representation of an analog finding aid, it may also be a form of digital preservation (particularly if the original finding aid has any historical significance). It provides for the capture of all the information ordinarily conveyed in a finding aid, but it also provides for metadata about the finding aid itself (its author, language, and publication details) and about the EAD file as well. EAD is a powerful tool for providing digital access to archival collections by representing the information users need to discover archival materials of interest in a consistent and digitally transparent way.
SPECTRUM, the UK Museum Documentation Standard, represents a common understanding of good practice for museum documentation, established in partnership with the museum community. It contains procedures for documenting objects and the processes they undergo, and it identifies and describes the information which needs to be recorded to support those procedures. SPECTRUM was developed by the MDA in Great Britain. CIMI has adapted the SPECTRUM XML DTD for web-based museum objects.