Despite differences in format and standards, the fundamental issues for capture, storage and management of audio and video are quite similar and will therefore be considered together. The interviews conducted with audio and video digitization projects highlighted two broad issues.
The first is concerned with terminology and definition: when considering audio and moving image material, what exactly is meant by digitization? For example, the copying of fragile, nitrate-based filmstock to a digital tape format such as Digibeta after restoration, or the audio transfer of a 78 rpm disc or wax cylinder onto DAT (Digital Audio Tape) is, strictly speaking, a digitization activity. In the context of this Guide, however, digitization implies the analog-to-digital conversion of audio and video materials and their encoding in digital audio and video file formats that can be stored, manipulated and delivered using a variety of software and media (e.g. CDs and the Web). In this digital form, audio and video materials can be used and distributed in a richer variety of ways than in their analog form.
Secondly, the usual factors that come into play when considering capture standards and storage or delivery options for any type of digital object (such as the nature and conditions of the originals, the purpose of digitization, the intended mode of access, the needs and expectations of the intended audience) are the same for audio and video digitization projects. However, the fact that high-quality storage is still difficult and expensive, as well as the very high commercial stakes involved in networked delivery of high-quality audio and video to home users, all put a rather different slant on these familiar issues.
One further consideration of time-based media that does not apply to still image media is the act of editing for content. For instance, in a still image of an art artifact, one would normally not apply much "content-editing" to the master file, perhaps only cropping out the color-bar to create a sub-master for automated creation of publicly deliverable derivatives. However, editing a piece of audio or video for content, cutting out pauses or entire sections of the video that will not be used, is a necessary prerequisite for creating any kind of deliverable user version, and represents a large investment of time which needs to be taken into account when creating masters. It is important to keep one master copy of the “raw source” material that has not been edited for content, as a permanent reference file. But it can also be very useful to keep one version of the file as a sub-master that has received basic content editing, but is not technically different from the master file. Saving this sub-master file will save you the work of redoing this basic content editing for each derivative, and allow for quicker, automated generation of edited derivatives. Saving this edited sub-master will have some associated costs in storage and management, but will save the even more expensive cost of re-editing later on.
This section looks at the motives for digitization and why sample projects have chosen particular digitization strategies, briefly describes the decision points and principles of audio and video capture, outlines the main standards and formats and their suitability for different access scenarios, and considers options for delivery and management.
In common with other types of digitized media, digitized audio and video data should be easy to handle and manipulate and more convenient to deliver than their analog counterparts. By digitizing analog materials, we unlock the content from a fragile storage and delivery format, and make it possible for the content to be copied without loss. Digitization facilitates research and study, allowing quick comparison, searching and editing within the digital object. In their digital form, audio and video content can be more effectively accessed by users than has been possible with analog collections. Once data is in digital form it can be converted more easily to another digital format without loss of quality, unlike all analog formats, which degrade with each use and lose quality when copied (an extreme example of this is audio wax cylinders which have very limited playing life, but every playing even of a vinyl record contributes to its destruction).
Digitization is used, for example in the National Library of Norway, to preserve fragile and vulnerable materials (e.g., volatile nitrate-based film stock) or materials which need special handling or obsolete playback devices. The challenge here is to produce a high quality digital version. It is very time-consuming to quality check digital audio against the original analog material.
The downsides are financial (e.g., considerable investment in equipment, and large storage is necessary if high-quality masters are to be stored), technical (e.g., methods of compression are still evolving, high-bandwidth networks are not yet universally in place), the difficulty of data recovery from digital tapes in comparison with analog formats, and the continuing uncertainty about the suitability of digital formats for preservation. In digitizing video the cost of matching the quality to that of the original remains a formidable challenge. This is hard enough with tape/video sources, and is still very expensive with film sources. The Library of Congress still specifies analog audio tapes as preservation copies; the Survivors of the SHOAH Visual History Foundation specifies digital Betacam tape copies as the main preservation medium for film. The National Library of Norway argues that digital video formats are not yet good enough, and storage system resources are insufficient in size to make feasible the extensive reformatting of analog material into digital form. Of course, the main problem is that it is very expensive to create a digital version of analog film or video material of comparable quality, though the price of creating accurate digital copies of video, especially VHS, is currently much less than achieving the relative accuracy in copying film. It is common practice among film archives, such as the British Film Institute (www.bfi.org.uk), to create analog copies, known as sub-masters, of their tape and film masters for viewing and exhibition purposes. A digitized version of the tape is just another, and better, way of making a viewing/exhibition copy for all the reasons outlined above.
Institutions may find themselves with a rich array of materials in analog form, but without the devices to play this material back. Unlike textual and still image material (with the exception of slides and born digital), audio and moving image material require a playback device in addition to a digital capture device. For example, a flatbed scanner can digitize directly a wide range of reflective media of different formats and sizes (e.g., photographs, letters, printed matter, bus tickets). No similar general-purpose capture device for audio and moving image material exists. A collection that included 78 rpm records, compact cassettes, 8mm film and VHS video cassettes would require a playback device for each of these and each would then need to be connected to an appropriate digital capture device. For audio and moving image material that is already in a digital format (such as CD or Digibeta), playback equipment is less of a problem. Although many—frequently incompatible—proprietary digital formats exist, their recent development means suitable playback equipment is still on the market and relatively easy to source. Therefore this section concentrates on identifying analog audio and moving image formats, their properties and the source device required.
Three methods can be used to progress from moving image film to digital. Film can be transferred onto videotape for digitization via a transfer box or multiplexer. Both these options depend upon the material being projected in some way. Transfer boxes project the image into a box containing a mirror and onto a rear image projection screen with the video camera mounted on the other side of the box. The resulting video is subsequently digitized. These transfer boxes are not expensive, but do not in general produce as high a quality material because they produce generational loss in quality.
A better solution is to use a multiplexer. In this device the projector and camera are mounted on a single table. The image is projected by a set of lenses and mirrors directly into the camera, without the need for a projection screen. This has advantages for image clarity. In both processes quality suffers because each introduces an extra production generation into the reformatting of the analog material. An alternative to these methods is to use a film chain scanner, available for 8, 16 and 35mm film, to digitize directly from the analog film material. These machines scan the film and digitize at the scanner, passing the digital signal to the computer. (They work slightly differently for digital video. In this instance they grab individual lines of video to construct a frame and produce broadcast quality digital video.) In 2001 the cost of these machines remained high, at between $500,000 and $1,000,000. One of the strengths of chain scanning is that, because the analog-to-digital conversion is done at the camera rather than on the computer, there is less opportunity for noise to be added by the process to the analog signal. Whereas small institutions can probably set up a transfer box or multiplexer system, even wealthy institutions would find outsourcing to a facilities house to be the only practical option if they wished to go directly from the analog film to the digital material.
Determining a film's original frame rate is also difficult without viewing the film with a projector, particularly for old 8 and 16mm films. The widespread availability of VHS and S-VHS video players makes the playback of these video formats for digitization relatively simple. However, the rapid adoption of digital formats in broadcasting, post-production and amateur markets is making even quite recent analog video devices scarce.
As there are fewer analog audio formats, these provide less of a problem than moving images. Compact cassette players, 33 and 45 rpm record players are still widely available new. Even record players with a 78 rpm speed can still be purchased new. The other formats present a greater challenge. If digitizing the sound as played on period equipment is important, the tone arms of phonographs and gramophones can be customized to provide an appropriate feed. Alternatively, the sound can be recorded via an external microphone onto a more convenient intermediary format. Reel to reel tape, wire recorders and cartridges pose similar problems of transfer. By modifying the equipment, it may be possible to provide a direct sound output. Alternatively, the sound can again be captured via an external microphone to an appropriate intermediate format. Here is where a great deal of specialist advice can be helpful. Just as we noted that it is easier to train a good photographer in digitization than it is to train a digital expert in photographic principles and practice, you will find that sound engineers bring to the digital environment strengths that are difficult to replicate.
In the case of all audio and moving image material, whether it is in analog or digital form, projects should carefully consider the advantages of outsourcing digitization. In general, audio and moving image digitization requires more equipment, and more expensive and specialized equipment, than is necessary for still image material.
| Audio Media | Properties | Source Device |
|---|---|---|
| Wax or Celluloid Cylinders | 1890s & 1900s, up to 5” diameter, 2-4 mins. playing time | Phonograph. See http://www.tinfoil.com for details of digital transfer. |
| Wire | Magnetic coated wire drums or reels. Invented 1898. Widely used by the US military in WWII. Eclipsed by magnetic tape by the mid 1950s. | Wire Recorder |
| 78 rpm shellac resin discs | 1898 to late 1950s, 10” (25cm) and 12” (30cm) most common sizes | Gramophone (wind-up) or Hi-Fi. Gramophone’s steel needles need replacing after each side or record played. Hi-Fi needs a 78 rpm turntable and a cartridge with a 78 rpm stylus. For best results on modern equipment a phono pre-amplifier is required to correctly equalize the different types of record. |
| 45 rpm and 33 rpm vinyl discs | 7” (18cm) single and 12” (30cm) long play (LP). LPs introduced in 1948, stereo recordings in 1958. | Hi-Fi. Hi-Fi requires turntable with 45 and 33 rpm speeds. |
| Reel to Reel magnetic tape | ½” to ¼” magnetic tape. BASF and AEG developed 6.5mm ferric tape and Magnetophone player in Germany from 1935. Post-war development in USA by Ampex and 3M. Stereo capability from 1949. | Reel to Reel player for appropriate width of tape. |
| Compact Cassette | Magnetic polyester tape introduced by Philips in 1963. | Hi-Fi. Hi-Fi requires compact cassette player. |
| Cartridge | ¼” magnetic tape. Fidelipac (4-track, devised 1956, released 1962) and Lear (8-track, 1965) cartridge systems. | Despite similarities 4 and 8 track cartridges are not compatible and require separate players. Predominantly used for in-car audio. 4 track unpopular outside of California and Florida. |
There is a series of decisions to make in digitizing audio and video materials, having to do with hardware and software components, sampling rate and precision. To understand these decisions clearly, it may help to first explain the principles of analog and digital recording, and the digitization process itself.
In analog audio recording, a plucked string (for example) vibrates the air around it. These airwaves in turn vibrate a small membrane in a microphone and the membrane translates those vibrations into fluctuating electronic voltages. During recording to tape, these voltages charge magnetic particles on the tape, which when played back will duplicate the original voltages, and hence the original sound. Recording moving images works similarly, except that instead of air vibrating a membrane, fluctuating light strikes an electronic receptor that changes those fluctuations into voltages.
Sound pressure waveforms and other analog signals vary continuously; they change from instant to instant, and as they change between two values, they go through all the values in between. Analog recordings represent real world sounds and images that have been translated into continually changing electronic voltages. Digital recording converts the analog wave into a stream of numbers and records the numbers instead of the wave. The conversion to digital is achieved using a device called an analog-to-digital converter (ADC). To play back the music, the stream of numbers is converted back to an analog wave by a digital-to-analog converter (DAC). The result is a recording with very high fidelity (very high similarity between the original signal and the reproduced signal) and perfect reproduction (the recording sounds the same every single time you play it no matter how many times you play it).
When a sound wave is sampled using an analog-to-digital converter, two variables must be controlled. The first is the sampling rate, which controls how many samples of sound are taken per second. The second is the sampling precision, which controls how many different gradations (quantization levels) are possible when taking each sample. Because only a finite number of levels is available, the reproduced wave is never an exact copy of the analog original: the difference between the analog signal and the closest sample value is known as quantization error. Greater precision reduces this error, while a higher sampling rate extends the range of frequencies that can be captured. As the sampling rate and the number of quantization levels increase, so does perceived sound quality.
In digital representation, the same varying voltages are sampled or measured at a specific rate (e.g. 48,000 times a second, or 48 kHz). The sample value is a number equal to the signal amplitude at the sampling instant. The highest frequency that a digital audio file can represent is half the sampling rate (the Nyquist theorem). Because of sampling, a digital signal is segmented into steps, and the spacing of those steps determines the overall frequency response of the signal. A signal sampled at 48 kHz therefore has a wider frequency response than one sampled at 44.1 kHz. These samples are represented by bits (0s and 1s) which can be processed and recorded. The more bits a sample contains, the better the picture or sound quality (e.g. 10-bit is better than 8-bit). A good digital signal will have a high number of samples (i.e. a high sampling rate) and a high number of bits per sample (quantizing). Digital-to-digital copying is lossless and produces perfect copies or clones, because it is the bits that are copied rather than the analog voltages. A high bit-depth also results in much-increased dynamic range and lower quantization noise.
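The relationship between sampling rate and the highest frequency that can be captured can be illustrated with a short sketch. The function names below (`nyquist_hz`, `alias_hz`) are illustrative and do not come from any library, and the folding calculation assumes an ideal sampler with no anti-aliasing filter in front of it, which real converters do include.

```python
def nyquist_hz(sample_rate_hz: float) -> float:
    """Highest frequency an ideal sampler can represent: half the sampling rate."""
    return sample_rate_hz / 2.0


def alias_hz(tone_hz: float, sample_rate_hz: float) -> float:
    """Apparent frequency of a pure tone after ideal sampling.

    Assumption: no anti-aliasing filter. Tones at or below the Nyquist
    frequency come back unchanged; tones above it fold back into the
    representable range and are heard at the wrong pitch.
    """
    folded = tone_hz % sample_rate_hz
    return min(folded, sample_rate_hz - folded)


if __name__ == "__main__":
    for rate in (44_100, 48_000, 96_000):
        print(f"{rate} Hz sampling represents tones up to {nyquist_hz(rate):.0f} Hz")
    # A 30 kHz tone lies above the limit for 44.1 kHz sampling, so it folds back.
    print(f"30000 Hz sampled at 44100 Hz appears at {alias_hz(30_000, 44_100)} Hz")
```

The folding behaviour is one reason a higher sampling rate is attractive for masters: the limit sits further above the range of the source material.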
Ideally, each sampled amplitude value must exactly equal the true signal amplitude at the sampling instant. ADCs do not achieve this level of perfection. Normally, a fixed number of bits (binary digits) is used to represent a sample value. Therefore, the infinite set of values possible in the analog signal is not available for the samples. In fact, if there are R bits in each sample, exactly 2^R (2 to the power R) sample values are possible. For high-fidelity applications, such as archival copies of analog recordings, 24 bits per sample, or a so-called 24-bit resolution, should be used. The difference between the analog signal and the closest sample value is known as quantization error. Since it can be regarded as noise added to an otherwise perfect sample value, it is also often called quantization noise. 24-bit digital audio has negligible amounts of quantization noise.
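The effect of bit depth on quantization noise can be seen in a minimal sketch. It assumes a uniform quantizer, a test sine wave just below full scale, and the textbook approximation of about 6.02 dB of signal-to-noise ratio per bit plus 1.76 dB; the function names and the test signal are illustrative choices, not a description of how any particular ADC works.

```python
import math


def quantize(value: float, bits: int) -> float:
    """Round a value in [-1.0, 1.0) to the nearest of 2**bits evenly spaced levels."""
    levels = 2 ** bits
    step = 2.0 / levels                      # width of one quantization step
    index = round(value / step)
    index = max(-(levels // 2), min(levels // 2 - 1, index))  # stay within range
    return index * step


def theoretical_snr_db(bits: int) -> float:
    """Textbook approximation for a full-scale sine wave."""
    return 6.02 * bits + 1.76


def measured_snr_db(bits: int, n_samples: int = 10_000) -> float:
    """Signal-to-quantization-noise ratio of a test sine wave, in dB."""
    signal_power = 0.0
    noise_power = 0.0
    for n in range(n_samples):
        # Illustrative test signal: 37 cycles of a sine at 99% of full scale
        x = 0.99 * math.sin(2 * math.pi * 37 * n / n_samples)
        error = x - quantize(x, bits)        # the quantization error
        signal_power += x * x
        noise_power += error * error
    return 10 * math.log10(signal_power / noise_power)


if __name__ == "__main__":
    for bits in (8, 16, 24):
        print(f"{bits}-bit: {2 ** bits} levels, "
              f"measured {measured_snr_db(bits):.1f} dB, "
              f"approximation {theoretical_snr_db(bits):.1f} dB")
```

Each extra bit doubles the number of available levels, which is why the step from 16 to 24 bits lowers the noise floor so markedly.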
With this background established, we can return to the practical questions a digitization project must ask. The first decision to make in digitizing audio materials involves hardware and software components. Digital audio can be created either by recording directly to the digital sound card or by using an external device to transfer audio material. High quality external devices produce superior results to sound cards; for archival digitization purposes, a high quality stand-alone ADC is recommended. Most internal PCI audio cards are built from inferior quality components and are prone to electrostatic interference from the computer circuitry.
Sample values over time are most commonly encoded in the PCM (pulse code modulation) format. This is the foundation of the digital audio file. PCM data can then be transmitted via a number of digital interfaces (such as AES/EBU) to other devices or software applications.
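To make the idea of PCM concrete, the following sketch writes a short test tone as 16-bit linear PCM inside a WAV container using the `wave` module from Python's standard library. The tone frequency, duration, level and the helper name `write_pcm_tone` are illustrative assumptions; a real transfer would take its samples from an ADC rather than generate them.

```python
import io
import math
import struct
import wave


def write_pcm_tone(target, freq_hz=440.0, seconds=1.0, sample_rate=48_000):
    """Write a mono, 16-bit linear PCM sine tone to `target` as a WAV file.

    `target` may be a filename or a writable, seekable binary file object.
    Returns the number of sample frames written.
    """
    n_frames = int(seconds * sample_rate)
    peak = 0.5 * 32767                       # half of 16-bit full scale
    samples = (
        int(round(peak * math.sin(2 * math.pi * freq_hz * n / sample_rate)))
        for n in range(n_frames)
    )
    # Each sample becomes one little-endian signed 16-bit integer
    payload = b"".join(struct.pack("<h", s) for s in samples)
    with wave.open(target, "wb") as wav:
        wav.setnchannels(1)                  # mono
        wav.setsampwidth(2)                  # 2 bytes = 16 bits per sample
        wav.setframerate(sample_rate)
        wav.writeframes(payload)
    return n_frames


if __name__ == "__main__":
    buffer = io.BytesIO()                    # in-memory stand-in for a file
    write_pcm_tone(buffer)
    buffer.seek(0)
    with wave.open(buffer, "rb") as wav:
        print("channels:", wav.getnchannels())
        print("bytes per sample:", wav.getsampwidth())
        print("sampling rate:", wav.getframerate())
        print("frames:", wav.getnframes())
```

The file is nothing more than a small header followed by the sample values in order, which is what makes PCM straightforward to copy and to convert.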
The next important decision to be made when making an analog to digital audio or video transfer, for example from vinyl or audio cassette or VHS videotape, is on the sampling and bit rates — in other words, the quality of resolution at which the transfer is to be made. Different sampling rates have an important effect on the end result. Nothing can compensate for a bad decision at the sampling stage, so the decision has to be informed, including careful consideration of purpose, intended longevity, circumstances and needs. Put plainly, the higher the number of samples, the better the resulting quality. Current technology allows audio digitization at the so-called “DVD standard” (96,000 Hz/24 bit) and should be recommended as the preferred audio digitization standard for most organizations. However, the quality of the original also needs to be taken into account: there is no point in using a high sampling rate for a poor quality original.
Related to the decision on sampling rate is the purpose of the digital transfer and the intended target audience and mode of delivery (e.g., is a preservation master at the highest possible quality necessary? Are users to access the materials via slow home Internet connections?). Of course, deciding at what rate to sample has time, labor, and cost implications. Will it be possible, and cost-effective, to re-digitize the original source material at a later date for another purpose? Will the analog material be accessible in the future? Is it so fragile that you have only one opportunity to digitize it? Are the access devices becoming increasingly rare? If re-digitization is unlikely to be possible or affordable, then a better quality initial digitization is recommended to ensure cost-effective future uses. As we have noted elsewhere in this Guide, once material has been reformatted it is rare that the work will be done again. It is thus usually better practice to capture at the highest rate you can afford, and deliver a downsampled version, than to capture at a low rate now simply because your immediate intention is to provide the material as small video images over modems on the web.
A policy decision has to be made on whether to clean up or enhance recordings and this, again, depends on the purpose of the digitization: is the objective to restore the recording, to re-create the sounds and images that reached the original recording device, or to make an accurate re-recording of the original? Filtering and noise reduction techniques that remove audio hiss, clicks and pops in an old recording inevitably change that recording and cut out some of the original sound.
Different organizations take different views according to their objectives. The Library of Congress, for example, takes a conservative stance on noise suppression for the preservation masters of historical recordings, seeking to reproduce the original as a recording before cleaning up or enhancing copies for specific listening or study purposes later on. Similarly, for much folk audio material it is important to provide a faithful digital representation of the original archival material, and even if, for example, there is a dog barking in the background of the singer’s performance, it should be left in. From this master version it would be possible to produce derivatives in which the barking dog was removed if you wished to provide one group of listeners with access just to the singer’s performance for easy listening and to produce a copy of the master with all the details of context in terms of environment and capture device (e.g. lack of noise suppression) for folk historians and anthropologists.
Given the flexibility of the digital audio file, it is recommended to digitize at the highest available settings (e.g., 96 kHz/24 bit) without any outboard or software digital signal processing (DSP) applied. The only exception may be a high-quality adjustable compressor/limiter to help with really noisy and soft signals. All other DSP techniques can be easily applied in the post-production process, and their choice should be determined by the delivery purpose, mode, and the target audience.
There may be exceptions to this general rule. For example, if making a digital transfer from an audio cassette it may be appropriate to use Dolby to get the best possible re-recording from the original. Indiana University’s Hoagy Carmichael collection equalizes 78 rpm recordings but not reel-to-reel tapes. Uses of such techniques vary according to needs and circumstances; professional studio practices and methods for adjustment and preparation of equipment (e.g. cleaning and demagnetizing tape heads, distortion filtering, alignment testing) may be beyond the resources of one digitization project but vital for another. Once again this should never be done to the master, but may be done to derivatives depending on the purpose you are trying to achieve with them.
One should not forget about calibration and adjusting equalization (EQ) curves. Some analog recordings will require the use of calibration and an appropriate EQ curve (e.g., many vinyl recordings) to approximate the signal characteristics intended by the original mastering engineer.
The choice of digitization standards should not be contingent upon the type of acoustic signal to be digitized. While it is true that speech does not have the same dynamic range as the sound of a symphony orchestra, this should not justify the use of a lower bit-depth for speech recordings. We should apply the same, high standards to all kinds of acoustic signals.
The use of standards increases the portability of digital information across hardware platforms, space, and time. There are in general two types of standards in the marketplace, those that are proprietary and those that are non-proprietary. Proprietary standards are frequently developed by a single company or consortium and are designed to provide that organization or group with market advantages. Non-proprietary ones may also be developed by commercial consortia or not-for-profit groups, but the architecture of the standard is publicly accessible and often in the public domain. The main audio formats in common use are as follows:
Definition Box:
Audio Formats:
| Extension | Meaning | Description | Strengths/weaknesses |
|---|---|---|---|
| | Liquid Audio Secure Download | Liquid Audio is an audio player and has its own proprietary encoder. Similar to MP3, it compresses files for ease of delivery over the Internet. Only AAC CD encoder available. | Boasts CD quality. Compressed file, thus some loss. |
| .aif, .aifc | Audio Interchange File Format | Developed by Apple for storing high quality music. Non-compressed format. Cannot be streamed. Can usually be played without additional plug-ins. Allows specification of sampling rates and sizes. .aifc is the same as .aif except that it has compressed samples. | High quality. Flexible format. Large file sizes. |
| .au, .snd | SUN Audio | Mostly found on Unix computers. Specifies an arbitrary sampling rate. Can contain 8, 16, 24 & 32 bit. | In comparison to other 8 bit samples it has a larger dynamic range. Slow decompression rates. |
| .mp3 | MPEG-1 Layer-3 | Compressed format. File sizes vary depending on sampling and bit rate. Can be streamed, but not recommended as it isn’t the best format for this; RealAudio and Windows Media are better. Typical compression of 10:1. Samples at 32000, 44100 and 48000 Hz. | Small file sizes. Good quality. |
| .paf | PARIS (Professional Audio Recording Integrated System) | Used with the Ensoniq PARIS digital audio editing system. Can contain 8, 16 & 24 bit. | |
| .ra | Real Audio | One of the most common formats, especially for web distribution. Compresses up to 10:1. | Sound quality is passable, but not high quality. Lossy compression. |
| .sdii | Sound Designer II | Originally a digital sampling and editing platform. The format is still in use. Used mostly on Macs by professionals. It’s a widely accepted standard for transferring audio files between editing software. | Problems with playing on PCs. High quality. Large file sizes. |
| .sf | IRCAM | Usually used by academic users. 8 or 16 bit, specifies an arbitrary sampling rate. | |
| .voc | | Older format; .wav files are far more common. Used mostly in IBM machines. It samples in relation to an internal clock. | Is not a flexible format. |
| .wav | Wave | Windows media non-compressed format. Can usually be played without additional plug-ins. Specifies an arbitrary sampling rate. 8, 16, & 32 bit. | High quality. Large file sizes. Can be used on both Macs and PCs. |
| MIDI | Musical Instrument Digital Interface | Good for instrumental music. The file plays digitally stored samples of instruments which are located on a sound card. | |
It may be useful to be able to make a simple comparison between the file sizes of three of the formats. For example, a five minute music file will be some 60MB if stored in .wav, 5MB as an MP3 file, and about 1MB as a RealAudio file.
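The arithmetic behind figures like these can be sketched as follows. The example assumes CD-quality stereo PCM (44.1 kHz, 16-bit) for the uncompressed file and the typical 10:1 ratio quoted for MP3 in the table; both are assumptions made for illustration, and real compressed sizes depend on the encoder settings chosen. No estimate is attempted for RealAudio, since its size depends on a target bit rate that is not stated here.

```python
def pcm_bytes(seconds: float, sample_rate_hz: int, bit_depth: int, channels: int) -> int:
    """Size in bytes of uncompressed PCM audio data (file headers ignored)."""
    bytes_per_sample = bit_depth // 8
    return int(seconds * sample_rate_hz * bytes_per_sample * channels)


def megabytes(n_bytes: float) -> float:
    """Convert bytes to megabytes, taking 1 MB as 1,000,000 bytes."""
    return n_bytes / 1_000_000


if __name__ == "__main__":
    five_minutes = 5 * 60
    uncompressed = pcm_bytes(five_minutes, 44_100, 16, 2)   # assumed CD-quality stereo
    compressed = uncompressed / 10                           # assumed 10:1 compression
    print(f"Uncompressed PCM: {megabytes(uncompressed):.1f} MB")
    print(f"At 10:1 compression: {megabytes(compressed):.1f} MB")
```

Under these assumptions the uncompressed file comes to roughly 53 MB and the compressed one to roughly 5 MB, in line with the approximate sizes quoted above.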
The MPEG standards are among the most important for digital audio and video. The Moving Picture Experts Group (MPEG, http://www.cselt.it/mpeg/) develops standards for digital audio and video compression under the auspices of the International Organization for Standardization (ISO). Each of the MPEG standards is designed for a particular purpose and is continually being developed. MPEG is most commonly encountered as a means of delivering compressed video over the World Wide Web, but these standards have also made interactive video on CD-ROM and Digital Television possible. The commonly encountered audio format MP3 is in fact a version of the MPEG-1 audio layer 3 standard.
MPEG standards have progressed considerably and care needs to be taken when using the term “MPEG format” (see table Development of MPEG Standards). MPEG 1, 2 and 4 are standard formats for encoding audio-visual media, MPEG 7 is a metadata standard for describing audio-visual media, while MPEG 21 is a descriptive framework to encompass the creation, delivery, use, generation and transactions of digital objects. Projects that are intending to encode audio-visual material should be aware that MPEG 1, 2 and 4 essentially define the decompression standard: the technology at the user’s end that puts the compressed stream of data back together. It is individual companies that control the encoding technology that compresses the data to be sent. When MPEG 1 was introduced, technology companies such as Microsoft and Apple envisaged a utopian future and included decoders in their software. When MPEG 2 was introduced the likes of Microsoft, Apple and Real Networks decided the cost of MPEG 2 decoding licenses was too high and enhanced their existing technology. These provide high-quality, but proprietary AV streams supported by the distribution of free players (decoders for users). These systems can encode MPEG 2 but distributing MPEG 2 encoded files is problematic because it has been overtaken by proprietary formats such as Real. Therefore, for most projects seeking to encode AV material in an MPEG format, it is MPEG 1 that is a realistic option.
Definition Box:
The Development of MPEG Standards:
| MPEG Format | Properties |
|---|---|
| MPEG 1: Started in 1988 and released in 1992. A standard for the storage and retrieval of moving pictures and associated audio on storage media | Designed for coding progressive video at a transmission rate of about 1.5 million bits per second. It was designed specifically for Video-CD and CD-I media. MPEG-1 audio layer-3 (MP3) has also evolved from early MPEG work. |
| MPEG 2. Started in 1990 and released in 1994. A standard for digital television. | Designed for coding interlaced images at transmission rates above 4 million bits per second. MPEG-2 is used for digital TV broadcast and digital versatile disk (DVD). An MPEG-2 player can handle MPEG-1 data as well. |
| MPEG 3. Merged with MPEG 2 in 1992. | A proposed MPEG-3 standard, intended for High Definition TV (HDTV), was merged with the MPEG-2 standard when it became apparent that the MPEG-2 standard met the HDTV requirements. |
| MPEG 4. Started in 1993, with version 1 released in 1998 and version 2 in 1999. A standard for multimedia applications that is currently being extended. | Designed to meet the convergence of telecommunications, computer and TV/Film industries and provide for flexible representation of audio-visual material. |
| MPEG 7. Started in 1997 and parts 1-6 (out of 7) released in 2001. A metadata standard for describing multimedia content data. | Designed to support some degree of interpretation of multimedia content’s meaning by a device or computer by as wide a range of applications as possible. |
| MPEG 21. Started in 2000. A framework that is capable of supporting the delivery and use of all content types across the entire multimedia development chain. | Designed to provide a framework for the all-electronic creation, production, delivery and trade of content. Within the framework the other MPEG standards can be used where appropriate. |
As noted above, ideal sampling rates and bit depths depend on the nature of the original, but they are increasing as the technology allows. Simply put, sampling rate refers to the number of points per second at which data are collected, and bit depth to the number of bits used to record the value measured at each sampling point. The comparison between digital audio and digital imaging is probably obvious: audio sampling rate (say 44.1 kHz) is analogous to the number of pixels per inch (ppi) captured from a digital image (say 300 ppi), and in both cases the bit depth relates to the amount of data recorded at each sampling point (say 16-bit stereo for audio or 24-bit color for images). Until recently a standard high-quality specification was the CD-quality equivalent: 44.1 kHz, 16-bit stereo; indeed this is the quality that the Variations Project at Indiana University (Bloomington) uses for capture and preservation. However, 48 kHz, 16-bit is the specification routinely used by the National Library of Norway for old recordings such as wax cylinders, and where the originals are of better quality 24-bit is used.
Digital files for audio originals of very limited quality, such as those in the Library of Congress’s Edison collection, were created from DAT tape at 22 kHz, 16-bit, mono. However, depending on the characteristics of the source item, the Library of Congress specifies 96 or 48 kHz as a sampling frequency for a master file as a future replacement for reel-to-reel analog tape recordings currently designated as preservation masters. In 2001, the American Folklife Center at the Library of Congress hosted a national meeting to discuss best practices for audio digitization. The consensus of the meeting was to move to 96 kHz, 24-bit (96/24) for preservation and archival purposes. We see no reason, given the declining cost of storage space, not to recommend 96/24 as best practice. Harvard University Library uses a sampling rate of 88.2 kHz for capturing and preserving, and a bit depth of 24 for capturing and preserving and 16 for delivering. The file sizes created at these settings are approximately 30 MB per minute at the capture stage and 1 MB per minute at delivery. Research shows a clear difference in quality in moving from 16-bit to 24-bit depth.
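The relationship between these capture settings and file size can be checked with simple arithmetic. The following is a minimal sketch, assuming uncompressed PCM audio, two channels (the channel count for the Harvard figures is not stated above), and decimal megabytes; the function name is purely illustrative.

```python
def pcm_megabytes_per_minute(sample_rate_hz, bit_depth, channels):
    """Approximate size of one minute of uncompressed PCM audio, in MB (10**6 bytes)."""
    bytes_per_second = sample_rate_hz * bit_depth / 8 * channels
    return bytes_per_second * 60 / 1_000_000


# Settings mentioned in the text; stereo is an assumption for all three.
for label, rate, depth in [
    ("CD quality (44.1 kHz, 16-bit)", 44_100, 16),
    ("Harvard capture (88.2 kHz, 24-bit)", 88_200, 24),
    ("Recommended 96/24", 96_000, 24),
]:
    size = pcm_megabytes_per_minute(rate, depth, channels=2)
    print(f"{label}: {size:.1f} MB per minute")
```

Under these assumptions the 88.2 kHz, 24-bit capture works out to a little under 32 MB per minute, in line with the figure of approximately 30 MB quoted above. The 1 MB per minute delivery figure is far smaller than uncompressed 16-bit audio would be, so it presumably reflects a compressed delivery format.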
Definition Box:
| Moving Image Media[1] | Properties | Source Device |
|---|---|---|
| 8mm & Super 8 Film | 18 fps (frames per second) most common frame rate, followed by 12 fps and 24 fps (often used with sound film). The latter frame rate tended to be used by professionals or for capturing moving objects. During the early 1960s 18 fps started to appear. 8mm sound film appeared around 1960. Super 8 introduced by Kodak in 1965. It is perforated in the center and not the edges of the frame. 3” diameter reels are most common, 6” and 7” reels and cassettes are also found. | An 8mm film projector, for "standard" 8mm and/or Super 8 film. Determining the original frame rate can be problematic. Most older projectors are variable speed which is useful. Projectors should be in excellent condition and the film unshrunken. Capstan drives are better for the film and the sprockets. |
| 16mm Film | Very common film format | 16mm film projector |
| 35mm Film | Very common film format | 35mm film projector |
| ¼” Reel to Reel Video Tape | Can be confused with audio tape. 10” reels are audio; some video formats, as well as audio formats, used 7” and 5” reels. | ¼” videotape recorder. |
| ½” (12.5mm) Reel-to-Reel Video Tape | | ½” videotape recorder. Maintaining machines and finding replacement parts is very difficult. |
| ¾” (U-Matic) Tape or Cassette | Broadcast TV format. U-Matic has been around since the early 1970s and remains a popular production and archive format because of relatively low cost compared to Betacam SP. | ¾” U-Matic machine. Available in fixed or portable, reel or cassette versions. |
| 1” Reel to Reel Video Tape | | 1” Reel to Reel tape player. |
| 2” Reel to Reel Video Tape | Television programs from the late 1950s to 1970s. | 2” Reel to Reel tape player. Playback equipment for this has become increasingly rare. |
| 8mm Video Cassette | 8mm video comes in two formats: 8mm and Hi-8 (equivalent to VHS and S-VHS). | Hi-8 players can play standard 8mm cassettes but not vice versa. |
| ½” (12.5mm) Video Tape Cassette | Betacam SP is a popular field and post-production format. M-II is a popular broadcast quality format. S-VHS is a higher quality format of the ubiquitous VHS home video cassette. The now obsolete Beta and Video 2000 formats also used ½” tape cassettes. | Betacam SP and M-II require compatible players. S-VHS players will play standard VHS cassettes but not vice versa. Although almost identical, S-VHS cassettes have additional holes in the casing.[2] |
Three main file formats are in common use: MPEG (see table), QuickTime and RealVideo. However, both the Library of Congress and the National Library of Norway have held back from keeping preservation copies of film material as files on servers, but rather have kept digital video preservation master copies on Digibeta tape. The Library of Congress uses the sampling ratio of 4:2:2 for digital tape copies, which is the current component digital tape recording standard. 4:2:2 refers to the sampling ratio of the three parts of a component color difference signal (one luminance channel and two chroma channels). For every 4 samples of the luminance channel there are 2 samples for each of the chroma channels. As usual, as the sampling rate increases, so the quality increases. In 4:4:4, the chroma channels are sampled equally to the luminance channel, creating better color definition, but this high sampling rate cannot easily be recorded onto tape.
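To make the sampling ratios concrete, the sketch below counts how many samples are stored per pixel under each scheme. It assumes the common J:a:b reading of the notation (a block J pixels wide and two rows high, with `a` chroma samples per channel in the first row and `b` in the second); the function name is illustrative only.

```python
from fractions import Fraction


def samples_per_pixel(scheme):
    """Average number of stored samples per pixel for a J:a:b sampling scheme."""
    j, a, b = (int(part) for part in scheme.split(":"))
    luma = 2 * j            # one luminance sample for every pixel in the J x 2 block
    chroma = 2 * (a + b)    # two chroma channels, a + b samples each
    return Fraction(luma + chroma, 2 * j)


for scheme in ("4:4:4", "4:2:2", "4:1:1", "4:2:0"):
    print(scheme, float(samples_per_pixel(scheme)))
```

On this reading, 4:2:2 stores two samples per pixel against three for 4:4:4, so it carries two thirds of the data, which is one way of seeing why the higher ratio is harder to record onto tape.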
Of the file formats that projects might use for service copies, rather than preservation copies, the highest quality films are likely to be stored in the .mpg (MPEG) format. The average file size for the MPEG 1 format is about 9 MB for each minute of film. The Library of Congress’s MPEG 1 files are created at 30 frames per second at a data rate of approximately 1.2 Mbits per second of playing time. The National Library of Norway makes digital transfers from film copies in MPEG 1 at 1.5 Mbits per second, at a frame rate of 25 frames per second, or in MPEG 2 at 6 to 15 Mbits per second.
QuickTime may include a variety of compression methods; some higher end, some lower end. For instance, QuickTime (with Cinepak compression) offers smaller, downloadable files and allows films to be viewed on lower-end computer systems. The Library of Congress’s QuickTime files are created at 10-15 frames per second at a data rate of approximately 640 Kbits per second, usually quoted as 80 Kbytes/sec of playing time. The average file size in the QuickTime (Cinepak) format is about 5 MB for each minute of motion picture. The Berkeley Art Museum/Pacific Film Archive (BAM/PFA) currently captures video content as DV (see DV discussion). This DV stream is then converted and saved as a video master file in QuickTime/DV format. Derivative files are extracted at a much smaller resolution and are delivered online in QuickTime/Sorensen format for video, and QuickTime/Qualcomm for audio-only materials. Content digitized so far by BAM/PFA includes videos of artist talks and works of video-art from the collection. Works on film will require a different methodology.
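The file sizes quoted for MPEG 1 and QuickTime follow directly from the data rates. A minimal sketch of the conversion, assuming decimal prefixes (1 Kbit = 1,000 bits, 1 MB = 1,000,000 bytes) and ignoring container overhead; the function name is illustrative:

```python
def megabytes_per_minute(kilobits_per_second):
    """Convert a stream data rate in Kbits/s to file size in MB per minute of playing time."""
    bytes_per_second = kilobits_per_second * 1000 / 8
    return bytes_per_second * 60 / 1_000_000


# Data rates taken from the text above.
for label, rate in [
    ("Library of Congress MPEG 1 (1.2 Mbits/s)", 1200),
    ("National Library of Norway MPEG 1 (1.5 Mbits/s)", 1500),
    ("Library of Congress QuickTime/Cinepak (640 Kbits/s)", 640),
]:
    print(f"{label}: {megabytes_per_minute(rate):.2f} MB per minute")
```

At 1.2 Mbits per second this gives 9 MB per minute, and at 640 Kbits per second (80 Kbytes per second) about 4.8 MB per minute, matching the averages of roughly 9 MB and 5 MB given above.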
RealVideo is a streaming format that allows the moving image material to be viewed as it arrives at the user's computer, eliminating the need to download the file completely before viewing. The RealMedia format is especially useful for computers with slower Internet connections, such as a 28.8 kbps modem. Video playback is slower (3-6 frames per second), may be affected by Internet traffic, and currently provides an image of lesser quality than the worst broadcast TV. But it does make the distribution of material to wide audiences possible.
Definition Box:
Moving Image formats
| Extension | Meaning | Description | Strengths/weaknesses |
|---|---|---|---|
| .mpg | Moving Picture Experts Group | Standards created by the group working for ISO/IEC. MPEG-1: Video CD and MP3 are based on this early standard. MPEG-2: DVD is based on this. MPEG-4: standard for multimedia on the web. MPEG-7: currently under development; ‘Multimedia Content Description Interface’. | Good quality and low file sizes. MPEG-1 can take a while to load. |
| .qt, .mov | QuickTime | Created initially for Macs, can now be used on PCs too. Played on the QuickTime player. QuickTime 4 has streaming capabilities. | Excellent quality, easy capture, widely used, can be large. In Windows the QuickTime player takes up lots of space. |
| .viv | Vivo | No updates since 1997. Played on VivoActive player. Video stream always sent over HTTP (unlike RealVideo or Windows Media). Bought by RealNetworks in 1998. | High compression rates, poor quality due to compression to maximize streaming, various incompatibility issues. |
| .avi | Audio/Video Interleave | Played on QuickView and Windows Media Player. Replaced largely by MPEG and Windows Media. | Large files, very good quality, must be encoded/decoded properly. |
| .rma | RealMedia | Streaming format. Proprietary format that is an equivalent to Windows Media. | Requires RealMedia plug-in. |
| .wmv | Windows Media Video | Streaming format. | Version 8 offers near-DVD performance. |
Not all audio and video material will need to be digitized from analog material; increasingly, much of it will come from digital sources. In discussing still image material we noted that derivatives of digitized images should be measured in pixel dimensions rather than dots per inch; likewise, digital video formats are measured in pixels. Digital video in NTSC format consists of 720 x 480 pixels[3]. This is the standard resolution used in MPEG-2-compressed commercially distributed DVD movies. As you examine the chart on digital video formats, it will be obvious that the main differences between the DV formats relate to the tapes themselves (e.g. size and running time), but in the case of DVCPRO 50 and Digital-S there are also some differences in the compression algorithm used.
Definition Box:
Digital Video Formats:
| Format | Tape Size | Compressor | Compression Ratio | YUV sampling | Running Time (mins.) |
|---|---|---|---|---|---|
| DVCAM | 6mm | DV25 | 5:1 | 4:1:1 NTSC, 4:2:0 PAL | 184 |
| DVCPRO | 6mm | DV25 | 5:1 | 4:1:1 | 183 |
| DVCPRO 50 | 6mm | DV50 | 3.1:1 | 4:2:2 | 90 |
| Digital S | 12.5mm | DV50 | 3.1:1 | 4:2:2 | 124 |
| Digital Betacam | 12.5mm | Sony | 3:1 | 4:2:2 | 94 |
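As a rough cross-check of how frame size, sampling and compression ratio combine, the sketch below estimates the video data rate for an NTSC DV25 stream. It assumes 8 bits per sample (not stated in the table), the 720 x 480 frame and nominal 29.97 frames per second quoted in this chapter, 1.5 samples per pixel for 4:1:1, and it ignores audio and tape-format overhead. The function names are illustrative.

```python
def uncompressed_mbit_per_s(width, height, samples_per_pixel, fps, bits_per_sample=8):
    """Estimated uncompressed video data rate in Mbits per second."""
    bits_per_frame = width * height * samples_per_pixel * bits_per_sample
    return bits_per_frame * fps / 1_000_000


def compressed_mbit_per_s(uncompressed, compression_ratio):
    """Data rate after applying a compression ratio such as 5 (for 5:1)."""
    return uncompressed / compression_ratio


# NTSC frame, 4:1:1 sampling (1.5 samples per pixel), 5:1 compression as in the DVCPRO row.
raw_411 = uncompressed_mbit_per_s(720, 480, 1.5, 29.97)
dv25 = compressed_mbit_per_s(raw_411, 5.0)
print(f"Uncompressed 4:1:1: {raw_411:.1f} Mbits/s")
print(f"After 5:1 compression: {dv25:.1f} Mbits/s")
```

Under these assumptions the compressed rate comes out just under 25 Mbits per second, which is consistent with the name of the DV25 compressor. The same functions can be applied to the other rows by substituting their sampling and compression values.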
Following the capture of digital AV material, significant post-processing may be necessary or desirable, either to clean up the data using digital processing techniques (such as noise reduction), or to produce smaller, downsampled and compressed versions for various kinds of distribution. Some of these processes can be automated, but they are still labor-intensive and need to be planned and provided for in advance. Examples of post-capture processing carried out at the University of Virginia Robertson Media Center include adding data and fades, and altering frame rate and size, using Media Cleaner, Final Cut Pro or iMovie tools. Other projects may choose to clean up noise or edit moving image materials into easily accessible chunks. Post-processing should be performed on digital derivatives, with an unedited digital master copy kept for reference and archival purposes. It is worth considering the kinds of uses to which these derivatives will be put so as to sequence the kinds of processing you perform and minimize the number of special-purpose derivatives you must create and maintain. For instance, where initial capture is performed at a very high sampling rate and bit depth, as recommended in this chapter, some institutions produce two other masters to meet delivery needs: a de-noised "production master," and a downsampled de-noised master from which RealAudio files can be easily made using current software.
Metadata is a crucial extra element that must be created to accompany digital audio or video, and the principles of metadata interoperability and documentation standards are as important to digital AV media as to still image and text media. However, unlike textual resources, audio and video cannot currently be adequately searched by themselves as a raw resource. There is no cheap, standard way to apply the equivalent of a "full-text search" to AV materials. As a result, metadata for audio and video is doubly crucial to internal management as well as public use of such resources. Metadata for digital audio and visual resources can be used in much the same way as metadata for complex digital objects composed of still images. A metadata standard like METS (Metadata Encoding and Transmission Standard) can be used to describe the structure of a digital object: for instance, a book that is represented by dozens of digital page images, plus a full-text transcription. METS allows one to connect an individual page image with structural information about that page (e.g. Chapter 10, page 99), as well as alternate representations of that page (say a transcription of the text). In this way, one can present and navigate the various digital files (page images, transcriptions, METS XML file) as one cohesive “digital object”. This allows users of the resource to search for phrases that may occur on a particular page, and be taken to the location of that page in the complex digital object. Similarly, AV metadata standards such as METS (with the appropriate extension schema), or others like SMIL (Synchronized Multimedia Integration Language), can be used to describe the content and structure of time-based digital files such as audio and video.
SMIL, for instance, can be used to record structural metadata about a particular frame of video (frame 30, timecode 01:20:36.01) as well as to link the appropriate series of frames to alternate representations such as a transcription of the dialogue in that scene. As with image resources, this allows users to search for a particular bit of dialogue or the name of a character, and be taken directly to the video scene in which they appear. In addition to acting as a discovery tool, metadata for audio-visual resources also helps enable the exchange of resources between institutions, and facilitates the internal management and preservation of such resources.
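The idea of linking time-coded segments to a transcription, so that a text search leads to a place in the video, can be sketched in a few lines. This is not SMIL or METS itself, only an illustration of the underlying structure; the `Segment` class, the timecodes and the transcript text are all invented for the example.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    start: str       # start timecode, HH:MM:SS.FF
    end: str         # end timecode, HH:MM:SS.FF
    transcript: str  # transcription of the dialogue in this segment


def find_segments(segments, phrase):
    """Return the segments whose transcript contains the phrase (case-insensitive)."""
    needle = phrase.casefold()
    return [seg for seg in segments if needle in seg.transcript.casefold()]


# Hypothetical structural metadata for one video file.
SEGMENTS = [
    Segment("00:00:00.00", "00:01:19.29", "Welcome, and thank you for coming this evening."),
    Segment("00:01:20.00", "00:03:05.14", "The lighthouse series began as a set of sketches."),
    Segment("00:03:05.15", "00:04:40.00", "Questions from the audience."),
]

for hit in find_segments(SEGMENTS, "lighthouse"):
    print(f"Jump to {hit.start}: {hit.transcript}")
```

A player that accepts a start timecode could then open the video at the matching segment, which is the behavior described above for dialogue and character searches.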
As of 2002, there is no shared good practice for what constitutes minimum metadata for digital audio and video in the cultural heritage sector. So, whether creating A/V metadata for management or for discovery, it is recommended to use or adapt a standards-based metadata scheme like METS or SMIL, and to use it to capture as much information as the institution can afford. Note that some technical metadata can be generated automatically, and hence cheaply; for instance, some cameras create and can export metadata about capture date, format, exposure, etc. Metadata about content is more expensive to create, since it requires human intervention. At a minimum, useful retrieval will require a basic metadata set for each file, based on a standard like the Dublin Core. In addition, any information that will require effort to rediscover later on (for instance, the format of the original material, or its provenance) should be captured immediately if possible. Most institutions will probably decide to create a very basic metadata set for each file, with the possibility of adding further information later in the lifespan of the digital object. Fewer institutions will be able to create a full metadata record for every object from the outset, although arguably this may be the more efficient way to proceed, since it consolidates processes such as quality checking.
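A basic per-file record of the kind described here might look like the following sketch, which writes a handful of Dublin Core elements as XML using Python's standard library. The element names and namespace are those of the Dublin Core element set; the wrapping `record` element, the function name and all of the field values are hypothetical and would need to be adapted to local practice.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"  # Dublin Core element set namespace
ET.register_namespace("dc", DC_NS)


def dublin_core_record(fields):
    """Build a simple XML record with one Dublin Core element per field."""
    record = ET.Element("record")  # hypothetical wrapper element
    for name, value in fields.items():
        element = ET.SubElement(record, f"{{{DC_NS}}}{name}")
        element.text = value
    return record


# Invented example values for a digitized audio file.
record = dublin_core_record({
    "title": "Oral history interview, reel 3",
    "date": "2002-05-14",
    "type": "Sound",
    "format": "audio/x-wav",
    "source": "1/4 inch reel-to-reel audio tape",
    "identifier": "example-0003",
})
print(ET.tostring(record, encoding="unicode"))
```

Note that the `source` field is where information such as the format of the original material can be recorded at capture time, before it becomes expensive to rediscover.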
The main issues to be examined when considering options for delivery, storage and management are connected with the larger file sizes associated with audio and moving image material. The restricted nature of available bandwidth for delivery and the lack of appropriate data storage facilities pose challenges for institutions. Rights protection is another important issue, in which streaming as a delivery option can be part of a solution. For example, in Indiana University’s Variations project, while the material in copyright is digitized under the legal provision for libraries (preservation or fair use), students may view the streaming format but are unable to copy (and potentially misuse) it. At the Berkeley Art Museum / Pacific Film Archive, copyright issues have slowed efforts to digitize film and video content. Digitization has so far included documentary material and selected works of video art but no film, as of 2001.
While almost all of the project managers interviewed are committed to maintaining top quality archival masters, many are prepared at this time to trade off top quality against lower, compressed quality for ease of access and delivery. Similarly, they are prepared to trade material of longer duration (say full-length feature films) against availability of short (say 30-second) samples. These are purely pragmatic decisions, based on current technology and bandwidth.
Much depends on the purpose of the digitization project: if the objective is to provide ready access to audio visual materials (perhaps historical material), which are otherwise very difficult to view or hear for a general audience, then delivery of lower quality, compressed versions fits the purpose. If, however, the aim is to preserve fragile materials, there is still some doubt as to the durability of most digital formats; most projects are keeping options open by storing preservation copies on tape until such time as server-based digital formats are proven for long-term preservation purposes.
For many projects, storage is problematic because of the massive data sizes of audio and video material. At the extreme ends of the spectrum, the Survivors of the SHOAH Visual History Foundation and Indiana University’s Variations project serve as exemplars of large data storage options for video and audio material respectively. At Indiana, migration of MPEG files from the current tape storage system to a new university-wide mass storage system has recently been completed, with migration of WAV files currently in progress. The Digital Library Program is working with University Information Technology Services to ensure long-term preservation/access of Library objects in this mass storage system.
The system for storage and retrieval at the Survivors of the SHOAH Visual History Foundation involves use of local caches of 1 Terabyte for instant retrieval. If the information is not available on the cache then the disc server (180 Terabytes) at the Visual History Foundation is accessed. The final step is for the information to be found in the tape storage and uploaded to the server and local cache. This is done automatically by a system that uses a robotic arm to locate, pick, and load the desired information, but it can take between 5 and 10 minutes. This system is appropriate for a project with such a massive scope (52,000 video records of testimonies) and multi-million dollar budget, but much of its practice cannot be applied to smaller museums or archive projects as the level of funding required will not be available to produce such sophisticated systems and technology.
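The three-tier retrieval sequence described above (local cache, then disc server, then tape) can be sketched as a simple lookup chain. This is only an illustration of the logic, using in-memory dictionaries in place of real storage systems; the function and variable names are invented, and copying a disc-server hit into the local cache is an assumption not stated in the description.

```python
def retrieve(item_id, cache, disc_server, tape_store):
    """Look for an item in each storage tier in turn, fastest first.

    Returns the data and the name of the tier where it was found.
    """
    if item_id in cache:
        return cache[item_id], "cache"
    if item_id in disc_server:
        cache[item_id] = disc_server[item_id]  # assumption: refresh the local cache
        return disc_server[item_id], "disc server"
    if item_id in tape_store:
        data = tape_store[item_id]   # in the real system a robotic arm loads the tape
        disc_server[item_id] = data  # uploaded to the server...
        cache[item_id] = data        # ...and to the local cache
        return data, "tape"
    raise KeyError(item_id)


# Hypothetical contents of each tier.
cache = {}
disc_server = {"testimony-002": b"video data 2"}
tape_store = {"testimony-001": b"video data 1", "testimony-002": b"video data 2"}

print(retrieve("testimony-001", cache, disc_server, tape_store)[1])
print(retrieve("testimony-001", cache, disc_server, tape_store)[1])
```

The first request for an item held only on tape is slow, but because the item is then copied to the faster tiers, repeat requests are served from the cache.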
In summary, it is difficult to draw general conclusions or to make firm recommendations on ideal storage, delivery and management of digitized audio and moving image materials. Practice and technology are still evolving, although pointers to the future indicate that streamed delivery of highest quality audio and video material will become widespread, that problems of massive storage will ease as costs decrease, and that reliable preservation formats will be found. Perhaps the most prudent course in the meantime is to transfer materials at the best sampling rate possible and store them at the best possible quality, with as little use of lossy compression as the budget allows. This keeps the highest quality materials available to work with in the future and to migrate as necessary.
[1] This table does not include details on esoteric formats such as lantern slides, picture discs, 4.75, 9.5, 17.5, 22, 28 and 70mm film formats.
[2] There are three different VHS formats, corresponding to three television standards: PAL, which is common in Australia, New Zealand, the United Kingdom and most of Europe; NTSC, used in the USA, Canada and Japan; and SECAM, used in France, much of Eastern Europe and Russia. PAL (Phase Alternate Line) uses 625 horizontal lines at a field rate of 50 fields per second (or 25 frames per second). Only 576 of these lines are used for picture information, with the remaining 49 lines used for sync or for holding additional information such as closed captioning. SECAM (Sequential Couleur avec Memoire, or sequential color with memory) uses the same bandwidth as PAL but transmits the color information sequentially. NTSC (National Television System Committee) is a 525-line system, compatible with both black-and-white and color, that scans interlaced picture frames at a nominal 29.97 frames per second.
[3] DV-PAL has a pixel dimension of 720 x 576.