VI. Capture and Management of Images

 

Introduction

Images have tremendous power on many levels and in many contexts. Before the advent of the Web, the online world was largely limited to textual data, but with the growth of the Internet, the demand for images is huge and continues to grow. In addition to the compelling visual pleasure they offer—which puts them in high demand in the culture generally—they also carry dense cultural meaning and are being used increasingly for research, teaching, and learning.

As a creator and distributor of digital images you have many factors to think through.

As with other kinds of materials, these include the selection of objects and visual documentation for digitization, the intended purpose of the digitized items or collection(s) and their expected users and uses, as well as the availability of relevant resources. Your approach will be determined by a number of considerations: the nature of your material, the needs of your users, and your available financial resources and human expertise. Overall, you should remember that your objective is to create a high quality digital resource that will remain useful in the long term. This resource should have as its cornerstone an archived digital master that can be considered “use-neutral” and can serve as the parent to many children that are used in print, web, video and other forms of reproduction.

Other sections in this Guide provide information that bears directly on the issues in this section. Section II provides a broad contextual view on overall project planning, including many of the larger decisions that affect image capture. Section III gives guidance on selecting materials, which will play a crucial role in determining what kinds of image capture techniques are appropriate. Section VIII offers detailed information on quality control and assessment, and Section XII discusses user needs evaluation, which can help you decide issues such as the required image quality. With this information as a backdrop, an overview of the steps to be taken will include the following:

Finally, we emphasize that this chapter should be read alongside the chapter on Quality Control and Assurance, which is an integral element of the creation process.

 

Planning for Image Capture

Choice of Source: Original versus Intermediate

During the selection stage, you will have chosen the items to be digitized on the basis of their content or information value as well as for their potential importance to a set of users. At this stage in the planning you should also consider whether to digitize from the original object or from an existing surrogate, for instance if the original is inaccessible or could suffer damage from handling during the process. Examples of derivative formats are photographic prints of original art works, microfilm of newspapers, and slide images of objects.

Frequently, the physical size and condition of the object will determine this question. If an original artwork is oversized, then a flatbed scanner will not work. In addition, some objects should not be placed in contact with glass. For example, fragile work, determined as such by a conservator or preservation specialist, should not be put in contact with a scanner platen or a glass plate holding materials flat on a camera copystand. It may be possible to digitize such objects if a high-end digital camera is in place, with the requisite equipment, such as a book cradle and cool lights. Even “cool lights” may emit harmful ultraviolet radiation or destabilize the temperature and humidity in the room, and if this will put the objects at risk you may need to look into the latest generation of electronic flash-based, area array digital cameras. If such equipment is not available, digitizing from a film intermediary may be a possible alternative. With any intermediary there will always be some information loss, so digitizing from the original is advisable when the highest level of quality, per item, is needed, in cases such as preservation replacement. It should be noted that digitizing from the original can significantly raise costs, not only for capture but also for preservation, which is why intermediaries are used so widely in large-scale, production-level digitization projects.

Intermediaries pose their own problems, including poor production and scratched, faded, or out-of-focus images, all of which affect the quality of the digital product and need to be closely monitored through the quality control and assurance process. However, staff at some projects included in our survey, for example those at the University of Michigan, photograph all the originals and digitize the resulting 4” x 5” transparencies. This approach is not common among the major projects, and most will digitize from the originals. It should be noted, though, that it is better to do a good job of digitizing a well-made intermediary than a merely adequate job of digitizing original images.

Before deciding to use intermediaries, it is worth considering the following factors:

If the answer to these questions is yes, then you should consider using a film intermediary.

 

Quality and Resolution

Decisions as to what resolution, bit-depth or method of tone reproduction, and image format to use for capture, storage and delivery can only be made after project staff have completed an assessment of the material, the needs of potential users, the available technology, costs per image and likely financial support.

The key questions will be:

While you could produce a matrix and try to balance all these parameters, experience shows that you will inevitably need to make trade-offs. Although local circumstances such as resource constraints, long-term plans for the material, and other factors will dictate what trade-offs are appropriate, other things being equal it is probably a good rule of thumb to reduce quantity and maintain quality. A higher-quality digital object offers greater flexibility for future use, since you can always derive lower-quality versions from it, whereas if your needs become more demanding, a lower-quality image will be inadequate. This consideration may carry additional weight if you are dealing with rare or perishable materials, where you may not have a second opportunity to digitize in the future.

There is no set resolution for any individual resource or collection. Each project must identify the minimum level of quality and information density it requires for its digital surrogates. A detailed explanation of sampling rates and other measures of information density can be found in the appendix on digital data capture. Briefly, resolution is usually expressed in dots or pixels per inch (dpi or ppi) and measures the sampling density (the number of samples taken per linear inch of the source) that is captured by the scanning equipment. Generally speaking, the higher the ppi, the more detail is captured. The sampling depth, or bit-depth, measures how much information is captured in each sample: for instance, how many colors or levels of grayscale. The higher the sampling depth, the more subtle the gradations across the image, and the larger the resulting file size. The sites interviewed for this Guide used a range of sample rates from 300 dpi to 600 dpi for their master image files. Lower resolution images were often derived from the master file to produce distribution copies with smaller file sizes for ease of download. The most widely adopted format for storing preservation quality digital masters is uncompressed TIFF. Derivatives, in such file formats as JPEG, are used for access and delivery versions of the image. The resolution of these derivatives generally ranges between 72 ppi and 150 ppi. The resolution appropriate for a high-quality image of the source material should be determined by the size of the smallest significant detail that will need to be visible, given the intended use of the image.
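The relationship between source size, resolution and pixel dimensions described above can be sketched in a few lines of Python. This is only an illustration: the 8 x 10 inch print and the function names are assumptions chosen for the example, not values taken from any project surveyed for this Guide.

```python
def pixel_dimensions(width_in, height_in, ppi):
    """Return (width_px, height_px) for a source of the given size in inches."""
    return round(width_in * ppi), round(height_in * ppi)


def derivative_scale(master_ppi, derivative_ppi):
    """Fraction by which each side of the master shrinks in the derivative."""
    return derivative_ppi / master_ppi


# Hypothetical 8 x 10 inch print: a 600 ppi master and a 150 ppi access copy.
master = pixel_dimensions(8, 10, 600)
access = pixel_dimensions(8, 10, 150)
print("master:", master)
print("access:", access)
print("scale :", derivative_scale(600, 150))
```

Because the derivative is a quarter of the master on each side, it holds one sixteenth of the pixels, which is why access copies download so much faster than masters.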

Images can be either reflective (e.g. photographs or drawings) or transmissive (e.g. transparencies or negatives). With reflective material, light is bounced off the surface of the image and its intensity measured using light sensitive diodes. In the case of transmissive material, light is instead passed through the original source. There are three constraints on the highest resolution you can use in scanning both reflective and transmissive material: the maximum sampling capabilities of the scanning device; the size of the files you can manage; and the level of information you actually need. An example of this last point involves old postcards, which were often printed on poor quality paper using cheap processes. If scanned at high resolutions you capture the texture of the paper, which can obscure other information in the image.

Remember that before digitizing you still need to answer several questions: what is the format of the material and its size (or size range if in a batch); whether you need color or black and white (see the discussion of this question in the cost-benefit analysis section of Section III: Selecting Materials); what is the smallest significant detail in the image that you wish to capture (fine line drawing, detailed photographs, fine art); and to what range of uses the digital objects will be put.

 

Benchmarking

The greater the detail that needs to be captured, the higher the resolution required. Many projects use benchmarking to identify the best resolution for capturing a given image. Benchmarking will identify the smallest essential detail by investigating the attributes of the source and will address the issues of whether the objective is to convey the informational content of the original or to convey information about its creation, such as brush strokes or engraving methods.

One way to establish a benchmark is to take a sample of the material, perhaps that with the greatest detail, and scan at a variety of resolutions. Show these scans, using the medium on which they will be chiefly viewed in practice, to a sample of users and staff to identify which sampling rate best meets the needs for the digital object. This process will be most effective if you take into account the kinds of equipment your users will actually be using: will they have access to high-end monitors and fast broadband connectivity? Once you have identified the optimum resolution then you can set the specifications for that collection. An alternative to ‘evaluative benchmarking’ is to use a Quality Index (QI). The Cornell Digital Quality Index (DQI, http://www.library.cornell.edu/preservation/tutorial/conversion/conversion-04.html), developed from guidelines for the micrographics industry, though currently restricted to “printed text, line art and book illustrations,” can be used for bitonal and grayscale scans, by measuring the smallest meaningful area of the image and using a QI to calculate the optimum dpi.
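As a rough illustration of the Quality Index approach, the sketch below rearranges the QI formulas as we recall them from the Cornell tutorial: QI = (dpi x 0.039h) / 3 for bitonal scans and QI = (dpi x 0.039h) / 2 for grayscale scans, where h is the height in millimetres of the smallest significant character. Both the formulas and the target QI of 8 used in the example are assumptions here and should be checked against the Cornell documentation before being used to set a specification.

```python
MM_TO_INCH = 0.039  # approximate millimetres-to-inches factor used in the QI formula


def required_dpi(qi, detail_height_mm, mode="bitonal"):
    """Solve the assumed QI formula for dpi.

    qi               -- target Quality Index (an assumption: higher means more legible)
    detail_height_mm -- height of the smallest significant character, in mm
    mode             -- "bitonal" (divisor 3) or "grayscale" (divisor 2)
    """
    divisor = {"bitonal": 3, "grayscale": 2}[mode]
    return divisor * qi / (MM_TO_INCH * detail_height_mm)


# Hypothetical example: 1 mm characters and a target QI of 8.
for mode in ("bitonal", "grayscale"):
    print(mode, round(required_dpi(8, 1.0, mode)), "dpi")
```

Under these assumptions a grayscale scan needs a lower dpi than a bitonal scan for the same detail, because the extra tonal information helps to render fine strokes.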

The UK’s Higher Education Digitization Service (HEDS) has a Matrix for Potential Cost Factors (http://heds.herts.ac.uk/resources/matrix.html) which can be useful when considering different resolutions. Another good resource to refer to is the California State Library Scanning Standards (http://www.library.ca.gov/assets/acrobat/scandocrev1122.PDF) but you should use such instruments with great care.

The main principle to remember is that, once the resolution has been identified for a particular batch or collection, it should be fixed for those materials for the duration of the project.

 

Method of tone reproduction

When you undertake the actual scanning of your images, you will need to choose between several methods of tone reproduction. This phrase refers to the number of bits (or “bit depth”) sampled for each pixel, and to the technique used to capture the full range of tones (or density) in the source materials.

Scanners record tonal values in digital images in one of three general ways: black and white, grayscale, and color. In black-and-white image capture, each pixel in the digital image is represented as either black or white (on or off). You can choose the threshold at which a given point will be considered to be black, and you can use halftoning algorithms to create a “screened” pictorial image, but black and white scanning is generally appropriate only for digitizing text and line art. In 8-bit grayscale capture, the tonal values in the original are recorded with a much larger palette that includes not only black and white, but also 254 intermediate shades of gray. Again, the thresholds can be controlled, either manually or automatically, as a first step to ensure that meaningful information in the highlight or shadow areas of images is adequately captured, and to get optimal results for originals with different degrees of contrast and overall darkness. In 24-bit color scanning, the tonal values in the original are reproduced from combinations of red, green, and blue (RGB) with palettes representing up to 16.7 million colors.

Your decision about which method to use for tone reproduction begins with the choice of black-and-white, grayscale, or color scanning as the default minimum for various classes of material. Your user needs will be an important determinant, but you should not assume that there is a simple correlation between increasing information capture and increasing user benefit. If your original is a faded black-and-white photographic print, for example, color scanning might accurately reproduce the effects of aging. However, file sizes for the color images would be three times larger than their grayscale counterparts. If color is not needed to meet the stated viewing, printing, and publication objectives for your project, grayscale may be the better option since it will be faster to download and smaller to store. Relatively speaking, adjusting resolution is a very straightforward process. Adjusting the settings for tone reproduction, however, can become complicated. Assume that each incremental jump might require changes to equipment, to operator skill, or both.
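The figures quoted above follow directly from the bit depth: each extra bit doubles the number of values a pixel can take, and uncompressed file size grows in proportion to the number of bits per pixel. A minimal Python sketch (the function names are ours, chosen for illustration):

```python
def tonal_levels(bit_depth):
    """Number of distinct values a pixel can take at the given bit depth."""
    return 2 ** bit_depth


def relative_size(bit_depth, baseline_bit_depth=8):
    """Uncompressed size relative to a baseline bit depth at the same resolution."""
    return bit_depth / baseline_bit_depth


for name, bits in [("black and white", 1), ("grayscale", 8), ("color", 24)]:
    print(f"{name}: {bits}-bit, {tonal_levels(bits):,} values, "
          f"{relative_size(bits):g}x the size of 8-bit grayscale")
```

The 256 values of 8-bit grayscale are black, white and the 254 intermediate shades; the 16,777,216 values of 24-bit color are the "16.7 million colors".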

Having selected bit depth, the most important decisions regarding tone reproduction relate to reproducing the densities of tone in the source materials. Good digital masters are characterized by having pixels distributed across the tonal range in the image from black to white. Stated in the negative, tones are not “clipped” in good masters. Following scanning, all subsequent image processing relies upon pixels either being in the region of interest or not. A scanner that automatically chooses to reproduce the middle range of tones in an image might not have enough dynamic range to capture the meaningful highlights and shadows at both ends of the tonal range. Because experts say that much of the secret to good color reproduction is getting the neutrals right, the method used to capture densities should be carefully planned. For reflective materials such as photographic prints, grayscales might be used to set the “aimpoints” for black, white and middle gray in the digital masters. With transmissive materials, however, these decisions might need to be made by the operator on a case-by-case basis, which is one of the reasons that it can take much longer to scan negatives than prints in order to produce pleasing reproductions.
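One simple way to check a scan for clipping is to count how many pixels sit at the extreme black and white values. The sketch below works on a plain list of 8-bit grayscale values; the 1% tolerance is an arbitrary assumption for illustration, and a real project would set its own threshold during benchmarking.

```python
def clipped_fraction(pixels, black=0, white=255):
    """Return (shadow_fraction, highlight_fraction) for 8-bit grayscale values."""
    total = len(pixels)
    if total == 0:
        raise ValueError("no pixels supplied")
    shadows = sum(1 for p in pixels if p <= black)
    highlights = sum(1 for p in pixels if p >= white)
    return shadows / total, highlights / total


def looks_clipped(pixels, tolerance=0.01):
    """Flag a scan when more than `tolerance` of its pixels are pure black or white.

    The default tolerance is a hypothetical value, not a published standard.
    """
    shadows, highlights = clipped_fraction(pixels)
    return shadows > tolerance or highlights > tolerance


# Hypothetical samples: one with crushed shadows, one with tones inside the range.
print(looks_clipped([0] * 50 + [128] * 50))
print(looks_clipped(list(range(10, 246))))
```

A flagged scan is not necessarily wrong (a line drawing on white paper will legitimately contain many near-white pixels), so a check like this is best used to select images for visual inspection.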

In addition to user needs, you should also consider which method will actually capture the information of your originals best. Grayscale imaging is appropriate for continuous tone images which lack color. Black-and-white imaging might be warranted for line art if cost savings are a concern; for instance, a black-and-white scan of a typescript page might provide adequate legibility. However, many forms of line art (for instance, manuscript pages) contain subtle color variation which might be of importance in distinguishing overlays of pen strokes and identifying the order in which the marks were made. For these purposes, full color scanning would offer significant benefits.

Projects will occasionally benchmark their imaging by establishing a minimum resolution for the collection as a whole. Most often the resolution depends upon the original, the resources available and the use that you intend to make of the material. Sample testing provides a good way to benchmark sampling rate. Setting minimum requirements is generally accepted as the best practice for establishing the capture requirements, but this must be weighed against the resources available.

Projects with very long-term digitization efforts may have to address the changes in standards for creating digital resources. Improvements in scanning technology and the diminishing cost of storage have radically increased the cultural heritage community’s minimum standards for capture quality, and digital objects captured ten years ago may have been scanned at what would now be an almost unacceptably low level. A museum or library with a large number of legacy images captured at lower resolution will need to decide how to handle the transition. One possible strategy would be to digitize new material at the current recommended standard, and create from these new scans a set of lower-resolution derivatives to use with the legacy materials, if internal consistency is a crucial consideration. Over time, it may be possible to rescan the legacy materials and slowly upgrade the entire collection. Under most circumstances, it would be a waste of time to scan the new materials at an arbitrarily low resolution. The conversion of material from analog to digital will always be done in an environment of advancing technology and improving standards. You should be sure that you strike a balance between the costs, preservation suitability of the format, and the best usable standards. Generally, best practice is to capture at as high a level of overall quality as is appropriate to handling, quality, budget, or use considerations in order to limit the upgrading that might be required otherwise.

The advice box below gives summary guidelines for selecting the mode of tone reproduction.

 

Practical Advice Box:

Some practical guidelines for selecting the appropriate scanning method

 

Some projects scan at bit depths higher than 24 (32-bit color, with four channels of 8 bits each captured as red, green, blue, and 8 bits of grayscale; or 48-bit color) even though the current generation of applications cannot render this depth. This is a hedge against future improvements in technology that you might wish to adopt. For originals of wide color gamut (e.g. paintings or natural history specimens) it may be important to select capture devices that allow the storage of raw files of 12, 14 or 16 bits per pixel, if an extremely high degree of color accuracy is a goal. Dynamic range may be checked and the image archived for future output devices that will be supported. Document the specific characteristics of the capture device, e.g. non-interpolated highest resolution, spectral response, captured IT-8 targets with the 50% gray neutralized, etc.

Transmissive material (for instance, slides and negatives) requires special treatment when scanning. The decisions made for resolution and bit depth are the same as for reflective material, but as slides and negatives tend to be smaller in size the resolution must be higher to achieve a digital image of useful size and quality. Although film scanners have generally been recommended to capture transmissive material, today many mid- to high-end flatbed scanners outperform production slide scanners. Projects at the Library of Congress capture 35mm slides up to 3500 dpi (the range is from 1200 dpi to 3500 dpi). Individual projects will identify the resolution required depending on the original source, the detail of content and the users’ requirements.
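The arithmetic behind the higher resolutions used for slides can be sketched as follows. The example assumes the nominal 36 mm x 24 mm frame of a 35mm slide; the target of 3000 pixels on the long side is a hypothetical requirement chosen for illustration.

```python
MM_PER_INCH = 25.4


def scan_pixels(width_mm, height_mm, dpi):
    """Pixel dimensions produced by scanning a source of the given size at `dpi`."""
    return (round(width_mm / MM_PER_INCH * dpi),
            round(height_mm / MM_PER_INCH * dpi))


def dpi_for_target(long_side_mm, target_pixels):
    """Resolution needed to obtain `target_pixels` along the long side."""
    return target_pixels * MM_PER_INCH / long_side_mm


# Nominal 35mm frame (assumed 36 x 24 mm) at the two ends of the quoted range.
print(scan_pixels(36, 24, 1200))
print(scan_pixels(36, 24, 3500))
# Hypothetical requirement: 3000 pixels on the long side.
print(round(dpi_for_target(36, 3000)), "dpi")
```

At 1200 dpi the frame yields only about 1700 pixels on its long side, which shows why resolutions that are ample for a print are marginal for a slide.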

You also need to ensure that there is a match between the dynamic range of the material that you are scanning and the abilities of your equipment to capture that range. Dynamic range is expressed as a scale from 0.0 (perfect white) to 4.0 (perfect black). Some color scanners miss subtle differences between dark and light colors and as a result do not create faithful digital representations of the content. The better the quality of the scanner, the greater the dynamic range. Drum scanners can capture between 3.0 and 3.8 and give good color quality; however they are very expensive and impractical for large projects. The scanner’s ability to capture dynamic range is an important factor when deciding which equipment to purchase and this is especially true if you are working with transmissive material. Do not believe the manufacturer’s claims of dynamic range. Instead, make your own tests by scanning a standard 22 step Kodak film gray scale and subtracting the lowest perceptible density from the highest. A scanner might have a high dynamic range but also introduce unacceptable noise in the dark areas of the image. If you cannot do your own testing, you may be able to find another institution that has recorded results and is willing to share them.
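The test described above reduces to a subtraction once you have noted which steps of the gray scale remain distinguishable in the scan. The sketch below assumes you have recorded the published density of each distinguishable step; the density values in the example are invented for illustration. Because optical density is a base-10 logarithmic measure, the range can also be expressed as a contrast ratio.

```python
def measured_dynamic_range(distinguishable_densities):
    """Highest minus lowest density among the steps the scanner could separate."""
    if len(distinguishable_densities) < 2:
        raise ValueError("need at least two distinguishable steps")
    return max(distinguishable_densities) - min(distinguishable_densities)


def contrast_ratio(density_range):
    """Convert a density range to a brightest-to-darkest ratio (density is log10)."""
    return 10 ** density_range


# Hypothetical result: steps from density 0.05 to 3.05 were distinguishable.
steps = [0.05, 0.20, 0.35, 1.55, 2.90, 3.05]
rng = measured_dynamic_range(steps)
print(f"dynamic range {rng:.2f}, about {contrast_ratio(rng):,.0f}:1")
```

Each additional unit of density corresponds to a tenfold increase in contrast, so the gap between a scanner rated at 3.0 and one rated at 3.8 is larger than the numbers suggest.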

Knowing your sources will allow you to ensure that you have configured your equipment appropriately. For instance, halftone images are made of regular patterns of dots or lines (the word “halftone” is derived from the printing technology method of representing photographs or artwork). They can be either color or grayscale. During digitization, the scanner dot sample combined with the halftone dots can produce undesirable patterns known as the ‘moiré effect’. One way of avoiding this is to scan at high resolution, but you may also need image enhancement processing, either at scan time or in post-processing. Most good quality capture software packages will enable descreening to minimize this kind of scanning artifact, although you may have to do this in post-processing, using an image processing tool such as Photoshop.

 

Process and Management Issues

Choosing a file format

In choosing the appropriate scanning method you must not only identify the appropriate digitization standard but also the best file format. You should select file formats that are platform-independent and supported by a wide variety of applications, both for browsing and editing. It is generally understood that for master images, uncompressed TIFF (Tagged Image File Format) provides the most suitable preservation format. ‘Uncompressed’ means that all the information encoded during the scanning process is retained in full. Digital projects store their master archived files as TIFF and then create smaller or derivative files for display and access for the users. These derivatives may be created using a number of different formats.

The file formats typically used for display and access are JPEG (which will gradually be displaced by JPEG2000), GIF, MrSid, PNG, and PDF. These file formats use compression algorithms to reduce the size of the digital files. Some algorithms, such as the LZW (Lempel-Ziv-Welch) encoding scheme used by GIF, do this without discarding any information. LZW, for example, is a lossless algorithm that can compress files by up to 33% without throwing away data irrevocably: when the file is decompressed, the original data can be reconstructed exactly. Other approaches to compression, including fractal and wavelet compression, take more radical approaches and sacrifice data to minimize file sizes. JPEG (Joint Photographic Experts Group) uses a ‘lossy’ compression algorithm based on the discrete cosine transform. MrSid (Multi-Resolution Seamless Image Database) uses wavelet-based image compression, which is especially well-suited for the distribution of very large images. The Library of Congress uses MrSid to deliver maps from its collections. As well as having impressive compression capabilities with limited visible information loss, it stores multiple resolutions of images in a single file and allows viewers to select the resolution (in pixels) that they feel will be most appropriate. There are other file formats that are suitable for delivery of images in different environments and you should consider which of these will be most appropriate for your project. The table below provides an indication of the most commonly used formats.
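What "lossless" means in practice can be demonstrated with a round trip: compress, decompress, and compare with the original. The sketch below uses Python's standard zlib module, which implements DEFLATE rather than LZW; it stands in here only because it is readily available, and the property being shown (exact reconstruction) is the same. The sample data is a made-up run of white and black pixel values.

```python
import zlib


def round_trip(data: bytes):
    """Compress and decompress `data`, returning (compressed, restored)."""
    compressed = zlib.compress(data, 9)   # 9 = highest compression level
    restored = zlib.decompress(compressed)
    return compressed, restored


# Hypothetical bitonal-style scanlines: long runs of white with a black stroke.
scanline = bytes([255] * 200 + [0] * 50 + [255] * 200) * 100
compressed, restored = round_trip(scanline)

print("original bytes  :", len(scanline))
print("compressed bytes:", len(compressed))
print("identical after decompression:", restored == scanline)
```

Highly repetitive data like this compresses far better than a continuous-tone photograph would. A lossy format offers no equivalent round trip: once detail has been discarded it cannot be recovered, which is why masters are stored uncompressed or losslessly.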

 

Definition Box:

There is a range of non-proprietary and proprietary image file formats available. This table includes some of the more common formats for raster images.

Extension | Meaning | Description | Strengths/weaknesses
.tiff, .tif | TIFF (Tagged Image File Format) | Uncompressed file. Originally developed for desktop publishing. 1 to 64 bit depth. Used mostly for high quality imaging and archival storage. | Generally non-compressed, high quality. Large file sizes. Most TIFF readers only read a maximum of 24-bit color. Delivery over web is hampered by file sizes. Although LZW compression can reduce these file sizes by 33% it should not be used for archival masters.
.gif | GIF (Graphics Interchange Format) | This 8-bit file format has support for LZW compression, interlacing and transparency. | Lossless compression. Popular delivery format on web. .png was defined to replace GIF.
.jpg, .jpeg | JPEG (Joint Photographic Experts Group) | Compressed images. 8-24 bit. Variable amount of compression to vary quality and file size. | Lossy compression. Widely used delivery format. Flexible.
MrSid | Multiresolution Seamless Image Database | Image-compression technology. | Lossy compression. Can compress pictures at higher ratios than JPEG; stores multiple resolutions of images in a single file and allows the viewer to select the resolution.
.pcd | ImagePac, PhotoCD | Lossy compression. 24 bit depth. Has 5 layered image resolutions. | Used mainly for delivery of high quality images on CD.
.png | PNG (Portable Network Graphics) | Lossless compression. 24 bit. Replaced GIF due to copyright issues on the LZW compression. Supports interlacing, transparency, gamma. | Some programs cannot read it.
.pdf | PDF (Portable Document Format) | 4-64 bit depth. Uncompressed. Used mainly to image documents for delivery. | Need plug-in or Adobe application to view.
.pct | PICT | Compressed. Mac standard. Up to 32 bit. (CMYK not used at 32 bit.) | Supported by Macs and a highly limited number of PC applications.

 

File sizes

You will need to determine how much storage space your image files will require and this is best done by working out the file sizes first. There are different approaches for scanners and linear scanning digital cameras. In the Calculating Box, Estimating Approximate File Sizes, we provide you with an example of the method. The quality of the master images will reflect the work processes you put in place, the equipment you use, and the resolution (dpi) and bit-depth (BD) you select. External variables that may limit this are the amount of storage available and the time for scanning each document.

 

Calculation Box:

Estimating Approximate File Sizes (in bytes):

Approximate file sizes for material created with a flat-bed scanner can be determined using the following formula:

FS = (SH x SW x BD x dpi^2) / 8

FS = file size

SH = Source Height (inches)

SW = Source Width (inches)

BD = bit depth

dpi = dots per inch

/8 because 8 bits = 1 byte, the unit in which file sizes are measured.
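The formula can be turned into a small Python function and extended to estimate storage for a whole collection. The 8 x 10 inch source, the 300 dpi setting and the collection of 1,000 items are hypothetical values used only to show the arithmetic; the result is the uncompressed size and ignores file headers and embedded metadata.

```python
def file_size_bytes(height_in, width_in, bit_depth, dpi):
    """Approximate uncompressed file size: (SH x SW x BD x dpi^2) / 8."""
    return height_in * width_in * bit_depth * dpi ** 2 / 8


def collection_storage_bytes(image_count, per_image_bytes):
    """Storage for the masters of a collection of similar items."""
    return image_count * per_image_bytes


# Hypothetical 10 x 8 inch original, 24-bit color, 300 dpi.
size = file_size_bytes(10, 8, 24, 300)
print(f"{size / 1_000_000:.1f} MB per image")

# Hypothetical collection of 1,000 such items.
total = collection_storage_bytes(1000, size)
print(f"{total / 1_000_000_000:.1f} GB for the collection")
```

Because dpi is squared in the formula, doubling the resolution quadruples the file size, whereas moving from 8-bit to 24-bit capture only triples it.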

 

Large file sizes make compression essential for delivery to users. Derivative files in JPEG format are interoperable within the community and accessible through common access systems, i.e. the Internet. In some cases a much smaller thumbnail version of the image is used for quicker access (for instance, in displaying search results), allowing the user to download a larger file once they have found the image for which they are looking.
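When generating thumbnails, the usual requirement is to fit the image inside a fixed box without distorting its proportions. The sketch below computes the target dimensions only; the 150-pixel box is a hypothetical choice, and the actual resampling would be done by whatever image processing tool the project uses.

```python
def fit_within(width_px, height_px, max_side):
    """Scale dimensions to fit a max_side x max_side box, keeping proportions.

    Images already smaller than the box are left at their original size.
    """
    scale = min(max_side / width_px, max_side / height_px, 1.0)
    return max(1, round(width_px * scale)), max(1, round(height_px * scale))


# Hypothetical 4800 x 6000 pixel master reduced to a 150-pixel thumbnail box.
print(fit_within(4800, 6000, 150))
# A small image is not enlarged.
print(fit_within(100, 80, 150))
```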

 

Calibration

Calibration is a crucial aspect of the quality control process in any digitization project. You will need to calibrate the work environment, the monitor, the capture devices, and the light sources. Frequently during the life of the project you will find it necessary to recalibrate this equipment as well.

First, establish the physical environment where the digitization work will be carried out. Avoid fluorescent lighting, natural lighting, reflections, and bright colors on the walls. Ideally, darken the space with blackout curtains or dark neutral walls, to eliminate ambient lighting, so that the only light source is the scanner bulb or the cold lights used for the digital camera. Where possible, maintain a dust-free environment with operators wearing neutral colored clothing and protective footwear. Avoiding carpets on the floor reduces dust accumulation. While it may not be possible to meet all these conditions, it is crucial to ensure the absence of ambient lighting: do not set the equipment up in a sunny room with fluorescent lighting! Consult conservators to ensure that the temperature and humidity conditions will be stable and appropriate for the original work.

Once you have established the physical space, calibrate the monitor of the workstation at the beginning of every day, to ensure that the contrast, brightness and gamma settings are consistent. As monitors improve, gamma settings should remain constant, but do not assume that this is the case. A simple calibration program found in image processing software such as Photoshop can help to do this calibration. Establish the settings for contrast and brightness at the beginning of the project within the adapted space and make sure all operators adhere to them. A set of guidelines with the appropriate settings will ensure the consistency of the digital images. Data about the settings of the capture device should appear as part of the metadata associated with the digital file. For more critical work you should use a spectrophotometer and calibration software such as the Eye-One Monitor (see http://www.gretagmacbeth.com).

Calibrate the equipment used for the digital process regularly. Most flatbed and film scanners can only be calibrated by their manufacturer because calibration is done at hardware rather than software levels. High-end digital cameras may have to be calibrated as the focal length and lighting may change with daily use. We recommend that calibration settings be defined at the start of each batch. Whether the work is being done in-house or outsourced, an appropriate policy or contract arrangements should be in place to ensure equipment is regularly recalibrated.

 

Color management

A Color Management System (CMS) should be employed by any institution wishing to accurately reproduce color from the source material (original object, artifact or film master) throughout the entire chain of digital hardware devices including monitors, dye-sub and ink-jet printers, film recorders and printing presses. A CMS is a group of software tools and hardware measurement instruments that work together to map the wide gamut of the original color space into the narrower gamut of display and paper output so that a reasonable and perceptible consistency and quality is maintained.

These systems are complex, evolving, and beyond the scope of this Guide. However, there is a growing need, both in terms of economics and image quality, for institutions to develop expertise and deploy a CMS. While lithographers, pre-press experts, image capture specialists and web designers all may use some form of a CMS they often are grounded in different nomenclatures. Therefore a standard form of communication is required to achieve the desired results in image distribution.

Fortunately, there is a growing body of literature on the subject. The lingua franca is currently found in Adobe Photoshop version 7, which leverages Apple’s ColorSync and Windows ICM operating system core color technologies to provide cross-platform color management. These systems use “ICC profiles” that are based on rules made by the International Color Consortium for storing color information.

The most accessible writing on the subject has been by Bruce Fraser who is preparing a mass market reference on color management for autumn 2002 release.[2]

 

Link Box:

Color Management Web References:

International Color Consortium (ICC): http://www.color.org/

Apple Corporation’s ColorSync: http://www.apple.com/colorsync/

X-Rite: http://www.xrite.com

Munsell Color Science Laboratory: http://www.cis.rit.edu/mcsl/

Electronics for Imaging: http://www.efi.com/

GretagMacbeth: http://www.gretagmacbeth.com/

 

Targets

Targets provide a mechanism for benchmarking the capture process. In simple terms, a target is a sample with known characteristics that can be used to establish a baseline or standard by which to assess capture. Your project should adopt a policy on what targets and scales to use, when to use them, and how their quality will be controlled. Two types of targets are commonly used: edge and detail (resolution) targets and color or grayscale charts. See Resolution Targets Example Box in Section VIII: Quality Control and Assurance.

Ensure that the targets used are appropriate to the material being scanned, as they are different for transparencies, prints, and certain kinds of reflective material. You should use new targets at the start of your project as they lose their accuracy with age. Scan the appropriate target at least once a day, at the start of a new batch of material, or if the settings of the equipment are changed for any reason. Although some think that targets should be attached to every image, this is not possible for non-reflective formats and will have resource implications as images then have to be cropped for display (though it is sometimes possible to automate this process). Including a scale with each digital image can be of value. If it is not in each image it may be sufficient to include it in the first image of a batch and whenever the settings of the equipment are adjusted (e.g. the position of the camera).
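The rule for when to scan a target (at least once a day, at the start of a new batch, or after any change to the equipment settings) can be written down as a simple check. The following Python sketch is illustrative only: the CaptureState record, its field names, and the target_scan_due function are hypothetical and do not come from any particular scanning package.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class CaptureState:
    """Hypothetical record of the most recent target scan."""
    last_target_date: Optional[date]   # day on which a target was last scanned
    last_target_batch: Optional[str]   # batch that target scan belonged to
    settings_changed: bool = False     # True if equipment settings changed since


def target_scan_due(state: CaptureState, today: date, batch_id: str) -> bool:
    """Return True if any of the three conditions described above applies."""
    if state.last_target_date is None:
        return True  # no target has been scanned yet
    new_day = state.last_target_date != today
    new_batch = state.last_target_batch != batch_id
    return new_day or new_batch or state.settings_changed
```

An operator checklist or capture script could call such a check before each scanning session and record the result alongside the batch.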

 

Metadata

Metadata associated with digital objects is the key to their sustainability. Metadata assists in the management of the digital data, documents its origin and specifications, and assists with discovery and retrieval. For detailed information on metadata, see the Appendix on Metadata. Briefly, from among the many types of metadata there are three that are generally used with digital images:

A number of issues arise when designing a metadata model for a project and these must be addressed at the outset if you are to ensure consistency:

Projects, on the whole, use a mixture of automatic capture and manual capture by the operator, subject specialist, or cataloger. The level and depth of metadata captured vary from project to project according to resource levels, including both staffing and equipment.

The operator of the digital camera or scanner will usually capture those technical metadata (as part of the administrative metadata set) that are not automatically captured by the scanning software. Descriptive metadata, particularly those relating to the original object, should be prepared in advance by a staff member with specialist knowledge of the collection.

 

File naming conventions

Before beginning a project you should define a file naming protocol so that your filenames will be consistent and intelligible. The filename can be part of the metadata, and the information in the filename can be used to reflect the collection details, e.g. collection name, book title, or page number. One of the most common pieces of metadata in filenames is an ID number. You should always ensure that the three letters after the period are reserved for the file type information (.tif, .jpg), as the operating system uses this metadata to determine which applications can handle the file.
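As an illustration of such a protocol, the Python sketch below builds and checks filenames against one possible convention. The convention itself (collection name, six-digit ID number, four-digit page number, and a three-letter extension) and the function names are assumptions made for this example, not a recommendation of the Guide.

```python
import re

# Hypothetical convention: <collection>_<6-digit id>_<4-digit page>.<ext>
# for example "maps_000123_0042.tif"
FILENAME_PATTERN = re.compile(
    r"^(?P<collection>[a-z0-9]+)_(?P<item_id>\d{6})_(?P<page>\d{4})\.(?P<ext>tif|jpg)$"
)


def build_filename(collection: str, item_id: int, page: int, ext: str = "tif") -> str:
    """Assemble a filename and reject anything that breaks the convention."""
    name = f"{collection.lower()}_{item_id:06d}_{page:04d}.{ext}"
    if FILENAME_PATTERN.match(name) is None:
        raise ValueError(f"filename does not follow the convention: {name!r}")
    return name


def parse_filename(name: str) -> dict:
    """Recover the metadata carried in a filename."""
    match = FILENAME_PATTERN.match(name)
    if match is None:
        raise ValueError(f"filename does not follow the convention: {name!r}")
    parts = match.groupdict()
    parts["item_id"] = int(parts["item_id"])
    parts["page"] = int(parts["page"])
    return parts
```

Checking every name against a single pattern at creation time keeps filenames consistent across operators and makes the embedded metadata recoverable later.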

 

Images of Text

Although the capturing and encoding of text have been dealt with in the previous section, here we will raise some of the issues in capturing pages of text as images, rather than as encoded text.

As already indicated, text can be captured in three ways: as an image only; as fully transcribed text; or as fully transcribed text with an associated image. There are currently no viable technologies for performing text searches on images, so if searchable text is required, some sort of transcription will be a necessity. However, for many purposes the original presentation of the text is also important, in which case an image of the text may be a useful addition. A source that is visual in nature as well as textual, for instance, an artist’s book composed of both image and text, is best represented as both data types.

 

Example Box:

Projects that link images and text include:

 

Projects that have captured only images of text because of the condition of the original material (for example, manuscripts or very early printed books) often store the best quality images so that when OCR technology improves sufficiently, or when more resources become available, they can create machine-readable text from these images.

Although proprietary file formats are generally not recommended for digitization projects, especially for archival versions, the PDF (Portable Document Format) can be very useful for a deliverable page image. You will have to buy the software to create PDF files (Adobe Acrobat), but the software to view PDF files is provided free (Adobe Acrobat Reader). The ability to create PDF files from any application and PDF’s platform-independence make it particularly useful for delivering text material over the web.

 

Definition Box:

Portable Document Format (PDF)

 

Post-creation

Quality control and assurance is an integral element of the creation process, and the best digital surrogates should be created to minimize the work involved (see Section VIII on Quality Control below). Standards, technology, staffing and financial resources are all part of this process. Of all of these factors, perhaps the most significant is operator skill and training; investment in both of these areas will pay off significantly in the long run. However, there are also post-processing techniques that may be needed to enhance the digital image. A number of projects carry out such processing, generally in batch mode. The standard program used is Adobe Photoshop, which has various filters and tools to assist in image post-processing. Tasks commonly performed after the capture stage include unsharp masking, gamma correction, noise reduction, deskewing, and cropping. Although you can do a lot in an image manipulation package, it is no substitute for care and attention to best practice at capture time. It is more efficient to set the guidelines for capture and make sure the operators adhere to them than to run image processing to redress errors made in the capture process.
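To make one of these tasks concrete, gamma correction remaps each tonal value through a power curve. The Python sketch below assumes 8-bit grayscale values (0 to 255) and the common convention output = 255 x (input / 255) ^ (1 / gamma); tools differ in how they define the exponent, and this is not a description of how Photoshop or any other package implements the operation. The function names are hypothetical.

```python
from typing import List


def gamma_lookup_table(gamma: float) -> List[int]:
    """Build a 256-entry table mapping each 8-bit input level to its corrected level."""
    if gamma <= 0:
        raise ValueError("gamma must be positive")
    # Assumed convention: a gamma above 1 lightens midtones, below 1 darkens them.
    return [round(255 * (level / 255) ** (1 / gamma)) for level in range(256)]


def apply_gamma(pixels: List[int], gamma: float) -> List[int]:
    """Apply gamma correction to a flat list of 8-bit grayscale pixel values."""
    table = gamma_lookup_table(gamma)
    return [table[p] for p in pixels]
```

Because the table is computed once and then applied by lookup, the same correction can be run across a whole batch of images at little cost. Black (0) and white (255) are left unchanged; only the tones in between are shifted.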

The dynamic range (or density range) of a scanner is a secondary, but important consideration. If a scanner has a high dynamic range, it can sense a wide range of light levels. Dynamic range is usually higher in drum scanners and slide/film scanners. The dynamic range does not actually indicate how many different levels can be resolved but determines the smoothness of transitions between adjacent tones.
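The distinction between density range and the number of resolvable levels can be illustrated with a little arithmetic. Optical density is the base-10 logarithm of the reciprocal of the fraction of light a tone transmits or reflects, and a scanner's density range is the difference between the densest and the lightest tone it can sense. The number of distinct levels recorded is set separately by the bit depth. The Python sketch below uses hypothetical function names, and the values in the comments are examples rather than specifications of any real device.

```python
import math


def optical_density(fraction_of_light: float) -> float:
    """Density of a tone that passes the given fraction of light (0 < fraction <= 1)."""
    if not 0 < fraction_of_light <= 1:
        raise ValueError("fraction_of_light must be in the interval (0, 1]")
    return -math.log10(fraction_of_light)


def density_range(d_min: float, d_max: float) -> float:
    """Dynamic (density) range: densest tone sensed minus lightest tone sensed."""
    return d_max - d_min


def tonal_levels(bits_per_channel: int) -> int:
    """Number of distinct levels a given bit depth can record per channel."""
    return 2 ** bits_per_channel


# A tone passing 1/1000 of the light has a density of about 3.0.
# A device sensing densities from 0.2 to 3.4 has a density range of about 3.2.
# 8 bits per channel record 256 levels; 12 bits per channel record 4096.
```

Two devices with the same density range may therefore still differ in how finely they divide that range into recorded levels.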

How scanner and camera manufacturers implement this technology varies. A table in the Appendix on Equipment indicates the types and uses of scanning equipment available.

 

Conclusion

The successful digitization of still images begins with a careful examination of a set of complex issues. From selection of materials to the best means of capture and storage, these issues must be carefully considered for each project. Questions the project manager asks may include the following. Which materials will be selected for scanning—originals or derivatives? If fragile originals are digitized, what safeguards should be in place to protect them? What file formats are most appropriate for the material? What are appropriate file sizes and what will be the impact on storage requirements? What is the best way to develop a file-naming protocol? What about sustainability? This section has answered many of these questions or provided a framework within which to begin to seek answers. For further insight, the reader should consult Section VIII on Quality Control and Assurance. That section further clarifies procedures that should yield a quality product.

 


[1] That said, it is worth bearing in mind that Manfred Thaller’s work at Duderstadt (http://www.archive.geschichte.mpg.de/duderstadt/dud-e.htm) and work commissioned by the Library of Congress (http://www.loc.gov/) concluded that, except under special circumstances, grayscale scans provide sufficient information for manuscripts.

[2] Bruce Fraser, Fred Bunting, and Chris Murphy, Real World Color Management (Peachpit Press: forthcoming, autumn 2002). 600pp; ISBN: 0201773406.

[3] For digital cameras this excluded consumer digital cameras that tend to use Contact Image Sensors (CIS) for weight and cost considerations.

[4] This description relates to standard (Legal sized, 8½ x 14”) flatbed scanners. Outsize (or large format) flatbed scanners are also available and employ similar technology. Sheet feed scanners employ an optional attachment to a standard scanner to enable automatic batch scanning of loose-leaf material.

[5] Engineering Scanners are also referred to as Wide Format Scanners.

 

abp~03/03