This section discusses some of the most effective methods of Quality Control and Assurance (QC&A). In the fields of cultural heritage and humanities the authenticity, integrity, and reliability of digital material is crucial to users at all levels, whether scholars, teachers, students, or researchers. What is sometimes less obvious is that effective QC&A must be planned at the outset of the project and built into all its processes. For QC&A to ensure the quality of the project’s deliverables this activity cannot be left until the end. If your QC&A procedures accomplish nothing except to reveal the errors that faulty processes have already produced, they are largely a failure.
At the outset it is useful to make the distinction between quality control (QC) and quality assurance (QA) as these terms are often interchanged. Quality control includes the procedures and practices that you put in place to ensure the consistency, integrity and reliability of the digitization process. Quality assurance refers to the procedures by which you check the quality of the final product.
For all classes of digital materials three measures of quality can be used: completeness, fidelity or faithfulness to the original, and legibility. Completeness simply measures whether the entire object has been captured, without cropping or other essential loss of material. Legibility is a functional measure which indicates whether the digitized version is intelligible: for text, whether the characters can be read; for images, whether the object depicted can be discerned at a basically acceptable level. Fidelity goes a step further and measures whether the digital version represents the original in a way that goes beyond simple legibility: a legible representation of a manuscript page will allow the text to be deciphered, while a faithful representation might also convey the visual texture of the paper and the gradations of ink color. Within these measures one can also make the distinction between subjective and objective measures. Comparing a digital image with the original using the naked eye is a subjective measure. Using a computer to log the number of potential errors per 1000 characters in OCR-generated text is an objective measure. In practice, both subjective and objective measures are combined in the whole quality control and assurance process. A digitizer evaluating color reproduction in a digital image might use a program such as Adobe Photoshop to confirm that the RGB values for a color bar included in the image fall within the correct range. A computer may flag errors in a text but it is a human editor who checks the nature of these errors (if indeed they are errors) and decides whether to accept, reject or correct the text.
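The color bar check described above can also be scripted. The following is only a minimal sketch: the patch names, the reference RGB values, and the tolerance of 5 units per channel are illustrative assumptions, not values taken from any published target or from a particular project.

```python
from typing import Dict, Tuple

RGB = Tuple[int, int, int]


def patches_out_of_range(measured: Dict[str, RGB],
                         reference: Dict[str, RGB],
                         tolerance: int = 5) -> Dict[str, RGB]:
    """Return the patches whose measured RGB value differs from the
    reference by more than `tolerance` on at least one channel."""
    failures = {}
    for name, expected in reference.items():
        actual = measured[name]
        if any(abs(a - e) > tolerance for a, e in zip(actual, expected)):
            failures[name] = actual
    return failures


# Hypothetical reference values for three patches of a color bar,
# and hypothetical readings taken from a scanned image.
reference = {"white": (243, 243, 242), "gray": (122, 122, 121), "black": (52, 52, 52)}
measured = {"white": (241, 244, 240), "gray": (131, 120, 119), "black": (50, 53, 52)}

print(patches_out_of_range(measured, reference))
```

Here only the gray patch is reported, because its red channel is 9 units away from the reference. A human still has to decide whether the image should be rejected or recaptured.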
Related to this point is the need to be clear about the differences between tasks that are fully automated (e.g., comparing checksums to ensure success of data transfer), those that are semi-automated (correcting what software identifies as likely OCR-generated errors), and those that are fully manual (confirming that text has not been inadvertently cropped; confirming that image content is not skewed in the frame).
Finally, throughout this section it is important to be aware of the differences between the quality assessment of people and systems and the quality assessment of products. In the former case, you are intervening to assess the capacities of your staff and of the systems and products you use during the digitization and quality assurance process. For instance, you might administer a color blindness test to the staff who will be conducting the color correction and quality assurance of digital images, or you might use technical targets to ensure that a camera is operating consistently from batch to batch, or week to week. In the latter case, you are establishing systems to check the actual product that results from the digitization process. Examples of product quality assessment, which abound in the chapter, include using targets as a means to establish correct tone reproduction for the image products created by the photographer and digital camera, or checking a certain percentage of each batch of scanned images for skew and cropping as they come from the vendor.
QC&A in some form is an integral part of any digitization project, but the procedures that projects put in place are often informal, of variable consistency, unreliable, and under-resourced. The aim should be to embed QC&A in the project at the points where it will do the most good. The project must define which digital products, such as masters and deliverable images, metadata, encoded texts, transcriptions and translations, are to be included in its QC&A program. Once you have determined the range of the QC&A program, the next step is to establish appropriate QC benchmarks.
Although there are no broadly accepted standards for image quality, encoded text accuracy or audio quality, there are a number of generally accepted guidelines that can prove very useful. These will be discussed in more detail below, but in general your QC threshold needs to match the purpose of the digital deliverable. For instance, the QC requirements for digital images that will serve as replacements for deteriorating nitrate film would probably be much more stringent—given that there may never be another digitization opportunity—than for “access” images created from relatively stable black and white photographic prints.
The first step is to carry out an initial series of QC benchmarking tests on a sample of the analog materials to test capture settings and establish threshold guidelines for rejecting digital material that does not meet the quality criteria. This should be undertaken for each type of material to be digitized and for each of the different types of output that the deliverable will take. For example, tests should be carried out with text for screen and print, or with audio for streaming or for preservation. In this way, different QC procedures, and possibly different methods of QA, will need to be established for different types of material. The resulting benchmark standards will then form the basis of an ongoing QC&A program. These QC benchmarks must be documented and presented to project staff in a way that makes them easy to implement and easy to monitor for effectiveness.
As well as defining your QC benchmark, you will have to decide on the scope of your QA program. Do you check the quality of every image or page of text against the original (100% check)? Do you undertake a stratified or random sample (for example, every 10th image, or a random 10%)? What QA procedure are you going to follow when a digital object is rejected? If a digital image is created from a surrogate will you compare the digital image against the original object or the surrogate for the purpose of QC&A? Also remember that even if your digitization has been outsourced, the project is still responsible for performing a QA check on the vendor’s work, irrespective of how detailed the QC&A requirements are in the contract with the vendor. Images of England, for example, has put in place quality assurance procedures and a custom application that allows it to check the digital images, mark errors on screen and return the error details to the vendor to enable them quickly to identify and correct the error. Furthermore, will the digital images be compared against the originals or some other benchmark such as a color chart? At what magnification, if any, will the comparison take place?
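The two sampling options mentioned above (every 10th image, or a random 10%) might be drawn as in the sketch below. The function names and the file-naming pattern are hypothetical; a real project would feed in its own batch listing.

```python
import random
from typing import List, Optional, Sequence


def every_nth(items: Sequence[str], n: int = 10) -> List[str]:
    """Select every n-th item: the 10th, 20th, 30th and so on for n=10."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return list(items[n - 1::n])


def random_fraction(items: Sequence[str], fraction: float = 0.10,
                    seed: Optional[int] = None) -> List[str]:
    """Select a random fraction of the items without repetition.

    Passing a seed makes the selection repeatable, which is useful if
    the sample has to be documented or re-checked later.
    """
    pool = list(items)
    if not pool:
        return []
    count = max(1, round(len(pool) * fraction))
    return random.Random(seed).sample(pool, count)


# Hypothetical batch of 200 scanned images.
batch = [f"img_{i:04d}.tif" for i in range(1, 201)]

fixed_interval = every_nth(batch, 10)
random_ten_percent = random_fraction(batch, 0.10, seed=42)
print(len(fixed_interval), len(random_ten_percent))
```

Both calls return 20 of the 200 files. The fixed-interval sample is predictable, which makes it easy to audit but also easy for a recurring fault to slip between the checked positions; the random sample avoids that at the cost of needing a recorded seed or log.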
It is also necessary to ensure that a project’s QC&A program includes measures for controlling the digitization environment and the way staff work (e.g. ensuring they take regular breaks). If you have specified your hardware correctly (see Equipment in Section II on Resources, and in more detail in the Equipment Appendix) then you should have a system that is appropriate for the types of materials being digitized and the purpose for which they are being created.
In addition to ensuring the correctness and informational quality of the digital materials you are creating, you need to ensure the integrity of the data files themselves over the long term. This is a significant preservation issue, and is discussed further in Section XIV. But it is also an ongoing quality assurance concern and is worth touching on here. There are two points to be addressed: checking the integrity of files as they move through your workflow and are transferred from medium to medium, and checking the storage media at intervals to guard against failure and data loss. The first of these can be automated to a large extent. Checksums, for instance, provide a simple way of ascertaining whether a file has been altered or corrupted during data transfer. For SGML and XML documents, parsing and validation can help verify the document’s integrity, though they cannot detect every kind of error. Another more powerful option is a version control system, which can track all changes made to a set of files and prevent casual or accidental alteration. In a lengthy workflow process, where the files change hands and move around a great deal, basic steps like these can save you huge amounts of effort and trouble later on.
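As an illustration of the checksum comparison described above, the sketch below uses Python's standard hashlib module. The choice of SHA-256, the function names, and the throwaway files standing in for a master and its copies are all assumptions made for the example.

```python
import hashlib
import tempfile
from pathlib import Path


def file_checksum(path, algorithm: str = "sha256", chunk_size: int = 1 << 20) -> str:
    """Compute a checksum by reading the file in chunks, so that large
    image or audio masters do not have to fit in memory."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def transfer_ok(source, copy) -> bool:
    """True if the copy has the same checksum as the source."""
    return file_checksum(source) == file_checksum(copy)


# Demonstration with temporary files (hypothetical names and contents).
with tempfile.TemporaryDirectory() as workdir:
    master = Path(workdir) / "master.tif"
    good_copy = Path(workdir) / "copy_ok.tif"
    bad_copy = Path(workdir) / "copy_damaged.tif"
    master.write_bytes(b"image data" * 1000)
    good_copy.write_bytes(b"image data" * 1000)
    bad_copy.write_bytes(b"image data" * 999 + b"image dat\x00")
    intact = transfer_ok(master, good_copy)
    damaged = transfer_ok(master, bad_copy)

print(intact, damaged)  # True False
```

In practice the checksum of each master would be recorded once, at creation, and stored alongside the file so that later copies can be checked against the recorded value and not against a second read of the source.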
The second point—the quality of the media on which your digital objects are stored—is easily neglected. However, storage media vary greatly in their longevity and robustness, and you should not only assess which media are appropriate for your purposes but also take preventative measures to guard against loss. Although hard disk failure is relatively rare, running periodic scandisk checks and defragmenting drives on a monthly basis can go a long way to identifying bad sectors and preventing read/write errors as well as improving the performance of your computer in the process. Removable media such as CDs and tapes should be purchased from a reputable brand; batch diagnosis for structural validity of the media and files is a relatively efficient method of quality assurance. Certain media, such as floppy disks, JAZ cartridges, and ZIP disks, have relatively high failure rates which make them inappropriate for storage and backup.
There are a number of further steps that can be taken to minimize potential errors in QC&A for particular types of material, including images, OCR and encoded text.
Image capture may serve a range of project goals, and for some of these only completeness and legibility are essential. However, for projects that require that the digital image be a faithful representation of the original (to the extent allowed by the medium), careful calibration of the equipment will be needed to ensure consistent and highly controlled results. You should also ensure that the material that you are to digitize is free of dirt and dust, that it is positioned on a calibrated scanner or camera, that the digital capture environment is adequately controlled (e.g. free from stray light sources), and that suitable control targets have been used (see box).
Resolution and color targets should be used.
Common resolution targets:
Common color targets:
Remember that color targets are made with organic dyes and that these dyes break down as they age. Over time, therefore, the charts lose their accuracy.
For all images there is a series of key QC&A tests to perform. The first set of checks is relatively straightforward. Check that the entire image has been captured (i.e. not cropped) including any captions or titles. Are pages missing or out of sequence? Is the image skewed? Does the image have the correct file name? The second set of checks is more complex to assess, and includes detail reproduction, tone reproduction, color reproduction, and color accuracy.
For images of textual material, line drawings, etchings, plans and other objects with distinct line-based features, detail reproduction is the key to image quality. When benchmarking, a resolution target or the smallest resolvable detail should be used. This provides a comparison point for examining legibility, completeness, sharpness, contrast, serifs and uniformity, paying particular attention to individual strokes and dense cross hatchings.
For grayscale and color images the bit depth and dynamic range are as important as resolution in assessing image quality. These issues have already been discussed in some depth in Section VI on images, and in the appendix on digital data capture. Briefly, bit depth is the amount of information (in bits) used to represent a pixel. The use of a grayscale or color chart can provide a standardized reference point for assessing the quality of color and tone reproduction. Assessing color and tone reproduction can be highly subjective, particularly if fidelity is desired, but features to look out for include the presence of details in shadows and highlights (an indication of a good dynamic range), and a smooth transition in tones, particularly on skin and sky (a blotchy or pixellated effect is an indication of insufficient bit-depth). Compare color, contrast and brightness to the original or to a color chart, paying particular attention if digitizing from negatives, where simple inversion can produce a color cast, or digitizing from print, where a herringbone, or moiré, effect can be present.
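Two of the points above lend themselves to a rough numerical check: the number of tones a given bit depth can represent, and whether shadow or highlight detail has been lost to pure black or pure white. The sketch below assumes a grayscale image supplied as a flat list of pixel values; the clipping measure is an illustrative assumption, not an established benchmark, and it supplements visual inspection without replacing it.

```python
from typing import Sequence, Tuple


def tonal_levels(bits: int) -> int:
    """Number of distinct tones representable at a given bit depth."""
    return 2 ** bits


def clipped_fractions(pixels: Sequence[int], bits: int = 8) -> Tuple[float, float]:
    """Return the fraction of pixels at pure black and at pure white.

    High values suggest that shadow or highlight detail may have been
    lost during capture.
    """
    total = len(pixels)
    if total == 0:
        return (0.0, 0.0)
    top = tonal_levels(bits) - 1
    shadows = sum(1 for p in pixels if p == 0) / total
    highlights = sum(1 for p in pixels if p == top) / total
    return (shadows, highlights)


# Hypothetical ten-pixel grayscale sample at 8 bits per pixel.
sample = [0, 0, 12, 40, 128, 200, 250, 255, 255, 255]

print(tonal_levels(8))            # 256
print(clipped_fractions(sample))  # (0.2, 0.3)
```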
What to look for when checking digital images for quality:
One of the best examples of an imaging QC&A system is that of the Genealogical Society of Utah (http://www.lds.org). Quality control and assurance of the images is an integrated part of the GSU capture system and uses software specially developed for the programs. The volunteers examine each image and reject for skew, readability and color balance. If rejected, the image will be recaptured and noted in the log file for re-indexing.
When the images are sent to GSU from the projects, an audit program, again specially developed for the project, carries out further checks. A wizard sets up the rejection threshold and uses a random number generator to select the individual items for inclusion in the statistical sample. An auditor checks each image against twenty-four possible rejection criteria; if an image is rejected, the reason for rejection is noted in the log file. If three images are rejected, the audit program turns off and the whole batch must be re-imaged. Auditors are trained to use this system and to evaluate the images. Typically, they look at 150 images in a batch at their choice of speed, e.g. 1 sec per image. They also use histogram analysis and checksums to facilitate automatic QC&A. (A checksum is a value computed from a block of data and transmitted or stored along with it so that errors in transmission or storage can be detected.)
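The general shape of such an audit (a random sample, a log of rejection reasons, and a stop once the threshold is reached) can be sketched as follows. This is not the GSU software: the function name and the `inspect` callback, which stands in for the human auditor's judgment, are hypothetical. Only the sample size of 150 and the threshold of three rejections come from the description above.

```python
import random
from typing import Callable, Iterable, List, Optional, Tuple


def audit_batch(image_ids: Iterable[int],
                inspect: Callable[[int], Optional[str]],
                sample_size: int = 150,
                max_rejections: int = 3,
                seed: Optional[int] = None) -> Tuple[bool, List[Tuple[int, str]]]:
    """Audit a random sample of a batch.

    `inspect` returns a rejection reason for an image, or None if the
    image is acceptable. The audit stops as soon as `max_rejections`
    images have been rejected, and the batch is reported as failed.
    """
    pool = list(image_ids)
    rng = random.Random(seed)
    chosen = rng.sample(pool, min(sample_size, len(pool)))
    log: List[Tuple[int, str]] = []
    for image_id in chosen:
        reason = inspect(image_id)
        if reason is not None:
            log.append((image_id, reason))
            if len(log) >= max_rejections:
                return False, log
    return True, log


# Two extreme, hypothetical auditors: one accepts everything,
# one rejects everything for skew.
passed, clean_log = audit_batch(range(1000), lambda image_id: None, seed=1)
failed, reject_log = audit_batch(range(1000), lambda image_id: "skew", seed=1)
print(passed, len(clean_log))   # True 0
print(failed, len(reject_log))  # False 3
```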
The work environment has a significant impact on the QC&A of digital images.
You can control some QC&A environment factors with the following methods:
When digitizing text, the page images are subject to the same QC&A checks as for line art images, but further checks are required at the OCR and encoding stages. The range, scope and method of QC&A must be established and appropriate benchmarks set. If you are creating OCR texts for indexing purposes or batch processing texts, then an error rate of 0.5% may be acceptable (this is the Library of Congress’ NDLP threshold for batch processing by vendors), but if you are creating a scholarly edition, then nothing less than 100% accuracy may be required. OCR rarely produces results better than 99.9% accuracy, or one error in every 1,000 characters (roughly 10–12 lines). Double processing documents and checking each version against the other can speed up identification of errors, but frequently there is no substitute for having a second, or even a third, proofreader manually check the digital version against the original.
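Checking two independently produced versions against each other can be done with a character-level comparison. The sketch below uses Python's standard difflib module; the sample strings and function names are invented for illustration. Note the limitation: if both passes make the same mistake, the comparison will not reveal it, which is one reason manual proofreading remains necessary.

```python
import difflib
from typing import List, Tuple


def differing_spans(version_a: str, version_b: str) -> List[Tuple[str, str, str]]:
    """List the places where two versions of a text disagree.

    Each entry is (operation, text in version A, text in version B).
    """
    matcher = difflib.SequenceMatcher(None, version_a, version_b, autojunk=False)
    return [
        (tag, version_a[i1:i2], version_b[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


def error_rate_percent(errors: int, characters: int) -> float:
    """Errors expressed as a percentage of the characters checked."""
    return 100.0 * errors / characters


# Hypothetical output of two OCR passes over the same line.
pass_one = "The quick brown fox"
pass_two = "The qu1ck brown f0x"

for operation, text_a, text_b in differing_spans(pass_one, pass_two):
    print(operation, repr(text_a), repr(text_b))

# One error in every 1,000 characters corresponds to 99.9% accuracy.
print(error_rate_percent(1, 1000))
```

Each disagreement marks a position that a human editor should check against the original, since the comparison alone cannot say which version, if either, is correct.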
The Thesaurus Musicarum Latinarum (TML http://www.music.indiana.edu/tml/) at Indiana University employs one such text QC&A system. The quality control procedures in place for the digital deliverables involve at least three sets of complete checks on each text. The individual entering the data is expected to check and correct the text before printing it out and passing it to the person responsible for proofreading and checking. This second person proofs the printout and marks any errors identified. As the corrections are made to the electronic version the marks on the printout are counter-marked. Then, both the printed text with its marks and counter-marks and the electronic text are passed to a third person for review, prior to approval and addition to the database. Where there is a high error rate at any stage in the process, the text is printed once again and subjected to a second proofreading, as outlined just above. The final check and approval by the project director facilitates consistency in the quality control process. Lessons they have learned from this experience are that it has been difficult to persuade people to proofread character-by-character (rather than word-by-word) and to refrain from global search-and-replace editing. In general, the TML has discovered that no more than four double-spaced pages of 12-point text can be proofread per hour with an acceptable rate of accuracy.
With XML or SGML encoded texts the use of a parser to validate files against a DTD greatly assists the QC&A process. Remember, however, that while the coding can be declared well formed and valid, a parser would not pick up typographical errors in the content of the text, or an incorrect choice of tag from a set of structurally valid choices. To catch the latter kind of error, more sophisticated (and usually project-specific) tools are sometimes developed which allow a reviewer to look at overall patterns of tag usage and spot anomalies without having to review every file in detail. For large-scale encoding projects such tools may repay the cost of development.
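A very small version of such a tag-usage tool is sketched below using Python's standard library. Two assumptions should be stated plainly: the element names in the sample are invented, and the standard-library parser used here checks only that a file is well formed. It does not validate against a DTD, which requires a validating parser.

```python
import xml.etree.ElementTree as ET
from collections import Counter
from typing import Optional, Tuple


def check_well_formed(xml_text: str) -> Tuple[Optional[ET.Element], Optional[str]]:
    """Parse the text; return (root, None) on success or (None, error message)."""
    try:
        return ET.fromstring(xml_text), None
    except ET.ParseError as err:
        return None, str(err)


def tag_usage(root: ET.Element) -> Counter:
    """Count how often each element name occurs in the document."""
    return Counter(element.tag for element in root.iter())


# Hypothetical encoded texts: one well formed, one with a missing end tag.
good = "<text><div><head>Title</head><p>One</p><p>Two</p></div></text>"
broken = "<text><p>One</text>"

root, error = check_well_formed(good)
usage = tag_usage(root)
print(dict(usage))

_, broken_error = check_well_formed(broken)
print(broken_error is not None)  # True
```

Comparing such counts across files can draw a reviewer's attention to anomalies, for example a file with no headings at all or an element used far more often than in comparable texts, without reading every file in detail.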
Digitizing audio and video materials is a relatively new area of activity in the cultural heritage community, and QA in this area is newer still, so there is little documentation of QA practices for AV material in libraries and museums. The feasibility of these methods for a community that must weigh scarce labor resources against the need to protect the investment in digitization is thus unfortunately not yet well established in practice. However, some significant examples do exist.
The Library of Congress’ ‘Audio-Visual Prototyping Project’ specifies the following QA in a contract document intended for vendors who are handling the digitizing of LOC audio materials:
“Contractor quality review of audio shall include, but is not limited to, the following criteria:
Phase 2 of the European project, “Presto: preservation technology for audio and video” (http://presto.joanneum.ac.at/index.asp) includes the goal of automating AV quality control: “Implement automated quality control: audio and image analysis algorithm development, fitting of algorithms to the application, and cost / benefit analysis.” Indeed, Presto partners are developing tools that can, for instance, track errors that occur in transferring film to digital and log the timecode of the frame of the film in which the error occurs so that humans may conduct special quality control measures on that area of the file.
While work in this area begins in individual institutions and projects, it is clear that there is no consensus in the cultural heritage sector about what constitutes good practice for quality control and quality assurance of digital audio and video files (metadata is covered elsewhere). Standard QA methods such as random selection of sample files for inspection and playback by a human, as well as other QA methods developed for digital images, are highly recommended as the very minimum while more robust and yet cost-effective means are being tested on a broad scale in the cultural heritage sector.
QC&A of metadata is generally less well documented, but its accuracy is perhaps more important than that of the digital deliverables themselves. If users cannot find an object because of poorly checked metadata they will never know that you have meticulously checked and verified the audio, image, or text. There are even fewer guidelines for checking metadata quality than for images and texts, although for XML-encoded metadata the general principles for encoded text apply to a large extent. From among the projects in this Guide good practice indicates use of the following:
Good practice projects have checked metadata at the time of image checking, but many projects have very little metadata QC&A. It is recognized that metadata QC&A is not done once, as with images, but is an ongoing process. As with any QC&A, this should be considered in the resources both at the creation of the project and for the future.
The QC&A techniques presented in this section have focused largely on detailed questions of accuracy and the scrutiny of the project data. However, you should be careful not to neglect QC&A in other areas of the project. Steering or advisory groups, project plans, and flow charts can all perform an important role in assuring the quality of project management. Similarly, documented procedures for the archiving and preservation of digital material, such as the frequency of back-ups, rewinding archive tapes, and moving copies off-site, all contribute to the overall QC&A environment of a project. Finally, be sure to take a global view of the entire project and the complete product it is creating: how do the parts work together? Is the overall design successful? Does it meet your users’ needs? These larger evaluative questions require different methods from those sketched above, but are an important aspect of the overall assessment of the project’s success.