HATII interviewed Richard Laxman, Herbert J. White and Barry R. Urry from The Church of Jesus Christ of Latter-day Saints on January 29, 2001. As part of the Family and Church History Department, the Genealogical Society of Utah is responsible for keeping genealogical records and making them accessible to members of the Church. Digital imaging projects were initiated to increase the accessibility of the Department’s resources. Digitization was also intended to preserve the historical documents and artifacts while aiding research activities.
The Family and Church History Department of the Church of Jesus Christ of Latter-day Saints has a massive collection of records which come from archives, libraries, and record custodians from all over the world. The Genealogical Society of Utah (GSU) was founded in 1894 to gather genealogical records and assist members of The Church of Jesus Christ of Latter-day Saints in tracing their family histories. All the records in the collection have one area of commonality, that of genealogy, i.e. birth, marriage, death records and associated information.
Their imaging projects are organized from the first approach and negotiation through to the final product and index. The details of this organization are the same for microfilming projects as for digital imaging projects; only the representation format of the documents changes, along with the storage and archiving.
The selection strategy is based on acquiring the records from worldwide sources for the Family and Church History Department. The main associated drivers for the imaging projects include historical and cultural value, research significance, improved functionality, enhanced access, research into digitization strategies, preservation (e.g. against content loss), and the potential to reach disadvantaged groups across the globe.
This strategy will not change over time, as the organization has a single driver for collecting the information. The projects they work with have varied drivers for collecting and selecting material, but the GSU has one aim in all imaging projects.
The very nature of the imaging projects means that they collaborate regularly with a variety of organizations including archives, record offices, libraries, and religious organizations. This collaboration is at all levels. A more detailed description of how this collaboration is achieved is given in Project Management and Planning.
Each imaging project undertaken by the GSU follows these six stages:
This stage covers everything from authenticating the images (to ensure that the data is appropriate) to negotiating with the archives for permission to acquire the material and to enhance it (where increased readability is required). Enhancing the images raises the issue of copyright, which is agreed at the time of negotiation. The Family History Department negotiates the rights to reproduce, distribute, display, use and permit the use of the Digital Images, for nonprofit purposes, via any technology and in any medium. The goals at this stage are to define a win-win situation for all parties (record custodian, GSU, and patron), to provide better access and to establish good practice in managing imaging projects.
This stage is the technical process of capturing the digital images.
The subsequent stages of the project are described in the following subsections:
The Six Stages were designed by the in-house project management staff and are in place after many years of microfilming projects.
The Six Stages cover the process from the Records Custodian either approaching or being approached by the GSU to digitize the records, to the digital images being indexed and made available to patrons. All the processes are broken down to discrete units that are managed by the appropriate team from the GSU, e.g. digital imaging by the Field Services and Support Division and volunteer indexing by the Extraction Section.
The source material will vary widely from project to project and can range from handwritten documents to typescript text. The content is always consistent with the remit of the GSU.
Standards are vitally important to the GSU, not only in adhering to accepted standards but also in helping to establish new standards in the digital imaging arena. Staff are members of AIIM, the Association for Information and Image Management. AIIM is the standards developer in the United States responsible for writing standards for the document imaging industry. GSU staff, working in conjunction with AIIM and other volunteers who have an interest in document imaging, are in the process of identifying the areas where standards are required in this industry. Once written, the standards are forwarded to ANSI to be adopted as an American Standard, and to ISO via International Standards Technical Committee 171 (TC171), where the standard will be changed to reflect the needs of the international community and adopted as an International Standard. An example of such a standard is the recommended practice that the GSU is writing on metadata for document imaging. This document will include the Categories, Elements and Sub-Elements needed to describe a document that is being digitized and will be made available to a general audience. It will also include an XML DTD, which will be mapped to EAD, Dublin Core, and other de facto standards. The GSU uses standards to improve interoperability within projects and to further the research and development of preservation materials.
The GSU has built on the experience of its microfilming projects, which are still under way. As an integral part of the larger organization of the Church of Jesus Christ of Latter-day Saints, it has a whole organization behind it, with the support required to ensure that the digital program is successful.
The GSU has a large staff of specialized people working in different areas of the digital program. These core staff are supplemented by paid employees and teams of volunteers who work in a variety of areas in the digital imaging projects. These volunteers come from the Church as missionaries. Typically, they are retired couples who are assigned to the imaging projects. They can come from a variety of backgrounds not necessarily linked to digital imaging but, where possible, skills are targeted to appropriate areas. Becoming a missionary follows established procedures in the Church, after which the missionary is assigned to a specific area. It can be a two-way approach.
Volunteers are then given a two-week training program in digital imaging, which can be applied to any project regardless of the type of material. A volunteer will spend 12 to 24 months on a project. Volunteers fund their own accommodation, food and everyday expenses, but the GSU may help with travel where necessary.
This training is intensive and covers four main areas: document preparation, imaging, quality assurance of images and indexing. The main training is in using the digital camera that will be used in the projects. Any on-site training required is met by the on-site project manager. There is a regional project manager and an area manager.
The training schedule takes the volunteers from basic digital concepts and computer basics through to setting up the camera from scratch. The training manual is comprehensive and enables the volunteer to refer to it when on site. The GSU has found it simpler to train volunteers to capture images digitally than to use microfilm. The digital operator is now the evaluator, thus removing one link from the chain. They try to match volunteers’ skills with the skills required for the project, such as document preparation or imaging. The issues with the archive and how to deal with the materials are dealt with on-site. The GSU sends out the volunteers with training on the practical use of the digital camera and its associated software, and the skills are cemented on-site.
The volunteer management is part of the larger organization that drives the digital imaging projects. Having project managers at various levels throughout the world enables the volunteers to adapt to the project after the initial training, as support is available at all levels. Technical support comes from the nearest technical staff, e.g. Germany for Europe. At the moment, there is one digital project in Scotland. The basic structure was already in place from the microfilming process; it is the GSU’s desire to move to digitization as the future of imaging that has led it to adapt the process for digital imaging training.
The success of the volunteers rests on the organization as a whole and on the levels of management that are established from the start. Volunteers rarely have to stop working on a project, as the training ensures that they are able to work on all the processes in the projects.
They are constantly updating the training to meet the needs of the volunteers and to adapt to the changing technology.
Copyright issues can be a major factor in the imaging projects. These issues are identified and agreed at the negotiation stage of the process. The GSU very rarely holds the copyright to the materials. The negotiation process can be lengthy in order to achieve the goals required by the archive, the GSU and ultimately the user. These delicate negotiations are a vital part of the whole process. The Family and Church History Department eventually has to make the materials available to genealogists across the world, so the rights to distribute the information must be obtained.
Users will be able to view and possibly print and download a variety of objects, from straight indexes to JPG images of the digital material. There may be a minimal charge for this to cover costs.
The archivist on-site generally prepares the material and the volunteers who do the imaging have basic training in handling delicate materials. The material they work with will have been prepared by the curators where necessary. The GSU is occasionally involved, at the request of an archivist, in the preparation process, but usually only handles the objects in the digitization area.
The GSU has a digitization system in place for its external and internal projects. This is the capture, conversion or acquisition stage. In digital projects, they capture the image from original materials. The system includes the digital camera and the software that acquires the image and automatically creates metadata linked to the main catalog and index, which assists in retrieving the digital image and information from these systems. The imaging process also includes a simple and efficient method of quality assurance of the images.
The GSU has recently acquired another brand of digital camera, which has an 8 megapixel range. The camera set-up has daylight balanced fluorescent lighting for cool lights and baffles to protect the eyes of the camera operator. Each computer attached to a camera has three hard drives. One is used for the camera software. The other two are configured in a RAID format to rapidly store the images at capture time.
The camera has a three-pass RGB system to capture color images. Their software is specially written for the camera and for archival projects. They use color and grayscale targets to calibrate the camera. Typically they do a large-scale calibration once a week to ensure the LCD filter has not shifted, and check color charts at the start of each day. The lights have been an issue. They typically last 11,000 hours and can add to color shift, but the large-scale weekly calibration keeps this stable. Recent enhancements to the software provide automated color balance. Exposure is checked daily. The capture and save process takes 6 to 8 seconds for color images and 1 to 2 seconds for grayscale and bitonal images. If the image is deficient in any way, it is overwritten. The software allows the camera operator to recapture the image, automatically write over the unacceptable image, keep the image file name the same, and ensure the log file is not in error.
A log file to hold a limited amount of metadata is created automatically during capture. This file will hold up to eight or more elements to be defined by the operator and could include: the operator’s name, the filename, the camera details, etc. The file is automatically updated at each capture. The operator will have to occasionally enter the name of the next large segment of material, for example, a new register book.
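As an illustration of how such a log might behave, the following Python sketch keeps one record per image file, updates it automatically at each capture, and replaces the record when an image is recaptured under the same file name. It is not the GSU's software: the class name `CaptureLog`, its field names and the example file names are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class CaptureLog:
    """Hypothetical capture log: one record per image file name."""
    operator: str
    camera: str
    segment: str = ""  # e.g. the register book currently being imaged
    entries: dict = field(default_factory=dict)  # file name -> record

    def start_segment(self, name: str) -> None:
        """Entered by the operator when a new large segment begins."""
        self.segment = name

    def record_capture(self, filename: str) -> dict:
        """Called at every capture. A recapture with the same file name
        replaces the earlier record, so the log never lists a file twice."""
        record = {
            "filename": filename,
            "operator": self.operator,
            "camera": self.camera,
            "segment": self.segment,
        }
        self.entries[filename] = record
        return record


log = CaptureLog(operator="Operator A", camera="Camera 1")
log.start_segment("Register book 1")
log.record_capture("000001.tif")
log.record_capture("000001.tif")  # recapture overwrites the earlier record
log.start_segment("Register book 2")
log.record_capture("000002.tif")
```

Keying the records by file name is one simple way to keep the log consistent when a deficient image is written over.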
The Family and Church History Department has begun to develop metadata categories for the digital images. This development is crucial to the longevity of the images. They are researching various categories of metadata, including discovery metadata, preservation metadata, rights and security metadata and representation information metadata. They have evaluated Dublin Core as well as many other systems, and favor Dublin Core for the discovery and description category. The breakdown of their metadata includes:
Each metadata element is defined using a set of attributes. Some of the definitions come from the ISO/IEC 11179 standard for the description of data elements. Others come from the Family and Church History Department.
The metadata is the key to the index and catalog that enable users to conduct genealogical searches. The GSU has an extraction program which, like the capture programs, is staffed by volunteers. These volunteers extract the genealogical data and indexes from the digital images.
The catalog to hold the extracted information has been developed specifically for the GSU by specialized developers. They are also working closely with Brigham Young University (BYU) to develop sophisticated techniques to assist with the extraction. This can be extremely complex due to the nature of the images, i.e. handwriting and language. The catalog will eventually also be linked to the images for access.
As explained above, the capture stage is controlled by automating the process as far as possible, by using the same system for all capture projects, and by constant research into the latest technology for digitization.
The images produced from the capture process are TIFF files. In one project they produce three images: one full color uncompressed TIFF for archive, one full color JPEG for the patron and one grayscale image for the GSU. The JPEG is for access purposes. The capture format standard for the GSU is uncompressed grayscale, since all of the information on a given document is legible on a grayscale image and the file size is smaller. The image files are linked to the metadata file, which is created during the capture process. Typically a color file will be 14 to 18 MB uncompressed, a grayscale file 6 MB uncompressed, and a bitonal file 150 to 250 KB compressed. The patron will have a compressed image of sufficient quality for their purposes, and the GSU has its images in grayscale or bitonal where the quality is adequate.
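The relationship between bit depth and file size can be illustrated with a short calculation. The Python sketch below computes the raw size of an uncompressed image as width times height times bits per pixel, ignoring file headers. The frame size of 3000 x 2000 pixels is an assumption chosen for illustration; the interview does not give the pixel dimensions of the captured images. MB here means one million bytes.

```python
def uncompressed_bytes(width_px: int, height_px: int, bits_per_pixel: int) -> int:
    """Raw image payload in bytes, ignoring file headers and compression."""
    return width_px * height_px * bits_per_pixel // 8


# Hypothetical frame size (not stated in the interview).
WIDTH, HEIGHT = 3000, 2000

# Bits per pixel: 1 for bitonal, 8 for grayscale, 24 for three-channel color.
for label, bits in (("bitonal", 1), ("grayscale", 8), ("color", 24)):
    size_mb = uncompressed_bytes(WIDTH, HEIGHT, bits) / 1_000_000
    print(f"{label:9s} {size_mb:6.2f} MB uncompressed")
```

For this assumed frame the calculation gives 6 MB for grayscale and 18 MB for color, a threefold difference that is consistent with the typical sizes quoted above and shows why grayscale is attractive as a capture standard.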
To determine the correct resolution, the GSU has established a standard for the capture of the characters on the document. The process is known as Pixels per Line Segment or PPLS. Prior to capture, the operator will find the thinnest, lowest contrast character represented within the documents to be captured. Tests will be made on the line stroke of the character to obtain 3 - 4 pixels in the grayscale format and 2 - 3 pixels in the bitonal format. Through this process the operator can be assured that all characters with line segments at that level or above will be captured and made readable.
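The PPLS idea can be expressed as a simple calculation: the capture resolution must be high enough that the thinnest stroke spans the target number of pixels. The Python sketch below is one way to work this out and is not the GSU's own procedure, which relies on test captures. The function name and the example stroke width of 0.254 mm (one hundredth of an inch) are assumptions for illustration.

```python
MM_PER_INCH = 25.4

# Target pixels across the thinnest stroke, as (minimum, maximum), per the text above.
PPLS_TARGETS = {"grayscale": (3, 4), "bitonal": (2, 3)}


def required_ppi(stroke_width_mm: float, pixels_across_stroke: float) -> float:
    """Pixels per inch needed so a stroke of the given width spans
    the requested number of pixels."""
    if stroke_width_mm <= 0:
        raise ValueError("stroke width must be positive")
    return pixels_across_stroke * MM_PER_INCH / stroke_width_mm


# Hypothetical thinnest stroke: 0.254 mm wide.
stroke_mm = 0.254
for mode, (low, high) in PPLS_TARGETS.items():
    print(mode, round(required_ppi(stroke_mm, low)), "to",
          round(required_ppi(stroke_mm, high)), "ppi")
```

With this assumed stroke width, three pixels across the stroke corresponds to roughly 300 ppi. Thinner strokes push the required resolution up proportionally.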
Some captured images require enhancement to make the data extractable, by virtue of the type of documents being acquired, such as documents that are 100 years old or older and have aging blemishes. It is in this area that the GSU is working closely with BYU to develop algorithms to enhance the images. They are using technology to clean up fuzzy typescript, to de-fog and to remove image distortion. Interestingly, they are also researching the possibility of OCR for handwriting. The computer scientists at BYU assisted in the development of the lasso tool for Adobe Photoshop. It is technology such as this that is being developed to enhance old handwriting in documents with a large noise factor. The goal of the Family and Church History Department is to produce these types of technology and then place them in the public domain for other archives and libraries to use to make their documents readable. See http://python.cs.byu/gendocs/gendocs.html for more information on the algorithms used for the image enhancement.
Quality assurance of the images is an integrated part of the capture system and uses software developed for the programs. The camera operator will examine each image and reject for skew, readability and color and lighting balance. If rejected, the image will be recaptured. The defective image will be written over and noted in the logfile or indexing.
When the images are sent to the GSU from the projects, an audit program further ensures quality. This audit program is once again specially developed for the process. A wizard sets up the rejection threshold based on the number of images in the sample and uses a statistical sample created by a random number generator. An auditor checks each random image. There are currently 24 reject criteria, which are identified by codes. Once an image is rejected, the code(s) are associated with the image file name as part of the image log file. If the maximum rejection number, which is based on the number of images in the process and the established rejection thresholds, is reached, the audit program turns off and the whole batch must be re-imaged. Auditors are trained to use this system and to evaluate the images. Typically, they look at 150 images in a batch at their choice of speed, e.g. one second per image. The GSU is evaluating future auditing processes that will use histogram analysis as well as checksums to automate the audit process.
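The audit logic described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the GSU's audit program: the function name, the `inspect` callback (standing in for the human auditor) and the reject code `"SKEW"` are hypothetical, and the maximum number of rejects is passed in directly because the interview does not say how the wizard derives it.

```python
import random


def audit_batch(filenames, sample_size, max_rejects, inspect, seed=None):
    """Audit a random sample of a batch.

    inspect(filename) returns a list of reject codes, empty if the image
    is acceptable. The audit stops as soon as the number of rejected
    images reaches max_rejects, in which case the whole batch fails.
    """
    rng = random.Random(seed)
    sample = rng.sample(filenames, min(sample_size, len(filenames)))
    rejected = {}  # file name -> reject codes, as would be noted in the log
    for name in sample:
        codes = inspect(name)
        if codes:
            rejected[name] = codes
            if len(rejected) >= max_rejects:
                return {"passed": False, "rejected": rejected}
    return {"passed": True, "rejected": rejected}


batch = [f"{i:06d}.tif" for i in range(1000)]

# Stand-in auditors: one accepts everything, one flags every image as skewed.
clean = audit_batch(batch, sample_size=150, max_rejects=3,
                    inspect=lambda name: [], seed=1)
skewed = audit_batch(batch, sample_size=150, max_rejects=3,
                     inspect=lambda name: ["SKEW"], seed=1)
print(clean["passed"], skewed["passed"])
```

Stopping at the first point where the threshold is reached mirrors the behaviour described above, where the audit program turns off and the batch is sent back for re-imaging.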
Evaluations with users are handled by a different area of the Family History Department and are ongoing as the technology develops.
Preservation is crucial to the GSU: its data and information must be available for as long as possible and be as accessible as possible. Currently, they use DLT tape and spinning discs for the storage of the digital materials. The tape solution is used because it enables the material to come directly from the field. Backup tapes are stored off-site in the Granite Mountain Records Vaults, a secure, climate-controlled underground storage facility in Utah. The index is the key to the tape access. Once again, they are working closely with BYU to explore digital storage and long-term preservation.
Norsam (http://www.norsam.com) offers the technology that currently seems to hold the most promise for long-term storage of digital materials. This technology etches the information onto the disc using ion beams, so that it is visible to the human eye with the use of a powerful microscope. These discs can hold up to 17,000 images etched on a two-inch square plate and 165 GB of data, compared to 4.7 GB for a DVD or 0.6 GB for a CD.
It is not only the technology that ensures the longevity of the materials; it is also the migration of hardware, software, media and people skills. The GSU is constantly reviewing its management procedures to ensure that no information becomes inaccessible because hardware, software, media or even skills have been lost.
The work with AIIM, ISO and ANSI is assisting these goals through the creation of GEDCOM and an XML DTD, extracted from the EAD, written to tackle the issues of digital data capture and information retrieval. The extraction program and index will rely on this, as will the metadata at the capture stage.
Access to the digital materials is covered in three areas: the index to the images, the catalog and research guidance. The index is linked to the catalog, which in turn will lead to the digital versions of the records. The GSU plans to have various levels of access: worldwide access, the internet and intranets, through libraries and homes, through electronic media distribution and also through human-readable media distribution. The goals are to have what you want, when you want it, where you want it, in whatever format you want it. The main drivers are improved readability through their work on image enhancement, viewability and accessibility. At the moment any charges cover the cost of distribution. Access to the materials online may or may not remove this cost. However, if users want hard copies or copies on CD, for example, then a cost recovery charge will be made. It is the nature of the materials and the nature of the digital program that drives this large-scale accessibility.