On January 24, 2001, HATII interviewed Douglas Greenberg, CEO of the Survivors of the Shoah Visual History Foundation (VHF); Sam Gustman, Executive Director of Technology; Karen Jungblut, Director of Cataloging; and Kim Simon, Director of Archival Access and Community Relations. The aim of the VHF was to record and preserve the testimonies of Holocaust survivors, and to this end it conducted thousands of interviews every year until the collection of over 50,000 testimonies was complete. The Foundation now has the substantial task of cataloging, storing and disseminating the video-recorded material, in order to make it available as a crucial resource for education and research while preserving it as a principal source of historical evidence. Beyond these functions, the very process of digitizing so vast a collection has prompted the development of new technologies in the field.
The Survivors of the Shoah Visual History Foundation is a unique organization in that it has one area of collection and one desired objective which together form its single raison d’être.
To understand the process and its details, it is necessary to describe the impetus that led to the creation of the VHF. In 1994, after filming Schindler’s List, the film director Steven Spielberg was approached so frequently by survivors wanting to tell their stories of the Holocaust that he resolved to record as many of these testimonies as possible. Thus the VHF was founded with the remit to interview, record on video and digitize 50,000 unedited testimonies from survivors all over the world. This mammoth task was undertaken with extreme commitment and vigor by the dedicated staff, and in four years the 50,000 testimonies were recorded; indeed, at the time of this visit, the total was just under 52,000. To give an idea of the amount of material, it would take over 13 years to watch all the interviews.
There was no collection survey as such, merely the remit to interview and record the survivors’ testimonies, to archive them and to make them accessible to diverse communities. The survivors come from all over the world, representing 57 countries and 32 different languages. All survivors are interviewed in their preferred language.
The survivors encompass the range of people persecuted in the period 1933-1945 and include:
The organizational policy is to take these recorded interviews and store them digitally as well as in analog form to allow preservation and retrieval of this unique collection.
From the beginning of the organization, the strategic thinking was to devise a system that would enable this policy to be put into practice. The system was conceived from the outset, and this is one of many reasons why the Foundation has achieved its main goals so far. It was designed to cover every stage in the collection, storage and retrieval of the material, so that each step is taken within the larger framework.
The target audience is wide and varied, covering all levels of education from K-12 to graduate study. The driver is of course the unique material and the format in which it is presented. While there are archives of textual materials from survivors of the Holocaust, the VHF is unique in that the testimonies are spoken by the individuals themselves and thus heard and seen by the observer. This was the principal motivation behind the archive: to enable future generations to see and listen to the testimonies of those who survived the Holocaust.
It is not really possible to say whether this collection is a sample or the whole. It can never be the whole, as such an undertaking would be overwhelming. However, it is such a large and expansive archive, devoted to a single subject, that to call it a sample does not do it justice. As the Foundation has only one type of source and is unique in its format, it has not set out to be interoperable with other catalogs and archives. The archive is intended to be searched by users at particular centers around the world and is not intended, for instance, to be linked to a larger system of museums. In creating the catalog, however, the Foundation consulted the Library of Congress about thesaurus and name authority files. Due to the nature of the material, many of the keyword, thesaurus and naming conventions of the catalog had to be created specially to make classification possible. The NISO Z39.19 Standard for Structure and Organization of Information Retrieval Thesauri has been used for the catalog and for the retrieval of information from the testimonies. No standards have been used for the processes themselves, because none existed for the work the Foundation wished to do.
The priorities for collection and digitization are many, with access to the materials and their historical and cultural value being the main criteria. Education (teaching and learning) and research are also high priorities, with the Foundation co-operating with local educational organizations to use the materials in syllabi that educate the current generation about the events of the Holocaust. Other criteria include:
The primary source nature of the archives makes this a unique resource.
The project management for VHF is completely in-house. As the project and organization were conceived as a whole, the structure has been in place since the beginning.
A board of directors helps guide the institution. The CEO oversees the various departments and all departmental heads direct their staff accordingly.
At the time of the visit, cataloging the interviews was the main task, with half of the staff and 30-40% of the budget devoted to this area.
Each department head manages his or her department in accordance with goals and timetables. Weekly tracking of cataloging progress reveals any areas that can be improved and identifies incentives for staff to catalog more quickly.
There was no pilot project as such, but careful planning and development of the system took place from the outset of the project. In 1994, the interview scheduling system was created. The digitization system was completed in 1996 and the catalog was developed in 1995-98. The networks for distribution have been developed throughout 2001. The organization follows industry project management procedures and has a top-down approach.
All the digitization of the material is carried out in-house, as are all of the processes apart from the interviews, which are naturally conducted in the survivors’ homes.
The personnel in VHF are an extremely dedicated group of people. There are 150 staff covering areas including interviewing, cataloging, fundraising, information systems, shipping, translation and outreach. These staff come from a wide variety of backgrounds: the catalogers are librarians, archivists or historians; the interviewers come from a similarly wide range and include survivors themselves, historians and relatives of survivors. The staff employed in the day-to-day running of the Foundation, such as shipping and information systems, are all skilled and trained in their respective areas. This is one of the strengths of the VHF; much of its success derives from the dedication of its highly committed staff, many of whom were identified as key players in their fields and asked to become part of the VHF. Employing staff who already have particular skills and bring their understanding and experience to their role in the process, rather than re-deploying and re-training existing staff, has been an immensely successful strategy.
Interviewers go through a rigorous training process to help them and the VHF assess whether they are suited to the task. The content of the interviews makes the process emotionally difficult, and not everyone is able to cope with it. The interviewers and video recordists are trained in interview techniques and are also counseled to deal with the emotional aspect of the task. Prior to the interview, each interviewee fills in a Pre-Interview Questionnaire (PIQ), which covers the details and facts of the testimony, such as names, places and dates. The PIQ has a twofold purpose: to help the interviewer understand the survivor’s testimony more fully, and to serve as the basis for the retrieval and cataloging of the finished recordings.
The catalogers are trained in the use of the catalog, both practically and semantically.
Most of the training is done in-house by the appropriate team. Catalogers have the difficult task of listening to testimonies in detail day after day, and the VHF has counselors and advisers available for them to speak to should they become overwhelmed by the details of the testimonies.
The copyright of the materials from the interviews belongs to the VHF. Each interviewee signs a release that gives the Foundation the rights to distribute the testimonies for educational purposes. However, copyright becomes an issue, since the Foundation has a duty to protect the testimonies from misuse and misrepresentation. To this end they have identified four categories of users:
Foundation authenticated – these are early requesters who are using the archive within the Foundation itself. They are known to the VHF which has approved their use of the materials for research.
Location authenticated, where the VHF knows where and who the users are through the interfaces at identified museums. There are five initial repositories:
These users would have pre-requested access for research in their discipline.
Location unauthenticated, when the VHF knows the location but not the identity of the user. This may be through an exhibit in a museum or restricted use of the catalog and finding aids.
Globally unauthenticated, potentially via the internet but as yet this is not a reality and much work will have to be done on the authenticated users to understand how the materials are used and how they should be accessed.
All users have to apply to use the archive and sign a non-disclosure document before being granted a license to publish.
The VHF will work on the authenticated users before looking at the unauthenticated possibilities to ensure that any procedure works. This will also enable the evaluation of the interface and finding aids and catalog. They have to balance the freedom of information against the integrity of content. They have a large legal department working on documents to ensure that the testimonies are protected from misuse.
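The four user categories can be read as combinations of two questions: is the identity of the user known to the VHF, and from where is the archive being reached? The Python sketch below is purely illustrative and is not the Foundation's software. The names `Site`, `UserCategory` and `categorize` are hypothetical, and the handling of an identified user at an unknown location, a case the interview does not describe, is an assumption.

```python
from enum import Enum


class Site(Enum):
    FOUNDATION = "foundation"    # on the VHF premises
    REPOSITORY = "repository"    # one of the identified partner sites
    UNKNOWN = "unknown"          # for example, the open internet


class UserCategory(Enum):
    FOUNDATION_AUTHENTICATED = 1
    LOCATION_AUTHENTICATED = 2
    LOCATION_UNAUTHENTICATED = 3
    GLOBALLY_UNAUTHENTICATED = 4


def categorize(identity_known: bool, site: Site) -> UserCategory:
    """Map (identity known?, access site) to one of the four categories."""
    if identity_known and site is Site.FOUNDATION:
        return UserCategory.FOUNDATION_AUTHENTICATED
    if identity_known and site is Site.REPOSITORY:
        return UserCategory.LOCATION_AUTHENTICATED
    if not identity_known and site is not Site.UNKNOWN:
        return UserCategory.LOCATION_UNAUTHENTICATED
    if not identity_known:
        return UserCategory.GLOBALLY_UNAUTHENTICATED
    # Assumption: an identified user at an unknown site is not described
    # in the interview, so this sketch refuses to classify that case.
    raise ValueError("combination not covered by the four categories")
```

Framing the categories this way makes the gap explicit: the fourth category is the only one in which neither the person nor the place is known, which is why it is the last to be attempted.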
The catalog for the archive is the key to accessing the materials. The remit of the Foundation was to record the interviews on video so that the survivors could be seen as well as heard. Unlike many oral histories, these resources are not transcribed for searching and retrieval.
The PIQ is the first step to accessing the materials, enabling users to search for names, dates and places at a metalevel and so narrow their focus to a selection of the interviews. However, the final goal is to allow users to search for keywords and phrases and then see the clips from the interviews that relate to each search term. The Library of Congress Subject Headings were too broad for this purpose; where possible, however, national standards are used and adapted where necessary.
A team of historians decided on the information for the short form PIQ. The completed questionnaires have to be translated by the translation department from the language in which they were filled in, so that, as the initial finding aid, they can be used first by the English-speaking community; eventually the catalog and finding aids should be accessible to users in all 32 languages. The thesaurus used in the catalog is monolingual but has to be expanded to 32 languages.
Keyword access is achieved by dividing each testimony into discrete segments, identified by the cataloger. A segment can last anything from one to twelve minutes, although the average is around 3.52 minutes. Testimonies vary in length from one to seventeen hours, with the average at two and a half hours. The segments are marked on the digital version, so each has a start and an end marker. Each segment is a discrete item of information, such as arrival at a concentration camp or the description of a person. The cataloger then associates the segment with keywords that describe it. The list of keywords is content driven, and the catalog grows as new keywords are discovered. The catalog department has teams of catalogers and researchers who review the testimonies and run quality assurance checks on the cataloged testimonies. The researchers also oversee and investigate new keywords that may have to be added to the catalog when a testimony brings to light an as yet uncataloged idea or description. Weekly editorial boards look at the methodology and the vocabulary, and group meetings can also help shape the methodology. The cataloger proposes an addition, which is researched and then authenticated. All updates are logged in a database to track changes. There are approximately 21,000 keywords, which are related to each other.
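The segment-and-keyword approach described here can be pictured as an inverted index from keyword to marked segments. The following is a minimal Python sketch under that assumption, not the VHF's own catalog software; the class names, the testimony identifiers and the sample keywords are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    """A cataloger-marked span of one testimony."""
    testimony_id: str
    start_seconds: float
    end_seconds: float
    keywords: frozenset


class SegmentIndex:
    """Inverted index from keyword to the segments tagged with it."""

    def __init__(self):
        self._by_keyword = defaultdict(list)

    def add(self, testimony_id, start_seconds, end_seconds, keywords):
        if end_seconds <= start_seconds:
            raise ValueError("end marker must follow start marker")
        # Keywords are stored in lower case so that searches ignore case.
        segment = Segment(testimony_id, start_seconds, end_seconds,
                          frozenset(k.lower() for k in keywords))
        for keyword in segment.keywords:
            self._by_keyword[keyword].append(segment)
        return segment

    def search(self, *keywords):
        """Return segments tagged with every requested keyword."""
        wanted = {k.lower() for k in keywords}
        if not wanted:
            return []
        first = next(iter(wanted))
        candidates = self._by_keyword.get(first, [])
        return [s for s in candidates if wanted <= s.keywords]


index = SegmentIndex()
index.add("T-0001", 0, 210, ["family", "school"])
index.add("T-0001", 210, 450, ["emigration"])
index.add("T-0002", 60, 300, ["school"])
for hit in index.search("school"):
    print(hit.testimony_id, hit.start_seconds, hit.end_seconds)
```

Because each hit carries its own start and end markers, a player could jump straight to the relevant clip without any transcript, which is the point of cataloging at segment level.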
At the time of this visit 4,146 testimonies were fully cataloged and 42,682 short form PIQs were completed.
The segmentation marks on the digital video are associated with the original object through the metadata. The metadata scheme for the digitized video is large and complex and, while various standards were considered, the VHF has created its own system to record the digital process, format, catalog link and administrative details. The metadata is populated automatically, and the barcoding of the tapes underlies the various formats the video will take.
The interviews arrive in Beta SP format and are then entered into the index that organizes the tapes. Each tape is barcoded to associate it with the PIQ and thus with the interviewee and interviewer. The tape is then copied to Digital Betacam for preservation, to VHS for in-house use, to VHS (or another appropriate video format) for the survivor’s copy, and to an MPEG-1 file encoded at 3 Mbit per second. Each hour of video takes approximately 1.5GB and the average interview is 2.5 hours long. Compression using MPEG enables quicker access and reduces storage.
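The storage figures quoted here agree with each other only if the MPEG-1 rate is read as 3 megabits, not megabytes, per second. The short Python calculation below checks that reading and estimates the size of the whole digital archive. It is a back-of-envelope sketch: decimal units are assumed, the function names are hypothetical, and the count of 52,000 testimonies is the approximate total given elsewhere in the interview.

```python
def gigabytes_per_hour(megabits_per_second):
    """Convert a video bitrate in Mbit/s to decimal gigabytes per hour."""
    megabytes_per_second = megabits_per_second / 8
    return megabytes_per_second * 3600 / 1000


def archive_terabytes(testimonies, average_hours, gb_per_hour):
    """Estimate total storage in decimal terabytes."""
    return testimonies * average_hours * gb_per_hour / 1000


video_only = gigabytes_per_hour(3)
total = archive_terabytes(52_000, 2.5, 1.5)
print(f"{video_only:.2f} GB per hour at 3 Mbit/s")
print(f"{total:.0f} TB for 52,000 testimonies at 1.5 GB per hour")
```

At 3 Mbit per second the stream comes to 1.35 GB per hour, close to the quoted 1.5GB. At 1.5GB per hour and 2.5 hours per interview, the MPEG-1 copies come to a little under 200 TB in total, the same order of magnitude as the 180 Terabyte disc server mentioned later in the interview.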
This process takes place automatically in the video transfer room, where 16 machines can copy simultaneously to four different video formats as well as the digital version.
In some interviews, survivors show objects to the camera. These are extracted from the digital video and stored as JPEG images attached to the MPEG version. No text is associated, as the interviews are not transcribed.
Users will see the MPEG format in the user interface.
Preservation copies are made and stored in separate temperature controlled vaults in California and New York. This double preservation ensures the longevity of the tapes.
The system for storage and retrieval is simple in concept but complex in structure. The user accesses a local cache at the repository for instant retrieval. Each cache is 1 Terabyte. If the information is not there, then the disc server (180 Terabytes) at the VHF is accessed; if the information is held on the actual server, then retrieval is almost instantaneous. The final step is for the information to be found in the tape storage and uploaded to the server and local cache. This is done automatically by a robotic arm that locates, identifies and loads the desired information, taking five to ten minutes. It is hoped that the local caches will hold the most commonly used data and thus the user will not have to wait.
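The three-step lookup can be sketched as a tiered store in which each miss falls through to a slower tier and the retrieved item is copied back to the faster ones. This Python sketch is illustrative only: the class and clip names are hypothetical, plain dictionaries stand in for the cache, the disc server and the robotic tape library, and cache eviction is left out. Copying a disc server hit into the local cache is an assumption; the interview states only that tape retrievals are uploaded to the server and the local cache.

```python
class TieredStore:
    """Look up a clip in the local cache, then the disc server, then tape."""

    def __init__(self, tape_contents):
        self.local_cache = {}                  # fastest, smallest tier
        self.disc_server = {}                  # central server at the VHF
        self.tape = dict(tape_contents)        # stands in for the tape robot

    def fetch(self, clip_id):
        """Return (data, tier found in), promoting the clip on the way."""
        if clip_id in self.local_cache:
            return self.local_cache[clip_id], "local cache"
        if clip_id in self.disc_server:
            data = self.disc_server[clip_id]
            self.local_cache[clip_id] = data   # assumed promotion step
            return data, "disc server"
        if clip_id in self.tape:
            data = self.tape[clip_id]          # slow path: minutes, not seconds
            self.disc_server[clip_id] = data
            self.local_cache[clip_id] = data
            return data, "tape"
        raise KeyError(clip_id)


store = TieredStore({"clip-1": b"mpeg data"})
print(store.fetch("clip-1")[1])   # first request falls through to tape
print(store.fetch("clip-1")[1])   # repeat request is served locally
```

The design rewards repeated use: once any user at a repository has waited for a tape load, later requests for the same clip are served from the faster tiers.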
The user interface is at the beginning of its development and so evaluation of end users and interface functionality is just starting. There is an online survey for the interface at the five repositories and an analog one in South America and Europe to discover who the users are, how they want to access the materials and for what purpose. The VHF is aware that it cannot meet the demands of every potential user and that it has to protect the integrity of the material at all times. It is exploring how to meet its mission and at the same time serve public demand, which has raised large ethical considerations as well as issues of academic freedom. These have yet to be fully resolved. As the interface develops and the testimonies are made available at the repositories, this research will continue to identify and develop the authentication required for user access at various levels.
The Foundation has designed products with the intention to raise awareness and to help dissemination as well as creating educational tools. It has produced three documentaries and one interactive CD-ROM based on the archive, in response to users who wish to start accessing the material. The CD is targeted at educational organizations, in particular K-12. It contains four testimonies from survivors intermeshed with historical facts, timelines and maps.
The Foundation is a 501(c)(3) public charity. The initial funding of $45 million covered the production and collection of the interviews. The Foundation is now on the second phase and requires a further $50 million to continue collection and fully catalog the archive. At the moment, the Foundation has an annual running cost of $13.5 million. All the funding is from charitable donations.
The second phase is to ensure the longevity, retrieval and dissemination of the materials. This future production has been planned since the start of the Foundation and covers the areas of security and authentication. The Foundation is researching WANs (OC-3, for retrieval at roughly 150 Mbit per second). Internet2 is a possibility, as is ATDnet. This is leading towards a global infrastructure of copies of the testimonies, with the entire archive online. The Foundation will also continue to produce CD-ROMs, documentaries and other mass media products. As well as the global infrastructure, it will examine the use of one-off productions for standalone machines, perhaps geared to a specific area of research.
The Survivors of the Shoah Visual History Foundation is indeed an exemplary project that has devised and operated a simple but efficient system to collect, store, archive and catalog 52,000 testimonies from all over the world. It has a major remit and, as such, has had a massive budget to enable these systems and organizations to be realized. Much of its practice cannot be applied to smaller museums or archive projects, as the level of funding required to produce such sophisticated systems and technology will not be available to them. Nevertheless, there are basic principles that can be applied to any digitization project: