New York University Libraries Resources.
 

Page last updated: November, 2003

Headings Clean-up in BobCat

What are we trying to do?

We are using authority records provided by a vendor (OCLC/WLN, Inc.) to standardize headings in BobCat. All the incorrect variant forms of a heading will be replaced by the standard, authorized, correct form. Legitimate variant forms (cross-references) will point to the correct form.

What do we mean by headings?

Headings are the entries in BobCat for authors, subjects, series, author-titles (uniform titles such as Twain, Mark. Adventures of Huckleberry Finn), names of governments, institutions, corporate bodies, etc.

How many headings are being corrected?

There are over 2 million headings in BobCat. Authority processing should merge several hundred thousand of the variant, incorrect headings into authorized headings.

Why are there so many different headings for the same name in BobCat?

BobCat is very literal--the slightest variation in spelling, spacing, punctuation or diacritic marking creates a new heading. BobCat's 1.6 million cataloging records have come from a variety of sources over the past 20 years--some more reliable than others. Some variants are due to changing cataloging rules over the years. Our efforts to quickly convert almost one million manual records in the 1980s did not allow us to carefully review and update all headings.
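To make the "literal" behavior concrete, here is a minimal sketch in Python. It is not BobCat's actual indexing logic. The function names literal_key and normalized_key are hypothetical, and the third sample heading (with a doubled space) is made up for illustration. The sketch only shows how headings that differ in punctuation or spacing count as separate entries under a literal comparison, and as one entry once those differences are ignored.

```python
import re
import unicodedata

def literal_key(heading: str) -> str:
    """A literal catalog treats the exact heading string as the key."""
    return heading

def normalized_key(heading: str) -> str:
    """Hypothetical match key: ignore case, diacritics, punctuation and extra spaces."""
    decomposed = unicodedata.normalize("NFKD", heading)
    # Drop combining marks (diacritics) left behind by the decomposition.
    no_marks = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Replace punctuation with spaces, then collapse runs of whitespace.
    no_punct = re.sub(r"[^\w\s]", " ", no_marks)
    return " ".join(no_punct.lower().split())

variants = [
    "Abbott, Claude Coleer, 1889-",
    "Abbott, Claude Coleer. 1889-",
    "Abbott,  Claude Coleer, 1889-",  # made-up spacing variant
]

literal_count = len({literal_key(v) for v in variants})
normalized_count = len({normalized_key(v) for v in variants})
print(literal_count, normalized_count)
```

A key like this is deliberately loose. It would also merge headings that should stay distinct, which is one reason near-matches still need human review.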

How will the headings be corrected?

It will take two successive record loads to eliminate incorrect headings and provide legitimate cross-references to correct headings. The first load of over 350,000 "pseudo-authority" records will temporarily turn many of the incorrect, variant forms from headings into "cross-references" to the correct form. As an example, BobCat has three variations on the authorized heading Abbott, Claude Coleer, 1889-1971:

Abbott, Claude Coleer
Abbott, Claude Coleer. 1889-
Abbott, Claude Coleer, 1889-

These variants first become "cross-references" to the correct form. As part of further processing, these temporary "cross-references" will disappear and there will be just one, correct heading. (As of this writing, the first variant has not yet turned into a "cross-reference" because the authority records are not being loaded in strict alphabetical order.) The second load of over 728,000 "true" authority records will provide legitimate cross-references to an authorized heading. We have, for instance, the cross-reference: Abbad Rios, Francisco [See] Abbad-Jaime de Aragon Rios, Francisco
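The two loads can be pictured with a small Python sketch. This is a simplified in-memory model, not the Geac/ADVANCE record structure or the vendor's load program. The dictionary and function names are hypothetical, and the headings are the examples given above.

```python
AUTHORIZED = "Abbott, Claude Coleer, 1889-1971"

# Load 1: "pseudo-authority" records point incorrect variants at the authorized form.
pseudo_authority = {
    "Abbott, Claude Coleer": AUTHORIZED,
    "Abbott, Claude Coleer. 1889-": AUTHORIZED,
    "Abbott, Claude Coleer, 1889-": AUTHORIZED,
}

# Load 2: "true" authority records carry legitimate cross-references.
true_authority = {
    "Abbad Rios, Francisco": "Abbad-Jaime de Aragon Rios, Francisco",
}

def merge_headings(catalog_headings, pseudo):
    """Replace each incorrect variant with its authorized heading, dropping duplicates."""
    merged = []
    for heading in catalog_headings:
        target = pseudo.get(heading, heading)
        if target not in merged:
            merged.append(target)
    return merged

def see_reference(term, references):
    """Return the heading a legitimate cross-reference points to, or None."""
    return references.get(term)

before = [
    "Abbott, Claude Coleer",
    "Abbott, Claude Coleer. 1889-",
    "Abbott, Claude Coleer, 1889-",
    "Abbott, Claude Coleer, 1889-1971",
]
after = merge_headings(before, pseudo_authority)
print(after)
print(see_reference("Abbad Rios, Francisco", true_authority))
```

The difference between the two tables is the point of the sketch. Entries in the first are temporary and vanish once the variants are merged, while entries in the second remain as permanent "See" references.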

This sounds complicated. Is this how other libraries are doing it?

Yes, it is complicated, and no, we are doing our headings clean-up differently from other libraries. The structure of the Geac database, with its separate but linked authority file, allowed us to send out just a file of our headings rather than a copy of each entire bibliographic record. If we had sent out the entire bibliographic records, we would have had to "freeze" the database and suspend all catalog maintenance activity until the records came back from the vendor. Over 5,000 new or updated cataloging records are filed each week in the ADVANCE cataloging module. With the high volume of maintenance work being done by the HICUP (holdings inventory) staff, not to mention routine maintenance done by cataloging staff, we could not afford to have records unavailable for update for a long period of time.

Isn't this taking a long time?

Yes. We sent a file of our 2 million headings to OCLC/WLN in Spring 1997. Because this is the first time that our vendor has done an authority processing project this way, they learned as they went along, and there were a few bumps in the road. In order to catch the variant headings that the machine-match missed, we contracted for OCLC/WLN staff to manually review all "near-matches". This human "eye-balling" and correction of near-matches further extended the processing time. Another monkey wrench has been the fact that over the last two years the national standards for internal MARC tagging for name and subject authority records have changed. Reconciling discrepancies between tags in bibliographic and authority records is extremely complicated.
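The split between machine matching and manual review of near-matches can be illustrated with a short Python sketch. It is not OCLC/WLN's matching procedure, which this page does not describe. It uses the standard difflib module. The function name classify, the 0.8 similarity cutoff and the heading "Zola, Emile" are assumptions chosen only for illustration.

```python
import difflib

def classify(heading, authorized, cutoff=0.8):
    """Sort a heading into exact match, near-match (for human review) or no match."""
    if heading in authorized:
        return ("exact", heading)
    # get_close_matches returns the most similar candidates at or above the cutoff.
    close = difflib.get_close_matches(heading, authorized, n=1, cutoff=cutoff)
    if close:
        return ("near-match", close[0])
    return ("no-match", None)

authorized = [
    "Abbott, Claude Coleer, 1889-1971",
    "Twain, Mark. Adventures of Huckleberry Finn",
]

for heading in [
    "Abbott, Claude Coleer, 1889-1971",
    "Abbott, Claude Coleer. 1889-",
    "Zola, Emile",  # made-up heading with no counterpart in the list
]:
    print(heading, "->", classify(heading, authorized))
```

In a workflow like this, only the "near-match" group would go to a person for review, which is where the extra processing time comes from.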

How long will it take to load one million records?

We have only loaded a test sample of 700 "pseudo-authority" records so far. Sherman and Bill are examining the records carefully to be sure the processing worked the way it was supposed to. Once we are sure the processing worked correctly, we will do daily loads, but we will have to monitor the impact of loading on indexing response time.

In what order are the records being loaded?

They are being loaded roughly in alphabetical order. If you take a look at headings for Pierre Abailard, you will see that many headings have been changed to the AACR2 form (Abelard, Peter), but one remaining author/title heading still needs to be processed.

Why are there cross-references for subject headings throughout the whole alphabet?

The entire file (200,000+) of Library of Congress subject heading authority records was loaded in March 1997, and updates are being loaded every few weeks.

Why are there cross-references from one subject heading to another heading, but then many occurrences of the "wrong", cross-referenced heading subdivided topically? An example is
NEGROES--SEE AFRO-AMERICANS, followed by many headings for NEGROES--ATLANTA, NEGROES--BIBLIOGRAPHY, etc.

Loading the LCSH file "flipped" all of the old "Negroes" headings to "Afro-Americans", but that load was not smart enough to match and flip all of the "Negroes" headings that were followed by subdivisions. This is something that the OCLC/WLN load will take care of.
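The difference between the two kinds of matching can be sketched in Python. This is an illustration only, not the logic of the LCSH load or of the OCLC/WLN processing. The function names are hypothetical, and the "--" separator is taken from the way subdivided headings are written above.

```python
def exact_match_flip(heading, old, new):
    """Flip a heading only when the whole string matches the old form."""
    return new if heading == old else heading

def subdivision_aware_flip(heading, old_main, new_main, separator="--"):
    """Flip the main heading and keep any subdivisions that follow it."""
    parts = heading.split(separator)
    if parts[0].strip().lower() == old_main.lower():
        parts[0] = new_main
    return separator.join(parts)

headings = ["Negroes", "Negroes--Atlanta", "Negroes--Bibliography"]

exact = [exact_match_flip(h, "Negroes", "Afro-Americans") for h in headings]
aware = [subdivision_aware_flip(h, "Negroes", "Afro-Americans") for h in headings]
print(exact)
print(aware)
```

The exact-match version leaves the subdivided headings untouched, which is the situation described above. The second version compares only the part before the first "--".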

When we finish loading the 1,000,000 authority records, will we be done?

Hardly. First we have to send to OCLC/WLN a "gap" file of all headings added to BobCat since Spring 1997. Those records have to be processed, returned and loaded.

Then will we be done?

Almost. The machine matching and manual review done by OCLC/WLN should have caught and eliminated most of the variant headings, but some will have slipped through the cracks. We are particularly worried about series headings--because of the complexity of their structure, they are especially susceptible to mangling during machine processing.

What are we going to do to keep BobCat "clean"?

After we send out the "gap" file, we will have the Systems Office generate a weekly list of headings new to BobCat. Some of these will be editing oversights which we will correct; others will be new, legitimate headings that need full authority records with cross-references. We may do this weekly clean-up in Cataloging or we may contract with OCLC/WLN to do this on a regular basis.
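A weekly list of this kind amounts to comparing this week's headings against last week's. The Python sketch below assumes the headings are available as simple sets of strings. It does not describe how the Systems Office will actually produce the list, and the function names and sample data are hypothetical.

```python
def weekly_new_headings(current_headings, last_week_headings):
    """Return headings present now but absent from last week's snapshot."""
    return sorted(set(current_headings) - set(last_week_headings))

def split_for_review(new_headings, authorized_headings):
    """Separate new headings that have authority records from those needing review."""
    covered = [h for h in new_headings if h in authorized_headings]
    to_review = [h for h in new_headings if h not in authorized_headings]
    return covered, to_review

last_week = {"Abelard, Peter", "Abbott, Claude Coleer, 1889-1971"}
this_week = last_week | {
    "Abbott, Claude Coleer. 1889-",
    "Twain, Mark. Adventures of Huckleberry Finn",
}
authorized = {
    "Abelard, Peter",
    "Abbott, Claude Coleer, 1889-1971",
    "Twain, Mark. Adventures of Huckleberry Finn",
}

new = weekly_new_headings(this_week, last_week)
covered, to_review = split_for_review(new, authorized)
print(new)
print(to_review)
```

Headings in the review group would then be either corrected as editing oversights or given full authority records, as described above.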