Uniterm

Uniterm is a subject indexing system introduced by Mortimer Taube in 1951. The name is a contraction of "unit" and "term", referring to its use of single words as the basis of the index, the "uniterms". Taube referred to the overall concept as "Coordinate Indexing", but today the entire concept is generally referred to as Uniterm as well.

Uniterm is designed to allow rapid lookups on topic keywords and then cross-reference those keywords across multiple topics in order to find documents that match all of the terms. The result of a uniterm search is a set of accession numbers that can then be used to retrieve the matching documents. Uniterm is based on existing accession numbers, so it is technically a post-coordinate system. This is opposed to a pre-coordinate system, where the subject of the document results in it being given a particular number, as in the Dewey Decimal Classification. Uniterm was among the most popular post-coordinate indexing systems, although some of its success was due to Taube's company winning contracts to index huge technical libraries.

History

The development of Uniterm, and other new indexing systems, ultimately traces its history to the late World War II period. Aware of the advanced aircraft and rocket technologies developed in Germany, the US formed Operation Lusty and the UK the similar Fedden Mission in order to gather as much of these materials as possible. Along with examples of the aircraft and various weapons, these efforts returned millions of pages of technical documentation. The desire to ease access to these enormous collections led to a great expansion in the field of information retrieval.[1]

In the US, the aeronautical collection was first sent to the US Army Air Forces at Wright Field, but over time it was merged with similar caches of US research to form an ever-growing collection of technical papers. The collection grew so large and varied that a new operational group, the Armed Services Technical Information Agency (ASTIA), was formed in 1951 to manage it. This group eventually came under the management of the Atomic Energy Commission. ASTIA began running experiments in indexing the collection, and it was from this work that Uniterm emerged.[2]

Taube introduced the Uniterm concept in a 1951 paper, "Coordinate Indexing of Scientific Fields", part of the Symposium on Mechanical Aids to Chemical Documentation. The next year, in partnership with Gerald Sophar, Taube formed Documentation, Inc. The company offered commercial retrieval and indexing services. Among their largest efforts was a 1958 contract with the newly formed NASA to index their entire technical library, and later, make microfilm copies of it.[3]

Taube's original paper indicates that a significant advantage of the Uniterm concept is its ability to be automated. In essence, the uniterm lookup process is looking for the intersection of several terms, or as Taube referred to them, the "coordinates".[a] To this end, Documentation, Inc. partnered with IBM to develop the "Continuous Multiple Access Collator", or COMAC, also known as the IBM 9900. Users would make search term selections on a punch card writer and then feed them into the COMAC.[4] The COMAC pulled the corresponding uniterm cards and then used optical systems to find the accession numbers common to them. It then produced a new card carrying those numbers, which was fed into the IBM 305 RAMAC, the first computer with a hard drive, to return the complete document information for those numbers.[4]

Concept

Uniterm is based on the concept of making a separate card catalog that refers to the documents in the collection by their accession numbers. The accession numbers have no meaning in the Uniterm index, so they may use any of the common systems like the Dewey Decimal Classification or Universal Decimal Classification, or in many cases, simply an incrementing serial number.[5][2]

As new works are added to the collection, the librarian will make a normal index card for the primary card index as they would for any work. Additionally, they will select a small number of keywords from the title or body of the work that can be used to look it up, and these are also written on the card. For instance, a document on icing of air ducts in aircraft might be filed under "air", "ducts" and "icing", but perhaps not "aircraft", which would be found on too many documents.[6]

The librarian then looks in the Uniterm catalog for cards with those terms on them. If a card for a term is not found, one is created by writing the keyword at the top of the card and then dividing the lower portion into ten vertical columns, labeled 0 to 9. The accession number is then written in the column that matches its last digit; for instance, if the last digit of the accession number is 5, the entire accession number is written in column 5. If the card for that term already exists in the catalog, the new accession number is simply added to the correct column of the existing card.[7]
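The filing procedure can be sketched in a few lines of Python. This is only an illustration of the steps described above: the class name UnitermCard, the function file_document, and the accession numbers are hypothetical, and a dictionary stands in for the drawer of cards.

```python
class UnitermCard:
    """One card in the Uniterm catalog: a keyword plus ten columns."""

    def __init__(self, term):
        self.term = term
        # Columns 0 to 9, one per possible last digit of an accession number.
        self.columns = {digit: [] for digit in range(10)}

    def post(self, accession):
        """Write the full accession number in the column for its last digit."""
        column = self.columns[accession % 10]
        if accession not in column:
            column.append(accession)


def file_document(catalog, accession, terms):
    """Post one accession number to the card for each chosen term."""
    for term in terms:
        card = catalog.get(term)
        if card is None:
            # No card for this term yet, so start a new one.
            card = UnitermCard(term)
            catalog[term] = card
        card.post(accession)


# Hypothetical accession numbers, used for illustration only.
catalog = {}
file_document(catalog, 1045, ["air", "ducts", "icing"])
file_document(catalog, 2315, ["icing", "wings"])
```

After these two calls the "icing" card holds both accession numbers in column 5, while no card exists for "aircraft", since that term was never selected.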

To retrieve a document, the user selects potentially useful key terms and extracts the cards for those terms from the uniterm catalog. To find this article, for instance, the user might select "indexing" and "library". Each card will list numbers for many different documents; the "library" card might contain a listing for a book on the Library of Alexandria. However, only those documents on "library indexing" will appear on both cards.[8]

The user then scans the cards to see whether a particular accession number appears on both; splitting the cards into 10 columns is intended to make this visual scanning simpler, since only matching columns need to be compared. Numbers that appear on both cards are likely relevant to the search, and can then be looked up directly or by looking in the main card catalog if partial accession numbers are used.[8]
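The scanning step amounts to a set intersection carried out column by column. The following Python sketch assumes a card is represented as a mapping from column digit to a set of accession numbers; make_card, coordinate, and the sample numbers are hypothetical names and data chosen for illustration.

```python
def make_card(accessions):
    """Build a ten-column card from a list of accession numbers."""
    columns = {digit: set() for digit in range(10)}
    for number in accessions:
        columns[number % 10].add(number)
    return columns


def coordinate(*cards):
    """Return the accession numbers found on every card (needs at least one card).

    Only like-numbered columns are compared, mirroring the visual scan.
    """
    matches = []
    for digit in range(10):
        common = set.intersection(*(card[digit] for card in cards))
        matches.extend(common)
    return sorted(matches)


# Hypothetical postings for two terms.
indexing = make_card([17, 52, 105, 233])
library = make_card([9, 52, 88, 233, 410])
hits = coordinate(indexing, library)
```

Here hits contains 52 and 233, the only numbers posted on both cards. Because a number can only ever appear in the column for its last digit, comparing like columns gives the same result as comparing the whole cards while examining far fewer numbers at a time.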

The cards in the main catalog also contain the uniterms used to file that entry, forming a cross-index. A user who selects the cards for "propeller" and "aeroplane" may find many intersecting works on the cards. Returning to the main index, they can look at the uniterms recorded on the main index cards and find that other terms commonly appear, perhaps "aerodynamics". These suggest additional terms that could be used to narrow the search. The user can then return to the uniterm catalog and apply these new terms to find additional documents or to further focus the search.[9]
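This use of the cross-index can be sketched as a simple tally of the other uniterms recorded on the matching main-catalog cards. The function suggest_terms and the catalog contents below are hypothetical; the main catalog is assumed to map each accession number to the list of uniterms written on its card.

```python
from collections import Counter


def suggest_terms(main_catalog, accessions, search_terms):
    """Count uniterms that co-occur on the matching documents' main cards."""
    counts = Counter()
    for number in accessions:
        for term in main_catalog[number]:
            # Skip the terms the user has already searched on.
            if term not in search_terms:
                counts[term] += 1
    # Most frequent co-occurring terms first.
    return counts.most_common()


# Hypothetical main-catalog entries: accession number -> recorded uniterms.
main_catalog = {
    101: ["propeller", "aeroplane", "aerodynamics"],
    102: ["propeller", "aeroplane", "aerodynamics", "noise"],
    103: ["propeller", "aeroplane", "materials"],
}
suggestions = suggest_terms(
    main_catalog, [101, 102, 103], {"propeller", "aeroplane"}
)
```

In this made-up example "aerodynamics" heads the list because it appears on two of the three matching cards, making it a natural candidate for narrowing the search.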

Advantages and criticisms

Uniterm was popular in the United States for large technical collections, which led to considerable study of the system. One particularly useful study came out of the National Security Agency's effort to catalog its 70,000-work collection.[10]

They found one major advantage of the Uniterm system was that the librarians did not have to have an understanding of the material in order to correctly catalog it. Simply selecting terms that appeared in the title or were obviously important within the text would often result in a useful uniterm entry. This contrasted with traditional hierarchical approaches, where selecting the proper spot within the hierarchy often required some, or considerable, knowledge of the underlying field.[10]

The same effort also revealed a number of problems and suggested solutions. One was that synonyms presented a problem; was a paper on "air ducts" the same or different than one on "air intakes"? They suggested this could be addressed by splitting the works into sets of about 1,000 entries and building the catalog out in sections. The first set of 1,000 documents might produce 1,000 uniterms, which were then studied to weed out synonyms. When synonyms were found, they added "see also" headings to those cards. The second set would then be added, using those synonyms. They found that the addition of new terms started to flatten out at about 4,000 entries, and after 10,000 only very specific technical terms were being added.[11]

A concern that was raised when the concept was first introduced was that the terms might return a large number of false positives due to terms being used to describe completely different concepts. In particular, terms that might mean different things depending on their order were believed to be an issue. If one was looking for "American exports to Canada", "Canada", "US" and "exports" would return a large number of documents on Canadian exports into the US as well, perhaps overwhelming the result set.[12]

However, this was found not to be a serious problem in practice, and those few examples that did crop up were solved by adding "delta cards", see-also entries that incorporated a direction. In this case, the "US" card would have a see-also entry for "USΔ", and that card would contain only those entries that originate from the US. The documents listed on the USΔ card are thus only those concerning US exports.[12]
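The effect of a delta card can be illustrated with the same intersection logic used for any uniterm search. The sketch below assumes that the "USΔ" card lists only documents in which the US is the exporting country; the accession numbers and the search function are hypothetical.

```python
# Hypothetical cards: term -> set of accession numbers posted on it.
catalog = {
    "US": {11, 12, 13, 14},
    "USΔ": {11, 13},  # assumption: only documents with the US as the origin
    "Canada": {11, 12, 13, 15},
    "exports": {11, 12, 14},
}


def search(catalog, *terms):
    """Return accession numbers posted on every requested card."""
    return sorted(set.intersection(*(catalog[term] for term in terms)))


# Plain search: also returns document 12, on Canadian exports to the US.
broad = search(catalog, "US", "Canada", "exports")

# Following the see-also entry to the delta card removes that false positive.
narrow = search(catalog, "USΔ", "Canada", "exports")
```

With this sample data the plain search returns documents 11 and 12, while the search through the delta card returns only document 11.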

Notes

  a. ^ As in "things that are coordinated", not "a physical location".

References

Citations

  1. ^ Lesk, Michael. "The Seven Ages of Information Retrieval". Bellcore.
  2. ^ a b Sharma & Sharma 2007, p. 19.
  3. ^ Times 1965.
  4. ^ a b Taube 1962.
  5. ^ Install 1953, p. 1.
  6. ^ Install 1953, p. 2.
  7. ^ Install 1953, pp. 6, 7.
  8. ^ a b Install 1953, p. 9.
  9. ^ Install 1953, p. 11.
  10. ^ a b Sanford & Theriault 1956, p. 19.
  11. ^ Sanford & Theriault 1956, p. 20.
  12. ^ a b Sanford & Theriault 1956, p. 23.

Bibliography