Findability as a business issue

Of course, librarians have long known how important controlled vocabulary is for subject access. The blog post “Integrating taxonomies with search” underscores the value of terminology control and highlights four techniques that use taxonomies to improve the search experience.

The example screenshot in the post is from CAB Direct, a reference source for applied life science articles. CAB Direct has a special section dedicated to the CAB Thesaurus, where you can browse or search for specific terms.

The databases CAB Abstracts and Global Health, which underlie the platform, are maintained by CABI, a non-profit international organization in the agricultural and environmental sector.

It’s encouraging to read the conclusion that “integrating taxonomies with search is therefore at heart a business issue”. It shows that the importance of structured subject access is recognized (in the corporate sector too, for that matter), and it should make librarians confident that we have real value to bring to the table.

Source-sensitive facet?

To follow up on the last post and pick up Ranganathan’s law “Save the time of the reader”: one way of saving the reader’s time is to refine faceted search and browsing. The bigger the indexed corpus (as it would be when abstracts, TOCs, indexes etc. are included), the more hits a query will yield, and the more of them are likely to be irrelevant. It also becomes harder for the user to see why a certain result is returned if the search term doesn’t show up in readily identifiable fields like title, author or subject heading. We don’t save the user much time if they have to wade through pages of search results just because we don’t want them to miss a potentially useful book that matched only through its back-of-the-book index.

The discovery layers that are increasingly replacing traditional library OPACs offer faceting of results by various criteria (language, format, year of publication etc.). How about introducing what I would call a “source-sensitive facet”? This facet would show where the search term occurs, whether in the metadata, the subject heading(s), or in supplementary material such as TOCs, abstracts or indexes. Scottsdale Public Library has such a facet in place.

Their “Search found in” facet is apparently generated from the metadata proper, but you could also imagine indicating the location (e.g. an electronic document associated with a given item) where the search term was found: “Search found in TOC / abstract / index”. Each such document has to be tagged with its type (TOC, abstract, index) so that the system can recognize it and display this information. Such a facet would make the results more transparent, since some users are confused about where their search term occurs in the data and why a specific item turns up in the list of results. Those interested enough to find out would have a tool at hand.
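
To make the idea concrete, here is a minimal Python sketch of how such a facet could be computed. Everything in it is an assumption for illustration: the field names, the sample records, and the naive substring matching stand in for whatever schema and search engine a real discovery layer uses.

```python
from collections import Counter

# Hypothetical field names mapped to the labels shown in the facet.
# A real catalog would define its own schema.
FIELD_LABELS = {
    "title": "Title",
    "subject": "Subject heading",
    "toc": "TOC",
    "abstract": "Abstract",
    "index": "Index",
}


def fields_matching(record, term):
    """Return the facet labels of every field in `record` containing `term`."""
    needle = term.casefold()
    return [
        label
        for field, label in FIELD_LABELS.items()
        if needle in record.get(field, "").casefold()
    ]


def search_found_in(records, term):
    """Return (hits, facet).

    hits  -- list of (record id, labels of the fields where the term occurs)
    facet -- number of matching records per location
    """
    hits = []
    facet = Counter()
    for record in records:
        labels = fields_matching(record, term)
        if labels:
            hits.append((record["id"], labels))
            facet.update(labels)
    return hits, facet


# Invented sample records; supplementary documents are stored as typed fields.
records = [
    {"id": "b1", "title": "Music in Vienna, 1918-1938",
     "subject": "Music -- Austria -- History",
     "index": "Composer A; Composer B; Conductors"},
    {"id": "b2", "title": "Composer A: A Life",
     "subject": "Composer A",
     "toc": "Early years; Exile"},
    {"id": "b3", "title": "Opera Houses of Europe",
     "subject": "Opera",
     "abstract": "A survey of venues."},
]

hits, facet = search_found_in(records, "composer a")
for record_id, labels in hits:
    print(record_id, "found in:", " / ".join(labels))
```

In this sketch the first record is a hit only because of its index, and the facet says so, which is exactly the transparency argued for above. A user could then choose to narrow the results to title and subject matches, or deliberately keep the index matches in view.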

The value of the index

Library metadata will increasingly include some or all of the content of the cataloged resource to complement the descriptive data. It is currently hard for libraries to capture what is in a book (granted, we have subject headings, but they categorize resources using controlled rather than natural language), and as a consequence some books are easily overlooked.

A friend of mine is doing research on two little-known female Austrian composers of the 1930s for her Master’s thesis. In one instance, she only found out that a book contained important biographical information about them because the book’s index, posted on the publisher’s website, listed their names (she stumbled across it via … Google). Obviously neither a table of contents nor a library record could have supplied this pointer. So, failing a full-text scan, digital versions of the parts of a book that serve as windows into its content are helpful additions to the metadata. This is not restricted to TOCs (which can already be found in a number of records), but includes indexes as well. Incidentally, it is not without reason that the Topic Maps technology was originally developed for automated electronic indexes: they are gateways into a book’s contents, reflecting the main concepts it deals with. All of the above assumes, of course, that an index exists. As Baron Campbell says in Lives of the Chief Justices:

“So essential do I consider an Index to be to every book, that I proposed to bring a Bill into Parliament to deprive an author who publishes a book without an index of the privilege of copyright; and, moreover, to subject him, for his offence, to a pecuniary penalty.”

(quoted from: Thomas B. Passin: Explorer’s Guide to the Semantic Web)

If publishers or other providers of bibliographic metadata are ready to make additional material about their publications available for display in library catalogs (so that efforts aren’t duplicated), and if reviews, abstracts, indexes and TOCs are assembled in one central place, indexed and thus made searchable, we stand a much better chance of helping users discover the knowledge structure contained in books. Moreover, by not requiring them to go to the shelf to look inside the book, or even to request it from closed stacks, we get a step closer to fulfilling Ranganathan’s fourth law: Save the time of the reader.