Of course librarians have known for a long time how important controlled vocabulary is for subject access. The blog post “Integrating taxonomies with search” underscores the value of terminology control and highlights four techniques that use taxonomies to improve the search experience.
The example screenshot in the post is from CAB Direct, a source of reference for applied life science articles (introductory video). There’s a special section in CAB Direct dedicated to the CAB thesaurus where you can browse or search for specific terms.
The databases CAB Abstracts and Global Health which underlie the database platform are maintained by CABI, a non-profit international organization in the agricultural and environmental sector.
It’s encouraging to read the conclusion that “integrating taxonomies with search is therefore at heart a business issue” – it shows that the importance of structured subject access is recognized (in the corporate sector too, for that matter), and it should make librarians confident that we have a real value to bring to the table.
Lumas offers photography as art editions. I came across one of their galleries while strolling around Stilwerk yesterday and thought I’d take a look at their website.
Why do I mention it here? Metadata, of course! They do a pretty good job at providing a search interface with categories for browsing, and they offer information about the artists, such as CVs and general introductions to their approach and work. But I have a feeling they could benefit from information professionals organizing their assets and enriching the metadata (how about controlled vocabularies to add depth to search and retrieval, or applying the VRA Core image description standard?).
Discovery tools with search engine functionality are increasingly being introduced to replace traditional library OPACs. What does this mean for those creating or managing metadata, the catalogers? Their role becomes more important, because however nifty the front-end is, the underlying data has to be accurate to be useful and to enable findability – even more so now than in the days of the OPAC. Catalogers thus have a greater responsibility towards the user. Fill in fields correctly or not at all, but never carelessly: data has to be as reliable as possible, both for machine processing and for the display generated from it. Fields that seem insignificant today may be essential for tomorrow’s technology. We don’t create metadata just for current systems or tools; we also want it to be usable in a future that may offer possibilities we can’t yet foresee.
Discovery systems uncover mistakes or blind spots in the data that hardly anyone would have noticed in an OPAC. MARC (or, in Germany and Austria, MAB) fields that have so far not been used for display in OPACs now become more visible and actually influence findability to a considerable degree. Some key elements merit special attention. For example, certain facets for browsing are determined by the content of fixed fields (media type, language, country of publication…). If these fields are filled in incorrectly or not at all, search quality suffers and users may be led in the wrong direction. The same is true for media type icons. A classification facet suggests completeness – but can the user be sure that all records in the system have subject headings? I know for a fact that in the cataloging network I work in, many records don’t, and these are lost to this particular facet.
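To make this concrete, here is a minimal Python sketch of how a discovery layer might derive facet values from the MARC 21 bibliographic 008 fixed field, where positions 15–17 hold the country of publication and 35–37 the language. The function name and the mostly blank sample record are my own illustration, not any particular system’s code:

```python
# Sketch: deriving facet values from the MARC 21 bibliographic 008 fixed field.
# Positions 15-17 hold the country of publication, 35-37 the language;
# a blank or invalid value silently drops the record from the facet.

def facets_from_008(field_008: str) -> dict:
    """Extract country and language facet values from an 008 string."""
    country = field_008[15:18].strip() or None
    language = field_008[35:38].strip() or None
    return {"country": country, "language": language}

# An invented, mostly blank 008 with only the facet-relevant positions set:
# a German-language book published in Austria.
rec = list(" " * 40)
rec[15:18] = "au "   # country of publication code: Austria
rec[35:38] = "ger"   # language code: German
print(facets_from_008("".join(rec)))   # {'country': 'au', 'language': 'ger'}

# A record with an empty fixed field drops out of both facets:
print(facets_from_008(" " * 40))       # {'country': None, 'language': None}
```

The second call shows the failure mode described above: the record is not wrong, it is simply invisible to anyone browsing by language or country.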
Any discovery tool is only as good as the metadata, and the metadata is, in the current cataloging workflow, only as good as the skills of those handling it. Catalogers should be involved in the implementation process of a discovery tool, and awareness has to be raised about what consequences their work has on users’ search and discovery experience. Some cataloging guidelines might have to be adapted in order for the metadata to be fully exploitable in this new environment. This should happen early to keep the amount of metadata to be manipulated later as small as possible.
To follow up on the last post, picking up Ranganathan’s law “Save the time of the reader”: one way of saving the reader’s time is to refine faceted search and browsing. The bigger the indexed corpus (which is what happens when abstracts, TOCs, indexes etc. are included), the more hits a query will yield, and the higher the potential number of irrelevant results. It also becomes more difficult for users to see why a certain result is returned if the search term doesn’t show up in readily identifiable fields like title, author or subject heading. We hardly save the user’s time if they have to wade through pages of search results just because we don’t want them to miss a book they might find useful, one that came up only because of a search term in the back-of-the-book index.
The discovery layers that are increasingly replacing traditional library OPACs offer faceting of results by various criteria (language, format, year of publication etc.). How about introducing what I would call a “source-sensitive facet”? This facet would show where the search term occurs: in the metadata proper, in the subject heading(s), or in supplementary material such as TOCs, abstracts or indexes. Scottsdale Public Library has such a facet in place:
Their “Search found in” facet is apparently generated from the metadata proper, but you could also imagine indicating the location (e.g. an electronic document associated with a given item) where the search term was found: “Search found in TOC / abstract / index”. The document in question has to be typed for the system to recognize which type (TOC, abstract, index) it belongs to and to display this information. Such a facet would make the results more transparent, since some users are confused about where their search term occurs in the data and why a specific item turns up in the list of results. Those interested enough to find out would have a tool at hand.
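As a rough illustration of the idea, here is a Python sketch of how such a “Search found in” facet could be computed from a record with typed supplementary documents. The record structure, field names and sample data are all invented for the example:

```python
# Sketch: computing a "Search found in" facet value for one record.
# The type label attached to each supplementary document drives the facet.

from dataclasses import dataclass, field

@dataclass
class Record:
    title: str
    subjects: list
    documents: dict = field(default_factory=dict)  # document type -> full text

def found_in(record: Record, term: str) -> list:
    """Return the sources (title, subject, typed document) containing the term."""
    term = term.lower()
    sources = []
    if term in record.title.lower():
        sources.append("title")
    if any(term in s.lower() for s in record.subjects):
        sources.append("subject")
    # Supplementary documents are typed, so the match can be labeled precisely.
    for doc_type, text in record.documents.items():
        if term in text.lower():
            sources.append(doc_type)
    return sources

# Invented sample record with a typed TOC and a typed back-of-the-book index.
rec = Record(
    title="Music in Vienna",
    subjects=["Composers -- Austria"],
    documents={"toc": "1. The interwar years ...",
               "index": "Bauer, Hilde, 12, 45"},
)
print(found_in(rec, "composers"))  # ['subject']
print(found_in(rec, "bauer"))      # ['index']
```

The second query is exactly the puzzling case: the term appears nowhere in the descriptive metadata, and the facet tells the user why the item was returned anyway.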
Library metadata will increasingly include (some or all of) the content of the resource cataloged, to complement the descriptive data. It is currently hard for libraries to capture what is in a book (granted, we have subject headings, but they categorize resources in controlled rather than natural language), and as a consequence some books are easily overlooked.
A friend of mine is doing research on two little-known female Austrian composers of the 1930s for her Master’s thesis. In one instance, she only found out that a book contained important biographical information about them because the book’s index, posted on the publisher’s website, listed their names (and she stumbled across it via … Google). Obviously neither a table of contents nor a library record could have supplied this pointer. So, failing a full-text scan, digital versions of those parts of a book that serve as windows into its content are helpful additions to the metadata. This is not restricted to TOCs (which can already be found in a number of records) but includes indexes as well. Incidentally, it is not without reason that the Topic Maps technology was originally developed for automated electronic indexes – they are gateways into a book’s contents, reflecting the main concepts it deals with. All of the above assumes, of course, that an index exists – as Baron Campbell says in Lives of the Chief Justices:
“So essential do I consider an Index to be to every book, that I proposed to bring a Bill into Parliament to deprive an author who publishes a book without an index of the privilege of copyright; and, moreover, to subject him, for his offence, to a pecuniary penalty.”
(quoted from: Thomas B. Passin: Explorer’s Guide to the Semantic Web)
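Coming back to indexes as gateways into a book’s contents: here is a small Python sketch of how a digitized back-of-the-book index might be parsed into searchable metadata. It assumes a simple “Heading, page, page” line format, and the entries are invented for illustration:

```python
# Sketch: parsing a digitized back-of-the-book index into searchable entries.
# Assumes each line follows the simple convention "Heading, page, page, ...".

def parse_index(lines):
    """Map each index heading to the list of pages it points to."""
    entries = {}
    for line in lines:
        head, *pages = [part.strip() for part in line.split(",")]
        entries[head] = [int(p) for p in pages if p.isdigit()]
    return entries

# Invented index lines, as they might come from a publisher's website.
raw = [
    "Twelve-tone technique, 34, 78",
    "Vienna Conservatory, 12",
]
index = parse_index(raw)
print(index["Vienna Conservatory"])    # [12]
print(index["Twelve-tone technique"])  # [34, 78]
```

Once the headings are in a structure like this, they can be indexed by the discovery layer alongside the descriptive metadata – which is precisely the pointer my friend could only get from Google.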
If publishers or other providers of bibliographic metadata are ready to make additional material about their publications available for display in library catalogs (so that efforts aren’t duplicated), and if reviews, abstracts, indexes and TOCs are assembled in one central place, indexed and thus made searchable, we stand a much better chance of helping users discover the knowledge structure contained in books. Moreover, by not requiring them to go to the shelf to look inside the book, or even to request it from closed stacks, we come a step closer to fulfilling Ranganathan’s fourth law: Save the time of the reader.