The BBC World Service Archive Prototype is a website that provides access to the huge digital archive of BBC World Service radio programs. Yves Raimond and Tristan Ferne describe in a concise article (PDF, 8 pages) how Semantic Web technologies, automation and crowdsourcing are used to annotate, correct and add metadata for search and navigation. Ed Summers has a blog post about this project, making a comment I wholeheartedly agree with: “… [I]t is the (implied) role of the archivist, as the professional responsible for working with developers to tune these algorithms, evaluating/gauging user contributions, and helping describe the content themselves that excites me the most about this work.” I think this is a possible future role not only for archivists but also for librarians, especially catalogers and metadata specialists working with digital collections.
“Topic maps and the ILS: an undelivered promise” (Library Hi Tech, 26 (2008), 1, pp. 12–18) is a great, accessible way for librarians to explore possible applications for topic maps in a library setting. The authors are Suellen Stringer Hye and Edward Iglesias (who wrote some thoughtful comments on “Data, not records” on the ITIG ACRL-NEC blog).
The main merits of the paper are the demonstration of potential use cases of topic maps in libraries and the comparison of topic maps to discovery applications. Examining the assets and advantages that distinguish topic maps from these tools, the authors point to the power of topic maps; for example, through associations “each item or topic carries with it information about its context”.
As mentioned in the paper, vendors of library software have not (yet), despite internal use of topic maps, included the technology in ILS development. Why not? What would it take for them to actively promote topic maps? And what about open source software? Of course this cuts both ways – there is no specific demand from libraries either. Maybe librarians need a clearer understanding of the benefits of topic maps compared to the fashionable discovery systems.
A discovery tool only goes so far; topic maps go further.
The paper “An associative index model for the results list based on Vannevar Bush’s selection concept” by Charles Cole, Charles-Antoine Julien and John E. Leide of McGill University, Montreal, which appears in the latest issue of Information Research, contends that algorithmically created methods of refining results lists in online catalogs are not well suited to meet the users’ information needs. The authors draw on Vannevar Bush (whose seminal text, “As we may think” (1945), is available here) and Charles Cutter to develop an associative index model.
Based on an understanding of cognitive processes during a search, the model establishes a second collocation set, triggered by the user’s associative thinking while perusing the first, system-derived results list. This second set is considered to better match the user’s actual information need. In my view it is only an externalization and formalization of thought processes at work in a more or less conscious way. These include epistemological questions such as: How do we look for information? How do we formulate a query, i.e. use natural language to reflect our information need? How do we process and organize the findings? How is association involved in search and selection?
Some questions remain open. For instance, why didn’t the authors fall back on the FRBR user tasks instead of creating their own with slightly different meanings? And how would relevance feedback relate to their approach? I wonder what role topic maps could play in an associative retrieval tool: they could enable users to identify subjects in their own words, i.e. from their thought associations, and feed the improvements users suggest back into the system. The result would be a dynamic system that gets “smarter” through user input which complements computational algorithms.
Discovery tools with search engine functionality are increasingly being introduced to replace traditional library OPACs. What does this mean for those creating or managing metadata, the catalogers? The role of catalogers becomes more important, because however nifty the front end is, the underlying data has to be accurate to be useful and to enable findability, and this is even more true now than it was in the days of the OPAC. Catalogers thus have greater responsibility towards the user. Fill in fields correctly or not at all, but not carelessly – data has to be as reliable as possible for machine processing and for the display generated from it. Fields that seem insignificant today may be essential for tomorrow’s technology. We don’t create metadata just for current systems or tools; we also want it to be usable in a future that may yield possibilities we can’t yet foresee.
Discovery systems uncover mistakes or blind spots in the data which no one would really have noticed in OPACs. MARC (or, in Germany and Austria, MAB) fields that have thus far not been used for display in OPACs now become more visible and actually influence findability to a considerable degree. Some key elements merit special attention. For example, certain facets for browsing are determined by the content of fixed fields (media type, language, country of publication…). If these fields are filled in incorrectly or not at all, search quality suffers and users might be led in wrong directions. The same is true for media type icons. Classification faceting pretends to be meaningful – but can the user be sure that all records in the system have subject headings? I know for a fact that in the cataloging network I work in, many records don’t, and these are lost to this particular facet.
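To make the point about fixed fields concrete, here is a minimal sketch of how a language facet might be derived from the MARC 008 field, where positions 35-37 hold the language code. The sample records, the helper `make_008` and the label `(not coded)` are my own assumptions for illustration; no real discovery system is being quoted. The sketch only shows that records with a blank or missing code fall out of the meaningful facet values.

```python
from collections import Counter

# Label for records whose language was never coded (an assumption of this sketch).
MISSING = "(not coded)"


def make_008(place, language):
    """Build a hypothetical 40-character 008 field for the sample data."""
    prefix = "200101" + "s" + "2019" + " " * 4   # positions 00-14
    middle = " " * 17                            # positions 18-34
    return prefix + place.ljust(3) + middle + language.ljust(3) + " d"


def language_from_008(field_008):
    """Return the language code from 008/35-37, or None if it is unusable."""
    if field_008 is None or len(field_008) < 38:
        return None
    code = field_008[35:38]
    # Blanks or fill characters mean that no language was coded.
    if code.strip(" |#") == "":
        return None
    return code


def language_facet(records):
    """Count records per language; uncoded records are grouped under MISSING."""
    counts = Counter()
    for field_008 in records.values():
        counts[language_from_008(field_008) or MISSING] += 1
    return counts


# Hypothetical records: two of the five carry no usable language code.
SAMPLE = {
    "rec1": make_008("gw", "ger"),
    "rec2": make_008("au", "ger"),
    "rec3": make_008("xxu", "eng"),
    "rec4": make_008("gw", ""),
    "rec5": None,
}

if __name__ == "__main__":
    for value, count in language_facet(SAMPLE).most_common():
        print(value, count)
```

A user who limits a search to German would never see the two uncoded records, whatever language they are actually in. Counting the uncoded group explicitly is one way of making that blind spot visible to catalogers.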
Any discovery tool is only as good as the metadata, and the metadata is, in the current cataloging workflow, only as good as the skills of those handling it. Catalogers should be involved in the implementation process of a discovery tool, and awareness has to be raised about the consequences their work has for users’ search and discovery experience. Some cataloging guidelines might have to be adapted in order for the metadata to be fully exploitable in this new environment. This should happen early to keep the amount of metadata to be manipulated later as small as possible.
To follow up on the last post, picking up Ranganathan’s law “Save the time of the reader”, one way of saving the time of the reader is to refine faceted search and browsing. The bigger the indexed corpus (and that would be the case when including abstracts, TOCs, indexes etc.), the more hits a query will yield, and there will potentially be a higher number of irrelevant results. It is also getting more difficult for the user to see why a certain result is returned if the search term doesn’t show up in the readily identifiable fields like title, author or subject heading. We don’t considerably save the user’s time if they have to wade through pages of search results, just because we don’t want them to miss a book they might find useful that came up due to the search term in the back-of-the-book index.
The discovery layers that increasingly replace traditional library OPACs offer faceting of results by various criteria (language, format, year of publication etc.). How about introducing what I would call a “source-sensitive facet”? This facet would show where the search term occurs, whether it’s in the metadata, the subject heading(s), or in supplementary material such as TOCs, abstracts or indexes. Scottsdale Public Library has such a facet in place:
Their “Search found in” facet is apparently generated from the metadata proper, but you could also imagine indicating the location (e.g. an electronic document associated with a given item) where the search term was found: “Search found in TOC / abstract / index”. The document in question has to be typed for the system to recognize which type (TOC, abstract, index) it belongs to and display this information. Such a facet would make the results more transparent, since some users are confused about where their search term occurs in the data and why a specific item turns up in the list of results. Those interested enough to find out will have a tool at hand.
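The idea of a source-sensitive facet can be sketched in a few lines. This is an illustration under stated assumptions, not a description of how Scottsdale Public Library or any discovery product implements it: the items, the source type names (`title`, `subject`, `toc`, `abstract`, `index`) and the plain substring matching are all hypothetical, and a real system would query a search index instead.

```python
from collections import Counter

# Hypothetical items: each maps a typed source (a metadata field or an
# attached document such as a TOC or index) to its text.
ITEMS = {
    "item1": {
        "title": "Topic maps in libraries",
        "subject": "Topic maps",
        "toc": "Introduction. Associations and scope.",
    },
    "item2": {
        "title": "Cataloging today",
        "subject": "Cataloging",
        "index": "facets, 12; topic maps, 48",
    },
    "item3": {
        "title": "Library systems",
        "abstract": "A survey of discovery layers.",
    },
}


def found_in(item, term):
    """List the typed sources of one item that contain the search term."""
    needle = term.casefold()
    return [source for source, text in item.items() if needle in text.casefold()]


def search(items, term):
    """Map each matching item to the sources where the term was found."""
    hits = {}
    for item_id, item in items.items():
        sources = found_in(item, term)
        if sources:
            hits[item_id] = sources
    return hits


def source_facet(hits):
    """Count hits per source type, i.e. the "Search found in" facet values."""
    return Counter(source for sources in hits.values() for source in sources)


if __name__ == "__main__":
    hits = search(ITEMS, "topic maps")
    print(hits)
    print(source_facet(hits))
```

The sketch depends on every attached document carrying a type. Without that label the system could report that a term matched somewhere, but not whether it was in the TOC, the abstract or the index.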