Category Archives: catalog

A meta catalog for digitized works

Imagine a user wants to read a public-domain book in electronic form. She’d be faced with the same situation as users before the advent of unified resource discovery systems – she has to go to various places on the web and run separate searches. Wouldn’t it be nice if there were a meta catalog for digitized works that brought together data from the likes of the Internet Archive, HathiTrust, Project Gutenberg, Europeana or Google Books? It could show which books were digitized by whom, whether they are downloadable, in what format, on which devices they can be read, etc. Such a directory could also let users compare quality when the same work is available in different versions. Another benefit would be less duplication of effort. Having duplicate electronic versions is not necessarily bad, but are time and money not better spent on unique materials not digitized elsewhere? Local priorities could be determined on a more informed basis.

All of this occurred to me while reading an article about the eBooks-on-Demand (EOD) service discovery platform (from p. 229 here, in German). EOD is a joint initiative of over 30 libraries from 12 European countries that each run their own digitization activities. Together they offer a (paid) service that lets users order a public-domain book to be digitized and delivered as an ebook. Instead of relying on users discovering EOD books “by chance” in the respective libraries’ catalogs, a VuFind search interface was built that allows finding books for digitization from all participating libraries in one central place and gives direct access to already digitized items. Records are ingested via OAI or FTP batch upload. For the future, the project team plans to enhance the search platform to include links (via API queries of players like those I mentioned above) to works already digitized elsewhere. And this is where the idea of a central overarching catalog for digitized public-domain works popped up. Existing portals such as the Zentrales Verzeichnis digitalisierter Drucke (ZVDD, central catalog of digitized printed works, which covers digital versions created in Germany) go in the right direction, but we definitely have to think more globally and on a larger scale.
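To make the idea a bit more tangible for myself, here is a minimal sketch of how such a meta catalog might group copies of the same edition held by different providers. Everything in it is an assumption for illustration: the `DigitizedCopy` fields, the crude `work_key` matching rule and the sample records are invented, and no real provider API is queried.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class DigitizedCopy:
    """One digitized version as reported by one provider (hypothetical fields)."""
    provider: str
    title: str
    author: str
    year: int
    formats: tuple
    downloadable: bool


def normalize(text):
    """Lowercase and collapse whitespace so trivial variants compare equal."""
    return " ".join(text.lower().split())


def work_key(copy):
    """Crude match key (an assumption): normalized author, title and year."""
    return (normalize(copy.author), normalize(copy.title), copy.year)


def build_meta_catalog(copies):
    """Group copies that share a match key, whichever provider holds them."""
    catalog = defaultdict(list)
    for copy in copies:
        catalog[work_key(copy)].append(copy)
    return dict(catalog)


# Invented sample records, for illustration only.
SAMPLE = [
    DigitizedCopy("Provider A", "A Sample Work", "Doe, Jane", 1850, ("pdf",), True),
    DigitizedCopy("Provider B", "a sample  work ", "Doe, Jane", 1850, ("epub", "pdf"), True),
    DigitizedCopy("Provider A", "Another Work", "Roe, Richard", 1870, ("pdf",), False),
]

if __name__ == "__main__":
    for key, group in build_meta_catalog(SAMPLE).items():
        print(key, "->", ", ".join(c.provider for c in group))
```

A real matching rule would have to be far more forgiving than this one (variant author forms, subtitles, reprints), but even the crude key shows at a glance which works exist in more than one digital version.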

Rethinking facets and FRBR

It was with great interest that I read the paper (PDF) “FRBR and Facets Provide Flexible, Work-Centric Access to Items in Library Collections” (2011) by Kelley McGrath, Bill Kules and Chris Fitzpatrick (mentioned on NGC4Lib) because it modified and enriched my understanding of the relationship between facets and FRBR and the way facets help meet the users’ information needs. Sure, facets are there to help users refine their search and pull out a smaller set of results that match certain attributes, but what is the theoretical underpinning and how does the FRBR model relate to facets?

The paper cited above highlights the authors’ experience with modeling and building a search interface including facets for a moving image collection, and while some of their observations are specific to these resource types and the retrieval requirements that go with them, much is generally applicable. The main point for me (as alluded to in the paper’s title) is that facets are much more flexible than hierarchical FRBR structures through which the user would have to navigate – facets allow the user to combine any number of attributes when limiting the results, without clicking through hierarchies of work, expression etc.

What makes the model and the prototype interface so powerful is the fact that FRBR is not slavishly followed but rather adapted to the specific features of the resources, collapsing the work, expression and manifestation entities into two levels, “movie” and “version/publication”. This helps avoid duplication of information, both in display and in cataloging, and helps answer the questions “what do you want?” and “how/where do you want it?” (probably the most general questions users bring to the catalog).
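A rough sketch of what such a two-level structure could look like as data, purely as my own illustration: the class and field names below are assumptions and are not taken from the paper or its prototype.

```python
from dataclasses import dataclass, field


@dataclass
class Version:
    """'How/where do you want it?': attributes that differ per version."""
    carrier: str
    language: str
    location: str


@dataclass
class Movie:
    """'What do you want?': attributes recorded once for all versions."""
    title: str
    year: int
    genre: str
    versions: list = field(default_factory=list)


# Invented example: the shared attributes are stored a single time,
# and only the version-specific ones are repeated.
movie = Movie("Sample Film", 1950, "Drama")
movie.versions.append(Version("DVD", "English", "Main Library"))
movie.versions.append(Version("VHS", "German", "Branch Library"))
```

The point of the collapse shows up immediately: title, year and genre are entered and displayed once, however many versions hang off the movie.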

Through facets, users are offered several pathways into collections: “Patrons can start their search at any point in the FRBR hierarchy, from Item (location) to Work (genre, date), and easily transition between search and browse strategies, using facets to broaden or narrow their results and pivoting on facet values.” (p. 4) – explorations they cannot as easily undertake in a tree-like FRBR representation.
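The mechanics behind this are simple enough to sketch. The snippet below is only an illustration under my own assumptions (invented records, hypothetical function names `refine` and `facet_counts`), not the authors’ implementation: every record carries work-level and item-level attributes side by side, so any of them can serve as the entry point.

```python
from collections import Counter

# Invented records; each mixes work-level (genre) and
# item-level (location) attributes, as facets would expose them.
RECORDS = [
    {"title": "Film A", "genre": "Drama", "language": "English", "location": "Main Library"},
    {"title": "Film A", "genre": "Drama", "language": "German", "location": "Branch Library"},
    {"title": "Film B", "genre": "Comedy", "language": "English", "location": "Main Library"},
    {"title": "Film C", "genre": "Drama", "language": "English", "location": "Branch Library"},
]


def refine(records, **selected):
    """Keep only the records that match every selected facet value."""
    return [r for r in records
            if all(r.get(facet) == value for facet, value in selected.items())]


def facet_counts(records, facet):
    """Count how many records carry each value of one facet."""
    return Counter(r[facet] for r in records)


if __name__ == "__main__":
    # Start at the item level (location), then look at work-level genres.
    at_main = refine(RECORDS, location="Main Library")
    print(facet_counts(at_main, "genre"))
    # Or combine a work-level and an expression-level attribute directly.
    print([r["title"] for r in refine(RECORDS, genre="Drama", language="English")])
```

No hierarchy has to be walked here: the order in which the facets are applied is entirely up to the user.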

Facets and FRBR

In one of his recent posts, James Weinheimer stated that the possibilities to limit and sort results by facets provided by discovery layers “fulfill the ‘FRBR user tasks’ right now, and even overfulfill[s] them.” I hadn’t yet seen facets as embodiments of FRBR (or at least I hadn’t spelled it out so clearly). But it turns out you can assign the attributes associated with each FRBR group or entity to one (or more) facets. So I figured it’s worth visualizing this with a concrete example for myself as well as for others (to whom this may have been obvious already …). Let’s not argue about whether form/genre belongs to manifestation or expression; all that matters is whether whatever tool or construct we offer will help users find (I take “find” in the broadest sense of the word, i.e. to include identify, obtain etc.) what they are looking for, not so much how we librarians conceptualize or name it.

Since the example I chose is from the Austrian union catalog, which serves a consortium of academic and administrative libraries, facets pointing to specific libraries or locations (to items in FRBR speak) are included. However, grouping is not done very elegantly – more effective grouping would sort the number of hits in a clearer way. A rather explicit way of grouping together expressions and listing all manifestations under the respective expressions is shown in slides 27 and 28 of Thomas Brenndorfer’s presentation “The FRBR-RDA Puzzle: Putting the Pieces Together” (unfortunately, the catalog he refers to is no longer available at that address, so I wasn’t able to replicate his screenshots). This way of hierarchical grouping makes it easier for the user to see whether a library has, say, different language editions, talking book or film versions of a given work.

A library mashup: Wikipedia and the catalog

Recently there was quite a bit of talk about Wikipedia as an authority file, which then drifted to the topic of linking Wikipedia and library metadata. Additionally, Ed Summers pointed to Jakob Voss’ 2005 article “Metadata with Personendaten and beyond”, which explains in some detail how authority data (control numbers) from the German National Library are added to Wikipedia articles. Wikipedia user APPER created several very useful tools for this task. So, given that the German-speaking community has these tools, I thought I’d take a look at how they are implemented in library union catalogs (these examples don’t claim to be complete).

The German National Library authority record provides a link (but not in the display for an individual title):

SWB union catalog shows a link to the Wikipedia article on the title level (as well as in the authority record):

The Austrian union catalog (into which I currently catalog) integrates APPER’s control number look-up script into the title view with a preview of the Wikipedia article (Primo discovery system):

You decide which is the most elegant solution ;)… It would be great if library subject authority data could be matched with the corresponding Wikipedia articles in the same way and the same enrichment could take place, because Wikipedia’s terms are often closer to real-world expressions than the sometimes convoluted subject authorities. It seems that efforts to that effect are already underway:

Curating content and form

Reading Cindy Romaine’s article “The Consumer Electronics Show – insights for SLA” on SLA’s Future Ready 365, the phrase that stands out for me is: “Data devices, or form factors, were very elegant and restrained. It seemed that there was an effort not to overwhelm the consumer with technical options, but to simplify and curate”.

Through collection management and selection, librarians curate content for their patrons. But just as a museum curator not only selects artworks for an exhibition but also takes care of showcasing them (by painting the wall where a painting is hung, for example), so librarians should not only focus on curating content but also curate form.

In my view, the presentation of our well-curated content should be as “elegant and restrained” in design as the devices Cindy talks about. No doubt our discovery systems offer a wealth of technical options (navigating, faceting, word/tag cloud etc.), but librarians should curate these options and where possible simplify so as not to try to do too much and overwhelm the users (who might just – unconsciously – shy away from a library catalog they don’t understand as intuitively as their electronic devices).

The simplicity and functionality of handhelds, cell phones or tablets shape user experience just as much as the web sites they visit, so aspects like these have to be factored in when thinking about catalog interfaces, and curation is as important for form as for content.

Consistency and identity management

Consistency is a strange thing. We are in dire need of it to give computers something reliable to work with, yet we are unlikely to achieve the necessary level of consistency in our data, for various reasons. First, we are human, and inconsistency can be said to be part of human nature; second, different catalogers enter data into the same pool, and they don’t all do things exactly the same way. We can (and as catalogers, should) strive for as much consistency as possible in our own work, but factors such as the ones just mentioned get in the way.

Current ILS match strings for indexing, so it’s hard for them to tell whether “Oxford UP” and “Oxford Univ. Pr.” and “Oxford University Press” (I’ll spare you other ways to write this – which exist!) are the same or not. Users wanting to browse titles of a certain publisher are left to click through lists of variant names (typos and such included…). Or even worse, failing such an index, they have to search for all kinds of variations.

Why not cluster / merge these under one term? The technical possibilities are there (the freely available Google Refine, or topic maps, for that matter), and I’m sure this could be implemented in library systems. A simple list of values to choose from while cataloging would be another, although limited, option. Here software can help straighten out human errors or inconsistencies (which, let’s face it, will continue to exist), and users will benefit from a more time-saving and useful display. Identity management, anyone?
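Just to show how little it takes, here is a minimal sketch of such clustering, loosely modelled on the key-collision idea that tools like Google Refine offer. The abbreviation table and the function names `fingerprint` and `cluster` are my own assumptions, and a real table would need careful curation.

```python
import re
from collections import defaultdict

# Hypothetical abbreviation table; a real one would need curation.
ABBREVIATIONS = {"up": "university press", "univ": "university", "pr": "press"}


def fingerprint(name):
    """Reduce a publisher string to a comparison key.

    Lowercase, drop punctuation, expand known abbreviations,
    then sort and de-duplicate the tokens.
    """
    tokens = re.findall(r"\w+", name.lower())
    expanded = []
    for token in tokens:
        expanded.extend(ABBREVIATIONS.get(token, token).split())
    return " ".join(sorted(set(expanded)))


def cluster(names):
    """Group the name variants that share the same fingerprint."""
    clusters = defaultdict(list)
    for name in names:
        clusters[fingerprint(name)].append(name)
    return dict(clusters)


if __name__ == "__main__":
    variants = ["Oxford UP", "Oxford Univ. Pr.", "Oxford University Press", "Cambridge UP"]
    for key, group in cluster(variants).items():
        print(key, "->", group)
```

This only catches variants that differ in case, punctuation, word order or listed abbreviations; typos would need an additional fuzzy-matching step, and a cataloger should still confirm each proposed merge.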