Monthly Archives: August 2011

The Getty Search Gateway

The J. Paul Getty Trust, consisting of the J. Paul Getty Museum, the Getty Research Institute, the Getty Conservation Institute and the Getty Foundation, has recently launched an exciting new portal, the Getty Search Gateway (see also the press release). It allows you to search and browse the collection database, library catalog, collection inventories and archival finding aids as well as digital collections simultaneously, and to filter results using facets. It caught my attention especially because of its similarities to library discovery layers in providing a convenient way to search across collections for a variety of resource formats. Mike Clardy, Assistant Director, Information Systems / Information Technology Services at the Getty, who wrote a blog post to introduce the new research tool, and Joe Shubitowski, Head, Library Information Systems, were kind enough to answer my questions and to share some details about the development and underlying structure, which I’ll paraphrase here.

As you may have guessed, the search gateway was built using the Solr / Lucene search engine. The objective was to bring together a number of sources and formats under one umbrella, which is why the schema definition had to be flexible enough to support the wide variety of contributing sources. In fact, as I learned reading up on the Solr schema, Solr offers ways to create fields dynamically without each one being pre-defined or explicitly named. With <dynamicField> declarations, you can create rules, matched against a wildcard pattern in the field name, that tell the application what to do with certain fields, what data type to use, etc. Generally in Solr, fields are strongly typed, i.e. every field in the schema is defined to be of a certain type with specifications about its intended use.
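To make the idea of dynamic fields a bit more tangible, here is a minimal Python sketch of how wildcard rules could resolve a field name to a type. It is an illustration only, not Solr's code and not the Getty's schema: the rule table, the explicit fields and the type name "string_stored_only" are all assumptions, and preferring the longest matching pattern is a simplification of how Solr orders dynamic field rules.

```python
from typing import Optional

# Hypothetical rules in the spirit of <dynamicField> declarations:
# a wildcard pattern on the field name mapped to a field type name.
DYNAMIC_RULES = {
    "*_s": "string",
    "*_t": "text_general",
    "*_display_s": "string_stored_only",  # invented type name, for illustration
}

# Hypothetical explicitly declared fields; these win over any dynamic rule.
EXPLICIT_FIELDS = {"id": "string", "title": "text_general"}


def matches(pattern: str, name: str) -> bool:
    """Match a pattern whose wildcard sits at the start or the end."""
    if pattern.startswith("*"):
        return name.endswith(pattern[1:])
    if pattern.endswith("*"):
        return name.startswith(pattern[:-1])
    return pattern == name


def resolve_type(name: str) -> Optional[str]:
    """Return the field type for a field name, or None if nothing matches."""
    if name in EXPLICIT_FIELDS:
        return EXPLICIT_FIELDS[name]
    candidates = [p for p in DYNAMIC_RULES if matches(p, name)]
    if not candidates:
        return None
    # Assumption: the longest (most specific) pattern is preferred.
    return DYNAMIC_RULES[max(candidates, key=len)]
```

With these made-up rules, a contributor could send a field called "artist_s" or "medium_display_s" without anyone having declared those names in advance; only the suffix decides how the field is treated.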

In the case of the Getty Search Gateway, this makes it possible for every source contributor to decide what fields to include in the index, what fields to display (and in which order) and how to label them. More specifically, the Solr schema developed by the Getty staff contains very few required fields, very few mapped fields that all data sources have to map to, and dynamic fields that any source can use to index and display its holdings. A single field may get copied into several different Solr fields, with different field options for searching, sorting, faceting or display, for example. This approach to aggregating museum and library data provides some major facets to pivot on, but also gives each data contributor the freedom to export, index and display the data elements it deems most important. For every data source, custom XSL transformations were written.
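The layering described above (a few required fields, a few shared mapped fields, and source-specific dynamic fields, with one value possibly copied into several index fields) can be sketched roughly as follows. The Getty used XSL transformations for this step; the Python below is a stand-in for illustration, and every field name, source name and configuration key in it is an assumption rather than part of the actual schema.

```python
# Hypothetical per-source configuration. "mapped" points source fields at the
# few shared fields; "dynamic" copies one source field into several
# suffix-typed fields (e.g. one for searching, one for faceting).
SOURCE_CONFIG = {
    "museum": {
        "mapped": {"object_title": "title", "maker": "creator"},
        "dynamic": {"medium": ["medium_t", "medium_facet_s"]},
    },
    "library": {
        "mapped": {"title_statement": "title", "author": "creator"},
        "dynamic": {"call_number": ["call_number_display_s"]},
    },
}


def build_index_doc(source: str, record: dict) -> dict:
    """Turn one source record into a flat document ready for indexing."""
    config = SOURCE_CONFIG[source]
    # Required fields: every document carries an id and its source.
    doc = {"id": record["id"], "source": source}
    # Mapped fields: shared across all sources, so they can serve as facets.
    for src_field, target in config["mapped"].items():
        if src_field in record:
            doc[target] = record[src_field]
    # Dynamic fields: each source decides what else to index and display.
    for src_field, targets in config["dynamic"].items():
        if src_field in record:
            for target in targets:
                doc[target] = record[src_field]
    return doc
```

For example, build_index_doc("museum", {"id": "m1", "object_title": "Untitled", "medium": "oil on canvas"}) yields a document in which the medium appears twice, once in a searchable text field and once in a facet field, while the library source is free to contribute a completely different set of extra fields.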

The possibility for each source to specify its own options is very powerful and has great potential for other applications. The Solr schema is cleverly exploited in the design of this implementation. I wasn’t previously aware of these possibilities in Solr and really appreciate the chance to understand its inner workings a bit better.

Rethinking facets and FRBR

It was with great interest that I read the paper (PDF) “FRBR and Facets Provide Flexible, Work-Centric Access to Items in Library Collections” (2011) by Kelley McGrath, Bill Kules and Chris Fitzpatrick (mentioned on NGC4Lib) because it modified and enriched my understanding of the relationship between facets and FRBR and the way facets help meet the users’ information needs. Sure, facets are there to help users refine their search and pull out a smaller set of results that match certain attributes, but what is the theoretical underpinning and how does the FRBR model relate to facets?

The paper cited above highlights the authors’ experience with modeling and building a search interface including facets for a moving image collection, and while some of their observations are specific to these resource types and the retrieval requirements that go with them, much is generally applicable. The main point for me (as alluded to in the paper’s title) is that facets are much more flexible than hierarchical FRBR structures through which the user would have to navigate: facets allow the user to combine any number of attributes when limiting the results, without clicking through hierarchies of work, expression etc.

What makes the model and the prototype interface so powerful is the fact that FRBR is not slavishly followed but rather adapted to the specific features of the resources, collapsing the work, expression and manifestation entities into two levels, “movie” and “version/publication”. This helps avoid duplication of information, both regarding display and cataloging, and answers the questions “what do you want?” and “how/where do you want it?” (probably the most general questions users bring to the catalog).

Through facets, users are offered several pathways into collections: “Patrons can start their search at any point in the FRBR hierarchy, from Item (location) to Work (genre, date), and easily transition between search and browse strategies, using facets to broaden or narrow their results and pivoting on facet values.” (p. 4) – explorations they cannot as easily undertake in a tree-like FRBR representation.
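The difference from a tree-like navigation can be illustrated with a small Python sketch: every record is flat, so attributes from any FRBR level can be combined in one filter, and the remaining facet values can be counted to offer the next pivot. The records below are invented sample data, and the facet names are assumptions, not the fields used in the paper's prototype.

```python
from collections import Counter

# Invented sample records: each row combines attributes from several FRBR
# levels (movie/work, version/publication, item location).
RECORDS = [
    {"movie": "Movie A", "genre": "drama", "format": "DVD", "language": "English", "location": "Main"},
    {"movie": "Movie A", "genre": "drama", "format": "Blu-ray", "language": "English", "location": "Branch"},
    {"movie": "Movie A", "genre": "drama", "format": "DVD", "language": "French", "location": "Branch"},
    {"movie": "Movie B", "genre": "comedy", "format": "DVD", "language": "English", "location": "Main"},
]


def filter_records(records, **selected):
    """Keep records matching every selected facet value, in any combination."""
    return [r for r in records if all(r.get(k) == v for k, v in selected.items())]


def facet_counts(records, facet):
    """Count the values of one facet within the current result set."""
    return Counter(r[facet] for r in records)
```

A user could start at the item level with filter_records(RECORDS, location="Branch") and then look at facet_counts(..., "movie") to move up to the work level, or start with a genre and narrow down by format and language, without ever walking a work, expression, manifestation hierarchy.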

Facets and FRBR

In one of his recent posts, James Weinheimer stated that the possibilities to limit and sort results by facets provided by discovery layers “fulfill the “FRBR user tasks” right now, and even overfulfill[s] them.” I hadn’t yet seen facets as embodiments of FRBR (or at least I hadn’t spelled it out so clearly). But it turns out you can ascribe attributes associated with each FRBR group or entity to one (or more) facets. So I figured it’s worth visualizing this with a concrete example for myself as well as for others (to whom this may have been obvious already …). Let’s not argue about whether form/genre belongs to manifestation or expression; all that matters is whether whatever tool or construct we offer will help users find what they are looking for (I take “find” in the broadest sense of the word, i.e. to include identify, obtain etc.), not so much how we librarians conceptualize or name it.

Since the example I chose is from the Austrian union catalog (ETA: http://search.obvsg.at/OBV), which serves a consortium of academic and administrative libraries, facets pointing to specific libraries or locations (to items in FRBR speak) are included. However, grouping is not done very elegantly; more effective grouping would sort the number of hits in a clearer way. A rather explicit way of grouping expressions and listing all manifestations under the respective expressions is shown in slides 27 and 28 of Thomas Brenndorfer’s presentation “The FRBR-RDA Puzzle: Putting the Pieces Together” (unfortunately, the catalog he refers to is no longer available at that address, so I wasn’t able to replicate his screen shots). This way of hierarchical grouping makes it easier for the user to see whether a library has, say, different language editions, talking book or film versions of a given work.
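The kind of grouping described here, with manifestations listed under their expressions, could look roughly like the following Python sketch. It is not a reconstruction of the catalog shown in the slides; the holdings are invented placeholders, and the field names are assumptions chosen for the example.

```python
from collections import defaultdict

# Invented placeholder holdings for a single work.
HOLDINGS = [
    {"work": "Work X", "expression": "German text", "manifestation": "Publisher A, 1995"},
    {"work": "Work X", "expression": "German text", "manifestation": "Publisher B, 2003"},
    {"work": "Work X", "expression": "English translation", "manifestation": "Publisher C, 2001"},
    {"work": "Work X", "expression": "Talking book (German)", "manifestation": "Publisher D, 2008"},
]


def group_by_expression(holdings):
    """Collect the manifestations of a work under their expressions."""
    grouped = defaultdict(list)
    for holding in holdings:
        grouped[holding["expression"]].append(holding["manifestation"])
    return dict(grouped)


def render(grouped):
    """Render one line per expression (with a hit count), manifestations indented."""
    lines = []
    for expression, manifestations in grouped.items():
        lines.append(f"{expression} ({len(manifestations)})")
        lines.extend(f"  {m}" for m in manifestations)
    return "\n".join(lines)
```

Printing render(group_by_expression(HOLDINGS)) gives one heading per expression with its hit count, so a user can tell at a glance that both a translation and a talking book version exist.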