From union catalog to fusion catalog

The conference “Academic Librarian 3: The Yin-Yang of Future Consortial Collaboration and Competition” was held in Hong Kong at the end of last month. The presentations are now available, and I would like to draw your attention to one about cataloging: “From union catalogue to fusion catalogue: how collaborative cataloguing might be initiated and implemented in the Hong Kong context” (PDF). Because of electronic resources and the vendor records that accompany them, the union catalog, with its relatively uniform application of rules and standards, is transformed into a “fusion catalog” with different cataloging rules and varying levels of detail. This observation resonates with what I’m dealing with at work right now, namely the integration of thousands of e-book records for an evidence-based selection model set up by one of the big university libraries we serve. The data comes from OCLC in MARC (created with a different set of cataloging rules) and is converted into the German/Austrian format MAB and then into the Aleph Sequential Format so that it can be loaded into our catalog. These are not the “prettiest” records, but this is an efficient way of offering users a large amount of content quickly. Another project that has brought the Austrian union catalog closer to a “fusion catalog” is the big digitization undertaking by the Austrian National Library, “Austrian Books Online”, in which not only books are scanned but also catalog cards, which are then OCRed, automatically transformed into bibliographic records and batch-loaded into the catalog database.

So does this new “fusion catalog”, with its blend of standards, formats, rules and levels of detail, affect the user at all? Or is it all hidden under the discovery layer anyway? Do we still need the high level of consistency of the union catalog, and can we maintain it? The conference presentation summarizes some of the lessons learned during the transition from union to fusion catalog, a transition that is sometimes imperceptible to everyone but catalogers:

“Past:

  • Following uniform cataloguing practices
  • Preferring a high level of consistency in bibliographic records

Present:

  • Bring in vendor records applying different cataloguing rules and various level of completeness
  • Accepting that ‘a minimal record [is more] beneficial to library users than no record at all’

Variations are inevitable

  • The ideal: Conform fully to one single cataloguing standard and to local conventions
  • In reality: different cataloguing data sets are blended together
  • Direct and immediate access to the needed library materials is more important to users than standard cataloguing records

[...]

  • When variations are accepted and catalogers are open to accepting differences in cataloguing practices”

With RDA on the horizon, legacy data and new data will sit side by side, as will data created under different RDA policy decisions on alternatives and options and under different catalogers’ judgments. If consortial and/or global shared cataloging is to continue, we will finally have to say goodbye to our rather closed world-view and come to terms with a non-uniform, blended mixture of bibliographic information.

Cataloging as a function and its atomization

Talking about the consequences of self-publishing (by individuals and increasingly by entities like Provincetown Public Library) for the traditional publishing industry, Mike Shatzkin says: “Publishing will become a function of many entities, not a capability reserved to a few insiders who can call themselves an industry.” I wonder if this doesn’t apply to cataloging as well. Libraries used to have a monopoly on cataloging, but they are increasingly losing this status and find themselves relying on third-party records. Cataloging and metadata have become ubiquitous and are no longer reserved for those with the arcane knowledge (on LibraryThing anyone can catalog with a simple web interface), but the library world still tends to think that we own and can prescribe the “perfect” bibliographic description (which, after all, is part of our identity and of how we define ourselves as an “industry”). Another quote from Shatzkin’s article has parallels to cataloging and the library field: “This is the atomization of publishing, the dispersal of publishing decisions and the origination of published material from far and wide. In a pretty short time, we will see an industry with a completely different profile than it has had for the past couple of hundred years.”

Remixing and forking books

Library practices of bibliographic description have so far taken the stability of the book for granted. In the future, we might have to describe versioning, forking and remixing. The article “Forking the book” argues that dynamic content will become possible. As an example, it highlights a tool that lets you edit EPUB with Git as a backend: “[W]ith this demo we are using GIT with a book so you can clone, edit, fork and merge the book into infinite versions.” There is already a platform for remixing books, BookRiff, which has not yet gained wide acceptance but which is slated to enable the kind of forking the article talks about.

Data modeling has to be aware of developments in the creation of the objects it primarily describes and makes discoverable. Borrowing expressions from the print paradigm, the forked book is comparable to a kind of “bound with”, multi-work constellation, but more complicated since only parts of works might be used, different versions might be created and licensing information would have to be noted. I guess Bibframe will be able to accommodate these versions and remixes, but that would mean that the statement in the November Bibframe report, “Each BIBFRAME Instance is an instance of one and only one BIBFRAME Work”, will not hold, because, as I see it, the instance (the remixed/forked book) would be in a relationship with two or more works.
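To make the modeling question concrete, here is a minimal sketch of a remixed book as an instance linked to more than one work. It is not BIBFRAME vocabulary or anyone's implementation: the class names, the `WorkUse` helper, its fields and the sample titles are all hypothetical, chosen only to show where the portion used, the version and the license could be recorded.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class Work:
    """A creative work, loosely in the BIBFRAME sense (illustrative only)."""
    work_id: str
    title: str


@dataclass(frozen=True)
class WorkUse:
    """Hypothetical helper: how one instance draws on one work."""
    work: Work
    portion: str   # which part is reused, e.g. "whole" or "chapters 2-4"
    version: str   # which version was taken, e.g. a revision label
    license: str   # licensing statement for the reused part


@dataclass
class Instance:
    """A remixed or forked book that may relate to several works."""
    instance_id: str
    title: str
    uses: List[WorkUse] = field(default_factory=list)

    def works(self) -> List[Work]:
        # Collect each distinct work once, in order of first use.
        seen = {}
        for use in self.uses:
            seen.setdefault(use.work.work_id, use.work)
        return list(seen.values())


# Made-up sample data: one remix built from parts of two works.
guide = Work("w1", "A Field Guide")
essays = Work("w2", "Collected Essays")
remix = Instance("i1", "Remixed Reader", [
    WorkUse(guide, "chapters 2-4", "rev-3", "CC BY-SA 4.0"),
    WorkUse(guide, "appendix", "rev-3", "CC BY-SA 4.0"),
    WorkUse(essays, "whole", "rev-1", "CC BY 4.0"),
])
```

Putting the details on the link between instance and work, rather than on the instance itself, is one way to handle the case where only parts of a work are used; whether BIBFRAME would express it this way is an open question.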

BIBFRAME datastore

The current issue of the Code4Lib Journal contains an article by Jeremy Nelson of Colorado College, “Building a Library App Portfolio with Redis and Django”, that highlights the development of FRBR datastores running on a NoSQL database server (Redis). More interestingly, perhaps, this platform is based on the BIBFRAME model with its four core classes: creative work, instance, annotation and authority (for further details, see the project site on GitHub). The project maps FRBR Work and Expression to the BIBFRAME creative work, and FRBR Manifestation and Item to the BIBFRAME instance. To me, such a two-level mapping makes a lot of sense. In fact, I quite like the reduction of the FRBR complexity in the BIBFRAME model, especially with the anticipated re-use by other communities in mind. Jeremy Nelson explains on the BIBFRAME mailing list:

“Because of Redis’s flexibility, I’ve been able to use RDA element names as either discrete properties for each BIBFRAME entity or as part of the naming scheme for the BIBFRAME entity’s associated keys. A nice feature of this approach is that we are not restricted to just RDA but we can use other metadata standards (MODS, DC, ONIX, VRACore, etc.) as discrete properties or as part of the Redis key naming schema for the BIBFRAME entities. We are also using a simplistic mapping of FRBR Work and Expressions to BIBFRAME Creative Work, along with FRBR Manifestation and Item to the BIBFRAME Instance…”
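The two options mentioned in the quote, element names as discrete properties or as part of the key name, might look roughly like the sketch below. This is my own illustration, not code from the project: the `bibframe:` key layout, the function names and the sample values are assumptions, and a plain Python dict stands in for Redis so that the example runs without a server.

```python
# Simplified FRBR-to-BIBFRAME mapping as described in the quote.
FRBR_TO_BIBFRAME = {
    "Work": "CreativeWork",
    "Expression": "CreativeWork",
    "Manifestation": "Instance",
    "Item": "Instance",
}


def entity_key(frbr_entity: str, entity_id: int) -> str:
    """Build a (hypothetical) key for the BIBFRAME entity."""
    return f"bibframe:{FRBR_TO_BIBFRAME[frbr_entity]}:{entity_id}"


def set_property(store: dict, base_key: str, standard: str,
                 element: str, value: str) -> None:
    """Option 1: the element is a field in the entity's own hash.

    In Redis this would correspond to a hash field set with HSET.
    """
    store.setdefault(base_key, {})[f"{standard}:{element}"] = value


def add_to_keyed_set(store: dict, base_key: str, standard: str,
                     element: str, value: str) -> None:
    """Option 2: the element name becomes part of the key itself.

    In Redis this would correspond to a separate set filled with SADD.
    """
    key = f"{base_key}:{standard}:{element}"
    store.setdefault(key, set()).add(value)


store: dict = {}  # stands in for the Redis server (assumption)

# A FRBR Manifestation is stored as a BIBFRAME Instance.
key = entity_key("Manifestation", 1)
set_property(store, key, "rda", "titleProper", "Forking the book")
# The same entity can also carry an element from another standard.
add_to_keyed_set(store, key, "dc", "subject", "Cataloging")
```

Because the metadata standard is just a prefix in the field or key name, RDA, MODS or DC elements can sit side by side on the same entity, which is the flexibility the quote points to.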