“HathiTrust: A Research Library at Web Scale” by Heather Christenson, published in Library Resources & Technical Services, vol. 55, no. 2, 2011, pp. 93–102, gives an informative and detailed overview of what HathiTrust is and does.
Research libraries have a mission to build collections that will meet the research needs of their user communities over time, to curate these collections to ensure perpetual access, and to facilitate intellectual and physical access to these collections as effectively as possible. Recent mass digitization projects as well as financial pressures and limited space to store print collections have created a new environment and new challenges for large research libraries. This paper will describe one approach to these challenges: HathiTrust, a shared digital repository owned and operated by a partnership of more than forty major libraries.
It is especially the collaborative nature of this effort – in the areas of digital preservation, services, discovery and access, and collection management – that empowers research libraries, allowing them to pool expertise, share resources and save costs.
George Oates has a great post on the Open Library Blog that looks at library data from the perspective of mass processing and the variety of data sources, and argues for a “minimum viable record”. To a certain extent this ties in with what I wrote about last week. How can we simplify metadata and still meet the goals of how a library catalog serves the user? What are the key elements needed to describe a resource in an accurate, easily usable and processable manner? To me, the concepts of “good enough” and “minimum viable record” seem to be related in that they both try to distill the essence out of descriptive library metadata.
Bibliographic data is used by patrons, librarians and machines – for display, discovery, disambiguation, matching, faceting and exchange, to name but a few of the current use cases. For each of these user groups and tasks, different metadata elements are more or less important. “Good enough” and “minimum viability” have to be judged against these uses; it’s hard to come up with an across-the-board definition, although there are certainly areas where human and machine requirements overlap (what could be called the “essence”). Additionally, we always need to bear in mind that we don’t yet know every detail of what future systems will be able to do with our data – so how much simplification is justifiable? I imagine that for FRBR structures you would again need more detail in order to distinguish different expressions or manifestations (whose place of publication may differ, for example).
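To make this task-dependence a bit more concrete, here is a minimal Python sketch. All field names, task profiles and the viability check itself are my own invented examples – not anything Oates or anyone else proposes – but they show why “minimum viable” is relative to a task profile rather than absolute:

```python
# All field names and task profiles below are invented for illustration;
# they are not a proposed standard, just a way to make the point concrete.

MINIMUM_VIABLE_FIELDS = {"title", "creator", "date", "identifier"}

# Different use cases weight different elements: what is "good enough"
# for display is not necessarily good enough for FRBR-style grouping.
TASK_PROFILES = {
    "display":       {"title", "creator", "date"},
    "deduplication": {"title", "creator", "identifier"},
    "faceting":      {"creator", "date", "subject", "language"},
    "frbr_grouping": {"title", "creator", "edition", "place_of_publication"},
}

def good_enough(record: dict, task: str) -> bool:
    """A record is 'good enough' for a task if it carries non-empty
    values for every element that task relies on."""
    return all(record.get(field) for field in TASK_PROFILES[task])

record = {"title": "Moby-Dick", "creator": "Melville, Herman",
          "date": "1851", "identifier": "urn:example:123"}

print(MINIMUM_VIABLE_FIELDS <= record.keys())   # True: minimally viable
print(good_enough(record, "display"))           # True
print(good_enough(record, "frbr_grouping"))     # False: no edition or place
```

The same record passes one test and fails another – which is exactly why an across-the-board definition of “good enough” is so elusive.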
Sticking to the metadata essence could make matters more efficient for both humans and machines, and we might even achieve greater consistency and interoperability. Enjoying more freedom than libraries, the Open Library could be a testbed for the practicality of “minimum viable records”.
“Merging Metadata: Building on Existing Standards to Create a Field Book Registry” by Carolyn Sheffield, Sonoe Nakasone, Ricc Ferrante, Tammy Peters, Rusty Russell, and Anne Van Camp appears in the latest issue of LIBREAS (PDF including footnotes here). Abstract:
The Field Book Project is a cross-disciplinary project to develop an online registry for field books and other primary source materials related to biodiversity research. Led by Rusty Russell and Anne Van Camp, this project is a joint initiative of the Smithsonian Institution Archives and the National Museum of Natural History. This paper presents the metadata structure established for building the Field Book Registry. The project team is committed to involving members of the library, archives, museum, and biodiversity communities in the development of the Field Book Registry. We invite your comments and discussion regarding the work presented here.
Field books (notes collected during biodiversity research) represent a particular kind of literature in that they are mainly unpublished, unique items and may be held in museums, archives or libraries, with correspondingly varying practices of description (if they have been cataloged at all). The authors propose merging data elements and relationships from several metadata schemas (Natural Collections Description (NCD); the Metadata Encoding and Transmission Standard (METS); the Metadata Object Description Schema (MODS); and Encoded Archival Context for Corporate Bodies, Persons, and Families (EAC-CPF)) “to form a unified metadata solution”. It looks like a challenging undertaking, but one that sets out to do justice to the uniqueness of the material.
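To get a feel for what merging elements from several schemas could look like in principle, here is a toy Python crosswalk. The unified field names and the mapping logic are my own invention, and the element paths are simplified – this sketches the general technique, not the Field Book Project’s actual design:

```python
# A toy crosswalk: fold elements from different source schemas into one
# registry record. Element paths are simplified; the target fields and
# mapping are invented for illustration only.

CROSSWALK = {
    # (source schema, source element path) -> unified registry field
    ("MODS", "titleInfo/title"):    "title",
    ("MODS", "name/namePart"):      "creator",
    ("EAC-CPF", "nameEntry/part"):  "creator",
    ("NCD", "collection/name"):     "related_collection",
}

def merge(sources: list[tuple[str, str, str]]) -> dict:
    """Fold (schema, element, value) triples into one unified record.
    Repeatable fields are collected into lists rather than overwritten."""
    unified: dict[str, list[str]] = {}
    for schema, element, value in sources:
        field = CROSSWALK.get((schema, element))
        if field:
            unified.setdefault(field, []).append(value)
    return unified

print(merge([
    ("MODS", "titleInfo/title", "Field notes, Panama, 1911"),
    ("EAC-CPF", "nameEntry/part", "Smith, John"),
]))
# {'title': ['Field notes, Panama, 1911'], 'creator': ['Smith, John']}
```

Even this trivial version hints at the hard parts: deciding which elements are equivalent across schemas, and what to do when two sources disagree.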
A recurring theme running through two recent items I read/listened to (slides by Holly Tomren and an interview with Janet Swan Hill) is the idea of “good enough” metadata. Janet Swan Hill puts it this way: “… we are still undergoing a period of grieving, I think, for the fact that we are learning that we have to put up with good enough.” (19 min. in; the topic is taken up again later in the interview). Indeed, coming to terms with the fact that in many cases “good enough” can and must be enough requires a change of mindset that is not easy for meticulous catalogers to achieve (and I’d count myself among them).
Don’t get me wrong – I’m not suggesting we should do things any which way without caring about quality, but we have to develop a different definition of quality and a different way of measuring it. We will have to figure out which of our standards are worth retaining and which of them *really* contribute to better findability and identification. To be honest, I’m struggling a bit with questions like: What can we accept as “good enough”? Which elements of the bibliographic description can be left alone even though they may not be “perfect”?
As we move towards an ecosystem of ingesting data from different sources (vendors, publishers etc.), and from print-based to increasingly digital collections that enable automated metadata extraction and batch processing, our idea of quality metadata is bound to change, because we simply can’t afford to fiddle with minute details. “Good enough” for me also means gearing cataloging more strictly towards user needs and findability. It doesn’t mean dumbing down library metadata but rather focusing our time, energy and brainpower on the core of our mission and our bibliographic structure (not to mention the cataloger time it frees up for other projects).
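As a hedged sketch of what that batch-processing mindset could look like in code (the source names and cleanup rules here are invented examples, not any library’s actual pipeline):

```python
# A sketch of "good enough" batch ingest: apply cheap, automatable fixes
# and accept imperfect values instead of routing every record to a
# cataloger. Source names and rules are hypothetical.

def normalize(record: dict, source: str) -> dict:
    """Apply automatable cleanup and fill safe defaults."""
    cleaned = {k: (v.strip() if isinstance(v, str) else v)
               for k, v in record.items()}
    cleaned["provenance"] = source          # remember where the data came from
    cleaned.setdefault("language", "und")   # ISO 639-2 'undetermined' is good enough
    return cleaned

batch = [
    ({"title": " Field guide to mosses "}, "vendor_feed"),
    ({"title": "Annual report", "language": "eng"}, "publisher_onix"),
]
records = [normalize(rec, src) for rec, src in batch]
print(records[0]["title"])  # "Field guide to mosses" – whitespace fixed, nothing more
```

The point is that the pipeline fixes what a machine can fix cheaply and leaves the rest alone, so cataloger attention goes where it actually matters.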
So what’s your attitude towards “good enough”? Do you think it will threaten our professional values? Or is it a pragmatic approach that reflects necessary changes in catalogers’ outlook?