Despite recent efforts by libraries to release metadata as linked data, many librarians still perceive library records as monolithic entities. To open library data up to the web and to other communities, though, records should be seen as collections of chunks of data that can be separated out, parsed and modeled. Granted, the way we catalog at the moment makes us hold on to the idea of a “record”, because this is how current systems present metadata to us on both the back end and the front end. However, with a bit of abstraction we can see that a library record is essentially nothing but a set of pieces of data.
In some (unfortunately not all) online catalogs, records have permalinks or identifiers so that they can be referred to on the web, but what about the individual elements that constitute a library record, i.e. the MARC fields? MARC does not offer a great deal of granularity: many pieces of information are crammed into one field, and the specific meaning often lies in indicators and subfields. This discussion on LibraryThing about the “Physical Description” field highlights the fact that MARC data is not really machine-understandable or easy to parse.
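To make the granularity problem concrete, here is a minimal sketch of pulling a “Physical Description” field (MARC 300) apart into separately addressable pieces. It assumes the field arrives in the common `$a … $b … $c` display form; the sample value and the function names are made up for illustration, and real records are far less regular than this.

```python
import re

# Subfield codes of MARC 300 and the piece of data each one carries.
PHYSICAL_DESCRIPTION_SUBFIELDS = {
    "a": "extent",
    "b": "other_physical_details",
    "c": "dimensions",
}


def split_subfields(field_body):
    """Split a '$a ... $b ...' display string into (code, value) pairs."""
    pairs = []
    for match in re.finditer(r"\$([a-z0-9])\s*([^$]*)", field_body):
        pairs.append((match.group(1), match.group(2).strip()))
    return pairs


def strip_trailing_punctuation(value):
    """Drop the ISBD separator (' :', ' ;', ...) left at the end of a subfield."""
    return value.rstrip(" :;,/").strip()


def parse_physical_description(field_body):
    """Return the pieces of a 300 field as a dict keyed by meaning, not by code."""
    pieces = {}
    for code, value in split_subfields(field_body):
        name = PHYSICAL_DESCRIPTION_SUBFIELDS.get(code)
        if name is not None:  # ignore subfields this sketch does not know about
            pieces[name] = strip_trailing_punctuation(value)
    return pieces


# Hypothetical sample value, not taken from a real record.
sample = "$a xii, 345 p. :$b ill. ;$c 24 cm."
print(parse_physical_description(sample))
```

Even after this step, the extent is still a free-text string mixing pagination and a unit, which is exactly the kind of information a machine cannot use without further guessing.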
Looking beyond the awkward representation of metadata in MARC, we can think of it as pieces of data assembled for describing a resource. The RDA vocabularies could be one step towards modeling the elements of library data as separate entities, providing URI identifiers (potential PSIs?) [1]. So basically we are (slowly) moving towards identity management that is understood not only by humans but also by machines, enabling interoperability, sharing and merging.
From a topic maps point of view, one consequence of breaking library records up into pieces of data is that it facilitates the integration of data from heterogeneous sources, making subject identification richer. You can map between different formats, like MARC, Dublin Core, ONIX etc. What in MARC we call “author” has a different subject indicator in other models; in Dublin Core, for example, it is dc:creator (see “Understanding subjects and subject proxies” by Patrick Durusau). Each chunk of information can become a topic of its own. And linking elements of the classic “record” with information from Freebase or DBpedia is one way of weaving a richer web of knowledge (aka linked data). Not to mention the possibilities of mapping, identifying and linking subject headings…
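The mapping idea can be sketched in a few lines. The example below assumes a record has already been reduced to (tag, subfield code, value) triples and turns each mapped chunk into its own statement about the resource. The mapping table is a deliberately tiny, simplified selection of MARC-to-Dublin-Core correspondences, and the record identifier and sample values are invented placeholders.

```python
# A simplified, illustrative mapping from MARC (tag, subfield) to Dublin Core.
MARC_TO_DC = {
    ("100", "a"): "dc:creator",
    ("245", "a"): "dc:title",
    ("260", "b"): "dc:publisher",
    ("260", "c"): "dc:date",
}


def record_to_statements(record_id, fields):
    """Turn (tag, code, value) triples into (subject, property, value) statements.

    Chunks without an entry in MARC_TO_DC are skipped rather than guessed at.
    """
    statements = []
    for tag, code, value in fields:
        prop = MARC_TO_DC.get((tag, code))
        if prop is not None:
            statements.append((record_id, prop, value))
    return statements


# Hypothetical record identifier and field values, for illustration only.
record_id = "http://example.org/record/1"
fields = [
    ("100", "a", "Example, Author"),
    ("245", "a", "An example title"),
    ("260", "c", "1894"),
    ("300", "a", "xii, 345 p."),  # not in the mapping, so it is left out
]

for statement in record_to_statements(record_id, fields):
    print(statement)
```

Once every chunk is a statement of its own, a value such as the year of publication or the author's name can be swapped for, or linked to, an identifier from another source without touching the rest of the “record”.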
Cataloging huge amounts of 19th century material, I often wonder: what if users had a link to the year of publication (e.g. from Wikipedia) that could provide some background information about what happened that year and could assist them in understanding the historical situation and the context a book fits into? Same for place of publication – which state was Sarajevo part of in 1894?
Let me recommend two publications by Karen Coyle that expand on the notion of metadata supplanting traditional “records”: the article “Metadata mix and match” (not for purists) and “Understanding the Semantic Web: Bibliographic Data and Metadata”, especially ch. 2, summarized on the Panlibus blog. Libraries have rich and valuable data to offer; in order for that data to be exposed, sharable and reusable on the web or in other information frameworks, the “record” has to be broken up into its parts, which allows for modeling and subject identification.
[1] According to the XML Topic Maps specification, a published subject indicator is “any resource that has been published in order to provide a positive, unambiguous indication of the identity of a subject for the purpose of facilitating topic map interchange and mergeability.”