At the moment, my main project is to teach myself more about XML and its applications in libraries. I’m doing this in my own time (I don’t need it for work, at least not yet), and I thought I’d jot down some notes about a recent article from the Journal of Library Metadata that I found quite interesting.
“Creating Metadata for Digitized Books: Implementing XML and OAI-PMH in Cataloging Workflow” (DOI) by Myung-Ja Han, University of Illinois Library, traces the way metadata travels between different systems and formats for different purposes (OPAC view, harvesting etc.) and is transformed and enriched along the way. I would have appreciated a diagram of the workflow stages, because I find things easier to visualize and understand when they are presented schematically, but never mind.
First, the author gives an overview of approaches towards print books and their digitized counterparts. Should they get one or two records in the bibliographic system? Ultimately, for various reasons, the two-record option is favored, so carrier wins out over content (two records, one for each carrier, instead of just one record for the content, which is the same in the print and digital versions). Although the one-record option would seem more logical, the main reason for choosing two records, from my point of view, is the FRBRized display and faceting offered in many discovery tools, which filter out digitized books as e-resources.
In the procedure explained in the article, every digitized book gets a MARCXML record, which is created from the MARC record of the printed book. Two persistent identifiers (one for access and one for archiving) are created for each digitized book and included in the MARCXML records served via OAI-PMH. These MARCXML records are the basis for the digitized-book records in the OPAC. They are harvested from the OAI server and transformed with XSLT, which adds format-specific information and fields that enable linking to the printed book record. The XML records are then converted into MARC records for upload to OCLC. Some manual work remains for specific fixed fields that are not amenable to automated processing (such as language or country of publication). Finally, the records are exported into the ILS and appear in the local OPAC.
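To make the enrichment step a bit more concrete for myself, here is a minimal sketch of what "adding an identifier and a link to the print record" could look like on a single MARCXML record. Note that the article does this with XSLT; I am using Python's standard library as a stand-in. The choice of fields (776 for the link to the print record, 856 for the access URL), the sample record, the URL and the function names are all my own assumptions, not taken from the article.

```python
import xml.etree.ElementTree as ET

# Namespace of the MARCXML ("MARC21 slim") schema.
MARC_NS = "http://www.loc.gov/MARC21/slim"
ET.register_namespace("", MARC_NS)


def q(tag):
    """Return the namespace-qualified name of a MARCXML element."""
    return f"{{{MARC_NS}}}{tag}"


def add_datafield(record, tag, ind1, ind2, subfields):
    """Append a datafield with the given (code, value) subfields."""
    field = ET.SubElement(
        record, q("datafield"), {"tag": tag, "ind1": ind1, "ind2": ind2}
    )
    for code, value in subfields:
        subfield = ET.SubElement(field, q("subfield"), {"code": code})
        subfield.text = value
    return field


def enrich_for_digitized(record, access_url, print_control_number):
    """Hypothetical enrichment: link to the print record, add access URL.

    Assumption: 776 (additional physical form entry) carries the link to
    the print record and 856 (electronic location and access) carries the
    persistent access identifier. The article's actual field choices may
    differ.
    """
    add_datafield(record, "776", "0", "8",
                  [("i", "Print version:"), ("w", print_control_number)])
    add_datafield(record, "856", "4", "0", [("u", access_url)])
    return record


# A made-up record standing in for one derived from a print-book record.
SAMPLE = """<record xmlns="http://www.loc.gov/MARC21/slim">
  <leader>00000nam a2200000 a 4500</leader>
  <controlfield tag="001">12345</controlfield>
  <datafield tag="245" ind1="1" ind2="0">
    <subfield code="a">Example title</subfield>
  </datafield>
</record>"""

record = ET.fromstring(SAMPLE)
enrich_for_digitized(record, "https://example.org/id/book-1", "12345")
print(ET.tostring(record, encoding="unicode"))
```

The fixed fields that need manual attention (language, country of publication) are deliberately left alone here, just as in the workflow described above.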
Yes, I’m quite aware of the weaknesses of MARCXML; the fact that MARC is expressed in XML doesn’t really make the format as such any better. And despite the emergence of newer web technologies like XML sitemaps or Atom (or even OAI-ORE) for much the same functions, OAI-PMH is still the method of choice for LAM (libraries, archives and museums) communities. The author remarks: “Since the OAI-PMH is widely used as a resource-sharing and aggregating tool for digitized resources including digitized books and XML technology is used for metadata creation and conversions, the potential for libraries to implement these tools in their traditional cataloging workflows should be examined further.”
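For my own reference, here is roughly what the harvesting side of OAI-PMH amounts to: a plain HTTP GET with a `verb` parameter, and an XML response that may carry a `resumptionToken` for the next batch. This is only a sketch. The base URL, the `marc21` metadata prefix (prefixes are defined by each repository) and the sample response are assumptions of mine, and nothing is actually fetched over the network.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def list_records_url(base_url, metadata_prefix="marc21",
                     set_spec=None, resumption_token=None):
    """Build a ListRecords request URL.

    Per the protocol, resumptionToken is an exclusive argument: when it is
    present, no other argument besides the verb is sent.
    """
    if resumption_token:
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
        if set_spec:
            params["set"] = set_spec
    return f"{base_url}?{urlencode(params)}"


def parse_list_records(xml_text):
    """Return (record identifiers, resumption token or None)."""
    root = ET.fromstring(xml_text)
    identifiers = [
        el.text
        for el in root.findall(".//oai:record/oai:header/oai:identifier", OAI_NS)
    ]
    token_el = root.find(".//oai:resumptionToken", OAI_NS)
    # An absent or empty token means the list is complete.
    token = token_el.text if token_el is not None and token_el.text else None
    return identifiers, token


# A made-up, heavily trimmed response for illustration only.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record><header><identifier>oai:example.org:book-1</identifier></header></record>
    <record><header><identifier>oai:example.org:book-2</identifier></header></record>
    <resumptionToken>page-2</resumptionToken>
  </ListRecords>
</OAI-PMH>"""

identifiers, token = parse_list_records(SAMPLE_RESPONSE)
print(list_records_url("https://example.org/oai"))
print(identifiers, token)
```

A real harvester would loop, requesting the next batch with the returned token until none comes back.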