To make library data more accessible for others to work with, we not only need to split our records up into atomic bits that can be mapped, modeled and identified; we also have to rethink the relationship between the rules and the format. In his article for Code4Lib, “Interpreting MARC”, Jason Thomale cogently discusses explicit vs. implicit structure as one of the reasons he struggled to extract the bibliographic data proper. The implicit structure is a result of the cataloging rules and *not* the data structure – you have to know the rules in order to interpret the 245 field correctly. It’s not enough to simply know what the field stands for. The explicit structure, by contrast, is made up of fields, subfields and indicators; more generally, it is the formal model of the data, the schema.
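To make the distinction concrete, here is a minimal Python sketch. The sample 245 field, the `ISBD_ROLE` table and both function names are invented for illustration and are not taken from Thomale's article. The only cataloging knowledge assumed is the ISBD punctuation convention that " = " introduces a parallel title, " : " other title information, and " / " the statement of responsibility.

```python
# Hypothetical 245 field, written as (subfield code, value) pairs.
field_245 = [
    ("a", "Cataloging rules ="),
    ("b", "Règles de catalogage : a bilingual guide /"),
    ("c", "by A. Author."),
]

# ISBD punctuation at the END of one subfield says what the NEXT element is.
# This knowledge lives in the cataloging rules, not in the MARC format.
ISBD_ROLE = {
    "=": "parallel_title",
    ":": "other_title_info",
    "/": "statement_of_responsibility",
}


def explicit_view(field):
    """What the format alone tells us: subfield codes mapped to strings.

    A plain dict is enough here because $a, $b and $c are not repeatable in 245.
    """
    return {code: value for code, value in field}


def role_of_subfield_b(field):
    """What $b actually holds, inferred from the punctuation that precedes it."""
    preceding = ""
    for code, value in field:
        if code == "b":
            stripped = preceding.rstrip()
            mark = stripped[-1] if stripped else ""
            return ISBD_ROLE.get(mark, "unknown")
        preceding = value
    return None  # the field has no $b at all


print(explicit_view(field_245)["b"])
print(role_of_subfield_b(field_245))
```

The explicit view only says that `$b` is the "remainder of title". Whether that remainder starts with a subtitle or a parallel title can only be recovered by someone (or some code) that knows the punctuation rules, which is exactly the implicit structure described above.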
The fact that in MARC the explicit and implicit structures are so intertwined as to be almost inseparable is at the heart of the difficulty non-catalogers have in interpreting our data and using it for machine processing. Because the rules creep into the format, programmers and others willing to reuse our data struggle to make sense of it, even when it’s encoded in XML.
… [T]he more structured a data record is, the more explicit the semantics tend to be. Meaning is clear and encapsulated—the overall context in which data appears within a record is irrelevant because, apart from what might be specified in the data model, context carries no semantic meaning. (Jason Thomale, “Interpreting MARC”)
We shouldn’t let the rules interfere with the definition of metadata elements, and we have to get the semantics of these elements across as unambiguously as possible. This is only achievable by keeping content (i.e. rules; AACR, RDA…) and structure (i.e. format; MARC, RDA vocabularies, Dublin Core…) apart. As soon as the former obscures the latter, we’re headed for trouble.