To make library data more accessible for others to work with, we not only need to split our records up into atomic bits that can be mapped, modeled and identified; we also have to rethink the relationship between the rules and the format. In his article for Code4Lib, “Interpreting MARC”, Jason Thomale cogently describes explicit vs. implicit structure as one of the reasons he struggled to extract the bibliographic data proper. The implicit structure is a result of the cataloging rules and *not* the data structure: you have to know the rules in order to interpret the 245 field correctly. It’s not enough to simply know what the field stands for. The explicit structure is made up of fields, subfields and indicators; more generally, it is the formal model of a data structure, a schema.
The fact that in MARC the explicit and implicit structures are so intertwined as to be almost inseparable is at the heart of the difficulty non-catalogers have in interpreting our data and using it for machine processing. The rules creeping into the format make it hard for programmers and others willing to reuse our data to make sense of it, even if it’s encoded in XML.
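To illustrate the point, here is a minimal Python sketch of the two ways of reading a 245 field. The record is invented, and the function names and role labels are my own assumptions rather than anything taken from Thomale's article or from a MARC library. It relies only on the familiar ISBD convention that " :" introduces other title information, " =" a parallel title and " /" the statement of responsibility.

```python
# Hypothetical 245 fields, modeled as (subfield code, value) pairs.
FIELD_245 = [
    ("a", "Interpreting data :"),
    ("b", "a practical guide /"),
    ("c", "by A. Cataloger."),
]
FIELD_245_PARALLEL = [
    ("a", "Daten verstehen ="),
    ("b", "Understanding data /"),
    ("c", "by A. Cataloger."),
]

# ISBD punctuation at the END of one subfield announces the role of the NEXT one.
ROLE_AFTER = {
    ":": "other_title_information",
    "=": "parallel_title",
    "/": "statement_of_responsibility",
}


def explicit_reading(subfields):
    """What the format alone tells a program: subfield codes and raw strings."""
    return dict(subfields)


def rule_aware_reading(subfields):
    """What a cataloger sees: roles inferred from the trailing punctuation."""
    result = []
    role = "title_proper"
    for _code, value in subfields:
        text = value.strip()
        mark = text[-1] if text and text[-1] in ROLE_AFTER else None
        if mark:
            text = text[:-1].rstrip()  # drop the punctuation from the value
        result.append((role, text))
        if mark:
            role = ROLE_AFTER[mark]
    return result


print(explicit_reading(FIELD_245))
print(rule_aware_reading(FIELD_245))
print(rule_aware_reading(FIELD_245_PARALLEL))
```

In both records the subfield code is "b", yet in one it holds a subtitle and in the other a parallel title. Only the punctuation prescribed by the rules tells them apart, and a real title that happens to end in one of these characters would fool this simple sketch.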
… [T]he more structured a data record is, the more explicit the semantics tend to be. Meaning is clear and encapsulated—the overall context in which data appears within a record is irrelevant because, apart from what might be specified in the data model, context carries no semantic meaning. (Jason Thomale, “Interpreting MARC”)
We shouldn’t let the rules interfere with the definition of metadata elements, and we have to get the semantics of these elements across as unambiguously as possible. This is only achievable by keeping content (i.e. rules; AACR, RDA…) and structure (i.e. format; MARC, RDA vocabularies, Dublin Core…) apart. As soon as the former obscures the latter, we’re headed for trouble.
Well said, thanks.
While I agree in general, I’d like to contradict your conclusion: we do not need to better separate rules and structure; we need to better connect them. You can do so, for instance, by adding formal schemas such as XML Schema and OWL. Even simple checks with regular expressions would help to increase the quality and usability of our data. Surely these schemas will only cover parts of the complex rules of cataloging, but at least there would be something to build on. You wrote “We shouldn’t let the rules interfere with the definition of metadata elements, and we have to get the semantics of these elements across as unambiguously as possible”, but I would say, coming from the other direction, that we need rules that define our metadata elements and provide the semantics of these elements.
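As a rough idea of what such simple checks could look like, here is a minimal Python sketch. The element names and patterns are assumptions made up for illustration, not part of any existing schema, and they only test the shape of a value, not whether it is correct.

```python
import re

# Hypothetical element names mapped to the pattern a value must match in full.
CHECKS = {
    "publication_year": re.compile(r"\d{4}"),
    "isbn13": re.compile(r"97[89]\d{10}"),  # shape only, no checksum test
    "language": re.compile(r"[a-z]{3}"),    # three-letter code, e.g. "eng"
}


def failed_checks(record):
    """Return the names of all elements whose value violates its pattern."""
    return sorted(
        name
        for name, pattern in CHECKS.items()
        if name in record and not pattern.fullmatch(record[name])
    )


record = {
    "publication_year": "c1999",  # a copyright date entered as transcribed
    "isbn13": "9780306406157",
    "language": "eng",
}
print(failed_checks(record))
```

A check like this does not replace the cataloging rules, but it writes a small part of them down in a form that a machine can apply.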
P.S.: The relation between rules and format in data creation and data formats is very interesting indeed. If you have some references to library and information science literature that discusses this relationship and distinction, please let me know! Maybe the difference lies where “semantics” comes into play. But both formats and rules provide some structure that is annotated with additional hints to reveal the meaning of the structure. I’d say traditionally formats have more structure and fewer annotations, while rules have more annotations and less structure. Maybe the “subject indicators” from Topic Maps that you wrote about act as such annotations. I am not familiar enough with Topic Maps to judge.
In the last sentence of your first comment I sense a disconnect in what each of us means by “rules”: rules to define metadata elements, or rules for the content, i.e. for data entry? Subject indicators are one way of providing definitions and semantics for certain concepts, terms or data elements.
I think the main questions are: will the format be independent from the rules? What happens when the cataloging rules are modified? If the format is dictated by the rules, it will have to be adapted, too. Will the schema be clear and understandable without the whole backdrop of rules for entering the actual data? After all, cataloging rules are instructions for making decisions about describing the item in hand, and as such should not affect the format in which this description is represented.
I too would like to read more about the relationship between rules and formats because these are issues that have to be taken into account when devising new standards.