The Smithsonian Institution apparently has a long history of crowd-sourcing. David Alan Grier reports in his podcast “The Confident and the Curious” that in the 1850s, the original weather observers collected data for the US Navy. Four times a day, the volunteers sent the data they had gathered with scientific instruments to the Central Weather Office, located in the Smithsonian Institution in Washington, D.C. To this day, the Smithsonian uses crowd-sourcing to make its vast collections more accessible. Through Flickr, a research fellow at the National Zoo gets help from the public in cataloging photographs from wildlife locations.
With the rise of user participation on the web, traditional institutions can no longer claim to have an authoritative view on any given subject. Projects like Linux, Firefox, Wikipedia, OpenLibrary, and LibraryThing, as well as changes in professions like journalism, testify to this fundamental change. Incidentally, OpenLibrary is considering putting out a call for volunteers to help correct bad OCR by transcribing old handwriting.
So what is at the core of crowd-sourcing? People have to be willing to share what they know for a project they perceive as furthering the common good. The mixture of points of view and experience provides a more diverse outlook on the project or topic at hand. Crowd-sourcing entails relinquishing a bit of control, which might be a big step (both psychologically and politically) for some institutions. Could crowd-sourcing be applied to library cataloging too? Libraries could involve experts in certain fields for help with cataloging specific collections that have not yet been tackled, for various reasons. This is but one example of how libraries could open themselves to the “wisdom of crowds”.
“Bridging end users’ terms and AGROVOC Concept Server Vocabularies” is a poster by Ahsan Morshed, Gudrun Johannsen, Johannes Keizer and Marcia Lei Zeng presented at this year’s International Conference on Dublin Core and Metadata Applications.
AGROVOC, the Food and Agriculture Organization’s multilingual thesaurus, consists of terms collected by literary or institutional warrant. Since this approach excludes the users’ terms for concepts (which turn out to be different in many cases), synonym rings are used in information retrieval to map users’ search terms to the controlled vocabulary of the thesaurus. Yet another example of the importance of user contributions and having users identify their subjects with their own vocabulary, in a multilingual environment to boot. Another question that comes to mind is: should we say goodbye to “preferred” terms? Preferred in whose view? Maybe we could even introduce a scope for “preferred” – preferred by researchers, by the interested public…
The knowledge organization system in use at FAO, aptly called AGROVOC Concept Server, has a number of topic maps features (like mapping between vocabularies and expression of relationships), which are notable for their concept-centric – or should we say subject-centric – approach. This approach and the use of synonym rings also require a user interface that makes it clear which terms are considered synonyms when some of the results don’t include the user’s actual search term.
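To make the synonym-ring idea concrete, here is a minimal sketch in Python. The terms and concepts are invented examples, not actual AGROVOC data, and the structure is only one plausible way to model a ring; the point is that the lookup can report which term actually matched, so the interface can explain why results use a different term than the one searched.

```python
# A minimal sketch of a synonym ring, with invented example terms.
# Each ring groups the terms users might type with the concept the
# thesaurus uses for retrieval.
SYNONYM_RINGS = [
    {"concept": "maize", "terms": {"maize", "corn", "sweetcorn"}},
    {"concept": "aubergine", "terms": {"aubergine", "eggplant", "brinjal"}},
]

def expand_query(user_term):
    """Map a user's search term to a concept and its synonyms."""
    term = user_term.lower()
    for ring in SYNONYM_RINGS:
        if term in ring["terms"]:
            return {
                "concept": ring["concept"],
                "searched_as": sorted(ring["terms"]),
                "matched_on": term,
            }
    # Unknown term: fall back to searching it literally.
    return {"concept": None, "searched_as": [term], "matched_on": term}

result = expand_query("corn")
# "matched_on" vs. "concept" lets the UI explain why results indexed
# under "maize" appear in a search for "corn".
```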
To make library data more accessible for others to work with, we not only need to split our records up into atomic bits that can be mapped and modeled and identified, we also have to rethink the relationship between the rules and the format. In his article for Code4Lib, “Interpreting MARC”, Jason Thomale cogently talks about explicit vs. implicit structure as one of the reasons for his struggle with extracting the bibliographic data proper. The implicit structure is a result of the cataloging rules and *not* the data structure – you have to know the rules in order to interpret the 245 field correctly. It’s not enough to simply know what the field stands for. The explicit structure is made up of fields, subfields, and indicators, or generally refers to a formal model of a data structure, a schema.
The fact that in MARC the explicit and implicit structures are so intertwined as to be almost inseparable is at the heart of the difficulty non-catalogers have in interpreting our data and using it for machine processing. The rules creeping into the format make it hard for programmers, or anyone else willing to reuse our data, to make sense of it even if it’s encoded in XML.
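A small sketch makes the explicit/implicit distinction tangible. The 245 field below is simplified (a plain dictionary, not real MARC transmission format), but the two rules it relies on are genuine MARC/ISBD conventions: the second indicator counts nonfiling characters, and ISBD punctuation is embedded at the end of subfield values. Neither fact is recoverable from the explicit structure alone; the code has to hard-code knowledge of the cataloging rules.

```python
# Explicit structure: tag, indicators, subfield codes. Implicit
# structure: cataloging conventions such as nonfiling characters
# (encoded in indicator 2) and ISBD punctuation stored in the data.
field_245 = {
    "ind1": "1",
    "ind2": "4",  # nonfiling characters: skip "The " when filing
    "subfields": [
        ("a", "The annotated Alice :"),
        ("b", "Alice's adventures in Wonderland /"),
        ("c", "Lewis Carroll."),
    ],
}

def title_proper(field):
    """Extract $a, stripping the ISBD punctuation the rules put there."""
    a = next(value for code, value in field["subfields"] if code == "a")
    return a.rstrip(" :/.")

def filing_title(field):
    """Apply the nonfiling-character rule hidden in indicator 2."""
    skip = int(field["ind2"])
    return title_proper(field)[skip:]
```

Here `title_proper` yields “The annotated Alice” and `filing_title` yields “annotated Alice” – but only because the code knows the rules, not because the format says so.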
… [T]he more structured a data record is, the more explicit the semantics tend to be. Meaning is clear and encapsulated—the overall context in which data appears within a record is irrelevant because, apart from what might be specified in the data model, context carries no semantic meaning. (Jason Thomale, “Interpreting MARC”)
We shouldn’t let the rules interfere with the definition of metadata elements, and we have to get the semantics of these elements across as unambiguously as possible. This is only achievable by keeping content (i.e. rules; AACR, RDA…) and structure (i.e. format; MARC, RDA vocabularies, Dublin Core…) apart. As soon as the former obscures the latter, we’re headed for trouble.
“Topic maps and the ILS: an undelivered promise” (Library Hi Tech, 26 (2008), 1, pp. 12–18) is a great, accessible way for librarians to explore possible applications for topic maps in a library setting. The authors are Suellen Stringer-Hye and Edward Iglesias (who wrote some thoughtful comments on “Data, not records” on the ITIG ACRL-NEC blog).
The main merits of the paper are the demonstration of potential use cases of topic maps in libraries and the comparison of topic maps to discovery applications. Examining the assets and advantages that distinguish topic maps from these tools, the authors point to the power of topic maps: through associations, “each item or topic carries with it information about its context”, for example.
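The idea that “each item or topic carries with it information about its context” can be sketched in a few lines. The topics and associations below are hypothetical examples, and real topic maps (ISO 13250) have a richer model (occurrences, scopes, identifiers), but the core mechanism is this: topics are linked by typed associations in which each participant plays a named role, so context travels with the topic rather than being locked inside a flat record.

```python
# A hypothetical miniature topic map: topics plus typed associations
# with roles, instead of flat bibliographic records.
topics = {
    "carroll": "Lewis Carroll",
    "alice": "Alice's Adventures in Wonderland",
    "oxford": "Oxford",
}

associations = [
    {"type": "written-by", "roles": {"work": "alice", "author": "carroll"}},
    {"type": "published-in", "roles": {"work": "alice", "place": "oxford"}},
]

def context_of(topic_id):
    """Everything associated with a topic, with the role each party plays."""
    found = []
    for assoc in associations:
        if topic_id in assoc["roles"].values():
            found.append(
                (assoc["type"],
                 {role: topics[t] for role, t in assoc["roles"].items()})
            )
    return found
```

Asking for `context_of("alice")` returns both associations, each labeled with its type and roles – the context a discovery tool’s flat result list typically discards.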
As mentioned in the paper, despite using topic maps internally, vendors of library software have not (yet) included the technology in ILS development. Why not? What would it take for them to actively promote topic maps? And what about open source software? Of course this cuts both ways – there is no specific demand from libraries either. Maybe librarians need a clearer understanding of the benefits of topic maps compared to the fashionable discovery systems.
A discovery tool only goes so far; topic maps go further.