The BBC World Service Archive Prototype is a website that provides access to the BBC World Service’s huge digital archive of radio programs. Yves Raimond and Tristan Ferne describe in a concise article (PDF, 8 pages) how Semantic Web technologies, automation and crowdsourcing are used to annotate, correct and add metadata for search and navigation. Ed Summers has a blog post about this project, making a comment I wholeheartedly agree with: “… [I]t is the (implied) role of the archivist, as the professional responsible for working with developers to tune these algorithms, evaluating/gauging user contributions, and helping describe the content themselves that excites me the most about this work.” I think this is a possible future role not only for archivists but also for librarians, especially catalogers and metadata specialists working with digital collections.
Could this become the linked data killer app? Wikimedia Deutschland has kicked off a project, Wikidata, that aims to better centralize, structure and type the vast amounts of data in Wikipedia. Information will be extracted from the infoboxes and stored in one central database. Moreover, they will store “meta-metadata” about who said what when (thus adding an important dimension to DBpedia). The identity management aspect appeals to me most: all language versions of Wikipedia will link to one central point for the same entity (thus reducing redundancy), and a single, language-independent URI will be coined for it. See Daniel Kinzler’s recent presentation (PDF) at SWIB 2012 and this detailed article to learn more about the project (funded by Google, among others), which might demonstrate the concrete usefulness of linked data.
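To make the identity management point a bit more concrete, here is a minimal Python sketch. The entity ID (Q64, Berlin) and the choice of languages are purely illustrative assumptions; the idea is simply that labels and Wikipedia sitelinks in several languages all hang off one language-independent URI:

```python
import requests

# Minimal sketch: fetch one Wikidata entity and list its labels and sitelinks.
# Q64 (Berlin) is used only as an example; the point is that every language
# version of Wikipedia can refer to the same language-independent URI
# (http://www.wikidata.org/entity/Q64).
API = "https://www.wikidata.org/w/api.php"

params = {
    "action": "wbgetentities",
    "ids": "Q64",
    "props": "labels|sitelinks",
    "format": "json",
}

entity = requests.get(API, params=params).json()["entities"]["Q64"]

# Labels in different languages all attach to the same entity...
for lang in ("en", "de", "fr"):
    print(lang, entity["labels"].get(lang, {}).get("value"))

# ...and the sitelinks tie the language-specific Wikipedia articles back to it.
for site in ("enwiki", "dewiki", "frwiki"):
    print(site, entity["sitelinks"].get(site, {}).get("title"))
```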
culturegraph.org is a Linked Open Data service that mints common identifiers (Uniform Resource Identifiers) for cultural works (books and other texts, paintings, sculptures, pieces of music etc.) so that these resources can be referenced reliably and persistently.
This service (still in its infancy) addresses the fact that one and the same resource has multiple identifiers in metadata – library control numbers as well as publishers’ or booksellers’ IDs like the Amazon Standard Identification Number (ASIN) and URIs stemming from Linked Data efforts. Tying these identifiers together and linking among them is great, but so would be pulling together all the bibliographic descriptive data. Creating *yet another* URI for a certain resource description contributes to the URI synonymy issue (“many URIs for one thing”), but I suppose it serves as an umbrella under which the data can be aggregated.
In any case, it’s good not to neglect already existing non-URI identifiers in information integration, although it seems everything must have a URI these days ;).
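To sketch what tying these identifiers together could look like in RDF, here is a small Python example using rdflib. The URIs are made up for illustration, and the owl:sameAs approach is my assumption, not necessarily culturegraph’s actual data model:

```python
from rdflib import Graph, URIRef
from rdflib.namespace import OWL

# Minimal sketch of the "umbrella URI" idea: one URI under which the existing
# identifiers for the same work are aggregated. All URIs below are invented
# purely for illustration.
g = Graph()

# A hypothetical umbrella URI for one work...
work = URIRef("http://example.org/resource/some-work")

# ...linked to identifiers that already denote the same thing: a library
# control number, a bookseller ID (ASIN-style) and a Linked Data URI.
for existing in (
    "http://example.org/library/control-number/123456",
    "http://example.org/bookseller/asin/B000EXAMPLE",
    "http://dbpedia.org/resource/Some_Work",
):
    g.add((work, OWL.sameAs, URIRef(existing)))

print(g.serialize(format="turtle"))
```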
Mike Bergman, who delivered one of the keynotes at this year’s Dublin Core conference, offers an interesting view of the current state of linked data in an interview for http://semanticweb.com/ (semantic interoperability being one of his main concerns). He raises one point that has been in the back of my mind for quite a while – what is the actual usage of linked data? More specifically, what is being done with the bibliographic datasets released by a number of libraries (CERN leading the way) this year? I’m not aware of mashups or other applications that have been developed using open library data (if they exist, I’d be curious to hear about them). What does that tell us about LOD in the library domain? Granted, it might take some time to gain traction and to convert the data into a friendlier format, yet I can’t help but wonder what substantial, practical benefit there has been so far.
In an ideal world, semantic interoperability would rest on a shared understanding of what concepts and data elements mean. Mapping between terms in different ontologies or between data elements in different formats is all right, but there are deeper issues: people struggle to represent meaning in computer systems built for others whose model of the world might not be (exactly) the same.
A striking example of the difficulty of semantic interoperability is a Linked Data challenge that sought to answer the question: “Which town or city in the UK has the highest proportion of students?” One answer puts Cambridge first (you’ll notice the quite obvious mistakes in the data), while another sees Milton Keynes on top. Without digging too deep into the details, one can see that it’s important to make sure the definition of “town”, “city” or “student” is the same in all data sources (Wikipedia, government data…), and to formulate a precise enough query.
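A toy Python sketch, with entirely made-up numbers, shows how much the answer depends on those definitions; change what counts as a “town” or a “student” and a different place comes out on top:

```python
# Made-up figures, only to illustrate how the ranking flips with the definitions
# used in the query and in the underlying data.
places = [
    # (name, settlement type, resident population, full-time students, part-time/distance students)
    ("Cambridge",     "city", 124_000, 24_000,  2_000),
    ("Milton Keynes", "town", 230_000, 10_000, 60_000),  # e.g. distance-learning registrations
]

def proportion(population, students):
    return students / population

# Definition 1: only cities, only full-time students.
cities = [(name, proportion(pop, ft))
          for name, kind, pop, ft, pt in places if kind == "city"]
print(max(cities, key=lambda x: x[1]))        # -> Cambridge

# Definition 2: towns and cities, all registered students.
everywhere = [(name, proportion(pop, ft + pt))
              for name, kind, pop, ft, pt in places]
print(max(everywhere, key=lambda x: x[1]))    # -> Milton Keynes
```

The numbers are invented, but the mechanism is exactly the one at work in the challenge: two perfectly reasonable readings of the same question yield different winners.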
The nuances of meaning make a huge difference here, and a casual user is unlikely to get the semantics exactly right to match them. Can there be a way to design systems that cope with these intricacies, that can dynamically incorporate context-sensitive and domain-specific semantics, semantic change over time, and locally negotiated semantics as opposed to universal approaches?