The BBC World Service Archive Prototype is a website that provides access to the BBC World Service's huge digital archive of radio programs. Yves Raimond and Tristan Ferne describe in a concise article (PDF, 8 pages) how Semantic Web technologies, automation and crowdsourcing are used to annotate, correct and add metadata for search and navigation. Ed Summers has written a blog post about this project, making a comment I wholeheartedly agree with: “… [I]t is the (implied) role of the archivist, as the professional responsible for working with developers to tune these algorithms, evaluating/gauging user contributions, and helping describe the content themselves that excites me the most about this work.” I think this is a possible future role not only for archivists but also for librarians, especially catalogers and metadata specialists working with digital collections.
The OLAC Movie & Video Credit Annotation Experiment is part of a larger project to make it easier to find film and video in libraries and archives. This experiment breaks current movie records down to pull out all the cast and crew information so that it may be re-ordered and manipulated. We also want to make explicit connections between cast and crew names and their roles or functions in the movie production. Adding these formal connections to movie records will allow us to provide a better user experience. For example, library patrons would be able to search just for directors or just for cast members or only for movies where Clint Eastwood is actually in the cast rather than all the movies that he is connected with. […]
We therefore want to convert our existing records into more structured sets of data. Eventually, we intend to automate most of this conversion. For now, we need help from human volunteers, who can train our software to recognize the many ways names and roles have been listed in library records for movies.
I have already submitted a number of annotations, and it’s so much fun and so easy that I could hardly stop! So join in if you have a few minutes to spare and contribute to this crowd-sourcing project.
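To illustrate why the OLAC experiment's explicit links between names and roles matter for searching, here is a minimal Python sketch. The record layout, the role labels ("director", "cast") and the function `movies_with` are hypothetical and are not OLAC's actual data model; the credit lists are deliberately abbreviated.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class Credit:
    """One name linked to one role (hypothetical structure, not OLAC's schema)."""
    name: str
    role: str  # hypothetical role labels, e.g. "director" or "cast"


@dataclass
class MovieRecord:
    title: str
    credits: list[Credit] = field(default_factory=list)


def movies_with(records: list[MovieRecord], name: str, role: str | None = None) -> list[str]:
    """Return titles in which `name` appears; if `role` is given, only in that role."""
    return [
        record.title
        for record in records
        if any(
            credit.name == name and (role is None or credit.role == role)
            for credit in record.credits
        )
    ]


# Abbreviated credits, for illustration only.
records = [
    MovieRecord("Mystic River", [
        Credit("Clint Eastwood", "director"),
        Credit("Sean Penn", "cast"),
    ]),
    MovieRecord("Million Dollar Baby", [
        Credit("Clint Eastwood", "director"),
        Credit("Clint Eastwood", "cast"),
        Credit("Hilary Swank", "cast"),
    ]),
]

print(movies_with(records, "Clint Eastwood"))               # ['Mystic River', 'Million Dollar Baby']
print(movies_with(records, "Clint Eastwood", role="cast"))  # ['Million Dollar Baby']
```

Once each name carries its role, "all movies Clint Eastwood is connected with" and "movies where he is actually in the cast" become two different queries, which an unstructured credits note cannot distinguish.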
Some interesting points by Mark Sands, director of media and audiences at Tate, London, on new users of cultural institutions and on the new questions that can be asked and answered with broader access to digitized cultural material (starting at minute 37 of the video recording of a recent panel discussion on “new access to culture”):
– the control that cultural institutions used to exercise over their assets is being fundamentally challenged
– it is challenged by experts who sit outside of the institution and by members of the public
– there are communities of interest
– a single curator simply cannot know everything about a subject, and there are audiences out there who know a great deal and whose knowledge can enhance the curator’s own
User participation in libraries is growing. Libraries are opening up to crowd-sourcing in two major ways: some time ago, user tagging was introduced into the library catalog, and now we offer patrons the chance to take part in collection development through patron-driven acquisitions. We are moving away from the institution deciding single-handedly what to buy; instead, users can shape the collections in their libraries. This suggests that they have the competence not only to search for and find material but also to participate in selecting it and making it accessible. As the catalog shifts from a library inventory to a tool for accessing content (with an immense increase in data), the user moves from being a “consumer” of library collections to being a partner in decision-making who has the skills to assess the quality of available information.
Two examples of crowd-sourced cataloging were presented at the recent LITA National Forum. The Biodiversity Heritage Library (BHL) enables users to download articles from digitized volumes and create their own metadata for them; given sufficient quality, these are included in CiteBank, an aggregation platform for citations of and access to biodiversity-related articles. This is one way of providing much-needed article-level discovery: “BHL currently contains interfaces and services that allow users to create their own PDF articles. These documents are retained when appropriate metadata have been provided and are made available to other users through CiteBank.”
The Chicago Underground Library (now renamed the Read/Write Library) allows members of the community to catalog items: “We instruct catalogers to list every contributor to the publication, whether the author, editor, typesetter, or illustrator. They are provided with several controlled taxonomies for defining the format of the item and subjects, but then are also asked to contribute their own tags and write a very short abstract. Another key component of the metadata is that we ask catalogers to assign a geolocation tag to describe either where the item was published or what it describes, which supports navigating the catalog by neighborhood.” If they like, library visitors can play an active role in describing the resources in the collection.
These two initiatives present cataloging not as “read-only” but as an activity users can participate in, yielding more granular information. While this involves (to a certain extent) letting go of library standards, quality is monitored by professional librarians in both cases.
Bianca Crowley, Trish Rose-Sandler: “Crowd-sourcing the creation of ‘articles’ within the Biodiversity Heritage Library” (slides)
Margaret Heller, Nell Taylor: “Social Networking the Catalog: A Community Based Approach to Building Your Catalog and Collection” (PDF)
The Smithsonian Institution apparently has a long history of crowd-sourcing. David Alan Grier reports in his podcast “The Confident and the Curious” that in the 1850s, the original weather observers collected data for the US Navy. The volunteers sent the data they had gathered with scientific instruments four times a day to the Central Weather Office located in the Smithsonian Institution in Washington, D.C. Today the Smithsonian still uses crowd-sourcing to improve access to its vast collections. Through Flickr, a research fellow at the National Zoo gets help from the public in cataloging photographs from wildlife locations.
With the rise of user participation on the web, traditional institutions can no longer claim to have the authoritative view on any given subject. Projects like Linux, Firefox, Wikipedia, OpenLibrary or LibraryThing, and professions like journalism, testify to this fundamental change. Incidentally, OpenLibrary is considering putting out a call for volunteers to help correct bad OCR by transcribing old handwriting.
So what is at the core of crowd-sourcing? People have to be willing to share what they know for a project they perceive as furthering the common good. The mixture of viewpoints and experience provides a more diverse outlook on the project or topic at hand. Crowd-sourcing entails relinquishing a bit of control, which might be a big step (both psychologically and politically) for some institutions. Could crowd-sourcing be applied to library cataloging too? Libraries could, for example, ask experts in particular fields for help with cataloging specific collections that have not yet been tackled for various reasons. This is but one example of how libraries could open themselves to the “wisdom of crowds”.