“Improving Federal Spending Transparency: Lessons Drawn from Recovery.gov” by Raymond Yee, Eric C. Kansa and Erik Wilde of the UC Berkeley School of Information “explores the effectiveness of accountability measures deployed for the American Recovery and Reinvestment Act of 2009 (‘Recovery Act’ or ‘ARRA’).” Although data has been released as part of Open Government initiatives, the authors point out a lack of transparency due to data silos, highly distributed information sources and a lack of controlled access points, among other reasons.
According to the authors, ARRA data resembles a jigsaw puzzle – the legislation is complex and there are many players and sources of data. In my view, topic maps could help with a number of problems cited in the paper: they could build a bridge between the several budgetary disclosure systems, expose the structure behind ARRA, and make explicit the relationships between the legislation and the wishes of Congress, implementation by the Treasury Department, allocation of money to different accounts, and spending patterns (including agencies and recipients). Links could go back and forth, connecting data from across agencies (e.g. spending data –> program documentation –> legislation authorizing funding for that program). Obviously, machine-processable and unambiguous identifiers as well as controlled vocabularies are needed for the various entities – though this seems to be a weakness in the data so far.
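To make the idea concrete, here is a minimal sketch of what such back-and-forth links could look like: topics for a spending record, a program, and the legislation, connected by typed associations that can be traversed in both directions. All entity names, identifiers and relationship types below are invented for illustration, not taken from any actual ARRA dataset.

```python
# Hypothetical topics: one per entity, keyed by an unambiguous identifier.
topics = {
    "award:XYZ-123": {"type": "spending-record", "name": "Award XYZ-123"},
    "program:weatherization": {"type": "program", "name": "Weatherization Assistance Program"},
    "legislation:arra-2009": {"type": "legislation", "name": "Recovery Act of 2009"},
}

# Typed associations; each one is navigable from either end.
associations = [
    ("award:XYZ-123", "funded-under", "program:weatherization"),
    ("program:weatherization", "authorized-by", "legislation:arra-2009"),
]

def related(topic_id):
    """Follow associations in both directions from one topic."""
    out = []
    for subj, rel, obj in associations:
        if subj == topic_id:
            out.append((rel, obj))
        elif obj == topic_id:
            out.append((rel + " (inverse)", subj))
    return out

# Starting from a spending record, one can navigate to the program
# and from there on to the authorizing legislation.
print(related("award:XYZ-123"))
print(related("legislation:arra-2009"))
```

The point is not the toy data structure but the principle: once every entity has a stable identifier, a link created in one silo becomes usable from every other.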
The authors also call for an account of the data sources, which can be “first-class citizens” in topic maps, i.e. topics in their own right that can be talked about. Moreover, they stress the importance of efficient information retrieval systems – if you can’t find the information, what use is access to data? Budgetary metadata of high quality is critical to findability and useful display.
Classification would also be conducive to discovery, keeping in mind that “… classification is not necessarily an objective process. It is shaped by the assumptions and goals of people and organizations. These worldviews and goals often see disagreement and evolve over time.” Topic maps have mechanisms to reflect changes in terminology without discarding older terms, and different views of the world can coexist, distinguished by scope.
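A tiny sketch of that scope mechanism: one topic carries several names, each valid in a different context, so a change in terminology adds a name rather than erasing one. The subject, the names and the scope labels below are hypothetical examples, not drawn from any real vocabulary.

```python
# (topic, name, scope) triples: the same subject under different vocabularies.
names = [
    ("subject:developing-countries", "Underdeveloped areas", "vocab-pre-2000"),
    ("subject:developing-countries", "Developing countries", "vocab-current"),
]

def names_in_scope(topic, scope):
    """Return the names of a topic that are valid in the given scope."""
    return [n for (t, n, s) in names if t == topic and s == scope]

# One topic, two views of the world – neither discarded.
print(names_in_scope("subject:developing-countries", "vocab-current"))
print(names_in_scope("subject:developing-countries", "vocab-pre-2000"))
```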
Access to data doesn’t automatically imply transparency and findability. The increasing number of Open Government efforts (so far primarily in the U.K. and the U.S.) look like a great opportunity for topic maps.
ELAG 2010 featured a “Workshop on FRBR and Identifiers”. The presentation gives an overview of the identifiers that exist for various forms of resources, with special emphasis on FRBR entities, including a brief look at the role of identifiers in linked data. For completeness’ sake, let me note that I won’t talk about URL identifiers for FRBR entities and relationships here – a vast topic in and of itself.
Library-created control numbers identify the metadata about the resource, not the resource itself (as ISBNs do). Different institutions (publishers, booksellers, libraries) create different identifiers for one and the same resource – but how reliable and consistent are they? One ISBN doesn’t necessarily stand for one book only, undermining uniqueness in many cases. As WorldCat data shows (assuming that catalogers correctly recorded the details available), a large number of books have no ISBN at all (ISBNs only came into widespread use in the 1970s). Generally, a considerable percentage of resources is not identified in any standard way. So the picture is not uniform at all, and some of the established identifiers will have to be reconsidered: the ISBN system is likely to reach its limits with the proliferation of e-books, and maybe the library world will at some point stop thinking in terms of “records” (with metadata possibly being assembled just in time instead of just in case) – will the LCCN be obsolete then?
There are many efforts to create and maintain identifiers in different domains. Libraries around the world maintain separate authority files (albeit tied together in VIAF) and create separate “records” – and thus identifiers – for the same resource. It’s important for identifiers to be reused outside their specific areas. Library identifiers have lingered in silos for a long time and are only slowly being adopted by “outside” communities (e.g. the German Wikipedia has linked identifiers from the National Library’s name authority file with the articles about the respective persons).
A given FRBR work usually has various manifestations, which in turn have several identifiers (leaving out the expression level for the moment) – manifestation identifiers (ISBN, LCCN…) are the most commonly used. OpenLibrary, for one, collocates manifestation identifiers. Topic maps could integrate information from heterogeneous sources on the basis of identifiers. We can probably never achieve global agreement on one unique bibliographic identifier, nor do we have to if we have systems that enable us to consolidate the diversity of identifiers.
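Consolidation of this kind can be sketched very simply: cluster records together whenever they share any identifier, without ever electing one identifier as “the” global one. The record keys and identifier values below are invented for illustration; a union-find structure does the grouping.

```python
# Hypothetical records from different libraries, each with its own set of
# identifiers for (possibly) the same manifestation.
records = {
    "libA-0001": {"isbn:1-11111-111-1", "lccn:2001-000001"},
    "libB-8842": {"isbn:1-11111-111-1"},
    "libC-5150": {"lccn:2001-000001", "oclc:555555"},
    "libD-7777": {"isbn:2-22222-222-2"},  # an unrelated record
}

def consolidate(records):
    """Group records into clusters whenever they share an identifier."""
    parent = {r: r for r in records}

    def find(r):  # union-find with path compression
        while parent[r] != r:
            parent[r] = parent[parent[r]]
            r = parent[r]
        return r

    seen = {}  # identifier -> first record carrying it
    for rec, ids in records.items():
        for i in ids:
            if i in seen:
                parent[find(rec)] = find(seen[i])  # merge the two clusters
            else:
                seen[i] = rec

    clusters = {}
    for rec in records:
        clusters.setdefault(find(rec), set()).add(rec)
    return list(clusters.values())

print(consolidate(records))
```

The three records sharing an ISBN or LCCN end up in one cluster, the unrelated record in its own – diversity of identifiers consolidated, no global identifier required.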
Robert B. Allen, College of Information Science and Technology, Drexel University, has a draft textbook entitled Information: A Fundamental Construct available open access at http://boballen.info/ISS/. It provides “a broad overview of informatics, information science, and information systems”, treating topics like knowledge representation, human cognition, natural language, text and human language technologies, to name but a few. A good way to brush up what we already know or to learn something new!
To follow up on the last post, picking up Ranganathan’s law “Save the time of the reader”: one way of saving the time of the reader is to refine faceted search and browsing. The bigger the indexed corpus (and that would be the case when including abstracts, TOCs, indexes etc.), the more hits a query will yield, and the higher the potential number of irrelevant results. It also gets more difficult for the user to see why a certain result is returned if the search term doesn’t show up in readily identifiable fields like title, author or subject heading. We don’t save the user much time if they have to wade through pages of search results just because we don’t want them to miss a book they might find useful, one that only came up because the search term occurs in its back-of-the-book index.
The discovery layers that are increasingly replacing traditional library OPACs offer faceting of results by various criteria (language, format, year of publication etc.). How about introducing what I would call a “source-sensitive facet”? This facet would show where the search term occurs: in the descriptive metadata, the subject heading(s), or in supplementary material such as TOCs, abstracts or indexes. Scottsdale Public Library has such a facet in place:
Their “Search found in” facet is apparently generated from the metadata proper, but you could also imagine indicating the location (e.g. an electronic document associated with a given item) where the search term was found: “Search found in TOC / abstract / index”. The document in question has to be typed so that the system can recognize which kind of document (TOC, abstract, index) it is and display this information. Such a facet would make the results more transparent, since some users are confused about where their search term occurs in the data and why a specific item turns up in the list of results. Those interested enough to find out would have a tool at hand.
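A minimal sketch of how such a facet could be computed, assuming each item carries its typed supplementary documents alongside the descriptive metadata. The field names, records and texts below are invented for illustration.

```python
# Hypothetical catalog items: descriptive metadata plus typed supplements.
items = [
    {
        "id": "rec1",
        "title": "Composing Lives",
        "subjects": ["Composers – Austria"],
        "supplements": {"toc": "1. Early years ...", "index": "Mayer, Anna, 42"},
    },
    {
        "id": "rec2",
        "title": "Anna Mayer: A Biography",
        "subjects": ["Mayer, Anna"],
        "supplements": {},
    },
]

def found_in(item, term):
    """Return the facet values ('title', 'subject', 'toc', ...) for one hit."""
    term = term.lower()
    locations = []
    if term in item["title"].lower():
        locations.append("title")
    if any(term in s.lower() for s in item["subjects"]):
        locations.append("subject")
    # Because the supplements are typed, the facet can name the document kind.
    for doc_type, text in item["supplements"].items():
        if term in text.lower():
            locations.append(doc_type)
    return locations

for item in items:
    print(item["id"], found_in(item, "mayer"))
```

The first record matches only via its back-of-the-book index, the second via title and subject – exactly the distinction the facet would surface to the user.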
Library metadata will increasingly include (some or all of) the content of the resource cataloged to complement the descriptive data. It is currently hard for libraries to capture what is in a book (granted, we have subject headings, but they categorize resources using controlled rather than natural language), and as a consequence some books are easily overlooked.
A friend of mine is doing research on two little-known female Austrian composers of the 1930s for her Master’s thesis. In one instance, she only found out that a book contained important biographical information about them because the book’s index on the publisher’s website listed their names (which she stumbled across via … Google). Obviously neither a table of contents nor a library record could have supplied this pointer. So, failing a full-text scan, digital versions of those parts of a book that serve as windows into its content are helpful additions to the metadata. This is not restricted to TOCs (which can already be found in a number of records) but includes indexes as well. Incidentally, it is not without reason that the Topic Maps technology was originally developed for automated electronic indexes – they are gateways into a book’s contents, reflecting the main concepts it deals with. All of the above assumes, of course, that an index exists – as Baron Campbell says in Lives of the Chief Justices:
“So essential do I consider an Index to be to every book, that I proposed to bring a Bill into Parliament to deprive an author who publishes a book without an index of the privilege of copyright; and, moreover, to subject him, for his offence, to a pecuniary penalty.”
(quoted from: Thomas B. Passin: Explorer’s Guide to the Semantic Web)
If publishers or other providers of bibliographic metadata are ready to make additional material about their publications available for display in library catalogs (so that efforts aren’t duplicated), and if reviews, abstracts, indexes or TOCs are assembled in one central place, indexed and thus made searchable, we stand a much better chance of helping users discover the knowledge structure contained in books. Moreover, by not requiring them to go to the shelf to look inside the book or even request it from closed stacks, we get a step closer to fulfilling Ranganathan’s fourth law: Save the time of the reader.
The cataloging landscape is changing. This recent article in NextSpace, entitled “The catalog is out of the box”, highlights the fact that cataloging (or rather metadata creation) extends beyond producing data for display in library catalogs, and that metadata is being used in innovative ways. Data and metadata are increasingly gaining importance as we move from a web of documents to a web of data, and so are the people who create and curate them – those people just might be former catalogers.
New job titles emerge like that of the data librarian, and new fields of activity are opening up both inside and outside libraries for data curators and creators (formerly known as “catalogers”?). It’s great to see that catalogers are involved in various projects (which attests to a certain degree of visibility of that specific skill set) and that there is an “acknowledgment that metadata is an essential element in the information infrastructure”.
If catalogers look beyond traditional library cataloging and broaden their notion of what “cataloging” is, we can see where and by whom our skills are needed, and we can meet that demand by enhancing our core competencies (precision, analytical thinking, adherence to rules and standards, etc.). We will benefit from acquiring more skills in building and implementing ontologies for different domains, data modeling for information systems other than the library catalog, and technological know-how.
It’s not hard to predict that we will see semantic technologies play a major role in metadata processing, interchange and presentation. As mentioned in the article, metadata is increasingly combined in new and meaningful ways. Truly computer-understandable metadata will get more important in the face of “intelligent” applications. Catalogers can bring analytical and conceptual thinking to the table, but we should not shy away from technology, either.
It’s also desirable for catalogers to have some understanding of where other communities are going in terms of metadata, ontologies and semantic techniques (the sciences – check out Anita de Waard‘s work, for example; medicine; museums; archives). This will make their skills transferable to emerging information services like those presented in the article and marketable both inside and outside the library world.
Now that I’ve talked about Topic Maps in the first posts, aren’t you curious about what lies behind that concept? These two texts helped me get a first overview:
The TAO of Topic Maps by Steve Pepper and
a paper by Conal Tuohy explaining how Topic Maps are used at the New Zealand Electronic Text Centre.
(kudos to Alexander Johannesen for facilitating the first steps)