Retrospective conversion – some tips
Now that the retrospective cataloging project I’ve been working on full-time for the last three years has come to an end (yay!), I think some pieces of advice that have accumulated over time are worth sharing. Selection and workflow for recon projects depend on the context of each individual library, but there are some general guidelines too.
- Will you outsource, bring in external catalogers, make do with the staff time you have, or combine these options? Weigh each option against progress, quality, costs and timeframe.
- In case the project is fixed-term, what parts of the collection are most important and should absolutely be cataloged within the timeframe?
- If possible, weed before starting the recon project, because there will be less to catalog and some old material may not be needed anymore anyway.
- Do you want descriptive cataloging only or also subject cataloging? If thorough subject analysis and adding subject headings is too time-consuming, consider having the recon catalogers apply a (simple) classification scheme.
- Should the catalogers work from cards only or should they examine every book? Assess the quality of your cards; you might want to remove added entries beforehand to speed up cataloging when working with main entries only. Consider the benefits of the “examine every book” approach: it could serve as a revision of the collection, items in bad condition could be restored, duplicate call numbers or other errors could be corrected, surplus copies could be discarded etc. Also think about transportation – can catalogers fetch the books from the stacks themselves or is there remote storage (even if that only means stacks in the basement)?
- What quality level do you expect? Is it okay to provide access level cataloging and upgrade when occasion arises? What will be given priority – cataloging effectively or efficiently (see slide 4 in a presentation of the same title by Rick Newell)?
- A similar question applies to serials: is holdings information enough, with individual items barcoded as needed? Take into account “just in case” and “just in time” scenarios.
- What kind of quality control mechanisms do you aim to implement? Checking samples is probably the only feasible way given the amount of data created during most projects.
- Also think about little details such as: what range of inventory numbers are the retrospectively cataloged resources going to get – their own (with a prefix such as “retro-…”) or just plain sequential numbers so that they are not distinguishable from the rest of the collection?
- What local information do you need to have recorded?
- Consider making use of batch loading (every record gets a “recon marker” and subsequently all records with the marker are loaded and items are generated with the appropriate indication of status and location).
- It is crucial to document guidelines for the project as well as decisions made for individual cases.
- Follow up on whether “the old stuff” gets asked for and used more after being visible and findable in the online catalog.
- What about the physical paper cards? What do you intend to do with them – get rid of them straight away or keep them as a kind of “backup”? Should there be a revision (e.g. creation of lists of what has been cataloged and comparison with the cards)?
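The batch-loading idea from the list above can be sketched in a few lines of Python. This is only an illustration under assumptions: the record layout (plain dicts), the marker value "RECON", the field names, the "retro-" prefix and the status and location strings are all hypothetical and not taken from any particular library system.

```python
from dataclasses import dataclass

# Hypothetical marker value that catalogers add to every recon record.
RECON_MARKER = "RECON"


@dataclass
class Item:
    record_id: str
    inventory_number: str
    status: str
    location: str


def generate_items(records, status="available", location="stacks", prefix="retro-"):
    """Create one item for each record that carries the recon marker."""
    marked = [r for r in records if RECON_MARKER in r.get("markers", [])]
    items = []
    for seq, record in enumerate(marked, start=1):
        items.append(
            Item(
                record_id=record["id"],
                # Prefixed, zero-padded number keeps recon items distinguishable.
                inventory_number=f"{prefix}{seq:06d}",
                status=status,
                location=location,
            )
        )
    return items


# Invented sample records, for illustration only.
records = [
    {"id": "rec1", "title": "Old serial", "markers": ["RECON"]},
    {"id": "rec2", "title": "New acquisition", "markers": []},
    {"id": "rec3", "title": "Grey literature report", "markers": ["RECON"]},
]

items = generate_items(records)
for item in items:
    print(item.record_id, item.inventory_number, item.status, item.location)
```

Passing an empty prefix would give plain sequential numbers instead, which corresponds to the other inventory-number option mentioned in the list.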
This was a period of intense cataloging for me, including serials, books published before 1900 and grey literature. Now on to new tasks!
Read/write cataloging
Two examples of crowd-sourced cataloging at the recent LITA National Forum [1]: The Biodiversity Heritage Library (BHL) enables users to download articles from digitized volumes and create their own metadata for them, which are (given sufficient quality) included in CiteBank, an aggregation platform for citations of and access to biodiversity related articles. This is one method of providing the much-needed article-level discovery: “BHL currently contains interfaces and services that allow users to create their own PDF articles. These documents are retained when appropriate metadata have been provided and are made available to other users through CiteBank.”
The Chicago Underground Library (now renamed the Read/Write Library) allows members of the community to catalog items: “We instruct catalogers to list every contributor to the publication, whether the author, editor, typesetter, or illustrator. They are provided with several controlled taxonomies for defining the format of the item and subjects, but then are also asked to contribute their own tags and write a very short abstract. Another key component of the metadata is that we ask catalogers to assign a geolocation tag to describe either where the item was published or what it describes, which supports navigating the catalog by neighborhood.” If they like, library visitors can play an active role in describing the resources in the collection.
These two initiatives present cataloging not as “read-only” but also as an activity users can participate in, providing a granular level of information. While this involves (to a certain extent) letting go of library standards, quality is monitored by professional librarians in both cases.
[1] Bianca Crowley, Trish Rose-Sandler: “Crowd-sourcing the creation of ‘articles’ within the Biodiversity Heritage Library” (slides)
Margaret Heller, Nell Taylor: “Social Networking the Catalog: A Community Based Approach to Building Your Catalog and Collection” (PDF)
Indexes for ebooks
Some people wonder why, with full-text search available, an ebook might still need an index. If you happen to be one of them, go read “Missing Entry: Whither the eBook Index?”. This article is a great summary of the value of indexes (even or especially for books in electronic form) and gives examples (with nice illustrations!) of what enhanced indexes might look like. Indexes with enhanced functionality can be much more interactive and appealing to the user than pure lists of words with a page indication.
Just like subject cataloging, indexes offer a value that cannot be replaced by full-text search. They chart a structured map of the content, show paths into the information, expose relationships and go beyond pure search (which just pulls up instances of terms) in that content is analyzed and arranged meaningfully.
Experienced indexer Jan Wright points out in a fascinating podcast on ebook indexing that an index is a discovery feature just like other metadata. She says: “The more tools for getting into information readers are given, the happier they will be.”
The potential of what ebooks can be (beyond static representations of regular print books) has not been tapped yet – indexes are only one example. We’ll just have to wait for EPUB to recognize the importance of indexes and address them explicitly in its specification, and for publishers to incorporate smarter indexes into their products.
A meta catalog for digitized works
Imagine a user wants to read a public-domain book in electronic form. She’d be faced with the same situation as users before the advent of unified resource discovery systems – she has to go to various places on the web and do separate searches. Wouldn’t it be nice if there was a meta catalog for digitized works that brings together data from the likes of the Internet Archive, HathiTrust, Project Gutenberg, Europeana or Google Books? It could show what books were digitized by whom, whether they are downloadable, in what format, on what devices they can be read etc. Such a directory could also enable users to compare the quality if the same work is available in different versions. Another benefit would be the reduction of duplications of effort. Having duplicate electronic versions is not necessarily bad, but are time and money not better spent on unique materials not digitized elsewhere? Local priorities could be determined on a more informed basis.
All of this occurred to me while reading an article about the eBooks-on-Demand (EOD) service discovery platform (from p. 229 here, in German). EOD is a joint initiative of over 30 libraries from 12 European countries that each run their own digitization activities. Together they offer the (paid) service that lets users order a public-domain book to be digitized and delivered as an ebook. Instead of relying on users discovering EOD books “by chance” in the respective libraries’ catalogs, a VuFind search interface was built that allows finding books for digitization from all participating libraries in one central place and gives direct access to already digitized items. Records are ingested via OAI or FTP batch upload. For the future the project team plans to enhance the search platform to include links (via API queries of players like those I mentioned above) to works already digitized elsewhere. And this is where the idea of a central overarching catalog for digitized public-domain works popped up. Existing portals such as the Zentrales Verzeichnis digitalisierter Drucke (ZVDD, central catalog of digitized printed works, which covers digital versions created in Germany) go in the right direction, but we definitely have to think more globally and on a larger scale.
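To make the idea of such a meta catalog a little more concrete, here is a minimal Python sketch of its core step: grouping digitized copies of the same work from several providers. Everything in it is an assumption made for illustration. The record fields, the provider names and the sample data are invented, and the matching key (normalized title, author and year) is deliberately crude; it is not how EOD, ZVDD or any other existing platform works.

```python
from collections import defaultdict


def normalize(text):
    """Lowercase and collapse whitespace so trivial differences do not block a match."""
    return " ".join(text.lower().split())


def work_key(record):
    """Crude matching key; real deduplication would need far more care."""
    return (normalize(record["title"]), normalize(record["author"]), record["year"])


def build_meta_catalog(records):
    """Group the digitized copies of each work across all providers."""
    catalog = defaultdict(list)
    for record in records:
        catalog[work_key(record)].append(
            {
                "provider": record["provider"],
                "formats": record["formats"],
                "downloadable": record["downloadable"],
            }
        )
    return dict(catalog)


# Invented sample records from hypothetical providers.
records = [
    {"provider": "Provider A", "title": "An Example  Treatise", "author": "Doe, Jane",
     "year": 1850, "formats": ["PDF"], "downloadable": True},
    {"provider": "Provider B", "title": "an example treatise", "author": "Doe, Jane",
     "year": 1850, "formats": ["EPUB", "PDF"], "downloadable": False},
    {"provider": "Provider A", "title": "Another Work", "author": "Roe, Richard",
     "year": 1872, "formats": ["PDF"], "downloadable": True},
]

catalog = build_meta_catalog(records)
for key, copies in catalog.items():
    print(key, "->", [copy["provider"] for copy in copies])
```

A work that appears under more than one provider is a candidate for comparing the quality of the versions, and a work with no entry at all is a candidate for local digitization.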
UC Next Generation Technical Services Initiative
The University of California libraries have started to implement a Next Generation Technical Services Initiative. One of the task groups is called “Transform cataloging practices”, and one of its charges (PDF) is to “define a ‘good enough’ record standard for all UC original cataloging”.
The obvious advantages for workflow are quicker discoverability of resources, reduction of backlogs and freeing up of staff time. By including records in WorldCat, enhancements by others become possible. Metadata automation can play an important role in these iterative improvements. The effort at UC will be collaborative – the plan is to survey public service librarians, selectors and users in order to determine minimum needs in a bibliographic record.
Already in 2005, the University of California espoused the “good enough” approach, in a report (PDF) entitled Rethinking how we provide bibliographic services for the University of California: “Focus on being good enough instead of being perfect”.
I think it is possible to be “perfect” (or rather as good as we can be) in certain areas of bibliographic description and “good enough” in others. If we know which elements do what in OPACs or discovery systems (indexing, faceting, browsing or pure display), if we know the value of fields for users, we can concentrate on these. Our time and energy are well spent on data elements that are relevant for search and retrieval and that have potential in a linked data world (mainly authority data). However, we could cut back on a lot of footnotes or the statement of responsibility without severely harming the user’s ability to find and locate resources. High quality in the right place and “good enough” where it is sufficient: this balance might be the way forward.