Field books (primary source documents created during field research and of great importance for natural history) are unique in that they come in a variety of formats and material types. The recent issue of D-Lib Magazine features an article by Sonoe Nakasone and Carolyn Sheffield, “Descriptive metadata for field books: methods and practices of the Field Book Project”. In an earlier post I talked about the project; this article now goes into more detail regarding the descriptive metadata used for the Field Book Registry. The project team decided to describe these items at both the collection and the item level. Metadata schemas from the museum, archives, and library communities were chosen for this task: Natural Collections Description (NCD) is used for collection-level records and Metadata Object Description Schema (MODS) for item-level records, with Encoded Archival Context (EAC) being used for authority records of collectors, organizations, and expeditions. These schemas are combined into one database, the Field Book Registry. Explicit connections are established between collection, item, and authority records via IDs, and controlled vocabularies like the Thesaurus of Geographic Names (TGN) or LCSH enrich the records. The article closes with screenshots of the cataloging interface and a mention of some challenges and future developments.
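To picture how these ID-based connections work, here is a minimal sketch in Python. The record structures, field names, and identifiers are my own illustration, not the project’s actual schemas:

```python
# Minimal sketch of ID-linked records: collection-level (NCD), item-level
# (MODS) and authority (EAC) records connected via identifiers. All field
# names and IDs here are illustrative, not the project's actual schemas.

authorities = {
    "eac-001": {"type": "person", "name": "Bailey, Vernon, 1864-1942"},
}

collections = {
    "ncd-100": {
        "title": "Vernon Bailey field books",
        "collector_ids": ["eac-001"],        # link to EAC authority record
    },
}

items = {
    "mods-500": {
        "title": "Field notes, Texas, 1899",
        "collection_id": "ncd-100",          # link to NCD collection record
        "creator_ids": ["eac-001"],          # link to EAC authority record
        "subject_geographic": "Texas",       # controlled term, e.g. from TGN
    },
}

def resolve_item(item_id: str) -> dict:
    """Follow the ID links from an item to its collection and creators."""
    item = items[item_id]
    return {
        "item": item["title"],
        "collection": collections[item["collection_id"]]["title"],
        "creators": [authorities[a]["name"] for a in item["creator_ids"]],
    }

print(resolve_item("mods-500"))
```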
The OLAC Movie & Video Credit Annotation Experiment is part of a larger project to make it easier to find film and video in libraries and archives. This experiment breaks current movie records down to pull out all the cast and crew information so that it may be re-ordered and manipulated. We also want to make explicit connections between cast and crew names and their roles or functions in the movie production. Adding these formal connections to movie records will allow us to provide a better user experience. For example, library patrons would be able to search just for directors or just for cast members or only for movies where Clint Eastwood is actually in the cast rather than all the movies that he is connected with. […]
We therefore want to convert our existing records into more structured sets of data. Eventually, we intend to automate most of this conversion. For now, we need help from human volunteers, who can train our software to recognize the many ways names and roles have been listed in library records for movies.
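To give a sense of what the software has to learn, here’s a toy sketch of rule-based credit parsing. The pattern and the sample statement are mine, not OLAC’s actual code, and real records are far messier than this:

```python
import re

# Toy sketch of the kind of parsing the experiment trains: splitting a
# free-text credits statement into (role, name) pairs. The pattern and
# sample string are illustrative only.

ROLE_PATTERN = re.compile(
    r"(?P<role>directed|produced|written)\s+by\s+(?P<names>[^;]+)",
    re.IGNORECASE,
)

def parse_credits(statement: str) -> list[tuple[str, str]]:
    """Extract (role, name) pairs from a credits statement."""
    pairs = []
    for m in ROLE_PATTERN.finditer(statement):
        role = m.group("role").lower()
        for name in m.group("names").split(","):
            pairs.append((role, name.strip()))
    return pairs

credits = "directed by Clint Eastwood ; produced by Robert Lorenz, Clint Eastwood"
print(parse_credits(credits))
# [('directed', 'Clint Eastwood'), ('produced', 'Robert Lorenz'),
#  ('produced', 'Clint Eastwood')]
```

Formal (role, name) pairs like these are exactly what would let a catalog distinguish “Clint Eastwood is in the cast” from “Clint Eastwood is connected with this movie somehow”.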
I already submitted a number of annotations, and it’s so much fun and so easy that I could hardly stop! So join in if you have a few minutes to spare and contribute to this crowd-sourcing project.
I was just watching OCLC’s recent presentation on their Next Generation Metadata Management, which includes an interesting overview (YouTube) by VIVA, the Virtual Library of Virginia, which coordinates collection management and resource sharing of online resources in a consortial environment.
Managing the e-book metadata for the Austrian library consortium and also serving one library with DDA (demand-driven acquisition), I wish I had such a (relatively) unified system of record delivery that still allows individual local settings for each library. Let me briefly describe my current workflow: for certain publishers or packages, we have agreements with German library networks that pre-process the metadata and offer it to other consortia who want to use it. Springer is one example. But not all Springer packages are covered, so I also need to go to the publisher’s portal, download records from there, and customize them myself. In addition, I have to set myself monthly reminders for these tasks. Because we have these different sources of metadata, each source involves its own processing method – some elements (e.g. some shell scripts) are shared, but on the whole there is no single workflow for all the e-book metadata in our consortium.
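For illustration, here is roughly what this fragmented processing looks like in miniature. The source names, fields, and customization steps are invented, not our real setup:

```python
# Rough sketch of a fragmented per-source workflow: each metadata source
# gets its own processing path, with only some steps shared. Source names
# and customization rules are invented for illustration.

def normalize(record: dict) -> dict:
    """Shared step: basic cleanup applied to records from every source."""
    record["title"] = record.get("title", "").strip()
    return record

def apply_local_profile(record: dict, library: str) -> dict:
    """Per-library customization, e.g. adding a local holdings code."""
    record["holding_library"] = library
    return record

def process(source: str, records: list[dict], library: str) -> list[dict]:
    records = [normalize(r) for r in records]
    if source == "springer-portal":          # records downloaded by hand
        records = [apply_local_profile(r, library) for r in records]
    return records

batch = [{"title": " Advances in Economics "}]
print(process("springer-portal", batch, library="AT-UBW"))
```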
A few years ago, there was talk that the German National Library would offer a central metadata pool for e-books for the German-speaking library community, but unfortunately that never panned out. What I find very attractive about OCLC’s system is that you are automatically notified of new, updated, or deleted records and can distribute them widely while at the same time keeping local customizations.
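A minimal sketch of that model – a central change feed with local settings layered on top – might look like this. The feed format, IDs, and settings are invented, not OCLC’s actual interface:

```python
# Sketch of a central change feed with local customizations layered on top.
# The feed format, record IDs, and local settings are all invented.

feed = [
    {"id": "ocn001", "action": "new",     "title": "Data Curation Basics"},
    {"id": "ocn002", "action": "updated", "title": "Metadata, 2nd ed."},
    {"id": "ocn003", "action": "deleted"},
]

local_settings = {"library": "AT-UBW", "call_number_prefix": "EBOOK"}

catalog = {
    "ocn002": {"title": "Metadata, 1st ed.", **local_settings},
    "ocn003": {"title": "Withdrawn Title", **local_settings},
}

def sync(changes: list[dict]) -> None:
    """Apply new/updated/deleted notifications, keeping local settings."""
    for change in changes:
        if change["action"] == "deleted":
            catalog.pop(change["id"], None)
        else:  # new or updated records arrive with local settings applied
            catalog[change["id"]] = {"title": change["title"], **local_settings}

sync(feed)
print(catalog)
```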
In its recent installment, entitled “Curating the Analog, Curating the Digital”, Archives remixed, part of the Archive Journal, features two articles that might be of interest to librarians and especially catalogers:
- “All in the Family: a dinner table conversation about libraries, archives, data, and science” by sisters Kristen A. Yarmey (Digital Services Librarian) and Lynn A. Yarmey (Lead Data Curator) explores the relationships between libraries, archives, and data curation, covering topics like containers, content and context, metadata, and creators and users.
- “Disrespect des Fonds: Rethinking Arrangement and Description in Born-Digital Archives” by Jefferson Bailey looks at the question: “How will traditional principles of archival arrangement and description be challenged or modified to account for born-digital materials?”, outlining the shift from the linear narrative of a traditional finding aid to a dynamic system of multiple interrelationships among born-digital archival materials.
Art historians and information systems specialists have been working for two years to make German art sales catalogs (in total about 236,000 art-sale records from more than 1,600 German auction catalogs dating from 1930 to 1945) available online in the Getty Provenance Index. The extensive digitization project was carried out in cooperation with libraries in Berlin and Heidelberg. Read this blog post to learn more about the steps involved: scanning and performing OCR, parsing the data via shell scripts and Perl, hand-editing the data, developing the database, and publishing the data as part of the Getty Provenance Index.
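As a toy analogue of the parsing step, consider turning an OCR’d catalog line into a structured record. The line format and fields here are invented, and the project itself used shell scripts and Perl rather than Python:

```python
import re

# Toy analogue of the parsing step: turning an OCR'd auction-catalog line
# into a structured record. The line format is invented; the real project
# used shell scripts and Perl on the actual catalog layouts.

LOT_PATTERN = re.compile(r"^(?P<lot>\d+)\.\s+(?P<artist>[^,]+),\s+(?P<title>.+)$")

def parse_lot(line: str) -> dict | None:
    """Parse one catalog line, or return None to flag it for hand-editing."""
    match = LOT_PATTERN.match(line.strip())
    return match.groupdict() if match else None

print(parse_lot("142. Max Liebermann, Reiter am Strand"))
# {'lot': '142', 'artist': 'Max Liebermann', 'title': 'Reiter am Strand'}
```

Lines that fail every pattern are exactly the ones that end up in the hand-editing step the post describes.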
At this year’s German library conference there were two presentations about automatic metadata generation. The ZBW (Deutsche Zentralbibliothek für Wirtschaftswissenschaften, German National Library of Economics) catalogs electronic as well as print articles from books and journals. Metadata for these articles are generated automatically from scanned tables of contents. But there is a need to enrich them for various reasons: in order to provide reliable links to electronic versions of articles, identifiers (URLs) and metadata have to be correct. Furthermore, in order to make the data ready for linked data applications or bibliometric rankings, authority control of authors, topics, and other entities is key. So automatic metadata generation is a great help in achieving quantity, but quality work (human intervention by linking to authority-controlled data) is necessary to make the data usable and future-proof (description and slides in German here).
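A minimal sketch of what such authority linking can mean in practice, assuming an invented mini authority file with placeholder identifiers:

```python
# Minimal sketch of authority linking: matching extracted author strings
# against an authority file so records carry stable identifiers instead of
# bare name strings. The authority file and IDs are invented placeholders.

authority_file = {
    "keynes, john maynard": "authority-id-001",
    "schumpeter, joseph":   "authority-id-002",
}

def link_author(name: str) -> str | None:
    """Return an authority ID for a name string, or None if no match."""
    return authority_file.get(name.strip().lower())

for raw in ["Keynes, John Maynard", "Müller, A."]:
    print(raw, "->", link_author(raw) or "needs human review")
```

The unmatched cases are where the human intervention comes in.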
The German National Library reported on their project of automatically extracting metadata from the title pages of doctoral dissertations. Since these pages conform to a certain pattern, with the same information found in the same place on each thesis’s title page, software that can decipher structures according to rules, thesauri, and OCR can be used. Here’s a summary of the project in English, and the conference slides in German can be found here.
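Here’s a sketch of the rule-based idea, assuming a sample title page and extraction rules of my own invention:

```python
import re

# Sketch of rule-based extraction from an OCR'd dissertation title page:
# because the same information appears in predictable places, simple
# positional and pattern rules go a long way. Sample page and rules are
# invented for illustration.

sample_page = """\
Untersuchungen zur Metadatenextraktion
Dissertation
zur Erlangung des Doktorgrades
vorgelegt von Anna Beispiel
Berlin 2012"""

def extract(page: str) -> dict:
    lines = [l.strip() for l in page.splitlines() if l.strip()]
    record = {"title": lines[0]}                   # rule: title comes first
    for line in lines:
        if m := re.match(r"vorgelegt von (.+)", line):   # "submitted by"
            record["author"] = m.group(1)
        if m := re.search(r"\b(1[89]\d\d|20\d\d)\b", line):
            record["year"] = m.group(1)
    return record

print(extract(sample_page))
# {'title': 'Untersuchungen zur Metadatenextraktion',
#  'author': 'Anna Beispiel', 'year': '2012'}
```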
It’s always interesting to follow the progress and practical examples of automated metadata generation because descriptive cataloging can be supported and accelerated, and human skills can be used for quality management and error assessment instead of manually entering information that can be captured automatically.
The HathiTrust Research Center (HTRC) “is a collaborative research center launched jointly by Indiana University and the University of Illinois, along with the HathiTrust Digital Library, to help meet the technical challenges of dealing with massive amounts of digital text that researchers face by developing cutting-edge software tools and cyberinfrastructure to enable advanced computational access to the growing digital record of human knowledge.”
Here’s a video that details HTRC’s mission of supporting scholars (e.g. in the digital humanities) in their research:
I would be particularly interested in learning more about a project mentioned about 2:45 into the video that involves “automatically enhancing the metadata that describes the volumes”, ultimately resulting in higher-quality metadata – maybe we’ll hear more about it in the future.
via CDLINFO News