GND – a new authority file

In May, the German and Austrian library community will take a considerable leap: the new authority file GND (Gemeinsame Normdatei, joint authority file) will come into effect, replacing SWD (subject authorities), PND (name authorities) and GKD (corporate body authorities). Structures that have grown over decades will be supplanted by a more modern approach. SWD, PND and GKD will be merged into one big authority file for subjects, names and corporate bodies, and many historical redundancies that are incomprehensible from today’s point of view will be done away with.

Until now, there have been records for the same entity in at least two of the authority files, and which record was used depended on whether you wanted to control a subject or a name. Matters were further complicated by the fact that the files had different fields and subfields for essentially the same information. As the German National Library points out, the former formal distinction between authority control for subject cataloging and for descriptive cataloging will be abandoned in favor of a more object-oriented view in the GND.

I see some very good developments in the GND:

  • There will be a set of new rules that align cataloging with international practice and already move in the direction of RDA. A number of details are in line with what RDA will prescribe. Moreover, the new rules harmonize authority work, which was so far governed by two sets of rules in the German-speaking library community (again one for descriptive and one for subject cataloging).
  • GND will be much more granular than the old authority files. Pieces of information that used to be in a single field are now placed in several subfields. This opens up new possibilities for indexing and searching and for moving towards linked data (the German National Library already has a linked data service in place).
  • Identity management is taken seriously (or more seriously than before, anyway). When you want to say that a person is a musician, for example, you put a link to the authority record for “musician” including ID into the person’s authority record. When you want to say that the Library of Congress is located in Washington, DC, you create a link to that geographic authority record. So all records will be linked with each other and facts are not just expressed by a string of words but by IDs that enable reliable and machine-readable linking. In fact, links are also typed, i.e. you put a certain code for place or profession into the respective subfield to express the type of link that goes out to other authority records.
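The idea of typed, ID-based links in the last point can be illustrated with a minimal Python sketch. This is not the actual GND format: the class `AuthorityRecord`, the IDs such as `id-0001`, and the link type codes `place` and `profession` are all made up for illustration.

```python
from dataclasses import dataclass, field

@dataclass
class AuthorityRecord:
    record_id: str   # hypothetical ID, not a real GND number
    label: str
    # Each link is a (link_type, target_id) pair, so facts are stored as IDs, not strings.
    links: list = field(default_factory=list)

    def link(self, link_type, target):
        """Add a typed link to another authority record."""
        self.links.append((link_type, target.record_id))

records = {}

def add(record):
    """Register a record in the (in-memory) authority file."""
    records[record.record_id] = record
    return record

musician = add(AuthorityRecord("id-0001", "Musician"))
washington = add(AuthorityRecord("id-0002", "Washington, DC"))
loc = add(AuthorityRecord("id-0003", "Library of Congress"))
person = add(AuthorityRecord("id-0004", "Example Person"))

loc.link("place", washington)        # "is located in"
person.link("profession", musician)  # "is a"

def resolve(record, link_type):
    """Follow all links of one type and return the labels of the target records."""
    return [records[target_id].label
            for kind, target_id in record.links
            if kind == link_type]

print(resolve(loc, "place"))
print(resolve(person, "profession"))
```

Because the link stores an ID rather than a string, changing the label of the "Musician" record would automatically be reflected wherever that record is linked.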

However, the principle of granularity and linking has not been followed through entirely: we still have something like
100 $P Ludwig $n XIV. $c Frankreich, König
where we could have the information in $c, France and king, split into separate subfields and linked to their respective authority records.
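To make the point concrete, here is a rough Python sketch that parses the field shown above and splits the content of $c into separately linked parts. The lookup table `AUTHORITY_IDS` and its IDs are hypothetical, and splitting on commas is a naive assumption that would not hold for every heading.

```python
import re

# Hypothetical authority IDs, for illustration only.
AUTHORITY_IDS = {"Frankreich": "id-geo-0001", "König": "id-sub-0002"}

def parse_field(line):
    """Split a line like '100 $P Ludwig $n XIV.' into a tag and (code, value) pairs."""
    tag, _, rest = line.partition(" ")
    subfields = [(code, value.strip())
                 for code, value in re.findall(r"\$(\w)\s*([^$]*)", rest)]
    return tag, subfields

def link_parts(subfields, code="c"):
    """Split one subfield on commas and attach an authority ID to each part, if known."""
    linked = []
    for sub_code, value in subfields:
        if sub_code == code:
            for part in value.split(","):
                part = part.strip()
                linked.append((sub_code, part, AUTHORITY_IDS.get(part)))
        else:
            linked.append((sub_code, value, None))
    return linked

tag, subfields = parse_field("100 $P Ludwig $n XIV. $c Frankreich, König")
for entry in link_parts(subfields):
    print(tag, entry)
```

In this sketch the two parts of $c keep the same subfield code; a real solution would presumably give them distinct, typed subfields.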

Of course, a project like this is a gigantic undertaking which involves a whole lot of work and cooperation. Databases will have to be rebuilt and re-indexed, bibliographic records will have to be updated, discovery tools will have to ingest these new structures, catalogers will have to be trained, and services like VIAF will have to reflect these changes too. Ultimately, though, the GND will make authority work more efficient and easier, and it takes steps in the right direction regarding compatibility with linked data and alignment with international rules.

Vocabulary management at the Getty

Interesting interview with Patricia Harpring, managing editor of the Getty Vocabulary Program, which is in charge of four widely-used arts-related terminologies. The interview presents a behind-the-scenes look at vocabulary and standards maintenance and development at a leading museum and research institution.

You'll also find some beautiful photography of the Getty Center and its surroundings.

Identifiers, FRBR and diversity

ELAG 2010 featured a “Workshop on FRBR and Identifiers”. The presentation gives an overview of which identifiers exist for various forms of resources, with special emphasis on FRBR entities, and including a brief look at the role of identifiers in linked data. For the record, I won’t talk about URL identifiers for FRBR entities and relationships here, as that is a vast topic in and of itself.

Library-created control numbers identify the metadata about the resource, not the resource itself (as ISBNs do). For one resource, different institutions (publishers, booksellers, libraries) create different identifiers, but how reliable and consistent are they? One ISBN doesn’t necessarily stand for one book only, which undermines uniqueness in many cases. As WorldCat data shows (assuming that catalogers correctly recorded the details available), we have a large number of books without ISBNs (which only came into widespread use in the 1970s). Generally there is a considerable percentage of resources which are not identified in a standard way. So the picture is not uniform at all, and some of the established identifiers will have to be reconsidered: the ISBN system is likely to reach its limits with the proliferation of e-books, and maybe the library world will sometime stop thinking in terms of “records” (possibly with metadata being assembled just in time instead of just in case). Will the LCCN be obsolete then?

There are many efforts to create and maintain identifiers in different domains. Libraries around the world maintain separate authority files (albeit tied together in VIAF) and create separate “records” and thus identifiers for the same resource. It’s important for identifiers to be reused outside their specific areas. Library identifiers have lingered in silos for a long time and are only slowly being adopted by “outside” communities (e.g. German Wikipedia linked identifiers from the National Library’s name authority file with the articles about the respective persons).

A given FRBR work usually has various manifestations which in turn have several identifiers (leaving out the expression level for the moment) – those are the most commonly used (ISBN, LCCN…). OpenLibrary, for one, collocates manifestation identifiers. Topic maps could integrate information from heterogeneous sources on the basis of identifiers. We can probably never achieve global agreement on one unique bibliographic identifier, nor do we have to if we have systems that enable us to consolidate the diversity of identifiers.
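As a rough illustration of what consolidating diverse identifiers could look like, the Python sketch below clusters identifiers that appear together in the same source record, using a simple union-find. It is not how OpenLibrary or any topic map engine actually works, and the identifier strings are invented placeholders.

```python
def consolidate(records):
    """Group identifiers into clusters; identifiers sharing a record end up together."""
    parent = {}

    def find(x):
        # Return the representative of x's cluster (with path halving).
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    for ids in records:
        ids = list(ids)
        if not ids:
            continue
        find(ids[0])  # register identifiers that occur alone
        for other in ids[1:]:
            union(ids[0], other)

    clusters = {}
    for identifier in list(parent):
        clusters.setdefault(find(identifier), set()).add(identifier)
    return sorted(sorted(cluster) for cluster in clusters.values())

# Invented placeholder identifiers from three hypothetical sources.
source_records = [
    ["isbn:A", "lccn:1"],   # e.g. a library record
    ["lccn:1", "olid:X"],   # e.g. another catalog that shares the LCCN
    ["isbn:B"],             # an unrelated manifestation
]

for cluster in consolidate(source_records):
    print(cluster)
```

Here `isbn:A`, `lccn:1` and `olid:X` end up in one cluster because they are connected through the shared `lccn:1`, while `isbn:B` stays on its own. No single global identifier is needed, only overlap between sources.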