Monthly Archives: September 2010

Associative index model

The paper “An associative index model for the results list based on Vannevar Bush’s selection concept” by Charles Cole, Charles-Antoine Julien and John E. Leide of McGill University, Montreal, which appears in the latest issue of Information Research, contends that algorithmic methods of refining results lists in online catalogs are poorly suited to users’ actual information needs. The authors draw on Vannevar Bush (whose seminal 1945 essay, “As we may think”, is available here) and Charles Cutter to develop an associative index model.

Based on an understanding of cognitive processes during a search, the model establishes a second collocation set, triggered by the user’s associative thinking while perusing the first, system-derived results list. This second set is considered to better match the user’s actual information need. In my view, the model merely externalizes and formalizes thought processes that are at work, more or less consciously, in any search, and it raises epistemological questions: how do we look for information? How do we formulate a query, i.e. use natural language to reflect an information need? How do we process and organize what we find? What role does association play in search and selection?

Some questions remain open: for instance, why didn’t the authors draw on the FRBR user tasks instead of creating their own with slightly different meanings? And how would relevance feedback relate to their approach? I wonder what role topic maps could play in an associative retrieval tool: they could enable users to identify subjects in their own words, i.e. from their thought associations, and feed those user-suggested improvements back into the system. The result would be a dynamic system that gets “smarter” through user input complementing computational algorithms.
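To make the feedback idea concrete, here is a minimal, entirely hypothetical sketch (the class and its data are invented for illustration, not taken from the paper): a system-derived index is complemented by a store of the users’ own terms for a record, so that later searches benefit from earlier associations.

```python
from collections import defaultdict

class AssociativeIndex:
    """Hypothetical sketch: a retrieval tool that 'learns' user associations.

    The catalog holds system-assigned subject terms; user_terms collects
    the searchers' own words for a record and is consulted on later queries.
    """

    def __init__(self, catalog):
        # catalog: {record_id: set of system-assigned subject terms}
        self.catalog = catalog
        self.user_terms = defaultdict(set)  # user's word -> record ids

    def search(self, term):
        # combine system-derived hits with user-contributed associations
        system_hits = {rec for rec, subjects in self.catalog.items()
                       if term in subjects}
        return system_hits | self.user_terms[term]

    def associate(self, term, record_id):
        # feed the user's own wording back into the index
        self.user_terms[term].add(record_id)

idx = AssociativeIndex({"b1": {"information retrieval"},
                        "b2": {"cataloging"}})
print(idx.search("memex"))    # no system match yet: set()
idx.associate("memex", "b1")  # a user links their word to a record
print(idx.search("memex"))    # now finds it: {'b1'}
```

The point of the sketch is only the loop: user associations accumulate alongside the algorithmic index rather than replacing it.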


Interpreting MARC – article

The current issue of the Code4Lib Journal features an excellent article by Jason Thomale, “Interpreting MARC: Where’s the Bibliographic Data?”. The abstract:

The MARC data format was created early in the history of digital computers. In this article, the author entertains the notion that viewing MARC from a modern technological perspective leads to interpretive problems such as a confusion of “bibliographic data” with “catalog records.” He explores this idea through examining a specific MARC interpretation task that he undertook early in his career and then revisited nearly four years later. Revising the code that performed the task confronted him with his own misconceptions about MARC that were rooted in his worldview about what he thought “structured data” should be and helped him to place MARC in a more appropriate context.

I have to say that the project he writes about (a digital music collection) is very complex, because music cataloging has special rules on top of the regular ones. This doesn’t change the assessment of the MARC data structure, though, which has “as much in common with a textual markup language (such as SGML or HTML) as it does with what we might consider to be ‘structured data.'”
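For readers who have never looked below the surface, a small sketch may show why MARC resists the “structured data” worldview: a record is a flat byte stream of leader, directory and variable fields separated by control characters, which a program must walk positionally. This is a toy record with invented values, not code from the article, and it covers only the bare MARC21 record layout (no character-set or validity handling).

```python
FT, SF, RT = b"\x1e", b"\x1f", b"\x1d"  # field/subfield/record terminators

def build_record(fields):
    """Assemble a raw MARC21 record from (tag, field_bytes) pairs."""
    directory, data = b"", b""
    for tag, body in fields:
        body += FT
        directory += f"{tag}{len(body):04d}{len(data):05d}".encode()
        data += body
    directory += FT
    base = 24 + len(directory)           # data begins after leader + directory
    total = base + len(data) + 1         # +1 for the record terminator
    leader = f"{total:05d}nam a22{base:05d} a 4500".encode()
    return leader + directory + data + RT

def parse_record(raw):
    """Walk the leader and directory to recover the tagged fields."""
    base = int(raw[12:17])               # leader positions 12-16: base address
    entries = raw[24:base - 1]           # directory ends with a field terminator
    fields = []
    for i in range(0, len(entries), 12): # 12-byte entries: tag, length, start
        tag = entries[i:i + 3].decode()
        length = int(entries[i + 3:i + 7])
        start = int(entries[i + 7:i + 12])
        body = raw[base + start:base + start + length].rstrip(FT)
        if tag < "010":                  # control fields: no indicators/subfields
            fields.append((tag, body.decode()))
        else:
            inds, subs = body[:2].decode(), body[2:]
            pairs = [(s[:1].decode(), s[1:].decode())
                     for s in subs.split(SF) if s]
            fields.append((tag, inds, pairs))
    return fields

raw = build_record([
    ("001", b"ocm12345"),
    ("245", b"10" + SF + b"aInterpreting MARC /" + SF + b"cJason Thomale."),
])
print(parse_record(raw))
```

Nothing in the stream says what a 245 or a subfield $a *means*; as in a markup language, the semantics live in conventions outside the record, which is much of what the article is getting at.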

It’s a worthwhile read for both catalogers and programmers: it illustrates the programmer’s perspective looking at and working with MARC data and it provides insights into what made MARC the way it is and into possibilities of dealing effectively with the quirks that exist.