Structured data (stored in a relational database, where each piece of information fits neatly into the rows and columns of a table and is individually identifiable and addressable) and unstructured data (in textual documents, for example, which may be marked up with XML but whose information is not forced into a strict table structure) appear to sit at the two ends of a data management continuum.
In his article in the current issue of the Code4Lib Journal, “Using XSLT’s SQL Extension with Encyclopedia Virginia”, Matthew Gibson shows a way to bridge the gulf between these two worlds, using an SQL extension to XSLT to leverage the contents of a relational database in an XML context.
I particularly liked his description of the strength of the relational database for certain tasks and information needs like the ones of his specific project, the Encyclopedia Virginia:
- “version control over every piece of content that goes into the encyclopedia
- one-to-many relationship management of, for instance, one author and/or editor to many articles, one chronological event referenced by many articles, and one media object shared by many articles
- most importantly, more efficient and scalable performance in looking up and retrieving data.”
Certain pieces of information require efficiency and consistency through reference to unique keys in the relational database, which is harder to achieve within a pure XML environment.
Both XML and relational databases have strengths and weaknesses, and whether or not you choose a hybrid approach like the one described in the article depends on your data and workflow requirements.
Two more formal differences that spring to mind:
- SQL keywords (and usually identifiers) are not case-sensitive
- SQL syntax is easier: no brackets and braces, which I got a bit confused about (color coding helps to clarify the blocks that need these “boundaries”)
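To see the case-insensitivity point in action, here is a tiny sketch using Python’s built-in `sqlite3` module and a made-up `authors` table (the table and data are purely illustrative):

```python
import sqlite3

# A throwaway in-memory database just to try this out.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO authors (name) VALUES ('Gibson')")

# SQL keywords are case-insensitive (and in SQLite, so are
# table and column names): these three statements are equivalent.
for query in ("SELECT name FROM authors",
              "select name from authors",
              "Select Name From Authors"):
    print(conn.execute(query).fetchone())  # ('Gibson',) each time
```

Try writing the equivalent XML element names in mixed case and a parser will rightly complain, which is exactly the difference I kept tripping over.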
The precision and abstraction skills gained from cataloging prove to be invaluable when tackling coding. I realize that logic as one of the paradigms of programming is creeping in more and more, so in the long run it probably won’t be possible to avoid that subject …
So I am stepping behind the scenes of the tool that so far I’ve only used as a cataloger – the ILS. I’m learning about database structures, about the Oracle tables where the raw data is stored and how to navigate them from the command line. Basically, all the systems librarian stuff.
Certain tasks (such as adding prefixes to call numbers) can be much easier than the tedious work of editing by hand when you query and manipulate the database with a few lines of SQL code. I tremendously enjoy descending into the depths of the system! This may only be a small step for mankind, but it’s quite a leap for me. 😉
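As a sketch of what that kind of batch edit looks like: the snippet below uses SQLite and a hypothetical `items` table with invented column names and call numbers (a real ILS schema will look different, and you should always test such updates on a copy first), but the shape of the SQL is the same.

```python
import sqlite3

# Hypothetical items table standing in for the ILS's raw data tables;
# the real table and column names in an Oracle-backed ILS will differ.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (barcode TEXT, call_number TEXT)")
conn.executemany("INSERT INTO items VALUES (?, ?)",
                 [("b001", "QA76.9 .D3"), ("b002", "Z699 .A1")])

# One UPDATE adds a prefix to every matching call number --
# no hand-editing, record by record.
conn.execute("UPDATE items SET call_number = 'Ref ' || call_number "
             "WHERE call_number LIKE 'Z%'")
print(conn.execute("SELECT call_number FROM items").fetchall())
# [('QA76.9 .D3',), ('Ref Z699 .A1',)]
```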
My horizon has been broadened and my interest in the more technical aspects has been sparked by what I read on- and offline. The following quote from Christine Schwartz’s excellent article “Changing mind-set, changing skill set” (Conversations with catalogers in the 21st century) reflects my feelings: “Those who write in this community, their blogs and articles, have helped me grow as a … librarian probably much more quickly than if I had to struggle on my own.”
P.S.: Here’s a link to a useful interactive tutorial to brush up your SQL skills.
The current issue of the Code4Lib Journal features an excellent article by Jason Thomale, “Interpreting MARC: Where’s the Bibliographic Data?”. The abstract:
The MARC data format was created early in the history of digital computers. In this article, the author entertains the notion that viewing MARC from a modern technological perspective leads to interpretive problems such as a confusion of “bibliographic data” with “catalog records.” He explores this idea through examining a specific MARC interpretation task that he undertook early in his career and then revisited nearly four years later. Revising the code that performed the task confronted him with his own misconceptions about MARC that were rooted in his worldview about what he thought “structured data” should be and helped him to place MARC in a more appropriate context.
I have to say that the project he writes about (digital music collection) is very complex, because music cataloging has special rules on top of the regular ones. This doesn’t change the assessment of the MARC data structure, though, which has “as much in common with a textual markup language (such as SGML or HTML) as it does with what we might consider to be ‘structured data.’”
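A minimal sketch of what that markup-like quality means in practice: a MARC variable field carries its subfields inline, separated by delimiter codes, much as tags separate content in HTML. Below, “$a”/“$b”/“$c” is the usual human-readable stand-in for the real delimiter byte, and the title field is invented for illustration.

```python
# An illustrative, made-up 245 (title) field in the common
# "$" notation for subfield delimiters.
field_245 = "$aExample title :$ba subtitle /$cby an example author."

def subfields(field):
    """Split a textual MARC field into (subfield code, value) pairs."""
    parts = field.split("$")[1:]          # drop anything before the first delimiter
    return [(p[0], p[1:].strip()) for p in parts]

print(subfields(field_245))
# [('a', 'Example title :'), ('b', 'a subtitle /'), ('c', 'by an example author.')]
```

Nothing in the field itself says what “a” or “b” means; like markup, the structure only becomes data once you interpret it against the format’s documentation, which is much of Thomale’s point.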
It’s a worthwhile read for both catalogers and programmers: it illustrates the programmer’s perspective looking at and working with MARC data and it provides insights into what made MARC the way it is and into possibilities of dealing effectively with the quirks that exist.