One of my new tasks is data analysis and correction: I play a part in improving data quality and, by extension, the user’s search and find experience. Quantitative data analysis (what is in which fixed field, for example, and which errors occur) precedes correction, which very often requires human checking.
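For the quantitative part, even a small script goes a long way. Here is a minimal sketch in Python with the pymarc library; the file name and the choice of the 008 language code (positions 35–37) are purely illustrative. It tallies how often each value occurs, so that typos and stray values surface at the bottom of the list:

```python
# Minimal sketch: tally the values of one MARC 21 fixed-field element
# (here the 008 language code, positions 35-37) across a file of records.
# Assumes binary MARC input read with pymarc; "records.mrc" is illustrative.
from collections import Counter

from pymarc import MARCReader

counts = Counter()
with open("records.mrc", "rb") as fh:
    for record in MARCReader(fh):
        if record is None:  # pymarc yields None for unparseable records
            counts["<unreadable record>"] += 1
            continue
        fields = record.get_fields("008")
        if not fields:
            counts["<missing 008>"] += 1
            continue
        data = fields[0].data or ""
        # Guard against truncated fixed fields before slicing.
        counts[data[35:38] if len(data) >= 38 else "<truncated 008>"] += 1

# Most frequent values first; rare, odd values (typos, fill characters,
# blanks) collect at the bottom and are candidates for human checking.
for value, n in counts.most_common():
    print(f"{value!r}\t{n}")
```

The output of something like this is exactly the kind of report that then needs intellectual review.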
Intellectual review of data anomalies (shown in reports) can help bring patterns to the surface that would otherwise go unnoticed, or at least unanalyzed. These patterns might tell us something about which mistakes catalogers make (for various reasons: because they are too “imaginative”, because of misunderstandings, etc.), and the more harmful glitches can be addressed in cataloger training sessions. Or they may point us towards some strange system behavior we don’t want.
Data quality issues have started to become urgent because of the discovery layer in use in my library consortium. If for no other reason, we’ll have to work towards making the data as reliable as possible so that the discovery layer can work consistently: it exposes data errors that the OPAC didn’t care about. For example, records whose holdings data contain corrupt characters are rejected and never “published” from the ILS into the discovery system, and values that have been entered into invalid subfields are displayed incorrectly or not at all; the sketch at the end of this post shows what an automated check for both problems could look like.

The local data of the libraries that are part of the network will have to be corrected so that the central catalog and the discovery tool can work more reliably. As a consortium, we can support our member libraries in this task by providing statistics and data analyses, but in the end they are responsible for “their” (or rather “our”) data and for cleaning up the errors. For us and for them alike, data analysis and quality management are becoming more important, and I think we will all need a real strategy for dealing with this issue.

Do you have such a strategy that addresses data quality issues in a systematic way, or is it something that is rather neglected in daily work?
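As promised above, here is a rough sketch of what such a pre-publishing check could look like. It assumes pymarc 5.x (where Field.subfields is a list of Subfield(code, value) objects); the holdings tag “852” and the file name are placeholders, since local holdings conventions differ between systems. It flags control or replacement characters in a holdings field as well as subfield codes outside the range MARC 21 allows:

```python
# Rough sketch of a pre-publishing check: flag control/replacement
# characters in a holdings field and invalid subfield codes.
# Assumes pymarc 5.x (Field.subfields is a list of Subfield objects);
# the tag "852" and the file name are placeholders for local conventions.
import sys
import unicodedata

from pymarc import MARCReader

# MARC 21 allows lowercase letters and digits as subfield codes.
VALID_SUBFIELD_CODES = set("abcdefghijklmnopqrstuvwxyz0123456789")

def suspicious_chars(text):
    """Characters that tend to break downstream publishing:
    C0/C1 control characters and U+FFFD replacement characters."""
    return [c for c in text if unicodedata.category(c) == "Cc" or c == "\ufffd"]

with open("records.mrc", "rb") as fh:
    for record in MARCReader(fh):
        if record is None:  # unparseable record
            print("unreadable record skipped", file=sys.stderr)
            continue
        ids = record.get_fields("001")
        rec_id = ids[0].data if ids else "<no 001>"
        for field in record.get_fields("852"):
            bad = suspicious_chars(field.value())
            if bad:
                print(f"{rec_id}: suspicious character(s) {bad!r} in 852")
            for sub in field.subfields:
                if sub.code not in VALID_SUBFIELD_CODES:
                    print(f"{rec_id}: invalid subfield code ${sub.code} in 852")
```

A report like this won’t fix anything by itself, but it gives the member libraries a concrete worklist instead of the vague feeling that “some records don’t show up in discovery”.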