Image retrieval relies heavily on keywords, a crucial form of textual metadata that enables efficient access. Many digital libraries hold visual content, so they might benefit from the experience gained in commercial image retrieval.
The American Society of Picture Professionals organized a program on keywording on March 25, 2008; the recordings are available from the ASPP events site. The panels included both photo buyers and media catalogers, who looked at keywording from their respective perspectives as researchers and metadata providers. I think the panelists made some excellent points, which I'll paraphrase here.
Image keywords fall into two broad categories: content (what is it, how many, what color, etc.) and concepts (think subject headings). From the perspective of keyworders and stock photo companies, accurate keywords and quick, relevant search results are essential to their service. Many keywords *could* be associated with an image, but not all of them are equally useful for findability, so a few accurate, relevant keywords are better than many terms that have little to do with the image. The trick is to capture the most important aspects of the picture, regarding both content and concepts, and to focus on the concepts the picture depicts and expresses most strongly.
All keyworders agreed that striving for consistency through controlled vocabularies, and taking advantage of their hierarchical relationships, is vital. Companies have tight internal keywording standards, and the bigger the collection, the stricter and more conservative they tend to be in selecting keywords. Tangential terms are avoided, especially for content keywords. For concepts, which are highly subjective, companies are somewhat more open to a wider variety of terms. However, keyworders have to watch for concept keywords that become overused and therefore less useful.
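To make the point about hierarchies concrete, here is a minimal Python sketch of how a controlled vocabulary with broader terms might be used: a keyworder assigns only the most specific term, and the broader terms are added automatically. The vocabulary below is purely hypothetical, and rejecting terms that are not in it is just one simple way to enforce consistency.

```python
# Hypothetical controlled vocabulary: each term maps to its broader term
# (None marks a top-level term).
BROADER = {
    "golden retriever": "dog",
    "dog": "mammal",
    "mammal": "animal",
    "animal": None,
    "tulip": "flower",
    "flower": "plant",
    "plant": None,
}


def expand_with_broader_terms(keywords):
    """Return the assigned keywords plus every broader term above them, without duplicates."""
    expanded = []
    for term in keywords:
        # Only vocabulary terms are accepted, which keeps keywording consistent.
        if term not in BROADER:
            raise ValueError(f"{term!r} is not in the controlled vocabulary")
        # Walk up the hierarchy until a top-level term is reached.
        while term is not None:
            if term not in expanded:
                expanded.append(term)
            term = BROADER[term]
    return expanded


print(expand_with_broader_terms(["golden retriever", "tulip"]))
# ['golden retriever', 'dog', 'mammal', 'animal', 'tulip', 'flower', 'plant']
```

A searcher looking for "dog" would then also find the image keyworded only as "golden retriever", without the keyworder having to type every level by hand.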
The constant difficulty keyworders face is deciding what to include, what to leave out, and what matters most to customers. One panelist noted a distinct difference between commercial and editorial images: editorial images, for example, require more factual information, such as date or location. Some information is most reliably supplied by the photographers themselves, like the exact location (town, region) where the image was shot. Keyworders, however, should also do their own research to make location keywords more precise.
What is the psychology of the user? What do users search for, and how do they make their selection? One way to find out is to analyze the search logs, since users' interactions with the site can reveal something about their search behavior and needs. One photo buyer said he would like more keywords describing the style (minimalist, etc.) and mood of an image; more terms describing atmosphere would also help. Sometimes layout determines what kind of photos buyers search for: if a headline has to go in the top right corner, you'll want pictures that leave space for type in that area. But how do you keyword for that, and how do you search for such a specific requirement?
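As a rough illustration of what "investigating the search logs" could involve, the Python sketch below counts the most frequent queries and the queries that returned no results, the latter being obvious candidates for new keywords. The log format (a list of query/result-count pairs) and the sample queries are assumptions made up for this example; real logs will look different.

```python
from collections import Counter

# Hypothetical log records: (query text, number of results returned).
search_log = [
    ("calm beach", 120),
    ("Calm Beach", 120),
    ("copy space top right", 0),
    ("minimalist office", 4),
    ("copy space top right", 0),
    ("moody forest", 0),
]


def summarize_log(records, top_n=3):
    """Return the most frequent queries and the queries that found nothing."""
    # Normalize case and whitespace so "Calm Beach" and "calm beach" count together.
    normalized = [(query.strip().lower(), hits) for query, hits in records]
    frequent = Counter(query for query, _ in normalized).most_common(top_n)
    zero_hits = Counter(query for query, hits in normalized if hits == 0).most_common()
    return frequent, zero_hits


frequent, zero_hits = summarize_log(search_log)
print(frequent)   # [('calm beach', 2), ('copy space top right', 2), ('minimalist office', 1)]
print(zero_hits)  # [('copy space top right', 2), ('moody forest', 1)]
```

Repeated zero-result queries such as "copy space top right" or "moody forest" would point to exactly the layout and mood terms the buyers asked for.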
Buyers' strategies include using existing keywords for additional searches, to find patterns in the terms used to express particular things. Features they would like to see include recommendation models like that of the music site Pandora (when someone chooses several images, show further images that are similar) and Amazon-style referral features (people who searched for … also looked for …).
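An Amazon-style "people who searched for … also looked for …" feature could, in its simplest form, be built from query co-occurrence within search sessions. The Python sketch below shows that idea; it is not how Amazon or any stock agency actually implements referrals, and the session data is hypothetical. Ties are broken alphabetically so the output is deterministic.

```python
from collections import Counter, defaultdict
from itertools import permutations

# Hypothetical session data: the queries each user issued during one visit.
sessions = [
    ["beach", "sunset", "surfing"],
    ["beach", "sunset"],
    ["beach", "palm tree"],
    ["mountain", "sunset"],
]


def build_also_searched(sessions):
    """For every query, count how often each other query appeared in the same session."""
    co_counts = defaultdict(Counter)
    for queries in sessions:
        # set() ignores a query repeated within one session.
        for a, b in permutations(set(queries), 2):
            co_counts[a][b] += 1
    return co_counts


def also_searched(co_counts, query, n=2):
    """Return up to n queries most often searched alongside `query`."""
    counts = co_counts.get(query, Counter())
    # Highest count first; alphabetical order breaks ties.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [other for other, _ in ranked[:n]]


co_counts = build_also_searched(sessions)
print(also_searched(co_counts, "beach"))   # ['sunset', 'palm tree']
print(also_searched(co_counts, "sunset"))  # ['beach', 'mountain']
```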
One search-configuration decision is whether to make captions searchable as well as keywords. If captions are not searchable, important words from the caption should be repeated in the keywords. Captions can also be used to narrow down a search, because their descriptions are usually more detailed and precise.
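The trade-off can be sketched in a few lines of Python. The record structure, the `search_captions` switch and the sample images below are all hypothetical; the point is only to show what a caption-searching option changes, and how a caption can narrow a keyword search.

```python
from __future__ import annotations

import re
from dataclasses import dataclass


@dataclass
class ImageRecord:
    """Hypothetical record: one image with its keywords and free-text caption."""
    image_id: str
    keywords: list[str]
    caption: str


def caption_words(caption: str) -> set[str]:
    """Lower-case the caption and split it into words, dropping punctuation."""
    return set(re.findall(r"[a-z0-9]+", caption.lower()))


def search(images: list[ImageRecord], term: str, search_captions: bool = False) -> list[str]:
    """Match a one-word term against keywords and, if enabled, against captions."""
    term = term.lower()
    hits = []
    for image in images:
        in_keywords = term in (k.lower() for k in image.keywords)
        in_caption = search_captions and term in caption_words(image.caption)
        if in_keywords or in_caption:
            hits.append(image.image_id)
    return hits


def narrow_by_caption(images: list[ImageRecord], hit_ids: list[str], term: str) -> list[str]:
    """Keep only the hits whose caption also contains `term`."""
    term = term.lower()
    return [img.image_id for img in images
            if img.image_id in hit_ids and term in caption_words(img.caption)]


images = [
    ImageRecord("img1", ["harbor", "boat"], "Fishing boats moored in the harbor at dawn."),
    ImageRecord("img2", ["harbor", "ship"], "Container ship leaving the harbor at night."),
    ImageRecord("img3", ["lake"], "A small harbor on a quiet lake at dawn."),
]

print(search(images, "harbor"))                        # ['img1', 'img2']
print(search(images, "harbor", search_captions=True))  # ['img1', 'img2', 'img3']
print(narrow_by_caption(images, ["img1", "img2"], "dawn"))  # ['img1']
```

With captions switched off, "img3" is only found if "harbor" is repeated in its keywords, which is exactly why important caption words should be copied into the keywords in that configuration.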
Another buyer raised the question of whether to include user-generated content. He said that tags added by users give him very good results, so stock photo companies should allow users, e.g. buyers, to add tags to photos they came across but whose metadata didn't contain the terms most useful for finding them. User-generated tags can help identify keywords that express what an image really boils down to. Some smaller agencies already seem to be working with tags.
One sentence summed up the challenges of keywording and image retrieval (or indeed of searching in general) particularly well: "Everybody has a different interpretation of what it is they're looking for and how they describe it, that's the hardest part."