Harvesting captions and other metadata to construct keywords is a seductive idea, because it appears to deliver meaningful keywording for free.  However, the approach has significant limitations.  Here are the most important problems:

1. The quality of the metadata fields harvested is often poor: a case of rubbish in, rubbish out.  Inaccurate and irrelevant information can end up enshrined in keywords as if it were useful and meaningful.

2. Even when you have a good caption, words can be harvested which lead searchers astray.  This can happen when, for example, place names and people’s names are confused.

3. Often the most useful keywords never appear in captions or other metadata fields; these include descriptions of clothing or concepts.  The result is a general dumbing down of keywording and a limit on what researchers can find, often to obvious, literal labels concentrated on names of people, places and objects.

4. Poor keywording can push the worst images to the top of search results, not just because those images have wrong or irrelevant keywords, but because some of the best or most relevant images have no metadata to harvest.
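
The problems listed above can be illustrated with a minimal sketch of naive caption harvesting. Everything in it is an assumption made for illustration: the captions, the image IDs, the stopword list and the `harvest_keywords` and `search` helpers are hypothetical and do not describe any particular product.

```python
import re

# Hypothetical stopword list, for illustration only.
STOPWORDS = {"a", "an", "and", "at", "in", "of", "on", "the", "with"}


def harvest_keywords(caption):
    """Naively split a caption into lowercase words and drop stopwords."""
    words = re.findall(r"[a-z']+", caption.lower())
    return {word for word in words if word not in STOPWORDS}


# Hypothetical captions: a person, a place, and an image with no caption.
captions = {
    "img_001": "Portrait of Jack London at his desk",
    "img_002": "Tower Bridge in London",
    "img_003": "",
}

# Keyword index built purely from harvested caption words.
index = {image_id: harvest_keywords(text) for image_id, text in captions.items()}


def search(term):
    """Return the IDs of images whose harvested keywords contain the term."""
    return sorted(i for i, keywords in index.items() if term.lower() in keywords)


# A search for the city also returns the portrait of the author.
print(search("london"))

# The uncaptioned image has no keywords, so no search can ever return it.
print(index["img_003"])
```

In this sketch a search for "london" matches both the city and the surname, the uncaptioned image is invisible however good it may be, and nothing about clothing or concepts is ever indexed because no caption mentions them.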

