[ CogSci Summaries home | UP | email ]

Gupta, A.[Amarnath], Jain, R.[Ramesh], Visual Information Retrieval, CACM(40), No. 5, May 1997, pp. 70-79.

@ARTICLE{GuptaJain97,
        AUTHOR = "Gupta, A. and Jain, R.",
        TITLE = "Visual Information Retrieval",
        JOURNAL = "CACM",
        VOLUME = "40",
        YEAR = "1997",
        NUMBER = "5",
        MONTH = "May",
        PAGES = "70-79"}

Author of the summary: Jim Davies, 2000, jim@jimdavies.org

Cite this paper for:

When people think of information retrieval, they usually think of text. But how do you retrieve images? You could request them with a textual query.

But images are first-class, information-bearing entities in their own right. There are two kinds of information associated with a visual object: metadata (alphanumeric information about the object) and visual features (information contained within the object itself). Visual features are obtained through computational processing (computer vision, image processing, etc.). (p72)

The simplest visual features are raw pixel data. Such information can be used to find color-shifted images, images with a given color in a given area, and so on.


At the other extreme are human-annotated images (e.g., "there is a tank here, there is a building here"). Most actual applications fall somewhere in between. (p73)


When color is attended to, you can answer questions like "Find all images in which more than 30% of the pixels are sky blue and more than 20% of the pixels are green" (an outdoor picture?). You can build a color histogram that gives the frequency distribution of colors in an image.
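The sky-blue/green query above can be sketched in a few lines. This is a toy illustration, not the paper's system: images are plain lists of (r, g, b) tuples, and the exact-match color tests and thresholds are my own assumptions.

```python
# Hypothetical sketch of a color-proportion query, in the spirit of the
# paper's "outdoor picture" example. An image is just a list of (r, g, b)
# pixel tuples; exact color matching is assumed for simplicity.

def color_histogram(pixels):
    """Frequency distribution of colors in the image."""
    hist = {}
    for p in pixels:
        hist[p] = hist.get(p, 0) + 1
    return hist

def fraction(pixels, predicate):
    """Fraction of pixels satisfying a color predicate."""
    return sum(1 for p in pixels if predicate(p)) / len(pixels)

def looks_outdoor(pixels):
    # Assumed exact-match tests; a real system would match color ranges.
    sky_blue = lambda p: p == (135, 206, 235)
    green = lambda p: p == (0, 128, 0)
    return fraction(pixels, sky_blue) > 0.30 and fraction(pixels, green) > 0.20
```

A real system would bucket nearby colors together rather than test exact RGB values, but the query logic is the same.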

By making a quadtree of histograms (compute a color distribution for each quadrant recursively, until the quadrants are 16x16 pixels or smaller), you can ask questions specific to areas of the image, e.g., find all images with red in the center and blue all around.
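The quadtree construction can be sketched directly from that description. This is a minimal illustration under my own representation (an image as a list of rows of color values), not the paper's data structure:

```python
# Hypothetical sketch of a quadtree of histograms: recursively split the
# image into four quadrants, keeping a color histogram at every node,
# stopping once a quadrant is 16x16 pixels or smaller.

def histogram(img):
    """Color frequency distribution over a 2-D image (list of rows)."""
    hist = {}
    for row in img:
        for c in row:
            hist[c] = hist.get(c, 0) + 1
    return hist

def quadtree(img):
    h, w = len(img), len(img[0])
    node = {"hist": histogram(img), "children": None}
    if h > 16 or w > 16:
        mh, mw = h // 2, w // 2
        node["children"] = [
            quadtree([row[:mw] for row in img[:mh]]),  # top-left
            quadtree([row[mw:] for row in img[:mh]]),  # top-right
            quadtree([row[:mw] for row in img[mh:]]),  # bottom-left
            quadtree([row[mw:] for row in img[mh:]]),  # bottom-right
        ]
    return node
```

A query like "red in the center" then only has to examine the histograms of the nodes covering the center of the image, instead of every pixel.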


Assume the images have pure colors and distinct shapes, like typical clip art. With images like this you can segment each image into a number of color regions, so that each region contains a connected set of points, all of the same color. For each segment you can then compute properties like color, area, elongation, and centrality. This lets you answer queries like "find all images with two blue circles." (p74)
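The segmentation step can be sketched as a flood fill over same-color neighbors. This is my own minimal version (4-connected neighbors, computing color, area, and centroid per region), not the paper's algorithm; properties like elongation would be computed analogously from the region's points:

```python
# Hypothetical sketch: segment a pure-color, clip-art-style image into
# connected same-color regions via flood fill, then compute per-region
# properties (color, area, centroid).

def segment(img):
    h, w = len(img), len(img[0])
    seen = [[False] * w for _ in range(h)]
    regions = []
    for y in range(h):
        for x in range(w):
            if seen[y][x]:
                continue
            color = img[y][x]
            stack, points = [(y, x)], []
            seen[y][x] = True
            while stack:
                cy, cx = stack.pop()
                points.append((cy, cx))
                # 4-connected neighbors of the same color join the region.
                for ny, nx in ((cy-1, cx), (cy+1, cx), (cy, cx-1), (cy, cx+1)):
                    if 0 <= ny < h and 0 <= nx < w and not seen[ny][nx] \
                            and img[ny][nx] == color:
                        seen[ny][nx] = True
                        stack.append((ny, nx))
            area = len(points)
            centroid = (sum(p[0] for p in points) / area,
                        sum(p[1] for p in points) / area)
            regions.append({"color": color, "area": area, "centroid": centroid})
    return regions
```

A query like "two blue circles" would then filter the region list by color and by a shape property such as circularity.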

face retrieval

At the MIT Media Lab they have an eigenface database: each face is processed and described by 20 eigenfeatures, which can represent any face. As transformations become more meaningful, they become more difficult to automate; completely automated image analysis is possible only in small, controlled domains.
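The core eigenface operation is projecting a face image (flattened to a vector) onto a set of precomputed eigenvectors, so the face is described by just its coefficients (20 of them in the system described above). The sketch below uses toy two-dimensional "eigenfaces" that are illustrative, not learned from data:

```python
# Hypothetical sketch of the eigenface idea: describe a face vector by its
# coordinates on a small set of precomputed eigenfaces, then compare faces
# by distance in that low-dimensional coefficient space. The eigenfaces
# here are made-up unit vectors, not the result of actual PCA.

def project(face, mean, eigenfaces):
    """Coefficients of the mean-centered face on each eigenface."""
    centered = [f - m for f, m in zip(face, mean)]
    return [sum(c * e for c, e in zip(centered, ef)) for ef in eigenfaces]

def distance(a, b):
    """Euclidean distance between two coefficient vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5
```

Face retrieval then becomes nearest-neighbor search over 20 numbers per face rather than over raw pixels.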


Most systems treat video as a series of still images, but this fails to take advantage of the motion in the video. Videos contain three kinds of motion information: motion due to movement of objects within a scene, motion due to movement of the camera, and motion due to special effects.
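The simplest way to exploit motion rather than treating frames independently is frame differencing. This sketch is my own illustration, not a method from the paper: object motion shows up as a localized changed region, while camera motion changes nearly every pixel.

```python
# Hypothetical sketch: flag pixels that change between consecutive
# grayscale frames. A small changed fraction suggests object motion; a
# frame-wide change suggests camera motion or a cut/special effect.

def motion_mask(prev, curr, threshold=10):
    """Per-pixel change mask between two grayscale frames."""
    return [[abs(c - p) > threshold for p, c in zip(prow, crow)]
            for prow, crow in zip(prev, curr)]

def changed_fraction(mask):
    """Fraction of pixels flagged as changed."""
    total = sum(len(row) for row in mask)
    return sum(sum(row) for row in mask) / total
```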

the query

PICQUERY is a language for formulating queries over images. Another approach is query by example: the user sketches an image with a kind of drawing system, and the sketch can then be modified to refine the query. A good query language should include the following:
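Query by example ultimately reduces to ranking stored images by similarity to the example. The sketch below uses L1 distance between color histograms as the similarity measure; that choice is mine, not necessarily the paper's:

```python
# Hypothetical sketch of query-by-example: rank database images by how
# close their color histograms are to the example image's histogram.
# Images are lists of color values; L1 histogram distance is assumed.

def hist_of(pixels):
    h = {}
    for p in pixels:
        h[p] = h.get(p, 0) + 1
    return h

def l1_distance(h1, h2):
    keys = set(h1) | set(h2)
    return sum(abs(h1.get(k, 0) - h2.get(k, 0)) for k in keys)

def query_by_example(example, database):
    """Return database image names, best match first."""
    target = hist_of(example)
    return sorted(database,
                  key=lambda name: l1_distance(target, hist_of(database[name])))
```

A full system would combine several such feature distances (color, shape, texture), but the rank-by-distance structure is the same.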

Summary author's notes:

Back to the Cognitive Science Summaries homepage
Cognitive Science Summaries Webmaster:
JimDavies (jim@jimdavies.org)
Last modified: Thu Jan 27 09:22:13 EST 2000