
Other Visual Analogy Models

In the previous section I discussed other analogy models. This section narrows the focus to visual analogy models.

ANALOGY is an early visual analogy program (Evans, 1968). It solves multiple-choice analogy problems of the kind found on intelligence tests (e.g. A:B::C:?). It does this by describing how A is transformed into B, then how C is transformed into each of the answer choices. It matches the semantic net describing the A-to-B transformation against the nets of the choices; the best match determines ANALOGY's answer. Like my theory, ANALOGY had a visual language consisting of primitives (e.g. dot, circle, square, rectangle, triangle), relations (above, left-of, inside), and transformations (rotate, reflect, expand, contract, add, delete). My theory's ontology has considerable overlap with ANALOGY's.
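To make the selection step concrete, here is a minimal sketch of ANALOGY's strategy. Everything in it is hypothetical: Evans's program matched semantic nets, whereas this sketch flattens each transformation into a set of facts and scores candidates by overlap.

    # A minimal sketch of ANALOGY's selection strategy (all names are
    # hypothetical; Evans's program used semantic nets, not flat sets).

    def describe_transformation(figure_a, figure_b):
        """Return a set of transformation facts that turn one figure
        into the other, e.g. {('delete', 'dot')}."""
        added = figure_b - figure_a
        deleted = figure_a - figure_b
        return {('add', e) for e in added} | {('delete', e) for e in deleted}

    def solve(a, b, c, choices):
        """Pick the choice whose C-to-choice transformation best matches
        the A-to-B transformation (fact overlap as a crude match score)."""
        rule = describe_transformation(a, b)
        def score(choice):
            candidate = describe_transformation(c, choice)
            return len(rule & candidate) - len(rule ^ candidate)
        return max(choices, key=score)

    # Usage: figures as sets of primitive descriptions.
    a = {'circle', 'dot'}
    b = {'circle'}                    # A-to-B: the dot is deleted
    c = {'square', 'dot'}
    choices = [{'square'}, {'square', 'dot', 'triangle'}]
    print(solve(a, b, c, choices))    # -> {'square'}: the dot is deleted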

ANALOGY differs from my theory in many ways. It has no sense of absolute location in its visual representation. It describes only meaningless images, without any tie to what they represent (indeed, the domain is intentionally non-representational). It can only describe transformations that occur in a single step; that is, it cannot represent a series of transformations that must be done in order. It has no sense of transfer, in which transformations are carried over to other analogs. Because of its domain, it also does not deal with retrieval issues.

GeoRep (Ferguson & Forbus, 2000) takes in line drawings and outputs the visual relations in them, using the LLRD (low-level relational describer). Its visual primitives are line segments, circular arcs, circles, ellipses, splines, and text strings. It finds relations of the following kinds: grouping, proximity detection, reference frame relations, parallel lines, connection relations, polygon and polyline detection, interval relations, and boundary descriptions. Then the HLRD (high-level relational describer) finds higher-level, more domain-specific primitives and relations. GeoRep's content theory is at the low level; the higher-level primitives are left up to the modeler. My theory includes GeoRep's primitives, except for splines, which would be modeled in my theory with connected lines and curves.
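The two-stage architecture can be sketched roughly as follows. The function and rule names here are hypothetical, and the real LLRD and HLRD are far richer; the sketch only shows the division of labor, with the low level domain-independent and the high level supplied by the modeler.

    # A rough sketch of GeoRep's two-stage architecture (hypothetical
    # names; only the LLRD/HLRD division of labor is faithful).

    def llrd(primitives):
        """Low-level relational describer: derive domain-independent
        relations, here just parallelism between line segments."""
        relations = []
        for i, p in enumerate(primitives):
            for q in primitives[i + 1:]:
                if p['kind'] == q['kind'] == 'segment' and p['angle'] == q['angle']:
                    relations.append(('parallel', p['id'], q['id']))
        return relations

    def hlrd(relations, rules):
        """High-level relational describer: apply modeler-supplied rules
        that rewrite low-level relations into domain-specific ones."""
        return [rule(r) for r in relations for rule in rules if rule(r)]

    # Usage: two parallel segments read as, say, the rails of a ladder.
    primitives = [{'id': 's1', 'kind': 'segment', 'angle': 90},
                  {'id': 's2', 'kind': 'segment', 'angle': 90}]
    rails = lambda r: ('rails', r[1], r[2]) if r[0] == 'parallel' else None
    print(hlrd(llrd(primitives), [rails]))   # -> [('rails', 's1', 's2')]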

Like my theory, LetterSpirit is a model of analogical transfer (McGraw & Hofstadter, 1993). It takes a stylized seed letter as input and outputs an entire font in the same style. It does this by determining which letter is presented, determining how its components are drawn, and then drawing the same components of the other letters the same way. As in Galatea, the analogies between letters are already in the system: the vertical bar of the letter d maps to the vertical bar of the letter b, for example. A mapping is created for the input character. For example, the seed letter may be interpreted as an f with the crossbar suppressed. When the system makes a lower-case t, by analogy, it suppresses the crossbar. This is only theoretical: LetterSpirit never worked as a computer program.

LetterSpirit transfers single transformations/attributes (e.g. crossbar-suppressed) and therefore cannot make analogical transfer of procedures (e.g. moving something, then resizing it) as my theory can. In contrast, one can see how Galatea might be applied to the font domain. A stylistic guideline in LetterSpirit, such as ``crossbar suppressed,'' is like a visual transformation in my theory: it would be a transformation that removes an element from the image, where the element is the crossbar and the image is a prototype letter f. The transformation could then be applied to the other letters one by one, as in the sketch below. In this way my theory has more generality than LetterSpirit.
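Here is a hypothetical sketch of that idea, assuming letters are represented as sets of named components (a simplification of both systems): the transformation is inferred from the seed letter and then transferred to every other letter.

    # A hypothetical sketch of a Galatea-style transformation in the
    # font domain: learn "remove the crossbar" from the seed letter,
    # then apply the same removal to the other letters one by one.

    prototype_font = {'f': {'vertical-bar', 'hook', 'crossbar'},
                      't': {'vertical-bar', 'crossbar'}}

    def infer_transformation(prototype, seed):
        """Compare the stylized seed letter to its prototype; the
        difference is the set of removed elements (the crossbar)."""
        return prototype - seed

    def apply_to_font(font, removed):
        """Transfer the same removal to every other letter."""
        return {letter: parts - removed for letter, parts in font.items()}

    seed = prototype_font['f'] - {'crossbar'}    # seed: an f, no crossbar
    removed = infer_transformation(prototype_font['f'], seed)
    print(apply_to_font(prototype_font, removed))
    # -> f keeps only its bar and hook; t keeps only its bar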

Galatea does not generate the analogical mapping, but other systems that create mappings from visual information show that it can be done. The VAMP systems are analogical mappers (Thagard et al., 1992). VAMP.1 uses a hierarchically organized symbol/pixel representation; it superimposes two images and reports which components have overlapping pixels. VAMP.2 represents images as agents with local knowledge, and mapping is done using ACME/ARCS (Holyoak & Thagard, 1997). The mapping for the radiation problem was one of the examples to which VAMP.2 was applied.
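VAMP.1's superimposition step is simple enough to sketch. The representation below is hypothetical (components as sets of pixel coordinates, with the fortress/tumor example standing in for the radiation problem), but the mechanism is the one described: two components correspond if their pixels overlap when the images are superimposed.

    # A minimal sketch of VAMP.1-style superimposition: each image maps
    # component symbols to sets of pixels; two components correspond if
    # their pixel sets overlap when the images are superimposed.

    def overlapping_components(image1, image2):
        """Superimpose the two images and report which component pairs
        share at least one pixel."""
        return [(name1, name2)
                for name1, pixels1 in image1.items()
                for name2, pixels2 in image2.items()
                if pixels1 & pixels2]

    fortress = {'target': {(5, 5)}, 'roads': {(1, 5), (3, 5), (7, 5)}}
    tumor    = {'tumor':  {(5, 5)}, 'beams': {(3, 5), (7, 5)}}
    print(overlapping_components(fortress, tumor))
    # -> [('target', 'tumor'), ('roads', 'beams')]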

MAGI (Ferguson, 1994) uses mechanisms from SME. It takes visual representations and uses SME to find examples of symmetry and repetition within a single image. JUXTA (Ferguson & Forbus, 1998) uses MAGI in its processing of a two-part diagram and a representation of the caption. It outputs a description of what aligns with what, distracting differences, and important differences. It models how humans understand repetition diagrams.
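MAGI's key move, mapping a description against itself, can be caricatured as follows. This is only a crude stand-in: SME performs full structural alignment, whereas this sketch merely pairs distinct facts that share a relation name.

    # A crude sketch of MAGI's key idea (details hypothetical): detect
    # repetition by mapping an image's relational description against
    # itself, disallowing identity correspondences.

    def self_map(facts):
        """Pair up distinct facts sharing a relation name, a stand-in
        for SME's structural alignment of an image with itself."""
        pairs = []
        for i, (rel1, args1) in enumerate(facts):
            for rel2, args2 in facts[i + 1:]:
                if rel1 == rel2 and args1 != args2:
                    pairs.append((args1, args2))
        return pairs

    # Two arrows drawn side by side: the repeated structure aligns.
    facts = [('above', ('head1', 'shaft1')),
             ('above', ('head2', 'shaft2'))]
    print(self_map(facts))  # -> [(('head1','shaft1'), ('head2','shaft2'))]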

Like my theory, MAGI, JUXTA, and the VAMPs use visual knowledge. But unlike my theory, their focus is on the creation of the mapping rather than on the transfer of a solution procedure. MAGI's theory and mine are compatible: a MAGI-like system might be used to create the mappings that my theory uses to transfer knowledge. The theory behind the VAMPs is incompatible because they use a different level of representation for the images.

