
Kosslyn, S. M. (1994) Image and Brain: The Resolution of the Imagery Debate. MIT Press, Cambridge, MA.

  author =       "Stephen M. Kosslyn",
  year =         "1994",
  title =        "Image and Brain: The Resolution of the Imagery Debate",
  publisher =    "MIT Press, Cambridge, MA", 

Author of the summary: Jim Davies, 2000, jim@jimdavies.org

Cite this paper for:


Detailed Outline

Chapter 1: Resolving the Imagery Debates

Imagery was crucial in psychological theory until psychology and philosophy parted ways. Imagery then largely left psychology because it was difficult to observe objectively. (p1) Images were also disliked because they seemed to some to require a homunculus. Paivio (1971) says that your ability to remember words depends on your ability to visualize their referents. (p2) The imagery debate has three phases: (p4)
  1. What type of representation is used in imagery?
  2. What is the nature of the empirical results? Were there methodological problems with them?
  3. A response to the above 2 with brain data
The representation debate was not at the level of neurons. It's like the computer: You can have a list structure in a computer program, even though there is no actual list in the computer. What is important is that "the machine specifies the information in a way that functions as a list." [p4]

"A propositional representation is a "mental sentence" that specifies unambiguously the meaning of an assertion." [p5] Propositions must have relations that connect entities. The basic elements of propositions are symbols.

In contrast to the propositional representation is the depictive representation, which "specifies the locations and values of points in space." The points need not be physically near each other; this is just the way the information is processed. An array in a computer is a depictive representation. "The spatial relations among these patterns in the functional space correspond to the spatial relations among the parts themselves." One characteristic of depictive representations is that size and orientation must be specified. The basic element is a point with a location and value. [p6]
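To make the contrast concrete, here is a minimal sketch (not from the book) of one scene stored both ways. The scene itself and the names `propositions`, `grid`, `cells` and `is_above` are illustrative assumptions.

```python
# Illustrative sketch only: the scene and all names are assumptions, not Kosslyn's.

# Propositional: symbols joined by an explicit relation, like a "mental sentence".
propositions = [("ON", "ball", "box")]

# Array-style: each cell is a point with a location and a value.
# "." is empty space, "o" is part of the ball, "#" is part of the box.
grid = [
    "..o..",
    ".###.",
    ".###.",
]

def cells(symbol):
    """Return the (row, col) locations of every point holding this value."""
    return [(r, c) for r, row in enumerate(grid)
            for c, value in enumerate(row) if value == symbol]

def is_above(a, b):
    """Read a spatial relation off the array; it was never stored explicitly."""
    return max(r for r, _ in cells(a)) < min(r for r, _ in cells(b))

# The propositional form states the relation outright.
print(("ON", "ball", "box") in propositions)
# The array form leaves the relation implicit in the layout.
print(is_above("o", "#"))
```

In the array version the relation has to be computed from point locations, which is the sense in which spatial relations in the functional space mirror those among the parts.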

During this debate all parties agreed that:

The debate was about whether mental images rely on depictive representations or whether they are purely propositional.

Pylyshyn (1973) argued for a strictly propositional representation because of the homunculus problem (no little man to see it, no light, etc.). He also argued for amodal representations. Kosslyn & Pomerantz (1977) said that depictive representations were not ruled out.

Then began the debate over the meaning of the experiments. Kosslyn (1973) found that it takes longer to scan long distances in mental images. Anderson and Bower (1973) created a propositional account of this saying that distant objects in an image are further apart in a semantic net. [p7] Further experiments showed that it was distance per se that did it. The propositionalists countered that dummy propositions could be inserted to account for the distance. How lame.

Essentially, the propositional theory could take many forms, and many additions were post hoc and ad hoc. Propositions can do just about anything. In contrast, the depictive theory did not change much and in addition was more specific in its representational claims. Propositionalists still defend their position.

There were methodological attacks as well. Task demands were brought up but ultimately found to be insubstantial. Expectancy effects are there, but they change the means, not the differences found.

Subjects have no control over the time it takes to change attention across an imaged object. [p11] Anderson (1978) proved (though some thought he was wrong) that a propositional representation will always be able to account for depictive data due to structure-process trade-offs. [p12] The way out was to use non-behavioral data.

There are neurons in the cortical area that roughly preserve the structure of the retina (in V1). Specific brain damage can cause perceptual blind spots.

"Almost every visual area in the monkey that sends fibers to another visual area also receives fibers from that area." These fibers are of comparable sizes, suggesting that stored visual information can produce a mental image. [p15]

People who ignore things on one side also ignore things on that side of their mental imagery. (Bisiach & Luzzatti 1978).

[p17] The same brain areas are active during both perception and mental imagery.

When letters are imaged small, it takes longer to make judgements about their shape. Kosslyn's explanation: when imaged small, the neurons in that area are forced to work harder-- when things are imaged large, an area will likely only have a single element associated with it. When imaged small, the foveal area is active in the brain, and when imaged large the parafoveal area is more active. [p18]

The smallest discernible letters are a fraction of a degree big in imagery. [p19] Kosslyn estimated the size of the largest visual images to be 16.5 to 20.5 degrees of visual angle (for rectangles, 12.5 for animal line drawings). Larger images cause the experience of "overflow."

[p20] Imagery in V1 is not epiphenomenal because damage there causes problems with imaging.

Chapter 2: Carving a system at its joints

Computation: interpretable, systematic i/o [p26]

This book assumes that brain computations are performed by connectionist networks. They are good at constraint satisfaction. (If the bed goes here, the nighttable must go here.) But the internal workings of the networks are not intended to model the internal workings of the visual subsystems-- they are black box i/o systems, and that is the level at which they should be analyzed. This book's theory is of what the subsystems are in terms of i/o. The networks are feedforward backprop ones. [p27]

[p29] The subsystems discussed here are not "modular" in Fodor's sense because:

This book assumes that subsystems always operate (no central executive turns them on.) [p32] cooperative computation: When a subsystem later in a process informs an earlier subsystem. [p32]

Principle of the division of labor: An adaptationist heuristic that says that two different subsystems are likely to exist if it is more efficient for two systems rather than one to do two tasks. [p33]

Marr's (1982) aspects of computation: (what is computed-- the top of the Marr hierarchy.)

  1. specify the goal of the computation
  2. characterize the problem itself (e.g. which parts of input to pay attention to)
  3. characterize operation necessary to derive output from input
  4. specify the assumptions that must be met
The other two, lower levels of the hierarchy are how it is computed (algorithmic level) and how that algorithm is implemented (implementation level). (Marr 1982)

Kosslyn has a problem with the computation/algorithm distinction. It is fuzzy because, for example, addition is a part of multiplication's algorithm. But isn't addition supposed to be a computation? Computations can have other computations in their algorithm, and it's not clear what the correct perspective is-- it depends on how you look at it. [p36] He also has problems with the algorithm/implementation distinction. Kosslyn has a triangle model which recommends rough sketches of abilities, computation and brain function, in which all can inform the others. [p38] Zero-crossings: a place in an image where light turns to dark or vice-versa. [p34]
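As a rough illustration of the zero-crossing idea, the sketch below finds the places in a single row of pixel intensities where the second difference changes sign. It is a simplification under stated assumptions: Marr's scheme smooths the image first, which is omitted here, and the function names and sample rows are made up for the example.

```python
# Illustrative sketch only: 1-D, no smoothing step, hypothetical names and data.

def second_difference(row):
    """Discrete second derivative; entry k belongs to pixel k + 1."""
    return [row[i - 1] - 2 * row[i] + row[i + 1] for i in range(1, len(row) - 1)]

def zero_crossings(row):
    """Return each pixel index p where the second difference changes sign
    between pixel p and pixel p + 1, i.e. where light meets dark."""
    d2 = second_difference(row)
    return [k + 1 for k in range(len(d2) - 1) if d2[k] * d2[k + 1] < 0]

dark_to_light = [10, 10, 10, 90, 90, 90]
stripe = [10, 10, 90, 90, 10, 10]

print(zero_crossings(dark_to_light))  # one edge
print(zero_crossings(stripe))         # two edges, one on each side of the stripe
```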

Empirical methods used

  1. Response times and Error Rates.
    strengths: Easy to carry out.
    weaknesses: data is underconstraining. [p40] Many theories consistent with data. unclear: should you use response times or logs of them?
  2. Divided-Visual-Field Studies (show something on one side of visual field so that hemisphere processes it first; see if one side is faster than the other.)
    strengths: If it's better on one side (perhaps because of degraded info passing over the corpus callosum), evidence that subsystems are not identical. Can be run on normals.
    weaknesses: might be overwhelmed by attention. Left hemisphere attends to smaller areas [p41]. Historically difficult to replicate.
  3. Deficits following brain damage
    strengths: Surprising results.
    weaknesses: Damaging an area does not necessarily mean the others are working normally. Other systems might adapt. New strategies might be created. Also, brain-damaged people are slower in general; this might make them unable to do some things at all, in particular high-level cognition. Gregory's (1966) example: Removing a resistor from a radio makes it squawk, but it doesn't follow that the function of a resistor is to suppress squawks.
  4. EEG (electroencephalography), ERP (event-related potentials, amplitude measured over time), MEG (magnetoencephalography, recording of magnetic fields) (all these are recording electrical activity from scalp nodes) [p45]
    strengths: good temporal resolution, noninvasive, inexpensive (except MEG)
    weaknesses: Where are the signals coming from? Distortion by scalp and skull (except MEG).
  5. 133Xe rCBF (regional cerebral blood flow. Subject inhales radioactive gas. Active parts of the brain absorb it more; the radioactivity is recorded.)
    strengths: widely available, easy and inexpensive
    weaknesses: poor spatial and temporal resolution. Subject must do it 5 minutes per image.
  6. SPECT (intake isotopes that emit photons)
    strengths: easy, inexpensive, widely available
    weaknesses: long lasting, experiments one week apart. not spatially or temporally sensitive.
  7. PET (Positron Emission Tomography, same as SPECT but uses coincidence detection)
    strengths: better resolution than SPECT. Can do many scans in a single session.
    weaknesses: expensive, invasive, exotic equipment.
  8. fMRI (functional magnetic resonance imaging, measures blood flow and oxygen use)
    strengths: good resolution, inexpensive, non-invasive
    weaknesses: not as sensitive as PET
    strengths: observe neural activity in a living being
    weaknesses: cannot distinguish excitatory from inhibitory processes. Efficiency in processing may mean that blood flow does not show up where the processing is happening. Averaging across subjects means individual differences are lost. Subtracting is difficult because the brain is always doing something.
  10. Animal brain lesions [p49]
    strengths: control of what gets destroyed. For imagery, monkeys have similar visual abilities to humans and similar brains.
    weaknesses: all the weaknesses of human brain damage. Differences between human and non-human brains.
  11. Animal single-cell recording studies
    weaknesses: What aspects of the stimulus are driving the cell? How do you know that some untested stimulus will not provide the max response? Response rate or modulation of frequency?
dissociation: when patient p can do task x but not y. [p42]

double dissociation: When 2 patients have opposite dissociations. Some use this to imply that there is at least one subsystem they do not share.

association: if a difficulty in performing x is always accompanied by trouble in doing y. Some infer that at least one subsystem is shared by x and y.
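The three definitions above can be written as simple checks over patient data. This is a minimal sketch; the patients, the task names and the `can_do` table are hypothetical and exist only to exercise the definitions.

```python
# Illustrative sketch only: patients, tasks and all names are hypothetical.

# Which tasks each patient can still perform.
can_do = {
    "p1": {"naming", "copying"},
    "p2": {"reaching"},
    "p3": {"naming", "copying", "reaching"},
}

def dissociation(patient, x, y):
    """The patient can do task x but not task y."""
    return x in can_do[patient] and y not in can_do[patient]

def double_dissociation(p, q, x, y):
    """Two patients show opposite dissociations on the same pair of tasks."""
    return dissociation(p, x, y) and dissociation(q, y, x)

def association(x, y):
    """Every patient who has trouble with x also has trouble with y.
    Requires at least one impaired patient so the claim is not vacuous."""
    impaired = [p for p, tasks in can_do.items() if x not in tasks]
    return bool(impaired) and all(y not in can_do[p] for p in impaired)

print(dissociation("p1", "naming", "reaching"))
print(double_dissociation("p1", "p2", "naming", "reaching"))
print(association("naming", "copying"))
```

The checks only describe the pattern in the data; the inference to shared or separate subsystems is the further step that, as noted above, only some researchers are willing to make.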

Chapter 3: High-level vision

Low level vision is understood better than high-level. [p53] Low level vision (line detection, etc.) depends on the stimulus and does not involve previously stored information. Imagery depends on high-level vision.

Segal & Fusella 1970: visual imagery interferes with visual perception, auditory imagery interferes with audio perception [p55]

Intraub & Hoffman (1992): People sometimes confuse what was seen and what was imagined. (called reality monitoring)

Farah (1985): forming an image of a shape facilitates perception of that shape

Finke & Schmidt 1978: The McCollough effect ("orientation specific color aftereffects") can be induced by mental imagery [p56]. There were complementary color effects. The imagery-induced effect transfers from one eye to the other, whereas the perceptual version of the McCollough effect does not.

Berbaum & Chung 1981: optical illusions like the Müller-Lyer illusion can be generated with imagery. [p57]

There are neuropsychological results suggesting that imagery and perception share processing mechanisms.

Bisiach, Luzzatti & Perani 1979: Patients who, due to brain problems, ignore one side of their visual field have similar effects with mental imagery. [p59] "It is no small feat to formulate a theory that is consistent with even the most general characterization of our abilities," referring to visual perception. [p60]

Five classes of abilities for object identification:

Basic neuroanatomical constraints on a perceptual protomodel: [p64]

The retina is sensitive to light. It sends information through the ganglion cells, specifically the magnocellular (M) cells, which are for motion and transient stimulation, and parvocellular (P) cells, which respond to color and have better resolution. They go to distinct layers of the lateral geniculate nucleus (LGN). There are several ganglion pathways. The geniculostriate pathway, which prevents object recognition if cut, connects to the LGN and then to V1 (V1 is aka the primary visual cortex, striate cortex, area 17, area OC, or the visual cortex). The tectopulvinar pathway (aka tectofugal pathway), which disables orientation upon whisker brushing if cut, hits the superior colliculus and then the pulvinar, and then diffusely goes to the cortex. A full 32 areas of the macaque brain are involved with visual information processing. It is estimated that they reciprocally connect to about 15 other areas (Felleman and Van Essen 1991). There is a lot of information flowing backward. [p66]

There are about 15 areas that somewhat preserve retinal spatial mapping. Roughly speaking, the higher you go in the visual system, the less spatial mapping there is. [p67]

The protomodel: [p69]

 --------information lookup---------------->Attention shifting
|             ^                                    |
|             |                                    |
|           Associative                            |
|            memory      <--Spatial                |
|                ^          properties <-----------|---Visual Buffer
|                |          encoding               |    	    
|                |                                 |
 ---------------------object properties encoding <-Attention window
Template matching theory is problematic because we see absolutely novel positions and shapes all the time and have no trouble identifying what it is we are looking at. [p70]

Features of our innate visual architecture:

Visual buffer: Distinguishes figure from ground, edge detection and regions of homogeneous value.

Attention Window: We can select a contiguous set of points for deep processing. This may help identify objects at different parts of the images.

Ventral system: Shape, color, texture.

dorsal system: location and size (occipital lobe to parietal lobe). Good for guiding actions (like reaching and eye movement)

Associative memory: (implemented in the posterior, superior temporal lobes) where the ventral and dorsal systems output to. [p73] Match here to stored information. Things from other sensory systems come here too.

Information lookup: further, directed perception in response to not knowing exactly what is being looked at.

attention shifting: two parts: moving the body, eyes, etc. and priming the right part of the visual field for receipt of the expected image.
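The flow among these components can be sketched as a small directed graph. The edge list below is one reading of the diagram and the component descriptions, so treat it as an assumption rather than Kosslyn's exact wiring; `PROTOMODEL` and `downstream` are hypothetical names.

```python
# Illustrative sketch only: the edge list is an assumed reading of the protomodel.
from collections import deque

PROTOMODEL = {
    "visual buffer": ["attention window"],
    "attention window": ["object properties encoding",
                         "spatial properties encoding"],
    "object properties encoding": ["associative memory"],
    "spatial properties encoding": ["associative memory"],
    "associative memory": ["information lookup"],
    "information lookup": ["attention shifting"],
    "attention shifting": ["attention window"],
}

def downstream(start):
    """Return every component reachable from start (breadth-first search)."""
    seen = set()
    queue = deque(PROTOMODEL.get(start, []))
    while queue:
        node = queue.popleft()
        if node not in seen:
            seen.add(node)
            queue.extend(PROTOMODEL.get(node, []))
    return seen

# Input reaches associative memory through both encoding systems.
print("associative memory" in downstream("visual buffer"))
# Information lookup feeds back to the attention window, closing a loop.
print("attention window" in downstream("information lookup"))
```

The loop through information lookup and attention shifting is what lets the system take a second, directed look when the first pass does not identify the object.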

The protomodel of visual mental imagery [p74]

"A visual mental image is a pattern of activation in the visual buffer that is not caused by immediate sensory input."

Questions about visual information result in imagery and perception. The perception is the same as if it came from the sense organs.

[p74] Mental images are processed the same way that perceived images are with three exceptions:
1. Mental Images fade rapidly.
2. Mental images are created from stored information
3. Mental images are malleable (can be rotated, etc.)

[p75] "The fact that objects can be recognized guarantees that visual information must be stored in memory."

This is how images come about: things in memory are activated so much that their lower level connections to the visual buffer are activated.

The areas that appear to store visual information in monkeys are not topographically organized.

To image new situations, access the item in memory, put it where the attention was. Move the attention window to the next place.

[p76] Images can be manipulated and re-interpreted. (guess I wasn't the first one with that idea...)

Chapter 4: Identifying Objects in different locations

[p79] How do we recognize objects when they are on different parts of the retina?

Here are two extreme theories: 1. When something is seen, a memory of it is created at each location, and future recognition is with the saved image at that location (McClelland & Rumelhart). 2. A seen object is transformed into a standard representation and then recognized (Hinton 1981). There are various intermediate theories too.

[p82] Kosslyn et al did a neural net experiment to see which was better-- having one neural net do identity and location or two smaller ones do it (the split condition). It turns out that location is easier (for one thing, it's linearly separable). When more hidden units were given to the identification, the split version outperformed the unsplit.

[p83] The nets created their own feature detectors.

[p84] The unsplit network did not optimize, which may be why the brain has two physically separate systems.

[p84] The most efficient split network has 3.5 times more hidden units for identity mapping than for location mapping, which is consistent with the inferior temporal lobe having more neurons than the parietal lobe. He does not mean to say that units are neurons, but that the relative difficulty might be the same for artificial and real neural networks.
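A small sketch of what "split" means structurally, and of the 3.5 to 1 allocation. The total of 18 hidden units and the 9 identity plus 9 location output units are illustrative assumptions, as are the function names; no training is done here.

```python
# Illustrative sketch only: unit counts and names are assumptions.

def split_hidden_units(total, ratio=3.5):
    """Divide hidden units so identity gets `ratio` times as many as location."""
    identity = round(total * ratio / (1 + ratio))
    return identity, total - identity

def split_mask(n_identity_hidden, n_location_hidden, n_identity_out, n_location_out):
    """Connectivity of a split network: one row per hidden unit, one column per
    output unit, 1 where a connection exists. Each hidden unit feeds only the
    outputs for its own task."""
    mask = []
    for _ in range(n_identity_hidden):
        mask.append([1] * n_identity_out + [0] * n_location_out)
    for _ in range(n_location_hidden):
        mask.append([0] * n_identity_out + [1] * n_location_out)
    return mask

identity_hidden, location_hidden = split_hidden_units(18)
mask = split_mask(identity_hidden, location_hidden, 9, 9)

split_connections = sum(map(sum, mask))
unsplit_connections = 18 * (9 + 9)  # every hidden unit feeds every output

print(identity_hidden, location_hidden)
print(split_connections, unsplit_connections)
```

In the unsplit case every hidden unit must serve both tasks at once, which is one way to picture why the two mappings could interfere with each other.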

[p85] You can move the attention window like you move the eyes in visual perception. (but see caveats later)

[p86] "The spatial layout of the visual buffer is useful because it allows representations within the buffer to contain an enormous amount of implicit information about spatial properties; such information can be used for numerous purposes downstream."

The difference between high and low level processes is that only high level processes make use of stored information.

[p87] Spitzer et al found that orientation cells in monkeys can get more sensitive when the task demands it. Kosslyn cites this as evidence that higher level processes can affect the workings of lower level ones.

Haenny et al found that touching a pattern didn't activate cells in V4 normally but did when associated with matching to a visual stimulus.

[p89] The attentional window can move faster than the eye (Hoffman 1974). Julesz estimated it takes 30-50 ms to move covert attention. But shape can start to be somewhat encoded before the switch is complete.

[p90] Anderson and Van Essen 1987 made a neurally plausible algorithm for moving the attentional window.

Gradient of attention theory holds that it isn't like a spotlight per se, but has an intense point that gradually drops off to the sides. There is also a quadrant theory.
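The difference between the spotlight and gradient views can be shown with two toy functions of distance from the attended point. The Gaussian falloff, the parameter names and the default values are assumptions chosen for illustration; the source does not specify the shape of the gradient.

```python
# Illustrative sketch only: the Gaussian shape and all parameters are assumptions.
import math

def spotlight(distance, radius=1.0):
    """All-or-none attention: full strength inside the beam, nothing outside."""
    return 1.0 if abs(distance) <= radius else 0.0

def gradient(distance, width=1.0):
    """Graded attention: peaks at the focus and drops off smoothly to the sides."""
    return math.exp(-(distance ** 2) / (2 * width ** 2))

for d in (0.0, 0.5, 1.5, 3.0):
    print(d, spotlight(d), round(gradient(d), 3))
```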

[p92] Stimulus-based attention shifting: How does the system know where to move the eyes or attentional window? We tend to focus on brightly colored, large or moving objects.

[p93] The tectopulvinar pathway (of which the superior colliculus is a part) seems to play a key role in this. But it's complicated. Surprise surprise.

[p94] The attention window can change its size as well. There is a scope/resolution tradeoff that goes on (p96).

[p95] The incremental change in size happens in constant time.

[p101] Human eyes can move in about 250ms, so it's important that the visual buffer clears itself in that time to be ready for the next stimulus.

[p102] Images can be translated across the visual buffer. (can both be moved at once for more speed???)

[p103] How do we distinguish imagery from perception? One way is that the stimulus-based attention shifting is not at work in imagery. Another reason is that imagery is a controlled process. Another is that images fade quickly, and are not as vivid.

Chapter 5: Identifying objects when different portions are visible

[p105] How do we recognize objects from multiple vantage points, and sometimes in novel positions?

[p107] The ventral system breaks into 2 subsystems: the preprocessing subsystem and the Pattern activation subsystem.

The preprocessing subsystem takes as input an image from the attention window and extracts specific properties.

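Lowe (1985): nonaccidental properties are image features, such as lines that look straight in the image, that are unlikely to arise by an accident of viewpoint and so probably reflect real properties of the object (apparent lines that are actually lines).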

The preprocessor sends nonaccidental properties and the image itself, because the properties alone are not always enough to know what the object is.

[p110] Biederman et al (1985) found evidence of the usefulness of nonaccidental properties.

[p113] People are worse at distinguishing living things than non-living things. This may be a semantic difference or it could be because living things are made up of curved lines and curved lines are more difficult to process. Also, living things may look more similar to each other.

[p114] The retinotopically organized areas that compose the visual buffer have a greater concentration at the fovea and are stretched horizontally.

Signal properties: accidental properties that serve to distinguish objects.

[p115] The preprocessor obeys the gestalt laws of similarity, proximity, good continuation, etc. (see Kaufman 1974 ???) It can also be tuned from the top down.

[p116] The preprocessing subsystem is probably implemented in the occipital-temporal area, and maybe a bit of it in V4.

[p117] The pattern activation subsystem stores objects' nonaccidental properties and signal properties. These are matched in parallel. This system outputs a pattern code that specifies the best match and how well it matches. It does not specify left-right orientation or size.

[p118] Long-term visual memories (compressed images (p119)) are likely stored as population codes (Desimone and Ungerleider 1987, Young and Yamane 1992), with each neuron contributing to more than one representation.

[p119] Fujita estimates 1000-2000 columns, which correspond to features.
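The input/output behavior described for the pattern activation subsystem (compare input properties against stored ones, report the best match and how good it is) can be sketched as follows. The stored objects, the property names and the overlap score are all illustrative assumptions, and the loop is sequential where the summary says the matching is parallel.

```python
# Illustrative sketch only: stored patterns, property names and the scoring
# rule (set overlap divided by set union) are assumptions.

stored_patterns = {
    "cup": {"curved rim", "handle", "parallel sides"},
    "ball": {"closed curve", "symmetry"},
}

def match_score(input_properties, stored_properties):
    """Overlap between the input and a stored pattern, from 0.0 to 1.0."""
    union = input_properties | stored_properties
    return len(input_properties & stored_properties) / len(union) if union else 0.0

def pattern_code(input_properties):
    """Return (best matching object, how well it matches)."""
    scores = {name: match_score(input_properties, props)
              for name, props in stored_patterns.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]

print(pattern_code({"handle", "parallel sides"}))
```

Note that nothing about size or left-right orientation enters the score, in line with the description of the pattern code above.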

[p120] Lowe says that if the properties are not enough, a mental image is constructed and compared in the visual buffer. Thus mental imagery is important to normal perception. Size and orientation are adjusted until a best match is found. (here is my transformation theory-- could there be more than just these two transformations involved with perception???)

[p121] It works through vector completion. A recurrent neural network can fill in gaps of noisy data.
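Vector completion can be illustrated with a very small Hopfield-style network. This is a stand-in chosen for the example; the summary says only that a recurrent network fills in gaps, not which kind. The stored patterns, the probe and the function names are assumptions. Values are +1 or -1, and 0 marks a missing entry in the probe.

```python
# Illustrative sketch only: a Hopfield-style network is an assumed stand-in,
# and the patterns and names are made up for the example.

def train(patterns):
    """Hebbian weights: units that agree across stored patterns excite each other."""
    n = len(patterns[0])
    weights = [[0] * n for _ in range(n)]
    for pattern in patterns:
        for i in range(n):
            for j in range(n):
                if i != j:  # no self-connections
                    weights[i][j] += pattern[i] * pattern[j]
    return weights

def complete(weights, probe, max_sweeps=10):
    """Update units one at a time until the state stops changing."""
    state = list(probe)
    for _ in range(max_sweeps):
        changed = False
        for i in range(len(state)):
            total = sum(w * s for w, s in zip(weights[i], state))
            new = 1 if total > 0 else -1 if total < 0 else state[i]
            if new != state[i]:
                state[i] = new
                changed = True
        if not changed:
            break
    return state

stored_a = [1, 1, 1, 1, -1, -1, -1, -1]
stored_b = [1, -1, 1, -1, 1, -1, 1, -1]
weights = train([stored_a, stored_b])

# A degraded version of stored_a: first entry flipped, third entry missing.
noisy = [-1, 1, 0, 1, -1, -1, -1, -1]
print(complete(weights, noisy) == stored_a)
```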

[p124] The size of the attention window is adjusted to new stimuli, and at the same time "imagery feedback is projected back into the visual buffer to augment the input." When you expect something, you activate it and resize.

[p126] The brain areas that store visual memories are not organized topographically. The local geometry is implicit in this representation, and must be imaged to get it out.

[p127] View centered representations theory posits stored images from multiple points of view. At most 6 are needed. Opposing that is object centered (Marr's theory) in which there is a view-independent rep.

[p128] Cells were found that responded to imagery only from a particular view.

View centered is more memory intensive, object centered is more process intensive. In fact, no artificial vision system has been able to do it across many objects.

[p129] Rock (1973) showed Ss blobs. Later they could not recognize them in a different orientation unless they knew about it--even if they tilted their heads so that the retinal image was the same! Evidence of a frame of reference encoding that takes into account head position.

[p130] Ss are faster at recognizing upside-down objects than objects rotated 120 degrees. Also, once you have seen an object from a given orientation, you can recognize it immediately.

[p131] But how, then, can you readily recognize items in new orientations? How come people can't remember which way Lincoln faces on a penny?

There are no natural objects that need be discriminated from mirror-reversed shapes. Theory: the stimulus and its mirror image are both compared with memory.

[p136] Ss have view-centered representations but occasionally make object-centered ones with additional effort.

[p145] Viewpoint consistency constraint: "An object representation is activated to the extent that input is consistent with seeing that object from a single point of view."

[p146] Mental imagery occurs when representations in the pattern activation subsystem are sufficiently activated. They impose activity on the visual buffer.

Mental images: configurations of activity in the visual buffer.

[p149] The mental rotation experiments' results are not predicted by object-centered theory, because in such a case you shouldn't need to rotate the image; all the info is there.

[p150] People can only visualize 40 degrees of visual angle. People do not imagine themselves in the middle of a 3D scene. It is view centered.

[p151] Our weakness at mental imagery can be looked at as adaptive if you consider that the buffer needs to be cleared constantly so we can do normal visual perception.

Lowe: Mental images are adjusted to make a better match to an input image.

Chapter 6: Identifying objects in degraded images

Chapter 7: Identifying contorted objects

Chapter 8: Identifying objects: Normal and Damaged brains

Chapter 9: Generating and Maintaining visual images

Chapter 10: Inspecting and transforming visual images

Chapter 11: Visual mental images in the brain

SUMMARY is UNFINISHED (remove this line when finished)

Summary author's notes:

Last modified: Mon Jun 26 10:02:49 EDT 2000