
Hayward, W.G. & Tarr, M.J. (1995). Spatial language and spatial representation. Cognition, 55, 39-84.

@Article{HaywardTarr1995,
  author = 	 {Hayward, William G. and Tarr, Michael J.},
  title = 	 {Spatial language and spatial representation},
  journal =  	{Cognition},
  year = 	 {1995},
  key = 	 {},
  volume = 	 {55},
  number = 	 {},
  pages = 	 {39--84},
  note = 	 {},
  annote = 	 {}
}

Author of the summary: Jeanne-Marie Musca, 2007, jmusca@gmail.com

Cite this paper for:


Not all aspects of visual representation can be accessed by our linguistic system. For example, we are unable to describe how we distinguish between faces, yet, visually, we do so effortlessly. This shows that, in order for us to communicate, some levels of visual representation are included in linguistic representation, while others are excluded. [p.40]

Talmy 1978: With regard to spatial prepositions, like on, above and along, typically only the schematic relationship is coded, while spatial properties such as shape, colour and size are ignored.

Three possible explanations for partial encoding:

  1. Freyd 1983: Shared information is coarse, and as a result, spatial language cannot be more refined.
  2. The reduced amount of information is necessary for efficiency and other functional reasons.
  3. Spatial language and visual representations are connected by "common underlying structures".
Often, in the study of visual representation, the third explanation is assumed (Biederman, 1987; Corballis, 1988; Schacter, Cooper, & Delaney, 1990; Hummel & Biederman, 1992). In these cases, visual representations are characterised by linguistic conceptualisations. This means that important notions such as spatial relations have been described solely in terms of their linguistic representations.

However, this assumed connection has not been empirically validated. Moreover, there are no "well-specified models of the underlying structure of particular spatial relations". [41]
Some studies have partially addressed this issue (Miller & Johnson-Laird, 1976; Mani and Johnson-Laird, 1982; Henkel & Franklin, 1992), but none fully deal with this shortfall.

Bowerman, 1989; Choi & Bowerman, 1991: Cross-cultural variation in spatial predicates indicates a lack of universality which is evidence for the hypothesis that non-linguistic spatial relationships are not mirrored in spatial prepositions. This may even support a Whorfian view of language and cognition.[42]

Talmy, 1978 & 1983: There are some cross-linguistic universals, in that "there exists an asymmetry in most descriptions of space, drawing a distinction between a figure object and a reference object. In general, spatial terms specify the position of one object, the figure, by describing its spatial relation relative to another object, the reference." Within this view, much of the information that is visually available to us is in fact not necessary for encoding. The figure object, for instance, only needs to be a dimensionless point to schematize spatial relations such as on. [43]

Furthermore, Talmy's analysis would suggest that spatial prepositions are representative of, as opposed to a classification of, the relationship that exists between the figure and reference objects; that is, spatial relations are fuzzy categories.

This reflects the feature vs. prototype debate as models for categorization. In this light, it can be said that spatial prepositions are applied by measuring them against prototypes.

So, it would seem that spatial relations are encoded much like other categorical structures (Regier, 1992). [45]

Landau & Jackendoff, 1993: State that linguistic and spatial representations of spatial relations are connected in that they exhibit a similar two-tiered structure: one level for object identity and one for object location.

In language, the former level consists of object names, which are numerous and richly detailed, and the latter is composed of spatial relation terms, which are quite few in comparison. [46]

Goal of these experiments:
To explain the structure of spatial relations in both visual and linguistic representations, and establish how they correspond to each other, as these areas are lacking in empirical evidence.

Carlson-Radvansky & Irwin, 1993: Tried to answer a similar question. They looked at the frame of reference in which the preposition above was understood and found it to be an environment-centered one. However, this experiment only dealt with objects that were vertically above another object, omitting a characterisation of above in which the figure need not be vertical to the reference. So, making generalisations from this study is difficult. This is further evidence of the need for an in-depth look at the structure of spatial relations and their representations.

The experiments presented in this article were designed to "systematically [map] out which spatial prepositions best capture configurations of object pairs and then [use] perceptual tasks to investigate whether the visual encoding of those same configurations follow similar patterns". [48]

Limitations of these experiments arise from the fact that the spatial relations examined are limited to one plane. So, the findings can only be applied within a limited context. However, this constraint is present in both the linguistic and perceptual manipulations, which means that we can still compare the two.

Experiment 1

Investigates linguistic descriptions of object pairs when the figure object's position is systematically varied with respect to the reference object's.[49]

In all 4 experiments, subjects were shown sets of 48 pictures laid out on 7 by 7 grids. The reference object was at the centre, and the figure object was placed in each of the 48 remaining positions in the grid, one per picture.
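The display layout can be sketched as follows (a minimal illustration, not the authors' code; coordinates and names are hypothetical):

```python
# Enumerate figure-object positions on a 7x7 grid, with the
# reference object fixed at the centre cell.
GRID = 7
CENTER = (GRID // 2, GRID // 2)  # the reference object's cell: (3, 3)

figure_positions = [
    (row, col)
    for row in range(GRID)
    for col in range(GRID)
    if (row, col) != CENTER  # every cell except the reference's
]

assert len(figure_positions) == 48  # one picture per figure position
```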

In Experiment 1, 3 types of displays were used. Subjects were shown the cards sequentially and asked to describe the spatial relationship according to the formula: "The [figure] is [relation] the [reference]", where the relation was a spatial preposition. [50]

If more than one spatial preposition was used, only the first one was considered, and if a response did not fit the formula, it was discarded. The results from all three types were combined. It was found that most of the spatial terms used were either vertically or horizontally oriented prepositions.[51]

The data was analysed by looking at the percent frequency of the horizontally oriented prepositions used to describe a given position in the grid as well as the frequency of the vertically oriented prepositions.[52]
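The frequency analysis described above could be computed along these lines (a hedged sketch; the response data and preposition groupings below are invented for illustration, not taken from the paper):

```python
from collections import Counter

# Hypothetical first-preposition responses: grid position -> responses.
responses = {
    (0, 3): ["above", "above", "over", "above"],   # directly above centre
    (3, 6): ["right", "right", "above", "right"],  # directly right of centre
}

# Illustrative groupings of vertically vs. horizontally oriented terms.
VERTICAL = {"above", "below", "over", "under"}
HORIZONTAL = {"left", "right"}

def percent_frequency(terms, group):
    """Percentage of responses at one position drawn from one preposition group."""
    counts = Counter(terms)
    in_group = sum(n for term, n in counts.items() if term in group)
    return 100.0 * in_group / len(terms)

vertical_pct = {pos: percent_frequency(t, VERTICAL) for pos, t in responses.items()}
horizontal_pct = {pos: percent_frequency(t, HORIZONTAL) for pos, t in responses.items()}
```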

The prepositions used were:

The last point could have two explanations. First, vertically oriented terms may be primary descriptors of spatial relations. Alternatively, the terms other than the first should not have been discarded when a subject used more than one: their order may not have been descriptive of their perception of the spatial relation, but instead the result of conventions or some other factor. [55]

Experiment 2

Addressed ambiguities in Experiment 1. This was done by restricting the subjects' response to the terms that were predominant in Experiment 1: above, below, left, and right.

In this experiment, only two types of the displays were used. Each image was presented four times along with one of the spatial terms. The subjects were asked to rate the applicability of the spatial term to the relation of the figure object with respect to the reference object on a scale of one (least) to seven (most). [56]

Again, the results from all the types of displays were combined. The average rating for each position in the grid was calculated.

The results from Experiment 1 were supported except for one notable difference: the horizontally oriented prepositions, left and right, were applied at rates comparable to the vertically oriented prepositions, above and below. Therefore, keeping only the first term in descriptions with more than one spatial term had skewed the results in Experiment 1.[58]

Furthermore, the closer the position was to the "primary" axis, the greater the rate of change in applicability.

Discussion of Experiment 1 and 2

The spatial terms have a narrow prototypical region of indeterminate length which is along the orienting axis of the term. Still, the terms apply to a vast space despite their narrow prototypical region. "Essentially, it seems that figure object positions are described according to their proximity to prototypical regions for the complete set of spatial relations in language." [59]

A note of caution: the graded nature of the responses could be due to the task and not a reflection of the structure of spatial terms. However, in Experiment 1, similar results were found without asking the subjects to rate the applicability of the terms; this should allay any doubts about the validity of the results of Experiment 2.

Two possible explanations for the gradation:

  1. When the figure object is not in the prototypical region, the spatial term used is simply the one that best fits the position, and acts only as an approximation.
  2. The spatial term applies equally to the entire region, and the task of rating some positions over others artificially imposed the gradation on the results.[60]
Again, Experiment 1 seems to rule out the second possibility.

Landau & Jackendoff, 1993: The structure of linguistic representations of spatial relations should also occur in their visual representation. As such, perceptual tasks should reveal a pattern of performance in which there are "prototypical regions of superior performance". If this is not the case, performance should either remain constant no matter where the figure object is, or drop off as the figure object is moved further from the reference object. If the latter holds, it would seem that language is more constrained than visual representation.

Experiment 3

Investigates the structure of visual representation to see whether it is similar to that of linguistic representation.

On a display similar to those in Experiments 1 and 2, subjects attempt to remember the position of the figure object in relation to the reference object.

Two experimental conditions:

  1. subjects verbalise the relation between the objects before they attempt the task.
  2. subjects perform the task without verbalisation.
The two types of displays used in Experiment 2 were employed but without the spatial terms.[61]

Target configurations were shown on screen. Group one studied the configuration in order to reproduce it. Group two verbally described the relation using spatial prepositions, knowing that they would have to reproduce it. Both groups then performed a distractor task, which prevented them from keeping the figure object's position fixed in mind. Then the reference object reappeared, and the subjects had to click the area they believed to be the centre of the previously displayed figure object.

Accuracy of the estimate was found by calculating the difference, in the vertical and horizontal directions, between the estimate and the target, in pixels.[62]
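This error measure is simple to state in code (a minimal sketch under the stated assumptions; the coordinate values are invented for illustration):

```python
def accuracy_components(estimate, target):
    """Horizontal and vertical error, in pixels, between the clicked
    estimate and the true centre of the figure object.
    Both arguments are (x, y) pixel coordinates."""
    ex, ey = estimate
    tx, ty = target
    return abs(ex - tx), abs(ey - ty)

# Example: a click at (105, 98) against a true centre at (100, 100)
h_err, v_err = accuracy_components((105, 98), (100, 100))
```

Keeping the two components separate, rather than collapsing them into a single Euclidean distance, is what lets the horizontal and vertical accuracy patterns be examined independently.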

Positions directly above or below the reference object showed the greatest horizontal accuracy; similarly, positions directly to the left or right of the reference object showed the greatest vertical accuracy.[63]

There was no difference between the group that verbally described the relationship and the group that didn't.

As predicted, within the prototypical region of the horizontal position, the estimates were more accurate. Furthermore, for the rows the "accuracy increased as the figure object was closer vertically to the reference object".[66]

Similarly, within the prototypical region of the vertical position, the accuracy of the estimates was greater, and the closer the columns were to the reference object, the more accurate the estimates.

Inaccuracies in the estimates had two components, a horizontal and vertical one, which correspond to the prototypical regions for the linguistic representations of spatial relations.

This seems to indicate that there is a connection between the visual and linguistic representations of spatial relations. [67]

Difference: When subjects judged applicability, a distance effect was observed along the vertical axis; no such effect appears in the accuracy scores.

Fundamental difference between Experiments 1 and 2 and Experiment 3: The linguistic representations were qualitative, while perceptual representations were both qualitative and quantitative.

Or, perhaps the fact that the subjects had to accurately remember a figure object's position within a screen meant that additional cues were available to help them estimate the positions of the objects along the vertical and horizontal axes.

Experiment 4

Designed to remove additional cues, such as the edge of the screen, from the subject's visual representation of the spatial relation between the objects.

Subjects were presented with the same displays as in Experiment 3. Two objects would appear on the screen, then a mask, followed by the two objects in either the same or different configuration. The position of the objects relative to the screen varied. The subjects had to decide whether the configuration of the first objects was the same or different as that of the second objects.[68]

Accuracy was greater along the horizontal and vertical axes (the prototypical regions). Also, accuracy decreased as the figure object was distanced from the reference object.[70]

Notably, trials in which the two configurations had been the same were judged more accurately than those where the configurations were different. This seems to be because 72% of the responses were "same", while only 28% were "different", probably because of the complexity of the task.

Hit rates in the prototypical regions did not differ from those in the other regions; however, false alarms in the prototypical regions were drastically lower. [71]
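The hit-rate/false-alarm comparison is standard signal-detection bookkeeping. A minimal sketch (the trial outcomes below are invented purely for illustration):

```python
def hit_and_false_alarm_rates(trials):
    """trials: list of (was_same, responded_same) boolean pairs.
    Hit: a 'same' trial correctly judged same.
    False alarm: a 'different' trial incorrectly judged same."""
    same_trials = [resp for was_same, resp in trials if was_same]
    diff_trials = [resp for was_same, resp in trials if not was_same]
    hit_rate = sum(same_trials) / len(same_trials)
    false_alarm_rate = sum(diff_trials) / len(diff_trials)
    return hit_rate, false_alarm_rate

# Invented outcomes: three 'same' trials, two 'different' trials.
trials = [(True, True), (True, True), (True, False),
          (False, True), (False, False)]
hits, false_alarms = hit_and_false_alarm_rates(trials)
```

Separating the two rates matters here: equal hit rates with lower false-alarm rates in the prototypical regions indicates better discrimination there, not merely a response bias.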

Two factors affect the accuracy of the visual representation of spatial relations:

  1. Proximity of the figure object to the reference object.
  2. Whether or not the position of the figure object is directly horizontal or vertical relative to the reference object.[72]
This seems to reflect the properties of linguistic representations of spatial relations.

General Discussion

Both visual and linguistic representations of spatial relations display the characteristics of prototype representations.

The variations in accuracy in both are not simply due to a distancing of the figure from the reference: if that were the case, we would see uniform degradation in all directions as the figure was distanced from the reference. However, the increased accuracy along the axes goes against this expectation and is more consistent with the idea that spatial relations are encoded within prototypical structures.

The similarities in structure for "spatial language and perceptually encoded spatial relations" seem to indicate that they are directly connected somehow.[74]

While it is true that the images presented were limited in that they only contained a limited set of configurations on a single plane, this is not so far removed from the image plane on our retina. Furthermore, the structure of the configurations may not be far removed from all other spatial templates (Logan & Sadler, in press).

Possible reasons for the correspondence between visual and linguistic representation of spatial relations:

  1. The visual aspect of spatial relations is independent of language.
  2. "Spatial relations in non-linguistic systems and spatial predicates in language both encode spatial forms as prototypes."[76]
  3. Linguistic representation of spatial relations arose from the visual representation of space.

The extent of the similarities between spatial language and vision seems to indicate a causal relationship, not the independence described in the first point.

Bowerman's (1989; Choi & Bowerman, 1991) claim that there are no cross-linguistic universals is countered by Talmy's argument regarding the universal "schematization of spatial forms" in figure/reference pairs across all languages.
Problems with the prototype theory:
Graded structure is displayed even by well-defined categories such as even numbers.

Concepts are represented as prototypes as well as in terms of a core meaning, where prototypes are the model, but membership is determined by the core meaning.

Similarly, it is possible that spatial prepositions have both qualitative and quantitative representations, and that by balancing the two, judgements can be made regarding the appropriateness of specific instances of the preposition. [77]

A corresponding structure could also exist for spatial relations in vision. The quantitative encoding involves specific details about the relations, such as the exact distance between objects; the qualitative encoding, on the other hand, loosely captures the spatial relations among objects relative to each other.

Kosslyn et al., 1989: Present neurological evidence for this distinction in visual representations: the left hemisphere was faster than the right at making judgments regarding qualitative relations, and the opposite is true for quantitative relations.[78]

The experiments presented in this article also present evidence for this dual encoding. If there were only quantitative information, performance in the prototypical regions would not be greater; and if the only information used were qualitative, those would be the only regions of high performance. Also, the chaining of spatial terms might be equated to the quantitative representation of spatial relations in vision.[79]

"Spatial relations are schematic"[79] along axes that reflect the laws of the natural world.

As opposed to representations of space that offer great precision, spatial relations are compact, low-dimensional and efficient.[80]

Schacter & Cooper, 1992: There are qualitative relations which represent the three dimensional organisation of objects, and there is also quantitative encoding of episodic representations.[81]

Essentially, both quantitative and qualitative aspects are a part of spatial representation, which can be adapted according to the need.


The figures considered better instances were horizontally and vertically aligned with respect to the reference objects. This points to environmental influences on spatial representation.

The fact that such regions were preferred also reveals that spatial relations are encoded using prototypical representation.[82]

Similarities between visual and linguistic representation are strong evidence for the notion that they are causally connected, or at the very least have a common root.

Summary author's notes:
