The Evolution of Language from Social Intelligence

Robert Worden

Charteris Ltd, 6 Kinghorn Street, London EC1A 7HT

rworden@dial.pipex.com

Abstract

Primates have much greater social intelligence than other land animals; I propose that there is a dedicated faculty of primate social intelligence, and that language evolved as an extension of this faculty. Many properties of language fit well with this hypothesis.

An evolutionary speed limit implies that the amount of new design information in the human brain since our divergence from chimps is less than 5 KBytes - not enough information for the de novo design of a human language organ in the brain. Therefore language must be based on some pre-existing primate cognitive faculty; social intelligence is the best candidate.

Social intelligence needs to use internal representations of social situations, which must be well-matched to the social domain; therefore they are well suited to serve as the meaning representation for language. The theory of mind, whereby we understand other people's mental states, is an extension of social intelligence, and is essential for the use of language. Both language and social intelligence are linked to the prefrontal cortex. In these respects, language is closely linked to social intelligence and may have evolved as an extension of it.

Language requires complex and robust computations for understanding, generation and learning. I have built a computational model of social intelligence which extends first to support a theory of mind and then to support the computations needed for language. This model can understand or generate English sentences with many complex language features, and can learn any part of speech from a few examples. Therefore the computations needed for social intelligence lead naturally to a powerful unification-based model of language, with a robust language acquisition mechanism based on primate social learning.

1. Introduction and Summary

A theory of language evolution can be evaluated by four criteria:

(1) Evolution: It should be consistent with the constraints of the theory of evolution, given the fossil evidence of human evolution and the likely selection pressures on our ancestors.
(2) Language Use: It should agree with what we know about the uses of language - including the range of meanings language can express, the speed and robustness with which we use it, and the facts of language learning, structure and diversity.
(3) Neurophysiology and Anatomy: It should agree with what we know from PET scans, lesion data and other sources about the locations of language processing in the brain.
(4) Computation: It should give a working account of how the computations needed for language are done in the brain - how we represent language meanings, how we convert word sounds into those meanings when understanding language, how we convert in the reverse direction to generate language, and (perhaps hardest) how we learn a language.

These requirements get harder in ascending order. The constraints of the theory of evolution are quite loose; but very few theories of language evolution give any good account of how the computations in the brain - which support the rich syntax and semantics of language - evolved or are done today.

This paper presents a theory of language and its evolution which, I believe, agrees well with all the constraints (1) - (4). The theory proposes that language is an outgrowth of primate social intelligence, which is a distinct faculty of the primate mind, not found in other land mammals.

Quite a lot is now known about primate social intelligence, from recent observations such as those by Cheney and Seyfarth (1990), and we can start to build computational models of how it works (Worden 1996a). Social intelligence requires the use of internal representations of social situations. In the computational model these take the form of scripts, similar to those introduced by Schank and Abelson (1977), and the theory proposes that these internal social representations are the mental basis of language meanings.

We use language to alter what other people know; if we had no idea of what they know and do not know, we could not use language. Therefore language requires a theory of mind. The issue of whether or not any other primates (such as chimps) have a theory of mind is currently unresolved (Carruthers & Smith 1996); some observations in the field suggest that they do, while some laboratory evidence suggests that they do not. However this question is resolved, it is clear that people have a theory of mind which evolved at some stage in our primate or hominid ancestors, and that (as we use the theory of mind for social purposes) it is a facet of our social intelligence.

We can extend the computational model of social intelligence to build a minimal working computational model of the human theory of mind (Worden 1996b). This model uses the script representations of social situations, suitably extended to represent what others may know about the same situations. It also requires powerful script functions to transform from what we might know (and infer) to what others might know and infer.

It turns out that these same script functions, which are required for the theory of mind, are also the computational basis of language processing. We can build a working computational model of language using the same ingredients. Each word in a language is a reversible script function; we apply these functions in one direction to generate sentences, and in the other direction to understand them. This model gives a working account not only of how we use language, but also of how we learn it, by learning a script function for each word.

Social intelligence and a theory of mind are vital pre-requisites for language. I propose that language is not just a new mental faculty which uses these two, but is a direct application of them. There is one working computational model of social intelligence, which extends to the theory of mind, which can then be applied to give language - both learning and performance. This evolutionary progression is illustrated in figure 1.

Figure 1: How social intelligence extended to give a theory of mind, and the same computational elements were then applied to give language.

This model fits well with the usage and properties of language, and resolves an evolutionary puzzle - how did language evolve so fast (in less than 2 million years) ? It evolved fast because new computational abilities were not needed; it is an application of a pre-existing cognitive faculty - social intelligence with a theory of mind.

2. An Evolutionary Speed Limit

A theory of language evolution should be consistent with the neo-Darwinian theory of evolution. The constraints of this theory may seem rather loose (in that it is all too easy to tell a variety of plausible evolutionary stories), but they are not non-existent. In particular, there is a recently-derived limit on the speed of evolution (Worden 1995) which places powerful constraints on the evolution of language. This is a limit on the rate at which new design information in the brain can be created by natural selection.

Nature shows evidence of abundant and appropriate design; the phenotypes of all creatures clearly contain a large amount of design information, which was somehow created by evolution over a long period. Most people believe that evolution created this design information rather slowly. Can we quantify the amount of design information in a species, and the rate at which evolution creates it ? It turns out we can, proving that evolution creates design information only at a very limited rate.

The quantitative measure of design information is called the Genetic Information in the Phenotype (GIP) and is measured on an information-theoretic scale of bits or bytes (1 byte = 8 bits). It is a property of a species, not an individual, depending on attributes which are well-defined (having small variance) across all members of the species. The definition and meaning of GIP are described in (Worden 1995). We expect the human brain to have a large amount of this innate design information (which is distinct from the dynamic information we process in our brains); various lines of evidence suggest that the design information in the human brain is at least 10^6 bits, or about 100 Kilobytes - not a lot for what is commonly considered to be the most marvellous and intricate device in nature.

Since evolution is a statistical process, we can prove a mathematical limit on how fast it can happen, from its statistical properties. This speed limit, derived in (Worden 1995), implies that the design information (GIP) in the human brain can only have increased at a rate of 0.1 bits per generation (at most) over recent evolutionary time. Our line diverged from the chimp line some 7 million years ago, or 350,000 generations ago. Since then, the brains of both species have been changing through selection, but (because of the speed limit) at a rate less than 0.1 bits per generation. So the design of the human brain differs from the chimp brain by at most 35,000 bits, or 5,000 bytes. Out of the 100 kilobytes or more of design information in our brains, only 5% differs from the design of the chimp brain.
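The arithmetic behind this bound can be checked directly. The Python sketch below uses only the figures quoted above; the variable names are illustrative and not taken from (Worden 1995). Note that 35,000 bits is 4,375 bytes, which the text rounds up to 5,000 bytes and to 5% of the brain's design information.

```python
# Figures quoted in the text; variable names are illustrative only.
YEARS_SINCE_DIVERGENCE = 7_000_000   # human/chimp split
GENERATIONS = 350_000                # as stated in the text
MAX_BITS_PER_GENERATION = 0.1        # speed limit quoted from Worden (1995)
BRAIN_DESIGN_BYTES = 100_000         # "100 kilobytes or more"

years_per_generation = YEARS_SINCE_DIVERGENCE / GENERATIONS
max_new_bits = MAX_BITS_PER_GENERATION * GENERATIONS
max_new_bytes = max_new_bits / 8
max_fraction_changed = max_new_bytes / BRAIN_DESIGN_BYTES

print(f"{years_per_generation:.0f} years per generation")
print(f"at most {max_new_bits:,.0f} bits = {max_new_bytes:,.0f} bytes of new design")
print(f"at most {max_fraction_changed:.1%} of the brain's design information")
```

The implied generation time of 20 years follows from the two figures given (7 million years, 350,000 generations); it is not an independent estimate.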

This result is something of a shock, since we believe ourselves to be so much more intelligent than chimps. Certainly our brains are much larger and more powerful; the speed limit implies that this comes mainly from an increase in size and power of pre-existing design features, rather than from new design complexity. In the periods of rapid expansion of the hominid brain, the capacity and the horsepower of the engine have increased, but its design can only have changed little.

What does this imply for language ? Some people believe that language is entirely an innovation in the human brain, having little or no antecedent in animal cognition. If so, then the entire design of this new feature must be defined in just 5,000 bytes of design information. 5,000 bytes is equivalent to the information content of around one page of text. This is perhaps ironic; after all the many volumes that have been written about language by linguists, philosophers and scientists, nature has written just one page on the same subject, in the design of the human brain.

To those who have tried to build computational models of language, it seems highly unlikely that the whole of the human language faculty - its capacity for unbounded meaning structures, complex syntax, its robust production, understanding and learning, links to the auditory channel and to other meaning structures in the brain - could be specified in a mere 5,000 bytes of design information. This implies we should look for a theory in which language did not arise de novo in the human brain, but is based in pre-existing animal cognitive faculties. The theory of this paper is such a theory.

Several lines of evidence imply that complex language certainly did not emerge before Homo erectus 2 million years ago, as several important concomitants of language (enlarged brain size, larger group size, changes in vocal capability) are linked to the emergence of terrestrialism and bipedalism at this time. It is also highly likely that complex language has only arrived within the timeframe of Homo sapiens, during the last 250,000 years - as implied by the stability of the Acheulian tool culture and brain size over the preceding 1.5 million years (Aiello, this volume).

In these timescales, the speed limit implies an even tighter bound. If language depends on new cognitive faculties evolved during the time of Homo sapiens (of the order of 250,000 years, or 12,000 generations) then by the speed limit, the maximum amount of new design information in the brain over that period is of the order of 150 bytes, or less than the information in this paragraph. Or if (as proposed by Bickerton, this volume) language arose through an extraordinary genetic change in one generation, the maximum amount of useful design information which can arise in one generation is of the order of 40 bits, or 5 bytes (any more productive mutation only has a probability of order 2^-40 of occurring).
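As a rough check on these tighter bounds, the sketch below applies the same 0.1 bits per generation limit to the 12,000 generations quoted for Homo sapiens, and converts the single-generation figure of 40 bits into bytes and into the quoted probability. The function and variable names are hypothetical; the numbers are those given in the text.

```python
MAX_BITS_PER_GENERATION = 0.1   # speed limit quoted from Worden (1995)

def max_new_bytes(generations):
    """Upper bound on new design information, in bytes, after `generations`."""
    return MAX_BITS_PER_GENERATION * generations / 8

# Homo sapiens timeframe: about 12,000 generations, as stated in the text.
sapiens_bytes = max_new_bytes(12_000)

# Single-generation scenario: 40 bits is the figure given in the text.
single_step_bits = 40
single_step_bytes = single_step_bits / 8
chance_of_larger_jump = 2.0 ** -single_step_bits

print(f"Homo sapiens timeframe: about {sapiens_bytes:.0f} bytes")
print(f"One generation: {single_step_bytes:.0f} bytes")
print(f"Probability of order {chance_of_larger_jump:.1e}")
```

A probability of order 2^-40 is roughly one in a trillion, which is why the single-step scenario leaves so little room for new design.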

If the important events in the origin of language occurred in these shorter timeframes, then it is even more important (for a credible theory) that they built on pre-existing design information in the brain. This pre-existing cognitive faculty, I propose, is social intelligence.

Other theories presented at this conference (Bickerton, Newmeyer in this volume) propose that large parts of the cognitive faculties needed for language pre-dated its emergence, needing only minor extensions (e.g. the linking of different representations in the brain) for language to mature. In this respect, those theories are quite consistent with the speed limit constraint. The social intelligence theory goes further, proposing (as we shall see below) that essentially no new cognitive faculties or links were needed for the emergence of language.

3. Primate Social Intelligence

It has become clear in recent years that primates have an extra dimension of social intelligence which is not present in other land mammals. What is this extra social intelligence, and how did it arise ?

Social intelligence is needed sometimes to cooperate with one's peers in the social group, and often to compete with them. All social mammals have some system of ranking within their groups - an ordering of individuals from the alpha male downwards, which is recognised by all members of the group and which has a crucial effect on who gets what resources, such as food, shelter and mates. Gaining high rank has great positive effects on fitness and reproductive potential; so all species have evolved in ways which help individuals compete for rank.

For many land mammal species, the competition for rank is decided by strength and fitness, in one-on-one confrontations. However, for some reason (which might, for instance, be connected to their early tree-dwelling habits) for primates this is not the case. Primate competitions are not just between individuals, but between alliances; in any confrontation each combatant will call on his or her allies, and the contest will be decided by the relative strengths of the two alliances. Social success (= high rank) for a primate depends on cultivating the right alliances.

It has been suggested (Humphrey 1976; Harcourt 1988) that this is the cause of the escalation of primate social intelligence. In order to make the right alliances, one needs to be an acute observer of the social scene - to know who might make a good ally, and whether he or she might be prepared to ally with you - and a manipulator, knowing what social moves are needed to make alliances.

Whatever the reason, we know from many observations that most primates have an acute social intelligence, not found in other land mammals (see e.g. Cheney and Seyfarth 1990; de Waal 1982; Tomasello & Call 1994). They recognise one another as individuals, know all about each other's kin and alliance relations, and can rapidly learn simple rules about who will do what in what circumstances.

This is illustrated by an observation of vervet monkeys in Amboseli National Park by Cheney and Seyfarth (1990). By replaying recorded calls of selected individuals over hidden loudspeakers, Cheney and Seyfarth showed that mothers recognised their own infant's alarm calls and would go to their help. More interesting, however, was the reaction of other females present. They looked at the mother of the calling infant, often before she herself had reacted - showing that they knew the mother and infant as individuals, knew their kin relation, and knew how mothers typically react. This is typical of the complex social knowledge which we now know primates can acquire by observation, and use to their own advantage.

To have this social intelligence, monkeys and other primates need to do three things:

(1) To represent in their minds information about social situations past and present - facts such as “Profumo is Shelley's mother” or “Shelley just screamed”.
(2) To learn, and represent internally, the causal regularities whereby one social situation leads to another; regularities such as “if X screams and Y is X's mother, then Y will react”.
(3) To combine their knowledge of the present social situation with their knowledge of causal regularities to predict what may happen next; in this case to know that “Profumo will react”.

With these three abilities, primates are better able to predict social events, and to choose actions which further their own ends of stronger alliances and increased rank. For the moment we just concentrate on (1) the ability to represent social situations, before proposing that this internal representation of social reality is the basis of our internal representation of language meanings.
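The three abilities can be illustrated with a toy Python sketch built on the vervet example. Facts are stored as plain tuples and the single causal regularity is written by hand rather than learnt; both choices are simplifying assumptions made here for illustration, not the script representation developed in section 6.

```python
# Ability 1: facts about the social situation, as simple tuples.
facts = {
    ("mother_of", "Profumo", "Shelley"),   # Profumo is Shelley's mother
    ("screamed", "Shelley"),               # Shelley just screamed
}

# Ability 2: one causal regularity, hand-written here rather than learnt:
# "if X screams and Y is X's mother, then Y will react".
def mothers_react(facts):
    screamers = {f[1] for f in facts if f[0] == "screamed"}
    return {
        ("will_react", f[1])
        for f in facts
        if f[0] == "mother_of" and f[2] in screamers
    }

# Ability 3: combine the current facts with the regularity to predict.
predictions = mothers_react(facts)
print(predictions)
```

An onlooker holding these two facts and this one regularity has enough to look at Profumo before she reacts, as the other females did in the playback experiment.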

In order to be effective, internal representations in the brain should match the properties of the things they represent (Marr 1982; Johnson-Laird 1983). For instance, the representation of the visual field in the V1 visual cortex matches the two-dimensional character of the visual field itself. So the social representation in the primate brain should match the properties of social situations, which are:

(1) Structured: A social situation consists of a number of individuals with attributes (identity, sex, rank, mood...) and relationships or interactions (mother-of, grooming, threatening...). The structural way in which these are combined is important; it matters who is grooming whom.
(2) Complex and Open-ended: There may be several individuals in one incident, in a variety of relationships; and several incidents together may constitute a particular situation; the set of possible social situations is a very large set.
(3) Discrete-Valued: Many of the important variables which characterise social situations are discrete-valued (e.g. identity, sex, rank, kin and alliance relations).
(4) Extended in Space and Time: The incidents which make up a social situation may take place over several days or more, at different places.
(5) Dependent on Sense Data of all Modalities: Important information about the social situation may come from vision, hearing, smell, movement or bodily feelings; the social representation must be connected in the brain to all these sense data.

The internal representation of social situations in the primate brain should ideally have all the properties (1)..(5). This list bears a remarkable resemblance to the properties of language meanings. The meanings we can express in a sentence are structured, complex and open-ended, discrete-valued, extended in space and time, and involve sense data of any modality.

This leads to a key hypothesis: The internal representation of language meaning in the brain derives from the primate representation of social situations. No other candidate meaning structure has such a good fit to language meanings.

4. The Uses of Language and the Theory of Mind

While some use of language is internal, for thought processes, its main use is a social one involving other people; this suggests strongly that it is an outgrowth of social intelligence. More specifically, we use language to influence what other people know and intend to do. Thus if we did not have any idea about `what other people know' we could hardly use language effectively (Grice 1968; Dennett 1983). This is demonstrated in every sentence we speak; we constantly build a detailed knowledge of `what the hearer now knows', using this knowledge to guide what we say, and what we can miss out (e.g. what can be referred to by a pronoun, because the hearer knows what it is).

A knowledge of what other people know, from moment to moment, is a key prerequisite for language. This mental faculty of knowing what others know is referred to as our `theory of mind' and it has been intensively studied in recent years - to understand its development through childhood, its possible role in autism (as a theory of mind deficit) and its development in other primates (Carruthers and Smith, 1996; Whiten 1991). The study of the theory of mind is, at the moment, quite unsettled, with divergent views on two key questions:

How does the human theory of mind work ? There are two main theoretical proposals: the `theory theory' that our knowledge of other minds is embodied in an explicit symbolic theory along the lines of a folk psychology, with axioms and rules of inference, from which we may deduce what others know and want (e.g. Gopnik and Wellman 1992) and the `simulation theory' that we mentally simulate others' thought processes and feelings, using our own mental resources as a model of theirs (e.g. Gordon 1986). Within these two camps are several variants, and a hybrid picture with some ingredients of both is now becoming popular (e.g. Perner 1996).
Do other primates have a theory of mind ? While it seems clear that monkeys and most primates have no theory of mind, the picture in great apes (particularly chimps) is far from clear. Some evidence from field studies suggests that they do (Whiten & Byrne, 1990), while recent evidence from laboratory studies is more negative (Povinelli 1996), suggesting that young chimps do not even understand some basic aspects of visual attention, which are pre-requisites for knowing what others know. However, the interpretation of both lines of evidence is controversial; at present we simply do not know whether chimps have a working theory of mind (Povinelli & Preuss 1996).

In the face of this uncertainty, what can we say about the relation of the theory of mind to language ? It seems likely that the adult human theory of mind is a complex, multi-faceted thing; we may well use a mix of folk-psychology rules, mental simulation and verbal introspection to figure out what others might be thinking. It is not surprising that several different theoretical models have been proposed. However, the earliest theory of mind in our ancestors could not have been so complex. Whatever its form, we can be fairly sure that the earliest primate theory of mind was an outgrowth of social intelligence - since a theory of mind is used for entirely social purposes. There is no point in working out the contents of another's mind if you are not going to interact with him or her socially.

I propose, therefore, that primate social intelligence was incrementally extended to include a working theory of mind (which was initially much simpler than today's adult human version). This theory of mind was an essential pre-requisite for language use, and the computational ingredients of the theory of mind were co-opted for language (this latter proposal will be made more concrete in section 6). By proposing that language evolved from a pre-existing theory of mind, we minimise the amount of new cognitive design required for language - as required by the evolutionary speed limit.

Some authors propose (Smith 1996) that language is necessary for the adult human theory of mind - that we cannot fully work out what others are thinking unless they can tell us about it. If so, this may still be consistent with a picture in which the adult, multi-faceted theory of mind requires language, but its primitive precursor did not.

5. Neurophysiological Evidence

If language evolved from social intelligence, and uses the theory of mind within social intelligence, then we would expect an overlap in the brain between the language centres and the location of social intelligence. The location of social intelligence in the primate brain is not often addressed in the research literature (because laboratory experiments, which can investigate neural activity correlated with behaviour, typically do not have a rich social content); however, several lines of evidence (Passingham 1993) suggest that social intelligence is located in the ventral pre-frontal cortex (VPC):

VPC receives sense data of all modalities, mainly via the temporal cortex (e.g. visual, auditory, somatosensory, olfactory), as required to construct social representations from multi-modal sense data.
VPC is strongly linked to the amygdala and hypothalamus. They are directly involved in many social/emotional responses (such as increasing blood pressure, pupil dilatation, altered breathing rate etc.) which often follow from appraisals of the social situation.
VPC is involved in cross-modal learning, in learning with time delays (as is required for many social causal regularities) and in so-called `voluntary' actions made in the absence of immediate external stimulus; all these are relevant to social action.
There is evidence that the size of VPC has increased, as a proportion of total brain volume, over primate evolution through apes to mankind (Deacon 1992) - as one would expect with an increase of social intelligence through this timeframe.
PET activation of the neighboring orbital prefrontal cortex has been demonstrated in humans performing a theory-of-mind task (Baron-Cohen et al, 1994).
Lesions to the prefrontal cortex in humans produce deficits in social and emotional behaviour (Damasio 1994).

These together imply that VPC is the main location of social intelligence in the primate brain. For language, Broca's area overlaps with VPC, although it is located a little behind it. Evidence from PET scans shows that the `higher' semantic/syntactic aspects of language (as opposed to motor and auditory aspects) are located most forward in the Broca area, most overlapping with VPC. For instance, measurements of PET activity associated specifically with verbs show strong overlap with VPC (Fiez et al 1996).

Therefore the hypothesis that language evolved from social intelligence is quite consistent with neuroanatomical data; both language and social intelligence are strongly linked to Ventral Prefrontal Cortex.

6. A Computational Model of Social Intelligence and the Theory of Mind

The hardest problem to confront for a theory of language evolution is the rich structure of language and the complex mental computations which it requires. What computations in the brain allow us to understand complex sentences, to generate them from complex meanings, and to learn a language ? How did those powerful computations evolve ? Many accounts of language evolution cannot yet provide satisfactory answers to these questions.

The main strength of the `social intelligence' theory of language is that it makes just this link. We can build a working computational model of primate social intelligence (at the vervet monkey level), extend it to provide a minimal working theory of mind, and then show that the same computations, needed for the minimal theory of mind, provide a powerful working model of complex language. One working computational model describes social intelligence, the theory of mind, and language.

This model does not shirk the harder features of language such as complex verbs, nested meanings, ambiguity and anaphora; and it handles generation, understanding and learning in the same framework. I have built this computational model in Prolog to handle a significant fragment of English. There is not space here to give full details of the progression from social intelligence through the theory of mind to language, but I shall describe some key points of the model at these three levels.

For a computational model of social intelligence, we must first model the internal representation of social situations in the primate mind, using information structures which have the properties (1) - (5) of section 3 - which therefore match the known properties of the primate social world. For this we use scripts, which are simple tree-like discrete information structures denoting the essence of a social situation. A typical script, denoting the fact that `I bit Portia and then Portia bit me' is shown in figure 2.

Figure 2: A simple primate script

Each half of this diagram is itself a script, each denoting one scene. The first scene denotes `I bit Portia' and the second scene denotes `she bit me'; the arrow between them denotes the time-ordering of the two scenes.

These script structures are well-matched to the primate social domain, and also have the properties we require for language meanings; in fact they strongly resemble the tree-like meaning representations used in many computational linguistic implementations. By allowing suitable types of nodes and attributes on the nodes, and allowing trees of greater depth, we can represent the meanings of even the most complex sentences. But at the moment, for vervet monkeys, it seems that script trees only need to have a depth of about 4 nodes.
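To make the tree structure concrete, here is a minimal Python sketch of a script like that of figure 2, written as nested dictionaries. The field names (`agent`, `act`, `patient`, `scenes`) are assumptions chosen for illustration; the paper's Prolog model is not shown in this text and may use a different layout.

```python
def scene(agent, act, patient):
    """One scene: who did what to whom (field names are assumptions)."""
    return {
        "agent": {"id": agent},
        "act": {"type": act},
        "patient": {"id": patient},
    }

# 'I bit Portia and then Portia bit me'; list order stands for the time arrow.
script = {"scenes": [scene("me", "bite", "Portia"), scene("Portia", "bite", "me")]}

def describe(script):
    """Render a script as text, one scene after another in time order."""
    return " -> ".join(
        f"{s['agent']['id']} {s['act']['type']} {s['patient']['id']}"
        for s in script["scenes"]
    )

print(describe(script))

# The structure matters: swapping the roles gives a different scene.
roles_matter = scene("me", "bite", "Portia") != scene("Portia", "bite", "me")
```

Every value in this structure is discrete, and further attributes (sex, rank, mood) could be added to any individual's node without changing the overall shape.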

If monkeys simply recorded the social events of their lives in a script form, this in itself would not help them to compete socially. To be practically useful, primate social intelligence must be able to represent the causal regularities of the social milieu; to infer these regularities from experience; and to apply the regularities to predict outcomes and choose actions.

Inferring regularities from experience is done by a process of rule induction. Suppose the same monkey (as in the example above) goes around biting several other monkeys, and in each case gets bitten back. In each case she will record a script similar in form to that of figure 2, but with different individuals involved and different `irrelevant' details. There is a computational operation of script intersection which combines these different scripts to project out their common structure. Intersection is a simple computation, matching the nodes of different scripts together and projecting out common information; the result of this is shown in figure 3.

Figure 3: a rule script which describes a causal regularity of monkey social life

This script is similar in structure to the examples (such as that in figure 2) from which it was induced; but instead of the identity of specific individuals, it has a variable (shown as id: ?X) which denotes `any individual'. So this script in effect says “If I bite any other individual ?X, then ?X will later bite me” - an important causal regularity of monkey life.

This rule induction process is very efficient and robust. It can induce a rule correctly from just a few examples, and will do so in the presence of a lot of extraneous information, irrelevant to the rule; the script intersection mechanism efficiently prunes out the irrelevant detail. There is evidence that primates can learn social regularities from just a few examples; the selection pressure of intense social competition has given them just such an efficient learning mechanism.
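Script intersection, as described, can be sketched in Python as a recursive comparison of two nested dictionaries. This is an illustration under assumptions: the dictionary layout and the names `intersect`, `induce_rule` and `bite_back` are hypothetical, and the treatment of differing values (replace them with a shared variable) is a form of anti-unification that may differ in detail from the author's Prolog implementation.

```python
def intersect(a, b, variables):
    """Return the structure common to scripts a and b.

    Where leaf values differ, a variable such as '?X' is substituted; the
    same pair of differing values always maps to the same variable.
    """
    if isinstance(a, dict) and isinstance(b, dict):
        # Keep only nodes present in both scripts (prunes irrelevant detail).
        return {k: intersect(a[k], b[k], variables) for k in a if k in b}
    if isinstance(a, dict) or isinstance(b, dict):
        return {}                      # shapes differ: nothing in common
    if a == b:
        return a
    if (a, b) not in variables:
        variables[(a, b)] = "?X" if not variables else f"?X{len(variables)}"
    return variables[(a, b)]

def induce_rule(scripts):
    """Fold intersection over a list of example scripts."""
    rule = scripts[0]
    for other in scripts[1:]:
        rule = intersect(rule, other, {})
    return rule

def bite_back(victim, **extra):
    """Example script: I bite `victim`, then `victim` bites me."""
    return {
        "first": {"agent": {"id": "me"}, "act": "bite",
                  "patient": {"id": victim, **extra}},
        "then": {"agent": {"id": victim}, "act": "bite",
                 "patient": {"id": "me"}},
    }

examples = [bite_back("Portia", mood="sleepy"),
            bite_back("Nero", rank="high"),
            bite_back("Cassius")]
rule = induce_rule(examples)
print(rule)
```

Three examples suffice here: the victim's identity becomes the variable `?X` in both scenes, while the extra attributes (`mood`, `rank`), which appear in only one example each, drop out.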

Once having learnt such a causal regularity, our monkey needs to apply it to avoid future retaliations. Suppose she is contemplating biting yet another monkey, and so forms the script in her mind describing this forthcoming event. That script looks much like the left-hand side of figure 2, but with a different individual as victim. By an operation of script unification, her plan script can be combined with the rule script of figure 3 to deduce the consequence - that the victim will then bite her back. By this means, she can anticipate the consequences of her social actions, and perhaps alter her plans appropriately.

Script unification is also a simple computation, similar to the unification in unification-based grammar approaches to language (e.g. Kaplan & Bresnan 1981; Kay 1984), and again involves matching scripts together node by node. Script intersection and unification together form a simple algebraic structure, the script algebra, which underpins the self-consistency of this model of social intelligence.
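The following Python sketch shows one way script unification could work on nested dictionaries: the plan script and the rule script are merged node by node, variables are bound to the values they meet, and conflicting values make the unification fail. All names (`unify`, `resolve`, `apply_rule`, `Clash`) and the dictionary layout are assumptions for illustration; the sketch omits refinements such as an occurs check.

```python
class Clash(Exception):
    """Raised when two scripts contain conflicting information."""

def is_var(x):
    return isinstance(x, str) and x.startswith("?")

def walk(x, bindings):
    """Follow variable bindings until reaching a value or an unbound variable."""
    while is_var(x) and x in bindings:
        x = bindings[x]
    return x

def unify(a, b, bindings):
    """Merge scripts a and b node by node, recording variable bindings."""
    a, b = walk(a, bindings), walk(b, bindings)
    if a == b:
        return a
    if is_var(a):
        bindings[a] = b
        return b
    if is_var(b):
        bindings[b] = a
        return a
    if isinstance(a, dict) and isinstance(b, dict):
        merged = {}
        for key in {**a, **b}:
            if key in a and key in b:
                merged[key] = unify(a[key], b[key], bindings)
            else:
                merged[key] = a[key] if key in a else b[key]
        return merged
    raise Clash(f"{a!r} does not match {b!r}")

def resolve(node, bindings):
    """Replace bound variables throughout a script."""
    node = walk(node, bindings)
    if isinstance(node, dict):
        return {k: resolve(v, bindings) for k, v in node.items()}
    return node

def apply_rule(plan, rule):
    bindings = {}
    return resolve(unify(plan, rule, bindings), bindings)

rule = {
    "first": {"agent": {"id": "me"}, "act": "bite", "patient": {"id": "?X"}},
    "then": {"agent": {"id": "?X"}, "act": "bite", "patient": {"id": "me"}},
}
plan = {"first": {"agent": {"id": "me"}, "act": "bite",
                  "patient": {"id": "Cassius"}}}
outcome = apply_rule(plan, rule)
print(outcome["then"])
```

Unifying the plan with the rule binds `?X` to Cassius and fills in the missing second scene, which is the predicted retaliation. A plan that does not fit the rule (for example, grooming rather than biting) raises `Clash`, so the rule simply does not apply.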

In this example, there is a `cause' script such as “I bite Nero” and an `effect' script such as “Nero bites me back”. The effect script depends on the cause script, through the identity (Nero); so the effect script is a function of the cause script. In this respect, the rule script of figure 3 acts as a script function, which may be written as Cause ⇔ Effect, in this case (I bite X) ⇔ (X bites me). The double-ended arrow reflects the fact that the function is reversible, and can be applied in either direction - forward (to predict consequences) or backwards (to plan to achieve goals). If a monkey wanted to get bitten, he could plan to do so.
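The reversibility of a script function can be shown with a small Python sketch. For brevity each scene is flattened to an (agent, act, patient) tuple, which is a simplification of the tree-shaped scripts; the class `ScriptFunction` and its method names are hypothetical.

```python
def match(pattern, scene):
    """Return variable bindings if `scene` fits `pattern`, else None."""
    if len(pattern) != len(scene):
        return None
    bindings = {}
    for p, s in zip(pattern, scene):
        if p.startswith("?"):
            if bindings.setdefault(p, s) != s:
                return None            # same variable, two different values
        elif p != s:
            return None
    return bindings

def fill(pattern, bindings):
    """Instantiate a pattern by substituting bound variables."""
    return tuple(bindings.get(p, p) for p in pattern)

class ScriptFunction:
    """A rule script viewed as a reversible function between cause and effect."""

    def __init__(self, cause, effect):
        self.cause, self.effect = cause, effect

    def forward(self, cause_scene):
        """Predict the effect that follows a given cause."""
        b = match(self.cause, cause_scene)
        return None if b is None else fill(self.effect, b)

    def backward(self, effect_scene):
        """Find the cause that would bring about a desired effect."""
        b = match(self.effect, effect_scene)
        return None if b is None else fill(self.cause, b)

bite_back = ScriptFunction(cause=("me", "bite", "?X"), effect=("?X", "bite", "me"))

predicted = bite_back.forward(("me", "bite", "Nero"))    # predict consequences
plan = bite_back.backward(("Nero", "bite", "me"))        # plan to reach a goal

# The variable ranges over individuals only, so the set of results is bounded.
troop = ["Nero", "Portia", "Cassius"]
outcomes = {bite_back.forward(("me", "bite", m)) for m in troop}
```

The last lines illustrate why such a function is bounded: its only free variable is an identity, so it can never produce more distinct results than there are individuals.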

This functional view of primate rule scripts will link closely with functional approaches to language syntax, such as Categorial Grammars and situation semantics. However, in the example above, the effect script depends on the cause script only through the identity of an individual, which can have only a finite number of different values; so there can only be a bounded number of possible result scripts. The rules we need for general primate social intelligence are bounded script functions, and are not yet powerful enough for language. Much more powerful functions are used, for instance, in categorial grammars (Oehrle et al 1988), Montague semantics (Dowty et al 1981) and situation semantics (Barwise & Perry 1983).

To summarise the model of general primate social intelligence: monkeys record social events as scripts; they use a process of script intersection to learn causal regularities (rule scripts) from experience; and they use the operation of script unification to apply these rule scripts, to plan and predict. Rule scripts can also be regarded as bounded, reversible script functions. This model can be successfully used to analyse many observations of primate social learning and behaviour, such as those of Cheney and Seyfarth - including the learning and use of alarm calls, dominance rules, and attachment behaviour (Worden 1996a).

How can we extend this model to a theory of mind? While not attempting to model the complex, multi-faceted adult human theory of mind, we can make some simple extensions to the model of social intelligence to make a candidate `minimal' theory of mind (Worden 1996b).

The first extension required is to use deeper script trees. For general social intelligence, one may need to represent that “Cassius is grooming Nero”, while for a theory of mind, one needs to represent “Brutus knows that Cassius is grooming Nero”. This requires a deeper tree structure; roughly, the top of the tree represents “Brutus knows that...” and the lower sub-tree represents what he knows (in the same formalism as the representation of what I know). One of these deeper trees is shown in figure 4.

Figure 4: A deeper script tree, as required for the theory of mind

The script of figure 4 represents the knowledge that “some other individual Z knows that if any individual X makes a food cry, then X has some food”. This is the sort of knowledge a chimp would need to have, to reason “I had better not make a food cry, or Z will know I have food and take my food away from me”. Field observations suggest that chimps can do this sort of reasoning (Whiten & Byrne 1991).

With these deeper tree structures, scripts are now in principle capable (given suitable node types and attributes) of representing an unbounded range of nested meanings, like those expressible in language.

To build a working theory of mind, a more subtle extension of the model of social intelligence is required. In order to represent the knowledge and possible reasoning of one's peers, one requires some rather general rules such as “If Z is present and S happens, then Z will know that S” or “If I know rule script R, then probably Z also knows rule script R”. Without such broad-range rules, an infeasibly large number of specific rules would be needed to represent a useful portion of another's knowledge and reasoning.

The example rules above can both be expressed as rule script trees or functions. Rewriting them in the semi-formal, functional notation:

(Z is present and S occurs) ⇔ (Z knows that S)

(Rule R holds) ⇔ (Z knows that rule R)

These functions have two interesting properties:

In each case, the `cause' part (to the left of the arrow) is an ordinary, un-nested script; while the `effect' part is one of the deeper, nested script trees. These rules can convert a shallow script tree to a deeper one, or vice versa.
Both rules contain variables (S and R) which denote not just `any individual' but `any script'. These variables can have an unbounded set of (script) values, so the result of the function has an unbounded set of possible values. They are unbounded script functions.
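The difference between a variable ranging over individuals and one ranging over whole scripts can be made concrete with a small Python sketch. It assumes scripts are nested tuples, and the tags `present_and_occurs` and `knows` and all function names are hypothetical. The rule relating (Z is present and S occurs) to (Z knows that S) carries the whole script S across unchanged, so repeated application yields trees of any depth, and the inverse function recovers the shallower script.

```python
def knows_forward(script):
    """(Z is present and S occurs) -> (Z knows that S), for any script S."""
    tag, observer, event = script
    if tag != "present_and_occurs":
        return None
    return ("knows", observer, event)


def knows_backward(script):
    """(Z knows that S) -> (Z is present and S occurs): the inverse function."""
    tag, knower, event = script
    if tag != "knows":
        return None
    return ("present_and_occurs", knower, event)


def depth(script):
    """Count how many 'knows' layers wrap the innermost event."""
    levels = 0
    while script[0] == "knows":
        script = script[2]
        levels += 1
    return levels


grooming = ("grooms", "Cassius", "Nero")

# Brutus is present while the grooming happens, so Brutus knows of it.
brutus_knows = knows_forward(("present_and_occurs", "Brutus", grooming))

# The variable S can itself be a nested script: Portia sees Brutus come to know.
portia_knows = knows_forward(("present_and_occurs", "Portia", brutus_knows))
```

Nothing in `knows_forward` inspects the contents of the event, which is why the set of possible results is unbounded.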

With these extensions, to deeper script trees and unbounded script functions, we can build a working computational model of a primate theory of mind (Worden 1996b); I believe it is the only such model to have been constructed. In this model, rule learning is still done by the process of script intersection (rules can still be learnt robustly from just a few examples), and rules are applied by the process of unification. Because the rules are unbounded script functions, the operations for learning and applying them are more computationally complex, but still obey the relations of the script algebra. Script functions are still reversible.

This gives a working computational analysis of how a rule like that in figure 4 could be learnt from a few examples, and how it might then be applied to deceive other chimps. However, there is not yet definitive evidence that chimps even have a theory of mind (Povinelli 1996), let alone the detailed evidence we would need to test the computational model; possibly the whole capability evolved after our divergence from the chimpanzee line.

Reversible, unbounded script functions are an essential part of the theory of mind, and are the key ingredient of the computational model of language. Intuitively, language needs the unbounded functions to build up an unbounded set of possible meaning scripts (trees of arbitrary depth for nested meanings), and it needs reversible functions to support the reversible processes of language generation and understanding. How this works is outlined in the next section.

7. The Computational Model of Language

The computational model of language is based on two key hypotheses:

Language meanings are scripts: the meaning of every sentence is a script, with suitable nodes, attributes and values to represent the full range of language meanings, and using scripts of greater depth to represent nested clauses and phrases. A script meaning structure is not a mental image, but may be used as an intermediate stage in constructing one from heard language.
Every word is a script function: words are reversible, unbounded script functions, and so can be applied in one direction for generation, the opposite direction for understanding; and they can build up an unbounded set of meanings by repeated function application.

The use of scripts as a meaning structure has many precedents in computational linguistics, going back to the scripts introduced by Schank and Abelson (1977), and having many parallels with the conceptual structures analysed by Jackendoff (1991), Pinker (1989) and others.

To illustrate how words act as script functions, I shall use two typical English words. Figure 5 shows the script function for a proper noun, the word `Fred'.

Figure 5: a word script function for a proper noun.

Word script functions are always shown as trees with one top script holder `ho' node, below which there are two script `sr' nodes. These two script subtrees are the argument and the result of the function; if the subtrees are written as S1 and S2, the reversible function can be written as S1 ⇔ S2, or f(S1) = S2 with the inverse function f^-1(S2) = S1.

The script function of figure 5 can be written as `Fred' ⇔ X, where the left-hand subtree `Fred' is a script with one scene (the `se' node) in it denoting just the sound “Fred”, and the right-hand subtree X has one scene with one entity (the `en' node), denoting the individual Fred, in it; this is the internal mental representation of that person. So this function simply converts (reversibly) between the sound `Fred' and the internal representation of that individual.

Somebody who has learnt this script function can use it in two ways. On hearing the sound `Fred', he can use the function `forwards' to convert it into a mental representation of that person (e.g. to understand a one-word `sentence'). Alternatively, from a representation in his own mind of the person Fred, he can use the function `backwards' to give the sound `Fred', which he can then say - generating a one-word sentence.

Nouns are the simplest script functions of a language. As a more complex example, the script function for a typical verb `gives' is shown in figure 6.

Figure 6: Script function for the word `gives'

`Gives' is a three-place predicate, being used to describe `A gives B C', where A, B and C can each denote a wide range of entities - possible donors, recipients and gifts respectively. This is reflected in the left-hand subtree of the function of figure 6, which contains four scenes (the `se' nodes) in strict time order; these are the `A', `gives', `B' and `C' scenes together. The right-hand subtree is the script representation of the act of giving - in which an agent (the left-most entity node) acts on a patient (the next entity node) in such a way that the patient then possesses the gift (the right-hand scene node, with patient and gift below it).

Thus the whole script function can be understood as `A gives B C' ⇔ G, where A, B and C are scenes describing entities, and G is a scene describing the act of giving. The three curved arrows denote variables shared between argument and result. The first arrow ensures that whatever subtree (in the argument) describes the donor A, the same subtree appears (in the donor role) in the complex script G describing the giving; the second curved arrow does the same for the recipient, and the third arrow for the gift. Because these shared variables are whole scripts, they can have an unbounded set of values. This function could be used to build a meaning script G for `Fred gives Joe a book' or `seeing a man with a green hat gives me the idea to buy one myself' and so on.

On hearing the sentence `Fred gives Joe a book', we first apply simple noun script functions like figure 5 to find the meaning scripts for `Fred', `Joe' and `book'; then we can finally apply the function of figure 6 to give the full `giving' script, with appropriate entities in the three roles - making an internal representation of the full meaning. To start from the meaning, and generate a sentence, we proceed in the reverse direction, using the inverse functions - first using the function of figure 6 in reverse to `break apart' the giving script into scenes for its three roles, then converting each of the three role entities into word sounds by script functions like figure 5.
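The two directions of processing can be sketched in Python under strong simplifying assumptions: the article `a' is omitted, only the four-word pattern `A gives B C' is handled, meanings are plain dictionaries rather than script trees, and the names `NOUNS`, `understand` and `generate` are hypothetical. This is an illustration of reversibility, not the Prolog implementation mentioned in this paper.

```python
# Noun script functions: sound <-> internal representation of an entity.
NOUNS = {
    "Fred": ("entity", "fred"),
    "Joe": ("entity", "joe"),
    "book": ("entity", "book"),
}
NOUN_SOUNDS = {meaning: sound for sound, meaning in NOUNS.items()}


def understand(words):
    """Sounds -> meaning: apply the noun functions, then the 'gives' function."""
    if len(words) != 4 or words[1] != "gives":
        return None
    donor, recipient, gift = (NOUNS.get(w) for w in (words[0], words[2], words[3]))
    if None in (donor, recipient, gift):
        return None
    roles = {"agent": donor, "patient": recipient, "gift": gift, "tense": "present"}
    return ("give", roles)


def generate(meaning):
    """Meaning -> sounds: apply the same functions in the reverse direction."""
    act, roles = meaning
    if act != "give":
        return None
    # Break the giving script apart into its three roles, then name each one.
    return [
        NOUN_SOUNDS[roles["agent"]],
        "gives",
        NOUN_SOUNDS[roles["patient"]],
        NOUN_SOUNDS[roles["gift"]],
    ]


sentence = ["Fred", "gives", "Joe", "book"]
meaning = understand(sentence)
round_trip = generate(meaning)
```

The word order constraint lives entirely in the position of each role within the list of sounds, which loosely mirrors the time-ordered scenes of the left-hand subtree in figure 6.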

This script function model gives a working account of language generation and understanding. By repeated function application as in this example, the script meanings in words can be combined to build up meaning structures of arbitrary complexity, just as we do every day. Typically, a noun gives a meaning script S, which is converted by the application of other script functions (words) to more complex meanings, f(S), g(f(S)), h(g(f(S))), and so on; or in reverse for generation.

The one structure of figure 6 embodies the full syntactic and semantic constraints of the word `gives':

It denotes an act of giving (the meaning of the right-hand subtree)
The act takes place in the present (defined in the top scene node of RH subtree)
It selects for a single, human, third person donor (left-most entity node)
The words denoting `A gives B C' must appear in just that time order (time-order arrows between scenes in the left subtree)

There are two lines of evidence that the script function approach will work not just for a toy subset of language, but for a wide range of mature languages:

I have built a Prolog implementation of the model which works for a 400-word subset of English, including many of the complexities of adult language: all parts of speech, complex verbs, tense, aspect, mood, passives, anaphora, gaps, ambiguity, and so on. It can work in either the generation or the understanding direction, handling sentences such as the example above and many that are a great deal more complex.
The central operation of applying script functions involves unification, similar to that in unification-based grammars. The formalism can be placed in close correspondence with any unification-based grammar model, such as LFG (Kaplan & Bresnan 1981), HPSG (Pollard & Sag 1987), Categorial Grammars (Oehrle et al 1988) or the Core Language Engine (Alshawi et al 1992). These formalisms have been applied to many languages, and can handle essentially all language features; this implies that the script function model can handle the same features, in similar ways.

As just one example of how mechanisms rooted in social cognition can tackle hard problems of language, consider the resolution of sentence ambiguities. Primates encounter ambiguity in everyday social situations, and need an efficient way to deal with it. When facing a social situation which might be described by two or more script representations, one obvious way to cope is to take the intersection of the two scripts (as used for learning), which projects out their common meaning, leaving out their differences, and use the intersection script for social inference or planning. We can use the same trick in language understanding; whenever any ambiguity leads to two or more alternative meanings, take the script intersection of the alternatives and carry on processing with just that one meaning script. This avoids any combinatoric explosion of parse structures in the face of multiple ambiguities, a problem which has plagued computational language implementations. However, it only works well for languages which obey the Greenberg-Hawkins universals (Greenberg, 1966; Hawkins 1994); for other languages, script intersection of ambiguous structures would destroy too much meaning. In this account, languages themselves have evolved to obey the universals, in order to be easily understandable in the presence of ambiguity, as suggested by Kirby (this volume). This offers an alternative account of language universals to the `working memory' accounts given by Briscoe (this volume) and Hawkins (1994).
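The idea of collapsing alternative readings into their common meaning can be sketched in Python. The example sentence, the dictionary representation of meanings and the function name `intersect_meanings` are all assumptions made for illustration; the sketch shows only that a single structure replaces the set of alternatives, not how well this works on real text.

```python
def intersect_meanings(a, b):
    """Keep only the attributes and values on which two readings agree."""
    if isinstance(a, dict) and isinstance(b, dict):
        common = {}
        for key in a.keys() & b.keys():
            value = intersect_meanings(a[key], b[key])
            if value is not None:
                common[key] = value
        return common or None
    return a if a == b else None


# Two readings of "I saw the man with the telescope" (hypothetical encoding).
seen_through_telescope = {
    "act": "see",
    "agent": "I",
    "patient": {"entity": "man"},
    "instrument": "telescope",
}
man_has_telescope = {
    "act": "see",
    "agent": "I",
    "patient": {"entity": "man", "has": "telescope"},
}

shared_meaning = intersect_meanings(seen_through_telescope, man_has_telescope)
```

Processing continues with `shared_meaning` alone, so a sentence with several independent ambiguities still yields one structure rather than a product of alternatives. The cost is that the role of the telescope is lost, which illustrates why the approach depends on how much meaning the two readings share.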

However, the most distinctive feature of the model is its account of language learning. Any word script function, such as those of figures 5 and 6, may be learnt by the same mechanism used for general primate social learning:

Suppose a child knows Fred, and therefore constructs a representation of him, in her social representation, whenever he is present. On these occasions, she frequently hears the sound `Fred' being said; therefore she also constructs a script containing the sound `Fred'. On any one such occasion, she will construct a combined script with both parts, similar to that of figure 5, but possibly with a lot of other information as well - about other sounds, other individuals, other things going on, etc. Having observed a few such occasions and constructed these scripts, she can use the script intersection learning mechanism to find the common structure in all of them, just as in the primate social rule-learning mechanism of section 5. The script intersection process rapidly prunes out any irrelevant detail which is not common to all examples; the only part common to all the examples is the structure of figure 5, which she learns.
If the same child observes a scene of giving, she will construct a script which describes it in her social representation. This script has a structure similar to the right-hand branch of figure 6, but with specific entities in the roles of donor, recipient and gift. Suppose at the same time she hears a parent comment `Joe gives Fred the ball' and has already learnt the words for `Joe', `Fred' and `ball' as above. From the heard sounds, and her partial understanding, she constructs a script similar to the left-hand branch of figure 6. So again she can make a structure similar to the whole of figure 6, but possibly with extraneous information in it from other things going on at the same time. Again, taking four or five examples, and intersecting them together, leaves only the common structure (figure 6) and prunes out all the irrelevant extra information. She has then learnt the syntax and semantics of the word `gives'.
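The first of these two cases, learning a proper noun, reduces to a very small computation once each occasion is written down. The Python sketch below assumes that an occasion is a flat set of (kind, value) features, which is far cruder than a script tree, and the function name `learn_word` and the example data are hypothetical.

```python
from functools import reduce


def learn_word(occasions):
    """Intersect all occasions and split what survives into sound and meaning."""
    common = reduce(lambda a, b: a & b, occasions)
    sounds = {value for kind, value in common if kind == "sound"}
    entities = {value for kind, value in common if kind == "entity"}
    return sounds, entities


# Three occasions on which Fred is present and his name is spoken,
# each cluttered with other sounds and other individuals.
occasions = [
    {("sound", "Fred"), ("sound", "dinner"), ("entity", "fred"), ("entity", "dog")},
    {("sound", "Fred"), ("sound", "look"), ("entity", "fred"), ("entity", "mum")},
    {("sound", "Fred"), ("sound", "ball"), ("entity", "fred"), ("entity", "ball")},
]

sounds, entities = learn_word(occasions)
```

After three occasions only the pairing of the sound `Fred' with the individual Fred remains, even though the third occasion contains a second consistent pairing (the sound and entity `ball') that would need further examples of its own to be learnt.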

So language learning is a particular example of the efficient and robust social learning mechanism which we share with other primates. Any part of speech can be learnt from a few positive examples, where one hears the word in use and is able to observe (or deduce) the meaning of the scene. I have implemented this learning mechanism in Prolog and shown that it can `bootstrap' learn a 50-word initial subset of English from no prior linguistic knowledge, and there seems to be no obvious limit to its further learning ability; it can learn any part of speech by the same mechanism.

Interestingly, we need to postulate a further application of the same mechanism to learn regular inflections, such as the productive rule that `-ed' implies past tense in a verb. Individual past verbs such as `gave', `fitted', `hit' and so on can be learnt from examples by the basic mechanism described above; but to learn that `-ed' implies past tense, one needs to have learnt several specific past tense verbs, and then take a script intersection of their script functions (not of the original examples). Evidence on Specific Language Impairment (Gopnik, in this volume) can be understood, in this theory, as a deficit of the further learning mechanism, which does not affect the basic learning mechanism. In this interpretation, a genetic defect impacts only the further learning mechanism for regular morphology rules.
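This second-level intersection can be sketched in Python by intersecting already-learnt word functions rather than raw observations. The sketch assumes each learnt verb is a (sound, meaning) pair with the meaning held as a flat dictionary, and it takes the common part of the sounds to be their longest shared suffix; the function names and the example verbs other than `fitted' are hypothetical.

```python
def common_suffix(words):
    """Longest ending shared by every word in the list."""
    suffix = words[0]
    for word in words[1:]:
        # Shorten the candidate from the left until this word ends with it.
        while not word.endswith(suffix):
            suffix = suffix[1:]
    return suffix


def induce_inflection(word_functions):
    """Intersect learnt (sound, meaning) pairs into an inflection rule."""
    sounds = [sound for sound, _ in word_functions]
    meanings = [set(meaning.items()) for _, meaning in word_functions]
    shared_meaning = dict(set.intersection(*meanings))
    return "-" + common_suffix(sounds), shared_meaning


# Past tense verbs assumed to have been learnt one by one beforehand.
learnt_verbs = [
    ("fitted", {"act": "fit", "tense": "past"}),
    ("walked", {"act": "walk", "tense": "past"}),
    ("jumped", {"act": "jump", "tense": "past"}),
]

inflection_rule = induce_inflection(learnt_verbs)
```

The verb-specific parts of both the sound and the meaning are pruned away, leaving a pairing of the ending with the tense attribute. Irregular forms such as `gave' and `hit' would share no ending with the others and so must remain individually learnt words.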

This model of language learning is quite different from the `principles and parameters' picture introduced by Chomsky. To learn a language is not to set a few parameters, but to learn the script functions individually for hundreds, then thousands, of words. The syntax of a language is embodied in those script functions. The script learning mechanism is robust, efficient, and has plenty of evidence to learn from over years of childhood (any word can be learnt from about ten clear examples of its use, which a child does not have to wait long to hear). This model of learning works well computationally, seems to agree well with a lot of evidence about language acquisition, and suffers none of the conceptual difficulties which the `principles and parameters' theory has with language change and bilingualism (which would seem to require parameters with intermediate or multiple values, thus losing much of the attractiveness of the parameter idea).

In summary, the computational model of language which follows from primate social intelligence can handle the complexities of adult language in both understanding and generation, and gives a general working model of language acquisition.

8. Conclusions

I have described a theory of how language evolved from primate social intelligence. This theory does well by the four criteria mentioned in the introduction:

Evolutionary constraints: because of an evolutionary speed limit, the amount of new information in the design of the human brain since our divergence from chimps cannot be more than 5 KBytes, equivalent to just one page of text. Therefore the computational faculties underlying language need to be based on some pre-existing mental capacities; in this theory, language is based on primate social intelligence which is now known to be complex and subtle - a suitable basis for the complexity and subtlety of language.
Language Use: Language is used for social purposes, to alter what other people believe or intend. Therefore we expect language to be tightly linked to social intelligence, particularly to that part of social intelligence - the theory of mind - which is specifically about other people's knowledge and goals.
Neuroanatomy: Both language and primate social intelligence seem to be closely associated with the ventral prefrontal cortex - supporting the idea that they have a common evolutionary origin.
Language Computations: The meaning representations required for social intelligence - being structured, open-ended, multi-sensory, extended in space and time, and largely discrete-valued - are remarkably well suited to serve as the basis for language meanings. Social intelligence requires a robust learning mechanism, which, suitably extended to support the learning of a theory of mind, provides just the operations needed for language learning. The reversible, unbounded script functions which primates need for complex social planning and anticipation, are the same operations we use for reversible language generation and understanding. This computational model of language works.

So, as a theory of how language arose - largely by reuse of pre-existing structures and operations in the brain for social intelligence and the theory of mind - the theory agrees with a wide range of data and constraints. However, it does not tell us when or why language arose - what were the particular selection pressures and events in pre-human history which led to these faculties being co-opted for language. In this sense it is complementary to other theories, discussed at this conference, which link the emergence of language to particular pressures such as the need for a more efficient grooming mechanism (Dunbar, this volume), to changes in male/female social relations (Power, this volume), to sexual selection (Miller, 1992) or other factors.

On current evidence, it seems that some of the pre-conditions for language were satisfied with the emergence of Homo erectus, 2 million years ago; there was a significant enlargement in brain size, correlated with the onset of full terrestriality and bipedalism (Aiello, this volume). Possibly at this time the larger group sizes needed for a terrestrial habitat led to a habit of vocal grooming (Dunbar, this volume), consonant with the observed neuroanatomical changes in the Broca region, and changes to vocal anatomy (Aiello, this volume); using the vocal channel in more sophisticated and controlled ways, for a social purpose, but not yet to convey complex meanings. This adaptation was successful and stable for 1.5 million years.

Then, at some time within the era of Homo sapiens in the last 250,000 years, some change in social organisation, possibly linked to altered male/female social relations (Power, Knight, this volume; Miller, 1992) gave a need to express complex meanings in a social context. This need was not satisfied by the evolution of some new brain design; it was satisfied by co-opting a rich pre-existing social intelligence and theory of mind, already well honed by social competition - well adapted for rapid, robust learning and for processing complex, ambiguous social meanings. When these capabilities were co-opted for the purpose of communication, language was born.

References

Aiello, L. (1996) The foundations of human language, in this volume.

Alshawi, H. et al (1992) The Core Language Engine, MIT Press, Cambridge, MA

Baron-Cohen, S., H. Ring, J. Moriarty, B. Schmitz, D. Costa, and P. Ell (1994), Br. J. Psychiatr. 165, 640 -649

Barwise, J. and J. Perry (1983) Situations and attitudes, MIT Press, Cambridge, MA.

Bickerton, D. (1996) Catastrophic evolution: the case for a single step from protolanguage to full human language, in this volume

Briscoe, T. (1996) Parsability as a constraint on the evolution of language, in this volume.

Byrne, R.W. and A. Whiten (1992) Cognitive evolution in primates: evidence from tactical deception, Man 27, 609-627

Carruthers, P. and Smith, P.K. (eds) (1996) Theories of theories of mind, CUP.

Cheney, D. L. and R. M. Seyfarth (1990) How monkeys see the world, University of Chicago Press

Damasio, A. (1994) Descartes' error: emotion, reason and the human brain, Grosset/Putnam, New York

Deacon, T. W. (1992) Brain-Language coevolution, in J. A. Hawkins and M. Gell-Mann (eds) The Evolution of Human Languages, Addison-Wesley, 49-83

Dennett, D. C. (1983) The Intentional Stance, Behavioral and Brain Sciences 3, 343-350

Dunbar, R. (1996) Gossip: a social function for the evolution of language, in this volume

de Waal, F. (1982) Chimpanzee politics: power and sex among apes, Johns Hopkins University Press

Dowty, D. R. , R. E. Wall and S. Peters (1981) Introduction to Montague Semantics, Kluwer

Fiez, J. A., M. E. Raichle, D. A. Balota, P. Tallal and S. E. Petersen (1996) PET activation of Posterior Temporal Regions during Auditory Word Presentation and Verb Generations, Cerebral Cortex 6 : 1-10

Gopnik, A. and H. Wellman (1992) Why the child's theory of mind really is a theory, Mind and language 7, 1-2,145-71

Gopnik, M. (1996) Genes, grammars and other curiosities, in this volume

Gordon, R. M. (1986) Folk psychology as simulation, Mind and Language, 1, 158-71

Greenberg, J. H. (1966) Some universals of grammar with particular reference to the order of meaningful elements, in Greenberg, J. H. (ed) Universals of language, 2nd edition, MIT Press.

Grice, H. P. (1968) Utterer's meaning, sentence-meaning and word meaning, Foundations of language 4:225 - 242

Harcourt, A. H. (1988) Alliances in Contests and Social Intelligence, in Machiavellian Intelligence: Social intelligence and the evolution of intellect in monkeys, apes and humans, ed. Byrne, R.W. and A. Whiten , Clarendon Press

Hawkins, J. A. (1994) A performance theory of order and constituency, CUP

Humphrey, N. K. (1976) The Social Function of Intellect, in Growing Points in Ethology, ed. P. P. G. Bateson and R. A. Hinde, Cambridge

Jackendoff, R. (1991) Semantic Structures, MIT Press, Cambridge, MA.

Johnson-Laird, P. N. (1983) Mental Models, Cambridge University Press, Cambridge

Kaplan, R. M. and J. Bresnan (1981) Lexical Functional Grammar: a Formal System for Grammatical Representation

Kay, M. (1984) Functional Unification Grammar, in Proc. COLING-84.

Kirby, S. (1996) Fitness and the selective adaptation of language: two explanations for universals, in this volume.

Knight, C. (1996) Ritual/speech co-evolution: a `selfish gene' solution to the problem of deception, in this volume.

Marr, D. H. (1982) Vision, W. H. Freeman

Miller, G. F. (1992) Sexual Selection for Protean Expressiveness: a New Model of Hominid Encephalisation, paper delivered to the 4th Annual Meeting of the Human Behaviour & Evolution Society, Albuquerque, New Mexico.

Newmeyer, F. (1996) On the supposed `counterfunctionality' of universal grammar: some evolutionary implications, in this volume

Oehrle, R. T., E. Bach and D. Wheeler (eds) (1988) Categorial grammars and natural language structures, Reidel, Dordrecht

Passingham, R. (1993) The frontal lobes and voluntary action, OUP.

Perner, J. (1996) Simulation as explication of predication-implicit knowledge about the mind: arguments for a simulation-theory mix, in Carruthers, P. and Smith, P. K. (eds) Theories of theories of mind, CUP.

Pinker, S. (1989) Learnability and Cognition: the Acquisition of Argument Structure, MIT Press, Cambridge, MA.

Pollard, C. and I. Sag (1987) Head-driven Phrase Structure Grammar, University of Chicago Press

Povinelli, D. (1996) Chimpanzee theory of mind? The long road to strong inference, in Carruthers, P. and Smith, P. K. (eds) Theories of theories of mind, CUP.

Povinelli, D. J. and T. M. Preuss (1996) Theory of mind: evolutionary history of a cognitive specialisation, Trends in Neurosciences, in press.

Power, C. (1996) The vocal grooming and gossip theory of language origins: can cheap signals be reliable?, in this volume

Schank, R. C. and R. P. Abelson (1977) Scripts, Plans, Goals and Understanding: an Inquiry into Human Knowledge Structures, Lawrence Erlbaum Associates, Hillsdale, New Jersey

Smith, P. K. (1996) Language and the evolution of mind-reading, in Carruthers, P. and Smith, P. K. (eds) Theories of theories of mind, CUP.

Tomasello, M. and J. Call (1996) Social cognition in monkeys and apes, Yearbook of Physical Anthropology 37:273-305

Whiten, A. (ed) (1991) Natural Theories of Mind: Evolution, development and simulation of everyday mindreading, Blackwell

Worden, R. P. (1995) A Speed Limit for Evolution, Journal of Theoretical Biology 176, 137 - 152

Worden, R.P. (1996a) Primate Social Intelligence, to be published in Cognitive Science

Worden, R.P. (1996b) The Primate Theory of Mind, paper in draft