The Singing Neanderthals (2005), by Steven Mithen

Steven Mithen, Professor of Archaeology at the University of Reading, is a leading figure in the field of cognitive archaeology and a Fellow of the British Academy. In 1996, drawing together many diverse strands, he described the possible evolutionary origins of the human mind in his seminal The Prehistory of the Mind: A Search for the Origins of Art, Science and Religion, in which he proposed that full consciousness only arose when the previously separate cognitive domains that make up the mind became integrated by a process he described as “cognitive fluidity” (Mithen, 1996). Subsequent archaeological discoveries in Africa forced Mithen to revise some of his timescales without affecting the validity or otherwise of his theory (McBrearty & Brooks, 2000). However, Mithen, who is himself a lover of music, felt that its role in the development of language had largely been dismissed as “auditory cheesecake”, as Steven Pinker had described it.

Mithen pleads guilty to having himself failed to consider music in his 1996 work. Accordingly, in The Singing Neanderthals, he sets out to redress the balance. He begins by considering language.

Language is a very complex system of communication which must have evolved gradually in a succession of ever more complex steps generally referred to as proto-language. But what was the nature of this proto-language? There are two schools of thought – “compositional” and “holistic”. The compositional theories are championed by Derek Bickerton, who believes that early human species including the Neanderthals had a relatively large lexicon of words related to mental concepts such as “meat”, “fire”, “hunt”, etc (Bickerton, 1990). These words could be strung together, but in the absence of syntax, only in a crude fashion. Mithen, however, favours the holistic view, which is championed by linguist Alison Wray. Wray believes that proto-language comprised utterances that were holistic, i.e. they conveyed complete messages. Words – where the utterances were segmented into shorter utterances – only occurred later.

Mithen presents evidence that there is a neurological basis for music and that this is distinct from language. He draws on a variety of sources: studies of brain-damaged patients, individuals with congenital impairments, brain activity scans and psychological tests carried out on both children and adults.

Just as definite regions of the brain are involved with language, and damage to these regions can selectively or totally impair linguistic skills, so it is with music. The musical regions appear to be primarily located in the right hemisphere of the brain, in regions corresponding to Broca’s area on the left. However, there does seem to be some linkage between the linguistic and musical regions.

Infant-directed speech (IDS) – that is to say the way in which adults and indeed quite young children speak to infants – has a musical quality that infants respond to. Mithen believes that infants have a highly developed musical ability, but that this is later suppressed in favour of language. For example, infants often have perfect pitch, but very few adults do. Relative pitch is better than perfect pitch for language acquisition, as the latter would result in the same word spoken by two speakers being interpreted as two different words.

This, Mithen argues, may give us an insight into how Early Humans, such as Homo erectus and the Neanderthals, communicated with one another. He falls back on the notion that “Ontogeny recapitulates Phylogeny”, i.e. our developmental history mirrors our evolutionary history. He rejects the notions that music arose from language or that language arose from music. Instead, he argues, music and language both evolved from a single system at some stage in our primate past.

A central point of Mithen’s theory is emotion, which he believes underpins our thoughts and actions. A fear response, for example, was necessary to force a flight response from a dangerous predator. Conversely, happiness was a “reward” for successfully completing a task. There are four basic emotions – happiness, sadness, fear and anger, with more complex emotions such as shame and jealousy being composites of these four. Emotions were crucial for the development of modern human behaviour and indeed for the development of any sapient species. Beings relying solely on logic, such as Vulcans, could never have evolved.

Experiments suggest that apes and monkeys and humans – and by implication Early Humans – all share the same basic range of emotions. Now Mithen pulls together two ideas – firstly, music can be used to both express and manipulate human emotions; secondly, the vocalizations of primates serve much the same function in these animals. For example, vervet monkeys use predator-specific calls to warn others of their kind. Thus a human would shout “get up the nearest tree, guys, there’s a leopard coming” but a vervet would utter a single specific “holistic” call conveying the same meaning. The difference is that the human utterance is referential, referring to a specific entity and instructing a specific response – a “command”. By contrast the vervet monkey is using its utterance to manipulate the emotions of its fellows – the call is associated with a specific type of danger, inducing fear. The fear achieves the caller’s desired effect by inducing its fellows to climb into the trees for safety.

Mithen believes that in Early Humans, living in groups, extended child-rearing and the increased use of gestural communications led to an extension of the “holistic and manipulative” vocalization of monkeys and other primates into a communication mode he refers to as “Hmmmmm” – “Holistic, manipulative, multi-modal, musical and mimetic”, with dance and mime being added to the repertoire. He cites a circular arrangement of animal bones at a Middle Pleistocene Homo heidelbergensis (the common ancestor of both modern humans and the Neanderthals) site at Bilzingsleben, in Germany, and claims it was a demarcated space for song and dance routines, in other words a theatre. As with the vocalizations of vervet monkeys, Hmmmmm was intended to manipulate the actions of others. It was more complex than the vocalizations of any present-day non-human primate, but less so than the language of modern humans. (For another viewpoint on the role of hominin group living in language evolution, see Dunbar (1996).)

The Hmmmmm of the large-brained Neanderthals was richer and more complex than that of earlier humans. It enabled them to survive in the harsh conditions of Ice Age Europe for 200,000 years, but their culture remained static and their scope for innovation limited by the lack of a true language which would have enabled complex ideas to be framed. Indeed, the sheer conservatism and the lack of innovation and of symbolic and artistic expression in the Neanderthal archaeological record are, to Mithen, proof that they lacked language. He dismisses the “problem” of the Châtelperronian culture, where there is indeed evidence of innovation and symbolic behaviour. Although the archaeological record is ambiguous, with some claiming that the Châtelperronian horizon predates the Aurignacian horizon and the arrival of modern humans (Zilhão et al, 2006), Mithen believes this is incorrect and the Châtelperronian is a result of Neanderthal acculturation from modern humans. The coincidence of independent origin just before the arrival of modern humans is just too great to be believed, he states.

If Neanderthals lacked language, how did Homo sapiens acquire it? Mithen believes that language as we know it came about through the gradual segmentation of holistic utterances into smaller components. Though initially holistic, utterances could be polysyllabic; for example, suppose “giveittome” was a holistic, polysyllabic utterance meaning “give it to me”. But if there was also a completely different utterance, “giveittoher”, meaning “give it to her”, then in time the “giveitto” part would become a word in its own right. That two random utterances could have a common syllable or syllables that just happened to mean the same thing, and that this could happen often enough for a meaningful vocabulary to emerge, strikes me as being implausible. However, Mithen cites a computer simulation by Simon Kirby of Edinburgh University in support. Mithen also claims that Kirby’s work is turning Chomsky’s theory of a Universal Grammar on its head. Chomsky claimed that it was impossible for children to learn language without hard-wired linguistic abilities already being present, but Kirby’s simulations apparently suggest the task is not as daunting as Chomsky believed.

Language would have been the key to the “cognitive fluidity” proposed in Mithen’s earlier work (Mithen 1996) as the basis of modern human behaviour. Language would have enabled concepts held in one cognitive domain to be mapped into another. Derek Bickerton believes that language and the ability for complex thought processes arose as a natural consequence of the human brain acquiring the capacity for syntax and recursion (Bickerton, 1990, 2007) but if these capacities were also required for “Hmmmmm” then, if the Kirby study is to be believed, a changeover to full language could have occurred gradually and without any rewiring of the brain. Mithen argues that this was the case and that the first wave of modern humans to leave Africa, who established themselves in Israel 110-90,000 years ago (Lieberman & Shea, 1994; Oppenheimer, 2003), were still using “Hmmmmm”. By 50,000 years ago, “Hmmmmm” had given way to modern language and at this point modern humans left Africa, eventually colonising the rest of the world and replacing the Eurasian populations of archaic humans. That language was crucial to the emergence of modern human behaviour has also been suggested by Jared Diamond (Diamond, 1991).

“Hmmmmm”, for its part, did not disappear and music retains many of its features.

To sum up, this is a fascinating theory that clearly demonstrates that music is as much a part of the human condition as is language. Its main weakness as a theory is that it cannot, by definition, be falsified since all the “Hmmmmm”-using human species such as the Neanderthals are now extinct.

Another problem for me is the idea that anatomically modern humans got by with “Hmmmmm” for at least 100,000 years and only gradually drifted into full language by the method outlined above. Given that creoles can arise from pidgins in a single generation, this seems implausible, unless we allow for some later change in the mental organization of modern humans.

Mithen mentions the FOXP2 gene, which has been shown to have a crucial role in human language. One study suggested the human version of this gene emerged some time after modern humans diverged from Neanderthals (Enard et al, 2002). Supporters of a “late emergence” for modern human behaviour such as Richard Klein have cited this as evidence that otherwise fully modern humans did in fact undergo some form of “mental rewiring” as late as 50,000 years ago (Klein & Edgar, 2002). However, it has since been shown that the Neanderthals had the same version of the gene that we do (Krause et al, 2007), weakening the “late emergence” argument.


Bickerton D (1990): “Language and Species”, University of Chicago Press, USA.

Bickerton D (2007): “Did Syntax Trigger the Human Revolution?” in Rethinking the human revolution, McDonald Institute Monographs, University of Cambridge.

Diamond J (1991): “The Third Chimpanzee”, Radius, London.

Dunbar R (1996): “Grooming, Gossip and the Evolution of Language”, Faber and Faber, London Boston.

Wolfgang Enard, Molly Przeworski, Simon E. Fisher, Cecilia S. L. Lai, Victor Wiebe, Takashi Kitano, Anthony P. Monaco & Svante Pääbo (2002): Molecular evolution of FOXP2, a gene involved in speech and language, Nature, Vol. 418, 22 August 2002.

Klein R & Edgar B (2002): “The Dawn of Human Culture”, John Wiley & Sons Inc., New York.

J. Krause, C. Lalueza-Fox, L. Orlando, W. Enard, R. Green, H. Burbano, J. Hublin, C. Hänni, J. Fortea, M. de la Rasilla (2007): The Derived FOXP2 Variant of Modern Humans Was Shared with Neandertals, Current Biology, Volume 17, Issue 21, Pages 1908-1912.

Daniel E. Lieberman and John J. Shea (1994): Behavioral Differences between Archaic and Modern Humans in the Levantine Mousterian, American Anthropological Association.

McBrearty S & Brooks A (2000): “The revolution that wasn’t: a new interpretation of the origin of modern human behaviour”, Journal of Human Evolution (2000) 39, 453–563.

Mithen S (1996): “The Prehistory of the Mind”, Thames & Hudson.

Mithen S (2005): “The Singing Neanderthals”, Weidenfeld & Nicolson.

Mithen S (2007): “Music and the Origin of Modern Humans”, in Rethinking the human revolution, McDonald Institute Monographs, University of Cambridge.

Oppenheimer S (2003): “Out of Eden”, Constable.

João Zilhão, Francesco d’Errico, Jean-Guillaume Bordes, Arnaud Lenoble, Jean-Pierre Texier and Jean-Philippe Rigaud (2006): Analysis of Aurignacian interstratification at the Châtelperronian -type site and implications for the behavioral modernity of Neandertals, PNAS August 15, 2006 vol. 103 no. 33.

© Christopher Seddon 2009


Grooming, Gossip and the Evolution of Language (1996), by Robin Dunbar

Robin Dunbar (born 1947) is a British anthropologist and evolutionary biologist specialising in primate behaviour.

His 1996 work Grooming, Gossip and the Evolution of Language pulls together his work on the social group sizes of various primates, including humans, and their correlation with neocortex size relative to body mass. Although not intended as a work of popular science, the book’s style is very accessible and it may be read by non-specialists as well as academics.

The work appears to have been a considerable influence on the science-fiction novel Evolution, by Stephen Baxter.

The following is a chapter-by-chapter summary of this work:

Chapter 1. “Talking Heads”.
Describes the experience of being groomed by a monkey from the viewpoint of one of its peers: notes the similarities with the innuendoes and subtleties of everyday social experience of humans. The social life of humans, with its petty squabbles, joys and frustrations, is not unlike that of other primates. But there is one major difference – human language.

Curiously, when humans talk, most of the conversation is social tittle-tattle. Even in common-rooms of universities, etc, conversations about cultural, political, philosophical and scientific matters account for no more than a quarter of the conversation. The Sun devotes 78% of its copy to “human interest” stories and even the Times only devotes 57% to serious matters. Our much-vaunted capacity for language seems to be mainly used for exchanging information on social matters [anybody doubting Dunbar’s assertion needs look no further than Facebook]. Why should this be so?

Chapter 2. “Into the Social Whirl”.
The answer may lie with our primate heritage. Monkeys and apes are highly sociable, their lives revolving around the small group of individuals with whom they live. They could not exist without their friends and relations.

Our ape ancestors faced disaster 10 million years ago as the climate became drier and colder, causing their forest habitat to retreat. Monkeys became serious competitors: they are able to eat unripe fruit because their stomachs contain enzymes that neutralise the tannins in it, which would give apes and humans an upset stomach. About 7 million years ago, one population of apes found itself forced out onto the savannahs bordering the forests. Mortality would have been desperately high but those that survived did so because they were able to exploit the new conditions.

All primates have had to deal with predators ranging from various felids and canids to monkey-eating eagles and even other primates. There are two main ways of dealing with predators. One is to be larger than any likely predator; the other is to live in a large group. The latter option reduces the risk in a number of ways: more eyes to detect a marauding predator; strength in numbers – a large group can drive off or even kill a predator; finally, a large number of group-members fleeing in different directions will confuse a predator, often for long enough for all to get away.

But group living has disadvantages too. Social animals have to strike a balance between the dangers of predators and the problems of social tensions. The primate solution is for small groups to form coalitions within the larger overall living group. Such alliances take many forms, depending on the overall social and sexual dynamics of the species concerned. In all cases, however, grooming is the key to maintaining these alliances.

Primate alliances are built on the ability of animals to form inferences about the suitability and reliability of potential allies, but apes and monkeys can also practise deception and manipulative behaviour. They can do this because they can calculate the effect their actions are likely to have.

Chapter 3. “The Importance of Being Earnest”.
Grooming takes up a considerable amount of a monkey’s time, typically 10% though it can be as much as 20%. Grooming apparently releases endorphins, but while this encourages animals to groom it isn’t the evolutionary reason for it. One problem among social animals is “defection”, i.e. where one fails to return a favour. A solution is to make defection expensive, and grooming requires a considerable investment of time. Building alliances therefore requires commitment and it is therefore usually better to maintain existing relationships than try to build new ones.

In addition to grooming, monkeys use vocalisation to maintain their alliances. Monkeys make contact calls when moving through dense vegetation to enable the animals to keep track of one another. But subtle differences have been found in the utterances made by vervet monkeys depending on whether they are approaching a dominant animal or a subordinate one. Other calls were given when spotting another vervet group, or when moving out into open grassland. When these calls were recorded and played back, monkeys would look up when hearing calls from animals dominant over them, but ignore those from subordinates. Vervets also make a variety of predator warning calls which depend on the type of predator spotted. Other species use contact calls to keep in touch with preferred grooming partners.

But does any species, other than humans, have language? Attempts to teach apes to use language since the 1960s have produced unconvincing results, the oft-cited cases of Washoe, Kanzi etc notwithstanding. Human communications are on another level. What are they used for and why did they evolve?

Chapter 4. “Of Brains and Groups and Evolution”.
While bigger brains generally mean a smarter animal, the rule doesn’t always hold good because the size of the animal must also be taken into account. For example, whales and elephants have larger brains than humans, but their brains have to deal with a far greater muscle-mass than a human brain does. When the relative brain-size is computed, it can be seen that the distributions for various groups of animals lie on different planes. Dinosaurs and fish lie below birds, which in turn lie below mammals. But among the mammals, there is also a hierarchy: marsupials lie at the bottom, followed by insectivores, then ungulates, then carnivores and finally at the top, primates. Here again there are levels: the prosimians at the bottom and the monkeys and apes at the top. The human brain is about nine times the size of the brain of a typical human-sized mammal and twelve times that of a human-sized insectivore [if such a species existed].

Why is this? It could not be due to chance, because of the high energy cost of a large brain, which consumes 20% of the total energy budget in humans despite accounting for just 2% of total body mass. 1970s theories focussed on the need for greater problem-solving abilities; for example, fruit eaters need bigger brains than leaf eaters, because supplies of fruit are harder to find.
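The energy argument can be made concrete with a line of arithmetic. The sketch below is illustrative only: the function name is hypothetical, and the only inputs are the two percentages quoted above (20% of the energy budget, 2% of body mass).

```python
def relative_energy_cost(energy_share_pct: float, mass_share_pct: float) -> float:
    """How many times more energy a tissue uses, per unit mass,
    than the whole-body average (hypothetical helper for illustration)."""
    if mass_share_pct <= 0:
        raise ValueError("mass share must be positive")
    return energy_share_pct / mass_share_pct


# Figures quoted in the text: the brain takes 20% of the energy budget
# while making up 2% of body mass.
brain_cost = relative_energy_cost(20, 2)
print(f"Brain tissue costs about {brain_cost:.0f} times the body average per unit mass")
```

On these figures, brain tissue is roughly ten times as expensive to run as the average for the body, which is why an enlarged brain needs an evolutionary explanation.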

Finding brightly-coloured fruit does require colour vision, which in turn requires more brain-power to process the input. Primates possess superior colour vision to other mammals [but so do birds and insects]. However this theory fails to explain why not all fruit eaters have large brains.

That social complexity might be linked to primate brain size was considered but not taken seriously until 1988, when British psychologists Dick Byrne and Andrew Whiten proposed what has become known as the Machiavellian Intelligence Hypothesis. This is based on the fact that monkeys and apes are able to use very sophisticated forms of social knowledge about each other. This knowledge about how others behave is used to predict how they might behave in the future and relationships are based upon these predictions.

The main problem with the theory was that many thought it was too nebulous to test. One problem for Dunbar was confounding factors: fruit eating primates require larger territories than leaf-eaters because the fruit is more widely-spaced. But many fruit-eaters such as baboons and chimpanzees are larger than leaf-eating monkeys and live in larger groups. There are four factors – body-size, brain-size, group-size and fruit-eating. The problem was to ensure that correlation between any two of these factors was not a consequence of both being correlated for quite distinct reasons with a third.

Dunbar decided to consider not the total brain size but the neocortex, which is the “thinking” part of the brain, where consciousness arises. The neocortex comprises the “grey matter” popularly associated with intelligence and it surrounds the deeper white matter in the cerebrum. In small mammals such as rodents it is smooth, but in primates and other larger mammals it has deep grooves (sulci) and wrinkles (gyri) which increase its surface area without greatly increasing its volume.

Dunbar then correlated the size of the neocortex against group size. He chose this as a measure of social complexity because of the volume of data available from field workers on many primate species and because it is a simple numerical value rather than a subjective assessment. Also, group size is a measure of social complexity – the larger the group, the more relationships there are to keep track of. Dunbar found that there was a very good fit between the data and the ratio of neocortex volume to total brain volume. The findings provided support for the Machiavellian Intelligence Hypothesis – large brains were linked to the need to hold large groups together. Dunbar was also able to find the same correlation between neocortex ratio and group size in non-primate mammals such as vampire bats and some of the larger carnivores.
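The point that larger groups mean more relationships to track can be illustrated with a simple count. The sketch below is not Dunbar's own calculation: it assumes, purely for illustration, that every pair of group members counts as one relationship, and the function name and example group sizes are hypothetical.

```python
def pairwise_relationships(group_size: int) -> int:
    """Number of distinct pairs in a group, assuming (for illustration)
    that every pair of members counts as one relationship."""
    if group_size < 0:
        raise ValueError("group size cannot be negative")
    return group_size * (group_size - 1) // 2


# Illustrative group sizes only; the count grows roughly with the square
# of the group size, so it quickly outpaces the group size itself.
for size in (5, 50, 150):
    print(size, pairwise_relationships(size))
# prints:
# 5 10
# 50 1225
# 150 11175
```

Tripling a group from 50 to 150 members multiplies the number of possible pairs about ninefold, which gives a feel for why group size is a reasonable proxy for the social load on the brain.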

But just how does neocortex size relate to group size? There are possibilities beyond simply keeping track of social relationships. One is that the neocortex/group size correlation has more to do with the quality than the quantity of intra-group relationships. The Machiavellian hypothesis suggests that the key is the use primates make of their knowledge of others. There are two interpretations: firstly, the relationship might be with the size of the coalitions primates habitually maintain rather than total group size, though larger coalitions are required in larger groups. The second is that primates need to be able to balance conflicting interests – playing one off against another, keeping as many happy as possible.

Research showed that primates form “grooming cliques”, with grooming occurring outside these groups being perfunctory and lacking enthusiasm; and distress calls from non-members likely to be ignored. Grooming seemed to be the glue that held coalitions together. When data about grooming clique size was combined with the data about group size and neocortex ratio, a good fit was found. As group size increases, so larger coalitions are required for mutual protection when living in these large groups.

Since humans are primates, it should be possible to predict group size for humans. The number turns out to be approximately 150 – Dunbar’s Number, as it is now known. Modern living – with millions living in cities – makes this prediction hard to test. However when considering hunter-gatherer societies – which is how we lived in pre-agricultural times, the largest grouping is a tribe, typically numbering 1500-2000 people who all speak the same language or dialect. Within tribes, smaller groupings known as clans can sometimes be discerned. Clan size – with very little variation – averages 150.

150 turns up elsewhere – early Neolithic villages typically had a population of around that number; religious communities such as the Hutterites and the Mormons lived in groups of 150; businesses can function informally with fewer than 150 employees, but require a formal management structure when the headcount exceeds this; army companies typically number around 150 men and so on.

The number of close friends and relatives people have tends to be around 11-12 with a fair degree of consistency. This corresponds to the “grooming clique” in primate societies, suggesting a neurological basis. Finally, the maximum number of faces people can put a name to has been found to be around 1500-2000 – suspiciously close to the size of a tribe in traditional societies.

Returning to grooming, a problem arises in that the larger primate group size is, the larger the grooming cliques need to be, and the more time needs to be devoted to grooming. Baboons and chimps have a group size of 50-55 individuals, and the amount of time they spend grooming is close to the upper limit of time that can be so spent without making inroads into the time needed to feed etc. If humans relied on grooming, then 40% of the day would have to be devoted to this activity.

The solution, Dunbar suggests, was language, which enabled several individuals to be “groomed” at once. When it comes to social networking, language has other advantages over grooming in that detailed information can be exchanged about individuals not actually present. Language is a “cheap and ultra-efficient form of grooming”. Dunbar rejects the conventional explanation of language evolving as a means to co-ordinate activities such as hunting more efficiently and suggests instead that it evolved primarily to allow humans to exchange social gossip.

Chapter 5. “The Ghost in the Machine”.
Language is for communication [contra Bickerton, 1990]: somebody trying to influence the mind of another individual. We also consider the implications of what people are saying, their body language, etc. We assume everybody behaves with conscious purpose and try to divine their intentions, often extending this to animals and even inanimate objects. Philosophers, however, doubted if consciousness existed outside of the human world.

René Descartes assumed that while humans had minds, animals – which lack language – did not and were nothing more than automatons. However, in the second half of the 19th Century Darwin and his contemporaries began to reconsider the emotional and mental lives of animals. A view eventually emerged that since it was not possible to see into the minds of animals, studies should focus on their observable behaviour. The result was the psychological school known as “Behaviourism”, which held sway right up until the 1980s.

Since then, however, attention has switched to “Theory of Mind” (ToM). This means the ability to understand what another individual is thinking, to ascribe beliefs that might differ from one’s own and to believe that that individual does experience those beliefs as mental states. Furthermore, ToM enables individuals to handle “orders of intentionality” or beliefs about what another believes; for example “I believe X” is first-order intentionality, “I believe that he believes X” is second-order and so on. Humans can handle at most five or six orders, a limit illustrated by the following sentence, due to Dan Dennett, which runs to eight:

“I suspect [1] that you wonder [2] whether I realise [3] how hard it is for you to be sure that you understand [4] whether I mean [5] to be saying that you can recognise [6] that I can believe [7] you want [8] me to explain that most of us can only keep track of five or six orders.”

In the 1980s it was discovered that children are not born with a theory of mind and that this does not develop until they are 4 to 4½ years old. Up until then they will fail the so-called “false belief test”, which asks if the child is aware that somebody can hold a false belief. For example, the child is shown an object such as a doll or some sweets being put in a particular place in the presence of another individual called (say) Fred. Fred then leaves and the object is moved. Fred returns and the child is asked where they think Fred thinks the object is. Very young children fail the test by saying they think Fred thinks the object is in the new location. They cannot grasp that Fred doesn’t know the object has been moved and that he would assume it was still in its original location. Only older children, with ToM, pass this test. Autism is a failure to develop ToM.

Tests have been carried out on animals to see how their ToM compares with humans and to see if they have self-awareness. An early test was the mirror test, to see if an animal could recognise that a reflection of itself in a mirror was not another individual. Chimps can readily pass such tests, other great apes appear to be competent but gibbons and monkeys invariably fail, as do non-primates such as elephants and porpoises. The validity of this approach has been questioned, as animals do not encounter mirrors in the wild [though of course they might see their reflections in water].

“Tactical deception” is where one individual tries to exploit another by manipulating its knowledge of a situation. To practise tactical deception, an individual must have at least second-order intentionality. It appears to be virtually absent from the prosimians, rare in New World monkeys, but common among the socially advanced Old World monkeys (baboons, macaques) and chimpanzees. The frequency with which species practised tactical deception was found to show a good fit with relative neocortex size.

Dunbar reasoned that in species with a large relative neocortex size, low-ranking males should be able to use their brains to exploit loopholes in the system and mate with females, gain access to bananas, etc. Tests showed that this was the case; for example a low-ranking male would feign disinterest in a box containing a cache of bananas in the hope that a higher-ranking male who arrived on the scene shortly after would conclude that the box was locked.

Tests aimed at showing whether apes and monkeys have ToM have involved obtaining a food reward from a baited box with the help of two human assistants, one of whom is not present when the bait is moved. The ape or monkey then had to choose which assistant it wanted to open the box. Chimps did reasonably well at the test, though less well than six-year-old children. This and other tests led researchers to conclude that chimps had limited theory of mind, but monkeys completely lacked it.

Chimps seem to be able to go to third-order intentionality on occasions, but humans can readily surpass this. Humans can envisage people and situations that do not exist in actuality – a prerequisite for producing literature. Humans are able to detach themselves from their immediate surroundings. Dunbar argues that this is a pre-requisite for both science and religion, though some of his colleagues object to the comparison! However both require one to question the world as we find it, which in turn requires third-order intentionality at minimum.

If fourth-order intentionality and above is required for science and religion, it is no mystery why only humans have these things. But if third-order intentionality will suffice, could apes not also have science and religion? It is conceivable, but the main problem is that apes lack language and so could not transmit their ideas to their peers.

Chapter 6. “Up through the Mists of Time”.
Five million years ago, one ape lineage seems to have made more use of the woodlands that lie beyond the edges of the shrinking forests. Animals travelling between the trees here are more exposed to the sun, and Peter Wheeler has calculated that an animal walking upright under these conditions receives up to a third less heat from the sun, especially around the middle of the day. It also benefits from the slightly breezier conditions at heights above three feet. These upright apes could also have shed body hair over the parts of their bodies not exposed to direct sunlight and improved cooling by sweating through the skin. A naked biped ape would expend half the amount of water on sweating compared with a furry quadruped ape.

Bipedalism began fairly early in the hominid lineage, because “Lucy”, 3.3 my old, was already bipedal as inferred from the shape of the pelvis and the articulation of the knee and hip joints. The Laetoli footprints in Tanzania, 3.5 my old, were also made by bipeds. Quite likely these early hominids were naked although they were still more apes than humans.

In their new woodland habitat, these apes had to contend with greater predation, and in response they grew larger and increased their group size. While Lucy was a diminutive 4ft tall, the Nariokotome Boy had reached a height of 5ft3 at age 11 and would have topped 6ft had he survived to adulthood. But how do we know their group size increased? The link with neocortex size offers a clue, and the link with grooming time might offer a clue as to when language evolved.

Archaeologists [in 1996] favoured a recent date of 50,000 years ago – the date of the [now abandoned] Upper Palaeolithic Revolution. Anatomists [even then] favoured a date of around 250,000 years ago, coincident with the emergence of Homo sapiens. This was based on the fact that an asymmetry between the two halves of the brain could be detected at this point. In modern humans, the left hemisphere – where the language centres are located – is larger than the right. This, they argued, was evidence for the appearance of language.

To try to resolve the issue, Dunbar and Leslie Aiello set out to find the group size threshold that precipitated language. They reasoned it must lie somewhere between the 20% maximum grooming time for any existing primate and the predicted grooming time of 40% for humans, possibly 30%. They then discovered that in primates and carnivores [but obviously not elephants and whales], neocortex ratio is directly related to total brain size [I assume that means a linear relationship]. Given that estimates for cranial capacity were available for extinct hominids, it was possible to calculate the predicted group sizes. These remained within the ranges of existing apes at first, but rose above this with the appearance of genus Homo. A group size of 150 was reached 100,000 years ago, but by 250,000 years ago group sizes would already have reached 120-130, and grooming time would have hit a prohibitive 33-35%. But even 500,000 years ago, group sizes were at 115-120 with corresponding grooming times of 30-33%. [Rather confusingly, Dunbar then cites this time as coinciding with the emergence of Homo sapiens, having dated this event to 250,000 years ago a short while previously. In 1996, humans from this far back were generally lumped together as archaic Homo sapiens; the term Homo heidelbergensis is now generally preferred.]
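
[The relationship between group size and grooming time can be illustrated with a rough sketch. The straight-line fit below is my own assumption, not Aiello and Dunbar's published equation; it uses only the midpoints of the ranges quoted in this summary, and the names fit_line and group_size_at are hypothetical. It simply asks what group size would correspond to the suggested 30% grooming threshold if the quoted figures lay on a straight line.]

```python
# Midpoints of the (group size, grooming time %) ranges quoted in the summary.
# Treating them as points on a straight line is an illustrative assumption.
points = [
    (110.0, 27.5),  # Homo erectus: 100-120 individuals, 25-30%
    (117.5, 31.5),  # 500,000 years ago: 115-120 individuals, 30-33%
    (125.0, 34.0),  # 250,000 years ago: 120-130 individuals, 33-35%
    (150.0, 40.0),  # modern humans: 150 individuals, about 40%
]


def fit_line(pts):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(pts)
    mean_x = sum(x for x, _ in pts) / n
    mean_y = sum(y for _, y in pts) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in pts)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pts)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def group_size_at(grooming_pct, slope, intercept):
    """Invert the fitted line: group size implied by a grooming-time percentage."""
    return (grooming_pct - intercept) / slope


slope, intercept = fit_line(points)
threshold = group_size_at(30.0, slope, intercept)
print(f"grooming % = {slope:.3f} * group size + {intercept:.2f}")
print(f"group size at the 30% threshold: about {threshold:.0f}")
```

[On these figures the 30% threshold falls within the group sizes quoted for Homo erectus and the humans of 500,000 years ago, which is consistent with the gradual emergence described here, but four midpoints are far too few to support anything stronger.]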

Homo erectus [then described as the predecessor to our own species] had a predicted group size of 100-120, with grooming time requirements of 25-30%. Dunbar and Aiello took the view that H. erectus lacked language. They also noted no drastic jump in grooming time at any point, which they take to mean that language emerged gradually over a long period of time.

As group sizes increased, so vocalisation began to supplement grooming; probably this process began two million years ago, with the emergence of Homo erectus. As time passed, so the meanings conveyed by the vocalizations increased, though the purpose would have remained largely social. Humans were exploiting the greater efficiency of language as a bonding mechanism to allow themselves to live in larger groups without increasing the amount of time required for social networking. Interestingly, modern hunter-gatherers spend around 3.5 hours for women and 2.7 hours for men on social interaction in a 12 hour day, or about 25%, compared to the 20% maximum observed for other primates.
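
[The 25% figure can be checked with a line or two of arithmetic. The sketch below assumes equal numbers of men and women, which the summary does not state, and the function name social_share is hypothetical.]

```python
def social_share(hours_women, hours_men, day_hours=12.0):
    """Mean share of the day spent on social interaction, as a percentage.

    Assumes equal numbers of women and men (an assumption made here,
    not a figure taken from the book).
    """
    mean_hours = (hours_women + hours_men) / 2
    return 100.0 * mean_hours / day_hours


share = social_share(3.5, 2.7)
print(f"social interaction: {share:.1f}% of a 12 hour day")
```

[This gives a little under 26%, in line with the “about 25%” quoted and above the 20% maximum observed for other primates.]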

On this view, the Neanderthals must also have possessed language. They did not become extinct for lack of language, but because they lacked the technological and cultural sophistication of the incoming Cro-Magnons. Later, the Native Americans and Australian Aborigines suffered the same fate [basically Dunbar is advancing an Upper Palaeolithic version of the Guns, Germs and Steel argument advanced by Jared Diamond (Diamond, 1997)].

What drove the increase in human group size? Baboons can get by with group sizes of 50, so why can’t we, especially as we are larger and can carry defensive weapons? Indeed, hunter-gatherers typically live in temporary camps of around 30-35 individuals. Dunbar puts forward three possibilities:

Firstly, our forebears may have occupied more open habitats than those of baboons and needed greater protection. Gelada monkeys live in very open habitats with high risk of predation. They live in groups of 100-250 [so why don’t geladas have bigger brains than humans?].

The second possibility is that human groups were threatened by rival human groups, and bigger group sizes were needed to fend them off.

The third possibility is that following the emergence of Homo erectus, humans became nomadic and left Africa. In unknown territory, they would have had to wander further to find resources and they would have encountered hostile residents determined to exclude them. This does occasionally happen with hamadryas baboons. Migrants are always at a disadvantage, but one solution is to establish reciprocal alliances with neighbouring groups. This happens with the !Kung San of the Kalahari, who live in communities of 100-200 individuals, but these are dispersed into smaller family-based groups of 25-40. This is known as a fission-fusion system, because members are constantly coming and going. The same characteristic is seen with chimpanzees – the primary community is around 55, but foraging parties often comprise only 3-5 individuals. That we share this characteristic with chimpanzees suggests it emerged very early in our history.

Dunbar opts for this third theory.

The theory that language emerged to facilitate social bonding should be testable. Investigating conversations in various informal situations, Dunbar and his students discovered that conversation groups did not typically exceed four. Conversation groups start when two or three people start talking. Others join in, but once the number of participants rises above four it is difficult to hold everybody’s attention and the group tends to fission as two distinct conversations start. At any one time, only one person will normally be speaking and the others will be listening, just as at any one time only one ape or monkey will be doing the grooming. If the speaker corresponds to the “groomer” then they are “grooming” up to three others rather than only one, meaning group size could potentially treble. The group size of 55 for chimps and baboons becomes very close to the “Dunbar Number” of 150.
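
[The trebling argument is simple enough to write down. The sketch below is my own illustration, assuming group size scales in direct proportion to the number of partners “groomed” at once; the function name potential_group_size is hypothetical.]

```python
def potential_group_size(base_group, listeners_per_speaker, partners_per_groomer=1):
    """Scale a grooming-bonded group size by the gain from speech.

    Assumes (for illustration only) that group size is directly proportional
    to the number of partners serviced at any one time.
    """
    return base_group * listeners_per_speaker / partners_per_groomer


# One speaker with up to three listeners, versus one groomer with one partner.
scaled = potential_group_size(base_group=55, listeners_per_speaker=3)
print(f"scaled group size: {scaled:.0f}")  # 55 * 3 = 165
```

[The result of 165 is slightly over the “Dunbar Number” of 150, which is as close as one could expect from so crude a scaling.]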

The limit of four people in a conversation group arises from the distance apart people must stand if they form a circle. Once the diameter of the circle increases beyond a certain point, it is not possible for everybody to hear everybody else clearly without shouting, assuming normal background noise levels. The critical distance turns out to be two feet, and it is difficult for more than four people to all remain within this distance of each other.

The researchers also learned that as per Dunbar’s predictions, 60-70% of the conversations concerned social topics. Politics, religion, work etc took up no more than 2-3% and even sport and leisure topics accounted for barely 10%.

Taken together, this data supports Dunbar’s theory that language evolved primarily to facilitate social bonding.

The expensive tissue hypothesis is based on the enormous energy costs of running a brain, which in humans takes up 20% of the total energy budget. But total energy production of mammals is a function of size and humans generate no more energy than any other mammal of that size, despite having a brain nine times larger than a typical human-sized mammal. Where does the energy to run the brain come from? Clearly it must come from savings elsewhere [unlike governments, humans have to balance the books]. In addition to the brain, the biggest energy consumers are the heart, kidneys, liver and gut – indeed these use up between them 85-90% of the body’s energy budget. The heart, kidneys and liver cannot be downsized, which leaves only the gut. With a smaller, less-efficient gut, humans have to eat high-energy, easy-to-assimilate foods. Meat is one such food, and the shift from the predominantly vegetarian diet of the australopithecines to one with higher meat content seems to have corresponded to the initial increase in brain size. Initially this would have come from scavenging, but the second phase of brain expansion, 500,000 years ago, seems to correspond to the beginnings of organized hunting. Aiello and Wheeler believe that big brains only became possible with a switch to meat eating.

But big brains had another cost – the problem of getting a large-brained infant down the narrow birth canal. The problem was solved by what is in effect premature birth, with a huge investment required by the parents in post-natal care. Women couldn’t do it all on their own; men had to do their bit. The tendency to male-female pair bonding was the result. Sexual dimorphism, considerable even in australopithecines, reduced. Differences in canine teeth, pronounced in cases of sexual dimorphism, almost vanished. Human males are only slightly larger than females. The implication is a shift from a strongly polygamous mating system to one that is only mildly so. Harem groups became smaller, with males having to make do with two females, and many having to get by with just the one! Provisioning more than one female would also have been expensive [a trend that has continued to the present day].

Chapter 7. “The First Words”. There are three competing theories for the origins of language. The first states that the earliest languages were gesture-based; the second that language arose from monkey-like vocalizations; and the third that it arose from music.

The gestural theory arises from the fact that fine motor control used for speech and aimed throwing is generally located in the left hemisphere of the brain. In addition to fine motor control, precise control of breathing is required and for this it was necessary for the monkey’s dog-like chest to change to the flattened chest characteristic of apes. When the body’s weight is on the arms, it restricts the chest’s ability to expand and contract and monkeys can only breathe once with each stride.

When apes adopted a climbing lifestyle, the monkey rib-cage was a major problem. In monkeys, the shoulder blades prevent the arms swinging in a circle, which prevents them from reaching above their heads while climbing. Eventually the scapula moved round to the back of the ribcage and arm-joints became positioned on the outer edge of the chest. The flattened rib cage of the apes was the result, which had the additional benefit of freeing the constraints on breathing.

The anatomical changes permitted aimed throwing. Chimps could out-throw Olympic athletes, but their accuracy is poor because they lack the fine motor control of humans. But our fine motor control, so the theory goes, could also be used to control speech.

The problems with this theory are that language involves conceptual thinking, which is quite different to aimed throwing; the complexity possible with gestures is limited; and gestures require people to be in visual contact. Communication would be impossible after dark, when one might have expected a lot of social gossip to take place.

But why did fine motor control evolve in the left hemisphere? Dunbar suggests it is because the right hemisphere was already fully utilised processing emotional information. He speculates that speech evolved in the left hemisphere because there was room there, and fine motor control evolved there later – either for the same reason or because the left side is associated with conscious thought, which is required for aimed throwing. This is the reverse of the sequence of events required by the gesture theory. The reason the majority of us are right-handed is that sensory and motor control nerves from one side of the body cross over to the other side of the brain; the left hemisphere controls the right-hand side of the body, and vice versa.

There is a greater sensitivity to visual cues on the left-hand side of the visual field. This lateralization seems to have appeared very early on and fossil trilobites tend to have more scars on the right side, suggesting pursuing predators attacked more frequently from the left-hand side.

Lateralization of language in the left hemisphere meant that this side became the seat of consciousness, whereas emotional behaviour was seated in the right hemisphere [as Dunbar points out, this was the basis of Julian Jaynes’ theory about “bicameral minds” (Jaynes, 1976)].

It turns out that music and poetry are located in the right side. This also suggests that the theory of a musical origin for language cannot be correct.

Dunbar turns to the vocalizations of monkeys as the origins of language [a conclusion that Bickerton (1990) rejects]. He considers the predator-specific calls of vervets and also conversational patterns of gelada monkeys. The calls of these animals appear to be timed in anticipation of others rather than in simple response, just as human conversation is carried on by one individual anticipating the end of the speaker’s phrase or sentence rather than simply waiting for them to stop speaking. The vervet’s calls are an archetypal proto-language in which arbitrary sounds can be used to refer to specific objects, and overtones can be applied to increase the information content. Formalizing sound patterns to carry more information is but a small step; language is but a further small step [though Bickerton would disagree].

If it is accepted that the beginnings of language were the vocalizations of primates, there are still alternative views on the next step. One theory is that language arose out of song-and-dance rituals designed to co-ordinate the emotional states of group members [cf. Mithen, 2005]. While Dunbar believes language arose to exchange social gossip, he considers this alternative viewpoint.

No society lacks song-and-dance, which on the face of it is odd. Much has parallels with bird-song, which is used to defend territory or advertise for a mate. Maasai warriors, Maoris, All Blacks and Scots Guards use ritual song and dance (the Haka, bagpipes, etc) before going into battle. But song is also used in churches and bars in circumstances not associated with battle. Here Dunbar considers the “crowd effect” which leads to groups of people being amenable to far more extreme and intolerant views than individuals. Psychologists identified this phenomenon in the 1960s and referred to it as a “risky shift”. It led to the Crusades, Northern Ireland, Rwanda and Yugoslavia [the Nazis were of course consciously exploiting the phenomenon back in the 1930s at Nuremberg, though Dunbar does not mention this].

Dunbar believes that the explanation is that song and dance is an expensive activity. Deep bass tones are particularly difficult to produce. They are associated with a large and powerful body. Even among humans, there is a tendency to assume powerful, successful people to be tall. In every US presidential election since the war, the taller candidate has won. [The sequence was broken in 2000, when George W Bush (6ft) defeated Al Gore (6ft1) and subsequently held off the challenge of an even taller man, John Kerry (6ft4), in 2004. However in 2008 the taller candidate won when Barack Obama (6ft2) defeated John McCain (5ft9).] People often comment on how small the Queen is [she was positively dwarfed by Michelle Obama (5ft11)]. There is no doubt that smaller people have to work harder to get to the top and have to be fairly bloody-minded. But, as Napoleon and others have shown, it certainly can be done.

It does however appear to be almost universal that deep voices are needed to create a lasting impression. The peculiar deep voice associated with Margaret Thatcher is about half an octave below her natural voice. Thatcher’s advisers encouraged her to lower her voice after she became Tory leader in 1975.

Trying to hold together a group of 150 people is difficult even now and it must have been even harder 250,000 years ago in the woodlands of Africa. Song and dance would have had a role to play. The activity would have stimulated endorphin production. Chris Knight believes that the use of ritual to coordinate human groups by synchronising emotional states is a very ancient feature of human behaviour, coinciding with the rise of human culture and language. Ritual language would have been required to co-ordinate such activities, and this may have been the final stimulus for the evolution of language [I have to say I’m dubious; such organized rituals would surely have required the participants to already be behaviourally modern]. Dunbar is unconvinced and believes this use for language only came later. He believes song-and-dance may well have preceded language, but it was at first informal, unstructured and spontaneous, like chanting at football matches. [This probably explains why attempts by clubs and fan groups to orchestrate the atmosphere at football clubs with schemes such as “singing sections” invariably fail; also fans tend to devote at least as much time to chants abusing the referee or rival teams as they do to songs encouraging their own side.]

If language evolved to facilitate group cohesion, then who spoke first – men or women? In most primate species, females form the core of society. Males typically leave their birth group at puberty, often wandering from group to group in search of enhanced mating opportunities. Chimps [though not bonobos] are an exception.

If early human societies were matriarchal [like the bonobos] then language may have evolved first among females, and Chris Knight believes this was principally to help keep the men in line and ensure they invested in them and their children. This would explain why among modern humans women are generally better at verbal communication and have better social skills than men [says who?].

But in fact human society seems to be patriarchal. Evidence for this is that among most [but not all, e.g. the Karen people of Burma – see Wells (2002)] traditional societies, brides move to the village of their new husband, a system known as patrilocality. But most of these societies live in conditions where men control all the resources needed for successful reproduction, such as land and hunting grounds. In more equitable societies, such as hunter-gatherers and modern industrial societies, female kinship and alliances are much stronger and matrilocality (where the man moves) may be the norm. [Does this explain the strongly sexually-differentiated culture in one particular workplace with which I was familiar? On one occasion the women seemed to be in unusually high spirits all day but refused to share the reason for their good humour with any of the men. It eventually emerged that one of their number had just got engaged – to a man she’d already been living with for some years. At least the question of who had to move where after the marriage did not arise!]

Among Central African hunter-gatherers, Y-chromosomal genes are more widely distributed than X-chromosomal genes, which tend to be clustered, suggesting women remain close to their kin-groups, whereas men move across wide areas. A similar picture emerges from the impoverished East End of London in the 1950s, where women were dependent on the support of female kin in order to reproduce successfully and men tended to live closer to their in-laws than to their own parents.

Female kin-bonding may have been a more important force in human evolution than is often supposed [possibly because the people doing the supposing were mostly men]. The pressure to evolve language may have come from the need to form and service female alliances rather than male hunting activities.

Chapter 8. “Babel’s legacy”.
Languages change with surprising speed, the Romance languages for example having all arisen from Latin within two millennia (and having emerged as distinct languages well before that).

The Tower of Babel actually did exist and was a seven-stage ziggurat built some time during the 6th/7th century BC in what is now Iraq during the second flowering of Babylonian culture. Dunbar suggests that the Biblical story is a folk memory of a time when everybody in the world spoke the same language, but I am highly dubious. Even if we accept as late an emergence for human language as 50,000 years ago, it is inconceivable that a folk memory could persist for such a period of time. Dunbar cites the Norse legend of Ragnarok – the end of the world – as a possible folk memory. This legend tells of a world fire followed by the Fimbulvetr, a great winter lasting three years. This he equates to the folk memory of a period of global cooling. He cites a “little ice age” that affected northern Europe around 1000 BC. A more likely explanation is the eruption of Thera in the Mediterranean in 1600 BC, which would have produced a “volcanic winter” – a period of global cooling induced by dust and aerosols ejected into the upper atmosphere. These would also have led to spectacular sunsets around the world, possibly accounting for the “world fire”. At all events, the timescales are vastly different. The Babel legend is more likely a metaphor for poor project management of the kind that continues to bedevil large projects to the present day.

Dunbar then discusses the origins of the Indo-European languages, supporting Marija Gimbutas’ Kurgan Hypothesis. As I have already discussed this topic in this article, I will skip this section. Dunbar goes on to discuss attempts to reconstruct Nostratic, other linguistic superfamilies and the so-called “proto-World” [which might be ancestral to these superfamilies, but almost certainly wasn’t the first language ever spoken, since behaviourally-modern humans were probably living in Africa for at least 150,000 years before migrating to other parts of the world (McBrearty & Brooks, 2000)].

Why do languages change? Dunbar’s next section describes language change, a topic also covered by my article about Indo-European origins, before asking the central question of just why it happens. He notes that the vocalizations of other animals also show regional variations, which he equates with linguistic dialects. He speculates on a common cause and believes there must be an evolutionary purpose for it.

Most higher organisms, including humans, tend to favour kin over non-kin. Bill Hamilton showed that there are two ways of getting one’s genes into the next generation. One is to reproduce oneself and the other is to help a relative to reproduce. This principle is known as kin selection. But how does one identify kin? One way is by accent and dialect – if somebody’s are the same as your own, then in pre-industrial times that person was likely to be related to you. If a group migrates, then over a few generations its language will gradually change to a distinctive dialect. The group will thus gain a distinct identity.

There is some evidence to suppose dialects evolve faster in regions of higher population density. In pre-agricultural times, the rate of language change might have been much slower.

Chapter 9. “The Little Rituals of Life”.
This chapter is concerned with sexual selection, first proposed by Charles Darwin, of which the peacock’s tail is the classic example.

Leda Cosmides was able to show that people could solve the so-called “Wason Selection” test with a far higher success rate if it was framed as a social problem rather than in terms of pure logic. She believes that humans have an inbuilt social problem-solving ability which recognises social contract situations and detects violations. Without such a mechanism, human social groups would collapse, with everybody acting in their own interest. Co-operation is essential to the survival of not only a group, but its individual members, so there is evolutionary pressure to develop such mechanisms for policing rules established for the common good. Dunbar believes that language is an important part of this system.

Language may ensure the bonding of a group in a number of ways. In addition to enabling one to keep track of friends, it can also be used to exchange information about cheats. Finally it may be used to influence what people think about us, i.e. for reputation-management. You can flatter people or be rude to them, depending on circumstances. But which function was the crucial one driven by evolutionary pressure, and which were the ones that were convenient spin-offs?

A study carried out for Dunbar showed that only 5% of conversation time is devoted to criticism and negative gossip, with a similar amount of time being spent on giving advice on how to handle social situations. Most of the time was devoted to who is doing what etc, suggesting that “policing” cheats might not have been language’s primary function. While a primary function might not be one that’s needed that often, Dunbar argues it is a very expensive mechanism considering there is the far cheaper option of simply beating up the cheats! Indeed the problem of cheats is a consequence of living in large groups, which would not have happened without language in the first place.

Another study did show that while single sex groups were likely to simply exchange gossip, in mixed-sex groups there was a considerable increase in discussions about work, academic matters, politics etc with men showing the greater increase. A further study showed that of the time devoted to social gossip, men spent far more time talking about themselves than did women.

Dunbar interpreted the results as a “vocal lek”. A lek is a display area where males gather to advertise their qualities to potential mates. It is common among antelope and some birds, such as the peacock, which do not pair for life. Each peacock will defend a small territory in an area frequented by females, displaying whenever one approaches. The females wander from one male to another before making their choice. In the “vocal lek” the men were basically trying to impress the women.

Body language and eye contact play a key part in initiating new relationships, especially for women. This [unsurprisingly] is an ancient primate habit. In species where one male controls a harem, such as the hamadryas baboon, even if a rival male is more powerful than the harem male, he will only move in if a female’s body language indicates she is not particularly interested in her current male. Normally the females follow the male closely, but sometimes they will delay and the male stops and looks back. Rival males can detect such subtle cues.

Males also need to advertise their fitness to reproduce, and in hunter-gatherer societies, one way they may do this is by hunting large mammals. From a purely economic point of view, hunting a large antelope makes much less sense than putting out a dozen or so traps, but a successful kill with all the attendant dangers is far more impressive. Dunbar draws a comparison with chivalric tales of medieval Europe when aspiring young men had to prove their worth by performing difficult tasks such as killing dragons, rescuing sleeping beauties, etc. Risk-taking tasks have the advantage that they are difficult to cheat and thus provide a good demonstration of one’s fitness to father somebody’s children.

Geoff Miller has suggested that the evolution of the human brain was driven mainly by the demands for sexual advertising – both to catch a prospective mate and hang on to them in the face of subsequent competition. A modern male can do this by making his partner laugh – an activity which triggers endorphin release. A study has shown that women are more likely to smile or laugh than men, and more likely to do so in response to men. But unfortunately for any aspiring female comedians, men are more likely to laugh in response to other men. While this may reflect a male-dominated society, Dunbar thinks it is far more likely to be a way women assess the relative merits of their current partner against other males that happen to be on the scene. A man’s ability to make a woman laugh may be as good a test of fitness as any other.

Men and women differ considerably in the way they learn accents: men will tend to pick up the regional working-class accent; women by contrast tend to pick up a more neutral middle-class accent, the so-called Received Pronunciation or RP. One explanation is that women can improve their reproductive chances by hypergamy, or marrying up the social scale, whereas less well-off men need to be seen to belong in their environment in order to tap into the social network. Being poor with the wrong accent is a disastrous state of affairs.

But this is changing and a survey of dating agencies and personal ads shows that women now want “new men” rather than rich but otherwise unreconstructed men; this is a reflection of the greater economic independence of women, and of the fact that the general levels of wealth and comfort are far higher now than they were in earlier times. With less pressure on hypergamy, Dunbar feels that women will eventually abandon RP and adopt the regional accents of men.

Sexual selection can lead to intense selection for males possessing certain traits, causing them to proliferate. This, according to Geoff Miller, was what led to the second phase of brain expansion in humans: a need to keep one’s mate entertained.

Dunbar extends this theory to invoke the role of language in making people laugh, in turn triggering endorphin release.

While the ideas proposed in this chapter might not have been the main driving force behind language and large brains, they might have been valuable components that were added into the system as it developed and indeed may have driven them to levels beyond what might have occurred on the social bonding model on its own.

Chapter 10. “The Scars of Evolution”.
This final chapter summarises the work and ends with a few cautionary tales. In particular teleconferencing seems less effective if there are more than four participants. While this limitation came about from the number of people that could stand in a circle and hear each other, it seems to have become hard wired into our brains. Finally there was the case of a medium-sized company which happened to have 150 employees. The company began to struggle when it moved into new premises without a tea room, where much of the synergy that had made the company so successful in the past had been generated.


Aiello L C & Dunbar R I M (1993): “Neocortex Size, Group Size and the Evolution of Language”, Current Anthropology, Vol 34, No 2 (April 1993) pp. 184-193.

Baxter S (2002): “Evolution”, Gollancz.

Bickerton D (1990): “Language and Species”, University of Chicago Press, USA.

Diamond J (1997): “Guns, Germs and Steel”, Chatto and Windus.

Dunbar R I M (1996): “Grooming, Gossip and the Evolution of Language”, Faber and Faber, London Boston.

Jaynes J (1976): “The Origin of Consciousness in the Breakdown of the Bicameral Mind”, Mariner Books, USA.

McBrearty S (2007): “Down with the Revolution”, in Rethinking the human revolution, McDonald Institute Monographs, University of Cambridge.

McBrearty S & Brooks A (2000): “The revolution that wasn’t: a new interpretation of the origin of modern human behaviour”, Journal of Human Evolution (2000) 39, 453–563.

Mithen S (2005): “The Singing Neanderthals”, Weidenfeld & Nicolson.

Wells S (2002): “The Journey of Man: A Genetic Odyssey”, Penguin.

© Christopher Seddon 2009

The Prehistory of the Mind: A Search for the Origins of Art, Science and Religion (1996), by Steven Mithen

Steven Mithen is Professor of Archaeology at the University of Reading and is a pioneer in the field of cognitive archaeology, which is the branch of archaeology that investigates the development of human cognition.

Mithen’s 1996 work The Prehistory of the Mind: A Search for the Origins of Art, Science and Religion is an ambitious attempt to bring to bear an interdisciplinary approach to the evolutionary origins of the human mind. The book is aimed at the non-specialist and was generally well-received upon its publication.

The following is an extended summary of Mithen’s book:

Chapter 1: “Why ask an archaeologist about the human mind?” Mithen touts his book as an archaeologist’s approach to a problem normally tackled by psychologists and neurologists.

There have been two major spurts of brain enlargement in humans and proto-humans – one between 2.0 and 1.5 million years ago (Homo habilis) and a lesser one between 500,000 and 200,000 years ago. The first is linked to the development of tool-making, but no great advance accompanied the second. Brains had already reached present-day size when two dramatic transformations occurred. One was a “cultural explosion” at 60,000 – 30,000 years ago (art, complex technology and religion) and the other was at 10,000 years ago (agriculture). Brains are expensive to run in terms of the body’s energy-budget, so what were they used for before the cultural explosion? What was going on between the major enlargement spurts? What caused the cultural explosion? How did language and consciousness arise? When did modern intelligence arise – indeed what is modern consciousness?

Chapter 2: “The Drama of our Past”. Mithen presents prehistory from 6 million years ago to the present day as a play in four acts.

Act 1 6-4.5 million years ago features the common ancestral ape (“the missing link”). Nothing much happens during this act.

Act 2 4.5-1.8 million years ago. Starts in Africa – initially Chad, Kenya, Ethiopia and Tanzania for Scene 1; enlarges to include South Africa for Scene 2. Act opens with Australopithecus ramidus, the first of the australopithecines who is joined by A. anamensis 300,000 years later. Both live in wooded environments and are principally vegetarian. At 3.5 million years ago, they are replaced by Lucy (A. afarensis) who can both walk upright and climb trees. She is on-stage for 0.5 million years, then leaves and nothing happens until Scene 2 opens at 2.5 million years ago. Right at the end of Scene 1 we see primitive stone tools (Omo industrial complex), but we cannot see who made them. A rush of actors appear with Scene 2 – gracile australopithecines (A. africanus) in the South and robusts in both East and South. The first humans (Homo habilis, etc) are seen at 2.0 million years. They carry tools, stone artefacts known as the Oldowan industry. Habilis butchers animals but we cannot see if they have been hunted or merely scavenged. The remaining australopithecines become more and more robust.

Act 3 1.8 million – 100,000 years ago. The act begins with a grand announcement: “The Pleistocene begins”. Ice sheets form in high latitudes. Scene 1 – the Lower Palaeolithic. The Homo habilis actors exit and are replaced by Homo erectus who is taller and bigger-brained. The robust australopithecines skulk around in the background until 1 million years ago. H. erectus appears simultaneously in East Africa, China and Java. Erectus persists in East Asia until 300,000 years ago but elsewhere we see actors with more rounded skulls referred to as archaic Homo sapiens. By 500,000 years ago the stage expands to include Europe and a new actor, the large Homo heidelbergensis. New tools join the Oldowan tools in this act, pear-shaped hand-axes that are first seen in East Africa at 1.4 million years ago but soon spread to all parts of the stage except south-east Asia where no tools are seen (possibly they used bamboo). Scene 2 – the Middle Palaeolithic. This scene opens 200,000 years ago though the distinction is blurred and is gradually being phased out. However new tools are replacing the hand-axes and include those made by the Levallois method, which show regional variation. At 150,000 years ago Homo neanderthalensis appears in Europe and the Near East. Like other actors he has to deal with frequent and dramatic changes to the scenery as Ice Ages come and go and vegetation changes from tundra to forest. For all this, tool kits change very little for a million years. Brain size is modern but there is still no art, religion or science.

Act 4 100,000 years ago to present day. Scene 1 covers 100,000 to 60,000 years ago. Homo sapiens sapiens [at the time of writing it was still believed that modern humans and Neanderthals were subspecies rather than separate species] joins a cast that includes archaics and Neanderthals. In the Near East, Homo sapiens sapiens bury their dead (as indeed do the Neanderthals) but also place parts of animal carcasses on the bodies as grave goods. In South Africa ochre is used and bone is used to make harpoons (first use of anything other than stone or wood). In Scene 2 (beginning 60,000 years ago) Homo sapiens sapiens builds boats and reaches Australia. Blade technology appears, in which flakes are removed from prepared prismatic cores. 40,000 years ago we enter the Upper Palaeolithic in Europe and the Late Stone Age in Africa. Diverse props are made from new materials including bone and ivory: beads, necklaces, animal and human carvings, cave paintings and bone needles used to sew clothes. For about 10,000 years the Neanderthals may be trying to mimic Homo sapiens sapiens but then they fade out, leaving Homo sapiens sapiens alone on the world stage. Cave art flourishes in Europe as the Ice Age is at its height 30,000 – 12,000 years ago. As the ice retreats the scenery fluctuates between cold/dry and warm/wet and the stage expands to take in the Americas. Scene 3 begins with the Holocene. Agriculture appears in the Near East, towns and cities appear, empires rise and fall; carts become cars and tablets become word processors as the final curtain falls.

Chapter 3. “The Architecture of the Modern Mind”. By exposing the architecture of the modern mind and taking it apart we can learn much about how it evolved.

Can the mind of a young child be regarded as a sponge, soaking up knowledge, or is it better thought of as a computer? A computer can take in data, run a program to process it and output the result. Could a young child’s mind be thought of as running a general-purpose learning program to process the knowledge they are soaking up? In fact this analogy is not a good one. Children do more than process data – they think, create and imagine.

In 1979 the US archaeologist Thomas Wynn published an article claiming the modern mind existed 300,000 years ago, before anatomically modern man. He suggested that phases of mental development of a child reflect phases of the cognitive evolution of mankind (the idea that “ontogeny recapitulates phylogeny”). Wynn drew on the work of child psychologist Jean Piaget, who believed that the mind is like a computer and runs a small set of general-purpose programs to control entry of new information and restructure the mind so that it passes through a series of developmental phases. According to Piaget, the last phase is reached at 12 when the child acquires “formal operational intelligence” and can think about hypothetical objects and events. As such intelligence is required to make a hand-axe, with the maker needing to be able to visualise the finished product, Wynn concluded the makers of such hand-axes, who lived 300,000 years ago, must have possessed modern minds.

However Mithen is dubious and believes that the events in Act 4 must have required further cognitive developments. Wynn’s reasoning is sound, therefore Piaget must be wrong. In fact Piaget’s ideas of general-purpose learning programs are now widely disputed by psychologists who have instead begun to liken the mind to a Swiss Army knife, with specialised devices to perform different tasks.

Psycho-linguist Jerry Fodor’s book “The Modularity of Mind” was published in 1983. According to Fodor, the mind has two parts – input systems and cognition or central systems. The input systems are a series of discrete modules with dedicated architectures that govern sight, hearing, touch, etc. Language is also regarded as an input system. However the cognitive or central system has no architecture at all – this is where “thought”, “imagination” and “problem solving” happen and “intelligence” resides.

Each input system is based on independent brain processes and they are quite different from each other, reflecting their different purposes. These systems are localized in specific areas of the brain. The input systems are mandatory: if, for example, somebody sits behind you on the bus and spends the entire journey gassing away on their mobile, you cannot switch off the hearing module. However this has the advantage of saving time that would otherwise be spent on decision-making.

Fodor believes that the input systems are “encapsulated”, i.e. they do not have direct access to the information being acquired by other input systems. What one is experiencing at a given time in one sensory modality does not affect any of the others.

A second feature of the input modules is that they only have limited information from the central systems. Fodor cites a number of optical illusions such as the Müller-Lyer arrows, which continue to apparently differ in length even when one is fully aware that this is not the case. The input modules are essentially “dumb” systems that act independently of the cognitive system and each other. To sum up, they are encapsulated, mandatory, fast-operating and hard-wired. Perception is innate, i.e. hard-wired into the mind at birth.

The central cognitive systems are very different to the “dumb” input systems. According to Fodor, they are “smart”, they operate slowly, are unencapsulated and domain-neutral, i.e. they cannot be related to specific areas of the brain.

The Fodorian view is that evolution has given the modern human mind the best of both worlds: input modules that can enable swift, unthinking reactions in situations of danger (predators, etc) or opportunity (prey, etc) on one hand; and a slower central cognitive system, to be used when there is time for quiet contemplation, integrating information of many types and from many sources.

Also published in 1983 was Howard Gardner’s “Frames of Mind: The Theory of Multiple Intelligences”. Gardner was as much concerned with practical issues such as devising educational policies for schools as with philosophy of the mind and he put forward a very different architecture to Fodor. The entire mind is a Swiss army knife, with seven “blades”: linguistic, musical, logical-mathematical, spatial, bodily-kinaesthetic and two forms of personal intelligence, one for looking into one’s own mind and one for looking outwards into the minds of others. Gardner’s modules are smart, interact with each other, and can be used for problem solving. The smartest people are those who can use the modular domains synergistically e.g. by use of metaphor and analogy.

Mithen speculates the two approaches are closer than might at first appear to be the case. Fodor’s non-modular central system might appear that way because its modules function so smoothly the modularity within simply cannot be discerned.

In 1992 the evolutionary psychologists Leda Cosmides and John Tooby (abbreviated to “C&T” by the author) entered the fray with an essay published in “The Adapted Mind”, co-edited with Jerome Barkow. On their view, the human mind evolved under selective pressures during the Pleistocene. We remain adapted to this, as it ended so recently. The mind is like a Swiss army knife with many blades, each designed by natural selection to deal with problems faced by hunter-gatherers. The modules are “Fodor type”, i.e. hard-wired, but are “content rich” and possess not just algorithms for solving problems but a built-in knowledge-base. Some are activated at birth – e.g. those for making eye-contact with mother; others need a little time, such as language-recognition.

Hunter-gatherers would have needed specific modules for specific tasks, as a more general-purpose reasoning system would have been prone to making errors – e.g. committing incest or failing to share food with kin – if indeed it could make a decision about anything at all, and thus the Swiss army knife model was selected for.

C&T believe children could not learn complex subjects rapidly without content-rich mental modules pre-programmed to do so (cf. Noam Chomsky’s theory about built-in grammar systems).

The dedicated systems are also used to make rapid decisions, e.g. run if faced with a lion rather than weigh up the pros and cons of the situation and get eaten.

Looking at the lifestyle, C&T predict the modules that would be needed: face-recognition, spatial relations, rigid object mechanics, fear, social exchange, emotion, kin-orientated motivation, effort allocation and recalibration, childcare, social awareness, friendship, grammar, communications, theory-of-mind etc. These modules are grouped together in domains called “faculties”.

But does this model explain the existence of a genius like Einstein? Could a mind purpose-built for life as a hunter-gatherer expand the boundaries of human knowledge? Mithen considers present-day hunter-gatherers and notes that all think of the natural world in social terms – anthropomorphic (animals with human-like characteristics) and totemic (kinship with animals and plants) thinking. This contradicts C&T, who assume different “blades” would be used for the natural and social worlds, so that the natural world would not be thought of as if it were a social being. Children will anthropomorphise a cat and interact with a doll as if it is a living person. How can this be squared with content-rich modules? The human passion for analogy and metaphor is a problem for C&T. How can this be resolved?

Mithen then considers child development and four domains of instinctive intelligence – language, psychology, physics and biology.

1) Language has already been considered.

2) Psychology – children possess a “Theory of mind”, i.e. they can predict what others are thinking (autism is the absence of this ability). Alan Leslie and others developed this idea, originally put forward by Nicholas Humphrey in “The social function of intellect” – such ability will be selected for when individuals live in a group. Such behaviour could not be learned by young children from experience alone and thus must be innate (i.e. the content must be there already). Humphrey believes that the biological purpose of reflexive consciousness is to model the mind of another individual – in other words reflect on how we ourselves would feel in a given situation.

3) Biology. Children realise that animals have an immutable “essence” – a three-legged dog that can’t bark is still a dog; putting striped pyjamas on a horse doesn’t transform it into a zebra, etc. Scott Atran (1990) notes among all known cultures certain concepts are universal: vertebrates, flowering plants, sequential naming conventions (e.g. spotted shingle oak); taxa that are morphologically similar; higher taxa such as birds and fish; trees and grass. Such “natural history intelligence” would be vital to hunter-gatherers.

4) Physics. Young children instinctively understand solidity, gravity and inertia. Children understand the difference between living and inanimate things. There are obvious advantages to having this knowledge from Day 1.

So how do we resolve the paradox of children applying inappropriate rules of psychology, biology and language when playing with inanimate objects?

Mithen goes back to the notion of “ontogeny recapitulates phylogeny”. The evidence for content-rich modules comes from studies of children aged 2-3. Developmental psychologist Patricia Greenfield suggests before that age the Swiss army knife modules aren’t there – instead there is only a simple general-purpose learning program. The language explosion begins at 2, suggesting the content-rich modules only cut in then. Prior to that the child’s mind is like that of a chimpanzee. The child’s mind has metamorphosed from a computer program to a Swiss Army knife.

However, according to Annette Karmiloff-Smith, the final stage of mental development has yet to come. Her 1992 work “Beyond modularity” attempts a synthesis of Piaget’s and Fodor’s work. Her view is that the dedicated content-rich modules kick-start the development of cognitive domains. Rather than having mental modules grouped in faculties, Karmiloff-Smith has domains comprised of micro-domains. Cultural development shapes these domains and while hunter-gatherers didn’t need a maths domain, one could develop in a modern child under appropriate cultural conditions.

After modularisation, the modules begin working together. Karmiloff-Smith describes this as representational redescription (RR). RR results in multiple representations of similar knowledge and consequently knowledge becomes applicable beyond the special purpose goals for which it is normally used and perceptual links across domains can be forged. Thoughts can arise that had previously been trapped in one domain.

Developmental psychologists Susan Carey and Elizabeth Spelke have proposed a similar idea. “Mapping across domains”, or duplicating the same data in different domains, is a fundamental feature of cognitive development and one that might account for cultural diversity. This can be compared to Gardner’s view of smartest people using the different domains synergistically, as exemplified by use of analogy and metaphor.

Margaret Boden’s 1990 work The Creative Mind explores how we can account for creative thought and concludes this arises from what she describes as the transformation of conceptual spaces. Boden’s conceptual spaces are similar to cognitive domains. Transformation of one of these involves the introduction of new knowledge, or new ways of processing the knowledge that is already contained within the domains.

The evidence for thought requiring knowledge from multiple cognitive domains is overwhelming and it is clearly a critical feature of mental architecture. To account for it, Paul Rozin argued that the processes of evolution should result in a host of modules within the mind. But rather than add more modules as C&T suggested, Rozin believed that accessibility between the modules is a critical feature in both child development and evolution.

Basically partial de-modularisation appears to be essential for creative thought and a fully-developed modern mind.

But the French anthropologist and cognitive scientist Dan Sperber believes we can have it both ways – full modularity and creativity. He believes we have a module of meta-representation or MMR. This holds metadata, representations of representations, which can be updated as new data becomes available. For example new data about cats is matched against a meta-cat and used to update it. The MMR thus acts as a clearing-house for new ideas. Ideas that cannot find a home stay in the clearing-house.

“Mischief can occur in the clearing-house” – ideas about dogs can get mixed up with ideas about inanimate objects, thus a stuffed toy can be made to represent a real dog. But this crossover shouldn’t happen according to C&T – it could lead to errors like eating a plastic banana. In fact such errors don’t happen; however fevered our imagination, we can (on the whole) distinguish it from reality. Can C&T’s ideas (full modularity) be reconciled with Karmiloff-Smith, Carey, Spelke (partial modularity) and Sperber (clearing-house)? Mithen believes they can in an evolutionary context.

Chapter 4. “A new proposal for the mind’s evolution”. The mind is likened to a cathedral built in several architectural phases. Mithen adopts the premise that ontogeny recapitulates phylogeny, as expressed by E. Conklin in 1928, documented by Stephen Jay Gould in 1977 and drawn on by Thomas Wynn. Mithen also considers and rejects neoteny, however useful it is for explaining the morphological development of modern humans.

Phase 1. Minds dominated by a general intelligence. Such minds have a single “nave” in which all the services take place; these are the thought processes. Fodor’s input modules are present for delivering information to the nave, but it lacks the complex cognitive systems Fodor sees in the modern mind. This is a nave of simple general intelligence, similar to that of young (under 2 years old) children. The behaviour is simple, the rate of learning slow, errors frequent and complex behaviour patterns cannot be acquired.

Phase 2. To the central nave are added chapels of specialised intelligence or cognitive domains. Just as the addition of side chapels to Romanesque cathedrals in the 12th Century reflects the increasing complexity of church ritual, so these chapels reflect increasing complexity of mental activity. Some of the modules found in the chapels were present in the original nave, but they have now been grouped together in the appropriate chapel.

There are four chapels of specialised intelligences – social (group behaviour, intentionality (“mind-reading”)), technical (tool-making, etc), natural history (weather, geography, animal behaviour, etc) and language. The first three are totally separate from the nave and each other, divided by thick walls. Thought, when it occurs, is confined to that domain. If a thought is required that requires more than one domain – e.g. a tool for hunting a particular animal – it happens in the general intelligence. Accordingly thought and behaviour at such “domain interfaces” is far simpler than within a single domain.

The relationship of the language chapel (which Fodor saw as an input module) to everything else is unknown at this stage.

Such minds are similar to those of children aged 2-3 as described by Karmiloff-Smith.

Phase 3. Direct access is possible between the chapels and indeed a new “superchapel” corresponding to Sperber’s MMR may also be present. Experience gained in one domain can now influence that in another. This is the synergistic interaction of Gardner’s theory – metaphor and analogy become possible. The nave acquires a greater complexity in its services: this central service is Fodor’s central cognitive system of the mind. It’s like a Gothic cathedral rather than a Romanesque one, where there are no thick walls between the chapels and sound, space and light interact. The mind now possesses “cognitive fluidity.”

How did all this come about? Mithen turns to the chimpanzee.

Chapter 5. “Apes, monkeys and the mind of the missing link”. The mind of the common ancestor of humans and modern chimpanzees (the so-called “missing link”) is assumed to be equivalent to that of the latter. Mithen believes the case for chimp intelligence has been overstated. He concedes that they can make and use rudimentary tools (termite sticks, loo paper, etc) but offspring are slow to pick ideas up. This indicates the absence of a “technical intelligence”. There is no chimp “culture” as such. While different groups of chimps use different tools (e.g. some use termite sticks, some don’t) this is purely because nobody in a particular group ever discovered the technique – it is the absence of a problem-solving approach rather than the presence of culture.

Chimps have only basic natural history intelligence. They are good at making foraging decisions based on a continually-updated mental map of known resources. They will sometimes move hammer-stones and nuts considerable distances to anvil stones. But they lack the ability to make creative and flexible use of their knowledge – for example one group, while playing with a juvenile duiker (a type of small antelope) killed it but failed to eat it because they normally hunt the more abundant colobus monkeys.

On the other hand the social intelligence of chimps and their Machiavellian scheming is well-documented. They clearly have a theory of mind and practise deception (though it only appears to work for other chimps and not humans). Examples of deception include a subordinate male placing his hand over his erect penis so it remains visible to the target female, but concealed from a nearby dominant male.

Chimpanzees appear to have a conscious awareness of their own minds but this only extends to social interactions, not to tool-making or foraging.

Chimp linguistic skills are rudimentary. They can create sentences and use grammar but only in the most limited way. Indeed bird song is more analogous to human speech and convergent evolution has probably given birds a dedicated speech module. Song plays a major role in the social life of birds, much more so than vocalisation in non-human primates.

Chimps have a moderately good general intelligence, a specialised social intelligence domain and a domain for mapping resource distribution – a very basic natural history module. Tool-making and foraging use the same mental processes (general intelligence) and seem well integrated, albeit limited. However the integration between social and tool-making skills is poor – for example adults rarely tutor offspring about making and using tools, despite the obvious benefits of doing so.

The chimp’s mind is mid-way between Phase 1 and Phase 2. There is a general intelligence and the first “chapel”, one for social intelligence.

Monkeys also have a complex social life, but it is simpler than that of chimps. They get confused by their own reflections. They have no concept of self. They have no theory of mind.

The common ancestor of monkeys, apes and lemurs – the Notharctus – lived 55 million years ago and was probably even more primitive, possessing general intelligence only. This could handle simple learning rules for reducing food acquisition costs and facilitating kin recognition. There was as yet no social module and the interaction of Notharctus with the social world was probably no more complex than its interaction with the non-social world (similar to present-day lemurs). Their minds were Phase 1.

Chapter 6. “The mind of the first stone toolmaker”. Mithen now moves on to consider the australopithecines and Homo habilis [which he uses as a convenient catch-all for the various human species then existing]. He looks at the tools – the Omo tools (australopithecines) are little more than smashed stone nodules, possibly even within the range of modern chimps. The Oldowan-type flakes used by H. habilis are more advanced and include sharp flakes for butchery and nodules that can be used for breaking open bones for marrow. They are beyond anything a chimp could produce as they require some knowledge of fracture dynamics, but they are still very simple and show no attempt at the imposition of a preconceived form, the finished product reflecting the character of the original nodule, the number of flakes, and the order in which they were detached. The materials worked were mainly quartzite and basalt – more demanding than stripping leaves off twigs to make termite sticks, but less so than working material such as cherts (flints). So it would appear that H. habilis had only rudimentary technical intelligence.

Mithen now considers natural history intelligence. It is accepted that H. habilis consumed more meat than does a chimp. There are many sites aged 2.0 – 1.5 million years with a mixture of animal bones and stone artefacts. From where these animal bones can be identified (which isn’t often) it appears H. habilis’ diet included zebra, antelope and wildebeest. The bones show butchery cut marks. Additionally, the relatively large brain implies a high-quality diet in terms of calories, given brains are very expensive to run in terms of energy requirements.

In the 1980s there was much, often acrimonious, debate about how the bone fragments should be interpreted. Glynn Isaac (late 1970s) proposed the sites represented home bases where food was brought from various locations and shared, and infants were cared for. This implies prolonged infant dependency and linguistic communication. But Lewis Binford published the seminal Bones: Ancient Men and Modern Myths in 1981. In this he argued there was no evidence for the transporting and consumption of large quantities of meat. Instead early humans were marginal scavengers, taking leftovers at the bottom of the hierarchy of meat eaters on the African savannah. Other models were proposed but no firm conclusions reached, partly because of the poor archaeological record but also because H. habilis probably had a diverse lifestyle with flexibility between hunting and scavenging as circumstances dictated.

What would be the cognitive implications of flexible meat eating? In addition to the chimp’s ability to build mental resource maps and move tools around, the ability to form hypotheses about carcass/animal location would be required. Also habilis took the food to the tools (rather than just the other way round). For all this the range of environments exploited was very narrow compared to later humans and was probably tied to edges of permanent water sources. The behavioural flexibility implying full-blown natural history intelligence was absent.

Mithen goes on to consider social intelligence in Homo habilis. Among living primates there is a relationship between brain size and group size. This was pointed out by Robin Dunbar who believed it is also a measure of social intelligence. Dick Byrne found deception occurs more frequently with larger brains – the larger and more complex the social scene, the more devious one must be to win friends and influence people. But does this rule apply to extinct primates like australopithecines and H. habilis? The latter was more advanced than living primates with tool-making and rudimentary natural history modules. However these were such recent developments that Mithen believes the rule probably holds good. Dunbar plugged estimated brain sizes for australopithecines and H. habilis into a formula derived from living primates to obtain group sizes of 67 for australopithecines and 82 for H. habilis, compared with 60 for chimps. These group sizes are what Dunbar refers to as “cognitive groups” whom one “knows socially” as opposed to the total group size.

Circumstantial evidence supports the larger group sizes. Two factors make primates live in larger groups. Firstly there is a better chance of beating off predators, or at least somebody else getting eaten. Skulls pierced by leopards prove that H. habilis did get eaten by predators. The other advantage is when food comes in large, unevenly distributed parcels. These are difficult to locate and too much for small groups to eat. A bigger group increases the probability of locating such a parcel, and there is enough to go round, so it can be shared with the rest of the group. So it would appear that there were selective pressures acting in favour of an enhanced social intelligence for H. habilis.

H. habilis could probably handle higher levels of intentionality than their predecessors. Modern humans can track five or six levels of intentionality. Chimps can manage two; the increased social intelligence of H. habilis could probably manage three or four.

Did H. habilis have language capability? Fully-developed language requires fully-developed mental modules for language, as we know from the development of children’s linguistic abilities. In modern humans Broca’s area is associated with grammar and Wernicke’s area with comprehension. The 2 million year old H. habilis specimen from Koobi Fora (KNM-ER 1470) is well-preserved and has been examined by Phillip Tobias who believes that Broca’s area is present and Dean Falk confirms this. By contrast there is no evidence for Broca’s area in australopithecines. Terrence Deacon has argued that the pre-frontal cortex in early humans has been disproportionately enlarged and that this would lead to a re-organisation of connections within the brain favouring the development of linguistic capacity – but it’s not clear how far this process had developed 2 million years ago.

Robin Dunbar found that as primate group size increases so does the percentage of time spent grooming. If grooming time goes above 30% there isn’t enough time to look for food. Vocalising means more than one group member can be “groomed” simultaneously and thus grooming time can be reduced. From the inferred group size for H. habilis, grooming time is 23% and there would likely have been selective pressure to reduce grooming time by use of vocalisation. This may have been no more sophisticated than the chattering of baboons or purring of cats.
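The grooming-time argument above can be sketched numerically. The summary does not give the formula itself, so the coefficients below are an assumption: they follow the linear relation commonly attributed to Dunbar's 1993 work (grooming time as a percentage of the day against group size), and the function names are hypothetical. The group sizes and the 30% ceiling are the ones quoted in the text.

```python
# A minimal sketch, not the author's own calculation.
# ASSUMPTION: the slope and intercept below are taken from the linear
# regression commonly attributed to Dunbar (1993); they do not appear in
# the summary above and should be checked against the original paper.
GROOMING_INTERCEPT = -0.772
GROOMING_SLOPE = 0.287

# The ceiling quoted in the text: above this there is not enough time to forage.
GROOMING_CEILING_PERCENT = 30.0


def grooming_time_percent(group_size: float) -> float:
    """Predicted share of the day (in percent) spent grooming."""
    return GROOMING_INTERCEPT + GROOMING_SLOPE * group_size


def exceeds_ceiling(group_size: float) -> bool:
    """True if predicted grooming time is above the 30% ceiling."""
    return grooming_time_percent(group_size) > GROOMING_CEILING_PERCENT


if __name__ == "__main__":
    # Group sizes as quoted in the text for each taxon.
    group_sizes = {"chimps": 60, "australopithecines": 67, "H. habilis": 82}
    for taxon, size in group_sizes.items():
        percent = grooming_time_percent(size)
        print(f"{taxon}: group size {size}, grooming time {percent:.1f}%")
```

Under these assumed coefficients the H. habilis group size of 82 gives a figure that rounds to the 23% quoted above, which stays under the ceiling but is high enough for vocal "grooming" to be worth having.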

To sum up, though H. habilis is clearly an advance on the common ancestor, the basic design is the same and the “chapels” are as yet incomplete.

Chapter 7. “The multiple intelligences of the Early Human mind”. Act 3, from 1.8 million to 100,000 years ago, features a better archaeological record. Detailed and accurate reconstructions of past behaviour can often be made, but the behaviour itself seems almost bizarre in nature. “Early Human” lumps together Homo erectus, H. heidelbergensis and H. neanderthalensis.

Once again, Mithen considers technical, natural history, social and linguistic intelligence. Technical intelligence increased with the development of the hand-axe. Hand-axes often show 3D symmetry, indicating that the “knapper” was intent on imposing form on the artefact rather than just creating a sharp edge as in the Oldowan tradition. This is very difficult and requires forward planning. The Levallois method (typically used by the Neanderthals) requires even more technical skill. In neither case can the procedure be reduced to a series of fixed rules that can be followed by rote; both visual and tactile cues must be used to monitor the artefact’s constantly changing shape and adjust plans for how it should develop. Tougher-to-work materials such as quartzite and chert are now used. Different types of artefacts are made from different materials.

Natural history intelligence improved – H. erectus was able to cope with conditions outside the African savannahs where there is more seasonality. From Wales to South Africa we find Early Humans – but very dry and very cold environments were too much for them and they didn’t reach Australia or the Americas. Early humans gathered, scavenged and hunted in a flexible manner.

In the glaciated landscapes of Europe, the Neanderthals flourished for over 200,000 years – an impressive achievement. The frequent environmental changes as glaciers advanced and retreated meant that the available foods changed, and this made life even harder: 70-80% of individuals died before the age of 40. Given that their technology was primitive compared with that of modern humans, their natural history intelligence must have been at least as good as that of modern humans for them to survive.

Mithen then poses four questions:

1) Why were bone, antler and ivory not used for tool-making? Because Early Humans could not think of animal parts (catered for under natural history) in the tool-making technical domain.

2) Why were tools for specific tasks not made, e.g. spear points show little variation in the Old World although a considerable variety of animals were hunted? Mithen claims this was because Early Humans couldn’t cross-link technical intelligence with animal behaviour (natural history intelligence).

3) Why were no multi-component tools ever made, e.g. H. erectus never made hafted tools, though the Neanderthals and others did occasionally? Such tools were usually made for specific types of prey, so the explanation is the same as for 2) above.

4) Why is there so little variation in both time and space? Over a million years, and across Africa, Western Europe, the Near East and India, toolkits show little variation. Again, this is because there was no integration between tool-making and prey availability.

The “social intelligence” of Early Humans must have been at least as good as that of chimps. Leslie Aiello and Robin Dunbar predict “social knowledge” group sizes of 111 (H. erectus), 131 (archaic H. sapiens) and 144 (Neanderthals), cf. 150 for modern humans. Mithen believes that the true figures were slightly lower, as some brain power must have been needed for other domains. As with H. habilis, “large packet” foods of limited availability would have favoured group living. However, tensions could have risen in larger groups, and during milder interglacial times much smaller groups were favoured. Such flexibility in social relationships is at the heart of social intelligence. The documented care of the disabled and elderly is further evidence.

Four more questions.
1) Why do the settlements of Early Humans imply universally small social groups, contrary to Dunbar’s theories? The inference of a lack of social cohesion rests on the false assumption that the mind of Early Humans was like that of modern humans. But if technical and social intelligences were not integrated, tool-making and living activities might not have taken place at the same sites. The interpretation of the archaeological record could be in error because the “social networking” sites are now archaeologically invisible.

2) Why do distributions of artefacts on sites suggest limited social interaction? (Knapping and butchery debris are strewn around all over the place suggesting no dedicated food-processing and tool-making areas.) There is no integration between social and technical, or social and natural history. Eating is not a social activity and food-distribution is handled by general intelligence.

3) Why is there an absence of items of personal decoration? (No beads, pendants, necklaces or cave art have been found. There is possible body painting with ochre in South Africa.) No integration between social and technical intelligence.

4) Why is there no evidence of ritual burial among early humans? Neanderthals buried their dead but there is no unequivocal evidence of grave goods (the supposed pollen evidence of burial flowers is flawed – the pollen was probably blown into the cave) and it was possibly no more than hygienic disposal. Possibly they could recognise ancestors, but if so absence of grave goods is even more puzzling. Presumably the explanation is the same as 3) above, i.e. no social purpose for artefacts.

Mithen then considers language. Language capability can be inferred from brain size, brain shape and the character of the vocal tract. The brain size of Early Humans mostly falls within the range of modern humans. Recall Dunbar’s correlation between brain size, social group size and required grooming time. Dunbar and Leslie Aiello predict that grooming time had reached 40% by the time of archaic Homo sapiens. This is too high and would have required a language with significant social content to alleviate the problem – a “social language”. A general-purpose language emerged later, but how much later Dunbar and Aiello do not make clear.

The pre-frontal cortex is responsible for language and for reflecting on one’s own and others’ mental states, which is central to social intelligence. The Broca and Wernicke areas of Neanderthals are identical to those of modern humans. A 63,000-year-old Neanderthal hyoid bone is identical to the modern human one in shape, muscular attachments and apparent positioning, suggesting that Neanderthals possessed the anatomical capability for speech. Given the risk of choking that this anatomy implies, the mental ability to use it would presumably also have been present.

But was language used for other than social purposes, e.g. for teaching the Levallois Method? It sounds reasonable, but on the other hand H. erectus was a proficient tool maker and forager, despite being in all probability linguistically limited. Additionally such usage would have implied a greater integration between social, technical and natural history domains. Mithen concludes that language was purely social in function, supporting Dunbar.

Brain size increased from 750-1250 cc for early H. erectus to 1200-1750 cc for Neanderthals. The increase was not gradual: there was a plateau between 1.8 million and 500,000 years ago, followed by a rapid increase with the appearance of archaic Homo sapiens and the Neanderthals. H. erectus could probably make a wide range of sounds, but these were probably too simple to be described as a proper language. Aiello notes that the most complete H. erectus skeleton, KNM-WT 15000, suggests that the muscle control necessary for the regulation of respiration was not present.

The Levallois method appears at the end of the brain expansion period (250,000 years ago), implying better technical intelligence, but it is probably not a reflection of more intense social interactions.

The higher latitudes in Europe were occupied a million years after the first H. erectus migration out of Africa. Early humans may have had the cognitive ability to cope with the harsh Pleistocene European environment.

There was some improvement in natural history intelligence going from H. erectus to the Neanderthals, but the biggest difference was the increase in linguistic intelligence.

A Phase 2 mind has been achieved in the course of Act 3. The chapels are complete, but isolated and services going on in them can barely be heard elsewhere. There is still no cognitive fluidity.

Chapter 8. “Trying to think like a Neanderthal”. Mithen assumes Humphrey is correct and consciousness evolved as a mechanism to allow an individual to predict social behaviour of other group members. At some stage we became able to interrogate our own thoughts and feelings about how we would behave in certain situations. In other words consciousness arose as a part of social intelligence.

In a Phase 2 mind like that of a Neanderthal, therefore, consciousness existed only in the social domain and there was no conscious awareness of the thought processes involved in the technical and natural history domains. Nobody really understands consciousness. There appear to be two types: “sensation”, the awareness of sights, sounds, itches, etc, which Humphrey regards as “lower order”, and a higher-order type relating to reasoning and reflection on one’s own mental state. Mithen believes the Neanderthals possessed this latter “reflexive consciousness” only in relation to the social world.

When making a stone tool they experienced what we experience when driving a car on autopilot while engaged in conversation, thought, etc. We negotiate roundabouts and traffic-lights etc successfully but have no memory of having done so at the end of the journey. This is what is described by Daniel Dennett as “rolling consciousness with swift memory loss”. We find it hard to imagine what it would have been like to have such a Swiss army knife mind but we should remember that we are only aware of a fraction of what is going on in our own minds – for example the complex processes required to generate grammatically-correct speech and the evasive action when knocking over a cup of coffee. Another example is that an epileptic can continue to play the piano during a petit mal seizure despite higher brain stem function being temporarily lost. If modern people can drive cars and play pianos without involving conscious awareness then Neanderthals making stone tools and foraging becomes more plausible.

We must accept that the monotony of industrial traditions such as hand axes and the absence of bone and ivory tools and of art is only explicable in terms of mentalities fundamentally different to our own.

Chapter 9. “The big bang of human culture: the origins of art and religion”. Act 4 sees the entry of fully-modern Homo sapiens sapiens at 100,000 years ago. In Scene 2, 60,000 – 30,000 years ago there is a cultural explosion. But early modern humans, Homo sapiens sapiens, had already been in existence for at least 40,000 years [in fact around 150,000 years following the redating of the Omo remains]. The start of Scene 2 rather than Scene 1 is taken as the Middle/Upper Palaeolithic transition. Archaeologists believe this is a cultural revolution – restructuring of social relations, the appearance of economic specialisation, a technical invention similar to that which caused the adoption of agriculture, and the origin of language – but Mithen rejects this and believes it marks the point at which “doors and windows are inserted into the chapel walls” – the development of the Phase 3 mind. But it doesn’t happen everywhere at once.

Colonization of Australia occurred 60,000 – 50,000 years ago; blade core technology replaced Levallois technology in the Near East 50,000 – 45,000 years ago; and the appearance of art in Europe dates to 40,000 – 30,000 years ago.

What is Art? The Palaeolithic concept of art was different from ours; art is culturally specific [art is in the eye of the beholder] and many cultures that create fine rock paintings do not have a word for “art”. Art from Europe in this era includes an ivory statuette from Hohlenstein-Stadel in southern Germany, a man with a lion’s head (totemism); animal figures in ivory, including cats, mammoths, bison and horses, also from southern Germany; and v-shaped signs engraved on limestone blocks in caves in the Dordogne. These signs were once thought to be vulvas, but are no longer thought to have any simple representational status. Items of personal adornment such as beads, pendants and perforated animal teeth are widely known. At La Souquette in south-west France, ivory beads were carved to mimic sea shells. A tradition of painting caves with animals, signs and anthropomorphic figures culminated with Lascaux 17,000 years ago. Chauvet, 30,000 years ago, contains 300 or more paintings of naturalistic and anatomically-correct animals including rhinos, lions, reindeer, horses and an owl. After 30,000 years ago, art is found in Africa and is generally a world-wide phenomenon by 20,000 years ago.

The European art appeared when the last ice age was at its peak and cannot be viewed as a product of favourable circumstances. Yet under similar conditions the Neanderthals [apparently] produced no art.

Visual symbols –
1) Can be arbitrary, e.g. “2” doesn’t look like two of anything.
2) The purpose is communication.
3) A symbol can refer to things distant in space and distant in time.
4) The same symbol can mean different things to different people; e.g. a swastika wasn’t always associated with the Nazis and long predates Hitler.
5) Variability is permitted, e.g. variable handwriting.

Consider Australian Aborigines. A circle can mean campsite, fire, mountain, waterhole, women’s breasts, eggs, fruit, etc. As an Aboriginal child grows up, they move from face-value interpretations such as fish = fishing to images in the Dreamtime tradition, which must of course be learned. These carry a greater metaphorical sense, often relating to Ancestral Beings. Some knowledge may require being in the know. A fish starts out being good to eat; later it is good to think, a potent symbol of the spiritual transformation of both birth and death. The two types of fish image are complementary. Archaeologists can reconstruct the face-value “outside” meaning, but the “inner” meaning requires access to the lost mythological world of the prehistoric mind – the origin, Mithen believes, of religion.

There are three requirements for art –
1) Making a visual image requires a mental template, e.g. I think I will draw a Boeing 747.
2) Intentional communication with something displaced in space or time, e.g. a mammoth that was killed last Tuesday in Ipswich.
3) Attributing meaning to a visual image not associated with its referent – e.g. the hoof-print of a pregnant female deer that passed 2 hours ago.

Art is only possible with cognitive fluidity. 1) is found in the technical intelligence domain (making objects of preconceived form such as hand-axes) and could have been used for making art, but wasn’t prior to modern humans. 2) is established as a feature of social intelligence, since intentional communication was as vital to Early Humans as to Modern Humans – this feature is common, indeed, to monkeys and apes. Finally 3) is a feature of natural history intelligence. A rock painting can be compared with a hoof-print; it is removed from the animal that has been painted. But this was not done before the appearance of modern humans.

The cultural explosion 40,000 years ago in Europe is explained by new connections between technical, social and natural history intelligence, which create a new synergistic cognitive process Mithen calls visual symbolism – or simply art. Evidence – art apparently did not evolve (like a child’s artistic skills) but emerged fully-fledged, the very first images possessing technical skill and emotional power. (Although there are images drawn by children and apprentices – the artists clearly had to learn.)

What of earlier art? A 100,000-year-old fossil nummulite from Tata in Hungary appears to have an incised line perpendicular to a natural crack to make a cross. There is the incised red ochre in South Africa, etc. Mithen believes these relied on general intelligence only; that while the specialist domains had the necessary capability, they could not be brought into play. Thus art was very limited.

Not only is art made possible by cognitive fluidity, its content is also influenced by it. We see humans with animal heads. In southern Germany we see a human with a lion’s head – a being in the mythology of the people of that time. It could be an animal with human attributes (anthropomorphism) or a human descended from a lion (totemism). Anthropomorphism is common in human society – cats, dogs, Mickey Mouse, etc. Totemism is “the other side of the coin” and was the core of social anthropology during the 19th Century. Major works were produced between 1910 and 1950 by pioneer social anthropologists such as Frazer, Durkheim, Pitt-Rivers, Radcliffe-Brown and Malinowski. This was the foundation for Lévi-Strauss’ The Savage Mind, followed by a surge of renewed interest from 1970. Lévi-Strauss defined animals as “good to think” as well as good to eat. Totemism was viewed as humanity brooding on itself and its place in nature. The study of natural species “provided non-literate and pre-scientific groups with a ready-to-hand means of conceptualizing relationships between human groups”.

Totemism is universal among (modern) hunter-gatherers; it requires cognitive fluidity (animals and people); based on the evidence it has been present since the start of the Upper Palaeolithic.

Landscapes appear to also be socially-constructed and full of meaning. The Aborigines are again a good example – e.g. wells are where ancestral beings dug in the ground, trees are where they placed their digging sticks and red ochre where they bled. In southwest France in Upper Palaeolithic times we find a range of topographic features universally associated with social and symbolic meanings by modern hunter-gatherers so this mindset certainly isn’t new. The social and natural worlds are one and the same for both modern and Upper Palaeolithic hunter-gatherers. One consequence is that they expressed their view in art producing some of the finest art ever made.

In the Upper Palaeolithic, people hunted the same animals as their predecessors, but did so more efficiently. Early humans were predominantly opportunistic, hunting whatever was available, but modern humans concentrated on specific animals at specific sites, e.g. reindeer. Some sites were selected for ambush hunting, suggesting that animal movements were better predicted, e.g. by attacking at critical points along annual migration routes such as narrow valleys or river crossings. This is evidence of anthropomorphic thinking – though a reindeer doesn’t think as a human does, imagining that it does can still act as an excellent predictor of its behaviour. This has been tested in several studies of modern hunter-gatherers.

Modern humans also produced, for the first time, bespoke hunting weapons for different types of animal, which would be impossible for a human with a Swiss army knife mind. Examples include weapons made from bone and antler, harpoons, spear-throwers, etc. A variety of projectile points is seen, with specific types being associated with particular prey at particular sites. Key to all this is blade technology, where specialized multi-component tools can be made from standardised blanks. We also see a rapid evolution in technology as environmental conditions changed, building on the experience of earlier generations. Large points were used for big game on the tundra at the height of the ice age 18,000 years ago; there was a shift to multi-component tools and a greater diversity of tools as conditions ameliorated and a wider range of game became available. This is in stark contrast to the monotony of earlier technology and could only have happened with a connection between natural history and technical intelligences.

Art is used to store information, e.g. bones incised with parallel lines. Alexander Marshack’s microscopic studies suggest regular patterns that appear to be a system of notation. This was probably a visual recording device, probably for environmental events such as moon-phases [my personal view is that these tallies recorded the female menstrual cycle]. The bones are similar to notched and engraved artefacts made by modern hunter-gatherers which are known to be mnemonic aids and recording devices, such as the calendar sticks made by the Yakut people of Siberia. John Pfeiffer describes cave paintings as a tribal encyclopaedia. Mithen has himself suggested that the way in which animals were painted relates to information acquired about their movements and behaviour. In some cases animals are painted in profile but their hoofs in plan. Was this to facilitate identifying hoof prints or teaching children? Bird imagery is dominated by migratory birds like ducks and geese. Again, modern hunter-gatherers in glacial environments use the arrival and departure of these birds as harbingers of winter or spring. Paintings could also have fulfilled the same function as the “trophy arrays” of the Wopkaimin of New Guinea, which are carefully arranged to act as a mental map of the local environment, an aide-mémoire for recalling information about the environment and animal behaviour. Michael and Anne Eastham have suggested that paintings and engravings in the Ardèche, France, served as a model or map of the terrain around the caves.

Objects of adornment appear for the first time, requiring social and technical intelligence integration.

The anthropomorphic images seen earlier and the grave goods suggest that Upper Palaeolithic people had beliefs in supernatural beings and possibly an afterlife. In other words, the first religion. What is religion? In his 1994 work “The Naturalness of Religious Ideas”, Pascal Boyer notes that the most common and indeed possibly universal feature is a belief in non-physical beings. He also notes three other features common in religious ideologies:

1. Many societies believe in an afterlife.
2. Certain people within a society (e.g. shamans, priests) are especially likely to receive direct communication from supernatural agencies such as gods and spirits.
3. Performing certain rituals in an exact way can effect change in the physical world.

All these features seem to have been present in Upper Palaeolithic times. It is likely that the anthropomorphic images seen in French caves are either supernatural beings or the shamans who communicated with them. The French pre-historian André Leroi-Gourhan believes the painted caves are likely to reflect a mythological world as complex as the Australian Dreamtime. Boyer believes supernatural beings typically have characteristics that violate instinctive knowledge of psychology, biology and physics (see Chapter 3). For example, they have bodies that don’t age and can pass through solid objects (ghosts) or are invisible. They nevertheless conform to some instinctive knowledge in that they have desires and beliefs like normal beings. The Ancestral Beings of the Australian Aborigines have weird characteristics like existing in both past and present; but they play tricks and practise deceptions. The various shenanigans of the Greek Gods provide a more familiar example. Boyer argues that the combination of violation and conformity characterises supernatural beings in religious ideologies. Some conformity is necessary for people to get their heads round things.

The mixing-up of knowledge of different types of entities in the real world – knowledge which would once have been trapped in separate domains – is the essence of supernatural beings. It could only happen in a cognitively fluid mind. The notion that some individuals in a group can communicate with supernatural beings is a consequence of the belief that some people have a different “essence” from others. Essence is a “natural history” feature (instinctive biology) (chapter 3). Boyer explains the differentiation of people into different social roles, exemplified by the shaman, as the introduction of “essence” into the social world – again, a consequence of cognitive fluidity.

Religious ideologies as complex as those in modern hunter-gatherer societies must have come into being at about the time of the Middle/Upper Palaeolithic transition and have remained with us ever since.

It is not surprising that with new abilities, humans rapidly colonised the world. An expansion began at 60,000 years ago with Australia being colonised by extensive sea voyages (Clive Gamble); then the North European plain; the arid regions of Africa; the tundra and forests of the far north after 40,000 years. Early humans entered but did not stay; modern humans colonized these regions and used them as stepping-stones to the Americas and the Pacific islands. This is all down to cognitive fluidity but it does not happen until well after the emergence of modern Homo sapiens sapiens.

Early modern humans had a degree of cognitive fluidity, but they hadn’t achieved full integration and still had partially Swiss army knife type minds. In the Near East, remains of early modern humans in caves at Skhul and Qafzeh dating to 100,000 – 80,000 years ago are found with stone tools almost identical to those of the Neanderthals at Tabun (180,000 – 90,000 years ago), before the early moderns arrived, and at Kebara (63,000 – 48,000 years ago), after they left. But animal parts in human graves suggest religious or ritual activity (unlike Neanderthal burials) and an association between people and animals, probably totemic. The early moderns also hunted gazelles more efficiently – though using the same spear types, they hunted on a seasonal basis and thus expended less energy; their spears also needed to be repaired less often. This suggests enhanced prediction of prey behaviour, achievable only by anthropomorphic thinking. All this implies integration of natural history and social intelligence. But the integration of technical intelligence had not yet been achieved.

A similar conclusion is reached when evidence is considered from South Africa. Fossils in the caves of Klasies River Mouth and Border Cave are less well-preserved but date to the same time period of 100,000 years ago. They show some archaic features, and this population is likely to be the original source of H. sapiens sapiens [this before discovery of the Herto remains (155ky) and redating of the Omo remains (200ky)]. The Klasies River sequence runs from 100,000 – 20,000 years ago. At 40,000 years ago flake technology changes to blade technology at the Middle/Upper Palaeolithic transition (or Middle to Later Stone Age on the African scheme). Prior to this, tools are similar to those made by early humans elsewhere in Africa during Act 3, even though they were made by early modern humans after 100,000 years ago. The layers corresponding to the appearance of early modern humans contain significantly increased (though still quite rare) amounts of ochre, often used as crayons. Red ochre is entirely unknown prior to 250,000 years ago and extremely rare (a few dozen pieces) prior to 100,000 years ago. Chris Knight and Camilla Power believe it was used for body-painting, since no other art is known prior to 30,000 years ago in South Africa [again, this predates discovery of 70ky old incised ochre pieces]. In Border Cave an infant is buried with a perforated Conus shell originating 80 km away; the grave dates to 80,000 – 70,000 years ago. There are also small blades of high-quality stone, apparently designed for use in multi-component tools. Finally there is working of bone, with multi-barbed harpoons found at Katanda, Zaire. These are 90,000 years old, 60,000 years older than known comparable examples.

Mithen suggests that if one assumes that there is only one type of human in southern Africa after 100,000 years ago, then the mentality of these early modern humans is drifting in and out of cognitive fluidity. The benefits of partial cognitive fluidity are insufficient for the change to become “fixed” within the population. A degree of cognitive fluidity exists, but much less than that which arose at the start of the Upper Palaeolithic. It was however sufficient to give early modern humans the edge as they moved out of Africa and spread throughout the world.

The strongest evidence for the replacement “Out of Africa” theory is the limited genetic diversity of present-day humans, which suggests a recent and severe bottleneck, together with the greater genetic diversity found in Africa than elsewhere. One estimate suggests six breeding individuals for 70 years or 50 individuals in all; or 500 if the bottleneck lasted 200 years.

If the exodus comprised humans who were only partially cognitively fluid, they would have taken this condition – genetically encoded – with them as they expanded across the world. The integrated natural history and social intelligence gave them the competitive edge over earlier-type humans, pushing them into extinction.

The final step to full cognitive fluidity – integration of technical intelligence – was taken at different times in different parts of the world. This arose from parallel evolution and was perhaps inevitable as there was an evolutionary momentum towards full cognitive fluidity. As soon as adaptive pressures arose in each area, technical intelligence became part of the cognitive fluid mind and the final step to modernity was taken. [It is doubtful that cognitive fluidity could have arisen by parallel evolution as this would almost certainly have resulted in a degree of cognitive differences between different present-day populations. However in his more recent work, Mithen now suggests that full cognitive fluidity arose prior to Homo sapiens migration from Africa.]

Chapter 10. “So how did it happen?” Mithen believes language gradually broke down the barriers between the domains. He goes along with Robin Dunbar’s proposal that early humans’ language was used to send and receive social information only, unlike modern general-purpose language. But “snippets” of non-social content, e.g. about tool-making and animal behaviour, crept in from two sources. The first source was general intelligence at domain interfaces. Though limited, this could have permitted some vocalisation about the non-social world. This was probably a small range of words, used predominantly as demands, with no more than two or three words strung together, in contrast to the grammatically-complex social language. The second source may have arisen because the specialized intelligences were not entirely isolated, so that some non-social thoughts about tool-making, foraging, etc might have leaked through into the social domain.

As humans used non-social words, these would have entered the minds of other humans and invaded their social domains. There would have been a selective pressure to utilise this non-social information, as better hunting and tool-making decisions could have been made. There would have been a further selective pressure to add to one’s non-social vocabulary in order to question others about animal behaviour and tool-making, etc, and thus add to one’s knowledge. Possibly some individuals happened to have particularly permeable walls between their specialised domains, and this physical trait would have been heavily selected for. A general-purpose language would have evolved between 150,000 and 50,000 years ago.

Evidence survives in our conversation today, which Robin Dunbar notes is still predominantly about social matters. We also ascribe “minds” to inanimate objects, as implied by sentences like “the ball flew through the air” and “the book toppled off the shelf”, which linguist Leonard Talmy argues imply that the objects move under their own volition because the sentences are so like “the man entered the room”. Utterances use the same range of concepts and structures regardless of whether they refer to mental states, social beings or inanimate objects. Linguists believe that these concepts and structures were originally used for inanimate objects and were transferred by “metaphoric extension” to utterances about the social world. Mithen believes it was the other way around.

The social “chapel” was turned by the invasion of non-social material into one of Dan Sperber’s “superchapels” (chapter 3, MMR). The superchapel allows world-knowledge to be represented in two locations: its “home” domain and the social domain, which now additionally contains non-social information. This multiple representation of data is a crucial feature of Annette Karmiloff-Smith’s ideas about how cognitive fluidity arises during development.

This helps us to understand apparently contradictory views held by hunter-gatherers and indeed any modern humans about the world. For example the Inuit on one hand think of the polar bear as a fellow kinsman, yet they kill and eat it. Deep respect for animals that are hunted, often expressed in terms of social relationships, yet no qualms about actually killing them appears universal among hunter-gatherers. This reflects the same bit of info being present in more than one domain. Another example is the Australian Aborigines and their attitude to landscapes. They exploit these with a profound knowledge of ecology (natural history) yet they view this same landscape as having been created by Ancestral Beings, who don’t follow the laws of ecology. Again the same info is represented in two different domains.

Sperber suggested that non-social info invading the social domain would trigger a cultural explosion, which Mithen claims occurred at the outset of the Upper Palaeolithic.

Mithen has followed Nicholas Humphrey’s argument that reflexive consciousness evolved as a critical feature of social intelligence and he believes a change in the nature of consciousness was a critical feature of the change to cognitive fluidity.

Consciousness was not accessible to thought in other domains. Early humans were not aware of their own knowledge of the non-social world other than via ephemeral rolling consciousness (see chapter 8). Language was the means by which social intelligence was invaded by non-social material and this made the non-social world available for reflexive consciousness to explore. This is the essence of the argument made by Paul Rozin in 1976 – his notion of accessibility was the “bringing to consciousness” of knowledge already in the human mind but located within the “cognitive unconsciousness”. Much mental activity remains closed to us even now; e.g. a potter often will be unable to explain how they throw a pot and can only demonstrate the process.

The new role for consciousness in the human mind is likely to be the one identified by Daniel Schacter in 1989 when he argued that consciousness is a global database integrating the output of modular processes. Such a mechanism is crucial in a modular system where different types of info are handled by different modules. Early humans had only general intelligence to perform this role. But because language acted as the means of delivering non-social thoughts into the social domain, consciousness could start to play the integrating role. Individuals could become introspective about their non-social thought-processes and knowledge, leading to the flexibility and creativity that characterises modern human behaviour.

Sexually-mature females were under selective pressure to achieve cognitive fluidity. Humans can only give birth to small-brained infants (typically 350 cc, no larger than an infant chimp). This is because of a design conflict in the pelvis: the width of the birth canal is constrained by the demands of walking upright. The brain grows hugely after birth, hence the massive post-natal dependency. This would have become pronounced during the second phase of brain expansion 500,000 years ago. According to Chris Knight, mothers needed to provide good-quality food, and Early Modern Human females solved the problem by extracting “unprecedented levels of male energetic investment” from the men, probably co-ordinating their behaviour to that end. A key element was a “sex strike” and the use of red ochre as “sham menstruation”. This was the first use of symbolism, and the increase in ochre use after 100,000 years ago is cited as evidence. Mithen is sceptical about co-ordinated female action, but believes this was a social context in which food became critical in negotiating social relationships between the sexes, and in this context snippets of language about food and hunting may have been particularly valuable in the social language between males and females. Females may have needed to exploit this information in their dealings with men, and this might be why the first step towards cognitive fluidity was an integration of natural history and social intelligences, as seen in Early Modern Humans in the Near East.

The increased time between birth and maturity that arose as brain size increased also provided the time needed for connections between the specialised intelligence domains to be formed in the mind. As seen in chapter 3, Annette Karmiloff-Smith argued that a modern child passes through a domain-specific cognition phase, and in chapter 7 Mithen argued that cognitive development in young Early Humans ceased after the domains of thought had arisen and before any connections could be built. With regard to development, the source of cognitive fluidity must lie in a further extension of the period of cognitive development. The fossil record provides some evidence that child development in modern humans is much longer than that of early humans. This shows that Neanderthal children grew up quickly, developing robust limbs and large brains at an early age compared with modern humans. The 50,000-year-old fragments found at Devil’s Tower on Gibraltar are of a three- to four-year-old child; dental eruption occurred earlier than in modern humans and the skull is 1,400 cc, nearly adult size. A two-year-old Neanderthal found in Dederiyeh Cave in Syria possessed a brain the size of that of a six-year-old modern human. Unfortunately we lack the skulls of children from 100,000 years ago from the Near East and those of the Upper Palaeolithic. Mithen believes they would show a trend towards increasingly-prolonged infant care over the time period 100,000 – 30,000 years ago.

Chapter 11. “The evolution of the mind”. There has been an oscillation over 65 million years between specialized and generalized ways of thinking.

Mithen believes that the earliest proto-primates, the plesiadapiforms (Purgatorius, etc), had no general intelligence and that their minds were of the “Swiss army knife” type, made up of hard-wired specialised behavioural modules that kicked in in response to specific stimuli and which were little modified by experience. This inability to learn is shared with cats, rats, etc, but not with modern primates, who can identify general rules that apply in a set of experiments and use the general rule to solve a new problem. These animals declined because of competition from rodents about 50 million years ago.

The first modern primates included the lemur-like Notharctus, which possessed larger brains (greater encephalization) and appeared around 56 million years ago. Notharctus probably possessed a general intelligence to supplement the specialist modules, but no dedicated social intelligence. The general intelligence could handle simple learning rules for reducing food acquisition costs and facilitating kin recognition. There was as yet no theory of mind, and Notharctus interactions with their social world were no more complex than their interactions with the non-social world (similar to present-day lemurs). There is an about-turn from hard-wired behavioural responses to stimuli to a generalised mentality with cognitive mechanisms to allow learning from experience. However a larger brain had higher energy costs and these primates needed to exploit high-quality plant foods such as new leaves, ripe fruit and flowers.

About 35 million years ago more advanced primates such as Aegyptopithecus appeared. They were fruit-eating quadrupeds and lived in the tall trees of monsoonal rain forests. Their general intelligence was superior to that of Notharctus and there was now a specialised social intelligence domain. Social behaviour was more complex than non-social behaviour (see ch5) and there was a selective pressure to predict and influence the behaviour of other group members. As argued by C&T, those individuals with specialised mental modules for social intelligence would be able to solve social problems better. By 35 million years ago, general intelligence had reached its limits and the trend to ever-increasing specialisation of mental faculties had begun, which was to continue almost to the present day.

Andrew Whiten describes brain evolution as deriving from “spiralling pressure as clever individuals selected for yet more cleverness in their companions”. Nicholas Humphrey describes intellectual prowess as correlated with social success; if this success means high biological fitness, then any inheritable trait that enables an individual to outwit their fellows will soon spread through the gene pool. The spiralling pressure probably continued from 15 to 4.5 million years ago, a period in which the fossil record is poor and during which the common ancestor of humans and chimpanzees probably lived, about 6 million years ago.

The fossil record improves at 4.5 million years ago. As seen in ch2, the best-preserved australopithecine, A. afarensis, is adapted to a joint arboreal and terrestrial lifestyle. The fossil record between 3.5 and 2.5 million years ago suggests brain size remained constant. Why did spiralling pressure come to this hiatus? Probably two constraints applied: bigger brains require more fuel and need to be kept cool. Brains need 22 times more energy than muscle, and overheating by even 2 °C can impair their functioning. Australopithecines were probably mainly vegetarian and lived in equatorial wooded savannah. They couldn’t get enough to eat or keep cool.

But at 2 million years ago brain size begins to increase again. Meat-eating and bipedalism had provided the solutions to these problems. Bipedalism begins to evolve 3.5 million years ago, possibly in response to selective pressure to reduce thermal stress. By walking upright, exposure to solar radiation from the tropical sun can be reduced by 60%. The australopithecines’ tree-climbing, tree-swinging ancestry pre-adapted them to bipedalism. Bipedalism is also more energy-efficient, so australopithecines could forage for longer periods without food and water and in areas with less natural shade, and thus exploit foraging niches closed to predators more dependent on shade and water. The environmental changes occurring in Africa 2.8 million years ago led to more arid and open environments and thus bipedalism would have been advantageous.

Bipedalism required a bigger brain for the more complex muscle control it required. Dean Falk discusses how a network of veins covering the head was also selected for, providing a cooling system or radiator. Thus the overheating constraint was relaxed. Falk also suggests that once feet ceased to be used for manipulation areas of cortex used for foot control were freed up and were utilised to improve manual dexterity.

The new scavenging niche made it possible to obtain animal carcases and with more meat in the diet the gut size could be reduced releasing more energy to run the brain without changing the basal metabolic rate. Thus the second constraint on brain-size was relaxed.

Meanwhile the larger social groups needed to survive in open terrestrial habitats (partly as a defence against predators) meant better social intelligence was required, and this drove brain size up. In ch6 we saw Oldowan tools required more brainpower to make than those used by chimps. However the knowledge probably arose due to enhanced learning opportunities in larger group sizes rather than enhanced technical intelligence.

The distinct domains appear at 1.8 – 1.4 million years ago in response to continuing competition between individuals – removing the constraints on brain size triggered a cognitive arms race. However they may also reflect the appearance of a constraint on further growth of social intelligence. Nicholas Humphrey notes that a point comes when it is pointless devoting any more time to a social argument, so by 2 million years ago possibilities of enhancing reproductive success by enhancing social intelligence were played out, and new cognitive domains were evolved: natural history and technical intelligence. This would have permitted more efficient carcass and other food location, and better butchery techniques. Individuals with these faculties got a better diet and needed to spend less time exposed to predators on the savannah.

With the new domains, humans spread through much of the Old World, reaching Wales, South Africa and south-east Asia. The Swiss army knife mind was so successful there was no further brain enlargement between 1.8 million and 500,000 years ago. It is probable, though, that the minds of different types of human varied subtly in the nature of their multiple intelligences reflecting the diverse environments in which they lived.

The language domain probably began to evolve as far back as 2 million years ago. Mithen follows Robin Dunbar and Leslie Aiello’s argument that language evolved for social purposes only, under selective pressure to reduce grooming time. Anatomical changes required for speech, such as the descent of the larynx, were made possible by bipedalism (Aiello). A spin-off from this was the ability to form the sounds of vowels and consonants. Changes in breathing patterns and reduced tooth size due to meat-eating also helped, improving both sound quality and fine control of the tongue.

While H. erectus had better vocal ability than modern apes, they were very limited compared with modern speech. A large lexicon and grammar rules did not appear until the next phase of brain enlargement 500,000 – 200,000 years ago, though this remained a social language. Why did this second phase happen?

One possibility was a further expansion of social group size. It’s not clear why. Aiello and Dunbar believe that as the global human population increased so the need for defence against other groups of humans increased. The opportunity so created was however used. As described in ch10 the scope of language spread to the non-social world, leading to cognitive fluidity. Possibly this evolved because by 100,000 years ago specialisation of the mind had reached its limits. Although it only fully developed in modern humans, it may have begun with the last of the Neanderthals, but before it could fully develop they were pushed into extinction by modern and fully cognitively fluid humans.

The minds of today’s humans may have other new domains in response to cultural pressures that began with the adoption of agriculture, e.g. a maths domain.

We can see an alternation over 65 million years between specialized and generalized intelligences. If we started and finished with general purpose minds, why was there a phase of specialist domains with limited integration? Quite simply the mind developed in a piecemeal fashion, with a general purpose intelligence to keep it running and specialist modules being added on later. Once the latter were working properly they were integrated into the whole, using language and consciousness as the glue. The whole undertaking was far too complex to undertake in one hit, c.f. writing a complex multi-modular computer program.

Art, religion and science are unique achievements of the modern mind. Science has three critical properties. The first is generating and testing hypotheses. Chimps can do this in their social interactions when practising deception, using social intelligence. Early humans did so for resource distribution, using natural history intelligence. The second is using tools for problem-solving, e.g. telescopes, microscopes, and pencils and paper for making records. Integrated natural history and technical intelligence was used for tool-making in the Upper Palaeolithic. Cave paintings were the DVDs of their day. The third is the use of metaphor and analogy. Some of these require only one domain, but the most powerful cross boundaries and require cognitive fluidity. Examples include the heart as a pump, atoms as solar systems, clouds of electrons, wormholes, “selfish” genes, “well-behaved” equations – or minds as Swiss army knives or cathedrals.

Epilogue: “The origins of agriculture”. Agriculture arose independently in many parts of the world around 10,000 years ago. How animals reproduced has been known as long as natural history intelligence has existed, 1.8 million years. We also know a good deal about the animals hunter-gatherers hunted. But it is only recently that we have learned about their exploitation of plant foods. Charred plant remains found at 18,000-year-old sites in Wadi Kubbaniya, to the west of the Nile Valley, indicate that a finely ground plant mush was used, probably to wean infants. Roots and tubers were exploited, possibly all the year round, from permanent settlements. At Tell Abu Hureyra, Syria, occupied by hunter-gatherers between 20,000 and 10,000 years ago, 150 edible plant species have been identified. At both sites technology for grinding and pounding plant material has been found. To sum up, both the botanical knowledge and the technology to support agriculture were in place at these sites well before agriculture itself was practised. Why was it not adopted sooner? A degree of compulsion must have been involved.

Agriculture has many disadvantages – being tied to a particular place leads to sanitation problems, the rise of disease, social tensions, and depletion of resources such as firewood. Bones and teeth indicate that the health of early farmers was poorer than that of hunter-gatherers. Yet 10,000 years ago agriculture was widely adopted, with a wide range of crops being brought under cultivation – wheat and barley in south-west Asia, yams in West Africa, taro and coconuts in south-east Asia.

There are two conventional explanations for the near-simultaneous switch to agriculture. One is that by 10,000 years ago population levels had got too large to be sustained by hunter gathering. This theory is not plausible or supported by the evidence. Hunter-gatherers can limit population by infanticide. Population in a mobile society is limited by difficulties of carrying young children around.

The other possibility was the rise in temperature at the end of the last ice age, which was preceded by global climatic fluctuations lasting 5,000 years between warm/wet conditions and cold/dry conditions. In south-west Asia, the first farming communities are seen at Jericho and Gilgal, with wheat, barley, sheep and goats. But the wild ancestors of these cereals had grown and been exploited by hunter-gatherers in the same places (e.g. Abu Hureyra). The stratified sequence of plant remains there has been studied by archaeo-botanist Gordon Hillman and is very informative about the changeover from hunter-gathering to agriculture. Between 19,000 and 11,000 years ago, the climate in south-west Asia improved as European ice sheets retreated, leading to warm/wet conditions especially during the growing season. During this time hunter-gatherer numbers increased, exploiting more productive food plants and predictably-moving gazelle herds. Evidence suggests a wide range of plants was exploited. But at 11,000 – 10,000 years ago drier and even drought conditions returned. Fruit trees could not survive drought and wild cereals could not survive dry cold conditions. Small-seeded legumes are more hardy but require detoxification to make them edible. At 10,500 years ago Abu Hureyra was abandoned. When people returned 500 years later they adopted agriculture. Similarly in the Levant at around 13,000 – 12,000 years ago hunter-gatherers switched from a mobile to a sedentary lifestyle, probably in response to a short abrupt period of aridity which resulted in dwindling, less predictable food supplies. This period of building settlements yet retaining the hunter-gatherer lifestyle is known as the Natufian and lasted up until 10,500 years ago, when true farming settlements appear. The settlements were often extensive and included underground storage pits and terraced huts. There was an expanded range of bone tools, art objects, jewellery and ground stone tools. Stands of wild barley were intensively exploited.

There was a point of no return as described by Ofer Bar-Yosef and Anna Belfer-Cohen. Once the sedentary lifestyle had been adopted it was necessary to increase food production, because constraint on having children had been relaxed. Hence agriculture.

But the last ice age wasn’t the only one experienced by humans. Earlier human types didn’t adopt agriculture – but they lacked the mental means.

1) Tools for harvesting and processing plants required an integration of technical and natural history intelligence.
2) Animals and plants were used as a medium for acquiring prestige and power. This required integration of social and natural history intelligence. E.g. meat and bones from bison, reindeer and horses, hunted in the tundra-like environment, were stored on the Central Russian plain 20,000 – 12,000 years ago. Access to these resources came increasingly under the control of particular dwellings and hence individuals, who thus used them as a source of power. Similarly in southern Scandinavia 7,500 – 5,000 years ago, people hunted red deer, pigs and roe deer. They appeared to focus on red deer – these were scarce, but the carcases were larger. There was more meat to give away, providing prestige and power. When this was not available, day-to-day needs could be catered for by exploiting plants and fish. Red deer antlers and teeth necklaces are prominent grave goods, suggesting these animals were important to these people. It would seem that agriculture was a means for some individuals to gain and maintain power. Brian Hayden favours this explanation and argued that competition between individuals using food resources to wage their competitive battles provided the motives and the means for the development of food production. Hayden felt that evidence in the Natufian culture of long-distance trade in prestige items, and the abundance of jewellery, stone figurines and architecture, indicated social inequality, reflecting the emergence of powerful individuals. To maintain their power base these individuals had to continually introduce new prestige items and generate the economic surplus they needed to maintain power. Many first domesticates were prestige items like dogs, gourds, chilli peppers and avocados rather than resources for feeding a large population.
3) Social relationships with plants and animals, i.e. social and natural history integration. Evidence includes injured reindeer kept alive until injuries healed and the loving care a gardener gives to his plants.
4) Manipulating plants and animals – technical and natural history. This is basically treating them as artefacts to be manipulated. E.g. burning parts of forest as environmental management to encourage plant growth and attract game; leaving a bit of yam in the ground to allow the yam to grow again.


Steven Mithen’s basic premise is that in order to fully understand the mental architecture of a modern human, we need to look at the evolutionary history of Homo sapiens. As an archaeologist, he has used archaeological evidence as the main pillar of his work. He has supported this with evidence from anthropology, linguistics, evolutionary psychology, developmental psychology and primatology.

Mithen believes that there are two basic types of intelligence. The first, which appeared early in primate history, is “general intelligence”, a capacity for non-specific learning which can be applied to a wide range of problems. Later there appeared more specialised intelligences for particular tasks – a social intelligence for dealing with social interactions; a linguistic intelligence; a technical intelligence for tool-making; and finally “natural history” intelligence for dealing with the natural world.

In early humans, such as the Neanderthals, these various intelligences were isolated from one another, which restricted the range of thoughts available to the early human mind. Modern Human Behaviour is a “package” of behaviours generally accepted by anthropologists to include the use of abstract thought, symbolic behaviour (such as art, creative expression and religion), use of syntactically-complex language and the ability to plan ahead. This, according to Mithen, did not emerge until humans attained “cognitive fluidity” which enabled the various intelligences or cognitive domains to interact synergistically. Finally, cognitive fluidity permitted reflexive consciousness of the modern type. Language acted as the means of delivering non-social thoughts into the social domain and consciousness could start to play the integrating role. Individuals could become introspective about their non-social thought-processes and knowledge, leading to the flexibility and creativity that characterises modern human behaviour.

That language is intimately linked to modern thought processes has also been suggested by Derek Bickerton, though Bickerton’s theory of how modern human behaviour arose differs in a number of details from that of Mithen; in particular Bickerton sees the proto-language of early humans as being entirely distinct from modern language, which evolved primarily as a means of representing concepts rather than a means of communication (Bickerton, 1990). In his later work The Singing Neanderthals, Mithen rejects Bickerton’s compositional (word-based) proto-language and claims that the utterances of early humans were holistic (Mithen, 2005).

The notion of humans initially possessing compartmentalised minds is to some extent reminiscent of Julian Jaynes’ theory of “bicameral minds” proposed two decades before Mithen’s theory (Jaynes, 1976).

In his book, Mithen claimed modern human behaviour did not arise until after anatomically modern humans left Africa, based on the then-prevalent view that the earliest evidence for it is seen in Europe around 40,000 years ago (e.g. Diamond, 1991; Klein, 1999; Klein & Edgar, 2002). However this view has been challenged (McBrearty & Brooks, 2000; McBrearty, 2007; Henshilwood et al, 2004; Lewin & Foley, 2004; Oppenheimer, 2002), with clear evidence for a far earlier emergence in Africa. Mithen now accepts an earlier date, predating the migrations from Africa (Mithen, 2007). This revised timescale doesn’t really affect the validity or otherwise of Mithen’s theory.

However Mithen’s theory is not without its problems and some of the evidence he puts forward in its support is unconvincing. In particular, he suggests that the reason Early Humans (Homo erectus, H. heidelbergensis and the Neanderthals) did not use bone, antler and ivory for tool-making was that they could not think of animal parts (catered for under natural history) in the tool-making technical domain. In other words Mithen is saying that Early Humans were aware that bone, antler and ivory were of organic material, but some kind of cognitive demarcation dispute prevented them from utilising such material for tool-making. But even chimpanzees will use organic material such as sticks for tools (e.g. termite sticks). Mithen seems to be implying that once modularity developed, the ability to use such materials was lost, which strikes me as being implausible. It seems more likely that Early Humans did make tools with organic materials on occasions, but these have failed to survive in the archaeological record.

Mithen then asks why tools were not made for hunting specific types of prey. He attributes this to a lack of integration between the technical intelligence (tools) and natural history intelligence (prey). This question could really be restated as “why are Mode 2 (Acheulian) and to some extent the Mode 3 (Mousterian) technologies of Early Humans primitive in comparison to the Mode 4 (blade) and Mode 5 (microlith) technologies of Modern Humans?”

Clearly this suggests that Early Humans were less cognitively advanced than later humans, but was this solely due to domain isolation?

To conclude, Mithen’s theory is well-argued and the existence of multiple intelligences in early humans is a strong possibility. Personally, though, I am somewhat sceptical as to whether anatomically-modern humans ever had this type of brain. It seems far more likely that “cognitive fluidity”, assuming it did not exist in earlier humans, is a characteristic of Homo sapiens and emerged with it. Derek Bickerton believes that the ability to use syntax in speech and thought is characteristic of our species (Bickerton, 2007): possibly it is this that provided Mithen’s “cognitive fluidity” (though as noted above, Mithen has criticized Bickerton’s theory). The braincase of modern humans is globular, in contrast to the long, low braincases of other human species. This change in shape may have arisen from the need to accommodate the neural anatomy that was responsible for the change in human mental organization to that of the modern type.


Bickerton D (1990): “Language and Species”, University of Chicago Press, USA.

Bickerton D (2007): “Did Syntax Trigger the Human Revolution?” in Rethinking the human revolution, McDonald Institute Monographs, University of Cambridge.

Boden M (1990): “The Creative Mind: Myths and Mechanisms”, Weidenfeld & Nicolson, London.

Dennett D (1991): “Consciousness Explained”, New York: Little, Brown & Company.

Diamond J (1991): “The Third Chimpanzee”, Radius, London.

Dunbar R (1996): “Grooming, Gossip and the Evolution of Language”, Faber and Faber, London Boston.

Fodor J (1983): “The Modularity of Mind”, MIT Press, Cambridge, MA.

Gardner H (1983): “Frames of Mind”, Basic Books.

Gardner H (1999): “Intelligence Reframed”, Basic Books.

Henshilwood C & Marean C (2003): “The Origin of Modern Human Behavior: Critique of the Models and Their Test Implications”, Current Anthropology Volume 44, Number 5, December 2003.

Jaynes J (1976): “The Origin of Consciousness in the Breakdown of the Bicameral Mind”, Mariner Books, USA.

Karmiloff-Smith A (1992): “Beyond Modularity”, MIT Press, Cambridge, MA.

Klein R (1999): “The Human Career” (2nd Edition), University of Chicago Press.

Klein R & Edgar B (2002): “The Dawn of Human Culture”, John Wiley & Sons Inc., New York.

Lewin R & Foley R (2004): “Principles of Human Evolution” (2nd edition), Blackwell Science Ltd.

McBrearty S (2007): “Down with the Revolution”, in Rethinking the human revolution, McDonald Institute Monographs, University of Cambridge.

McBrearty S & Brooks A (2000): “The revolution that wasn’t: a new interpretation of the origin of modern human behaviour”, Journal of Human Evolution (2000) 39, 453–563.

Mithen S (1996): “The Prehistory of the Mind”, Thames & Hudson.

Mithen S (2005): “The Singing Neanderthals”, Weidenfeld & Nicolson.

Mithen S (2007): “Music and the Origin of Modern Humans”, in Rethinking the human revolution, McDonald Institute Monographs, University of Cambridge.

Oppenheimer S (2002): “Out of Eden”, Constable.

Tomasello M (1999): “The Cultural Origins of Human Cognition”, Harvard University Press, Cambridge, MA & London.

© Christopher Seddon 2009

Language & Species (1990), by Derek Bickerton

Derek Bickerton (b.1926) is professor emeritus of linguistics at the University of Hawaii and believes that creole languages provide a powerful insight into both the acquisition of language by infants and the origins of language in humans. A creole is a stable fully-functional language apparently arising from a pidgin, which is a stripped-down lingua franca arising when people sharing no common tongue have to live and/or work together. Examples include merchant seamen in distant ports and, historically, slaves in the West Indies.

Bickerton is the main proponent of the Language Bioprogram Hypothesis (LBH). This theory states that the structural similarity between many creole languages must arise from an innate capacity in the brain.

The following is a summary of Bickerton’s 1990 work Language and Species:

Chapter 1 The Continuity Paradox
Human and animal behaviour are separated by one major distinction that is not often appreciated – language. Animal communications are holistic and limited, e.g. vervet monkeys have warnings for various types of predators. By contrast, human communications are complex and unlimited. How did one evolve from the other? The theory of evolution states that features do not arise de novo but must be built incrementally upon something already in existence, but how can something infinite arise from something finite? This is known as the Continuity Paradox.

Bickerton resolves this paradox with the bold assertion that language in humans did not arise from the vocalizations of other animals and that its primary function is not in fact communication but representation. Communication is no more than a handy spinoff.

Nouns do not correspond to real objects, only representations of them. If this were not the case we could not have words for things like unicorns and golden mountains, which do not exist in the real world. Our view of the world is always representational and not absolute – what we see is a representation built up by sensory data; through a glass, darkly as St. Paul might have put it.

Chapter 2 Language as Representation: the Atlas
Language can be regarded as a means of mapping reality in a style analogous to both an atlas and an itinerary book. It is important to realise that the atlas and the itinerary book are both representations of reality and that therefore they cannot represent with absolute verisimilitude. This limitation also applies to language – it does not directly map the experiential world. Language is a mediated mapping, a mapping that derives from the processing of sensory inputs.

In this chapter, Bickerton considers the atlas-like properties of language and states that a word can have three levels of meaning. Our knowledge of the world, in common with that of other animals, is derived from a series of mapping operations. The first of these – shared with other animals – is from existential objects to neural cells and networks in the brain. The first level of meaning is simple perception of, say, a leopard (non-italicised and not in quotes). We can only perceive a leopard when one is actually present, but we can think about leopards in their absence. This second level of meaning is the concept of something, for example “leopard” – the concept of leopards (in quotes). Some animals, such as frogs, almost certainly don’t have concepts. Frogs react quickly to snap up passing insects, but this is simply a hard-wired reaction to small rapidly moving objects (a frog ignores stationary insects and reacts to pellets flicked across its line of vision, but the reaction works more often than not). Humans on the other hand do have concepts: for example an unidentified sound at night will be matched against possible explanations. Vervets probably fall somewhere in between and can equate the smell, sound and sight of a leopard with the same thing. Finally there is leopard (italicised), which refers to the word itself – a label – without any clear meaning being necessarily attached to it.

“Leopard” and leopard are defined in terms of (the perception of a) leopard, but this isn’t necessarily always the case; (the perception of a) burglar can only be expressed in terms of the concept of a “burglar” and the word burglar; we can define “paranoia” and label paranoia, but we cannot perceive paranoia.

Units relating to entities are insufficient to describe the world, because pretty well everything we see is doing something; for example walking, running, swimming, flying, etc. The subject/predicate distinction in language is so fundamental that it tends to be taken for granted, but it corresponds to nothing in nature. You cannot see an animal without perceiving at the same time what it is doing, e.g. a cow grazing. There is no word for cow-grazing, but we would expect there to be if language exactly mirrored reality. One possibility is that this is for reason of economy, because we’d need words for cow-running, cow-mooing, etc. But Bickerton believes that the explanation is that the concept of entities preceded the concept of behaviours. Behaviours are more abstract than entities; a cow cannot be anything other than a cow, but many types of animal can graze or run.

Behaviours are of course not the only things that can be predicated of entities. Properties such as size, colour, temperature, etc may also be attributed to entities. Typically these adjectives are paired, large/small, hot/cold, fast/slow, etc. While we can have words such as fast, faster and fastest, there is no language that represents a continuum of, say, speeds or temperatures.

The level of representation given by the lexicon abstracts away from and interprets the flux of experience. It derives a wide range of entities, together with behaviours and attributes that can be predicated of these entities. These form an inventory of everything that we see; however the lexicon is not unstructured.

Words are hierarchical e.g. animal -> mammal -> dog -> Spaniel. The word “anger” includes a range of words from irritation and annoyance through to rage and fury. Anger in turn falls in the category of emotion. Words can not only be converted to strings of other words, but fall into place within a universal filing system that permits any concept to be retrieved and comprehended.

Words are also constrained by contiguity. For example there is no word for “left leg and left arm”, or “every other Friday” or “red and green”. The referent must be an uninterrupted piece of matter or time or space. This even applies to abstract properties like ownership, location, possession, existence. Some languages, such as English, use one verb (is) for existence, location, ownership (e.g. there IS a book, the pub IS across the road, the book IS yours) and another (have) for possession (I have a book); but no language groups together existence/ownership and location/possession (the equivalent of the pub HAVE across the road). This suggests that contiguity constraints exist even in highly abstract domains. Semantic space may well be an intrinsic property of the brain; the lexicon is carved up into convenient chunks.

Chapter 3 Language as Representation: the Itineraries
While a map can tell you what the terrain is like, an itinerary is required to tell you what journeys may be taken. Similarly there are rules governing a journey through semantic space. Sentences are underlain by three types of structural consistency: predicability, grammaticisation and syntax.

Predicability imposes constraints between entities and predication – e.g. “the story is true” or “the cow is brown” are permissible, but not “the story is brown” or “the cow is true”. Only abstract qualities can be predicated of abstract nouns; and concrete qualities of concrete nouns. What can and cannot be predicated can be drawn up on a tree diagram. A quality at the top of the tree can be predicated of any class below it, but of no class above it. A quality on a side branch can only be predicated of a class on the branch below it.

For example, trees, pigs and men can all be dead; but only pigs and humans can be hungry; and only humans can be honest. All of these things plus thunderstorms can be nearby, but only thunderstorms could have happened yesterday; and so on.

Three observations may be made about the tree. Firstly it has binary branching. There is no obvious reason for this. Why only two? Why not three or more branches at each node? Secondly there is a contiguity constraint – for example anything applying to humans and plants must also apply to animals. Thirdly the tree does not seem to be derived from experience of the world, as children as young as three or four use only slightly truncated trees. This does suggest that language as a classification mechanism is constrained by the human-specific conceptual analysis of the natural world.
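The predicability tree described above can be sketched in a few lines of Python. This is only an illustration: the class names ("thing", "event", "living thing", "animal") and the layout of the tree are assumptions chosen to fit the examples in the text, not Bickerton's own diagram. A quality attached to a class may be predicated of that class and of every class below it, but of nothing above it or on another branch.

```python
# Hypothetical class tree: each class maps to (parent, qualities introduced there).
# The node names and layout are illustrative assumptions, not Bickerton's diagram.
TREE = {
    "thing": (None, {"nearby"}),
    "event": ("thing", {"happened yesterday"}),
    "living thing": ("thing", {"dead"}),
    "thunderstorm": ("event", set()),
    "tree": ("living thing", set()),
    "animal": ("living thing", {"hungry"}),
    "pig": ("animal", set()),
    "human": ("animal", {"honest"}),
}


def can_predicate(quality, cls):
    """Return True if `quality` is attached to `cls` or to any class above it."""
    while cls is not None:
        parent, qualities = TREE[cls]
        if quality in qualities:
            return True
        cls = parent  # climb one level towards the root
    return False


for quality, cls in [("dead", "tree"), ("hungry", "tree"),
                     ("honest", "pig"), ("nearby", "thunderstorm")]:
    print(f"a {cls} can be '{quality}':", can_predicate(quality, cls))
```

In this sketch "dead" sits above trees, pigs and humans, "hungry" only above pigs and humans, and "honest" only on humans, which reproduces the pattern of permissible and impermissible sentences given in the text.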

Grammatical items are structural pieces that hold the meaningful parts of the sentence together – either inflections (-ing, -ed, etc), or words like “of” as in “the handle of the door”, or above, below, on, in, at, by, before, after, while, etc. Some languages do not express all these relations; others express relations not found in English. For example Hopi and Turkish both have inflections that differentiate between information gained through personal experience and information obtained second hand. But grammaticization is only used on a few relations – those pertaining to singular/plural, and past/present/future (tense).

No language grammaticizes more than a fraction of the possible relations and while tenses and singular/plural seem to be universal features of language there is no language with grammatical constructs for edible/inedible, friendly/hostile, etc, even though these things would be useful. It seems that we are obliged to grammaticize some things, yet other things cannot be grammaticized. While one might dismiss this as a mere convention of languages, conventions can be broken and these never are. We can expand the lexicon but not the grammar. The latter appears to be a black box; we can neither alter it nor explain it.

Syntax is highly complex, yet we can all master its subtleties. A sentence is constructed of phrases; each phrase is a hierarchical not linear entity. A sentence of 10 words can be re-arranged over 3 million ways, only one of which is correct – yet we can do it effortlessly. Without syntax, complex ideas could not be communicated.
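The figure of "over 3 million" can be checked with a line of arithmetic. As a minimal sketch, assuming the ten words are all distinct, the number of possible orderings is 10 factorial:

```python
import math

# Number of ways to order ten distinct words: 10 x 9 x 8 x ... x 1
orderings = math.factorial(10)
print(orderings)  # 3628800
```

If any word were repeated, the number of distinguishable orderings would be smaller, so this is an upper bound for a ten-word sentence.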

Chapter 4 The Origins of Representational Systems
Language must have evolved as a representational system, not for communications. How did this happen? Our senses give us a species-specific view of reality, only a subset of the data potentially available (e.g. little smell data, unlike dogs). This is the primary representation system, or PRS. All such systems arose from cells that could differentiate between two states, a distinction between sensory cells and motor cells, and motor cells capable of more than one behaviour type in response to a given stimulus. Humans alone have a secondary representation system (SRS) – language.

At the lowest level there are organisms like sea anemones that can identify the chemical signature of a hostile starfish and execute an escape manoeuvre. Next is a conditional response, such as a crayfish that becomes habituated to being touched and eventually does not waste energy on an escape manoeuvre, or a grub that only moves if light-levels increase above a certain threshold. The ability to evaluate data is more complex in – say – a lizard stalking a fly, where there is actual data processing by the brain leading to a choice of behaviours.

Vervet monkeys are genetically-programmed to respond to snakes. Similarly, if you touch something hot you move your hand away without thinking too hard. But such an approach has its limitations. Wildebeest do not always flee when they see a lion. If they did, they’d have less time to feed and they’d either exhaust themselves or starve. So they become alert – indeed they experience fear. But they don’t flee until threat assessment becomes critical. But fear – an emotion – is crucial to making a decision to flee.

Representations are either innate (metabolizing food, growing hair, producing sentences, etc) or learned (writing, sewing, swimming, etc). We are conscious of learned representations, but cannot access innate representations. But all representations lead to category formation – to form a category three things are needed: an object in the external world; patterns of cell activity in the observer’s brain directly or indirectly triggered by the object’s presence; and the observer’s response, both internal and external, to these patterns. Categories are species-specific.

For humans, categories are basically concepts; “concept” is simply the name we give a category. In non-human animals, categories might be referred to as proto-concepts. Which came first – language or concepts? Probably language originally labelled proto-concepts derived from pre-linguistic experience; this was later expanded to be capable of deriving concepts not present in the PRS, e.g. absence, golden mountains, etc. Just as the SRS can divide the universe exhaustively, the PRS must do the same, e.g. for frogs, everything is either a frog, a pond, a large looming object or something else not relevant to frogs.

Pigeons can develop quite sophisticated categories – they can be trained to peck certain classes of object, e.g. tree pictures. Such behaviour cannot be entirely innate as they can be trained to respond to objects they could have no knowledge of. But some categories – trees, humans, etc – probably are innate; probably categories of things that are significant to a particular species are innate, and the ability to use this processing power to analyse novel objects as well evolved subsequently. However provided the referents of particular proto-concepts remained relevant, these would be retained and new ones would be added over evolutionary time.

Categories/proto-concepts such as “tree”, “human” etc may be precursors of nouns. Some monkeys have temporal cortex cells that respond to movement of a primate-like figure – could these be proto-verbs? But these are agent-plus-actions rather than actions – human language does not conflate an entity and its behaviour into single words; subject-predicate distinction is fundamental as seen earlier.

Tiger running, tiger walking, tiger attacking could be broken down into “tiger” + action; however tiger running, dog running, insect running cannot so easily be broken down into X + “running” as the types of “running” differ, as opposed to only one type of tiger. This is why verbs are more abstract than nouns and are harder to represent. However if only a subset of a particular behaviour is considered, it can be restricted to species likely to perform it – for example only primates can “grab with hand”.

Proto-nouns might have represented species interacting with hominids. Proto-verbs might have been actions only hominids could perform. This implies awareness of conspecifics with which the creature interacts – in turn implying a social species. Awareness of self is a cornerstone of language and consciousness.

Chapter 5 The Fossils of Language
Ape language is basically very limited. Does it represent an earlier form of human language? It is comparable to that of a 2-yr old human. Bickerton then considers the possibility that “ontogeny recapitulates phylogeny”. “Genie” was a 13 year old girl imprisoned from birth and not exposed to language. After her rescue, she learned only ape/2 yr old-type language and could not be taught full language. Genie failed to acquire human language but acquired something else. Language therefore cannot be a unitary system requiring input during a critical period, or Genie would not have acquired any language at all. Genie acquired proto-language (a robust “mature technology”) but could not go further and acquire full language. The means of acquisition are not the same for both.

Pidgins are proto-languages. Numerous examples are known, for example among slaves in the West Indies, immigrants to Hawaii from 1880-1930, and Russian and Scandinavian sailors; their speakers nevertheless have normal linguistic skills.

Thus there are four classes of proto-language speakers: apes, under-2-yr-olds, adults deprived of language and pidgin-speakers. Proto-language differs from full language in five respects:
1. Language has word order constrained by general rules and formal structure; proto-language does not.
2. Language uses null elements in a consistent manner; proto-language does not (not well explained).
3. In language verbs have one, two or three arguments (like subprograms): sleep takes one (“Fred sleeps”), go takes two (“Fred goes to bed”) and give takes three (“Fred gives Bert five pounds”).
4. Proto-language cannot expand phrases (the man to the tall man, tall bald man, tall bald fat man, etc) or concatenate phrases (John wants books -> John wants books to study).
5. Proto-languages do not inflect.
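The comparison of verbs with subprograms can be made concrete with a small sketch. The table of verbs and the helper name `is_well_formed` are hypothetical, introduced only for illustration; the idea is simply that each verb, like a subprogram, expects a fixed number of arguments and an utterance is complete only when all of them are supplied.

```python
# Number of arguments each verb requires, following the examples in the text.
VERB_ARGUMENTS = {"sleep": 1, "go": 2, "give": 3}


def is_well_formed(verb, *arguments):
    """Return True if the verb has been given exactly the arguments it requires."""
    return len(arguments) == VERB_ARGUMENTS[verb]


print(is_well_formed("sleep", "Fred"))                      # "Fred sleeps"
print(is_well_formed("go", "Fred", "bed"))                  # "Fred goes to bed"
print(is_well_formed("give", "Fred", "Bert", "five pounds"))  # "Fred gives Bert five pounds"
print(is_well_formed("give", "Fred"))                       # an argument-dropping utterance
```

The last call is the kind of incomplete string that a checker of this sort would reject, in the same way that a subprogram call with missing parameters fails.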

Proto-language is not a blanket term for ungrammatical language (e.g. people with aphasia due to damaged Broca’s area). It does appear to be a distinct thing in itself. But how did we get from proto-language to full language?

Chapter 6 The World of Protolanguage
Relative brain size jumps at the Homo habilis-ergaster/erectus boundary. H. habilis is claimed to show enlarged Broca’s and Wernicke’s areas and was right-handed; first signs of lateralization. Proto-language could have begun with H. habilis (Bickerton’s hominid evolution diagrams represent the state of knowledge current in 1990). Tools and language are unlikely to have co-evolved. If H. habilis had proto-language then H. ergaster/erectus possibly had language – in which case why did the Acheulian tool tradition not evolve? It is likely that H. habilis did not have language and H. ergaster/erectus had only proto-language. Possibly this aided them in the use of fire, which seems well-attested. Bickerton rejects the theory of “gesture language”. If this were correct, infants would use sign-language (assuming ontogeny recapitulates phylogeny); language must have been vocal from the beginning. Sufficient cortical control was probably achieved by the time of Australopithecus afarensis – involuntary calling would have been maladaptive to a species that operated on the savannahs of east Africa. Vocal tract developed as the larynx lowered, increasing the risk of choking. Proto-language probably didn’t require the perfected human vocal tract, which would have been maladaptive if it had developed first. However changes to the vocal tract would have been favoured after proto-language developed. Original vocabulary was probably small – phonology may have developed in conjunction with syntax. A pre-phonological stage may exist in pre-syntactic children.

Reliance on sight in primates increased the area of the brain needed to process data; the resulting increase in PRS categories drove things on in the direction of language. Chimps have few enemies but savannah-dwelling hominids have many. Curiosity about surroundings, recognising and categorizing, was adaptive and selected for.

Few animals face the same set of problems as early humans as few are both social and omnivorous, with such varied feeding habits. Social herbivores move in herds, social carnivores hunt in packs and kill much larger animals; humans could not do this 2 million years ago. A band of foraging humans could split into small groups; those able to use proto-language could get the attention of others and lead them to finds too large for them alone.

There are three types of learning: Experiential learning (e.g. I am trying to escape from a tiger, I jump into a river and swim across and the tiger fails to follow. Next time I am chased by a tiger I’ll head for the river); Observational learning (I see a man escape from a tiger by swimming across a river and conclude this is the thing to do if I’m in the same position); and finally Constructive learning (I note that tigers will go round a body of water rather than swim across it. I conclude that tigers avoid water and if attacked by one this might offer an escape route).

Pretty well all animals are capable of learning from experience, and many can learn by observation such as the blue tits that began pecking their way into milk bottles in the UK in the 1970s. The majority of these birds undoubtedly learned the trick from watching their conspecifics.

But is any animal lacking language capable of constructive learning? For apes, it appears to be possible only in a limited fashion and all the elements involved need to be physically present. Anything involving absent individuals or classes of entity requires a form of representation beyond those available to non-humans.

There is nothing in the fossil record to suggest that H. ergaster/erectus had cognitive capabilities comparable to ours. It therefore seems likely that H. ergaster/erectus lacked syntax, hence had only proto-language and could not think as we do.

Chapter 7 From Protolanguage to Language
Proto-language can evolve to true language without an intermediate. There is no plausible intermediate between the two. A child moves rapidly from proto-language to full language, falling back on the former only when their lexicon does not have the necessary grammatical words. Pidgins develop into creoles – i.e. a true language arises from a proto-language. Creoles tend to have the same grammar regardless of the constituent languages suggesting a biological basis for it. In neither of these proto-to-full language transitions is an intermediate involved.

Modern human behaviour (whenever this did emerge) was probably linked to emergence of true language. Bickerton seems to support early emergence (i.e. AMH were behaviourally modern from Day 1) contra Klein (1999 etc), Mithen (1996), etc. The gap in the fossil record he attributes either to use of perishable materials while H. sapiens remained confined to Africa, or that it took time to develop the artefacts of modern humans despite always having the capacity to do so. This view, quite radical for 1990, is basically the position taken by McBrearty and Brooks (2000).

Cases for an intermediate language are not plausible because the intermediate would be as complex as the full-blown language. Nor could features of a true language be acquired piecemeal as they are all too interlinked. The only way a gradual process could have happened would have been if the structural principles were at hand, but the lexicon was still limited.

Proto-language probably acquired grammatical items – negator, wh-questions; auxiliaries (can, must, etc); time – earlier/later; location particles (often only one meaning on/in/at/to/from); possibly even pronouns.

The verb arguments are of three types (thematic roles): Agent, Patient and Goal – e.g. Bert (Agent) gave five pounds (Patient) to Fred (Goal). These roles are not given by nature but are high-level abstractions. They probably originated through millennia of day-to-day hominid routine, with Agent as the most important. These roles were probably not systematically expressed.

These two developments in proto-language could have facilitated the emergence of true language. Verbally expressing emotion may have come next, followed by use of proto-language to model internal states of others. But how did we get to language? Could one mutation have done it? Could one mutation have generated a) syntax, b) skull features and dimensions and c) the larynx positioning?

The author believes a possible explanation for a) is that visual-processing areas of the brain, rather than some central repository such as Broca’s area, could have been pressed into service to process syntax. This would explain why aphasia affects only grammaticization, not syntax (allegedly) [the role of the FOXP2 gene wasn’t discovered until 2001].

Chapter 8 Mind, Consciousness and Knowledge
Einstein’s claims notwithstanding, language is required for thinking. Much goes on beneath the level of conscious thought that the thinker is unaware of. Mind, consciousness and the search for knowledge may all arise from having a language-based SRS with a syntax processor.

Even if the human mind does derive from language, this does not tell us about the precise relationship between language and mind. It was once believed that a full understanding of language would serve as a “window on the mind”, but this implied that language permeated the mind at every level. This in turn implied that the mind was a single problem-solving mechanism, as has often been assumed by empiricists.

This view is seemingly at odds with the “modular mind” theory of Jerry Fodor, Howard Gardner, Annette Karmiloff-Smith etc. Despite the success of modularity theories, there is a problem. If modularity emerged after language there would not have been enough time for other modules, each with their own unique mechanism, to have evolved subsequently. [If Steven Mithen is correct, modularity considerably predates the emergence of fully-modern Homo sapiens (Mithen, 1996)].

Conversely if modularity emerged first and remained largely uninfluenced by the development of language it would only work if these were independent of language [which I believe is the accepted view] and language was not a representational system but merely a code for expressing the output [why?]. It would also predict human intellectual capabilities largely pre-existed language, which is clearly not the case [Mithen’s “cognitive fluidity” seems to be the answer here (Mithen, 1996)].

Bickerton’s resolution of this modularity versus window-on-the-mind problem is to suppose that syntax processing is not an isolated module but a particular type of nervous organization that permeates and interconnects those areas of the brain devoted to higher reasoning processes, concepts and the lexicon, a type of organization that automatically sorts material into binary-branching tree structures. Other modules will then receive and output material that has been pre-processed to conform to syntactic principles [this suggests a mechanism by which Mithen’s “cognitive fluidity” might work, though in fact Mithen is critical of Bickerton’s proto-language and believes utterances of early humans were holistic (Mithen, 2005)].

What is “I”? Am I the whole body or just mind or a homunculus? Human language divides entity from behaviour, so “I am hungry” suggests “I” is divided from being hungry. Is the central directing “homunculus” a product of language – nothing more than an illusion – or is it something more? The latter suggests the human organism is indeed divided in some way, and not necessarily the way language suggests it is. In other words, the brain is modular.

Experiments with left/right hemispheres have suggested that the right hemisphere has only a PRS and lacks syntax capability, but can do inference.

“I” cannot control the entire organism – it cannot control bodily functions, which carry on if I’m asleep or unconscious. There is an accessible I, linked to language, and an inaccessible I, not linked to language. This is better than the mind-body model. The talking I is a module that forms a part of the accessible I, though sometimes other modules grab the microphone. “I forced myself to do X” means “information in the SRS indicated that doing X would bring long-term benefit, despite the short-term appeal of doing Y”.

Chapter 9 The Nature of the Species
We are living in the fourth age of man [taken to be H. sapiens only]. In the first phase, from 200K years ago to 40K years ago, humans were hunter-gatherers confined to Africa. In the second phase, humans left Africa and “beat the Neanderthals”. The third phase, which began with the coming of agriculture at the end of the last ice age, introduced territorialism and inequality. [In fact early Neolithic societies, such as that at Catal Hoyuk in Anatolia, seem to have been fairly egalitarian, though there is no doubt that sedentism, which made it possible to accumulate possessions for the first time, led to the beginnings of social inequality. The notion that pre-agricultural man was not territorial seems highly dubious to me, considering that chimps are territorial.]

The Fourth Age began 400 years ago, when inequality between state-level societies emerged.

“Did Syntax Trigger the Human Revolution” (Bickerton, 2007) is a paper submitted as Bickerton’s contribution to a series of papers published after the 2005 Cambridge conference entitled “Rethinking the Human Revolution”, part of the on-going debate about the mode, tempo and timing of the emergence of modern human behaviour.

Bickerton rejects the notion of a “Great Leap forward” 50-30 kya in Europe; the evidence now suggests that features thought to be novel to Europe emerged in Africa earlier. Did characteristically human cognitive capacities (CHCC) emerge gradually over 200-300 ky? It is more plausible that the change occurred with the emergence of our own species. Bickerton considers and rejects the notion of studying tool sophistication because of the difficulty of agreeing what constitutes sophistication. There are also the assumptions that a gradual increase in tool complexity implies an increase in CHCC and that the moment a CHCC emerges it must result in artefact change. It is more plausible that when modern humans evolved, CHCCs emerged with them, but the novel artefacts only appeared later in response to selective pressure or cultural development. However it is equally implausible that these new CHCCs lay dormant for extended periods of time. An in-between position seems the most likely.

It is possible to assume that the Acheulian hand-axes required the maker to conceptualise the finished article [the accepted position] but makers could simply copy and possibly modify and improve upon existing axes. Bickerton believes the second possibility is more parsimonious, as there are objects intermediate between Oldowan and Acheulian, and between Acheulian and subsequent industries. On this picture, all human (inc. pre-sapiens) artefacts fall into two classes – those that are modifications of earlier artefacts and those that are completely new and would have to be imagined first (e.g. fish hooks, “Venus” figurines, etc).

What is required to create novel artefacts? Not necessarily bigger [or even more encephalised?] brains. Bickerton returns to his thesis of a proto-language developing around 2 mya. It is not, by itself, enough to produce novelties, though it would have increased social and foraging capacities. To sustain the trains of thought needed to produce novelties, something else is required. Bickerton distinguishes between “thought 1” (pre-linguistic thinking), “thought 2” (thinking with proto-language) and “thought 3” (thinking with full language). Only “thought 3” would permit a sustained train of thought.

Thought 1 could permit such thoughts as “that is a lion” (reacting to the sight of a lion) or “I am hungry” (feeling peckish) but not “hungry lions are dangerous” which would require the ability to instantiate the abstract class of “lion” at will rather than in response to actually seeing a lion. Australopithecines and present-day apes were/are probably restricted to thought 1.

Proto-language could have been holistic, like Steven Mithen’s “Hmmmmm” (Mithen, 2005) or – as per Bickerton’s position – comprised short, unstructured strings of single units (either oral or gestures) roughly corresponding to the individual words of present-day languages. Such a proto-language would have had a term for a lion corresponding to the abstract class “lion”. It would have enabled its possessors to think about things in their absence without triggering the responses (fight, flight, wait, etc) that a pre-linguistic signal might have occasioned.

Bickerton dismisses the long-running debate on the possibility of thought without language. If thought is mental computation, then anything with a brain is capable of thought. The issue is how to think at a level that creates novel behaviours or artefacts. Thinking is not conducted by words or images, as the brain does not contain words or images, only neural pathways. Proto-language units enable the creation of neural representations that allow thinking to take place without direct reference to external objects. These units could have been used for both language and thought. The former would have required an additional layer of mapping to a phonological representation for utterance and also a relationship between units – words and/or signs.

Language and proto-language both concatenate the units of which they are composed, but language does so in a highly-structured manner with embedded phrases and clauses whereas proto-language simply assembles words like beads on a string. Complex thought would have been impossible and proto-language speakers would have remained confined to thought 2.

The crucial difference is syntax. Bickerton believes that the same mechanism required to produce full language also enables the brain to marshal the complex trains of thought needed for innovation. Since the functionality is similar, it is more parsimonious to assume the existence of just one rather than two distinct mechanisms. Bickerton speculates that the need for more complex utterances might have led to the evolution of a syntax system, which served without further modification as an organizer of thought, and that its possessors are capable of thought 3. It is therefore likely that the development of syntax was a necessary and possibly sufficient pre-requisite for the emergence of modern human behaviour.

Bickerton D (1990): “Language and Species”, University of Chicago Press, USA.

Bickerton D (2007): “Did Syntax Trigger the Human Revolution?” in Rethinking the human revolution, McDonald Institute Monographs, University of Cambridge.

Fodor J (1983): “The Modularity of Mind”, MIT Press, Cambridge, MA.

Gardner H (1983): “Frames of Mind”, Basic Books.

Gardner H (1999): “Intelligence Reframed”, Basic Books.

Karmiloff-Smith A (1992): “Beyond Modularity”, MIT Press, Cambridge, MA.

Klein R (1999): “The Human Career” (2nd Edition), University of Chicago Press.

Mithen S (1996): “The Prehistory of the Mind”, Thames & Hudson.

Mithen S (2005): “The Singing Neanderthals”, Weidenfeld & Nicolson.

The quest for the Proto-Indo-European homeland.

The Quest Begins
In 1786, a 40-year-old English judge by the name of Sir William Jones made a remarkable observation suggesting far-reaching events many millennia previously; events that had left an eerie footprint in languages spoken by diverse people living thousands of miles apart, between which absolutely no link now existed in the historic record.

The son of a mathematician, Jones was a linguistic prodigy who had learned Greek, Latin, Persian and Arabic at an early age. Despite the death of his father when he was aged three, he went to school at Harrow and then on to Oxford. However, he was forced to work as a tutor in order to make ends meet and in 1770 he took up the legal profession, doubtless at least to some extent attracted by the prospect of financial security. After working as a circuit judge in Wales, he went out to India in 1783 to serve at the High Court in Calcutta. There he became interested in Sanskrit, the classical language of India, a language holding a position analogous to Latin in South and Southeast Asian culture. It dates to around 1500 BC and possibly earlier, and although it has not been spoken for many centuries, it is still used in religious texts and remains to this day one of India’s 23 official languages.

Jones noted that Sanskrit shared many similarities of both grammatical structure and vocabulary with Ancient Greek and Latin – all of which are “dead” languages that were current at roughly the same time. In a famous address to the Asiatic Society in Calcutta, he claimed that these similarities could not be dismissed as chance and suggested all might have arisen from the same source. Jones also speculated that Gothic (the precursor of German), Celtic and Old Persian might share the same common origin.

The idea that languages spoken in places as far apart as Iceland and India might be linked was startling to say the least, but the connection seemed real. Furthermore the language group – which was soon dubbed Indo-European – grew rapidly. In many cases, relationships among language groups were readily discernible and during the first half of the 19th Century, no fewer than nine major language groups were recognised as members. Thereafter progress was slower and it was not until the late 19th and early 20th Century that the last two major language groups were admitted to the fold. The Tocharian languages – once spoken in the Tarim Basin in Central Asia – were uncovered from a fifth century AD manuscript procured from a Buddhist monastery by the Hungarian-born archaeologist Sir Marc Aurel Stein. Hittite – another lost language – was deciphered by Bedrich Hrozny in 1917 from cuneiform tablets excavated in Anatolia some years earlier, and it and other related languages in the region are now recognised as making up the so-called Anatolian group.

Today, no fewer than eleven major Indo-European groups are recognised:

1) Celtic. Includes Welsh, Cornish, Manx; spoken in Britain, Ireland and across Europe, from the Bay of Biscay and the North Sea to the Black Sea and the Upper Balkan Peninsula; first attested 1000-500 BC.
2) Italic. Includes Latin, Italian, French, Spanish, spoken in Italy; first attested 1000-500 BC.
3) Germanic. Includes Danish, German, Dutch and English, spoken in Germany and Scandinavia; first attested AD 1-500.
4) Baltic. Includes Latvian and Lithuanian, spoken east and south-east of the Baltic.
5) Slavic. Includes Russian, Polish, Czech, Serbo-Croatian, spoken in Eastern Europe and the Balkans; first attested AD 500-1000.
6) Albanian. Spoken in Albania, Kosovo, Montenegro and Macedonia; first attested AD 1500.
7) Greek. First attested 1500-1000 BC.
8) Anatolian. Includes Hittite, spoken in Asia Minor; first attested 2000-1500 BC.
9) Armenian. First attested AD 1-500.
10) Indo-Aryan. Includes Sanskrit, Urdu, Hindi, Iranian, spoken in India and Iran, first attested 1500-1000 BC.
11) Tocharian. Spoken in the Tarim Basin, Central Asia; first attested AD 1-500.

In addition, ten minor groups are recognised: Lusitanian, Rhaetic, Venetic, South Picene, Messapic, Illyrian, Dacian, Thracian, Macedonian and Phrygian.

What does all this mean? During the course of his 1786 lectures, Sir William Jones also put forward the idea that the various ancient languages could all be traced back to “some central country” which he argued was Iran. He set in motion a debate that has continued ever since, during which the “central country” has been located at just about every point on Earth, leading the American scholar JP Mallory to comment “One does not ask ‘where is the Indo-European homeland?’ but rather ‘where do they put it now?’”.

That a group of languages – a so-called “language family” – can arise from a common origin was already accepted in Sir William Jones’ day. It had long been realised that French, Spanish, Portuguese, Italian, Romanian, etc had all diverged from Latin and from each other after the fall of the Roman Empire in the 5th Century AD. Following the same chain of reasoning, it was therefore logical for the early Indo-European scholars to assume that Sir William Jones was right and that similarities between the Indo-European languages could be explained by divergence from a single ancestral language; and that its original speakers had arisen from one region. This hypothetical language was termed Ursprache by German scholars, or Proto-Indo-European (often abbreviated to PIE). Its supposed speakers became known as the Urvolk and their original homeland the Urheimat.

But who were the inhabitants of this homeland, when did they live and how did languages descended from their ancestral tongue come to be spoken across such an enormous area, far greater in extent than the Roman Empire? It really is the ultimate “whodunit”.

On the face of it, given the absence of anything in the historic record, the task facing Indo-European scholars in trying to answer any of these questions might seem impossible. In fact in the rather more than two centuries since Jones’ discovery, three main lines of enquiry have opened up – linguistics, archaeology and population genetics. The linguists moved into action almost immediately but prehistoric archaeology did not really develop as a discipline until the mid-19th Century; and not for another fifty years was archaeological evidence systematically called upon in attempts to solve the problem. Population genetics is a far more recent discipline and did not enter the fray until the latter part of the 20th Century. Only now – with these three very different disciplines supported by statistics and modern computational methods – is a clear picture at last beginning to emerge.

How Languages spread and change
As linguistics was the first line of attack upon the Indo-European question, it makes sense to start with a review of the various linguistic and related methodologies that have been brought to bear on the problem since the 18th Century. Before tackling the Indo-European question in detail, however, we should first ask how a particular language comes to be spoken in a particular region.

Obviously when settlers move into a previously uninhabited region for the first time, they will bring their language with them. But the languages spoken by, for example, the people who moved into Britain and Northern Europe at the end of the last ice age have long since vanished. What happened to them? There are two things that can happen to a language once it has been introduced to a region – firstly it can be replaced by another language; secondly it can itself evolve.

Language replacement happens when the language spoken in a particular region is replaced by another brought in by people from a different region. There are a number of ways this can happen.

Subsistence/Demography occurs when large numbers of people move into a territory, bringing their language with them. The newcomers don’t have to be conquerors – the process usually refers to subsistence farmers moving into a territory previously inhabited only by hunter-gatherers. This is a process we shall return to in greater detail.

On the other hand, elite dominance occurs when invaders conquer a territory and impose their language on native peoples. It will clearly be to the advantage of the subjugated people to learn the language of their conquerors when doing business, pursuing legal and religious matters, etc, though initially they will continue to converse in their native language when with friends and family. Although the first generation will only speak the new language as a second language, the next is likely to be fully bilingual, having been exposed to both languages from birth, and the one after that will probably regard the old language as a second language, used mainly for conversing with their grandparents. Within a few generations the old language will die out altogether.

The classic example of elite dominance is the spread of Latin, which was originally confined to a small area around Rome. At that time, one would have had to have travelled no more than 40km north of Rome to find people speaking a different, albeit closely related language – Faliscan. Only a little further to the north, people were speaking Etruscan, a completely unrelated language. But as Rome’s power grew, Latin came to be spoken throughout the whole of Italy and eventually across large areas of Europe.

Another model of language replacement is system collapse, which occurs when an organised society collapses or at least retreats from the peripheries of its zone of influence and other groups move in to exploit the resulting power vacuum. This happened in Britain after the Romans left and the Angles, Saxons and Jutes moved in, bringing with them the West Germanic language from which in time English arose.

Finally there is lingua franca, where a trading language (pidgin) develops in a region as a result of trading or other activity by outsiders. Usually the pidgin is a simplified version of the outsider language. In time a creole – which is a brand-new language – may arise from the pidgin.

Language development as opposed to language replacement occurs over time because languages themselves are not static. There is nothing strange about this – in fact it would be strange if languages did not change with time. For this to be so, people would have to reproduce exactly the same sounds and idiom from one generation to the next, something contrary to human nature even if we isolated a population from social change, contact with other cultures, etc.

For a written language, change can easily be demonstrated by studying texts over a period of many centuries. English has undergone considerable change over the last millennium, as illustrated by these four versions of the 23rd Psalm:

Modern English
The Lord is my shepherd, I lack nothing.
In meadows of green grass he lets me lie.
To waters of repose he leads me.

Early Modern English
The Lord is my shepherd, I shall not want.
He maketh me to lie down in green pastures.
He leadeth me beside the still waters.

Middle English
Our Lord gouerneth me, and nothyng shal defailen to me.
In sted of pasture he sett me ther.
He norissed me upon water of fyllyng.

Old English
Drihten me raet, ne byth me nanes godes wan.
And he me geset on swyth good feohland.
And fedde me be waetera stathum.

Most of us will, if anything, be most familiar with the second version, which has a “biblical” feel to it but is in reality the way people spoke in Shakespeare’s day. But the third version is decidedly strange and the fourth might as well be written in a foreign language. Unlike Modern English, Old English was a fully-inflected language, and it also predates the enormous influx of so-called “loanwords” from French that occurred after 1066. It should be noted that the change is continuous – “Modern English”, “Early Modern English”, “Middle English” and “Old English” are no more than arbitrary points in the evolution of the English language. But how do these changes come about?

One way is by the “morphing” of words as one generation’s sloppy speech becomes the received version. As noted above, English was once an inflected language and nouns had case endings, but in time these fell into disuse – for example the plurals of “fox”, “tongue” and “book” were once “foxas”, “tungan” and “bec”. Some of these archaic forms do survive – for example ox/oxen, sheep/sheep, man/men, woman/women, and child/children.

Pronunciations also change with time – for example the silent “k” in words such as “knife”, “knee” and “knight” was once pronounced; the Old English forms of these words were “cnif”, “cne” and “cniht”. Similarly the “ch-” in words such as “chicken” and “cheese” was once pronounced as a hard “c-”, i.e. “cycen” and “cese”. A recent (trivial) example of pronunciation change is the planet Uranus, which before the advent of popular science broadcasting was usually pronounced with the stress on the second syllable – then about thirty years ago, somebody in authority must have become embarrassed and insisted on a change.

Another way a word can morph is by conflation with the indefinite article – for example “nickname” was once “eke name” – the now-archaic word “eke” meant “also”. But in time, people began to pronounce “an eke name” as “a neke name” from which the progress to “a nickname” was fairly straightforward. The reverse can also happen – oranges were once known as “noranges” (from Hindi) – but “a norange” eventually became “an orange”.

Not only do words morph, they can also acquire additional meanings, or change their meaning entirely. This process is known as semantic change. An obvious example is the word “gay”, which originally meant “carefree”. During the 20th Century the word gradually came to mean homosexual; but by the turn of the century it had acquired an additional pejorative meaning – to describe something as “gay” is to decry it. Other examples include “silly” (original meaning “glorious”), “villain” (a peasant farmer in feudal times) and “husband” (which originally meant manager of a house). The process continues – a future age might recall that “wicked” once meant “bad” and that “cool” meant the opposite of “warm”.

Not only do words morph and change their meaning; new words can be “borrowed” from other languages. As previously mentioned, English has borrowed heavily from French and such “loanwords” include “ability”, “finance”, “rendezvous” and “theatre”. But English also contains loanwords from other languages including “hinterland” (German); “bazaar” and “bungalow” (from Hindi); “alcohol”, “algebra” and “arsenal” (from Arabic).

Language change is as inevitable as death and taxation – only in a society of telepathically interconnected beings such as Star Trek’s Borg could things be any different.

But how can a group of languages arise from a common origin? As already noted, French, Spanish, Portuguese, Italian, Romanian, etc all diverged from Latin, but how does this happen? Why are the French, Spanish, Portuguese, Italians and Romanians not still speaking Latin, albeit a version that has changed since Roman times? The answer is that in a sense they are.

Just as a language is not fixed in time, so it also varies across regions. In reality there is no such thing as a language, only dialects. Old English, in fact, had differing dialects from Day One as the Angles, Saxons and Jutes who invaded Britain all spoke various dialects of West Germanic, all with their own peculiarities, resulting in at least three dialects of Old English – West Saxon, Northumbrian and Mercian. This state of affairs persisted over the centuries and what became known as “Standard” English was no more than the mixture of Essex and Middlesex dialects that happened to be spoken in London. But by the 1400s London had become the hub of the newly-established manuscript printing industry – so written English was more likely to be this version, which combined with London’s influence resulted in it becoming accepted as the “standard” version, although the other dialects were equally valid language systems. Similarly in France the Paris dialect came to predominate, with others being dismissed as “patois”.

Even in Roman times, different dialects of Latin were being spoken in different parts of the Empire. After the fall of Rome contacts between the various peoples reduced and the differences began to become more marked. Eventually the Latin dialects spoken in Italy, France, Spain, etc diverged to such an extent that they became distinct languages.

Reconstructing PIE
We are now in a position to consider what the linguistic evidence has to tell us about the Indo-European question. For over a century, scholars attempted to locate the Proto-Indo-European homeland on the basis of linguistic arguments alone, without being able to call upon supporting evidence from any other source. The picture that emerged was blurred and often contradictory, but bearing in mind that Proto-Indo-European was never written down, what has been achieved is remarkable.

By examining cognates – words in different languages with shared roots – it proved possible to reconstruct much of the lost Proto-Indo-European language and build up its proto-lexicon or vocabulary. This task really got underway after around 1850, but important groundwork was done earlier.

In the early 19th Century, linguists discovered a powerful principle: sound shift, where phonetic features in one language differ from those of another in a consistent way. For example in Latin the f sound in many words corresponds to the b sound in Teutonic languages, thus frater in Latin becomes brother in English and Bruder in German. This principle was first noticed by Rasmus Christian Rask in 1818. It was later extended by Jakob Grimm, the elder of the Brothers Grimm, and is (perhaps a little unfairly) usually known as Grimm’s Law. However, some scholars insist on referring to it as Rask’s-Grimm’s Rule. Also around this time it was noticed that there are structural similarities between the languages, with words displaying similar grammatical case endings.

However before reconstruction of PIE could properly begin, it was necessary to form a better understanding of the relationships between the various daughter languages. Two models were put forward. The first, proposed by August Schleicher in 1862, was the genealogical or “family tree” model of languages, which was influenced by Charles Darwin’s then recently-proposed Theory of Evolution. Languages with strong similarities, such as French and Italian, were grouped together; these groups were in turn linked to produce larger groups. It is assumed that languages give rise to daughter languages; thus for example Italo-Celtic split to give Celtic and Italic, and Italic then split to give Oscan, Umbrian and Latin. The model has a number of weaknesses. It assumes the different daughter languages remain isolated, whereas in practice this is not the case. For example English (a Germanic language) was heavily influenced by Medieval French and Latin (Italic). It also fails to explain similarities which cut across different language branches. These similarities are known as isoglosses. The best-known example of an isogloss is the so-called centum/satem division, named for the words for one hundred – centum in Latin and satem in Avestan (a liturgical Old Iranian language used to compose the sacred hymns and texts of the Zoroastrian Avesta).

A more realistic model was proposed by Johannes Schmidt in 1872. This was known as the wave hypothesis. On this picture, language changes spread out over a speech area like ripples on a pond. The main weakness of this model is that it assumes the languages under consideration are all being spoken at the same time, whereas some may be separated from others by thousands of years. Despite their drawbacks, no universally-accepted alternative to these two models has ever been proposed.

Nevertheless linguists were in a position to commence the task of reconstructing PIE. For example the word “sheep” is avis (Lithuanian), ovis (Latin), ois (Greek), oveja (Spanish) and ewe (English). The PIE word is believed to have been *owis, the asterisk denoting a reconstructed word. Other reconstructed words include:

*mehter mother, *phator father, *swesor sister, *bhrater brother, *dhughater daughter, *suhnus son.

*owis sheep, *tauros bull, *gwous cow, *uksen ox, *porkos pig, *ekwos horse, *kapros goat, *mus mouse, *kwon dog.

*oinos one, *dwo two, *treyes three.

One thing that stands out about these words is their familiarity – even if their meaning isn’t immediately obvious it doesn’t take much working out. For example, ewe from *owis and hound from *kwon.

Much of the work of reconstruction was completed in the 19th Century, though refinement has continued ever since. New information has been incorporated as lost languages such as Hittite and Tocharian have come to light. Not all reconstructed words are regarded as equally secure. Ideally a reconstructed word should have a shared correspondence between a European language and a non-adjacent Asian one, but this is not always possible.

Having built up a picture of how the various Indo-European languages are related to each other and reconstructed some of the original language itself, can we make any inferences about the Proto-Indo-Europeans and their homeland? In fact a number of methodologies have been used with varying degrees of success to try and tease clues out of the linguistic data.

When was PIE spoken?
The most obvious question to tackle first is when was PIE spoken? Up until now we have assumed that it is a prehistoric language, purely on the basis of having no historic record of when and where it was spoken. We can set an upper limit by looking at when the various written Indo-European languages are first attested, and must therefore by that time have diverged from PIE. Just as Latin fell out of everyday use as French, Spanish, Italian, etc diverged from it, so we can assume that by the time the earliest Indo-European languages had diverged from PIE, PIE itself was no longer being spoken.

The three earliest are Anatolian, at c.2000 BC; Indo-Aryan, inferred from a treaty between the Mittani of Northern Mesopotamia and the Hittites dating to around 1400 BC; and Greek, which goes back to at least 1300 BC and probably rather earlier. Mycenaean Greek is attested by the Bronze Age Linear B tablets excavated in 1900 by Sir Arthur Evans at Knossos in Crete, and deciphered by Michael Ventris in 1952. These three groups are sufficiently different from each other to suggest that they had all been going their separate ways well before 2000 BC – but how much earlier?

Linguistic Palaeontology
In the mid-19th Century an approach was set out known as linguistic palaeontology, named from an analogy with palaeontology – the study of the development of life on Earth based on the fossil record.

The basic assumption is that if a PIE word exists for something, then the Proto-Indo-Europeans must have been familiar with it, and inferences can therefore be made about their material culture, social organisation and the geography of their homeland.

PIE contains many words for domestic animals such as sheep, cattle, goats, pigs and horses (though it is uncertain whether these were domesticated or wild); but there are fewer words pertaining to agriculture. Words exist for wheels, axles and wheeled vehicles.

This led some to suppose that the Proto-Indo-Europeans were pastoralists (tending flocks of animals) rather than agriculturalists. On the basis of this argument many homelands were proposed during the 19th Century, with Central Asia and Northern Europe among the favourites. In 1890 Otto Schrader proposed the South Russian steppe, from the Carpathians to Central Asia. Nomadic pastoralism has been practiced in this region since the time of the Scythians, who were a nation of nomadic pastoralists described by Herodotus around 440 BC. There was certainly no reason for Schrader not to suppose that the region had supported nomadic pastoralists since prehistoric times.

We now know that animal domestication and agriculture first appeared at around 8000 BC in the Near East, spreading gradually to southern Europe before moving both north and west and reaching the northern and western peripheries of Europe by around 4000 BC; the horse was first domesticated at around 4000 BC. The wheel was invented no earlier than 4000 BC.

So – if we accept these conclusions – we get a date for PIE that is no earlier than 4000 BC. But how safe is it to do so? Can we be sure that the Proto-Indo-Europeans had wheeled vehicles? The answer is “no”. In 1969 Calvert Watkins suggested that terms pertaining to wheeled vehicles were chiefly metaphorical extensions of older IE words with different meaning. For example *nobh- (wheel-hub) meant “navel” and the word for wheel itself, *kwekwlo- is derived from the root *kwel- “turn, revolve”. Another possibility is widespread borrowing of the word for wheel. Because the wheel was such a useful invention, the words pertaining to wheeled vehicles spread along with the things themselves. Subsequent sound-shifts in the borrowing languages would create the illusion that borrowed words were part of the proto-lexicon. Only if we reject these possibilities can we trust a date arrived at through linguistic palaeontology.

Reconstructed words for kin are a fertile ground for inferences about the social systems of the Proto-Indo-Europeans. The systems by which people organize their kin vary across the world and a number of kinship systems are recognised by anthropologists, typically named for the ethnic groups among which they were first studied. A loose correlation has been found between kinship terminology and social and family organization.

The system with which most English-speaking people are familiar has separate words for each member of the nuclear family – “father”, “mother”, “brother”, “sister” – none of which are used for anybody who isn’t a member, with different terms being used for “aunt”, “uncle” and “cousin”. (I am ignoring here the colloquial use of the terms “aunt” and “uncle” within a family to refer to unrelated family friends.) We tend to take this system – which is actually termed the Eskimo system – so for granted that we don’t really think of it as a “system” at all, much less that other systems are possible.

In fact it is just one of many kinship systems. Some lump together fathers and uncles, and mothers and aunts. Others extend the definition of brothers and sisters to include male and female cousins. The Omaha system, practiced by the Native American Omaha tribe (and also the Dani tribe of Papua New Guinea and the Igbo of Nigeria) combines nephews and grandsons. The Omaha system is also associated with a strong patrilineal social organization, i.e. descent through the father’s line.

The PIE word *nepots actually means “grandson”. Less secure is that it also means “nephew” (which might have been a later meaning) but if so, it is possible that the Proto-Indo-European kinship system was of the Omaha type.

There is a cognate word for “king” in many Indo-European languages – e.g. Sanskrit raj, Latin rex and Old Irish ri. Some have taken this to imply that the Proto-Indo-Europeans were ruled by a king, implying a complex stratified society. (It has even been suggested that the absence of a word for “king” in some Indo-European languages is evidence for some kind of prehistoric revolution in which the king was driven out and the word was forgotten!)

In fact we need look no further than English to see that the whole notion of Proto-Indo-European kingship is highly suspect. The word “king” comes from the Old English cyning – the true cognate in English is “ruler”. The verb “to rule” can indeed mean to reign, but it is also possible to rule on other matters – a point of law, or even whether or not a goal scored in a football match is offside. Finally it is possible to rule a straight line. The correspondence between straight lines and rules can be seen in the expression “to keep on the straight and narrow” and this correspondence is also found in other Indo-European languages. Rather than a king, the reconstructed word *reg might have referred to a tribal head, or simply an arbiter of right and wrong.

Linguistic palaeontology has also been used in attempts to locate the Proto-Indo-European homeland itself, this time by considering words for geographical features. PIE words exist for hills, mountains and swift-running rivers, leading some to suppose that the homeland was mountainous – Armenia has been suggested. But one need not actually live in mountainous terrain to be familiar with mountains. Few candidate homelands are so far away from any mountainous terrain that their inhabitants could dispense with words to describe it, and such inferences are questionable. PIE words for hot, cold, snow and ice, suggest a seasonally-varying (i.e. temperate) climate, but this really only rules out a homeland in the tropics.

Similarly attempts have been made to equate words for flora and fauna to the distributions of these. For example much effort has focussed on the beech tree and the salmon. Unfortunately we cannot be sure that the reconstructed word for the beech tree actually referred to it and not something else, such as the elder, oak or elm. Similarly with the salmon – did the PIE word refer to the Atlantic salmon or the salmon trout? The distributions of these species differ.

Later Linguistic approaches
Other linguistic methods have been brought into play in the quest for the Urheimat but they tend to produce results that can – to be candid – support more or less any conclusion desired. One such method is to consider the relationship between the Indo-European languages and those of other language families on the basis that the one showing the strongest affinities might serve as a pointer to the location of the Urheimat. In fact loanwords and grammatical loans have been discerned between Indo-European and all its neighbours – Uralic, Afro-Asiatic and Kartvelian; these have been used to support homelands set respectively in the Eurasian steppes, Anatolia and central Asia.

Even approaches that produce a definite conclusion are frequently contradicted by methods producing another. Cladistic correlation assumes that the family tree of the Indo-European languages corresponds to the geographical relationships between the various languages and that the first group to diverge from PIE will have a geographic seat in or close to the homeland. It is generally accepted that the earliest known split is that of Anatolian, suggesting a location for the homeland in or close to Anatolia.

The conservation principle assumes that a language that has not moved will have undergone less change than one that has, owing to the impact of what are known as substrate languages. A substrate language is one that is supplanted by a second one, but exerts an influence on the new language, e.g. through loanwords, with the consequence that the latter undergoes change. In the case of the Proto-Indo-Europeans, it is assumed that Indo-European languages spoken further away from the homeland will have experienced more change than those close to it, and those spoken in or near the homeland will be the least changed of all.

The Baltic languages, particularly Lithuanian, turn out to be strongly conservative. Lithuanian was once spoken over a far wider area than present-day Lithuania, extending into Russia. So either a Baltic or Russian homeland is suggested – rather at odds with the result obtained from cladistic correlation.

But neither approach is without its faults. The assumption that family-tree relationships can be equated to geographical locations is dubious. For example Indo-Iranian appears close to Greek and Armenian, but no obvious geographical relationship can be discerned. The conservation principle is also flawed in that the various languages entered the historical record at different times and were current at different times, so a comparison across the full range of Indo-European languages cannot be done on a level playing field.

Returning now to the matter of when PIE was spoken, another method that has been used to seek a time-depth for it is glottochronology. First proposed by the American linguist Morris Swadesh in the mid-20th Century, it assumes that the core vocabulary of any language is lost at a consistent rate and can therefore be used as a “linguistic clock”. Swadesh used a core vocabulary of 200 words, later reduced to 100. By determining what fraction of the core vocabulary is cognate between two languages, an estimate can be made as to when they diverged from a common ancestor. Before the technique could be used, it was first necessary to determine the speed at which the “clock” runs – a figure for the rate of word loss. This was achieved by comparing pairs of languages where the date of divergence was known, and a figure of 14% per thousand years was obtained.
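
The arithmetic behind the “linguistic clock” can be illustrated with a short sketch. If each of two languages independently retains a fraction r of its core vocabulary per thousand years, the fraction c still shared after t millennia is r to the power 2t, which gives t = ln c / (2 ln r). The Python below is only an illustration under that assumption: the function name is invented here, and the retention rate of 0.86 simply restates the 14% loss figure quoted above.

```python
import math


def divergence_time_millennia(shared_fraction, retention_per_millennium=0.86):
    """Estimate the time since two languages split, in thousands of years.

    Assumes (as glottochronology does) that each language independently keeps
    a constant fraction of its core vocabulary per millennium, so the shared
    fraction after t millennia is retention ** (2 * t).
    """
    if not 0.0 < shared_fraction <= 1.0:
        raise ValueError("shared_fraction must be in (0, 1]")
    if not 0.0 < retention_per_millennium < 1.0:
        raise ValueError("retention_per_millennium must be in (0, 1)")
    return math.log(shared_fraction) / (2.0 * math.log(retention_per_millennium))


# Two languages sharing about 74% of the core list (0.86 squared) come out
# as having diverged roughly one thousand years ago.
print(round(divergence_time_millennia(0.86 ** 2), 2))
```

The result is only as good as the assumed retention rate: a different rate gives a different date for the same pair of word lists.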

Critics of glottochronology point out that there is no reason to suppose that languages do lose words at a consistent rate; indeed every reason to suppose that the reverse is true as social factors change. Nevertheless when applied to various European languages glottochronology gives results that are in reasonable agreement with accepted dates; and when applied to PIE, the technique has tended to give time-depths of no earlier than 4000 BC – consistent with the findings of linguistic palaeontology. Does this mean that the Proto-Indo-Europeans had the wheel after all?

Not necessarily. In 2003 a study by Russell Gray and Quentin Atkinson using Bayesian inference gave a rather earlier date of 7000 BC, though with a secondary burst of linguistic expansion around 4000 BC. Bayesian inference – named for the 18th Century mathematician Rev. Thomas Bayes – is a powerful but computationally intensive statistical method that has been brought to the fore by the increased “number-crunching” abilities of modern computers.

We shall return to this interesting conclusion later.

Religion and Mythology
Another field of study that has long attracted Indo-European scholars is the religion of the Proto-Indo-Europeans. Linguistic reconstructions do not produce many correspondences, although the word for “god” is widely attested: devas (Sanskrit), deus (Latin), dievas (Lithuanian) and dia (Old Irish). The reconstructed word is *deiwos. Rather more striking is the word *dyeus phater (sky father), better known to anybody familiar with Greek or Roman mythology as Zeus (Greek) or Jupiter (Roman). While there is an obvious temptation to assume, therefore, that the chief deity of the Proto-Indo-European pantheon was a brash thunderbolt-hurling alpha male, we cannot be certain that this was the case, as he seems less prominent in other religions and it has been suggested that his pre-eminence in Mediterranean religions was a later phenomenon involving his conflation with local weather deities.

Comparative mythology is another area of interest. The French scholar Georges Dumézil was particularly active in this field, developing the notion of a ranked tripartite caste system underlying many Indo-European societies: priests at the top, then warriors and finally herder-cultivators. Thus in Vedic India there are the brahmanas (priests), ksatriyas (warriors) and vaisyas (herder-cultivators); in ancient Gaul these three were druids, equites (horsemen) and plebes. Each caste has its own gods – in Roman mythology there was the ruling god (Jupiter), the god of war (Mars) and the god of the people (Quirinus).

Does this tripartite structure suggest a common origin in an earlier Proto-Indo-European institution? It is very plausible that the same phenomenon that spread Indo-European languages could also have spread institutions, customs, beliefs and legends – but only if a certain level of social complexity for Proto-Indo-European society is assumed.

Archaeology joins in
The above considerations have given us tantalizing but tentative insights into the possible worlds of the Proto-Indo-Europeans. The picture is very blurred and most methodologies come with health warnings. More is needed to bring things into sharper focus, and clearly linguistic inferences can only be taken so far. If a convincing solution to the Proto-Indo-European problem is to be found, then evidence from other sources must also be considered.

The need for such an approach was recognised by the end of the 19th Century, and the first use of a methodology that included archaeological considerations was made by Gustaf Kossinna in 1902. He identified the Proto-Indo-Europeans with the Corded Ware culture, a widespread culture that flourished across northern Europe between the Late Neolithic and Early Bronze Ages, developing in various areas from 3200 BC to 2300 BC, and which was named for the characteristic decoration of its pottery by means of impressions of fibre cord. Kossinna placed the homeland in North Germany and envisaged the Proto-Indo-Europeans expanding towards Iran and India, carrying their language eastwards. Kossinna was the first to equate pottery styles to specific peoples and their movements, a methodology that is still current.

Kossinna’s work was followed up by Sydney-born Vere Gordon Childe, a philologist by training, who rejected a career in politics because of his interest in archaeology. Childe coined the term “Neolithic Revolution” to describe the coming of agriculture and “Urban Revolution” to describe the subsequent transformation of agricultural villages into complex societies and he is considered to be one of the most influential figures of 20th Century archaeology.

In 1926 Childe published The Aryans: a study of Indo-European origins in which he surveyed the various archaeological cases for the homeland being located in Asia, Central Europe, North Europe and the South Russian steppe and, following Schrader, came down in favour of the last of these. Childe equated the Proto-Indo-Europeans to the Kurgan culture, which embraces a series of cultures that occupied the steppe and forest-steppe of southern Ukraine and southern Russia, possibly originating in the Volga-Ural region. The word kurgan comes from the Russian word for their trademark barrows or burial mounds. Childe reversed the direction of Kossinna’s migrations, and had Corded Ware people moving westwards from the steppes of Russia rather than eastwards as Kossinna had proposed.

The word Aryan came to be applied to the Proto-Indo-European people during the 19th Century, though there is no evidence to suppose they applied the term to themselves. The word comes from the Sanskrit word arya, which means “noble”, “free”, “spiritual” or “skilful”. The name Iran literally means “Land of the Aryans”. Unfortunately the word Aryan is now so indelibly associated with the Nazis that post-war scholars have tended to avoid the term, and Childe – who was a committed socialist – later repudiated his work.

In the second half of the last century what are now regarded as the two main competing theories were both put forward. These are the Kurgan hypothesis, proposed by Lithuanian émigré Marija Gimbutas in a series of papers between 1956 and 1979; and the Anatolian hypothesis, set forward in detail by Colin Renfrew, Professor of Archaeology at Cambridge University, in 1987.

The Kurgan hypothesis
Marija Gimbutas’ Kurgan hypothesis followed Childe in locating the homeland on the South Russian/Pontic-Caspian steppe and like him identified the Proto-Indo-Europeans with the Kurgan tradition. Drawing on both linguistic and archaeological evidence, Gimbutas envisaged the Kurgan people as a warlike male-dominated society, worshipping masculine sky-gods. They were a highly mobile society of nomadic pastoralists, who used ox-drawn wagons and horses for transport. Only a few permanent settlements have been found – as could be expected for mobile people – and they are known mainly from their mortuary practices whereby the dead were interred in earthen or stone chambers, above which a burial mound was frequently erected.

By contrast the people of Neolithic Europe – or “Old Europe” to use Gimbutas’ term – were settled farmers, living in small family-based communities. Gimbutas characterised them as peaceful, matriarchal and possessing a mother goddess-centred religion.

Between 4000 and 2500 BC the Kurgan people expanded from the steppes in a series of hostile invasions, moving into Europe, the Caucasus and Anatolia and onwards towards India; and eastwards along the steppe into Central Asia. The archaeological record shows that Old Europe’s female-centric culture disappears and is replaced by that of Kurgan warriors. Fine ceramics and painted wares give way to cruder Kurgan material. Kurgan burials appear, generally confined to males and accompanied by arrows, spears, knives and horse-headed sceptres. There is evidence of suttee, an atrocious practice whereby women were killed on the deaths of their husbands, which points clearly to a male-dominated society. Stone stelae are seen in the Alpine region depicting horses, wagons, axes, spears and daggers – all of which are valued by a warlike society.

Thus Gimbutas claimed the Kurgan people brought about the collapse of the south-eastern European Neolithic culture and absorbed it into hybrid Kurgan societies. These “kurganised” societies then moved north and westwards, eventually leading to the Corded Ware culture in northern Europe. Similar evidence is seen in the south Caucasus and Anatolia; and to the east in southern Siberia, from which the Iranians are derived.

Although widely accepted, the Kurgan Hypothesis has its critics. Many reject the exclusively military nature of the expansion and believe more complex factors were involved. One of these is the so-called “secondary products revolution”. In 1981 the late Andrew Sherratt noted that late in the European Neolithic there was an increased exploitation of such products as milk, cheese and wool, and of animals for traction. Many of these new features – such as plough agriculture and increased stockbreeding – would have enhanced the male role in the productive economy. This may have brought about the social changes that Gimbutas attributed to invaders.

(See the article The Peopling of Europe for an account of the possible demographics of Neolithic Europe.)

The Anatolian Hypothesis
In 1987 Colin Renfrew put forward an entirely different model. According to Renfrew, the Indo-European languages were spread by Neolithic farmers, who originated in Anatolia at around 7000 BC, a date far earlier than that proposed for the Kurgan Hypothesis. In Archaeology and Language Renfrew summarised and then rejected all attempts to date to solve the Proto-Indo-European problem. He made three major criticisms of the Kurgan Hypothesis.

Renfrew’s first target was linguistic palaeontology and “the lure of the proto-lexicon”. We have already seen that this approach has its pitfalls. In addition to some of the points already noted above, Renfrew criticised the inference that the Proto-Indo-Europeans must have been nomadic pastoralists on the basis that the proto-lexicon contains more words for animal species than it does for plants. He pointed out that pastoralists are in fact dependent upon their co-existence with farmers. If the Proto-Indo-Europeans were familiar with domesticated sheep, goats or cattle, they must have also been familiar with wheat, barley and peas regardless of whether we have been able to reconstruct words for these species. The argument that the absence of these words implies the Proto-Indo-Europeans were pastoralists therefore collapses.

Renfrew then went on to challenge the assumption that the appearance in a region of a new pottery style such as Corded Ware or Bell Beakers, or of new mortuary practices such as the kurgans, is evidence of migrations by corresponding groups of people. Kossinna, Childe, Gimbutas and others sought to explain cultural changes in terms of repeated waves of invasions, a viewpoint that was widely held by archaeologists during the first half of the last century. Thus the Beaker culture, an archaeological culture current in Western Europe between c.2800-1900 BC, was seen by Childe as “warlike invaders imbued with domineering habits and an appreciation of metal weapons and ornaments which inspired them to impose sufficient political unity on their new domain for some economic unification to follow”.

In fact, by the 1980s this “migrationist” view was becoming unfashionable. Renfrew saw the appearance of these objects as the result of cultural diffusion whereby ideas, cultural traits, material objects etc were spread from one local community to another independently of mass migrations, a model known as “peer polity interaction”. On this model, the characteristic pottery styles were simply spread either by trade or the development of the appropriate manufacturing skills rather than by hostile invaders.

Renfrew’s final criticism of the Kurgan hypothesis was that insufficient attention had been paid to the dynamics of the supposed expansion. Why would it take place at all? He questioned the whole notion of the homeland people as pastoral nomads. He argued that nomadic pastoralism normally develops from mixed farming and herding, which would have been practised in Central and Western Europe. Transhumance – where cattle are moved from the village to summer pastures – probably developed during the “secondary products revolution” mentioned earlier. Nomadic pastoralism in southern Russia was probably an adaptation to the steppes of older European transhumance. The western steppes must have been colonised from the west, and their language must also have been the language of farmers living to the west, and not vice versa as suggested by the Kurgan hypothesis.

Having rejected the Kurgan Hypothesis, Renfrew drew on the studies by Italian geneticist Luigi Luca Cavalli-Sforza and his collaborator, American archaeologist Albert Ammerman. In papers published in 1973 and 1984, Ammerman and Cavalli-Sforza claimed that Neolithic farmers had expanded across Europe in a slow continuous “wave of advance”, with farmers spreading out into previously-unfarmed regions as population pressures grew; with further expansion occurring as these regions in turn filled up, and so on. In support of this claim they put forward evidence based on the protein products of genes that showed a genetic gradient that spread across Europe in a south-east to north-west direction.

Renfrew proposed that the Proto-Indo-European expansion had begun 9,500 years ago in Anatolia, far earlier than had been proposed up until then. From Anatolia, the expansion had moved into Greece and from there in a north-westerly direction across Europe. He also offered a choice of two hypotheses as to the spread of the Indo-Iranian languages, which he referred to as Hypotheses A and B. Hypothesis A proposed a wave of advance similar to that proposed for Europe. Hypothesis B on the other hand invoked the steppe-invader model. Once the wave of advance reached the steppe and nomadic pastoralism developed, the pastoralists moved swiftly east across the steppes and into Iran and northern India, possibly taking advantage of (though not bringing about) the collapse of the Indus Valley civilization, which flourished between 3000 and 1800 BC.

Renfrew’s theory attracted a lot of interest, but it was also criticized, largely because it seemed at times to fly in the face of linguistic evidence. But as we have seen much of this evidence is suspect. Archaeology can identify pottery styles and mortuary traditions – but again, some of the inferences that have in the past been drawn from these are questionable.

Linguistics and archaeology have led us to a choice of two theories, but which – if either – is correct?

The emerging synthesis
Colin Renfrew coined the term Archaeogenetics, which refers to the application of population genetics to the study of the human past. Techniques include the analysis of ancient DNA recovered from archaeological remains; the analysis of DNA from modern humans and domestic animals and plants in order to study migrations and the spread of farming practices; and the application of statistical methods to this data.

It is now necessary to give a very brief introduction to the science of genetics. The human body is comprised of cells, most of which contain a nucleus which holds two copies of what is known as the human genome. The human genome is a collection of genes and it is basically a set of instructions for making a complete human being, though the various types of cells generally implement only a few of those instructions depending on their function. One of our two genome copies comes from our mother and the other from our father.

Although the basic structure of the genome is identical for all human beings, the actual genes themselves can differ because a particular gene can exist in a number of different forms. Such genes are said to be polymorphic and each “version” is known as an allele. Different alleles are responsible for differences in such characteristics as blood groups, hair colour and eye colour.

It is these genetic polymorphisms that are the basis of population genetics, which dates back to World War I. Studies of blood groups carried out on soldiers and POWs showed that the proportions of individuals belonging to various blood groups depended on ethnicity. At that time only the classic ABO blood group polymorphism was known, though many others soon followed. O is the commonest type, but its frequency varies considerably from 61% in East Asia and 65% among Europeans to 98% among Native Americans.

In the 1960s Luigi Luca Cavalli-Sforza and the British statistician Anthony Edwards began applying a statistical method known as principal component analysis to raw data compiled over several decades. Cavalli-Sforza later collaborated with Albert Ammerman to back up his “wave of advance” model with genetic data. He did in fact make his data available to Colin Renfrew but Renfrew felt at that time that genetic data based on blood groups could lead to misleading interpretations and chose not to use it.

At that time the study of DNA itself as opposed to its products (such as blood proteins) was still in its infancy. However in 1995 Bryan Sykes, Professor of Human Genetics at Oxford University, presented the results of his studies on mitochondrial DNA. Mitochondria are structures that exist in every cell and help cells to produce energy by production of a high-energy molecule known as ATP. Mitochondria contain their own DNA, a rather surprising state of affairs that suggests they were once free-living bacteria that took up residence in more advanced cells, initially as parasites but later in a mutually-beneficial relationship that has endured to the present day. In human sperm cells the mitochondria are located in the whiplash tail that is shed when the sperm penetrates and fertilizes an egg cell. The latter however retains its mitochondria; thus all mitochondrial DNA is passed through the maternal line and is said to be non-recombining, unlike nuclear DNA which is, as we have seen, an admixture of maternal and paternal components.

Sykes’ results appeared to show that only around twenty percent of modern Europeans could trace their ancestry back to the early Neolithic farmers. The immediate assumption was therefore that Renfrew was wrong. Nobody disputed that farming had spread gradually across Europe: the archaeological evidence was incontrovertible. But had farmers spread? Or had the idea of farming simply spread as Mesolithic hunter-gatherers gradually took up agriculture? In that case Renfrew’s theory was seriously flawed: while it seemed highly plausible that hunter-gatherers could learn a new way of life from the farmers, it hardly seemed likely that they would choose to speak the farmers’ language in preference to their own.

In fact it is only a problem if one assumes that there was no intermarriage between the Mesolithic hunter-gatherers and the incoming Neolithic farmers. Even a small number of such “mixed marriages” would gradually dilute the Neolithic genes with those of the Mesolithic hunter-gatherers. On this picture the observed “genetic gradient” is exactly what one would expect. But while the Neolithic genetic signal would gradually weaken, the linguistic signal would not. Anybody marrying into the farming community would have to learn the farmers’ language and their children would certainly come to speak it as their first language. It is easy for somebody to be of mixed-race; rather less so to speak half a language.
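
The dilution argument can be made concrete with a toy calculation. Suppose, purely for illustration, that the farming frontier advances in discrete steps and that at each step a fixed fraction of the new community’s ancestry comes from local hunter-gatherers who marry in. The function name and the 10% figure below are assumptions made for this sketch, not values taken from the genetic studies.

```python
def neolithic_ancestry(steps, intermarriage_fraction=0.10):
    """Return the fraction of original farmer ancestry at each step of the advance.

    Toy model: at every step, intermarriage_fraction of the gene pool is
    contributed by local hunter-gatherers, so farmer ancestry shrinks by a
    factor of (1 - intermarriage_fraction) each time.
    """
    if steps < 0:
        raise ValueError("steps must be non-negative")
    if not 0.0 <= intermarriage_fraction <= 1.0:
        raise ValueError("intermarriage_fraction must be in [0, 1]")
    return [(1.0 - intermarriage_fraction) ** n for n in range(steps + 1)]


# Print the declining farmer share from the starting point (step 0) outwards.
for step, fraction in enumerate(neolithic_ancestry(15)):
    print(f"step {step:2d}: {fraction:.0%} farmer ancestry")
```

With these assumed numbers the farmer share falls to roughly a fifth after fifteen steps, producing a gradient along the route of the advance even though, on the argument above, the language would be passed on intact at every step.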

Does this mean that Renfrew is right and Gimbutas is wrong? Or could they both be right? Cavalli-Sforza thinks so. He believes that the original Anatolian farmers spoke an early form of Proto-Indo-European, which he describes as Pre-PIE. The expansion occurred as Renfrew describes, eventually reaching the South Russian steppe. So Gimbutas’ Kurgan people were speaking a later version of Proto-Indo-European when they began their series of expansions from the steppe, which was, if you like, a “secondary urheimat”. This is in fact entirely consistent with the second of the two hypotheses presented by Renfrew to explain the spread of the Indo-Iranian languages.

Further support for this view comes from the Gray and Atkinson study which dated PIE (or Pre-PIE) to 7000 BC (when the Neolithic expansion began) with a secondary burst at 4000 BC (when the Kurgan expansion began).

Is this the solution to the Indo-European problem; is the 220-year quest for the urheimat finally at an end? And if so, why did it take so long to come up with the answer?

Obviously the technology to investigate DNA and archaeological techniques such as radiocarbon dating did not exist in 1786, but that is not the whole picture. Part of the problem may have been a tendency to look for a monocausal explanation analogous to the rise and fall of the Roman Empire. In fact it was obvious even in Sir William Jones’ lifetime that this was not so because the Indo-European expansion was continuing, having begun a new phase after 1492. Christopher Columbus and his crew were certainly not the first Indo-European speakers to reach the Americas but they set in motion a process which eventually resulted in the linguistic domination of the New World by three Indo-European languages – Spanish, Portuguese and English. The relatively short time since the voyages of Columbus has seen the completion of a process that began shortly after the end of the last Ice Age.

If Luigi Luca Cavalli-Sforza is correct then the Indo-European expansion actually happened in three phases, millennia apart, in which technology and social conditions were totally different. Under such circumstances, seeking one all-encompassing explanation is clearly futile.

But it’s a big “if”. There have been many twists and turns in the lengthy quest for the Proto-Indo-European homeland, and it would be premature to suggest that the saga is definitely at an end.


Bellwood, P. & Renfrew, C. (eds.) 2002: Examining the farming/language dispersal hypothesis, McDonald Institute, Cambridge.

Cavalli-Sforza, L.L. 1996: The spread of agriculture and nomadic pastoralism: insights from genetics, linguistics and archaeology in The Origins and Spread of Agriculture and Pastoralism in Eurasia, edited by Harris, D.R., UCL Press, London.

Cavalli-Sforza, L.L. 2000: Genes, Peoples and Languages, North Point Press, USA.

Gimbutas, M. 1997: The Kurgan Culture and the Indo-Europeanization of Europe, edited by Dexter, M. R. and Jones-Bley, K., Journal of Indo-European Studies Monograph No. 18.

Gray, R.D. & Atkinson, Q.D. 2003: Language-tree divergence times support Anatolian theory of Indo-European origins, Nature vol. 426 pp 435-439.

Mallory, J.P. 1989: In Search of the Indo-Europeans: Language, Archaeology and Myth, Thames & Hudson Ltd, London.

Mallory, J.P. & Adams, D.Q. 2006: The Oxford Introduction to Proto-Indo-European and the Proto-Indo-European World, Oxford University Press.

McWhorter, J. 2002: The Power of Babel: a Natural History of Language, William Heinemann, London.

Renfrew, C. 1987: Archaeology and Language: the Puzzle of Indo-European Origins, Jonathan Cape, London.

Renfrew, C. 1999: Time Depth, Convergence Theory, and Innovation in Proto-Indo-European: ‘Old Europe’ as a PIE Linguistic Area, Journal of Indo-European Studies 27, 257-93.

Sykes, B. 2001: The Seven Daughters of Eve, Bantam Press, London.

Watkins, C. 1969: Indo-European and the Indo-Europeans, The American Heritage Dictionary of the English Language, Houghton Mifflin Company, Boston MA, USA.

Wells, S. 2002: The Journey of Man: a Genetic Odyssey, Penguin Books, London.

© Christopher Seddon 2008