The origin of language in gesture–speech unity

Part 6. Gumbo: The thought–language–hand link, social interactive growth points, the timeline of Mead’s Loop, and bionic language.

David McNeill, University of Chicago

To end this series, I address four questions regarding Mead’s Loop: 1) what evidence is there for the thought-language-hand link that in theory it established; 2) how did it change face-to-face social interaction; 3) when did it emerge; and 4) how far can it be duplicated artificially?  The questions, disparate as they are, are connected through the concept of the growth point, which is the linchpin of each.

The “IW case” reveals the thought–language–hand link   

Natural selection of a thought–language–hand link, chiefly in Broca’s Area but also with links to the other “language areas” indicated in Part 3, was part of Mead’s Loop’s evolution. This thought–language–hand link is usually submerged among other actions, but in certain forms of neuropathy, where sensory feedback is eliminated, it becomes visible. Then action is disrupted but gestures are unaffected. It is this dissociation that reveals a specific path from thought and language to gesture in the human brain.

Mr. Ian Waterman, sometimes referred to as “IW,” is a man whose character and achievements are captured in the title of Jonathan Cole’s book about him, Pride and a Daily Marathon. At age 19 he suffered a sudden deafferentation of his body from the neck down – the near-total loss of all the touch, proprioception, and limb spatial position senses that tell you, without looking, where your body is and what it is doing. The loss followed a never-diagnosed fever that Jonathan Cole believes set off an auto-immune reaction. The immediate behavioral effect was immobility, even though IW’s motor system was unaffected and there was no paralysis. The problem was not lack of movement per se but lack of control. Upon awakening after three days, IW nightmarishly found that he had no control over what his body did – he was unable to sit up, walk, feed himself or manipulate objects; he could perform none of the ordinary actions of everyday life, let alone the precise actions required for his vocation.

To imagine what deafferentation is like, try this experiment suggested by Shaun Gallagher: sit down at a table (something IW could not have done at first) and place your hands below the surface; open and close one hand, close the other and extend a finger; open the first hand and put it over the closed hand, and so forth. You know at all times what your hands are doing and where they are but IW would not know any of this – he would know that he had willed his hands to move but, without vision, would have no idea of what they are doing or where they are located.

After years of constant self-drill, IW has mastered movement in an entirely new way – he plans movements in advance and visually monitors them as they occur.  It is remarkable to watch him since these movements look so normal – accurate, at speed, and seemingly effortless (although actually the result of great concentration).

IW also performs gestures in the same planned, monitored way.  He refers to them as “constructed,” and distinguishes them from others he terms “throw-aways,” which just happen without planning or monitoring.  For us, of course, the focus is on precisely these “throw-aways.”

Thanks to the BBC, which was filming a Horizon program about IW (“The Man Who Lost His Body,” 1998), IW, Jonathan Cole, Shaun Gallagher and the University of Chicago gesture researchers gathered at our Gesture and Speech Lab for several days of filming. We wanted to record IW under a variety of conditions, both with and without vision. IW cannot simply be blindfolded. He would be unable to orient himself and would be at risk of falling over. Taking up an idea of Nobuhiro Furuyama, we devised a tray-like blind that could be pulled down in front of IW, blocking vision of his hands while allowing him space to move and preserving visual contact with his surroundings. IW was videotaped retelling our usual animated cartoon. He also was recorded under the blind in casual conversation with Jonathan Cole.

IW’s gestures without vision. The first pair of illustrations shows a coordinated two-handed tableau (a “throw-away”) in which the left hand is Sylvester and the right hand is a streetcar pursuing him. IW was saying, “[and the (a) tram (b) caught him up]” ((a) and (b) referring to the illustration’s first and second panels). His right hand moved to the left in exact synchrony with the co-expressive “caught” (boldface), although slightly out of alignment (reflecting a lack of topokinetic control, which requires feedback, versus morphokinetic control, which IW achieves). Moreover, a poststroke hold (underlining) extended the stroke image through “him” and “up,” capturing more of the co-expressive speech. It is important to recall that this synchrony and co-expressivity were achieved without proprioceptive or spatial feedback.

This kind of performance by IW – coordinated gestures without feedback (there are many other examples) – is part of the evidence we need of a thought–language–hand link.

The other part is that his gestures are separate from practical actions, which without vision are impossible for him. The second pair of illustrations shows two steps in his attempt to remove the cap of a thermos bottle. The first is immediately after Jonathan Cole has placed the thermos into his right hand and placed his left hand on the cap (IW is strongly left-handed); the second is a second later, when IW has begun to twist the cap off. As can be seen, his left hand has fallen off and is turning in midair. Similar disconnects occurred during other instrumental actions (threading a cloth through a ring, hitting a toy xylophone, etc. – this last of interest since IW could have made use of acoustic feedback or its absence to know when his hand had drifted off target, but still he could not perform the action).

[Illustration: coordinated two-handed iconic gesture without vision]

[Illustration: action with speech]


In keeping with the thought–language–hand link, IW was able to “remove” the cap of an imaginary thermos in gesture (third illustration) and, although not asked to speak, spontaneously produced synchronous, co-expressive speech as he performed it.

IW, without vision, changes speech and gesture in tandem. Another manifestation of the thought–language–hand link is that, without vision, IW modulated the speed at which he presented meanings in both speech and gesture, and did this in tandem. As his speech slowed, his gesture slowed, and to the same extent, so that synchrony and speech–gesture unity were preserved. (The Warlpiri speaker described in Part 2, who was producing not impromptu gestures like IW’s but a sign language, specifically could not do this while also producing speech.)

If IW is forming gesture–speech units, this joint modulation of speed is explicable. He does it based on a sense (available to him) of how long a given joint imagery-linguistic unit remains “alive,” and the peripheral sensory feedback he lacks need not play a part.

During a conversation with Jonathan Cole, while still under the blind, IW reduced his speech rate at one point by about one-half (paralinguistic emphasis). Speech and gesture remained in synchrony:

Normal Speed “and I’m startin’ t’use m’hands and that’s be-”(bold = hands rotating)

Slow Speed “-cause I’m startin’ t’get into” (bold = hands rotating)

The gestures are of a familiar metaphoric type in which a process is depicted as a rotation in space.  IW executes the metaphor twice; first at normal speed, then at slow speed.

The crucial observation is that the hand rotations are locked to the same landmarks in speech at the two speeds (across nearly the same syllable counts). If we look at where his hands orbit inward and outward we see that the rotations at both speeds coincide with the same lexical words and the same stress peaks.

It is important to recall that this tandem slowing was produced without any proprioceptive or spatial feedback; IW could not tell what his hands were doing, yet they were in perfect synchrony with speech.

Whatever controlled the slowdown, it was exactly the same for speech and gesture.

As the rotating hands were metaphors for the idea of a process, the pacesetter accordingly was activated by a thought–language–hand link and was co-opted by significances other than the action of rotation itself.  This co-opting is shown in the timing, since the hands rotated only while IW was presenting the metaphor, “I’m starting to…,” and there was actually a cessation of the gesture between the first (normal speed) and second (reduced speed) rotations when he said, “and that’s because.”

That is, the rotation and any phonetic linkages it claimed were organized specifically around the idea of a process as rotation. This is gesture–speech unity over the thought–language–hand link.

Overall significance of the IW case for the origin of language. The IW case suggests that control of the hands and the relevant motor neurons is possible directly from the thought-linguistic system. It does not pass through any pantomime (and so is yet another piece of evidence that pantomime is not connected to human speech). Without vision, IW’s dissociation of gesture, which remains intact, from instrumental action, which is impaired, implies that the “know-how” of gesture is not the same as the “know-how” of instrumental movement. In terms of brain function, it implies that at some point gesture enters a circuit of its own and hooks into speech. A likely locus of this thought–language–hand link, at least in part, is in areas 44 and 45, or Broca’s area.

Mimicry, interpersonal synchrony of growth points.

The Mead’s Loop “twist” sustains a host of social interactions including turn-exchanges in conversations and two-person mind-merging.

Irene Kimbara studied gestural mimicry as an interactive phenomenon. Mimicry she describes as a process of “interpersonal synchrony,” which creates a sense of solidarity and is prominent when the interlocutors are personally close.

Mimicry can merge GPs and contexts between speakers.  If through mimicry people approximate similar growth points they come to some common ground.  It works through embodiment. Recreating a gesture in mimicry is more than imitating a movement; it is the envelopment of the mimic in the other’s world of meanings. Imagine experiencing an interpersonal misconstrual. One can overcome it by mind-merging the other’s GP and, from this, finding its context, the only context in which this mimicked GP is a possible differentiation. By their nature, growth points are not independent of the context, which means that if a speaker is generating a GP contexts also tend to emerge.

The mimicry need not be overt. We focus on the mimicry of growth points. It is accomplished at a level of orchestration, with or without overt movement.

Two-body GPs are joint constructions, collaborative GPs wherein Mind 2 mimics the gesture and speech of Mind 1. A psychological predicate and field of equivalents seemingly belonging to Mind 1 arise as if by magic (but it is not magic – it is because the original gesture had absorbed this context and mimicking it recreates it at least in part).

Mimicry imports the GP of the other (or rather, recreates it over one’s own thought–language–hand link). It is a kind of borrowed embodiment.  It recreates the other’s gesture–speech unit as if it were one’s own.  The many experimental demonstrations of sympathetic responses to verbs that denote actions (“grab” accompanied by a listener’s incipient grabbing) are, in this version, mimicry of “new actions,” of GPs.

Two-body GPs appeared in experiments devised by Nobuhiro Furuyama.  The setting was one person teaching a second person, a stranger, how to create an origami box but without actual paper in hand. In one version shared embodiment occurred when the learner mimicked the teacher’s gesture without the learner speaking. The gesture instead was synchronized with the teacher’s speech – “[pull down] the corner,” the learner performing a gesture during the bracketed portion. One person, that is, appropriated the other’s speech, combining it with her gesture, as if student and teacher were jointly creating a single GP.

The reverse also occurs.  The learner appropriates the teacher’s gesture by combining it with her speech. In one such case the learner (female) said, “[you bend this down?]” and during the bracketed speech seized and moved the (male) teacher’s hand down. It is striking that the taboo normally prohibiting strangers, especially of opposite genders, from non-accidental physical contact was overridden, possibly because both the learner’s and tutor’s hands were no longer “hands,” actual body-parts subject to the taboo, but pure symbols.

Turn-taking at momentary overlaps of GPs depends on this process, and creates yet another interactive discourse unit.  Turn-taking is typically analyzed as the coordinated activity of one speaker authorizing the next speaker to speak. But the process also involves joint GPs at the exchange point, with gestures playing a critical role. A GP starts with one speaker and passes over to the next speaker.  Emanuel Schegloff, in an early gesture study, used gesture to forecast what would be “in play” in the next round of a conversation.  We follow his lead, supplemented with the concept of a GP, and look for joint GPs and the contexts they differentiate.  A new joint discourse unit is formed when the listener mimics the gesture of the speaker; or when two individuals participate in one GP, one providing the speech, the other the gesture.

Shared tip of the tongue. Mimicry also offers an explanation (found by Liesbet Quaeghebeur, not published) of the curious phenomenon of tip of the tongue contagion – one person cannot recall a common word whose meaning is clear to all, and you, the interlocutor, suddenly also cannot recall it.  If conversation includes “mind merging,” it also could include “tip-of-the-tongue merging,” through spontaneous mimicry.

Gesture-coder mimicry. Coders frequently mimic gestures and speech as they work. Mimicry brings the speaker’s differentiation and context into the coder’s own momentary cognitive being; she inhabits the other’s gesture and speech. It is mimicry of a stranger visible in video (or, as here, in a screenshot). The following illustrations demonstrate the experience.

In Panel 2 the field of equivalents is something like EXPECTED TWEETY and the differentiated newsworthy point is GRANNY.

In Panel 3 it is something like HOW THIS ESCAPADE ENDS, with the point of differentiation NOT WHAT HE THOUGHT.  Each speaker forms his own story, and with speech the gestures tell it.


[Illustration: a self-test in mimicking fields of oppositions, much as gesture coders do spontaneously]


Thanks ultimately to the social references built into Mead’s Loop, we find discourse units in conversations formed by two persons, their gestures and contexts realized in common, through mimicry. Mimicry can take place in conversations, in deception or during instruction, or in the virtual interaction of a gesture coder with video images of another person’s gestures.

Mead’s Loop timeline.

The phrase, “the dawn of language,” suggests that language burst forth at some definite point, say 150–200 kya (thousand years ago), when the prefrontal expansion of the human brain was complete.

But the origin of language has elements that began long before – 5 mya (million years ago) for bipedalism, on which things gestural depend. I suggest 2 mya, based on humanlike family life dated to then, as the start of the expansion of the forebrain and of the selection of self-responsiveness of mirror neurons, with the resulting reconfiguration of Areas 44/45. I imagine this form of living was itself the product of changes in reproduction patterns, female fertility cycles, child rearing, and neoteny, all of which must have been emerging over long periods before.

So this says that language as we know it emerged over 1 to 2 million years and that not much has changed since the 150–200 kya landmark of reconfiguring Broca’s area with the mirror neurons/Mead’s Loop circuit (although this date could overlook continuing evolution: there are hints that the brain has changed since the dawn of agriculture and urban living).

The Mead’s Loop model doesn’t say what might have been a protolanguage before 2 mya – Lucy and all. It would have been something an apelike brain is capable of. There are many proposals about this – Kendon, for example, proposed that signs emerged out of ritualized incipient actions (or incomplete actions). Natural gesture signals in modern apes have an incipient quality as well, the characteristic of which is that an action is cut short and the resulting action-stub becomes a signifier. The figure in Part 2 shows a truncated shove by one bonobo signaling a demand for a second bonobo to move in a certain direction.

The slow-to-emerge precursor from 5 mya to 2 mya may have built up a gesture language from instrumental actions, a gesture-first type of language. It would have been an evolutionary track leading to pantomime.

But the human brain evolved a new system in which gesture fused with vocalization.

Mead’s Loop also does not say where language evolved (an argument by Atkinson suggests the southwestern corner of Africa), but it does “predict” that wherever it was the languages there now would tend to be of the isolating type (and this appears to be the case in SW Africa; see Part 3 for the “isolating type”). In any case the origin point would have been an area where human family life also was emerging.

A proposed time line for the origin of Mead’s Loop is as follows:

  1. To pick a date, the evolution of a thought–language–hand link started 5 mya with the emergence of habitual bipedalism in Australopithecus. This freed the hands for manipulative work and gesture, but it would have been only the beginning. Even earlier there were preadaptations such as an ability to combine vocal and manual gestures, to perform rapid sequences of meaningful hand movements, and the sorts of iconic/pantomimic gestures we see in bonobos, but not yet an ability to orchestrate movements of the vocal tract by gestures.
  2. The period from 5 to 3–2 mya – Lucy and the long reign of Australopithecus – would have seen the emergence of various precursors of language, such as the protolanguage Bickerton attributes to apes, very young children and aphasics; also, ritualized incipient actions becoming signs as described by Kendon.
  3. At some point after the 3–2 mya advent of H. habilis and later H. erectus, there commenced the crucial selection of self-responsive mirror neurons and the reconfiguring of areas 44 and 45, with a growing co-opting of actions by language to form speech-integrated gestures, this emergence being grounded in the appearance of a humanlike family life with a host of other factors shaping the change (including cultural innovations like the domestication of fire and cooking). The timing of this stage is not clear but recent archeological findings strongly suggest that hominids had control of fire, had hearths, and cooked 800 kya.
  4. Thus, the family as the scenario for evolving the thought–language–hand link we see in the IW case seems plausible, commencing no more recently than 800 kya.

Another crucial factor would have been the physical immaturity of human infants at birth and the resulting prolonged period of dependency giving time for cultural exposure and GPs to emerge, an essential delay pegged to the emergence of self-aware agency (Neanderthals, in contrast, may have had a short period of development).

Along with this sociocultural revolution was the expansion of the forebrain from 2 mya, and a complete reconfiguring of areas 44 and 45, including Mead’s Loop, into what we now call Broca’s area. This development was an exclusively human phenomenon and was completed with H. sapiens about 200–100 kya. If a “dawn” occurred, it was here.

At least two other human species have existed, Neanderthals and the recently discovered Denisova hominin; each may have had a gesture-only form of communication, but our species also developed Mead’s Loop and GPs. These other humans went extinct, one factor in which could have been a confinement to pantomime and a consequent inability to reach a new form of language, inhabitance with thought and action, just as we, having evolved this ability, were spared the same fate (but it is also possible that Mead’s Loop emerged earlier and Neanderthals also had speech–gesture units and went extinct for other reasons; see Part 3 for more).

Language with dual semiosis came into being over the last 1 or 2 million years. Considering protolanguage and then language itself, the timeline seems to span more than five million years (low hum more than big bang). Meaning-controlled manual and vocal gestures that combine under imagery emerged over the last two million years. The entire process may have been completed not more than 100 kya, a mere 5,000 human generations, if it is not continuing.

Bionic language.

A bionic version of human language is one possible continuation.  Language in this vision of the future is extended with artificial enhancements. In a book published in 1968 Herbert A. Simon made the case for the sciences of what he called “the artificial.” He was careful not only to explain these sciences but also to distinguish the artificial from the natural.  He seems to have believed that language was “man-made.”  Over the years enthusiasm for the artificial has grown while a sense of its limits has shrunk. However, this enthusiasm underestimates the gulf between the artificial and the natural in the case of gesture–speech unity.

The artificial is not natural.  To begin with, the origin of language was not artificial.  It was “man-made” in one way – it was made in “man” (actually, probably in “woman”) as the product of natural selection but was not artificial in Simon’s sense, the outcome of human purpose and goal-directedness.

Simulations such as automatic speech “recognition,” while constantly improving, do not recreate, nor do they aim to recreate, the human inhabitance of language.

And for good reason. Purpose-designed artificial devices cannot model GPs and their evolved global-synthetic semiotics from Mead’s Loop. Even gestures, while inputting them might improve recognition, could not lead to an imagery–language dialectic. This is because systems that model gestures (as in conversational agents and physical robots) do it in a bottom-up, features-to-whole, static-dimension, language-like way that, even if synchronized with speech, is inherently incapable of forming a dialectic.

The problem is not just adjusting models to include imagery. Mead’s Loop is beyond their reach basically because action, which speech fundamentally is and which the linguistic system evolved in part to orchestrate, does not exist as a unit in these artificial systems. They instead construct actions using a feature-based mode wherein the features are the units and the actions the outcomes (in a GP, features are outcomes, actions are the units).

Foremost of these difficulties is the global-synthetic imagery of the GP, essential for the dynamic dimension as a whole. The problem is that the use of features in computational models forces the process of gesture creation to be combinatoric, to move from parts to whole rather than whole to parts; and this loses the semiotic opposition.

Once created we can usually identify form and meaning features, e.g., enclosure means interiority, and so forth. But we must not conclude that composition was the process of creation; it is the result of our analysis. Features are products. This is the paradox of natural gestures – they work in the opposite direction from modeling based on features.

Coordinative structures, drawn to ideas as attractors, may avoid the bottom-up problem but they create a new problem. (An anonymous Yale linguistics handout defines coordinative structures as “flexible patterns of cooperation among a set of articulators to accomplish some functional goal.”) The weakness is that they impose a distinction between “image” and “gesture” (the attractor is the image and coordinative structures fashion a gesture to embody it).

This creates a new contradiction with the concept of a gesture as a material carrier (see part 4). The gesture is the image – the image in its most material form; it is not a copy of it. Thus we have merely exchanged one contradiction for another, and are no closer to a model of the GP and imagery–language dialectic.

Analog machines. One may think that a hybrid analog–digital machine with self-defining, self-segregating imagery would do the trick. The most effective approach would be to build in a self-responding Mead’s Loop and then attempt to have the machine evolve a new language. Robots capable of limb and hand motion may be the nearest approximation to such a machine. To do this, information needs to be (or be simulated to be):

  • 3D, that is, embody variation as in gesture space.
  • With correct orientation, as in gesture space.
  • In the correct direction, as in gesture space.
  • With texture, as in gesture space.
  • As a spatial array, as in gesture space.
  • With local identity (in all three dimensions), as set up in gesture space.
  • With memory of past configurations, as in catchment space.
  • And organized by action.

No doubt the list can be extended but it is already a substantial departure from what I understand to be modeling practice. Its feasibility is far from assured and a global analog device is at present more a deus ex machina than a realizable thing.

But there is a more profound difficulty. None of this is actually imagery, global-synthetic, and meaningful. Meaningful imagery is totally absent, so a hybrid machine is no closer to the imagery–language dialectic than the digital one.

Synchrony.  My co-worker Susan Duncan once contrasted an autonomous agent, “Max,” to the GP in how it synchronizes gesture and speech.  In a GP the synchrony is a condition of the dialectic, and achieving it is a matter of thought, not of external signals tying speech and gesture together. However, as Duncan writes: “Max works as follows – looks ahead, sees what the linguistic resource will be, calculates how far back the preparation will have to be in order for the stroke to coincide with this.  Then speech and gesture are generated on their own tracks, and the two assembled into a multimodal utterance.  In contrast, in the GP the gesture image and linguistic categorization constitute one idea unit, and timing is inherent part of how this thought is created. The start of preparation is the dawn of the idea unit, which is kept intact and is unpacked, as a unit, into a full utterance.”
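The look-ahead scheduling that Duncan attributes to Max can be made concrete in a few lines. What follows is only a minimal sketch of that description, not Max's actual code: the names `Word`, `GesturePlan` and `schedule_gesture`, the fixed preparation duration, and the word timings are all hypothetical, chosen purely for illustration. It shows speech timing being computed first and the gesture then being back-calculated from it, which is exactly the external assembly that the GP, by hypothesis, does not need.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Word:
    text: str
    onset: float   # seconds from the planned start of speech
    offset: float


@dataclass
class GesturePlan:
    prep_onset: float     # when the hands start moving
    stroke_onset: float   # when the stroke begins
    stroke_offset: float  # when the stroke ends
    speech_delay: float   # how long speech is held back, if at all


def schedule_gesture(words: List[Word], target_index: int,
                     prep_duration: float) -> GesturePlan:
    """Back-calculate gesture timing from already planned speech.

    Assumption (hypothetical): the stroke must coincide with the word at
    target_index, and preparation takes a fixed prep_duration seconds.
    """
    target = words[target_index]
    # Look ahead to the target word, then work backward to find when
    # the preparation phase has to begin.
    prep_onset = target.onset - prep_duration
    # If preparation would have to start before speech does, hold the
    # whole speech track back by the shortfall.
    speech_delay = max(0.0, -prep_onset)
    return GesturePlan(
        prep_onset=prep_onset + speech_delay,
        stroke_onset=target.onset + speech_delay,
        stroke_offset=target.offset + speech_delay,
        speech_delay=speech_delay,
    )


# Illustrative timings only; these are not measurements from any recording.
words = [
    Word("and", 0.0, 0.2),
    Word("the", 0.2, 0.3),
    Word("tram", 0.3, 0.6),
    Word("caught", 0.6, 0.9),
]
plan = schedule_gesture(words, target_index=3, prep_duration=0.4)
print(plan)
```

In this sketch speech and gesture live on separate tracks and are joined only by arithmetic on clock times. Nothing in it corresponds to a single idea unit whose own unfolding sets the timing, which is the contrast Duncan draws.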

The natural is not artificial. We learn from these thought experiments that artificial models do not match an evolved biological/psychological process, or head in the right direction to reach the GP – most crucially, that the semiosis must include a global component (to drive the dialectic), that there is a dialectic, and that finally the process is embodied in and tied to action, and requires accordingly a “body” that is the embodiment of meaning. Further, the gesture–speech unit differentiates a context and the context and its differentiation are one “thing.” Finally, all these models conflict with Quaeghebeur’s “all-at-onceness,” in that in their logic they are sequential. Conversational agents can simulate many of these properties, but the basic difference between an artificial system, designed by rational intelligence, and what has naturally evolved remains a root fact in the contrast of the GP with modeling schemes.

Why the fascination with the artificial?  A machine that thinks (or seems to), speaks, or evolves a human language or something like one, strikes us as uncanny.  It captures life in the making, a new existence or being; and for scientific interest, the elements and history of this being.  Uncanniness is one reason for fascination. The fascination (similar to the fascination with chimps schooled in sign language) is actually with our own existence; the bionic has an existence close to but not quite ours, one that can be dismantled and regarded objectively.

And it is this fascination the jeremiad here must disappoint. Machines that attempt to “inhabit” language, as Merleau-Ponty would have agreed, seem blocked from the possible.

We learn (or recall) the uniqueness of human evolution. It was this that gave us language. Bionic man tries to make it artificial, and here lies hubris.


And here ends our series on how language began in gesture–speech unity. To all who have participated, I express my thanks and admiration.  Comments are more than welcome at [email protected].  I thank R.B. McNeill, N.B. McNeill and E.T. Levy for very helpful comments.

Further Reading

Atkinson, Quentin D. 2011. ‘Phonemic Diversity Supports a Serial Founder Effect Model of Language Expansion from Africa.’ Science 332: 346-349.

Bickerton, Derek. 1990. Language and Species. Chicago.

Cole, Jonathan. 1995.  Pride and a Daily Marathon.  MIT.

Deacon, Terrence W. 1997. The Symbolic Species: The Co-evolution of Language and the Brain. Norton.

Donald, Merlin. 1991. Origins of the Modern Mind: Three Stages in the Evolution of Culture and Cognition. Harvard.

Evans, Patrick D, Gilbert, Sandra L., Mekel-Bobrow, Nitzan, Vallender, Eric J., Anderson, Jeffrey R., Vaez-Azizi, Leila M., Tishkoff, Sarah A., Hudson, Richard R. and Lahn, Bruce T. 2005. ‘Microcephalin, a gene regulating brain size, continues to evolve adaptively in humans.’ Science 309:1717-1720.

Freud, Sigmund. 2003. The “Uncanny” (Part two), David McLintock, trans., Intro Hugh Haughton. Penguin.

Furuyama, Nobuhiro. 2000a. Gestural interaction between the instructor and the learner in origami instruction, in McNeill D. (ed.), Language and Gesture, pp. 99–117.  Cambridge.

Gill, Satinder. 2007. ‘Entrainment and musicality in the human system interface.’ AI & Society. 21:567–605.

Goren-Inbar, Naama, Alperson, Nira, Kislev, Mordechai E., Simchoni, Orit, Melamed, Yoel, Ben-Nun, Adi and Werker, Ella. 2004.  ‘Evidence of hominid control of fire at Gesher Benot Ya’aqov, Israel.’  Science 304:725-727.

Kimbara, Irene. 2006. On gestural mimicry. Gesture 6: 39–61.

Lieberman, Philip. 2002.  On the nature and evolution of the neural bases of human language. Yearbook of Physical Anthropology 45: 36-63.

McNeill, David, Duncan, Susan, Franklin, Amy, Goss, James, Kimbara, Irene, Parrill, Fey, Welji, Haleema, Chen, Lei, Harper, Mary, Quek, Francis, Rose, Travis, and Tuttle, Ronald. 2009. ‘Mind merging,’ in Morsella, E. (ed.). Expressing Oneself / Expressing One’s Self:  Communication, Language, Cognition, and Identity, pp. 143-164. Taylor and Francis.

McNeill, David. 2010. ‘Gesten der Macht und die Macht der Gesten’, in Wulf, Christoph & Fischer-Lichte, Erika (eds.). Gesten, pp. 42-57. Munich: Wilhelm Fink (translation of ‘Power of Gestures and the Gestures of Power,’ available under Writings: Essays at

Merleau-Ponty, Maurice. 1962. Phenomenology of Perception, Colin Smith (trans.), Routledge.

Pika, Simone and Bugnyar, Thomas. 2011. ‘The use of referential gestures in ravens (Corvus corax) in the wild.’ Nature Communications 29 November.

Quaeghebeur, Liesbet. 2012. The ‘All-at-Onceness’ of embodied, face-to-face interaction. Journal of Cognitive Semiotics 4: 167-188.

Schegloff, Emanuel A. 1984. On some gestures’ relation to talk, in Atkinson J. M. and Heritage J. (eds.). Structures of Social Action, pp. 266–298. Cambridge.

Simon, Herbert A. 1968. The Sciences of the Artificial.  MIT.

Wachsmuth, I., Lenzen, M. and Knoblich, G. (eds.). 2008. Embodied Communication in Humans and Machines.  Oxford.

Wrangham, Richard W. 2001. ‘Out of the pan, into the fire: How our ancestors’ evolution depended on what they ate.’ In F. de Waal (ed.), Tree of Origin: What Primate Behavior Can Tell Us about Human Social Evolution, pp. 119-143. Harvard.

David McNeill is a professor in the Departments of Linguistics and Psychology at the University of Chicago.

His new title How Language Began: Gesture and Speech in Human Evolution is now available from Cambridge University Press at £19.99/$36.99

