Blog post by David McNeill, author of “Why We Gesture: The Surprising Role of the Hands in Communication”
Why do we gesture? Many would say it brings emphasis, energy and ornamentation to speech (which is assumed to be the core of what is taking place); in short, gesture is an “add-on” (as Adam Kendon, who also rejects the idea, phrases it). However, the evidence is against this. The lay view of gesture is that one “talks with one’s hands”: you can’t find a word, so you resort to gesture. Marianne Gullberg debunks this ancient idea. As she succinctly puts it, rather than gesture starting when words stop, gesture stops as well. So if, contrary to lay belief, we don’t “talk with our hands,” why do we gesture? This book offers an answer.
The reasons we gesture are more profound: language itself is inseparable from gesture. While gestures enhance the material carriers of meaning, the core is gesture and speech together. They are bound more tightly than calling gesture an “add-on” or “ornament” implies; they are united as a matter of thought itself. Thought with language is actually thought with language and gesture indissolubly tied. Even if the hands are restrained for some reason and a gesture is not externalized, the imagery it embodies can still be present, hidden but integrated with speech (and it may surface in some other part of the body, the feet for example).
The book’s answer to the question of why we gesture is not that speech triggers gesture but that gesture orchestrates speech: we speak because we gesture, rather than gesture because we speak. In bald terms, we gesture in order to orchestrate speech. This is the “surprise” of the subtitle, “The Surprising Role of the Hands in Communication.”
Presenting this hypothesis is the purpose of the current book. The book is the capstone of three previous books, an inadvertent trilogy written over 20 years: “How Language Began: Gesture and Speech in Human Evolution,” “Gesture and Thought,” and “Hand and Mind: What Gestures Reveal about Thought.” It merges them into one multifaceted hypothesis. The integration itself, the fact that it is possible at all, is part of the hypothesis. Integration is possible because of the hypothesis’s central idea, implicit in the trilogy and explicit here: that gestures orchestrate speech.
A gesture automatically orchestrates speech when it and speech co-express the same meaning. The gesture then dominates the speech: syntax is subordinate, and it breaks apart or is interrupted to preserve the integrity of the gesture–speech unit. Orchestration is the action of the vocal tract organized around a manual gesture. The gesture sets its parameters, the order of events within it, and the content of the speech with which it works. The time speakers take to utter sentences is remarkably constant, between 1 and 2 seconds regardless of the number of embedded sentences, and this is also the duration of a gesture. The speaker experiences all of this as the two awarenesses of the sentence that Wundt distinguished in the 19th century. “Simultaneous” awareness is awareness of the whole gesture–speech unit: it begins with the first stirrings of gesture preparation and ends with the last motion of gesture retraction. “Successive” awareness is awareness of “…individual constituents moving into the focus of attention and out again,” and it includes the gesture–speech unit as the unit and its gesture come to the surface and then sink beneath it again.
The gesture in the first illustration, synchronized with “it down”, is a gesture–speech unit, and using the Wundt concepts we have:
“and Tweety Bird runs and gets a bowling ba (simultaneous awareness of gesture–speech unity starts)
[ll and ∅tw drops (gesture–speech unity enters successive awareness)
it down (gesture–speech unity leaves successive awareness)
the drainpipe] (simultaneous awareness of gesture–speech unity ends).”
The transcript shows which speech the gesture orchestrated, and when. The entire stretch from “ball” to “drainpipe” is the core meaning of “it down” plus the image of thrusting the bowling ball into the drainpipe, held in simultaneous awareness. The same meaning appeared in successive awareness, with the gesture stroke in the position the construction provided, there orchestrating “it” and “down” together.
The “drops” construction provides the unpacking template and adds linguistic values. Its job is to present the gesture–speech unit, including Tweety’s agent-power in the unit. Gesture–speech unity is alive and not effaced by constructions. To the contrary, the construction presents the Sylvester-up/Tweety-down conflict in socially accessible form. This unit must be kept intact in the speech flow. What is striking, and what makes the example illustrative, is that “it down” was divided by the construction into different syntactic constituents (“it” the direct object, “down” a locative complement), yet the word pair remained a unit orchestrated by the gesture. In other examples, speech stops when continuing would break up a gesture–speech unit. … it controls them. A gesture–speech unity dominates.
How did it all come about? It occurred because “it down,” plus the co-expressive thrusting gesture, was the source (the “growth point”) of the sentence. The growth point came about as the differentiation of a field of equivalents having to do with HOW TO THWART SYLVESTER: THE BOWLING BALL DOWN. It unpacked itself into shareable form by “summoning” the causative construction. This was possible because a causative meaning was in the gesture–speech unit from the start of the preparation: the speaker’s hands were already in the shape of Tweety’s “hands” as the agent of thrusting. Thus “it down” and its stroke were inviolate from the start: the stroke orchestrated the two words as a unit, and the gesture phrase orchestrated the construction as a whole. I believe the situation illustrated with “it down” permeates the production of speech in all conditions and across languages.
1 Participants, having just watched an 8-minute Tweety and Sylvester classic, retell it from memory to a listener (a friend, not the experimenter). Using Kendon’s terminology and our notation, the gesture phrase is marked by “[” and “]”. The stroke, the image-bearing phase and the only obligatory phase of the gesture, is marked in boldface (“it down”). Preparation is the hand getting into position to make the stroke and is indicated by the span from the left bracket to the start of boldface (“ba[ll and ∅tw drops”). Preparation shows that the gesture, with all its significance, is coming into being: there is no reason for the hands to move into position and take on form other than to perform the stroke. Holds are cessations of movement, either prestroke (“drops”), the hand frozen awaiting co-expressive speech, or poststroke (“down”), the hand frozen in the stroke’s ending position and hand shape after movement has ceased, until co-expressive speech ends. Holds of either kind are indicated with underlining. They provide a precise synchrony of gesture-orchestrated speech in successive awareness. Retraction is also an active phase: the gesture is not simply abandoned but closes down (“the drainpipe,” movement ending as the last syllable ended; in some gestures, though not here, the fingers creep along the chair arm rest until this point is reached). In writing growth points, that is, a field of equivalents being differentiated and the psychological predicate differentiating it, we use FIELD OF EQUIVALENTS: PSYCHOLOGICAL PREDICATE (“HOW TO THWART SYLVESTER: THE BOWLING BALL DOWN”).
A “strong prediction.” Our arguments predict that GPs (growth points) in successive awareness remain intact no matter which constructions unpack them. This follows from the expectation that unpacking will not disrupt a field of equivalents or its differentiation. Belonging to different syntactic constituents (the “it” with “drops” and the “down” with “the drainpipe”) did not break apart the “it down” GP. Instead, syntactic form adapted to gesture. The example shows that gesture is a force shaping speech, not speech shaping gesture. Gesture–speech unity means that speech and gesture are equals, and in gesture-orchestrated speech the dynamic dimension enters from the growth point. In a second version of the “strong prediction,” speech stops if continuing would break the GP apart. The absolute need to preserve the GP in successive awareness then puts a brake on speech flow, even when this means restarting with a less cohesive gesture–speech match-up that does not break the GP apart.
Gestures, of course, do not always occur. This is itself an aspect of gesture: there is natural variation in gesture occurrence. Apart from forced suppressions (as in formal contexts), gestures fall on an elaboration continuum, and their position on it is an aspect of the gesture itself. The reality is imagery with speech ranging over the entire continuum. It is visuoactional imagery, not a photo. Gesture imagery linked to speech is what natural selection chose, acting on gesture–speech units free to vary in elaboration. As what Jan Firbas called communicative dynamism varies, the gesture–speech unit moves from elaborate movement to no movement at all. When we speak of gesture–speech unity, we include gestures at all levels of elaboration, including micro-level steps.
An example of the difference this makes is a word-finding study by Sahin et al. of conscious patients about to undergo open-skull surgery, from which the authors conclude that lexical, grammatical and phonological steps occur with distinctive delays of about 200 ms, 320 ms and 450 ms, respectively. We hypothesize that gesture should affect this timing for the 1 to 2 seconds the orchestration lasts (no gestures were recorded in the Sahin study). If the idea unit differentiating a past time in a field of meaningful equivalents begins with an inflected verb plus imagery, does the GP’s onset wait 320 or 450 ms? Delay seems unlikely (although it would be fascinating to find). It may be no faster, and perhaps slower, to say “bounced” in an experiment where a subject is told to put the root word into the past tense than to differentiate a field of equivalents in which past time is gesturally spatialized and the gesture is placed in this space.
To see gesture as orchestrating speech opens many windows: how language is a dynamic process; a glimpse of how language possibly began; that children do not acquire one language but two or three in succession; that gestures are unique forms of human action; that a specific memory evolved just for gesture–speech unity; and how speech works so swiftly, with everything (word-finding, unpacking, gesture–speech unity, gesture placement, and context absorption) done in a couple of seconds with workable (not necessarily complete) accuracy.