The origin of language in gesture–speech unity

Part 1: Language and Imagery

By Professor David McNeill

Why do we gesture? Many would say that gesture brings emphasis, energy, and ornamentation to speech (which is assumed to be the core of what is taking place); in short, gesture is an "add-on," as Adam Kendon puts it while himself arguing against that view. However, the evidence is against this. The reasons we gesture are more profound. Language is inseparable from imagery, and the natural form of imagery with language is gesture, with the hands especially. While gestures can enhance communication, the core is gesture and speech together: the two are bound more tightly than calling gesture an "add-on" or "ornament" implies. Even if for some reason a gesture is not made (social inappropriateness, physical difficulty, etc.), its imagery is still present, hidden but part of the speech process (it may surface in some other part of the body, the feet for example).

Why, then, do we gesture? Because gesture was built in from the start: language could not have evolved without it. If a theory of language origin is to predict the nature of language, it must, among other things, predict this gesture–speech unity. If a theory says that gesture–speech unity did not evolve, and/or predicts that something incompatible with it did evolve, the theory cannot be correct. A widespread theory that I will call "gesture-first" fails this test. In this first post I explain gesture–speech unity. In later posts I apply the test and propose a new theory, called "Mead's Loop," that meets it.

The smallest unit of gesture–speech unity is called a growth point, or GP.  Growth points are inferred from the totality of communicative events, with special focus on speech–gesture synchrony and co-expressivity. They are called growth points because they are meant to be the initial pulses of thinking-for-and-while-speaking, a dialectic (or “multilectic”) from which a dynamic process of organization emerges.  The result is what Wundt described as the two modes of consciousness in speech:

“From a psychological point of view, the sentence is both a simultaneous and a sequential structure.  It is simultaneous because at each moment it is present in consciousness as a totality even though the individual subordinate elements may occasionally disappear from it.  It is sequential because the configuration changes from moment to moment in its cognitive condition as individual constituents move into the focus of attention and out again one after another.” (Blumenthal 1970 translation of Wundt 1900, p. 21).

In a GP dialectic Wundt’s two modes are a natural outcome. The “simultaneous” is consciousness of the GP and dialectic itself; the “sequential” is awareness of what I call unpacking. The model is that a GP differentiates what for the speaker is the point of newsworthiness in the immediate context of speaking. This differentiation, partly linguistic form, partly imagery, is then “unpacked” into a construction, both rendering it communicable as a social effort and putting a stop-order to the dialectic.  (It’s the nature of this post that I must introduce a number of new terms, several of them binary opposites; I’ve placed a diagram at the end that lists them and shows how they relate.)

The figure below shows an example of the kind of gesture we focus on (sometimes called "gesticulation"), illustrating the two simultaneous modes. The gestures were spontaneous, recorded during an experiment in which speakers retold an animated Tweety and Sylvester cartoon they had just watched. In this episode Sylvester (an ever-seeking cat) tries to reach Tweety (a pugnacious canary), who is perched on a windowsill several stories above the street, by a stealth approach: climbing up the inside of a drainpipe. The gestures are iconic signs, but the iconicity is semantic, not photo-like. They are images of concepts of the events in the story:

  • Pointing upward, she says, “he tries to go up inside,” localizing the character in gesture space.
  • Then, making a spiraling upward movement, she says, “barreling up through it,” depicting the character’s presumed spinning inside the pipe – an inference (not shown in the film).
  • Her left hand is shaped as a cup and embodies her concept of the inside of the drainpipe, timed exactly to go with the “barreling” part of her description.

[Figure: photos of the speaker's gestures while retelling the drainpipe episode]

It is important to note that gesture and speech cover the same idea units. It is not that gesture holds hidden messages. Statements that such and such a percentage of meaning is "non-verbal" run counter to the reality of gesture–speech unity. In nearly every case, speech and gesture convey the same meaning, but they do it in opposite ways. In the gesture we see visuospatial thinking, not only about space as such but about the same content that is also expressed verbally. Note the use of the left hand for the inside of the pipe. It is her concept of interiority, not a depiction of the actual pipe, which enclosed Sylvester and was vertical, not horizontal. Both the verbal and the imagistic modes capture interiority, but in opposite ways. Unlike the parts of a sentence, the 'parts' of the gesture (the shape, the direction, the motion, etc.) do not have their own meanings; they are meaningful only in the context of the gesture as a whole. This is called the global property: the meanings of the parts depend on the meaning of the whole. It is the opposite in speech. There the parts (words) have their own meanings and build up the meaning of the whole through combination. This is called the syntagmatic property.

So in gesture–speech unity, different modes of semiosis ("semiosis" and "semiotic" refer to the nature of symbols) present the same meanings at the same time: global, whole-to-part in gesture; syntagmatic, part-to-whole in speech; and the two are synchronous. Throughness is visualized as a hollow space, not an iconic replica of the pipe but the concept realized imagistically with its own location. It goes not with 'inside' but with its conceptual parallel, 'barreling up through.' When gesture and speech synchronize (as they do in the vast majority of utterances), one idea, here Sylvester's ascent via the pipe, is simultaneously in two semiotic modes, imagery and language. The result is an idea unit in which imagery and words combine, and this is an inherently dynamic situation. Such a system of language, an imagery–language dialectic, would be explained if we found, independently, that language began in a gesture–speech unity. We shall see in post 3 how this may have happened.

Seeing the gesture and the co-expressive speech it synchronizes with, we witness a moment of an ongoing imagery–language dialectic. Gesture–speech unity is the nexus at which imagery and the codified forms of language intersect: two dimensions of language with equal weight. The picture is not unlike Humboldt's distinction between Ergon and Energeia, that is, language viewed as structure and language as an "embodied moment of meaning located both in the organism and in the medium that the organism uses for expression." The latter is language at the moment of its use, "alive, in an actor" (Joseph Glick, describing seminars by Heinz Werner; from Elena Levy).

The larger picture – 1. Historically, the dynamic and static have been approached separately – each with its own traditions, methodologies, sciences, and institutional practices (& prejudices).  Each tradition describes something of substance:

Static = language is a thing, not a process. This is the Saussurian tradition, and it bears on Wundt's sequential mode. The academic field of linguistics has specialized in the static dimension.

Dynamic = language is a process, not a thing. This is the Vygotsky tradition and it bears on Wundt’s simultaneous mode.  The budding field of gesture studies focuses on this dimension.

However, we must combine them. The dynamic does not replace the static. Gesture gives us access to the dynamic mode; linguistic form gives the static (no particular synchronic description is favored: we go with whatever best fits the dynamic picture we are trying to paint). The important point is that both modes are present.

The larger picture – 2. An imagery–language dialectic implies:

  • A conflict or opposition of some kind, in our case between the two semiotic modes, a dual semiosis.
  • Resolution of the conflict through change, its unpacking.

A dialectic is inherently dynamic and a good model of the psycholinguistics of speaking.

A dialectic presupposes Vygotsky’s concept of a unit as the smallest component that retains the quality of a whole.  This whole is the imagery–language dialectic.  A GP of unified gesture and speech is the smallest unit in which an imagery-language dialectic takes place. Further reduction to a gesture and a linguistic segment separately destroys the unit itself, leaving only a gesture or linguistic segment but not a dynamic process.

A quick list of the GP's properties:

  • It is proposed as the minimal unit of the imagery-language dialectic.
  • It is a dialectic package that has both linguistic categorial and imagistic components.
  • Growth points are inferred from the totality of communicative events with special focus on speech-gesture synchrony and co-expressivity.
  • By focusing on these properties we bring out the modes of cognition envisioned by Wundt.

All of this is why we gesture.  Gesture is an integral part of speaking. And language could not have begun without it.  The next post in this series will take up the evolutionary precursors of this dual semiotic system of gesture-speech unity.

The many binaries. I have made the following diagram to sort out the several distinct but related binary oppositions, plus a few other critical terms in this post. The numbers give the order in which each term was first mentioned:

David McNeill is a professor in the Departments of Linguistics and Psychology at the University of Chicago. 

