The brave new world of emoji: Why and how has emoji taken the world by storm?

Blog post written by Cambridge author Vyvyan Evans.

An emoji is a glyph encoded in fonts, like other characters, for use in electronic communication. It’s especially prevalent in digital messaging and social media.  An emoji, or ‘picture character’, is a visual representation of a feeling, idea, entity, status or event.  From a historical perspective, the first emojis were developed in the late 1990s in Japan for use in the world’s first mobile phone internet system. There were originally 176, very crude by today’s standards.

Early emoji faces

In 2009, the California-based Unicode Consortium, which specifies the international standard for the representation of text across modern digital computing and communication platforms, sanctioned 722 emojis.  The Unicode-approved emojis became available to software developers by 2010, and a global phenomenon was born.  Today, there are a little over 1,200 emojis available.
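To make the idea of an emoji being encoded "like other characters" concrete, here is a minimal Python sketch. It is only an illustration: the helper name `describe` is made up for this example, and the only library used is Python's standard `unicodedata` module. It shows that an emoji has a Unicode code point and an official name in the same way that a letter does.

```python
import unicodedata


def describe(char):
    """Return the code point label and official Unicode name of one character.

    `describe` is a hypothetical helper written for this illustration only.
    """
    return f"U+{ord(char):04X}", unicodedata.name(char)


# An ordinary letter and an emoji are looked up in exactly the same way.
for ch in ["A", "\U0001F600"]:
    print(describe(ch))
```

The second character is the familiar grinning face; as far as the standard is concerned, it is simply another entry in the character tables, which is why fonts and keyboards can treat it like any other glyph.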

The new universal ‘language’?

While emoji is not, strictly speaking, a language in the way that, say, English, French or Japanese are languages, it is certainly a powerful system of communication.  English is often said to be the world’s global language, so a comparison is instructive.
English has 335 million native speakers, with a further 505 million speakers who use it as a second language.  It’s the primary or official language in 101 countries, from Canada to Cameroon, and from Malta to Malawi – far outstripping any other language.  It has been transplanted far from its point of origin – a small country, on a small island –  spreading far beyond English shores.  But more than the range, English has steadily gained ground in almost all areas of international communication: from commerce, to diplomacy, from aviation to academic publishing, serving as a global Lingua Franca.

But in comparison, emoji dwarfs even the reach of English. The driver for the staggering adoption of emoji has been the advent of mobile computing, especially the smartphone.  Emoji was introduced as an international keyboard in Apple’s operating system (iOS) in October 2011.  And by July 2013 it had been introduced across most Android operating platforms.
There are different measures for assessing the stratospheric rise of emoji.  One factor has been the rapid adoption of smartphones.  Today one quarter of the world’s global population owns a smartphone; and based on a survey of mobile computing habits in 41 countries it is estimated that today there are over 2 billion smartphone users with 31% of the global population accessing the internet by smartphone.  In terms of specific countries, China exceeded 500 million smartphones during the course of 2014, and it is estimated that India will have over 200 million smartphone users this year, and in the USA the same figure will be achieved by 2017, when 65% of the population of the United States will own a smartphone.[i]   In terms of smartphones alone, some 41.5 billion text messages are sent globally every day, using around 6 billion emojis—figures that are mindboggling.[ii]

Emoji all around us

Today emoji is seemingly everywhere, having spread far beyond the messaging systems it was developed for.  The New York Subway has now introduced a system, using emoji, to advise passengers of the status of particular subway lines: whether trains are running normally or not.  As the WNYC website explains: “We’re trying to estimate agony on the NYC subway by monitoring time between trains and adding unhappy points for stations typically crowded at rush hour.” [iii]  Here’s an example:

New York Subway Emoji

Reprinted from the WNYC website

Even an institution as august as the BBC is not immune.  Each Friday, the Newsbeat page on the BBC website—associated with BBC Radio 1 and aimed at younger listeners—publishes the news in emoji. Radio listeners are invited to guess what the headline means. See whether you can figure out which headline this emoji ‘sentence’ relates to:

Emoji Question

  1. Four climbers find what they think is a Dodo chick egg. But it’s not. The bird has been extinct for 450 years.
  2. One in four people don’t know the Dodo is extinct, a poll finds.
  3. Four children win a science competition to genetically recreate the Dodo.

(The correct answer is 2).

Moreover, the literary canon is not excluded: a visual designer with a passion for emoji has translated Lewis Carroll’s Alice in Wonderland, a book of 27,500 or so words, into a pictorial narrative, consisting of around 25,000 emoji.[iv]  Some example emoji ‘sentences’ are below:

Alice in Wonderland Emoji

Frivolous or the future?

A common question that people ask is whether anyone—you or I—can simply create their own emojis?  The short answer is yes.  For instance, Finland, on behalf of the Finnish people, has created its own set of national emojis that express Finnish identity.  These include emojis of people in saunas, of a Nokia phone and of a headbanger.

These are computer-generated emojis made available by Finland’s Foreign Ministry on Wednesday, Nov. 4, 2015. Finland is launching a series of ‘national emojis’ that include people sweating in saunas, classic Nokia phones and heavy metal head-bangers. Petra Theman from the Finnish Foreign Ministry says the emojis will be released as a way to promote the country’s image abroad and are based on themes associated with Finland. (Finnish Foreign Ministry via AP)

Finnish national emojis

But while Finland was the first country in the world to embrace its national identity through emojis, you or I won’t be able to text one another the headbanger emoji anytime soon.   And that’s because the Finnish emojis have not been officially sanctioned by the Unicode Consortium—and Finland has no plans to submit them for consideration.

A new emoji has to meet various criteria to become a candidate emoji.  And only after a lengthy vetting process, taking around 18 months, does a successful candidate emoji pass muster.  Even then, it can take still longer for a newly sanctioned emoji to make it onto our digital keyboards: once approved, emojis can take several operating-system updates, and sometimes several years, to make it onto a smartphone or tablet computer near you.  So, for now at least, Finland’s bespoke emojis are classed as ‘stickers’: images that have to be downloaded as part of an app in order to be inserted into text messages.

On January 25th 2016, a Chinese-American businesswoman, YiYing Lu, from San Francisco, succeeded where Finland had declined to tread.  Supported by a publicly funded Kickstarter campaign, Lu succeeded in having a dumpling achieve official emoji candidate status.  And if successful, the proposed dumpling is set to become a bona fide emoji by the end of 2017.  In so doing, it would join a growing catalogue of food emojis, including pizza, hamburger, doughnuts and even a taco glyph.

Dumpling Emoji

The proposed dumpling emoji. From The Dumpling Project.

The entire emoji vetting process is controlled by a handful of American multinational corporations, based in California.  And there are strict qualifying criteria for new emojis: they may not depict persons living or dead, nor deities, for instance.  This is why there are no Buddha or Elvis emojis. Moreover, a candidate emoji must be deemed to have widespread appeal.  On this score, the proposal for a dumpling emoji looks to be a strong candidate. A dumpling – a dough-filled food parcel – is popular around the world, with exemplars ranging from Italian ravioli to Russian pelmeni, to Japanese gyoza. In Argentina there are empanadas, Jewish cuisine has kreplach, in Korea there is mandu and China has potstickers.  But when Lu, an aficionado of Chinese dumplings, attempted to text a friend about the dish, she noticed there wasn’t an emoji she could use.

In early 2016, the fact that the dumpling had officially achieved candidate emoji status in California hit the headlines around the world, from New York, to London, to Beijing; even the broadcast media got in on the act. I was invited onto BBC Radio to discuss the success of the Dumpling Kickstarter project, headlining with Lu herself.   The Kickstarter campaign  –  to raise the necessary funds to prepare the proposal  –  had been a self-evident success, achieving over $12,000 and reaching its target within a few hours of going live.  But the headlines beg the very question: why all the fuss about dumplings? Isn’t this simply frivolity gone mad, an expensive bit of silliness?

On the contrary: emoji matters. The Dumpling Project stands for far more than a simplistic bid to have the favourite food of a Bay Area businesswoman become sanctioned as an emoji. It is an instance of internet democracy at work: indeed, the slogan of the project was ‘emoji for the people, by the people’.

One reason why emoji matters is the following: love it or loathe it, emoji is today the world’s global form of communication.  A quarter of the world’s population owns a smartphone, and over 80% of adult smartphone users regularly use emoji, with figures likely to be far higher for under 18s. In short, most of the world’s mobile computing users use emoji much of the time.  And yet, the catalogue of emojis that show up on our smartphones and tablet computers  –  the vocabulary that connects 2 billion people  –  is controlled by a handful of American multinationals: eight of the eleven full members of the Unicode Consortium are American, namely Oracle, IBM, Microsoft, Adobe, Apple, Google, Facebook and Yahoo.  Moreover, the committee reps of these tech companies are overwhelmingly white, male, and computer engineers – hardly representative of the diversity exhibited by the global users of emojis.  Indeed, as of 2015, the majority of food emojis were associated with North American culture, with some throwbacks to the Japanese origins of emoji (such as a sushi emoji).
Hence, one motivation for the Dumpling Project was to ensure better representation. Of course, on its own, a campaign and proposal for a new food emoji cannot do much.  But as an appeal to global cultural and culinary diversity, and as a call for better representation of this diversity, the dumpling is a powerful emblem.  Emoji began as a bizarre, little-known East Asian phenomenon; since then, control has come to rest in the hands of American corporate giants. Dumplings, on the other hand, in their various shapes and guises are truly international and get at the global nature of emoji.
Perhaps more than anything, the Dumpling Project is fun; and in terms of emoji, a sense of fun is the watchword.  While these colourful glyphs add a dollop of personality to our digital messaging, the Dumpling Project makes a powerful point without resorting to burning either bras or effigies.  It avoids gender, religion or politics in conveying a simple message about inclusiveness in the world’s most widely used form of communication. And in the process, it provides us with an object lesson in the unifying and non-threatening nature of emoji. Perhaps the world can, indeed, be united for the better by this new, quasi-universal form of communication.

Communication and emotional intelligence

Setting aside dumplings, one of the serious questions surrounding the rise and rise of emoji is this: why has the uptake of emoji grown exponentially, and why is it a truly global system of communication?  Some see emoji as little more than an adolescent grunt, taking us back to the dark ages of illiteracy.  But this prejudice fundamentally misunderstands the nature of communication. And in so doing it radically underestimates the potentially powerful and beneficial role of emoji in the digital age as a communication and educational tool.
All too often we think of language as the mover and the shaker in our everyday world of meaning.  But, in actual fact, most of the meaning we convey and glean in our everyday social encounters comes from nonverbal cues.  In the spoken medium, gesture, facial expression, body language and speech intonation provide a means of qualifying and adjusting the message conveyed by the words.  A facial wink or smile nuances the language, providing a crucial contextualisation cue, aiding our understanding of the spoken word.  And intonation not only ‘punctuates’ our spoken language – there are no white spaces and full stops in speech that help us identify where words begin and sentences end – it even provides ‘missing’ information not otherwise conveyed by the words.
Much of our communication is nonverbal.  Take gesture: our gestures are minutely choreographed to co-occur with our spoken words. And we seem unable to suppress them. Watch someone on the telephone; they’ll be gesticulating away, despite their gestures being unseen by the person on the other end of the line. Indeed, if gestures are suppressed, in lab settings say, then our speech actually becomes less fluent. We need to gesture to be able to speak properly.  And, by some accounts, gesture may have even been the route that language took in its evolutionary emergence.

Eye contact is another powerful signal we use in our everyday encounters.  We use it to manage our spoken interactions with others.  Speakers avert their gaze from an addressee when talking, but establish eye contact to signal the end of their utterance. We gaze at our addressee to solicit feedback, but avert our gaze when we disapprove of what they are saying. We also glance at our addressee to emphasise a point we’re making.
Eye gaze, gesture, facial expression, and speech prosody are powerful nonverbal cues that convey meaning; they enable us to express our emotional selves, as well as providing an effective and dynamic means of managing our interactions on a moment-by-moment timescale.  Face-to-face interaction is multimodal, with meaning conveyed in multiple, overlapping and complementary ways.  This provides a rich communicative environment, with multiple cues for coordinating and managing our spoken interactions.

Digital communication increasingly provides us with an important channel of communication in our increasingly connected 21st century social and professional lives. But the rich, communicative context available in face-to-face encounters is largely absent.  Digital text alone is impoverished and emotionally arid.  Digital communication, seemingly, possesses the power to strip all forms of nuanced expression even from the best of us.   But here emoji can help: it fulfils a similar function in digital communication to gesture, body language and intonation, in spoken communication.  Emoji, in text messaging and other forms of digital communication, enables us to better express tone and provide emotional cues to better manage the ongoing flow of information, and to interpret what the words are meant to convey.

It is no fluke, therefore, that I have found, in my research on emoji usage in the UK, commissioned by TalkTalk Mobile, that 72% of British 18-25 year olds believe that emoji make them better at expressing their feelings.  Far from leading to a drop in standards, emoji are making people – especially the young – better communicators in their digital lives.



[ii] SwiftKey, April 2015

[iii]  (accessed 8th July 2015 7.30pm BST).


The Study of Language by George Yule | 5th Edition

The Study of Language has proven itself to be the student and instructor choice for first courses in language and linguistics because of its accessible approach to what is often a complicated subject. In every edition, readers have praised the book for being easy to follow, simple to understand, and fun to read, with its quirky anecdotes and examples of languages from around the world. Now in its fifth edition, it is further strengthened by the addition of new student ‘tasks’ (guiding readers to connect theory to real-world scenarios), the inclusion of examples from even more foreign languages, and updates to the text that reflect the most current linguistic theory. We will also be offering an enriched learning experience with our new enhanced eBook (publishing in Autumn), which will include pop-up glossary terms, embedded audio and interactive questioning. All of these features make this the most student-friendly edition of the textbook yet.


The Study of Language

Paragraph above by Valerie Appleby, Development Editor, Cambridge University Press

Romanian Words of Turkish Origin

by Julie Tetel Andresen
Duke University, North Carolina

My favorite words in Romanian are those of Turkish origin. Because parts of present-day Romania were under Ottoman rule for a long time, it’s natural that Romanian would have lexical borrowings from Turkish. One is the word for tulip. Now, tulips are not native to Holland. They are native to Central Asia, and in the eighteenth century there was a craze for tulips at the Ottoman court, and images of tulips could be found on clothing and furniture, while real tulips flourished in gardens and parks. Still today the tulip is a symbol for Turkey. The English word ‘tulip’ comes from the Turkish word tulbend ‘turban’ because the flower resembles the shape of a turban. However, the Turkish word is lâle, and the Romanian word is lalea.

Why do I like this word? Because it’s fun to say, especially in the plural: ‘tulips’ is lalele and ‘the tulips’ is lalelele. There’s ‘coffee’ cafea, ‘coffees’ cafele, and ‘the coffees’ cafelele. Same goes for ‘hinge’ balama, plural ‘hinges’ balamale and ‘the hinges’ balamalele and for ‘crane (piece of construction equipment)’ macara, ‘cranes’ macarale and ‘the cranes’ macaralele. Not all Turkish borrowings have the phonetic form that generates these plurals, and not all words in Romanian with this plural type come from Turkish, but most of them do.
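For readers who like to see a pattern spelled out, the plural forms above can be sketched as a tiny Python rule. This is only a toy generalization from the handful of examples in this post, not a description of Romanian morphology as a whole; the function names `plural` and `definite_plural` are invented for the illustration, and the rule assumes a noun of this particular type, ending in -ea or -a.

```python
def plural(noun):
    """Form the plural of a noun of the lalea / balama type.

    Toy rule inferred from the examples in the post:
    final -ea becomes -ele, and final -a takes an added -le.
    """
    if noun.endswith("ea"):
        return noun[:-2] + "ele"
    if noun.endswith("a"):
        return noun + "le"
    raise ValueError(f"{noun!r} does not fit this plural pattern")


def definite_plural(noun):
    """Add the plural definite article -le to the plural form."""
    return plural(noun) + "le"


for word in ["lalea", "cafea", "balama", "macara"]:
    print(word, plural(word), definite_plural(word))
```

Seen this way, the fun of lalelele is easy to explain: the plural already ends in -le, and the definite article stacks one more -le on top.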

The other reason I like Turkish borrowings in Romanian is they often come with nice semantic twists. The word belea is usually used in the plural belele and means ‘troubles,’ which is tinged almost, but not quite, with a sense of the ridiculous. When I think of ‘my troubles’ as belelele mele, they don’t seem so bad. And what could be better than the word beizadea ‘son of a bei, a high ranking Turkish official’? It would never be used in Romanian as a compliment, and we need such a word in English, because entitled spoiled brat doesn’t quite cover it.

Finally, there’s the Romanian word for ‘neighborhood, suburb’ mahala, and it, too, is freighted with negative connotations. The politică de mahala, which includes personal attacks and reckless speech, would characterize much of what’s gone on in Washington DC in recent years. Those readers with knowledge of Arabic will recognize the root halla ‘to lodge’ with the place prefix ma-, making a word that means something like ‘building.’ So, the Turkish borrowing is itself a borrowing from Arabic. This word was also borrowed into Persian and is immortalized in the name Taj Mahal, which means in Persian ‘best of buildings.’ So, in the western extent of this etymon, we have a down-market usage, while in the eastern extent, we find something beautiful. Romania has its beauties, too. They’re found in the language.

The origin of language in gesture–speech unity

Part 4:  Mead’s Loop (2). Wider consequences.

David McNeill, University of Chicago

As it evolved Mead’s Loop created “new actions,” as mentioned previously.  New actions are one of the “wider consequences” of Mead’s Loop. Action itself was a target of natural selection, and the new actions emerged organically. They did not need a separate evolution. A second consequence is metaphoricity. A third is the emblem, a culturally established gesture with metaphoricity at the core. A fourth is how children acquire language – twice, the first of which goes through the equivalent of extinction.  A fifth (many more can be identified) is what phenomenologist philosophy calls “being” – “inhabiting” gesture and speech, rather than only displaying them as elements of communication.

1. New actions and old. What remains of “old actions” in the new world? Many consider actions to be the source of gestures, that by adopting an action-based semiotic a gesture of something being flat is a truncated version of “making something flat.”

The idea that gestures derive from actions is plausible at first glance but there is more (or less) than meets the eye. A gesture may look like a pragmatic action but the action has changed at its core. To describe a gesture as “outlining” or “shaping” is useful as a description but to also say that such a practical action is still within the gesture is to disregard what makes the gesture a human sign.

In the illustration, a “rising hollowness” gesture looks like the action of lifting something in the hand, but it is not lifting at all. It is an image of the character rising, of the interior of the pipe through which he rose, and of the direction of his motion upward – all compacted into one symbolic form to differentiate a field of meaningful equivalents having to do with HOW TO CLIMB A PIPE: ON THE INSIDE. This complex idea, as a unity, orchestrated the hand shape and movement; it is the same motor response but it is not the same action as lifting up an object.

While a gesture may engage some of the same movements and tap in part the same motor schemas as an “action-action” it has its own thought–language–hand link. In keeping with the unity of speech and gesture, the manual movements of gestures as well as the actions of speech are co-opted by Mead’s Loop and orchestrated in new ways by significances other than those of the original actions. We observe the separation of “action-actions” and “gesture-actions” directly in the IW case, where there has been a complete deafferentation.  Without vision, action-actions and gesture-actions dissociate, the first being impossible while the second are normal (IW is described fully in a later post).

Some gestures do depict actions; they ritualize the actions they depict. They are more than actions – a kind of performance, a replication of the action, and may also include posture, spatial location, voice, etc. as well as the manual action.

These gestures are of two kinds. Some are pantomimes at their own locus on the continuum of gesture types. Others are “character viewpoint” (C-VPT) gestures, gesticulations with the viewpoint of the character that is being recounted. And here again the difference between pantomimes and gesticulations applies.

The C-VPTs pass through the thought–language–hand link. Unlike pantomimes they are co-expressive with speech. Their character’s viewpoint is part of the semiotic which speech opposes in a dialectic. A C-VPT is, among other things, not an O-VPT or “observer viewpoint,” a contrast that it has but that is not part of a pantomime.

Part of the scope, creativity and distinctiveness of human thought lies precisely in its freedom from pragmatic action constraints. When the hands make a gesture, it is thought that controls them and not a hidden action with its own purposes in relation to the physical world.

2. Metaphoricity is the semiotic basis of metaphor, and it also arises out of Mead’s Loop’s “new actions.” It is the mental ability (apparently unique to humans) to experience one thing in terms of something else (“metaphors” = specific cultural-linguistic packages, or temporary individual impromptu ones, that rest on this semiotic). Thinking with metaphors is natural and irresistible, and is explained if metaphoricity (not necessarily any specific metaphor) was a product of how language began. According to Mead’s Loop metaphoricity has existed from the very beginning.

The metaphoricity semiotic came about when the orchestration of actions of the vocal tract and hands was undertaken by something other than those actions, by meaningful gesture imagery, one thing (voice, hands) gaining significance in terms of something else that it is not.

Examples are the hands rotating in the Part 1 illustration – a process as a rotation (co-expressive with “barreling up,” a spoken version of the same metaphor). A novel, not culturally shared, impromptu example is a gesture for inaccessibility – hands separated, one above, the other below, Tweety on his perch, Sylvester on the street below.  The speaker first struck this pose as she said, “part of the problem is that Tweety Bird’s inaccessible,” and then struck it six more times as she listed the various attempts that Sylvester had made to reach Tweety, each gesture starting as an iconic depiction of the attempt and seguing into the metaphor, conveying the futility of the attempt because of this inaccessibility.

3. Emblems as metaphors.  Metaphoricity shows up spontaneously in these impromptu gestures made on the fly by individual speakers but it also appears in gestures of the seemingly opposite kind – culturally ratified gestures of the kinds listed in dictionaries of “Neapolitan Gestures” or “Persian Gestures” and the like – the so-called emblems.  Many if not all cultural emblems contain probably originally impromptu metaphors at the core. The very possibility of an emblem can be considered another consequence of Mead’s Loop, by way of metaphoricity.

Cultures imbue certain metaphors (arguably ones reflecting cultural values and history) with standards of form and specific functions.  This moves them from impromptu gesticulation toward (but not quite reaching) the sign language end of the gesture continuum.

The “OK” sign (sometimes called “the ring”) is a convenient example that appears in many cultures. It is a culturally mandated version of gestures for precision. In performing them one experiences the abstract idea of precision as a feeling of minimization of the space between surfaces; surfaces not automatically in contact but which in the gesture touch. The OK sign’s meaning as an emblem reflects this precision origin, something “being OK,” earning approbation, because it “precisely” meets the requirements at hand. In raw form a precision gesture can be made in various ways, and it has variety even with a single hand, any finger in contact with the thumb (the thumb is invariable for anatomical reasons, but the different fingers contacting it may have their own significances).

The “OK” emblem restricts the handshape – only the forefinger makes contact, the other fingers extending outward. Meaning is likewise restricted – approbation because something is precise, is just so (like the spoken, “that’s it!”). Reflecting its precision source “OK” differs from another approbation emblem, thumbs-up, which has its own metaphor – “up is good.”  Thumbs-up (or -down) uses pointing to indicate the metaphor proper. The upturned thumb indicates the location of the good in “up-is-good” (conversely, thumbs-down indicates the location of the bad in “down-is-bad”). (The “precision” meaning of “OK” also rules out another theory that the gesture has nothing to do with metaphor but reproduces the letters “O” and “K” – then why is precision the core?)

Emblems do not reach the sign language end of the gesture continuum because, while they can co-occur with other emblems, the combinations lack any stable syntagmatic value: waving the “OK” sign back and forth could be “not OK,” or “everything is OK,” or “look, it’s OK!” a range so broad and replete with contradictions that it is fundamentally non-languagelike.

Children pick up some emblems as early as the first birthday (waving “bye-bye” and others) but it is hugely doubtful that metaphoricity plays any part. A metaphor source of any kind is unlikely (to say the least, if the source of “bye-bye” is something like wiping a situation or oneself away).

So-called “child metaphors” likewise probably do not involve metaphoricity – a 24-month-old saying “cup swimming” as he pushes a cup along in his bath or “I’m a big waterfall” as he slides down his father’s side while wrestling is presumably not experiencing the cup’s motion as swimming or his own motion as a waterfall.  Instead, there is a piling on of descriptions as they are “shared with the other,” which is quite a different thing (cf. Werner & Kaplan’s remark: the speech of children this young has “the character of ‘sharing’ experiences with the other rather than of ‘communicating’ messages to the other”).

4. What it means if children acquire language twice. Of all the possible indicators of gesture-first it is early ontogenesis that most convincingly suggests it may once have existed. Something like a recapitulation of it arises and performs a scaffolding function like that envisioned by gesture-first advocates. But it dies out and is followed by a kind of extinction during a transitional period from 2 to 3 years roughly.

GPs then emerge “late,” at age 3 or 4 years, with several indications of dual semiosis emerging at the same time, suggesting that gesture-first had once existed (both in children and in the ancient past) but went extinct and a new form of language followed, where speech and gesture imagery merged into the unified packages inhabited by thought and being that we see in ourselves.

That is, language appears to emerge in the child twice, with the first emergence extinguishing – children first acquiring a single semiotic language of which a gesture-first creature also would have been capable; later developing the dual semiotic language we all carry with us.

This style of argument – resting on ontogeny-recapitulates-phylogeny – has often been derided but there has been a recent revival of interest in it. It can be useful and heuristic for sorting out steps in phylogenesis.

For current-day children, the argument implies, contrary to a longstanding assumption that children develop more or less continuously (perhaps with stages, but earlier acquisitions still carrying forward), that ontogenesis is not cumulative; it is a mixture of continuity and discontinuity.

Discontinuities come from the recapitulation of the two origins. Continuities come from an autonomous development of speech control. Speech in the child separates from Mead’s Loop, its evolutionary origin according to theory, which could be due to other evolutionary pressures that adapted speech to garner parental attachment (“baby-talk” is the adult half of the same adaptation).

The early single-semiotic acquisition is limited, much as Bannard et al. remark:

“…children’s speech for at least the first 2 years of multiword speech is remarkably restricted, with constructions being seen with only a small set of frequent verbs … and many utterances being built from lexically-specific frames.”

Limitation is seen in what Braine called “pivot grammars” and Lieven et al “templates.”  Pivots could have been the highest reach of gesture-first.  The table below is an example from Braine.  There would be as many “grammars” as there are pivots, possibly in the hundreds for gesture-first creatures (such a language would not have one of the major features of human language, the infinite productivity in which there is no last sentence).

In other words, the first steps children take toward language may not lead to language, but to something coming from a long-extinct creature; then a second origin of the language that we take for granted, but this is not until the first has extinguished. The single-semiotic gestures without co-expressive speech of the first phase – pointing, pantomime, emblems, action-stubs, diffuse motor responses – are quite different from those of the last – dual semiotic and unified with speech.

A pivot “grammar” with  “want”

want     + { baby

We can take the recapitulation argument a step further: when something emerges in current-day ontogenesis only at a certain stage we reason (in this way of arguing) that the original natural selection of the feature (if any) took place in a similar psychological milieu in phylogenesis. We exploit the fact that children’s intellectual status is not fixed; it is changing. Thus we look for new states that seem pegged to steps in the ontogenesis of growth points and Mead’s Loop underlying them, and consider these steps as possible windows onto phylogenesis.

Using this argument, we are able to look at the ontogenesis of the GP and formulate possible phylogenetic landmarks.  Most importantly, the GP’s emergence seems tied to the development of the child’s self-aware agency, appearing first at age 4 or so, suggesting that in phylogenesis a similar sense of one’s own agency was a condition for Mead’s Loop, a plausible hypothesis given that Mead’s Loop is adaptive when the adult sees her own gesture–speech as social/public. It made “instruction” possible as opposed to “doing” with an onlooker.

5. Material carriers, “inhabitance,” and cognitive being. Another consequence of Mead’s Loop is what Vygotsky termed the material carrier – the embodiment of meaning in enactments or material experiences. Having a material carrier enhances the symbolization’s experiential potency. The speaker/hearer “inhabits” the materialized symbols.

Experiential enhancement of language is possible if the gesture is the image in an imagery–language dialectic, not an “expression” or “representation” of it, but is it. From this viewpoint, a gesture, the global-synthetic whole, is an image in its most developed – that is, in its most materially, naturally embodied – form. The absence of a gesture is the converse, an image in its least material form.

The material carrier concept thus explains how an imagery–language dialectic still is possible in the absence of visible gestural movement. When there is no overt gesture there is still imagery and with linguistic categorization a dialectic, still a simultaneous rendering of meaning in opposite semiotic modes – the dialectic in its essentials – but bleached and at the lowest level of materialization. This leads us to expect that gestures are more elaborate, more materialized and more frequent – more “existent” – when the gesture has greater newsworthiness, as we shall see in the next post of this series.

The source of the material carrier effect is ultimately Mead’s Loop with gesture-actions orchestrated under significances other than action-actions: materialization follows ineluctably. Materialization implies that the gesture, this natural material carrier, the actual motion of the gesture itself, is a dimension of meaning.

The concept of a material carrier is brought to a whole new level when we turn to Merleau-Ponty for insight into the unity of gesture and language and what we expect of gesture in a dual semiotic process.

Gesture, the instantaneous, global, nonconventional component, is “not an external accompaniment” of speech, which is the sequential, analytic, combinatoric component; it is not a “representation” of meaning, but instead meaning “inhabits” it (partly quoted in Part 3):

The link between the word and its living meaning is not an external accompaniment to intellectual processes, the meaning inhabits the word, and language “is not an external accompaniment to intellectual processes” (Merleau-Ponty’s quotation is from Gelb and Goldstein 1925). We are therefore led to recognize a gestural or existential significance to speech . . . Language certainly has inner content, but this is not self-subsistent and self-conscious thought. What then does language express, if it does not express thoughts? It presents or rather it is the subject’s taking up of a position in the world of his meanings. (p. 193).

The GP is geared to this “existential content” of speech – this “taking up a position in the world.”  Gesture, as part of the GP, is inhabited by the same “living meaning” that inhabits the word (and beyond, the discourse).

A deeper answer to the query – when we see a gesture, what are we seeing? – is that it is part of the speaker’s current cognitive being, her very mental existence, at the moment it occurs. This extends the material carrier and ultimately rests on a gesture–speech unit. By performing the gesture, a core idea is brought into concrete existence and becomes part of the speaker’s existence at that moment.

The Heideggerian echo in this statement is not accidental. Following Heidegger’s emphasis on being, a gesture is not a representation, or is not only such: it is a form of being. From a first-person perspective, the gesture is part of the immediate existence of the speaker. Gestures (and words, etc., as well) are themselves thinking in one of its many forms – not only expressions of thought, but thought, i.e., cognitive being, itself. To the speaker, gesture and speech are not only “messages” or communications, but are a way of cognitively existing, of cognitively being, at the moment of speaking.

The speaker who creates a gesture of Sylvester rising up fused with the pipe’s hollowness is, according to this interpretation, embodying thought in gesture, and this action – thought in gesture-action over the thought–language–hand link – was part of the person’s being cognitively at that moment.

To make a gesture, from this perspective, is to bring thought into existence on a concrete plane, just as writing out a word can have a similar effect. There is not a causal sequence: thought → speech/gesture. Speech and gesture are the thought coming into being at that instant. The greater the felt departure of the thought from the immediate context, the more likely its materialization as a gesture, because effort adds to being. Thus, gestures are more or less elaborated depending on the importance of material realization to the existence of the thought.

6. The theater of the mind, closed. The “H-model” avoids the homunculus problem encountered by the third person perspective inherent to the concept of a “representation” and with it the “theater of the mind” problem.  The theater of the mind is the presumed central thinking area in which representations are “presented” to a receiving intelligence. The possibilities for homunculi – each with its own theater and receiving intelligence – spiraling down inside other homunculi are well known.  In the H-model, there is no theater and no extra being; the gesture is, rather, part of the speaker’s momentary mode of being itself, and is not “watched.”  The theater is closed or, rather, it never opened.

Further Reading


Kendon, Adam. 2009. ‘Manual actions, speech and the nature of language.’ In Gambarara, Daniele and Givigliano, Alfredo (eds.). Origine e sviluppo del linguaggio, fra teoria e storia. Pubblicazioni della Società di Filosofia del Linguaggio, pp. 19-33. Rome: Aracne editrice s.r.l.

Kendon, Adam. 2010. ‘Accounting for forelimb actions as a component of utterance: An evolutionary approach.’ Plenary Lecture. International Society for Gesture Studies, Frankfurt/Oder, July 25, 2010. (abstract at, accessed 09/10/12).

LeBaron, Curtis and Streeck, Jürgen. 2000. ‘Gestures, knowledge, and the world,’ in D. McNeill (ed.) Language and Gesture, pp. 118-138. Cambridge.

Streeck, Jürgen. 2010. Gesturecraft: The manu-facture of meaning. Benjamins.

Child language and ontogeny recapitulates phylogeny

Bannard, C., Lieven, E., and Tomasello, M. 2009. ‘Evaluating constructivist theory via Bayesian modeling of children’s early grammatical development.’ Abstract posted on the International Cognitive Linguistics Conference website, accessed 03/30/09.

Braine, Martin D. S. 1963. ‘The ontogeny of English phrase structure: the first phase.’ Language 39:1-13.

Butcher, Cynthia & Goldin-Meadow, Susan. 2000. Gesture and the transition from one- to two-word speech: When hand and mouth come together. In D. McNeill (ed.), Language and Gesture, pp. 235-257. Cambridge.

Goldin-Meadow, Susan & Butcher, Cynthia. 2003. Pointing toward two-word speech in young children. In S. Kita (ed.), Pointing: Where language, culture, and cognition meet, pp. 85-107. Erlbaum.

Levy, Elena. 2011. ‘A new study of the co-emergence of speech and gestures: Towards an embodied account of early narrative development.’ Language Fest, University of Connecticut, Storrs, CT.

Lieven, Elena, Salomo, Dorothé and Tomasello, Michael. 2009. ‘Two-year-old children’s production of multiword utterances: A usage-based analysis’, Cognitive Linguistics. 20: 461-507.

MacNeilage, Peter F. 2008. The Origin of Speech. Oxford.

Werner, Heinz and Kaplan, Bernard. 1963. Symbol Formation. Wiley.

Material carriers, inhabitance, and cognitive being

Dreyfus, H. 1994. Being-in-the-World: A Commentary on Heidegger’s Being and Time, Division I. MIT.

Gallagher, Shaun.  2005. How the Body Shapes the Mind. Oxford.

Merleau-Ponty, Maurice. 1962.  Phenomenology of Perception (C. Smith, trans.). Routledge.

Quaeghebeur, Liesbet. 2012. The ‘All-at-Onceness’ of embodied, face-to-face interaction. Journal of Cognitive Semiotics 4: 167-188.


Cienki, Alan and Müller, Cornelia. 2008. Metaphor and Gesture. Benjamins.

Lakoff, George and Johnson, Mark. 1980. Metaphors We Live By. Chicago.

Müller, Cornelia. 2008. Metaphors – Dead and Alive, Sleeping and Waking. A Dynamic View. Chicago.


Carlson, Patricia and Anisfeld, Moshe. 1969. ‘Some observations on the linguistic competence of a two-year-old child.’ Child Development 40:569-575.

Theater of the mind

Dennett, Daniel C. 1991. Consciousness Explained. Little, Brown.

David McNeill is a professor in the Departments of Linguistics and Psychology at the University of Chicago.

His new title How Language Began: Gesture and Speech in Human Evolution is now available from Cambridge University Press at £19.99/$36.99


The origin of language in gesture–speech unity

Part 3: Mead’s Loop (1).

by Professor David McNeill

Part 1 of this series put forth the idea that language is inseparable from imagery, in particular the imagery of gesture, and that theories of language origin can be judged by how well they predict this gesture–speech unity.  The second part applied the test to a widely held origin theory, gesture-first, and found it wanting – doubly so, in fact. This part applies the test to a new hypothesis, which I call “Mead’s Loop.”

Mead’s Loop holds that gesture was essential in the origin of language.  In this it agrees with gesture-first, but differs in that, it says, gesture and speech had to be naturally selected together.  Rather than gesture-first (or speech-first), gesture and speech were what Liesbet Quaeghebeur, philosopher at the University of Antwerp, has called “equiprimordial,” the antithesis of gesture- or speech-first.

Mead’s Loop rests upon an idea from the early 20th Century philosopher, George Herbert Mead, formulated as an origin hypothesis to portray what, some one-half to one million years ago, emerged in the evolution of the human brain. It posits the mirror neuron circuits that gesture-first also assumes, but again with a difference. Mirror neurons in Mead’s Loop were “twisted” to respond to one’s own gestures as if they were from someone else.

Mirror neurons have been directly recorded in monkeys and supposedly reside in all primate brains, including ours.  Part 2 quoted Rizzolatti and Arbib’s definition. A Wikipedia article also defines the term succinctly: “[a] mirror neuron is a neuron that fires both when an animal acts and when the animal observes the same action performed by another.” I call these mirror neurons “straight,” to distinguish them from Mead’s Loop.  Note what they provide.  The significance of the straight mirror neuron response is that of the action it mimics.  For example, when one sees someone picking up a treat, the mirror neuron repeats this action, with its meaning. The action of another is repeated (not necessarily overtly but at the orchestration level) and it becomes one’s own. If the mirror neuron circuit produces a gesture it will be a mimicked action like the one perceived. It in fact resembles pantomime, a gesture, as we saw in Part 2, that systematically blocks gesture–speech unities.

Mead’s Loop refers to a posited new adaptation, a thought-language-hand link, located at least in part in the area now called Broca’s Area (other brain areas also must have been involved). Here is the twist:  G. H. Mead said that a gesture is meaningful when it evokes the same response in the one making it as it evokes in the one receiving it. For evolution, this suggests that mirror neurons came to bring one’s own gesture imagery and its significance into Broca’s Area, the motor area for orchestrating actions including speech and gesture. While straight mirror neurons reproduce the actions of another, with meanings that are those of the actions, the Mead’s Loop twist responds to one’s own gestures as if from another, and brings different meanings into the action-orchestration areas of the brain, those of the gestures.

The Mead’s Loop twist, because it brings the gesture’s meaning into the orchestration process, merges and synchronizes speech with gesture at points where they co-express the same idea. Hence the unity: it is built in. In all of this the gesture is fundamental. Mead’s Loop creates “new actions,” actions orchestrated under significances other than their practical goal-directed meanings – those of the gestures that Mead’s Loop imports.  Because Mead’s Loop gave gestures the power to orchestrate speech, Mead’s Loop was the beginning of everything in language.

These achievements opened a door to language dynamically. Mead’s Loop had both semiotic and motor effects:

  • Semiotically, it brought the gesture’s meaning into the mirror neuron area. Mirror neurons no longer were confined to the semiosis of actions. One’s own gestures entered, opening action control to the imagery of gesture. Extended by metaphoricity, the significance of imagery is unlimited. So from this one change, the meaning potential of language moved away from only action and expanded vastly.
  • At the motor level, in the areas of the brain where speech movements are orchestrated, Mead’s Loop enabled significant imagery – gesture – to “chunk” motor control of the vocal tract and diaphragm, and laid the foundation of the GP.

How does Mead’s Loop produce gesture–speech unity?  As mentioned, it was built in from the start.  The evolutionary step was a self-response by mirror neurons.  Mirror neurons complete Mead’s Loop in a part of the brain where action sequences are organized – two kinds of sequential actions, speech and gesture, converging, with meaningful imagery the integral component. Co-opting sequential actions by a socially referenced stimulus (imagery) provides a new kind of action in the vocal tract – speech, with its own movements, timing, tongue postures, and breathing. It thus explains what gesture-first could not: why gesture and speech are unified.

By treating imagery as a social stimulus Mead’s Loop also explains why gestures occur preferentially in a social context of some kind (face-to-face, on the phone, but not alone talking to a tape recorder).

But was the twist needed?  It was, because the gesture, although emanating with full meaning from the same brain area as speech, does not unite with it. It is neither synchronous nor co-expressive. It is incomplete. Gesture–speech unity happens only when the gesture gets a self-response via Mead’s Loop and becomes able to orchestrate speech movements (not sequentially, but self-response is an essential aspect of the gesture’s meaning, the meaning under which speech and gesture combine). This was Mead’s insight. He recognized that gesture (and speech) have fundamentally a social character and, to be meaningful, must have a social/public presence: the gesture, he said, evokes the same response in the one making it as in the one receiving it. With Mead’s Loop, this occurs when the one making and the one receiving are the same; this is the “twist”; then the gesture is a meaningful and socially pertinent event, with the potential to connect to everything else in language dynamically. It orchestrates vocal and manual movements.

Then speech passes from “display” (of which chimps are capable) to “communicating messages to the other” (a phrase from Werner & Kaplan).

Straight mirror neurons do not respond because there is no external action; only the “twist” can self-respond in this way.

A self-response to the gesture can pick up other meanings as well, and these can further cement gesture–speech unity.  In the process the gesture also changes to meet its role of forming a unit with speech. In Part 4 of this series I’ll show gestures reshaped by gesture–speech unity.

Mead’s Loop also gains substance for a reason identified by Merleau-Ponty: “Language … presents or rather it is the subject’s taking up of a position in the world of his meanings” (p. 193).  Via Mead’s Loop and its social reference, the gesture takes up its position in the world of meanings as well.  This move equally reshapes the gesture, in keeping with gesture–speech unity.

That Mead’s Loop gave one’s gestures a public, social significance had importance for another reason, natural selection. It meant that the Mead’s Loop twist was adaptive in social-interactive situations (so those favorites, “man-the-tool-maker” and “man-the-hunter,” would be incidental to language origin, effective insofar as they are also social but not significant in themselves). The social reference gave adults, in particular mothers inculcating cultural norms in infants, the sense of being an instructor as opposed to being just a doer with an onlooker (which is what happens with chimpanzees).  Entire cultural practices of human childrearing depend upon this sense. The adult must be sensitive to her own gestures as social/public actions. Hence the adaptiveness of Mead’s Loop.  Sensing actions as social impacts the next generation of children who, as a result of it, do better at coping, and pass it on.

Origin of syntax.  To many the origin of patterned language, of syntax, is the crux of the origin of language as a whole. How, when, or even why syntax emerged is far from obvious. Proposals range from a “big bang” single mutation, through cultural practices such as ritual or grooming, to no special sources at all, just a natural by-product of human intelligence in general.  Whatever it was, over eons it has led to vast crosslinguistic diversity. I follow Eric Lenneberg and affirm that syntax rests on a biological foundation, hence is a topic in the origin of language.

The basic idea stemming from Mead’s Loop is that words and syntax are continuations of GPs. They and GPs are linked organically. I seek the natural selection of syntax (the general ability, not specific constructions, although some constructions also could have been naturally selected) in three places – the nature of the GP and its unpacking; the new paths this opened; and shareability. These in turn suggest three kinds of adaptive advantages.

First, syntax is crucial for a GP dialectic. Without morphs and combinations of morphs there cannot be a semiotic opposition to gesture imagery.

Second and linked, syntax stabilizes the dialectic. It is the resting point par excellence.

Third, syntax helps make language shareable in sociocultural encounters.

Any or all of these factors could have favored an ability to form syntactic patterns, defined generally as creating meaningful wholes out of segmented elements (morphs); meeting standards of form; providing cultural identity; learning this system, and transmitting and maintaining it over space and time. We are focusing on the dynamic dimension of language. This dimension crosscuts the static and is not reducible to it (nor vice versa, the static is not reducible to the dynamic; they are two dimensions, not one dimension in two forms).

That the static and dynamic arose together, were equiprimordial, is explained by Mead’s Loop’s built-in social referencing, combined with gesture imagery. From this vantage point, we can claim that words and sentences continue the evolution of the GP. Contrary to traditions both philological and Biblical, language did not begin with a “first word.” Words emerged from GPs. There was an emerging ability to differentiate newsworthy points in contexts; a first gesture–speech unit but not a first word.

The paradox of an emerging syntax is that it is almost invisible in current humans. Children learn their language with speed, but they are given a language, not inventing one. When gestures are forced to be the sole medium of communication in experiments, however, they quickly develop linguistic values original to the speaker and the situation, not borrowed from an existing language, including novel axes of selection (paradigmatic values) and combination (syntagmatic values), suggesting a faculty for syntactic innovation in current-day humans. It is this hidden ability we propose that arose out of GPs at the origin.

Whence such a faculty for syntactic invention?  An important insight is “shareability” from a 1983 paper by Jennifer Freyd. To share information imposes a “discreteness filter” such that the semiotic properties of words (discreteness) and word combinations arise.  Shareability would have existed at the dawn.  It also existed in the gesture-communication experiments, so conditions for new word forms and combinations existed in both. Words and combinations of words are part of the GP’s imagery-language dialectic, both opposing codified linguistic form to gesture semiotically, and providing a dialectic stop-order through unpacking.

GPs and syntax thus emerged together according to Mead’s Loop, and could do so because in Mead’s Loop the gesture assumes the guise of a social other and invites shareability from the beginning. Here began the static dimension of language.

Most of the static dimension, however, is not biological but socio-cultural and historical, shaped over time. To have forms that are repeatable, standardized, and non-context-bound makes them durable and portable from encounter to encounter, where they can be reshaped by intragroup and intergroup encounters, including migrations where newcomers encounter existing populations (there may be spontaneous “mutations” beyond encounters as well).

This in itself would have given syntactic innovation adaptive value and replaced temporal order syntax with morph elaborations, releasing static structure meanings from temporal sequence. The primordial syntax according to Mead’s Loop was mapping meanings onto temporal sequences. The orchestration of actions under some significance with shareability allots meaning fragments to ordered segments of time.  The response to encounters is to shake up this temporal syntax. Given gesture–speech unity, gestures change as well (the examples in the fourth part of this series illustrate this as well).

The cumulative effect would have been to liberate temporal sequence for other expressive functions, some of which may also take part in imagery–language dialectics on multiple levels. Edward Sapir long ago divided the world’s languages according to how they combine meanings into single words – analytic or isolating, relying on temporal sequence (e.g., Chinese), synthetic, with some liberation (e.g., Latin), or polysynthetic, with much freedom (e.g., Inuit), which reflect degrees of adornment of the basic brain orchestration plan (and English, with its relatively fixed word orders, is one of the less adorned).

Rethinking language as action control. Speech according to Mead’s Loop, among other things, is thus a culturally mandated action, orchestrated by imagery. Action is a target of natural selection in any case, and in the selection scenario where Mead’s Loop had adaptability, adults inculcating cultural norms in infants, the overt actions of the adults fed natural selection. Linguistic standards are not only about “good forms” but also about “good actions.”  The discovery of the FOXP2 gene points to the centrality of action control at the foundation of language. The mutation in the KE family that led to its discovery affects fine motor control, speech articulation and other actions, as well as syntax. As a gene affecting fine-tuned action control, it would influence the raw material on which Mead’s Loop and its new form of action worked (the Mead’s Loop innovation itself would be something else genetically). The gene (actually, a transcription factor, a genetic “on-off” switch), which differs in the human version compared to that in chimps, has undergone accelerated evolution and when implanted into engineered mice changes vocalization. Taking this lead, we can consider syntax as a form of culturally authorized action control of the vocal organs, the hands and other body parts.

Brain model. The language centers of the brain have classically been regarded as just two, Wernicke’s and Broca’s areas. But if we are on the right track with Mead’s Loop, many other areas of the brain are involved and are equally “language areas.” Broca’s Area itself is not a “language area” but a region for complex action orchestration under various significances. Typical item-recognition, memory and production tests would not tap these other brain regions, but discourse, conversation, play, work, and the exigencies of language in daily life (where language originated) would.  Broca’s area may be the convergence point of Mead’s Loop and the imagery–language dialectic, including unpacking, but other areas – the left rear hemisphere (categorial content in GPs), the right hemisphere (imagery and metaphor), and the prefrontal cortex (the alternatives a GP differentiates) – can equally be called the “language areas” of the brain. Thought-language-hand links tie them together when the dynamic dimension of language is engaged.

Selection scenario. The family, particularly in its child-rearing aspects, is an environment where the social/public value of one’s own gestures is adaptive, and where Mead’s Loop could have been naturally selected (no doubt Mead’s Loop was adaptive in other contexts as well). Archeologists date the dawn of family life (with cooking hearths) to about one million years ago, implying a stable family membership and a division of labor. So it was possibly back then that the natural selection of Mead’s Loop also began.

The focus of this selection pressure was adults, women in particular. In this scenario language began in adults, in the form of mothers instructing infants. Their infants, both female and male, would benefit from superior cultural inculcation, and so become more able to carry on any genetic disposition for Mead’s Loop themselves.

Did Neanderthals speak?  The Neanderthal genome project has shown that this extinct form of human also had FOXP2, and also may have been capable of fine motor control. Whether this control covered the vocal tract is unknown but speech seems not impossible.  Some have suggested that the Neanderthal brain, although large, had a different developmental time course from that of human children (much briefer) and did not sustain robust activity of the prefrontal cortex.  A short ontogenesis meant less time for any GP-like development. The prefrontal cortex, among its other functions, arranges and selects alternatives. The formation of the contexts that GPs differentiate is a place in language where this ability is tapped.  Weakened contexts would have yielded cognitive inflexibility and gesture–speech redundancy rather than unity. Any GP-like dynamics is thus also likely to have been muted.

Even if Neanderthals could speak, their speech is likely to have been temporally sequenced, and limited to what Derek Bickerton posited as proto-language and what Martin Braine called pivot grammars, each pivot a separate “grammar” unto itself. A collection of disparate pivot grammars, lacking an overall system, may have been their highest achievement. Possible gestures would be gesture-first-like pantomimes and pointing (available to today’s sub-two-year-olds).  The kindly opinion is that our ancestors had nothing directly to do with the Neanderthal extinction but we may have out-competed them. A cultural superiority over cognitive inflexibility and a limited, single semiotic (a profile not unlike Down syndrome) could have been fatal, if unintended.

Further Reading

Adult–Infant inculcation

Hrdy, Sarah Blaffer. 2009. Mothers and others: The evolutionary origins of mutual understanding.  Harvard.

Tomasello, Michael. 1999. The Cultural Origins of Human Cognition. Harvard.

Brain model

McNeill, David, & Pedelty, Laura. 1995. Right brain and gesture.  In K. Emmorey & J. Reilly (eds.), Sign, Gesture, and Space, pp. 63-85.  Erlbaum.

Nishitani, Nobuyuki, Schürmann, Martin, Amunts, Katrin and Hari, Riitta. 2005. ‘Broca’s region: from action to language.’ Physiology 20: 60-69.

Where language began

Atkinson, Quentin D. 2011. ‘Phonemic diversity supports a serial founder effect model of language expansion from Africa.’ Science 332: 346-349.

Mead’s Loop “twist”

Cohen, Akiba A. 1977. ‘The communicative function of hand illustrators.’ Journal of Communication 27: 54-63.

McNeill, David, Duncan, Susan D., Cole, Jonathan, Gallagher, Shaun and Bertenthal, Bennett. 2008. ‘Growth points from the very beginning.’ Interaction Studies (special issue on proto-language, D. Bickerton and M. Arbib, eds.) 9: 117-132.

Mead, George Herbert. 1974. Mind, self, and society from the standpoint of a social behaviorist (C. W. Morris ed. and introduction).  Chicago.

Merleau-Ponty, Maurice. 1962.  Phenomenology of Perception (C. Smith, trans.). Routledge.


Bickerton, Derek. 1990. Language and Species. Chicago.

Braine, Martin D. S. 1963. ‘The ontogeny of English phrase structure: the first phase.’ Language 39: 1-13.

Pääbo, S. and colleagues. 2009. News focus in Science 323: 866-871.

Rozzi, Fernando V. Ramirez and de Castro, José Maria Bermudez. 2004. ‘Surprisingly rapid growth in Neanderthals.’ Nature 428: 936-939.

Wynn, Thomas & Coolidge, Frederick. 2011. How to Think Like a Neandertal. Oxford.

Speech as action control

MacAndrew, Alec. ‘FOXP2 and the evolution of language.’

MacNeilage, Peter F. 2008. The Origin of Speech. Oxford.

“Straight” mirror neurons

Rizzolatti, Giacomo and Arbib, Michael. 1998.  ‘Language within our grasp.’  Trends in Neurosciences 21: 188-194.

Wikipedia article on the Mirror Neuron.

Syntax and shareability

Freyd, Jennifer J.  1983.  ‘Shareability:  The social psychology of epistemology.’  Cognitive Science 7: 191-210.

Lenneberg, Eric. 1967. Biological Foundations of Language. Wiley.

McNeill, David and Sowa, Claudia. 2011.  ‘Birth of a morph.’ In G. Stam and M. Ishino (eds.), Integrating Gestures: The Interdisciplinary Nature of Gesture, pp. 27-47. Benjamins.

Sapir, Edward. 1921. Language: An Introduction to the Study of Speech. Harcourt, Brace & World.

Thomason, Sarah. 2011. ‘Does language contact simplify grammars? (No).’ Talk given at the University of Chicago, April 12.




The origin of language in gesture–speech unity

Part 2: Gesture-first

By Professor David McNeill

This popular hypothesis says that the first steps of language phylogenetically were not speech, nor speech with gesture, but were gestures alone.  In some versions, it was a sign language. In any case, it was a language of recurring gesture forms in place of spoken forms. Vocalizations in non-human primates, the presumed precursors of speech without gesture’s assistance, are too restricted in their functions to offer a plausible platform for language, but primate gestures appear to offer the desired flexibility. Thus, the argument goes, gesture could have been the linguistic launching pad (speech evolving later). The gestures in this theory are regarded as the mimicry of real actions, a kind of pantomime, hence the appeal of mirror neurons as the mechanism. To quote Rizzolatti and Arbib (1998), in their exposition of gesture-first, mirror neurons are “neurons that discharge not only when the monkey grasped or manipulated the objects, but also when the monkey observed the experimenter making a similar gesture” (p. 188).  Current chimps show this kind of action mimicry (see illustration later in this post).

Did gesture scaffold speech, then speech supplant it?  Even if mirror neurons were a factor in the origin of language, our basic claim is that a primitive phase in which communication was by gesture or sign alone, if it existed, could not have evolved into the kind of speech–gesture combinations that we observe in ourselves today. We see two problems. First, gesture-first must claim that speech, when it emerged, supplanted gesture; second, the gestures would be pantomimes, that is, gestures that simulate actions and events. However, such gestures do not combine with co-expressive speech but rather fall into other slots on the continuum of gestures, supplements (rather than co-expressiveness) and pantomime.

Looking over a roster of gesture-first advocates, including several writing before the mirror neuron discovery, all say at some point that speech supplants the original gesture language, which then is marginalized. For example, Henry Sweet (said to be Shaw’s model in Pygmalion for Henry Higgins) wrote, “…gesture which later would be dropped as superfluous” (pp. 3-4).  More recently, Rizzolatti and Arbib said,  “… gesture became purely an accessory factor to sound communication.”  In all cases, as in these quotes and many others, gesture withers to the status of an “add-on.”

This is the first wrong assertion. Gesture-first commits one to the false prediction that speech replaced gesture rather than, as we see in ourselves, speech and gesture united as one “thing.”  We say that gesture-first incorrectly predicts that speech would have supplanted gesture, and fails to predict that speech and gesture became a single system. It thus is falsified – twice in fact. The contradiction of gesture-first is that speech supplants gesture, it says, yet ends up integrated with it. The logic of gesture-first, at its very core, means that supplantation, overt or hidden, is inescapable. This is why every advocate naturally posits it.

Empirically, there is a perfect correlation between advocating gesture-first and positing the supplantation step. Moreover, there is a conceptual point that explains it. It is important to see that gesture-first is a theory about the origin of speech (not gesture). Given that aim, it must logically consider that from gesture one gets to speech; and here supplantation enters: it is unavoidable. Even Sweet, who envisions a transition from hand gestures to tongue gestures, and with them to speech, wants to leave hand gestures out at the end as “superfluous.” He has no way to say from his several transitions that gestures in the end are other than left-overs.

When it emerged, why did speech not gradually integrate with gesture? This is possibly what “scaffolding” intends in part. But even if scaffolding took place it could only have been a temporary arrangement. For speech to become an autonomous system, sooner or later gesture and speech must have separated. The reason again lies in the gesture-first tenet. The whole logic of gesture-first is to picture one code coming after another. The models of supplantation immediately below show the effects. The most that can happen is that the codes divide the labor of communication, as will be seen with the second model. Even if speech integrates with a gesture-language (as a kind of vocal gesture) it must sooner or later become an encoded system of its own, and the would-be integration is lost. The first of the models shows this happening – two codes, one for gesture and one for speech refusing to synchronize. It does not help to point to gestures in non-linguistic primates. There is nothing in them to show how they could lead to language without encountering the same roadblock of supplantation.

Models of supplanting and scaffolding.  To see what may happen when two codes co-occur, as they would at the hypothetical gesture-first/speech supplantation crossover, we have two models: Aboriginal signs performed with speech, and hearing bilingual ASL signs with spoken English. In neither case is there the formation of packages of semiotic opposites, as the example in post 1 illustrated and the growth point explains. When a pairing of semantically equivalent gesture and speech is examined in these models, the two actively avoid speech–gesture combinations at co-expressive points. They repel each other in time or functionality or both, and do not coincide at points of co-expressivity.

1. Warlpiri sign language. Women use the Warlpiri sign language of Aboriginal Australia when they are under (apparently quite frequent) speech bans and also, casually, when speech is not prohibited. When the latter happens, signs and speech co-occur, which lets us see what may have occurred at the hypothetical gesture or sign-speech crossover. Here is one example from Kendon:

The spacing is meant to show relative durations, not that signs and speech were performed with temporal gaps (both were performed continuously). Speech and sign start out together at the beginning of each phrase but, since signing is slower, they immediately fall out of step. Each is on a track of its own and they do not unify. Speech does not slow down to keep pace with gesture, as would be expected if speech and gesture were unified (mutual speech–gesture slowing is shown by the deafferented patient, IW, “the man who lost his body,” described in post 3). They then reset (there is one reset in the example) and immediately separate again. So, according to this model, co-expressive speech–gesture synchrony would be systematically interrupted at the crossover point of gesture and speech codes. Yet synchrony of co-expressive speech and gesture is what evolved.

2. English-ASL bilinguals. The second model is Emmorey et al.’s observation of the pairings of signs and speech by hearing ASL/English bilinguals. While 94% of such pairings are signs and words translating each other, 6% are not mutual translations. In the latter, sign and speech collaborate to form sentences, half in speech, half in sign. For example, a bilingual says, “all of a sudden [LOOKS-AT-ME]” (from a Sylvester and Tweety narration; capitals signify signs simultaneous with speech). This could be “scaffolding” but it does not create the combinations of unlike semiotic modes at co-expressive points that we are looking for. First, signs and words are of the same semiotic type – segmented, analytic, repeatable, listable, and so on. Second, there is no global-synthetic component, no built-in merging of analytic/combinatoric forms with gesture’s global synthesis, and the spoken and gestured elements are not co-expressive but are the different constituents of a sentence. Of course, ASL/English bilinguals have the ability to form GP-style cognitive units. But if we imagine a transitional species evolving this ability, the bilingual ASL-spoken English model suggests that scaffolding did not lead to GP-style cognition; on the contrary, it implies two analytic/combinatoric codes dividing the work. If we surmise that an old pantomime/sign system did scaffold speech and then withered away, this leaves us unable to explain how gesticulation emerged and became engaged with speech. We conclude that scaffolding, even if it occurred, would not have led to current-day speech-gesticulation linkages.

Corballis, in his 2002 argument for speech supplanting a gesture-first system of communication, points out the advantages of speech over gesture. There is the ability to communicate while manipulating objects and to communicate in the dark. Less obviously, speech reduces demands on attention since interlocutors do not have to look at one another (p. 191). While valid, these qualities are not necessary. There are also positive reasons for gestures not being language-like, and they would be so even if gesture and speech co-evolved as a single adaptation. All across the world, languages are spoken/auditory unless there is some interference to the channel (deafness, acoustic incompatibility, religious practice, etc.), and no culture has a visual/gestural primary language. Susan Goldin-Meadow, Jenny Singleton and I once proposed that gesture is the non-linguistic side of the speech–gesture dual semiotic because it is better than speech for imagery: gesture has multiple dimensions on which to vary, while speech has only the one dimension of time.  Given this asymmetry, even if speech and gesture were jointly selected, as proposed in this series, it would work out that speech is the medium of linguistic segmentation.

Problems with pantomime. The second problem is that the gestures of gesture-first would be pantomimes. Gesture-first claims the initial communicative actions were symbolic replications of actions of self, others and entities, and these pantomimes later scaffolded speech. This process appeals because it so clearly taps the mirror neuron response. Merlin Donald likewise posited mimesis as an early stage in the evolution of human intelligence. It is conceivable that pantomime is something that an apelike brain is capable of and was already in place in the last common chimp–human ancestor, some 8 million years back. Contemporary bonobos are capable of it, supporting this idea:

Bonobo Gestures

The problem is not a lack of pantomime precursors but that pantomime repels speech. The distinguishing mark of pantomime compared to gesticulation is that the latter is integrated with speech; it is an aspect of speaking. In pantomime this does not occur. There is no co-construction with speech and no co-expressiveness; the timing is different (if there is speech at all), and there are no dual semiotic modes. Pantomime, if it relates to speaking at all, does so, as Susan Duncan points out, as a “gap filler” – appearing where speech does not, for example completing a sentence (“the parents were OK but the kids were [pantomime of knocking things over]”). Movement by itself offers no clue to whether a gesture is “gesticulation” or “pantomime”; what matters is whether or not two modes of semiosis combine to co-express one idea unit simultaneously. Pantomime does not have this dual semiosis.

Last word on gesture-first.  Whether you are persuaded by these arguments depends, ultimately, on taking seriously gesture–speech unity, that gesture and speech comprise a single multimodal system, and that gesture is not an accompaniment, ornament, supplement or “add-on” to speech but is actually part of it. Gesture-first does not predict this language–gesture integration. When we look at models of speech–gesture crossovers of the kind that, in theory, gesture-first would have encountered when speech supplanted an original gesture language, we do not find conditions for gesture–speech unity, but instead non-co-expressiveness or mutual speech–gesture exclusion.

Joining the damage is Woll’s (2005/2006) argument that not only does gesture-first leave gestures unable to integrate with speech but it also blocks, within speech itself, the arbitrary pairing of signifiers with signifieds that is characteristic of (or, Saussure says, defining of) a linguistic code.

Michael Arbib, in his gesture-first theory, envisions an “‘expanding spiral’ of increasingly sophisticated protosign and protospeech,” a spiral moving from gesture-first to speech.  A spiral pictures gradual changes from gesture (or protosign) to speech (or protospeech). This appears not to be the “crossover” modeled above, but the models still apply. Pantomime and signs push synchrony and co-expressiveness with speech away, and do not break out of this self-defeating pattern (despite the spiral’s openness, as Arbib also argues, to sign and speech shaping each other). Nothing in the spiral forms co-expressiveness and gesture–speech unity. With each turn gesture spins off (“scaffolds”) a bit more of itself into speech; but then speech, far from shaping gesture or being shaped by it, repels it and/or divides the labor between itself and its former gesture master.

Michael Corballis likewise continues to advocate gesture-first in a new work, which takes as its central theme a posited linguistic universal, recursion.  However, recursion is equally beyond gesture-first. This is because recursion enters into gesture–speech unities. It co-expressively appears in both gesture and speech simultaneously. In one example, a speaker outlined what she took to be an ambiguity in the bowling ball episode of the cartoon stimulus described in post 1.  She first states the perceived ambiguity (“you can’t tell if the bowling ball”) and then, recursively, states the alternatives (“is under Sylvester or inside of him”); concurrently and co-expressively, she moves her left hand to a certain space for the ambiguity itself, and then opposes two spaces within it for the two poles of the ambiguity (two further gestures in the “ambiguity” space  – first the hand moves forward with “is under”, then inward with “or inside of him”); so there is recursion on both sides of the dialectic. The recursions, spoken and gestured, partake of the usual semiotic oppositions: while speech is codified, comprised of recurrent elements with constraints of meaning and form, gesture is global and synthetic and the meaning of the whole (ambiguity) determines the meanings of the parts (the “under” pole, in particular, being anti-iconic for the meaning of being under something).  None of this can gesture-first explain.

I do not deny that gesture-first may once have existed, and in fact I assume that it did exist once. But if it did, it could not have led to human language.  It would have created pantomime, a type of gesture that does not unify with speech. Gesture-first was either extinguished or shunted off into a dead end.  I propose in a later post that it was a dead end seen now only in children’s earliest language.

The upshot is that gesture-first has little light to shed on the origin of language, as we know it; at best it explains the evolution of pantomime as a stage of phylogenesis that, if it once occurred, went extinct as a code and landed at a different point on the continuum of gestures.


Further Reading


Arbib, M. A. 2005. ‘From monkey-like action recognition to human language:  An evolutionary framework for neurolinguistics.’  Behavioral and Brain Sciences, 28: 105-124.

Armstrong, David F. and Wilcox, Sherman E. The Gestural Origins of Language. Oxford.

Armstrong, David F., Stokoe, William C. and Wilcox, Sherman E. 1995. Gesture and the Nature of Language. Cambridge.

Corballis, Michael C. 2002. From hand to mouth: the origins of language. Harvard.

Corballis, Michael C. 2011.  The Recursive Mind: The Origin of Human Language, Thought, and Civilization. Princeton.

Donald, Merlin. 1991. Origins of the Modern Mind: Three Stages in the Evolution of Culture and Cognition.  Harvard.

Henderson, E. (ed.). 1971. The Indispensable Foundation: a selection from the writings of Henry Sweet. Oxford.

Hewes, Gordon W. 1973.  ‘Primate communication and the gestural origins of language.’  Current Anthropology 14:5-24.

Rizzolatti, Giacomo and Arbib, Michael. 1998. ‘Language within our grasp.’ Trends in Neurosciences 21: 188-194.


McNeill, David, Duncan, Susan D., Cole, Jonathan, Gallagher, Shaun & Bertenthal, Bennett. 2008.  ‘Growth points from the very beginning.’  Interaction Studies 9: 117-132.

Goldin-Meadow, Susan, McNeill, David, and Singleton, Jenny. 1996. ‘Silence is liberating: Removing the handcuffs on grammatical expression in the manual modality.’ The Psychological Review 103: 34-55.

Woll, Bencie. 2005/2006. ‘Do mouths sign? Do hands speak?’ in Botha, Rudie & de Swart, Henriette (eds.), Restricted Linguistic Systems as Windows on Language Evolution. Utrecht: LOT (Netherlands Graduate School of Linguistics Occasional Series, Utrecht University). (accessed 05/02/11).

Sign languages with speech:

Emmorey, Karen, Borinstein, Helsa B. and Thompson, Robin. 2005. ‘Bimodal bilingualism: Code-blending between spoken English and American Sign Language’, in Cohen, Rolstad and MacSwan (eds.) Proceedings of the 4th International Symposium on Bilingualism, pp. 663-673.  Somerville, MA: Cascadilla Press.

Kendon, Adam. 1988. Sign languages of aboriginal Australia: cultural, semiotic and communicative perspectives. Cambridge.

Gestures of Apes:

Call, Josep and Tomasello, Michael (eds.). 2007. The gestural communication of apes and monkeys. Erlbaum.




David McNeill is a professor in the Departments of Linguistics and Psychology at the University of Chicago.

His new title How Language Began: Gesture and Speech in Human Evolution is now available from Cambridge University Press at £19.99/$36.99

A Layman’s Guide to “Roots of English”

by Professor Sali A. Tagliamonte
University of Toronto

Have you ever wondered about the weird ways of speaking of someone you know? In 1995, I moved to England from Canada, taking up a position at the University of York in Yorkshire. My colleagues came from all over Britain, the south, the north, Scotland and Northern Ireland as well as other parts of Europe. The topic of dialect differences was in the air all the time as we compared our varieties of English. Surprisingly, despite the obvious phonological differences in my speech compared to all my colleagues, there were unexpected correspondences between myself and my Scots, Northern Irish and Northern English colleagues. In some cases, we had the same vowel merger or we had the same lexical item or some odd bit of syntax was similar or we used the same form of one adverb or another. The correspondences came from all levels of grammar and sometimes in unexpected ways. It was curious to me that there were so many similarities and I wondered, why? I discovered that northern varieties of British English were among the most prominent dialect regions from which people migrated to other parts of the world in the late 18th century, particularly my own country of origin, Canada. Could it be that the roots of my way of speaking could be tracked back to these founding dialects? In 1999, I embarked upon a research project to study the varieties of English in these dialect regions.

Linguistic Woolly Mammoths. The research traditions of dialectology, historical linguistics and sociolinguistics have demonstrated that researchers can gain access to earlier points in time. In the absence of a time machine, how is this possible? Consider a woolly mammoth frozen in a glacier. We can gain remarkable insight into past time by studying its characteristics. Linguists employ a similar method.

Places that are geographically remote, socially isolated or set apart from the rest are slow to adopt new changes, or miss them entirely. Such areas tend to preserve older features. In this way remote, inaccessible, or otherwise isolated locations provide prime evidence about an earlier stage (or ancestor) of a language and play a key role in reconstructing earlier stages of a language’s development. There is perhaps no place more akin to these descriptions than the British and Northern Irish north country.

Dialects galore!  What I refer to as the Roots Archive is a rich compendium of oral histories from dozens of elderly people that I collected between 2001 and 2003. The materials contain rich language data with a wealth of rarely heard features of the English language. There are innumerable dialect words and expressions, e.g. fuzzok, peery, thrang. There are unusual sounds, e.g. och, aye. There are unexpected twists in the arrangement of sentences and in the way sentences begin and end, e.g. and that, you know. There are unusual conversational rituals. There are many things that are unusual and exotic; there are some things that are entirely unknown and yet others are hauntingly familiar. In many cases, features long gone from mainstream varieties of English endure.  In order to give readers a profound sense of the dialects, I have sprinkled the chapters with quips, stories and interchanges from the conversations, e.g. weans and it’s a good job, as in:



Aye, they just come on the phone- “Morag could you come out the night there’s somebody, ken. Such and such a body can nae manage yin”. “Aye, Aye, I’ll just come out aye”. She’s just leaving the dogs. Says I, it’s a good job it’s no weans you’ve got for you would nae- could nae go!


These quotes expose innumerable dialect features. I have made note of some of them in footnotes so that readers can try to spot the features themselves and then verify whether they have found them all. Here is the footnote to the ‘weans’ quote.

Note the use of aye as a discourse marker; ken as a discourse particle; somebody rather than someone followed by use of a body in the generic; yin for ‘one’; inverted says I; the expression it’s a good job; the syntactic structure it’s no weans you’ve got ‘you’ve got no children’; use of can nae, would nae, could nae for ‘can’t’, ‘wouldn’t’, ‘couldn’t’.

Many of the features I discuss in the book are well known across English vernaculars, including regularized pasts, e.g. knowed, come, past tense seen and done among others. Others are typical of the northern UK dialects and often reported in compendia of varieties of English. However, a few have rarely been reported.

Linguistic detectives. Each chapter of Roots of English offers readers a “Dialect Puzzle” so that they can get a taste of what it is like to be a sociolinguist.

Dialects are the storehouse of the heart and soul of culture, history and identity. For analysts of language, dialects are a tremendous resource for understanding the grammatical mechanisms of linguistic change. Delving deep into the nuts and bolts of language, deeper than words and phrases and expressions, down into the grammar, we discover a treasure trove. Beneath the anecdotes and nonce tales are hidden patterns and constraints that are a system unto themselves reflecting the legacy of regional factions, social groups and human relationships. As language evolves through history its inner mechanisms are evolving incrementally, but not in the same way in every place nor at the same rate in all circumstances. One of my goals is to leave the reader with new ideas about the roots of his or her own dialect and how its particular socio-geographic co-ordinates might offer a ‘goldmine’ for ongoing study.

Sali A. Tagliamonte is a professor in the Department of Linguistics at the University of Toronto. Her new title, Roots of English, is now available from Cambridge University Press.

The origin of language in gesture–speech unity

Part 1: Language and Imagery

By Professor David McNeill

Why do we gesture? Many would say that it brings emphasis, energy, and ornamentation to speech (which is assumed to be the core of what is taking place); in short, as Adam Kendon says, also arguing against the view, gesture is an “add-on.”  However, the evidence is against this.  The reasons we gesture are more profound. Language is inseparable from imagery. The natural form of imagery with language is gesture, with the hands especially.  While gestures can enhance communication, the core is gesture and speech together. They are bound more tightly than saying the gesture is an “add-on” or “ornament” implies. Even if for some reason a gesture is not made (social inappropriateness, physical difficulty, etc.), its imagery is still present, hidden but part of the speech process (it may surface in some other part of the body, the feet for example).

To answer the question of why we gesture: it is because gesture was built in from the start. Language could not have evolved without it.  If a theory of language origin is to predict the nature of language, it must among other things predict this gesture-speech unity.  But if a theory says that gesture-speech unity did not evolve, and/or predicts that something incompatible with it did evolve, the theory cannot be correct.  A widespread theory that I will call “gesture-first” fails this test. In this first post I explain gesture–speech unity. In later posts I apply the test and propose a new theory, called “Mead’s Loop,” that meets it.

The smallest unit of gesture–speech unity is called a growth point, or GP.  Growth points are inferred from the totality of communicative events, with special focus on speech–gesture synchrony and co-expressivity. They are called growth points because they are meant to be the initial pulses of thinking-for-and-while-speaking, a dialectic (or “multilectic”) from which a dynamic process of organization emerges.  The result is what Wundt described as the two modes of consciousness in speech:

“From a psychological point of view, the sentence is both a simultaneous and a sequential structure.  It is simultaneous because at each moment it is present in consciousness as a totality even though the individual subordinate elements may occasionally disappear from it.  It is sequential because the configuration changes from moment to moment in its cognitive condition as individual constituents move into the focus of attention and out again one after another.” (Blumenthal 1970 translation of Wundt 1900, p. 21).

In a GP dialectic Wundt’s two modes are a natural outcome. The “simultaneous” is consciousness of the GP and dialectic itself; the “sequential” is awareness of what I call unpacking. The model is that a GP differentiates what for the speaker is the point of newsworthiness in the immediate context of speaking. This differentiation, partly linguistic form, partly imagery, is then “unpacked” into a construction, both rendering it communicable as a social effort and putting a stop-order to the dialectic.  (It’s the nature of this post that I must introduce a number of new terms, several of them binary opposites; I’ve placed a diagram at the end that lists them and shows how they relate.)

The figure below is an example of the kind of gesture we focus on (sometimes called “gesticulation”), illustrating the two simultaneous modes (the gestures were spontaneous occurrences, recorded during an experiment in which speakers were retelling an animated Tweety and Sylvester cartoon they had just watched; in this episode, Sylvester (an ever-seeking cat) attempts to reach Tweety (a pugnacious canary), who is perched on a windowsill several stories above the street, in a stealth approach by climbing a drainpipe on the inside). The gestures are iconic signs but the iconicity is semantic, not photo-like. They are images of concepts of the events in the story:

  • Pointing upward, she says, “he tries to go up inside,” localizing the character in gesture space.
  • Then, making a spiraling upward movement, she says, “barreling up through it,” depicting the character’s presumed spinning inside the pipe – an inference (not shown in the film).
  • Her left hand is shaped as a cup and embodies her concept of the inside of the drainpipe, timed exactly to go with the “barreling” part of her description.

Photos of a man gesturing

It is important to note that gesture and speech cover the same idea units. It is not that gesture holds hidden messages. Statements that such and such a percentage of meaning is “non-verbal” fly against the reality of gesture-speech unity. In nearly every case, speech and gesture convey the same meaning, but they do it in opposite ways. We see in the gesture visuospatial thinking – not only about space as such but about the same content also expressed verbally. Note the use of the left hand for inside the pipe. It is her concept of interiority, not a depiction of the actual pipe, which enclosed Sylvester and was vertical, not horizontal. Both the verbal and imagistic modes capture interiority but in opposite ways.  Unlike the sentence, the ‘parts’ of the gesture (the shape, the direction, the motion, etc.) do not have their own meanings; they are meaningful only in the context of the gesture as a whole.  This is called the global property: the meanings of the parts depend on the meaning of the whole. It is the opposite in speech.  There the parts (words) have their own meanings and build up the meaning of the whole through combination.  This is called the syntagmatic property.

So in gesture-speech unity different modes of semiosis (“semiosis” and “semiotic” refer to the nature of symbols) are presenting the same meanings at the same time – global whole-to-part in gesture, syntagmatic part-to-whole in speech, and they are synchronous. Throughness is visualized as a hollow space – not an iconic replica of the pipe but the concept realized imagistically with its own location.  It goes not with ‘inside’ but with its conceptually parallel ‘barreling up through.’  When gesture and speech synchronize (as they do in the vast majority of utterances), one idea – here, Sylvester’s ascent via the pipe – is simultaneously in two semiotic modes, imagery and language. The result is an idea unit in which imagery and words combine, and this is an inherently dynamic situation.  Such a system of language, an imagery-language dialectic, would be explained if we found, independently, that language began in a gesture-speech unity.  We shall later see how this may have happened (post 3).

Seeing the gesture and the co-expressive speech it synchronizes with, we witness a moment of an ongoing imagery–language dialectic. Gesture-speech unity is the nexus at which imagery and the codified forms of language intersect – two dimensions of language with equal weight. The picture is not unlike Humboldt’s distinction of Ergon and Energeia (language viewed as structure and language as an “embodied moment of meaning located both in the organism and in the medium that the organism uses for expression.” The latter is language at the moment of its use, “alive, in an actor”: Joseph Glick describing seminars by Heinz Werner; from Elena Levy).

The larger picture – 1. Historically, the dynamic and static have been approached separately – each with its own traditions, methodologies, sciences, and institutional practices (& prejudices).  Each tradition describes something of substance:

Static = language is a thing, not a process.  This is the Saussurian tradition and it bears on Wundt’s sequential mode.  The academic field of linguistics has specialized in the static dimension.

Dynamic = language is a process, not a thing. This is the Vygotsky tradition and it bears on Wundt’s simultaneous mode.  The budding field of gesture studies focuses on this dimension.

However, we must combine them. The dynamic does not replace the static. Gesture gives us access to the dynamic mode. Linguistic form gives the static (no particular synchronic description is favored: we go with whatever fits best the dynamic picture we are trying to paint). The important point is that both modes are present.

The larger picture – 2. An imagery–language dialectic implies:

  • A conflict or opposition of some kind, in our case between the two semiotic modes, a dual semiosis.
  • Resolution of the conflict through change, its unpacking.

A dialectic is inherently dynamic and a good model of the psycholinguistics of speaking.

A dialectic presupposes Vygotsky’s concept of a unit as the smallest component that retains the quality of a whole.  This whole is the imagery–language dialectic.  A GP of unified gesture and speech is the smallest unit in which an imagery-language dialectic takes place. Further reduction to a gesture and a linguistic segment separately destroys the unit itself, leaving only a gesture or linguistic segment but not a dynamic process.

A quick list of GP’s properties:

  • It is proposed as the minimal unit of the imagery-language dialectic.
  • It is a dialectic package that has both linguistic categorial and imagistic components.
  • Growth points are inferred from the totality of communicative events with special focus on speech-gesture synchrony and co-expressivity.
  • By focusing on these properties we bring out the modes of cognition envisioned by Wundt.

All of this is why we gesture.  Gesture is an integral part of speaking. And language could not have begun without it.  The next post in this series will take up the evolutionary precursors of this dual semiotic system of gesture-speech unity.

The many binaries. I have made the following diagram to sort out the several distinct but related binary oppositions, plus a few other critical terms in this posting.  The numbers are the order in which the first mention of the term occurred:

Further Reading

Kendon, Adam. 2008. ‘Some reflections on the relationship between ‘gesture’ and ‘sign.’’ Gesture 8:348-366.

McNeill, D. and Duncan, S. D. 2000.  Growth points in thinking for speaking.  In D. McNeill (ed.), Gesture and Language, pp. 141-161. Cambridge University Press.

Saussure, Ferdinand de. 1959. Course in General Linguistics (Charles Bally and Albert Sechehaye, eds., Wade Baskin, trans.).  New York: The Philosophical Library.

Vygotsky, Lev S. 1987. Thought and Language. Edited and translated by E. Hanfmann and G. Vakar (revised and edited by A. Kozulin). MIT Press.

Wundt, Wilhelm. 1970.  ‘The psychology of the sentence.’ In A. Blumenthal (ed. and trans.). Language and Psychology: Historical Aspects of Psycholinguistics, pp. 20-33. Wiley.

David McNeill is a professor in the Departments of Linguistics and Psychology at the University of Chicago. 

His new title How Language Began: Gesture and Speech in Human Evolution is now available from Cambridge University Press at £19.99/$36.99


Duels and Duets: Why Men and Women Talk So Differently

A blog post by John L. Locke

It has long been known that men and women talk differently when conversing with members of the opposite sex. This has never been explained, but insights emerge from same-sex conversations where, free of the need to accommodate to each other, deeper differences between men and women readily bob to the surface.

In Duels and Duets, I claim that modern men and women talk differently because our male and female ancestors followed different evolutionary paths.  Since men were selected to aggress and dominate, but could end up killing themselves, they needed a safer way of achieving their goals. Ritualized duels, using words instead of weapons, filled the bill.  Verbal duels also provided a way for men to display the fitness information that women needed in making their long-term mating choices.

In a number of traditional societies, anthropologists have encountered various contests, from song duels to drum duels, poetic duels, and sung poetic duels – all fought with words, and all fought by men.  By itself, this is intriguing, but the underlying disposition to duel also leaches into the speech of men in modern societies, even when they are merely socializing with their colleagues and friends. In these ordinary duels, men denigrate their friends in a humorous way, often before an audience, but they also hold competitive joke- and story-telling sessions that feature verbal artistry.

Today, many men see themselves as performers, seeking eloquence where it could attract favorable attention from women and men, and portraying themselves as heroes in the stories they tell.  One trial lawyer, so theatrical in the courtroom that someone said he should have been an actor, responded, “What do you mean ‘should have been’?”

For reasons I describe in the book, women have tended to compete with other women indirectly, through mutual friends, and they have enlisted a more harmonious way of talking to build their relationships.  In these duets, women create feelings of closeness through intimate disclosures about others and themselves.  Gossip, the name we give to conversations that impart information about others, derives from “god-sibs” – originally “God’s siblings” – the 15th and 16th century women who gathered in bedrooms to witness new births but, while awaiting the natal event, discussed matters of mutual interest.  An unusually pure form of duetting occurs when women collaborate, effectively co-authoring and co-telling their personal stories.

Some books on “gendered language” say that little boys learn to talk like their father, and little girls imitate their mother.  But it is clear that males and females come into the world pre-wired to engage with other members of their sex in vastly different ways.  The endocrine system plays an important role here, supporting various relationships between the speaking voice and reproduction.  Men who enjoy locking verbal horns in public also tend to have more testosterone than others – trial lawyers are off the charts.

The adaptive value of this is revealed, predictably, on the evaluative side of the equation: women prefer men with low-pitched voices, especially during the high fertility phase of their menstrual cycle.  In this sense, women literally call the tune.  But other areas of speech and language are also involved in courtship, and I discuss the things that men do, in their speaking behaviors, to convince women that they have the right biological stuff.  Of course, duetting has its own set of physiological supports.  Intimate vocalization tends to increase oxytocin, which appears to facilitate emotional connection, and to decrease cortisol, a stress hormone.

How discrete are the relationships between the human sexes and their preferred ways of talking?  I’ve found lots of cases, historically, where women verbally assaulted each other, but these assaults were typically genuine – lodged in anger, usually as a form of reprisal for a perceived injustice – not as a way of posturing or relating, and none was ritualized.  Women may denigrate themselves, but they do not insult their close women friends, even humorously.  Men do talk quietly and privately with other men, but they usually shy away from the intimate self-disclosures that could increase their vulnerability, and they rarely work through other men in their efforts to compete with male rivals.  If they have something to say to a foe or competitor, they usually go up to him and say it.

Toward the end of the book, I revisit linguistic evolution, suggesting that if human language is built the way it is because the designers, the ancient human architects, were built the way that they were, then the shape of language would have been formed around these innately scripted preferences and priorities.  But how did this happen?  How did the human sexes’ ways of relating and interacting affect the design of spoken language?  In earlier chapters I focus on the things that language, as a communicative tool, has done for men and women; in the final chapter I ask what men and women did for language.


In recent years, writers have discussed the fact that speaking differences can cause couples to clash in their conversations.  But couples also need to collaborate in carrying out a broad range of domestic operations, from getting the car fixed to raising the children, paying the bills, and maintaining some sort of social schedule.  Teams usually work better if the members bring different strengths to the table, and divide up the responsibilities.  Different speaking strategies, I suggest, can and do help men and women to mesh in their lives.


John L. Locke is the author of Duels and Duets: Why Men and Women Talk So Differently. Click here to find out more about the book and order your copy today for just £14.99 / $28.00

Why not listen to Professor Locke in discussion with Kirsten Hoge on Woman’s Hour

Language Erosion

There’s a timely article by Laura Spinney in The Independent today highlighting the recent discovery of Koro, a previously unknown language in India spoken by around 800 people. The ensuing discussion around language evolution, and indeed extinction, draws upon the research of Cambridge University Press authors Tecumseh Fitch and Stephen Levinson – click here to read the article.

The Evolution of Language

Tecumseh Fitch

Language, more than anything else, is what makes us human. It appears that no communication system of equivalent power exists elsewhere in the animal kingdom. How, and why, did language evolve in our species and not in others? Tecumseh Fitch brings together important insights from diverse disciplines to explore one of the biggest unsolved puzzles of human history.

2010 | £29.99

Find out more and read a free excerpt

Grammars of Space: Explorations in Cognitive Diversity

Edited by Stephen C. Levinson & David P. Wilkins

In this collection, a team of leading linguists and psychologists look at how the spatial domain is structured in language. Drawing on data from a wide range of languages, they uncover considerable cross-linguistic variation across this central domain, adding to debates about the innate foundations of human cognition.

2006 | £39.99

Find out more and read a free excerpt