Gesture in modern language teaching and learning.



For decades there have been linguists who proclaimed the significant role of kinesic expression in carrying the meaning of spoken language (e.g. Efron, 1941; Birdwhistell, 1952; Kendon, 1972). Drawing on such studies, there have also been pioneer studies of gesture in second language learning and use (McNeill, 1992; Gullberg, 1998, 2006), and approaches to teaching foreign languages using gestures (Menot, 1970, 1981; Orton et al., 1995; Maxwell, 2002; Taeschner, 2005). Yet to this day, language teaching continues to focus most heavily on the verbal, to a lesser extent on the vocal, and virtually not at all on the kinesic channels of expression. Recent interdisciplinary studies combining neurophysiology, psychology, and linguistics (Kita, 2003; Goldin-Meadow, 2003a; Roth, 2004; Gullberg & Indefrey, 2006) not only confirm the integral role of gesture in natural spoken interaction, but also make evident the significance of gesture in all human learning, in any field. These findings suggest that a quite remarkable rethink is needed about the place of the kinesic in learning, not least in language classrooms. To this end, the essentials of contemporary knowledge about gesture in language use and learning are presented here, and some of the second language pedagogical approaches now being employed to put this knowledge into practice are discussed. As these works show, it is time that the long-neglected domain of kinesics was included in our discourse on teaching and learning languages.


Second language pedagogy, kinesic learning, gesture.


Despite our knowledge for decades that in practice 'people use and interpret verbal, prosodic, and kinesic signals in relation to one another, not in isolation' (Arndt & Janney, 1987, p. 5), language teaching has focused most heavily on the verbal (words), to a lesser extent on the vocal (voice), and virtually not at all on the kinesic (body) channels of expression, major among which last are gestures. Indeed, for a very long time, using gestures while speaking, even in informal settings, was frowned on in English-speaking and some other countries. Such use was understood to show the speaker was less than proficient in the clear verbal expression expected in civilised society. Despite attracting this reprobation, there have always been some scholars of verbal behaviour (Efron, 1941; Birdwhistell, 1952; Kendon, 1972; Argyle, 1975) who saw gesturing as a natural and integral component of normal spoken expression. Greatly assisted in obtaining solid data by modern technology, the study of gesture in the last decade and a half (McNeill, 1992; Kendon, 2004) has shown this view to be even more true than some had guessed, and these results have led to gesture becoming a focus of increasing interest in a range of academic areas (Roth, 2001), not least in education. In the field of modern languages education there have recently been some very important developments that have gesture as their base (e.g. Taeschner, 2005). The purpose of this article is to introduce these and to raise the long-neglected domain of kinesics for inclusion in our discourse on the teaching and learning of languages. Before getting to teachers and learners, however, it is necessary to draw some simple boundaries around the term 'gesture' and to set out some basic information about what gestures are and what they do.



Several scholars over some decades have made categories of gesture type. We can gesture with our hand, arm, head, eyes, nose, mouth, foot, and even whole body, so one first boundary to be drawn is which part of the body will be included for consideration. In keeping with most second language scholars in the field, the reference for gesture in this article will be movements of the hand. Another boundary to note in any discussion of the word 'gesture' is the level of standardisation. One often cited classification, though still contested--not least by Kendon himself (2004, p. 104)--is Kendon's continuum (McNeill, 1992, p. 37), which plots gestures from highly conventionalised to highly personal. At the conventionalised end are found those systems of movements which belong to a shared code, such as sign languages for the deaf, emblems, and mime. Emblems are gestures like head nods, thumbs up, finger across the throat, which a whole speech community shares and which, like sign languages and mime, can substitute for speech because meaning is recognised in the sign itself. At the other end of the continuum are spontaneous, personal hand movements used in conjunction with speech. All humans use these gestures, though they may vary in frequency, type, and size. It is these gestures that are being referred to and discussed in this article.

Natural hand gestures used in conjunction with speech have been classified over time by several scholars. A composite list is given here, drawn largely from McNeill (1992):

* beats (emphasis)

* deictics (pointing)

* iconics (pictures)

* metaphorics (abstract iconics)

* cohesives (linking concepts)

* adaptors (self touching, e.g. preening).


Blind speakers gesture, people using the telephone gesture, people asked to describe something to a blank wall gesture. So, it is argued, using gesture must serve primarily an intrapersonal function: as aid in the retrieval of what is to be said and in the timing of its delivery (Hadar & Butterworth, 1997). However, as other scholars point out, speakers without visible interlocutors use fewer and less well-defined gestures than those speaking to a person they can see (Bavelas et al., 1992; Gerwing, 2007), and recounting the one story a number of times in succession clears up semantic retrieval problems, but such speakers still gesture (Beattie & Coghlan, 1998). These scholars argue that there is always an interactive function in gesturing and many (e.g. Bavelas & Chovil, 2000; Kendon, 2004) would argue that the interpersonal function of gesture is primary. In support of this, Bavelas and Chovil (2000, 2006) have shown that all gestures, even beats, can be seen to orient at some point to the interlocutor. As well as these communicative considerations, there has been development in the epistemology of gesture, with most recently McNeill and others (e.g. Duncan, Cassell, & Levy, 2007; Roth & Lee, 2007) proposing that gesture (and language) does not express an inner thought so much as present a speaker's living meaning: that which is enacted is 'part of the speaker's current cognitive being, her very mental existence, at the moment it occurs' (McNeill, 2005, p. 92).


Babies enter the world of language through their eventual use of spontaneous hand movements that are taken as pointing and are responded to by someone in their environment. Babies often wave their arms about, but the gestures taken as pointing, which develop from a series of showing and reaching hand movements, are recognised as such because they are more clearly intentional; and, like adult gesturing, are often aligned with accompanying vocalic production and a checking that the audience is attending (Butterworth, 2003). Once inducted into the communicative loop, babies learn the gesturing style of their family and social groups, though as with verbal expression, they use what they receive from others in their own way.

Form and placement

In formal analysis, a gesture is composed of a preparation phase, a stroke phase, and a recovery phase. The key link between gesture and speech is self-synchrony of voice and movement: the stroke of the speaker's gesture will coincide with (or fall just before) the speaker's vocalic stress--and the vocalic stress will be on the appropriate verbal syllable (Condon & Ogston, 1966; McNeill, 1992, 2005). It is in this way that all three expressive channels are systematically connected to one another in the act of making meaning. As contributors to meaning, gestures may add redundant information, and thereby add emphasis to what is said, or they may provide additional information, or be used to pull a string of propositions into an imaginary organised set. Gestures may even contradict what is being said verbally. This can cause confusion for the interlocutor, or result in mistrust that what is being expressed verbally is true.

Speaker self-synchrony is not confined to gestures, but occurs with all movement, macro and micro. Thus analysis will show that a speaker's blinking, or a walking speaker's footfall, will be synchronised with their vocalic stresses. Furthermore, engaged interlocutors will also time their own movements (blinking, head nods, hand gestures) to fall on the speaker's beat. This interactional synchrony, as it is called, and the phenomenon of self-synchrony, were first observed by Condon and Ogston, who reported:
 The body of the speaker dances in time to his speech ... and the
 body of the listener dances in rhythm with that of the speaker!
 (Condon & Ogston, 1966, p. 338)

Later, Condon and Sander found that babies only fourteen days old develop kinesic synchrony with the vocalic stress of their mother's speech:
 The infant thus participates developmentally through complex
 socio-biological entrainment processes in millions of repetitions
 of linguistic forms long before he later uses them ... (Condon &
 Sander, 1974, p. 101)

 The infant lives itself into language and culture. (Condon, 1980,
 pp. 57-58)

Two further important features of vocalic stress relevant to gesture placement need to be noted here. Firstly, the placement of stress and unstress over the syllables in an utterance produces a rhythmic string. Within one language there is (effectively) a finite set of rhythmic strings and hence each language has typical prosodic patterns. The prosodic patterning of a language has been shown to be governed by one or other of two principles: in some languages, when the stresses fall over the utterance will vary according to the number of syllables contained in the various phrases of the utterance; in other languages (including English), the stress will fall after virtually regular intervals of time, irrespective of the number of syllables in any interval. Thus, for example, spoken in normal declarative mode, the English sentences 'He lost the book he bought' and 'He lost the umbrella he was carrying', are spoken not only with the same number of primary stresses (marked), but also with these stressed syllables timed to occur after the same length of interval (Mortimer, 1976). This stress timing of speech has a number of effects on other syllables in a phrase and is largely responsible for the frequent English 'schwa' ([ə]) or reduced vowel in unstressed syllables.

Word stress is created by means of increase in breath intensity, length, pitch, and sometimes volume of the voice, and by the visual emphasis of the stroke phase of any gesture accompanying it. In an utterance not otherwise affected by emotional colour, word stress falls on the major lexical items, and as these occur in typical grammatical patterns, the prosodic and kinesic patterns of a language are closely linked to the grammatical patterns of the language. Bounded by the processing limits of the human brain, the typical grammatical patterns of a language comprise a finite number of standard sentence strings, and thus the prosodic patterns do also (Adams, 1979).

Due to different grammatical arrangements in different language groups, the order of key lexical items, and hence the placement of vocalic stress and gesture, also differ from language group to language group (McNeill & Duncan, 2000; Stam, 2006).

Thus some languages express motion and path with a verb + adverbial phrase (e.g. in English: runs across the road, slides down a drainpipe), whereas other languages incorporate path into the verb but need a separate clause for manner, e.g. in French: Il traverse la rue en courant ('he crosses the road running'). A number of experiments have shown that native speakers of each language group gesture differently when expressing these events and often retain the style of their mother tongue when they start learning a language from the other group. One way to assess such a learner's acquisition of the new language is to look for shift in how gestures are being placed in this kind of utterance, i.e. away from first language norms to those of the second language.

To sum up this simple account of the relationship between gesture and language, we all use and interpret gestures as part of spoken communication from the very beginning of life. The shape, size, and frequency of our gestures are influenced by our culture. We synchronise our gestures with the stress pattern of our language, which in turn, is linked to the meaning of the lexical and grammatical elements of what we are saying. As a result, 'all speaking involves the choice and placement of three elements: the verbal, the vocal, and the kinesic' (Clark, 1996, p. 188); and 'in actual practice, people use and interpret verbal, prosodic [vocalic], and kinesic signals in relation to one another, not in isolation' (Arndt & Janney, 1987, p. 5).



Gesture and learning is an area studied by psychologists, neurologists, linguists, and educators in the full range of learning fields and learner types. Thus studies have been and are being conducted of the spontaneous use of gestures by, for example:

* normal and abnormal infants

* school students of all ages

* tertiary students in engineering and IT

* second and foreign language learners

* fully functioning older learners (University of the Third Age students)

* older learners with impaired faculties (Alzheimer patients).

As the result of years of close observation and analysis aided by the technology of video, neurological scanning, and computers, we now perceive that the spontaneous use of gesture plays a significant role in natural learning processes, assisting the learner to grasp concepts, develop skills, and store new knowledge--including new language--in any field. (See the many studies by Goldin-Meadow and her circle, for example, Goldin-Meadow, Alibali, & Church, 1993; Alibali & DiRusso, 1999; Goldin-Meadow, 2003a, 2003b; Church, Ayman-Nolley, & Mahootian, 2004; as well as those by Roth and various associates, listed in Roth, 2001; also Lindamood, Bell, & Lindamood, 1997; Minogue & Jones, 2006).

The findings of these studies of natural gesture use in learning have prompted researchers and teachers to introduce explicit kinesic experience into learning activities so as to enable learners to explore, organise, and store new ideas, information, words, action processes, etc. (Roth, 2004, 2007; Tellier, 2005; Goldin-Meadow, 2007). This does not mean just the usual 'hands-on' practice in technical subjects, but includes having learners of any subject consciously touch objects, trace paths on a screen or paper, 'draw' elements and relationships with their finger in the air or on a table, make gestures and/or walk while speaking, and so on. The resulting learning in terms of comprehension and retention has been shown to be superior.

Language teaching and learning

In the area of language teaching and learning, these discoveries have led to two developments. Firstly, second language learners naturally use gesture to orchestrate their attempts at mastery and these spontaneous gestures have been examined for clues to the learners' cognitive state in the process of acquisition; for signs of first language interference; and for communication of content (Gullberg, 1998, 2006; McCafferty, 2002; Stam, 2006; Yoshioka & Kellerman, 2006).

Secondly, gesturing has been explicitly introduced into language teaching and learning to aid the development of mastery. There are five principal ways in which this is occurring:

* to aid comprehension and retention

* as the basis of whole approaches

* for remedial vocal work

* to represent the deep meaning of grammatical forms

* taught as part of 'appropriate behaviour in self presentation' in second language learning.

Introduced gestures to aid comprehension and retention

Teachers gesture naturally when they speak and language teachers gesture whether they are speaking in their first or their second language. Their gestures convey information to the learners about the subject matter and about their feelings for the content and classroom management matters (Lazaraton, 2004; Tellier, 2005; Sime, 2006). Language teachers are now learning to intentionally increase and elaborate these natural gestures as a way of enhancing the meaning of what they are saying so as to assist learners in comprehending and storing the new. The gestures highlight certain figures or facts, give reinforced emotional tone to words or structures, and illustrate the basic referential meaning of concepts and objects not available in the room. Some of these teacher gestures may also be used as models for learners to copy. At other times, learners are encouraged to gesture, but left to develop their own particular movements. Teachers' and learners' gesturing will naturally be synchronised with their speech and this also serves very usefully to make the prosodic patterns (and tones in Chinese, for example) more salient to learner ears and more deeply embedded somatically (Orton et al., 1995; McCafferty, 2006).

The basis of whole approaches

The use of introduced gestures for teaching and learning has recently been integrated into whole approaches to language teaching, notably the Accelerative Integrated Method (AIM) created privately a decade ago in Canada by Wendy Maxwell (2002, 2005), which (so far) has been developed for the teaching of French; and the Narrative Format (or Magic Teacher) Approach, created for the teaching of English, French, German, Spanish, and Italian in the European Program Socrates Lingua in 2004 by Traute Taeschner and colleagues at the University of Rome.

Although developed without knowledge of each other, the two approaches are strikingly similar in some respects, and they also differ in some significant ways. While both have been successfully trialled with older learners, including retired adults, the two sets of materials published so far are aimed at primary school learners. Both combine utterances with gestures--iconics, beats, and cohesives--and make high use of vocalic and facial expression, and also of mime. Both stress positive teacher-student rapport and teach language content in connected strings of natural discourse based in narratives. Where they differ in this is the basis for the number of gestures used in any string. Thus while both synchronise gesture stroke only with natural syllable stress, video data of each method show they do this differently.

Training teachers in her method, Taeschner adds gesture stroke to the primary stressed syllable of words that are naturally stressed, and the whole is spoken in natural chunks, in a flow at natural (public) pace, e.g. 'Once upon a / time there was an / egg'. Her gestures comprise iconics (and metaphorics) for most lexical items and also include beats and cohesives. Video of actual French classes being taught in Australia shows that Maxwell's approach as used by her local representative provides a gesture for most of the words in any string. The gestures are placed on lexical and functional items and mark both primary and secondary stresses, although these are differentiated by the length and emphasis of the gesture (and voice), e.g. Qu' / est-ce / qui se / passe? ('What's going on?'). The gestures predominantly comprise beats, although these are interspersed with iconics (including metaphorics) and mime. Chunking of the speech is natural and hence the whole phrase is rhythmically congruent in itself, i.e. kinesic and vocalic expression match. However, because time must be allowed for the gestures made on secondary stresses, such as ce and se above, delivery is slightly slower than normal spoken pace. In both approaches, however, gestures synchronise with voice, and delivery within one phrase is fluent. Hence in both approaches, within one spoken phrase there is little or no pause or recovery between the end of one gesture and the beginning of another. As a result, the time difference of delivery over the total phrase between the one way, which marks only primary stress with gesture, and the other, which marks both primary and secondary stress, is small.


The significance for learners of the underlying difference in prosodic detail marked by gesture in the two approaches remains to be seen. The European approach risks learners missing the 'little words', but privileges normal prosodic patterns by marking only primary stress. This may teach the rhythm and intonation patterns before the acquisition of full grammatical patterns, a natural ordering in first language acquisition, which Condon and Sander (1974) have suggested may be the key to the universal success of babies in acquiring their first language grammar. It also sets the pace of delivery at 'normal' speed, useful in itself and something that may allow the learner to be easily understood from the start, even despite some missing words. We do not know if, like babies, the various types of second language classroom learners could utilise these strengths successfully and then gradually pick up any missing words over time. The Canadian approach slightly distorts the natural prosodic pattern by making all stresses salient, albeit not necessarily equally. This makes the 'little words' of grammatical structures occurring between lexical items--words that often escape learners' aural attention--perceptible when spoken by the teacher, and provides a temporal space that must be filled when the phrase is said by the learner. If this not entirely normal prosodic pattern and pace can eventually be successfully modified--if tout / le / monde ('everyone') can become toutle / monde--this way may offer an advantage to second language classroom learners, whose opportunity to hear and naturally change incomplete structures over time is far smaller than that of native-speaker learners.

Another difference between these approaches is that the narratives in the European material--the adventures of two young dinosaurs--were created for the program, whereas AIM makes major use of known stories such as The Three Little Pigs, albeit also using original local 'narratives', e.g. J'ouvre la tête, prends mon anglais et mets l'anglais dans la poubelle ('I open my head and take out my English and put it in the rubbish bin'). A final difference is that the AIM approach uses a reduced lexicon to allow the learners to express themselves without being overburdened by excessive amounts of new vocabulary.

Despite confidence in the benefits of their approach, the European creators found some results of its first widespread use were disappointing. In a close study of teachers and learners using the approach, Taeschner (2005) discovered two significant differences in the teaching style of those who were not as successful as predicted: (i) they were not synchronising their gestures with their vocalic stress, and (ii) the stretches of teacher talk before handing over to the learners were significantly longer than those of the successful (magic) teachers. The first factor is particularly congruent with our understanding of the need for synchrony between voice and body if hearers are not to be confused as to meaning, or doubt speaker trustworthiness. The second factor, we might assume, interfered with progress by lowering learner engagement and reducing their practice time.

Remedial gestures

Based on the integral role of kinesics in meaning making and on the phenomenon of synchrony of voice and movement in natural speech, gestures are also used as aids to remediate the whole range of vocalic features of expression: articulation, stress, pitch, volume, pace, chunking, and intonation. Given the tight alignment between vocalic and lexicogrammatical choices and meaning, this is work at the deep level of language acquisition, not just a tidying up of the candles on the cake, which, it seems, it is often considered to be.

One highly developed form of this use of gesture is the verbo-tonal approach developed some 40 years ago by the Croatian speech therapist Petar Guberina (Guberina, 1970; Renard, 1975). His techniques include modulating the frequency of oral language presented and having learners walk, tap, and hum in time to the prosodic pattern while speaking. There is also remediation of specific pronunciation, intonation, and stress problems using improvised gestures, which match the tension, voice contour, and length of the target sound or phrase. While initially employed for teaching French prosodic patterns to the deaf, Guberina's techniques were quite quickly adapted for the teaching of foreign or second languages, especially French (Menot, 1970; Wylie, 1975) and English (e.g. Vuletic, 1966; Gassin, 1986, 1990), and eventually Chinese (Orton et al., 1995). The verbo-tonal approach was also used as the basis of a series of audiovisual programs for teaching a number of languages, which was published some 30 years ago in France, among which the ESL program All's Well was widely used internationally (Dickinson et al., 1975).

Intuitive understanding of the power of perceptual phonetics understood by Guberina can be found in the pronunciation exercises developed for China's radical commercial language program Crazy English, in which there is a high emphasis on developing oral proficiency. All sounds are introduced in class using a system of gestures, which show where in the mouth the sound is produced, and then mirror the contour of the sound as expressed. These are worked on at every class and the gestures have been recorded on DVD (Li, 2006) to allow students to study and practise them on their own.


The movements of cognitive grammar

The work of Lapaire (2005) explores a very new form of kinesic representation, where analyses of the varying degrees of meaning and force of closely similar grammatical options--in English, for example, should, must, and have to--are illustrated through gesture and movement. The aim is to help learners grasp subtleties of difference difficult to internalise simply from receiving verbal explanations and text samples.

The appropriate presentation of self

This area involves the teaching of natural gesturing in the target language to learners from other cultures where the type, size, frequency, and placement of gestures generally considered appropriate for the speech event is different. The aim here is not so much to teach specific gestures--though in the case of Japanese this is sometimes done--as to help the learners use enough acceptable gestures so as to present themselves without evoking a negative response--confusion, boredom, irritation, derision, offence--in native and other competent speakers, and learn to interpret native speakers' gestures (Jungheim, 2006). This teaching is most common in adult learning, especially for those wanting to take part in public speaking such as work presentations and job interviews (Ardila, 2001; Orton, 2006). It also includes teaching gestural taboos.


There is overwhelming evidence that the integral role of the body in learning, and especially of the hands, needs to be thoroughly reconsidered and appreciated. Research is increasing and findings will no doubt continue gradually to make their appearance in more and more teaching material in all fields, including the area of language learning. In the meantime, however, there is sufficient evidence to suggest that language teachers could most beneficially make rich use of gestures in their teaching, include gestures and other kinesic expression in learning activities, observe and learn to understand learners' natural gestures as indicators of learning stage, and learn more about gesture in language and learning by reading and sharing experience.


Jane Orton co-ordinates Modern Languages Education at the University of Melbourne, where her teaching and research centre on intercultural communication, including the nonverbal domain. She may be contacted at
Author:Orton, Jane
Date:Nov 1, 2007
