Grapho-phonemics in the EFL environment: on the reliability of pronunciation rules for stressed vowels.
Any EFL teaching method that at some point relies on the written word very soon has to face the challenges posed by grapheme/phoneme correspondences in English. Grapho-phonemic training is, nevertheless, blatantly absent from most EFL handbooks to this day. When Charles W. Kreidler discussed such absence as early as 1972, he argued that "[w]e don't teach the elementary student about English orthography because we really don't understand the nature of our spelling system" (4). However, English grapheme/phoneme correspondences had already been studied quite thoroughly by a number of scholars prior to his comment (Wijk 1966; Venezky 1970). Further effort was invested during the 1970s and 1980s into discerning the systematics of English orthography at the grapho-phonemic level. Extensive corpora were examined and the predictability of phoneme-to-grapheme and grapheme-to-phoneme correspondences was empirically calculated. Uninterrupted, though increasingly sparse, research looked into the processes of accessing the mental lexicon through printed words (Frederiksen and Kroll 1976), the cognitive aspects of native and non-native dealings with orthography (Schwartz, Kroll and Diaz 2007), and the possibility of automatic grapheme-to-phoneme and phoneme-to-grapheme conversions (Daelemans and van den Bosch 1996).
Concern with orthography within EFL teaching has been rather marginal as a whole, although a few scholars, like H. D. Brown (1970), S. Schane (1970) and C. W. Kreidler (1972) have tackled the issue. More recently, specialists have argued that EFL teachers need to "understand the correspondences between English phonology and English orthography" in order to teach learners "to predict the pronunciation of a word given its spelling" (Celce-Murcia et al. 1996). Celce-Murcia and her colleagues build upon the work of Wayne B. Dickerson (1984; 1987; 1994), which is remarkably revealing in relation to the prediction of stressed-syllable location, and also, collaterally, in the prediction of the value of vocalic graphemes in the stressed position--the latter, mostly when it is a matter of choosing between the so-called lax and tense pronunciations. (1) Stressed letter <a> in a monosyllabic word, for example, will be /æ/ when closed by a single consonant, as in rat, and /ei/ when closed by consonant plus silent <e>, as in rate. Axel Wijk's (1966) and Richard L. Venezky's (1970) rendering reflected a far more complex panorama where stressed <a> regularly predicts other values in stressed position: /a:/ in spa, schwa, ah, wad, wan, art, car, smart, etc.; /ɔ:/ in war, quartz, ward, dwarf, etc.; /ɔ:/ in all, salt, bald, talk, etc. (2)
A simplified rendering of grapheme-to-phoneme correspondences is, however, common procedure among EFL specialists: when incorporated into EFL programs, pronunciation rules for vowels are often reduced to the CV, CVCe and CVVC rules that predict long pronunciation as in be, Pete and seed; and CVC predicting short pronunciation as in wet (Olshtain 2001, 209). Anne M. Ediger additionally deals with the Vr/CVr environment that guides predictions in the case of art, car or her, but she overlooks another known, relevant and regular context that involves <r>: care, here, fire, pure, which contrast with car, her, fir and purr (2001, 157).
In any case, despite the rich theoretical and empirical background available to assist EFL instructors in their dealings with grapho-phonemic issues, there seems to be a gap between orthography researchers and teaching specialists. A very small and general part of what is known about English orthography occasionally finds its way into EFL teaching practice. This might be partly due to the predominance, since the early 80s, of communicative and meaning-focused approaches to EFL, a situation that is beginning to change as reliable evidence mounts, suggesting that form-focused interventions tend to facilitate learning (Long and Robinson 1998; Lan and Wu 2013). Recent research shows that learners naturally extract statistical regularities from written language, and that such statistical learning can be enhanced through specific training (Doignon-Camus and Zagar 2014). Mere exposure to orthography, on the other hand, has been shown to have a positive effect on the development of phonological awareness (Cheung et al. 2001; Escudero, Hayes-Harb and Mitterer 2008). Explicit teaching of phonics and letter-to-sound correspondences seems to improve learning results not only in relation to EFL pronunciation (Rangriz and Marban 2015) but also in reading comprehension and literacy (Martinez Martinez 2011). The time seems ripe for a reconsideration of the role of grapho-phonemic training in the EFL classroom.
A second reason for the general neglect of grapho-phonemic competence in the EFL classroom has to do with its intrinsic complexity. Book-length studies on the subject such as Donald Wayne Cummings's description of American spelling (1988) and Timothy Bozman's handbook for students of phonetics (1988) cannot be introduced into the EFL arena without adequate reduction and adaptation. The cumulative nature of these and other fundamental works is far from EFL-friendly. The system encompasses rules overriding rules, and exceptions that seem to be regular in their own way, only to have further exceptions that conform to the initial rules, etc. However, these rules are rarely presented within a pedagogical hierarchy distinguishing between essential and complementary rules, or determining logical sequences of overriding. We know very little about each rule in particular--the relative amount of words that they regulate, for example, with more or less regularity.
A third reason for the paltriness of EFL grapho-phonemic training might be the lack of direct applicability of much of the existing empirical research, which is almost entirely concerned with measuring the consistency of English orthography. Paul R. Hanna, Jean S. Hanna, Richard E. Hodges and Edwin H. Rudorf's exhaustive exploration of phoneme-to-grapheme correspondences, uncovering an 80% consistency within a corpus of 17,310 English words, has resulted in no pedagogical applications (Hanna et al. 1966). In 1987, Rita Sloane Berndt, James A. Reggia and Charlotte C. Mitchum reversed the data in Hanna et al.'s analysis in order to determine grapheme-to-phoneme consistency; this has remained, over the years, mostly anecdotal. As Berndt and her colleagues themselves admitted, the probabilities that they calculated were independent of context and therefore "[they] provide a conservative estimate of the extent to which particular letters and letter clusters are pronounced as particular phonemes in English, but they provide no information about the rules responsible for the derivation of these correspondences" (Berndt et al. 1987, 1). Teachable rules are, after all, what EFL teachers would demand.
More recent research (Stanback 1992; Treiman et al. 1995; Kessler and Treiman 2001; 2003) incorporates a remarkable sophistication of statistical procedures, and continues to add to a perceived need to convince teachers--particularly teachers of L1 English--that English spelling is much more consistent than it seems. However, such calculations of consistency are often derived from a consideration of monosyllabic words. Polysyllabic words are conveniently avoided because they "raise difficult questions of where syllable boundaries lie," and because they "have much higher proportions of foreign, Latinate, and technical words" (Kessler and Treiman 2001, 593).
After decades of increasingly sophisticated research, there can be no doubt today that the English grapho-phonemic system is consistent. Paradoxically, most specialists also agree that it seems chaotic. The famous 80% consistency of the system seems to have worked as an excuse for not teaching pronunciation rules: since the system is consistent, simple exposure to it will in the long run build up the necessary competence in our EFL students. That said, the 20% of recalcitrant words, apparently hiding just about anywhere within the system, may be quite dissuasive. In relation to exceptions and teaching, Axel Wijk wrote: "Since a large number of the irregular spellings are found among the commonest words in the language, it is obvious that there would not be much sense in foreigners making any systematic study of the rules of pronunciation when they begin their study of the language" (1966, 11). His suggestion for increasing the EFL students' competence is the one that has been strictly followed by most textbooks: "to learn the pronunciation of each new word that they come across, by itself, [...] without much reference to any rules for the pronunciation of the various letters or combinations of letters of which the words are made up" (11). From such a perspective, there is little hope that grapho-phonemic correspondences may ever become part of an EFL training program. As far as I know, Wijk did not fully substantiate his assertion about irregularity at the basic level; a claim, on the other hand, that contradicts Kessler and Treiman's findings already mentioned (2001). One way or another, we are told by important orthographists about the overall consistency of large corpora, but we do not know much about the regularity or frequency of most rules, or about rule-regularity and rule-frequency within particular groups of words, or within smaller samples like the glossary of a coursebook for beginners. (3)
If we managed to shed some light on such issues, we might be able to offer a more enticing panorama to our EFL teachers. We might, for example, base our choice of a particular reading material on its grapho-phonemic characteristics and the kind of rules that are present in it. A specifically EFL-way of looking at pronunciation rules is required. This would imply the use of a conceptual frame that could help us discern between rules and organize them into a pedagogical hierarchy.
2. Basic Rules for the Interpretation of Stressed Unigraphs
A pronunciation rule is a hypothesis about the predictive properties of a particular graphemic environment in relation to the phonemic value of the letters affected by it. (4) A very well-known pronunciation rule establishes, for example, that a stressed unigraph in an oxytone structure followed by a single consonant and closed by a silent <e> is to be pronounced with the long version of that particular unigraph (Venezky 1970, 104; Bozman 1988, 13). (5) The long version coincides with the name of the unigraph, and this rule--often expressed as VC<e># or similar--allows us to predict values /ei, i:, ai, oʊ, (j)u:/ for the stressed vowels in pale, scene, Mike, hose and rude, respectively, and in any other words with the same structure. Incidentally, the same rule fails to predict the pronunciation of have, allege, police or move.
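The VC<e># rule described above can be sketched as a small function. This is only an illustration under stated assumptions: the name `predict_vce`, the consonant set and the ASCII-style transcriptions are hypothetical choices made for this sketch, and the function does not check stress placement.

```python
import re

# Long values listed in the text for <a, e, i, o, u> under the VC<e># rule.
LONG_VALUE = {"a": "ei", "e": "i:", "i": "ai", "o": "oʊ", "u": "(j)u:"}

# Assumption: <r>, <w> and <y> are left out because they create other
# contexts (care, owe, aye) that this sketch does not model.
CONSONANTS = "bcdfghjklmnpqstvxz"

# One vowel letter (not preceded by another vowel), one consonant, final <e>.
VCE_PATTERN = re.compile(rf"(?<![aeiou])([aeiou])[{CONSONANTS}]e$")


def predict_vce(word):
    """Return the long value predicted by VC<e>#, or None if the pattern does not apply."""
    match = VCE_PATTERN.search(word.lower())
    return LONG_VALUE[match.group(1)] if match else None


for example in ("pale", "scene", "Mike", "hose", "rude", "rat", "have"):
    print(example, predict_vce(example))
```

The function reports what the rule predicts, not the actual pronunciation: for have it returns the long value, which is exactly the kind of failure mentioned above.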
If we are to generate useful knowledge about pronunciation rules, the first thing to do is to isolate and describe as strictly as possible a comprehensive set of such rules. The panorama is more complex than that registered by Olshtain (2001) or Ediger (2001), mentioned above, and yet, it is important to reduce the detailed complexity described in Venezky (1970) or Cummings (1988) to manageable terms. It is not easy to navigate through all the different pronunciation rules on offer. We have rules for the interpretation of consonant-letters--like <c> before <o, u, a>--and for the interpretation of vowel-letters; rules for the location of primary, secondary or tertiary stresses; rules for the interpretation of stressed and unstressed syllables; etc. Our first task is to isolate a particular set of rules, governing a particular aspect of orthography, such that a minimum of them is able to cover a maximum of occurrences. I will therefore focus on rules that assist the interpretation of single vowel letters--unigraphs--located at the stressed syllable of any English word. For the sake of pedagogical efficiency, we need to isolate the kind of rules that could be applied to as many English words as possible--instead of only to those that have <c> or <y>, for example--provided that they are sufficiently effective in predicting pronunciation. Table 1 represents the set of what I call the ten basic general-systemic rules for stressed unigraphs in American English; a comprehensive set of rules that can be applied to all the words that have a unigraph in their stressed syllable, without exception.
A "general systemic rule" is one that is capable of predicting the value of at least four of the five vocalic unigraphs. The above mentioned VC<e># rule belongs to this category, but there are pronunciation rules--I will call them domain-specific--that are only applicable to one or two unigraphs, or domains; Olshtain mentions one such rule (2001, 210). In my own terms, adjacent post-nuclear <ll> has predicting power in relation to the stressed unigraph <a> but not in relation to <e, i, u>. (6) Notice that bet and bell, bit and bill, dust and dull are subject to the same general systemic rule--PR2.1--while the contrasts bat~ball, tan~tall, mad~mall, etc. point to adjacent post-nuclear <ll> as a domain-specific context affecting <a> quite regularly, and <o> in a less regular way--rot~roll.
The pronunciation rules in table 1 constitute a development of Bozman's general rules for the spelling of long and short pronunciations (1988, 14-15). All of them, without exception, will also be found in the works of Wijk (1966), Venezky (1970) and Cummings (1988). There are, however, a few aspects where I depart from the traditional positions. Apart from the technical terms and descriptions used above, there is the subdivision into oxytone, paroxytone and proparoxytone structures, the value that PR1 assigns to <a>, and the boxes marked with NA for domains where a particular rule is "not applicable." A word like aluminum, for example, which would fit into the PR6 type will be interpreted according to PR3.2 because PR6 is not applicable to the <u> domain (Bozman 1988, 48).
Table 1 represents basic grapho-phonemic competence in General American. An RP version would be almost identical; in standard British English the value for <o> in PR2 and PR6 would be /ɒ/, and words like sorry, current and oracle would fill the NA boxes in PR2.2 and PR6. On this occasion, our calculations of rule reliability and presence have been made for American English, but our guess is that with slightly different measures in some marginal cases, the overall results would be quite similar for RP.
3. Research Questions
The general idea is that EFL teachers could make use of table 1, and teach it either completely or partially to their students. Before that, however, it is necessary to put these contents to the test and provide an answer to the following research questions:
RQ1. How exhaustive is this particular set of rules in relation to the interpretation of American English stressed unigraphs? Would we need additional rules?
RQ2. How do these rules differ in terms of frequency and regularity? Are there any word types that feature as particularly (in)frequent and/or (ir)regular?
RQ3. Are there areas of special difficulty that EFL teachers might consider avoiding?
RQ4. Are there significant differences at the different levels (beginners, intermediate, etc.)? How could the teaching of rules be distributed across the different levels?
Using Excel 2010 and the statistical package SPSS 20, I have checked the reliability and frequency of each of the ten general-systemic rules against Mark Davies's 5,000 wordlist containing the most frequent words in American English (2015). The corpus has been conveniently reduced by filtering away words with stressed digraphs and trigraphs, which are not covered by the rules whose reliability I am trying to assess (see table 2). The original wordlist also contains a number of repetitions. The word work, for instance, appears in rank 117 as verb, and then again in rank 199 as a noun. I have computed this and all repeated items only once whenever I found exactly the same grapho-phonemic correspondences. Where pronunciation changes along with function, the items have been treated and computed separately--protest for example, has been computed as a PR2.1 type with stressed <e> when functioning as a verb and as a PR3.2 type with stressed <o> when functioning as a noun.
Special words like mm-hmm (rank 2,966) and acronyms have also been discarded. Compound words have been separated into their components and reintroduced in the database. The grapho-phonemics of, for example, understand, comprises that of under--a regular PR2.2-type--and that of stand--a regular PR2.1-type. There are few compounds that break this rule. In fact, the corpus only contains three cases: gentleman, freshman and businessman; the words gentle, fresh and business are already present in the corpus. In most cases, in fact, reintroduced items were found to constitute repetitions not to be computed. As a result of this process of filtering out the irrelevant items for the sake of maximally reliable results, I obtained 3,005 frequent English words against which to test the ten basic rules presented above (see table 2).
For each of the words contained in the final list, I have registered information concerning the grapheme--<a, e, i/y, o, u>--found in its stressed syllable, the applicable pronunciation rule in each case--PR1, PR2.1, etc.--and a yes/no tag depending on whether the applicable rule works or not. For the process of tagging each item with its respective PR type I have taken as a reference the American English transcriptions from Longman Pronunciation Dictionary, edited by John C. Wells (1990). When more than one possible pronunciation was offered, I have only considered the first and most standardized one. In the case of function words with strong and weak forms, the strong form has been considered.
The tagging of derivatives has been quite challenging. It seems obvious that once you have tagged and computed a word like color as a PR3.2 type, the word colorful, also registered in the original list, looks very much like a repetition. If treated as such, the word colorful should not be computed, and all suffixed words should then be subjected to the same consideration. However, not all suffixed words lend themselves so nicely to that treatment. Many common words like protective, defendant or evidence would not be computed with such a procedure, and the corpus would have been drastically impoverished.
My solution here has been rather practical. I have discarded as repetitions those words consisting of a root already existing in the database plus affixes -ly, -ing, -ed, -ful, -ness, -ment, -less, -ship, -wise, -hood, -th (in ordinal numbers), un-, and plural -s. These common and frequent affixed words could be easily treated in grapho-phonemic training through a simple elimination strategy: when faced with a word like gathering, students could be instructed to eliminate the -ing ending and consider the predictability of gather.
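The elimination strategy just described can be sketched as follows. This is a minimal illustration rather than the procedure actually used to build the database: the function name `find_root` and the sample root set are hypothetical, and spelling adjustments such as consonant doubling or dropped <e> are not handled.

```python
# Affixes listed in the text; "th" is meant for ordinal numbers and "s" for plurals.
SUFFIXES = ("ly", "ing", "ed", "ful", "ness", "ment", "less", "ship", "wise", "hood", "th", "s")
PREFIXES = ("un",)


def find_root(word, roots):
    """Return the known root left after removing one listed affix, or None."""
    word = word.lower()
    for suffix in SUFFIXES:
        stem = word[: -len(suffix)]
        if word.endswith(suffix) and stem in roots:
            return stem
    for prefix in PREFIXES:
        stem = word[len(prefix):]
        if word.startswith(prefix) and stem in roots:
            return stem
    return None


# Hypothetical miniature database of roots already tagged.
known_roots = {"gather", "color", "four", "able"}
for item in ("gathering", "colorful", "fourth", "unable", "protective"):
    print(item, find_root(item, known_roots))
```

Checking the stem against the set of known roots keeps the sketch from stripping letters that only look like an affix, as in bath or bus.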
When a word with any of these affixes was found whose root was missing from the database, it is the root itself that has been tagged and computed. So, the word frankly has been tagged and computed as if it was frank--i.e., as PR2.1 rather than PR2.2--because frank itself was not in the database. The rest of the derivatives, with affixes other than those listed above, have been computed according to their corresponding type as whole words: director as a PR2.2 type, performance as PR4.2, racism as PR3.2, etc.
Still, a further complication arises when dealing with derivatives. A word like additional, for example, would be categorized as a regular PR6 type. However, regularity here might not be due to the PR6 context, but to the fact that it derives from addition, which actually breaks PR3.2. Inversely, a word like favorable breaks PR6, but it does so in order to preserve the recognition of favor, a regular PR3.2 type. It is not convenient, then, to compute additional as a confirmation of PR6, nor to treat favorable as an exception to that same rule, since both words are actually subjected to yet another rule: a principle of preservation of etymological traceability (Venezky 1970, 120; Camara-Arenas 2010, 78).
Another practical solution was then felt to be in order: derivatives which confirm their rules--as additional confirms PR6--have been computed only when their roots also confirm their own rules; derivatives which break their rules--as favorable breaks PR6--have been computed only if their roots also break their own rules; and they have not been computed otherwise. In this way, a word like global has been computed as confirming PR3.2 because globe also confirms PR3.1; fully has been computed as breaking PR2.2 because full also breaks PR2.1. But neither additional nor favorable have been computed as either confirming or breaking PR6. In this way, I did not boost the reliability of PR6 with words like additional, clinical, columnist or developer; nor did I undermine it with words like agency, behavioral, educational or frequency. All these examples owe the phonemic value of their stressed vowel to their root. In most cases, the respective roots--addition, clinic, behavior, education, etc.--were in the list and were tagged and computed accordingly, the only exceptions being theoretical, tropical and practitioner. These last three have not been computed in any way; if they had been, they would have, however, added to the reliability of PR6.
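The counting criterion for derivatives can be summarized in a few lines. The sketch below is an assumption-laden illustration: the function name `count_derivative` and the boolean encoding of "confirms its rule" are introduced here and are not part of the original procedure.

```python
def count_derivative(derivative_regular, root_regular):
    """Decide how a derivative is counted.

    A derivative is computed only when it and its root behave alike:
    both confirm their respective rules, or both break them.
    Returns "confirms", "breaks", or None when the word is not computed.
    """
    if derivative_regular == root_regular:
        return "confirms" if derivative_regular else "breaks"
    return None


# Examples taken from the text (True = the word confirms its applicable rule).
print(count_derivative(True, True))    # global / globe
print(count_derivative(False, False))  # fully / full
print(count_derivative(True, False))   # additional / addition
print(count_derivative(False, True))   # favorable / favor
```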
For operational purposes, I have further divided the words into five levels. Level one contains the most frequent 601 words, which would be ideally taught to beginners, with successive levels incrementing 601 words each until reaching the final amount of 3,005 words at the end of level five. This procedure has allowed me to consider rule regularity and presence from a progressive point of view. Teachers of EFL for beginners might be more interested in rule regularity and frequency at levels one and two than at later levels.
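Since the levels are cumulative, the sample size at each level and the first level at which a word enters the sample follow directly from the 601-word increment. A small sketch, with hypothetical function names and assuming words are ranked from 1 within the filtered list:

```python
LEVEL_SIZE = 601  # words added at each level (from the text)
NUM_LEVELS = 5


def cumulative_size(level):
    """Number of words in the sample once the given level has been reached."""
    return level * LEVEL_SIZE


def first_level_of(rank):
    """Lowest level whose sample includes the word at this 1-based rank."""
    if not 1 <= rank <= LEVEL_SIZE * NUM_LEVELS:
        raise ValueError("rank outside the filtered list")
    return (rank - 1) // LEVEL_SIZE + 1


print([cumulative_size(level) for level in range(1, NUM_LEVELS + 1)])
```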
A total of 2,514 items were found to be predictable from the application of the rules. This rendered a general predictability of 83.6%, out of the 3,005 stressed-unigraph words. The mean regularity of pronunciation rules is 87%--derived from the values presented in figure 1. As we can see in this figure, the only value that stands out is that of PR3.2, which registered significantly lower regularity.
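The general predictability figure can be reproduced from the two counts given above. The sketch assumes nothing beyond those counts; it simply shows that the reported 83.6% corresponds to the ratio truncated, rather than rounded, to one decimal place.

```python
import math

predictable = 2514  # items predicted correctly by the rules (from the text)
total = 3005        # stressed-unigraph words in the filtered corpus (from the text)

percentage = predictable / total * 100
truncated = math.floor(percentage * 10) / 10  # one decimal place, truncated
rounded = round(percentage, 1)                # one decimal place, rounded

print(truncated, rounded)
```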
When considering regularity across levels, results show an average of 81%. Rules are less regular at level one (76%), with a clear tendency to increase their regularity as the amount of computed words increases. At level two, with 1,202 words, regularity is 80%. This percentage increases to 82% at level three (1,803 words), and rises above 83% at levels four and five (2,404 and 3,005 words respectively). The more words we include, the more regular the set becomes. As can be seen in figure 2, this seems to be the case for all rules, except PR5.2. The five bars for each rule in figure 2 represent, from left to right, levels one to five.
Figure 3 represents the presence of each of the rules within the sample. Frequency data shows more dispersion than regularity data. The average presence was 10%, with a standard deviation of 9%. Values beyond 19% stand out as particularly relevant--PR2.2 and PR2.1. Values around 1%--PR5.1, PR1--point to a rather incidental presence. Taken together, 86% of the processed words were found to belong to PR2, PR3 and PR6.
Figure 4 represents the presence of word-types at the different levels. For each rule, the five bars indicate from left to right the percentage values of levels one to five. I have identified word-types whose presence decreases, like PR1, PR2.1, PR3.1 and PR4.1; word-types whose presence increases, like PR2.2, PR3.2 and PR6; and word-types, like PR4.2, PR5.1 and PR5.2, whose presence neither increases nor decreases in any significant way.
If we compare figures 1 and 3, a number of facts stand out: some rules are very regular, but have very little presence--e.g., PR5.1; one rule, PR3.2, has a rather high frequency but low regularity; and there are some, like PR2.1, PR2.2 and PR6, that are both frequent and regular.
A final set of significant results has to do with the regularity of the function words in the list. We can now reconsider Wijk's claim that the most frequent words in English, which happen to be function words, are mostly irregular (1966, 11). Our study allows for a contrastive analysis of regularity in function words at levels one (beginners) and five (whole corpus). The results for both levels show similarities. At level one, only articles (92%), prepositions (90%) and conjunctions (80%) show significant regularity. Similarly, at level five, the most regular are, again, articles (92%), prepositions (84%) and conjunctions (80%), as well as interjections (88%). At the initial level, as far as function words are concerned, PR2.1 (80%), PR4.1 (100%), PR4.2 (100%), PR5.2 (100%), PR6 (100%) are regular. At the highest level, these rules remain similarly regular, except for PR5.2, which drops to 50%. At this final level PR2.2 also reaches regularity (80%). At both levels, PR3.2 registers significantly low regularity: 30% at level one and 17% at level five. Despite these similarities, the average regularity of function words is significantly lower at level one (69%) than at level five (77%). These low percentages are related to the irregularity, at both levels, of pronouns, numbers and demonstratives.
Concerning the possible "teachability" of our ten basic pronunciation rules, there are at least two fundamental variables to consider: regularity and frequency--or presence. Of these, regularity is clearly a sine qua non: any rule that had (almost) as many exceptions as regulated cases should no longer be considered a rule. Frequency, on the other hand, raises issues of pragmatism: we might be interested in teaching a rule only insofar as our students are likely to encounter many chances to apply it. An infrequent rule, however, would still be a rule. The teachable character of four of our ten basic rules seems unquestionable inasmuch as they are both regular and frequent: PR2.1, PR2.2, PR3.1 and PR6, which regulate words like not, letter, fine and animal.
The consideration of regularity is by no means a simple matter. It is not easy to determine how many exceptions a given rule might be permitted to have before it turns into a case of anecdotal regularity. It is probably the EFL teachers faced with the results of this study who must decide whether a given regularity threshold is acceptable for them or not. Should this threshold be set at 90%, only half the rules--PR2.2, PR3.1, PR4.1, PR4.2, PR5.1 and PR6--would turn out to be teachable. If the threshold is lowered to 80%, only PR3.2, regulating words like nation, even, final, over or student would be left out.
Quite clearly, the greatest challenge for teaching our ten adjacent post-nuclear general-systemic indicators is posed by PR3.2. A reliability of 58% for a set of 540 items actually allows us to question PR3.2 as a general-systemic rule, despite its somewhat larger reliability in the <a, u> domains--see appendix. So, while it is clear that a stressed unigraph followed by CC is predictable, the same unigraph followed by Cv is not predictable to a comparable extent. All I can say, at best, is that the long version--PR3 phonemic value--in these cases is slightly more frequent than the short version--or PR2 phonemic values. That is, words like lemur are somewhat more frequent than words like lemon.
This has technical implications for teaching. While pairs like mat~mate, pet~Pete, pin~pine, cod~code and cut~cute could be used to take pedagogic advantage of the contrast between PR2.1 and PR3.1, the same procedure would be misguiding in the case of PR2.2 and PR3.2; not because conforming pairs like matter~mater or saddest~sadist cannot be found, but because PR3.2 is not fully reliable. Still, a reliability of 58% might be worth taking into account somehow, and before rejecting any application of the pattern, one should see if the group of exceptions might possibly be reduced through the application of domain specific rules.
With a reliability of 30%, PR3.2 for the <i> domain stands out as the most unreliable sub-rule. However, many of the seventy-four words that do not follow PR3.2 here actually follow other easy and reliable domain specific rules. For example, we know that stressed <i> retains pronunciation /i/ despite a VCv environment when it fits the description VCvv, as in condition, civilian, continue, efficient, suspicious, widow, etc. (Wijk 1966, 20). A total of forty-one of the seventy-four supposedly irregular words are actually subject to this domain specific rule.
Furthermore, twelve of the seventy-four irregular PR3.2 words are actually predicted by known distant post-nuclear contexts--such as the suffixes -ish, -ic or -it--which tend to fix stress on the previous syllable and to predict the short value of the unigraph (Bozman 1988, 48). Words like clinic, diminish or explicit break PR3.2, but do so in order to follow this overriding rule. The application of other known rules finally reduces the seventy-four supposedly irregular words to only five: casino, city, consider, prison and sibling. Not all groups of exceptions, however, allow such consistent and intensive reduction, although there can be no doubt that the general total reliability of 83.6% for the rules presented here would increase if domain specific rules were also included.
Another challenge concerning regularity is the confirmation of Wijk's claim that irregularities tend to abound within the most basic vocabulary (1966, 11). A portion of this basic vocabulary is constituted by function words, which have, as we have seen, an average regularity of 69% at level one. In general, the average regularity of the rules when applied to the first 601 words falls below 80%. However, on more detailed inspection we see that PR2.2, PR4.1, PR4.2, PR5.1, PR5.2 and PR6 each achieve a regularity well above 80% even within this first level--see table 5.
The strong form of some of the function words--articles, prepositions and conjunctions--has proved to be rather regular at both level one and five. Although function words are most frequently pronounced with their weak form, common items like there, of, were, to, whom, etc. may add to the perception of a chaotic system. In fact, Wijk's perception of irregularity among common words, both functional and lexical, is confirmed by the results. However, his subsequent conclusion that pronunciation rules are not to be taught at beginner levels should be reconsidered in light of these findings. With the enhanced discrimination that our procedure permits, we see that there is much within basic vocabulary that remains regular and teachable at level one.
In relation to rule presence, stressed-unigraph words are much more frequent in English than stressed-digraph and stressed-trigraph words. Up to seven out of ten words that students encounter during their English training will be of the kind to which one of our ten basic rules is applicable. Of these, however, PR1, PR4 and PR5 are alarmingly infrequent. In the case of PR1, for example, there are merely forty words, to be taught over the five levels. One actually wonders whether PR1--referred to by other authors as the CV rule--actually exists at all. If we moved beyond the 5,000 wordlist, we would certainly find more cases, but they would have to be considered relatively infrequent, and their usefulness for non-advanced EFL students would thus be rendered debatable. Furthermore, the reliability of PR1 is only beyond 80% in the <e, y> domains, which instructors might choose to teach as domain-specific rules, if at all--see appendix.
The situation with PR5.1 and PR5.2 is very similar. The frequency of these words seems insufficient, and if we consider frequency a fundamental condition for "teachability," the convenience of investing effort in the teaching and learning of these rules becomes questionable, to say the least. That said, they do constitute a very reliable set of pronunciation rules. If an EFL instructor decides they are worth teaching, the best strategy would probably be not to wait for the words in question to come up, but rather to address their teaching explicitly, and to be ready to work with less frequent words or even pseudo-words. The same could be said about the PR4 type, where presence, though still limited, is larger than in PR5, and, what is more, reliability is the highest of the entire set.
An interesting aspect that emerges upon analyzing our results--see figure 4--is that there is an inverse correlation between presence and level in the case of oxytones, and a direct correlation in the case of paroxytones and proparoxytones. Words like can, cane, car or care are in general less frequent than words like manner, vapor, party or Mary, but they are more frequent at level one. This is related, in part, to the fact that the strong forms of many frequent functional words are mostly oxytones. Still, figure 4 suggests a possible order in the teaching of pronunciation rules: PR1, PR2.1, PR3.1, PR4.1 and PR5.1 might be taught in the first levels. Chances for practicing PR2.2, PR3.2, PR4.2, and PR5.2, by no means scarce at level one, will only increase in subsequent levels.
7. Conclusions and Further Research
The first conclusion that we can draw from our study is that there is, in fact, a reduced number of pronunciation rules that may help our EFL students to interpret the phonemic value of the stressed unigraph in most cases. We should not underestimate the relevance of this fact. Although books on English orthography are usually lengthy and complex, a small set of ten basic rules has proved to have very wide coverage. We must assume that the reason why these rules are not usually taught in EFL courses has little to do with their reliability or their applicability. Further domain-specific rules would increase our perception of consistency, but they would actually cover a much smaller number of cases.
Orthographic structures like those present in manner (PR2.2), pet (PR2.1), enemy (PR6) and cone (PR3.1) are, in this particular order, the most frequent in everyday American English, and they regularly allow the prediction of a particular vowel phoneme in the stressed syllable. On the other hand, orthographic structures like those present in rely (PR1), car (PR4.1), person (PR4.2), mire (PR5.1) and hero (PR5.2) are much less frequent, but they tend to be even more reliable in the prediction of the phonemic value of any unigraph in their stressed syllable. While research into the best ways of teaching pronunciation rules is still to be carried out, it seems reasonable to think that for rules that have limited presence but high regularity, explicit teaching would be appropriate.
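The reasoning in this paragraph can be read as a two-way triage on presence and regularity. The sketch below uses the percentages reported in figures 1 and 3; the two cut-off values and the category labels are hypothetical, chosen only for illustration, and do not come from the study.

```python
# (presence %, regularity %) per rule, as reported in figures 3 and 1.
RULES = {
    "PR1": (1.3, 85), "PR2.1": (23.8, 83), "PR2.2": (24.2, 92),
    "PR3.1": (9.8, 89), "PR3.2": (18.0, 58), "PR4.1": (4.4, 92),
    "PR4.2": (4.4, 98), "PR5.1": (1.7, 96), "PR5.2": (2.0, 83),
    "PR6": (10.5, 93),
}

# Hypothetical cut-offs (assumptions, not values proposed in the article).
MIN_PRESENCE = 5.0     # below this, a rule counts as having limited presence
MIN_REGULARITY = 80.0  # below this, a rule counts as unreliable


def triage(presence: float, regularity: float) -> str:
    """Place a rule in one of three illustrative categories."""
    if regularity < MIN_REGULARITY:
        return "unreliable: special treatment"
    if presence < MIN_PRESENCE:
        return "rare but reliable: explicit teaching"
    return "frequent and reliable"


for rule, (presence, regularity) in RULES.items():
    print(f"{rule:6s} {triage(presence, regularity)}")
```

With these cut-offs, PR2.1, PR2.2, PR3.1 and PR6 fall in the frequent and reliable group, PR1, PR4.1, PR4.2, PR5.1 and PR5.2 in the group for explicit teaching, and PR3.2 stands alone as unreliable.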
Word-type PR3.2 is extremely problematic in grapho-phonemic terms. It stands as the third most frequent structure in English. At the same time, the predictability of the phonemic value of its stressed unigraph is below 60% (313 regular items out of 540): we have vapor but manor, Peter but second, icon but idiot, odor but body, Cuban but punish. One of the most frequent structures in English is also the most irregular. Any EFL instructor minded to teach pronunciation rules should probably either discard PR3.2 or, preferably, be ready to give it special treatment--making room, perhaps, for some domain-specific rules.
The fact that words like can, cane, car and care--oxytone structures--tend to be relatively frequent among the first 601 items of our list suggests that the oxytone rules PR1, PR2.1, PR3.1, PR4.1 and PR5.1 might be best taught at the first levels. Paroxytone and proparoxytone rules might be quite profitably dealt with at later stages.
The present study has some limitations. Although some parts of the processing were carried out automatically, word-type tagging and information on rule regularity had to be completed manually. This has made it impossible to work with a larger corpus, which would have been very desirable. Nevertheless, the number of words processed, having been selected with a view to EFL applicability, is neither insufficient for consolidating reliable knowledge, nor particularly small when compared with previous research. Venezky (1970) dealt with 20,000 words and Stanback (1992) with 17,602, but Kessler and Treiman (2001; 2003), also using both automated and manual procedures, made important contributions by processing smaller, carefully selected corpora (1,329 and 914 words respectively). Relatively small corpora, compiled following strict criteria, may lead to strong conclusions about specific aspects of English orthography--monosyllabic words, rhymes, vocalic unigraphs, etc. On the other hand, careful selection is not necessarily incompatible with a larger corpus. I am currently working on the design of scripts and automatic protocols that will hopefully increase automatic processing and allow me to explore the "teachability" of a larger battery of rules--domain-specific rules, overriding principles, unstressed syllables, digraphs, etc.--within a much larger corpus.
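To give an idea of what automatic word-type tagging might involve, the following is a minimal sketch that assigns the oxytone types of table 1 to one-syllable spellings containing a single vowel letter. It is not the author's script: the function name `tag`, the letter classes and the order of the checks are assumptions made for this illustration, the PR5.1 pattern follows the examples care, here and fire, and polysyllables, digraphs and stress assignment are left out altogether.

```python
import re

ONSET = "[b-df-hj-np-tv-z]*"  # word-initial consonant letters (y counts as a consonant here)
V = "[aeiouy]"                # the single vowel letter, i.e., the stressed unigraph
C = "[b-df-hj-np-tv-xz]"      # any consonant letter after the vowel
K = "[b-df-hjkmnpqstv-xz]"    # as C but without l and r, so that able or acre are not read as VCC<e>#

# Checked in this order: the <r> patterns must be tried before the general ones.
PATTERNS = [
    ("PR5.1", f"{ONSET}{V}re"),                      # care, here, fire
    ("PR4.1", f"{ONSET}{V}r(?:{C}{{0,2}}|{K}e)"),    # car, part, arch, large
    ("PR3.1", f"{ONSET}{V}{C}e"),                    # make, Pete, time
    ("PR2.1", f"{ONSET}{V}(?:{C}{{1,3}}|{C}{K}e)"),  # cat, back, match, lapse
    ("PR1", f"{ONSET}{V}"),                          # me, go, my
]


def tag(word: str):
    """Return the oxytone word-type of a spelling, or None if no pattern fits."""
    spelling = word.lower()
    for rule, pattern in PATTERNS:
        if re.fullmatch(pattern, spelling):
            return rule
    return None


# The items of the tagging sample in table 3.
for item in ["a", "in", "to", "have", "it", "I", "that", "for"]:
    print(item, tag(item))
```

The function only identifies the word-type. Whether a given item is regular within its type (have, for instance, is not) still has to be checked against a pronunciation dictionary, which is the part that was completed manually in this study.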
For the time being, we may hold with the conclusion that the ten basic post-nuclear general-systemic rules constitute, as a whole, teachable material; at least in terms of their frequency and regularity. However, ten basic rules might still be too many for instructors who are legitimately interested in the development of effective communicative skills rather than in the unveiling of peculiar fine-grained aspects of the English language to their students. Some of the research reviewed above points to possible unexpected advantages to grapho-phonemic training and this is a matter that requires further study. Would grapho-phonemic training aid vocabulary memorization and recall? Would it build up confidence in EFL students? Would it have a positive impact on the assimilation of the English phonological system? Would the development of grapho-phonemic competence correlate with an improvement in listening skills? Or oral skills? These and other related questions must be left for future exploration.
Aro, Mikko and Heinz Wimmer. 2003. "Learning to Read: English in Comparison to Six More Regular Orthographies." Applied Psycholinguistics 24: 621-635.
Berndt, Rita Sloane, James A. Reggia and Charlotte C. Mitchum. 1987. "Empirically Derived Probabilities for Grapheme-to-Phoneme Correspondences in English." Behavior Research Methods, Instruments, & Computers 19 (1): 1-9.
Bozman, Timothy. 1988. Sound Barriers. A Practice Book for Spanish Students of English Phonetics. Zaragoza: Universidad de Zaragoza.
Brown, H. Douglas. 1970. "Categories of Spelling Difficulty in Speakers of English as a First and Second Language." Journal of Verbal Learning and Verbal Behavior 9: 232-236.
Camara-Arenas, Enrique. 2010. La vocal inglesa. Correspondencias grafo-fonemicas. Valladolid: Universidad de Valladolid.
Celce-Murcia, Marianne, Donna M. Brinton and Janet M. Goodwin. 1996. Teaching Pronunciation: A Reference for Teachers of English to Speakers of Other Languages. Cambridge: Cambridge UP.
Cheung, Him, Hsuan-Chih Chen, Chun Yip Lai, On Chi Wong and Melanie Hills. 2001. "The Development of Phonological Awareness: Effects of Spoken Language Experience and Orthography." Cognition 81: 227-241.
Cummings, Donald Wayne. 1988. American English Spelling. An Informal Description. London: The Johns Hopkins UP.
Daelemans, Walter M.P. and Antal P.J. van den Bosch. 1996. "Language-independent and Data-oriented Grapheme-to-phoneme Conversion." In Progress in Speech Synthesis, edited by Jan P.H. Van Santen, Richard Sproat, Joseph Olive and Julia Hirschberg, 77-90. New York: Springer.
Davies, Mark. 2015- Word Frequency Data. Corpus of Contemporary American English. [Accessed online on June 15, 2015].
Dickerson, Wayne B. 1984. "The Role of Formal Rules in Pronunciation." In On TESOL'83: The Question of Control. Selected Papers from the Seventeenth Annual Convention of Teachers of English to Speakers of Other Languages, Toronto, Canada, March 15-20, 1983, edited by Jean Handscombe, Richard A. Orem and Barry P. Taylor, 135-148. Alexandria, VA: TESOL.
--. 1987. "Orthography as a Pronunciation Resource." World Englishes 6 (1): 11-20.
--. 1994. "Empowering Students with Predictive Skills." In Pronunciation Pedagogy and Theory: New Directions, New Views, edited by Joan Morley, 17-35. Alexandria, VA: TESOL.
Doignon-Camus, Nadege and Daniel Zagar. 2014. "The Syllabic Bridge: The First Steps in Learning Spelling to Sound Correspondences." Journal of Child Language 41 (5): 1147-1165.
Ediger, Anne M. 2001. "Teaching Children Literacy Skills in a Second Language." In Teaching English as a Second or Foreign Language, edited by Marianne Celce-Murcia, 153-170. London: Heinle & Heinle.
Escudero, Paola, Rachel Hayes-Harb and Holger Mitterer. 2008. "Novel Second-language Words and Asymmetric Lexical Access." Journal of Phonetics 36: 345-360.
Frederiksen, John R. and Judith F. Kroll. 1976. "Spelling and Sound: Approaches to the Internal Lexicon." Journal of Experimental Psychology: Human Perception and Performance 2 (3): 361-379.
Hanna, Paul R., Jean S. Hanna, Richard E. Hodges and Edwin H. Rudorf. 1966. Phoneme-grapheme Correspondences as Cues to Spelling Improvement. Washington D.C.: US Government Printing Office.
Kessler, Bret and Rebecca Treiman. 2001. "Relationships between Sounds and Letters in English Monosyllables." Journal of Memory and Language 44: 592-617.
--. 2003. "Is English Spelling Chaotic? Misconceptions Concerning its Irregularity." Reading Psychology 24: 267-289.
Kreidler, Charles W. 1972. "Teaching English Spelling and Pronunciation." TESOL Quarterly 6 (1): 3-12.
Lan, Yizhou and Mengjie Wu. 2013. "Application of Form-focused Instruction in English Pronunciation: Examples from Mandarin Learners." Creative Education 4 (9): 29-34.
Long, Michael H. and Peter Robinson. 1998. "Focus on Form: Theory, Research and Practice." In Focus on Form in Classroom Second Language Acquisition, edited by Catherine Doughty and Jessica Williams, 15-41. Cambridge: Cambridge UP.
Martinez Martinez, Angelica Maria. 2011. "Explicit and Differentiated Phonics Instruction as a Tool to Improve Literacy Skills for Children Learning English as a Foreign Language." Gist. Education and Learning Research Journal 5: 25-49.
Olshtain, Elite. 2001. "Functional Tasks for Mastering the Mechanics of Writing and Going Just Beyond." In Teaching English as a Second or Foreign Language, edited by Marianne Celce-Murcia, 207-218. London: Heinle & Heinle.
Rangriz, Samaneh and Amin Marban. 2015. "The Effect of Letter-sound Correspondence Instruction on Iranian EFL Learners' English Pronunciation Improvement." Journal of Applied Linguistics and Language Research 2 (7): 36-44.
Schane, Sanford. 1970. "Linguistics, Spelling, and Pronunciation." TESOL Quarterly 4 (2): 137-141.
Schwartz, Ana I., Judith F. Kroll and Michele Diaz. 2007. "Reading Words in Spanish and English: Mapping Orthography to Phonology in Two Languages." Language and Cognitive Processes 22 (1): 106-129.
Stanback, Margaret L. 1992. "Syllable and Rime Patterns for Teaching Reading: Analysis of a Frequency-based Vocabulary of 17,602 Words." Annals of Dyslexia 42: 196-221.
Treiman, Rebecca, John Mullenix, E. Daylene Richmond-Welty and Ranka Bijeljac-Babic. 1995. "The Special Role of Rimes in the Description, Use, and Acquisition of English Orthography." Journal of Experimental Psychology: General 124 (2): 107-136.
Venezky, Richard L. 1970. The Structure of English Orthography. The Hague: Mouton.
Wells, John C. 1990. Longman Pronunciation Dictionary. London: Pearson Longman.
Wijk, Axel. 1966. Rules of Pronunciation for the English Language. London: Oxford UP.
Appendix. Full Data Concerning Regularity and Frequency

PR-1
Domain   Items  Regular  Irregular  Ratio
<a>          2        1          1    50%
<e>          7        7          0   100%
<i>          3        2          1    66%
<y>         17       17          0   100%
<o>         11        7          4    63%
<u>          0      ...        ...    ...
Totals      40       34          6    85%

PR-2.1
Domain   Items  Regular  Irregular  Ratio
<a>        174      144         30    82%
<e>        163      163          0   100%
<i>        190      158         32    83%
<y>          3        3          0   100%
<o>        106       54         52    50%
<u>         81       74          7    91%
Totals     717      596        121    83%

PR-3.1
Domain   Items  Regular  Irregular  Ratio
<a>         92       90          2    97%
<e>         14       13          1    92%
<i>         98       85         13    86%
<y>          2        2          0   100%
<o>         65       49         16    75%
<u>         23       23          0   100%
Totals     294      262         32    89%

PR-4.1
Domain   Items  Regular  Irregular  Ratio
<a>         44       37          7    84%
<e>         22       22          0   100%
<i>         12       12          0   100%
<y>          0      ...        ...    ...
<o>         41       37          4    90%
<u>         13       13          0   100%
Totals     132      121         11    91%

PR-5.1
Domain   Items  Regular  Irregular  Ratio
<a>         16       16          0   100%
<e>          7        5          2    71%
<i>         11       11          0   100%
<y>          0      ...        ...    ...
<o>         11       11          0   100%
<u>          7        7          0   100%
Totals      52       50          2    96%

PR-6
Domain   Items  Regular  Irregular  Ratio
<a>         61       54          7    88%
<e>        102      100          2    98%
<i>         85       78          7    91%
<y>          1        1          0   100%
<o>         66       59          7    89%
<u>         NA       NA         NA     NA
Totals     315      292         23    92%

PR-2.2
Domain   Items  Regular  Irregular  Ratio
<a>        163      149         14    91%
<e>        217      214          3    98%
<i>        148      141          7    95%
<y>          8        8          0   100%
<o>        105       80         25    76%
<u>         83       76          7    91%
Totals     724      668         56    92%

PR-3.2
Domain   Items  Regular  Irregular  Ratio
<a>        193      140         53    72%
<e>         69       31         38    44%
<i>        106       32         74    30%
<y>          3        2          1    66%
<o>        100       51         49    51%
<u>         69       57         12    82%
Totals     540      313        227    57%

PR-4.2
Domain   Items  Regular  Irregular  Ratio
<a>         34       33          1    97%
<e>         32       32          0   100%
<i>          7        7          0   100%
<y>          0      ...        ...    ...
<o>         43       41          2    95%
<u>         15       15          0   100%
Totals     131      128          3    97%

PR-5.2
Domain   Items  Regular  Irregular  Ratio
<a>          9        6          3    66%
<e>         17       13          4    76%
<i>          6        5          1    83%
<y>          0      ...        ...    ...
<o>         18       17          1    94%
<u>         10        9          1    90%
Totals      60       50         10    83%
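The Ratio values can be recomputed from the Items and Regular counts. The sketch below (function names are illustrative, not part of the study) suggests that the appendix ratios are truncated to whole percentages, whereas the values in figure 1 and table 4 appear to be rounded, which would account for PR3.2 being reported as 57% in one place and 58% in the other.

```python
# (items, regular) totals per rule, copied from the appendix.
TOTALS = {
    "PR1": (40, 34), "PR2.1": (717, 596), "PR2.2": (724, 668),
    "PR3.1": (294, 262), "PR3.2": (540, 313), "PR4.1": (132, 121),
    "PR4.2": (131, 128), "PR5.1": (52, 50), "PR5.2": (60, 50),
    "PR6": (315, 292),
}


def truncated_pct(items: int, regular: int) -> int:
    """Whole-number percentage with the decimals dropped."""
    return 100 * regular // items


def rounded_pct(items: int, regular: int) -> int:
    """Whole-number percentage rounded half up, using integer arithmetic."""
    return (200 * regular + items) // (2 * items)


for rule, (items, regular) in TOTALS.items():
    print(f"{rule:6s} truncated {truncated_pct(items, regular):3d}%"
          f"  rounded {rounded_pct(items, regular):3d}%")
```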
Received 13 September 2017. Revised version accepted 7 March 2018.
Enrique Camara-Arenas is a lecturer and researcher in the English Philology Department at the University of Valladolid who specializes in the grapho-phonemics of the English language and pronunciation teaching methodologies. He is the author of La vocal inglesa: correspondencias grafo-fonemicas (Valladolid, 2010) and has designed a Native Cardinality Method for teaching English pronunciation to Spanish students of EFL--Curso de pronunciacion de la lengua inglesa para hispanohablantes (Valladolid, 2013).
Address: Departamento de Filologia Inglesa. Facultad de Filosofia y Letras. Universidad de Valladolid. Plaza del Campus, s/n. 47011, Valladolid, Spain. Tel.: +34 983423747.
Universidad de Valladolid
(1) Many approaches to grapho-phonemics focus on this distinction, as if vocalic letters were to have mainly two regular values: the long/tense/free--depending on the author--and the short/lax/checked.
(2) All examples in this study are given for General American pronunciation.
(3) When characterizing a rule in terms of frequency, I will be considering the number of words within the corpus used in this work to which a particular rule is applicable; it implies an estimate of how frequently our students will have to apply it. A particular word-type--i.e., a group of words that follow the same rule--has a high or low presence depending on whether the corpus contains many or few instances of it, respectively.
(4) Notice that in talking about "prediction" we are already siding with the EFL student, who lacks a priori grapho-phonemic competence in English, and is constantly facing new words in the written format whose pronunciation is not to be identified, as in the case of natives, but rather guessed. Instead of prediction, most experts outside EFL talk about phonological decoding or even phonological translation (Aro and Wimmer 2003).
(5) Words stressed on the last syllable constitute oxytone structures--like bet, attack, introspect, etc. A paroxytone structure has the primary stress on the penultimate syllable--like manner, indulgent, condemnation, etc. Words stressed on the antepenultimate syllable or before are proparoxytones--enemy, indiscriminate, ceremony, etc. Although experts do not usually make this distinction, I believe that there is much to be gained in terms of discrimination power by incorporating it, as I will later show.
(6) In the English system there are also adjacent pre-nuclear grapho-phonemic contexts--like onset <w-, qu-> before <a, o>, as in and~wand, horse~worse--and distant post-nuclear contexts--all the suffixes explored by Dickerson (1984)--that tend to determine both where the primary stress is located and how the nuclear vowel is to be pronounced.
Caption: Figure 2. Rule regularity across levels
Caption: Figure 4. Frequency of rules across five levels
Table 1. American English General Systemic Rules

Rule / pattern            <a>         <e>
PR1                       /a:/        /i:/
   ...V#                  spa         me
PR2.1                     /ae/        /e/
   a. ...VC#              cat         bet
   b. ...VCC#             back        belt
   c. ...VCCC#            match       fetch
   d. ...VCC<e>#          lapse       ledge
PR2.2                     /ae/        /e/
   a. ...VCCv...          happen      question
   b. ...VCCC...          android     entry
   c. ...V<rr>v...        narrow      merry
PR3.1                     /ei/        /i:/
   ...VC<e>#              make        Pete
PR3.2                     /ei/        /i:/
   a. ...VCv...           paper       Peter
   b. ...VC<r>v...        patriot     secret
   c. ...VC<l>v...        able        NA
PR4.1                     /a:/        /[??]:/
   a. ...V<r>#            car         her
   b. ...V<r>C#           part        term
   c. ...V<r>CC#          arch        perch
   d. ...V<r>C<e>#        large       serve
PR4.2                     /a:/        /[??]:/
   a. ...V<r>Cv...        party       service
   b. ...V<r>CC...        partner     interpret
PR5.1                     /er/        /ir/
   VC<re>#                care        here
PR5.2                     /er/        /ir/
   V<r>v...               parent      period
PR6 Proparoxytones        /ae/        /e/
   a. ...VCv...           family      president
   b. ...V<r>v...         charity     American
   c. ...VC<r>v...        African     integrity

Rule / pattern            <i>/<y>       <o>
PR1                       /ai/          /o[??]/
   ...V#                  I, my         go
PR2.1                     /i/           /a:/
   a. ...VC#              pin           hot
   b. ...VCC#             myth          block
   c. ...VCCC#            bitch         blotch
   d. ...VCC<e>#          bridge        lodge
PR2.2                     /i/           /a:/
   a. ...VCCv...          issesu        office
   b. ...VCCC...          hindrance     cockle
   c. ...V<rr>v...        mirror        NA
PR3.1                     /ai/          /oo/
   ...VC<e>#              time          home
PR3.2                     /ai/          /oo/
   a. ...VCv...           final         open
   b. ...VC<r>v...        micron        copra
   c. ...VC<l>v...        title         noble
PR4.1                     /[??]:/       /[??]:/
   a. ...V<r>#            fir           for
   b. ...V<r>C#           girl          report
   c. ...V<r>CC#          birth         scorch
   d. ...V<r>C<e>#        dirge         horse
PR4.2                     /[??]:/       /[??]:/
   a. ...V<r>Cv...        virtual       important
   b. ...V<r>CC...        myrtle        northern
PR5.1                     /ai/          /[??]:/
   VC<re>#                fire          vimoe
PR5.2                     /ai/          /[??]:/
   V<r>v...               viHOU         sitor
PR6 Proparoxytones        /i/           /a:/
   a. ...VCv...           significant   policy
   b. ...V<r>v...         miracle       NA
   c. ...VC<r>v...        fibrillate    Socrates

Rule / pattern            <u>
PR1                       /(j)u:/
   ...V#                  gnu
PR2.1                     /a/
   a. ...VC#              mud
   b. ...VCC#             lung
   c. ...VCCC#            gulch
   d. ...VCC<e>#          budge
PR2.2                     /a/
   a. ...VCCv...          number
   b. ...VCCC...          buckle
   c. ...V<rr>v...        NA
PR3.1                     /(j)u:/
   ...VC<e>#              include
PR3.2                     /(j)u:/
   a. ...VCv...           human
   b. ...VC<r>v...        nutrient
   c. ...VC<l>v...        bugle
PR4.1                     /[??]:/
   a. ...V<r>#            fur
   b. ...V<r>C#           return
   c. ...V<r>CC#          burnt
   d. ...V<r>C<e>#        nurse
PR4.2                     /[??]:/
   a. ...V<r>Cv...        surface
   b. ...V<r>CC...        purchase
PR5.1                     /[??]r/
   VC<re>#                ensure
PR5.2                     /[??]r/
   V<r>v...               jury
PR6 Proparoxytones        /[??]:/
   a. ...VCv...           NA
   b. ...V<r>v...         NA
   c. ...VC<r>v...        NA

PR: Pronunciation rule; NA: Not applicable

Table 2. Reduction of Davies's wordlist

Total words                                               5,000
Special words, acronyms, repetitions, non-computed
derivatives, compounds, etc.                              1,066
Computed words (ref.)                                     3,934   100%
Words with stressed unigraph                              3,005    76%
Words with digraphs or trigraphs                            929    24%
Regular stressed unigraph words                           2,514    64%
Irregular stressed unigraph words                           491    12%

Table 3. Tagging sample

Frequency (rank)   Item   Unigraph   Type    Reliability
 5                 a      a          PR1     n
 6                 in     i          PR2.1   y
 7                 to     o          PR1     n
 8                 have   a          PR3.1   n
10                 it     i          PR2.1   y
11                 I      i          PR1     y
12                 that   a          PR2.1   y
13                 for    o          PR4.1   y

Table 4. Pronunciation Rule 3.2

PR3.2
Domain   Item   Regular  Irregular  Ratio
<a>       193       140         53    72%
<e>        69        31         38    44%
<i>       106        32         74    30%
<y>         3         2          1    66%
<o>       100        51         49    51%
<u>        69        57         12    82%
Totals    540       313        227    58%

Table 5. Regularity at level one

Rule     Level 1
PR1      77%
PR2.1    73%
PR2.2    83%
PR3.1    81%
PR3.2    47%
PR4.1    90%
PR4.2    94%
PR5.1    86%
PR5.2    93%
PR6      97%

Figure 1. Rule regularity, percentage. Ranking display

Rule     Regularity
PR4.2    98%
PR5.1    96%
PR6      93%
PR2.2    92%
PR4.1    92%
PR3.1    89%
PR1      85%
PR5.2    83%
PR2.1    83%
PR3.2    58%

Note: Table made from bar graph.

Figure 3. Frequency of rules, percentage. Ranking display

Rule     Frequency
PR2.2    24.2
PR2.1    23.8
PR3.2    18.0
PR6      10.5
PR3.1     9.8
PR4.1     4.4
PR4.2     4.4
PR5.2     2.0
PR5.1     1.7
PR1       1.3

Note: Table made from bar graph.
Publication: Atlantis, revista de la Asociacion Espanola de Estudios Anglo-Norteamericanos
Date: Dec 1, 2018