Shouldn't it be breakfunch? A quantitative analysis of blend structure in English *.
The present paper investigates the word-formation process of blending in English. Following a brief review of several previous classificatory studies, the paper analyzes the orthographic and phonemic structure of blends on a quantitative basis. The main factors to be discussed are (i) the amount of information each source word contributes (following Kaunisto 2000a, 2000b) and (ii) the similarity of the source words to the blend. Several points of critique concerning previous analyses are raised and improved upon by introducing reliable ways of operationalization and statistical testing. The results show that the amount of material contributed by the words is determined by the degree of recognizability of the source words and that the similarity of source words to the blend plays a vital role in blend formation. The paper concludes by further validating these results on the basis of a comparison of intentional blends to speech-error blends and pointing out potentially rewarding avenues of further research that is currently being conducted.
This paper is concerned with a morphological process in English that is commonly referred to as (lexical) blending. Blending is a frequent and productive word-formation process that can be defined as follows: blending involves the coinage of a new lexeme by fusing parts of at least two other source words of which either one is shortened in the fusion and/or where there is some form of phonemic or graphemic overlap of the source words; some typical and well-known examples are given in (1).
(1) a. br(eakfast) x (l)unch → brunch
    b. mot(or) x (h)otel → motel
    c. fanta(stic) x (f)abulous → fantabulous
    d. fool x (phi)losopher → foolosopher
In previous analyses, blending has mainly been investigated in terms of the following questions:
(2) How can blending be distinguished from other word-formation processes?
(3) How can different kinds of blends be distinguished from one another?
(4) Why do blends have the structure they have? Put differently, why are blends created the way they are?
The present study is most of all concerned with the third question, but in order to understand the make-up and the database of the study, it is necessary to briefly turn to findings concerning the first two questions.
Blending (for the moment, I will restrict myself to non-speech-error blends) has been investigated in a variety of studies, most of which are classificatory in nature and focus on the questions (2) and (3) from above (examples include Pound 1914; Algeo 1977; Cannon 1986; Stekauer 1991). Unfortunately, the criteria that were adopted as a basis of comparison were often diverse, difficult to operationalize objectively and sometimes not even adhered to consistently. As Cannon (1986: 729) put it, "a chronological tracing of the definitions of blend reveals no consistent refining of the parameters." To give an idea of which parameters previous researchers were concerned with and as a basis for the empirical investigation to follow, let us look at only a few examples. Section 1.1 is concerned with the distinction between blends and other derivational processes; section 1.2 briefly introduces different kinds of taxonomies and structural analyses of blends.
1.1. Classificatory studies: blends vs. other derivational processes
In one of the earliest studies, Pound (1914: 1) analyzes 314 blends, proposing the following definition:
Blend-words [...] may be defined as two or more words, often of cognate sense, telescoped as it were into one; as factitious conflations which retain, for a while at least, the suggestive power of their various elements.
She argues that blends have to be distinguished from (among other things)
--analogical extensions or enlargements (such as judgmatical [judgment x dogmatical]) because (i) judgmatical does not imply the meaning of dogmatical and, thus, no semantic fusion has occurred and (ii) such forms are "generally unintentional" whereas blends are "often conscious or intentional" (1914: 7); however, on the same page, she acknowledges that neither criterion is failsafe;
--whimsical folk-etymological perversions (such as jawbacious [jaw x audacious]) because of their folk-etymological origin--again, however, Pound admits that "the subjects of folk-etymology and blending do merge. The test of motive in origin is not always either a clear nor a trustworthy guide" (1914: 8);
--agglutinative or elliptical forms or contractions of frequently co-occurring expressions (such as starkarageous [stark x outrageous]) because the "predominant motive in their formation was clearly elliptical" (1914: 9). There are some problems with this distinction: first, while Pound does not count them as blends, she nevertheless says "[t]hese [contractions] are undoubtedly blends" (1914: 9), but does not provide a motive for blend creation according to which "real" blends and her contractions can be distinguished. Second, some expressions she considers contractions are definitely not blends in any sense: Frisco (from San Francisco), for example, does not involve the fusion of elements of two words at all. Finally, as before, Pound claims that in some cases the distinction is not an absolute one (cf. 1914: 11).
Algeo's (1977: 48) definition of blends is similar to the one I proposed above: Blending refers to "a combination of two or more forms, at least one of which has been shortened in the process of combination." This definition is based on structural characteristics and implies that, for example, cases where full forms combine without overlap do not count as blends but rather as compounds (cf. 1977: 54); examples of non-blends mentioned include squandermania, daisy (historically a compound, namely day's eye) and meritocracy ("a derivative with the combining form -ocracy" [1977: 54]). However, I believe the case of meritocracy is a difficult one since, strictly speaking, meritocracy can be argued to be covered by Algeo's definition of blends (merit x aristocracy), so it seems as if the definition is either not followed consistently or is in need of refinement in terms of additional criteria. (1) Additionally, he also points out some cases where the dividing line between blends and other derivational processes is far from clear: for instance, while breadth can be analyzed as a blend (OE brede x length), it is equally plausibly an instance of analogical extension following the pattern long → length : broad → x. Also, he argues that in cases like dumbfound (dumb x confound) blending may be difficult to distinguish from what he calls free composition (cf. 1977: 51).
Cannon's (1986) paper is based on an analysis of 132 written English blends. After a thorough review of the literature, he formulates a definition which also gives some criteria that are, although not necessary or sufficient, characteristic of the most typical blends:
[...] a blend involves a telescoping of two or more separate forms into one, or, rarely, a superposition of one form upon another. It usually contains overlapping and preserves some of the meaning of at least one of the source words, though sometimes so much of the roots are lost that a blend is unanalyzable. (Cannon 1986: 730; his emphasis, STG)
However, little explicit discussion of how blends differ from superficially similar phenomena can be found. For instance, Cannon does not address the question raised by Algeo (1977) and Pound (1914) whether forms like radarange (what Pound and Algeo would have called a contraction) do constitute blends or not.
Bauer's (1983: 234) definition of blends is "[a] blend may be defined as a new lexeme formed from parts of two (or possibly more) other words in such a way that there is no transparent analysis into morphs," but already the following sentence questions his own definition by (correctly) pointing out that "in many cases some kind of analysis can be made [because] at least one of the elements is transparently recoverable." Later on, he adds that "blends normally take the first part of one word and the last part of another" (1983: 235). As to distinguishing blends from other derivational processes, he points out cases where one source word is left intact in the blend, which might therefore be analyzed as the addition of one source word to a case of clipping (examples include mocamp [motor x camp] and Amtrack [American x track]), but he does not seem to take a definite stand on how to resolve the issue. It is hard to see, however, how mocamp fits into, for example, his own traditional definition of compounds since mo is not a word or a free morpheme (cf. also Stekauer 1991: 27).
Stekauer (1991) is a typical example of the classificatory approach towards blends. His definition is, strictly speaking, slightly circular: blends "have resulted from two motivating words which have been blended [...] into a new coinage which is unanalysable into determinant and determinatum, thus representing monemes" (1991: 26). Like others before him, he points to the importance of phonemic overlap in distinguishing blends from compounds and, following Pound (1914), contends that elliptical forms (such as trafficator [traffic x indicator]) are not blends as they do not constitute a new meaning resulting from the blending process.
Finally, let us turn to Kemmer (2003), who adopts Bauer's (1983: 234) definition of blends: "a new lexeme formed from parts of two or more other lexemes." Like others, she comments on the role played by phonemic overlap and phonemic as well as phonological similarity, correctly emphasizing that these properties are not necessary conditions for lexical blends. She summarizes as follows:
Blends combine parts of lexical sourcewords, rather than whole sourcewords; this distinguishes them from compounds. Morphological structure is not particularly relevant to blends. [...] Phonological properties are highly relevant to blending; phonological similarity of the blend with part or whole source words increases the likelihood or felicity [...] of the blend. (Kemmer 2003: 75)
This brief characterization of previous accounts of the distinction between blends and related/similar products of word-formation processes highlights the most important features figuring in the definition of blends; for a more thorough overview, the reader is referred to the comprehensive survey by Cannon (1986). The following section will look briefly at the different kinds of blends that were proposed.
1.2. Classificatory studies: taxonomies and the structure of blends
Pound (1914: 20ff.) merely proposes a list of nonexclusive labels such as clever literary coinages, nonce blends (i.e. speech errors), conscious folk formations, etc., where the exact basis of the underlying criteria and the question of how to limit this inventory remain opaque. As to formal/structural aspects of blends, she states that "no very definite grouping seems advisable" (1914: 22), given that virtually every source word can be altered in seemingly unpredictable ways and that the number of syllables of blends does not display patterns lending themselves to easy explanation etc.
Much more insightful is, again, the work by Algeo (1977). He develops two kinds of classifications of blends. On the one hand, he distinguishes three structural groups of blends:
--blends with phonemic overlap; this group is subdivided on the basis of (i) where and what kind of shortening and overlap occurs and (ii) whether the phonemic overlap is one of full segments or one of distinctive articulatory features;
--blends with clipping (with subdivisions concerning the numbers and locations of the clippings);
--blends with phonemic overlap and clipping.
His second classification contrasts syntagmatic blends (so-called telescope blends of source words that usually co-occur sequentially like radarange [radar x range]) and associative blends (blends of source words that were [usually semantically] linked in the wordmaker's mind). As to the definition of syntagmatic blends, Algeo seems somewhat indecisive because in the same section he states that "a consistent taxonomy would regard them merely as contractions" (1977: 56).
Cannon (1986) proposes criteria similar to those of Algeo (e.g. by looking at the overlap of source words in blends and the location of the point of fusion), but includes some more parameters such as word classes, syllabic lengths, and morphological properties of the source words, semantic groups of the denotata of blends, etc. Simply speaking, all possibly relevant information is catalogued, (2) but when it comes to theoretical conclusions bearing on the structure of blends Cannon appears a bit indecisive. On the one hand, he correctly notes that "a blend should not differ very much in form and meaning from its sources" and "the major parts of the source words should be preserved" (1986: 739)--on the other hand, he points out that "our blends are little illuminated by an analysis of sound, phonotactics, and the tiny bit of rhyme [...]. Their segments are too varied to suggest any propensities for blending" (1986: 746), thus offering no definite conclusions.
Bauer (1983) is concerned with by now already familiar distinctions. He mainly differentiates between (i) blends where only parts of the original words figure in the coinage, for example, chunnel (channel x tunnel), and (ii) blends where the two words used as the bases are both present in their entirety, for example, glasphalt (glass x asphalt), involving overlap in pronunciation, spelling or both. (3) An additional group is discussed, namely that where the blend looks as if it is "analysable in terms of other word-formation processes, in particular as a neo-classical compound" (1983: 236), for example, autocide (automobile x suicide).
Stekauer (1991: 30) merely proposes an "onomasiological" classification of blends (arguing for an improvement over purely formal classifications) and discusses various individual examples; his conclusions, however, do not seem to go beyond previous research.
Finally, let us turn to Kemmer (2003). She introduces a terminological distinction between intercalative blends ("in which the two words in the blend are so tightly integrated [...] that the sounds of one source word are interspersed between the other" (2003: 72), for example, chortle [chuckle x snort]) and nonintercalative or sequential blends. There are two problems with this distinction: on the one hand, Kemmer states that "[t]here are no intercalative blends in my data that do not also have a possible non-intercalative analysis" (2003: 72), which, if true, raises the question of the explanatory value of this distinction (cf. Occam's razor). On the other hand, Kemmer undermines her own distinction by citing examples which are in fact intercalative without having a linear analysis, namely chortle and slithy (slimy x lithe). (4)
As to other distinctive parameters, Kemmer also notes the frequency of overlap blends (i.e. blends involving segments which are shared by both source words and are located in the area where fusion occurs) and substitution blends (i.e. blends where one part of a source word replaces one part of the other source word without overlapping material). She also comments on the degree of similarity between the two source words and notes (like Algeo 1977) that similarity need not involve segments, but that overlap in terms of distinctive features may sometimes suffice.
While the above has only been a very brief summary of (part of) the voluminous literature on blending in English, it has demonstrated that (i) there is a variety of parameters along which blends do vary and (ii) that previous analyses do also vary strongly as to how blends should be defined: most propose purely structural definitions which are constrained by psychological, historical, or semantic considerations and always need to be taken with a grain of salt, given that many criteria are not absolute.
While these issues are not central to the points to be made here, they were necessary in order to introduce the database of the present study. For the remainder of the paper, I will restrict myself to some objectively identifiable structural parameters (mainly following Algeo 1977) according to which blends can be distinguished, omitting other criteria altogether (given their limited viability across all cases):
--the number of source words entering into the blend: commonly one distinguishes between blends resulting from two source words and blends resulting from more than two source words; given the scarcity of blends resulting from the conflation of more than two words, I will restrict my attention to blends resulting from two source words only;
--the number and kinds of words which are shortened in the blend: in the case of two-word blends, both forms or only one of the two forms can be shortened;
--the kind of conflation resulting in the blend: usually, blends result from a juxtaposition of the beginning of the first source word and the end of the second source word (with or without graphemic and/or phonemic overlap); I will call this process linear blending and the positions where the words fuse are hereafter referred to as breakpoints--however, in some much less frequent cases one source word is altered by some part of the other source word;
--the presence or absence of overlap of the source words in the blend: sometimes, the blend involves sequences of graphemes and/or segments that occur in both source words.
The various forms of blends resulting from this classification are exemplified in Table 1; the most prototypical examples of blends involve linear blending with a shortening of both source words at some point of (graphemic or phonemic) overlap (cf. Kubozono 1990: 4).
Note that this classification does not only accommodate blends--rather, other word-formation processes such as compounds or complex clippings find a natural place within this categorization. My own corpus of blends contains 585 examples from
--previous studies, more precisely from Adams (1973), Akmajian et al. (1995), Algeo (1977), Bauer (1983), Bryant (1974), Cannon (1986), Irwin (1939), Kaunisto (2000a, 2000b), Kelly (1998), Kemmer (2003), Murray (1995), Pound (1914), and Stekauer (1991);
--the Oxford English Dictionary on CD-ROM (version 1.15) (search word: blend);
--the Encyclopedia Britannica 2000 (CD-version; s. v. blend);
--the internet pages of the course Linguistics/English 215, Words in English: Structure, History and Use, taught by Suzanne Kemmer at Rice University (www.owlnet.rice.edu/~ling215);
--a summary on the LinguistList (issue 11.1378) by Suzanne Kemmer.
Of these 585 blends, a majority of 541 (92.5%) are linear blends proper. With this database, the present study is based on one of the largest blend corpora analyzed so far.
1.3. Structural aspects of blends: review and points of critique
The above review has shown that while there are many purely classificatory approaches, much less seems to be known about why blends have the structure they have or, put differently, are assembled the way they are. Apart from the analyses mentioned above, which point to a large degree of variation, to the difficulty of detecting major patterns and to the impossibility of formulating general patterns, etc., what we find is the following:
In his discussion of how dove and hawk are to be blended, Bauer (1983: 235) considers dawk and hove the only possible alternatives, but "the choice of one rather than the other would appear to be fairly arbitrary, although the Sprachgefühl of the native speaker may find one more suitable than the other"--since the notion of Sprachgefühl is not clarified further, however, this comment is of little help. Moreover, he speculates "[i]t seems likely that there is not a single 'right answer' when searching for a blend, and that the blend chosen is at least partially random" (1983: 235); more succinctly,
in blending, the coiner is apparently free to take as much or as little from either base as is felt to be necessary or desirable. [...] Exactly what the restrictions are, however, beyond pronounceability and spellability is far from clear. One seems to be the rejection of forms that lead to the splitting up of consonant clusters from either of the original words, but this may be a spurious restriction. (1983: 235)
Cannon (1986: 746) is, in some respect, more informative: "The structure of the longer word of the two source words usually dictates the maximum number of syllables, as well as the primary stress." Also he provides some quantitative information on different structural kinds of blends (cf. 1986: 747), but in general he seems equally pessimistic: "So we find no discernible relationship between phonology [...] and a viable blend. [...] This fact helps to make blends one of the most unpredictable categories of word formation" (1986: 744).
Kemmer makes the interesting proposal that the creation of a blend involves creating a balance between two competing factors, namely (i) the recognizability of the source words and (ii) the similarity of syllable structure (including stress). This, she argues, makes it "impossible to state a general formal rule that will license some blends and exclude others" because "speakers are operating with a facility for global patternmatching" (2003: 77). This proposal lies at the heart of the present analysis, and we will return to it later.
Two studies not mentioned so far, however, are directly concerned with the structural character of blends, namely Kelly (1998) and Kaunisto (2000a, 2000b). Kelly (1998) differs from all studies discussed so far in many respects. First, he is not so much concerned with developing classifications of blends, but rather with cognitive and linguistic determinants governing blend structure. Second, his study is empirical/quantitative in the sense that he tests his hypotheses on the basis of a corpus (of 426 blends, not all of which, however, could be used in every single part of his study) using inferential statistics. His findings can be summarized as follows:
--in general, the first source words of those two-word blends that could be expanded into coordinate phrases are significantly shorter, (6) significantly more frequent and denote significantly more prototypical category members than the second source words;
--the breakpoints of blends (without overlap and without phonemic alterations in the blend) occur significantly more often at syllable/word breaks than elsewhere; additionally, within-syllable breaks preferably preserve the rime;
--the boundaries of blends in which an expected consonant from the first component was supplanted by a different consonant from the second component are significantly more similar to one another (in terms of the sonority hierarchy) than might be expected by chance.
In sum, Kelly shows convincingly that the amount of variability so frequently commented on by previous authors is in fact far less arbitrary than has hitherto been assumed. Some minor problems need to be recognized, though. As to the first case study, on the one hand, it intuitively seems to make sense to restrict the analysis of source-word frequencies to those blends that can be expanded into coordinate structures (namely in order to have a motivation to apply other variables also known to influence coordinate structures). On the other hand, there is no reason not to include noncoordinate blends (such as Westralia [West x Australia]) in the analysis as well to see whether frequency plays a role for all kinds of blends, especially since Kelly is generally sympathetic towards the application of evidence from speech-error blends to intentional blends and there is evidence that frequency plays a role for speech-error blends (cf. MacKay 1973).
A second objection is that his way of determining the syllabic structure of the source words is orthographic (using the American Heritage Dictionary), although most other studies have claimed that blends are more of a phonological phenomenon (cf. note 3). In addition, as Kelly himself points out (cf. 1998: 585), the issue of ambisyllabicity is not resolved.
Finally, note that Kelly investigates the play with word junctures in 33 linear blends by looking at the similarity of the end of the first source word and the beginning of the second source word in terms of the overlapping consonant (in terms of the sonority hierarchy), speculating that "blends do tend to be arranged so that the boundary involves similar phonemes" (1998: 587). The first thing to be noted is the extremely small part of Kelly's database entering into this part of the analysis, but a more serious drawback is the following. Given some of the examples explicitly discussed in this section of his paper, his methodology amounts to strongly downplaying the actual degree of similarity between the source words and the linear blends. For instance, he investigates the linear blend clantastical (clandestine x fantastical) by looking at the degree of similarity between the [d] of clandestine and the first [t] of fantastical, but neglects the fact that the two source words display a much higher degree of similarity AROUND the breakpoint--I believe it would be more rewarding to look at the size of the overlap and the degree to which similarity holds before and after the (overlap) breakpoint; I will return to and exemplify this point in detail below. However, in spite of these points of critique, I believe Kelly's study constitutes a first step towards analyses of blend structure transcending previous classificatory approaches.
Kaunisto (2000a, 2000b) pursues a similar objective, that is, the analysis of blend structure in terms of cognitive principles. On the basis of Bergstrom's (1906) proposal to investigate the quantity of the contribution of each element in each different case, Kaunisto makes the interesting suggestion that
[i]t may be argued that the deletion of any items from the source words presents a certain amount of "danger" or "threat" as to the understandability of the final blend word. Ideal blends then would naturally be ones where the ending of the first source word and the beginning of the second source one overlap, resulting in a way in no deletion at all. (Kaunisto 2000a: 49)
He goes on to argue that one would expect the shorter word to contribute a larger percentage of itself to the blend than the longer word in order to preserve its recognizability. Consider Figure 1 for the blend brunch.
[FIGURE 1 OMITTED]
Kaunisto's prediction is borne out: lunch has fewer letters but contributes more of itself (namely 80%) to the blend than the longer breakfast, which contributes only 22.2% of itself to the blend. (7) Of the 101 blends he analyzed, 55.4% behave as predicted while 16.8% do not (the remainder are blends in which both source words are present in their entirety and blends deriving from equally long source words).
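The arithmetic behind these percentages can be made explicit with a minimal sketch. It assumes that a source word's contribution is simply the number of its letters surviving in the blend divided by its total number of letters; the function names `contribution` and `prediction_holds` are hypothetical and introduced here for illustration only.

```python
def contribution(part: str, source_word: str) -> float:
    """Percentage of the source word's letters that it contributes to the blend."""
    return 100 * len(part) / len(source_word)


def prediction_holds(sw1: str, part1: str, sw2: str, part2: str):
    """Check the prediction that the shorter source word contributes the
    larger percentage of itself. Returns None for equally long source
    words, about which the prediction says nothing."""
    if len(sw1) == len(sw2):
        return None
    c1, c2 = contribution(part1, sw1), contribution(part2, sw2)
    return c1 > c2 if len(sw1) < len(sw2) else c2 > c1


# br(eakfast) x (l)unch -> brunch
print(round(contribution("br", "breakfast"), 1))  # 22.2
print(round(contribution("unch", "lunch"), 1))    # 80.0
print(prediction_holds("breakfast", "br", "lunch", "unch"))  # True
```

For grudge (grutch x gredge), the same function would return None, since both source words have six letters.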
Still though, there are some problems with this way of analysis. First, Kaunisto does not subject the results to standard tests of significance, leaving us, strictly speaking, with uncertainty as to the generalizability of his result.
Secondly and more importantly, Kaunisto's investigation is based on the source words' graphemic contributions to the blend although we have already seen that most if not all researchers have rather emphasized the phonemic and phonological structure of blends. Thus, this variable needs to be included in the analysis.
Thirdly, I believe that Kaunisto's approach overlooks something quite important. His approach is based on measuring the recognizability of the source words in the blend on the basis of the letters they contribute to the blend. However, in the vast majority of blends, the two source words contribute different portions of themselves: typically, the first lexeme contributes its beginning whereas the second lexeme contributes its end. Previous studies have shown that x segments of the beginning of a word increase its chance of being recognized more than the same number of segments of its end (cf. Noteboom 1981). Take, for example, the blend grudge (grutch x gredge), where both words contribute an equal amount and an equal percentage of letters from themselves to the blend. On the basis of the above studies, I would expect that the three letters (50%) of the first source word (<gru>) make it easier for the hearer to identify grutch as the first source word than the three letters (50%) of the second source word (<dge>) enable the hearer to identify gredge as the second source word. (8) I would therefore expect that, if recognizability indeed plays the role Kemmer and Kaunisto claim, the second source word should, on average, contribute slightly more material to the blend, a hypothesis to be tested below.
Finally, the question may be posed how Kaunisto approaches blends without a clear breakpoint, that is, those where we find common graphemes (or phonemes, for that matter) outside of the overlap area; recall that these common elements were also neglected in Kelly's analysis who investigated the degree of similarity between the blend words only with respect to the phonemic similarity of the consonant at the breakpoint. Consider, for example, fantabulous (fantastic x fabulous) with two different kinds of analyses represented in Figure 2 and Figure 3 respectively. Figure 2 analyzes the contributions of each source word only up to the breakpoint or point of fusion (and will be referred to as analysis 1) whereas Figure 3 also considers source words' contributions before and after the breakpoint (and will be referred to as analysis 2).
[FIGURES 2-3 OMITTED]
While both of these approaches yield identical results for fantabulous, we can easily find cases with contradictory results. Let us thus look at Kelly's treatment of similarity and Kaunisto's account of informativeness at the same time by considering chunnel (channel x tunnel) and its two analyses in Figure 4 and Figure 5.
[FIGURES 4-5 OMITTED]
First, consider Figure 4, which is, again for expository reasons only, concerned with graphemes only. Kelly's phonemic analysis would consist of assessing the similarity of [tʃ] in channel to the [t] in tunnel in terms of the sonority hierarchy; in terms of traditional articulatory features, we would conclude that the similarity is moderately high since [tʃ] and [t] share the features [-voiced] (for voicing) and [+alveolar] (for place of articulation) while they differ in their manner of articulation ([+affricate] and [+plosive] for [tʃ] and [t] respectively). Turning to Kaunisto's analysis, we see that his claim is borne out by the data: tunnel is the shorter word and contributes more (in percent of graphemes) to the blend than channel.
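The feature comparison just described can be sketched as follows. The feature values are taken exactly as stated in the paragraph above and are not meant as a complete phonological description; the dictionary `FEATURES` and the function `shared_features` are hypothetical names used for illustration.

```python
# Feature values as given in the text for the two breakpoint consonants.
FEATURES = {
    "tʃ": {"voicing": "-voiced", "place": "+alveolar", "manner": "+affricate"},
    "t": {"voicing": "-voiced", "place": "+alveolar", "manner": "+plosive"},
}


def shared_features(seg1: str, seg2: str) -> list:
    """Return the feature dimensions on which two segments agree."""
    f1, f2 = FEATURES[seg1], FEATURES[seg2]
    return sorted(dim for dim in f1 if f1[dim] == f2[dim])


shared = shared_features("tʃ", "t")
print(shared)  # ['place', 'voicing']
print(f"{len(shared)} of {len(FEATURES['t'])} dimensions shared")
```

Agreement on two of three dimensions corresponds to the "moderately high" similarity noted above, but it says nothing about the rest of the two source words.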
Analysis 2 in Figure 5, however, yields very different results. With a less constrained view of similarity, we find what we would already assume on an intuitive basis, namely that channel and tunnel strongly overlap in terms of graphemes and the similarity is even more obvious once we also consider similarities in terms of articulatory features since both source words fit the pattern in (5), where "|" denotes boundaries between segments. (10)
(5) [-voiced] [+alveolar] | [-rounded] [-high] | [n] | [e] | [l]
Thus, while Kelly's assumption that blends play with word similarity is generally on the right track, his definition of similarity turns out to be too narrow: it reduces the degree of phonological similarity to the breakpoint although speakers rather exploit the overall similarity of the two source words--perhaps the former is not even noticed at all, something that can hardly be assumed of the latter. Thus, for a more adequate analysis, a better measure of similarity is necessary and will be introduced in section 3 below.
When we return to Kaunisto's claim, the situation has changed with the second way of analysis: while tunnel is of course still shorter, it is now channel that contributes more (in percent) to the blend so his claim is not supported anymore. Note also that there are various other examples with similar results such as formamide (formite x amide), hesiflation (hesitation x inflation), preet (pretty x sweet) to name but a few.
In sum, the recent studies of Kelly (1998), Kemmer (2000), and Kaunisto (2000a, 2000b), although going beyond previous studies and yielding interesting results and/or hypotheses, also suffer from some drawbacks upon which I would like to improve. The following section outlines my methodology in some detail. The present paper is therefore concerned with quantitative aspects governing the structure of blends, namely (i) the question which source word contributes more to the blend (cf. section 2) and, related to that, (ii) the question of to what degree similarity of source words and blends is exploited in blend formation (cf. section 3).
2. Case study 1: contributions of source words
For each blend in my data, I determined the graphemic/phonemic contributions of each source word (henceforth SW) to the blend according to both analyses introduced above (cf. Figure 4 and Figure 5) as well as their graphemic and the phonemic lengths. (11) The resulting data set was then analyzed in two steps. First, I did a loglinear analysis with the variables and variable levels listed in (6).
(6) LENGTH: SW_1 = SW_2 (both source words are equally long); SW_1 > SW_2; SW_1 < SW_2
CONTRIBUTION: SW_1 = SW_2 (both source words contribute equally much); SW_1 > SW_2; SW_1 < SW_2
MEDIUM: spoken vs. written
ANALYSIS: analysis 1 vs. analysis 2
On the basis of Kaunisto's earlier work, we would expect a significant interaction between LENGTH and CONTRIBUTION such that high frequencies are expected for LENGTH: SW_1 > SW_2 x CONTRIBUTION: SW_1 < SW_2 as well as LENGTH: SW_1 < SW_2 x CONTRIBUTION: SW_1 > SW_2. Also, we would expect a main effect of CONTRIBUTION: SW_1 < SW_2 such that these cases should be more frequent than expected.
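As a minimal sketch of how the levels of LENGTH and CONTRIBUTION in (6) could be coded for a single blend, the following Python function compares the source-word lengths and the share of itself that each source word contributes. The function name, its arguments, and the comparison of contributions as proportions rather than raw counts are assumptions made for illustration; the graphemic split ch + unnel used for analysis 1 of chunnel is likewise an assumption, since Figure 4 is not reproduced here.

```python
def code_blend(len_sw1, contrib_sw1, len_sw2, contrib_sw2):
    """Return the (LENGTH, CONTRIBUTION) levels of (6) for one blend.

    len_*     : number of graphemes (or phonemes) in each source word
    contrib_* : number of those elements that end up in the blend
    Hypothetical helper; contributions are compared as proportions
    of the respective source word (an assumption).
    """
    def level(a, b):
        if a == b:
            return "SW_1 = SW_2"
        return "SW_1 > SW_2" if a > b else "SW_1 < SW_2"

    length = level(len_sw1, len_sw2)
    # Cross-multiply instead of dividing so that equal proportions
    # are detected exactly (no floating-point rounding).
    contribution = level(contrib_sw1 * len_sw2, contrib_sw2 * len_sw1)
    return length, contribution


# chunnel (channel x tunnel), graphemes
analysis_1 = code_blend(7, 2, 6, 5)  # assumed split: ch + unnel
analysis_2 = code_blend(7, 6, 6, 5)  # six letters from channel, five from tunnel
print(analysis_1)
print(analysis_2)
```

Under these assumptions, analysis 1 places chunnel in a cell that Kaunisto's hypothesis predicts to be frequent (the longer first source word contributes proportionally less), whereas analysis 2 does not.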
Second, the frequencies for which specific predictions were derived above were tested with a configural frequency analysis (CFA; cf. von Eye 1990) with Holm's correction for multiple post hoc (binomial) tests.
According to the loglinear analysis, all interactions of more than two variables failed to reach significance; the best model (in terms of parsimony and goodness-of-fit; [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]; df = 21; p = 0.733) involved the significant effects represented in Table 2.
We find strong general preferences such that (i) SW_2 tends to be longer, (12) and (ii) SW_2 contributes more of itself to the blend. However, the interpretation of these main effects must be qualified with a view to the two-way effects, for some of which Kaunisto's predictions are relevant.
The results for CONTRIBUTION x LENGTH demonstrate that Kaunisto's hypothesis is indeed strongly supported: the two combinations with the highest absolute parameter estimates show that, when SW_1 is longer, then SW_2 contributes more, and when SW_2 is longer, then SW_1 contributes more. What is more, we even find a strongly negative parameter estimate for cases where SW_2 is longer and contributes more to the blend, which is also in accordance with the prediction. All these results are even strongly supported by those of the CFA for these cell frequencies: all sixteen possible combinations of (LENGTH and CONTRIBUTION) x (MEDIUM and ANALYSIS) for which Kaunisto's predictions hold are among the strongest significant types and antitypes (as ranked by the Q coefficient of pronouncedness).
In addition to the predicted effects, we also find that when both source words are equally long, they strongly tend to contribute to the blend equally. While this result was not anticipated, it is, I believe, not difficult to explain a posteriori: we have seen above that blends play with word similarity. That is, in cases where both source words are equally long such as snark (snake x shark) or meld (melt x weld), the fact that the blend is as long as each source word and that each source word contributes an equal number of graphemes (around some shared amount of overlap) further increases the similarity and, thus, the playful character blends tend to exhibit.
Finally, we also find that CONTRIBUTION interacts separately with MEDIUM and with ANALYSIS: (i) in the spoken medium, both source words contribute to the blend equally slightly more often than expected; and (ii) with analysis 1, SW_2 contributes more to the blend slightly more often. However, these effects do not play a major role for several reasons. First, if MEDIUM and ANALYSIS are not included in the model in the first place, the model's expected frequencies already do not deviate significantly from the observed ones ([MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]; df = 27; p = 0.119). Second, the parameter estimates for these interactions are very small compared to those of all other significant effects. Finally, these effects are the only ones not supported by the CFA. For CONTRIBUTION x MEDIUM, two thirds of the configurations have p-values larger than 0.05 and all exhibit very small Q-values; for CONTRIBUTION x ANALYSIS, all of the significant types and antitypes can already be explained on the basis of the interaction between LENGTH and CONTRIBUTION. Given these results and the fact that they do not bear upon the present hypothesis, I will not discuss these interactions here.
In sum, we find solid evidence for an account of blend structure in terms of information quantity and recognizability of source words; it is interesting to note in passing that, contrary to what was considered plausible above at first, this principle is strong enough not to be influenced by the medium in which the analysis is conducted and the way of analysis. This is interesting since it underlines the strong graphemic influence on blend formation. Hence, although most researchers have pointed out that blends exhibit a multitude of difficult-to-motivate structures, the data strongly support the idea that cognitive and information-processing capacities of human beings impose boundaries on which blends are likely to occur. The following section will be concerned with the degree to which the similarity between blends and source words further contributes to blend formation.
3. Case study 2: similarity
As a first step, we need an objective quantitative measure of the similarity between source words and blends that is powerful enough to not only include the breakpoint. (13) Second, if the similarity of the source words to the blend can be quantified as required, it is still not immediately obvious what amount of similarity we would assume on the basis of a random distribution. Let us tackle these problems one by one.
First, what would a more appropriate quantification of the similarity of the source words to their corresponding blend look like? Intuitively and ideally, it would have to be a similarity index (hence SI_G and SI_p for graphemes and phonemes respectively) that (i) is fairly high for cases like chunnel, (ii) is fairly low for cases like brunch, and (iii) is fixed to a set interval of possible values so that SI values of particular blends can be compared easily. By analogy to Tversky's (1977) contrast model, where similarity increases/decreases with increasing/decreasing numbers of shared features, I suggest assessing the proportion of graphemes (or phonemes and articulatory features) each word contributes to the blend according to analysis 2 TOGETHER WITH the proportion these graphemes/phonemes make up in the blend. In the present example, chunnel consists of seven graphemes, six of which are contributed by the seven-letter word channel and five of which are contributed by the six-letter word tunnel. That is to say, 85.7% (6 letters out of 7) of channel make up 85.7% (6 letters out of 7) of chunnel while 83.3% (5 letters out of 6) of tunnel make up 71.4% (5 letters out of 7) of chunnel (the sum is higher than 100% because of the overlapping graphemes), resulting in (7).
(7) \[ SI_{G(\text{chunnel})} = \frac{(6/7 \times 6/7) + (5/6 \times 5/7)}{2} = \frac{(0.857 \times 0.857) + (0.833 \times 0.714)}{2} = \frac{0.735 + 0.595}{2} \approx 0.665 \]
SI can take on values between 0 and 1, (14) and the two summands in the numerator of the last fraction indicate the similarity of each source word to the blend: according to these figures, channel is graphemically more similar to chunnel than tunnel, which makes sense since, if both source words contribute all but one letter to the blend, then the longer source word is in fact more similar to the blend.
While chunnel is, as postulated, a case with a fairly high value of SI, let us look at a case where we intuitively feel that the source words are less well integrated in the blend, for example, brunch. As is easy to verify, brunch consists of six graphemes, two of which are contributed by the nine-letter word breakfast and four of which are contributed by the five-letter word lunch, resulting in the equation in (8), where, as anticipated, SI_G(brunch) is much smaller than SI_G(chunnel), representing quantitatively what we feel intuitively.
(8) \[ SI_{G(\text{brunch})} = \frac{(2/9 \times 2/6) + (4/5 \times 4/6)}{2} = \frac{(0.222 \times 0.333) + (0.8 \times 0.667)}{2} = \frac{0.074 + 0.533}{2} \approx 0.304 \]
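The computation in (7) and (8) can be sketched as a small Python function. The function and parameter names are hypothetical and chosen for illustration only; the inputs are the element counts (graphemes or phonemes) that the text reports for each blend.

```python
def similarity_index(len_sw1, contrib_sw1, len_sw2, contrib_sw2, len_blend):
    """SI as in (7) and (8).

    For each source word, multiply the share of the source word that is
    found in the blend by the share of the blend this material makes up,
    then average the two products.
    """
    part_sw1 = (contrib_sw1 / len_sw1) * (contrib_sw1 / len_blend)
    part_sw2 = (contrib_sw2 / len_sw2) * (contrib_sw2 / len_blend)
    return (part_sw1 + part_sw2) / 2


# chunnel: 6 of 7 letters of channel, 5 of 6 letters of tunnel, blend length 7
print(round(similarity_index(7, 6, 6, 5, 7), 3))  # 0.665
# brunch: 2 of 9 letters of breakfast, 4 of 5 letters of lunch, blend length 6
print(round(similarity_index(9, 2, 5, 4, 6), 3))  # 0.304
```

Because each product lies between 0 and 1, their mean does as well, which is what keeps SI values of different blends comparable.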
The second issue to be resolved is what the SI values of random words look like. Obviously, it would be absurd to assume 0 similarity for random words since most words are similar to one another to some degree, which is why a different way of estimating results of random blends is necessary. In order to guarantee that the random blends are representative of real blends, two criteria had to be taken into consideration: first, I determined the distributions of source-word lengths for real blends and sorted them into three groups (since it is unlikely that blend producers monitor exact graphemic/phonemic length relations of source words):
--SW_1 is maximally 1 element (grapheme/phoneme) longer or shorter than SW_2 (≈ 34% in my corpus);
--SW_1 is a little longer or a little shorter (2 or 3 elements) than SW_2 (≈ 36%);
--SW_1 is much longer or much shorter (4 or more elements) than SW_2 (≈ 30%).
Six pairs of random words were randomly chosen such that each of these three groups is represented by two pairs of source words; the words of each pair belonged to the same word class. Then I coined all possible linear blends out of each pair of words. Let me explain this procedure by looking at one example, namely the graphemic blends of the words strong and powerful. I started with strong as SW_1 and attached successively smaller parts of powerful to it, resulting in strongowerful, strongwerful, strongerful, etc., up to strongl. Then, strong was shortened by one letter to stron, to which powerful and again successively shorter parts of powerful were added. (15) This process stopped after the shortest possible blend of strong and powerful, namely sul. Doing this for all six word pairs resulted in 228 graphemic and 146 phonemic blends, the length frequencies of which were approximately normally distributed (just like those of the authentic blends), so any effects to be found cannot be attributed to differences in length distributions. Then, the average SI_G and SI_p values were computed for both the authentic blends and the simulated blends; a t-test (Welch) was performed to determine whether the difference between blends and the random word pairs is significant or not. Consider Table 3.
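The enumeration of linear blends described above can be approximated with the following Python sketch. It is an illustration under stated assumptions, not the procedure actually used: the function name is hypothetical, the filter for English spelling conventions mentioned in note 15 is not implemented, and the sketch runs down to the two-letter string sl rather than stopping at sul, so whatever constraint determines that end point is not reproduced here.

```python
def linear_blends(sw1, sw2):
    """Join every initial part of sw1 with every final part of sw2.

    The combination of both complete words is skipped because at least
    one source word has to be shortened. Different cut points can yield
    the same string; such duplicates are kept in the returned list.
    """
    blends = []
    # Start with the full first word, then shorten it letter by letter.
    for keep in range(len(sw1), 0, -1):
        # Attach successively shorter final parts of the second word.
        for cut in range(len(sw2)):
            if keep == len(sw1) and cut == 0:
                continue  # nothing shortened: not a blend
            blends.append(sw1[:keep] + sw2[cut:])
    return blends


blends = linear_blends("strong", "powerful")
print(blends[:3])   # ['strongowerful', 'strongwerful', 'strongerful']
print(len(blends))  # 47 candidate strings before any filtering
```

Candidates such as strongrful would still have to be discarded in a second step, for instance by checking letter sequences against a corpus.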
Obviously, both SI values of the authentic blends are significantly higher than the corresponding SI values for the simulated blends (for SI_G: t_Welch = 11.7; df = 299; p < 0.001; for SI_p: t_Welch = 10.9; df = 160; p < 0.001). In other words, we have very strong empirical evidence for the hypothesis that the similarity between the source words and the blends indeed plays a decisive role for blend formation. Also, it is obvious that the notion of similarity as used by Kelly (1998) is indeed too narrowly defined: the similarity of source words to their blends can be defined more broadly, thereby providing both a broader basis for the analysis and a cognitively more realistic approach (since more graphemic or phonemic material outside of the immediate breakpoint area can be included).
4. Discussion and conclusion
4.1. Validation of the results: recognizability and similarity again
The preceding sections have exclusively dealt with non-speech-error blends. However, to validate the results, we should also investigate how the findings of sections 2 and 3 relate to authentic speech-error blends.
First, while the quantitative mechanisms underlying blend structure described in section 2 probably do not involve completely conscious word-formation processes, they nevertheless provide insights into information-processing capabilities of human beings that need to be considered if a blend is to be viable. Obviously, this should not hold for speech-error blends and the simulated blends whose generation was described above in section 3. Since these blends are not coined intentionally, there is no attempt to render both source words recognizable. Accordingly, the above results for intentional blends should differ from those of speech-error blends and simulated blends. To test this prediction, I assembled a small corpus of ninety speech-error blends from some previous studies; for the class of simulated blends, the set of phonemic blends described above was used. Then, Chi-square tests were used to determine whether the cells for which Kaunisto made his predictions do in fact not yield significant results. (16) Consider Table 4 for the results on error blends.
While the Chi-square value for Table 4 is highly significant (χ² = 37.3; df = 4; p < 0.001), the more precise investigation of the individual contributions to Chi-square by way of a CFA (cf. above) shows that this cannot be attributed to the four lower right cells for which Kaunisto's predictions applied. By contrast, the highest contribution to Chi-square is observed in the upper left cell (CONTRIBUTION: SW_1 = SW_2 x LENGTH: SW_1 = SW_2), supporting the finding mentioned above in note 12. The results are even more extreme for the simulated blends: for letters, not even the Chi-square test for the whole table reaches significance (χ² = 5.49; df = 2; p = 0.064); for phonemes, the Chi-square test for the whole table is significant (χ² = 9.83; df = 4; p = 0.043), but the cell with the highest contribution to Chi-square is again (CONTRIBUTION: SW_1 = SW_2 x LENGTH: SW_1 = SW_2) and does not reach significance. We can thus safely assume that speech-error blends and simulated blends do differ significantly from intentional blends in a way that is predictable on the basis of Kaunisto's hypothesis. It also follows that the results obtained above are not artefactual in the sense that they are to be expected for all juxtapositions of words.
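The Chi-square value reported for Table 4 and the location of the largest cell contribution can be recomputed from the observed frequencies with a few lines of Python. This is only a sketch of the Pearson statistic and its per-cell contributions; it does not implement the configural frequency analysis with binomial tests and Holm's correction, and the function name is hypothetical.

```python
# Observed frequencies from Table 4 (speech-error blends, phonemes, analysis 2).
# Rows:    LENGTH        (SW_1 = SW_2, SW_1 > SW_2, SW_1 < SW_2)
# Columns: CONTRIBUTION  (SW_1 = SW_2, SW_1 > SW_2, SW_1 < SW_2)
observed = [
    [15, 3, 12],
    [1, 9, 25],
    [1, 14, 10],
]


def chi_square_contributions(table):
    """Return each cell's (observed - expected)^2 / expected."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    contributions = []
    for i, row in enumerate(table):
        contributions.append([])
        for j, obs in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            contributions[i].append((obs - expected) ** 2 / expected)
    return contributions


contributions = chi_square_contributions(observed)
chi_square = sum(sum(row) for row in contributions)
df = (len(observed) - 1) * (len(observed[0]) - 1)
# Position (row, column) of the cell that contributes most to Chi-square.
largest = max(
    ((i, j) for i in range(len(observed)) for j in range(len(observed[0]))),
    key=lambda cell: contributions[cell[0]][cell[1]],
)
print(round(chi_square, 1), df, largest)  # 37.3 4 (0, 0)
```

The upper left cell, where both source words are equally long and contribute equally, accounts for roughly 15.4 of the total of 37.3.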
Let us now turn to the second question, namely the role similarity plays for speech-error blends as opposed to intentional and simulated blends. Since previous studies have shown that speech-error blends involve segmental similarity (cf. MacKay [1987: 34] and the references mentioned there as well as Kemmer  and Laubstein [1999: 137]), the computation of the above similarity index should result in a value higher than for those of the simulated blends. Thus, the average SI_p values of speech-error blends, authentic intentional blends, and simulated blends were compared with an ANOVA. Consider Figure 6 for the results.
As is obvious, the proposed similarity index yields significantly different results for the three types of blends under investigation (F(2, 818) = 65.45; p < 0.001): error blends and intentional blends exhibit high degrees of similarity and differ significantly (according to a Scheffé test) from the randomly coined simulated blends. The fact that such a large difference can be observed provides further evidence that SI does indeed capture the similarity between source words and blends well and, thus, lends further credibility to the above analysis of intentional blends.
4.2. Summary: findings and methodology
Let me briefly summarize what I consider to be the most important issues of this study. After having pointed out a variety of conceptual and methodological problems of previous approaches to blend structure, we have seen that, while blends exhibit many structural characteristics, their structure is governed by a desire to guarantee the recognizability of both source words. From this, several precise predictions describing the results in the form of contingency tables were derived and empirically tested. The influence of recognizability and information processing was shown to be highly significant both in graphemic and in phonemic form and for source words up to the breakpoint as well as across whole source words: virtually every single prediction was borne out by the data. Moreover, it was shown that, as was to be expected, this property is not shared by speech-error blends and random blends (that is, arbitrary nonauthentic juxtapositions of words).
Secondly, I introduced a similarity index SI that precisely quantifies the similarity between two source words and their blends. It was shown that, in most cases, SI corresponds naturally to one's own intuitions. Also, I demonstrated to what degree similarity plays a role in blend formation: on average, both intentional blends and speech-error blends exhibit a much higher degree of similarity to their source words than blends created randomly (to represent results following from the null hypothesis).
On a methodological level, I hope to have shown that even something as diverse as blend structures can be fruitfully investigated (i) from a hypothesis-testing perspective (cf. already on this Kelly 1998: 588) and (ii) on the basis of quantitative data and methods. This is especially true of concepts such as similarity that are otherwise difficult to handle objectively and reliably. I hope, therefore, to have provided some possibilities/techniques that can be used to shed light on further aspects of blends and their structure, some of which I would like to briefly mention in the final section of this paper.
The analyses proposed so far can be (and, in fact, already are in a larger project of mine) further exploited along three major lines. First, it was shown above that structural characteristics of blends can also be investigated fruitfully with reference to articulatory features. Thus, it could be interesting to see whether both of the interrelated questions (information contribution and similarity) could be further pursued by not counting graphemes or phonemes, but articulatory features instead. While this would be an extremely time-consuming task, the results would probably be even more precise, given that examples such as (5) above could be accounted for better. The preliminary results I have are promising, but as yet the data base is still too small to yield results with significance levels comparable to those reported in the present paper.
Second, an interesting further way of supporting the analysis would be to conduct blend-production experiments where native speakers are asked to blend two words into one. Analyses of the above sort can then be performed on the data thus obtained in order to further test the viability of the approaches advocated here. For instance, consider how to blend the source words (in this order) Chevrolet and Cadillac. Several possibilities (obviously of different likelihood) come to mind, some of which are Chevrolac, Chevrillac, Chevillac, Chedillac, Chadillac, etc. The question arises whether blend formation can be predicted on the basis of the above approach (and, of course, Kelly's and others' findings, some of which are mentioned below). Given the findings in sections 2 and 3, we would expect competition between two conflicting functions, namely
--the tendency to have as much material as possible in the blend (yielding Chevrolecadillac or, in a different order, Cadillachevrolet), increasing the recognizability (for which an overall measure would then be necessary, e.g. the sum of percent the source words contribute to the blend), and
--the tendency to form a blend that is most similar to both source words (where Chevrolecadillac does not fare too well: SI_G = 0.472).
Similarly, consider the blend alluded to in the title of this paper, namely brunch. As we saw above, brunch consists of 22.2% of breakfast and 80% of lunch and has rather moderate SI values (SI_G = 0.304; SI_p = 0.275). Along the lines of this paper at least, however, breakfunch would be a "better" or a more "typical" blend:
--it consists of 66.6% of breakfast and 80% of lunch (resulting in a higher recognizability of SW_1 and SW_2);
--it has higher SI values (SI_G = 0.36; SI_p = 0.336);
--it preserves the beginning of SW_1 up to its uniqueness point.
Consider, therefore, two mutually nonexclusive explanations for why brunch has been coined. First, brunch might possess a property that outweighs the competing blend breakfunch, for example the tendency of blends to have as many syllables as SW_2, an observation made by Kubozono (1990: 12, 15f.). In his data, this tendency is true for 111 out of 142 intentional blends (i.e. 78.2%). While the results are not equally high in my corpus of intentional blends (only 55.7% of the blends are as long as SW_2), this tendency might have been strong enough to override the above preferences speaking in favor of breakfunch.
An alternative account is that brunch's pattern is due to findings described in Berg (1989): coining brunch splits up breakfast between its consonantal onset [br] and its superrime [ekfest]. (17) Again, then, it might be possible that this is responsible for the otherwise less than ideal blend. Ultimately, many such observations would have to be integrated into a single account of blend formation.
Finally, it could prove interesting to further contrast properties of intentional blends to those of speech-error blends, going beyond the as yet superficial characterization in section 4.1 (cf. also Gries [forthcoming]). For example, why is SW_1 shorter than SW_2 in intentional blends, but longer in speech-error blends? To what degree is the fusion of intentional blends constrained by the uniqueness point of the two source words? These are some of the questions that are currently addressed in my ongoing work.
I believe that we have not yet exploited all the information blends can provide about the linguistic system. Given the multitude of variables bearing on blends and the fact that blends constitute an intersection of conscious and unconscious processes as well as spoken and written language, their analysis should throw light on many (psycho)linguistic processes.
Table 1. Classification and exemplification of blends (phonemic overlap is italicized)

| | Both source words are shortened | Only the first source word is shortened | Only the second source word is shortened | No source word is shortened |
|---|---|---|---|---|
| +Overlap, +Linear blending | kretIkjele (critical x particular) | fjutIletaerien (futile x utilitarian) | bouldeI[??]es (bold x audacious) | paelemenI (pal x alimony) |
| +Overlap, -Linear blending | ka:nIbeles (carnivorous x nibble) | aembIsekstres (ambidextrous x sex) | slaekedem (slacker x academy) | -- (5) |
| -Overlap, +Linear blending | br[??]nt[??] (breakfast x lunch) | kraenaepel (cranberry x apple) | sm[??][??]ekeIt (smother x suffocate) | -- (compounds) |
| -Overlap, -Linear blending | aed3Itpr[??]p (agitation x propaganda) | -- | smoukeloutIv (smoke x locomotive) | -- |

Table 2. Significant effects identified in the hierarchical loglinear analysis

| Effect | df | partial ass. χ² | p |
|---|---|---|---|
| LENGTH | 2 | 572.26 | < 0.001 |
| CONTRIBUTION | 2 | 436.18 | < 0.001 |
| CONTRIBUTION x LENGTH | 4 | 621.41 | < 0.001 |
| CONTRIBUTION x MEDIUM | 2 | 11.51 | 0.0032 |
| CONTRIBUTION x ANALYSIS | 2 | 11.53 | 0.0031 |

| Effect | Levels and combinations of levels with highest \|λ\| | λ |
|---|---|---|
| LENGTH | SW_1 = SW_2 | -0.579 |
| | SW_1 < SW_2 | 0.624 |
| CONTRIBUTION | SW_1 = SW_2 | -0.532 |
| | SW_1 < SW_2 | 0.506 |
| CONTRIBUTION x LENGTH | CONTRIBUTION: SW_1 > SW_2 x LENGTH: SW_1 < SW_2 | 0.772 |
| | CONTRIBUTION: SW_1 < SW_2 x LENGTH: SW_1 > SW_2 | 0.737 |
| | CONTRIBUTION: SW_1 < SW_2 x LENGTH: SW_1 < SW_2 | -0.561 |
| | CONTRIBUTION: SW_1 = SW_2 x LENGTH: SW_1 = SW_2 | 0.556 |
| CONTRIBUTION x MEDIUM | CONTRIBUTION: SW_1 = SW_2 x MEDIUM: written | -0.112 |
| | CONTRIBUTION: SW_1 = SW_2 x MEDIUM: spoken | 0.112 |
| CONTRIBUTION x ANALYSIS | CONTRIBUTION: SW_1 < SW_2 x ANALYSIS: 1 | 0.114 |
| | CONTRIBUTION: SW_1 < SW_2 x ANALYSIS: 2 | -0.114 |

Table 3. Comparison of SI_G and SI_p for authentic and simulated blends

| Authentic blends | SI_G | Example (source words) |
|---|---|---|
| Mean (sd) | 0.48 (0.14) | dramedy (drama x tragedy) |
| Maximum | 0.85 | skittenish (skittish x kittenish) |
| Minimum | 0.12 | comint (communications x intelligence) |

| Authentic blends | SI_p | Example (source words) |
|---|---|---|
| Mean (sd) | 0.49 (0.14) | fantabulous (fantastic x fabulous) |
| Maximum | 0.92 | racketeer (racket x racqueteer) |
| Minimum | 0.11 | amping (amphetamine x smoking) |

| Simulated blends | SI_G | SI_p |
|---|---|---|
| Mean (sd) | 0.37 (0.11) | 0.35 (0.11) |
| Maximum | 0.61 | 0.57 |
| Minimum | 0.07 | 0.05 |

Table 4. Phonemic LENGTH x CONTRIBUTION (analysis 2) for speech-error blends

| LENGTH | CONTRIBUTION: SW_1 = SW_2 | CONTRIBUTION: SW_1 > SW_2 | CONTRIBUTION: SW_1 < SW_2 | Row totals |
|---|---|---|---|---|
| SW_1 = SW_2 | 15 | 3 | 12 | 30 |
| SW_1 > SW_2 | 1 | 9 | 25 | 35 |
| SW_1 < SW_2 | 1 | 14 | 10 | 25 |
| Column totals | 17 | 26 | 47 | 90 |
* I thank the following people (in alphabetical order) for their feedback during the various stages leading to this work: Thomas Berg, Anatol Stefanowitsch, and Stefanie Wulff. Also, I thank members of the audiences at CLS 38 and CSDL 6 for some critical comments. Finally, let me express my gratitude to an anonymous reviewer for a number of extremely helpful comments, which resulted in a variety of both methodologically and conceptually highly relevant improvements. Of course, I alone am responsible for any remaining inadequacies. Correspondence address: Institut for Fagsprog, Kommunikation og Informationsvidenskab, Syddansk Universitet, Grundtvigs Allé 150, 6400 Sønderborg, Denmark. E-mail: firstname.lastname@example.org.
(1.) Note that Algeo's (1977) definition is slightly unclear and only saved by the examples given later. For instance, it is probably a matter of taste whether filmania (film x mania) does in fact involve a shortening of either or both of the two forms since one might as well argue that both source words are present entirely. Nevertheless, in his discussion of anecdotage, Algeo claims that the overlapping sounds [dout] correspond to a shortening of both source words, thereby saving his definition.
(2.) Unfortunately, not all of Cannon's observations are supported: for instance, he claims that a "blend which has more syllables than does its longer source word preserves at least one of its source words" (1986: 747), which is simply incorrect, given the following examples: fantabulous (fantastic x fabulous), happenident (happening x accident) and linar (line x star), two of which he himself discusses.
(3.) Most scholars have commented on both phonological and orthographic characteristics of blends, but do not go beyond simply issuing caveats: Algeo (1977: 51) states "[b]lends may be either phonological or orthographic. No effort to distinguish between the two modes is made here, although such a distinction must be drawn in a thorough taxonomy" and bases his discussion (of, e.g., overlapping) mainly on phonological criteria. Cannon (1986: 726) simply remarks that "[w]e will check the viability of some oral vs. written data, though an obvious caveat is that findings from written data do not necessarily apply to oral ones, in view of differences between speech and writing" and, again, focuses on phonological criteria for the most part. Finally, Kaunisto (2000a, 2000b) is only concerned with orthographic characteristics.
(4.) It might of course be the case that the examples chortle and slithy are only used in the text for expository reasons, but were not counted as data included in the analysis proposed later, but then Kemmer (2003) would have to explain on what a priori basis these cases were excluded from consideration.
(5.) This means no such blends were found in my corpus.
(6.) My own corpus supports Kelly's (1998) finding: the first and second source words are, on average, 6.6 letters/5.7 phonemes and 7.4 letters/6.3 phonemes long respectively; these differences are highly significant (t_Welch = -5.6; df = 584; p < .001 and t_Welch = -4.8; df = 584; p < .001).
(7.) In cases where the blend contains graphemic overlap, these graphemes are counted once for each source word.
(8.) In this case, this is particularly obvious due to the phonological similarity of the two source words' codas, which differ only with respect to voicing ([tʃ] vs. [dʒ]).
(9.) Strictly speaking, there is yet a third possibility, where the <s> in fantastic is matched with the <s> in fabulous, but I leave out this option because while it seems plausible to assume that speakers/listeners notice the word-initial similarity of both source words, I think it is less likely that the common <s> plays a crucial role in the pattern-matching process underlying blends. While I admit to have no empirical evidence for this assumption, it is nevertheless supported by the fact that the two <s> graphemes and [s] phonemes also play different roles in their respective source words. The <s> in fantastic can be perceived as ambisyllabic and part of a consonant cluster whereas the <s> in fabulous/fantabulous is in syllable- and even word-final position and does not stand together with other consonants. Also, I am not concerned with which <a> of fantastic the <a> in fabulous is matched since this decision does not affect my overall point.
(10.) The question of whether there is a syllabic [l] or a schwa between [n] and [l] does not affect my main argument.
(11.) The pronunciation of words has been determined on the basis of the Cobuild on Compact Disc Dictionary V1.2 (1995).
(12.) In this respect, intentional blends differ from speech-error blends. MacKay found that in German speech-error blends SW_1 is significantly more often longer than SW_2 (MacKay 1973: 790f.) whereas Gries (forthcoming) found that the mean lengths of source words of English speech-error blends do not differ significantly from each other irrespective of whether lengths are determined on the basis of letters, phonemes, or syllables.
(13.) Let me emphasize that this is the question of how similar source words are to the blend--it is not the question of whether source words are similar to each other (which is why I did not use established measures of orthographic similarity such as the Dice coefficient or any of its extensions); cf. Gries (forthcoming) for empirical results concerning this issue.
(14.) Note that SI's theoretically possible values of 0 and 1 will rarely be obtained on the basis of actual data. SI = 0 would mean that both source words contribute nothing to the blend (i.e. we don't have a blend at all) whereas SI = 1 entails both words overlap completely in the blend. But SI still serves its function well; as an example for a relatively low value, consider the following hypothetical case: two ten-letter source words contribute their first and their last letter respectively to a three-letter blend, that is, one in which there is also one letter as filler material (e.g. intruding letters as in donkophant [donkey x elephant]). In such a case, SI = 0.033, i.e. practically 0. A similar case can be made for the maximal value of 1, although for such an example we need to look at the phonemic makeup of the blend as well: consider a case where two words are spelt differently and mean two different things, but are pronounced identically; for example, the hypothetical case of <racket> and <racquit> both pronounced [rækit]. In this case, we could have a blend with an SI_p of 1, since both words contribute all of their phonemes to the blend, which would be recognizable on the basis of the spelling only, namely, for example, <rackit>. Since means alone may distort the overall picture, I will also provide standard deviations and the most extreme SI values obtained in my analysis.
(15.) During this process, simulated blends violating English spelling conventions were discarded (e.g. strongrful, where the sequence ngrf did not occur once in the British National Corpus 1.0 other than in the apparently faulty file FYY).
(16.) Given the low frequencies and the fact that I only wanted to test an interaction, no loglinear analysis was performed.
(17.) I am grateful to Thomas Berg for pointing this out to me.
References
Adams, Valerie (1973). An Introduction to Modern English Word-Formation. London: Longman.
Akmajian, Adrian; Demers, Richard A.; Farmer, Ann K.; and Harnish, Robert M. (1995). Linguistics: An Introduction to Language and Communication, 4th ed. Cambridge, MA: MIT Press.
Algeo, John (1977). Blends, a structural and systemic view. American Speech 52, 47-64.
Bauer, Laurie (1983). English Word-Formation. Cambridge: Cambridge University Press.
Berg, Thomas (1989). On the internal structure of polysyllabic monomorphemic words: the case for superrimes. Studia Linguistica 43, 5-32.
Bergström, Gustav Adolf (1906). On Blendings of Synonymous or Cognate Expressions in English. Lund: Håkan Ohlsson.
Bryant, Margaret M. (1974). Blends are increasing. American Speech 49, 163-184.
Cannon, Garland (1986). Blends in English word formation. Linguistics 24, 725-753.
Cutler, Anne; Hawkins, John A.; and Gilligan, Gary (1985). The suffixing preference: A processing explanation. Linguistics 23, 723-758.
von Eye, Alexander (1990). Introduction to Configural Frequency Analysis: The Search for Types and Antitypes in Cross-Classifications. Cambridge: Cambridge University Press.
Gries, Stefan Th. (forthcoming). Some characteristics of English morphological blends. In Papers from the 38th meeting of the Chicago Linguistic Society: The Panels, Mary Andronis et al. (eds.). Chicago: Chicago Linguistic Society.
Irwin, Betty J. (1979). Trends in blends. American Speech 54, 284.
Kaunisto, Mark (2000a). Relations and proportions in the formation of blend words. Conference handbook, Fourth Conference of the International Quantitative Linguistics Association (Qualico), Prague, August 24-26.
--(2000b). Relations and proportions in the formation of blend words. Presentation handout, Fourth Conference of the International Quantitative Linguistics Association (Qualico), Prague, August 24-26.
Kelly, Michael H. (1998). To 'brunch' or to 'brench': some aspects of blend structure. Linguistics 36, 579-590.
Kemmer, Suzanne (2003). Schemas and lexical blends. In Motivation in Language: From Case Grammar to Cognitive Linguistics. A Festschrift for Günter Radden, Thomas Berg et al. (eds.), 69-97. Amsterdam and Philadelphia: Benjamins.
Kubozono, Haruo (1990). Phonological constraints on blending in English as a case for phonology-morphology interface. Yearbook of Morphology 3, 1-20.
Laubstein, Ann Stuart (1999). Lemmas and lexemes: the evidence from blends. Brain and Language 68, 135-143.
MacKay, Donald G. (1973). Complexity in output systems: evidence from behavioral hybrids. American Journal of Psychology 86, 785-806.
--(1987). The Organization of Perception and Action: A Theory for Language and Other Cognitive Skills. Heidelberg: Springer.
Murray, Thomas (1995). The Structure of English: Phonetics, Phonology, Morphology. Boston: Allyn and Bacon.
Nooteboom, Sieb G. (1981). Lexical retrieval from fragments of spoken words: Beginnings vs. endings. Journal of Phonetics 9, 407-424.
Pound, Louise (1914). Blends: Their Relation to English Word Formation. Heidelberg: Winter.
Štekauer, Pavol (1991). On some issues of blending in English word-formation. Linguistica Pragensia 1, 26-35.
Tversky, Amos (1977). Features of similarity. Psychological Review 84, 327-352.
University of Southern Denmark at Sønderborg
Received 10 July 2001
Revised version received 2 December 2002
Author: Gries, Stefan Th.
Publication: Linguistics: an interdisciplinary journal of the language sciences
Date: May 1, 2004