Four types of morpheme: evidence from aphasia, code switching, and second-language acquisition(1).
This paper presents empirical evidence supporting a new model of morpheme classification called the 4-M model. This model emphasizes the notion that lemmas underlying different types of morphemes become salient at different levels of production. This explains their different distributions. While the 4-M model classifies morphemes, it is primarily a model of how morphemes are accessed. The argument is that particular instantiations of morphemes are classified as a consequence of the mechanisms that activate them. The evidence considered comes from studies of code switching, Broca's aphasia, and second-language acquisition. One finding that the 4-M model captures is that not all functional elements pattern alike. Some are conceptually activated at the level of the mental lexicon along with their content-morpheme heads. Two other types of functional element are structurally assigned and do not become salient until later in the production process. These differences explain their different distributions in the data considered.
A number of well-known distributions of morpheme types in a range of linguistic phenomena are still wanting an explanation. This paper argues that classifying morphemes according to a set of abstract distinctions explains distributions of morpheme types in a principled way that generalizes across many linguistic phenomena. These distinctions point to a four-way classification of how morphemes are activated: the 4-M model. Under this model, the classification of morphemes is not basic; rather, the mechanisms for activating and combining morphemes are basic. The interaction between conceptual information and complex grammatical structure in any entry in the mental lexicon and the rules of grammar in the formulator are what give any reality to the notion of morpheme. This paper uses the 4-M model to explain certain data distributions in three phenomena: code switching, aphasia, and second-language acquisition.
The 4-M model has a number of advantages.(2) In addition to being linked to a model of the mental lexicon, it offers indirect evidence for how language production works and how competence and performance are linked. Specifically, it offers an explanation for how content morphemes differ in their access from functional elements, and how functional elements can differ from each other in systematic ways. Thus it adds a needed dimension to discussions of language production. For example, while Levelt et al. (1999) present one of the most extensive theories of language production to date, they do not claim completeness and have more to say about content words than other types of morpheme. They write, "We have ... a much better understanding of access to open class words than of access to closed class words" (1999: 3).
We claim that the abstract distinctions on which the 4-M model is based reflect a universal way that the mental lexicon is organized. In our view, the entries in the mental lexicon (lemmas) are neither words nor morphemes but sets of directions for ultimately realizing simple and complex words. The notion of morpheme is a way of recognizing that words have internal structure and that, because they undergo grammatical processes, their parts may occur in different combinations and yet be recognizable as mappings to conceptual structure.
The primary distinction between morpheme types resulting from the 4-M model is the distinction between content and system morpheme. We follow Myers-Scotton (1993, 1997) in using the term system morpheme from Bolinger (1968); he applied it to both inflectional morphemes and function words. Not surprisingly, "4-M" means that in the model, there are four types of morpheme. Content morphemes are one type and there are three types of system morpheme. The characteristic properties of content morphemes should be largely self-evident; they convey the core semantic/pragmatic content of language. The three types of system morpheme carry the relational aspects of language and are discussed in detail in section 4, where they are formally defined.
This paper is organized as follows. First, we outline our views of relevant aspects of language production. Second, we present the basic division of content vs. system morpheme. Third, we compare this distinction with other ways of classifying morphemes. Fourth, we show how the 4-M model refines the distinction. Fifth, we show how the 4-M classification explains certain distributions in three types of data:
(1) embedded-language system morphemes in classic code switching,
(2) elements absent or inaccurately produced by Broca's aphasics, and
(3) contrasts in morpheme accuracy in interlanguage in second-language acquisition.
2. How language production works
Language production begins with conceptual structure; this level is prelinguistic even though concepts are what language conveys. Speakers draw on the conceptual level to express their intentions. Intentions activate language-specific semantic/pragmatic feature bundles. These bundles select lemmas, language-specific entries in the mental lexicon. Lemmas are what mediate between the intentions at the conceptual level and the production of grammatical structures, including surface structures (cf. Levelt 1989; de Bot 1992; de Bot and Schreuder 1993; Poulisse 1997; Green 1998; Levelt et al. 1999). As Bock and Levelt point out, "between lexical concepts and lemmas there are systematic relations" (1994: 953). Levelt (1993: 4) makes a link between lemmas and phrase structures: "the generation of surface structure is lexically-driven. Each lemma requires a particular syntactic environment." Lemmas send directions to a "formulator" that turns on the actual morphosyntactic procedures that will result in surface-level utterances.
The lemmas that are directly elected at the conceptual level underlie content morphemes. Yet, content morphemes are not sufficient to realize speakers' intentions and indicate the relationships between abstract lexical structures and language-specific surface forms. System morphemes accomplish this. Some lemmas underlying system morphemes become relevant to the production process at the lemma level and some at the formulator. The formulator (at the functional level) receives directions from lemmas on how to assemble larger constituents; that is, smaller units are combined to create larger, hierarchically structured constituents. These are then realized at the surface level. Figure 1 represents our views regarding how lemmas are activated in the production process. The details of the 4-M model and its relation to the production process are developed in sections 3 and 4.
[Figure 1 ILLUSTRATION OMITTED]
The way the 4-M model classifies morphemes emphasizes the notion that lemmas underlying morphemes become salient at different levels. This explains their different distributions. Garrett, of course, has claimed for some time that not all forms are accessed at the same level (e.g. Garrett 1975). For example, he writes that "lexical formatives" are accessed at a higher level and that "non-lexical formatives are incorporated as features of planning frames" (1993: 80). Garrett's distinction between lexical and nonlexical formatives reflects the traditional open-class vs. closed-class distinction. The 4-M model agrees that lexical elements are accessed at different levels but rejects the open-class and closed-class distinction as not precise enough. The 4-M model's finer distinctions motivate the hypothesis that different classes of lemmas are activated at three different levels of production: at the conceptualizer, at the lemma level, and at the formulator. These points are developed below.
While we claim that the 4-M model explains aspects of production, that is not to say that mechanisms of morpheme activation are the only factor in production. Local markedness may play a role in how morphemes are supported by entries in the mental lexicon (Tiersma 1982). More frequent plurals (such as bacteria) are favored over the less morphologically complex, but less frequent, singular (bacterium), and leveling may occur in favor of the more complex form. As Tiersma notes, "It is with such words that one may detect a type of competition between general markedness (which refers to categories [such as singular vs. plural]) and local markedness (which applies to semantically similar lexical items [such as small animals or consumable mass nouns])" (1982: 847). Frequency also plays a role; consider the difference between inflected nouns and verbs in the work of Sereno and Jongman (1997). Their results further show that regular plural nouns in English behave as units in word-recognition response time. Such results suggest to us that some inflections behave differently from others that have less content. The research in Baayen et al. (1997) focuses on regular morphologically complex words and also suggests that many plural nouns are stored as units, thus "avoid[ing] the time-costly resolution of the subcategorization conflict" associated with certain Dutch plurals (1997: 94). Their "parallel dual-route model" echoes the difference between different types of system morpheme, as developed in section 4. A second frequency effect depends on the number of morphologically interrelated words, or family size (Schreuder and Baayen 1997). These authors suggest that the effect of family frequency on lexical processing "can only be explained under the assumption that many complex words have their own semantic representations in the lexicon" (1997: 136). They further suggest that even "simplex words" must be viewed as complex because of the interrelatedness of morphological forms. 
Similarly, the 4-M model recognizes a connection between forms in the mental lexicon. Because the model claims that lemma entries are activated in more than one way, the complexity may be more than just relatedness of root or stem forms but may also reflect grammatical information not available until the level of the formulator.
3. The basic division: content vs. system morphemes
Research in code switching has motivated the most basic division of morphemes into content morphemes and system morphemes (cf. Myers-Scotton 1993, 1997). This distinction captures a fundamental property of language; that is, some grammatical forms encode specifically how arguments and predicates are related. This is the notion of thematic role.
3.1. Content morphemes
Content morphemes are the surface result of intentions that activate semantic-pragmatic feature bundles at the conceptual level. These bundles point to lemmas in the mental lexicon that underlie those surface morphemes carrying most of the informational content of a message.
The feature [+/- thematic role] is the basis for the content vs. system morpheme distinction. Content morphemes either assign or receive thematic roles. System morphemes do not. Verbs are prototypical content morphemes that assign thematic roles and nouns are prototypical receivers of thematic roles.(3) Adjectives are content morphemes, too, because they assign thematic roles, most obviously as predicate adjectives, but also in other constructions. Note that in (1), interested [in] assigns the role of theme (stimulus) to horticulture and the role of experiencer to Stella.(4) System morphemes are discussed below in detail.
(1) Stella is interested in horticulture.
Whether or not a particular morpheme assigns or receives a thematic role is language-specific because the mapping of thematic roles onto surface morphemes varies across languages. Thus, a content morpheme in one language may have a counterpart in another language that is a system morpheme, or may even have no counterpart in terms of matching morphemes. For example, an English preposition that assigns a thematic role (therefore, a content morpheme) corresponds to an applied suffix in a Bantu language, a system morpheme (cf. Bresnan 1994). While it can be argued that the verb in Bantu including the suffix assigns thematic roles, the suffix itself is a system morpheme because its occurrence depends on the presence of the verb, the content morpheme. Example (3) is the Swahili counterpart of (2); Bora receives the recipient thematic role from the verb complex -pikia `cook for' that includes the applied suffix -i. We consider cases similar to -pikia in section 4.2, where English phrasal verbs are discussed as indirectly elected early system morphemes.
(2) He cooked for Bora.
(3) A-li-m-pik-i-a Bora.
    3S-PST-OBJ-cook-APPL-FV Bora
    `He cooked for Bora.'
3.2. Defining system morphemes
In contrast to content morphemes, system morphemes neither assign nor receive thematic roles. Prototypical system morphemes are some function words and inflections. For example, in English, "3rd person singular" -s, and articles (a, an, the) are system morphemes. Obviously, articles do not receive a separate thematic role; they specify qualities about the NPs that they identify and that receive thematic roles. For example, some occurrences of the indicate the degree of definiteness or prior knowledge of the NP in the discourse. Similarly, degree adverbs specify the extent of a quality or frequency, as in very blue or too fast. Some morphemes indirectly receive thematic roles; however, they are still system morphemes. That is, within the thematic grid, only those morphemes that directly receive or assign thematic roles are content morphemes. Examples of those system morphemes that indirectly receive thematic roles include clitic pronouns that do not occur in argument position (where a full NP would). French clitics are an example of this type of system morpheme.(5)
While other approaches to morpheme classification also employ the feature [+/- thematic role], they do so in order to define particular lexical categories (Abney 1987; Ouhalla 1991). In contrast, under the 4-M model, morpheme classification is not isomorphic with lexical category status. That is, the feature [+/- thematic role] does not define any particular set of lexical categories. For example, some prepositions assign thematic roles and, therefore, are content morphemes. Others do not. Those that do not assign a thematic role often satisfy surface case assignments in a particular language. An obvious and much remarked upon example is possessive of in English. In student of physics, student assigns the role of theme to physics and of assigns objective case to physics. Another lexical category in which some members are content morphemes and others are not is pronoun. For example, while the personal pronouns in English are all content morphemes because they receive thematic roles, the expletive pronouns it and there behave differently. They do not receive thematic roles: evidence of this is that they cannot be questioned. See (4) and (5) below. Elsewhere we have shown that not all entries in a lexical category pattern alike cross-linguistically (Jake 1994; Myers-Scotton et al. 1996; Myers-Scotton and Jake 2000b).
(4) a. I said that he went.
b. I'm sorry. I didn't hear you. Who went?
(5) a. I said it's raining.
b. I'm sorry. I didn't hear you. *What's/who's raining?
3.3. Other ways of classifying morphemes
The content versus system morpheme distinction may resemble other ways of classifying morphemes but is, in fact, very different. The theoretical basis of the content versus system distinction is unlike the two ways typically used to categorize lexical structure. One traditional distinction is word-based: the open-class vs. closed-class distinction. Open-class items are members of categories that accept new entries (e.g. nouns and verbs). Closed-class items generally do not accept new members (e.g. pronouns and prepositions). This distinction is usually applied only to words, that is, free morphemes (Harris 1951: 251-252, note). The second typical distinction is based on lexical category: thematic elements vs. functional elements. The feature [+/- F] is a category-defining feature (Abney 1987). Abney distinguishes functional and thematic elements in saying, "Functional elements lack what I [Abney] will call `descriptive content'" (1987: 65). Abney's approach also assumes that functional elements select their thematic complements. As will become clear in the discussion below, we do not make this assumption. For example, while determiners are specifiers of NPs under the 4-M model, they are not the functional heads of determiner phrases (DPs). Under this view, content morphemes select determiners, not the other way around.
The most serious problem with both the open-class vs. closed-class and the thematic vs. functional classifications is that they make the wrong predictions about how words or lexical categories are distributed. For example, in code-switching data, some pronouns from the embedded language occur in mixed constituents, but others do not. Yet, according to these two distinctions, all pronouns are, at least at the primary division, members of the same category. All are closed-class items; they are all also functional elements. Yet, pronouns do not pattern uniformly, as is shown in section 3.2.
Linguists have long recognized that while the tenets of the structuralists' classification of morphemes are appealing, they are over-simplistic. We acknowledge the problems inherent in any morpheme classification in which morpheme is the central unit of structure mapping meaning to surface form. As Anderson (1992) points out, assuming there is nothing more basic than the morpheme results in such problematic forms as portmanteau morphs, cumulative morphs, and empty morphs, among others. Under the 4-M model, morphemes are not basic in the lexicon. Lemma entries support morphemes that are realized only at the positional level, but via mechanisms available at different levels of grammatical organization. How these different mechanisms interact explains the diversity in surface morphs.
Although the 4-M model begins with a two-way distinction (content versus system morphemes), the distinction is independent of lexical category. In addition, the 4-M model recognizes that a two-way division is insufficient. This division is refined by recognizing that there are three types of system morpheme according to their relations with lexical heads (i.e. content morphemes). In contrast, the other classifications recognize only one type of nonthematic morpheme (i.e. closed-class items or functional elements).
4. The 4-M model
The 4-M model classifies morphemes according to abstract grammatical features that are already inherent to the organization of language, namely thematic roles, maximal projections, and coindexing of elements. Thus, the four-way classification of morphemes results from a confluence of two sets of abstract distinctions. One of these ([+/- conceptually activated]) refers to the mechanisms by which morphemes are accessed. It captures the distinctions in levels that we have discussed earlier. That is, lemmas supporting morphemes activated at the conceptual level or lemma level are called conceptually activated; those that are activated at the level of the formulator are not.
The second set of distinctions refers to two properties in building phrase structures. The first property is the feature [+/- thematic role assigner/receiver]. Lemmas supporting content morphemes have this feature; they differ from those supporting all types of system morpheme in that they either receive or assign a thematic role, as noted earlier. This means that they determine the hierarchical structure of mapping arguments and predicates. However, this feature of content morphemes separates them from the other type of conceptually activated morpheme: indirectly elected or "early" system morphemes. These early system morphemes are discussed in detail below. Yet, because they are conceptually activated, early system morphemes are distinguished from the other two types of system morpheme.
The second property of phrase building is the feature [+/- looks outside of own maximal projection]. This feature is only relevant when larger constituents are constructed at the level of the formulator. For this reason, the system morphemes to which it applies are called "late." While we are using a temporal metaphor for production processes, its intent is to capture the generalization that late system morphemes require morphosyntactic information not available until the level of the formulator. How the feature [+/- looks outside] separates late system morphemes into two classes is discussed below.
In sum, two sets of abstract distinctions, one conceptually based and one based on phrase-structure-building properties, result in the four morpheme types, as shown in Figure 2. How these features of language define four classes of morpheme naturally will be discussed below.
[Figure 2 ILLUSTRATION OMITTED]
4.1. Content morphemes
The properties of content morphemes have been discussed sufficiently in section 3.1. In addition, this section makes the distinction between content morpheme and open class/thematic element.
4.2. Early system morphemes
Because early system morphemes are activated at the lemma level, we refer to them as "early" system morphemes. The feature [+ conceptually activated] conveys the idea that early system morphemes group with content morphemes as expressing the bundle of semantic and pragmatic features satisfying the speaker's intentions. However, while early system morphemes pattern with content morphemes for this feature, they contrast with content morphemes in regard to thematic role-assigning properties; no system morphemes receive or assign thematic roles. The feature [+ conceptually activated] also indicates that, except for surface-level phonological form, the critical information necessary for the form of the morphemes is available at the lemma level. This feature separates content and early system morphemes from late system morphemes. Examples of early system morphemes are definite article the in (6) and up in (7).
(6) I found the book that you lost yesterday.
(7) a. Bora chewed up Lena's toy yesterday.
b. Bora chewed Lena's toy up yesterday.
c. *Bora chewed Lena's toy yesterday up.
In (6), book indirectly elects the to complete the semantic/pragmatic feature bundle called by the speaker's intentions; that is, the adds definiteness to book. In (7), chew indirectly elects up to add necessary conceptualization to chew to convey a different idea than chew does by itself.
Formal properties distinguish early system morphemes from both content and late system morphemes. Content morphemes are directly elected by intentions; they can occur independently of other elements. In contrast, early system morphemes rely on the heads of their maximal projection (content morphemes) for information about their form. For example, in (7a) and (7b), up depends on chew for information about its form and position. The problem with the unacceptable (7c) is that the early system morpheme up does not occur within the maximal projection of the verb chew. Thus, early system morphemes are "called" by their heads.
4.3. Late system morphemes
"Late" system morphemes contrast with early ones in that they satisfy different requirements and are accessed later in the production process. The information contained in late system morphemes is grammatical as opposed to conceptual. The two types of late system morpheme are not elected to complete a semantic and pragmatic feature bundle with their heads; rather, they are structurally assigned to indicate relations between elements when a larger constituent is constructed. We speak of the form of late system morphemes as structurally assigned because of their role and because all of the morphosyntactic information necessary for their projection is not available until directions are sent to the formulator to assemble larger constituents.(6) Their formal features are outlined in sections 4.3.1 and 4.3.2.
4.3.1. Bridge system morphemes. One of the two types of late system morpheme is a "bridge" for specific constructions. A bridge occurs when the structure of its maximal projection requires it. This means that both "early" system morphemes and "bridge" system morphemes depend on their maximal projection for their form. However, "bridge" system morphemes differ from "early" system morphemes in their relationship with their heads. Their form depends on the grammatical configurations that the language-specific grammar requires of that projection, not on the content morpheme that is the head of that maximal projection, as is the case with "early" system morphemes. That is, "bridge" system morphemes connect content morphemes with each other without reference to the specific semantic/pragmatic properties of a content head.
The preposition of in English is discussed in section 3.2 as a system morpheme. Under the 4-M model, it is clear that it is a bridge system morpheme. Semantics and pragmatics of the head have nothing to do with the relationship of of to its head; instead, the relationship is purely grammatical. For example, in English, when two nouns are adjacent (as in friend of Bora), one is the head of a grammatical configuration (here, friend) and assigns the role of theme to Bora as its complement. To be well formed in English, the complement must reflect its structurally subordinate status. Thus, in head--complement order, the complement is marked by the system morpheme of. In complement--head order, the complement is marked with the possessive -'s (i.e. Bora's friend), also a bridge. Similarly, in French, a partitive construction involving beaucoup + N requires the preposition de in an invariant form before the noun, as in beaucoup de gens `many people'. In summary, the presence and the form of of, -'s, de, and other "bridge" late system morphemes satisfy the grammatical requirements of a structural configuration of the maximal projection itself.
4.3.2. Outsider system morphemes. The second class of late system morpheme contrasts with bridge system morphemes. We refer to this type as late "outsider" system morphemes. They depend on grammatical information OUTSIDE of the immediate maximal projection in which they occur. This information is only available when the formulator sends directions to the positional/surface level for how maximal projections are unified in a larger construction. For example, under analyses in which English third person singular -s is under AGR in INFL, the morpheme's form depends on coindexing with the subject NP. For this reason, -s is an outsider system morpheme.
4.3.3. Multimorphemic words. Obviously, some words are multimorphemic, composed of morphemes of different types. For example, German articles are multimorphemic. In addition to definiteness, gender, and number, they also encode case. Case assignment in German depends on information outside the maximal projection of the case-marked NP, and, thus, case is "late," affecting how determiners are realized.
For example, in (8a), den `the' in den Wagen `the car' is definite, masculine, singular, and accusative. While the notions of definiteness and number and gender are theoretically available within the NP, the determiner cannot be realized until the case is assigned by the motion verb gehen `go'. That is, the entire morphological complex behaves as if it were an outsider system morpheme. Likewise, some prepositions in German are also multimorphemic. In (8b), for example, the preposition zu has the dative feminine definite suffix -r. That is, the definite article must look outside of the NP `the school' to the dative preposition for its form. The entire preposition is not available until the level when the form of the suffix becomes available. Not only is case an outsider system morpheme, but the entire form zur behaves as an outsider.(7)
(8) a. Er geh-t in den Wagen.
       he go-3SG in the/ACC/MASC car/MASC
       `He is getting into the car.'
    b. Er geh-t zu-r Schule.
       he go-3SG to-DAT/DET/FEM school
       `He is going to school.'
Some words include both types of late system morpheme. For example, consider the Swahili associative construction in (9). The associative forms w-a in (9a) and ch-a in (9b) consist of two morphemes, the bridge morpheme, which is invariant (-a), and an outsider morpheme that only receives its form when the larger NP constituent is assembled. In order for the outsider morpheme to be realized, it must look outside its own maximal projection (the PP associative -a + noun) to the head of the larger NP. In this example, the noun class of the head is what is relevant. That is, in (9a) w- agrees with the class three noun-class prefix m- from m-lango `door', and in (9b) ch- agrees with the class seven noun-class prefix ki- from ki-su `knife'.
(9) a. m-lango w-a ny-umba
       CL3-door CL3-of CL9-house
       `[the] door of [the] house'
    b. ki-su ch-a mw-alimu
       CL7-knife CL7-of CL1-teacher
       `[the] knife of [the] teacher'
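The dependence of the associative form on the head noun in (9) can be illustrated with a minimal Python sketch. It is an illustration only, not a description of Swahili morphology as a whole: the table holds just the two agreement prefixes attested in (9), and the names ASSOCIATIVE_PREFIX, BRIDGE, associative, and associative_phrase are hypothetical labels introduced here. The point it makes is that the invariant bridge -a is fixed in advance, whereas the outsider prefix cannot be chosen until the noun class of the head of the larger NP is known.

```python
# Agreement prefixes attested in example (9) only; all other noun classes
# are deliberately left out rather than filled in.
ASSOCIATIVE_PREFIX = {3: "w", 7: "ch"}

# The bridge morpheme is invariant: it does not depend on the head noun.
BRIDGE = "a"


def associative(head_noun_class: int) -> str:
    """Build the associative form once the head's noun class is known.

    The prefix plays the role of the outsider morpheme: it needs
    information from outside its own maximal projection.
    """
    if head_noun_class not in ASSOCIATIVE_PREFIX:
        raise ValueError(
            f"noun class {head_noun_class} is not covered by example (9)"
        )
    return f"{ASSOCIATIVE_PREFIX[head_noun_class]}-{BRIDGE}"


def associative_phrase(head: str, head_noun_class: int, dependent: str) -> str:
    """Assemble head + associative + dependent noun, as in (9a) and (9b)."""
    return f"{head} {associative(head_noun_class)} {dependent}"


if __name__ == "__main__":
    print(associative_phrase("m-lango", 3, "ny-umba"))
    print(associative_phrase("ki-su", 7, "mw-alimu"))
```

Note that the noun class of the dependent noun (class 9 in (9a), class 1 in (9b)) never enters the computation; only the class of the head does.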
4.3.4. One form and multiple types. We have been at pains to argue that morpheme type depends on the mechanism and is not tied to a specific phonological shape. That is, a single surface form may be realized by more than one morpheme type, depending on the lemmas that speakers' intentions activate. Examples abound. The preposition down is a content morpheme in Stella stood down by the riverside, but it is an early system morpheme in Stella laid the bone down/Stella laid down the bone. Another more nefarious example is the determiner the. The article the is an early system morpheme when it encodes definiteness, as in Miles liked the book I gave him on his birthday. Yet the is a late system morpheme in some collocations. For example, in American English, Lena is at the store simply means Lena is shopping and does not pick out a specific store. It is simply a bridge (perhaps bleached from a once-early system morpheme).
4.4. Summary of morpheme types
There are four categories of morphemes in the 4-M model.
-- Content morphemes are activated at the lemma level and assign or receive a thematic role. They are "directly elected" by a semantic/pragmatic feature bundle, mapping conceptual structure onto the lemma.
Three types of system morpheme make up the other two classes (i.e. early and late system morphemes):
-- Like content morphemes, early system morphemes are activated at the lemma level, but they do not assign or receive thematic roles. Such morphemes are "indirectly elected" because content morphemes "point to" them. They may be in a different lemma from the content morpheme pointing to them (e.g. regular plural -s in English) or in the same lemma (e.g. irregular plural in English).
-- There are two types of late system morpheme, bridges and outsiders. Neither type is activated at the lemma level, nor does either receive/assign thematic roles.
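The summary above can be read as a small decision procedure over the three features introduced in section 4. The following Python sketch is only one way of spelling that out, under stated assumptions: the names MorphemeType, Features, and classify are hypothetical labels introduced here, and the feature values in the examples are taken from the discussion of chew, the, of, and third person singular -s in the text. Because the feature [+/- looks outside of own maximal projection] is said to be relevant only at the level of the formulator, the sketch ignores it for conceptually activated morphemes.

```python
from dataclasses import dataclass
from enum import Enum


class MorphemeType(Enum):
    CONTENT = "content morpheme"
    EARLY = "early system morpheme"
    BRIDGE = "bridge (late) system morpheme"
    OUTSIDER = "outsider (late) system morpheme"


@dataclass(frozen=True)
class Features:
    conceptually_activated: bool  # [+/- conceptually activated]
    thematic_role: bool           # [+/- thematic role assigner/receiver]
    looks_outside: bool = False   # [+/- looks outside of own maximal projection]


def classify(f: Features) -> MorphemeType:
    """Map a feature bundle onto one of the four morpheme types."""
    if f.conceptually_activated:
        # Content and early system morphemes differ only in thematic role.
        return MorphemeType.CONTENT if f.thematic_role else MorphemeType.EARLY
    if f.thematic_role:
        # Assumption of this sketch: the text describes no morpheme that is
        # structurally assigned yet assigns or receives a thematic role.
        raise ValueError("feature combination not described in the text")
    # Late system morphemes split on whether they look outside their
    # own maximal projection.
    return MorphemeType.OUTSIDER if f.looks_outside else MorphemeType.BRIDGE


if __name__ == "__main__":
    examples = {
        "chew": Features(True, True),
        "the (definite, as in (6))": Features(True, False),
        "of (friend of Bora)": Features(False, False, looks_outside=False),
        "-s (3rd person singular)": Features(False, False, looks_outside=True),
    }
    for form, feats in examples.items():
        print(f"{form}: {classify(feats).value}")
```

Since the classification takes features rather than surface forms as input, a single form such as the or down can be given different feature bundles in different uses, in line with section 4.3.4.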
4.5. Related classifications
In our initial formulations of the 4-M model (e.g. Myers-Scotton and Jake 2000b) we often refer to the contrast between direct and indirect election of lemmas, following Levelt (1989) and Bock and Levelt (1994). However, as the 4-M model evolved, a fine distinction in our use of "indirectly elected" from their use has become clear. Bock and Levelt (1994) refer to some morphemes as indirectly elected in their discussion of lexical concepts. They state that indirectly elected elements are words that do not "correspond to lexical concepts." The example they give is to in listen to the radio. Clearly, Bock and Levelt had in mind some of the same morphemes that we call early system morphemes, but the match is not identical. First, they do not indicate what features define indirectly elected elements, so the extent to which they intended these elements to overlap with early system morphemes is not clear. (Related to this point, they refer to words, while the class of early system morpheme includes both words and affixes). Second, they come close to implying that indirect election involves structural properties. They do not identify indirectly elected morphemes except to say that "to does not represent a concept." The next sentence states, "Rather, the lemma for the transitive verb listen requires the preposition to, so the lemma to must be activated via an indirect route at the lemma level. We refer to this as indirect election" (Bock and Levelt 1994: 953). In contrast, the 4-M model views early morphemes not as structurally assigned, but as conceptually activated; late system morphemes, indeed, are structurally assigned. Thus, while the ideas of Bock and Levelt stimulated our formulation of early system morphemes in the 4-M model, there are distinct differences in which morphemes we include under the rubric "indirectly elected" and in what their features are.(8)
The approaches of some other researchers to morphological issues show similarities to the 4-M model. For example, using "relevance to syntax" as a criterion for classifying different types of inflectional morpheme, Booij (1996) separates what he calls contextual from inherent inflection. Contextual inflection is "that kind of inflection that is dictated by syntax, such as person and number markers on verbs that agree with subjects and/or objects, agreement markers for adjectives, and structural case markers on nouns." In contrast, inherent inflection is "the kind of inflection that is not required by the syntactic context, although it may have syntactic relevance" (1996: 2). Further, he notes that inherent inflection is more similar to derivation. Although the 4-M model rests on more specific, formal distinctions and is linked to levels of access in language production, there are clear resemblances to Booij's divisions. Booij seems to have in mind late outsider system morphemes when he refers to contextual inflection and early system morphemes when he discusses inherent inflection. Noting Booij's classification, Mithun (1999: 35) sees the system of tense agreement in Yup'ik as "a clear case of contextual inflection." Tense can be marked on Yup'ik nouns as well as verbs. She concludes that describing the Yup'ik system in this way solves a problem that "would be perplexing for traditional accounts of inflection" (1999: 43). That tense is marked on both verbs and nouns in Yup'ik is not a problem for the 4-M model because it does not depend on lexical category (such as N or V) for classifying inflections but makes distinctions based on the closeness of the morpheme to conceptual structure vs. the level of the formulator.
4.6. Implications for linguistic theory
By showing how the four-way classification of morphemes can predict distributions in three diverse sets of data, we demonstrate the usefulness of the 4-M model. However, what about the model's theoretical value? First, the model endorses the view of those syntacticians who argue for the autonomy of the lexicon and the grammar -- but with a reservation. Clearly, the notion in the 4-M model that not all morphemes are accessed at the same level in production supports the autonomy argument. That is, one premise behind the model is that the two types of late system morpheme are not salient until directions are sent to the formulator; this claim supports the idea that phrase-structure correspondences licensed by lexical entries are autonomous of the lexicon itself.
Second, the reservation: the fact that conceptually activated content morphemes are the stuff realizing predicate-argument structure belies the total autonomy of grammar from the lexicon. This leads to a related point. Rather obviously, the fact that content morphemes and their early system morphemes are conceptually activated implies that intentions are most directly connected to contentful elements. In turn, this implies that these elements are activated first in language production. Thus, phrase structure is BUILT AROUND the content morphemes and their early system morphemes. This calls into question the idea that content elements are INSERTED into syntactic structures. And, in fact, this is Jackendoff's point when he writes, "Lexical items, then, are not `inserted' into syntactic derivations; rather they license the correspondence of certain (near-) terminal symbols of syntactic structure with phonological and conceptual structures" (1997: 89). While similar, this view is different from the minimalist position that lexical items are realized in a derivation under the operation merge (Chomsky 1995). Jackendoff also states, "a lexical item is to be regarded as a correspondence rule, and the lexicon as a whole is to be regarded as part of the PS-SS [Phonological Structure-Syntactic Structure] and SS-CS [Syntactic Structure-Conceptual Structure] interface modules" (1997: 89; italics in the original). That Jackendoff's view is compatible with the mechanisms underlying the 4-M model is suggested by his acknowledging that his notion of a lexical item as a correspondence rule is similar to Levelt's notion of a lemma (1997: 226, note 5).
Further, the interaction of some late outsider system morphemes with other morpheme types provides an argument against the view that lexical items are just inserted in a derivation. Because late system morphemes show coindexical relations across maximal projections, they do not become salient until directions to the formulator create structures larger than their own maximal projections. Consequently, if an outsider late system morpheme is part of a multimorphemic word that contains either a content morpheme or an early system morpheme, it blocks directions from these conceptually activated morphemes. They cannot realize other correspondences at the functional and positional levels (such as word order and phonological form) until larger constituents containing the information needed for the late outsider to be realized are available. In other words, the entire multimorphemic word patterns as a late system morpheme in its distribution. Such interaction belies the autonomy of the different levels of structures, specifically those directed by the lexicon and grammar. Recall the case of articles and case in German (den composed of an early system morpheme determiner plus an outsider accusative case suffix -n) and zur (composed of a content morpheme preposition zu, an early definite article, and an outsider dative case suffix -r) in 4.3.3. Producing such forms clearly requires grammatical information contained in constituents larger than a lexical item's immediate maximal projection.
5. Predictions of the 4-M model
While the 4-M model obviously classifies morphemes, it is primarily a model of how morphemes are accessed. Its concern is how abstract linguistic features are organized and, therefore, it is a model of abstract linguistic competence. The 4-M model also implies a link between language competence and production. Such implications lead to testable predictions regarding the distributions of morpheme types in linguistic phenomena. This paper tests some of these predictions for certain distributions in three types of data: code switching, agrammatic aphasia, and interlanguage in second-language acquisition.
Elsewhere we have indicated that the 4-M model implies explanations for other types of data, for example certain distributions in specific code-switching data sets and speech-error data on morpheme stranding (Jake and Myers-Scotton 1997; Myers-Scotton and Jake 2000b). The model should also have applicability to first-language acquisition data and other aspects of second-language acquisition data. It is being applied to other forms of language-contact phenomena, such as child bilingual attrition, creole formation, and language shift (Bolonyai 2000; Fuller 2000; Gross 2000; Schmitt 2000; Wei 2000a, all included in Myers-Scotton and Jake 2000a).
6. How the 4-M model explains double morphology in classic code switching
"Classic code switching" is defined as the use of morphemes from two or more linguistic varieties in the same intrasentential clause (CP), with the grammatical frame derived from only one of the participating varieties. Speakers are proficient enough in that variety to produce well-formed, monolingual utterances, and they are often proficient in the other participating variety as well. Our use of the term "classic code switching" recognizes the fact that there are also other types of code switching (e.g. code switching where the two varieties are not maintained as two distinct linguistic systems but instead show systematic and regular structural convergence; cf. Myers-Scotton 1998).
6.1. The matrix language vs. the embedded language distinction
The matrix-language frame (MLF) model explains classic code switching (Myers-Scotton 1993, 1997). The model rests on two hierarchies. The content versus system morpheme distinction is the first one (discussed in section 3). The second hierarchy refers to the asymmetry of the participating varieties: the variety setting the grammatical frame is designated as the matrix language and the other(s) are called the embedded language(s). A thorough understanding of the model is not necessary to explicate the data discussed in this paper. Here, it is sufficient to recognize that two principles, the morpheme-order principle and the system-morpheme principle, predict that the participating languages do not participate equally in framing a code-switched mixed constituent (consisting of morphemes from more than one of the participating languages). The principles predict that morpheme order and all of one class of system morphemes must come from only one of the participating languages, the matrix language (ML). In the original statement of the model, the system-morpheme principle states that "all system morphemes which have grammatical relations external to their head constituent ... will come from the ML" (Myers-Scotton 1993, 1997: 83). Under the new 4-M model, this class of system morpheme is more explicitly identified as the late outsider system morpheme. While other types of system morpheme may come from the embedded language (EL), in fact, almost all -- not just those required by the system-morpheme principle -- come from the matrix language. That is, the embedded language's main contribution to mixed constituents is limited to singly occurring "congruent" content morphemes. Embedded-language islands are also possible.(9)
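The system-morpheme principle can be read as a simple well-formedness check on a mixed constituent. The following is a minimal sketch under stated assumptions: the `Morpheme` record, its `kind` labels, and the function `system_morpheme_violations` are hypothetical names introduced for illustration, and the sample data encode only part of the Swahili/English verb in example (12) below, with the agreement prefixes labeled as outsiders on the strength of the text's remark that they bear relations outside their head.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Morpheme:
    form: str
    language: str
    kind: str  # "content", "early", "bridge", or "outsider" (illustrative labels)


def system_morpheme_violations(constituent, matrix_language):
    """Return late outsider system morphemes that do not come from the matrix language."""
    return [m for m in constituent
            if m.kind == "outsider" and m.language != matrix_language]


# Simplified, partial encoding of a-li-m-BUY-i-a from example (12)
verb_12 = [
    Morpheme("a-", "Swahili", "outsider"),   # 3S subject agreement
    Morpheme("m-", "Swahili", "outsider"),   # 3S indirect-object agreement
    Morpheme("buy", "English", "content"),   # embedded-language verb stem
]
print(system_morpheme_violations(verb_12, "Swahili"))  # []

# A constructed (unattested) form with an English agreement suffix is flagged
constructed = verb_12 + [Morpheme("-s", "English", "outsider")]
print([m.form for m in system_morpheme_violations(constructed, "Swahili")])  # ['-s']
```

The check deliberately ignores early and bridge system morphemes, since the principle as quoted above constrains only those system morphemes with grammatical relations external to their head constituent.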
6.2. CP as unit of analysis
Following the extended MLF model (Myers-Scotton and Jake 1995), the unit of analysis for bilingual speech is the CP (projection of the Comp node). CP refers to a constituent consisting of a proposition-expressing part plus an accompanying complementizer-like element that may or may not be null. "Complementizer-like element" should be "understood as any of the clause-peripheral words/particles/morphemes that are so common cross-linguistically and used with subordinate clauses or clauses with nonindicative mood" (Michael Hagerty, personal communication). The term CP is superior to "clause" because clause is ambiguous between something with the status of IP and something that includes the complementizer.
Examples (10), (11), and (12) present data from diverse data sets that support the MLF model's claims about the differential roles of the participating languages in structuring mixed constituents. In (10), the matrix language, Spanish, supplies the system morphemes and the morpheme order of the NP within the PP como un Pinocchio silly. However, the embedded language, English, can supply the adjective silly, a content morpheme. In (11) and (12), English is also the embedded language and therefore can supply the content morphemes ship and buy and father. Note that the verbs ship and buy receive all the requisite inflections from the respective matrix languages, Finnish and Swahili. Some of these inflections are agreement morphemes and show that the system morphemes that bear relations outside their head (the verb) come from the matrix language. This is in line with the system-morpheme principle.
(10) Spanish/English (Milan 1996)
Oh, quieres quedar-te en la calle,
oh want-2S/PRES remain/INF-2SOBJ in the street
como un Pinnocchio SILLY?
like a Pinnocchio silly
`Oh, you want to stay in the street, like a silly Pinnocchio?'
(11) Finnish/English CS (Halmari 1997: 63)
Sie + lta + k + saa SHIPpaat vai?
There + ABL + Q + you ship + VM/2SG or?
`You are shipping from there, are you?'
(12) Swahili/English (Myers-Scotton 1993, 1997: 87-88)
Labda yeye ha-na vi-tabu
maybe he 3S/NEG-ASSOC [have] CL8/PL-book
vy-ake FATHER a-li-m-BUY-i-a
CL8/PL-POSS/3SG father 3S-PST-3SIO-buy-APPL-FV
`... Maybe he doesn't have his books [which] father bought for him ...'
6.3. Explaining double morphology
What does the 4-M model add to the MLF model in explaining classic code switching? The system-morpheme principle of the MLF model only identifies the type of embedded-language system morpheme that CANNOT occur in mixed constituents (i.e. late outsiders). Sometimes other embedded-language system morphemes do occur in mixed constituents. Often they are early system morphemes that occur with their embedded-language heads; this results in what we call double morphology, if the matrix language also supplies a system morpheme counterpart. Thus, in double morphology, the system morpheme on the embedded-language content morpheme is supplied by both the embedded language and the matrix language (see the discussion of double morphology in Myers-Scotton 1993, 1997: 132-135). For example, an embedded-language plural noun may appear with both an embedded-language marker for plural and also a corresponding matrix-language marker, as in (13), (14), and (15).
While such structures do not violate the system-morpheme principle, the original formulation of the MLF model does not explain them, either. The 4-M model implies an explanation. The model underlies the prediction that of all embedded-language system morphemes in mixed constituents, early system morphemes will be the most frequent. Two reasons are inferred. First, under the model, early system morphemes pattern with their heads (content morphemes) as conceptually activated. Second, in the production process, early system morphemes are activated at the lemma level because they are indirectly elected by their heads. It is an easy step to see that in code switching, when an embedded-language content morpheme is called, mistiming can occur and the embedded-language early system morpheme, as well as its head, can be accessed.
The fact that the 4-M model's classification is based on how morphemes are accessed leads to a prediction about double morphology. The prediction has two parts: (1) double morphology will only occur with embedded-language singly occurring content morphemes, and (2) it will only involve early system morphemes.
(I) Double-morphology hypothesis:
In mixed constituents in classic code switching, only embedded-language early system morphemes double system morphemes from the matrix language.
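One way to make the hypothesis concrete is as a check over the system morphemes attached to a single embedded-language content morpheme. This is only a sketch: `SystemMorpheme`, `embedded_doublers`, and `consistent_with_hypothesis` are hypothetical names, and "doubling" is approximated here as an embedded-language morpheme marking a grammatical function that a matrix-language morpheme on the same word also marks.

```python
from typing import NamedTuple


class SystemMorpheme(NamedTuple):
    form: str
    language: str
    function: str  # grammatical function marked, e.g. "plural"
    kind: str      # "early", "bridge", or "outsider" (illustrative labels)


def embedded_doublers(morphemes, matrix_language):
    """Embedded-language system morphemes whose function the matrix language also marks."""
    ml_functions = {m.function for m in morphemes if m.language == matrix_language}
    return [m for m in morphemes
            if m.language != matrix_language and m.function in ml_functions]


def consistent_with_hypothesis(morphemes, matrix_language):
    """True if every embedded-language doubler is an early system morpheme."""
    return all(m.kind == "early" for m in embedded_doublers(morphemes, matrix_language))


# ma-DAY-S from example (13): Shona class 6 plural prefix plus English plural suffix
ma_days = [
    SystemMorpheme("ma-", "Shona", "plural", "early"),
    SystemMorpheme("-s", "English", "plural", "early"),
]
print(consistent_with_hypothesis(ma_days, "Shona"))  # True
```

A form in which an embedded-language late outsider doubled a matrix-language morpheme would make the function return `False`; the hypothesis predicts that such forms are not attested.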
The fact that across diverse data sets the only system morphemes that double are early system morphemes supports this hypothesis. Examples (13)-(15) illustrate the most common type of doubling, doubling of the plural affix, an early system morpheme. Such forms are found in many data sets but, to our knowledge, are quantified in only one study, Bernsten (1990) on Shona/English code switching. Bernsten (1990: 82) reports that 69% (86/124 tokens) of English (embedded language) nouns receiving the ma- prefix marking Shona noun class six (a plural class) also received the English suffix -s (e.g. ma-day-s). A reanalysis of 20 of the 132 interviews making up Bernsten's corpus indicates that 69% (22/32) of the English nouns that are plural show double morphology, exactly the same percentage that Bernsten found for the entire corpus. That is, they received both ma- and English -s.(10)
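The two proportions cited from Bernsten's corpus can be recomputed directly from the token counts given above. The helper name `percent` is ours and purely illustrative.

```python
def percent(count: int, total: int) -> int:
    """Share of `count` in `total`, rounded to the nearest whole percent."""
    return round(100 * count / total)


full_corpus = percent(86, 124)  # nouns with both ma- and -s, entire corpus
subsample = percent(22, 32)     # same measure in the 20-interview reanalysis
print(full_corpus, subsample)   # 69 69
```

The unrounded values differ slightly (about 69.4% and 68.8%), so the two figures agree once rounded to whole percentages.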
Plural doubling examples cited here come from Shona/English code switching in (13), from Turkish/Dutch in (14), and from Russian/English in (15).
(13) Shona/English (Crawhall 1990; cited in Myers-Scotton 1993, 1997: 111)
BUT ma-DAY-S a-no a-ya ha-ndi-si
but CL6/PL-day-PL CL6-DEM CL6-DEM NEG-1S-COP
ku-mu-on-a
INF-3S/OBJ-see-FV
`But these days I don't see him much anymore.'
(14) Turkish/Dutch (Backus 1992: 90)
POL-EN-lar-a Hollandaca ders verdi
Pole-PL-PL-DAT Dutch [people] lesson give/PRET/3S
`He taught Dutch to Poles.'
(15) Russian/English (Schmitt 1999)
U t-ebya ne ostalos' NOTE-BOOK-S-ov?
with you-GEN/SG not remain note-book-PL-GEN/PL
`You don't have any notebooks left?'
Examples (16) and (17) illustrate that other early system morphemes can be doubled. Example (16) comes from the bilingual production of a young Danish/English bilingual. Her matrix language is English, and so the English system morpheme the would be predicted; however, she also produces the Danish suffix -n, the Danish realization of the definite article. Example (17) comes from a corpus of Lingala/French code switching, for which Lingala is the ML. The infinitive marker is doubled in the form ko-comprend-re. The French early infinitive suffix is indirectly elected along with the content French verb morpheme. In Lingala, ko- is also indirectly elected; it is the prefix that signals membership in noun class 15, the class including all infinitives.
(16) Danish/English (Petersen 1988: 488)
a. ... THE pige-n ...
... DEF girl-DEF ...
`... the girl ...'
b. ... THE navel-n ...
... DEF navel-DEF ...
`... the navel ...'
(17) Lingala/French (Bokamba 1988: 23)(11)
L'HEURE ya kala TROIS QUART-S ya
DET'hour ASSOC past three quarters ASSOC
ba-JEUNE-S ba-za ko-COMPREND-RE
CL2/PL-youth-PL CL2/PL-COP INF-understand-INF
AVENIR te, ...
future NEG
`In the past, three-fourths of the young people did not understand the future, ...'
Data from diverse and numerous code-switching corpora support the double-morphology hypothesis. In all of the available data in the literature, if double morphology occurs, the only system morphemes from the embedded language that double are early system morphemes. In addition to the examples above in which both morphemes involved in doubling are early system morphemes, an example from English/Xhosa code switching offers another type of evidence supporting the hypothesis. In (18), the infinitive marker is doubled, but only the embedded language version of the marker is an early system morpheme; the matrix language version of the marker is not. However, the double-morphology hypothesis only requires that if doubling occurs, it is an embedded language morpheme that is the doubler. In this example, English is the matrix language and, accordingly, when an infinitive marker is required, it comes from English. In this example, the matrix clause control verb start projects a control CP as its complement with a matrix language infinitive marker (to u-ku-ntshontsh-a i-zi-moto `to [to] steal cars'). In English, the infinitive marker to is a late outsider system morpheme; the system-morpheme principle requires that such morphemes come from the matrix language. However, note that in this example, the Xhosa infinitive marker uku- is also present, doubling the English marker. In Xhosa, the infinitive marker is an early system morpheme, because it is an invariant marker of the noun class that includes infinitives.
(18) English/Xhosa (Kieswetter 1995: 82; cited in David Gough, personal communication)
N. Phela [name], this thing is very serious NGOBA
because
they sniff glue and benzine. That's when they start to
U-KU-NTSHONTSH-A
PREPREFIX-PREFIX-steal-FV
I-ZI-MOTO ...
PREPREFIX-PREFIX-car ...
`N. Phela, this thing is very serious because they sniff glue and benzine. That's when they start to steal cars ...'
While double morphology may appear to be a competition between the matrix-language and embedded-language early system morphemes, it is not. It is only through mistiming that the embedded language supplies an early system morpheme along with the embedded-language content morpheme that is its head. The matrix language is always the "winner" because in its role as the source of the grammatical frame, the matrix language supplies a relevant early system morpheme. That is, the matrix language morpheme is the one that may govern relations external to its head within the CP. Thus, in (13), for example, the class 6/plural early system morpheme from Shona in ma-day-s governs agreement prefixes on the following demonstratives (a-no and a-ya). If one considers only the surface realizations without considering the asymmetry between the roles of the two languages, then the matrix-language and embedded-language early system morphemes appear to be in competition.(12)
7. Evidence for the 4-M model from Broca's aphasics
It is well known that patients with Broca's aphasia typically produce speech that lacks functional elements. Further, psycholinguistic researchers often point out that not all functional elements are equally absent or incorrectly produced in the speech of these aphasics. In this paper, we do not even begin to survey the vast literature on the nature of language production in Broca's aphasics. Rather, we only consider several representative studies to show how the 4-M model and the hypotheses following from it are relevant to explaining Broca's aphasia data more precisely.
The aphasic literature makes general observations corresponding to some of the same distributions that the 4-M model would predict. The description of Broca's aphasia by Caramazza and Berndt (1982: 484) is a classic characterization: "the Broca's aphasic typically produces short phrases made up primarily of substantive words -- concrete nouns, main verbs, and important modifiers ... Broca's speech is characterized as `agrammatic' because of the frequent omission of the grammatical `function words'." Among other elements, the "important modifiers" may well include the forms classified as early system morphemes in the 4-M model because of their conceptual role. These early system morphemes would contrast with "grammatical function words," late system morphemes.
Damasio (1992) discusses Broca's aphasia more in terms of the neural network that he sees as damaged in this impairment. He notes that "this is the network concerned with the relational aspects of language, which include the grammatical structure of sentences and the proper use of grammatical morphemes and verbs" (1992: 533-534). The 4-M model implicates late system morphemes, both outsiders and bridges, as expressions of the relational aspects of language.
In reference to how deficits in aphasia have been described, Friederici and Saddy, among many others, recognize the inadequacy of the traditional lexical vs. functional or open-class vs. closed-class distinctions: they write, "It is ... not adequate to treat the functional vocabulary as the complement set of the lexical category, that is, to suggest that anything that is not captured by the [+/-N], [+/-V] features is functional." They point out that "the reason this approach fails is because most of the functional vocabulary could be characterized as [-N, -V] and differentiating within the set would not be possible" (1993: 171).(13)
The 4-M model supports more precise predictions regarding production of morphemes by Broca's aphasics. There are two reasons for this. First, the model differentiates content vs. system morphemes along different lines from open-class and closed-class items. Second, it distinguishes three types of system morpheme, not all of them related to language production in the same way. Two predictions follow from the 4-M model: (1) some functional elements or closed-class items will occur in agrammatic aphasia simply because they are content morphemes under the 4-M model, such as personal pronouns in English; (2) the division of [+/- conceptually activated] in the 4-M model will mean that early system morphemes should pattern more like content morphemes and, therefore, be more accurate than either of the other two classes of system morphemes. This is because bridge and outsider late system morphemes are part of the structure-building apparatus.
(II) Aphasia hypothesis 1:
In patients with Broca's aphasia, content morphemes will be more accurate than any type of system morpheme.
(III) Aphasia hypothesis 2:
In patients with Broca's aphasia, late system morphemes will be missing or less accurate than early system morphemes.
We will test these hypotheses against findings reported for several data sets, including corpora from English, Zulu, French, and German aphasics. In addition to citing the specific findings of some researchers, we reconsider the data from two studies in Menn and Obler (1990).
7.1. English aphasic data
English aphasics are among the most studied. Like other aphasics, even though they show the expected lack of functional elements, their impairment is selective. Under the 4-M model, this "selectivity" reflects the different types of system morpheme. For example, the 4-M model would predict Pulvermuller's findings that "the function items the, I, and and -ing are present in the spontaneous speech of most agrammatics" (1995: 177). These functional elements are a "mixed bag," but all are conceptually activated. That is, they are either content morphemes (I and and) or early system morphemes (the and -ing).
Menn (1990: 122-123) studied two English-speaking aphasics. The results from one subject support aphasia hypothesis 2.(14) That is, if one views Menn's data in terms of the way morphemes are classified under the 4-M model, the distributions of the functional items studied receive an explanation beyond recognizing that certain inflections cause problems. Among the functional elements studied are verbal affixes. The subject (Mr. F) correctly supplied seven out of twelve (58%) of the third person singular morphemes required by context. In contrast, he correctly produced 24 out of 26 (92%) verbs with a required -ing suffix. Because third singular -s in English is a late outsider system morpheme, the 4-M model would predict a much less accurate production for -s than for the -ing progressive marker, which is an early system morpheme under the model.(15)
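The contrast in Mr. F's production can be restated as a difference in accuracy rates. The sketch below simply recomputes the percentages from the counts cited above; the dictionary layout and labels are illustrative, and no significance test is implied.

```python
# Counts reported for Mr. F (Menn 1990), as cited in the text: (correct, required)
counts = {
    "third singular -s (late outsider)": (7, 12),
    "progressive -ing (early)": (24, 26),
}

accuracy = {label: correct / required
            for label, (correct, required) in counts.items()}

for label, rate in accuracy.items():
    print(f"{label}: {rate:.0%}")

gap = accuracy["progressive -ing (early)"] - accuracy["third singular -s (late outsider)"]
print(f"difference: {100 * gap:.0f} percentage points")
```

With these counts the early system morpheme is supplied correctly about 34 percentage points more often than the late outsider, which is the direction aphasia hypothesis 2 predicts.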
7.2. Zulu aphasic data
Data from aphasics speaking quite a different language, Zulu, also show that early system morphemes are more accurately produced than late outsider system morphemes (Herbert 1991). The prefix errors (agreement with a noun or a noun-class prefix) of nine Zulu-speaking aphasics were studied. The noun-class prefixes on the nouns are early system morphemes because they encode number and class (i.e. gender) features of the noun. Therefore, they are indirectly elected by the semantic/pragmatic feature bundle electing the noun stem. No noun-class prefixes on the nouns were absent, and they were also mostly correct (779/1180, or 66%).
As noted above, in addition to nouns having noun-class prefixes, most elements in a Bantu construction show agreement with nouns in that structure. For example, verbs and adjectives agree with nouns. These agreement prefixes are late system morphemes because they look outside of their own immediate maximal projection (e.g. AP or VP) to a noun for information about their form. For example, an INFL agreement "slot" on an inflected verb requires reference to its subject. In this data set, such noun-class prefixes on verbs are largely missing and often incorrect. The most common error in the data set was the occurrence of an incorrect noun-class agreement prefix on a verb with the correct noun-class prefix on a noun (62% of the errors, or 732/1180); see examples (19) and (20). An interesting point about this set of data is that in some noun classes, the early system morpheme (i.e. the noun-class prefix) and the late system morpheme prefixes (on adjectives, verbs, etc., agreeing with the noun) are homophonous.
(19) (Herbert 1991: 115-116)
um-fana ba-phethe um-khonto
NC1-boy NC2-has NC3-spear
`the boy has a spear'
Target: um-fana u-phethe um-khonto (1991: 115)
(20) I-hashi u-gijima kakhulu
NC5-horse NC1-run quickly
`the horse is running quickly'
Target: i-hashi li-gijima kakhulu (1991: 116)
Note that the noun-class information is available from the subject NP immediately preceding the verb in such examples. However, the aphasics fail to make use of this information.(16) In sum, these Zulu aphasics do show errors in their use of inflectional morphemes, but their early system morphemes (noun-class prefixes) are much more accurate than their late system morphemes (agreement prefixes). Given the difference that the 4-M model claims for how these two types of system morpheme are elected, this difference in accuracy is what the model would predict. These findings support aphasia hypothesis 2: late system morphemes are less accurately produced than early system morphemes.
7.3. French aphasia data
One set of French data from Jarema and Friederici (1994) contrasts accurate production of articles (early system morphemes) with that of clitic pronouns (outsider late system morphemes), testing aphasia hypothesis 2. They compare the production of the French definite articles le and la with the homophonous object pronominal clitics in agrammatic aphasia. In terms of the 4-M model, French articles are early system morphemes, indirectly elected by the gender and number features of their noun. Gender and number features are all the information that is necessary to activate them; therefore, they differ from homophonous clitic pronouns (i.e. le vs. le) that require information outside their maximal projection to be activated. This information comes from the subject or object of the verb (i.e. the subject/object AGR slots in INFL).
While, understandably, Jarema and Friederici do not discuss their results in terms of the 4-M model, their results support aphasia hypothesis 2. Jarema and Friederici note that there was a significantly higher percentage of errors with object le clitic pronouns vs. definite NP objects with le. Compare the indirectly elected early system morphemes in (21) with the late structurally assigned system morphemes in (22) (Jarema and Friederici 1994: 687).
(21) Definite determiner le
le soldat quitte le poste
`the soldier is leaving the post'
(22) Object clitic le
le soldat le quitte
`the soldier is leaving it' (antecedent: le poste `the post')
These results were obtained in a comprehension task in which the experimenter read one of two sentences to subjects who were then required to select one of two pictures as appropriate to the sentence presented to them. Jarema and Friederici report that "sentences containing a pronoun yielded overall more errors than did sentences with object noun phrases featuring the article (35% [errors with le as a clitic] vs. 13.13% [for le as a definite article] mean error rate)" (1994: 688). They note that "four of five patients showed a significantly higher percentage of errors on pronouns than on articles" (1994: 688). In addition to the χ² test on individuals, the entire group was subject to a Mann-Whitney U test (p ≤ 0.01). The following quotations are representative of the conclusions of Jarema and Friederici: "the ability to process gender-marked articles is generally well preserved in French-speaking agrammatic patients" (1994: 683); and "the results obtained ... reveal a distinction between the aphasic subjects' ability to process gender information in articles and their ability to process the same information encoded in pronouns" (1994: 689).
Jarema and Friederici link their results to the notion of thematic-role assignment; those morphemes that are not linked to thematic-role assignment are processed more accurately. One inference from their explanation would be that linking to thematic-role assignment proves difficult for agrammatic aphasics. A telling observation regarding agrammatic aphasia militates against this and suggests that another explanation is needed: such categories as main verbs (they assign thematic roles) and adjectives (they can assign thematic roles or be part of a larger NP that receives a thematic role) are more accurately produced by aphasics than any other category (cf. the many studies in Menn and Obler 1990).
Instead, these differential responses to the early system morpheme definite article le and the late system morpheme object clitic le can be explained more precisely by the 4-M model and in a way linking them to distributions in the other data considered in this paper. Under the model, French clitic pronouns are late outsider system morphemes coindexing null pronominals instead of overt elements in argument positions. These clitics refer to arguments whose identities have been previously established in the discourse. These clitics are late system morphemes because they are not realized until the agreement morphology under the inflected verb (under INFL) is spelled out (cf. Jake 1994). In contrast, French definite articles are early system morphemes. As such, definite articles enable the referent of a particular semantic/pragmatic feature bundle to be identified. They include information about definiteness, plurality, and specificity. Again, the difference in when and how early and late system morphemes are elected offers an explanation for the difference in their accurate production. Outsider late system morphemes are more vulnerable to errors for two reasons: they are less linked to speakers' intentions (they are only structurally assigned) and their form depends on the speaker's ability to put together information from several sources in the construction of a larger constituent (they must look outside their own maximal projection for some of this information).
7.4. German aphasia data
In a study of German aphasics, Friederici et al. (1992) conducted two experiments; we discuss only the second. This experiment tested the ability to process information about person, number, and gender in a particular verb inflection. The authors compared the response times of Broca's aphasics with age-matched normals (as well as other aphasic controls) in identifying a target word in grammatical and ungrammatical sentences. The ungrammatical stimuli either omitted verb endings or had incorrect verb endings. The results were as predicted. Age-matched normal controls took longer to process ungrammatical stimuli, whereas Broca's aphasics did not. Examples (23) and (24) illustrate sample grammatical and ungrammatical stimuli (Friederici et al. 1992: 762).
(23) Grammatical: Sie hatte ihn lange Zeit beobachtet, doch er TANZTE nur mit alteren Damen
`After watching him a long time, she saw that he danced[3S] just with older women'
(24) Ungrammatical: Sie hatte ihn lange Zeit beobachtet, doch er TANZTEST nur mit alteren Damen
`After watching him a long time, she saw that he danced[2S] just with older women'
Table 1 summarizes the reaction times for the Broca's aphasics and age-matched controls. "Age-matched controls showed a significant grammaticality effect with faster lexical decision times in the grammatical than in the ungrammatical condition ... Broca's aphasics showed no grammaticality effect" (1992: 759). The data showed "Broca's patients in contrast do not demonstrate a reliable sensitivity to the grammatical information given by the inflectional morpheme in its on-line task. Word monitoring times are not affected by the grammatical status of the preceding inflectional element" (1992: 759).
Table 1. Summary of reaction times in msec: aphasics vs. age-matched nonaphasics
                        Grammatical stimulus    Ungrammatical stimulus
Age-matched controls    317                     352
Broca's aphasics        474                     478
Source: Friederici et al. 1992.
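The contrast in Table 1 can be made explicit by computing the grammaticality effect (ungrammatical minus grammatical reaction time) for each group. The following is a minimal Python sketch that uses only the four means reported in Table 1; the variable and function names are illustrative, and the calculation says nothing about statistical significance, which Friederici et al. assessed on their full data.

```python
# Mean reaction times in msec, copied from Table 1 (Friederici et al. 1992).
reaction_times = {
    "age-matched controls": {"grammatical": 317, "ungrammatical": 352},
    "Broca's aphasics": {"grammatical": 474, "ungrammatical": 478},
}


def grammaticality_effect(times):
    """Return the slowdown in msec on ungrammatical relative to grammatical stimuli."""
    return times["ungrammatical"] - times["grammatical"]


# One effect size per group; a value near zero means no sensitivity to the inflection.
effects = {group: grammaticality_effect(t) for group, t in reaction_times.items()}

for group, effect in effects.items():
    print(f"{group}: {effect} msec")
```

On these means the controls slow down by 35 msec on ungrammatical stimuli, whereas the Broca's aphasics slow down by only 4 msec.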
The authors link the results to an inability to access "the full syntactic information encoded in closed-class elements in a fast and obligatory way ..." (1992: 760). Although the aphasia hypotheses above do not address on-line processing by aphasics, these results are consistent with the 4-M model without having to posit an impairment specifically affecting response time. That is, the 4-M model would predict that late system morphemes are most vulnerable to disruptions because their form refers to relational information outside their immediate maximal projection. Aphasia hypotheses 1 and 2 predict that aphasics produce late system morphemes least accurately. A corollary prediction would be that aphasics will have problems differentiating late system morphemes. Therefore, grammatical and ungrammatical late system morphemes may be processed similarly, as they are by these German aphasics.
7.5. Reanalysis of data from two French aphasics
In this section, we present our most extensive reanalysis of data from the literature, looking at two French aphasics studied in Nespoulous et al. (1990). The reason for the reanalysis is that the authors' analysis is based on lexical categories, as opposed to the feature-based classification of the 4-M model. We studied the transcripts in their study and reclassified determiners, pronouns, verbs, and prepositions. While the results of the original analysis of Nespoulous et al. also support the claims of the 4-M model, the reanalysis of the data gives a clearer picture of the source of errors and allows for a more precise testing of the predictions of the 4-M model. Our prediction is that a reanalysis going beyond "functional categories" will show that aphasics will produce content morphemes and early system morphemes relatively accurately, but will have trouble with late system morphemes, especially outsider late system morphemes. In the discussion below, our statistics are somewhat different from those reported in Nespoulous et al. First, we do not reanalyze all of their data. Second, we discarded false starts that were repaired.
7.5.1. Determiners and pronouns. We can test both aphasia hypotheses by comparing accuracy for full pronouns (content morphemes), determiners (early system morphemes), and pronominal clitics (late outsider system morphemes). Although determiners, such as definite articles and possessives, show agreement with their head nouns, they do not look outside of the NP for this agreement. Therefore, they are indirectly elected early system morphemes in French, as they are in English. As such, they are conceptually activated. Thus, we expect the definite articles (le, la, and les) and possessive adjectives (e.g. ma, mon, mes) to be largely intact. Table 2 gives the combined results for definite articles and possessive adjectives.
Table 2. Correct production of selected elements under determiner by French-speaking aphasics (two types of early system morphemes)
                 Definite articles    Possessive adjectives
                 % (n)                % (n)
Mme. Auvergne    86 (85/99)           92 (12/13)
M. Clermont      78 (78/99)           46 (6/13)
For one patient, Mme. Auvergne, 79.5% (99/112) of her determiners were correct. Of the incorrect determiners, only one was missing. The second patient, M. Clermont, was less accurate. However, 78% of his definite articles (78/99) were correct. Out of the 21 errors, seven articles were missing. Another category of determiners is the possessive adjective. These were largely correct for Mme. Auvergne (12/13, or 92%). However, M. Clermont only produced 6/13 correctly (46%).
The relative correctness of the determiners contrasts with the subjects' production of clitic pronouns. In addition, the production of clitic pronouns contrasts with the subjects' production of pronouns that are content morphemes (e.g. moi). The findings regarding pronouns support aphasia hypothesis 1: content morpheme pronouns are entirely intact in the data for both patients, but late system morpheme pronouns are not, especially object clitics.
7.5.1.1. Mme. Auvergne's pronouns. Eight out of eight of Mme. Auvergne's content pronouns are correct (100%). Out of 110 clitics, 93 are correct, almost 85%. Interestingly, one subset of these clitics, the object clitics, is less accurately produced than the subset of subject clitics. For object clitics, only 15 out of 22 (68%) are correct; for subject clitics, 66/67 (98.5%) are correct. The 4-M model would predict this disparity: a much larger grammatical frame referring to a null object must be constructed in order to correctly select a preverbal clitic. There is a third category of pronoun in French that is also a late system morpheme: dummy pronouns such as il in il pleut. Mme. Auvergne produces only 12 out of 21 required dummy pronouns correctly (57%).
7.5.1.2. M. Clermont's pronouns. The more impaired subject, M. Clermont, does less well with pronouns than Mme. Auvergne. Still, he does well with content pronouns (six out of six are correct). Overall, he produces 65.9% of required late clitic pronouns correctly (56/85); 29% (25/85) are omitted, and 4.7% (4/85) are incorrect. If we break down the late pronouns into three types, M. Clermont is relatively accurate with subject clitics; 91% are correct (41/45). However, only 36% of his object clitics are correct (13/36). Fifty-three percent (19/36) of object clitics are missing. M. Clermont produces only two out of four required dummy pronouns (50% correct). Table 3 summarizes the results for the different types of pronouns. See also Table 6 for a comparison of pronouns with other morphemes.
[TABULAR DATA 3 and 6 NOT REPRODUCIBLE IN ASCII]
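The pronoun figures quoted in the two preceding subsections can be recomputed from the raw counts. The sketch below, in Python, is only an illustration: the dictionary layout and the helper names `percent` and `overall` are our own, and the counts are the (correct, required) pairs given in the running text for late system morpheme pronouns.

```python
# (correct, required) counts for late system morpheme pronouns, as reported in the text.
clitic_counts = {
    "Mme. Auvergne": {
        "subject clitic": (66, 67),
        "object clitic": (15, 22),
        "dummy pronoun": (12, 21),
    },
    "M. Clermont": {
        "subject clitic": (41, 45),
        "object clitic": (13, 36),
        "dummy pronoun": (2, 4),
    },
}


def percent(correct, required):
    """Accuracy as a percentage, rounded to one decimal place."""
    return round(100 * correct / required, 1)


def overall(counts):
    """Pool the subtypes and return the total (correct, required) pair."""
    correct = sum(c for c, _ in counts.values())
    required = sum(n for _, n in counts.values())
    return correct, required


for patient, counts in clitic_counts.items():
    for subtype, (correct, required) in counts.items():
        print(f"{patient}, {subtype}: {percent(correct, required)}%")
    total_correct, total_required = overall(counts)
    print(f"{patient}, all late pronouns: {percent(total_correct, total_required)}%")
```

The pooled subtypes reproduce the overall totals given above (93/110 for Mme. Auvergne and 56/85 for M. Clermont), and in both patients subject clitics are produced more accurately than object clitics.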
7.5.2. Verbs. Nespoulous et al. divide verbs into three categories: main verbs, "have" and "be" as main verbs, and auxiliary verbs. Both auxiliary verbs and the "have" and "be" verbs can be classified as late system morphemes. Both avoir and etre in French function as nonthematic role-assigning copulas, that is, as late system morphemes whose form depends on information not available until subject--verb agreement is available. Under our analysis, thematic role-assigning verbs, such as mange in il mange, are content verbs; however, avoir and etre are classified under late system morphemes.(17)
Both patients' production of main verbs that are content morphemes was very accurate, and more accurate than that of any of the system morphemes, as aphasia hypothesis 1 predicts. Mme. Auvergne produced 92/96 main verbs correctly (96%) while M. Clermont produced 120/131 of his main verbs correctly (92%). M. Clermont produced 7/14 of the "have"--"be" verbs correctly and 10/20 of the auxiliary verbs correctly (50% in each case). Mme. Auvergne produced 22/23 "have"--"be" verbs (96%) and 27/35 (77%) other auxiliary verbs correctly; in total, 49/58 of her late system morpheme verbs were correct (84%); see Table 4.
[TABULAR DATA 4 NOT REPRODUCIBLE IN ASCII]
7.5.3. Prepositions. Both subjects produced prepositions that are either content morphemes (e.g. avec "with") or early system morphemes (e.g. de in souvenir de "remember") relatively correctly. However, as might be predicted, M. Clermont showed greatest impairment in producing prepositions that are bridge late system morphemes, such as partitive de. Mme. Auvergne produced 97% of required prepositions correctly (66/68). Of those that were content morphemes, 33/34 were correct (97%). She also omitted one bridge late system morpheme (16/17, 94%). All early system morphemes were correct (17/17). M. Clermont was less accurate. Out of 101 required prepositions, 80 were correct (75%). However, his content morphemes were largely correct (28/30, 93%). Furthermore, his early system morphemes were relatively correct (24/30, 80%). These early system morpheme prepositions include pour plus infinitives (since these prepositions are indirectly elected by the purpose infinitive clause). An example of one omitted early system morpheme is de in Ca je [me] souviens [d']une chose, pas deux `[that] I only remember one thing, not two'. M. Clermont makes most of his errors with bridge late system morphemes: only 27/46 are correct (59%); see Table 5. In par l'hopital [de] Montlucon `through the Montlucon hospital', de is an omitted bridge late system morpheme.
[TABULAR DATA 5 NOT REPRODUCIBLE IN ASCII]
7.5.4. Summary of reanalysis. Table 6 summarizes the relative accuracy of the production of selected morphemes by Mme. Auvergne and M. Clermont. Because there are prepositions in all the different types of morpheme, they offer a good test for the predictions of the 4-M model. And, as predicted (aphasia hypothesis 1) those prepositions that are content morphemes are more accurately produced than any class of system morphemes. Further, aphasia hypothesis 2 is supported. Recall that among system morphemes, early system morpheme prepositions were all correct for Mme. Auvergne. As predicted, late system morphemes (bridges) caused more problems, especially for M. Clermont. For the more impaired M. Clermont, 59% correct for bridges contrasts sharply with the 80% he had correct for early system morphemes.
A statistical test (a logistic regression analysis) shows that the difference in accuracy of production of the three types of morpheme is statistically significant. When accuracy of late system morphemes is compared with early system morphemes, p ≤ 0.0255. When the accuracy of late system morphemes is compared with content morphemes, the difference is significant at p = 0.0001. If the accuracy of early system morphemes is compared with that of content morphemes, the difference is significant at p < 0.0001.(18)
The reanalysis of the results from these diverse studies and others not mentioned here provides support for the validity of the divisions under the 4-M model. They are of even more interest to researchers studying language production and processing because they indicate that agrammatic aphasia is more of a production problem than a more general problem of lexical access. We are aware that others have discussed the merits of the production vs. syntactic knowledge issue as an explanation for aphasic speech and have reached similar conclusions. However, their evidence typically conflates different categories of morpheme across classes and is thus less discriminating in its categories of analysis. For this reason, the possible results of such studies are more open-ended because the configurations of data do not rule out other possible interpretations of the data (e.g. Shankweiler et al. 1989).
The analysis here implicates differences in levels of production (i.e. morpheme activation) as critical to predicting relative accuracy of production by agrammatic aphasics. This, in turn, supports psycholinguistic models of incremental access and constituent construction (cf. Green's inhibitory control model in Green 1998). The formulator as a critical mechanism is implicated. That is, lexical realization appears to be of (at least) two distinct types (conceptually activated and structurally assigned).
8. Evidence for the 4-M model from SLA data
For many years, researchers in second-language acquisition have studied relative accuracy in production of morphemes. They have argued that accuracy implicates order of acquisition, but few general principles explaining accuracy have been offered (cf. Bailey et al. 1974; Dulay and Burt 1974; and Cook 1993 for references to stages of acquisition; cf. also the discussion of stages in Klein and Perdue 1993). The 4-M model can make principled predictions about accuracy. Under the assumption that learning a language depends on mapping conceptualizations onto an abstract lexicon and the grammar it projects, the prediction is that directly elected content morphemes are acquired accurately before system morphemes. Further, within the class of system morphemes, early system morphemes are expected to be more accurately produced (and acquired) before late ones. Finally, because bridge morphemes do not depend on relations outside their own maximal projection for information about their form, they are produced more accurately before outsiders (Wei 1996, 2000a, 2000b; Jake 1998). These predictions are formalized in the SLA hypothesis:
(IV) SLA hypothesis:
In interlanguage, the order of accurate production of target-language morphemes reflects the four-way classification of the 4-M model:
a. Content morphemes are acquired before both types of system morpheme.
b. Within the class of system morphemes, early morphemes are more accurately produced than late ones.
c. Within the class of late system morphemes, bridge morphemes will be more accurately produced than outsider morphemes.
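The ordering stated in the SLA hypothesis can be checked mechanically against any set of accuracy counts. The following Python sketch is a hedged illustration, not part of the original analysis: the function `violations` and the list `PREDICTED_ORDER` are names we introduce here, the first data set uses the counts this paper reports from Wei's study for Japanese learners at the basic stage (possessive -'s as bridge, 3rd singular -s as outsider), and the second data set is invented purely to show what a violation looks like.

```python
# Morpheme types ordered from most to least accurate, as the SLA hypothesis predicts.
PREDICTED_ORDER = ["content", "early", "bridge", "outsider"]


def accuracy(correct, required):
    """Proportion of required tokens that were produced correctly."""
    return correct / required


def violations(observed):
    """Return (higher, lower) pairs that contradict the predicted ordering.

    `observed` maps a morpheme type to a (correct, required) pair.
    Types with no data are skipped, so partial data sets can be checked.
    """
    present = [t for t in PREDICTED_ORDER if t in observed]
    bad = []
    for i, higher in enumerate(present):
        for lower in present[i + 1:]:
            if accuracy(*observed[higher]) < accuracy(*observed[lower]):
                bad.append((higher, lower))
    return bad


# Counts reported in this paper for Japanese learners at the basic stage.
japanese_basic = {"bridge": (4, 4), "outsider": (5, 37)}

# Hypothetical counts, invented only to illustrate a violation.
hypothetical = {"content": (5, 10), "early": (9, 10)}

print(violations(japanese_basic))
print(violations(hypothetical))
```

An empty list means the observed accuracies are consistent with the hypothesis; ties are treated as consistent, which is a design choice of this sketch rather than a claim of the model.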
8.1. Three categories of pronoun
The following analysis is based on a study reported in Wei (1996). Wei studied the accuracy of production of adult Chinese and Japanese learners of English at three different stages. For each subject, he collected comparable data via directed conversation in informal interviews and picture description tasks.
Table 7 reports Wei's results for pronouns. The prediction is that conceptually active morphemes will be produced more accurately than those that are structurally assigned. As these data show, the conceptually active possessives (e.g. my, his) pattern with directly elected content pronouns (e.g. I, he, him) more than they do with the dummy pronouns.
[TABULAR DATA 7 NOT REPRODUCIBLE IN ASCII]
According to the 4-M model, English personal pronouns and demonstrative pronouns are content morphemes because they occur in a thematic-role-receiving argument position. From the prebasic stage onward, these forms are produced with a high degree of accuracy: 92% by Chinese learners at the prebasic stage, and 99% (personal pronouns) and 94% (demonstrative pronouns) by Japanese learners at this stage. Note that by the basic stage, both groups show almost 100% accuracy.
Pronouns that are early system morphemes (here, possessive pronouns) contrast sharply in accuracy of production with late system morphemes (here, dummy pronouns). For example, while prebasic Chinese learners produce 112/136 or 82% of the possessive pronouns accurately, they produce only 15/48 or 31% of cases of dummy it accurately and even fewer of dummy there forms (4/48 or 8%). The accuracy of both groups of learners increases dramatically by the basic stage, but their production of dummy pronouns remains far below that of other pronouns until they reach the beyond-basic stage. Dummy pronouns are late system morphemes because they depend on information outside of their NP. For example, dummy it only occurs with verbs that do not assign an external thematic role, as in it seems to be raining; dummy there depends on the presence of any form of be, as in there was a man killed last week. (Note that the classification represents a partial reanalysis of Wei's divisions.)
The results in Table 7 are statistically significant. A logistic regression shows that when the production of dummy pronouns (late system morphemes) is compared with the production of possessive pronouns (early system morphemes), p ≤ 0.0004.(19) The difference between the production of pronouns that are content morphemes (independent personal pronouns and independently occurring demonstratives) and the production of possessives is also significant; p ≤ 0.0082. (For personal pronouns, p = 0.0082; for demonstrative pronouns, p = 0.0006.)
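To give a sense of the size of one contrast that such a regression evaluates, the Python sketch below computes the odds ratio for a single two-by-two comparison: possessive pronouns versus dummy it for prebasic Chinese learners, using the counts given above (112/136 and 15/48). This is not the authors' analysis, which was run over the full data set in Table 7; the function name `odds` is illustrative. For a logistic regression fitted to just these two cells with one binary predictor, the slope coefficient would equal the log odds ratio computed here.

```python
import math


def odds(correct, required):
    """Odds of accurate production: correct tokens divided by incorrect tokens."""
    return correct / (required - correct)


# (correct, required) counts for prebasic Chinese learners, from the text above.
possessive = (112, 136)  # early system morphemes
dummy_it = (15, 48)      # late system morphemes

odds_ratio = odds(*possessive) / odds(*dummy_it)
log_odds_ratio = math.log(odds_ratio)

print(f"odds ratio: {odds_ratio:.2f}")
print(f"log odds ratio: {log_odds_ratio:.2f}")
```

On these counts the odds of accurate production are roughly ten times higher for possessives than for dummy it.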
Perhaps the most telling evidence that bound morphemes do not all group together is the difference in accuracy of production of three homophones in English: -s plural, -'s possessive, and -s present tense. Instead of patterning as a single class, these morphemes distribute themselves according to the classification of the 4-M model. The model predicts that early system morphemes, such as -s plural, should be more accurately produced earlier by SLA learners of English than late system morphemes. Further, in line with the SLA hypothesis, bridge late system morphemes (possessive -'s) should be more accurately produced than outsider late system morphemes (present tense -s).
Wei's data support these predictions.(20) See Table 8. For the prebasic learners, the morpheme that is conceptually activated (-s plural) is far more accurately produced than the structurally assigned late system morphemes (either the bridge -'s possessive or the outsider -s 3rd person singular). Within the class of late system morphemes, bridges are more accurate than outsiders. Compare, for example, Japanese learners at the basic stage: while all of their bridge morphemes are correct (admittedly only 4/4), only 5/37 or 14% of their outsider morphemes are correct. Interestingly, for the basic learners, bridge system morpheme accuracy approaches or matches that of early system morphemes. By the time learners reach the beyond-basic stage, differences in accuracy across morpheme type have leveled out.
[TABULAR DATA NOT REPRODUCIBLE IN ASCII]
The differences in production of these different -s morphemes are statistically significant. A logistic regression shows that when the production of the 3rd singular verb ending, a late outsider system morpheme, is compared with the possessive late bridge or the early plural system morpheme, p ≤ 0.0004. The test results also show that while the Japanese first-language speakers were frequently more accurate than the Chinese first-language speakers, the difference between the two groups of speakers is not significant; p = 0.5719.
This paper has argued that a classification of morphemes highlighting abstract lexical features, not form classes, offers insights into how morphemes are distributed (in language production) that go beyond previous explanations. These features link competence and performance in that they take account of both how the elements underlying morphemes are organized in linguistic competence and when they are accessed in production. This resulting classification is called the 4-M model. The model refers to four types of morpheme: content morphemes and three types of system morpheme or functional element. However, we have been at pains to argue that the model is less a taxonomy and more a window with a view of how morphemes are elected. Because how election happens is important, the lemma entry underlying the same surface forms may be elected by more than one mechanism. The result is that these forms represent different types of morpheme. It should be evident that the election status of morphemes encoding various relationships can vary cross-linguistically.
There are three ways in which morphemes are elected. Content morphemes are directly elected at the conceptual level when intentions give rise to semantic-pragmatic feature bundles that point to the lemmas in the mental lexicon underlying content morphemes. Then at this lemma level, there is lemma-to-lemma activation as these lemmas indirectly elect early system morphemes to flesh out the intended meanings called by intentions. The essential information necessary to arrive at their surface forms (except for phonological information) is available for content morphemes and early system morphemes at this level. In contrast, although the information needed to realize late system morphemes is present at the lemma level, this information is not salient until directions are sent to the formulator level on how to assemble larger constituents. Thus, content and early system morphemes are conceptually activated, but late system morphemes are not. This production scenario is reflected in the formal definitions of the four morpheme types.
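The production scenario just described can be summarized as a small decision procedure. The Python sketch below is a simplification we offer for illustration, not the paper's formal definitions: the three boolean parameters and the names `MorphemeType` and `classify` are our own, and the feature values assigned to the example morphemes follow the classifications stated in this paper.

```python
from enum import Enum


class MorphemeType(Enum):
    CONTENT = "content morpheme"
    EARLY = "early system morpheme"
    BRIDGE = "bridge late system morpheme"
    OUTSIDER = "outsider late system morpheme"


def classify(conceptually_activated, thematic_role, looks_outside_projection):
    """Map three simplified features onto the four morpheme types.

    conceptually_activated: salient at the lemma level (directly or indirectly elected).
    thematic_role: assigns or receives a thematic role.
    looks_outside_projection: form depends on information outside its maximal projection.
    """
    if conceptually_activated:
        # Directly elected morphemes carry thematic roles; indirectly elected ones do not.
        return MorphemeType.CONTENT if thematic_role else MorphemeType.EARLY
    # Structurally assigned morphemes split on where their form information comes from.
    return MorphemeType.OUTSIDER if looks_outside_projection else MorphemeType.BRIDGE


# Example morphemes discussed in this paper, with assumed feature values.
examples = {
    "French moi": classify(True, True, False),
    "English plural -s": classify(True, False, False),
    "English possessive -'s": classify(False, False, False),
    "English 3rd singular -s": classify(False, False, True),
}

for form, morpheme_type in examples.items():
    print(f"{form}: {morpheme_type.value}")
```

The order of the tests mirrors the order of election: conceptual activation is decided first, and the bridge versus outsider distinction only arises for morphemes that are structurally assigned.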
While other ways of classifying morphemes, such as open-class and closed-class words or thematic vs. functional elements, have their merits in organizing some linguistic data, they do not capture generalizations in the data that we study here. For this reason the 4-M model deserves attention. While obviously the form classes of the other classifications have some linguistic reality, the 4-M model is based on the premise that abstract lexical features determine many basic configurations of language that do not necessarily align themselves with form-class distributions. For example, descriptions of the morphemes produced accurately by Broca's aphasics cut across form-class memberships, including some closed-class words and inflections along with open-class thematic elements. The distributions of data reported here (doubling of some inflections in code switching, choice of morphemes most impaired or missing in Broca's aphasia, and relative accuracy of production of English elements in second-language acquisition) receive plausible explanations under the 4-M model.
In general, the explanations are the following. First, consider code switching. Even though the MLF model only explicitly predicts that late outsider system morphemes from the matrix language must occur in mixed constituents, that all system morphemes will come from the matrix language is implied. Then, why is it that early system morphemes from the embedded language can double system morphemes from the matrix language, as in the encoding of plural? It is easy to see how, through mistiming, an early system morpheme is accessed along with its embedded-language content morpheme head, even though the frame of the constituent does not call for an embedded-language inflection. (Because the frame is under matrix-language control, it prefers matrix-language system morphemes.)
Second, what is the basis for referring to the speech of Broca's aphasics as "agrammatic" if not all functional elements are equally missing? When the speech of Broca's aphasics is analyzed in terms of the 4-M model, it is clear that more late system morphemes than early system morphemes (or content morphemes) are missing or incorrect. Why should late system morphemes give Broca's aphasics more trouble? The 4-M model offers two related answers. Late system morphemes are vulnerable because they are not conceptually activated; that is, the speaker can convey intentions (although not relations between elements) without them. Also, because their activation requires information outside their heads (when larger constituents are assembled late in the production process), they are vulnerable to disruption. Thus, the 4-M model supports previous ideas that Broca's aphasia is a problem with the production process (cf. Menn and Obler 1990: 6-8 for an overview of theories of production).
Third, why are some inflectional elements and closed-class words more accurate in the interlanguage of adult second-language learners than others? Similar to the case with Broca's aphasics, the 4-M model predicts a hierarchy of accuracy for morphemes that reflects their election: content morphemes will be most accurate followed by early system morphemes, with late system morphemes least accurate. Because late system morphemes lack conceptual activation, the model also predicts the most problems with them. They also cause problems because their form depends on information only available when larger constituents are assembled at the formulator level; thus, learning these morphemes depends on speakers' putting together a number of pieces of information.
One goal of this paper has been to report empirical evidence supporting the 4-M model's predictions for three different types of data: double morphology in code switching, morpheme loss in Broca's aphasia, and relative production accuracy in second-language acquisition. The distributions in these three data sets imply differences in the production process for morphemes. A second goal of this paper is to show how the 4-M model provides a scenario explaining these differences. Its premises are that entries in the mental lexicon are accessed via different mechanisms and that the morphemes they support become salient at different levels. The result is a four-way classification of morphemes. Different types of morpheme are accessed differently; that is, a morpheme's status in the grammar of a language affects its place in the process of production. We certainly do not claim to "solve" all the puzzles these data present; yet, clearly, the distributions in these data can be explained in general in terms of the 4-M model. Further, the fact that these diverse data lend themselves to very similar explanations under the 4-M model suggests that the model captures a universal and important generalization about the nature of language.
Received 19 April 2000 Revised version received 31 August 2000
University of South Carolina Midlands Technical College
(1.) We thank Louis Boumans, David Green, and Dan Slobin for helpful comments on earlier versions of this paper. We are especially grateful to Holmes Finch of the University of South Carolina Statistics Lab for assistance in statistical analysis. Correspondence address: Carol Myers-Scotton, Linguistics Program, c/o English Department, University of South Carolina, Columbia, SC 29206, USA.
(2.) For example, the model presented here accounts for how functional elements are distributed in code switching in a more precise way than the earlier formulations of the matrix-language frame model do (Myers-Scotton 1993, 1997; Jake 1994; Myers-Scotton and Jake 1995). See Myers-Scotton and Jake (2000b) for a more complete treatment of code switching. The model is also relevant to explanations of distributions in other types of language-contact phenomena (cf. Myers-Scotton and Jake 2000a).
(3.) While discourse elements are closed-class elements, we argue that they are content morphemes at the discourse level. We are under no illusions that they participate in the thematic grid of the IP; however, it can be argued that they assign discourse-relevant thematic roles at the discourse level. Discourse elements clearly restrict the interpretation of CP or other phrasal categories they head.
Note that some subordinators are obviously discourse markers since they signal contrast or reason, although they may be realized as part of a larger lexeme combined with system morphemes, as in the Arabic subordinators that agree with their complement subject (e.g. li?anni `because/1SG'). However, we argue that such subordinators including relative pronouns also structure the discourse. Consequently, such subordinators are content morphemes (cf. Myers-Scotton 1997). Emphatic or topicalizing pronouns make up a third type of discourse element. Emphatic pronouns may or may not participate in the thematic grid mapping thematic roles to predicate-argument structure. Under the 4-M model, discourse elements are content morphemes. Not only do subordinators such as because or if assign reason or condition, discourse elements such as even and yet establish contrast and other emphases. Neither of these types participates in the thematic-role argument structure of a CP.
(4.) While the assigning and receiving of thematic roles appears at first glance obvious, there are many unsettled issues. Some of these issues have to do with which form classes can assign or receive thematic roles. One issue is indirect assignment of thematic roles, as in the case of object clitics (see note 5). Under the 4-M model, another type of pronoun that receives its thematic role indirectly is the English possessive adjective. In my friend, the possessive adjective my resembles a definite determiner by more uniquely identifying the possessed NP. We take the position that possessive adjectives receive their own thematic roles indirectly from another NP in the larger discourse, in this case the speaker.
Elsewhere in the literature, there are various positions taken regarding the assignment of possessive thematic roles in NP's NP constructions (e.g. Bora's bone). For example, if -s is the head of a Spec-GenP, then the possessor's thematic role is mediated by the functional head (-s). We do not accept this analysis. Another analysis assumes that the possessor role is assigned directly to a possessor NP (Bora) but does not discuss possessive adjectives such as my. We thank Michael V. Hegarty for bringing this to our attention.
(5.) For example, while the personal pronouns in English receive thematic roles and occur in a grammatical argument position (thus, they are content morphemes), the clitic personal pronouns in French do not (see Jake 1994). In contrast, French emphatic pronouns can occur in thematic-role-receiving positions and are content morphemes. A French clitic pronoun is coindexed with a null morpheme that does receive a thematic role (see also the various discussions in Jaeggli and Safir 1989). While clitics receive a thematic role indirectly, it can be shown that they do not behave syntactically like content morphemes that receive thematic role directly. Compare je with moi or any noun in (i). A content morpheme (moi or Jean) can occur after a thematic-role-assigning preposition, such as avec `with'. The difference between subject clitics and nouns is also illustrated in (ii). In inversion, the clitic follows the inflected verb; the content morpheme Jean cannot. Compare (iib) with (iiib) and (iiic).
(i) Les etudiants sont alles avec moi/Jean/*je/*me.
`The students have gone with me/Jean/*I/*me'
(ii) a. il a du pain.
        3S/M has/3S PART bread
        `He has some bread.'
     b. a-t-il du pain?
        have-3S/M PART bread
        `Does he have some bread?'
(iii) a. Jean a du pain.
         Jean have/3S PART bread
         `Jean has some bread.'
      b. *A Jean du pain?
      c. Jean, a-t-il du pain?
         have 3S/M PART bread
         `Does Jean have any bread?'
(6.) See Myers-Scotton and Jake (2000b) for a discussion of how the interaction of late system morphemes and early system morphemes of German determiners makes a difference in permissible outcomes in code-switching data.
(7.) Indirect support for the notion that some morphemes, i.e. late system morphemes, are not accessed at the same time as their content heads comes from speech errors. Vigliocco and Zorzi (1999: 59) note that "word exchanges are virtually absent." This suggests to them that "fully inflected words are not lexical units of encoding." However, we do not agree with all of their interpretations. For example, they state that in the production process, "features such as number and gender of nouns would be specified during the construction of the corresponding noun phrases, not at the lexical level." The semantic/pragmatic features they exemplify, that is, number and gender, are examples of early system morphemes under our model. As such, under the 4-M model they ARE indirectly elected at the lexical level; that is, it is ONLY late system morphemes that are not salient at the lexical level.
(8.) While we follow Levelt's ideas about language production for the most part, we do not agree with some of the views expressed in Levelt et al. (1999) on how function words are accessed. For example, they point out that "the theory does allow for the selection of function words on purely syntactic grounds" (1999: 4). However, the problem is that under the 4-M model so-called function words are accessed in different ways, not just on "purely syntactic grounds." First, some function words are, in fact, content morphemes. Levelt et al. exemplify their point with a complementizer (that as in John said that ...). Under our analysis, English complementizers are content morphemes and are therefore directly available at the lemma level when a corresponding semantic-pragmatic feature bundle is selected at the conceptual level. Second, some function words are early system morphemes; that is, they are indirectly elected by their content morpheme heads (e.g. the as in I want the book I saw yesterday). Third, still other function words are only accessed at the level of the formulator, when they are structurally assigned as larger constituents are assembled (e.g. of as in bone of Bore). Because there are different types of "function words," only those that are structurally assigned might be referred to as selected on "purely syntactic grounds."
(9.) Embedded-language islands are constituents that are well-formed in the embedded language and do not necessarily follow the requirements of the matrix language. For example, in the embedded-language island in (iv), our brother, the order of the possessive and noun is determined by the embedded language, English; Swahili would call for the noun to precede the possessive.
(iv) Swahili/English (Myers-Scotton 1993, 1997: 141)
tu-na-m-let-e-a OUR BROTHER w-a Thika
1PL-PROG-3SO-take-APPL-FV our brother CL1-ASSOC Thika
`We are taking [it] to our brother of [in] Thika'
Further, a strong confirmation of the predictions of the MLF model about embedded languages is found in the Shona/English subsample relevant to double morphology in section 6.3. All examples of English nouns with English -s, but without Shona ma-, were in embedded-language islands (e.g. about thirty minutes) or internal embedded-language islands (e.g. pa-twelve miles). (See Myers-Scotton 1993, 1997: 152-154 on internal embedded-language islands.) This is entirely in accord with the definition of embedded-language islands: all material must be well formed in the embedded language and therefore any inclusion of affixes from the matrix language (Shona in this case) is prohibited.
(10.) In this same subsample of 20 interviews, only 12% (25/202) of English nouns that are attested as borrowed forms into Shona showed double morphology; that is, they were marked with Shona ma-, but not with English -s (e.g. mu-ma-supermarket `at the supermarkets').
(11.) In cases such as ba-jeune-s `the youths' in (17), where French is the embedded language in spoken code switching, it is difficult to determine if a French plural morpheme doubles the matrix language plural.
(12.) David Green (personal communication) offers a different scenario than mistiming for how it happens that EL early system morphemes are accessed with their content morpheme heads. He suggests that there may be a reevaluation of the message that interrupts the utterance as initially planned rather than a mistiming. However, we argue that the matrix language remains "in charge" of the grammatical frame. The fact that the rest of the constituent (following the embedded-language content morpheme and its indirectly elected early system morpheme) is clearly under matrix-language control belies the argument that the "control," even momentarily, shifts to the embedded language.
(13.) In criticizing the open-closed-class approach to aphasia data, Friederici and Saddy (1993) suggest that future research should concentrate on the efficacy of two other approaches: an extended feature-based lexical classification or an extension of Levelt's (1989) and Garrett's (1992) approach to the functional domain that some (functional) morphemes are directly elected and others are indirectly elected. In Friederici and Saddy's terms, investigations should address "the distinction between formal knowledge and procedural mechanisms ..." (1993: 179). We take quite a different approach in that we argue that the appropriate analysis of morpheme types considers BOTH abstract feature-based distinctions AND procedural knowledge (i.e. how elements are accessed in production and comprehension). Perhaps one reason why feature-based classifications alone are unsatisfactory is that such analyses give preference to form-class distributions over more abstract features involved in structuring larger units in language.
(14.) According to Menn, the second subject she studied "is quite unusual among reported English-speaking agrammatic aphasics in that he only made minimal use of the -ing form of the verb" (1990: 127). Also uncharacteristic was his largely correct use of third singular -s (24 times correct out of 26 "reconstructed contexts"). This second subject's "predilection for the simple 3rd singular is even more impressive" considering that he supplied only 45 lexical verbs. This contrasts with the first subject, who supplied 66 lexical verbs.
(15.) Both progressive -ing and past participle -ed/-en are conceptually activated; thus, they are early system morphemes. Progressive -ing and past participle -ed/-en carry intrinsic meanings. Participle inflections flesh out the event encoded on the verb by indicating whether the event or attribute is completed or in progress. Further, participles contrast with tense (a late system morpheme in English) because they include thematic-role information about the NPs with which they occur. For example, in the speaker is interesting, it is clear that the subject NP is an actor or a stimulus. In contrast, in the speaker is interested the subject is a cognitive experiencer.
(16.) The fact that Zulu aphasics produce any noun-class prefixes on verbs at all, even though they are largely incorrect, shows that a grammatical frame projected by the verb content morphemes is intact.
(17.) Two points are in order here. First, we recognize that the French equivalents of most instances of be and have as main verbs are multimorphemic (see discussion in 4.3.3). Second, unlike other main verbs, they do not assign thematic roles (e.g. la journaliste a vingt ans `the journalist has twenty years/she is twenty years old', or elle a trois gâteaux `she has three cakes', or le jardin est joli `the garden is pretty'). In each of these examples a property (e.g. an age or prettiness) or a figure is mapped onto a ground (e.g. trois gâteaux is mapped onto elle). Essentially, such sentences express a location, but not an event. As such, the mapping of one element onto another does not require assignment of a specific thematic role from the predicate (cf. Talmy 1985). In this sense, then, be and have as main verbs are similar to a possessive construction; that is, they include a bridge late system morpheme. They are also inflected with AGR, an outsider late system morpheme.
When they occur as auxiliary verbs with participles, it is the participle that conveys aspect, not the auxiliary verb (e.g. elle a bien chanté `she has sung well' or elle est sortie `she has left'). The form of these auxiliary verbs also depends on information outside of their maximal projection (the subject).
(18.) A logistic regression analysis allows one to model the likelihood of the correct answer as a function of the type of morpheme (content morpheme, early system morpheme, and late system morpheme). Thus, this analysis allows one to answer the question, is the likelihood of getting the correct answer higher for one type of morpheme than the other two? A primary assumption underlying such an analysis is the assumption that the observations are independent of one another. In this data set, independence of the observations seems unlikely because there are multiple utterances made by the same individual. That is, it is reasonable to assume that the responses made by one individual should correlate with one another. In such a case, one can use a technique such as generalized estimating equations (GEE). GEE basically estimates the correlations among responses made by the same individuals and then uses these model estimates of variation (standard error) so that the resulting test statistics are accurate. The scale parameter of GEE estimation was computed as the square root of the normalized Pearson's chi-square.
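The reasoning in this note can be illustrated with a minimal sketch. It is not the analysis reported in the paper: it fits an intercept-only logistic model for each morpheme type, using an independence working correlation and a cluster-robust (sandwich) standard error, which is one simple special case of GEE. The function name `gee_logit_intercept`, the speaker labels, and every 0/1 response below are invented for illustration only.

```python
import math
from collections import defaultdict

def gee_logit_intercept(responses):
    """Intercept-only logistic GEE with an independence working correlation.

    responses: list of (speaker_id, correct) pairs, with correct equal to 0 or 1.
    Returns (log_odds, naive_se, robust_se).
    """
    n = len(responses)
    p = sum(y for _, y in responses) / n      # pooled proportion correct
    if p in (0.0, 1.0):
        raise ValueError("log-odds undefined when all responses agree")
    beta = math.log(p / (1 - p))              # estimated log-odds of a correct answer
    bread = n * p * (1 - p)                   # model-based information
    # Sum residuals within each speaker before squaring, so that correlated
    # responses from one speaker are not treated as independent observations.
    resid_by_speaker = defaultdict(float)
    for speaker, y in responses:
        resid_by_speaker[speaker] += y - p
    meat = sum(r * r for r in resid_by_speaker.values())
    naive_se = math.sqrt(1 / bread)           # assumes independent observations
    robust_se = math.sqrt(meat) / bread       # sandwich estimate, clustered by speaker
    return beta, naive_se, robust_se

# Invented toy responses (NOT data from the paper): (speaker, 1 = correct).
toy = {
    "content": [("s1", 1), ("s1", 1), ("s2", 1), ("s2", 1), ("s3", 1), ("s3", 0)],
    "early system": [("s1", 1), ("s1", 1), ("s2", 0), ("s2", 0), ("s3", 1), ("s3", 1)],
    "late system": [("s1", 0), ("s1", 0), ("s2", 0), ("s2", 1), ("s3", 1), ("s3", 1)],
}
for morpheme_type, responses in toy.items():
    beta, naive, robust = gee_logit_intercept(responses)
    print(f"{morpheme_type}: log-odds={beta:.2f} naive SE={naive:.2f} robust SE={robust:.2f}")
```

When each speaker's responses agree with one another, the robust standard error exceeds the naive one, which is why test statistics that assume independence would overstate the evidence. A full analysis would add the morpheme-type predictors and an estimated within-speaker correlation, as the note describes.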
(19.) See the discussion in note 18. Because more than one response could be tabulated for each of the subjects, independence of all observations is unlikely. The data reported in Tables 7 and 8 were analyzed using generalized estimating equations, as were those in section 7.5.
(20.) Wei (1996) found statistically significant results for the distribution of these data using a Poisson regression analysis. However, we have combined and split some cells so that these statistics are no longer entirely applicable.
Abney, Steven P. (1987). The English noun phrase in its sentential aspect. Unpublished dissertation, MIT.
Anderson, Stephen (1992). A-Morphous Morphology. Cambridge: Cambridge University Press.
Baayen, R. Harald; Dijkstra, Ton; and Schreuder, Robert (1997). Singulars and plurals in Dutch: evidence for a parallel dual-route model. Journal of Memory and Language 37, 94-117.
Backus, Ad (1992). Patterns of Language Mixing: A Study of Turkish-Dutch Bilingualism. Wiesbaden: Harrassowitz.
Bailey, Nathalie; Madden, Carolyn; and Krashen, Stephen (1974). Is there a "natural sequence" in adult second language learning? Language Learning 24, 235-243.
Bernsten, Janice G. (1990). The integration of English loans in Shona: social correlates and linguistic consequences. Unpublished dissertation, Michigan State University.
Bock, Kathryn; and Levelt, Willem (1994). Language production, grammatical encoding. In Handbook of Psycholinguistics, M. A. Gernsbacher (ed.), 945-984. San Diego: Academic Press.
Bokamba, Eyamba (1988). Code-mixing, language variation, and linguistic theory: evidence from Bantu languages. Lingua 76, 21-62.
Bolinger, Dwight (1968). Aspects of Language. New York: Holt, Rinehart and Winston.
Bolonyai, Agnes (2000). "Elective affinities": language contact in the abstract lexicon and its structural consequences. International Journal of Bilingualism 4, 81-106.
Booij, Geert (1996). Inherent versus contextual inflection in the split morphology hypothesis. In Yearbook of Morphology 1995, G. Booij and J. van Marle (eds.), 1-15. Dordrecht: Kluwer.
Bresnan, Joan (1994). Locative inversion and the architecture of universal grammar. Language 70, 72-131.
Caramazza, Alfonso; and Berndt, Rita S. (1982). A psycholinguistic assessment of adult aphasia. In Handbook of Applied Psycholinguistics, S. Rosenberg (ed.), 477-535. Hillsdale, NJ: Erlbaum.
Chomsky, Noam (1995). The Minimalist Program. Cambridge, MA: MIT Press.
Cook, Vivian J. (1993). Linguistics and Second Language Acquisition. New York: St. Martin's Press.
Crawhall, Nigel (1990). Shona/English data. Unpublished data set.
Damasio, Antonio (1992). Aphasia. New England Journal of Medicine 326, 531-539.
de Bot, Kees (1992). A bilingual production model: Levelt's "speaking" model adapted. Applied Linguistics 13, 1-24.
--; and Schreuder, Robert (1993). Word production and the bilingual lexicon. In The Bilingual Lexicon, R. Schreuder and B. Weltens (eds.), 191-214. Amsterdam: Benjamins.
Dulay, Heidi C.; and Burt, Marina K. (1974). Natural sequences in child second language acquisition. Language Learning 24, 37-53.
Friederici, Angela D.; and Saddy, Douglas (1993). Disorders of word class processing in aphasia. In Linguistic Disorders and Pathologies, G. Blanken, J. Dittmann, H. Grimm, J. C. Marshall, and C.-W. Wallesch (eds.), 169-181. Berlin: Mouton de Gruyter.
--; Wessels, J. M. I.; Emmorey, K.; and Bellugi, U. (1992). Sensitivity to inflectional morphology in aphasia: a real-time processing perspective. Brain and Language 43, 747-763.
Fuller, Janet M. (2000). Morpheme types in a matrix language turnover: the introduction of system morphemes from English into German. International Journal of Bilingualism 4, 45-58.
Garrett, Merrill (1975). The analysis of sentence production. In Psychology of Learning and Motivation, vol. 9, G. Bower (ed.), 133-177. New York: Academic Press.
--(1992). Disorders of lexical selection. Cognition 42, 143-180.
--(1993). Errors and their relevance for theories of language production. In Linguistic Disorders and Pathologies: An International Handbook, G. Blanken, J. Dittmann, H. Grimm, J. Marshall, and C. Wallesch (eds.), 72-92. Berlin: Mouton de Gruyter.
Green, David W. (1998). Mental control of the bilingual lexico-semantic system. Bilingualism: Language and Cognition 1, 67-81.
Gross, Steven (2000). When two become one: creating a composite grammar in creole formation. International Journal of Bilingualism 4, 59-80.
Halmari, Helena (1997). Government and Codeswitching: Explaining American Finnish. Amsterdam: Benjamins.
Harris, Zellig S. (1951). Structural Linguistics. Chicago: University of Chicago Press.
Herbert, Robert K. (1991). Patterns in language change, acquisition and dissolution: noun prefixes and concords in Bantu. Anthropological Linguistics 33, 103-131.
Jackendoff, Ray (1997). The Architecture of the Language Faculty. Cambridge, MA: MIT Press.
Jaeggli, Osvaldo; and Safir, Kenneth J. (1989). The Null Subject Parameter. Dordrecht: Kluwer.
Jake, Janice L. (1994). Intrasentential code switching and pronouns: on the categorial status of functional elements. Linguistics 32, 271-298.
--(1998). Constructing interlanguage: building a composite matrix language. Linguistics 36, 336-382.
--; and Myers-Scotton, Carol (1997). Codeswitching and compromise strategies: implications for lexical structure. International Journal of Bilingualism 1, 25-39.
Jarema, Gonia; and Friederici, Angela D. (1994). Processing articles and pronouns in agrammatic aphasia: evidence from French. Brain and Language 46, 683-694.
Kieswetter, Alyson (1995). Code-Switching Amongst African High School Pupils. Occasional Papers in African Linguistics 1. Johannesburg: University of the Witwatersrand.
Klein, Wolfgang; and Perdue, Clive (1993). Utterance structure. In Adult Language Acquisition: Cross-Linguistic Perspectives, C. Perdue (ed.), 3-40. Cambridge: Cambridge University Press.
Levelt, Willem J. M. (1989). Speaking, from Intention to Articulation. Cambridge, MA: MIT Press.
--(1993). Language use in normal speakers and its disorders. In Linguistic Disorders and Pathologies, G. Blanken, J. Dittmann, H. Grimm, J. C. Marshall, and C. W. Wallesch (eds.), 1-15. Berlin: Mouton de Gruyter.
--; Roelofs, Ardi; and Meyer, Antje S. (1999). A theory of lexical access in speech production. Behavioral and Brain Sciences 22, 1-75.
Menn, Lise (1990). Agrammatism in English: two case studies. In Agrammatic Aphasia, A Cross-Language Narrative Sourcebook, vol. 1, L. Menn and L. Obler (eds.), 117-178. Amsterdam: Benjamins.
--; and Obler, Loraine K. (1990). Theoretical motivation for the cross-language study of agrammatism. In Agrammatic Aphasia, A Cross-Language Narrative Sourcebook, vol. 1, L. Menn and L. Obler (eds.), 3-12. Amsterdam: Benjamins.
Milian, Silvia (1996). Spanish/English data. Unpublished data set.
Mithun, Marianne (1999). The status of tense within inflection. In Yearbook of Morphology 1998, G. Booij and J. van Marle (eds.), 23-44. Dordrecht: Kluwer.
Myers-Scotton, Carol (1993). Duelling Languages: Grammatical Structure in Codeswitching. Oxford: Clarendon.
--(1997). Duelling Languages: Grammatical Structure in Codeswitching, 2nd ed., with a new Afterword. Oxford: Oxford University Press.
--(1998). A way to dusty death: the matrix language turnover hypothesis. In Endangered Languages, L. Grenoble and L. Whaley (eds.), 289-316. Cambridge: Cambridge University Press.
--; and Jake, Janice L. (1995). Matching lemmas in a bilingual language competence and production model: evidence from intrasentential code switching. Linguistics 33, 981-1024.
--; and Jake, Janice L. (eds.) (2000a). Testing a model of morpheme classification with language contact data. International Journal of Bilingualism 4 (guest issue).
--; and Jake, Janice L. (2000b). Explaining aspects of codeswitching and their implications. In One Mind, Two Languages: Bilingual Language Processing, J. Nicol (ed.), 91-125. Oxford: Blackwell.
--; Jake, Janice L.; and Okasha, Maha (1996). Arabic and constraints on codeswitching. In Perspectives on Arabic Linguistics, vol. 9, M. Eid and D. Parkinson (eds.), 9-43. Amsterdam: Benjamins.
Nespoulous, Jean-Luc; Dordain, Monique; Perron, Cecile; Jarema, Gonia; and Chazal, Marianne (1990). Agrammatism in French: two case studies. In Agrammatic Aphasia, A Cross-Language Narrative Sourcebook, vol. 1, L. Menn and L. Obler (eds.), 623-716. Amsterdam: Benjamins.
Ouhalla, Jamal (1991). Functional Categories and Parametric Variation. London: Routledge.
Peterson, Jennifer (1988). Word-internal code-switching constraints in a child's grammar. Linguistics 26, 479-493.
Poulisse, Nanda (1997). Production in the second language. In Tutorials in Bilingualism, A. de Groot and J. Kroll (eds.), 201-224. Hillsdale, NJ: Erlbaum.
Pulvermüller, Friedemann (1995). Agrammatism: behavioural description and neurobiological explanation. Journal of Cognitive Neuroscience 7, 165-181.
Schmitt, Elena (1999). Russian/English unpublished codeswitching corpus.
--(2000). Overt and covert codeswitching in immigrant Russian children. International Journal of Bilingualism 4, 9-28.
Schreuder, Robert; and Baayen, R. Harald (1997). How complex simplex words can be. Journal of Memory and Language 37, 118-139.
Sereno, Joan; and Jongman, Allard (1997). Processing of English inflectional morphology. Memory and Cognition 25, 425-437.
Shankweiler, Donald; Crain, Stephen; Gorrell, Paul; and Tuller, Betty (1989). Reception of language in Broca's aphasia. Language and Cognitive Processes 4, 1-33.
Talmy, Leonard (1985). Lexicalization patterns: semantic structure in lexical form. In Language Typology and Syntactic Description, vol. 3, T. Schopen (ed.), 51-149. New York: Cambridge University Press.
Tiersma, Peter (1982). Local and general markedness. Language 58, 832-849.
Vigliocco, Gabriella; and Zorzi, Marco (1999). Contact points between lexical retrieval and sentence production (Open Peer Commentary). Behavioral and Brain Sciences 22, 58-59.
Wei, Longxing (1996). Variation in the acquisition of morpheme types in the interlanguage of Chinese and Japanese learners of English as a second language. Unpublished dissertation, University of South Carolina, Columbia.
--(2000a). Types of morphemes and their implications for second language acquisition. International Journal of Bilingualism 4, 29-43.
--(2000b). Unequal election of morphemes in adult second language acquisition. Applied Linguistics 21, 106-140.
Author: MYERS-SCOTTON, CAROL; JAKE, JANICE L.
Publication: Linguistics: an interdisciplinary journal of the language sciences
Date: Nov 1, 2000