What's in a word?

1. Preliminaries

This article is the enlarged version of the opening lecture at the 'Conference on Word-Formation Theories', II (Košice, June 2015). Its aim is not to present new research but to give a preliminary overview of recent and ongoing discussions (for a general overview of the many aspects of word-formation one may now consult Müller et al. 2015). The main points of the article are summarized in the abstract.

If we want to have a meaningful discussion of word-formation phenomena, we need a proper definition of 'word'. Therefore, my starting point will be some thoughts on the very nature of 'word'. I shall touch upon some crucial points and discuss various examples capable of illustrating them (at the risk of repeating matters well known to specialists). In the title I have paraphrased Juliet's famous question 'what's in a name?' as 'what's in a word?', i.e. what is a word? What do we mean by word, slovo, parole, lexi, vorba, and further kelime or söz(cük) in Turkish, kelma in Maltese, szó in Hungarian, xiaoxi or danci in Chinese, etc.? Are we sure that all these lexemes have the same referential meaning? Quintilian, in his Institutionum libri, already noted that uerbum "duplex intellectus est", i.e. it has a double meaning:
   alter qui omnia per quae sermo nectitur significat [...] alter in
   quo est una pars orationis [the one means all the parts of which
   discourse is composed, in the other meaning it refers to a part of
   speech].

More or less at the same time as Quintilian, uerbum started to be used in the Christian sense as a translation of Greek logos: Ἐν ἀρχῇ ἦν ὁ λόγος, In principio erat verbum. I will not enter into the discussion of how logos and verbum are to be understood from a theological point of view. Let's stick to Quintilian and his linguistic approach. Clearly, we have to grasp what the properties of a word are, i.e. what makes a sound chain a word. Given the difficulty of finding an all-embracing, satisfying definition, some linguists such as André Martinet (1985: 84) suggested giving up the very notion of 'word' as not useful in any serious syntactic analysis. But one has to remember Saussure's words in his Cours de linguistique générale (p. 154):
   le mot, malgré la difficulté qu'on a à le définir, est une unité
   qui s'impose à l'esprit, quelque chose de central dans le mécanisme
   de la langue [the word, in spite of the difficulty of defining it,
   is a unit that is evident to the mind, something pivotal in the
   mechanism of language].

Moreover, anthropologists note that almost all written traditions have developed systems for separating words in the uninterrupted phonic chain (see Cardona 1985: 77). Anyone who has studied archaic Greek or Latin inscriptions will remember how difficult it is to read them fluently because the words are not separated. This means that speakers have an intuitive idea of what constitutes a word. Sapir (1921: 33) spoke of the psychological reality of the word in the speaker's mind. I'll come back to this crucial point later (see § 4).

1.1. The prototypical properties of a word

Many years ago I tried to define the prototypical properties of what can be considered a 'word' (Ramat 1990), and I picked out three main criteria. I started from the well-established, though not universally accepted, definition of 'word' as a unit of meaning that can stand alone in a sentence and can be uttered in isolation. The three properties span semantics, (phono)morphology and syntax. Though morphology seems to play the major role in defining what a word may be, no sharp division can be drawn between morphology and syntax, as we will see in the following examples (cp. Coulmas 1988: 316). According to (Radical) Construction Grammar, lexemes belonging to different parts of speech (or categories, i.e. the traditional partes orationis) derive their membership from constructions (see Croft 2000: 84f.). This is true insofar as the functions and meanings of a word can vary across constructions. We know, however, that a Latin word ending in -ibus (like omnibus or hominibus) will never be a verb, and that the Czech suffixes -eme, -áme, -íme unambiguously indicate 1st pers. plur. pres. We may conclude that, especially in fusional languages, morphology plays a major role in identifying words, much more than in isolating languages, where the syntactic position of a word establishes its function. There is a continuum between the isolating and the polysynthetic strategy, which we could label 'phonomorphosyntactic', since phonological, morphological and syntactic factors all contribute to the semantic value of the sound string concerned.

The criteria I have suggested are autonomy, mobility, and cohesion. Certainly, these criteria are not brand-new (see now Mugdan's summary in Mugdan 2015). For example, Booij (2009: 97) considers cohesiveness or non-interruptibility as the defining property of the notion 'word'. But the three criteria, taken together, are useful since they embrace both morphology and syntax (and phonetics as well, inasmuch as both morphologically complex formations and syntactic constructions can affect phonological shape).

Let's start with the autonomy property and take a very simple case:

(1a) Germ. Ich liebe dich sehr, lit. I love you much.

The Subject inversion in the sentence (1b)--with strong (contrastive) emphasis on the Object--is mandatory:

(1b) DICH, liebe ich sehr (und nicht Erika) vs. *DICH ich liebe sehr.

In (1a,b) we observe that to a certain extent personal pronouns may be autonomous and movable even in non-pro-drop languages like German. Clitics, on the contrary, cannot be autonomous though they are movable in some languages:

(2) Ital. Glie-lo dici?INTERR. Di-glie-loIMPRT. Sì, glie-lo dico. *Sì, glie-lo.

to.him/her-it you.say? say-to.him/her-it. Yes, to.him/her-it I.say. *Yes, to.him/her-it.

"Do you say it to him/her?" "Say it to him/her!" "Yes, I'll say it to him/her."

Glielo cannot appear in isolation, and its position depends on the morphosyntax of the sentence: preverbal in interrogative and affirmative sentences, postverbal in imperatives. *Glielo di is not acceptable, and *Sì, glie-lo is also ungrammatical: the verb is mandatory in the reply too. Finally, even among pronouns there are forms which are more word-like, 'wordier' so to speak, than others. Consider the French contrast between je, tu, il/elle and moi, toi, lui: the former cannot appear in isolation, the latter can:

(3a) Qui va venir demain? Moi ou lui?

"Who's coming tomorrow? Me or him?"


(3b) Qui va venir demain? *Je ou il?

We see from examples (1)-(3) that the three properties alluded to above are only partially shared by pronouns: they can be movable, but usually they are not autonomous. We may say that moi and lui are more 'wordy' (or 'wordier') than je and il, since they can stand alone. This fact hints at a gradualness of the concept of word, as we will see in more detail later on: there are prototypical words and less prototypical ones, and there are even cases where one may wonder whether the phonetic string can be considered a word at all. Crucially, this gradualness has important consequences for every word-formation theory.

1.2. Cohesion

Cohesion refers to the fact that nothing can be inserted into a word, except infixes that have a grammatical function as morphs ('grams', endowed with their own meaning/function, like the already cited Lat. suffix -ibus). Thus, in ancient IE languages it was possible to have a nasal infix in the root which formed the basis of a word family: the root *leikʷ- 'to leave, abandon' had in Latin a present form li-n-quo 'I leave' but a perfect liqu-i 'I left'. Similarly, the Semitic languages use vowel insertion in the three-consonant root to distinguish different meanings: see the often-cited example of the Arabic root k-t-b, whose general meaning has to do with the concept of 'write, writing': kitab 'book', kutub 'books', katib 'writer', katib-a 'written document', kuttab 'writing school', kataba 'he wrote', yaktubu 'he writes', etc. We speak in such cases of inflectional allomorphy (see Dressler 2015: § 2.2). However, to distinguish meanings in a 'word family' there are strategies other than stem/affix allomorphy: for example, to avoid homophony Modern Mandarin has developed many bisyllabic compounds like shigao 'plaster' (lit. stone cream), shiku 'grotto' (lit. stone cave), shihui 'lime' (lit. stone dust), etc. But in compounds such as shui-jiao 'to sleep', from "sleepVB + sleepN", the second element cannot appear in isolation, i.e. it is cohesive, strictly united with the first one; consequently, it cannot be considered an autonomous word, just as Engl. suffixes like -hood or -dom cannot, though etymologically they derive from full words. Nevertheless, wo shui le jiao 'I slept', with insertion of the perfectivizing morpheme le, is perfectly grammatical (but, of course, it cannot be considered a compound).
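The vowel insertion in the k-t-b example is non-concatenative: the root consonants are interleaved with a vocalic pattern rather than strung together with affixes. The following is a minimal sketch of this root-and-pattern interdigitation, assuming hypothetical templates (the `TEMPLATES` table and the `interdigitate` function are illustrative names, not taken from any cited source) in which the digits 1-3 stand for the three root consonants and every other character is copied literally. Vowel length and the real morphophonology of Arabic are ignored, as in the simplified transcriptions above.

```python
# Hypothetical templates for the root k-t-b: "1", "2", "3" mark the slots of
# the first, second and third root consonant; all other characters (vowels,
# the prefix ya-) are copied unchanged. Vowel length is ignored.
TEMPLATES = {
    "1i2a3": "book",
    "1u2u3": "books",
    "1a2i3": "writer",
    "1u22a3": "writing school",  # doubled 2 = gemination of the 2nd consonant
    "1a2a3a": "he wrote",
    "ya12u3u": "he writes",
}


def interdigitate(root: str, template: str) -> str:
    """Interleave the consonants of a hyphenated triliteral root with a template."""
    consonants = root.split("-")  # "k-t-b" -> ["k", "t", "b"]
    if len(consonants) != 3:
        raise ValueError("this sketch handles triliteral roots only")
    # Replace each slot digit with its root consonant; keep everything else.
    return "".join(
        consonants[int(ch) - 1] if ch in "123" else ch for ch in template
    )


if __name__ == "__main__":
    for template, gloss in TEMPLATES.items():
        print(f"{interdigitate('k-t-b', template)} '{gloss}'")
```

With the root k-t-b the templates yield kitab, kutub, katib, kuttab, kataba and yaktubu, matching the forms listed above. Keeping root and pattern as separate inputs mirrors the point made in the text: the root carries the general meaning 'write', while the vocalic pattern distinguishes the individual words.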

In the NP the bottle we could consider the article the as a prefix whose function is to indicate the noun status of the following bottle (vs. to bottleVB). Since the cannot stand alone, it is not a word according to the autonomy criterion, though, contrary to the cohesiveness principle, something can be inserted into the NP: e.g. the green bottle. This possibility does not apply to languages with a postposed article like Romanian (lup 'wolf' and lupul 'the wolf', never *lup griul, lit. 'wolf gray-the') or Macedonian (grad 'town' and gradot 'the town', never *grad golemot, lit. 'town big-the'). We may conclude that, according to the autonomy criterion, Romanian and Macedonian definite articles are more strongly bound to nouns than their English counterpart the, and therefore they must be located at a lower level of the autonomy scale. These examples confirm what I said before, namely that words are arranged along a continuum of higher or lower wordiness. Note also that the is an invariable morph, whereas in other languages the article can be inflected: see French l'enfant 'the child' vs. les enfants 'the children'. However, in French, too, the article cannot appear in isolation: *les alone does not make any sense.

1.3. Mobility

As for mobility, a word can change its position in a sentence:

(4a) John has probably lost the key of his house

(4b) Probably, John has lost the key of his house

(4c) John has lost the key of his house, probably.

Clearly, the three sentences do not have exactly the same meaning, as the adverb is focused differently. In (4b) it has an epistemic value and the entire sentence is in its scope, whereas in (4c) it represents a kind of afterthought. In (4a) we have the unmarked position of the adverb.

Note that in (4) the NP has a rigid order: ?? of his house the key sounds very poetic and unusual, while *the house of key his is ungrammatical. Languages with a rich array of inflections have a much freer word mobility:

(5a) Suae domus fortasse amisit Johannes clavem

(5b) Johannes clavem domus suae fortasse amisit, etc.

In strongly inflectional languages like Latin the word is, so to speak, self-sufficient and capable of expressing 'per se' its ties with the rest of the sentence. Conversely, the poorer a language is in inflection, the more crucial word order is in sentence construction. Strongly isolating languages like Chinese show a very low degree of mobility: Modern Mandarin Chinese is an analytic language that depends on word order and sentence structure.

We have already seen examples where suffixes and grammatical particles are neither autonomous nor movable. The already cited Engl. -hood, originally a noun meaning 'condition, quality', is now just a suffix and cannot appear in isolation: *the/a hood (in this sense) as well as the inversion *hoodchild are impossible. This does not mean that speakers do not analyze childhood as 'child' plus a suffix building an abstract noun that expresses 'the quality/state of being X', as is proved by nonce-formations like doghood or soulhood (on the nonce-formations created by speakers according to the WFRs of their own language see Gaeta & Ricca 2015: 843, 847). Actually, some exceptions to the non-isolation principle of affixes do exist: they are represented by a few prefixes such as extra- and super- (see extraordinary, extrasensory versus constructions such as meals are extra, i.e. 'not included in the price'; superhuman, supernatural versus constructions such as that film was super, i.e. 'very, very good'). And from words like socialism, capitalism, naturalism etc. the suffix -ism has been extracted and has become a noun with the meaning 'abstract general notion': I've had enough of all your isms! At this stage of its development we may suspect that ism is felt as a second compound member and no longer as a suffix, similar to -gate, which, from Nixon's Watergate to Clinton's Sexgate and further to Petrolgate etc., has become the second member of compounds with the meaning 'scandal, dirty affair'. Note that in English and in many other languages compounding is a highly productive strategy for word-formation. Thus, it is quite understandable that English native speakers may analyze the suffix -hood as a second compound member, as is proved by the above-mentioned nonce-formations like dog-hood. In the same way, from Bronchitis, Gastritis, Nephritis etc. the suffix -itis meaning 'disease' has been used in German to coin new words like Telefonitis 'compulsive habit of using the telephone, telephone addiction' (Ramat 1992).

These examples are important: on the one hand they show the metalinguistic competence of speakers, related to properties of their cognitive endowment; on the other, more strictly relevant to the linguistic system proper, they show via categorial transformations {suffix → noun; noun → suffix} that the boundaries between words and compounded formations are not watertight. Similarly, the category ADJ may include verbs or nouns, insofar as verbs or nouns can be used with an adjectival function: Turk. güzel means both 'beauty'NOUN and 'beautiful'ADJ, and it may even be used as an ADV: güzel konuştu 's/he spoke beautifully' (see Gaeta 2014: 230).

The dependency relation between noun bases and morphemes is a basic Word-Formation Rule (WFR) in every language, and in general the order of the elements forming a word is fixed: inter-nation-al-iz-ation-s (and never *inter-nation-s-iz-al-ation!), where the order of derivational affixes reflects the order of semantic operations and the inflectional suffix -s occupies the most peripheral position (Štekauer 2015: § 3.6). The same holds for the elements forming a compound: in German Feldbau 'agriculture' is not equivalent to Baufeld 'ground for construction'! However, Joan Bybee (1985: 96) cited Eskimo as a language allowing different suffix orders:

(6a) ino-rssu-anguag

person-big-little, i.e. 'little giant'


(6b) inunguarssuag, lit. person-little-big, i.e. 'big midget'

The suffix closer to the root affects the inherent meaning of the root ('giant' or 'midget'), while the second suffix has a more adjectival function. With other means, namely adjectives, we obtain the same semantic effect in English or Italian, where moral connotations play an important role:

(7a) big little man and grande piccolo uomo have a totally different meaning than

(7b) little big man, piccolo grande uomo.

Semantically, little man and big man come close to compounds, and in many linguistic traditions compounds behave like frozen phrases, the classic example being Vergissmeinnicht, forget-me-not 'myosotis' (note that English uses hyphens!).

Phrases like little man, piccolo uomo, have been defined as MWEs ('Multi-Word Expressions'), i.e. lexical units formed by several words but referring to a unitary concept, just like 'midget' (see Hüning & Schlücker 2015). The suggested criteria for identifying MWEs are twofold: degree of semantic compositionality and syntactic rigidity. In an MWE the binding relation (BR) between the anaphoric pronoun and its antecedent is missing (cp. Germ. Wer_i seinerzeit_j / zu seiner_i/j Zeit ein Wappen trug, war zugleich ein Waffenträger 'whoever bore a coat of arms in those days / in his time was at the same time a bearer of arms': seinerzeit / zu seiner Zeit means 'in those times' and does not necessarily refer to the specific, particular time of the person referred to by wer).

Likewise, very common expressions such as sort of / kind of do not strictly concern WFRs, since they are not yet univerbated phrases. Voghera (2013) considers the Italian equivalent of sort/kind of, namely tipo, in, for example, tipo di pittura 'type of painting', and further in constructions such as caffè espresso tipo Africa 'espresso coffee, Africa type' and una scuola tipo Università popolare 'a school of the popular University sort', as a "non-noun": tipo[-N] X. Such constructions with espèce, sort, type, tipo are called 'Light Nouns' by Simone and Masini (2014b); their referential force is more or less reduced, "with a scalar classification of nouns" according to their referential force (ibid., p. 53). In any case, we have to distinguish between compounds and these binominal constructions. Compounds are not NPs: compounds do not allow insertions, whereas binominal NPs do: un grosso colpo inaspettato di fortuna 'a big unexpected stroke of luck', but not *une pomme rouge de terre (cf. pomme de terre 'potato').

At the lexical level MWEs may be subject to 'univerbation', thereby giving rise to new words. Univerbation conflates MWEs into full-fledged words belonging to a definite class (Simone & Masini 2014a: 4). This is the case of adverbs such as perhaps or Ital. forse, Germ. heute, etc., whose etymology remains totally obscure to the native speaker (respectively, from per haps 'by accident'--cp. happen, happening--; from Lat. fors sit 'be the chance'; and from Old High Germ. *hiu dagu 'on this day, today'). Moreover, in Alb. kinse, Serbo-Croat. morda, Czech mozda, lit. '(it) can that' > 'perhaps', the complementizer has been incorporated into the verbal form, giving rise to the new adverb.

2. Compounds and word formation rules

The above examples and observations lead to a definition of 'word' in prototypical terms, already alluded to in § 1.1: a prototypical word is autonomous, movable and cohesive.

From this point of view, compounds such as wardrobe and skyscraper are words to the same degree as ward, robe, sky and scraper (though scraper has a more complex formation, namely a base scrape + the 'nomen agentis' suffix -er). But, unlike the construction un grosso colpo inaspettato di fortuna just cited, they do not allow insertions like *ward-safe-robe or *sky-strong-scraper instead of safe wardrobe and strong skyscraper: from the viewpoint of cohesiveness they behave as proper words.

Zero-marked compounds with simple juxtaposition of their members (like skyscraper) but referring to superordinate-level concepts have been called 'co-compounds' (see Arcodia, Grandi & Wälchli 2010; Wälchli 2015): e.g. Tok Pisin man-meri, lit. man-woman > 'people'; Sanskr. mata-pitarau, lit. mother-fatherDUAL > 'parents'; Mand. Chin. papa-mama, lit. daddy-mom > 'parents'; Erzya Mordvin t'et'a-t=ava-t, lit. fatherPL-motherPL ('fathers-mothers') > 'parents' (see Wälchli 2015: 712). Chin. dao-qiang, lit. sword-spear, refers to the superordinate concept of 'weapons' and is different from Span. lanza-espada 'spear-sword', which indicates a special kind of spear with a blade and therefore establishes a dependency connection between the two terms. Dao-qiang, unlike lanza-espada, is a 'dvandva' compound in which the two terms are not in a dependency relation: it lies on the border between words and phrases. On the other hand, compounds such as Germ. Regierungs-präsident 'president of the government' show a dependency relation between the two compound elements. Moreover, Regierungspräsident has an -s- which does not belong to the inflection of -ung nouns and appears just in compounds as a marker of composition, a so-called 'Fugenelement' or 'Fugenmerkmal'. This -s- derives via analogy from the genitive of words such as König-s 'of the king', as in the place name Königsberg alongside Berg des Königs, and Bund-es 'of the league/union', as in Bundeskanzler. This example shows how compounds may be influenced by inflection. In fact, Bundeskanzler is traditionally referred to as an 'improper compound', like Ancient Greek Dioskuri (Διόσκουροι), the 'sons (κοῦροι) of Zeus (Διός)': kuri Dios (κοῦροι Διός) would also be fine. Actually, we could consider Bundeskanzler and Dioskuri more as syntactic constructions than as compounds.
In the written tradition of the Homeric texts, indeed, the two members of the 'compound' are written separately: [TEXT NOT REPRODUCIBLE IN ASCII.] But even in real compounds such as riverbank it is possible to refer to the first member of the compound (i.e. the determiner), river, as a separate unit, i.e. as in (the) river's bank (Coulmas 1988: 324):

(8a) The riverbank was damaged when it overflowed after three days of heavy rain.

(8b) The river's bank was damaged when it overflowed after three days of heavy rain.

Clearly, the anaphoric it refers both in (8a) and (8b) to the river and not to its bank.

In Ancient Greek and Latin, compounds (mostly exocentric, 'bahuvrihi' compounds) were often formed with an -i-: argipus (ἀργίπους) 'who has rapid (ἀργός) feet (πούς)', kydianeira (κυδιάνειρα) 'glorious', lit. (who gives) glory (κῦδος) to the men (ἀνήρ), agricola 'farmer', lit. one who takes care of/lives in the fields, silvicola 'one who lives in the forest'. This -i- does not belong to any of the declensions of the compound first members concerned: the isolated forms *kudi or *silvi do not exist. Unlike the Germ. -s- cited above, the origin of the inserted -i- is uncertain. It has been suggested that it derives from an ancient locative: this could apply to agricola '[one who] lives in the fields (ager)' and silvicola '[one who] lives in the forest (silva)', but it certainly does not apply to the Greek forms argipus and kydianeira. Be that as it may, we note that compounds may have a dedicated marker indicating their status as compounded words, i.e. they show a particular word-formation rule which is more than the bare juxtaposition of two elements that we find in numerals such as Ital. ventidue, Engl. twenty two, or Turk. yirmi iki. The last two forms are also graphically divided into two words. Consequently, we have to consider skyscraper and wardrobe as prototypical compound forms, of the type N∅ + N∅, where ∅ means absence of any morph, while Bundeskanzler and Dioskuri are not prototypical compounds.

In a recent contribution two young researchers have studied, also from a historical point of view, the emergence of compounded words in Syriac, which, like other Classical Semitic languages, strongly resists compounding, basically limited to numerals (Ciancaglini & Alfieri 2013; following the philological tradition, in the following examples transliterations are in bold characters, while transcriptions are in italics). Ciancaglini and Alfieri have shown that Syriac makes use of both matter replication, i.e. loanwords (as in plwpdywn = polipodion, from Gk. πολυπόδιον), and pattern replication, i.e. calquing (as in sgy regl' saggi regla, a calque of Gk. πολυπόδιον 'polypode'), composed of saggi 'much, many' + regla 'foot' (traditionally). Calquing involves a higher degree of linguistic consciousness, as it means "identifying a structure that plays the pivotal role in the model language and matching it with a structure in the replica language" (Matras & Sakel 2007: 829).

Phonological adaptations happen in many languages during the borrowing process, and borrowed compounds are no longer understood in terms of their components: thus, a new word arises that respects the phonology of the borrowing language. No one recognizes in Ital. bistecca the English compound beefsteak, nor in stoccafisso the (Old) Dutch stoc visch. Adaptations to the morphophonology of the target language may also occur, as shown by Ciancaglini & Alfieri (2013) in the case of Syriac: this language may adopt derivational suffixes from another language, as for instance in the case of the Middle Iranian relational suffix -agan (= qn': e.g. hmrqn' 'donkey-driver' from hmr 'donkey', qytwnqn' 'chamberlain' from qytwn, a loanword from Gk. κοιτών 'bedroom'). The -agan (qn') suffix has the same function as the Engl. -er we have already seen in scraper. 'Nomina agentis' refer to a person or a tool that has something to do with the idea expressed by the base term: a farmer is someone who has to do with a farm, a driver is someone who has to do with the action of driving, and we could translate the Syriac hmrqn' by something like 'donkey-er'.

2.1. Concatenation and word formation rules

Derivation via affixes uses the same concatenative strategy that we find in compounding and inflection. The difference lies in the fact that derivational and inflectional morphemes cannot appear in isolation, while the members of compounds can: sky and scraper are autonomous words, whereas -er is not; child is an autonomous word, -hood is not. The same holds for heavily inflecting languages:

(9) Turk. ev-ler-im-iz-de

house-PLUR-1PERS-PLUR-in 'in our houses'

ev is an autonomous and unchangeable word, followed by a series of morphological suffixes that have a fixed order (*evimlerdeiz would be impossible). Swahili has both prefixes and suffixes, but the technique is again concatenative:

(10) Swah. wa-na-pig-w-a

3PL-PRES-hit-PASS-FV

'They are being hit'
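The fixed suffix order of (9), and the mix of prefixes and suffixes in (10), can both be modeled with a simple position-class (slot) template: each morph is assigned a numbered position, and a form counts as well ordered only if the positions strictly increase. The sketch below assumes hypothetical, hand-written slot tables (`TURKISH_SLOTS`, `SWAHILI_SLOTS`) covering just the morphs of (9) and (10), and a hypothetical helper `is_well_ordered`; it ignores allomorphy such as Turkish vowel harmony (-ler/-lar, -de/-da) and is not a full morphological analyzer.

```python
from __future__ import annotations

# Hypothetical slot tables covering only the morphs of examples (9) and (10).
TURKISH_SLOTS = {
    "ev": 0,   # root 'house'
    "ler": 1,  # PLUR (number of the noun)
    "im": 2,   # 1PERS (possessor)
    "iz": 3,   # PLUR (of the possessor)
    "de": 4,   # locative 'in'
}

SWAHILI_SLOTS = {
    "wa": 0,   # 3PL subject prefix
    "na": 1,   # present tense prefix
    "pig": 2,  # root 'hit'
    "w": 3,    # passive suffix
    "a": 4,    # final vowel
}


def is_well_ordered(morphs: list[str], slots: dict[str, int]) -> bool:
    """Return True if every morph has a slot and the slots strictly increase."""
    if not morphs or any(m not in slots for m in morphs):
        return False
    positions = [slots[m] for m in morphs]
    return all(a < b for a, b in zip(positions, positions[1:]))


if __name__ == "__main__":
    print(is_well_ordered(["ev", "ler", "im", "iz", "de"], TURKISH_SLOTS))  # True
    print(is_well_ordered(["ev", "im", "ler", "de", "iz"], TURKISH_SLOTS))  # False: *evimlerdeiz
    print(is_well_ordered(["wa", "na", "pig", "w", "a"], SWAHILI_SLOTS))    # True
```

Because prefixes and suffixes are checked by the same ordering test, the sketch also reflects the point made above that the Swahili technique, like the Turkish one, is concatenative.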

These observations raise a problem I have not yet touched upon: how should so-called polysynthetic languages be considered? What is their position in a word-formation model? As is well known, they are also called 'incorporating languages'.

In Yupik (Alaska, Eskimo-Aleut) we have 'sentence-words' such as

(11) tuntu-ssur-qatar-ni-ksaite-ngqiggte-uq

reindeer-hunt-FUT-say-NEG-again-3SG.INDIC

'He had not yet said again that he was going to hunt reindeer'

Only tuntu 'reindeer' can appear in isolation; all the other members of this 'sentence-word' cannot, just as ev 'house' versus the other morphemes in the Turkish example (9). We begin to suspect that inflectional and polysynthetic languages have much in common.

On the other hand, incorporating languages like Chukchi may combine several lexemes, as happens in the classic German example Rheindampfschiffahrtgesellschaft 'company for steam navigation on the Rhine', which could theoretically be continued endlessly (-kapitänstochter 'the daughter of the captain of the company...'). The essential difference is that the long German compound, which contains five lexemes, does not 'per se' form a sentence, just as simple two-member compounds like watchman or watchdog do not, while the Chukchi form does, as in the following example:

(12) t-e-meyn-e-levt-e-peyt-e-rkan

1SUBJ-e-great-e-head-e-hurt-e-PRES.1SG (-e- represents a connecting vowel)

"I have a fierce headache" (Comrie 1989: 45)

This example has three incorporated lexemes, 'great', 'head' and 'hurt', with the verbal morpheme -rkan: therefore it is a full sentence. Let us now consider the following example from Vietnamese:

(13) Khi tôi đến nhà bạn tôi, chúng tôi bắt đầu làm bài

when I came house friend I, PLUR I begin do lesson

'When I came to my friend's house, we began to do lessons' (Comrie 1989: 43)

Vietnamese is a heavily isolating language, whereas English is only partially isolating:

(14) I do not say it to her


Compare now (14) to the corresponding

(15) Ital. Non glielo dico

NEG to.her-it I.say1SG.PRES.INDIC

English is obliged to use a personal pronoun, while the person is included in the final -o of the Italian verb dic-o; negation is expressed in English by a periphrastic form (i.e. auxiliary + not), and the dative (or second object) needs the preposition to, which is not present in Italian. We may conclude that English is less analytic (isolating) than Vietnamese, where 'we' is expressed (as in Chinese) by a plural marker (chúng) + 'I' (tôi). At the same time, English is more analytic than Italian. The morpheme-per-word ratio is higher in English than in Vietnamese, but lower than in Italian. Our examples show that, typologically, both with regard to word structure and to WFRs, there are two morphosyntactic poles, without clear-cut boundaries between the intermediate types (cf. (8a) and (8b)). One pole is represented by an ideal, totally isolating language where the morpheme-per-word ratio is 1:1, which means that each word contains a single morpheme, as in Vietn. chúng tôi 'we'. But we have seen that even Chinese and Vietnamese are not completely isolating: above I have cited Chinese bisyllabic compounds, and example (13) contains bắt đầu, which is a periphrastic form for 'begin'. Closer to the opposite pole we find what are traditionally called the fusional, agglutinative and polysynthetic types: all of them make use of concatenative strategies, more or less extensive and extendable, from the theoretically endless composition of the German type (Rheindampfschiffahrt...) to the Turkish inflection, extremely rich in morphemes, and finally to the polysynthetic type. Even in this case we do not find an absolutely synthetic language. Analysis and synthesis represent two ideal types; Skalička (1966) called them 'Konstrukte' ('constructs').
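The morpheme-per-word ratio invoked above (often called the index of synthesis) is simple arithmetic: the number of morphemes in a text divided by the number of words. A minimal sketch follows, assuming that words are separated by spaces and that morpheme boundaries have already been marked with hyphens by the analyst; the function name `morphemes_per_word` is hypothetical, and the segmentations in the usage block are illustrative assumptions, not analyses taken from the cited sources.

```python
def morphemes_per_word(segmented: str) -> float:
    """Morphemes per word for a pre-segmented string.

    Words are separated by whitespace; morpheme boundaries inside a word
    are marked with hyphens, so "dic-o" counts as two morphemes.
    """
    words = segmented.split()
    if not words:
        raise ValueError("need at least one word")
    morphemes = sum(len(word.split("-")) for word in words)
    return morphemes / len(words)


if __name__ == "__main__":
    # Illustrative segmentations of examples from the text (assumed, not sourced).
    examples = {
        "Vietnamese chúng tôi 'we'": "chúng tôi",
        "English (14)": "I do not say it to her",
        "Italian (15)": "Non glie-lo dic-o",
        "Turkish (9)": "ev-ler-im-iz-de",
    }
    for label, text in examples.items():
        print(f"{label}: {morphemes_per_word(text):.2f}")
```

This prints 1.00, 1.00, 1.67 and 5.00. Note that the single English sentence (14) happens to contain no bound morphemes under this segmentation, so it scores exactly like the Vietnamese phrase; the comparison drawn in the text (English above Vietnamese, below Italian) is a claim about the languages as a whole, and checking it would require counts over larger, consistently segmented samples. The result also depends on segmentation choices, e.g. whether her is treated as one morph.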

3. What's in a word

We are now able to propose an answer to the question asked in the title of this paper, namely to advance a definition of word, which, incidentally, is a perennially debated question among linguists and particularly among typologists. Typologies such as Klimov's 'kontensivnaja tipologija' ('contensive typology'), which distinguishes between ergative, active and nominative/accusative languages (Klimov 1983, 1986), are less relevant for the definition of the word concept. The same holds for the distinction between head-marking and dependent-marking (Nichols 1986), as this distinction concerns the relations among the constituents of a construction: The man's house marks the possession relation on the dependent element of the NP, whilst Hung. az ember háza (lit. the man house.his) marks possession on the head of the NP.

The prototypical word is a phonetic string that has no necessary relation to its semantic content (remember Saussure's arbitraire du signe) and does not contain any morphological or syntactic relational sign: happy, do, boy are 'wordier' than happiness, did, boys; and if a German speaker may perhaps still recognize in zwanzig the same base as in zwei, then zwanzig would be less wordy than Fr. vingt, whose etymological connection (< Lat. viginti) remains completely obscure.

Opacity and symbolism are characteristic of the prototypical word, whereas iconism and transparency are characteristic of (poly)synthetic words (cp. Ramat 2005 [1990]: 119). In morphology, iconism is expressed by more complex forms: plurals are usually more complex than their singular counterparts (see boy-s vs. boy, or Lat. puell-a-rumGEN.PL. vs. its Nom. Sing. puella); the indicative is usually less complex than the conditional, optative and other moods (see I say vs. I would say, Lat. dico1Sg.PRES.INDIC vs. dicerem1Sg.IMPF.SUBJ, etc.). Accordingly, completely arbitrary brand names such as Meriva, a type of car produced by Opel (Ronneberger-Sibold 2015: 2199), acronyms which remain totally obscure to the hearer (e.g. FIAT from Fabbrica Italiana Automobili Torino) and, last but not least, proper names such as Pablo, Pavel, Paolo, Paul are excellent examples of prototypical words, much more so than the compounded names Cam-bridge, Bene-dikt, Gott-lieb, Miro-slav or Blanche-fleur, whose etymological formation is still easily recognizable. Brand names, acronyms and (personal) names are, in Seiler's (1975) terminology, "etikettierende Benennungen", unmotivated labels (as opposed to "deskriptive Benennungen", descriptive designations).

3.1 About word-formation (theory)

Having arrived at a satisfactory definition of word (see § 2 and § 3), we can look more closely at word-formation and word-formation theories. What do we mean by word-formation? In this final part of the paper I will make some cursory observations on a few crucial features of word-formation rules (WFRs), which have been more or less implicitly alluded to in the previous discussion and the corresponding examples.

WFRs do not apply to simple words such as Germ. heute and It. oggi. Although these historically derive from NPs (see § 2.3), they can no longer be analyzed into their components, whereas Engl. today and Fr. aujourd'hui may still be analyzed, at least partially, by native speakers: to-day on the model of to-morrow, and -jour- 'day' in the French complex expression. Crucially, WFRs concern derivation (e.g. childhood) and composition (e.g. skyscraper). There is also zero derivation, or 'conversion', as in clean (ADJ) > clean (VB), bottle (NOUN) > bottle (VB), water (NOUN) > water (VB), etc. This is a clear example of transcategorization, that is, a categorial shift of a lexical item with no overt marking, resulting from its use in a new syntactic environment: cp. the water in the bottle vs. to water the garden (see Jezek & Ramat 2009).

The poorer a language is in morphology, the more transcategorization is possible: the same lexeme may be ADJ and ADV, NOUN and VB. This corresponds to the distinction Hengeveld (1992) draws between 'rigid' and 'flexible' languages. Flexible languages allow much more permeability between word classes than rigid languages. Rigid languages, such as the ancient Indo-European ones, have dedicated markers (morphs) for different classes and for different forms within each class: as noted earlier, suffixes like the DAT./ABL. -ibus in Latin or the GEN.SG. -oio in Homeric Greek are never used in the verbal system, nor do -averunt of amaverunt 'they loved' or -somai of belsomai 'I will come' ever appear as adverbial suffixes. English, by contrast, has a rather poor morphology, and a lexeme such as like may be used as VB (I like swimming), as ADJ (to be of like mind) and as ADV (she doesn't prefer vivid colours, like red). Only the syntactic environment clarifies the meaning of such a polysemous word. This is a defining characteristic of isolating languages such as Mandarin or Vietnamese.

3.1.1 Some (important) aspects of word-formation

As is well known, affixation is the main strategy for forming a new word from a lexical base. As Creissels (2014: 90) puts it, "[t]he ability to be the target of word formation processes is commonly considered a typical property of major word classes". He cites the case of the Bantu language Tswana, which uses the prefix bo- to derive nouns from adjectives, as in

(16) boi 'timid' > bo-boi 'timidity'; thata 'strong' > bo-thata 'strength'.

Infixes are less common. I have already mentioned (§ 2.1) the -n- infix of Lat. li-n-quo 'I leave', which is typical of the ancient Indo-European languages. This is a residual morph whose function remains unclear, and it disappeared in the course of the development of Old Indic, Greek, Latin, etc. (Latin has pingo 'I depict' but pictum 'depicted' without -n-, in accordance with the old rule. By contrast, the perfect pinxi 'I depicted', into which the -n- has been extended, shows that the old rule restricting nasal insertion to the present tense was no longer respected and that new analogical forms replaced the old ones.) Circumfixes, too, are rather infrequent among the world's languages, but they do exist as morphological markers. One example is the Hungarian superlative leg-nagy-obb 'biggest' from nagy 'big'. Likewise, German ge-hab-t, the past participle of haben, shows a circumfix around the base -hab-, which never stands alone, except perhaps in the imperative, which in fusional languages often represents the uninflected form.

Furthermore, along the inflection-derivation continuum, i.e. both in the morphology of single lexemes and in the WFRs of compounds, some morphs are more productive than others. Thus, the suffix -ness of happiness is a frequent means of forming abstract nouns meaning 'the state of being X' from adjectival and nominal bases: darkness, kindness, and also oneness and treeness (< tree). By contrast, the feminine marker -ess seems to be recessive: it survives in princess, actress, waitress and a few others, but appears to be no longer productive.

Finally, in this sketchy overview of (some of) the main problems of WFRs, a few words about a phenomenon that has been relatively little studied by linguists: the formations that French linguists call 'mots-valises' (literally 'suitcase-words'). These are formed either by fusing two words that show partial homophony, such as motel from motor + hotel, or simply by blending two words, such as eurovision from Europe + television, Franglais from français + anglais, or Spanglish from Spanish + English. The word television is rarely used in everyday language; people prefer to speak of tele or of TV (French [te've], It. [ti'vi], etc.). Nevertheless, to my knowledge a 'compound' such as eurotele or eurotava has never been created. Such 'compounds' show that speakers have a spontaneous metalinguistic knowledge of what a word is in their own language (see the previous reference to Sapir). We get univerbations such as motel and eurovision, but never *eursion, *tortel or even *orotel. Because their components remain more or less recognizable in the blend, these 'compounds' cannot be considered prototypical words in the sense defined in § 4.

The public uses, and sometimes abuses, this new compounding possibility, which has become very popular in recent times. Consider, for example, the name of a large chain of restaurants and food shopping centers, Eataly ['i:tali]: it was created by shortening and blending eat Italy [i:t 'itali], a shrewd piece of wordplay intended to mean 'eat in the Italian style'. Syntactically, Italy plays in this new formation the unusual role of object of the transitive verb to eat. It is too early to foresee whether Eataly will be generally accepted and recognized as a new word. But motel has certainly become an international word, and dictionaries of Franglais and Spanglish exist. This relatively new domain of linguistic research no doubt has its own WFRs and deserves careful study.


Paolo Ramat, Pavia

Paolo Ramat

Piazzetta Arduino 11. I

27100 Pavia

In SKASE Journal of Theoretical Linguistics [online]. 2016, vol. 13, no.2
Date:Dec 1, 2016
