Singing and speaking.

After years of laborious publication on the philosophy of art in general and music in particular, I was twice asked by colleagues why I had never written about song. One of these colleagues was in musical performance, the other was a semiotician married to an opera singer. Neither of them explained to me just what I was being upbraided for not doing, so I had to guess. It could not, I reasoned, be anything about the history of songwriting, which is not my field and is already covered by encyclopedia articles on song.(1) Nor could it be the technique of singing, which again is not, and would not be expected to be, within my competence. What it could be, and probably was, is the general scope and proprieties of meaning-relations between text and tune, between tune and accompaniment, and between accompaniment and text, in any work that sets words to music, and how all these are affected when the work in question forms part of a larger composition like a music drama, where it is a functioning constituent of action and dialogue.(2) Meanings, after all, are the sort of thing philosophers are supposed to talk about. So I expect that is what they wanted.(3) More basically, though, the kind of philosopher I am could have been asked to consider two other topics: the conceptual relation between song as such and music in general, and the difference between singing and talking. So that is where I might have started. But how?

The first thing to say is that it is usually easy to tell when someone starts singing. Anthropologists say this is true in all cultures. But what makes it so easy? And why is the difference so widely observed? Nothing I have been able to find in the New Grove sheds any light on this--it seems to be taken for granted. The articles in my ancient Encyclopedia Britannica (11th edn) are a little more helpful, but not much--apparently they did not see it as a real problem (the current Britannica is plainly overwhelmed by the plethora of available information and viewpoints, and lacks useful direction).(4) I am sure musicologists have gone into this. But I do not want to read them just now. I do not want to be informed yet; I am deeply puzzled, and want to have my initial puzzlement relieved. I have spent much of my life trying to reinvent the wheel in this way: I am sure wheels exist, and very good ones, better than I could make, but I want to see if I can make something that could spin and go.

What sort of difference is made in this very noticeable transition from speech to song? What was spoken was mostly words, what is sung is still mostly words, so the primary difference should be in the sound. But not the only difference: the word `mostly' is important. In speech there are also umms and aahs, throat-clearings, and other slowings of the word flow; in song there are cantilenas and other vocalized interpolations--which also slow down the flow, though this is not how we think of them. The non-verbal concomitants of speech simply give speaker and hearer time respectively to generate and absorb the concentrated significance of speech proper, though an attentive hearer may often gather unintended nuances of meaning from the nature and frequency of the sounds emitted (and, conversely, speakers may use such sounds to convey messages they are unwilling to put into words). But the non-language components of a melodic line are essential to a song of which they form part, fully integrated in the musical meaning--though in many traditions they may be extrinsic to the song as composed, being contributed by the performer as an accepted part of the art of singing songs of that sort. (Such a contribution establishes the status of the singer as artist, and makes the song the performer's own while it is being sung.)

To me as an uninstructed amateur, the difference in sound between singing and speaking seems to be something like the following. In speaking, we humans use a built-in sound-producing apparatus that causes the breath to vibrate, emitting an intermittent and variable stream of sound that we can call `vowel'. This vowel stream can be differentiated into different vowel sounds by modifying the shape of the operative air channels;(5) it can be varied by length, pitch, loudness, and sound quality (whispering, mumbling, nasalizing, and so on, roughly corresponding to timbre in instrumental music); and it is punctuated and interrupted by consonants and silences. All these variations in speaking are governed by the communication function of what is said; some of them (such as stress and pitch accents) form part of the language system itself, others depend on the communicative exigencies of the speech context. In the latter case especially, the variations often, perhaps usually, appear spontaneously in the process of conveying the intended message. But it is important that a speaker may well be aware of them, producing them deliberately and self-consciously.

In singing, these speech variations are overlaid and largely superseded by a different system of changes, which follows its own rules; the vowel stream takes on a different character, more precisely pitched, so that it is intoned rather than spoken, and these precise pitches, basically conceived as conscious modifications from a monotone, are defined by a fixed system of mutual relations (intervals).(6) These are no doubt all transformations or formalizations of the variations in spoken vowel production. But the essential point is that this system of variation is separate from and inherently independent of the communication functions that determine the vowel variations in speech, no matter how the systems may interact. Singing everywhere is a practice with recognized procedural rules constituting a sound system. All societies have such practices, which we can think of as rhetorical systems governing the formal modes of behaviour recognized in the society. Singing everywhere is a part of this system of systems.

But why should the formation of such a system be possible? Singing is a fundamental use of a part of our built-in psychophysical apparatus, the voice mechanism. If the basic use of that mechanism is speech, why should it be susceptible of modification in a different way, proving to have possibilities that speech does not exploit? How is it that humans can tune their voices? For this to be possible, our voices must have been tunable to begin with. But why? Could something so striking and capable of such complex development be a mere accidental by-product of the evolution of something else that had its own, quite different, survival value? Well, it could be so; but that is the sort of possibility to which one resorts only when other explanations fail.

At this point I find it helpful to go back to the explanatory conjectures on the fundamentals of sociology and economics with which Aristotle introduces the text we know as his Politics.(7) The depth of Aristotle's ignorance and the remoteness of his time from our own should preserve us from any temptation to take what he says as scientific truth; but, as often happens with ancient texts, the conceptual thrust of his intelligence, unhampered by intractable heaps of fact, may help when we are trying to grasp something that still baffles us in our own very different world.

What may prove helpful here is that the difference between speaking and singing is related to, but by no means the same as, a distinction that Aristotle makes between what he calls voice (phone) and what he calls language (logos). He treats these in effect as two radically different but related communication systems.(8) Voice, which is common to all social animals, is a direct audible manifestation of psychophysical states and attitudes, including feelings: warning cries, sexual signals, food markers, child identifiers, threats, and so on. In animals other than humans, these are purely mechanical indicators of the condition of their utterer.(9) Vocal signs are causally determined indications of the condition and situation of the emitting organism, and other organisms respond to them no less automatically, so that they perform the function of correlating the behaviour of the individuals that make up animal groups or societies.

Language, in contrast with voice, formulates and combines concepts, expressing ideas about values and reality, and makes possible the conscious maintenance and policy formation of political societies. Neither the utterance of language nor the response to it is immediate and automatic. Animal cries, to use a more recent formulation, tend to be holophrastic, simple cries or series of such cries not susceptible of analysis. But languages, of course, are articulate. Linguistic utterances are formed strings of signs in mutual relation, with syntaxes admitting free modification and combination.

Voice is bound to the immediate motivation and occasion of its utterance in a way that linguistic utterance is not. Linguistic utterances can be composed, stored, and manipulated. Visible equivalents of spoken language are developed, so that the same communication that can be spoken can also be written. And the manipulability of language means that essentially the same content can be put into different verbal forms, in ways that are studied by transformational grammar. For this to be possible, language must be digital, with distinct units, whereas voice is analog, each item a continuum of smooth transitions. Most importantly, not only is language not bound to the moment of utterance, but the temporal order of linguistic exposition is not essential to the structure and content of what is conveyed. Logical relations are not causal relations.

The heart of Aristotle's exposition in its context is that humans are social animals as well as citizens. We show and share our feelings as well as discussing our thoughts. Our communicative repertoire must accordingly comprise voice as well as language. This fact might tempt us to say that singing corresponds to voice as talking corresponds to language. We might then feel like adding, following Aristotle at some distance, that all human action is, on the one hand, suffused and mediated by language, and, on the other hand, embedded in the basic animality to which voice gives expression. Humans are vocalizers as fundamentally as they are speakers. Both aspects of humanity are equally basic; neither has priority over the other.

The Aristotelian model might then tempt us to say (as people often have said) that song brings to the fore the expression of feeling in a voiced musical utterance, whereas in talking the language component is dominant. The trouble with saying that is that it suggests that singing, in its closeness to animal cries, is natural, whereas talking, because languages are artifactual systems, is artificial. But did I not begin by invoking an anthropological thesis to the effect that singing everywhere belongs to the rhetoric of formal behaviour whereas talking is basically informal, part of the everyday commerce of social interchange? Yes, I did, but that thesis discounted two facts. The first fact is that writing divorces language from the spontaneity of talk. Grammar becomes a matter of study, and written texts have to be painfully composed; and this conscious grammaticality infects spoken utterance, so that spontaneous talking is accompanied by formal speech in which what is spoken becomes rhetorical. The other fact is that our linguistic skills are so routinized that grammatical utterance becomes second nature. In a song or songlike utterance, the resources of voice can be used unselfconsciously to add an expressive dimension to a meaning spontaneously recognized in an uttered text, whether that text itself be composed and remembered, or improvised, or read off from an inscription. The formal practice of singing is a rhetorical use of something that may, once the practice becomes second nature, be put to informal and spontaneous use.

It may seem, from what I have said so far, that the theoretical perplexities that surround the practices of singing stem largely from the combination and interaction of two disparate communicative systems, corresponding roughly to what Aristotle called voice and language. Song, however, though it uses the sound-producing apparatus of (animal) voice, is not itself a manifestation of voice. It is part of the behaviour of us culture-developing and language-using humans, who are always putting our natural resources to new uses. It is basically a self-conscious practice, even though it can be spontaneously used. This makes at least three immediate differences. First, the mechanisms of voice-production can be used to produce sound variations other than those forming part of any spontaneous repertoire of holophrastic items we may have inherited or adapted from our non-human or subhuman ancestors. Human voice expands its repertoire freely, and in its new range the sounds uttered are emancipated from the hereditary task of giving a limited range of signals with determinate and preordained meanings; they are spontaneously recognized as belonging to human vocalization with its human meaningfulness, but what specific meanings they may take on, if any, is not to be predicted from general facts about humanity but can vary freely from culture to culture. Second, the modifications thus recognized as typical of voice are not tied to their psychophysical origin, but can be simulated by non-vocal signifiers, using what we know as musical instruments. The kind of way they sound identifies them as having the kind of meaning that voice has, without needing to be traced to a real or conjectural organism. And musical instruments, since they may well have qualities and capacities beyond the range of the human apparatus, can further extend the audible scope of what is still immediately recognized and responded to as voice. And third, building on the other two, vocal sounds may be subsumed in a specifically musical kind of structure, tune, or melody. The human uses of voice are by no means holophrastic; vocal units enter into well-formed structures of distinctive kinds, just as verbal units (words, syllables, phonemes) enter into linguistic structures. Song, after all, is vocal music--a fact I have hitherto failed to mention. And the concept of music introduces immense complexities of its own.

By `music' here I mean not the whole range of the world's arts of sound, but specifically what some have called the arts of musical sound; that is, sound organized in systems of tones, intervals, scales, and such, which voices must be trained to produce and recognize and instruments must be crafted to produce.(10) It is a tradition of our civilization that music as thus understood is basically a branch of mathematics, superficially having to do with the frequencies and forms of sound waves, and fundamentally perhaps representing something more recondite.(11)

A sort of singing, a variably pitched but unmethodical vocalizing, may be a spontaneous use of what Aristotle called voice, cut free from psychological necessities and no doubt from any special personal or social connotations. But that is not quite what we call singing, any more than flinging the limbs around is quite what we think of as dancing. It is to be distinguished from spontaneous vocalizing in song, which uses the tuned singing voice and the determined relationships of one's own internalized musical practice, just as people unselfconsciously engaged in talk spontaneously shape the vowel stream according to the rules of the language they are talking--rules that have become second nature to them.

Even musically formed vocalizing is not quite what we mean by singing in a fuller sense. Songs have words: the voiced sounds are superimposed on a verbally structured text, to which they may be meaningfully related in many sorts of ways. Here, voice and language are, in a way, fused. But, as we saw, the inherently musical structure to which the voiced sounds themselves are subjected is itself perceived to be language-like. Like Aristotle's `language', it is articulated, using discrete elements, subject to conscious composition and manipulation. The course of our discussion thus far might accordingly lead us to ask: if singing originates in the communication system of voice, does music originally pertain to the communication system of language? Or should we rather say that there are two kinds of music, of which one is primarily voice-based and the other primarily language-based?

Whichever we say, however, calling music `language-based' is problematic. In talking, where the concept of language is most at home, we saw that the sound variations in the vowel stream are not autonomous but are determined by the various needs of the linguistic communication. But in what I have called a `language-based' music based on a structure of musical tonal relations there are no such requirements. Musical features and relations are themselves the focus of interest. In Eduard Hanslick's famous phrase, the content of such music is `tonally moving forms'.(12) Absolute music may be based on form that is in some respects fundamentally language-like, but it is not a language for communicating conceptual content. The voice-based music with which it might be contrasted seems closer to its communicative origins, but it too has its own autonomy. The functions and potentialities of voice as such are not (as has often been said in the past) limited to the expression of determinate affective content, but coextensive with possible experience--or rather, constitute a sui generis realm of affective experience.(13)

If we were to ask Aristotle which of his two communication systems, voice or language, was the more natural and fundamental in human affairs, the passage in Politics where he makes the distinction would give us his answer. It depends which way you look at it. From one point of view, voice is prior, because it is tied to the natural, precultural basis of our animal existence, without which we could not sustain ourselves in an environment. From another point of view, language is prior, because our nature, what makes us the beings we are, is specifically human nature; we can be ourselves only in political societies articulated through language. In music, analogously, we might say that spontaneous utterance in singing and systematic musical composition each has its own priority. Continuous vocal modulation without preformation must be in some sense primary, because it utters the inherent variability, the continuum on which digital systems of modulation are superimposed; but in another sense composed music is prior, because it alone manifests the realized form of music as an art capable of generating a repertory of determinate objects of experience recoverably inscribed. If such an art did not exist, to call unformed vocalization `music' would be meaningless.(14)

It is instructive to compare Hanslick's approach with that of Johann Mattheson, a voluminous writer on music in the second quarter of the 18th century, with which it is often contrasted. Mattheson used to be credited with an influential theory called the `Theory of Affect' (Affektenlehre), according to which the primary function of music was to express emotions, and a language was to be worked out in which a musical equivalent for every emotional nuance could be determined. The underlying motivation was to win for the voice-based music of opera, in which he was active, the prestige traditionally reserved for the language-based polyphonic music preferred by the ecclesiastic establishment.(15) But Mattheson's major book, The Compleat Music Director, in which this theory was alleged to be worked out, is not really like that at all.(16) He does indeed claim that the primary function of music is to express emotion in a way that will improve morals, but he has nothing beyond a few traditional commonplaces to say about the means by which this is to be done. The bulk of his discussion is devoted to musical procedures in musical terms, matters of voice-leading and such. The task of finding appropriately expressive music is left to the practical music-makers, in the light of their talent and experience as musical professionals and their knowledge of the world and the human heart.

Mattheson's main musical point is the priority of melody, as opposed to contrapuntal construction. There are rules for good counterpoint, but good melody is not something for which formal principles can be found. It is a matter of how the tune goes, whether it sounds and feels right. What is required is a unity of feeling. But the way Mattheson conceives this unity is in effect plainly derived from opera, a musical structure with a dramatic base articulated in a libretto. The secret of writing a good tune for an operatic aria is neither to neglect the text nor to follow its emotive implications word by word, but to concentrate on the dominant feeling of the song as a whole in its dramatic context. It is to make this requirement of identifying a dominant feeling intelligible that Mattheson invokes the `theory of affect', the Affektenlehre. On the few occasions when Mattheson uses it, this term refers not to any doctrine of musical equivalents but (as the word itself suggests) to a purely psychological theory, Descartes' famous and influential treatise on The Passions of the Soul, written in 1645-1646 and published in 1649.(17) What Descartes had done in that book was not to describe the variety of human feelings, which as he says would be an indeterminate and interminable task; it was to work out more or less a priori (though not without supporting reference to the phenomena) what must be the basic affective and practical responses of any animal that consists of a spiritual substance inseparably fused with a material substance, interacting with other such animals in a physical environment that differentially sustains and threatens them. The same armature, obviously, must underpin all the situations with which the opera composer has to deal; and it is only the availability in principle of such an armature that makes it sensible to specify the basic affective character of any situation.

Incidentally, it seems obvious that the attitudes and reactions picked out by the Cartesian analysis, on which Mattheson relies, would have to furnish the underpinnings for the basic repertory of cries and other sounds belonging to `voice' as Aristotle conceived it. Of course, modern researches have shown that animal sign systems do not really work like this at all, being much more highly specialized and unpredictable; but the world as Mattheson's contemporaries saw it will have had more in common with that of Aristotle's hunches than with that revealed by ethological and other scientific investigations today.

Be that as it may, Mattheson's procedures reveal a conviction that the experientially established relation between a piece of music and the theoretically defined affective and practical reality to which it relates cannot and does not form any part of the music itself, which can only be described in musical terms. And there is no real difference between Mattheson's views here and those of that notorious `formalist' Eduard Hanslick, whose insistence that music consists of tonally moving forms is usually taken to be saying that music has nothing to do with the expression of emotion. For Hanslick does not in fact refer to `music' but to `the content (Inhalt) of music', and with this qualification his position seems to be substantially the same as Mattheson's.(18) Hanslick as a concert reviewer was as susceptible to, and as eloquent about, the affective aspects of the actual experience of music as anyone else.(19) And, after all, he does not and cannot explain why a composer chooses to devise tonal movement in one set of audible forms rather than any other. The difference between the positions of the two theorists is mostly a matter of slogans and rhetoric--and, more fundamentally, of intellectual style and life experience.

To recapitulate, the implications for song of the line of thought I have been developing seem to be much as follows. It looks as if there may be, in Aristotelian terms, two tendencies in musical practice: voice-based and language-based respectively. The activity of singing would seem to have a bias towards the voice-based rather than the language-based. Peter Kivy, who has explored these issues with an erudition that combines musicological with philosophical sophistication, is interpreted by Levinson as espousing a position much like the following.(20) There is indeed such a thing as a premusical use of the singing voice, dronings and ululations, that would count as song in a society without musical practice, if such a society existed. But in a society with a developed musical system, which means every society we know about, song is a form of music: the artistic use of the singing voice is subject to musical form, and primarily to be judged by the criteria of music per se. This position, whether it is really Kivy's or not, may pay too little attention to what I have extracted from Aristotle, the apparent fact that voice and language represent two radically different communication systems, each fundamentally important to the lives of such social organisms as we are. Musical systems are, in important ways, language-like; but it is hard to see why the potentialities of voice should not admit of free extension and exploitation as an artistic resource, and it is equally hard to see how a musical practice in which vocal and instrumental forms are in complex symbiosis could be explained without attention to the resources of voice and language alike. If I had written at length on the philosophical problems of singing and song, as my colleagues urged me to do, the lost Aristotelian heritage is something about which I would have tried to do something.

The conclusion to which we have now been brought is not, however, one in which we can rest. On the face of it, a fully developed song involves not two separate communication systems but three, each elaborated in art. There is the verbal text set, developed according to the devices of poetry; there is the music, as such, elaborated by the development of a quantified system of tonal relationships; and there is vocal communication, elaborated in a method and repertory of significant vocalization. Songs depend on the development of all these, and on their interaction. The interaction of three such different communicative systems could be expected to yield a new and immensely complex field of significant variation, which would be what songwriters have practical knowledge of and of which the theory perhaps awaits development.

In thus correcting the Aristotelian dualism which, we can now see, falsified our discussion by inviting us to subordinate music either to language or to voice rather than allowing it its own place as a third system, we are in danger of going too far. If music is indeed to be treated as a system independent of the others but in symbiosis with them, we should perhaps follow the tradition to which I referred when I introduced the concept of music as such, and recognize it as essentially mathematical in nature. As such, music per se would be devoid of content other than its own relationships, and free from any affect other than the profound delights of mathematical order as such. By the same token, music in this sense, unlike voice and language, would not be a communication system at all: mathematics as such is not a communication system, it is simply mathematics. But what, exactly, is mathematics?(21) The proper identification and description of our three systems remains to be carried out, and the limited contribution the present paper has been able to make may well be that of an irritant rather than a foundation.

A song, at any rate, involves a multiplicity of disparate systems, each having its own integrity but developed in relation to the others. To get a hold on the integration of these materials and methods in actual musical productions, we might begin by following Cone's lead in invoking the `composer's voice': that is, the required integration would be a distinctive kind of ordering creativity, irreducible because it is an aspect of the overall integration that an intelligent organism must constantly employ as it melds its plans and perceptions in a single stream of active life.(22) But to follow this line of investigation is beyond our scope here.


