Douglas Hodge reading Keats's Elgin Marbles sonnet.

This paper is part of a large-scale investigation of the nature of the rhythmical performance of poetry.(1) It uses the computer to analyze delivery instances of verse lines within the theoretical framework of the perception-oriented theory of meter, worked out back in 1971-1973, and published in 1977. It is devoted to the actor Douglas Hodge's reading of Keats's Elgin Marbles sonnet. It examines some of the vocal manipulations by which Hodge renders his reading rhythmical. Hodge appears singular in the devices he deploys among the readers - including leading British actors and colleagues from the academy - we have examined so far. One of the most effective vocal devices we have encountered in the readings of other readers is what Gerry Knowles ("Pitch Contours") has called "late peaking," that is, when the intonation peak hits the syllable nucleus later than in the middle or even on the following continuant, usually a sonorant. This device has been perceived as having an impetuous forward push, contributing to the solution of a variety of problems arising from the conflict between the linguistic and versification units. In Hodge's readings, we have found so far only two instances of late peaking, both in cases when he artificially generates an unnecessary stress maximum in a weak position. He prefers, instead, to deploy a variety of vocal devices, some of which are unfamiliar in the other readings. By way of discussing these devices, I shall emphasize some basics of the rhythmical performance of poetry that are essential also to analyzing readings in which different vocal devices are deployed.


The dominant metric system in English poetry from Wyatt (but according to Halle and Keyser from Chaucer) to Yeats and after is the "syllabo-tonic" or "syllabo-accentual" system. In this system, both the number of syllables and the order of stressed and unstressed syllables in a verse line are fixed (in contrast to the "accentual" system, for instance, in which only the number of stressed syllables in a verse line is fixed). "Iambic pentameter" means that a verse unit consists of an unstressed and a stressed syllable and that the verse line consists of five such units. In the first 165 verse lines of Paradise Lost, there are just two such lines. Why should we speak at all, then, of "iambic pentameter"? Robert Bridges provided in his Milton's Prosody a list of "allowable deviations." These deviations were "allowable," mainly, on Milton's authority. Such a conception is both outrageously unparsimonious and counterintuitive in that people do not read poetry with a list of allowable deviations in hand.

What we need, then, is a systematic explanation of what it is that we perceive when we perceive a verse line as iambic pentameter and, no less important, what the constraints on this are. During the first half of the present century, several scholars attempted to solve the problem by claiming that the syllabo-tonic meter was not in fact syllabo-tonic. One solution proposed was the assumption that in iambic pentameter we have, in fact, an accentual meter with four beats. Another approach proceeded on the assumption that in the reading of poetry there are equal or proportional time periods between stresses, or between "regions of strength." In the 'twenties and 'thirties of the present century, the so-called "sound recorders" (whose work was summarized by Schramm) approached the issue empirically. This approach had the theoretical weakness that in many instances they mistook the structure of accidental performances for the structure of the poem, but much in their findings can be utilized for establishing the inventory of the reader's (or the vocal performer's) rhythmic competence. One of their achievements was that they failed to demonstrate the existence of equal or proportional time periods. In 1959 Wimsatt and Beardsley published a paper in which they attempted to clean the table, arguing that the syllabo-tonic meter was in fact syllabo-tonic. They said that the propounders of the four-beat accentual solution would be hard put to explain why Pope's

1. A little learning is a dangerous thing

is more acceptable than

2. A little advice is a dangerous thing

As for the equal or proportional timing solution, they said that Wallin began with E. W. Scripture's concept of strong stresses, or "regions of strength," which Scripture called centroids; "the longest observed was seven times the shortest when there was no intervening pause, and fourteen times the shortest when pauses occurred." Ada F. Snell in "An Objective Study of Syllabic Quantity in English Verse" presents experimental evidence against the assumption that readers of English verse observe any kind of "equal time intervals" (589n).

It was Morris Halle and Samuel Jay Keyser who proposed in their generative theory an exceptionally parsimonious criterion for distinguishing all "metrical" lines from all "unmetrical" lines, assuming that this criterion is internalized by the reader. The iambic-pentameter line consists of an abstract pattern of regularly alternating weak and strong positions upon which the sequences of linguistic stresses are "mapped." The "stress maximum" is a theoretical construct, defined as "a stressed syllable between two unstressed ones, within the same line and the same syntactic constituent." The phrase "a garden" contains a stress maximum; "a big garden" contains no stress maximum, because neither of the two stressed syllables occurs between two unstressed ones. All mappings are "allowable," except one: a stress maximum in a weak position, which renders the line unmetrical.
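
The stress-maximum criterion lends itself to a small illustration. The following Python sketch is a simplification and not Halle and Keyser's own formalism: it assumes one syllable per metrical position, treats the whole line as a single syntactic constituent, and represents stress as a plain yes/no value. The function names are hypothetical.

```python
from typing import List


def stress_maxima(stresses: List[bool]) -> List[int]:
    """Return 0-based indices of stress maxima.

    A stress maximum is taken here to be a stressed syllable standing
    between two unstressed ones within the same line. The syntactic-
    constituent condition of the definition is ignored (assumption).
    """
    return [
        i
        for i in range(1, len(stresses) - 1)
        if stresses[i] and not stresses[i - 1] and not stresses[i + 1]
    ]


def is_metrical(stresses: List[bool]) -> bool:
    """Apply the single prohibition: no stress maximum in a weak position.

    In an iambic grid (w s w s ...), 0-based even indices are weak
    positions and odd indices are strong positions.
    """
    return all(i % 2 == 1 for i in stress_maxima(stresses))


# "a garden": unstressed, stressed, unstressed
a_garden = [False, True, False]
# "a big garden": the two stressed syllables are adjacent
a_big_garden = [False, True, True, False]

print(stress_maxima(a_garden))      # one stress maximum, on "gar-"
print(stress_maxima(a_big_garden))  # no stress maximum
```

Under these assumptions, a line counts as unmetrical only when `stress_maxima` returns at least one even index; adjacent stresses, as in "a big garden," neutralize each other and never trigger the prohibition.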

Immediately after the publication of the papers by Wimsatt and Beardsley and Halle and Keyser, there was a great outcry in the metrist community: these people were robbing metrists of their right to say whatever they like and be irrefutable. A counter-offensive on behalf of the equal timers was led by Hendren, who, in response to the measurements quoted by Wimsatt and Beardsley, put his theory beyond refutation (and, by the same token, beyond proof): "Machines can only tell us what sort of material the mind classes as rhythmic. Sensory tests, supported by mechanical tests, show that a line of verse is divided into a number of rhythmically (not mathematically) equal time periods marked by stress" (302). This, in the final resort, however, amounts to saying that "rhythmically (not mathematically) equal time periods are those which the mind classes as rhythmic."

In my work, I have adopted from Wellek and Warren (chapter 13) the assumption that poetic rhythm can be accounted for with reference to three dimensions: "prose rhythm" (linguistic stress pattern), metric pattern, and pattern of performance. I suggest that Wellek and Warren need the performance dimension both to account for why two unlike delivery instances may still be performances of the same metric structure and to point out that some "sound-recorders" mistake in their analysis an accidental performance for a poem's meter. In my approach, the notion of performance has more far-reaching consequences. In practice, I contend, all restrictions on metricalness have been violated by the greatest masters of musicality in poetry. Thus, for instance, as we have seen, according to Halle and Keyser all mappings of linguistic stress upon meter are "allowable," except one: a stress maximum in a weak position, which renders the line unmetrical. Halle and Keyser and their outraged critics in the 1970s found twelve unmetrical lines under the stress-maximum theory. In an appendix to my book I have provided a list of over forty further instances in major English poems (and there appear to be many more). What is more, the distribution of the violating stress maxima seems far from being random. I do not imply, however, that "anything goes." It is, rather, that constraints are placed not in the verse structure, but in a reader's resources in performing a line rhythmically. In cases of deviation, "prose rhythm" and meter may be conceived as analogous to the two incompatible terms of a metaphor. A reader registers in each case the incompatibility and resolves it in a pattern of performance. The utmost limit of rhythmicality (as of the meaningfulness of a metaphor) is the reader's ability or willingness to cooperate, that is, to resolve the incompatibility of the two terms by a rhythmical performance (or, in the case of a metaphor, by a semantic interpretation). 
Thus, my approach attempts to handle semantic and metric phenomena by a homogeneous set of principles.

Earlier sound-recorders made time-measurements of linguistic events, measured pitch intervals (and even made estimations of musical intervals), showed the acoustic correlates of stress, and compared intonation contours. But my investigation is the first one, as far as I know, that tries to investigate these not as issues in their own right, but as solutions to a perceptual problem.(2) Moreover, it has taken me twenty-five years of agonizing search to find a way to interpret the machine's output as evidence of success or failure to effect such a solution.

At this point I wish to note one important aspect of my approach that is in utter disagreement with the received view as presented in a recent summary of the "state of the art" of performance - the "Performance" entry of The New Princeton Encyclopedia of Poetry and Poetics. The view presented there concerns ambiguity:

Chatman isolates a central difference between the reading and scansion of poems on the one hand and their performance on the other: in the former two activities, ambiguities of interpretation can be preserved and do not have to be settled one way or the other ("disambiguated"). But in performance, all ambiguities have to be resolved before or during delivery. Since the nature of performance is linear and temporal, sentences can only be read aloud once and must be given a specific intonational pattern. Hence in performance, the performer is forced to choose between alternative intonational patterns and their associated meanings. [893]

In my book on meter I have criticized at great length this view of Chatman's. I have suggested (134 and passim) that where linguistic stress pattern or intonation pattern conflicts with the patterns required by versification, the performer may have recourse to conflicting cues. Typically, the actual intonation contour in poetry reading is not the one predicted for ordinary prose, but a contour distorted to some degree, and the listener decodes it in terms of the interaction between two contours: the ones required by ordinary prose and by versification. This applies, as I hope to show, both to enjambment and to instances where the linguistic stress pattern deviates from the metric pattern.

In the following two paragraphs, I shall briefly present a few of my theoretical assumptions and contrast them to assumptions held by some metrists of the generative school. In the Cognitive Poetics workshop at the Katz Research Institute, Tel Aviv University, two rival cognitive theories of meter have been worked out: one in the generative tradition, by David Gil and Ronit Shoshani; the other one by me, in the Gestaltist tradition (the perception-oriented theory of meter). Both theories postulate (1) a string of immediate constituents of language (syllables) having a not-too-regular stress pattern, and (2) a metrical grid of regularly alternating weak and strong positions mapping syllables onto the metrical grid. Both theories claim considerable psychological reality for the metrical grid.(3) In most other respects, however, the two theories are diametrically opposed. In the case of iambic pentameter, for instance, the Gil-Shoshani theory assumes twelve positions instead of ten; in most instances, one or two metrical positions are left empty, enabling the prosodist to manipulate the mappings so as to achieve maximum harmony between the two dimensions of rhythm. By contrast, my perception-oriented theory assumes (1) that the metric grid contains only ten positions and (2) that whenever there is a conflict between the metric grid and the linguistic constituents, the metric grid "puts up" active resistance, "striving" to reassert itself in the listener's perception. The greater the conflict, I contend, the stronger is the reassertion of the metric grid in the listener's perception, and, up to a certain point, the greater is the tension between the two dimensions. When that theoretically undefined point is exceeded, the metric grid suddenly disappears from consciousness, and tension ceases. 
The role of a vocal performance is to manipulate the vocal material in such a way that both the linguistic and the metrical dimension become available to consciousness at one and the same time. In my view, this simultaneity constitutes a "rhythmical performance."

One of the crucial differences between my perception-oriented theory and most generative theories of meter lies in this area. The latter are looking for mapping rules that allow for the best possible correspondence between the string of stressed and unstressed syllables, on the one hand, and, on the other, the sequence of strong and weak positions. Only when these efforts fail do they speak of tension. My theory postulates not two but three dimensions of poetic rhythm: linguistic stress pattern, metric pattern, and performance. Far from attempting to make the stress pattern conform to the metric pattern, mine actually welcomes and even sharpens deviance, eventually resolving the conflicting patterns in a "rhythmical performance." In harmony with the above assumptions, I also assume that if the wider versification unit (hemistich or verse line) is firmly established in perception, a relatively great number of deviations of considerable strength can be tolerated. What is more, up to a certain point, the more numerous and the stronger these deviations, the stronger will be the tendency of the higher units to preserve their perceptual integrity. Indeed, they will put up ever increasing resistance as the deviating smaller elements "push" against the boundaries.


Consider the first two lines of Keats's Elgin Marbles Sonnet:

3. My spirit is too weak; mortality

Weighs heavily on me like unwilling sleep

The first line is segmented by a caesura into 6+4 metrical positions; the second line into 5+5 positions. If the integrity of the line is preserved, the second hemistich of the first line is relatively "required" - relative to the 4+6 division. I say "if" because in this line the last strong position is occupied by an unstressed syllable of a polysyllabic and thus it is not properly "closed;" what is more, the line-ending is enjambed. From the syntactic point of view, the following aspects in the first verse line should be pointed out. The first six syllables, "My spirit is too weak," constitute a self-sufficient independent clause that arouses no further expectations. The next rather long word, "mortality," starts another independent clause (running on to the next line).

Douglas Hodge's performance of these two lines poses a serious problem both to the inexperienced listener and to the trained theoretician. At the same time, however, it is quite transparent to the experienced listener.

The first, most conspicuous observation that occurs to the listener is the exceptionally long, 1162-msec pause in midline, between "weak" and "mortality." To be sure, there is here a major syntactic boundary that does warrant this pause; still, the pause is long enough to make the listener despair of ever completing the perceptual unit that might constitute an iambic pentameter line. The second most conspicuous observation is that there is no measurable pause between "mortality" and "weighs" at the transition from the first to the second line of the sonnet. Consequently, there is a real danger that the lineation may be lost, so that the perceptual segmentation reflects only the syntactic structure, not the versification structure.

The third conspicuous observation concerns articulation of syllables by intonation. Consider "too weak": too is a stressed syllable in a weak position, and is assigned a "terminal" intonation contour, falling from 133.636 to 107.561 Hz. Being 384 msec long, it is perceived as strongly stressed. Weak is only 359 msec long. This small difference becomes rather big, however, if we compare the duration of the vowels only. The duration of the vowel of weak is 116 msec long; that of too is 284 msec long, that is, almost two and a half times longer. There is, however, a 147-msec-long pause between the vowel of weak and the release of the [k], which is perceived as the over-articulation of the voiceless stop. This 169-msec-long sequence of pause + release renders the voiceless stop 1.45 times longer than the preceding vowel. There is, then, a strongly stressed syllable in the fifth (weak) position, the stress of which is cued both by duration and a long-falling intonation curve. Such a stress in a weak position threatens to overthrow the metric integrity of the line, unless meter is emphatically reinstated in the next (strong) position. The duration of the vowel of weak alone would hardly warrant such a reinstatement. Still, this word bears exceptional perceptual prominence, owing to the steeply rising and falling intonation contour assigned to it: it moves from 126.724 to 140.446 to 109.158 Hz, amply compensating for the short duration of the vowel and reinstating meter.
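
The duration comparisons in this paragraph are simple ratios and can be rechecked in a few lines. The Python sketch below uses only the figures reported above; the variable names are hypothetical, and the duration of the release of [k] alone is a derived value that the text does not state.

```python
# Durations in milliseconds, as reported in the text for "too weak".
vowel_too_ms = 284
vowel_weak_ms = 116
closure_before_k_ms = 147  # silent interval before the release of [k]
stop_k_total_ms = 169      # pause + release, as reported

# Derived value (not stated in the text): duration of the release alone.
release_k_ms = stop_k_total_ms - closure_before_k_ms

# How much longer the vowel of "too" is than the vowel of "weak".
vowel_ratio = vowel_too_ms / vowel_weak_ms

# How much longer the over-articulated [k] is than the vowel before it.
stop_to_vowel_ratio = stop_k_total_ms / vowel_weak_ms

print(f"vowel of 'too' / vowel of 'weak': {vowel_ratio:.2f}")
print(f"[k] (pause + release) / vowel of 'weak': {stop_to_vowel_ratio:.2f}")
print(f"release of [k] alone: {release_k_ms} msec")
```

The first ratio comes out just under two and a half, and the second falls between 1.45 and 1.46, in line with the figures quoted in the paragraph.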

The essential problem we face with Hodge's performance of this verse line is as follows. Assuming that the line's integrity is preserved, what is the status of the long silence after the sixth position? Two opposite views can be suggested: that the silence is or is not a structural part of the perceived whole. Some scholars with a generative inclination tend to suggest that the silence indicates some unoccupied metrical positions in the deep structure (smuggling back, through the back door, the equal-timing conception). Other scholars with a generative inclination who assume no unoccupied positions would insist that the long silence destroys the verse line as a perceptual whole. According to the present conception, by contrast, the perceptual integrity of the verse line must first be submitted to direct experiencing of listeners, and the theoretical explanation must account for conflicting intuitions (my intuition, for instance, is that the integrity of the verse line is preserved in this performance, whereas my research assistant insists that it isn't; ideally, one should collect responses from a great number of professors of literature and of professional actors, but the methodology outlined below would be the same).

My explanation assumes that poetic rhythm cannot be accounted for by direct appeal to temporal relationships and to proportional time periods. Duration is significant, but only as an acoustic cue for perceptual prominence. Perceptual prominence, in turn, effects the perception of stress and of bisyllabic occupancy of one metrical position. According to the Gil-Shoshani theory, one possible way to account for the long pause in Hodge's vocal performance of the first line of Keats would be to assume that in this delivery instance the two metrical positions are not left just empty: they are occupied by the long, long pause. According to my conception, by contrast, the nature of the pause in midline is quite different: far from confirming some hidden metrical positions, and also far from interfering with temporal relationships of the performance, it intrudes, rather, upon the integrity of the iambic pentameter line in the listener's perception. The rhythmicality of the delivery instance will depend on whether, in the final resort, the versification unit can or cannot reassert itself in the listener's perception, despite the intruding pause. The longer the pause, the greater the tension, provided that the versification unit can reassert itself in the listener's perception. If the versification unit cannot reassert itself, tension abruptly ceases.

In light of the foregoing generalizations let us have a close look at line one again: first the structure and performance of the two hemistiches and then the nature of the pause between them:

4. My spirit is too weak; mortality

w s w s w s w s w s

The metric grid consists in a sequence of regularly alternating weak and strong positions. Confirming this sequence, the stress pattern begins with an unstressed and a stressed syllable. This, however, is followed by a sequence of two unstressed and two stressed syllables; I have called such a sequence a "stress grade." Wimsatt and Beardsley recommend performing such a sequence as follows: each later syllable in the sequence "-rit is too weak" is more strongly stressed than the preceding one; thus, conforming with the stress pattern of language, the iambic lilt is preserved by stressing the even-numbered syllables more strongly than the preceding odd-numbered syllable. I have called this pattern a "stress slope." Wimsatt and Beardsley seem to believe that this is the performance (or even the structure) of such a verse line. The present assumption is that this is one possible performance of such iambic lines that contain a sequence of two unstressed syllables followed by two stressed ones. Elsewhere (Perception-Oriented) I have predicted that such verse lines would be performed in a rather different way. The first two unstressed syllables will be equally unstressed; the next two syllables will bear equally heavy stress; the phonemes as well as the boundaries of the stressed syllables will be exceptionally well-articulated. The heavily stressed syllable in the weak position will be grouped forward with the following heavily stressed syllable in the strong position, seeking "focal stability." The boundary of such a stress grade will tend to coincide with the caesura or with the line ending.
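
The mapping of this line onto the metric grid can be laid out mechanically. The Python sketch below is only an illustration: the syllable division and the binary stress values follow the description given here (unstressed "My," stressed "spi-," unstressed "-rit is," stressed "too weak," and "mortality" stressed on "-ta-"), while the function names and the yes/no treatment of stress are assumptions.

```python
from typing import List, Tuple

# Syllables of "My spirit is too weak; mortality" and their linguistic stress.
SYLLABLES = ["My", "spi", "rit", "is", "too", "weak", "mor", "ta", "li", "ty"]
STRESSED = [False, True, False, False, True, True, False, True, False, False]


def iambic_grid(n: int) -> List[str]:
    """Regularly alternating weak (w) and strong (s) positions."""
    return ["w" if i % 2 == 0 else "s" for i in range(n)]


def deviations(syllables: List[str], stressed: List[bool]) -> List[Tuple[int, str, str]]:
    """List (1-based position, syllable, description) where stress and grid disagree."""
    found = []
    grid = iambic_grid(len(syllables))
    for pos, (syl, is_stressed, slot) in enumerate(zip(syllables, stressed, grid), start=1):
        if is_stressed and slot == "w":
            found.append((pos, syl, "stressed syllable in weak position"))
        elif not is_stressed and slot == "s":
            found.append((pos, syl, "unstressed syllable in strong position"))
    return found


for pos, syl, what in deviations(SYLLABLES, STRESSED):
    print(pos, syl, what)
```

With these inputs the sketch reports three points of conflict: the unstressed "is" in the fourth (strong) position, the stressed "too" in the fifth (weak) position, and the unstressed "-ty" in the tenth (strong) position. The first two together make up the stress grade discussed above.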

Contrary to what Wimsatt and Beardsley's recommendation would lead one to expect, all my predictions have been amply fulfilled in Hodge's performance. The syllables "-rit is" are equally unstressed; the syllables "too weak" are both over-stressed. As Figure 1 and my discussion of it suggest, their boundaries are over-articulated by intonation; the great prominence of "too" is enhanced by duration, that of "weak" by the rising-and-falling intonation contour. The heavily stressed "too" occurs in a weak position and threatens the perceptual integrity of the meter. The extremely prominent syllable "weak" occurring in a strong position reinstates meter; by the same token, it bestows exceptionally great stability and strong articulation on the (marked) caesura. As suggested above, the deviating stress on "too" pushes as it were against the coinciding boundaries of the wider versification units: those of the metric foot and of the hemistich, both of which, in turn, "put up" vigorous resistance. The articulation of the boundary of "too" is effected by the long-falling intonation contour, separating it from "weak." At the same time, its forward grouping to "weak" is effected by the absence of measurable pause after it, and the linguistically unwarranted 87-msec-long pause preceding it. Oddly enough, this is not perceived as a straightforward pause, but rather as an indication of the over-articulation of the /s/ preceding it and the forward-grouping of the adverb following it. The unstressed syllables "-rit is" must likewise be perceptually grouped forward, since the nearest point where focal stability can be achieved is at "weak" in a strong position.

The intonation contour of "mortality," as shown in Figures 1 and 2, is noteworthy. The contour of "mor-" has the shape of an independent "hill," ranging from 102.558 to 124.576 to 118.548 Hz. The contour of "-ta-" resets to 141.346, then to 153.125 Hz, and then falls to 140.446 Hz. Thus, this stressed syllable becomes a pivotal point initiating an "internally defined prosodic pattern," continuing on the contour of "-li-," moving from 139.557 to 147.987 to 132.036 Hz. This falling movement terminates on an emphatic terminal contour on "-ty," falling from 97.137 to 75.514, where a humpback begins culminating in 78.191 Hz, finally falling to 65.044 Hz. As we shall see, this humpbacked portion is quite significant.
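
Since pitch intervals are perceived logarithmically, the Hz values above can be easier to compare when converted to semitones. The Python sketch below applies the standard conversion, 12 times the base-2 logarithm of the frequency ratio. This conversion is a general convention added here for illustration and is not part of the measurements reported in this paper; the names are hypothetical.

```python
import math


def semitones(f_from_hz: float, f_to_hz: float) -> float:
    """Interval from f_from_hz to f_to_hz in semitones (negative means a fall)."""
    return 12.0 * math.log2(f_to_hz / f_from_hz)


# Contour points for "mortality", in Hz, as reported in the text.
CONTOUR_HZ = {
    "mor": [102.558, 124.576, 118.548],
    "ta": [141.346, 153.125, 140.446],
    "li": [139.557, 147.987, 132.036],
    "ty": [97.137, 75.514, 78.191, 65.044],
}

# Upward reset from the end of "mor-" to the onset of "-ta-".
reset_to_ta = semitones(CONTOUR_HZ["mor"][-1], CONTOUR_HZ["ta"][0])

# Overall fall from the highest point of the word to the end of "-ty".
peak_hz = max(f for points in CONTOUR_HZ.values() for f in points)
overall_fall = semitones(peak_hz, CONTOUR_HZ["ty"][-1])

print(f"reset onto '-ta-': {reset_to_ta:+.1f} semitones")
print(f"fall from peak to end of '-ty': {overall_fall:+.1f} semitones")
```

On these figures the reset onto "-ta-" is roughly three semitones, and the fall from the peak to the end of "-ty" exceeds an octave.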

This pitch movement generates a rather unusual, and forceful dynamics. The first syllable of "mortality" has a duration (222 msec) and a rising-and-falling intonation contour that would suffice to render it exceptionally prominent, but owing to the ensuing reset of pitch, which renders the next syllable even more prominent - in fact, making it the stressed syllable of the polysyllabic - it assumes a different character. It is fairly isolated by its intonation contour, followed by a brief pause; its isolation and relatively long duration are perceived as an arrest before vaulting upward to the pitch of the stressed syllable. Then the reset of pitch on the second syllable initiates an impetuous forward movement, leading toward a firm line boundary confirmed by a stressed syllable in the tenth (strong) position. This stressed syllable, however, is never to come: "mortality," as we have seen, ends with an unstressed syllable. What is more, the clause is run on to the next line, and there is no measurable pause between the two lines.

Notwithstanding, there arises a feeling that the verse line is somehow closed, that it does put up a resistance against the forward-pushing pitch movement. This closural quality can be ascribed to the duration and the falling contour of the last vowel. When one listens to the whole line, there is a feeling that this contour is exceptionally long and falls exceptionally low. Furthermore, one may discern a slight inflection toward its end, as reflected in the graph. There is no objective measure to determine whether a syllable is longer than it ought to be, or a contour falls lower than it ought to fall. Still, there may be some not-too-straightforward ways to support one's intuition concerning exceptional duration or the curve of falling intonation. First, it would not be too great an exaggeration to expect the last two syllables of "mortality" to be of roughly equal duration. When measuring them, however, we find that [ty] is more than twice as long as [li]; we obtain [li]: 159 msec, and [ty]: 325 msec. When comparing the vowels only, we obtain an insignificantly higher ratio: 103:222 msec. The duration of the last vowel until the humpback is 131 msec only. In an attempt to find out the effect of the portion beginning with the humpback in the intonation contour, I have removed it from the sound sequence. The truncated word "mortality" sounded perfectly natural, with respect to the intonation contour as well as to vowel quality and duration. When comparing the two versions of the word, one may hear the difference of length, of intonation base-line, and the inflection on the untruncated contour. There is, however, an additional difference too. The original version is somehow "rounded out," but this had nothing to do with the humpback shape of the intonation contour.
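
The arithmetic behind the truncation experiment can be made explicit. The Python sketch below uses only the durations quoted above, reading the 103:222 figures as vowel durations in msec; the length of the excised humpback portion and the vowel ratio after truncation are derived values that the text does not state, and the variable names are hypothetical.

```python
# Durations in milliseconds for the last two syllables of "mortality".
syllable_li_ms = 159
syllable_ty_ms = 325
vowel_li_ms = 103
vowel_ty_ms = 222
vowel_ty_until_humpback_ms = 131

# Derived (assumption): the humpback portion is whatever remains of the
# last vowel after the 131 msec that precede it.
humpback_ms = vowel_ty_ms - vowel_ty_until_humpback_ms

syllable_ratio = syllable_ty_ms / syllable_li_ms
vowel_ratio = vowel_ty_ms / vowel_li_ms
truncated_vowel_ratio = vowel_ty_until_humpback_ms / vowel_li_ms

print(f"[ty] / [li], whole syllables: {syllable_ratio:.2f}")
print(f"[ty] / [li], vowels only: {vowel_ratio:.2f}")
print(f"humpback portion removed: {humpback_ms} msec")
print(f"[ty] / [li], vowels after truncation: {truncated_vowel_ratio:.2f}")
```

On this reading, excising the humpback portion brings the final vowel much closer in length to the vowel of "-li-," which fits the observation that the truncated word sounds natural.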

Curiously enough, the presence of this "coda" seems to have two opposite effects. On the one hand, it enhances the sense of termination (foregrounding the versification unit, the line); on the other hand, it seems to suggest continuity between the words "mortality weighs," enhancing the run-on syntactic unit. What is the source of this sense of continuity? When the humpback portion was isolated, one could tell that the vowel quality in this portion is different from that of the preceding portion: the aperture of the [i] had been gradually constricted and rounded, obtaining something like a dark [j], almost [u]. This constriction can be construed as the result of an articulatory gesture, coarticulating the [i] with the ensuing /w/. One important effect of this gesture is highlighted when the words "mortality weighs" are sounded together, in the truncated and the untruncated versions. As I said, there is no measurable pause between the two words. A discontinuity is observed in the truncated version, as compared to the untruncated one. This discontinuity can be explained as follows: coarticulation typically enhances continuity; the disappearance of coarticulation generates a sense of discontinuity. We have already seen two aspects of this "coda": the unusual length of the vowel and the unusually deep fall of the intonation contour have a conspicuous terminal effect. I have already mentioned that when the "coda" is excised, both the length of vowel duration and of intonation fall appear to be perfectly natural. This may indicate that in the untruncated version both vowel duration and intonation fall are, indeed, unusually long. But the vowel quality of this "coda" also has an aspect that may contribute to the terminal quality of the syllable. As Cooper and Ross have pointed out, in a sequence of two syllables in which the nuclei are an unrounded front vowel and a rounded back vowel respectively, it is felt to be more natural if the rounded back vowel comes last. 
Thus, "ping-pong, sing-song, ding-dong" are perceived as more natural than "pong-ping, song-sing, dong-ding." It has been suggested above that in the course of the coarticulation of [i] with the ensuing /w/, the unrounded front vowel gradually turns into a rounded back vowel. Thus, this coarticulatory gesture has two opposite aspects: coarticulation generates a feeling of continuity, but the transition from an unrounded front vowel to a rounded back vowel generates a feeling of termination and, therefore, discontinuity. This appears to be in harmony with the perceptual demands of enjambment: the ending of the line and the continuity of the clause must be perceived simultaneously.


For understanding the nature of the huge 1162-msec pause in midline we must offer two important considerations: first, the handling of pauses in speech perception in general; and, second, the nature of back-structuring in perceptual processes in general and in speech perception in particular. In Hodge's readings we find a relatively large number of pauses between words as well as within words. Other reciters are rather sparing with this device. In the sequence "My spirit is too weak," for instance, there is in Hodge a 66-msec-long pause after the [s] in "spirit," a 131-msec-long pause after "spirit," an 84-msec-long pause after "is," and a 147-msec-long pause between the vowel of weak and the release of the [k]. Now if you play "wea-" until the release of the [k], you hear what you see on the screen: [wi:] plus a pause; but if you include in the sequence the release of the [k] as well, you hear no pause, but an over-articulated [k]: the pause is re-interpreted as the time-period when the articulatory organs are closed before the release. This renders the voiceless stop (pause + release) 169 msec long, 1.45 times longer than the preceding vowel (cf. [ILLUSTRATION FOR FIGURE 1 OMITTED]). Thus, the perception of the pause is changed after the event; that is what I call "back-structuring." Exactly the same process takes place in the middle of the word "spirit." If we play on the computer the phoneme sequence [majs] and the pause, we hear what we see on the screen: a phoneme sequence and a pause; but if we add the release of [p] to the sequence, we hear no pause at all but, instead, an over-articulated [s] and an over-articulated [p]. Curiously enough, even the pause between the words "is too" isn't heard as a pause, but as the over-articulation of the word boundaries between which it is enclosed; it also effects, perhaps, the forward-grouping of "too." The only pause heard as a proper pause is the one between the words "spirit is." 
It would appear that there is a tendency in the course of ordinary speech to perceive pauses as parts of articulatory gestures whenever possible, even if acoustically they do not differ from pauses proper. This is the case even if such a perception requires back-structuring, that is, changing one's perception after the event.

Let us consider now the long pause after "weak," and its effect on the integrity of the verse line. It would be rather unsophisticated to regard it as the only factor that affects integrity. Gestalt psychologists have explored the conditions that maximize our tendency to perceive a stimulus pattern as an integrated whole, as parts that belong together. These include, among other things, similarities. From Arnheim's (66-72) illustrations I have redrawn here four [ILLUSTRATION FOR FIGURES 3-6 OMITTED]: grouping by similarity of size, by proximity (similarity of location), by similarity of shape, and by similarity of color. To this Arnheim adds similarity of orientation, of direction of movement, of speed, and the like. To this one might add closure (in the drawings below, each one of the geometrical designs is perceived as one whole because, among other things, it is a closed area), good continuation (when there is a choice between several possible continuations there will be a spontaneous preference for the one that carries on the intrinsic structure most consistently; Arnheim 71), and so forth.

In Hodge's performance of the verse line under discussion, the long pause threatens integrity by violating "proximity," but there may be elements that counteract this violation. There is, for instance, a principle formulated by the Gestalt psychologists, that every given whole tends to break up into similar parts, if the prevailing conditions allow. Similar parts tend to stand out at the expense of the whole (this is yet another way in which similarity affects integrity). In the case of this sonnet (and, in fact, in much English poetry) the smallest unit that returns consistently in the whole poem is the iambic pentameter line. Thus, the longer the sequence of iambic pentameter lines, the more it tends to impose perceptual unity upon consecutive fragments that might constitute such a line. The verse line is divided into two "hemistiches" by a "caesura," which occurs in midline, governed by the dynamics of perception: in the iambic tetrameter and hexameter exactly in the middle, after the fourth and sixth position, respectively, without regard to the linguistic boundaries. The pentameter line is divided into two segments, after the fourth, the fifth or the sixth position. The mere fact that in various lines the division occurs at different points ensures that the smallest recurring unit should be the decasyllabic one. If the caesura occurs after the fifth position, it divides the line into segments of equal length, but of unequal structure: the first segment begins and ends with a weak position; the second segment begins and ends with a strong position. If the line is segmented into 4+6 or 6+4, each segment begins with a weak position and ends with a strong position. This similarity of structure tends to group the two segments of this line together, but also to foreground their segregation. Intonation also contributes to the unity of the line in a variety of ways.
We have already noted that intonation generates an effective closure at the end of the line, in spite of the continuity in many respects. But there is also the factor of "good continuation."

The intonation contours on "weak" and on the last syllable of "mortality" or, perhaps, on its last three syllables [ILLUSTRATION FOR FIGURE 1 OMITTED] indicate that the former is a minor versification boundary, the latter a major one. The contour of "-ty" may be perceived as a good continuation of either the contour of "weak" or of the contour of "mortali." The contour of "weak" falls from 140.446 to 109.158 Hz, that of "-ty" from 97.137 to 65.044 Hz. Half-way is at 102.745 Hz, between the top of the former and the bottom of the latter contour. There is also similarity of location: both contours occur at the end of a segment. Thus, there is an "iconic" suggestion that the two segments constitute one whole. Thus, the end of the verse line is confirmed and clearly articulated, in spite of the continuity of syntax and in spite of the absence of a stressed syllable in the last strong position. Now in Hodge's performance, an exceptionally long pause is inserted between the two segments. In view of the foregoing analysis, if the various indications that the two segments constitute a cohesive whole are strong enough, the intruding pause only enhances the whole's tendency to reassert its integrity in the listener's perception. If they are not sufficiently strong, the versification unit will fall into pieces. The analyst can only point out the structure of this delicate balance; individual responses may vary as a result of shifting emphasis from the cohesive to the fragmenting factors and back.
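
The half-way figure cited for the two falling contours can be checked from the reported pitch values. The sketch below is only a verification of that arithmetic; the variable names are mine. It also tests whether the half-way point falls inside the gap that separates the bottom of the contour on "weak" from the top of the contour on "-ty."

```python
# Pitch values reported for Hodge's reading (Hz).
weak_top, weak_bottom = 140.446, 109.158   # falling contour on "weak"
ty_top, ty_bottom = 97.137, 65.044         # falling contour on "-ty"

# Half-way between the top of the first contour and the bottom of the second.
halfway = (weak_top + ty_bottom) / 2

# The gap between the two contours runs from the top of "-ty"
# up to the bottom of "weak".
in_gap = ty_top < halfway < weak_bottom

print(f"half-way point: {halfway:.3f} Hz")
print(f"lies in the gap between the two contours: {in_gap}")
```

That the half-way point lies in the gap means that the second contour resumes roughly where the first left off, which is what one would expect if "-ty" is heard as a good continuation of "weak."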

The perception-oriented theory of meter predicts that the rhythmical performance of poetry requires clearer articulation than ordinary speech. All the reciters whose readings we have examined so far do over-articulate the speech sounds and syllable or word boundaries. But Douglas Hodge does it to a greater extent than the others and has recourse to pauses as a means for over-articulation more frequently than any other of our readers. Consider the word sequence "spirit is." In ordinary speech we would expect the speaker to run the first word into the next one; in this reading there is a very conspicuous release of the [t], then a longish pause, and then a glottal stop before the vowel; we would encounter none of these in the stream of ordinary speech. In fact, it appears to me that in this specific instance over-articulation exceeds the requirements of the rhythmical performance of poetry.

Back-structuring in speech perception is a commonplace observation among speech researchers. Anybody involved in speech research, phonetics, or phonology knows that when we play minute segments of the speech signal, we hear noises that barely resemble speech at all. When we hear a sufficiently long segment that allows us to form and test a hypothesis concerning the string of phonemes, the noises abruptly become intelligible speech; what is more, the listener cannot attend at will to the noises any more. It is much less well-known, however, that back-structuring occurs not only in speech perception but, sometimes quite dramatically, in other cognitive processes also. In his brilliant study of the subjective experiencing of time, Robert Ornstein found that we experience time periods as longer or shorter according to the amount of mental storage space required by the information processed during that period. The same amount of information takes up less mental storage space (that is, is experienced as of shorter duration) if it is more efficiently coded. When we look at a meaningless and irregular visual design, it takes up a good deal of mental storage space. When it is assigned a referential meaning, verbal or iconic, the duration of the period experienced is abruptly reduced. Ornstein found this reduction to hold true when meaning was assigned to the visual design either before or after the period of observation. This experiment may throw some light on the way we seem to handle the long, 1162-msec pause after "weak" in figure 1. It is differently experienced in prospect and in retrospect. In prospect, the listener is looking forward to see whether a familiar versification unit such as an iambic pentameter line will eventually emerge; the over-articulation of phonemes and word boundaries seems to encourage such expectations (this is perhaps one reason for some unwarranted over-articulations in this reading).
On the other hand, the pause is long enough to have the listener expect no sequel. If, however, the versification unit is effectively closed by a variety of perceptual means, the emerging structure (that is, the emerging "meaning" of the perceptual sequence) causes the listener to reinterpret the nature of the pause. It should be noted, first, that during the silent period some information-processing activity (such as forming and discarding expectations) may take place, but no information is stored, so that when the verse line is completed the silent period may shrink in memory to a minimum size; moreover, as Ornstein's experiments indicate quite unambiguously, subjective duration is determined not by the amount of information processed during that period, but by the amount of information stored. And, second, it should also be noted that in the present reading a period slightly longer than one second has to be back-structured, whereas in Ornstein's experiment a one-minute (that is, a more than fifty times longer) period was back-structured. Thus, one need not assume that the pause is too long for back-structuring. Furthermore, according to my conception the pause is not part of the structure of the line, but rather is an event intruding upon it, against which the structure strives to reassert itself in the listener's perception.
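
The comparison between the two back-structured periods can be made explicit. The sketch below simply divides Ornstein's one-minute interval by the measured length of the pause; the variable names are mine, and nothing beyond the two reported durations is assumed.

```python
pause_msec = 1162            # the pause after "weak" in Hodge's reading
ornstein_msec = 60 * 1000    # the one-minute interval in Ornstein's experiment

# How many times longer Ornstein's interval is than the pause.
ratio = ornstein_msec / pause_msec

print(f"Ornstein's interval is about {ratio:.1f} times longer than the pause")
```

The ratio comes to a little over fifty, so that the pause in the present reading is well within the range of durations for which back-structuring has been observed.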

As we have seen, there is a long-standing dispute between the structuralist conception represented here, and a conception that there are equal or proportional time periods measurable in a rhythmical performance. But in the final account, all versions of equal or proportional timing broke down in the face of measurements. There are many reasons why the structural conception should be preferred (which, I believe, I have presented in the foregoing discussion). But there are also some very good reasons why the proportional or equal timing conception cannot work, of which I shall mention only one. We have discussed "too weak" in Hodge's performance. We have seen [ILLUSTRATION FOR FIGURE 1 OMITTED] that Hodge cues the stress on "too," among other things, by excessive vowel length; the vowel of "weak" is much shorter, but the prominence of this syllable is rendered equal to that of the preceding one by a rising-and-falling intonation contour. So, what we have here is not equal time periods, but equal perceived prominence. What is more, these prominent events do not alternate with non-prominent ones, but rather two prominent events follow two non-prominent ones. A much better account would be that the irregular succession of prominent and non-prominent events disturbs and confirms successively an underlying sequence of regularly alternating weak and strong positions that exists as a mental set and results in a set of frustrations and gratifications in the reader or the listener. Occasional pauses are parts of the irregularly alternating less and more prominent events, but not of the underlying structure of regularly alternating weak and strong positions, refuted and confirmed by them.

We have discussed the versification structure of the first line of Keats's Elgin Marbles sonnet, and Douglas Hodge's performance of it. In this performance there is a very long pause in the middle of the line, and a tense enjambment threatening the perceptual integrity of the verse line. According to the present assumption, the period of silence is not part of the line structure, but an event intruding upon it. According to a principle formulated by Gestalt psychologists, entities tend to reassert themselves in perception in the face of intruding events, up to a certain point; when the strength of the intruding event passes a certain point, the perceptual entity falls to pieces. We have pointed out the processes that contribute to the integration of the verse line and to the shrinking of the pause in memory. In all events, this reading remains a boundary case, balancing the cohesive and the fragmenting factors one against the other; and the integrity or lack of integrity of the line may rest on the relative weight assigned to these factors by the listener's cognitive system. Only at this point of the argument, by no means earlier, should one mention past experience: the listener's familiarity with iambic pentameter verse may be the last straw that tilts the balance in favor of integrity. This is the point, too, where the effect of possible theoretical predilections may be considered.


In line 2 of Hodge's reading, stress is displaced from the second to the first syllable of "unwilling." Semantically, this may be construed as an emphatic stress. Metrically, however, this means that the only regular part of the line is made irregular. A stress on "wil-" confirms meter in a strong position; a stress on "un-" infringes upon it in a weak position.

5. Weighs heavily on me like unwilling sleep

w s w s w s w s w s

What is more, it becomes an artificial "stress maximum in a weak position." A "stress maximum" is, according to Halle and Keyser, a stressed syllable between two unstressed ones, within the same syntactic constituent, and within the same verse line. A "stress maximum in a weak position," according to Halle and Keyser, renders a verse line unmetrical. Since I have found (Perception-Oriented) a considerable number of stress maxima in weak positions in major English poetry, two thirds of which occur, as here, in the seventh position of the iambic pentameter line, I have concluded that a stress maximum in a weak position is acceptable if it can be performed rhythmically; and that it is easiest to perform a stress maximum rhythmically in precisely this one out of four weak positions available for violation. A stress maximum in the seventh position constitutes an infringement upon meter, and there arises an urge to achieve again focal stability in the next stressed syllable in a strong position. This stress happens in the tenth position, achieving stability (and powerful closure) in precisely the last position of the line. The regular alternation of weak and strong positions is suspended during these four syllables, but by a series of vocal manipulations mental processing space can be saved such that the metrical set becomes available to consciousness. Thus, for instance, in ordinary speech articulation is rather careless, and the listener must do much guesswork; consequently, clear-cut articulation of phonemes and of syllable boundaries is a rather powerful way for sparing mental processing space, and so is the emphatic grouping of these four syllables together into a symmetrical closed shape (called a "stress valley").
Accordingly, I have predicted that mental processing space can be saved if the speech is over-articulated on various levels, if the first (violating) syllable of the "stress valley" is over-stressed rather than played down, and if this symmetrical closed group is isolated, in one way or another, from the rest of the line. We have accumulated an ever-increasing number of instances in which this performance pattern is followed. In fact, this is precisely what Hodge is doing here. In the present instance, "un-" is over-articulated by its excessive duration (relative to the other syllables), and by the elaborate intonation contour assigned to it, which includes the highest pitch peak in the phrase. Syntactically, the preposition "like" ought to be grouped with "unwilling sleep." Here, however, it is emphatically grouped with the preceding phrase: there is no measurable pause between them, and they have one continuous intonation contour. On the other hand, it is separated from its sequel, quite unusually, by a rather long, 94-msec pause; and the release of the /k/ preceded by a pause is construed as an over-articulation of the phoneme as well as of the word boundary, suggesting what Knowles ("Pitch Contours") has called "segmental discontinuity" before the stress valley. There is a jump from the falling intonation contour of "like" to the rising beginning of the high onset of the next intonation contour, indicating a new start and forward grouping. Furthermore, this is one of the two places in Hodge's readings where we have encountered so far a "late peak" (that is, where the intonation peak occurs after the middle of the vowel); a very late peak at that; it is usually perceived as a strong forward drive.
It should be noted that in most instances of our corpus there is no measurable pause before a stress maximum in the seventh position, because it occurs by definition in the middle of a phrase or a word, and separation is indicated merely by pitch discontinuity, late peaking, and sometimes segmental discontinuity. In this instance, a syntactically unwarranted pause further enhances discontinuity. Hodge did not necessarily "know" what he was doing here, but, obviously, he seems to have had a very strong intuition as to how such a verse line can be rendered rhythmical.
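
Halle and Keyser's definition of the stress maximum lends itself to a mechanical check. The following Python sketch is a simplified illustration under stated assumptions, not their formalism: it treats stress as binary (1 = stressed, 0 = unstressed), it looks only at positions 6-10 of passage 5 ("like un-wil-ling sleep"), and it does not model the condition that the three syllables belong to the same syntactic constituent. The function names are mine. The two stress patterns compared are the lexical one (stress on "wil-") and Hodge's (stress on "un-").

```python
from typing import List, Optional

def is_weak(position: int) -> bool:
    """Odd-numbered positions are weak in the iambic pattern w s w s ..."""
    return position % 2 == 1

def stress_maxima(stresses: List[int], first_position: int = 1) -> List[int]:
    """Positions of stressed syllables flanked by two unstressed ones.

    Only syllables with both neighbours inside the given stretch are tested,
    so the first and last syllables of the stretch are never reported.
    The same-syntactic-constituent condition is NOT modelled here.
    """
    return [
        first_position + i
        for i in range(1, len(stresses) - 1)
        if stresses[i] == 1 and stresses[i - 1] == 0 and stresses[i + 1] == 0
    ]

def next_stressed_strong(stresses: List[int], first_position: int,
                         after: int) -> Optional[int]:
    """First stressed syllable in a strong position later than `after`."""
    for i, stress in enumerate(stresses):
        position = first_position + i
        if position > after and stress == 1 and not is_weak(position):
            return position
    return None

FIRST = 6                   # "like" occupies the sixth position
lexical = [0, 0, 1, 0, 1]   # like, un-, WIL-, -ling, SLEEP
hodge = [0, 1, 0, 0, 1]     # like, UN-, wil-, -ling, SLEEP

for label, pattern in (("lexical stress", lexical), ("Hodge's reading", hodge)):
    maxima = stress_maxima(pattern, FIRST)
    in_weak = [p for p in maxima if is_weak(p)]
    print(f"{label}: stress maxima at {maxima}, in weak positions: {in_weak}")

print("stability regained at position",
      next_stressed_strong(hodge, FIRST, after=7))
```

With lexical stress the only stress maximum in this stretch falls in the eighth (strong) position; with Hodge's displaced stress it falls in the seventh (weak) position, and the next stressed syllable in a strong position is the tenth, so that positions 7-10 form the four-syllable "stress valley" discussed above.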

The displacement of stress to the prefix "un-" is not uncommon in the performance of poetry. Consider line 5 of Shakespeare's Sonnet 3:

6. For where is she so fair whose unear'd womb

w s w s w s w s w s

Both Gielgud and the Marlowe Society displace the stress from the second to the first syllable of "unear'd." Gielgud also adds a rather strong secondary stress to the second syllable (reflected by the intonation contour). But, by this displacement, meter becomes here more regular: the main stress is transferred from a weak position to a strong one. In passage 5, by contrast, the stress displacement substitutes a violating stress maximum in a weak position for a regular metric sequence. Here the syllable following "un-" is more like that in Figure 8a, unstressed, than that in 8b, stressed.

Listening to Hodge's performance, as reflected in the above analysis, strongly suggests that some performers may prefer to generate a stress maximum in the seventh (weak) position, even where more "legitimate" solutions are available - provided that the vocal material is manipulated in such a way that the metric pattern becomes perceptible in spite of the deviant pattern of linguistic stresses. What is more, Hodge, like many other performers, uses for this end precisely the vocal manipulations predicted by my study.

In Hodge's performance of line 13 of this sonnet we encounter another instance of what appears to be a (genuine) stress maximum in the seventh position, where traditionally a very different kind of solution (considered more "legitimate") is offered. Usually we find something like passage 7:

7. Wasting of old Time, with a billowy main

w s w s w s w s w s

In the second hemistich of this line there are six syllables, but only five metrical positions available. Halle and Keyser have formulated the phonetic conditions in which two syllables can be allocated to one metrical position, among them: when there are two consecutive vowels with no intervening consonant; or the two are separated by a sonorant or by a voiced fricative; sometimes two (unstressed) function words can be allocated to one position. I have elsewhere (Tsur, 1977; Tsur and Adam) pointed out the performance rationale of the first three conditions: it is relatively easy to under-articulate in these conditions the boundary between the two syllables. My theory predicts that the boundary between the two syllables occupying one position will be under-articulated; the boundary following the second syllable, coinciding with the position boundary, will be over-articulated. In fact, this is an additional instance of a principle propounded above, back-structuring: the over-articulation of the position boundary causes the reader or the listener to restructure the preceding two syllables, dimming in immediate memory the intervening boundary - provided that the appropriate phonetic and articulatory conditions are present. Shortening of the duration of such syllables may reinforce the impression of bisyllabic occupancy of one position.

Under Halle and Keyser's conditions there are here three obvious candidates for bisyllabic occupancy of metrical position: "with a," "billo-," and "-lowy." If the syllables "with a" are assigned to one position, "bil-" becomes a stress maximum in the seventh (weak) position. Because the first syllable of "billo-" is stressed, under-articulation appears to be less plausible, so the most natural solution seems to be assigning the syllables "-lowy" to one position. In the present performance, the unstressed syllables "-lowy" are considerably longer than the preceding stressed syllable "bil-" (145 and 164 msec respectively, as against 86 msec). When a word like "power" is squeezed down to one position, frequently there is little or no acoustic trace of the /w/. Here the /w/ is rather clearly articulated. On the other hand, there is no trace of over-articulation at the end of this word: the vowel is run into the following /m/, and the whole sequence "billowy main" forms what Knowles calls "an internally defined prosodic pattern." By contrast, the boundary after "with a" is conspicuously over-articulated by an intonation contour and by a 48-msec pause; whereas the /ð/ appears to be more closely related to the ensuing "a" than to the preceding "wi-" (see the wave plot in [ILLUSTRATION FOR FIGURE 9 OMITTED]), possibly indicating under-articulation between the boundaries of the two words.
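
The three candidate parsings can be laid out mechanically. The sketch below is merely illustrative: it takes the six syllables of the second hemistich of passage 7, assumes that they must fill positions 6-10 (the first hemistich having taken the first five), and shows which position "bil-" receives when each candidate pair is squeezed into one position. The syllable division and the function name are mine.

```python
from typing import Dict

SYLLABLES = ["with", "a", "bil", "lo", "wy", "main"]  # second hemistich
POSITIONS = [6, 7, 8, 9, 10]                          # positions left for it

def parse(merge_at: int) -> Dict[str, int]:
    """Assign a position to each syllable when the syllables at index
    merge_at and merge_at + 1 share one metrical position."""
    assignment = {}
    slot = 0
    for i, syllable in enumerate(SYLLABLES):
        assignment[syllable] = POSITIONS[slot]
        if i != merge_at:      # the merged pair stays in the same slot
            slot += 1
    return assignment

# Index of the first syllable of each candidate pair.
CANDIDATES = {"with a": 0, "billo-": 2, "-lowy": 3}

for pair, index in CANDIDATES.items():
    position = parse(index)["bil"]
    kind = "weak" if position % 2 == 1 else "strong"
    print(f'"{pair}" in one position: "bil-" falls in position {position} ({kind})')
```

Only the first parsing moves the stressed syllable "bil-" into the seventh (weak) position; the other two leave it in the eighth (strong) position, which is why they count as the more "legitimate" solutions.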

This analysis supports our intuition that in this performance, the syllables "with a" are more likely to be squeezed down in perception to one position than "-lowy," even though they are not of significantly shorter duration. Such a parsing manipulates "bil-" into a weak position, rendering the verse line unmetrical, both under Halle and Keyser's and Kiparsky's theory of generative metrics (under the former, because it manipulates a stress maximum into a weak position; under the latter, because it manipulates the stressed syllable of a polysyllabic into a weak position [see Barsch, 10]). My theory predicts that when a stress maximum occurs in the seventh position, it may nevertheless be rendered rhythmical by certain vocal manipulations: if the last four syllables are emphatically grouped together and are perceptually segregated (in mid-phrase or in mid-word) from the preceding unstressed syllable. One might also expect an over-stressing rather than playing down of the deviant stress, with a late peak on it or on the following sonorant, suggesting forward grouping. In the present instance this syllable is stressed, but not over-stressed; and it is segregated from the preceding article by a 48-msec pause and by a conspicuous pitch discontinuity (resetting from 93.830 Hz at the end of a falling contour to 130.473 Hz at the onset of the next contour, peaking late, after the vowel, on the /l/; [ILLUSTRATION FOR FIGURE 9 OMITTED]). Because this pause is totally unwarranted from the linguistic point of view, its only justification appears to be to satisfy the demands of two rhythmical solutions discussed above: (1) the over-articulation of the position boundary where two syllables are squeezed into one position and (2) the segregation of the "stress valley" beginning with a stress maximum in a weak position. Such a manipulation of bisyllabic occupancy generating a stress maximum in a weak position is not without precedents in our corpus.


In this paper we have dwelt on some of the metric complexities of Keats's Elgin Marbles sonnet and on some of the rhythmic solutions offered by Douglas Hodge. As an extra bonus, we have learned about the cognitive processes that make possible the rhythmical performance of a metrically complex line. We have gained some understanding of the handling of pauses in speech perception and in some other mental processes by back-structuring. Moreover, we have offered a structural model for handling pauses in poetic rhythm, alternative to that of equal or proportional timing. We have pointed out that grouping and over-articulation may be an efficient means for saving mental processing space required for the perception of the sequence of regularly alternating weak and strong positions wherever they conflict with the sequence of irregularly alternating stressed and unstressed syllables. Where two syllables must be assigned to one metrical position, we have suggested that under-articulation of the intervening syllable boundary and over-articulation of the second syllable's boundary might facilitate the perception of rhythmic regularity. Such a conception would be compatible with Halle and Keyser's conditions for bisyllabic occupancy of metrical position. Here, too, moreover, back-structuring may be at work. We have observed that given these possibilities of rhythmical performance, performers might attempt to violate meter, even where verse structure does not require it, by, for instance, manipulating a stress maximum into the seventh (weak) position and then deploying the vocal devices predicted by my research for a rhythmical solution. Gestalt theory also predicts that well-organized perceptual wholes will tend to reassert themselves up to a certain point in perception against possible intrusions or violations, thereby generating tension; beyond that point, however, they fall into chaos and tension ceases.
Infringements upon meter tend to generate tension when such boundaries of versification units as metrical position, caesura, and line ending exert resistance. It is such resistance that may initiate back-structuring too.

Ultimately, one cannot escape questioning upon what authority we draw the utmost limit of what is acceptable in prosody. While we do not know what intuitive rules Shakespeare or Milton or Keats or Shelley followed, we do know that the rules explicitly formulated by later prosodists have been frequently violated by the greatest masters of musicality in poetry. In my view, as between "hills" and "mountains," there is no natural cut-off point between "metrical" and "unmetrical." While the prevalent paradigm draws the utmost limit in a more or less arbitrary manner based, eventually, on considerations of frequencies in the texts, my approach assumes that in addition to the poetic text there is a human perceiver too. Competent readers do not just impose their whims upon the rhythm of a poem, but respond to subtle cues in the text in offering a rhythmical solution to the perceptual problem posed by the conflicting patterns of "prose rhythm" and meter. It is one's "rhythmic competence" that constrains the process. One's solution is governed by Gestalt principles and the properties of such cognitive faculties as speech perception, short-term memory, and the experiencing of time. I suggest that a model based on these assumptions must account both for agreements and disagreements between respondents to poetry. Since we have no access to the subjective experience of poetic rhythm as it takes place in the human mind, we must rely in our research on instances when readers vocalize their perceptions and on judgments of such vocalizations.


1 See now also Tsur, "Poetic Rhythm." I am indebted to Galit Adam, who asked me provocative questions, the answers to which are included in this paper. This research was supported by a grant from the Israel Science Foundation.

2 Ivan Fonagy preceded me in noticing the "paradoxical" nature of poetic rhythm and investigated some of the solutions to this paradox offered by leading Hungarian actors.

3 Gil and Shoshani also postulate an hierarchic tree underlying both the metrical and the linguistic dimension of poetic rhythm, but this aspect of their theory does not concern us here.

Reuven Tsur is professor of Hebrew Literature at Tel Aviv University, and has developed a theory of cognitive poetics. His books in English include Toward a Theory of Cognitive Poetics (1992), What Makes Sound Patterns Expressive: The Poetic Mode of Speech-Perception (1992), On Metaphoring (1987), The Road to "Kubla Khan" (1987), and A Perception-Oriented Theory of Metre (1977).
