Different encoding activities during list learning, such as writing down or reading aloud, have differential effects on memory performance. It has been argued that intermodal processing, that is, auditory processing of visually presented material and vice versa, results in better performance than intramodal processing. This has been referred to as the "translation hypothesis." In this study, we set out to test the translation hypothesis by examining all four possible experimental conditions, using visual and auditory presentation and writing and vocalization as encoding activities. The results show similar memory performance in all conditions apart from the one in which visually presented words had to be written down. That is, in the only condition in which subjects did not hear the words (either via auditory presentation or via their own vocalization), fewer words were remembered. These findings do not support the translation hypothesis and are more in agreement with previous theoretical proposals regarding long-term modality effects.

For centuries, students and educators have been interested in developing strategies to improve memory performance. This has led to the development of numerous methods that may improve memory (Searleman & Herrmann, 1994). For example, methods have been proposed that focus on the learning of strategies to structure the to-be-learned material, thus requiring an active involvement of the subject. Other methods may concentrate on the way in which the information is presented to the individual. For example, instructional materials that use a dual-mode presentation technique, such as auditory text combined with visual diagrams, may result in superior learning and memory performance compared to single-modality formats, such as visual text and visual diagrams (Tindall-Ford, Chandler, & Sweller, 1997). A third, different approach concerns the effects of distinct encoding activities on memory performance. For example, it has been argued that encoding activities which are carried out in more than one modality, for example, visual and auditory, may result in better memory performance than encoding activities confined to one modality only, for example, visual only (Conway & Gathercole, 1990). The present study investigates this latter proposal.

Memory performance is influenced by the modality in which the stimuli are presented and the type of encoding activity that is employed. For instance, there is evidence that auditory presentation results in a better performance than visual presentation of the same verbal material (e.g., Gardiner & Gregg, 1979; Murdock & Walker, 1969) and that the use of, for instance, mental imagery during encoding can produce a substantial increase in the amount of remembered material (Paivio & Te Linde, 1982). Several theoretical proposals have been put forward to explain the auditory advantage, but in general most models assume that auditorially presented words are processed in a different or more 'distinct' manner than visually presented words. Murdock and Walker were among the first to observe a modality effect in short-term memory. Further research extended this finding in several other experimental studies. The occurrence of a modality effect first seemed to be confined to short-term memory, but subsequently, it has been shown that input modality can influence some aspects of long-term retention (e.g., Gardiner & Gregg, 1979). Findings concerning the long-term modality effect, however, tend to be less robust and more dependent upon details of the experimental conditions than findings concerning the short-term modality effect (for an overview, see Penney, 1989).

In a series of experiments, Conway and Gathercole (1987) and Gathercole and Conway (1988) investigated the effect of different ways in which subjects rehearse presented words, such as vocalizing, mouthing silently, or writing down. In a typical experiment, they showed subjects a list of words on a computer monitor and asked them to read the words either aloud or silently. The results showed that the visually presented words were better remembered when they had been read aloud by the subject than when they had been read silently.

The temporal distinctiveness theory attributes this improvement in memory performances to the different activities carried out at encoding (Conway & Gathercole, 1987; Gardiner, 1983). It is suggested that these encoding activities give rise to memory traces that are more easily accessible to retrieval processes. According to this theory the vocalization advantage is the consequence of the subject carrying out an additional activity at encoding. Gathercole and Conway (1988), however, found no corresponding memory benefit when visually presented words were written down rather than spoken aloud at input. Thus, the performance was comparable whether words were written down or just read silently. Assuming that writing a word should produce a more distinctive memory trace than just reading it (which is also necessary for writing), this result did not support this theoretical interpretation.

Alternatively, Conway and Gathercole (1990) suggest that the critical difference between the vocalization of visually presented words and the writing of visually presented words is whether processing has taken place in one or two different modalities. Vocalizing visually presented words calls upon processes in the visual and the auditory domain. In contrast, writing visually presented words only involves processing activities in one modality. Therefore, they propose that the beneficial effect resulted from the fact that vocalization requires both phonological and orthographic processing and some type of mapping between these two processing domains. Writing of visually presented words requires only orthographic processing and no intermodal mapping, which leads to a lesser memory performance. In this view, "distinctiveness" is, in a sense, achieved by multimodal processing. Conway and Gathercole (1990) refer to this suggestion as the "translation hypothesis."

Conway and Gathercole (1990) found evidence for this hypothesis in a set of experiments using an intentional and an incidental learning procedure (see also Thompson & Paivio, 1994). In both experiments, it was observed that writing down auditorially presented words leads to a better memory performance than the writing of visually presented words. The latter condition was again better than conditions in which subjects had to passively register input without carrying out an additional task, that is, simply hearing the auditorially presented words or reading the visually presented words. This series of experiments, therefore, supported the hypothesis that writing improves memory performance more for auditorially presented than visually presented words, and that this phenomenon generalizes across learning paradigms.

There are, however, two methodological problems with these experiments, which cast some doubt on the final conclusion. First, Conway and Gathercole (1990) used only writing as an encoding activity with visual and auditory presentation of stimulus words. Thus, they investigated only half of the four combinations that are possible given that the presentation can be visual or auditory and the input condition can ask for writing or saying aloud. Gathercole and Conway (1988) investigated the effect of different types of encoding activities, but no study has looked directly at the four possible combinations in an integrated design. Gathercole and Conway's (1988) main finding of an overall beneficial effect of vocalization over all other types of encoding activity neither supports nor contradicts the translation hypothesis. Second, in the incidental learning task, where the most pronounced differences between the inter- and the intramodal conditions were observed, the passive input conditions (read only and hear only condition) resulted in very low memory performances on the recognition test, with less than 60% correctly recognized words. Given the possibility that the visual presentation mode is overall more difficult, the observed difference between the two active input conditions could be spurious. That is, the observed larger difference in memory performance between the active and the passive condition with auditory presentation (intermodal) compared with visual presentation (intramodal) could have arisen because performance in the visual passive condition could not fall beneath that of the auditory passive condition, owing to a possible floor effect.

The aim of this study is to test the translation hypothesis in a fully crossed design using tasks that are less susceptible to floor effects. The experiments are set up as replications of the original Conway and Gathercole study, with the addition of the encoding activity of vocalization, both with visual and auditory presentation of the stimulus words. Experiment 1 employed an intentional learning procedure, whereas subjects in Experiment 2 were given incidental learning instructions. Conway and Gathercole (1987) argued that incidental learning procedures are more sensitive to long-term modality effects than intentional learning procedures. This could be caused by strategic differences, initiated by intentional learning. They assume that mnemonic strategies employed at encoding, such as rehearsal, are sufficient to obscure underlying modality differences.



Sixty-four subjects participated in this study, 32 in each experiment. They were undergraduate students at the University of Utrecht, aged between 18 and 25. All were paid for their participation, all had normal or corrected-to-normal vision, and none reported any hearing problems.


Eighty high-frequency five-letter words were selected from the Celex database (mean frequency value was 85 per million, with a standard deviation of 31.7). Input lists A and B each contained 40 words and were balanced on mean and standard deviation of frequency. Half of the subjects were presented with input list A, while input list B was used as a filler in the recognition questionnaire. The other half of the subjects were presented with input list B, while input list A was used as a filler in the recognition questionnaire.


The experimental stimuli consisted of words presented visually or auditorially. The memory items in the visual presentation conditions in Experiments 1 and 2 were presented on a computer screen, which was used to display and control the timing of the stimuli. The memory items in the auditory presentation conditions in Experiments 1 and 2 were spoken by a speech therapist and presented with the 'Sound Blaster' software package under Windows. Auditory presentation took place via two loudspeakers, one placed on each side of the computer.


Subjects were tested individually. They were seated in front of a computer screen and were given printed instructions at the beginning of the experimental session. In Experiment 1, subjects were given an intentional learning instruction. They were informed that their memory would be tested after the list presentation. Subjects were required to memorize the words in the list and were encouraged to actively try to remember as many words as they could.

In Experiment 2, the instructions consisted of a cover story to minimize intentional learning. Subjects were told that they were participating in a pilot study for an experiment eventually to be carried out with children. The aim of the study with the children would be to evaluate their ability to perform very simple activities following instructions. Subjects were told that the purpose of the pilot study was to provide a check that the presentation rate of the instructions and words was not too fast and that the words were sufficiently discriminable. Subjects were informed that they would be asked their opinion about these aspects after the memory list. They were given no indication that a recognition test would be given. When questioned after the experiment, all subjects who participated in the incidental learning paradigm said that they did not expect a memory test.

Forty words were presented one at a time in a blocked design. Half of the subjects were randomly assigned to the visual presentation condition, that is, all words were presented visually. The other half of the subjects were assigned to the auditory condition. Each word was preceded by a printed instruction specifying the input activity for that word. This instruction was displayed on the screen in upper-case lettering for 2.5 seconds and informed the subjects which activity had to be carried out with the word. The activity instructions were 'WRITE' or 'SAY ALOUD.' For each subject, 20 words were assigned to one input activity and 20 words to the other input activity. The sequence of input activities across trials was unpredictable. After the encoding activity instruction, there was a clear screen for 1 second. In the visual presentation conditions, the word then appeared on the screen, where it remained for 2.5 seconds. In the auditory presentation conditions, a voice was heard saying the word. The screen cleared before the instruction corresponding to the next word in the list was presented. The subjects were instructed to press the space bar to initiate the following presentation, so there was enough time to execute the instructions. In the writing conditions, subjects were provided with a notepad on which they wrote the presented words. When a word had been written, the subjects immediately turned over the page. Subjects did not see any of the words they had written again at any other point in the experiment.
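The trial structure described above can be illustrated with a minimal Python sketch. It is not the software used in the experiments: the function name, the placeholder word labels, and the use of a seeded shuffle to obtain an unpredictable order are assumptions made for illustration. Only the list length, the 20/20 split between activities, and the timing values are taken from the procedure.

```python
import random

# Timing values taken from the procedure described in the text (seconds).
INSTRUCTION_S = 2.5   # activity instruction on screen
BLANK_S = 1.0         # clear screen after the instruction
WORD_S = 2.5          # visual word duration (visual presentation only)

ACTIVITIES = ("WRITE", "SAY ALOUD")


def build_trial_schedule(words, seed=None):
    """Pair each word with an input activity.

    Half of the words receive each activity, and the order of activities
    is shuffled so that it is unpredictable for the subject.
    (Hypothetical helper, not part of the original study materials.)
    """
    if len(words) % 2 != 0:
        raise ValueError("need an even number of words for an equal split")
    half = len(words) // 2
    activities = [ACTIVITIES[0]] * half + [ACTIVITIES[1]] * half
    rng = random.Random(seed)
    rng.shuffle(activities)
    return list(zip(words, activities))


# Placeholder labels standing in for the 40 words of one input list.
study_list = [f"word{i:02d}" for i in range(1, 41)]
schedule = build_trial_schedule(study_list, seed=1)

for word, activity in schedule[:3]:
    print(f"{activity:<9} -> {word}")
```

Because each trial was started by the subject pressing the space bar, the sketch fixes only the order of the trials and not their onset times.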

At the end of the memory list, subjects were instructed orally about how to complete the recognition questionnaire, which was placed in front of them. The interval between the end of the memory list and the start of the recognition test was on average 3 minutes. The recognition questionnaire contained 80 words, consisting of the two memory lists. For each word, subjects had to indicate whether that word had been presented or not by circling either 'YES' or 'NO.' It was emphasized that half of the words in the questionnaire had been presented. Subjects were instructed to respond to the words on the questionnaire in the order in which they appeared on the paper and not to leave blanks. The time taken to complete the questionnaire was on average 5 minutes.


The mean percentages of words correctly recognized for both experiments are shown in Figure 1 (visual presentation) and Figure 2 (auditory presentation).

Experiment 1 (intentional learning)

The recognition data were entered into a two-way analysis of variance with one between-subjects factor, modality of presentation (visual versus auditory), and one repeated-measures factor, encoding activity (writing versus saying aloud). For the number of correctly recognized words, no significant main effect of input modality was found, F(1, 30) = 0.06, p > 0.5, and the main effect of input activity did not reach significance either, F(1, 30) = 3.30, p < 0.08. There was a marginally significant interaction effect between input modality and encoding activity, F(1, 30) = 3.75, p < 0.06. To explore this trend, comparisons with post-hoc Tukey tests were carried out. These analyses showed that the only significant difference was between 'write' and 'say aloud' in the visual presentation condition (Tukey test, α = 0.05; critical distance is 6.6).

Experiment 2 (incidental learning)

No significant main effect of input modality was found on the number of correctly recognized words, F(1, 30) = 2.87, p < 0.1. In contrast, the main effect of encoding activity, F(1, 30) = 9.37, p < 0.005, and the interaction effect, F(1, 30) = 4.59, p < 0.040, were significant. The post-hoc analyses (Tukey test, α = 0.05; critical distance is 6.5) showed only the visual/writing condition to be significantly different from the other conditions, with subjects showing worse performance in this condition.


This study was set up to test the translation hypothesis put forward by Conway and Gathercole (1990) based on experiments in which they observed a distinct effect of writing on recognition memory. Writing strongly improved the long-term retention of auditorially presented words and had a weaker effect for visually presented words. However, they used an incomplete design in which vocalization was not used as an encoding instruction, and there were two conditions in which floor effects were likely, given the high error rates (according to Conway & Gathercole, 1990, performance was at chance level). Therefore, we repeated the experiments of Conway and Gathercole (1990) using an overall easier task with the addition of two input conditions in which subjects were asked to say the presented word aloud.

The results of Experiment 1 showed no significant main effect of input modality or input activity. There was, however, a marginally significant interaction effect between input modality and input activity. Post-hoc analysis suggested that the condition in which subjects had to write down visually presented words produced a lesser performance compared to the condition in which visually presented words had to be vocalized. Therefore, the translation hypothesis is only consistent with conditions in which stimuli were presented visually. With auditory presentation, the writing encoding condition was not associated with better performance than vocalization, as would be predicted by the translation hypothesis.

The incidental learning task in Experiment 2 showed no significant main effect of the input modality. The main effect of the input activity was significant with better performance after auditory presentation, but this finding has to be interpreted against the background of a significant interaction effect. Post-hoc analysis of this interaction showed that the conditions involving auditory presentation and the condition in which visually presented words had to be said aloud did not differ significantly. When subjects were asked to write down visually presented words the performance decreased significantly. The pattern of results is comparable to what was observed in Experiment 1. Despite our efforts to avoid floor effects, subjects performed at a low level in the condition in which visually presented words had to be written down. Fortunately, the interpretation of the data is not affected by this possible floor effect, as an even worse performance in this condition could only have strengthened the present findings. A possible limitation of the present study is that we were not able to take into account effects of bias as indexed by d'. The design of our experiments did not allow a meaningful calculation of d' that could be attributed to a specific training condition, given that the recognition list consisted of 'old' words from two conditions in addition to 'new' words. For example, in the visual presentation condition, the recognition list consisted of words that had been written after presentation (read/write condition), words that had been said (read/say condition), and new words that had not been presented. This implies that false alarms could not be attributed to a specific condition. Because our study originated as an attempt at replication, it was our intention to use the same design and measures as Conway and Gathercole (1990), who did not include d' either. 
Future research may consider the use of designs with only one condition for each recognition list in order to take response bias into account. However, a limitation of such designs would be that the sequence of input conditions cannot be randomized within an experimental session.
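The difficulty with d' in a mixed recognition list can be made concrete with a minimal Python sketch, using the standard definition d' = z(hit rate) - z(false-alarm rate). The counts below are invented for illustration and are not data from the present experiments; the helper names and the log-linear correction for extreme rates are likewise assumptions of the sketch.

```python
from statistics import NormalDist

z = NormalDist().inv_cdf  # inverse of the standard normal CDF


def corrected_rate(count, total):
    """Proportion with a log-linear correction, keeping rates away from 0 and 1.

    (Assumed correction for illustration; not taken from the study.)
    """
    return (count + 0.5) / (total + 1)


def d_prime(hit_rate, false_alarm_rate):
    """Sensitivity index: z(hit rate) minus z(false-alarm rate)."""
    return z(hit_rate) - z(false_alarm_rate)


# Hypothetical counts for one subject in the visual presentation group.
hits_write, hits_say = 14, 17   # out of 20 old words per activity
false_alarms = 6                # out of 40 new words, shared by both activities

hit_write = corrected_rate(hits_write, 20)
hit_say = corrected_rate(hits_say, 20)
fa_shared = corrected_rate(false_alarms, 40)

# Both conditions must use the same false-alarm rate, because a 'YES'
# response to a new word cannot be assigned to either activity.
d_write = d_prime(hit_write, fa_shared)
d_say = d_prime(hit_say, fa_shared)

print(f"d' write: {d_write:.2f}, d' say aloud: {d_say:.2f}")
```

In this sketch, the difference between the two d' values depends only on the two hit rates, since the shared false-alarm term cancels. A condition-specific estimate of response bias would therefore require a separate set of new words for each encoding condition, as in the single-condition designs mentioned above.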

In short, the present results are not consistent with the translation hypothesis. The data of the intentional learning task are only marginally significant, but a clearly significant result is obtained in the incidental learning task. When words are presented visually, vocalization improves memory performance. However, the reverse pattern, in which writing increases the number of words remembered after an auditory presentation, is not supported by the data. This contradicts the hypothesis of Conway and Gathercole (1990). The reason for the divergence of our conclusion from the conclusion drawn by Conway and Gathercole (1990) may lie in the fact that we investigated all four possibilities (with reading, hearing, writing, and saying) in a fully crossed design, whereas Conway and Gathercole (1990) limited their study to only hearing or reading with subsequent writing. Thus, although Conway and Gathercole interpreted the better performance in the hear/write condition as evidence for a "translation" hypothesis, this could as well be explained by a long-term modality effect (with an auditory processing advantage), leaving the results inconclusive.

Our observations may be better explained in terms of long-term modality effects. The three conditions in which the subjects heard the stimulus word, either via the auditory presentation or via their own voices in the vocalization condition, all produced very comparable results. The only condition in which subjects perform worse (especially in the incidental learning task) is the condition in which subjects have seen and written the stimulus word. That is the only condition in which there has been no auditory input, and subsequently no phonological processing.

Previous research has pointed to the importance of vocalization for the modality effect. This vocalization advantage has been observed in short-term memory (Engle & Roberts, 1982) and in long-term memory (Conway & Gathercole, 1987; Gathercole & Conway, 1988). Vocalization with concurrent auditory feedback has a beneficial effect on the retention of words. In contrast, when the auditory feedback is absent, the beneficial effects of articulation are not to be found (Penney, 1989). Silent overt articulation or mouthing sometimes produced higher recall than silent reading (Greene & Crowder, 1984, 1986; Nairne & Walters, 1983), which may also involve some form of articulation, but it never resulted in recall as high as when words were presented auditorially or subjects vocalized audibly (Campbell & Dodd, 1980). Further evidence that auditory feedback is critical was provided by Murray (1966). He showed that the beneficial effects of vocalization were reduced when the sound of the subject's voice was masked by white noise. This finding is in accordance with our suggestion that the better memory performance that occurred when visually presented words were vocalized rather than written is due to the auditory aspect of vocalization. As an alternative to the interpretation in terms of superior auditory processing, the possibility might be considered that in the visual/writing condition one need never encode the word as a word at all when writing down a visually presented word. One may simply be copying letter strings. This would be particularly problematic for incidental learning, which was the condition that showed significantly worse performance. However, this explanation loses plausibility when it is taken into account that subjects with an academic background are likely to semantically encode words automatically when they read. 
Future research is needed, concentrating on the exact cognitive mechanisms underlying modality effects in memory performance, and in which the findings are extended to more natural stimuli, such as prose passages.

Correspondence may be addressed to Edward H. F. deHaan, Psychological Laboratory, Utrecht University, Heidelberglaan 2, 3584 CS Utrecht, The Netherlands. (Phone: +31 30 2531897; Fax: +31 302534511; E-mail:


