# Data methods in optometry--Part 9: experimental design and analysis of variance.

In a previous article in this series (1) we introduced analysis of variance (ANOVA) as the most efficient method available for the analysis of data from experiments. The origin of ANOVA, the logic that underlies the method, and the assumptions necessary to apply it to data were described. In addition, the simplest form of the analysis, the one-way ANOVA in a randomised design, was illustrated using a clinical experiment. The various methods available for making specific comparisons between pairs of group means (known as 'post-hoc' tests) were also discussed. There are many different forms of ANOVA, however, and each is closely linked to a specific type of experimental design. Hence, it is possible to use an incorrect form of the analysis and as a consequence, draw the wrong conclusions from the data.

The present article extends ANOVA to a variety of different experimental designs, likely to cover many common situations in optometry and vision science. These will be considered under the following headings: (a) one-way ANOVA, 'random effects' model, (b) two-way ANOVA in randomised blocks, (c) three-way ANOVA, (d) factorial ANOVA, (e) factorial ANOVA, split-plot design, and (f) factorial ANOVA, repeated measures design.

The one-way ANOVA, 'random effects' model

In our previous article, (1) a one-way ANOVA in a randomised design was applied to an optometric experiment designed to compare the effect of the drug Tropicamide on the degree of pupil dilation in a group of control subjects, a group of patients diagnosed with Alzheimer's disease (AD), and a group of patients diagnosed with dementia other than AD. This type of ANOVA was described as a 'fixed effects' model in which the objective was to estimate the differences between the subject groups and these were regarded as 'fixed' or discrete effects to be estimated. There is, however, an alternative model called the 'random effects' model in which the objective is to estimate the degree of variation of a particular measurement and to compare different sources of variation in space and time.

Scenario

An example of an experiment of this type is given in Table 1. Five measurements of intraocular pressure (IOP) were made on a subject, one minute apart, on three randomly chosen days. The objective was to determine, for individual subjects, the degree of variation in IOP from minute to minute compared with the variation between measurements made on different days. Hence, estimates of variability, rather than 'fixed effects', are the objective of the study. Based on these estimates of variability, a suitable protocol for measuring IOP in a clinical context could be devised. For example, if IOP varied considerably from minute to minute but variation between days was small, on average, several measurements of IOP on a single occasion might be an appropriate strategy. By contrast, if minute-to-minute variation was negligible but there was significant day-to-day variation, it might be better to measure IOP only once on a specific occasion, but on a sample of days. This type of experiment is also described as having a 'hierarchical' or 'nested' design, especially if each sample is composed of sub-samples and these in turn are sub-sampled. In other circumstances, variation could be spatial rather than temporal, e.g., visual function may be quantified at different locations on the retina. In this case, there may be variation between right and left eyes, between different locations on the retina within the same eye, and between sequential measurements made at the same retinal location.

The analysis

The ANOVA appropriate to this design provides an F test of whether there is significant variation between days. In addition, in this design it is possible to calculate the 'components of variance' [S.sub.m.sup.2] and [S.sub.D.sup.2] from the ANOVA table (Table 1). These are estimates of the variance of the measurements made between days and between determinations on a single day. In the example quoted, the F test suggests there is significant variation in IOP between days. In this context, however, the components of variance are more useful and reveal that the component between days is approximately twice as great as that between successive measurements on a single day.
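The calculations behind Table 1 can be reproduced with a short sketch in Python (the language and variable names are, of course, our illustrative choice; the data are those of Table 1):

```python
# Sketch: one-way random-effects ANOVA with components of variance,
# using the IOP data of Table 1 (five repeat measurements on each of
# three sample days; values in mmHg).

days = [
    [18, 19, 20, 19, 21],  # day A1
    [17, 18, 16, 17, 17],  # day A2
    [19, 18, 20, 20, 19],  # day A3
]

n = len(days[0])                       # measurements per day
k = len(days)                          # number of days
grand_mean = sum(sum(d) for d in days) / (n * k)

# Between-days and within-days sums of squares
ss_between = n * sum((sum(d) / n - grand_mean) ** 2 for d in days)
ss_within = sum((x - sum(d) / n) ** 2 for d in days for x in d)

ms_between = ss_between / (k - 1)      # expected MS: S_m^2 + n * S_D^2
ms_within = ss_within / (k * (n - 1))  # expected MS: S_m^2

F = ms_between / ms_within             # F test for variation between days

# Components of variance, estimated from the expected mean squares
var_within_days = ms_within                      # S_m^2, minute-to-minute
var_between_days = (ms_between - ms_within) / n  # S_D^2, day-to-day
```

Running this reproduces the values in Table 1: F = 10.64, with a between-days component of 1.607 against a within-day component of 0.833.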

It is often necessary to identify whether a 'fixed' or 'random' effect model is appropriate in each experimental situation. This is particularly important in more complex factorial-type designs in which there may be a mixture of 'fixed' and 'random' effect factors. (2) One way of deciding whether a factor is 'fixed' or 'random' is to imagine the effect of changing one of the levels of the factor. (3) If this makes it a different experiment, for example, by substituting a different group of subjects, then it is a 'fixed' effect factor. By contrast, if we considered it the same experiment, for example, by simply substituting a different sample day, as in the example described above, it would be a 'random' effect factor.

The two-way ANOVA in randomised blocks

In the one-way, 'fixed' effects ANOVA, (1) each observation was classified in only one way, i.e. in which treatment or subject group the observation belonged. Replicates were either allocated to treatment groups at random or subjects within a group were a random sample of a particular population of subjects. Such an experiment is often described as in a 'randomised design'. More complex experimental designs are possible, however, in which an observation may be classified in two or more ways.

Scenario

An example of an experiment in which the observations are classified in two ways is shown in Table 2. This experiment studied the effect of four coloured filters on the reading rate of 12 patients grouped by age. In such an experiment there is a restriction on how subjects are randomised to treatments: 1) subjects are first grouped into 'blocks' or replications of similar age, and 2) treatments are then applied at random to the subjects within each block separately. The name given to each group varies with the type of experiment. The terminology 'randomised blocks' originated in agricultural experiments in which treatments were applied to units within 'blocks' of land, plots within a block tending to respond more similarly than plots in different blocks. (2) In addition, the block may be a single trial or replication of the comparison between treatments, the trial being carried out on a number of separate occasions. Furthermore, in experiments with human subjects, there is often considerable variation from one individual to another and hence a good strategy can be to give all treatments successively to each subject in a random order, the subject then comprising the 'block' or 'replication'. Note that the two-way design has been variously described as a matched-sample F test, simple within-subjects ANOVA, one-way within-groups ANOVA, simple correlated-groups ANOVA, and a one-factor repeated measures design. This confusion of terminology can make it difficult to identify the correct analysis within commercially available software. The essential feature of the design, however, is that each treatment is allocated by randomisation to one experimental unit within each group or block.

The analysis

The ANOVA appropriate to the two-way design in randomised blocks is shown in Table 2. This design is often used to remove the effect of a particular source of variation from the analysis. (1) For example, if there was significant variation due to age of the subjects and if subjects had been allocated to treatments at random, then all of the between subject age variation would have been included in the pooled error variance. The effect of this would be to increase the error variance, reduce the 'power' of the experiment, and therefore, to make it more difficult to demonstrate a treatment effect. In a two-way randomised blocks design, however, variation between subjects, attributable to their age, is calculated as a separate effect and therefore does not appear in the error variance. This may increase the power of the experiment thus making it more probable that a treatment effect would be demonstrated. In the example quoted, despite the blocking by age, there is no evidence for an effect of coloured filter on reading rates but significant effects were present between 'replications' presumably reflecting the effect of age on reading ability.

A comparison of the ANOVA table in Table 2 with that for a one-way ANOVA in a randomised design (1) demonstrates that, for a given number of replications, reducing the error variance by blocking has a cost, viz., a reduction in the degrees of freedom (DF) of the error variance, which makes the estimate of the error variation less reliable. Hence, an experiment in randomised blocks will only be effective if blocking by age or some other factor reduces the pooled error variance sufficiently to counter the reduction in DF. (4)
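The randomised-blocks partition of the sums of squares can be verified with a minimal Python sketch (names are illustrative; the data are the reading rates of Table 2):

```python
# Sketch: two-way ANOVA in randomised blocks for the reading-rate
# data of Table 2 (rows = age blocks, columns = coloured filters).

data = [
    [84.5, 72.0, 70.5, 39.5],  # age block 1: red, yellow, green, blue
    [79.3, 68.2, 62.6, 47.2],  # age block 2
    [36.0, 46.1, 48.9, 38.0],  # age block 3
]

b = len(data)        # number of blocks
t = len(data[0])     # number of treatments (filters)
total = sum(sum(row) for row in data)
cf = total ** 2 / (b * t)                  # correction factor

ss_total = sum(x ** 2 for row in data for x in row) - cf
ss_blocks = sum(sum(row) ** 2 for row in data) / t - cf
col_sums = [sum(row[j] for row in data) for j in range(t)]
ss_treat = sum(c ** 2 for c in col_sums) / b - cf
ss_error = ss_total - ss_blocks - ss_treat  # block x treatment residual

ms_treat = ss_treat / (t - 1)
ms_blocks = ss_blocks / (b - 1)
ms_error = ss_error / ((b - 1) * (t - 1))

F_treat = ms_treat / ms_error    # filters: not significant
F_blocks = ms_blocks / ms_error  # age blocks: significant
```

Note how the variation attributable to age (ss_blocks = 1448.98) is removed from the error term; had the design been fully randomised, it would have inflated ss_error and reduced the power of the filter comparison.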

The three-way ANOVA

In the two-way ANOVA in randomised blocks, if treatments are given sequentially to a subject, there is the possibility of a 'carry-over' effect of one treatment on to the next or for the subject to become fatigued as the tests proceed. An example of the former might include the sequential application of drugs to a patient without a sufficient recovery period between them, whilst an example of the latter might include reading tests with different filters or magnifiers applied sequentially to the same subject. The solution is to have each combination of treatments given to the same number of subjects such that systematic effects due to treatment order will not create bias in the comparison of the treatment means.

Scenario

Examples of this type of design are shown in Table 3. With two treatments (A and B) and n subjects, each of the treatment orders AB and BA would be given to n/2 subjects. With three treatments (A, B, and C), if all treatment combinations were used, the order of treatments would be ABC, ACB, BAC, BCA, CAB, and CBA and each would be given to n/6 subjects.

The analysis

In the ANOVA table (Table 3), variation attributable to the order of the treatments now appears as a specific experimental effect. This variation will not appear in the pooled error variance or affect the comparison of the treatment means. A limitation of this design, however, is that the number of replications n must be a multiple of the number of treatments. Hence, with many treatments there will be a large number of possible orders of these treatments and the level of replication will increase accordingly. One method of solving this problem would be to use an 'incomplete design' in which only some of the combinations of treatments would be given. For example, it would be possible to ensure that each treatment was given first, second, and third to an equal number of subjects, e.g., only the combinations ABC, BCA, CAB could be used. (2)
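The two ways of enumerating treatment orders can be sketched in Python; the complete set is generated by permutation and the incomplete set of the text (ABC, BCA, CAB) by cyclic rotation, so that each treatment still appears once in each position:

```python
# Sketch: enumerating treatment orders for a cross-over design with
# order included as an effect (the "three-way" design of Table 3).
from itertools import permutations

treatments = ["A", "B", "C"]
k = len(treatments)

# Complete design: all 3! = 6 orders, so n must be a multiple of 6.
complete = ["".join(p) for p in permutations(treatments)]

# Incomplete (Latin-square style) design: cyclic shifts only, so n
# need only be a multiple of 3; each treatment is still given first,
# second, and third to an equal number of subjects.
incomplete = ["".join(treatments[(i + j) % k] for j in range(k)) for i in range(k)]
```

With many treatments the complete set grows factorially, which is exactly why the incomplete design becomes attractive.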

The factorial ANOVA

In a factorial experiment, the effects of a number of different factors can be studied at the same time. Combining factors usually requires fewer experimental subjects or replications than studying each factor individually in a separate experiment. In addition, by contrast with the three-way design, the between treatments or groups sums of squares (SS) is partitioned into specific comparisons or 'contrasts' (3) which reveal the possible interactive effects between the factors. The interactions between factors often provide the most interesting information from a factorial experiment.

Scenario

Consider an example of the simplest type of factorial experiment involving the application of two drugs (A and B), each given at two 'levels' (given or not given), to 24 subjects, randomly allocated, six to each treatment combination (Table 4). There are four treatment combinations: no drug given, either 'A' or 'B' given separately, or both drugs given. This type of design is called a '[2.sup.2] factorial', i.e., two factors with two levels of each factor; in this notation, the superscript refers to the number of factors or variables included and the base integer to the number of levels of each factor.

The analysis

As in previous examples, the total SS can be broken down into that associated with differences between the effects of the drugs and error (Table 4). In this case, the between treatments SS can be broken down further into 'contrasts' which describe the main effects of 'A' and 'B' and the interaction effect 'A x B'. These effects are linear combinations of the means, each being multiplied by a number or 'coefficient' to calculate a particular effect. In fact, the meaning of an effect can often be appreciated by studying these coefficients (Table 4). The main effect of drug 'A' is calculated from those groups of subjects that receive drug 'A' (+) compared with those who do not (-). Note that in a factorial design, every observation is used in the estimate of the effect of every factor. Hence, factorial designs have 'internal replication' and this may be an important consideration in deciding the number of subjects to use in a study. The main effect of 'B' is calculated similarly to that of 'A'. By contrast, the two-factor interaction ('A x B') can be interpreted as a comparison of the effect of the combined action of 'A' and 'B' with the individual effects of 'A' and 'B'. A significant interaction term would imply that the effects of 'A' and 'B' were not additive, i.e., the effect of the combination 'AB' would not be predictable from knowing the individual effects of 'A' and 'B'. In the quoted example, there is no significant effect of drug A, the effect of drug B is significant, and the non-significant interaction indicates that the effect of B was the same regardless of whether A was given or not.

Note that in a [2.sup.2] factorial design, partitioning the treatments SS into factorial effects provides all of the information necessary for interpreting the results of the experiment and further post-hoc tests would not be needed. However, with more complex factorial designs, e.g. those with more than two levels of each factor, further tests may be required to interpret a main effect or an interaction. With factorial designs, it is even more important to define specific comparisons before the experiment is carried out rather than to rely on numerous post-hoc tests. Factorial experiments can be carried out in a completely randomised design, in randomised blocks, or in a more complex design. The relative advantages of these designs are the same as for the one-way design.
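The contrast arithmetic can be made concrete with a short Python sketch (illustrative names; treatment totals and the pooled error mean square are taken from Table 4):

```python
# Sketch: partitioning the treatments SS of a 2x2 factorial into the
# contrasts A, B and A x B, using the treatment totals of Table 4
# (order: none, +A, +B, +AB; six replicates per combination).

totals = [832, 853, 881, 966]   # treatment totals: (1), a, b, ab
r = 6                           # replicates per treatment combination
ms_error = 196.3                # pooled error mean square (20 DF), Table 4

# Orthogonal coefficients, as tabulated in Table 4
coeffs = {
    "A":  [-1, +1, -1, +1],
    "B":  [-1, -1, +1, +1],
    "AB": [+1, -1, -1, +1],
}

F = {}
for effect, c in coeffs.items():
    contrast = sum(ci * ti for ci, ti in zip(c, totals))
    # SS of a contrast on totals: contrast^2 / (r * sum of c_i^2)
    ss = contrast ** 2 / (r * sum(ci ** 2 for ci in c))
    F[effect] = ss / ms_error   # each contrast has 1 DF
```

Because each coefficient row sums to zero and the rows are mutually orthogonal, the three contrast SS add up to the full 3-DF treatments SS; this is the 'internal replication' of the factorial design made explicit.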

The factorial ANOVA, split-plot design

In the [2.sup.2] factorial ANOVA design described above, the experimental subjects were assigned at random to all possible combinations of the two factors. However, in some designs, the two factors are not equivalent to each other. A common case, called a 'split-plot design', arises when one factor can be considered to be a major factor and the other a minor factor.

Scenario

An investigator wished to study whether IOP in right and left eyes was elevated in patients with high blood pressure (Table 5). The problem that arises in these types of design is the dependence or correlation between the measurements on eyes made on the same subject. (5) In these experiments, the subject group would be the major factor while right/left eye would be regarded as the minor factor. The difference between this and an ordinary factorial design is that previously, all replicates could be allocated at random to all treatment combinations whereas in a split-plot design, replicates can only be allocated at random to the main-plots. In some circumstances, a second treatment could be allocated to the subplot treatments at random, e.g. two treatments could be given to a subject but to different eyes, the eye receiving a particular treatment being selected at random. In some applications, experimenters may subdivide the subplots further to give a 'split-split-plot design'. (1)

The analysis

The resulting ANOVA (Table 5) is more complex than that for a simple factorial design because there are two different error terms. Hence, in a two-factor split-plot ANOVA, the main-plot error is used to test the main effect of subject group while the sub-plot error is used to test the main effect of eyes and the possible interaction between the factors. In the quoted example, there is a significant increase in IOP in patients with elevated blood pressure but no difference between eyes; the non-significant interaction suggesting that the elevation in IOP was consistent in both eyes. The sub-plot error is usually smaller than the main-plot error and also has more DF. Hence, such an experimental design will usually estimate the main effect of the sub-plot factor and its interaction with the main plot factor more accurately than the main effect of the major factor. Some experimenters will deliberately design an experiment as a 'split-plot' to take advantage of this property.
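The two error terms can be seen directly by computing the split-plot partition in Python (a minimal sketch with illustrative names; the IOP data are those of Table 5):

```python
# Sketch: split-plot ANOVA for the IOP data of Table 5. Subject group
# is the main-plot factor, tested against the between-subject
# (main-plot) error; eye and the interaction are tested against the
# within-subject (sub-plot) error.

# data[group][subject] = (right eye IOP, left eye IOP)
data = [
    [(17.3, 17.1), (16.9, 16.5), (14.7, 14.3)],  # control group
    [(21.4, 20.7), (24.3, 22.1), (21.4, 24.2)],  # elevated BP group
]

g, s, e = 2, 3, 2                  # groups, subjects per group, eyes
N = g * s * e
values = [x for grp in data for subj in grp for x in subj]
cf = sum(values) ** 2 / N

ss_total = sum(x ** 2 for x in values) - cf
group_sums = [sum(sum(subj) for subj in grp) for grp in data]
ss_group = sum(t ** 2 for t in group_sums) / (s * e) - cf
subj_sums = [sum(subj) for grp in data for subj in grp]
ss_main_err = sum(t ** 2 for t in subj_sums) / e - cf - ss_group
eye_sums = [sum(subj[i] for grp in data for subj in grp) for i in range(e)]
ss_eye = sum(t ** 2 for t in eye_sums) / (g * s) - cf
cell_sums = [sum(subj[i] for subj in grp) for grp in data for i in range(e)]
ss_inter = sum(t ** 2 for t in cell_sums) / s - cf - ss_group - ss_eye
ss_sub_err = ss_total - ss_group - ss_main_err - ss_eye - ss_inter

# Note the two different denominators:
F_group = (ss_group / 1) / (ss_main_err / 4)  # main-plot test
F_eye = (ss_eye / 1) / (ss_sub_err / 4)       # sub-plot tests
F_inter = (ss_inter / 1) / (ss_sub_err / 4)
```

Dividing ss_group by the sub-plot error instead, as a naive two-factor analysis would, gives a wildly inflated F ratio, illustrating the danger of analysing a split-plot as a fully randomised factorial.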

A disadvantage of such a design, however, is that occasionally, the main effect of the major factor may be large but not significant because estimation of this effect lacks power, while the main effect of the minor factor and its interaction may be significant but too small to be important. In addition, a common mistake is for researchers to analyse a split-plot design as if it were a fully randomised two-factor experiment. In this case, the single pooled error variance will either be too small or too large for testing the individual treatment effects and the wrong conclusions could be drawn from the experiment. To decide whether a particular experiment is a split-plot, it is useful to consider the following: 1) are the factors equivalent or does one appear to be subordinate to the other, 2) is there any restriction in how replicates were assigned to the treatment combinations, and 3) is the error variation likely to be the same for each factor?

Caution should be employed in the use of post-hoc tests in the case of a split-plot design. Post-hoc tests assume that the observations are uncorrelated, so that the sub-plot factor group means are independent. This is unlikely in the present example, or in split-plot experiments in general, since some correlation between measurements made on the sub-plots within a main plot is inevitable. Standard errors appropriate to the split-plot design may be calculated (4,6) and can be used, with caution, to make specific comparisons between the treatment means. A better method, however, is to partition the SS associated with the main effects and interaction into specific contrasts and to test each against the appropriate error. (1)

The factorial ANOVA, repeated measures design

The repeated measures factorial design is a special case of the split-plot type experiment in which measurements on the experimental subjects are made sequentially over several intervals of time. With two groups of subjects, the ANOVA is identical to the preceding example but with time constituting the sub-plot factor. Repeated measurements made on a single individual are likely to be highly correlated and therefore the usual post-hoc tests cannot be used.

Nevertheless, it is possible to partition the main effects and interaction SS into contrasts. In a repeated measures design, the shape of the response curve, i.e. the regression of the measured variable on time, may be of particular interest. A significant interaction between the main-plot factor and time would indicate that the response curve varied at different levels of the main-plot factor. Notice the difference between a two-factor repeated measures design in which the same sequential measurement is being made on a subject and a two-way randomised blocks design in which different treatments are being given sequentially.
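A linear-trend contrast of this kind can be sketched in Python. The data below are hypothetical (four weekly means for each of two subject groups, invented purely for illustration); the coefficients -3, -1, +1, +3 are the standard orthogonal polynomial coefficients for a linear trend across four equally spaced times:

```python
# Sketch: a linear-trend contrast on repeated measures. The group
# means are HYPOTHETICAL illustrative values, not taken from the
# article; the coefficients are the standard orthogonal polynomial
# coefficients for the linear component over four equally spaced times.

linear = [-3, -1, +1, +3]

# Hypothetical group means at weeks 1-4
group_means = {
    "control": [20.0, 20.2, 19.9, 20.1],
    "treated": [20.1, 19.0, 18.2, 17.1],
}

trend = {
    g: sum(c * m for c, m in zip(linear, means))
    for g, means in group_means.items()
}
# A near-zero contrast value indicates no linear change over time; a
# large difference between the groups' values points to a group x time
# interaction in the linear component of the response curve.
```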

Summary

The key to the correct application of ANOVA is careful experimental design and matching the correct analysis to that design. The following points should therefore be considered before designing any experiment:

1. In a single factor design, ensure that the factor is identified as a 'fixed' or 'random effect' factor.

2. In designs with more than one factor, there may be a mixture of fixed and random effect factors present so ensure that each is clearly identified.

3. Where replicates can be grouped, the advantages of a randomised blocks design should be considered. There should be evidence, however, that blocking can sufficiently reduce the error variation to counter the loss of DF compared with a randomised design.

4. Where different treatments are applied sequentially to a patient, the advantages of a three-way design in which the different orders of the treatments are included as an 'effect' should be considered.

5. Combining different factors to make a more efficient experiment and to measure possible factor interactions should always be considered.

6. The effect of 'internal replication' should be taken into account in a factorial design in deciding the number of replications to be used. Where possible, each error term of the ANOVA should have at least 15 DF.

7. Consider whether a particular factorial design can be considered to be a split-plot or a repeated measures design. If such a design is appropriate, consider how to continue the analysis bearing in mind the problem of using post-hoc tests in this situation.

References

See www.optometry.co.uk/references

Richard Armstrong BSc, DPhil and Frank Eperjesi BSc, PhD, MCOptom, DOrth, FAAO
```
Table 1: The one-way ANOVA 'random effects' model (hierarchical or
nested design) on five measurements of IOP made on a subject one
minute apart on three sample days (A).

Design

Sample Day     Day [A.sub.1]  Day [A.sub.2]  Day [A.sub.3]

Repeat              18             17             19
measurements        19             18             18
                    20             16             20
                    19             17             20
                    21             17             19

ANOVA table:

Variation        SS     DF    MS      F      ExpMS

Between days    17.73    2   8.866   10.64   [S.sub.m.sup.2] +
                                             5[S.sub.D.sup.2]

Between repeat  10.0    12   0.833           [S.sub.m.sup.2]
measurements
within days

ExpMS = expected mean square. Components of variance: between days
([S.sub.D.sup.2]) = 1.607, between repeat measurements
([S.sub.m.sup.2]) = 0.833

Table 2: Two-way ANOVA in randomised blocks with four treatment
groups (coloured filters) and with replicates also classified
into three age groups (blocks). Data are the reading rate of
the patient (number of correct words per minute).

Design Coloured filter

Red Yellow Green Blue

Age 1 84.5 72.0 70.5 39.5
Age 2 79.3 68.2 62.6 47.2
Age 3 36.0 46.1 48.9 38.0

ANOVA table:

Variation SS DF MS F

Treatments 1102.95 3 367.65 3.40 ns
Blocks 1448.98 2 724.49 6.71 *
Error 647.91 6 107.98

* P < 0.05; ns = not significant

Table 3: A three-way ANOVA with different treatments applied
in sequence to the same subject.

Example 1. Two treatments with n subjects:

Combinations AB BA

Subjects n/2 n/2

Example 2. Three treatments with n subjects:

Combinations ABC ACB BAC BCA CAB CBA

Subjects n/6 n/6 n/6 n/6 n/6 n/6

Structure of ANOVA table for three treatments and 36 subjects:

Variation SS DF MS F

Total                   107
Treatments (a)            2        [F.sub.tx]
Order (c)                 5        [F.sub.order]
Subjects (b)             35        [F.sub.subj]
Error                    65

n = number of subjects, tx = treatments, subj = subjects

Table 4: Factorial ANOVA with two drugs (A and B) given at
two levels, given (+) or not given (-) (a '[2.sup.2] factorial'
design) with six replications.

Design

Treatment combinations and orthogonal coefficients:

None (1) +A +B +AB

Treatment totals 832 853 881 966

Factorial effects A -1 +1 -1 +1
B -1 -1 +1 +1
AB +1 -1 -1 +1

Structure of ANOVA table:

Variation SS DF MS F

Total                       23
(Drugs)                     (3)
A 468 1 468 2.38 ns
B 1094 1 1094 5.57 *
AB 171 1 171 0.87 ns
Error 3926 20 196.3

A, B main effects, AB interaction effect, * P < 0.05,
ns = not significant (P > 0.05)

Table 5: Factorial ANOVA split-plot design, with two subject
groups (normal and elevated blood pressure) with three subjects
(P1, P2, P3) in each group and the left (L) and right (R) eye
studied from each patient. Data are IOPs.

Design               P1      P2      P3

Control   R eye     17.3    16.9    14.7
          L eye     17.1    16.5    14.3

Elevated  R eye     21.4    24.3    21.4
BP        L eye     20.7    22.1    24.2

ANOVA table:

Variation           SS      DF     MS       F

Subject group      115.94    1   115.94    34.4 **
Main-plot error     13.48    4     3.37
Right/Left eye       0.10    1     0.10     0.06 ns
Interaction          0.067   1     0.067    0.04 ns
Sub-plot error       6.59    4     1.65

** P < 0.01; ns = not significant
```