# Exploring pattern analysis with sycamore aphids

Many students enjoy biology as a qualitative science but struggle with its quantitative aspects. Yet detecting patterns and testing hypotheses about their causes is a central aim of biological inquiry, and that is the goal of the laboratory analysis described in this article. This exercise involves the students right away in collecting data from living organisms, calculating some basic statistics, and formally testing a hypothesis. This fall will be the 10th year I've used this exercise at the college freshman level, but I believe this lab could be used on almost any campus and adapted to any class size. The equipment requirements are minimal and the protocol is simple to follow. The real value lies in empowering students to work with their own data and to see that patterns in biology can be approached quantitatively (requiring only algebra), all critical tools for inquiry-based learning.

The aphid of study, likely to be found on practically any sycamore tree in North America, is Drepanosiphum platanoides (Schrank), a relatively large "pale or dark green or reddish yellow species with the wing veins slightly dusky" (Essig, 1958, p. 234). A few other aphids may be found on sycamore leaves (Drepanaphis spp., Periphyllus spp.) and may be used instead. The point is to illustrate how individuals in a population may be distributed and to test a null hypothesis of randomness that, if rejected, suggests a host of possible biotic or abiotic causes. To activate student interest in the project, solicit their own ideas for causes after a brief discussion of aphid biology. What follows is the handout I give to the students on the first day; they have a week to complete the lab. After the handout is a section for the instructor on implementing the techniques used, along with some possible extensions.

Purpose

In this lab you will practice important aspects of biological inquiry: observation, data collection, data analysis, and scientific writing. You will work in pairs and do the lab on your own time.

The objectives of this laboratory are:

1. to become familiar with how to calculate a sample mean and variance

2. to become familiar with population sampling and data analysis

3. to collect your own data from live organisms and puzzle over a real biological pattern.

Ecology has been defined as the study of the distribution and abundance of organisms (Andrewartha & Birch, 1954). This is because patterns we see may reflect and reveal unseen ecological relationships. Thus the analysis of spatial pattern can lead to a better understanding of the biotic and abiotic forces that affect organisms.

Our null hypothesis may be that individuals of a species are distributed at random, implying no particular interactions with each other or the environment. For example, when cottonwood seeds drift down onto a newly formed Platte River sandbar, they may land randomly (that is, any chosen square meter of sandbar has an equal probability of receiving a seed). Later, the emerging seedlings may not show a random pattern.

Pattern analysis might reveal a population of organisms to be aggregated (clumped) or evenly distributed (regular). Both terms refer to a deviation from randomness that can be quantified. One must use care in selecting a technique for sampling organisms that will not bias one's results. Also, different techniques and scales will have different powers of detection. For example, consider that a newspaper photo represents an image at a distance but only dots of various sizes on very close inspection.

What we choose to measure, count, etc. has a great influence on the pattern we detect. In some cases, sampling units (SUs) present themselves naturally (e.g., leaves on a plant). In other cases we may choose the size, shape, and location of the SUs. Over a century ago, two University of Nebraska graduate students (Roscoe Pound & Frederick Clements, 1898) championed the use of the quadrat in plant community analysis. This is basically a square meter frame placed either randomly or systematically on the ground, and all plants in that frame are counted. In this exercise we will use "natural" SUs.

The Poisson Probability Distribution

If organisms are distributed at random, then each SU has an equal probability of containing an individual, independent of the number of individuals it already contains. This does not predict that all SUs will have the same number of individuals in them. The shape of the Poisson series depends on the mean (or expected) number of individuals per SU. This is the total number of individuals divided by the number of possible SUs. Figure 1 plots the Poisson probability (or proportion) of SUs having 0, 1, 2 ... individuals in them on the vertical axis (the ordinate) vs. the number of individuals per SU on the horizontal axis (the abscissa) for two different means. The variance is a measure of the spread of values on either side of the mean. You can see the distribution with the higher mean (black bars) also has a greater variance. In Figure 2 the gray bars show an expected frequency distribution of individuals/SU if 25 SUs are sampled from a site where the mean is 1.3 individuals/SU and the distribution is random. If the black bars represented some observed values of numbers of individuals per SU, would this be random? We see right away that we observed noticeably more empty SUs than we expected from the random distribution, while several other categories held fewer SUs than expected. Statistical tests can tell us if this probably reflects a true deviation from randomness.

[FIGURE 1 OMITTED]

[FIGURE 2 OMITTED]
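The expected (gray bar) frequencies described above can be sketched in a few lines of Python. This is only an illustration, not part of the original handout: it assumes the standard Poisson formula P(x) = e^(-m) m^x / x!, which the text does not write out, and the function names are hypothetical.

```python
import math

def poisson_probability(x, mean):
    """Probability that one sampling unit (SU) holds exactly x individuals
    when individuals are distributed at random (standard Poisson formula)."""
    return math.exp(-mean) * mean ** x / math.factorial(x)

def expected_frequencies(mean, n_units, max_count):
    """Expected number of SUs holding 0, 1, ..., max_count individuals."""
    return [n_units * poisson_probability(x, mean) for x in range(max_count + 1)]

# Values taken from the text: 25 SUs sampled where the mean is 1.3 individuals/SU.
expected = expected_frequencies(mean=1.3, n_units=25, max_count=5)
for count, freq in enumerate(expected):
    print(f"{count} individuals/SU: {freq:.1f} SUs expected")
```

Comparing these expected values with your own observed counts, category by category, is a quick way to see where a sample departs from randomness before any formal test.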

When we ask, "What is the probability of finding x individuals in any given SU?" and individuals are distributed at random, the answer follows the Poisson probability distribution, which has a distinctive property: the mean equals the variance. If the observed frequency distribution has this property too, then we have evidence of randomness. Hence our null hypothesis is

$H_0$: the mean ($\bar{x}$) = the variance ($s^2$).
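To see why the mean and variance should be nearly equal under randomness, one can simulate it. The sketch below is an assumption-laden illustration (the function names, the seed, and the choice of 2000 SUs are all hypothetical): it drops individuals one at a time into SUs chosen at random and then compares the sample mean with the sample variance.

```python
import random

def simulate_random_counts(n_individuals, n_units, seed=0):
    """Place each individual in an SU chosen uniformly at random,
    independent of where earlier individuals landed."""
    rng = random.Random(seed)
    counts = [0] * n_units
    for _ in range(n_individuals):
        counts[rng.randrange(n_units)] += 1
    return counts

def mean_and_variance(counts):
    """Sample mean and sample variance (divisor N - 1) of counts per SU."""
    n = len(counts)
    mean = sum(counts) / n
    variance = sum((x - mean) ** 2 for x in counts) / (n - 1)
    return mean, variance

# Hypothetical large sample with the same mean (1.3 individuals/SU) as the text.
counts = simulate_random_counts(n_individuals=2600, n_units=2000)
mean, variance = mean_and_variance(counts)
print(f"mean = {mean:.2f}, variance = {variance:.2f}, ratio = {variance / mean:.2f}")
```

With many SUs the ratio should come out close to one; with only 25 SUs it will wander more, which is why a formal test is needed.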

Statistics

Note that our observed (sampled) frequency distribution is never the actual distribution unless we exhaustively count every individual in the area. So our sample mean and variance (statistics) are approximations of the true population mean and variance (parameters). In general, the greater our sample size (number of SUs), the closer the approximation and the more confidence we place in our test of randomness. We could test whether our observed data fit the Poisson with a chi-square test or by comparing the ratio of the variance to the mean (which we will call the index of dispersion) to one.

We'll use a t-test described in Brower, Zar and von Ende (1990) and based on Clapham (1936) to see if the index of dispersion varies significantly from one. In a t-test, you calculate a t-value from your data and compare it to a critical t-value, which is found in statistical tables for any chosen level of significance (usually 0.05) and N-1 degrees of freedom (number of SUs minus one). See Rohlf and Sokal (1994) or visit http://www.biology.ed.ac.uk/research/groups/jdeacon/statistics/table1.html#Student's%20t%20test for examples.

Calculations

The mean of a series of observations is the sum ($\Sigma$) of the number of individuals observed in each SU ($x_i$) divided by the number of sampling units ($N$):

$$\bar{x} = \frac{\sum x_i}{N}$$

The variance is the sum of the squared deviations of each observation ($x_i$) from the sample mean divided by N-1. The formal definition and a formula useful with calculators are below:

$$s^2 = \frac{\sum (x_i - \bar{x})^2}{N - 1} = \frac{\sum x_i^2 - \left(\sum x_i\right)^2 / N}{N - 1}$$

The index of dispersion is simply the ratio of the variance to the mean:

$$ID = \frac{s^2}{\bar{x}}$$

An Example

Suppose you counted individuals in 25 SUs and of those, 10 SUs had none, seven SUs had 1, three SUs had 2, two SUs had 3, two SUs had 4 and one SU had 5. This is as shown by the black bars in Figure 2. Your mean would be calculated as:

$$\bar{x} = \frac{\sum x_i}{N} = \frac{(10)(0) + (7)(1) + (3)(2) + (2)(3) + (2)(4) + (1)(5)}{25} = \frac{32}{25} = 1.28 \approx 1.3$$

The variance is found by the calculator formula:

$$s^2 = \frac{\sum x_i^2 - \left(\sum x_i\right)^2 / N}{N - 1} = \frac{94 - (32)^2 / 25}{24} = \frac{94 - 40.96}{24} = 2.21 \approx 2.2$$

Notice the order of operations: in the first term, $\sum x_i^2$ is the sum of all squared observations; in the second, $\left(\sum x_i\right)^2 / N$ is the sum of all observations, squared and then divided by N. The second is subtracted from the first before division by N-1.

The index of dispersion is: ID = 2.2/1.3 = 1.7
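The arithmetic in this example can be checked with a short Python sketch. The frequency table is the one given above; the function names are hypothetical, and the code is only an illustration of the formal and calculator forms of the variance, not a required part of the lab.

```python
# Frequency table from the worked example: {individuals per SU: number of SUs}
frequency_table = {0: 10, 1: 7, 2: 3, 3: 2, 4: 2, 5: 1}

def expand_counts(table):
    """Turn the frequency table into one count per SU (25 values here)."""
    counts = []
    for individuals, n_sus in sorted(table.items()):
        counts.extend([individuals] * n_sus)
    return counts

def sample_mean(counts):
    return sum(counts) / len(counts)

def variance_definition(counts):
    """Formal definition: squared deviations from the mean, divided by N - 1."""
    m = sample_mean(counts)
    return sum((x - m) ** 2 for x in counts) / (len(counts) - 1)

def variance_calculator(counts):
    """Calculator form: (sum of squares - (sum squared)/N) / (N - 1)."""
    n = len(counts)
    sum_of_squares = sum(x * x for x in counts)
    square_of_sum_over_n = sum(counts) ** 2 / n
    return (sum_of_squares - square_of_sum_over_n) / (n - 1)

counts = expand_counts(frequency_table)
mean = sample_mean(counts)
variance = variance_calculator(counts)
index_of_dispersion = variance / mean
print(f"N = {len(counts)}, mean = {mean:.2f}, variance = {variance:.2f}, "
      f"ID = {index_of_dispersion:.2f}")
```

Both variance functions should agree. Because the code carries unrounded values, its index of dispersion may differ slightly in the second decimal from the hand calculation, which rounds the mean and variance first.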

Now we calculate our t value:

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII.]

We compare this to the critical t-value (from a t table) for $\alpha = 0.05$ * and $N - 1 = 24$ degrees of freedom:

$t = 2.3 > t_{\text{crit}} = 2.06$, so we reject the null hypothesis that this follows a Poisson distribution.

Discussion

Since our calculated t-value is greater than the critical t-value, we can say (with 95% confidence) that our sampled population does not follow a random distribution. The critical value is for a two-tailed test which means 2.5% of the probability is in the upper tail of the distribution and 2.5% is in the lower tail. We use this because our null hypothesis states that the index of dispersion equals one, which leaves two logical alternatives: greater than one and less than one. If our t-value had been less than 2.06, we could not reject the null hypothesis. If we reject the Poisson distribution, then either: a) each SU doesn't have the same probability of an individual, or b) having an individual affects the probability of another individual in the same SU.

What is our alternative hypothesis? If the ID is significantly greater than one, the variance is greater than the mean: a clumped (or contagious) distribution. If the ID is less than one, the mean is greater than the variance: a uniform or even distribution. Try to imagine what a bar graph of the frequency distribution (like Figure 1) would look like for each of these cases. If we could rule out abiotic causes, then what biotic interactions could be responsible for these distributions? Clapham (1936) originally used the index of dispersion for plants distributed among quadrats in an eastern Nebraska prairie and discussed several possible biotic interactions. In a series of papers, Dixon explored biotic factors affecting aggregation in sycamore aphids (Dixon, 1966; Dixon & McKay, 1970; Dixon & Logan, 1972) but also noted occasions where exposure to weather was a main factor.
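The decision logic of the last two paragraphs can be summarized as a small Python sketch. The function name and the returned labels are hypothetical, and taking the absolute value of t is an assumption made here so that one comparison covers both tails of the two-tailed test.

```python
def classify_dispersion(index_of_dispersion, t_value, t_critical):
    """Interpret an index-of-dispersion test.

    Assumption: the magnitude of t is compared with the two-tailed critical
    value, so both ID > 1 and ID < 1 can lead to rejecting randomness.
    """
    if abs(t_value) <= t_critical:
        return "random (null hypothesis not rejected)"
    if index_of_dispersion > 1:
        return "clumped"   # variance greater than the mean
    return "uniform"       # mean greater than the variance

# Values from the worked example: ID = 1.7, t = 2.3, critical t = 2.06
print(classify_dispersion(1.7, 2.3, 2.06))  # prints "clumped"
```

The labels only name the statistical pattern; deciding which biotic or abiotic cause lies behind it is the biological question the lab is meant to raise.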

[FIGURE 3 OMITTED]

Implementation & Extensions

As a sampling unit, a sycamore leaf can contain hundreds of aphids, though counts are commonly much lower. One would not want to attempt this on a very windy day, and wind does influence aphid behavior. Generally, my students' counts are manageable for the calculation of the mean and variance. This task alone is daunting enough for some and many will seek help, which I think is a good thing. Those who know become the teachers of those who don't. They also face the new terminology of statistical hypothesis testing, which they will use throughout the course, and no doubt in other courses. For many, this will be the first time they have considered mathematical representations of populations, which will be important as they assault Hardy-Weinberg and other concepts. Even the formal definition of randomness is a powerful concept.

In an excellent but hard-to-find source of ecology lab ideas, Wratten and Fry (1980) feature labs on pattern analysis and suggest (for another analysis) the use of sycamore aphids. Brower, Zar, and von Ende (1990) present the t-test statistic used here, but my search of publications about the variance-to-mean ratio test could not find a precedent for it. They attribute it to Clapham (1936), who applied the index of dispersion but did not use a t-test. Most authors have used $\chi^2$ tests of some sort, and many multiply the index of dispersion by n-1, yet these same authors, after Fisher (1950), consistently note the problem of this test when expected counts in any category are less than 5. The performance of many tests is contrasted in Karlis and Xekalaki (2000), who conclude it is " ... reasonable to consider the use of the index of dispersion as the test statistic. The advantages of this test are that it can be used with small samples, it is more sensitive than the usual $\chi^2$ test and using the value of its test statistic we can detect the sort of deviations from the Poisson assumption" (p. 359). Their Equation 8 gives a test statistic they deemed "most powerful" against a negative binomial distribution (which describes most aggregated patterns) that is algebraically equivalent to the t-test statistic used for this exercise.
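For instructors who want to show the chi-square form mentioned above, the following Python sketch multiplies the index of dispersion by n-1. It is a hedged illustration only: the function name is hypothetical, the inputs are the rounded mean and variance from the handout example, and the resulting value would still have to be compared with a chi-square table at n-1 degrees of freedom.

```python
def dispersion_chi_square(variance, mean, n_units):
    """Index of dispersion multiplied by (n - 1), with its degrees of freedom."""
    index_of_dispersion = variance / mean
    degrees_of_freedom = n_units - 1
    return degrees_of_freedom * index_of_dispersion, degrees_of_freedom

# Rounded values from the handout example: variance 2.2, mean 1.3, 25 SUs.
statistic, df = dispersion_chi_square(variance=2.2, mean=1.3, n_units=25)
print(f"chi-square statistic = {statistic:.1f} with {df} degrees of freedom")
```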

Having performed a t-test, my students will better understand the t-test they use in their next lab to compare sample means, even though Excel will do the calculations. When students do the calculations correctly, they almost always reject the null hypothesis, as these aphids tend to be highly aggregated. Asking them why can lead to interesting discussion, particularly when they discover the parthenogenetic mode of reproduction in aphids. In a take-home assessment quiz, I ask them to look for these dispersion patterns in other species and apply the index of dispersion test. For example, they may count birds on a wire, where the span between adjacent poles is the SU, or galls on hackberry leaves, or a species' abundance in quadrats in a prairie or old field.

While students do not design their own question and protocol as in a true inquiry-based lab, when analyzing patterns in nature there is no one "right answer," but many ways to find answers and many more questions. This simple investigation provides fodder for their first write-up using a scientific style. They have a more personal stake in their own data, and this makes the whole process more valid to them. If time allows, the students should be guided to see the relevance of distribution patterns to species they have an interest in, and given opportunities to look at aphids closely and develop some of their own questions. The sooner they can gain confidence in framing and testing hypotheses, the sooner they can appreciate that biology isn't just a great can of facts someone opened and poured in a textbook. And if this activity alleviates some of the math phobia they have, so much the better.

Acknowledgments

Matthew Beachly assisted me in producing the figures. Dr. Amy Morris worked with the students to develop reports in scientific paper format for this lab. I would also like to thank two anonymous reviewers for their comments.

References

Andrewartha, H.G. & Birch, L.C. (1954). The Distribution and Abundance of Animals. Chicago: University of Chicago Press.

Brower, J.E., Zar, J.H. & von Ende, C.N. (1990). Field and Laboratory Methods for General Ecology, 3rd Ed. Dubuque, IA: W.C. Brown.

Clapham, A.R. (1936). Overdispersion in grassland communities and the use of statistical methods in plant ecology. The Journal of Ecology, 24, 232-251.

Deacon, J. The really easy statistics site. Biology Teaching Organisation, University of Edinburgh. http://www.biology.ed.ac.uk/research/groups/jdeacon/statistics/table1.html#Student's%20t%20test.

Dixon, A.F.G. (1966). The effect of population density and nutritive status of the host on the summer reproductive activity of the sycamore aphid, Drepanosiphum platanoides (Schr.). The Journal of Animal Ecology, 35, 105-12.

Dixon, A.F.G. & McKay, S. (1970). Aggregation in the sycamore aphid, Drepanosiphum platanoides (Schr.) (Hemiptera: Aphididae) and its relevance to the regulation of population growth. The Journal of Animal Ecology, 39, 439-54.

Dixon, A.F.G. & Logan, M. (1972). Population density and spacing in the sycamore aphid, Drepanosiphum platanoides (Schr.), and its relevance to the regulation of population growth. The Journal of Animal Ecology, 41, 751-9.

Essig, E.O. (1958). Insects and Mites of Western North America; A Manual and Textbook for Students in Colleges and Universities and a Handbook for County, State, and Federal Entomologists and Agriculturalists as well as for Foresters, Farmers, Gardeners, Travelers, and Students of Nature, 2nd Ed. New York: Macmillan.

Fisher, R.A. (1950). The significance of deviations from expectation in a Poisson series. Biometrics, 6(3), 17-24.

Karlis, D. & Xekalaki, E. (2000). A simulation comparison of several procedures for testing the Poisson assumption. The Statistician, 49, 355-82.

Pound, R. & Clements, F.E. (1898). A method of determining the abundance of secondary species. Minnesota Botanical Studies, 2, 19-24.

Rohlf, F.J. & Sokal, R.R. (1994). Statistical Tables, 3rd Ed. San Francisco, CA: W.H. Freeman.

Wratten, S.D. & Fry, G.L.A. (1980). Field and Laboratory Exercises in Ecology. London: Edward Arnold, Ltd.

* the probability of rejecting the null hypothesis when it is true

WILLIAM BEACHLY is Professor in Biology, Hastings College, Hastings, Nebraska 68901; e-mail: wbeachly@hastings.edu.


Title Annotation: INQUIRY & INVESTIGATION

Author: Beachly, William

Publication: The American Biology Teacher

Article Type: Report

Geographic Code: 1USA

Date: Nov 1, 2008

Words: 2800
