A NULL MODEL FOR DETECTING NONRANDOM PATTERNS OF SPECIES RICHNESS ALONG SPATIAL GRADIENTS.
Abstract. I present a null model that can be used for detecting nonrandom patterns of species richness along spatial gradients, such as latitude and elevation. Because estimates of species richness along a single spatial gradient are nonindependent, tests of statistical inference should not be applied. The null model described here circumvents this problem of nonindependence by randomly placing the actual range width of each species along the gradient. The simulated richness curve generated in this way can be compared to an "average" random curve composed of many such simulations. The comparison involves measuring the mean displacement (D) of the simulated curve from the average random curve. By repeating these steps a specified number of times, one can obtain a distribution of D values. The displacement of the actual species richness curve from the average random curve can also be determined and compared to the distribution of D values of the simulated random curves. This comparison allows one to determine whether the actual curve is random. The null model was found to be very powerful (power > 0.96) for curves consisting of at least 10 richness estimates from a total pool of 20 or more species.
Key words: nonindependent estimates of species richness; nonrandom pattern; null model; power analysis; randomization of data; Rapoport's rule; spatial gradient; species richness.
Species richness often changes along spatial gradients such as latitude (Stevens  and references within; Pagel et al. 1991, France 1992, Rohde 1992, Ruggiero 1994) and elevation (Graham 1990, McCoy 1990, Rahbek 1995, 1997). When ecologists study these gradients, they sometimes find statistically significant patterns of species richness, without fully considering whether the assumptions of their test of significance are satisfied. An example of this casual use of statistics is the purported decrease in species richness from low to high elevations (Stevens 1992). Using regression analysis on several data sets, Stevens (1992) found that as elevation increases species richness decreases, while the range widths of species actually increase with elevation. Stevens (1989, 1992) named this latter pattern "Rapoport's rule," in recognition of the earlier work of Eduardo Rapoport (see Rapoport 1982). However, the validity of Rapoport's rule has been called into question for several reasons (Gaston et al. 1998), most notably, the potential misapplication of statistics used to test for increases in range width along gradients (Rohde et al. 1993, Roy et al. 1994).
Stevens (1992) used regression to test for significant increases in mean range width of species at sites along elevational gradients. However, because adjacent sites share species, the mean range widths at sites are not independent; tests of significance on nonindependent data are not valid. Estimates of species richness at sites along spatial gradients also may not be independent, thereby precluding use of statistical tests of significance on such data. In studies of species richness along spatial gradients, data from regional surveys of species occurrences (range distributions), or occurrences along elevational line transects, are used to obtain estimates of species richness at different sites. In the case of latitudinal data, a "site" is a latitudinal band, and the data come from regional surveys. In the case of elevational data, a site may be an actual point or zone along a real elevational transect (e.g., a line transect up the side of a mountain). Alternatively, if the data come from regional surveys, a site may be a particular elevation or elevational zone, even though an actual physical site of surveying does not exist anywhere. Regardless of the method of compiling data, one ends up with estimates of species richness at sites along a spatial gradient. From there, one could test for pattern by applying some type of significance test, such as a correlation or regression analysis. However, if the data points are not independent, the P values resulting from such tests are not valid.
Patterns of species richness along spatial gradients are dependent upon the range widths of species. Colwell and Hurtt's (1994) null model indicates the importance of hard geographical boundaries (at the ends of the gradient) in constraining range widths, and thereby influencing species richness patterns (also see Pineda and Caswell 1998). The random placement of ranges of random widths along a spatial gradient can give rise to statistically significant patterns of species richness, when significance is tested by traditional (but inappropriate) methods, such as regression or correlation analysis. This is because adjacent sites along a gradient will tend to share species (Fig. 1a). Imagine a typical bivariate plot with the dependent variable, species richness, along the y-axis, and the independent variable, elevation (or latitude), along the x-axis. The species richness estimate at any given point along the x-axis is constrained to be very similar to the previous estimate of richness. The bivariate points will tend to have a smooth linear or curvilinear form (Fig. 1b). Therefore, obtaining a low P value in a test of the relationship between species richness at sites and locations of the sites along a spatial gradient is insufficient evidence to conclude that some biological mechanism is at work. Patterns of species richness along spatial gradients should not be tested for structure with traditional probability-based tests of inference. Instead, one should use a null-model test that involves randomly placing observed range widths along a spatial gradient (Pineda and Caswell 1998). From a null-model test, one can obtain an estimate of the probability that the observed pattern of species richness could have occurred by chance alone. Here, I describe a null model that does this, and I show that it is powerful, flexible, and easy to use (for software, see Supplementary Material).
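The nonindependence argument can be illustrated with a small simulation. The following Python sketch is only an illustration under stated assumptions, not the software referred to in the text: the function names, the uniform distribution of range widths, the 100 species, the 50 sites, and the use of Jaccard similarity as a measure of shared species are all hypothetical choices made here. It places random ranges within a bounded gradient and compares how many species adjacent sites share with how many the two end sites share.

```python
import random

def random_ranges(n_species, gw, rng):
    """Random widths, randomly placed, never crossing the gradient's hard bounds."""
    ranges = []
    for _ in range(n_species):
        width = rng.uniform(0, gw)        # assumption: widths uniform on [0, gw]
        low = rng.uniform(0, gw - width)  # the whole range stays inside [0, gw]
        ranges.append((low, low + width))
    return ranges

def species_at(x, ranges):
    """Indices of the species whose range includes location x."""
    return {i for i, (low, high) in enumerate(ranges) if low <= x <= high}

def jaccard(a, b):
    """Shared species divided by total species of two sites (0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

rng = random.Random(1)
gw, n_sites = 1000.0, 50                  # hypothetical gradient width and site count
ranges = random_ranges(100, gw, rng)
sites = [(j + 0.5) * gw / n_sites for j in range(n_sites)]  # zone midpoints
assemblages = [species_at(x, ranges) for x in sites]
richness = [len(s) for s in assemblages]

# Similarity of neighbouring sites versus similarity of the two end sites.
adjacent = [jaccard(assemblages[j], assemblages[j + 1]) for j in range(n_sites - 1)]
mean_adjacent = sum(adjacent) / len(adjacent)
ends = jaccard(assemblages[0], assemblages[-1])

print("richness curve:", richness)
print("mean adjacent similarity:", round(mean_adjacent, 2))
print("similarity of end sites:", round(ends, 2))
```

Because neighbouring sites draw on nearly the same set of species, the richness values form a smooth curve even though no biological mechanism was simulated, which is why a regression P value computed from such points is not trustworthy.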
The null model
The null model is intended to be used with real data; it is not a simulation model attempting to discover patterns that would exist under different assumptions, as in the model of Colwell and Hurtt (1994). The output of the model is a P value that the user can consider when judging whether a nonrandom pattern of species richness exists in the data. The user begins by inputting the following information in the model: total number of species, high point and low point of the gradient, number of sampling sites or zones, the observed species richness at each site or in each zone (the species richness curve), and the observed range of each species along the gradient. The model consists of the following steps:
1) The observed range width of each species is randomly placed along the gradient, such that the width remains constant (i.e., the range widths are not truncated).
2) The simulated richness curve is obtained from the dispersion of species resulting from step 1 and from the number of sites or zones along the gradient.
3) The simulated curve is compared to a composite "average" curve, by measuring the mean displacement of the simulated curve from the composite curve (Fig. 1b). This displacement, referred to as D, equals $\left(\sum_{j=1}^{n} d_j\right)/n$, where $d_j$ is the absolute difference at site j between the species richness of the curve being compared (here, the simulated curve) and the species richness from the composite curve, for sites j = 1, 2, ..., n, and n is the total number of sites (or zones) along the gradient.
4) Steps 1-3 are repeated a specified number of times to obtain a distribution of the test statistic, D.
5) The D value of the observed species richness curve is obtained in the same way and compared to the distribution of null D values obtained in step 4. The significance of the real D value is given as a P value, which is the proportion of null D values greater than the real D value. Notice that a null D value less than the real D value comes from a null species richness curve that is more similar to the composite curve than is the observed species richness curve.
The "average" composite curve is obtained by repeating steps 1 and 2 a specified number of times (>100; preferably 1000). The mean species richness is then calculated for each site, and those values form the curve (Fig. 1b). This composite curve serves as a reference by which to measure nonrandomness in other curves. If an observed species richness curve is truly nonrandom, it will be very different from the composite curve and, hence, will have a very large D value and very small P value.
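The five steps and the composite curve can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the archived software: all function names are hypothetical, sites are treated as point locations, the observed richness curve is derived from the observed ranges with the same counting rule used for the simulated curves, and range starting points are drawn from a uniform distribution.

```python
import random

def richness_curve(ranges, sites):
    """Step 2: count, for each site location, the ranges (low, high) that include it."""
    return [sum(low <= x <= high for low, high in ranges) for x in sites]

def place_randomly(widths, grad_low, grad_high, rng):
    """Step 1: place each observed width at random, untruncated, inside the gradient."""
    placed = []
    for w in widths:
        low = rng.uniform(grad_low, grad_high - w)  # assumption: uniform placement
        placed.append((low, low + w))
    return placed

def mean_displacement(curve, composite):
    """Step 3: D = mean absolute difference between a curve and the composite curve."""
    return sum(abs(a - b) for a, b in zip(curve, composite)) / len(curve)

def null_model_test(ranges, sites, grad_low, grad_high,
                    n_composite=1000, n_null=1000, seed=0):
    """Return (observed D, P value) for the observed ranges."""
    rng = random.Random(seed)
    widths = [high - low for low, high in ranges]

    def simulate():
        return richness_curve(place_randomly(widths, grad_low, grad_high, rng), sites)

    # Composite "average" curve: mean richness per site over many random placements.
    sims = [simulate() for _ in range(n_composite)]
    composite = [sum(column) / n_composite for column in zip(*sims)]

    # Step 4: repeat steps 1-3 to build the null distribution of D.
    null_d = [mean_displacement(simulate(), composite) for _ in range(n_null)]

    # Step 5: P = proportion of null D values greater than the observed D.
    d_obs = mean_displacement(richness_curve(ranges, sites), composite)
    p_value = sum(d > d_obs for d in null_d) / n_null
    return d_obs, p_value

# Hypothetical data: five species on a gradient from 0 to 1000, ten sites.
example_ranges = [(0, 300), (100, 500), (200, 900), (50, 250), (400, 1000)]
example_sites = [50 + 100 * j for j in range(10)]
print(null_model_test(example_ranges, example_sites, 0, 1000))
```

Keeping the composite curve separate from the null distribution mirrors the description above, where the reference curve is built from its own set of repetitions of steps 1 and 2.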
Power analysis of the null model test
I conducted a power analysis of the null model, so as to determine its ability to detect nonrandom species richness curves. For the power analysis, it was necessary to produce some simulated richness curves with a known amount of structure (i.e., nonrandom curves). However, it was not possible to determine absolute effect sizes represented by these curves. The simulated curves were then tested with the null model. Because the curves have structure, the model was expected to indicate that they were nonrandom by returning a low P value. In this way, the power of the model was estimated for different combinations of the input variables. The following input parameters were varied: gradient width (gw), number of species (S), and number of zones or sites (z). The structure in the curves arises from a fixed gradient width, a linear increase in the range widths of species along the gradient, and a factor called the midpoint location factor (MLF), which specifies the placement of the midpoints of the ranges along the gradient. The linear increase in range widths is due to a factor called the range increase factor (RIF).
The RIF increases the range width of each species as a specified percentage of the gradient width plus an additional random increment ≤ 0.01gw:
range width of species A = (gw × A × RIF) + a small random increment
for species A = 1, 2, ..., S. The total number of species, S, equals 1/RIF, so setting the value of RIF also determines the number of species. Because RIF is based on the gradient width, having S = 1/RIF guarantees that most species will have a range width less than the gradient width. However, because a small random increment is added to the range width, the model occasionally produces species whose range widths slightly exceed the gradient width. For those cases, range width is reset to equal the gradient width.
The MLF determines the location of the midpoint of each range. It can range from 0 to 0.5. Without the midpoint location factor (i.e., MLF = 0), each range begins at zero and, therefore, has a midpoint that is one-half the range width. The MLF essentially redefines where each range begins, because the difference between gw and the range width is multiplied by the MLF. If $min_{new}$ is defined as the new beginning point for a range, then $min_{new} = (gw - \text{range width}) \times MLF$. For instance, if gw = 2000, MLF = 0.25, and the range width of a particular species is 200, then the range of that species begins at (2000 - 200) × 0.25 = 450 (not zero), and its midpoint is at 550, not 100. The midpoints were always constrained to be < 0.5gw. The structure of the curve is most clearly indicated by the MLF value, because as MLF approaches 0.5 the peak in species richness of the simulated curve occurs closer to the exact middle of the gradient. Curves without structure (i.e., curves with random range widths randomly placed along a gradient) have midgradient peaks in species richness; thus an MLF of 0.5 essentially duplicates a curve without structure.
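The construction of the structured curves can be sketched as follows. This is a hedged illustration rather than the code used for the power analysis: the function names are hypothetical, the random increment is assumed to be uniform on [0, 0.01gw], and the starting point is computed as (gw - range width) × MLF, the reading that reproduces the worked example of 450.

```python
import random

def range_start(gw, width, mlf):
    """New beginning point of a range: (gw - range width) * MLF."""
    return (gw - width) * mlf

def structured_ranges(gw, rif, mlf, rng=None):
    """Return (low, high) ranges for S = 1/RIF species with linearly increasing widths."""
    rng = rng or random.Random(0)
    n_species = round(1 / rif)             # S = 1/RIF
    ranges = []
    for a in range(1, n_species + 1):
        # Width grows linearly with species index, plus a small random increment
        # (assumption: increment drawn uniformly from [0, 0.01 * gw]).
        width = gw * a * rif + rng.uniform(0, 0.01 * gw)
        width = min(width, gw)             # reset widths that exceed the gradient
        low = range_start(gw, width, mlf)
        ranges.append((low, low + width))
    return ranges

# Worked example from the text: gw = 2000, MLF = 0.25, range width = 200.
start = range_start(2000, 200, 0.25)
print(start, start + 200 / 2)              # beginning point and midpoint

# Twenty species (RIF = 0.05) on a gradient of width 1000 with MLF = 0.25.
for low, high in structured_ranges(1000, 0.05, 0.25)[:3]:
    print(round(low, 1), round(high, 1))
```

With MLF = 0 every range starts at zero, so richness declines steadily along the gradient; with MLF = 0.5 every range is centered on the middle of the gradient.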
For MLF values of 0, 0.25, and 0.5, I produced a set of 25 simulated structured curves for all possible combinations of the following input variables: gw = 1000, 3000, and 5000; z = 10, 30, and 50 zones; S = 20, 40, 60 species. Each curve was then tested with the null model. The proportion of significant curves in each set of 25 curves is an estimate of the power of the null model, for each combination of input variables. The simulated curves varied in where the peak in species richness occurred: peak in the middle (MLF = 0.5), peak halfway between the low end of the gradient and the middle (MLF = 0.25), and peak at the very lowest end with a steady decline in richness (MLF = 0). Of course, peaks toward the high end of the gradient are possible, and it is possible to have a steady increase in species richness. There was no need to test these types of curves, because the power estimates should be symmetrical around MLF = 0.5. That is, the power for detecting a nonrandom curve with MLF = 0.25 should be identical to the power for detecting one with MLF = 0.75.
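The power estimate itself is a simple proportion, as the following sketch shows. The function name is hypothetical, and counting a curve as significant when P < α (rather than P ≤ α) is an assumption made here; the text does not state which rule was used.

```python
def estimate_power(p_values, alphas=(0.05, 0.025, 0.01, 0.001)):
    """Proportion of simulated curves judged significant at each alpha.

    Assumption: a curve counts as significant when its P value is < alpha.
    """
    n = len(p_values)
    return {alpha: sum(p < alpha for p in p_values) / n for alpha in alphas}

# Hypothetical P values for one set of 25 structured curves.
p_values = [0.0] * 22 + [0.03] * 3
print(estimate_power(p_values))
```

With 25 curves per set, every estimate is a multiple of 1/25 = 0.04, which matches the resolution of the power values reported in Table 1.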
RESULTS AND DISCUSSION
Preliminary testing of the model showed it to have low power (<0.5) when the total number of species S < 20 and number of zones z < 10, irrespective of gradient width (gw) and midpoint location factor (MLF). Therefore, I began the testing at those values of species number and zone number. The model was found to be very powerful (power > 0.96) when the data included species richness estimates from ≥30 zones, regardless of gradient width or the total number of species. The only exceptions were when S = 20 species and MLF = 0.5 (Figs. 2 and 3). This is not surprising, given that an MLF value of 0.5 represents a species richness curve that closely resembles species richness curves that are completely random. When MLF = 0, the power of the null model test was always >0.96 for curves divided into 10 zones (Table 1) or when z > 10 (results not shown). Additionally, when MLF = 0.5, the power of the model was always zero when tested on species richness curves with S ≤ 20 species distributed among ≤10 zones (Table 1); however, the power increased substantially as the number of zones and species increased, even when MLF = 0.5 (results not shown). With an increasing number of species, the power approached unity (Fig. 2), and, overall, the model was found to be powerful at a significance level of α = 0.05, as long as the peak in species richness was not located in the middle of the curve (i.e., MLF ≠ 0.5) (Table 1). Gradient width (also zone width) had an interesting effect on power for some simulated curves. When S = 20 species, z = 30 or 50 zones, and MLF = 0.5, the model was the least powerful for intermediate gradient widths (i.e., gw = 3000 vs. 1000 and 5000) (Fig. 3).
The null model provides a powerful test for detecting nonrandom patterns of species richness along spatial gradients when the total pool of species is ≥20 and the gradient is divided into ≥10 zones. These minimal requirements are probably met by most data sets. However, ecologists often study species richness using data from a series of point samples along an elevational or latitudinal transect. Such data can be "converted" into a form in which species occurrences within zones are reported, so that the present null model can be applied. Alternatively, the null model can be modified for data arising from point samples. In such a case, the points may be approximately equally spaced along the x-axis, such that the form of the data may or may not approximate that presented in Fig. 1. In applying the model to point sample data, one would need to enter the locations of the samples, as well as the location of each species' range along the gradient. Ideally, the data would be collected from point samples that are approximately equally spaced and frequent compared to the total gradient width. Such data increase the likelihood that the real curve is accurately described, and they also guarantee a powerful test using the null model. Point samples that are over- or underdispersed along the gradient would probably alter the power estimates presented here.
Although I have only used this null model on species richness data derived from the elevational distributions of species (Fig. 1b), I believe it can be easily modified to test species richness data from any type of one-dimensional spatial gradient (e.g., latitude). For instance, it may be useful in testing for nonrandom pattern in species richness along gradients that often exhibit a midgradient peak in richness, such as urbanization gradients (Blair and Launer 1997), productivity gradients (Al-Mufti et al. 1977, Rosenzweig and Abramsky 1993, Guo and Berry 1998, Pollock et al. 1998), exposure gradients (Wilson and Keddy 1988), and herbivore density gradients (Lubchenco 1978). If there are hard bounds at both ends of the gradient, then midgradient peaks in species richness can occur solely by random placement of the ranges of species. Rahbek (1995) reported a widespread occurrence of midgradient peaks in his literature survey of species richness along elevational gradients. All eight of the species richness vs. latitude plots presented in Ruggiero (1994) show midgradient peaks in richness; her study included 536 species of South American mammals divided into eight taxa.
One could also modify the null model for use on unbounded gradients. Presently, the random placement of the species' actual ranges along the gradient cannot exceed the bounds of the gradient. However, the model could be modified so that ranges are allowed to extend beyond the ends of the sampling gradient. Whenever one is presented with data that exhibit a peak in species richness somewhere in the middle of the gradient, the first step should be to rule out the possibility that the peak's location is simply due to random placement of species' ranges. The null model presented in this paper can assist in accomplishing this task.
Recently, a Kolmogorov-Smirnov test was applied to test for nonrandom structure in a species richness curve (Fleishman et al. 1998). However, Kolmogorov-Smirnov tests are only appropriate for continuous, univariate data, which prevents their use in testing for pattern in bivariate data, such as species richness along spatial gradients. The two-dimensional Kolmogorov-Smirnov test of Fasano and Franceschini (1987) will probably be found to have some value to ecologists as it becomes more widely known (Thomson et al. 1996, Garvey et al. 1998). It may be useful in testing for pattern in species richness data in which there are multiple observations (species richness estimates) at each value along the x-axis (spatial gradient); the null model presented herein cannot be applied to such data, unless the multiple observations are averaged. However, species richness data are typically in a form (Fig. 1b) in which researchers can use the present null model to test for nonrandom pattern.
Pineda and Caswell (1998) presented a model similar to the one described in this paper. In their model, a null distribution of quadratic regression lines was generated, and it was used to determine whether the actual species richness patterns of gastropods and polychaetes were significantly nonrandom along a bathymetric gradient. They compared the curvature, location, and magnitude of their regression line to those of the null regression lines. The curvature, location, and magnitude of a quadratic regression line are easily determined from the coefficients of the equation of the line (see Pineda and Caswell 1998 for details). The null regression lines gave null distributions of curvature, location, and magnitude that were used in testing the actual richness patterns for nonrandomness. The model of Pineda and Caswell (1998) is a reasonable alternative to the one I present, but there are several reasons why the present model may be preferred over that of Pineda and Caswell.
First, Pineda and Caswell (1998) were examining observed species richness patterns that were clearly hump shaped and, therefore, appropriately described by a quadratic regression line. In its present form, their model is only useful for species richness patterns that are hump shaped. It is not reasonable to obtain the curvature, location, and magnitude of a peak for patterns that do not exhibit a definite peak in richness (i.e., are not hump shaped). The model that I present is not limited to hump-shaped patterns. However, given that many species richness patterns probably do have an intermediate peak (Rahbek 1995), the Pineda and Caswell model is widely applicable. Second, the Pineda and Caswell model can give mixed results by the mere fact that three different characteristics of the line (pattern) are tested. Pineda and Caswell (1998) found that the location of the peak in gastropod species richness along a bathymetric gradient was essentially random, whereas curvature and magnitude were not. Additionally, the curvature of the peak in polychaete species richness was essentially random, whereas location and magnitude were not (Pineda and Caswell 1998). The present model cannot give similar inconsistent results, because only the mean displacement (D) of the species richness pattern is tested. Third, D is neither a parameter of a regression line, nor a property of the pattern, but instead is simply a measure of the difference between an actual species richness pattern and a pattern that results from the random placement of species' ranges. D is more composite than are curvature, location, and magnitude taken singly. Because mean displacement is not a property of a pattern, it does not call for any explicit link to theory. The curvature, location, and magnitude of a species richness pattern may be nonrandom, but what does theory have to say about the reasons for the nonrandom structure?
Theory concerned with species richness patterns and distributions along gradients is not yet sufficiently developed to be useful in making precise predictions about the curvature, location, and magnitude of species richness peaks. However, when theory does become more developed, the Pineda and Caswell (1998) model will be preferable to the present model, because the Pineda and Caswell model will then have valuable links to theory.
Ecologists and others involved in the study of species richness patterns along spatial gradients should be aware of the need to rigorously test the patterns for nonrandom structure, before invoking mechanisms or causal explanations of the patterns. The model presented in this paper and the model of Pineda and Caswell (1998) are both recommended for their ease of use as well as rigor.
The manuscript was improved by the helpful comments of Steve Jenkins and James Lyons-Weiler. I thank Andrew Solow for informing me of the Pineda and Caswell model. I also thank Marilyn Banta for help in programming the model and Robert Espinoza for providing data on Liolaemus distribution.
Al-Mufti, M. M., C. L. Sydes, S. B. Furness, J. P. Grime, and S. R. Band. 1977. A quantitative analysis of shoot phenology and dominance in herbaceous vegetation. Journal of Ecology 65:759-791.
Blair, R. B., and A. E. Launer. 1997. Butterfly diversity and human land use: species assemblages along an urban gradient. Biological Conservation 80:113-125.
Colwell, R. K., and G. C. Hurtt. 1994. Nonbiological gradients in species richness and a spurious Rapoport effect. American Naturalist 144:570-595.
Fasano, G., and A. Franceschini. 1987. A multidimensional version of the Kolmogorov-Smirnov test. Monthly Notices of the Royal Astronomical Society 225:155-170.
Fleishman, E., G. T. Austin, and A. D. Weiss. 1998. An empirical test of Rapoport's rule: elevational gradients in montane butterfly communities. Ecology 79:2482-2493.
France, R. 1992. The North American latitudinal gradient in species richness and geographical range of freshwater crayfish and amphipods. American Naturalist 139:342-354.
Garvey, J. E., E. A. Marschall, and R. A. Wright. 1998. Detecting relationships in continuous bivariate data. Ecology 79:442-447.
Gaston, K. J., T. M. Blackburn, and J. I. Spicer. 1998. Rapoport's rule: time for an epitaph? Trends in Ecology and Evolution 13:70-74.
Graham, G. L. 1990. Bats versus birds: comparisons among Peruvian volant vertebrate faunas along an elevational gradient. Journal of Biogeography 17:657-668.
Guo, Q., and W. L. Berry. 1998. Species richness and biomass: dissection of the hump-shaped relationships. Ecology 79:2555-2559.
Lubchenco, J. 1978. Plant species diversity in a marine intertidal community: importance of herbivore food preference and algal competitive abilities. American Naturalist 112:23-39.
McCoy, E. D. 1990. The distribution of insects along elevational gradients. Oikos 58:313-322.
Pagel, M. D., R. M. May, and A. R. Collie. 1991. Ecological aspects of the geographical distribution and diversity of mammalian species. American Naturalist 137:791-815.
Pineda, J., and H. Caswell. 1998. Bathymetric species-diversity patterns and boundary constraints on vertical range distributions. Deep Sea Research II 45:83-101.
Pollock, M. M., R. J. Naiman, and T. A. Hanley. 1998. Plant species richness in riparian wetlands--a test of biodiversity theory. Ecology 79:94-105.
Rahbek, C. 1995. The elevational gradient of species richness: a uniform pattern? Ecography 18:200-205.
Rahbek, C. 1997. The relationship among area, elevation, and regional species richness in neotropical birds. American Naturalist 149:875-902.
Rapoport, E. H. 1982. Areography: geographical strategies of species. Pergamon, Oxford, UK.
Rohde, K. 1992. Latitudinal gradients in species diversity: the search for the primary cause. Oikos 65:514-527.
Rohde, K., M. Heap, and D. Heap. 1993. Rapoport's rule does not apply to marine teleosts and cannot explain latitudinal gradients in species richness. American Naturalist 142:1-16.
Rosenzweig, M. L., and Z. Abramsky. 1993. How are diversity and productivity related? Pages 52-65 in R. E. Ricklefs and D. Schluter, editors. Species diversity in ecological communities: historical and geographical perspectives. University of Chicago Press, Chicago, Illinois, USA.
Roy, K., D. Jablonski, and J. W. Valentine. 1994. Eastern Pacific molluscan provinces and latitudinal diversity gradient: no evidence for "Rapoport's rule." Proceedings of the National Academy of Sciences USA 91:8871-8874.
Ruggiero, A. 1994. Latitudinal correlates of the sizes of mammalian geographical ranges in South America. Journal of Biogeography 21:545-559.
Stevens, G. C. 1989. The latitudinal gradient in geographical range: how so many species coexist in the tropics. American Naturalist 133:240-256.
Stevens, G. C. 1992. The elevational gradient in altitudinal range: an extension of Rapoport's latitudinal rule to altitude. American Naturalist 140:893-911.
Thomson, J. D., G. Weiblen, B. A. Thomson, S. Alfaro, and P. Legendre. 1996. Untangling multiple factors in spatial distributions: lilies, gophers, and rocks. Ecology 77:1698-1715.
Wilson, S. D., and P. A. Keddy. 1988. Species richness, survivorship, and biomass accumulation along an environmental gradient. Oikos 53:375-380.
Software and documentation for the null-model tests are available from ESA's Electronic Data Archive: Ecological Archives E081-010.
Power estimates for the null-model test when number of zones (z) = 10, at various levels of significance α.

                       Power
gw     S    MLF    α = 0.05   α = 0.025   α = 0.01   α = 0.001
1000   20   0      1          1           1          1
1000   20   0.25   0.88       0.20        0          0
1000   20   0.5    0          0           0          0
1000   40   0      1          1           1          1
1000   40   0.25   1          1           1          1
1000   40   0.5    0.44       0.08        0.04       0
1000   60   0      1          1           1          1
1000   60   0.25   1          1           1          1
1000   60   0.5    1          0.88        0.52       0.04
3000   20   0      1          1           1          1
3000   20   0.25   1          0.56        0.28       0
3000   20   0.5    0          0           0          0
3000   40   0      1          1           1          1
3000   40   0.25   1          1           1          0.80
3000   40   0.5    0.88       0.12        0.04       0
3000   60   0      1          1           1          1
3000   60   0.25   1          1           1          1
3000   60   0.5    0.64       0.28        0.08       0
5000   20   0      1          1           1          1
5000   20   0.25   0.92       0.32        0          0
5000   20   0.5    0          0           0          0
5000   40   0      1          1           1          1
5000   40   0.25   1          1           1          0.92
5000   40   0.5    0.80       0.24        0.12       0
5000   60   0      1          1           1          1
5000   60   0.25   1          1           1          1
5000   60   0.5    1          0.68        0.28       0.04
Notes: The estimates were obtained by producing a set of 25 simulated species-richness curves for each input parameter combination, and then determining the proportion of curves that were significant at the various levels of α. Notation follows that given in the text: gw = gradient width, S = number of species, and MLF = midpoint location factor. Note that an estimate of unity is actually a value >0.96.