# Estimating selection on quantitative traits using capture-recapture data.

Determining the form and intensity of natural selection on phenotypic traits in natural populations is a topic of continuing interest to evolutionary ecologists. A number of methods for estimating selection coefficients for continuously varying (quantitative) traits have been developed and applied to natural systems (for recent reviews, see Manly 1985; Endler 1986). A general requirement for these methods is that values for the fitness and phenotypic traits of each individual are available. Capture-recapture (CR) methods are used widely to estimate survival rates in animal populations. Such methods have been used frequently to estimate differences in mean survival rates for individuals in discrete phenotypic classes (for recent reviews, see Manly 1985; Lebreton et al. 1992). The available methods for estimating selection on continuously varying traits (where each individual has unique trait values) cannot be used directly with CR data, because in general the survival rate or life span of each individual is not observed directly. Researchers using CR data have typically either divided the continuous variation into arbitrary, discrete phenotypic classes, or have used the last date of recapture as a minimum estimate of an individual's life span. Both of these approaches discard potentially valuable information contained in the original data set, and the latter approach provides biased estimates.Here we describe a maximum-likelihood method for estimating selection coefficients for continuously varying phenotypic traits from capture-recapture data. The method incorporates a survival (fitness) function dependent upon the phenotypic traits into a standard CR analysis similar to the Cormack-Jolly-Seber method (Cormack 1964; Jolly 1965; Seber 1965). One can then estimate the coefficients in the survival function by maximizing the likelihood function with respect to the parameters. Hoffmann (1993) has independently developed a similar method for incorporating individual covariates. We illustrate the model using data on wing-pattern traits for a population of the western white butterfly, Pontia occidentalis Reakirk. We also discuss the choice of appropriate fitness functions.

THE MODEL

The method outlined here applies to a single- or multiple-release, multiple-recapture experiment, in which the values for a set of phenotypic traits are measured for each individual. We will describe the model in terms of a daily time interval, but of course other intervals are possible. To allow useful comparison of the effects of different phenotypic traits on survival, all phenotypic variables are assumed to be in standard form with mean 0 and standard deviation 1 (Lande 1979; Lande and Arnold 1983), though this is not essential to the model. We first introduce the survival function that defines the selection coefficients, and then develop the likelihood function for the capture-recapture (CR) model.

1. The Survival Function

Let [[Phi].sub.jk] equal the conditional probability for individual j surviving day k + 1 given survival up to day k (k = 1, 2, . . ., K - 1). We wish to define an appropriate fitness function relating the phenotypic trait to this conditional survival probability. Here we use a logistic regression (log-odds) model for [[Phi].sub.jk].

log [[[Phi].sub.jk] / 1 - [[Phi].sub.jk]] = [D.sub.k] + [summation of] [[Beta].sub.q][z.sub.qj] where q=1 to Q, (1)

where:

[D.sub.k] = the log odds on day k for an individual with mean phenotypic values (z = 0 for all Q traits),

[z.sub.qj] = value of phenotypic trait q for individual j,

Q = the total number of phenotypic traits considered, and

[[Beta].sub.q] = the directional selection coefficient for trait q.

Note that if we define

[W.sub.jk](z) = [D.sub.k] + [summation of] [[Beta].sub.q][z.sub.qj] where q=1 to Q,

then

[[Phi].sub.jk] = exp[[W.sub.jk](z)] / {1 + exp[[W.sub.jk](z)]}. (2)

The log odds (eq. [1]) for the survival function take on values from negative infinity to positive infinity (unlike a probability), thus avoiding problems with constraints on the possible values of the selection coefficients. The log-odds model has been used widely in survival analysis in the biomedical literature (Cox and Snell 1989; Kalbfleish and Prentice 1980) and represents one member in the family of generalized linear models (McCullagh and Nelder 1989). As we emphasize in the Discussion, equation (1) is one of a number of possible functions that may be appropriate for linking survival probabilities in analyzing selection (Manly 1976, 1977; Williams et al. 1990; Lebreton et al. 1992; S. G. Smith and J. R. Skalski MS). The survival function may be readily modified to include quadratic terms that represent stabilizing, disruptive or correlational selection (Lande and Arnold 1983; Manly 1985), but this is not considered here.

2. CR Model

Suppose there is a single release of all individuals on day k = 1, with subsequent multiple recaptures (e.g., Burnham et al. 1987). In the general case, the recapture probabilities and survival probabilities depend on day k. The CR model directly parallels that of Cormack (1964), Jolly (1965), and Seber (1965), except that here the survival probability [[Phi].sub.jk] varies among individuals based upon their phenotypes. Let [[Delta].sub.j] = {[[Delta].sub.j1] [[Delta].sub.j2] . . . [[Delta].sub.jK]} be the capture history of an individual j on each day k, where [[Delta].sub.jk] = 1 if captured on day k, and [[Delta].sub.jk] = 0 if not captured on day k. Let K = the last day of recaptures in the population, [t.sub.j] = the last day on which a particular individual j is recaptured, and [P.sub.k] = the probability of recapture on day k (given alive) for all individuals, for k = 2, . . ., K. Here we assume that the recapture probability on day k is the same for each individual (but see the Discussion). It is further assumed that the survival and recapture probabilities for each individual are independent of the prior-capture history for that individual and of all other individuals. The likelihood contributed by individual j, [L.sub.j], is given by

[Mathematical Expression Omitted],

where [[Chi].sub.[t.sub.j]] is the probability that [t.sub.j] is the last day the individual is seen. In terms of [[Phi].sub.jk] and [P.sub.k], this is

[Mathematical Expression Omitted],

with the conventions that [[Phi].sub.jK] = 0 and that null products equal 1. The likelihood function for all J individuals in the sample is the product of the [L.sub.j]'s:

[Mathematical Expression Omitted].

Using CR data, the parameters [[Phi].sub.K-1] and [P.sub.K] for the final day of recapture cannot be estimated separately, but the product [[Phi].sub.K-1][P.sub.K] can be estimated. Analytical solutions for the maximum likelihood estimators for the selection coefficients ([[Beta].sub.q]) are not possible. Maximum-likelihood estimators are found numerically by locating the maximum value of the likelihood function. Our simulations have not detected the presence of local peaks in the likelihood surface for any given data set.

APPLICATION OF THE MODEL

To illustrate the model, we have estimated model parameters from a field study of a population of Pontia occidentalis near Corfu, Washington, in July 1989 (Kingsolver 1995). In this study, adults with little evidence of wing wear (wing-wear classes 1 or 2: see, e.g., Watt et al. 1977) were captured with hand nets in the field, placed in coolers, and brought to our field laboratory. Within 6 h of capture, the dorsal and ventral wing surfaces of each individual were videotaped using a camcorder against a background with both size and gray-level intensity calibration standards under controlled lighting conditions. An identification number was then written with a felt-tip marker on the ventral forewings. All butterflies were released at the site in the evening of the day of capture. A total of 289 individuals (203 males and 86 females) were marked and released in 1 d. Intensive recapture efforts using hand nets then occurred daily for the following 7 d; these efforts resulted in estimated daily recapture probabilities ranging from 28% to 60%. Because of low capture rates, recapture data for the final 2 d were combined into a single period. Using the videotapes and JAVA image-analysis software on a Tandy 3000 microcomputer, two phenotypic traits were measured on the right hindwing for each marked individual (for further details, see Kingsolver and Wiernasz 1991a): hb, mean gray level (on a scale of 0 to 1) of the basal region of the dorsal hindwing; and pv, mean gray level (on a scale of 0 to 1) of the posterior region of the ventral hind-wing. Scores for hb and pv were arcsine-square-root transformed before further analysis (Kingsolver and Wiernasz 1991b). Standardization of phenotypic traits was determined separately for males and females. Because of their thermo-regulatory effects (Kingsolver 1987), hb and pv were predicted to experience negative selection during the warm summer conditions at this site.

TABLE 1. Results for selection analyses on wing melanin traits in 289 Pontia occidentalis butterflies at Corfu, Washington in July 1989. See text for details of the experiments and analyses.

Number of Log Model description parameters likelihood

No selection 12 -389.941 Selection on pv and hb 14 -385.885 Selection on pv alone 13 -387.825 Selection on hb alone 13 -386.056

The data analyses reported here were conducted using the SURPH software program (Skalski et al. 1993; S. G. Smith and J. R. Skalski MS) implemented for a Sun Sparc2 workstation. SURPH provides a flexible framework for the exploratory modeling and hypothesis testing of capture-recapture data. We used a model in which both capture and survival probabilities vary between time intervals (eqs. 1-5); all model parameters are estimated simultaneously by maximizing the likelihood function. To test for significance of the selection coefficients, likelihood ratio tests for models with and without selection are computed.

Results of the analyses are summarized in table 1. The analysis for the model with selection on both traits yielded estimates of [Beta] = -0.234 for hb, and [Beta] = -0.071 for pv. A likelihood-ratio test for this model against the null hypothesis of no selection on either trait (i.e., [Beta] = 0 for each trait) indicated a significant effect of the traits on survival ([Mathematical Expression Omitted], P = 0.017). Likelihood-ratio tests also show that omitting pv from the model doesn't significantly reduce the likelihood ([Mathematical Expression Omitted], P = 0.559); but omitting hb from the model does significantly reduce the likelihood ([Mathematical Expression Omitted], P = 0.049).

To evaluate the statistical power of the method for detecting differences in mean survival rates between groups (e.g., between different treatment groups), we have generated artificial data sets. For these studies, we assigned a common daily probability of recapture (given alive) of [P.sub.k] = 0.5 for each day k, and different daily probabilities of survival for individuals in two different treatment groups. Monte Carlo methods were then used to generate daily recapture records for 500 individuals. These generated data sets were then analyzed using the constant survival model to determine the ability of the method to detect differences in survival rates between groups. The model was implemented by J.G.K. in Turbo Pascal 5.0 on a 80386 microcomputer. These simulation studies show that differences in the mean survival probability ([Phi]) of 15%-20% between groups can be reliably detected using likelihood-ratio tests at the P = 0.05 level, and differences of 35% can be detected at the P = 0.001 level.

DISCUSSION

The existing methods for analyzing selection on continuously varying phenotypic traits of individuals (e.g., Lande 1979; Schluter 1988) cannot be directly applied to mark-recapture data, in which the life span or survival rate of each individual is not directly observed. The main contribution of our results is to provide a method for incorporating a survival function into a capture-recapture model that allows estimation of selection coefficients for continuously varying phenotypic traits of individuals. The fitness function used here [eq. (1)] represents one of several appropriate link functions (McCullagh and Nelder 1989) commonly used for analyzing the relationship between individual characteristics and mortality or survival rates using generalized linear models (e.g., Kalbfleisch and Prentice 1980; Cox and Oakes 1984; Cox and Snell 1989; S. G. Smith and J. R. Skalski MS). The extensive biomedical literature on risk factors and mortality and disease rates has considered the effects of censored data in which the time of death is not known for some individuals but has not considered the general problem of recapture probabilities inherent in most mark-recapture data. Manly (1976, 1977, 1985) has used the proportional hazards function for estimating selection but again did not consider the problem of recapture probabilities, and Williams et al. (1990) have explored a variety of link functions in analyzing laboratory selection experiments. A. Hoffmann has independently developed survival models for CR studies that incorporate individual covariates (i.e., quantitative traits), based on a proportional hazards survival function, that are similar to those reported here (Hoffmann 1993; Skalski et al. 1993); her studies include a more thorough exploration of the statistical properties of these types of models.

Two quite distinct issues exist in choosing an appropriate fitness function. First, what function provides the best fit to the data? The answer will of course depend on the particular data available. For example, for the data set analyzed above, substituting a proportional hazards survival function for the logit (log odds) function has little effect on the model fit: the likelihood ratio between the two models is only 1.076. In the context of selection experiments, Williams et al. (1990) discussed methods to identify the best fitting model using generalized linear models, considering a variety of link (fitness) functions. We emphasize that such survival functions may be readily altered to include quadratic or other terms (Lande and Arnold 1983), or a mixture of discrete and continuous traits. Schluter (1988) has proposed the use of non-parametric methods for estimating fitness functions. Second, how do selection coefficients relate to the evolutionary response to selection? Lande and Arnold (Lande 1979, 1982; Lande and Arnold 1983) have shown that the selection gradient, a set of linear partial-regression coefficients of relative fitness on the phenotypic traits, is directly proportional to the evolutionary response to directional selection; this result does not assume that the actual regressions of fitness on the traits are linear. The selection coefficients defined here by equation (1) do not represent selection gradients in this sense. The choice of the fitness measure relevant to the evolutionary response to selection will depend on life-history considerations (Charlesworth 1980; Lande 1982). In cases where a discrete generation approach is appropriate, each sampling interval could be considered a distinct episode of selection (Arnold and Wade 1984), and the identity link function used in place of equation (1), such that the selection coefficients are linearly related to the conditional survival probabilities. However, because survival probabilities are constrained between 0 and 1, this approach poses computational difficulties. In the case of an age-structured population in continuous time, where exp(r) (r = instrinsic rate of increase) is an appropriate fitness measure (Charlesworth 1980; Lande 1982), the proportional-hazards function may be more appropriate, as it implies an exponential effect of phenotypic traits on survival probabilities (Manly 1976, 1977).

Of course, all the assumptions and limitations of capture-recapture analyses apply to the method proposed here. A variety of studies have considered generalizations and special cases of the simple capture-recapture model defined by equations (3-4) (see, e.g., Seber 1982; Manly 1985; Lebreton et al. 1992). Two issues are of particular interest. First, most CR studies cannot distinguish between death and emigration, thus (1 - [[Phi].sub.jk]) typically represents the rate of disappearance, not the rate of mortality. However, the rate of emigration will affect the estimate of [D.sub.k] but not the estimates of the selection coefficients ([[Beta].sub.q]), unless emigration rate depends on the phenotypic trait values. One approach to evaluating the possibility of phenotype-dependent emigration is to divide the study site into a series of subsites and to record the rate of movement among subsites upon recapture for each individual (e.g., Nichols et al. 1993). One can then examine whether the rate or probability of movement among subsites between successive recaptures of an individual depends on the individual's phenotypic trait values, using logistic regression or other appropriate generalized linear models.

The second issue is that the above model assumes that recapture probabilities are independent of individual phenotype. In some cases, this assumption may be inappropriate for the organism and phenotypes under consideration. One approach to this difficulty is to model recapture probabilities in a manner analogous to the survival function. For example,

log [[P.sub.jk] / 1 - [P.sub.jk]] = [E.sub.k] + [summation of] [[Alpha].sub.q][z.sub.qj] where q=1 to Q, (7)

where [E.sub.k] = the log odds on day k for the recapture of an individual with mean phenotypic values for all Q traits, and [[Alpha].sub.q] = the recapture coefficient for trait q. The recapture (a) and selection ([Beta]) coefficients could then be estimated using the likelihood function given by equation (3) or (6). The SURPH program used here allows such phenotypic effects on recapture rates to be estimated.

Finally, we note that the methods developed here and elsewhere (Hoffmann 1993; Skalski et al. 1993) relating quantitative phenotypic traits to survival rates may also be valuable for capture-recapture analyses of fish and wildlife. In this regard, the program SURPH (Skalski et al. 1993; S. G. Smith and J. R. Skalski MS) provides a flexible framework for model fitting and hypothesis testing of the effects of individual and group traits on survival rates.

ACKNOWLEDGMENTS

J.G.K. is indebted to N. Breslow, K. Cole, and B. McKnight of the Biostatistics Department at the University of Washington for suggesting the basic approach and for help in developing the model. R. Cormack, R. Lande, J. Felsenstein, J. Skalski, and an anonymous reviewer provided statistical advice or helpful comments or both on previous versions of the manuscript. The United States Fish and Wildlife Service permitted us to conduct the field studies at Corfu, Washington. Thanks to G. Gilchrist, D. Secord, S. Toba, M. Wells, S. Wenning, and J. Winterer for assistance with the CR studies, and to S. Toba for video and data analyses. This research was supported by National Science Foundation grants BSR 89-08131 and IBN-9220748 to J.G.K. The development of SURPH was supported by grant DE-BI79-90BP02341 from the Bonneville Power Authority to J. R. Skalski. The Sun computers used in these studies were made available through a grant from the M. J. Murdock Charitable Trust.

LITERATURE CITED

Arnold, S. J., and M. J. Wade. 1984. On the measurement of natural and sexual selection: theory. Evolution 38:709-719.

Burnham, K. P., D. R. Anderson, G. C. White, C. Brownie, and K. H. Pollock. 1987. Design and analysis methods for fish survival experiments based on release-recapture. American Fisheries Society Monograph 5, Bethesda, Md.

Charlesworth, B. 1980. Evolution in age-structured populations. Cambridge University Press, Cambridge.

Cormack, R. M. 1964. Estimates of survival from the sighting of marked animals. Biometrika 51:429-438.

Cox, D. R., and D. Oakes. 1984. Analysis of survival data. Chapman and Hall, London.

Cox, D. R., and E. J. Snell. 1989. Analysis of binary data. Chapman and Hall, London.

Endler, J. A. 1986. Natural selection in the wild. Princeton University Press, Princeton, N.J.

Hoffmann, A. 1993. Quantifying selection in wild populations using known-fate and mark-recapture designs. Ph.D. diss. University of Washington, Seattle.

Jolly, G. M. 1965. Explicit estimates from capture-recapture data with both death and immigration - stochastic model. Biometrika 52:225-247.

Kalbfleisch, J. D., and R. L. Prentice. 1980. The analysis of failure time data. Wiley, New York.

Kingsolver, J. G. 1987. Evolution and coadaptation of thermoregulatory behavior and wing pigmentation pattern in pierid butterflies. Evolution 41:472-490.

-----. 1995. Viability selection on seasonally polyphenic traits: wing melanin pattern in western white butterflies. Evolution 49.

Kingsolver, J. G., and D. C. Wiernasz. 1991a. Seasonal polyphenism in wing-melanin pattern and thermoregulatory adaptation in Pieris butterflies. American Naturalist 137:816-830.

-----. 1991b. Development, function, and the quantitative genetics of wing melanin pattern in Pieris butterflies. Evolution 45:1480-1492.

Lande, R. 1979. Quantitative genetic analysis of multivariate evolution, applied to brain-body size allometry. Evolution 33:402-416.

-----. 1982. A quantitative genetic theory of life history evolution. Ecology 63:607-615.

Lande, R., and S. J. Arnold. 1983. The measurement of selection on correlated characters. Evolution 37:1210-1226.

Lebreton, J. D., K. P. Burnham, J. Clobert, and D. R. Anderson. 1992. Modeling survival and testing biological hypotheses using marked animals: a unified approach with case studies. Ecological Monographs 62:67-118.

Manly, B. F. J. 1976. Some examples of double exponential fitness functions. Heredity 36:229-234.

-----. 1977. A new index for the intensity of natural selection. Heredity 38:321-328.

-----. 1985. The statistics of natural selection on animal populations. Chapman and Hall, London.

McCullagh, P., and J. A. Nelder. 1989. Generalized linear models, 2d ed. Chapman and Hall, New York.

Nichols, J. D., C. Brownie, J. E. Hines, K. H. Pollock, and J. B. Hestbeck. 1993. The estimation of exchange among populations or subpopulations. Pp. 265-280 in J. D. Lebreton and P. M. North, eds. Marked individuals in the study of bird population. Birkhauser, Basel, Switzerland.

Schluter, D. 1988. Estimating the form of natural selection on a quantitative trait. Evolution 42:849-861.

Seber, G. A. F. 1965. A note on the multiple-recapture census. Biometrika 52:249-259.

-----. 1982. Estimating animal abundance and related parameters. Griffin, London.

Skalski, J. R., A. Hoffman, and S. G. Smith. 1993. Pp. 9-28 in J. D. Lebreton and P. M. North, eds. Testing the significance of individual- and cohort-level covariates in animal survival studies. Marked individuals in the study of bird population. Birkhauser, Basel.

Watt, W. B., F. S. Chew, L. R. G. Snyder, A. G. Watt, and D. E. Rothschild. 1977. Population structure of Pierid butterflies. I. Numbers and movements of some montane Colias species. Oecologia 27:1-22.

Williams, C. J., W. W. Anderson, and J. Arnold. 1990. Generalized linear modeling methods for selection component experiments. Theoretical Population Biology 37:389-423.

NOTE ADDED IN PROOF

Both Unix (X-windows) and PC (Microsoft Windows) versions of SURPH are available for downloading via the World Wide Web at: http://www.cqs.washington.edu/warren/surph.shtml

JOEL G. KINGSOLVER Department of Zoology, NJ-15, University of Washington, Seattle, Washington 98195

STEVEN G. SMITH Center for Quantitative Science, School of Fisheries, HR-20, University of Washington, Seattle, Washngton 98195

Printer friendly Cite/link Email Feedback | |

Title Annotation: | Brief Communications |
---|---|

Author: | Kingsolver, Joel G.; Smith, Steven G. |

Publication: | Evolution |

Date: | Apr 1, 1995 |

Words: | 3724 |

Previous Article: | Tree balance and tree completeness. |

Next Article: | Measuring the ratio of effective population size to adult numbers using genetic and ecological data. |

Topics: |