
A fundamental problem in key factor analysis.


Key factor analysis is a popular method of life table analysis that originated with Morris (1959) and Varley and Gradwell (1960). Their idea was to identify factors that are "largely responsible for (contributing to) the observed changes (variations) in population." More specifically, in terms of the familiar relationship

K = [k.sub.1] + [k.sub.2] + . . . + [k.sub.n], (1)

in which K = -log (generation survival) and [k.sub.i] = -log (ith stage survival),

Varley and Gradwell (1970) called the factor that largely determines the k value of a given stage "a key factor" if "the variations in [the] k-value [of that stage] contribute most to the variations in K" because this is the factor that "appears to be largely responsible for the observed changes in population" (Morris 1959).

The principle of the idea may be agreeable as long as it is understood that the phrase "observed changes (or variations)" in population means the "pattern" of population fluctuation, since the central issue of population dynamics is to find the ecological mechanism that generates the pattern. The problem is, however, that the specific ways the concept of key factors is translated into certain statistical indices to identify such factors (henceforth, the key factor indices) do not live up to the principle. These indices interpret the "variations (or observed changes) in population" only in terms of the variance of K (or, alternatively, the variance of population size) and its constituents, the variances and covariances of the componential k's. As I demonstrate, different factors play different roles in determining the pattern of population fluctuation and their roles cannot be evaluated by a mere degree of their contributions to the variance of K (or population size).

I believe that no fewer than 400 papers based on key factor analysis have been published in the past few decades. Also, many people have tried to solve some statistical problems of estimating (or interpreting) the key factor indices. My purpose here is to reveal the fundamental problem that lies beyond the technicality of statistics.

I first review what the key factor indices are, then examine what they really mean, and, finally, show why they are not useful. In the last section, I argue that there are many different, legitimate criteria to recognize factors that play key roles (hence "key factors") in determining, not only the degree of "variations" or "changes," but "patterns" of fluctuation in population, so that there would be no simple, general method of identifying them.


Morris (1959) considered the factor causing the stage-specific mortality that was significantly correlated with total (generation) mortality to be a key factor. Accordingly, Harcourt (1969) calculated a correlation coefficient between K and each k to determine key factors.

Varley and Gradwell (1970), however, insisted on visual comparisons of the time-plotted graphs of all k values along with K to look for a k that was not only correlated with K but also contributed greatly to the variation in K. With such a visual comparison, they suggested, one could exclude a k that was indeed correlated with K but made only a small contribution (numerical influence) to the variation in K.

Agreeing with Varley and Gradwell except for their idea of visual comparison, Metcalfe (1972) used the regression of K on each k. The most frequently used method currently, however, is that proposed by Podoler and Rogers (1975): to regress a k on K. In this method, it is supposed that the "relative importance of each factor will be proportional to its regression coefficient" (Southwood 1978).

Prior to Podoler and Rogers (1975), Smith (1973) proposed using the covariance between K and each k as a measure of each k's contribution to the variance of K. I shall show that the two methods are equivalent.

Manly (1977) proposed decomposing the variance of population size into its components. Although the method is somewhat elaborate in detail, there is little difference in the basic idea, compared with others, in that "changes in population" is merely reduced to the variance.

I now examine what these indices really mean.


Before we examine individual cases, one remark on a general point must be made. Although the above authors calculate "correlation" and "regression" coefficients, these are simple (or zero-order) coefficients, despite the fact that more than two "independent" variables are involved in relationship (1); in the Podoler-Rogers case, even an "independent" variable, k, is regressed on the dependent variable, K. Indeed, these coefficients are supposed to serve as a measure of the importance of a factor contributing to the variation in K and should not be confused, for the reason given below, with the "regression" and "correlation" coefficients estimated in the usual multiple regression analysis. I raise this point because Varley and Gradwell (1970) once made a confusing remark: "The use of correlation and multiple regression analysis is a valid approach to this problem [key-factor analysis] only when we can guarantee that there is no intercorrelation between the variables."

An objective of regression analysis is to determine the coefficients of the independent variables in a regression model. But relationship (1) is a complete description of the fact that K is the exact sum of the k's, so that the coefficient of each k is already known, by definition, to be identically equal to 1. So, there is no point applying a regression analysis here merely to confirm that fact: true but meaningless in the context of identifying a key factor. Likewise, the partial correlation between K and a given k, while the rest of the k's are held fixed, is identically equal to 1 and, hence, so is the multiple correlation between K and all k's. Again, there is no point applying correlation analysis to establish the already known fact that the variation in K in relationship (1) is completely determined by the variations in the componential k's.

Thus, we should regard the coefficients suggested by the above authors as indices for identifying a key factor (in each author's own interpretation), largely unrelated to the usual regression and correlation analyses.

1. "Regression" of a k on K: the Podoler-Rogers index. - Consider the linear regression coefficient of [k.sub.i] on K:

[[Beta].sub.i] = Cov(K, [k.sub.i])/Var(K). (2)

The numerator on the right-hand side can be decomposed as:

Cov(K, [k.sub.i]) = Cov([k.sub.1], [k.sub.i]) + Cov([k.sub.2], [k.sub.i]) + . . . + Cov([k.sub.n], [k.sub.i]). (3)

The denominator, Var(K), can be similarly decomposed into elements that are conveniently displayed in the following n x n symmetric matrix of variances and covariances:

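Var([k.sub.1])              Cov([k.sub.1], [k.sub.2])   . . .   Cov([k.sub.1], [k.sub.n])
Cov([k.sub.2], [k.sub.1])   Var([k.sub.2])              . . .   Cov([k.sub.2], [k.sub.n])
. . .
Cov([k.sub.n], [k.sub.1])   Cov([k.sub.n], [k.sub.2])   . . .   Var([k.sub.n])              (4)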

We see that either the ith column or row (identical with the column) of the matrix constitutes the elements of the sum on the right-hand side of Eq. 3. In particular, Var([k.sub.i]) [equivalent to] Cov([k.sub.i], [k.sub.i]) is [k.sub.i]'s own contribution, and Cov([k.sub.i], [k.sub.j]) is the joint contribution of [k.sub.i] with [k.sub.j] (j [not equal to] i), to Var(K).

Thus, index [[Beta].sub.i] represents [k.sub.i]'s relative contribution, in terms of the sum of the elements on either the ith column or the ith row, to the variance of total K, the sum of all elements of the matrix.

As already mentioned in the review of key factor indices above, Smith (1973) suggested the use of the covariance, Cov(K, [k.sub.i]), which is equal to [[Beta].sub.i] Var(K). Because Var(K) is a positive constant common to all stages, the rank of [k.sub.i] as indicated by the covariance is the same as that indicated by [[Beta].sub.i].
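The calculation behind Eq. 2 can be sketched in a few lines of Python. This is only an illustration: the k-values below are hypothetical numbers invented for the example (they are not taken from any life table), and the function names are assumptions of the sketch.

```python
# Hypothetical k-values for three stages over six generations (illustrative only).
k_values = [
    [0.30, 0.45, 0.25, 0.50, 0.35, 0.40],  # k_1
    [0.80, 1.20, 0.60, 1.40, 0.90, 1.10],  # k_2
    [0.20, 0.18, 0.22, 0.19, 0.21, 0.20],  # k_3
]


def covariance(x, y):
    """Sample covariance of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    return sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n - 1)


def podoler_rogers_index(k_values):
    """Return beta_i = Cov(K, k_i) / Var(K) for each stage (Eq. 2)."""
    # Total K in each generation is the sum of the stage k's (Eq. 1).
    total_k = [sum(stage) for stage in zip(*k_values)]
    var_total = covariance(total_k, total_k)
    return [covariance(total_k, k_i) / var_total for k_i in k_values]


betas = podoler_rogers_index(k_values)
for i, beta in enumerate(betas, start=1):
    print(f"beta_{i} = {beta:.3f}")
```

Because Cov(K, [k.sub.i]) summed over all stages equals Var(K), the index values always add up to 1. A stage that tends to vary in the opposite direction to the others, like the third one in this made-up data set, can receive a negative value.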

2. "Regression" of K on a k: the Metcalfe index. - Consider now the simple regression coefficient of K on [k.sub.i]:

[[Beta][prime].sub.i] = Cov(K, [k.sub.i])/Var([k.sub.i]). (5)

The numerator on the right-hand side is the same as that of [[Beta].sub.i] but the divisor is Var([k.sub.i]) instead of Var(K). This index provides no meaningful information, in any sense, about the contribution of [k.sub.i] to the variation in K. The numerator, Cov(K, [k.sub.i]), as a measure of [k.sub.i]'s contribution to Var(K) tends to become irrelevant when divided by Var([k.sub.i]). Typically, if [k.sub.i] is an independent factor, i.e., Cov([k.sub.j], [k.sub.i]) = 0 for all j [not equal to] i, then Cov(K, [k.sub.i]) = Var([k.sub.i]), so that [[Beta][prime].sub.i] is identically equal to 1 no matter what [k.sub.i]'s true contribution may be.

[TABULAR DATA FOR TABLE 1 OMITTED]

The above result merely confirms the fact that, as already pointed out, the coefficient of each k in Eq. 1 is identically equal to 1; in particular, if [k.sub.i] is independent of other k's, the simple regression coefficient is equal to the corresponding partial coefficient in multiple regression, which is always equal to 1.

3. "Correlation" between K and a k: the Morris-Harcourt index. - Consider, finally, the simple correlation coefficient between K and [k.sub.i]:

[[Rho].sub.i] = Cov(K, [k.sub.i])/[{Var(K)Var([k.sub.i])}.sup.1/2] (6)

and so [[[Rho].sub.i].sup.2] = [[Beta].sub.i][[Beta][prime].sub.i] as is well known. We see that [[Rho].sub.i], as a measure of relative contribution of [k.sub.i] to the variance of K, is contaminated with the irrelevant parameter [[Beta][prime].sub.i] except for the case of independent k's. If independent, [[Beta][prime].sub.i] [equivalent to] 1 as already mentioned, so that [[[Rho].sub.i].sup.2] = [[Beta].sub.i].

In summary, index [Beta] has a statistical meaning: it can be used as a measure of the relative contribution of [k.sub.i] (including its joint contributions with other k's) to the variance of total K. Index [Rho] is less meaningful as it contains irrelevant information carried by [Beta][prime].
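The relationships among the three indices can be checked numerically. The sketch below works directly on a covariance matrix of the k's rather than on data, so the identities hold exactly. The matrix is hypothetical, chosen to represent three mutually independent k's with unequal variances, and the function name is an assumption of the example.

```python
import math

# Hypothetical covariance matrix of three mutually independent k's:
# variances on the diagonal, all covariances zero (illustrative numbers).
cov_matrix = [
    [0.04, 0.00, 0.00],
    [0.00, 0.25, 0.00],
    [0.00, 0.00, 0.01],
]


def key_factor_indices(cov_matrix):
    """Return a (beta, beta_prime, rho) tuple for each stage."""
    # Var(K) is the sum of all elements of the matrix.
    var_total = sum(sum(row) for row in cov_matrix)
    results = []
    for i, row in enumerate(cov_matrix):
        cov_total_i = sum(row)                            # Cov(K, k_i), Eq. 3
        var_i = cov_matrix[i][i]                          # Var(k_i)
        beta = cov_total_i / var_total                    # Eq. 2
        beta_prime = cov_total_i / var_i                  # Eq. 5
        rho = cov_total_i / math.sqrt(var_total * var_i)  # Eq. 6
        results.append((beta, beta_prime, rho))
    return results


indices = key_factor_indices(cov_matrix)
for i, (beta, beta_prime, rho) in enumerate(indices, start=1):
    print(f"stage {i}: beta={beta:.3f}  beta'={beta_prime:.3f}  rho^2={rho**2:.3f}")
```

With these independent k's, [Beta][prime] comes out as 1 for every stage, although the [Beta] values differ widely (about 0.13, 0.83, and 0.03), and the squared [Rho] coincides with [Beta]. Putting nonzero off-diagonal elements into the matrix makes [Beta][prime] depart from 1 while the identity between squared [Rho] and the product of [Beta] and [Beta][prime] continues to hold.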

The point is, however, that even index [Beta] (henceforth, the key factor index), which only takes into account the variances and covariances of the k values as the constituents of Var(K), provides no information about the central issues in the study of population dynamics, the pattern of population changes and its causal mechanism, as I now show.


I discuss three particular problems: (1) the key factor index, [Beta], cannot indicate qualitative differences in the variation in k value between different stages; (2) the index overlooks the potential importance of a factor that does not vary much in time; and (3) a key factor, identified by a high [Beta] value, can be artificially created by an arbitrary stage division in life tables.

Before we discuss these problems, however, I should point out the following. In the study of population dynamics, we are interested in the analysis of the rate of change in population from one generation to the next, which is made up of two major components: generation survival and recruitment of a new generation (e.g., mean number of eggs laid per adult). The total K in Eq. 1, a negative value of the log generation survival, does not provide full information on the rate of population change. If we add another k to represent the negative value of the log recruitment rate, K becomes equal to the negative value of log rate of change in population.

In the following two subsections, I shall use my notation for convenience and consistency. [H.sub.i]: log survival rate during the ith developmental stage (= -[k.sub.i]); [H.sub.g] (= [H.sub.1] + [H.sub.2] + . . . + [H.sub.n]): log generation survival (= -K); [H.sub.r]: log recruitment rate; R: log intergeneration rate of change in population. Written explicitly, the relationship is, instead of Eq. 1:

R = [H.sub.1] + [H.sub.2] + . . . + [H.sub.n] + [H.sub.r]

= [H.sub.g] + [H.sub.r]. (7)

This modification will not affect the formalism of key factor analysis. I now discuss the three specific problems that the key factor index cannot deal with.

1. Qualitative differences in variation in stage survival. - Table 1 shows a set of actual data on stage survival rates of the spruce budworm, Choristoneura fumiferana [Clem.], Lepidoptera: Tortricidae (Royama 1984: Fig. 4). [H.sub.1], [H.sub.2], [H.sub.3], and [H.sub.4] are the log survival rates during the four major stages, i.e., egg, overwintering larva, feeding larva, and pupa, respectively. The log recruitment rate ([H.sub.r]: [H.sub.5] in Royama 1984, 1992) is the log number of eggs laid (X) minus the log number of moths emerged locally per unit area. R is the log rate of change in egg density from one generation year to the next.

[TABULAR DATA FOR TABLE 2 OMITTED]

Table 2 lists the values of index [Beta] for all five H's, calculated from Table 1 (excluding the incomplete years, 1945, 1946, and 1959), using Eq. 2 in which [k.sub.i] and K are replaced by [H.sub.i] and R, respectively. According to these index values, conventional key factor analysis would have concluded that [H.sub.r] (log recruitment rate) is the first key component since [[Beta].sub.r] is highest, and [H.sub.3] (log survival of feeding larvae) is the second most important one as [[Beta].sub.3] is second highest. The point is, however, that there are important qualitative differences between the recruitment rate and survival of feeding larvae in their contributions to the pattern of annual fluctuations in the population, a fact the key factor index cannot indicate.

Fig. 1 graphs the data in Table 1 in which [H.sub.1] to [H.sub.4] are combined as the log generation survival rate [H.sub.g] (= [H.sub.1] + [H.sub.2] + [H.sub.3] + [H.sub.4]) in order to enhance the point of my argument. We see that the pattern of fluctuation is very different between [H.sub.g] and [H.sub.r]: [H.sub.g] had a consistent downward trend over the years observed, whereas [H.sub.r] exhibited no such trend but fluctuated widely about its average value. It was this trend in [H.sub.g] that was the primary cause of the increase (to an epidemic level) and decrease (to an endemic level) in the population (measured in log egg densities, X, in Fig. 1) over the years observed. This is because the trend in [H.sub.g] was the cause of the same (overall) trend in R (the log rate of change in density), which declined from positive (indicating a population increase) to negative (indicating a population decrease) values over the years.

A further analysis has shown (Royama 1984, 1992) that the log survival rate of feeding larvae ([H.sub.3]) dictated the trend in the log generation survival ([H.sub.g]) and, hence, was a major cause of the periodic occurrence of an outbreak in this species. The recruitment rate, on the other hand, was not a primary cause of an outbreak, though it might influence how high an epidemic population could climb.

If we are interested primarily in the cause of the periodic occurrence of outbreaks of this species, we should investigate the ecological mechanisms that determine survival of the feeding larvae. In other words, [H.sub.3], rather than [H.sub.r], is a key in this respect, despite the fact that [[Beta].sub.3] is smaller than [[Beta].sub.r] (Table 2).

Furthermore, there is a multitude of factors operating at the larval stage, and there is no single factor that is paramount in influencing budworm survival during an outbreak. Rather, its survival is governed by the intricate interactions among its natural enemy complex, which includes uni- and multivoltine primary parasitoids and their hyperparasitoids (Royama 1992). Thus, studying their interactions is a key to understanding the outbreak processes of this species. The key factor index provides no such insight.

On the other hand, a forest manager may wish to know how high an epidemic population might climb so as to take appropriate action to minimize timber loss to defoliation. Then, the manager should carefully watch the annual changes in recruitment rate during the period of high populations. This is because, in spruce budworm, the recruitment of a new generation can be enormously influenced by the proportion of eggs brought in by immigrating moths. For the manager, then, the dispersal of egg-carrying moths is an important key factor. In other words, the relative importance of a given factor depends on the purpose of investigation. In this sense, a "key factor" cannot be a completely objective entity, and an attempt to quantify it objectively by means of a simple statistical index is bound to fail.

2. The potential importance of a factor that does not vary much over time. - Key factor analysis ignores a factor with a low [Beta] value because the variation in the effect of such a factor contributes little to the variance of R. One might think that, if mortality caused by a given factor does not vary much from year to year, it contributes little to the pattern of temporal variations. That is not true. I demonstrate the point, using two sets of simulated data.

As already shown in Eq. 7, the generation-to-generation rate of change in population is made up of two major components: survival of individuals during one generation and the recruitment of individuals for a new generation (e.g., eggs per adult). In Fig. 2, it is assumed that: (1) the log recruitment rate ([H.sub.r]) is an independent random number with mean = 0.3 and 1.2 in cases I and II, respectively, the variance being 0.01 in both cases; (2) the log generation survival ([H.sub.g]) is the density-dependent process of form

[H.sub.g] = -exp(aX), (8)

in which X is density at the beginning of the generation and parameter a (= 2.5 in both simulations) characterizes the density dependence of [H.sub.g]. Thus, making the generation number explicit by subscript t, the overall population process (to generate the graphs in Fig. 2) is:

[R.sub.t] = [X.sub.t+1] - [X.sub.t]

= [H.sub.rt] + [H.sub.gt]

= [H.sub.rt] - exp(a[X.sub.t]), (9)

which is a simple logistic process through scramble competition among the members of the population (Royama 1992). Note that the above assumptions and specific parameter values used are for convenience's sake and the point of my contention does not depend on them.
TABLE 3. Calculated values of key factor index [Beta] for the
recruitment rate ([[Beta].sub.r]) and generation survival
([[Beta].sub.g]) in the simulated data, cases I and II, in Fig. 2.

                     I        II

[[Beta].sub.r]     0.656     0.001
[[Beta].sub.g]     0.344     0.999

Table 3 compares the calculated values of the key factor index for the recruitment rate ([[Beta].sub.r]) and generation survival ([[Beta].sub.g]) in both cases. At their face values, the "key factor" is the recruitment rate in case I, whereas it is generation survival in case II. The point is: such a conclusion is not only superficial but even misleading.

Recall first that, between the two cases, every parameter but the mean of [H.sub.r] is identical; even the pattern of fluctuations in [H.sub.r] about its mean is identical (this fact may not be clearly recognizable in Fig. 2 because the [H.sub.r] axes for the two cases differ in scale). Thus, the differences in the pattern and magnitude of fluctuations in [H.sub.g] and, hence, in R between the two cases depend on the difference in the mean of [H.sub.r]. In other words, how much the variation in [H.sub.g] contributes to the variation in R depends on the mean of [H.sub.r]. Even though the variation in R in case II is almost completely dictated by the variation in [H.sub.g], the latter depends ultimately on the mean of [H.sub.r]. The ultimate dependence on [H.sub.r] in this model is due to the fact that the effect of [H.sub.g] on R depends on population density at the beginning of each generation, which is determined by [H.sub.r]. Evidently, we cannot ignore the role played by the recruitment rate even though its [[Beta].sub.r] value is insignificant in case II.
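The contrast between the two cases can be explored with a short simulation of Eq. 9. The sketch below is not the original simulation and does not attempt to reproduce the values in Table 3: the normally distributed deviations, the seed, the number of generations, the initial value of X, and the discarded initial transient are all assumptions made for illustration. Only the means of [H.sub.r] (0.3 and 1.2), its variance (0.01), and a = 2.5 come from the text.

```python
import math
import random


def covariance(x, y):
    """Sample covariance of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    return sum((u - mean_x) * (v - mean_y) for u, v in zip(x, y)) / (n - 1)


def simulate(mean_hr, deviations, a=2.5, x0=0.0, burn_in=100):
    """Iterate Eq. 9 and return the series of H_r, H_g and R after a burn-in."""
    x = x0  # assumed initial value of X
    h_r, h_g, r = [], [], []
    for dev in deviations:
        recruitment = mean_hr + dev      # H_r in generation t
        survival = -math.exp(a * x)      # H_g in generation t, Eq. 8
        rate = recruitment + survival    # R_t, Eq. 9
        h_r.append(recruitment)
        h_g.append(survival)
        r.append(rate)
        x += rate                        # X_{t+1} = X_t + R_t
    return h_r[burn_in:], h_g[burn_in:], r[burn_in:]


def key_factor_index(h_r, h_g, r):
    """Return (beta_r, beta_g), each a covariance with R divided by Var(R)."""
    var_r = covariance(r, r)
    return covariance(r, h_r) / var_r, covariance(r, h_g) / var_r


rng = random.Random(1)  # arbitrary seed, an assumption of this sketch
# The same deviations (standard deviation 0.1, i.e., variance 0.01) are used in both cases.
deviations = [rng.gauss(0.0, 0.1) for _ in range(2000)]

beta_case_1 = key_factor_index(*simulate(0.3, deviations))
beta_case_2 = key_factor_index(*simulate(1.2, deviations))
print("case I :  beta_r = %.3f, beta_g = %.3f" % beta_case_1)
print("case II:  beta_r = %.3f, beta_g = %.3f" % beta_case_2)
```

Only the mean of [H.sub.r] differs between the two calls, yet the index attributed to the recruitment rate drops from a large share in case I to a very small one in case II, in line with the argument above.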

Again, in general, different factors play different roles in determining the process dynamics and their importance cannot be evaluated merely in terms of the variances and covariances of stage survival rates.

3. Key factor as a result of arbitrary stage division in life tables. - Life tables may contain an interval in which mortality factors are multiple, difficult to identify and to evaluate individually so that they have to be lumped as a single k value. An example is the well-known "winter disappearance" in the winter moth (Operophtera brumata) life tables ([k.sub.1] in Varley et al. 1973: Fig. 7.3), which represents "the loss from all causes between the count of females up to the count of fully grown larvae."

Varley and Gradwell (1960) thought that: "the shape of the curve of [k.sub.1] is very like that of K, hence the mortality which we call winter disappearance includes that which is due to the key factor." If the authors considered that the "key factor" was hidden among the lumped mortality factors, their view is rather unlikely to be true. It is most likely that the pattern of variation in [k.sub.1] much resembled that of K merely because [k.sub.1] represented the total effect of many unidentified and/or unquantified mortality factors that collectively, and consistently in each generation, constituted a large portion of K. Unless the lumped k's are negatively correlated with each other so as to negate each other's effect, the [Beta] value of the lumped k's would be greater than that of any one of the individual k's.

If we divided the collective mortality into those caused by a number of specific factors, in an effort to find the hidden "key factor," the contribution of each k value to Var(K), hence the resemblance in pattern between [k.sub.1] and K, would have most certainly been reduced, and the "key factor" might have even disappeared. In other words, the more detailed information we have in the life tables, the less likely we are to find a key factor; that is, the notion of key factors, conceived as the k's contributions to the variance of K, becomes a methodological irony.
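The effect of lumping follows from the additivity of covariance. As a minimal worked case, suppose the lumped stage is the sum of just two components, written here as k_a and k_b (hypothetical names introduced only for this illustration):

```latex
% Lumped stage: k_1 = k_a + k_b  (k_a and k_b are hypothetical components)
\begin{align*}
\beta_1
  &= \frac{\operatorname{Cov}(K,\, k_a + k_b)}{\operatorname{Var}(K)} \\
  &= \frac{\operatorname{Cov}(K,\, k_a)}{\operatorname{Var}(K)}
   + \frac{\operatorname{Cov}(K,\, k_b)}{\operatorname{Var}(K)} \\
  &= \beta_a + \beta_b .
\end{align*}
```

The index of a lumped stage is therefore the sum of the indices of its components, so whenever both components have positive index values the lumped value exceeds either of them. Conversely, splitting a stage into finer components can only distribute its index among them.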

In the analysis of life table data, it is sometimes necessary to combine or separate the effects of some stage mortality factors in order to bring out similarities and differences in roles they play in determining the overall pattern of population fluctuations. A typical example is, as already mentioned, the collective action of the complicated network of natural enemies on spruce budworm during its feeding stage. To determine which ones to combine or separate, we need to consider the qualitative aspect of the variations that cannot be indicated by a simple statistical parameter.


The above problems suggest that the concept of "key factors," as those factors playing "key" roles, cannot be defined in simple terms inasmuch as we cannot (and need not) precisely define the notion of "important," "major" or "significant" factors.

Index [Beta] may be used as a measure of the relative importance of k values if there exist, unlike Problem 1, little or no qualitative differences in their variations. One might appeal to correlogram (or autocorrelation) analysis to test such a possibility, although most life table data are not long enough for the analysis to be meaningful (Royama 1992). But this situation is rather likely to be exceptional. Besides, the index would certainly miss an important factor, if any, as in Problem 2, or the face-value interpretation of the index could still be as misleading as in Problem 3.


The essence of the study of population dynamics is to analyze an observed process into major components and then to synthesize them to recreate the dynamics. In general, different factors play different roles in determining the process dynamics. For judging which components are major, the criteria are multiple and subtle beyond the simplistic idea of key factor analysis. The analysis of population processes, in which life table (or survivorship) data provide basic information, requires a much deeper insight than a simple index might provide. In this sense, there will be no simple alternative.

We must look into a more comprehensive method, or a system of methods, to analyze each set of population process data at hand, rather than into a simple and easy-to-use tailor-made method; each set of data may be unique and may, accordingly, require a unique treatment. In other words, there is no easy road to the analysis of population processes. The development of effective methods requires an involved effort, as discussed at length elsewhere (Royama 1992), and is rather too large a subject to even summarize in the short space available here.


The following people read my manuscript in different draft forms: D. T. Quiring (University of New Brunswick), S. L. Pimm (University of Tennessee), D. P. Ostaff, D. G. Embree, and E. S. Eveleigh (Canadian Forest Service). I also benefitted from the critical comments by the anonymous referees. I thank them all for sparing their valuable time.


Harcourt, D. G. 1969. The development and use of life tables in the study of natural insect populations. Annual Review of Entomology 14:175-196.

Manly, B. F. J. 1977. The determination of key factors from life table data. Oecologia (Berlin) 31:111-117.

Metcalfe, J. R. 1972. An analysis of the population dynamics of the Jamaican sugar-cane pest Saccharosydne saccharivora (Westw.) (Hom., Delphacidae). Bulletin of Entomological Research 62:73-85.

Morris, R. F. 1959. Single-factor analysis in population dynamics. Ecology 40:580-588.

Podoler, H., and D. Rogers. 1975. A new method for the identification of key factors from life-table data. Journal of Animal Ecology 44:85-114.

Royama, T. 1984. Population dynamics of the spruce budworm Choristoneura fumiferana. Ecological Monographs 54:429-462.

-----. 1992. Analytical population dynamics. Chapman and Hall, London, UK.

Smith, R. H. 1973. The analysis of intra-generation change in animal populations. Journal of Animal Ecology 42:611-622.

Southwood, T. R. E. 1978. Ecological methods. Second edition. Chapman and Hall, London, UK.

Varley, G. C., and G. R. Gradwell. 1960. Key factors in population studies. Journal of Animal Ecology 29:399-401.

Varley, G. C., and G. R. Gradwell. 1970. Recent advances in insect population dynamics. Annual Review of Entomology 15:1-24.

Varley, G. C., G. R. Gradwell, and M. P. Hassell. 1973. Insect population ecology. University of California Press, Berkeley, California, USA.
COPYRIGHT 1996 Ecological Society of America
Author: Royama, T.
Date: Jan 1, 1996