# Variance Effective Population Size under Mixed Self and Random Mating with Applications to Genetic Conservation of Species.

IN STUDYING the effect of random drift on gene frequencies attributable to small populations, Wright (1931) and Fisher (1930) developed the concept and the mathematical theory of effective population size. This is defined as the size of an ideal population that has the same amount of drift in allele frequency or the same rate of decrease in heterozygosity as the actual population. Effective population size can be taken as a measure of the genetic representativeness of a sample of individuals and is an important parameter in quantitative and population genetics because it measures the rate of genetic drift and inbreeding (Caballero, 1994; Wang, 1997; Santiago and Caballero, 1998).

Several kinds of effective sizes have been defined (Crow and Denniston, 1988; Frankel et al., 1995; Lindgren et al., 1996, 1997). One that is of interest in breeding programs and genetic conservation of species is the variance effective number [N.sub.e(v)], which is determined primarily by the number in the offspring generation ([N.sub.t]). A general equation for [N.sub.e(v)] was developed by Crow and Kimura (1970). It considered the effect of drift on allele frequency when a sample of offspring is taken from a reference parental population. It was later extended by Crow and Denniston (1988). On the other hand, a simple expression that relates [N.sub.e(v)] to the natural rate of self-fertilization (s) of species and that can be derived from Cockerham's (1969) intraclass correlation approach is given by [N.sub.e(v)] = n(1 - 0.5s) (Gale and Lawrence, 1984). This practical expression can be applied to both infinite (germplasm collection) and finite (accession regeneration) reference populations; however, it is only applicable under very specific circumstances such as, among others, those that avoid family structure of the sampled seed.

Vencovsky (1978) and Crossa and Vencovsky (1997) developed a model for computing the variance of the number of contributed gametes, V(k), and for calculating [N.sub.e(v)] when allele frequency drift is viewed as occurring at two stages during sampling in monoecious species: sampling of parents from the reference population and sampling of gametes from these parents. In this model, a set of monoecious individuals is considered. Some of these contribute both female and male gametes, others contribute only male gametes, and others do not contribute any gametes at all. Using this model, Vencovsky (1987) and Crossa and Vencovsky (1994, 1997) developed formulas adapting [N.sub.e(v)] to specific aspects of germplasm collection and regeneration.

One basic assumption underlying the computation of the variance of the contributed number of gametes, V(k), for two-stage sampling of monoecious species is that there is no covariance between the number of female and male gametes contributed per individual within a set of monoecious plants. This means that the model proposed by Vencovsky (1978) and Crossa and Vencovsky (1997) for estimating predictive equations of V(k) and [N.sub.e(v)] excludes any artificial or natural self-fertilization above panmixia and thus covers a restrictive range of circumstances occurring in natural and experimental populations. It is well known that self-compatible species such as sorghum [Sorghum bicolor (L.) Moench], cotton (Gossypium sp.), coffee trees (Coffea arabica L.), eucalyptus (Eucalyptus sp.), and several other forest species may have a sufficiently high rate of natural self-fertilization (s) to be considered in the category of species with mixed self and random mating. In nature, species may have a wide range of self-fertilization rates that vary from zero (strict allogamous) to one (strict autogamous) (0 [is less than or equal to] s [is less than or equal to] 1).

The genetic structure of a mixed self and random mating population is complex. Seeds collected from this type of species consist of a mixture of selfed ([S.sub.1]) seeds with proportion s, and outcrossed seeds with proportion 1 - s. Families collected from individual seed parents are intermediate between [S.sub.1] and half-sib families. The proportion s can be changed artificially by hand-pollination from s = 1 with self-fertilization of all plants to s = 0 when plants are intercrossed. On the basis of a one-locus model, Wright and Cockerham (1985) described the complex genetic structure of a population with a mixed mating system under inbreeding equilibrium. Plants from such a population will have different levels of inbreeding, depending on how many generations of selfing each one has in its genetic history or pedigree. The fixation index, as a population parameter, is an expected value; even with constant s for all plants, as usually assumed, the actual level of inbreeding will vary among individuals. This will affect the actual [N.sub.e(v)] value or the exact amount of drift expected in a sample of seeds taken from a specific set of seed parents, especially when the number of seed parents is small. With molecular markers, the mating system of a species can be studied and its natural rate of self-fertilization (s) and natural level of inbreeding (f) can be estimated. Appropriate effective population sizes for mixed self and random mating species can then be derived by considering the additional covariance between the number of contributed female and male gametes that arises from self-fertilization.

The objectives of this study were to: (i) develop a general model for obtaining predictive equations of V(k) and [N.sub.e(v)] for mixed self and random mating species by extending the two-stage sampling model of Vencovsky (1978) and Crossa and Vencovsky (1997) to cases where 0 [is less than or equal to] s [is less than or equal to] 1; (ii) provide direct formulas for calculating V(k) and [N.sub.e(v)] for mixed self and random mating; and (iii) adapt these formulas to specific aspects of germplasm collection and regeneration and discuss the effects of different levels of self-fertilization and different mating systems on the estimated values of [N.sub.e(v)].

MODEL FOR MIXED SELF AND RANDOM MATING SPECIES

Basic Scheme and Assumptions

The scheme for representing the basic mixed self and random mating model assumes an initial set of N diploid plants. P parents are expected to contribute both female and male gametes, and are randomly sampled for seed collection. R additional parents are expected to contribute only male gametes. Therefore, the total number of parents randomly sampled from the population that potentially contribute male gametes is M = P + R. N - (P + R) parents do not contribute any gametes at all (Table 1).

Table 1. Number of female and male contributed gametes for selfing and crossing.
```
                  Female gametes              Male gametes

Parents       Selfing       Crossing      Selfing       Crossing      Total         Number of seeds

1             [a.sub.1]     [c.sub.1]     [a.sub.1]     [d.sub.1]     [k.sub.1]     [m.sub.1]
2             [a.sub.2]     [c.sub.2]     [a.sub.2]     [d.sub.2]     [k.sub.2]     [m.sub.2]
.             .             .             .             .             .             .
.             .             .             .             .             .             .
.             .             .             .             .             .             .
P             [a.sub.P]     [c.sub.P]     [a.sub.P]     [d.sub.P]     [k.sub.P]     [m.sub.P]
P + 1         0             0             0             [d.sub.P+1]   [k.sub.P+1]   0
.             .             .             .             .             .             .
.             .             .             .             .             .             .
.             .             .             .             .             .             .
M = P + R     0             0             0             [d.sub.M]     [k.sub.M]     0
M + 1         0             0             0             0             0             0
.             .             .             .             .             .             .
.             .             .             .             .             .             .
.             .             .             .             .             .             .
N             0             0             0             0             0             0
Totals        ns            n(1 - s)      ns            n(1 - s)      2n            n
```

Definitions of terms in the basic scheme of Table 1 are as follows.

1. $n = \sum_{i=1}^{P} m_i$ is the total number of seeds collected from the set of P parents.

2. [a.sub.i] is a variable defined as the number of female gametes, and equally the number of male gametes, contributed by Parent i for selfed seed (i = 1, 2, ..., P).

3. [c.sub.i] is a variable representing the number of female gametes contributed by parent i for crossed seed.

4. [m.sub.i] = [a.sub.i] + [c.sub.i] is a variable defined as the number of collected seeds from Parent i and is equal to the number of female gametes contributed by this parent. Note that the total number of gametes contributed by Parent i is [k.sub.i] = 2[a.sub.i] + [c.sub.i] + [d.sub.i] = [m.sub.i] + [a.sub.i] + [d.sub.i]. [d.sub.i'] is the number of male gametes contributed by Parent i' for crossed seed. These [d.sub.i'] gametes are assumed to be randomly distributed over the set of M = P + R parents (i' = 1, 2, ..., M).

When an equal number of seeds is taken from each seed parent (female gametic control) [m.sub.i] is no longer a variable and becomes fixed ([m.sub.1] = [m.sub.2] = ... = [m.sub.P] = m) within the given set of P parents. The natural or artificial rate of self-fertilization (s) is assumed constant over parents, and ns and n(1 - s) are the expected totals of gametes contributed for selfed and crossed seeds, respectively. The overall total of contributed gametes is 2ns + 2n(1 - s) = 2n.

Proportions u = P/N (0 [is less than] u [is less than or equal to] 1) and v = M/N (0 [is less than] v [is less than or equal to] 1) correspond to seed and pollen parents, respectively. Ratio u will depend on the reference population considered and the specific circumstances of the sampling process. In germplasm regeneration, u should be near 1 but may be reduced because of poor stand caused by poor germination, insects, diseases, or other factors. When sampling from large natural populations, however, u may be near zero. In practice, it is difficult to know the number of pollen parents (M), particularly in germplasm collection, because it depends on factors such as wind, plant density, pollen dispersal, and pollen vectors. For collecting from large reference populations of panmictic, or mixed self and random mating species, it can be assumed that M [right arrow] [infinity]. This is an oversimplification and is considered here for the purpose of calculating an upper limit for [N.sub.e(v)]. For strict autogamic species (s = 1), however, M = P and u = v. In germplasm regeneration, M is finite and may be known.

Basic Equations, Notations, Definitions, and Assumptions

The basic expression of Crow and Kimura (1970) (p. 357, Eq. [7.6.3.26]) is considered, with [[Alpha].sub.t-1] = f as a measure of departure from Hardy-Weinberg equilibrium in the parental population (Generation t - 1)

[1] $N_{e(v)} = \dfrac{2n}{(1 - f) + (1 + f)\,S_k^2/\bar{k}}$

where n = [N.sub.t] is the number of offspring in Generation t, and $S_k^2$ and $\bar{k}$ are the variance and mean of the number of contributed gametes in Generation t - 1. Wright (1969) ignored the Gaussian correction N/(N - 1) and considered the ratio $V(k)/\bar{k}$ instead of $S_k^2/\bar{k}$, where $S_k^2 = V(k)\,N/(N - 1)$. The use of $S_k^2/\bar{k}$ is relevant only when N is small. In the derivations given here the Gaussian correction is also ignored.

Coefficient f is the level of fixation (inbreeding coefficient) of the parental population (Generation t - 1), such that the expected genotypic frequencies (Q) for homozygous [A.sub.1][A.sub.1] and heterozygous [A.sub.1][A.sub.2] individuals are $Q_{11} = p_1^2 + p_1 p_2 f$ and [Q.sub.12] = 2[p.sub.1][p.sub.2](1 - f), respectively ([p.sub.1] and [p.sub.2] are the frequencies of alleles [A.sub.1] and [A.sub.2], respectively).

If a population is in inbreeding equilibrium, the relationship between the natural rate of self-fertilization (s) and the natural level of inbreeding (f) is f = s/(2 - s) (Haldane, 1924; Wright and Cockerham, 1985). Inbreeding equilibrium is meant for populations with zygotic proportions remaining constant over generations according to Wright's Equilibrium Law. It implies that the population has a past history of reproduction by a combination of outcrossing and selfing, under a constant rate. It also implies that the population is left to reproduce naturally, that is, no hand pollination or any interference is made. The inbreeding equilibrium condition in normal circumstances of germplasm collection and regeneration is a reasonable assumption for neutral alleles. As will be described later, however, inbreeding equilibrium is not a necessary condition in the mathematical derivation of V(k) and [N.sub.e(v)] given here for mixed self and random mating.
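As a quick numerical check of the equilibrium relationship, the Python sketch below evaluates f = s/(2 - s) for the selfing rates that appear later in Table 2. It also iterates the one-generation recurrence f' = s(1 + f)/2, which is the standard mixed-mating recurrence and is an assumption here, since the text states only the equilibrium value. Function names are illustrative.

```python
def equilibrium_f(s: float) -> float:
    """Inbreeding coefficient at inbreeding equilibrium, f = s / (2 - s)."""
    if not 0.0 <= s <= 1.0:
        raise ValueError("s must lie in [0, 1]")
    return s / (2.0 - s)


def iterate_f(s: float, f0: float = 0.0, generations: int = 60) -> float:
    """Apply f' = s * (1 + f) / 2 repeatedly (assumed recurrence, see lead-in)."""
    f = f0
    for _ in range(generations):
        f = s * (1.0 + f) / 2.0
    return f


if __name__ == "__main__":
    for s in (0.0, 0.1, 0.3, 0.5, 0.7, 0.9, 1.0):
        # The iterated value approaches the closed form when s is held constant.
        print(f"s = {s:.1f}  f = {equilibrium_f(s):.3f}  iterated = {iterate_f(s):.3f}")
```

The closed form returns 0.053, 0.176, 0.333, 0.538, and 0.818 for s = 0.1, 0.3, 0.5, 0.7, and 0.9, which are the f values listed in Table 2.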

For the derivation of V(k), means, variances, and covariances involving Variables m, a, c, and d (Table 1) are necessary. These quantities are herein represented by the following symbols. If x denotes any of these variables, then [V.sub.P](x), [E.sub.P](x) = [[bar]x.sub.P], and V(x), E(x) = [bar]x are the variance and mean of x measured in relation to the set of P seed parents or in relation to the whole set of N initial individuals, respectively. Also V(x/[m.sub.i]), E(x/[m.sub.i]) are the variance and expected value of x, conditional to the number of seeds of the ith parent. For covariances the corresponding symbols are [cov.sub.P](x,y) or cov(x,y) when relative to P or N, respectively. Concerning the distribution and expectation of these variables, the following is assumed.

1. m is binomial within set P such that [V.sub.P](m) = (n/P)(1 - 1/P) when seeds are sampled randomly; also [[bar]m.sub.P] = n/P. This is a simplified assumption. The actual variance of m may be larger because of variation in fertility.

2. a is binomial when conditional to [m.sub.i], with V(a/[m.sub.i]) = s(1 - s)[m.sub.i] and E(a/[m.sub.i]) = s[m.sub.i]. Consequently, c is also binomial conditional to [m.sub.i], with V(c/[m.sub.i]) = s(1 - s)[m.sub.i] and E(c/[m.sub.i]) = (1 - s)[m.sub.i]. With constant s over parents, it follows that $\bar{a}_P = sn/P$ and $\bar{c}_P = (1 - s)n/P$. For the total number of contributed gametes, the average is [bar]k = 2n/N relative to the initial set of N plants.

It is also assumed that, within a set of P parents, there is no covariance between the number of male gametes (d) contributed for outcrossed seeds and the number of female gametes (a) contributed for selfed seeds and also no covariance between d and the number of female gametes (c) contributed for outcrossed seeds, that is, [cov.sub.P](a,d) = [cov.sub.P](c,d) = 0. With female gametic control [V.sub.P](m) = 0.
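The distributional assumptions above can be checked by simulation. The sketch below is illustrative only: plant indexing, parameter values, and function names are hypothetical, and the closed form in `predicted_ratio_rs` is derived here from Assumptions 1 and 2 and the zero-covariance condition rather than copied from the source. Each seed picks a seed parent at random among the P seed parents, is selfed with probability s, and otherwise receives pollen from a plant drawn at random among the M pollen parents. Because the number of selfed seeds is random in the simulation while the model uses its expectation ns, a small discrepancy is expected.

```python
import random


def simulate_ratio_rs(N, P, M, n, s, reps=1000, seed=1):
    """Monte Carlo estimate of V(k)/k-bar under random sampling of seeds.

    Requires P <= M <= N. Plants 0..P-1 are seed parents, plants 0..M-1
    are pollen parents, and plants M..N-1 contribute nothing.
    """
    rng = random.Random(seed)
    mean_k = 2.0 * n / N                  # every seed carries two gametes
    total = 0.0
    for _ in range(reps):
        k = [0] * N                       # gametes contributed by each plant
        for _ in range(n):
            mother = rng.randrange(P)
            k[mother] += 1                # female gamete
            if rng.random() < s:
                k[mother] += 1            # selfed seed: pollen from the mother
            else:
                k[rng.randrange(M)] += 1  # outcrossed seed: random pollen parent
        var_k = sum(x * x for x in k) / N - mean_k ** 2  # variance over all N plants
        total += var_k / mean_k
    return total / reps


def predicted_ratio_rs(N, P, M, n, s):
    """Closed form for V(k)/k-bar (assumed form, derived in the lead-in)."""
    u, v = P / N, M / N
    return ((1 + s) ** 2 * (n * (1 - u) - 1) / (2 * P)
            + (1 - s) * (n * (1 - v) * (3 + s) - 1) / (2 * M)
            + (1 + s))


if __name__ == "__main__":
    args = dict(N=50, P=20, M=30, n=200, s=0.3)
    print("simulated:", round(simulate_ratio_rs(**args), 2))
    print("predicted:", round(predicted_ratio_rs(**args), 2))
```

The two printed numbers should agree to within simulation error. The comparison is meant as a sanity check of the assumptions, not as a substitute for the derivation in Appendix 1.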

DERIVATION OF V(k) AND [N.sub.e(v)] UNDER MIXED SELF AND RANDOM MATING FOR RANDOM SAMPLING OF SEEDS OR FEMALE GAMETIC CONTROL

The splitting of V(k) used here corresponds to the two-stage sampling model shown by Vencovsky (1978) and Crossa and Vencovsky (1997), where sampling is first of zygotes and subsequently of gametes to constitute the next generation. For random sampling of seeds, the variance V(k) is required for deriving [N.sub.e(v)]. Because [k.sub.i] = [m.sub.i] + [a.sub.i] + [d.sub.i], it follows that V(k) = V(m) + V(a) + V(d) + 2 cov(m,a) + 2 cov(m,d) + 2 cov(a,d), with all terms relative to N.

It can be shown that the variance of the number of contributed gametes divided by [bar]k, for random sampling of seeds (RS) under mixed self and random mating, is as follows

$\dfrac{V(k)}{\bar{k}} = \dfrac{(1 + s)^2\,[n(1 - u) - 1]}{2P} + \dfrac{(1 - s)\,[n(1 - v)(3 + s) - 1]}{2M} + (1 + s)$

(Appendix 1).

Thus, for random sampling of seed (RS) and unrestricted inbreeding (UI), Eq. [1] becomes

[2] [N.sub.e(v)] = 2n/[D.sub.2]

where

$D_2 = (1 - f) + (1 + f)\left\{ \dfrac{(1 + s)^2\,[n(1 - u) - 1]}{2P} + \dfrac{(1 - s)\,[n(1 - v)(3 + s) - 1]}{2M} + (1 + s) \right\}$

Assuming inbreeding equilibrium (IE), f = s/(2 - s), Eq. [2] can be written as

[3] [N.sub.e(v)] = n(2 - s)/[D.sub.3]

where

$D_3 = \dfrac{(1 + s)^2\,[n(1 - u) - 1]}{2P} + \dfrac{(1 - s)\,[n(1 - v)(3 + s) - 1]}{2M} + 2$.

For female gametic control (GC), that is, taking an equal number of seeds from each plant, the basic scheme and the component terms of V(k) are similar to those shown for random sampling of seeds except that now [m.sub.1] = [m.sub.2] = ... = [m.sub.P] = m.

Consequently, the ratio V(k)/[bar]k becomes

$\dfrac{V(k)}{\bar{k}} = \dfrac{(1 + s)^2\,n(1 - u)}{2P} + \dfrac{(1 - s)\,[n(1 - v)(3 + s) - 1]}{2M} + \dfrac{(1 - s)(1 + s)}{2}$

(Appendix 1).

Thus, Eq. [1] for female gametic control (GC) and unrestricted inbreeding (UI) becomes

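[4] [N.sub.e(v)] = 2n/[D.sub.4]

where

$D_4 = (1 - f) + (1 + f)\left\{ \dfrac{(1 + s)^2\,n(1 - u)}{2P} + \dfrac{(1 - s)\,[n(1 - v)(3 + s) - 1]}{2M} + \dfrac{(1 - s)(1 + s)}{2} \right\}$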

Assuming inbreeding equilibrium (IE), f = s/(2 - s), Eq. [4] reduces to

[5] [N.sub.e(v)] = 2n(2 - s)/[D.sub.5]

where

[D.sub.5] = [(1 + s).sup.2]n(1 - u)/P + (1 - s)[n(1 - v)(3 + s) - 1]/M + (3 + s)(1 - s).

Equations [2] and [4] are unrestricted with respect to s and f. They are, for instance, applicable when gametes are manipulated through hand pollination in regeneration practices. For a set of individuals with a given f value, s = 1 corresponds to selfing all plants to obtain [S.sub.1] offspring while s = 0 corresponds to crossing plants with avoidance of selfing. If inbreeding equilibrium (IE) may be assumed, then Eq. [3] and [5] are adequate.
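To make Eq. [5] concrete, the following Python sketch codes [D.sub.5] term by term as it is printed above. Eq. [5] is read here with numerator 2n(2 - s), the form that follows from substituting f = s/(2 - s) into Eq. [4]. The function names and the use of `math.inf` for an unbounded pollen pool are choices made for this example, not part of the original derivation.

```python
import math


def d5(n, P, M, u, v, s):
    """Denominator D5 of Eq. [5]; M may be math.inf for an unbounded pollen pool."""
    return ((1 + s) ** 2 * n * (1 - u) / P
            + (1 - s) * (n * (1 - v) * (3 + s) - 1) / M
            + (3 + s) * (1 - s))


def ne_gc_ie(n, P, M, u, v, s):
    """Ne(v) under female gametic control and inbreeding equilibrium (Eq. [5])."""
    denom = d5(n, P, M, u, v, s)
    return math.inf if denom == 0 else 2 * n * (2 - s) / denom


if __name__ == "__main__":
    # Collection from a very large population: u close to 0, v close to 1.
    for s in (0.0, 0.5, 1.0):
        print(s, round(ne_gc_ie(1000, 100, math.inf, 0.0, 1.0, s), 1))
    # Regeneration of N = 100 plants with 80% functional parents (P = M = 80).
    print(round(ne_gc_ie(100, 80, 80, 0.8, 0.8, 0.0), 1))
```

With n = 1000 and P = 100 the first loop gives 307.7, 123.7, and 50.0, the GC entries of Table 2 for s = 0, 0.5, and 1.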

Equations [2] through [5] for computing [N.sub.e(v)] are general; however, under more restricted conditions they can be adapted and evaluated for some practical situations encountered in germplasm conservation.

ADAPTATION OF [N.sub.e(v)] FOR GERMPLASM COLLECTION AND REGENERATION FOR ARBITRARY SELFING RATE

Assumptions underlying the derivation of Eq. [2] through [5] allow the estimation of [N.sub.e(v)] for species with a given rate s and can be extended to the extreme cases of panmixia (s = 0) or complete autogamy (s = 1). Specific aspects of germplasm collection or regeneration will be considered under different mating systems.

Germplasm Collection

In germplasm collection the reference population is here considered to be of infinite size (N [right arrow] [infinity]) and the number of pollinator plants potentially large (M [congruent] N), such that v [congruent] 1. The number of seed parents (P), from which the sample of n seeds is taken, is considered much smaller than the entire population (N) implying that u = P/N [congruent] 0. Incorporating these assumptions into Eq. [2] through [5] leads to simplified expressions, as can be seen next.

For the case of random sampling of seeds (RS) and unrestricted inbreeding (UI), (RS - UI), Eq. [2] can now be written as

[6] $N_{e(v)} = \dfrac{2n}{(1 - f) + (1 + f)\left[ (1 + s)^2 (n - 1)/(2P) + (1 + s) \right]}$.

Similarly, Eq. [3] (RS - IE) simplifies to

[7] $N_{e(v)} = \dfrac{n(2 - s)}{(1 + s)^2 (n - 1)/(2P) + 2}$.

With female gametic control (GC) and unrestricted f (UI), Eq. [4] becomes

[8] $N_{e(v)} = \dfrac{2n}{(1 - f) + (1 + f)\left[ (1 + s)^2\,n/(2P) + (1 - s)(1 + s)/2 \right]}$.

For the GC sampling scheme and with inbreeding equilibrium (GC-IE), Eq. [5] reduces to

[9] $N_{e(v)} = \dfrac{2n(2 - s)}{(1 + s)^2\,n/P + (3 + s)(1 - s)}$.

Comparisons of Eq. [6] with Eq. [8] and Eq. [7] with Eq. [9] indicate that effective size attained in germplasm collection with female gametic control is equal to or larger than the corresponding [N.sub.e(v)] expected when seeds are sampled randomly. Some effective population numbers for seed collection obtained from Eq. [7] and [9] and assuming a reference population of infinite size under inbreeding equilibrium are given in Table 2. [N.sub.e(v)] is clearly reduced as s increases. In absolute values, the advantage of GC over RS of seed is also smaller with higher rates of s and becomes negligible when a larger number of seeds is collected per seed parent (n/P = 10 versus n/P = 50 in Table 2). With higher levels of self-fertilization, natural heterozygosity is reduced; consequently, a larger number of seed parents would need to be sampled to reach adequate effective sizes in germplasm collection.

Table 2. Effective population size, [N.sub.e(v)], for germplasm collection, relative to n = 1000 seeds collected from P = 100 and P = 20 plants for random sampling of seeds (RS) and for female gametic control (GC) for different values of the natural rate of self-fertilization (s), and natural inbreeding (f). Inbreeding equilibrium is assumed and the reference population is of infinite size.
```
                      P = 100                                P = 20

s      f         RS([dagger])   GC([double dagger])    RS([dagger])   GC([double dagger])

0.0    0.000     285.9          307.7                  74.1           75.5
0.1    0.053     236.2          255.2                  59.0           60.0
0.3    0.176     162.8          177.0                  38.5           39.2
0.5    0.333     113.3          123.7                  25.8           26.3
0.7    0.538      79.1           86.6                  17.5           17.9
0.9    0.818      54.9           60.3                  11.9           12.2
1.0    1.000      45.5           50.0                   9.8           10.0
```

([dagger]) Estimates of [N.sub.e(v)] were computed using Eq. [7].

([double dagger]) Estimates of [N.sub.e(v)] were computed using Eq. [9].
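The entries of Table 2 can be recomputed with a few lines of Python. The two functions below are assumed closed forms for the collection case (u close to 0, M unbounded): the GC form is Eq. [5] with those limits applied, and the RS form additionally uses [V.sub.P](m) = (n/P)(1 - 1/P) from Assumption 1. Function names are illustrative.

```python
def ne_rs_collection(n, P, s):
    """Random sampling of seeds, inbreeding equilibrium, infinite reference population."""
    return n * (2 - s) / ((1 + s) ** 2 * (n - 1) / (2 * P) + 2)


def ne_gc_collection(n, P, s):
    """Female gametic control, inbreeding equilibrium, infinite reference population."""
    return 2 * n * (2 - s) / ((1 + s) ** 2 * n / P + (3 + s) * (1 - s))


if __name__ == "__main__":
    n = 1000
    print("s     f       RS(100)  GC(100)  RS(20)   GC(20)")
    for s in (0.0, 0.1, 0.3, 0.5, 0.7, 0.9, 1.0):
        f = s / (2 - s)  # inbreeding equilibrium
        row = [ne_rs_collection(n, 100, s), ne_gc_collection(n, 100, s),
               ne_rs_collection(n, 20, s), ne_gc_collection(n, 20, s)]
        print(f"{s:.1f}   {f:.3f}  " + "  ".join(f"{x:7.1f}" for x in row))
```

Under these assumptions GC is never below RS for the same n, P, and s, and the gap narrows as n/P grows, in line with the discussion of Table 2.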

Examples given in Table 2 also illustrate the advantage of collecting n/P = 10 seeds from P = 100 parents, instead of collecting n/P = 50 seeds from P = 20 parents. For the case in which P = 100, the average effective number per seed parent [[bar]N.sub.e(v)] = [N.sub.e(v)]/P varies from 2.86 to 0.46 under RS and from 3.08 to 0.50 with GC for s = 0 and s = 1, respectively. The highest possible average effective number ([[bar]N.sub.e(v)] = 4) expected for panmictic species in Hardy-Weinberg equilibrium (s = f = 0) was not attained in this example because of the relatively small number of seeds assumed to have been sampled per parent. On the other hand, for s = 1, all parental individuals being homozygotes, [[bar]N.sub.e(v)] = 0.5 with female gametic control, which corresponds to the sampling of a single gamete from an idealized random mating population.

Expressions [6] through [9] can be further adapted to the conditions of specific mating systems. Keeping the assumptions already mentioned for germplasm collection, expressions were obtained from Eq. [7] (RS-IE) and Eq. [9] (GC-IE), for panmictic, mixed self and random mating, and self-fertilizing populations.

Panmictic Species

When s = f = 0, expressions for effective number are, for RS, $N_{e(v)} = 4nP/(n + 4P - 1)$ and, for GC, $N_{e(v)} = 4nP/(n + 3P)$ (from Eq. [7] and [9], respectively). These equations were also given by Crossa and Vencovsky (1994) for monoecious species.

Mixed Self and Random Mating (Intermediate) Species

Considering, for illustration, an intermediate rate of self-fertilization, s = 0.5, such that f = 1/3, expressions for effective number are, for RS, $N_{e(v)} = 12nP/[9(n - 1) + 16P]$

and, for GC, $N_{e(v)} = 12nP/(9n + 7P)$.

Self-Fertilizing Species

When s = f = 1, the effective numbers are, for RS, $N_{e(v)} = nP/[2(n + P - 1)]$ and, for GC, [N.sub.e(v)] = 0.5P. This set of equations, for s = 0.0, 0.5, and 1.0, among other aspects, can be used for quantifying in what circumstances the advantage of GC over RS becomes negligible as sample size (n) increases. Note that for autogamic species (s = 1) and with GC, [N.sub.e(v)] depends only on the number of seed parents (P), and not on the number of seeds collected (n). With sufficiently large sample sizes (n), [N.sub.e(v)] becomes dominated by P for all mating systems. In fact, as n [right arrow] [infinity] the effective size tends to [N.sub.e(v)] [right arrow] 2P(2 - s)/[(1 + s).sup.2] = P/(2[[Theta].sub.m]) (from Eq. [7] or [9]), where [[Theta].sub.m] = [(1 + s).sup.2]/[4(2 - s)] is the coancestry among sibs within maternal families for equilibrium populations (Cockerham and Weir, 1984). For s = 0.0, 0.5, and 1.0, this coancestry is [[Theta].sub.m] = 1/8, 3/8, and 1.0, respectively, and the corresponding [N.sub.e(v)] limits are [N.sub.e(v)] = 4P, (4/3)P, and 0.5P. On the other hand, when only one seed is taken per parent (n/P = 1) the effective numbers obtained from Eq. [9] for s = 0.0, 0.5, and 1 are [N.sub.e(v)] = P, (3/4)P, and 0.5P, respectively. Thus, for a fixed P, it is possible to increase [N.sub.e(v)] by increasing n/P (n/P = 1 versus n/P [right arrow] [infinity]) when s [is less than] 1. However, for s = 1, increasing n/P does not increase [N.sub.e(v)]. Hence, in germplasm collection, the values of [N.sub.e(v)] depend primarily on the number of parents (P) but also on the number of seeds collected per parent (n/P) when s [is less than] 1.
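The limiting values quoted above can be verified with exact rational arithmetic. This Python sketch is illustrative: it takes the large-n limit as 2P(2 - s)/(1 + s)^2, the maternal-family coancestry as (1 + s)^2/[4(2 - s)] (the form that reproduces the quoted values 1/8, 3/8, and 1), and the GC collection expression as Eq. [5] with u = 0 and an unbounded pollen pool. Function names are hypothetical.

```python
from fractions import Fraction


def theta_m(s):
    """Coancestry among sibs within maternal families at equilibrium."""
    return (1 + s) ** 2 / (4 * (2 - s))


def ne_limit(P, s):
    """Limit of Ne(v) as n grows without bound: 2P(2 - s)/(1 + s)^2."""
    return 2 * P * (2 - s) / (1 + s) ** 2


def ne_gc_collection(n, P, s):
    """Ne(v) for GC, inbreeding equilibrium, infinite reference population."""
    return 2 * n * (2 - s) / ((1 + s) ** 2 * n / P + (3 + s) * (1 - s))


if __name__ == "__main__":
    P = Fraction(1)  # report results per seed parent
    for s in (Fraction(0), Fraction(1, 2), Fraction(1)):
        print(f"s = {s}: theta_m = {theta_m(s)}, "
              f"limit = {ne_limit(P, s)} P, "
              f"one seed per parent = {ne_gc_collection(P, P, s)} P")
```

Working per seed parent (P = 1) keeps the output in units of P, so the printed limits can be compared directly with 4P, (4/3)P, and 0.5P.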

Note that for germplasm collection from an infinite base population (u [congruent] 0), [N.sub.e(v)] from Eq. [7] and [9] is equivalent to [N.sub.e(v)] expressed as a function of intraclass correlations (Appendix 2) as given by Cockerham (1969).

Germplasm Regeneration

With N individuals of a given accession planted in the field, all potentially producing both male and female gametes, the reference population is of finite size. For the regeneration process, it may be assumed that only a fraction u (0 [is less than] u [is less than or equal to] 1) of these N individuals contribute gametes to the next generation, such that P = M = uN and u = v. This condition may simulate the loss of seeds of the original accession because of germination problems or any other factor affecting the viability of plants in the field. Some situations with practical applications will be discussed.

Inbreeding Equilibrium (IE) and Arbitrary Sample Size (n)

As already mentioned, when seeds are produced without interference in the regeneration process and the natural rate of self-fertilization remains unchanged, the assumption of inbreeding equilibrium is reasonable. Consequently, for the P = M = uN scheme and seeds sampled randomly (RS), Eq. [3] reduces to

[10] $N_{e(v)} = \dfrac{n(2 - s)}{2n(1 - u)/P - [0.5s(1 + s) + 1]/P + 2}$.

Similarly, for P = M = uN and female gametic control (GC) Eq. [5] becomes

[11] $N_{e(v)} = \dfrac{2n(2 - s)}{4n(1 - u)/P - (1 - s)/P + (3 + s)(1 - s)}$.

More optimistically, it may be assumed that the integrity of the original accession is maintained such that M = N (0 [is less than or equal to] s [is less than] 1) or v = 1, but seeds are collected from P = uN plants only. Now, with random sampling of seeds (RS) Eq. [3] leads to

[12] $N_{e(v)} = \dfrac{n(2 - s)}{(1 + s)^2\,[n(1 - u) - 1]/(2P) - (1 - s)/(2N) + 2}$

and with female gametic control (GC) Eq. [5] becomes

[13] $N_{e(v)} = \dfrac{2n(2 - s)}{(1 + s)^2\,n(1 - u)/P - (1 - s)/N + (3 + s)(1 - s)}$.

In the process of seed regeneration, with preclusion of hand pollinations, rate s is inherent to a given accession. Since accession size N is also given in advance, only sample size (n) and proportions u and v can be manipulated or monitored in a regeneration plan.

Examining Eq. [10] through [13], it is seen that [N.sub.e(v)] increases linearly with sample size (n) when an accession is regenerated prior to any loss of viability and all original plants (N) can potentially contribute female and male gametes to the offspring (u = 1). In addition, for autogamic species (s = 1) and also u = 1, [N.sub.e(v)] = [infinity] with GC, for any sample size (n).

For the P = M = uN scheme (Eq. [10] and [11]), allowing n to become large enough (n [right arrow] [infinity]), the effective size at this limit is [N.sub.e(v)] = 0.5N(2 - s)u/(1 - u) for both sampling procedures (RS and GC). Hence, even when the quantity of regenerated seeds is very large, [N.sub.e(v)] can be small and much less than the original census number N, the determining factor being the proportion of functional parents u or, more precisely, the ratio u/(1 - u). This ratio varies from [approximately equals] 0 to infinity, for 0 [is less than] u [is less than or equal to] 1, being, for instance, equal to 1 and 9 for u = 0.5 and u = 0.9. A nine-fold gain in representativeness of the regenerated seeds is, therefore, expected in this example (u = 0.9 vs. u = 0.5). Because in many situations u can be equated with the germination rate of an accession, results point clearly toward the extreme importance of monitoring this rate in the management of germplasm banks. In terms of [N.sub.e(v)], the reward for maintaining the proportion u at adequate levels is substantial and greater than might be initially expected.

For the M = N and P = uN scheme, u is the proportion of plants chosen as seed parents. From Eq. [12] and [13] the upper limit of effective size, as n increases (n [right arrow] [infinity]), is now [N.sub.e(v)] = 2N(2 - s)[u/(1 - u)]/[(1 + s).sup.2], which is higher than that expected from Eq. [10] and [11] (for 0 [is less than or equal to] s [is less than] 1) because of the larger pollen pool available when M = N. However, the ratio u/(1 - u) plays the same role as that discussed for the P = M = uN scheme.
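The two large-sample limits for regeneration are simple enough to tabulate. The Python sketch below codes them as stated in the text; the function name and the `pollen_from_all` flag (True for the M = N scheme, False for P = M = uN) are illustrative.

```python
def ne_limit_regen(N, u, s, pollen_from_all=False):
    """Upper limit of Ne(v) in regeneration as n grows without bound.

    pollen_from_all=False: P = M = uN, limit 0.5 N (2 - s) u / (1 - u).
    pollen_from_all=True:  P = uN, M = N, limit 2 N (2 - s) [u / (1 - u)] / (1 + s)^2.
    """
    if not 0.0 < u < 1.0:
        raise ValueError("the finite limit requires 0 < u < 1")
    ratio = u / (1.0 - u)
    if pollen_from_all:
        return 2.0 * N * (2.0 - s) * ratio / (1.0 + s) ** 2
    return 0.5 * N * (2.0 - s) * ratio


if __name__ == "__main__":
    N = 100
    for u in (0.5, 0.9):
        for s in (0.0, 0.5, 1.0):
            same = ne_limit_regen(N, u, s)
            full = ne_limit_regen(N, u, s, pollen_from_all=True)
            print(f"u = {u}, s = {s}: P=M=uN limit {same:.1f}, M=N limit {full:.1f}")
```

For any s, moving from u = 0.5 to u = 0.9 multiplies either limit by nine, and for s = 1 the two schemes coincide because no pollen is exchanged between plants.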

Equations [10] through [13] can be applied to answer questions that may arise in the management of germplasm banks. Consider, for instance, an accession originally containing N seeds. If this material is to be regenerated only when the germination rate drops to 80%, what sample size (n) would be needed to assure that effective size is maintained at least at a level of [N.sub.e(v)] = N? This will depend on the natural rate of self-fertilization and the sampling procedure. Assuming inbreeding equilibrium and female gametic control, and taking u = v = 0.8; P = M = 0.8N, with [N.sub.e(v)] = N, Eq. [11] furnishes n = N - 0.42 [congruent] N, for panmictic species (s = 0). A sample of n = N offspring has an effective size of [N.sub.e(v)] = N despite the loss of 20% of seeds of the original accession. This loss in the parental generation is being compensated for by female gametic control. Taking seeds at random from the P = 0.8N parents would require a larger sample, as given by Eq. [10], namely n =(4/3)N - 0.83 [congruent] 1.33N (for s = 0) for attaining [N.sub.e(v)] = N.

Considering a mixed self and random mating species, the same situation discussed above would require n = 0.875N - 0.31 [congruent] 0.875N for s = 0.5 and female gametic control. Hence, the mixed mating system (intermediate s) favors regeneration, as a correspondingly smaller sample would be necessary for maintaining [N.sub.e(v)] = N. Under s = 1 the value given by Eq. [11] is [N.sub.e(v)] = 2N, regardless of sample size n. This last outcome, which is relative to a polymorphic accession of an autogamic species, shows that effective size is always kept at [N.sub.e(v)] = 2N, even when the germination rate is reduced to 80% in the original accession. However, the condition is the practice of female gametic control and the collection of at least one seed from each one of the P = 0.8N parental plants. Note that, in this case, what is assumed is the regeneration of a polymorphic (heterogeneous) accession with complete natural self-fertilization (e.g., land race of an autogamic species with s = 1). This situation is entirely different from regeneration of pure lines (homogeneous accession, with s = 1) where the theory and application of the [N.sub.e(v)] measure is not necessary.
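The sample sizes quoted in this worked example can be recovered by solving for n. The Python sketch below assumes the P = M = uN forms of the regeneration equations: the GC denominator is Eq. [5] with u = v, and the RS denominator is the corresponding random-sampling form, 2n(1 - u)/P - [0.5s(1 + s) + 1]/P + 2. Both are linear in n, so the required n has a closed form. Function names and the choice N = 100 are illustrative.

```python
def ne_gc_regen(n, N, u, s):
    """Ne(v) for GC with P = M = uN under inbreeding equilibrium."""
    P = u * N
    denom = 4 * n * (1 - u) / P - (1 - s) / P + (3 + s) * (1 - s)
    return 2 * n * (2 - s) / denom


def required_n_gc(N, u, s, target):
    """Sample size n at which Ne(v) equals target under GC (P = M = uN)."""
    P = u * N
    slope = 2 * (2 - s) - 4 * target * (1 - u) / P
    if slope <= 0:
        raise ValueError("target exceeds the large-n limit of Ne(v)")
    return target * (1 - s) * ((3 + s) - 1 / P) / slope


def required_n_rs(N, u, s, target):
    """Sample size n at which Ne(v) equals target under RS (P = M = uN)."""
    P = u * N
    slope = (2 - s) - 2 * target * (1 - u) / P
    if slope <= 0:
        raise ValueError("target exceeds the large-n limit of Ne(v)")
    return target * (2 - (0.5 * s * (1 + s) + 1) / P) / slope


if __name__ == "__main__":
    N, u = 100, 0.8
    print("GC, s = 0.0:", round(required_n_gc(N, u, 0.0, N), 2))   # N - 0.42
    print("RS, s = 0.0:", round(required_n_rs(N, u, 0.0, N), 2))   # (4/3)N - 0.83
    print("GC, s = 0.5:", round(required_n_gc(N, u, 0.5, N), 2))   # 0.875N - 0.31
    print("GC, s = 1.0, any n:", round(ne_gc_regen(50, N, u, 1.0), 1))  # 2N
```

With N = 100 the first three lines print 99.58, 132.5, and 87.19, and the last line shows that for s = 1 the effective size stays at 2N whatever n is chosen.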

Inbreeding Equilibrium (IE) and Constant Population Size (n = N)

In the maintenance of germplasm bank accessions through regeneration, the census number of the regenerated sample of seeds (n) may remain the same as the original number (N) for a given accession. That is, population size is kept constant from one generation to the next (n = N).

With n = N, expressions adapted for regeneration become simpler. Table 3 shows four such expressions (Eq. [14] through [17]) that assume inbreeding equilibrium. Equation [14] is obtained from Eq. [11] and the negative term from the denominator -(1 - s)/P has been ignored. Equation [15] is derived from Eq. [13] and the negligible negative term from the denominator that has been ignored is -(1 - s)/N. Equation [16] is calculated from Eq. [10] and the neglected term is -[0.5s(1 + s) + 1]/P. Equation [17] is from Eq. [12] and the neglected term is -[[(1 + s).sup.2]/(2P) + (1 - s)/(2N)]. The value of each neglected term is near zero for large values of P and/or N that may be expected in practice. In Eq. [14] through [17], [N.sub.e(v)] depends on three parameters: the natural rate of self-fertilization (s), accession size (N), and the proportion of functional parents (u) contributing gametes to the seeds.

[TABULAR DATA 3 NOT REPRODUCIBLE IN ASCII]

Results of evaluating Eq. [14] through [17] for n = N = 100 and some values of u and s are given in Table 4. For several combinations of parameters (s and u), effective size is smaller than the census number. As expected, GC has a positive effect on [N.sub.e(v)], and its efficiency becomes more pronounced as s increases, especially under higher u values. For high values of u (u = 0.9) and GC, [N.sub.e(v)] increases as s increases; that is, the more autogamic an accession becomes, the less effort is required for its regeneration because its [N.sub.e(v)] can be kept high (from Eq. [14], [N.sub.e(v)] = 116, 137, 199, and 450 for s = 0.0, 0.5, 0.8, and 1.0, respectively). On the contrary, when u is low (u = 0.5) and with GC, [N.sub.e(v)] decreases as s increases; that is, the more autogamic an accession becomes, the more effort is required for its regeneration. For example, Eq. [14] gives [N.sub.e(v)] = 57, 52, 50, and 50 for s = 0.0, 0.5, 0.8, and 1.0, respectively. Table 4 also shows that for high values of u (u [approximately equals] 1) and GC, panmictic or nearly panmictic species (s [approximately equals] 0) attain much lower values of [N.sub.e(v)] than those expected for more autogamic species. However, panmictic species are less affected when accessions deteriorate drastically (low values of u; e.g., u = 0.5).

Table 4. Effective population size, [N.sub.e(v)], for germplasm regeneration with constant population size (n = N) and inbreeding equilibrium [f = s/(2 - s)] for arbitrary rate of self-fertilization (0 [is less than or equal to] s [is less than or equal to] 1) and proportion of effective parents (0 < u [is less than or equal to] 1). Sample size n = N = 100, with random sampling (RS), female gametic control (GC), and two mating schemes: seed parents P = Nu and pollen parents M = Nu and seed parents P = Nu and pollen parents M = N.
```
Seed parents P = Nu; pollen parents M = Nu

                              [N.sub.e(v)]
 s      f      u      RS([dagger])   GC([double dagger])

0.0    0.00    0.5         50              57
0.0    0.00    0.7         70              85
0.0    0.00    0.9         90             116
0.5    0.33    0.5         38              52
0.5    0.33    0.7         53              87
0.5    0.33    0.9         68             137
0.8    0.67    0.5         30              50
0.8    0.67    0.7         42              97
0.8    0.67    0.9         54             199
1.0    1.00    0.5         25              50
1.0    1.00    0.7         35             117
1.0    1.00    0.9         45             450

Seed parents P = Nu; pollen parents M = N

                              [N.sub.e(v)]
 s      f      u      RS([sections])  GC([paragraph])

0.0    0.00    0.5         80             100
0.0    0.00    0.7         90             117
0.0    0.00    0.9         97             129
0.5    0.33    0.5         48              75
0.5    0.33    0.7         60             111
0.5    0.33    0.9         71             150
0.8    0.67    0.5         33              60
0.8    0.67    0.7         45             112
0.8    0.67    0.9         55             214
1.0    1.00    0.5         25              50
1.0    1.00    0.7         35             117
1.0    1.00    0.9         45             450
```

([dagger]) Estimates of [N.sub.e(v)] were computed using Eq. [16].

([double dagger]) Estimates of [N.sub.e(v)] were computed using Eq. [14].

([sections]) Estimates of [N.sub.e(v)] were computed using Eq. [17].

([paragraph]) Estimates of [N.sub.e(v)] were computed using Eq. [15].
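The entries of Table 4 can be checked numerically. The sketch below is an illustration only: Table 3 is not reproduced in this copy, so the four large-N closed forms used here are assumptions, inferred from the tabulated values and from the limits and u* values quoted in the text (inbreeding equilibrium, n = N). The function names are hypothetical, and the output should agree with Table 4 only up to rounding.

```python
# Large-N closed forms ASSUMED for illustration (inbreeding equilibrium, n = N).
# They were inferred from the published Table 4 values, not copied from Table 3.

def ne_rs_same(N, u, s):
    """Random sampling, P = M = Nu (the role played by Eq. [16])."""
    return N * u * (1 - 0.5 * s)

def ne_gc_same(N, u, s):
    """Female gametic control, P = M = Nu (role of Eq. [14]); undefined at u = s = 1."""
    return 2 * N * u * (2 - s) / (4 - u * (1 + s) ** 2)

def ne_rs_full_pollen(N, u, s):
    """Random sampling, P = Nu and M = N (role of Eq. [17])."""
    return 2 * N * u * (2 - s) / ((1 + s) ** 2 + u * (3 + s) * (1 - s))

def ne_gc_full_pollen(N, u, s):
    """Female gametic control, P = Nu and M = N (role of Eq. [15]); undefined at u = s = 1."""
    return 2 * N * u * (2 - s) / ((1 + s) ** 2 + 2 * u * (2 - (1 + s) ** 2))

N = 100
print(" s    u   RS(P=M=Nu)  GC(P=M=Nu)  RS(M=N)  GC(M=N)")
for s in (0.0, 0.5, 0.8, 1.0):
    for u in (0.5, 0.7, 0.9):
        row = (ne_rs_same(N, u, s), ne_gc_same(N, u, s),
               ne_rs_full_pollen(N, u, s), ne_gc_full_pollen(N, u, s))
        print(f"{s:.1f}  {u:.1f}  " + "  ".join(f"{x:9.1f}" for x in row))
```

Because the neglected O(1/P) and O(1/N) terms are dropped, small differences from the exact equations are to be expected for small accessions.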

Results in Table 4, although drawn from a specific case, show again that choosing or maintaining proportion u at an adequately high level and adopting gametic control is a very effective combination for keeping [N.sub.e(v)] values at high levels, in the management of germplasm banks. In fact, [N.sub.e(v)] can be made equal to or larger than n within a reasonable range of situations. On the other hand, with seeds sampled randomly (RS), [N.sub.e(v)] is smaller than n in all cases, and decreases as the rate of autogamy increases.

Because certain degrees of accession deterioration or loss can be recovered by gametic control, searching for a range of acceptable u values is meaningful. Consider the mating scheme where parents are P = M = Nu such that u can be taken as the germination rate. Now, if apart from the condition of constant population size, it is also required that [N.sub.e(v)] = N = n, Eq. [14] can be solved for u, giving u* = 4/([s.sup.2] + 5) under GC. An evaluation of this quantity gives u* = 0.80, 0.76, and 0.67 for s = 0.0, 0.5, and 1.0, respectively. Hence, a loss of 20% of seeds within accessions seems to be acceptable for species in general if a standard procedure of female gametic control is adopted for regeneration. Accessions of monoecious species can also be regenerated through both female and male gametic control (plant-to-plant hand pollinations and an equal number of seeds taken per plant) (Crossa and Vencovsky, 1994). Now, with n = N and [N.sub.e(v)] = n, the resulting proportion is u* = 0.67, as for self-fertilizing species. Note that for strictly autogamic species, female gametic control implies control over both types of gametes. Another way to recover losses within accessions is through increments in sample size (n [is greater than] N). As discussed in the preceding section, however, there are limits to this strategy.
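As a quick numerical check of the threshold quoted above for the P = M = Nu scheme with GC, the following minimal sketch evaluates u* as 4 divided by (s squared plus 5). The function name is hypothetical.

```python
def u_star_same_parents(s):
    """Threshold proportion of functional parents for N_e(v) = N under GC with P = M = Nu."""
    return 4.0 / (s ** 2 + 5.0)

for s in (0.0, 0.5, 1.0):
    u_star = u_star_same_parents(s)
    # 1 - u* is the largest within-accession loss that still keeps N_e(v) >= N
    print(f"s = {s:.1f}: u* = {u_star:.2f}, tolerable loss = {1 - u_star:.2f}")
```

The tolerable loss is smallest for s = 0, which is why a loss of 20% is the bound that holds across mating systems.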

The alternative mating scheme in which P = uN but M = N (Eq. [15] and [17]) is considered here for the purpose of investigating how the sampling of seed parents (P out of N) can affect effective size over the range of s values. Since, by assumption, no loss has taken place in the original accession, the potential contribution of the pollen pool is not negligible and is maximized when s = 0. However, Table 4 shows that when seeds are sampled randomly, maintaining [N.sub.e(v)] = n requires very high u values, even for this extreme case (s = 0). If, for instance, only 10% of the seed parents are discarded during seed regeneration (u = 0.9), the effective number will be [N.sub.e(v)] = 97 for a sample of 100 seeds. These numbers are considerably reduced as s increases and the contribution of the male gametic pool diminishes. It is only with female gametic control that a certain quantity of seed parents can be neglected during seed harvesting in the regeneration process. In fact, with the requirement that [N.sub.e(v)] = n, and with constant population size, Eq. [15] furnishes u* = 0.5[(1 + s).sup.2]/[[(1 + s).sup.2] - s] (if (1 - s)/N [approximately equals] 0). For s = 0.0, 0.5, 0.8, and 1.0 this proportion is u* = 0.5, 0.64, 0.66, and 0.67, respectively. Hence, under this specific mating structure (P = uN and M = N) and with female gametic control, discarding at most 1/3 of the seed parents when seeds are harvested will maintain [N.sub.e(v)] as large as or greater than the census number, for species in general.
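The threshold for the P = uN, M = N scheme can be evaluated in the same way. This is a minimal sketch of the expression quoted above, with the (1 - s)/N term taken as zero; the function name is hypothetical.

```python
def u_star_full_pollen(s):
    """Threshold proportion of seed parents for N_e(v) = n under GC with P = uN, M = N."""
    q = (1.0 + s) ** 2
    return 0.5 * q / (q - s)

for s in (0.0, 0.5, 0.8, 1.0):
    print(f"s = {s:.1f}: u* = {u_star_full_pollen(s):.2f}")
```

Over the whole range of s the threshold never exceeds 2/3, which is the basis for the statement that at most one third of the seed parents may be discarded.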

As already mentioned, these two mating schemes, P = uN and M = N versus P = M = uN, are the same when the species is autogamic. Therefore, Eq. [14] and [15] are equivalent when s = 1 as are Eq. [16] and [17] (Table 4).

[N.sub.e(v)] for Germplasm Regeneration through Selfing

Selfing can be an option for germplasm regeneration for species that naturally self-fertilize and, more generally, when field isolation in time or space is not possible. For species with high rates of cross-fertilization, this alternative has one obvious and serious restriction, namely inbreeding depression, although it can still be recommended in such cases for retaining useful alleles carried by specific accessions.

With artificial selfing, [N.sub.e(v)] can be computed by considering that in a base population (accession) of size N, P = Nu plants are selfed and n seeds are subsequently collected. Assuming, for simplicity, a constant population size (n = N) and that an equal number of seeds is collected in each [S.sub.1] progeny, Eq. [4] can be applied, considering s = 1. The resulting expression is then

[18] [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII].

As u approaches 1, [N.sub.e(v)] (Eq. [18]) tends to 2N, 4N, and [infinity] for f = 0.0, 0.5, and 1.0, respectively. For comparison with previous results, consider a reference accession under Hardy-Weinberg equilibrium (f = 0). Equation [18] now simplifies to [N.sub.e(v)] = 2Nu/(2 - u). This now refers to n = N [S.sub.1] offspring and is identical to that shown by Crossa and Vencovsky (1994, Table 1) for biparental crosses and gametic control. This equivalence is expected when effective number is derived on the basis of gene frequency variations due to random sampling. With selfing and collecting progenies of equal size, an equal number of both female and male gametes is contributed by parents to the selfed offspring, as with full-sib families of equal size. Hence, when control is exerted on both types of gametes, the genotypic constitution of the offspring generation is irrelevant for measuring [N.sub.e(v)].

A particular feature of Eq. [18] is detected when the condition [N.sub.e(v)] = n is imposed. This condition is met when u = 0.67, meaning that selfing 0.67N or more plants of an accession, regardless of the mating system of the species, will assure that [N.sub.e(v)] [is greater than or equal to] N at the offspring generation when population size is kept constant (n = N) and an equal number of seeds is collected per family.
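For the Hardy-Weinberg reference accession (f = 0), this threshold follows directly from the simplified form of Eq. [18] quoted earlier. The short derivation below covers only that special case; the text states that the same threshold holds for other values of f.

```latex
\[
N_{e(v)} = \frac{2Nu}{2-u} = N
\;\Longrightarrow\; 2u = 2 - u
\;\Longrightarrow\; u = \tfrac{2}{3} \approx 0.67 .
\]
```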

The effectiveness of regeneration through artificial selfing may be questioned for species with a high rate of natural self-fertilization. This suggests comparing this alternative (Eq. [18]) with regeneration by natural reproduction (NR) or open pollination. Expressions [18] and [14] can be used for insight, recalling that both equations assume constant population size and gametic control. Selfing is expected to be superior to the NR alternative. This advantage is pronounced for higher values of u but becomes negligible as u decreases (Table 5). It is noteworthy that, under inbreeding equilibrium, a value of f = 0.9, for instance, corresponds to a natural rate of self-fertilization of s = 2f/(1 + f) = 0.947; even at such a high rate of natural selfing, selfing accessions is expected to be a better option than allowing plants to reproduce naturally for regeneration under constant population size. However, decisions regarding this alternative will primarily depend on anticipated depression from selfing, the viability of the procedure, and costs incurred.

Table 5. [N.sub.e(v)]/N ratio for germplasm regeneration through natural reproduction (NR), assuming inbreeding equilibrium [f = s/(2 - s) referring to the parental generation], or selfing ([S.sub.1]), with constant population size (n = N), gametic control, and varying proportions of effective parents (u). Offspring from Nu parents.
```
                      u = 0.20                              u = 0.50
 f      [S.sub.1]([dagger])   NR([double dagger])     [S.sub.1]      NR

0.7          0.144                 0.141                0.540       0.503
0.8          0.137                 0.135                0.526       0.501
0.9          0.131                 0.130                0.513       0.500
1.0          0.125                 0.125                0.500       0.500

                      u = 0.67                              u = 0.80
 f      [S.sub.1]                  NR                 [S.sub.1]      NR

0.7          1.00                  0.880                1.74        1.40
0.8          1.00                  0.914                1.82        1.55
0.9          1.00                  0.954                1.90        1.74
1.0          1.00                  1.00                 2.00        2.00

                      u = 1.00
 f      [S.sub.1]                  NR

0.7          6.67                  3.49
0.8         10.00                  5.14
0.9         20.00                 10.14
1.0       [infinity]            [infinity]
```

([dagger]) For selfing, estimates of [N.sub.e(v)] were computed using Eq. [18].

([double dagger]) For natural reproduction, estimates of [N.sub.e(v)] were computed using Eq. [14].
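The ratios in Table 5 can be approximated with the sketch below. Both closed forms are assumptions: Eq. [18] and Eq. [14] are not reproduced in this copy, so the expressions were inferred from the limits stated in the text (2N, 4N, and infinity as u approaches 1; threshold u = 0.67) and from the tabulated values. Function names are hypothetical, and agreement with the table is expected only to about two decimals.

```python
import math

def selfing_ratio(u, f):
    """N_e(v)/N under artificial selfing (role of Eq. [18]); ASSUMED closed form."""
    den = 2 * (1 + f) - u * (1 + 3 * f)
    return math.inf if den <= 1e-12 else 2 * u / den

def natural_ratio(u, f):
    """N_e(v)/N under natural reproduction with GC (role of Eq. [14]); ASSUMED closed form."""
    s = 2 * f / (1 + f)  # inbreeding equilibrium, inverted from f = s/(2 - s)
    den = 4 - u * (1 + s) ** 2
    return math.inf if den <= 1e-12 else 2 * u * (2 - s) / den

for f in (0.7, 0.8, 0.9, 1.0):
    cells = []
    for u in (0.20, 0.50, 2 / 3, 0.80, 1.00):
        cells.append(f"{selfing_ratio(u, f):7.3f} {natural_ratio(u, f):7.3f}")
    # each cell holds the S1 ratio followed by the NR ratio for one value of u
    print(f"f = {f:.1f}: " + " | ".join(cells))
```

The column headed u = 0.67 in the table is evaluated here at u = 2/3, where the selfing ratio equals 1 for every f.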

DISCUSSION

In germplasm collection from reference populations of infinite size, it was seen that effective size decreases as the rate of natural self-fertilization increases (Table 2), regardless of the type of gametic control adopted. The additional effort required in seed collection when s [is greater than] 0, as compared with the expected [N.sub.e(v)] obtained when s = 0, can be investigated. Consider two sets of n seeds stemming from [P.sub.0] and [P.sub.s] seed parents belonging to populations with s = 0 and s [is greater than] 0, respectively. Equating the corresponding [N.sub.e(v)] values, the ratio [P.sub.s]/[P.sub.0] can be taken as a measure of this additional effort, in terms of the number of seed parents to be sampled. Allowing n [right arrow] [infinity], Eq. [7] and [9] lead to [P.sub.s]/[P.sub.0] [right arrow] 2[(1 + s).sup.2]/(2 - s) = 8[[Theta].sub.m]. This ratio is not affected by the level of inbreeding of the collected seeds because, for sufficiently large n, [N.sub.e(v)] is dominated by the within-family coancestry ([[Theta].sub.m]). Assuming s = 0.5 and s = 1, [P.sub.s]/[P.sub.0] = 3 and 8, respectively. This result indicates that to attain the same effective size as expected for a panmictic species (s = 0), three or eight times more seed parents would be necessary in the sampling process for these two levels of s, respectively.
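The limiting ratio discussed above is simple to tabulate. The sketch below evaluates the large-n expression for the ratio of seed parents, together with the within-family coancestry implied by equating that ratio to eight times the coancestry. Function names are hypothetical.

```python
def seed_parent_ratio(s):
    """Large-n limit of P_s/P_0: seed parents needed relative to a panmictic species."""
    return 2.0 * (1.0 + s) ** 2 / (2.0 - s)

def within_family_coancestry(s):
    """Theta_m implied by the identity P_s/P_0 = 8 * Theta_m."""
    return seed_parent_ratio(s) / 8.0

for s in (0.0, 0.25, 0.5, 0.75, 1.0):
    print(f"s = {s:.2f}: P_s/P_0 = {seed_parent_ratio(s):.2f}, "
          f"theta_m = {within_family_coancestry(s):.3f}")
```

At s = 0 the coancestry is 1/8, the value for half-sib families, and at s = 1 it reaches 1, which is why the required number of seed parents grows so quickly with selfing.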

In practice, it is not always reasonable to assume a very large n or a high number of seeds per parent (n/P). Hence consider, alternatively, that the set of n seeds is sampled to maximize the number of parents, such that n/P = 1. In this circumstance the family structure of the seeds is avoided because a single seed is sampled from each seed parent. Now Eq. [9] reduces to

[N.sub.e(v)] = 2n(2 - s)/[[(1 + s).sup.2] + (3 + s)(1 - s)] = n(1 - 0.5s)

since n/P = 1. With inbreeding equilibrium (IE), this expression can also be written as [N.sub.e(v)] = n/(1 + f) = [P.sub.s]/(1 + f) and, consequently, [P.sub.s]/[P.sub.0] = 1 + f. Because seeds are now assumed unrelated, this ratio is solely a function of the inbreeding coefficient, with a limit value of 2 for f = 1. The actual value of the [P.sub.s]/[P.sub.0] ratio, therefore, lies between 8[[Theta].sub.m] and 1 + f, depending on the structure of the sample or, basically, on the mean number n/P.
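The algebra behind these two equivalent forms is short. The worked steps below expand the denominator for n/P = 1 and then substitute the equilibrium value f = s/(2 - s).

```latex
\begin{align*}
(1+s)^2 + (3+s)(1-s) &= (1 + 2s + s^2) + (3 - 2s - s^2) = 4,\\
N_{e(v)} &= \frac{2n(2-s)}{4} = n\,(1 - 0.5s),\\
1 + f &= 1 + \frac{s}{2-s} = \frac{2}{2-s}
\quad\Longrightarrow\quad
\frac{n}{1+f} = \frac{n(2-s)}{2} = n\,(1 - 0.5s).
\end{align*}
```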

The expression [N.sub.e(v)] = n(1 - 0.5s), or its equivalent [N.sub.e(v)] = n/(1 + f), can be derived from Cockerham's (1969) intraclass correlation approach (Appendix 2) by assuming m = 1, P = n, and inbreeding equilibrium. It is applicable, however, only under very specific circumstances, as can be shown by imposing further restrictions on some of the more general equations developed in this study. To illustrate this, the following three cases are examined.

Case 1

Germplasm is collected from an infinite base population and a random sample of n seeds is obtained from a potentially large number of seed parents. In this case, if P [right arrow] [infinity] and M [right arrow] [infinity] (which excludes populations with s = 1 for which M = P; M, however, now becomes negligible since 1 - s = 0), Eq. [3] can be written, under inbreeding equilibrium, as

[N.sub.e(v)] = n(2 - s)/2 = n(1 - 0.5s) = n/(1 + f).

Case 2

Similar to Case 1, but assuming that only M [right arrow] [infinity], while P is finite. With u [congruent] 0, Eq. [5] becomes

[N.sub.e(v)] [right arrow] 2n(2 - s)/[[(1 + s).sup.2] + (3 + s)(1 - s)]

= n(1 - 0.5s)

if a single seed is sampled per parent, or if n/P = 1, as already discussed.

Case 3

Consider regeneration of germplasm. A random sample of n seeds is taken from a finite reference population and population size is kept constant (n = N). Now, with P = M = uN scheme (Table 3, Eq. [16]), [N.sub.e(v)] [congruent] nu(1 - 0.5s), which is equivalent to the previous equations (Cases 1 and 2) when no losses took place in the original accession (u = 1).

It is interesting to note that under these specific conditions an equivalent amount of change in gene frequency due to drift is expected to occur in both germplasm collection and accession regeneration. In other words, [N.sub.e(v)] attained when a sample is regenerated assuming u = 1 and random sampling (Case 3) is the same as that expected when germplasm is collected and seeds are sampled from an infinitely large number of seed and pollen parents (Case 1), or when sampling of seeds is with female gamete control, the set of seed parents (P) is finite, and a single seed is taken from each one (Case 2). However, despite the different applications of [N.sub.e(v)] = n(1 - 0.5s) = n/(1 + f), it should be used only under the conditions previously discussed. Equations developed in this paper, for the two-stage mixed self and random mating model are more general and have sufficient flexibility to cover a wider array of situations.

Plants from a population with a mixed mating system will have different levels of inbreeding (Wright and Cockerham, 1985). A given plant x, having t generations of natural selfing in its pedigree, will have [f.sub.x] = 1 - [(1/2).sup.t], and its occurrence in the population is given by the probability [P.sub.x] = (1 - s)[s.sup.t]. Even with constant s for all plants, as assumed, [f.sub.x] will vary among individuals with variance [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] (Cockerham and Weir, 1984). With s = 0.5 this variance is [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] with a standard deviation of [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII], which are quite large values. Within such a population, heterogeneity of the fixation index will certainly affect the actual amount of drift expected in a sample of seeds taken from a specific set of P seed parents (especially when P is small). Equations derived in this study for measuring [N.sub.e(v)], therefore, should be taken as expected values with a reliability that increases as P becomes larger.
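The spread of individual inbreeding coefficients can be explored numerically from the two quantities defined above. The sketch below sums the distribution directly and compares the result with a closed form obtained here by summing the geometric series; that closed form is an assumption of this illustration and is not claimed to be the expression printed in the original article, which is not recoverable from this copy. Function names are hypothetical, and the code assumes 0 <= s < 1.

```python
import math

def inbreeding_moments(s, max_t=2000):
    """Mean and variance of f_x = 1 - (1/2)**t when P(t) = (1 - s) * s**t, for 0 <= s < 1."""
    mean = 0.0
    second = 0.0
    for t in range(max_t):
        p = (1.0 - s) * s ** t   # probability of t generations of selfing
        f = 1.0 - 0.5 ** t       # inbreeding coefficient after t generations
        mean += p * f
        second += p * f * f
    return mean, second - mean ** 2

def variance_closed_form(s):
    """Geometric-series result derived for this sketch (an assumption, see lead-in)."""
    return 4.0 * s * (1.0 - s) / ((4.0 - s) * (2.0 - s) ** 2)

s = 0.5
mean_f, var_f = inbreeding_moments(s)
print(f"mean f = {mean_f:.4f} (equilibrium s/(2 - s) = {s / (2 - s):.4f})")
print(f"variance = {var_f:.4f}, standard deviation = {math.sqrt(var_f):.4f}")
print(f"closed form variance = {variance_closed_form(s):.4f}")
```

The mean recovers the equilibrium value f = s/(2 - s), which is a useful consistency check on the summation.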

Genetic markers have been used extensively to study the mating systems of species. This allows estimation of parameters that are required for calculating [N.sub.e(v)] in connection with conservation and preservation activities. In the estimation process, sampling errors inherent to s, f, or other quantities will affect the precision of computed [N.sub.e(v)] values. Data provided by Santos (1991) on oil palm (Elaeis guineensis Jacq.) are used here to illustrate this point. These data refer to a set of n = 97 trees, randomly sampled from a large local population and genotyped for 9 isozyme loci. Estimates of f were inconsistent among loci, ranging from f = -0.069 to f = 0.381, with an average, pooled over loci, of f = 0.145. This last estimate leads to [N.sub.e(v)] = 97/(1 + 0.145) = 84.7, for the sample of n = 97 trees. This estimate is valid only if it can be assumed that all sampled trees are genetically unrelated. The error associated with f, using bootstrapping over loci, is [[Sigma].sub.f] = 0.041, giving 95% confidence limits for [N.sub.e(v)] of 79.3 and 91.5. The heterogeneity of f among loci is responsible for this relatively large confidence interval. Thus, it is evident that estimates of f based on individual loci are not very informative, and that for more precise estimation of [N.sub.e(v)], a larger number of loci would be needed.
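The point estimate in the oil palm example is a direct application of the n/(1 + f) form. The sketch below reproduces it and adds a first-order (delta-method) normal interval as a rough cross-check. That interval is an assumption of this illustration: the article's limits came from bootstrapping over loci, so the two need not coincide exactly.

```python
n_trees = 97     # sampled trees
f_hat = 0.145    # fixation index pooled over loci
se_f = 0.041     # bootstrap standard error of f reported in the text

# Point estimate, valid only if the sampled trees are unrelated
ne_hat = n_trees / (1.0 + f_hat)

# Delta method: |d N_e / d f| = n / (1 + f)^2
se_ne = n_trees * se_f / (1.0 + f_hat) ** 2
lower = ne_hat - 1.96 * se_ne
upper = ne_hat + 1.96 * se_ne

print(f"N_e(v) = {ne_hat:.1f}")
print(f"approximate 95% interval: {lower:.1f} to {upper:.1f}")
```

The approximate interval is close to, though not identical with, the reported limits of 79.3 and 91.5.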

CONCLUSIONS

A general two-stage sampling model for estimating predictive equations for V(k) and [N.sub.e(v)] for mixed self and random mating species was developed. It was shown that specific [N.sub.e(v)] formulas found in the literature are particular cases of expressions derived under the more general model developed here.

For germplasm collection, results showed that [N.sub.e(v)] is reduced as s increases. This increase in difficulty for collecting from more autogamic species applies to both GC and RS. Even though GC always provides higher [N.sub.e(v)] values than RS, this difference becomes negligible when seeds are collected from a smaller number of seed parents (higher n/P). Thus, the important factor in seed collection is primarily the number of seed parents (P). The number of seeds collected per parent n/P is less important and becomes irrelevant as s approaches 1. On the other hand, as s approaches zero, increasing n/P, for a given value of P, will increase [N.sub.e(v)] up to a limit of 4P (n/P [right arrow] [infinity]).
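These statements about P and n/P can be illustrated numerically. The closed form used below is an assumption: Eq. [9] is not reproduced in this copy, so the expression was chosen to be consistent with three results quoted in the text (the n/P = 1 reduction to n(1 - 0.5s), the large-n ratio of seed parents, and the 4P limit at s = 0). The function name is hypothetical.

```python
def ne_gc_collection(n, P, s):
    """N_e(v) for collection with female gametic control from an infinite population.

    ASSUMED form, reconstructed for illustration; m = n/P seeds are taken per seed parent.
    """
    m = n / P
    return 2.0 * n * (2.0 - s) / (m * (1.0 + s) ** 2 + (3.0 + s) * (1.0 - s))

P = 10
for s in (0.0, 0.5, 1.0):
    values = [ne_gc_collection(P * m, P, s) for m in (1, 10, 100, 10_000)]
    print(f"s = {s:.1f}: " + ", ".join(f"{v:6.2f}" for v in values))
```

With s = 0 the values climb toward 4P as n/P grows, whereas with s = 1 they stay at P/2 whatever the number of seeds per parent, in line with the conclusion that the number of seed parents is the decisive factor.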

In accession regeneration, the two major factors for maintaining high values of [N.sub.e(v)] are (i) practicing female gametic control, and (ii) maintaining the integrity of the accession (avoiding loss) at acceptable levels. These factors become even more important as s increases. As opposed to germplasm collection, it is easier to regenerate more autogamic accessions, assuming GC, IE, and high u because values of [N.sub.e(v)] can be maintained above N. However, when seed accessions deteriorate severely or experience losses (low u), [N.sub.e(v)] decreases more when the accession becomes increasingly autogamic. That is, more panmictic species are less affected by drastic decreases in the proportion of functional parents. Thus, for achieving appropriate [N.sub.e(v)]'s, increasing autogamy in polymorphic materials makes seed collection more difficult, but accession regeneration much easier, provided excessive loss of accessions is avoided (u [is greater than or equal to] 0.80).

In germplasm regeneration, loss of up to 20% within accessions can be recovered by practicing female gametic control. Thus, a seed viability of at least 80% is a safe lower limit for u. This study showed that artificial selfing followed by female gametic control maintained higher [N.sub.e(v)] than natural reproduction. This approach can therefore be considered an option for regenerating accessions.

The equations developed in this study for estimating V(k) and [N.sub.e(v)] of mixed self and random mating species should be taken as approximations; their accuracy depends on the number of seed parents sampled. Other factors, not considered in the model developed in this study, that can give rise to inaccurate estimates of [N.sub.e(v)] are (i) plant-to-plant variation in the rate of natural self-fertilization caused by uncontrolled factors, (ii) differences among plants in fertility and in the reproductive success of female and male gametes (Morgan, 1998), and (iii) differential mating rates between plants inside and outside the local neighborhood. Further investigations, based on simulation or on Bayesian statistics, are possible means for clarifying some of these issues.

It is worth noting that two very distinct approaches for estimating [N.sub.e(v)], namely Eq. [7] and [9] for germplasm collection and Cockerham's (1969) intraclass correlations, led to the same effective size expressions for sampling from an infinite base population in inbreeding equilibrium. This equivalence reinforces the adequacy of the model developed in this study.

Abbreviations: GC, female gametic control; RS, random sampling; IE, inbreeding equilibrium; UI, unrestricted inbreeding.

REFERENCES

Caballero, A. 1994. Developments in the prediction of effective population size. Heredity 73:656-679.

Cockerham, C.C. 1969. Variance of gene frequency. Evolution 23:72-84.

Cockerham, C.C., and B.S. Weir. 1984. Covariances of relatives stemming from a population undergoing mixed self and random mating. Biometrics 40:157-164.

Crossa, J., and R. Vencovsky. 1994. Implications of the variance effective population size on the genetic conservation of monoecious species. Theor. Appl. Genet. 89:936-942.

Crossa, J., and R. Vencovsky. 1997. Variance effective population size for two-stage sampling of monoecious species. Crop Sci. 37:14-26.

Crow, J.F., and M. Kimura. 1970. An introduction to population genetics theory. Burgess Publishing, Minneapolis, MN.

Crow, J.F., and C. Denniston. 1988. Inbreeding and variance effective numbers. Evolution 42(3):482-495.

Fisher, R.A. 1930. The genetical theory of natural selection. Rev. Ed., 1958. Clarendon Press, Oxford, UK.

Frankel, O.H., A.H.D. Brown, and J.J. Burdon. 1995. The conservation of plant biodiversity. Cambridge Univ. Press, Cambridge, UK.

Gale, J.S., and M.J. Lawrence. 1984. The decay of variability. In J.H.W. Holden and J.T. Williams (ed.) Crop genetic resources: Conservation and evaluation. Allen and Unwin, London.

Haldane, J.B.S. 1924. A mathematical theory of natural and artificial selection. Am. J. Human Genet. 24:1-10.

Lindgren, D., L.D. Gea, and P.A. Jefferson. 1996. Loss of genetic diversity monitored by status number. Silvae Genetica 45(1):52-59.

Lindgren, D., L.D. Gea, and P.A. Jefferson. 1997. Status number for measuring genetic diversity. For. Genet. 4(2):69-72.

Morgan, M.T. 1998. Properties of maximum likelihood male fertility estimation in plant populations. Genetics 149:1099-1103.

Santiago, E., and A. Caballero. 1998. Effective size and polymorphism of linked neutral loci in populations under directional selection. Genetics 149:2105-2117.

Santos, M.M. 1991. Polimorfismo isoenzimatico de populacao subespontanea de dende (Elaeis guineensis Jacq.). Doctoral Thesis, Faculty of Medicine, Ribeirao Preto, Univ. Sao Paulo.

Vencovsky, R. 1978. Effective size of monoecious populations submitted to artificial selection. Brazil. J. Genet. 1(3):181-191.

Vencovsky, R. 1987. Tamanho efetivo populacional na coleta e preservacao de germoplasma de especies alogamas. Instituto de Pesquisas e Estudos Florestais 35:79-84. ESALQ-USP, Piracicaba, Sao Paulo, Brazil.

Wang, J. 1997. Effective size and F-statistics of subdivided populations. I. Monoecious species with partial selfing. Genetics 146:1453-1463.

Wright, S. 1931. Evolution in Mendelian populations. Genetics 16:97-156.

Wright, J., and C.C. Cockerham. 1985. Selection with partial selfing I. Mass selection. Genetics 109:585-597.

APPENDIX 1

Derivation of V(k) and [N.sub.e(v)] for Random Sampling of Seeds

Components of V(k)

V(m). [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] Crossa and Vencovsky (1997) demonstrated that this splitting was adequate to represent the two-stage sampling process.

V(m) = u(n/P)(1 - 1/P) + u(1 - u)[n.sup.2]/[P.sup.2]

with u = P/N and dividing by [bar]k = 2n/N

V(a). Variable a is not necessarily binomial, within set P. However, it is binomial when conditional to [m.sub.i]. Therefore,

[V.sub.P](a) = s(1 - s)n/P + [s.sup.2](n/P)(1 - 1/P) = (ns/P)(1 - s/P).

This is not a binomial variance, except when s = 1.
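The two forms given for the within-P variance of a can be checked with exact rational arithmetic. The sketch below assumes, as the text implies, that the number of seeds per parent m is binomial with parameters n and 1/P within set P and that a is binomial with parameters m and s conditional on m. Function names are hypothetical.

```python
from fractions import Fraction

def var_a_total_variance(n, P, s):
    """E[V(a|m)] + V(E[a|m]) with V_P(m) = (n/P)(1 - 1/P) and E_P(m) = n/P."""
    expected_conditional = s * (1 - s) * Fraction(n, P)
    variance_of_conditional_mean = s ** 2 * Fraction(n, P) * (1 - Fraction(1, P))
    return expected_conditional + variance_of_conditional_mean

def var_a_compact(n, P, s):
    """Compact form (ns/P)(1 - s/P)."""
    return Fraction(n) * s / P * (1 - s / P)

for s in (Fraction(0), Fraction(1, 2), Fraction(4, 5), Fraction(1)):
    left = var_a_total_variance(100, 20, s)
    right = var_a_compact(100, 20, s)
    print(f"s = {s}: {left} == {right} -> {left == right}")
```

Using Fraction avoids floating-point noise, so the equality test is exact rather than approximate.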

The variance of a relative to N is [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]

with u = P/N and dividing by

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]

V(d). The fact that variable d is not correlated with a and c within set P means that it is not correlated with the number of seeds sampled (m). This allows the assumption that d is a binomial random variable within the set of M parents. This assumption of no correlation is realistic; it is reasonable to assume that the outcome of taking a smaller or larger number of seeds from a given parent has no relationship to the number of contributed male gametes that plants spread over a field and that ultimately generate crossed seeds produced by other plants in the area. Therefore, under these assumptions

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]

with v = M/N and dividing by [bar]k

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]

2cov(m,a). 2cov(m,a) = 2cov[(a + c),a] = 2V(a) + 2cov(a,c). The first term of the right-hand side is already known. For cov(a,c) consider that [m.sub.i] = [a.sub.i] + [c.sub.i], such that within set P 2[cov.sub.P](a,c) = [V.sub.P](m) - [V.sub.P](a) - [V.sub.P](c).

Recalling that [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII].

Observe that a and c are negatively correlated within P (for 0 [is less than] s [is less than] 1). The covariance between a and c relative to N is cov(a,c) = u[cov.sub.P](a,c) + u(1 - u)[bar][a.sub.P][bar][c.sub.P]. With [bar][a.sub.P] = ns/P and [bar][c.sub.P] = n(1 - s)/P

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]

With u = P/N and dividing by [bar]k.

2cov(a,c)/[bar]k = [s(1 - s)/P][n(1 - u) - 1]

Adding the term 2V(a)/[bar]k

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]

2cov(m,d)

2cov(m,d) = 2cov[(a + c),d] = 2cov(a,d) + 2cov(c,d)

Because it was assumed that [cov.sub.P](a,d) = 0 then

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]

With u = P/N and dividing by [bar]k

2cov(a,d)/[bar]k = s(1 - s)(1 - v)n/M

With the assumption of no correlation between c and d within P, the remaining term is

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]

With u = P/N and dividing by [bar]k

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]

Summing up both terms

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]

The last term of V(k)/[bar]k, as already shown, is

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII].

Collecting all terms of V(k)/[bar]k leads to:

(i) terms without u and v

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]

(ii) terms with (1 - u)

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]

and

(iii) terms with (1 - v)

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]

Hence

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]

Therefore, for random sampling of seeds

where D = [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]

Derivation of V(k) and [N.sub.e(v)] for Female Gametic Control (Taking Equal Number of Seeds Per Plant)

Components of V(k)

V(m). Since m is constant within set P, V(m) contains only the between sets component

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]

Introducing u and dividing by [bar]k

V(m)/[bar]k = [n/(2P)](1 - u)

V(a). Relative to P

[V.sub.P](a) = [E.sub.P][V(a/m)] + [V.sub.P][E(a/m)]

= [E.sub.P][s(1 - s)m] + [V.sub.P][sm]

Since [V.sub.P](sm) = 0 then

[V.sub.P](a) = s(1 - s)m = s(1 - s)n/P

Relative to N

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]

since the mean or expected value of a within P is ns/P.

With u = P/N and dividing by [bar]k

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]

V(d). For the variance of d the same argument and derivation used for the case of random sampling of seeds is applied. Therefore

V(d)/[bar]k = [(1 - s)/2][1 - 1/M + (1 - v)n(1 - s)/M]

2cov(m,a). Since m is constant within set P

2cov(m,a) = 2u(1 - u)[bar][m.sub.P][bar][a.sub.P] = 2u(1 - u)(n/P)(ns/P)

and

2cov(m,a)/[bar]k = (n/P)s(1 - u)
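The scaling step can be written out explicitly. The lines below use the mean number of contributed gametes, 2n/N, defined earlier in the appendix, together with u = P/N.

```latex
\[
\frac{2\,\mathrm{cov}(m,a)}{\bar{k}}
= 2u(1-u)\,\frac{n}{P}\,\frac{ns}{P}\cdot\frac{N}{2n}
= \frac{u(1-u)\,n\,s\,N}{P^{2}}
= \frac{n}{P}\,s\,(1-u),
\qquad\text{using } \bar{k} = \frac{2n}{N},\; uN = P .
\]
```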

2cov(m,d). Similarly,

with u = P/N and dividing by [bar]k

2cov(m,d)/[bar]k = s(1 - s)(1 - v)/M

2cov(a,d). The last term of V(k), 2cov(a,d), is derived in a way similar to that shown for random sampling of seeds

2cov(a,d)/[bar]k = s(1 - s)(1 - v) n/M

For obtaining V(k)/[bar]k, collect:

(i) terms without u and v

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]

(ii) terms with (1 - u)

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]

and

(iii) terms with (1 - v)

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]

Summing up all component terms of V(k)/[bar]k

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]

Therefore, [N.sub.e(v)] for female gametic control (equal number of seeds taken per seed parent) becomes

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]

APPENDIX 2

[N.sub.e(v)] as a Function of Intraclass Correlations

It is useful to compare Eq. [7] and [9] for germplasm collection, assuming sampling from infinite base populations, with the developments shown by Cockerham (1969), where [N.sub.e(v)] is expressed as a function of intraclass correlations ([Theta], F) and sample size. With gene frequency dispersion due solely to random genetic drift (as assumed here) and taking Cockerham's intraclass correlations as [Theta] = [[Theta].sub.m] (coancestry within groups or families) and F = f (fixation index for the offspring generation), [N.sub.e(v)] relative to a set of P randomly chosen families, according to Cockerham (1969), derives from

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]

(for P independent families) where m is the family size assumed constant, as with female gametic control. Replacing [[Theta].sub.m] and f by their equilibrium values, which by assumption remain constant over generations, and since the total number of seeds sampled is n = mP, the following is obtained from the last expression:

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]

since m = n/P. This expression is identical to Eq. [9] for [N.sub.e(v)]. Similarly, if the number of seeds ([m.sub.i]) is not constant and follows a binomial distribution within the set of P seed parents, the intraclass correlation approach leads to Eq. [7].

The point to be stressed here is that, considering the family structure of seeds collected from an infinite base population (u [approximately equals] 0), two rather distinct approaches, the intraclass correlation and the two-stage sampling under mixed self and random mating, led to the same [N.sub.e(v)] expression.

Roland Vencovsky and Jose Crossa(*)

R. Vencovsky, Dep. Genetica, ESALQ-Univ. Sao Paulo, Cx.P. 83, 13.400-970, Piracicaba, SP, Brazil; J. Crossa, Biometrics and Statistics Unit, International Maize and Wheat Improvement Center (CIMMYT), Lisboa 27, Apdo. Postal 6-641, 06600 Mexico, D.F., Mexico. Research partially supported by CNPq (The Brazilian Council for the Development of Research and Technology). Received 12 May 1998. (*) Corresponding author (JCROSSA@CIMMYT.MX).
COPYRIGHT 1999 Crop Science Society of America
No portion of this article can be reproduced without the express written permission from the copyright holder.

Vencovsky, Roland; Crossa, Jose. Crop Science, Sep 1, 1999.