Additive main effects and multiplicative interaction model: II. theory on shrinkage factors for predicting cell means.
The AMMI model for the cell means is
$$ y_{ij} = \mu + g_i + e_j + \sum_{k=1}^{p} \lambda_k \alpha_{ik} \gamma_{jk} + \varepsilon_{ij}, $$
where [y.sub.ij], [mu], [g.sub.i] and [e.sub.j] are the observed (ij)th cell mean, the overall mean, and the ith genotype and jth environment effects, respectively; p is the number of multiplicative (PC) terms retained in the model; [[lambda].sub.k] is the singular value (square root of the eigenvalue) for the kth PC axis; interaction parameters [[alpha].sub.ik] and [[gamma].sub.jk] are elements of the kth singular vector for genotypes and environments, respectively, and are interpretable as scores for the contribution of the ith genotype and the jth environment, respectively, to the kth PC; [[epsilon].sub.ij] is the residual, which includes the residual interaction not accounted for by the multiplicative terms and the contribution of experimental error to the cell mean. In the saturated AMMI model, the maximum number of principal components is equal to the smaller of (m - 1) and (n - 1), m and n being the number of genotypes and environments, respectively. The AMMI models with 1, 2, 3, ... multiplicative PC components are designated truncated AMM[I.sub.1], AMM[I.sub.2], AMM[I.sub.3], etc. The truncated AMMI with only the main effects of genotypes and environments, but without interaction, is called AMM[I.sub.0].
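The decomposition above can be illustrated with a short numerical sketch. It assumes a complete m x n table of cell means and uses the standard route of double-centering the table and taking a singular value decomposition of the residual; the function names ammi_fit and ammi_predict are hypothetical and are not taken from the cited papers.

```python
import numpy as np

def ammi_fit(cell_means):
    """Least squares AMMI parameters from an m x n table of cell means (sketch)."""
    y = np.asarray(cell_means, dtype=float)
    mu = y.mean()                          # overall mean
    g = y.mean(axis=1) - mu                # genotype main effects
    e = y.mean(axis=0) - mu                # environment main effects
    z = y - mu - g[:, None] - e[None, :]   # doubly centered interaction residuals
    # Singular values are lambda_k; columns of alpha and gamma are the PC scores.
    alpha, lam, gamma_t = np.linalg.svd(z, full_matrices=False)
    return mu, g, e, lam, alpha, gamma_t.T

def ammi_predict(mu, g, e, lam, alpha, gamma, p):
    """Cell means predicted by the truncated AMMI model with p PC axes."""
    additive = mu + g[:, None] + e[None, :]
    interaction = (alpha[:, :p] * lam[:p]) @ gamma[:, :p].T
    return additive + interaction
```

With p = 0 the prediction reduces to the additive model, and with p = min(m - 1, n - 1) it reproduces the observed table, which corresponds to the saturated model described above.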
The random data splitting and cross validation procedure (Gauch, 1988; Gauch and Zobel, 1988, 1989) with the RMSPD criterion has been used for selecting the best truncated multiplicative model. In this procedure, some subset of replicates form the data used for fitting the model and the remaining replicates comprise the validation data (Gauch and Zobel, 1988, 1989; Crossa et al., 1990; Crossa and Cornelius, 1994). Since the choice of the best truncated multiplicative model depends on the number of replicates involved in the model data (Moreno-Gonzalez et al., 2003), the RMSPD criterion may not select the best truncated multiplicative model obtainable when all replications are used for fitting the model. Moreno-Gonzalez et al. (2003), in the context of an "eigenvalue partition method" (EVP), proposed the root mean square predictive difference (RMSP[D.sub.EVP]) criterion for selecting the best truncated AMMI model that can be applied to cell means involving all replications.
Cornelius et al. (1993, 1996) and Cornelius and Crossa (1995, 1999) proposed shrinkage factors for multiplicative models as a way to improve the prediction of cell means in a two-way table of GEI data. Shrinkage factors reduce the absolute value of the GEI terms in the multiplicative models because values of these shrinkage factors are always within the interval [0,1]. The authors showed that shrinkage estimates of multiplicative models produce better predictions of cultivar performance than truncated multiplicative models and are often also better predictors than BLUP on the basis of a two-way random effects model. One of the advantages of shrinkage factors is that they are computed from the complete data set, whereas the truncated models chosen by the RMSPD criterion of the random data splitting and cross validation procedure are from a modelling subset.
The shrinkage factors defined by Cornelius et al. (1993, 1996) and Cornelius and Crossa (1995, 1999), herein named the CCC method, were constructed by analogy to shrinkage factors involved in empirical BLUPs in a two-way random effects model with interaction. The shrinkage estimate of the interaction term of the AMMI, GREG, SREG, and COMM multiplicative models was defined as
$$ \sum_{k} S_k \lambda_k \alpha_{ik} \gamma_{jk} $$
where [S.sub.k] is the shrinkage factor for the kth PC; [[lambda].sub.k], [[alpha].sub.ik], and [[gamma].sub.jk] are as previously defined for unshrunken AMMI. Shrinkage estimation of SHMM is more complicated than for the other multiplicative models, because its computation requires an iterative algorithm (Cornelius and Crossa, 1995, 1999).
A derivation and theoretical justification for the shrinkage estimators is given in Appendix B of Cornelius and Crossa (1999). The resulting formula for [S.sub.k] is
 [S.sub.k] = ([[lambda].sup.2.sub.k] - [u.sub.k] [[sigma].sup.2.sub.e]/r)/[[lambda].sup.2.sub.k]
provided that this gives [S.sub.k] > 0; otherwise put [S.sub.k] = 0. In this equation, [[lambda].sup.2.sub.k] is the empirical eigenvalue of the kth PC axis; r is the number of replications; [[sigma].sup.2.sub.e] is the error mean square, and [u.sub.k] is a parameter, rather analogous to degrees of freedom (df), that multiplies [[sigma].sup.2.sub.e]/r in an expression for the expectation of the eigenvalue. A first approximation suggested for [u.sub.k] was the number of df of Gollob's (1968) approximate F-test; i.e., [u.sub.k] = m + n - 1 - 2k (number of parameters in the kth multiplicative term minus number of constraints on those parameters).
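As a worked illustration of the CCC formula with Gollob's df, the following sketch evaluates the shrinkage factor for individual PC axes. The numerical inputs in the usage note are the Trial 1 entries reported in Table 1 of this paper; the function names gollob_df and ccc_shrinkage are hypothetical.

```python
def gollob_df(m, n, k):
    """Gollob's df for the kth multiplicative term: u_k = m + n - 1 - 2k."""
    return m + n - 1 - 2 * k

def ccc_shrinkage(singular_value, u_k, error_var_of_mean):
    """CCC shrinkage factor S_k = (lambda_k^2 - u_k * sigma_e^2 / r) / lambda_k^2.

    error_var_of_mean is sigma_e^2 / r; negative results are set to zero.
    """
    eigenvalue = singular_value ** 2
    s_k = (eigenvalue - u_k * error_var_of_mean) / eigenvalue
    return max(s_k, 0.0)
```

For Trial 1 (m = 16 genotypes, n = 10 environments, error variance of a cell mean 42284), the first axis has a singular value of 6610.9 and gollob_df(16, 10, 1) = 23, so ccc_shrinkage(6610.9, 23, 42284) is about 0.978. The seventh axis (singular value 650.4, 11 df) gives a negative raw value and is therefore set to 0. Both results agree with the CCC columns of Table 1.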
In the companion paper, Moreno-Gonzalez et al. (2003) developed an EVP method to estimate the contribution of the GEI variance and the error variance to each AMMI PC axis. The EVP method was able to select the same truncated AMMI model as the conventional RMSPD cross validation criterion. However, the EVP method has the advantage over RMSPD cross validation that it can be applied to all replicates of the trial.
A potentially useful alternative strategy for estimating the shrinkage factors is to derive them from the EVP method by determining the contributions of interaction variance and error variance to the AMMI eigenvalues (Moreno-Gonzalez et al., 2003).
The objectives of the study were (i) to develop shrinkage factors for the multiplicative terms of the AMMI model on the basis of the EVP method of Moreno-Gonzalez et al. (2003) and (ii) to compare AMMI models fitted by shrinkage factors obtained by the EVP and CCC methods (the latter using Gollob's df as value of [u.sub.k]), unshrunken parsimonious AMMI models fitted by least squares and chosen by cross validation, and BLUP predictions based on a two-way model with main effects of cultivars and environments considered as fixed effects and the GEI term considered as random effect. These four estimation methods were compared by cross validation using the RMSPD.
MATERIALS AND METHODS
Estimation of Shrinkage Factors by the Eigenvalue Partition Method
Following the definition and terminology of Moreno-Gonzalez et al. (2003), the predicted interaction effect [z.sup.*.sub.ijp] for each cell in the AMM[I.sub.p] model can be expressed as
$$ z^{*}_{ijp} = \sum_{k=1}^{p} z_{ijk} $$
where [z.sub.ijk] is the kth PC term of [z.sup.*.sub.ijp].
Since the shrinkage factors should multiply the interaction terms of the model, the shrunken interaction term can be written as
$$ z_{ijp} = \sum_{k=1}^{p} S_k z_{ijk} $$
where [z.sub.ijp] is the shrunken interaction term; [S.sub.k] is the shrinkage factor for the kth PC axis; and p [less than or equal to] min (m - 1, n - 1), m and n being the number of rows (genotypes) and columns (environments) in the data matrix, respectively. If the number of sites is greater than the number of genotypes, the genotypes may be taken as columns and the sites as rows to simplify computation.
The expected mean square difference between the shrunken predicted means and the true means [i.e., the mean square error of predicted means (MSEPM)] over all cells can be computed by an approach similar to Eq.  of Moreno-Gonzalez et al. (2003). If their Eq.  is squared after substituting [z.sup.*.sub.ijp] and [Y.sub.ijv] for [z.sub.ijp] and [y.sub.ij], respectively, the following is obtained.
 [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]
where [a.sub.ij], [z.sub.ijk], and [y.sub.ij] are the additive main component, the GEI effect of the kth PC axis, and the true mean for the ij cell, respectively; [[sigma].sup.2.sub.GE] and [[sigma].sup.2.sub.e]/r are the structural GEI and error variance components associated with cell means in the ANOVA, respectively; r, m, and n are the number of repetitions, genotypes, and environments, respectively; p = min (m - 1, n - 1); [g.sub.k] and [e.sub.k] are the estimated adjusted coefficients of the structural GEI and error variance components for the kth PC axis, respectively; terms (m + n - 1) [[sigma].sup.2.sub.e]/mnr, and (m - 1)(n - 1) [[S].sup.2.sub.k][e.sub.k] [[sigma].sup.2.sub.e]/mnr are the error components associated with [a.sub.ij] and [z.sub.ijk], respectively;
[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]
is the structural variance component associated with the difference between the full true GEI effects and those predicted by the shrunken model.
The [S.sub.k] parameters can be estimated by minimizing MSEPM. Therefore, by taking the partial derivative of MSEPM with respect to [S.sub.k] and equating it to zero, the following expression was obtained
(m - 1)(n - 1)[2[S.sub.k]([e.sub.k][[sigma].sup.2.sub.e]/r + [g.sub.k][[sigma].sup.2.sub.GE]) - 2[g.sub.k][[sigma].sup.2.sub.GE]]/mn = 0
which, solved for [S.sub.k], gives
 [S.sub.k] = [g.sub.k][[sigma].sup.2.sub.GE]/([e.sub.k][[sigma].sup.2.sub.e]/r + [g.sub.k][[sigma].sup.2.sub.GE]) for k = 1, 2, ..., p
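The minimization step can be written out as follows. This is a sketch only: Q denotes the part of MSEPM that depends on [S.sub.k], and its form is an assumption recovered by integrating the derivative condition stated above, not a reproduction of the full MSEPM expression.

```latex
% Assumed S_k-dependent part of MSEPM (terms free of S_k omitted)
Q(S_k) = \frac{(m-1)(n-1)}{mn}
         \left[ S_k^{2}\left( e_k \frac{\sigma_e^{2}}{r} + g_k \sigma_{GE}^{2} \right)
                - 2 S_k\, g_k \sigma_{GE}^{2} \right]

% First-order condition (the derivative set to zero)
\frac{\partial Q}{\partial S_k}
  = \frac{(m-1)(n-1)}{mn}
    \left[ 2 S_k \left( e_k \frac{\sigma_e^{2}}{r} + g_k \sigma_{GE}^{2} \right)
           - 2 g_k \sigma_{GE}^{2} \right] = 0
\quad\Longrightarrow\quad
S_k = \frac{g_k \sigma_{GE}^{2}}{e_k \sigma_e^{2}/r + g_k \sigma_{GE}^{2}}

% Second derivative is positive whenever the PC axis carries any variance,
% so the stationary point is a minimum
\frac{\partial^{2} Q}{\partial S_k^{2}}
  = \frac{2(m-1)(n-1)}{mn}
    \left( e_k \frac{\sigma_e^{2}}{r} + g_k \sigma_{GE}^{2} \right) > 0
```

Under nonnegative [g.sub.k] and [e.sub.k], the resulting [S.sub.k] lies in the interval [0,1], and 1 - [S.sub.k] is the share of the kth axis variance attributed to error.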
The CCC shrinkage formula (the expression for [S.sub.k] in terms of [[lambda].sup.2.sub.k] and [u.sub.k]) was originally constructed by Cornelius et al. (1993) and Cornelius and Crossa (1995) by analogy to shrinkage factors involved in BLUPs in a two-way random effects model with interaction. Further theoretical considerations were discussed by Cornelius et al. (1996) and a complete theoretical justification was given by Cornelius and Crossa (1999). The EVP shrinkage formula (the expression for [S.sub.k] in terms of [g.sub.k] and [e.sub.k]) was based on the EVP theory (Moreno-Gonzalez et al., 2003). The EVP formula is equivalent to the CCC formula because [[lambda].sup.2.sub.k] = (m - 1)(n - 1)[[g.sub.k] [[sigma].sup.2.sub.GE] + [e.sub.k] [[sigma].sup.2.sub.e]/r] (Moreno-Gonzalez et al., 2003), and [u.sub.k] can be made equivalent to (m - 1)(n - 1) [e.sub.k]. The similarity of both equations, which were derived by different approaches, can be considered a reciprocal crosscheck for both methods.
No connection between the derivations of the two formulas was apparent when the senior author developed the heretofore unpublished EVP formula. The merit of the CCC and EVP formulas will depend on the accuracy of the [u.sub.k] and [e.sub.k] estimates, respectively. Cornelius et al. (1993, 1996) and Cornelius and Crossa (1995) used Gollob's df as a first approximation of the [u.sub.k] values, and then, because Gollob's df is known to be an appropriate measure of error absorption by the multiplicative terms only for terms for which the true [lambda] value is very large relative to error variance (Goodman and Haberman, 1990), they employed a computer simulation scheme (parametric bootstrap) to obtain improved values.
The simulation scheme can be iterated as many times as desired. However, in a subsequent cross validation study (Cornelius and Crossa, 1999) involving five multienvironment cultivar trials, shrinkage estimation using Gollob's df performed virtually as well as the subsequent simulation estimates of the [u.sub.k] values. The most fundamental difference between the EVP and the CCC method is that the EVP method uses a nonparametric data resampling method, rather than relying on Gollob's df or a simulation scheme, to estimate the expected error absorption by the multiplicative terms.
The standard error of predicted means (SEPM) can be estimated by taking the square root in Eq.  after substituting [S.sub.k] from Eq. .
 [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]
The shrinkage factors were estimated in yield data from three multienvironment trials that had been used for the EVP method of AMMI models (Moreno-Gonzalez et al., 2003) and for the GEAR model of Moreno-Gonzalez and Crossa (1998). Trial 1 is a multienvironment experiment including 16 triticale (X Triticosecale Wittmack) cultivars with four replications in a randomized complete block design (RCBD) evaluated at 10 environments in Spain during 1989 (Royo et al., 1993). Trial 2 is a CIMMYT maize (Zea mays L.) international trial where eight maize genotypes were arranged in a randomized complete block design with four replications at each of 33 sites scattered across the tropical region in 1987. Trial 3 comprises 11 broad bean (Vicia faba L.) genotypes arranged in a randomized complete block design with three replications grown at 10 environments in southern Spain (Cubero and Flores, 1995).
To test the applicability of the EVP-based shrinkage factors in a wide range of situations, simulation data sets were generated from the original empirical data by adding to each observed cell mean a random error component for each replication in each trial. The random error effects in the simulation data sets came from normal distributions with mean zero and arbitrary standard deviations of 2500, 1500, 1000, 750, 600, 400, and 240 kg [ha.sup.-1]. Sixty-three simulation data sets were generated for each arbitrary standard deviation in Cases 1 and 2; 42 simulation data sets were generated in Cases 3 and 4; and 84 simulation data sets in Case 5. Cases 1 to 5 will be described in the next section.
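The construction of one simulation data set can be sketched as below. The sketch assumes that an independent normal error with mean zero and the chosen standard deviation is added to the observed cell mean for every replicate of every cell; the function name simulate_replicates and the replicate-first array layout are illustrative choices, not details given in the paper.

```python
import numpy as np

def simulate_replicates(cell_means, n_reps, error_sd, rng):
    """Return an (n_reps, m, n) array: observed cell means plus N(0, error_sd^2) noise."""
    y = np.asarray(cell_means, dtype=float)
    noise = rng.normal(0.0, error_sd, size=(n_reps,) + y.shape)
    # Broadcasting adds the same cell mean to each of the n_reps noise layers.
    return y[None, :, :] + noise
```

For example, simulate_replicates(means, 4, 1000.0, np.random.default_rng(1)) would generate one four-replicate simulation data set with a standard deviation of 1000 kg [ha.sup.-1] for a trial whose empirical cell means are held in means.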
Random Data Splitting
Since all trials were arranged in RCB designs, data were first adjusted to remove the block effects at each site (Cornelius and Crossa, 1999). Similarly to Moreno-Gonzalez et al. (2003), cross validation was performed by splitting the data into model ([r.sub.m]) and validation data ([r.sub.v]). Five cases of data splitting were studied for the simulation and empirical data of Trials 1, 2, and 3. In Cases 1 and 2 involving Trials 1 and 2, respectively, three replicates were randomly selected for each genotype at each site to form the model data ([r.sub.m] = 3) and the remaining replicate ([r.sub.v] = 1) was used as validation data. In Cases 3 and 4 involving Trials 1 and 2, respectively, [r.sub.m] = 2 and [r.sub.v] = 2. In Case 5 involving Trial 3, [r.sub.m] = 2 and [r.sub.v] = 1. Four random splitting events were performed on each of the 63 simulation data sets of Cases 1 and 2, in such a way that the four replicates of each cell were involved, once each, in the four one-replicate validation data sets. Likewise, six random splitting events were done on each of the 42 simulation data sets of Cases 3 and 4, in such a way that the four cell replicates were involved, three times each, in the six two-replicate validation data sets. Also, three random splittings were made on each of the 84 simulation data sets of Case 5, in such a way that the three cell replicates were involved, once each, in the three one-replicate validation data sets. For empirical data, the same random splitting procedure was used, but assignment of random replicates to each cell was done 63 times in Cases 1 and 2; 42 times in Cases 3 and 4; and 84 times in Case 5. In total, 252 random model data sets were validated for each of the situations and cases studied.
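One way to obtain splitting events with the balance described above is sketched here. It is an assumption about implementation, not the authors' program: for every cell, all possible sets of validation replicates are enumerated and assigned to the splitting events in a random order, so that each replicate of a cell is used for validation equally often. The function name balanced_validation_splits is hypothetical.

```python
import itertools
import numpy as np

def balanced_validation_splits(n_reps, n_valid, shape, rng):
    """Validation replicate indices for every splitting event and cell.

    Returns an array of shape (n_events, m, n, n_valid), where n_events is the
    number of ways to choose n_valid replicates out of n_reps.
    """
    combos = np.array(list(itertools.combinations(range(n_reps), n_valid)))
    n_events = len(combos)
    m, n = shape
    splits = np.empty((n_events, m, n, n_valid), dtype=int)
    for i in range(m):
        for j in range(n):
            # Independent random order per cell keeps the balance within the cell.
            order = rng.permutation(n_events)
            splits[:, i, j, :] = combos[order]
    return splits
```

With four replicates this yields four events for [r.sub.v] = 1 and six events for [r.sub.v] = 2 (each replicate validated three times), and with three replicates it yields three events for [r.sub.v] = 1, matching the counts given in the text; 63 x 4, 42 x 6, and 84 x 3 all equal 252 model data sets.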
Shrinkage Factors, Cross Validation Procedure, AMMI, and BLUP
Shrinkage factors were computed (i) by substituting the adjusted coefficients of variance components [g.sub.k] and [e.sub.k] estimated from the EVP method (Moreno-Gonzalez et al., 2003) into Eq. ; and (ii) by the CCC method (Eq. ) (Cornelius et al., 1993, 1996; Cornelius and Crossa, 1995, 1999) using the df according to Gollob as the estimate of [u.sub.k]. If shrinkage factor estimates were negative, they were assigned a zero value.
The shrinkage factors were studied for different situations in each trial. Shrunken predicted terms [z.sub.ijp] were computed from Eq.  and added to the additive main component [a.sub.ij] to obtain the shrunken predicted means for the accumulative PC axis of the AMMI model. All PC axes were involved in the computations. The procedure was applied to the cell means of model data for each case. The RMSPD between the shrunken predicted means and the validation cell means was computed by taking the square root of the average of the squared predictive differences [([y.sub.ij] - [y.sub.ijv]).sup.2] across all cells and split data sets, where [y.sub.ij] is the cell mean predicted from the modelling data and [y.sub.ijv] is the corresponding observed validation data value. The cross validation RMSPD criterion was used to validate and compare the accuracy of two different shrunken methods: the adjusted EVP method and the CCC method using the Gollob df as estimate of [u.sub.k]. In addition, the RMSPD was also computed for the predicted means of the conventional truncated AMMI model, where the shrinkage factor of each PC axis retained in the model is unity and is zero for the truncated (i.e., omitted) PC axes.
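The RMSPD computation described above amounts to pooling the squared predictive differences over all cells and all split data sets before taking the square root. A minimal sketch follows; the function name rmspd and the list-of-tables input format are assumptions made for illustration.

```python
import numpy as np

def rmspd(predicted, validation):
    """Root mean square predictive difference over all cells and split data sets.

    predicted and validation are sequences of equally shaped tables, one pair
    per random splitting event.
    """
    squared = [
        (np.asarray(p, dtype=float) - np.asarray(v, dtype=float)).ravel() ** 2
        for p, v in zip(predicted, validation)
    ]
    return float(np.sqrt(np.concatenate(squared).mean()))
```

Because the average is taken before the square root, the result is not the mean of per-split RMSPD values, although the two coincide when every split has the same mean squared difference.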
The BLUP from a two-way table having random interaction effects was obtained for each cell mean on the same simulation and empirical data as the shrunken models, using the PROC MIXED of SAS (SAS Institute Inc., 1999). The main effects in BLUP were considered fixed for fair comparison to the other methods. The RMSPD cross validation criterion was also applied to the BLUP estimates. The same model data subset from each random splitting event was used for comparing the four methods: CCC, EVP, AMMI, and BLUP.
Shrinkage factors were also estimated for trials including all replicates. Since no replicate was left for validation, the SEPM was estimated for the EVP method from Eq. . The SEPM for AMMI was estimated from Eq.  (Moreno-Gonzalez et al., 2003) after removing the error variance associated with the validation observation. The BLUP estimates and the standard error of the BLUP predicted cell means were obtained from the PROC MIXED from SAS (SAS Institute Inc., 1999).
RESULTS AND DISCUSSION
Shrunken Models, Truncated AMMI, and BLUP
Nonadjusted shrinkage factors of the EVP method as compared with those of the CCC method using Gollob's df were smaller for the large singular values, whereas they were larger for the small singular values (Table 1). The adjusted shrinkage factors of the EVP method were always slightly smaller than the nonadjusted ones (Table 1). Since shrinkage factors are the ratio of the estimated structural GEI variance to the entire variance for each PC axis (Eq. ), it seems that the EVP method as compared with the CCC method is able to absorb a larger proportion of the error variance component in earlier PC axes and a smaller proportion in the remaining axes.
The RMSPD cross validation criterion was determined for models that included the accumulative PC axes. The shrunken model based on the EVP theory was generally the best among all models studied. Results show that this model had the least RMSPD in 31 out of the 40 situations compared (Table 2). The BLUP was better than the EVP model only for those situations of Cases 3, 4, and 5 where the simulated standard error was very large, viz., 2500. In these situations, the error variance associated with the simulation cell means, 3125000, was 6.9, 12.8, and 21.3 times as large as the structural GEI variance component, 453 107, 244 650, and 146 425, for Trials 1, 2, and 3, respectively. When the ratio of the error variance to the structural GEI variance is high, the estimate of the structural GEI coefficients, [g.sub.k], for the PC axis must be adjusted because of negative estimates (Moreno-Gonzalez et al., 2003). Thus, precision of the EVP shrinkage factors is reduced as the error variance increases. BLUP was also better than the EVP method when the simulation standard error for cases 4 and 5 was 1500. In these situations, the ratio of the cell means error variance to the structural GEI variance was also high, i.e., greater than 4.5.
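The variance ratios quoted above can be checked with a few lines of arithmetic. The sketch assumes that the error variance of a model cell mean is the squared simulation standard deviation divided by the number of model replicates, which gives the quoted value of 3125000 when [r.sub.m] = 2 (Cases 3, 4, and 5); the function name is hypothetical.

```python
def error_var_of_cell_mean(error_sd, n_model_reps):
    """Error variance of a cell mean averaged over n_model_reps replicates."""
    return error_sd ** 2 / n_model_reps

# Structural GEI variance components quoted in the text for Trials 1, 2, and 3.
gei_var = {"Trial 1": 453107, "Trial 2": 244650, "Trial 3": 146425}

error_var = error_var_of_cell_mean(2500, 2)   # 3125000.0
ratios = {trial: error_var / v for trial, v in gei_var.items()}

# Same ratio for a standard deviation of 1500 in Cases 4 and 5 (Trials 2 and 3).
ratios_1500 = {t: error_var_of_cell_mean(1500, 2) / gei_var[t]
               for t in ("Trial 2", "Trial 3")}
```

Rounded to one decimal place, the ratios are 6.9, 12.8, and 21.3, as stated; for a standard deviation of 1500 the ratios for Trials 2 and 3 both exceed 4.5.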
Comparison of the shrunken and truncated AMMI models showed that the two shrinkage methods (adjusted EVP and CCC methods) were better than the truncated least squares AMMI models for most of the situations in the three trials (Table 2). A truncated least squares AMMI model was better than both EVP and CCC methods only for the empirical data of Trial 3, but the differences among the three methods in that case were rather trivial (RMSPD values 402.9, 402.4, and 401.6 for EVP, CCC, and truncated AMMI, respectively). The best truncated least squares AMMI model was worse than the other methods in all other situations that had error variances nearly the same as the empirical Trial 3 (i.e., standard deviation 400, 600, and 750).
Truncated AMMI was better than the CCC method but not better than EVP, in Cases 2, 3, 4, and 5 with simulated data and standard deviation 2500. This is probably because Gollob's df will not accurately express the error absorption by the large eigenvalues when the data are extremely noisy (Cornelius, 1993). The best truncated least squares AMMI model in all four of these situations was the additive model (AMM[I.sub.0]). It appears that, in general, CCC shrinkage estimates using Gollob's df as the value of [u.sub.k] are better than truncated least squares AMMI, and the EVP method is better than both truncated AMMI and BLUP based on a two-way random effects model, except for situations where the ratio of variance of a cell mean to GEI variance is nearly as large as, or larger than, 4.5. Such situations do not occur frequently in multienvironment cultivar trials. Indeed, if they do, such is an indication of a need for either more replications or better selection of experimental sites with respect to within-site uniformity of experimental conditions. These results agreed with Cornelius and Crossa (1999) who found shrinkage estimates to be better predictors of the validation data than truncated multiplicative models in five multienvironment cultivar trials.
The choice of the best truncated model depends on the experimental error variance and the number of replications used for forming the modeling data. In Trial 1 for model data with three replicates, AMM[I.sub.9], AMM[I.sub.2], AMM[I.sub.1], and AMM[I.sub.0] were selected for standard errors 400, 1000, 1500, and 2500, respectively, whereas for modeling data with two replicates, AMM[I.sub.9], AMM[I.sub.1], AMM[I.sub.1] and AMM[I.sub.0] were selected for standard errors 400, 1000, 1500, and 2500, respectively (Table 1). In Trial 2, AMM[I.sub.6], AMM[I.sub.1], AMM[I.sub.1], and AMM[I.sub.0] were selected for model data with three replicates and standard errors 600, 1000, 1500, and 2500, respectively, whereas AMM[I.sub.7], AMM[I.sub.0], AMM[I.sub.0], and AMM[I.sub.0] were selected for modeling data with two replicates and standard errors 600, 1000, 1500, and 2500, respectively (Table 2). In Trial 3, AMM[I.sub.9], AMM[I.sub.1], AMM[I.sub.1], and AMM[I.sub.0] were selected for standard errors 240, 600, 750, and 1000, respectively. Thus, the best truncated AMMI model for cell means including all replications is unknown, since the models cannot be validated and the choice based on validation with a lesser number of replications may not be correct.
Therefore, the shrinkage methods had the following advantages over the truncated AMMI model: (i) a clear criterion exists for including all PC axes in the shrunken models, whereas an adequate criterion for selecting the best number of the first PC axis is lacking in truncated AMMI; (ii) the shrunken models are better cell mean predictors than the truncated AMMI models, since their validation RMSPD estimates were generally smaller than those of AMMI.
The shrunken CCC and EVP models were generally better cell mean predictors than BLUPs for the three trials and the model data with two and three replicates (Table 2). Formulas of shrinkage factors for the CCC, EVP methods, and BLUP have similar structure (Cornelius et al., 1993, 1996; Cornelius and Crossa, 1995, 1999), but the CCC and EVP methods provide for different values for the shrinkage factors to be applied to the individual PCs that make up the whole interaction, whereas the BLUP method based on a two-way random effects model with interaction applies the same shrinkage factor to all interactions. Thus, it seems logical that minimization of the error for each single PC should yield better results than minimization of the error for the entire interaction. Application of shrinkage factors to the random additive effects produced a negligible improvement in the BLUP estimates of the three trials (data not shown). The BLUPs were better predictors of cell means than the best truncated AMMI models for all trials, except for the two-replicate model data of Trial 1 with simulation standard errors 600, and 1000, and the empirical data themselves (Case 3 of Table 2), and also for the empirical Trial 3.
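The contrast between a single BLUP shrinkage factor and axis-specific factors can be made concrete with a small sketch. It assumes balanced data and known variance components, in which case the BLUP of an interaction effect under a random-interaction model takes the familiar form of the interaction residual multiplied by one common factor; this closed form is a textbook simplification and is not the PROC MIXED computation used in the study. The function name is hypothetical.

```python
def blup_interaction_shrinkage(gei_var, error_var_of_mean):
    """Common shrinkage factor sigma_GE^2 / (sigma_GE^2 + sigma_e^2 / r).

    Assumes balanced data and known variance components.
    """
    return gei_var / (gei_var + error_var_of_mean)

# Trial 1 variance components as reported in Table 1.
common_factor = blup_interaction_shrinkage(410824, 42284)
```

With the Trial 1 components from Table 1, the common factor is about 0.907, whereas the adjusted EVP factors in the same table range from 0.967 on the first PC axis down to 0.001 on the ninth, which is the axis-by-axis flexibility discussed above.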
Comparisons among Shrunken Models
The adjusted EVP shrinkage method gave smaller RMSPD values than did the CCC shrinkage method [with Gollob's df used to estimate [u.sub.k]] for all cases and trials except the simulation data of Case 3 with standard deviation 750 and 1000, and the empirical data of Case 4 (Table 2), but the differences between the two methods were rather trivial as long as error variance was not exceedingly large. Differences between the two methods were largest when standard deviation was 2500. Failure of CCC to perform as well as EVP in these cases can be explained by the fact, previously mentioned in the context of comparisons with truncated AMMI, that Gollob's df will tend to underestimate error absorption by the large eigenvalues if the data are extremely noisy (Cornelius, 1993). Apparently, in cases of extremely noisy data, one should either use the EVP method or the CCC method with [u.sub.k] values estimated by simulation. While the CCC method with [u.sub.k] estimated by Gollob's df is less computationally intensive than the EVP method, the reverse is true if the CCC method is used with [u.sub.k] estimated by simulation. The results obtained here, along with the results of Cornelius and Crossa (1999) (who found little difference in RMSPD results for the CCC method using Gollob's df as compared with estimating [u.sub.k] by simulation), suggest that there is little or nothing to lose, and, in some cases some improvement in accuracy (or precision) to be gained (particularly when data are very noisy), by using EVP shrinkage estimators as an AMMI model fitting method.
Models with All Replications and SEPM
The shrinkage factors based on the adjusted EVP method were estimated for empirical data of Trials 1, 2, and 3, when all replicates were involved in the model data. Since no replication was left for validation, the SEPM (Eq. ) will be taken as criterion for model comparisons (Table 3). The RMSPD based on the EVP method (RMSP[D.sub.EVP]) was shown to be a good criterion for selecting the best-truncated AMMI models (Moreno-Gonzalez et al., 2003). The SEPM has the same structure as the RMSP[D.sub.EVP] criterion, since both were derived from the same concepts. SEPM is the same as RMSP[D.sub.EVP] after removing the error associated with the validation observation and replacing the GEI effects from the PC analysis by shrunken GEI effects. Again, the shrunken EVP method was better (i.e., it has a smaller SEPM estimate) than the BLUP and the best truncated AMMI models for all trials (Table 3), as was seen when model data with incomplete number of replicates were validated with the RMSPD criterion (Table 2). Best models for the EVP method included all PC axes in all trials, whereas the best models in truncated AMMI included the first five, three, and one PC axes in Trials 1, 2, and 3, respectively (Table 3). As discussed above, selection of the best AMMI depends on the number of replications. The BLUP model was better than the best truncated AMMI model for Trials 2 and 3, but AMM[I.sub.5] was superior to BLUP for Trial 1.
A formula for estimating shrinkage factors for the PC axes in AMMI models was developed which consisted of the ratio of the contribution of interaction variance to the entire variance (interaction plus error) for each PC axis. Coefficients for interaction and error variance components were estimated by the EVP data resampling method (Moreno-Gonzalez et al., 2003). The shrinkage factors are similar to those developed by Cornelius et al. (1993, 1996) and Cornelius and Crossa (1995) and were compared with alternative estimation methods in a prediction assessment study involving five multienvironment cultivar trials by Cornelius and Crossa (1999). The results of the present study, along with results of Cornelius and Crossa (1999), conclusively establish that, for separation of pattern from noise in estimating cell means in cultivar trials, shrinkage estimators are superior to parsimonious ("truncated") least squares-fitted AMMI models (whether chosen by cross-validation or any hypothesis testing criterion).
In the present study, the EVP-based shrinkage estimators were found to be empirically more predictively accurate than the CCC method [if the absorption of error variance by PC axes was estimated as df defined by Gollob (1968)] in all but three of the comparisons made. The difference was trivially small if error variance was small, but became of greater importance as error variance increased. The study did not reveal any disadvantages with respect to performance of the EVP shrinkage estimators as compared with the other methods studied [which included parsimonious ("truncated") least squares-fitted AMMI models and BLUPs based on a two-way random effects model, in addition to the CCC method using Gollob's df].
A suggested protocol for estimating means in a multienvironment cultivar trial is to (i) compute an analysis of variance of the data and a two-way table of empirical cell means, (ii) obtain the least squares solution for the full AMMI model, (iii) use the eigenvalue partition (EVP) data resampling method to estimate the contributions of error and interaction variances to the AMMI least squares PCs and compute the resulting shrinkage factors, (iv) multiply the least squares estimates of the AMMI interaction singular values by their respective shrinkage factors, (v) estimate the cell means from the resulting shrunken AMMI model, and (vi) use the SEPM defined in this paper to estimate the standard errors of the shrinkage estimates of the cell means.
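Steps (ii), (iv), and (v) of the protocol can be sketched as follows. The EVP resampling of step (iii) is described in the companion paper and is not reproduced here; as a stand-in, the sketch accepts user-supplied error absorption values u (for the EVP method these would be (m - 1)(n - 1)[e.sub.k]) and falls back on Gollob's df when none are given. The function name shrunken_ammi_means and its signature are hypothetical.

```python
import numpy as np

def shrunken_ammi_means(cell_means, error_var_of_mean, u=None):
    """Shrunken AMMI cell means and the shrinkage factors used (sketch).

    error_var_of_mean is sigma_e^2 / r. u holds one error absorption value per
    PC axis; if omitted, Gollob's df m + n - 1 - 2k is used as a stand-in.
    """
    y = np.asarray(cell_means, dtype=float)
    m, n = y.shape
    additive = y.mean(axis=1, keepdims=True) + y.mean(axis=0, keepdims=True) - y.mean()
    # Step (ii): least squares solution of the full AMMI model via SVD.
    alpha, lam, gamma_t = np.linalg.svd(y - additive, full_matrices=False)
    p = min(m - 1, n - 1)
    alpha, lam, gamma_t = alpha[:, :p], lam[:p], gamma_t[:p, :]
    if u is None:
        u = m + n - 1 - 2 * np.arange(1, p + 1)
    u = np.asarray(u, dtype=float)
    # Shrinkage factor 1 - u_k * (sigma_e^2 / r) / lambda_k^2, truncated to [0, 1].
    ratio = np.divide(u * error_var_of_mean, lam ** 2,
                      out=np.full(p, np.inf), where=lam ** 2 > 0)
    shrink = np.clip(1.0 - ratio, 0.0, 1.0)
    # Steps (iv) and (v): shrink the singular values and rebuild the cell means.
    interaction = (alpha * (shrink * lam)) @ gamma_t
    return additive + interaction, shrink
```

Setting the error variance to zero leaves every factor at 1 and returns the observed cell means, while a very large error variance drives every factor to 0 and returns the additive model, which are the two limiting cases of the shrinkage approach.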
Abbreviations: AMMI, additive main effect and multiplicative interaction; BLUP, best linear unbiased predictor; CCC, work of Cornelius, Crossa, and associates; COMM, completely multiplicative model; EVP, eigenvalue partition; GEAR, genotype, environment, attribute model; GEI, genotypes x environment interaction; GREG, genotype regression model; MSEPM, mean squared error of predicted means; PC, principal component; RCBD, randomized complete block design; RMSPD, root mean squared predictive difference; SEPM, standard error of predicted means; SREG, sites regression model.
Table 1. Pertinent parameter estimates of shrunken models based on the Cornelius et al. (1993, 1996) and Cornelius and Crossa (1995, 1999) (CCC) method and the eigenvalue partition (EVP) method for predicting cell means in the entire data of the multi-environment cultivar Trials 1, 2, and 3, with different repetitions (r), genotype x environment interaction variance ([[sigma].sup.2.sub.GE]) and error variance ([[sigma].sup.2.sub.e]/r) components associated with cell means in the ANOVA.

CCC method

PC    Singular value      Gollob's df   Shrinkage factor   Shrunken singular value
axes  ([[lambda].sub.k])  ([u.sub.k])   ([S.sub.k])        ([S.sub.k][[lambda].sub.k])

Trial 1, [[sigma].sup.2.sub.GE] = 410824, [[sigma].sup.2.sub.e]/r = 42284, r = 4
1     6610.9              23            0.978              6463.8
2     3167.2              21            0.911              2886.9
3     1469.2              19            0.628              922.2
4     1433.1              17            0.650              931.5
5     1207.1              15            0.565              681.7
6     919.0               13            0.349              320.9
7     650.4               11            0                  0
8     555.5               9             0                  0
9     434.1               7             0                  0

Trial 2, [[sigma].sup.2.sub.GE] = 138921, [[sigma].sup.2.sub.e]/r = 105728, r = 4
1     4273.6              38            0.790              3455.0
2     3706.8              36            0.723              2680.0
3     2931.2              34            0.585              1704.8
4     2151.6              32            0.269              579.1
5     1939.7              30            0.157              304.3
6     1833.3              28            0.119              218.6
7     1260.2              26            0                  0

Trial 3, [[sigma].sup.2.sub.GE] = 104923, [[sigma].sup.2.sub.e]/r = 41512, r = 3
1     2998.8              18            0.917              2749.6
2     1189.3              16            0.530              630.7
3     1141.7              14            0.464              483.6
4     904.6               12            0.391              353.7
5     639.7               10            0                  0
6     521.0               8             0                  0
7     356.7               6             0                  0
8     220.0               4             0                  0
9     106.9               2             0                  0

EVP method

PC    Non-adjusted    Adjusted        Non-adjusted       Adjusted           Adjusted shrunken
axes  (m - 1)(n - 1)  (m - 1)(n - 1)  shrinkage factor   shrinkage factor   singular value
      [e.sub.k]       [e.sub.k]       ([S.sub.k])        ([S.sub.k])        ([S.sub.k][[lambda].sub.k])

Trial 1, [[sigma].sup.2.sub.GE] = 410824, [[sigma].sup.2.sub.e]/r = 42284, r = 4
1     33.6            34.5            0.968              0.967              6390.2
2     22.1            22.3            0.907              0.906              2869.4
3     18.5            18.5            0.640              0.637              936.0
4     15.5            15.5            0.683              0.681              975.3
5     13.2            13.2            0.620              0.618              745.4
6     10.9            10.9            0.459              0.456              418.9
7     8.7             8.7             0.130              0.125              81.6
8     7.0             6.9             0.060              0.057              31.6
9     5.5             4.4             0.001              0.001              0.3

Trial 2, [[sigma].sup.2.sub.GE] = 138921, [[sigma].sup.2.sub.e]/r = 105728, r = 4
1     45.9            48.4            0.744              0.732              3203.3
2     39.6            41.4            0.695              0.682              2527.1
3     35.3            36.1            0.566              0.555              1627.3
4     31.4            31.6            0.293              0.278              598.5
5     27.4            27.4            0.228              0.226              438.0
6     23.7            23.7            0.258              0.748              455.1
7     20.7            15.3            0                  0                  0

Trial 3, [[sigma].sup.2.sub.GE] = 104923, [[sigma].sup.2.sub.e]/r = 41512, r = 3
1     25.7            30.1            0.881              0.861              2582.5
2     18.8            19.1            0.448              0.437              520.5
3     12.6            12.9            0.520              0.508              528.9
4     9.8             10.0            0.503              0.492              444.8
5     7.5             7.6             0.233              0.228              145.7
6     5.9             5.8             0.117              0.114              59.7
7     4.4             3.1             0                  0                  0
8     3.2             1.2             0                  0                  0
9     2.1             0.3             0                  0                  0

Table 2. Root mean square predictive differences (RMSPD) of several cell prediction models: Cornelius (1993) and Cornelius and Crossa (1995) (shrunken CCC model), shrunken eigenvalue partition (EVP) model, best linear unbiased predictor (BLUP), and best truncated additive main effects and multiplicative interaction (AMMI) model, and the ratio of genotype x environment variance ([[sigma].sup.2.sub.GE]) to error variance of cell means ([[sigma].sup.2.sub.e]/r), for simulation data with different trial standard errors and empirical data, averaged over 252 random data sets. The Shrunken CCC, Shrunken EVP, BLUP, and Best truncated AMMI columns report RMSPD.

Type of     Standard  Shrunken  Shrunken            Best truncated  No. of PC in  r[[sigma].sup.2.sub.GE]/
data        error     CCC       EVP       BLUP      AMMI            best AMMI     [[sigma].sup.2.sub.e]

Case 1. Trial 1. Three replicates for the model, one replicate for validation
Simulation  240       266.0     265.4     267.0     267.9           9             23.80
Simulation  400       438.2     437.3     444.8     447.3           9             8.50
Simulation  600       642.3     641.4     653.0     662.0           2             3.78
Simulation  750       793.9     793.2     808.2     811.4           2             2.42
Simulation  1000      1051.7    1051.2    1069.1    1072.5          2             1.36
Simulation  1500      1559.1    1557.2    1575.7    1575.7          1             0.60
Simulation  2500      2555.2    2549.0    2553.8    2569.4          0             0.22
Empirical   411.3     462.4     461.4     469.3     472.1           2             8.03

Case 2. Trial 2. Three replicates for the model, one replicate for validation
Simulation  240       257.6     257.4     258.0     259.3           7             12.74
Simulation  400       424.7     424.0     424.8     431.8           7             5.59
Simulation  600       629.5     627.5     629.0     648.6           6             2.04
Simulation  750       780.3     777.5     778.8     810.1           2             1.30
Simulation  1000      1025.7    1021.9    1023.5    1065.3          1             0.73
Simulation  1500      1513.0    1506.9    1509.0    1532.3          1             0.33
Simulation  2500      2471.0    2460.5    2461.6    2468.8          0             0.12
Empirical   650.3     706.8     706.2     712.8     721.2           2             1.73

Case 3. Trial 1. Two replicates for the model, two replicates for validation
Simulation  240       227.6     225.5     230.2     232.8           9             15.73
Simulation  400       366.7     364.0     377.0     387.9           9             5.66
Simulation  600       527.0     526.3     549.0     548.1           2             2.52
Simulation  750       645.8     646.1     669.6     671.3           2             1.61
Simulation  1000      845.1     845.2     867.2     862.9           1             0.91
Simulation  1500      1239.9    1235.7    1253.2    1269.8          1             0.40
Simulation  2500      1974.7    1948.9    1945.9    1956.4          0             0.14
Empirical   411.3     382.5     380.7     397.7     391.4           2             5.36

Case 4. Trial 2. Two replicates for the model, two replicates for validation
Simulation  240       221.2     220.9     221.6     225.8           7             8.49
Simulation  400       358.8     357.0     358.6     376.1           7             3.06
Simulation  600       520.3     516.0     518.5     563.3           7             1.36
Simulation  750       635.0     629.4     631.7     690.4           2             0.87
Simulation  1000      815.5     808.4     809.4     863.3           0             0.49
Simulation  1500      1170.8    1160.0    1159.3    1180.1          0             0.22
Simulation  2500      1898.0    1878.7    1875.3    1879.2          0             0.08
Empirical   650.3     560.4     561.2     567.5     594.3           2             1.16

Case 5. Trial 3. Two replicates for the model, one replicate for validation
Simulation  240       273.2     271.6     2745.5    279.6           9             5.08
Simulation  400       445.2     443.6     449.4     463.7           1             1.83
Simulation  600       652.3     651.3     657.9     664.6           1             0.81
Simulation  750       802.5     801.1     806.9     820.1           1             0.52
Simulation  1000      1060.8    1058.2    1061.0    1068.9          0             0.29
Simulation  1500      1566.8    1556.5    1554.3    1558.1          0             0.13
Simulation  2500      2567.0    2548.9    2541.9    2543.6          0             0.05
Empirical   352.9     402.9     402.4     415.0     401.6           1             1.68

Table 3. Standard error of predicted means (SEPM) of the shrunken eigenvalue partition (EVP) model, best linear unbiased predictor (BLUP), and best truncated additive main effects and multiplicative interaction (AMMI) model for empirical data of different trials, averaged over 250 random data sets.

                        No. of        Estimated genotype  Experimental        SEPM,     SEPM,             SEPM,     No. of PC
Type of empirical data  repetitions   x environment       standard error      shrunken  BLUP ([dagger])   best      in best
                                      variance            of cell means       EVP                         AMMI      AMMI

Trial 1. Triticale      4             410824              205.6               172.0     197.3             191.2     5
Trial 2. Maize          4             138921              325.2               237.4     258.7             279.0     3
Trial 3. Faba beans     3             104923              203.7               160.4     178.6             186.4     1

([dagger]) BLUPs were estimated with main effects fixed to provide fair comparison with the other methods.
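As a quick arithmetic check on Table 1, the short Python sketch below recomputes the CCC shrunken singular values for Trial 1. It assumes that the shrunken value is simply the shrinkage factor times the singular value, as the column label ([S.sub.k][[lambda].sub.k]) suggests; the function and variable names are illustrative and are not taken from the paper.

```python
# Trial 1, CCC columns of Table 1, copied as printed:
# (singular value, shrinkage factor, reported shrunken singular value)
TRIAL1_CCC = [
    (6610.9, 0.978, 6463.8),
    (3167.2, 0.911, 2886.9),
    (1469.2, 0.628, 922.2),
    (1433.1, 0.650, 931.5),
    (1207.1, 0.565, 681.7),
    (919.0, 0.349, 320.9),
    (650.4, 0.0, 0.0),
    (555.5, 0.0, 0.0),
    (434.1, 0.0, 0.0),
]


def shrunken(rows):
    """Multiply each singular value by its shrinkage factor (assumed relation)."""
    return [factor * value for value, factor, _ in rows]


def max_relative_gap(rows):
    """Largest relative gap between recomputed and reported nonzero entries."""
    gaps = [
        abs(calc - reported) / reported
        for calc, (_, _, reported) in zip(shrunken(rows), rows)
        if reported > 0
    ]
    return max(gaps)


if __name__ == "__main__":
    # Print PC axis, recomputed value, and the value reported in Table 1.
    for pc, (calc, row) in enumerate(zip(shrunken(TRIAL1_CCC), TRIAL1_CCC), start=1):
        print(pc, round(calc, 1), row[2])
    print("max relative gap:", max_relative_gap(TRIAL1_CCC))
```

Under that assumption, the recomputed values match the printed column to within about 0.1%, a gap that is consistent with the shrinkage factors having been rounded to three decimals before printing.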
Research was partially funded by INIA/Spain grants SC97074 and RTA01-140.
Cornelius, P.L. 1993. Statistical tests and retention of terms in the additive main effects and multiplicative interaction model for cultivar trials. Crop Sci. 33:1186-1193.
Cornelius, P.L., J. Crossa, and M.S. Seyedsadr. 1993. Tests and estimators of multiplicative models for variety trials. Proceedings of 5th Annual Kansas State Univ. Conference on Applied Statistics in Agriculture. Manhattan, KS.
Cornelius, P.L., and J. Crossa. 1995. Shrinkage estimators of multiplicative models for crop cultivar trials. Tech. Rep. 352, Univ. of Kentucky, Dep. of Statistics, Lexington, KY.
Cornelius, P.L., J. Crossa, and M.S. Seyedsadr. 1996. Statistical tests and estimators of multiplicative models for genotype-by-environment interaction. p. 199-234. In M.S. Kang and H.G. Gauch (ed.) Genotype-by-environment interaction. CRC Press, Boca Raton, FL.
Cornelius, P.L., and J. Crossa. 1999. Prediction assessment of shrinkage estimators of multiplicative models for multi-environment trials. Crop Sci. 39:998-1009.
Crossa, J., R.W. Zobel, and H.G. Gauch. 1990. Additive and multiplicative interaction analysis of two international maize cultivar trials. Crop Sci. 30:493-500.
Crossa, J., and P.L. Cornelius. 1994. Recent developments in multiplicative models for cultivar trials. In D.R. Buxton et al. (ed.) International crop science I. CSSA, Madison, WI.
Cubero, J.I., and F. Flores. 1995. Metodos estadisticos, ed. Junta de Andalucia, Consejeria de Agricultura y Pesca, Sevilla, Spain.
Gauch, H.G. 1988. Model selection for yield trials with interaction. Biometrics 44:705-715.
Gauch, H.G., and R.W. Zobel. 1988. Predictive and postdictive success of statistical analysis of yield trials. Theor. Appl. Genet. 76:1-10.
Gauch, H.G., and R.W. Zobel. 1989. Accuracy and selection success in yield trials. Theor. Appl. Genet. 77:473-481.
Gollob, H.F. 1968. A statistical model which combines features of factor analytic and analysis of variance techniques. Psychometrika 33:73-115.
Goodman, L.A., and S.J. Haberman. 1990. The analysis of nonadditivity in two way analysis of variance. J. Am. Statist. Assoc. 85:139-145.
Moreno-Gonzalez, J., and J. Crossa. 1998. Combining genotype, environment and attribute variables in regression models for predicting cell means of multi-environment cultivar trials. Theor. Appl. Genet. 96:803-811.
Moreno-Gonzalez, J., J. Crossa, and P.L. Cornelius. 2003. Additive main effects and multiplicative interaction model: I. Theory on variance components for predicting cell means. Crop Sci. 43:1967-1975 (this issue).
Royo, C., A. Rodriguez, and I. Romagosa. 1993. Differential adaptation of complete and substituted triticale. Plant Breed. 111:113-119.
SAS Institute Inc. 1999. SAS/STAT/IML user's guide, Version 8, Fourth Edition, Cary, NC.
J. Moreno-Gonzalez, J. Crossa,* and P. L. Cornelius
J. Moreno-Gonzalez, Centro de Investigaciones Agrarias de Mabegondo, Apartado 10, A Coruna, Spain; J. Crossa, Biometrics and Statistics Unit, International Maize and Wheat Improvement Center (CIMMYT), Apdo. Postal 6-641, 06600 Mexico DF, Mexico; P.L. Cornelius, Dep. of Agronomy and Dep. of Statistics, Univ. of Kentucky, Lexington, KY 40546-0091. Received 18 Dec. 2002. * Corresponding author (email@example.com).
Title Annotation: Crop Breeding, Genetics & Cytology
Author: Moreno-Gonzalez, J.; Crossa, J.; Cornelius, P.L.
Date: Nov 1, 2003