Evaluation of discriminant analysis in identification of low- and high-water use Kentucky bluegrass cultivars.
These studies demonstrate that estimating comparative water use in turfgrass under non-limiting soil moisture conditions is a multivariate problem; comparative water use is affected by several morphological and growth characteristics associated with components of canopy resistance to ET and leaf area. These characteristics operate simultaneously, but their effects on KBG ET are not independent (Ebdon and Petrovic, 1998a). As a result, efficiency in water use prediction can be improved by considering simultaneously several plant attributes that are correlated with water use. The multivariate technique of discriminant analysis developed by Fisher (1936) provides an effective method for this purpose when groups are of interest, e.g., comparative water use groups. Discriminant analysis has been used successfully in horticulture in classifying plant material. For example, Lapins and Nash (1957) used discriminant analysis to identify peach [Prunus persica (L.) Batsch] cultivars, and Eaton and Lapins (1970) used it to distinguish between standard and compact types of apple trees (Malus domestica Borkh.). Bruneau et al. (1987) used discriminant analysis to classify KBG cultivars into four billbug resistance classes. The method has seen little application because computations are complex and time consuming. However, the use of discriminant analysis in classifying plant material has become more convenient because of the increasing speed of personal computers and the availability of statistical software that performs discriminant analysis.
This study was conducted to determine the effectiveness of discriminant analysis in recognizing water conserving types of KBG on the basis of several plant measurements, including components of canopy resistance to ET and leaf area. Plant measurements were obtained from unmowed, spaced plants and mowed turfgrass because both are relevant in evaluating turfgrass (Bourgoin and Mansat, 1977; van Wijk, 1989).
MATERIALS AND METHODS
Comparative Water Use Groups
Sixty-one KBG cultivars were evaluated for ET rate under controlled conditions in three vapor pressure deficit (VPD) environments (1.263, 1.664, and 2.261 kPa) by the water balance method. The results and methodology have been reported elsewhere (Ebdon et al., 1998b). The 61 cultivars were categorized into comparative water use groups by hierarchical agglomerative cluster analysis on the basis of their individual ET rates measured at VPD of 1.263, 1.664, and 2.261 kPa (see Ebdon and Petrovic, 1998a). Two distinct clusters (groups) were revealed in the analysis. The smaller of the two groups contained 28 members, which clustered below the grand mean ET of 5.91 mm d$^{-1}$, and hence was labeled as low ET cases. The larger of the two groups contained 33 members, which clustered above the grand mean and was labeled as high ET cases. The groups indicate similarities between cultivars in their water use properties based on actual ET data.
The use of discriminant analysis for identification of KBG cultivars having low water use patterns based on the cultivar's morphological properties is a reasonable procedure because of the relationship that exists between turfgrass morphology and comparative water use (Shearman, 1986; Ebdon and Petrovic, 1998a). The application of discriminant analysis in the present study is based on the premise that cultivar groups that are dissimilar in their water use properties are likely to differ in their morphological properties, and therefore classification functions can be developed that will recognize the differences in pattern between water use groups. A categorical classification variable was defined having the value '2' for high-water use cultivars and '1' for low-water use cultivars. For specific case cluster membership and cultivars used in the study, see Ebdon and Petrovic (1998a).
A greenhouse study to evaluate the morphological characteristics of the 61 KBG cultivars grown as unmowed, spaced plants was initiated in the early spring of 1993. Plant measurements were also obtained beginning early December 1994 and ending late January 1995 from mowed 20-cm-diam. lysimeters used for evaluation of cultivar ET. Each cultivar was replicated six times in the unmowed, spaced plant study and four times in the water use study. For specific methodology and description of the 14 characteristics that were measured or calculated on each of the 61 KBG cultivars from unmowed, spaced plants and mowed lysimeters, see Ebdon and Petrovic (1998a).
Data had previously been analyzed by analysis of variance (ANOVA) to investigate the effect of cultivar (see Ebdon and Petrovic, 1998a). Analysis of variance did not detect significant differences (P ≥ 0.05) between low- and high-water use groups for only two variables from unmowed, spaced plants (shoot fresh weight and shoot dry weight) and three variables from mowed lysimeters (number of leaves per shoot, shoot moisture content, and root density).
A brief description of discriminant analysis is given below; for a more thorough treatment of discriminant analysis and discriminant functions, see Johnson and Wichern (1992). Discriminant analysis is concerned with the efficient separation, or discrimination, of two or more groups of cases (here, the low- and high-water use groups of cultivars) based on the observed attributes of the cases (here, the measured attributes of the 61 KBG cultivars). In the two-group case, the simplest situation, the usual approach is to find the linear combination of the measured attributes, often referred to as Fisher's linear discriminant function, that maximizes the ratio of the between-groups to within-groups variation. The value of this function is then calculated and used to classify cases as belonging to one of the two groups.
For two or more groups, a more general approach is to calculate, for each case x, the linear discriminant scores for all groups i = 1, ..., g,
$$d_i(x) = m_i' S^{-1} x - 0.5\, m_i' S^{-1} m_i + \ln(p_i), \qquad [1]$$
where x and $m_i$ denote the column vectors containing the attribute values for the case and the mean attribute values for all cases in group i, respectively, $m_i'$ denotes the transpose of the column vector $m_i$ (a row vector), S denotes the pooled estimated within-group covariance matrix, and $\ln(p_i)$ is the natural logarithm of the prior probability that a case is from group i. Each case x is then classified as belonging to the group i that produces the largest value of $d_i(x)$.
When the groups have multivariate normal populations with a common covariance matrix, assigning a case x to the group i that has the largest linear discriminant score $d_i(x)$ minimizes the total probability of misclassifying a case. When the groups do not have a common covariance matrix, the pooled estimated covariance matrix S in Eq. [1] is replaced by $S_i$, the estimated covariance matrix for group i. The resulting, more complex quadratic discriminant scores again minimize the total probability of misclassifying a case. If the prior probabilities $p_i$ are assumed to be equal, the terms $\ln(p_i)$ are all equal, so they can be omitted from the linear or quadratic discriminant scores.
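The linear discriminant score described above can be sketched in a few lines of standard-library Python. This is only an illustration under stated assumptions: the two small groups of two-attribute cases are made-up numbers (not data from this study), the function names are hypothetical, and the pooled covariance matrix is estimated with n - g degrees of freedom.

```python
import math

def mean_vector(rows):
    """Mean of each attribute over the cases (rows) of one group."""
    n = len(rows)
    return [sum(r[j] for r in rows) / n for j in range(len(rows[0]))]

def pooled_covariance(groups):
    """Pooled within-group covariance matrix S (divisor n - g)."""
    p = len(groups[0][0])
    s = [[0.0] * p for _ in range(p)]
    for g in groups:
        m = mean_vector(g)
        for r in g:
            d = [r[j] - m[j] for j in range(p)]
            for a in range(p):
                for b in range(p):
                    s[a][b] += d[a] * d[b]
    dof = sum(len(g) for g in groups) - len(groups)
    return [[v / dof for v in row] for row in s]

def solve(a, b):
    """Solve a z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [list(a[i]) + [b[i]] for i in range(n)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= f * aug[col][c]
    z = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(aug[i][j] * z[j] for j in range(i + 1, n))
        z[i] = (aug[i][n] - tail) / aug[i][i]
    return z

def linear_scores(x, groups, priors):
    """d_i(x) = m_i' S^-1 x - 0.5 m_i' S^-1 m_i + ln(p_i) for every group i."""
    s = pooled_covariance(groups)
    scores = []
    for g, prior in zip(groups, priors):
        m = mean_vector(g)
        w = solve(s, m)  # w = S^-1 m_i; S is symmetric, so m_i' S^-1 x = w . x
        scores.append(sum(wj * xj for wj, xj in zip(w, x))
                      - 0.5 * sum(wj * mj for wj, mj in zip(w, m))
                      + math.log(prior))
    return scores

def classify(x, groups, priors):
    """Index of the group with the largest linear discriminant score."""
    scores = linear_scores(x, groups, priors)
    return max(range(len(scores)), key=scores.__getitem__)

# Made-up illustration data: two attributes per case, two groups.
toy_low = [[1.0, 2.0], [2.0, 1.0], [1.5, 1.5], [2.0, 2.5]]
toy_high = [[4.0, 5.0], [5.0, 4.0], [4.5, 5.5], [5.5, 5.0]]
toy_groups = [toy_low, toy_high]
equal_priors = [0.5, 0.5]

print(classify([1.2, 1.8], toy_groups, equal_priors))  # index 0 = low group
print(classify([5.0, 5.0], toy_groups, equal_priors))  # index 1 = high group
```

With equal priors the $\ln(p_i)$ term is the same for every group, so dropping it would leave the classification unchanged; it is kept here only so that the code mirrors the score term by term.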
Fourteen plant measurements from unmowed, spaced plants and an equal number from mowed lysimeters were analyzed to identify discriminators of low- and high-water use groups. We were interested in the effectiveness of physical measurements based on unmowed, single plant morphology and in how they compared with turfgrass morphology from mowed lysimeters in discriminating between the two water use groups. The analysis included 61 cultivars, 28 from the low-water use group and 33 from the high-water use group. In all, 854 observations (61 cultivars × 14 plant variables) from each study were included in the analysis.
Discriminant analysis was performed with stepwise variable selection with SPSS (SPSS Inc., 1990) to find a subset of the 14 predictor variables making a significant (P ≤ 0.05) contribution to the variability of the categorical dependent variable. Variable selection procedures have the same limitations in discriminant analysis as in regression analysis (Murray, 1977). Therefore, models of various sizes are reported here rather than a single model. MINITAB (MINITAB Inc., 1989) was used to develop linear discriminant functions and to perform cross-validation.
In assessing the effectiveness of discriminant functions in predicting group membership, an apparent error rate (APER) results from the use of the same data set to derive the classification function and then to validate the function. The holdout procedure described by Lachenbruch and Mickey (1968) can be used to compensate for the optimistic bias of the APER in estimating the actual error rate. The holdout procedure or leave-one-out method (LOER) is one of several methods of cross-validation. This procedure omits the first case (cultivar) from the analysis, develops a classification function using the 60 (n - 1) remaining cases, then classifies the omitted observation. The omitted case is then returned to the data set and the holdout procedure is repeated with every case in turn omitted and then classified. The holdout procedure is an alternative to splitting a data set into training samples and validation samples in estimating actual error rate (Johnson and Wichern, 1992). We report classification results as a percentage of total correct classification based on LOER, and for completeness, APER is also reported.
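The difference between the apparent rate and the leave-one-out rate can be illustrated with a minimal Python sketch. It assumes a single predictor and equal priors, in which case the linear discriminant rule reduces to assigning a case to the group with the nearer mean; the six values and the function names are made up for illustration and are not the study's data or software.

```python
def fit_means(values, labels):
    """Group means of a single predictor: {label: mean}."""
    sums, counts = {}, {}
    for v, g in zip(values, labels):
        sums[g] = sums.get(g, 0.0) + v
        counts[g] = counts.get(g, 0) + 1
    return {g: sums[g] / counts[g] for g in sums}

def predict(means, v):
    """One-predictor linear rule with equal priors: nearest group mean."""
    return min(means, key=lambda g: abs(v - means[g]))

def apparent_correct_rate(values, labels):
    """Fit and evaluate on the same cases (optimistically biased)."""
    means = fit_means(values, labels)
    hits = sum(predict(means, v) == g for v, g in zip(values, labels))
    return hits / len(values)

def leave_one_out_correct_rate(values, labels):
    """Omit each case in turn, refit on the n - 1 others, classify the omitted case."""
    hits = 0
    for i in range(len(values)):
        train_v = values[:i] + values[i + 1:]
        train_g = labels[:i] + labels[i + 1:]
        means = fit_means(train_v, train_g)
        hits += predict(means, values[i]) == labels[i]
    return hits / len(values)

# Made-up illustration data: one borderline "high" case (4.4).
toy_values = [1.0, 2.0, 3.0, 4.4, 7.0, 8.0]
toy_labels = ["low", "low", "low", "high", "high", "high"]

print(apparent_correct_rate(toy_values, toy_labels))
print(leave_one_out_correct_rate(toy_values, toy_labels))
```

In this toy set the borderline case is classified correctly when it helps to define its own group mean, but not when it is held out, so the leave-one-out rate falls below the apparent rate. That is the optimistic bias the holdout procedure is meant to remove.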
Departures from the assumption of equal group covariances were detected for some models. Therefore, quadratic discriminant functions may be more appropriate than linear functions. We reported the percentage of total correct classification for both linear and quadratic models because good classification may be achieved even if some of the assumptions for the analysis have not been met in a specific situation. (When the assumptions are satisfied, the classification rules are optimal and the error rates minimal.)
We had no reason to believe that the prior probabilities for the low- and high-water use groups are different. This initial assumption is supported by the sample proportions of 28/61 and 33/61 for the low- and high-water use groups, respectively, which are approximately 0.50. Therefore, an equal prior probability assumption was used in the analysis.
RESULTS AND DISCUSSION
The goal of discriminant analysis in this study was to predict group membership for the purpose of screening for water conserving types. The low- and high-water use group data in this study are from a random sample of 61 KBG cultivars representing a continuum of ET, from a low of 4.42 to a high of 8.54 mm d$^{-1}$, measured across a broad range of VPD. When evaluating error rates, it is important to compare the observed misclassification rate to that expected by chance alone. For example, if there are two groups with equal prior probability, the expected misclassification rate is 50%; hence we seek a classification function that achieves significantly better than 50% correct classification. These concepts should be kept in mind when interpreting the classification results reported here.
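One simple way to put a number on "better than chance" is a one-sided binomial tail probability. This check is not part of the analysis reported in this study; it is an illustrative sketch that treats the 61 cross-validated classifications as independent trials with a 50% chance of success, which is only approximately true for leave-one-out results. The count of 46 correct cases is the sum of the correctly classified low (19) and high (27) cases in Table 6.

```python
from math import comb

def binomial_upper_tail(n, k, p=0.5):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, j) * p**j * (1 - p)**(n - j) for j in range(k, n + 1))

n_cases = 61
n_correct = 19 + 27  # correctly classified low + high cases (75.4% of 61)
p_value = binomial_upper_tail(n_cases, n_correct, 0.5)

print(f"P(at least {n_correct} of {n_cases} correct by chance) = {p_value:.2e}")
```

A small tail probability indicates that a 75.4% correct classification rate would be very unlikely under random assignment with equal priors.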
Classification results based on unmowed, spaced plant morphology in discriminating low- and high-water use groups are shown in Table 1. Based on a less biased estimate of actual error rate from cross-validation using the leave-one-out method (LOER), the best correct classification observed was 70.5% for both linear and quadratic functions using seven and five predictors, respectively. Therefore, we would expect to classify seven out of ten cases into their true groups. Compared with LOER, correct classification rates based on APER are inflated because of the optimistic bias associated with this estimate of error rate.
Table 1. Discriminant analysis correct classification rates based on variables from unmowed, spaced plants as predictors of low- and high-water use groups.
Number of        Linear functions       Quadratic functions
predictors       LOER†      APER        LOER†      APER
 7               70.5       77.1        68.9       85.2
 6               65.6       78.8        67.2       83.6
 5               67.2       77.1        70.5       80.3
 4               67.2       70.5        63.9       70.5
14               63.9       77.1        60.7       95.1
† Estimate of actual error rate from cross-validation using the leave-one-out method.
The 14 original plant variables from unmowed, spaced plants are shown in Table 2, ordered by their F-statistics from a one-way ANOVA with comparative water use group as the classification factor. Some of these predictor variables were identified by variable selection discriminant analysis as important discriminators between the low- and high-water use groups. Many of the important discriminators, such as shoot-to-root ratio, leaf angle, leaf extension rate, and tiller number, were entered early in the variable selection procedure because their mean values change considerably between the two groups, as indicated by their large F-ratios. Shoot-to-root ratio had the largest F to enter and therefore was entered first. Root density had a significant F but was highly correlated with shoot-to-root ratio (r = -0.45, P ≤ 0.001) and therefore was never entered. Other variables, such as leaf length and sheath length, were highly correlated with vertical leaf extension rate, r = 0.78 (P ≤ 0.001) and 0.67 (P ≤ 0.001), respectively, and became important discriminators (or substitutes for leaf extension) when leaf extension rate was omitted from the analysis. Some variables, such as shoot dry weight, crown type, and shoot fresh weight, appear to be unimportant when considered individually, but in combination with other discriminators contributed significantly in discriminating between groups.
Table 2. Predictor variables from unmowed, spaced plants ordered by their F-ratio from one-way ANOVA with water use group as the classification factor.
                        Important
Variable                discriminator    $F_{1,59}$    P value
Shoot-to-root ratio          x              7.55        0.008
Leaf angle                   x              6.33        0.014
Root density                                5.97        0.018
Leaf extension rate          x              3.46        0.068
Tiller number                x              3.30        0.074
Leaf length                  x              2.04        0.159
Rhizome number                              1.83        0.182
Sheath length                x              1.68        0.200
Leaf width                                  1.61        0.210
Shoot moisture                              1.24        0.269
Leaves per shoot                            0.50        0.483
Shoot dry weight             x              0.06        0.807
Crown type                   x              0.01        0.921
Shoot fresh weight           x              0.00        0.976
Discriminant functions based on one to three variables from mowed turfgrass were identified by variable selection to be as effective as using all 14 original variables in discriminating between low- and high-water use groups (Table 3). Furthermore, for these discriminant functions a common pooled covariance matrix (S) was an adequate summary of the within-group covariance matrices ($S_i$), which indicated that linear discriminant analysis was appropriate. Compared with unmowed, single plant morphology, turfgrass morphology was more efficient in requiring fewer predictors to make a classification without any loss of discriminatory power. For example, a linear function using leaf angle alone as the predictor afforded 72.1% correct classification based on cross-validation. This is as good as or better than discriminant functions using five to seven variables based on unmowed, single plant morphology. A correct classification rate of 75.4% from cross-validation using linear discriminant functions based on two and three variables was the highest rate achieved. Classification results were similar for the functions based on one and three variables identified by variable selection.
Table 3. Discriminant analysis correct classification rates based on variables from mowed lysimeters as predictors of low- and high- water use groups.
Number of        Linear functions       Quadratic functions
predictors       LOER†      APER        LOER†      APER
 3               75.4       75.4        72.1       78.7
 2               75.4       75.4        67.2       68.9
 1               72.1       72.1        72.1       72.1
14               75.4       82.0        54.1       95.1
† Estimate of actual error rate from cross-validation using the leave-one-out method.
Leaf angle was the most important discriminator (based on its F-statistic) of water use groups among the 14 variables evaluated from mowed turfgrass. Leaf angle is a component of canopy resistance to ET and had the largest F-ratio (19.01) for groups, which indicated the separation between groups was largest for this variable (Table 4). Other important discriminators identified by variable selection discriminant analysis are related to leaf area and included variables such as leaf width and sheath length. These variables also had large F-statistics and therefore their means differed significantly between groups.
Table 4. Predictor variables from mowed lysimeters ordered by their F-ratio from one-way ANOVA with water use group as the classification factor.
                              Important
Variable                      discriminator    $F_{1,59}$    P value
Leaf angle                         x             19.01       <0.001
Leaf width                         x              7.38        0.009
Shoot fresh weight                                6.36        0.015
Tiller number                      x              6.33        0.015
Sheath length                      x              5.82        0.019
Shoot dry weight                                  3.95        0.051
Leaf ext. rate, 2.261 kPa          x              1.36        0.247
Shoot moisture                                    0.75        0.391
Shoot-to-root ratio                               0.48        0.490
Root density                                      0.30        0.585
Leaf ext. rate, 1.664 kPa          x              0.21        0.648
Verdure                            x              0.12        0.733
Leaf ext. rate, 1.263 kPa          x              0.09        0.767
Leaves per shoot                                  0.00        0.978
Comparative water use in turfgrass is affected by several morphological and growth characteristics operating in combination. In KBG, however, these characteristics associated with canopy resistance to ET and leaf area are likely to be interdependent (Ebdon and Petrovic, 1998a), with important implications for discriminant analysis. For example, tiller number (a component of canopy resistance) had a significant F but was highly correlated with leaf angle (r = -0.47, P ≤ 0.001). Because tiller number and leaf angle are interdependent, there is a potential for the duplication of information and effort when both leaf angle and tiller number are considered simultaneously as predictors of water use groups. We found that tiller number (or shoot density) was not important in predicting water use groups when considered in combination with leaf angle. Tiller number was only entered as a discriminator of water use groups with leaf angle omitted from the analysis. Similarly, when considered individually, shoot fresh weight and shoot dry weight appear to be important based on their F-ratios; however, these variables are positively correlated with leaf width, r = 0.57 (P ≤ 0.001) and r = 0.51 (P ≤ 0.001), and therefore were not entered. Conversely, variables such as leaf extension rate and verdure appear to be unimportant when considered individually, but in combination with other predictors they can be important discriminators of water use groups.
A set of linear discriminant functions was obtained from standardized variables (predictor variables were standardized to a mean of 0 and a standard deviation of 1) using leaf angle and leaf width as predictors (Table 5), and a corresponding classification table for this set was developed (Table 6). This set of discriminant functions was developed based on established cultivars, and its suitability for predicting the water use group of unknown cultivars will depend on the range of data on which the discriminant functions were based. This specific standardization set is appropriate for the conditions of this experiment; it has not been evaluated under other conditions, but serves as an example. This set of linear discriminant functions is in the form given by Eq. [1]. Variables were standardized so that the relative contribution of each component variable to the total compound discriminant score is indicated by the absolute magnitude of its corresponding discriminant coefficient. Recall that an observation is classified into the group (low- or high-water use) generating the larger discriminant score. Because the discriminant coefficients for leaf angle are approximately twice the magnitude of those for leaf width (Table 5), the relative contribution of leaf angle to the total compound score is approximately twice that of leaf width. Thus in the classification of an observation, leaf angle is given twice as much weight as leaf width (and hence can be viewed as twice as important).
Table 5. Coefficients for standardized linear discriminant functions using leaf angle and leaf width from mowed lysimeters as predictors of water use group.
               Discriminant coefficients for group
Variable            Low        High
Constant           -0.23      -0.16
Leaf angle         -0.64       0.54
Leaf width         -0.31       0.26
Table 6. Summary of classification with cross-validation using leaf angle and leaf width from mowed lysimeters as predictors of water use group.
                           True group membership
Predicted group            Low       High
Low                         19          6
High                         9         27
Number of cases             28         33
Proportion correct (%)    67.9       81.8
Total correct (%): 75.4
The signs of the discriminant coefficients (Table 5) have important biological interpretations in the classification of an observation based on discriminant scores. For example, large observed values for leaf angle (e.g., a substantial vertical leaf orientation) and large observed values for leaf width (e.g., a wide leaf) are morphological characteristics associated with high water use rates and have corresponding large positive standardized values. These values contribute to small scores (negative terms) for the low-water use group and to large scores (positive terms) for the high-water use group. Conversely, small observed values for leaf angle (e.g., a substantial horizontal leaf orientation) and small observed values for leaf width (e.g., a narrow leaf) are morphological characteristics associated with low water use rates and have corresponding large negative standardized values. These values contribute to large scores (positive terms) for the low-water use group and small scores (negative terms) for the high-water use group. Thus, the classification of an observation as a low water user (based on discriminant scores) is consistent with the high canopy resistance to ET/minimal leaf area hypothesis that has been proposed in warm-season turfgrass (Kim and Beard, 1988a).
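The sign interpretation above can be checked by plugging numbers into the Table 5 coefficients. The sketch below is illustrative only: the coefficients are those reported in Table 5, but the two standardized observations are hypothetical, and the function names are not from the original analysis. Because the means and standard deviations used for standardization are not given here, the inputs are assumed to be already standardized.

```python
# Coefficients from Table 5: (constant, leaf angle, leaf width) for each group.
TABLE5_COEFFICIENTS = {
    "low": (-0.23, -0.64, -0.31),
    "high": (-0.16, 0.54, 0.26),
}

def discriminant_scores(z_leaf_angle, z_leaf_width):
    """Linear discriminant score of each group for standardized predictors."""
    return {
        group: const + b_angle * z_leaf_angle + b_width * z_leaf_width
        for group, (const, b_angle, b_width) in TABLE5_COEFFICIENTS.items()
    }

def classify(z_leaf_angle, z_leaf_width):
    """Assign the observation to the group with the larger score."""
    scores = discriminant_scores(z_leaf_angle, z_leaf_width)
    return max(scores, key=scores.get)

# Hypothetical cultivar A: leaf angle 1 SD above the mean, leaf width 0.5 SD above.
scores_a = discriminant_scores(1.0, 0.5)
print(scores_a, classify(1.0, 0.5))

# Hypothetical cultivar B: leaf angle 1 SD below the mean, leaf width 0.5 SD below.
scores_b = discriminant_scores(-1.0, -0.5)
print(scores_b, classify(-1.0, -0.5))
```

For cultivar A the scores are -0.23 - 0.64(1.0) - 0.31(0.5) = -1.025 for the low group and -0.16 + 0.54(1.0) + 0.26(0.5) = 0.51 for the high group, so it is classified as a high water user. Reversing the signs of the standardized values (cultivar B) gives 0.565 and -0.83, so it is classified as a low water user.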
By means of leaf angle and leaf width as predictors, 75.4% of the observations were correctly classified into their true groups by LOER (Table 6). However, a higher error rate was observed in the classification of low-water use cases (67.9% correct classification) compared with high-water use cases (81.8% correct classification). Our objectives are to screen for water conserving patterns, so the identification of low-water use types is most important, and therefore misclassifications of these are more costly. The identification of water conserving types was improved by replacing leaf width with a different component of leaf area, leaf extension rate (Table 7). Overall correct classification by leaf angle and leaf extension rate as predictors remained unchanged (75.4%) by LOER compared with leaf angle and leaf width; however, correct identification of low-water use cases increased to 75.0%, so fewer water conserving types were misclassified.
Table 7. Summary of classification with cross-validation using leaf angle and leaf extension rate from mowed lysimeters as predictors of water use group.
                           True group membership
Predicted group            Low       High
Low                         21          8
High                         7         25
Number of cases             28         33
Proportion correct (%)    75.0       75.8
Total correct (%): 75.4
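The percentages in Tables 6 and 7 follow directly from the classification counts, as the short Python sketch below shows. The counts are those reported in the two tables, read with the true group as the column (the reading under which the counts sum to 28 low and 33 high cases); the dictionary layout and function name are illustrative choices.

```python
def summarize(counts):
    """Percent correct per true group and overall.

    counts[true_group][predicted_group] = number of cultivars.
    """
    per_group = {
        true: 100.0 * row[true] / sum(row.values())
        for true, row in counts.items()
    }
    total_cases = sum(sum(row.values()) for row in counts.values())
    total_correct = sum(row[true] for true, row in counts.items())
    return per_group, 100.0 * total_correct / total_cases

# Table 6: leaf angle and leaf width as predictors.
table6 = {"low": {"low": 19, "high": 9}, "high": {"low": 6, "high": 27}}
# Table 7: leaf angle and leaf extension rate as predictors.
table7 = {"low": {"low": 21, "high": 7}, "high": {"low": 8, "high": 25}}

for name, table in (("Table 6", table6), ("Table 7", table7)):
    per_group, total = summarize(table)
    rounded = {group: round(pct, 1) for group, pct in per_group.items()}
    print(name, rounded, round(total, 1))
```

Both predictor sets classify 46 of 61 cultivars correctly (75.4%), but replacing leaf width with leaf extension rate moves two low-water use cultivars into the correct group (19 to 21 of 28) at the cost of two high-water use cultivars (27 to 25 of 33).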
A set of discriminant functions that has been thoroughly tested could be used to predict the water use patterns of new cultivars on the basis of a few simple plant measurements that are routinely assessed by turfgrass breeders. The results here, based on a random sample of 61 KBG cultivars, demonstrate that discriminant analysis may be an efficient and useful tool for this purpose. Further work will be needed, however, before the technique can be considered practical. First, the method should be reevaluated under field conditions. Second, visual ratings (qualitative variables) will need to be evaluated by a visual rating system similar to that utilized in assessing large collections in the field (Horst et al., 1984). This type of analysis can also be applied to other turfgrass species that share a similar relationship between morphological properties and comparative water use.
The authors would like to thank the Lofts Seed Co. for partially funding this research.
Abbreviations: ANOVA, analysis of variance; APER, apparent error rate; ET, evapotranspiration; KBG, Kentucky bluegrass; LOER, leave-one-out error rate.
Bourgoin, B., and P. Mansat. 1977. Comparisons of micro-trials and space planted nurseries with dense swards as means for evaluating turfgrass genotypes. p. 3-9. In J.B. Beard (ed.) Proc. 3rd Int. Turfgrass Res. Conf., Munich, Germany. 11-13 July. Int. Turfgrass Soc., ASA, CSSA, SSSA, Madison, WI.
Bruneau, A.H., A.M. Parkhurst, and R.C. Shearman. 1987. Discriminant analysis for billbug resistance ratings. J. Am. Soc. Hort. Sci. 112(6):978-980.
Eaton, G.W., and K.O. Lapins. 1970. Identification of standard and compact apple trees by discriminant function analysis. J. Appl. Ecol. 7(2):267-272.
Ebdon, J.S., and A.M. Petrovic. 1998a. Morphological and growth characteristics of low- and high-water use Kentucky bluegrass cultivars. Crop Sci. 38:143-152 (this issue).
Ebdon, J.S., A.M. Petrovic, and R.W. Zobel. 1998b. Stability of evapotranspiration rates in Kentucky bluegrass cultivars across low- and high-evaporative environments. Crop Sci. 38:135-142 (this issue).
Fisher, R.A. 1936. The use of multiple measurements in taxonomic problems. Ann. Eugenics 7:179-188.
Horst, G.L., M.C. Engelke, and W. Meyers. 1984. Assessment of visual evaluation techniques. Agron. J. 76:619-622.
Johnson, R.A., and D.W. Wichern. 1992. Applied multivariate statistical analysis. 3rd ed. Prentice-Hall, Englewood Cliffs, NJ.
Kim, K.S., and J.B. Beard. 1988a. Comparative evapotranspiration rates and associated plant morphological characteristics. Crop Sci. 28:328-331.
Kim, K.S., and J.B. Beard. 1988b. Turfgrass morphological characteristics associated with evapotranspiration rate. p. 18-19. Texas Turfgrass Research. Texas Agric. Exp. Sta. PR-4662.
Lachenbruch, P.A., and M.R. Mickey. 1968. Estimation of error rates in discriminant analysis. Technometrics 10(1):1-11.
Lapins, K., and S.W. Nash. 1957. Discriminant function analysis in the identification of peach varieties in nursery trees. Can. J. Plant Sci. 37:12-25.
MINITAB Inc. 1989. Reference manual. Version 10. MINITAB Inc., State College, PA.
Murray, G.D. 1977. A cautionary note on selection of variables in discriminant analysis. Appl. Statistics 26(3):246-250.
Shearman, R.C. 1986. Kentucky bluegrass cultivar evapotranspiration rates. HortScience 21:455-457.
Sifers, S.I., J.B. Beard, and K.S. Kim. 1986. Criteria for visual prediction of low water use rates of bermudagrass cultivars. p. 22-23. Texas Turfgrass Research. Texas Agric. Exp. Sta. PR-4519.
SPSS Inc. 1990. SPSS/PC Advanced Statistics 4.0. SPSS Inc., Chicago.
van Wijk, AT P. 1989. The relationship between turf performance and single plant morphology data in red fescue. p. 113-115. In H. Takatoh (ed.) Proc. 6th Int. Turfgrass Res. Conf., Tokyo. 31 July-5 August. Int. Turfgrass Soc., ASA, CSSA, SSSA, Madison, WI.
J.S. Ebdon, Dep. of Plant and Soil Sciences, 12F Stockbridge Hall, Univ. of Massachusetts, Amherst, MA 01003; A.M. Petrovic, Dep. of Floriculture and Ornamental Horticulture, Cornell Univ., and S.J. Schwager, Biometrics Unit and Statistics Center, Cornell Univ., Ithaca, NY 14853. Received 3 Nov. 1995. (*)Corresponding author.
Published in Crop Sci. 38:152-157 (1998).
|Author:||Ebdon, J.S.; Petrovic, A.M.; Schwager, S.J.|
|Date:||Jan 1, 1998|