

Byline: M. Qasim, M. Amin and M. K. S. Sarwar

Keywords: Seed Cotton Yield; Biochemical Traits; Multicollinearity; Least Squares Regression Analysis; Liu Linear Regression.


Cotton yield is important for the textile industry and oil production, and it contributes significantly to the economy of cotton-growing areas. Cotton production stood at 11.935 million bales on an area of 2,699 thousand hectares, an increase of 8.4%, with a share of 1% in GDP. It contributed 5.50% to agricultural value addition (Anonymous, 2017-18), which motivates researchers to keep focusing on its stability and improvement. Cotton attains the potential in yield because of its thin genetic base (Rahman et al., 2008). Abiotic and biotic stresses have caused drastic fluctuations in cotton production during the last decades. Cotton is the primary source of natural fiber in the world, which affects the textile industry, and it is the leading agricultural indicator of Pakistan's economy (Amin et al., 2015; Anonymous, 2017-18). Cotton production is negatively influenced by water stress. Water stress increases the accumulation of total soluble sugars in different crops such as rice, wheat and soybean.

Scarcity of water severely affects cotton cultivation in arid and semi-arid regions. The development of high-yielding and drought-tolerant cotton varieties is necessary to meet the demand of the ever-growing world population. Physiological and biochemical traits are used to identify drought-tolerant cotton germplasm and to understand the underlying mechanisms and genetic variability. Drought stress decreases chlorophyll 'a', chlorophyll 'b', total chlorophyll, the a/b ratio and carotenoids. The photosynthetic pigments, e.g., chlorophyll "a" and chlorophyll "b", have a major share in increasing crop growth and yield (Taiz and Zeiger, 2006). Chlorophyll "a", chlorophyll "b", total chlorophyll and the "a/b" ratio are drastically affected by drought and decrease in concentration (Hamayun et al., 2010; Sarwar et al., 2012; Abid et al., 2016).

Yuan et al. (2013) studied the association between biochemical traits (flavonoid, fructose, glucose, sucrose and total sugar contents) and fiber quality attributes (length, strength, uniformity, fineness, elongation rate and cellulose 50 DPA) using heterosis breeding in colored cotton through the correlation coefficient (r), and they found a significant association between the two sets of attributes. Abid et al. (2016) studied the association of different biochemical traits and enzymatic antioxidants with Bacillus thuringiensis (Bt) under normal and drought conditions. Malik et al. (2018) analyzed fiber quality attributes (independent variables) and quality-related biochemical traits, and measured the boll weight (dependent variable), for colored cotton. The correlation coefficients revealed a significant linear relationship among fiber quality attributes (Malik et al., 2018).

The study of the linear relationship among biochemical traits is crucial for the improvement of seed cotton yield (SCY) and the identification of tolerant genotypes. SCY is a quantitative character that is mostly affected by biochemical traits: chlorophyll contents (a, b and total chlorophyll), total soluble protein and total soluble sugar. In order to select for higher SCY, there is a need to analyze the mathematical relationship between SCY and biochemical traits. Measuring the direct effects of traits on SCY requires the estimation of a multiple linear regression model (MLRM), and this estimation can be negatively affected by the multicollinearity problem that arises from the linear relationship between traits. In the presence of multicollinearity, statistical inference may be erroneous or unreliable. To solve the issue of multicollinearity, alternative estimation methods can be used.

Bizeti et al. (2004) suggested the ridge regression approach for estimating the path coefficients of soybean traits on grain yield per plant. Wang et al. (2011) studied the effect of seed yield components on seed yield using path and ridge regression methods. Toebe and Cargnelutti Filho (2013) analyzed the effect of multicollinearity by using path analysis for the maize crop. The literature shows that no information is available on the use of biased estimation methods for measuring the impact of biochemical traits on SCY in the presence of multicollinearity. Therefore, the aims of the current study were to: i) examine whether biochemical traits are linearly correlated with each other and positively related to SCY; and ii) compare the traditional MLRM estimated by OLS with an alternative biased estimation method (Liu linear regression) that minimizes the adverse effects of multicollinearity when estimating the effect of biochemical traits on SCY.


Experiment: The experimental material consisted of 32 upland cotton accessions collected from different research institutes located in different ecological regions of Pakistan. The accessions were evaluated under two irrigation regimes in the field at the research area of the National Institute for Biotechnology and Genetic Engineering (NIBGE), Faisalabad, Pakistan. Seed cotton was hand-picked from all the plots at 180 days after planting to determine SCY (kg ha-1). To assess the impact of biochemical traits on cotton crop productivity and their contribution to yield, we measured chlorophyll contents (a, b and total chlorophyll), total soluble protein and total soluble sugar under two water regimes at NIBGE: well-watered (one irrigation at planting and five subsequent irrigations) and limited water (one irrigation at planting and one supplemental irrigation 40 days after planting). SCY was considered the dependent variable.

Chlorophyll contents (a, b and total) were measured by following Arnon (1949) and calculated according to Davies (1976). Concentrations of total sugars and total soluble proteins were determined according to Riazi et al. (1985) and Lowry et al. (1951), respectively. The main aim of this experiment was to check the effects of biochemical traits on SCY.

Liu Linear Regression: Regression analysis was used to estimate the conditional expectation of the dependent variable given the independent variables (or "explanatory variables"). It is considered a valuable technique in the field of agriculture. The unknown regression coefficients of the MLRM were estimated through the ordinary least squares (OLS) method. Consider the following form of the multiple LRM:

$$y_i = \beta_1 x_{i1} + \beta_2 x_{i2} + \beta_3 x_{i3} + \beta_4 x_{i4} + \beta_5 x_{i5} + \varepsilon_i, \quad i = 1, 2, \ldots, n \qquad (1)$$

where $y_i$ represents the SCY and the biochemical traits are the explanatory variables, i.e., $x_1$ = chlorophyll "a", $x_2$ = chlorophyll "b", $x_3$ = total chlorophyll, $x_4$ = total soluble protein and $x_5$ = total soluble sugar. In matrix form, Equation (1) can be written as

$$Y = X\beta + \varepsilon \qquad (2)$$

where $Y$ is an $(n \times 1)$ vector of observations on the response variable, $X$ is an $(n \times p)$ fixed design matrix of full rank on the explanatory variables, $\beta$ is a $(p \times 1)$ column vector of unknown regression coefficients, $p$ represents the number of explanatory variables and $\varepsilon$ is an $(n \times 1)$ vector of random errors that are normally distributed with $E(\varepsilon) = 0$ and $E(\varepsilon\varepsilon') = \sigma^2 I_n$, where $I_n$ is the $(n \times n)$ identity matrix.

The OLS estimator (OLSE) of the unknown parametric vector $\beta$ is defined as

$$\hat{\beta}_{OLS} = (X'X)^{-1} X'Y \qquad (3)$$

It is a common assumption in the LRM that the explanatory variables are not correlated with each other. In practice, however, a strong or nearly exact linear relationship is often found among the explanatory variables, which leads to the problem of multicollinearity. Multicollinearity is also called collinearity (the linear relationship between two explanatory variables) or ill-conditioning. Many studies have shown that the OLSE is no longer efficient in the presence of multicollinearity, since the OLSE depends heavily on the cross-product matrix $X'X$. If $X'X$ is ill-conditioned ($|X'X| \approx 0$), the OLSE does not perform satisfactorily: for instance, the coefficients may be insignificant, carry the wrong sign and have large variances, and statistical inference becomes problematic (Kibria, 2003; Qasim et al., 2018; Qasim et al., 2019b).
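To illustrate why a nearly singular X'X inflates the variance of the OLSE, the following minimal sketch considers the simplest case of two standardized explanatory variables with correlation r. Under that assumption X'X is proportional to the correlation matrix [[1, r], [r, 1]], and each diagonal element of its inverse equals 1/(1 - r^2). The function name `variance_inflation` is hypothetical, and the r values are pairwise correlations among the chlorophyll measures taken from Table 1 plus two reference values; this is an illustration only, not the analysis carried out in the study.

```python
def variance_inflation(r):
    """Diagonal element of the inverse of [[1, r], [r, 1]].

    For two standardized predictors with correlation r, the variance of each
    OLS coefficient is proportional to this factor.
    """
    if abs(r) >= 1.0:
        raise ValueError("X'X is singular when |r| = 1")
    return 1.0 / (1.0 - r * r)


# 0.4723, 0.8501 and 0.8588 are pairwise correlations from Table 1;
# 0.0 and 0.99 are reference values added for comparison (assumption).
for r in (0.0, 0.4723, 0.8501, 0.8588, 0.99):
    print(f"r = {r:.4f}  variance inflation = {variance_inflation(r):.2f}")
```

The factor grows without bound as |r| approaches one, which is the two-variable version of the ill-conditioning described above.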

In the presence of multicollinearity, it is difficult to make valid statistical inference. Numerous biased estimation methods are available to overcome the problem of multicollinearity, and the Liu estimator (Liu, 1993) is one of them. The Liu estimator is defined as

$$\hat{\beta}_{LIU} = (X'X + I_p)^{-1}(X'X + d I_p)\,\hat{\beta}_{OLS} \qquad (4)$$

where $I_p$ is the $(p \times p)$ identity matrix and $d$ denotes the Liu shrinkage parameter, which takes values between zero and one. When $d = 1$, $\hat{\beta}_{OLS} = \hat{\beta}_{LIU}$, and in the case $d < 1$ [...] $\lambda_1 \geq \lambda_2 \geq \ldots \geq \lambda_p$, where $\lambda_j$ ($j = 1, 2, \ldots, p$) is the jth eigenvalue of the matrix $X'X$ and $\hat{\sigma}^2$ is the estimated residual variance. For more details on the Liu estimator in the LRM, see, e.g., Liu (1993) and Qasim et al. (2019a). The $\hat{\beta}_{OLS}$ is on average too long in the presence of multicollinearity. Thus, $\hat{\beta}_{LIU}$ is considered the better choice than $\hat{\beta}_{OLS}$ in that condition.
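The shrinkage behaviour described above can be sketched in a few lines of plain Python, assuming the standard form of the Liu (1993) estimator, beta_LIU = (X'X + I)^(-1) (X'Y + d * beta_OLS). All helper names (`solve`, `ols`, `liu`) and the toy data are hypothetical, d is fixed by hand rather than estimated from the data, and no centering or scaling is applied, so this is not the procedure implemented in the liureg package.

```python
def transpose(a):
    return [list(col) for col in zip(*a)]


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[piv] = m[piv], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for k in range(c, n + 1):
                m[r][k] -= f * m[c][k]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][k] * x[k] for k in range(i + 1, n))) / m[i][i]
    return x


def cross_products(X, y):
    """Return X'X and X'y for a design matrix given as a list of rows."""
    cols = transpose(X)
    xtx = [[sum(u * v for u, v in zip(ci, cj)) for cj in cols] for ci in cols]
    xty = [sum(u * v for u, v in zip(ci, y)) for ci in cols]
    return xtx, xty


def ols(X, y):
    xtx, xty = cross_products(X, y)
    return solve(xtx, xty)


def liu(X, y, d):
    """Liu estimator: (X'X + I)^(-1) (X'y + d * beta_OLS)."""
    xtx, xty = cross_products(X, y)
    b_ols = solve(xtx, xty)
    p = len(xtx)
    lhs = [[xtx[i][j] + (1.0 if i == j else 0.0) for j in range(p)] for i in range(p)]
    rhs = [xty[i] + d * b_ols[i] for i in range(p)]
    return solve(lhs, rhs)


def norm(v):
    return sum(t * t for t in v) ** 0.5


# Toy data (invented): two nearly collinear predictors.
X = [[1.0, 1.1], [2.0, 1.9], [3.0, 3.2], [4.0, 3.9], [5.0, 5.1]]
y = [2.0, 4.1, 6.2, 7.9, 10.1]

print("OLS        :", ols(X, y), "length", norm(ols(X, y)))
print("Liu, d=1.0 :", liu(X, y, 1.0))
print("Liu, d=0.5 :", liu(X, y, 0.5), "length", norm(liu(X, y, 0.5)))
```

With d = 1 the two estimators coincide, and for 0 <= d < 1 each eigen-direction of the OLS solution is multiplied by (lambda_j + d)/(lambda_j + 1) < 1, so the Liu coefficient vector is shorter than the OLS one.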

This study also compares the OLS and Liu estimators, using the MSE, the Akaike information criterion (AIC) and the standard errors (SE) of the regression coefficients as performance criteria. We also demonstrate the benefits of the Liu estimator in the LRM by examining the effect of biochemical traits on SCY in the experiment. The main aim of the experiment is therefore to analyze whether chlorophyll 'a' and 'b', total chlorophyll, total soluble protein and total soluble sugar improve the SCY or not. We used the liureg package in R for the regression analysis.

Table 1. Correlation matrix among study variables.

| Variable | SCY | Chlorophyll 'a' | Chlorophyll 'b' | Total Chlorophyll | Total Soluble Protein | Total Soluble Sugar |
| --- | --- | --- | --- | --- | --- | --- |
| Chlorophyll 'a' | | 1.0000 | 0.4723[a] | 0.8501[a] | 0.4470[a] | 0.0888 |
| Chlorophyll 'b' | | | 1.0000 | 0.8588[a] | 0.3177 | 0.2814 |
| Total Chlorophyll | | | | 1.0000 | 0.4503[a] | 0.2385 |
| Total Soluble Protein | | | | | 1.0000 | 0.2199 |
| Total Soluble Sugar | | | | | | 1.0000 |

Table 2. Ordinary least-squares linear regression model summary.

| Variables | Estimates | Standard Errors | t-statistic | p-value |
| --- | --- | --- | --- | --- |
| Chlorophyll "a" | -29.53 | 1088.87 | -0.027 | 0.9784 |
| Chlorophyll "b" | 47.30 | 1025.48 | 0.046 | 0.9632 |
| Total Chlorophyll | 308.96 | 1054.60 | 0.293 | 0.7695 |
| Total Soluble Protein | 115.94 | 48.39 | 2.396 | 0.0166 |
| Total Soluble Sugar | 31.41 | 26.37 | 1.191 | 0.2337 |

Table 3. Liu linear regression model summary.

| Variables | Estimates | Standard Errors | t-statistic | p-value |
| --- | --- | --- | --- | --- |
| Chlorophyll "a" | 73.25 | 104.42 | 0.701 | 0.48301 |
| Chlorophyll "b" | 120.30 | 104.62 | 1.150 | 0.25022 |
| Total Chlorophyll | 198.17 | 67.48 | 2.937 | 0.00332[a] |
| Total Soluble Protein | 117.22 | 45.08 | 2.600 | 0.00932[a] |
| Total Soluble Sugar | 33.32 | 24.55 | 1.357 | 0.17476 |


First, we computed the correlation matrix between the biochemical traits and SCY, given in Table 1 and Fig. 1. From Table 1, we observed that the biochemical traits were positively correlated with SCY, which indicates that SCY increases with all the biochemical traits. Correlations among traits were significant, except for the correlation between chlorophyll 'b' and total soluble protein (r = 0.3177) and the correlations of total soluble sugar with the other biochemical traits (r = 0.0888, r = 0.2814, r = 0.2385 and r = 0.2199), which were non-significant (Table 1). Next, we fitted the MLRM using the traditional OLSE defined in Equation (3); the results are reported in Table 2. Only one biochemical trait, total soluble protein (p-value = 0.0166), played a significant role in SCY, while the other biochemical traits showed a statistically insignificant role.
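As a minimal sketch of how a correlation matrix such as Table 1 can be computed, the following plain-Python code evaluates Pearson's r for every pair of variables. The function names and the five toy observations are hypothetical (the experimental data are not reproduced here), and total chlorophyll is taken as the sum of chlorophyll a and b purely for illustration.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def correlation_matrix(columns):
    """Nested dict of pairwise correlations for a dict of named columns."""
    names = list(columns)
    return {a: {b: pearson_r(columns[a], columns[b]) for b in names} for a in names}


# Toy values (invented), not the measurements from the experiment.
chl_a = [1.2, 1.5, 1.1, 1.8, 1.6]
chl_b = [0.5, 0.7, 0.6, 0.8, 0.6]
toy = {
    "chl_a": chl_a,
    "chl_b": chl_b,
    "total_chl": [a + b for a, b in zip(chl_a, chl_b)],  # assumption: a + b
}

corr = correlation_matrix(toy)
for name, row in corr.items():
    print(name, {k: round(v, 4) for k, v in row.items()})
```

Because the third column is an exact function of the first two, its correlations with them are high by construction, which mirrors the pattern among the chlorophyll measures in Table 1.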

In practice, the biochemical traits are expected to have a positive impact on SCY and to play a significant role in increasing the yield of seed cotton. The computed results of the model given in Equation (1) are presented in Table 2. From Table 2, we observed that chlorophyll 'a' has a negative impact on SCY, and this result clearly shows the drawback of the OLSE in the presence of multicollinearity. We also examined the residual scatter plot to check the other necessary assumptions of the regression model. For the identification of multicollinearity, the correlation matrix and the condition index (CI) were considered. From Table 1, we perceived that the biochemical traits experiment has a serious multicollinearity problem. Multicollinearity can also be tested by the condition index (CI) as


which showed the existence of multicollinearity in the experimental dataset. Therefore, we used the Liu estimator in the MLRM instead of the traditional OLSE for estimating the effects of the biochemical traits. In practice, chlorophyll 'a' has a positive impact on SCY, and the Liu estimator gives this expected result in Table 3. As mentioned earlier, the OLSE may produce the wrong sign for regression coefficients, and we overcame this problem by using the Liu estimator in the LRM. We also observe that the standard errors of the OLSE are higher than those of the Liu estimator, which clearly shows the inconsistency of the OLSE. Total chlorophyll and total soluble protein have a significant positive impact on SCY when the Liu estimator is applied, whereas they have an insignificant effect on SCY when the OLSE is used. So, the Liu estimator with the proposed shrinkage parameter d provides efficient results as compared to the OLSE.
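A condition index can be illustrated with the three chlorophyll correlations reported in Table 1. The sketch below assumes one common definition, CI = sqrt(lambda_max / lambda_min), computed from the eigenvalues of the correlation matrix; the definition, the function names and the use of a plain-Python Jacobi rotation method are all assumptions made for this example, and the result covers only the chlorophyll sub-matrix, so it is not the CI value obtained in the study.

```python
import math


def jacobi_eigenvalues(matrix, max_sweeps=50, tol=1e-18):
    """Eigenvalues of a real symmetric matrix by cyclic Jacobi rotations."""
    a = [row[:] for row in matrix]
    n = len(a)
    for _ in range(max_sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                diff = a[q][q] - a[p][p]
                # Rotation angle that zeroes the (p, q) element.
                theta = math.pi / 4 if diff == 0.0 else 0.5 * math.atan(2.0 * a[p][q] / diff)
                c, s = math.cos(theta), math.sin(theta)
                for k in range(n):  # update columns p and q
                    akp, akq = a[k][p], a[k][q]
                    a[k][p] = c * akp - s * akq
                    a[k][q] = s * akp + c * akq
                for k in range(n):  # update rows p and q
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k] = c * apk - s * aqk
                    a[q][k] = s * apk + c * aqk
    return sorted((a[i][i] for i in range(n)), reverse=True)


def condition_index(matrix):
    """sqrt(largest eigenvalue / smallest eigenvalue); assumed CI definition."""
    eig = jacobi_eigenvalues(matrix)
    return math.sqrt(eig[0] / eig[-1])


# Correlations among chlorophyll 'a', 'b' and total chlorophyll from Table 1.
chlorophyll_corr = [
    [1.0000, 0.4723, 0.8501],
    [0.4723, 1.0000, 0.8588],
    [0.8501, 0.8588, 1.0000],
]

print("eigenvalues:", jacobi_eigenvalues(chlorophyll_corr))
print("condition index:", condition_index(chlorophyll_corr))
```

A smallest eigenvalue close to zero means that one linear combination of the traits carries almost no independent information, which is what a large condition index signals.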

Besides, the t-statistic values are smaller with the OLSE than with the Liu estimator, which also demonstrates the advantage of the Liu approach. The Liu estimator performs well compared with the OLSE in terms of MSE and AIC: for the OLSE, MSE = 3352475 and AIC = 366.7; for the Liu estimator, MSE = 57212 and AIC = 363.8. The Liu estimator thus gives smaller values of MSE and AIC than the OLS estimator. In view of the above problems, we recommend that agricultural practitioners use the Liu estimator, which performs quite well, instead of the traditional OLSE in the presence of multicollinearity.
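The link between standard errors and t-statistics can be checked directly from Tables 2 and 3, since each t-statistic is the estimate divided by its standard error. The short sketch below recomputes the t-statistics from the published estimates and standard errors and compares the standard errors of the two estimators; the variable and function names are hypothetical, and only values printed in the tables are used.

```python
# (estimate, standard error, reported t-statistic) from Tables 2 and 3.
table2_ols = {
    "Chlorophyll a": (-29.53, 1088.87, -0.027),
    "Chlorophyll b": (47.30, 1025.48, 0.046),
    "Total Chlorophyll": (308.96, 1054.60, 0.293),
    "Total Soluble Protein": (115.94, 48.39, 2.396),
    "Total Soluble Sugar": (31.41, 26.37, 1.191),
}
table3_liu = {
    "Chlorophyll a": (73.25, 104.42, 0.701),
    "Chlorophyll b": (120.30, 104.62, 1.150),
    "Total Chlorophyll": (198.17, 67.48, 2.937),
    "Total Soluble Protein": (117.22, 45.08, 2.600),
    "Total Soluble Sugar": (33.32, 24.55, 1.357),
}


def t_statistic(estimate, std_error):
    """t-statistic for testing that a coefficient equals zero."""
    return estimate / std_error


for trait in table2_ols:
    est_o, se_o, _ = table2_ols[trait]
    est_l, se_l, _ = table3_liu[trait]
    print(
        f"{trait:22s} t(OLS) = {t_statistic(est_o, se_o):7.3f}  "
        f"t(Liu) = {t_statistic(est_l, se_l):7.3f}  "
        f"SE ratio OLS/Liu = {se_o / se_l:6.2f}"
    )
```

The comparison makes the mechanism visible: for the three chlorophyll traits the coefficient estimates stay within the same order of magnitude, while the standard errors differ by roughly a factor of ten or more, and that is what moves the t-statistics.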


In the literature, biochemical traits have been reported to play a significant role in the yield of different crops (Ansari et al., 1999; Boggs et al., 2003; Song et al., 2005). To determine the role of these biochemical traits, most researchers relied on regression analysis without accounting for the multicollinearity problem (Ansari et al., 1999; Boggs et al., 2003; Abid et al., 2016). We observed from the correlation matrix (Table 1) that some biochemical traits were linearly correlated with each other, which causes the issue of multicollinearity. For the diagnosis of multicollinearity, we used another test, the CI, which also confirmed the existence of multicollinearity.

Multicollinearity among biochemical traits may give incorrect and unreliable results, since the explanatory variables are linearly correlated and the X'X matrix becomes singular or nearly singular (Carvalho et al., 1999). In this study, the traditional OLSE led to wrong inference because of the severe multicollinearity. Therefore, we used an alternative biased method (the Liu estimator). Similar biased estimation techniques were also used by Carvalho et al. (1999), Bizeti et al. (2004), Wang et al. (2011) and Toebe and Cargnelutti Filho (2013) for different crops. These authors focused on path analysis for estimating direct and indirect effects, whereas the current study used the MLRM with the OLSE and the Liu estimator. The OLSE indicated a negative impact of chlorophyll 'a' (B = -29.53) (Table 2), while the Liu linear regression analysis showed a positive effect (B = 73.25) (Table 3). A similar wrong-sign problem under OLS, resolved by a biased estimation method, was also reported by Hoerl and Kennard (1970).

Another notable point is that, owing to multicollinearity, total chlorophyll showed no significant role in SCY under the OLSE. In contrast, Boggs et al. (2003) showed that total chlorophyll plays a vital role, and this was confirmed by Liu linear regression. The effects of total chlorophyll and total soluble protein on SCY changed from non-significant (p-value = 0.7695 and p-value = 0.0166) to highly significant (p-value = 0.0033 and p-value = 0.0093) when Liu linear regression was used (Tables 2 and 3), since the Liu estimator gave the lowest standard errors for the estimates of total chlorophyll and total soluble protein. These results confirm that the OLSE is inconsistent and not reliable in the presence of a high degree of collinearity, and they correspond with the findings of Carvalho et al. (2001) and Toebe and Cargnelutti Filho (2013). A regression model gives better estimation if it has the minimum MSE, AIC and SEs of the estimated parameters.

As we have seen, the Liu estimator has lower MSE, AIC and SEs than the OLSE in the MLRM. The lowest MSE and SEs were also found by Liu (1993). The MSE, AIC and SEs were larger in the presence of multicollinearity. Based on the above discussion and results, we suggest that researchers report the degree of multicollinearity in the X'X matrix in research publications that use regression analysis. Ill-conditioning of this matrix has been discussed for soybean traits (Bizeti et al., 2004), seed yield components (Wang et al., 2011) and the maize crop (Carvalho et al., 2001; Mohammadi et al., 2003; Toebe and Cargnelutti Filho, 2013).

Furthermore, we suggest the use of the MLRM with a biased estimation method (the Liu estimator) instead of the traditional OLSE when agricultural researchers want to estimate the effects of multicollinear factors, such as biochemical traits, seed yield components, cellulose contents and fiber quality traits, among others, on agricultural output.

Conclusions: The biochemical traits were significantly correlated with each other and had a severe effect on seed cotton yield. The condition index and correlation matrix indicated the problem of multicollinearity among biochemical traits. Traditional least squares regression analysis in the presence of multicollinearity provided inadequate statistical inference regarding seed cotton yield. Liu linear regression was an efficient approach for reducing the adverse effects of multicollinearity and was more adequate than the ordinary least squares method. Thus, Liu linear regression is a more reliable approach for estimating the actual effects of biochemical traits on seed cotton yield.

Acknowledgements: We acknowledge the kind support of Dr. Mehboob-ur-Rahman (PP), Principal Scientist Plant Genomics group and Molecular Breeding NIBGE, and Dr. M. Yasin Ashraf (T.I), Deputy Chief Scientist NIAB for their assistance in the experimentation of the research activities of this research project. The authors are also thankful to the Editorial team and the anonymous reviewers for their careful reading and suggestions, which certainly improved the quality of the article.


Abid, M.A., W. Malik, A. Yasmeen, A. Qayyum, R. Zhang, C. Liang, and J. Ashraf (2016). Mode of inheritance for biochemical traits in genetically engineered cotton under water stress. AoB Plants, 8: 1-15.

Amin, M., A. Akbar, and M.A. Manzoor (2015). Fitting regression model with some assumptions for the identification of cotton yield factors. Pakistan. J. Life Soc. Sci. 13(2): 86-90.

Anonymous. (2017-18). Economic Survey of Pakistan. Ministry of Food and Agriculture Division (Economic Wing), Govt. Pakistan., Islamabad.

Ansari, M.D.S., R.K. Mahey, and S.S. Sidhu (1999). Cotton yield prediction through spectral parameters. J. Ind. Soc. Rem. Sens. 27(4): 185-192.

Arnon, D. I. (1949). Copper enzymes in isolated chloroplasts. Polyphenoloxidase in Beta vulgaris. Plant Physiol. 24(1): 1-15.

Bizeti, H. S., C. G. P. D. Carvalho, J. R. P. D. Souza, and D. Destro (2004). Path analysis under multicollinearity in soybean. Braz. Arch. Biol. Technol. 47(5): 669-676.

Boggs, J.L., T.D. Tsegaye, T.L. Coleman, K.C. Reddy, and A. Fahsi (2003). Relationship between hyperspectral reflectance, soil nitrate-nitrogen, cotton leaf chlorophyll, and cotton yield: A step toward precision agriculture. J. Sustain. Agri. 22(3): 5-16.

Carvalho, C. G., R. Borsato, C. D. Cruz, and J. M. Viana (2001). Path analysis under multicollinearity in S0 x S0 maize hybrids. Crop Breeding and Applied Biotechnology, 1(3): 1-10.

Carvalho, S. P. D., C. D. Cruz, and C. G. P. D. Carvalho (1999). Estimating gain by use of a classic selection index under multicollinearity in wheat (Triticum aestivum). Genet. Molec. Biol. 22(1): 109-113.

Davies, B. H. (1976). Chemistry and biochemistry of plant pigments. Carot. 2: 38-165.

Hamayun, M., S. A. Khan, Z. K. Shinwari, A. L. Khan, N. Ahmad, and I.J. Lee (2010). Effect of polyethylene glycol induced drought stress on physiohormonal attributes of soybean. Pakistan. J. Bot. 42(2): 977-986.

Hoerl, A. E., and R. W. Kennard (1970). Ridge regression: applications to nonorthogonal problems. Techno. 12(1): 69-82.

Kibria, B. G. (2003). Performance of some new ridge regression estimators. Comm. Stat. Sim. Comput. 32(2):419-435.

Liu, K. (1993). A new class of biased estimate in linear regression. Commun. Stat. Theor. Meth. 22(2):393-402.

Lowry, O. H., N. J. Rosebrough, A. L. Farr, and R. J. Randall (1951). Protein measurement with the Folin phenol reagent. J. Biol. Chem. 193: 265-275.

Malik, W., M. S. A. Shah, M. A. Abid, G. Qanmber, E. Noor, A. Qayyum, and R. Zhang (2018). Genetic basis of variation for fiber quality and quality related biochemical traits in Bt and non-Bt colored cotton. Intl. J. Agric. Biol. 20: 2117-2124.

Mohammadi, S. A., B. M. Prasanna, and N. N. Singh (2003). Sequential path model for determining interrelationships among grain yield and related characters in maize. Crop Sci. 43(5): 1690-1697.

Qasim, M., M. Amin, and M. Amanullah (2018). On the performance of some new Liu parameters for the gamma regression model. J. Stat. Comput. Sim. 88(16):3065-3080.

Qasim, M., M. Amin, and T. Omer (2019a). Performance of some new Liu parameters for the linear regression model. Commun. Stat. Theor. Meth. 1-19.

Qasim, M., B.M.G. Kibria, K. Mansson, and P. Sjolander (2019b). A new Poisson Liu regression estimator: method and application. J. App. Stat. 1-14.

Rahman, M., I. Ullah, M. Ashraf, J.M. Stewart, and Y. Zafar (2008). Genotypic variation for drought tolerance in cotton. Agron. Sustain. (28): 439-447.

Riazi, A., K. Matsuda, and A. Arslan (1985). Water-stress induced changes in concentrations of proline and other solutes in growing regions of young barley leaves. J. Experi. Bot. 36(11): 1716-1725.

Sarwar, M.K.S., M.Y. Ashraf, M. Rahman, and Y. Zafar (2012). Genetic variability in different biochemical traits and their relationship with yield and yield parameters of cotton cultivars grown under water stress conditions. Pakistan. J. Bot. 44:515-520.

Song, S-L., W-Z. Guo, Z-G. Han, and T-Z. Zhang (2005). Quantitative trait loci mapping of leaf morphological traits and chlorophyll content in cultivated tetraploid cotton. J. Integ. Plant Biol. 47(11): 1382-1390.

Taiz, L. and E. Zeiger (2006). Plant Physiology, 4th Ed. Sinauer Associates Inc. Publishers, Massachusetts.

Toebe, M., and A. Cargnelutti Filho (2013). Multicollinearity in path analysis of maize (Zea mays L.). J. Cereal Sci. 57(3): 453-462.

Wang, Q., T. Zhang, J. Cui, X. Wang, H. Zhou, J. Han, and R. Gislum (2011). Path and ridge regression analysis of seed yield and seed yield components of Russian wildrye (Psathyrostachys juncea Nevski) under field conditions. PLoS ONE 6(4): 1-10.

Yuan, S. N., W. Malik, N. Bibi, G. J. Wen, M. Ni, and X. D. Wang (2013). Modulation of morphological and biochemical traits using heterosis breeding in coloured cotton. The J. Agricult. Sci. 151(1): 57-71.
Publication:Journal of Animal and Plant Sciences
Date:Dec 31, 2020