Nonlinear models applied to seed germination of Rhipsalis cereuscula Haw (Cactaceae)
Nonlinear regression is a statistical technique in which a nonlinear mathematical model describes the relationship between a response variable and predictor variables. In general, a nonlinear model is $y = \eta(t, \beta) + e$, where $\eta(t, \beta)$ is a function with at least one nonlinear parameter, $\beta$ is a vector of p unknown parameters, t is the predictor variable and e is a random error with normal distribution, zero mean and variance $\sigma^2$, i.e., $e \sim N(0, \sigma^2)$.
In seed germination studies, the function $\eta(t, \beta)$ represents the number or the proportion of germinated seeds following a growth curve. This component is deterministic, and the model is usually represented by equations such as the asymptotic exponential, logistic, Gompertz and Weibull. Although such equations are empirical representations of the biological mechanism, their parameters allow biological interpretation. Very often, however, the frequentist approach has been applied to model this type of data, and this application has biased the responses, resulting in inefficient estimates and therefore inappropriate conclusions.
Ratkowsky (1983, 1990) pointed out that the parameter estimators of nonlinear models do not have the same properties as those of linear models. However, close-to-linear models have similar properties asymptotically and can therefore be reparameterized to behave like a linear model. The spread of nonlinear models has led some authors to investigate tests for the degree of nonlinearity of a model. O'Brien (2008) reported some tests to detect spurious nonlinearity, and Pena and Rodriguez (2005) reported a method to verify both the presence of nonlinearity and the power of the nonlinear regression.
In terms of the cumulative number of germinated seeds over chronological time, the responses can be represented by asymptotic or sigmoid curves. Currently, the parameters are usually estimated by frequentist methods such as least squares, and conclusions are drawn about the responses without reporting nonlinearity or goodness-of-fit tests.
The time for radicle protrusion can be treated as a random variable when the objective is to report the germination curve over a period of evaluation. Thus, the time for seed germination can be described by a probability distribution, while the curve over time represents the cumulative form of that distribution. Hunter et al. (1984) used the normal distribution to represent the seed germination curve. More recently, seed technologists have noticed the asymmetry in the time for seed germination and have proposed other probability distributions. O'Neill et al. (2004) investigated the germination of perennial ryegrass seeds and suggested the inverse normal as an alternative model to the lognormal, log-logistic and Weibull distributions. The parameters were estimated by maximum likelihood, as suggested by Hunter et al. (1984) and Brain and Butler (1988); the deviation for the inverse normal distribution was the lowest, and it therefore gave the best response. Soliman et al. (2006) also estimated the parameters of the Weibull model by frequentist and Bayesian methods to investigate the time to failure of industrial machines, and reported more accurate estimates with the Bayesian method than with maximum likelihood.
The majority of authors consider the error, e, a continuous random variable with normal distribution, zero mean and variance $\sigma^2$, $e \sim N(0, \sigma^2)$. In such cases, and also when the errors do not follow the normal distribution, Bayesian methods have been applied to fit nonlinear models. De la Cruz-Mesia and Marshall (2003) applied a procedure for nonlinear models with errors following a continuous autoregressive process. They argued that the approach is advantageous because it can use additional information from previous experiments and is suitable for small samples, since it is not based on asymptotic theory.
The Bayesian method has also been suggested for estimating growth curves from repeated measurements, where every individual is measured repeatedly over time. Thus, Blasco et al. (2003) carried out a Bayesian analysis of the effect of selection on rabbit growth curves, and Martins Filho et al. (2008) fitted a logistic model to growth data of two cultivars of common bean. We sought to compare the frequentist and Bayesian approaches for fitting the time required for seed germination using the Weibull distribution with three parameters.
Material and methods
Models for the responses of seed germination
A seed is considered germinated just after radicle protrusion, which indicates a normal seedling capable of developing into a normal plant under field conditions. The time for radicle protrusion, T, follows the Weibull distribution (WEIBULL, 1951):
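$$f(t; \theta) = \frac{c}{b}\left(\frac{t}{b}\right)^{c-1}\exp\left[-\left(\frac{t}{b}\right)^{c}\right], \qquad F(t; \theta) = 1 - \exp\left[-\left(\frac{t}{b}\right)^{c}\right], \qquad t > 0 \qquad (1)$$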
in which f denotes the probability density function, F is the cumulative distribution function of the random variable T and $\theta = (b, c)$ is the vector of parameters.
Modeling the percentage of seed germination over time t, Carneiro et al. (2000) and Carneiro (1994) suggested the following Weibull curve with three parameters:
$$F(t; M, b, c) = M\left\{1 - \exp\left[-\left(\frac{t}{b}\right)^{c}\right]\right\} \qquad (2)$$
M is the third parameter. The parameters of nonlinear models represent quantitative experimental responses and permit direct interpretation. For example, in the following model:
[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] (3)
M is the maximum seed germination (BROWN, 1987; BROWN; MAYER, 1988a, b; CARNEIRO, 1994, 1996; CARNEIRO; GUEDES, 1995), b is the time to 63.21% of M and c is the spread over the time t (CARNEIRO, 1994, 1996; CARNEIRO; GUEDES, 1995).
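The 63.21% figure can be checked with a short worked equation. As a sketch, assuming the three-parameter curve has the form $M\{1 - \exp[-(t/b)^c]\}$ (an assumption here, chosen to agree with the interpretation of M, b and c given above), evaluating it at t = b gives:

$$M\left\{1 - \exp\left[-\left(\frac{b}{b}\right)^{c}\right]\right\} = M\left(1 - e^{-1}\right) \approx 0.6321\,M$$

so b is the time at which 63.21% of the maximum M has germinated, whatever the value of c.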
In this context, the nonlinear regression model for the total seed germination at time $t_i$ is $y_i(t_i) = F(t_i, \theta) + e_i$, with $e_i \sim N(0, \sigma_e^2)$. The errors $e_i$ are uncorrelated and $y_i \mid \theta, \sigma_e^2 \sim N(F(t_i, \theta), \sigma_e^2)$, whose likelihood function is:
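$$L(\theta, \sigma_e^2 \mid y) = (2\pi\sigma_e^2)^{-n/2}\exp\left\{-\frac{1}{2\sigma_e^2}\sum_{i=1}^{n}\left[y_i - F(t_i, \theta)\right]^2\right\}. \qquad (4)$$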
Applying the frequentist analysis to nonlinear models requires prior knowledge of the model and the data in order to suggest initial parameter values for the iterative estimation process. Computer routines start from these initial values and search the parametric space for the values that maximize the logarithm of the likelihood function or minimize the sum of squared errors.
The value of the parameter vector that minimizes the residual sum of squares has a sampling distribution close to normal with covariance matrix $\sigma_e^2 (W'W)^{-1}$, where W is the n x p matrix of first derivatives of $F(t_i, \theta)$ with respect to $\theta$. The concern is the right choice of numerical algorithm to obtain the estimates. In this context, the R software has Gauss-Newton routines, and SAS Institute Inc. (2008) offers several possibilities in proc nlin, such as Gauss, Marquardt, Newton, Gradient and DUD, the default method, which uses numerical estimates of the derivatives.
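The role of the initial values and of the covariance matrix can be illustrated with a minimal sketch in Python. It is not the proc nlin analysis used in this study: it assumes the three-parameter curve $M\{1 - \exp[-(t/b)^c]\}$, uses `scipy.optimize.curve_fit` as the least-squares routine, and fits synthetic data generated only for illustration. The names `weibull_curve`, `true_params` and `p0` are hypothetical.

```python
import numpy as np
from scipy.optimize import curve_fit


def weibull_curve(t, M, b, c):
    """Assumed three-parameter Weibull curve M * (1 - exp(-(t/b)^c))."""
    return M * (1.0 - np.exp(-(t / b) ** c))


# Synthetic illustration only (not the data of this study):
# 92 counts taken at 8 h intervals with normal errors.
rng = np.random.default_rng(0)
t = 8.0 * np.arange(1, 93)
true_params = (30.0, 430.0, 4.6)
y = weibull_curve(t, *true_params) + rng.normal(0.0, 2.8, size=t.size)

# Initial values must be supplied by the analyst; the bounds keep
# all three parameters positive during the iterations.
p0 = (25.0, 400.0, 3.0)
estimates, cov = curve_fit(weibull_curve, t, y, p0=p0, bounds=(1e-6, np.inf))

# Asymptotic standard errors from the estimated covariance matrix.
std_errors = np.sqrt(np.diag(cov))
print("estimates (M, b, c):", estimates)
print("standard errors    :", std_errors)
```

The covariance matrix returned by `curve_fit` is the usual asymptotic approximation based on the matrix of first derivatives, so its quality depends on the curvature measures discussed next.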
The statistical properties of nonlinear models, the behavior of the estimation process and the quality of asymptotic inferences for finite samples all depend on the model curvature. Measuring the degree of nonlinearity through the curvature of a nonlinear function was introduced by Bates and Watts (1980, 1988) and discussed by Ratkowsky (1983, 1990) and other authors. This curvature has two components: the intrinsic curvature (IN) and the parameter curvature (PE). The parameter curvature indicates the nonlinearity due to the parameterization of the model. The intrinsic curvature measures the change in the nonlinear model when the parameter values are slightly modified (BATES; WATTS, 1980, 1988; RATKOWSKY, 1990). The lower the curvatures, the better the validity of the asymptotic inference.
Bates and Watts (1980, 1988) suggested using, at a significance level $\alpha$, the limit of:
$$\frac{1}{2}\sqrt{F_{1-\alpha}(p, n - p)} \qquad (5)$$
to test both curvatures, where $F_{1-\alpha}(p, n - p)$ is the $(1-\alpha)$ quantile of Snedecor's F distribution with p and n - p degrees of freedom, p is the number of parameters and n is the sample size. Another important measure for diagnosing nonlinearity is Box's bias, which helps to identify the parameter responsible for the excess curvature. Ratkowsky (1983) suggested a limit of 1% for the relative bias, i.e., the absolute value of the ratio of the bias to the parameter estimate. These measures can be obtained with an algorithm written in proc iml of SAS Institute Inc. (2008) (SOUZA, 1998).
The parameters were estimated with proc nlin in SAS Institute Inc. (2008), and the normality of the errors was checked with proc univariate. The quality of the estimates was verified with a routine written in proc iml following the recommendations of Souza (1998).
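The limit in (5) is easy to evaluate numerically. The sketch below reads (5) as one half of the square root of the F quantile and uses `scipy.stats.f.ppf` for the quantile; the function name `curvature_limit` is hypothetical. With p = 3, n = 92 and a 5% level, the values used later in the Results, it should reproduce the limit of about 0.8226 reported there.

```python
from math import sqrt

from scipy.stats import f


def curvature_limit(p, n, alpha=0.05):
    """Half the square root of the (1 - alpha) quantile of F(p, n - p)."""
    quantile = f.ppf(1.0 - alpha, p, n - p)
    return 0.5 * sqrt(quantile)


# Values taken from the Results: three parameters, 92 observations.
limit = curvature_limit(p=3, n=92, alpha=0.05)
print(f"curvature limit: {limit:.4f}")
```

Curvature measures IN and PE below this limit are taken as support for the asymptotic inference.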
In Bayesian inference, the researcher can incorporate prior information, which is expressed as a prior distribution. This information is obtained from previous studies carried out with the same type of experiment or from sampling data. Otherwise, the prior can be vague, but in both cases a probability density function must be specified for every parameter in the model. The sample information is expressed by the likelihood function, i.e., the joint density of the observations conditional on the parameters. Based on Bayes' theorem, the prior information is combined with the sample information by multiplying the prior density by the likelihood, and the product, a function on the parametric space, is proportional to the posterior density. For the parameters $\theta = (M, b, c)$ of the Weibull model, we suppose that the prior density is the product of two density functions, $\pi(\theta, \sigma^2) = \pi(\theta)\pi(\sigma^2)$, $\theta \in R^3$, $\sigma^2 > 0$.
De la Cruz-Mesia and Marshall (2003) suggested the following prior distributions for the parameter vector and the error variance of nonlinear models:
$\theta \sim N_3(\mu_0, \Sigma_0)$
with [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] and $\sigma^2 \sim IG(a_1, a_2)$, (6)
$N_3$ denotes the trivariate normal distribution, and IG is the inverse gamma distribution. Although specifying the hyperparameters $\mu_0$, $\Sigma_0$, $a_1$ and $a_2$ can be difficult, values that yield non-informative prior distributions can be used. In the current proposal we consider five different non-informative prior distributions for the vector of parameters $\theta = (M, b, c)$:
[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] (6)
$\mu$ will be estimated by the sample mean, and $L_{sup}$ will be 100 for M, 1,000 for b and 10 for c. The models will be compared by the deviance information criterion, DIC (SPIEGELHALTER et al., 2002). Thus the models considered were:
Model $M_1$ - Assuming non-informative gamma distributions as priors for all the parameters: $M, b, c \sim \text{gamma}(10^{-3}, 10^{-3})$.
Model $M_2$ - Assuming non-informative truncated normal distributions as priors for all the parameters: $M, b, c \sim \text{normal}(0, 10^{6})_{(0, +\infty)}$, i.e., truncated to positive values.
Model $M_3$ - Assuming non-informative uniform distributions as priors for all the parameters: $M, b, c \sim \text{uniform}(0, L_{sup})$, with $L_{sup}$ = 100 for M, 1,000 for b and 10 for c.
Model $M_4$ - Assuming non-informative exponential distributions as priors for all the parameters: $M, b, c \sim \exp(\mu)$, with mean $\mu$ estimated by the frequentist method.
Model $M_5$ - Assuming non-informative lognormal distributions as priors for all the parameters: $M, b, c \sim \text{lognormal}(\mu, 10^{6})$, with mean $\mu$ estimated by the frequentist method.
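For reference, the DIC used to compare the five models above is usually written as follows. This is the standard definition from Spiegelhalter et al. (2002); the symbols $D$, $\bar{D}$, $\bar{\theta}$ and $p_D$ are notation introduced here, with $\theta$ standing for all the unknown quantities of the model:

$$D(\theta) = -2\log L(\theta \mid y), \qquad p_D = \bar{D} - D(\bar{\theta}), \qquad \mathrm{DIC} = \bar{D} + p_D = D(\bar{\theta}) + 2p_D$$

where $\bar{D}$ is the posterior mean of the deviance and $\bar{\theta}$ is the posterior mean of the parameters. The term $p_D$ acts as an effective number of parameters, and among models fitted to the same data the one with the smallest DIC is preferred.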
Supposing that the Weibull model can describe the data, 200,000 values will be generated for each chain, with a burn-in period of 1,000. The final sample will be composed of values selected with a thinning interval of 20, which means a sample size of 10,000. Chain convergence will be verified with the CODA software (BEST et al., 1995) and the Heidelberger and Welch (1983) criterion. The marginal posterior distributions of all the parameters will be obtained with the BRugs package (SPIEGELHALTER et al., 1994), available in the R software.
The comparison of methods was illustrated with the seed germination of Rhipsalis cereuscula Haw (Cactaceae) growing attached to trees in the Inga Yard Conservation Reserve at Maringa town, Parana State, Brazil. The seeds were collected by hand from various fruits, manually extracted, and dried in the shade under ambient light and temperature in the Seed Laboratory of the Universidade Estadual de Maringa, Experimental Research Farm, at Iguatemi County, Parana State, Brazil. Dried seeds were stored in open plastic containers. One hundred seeds were germinated on three germitest papers in a plastic box (11 x 11 x 5 cm) placed in a Mangelsdorf seed germinator inside a germination room, both maintained at 20 °C. The data were collected at 8 h intervals, and every seed showing protrusion of the hypocotyl-radicle was counted as germinated.
The model was fitted to the number of germinated seeds, and the analyses followed the frequentist and Bayesian approaches, adopting the nonlinear Weibull model with three parameters to describe the seed germination curve. The logarithm function of the likelihood is described in  and .
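The sampling scheme (long chain, burn-in, thinning) can be sketched in Python with a random-walk Metropolis sampler. This is a swapped-in technique for illustration, not the BRugs sampler used in this study, and it rests on several assumptions: the three-parameter curve $M\{1 - \exp[-(t/b)^c]\}$, uniform priors as in Model $M_3$, a uniform(0, 100) prior on $\sigma$ instead of an inverse gamma prior on $\sigma^2$, synthetic data, and a much shorter chain. All function names, step sizes and starting values are hypothetical.

```python
import numpy as np


def weibull_curve(t, M, b, c):
    """Assumed three-parameter Weibull curve M * (1 - exp(-(t/b)^c))."""
    return M * (1.0 - np.exp(-(t / b) ** c))


def log_posterior(params, t, y, upper):
    """Log posterior up to a constant under independent uniform priors."""
    if np.any(params <= 0.0) or np.any(params >= upper):
        return -np.inf  # outside the support of the priors
    M, b, c, sigma = params
    resid = y - weibull_curve(t, M, b, c)
    return -t.size * np.log(sigma) - 0.5 * np.sum(resid ** 2) / sigma ** 2


def metropolis(t, y, start, steps, upper, n_iter, burn_in, thin, seed=0):
    """Random-walk Metropolis; returns the chain after burn-in and thinning."""
    rng = np.random.default_rng(seed)
    current = np.asarray(start, dtype=float)
    current_lp = log_posterior(current, t, y, upper)
    draws = np.empty((n_iter, current.size))
    for i in range(n_iter):
        proposal = current + steps * rng.normal(size=current.size)
        proposal_lp = log_posterior(proposal, t, y, upper)
        # Accept with probability min(1, posterior ratio).
        if np.log(rng.uniform()) < proposal_lp - current_lp:
            current, current_lp = proposal, proposal_lp
        draws[i] = current
    return draws[burn_in::thin]


# Synthetic illustration only (not the data of this study).
rng = np.random.default_rng(1)
t = 8.0 * np.arange(1, 93)
y = weibull_curve(t, 30.0, 430.0, 4.6) + rng.normal(0.0, 2.8, size=t.size)

upper = np.array([100.0, 1000.0, 10.0, 100.0])  # prior upper limits (M, b, c, sigma)
steps = np.array([0.4, 4.0, 0.2, 0.15])         # proposal standard deviations
start = np.array([28.0, 420.0, 4.0, 3.0])

sample = metropolis(t, y, start, steps, upper,
                    n_iter=20000, burn_in=2000, thin=20)
posterior_mean = sample.mean(axis=0)
print("retained draws:", sample.shape[0])
print("posterior means (M, b, c, sigma):", posterior_mean)
```

Discarding the burn-in removes the influence of the starting values, and thinning reduces the autocorrelation between retained draws; convergence diagnostics such as those in CODA would still be needed before using the sample for inference.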
Results and discussion
The hypothesis of normal errors was verified using the Kolmogorov-Smirnov, Cramér-von Mises and Anderson-Darling tests, whose p-values were higher than 10%.
After fitting the Weibull model, the estimated maximum number of germinated seeds was about 31, of which 63.21% had germinated by 429 h, with a spread of 4.64 (Table 1). The quality of the fit was diagnosed with the curvature measures, Box's bias and the relative bias (Table 1) (RATKOWSKY, 1983, 1990).
Considering Box's relative bias (Table 1), the values for the estimates of M and c were higher than 1%. Therefore, they are the most nonlinear parameters in this model. The limit for the curvatures is 0.8226 with p = 3 and n = 92 at 5% probability. Therefore, IN and PE were lower than 0.8226, validating the process of asymptotic inference.
The estimates that follow are the mean, standard error and ICr (95%), i.e., the 95% credible interval, of the parameters of the Weibull model fitted to the germination of Rhipsalis cereuscula Haw under each model.
According to Table 2, the maximum seed germination was about 31, and 63.21% of the seeds required 429 h for radicle protrusion. Therefore, the frequentist estimates are similar to the Bayesian ones when the vector of parameters has non-informative prior distributions. To fit models $M_4$ and $M_5$ we used $\mu$ estimated by the frequentist method, which may be a disadvantage of both models relative to $M_1$ and $M_3$. Also in Table 2, model $M_2$ has the highest DIC value, indicating the worst fit among the five models investigated.
The frequentist method required initial parameter values in proc nlin of SAS Institute Inc. (2008) and further analysis to verify the goodness of fit of the model. In contrast, the Bayesian method required only a prior distribution, which can be non-informative, for the vector of parameters. Except for the model with the normal prior distribution, the Bayesian estimates are similar to the frequentist ones and were obtained with less computational effort. Another advantage of the Bayesian approach is the possibility of trying several prior distributions and comparing them by the DIC.
The results of the Bayesian modeling corroborate the findings of De la Cruz-Mesia and Marshall (2003), who suggest that this is a worthwhile procedure: it allows adding prior information based on the experience of the researcher and is suitable for small samples, because it is not based on asymptotic theory. We also observed that the results from this seed germination experiment agree with the findings of Soliman et al. (2006), who compared the parameters of the Weibull model estimated by Bayesian and frequentist methods. As Soliman et al. (2006) concluded, the Bayes estimates obtained from this model are more accurate than the corresponding maximum likelihood estimates.
Conclusion
Modeling seed germination over the time to radicle protrusion can be done using either the frequentist or the Bayesian approach because both provide close estimates, but the Bayesian inference required less computational effort. Among the five models considered, the least appropriate was the one with the normal prior with zero mean and variance $10^6$.
Acknowledgements
The authors are grateful to Alan C. Secorun for providing the data set.
References
BATES, D. M.; WATTS, D. G. Relative curvature measures of nonlinearity. Journal of the Royal Statistical Society: Series B, v. 42, n. 1, p. 1-25, 1980.
BATES, D. M.; WATTS, D. G. Nonlinear regression analysis and its applications. New York: John Wiley and Sons, 1988.
BEST, N. G.; COWLES, M. K.; VINES, S. K. CODA: Convergence diagnostics and output analysis software for Gibbs sampler output. Version 0.3. Cambridge: MRC Biostatistics Unit, 1995.
BLASCO, A.; PILES, M.; VARONA, L. A Bayesian analysis of the effect of selection for growth rate on growth curves in rabbits. Genetics Selection Evolution, v. 35, n. 1, p. 21-41, 2003.
BRAIN, P.; BUTLER, R. C. Cumulative count data. Genstat Newsletter, v. 22, n. 1, p. 38-47, 1988.
BROWN, R. F. Germination of Aristida armata under constant and alternating temperatures and its analysis with cumulative Weibull distribution as a model. Australian Journal of Botany, v. 35, n. 5, p. 581-591, 1987.
BROWN, R. F.; MAYER, D. G. Representing cumulative germination 1. A critical analysis of single-value germination indices. Annals of Botany, v. 61, n. 2, p. 117-125, 1988a.
BROWN, R. F.; MAYER, D. G. Representing cumulative germination 2. The use of the Weibull function and other empirically derived curves. Annals of Botany, v. 61, n. 2, p. 127-138, 1988b.
CARNEIRO, J. W. P. Avaliacao do desempenho germinativo de acordo com os parametros da funcao de distribuicao de Weibull. Informativo Abrates, v. 4, n. 2, p. 75-83, 1994.
CARNEIRO, J. W. P. Determinacao do numero de sementes para avaliar o desempenho germinativo de sementes de Stevia rebaudiana Bertoni. Revista Brasileira de Sementes, v. 18, n. 1, p. 1-5, 1996.
CARNEIRO, J. W. P.; GUEDES, T. A. Influencia do estresse termico no desempenho germinativo de sementes de Stevia rebaudiana Bertoni avaliado pela funcao de distribuicao de Weibull. Revista Brasileira de Sementes, v. 17, n. 2, p. 210-216, 1995.
CARNEIRO, J. W. P.; GUEDES, T. A.; AMARAL, D.; BRACCINI, A. L. Analise exploratoria de percentuais germinativos obtidos com o envelhecimento artificial de sementes. Revista Brasileira de Sementes, v. 22, n. 2, p. 215-222, 2000.
DE LA CRUZ-MESIA, R.; MARSHALL, G. A Bayesian approach for nonlinear regression models with continuous errors. Communications in Statistics: Theory and Methods, v. 32, n. 8, p. 1631-1646, 2003.
HEIDELBERGER, P.; WELCH, P. Simulation run length control in the presence of an initial transient. Operations Research, v. 31, n. 6, p. 1109-1144, 1983.
HUNTER, E. A.; GLASBEY, C. A.; NAYLOR, R. E. L. The analysis of data from germination tests. Journal of Agricultural Science, v. 102, n. 1, p. 207-213, 1984.
MARTINS FILHO, S.; SILVA, F. F.; CARNEIRO, A. P. S.; MUNIS, J. A. Abordagem Bayesiana das curvas de crescimento de duas cultivares de feijoeiro. Ciencia Rural, v. 38, n. 6, p. 1516-1521, 2008.
O'BRIEN, E. J. A. Note on spurious nonlinear regression. Economics Letters, v. 100, n. 3, p. 366-368, 2008.
O'NEILL, M. E.; THOMSON, P. C.; JACOBS, B. C.; BRAIN, P.; BUTLER, R. C.; TURNER, H.; MITAKDA, B. Fitting and comparing seed germination models with a focus on the inverse normal distribution. Australian and New Zealand Journal of Statistics, v. 46, n. 3, p. 349-366, 2004.
PENA, D.; RODRIGUEZ, J. Detecting nonlinearity in time series by model selection criteria. International Journal of Forecasting, v. 21, n. 4, p. 731-748, 2005.
RATKOWSKY, D. Nonlinear regression modeling. New York/Basel: Marcel Dekker, 1983.
RATKOWSKY, D. Handbook of nonlinear regression models. New York/Basel: Marcel Dekker, 1990.
SAS Institute Inc. User's guide. Version 9.2. Cary: SAS Institute Inc., 2008.
SOLIMAN, A. A.; ELLAH, A. H. A.; SULTAN, K. S. Comparison of estimates using record statistics from Weibull model: Bayesian and non-Bayesian approaches. Computational Statistics and Data Analysis, v. 51, n. 3, p. 2065-2077, 2006.
SOUZA, G. S. Introducao aos modelos de regressao linear e nao-linear. Brasilia: Embrapa-SPEEmbrapa-SEA, 1998.
SPIEGELHALTER, D. J.; THOMAS, A.; BEST, N.; GILKS, W. BUGS - Bayesian inference using Gibbs sampling. Version 1.4.2. Cambridge: MRC Biostatistics Unit, 1994.
SPIEGELHALTER, D. J.; BEST, N. G.; VAN DER LINDE, A. Bayesian measures of model complexity and fit. Journal of the Royal Statistical Society: Series B, v. 64, n. 4, p. 583-639, 2002.
WEIBULL, W. A statistical distribution function of wide applicability. Journal of Applied Mechanics, v. 18, n. 1, p. 293-297, 1951.
Received on June 18, 2013.
Accepted on January 13, 2014.
License information: This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
Terezinha Aparecida Guedes (1) *, Robson Marcelo Rossi (1), Ana Beatriz Tozzo Martins (1), Vanderly Janeiro (1) and Jose Walter Pedroza Carneiro (2)
(1) Departamento de Estatistica, Universidade Estadual de Maringa, Av. Colombo, 5790, 87020-900, Maringa, Parana, Brazil. (2) Departamento de Agronomia, Universidade Estadual de Maringa, Maringa, Parana, Brazil. * Author for correspondence. E-mail: firstname.lastname@example.org
Table 1. Frequentist estimates of the Weibull parameters and measures of goodness of fit for the data describing the seed germination of Rhipsalis cereuscula Haw at 20 °C.

Parameter  Estimate  Standard error  CI (95%)             Amplitude
M          30.862    0.631           (29.609; 32.116)     2.507
b          429.011   6.333           (416.427; 441.594)   25.167
c          4.639     0.324           (3.994; 5.284)       1.290

IN (intrinsic curvature)  0.1328
PE (parameter curvature)  0.2218

Parameter  Bias of Box  Relative bias
M          0.0094       0.0303
b          -0.0088      0.0020
c          0.0249       0.5369

Table 2. Bayesian estimates of the parameters of the Weibull model describing the seed germination of Rhipsalis cereuscula Haw at 20 °C.

Model   Parameter  Mean     Standard error  ICr (95%)            Range   DIC
$M_1$   M          30.900   0.706           (29.560; 32.330)     2.770   455.6
        b          429.300  7.291           (415.600; 444.300)   27.700
        c          4.665    0.347           (4.031; 5.393)       1.362
        $\sigma$   2.823    0.214           (2.447; 3.279)       0.832
$M_2$   M          29.540   0.626           (28.300; 30.740)     2.440   463.8
        b          409.700  6.778           (396.100; 422.700)   26.600
        c          5.224    0.442           (4.463; 6.193)       1.730
        $\sigma$   2.958    0.243           (2.534; 3.479)       0.945
$M_3$   M          30.880   0.694           (29.560; 32.300)     2.740   455.6
        b          429.200  7.319           (415.200; 444.300)   29.100
        c          4.672    0.351           (4.036; 5.415)       1.379
        $\sigma$   2.824    0.212           (2.452; 3.276)       0.824
$M_4$   M          30.870   0.702           (29.560; 32.350)     2.790   455.6
        b          429.100  7.328           (415.500; 444.000)   28.500
        c          4.661    0.345           (4.018; 5.398)       1.380
        $\sigma$   2.824    0.213           (2.436; 3.273)       0.837
$M_5$   M          30.890   0.710           (29.570; 32.360)     2.790   455.6
        b          429.200  7.436           (415.500; 444.500)   29.000
        c          4.655    0.352           (4.022; 5.389)       1.367
        $\sigma$   2.819    0.213           (2.435; 3.290)       0.855