# Selected estimated models with [phi]-divergence statistics.

Abstract

When discriminating between two competing models, a statistical method usually proceeds by evaluating a measure of discrepancy between the observed data and each parametric model. The model with the smaller value of this statistic is generally chosen. This paper addresses the problem of testing for a choice between two estimated models using [phi]-divergence type statistics. Arbitrary [square root of n]-asymptotically normal estimators are allowed in constructing these statistics. Large-sample theory and bootstrap methods are used to construct our [phi]-divergence tests in parametric models, and the results are illustrated by a simulation study.

AMS Subject Classification: 62F03, 62F40, 62F05, 94A17.

Keywords: Asymptotic distributions, [phi]-divergence statistics, bootstrap methods, testing statistical hypotheses, goodness-of-fit tests.

1. Introduction

Cochran [6], Watson [34] and Moore [17], [18] have provided comprehensive surveys of Pearson chi-square type statistics, i.e., quadratic forms in the cell frequencies. Recently, Andrews [2], [3] has extended the Pearson chi-square testing method to non-dynamic parametric models, i.e., to models with covariates. Because Pearson chi-square statistics provide natural measures of the discrepancy between the observed data and a specific parametric model, they have also been used for discriminating among competing models. Such a situation is frequent in the social sciences, where many competing models are proposed to fit a given sample. A well-known difficulty is that each chi-square statistic tends to become large, without an increase in its degrees of freedom, as the sample size increases. As a consequence, goodness-of-fit tests based on Pearson type chi-square statistics will generally reject the correct specification of every competing model.

To circumvent this difficulty, a popular method for model selection, similar to the use of the Akaike [1] Information Criterion (AIC), is to consider that the lower the chi-square statistic, the better the model.

The preceding selection rule, however, is not entirely satisfactory. Since chi-square statistics depend on the sample and are therefore random, their actual values are subject to statistical variation. We therefore propose some convenient asymptotically standard normal tests for model selection based on [phi]-divergence type statistics. By analogy with the approach introduced by Vuong [32], our tests take as the null hypothesis that the competing models are equally close to the data generating process (DGP), where the closeness of a model is measured according to the discrepancy implicit in the [phi]-divergence type statistic.

Following Morales and Pardo [21], let {[P.sub.[theta]] : [theta] [member of] [THETA]} be a family of probability measures on a measurable space (X, [[beta].sub.X]) with open [THETA] [subset] [R.sup.d], d [greater than or equal to] 1. The measures [P.sub.[theta]] are described by probability density functions (p.d.f.) [f.sub.[theta]](x) = (d[P.sub.[theta]]/d[mu])(x) with respect to a dominating [sigma]-finite measure [mu] on X. The sample space X is the support of the [sigma]-finite measure [mu]. The statistical model ((X, [[beta].sub.X]), {[P.sub.[theta]] : [theta] [member of] [THETA]}, [mu]) satisfies the regularity assumptions (R1)-(R3) given on pages 144-145 of Serfling [27] and the identifiability condition (R4): if [f.sub.[[theta].sub.1]] = [f.sub.[[theta].sub.2]], then [[theta].sub.1] = [[theta].sub.2].

If [[theta].sub.0] is the true value of the parameter [theta] and (R1)-(R4) hold, then there exists a strongly consistent sequence [[hat.[theta]].sub.n] of roots of the likelihood equations such that

$$\sqrt{n}\,(\hat{\theta}_n - \theta_0) \xrightarrow{\;L\;} N\bigl(0,\, I_F(\theta_0)^{-1}\bigr), \tag{1.1}$$

where [I.sub.F]([[theta].sub.0]) is the Fisher information matrix and [[hat.[theta]].sub.n] is assumed to be the maximum likelihood estimator (MLE).

We consider testing procedures based on a sequence of observations [X.sub.n] = ([X.sub.1], [X.sub.2], ..., [X.sub.n]) with independent components drawn from a p.d.f. of the family {[f.sub.[theta]] : [theta] [member of] [THETA]}.

Recently, many papers have appeared in which divergence-type measures of information are used in testing statistical hypotheses. We refer, among others, to Cressie and Read [8], Nayak [22], Zografos, Ferentinos and Papaioannou [33], Salicru, Morales, Menendez and Pardo [23], Bar-Hen and Daudin [5] and references therein. Salicru et al. [26] introduced the divergence statistic [S.sub.[phi],n] [equivalent to] 2n[C.sub.[phi]]([[hat.[theta]].sub.n], [[theta].sub.0]) where

$$C_{\phi}(\theta_1, \theta_2) = \int_{X} f_{\theta_2}(x)\, \phi\!\left(\frac{f_{\theta_1}(x)}{f_{\theta_2}(x)}\right) d\mu(x) \tag{1.2}$$

is the [phi]-divergence between densities of the family {[f.sub.[theta]] : [theta] [member of] [THETA]} introduced by Csiszar [9]. Liese and Vajda [15] have developed a systematic theory of these divergences.

Morales et al. [16] have established that [S.sub.[phi],n] converges in distribution to [[chi square].sub.d]. An important problem is to propose divergence statistics for testing procedures.

The asymptotic behavior of the statistics based on [C.sub.[phi]]([[hat.[theta]].sub.n], [[theta].sub.0]) is needed for choosing between two estimated models. To obtain a testing procedure, we present a new method based on the divergence statistic.

The paper is organized as follows. Section 2 introduces the basic notation and defines a class of asymptotically normal estimators. In Section 3, we investigate the model selection problem based on divergence type statistics and propose a large sample test. In Section 4, Efron's [10] bootstrap method is used to propose alternative and simpler testing procedures for model selection. In Section 5, some simulation results are given. Section 6 concludes the paper and mentions some extensions.

2. Assumption and Asymptotic Behavior of the Divergence Statistic

Assumption (A1):

(i) The function [phi] : [0, +[infinity][ [right arrow] ]-[infinity], +[infinity]] is convex and continuous. Its restriction to ]0, +[infinity][ is finite, twice continuously differentiable, with [phi](1) = [phi]'(1) = 0 and [phi]"(1) = 1;

(ii) Each [[theta].sub.0] [member of] [THETA] has an open neighborhood V([[theta].sub.0]) such that, for 1 [less than or equal to] i, j [less than or equal to] d, the following holds:

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII.]

Condition (i) deals with properties of the [phi]-divergence (cf. Liese and Vajda [15]).

Condition (ii) is needed to apply the delta method to obtain the asymptotic distributions of [phi]-statistics. Sufficient conditions for (ii) are presented in Morales et al. [19].
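As a concrete check of condition (i), one can take the generator of the Kullback-Leibler divergence. This particular [phi] is chosen here for illustration and is not singled out in the text; the last line assumes the usual Csiszar form of the divergence, with the second density as the weight.

```latex
\begin{aligned}
\phi(t) &= t\log t - t + 1, \qquad \phi'(t) = \log t, \qquad \phi''(t) = \frac{1}{t} > 0 \quad (t > 0), \\
\phi(1) &= 0, \qquad \phi'(1) = 0, \qquad \phi''(1) = 1, \\
\int f_{\theta_2}\,\phi\!\left(\frac{f_{\theta_1}}{f_{\theta_2}}\right) d\mu
  &= \int f_{\theta_1}\log\frac{f_{\theta_1}}{f_{\theta_2}}\,d\mu - \int f_{\theta_1}\,d\mu + \int f_{\theta_2}\,d\mu
   = \int f_{\theta_1}\log\frac{f_{\theta_1}}{f_{\theta_2}}\,d\mu .
\end{aligned}
```

The two integrals of the densities cancel because each equals one, so this choice of [phi] recovers the Kullback-Leibler divergence.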

Assume that (R1)-(R4) and (A1) hold. Under [H.sub.o] : [theta] [member of] [[THETA].sub.o] [subset] [THETA], we present the asymptotic distribution of [C.sub.[phi]]([[hat.[theta]].sub.n], [[theta].sub.o]).

Theorem 2.1. Let the model and [phi] satisfy (R1)-(R4) and (A1), respectively. Let [theta] be the true parameter, with [theta] [not equal to] [[theta].sub.o]. Then we have

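$$\sqrt{n}\,\bigl[C_{\phi}(\hat{\theta}_n, \theta_o) - C_{\phi}(\theta, \theta_o)\bigr] \xrightarrow{\;L\;} N\bigl(0,\, \Sigma_{\phi}^{2}(\theta, \theta_o)\bigr),$$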

where [[SIGMA].sup.2.sub.[phi]]([theta], [[theta].sub.o]) = A[I.sub.F][([theta]).sup.-1][A.sup.t] and A = [DELTA][C.sub.[phi]]([theta], [[theta].sub.o]), with the gradient operator [DELTA] = ([partial derivative]/[partial derivative][[theta].sub.1], ..., [partial derivative]/[partial derivative][[theta].sub.d]).

Proof. A first order Taylor expansion gives

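$$C_{\phi}(\hat{\theta}_n, \theta_o) = C_{\phi}(\theta, \theta_o) + A\,(\hat{\theta}_n - \theta)^{t} + o\bigl(\lVert \hat{\theta}_n - \theta \rVert\bigr).$$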

As

$$\sqrt{n}\,(\hat{\theta}_n - \theta) \xrightarrow{\;L\;} N\bigl(0,\, I_F(\theta)^{-1}\bigr),$$

it is clear that the random variables [square root of n][[C.sub.[phi]]([[hat.[theta]].sub.n], [[theta].sub.o]) - [C.sub.[phi]]([theta], [[theta].sub.o])] and A[square root of n][([[hat.[theta]].sub.n] - [theta]).sup.t] have the same asymptotic distribution, because

[square root of n] o([parallel][[hat.[theta]].sub.n] - [theta][parallel]) = [o.sub.p](1).
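The form of the limiting variance can be spelled out in one line. The following is a sketch that uses only the asymptotic normality of the estimator at the true [theta] and the fact that a linear map of a normal vector is normal:

```latex
\sqrt{n}\,(\hat{\theta}_n - \theta)^{t} \xrightarrow{\;L\;} N\bigl(0,\, I_F(\theta)^{-1}\bigr)
\quad\Longrightarrow\quad
A\,\sqrt{n}\,(\hat{\theta}_n - \theta)^{t} \xrightarrow{\;L\;}
N\bigl(0,\, A\, I_F(\theta)^{-1} A^{t}\bigr) = N\bigl(0,\, \Sigma_{\phi}^{2}(\theta, \theta_o)\bigr).
```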

3. Selecting Estimated Models

As mentioned earlier, divergence type statistics can be used to discriminate among alternative models.

Let h be the true probability density of the observations [X.sub.n] = ([X.sub.1], ..., [X.sub.n]). We consider a specified model [F.sub.[theta]] = {F(.|[theta]); [theta] [member of] [THETA] [subset] [R.sup.k]} with [f.sub.[theta]](x) as the probability density function. We define the discrepancy between the observations and the model [F.sub.[theta]] as follows:

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII.]

Of special interest to us is the situation in which a researcher has two competing parametric models [F.sub.[theta]] and [G.sub.[gamma]] = {G(.|[gamma]); [gamma] [member of] [GAMMA] [subset] [R.sup.k]} and wishes to select the better of the two based on their discrimination statistics [D.sub.n]([[hat.[theta]].sub.n]) = [C.sub.[phi]](h, [f.sub.[[hat.[theta]].sub.n]]) and [D.sub.n]([[hat.[gamma]].sub.n]) = [C.sub.[phi]](h, [g.sub.[[hat.[gamma]].sub.n]]), where [[hat.[theta]].sub.n] and [[hat.[gamma]].sub.n] are general estimators satisfying condition (1.1).

Definition 3.1. (Equivalent, Better and Worse) Consider two competing models [F.sub.[theta]] and [G.sub.[gamma]] and some discrimination type statistics [D.sub.n]([[hat.[theta]].sub.n]) and [D.sub.n]([[hat.[gamma]].sub.n]), where [[hat.[theta]].sub.n] and [[hat.[gamma]].sub.n] are general estimators satisfying condition (1.1). Let D(x) be the probability limit of [square root of n][D.sub.n](x).

The hypotheses

[H.sub.o] : D([[theta].sub.o]) = D([[gamma].sub.o])

[H.sub.f] : D([[theta].sub.o]) < D([[gamma].sub.o])

[H.sub.g] : D([[theta].sub.o]) > D([[gamma].sub.o])

mean that the estimated models F(x|[[theta].sub.o]) and G(x|[[gamma].sub.o]) are equivalent, that F(x|[[theta].sub.o]) is better than G(x|[[gamma].sub.o]), and that F(x|[[theta].sub.o]) is worse than G(x|[[gamma].sub.o]), respectively.

Definition 3.1 calls for some remarks. First, it does not require that the same divergence type statistic be used in forming [D.sub.n]([[hat.[theta]].sub.n]) and [D.sub.n]([[hat.[gamma]].sub.n]). Choosing different discrepancies for evaluating competing models is, however, hardly justified. Second and more importantly, it allows estimators other than the matching divergence estimators to be used.

In any case, since [[hat.[theta]].sub.n] and [[hat.[gamma]].sub.n] are consistent estimators of [[theta].sub.o] and [[gamma].sub.o] by condition (1.1), we can use, from theorem 3.1, [square root of n]{[C.sub.[phi]](h, [f.sub.[[hat.[theta]].sub.n]]) - [C.sub.[phi]](h, [g.sub.[[hat.[gamma]].sub.n]])} to consistently estimate the indicator [C.sub.[phi]](h, [f.sub.[[theta].sub.o]]) - [C.sub.[phi]](h, [g.sub.[[gamma].sub.o]]), which is zero under the null hypothesis [H.sub.o]. Using a standard Taylor expansion, we can obtain the asymptotic distribution of [square root of n]{[C.sub.[phi]](h, [f.sub.[[hat.[theta]].sub.n]]) - [C.sub.[phi]](h, [g.sub.[[hat.[gamma]].sub.n]])}, which is normal with zero mean and variance [[omega].sup.2] under [H.sub.o]. The detailed derivation and the expression for [[omega].sup.2] can be found in the proof of Theorem 3.2.

Hence we define the statistic

$$DI_n = \frac{\sqrt{n}\,\bigl\{C_{\phi}(h, f_{\hat{\theta}_n}) - C_{\phi}(h, g_{\hat{\gamma}_n})\bigr\}}{\hat{\omega}} \tag{3.1}$$

where [[hat.[omega]].sup.2] is a consistent estimator of [[omega].sup.2] (DI stands for Divergence Indicator).

We have,

Theorem 3.2. (Asymptotic Distribution of the DI Statistic) Given H1-H4,

(i) under the null hypothesis [H.sub.o], [DI.sub.n] [right arrow] N(0, 1) in distribution,

(ii) under the alternative [H.sub.f], [DI.sub.n] [right arrow] -[infinity] in probability,

(iii) under the alternative [H.sub.g], [DI.sub.n] [right arrow] +[infinity] in probability.

Proof.

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII.]

By difference, it follows that:

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII.]

From the multivariate central limit theorem and assumption (A1), we can now immediately obtain the asymptotic distribution of

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII.]

under the null hypothesis of equivalence [H.sub.o].

Define:

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII.]

with A = [DELTA][C.sub.[phi]]([theta], [[theta].sub.o]) and B = [DELTA][C.sub.[phi]]([gamma], [[gamma].sub.o]).

Let [[omega].sup.2] = T[LAMBDA][T.sup.t]. We then have

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII.]

Remark 3.3. Note that some important measures of divergence cannot be written as [phi]-divergences; for instance, the divergence measures given by Bhattacharyya, Sharma-Mittal and Renyi. However, such measures can be written in the following form:

[C.sub.[phi],h]([[theta].sub.1], [[theta].sub.2]) = h([C.sub.[phi]]([[theta].sub.1], [[theta].sub.2]))

where h is a differentiable increasing function mapping from [0, +[infinity][ onto [0,+[infinity][, with h(0) = 0 and h'(x) > 0.
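As an illustration of this representation, Renyi's divergence of order [alpha] (with [alpha] different from 0 and 1) can be obtained as follows. The normalization 1/([alpha]([alpha] - 1)) and the particular generator written below are the ones commonly used in this literature; they are assumptions here rather than formulas taken from the text.

```latex
\begin{aligned}
\phi_{\alpha}(x) &= \frac{x^{\alpha} - \alpha(x - 1) - 1}{\alpha(\alpha - 1)},
\qquad \phi_{\alpha}(1) = \phi_{\alpha}'(1) = 0, \quad \phi_{\alpha}''(1) = 1, \\
C_{\phi_{\alpha}}(\theta_1, \theta_2) &= \frac{1}{\alpha(\alpha - 1)}
  \left( \int f_{\theta_1}^{\alpha}\, f_{\theta_2}^{1 - \alpha}\, d\mu - 1 \right), \\
h(x) &= \frac{1}{\alpha(\alpha - 1)} \log\bigl(\alpha(\alpha - 1)\,x + 1\bigr),
\qquad h(0) = 0, \quad h'(x) = \frac{1}{\alpha(\alpha - 1)\,x + 1} > 0, \\
h\bigl(C_{\phi_{\alpha}}(\theta_1, \theta_2)\bigr) &= \frac{1}{\alpha(\alpha - 1)}
  \log \int f_{\theta_1}^{\alpha}\, f_{\theta_2}^{1 - \alpha}\, d\mu .
\end{aligned}
```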

We present these divergence measures, in the following table.

Theorem 3.2 is quite general and gives us a wide variety of asymptotically standard normal tests for model selection based on divergence type statistics. Parts (ii) and (iii) also imply that the test is consistent. In the next section, we detail the testing procedures based on Theorem 3.2 by using bootstrap methods.

4. Bootstrap Methods

Implementation of the model selection procedure proposed in section 3 requires the following computations:

(i) Estimation of the parameters [[hat.[theta]].sub.n] and [[hat.[gamma]].sub.n],

(ii) Computation of the two divergence statistics [D.sub.n]([[hat.[theta]].sub.n]) and [D.sub.n]([[hat.[gamma]].sub.n]) and of the difference [[??].sub.n] [equivalent to] [square root of n][[D.sub.n]([[hat.[theta]].sub.n]) - [D.sub.n]([[hat.[gamma]].sub.n])],

(iii) Computation of the estimated variance [[hat.[omega]].sup.2] of [[??].sub.n] and, finally, computation of [DI.sub.n] [equivalent to] [[??].sub.n]/[hat.[omega]].

Specifically, we carry out the following steps:

1) Let [F.sub.n] be the empirical probability distribution of the original data [x.sub.1], [x.sub.2], ... , [x.sub.n] i.e., [F.sub.n] : mass 1/n at [x.sub.i], (i = 1, 2, ... , n):

$$F_n(x) = \frac{1}{n} \sum_{i=1}^{n} \mathbf{1}\{x_i \le x\}.$$

Then draw an i.i.d "bootstrap sample" [x.sup.*.sub.1], [x.sup.*.sub.2], ... , [x.sup.*.sub.n] from [F.sub.n], i.e., draw [x.sup.*.sub.i] randomly with replacement from the observed values [x.sub.1], [x.sub.2], ... , [x.sub.n],

2) Using this bootstrap sample [x.sup.*.sub.i], estimate the competing models to obtain [[hat.[theta]].sup.*.sub.n] and [[hat.[gamma]].sup.*.sub.n]. Then calculate the statistic

[[??].sup.*.sub.n] [equivalent to] [square root of n][[D.sub.n]([[hat.[theta]].sup.*.sub.n]) - [D.sub.n]([[hat.[gamma]].sup.*.sub.n])],

3) Independently repeat steps 1 and 2 a large number of times S, say S = 1000. Obtain the "bootstrap replications" [[??].sup.*1.sub.n], [[??].sup.*2.sub.n], ... , [[??].sup.*S.sub.n], and compute the sample variance of {[[??].sup.*j.sub.n], j = 1, ..., S}.

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII.],

where [bar.B] = 1/S [S.summation over (j=1)] [[??].sup.*j.sub.n] is the average of "bootstrap replications".

Once the bootstrap variance [[hat.[omega]].sup.2.sub.*] is obtained, the test statistic [DI.sub.n] is calculated easily using the initial estimates [[hat.[theta]].sub.n] and [[hat.[gamma]].sub.n]. Under suitable regularity conditions and for a large number of replications [10], [[hat.[omega]].sup.2.sub.*] is a consistent estimator of [[omega].sup.2].
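Steps 1) to 3) can be sketched in a few lines of Python. This is a minimal illustration, not the code used for the study: the names `bootstrap_variance` and `delta_stat` are hypothetical, `delta_stat` stands for any function that re-estimates both models on a sample and returns the scaled difference of the two divergence statistics, and the divisor S - 1 in the sample variance is an assumption.

```python
import math
import random


def bootstrap_variance(data, delta_stat, S=1000, seed=0):
    """Bootstrap estimate of the variance of delta_stat (steps 1 to 3)."""
    rng = random.Random(seed)
    n = len(data)
    replications = []
    for _ in range(S):
        # Step 1: draw n points with replacement from the empirical distribution.
        resample = [data[rng.randrange(n)] for _ in range(n)]
        # Step 2: recompute the scaled difference on the bootstrap sample.
        replications.append(delta_stat(resample))
    # Step 3: sample variance of the S replications (divisor S - 1 is assumed).
    mean = sum(replications) / S
    return sum((r - mean) ** 2 for r in replications) / (S - 1)


# Toy usage: sqrt(n) * sample mean stands in for the scaled divergence difference.
data = [0.0, 1.0, 2.0, 3.0, 4.0] * 20
toy_stat = lambda s: math.sqrt(len(s)) * sum(s) / len(s)
omega2_star = bootstrap_variance(data, toy_stat, S=1000, seed=1)
```

In the toy usage the bootstrap variance should be close to 2, the variance of the empirical distribution of `data`. For the test itself, the difference computed on the original sample would be divided by the square root of the returned value.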

Thus, from Theorem 3.2, a testing procedure for model selection can be based on the comparison of the value of [DI.sub.n] with critical values from a standard normal table. For example, at the 5% significance level, we compare [DI.sub.n] with -1.96 and 1.96. If [DI.sub.n] falls between -1.96 and 1.96, we conclude that both estimated models fit the data equally well. If [DI.sub.n] is less than -1.96 (or larger than 1.96), then we reject the null hypothesis in favor of the alternative hypothesis that the estimated model F(x|[[hat.[theta]].sub.n]) (or G(x|[[hat.[gamma]].sub.n])) is closer to the true distribution.
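The decision rule just described can be written as a small function. The sketch below assumes the value of [DI.sub.n] has already been computed; the function name `select_model` and the returned labels are hypothetical.

```python
from statistics import NormalDist


def select_model(di_n, level=0.05):
    """Two-sided decision rule based on the N(0, 1) limit of DI_n."""
    z = NormalDist().inv_cdf(1 - level / 2)  # about 1.96 when level = 0.05
    if di_n < -z:
        return "F"  # reject H_o in favor of H_f: model F is closer
    if di_n > z:
        return "G"  # reject H_o in favor of H_g: model G is closer
    return "equivalent"  # H_o is not rejected
```

For instance, `select_model(-2.5)` favors F, while `select_model(0.3)` does not reject the hypothesis that the two models are equivalent.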

Although the bootstrap method is used to obtain an estimate of [[omega].sup.2], the basic justification of the preceding test comes from the asymptotic properties obtained in Theorem 3.2.

5. Numerical Study

We briefly present the basic assumptions on the model and parameter estimators, and we define our general divergence type statistics. Assumption (A2): The observed data [X.sub.i], i = 1, ..., n, are independent and identically distributed (i.i.d.) with some common true distribution H.

The sample space X is partitioned into M mutually disjoint fixed cells [C.sub.1], [C.sub.2], ... , [C.sub.M]. Let n be the sample size. Corresponding to the partition [C.sub.1], [C.sub.2], ... , [C.sub.M] we can compute the vector of observed cell probabilities

f = [([f.sub.1], [f.sub.2], ... , [f.sub.M]).sup.t] where [f.sub.i] = 1/n [n.summation over (j=1)] [I.sub.[C.sub.i]]([X.sub.j]), for i = 1, 2, ... , M, (5.1)

and [I.sub.[C.sub.i]]([X.sub.j]) is the indicator function:

$$I_{C_i}(X_j) = \begin{cases} 1 & \text{if } X_j \in C_i, \\ 0 & \text{otherwise.} \end{cases}$$
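The observed cell probabilities in (5.1) amount to counting the fraction of observations in each cell. The following sketch assumes that the cells are consecutive intervals given by a sorted list of edges and that each cell is closed on the right; both the function name `cell_frequencies` and this convention are illustrative choices, not taken from the text.

```python
import math
from bisect import bisect_left


def cell_frequencies(sample, edges):
    """Fractions of the sample in the cells (edges[i], edges[i + 1]]."""
    n = len(sample)
    counts = [0] * (len(edges) - 1)
    for x in sample:
        i = bisect_left(edges, x) - 1  # index of the cell containing x
        if 0 <= i < len(counts):  # observations outside every cell are skipped
            counts[i] += 1
    # Divide by the full sample size n, as in (5.1).
    return [c / n for c in counts]


f = cell_frequencies([0.5, 1.0, 1.5, 5.0], [0.0, 1.0, 2.0, math.inf])
```

Here two of the four observations fall in the first cell and one in each of the other two.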

Let a specified model be [H.sub.[theta]] = {H(x|[theta]), [theta] [member of] [THETA] [subset] [[??].sup.d]} and denote the vector of its predicted cell probabilities by:

h([theta]) = [([h.sub.1]([theta]), [h.sub.2]([theta]), ... , [h.sub.M]([theta])).sup.t] where [h.sub.i]([theta]) = [[integral].sub.[C.sub.i]] dH(x|[theta]),

and H(x|[theta]) is the distribution function of [X.sub.i] under the model.

We suppose [h.sub.i]([theta]) > 0 and [h.sub.i]([theta]) is continuously differentiable (Assumption A1) for every i = 1, 2,..., M.

To illustrate the model selection procedure in the preceding section, we consider an example. We need to define the competing models, and the divergence type statistic to measure the departure of each proposed parametric model from the data generating process.

Here, we choose an important measure of divergence given by Renyi [25], which can be written in the following form:

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII.]

and limiting cases for [alpha] = 0 and [alpha] = 1. That is,

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII.]

and

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII.]

which is the Kullback-Leibler divergence.

When [f.sub.[[theta].sub.1]] and [f.sub.[[theta].sub.2]] are discrete probability distributions, their Renyi divergence is

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII.] (5.2)
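For discrete distributions the computation is short. The sketch below assumes the normalization 1/([alpha]([alpha] - 1)) in front of the logarithm, which is common in this literature but is an assumption here, and it assumes all cell probabilities are strictly positive. The function name `renyi_divergence` is hypothetical. The limiting cases [alpha] = 1 and [alpha] = 0 are the two Kullback-Leibler divergences.

```python
import math


def renyi_divergence(p, q, alpha):
    """Renyi divergence of order alpha between discrete distributions p and q.

    Assumed form: log(sum_i p_i**alpha * q_i**(1 - alpha)) / (alpha * (alpha - 1)).
    All entries of p and q are assumed to be strictly positive.
    """
    if alpha == 1:
        # Limit alpha -> 1: Kullback-Leibler divergence of p from q.
        return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))
    if alpha == 0:
        # Limit alpha -> 0: Kullback-Leibler divergence of q from p.
        return sum(qi * math.log(qi / pi) for pi, qi in zip(p, q))
    s = sum(pi ** alpha * qi ** (1 - alpha) for pi, qi in zip(p, q))
    return math.log(s) / (alpha * (alpha - 1))


d_half = renyi_divergence([0.5, 0.5], [0.9, 0.1], 0.5)
```

The divergence is zero when the two distributions coincide and positive otherwise; `d_half` above is positive.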

In statistical literature, the problem of choosing between the family of log-normal distributions and the family of exponential distributions has a long history. See [7] and [4] among others.

The log-normal distribution is parameterized by r = ([r.sub.1], [r.sub.2]) and has density

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII.],

and zero otherwise.

The exponential distribution with parameter [beta] has density

g(x|[beta]) = 1/[beta] exp (-x/[beta]) for x > 0

and zero otherwise.

The estimator used for each competing model is the maximum likelihood estimator (MLE). Specifically, for the log-normal model,

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII.] and [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII.]

For the exponential model, the MLE is the sample average, i.e.,

[hat.[beta]] = 1/n [n.summation over (i=1)] [x.sub.i].

Lastly, we use Renyi's divergence measure (5.2) to evaluate the discrepancy of a proposed model from the true data generating process. We partition the real line into M intervals {([a.sub.i-1], [a.sub.i]), i = 1, ... , M}, where each [a.sub.i] is a real number. The choice of the cells is discussed below. The Renyi statistics for the log-normal and exponential models are:

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII.]

and

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII.]

where [h.sub.i](r) and [g.sub.i]([beta]) are the probabilities of the interval ([a.sub.i-1], [a.sub.i]) under h(x|r) and g(x|[beta]), respectively, and f is the vector of observed cell probabilities defined in (5.1).
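For the exponential model, both the estimator and the predicted cell probabilities have closed forms, since the probability of ([a.sub.i-1], [a.sub.i]) under g(x|[beta]) is exp(-[a.sub.i-1]/[beta]) - exp(-[a.sub.i]/[beta]). The sketch below uses hypothetical function names and assumes nonnegative, increasing cell edges.

```python
import math


def exponential_mle(sample):
    """MLE of beta for the density (1/beta) exp(-x/beta): the sample mean."""
    return sum(sample) / len(sample)


def exponential_cell_probs(beta, edges):
    """Probability of each cell (edges[i], edges[i + 1]) under g(x | beta)."""
    # Survival function exp(-a / beta) at each edge; it is 0.0 at a = +infinity.
    survival = [math.exp(-a / beta) for a in edges]
    return [survival[i] - survival[i + 1] for i in range(len(edges) - 1)]


beta_hat = exponential_mle([1.0, 2.0, 3.0])
probs = exponential_cell_probs(1.0, [0.0, 1.0, math.inf])
```

With edges starting at 0 and ending at infinity the cell probabilities sum to one; with a first edge above 0 they sum to the probability of exceeding that edge.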

In our Monte Carlo study, we consider various sets of experiments in which the data are generated from a mixture of an exponential distribution and a log-normal distribution. These two distributions are calibrated so that they have the same population means and variances, namely one and one. Hence the data generating process has density

d(p) = p Exponential (1) + (1 - p) Log-normal (-0.047, 0.5)

where p is set to some specific value for each set of experiments. In each set of experiments, several random samples are drawn from this mixture of distributions. The sample size varies from 100 to 1,000, and for each sample size the number of replications is 1,000.
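Drawing from the mixture d(p) can be sketched as follows. The function name `sample_mixture` is hypothetical, and the sketch assumes that the second log-normal parameter, 0.5, is the standard deviation of log X; if it denotes the variance instead, its square root should be passed as `sigma`.

```python
import random


def sample_mixture(n, p, mu=-0.047, sigma=0.5, seed=0):
    """Draw n points from p * Exponential(1) + (1 - p) * Log-normal(mu, sigma)."""
    rng = random.Random(seed)
    return [
        # With probability p draw from the exponential component,
        # otherwise from the log-normal component.
        rng.expovariate(1.0) if rng.random() < p else rng.lognormvariate(mu, sigma)
        for _ in range(n)
    ]


sample = sample_mixture(1000, 0.41, seed=1)
```

Setting p = 1.0 or p = 0.0 gives the two correctly specified cases, since the draws then come from a single component.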

Throughout, the chosen partition has four cells defined by the values [a.sub.0] = 1.0, [a.sub.1] = 1.5, [a.sub.2] = 2.0, [a.sub.3] = 3.0, and [a.sub.4] = +[infinity]. As with minimum chi-square methods, because the log-normal distribution has two parameters, four is the minimum number of cells for which a perfect fit is not always achieved. Note also that the shapes of the log-normal and exponential densities differ greatly around the origin. This motivates the choice of [a.sub.0] = 1.0. The value [alpha] = 0.5 in (5.2) corresponds, approximately, to the common density function on [1, +[infinity][ under the null hypothesis [H.sub.o] (see Figure 1-c).

[FIGURE 1 OMITTED]

We choose five different values for p which are: 0.00, 0.25, 0.41, 0.75 and 1.00. Although our proposed model selection procedure does not require that the data generating process belong to either of the competing models, we consider the two limiting cases p = 0.00 and p = 1.00 for they correspond to the correctly specified cases.

The value p = 0.41 is determined to be the value for which the estimated log-normal distribution and the estimated exponential distribution are approximately at equal distance from the mixture d(p) according to Renyi's divergence. Thus this set of experiments corresponds approximately to the null hypothesis of our proposed model selection test [DI.sub.n].

The results of our five sets of experiments are presented in Tables 1-5. The first half of each table gives the average values of the ML estimators [hat.[beta]], [[hat.r].sub.1], and [[hat.r].sub.2], the divergence goodness-of-fit statistics [D.sub.n]([[hat.r].sub.n]) and [D.sub.n]([[hat.[beta]].sub.n]), and the model selection statistic [DI.sub.n] with its bootstrap estimated variance [[hat.[omega]].sup.2.sub.*]. The values in parentheses are standard errors. The second half of each table gives the percentage of times our proposed model selection procedure, based on the method described in the previous section, favors the log-normal model, favors the exponential model, or is indecisive. The tests are conducted at the 5% nominal level.

In the two sets of experiments (p = 0 and p = 1) where one model is correctly specified, we use the labels "correct" and "incorrect" when a choice is made. This allows a comparison with the asymptotic N(0, 1) approximation under our null hypothesis of equivalence.

Tables 1 and 5 report the cases in which one model is correctly specified. It is well known that the MLE is consistent for the true parameter value under correct specification.

For example, in Table 1, the log-normal model is correctly specified, and the MLE of r = ([r.sub.1], [r.sub.2]) approaches the true value [r.sub.o] = (-0.047, 0.5) as the sample size increases from 100 to 1000. The bootstrap estimator of [omega] also converges as the sample size becomes larger. The test statistic for model selection [DI.sub.n] increases approximately at the rate [square root of n]. In Table 5, where the exponential model is correctly specified, one can observe similar results.

The second half of Table 1 summarizes the results for our model selection procedure. The method performs quite well and selects the correct model almost 100% of the time, as expected.

For Tables 2, 3 and 4, the data was generated neither from the log-normal model nor from the exponential model, but from a mixture of these two models. Hence, the log-normal and the exponential model are both incorrectly specified.

In Table 3, the data generating process is chosen such that the log-normal model and the exponential model are approximately equally close to it. The test statistic [DI.sub.n] is expected to have a limiting standard normal distribution N(0, 1). This is roughly confirmed in Table 3. For example, for n = 1000, [DI.sub.n] has mean -0.044 and standard error 0.910.

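From our limited Monte Carlo study, one can observe that the test statistic for model selection [DI.sub.n] works relatively well and that, when the two models are approximately equally close to the data generating process, it concludes that they fit the data equally well with a probability of around 95%.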

6. Discussion

In summary, by analogy with classical chi-square type statistics, we have introduced divergence measures and proposed some convenient asymptotically standard normal tests for model selection based on divergence type statistics that use estimators in a quite general class. The tests are designed to determine whether the estimated competing models are equally close to the true distribution, against the alternative hypothesis that one estimated model is closer, where closeness is measured according to the discrepancy implicit in the divergence type statistic used. The divergence statistic measuring the discrepancy between the observed data and a specific parametric model is computed numerically, and bootstrap methods are used to evaluate the estimator of the asymptotic variance of our test statistic.

Several Monte Carlo experiments were conducted and showed that our procedure performs relatively well. Our work can be used to compare the power of test statistics for model selection based on other types of information measures.

References

[1] Akaike H., 1973, Information Theory and an Extension of the Maximum Likelihood Principle. Proceedings of the Second International Symposium on Information Theory, Ed. by Petrov, B.N. and Csaki, F. Budapest: Akademiai Kiado, pp. 257-281.

[2] Andrews D.W.K., 1988a, Chi-Square Diagnostic Tests for Econometric Models: Theory, Econometrica, 56, pp. 1419-1453.

[3] Andrews D.W.K., 1988b, Chi-Square Diagnostic Tests for Econometric Models: Introduction and Applications, Journal of Econometrics, 37, pp. 135-156.

[4] Atkinson A.C., 1970, A Method for Discriminating Between Models, Journal of Royal Statistical Society, Series B, 32, pp. 323-353.

[5] Bar-Hen A. and Daudin J.J., 1995, Generalization of the Mahalanobis distance in the mixed case, Journal of Multivariate Analysis, 53, pp. 332-342.

[6] Cochran W.G., 1952, The [chi square] Test of Goodness of Fit, Ann. Math. Statist., 23, pp. 315-345.

[7] Cox D.R., 1962, Further Results on Tests of Separate Families of Hypotheses, Journal of the Royal Statistical Society, Series B, 24, pp. 406-421.

[8] Cressie N. and Read T.R.C., 1984, Multinomial goodness of fit tests, Journal of the Royal Statistical Society, Series B, 46, pp. 440-464.

[9] Csiszar I., 1967, Information-type measures of difference of probability distributions and indirect observations, Studia Sci. Math. Hung., pp. 299-318.

[10] Efron B., 1982, The Jackknife, the Bootstrap and Other Resampling Plans, CBMS-NSF Regional Conference Series in Applied Mathematics, 38.

[11] Jeffrey H., 1946, Theory of probability, Univ. Oxford, London.

[12] Burbea J., 1984, The Bose-Einstein entropy of degree [R] and Jensen difference, Utilitas Math., 26, pp. 171-192.

[13] Kagan M., 1963, On the theory of Fisher's amount information, Sov. Math. Dokl, 4, pp. 99-993.

[14] Kullback S., Leibler, 1951, On the information and Sufficiency, Ann. Math. Statist., 22, pp. 79-86.

[15] Liese F. and Vajda I., 1987, Convex Statistical Distances, Teubner, Leipzig.

[16] Menendez M.L., Pardo Morales D., and Salicru M., 1997, Divergences measures between populations : applications in the exponential family, Communications in Statistics (Theory and Methods), 25, pp. 1099-1117.

[17] Moore D.S., 1977, Generalized Inverses, Wald's Method and the Construction of Chi-Squared Tests of Fit, Journal of the American Statistical Association, 72, pp. 131-137.

[18] Moore D.S., 1984, Measures of lack of fit from Tests of Chi-Squared Type, Journal of Statistical Planning and Inference, 7, pp. 131-137.

[19] Morales D., Pardo L., and Vajda I., 1997, Some new statistics for testing hypotheses in parametric models, Journal of Multivariate Analysis, 10, pp. 151-166.

[20] Morales D., Pardo L., and Zografos K., 1998, Informational distances and related statistics in mixed continuous and categorical variables, Journal of Statistical Planning and Inference, 75, pp. 47-63.

[21] Morales D., Pardo L., 2001, Some approximations to power functions of [phi]-divergence tests in parametric models, Test, 10, pp. 249-269.

[22] Nayak T.K., 1985, On diversity measures based on entropy functions, Communications in Statistics (Theory and Methods), 14, pp. 203-215.

[23] Pardo L., Salicru M., Menendez M.L., and Morales D., 1995, Divergence measures based on entropy functions and statistical inference, Sankhya, Series B, 57, pp. 315-337.

[24] Pardo L., Morales D., Salicru M., and Menendez M.L., 1994, Asymptotic properties of divergence statistics in a stratified random sampling and its applications to test statistical hypotheses, Journal of Statistical Planning and Inference, 38, pp. 201-222.

[25] Renyi A., 1961, On measures of entropy and information, Proc. 4th Berkeley Symp. on Math. Statist., Univ. Calif. Press, Berkeley, 1, pp. 547-561.

[26] Salicru M., Menendez M.L., Pardo L., and Morales D., 1994, On the applications of divergence type measures in testing statistical hypotheses, Journal of Multivariate Analysis, 51, pp. 372-391.

[27] Serfling R.J., 1980, Approximation Theorems of Mathematical Statistics, John Wiley, New York.

[28] Sharma B.D., Mittal D.P., 1977, New nonadditive measures of entropy for discrete probability distributions, J. Math. Sci., 10, pp. 28-40.

[29] Taneja I.J., 1987, Statistical aspects of divergence measures, Journal of Statistical Planning and Inference, 16, pp. 136-145.

[30] Taneja I.J., 1989, On generalized information measures and their applications, Adv. Electron. Phys., 76, pp. 327-413.

[31] Vajda I., 1973, [chi square]-divergence and generalized Fisher's information, Trans. 6th Prague Conf. on Inform. Theory, Statistical Decision Functions and Random Processes, Prague, pp. 873-886.

[32] Vuong Q. and Wang W., 1993, Selecting Estimated Models Using Chi-Square Statistics, Annales d'Economie et de Statistique, 30, pp. 144-164.

[33] Zografos K., Ferentinos K., and Papaioannou T., 1990, [phi]-Divergence statistics: sampling properties and multinomial goodness of fit and divergence tests, Communications in Statistics (Theory and Methods), 19, pp. 1785-1802.

[34] Watson G.S., 1959, Some Recent Results in Chi-Square Goodness-of-fit Tests, Biometrics, 15, pp. 440-468.

Papa Ngom

Laboratoire de Mathematiques appliquees (LMA),

Universite Cheikh Anta Diop--Dakar--Senegal

E-mail: pngom@ucad.sn


Table 1: Data generating process = Log-normal(-0.047, 0.5)

| n | 100 | 300 | 600 | 1000 |
|---|---|---|---|---|
| $\hat{\beta}$ | 0.927 (0.051) | 0.925 (0.028) | 0.924 (0.020) | 0.925 (0.016) |
| $\hat{r}_1$ | -0.046 (0.052) | -0.046 (0.021) | -0.047 (0.021) | -0.047 (0.016) |
| $\hat{r}_2$ | 0.497 (0.035) | 0.500 (0.021) | 0.500 (0.014) | 0.500 (0.011) |
| $\hat{\omega}_*$ | 1.413 (0.181) | 1.383 (0.146) | 1.325 (0.125) | 1.303 (0.107) |
| $D_n(\hat{\beta})$ (exponential) | 3.699 (0.336) | 3.636 (0.178) | 3.617 (0.125) | 3.616 (0.096) |
| $D_n(\hat{r})$ (log-normal) | 3.131 (0.394) | 3.103 (0.214) | 3.087 (0.152) | 3.092 (0.116) |
| $DI_n$ | 4.081 (1.187) | 6.726 (1.082) | 9.846 (1.121) | 12.806 (1.286) |
| Incorrect | 0% | 0% | 0% | 0% |
| Indecisive | 0% | 0% | 0% | 0% |
| Correct | 100% | 100% | 100% | 100% |

Table 2: DGP = 0.25 Exp(1) + 0.75 Log-normal(-0.047, 0.5)

| n | 100 | 300 | 600 | 1000 |
|---|---|---|---|---|
| $\hat{\beta}$ | 0.949 (0.061) | 0.944 (0.038) | 0.943 (0.026) | 0.944 (0.019) |
| $\hat{r}_1$ | -0.181 (0.075) | -0.180 (0.046) | -0.179 (0.032) | -0.180 (0.042) |
| $\hat{r}_2$ | 0.795 (0.122) | 0.804 (0.074) | 0.802 (0.053) | 0.806 (0.042) |
| $\hat{\omega}_*$ | 1.349 (0.409) | 1.265 (0.264) | 1.233 (0.188) | 1.240 (0.165) |
| $D_n(\hat{\beta})$ (exponential) | 3.735 (0.336) | 3.677 (0.198) | 3.665 (0.135) | 3.664 (0.103) |
| $D_n(\hat{r})$ (log-normal) | 3.572 (0.385) | 3.519 (0.225) | 3.505 (0.154) | 3.507 (0.117) |
| $DI_n$ | 1.295 (0.877) | 2.609 (1.016) | 3.250 (1.082) | 4.091 (1.113) |
| Favor Exp | 0% | 0% | 0% | 0% |
| Indecisive | 76% | 38% | 12% | 2% |
| Favor Log-n | 24% | 62% | 88% | 98% |

Table 3: DGP = 0.410 Exp(1) + 0.590 Log-normal(-0.047, 0.5)

| n | 100 | 300 | 600 | 1000 |
|---|---|---|---|---|
| $\hat{\beta}$ | 0.957 (0.069) | 0.955 (0.040) | 0.955 (0.028) | 0.955 (0.021) |
| $\hat{r}_1$ | -0.263 (0.090) | -0.263 (0.051) | -0.264 (0.035) | -0.265 (0.027) |
| $\hat{r}_2$ | 0.932 (0.131) | 0.941 (0.074) | 0.942 (0.055) | 0.944 (0.042) |
| $\hat{\omega}_*$ | 1.201 (0.343) | 1.125 (0.198) | 1.103 (0.162) | 1.103 (0.132) |
| $D_n(\hat{\beta})$ (exponential) | 3.775 (0.358) | 3.712 (0.196) | 3.710 (0.140) | 3.706 (0.106) |
| $D_n(\hat{r})$ (log-normal) | 3.771 (0.398) | 3.711 (0.218) | 3.711 (0.154) | 3.708 (0.117) |
| $DI_n$ | 0.048 (0.896) | 0.031 (0.921) | 0.008 (0.908) | -0.044 (0.910) |
| Favor Exp | 1% | 1% | 1% | 1% |
| Indecisive | 96% | 97% | 97% | 97% |
| Favor Log-n | 3% | 2% | 2% | 2% |

Table 4: DGP = 0.75 Exp(1) + 0.25 Log-normal(-0.047, 0.5)

| n | 100 | 300 | 600 | 1000 |
|---|---|---|---|---|
| $\hat{\beta}$ | 0.986 (0.090) | 0.981 (0.051) | 0.981 (0.036) | 0.980 (0.027) |
| $\hat{r}_1$ | -0.441 (0.116) | -0.443 (0.067) | -0.445 (0.046) | -0.443 (0.036) |
| $\hat{r}_2$ | 1.153 (0.135) | 1.158 (0.080) | 1.159 (0.055) | 1.158 (0.043) |
| $\hat{\omega}_*$ | 0.947 (0.254) | 0.853 (0.138) | 0.840 (0.105) | 0.835 (0.090) |
| $D_n(\hat{\beta})$ (exponential) | 3.919 (0.416) | 3.868 (0.229) | 3.865 (0.105) | 3.855 (0.121) |
| $D_n(\hat{r})$ (log-normal) | 4.132 (0.436) | 4.082 (0.239) | 4.082 (0.164) | 4.070 (0.127) |
| $DI_n$ | -2.319 (0.902) | -4.426 (1.074) | -6.388 (1.076) | -8.238 (1.190) |
| Favor Exp | 66% | 99% | 100% | 100% |
| Indecisive | 34% | 1% | 0% | 0% |
| Favor Log-n | 0% | 0% | 0% | 0% |

Table 5: Data generating process = Exponential(1)

| n | 100 | 300 | 600 | 1000 |
|---|---|---|---|---|
| $\hat{\beta}$ | 1.008 (0.105) | 1.001 (0.059) | 1.001 (0.040) | 1.000 (0.031) |
| $\hat{r}_1$ | -0.570 (0.131) | -0.577 (0.076) | -0.576 (0.052) | -0.577 (0.040) |
| $\hat{r}_2$ | 1.266 (0.128) | 1.284 (0.078) | 1.280 (0.052) | 1.282 (0.043) |
| $\hat{\omega}_*$ | 0.840 (0.227) | 0.757 (0.107) | 0.738 (0.083) | 0.735 (0.074) |
| $D_n(\hat{\beta})$ (exponential) | 4.068 (0.466) | 4.023 (0.250) | 4.012 (0.179) | 4.007 (0.138) |
| $D_n(\hat{r})$ (log-normal) | 4.378 (0.465) | 4.339 (0.250) | 4.328 (0.178) | 4.324 (0.137) |
| $DI_n$ | -3.833 (1.040) | -7.300 (1.082) | -10.565 (1.195) | -13.745 (1.430) |
| Correct | 100% | 100% | 100% | 100% |
| Indecisive | 0% | 0% | 0% | 0% |
| Incorrect | 0% | 0% | 0% | 0% |
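The tables above can be cross-checked roughly against the decision rule of Section 4. The following is a minimal sketch, not the author's code: the function names `di_from_means` and `decide` are hypothetical, and the sign convention (negative values favor the exponential model, positive values favor the log-normal model) is inferred from the reported signs of $DI_n$ in the tables. It plugs table averages into $\sqrt{n}\,[D_n(\text{exponential}) - D_n(\text{log-normal})]/\hat{\omega}_*$, which only approximates the reported mean of $DI_n$, since the paper averages the ratio over replications rather than taking the ratio of averages.

```python
import math

Z_CRIT = 1.96  # two-sided 5% critical value of N(0, 1), as used in Section 4


def di_from_means(n, d_exp, d_logn, omega):
    """Approximate DI_n from table averages: sqrt(n) * (D_exp - D_logn) / omega."""
    return math.sqrt(n) * (d_exp - d_logn) / omega


def decide(di, z=Z_CRIT):
    """Apply the Section 4 rule to a single value of DI_n."""
    if di < -z:
        return "favor exponential"
    if di > z:
        return "favor log-normal"
    return "indecisive"


# (n, mean D_n exponential, mean D_n log-normal, mean omega), copied from the tables
rows = {
    "Table 1, n=100": (100, 3.699, 3.131, 1.413),
    "Table 3, n=1000": (1000, 3.706, 3.708, 1.103),
    "Table 4, n=100": (100, 3.919, 4.132, 0.947),
}

for label, (n, d_exp, d_logn, omega) in rows.items():
    di = di_from_means(n, d_exp, d_logn, omega)
    print(f"{label}: DI ~ {di:.2f} -> {decide(di)}")
```

The percentages in the lower half of each table come from applying the rule to every replication separately, so a decision based on averaged quantities is only a plausibility check, not a reproduction of those rows.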


| Author | Ngom, Papa |
|---|---|
| Publication | Global Journal of Pure and Applied Mathematics |
| Date | Apr 1, 2007 |
