Selected estimated models with [empty set]-divergence statistics.


When discriminating between two competing models, a statistical method usually proceeds by evaluating a measure of discrepancy between the observed data and each parametric model. The parametric model with the smaller value of the measure statistic is generally chosen. This paper addresses the question of choosing between two estimated models using some [empty set]-divergence type statistics. We allow arbitrary [square root of n]-asymptotically normal estimators to be used in introducing these statistics. Large sample theory and bootstrap methods are used to construct our [empty set]-divergence tests in parametric models, and the results are illustrated by a simulation study.

AMS Subject Classification: 62F03, 62F40, 62F05, 94A17.

Keywords: Asymptotic distributions, [empty set]-divergence statistics, bootstrap methods, testing statistical hypotheses, goodness-of-fit tests.

1. Introduction

Cochran [6], Watson [34] and Moore [17], [18] have provided comprehensive surveys on Pearson chi-square type statistics, i.e., quadratic forms in the cell frequencies. Recently, Andrews [2], [3] has extended the Pearson chi-square testing method to non-dynamic parametric models, i.e., to models with covariates. Because Pearson chi-square statistics provide natural measures of the discrepancy between the observed data and a specific parametric model, they have also been used for discriminating among competing models. Such a situation is frequent in the social sciences, where many competing models are proposed to fit a given sample. A well-known difficulty is that each chi-square statistic tends to become large, without an increase in its degrees of freedom, as the sample size increases. As a consequence, goodness-of-fit tests based on Pearson type chi-square statistics will generally reject the correct specification of every competing model.

To circumvent such a difficulty, a popular method for model selection, similar to the use of the Akaike [1] Information Criterion (AIC), consists in considering that the lower the chi-square statistic, the better the model.

The preceding selection rule, however, is not entirely satisfactory. Since chi-square statistics depend on the sample and are therefore random, their actual values are subject to statistical variation. We therefore propose some convenient asymptotically standard normal tests for model selection based on [empty set]-divergence type statistics. By analogy with the approach introduced by Vuong [32], our tests examine the null hypothesis that the competing models are equally close to the data generating process (DGP), where the closeness of a model is measured according to the discrepancy implicit in the [empty set]-divergence type statistics.

Following Morales and Pardo [21], let {[P.sub.[theta]] : [theta] [member of] [THETA]} be a family of probability measures on a measurable space (X, [beta]x) with open [THETA] [subset] [[??].sup.d], d [greater than or equal to] 1. The measures [P.sub.[theta]] are described by probability density functions (p.d.f.) [f.sub.[theta]](x) = (d[P.sub.[theta]]/d[micro])(x) with respect to a dominating [sigma]-finite measure [micro] on X. The sample space X is the support of the [sigma]-finite measure [micro]. The statistical model ((X, [beta]x), {[P.sub.[theta]] : [theta] [member of] [THETA]}, [micro]) satisfies the regularity assumptions (R1)-(R3) appearing on pages 144-145 of Serfling [27] and the identifiability condition (R4): if [f.sub.[[theta].sub.1]] = [f.sub.[[theta].sub.2]], then [[theta].sub.1] = [[theta].sub.2].

If [[theta].sub.0] is the true value of the parameter [theta] and (R1)-(R4) hold, then there exists a strongly consistent sequence [[??].sub.n] of roots of the likelihood equations such that

[square root of n]([[??].sub.n] - [[theta].sub.0]) [right arrow] N(0, [I.sub.F][([[theta].sub.0]).sup.-1]) in distribution, (1.1)

where [I.sub.F]([[theta].sub.0]) is the Fisher information matrix and [[??].sub.n] is assumed to be the maximum likelihood estimator (MLE).

We consider testing procedures based on a sequence of observations [X.sub.n] = ([X.sub.1], [X.sub.2],..., [X.sub.n]) with independent components taken from a p.d.f of the family [f.sub.[theta]] : [theta] [member of] [THETA].

Recently, many papers have appeared in the literature in which divergence-type measures of information are used for testing statistical hypotheses. We refer, among others, to Cressie and Read [8], Nayak [22], Zografos, Ferentinos and Papaioannou [33], Salicru, Morales, Menendez and Pardo [23], Bar-Hen and Daudin [5] and references therein. Salicru et al. [26] introduced the divergence statistics [S.sub.[empty set],n] [equivalent to] 2n[C.sub.[empty set]]([[??].sub.n], [[theta].sub.0]) where


is the [empty set]-divergence of density from the family [f.sub.[theta]] : [theta] [member of] [THETA] introduced by Csiszar [9]. Liese and Vajda [15] have introduced a systematic theory of these divergences.
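As an illustration only, the sketch below computes a discrete analogue of the [empty set]-divergence and of the statistic 2n[C.sub.[empty set]]. It assumes the usual Csiszar form, sum over cells of q_i times [empty set](p_i / q_i), because the displayed definition is missing from this copy; the function names and the choice of the Kullback-Leibler generator are hypothetical.

```python
import numpy as np

def phi_kl(x):
    """phi(x) = x*ln(x) - x + 1, the generator of the Kullback-Leibler divergence."""
    x = np.asarray(x, dtype=float)
    return x * np.log(x) - x + 1.0

def phi_divergence(p, q, phi=phi_kl):
    """Discrete phi-divergence sum_i q_i * phi(p_i / q_i).

    Assumed standard Csiszar form; all entries of p and q must be strictly positive.
    """
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    return float(np.sum(q * phi(p / q)))

def divergence_statistic(n, p_hat, p_0, phi=phi_kl):
    """2 * n * C_phi(p_hat, p_0), mirroring the form of the statistic S_{phi,n}."""
    return 2.0 * n * phi_divergence(p_hat, p_0, phi)
```

With this generator the divergence of a distribution from itself is zero, and the value for two distinct distributions reduces to the familiar sum of p_i ln(p_i / q_i).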

Morales et al. [16] have established that the asymptotic distribution of [S.sub.[empty set],n] is [[chi square].sub.d]. An important problem is to propose divergence statistics for testing procedures.

The asymptotic behavior of the statistics based on [C.sub.[empty set]]([[??].sub.n], [[theta].sub.0]) is needed for choosing between two estimated models. In order to suggest a testing procedure, we present a new method in association with the divergence statistic.

The paper is organized as follows. Section 2 introduces the basic notation and defines a class of asymptotically normal estimators. In section 3, we investigate the model selection problem based on divergence type statistics and propose a large sample test. In section 4, Efron's [10] bootstrap method is used to propose alternative and simpler testing procedures for model selection. In section 5, some simulation results are given. Section 6 concludes the paper and mentions some extensions.

2. Assumption and Asymptotic Behavior of the Divergence Statistic

Assumption (A1):

(i) The function [empty set] : [0, +[infinity][ [right arrow] ]-[infinity], +[infinity]] is convex and continuous. Its restriction to ]0, +[infinity][ is finite and twice continuously differentiable, with [empty set](1) = [empty set]'(1) = 0 and [empty set]"(1) = 1;

(ii) Each [[theta].sub.0] [member of] [THETA] has an open neighborhood V([[theta].sub.0]) such that, for 1 [less than or equal to] i, j [less than or equal to] d, it holds:


Condition (i) deals with properties of the [empty set]-divergence (cf. Liese and Vajda [15]).

Condition (ii) is needed to apply the delta method for obtaining asymptotic distributions of [empty set]-statistics. Conditions sufficient for (ii) are presented in Morales et al. [19].
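The normalization in condition (i) can be checked numerically for a candidate function. The sketch below does so with central finite differences for the power-type function that appears in Table 1; the helper names and the step size are hypothetical choices made for this illustration.

```python
def phi_power(x, r):
    """Power-type convex function (x**r - r*(x - 1) - 1) / (r*(r - 1)), r not in {0, 1}."""
    return (x**r - r * (x - 1.0) - 1.0) / (r * (r - 1.0))

def check_normalization(phi, eps=1e-4):
    """Return approximations of (phi(1), phi'(1), phi''(1)) by central differences."""
    f0 = phi(1.0)
    f1 = (phi(1.0 + eps) - phi(1.0 - eps)) / (2.0 * eps)
    f2 = (phi(1.0 + eps) - 2.0 * f0 + phi(1.0 - eps)) / eps**2
    return f0, f1, f2
```

For example, `check_normalization(lambda x: phi_power(x, 0.5))` should return values close to 0, 0 and 1, which is what condition (i) requires.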

Assume that ([R.sub.1])-([R.sub.4]) and A1 hold. Under [H.sub.o] : [theta] [member of] [[THETA].sub.o] [subset] [THETA], we present the asymptotic distribution of [C.sub.[empty set]]([[??].sub.n], [[theta].sub.o]).

Theorem 2.1. Let the model and [empty set] satisfy (R1)-(R4) and (A1), respectively. Let [theta] be the true parameter, with [theta] [not equal to] [[theta].sub.o]. Then we have

[square root of n][[C.sub.[empty set]]([[??].sub.n], [[theta].sub.o]) - [C.sub.[empty set]]([theta], [[theta].sub.o])] [right arrow] N(0, [[SIGMA].sup.2.sub.[empty set]]([theta], [[theta].sub.o])) in distribution,

where [[SIGMA].sup.2.sub.[empty set]]([theta], [[theta].sub.o]) = A[I.sub.F][([theta]).sup.-1][A.sup.t] and A = [DELTA][C.sub.[empty set]]([theta], [[theta].sub.o]) with [DELTA] = ([partial derivative]/[partial derivative][[theta].sub.1], ..., [partial derivative]/[partial derivative][[theta].sub.d]).

Proof. A first order Taylor expansion gives

[C.sub.[empty set]]([[??].sub.n], [[theta].sub.o]) = [C.sub.[empty set]]([theta], [[theta].sub.o]) + A[([[??].sub.n] - [theta]).sup.t] + o([parallel][[??].sub.n] - [theta][parallel]).

It is clear that the random variables [square root of n][[C.sub.[empty set]]([[??].sub.n], [[theta].sub.o]) - [C.sub.[empty set]]([theta], [[theta].sub.o])] and A[square root of n][([[??].sub.n] - [theta]).sup.t] have the same asymptotic distribution, because

[square root of n] o([parallel][[??].sub.n] - [theta][parallel]) = [o.sub.p](1).

3. Selecting Estimated Models

As mentioned earlier, divergence type statistics can be used to discriminate among alternative models.

Let h be the true probability density of the observations [X.sub.n] = ([X.sub.1],..., [X.sub.n]). We consider a specified model [F.sub.[theta]] = {F(.|[theta]); [theta] [member of] [THETA] [subset] [[??].sup.k]} with [f.sub.[theta]](x) as the probability density function. We define the discrepancy between the observations and the model [F.sub.[theta]] as follows:


Of special interest to us is the situation in which a researcher has two competing parametric models [F.sub.[theta]] and [G.sub.[gamma]] = {G(.|[gamma]); [gamma] [member of] [GAMMA] [subset] [[??].sup.k]}, and wishes to select the better of the two models based on their general discrimination statistics [D.sub.n]([[??].sub.n]) = [C.sub.[empty set]](h, [f.sub.[[??].sub.n]]) and [D.sub.n]([[??].sub.n]) = [C.sub.[empty set]](h, [g.sub.[[??].sub.n]]), where [[??].sub.n] and [[??].sub.n] are general estimators satisfying condition (1).

Definition 3.1. (Equivalent, Better and Worse) Consider two competing models [F.sub.[theta]] and [G.sub.[gamma]] and some discrimination type statistics [D.sub.n]([[??].sub.n]) and [D.sub.n]([[??].sub.n]) where [[??].sub.n] and [[??].sub.n] are general estimators satisfying condition (1). Let D(x) be the probability limit of [square root of n][D.sub.n](x).

The hypotheses

[H.sub.o ]: D([[theta].sub.o]) = D([[gamma].sub.o])

[H.sub.f] : D([[theta].sub.o]) < D([[gamma].sub.o])

[H.sub.g] : D([[theta].sub.o]) > D([[gamma].sub.o])

mean that the estimated models F(x|[[theta].sub.o]) and G(x|[[gamma].sub.o]) are equivalent, that F(x|[[theta].sub.o]) is better than G(x|[[gamma].sub.o]), and that F(x|[[theta].sub.o]) is worse than G(x|[[gamma].sub.o]), respectively.

Definition (3.1) calls for some remarks. First, it does not require that the same divergence type statistics be used in forming [D.sub.n]([[theta].sub.n]) and [D.sub.n]([[gamma].sub.n]). Choosing different discrepancies for evaluating competing models is, however, hardly justified. Second, and more importantly, it allows estimators other than the matching divergence estimators to be used.

In any case, since [[??].sub.n] and [[??].sub.n] are consistent estimators of [[theta].sub.o] and [[gamma].sub.o] by condition (1), we can use, from theorem 3.1, [square root of n]{[C.sub.[empty set]](h, [f.sub.[[??].sub.n]]) - [C.sub.[empty set]](h, [g.sub.[[??].sub.n]])} to consistently estimate the indicator [C.sub.[empty set]](h, [f.sub.[[theta].sub.o]]) - [C.sub.[empty set]](h, [g.sub.[[gamma].sub.o]]), which is zero under the null hypothesis [H.sub.o]. Using a standard Taylor expansion, we can obtain the asymptotic distribution of [square root of n]{[C.sub.[empty set]](h, [f.sub.[[??].sub.n]]) - [C.sub.[empty set]](h, [g.sub.[[??].sub.n]])}, which is normal with zero mean and variance [[omega].sup.2] under [H.sub.o]. The detailed derivation and the expression for [[omega].sup.2] can be found in the proof of Theorem 3.2.

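Hence we define the statistic

[DI.sub.n] = [square root of n]{[C.sub.[empty set]](h, [f.sub.[[??].sub.n]]) - [C.sub.[empty set]](h, [g.sub.[[??].sub.n]])}/[??],

where [[??].sup.2] is a consistent estimator of [[omega].sup.2] (DI stands for Divergence Indicator).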

We have,

Theorem 3.2. (Asymptotic Distribution of the DI Statistic) Given H1-H4, then

(i) under the null hypothesis [H.sub.o], [DI.sub.n] [right arrow] N(0, 1) in distribution,

(ii) under the alternative [H.sub.f], [DI.sub.n] [right arrow] -[infinity] in probability,

(iii) under the alternative [H.sub.g], [DI.sub.n] [right arrow] +[infinity] in probability.



By difference, it follows that:


From the multivariate central limit theorem and assumption (A1), we can now immediately obtain the asymptotic distribution of


under the null hypothesis of equivalence [H.sub.o].



with A = [DELTA][C.sub.[empty set]]([theta], [[theta].sub.o]) and B = [DELTA][C.sub.[empty set]]([gamma], [[gamma].sub.o])

Let [[omega].sup.2] = T [LAMBDA] [T.sup.t], we then have


Remark 3.3. Note that some important measures of divergence cannot be written as an [empty set]-divergence; for instance, the divergence measures given by Bhattacharyya, Sharma-Mittal and Renyi. However, such measures can be written in the following form:

[C.sub.[empty set],h]([[theta].sub.1], [[theta].sub.2]) = h([C.sub.[empty set]]([[theta].sub.1], [[theta].sub.2]))

where h is a differentiable increasing function mapping from [0, +[infinity][ onto [0, +[infinity][, with h(0) = 0 and h'(x) > 0.

We present these divergence measures, in the following table.
Divergence       h function                                                      [empty set] function

Bhattacharyya    [h.sub.B](x) = -ln(-x + 1)                                      [[empty set].sub.B](x) = -[x.sup.1/2] + (x + 1)/2

Sharma-Mittal    [h.sub.S](x) = [[(1 + r(r-1)x).sup.(s-1)/(r-1)] - 1]/(s-1)      [[empty set].sub.S](x) = ([x.sup.r] - r(x-1) - 1)/(r(r-1))

Renyi            [h.sub.R](x) = ln(r(r-1)x + 1)/(r(r-1))                         [[empty set].sub.R](x) = ([x.sup.r] - r(x-1) - 1)/(r(r-1))

Table 1: (h, [empty set])-divergences with r [not equal to] 0, 1
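To make the composition h([C.sub.[empty set]]) concrete, the following sketch evaluates the Bhattacharyya and Renyi rows of the table for two discrete distributions. It assumes the discrete Csiszar form sum_i q_i [empty set](p_i / q_i) for [C.sub.[empty set]]; the example vectors and function names are hypothetical.

```python
import numpy as np

def c_phi(p, q, phi):
    """Discrete phi-divergence sum_i q_i * phi(p_i / q_i) (assumed standard form)."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    return float(np.sum(q * phi(p / q)))

def phi_bhattacharyya(x):
    return -np.sqrt(x) + 0.5 * (x + 1.0)

def h_bhattacharyya(x):
    return -np.log(1.0 - x)

def phi_power(x, r):
    return (x**r - r * (x - 1.0) - 1.0) / (r * (r - 1.0))

def h_renyi(x, r):
    return np.log(r * (r - 1.0) * x + 1.0) / (r * (r - 1.0))

# Two illustrative probability vectors (hypothetical data).
p = np.array([0.2, 0.3, 0.5])
q = np.array([0.4, 0.4, 0.2])
r = 0.5

# h(C_phi) recovers the closed forms -ln(sum sqrt(p*q)) and
# ln(sum p**r * q**(1-r)) / (r*(r-1)) respectively.
bhatt = h_bhattacharyya(c_phi(p, q, phi_bhattacharyya))
renyi = h_renyi(c_phi(p, q, lambda x: phi_power(x, r)), r)
```

Under these assumptions the composed values coincide with the direct closed-form expressions, which is a quick way to check a table entry.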

Theorem (3.2) is quite general and gives us a wide variety of asymptotically standard normal tests for model selection based on divergence type statistics. Parts (ii) and (iii) also imply that the test is consistent. In the next section, we detail the testing procedures based on Theorem (3.2) by using bootstrap methods.

4. Bootstrap Methods

Implementation of the model selection procedure proposed in section 3 requires the following computations:

(i) Estimation of the parameters [[??].sub.n] and [[??].sub.n],

(ii) Computation of the two divergences statistics [D.sub.n]([[??].sub.n]) and [D.sub.n]([[??].sub.n]) and the difference [[??].sub.n] [equivalent to] [square root of n][[D.sub.n]([[??].sub.n]) - [D.sub.n]([[??].sub.n])],

(iii) Computation of the variance [[??].sup.2] of [[??].sub.n] and finally, computation of [DI.sub.n] [equivalent to] [[??].sub.n]/[??]

Specifically, we carry out the following steps:

1) Let [F.sub.n] be the empirical probability distribution of the original data [x.sub.1], [x.sub.2], ... , [x.sub.n], i.e., [F.sub.n] puts mass 1/n at [x.sub.i] (i = 1, 2, ... , n):


Then draw an i.i.d "bootstrap sample" [x.sup.*.sub.1], [x.sup.*.sub.2], ... , [x.sup.*.sub.n] from [F.sub.n], i.e., draw [x.sup.*.sub.i] randomly with replacement from the observed values [x.sub.1], [x.sub.2], ... , [x.sub.n],

2) Using this bootstrap sample [x.sup.*.sub.i], estimate the competing models to obtain [[theta].sup.*.sub.n] and [[gamma].sup.*.sub.n]. Then calculate the statistic

[[??].sup.*.sub.n] [equivalent to] [square root of n][[D.sub.n]([[??].sup.*.sub.n]) - [D.sub.n]([[??].sup.*.sub.n])]

3) Independently repeat steps 1 and 2 a large number of times S, say S = 1000. Obtain "bootstrap replications" [[??].sup.*1.sub.n], [[??].sup.*2.sub.n], ... , [[??].sup.*S.sub.n], and compute the sample variance of {[[??].sup.*j.sub.n], j = 1,..., S}.


where [bar.B] = 1/S [S.summation over (j=1)] [[??].sup.*j.sub.n] is the average of "bootstrap replications".

Once the bootstrap variance [[??].sup.2.sub.*] is obtained, the test statistic [DI.sub.n] is calculated easily using the initial estimates [[??].sub.n] and [[??].sub.n]. Under suitable regularity conditions and for a large number of replications [10], [[??].sup.2.sub.*] is a consistent estimator of [[omega].sup.2].
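The three steps above can be sketched generically as follows. The four callables are hypothetical placeholders for the two estimators and the two divergence statistics; recomputing the divergence on each bootstrap sample and using the divisor S - 1 in the sample variance are assumptions of this sketch, since the displayed variance formula is missing from this copy.

```python
import numpy as np

def bootstrap_variance(x, fit_f, fit_g, div_f, div_g, n_boot=1000, seed=0):
    """Bootstrap estimate of the variance of sqrt(n) * (D_n(theta) - D_n(gamma)).

    fit_f, fit_g : callables mapping a sample to a parameter estimate.
    div_f, div_g : callables mapping (sample, estimate) to a divergence value.
    All four callables are hypothetical placeholders supplied by the user.
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    n = x.size
    reps = np.empty(n_boot)
    for j in range(n_boot):
        # Step 1: draw n values with replacement from the observed data.
        xs = rng.choice(x, size=n, replace=True)
        # Step 2: re-estimate both models and recompute the scaled difference.
        reps[j] = np.sqrt(n) * (div_f(xs, fit_f(xs)) - div_g(xs, fit_g(xs)))
    # Step 3: sample variance of the replications (divisor S - 1, an assumption).
    return float(np.var(reps, ddof=1))

def di_statistic(x, fit_f, fit_g, div_f, div_g, n_boot=1000, seed=0):
    """Scaled divergence difference on the original data over its bootstrap standard error."""
    x = np.asarray(x, dtype=float)
    b_n = np.sqrt(x.size) * (div_f(x, fit_f(x)) - div_g(x, fit_g(x)))
    omega2 = bootstrap_variance(x, fit_f, fit_g, div_f, div_g, n_boot, seed)
    return float(b_n / np.sqrt(omega2))
```

Passing the estimators and divergences as callables keeps the resampling loop independent of the particular pair of competing models.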

Thus, from theorem 3.2, a testing procedure for model selection can be based on the comparison of the value of [DI.sub.n] to critical values from a standard normal table. For example, at the 5% significance level, we compare [DI.sub.n] with -1.96 and 1.96. If [DI.sub.n] falls between -1.96 and 1.96, we conclude that both estimated models fit the data equally well. If [DI.sub.n] is less than -1.96 (or larger than 1.96), then we reject the null hypothesis in favor of the alternative hypothesis that the estimated model F(x|[[??].sub.n]) (or G(x|[[??].sub.n])) is closer to the true distribution.
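The decision rule just described amounts to a two-sided comparison with a standard normal critical value. A minimal sketch, with a hypothetical function name and return labels:

```python
def select_model(di_n, critical=1.96):
    """Two-sided decision for the DI statistic (1.96 corresponds to the 5% level)."""
    if di_n < -critical:
        return "F"           # reject H_o in favour of H_f: model F is closer
    if di_n > critical:
        return "G"           # reject H_o in favour of H_g: model G is closer
    return "equivalent"      # H_o is not rejected
```

Other significance levels only change the `critical` argument, for example 1.645 for a 10% two-sided test.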

Although the bootstrap method is used to obtain an estimate of [[omega].sup.2], the basic justification of the preceding testing procedure comes from the asymptotic properties obtained in Theorem 3.2.

5. Numerical Study

We present briefly the basic assumptions on the model and parameter estimators, and we define our general divergence type statistics. Assumption (A2): The observed data [X.sub.i], i = 1,..., are independent and are identically distributed (iid) with some common true distribution H.

The sample space X is partitioned into M mutually disjoint fixed cells [C.sub.1], [C.sub.2], ... , [C.sub.M]. Let n be the sample size. Corresponding to the partition [C.sub.1], [C.sub.2], ... , [C.sub.M] we can compute the vector of observed cell probabilities

f = [([f.sub.1], [f.sub.2], ... , [f.sub.M]).sup.t] where [f.sub.i] = 1/n [n.summation over (j=1)] [I.sub.[C.sub.i]]([X.sub.j]), for i = 1, 2, ... , M, (5.1)

and [I.sub.[C.sub.i]]([X.sub.j]) is the indicator function of the cell [C.sub.i].
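Equation (5.1) is a relative frequency count per cell. The sketch below computes it for interval cells on the real line; the function name, the half-open cell convention and the representation of the partition by its end points are assumptions made for illustration.

```python
import numpy as np

def observed_cell_probabilities(x, edges):
    """Fraction of observations falling in each cell [edges[i], edges[i+1])."""
    x = np.asarray(x, dtype=float)
    edges = np.asarray(edges, dtype=float)
    m = edges.size - 1
    # Cell index of every observation; values outside the partition get -1 or m.
    idx = np.searchsorted(edges, x, side="right") - 1
    inside = (idx >= 0) & (idx < m)
    counts = np.bincount(idx[inside], minlength=m)
    # Divide by the full sample size n, as in (5.1).
    return counts / x.size
```

Because the divisor is the full sample size, the entries sum to one only when every observation lies inside the partition.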


Let a specified model be [H.sub.[theta]] = {H(x|[theta]), [theta] [member of] [THETA] [subset] [[??].sup.d]} and denote the vector of its predicted cell probabilities by:

h([theta]) = ([h.sub.1]([theta]), [h.sub.2]([theta]), ... , [h.sub.M][([theta])).sup.t] where [h.sub.i]([theta]) = [[integral].sub.[C.sub.i]] dH(x|[theta])

where H(x|[theta]) is joint distribution for [X.sub.i].

We suppose [h.sub.i]([theta]) > 0 and [h.sub.i]([theta]) is continuously differentiable (Assumption A1) for every i = 1, 2,..., M.

To illustrate the model selection procedure in the preceding section, we consider an example. We need to define the competing models, and the divergence type statistic to measure the departure of each proposed parametric model from the data generating process.

Here, we choose an important measure of divergence given by Renyi [25], which can be written in the following form:


and limiting cases for [alpha] = 0 and [alpha] = 1. That is,




which is Kullback-Leibler divergence.

In the case that [f.sub.[[theta].sub.1]] and [f.sub.[[theta].sub.2]] are discrete probability distributions, their Renyi divergence is


In the statistical literature, the problem of choosing between the family of log-normal distributions and the family of exponential distributions has a long history. See [7] and [4] among others.

The log-normal distribution is parameterized by r = ([r.sub.1], [r.sub.2]) and has density



The exponential distribution with parameter [beta] has density

g(x|[beta]) = (1/[beta]) exp(-x/[beta]) for x > 0

and zero otherwise.

The estimator used for each competing model is the maximum likelihood estimator (MLE). Specifically, for the log-normal model,


For the exponential model, the MLE is the sample average, i.e.,

[??] = 1/n [n.summation over (i=1)] [x.sub.i]
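A sketch of both estimators is given below. The exponential case follows the sample-average formula above. For the log-normal case the displayed formula is missing from this copy, so the code assumes that [r.sub.1] and [r.sub.2] are the mean and standard deviation of ln x and uses the usual maximum likelihood expressions with divisor n; the function names are hypothetical.

```python
import numpy as np

def mle_lognormal(x):
    """MLE of (r1, r2), assuming r1 and r2 are the mean and standard deviation of log(x)."""
    logx = np.log(np.asarray(x, dtype=float))
    r1 = float(np.mean(logx))
    r2 = float(np.sqrt(np.mean((logx - r1) ** 2)))  # divisor n, not n - 1
    return r1, r2

def mle_exponential(x):
    """MLE of beta in g(x|beta) = exp(-x/beta)/beta: the sample average."""
    return float(np.mean(np.asarray(x, dtype=float)))
```

If the second log-normal parameter is instead meant as a variance, the square root in `mle_lognormal` should be dropped.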

Lastly, we use Renyi's divergence measure (5) to evaluate the discrepancy of a proposed model from the true data generating process. We partition the real line into M intervals {([a.sub.i-1], [a.sub.i]), i = 1, ... , M} where each [a.sub.i] is a real number. The choice of the cells is discussed below. The Renyi statistics for the log-normal and exponential models are:




where [h.sub.i](r) and [g.sub.i]([beta]) are the probabilities of the interval ([a.sub.i-1], [a.sub.i]) under h(x|r) and g(x|[beta]) respectively, and f is the vector of observed cell probabilities defined in (5.1).
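The cell probabilities and a discrete Renyi divergence can be sketched as follows. Several points are assumptions, because the displayed formulas are missing from this copy: the normalization 1/([alpha]([alpha] - 1)), chosen to agree with [h.sub.R] in Table 1; the reading of [r.sub.2] as a standard deviation on the log scale; and the order of the two arguments. All function names are hypothetical.

```python
import math
import numpy as np

def _lognormal_cdf(a, r1, r2):
    """CDF of X at a when log X ~ N(r1, r2**2)."""
    if a <= 0.0:
        return 0.0
    if math.isinf(a):
        return 1.0
    return 0.5 * (1.0 + math.erf((math.log(a) - r1) / (r2 * math.sqrt(2.0))))

def lognormal_cell_probs(edges, r1, r2):
    """Probability of each interval (a_{i-1}, a_i) under the log-normal model."""
    return np.diff([_lognormal_cdf(a, r1, r2) for a in edges])

def exponential_cell_probs(edges, beta):
    """Probability of each interval (a_{i-1}, a_i) under g(x|beta)."""
    cdf = [0.0 if a <= 0.0 else 1.0 - math.exp(-a / beta) for a in edges]
    return np.diff(cdf)

def renyi_divergence(p, q, alpha):
    """ln(sum_i p_i**alpha * q_i**(1 - alpha)) / (alpha*(alpha - 1)), alpha not in {0, 1}."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    return float(np.log(np.sum(p**alpha * q ** (1.0 - alpha))) / (alpha * (alpha - 1.0)))
```

A statistic for the exponential model would then be something like `renyi_divergence(f, exponential_cell_probs(edges, beta_hat), 0.5)`, with `f` the observed cell probabilities. Note that when the cells do not cover the whole sample space the vectors do not sum to one, and the divergence of a vector from itself is then not exactly zero.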

In our Monte Carlo study, we consider various sets of experiments in which the data are generated from a mixture of an exponential distribution and a log-normal distribution. These two distributions are calibrated so that they have the same population means and variances, namely one and one. Hence the data generating process has density

d(p) = p Exponential (1) + (1 - p) Log-normal (-0.047, 0.5)

where p is set to some specific value for each set of experiments. In each set of experiments, several random samples are drawn from this mixture of distributions. The sample size varies from 100 to 1,000, and for each sample size the number of replications is 1,000.
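Drawing from the mixture d(p) can be sketched as below. The sketch assumes that the two log-normal parameters are the mean and standard deviation of ln X, which is the convention of NumPy's `Generator.lognormal`; the function name and the seed argument are hypothetical.

```python
import numpy as np

def sample_mixture(n, p, mu=-0.047, sigma=0.5, seed=None):
    """Draw n values from p * Exponential(1) + (1 - p) * Log-normal(mu, sigma)."""
    rng = np.random.default_rng(seed)
    # Each observation comes from the exponential component with probability p.
    from_exp = rng.random(n) < p
    expo = rng.exponential(scale=1.0, size=n)
    logn = rng.lognormal(mean=mu, sigma=sigma, size=n)
    return np.where(from_exp, expo, logn)
```

Setting `p=1.0` or `p=0.0` reproduces the two correctly specified cases, and a Monte Carlo experiment repeats the draw once per replication with a different seed.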

Throughout, the chosen partition has four cells defined by the values [a.sub.0] = 1.0, [a.sub.1] = 1.5, [a.sub.2] = 2.0, [a.sub.3] = 3.0, and [a.sub.4] = +[infinity]. As with minimum chi-square methods, because the log-normal distribution has two parameters, four is the minimum number of cells for which a perfect fit is not always achieved. Note also that the shapes of the log-normal and exponential densities differ greatly around the origin. This motivates the choice of [a.sub.0] = 1.0. The value [alpha] = 0.5 in (5) corresponds, approximately, to the common density function on [1, +[infinity][ under the null hypothesis [H.sub.o] (see figure 1-c).


We choose five different values for p which are: 0.00, 0.25, 0.41, 0.75 and 1.00. Although our proposed model selection procedure does not require that the data generating process belong to either of the competing models, we consider the two limiting cases p = 0.00 and p = 1.00 for they correspond to the correctly specified cases.

The value p = 0.410 is determined to be the value for which the estimated log-normal distribution and the estimated exponential distribution are approximately at equal distance from the mixture d(p) according to Renyi's divergence. Thus this set of experiments corresponds approximately to the null hypothesis of our proposed model selection test [DI.sub.n].

The results of our five sets of experiments are presented in Tables 1-5. The first half of each table gives the average values of the ML estimators [??], [[??].sub.1], and [[??].sub.2], the divergence goodness-of-fit statistics [D.sub.n]([[??].sub.n]) and [D.sub.n]([[??].sub.n]), and the model selection statistic [DI.sub.n] with its bootstrap estimated variance [[??].sup.2.sub.*]. The values in parentheses are standard errors. The second half of each table gives, in percentage, the number of times our proposed model selection procedure, based on the method described in the previous section, favors the log-normal model, favors the exponential model, or is indecisive. The tests are conducted at the 5% nominal level.

In the two sets of experiments (p = 0 and p = 1) where one model is correctly specified, we use the labels "correct" and "incorrect" when a choice is made. This allows a comparison with the asymptotic N(0, 1) approximation under our null hypothesis of equivalence.

Tables 1 and 5 report the cases in which one model is correctly specified. It is well known that the MLE is consistent for the true parameter value under correct specification.

For example, in Table 1, the log-normal model is correctly specified, and the MLE of r = ([r.sub.1], [r.sub.2]) approaches the true value [r.sub.o] = (-0.047, 0.5) as the sample size increases from 100 to 1000. The bootstrap estimator of [omega] also converges as the sample size becomes larger. The test statistic for model selection [DI.sub.n] increases approximately at the rate [square root of n]. In Table 5, where the exponential model is correctly specified, one can observe similar results.
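The [square root of n] growth can be checked directly from the reported averages. The short sketch below divides the average [DI.sub.n] of Tables 1 and 5 by the square root of the sample size; if the rate holds, the ratios should be roughly constant within each table. The dictionary and function names are illustrative only.

```python
import math

# Average DI_n by sample size, copied from Tables 1 and 5.
DI_TABLE_1 = {100: 4.081, 300: 6.726, 600: 9.846, 1000: 12.806}
DI_TABLE_5 = {100: -3.833, 300: -7.300, 600: -10.565, 1000: -13.745}


def scaled_by_root_n(di_by_n):
    """Return DI_n / sqrt(n) for each sample size."""
    return {n: di / math.sqrt(n) for n, di in di_by_n.items()}


if __name__ == "__main__":
    for label, table in (("Table 1", DI_TABLE_1), ("Table 5", DI_TABLE_5)):
        ratios = scaled_by_root_n(table)
        print(label, {n: round(r, 3) for n, r in ratios.items()})
```

The ratios stay between about 0.39 and 0.41 in Table 1 and between about -0.44 and -0.38 in Table 5, which is consistent with the stated rate.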

The second half of Table 1 summarizes the results of our model selection procedure. The method performs quite well and selects the correct model almost 100% of the time, as expected.

For Tables 2, 3 and 4, the data were generated neither from the log-normal model nor from the exponential model, but from a mixture of these two models. Hence, the log-normal and the exponential models are both incorrectly specified.
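Under misspecification the ML estimators still settle around fixed limits, which can be computed in closed form for this design. The sketch below is a worked check under assumptions that are not spelled out in this section: [beta] is taken to be the exponential rate (ML estimator equal to one over the sample mean), and [r.sub.1], [r.sub.2] the mean and standard deviation of log X. The function name limit_values is hypothetical.

```python
import math

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant
MU, SIGMA = -0.047, 0.5           # log-normal component (SIGMA assumed log-scale sd)


def limit_values(p):
    """Large-sample limits of (beta, r1, r2) when data come from
    p * Exp(1) + (1 - p) * Log-norm(MU, SIGMA)."""
    # Moments of X and log X under Exp(1)
    mean_exp = 1.0
    mlog_exp = -EULER_GAMMA
    m2log_exp = math.pi ** 2 / 6 + EULER_GAMMA ** 2
    # Moments of X and log X under the log-normal component
    mean_ln = math.exp(MU + SIGMA ** 2 / 2)
    mlog_ln = MU
    m2log_ln = SIGMA ** 2 + MU ** 2
    # Mixture moments
    mean_x = p * mean_exp + (1 - p) * mean_ln
    r1 = p * mlog_exp + (1 - p) * mlog_ln
    m2log = p * m2log_exp + (1 - p) * m2log_ln
    r2 = math.sqrt(m2log - r1 ** 2)
    beta = 1.0 / mean_x
    return beta, r1, r2


if __name__ == "__main__":
    for p in (0.0, 0.25, 0.41, 0.75, 1.0):
        print(p, tuple(round(v, 3) for v in limit_values(p)))
```

Under these assumptions the computed limits agree with the n = 1000 averages in Tables 1-5 to within a few thousandths, which also supports reading 0.5 as a log-scale standard deviation.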

In Table 3, the data generating process is chosen such that the log-normal model and the exponential model are approximately equally close to it. The test statistic [DI.sub.n] is then expected to have a limiting standard normal distribution N(0, 1). This is roughly confirmed in Table 3. For example, for n = 1000, [DI.sub.n] has mean -0.044 and standard error 0.910.
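A rough consistency check of these figures can be sketched as follows. It assumes that [DI.sub.n] at n = 1000 is approximately normal with the reported mean and that the value in parentheses is its standard deviation across replications, and that decisions use the two-sided 5% critical value. The variable names are illustrative.

```python
from statistics import NormalDist

# Two-sided 5% critical value of the standard normal distribution
critical = NormalDist().inv_cdf(0.975)

# Normal approximation to DI_n at n = 1000 in Table 3 (mean and spread as reported)
di = NormalDist(mu=-0.044, sigma=0.910)

favor_log_normal = 1 - di.cdf(critical)   # P(DI_n > critical)
favor_exponential = di.cdf(-critical)     # P(DI_n < -critical)
indecisive = 1 - favor_log_normal - favor_exponential

if __name__ == "__main__":
    print(round(favor_exponential, 3), round(indecisive, 3), round(favor_log_normal, 3))
```

Under this approximation the test is indecisive in roughly 97% of cases, with the remaining cases split between the two models, which is close to the percentages in the second half of Table 3.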

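From our limited Monte Carlo study, one can observe that the test statistic for model selection [DI.sub.n] works relatively well: when the two models are approximately equally close to the data generating process (Table 3), the test concludes that they fit the data equally well in about 96-97% of the replications, in line with the 5% nominal level.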

6. Discussion

In summary, by analogy with the classical chi-square type statistics, we have introduced divergence measures and proposed some convenient asymptotically standard normal tests for model selection, based on divergence type statistics that use estimators in a quite general class. The tests are designed to test the null hypothesis that the estimated competing models are equally close to the true distribution against the alternative hypothesis that one estimated model is closer, where closeness is measured according to the discrepancy implicit in the divergence type statistic used. The divergence between the observed data and a specific parametric model was computed numerically, and bootstrap methods were used to estimate the asymptotic variance of our test statistic.
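The bootstrap step can be illustrated with a generic nonparametric bootstrap. The sketch below is not the procedure of this paper: the statistic is a stand-in (the difference in average log-likelihood between the fitted log-normal and fitted exponential models), chosen only because the divergence statistic itself is defined in earlier sections. The function names loglik_diff and bootstrap_variance are hypothetical.

```python
import math
import random
import statistics


def loglik_diff(sample):
    """Stand-in statistic (NOT the paper's divergence statistic): average
    log-likelihood of the fitted log-normal minus that of the fitted exponential."""
    n = len(sample)
    logs = [math.log(x) for x in sample]
    mu = sum(logs) / n
    sigma = math.sqrt(sum((v - mu) ** 2 for v in logs) / n)
    rate = n / sum(sample)
    ll_lognormal = sum(
        -v - math.log(sigma) - 0.5 * math.log(2 * math.pi)
        - (v - mu) ** 2 / (2 * sigma ** 2)
        for v in logs
    ) / n
    ll_exponential = sum(math.log(rate) - rate * x for x in sample) / n
    return ll_lognormal - ll_exponential


def bootstrap_variance(sample, statistic, n_boot=200, seed=0):
    """Bootstrap estimate of the variance of sqrt(n) * statistic."""
    rng = random.Random(seed)
    n = len(sample)
    # Recompute the statistic on resamples drawn with replacement
    replicates = [statistic(rng.choices(sample, k=n)) for _ in range(n_boot)]
    return n * statistics.pvariance(replicates)


if __name__ == "__main__":
    rng = random.Random(1)
    data = [rng.lognormvariate(-0.047, 0.5) for _ in range(200)]
    print(loglik_diff(data), bootstrap_variance(data, loglik_diff))
```

The same resampling loop applies to any statistic that can be recomputed on a resample, so the stand-in can be replaced by a divergence-based statistic without changing bootstrap_variance.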

Several Monte Carlo experiments were conducted and showed that our procedure performs relatively well. Our work can be used to compare the power of test statistics for model selection based on other types of information measures.


[1] Akaike H., 1973, Information theory and an Extension of the Likelihood Ratio Principe. Proceedimgs of the Second International Symposium of Information Theory, Ed. by Petrov, B.N. and Csaki, F. Budapest: Akademiai Kiado, pp. 257-281.

[2] Andews D.W.K., 1967a, Chi-Square Diagnostic Tests for Econometric Models Econometric models are used by economists to find standard relationships among aspects of the macroeconomy and use those relationships to predict the effects of certain events (like government policies) on inflation, unemployment, growth, etc. : Theory, Econometrica, 56, pp. 1419-1453.

[3] Andews D.W.K., 1988b, Chi-Square Diagnostic Tests for Econometric Models: Introduction and Applications, Journal of Econometrics econometrics, technique of economic analysis that expresses economic theory in terms of mathematical relationships and then tests it empirically through statistical research. , 37, pp. 135-156.

[4] Atkinson A.C., 1970, A Method for Discriminating Between Models, Journal of Royal Statistical Society, Series B, 32, pp. 323-353.

[5] Bar-Hen A. and Daudin J.J., 1995, Generalization gen·er·al·i·za·tion
1. The act or an instance of generalizing.

2. A principle, a statement, or an idea having general application.
 of the Mahalanobis distance In statistics, Mahalanobis distance is a distance measure introduced by P. C. Mahalanobis in 1936. It is based on correlations between variables by which different patterns can be identified and analysed.  in the mixed case, Journal of Multivariate Analysis multivariate analysis,
n a statistical approach used to evaluate multiple variables.

multivariate analysis,
n a set of techniques used when variation in several variables has to be studied simultaneously.
, 53, pp. 332-342.

[6] Cochran W.G., The A2 Test of goodness of fit Goodness of fit means how well a statistical model fits a set of observations. Measures of goodness of fit typically summarize the discrepancy between observed values and the values expected under the model in question. Such measures can be used in statistical hypothesis testing, e. , Ann. Math. Statist stat·ism  
The practice or doctrine of giving a centralized government control over economic planning and policy.

statist adj.
., 23, pp. 315-345.

[7] Cox D.R., 1962, Further Esults on Tests of Separate Families of Hypotheses, Journal of the Royal Statistical Society The Journal of the Royal Statistical Society is a series of three peer-reviewed statistics journals published by Blackwell Publishing for the London-based Royal Statistical Society. , Series B, 24, pp. 406-421.

[8] Cressie N. and Read T.R.C., 1984, Multinomial mul·ti·no·mi·al  
See polynomial.

[multi- + (bi)nomial.]

 goodness of fit tests, Journal of the Royal Statistical Society, Series B, 46, pp. 440-464.

[9] Csiszar I., 1967, Information-type measures of difference of probability distributions Many probability distributions are so important in theory or applications that they have been given specific names. Discrete distributions
With finite support
  • The Bernoulli distribution, which takes value 1 with probability p
 and indirect observations, Studia Sci. Math. Hung., pp. 299-318.

[10] Efron, 1982, The Jackknife jack·knife  
1. A large clasp knife.

2. Sports A dive in the pike position, in which the diver straightens out to enter the water hands first.

, the boostrap and Other Resampling Plans, CBMSNSF Regional Conference Series in Applied Mathematics, 38.

[11] Jeffrey H., 1946, Theory of probability Noun 1. theory of probability - the branch of applied mathematics that deals with probabilities
probability theory

applied math, applied mathematics - the branches of mathematics that are involved in the study of the physical or biological or sociological
, Univ. Oxford, London.

[12] Burbea J., 1984, The Bose-Einstein entropy entropy (ĕn`trəpē), quantity specifying the amount of disorder or randomness in a system bearing energy or information. Originally defined in thermodynamics in terms of heat and temperature, entropy indicates the degree to which a given  of degree [R] and Jensen difference, Utilitas Math., 26, pp. 171-192.

[13] Kagan M., 1963, On the theory of Fisher's amount information, Sov. Math. Dokl, 4, pp. 99-993.

[14] Kullback S., Leibler, 1951, On the information and Sufficiency, Ann. Math. Statist., 22, pp. 79-86.

[15] Liese F. and Vajda I., 1987, Convex Statistical Distances, Teubner, Leipzig.

[16] Menendez M.L., Pardo Morales D., and Salicru M., 1997, Divergences measures between populations : applications in the exponential family In probability and statistics, an exponential family is any class of probability distributions having a certain form. This special form is chosen for mathematical convenience, on account of some useful algebraic properties; as well as for generality, as exponential families are in , Communications in Statistics (Theory and Methods), 25, pp. 1099-1117.

[17] Moore D.S D.S Drainage Structure (flood protection) ., 1977, Generalized Inverses In mathematics, a generalized inverse or pseudoinverse of a matrix A is a matrix that has some properties of the inverse matrix of A but not necessarily all of them. The term "the pseudoinverse" commonly means the Moore-Penrose pseudoinverse. , Wald's Method and the Construction of Chi-Squared Tests chi-squared test

one of the statistical techniques for determining (1) if there are significant differences between two or more series of frequencies or proportions and (2) whether one series of proportions is significantly different from a control series.
 of fit, Journal of Statistical Association, 7, pp. 131-137.

[18] Moore D.S., 1984, Measures of lack of fit from Tests of Chi-Squared Type, Journal of Statistical Planning and Inference (logic) inference - The logical process by which new facts are derived from known facts by the application of inference rules.

See also symbolic inference, type inference.
, 7, pp. 131-137.

[19] Morales D., Pardo L., and Vajda I., 1997, Some new statistics for testing hypotheses in parametric models, Journal of Multivariate Analysis, 10, pp. 151-166.

[20] Morales D., Pardo L., and Zografos K., 1998, Informational distances and related statistics in mixed continuous and categorical That which is unqualified or unconditional.

A categorical imperative is a rule, command, or moral obligation that is absolutely and universally binding.

Categorical is also used to describe programs limited to or designed for certain classes of people.
 variables, Journal of Statistical Planning and Inference, 75, pp. 47-63.

[21] Morales D., Pardo L., 2001, Some approximations to power functions of [empty set]-divergences tests in parametric models, Test, 10, pp. 249-269.

[22] Nayak T.K., 1985, On diversity measures based on entropy functions, Communications in Statistics (Theory and Methods), 14, pp. 203-215.

[23] Pardo L. Salicru M. Menendez M.L., and Morales D., 1995, Divergence mesures based on entropy functions and statistical inference Inferential statistics or statistical induction comprises the use of statistics to make inferences concerning some unknown aspect of a population. It is distinguished from descriptive statistics. , Sankhya, Series B, 57, pp. 315-337.

[24] Pardo L., Morales D., Salicru M., and Menndez, 1994, Asumptotic properties of divergence statistics in a stratified stratified /strat·i·fied/ (strat´i-fid) formed or arranged in layers.

Arranged in the form of layers or strata.
 random sampling and its applications to test satistical hypotheses, Journal of Statistical Planning and Inference, 38, pp. 201-222.

[25] Renyi A., 1961, On measures of entropy and information, Proc. 4slth Berkeley Symp. on Math. Statist. Univ. Calif. Press, Berkeley, 1, pp. 547-561.

[26] Salicru M., Menendez, Pardo L., and Morales D., 1994, On the applications of divergence type mesures in testing statistical hypoteses, Journal of Multivariate Analysis, 51, pp. 372-391.

[27] Serfling R.J., 1980, Approximations Theorems This is a list of theorems, by Wikipedia page. See also
  • list of fundamental theorems
  • list of lemmas
  • list of conjectures
  • list of inequalities
  • list of mathematical proofs
  • list of misnamed theorems
  • Existence theorem
 of Mathematical Statistics Mathematical statistics uses probability theory and other branches of mathematics to study statistics from a purely mathematical standpoint.

Mathematical statistics is the subject of mathematics that deals with gaining information from data.
, John Wiley John Wiley may refer to:
  • John Wiley & Sons, publishing company
  • John C. Wiley, American ambassador
  • John D. Wiley, Chancellor of the University of Wisconsin-Madison
  • John M. Wiley (1846–1912), U.S.
, New York New York, state, United States
New York, Middle Atlantic state of the United States. It is bordered by Vermont, Massachusetts, Connecticut, and the Atlantic Ocean (E), New Jersey and Pennsylvania (S), Lakes Erie and Ontario and the Canadian province of

[28] Sharma B.D., Mittal D.P., 1977, New nonadditive measures of entropy for discrete probability distributions, J. Math. Sci., 10, pp. 28-40.

[29] Taneja I.J., 1987, Statistical aspects of divergence measures, Journal of Statistical Planning and Inference, 16, pp. 136-145.

[30] Taneja I.J., 1989, On generalized information measures and their applications, Adv. Electron. Phys., 76, pp. 327-413.

[31] Vadja I., 1973, [chi square]-divergence and generalized Fisher's information, Trans. 6th Prague Conf. on Inform. Theory Statistical Decision Functions and Random Process, Prague, pp. 873-886.

[32] Vuong Q. and Weiren W., 1993, Selecting Estimated Models Using Chi-Square Statistics, Annals an·nals  
1. A chronological record of the events of successive years.

2. A descriptive account or record; a history: "the short and simple annals of the poor" 
 D'Economie et de Statistique, 30, pp. 144-164.

[33] Zografos K., Ferentinos K., and Papaioannou T., 1990, [empty set]-Divergence statistics: sampling properties and multinomial goodness of fit and divergence tests, Communication in Statistics (Theory and Methods), 19, pp. 1785-1802.

[34] Watson G.S., 1959, Some Recent Results in Chi-Square Goodness-of-fit Tests, Biometrics, 15, pp. 440-468.

Papa Ngom

Laboratoire de Mathematiques appliquees (LMA),

Universite Cheikh Anta Diop--Dakar--Senegal

Table 1

Data generating process = Log-norm(-0.047, 0.5)

           n                     100               300               600               1000

         [??]               0.927 (0.051)     0.925 (0.028)     0.924 (0.020)     0.925 (0.016)
     [[??].sub.1]          -0.046 (0.052)    -0.046 (0.021)    -0.047 (0.021)    -0.047 (0.016)
     [[??].sub.2]           0.497 (0.035)     0.500 (0.021)     0.500 (0.014)     0.500 (0.011)
     [[??].sub.*]           1.413 (0.181)     1.383 (0.146)     1.325 (0.125)     1.303 (0.107)
[D.sub.n]([[??].sub.n])     3.699 (0.336)     3.636 (0.178)     3.617 (0.125)     3.616 (0.096)
[D.sub.n]([[??].sub.n])     3.131 (0.394)     3.103 (0.214)     3.087 (0.152)     3.092 (0.116)
      [DI.sub.n]            4.081 (1.187)     6.726 (1.082)     9.846 (1.121)    12.806 (1.286)
       Incorrect                 0%                0%                0%                0%
      Indecisive                 0%                0%                0%                0%
        Correct                 100%              100%              100%              100%

Table 2

DGP = 0.25 Exp (1) + 0.75 Log-norm (-0.047, 0.5)

       n                   100               300               600               1000

     [beta]           0.949 (0.061)     0.944 (0.038)     0.943 (0.026)     0.944 (0.019)
   [r.sub.1]         -0.181 (0.075)    -0.180 (0.046)    -0.179 (0.032)    -0.180 (0.042)
   [r.sub.2]          0.795 (0.122)     0.804 (0.074)     0.802 (0.053)     0.806 (0.042)
    [omega]           1.349 (0.409)     1.265 (0.264)     1.233 (0.188)     1.240 (0.165)
[D.sub.n]([??])       3.735 (0.336)     3.677 (0.198)     3.665 (0.135)     3.664 (0.103)
[D.sub.n]([??])       3.572 (0.385)     3.519 (0.225)     3.505 (0.154)     3.507 (0.117)
   [DI.sub.n]         1.295 (0.877)     2.609 (1.016)     3.250 (1.082)     4.091 (1.113)
   Favor Exp                0%                0%                0%                0%
   Indecisive              76%               38%               12%                2%
  Favor Log-n              24%               62%               88%               98%

Table 3

DGP = 0.410 Exp (1) + 0.590 Log-norm (-0.047, 0.5)

       n                   100               300               600               1000

     [beta]           0.957 (0.069)     0.955 (0.040)     0.955 (0.028)     0.955 (0.021)
   [r.sub.1]         -0.263 (0.090)    -0.263 (0.051)    -0.264 (0.035)    -0.265 (0.027)
   [r.sub.2]          0.932 (0.131)     0.941 (0.074)     0.942 (0.055)     0.944 (0.042)
    [omega]           1.201 (0.343)     1.125 (0.198)     1.103 (0.162)     1.103 (0.132)
[D.sub.n]([??])       3.775 (0.358)     3.712 (0.196)     3.710 (0.140)     3.706 (0.106)
[D.sub.n]([??])       3.771 (0.398)     3.711 (0.218)     3.711 (0.154)     3.708 (0.117)
   [DI.sub.n]         0.048 (0.896)     0.031 (0.921)     0.008 (0.908)    -0.044 (0.910)
   Favor Exp                1%                1%                1%                1%
   Indecisive              96%               97%               97%               97%
  Favor Log-n               3%                2%                2%                2%

Table 4

DGP = 0.75 Exp (1) + 0.25 Log-norm (-0.047, 0.5)

       n                   100               300               600               1000

     [beta]           0.986 (0.090)     0.981 (0.051)     0.981 (0.036)     0.980 (0.027)
   [r.sub.1]         -0.441 (0.116)    -0.443 (0.067)    -0.445 (0.046)    -0.443 (0.036)
   [r.sub.2]          1.153 (0.135)     1.158 (0.080)     1.159 (0.055)     1.158 (0.043)
    [omega]           0.947 (0.254)     0.853 (0.138)     0.840 (0.105)     0.835 (0.090)
[D.sub.n]([??])       3.919 (0.416)     3.868 (0.229)     3.865 (0.105)     3.855 (0.121)
[D.sub.n]([??])       4.132 (0.436)     4.082 (0.239)     4.082 (0.164)     4.070 (0.127)
   [DI.sub.n]        -2.319 (0.902)    -4.426 (1.074)    -6.388 (1.076)    -8.238 (1.190)
   Favor Exp               66%               99%              100%              100%
   Indecisive              34%                1%                0%                0%
  Favor Log-n               0%                0%                0%                0%

Table 5

Data generating process = Exponential (1)

       n                   100               300               600               1000

     [beta]           1.008 (0.105)     1.001 (0.059)     1.001 (0.040)     1.000 (0.031)
   [r.sub.1]         -0.570 (0.131)    -0.577 (0.076)    -0.576 (0.052)    -0.577 (0.040)
   [r.sub.2]          1.266 (0.128)     1.284 (0.078)     1.280 (0.052)     1.282 (0.043)
    [omega]           0.840 (0.227)     0.757 (0.107)     0.738 (0.083)     0.735 (0.074)
[D.sub.n]([??])       4.068 (0.466)     4.023 (0.250)     4.012 (0.179)     4.007 (0.138)
[D.sub.n]([??])       4.378 (0.465)     4.339 (0.250)     4.328 (0.178)     4.324 (0.137)
   [DI.sub.n]        -3.833 (1.040)    -7.300 (1.082)   -10.565 (1.195)   -13.745 (1.430)
    Correct               100%              100%              100%              100%
   Indecisive               0%                0%                0%                0%
   Incorrect                0%                0%                0%                0%
COPYRIGHT 2007 Research India Publications
Article Details
Author:Ngom, Papa
Publication:Global Journal of Pure and Applied Mathematics
Geographic Code:6SENE
Date:Apr 1, 2007