Selected estimated models with [phi]-divergence statistics.

Abstract

When testing for discriminating between two competing models, a statistical method usually proceeds by evaluating a measure of discrepancy between the observed data and each parametric model; the model with the smallest statistic is generally chosen. This paper addresses the problem of choosing between two estimated models using [phi]-divergence type statistics. Arbitrary [square root of n]-asymptotically normal estimators may be used to construct these statistics. Large-sample theory and bootstrap methods are used to construct our [phi]-divergence tests in parametric models, and the results are illustrated by a simulation study.

AMS Subject Classification: 62F03, 62F40, 62F05, 94A17.

Keywords: Asymptotic distributions, [phi]-divergence statistics, bootstrap methods, testing statistical hypotheses, goodness-of-fit tests.

1. Introduction

Cochran [6], Watson [34] and Moore [17], [18] have provided comprehensive surveys on Pearson chi-square type statistics, i.e., quadratic forms in the cell frequencies. Recently, Andrews [2], [3] has extended the Pearson chi-square testing method to nondynamic parametric models, i.e., to models with covariates. Because Pearson chi-square statistics provide natural measures of the discrepancy between the observed data and a specific parametric model, they have also been used for discriminating among competing models. Such a situation is frequent in the social sciences, where many competing models are proposed to fit a given sample. A well-known difficulty is that each chi-square statistic tends to become large, without an increase in its degrees of freedom, as the sample size increases. As a consequence, goodness-of-fit tests based on Pearson type chi-square statistics will generally reject the correct specification of every competing model. To circumvent such a difficulty, a popular method for model selection, similar to the use of the Akaike [1] Information Criterion, is to prefer the model with the smaller chi-square statistic.
The preceding selection rule, however, is not entirely satisfactory. Since chi-square statistics depend on the sample and are therefore random, their actual values are subject to statistical variation. We therefore propose some convenient asymptotically standard normal tests for model selection based on [phi]-divergence type statistics. By analogy with the approach introduced by Vuong [32], our tests test the null hypothesis that the competing models are equally close to the data generating process (DGP), where the closeness of a model is measured according to the discrepancy implicit in the [phi]-divergence type statistics.

Following Morales and Pardo [21], let {[P.sub.[theta]] : [theta] [member of] [THETA]} be a family of probability measures on a measurable space (X, [[beta].sub.X]) with open [THETA] [subset] [R.sup.d], d [greater than or equal to] 1. The measures [P.sub.[theta]] are described by probability density functions (p.d.f.) [f.sub.[theta]](x) = (d[P.sub.[theta]]/d[mu])(x) with respect to a dominating [sigma]-finite measure [mu] on X. The sample space X is the support of the [sigma]-finite measure [mu]. The statistical model ((X, [[beta].sub.X]), {[P.sub.[theta]] : [theta] [member of] [THETA]}, [mu]) satisfies the regularity assumptions (R1)-(R3) appearing on pages 144-145 of Serfling [27] and the identifiability condition (R4): if [f.sub.[[theta].sub.1]] = [f.sub.[[theta].sub.2]], then [[theta].sub.1] = [[theta].sub.2]. If [[theta].sub.0] is the true value of the parameter [theta] and (R1)-(R4) hold, then there exists a strongly consistent sequence [[theta]-hat.sub.n] of roots of the likelihood equations such that [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII.], (1.1) where [I.sub.F]([[theta].sub.0]) is the Fisher information matrix and [[theta]-hat.sub.n] is assumed to be the maximum likelihood estimator (MLE).
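For orientation, the limit law labelled (1.1) is presumably the classical asymptotic normality of the MLE. The following LaTeX block is a hedged reconstruction under (R1)-(R4), not the authors' own display; it is chosen to agree with the covariance A[I.sub.F][([theta]).sup.-1][A.sup.t] that appears in Theorem 2.1.

```latex
% Assumed form of (1.1): asymptotic normality of the MLE under (R1)-(R4).
\sqrt{n}\,\bigl(\hat{\theta}_n - \theta_0\bigr)
  \;\xrightarrow{\;L\;}\;
  N\!\bigl(0,\; I_F(\theta_0)^{-1}\bigr),
  \qquad n \to \infty .
```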
We consider testing procedures based on a sequence of observations [X.sub.n] = ([X.sub.1], [X.sub.2],..., [X.sub.n]) with independent components drawn from a p.d.f. of the family {[f.sub.[theta]] : [theta] [member of] [THETA]}. Many recent papers use divergence type measures of information in testing statistical hypotheses. We refer, among others, to Cressie and Read [8], Nayak [22], Zografos, Ferentinos and Papaioannou [33], Salicru, Morales, Menendez and Pardo [23], Bar-Hen and Daudin [5] and references therein. Salicru et al. [26] introduced the divergence statistics [S.sub.[phi],n] [equivalent to] 2n[C.sub.[phi]]([[theta]-hat.sub.n], [[theta].sub.0]) where [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII.] (1.2) is the [phi]-divergence of densities from the family {[f.sub.[theta]] : [theta] [member of] [THETA]} introduced by Csiszar [9]. Liese and Vajda [15] have developed a systematic theory of these divergences. Morales et al. [16] have established that [S.sub.[phi],n] converges in distribution to [[chi square].sub.d]. An important problem is to propose divergence statistics for testing procedures. The asymptotic behavior of the statistics based on [C.sub.[phi]]([[theta]-hat.sub.n], [[theta].sub.0]) is needed for choosing between two estimated models. To that end, we present a new testing procedure built on the divergence statistic.

The paper is organized as follows. Section 2 introduces the basic notation and defines a class of asymptotically normal estimators.
In Section 3, we investigate the model selection problem based on divergence type statistics and propose a large-sample test. In Section 4, Efron's [10] bootstrap method is used to propose alternative and simpler testing procedures for model selection. Section 5 gives some simulation results. Section 6 concludes the paper and mentions some extensions.

2. Assumption and Asymptotic Behavior of the Divergence Statistic

Assumption (A1): (i) The function [phi] : [0, +[infinity][ [right arrow] ]-[infinity], +[infinity][ is convex and continuous. Its restriction to [0, +[infinity][ is finite and twice continuously differentiable, with [phi](1) = [phi]'(1) = 0 and [phi]"(1) = 1; (ii) For each [[theta].sub.0] [member of] [THETA] there is an open neighborhood V([[theta].sub.0]) such that, for 1 [less than or equal to] i, j [less than or equal to] d, it holds: [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII.]

Condition (i) concerns properties of the [phi]-divergence (cf. Liese and Vajda [15]). Condition (ii) is needed to apply the delta method to obtain asymptotic distributions of [phi]-statistics. Sufficient conditions for (ii) are presented in Morales et al. [19]. Assume that (R1)-(R4) and (A1) hold. Under [H.sub.o] : [theta] [member of] [[THETA].sub.o] [subset] [THETA], we present the asymptotic distribution of [C.sub.[phi]]([[theta]-hat.sub.n], [[theta].sub.o]).
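As an illustration of condition (i) of (A1), the following Python sketch checks the normalization [phi](1) = [phi]'(1) = 0, [phi]"(1) = 1 numerically for the power-type function listed in Table 1, and evaluates a discrete [phi]-divergence. The function names, the finite-difference step, and the convention sum_i q_i [phi](p_i / q_i) (Csiszar's usual form) are assumptions made for this sketch, not taken from the paper.

```python
import math


def phi_power(x, r):
    """Power-type phi from Table 1: (x**r - r*(x - 1) - 1) / (r*(r - 1)), r not in {0, 1}."""
    return (x ** r - r * (x - 1.0) - 1.0) / (r * (r - 1.0))


def phi_kl(x):
    """Kullback-Leibler generator x*log(x) - x + 1, with the limit value 1 at x = 0."""
    return x * math.log(x) - x + 1.0 if x > 0 else 1.0


def phi_divergence(p, q, phi):
    """Discrete phi-divergence sum_i q_i * phi(p_i / q_i); assumes every q_i > 0."""
    return sum(qi * phi(pi / qi) for pi, qi in zip(p, q))


def check_a1(phi, step=1e-4):
    """Return (phi(1), phi'(1), phi''(1)) estimated by central finite differences."""
    first = (phi(1.0 + step) - phi(1.0 - step)) / (2.0 * step)
    second = (phi(1.0 + step) - 2.0 * phi(1.0) + phi(1.0 - step)) / step ** 2
    return phi(1.0), first, second


if __name__ == "__main__":
    print(check_a1(lambda x: phi_power(x, 0.5)))
    print(phi_divergence([0.5, 0.5], [0.25, 0.75], phi_kl))
```

A generator that fails the check (for example one with [phi]"(1) different from 1) would rescale the limiting variance, which is why the normalization is imposed.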
Theorem 2.1. Let the model and [phi] satisfy (R1)-(R4) and (A1), respectively. Let [theta] be the true parameter, with [theta] [not equal to] [[theta].sub.o]. Then we have [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII.], where [[SIGMA].sup.2.sub.[phi]]([theta], [[theta].sub.o]) = A[I.sub.F][([theta]).sup.-1][A.sup.t] and A = [DELTA][C.sub.[phi]]([theta], [[theta].sub.o]) with [DELTA] = ([partial derivative]/[partial derivative][[theta].sub.1],..., [partial derivative]/[partial derivative][[theta].sub.d]).

Proof. A first order Taylor expansion gives [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII.] As [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII.], it is clear that the random variables [square root of n][[C.sub.[phi]]([[theta]-hat.sub.n], [[theta].sub.o]) - [C.sub.[phi]]([theta], [[theta].sub.o])] and A[square root of n][([[theta]-hat.sub.n] - [theta]).sup.t] have the same asymptotic distribution, because [square root of n] o([parallel][[theta]-hat.sub.n] - [theta][parallel]) = [o.sub.p](1).

3. Selecting Estimated Models

As mentioned earlier, [phi]-divergence type statistics can be used to discriminate among alternative models. Let h be the true probability density of the observations [X.sub.n] = ([X.sub.1],..., [X.sub.n]). We consider a specified model [F.sub.[theta]] = {F(.|[theta]); [theta] [member of] [THETA] [subset] [R.sup.k]} with [f.sub.[theta]](x) as the probability density function. We define the discrepancy between the observations and the model [F.sub.[theta]] as follows: [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII.]

Of special interest to us is the situation in which a researcher has two competing parametric models [F.sub.[theta]] and [G.sub.[gamma]] = {G(.|[gamma]); [gamma] [member of] [GAMMA] [subset] [R.sup.k]} and wishes to select the better of the two based on their discrimination statistics [D.sub.n]([[theta]-hat.sub.n]) = [C.sub.[phi]](h, [f.sub.[[theta]-hat.sub.n]]) and [D.sub.n]([[gamma]-hat.sub.n]) = [C.sub.[phi]](h, [g.sub.[[gamma]-hat.sub.n]]), where [[theta]-hat.sub.n] and [[gamma]-hat.sub.n] are general estimators satisfying condition (1).

Definition 3.1. (Equivalent, Better and Worse) Consider two competing models [F.sub.[theta]] and [G.sub.[gamma]] and some discrimination type statistics [D.sub.n]([[theta]-hat.sub.n]) and [D.sub.n]([[gamma]-hat.sub.n]), where [[theta]-hat.sub.n] and [[gamma]-hat.sub.n] are general estimators satisfying condition (1). Let D(x) be the probability limit of [square root of n][D.sub.n](x). The hypotheses

[H.sub.o] : D([[theta].sub.o]) = D([[gamma].sub.o])
[H.sub.f] : D([[theta].sub.o]) < D([[gamma].sub.o])
[H.sub.g] : D([[theta].sub.o]) > D([[gamma].sub.o])

mean that the estimated models F(x|[[theta].sub.o]) and G(x|[[gamma].sub.o]) are equivalent, that F(x|[[theta].sub.o]) is better than G(x|[[gamma].sub.o]), and that F(x|[[theta].sub.o]) is worse than G(x|[[gamma].sub.o]), respectively.

Definition 3.1 calls for some remarks. First, it does not require that the same divergence type statistic be used in forming [D.sub.n]([[theta]-hat.sub.n]) and [D.sub.n]([[gamma]-hat.sub.n]).
Choosing different discrepancies for evaluating competing models is, however, hardly justified. Second, and more importantly, it allows estimators other than the matching divergence estimators to be used. In any case, since [[theta]-hat.sub.n] and [[gamma]-hat.sub.n] are consistent estimators of [[theta].sub.o] and [[gamma].sub.o] by condition (1), we can use, from Theorem 2.1, [square root of n]{[C.sub.[phi]](h, [f.sub.[[theta]-hat.sub.n]]) - [C.sub.[phi]](h, [g.sub.[[gamma]-hat.sub.n]])} to consistently estimate the indicator [C.sub.[phi]](h, [f.sub.[[theta].sub.o]]) - [C.sub.[phi]](h, [g.sub.[[gamma].sub.o]]), which is zero under the null hypothesis [H.sub.o]. Using a standard Taylor expansion, we can obtain the asymptotic distribution of [square root of n]{[C.sub.[phi]](h, [f.sub.[[theta]-hat.sub.n]]) - [C.sub.[phi]](h, [g.sub.[[gamma]-hat.sub.n]])}, which is normal with zero mean and variance [[omega].sup.2] under [H.sub.o]. The detailed derivation and the expression for [[omega].sup.2] can be found in the proof of Theorem 3.2. Hence we define the statistic [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII.] (3.1) where [[omega]-hat.sup.2] is a consistent estimator of [[omega].sup.2] (DI stands for Divergence Indicator). We have:

Theorem 3.2. (Asymptotic Distribution of the DI Statistic) Given H1-H4, then
(i) under the null hypothesis [H.sub.o], [DI.sub.n] [right arrow] N(0, 1) in distribution;
(ii) under the alternative [H.sub.f], [DI.sub.n] [right arrow] -[infinity] in probability;
(iii) under the alternative [H.sub.g], [DI.sub.n] [right arrow] +[infinity] in probability.

Proof. [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII.] By difference, it follows that: [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII.] From the multivariate central limit theorem and assumption (A1), we can now immediately obtain the asymptotic distribution of [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII.] under the null hypothesis of equivalence [H.sub.o]. Define: [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII.] with A = [DELTA][C.sub.[phi]]([theta], [[theta].sub.o]) and B = [DELTA][C.sub.[phi]]([gamma], [[gamma].sub.o]). Let [[omega].sup.2] = T[LAMBDA][T.sup.t]; we then have [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII.]

Remark 3.3. Some important measures of divergence cannot be written as a [phi]-divergence, for instance the divergence measures of Bhattacharyya, Sharma-Mittal and Renyi. However, such measures can be written in the following form: [C.sub.[phi],h]([[theta].sub.1], [[theta].sub.2]) = h([C.sub.[phi]]([[theta].sub.1], [[theta].sub.2])), where h is a differentiable increasing function mapping [0, +[infinity][ onto [0, +[infinity][, with h(0) = 0 and h'(x) > 0. We present these divergence measures in the following table.
Divergence | h function | [phi] function
Bhattacharyya | [h.sub.B](x) = -ln(-x + 1) | [[phi].sub.B](x) = -[x.sup.1/2] + (1/2)(x + 1)
Sharma-Mittal | [h.sub.S](x) = (1/(s - 1)) [[(1 + r(r - 1)x).sup.(s - 1)/(r - 1)] - 1] | [[phi].sub.S](x) = ([x.sup.r] - r(x - 1) - 1) / (r(r - 1))
Renyi | [h.sub.R](x) = (1/(r(r - 1))) ln(r(r - 1)x + 1) | [[phi].sub.R](x) = ([x.sup.r] - r(x - 1) - 1) / (r(r - 1))

Table 1: (h, [phi])-divergences with r [not equal to] 0, 1

Theorem 3.2 is quite general and gives a wide variety of asymptotically standard normal tests for model selection based on divergence type statistics. Parts (ii) and (iii) also imply that the test is consistent. In the next section, we detail the testing procedures based on Theorem 3.2 using bootstrap methods.

4. Bootstrap Methods

Implementation of the model selection procedure proposed in Section 3 requires the following computations: (i) estimation of the parameters [[theta]-hat.sub.n] and [[gamma]-hat.sub.n]; (ii) computation of the two divergence statistics [D.sub.n]([[theta]-hat.sub.n]) and [D.sub.n]([[gamma]-hat.sub.n]) and of the difference [[??].sub.n] [equivalent to] [square root of n][[D.sub.n]([[theta]-hat.sub.n]) - [D.sub.n]([[gamma]-hat.sub.n])]; (iii) computation of the variance [[omega]-hat.sup.2] of [[??].sub.n] and, finally, computation of [DI.sub.n] [equivalent to] [[??].sub.n]/[[omega]-hat]. Specifically, we carry out the following steps:

1) Let [F.sub.n] be the empirical probability distribution of the original data [x.sub.1], [x.sub.2], ... , [x.sub.n], i.e., [F.sub.n] puts mass 1/n at each [x.sub.i] (i = 1, 2, ... , n): [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII.] Then draw an i.i.d. "bootstrap sample" [x.sup.*.sub.1], [x.sup.*.sub.2], ... , [x.sup.*.sub.n] from [F.sub.n], i.e., draw [x.sup.*.sub.i] randomly with replacement from the observed values [x.sub.1], [x.sub.2], ... , [x.sub.n].

2) Using this bootstrap sample [x.sup.*.sub.i], estimate the competing models to obtain [[theta].sup.*.sub.n] and [[gamma].sup.*.sub.n]. Then calculate the statistic [[??].sup.*.sub.n] [equivalent to] [square root of n][[D.sub.n]([[theta].sup.*.sub.n]) - [D.sub.n]([[gamma].sup.*.sub.n])].

3) Independently repeat steps 1 and 2 a large number of times S, say S = 1000. Obtain "bootstrap replications" [[??].sup.*1.sub.n], [[??].sup.*2.sub.n], ... , [[??].sup.*S.sub.n], and compute the sample variance of {[[??].sup.*j.sub.n], j = 1,..., S}: [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII.], where [bar.B] = (1/S) [S.summation over (j=1)] [[??].sup.*j.sub.n] is the average of the bootstrap replications.

Once the bootstrap variance [[omega]-hat.sup.2.sub.*] is obtained, the test statistic [DI.sub.n] is calculated easily using the initial estimates [[theta]-hat.sub.n] and [[gamma]-hat.sub.n]. Under suitable regularity conditions and for a large number of replications [10], [[omega]-hat.sup.2.sub.*] is a consistent estimator of [[omega].sup.2]. Thus, from Theorem 3.2, a testing procedure for model selection can be based on comparing the value of [DI.sub.n] with critical values from a standard normal table. For example, at the 5% significance level, we compare [DI.sub.n] with -1.96 and 1.96. If [DI.sub.n] falls between -1.96 and 1.96, we conclude that both estimated models fit the data equally well. If [DI.sub.n] is less than -1.96 (or larger than 1.96), then we reject the null hypothesis in favor of the alternative hypothesis that the estimated model F(x|[[theta]-hat.sub.n]) (or G(x|[[gamma]-hat.sub.n])) is closer to the true distribution.
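Steps 1) to 3) and the decision rule can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: `diff_fn` is a hypothetical user-supplied function that re-estimates both competing models on a sample and returns the difference of the two divergence statistics, and the sample variance uses the divisor S - 1, which is an assumption because the variance formula is not legible in the text.

```python
import math
import random


def bootstrap_di(data, diff_fn, n_boot=1000, seed=0):
    """Return (DI_n, difference statistic, bootstrap variance) for one sample.

    diff_fn(sample) is a hypothetical callable returning
    D_n(theta_hat) - D_n(gamma_hat) with both models re-estimated on `sample`.
    """
    rng = random.Random(seed)
    n = len(data)
    root_n = math.sqrt(n)
    stat = root_n * diff_fn(data)  # statistic on the original sample

    replications = []
    for _ in range(n_boot):
        # Step 1: resample n points with replacement from the empirical distribution.
        resample = [data[rng.randrange(n)] for _ in range(n)]
        # Step 2: recompute the scaled difference on the bootstrap sample.
        replications.append(root_n * diff_fn(resample))

    # Step 3: sample variance of the replications (divisor S - 1 is an assumption).
    mean_rep = sum(replications) / n_boot
    var_star = sum((b - mean_rep) ** 2 for b in replications) / (n_boot - 1)
    if var_star <= 0.0:
        raise ValueError("bootstrap variance is zero; DI_n is undefined")
    return stat / math.sqrt(var_star), stat, var_star


def decide(di, critical=1.96):
    """Two-sided decision at the level implied by `critical` (1.96 for 5%)."""
    if di < -critical:
        return "F closer"
    if di > critical:
        return "G closer"
    return "equivalent"
```

Passing the difference as a single callable keeps the resampling loop independent of the particular divergence and estimators, in line with the generality claimed for Theorem 3.2.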
Although the bootstrap is used to estimate [[omega].sup.2], the basic justification of the preceding test comes from the asymptotic properties obtained in Theorem 3.2.

5. Numerical Study

We briefly present the basic assumptions on the model and the parameter estimators, and we define our general divergence type statistics.

Assumption (A2): The observed data [X.sub.i], i = 1,..., n, are independent and identically distributed (i.i.d.) with some common true distribution H.

The sample space X is partitioned into M mutually disjoint fixed cells [C.sub.1], [C.sub.2], ... , [C.sub.M]. Let n be the sample size. Corresponding to the partition [C.sub.1], [C.sub.2], ... , [C.sub.M], we can compute the vector of observed cell probabilities f = [([f.sub.1], [f.sub.2], ... , [f.sub.M]).sup.t] where [f.sub.i] = (1/n) [n.summation over (j=1)] [I.sub.[C.sub.i]]([X.sub.j]), for i = 1, 2, ... , M, (5.1) and [I.sub.[C.sub.i]]([X.sub.j]) is the indicator function: [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII.]. Let a specified model be [H.sub.[theta]] = {H(x|[theta]), [theta] [member of] [THETA] [subset] [R.sup.d]} and denote the vector of its predicted cell probabilities by h([theta]) = [([h.sub.1]([theta]), [h.sub.2]([theta]), ... , [h.sub.M]([theta])).sup.t], where [h.sub.i]([theta]) = [[integral].sub.[C.sub.i]] dH(x|[theta]) and H(x|[theta]) is the distribution of [X.sub.i] under the model. We suppose that [h.sub.i]([theta]) > 0 and that [h.sub.i]([theta]) is continuously differentiable (Assumption A1) for every i = 1, 2,..., M.

To illustrate the model selection procedure of the preceding section, we consider an example. We need to define the competing models and the divergence type statistic used to measure the departure of each proposed parametric model from the data generating process. Here, we choose an important measure of divergence given by Renyi [25], which can be written in the following form: [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII.] with limiting cases for [alpha] = 0 and [alpha] = 1, that is, [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII.] and [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII.], which is the Kullback-Leibler divergence. In the case where [f.sub.[[theta].sub.1]] and [f.sub.[[theta].sub.2]] are discrete probability distributions, [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII.] (5.2)

In the statistical literature, the problem of choosing between the family of lognormal distributions and the family of exponential distributions has a long history. See [7] and [4] among others.
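The observed cell probabilities (5.1) and a discrete Renyi divergence can be sketched in Python as below. Two assumptions are made and should be read as such: the cells are described by their interior cut points and cover the whole real line, and the Renyi divergence of order [alpha] takes the form ln(sum_i p_i^[alpha] q_i^(1 - [alpha])) / ([alpha]([alpha] - 1)), which is what the Renyi row of Table 1 implies when h is applied to the [phi]-divergence; the paper's own display (5.2) may differ in notation or argument order.

```python
import math
from bisect import bisect_right


def cell_frequencies(xs, cut_points):
    """Observed cell probabilities f_i = (1/n) * #{X_j in C_i}.

    cut_points are the sorted interior boundaries, so there are
    len(cut_points) + 1 cells: (-inf, c1), [c1, c2), ..., [cK, +inf).
    """
    counts = [0] * (len(cut_points) + 1)
    for x in xs:
        counts[bisect_right(cut_points, x)] += 1
    n = len(xs)
    return [c / n for c in counts]


def renyi_divergence(p, q, alpha):
    """Assumed discrete Renyi divergence of order alpha (alpha not in {0, 1}).

    For 0 < alpha < 1 zero entries are harmless; otherwise p_i, q_i > 0 is assumed.
    """
    total = sum(pi ** alpha * qi ** (1.0 - alpha) for pi, qi in zip(p, q))
    return math.log(total) / (alpha * (alpha - 1.0))


if __name__ == "__main__":
    f = cell_frequencies([0.5, 1.2, 1.7, 2.5, 4.0, 1.5], [1.5, 2.0, 3.0])
    print(f, renyi_divergence(f, [0.25, 0.25, 0.25, 0.25], 0.5))
```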
The lognormal distribution is parameterized by r = ([r.sub.1], [r.sub.2]) and has density [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII.], otherwise. The exponential distribution with parameter [beta] has density g(x|[beta]) = (1/[beta]) exp(-x/[beta]) for x > 0 and zero otherwise. The estimator used for each competing model is the maximum likelihood estimator (MLE). Specifically, for the lognormal model, [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII.] and [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII.] For the exponential model, the MLE is the sample average, i.e., [[beta]-hat] = (1/n) [n.summation over (i=1)] [x.sub.i].

Lastly, we use Renyi's divergence measure (5.2) to evaluate the discrepancy of a proposed model from the true data generating process. We partition the real line into M intervals {([a.sub.i-1], [a.sub.i]), i = 1, ... , M} where each [a.sub.i] is a real number. The choice of the cells is discussed below. The Renyi statistics for the lognormal and exponential models are: [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII.] and [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII.] where [h.sub.i](r) and [g.sub.i]([beta]) are the probabilities of the interval ([a.sub.i-1], [a.sub.i]) under h(x|r) and g(x|[beta]), respectively, and f is the vector of observed cell probabilities defined in (5.1).

In our Monte Carlo study, we consider various sets of experiments in which the data are generated from a mixture of an exponential distribution and a lognormal distribution. These two distributions are calibrated so that they have the same population means and variances, namely one and one. Hence the data generating process has density d(p) = p Exponential(1) + (1 - p) Lognormal(0.047, 0.5), where p is set to some specific value for each set of experiments. In each set of experiments, several random samples are drawn from this mixture of distributions. The sample size varies from 100 to 1,000, and for each sample size the number of replications is 1,000. Throughout, the chosen partition has four cells defined by the values [a.sub.0] = 1.0, [a.sub.1] = 1.5, [a.sub.2] = 2.0, [a.sub.3] = 3.0, and [a.sub.4] = +[infinity]. As with minimum chi-square methods, note that because the lognormal distribution has two parameters, four is the minimum number of cells for which a perfect fit is not always achieved. Note also that the shapes of the lognormal and exponential densities differ greatly around the origin. This motivates the choice of [a.sub.0] = 1.0. The value [alpha] = 0.5 in (5.2) corresponds, approximately, to the common density function on [1, +[infinity][ under the null hypothesis [H.sub.o] (see Figure 1c).

[FIGURE 1 OMITTED]

We choose five different values for p: 0.00, 0.25, 0.41, 0.75 and 1.00. Although our proposed model selection procedure does not require that the data generating process belong to either of the competing models, we consider the two limiting cases p = 0.00 and p = 1.00 because they correspond to the correctly specified cases. The value p = 0.41 is determined to be the value for which the estimated lognormal distribution and the estimated exponential distribution are approximately at equal distance from the mixture d(p) according to Renyi's divergence. Thus this set of experiments corresponds approximately to the null hypothesis of our proposed model selection test [DI.sub.n].
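The data generating process and the two MLEs can be sketched in Python as follows. The sketch assumes that [r.sub.1] is the mean and [r.sub.2] the standard deviation of log X (the text does not say whether [r.sub.2] is a standard deviation or a variance), and that the lognormal MLE is the usual pair (mean of log x, divisor-n standard deviation of log x); the function names are illustrative.

```python
import math
import random


def sample_mixture(n, p, r1, r2, rng):
    """Draw n points from p * Exponential(mean 1) + (1 - p) * Lognormal(r1, r2).

    Assumption: r1 and r2 are the mean and standard deviation of log X.
    """
    return [
        rng.expovariate(1.0) if rng.random() < p else rng.lognormvariate(r1, r2)
        for _ in range(n)
    ]


def mle_exponential(xs):
    """MLE of the exponential mean beta: the sample average."""
    return sum(xs) / len(xs)


def mle_lognormal(xs):
    """Assumed lognormal MLE: (mean of log x, divisor-n std of log x)."""
    logs = [math.log(x) for x in xs]
    mean_log = sum(logs) / len(logs)
    var_log = sum((v - mean_log) ** 2 for v in logs) / len(logs)
    return mean_log, math.sqrt(var_log)


if __name__ == "__main__":
    rng = random.Random(0)
    # Parameter values as printed in the text; p = 0.41 is the near-null case.
    xs = sample_mixture(1000, 0.41, 0.047, 0.5, rng)
    print(mle_exponential(xs), mle_lognormal(xs))
```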
The results of our five sets of experiments are presented in Tables 1-5. The first half of each table gives the average values of the ML estimators [[beta]-hat], [[r]-hat.sub.1], and [[r]-hat.sub.2], the divergence goodness-of-fit statistics [D.sub.n]([[theta]-hat.sub.n]) and [D.sub.n]([[gamma]-hat.sub.n]), and the model selection statistic [DI.sub.n] with its bootstrap estimated variance [[omega]-hat.sup.2.sub.*]. The values in parentheses are standard errors. The second half of each table gives, in percentages, the number of times our proposed model selection procedure, based on the method described in the previous section, favors the lognormal model, favors the exponential model, or is indecisive. The tests are conducted at the 5% nominal level. In the two sets of experiments (p = 0 and p = 1) where one model is correctly specified, we use the labels "correct" and "incorrect" when a choice is made. This allows a comparison with the asymptotic N(0, 1) approximation under our null hypothesis of equivalence.

Tables 1 and 5 report the cases in which one model is correctly specified. It is well known that the MLE is consistent for the true parameter value under correct specification. For example, in Table 1, the lognormal model is correctly specified, and the MLE of r = ([r.sub.1], [r.sub.2]) approaches the true value [r.sub.o] = (0.047, 0.5) as the sample size increases from 100 to 1000.
The bootstrap estimator of [omega] also converges as the sample size becomes larger. The model selection test statistic [DI.sub.n] increases approximately at the rate [square root of n]. In Table 5, where the exponential model is correctly specified, one can observe similar results. The second half of Table 1 summarizes the results of our model selection procedure. The method performs quite well and selects the correct model almost 100% of the time, as expected. For Tables 2, 3 and 4, the data were generated neither from the lognormal model nor from the exponential model, but from a mixture of the two. Hence the lognormal and exponential models are both incorrectly specified. In Table 3, the data generating process is chosen so that the lognormal model and the exponential model are approximately equally close to it. The test statistic [DI.sub.n] is then expected to have a limiting standard normal N(0, 1) distribution. This is roughly confirmed in Table 3. For example, for n = 1000, [DI.sub.n] has mean 0.044 and standard error 0.910. From our limited Monte Carlo study, one can observe that the model selection test statistic [DI.sub.n] works relatively well: when the two models are approximately equally close to the data generating process, it concludes that they fit the data equally well with a probability of around 95%.

6. Discussion

In summary, by analogy with the classical chi-square type statistics, we have introduced divergence measures and proposed some convenient asymptotically standard normal tests for model selection, based on divergence type statistics that use estimators from a quite general class. The tests are designed to test the null hypothesis that the estimated competing models are equally close to the true distribution against the alternative hypothesis that one estimated model is closer, where closeness is measured according to the discrepancy implicit in the divergence type statistic used.
To evaluate the divergence-based discrepancy between the observed data and a specific parametric model, the computation was carried out numerically, with bootstrap methods used to estimate the asymptotic variance of our test statistic. Several Monte Carlo experiments were conducted and showed that our procedure performs relatively well. Our work can be used to compare the power of test statistics for model selection based on other types of information measures.

References

[1] Akaike H., 1973, Information Theory and an Extension of the Likelihood Ratio Principle, Proceedings of the Second International Symposium of Information Theory, Ed. by Petrov, B.N. and Csaki, F., Budapest: Akademiai Kiado, pp. 257-281.
[2] Andrews D.W.K., 1988a, Chi-Square Diagnostic Tests for Econometric Models: Theory, Econometrica, 56, pp. 1419-1453.
[3] Andrews D.W.K., 1988b, Chi-Square Diagnostic Tests for Econometric Models: Introduction and Applications, Journal of Econometrics, 37, pp. 135-156.
[4] Atkinson A.C., 1970, A Method for Discriminating Between Models, Journal of the Royal Statistical Society, Series B, 32, pp. 323-353.
[5] Bar-Hen A. and Daudin J.J., 1995, Generalization of the Mahalanobis distance in the mixed case, Journal of Multivariate Analysis, 53, pp. 332-342.
[6] Cochran W.G., The [chi square] Test of goodness of fit, Ann. Math. Statist., 23, pp. 315-345.
[7] Cox D.R., 1962, Further Results on Tests of Separate Families of Hypotheses, Journal of the Royal Statistical Society, Series B, 24, pp. 406-421.
[8] Cressie N. and Read T.R.C., 1984, Multinomial goodness of fit tests, Journal of the Royal Statistical Society, Series B, 46, pp. 440-464.
[9] Csiszar I., 1967, Information-type measures of difference of probability distributions
[10] Efron, 1982, The Jackknife, the Bootstrap and Other Resampling Plans, CBMS-NSF Regional Conference Series in Applied Mathematics, 38.
[11] Jeffreys H., 1946, Theory of probability, Univ. Oxford, London.
[12] Burbea J., 1984, The Bose-Einstein entropy of degree [R] and Jensen difference, Utilitas Math., 26, pp. 171-192.
[13] Kagan M., 1963, On the theory of Fisher's amount of information, Sov. Math. Dokl, 4, pp. 99993.
[14] Kullback S. and Leibler, 1951, On Information and Sufficiency, Ann. Math. Statist., 22, pp. 79-86.
[15] Liese F. and Vajda I., 1987, Convex Statistical Distances, Teubner, Leipzig.
[16] Menendez M.L., Pardo Morales D., and Salicru M., 1997, Divergence measures between populations: applications in the exponential family, Communications in Statistics (Theory and Methods), 25, pp. 1099-1117.
[17] Moore D.S., 1977, Generalized Inverses, Wald's Method and the Construction of Chi-Squared Tests of fit, Journal of Statistical Association, 7, pp. 131-137.
[18] Moore D.S., 1984, Measures of lack of fit from Tests of Chi-Squared Type, Journal of Statistical Planning and Inference, 7, pp. 131-137.
[19] Morales D., Pardo L., and Vajda I., 1997, Some new statistics for testing hypotheses in parametric models, Journal of Multivariate Analysis, 10, pp. 151-166.
[20] Morales D., Pardo L., and Zografos K., 1998, Informational distances and related statistics in mixed continuous and categorical variables, Journal of Statistical Planning and Inference, 75, pp. 47-63.
[21] Morales D., Pardo L., 2001, Some approximations to power functions of [empty set]-divergence tests in parametric models, Test, 10, pp. 249-269.
[22] Nayak T.K., 1985, On diversity measures based on entropy functions, Communications in Statistics (Theory and Methods), 14, pp. 203-215.
[23] Pardo L., Salicru M., Menendez M.L., and Morales D., 1995, Divergence measures based on entropy functions and statistical inference, Sankhya, Series B, 57, pp. 315-337.
[24] Pardo L., Morales D., Salicru M., and Menendez, 1994, Asymptotic properties of divergence statistics in a stratified random sampling and its applications to test statistical hypotheses, Journal of Statistical Planning and Inference, 38, pp. 201-222.
[25] Renyi A., 1961, On measures of entropy and information, Proc. 4th Berkeley Symp. on Math. Statist., Univ. Calif. Press, Berkeley, 1, pp. 547-561.
[26] Salicru M., Menendez, Pardo L., and Morales D., 1994, On the applications of divergence type measures in testing statistical hypotheses, Journal of Multivariate Analysis, 51, pp. 372-391.
[27] Serfling R.J., 1980, Approximation Theorems of Mathematical Statistics, John Wiley,
New York.
[28] Sharma B.D., Mittal D.P., 1977, New non-additive measures of entropy for discrete probability distributions, J. Math. Sci., 10, pp. 28-40.
[29] Taneja I.J., 1987, Statistical aspects of divergence measures, Journal of Statistical Planning and Inference, 16, pp. 136-145.
[30] Taneja I.J., 1989, On generalized information measures and their applications, Adv. Electron. Phys., 76, pp. 327-413.
[31] Vajda I., 1973, [chi square]-divergence and generalized Fisher's information, Trans. 6th Prague Conf. on Inform. Theory, Statistical Decision Functions and Random Processes, Prague, pp. 873-886.
[32] Vuong Q. and Weiren W., 1993, Selecting Estimated Models Using Chi-Square Statistics, Annals D'Economie et de Statistique, 30, pp. 144-164.
[33] Zografos K., Ferentinos K., and Papaioannou T., 1990, [empty set]-Divergence statistics: sampling properties and multinomial goodness of fit and divergence tests, Communication in Statistics (Theory and Methods), 19, pp. 1785-1802.
[34] Watson G.S., 1959, Some Recent Results in Chi-Square Goodness-of-fit Tests, Biometrics, 15, pp. 440-468.

Papa Ngom
Laboratoire de Mathematiques appliquees (LMA), Universite Cheikh Anta Diop, Dakar, Senegal
Email: pngom@ucad.sn

Table 1. Data generating process = Lognorm(0.047, 0.5)

n                         100              300              600              1000
[??]                      0.927 (0.051)    0.925 (0.028)    0.924 (0.020)    0.925 (0.016)
[[??].sub.1]              0.046 (0.052)    0.046 (0.021)    0.047 (0.021)    0.047 (0.016)
[[??].sub.2]              0.497 (0.035)    0.500 (0.021)    0.500 (0.014)    0.500 (0.011)
[[??].sub.*]              1.413 (0.181)    1.383 (0.146)    1.325 (0.125)    1.303 (0.107)
[D.sub.n]([[??].sub.n])   3.699 (0.336)    3.636 (0.178)    3.617 (0.125)    3.616 (0.096)
[D.sub.n]([[??].sub.n])   3.131 (0.394)    3.103 (0.214)    3.087 (0.152)    3.092 (0.116)
[DI.sub.n]                4.081 (1.187)    6.726 (1.082)    9.846 (1.121)    12.806 (1.286)
Incorrect                 0%               0%               0%               0%
Indecisive                0%               0%               0%               0%
Correct                   100%             100%             100%             100%

Table 2. DGP = 0.25 Exp (1) + 0.75 Lognorm (0.047, 0.5)

n                         100              300              600              1000
[beta]                    0.949 (0.061)    0.944 (0.038)    0.943 (0.026)    0.944 (0.019)
[r.sub.1]                 0.181 (0.075)    0.180 (0.046)    0.179 (0.032)    0.180 (0.042)
[r.sub.2]                 0.795 (0.122)    0.804 (0.074)    0.802 (0.053)    0.806 (0.042)
[omega]                   1.349 (0.409)    1.265 (0.264)    1.233 (0.188)    1.240 (0.165)
[D.sub.n]([??])           3.735 (0.336)    3.677 (0.198)    3.665 (0.135)    3.664 (0.103)
[D.sub.n]([??])           3.572 (0.385)    3.519 (0.225)    3.505 (0.154)    3.507 (0.117)
[DI.sub.n]                1.295 (0.877)    2.609 (1.016)    3.250 (1.082)    4.091 (1.113)
Favor Exp                 0%               0%               0%               0%
Indecisive                76%              38%              12%              2%
Favor Logn                24%              62%              88%              98%

Table 3. DGP = 0.410 Exp (1) + 0.590 Lognorm (0.047, 0.5)

n                         100              300              600              1000
[beta]                    0.957 (0.069)    0.955 (0.040)    0.955 (0.028)    0.955 (0.021)
[r.sub.1]                 0.263 (0.090)    0.263 (0.051)    0.264 (0.035)    0.265 (0.027)
[r.sub.2]                 0.932 (0.131)    0.941 (0.074)    0.942 (0.055)    0.944 (0.042)
[omega]                   1.201 (0.343)    1.125 (0.198)    1.103 (0.162)    1.103 (0.132)
[D.sub.n]([??])           3.775 (0.358)    3.712 (0.196)    3.710 (0.140)    3.706 (0.106)
[D.sub.n]([??])           3.771 (0.398)    3.711 (0.218)    3.711 (0.154)    3.708 (0.117)
[DI.sub.n]                0.048 (0.896)    0.031 (0.921)    0.008 (0.908)    0.044 (0.910)
Favor Exp                 1%               1%               1%               1%
Indecisive                96%              97%              97%              97%
Favor Logn                3%               2%               2%               2%

Table 4. DGP = 0.75 Exp (1) + 0.25 Lognorm (0.047, 0.5)

n                         100              300              600              1000
[beta]                    0.986 (0.090)    0.981 (0.051)    0.981 (0.036)    0.980 (0.027)
[r.sub.1]                 0.441 (0.116)    0.443 (0.067)    0.445 (0.046)    0.443 (0.036)
[r.sub.2]                 1.153 (0.135)    1.158 (0.080)    1.159 (0.055)    1.158 (0.043)
[omega]                   0.947 (0.254)    0.853 (0.138)    0.840 (0.105)    0.835 (0.090)
[D.sub.n]([??])           3.919 (0.416)    3.868 (0.229)    3.865 (0.105)    3.855 (0.121)
[D.sub.n]([??])           4.132 (0.436)    4.082 (0.239)    4.082 (0.164)    4.070 (0.127)
[DI.sub.n]                2.319 (0.902)    4.426 (1.074)    6.388 (1.076)    8.238 (1.190)
Favor Exp                 66%              99%              100%             100%
Indecisive                34%              1%               0%               0%
Favor Logn                0%               0%               0%               0%

Table 5. Data generating process = Exponential (1)

n                         100              300              600              1000
[beta]                    1.008 (0.105)    1.001 (0.059)    1.001 (0.040)    1.000 (0.031)
[r.sub.1]                 0.570 (0.131)    0.577 (0.076)    0.576 (0.052)    0.577 (0.040)
[r.sub.2]                 1.266 (0.128)    1.284 (0.078)    1.280 (0.052)    1.282 (0.043)
[omega]                   0.840 (0.227)    0.757 (0.107)    0.738 (0.083)    0.735 (0.074)
[D.sub.n]([??])           4.068 (0.466)    4.023 (0.250)    4.012 (0.179)    4.007 (0.138)
[D.sub.n]([??])           4.378 (0.465)    4.339 (0.250)    4.328 (0.178)    4.324 (0.137)
[DI.sub.n]                3.833 (1.040)    7.300 (1.082)    10.565 (1.195)   13.745 (1.430)
Correct                   100%             100%             100%             100%
Indecisive                0%               0%               0%               0%
Incorrect                 0%               0%               0%               0%
