# Inference using sample means of parametric nonlinear data transformations.

In empirical health services research (HSR), statistics of key analytic interest are often of the following general form

$$\hat{\gamma} = \frac{1}{N}\sum_{i=1}^{N} g(\hat{\beta}, X_i) \tag{1}$$

where $\gamma = E[g(\beta, X)]$ is the parameter of ultimate interest to be estimated by equation (1), $g(\cdot)$ is a known (possibly nonlinear) transformation, $\hat{\beta}$ is a pre-estimate of $\beta$ (a vector of "deeper" model parameters), and $X_i$ denotes the vector of observed data on $X$ for the $i$th member of a sample of size $N$ ($i = 1, \ldots, N$). The three most commonly encountered formulations of equation (1), namely the average treatment effect (ATE), the average marginal effect (AME), and the average incremental effect (AIE), correspond to the following formulations of $g(\cdot)$, respectively

$$g(\beta, X) = m(\beta, 1, X_o) - m(\beta, 0, X_o) \tag{2}$$

$$g(\beta, X) = \frac{\partial m(\beta, X_p, X_o)}{\partial X_p} \tag{3}$$

$$g(\beta, X) = m(\beta, X_p + \Delta, X_o) - m(\beta, X_p, X_o) \tag{4}$$

where $m(\beta, X_p, X_o) = E[Y \mid X_p, X_o]$ is a regression function written so as to highlight the distinction between a policy-relevant regressor of interest, $X_p$, and a vector of regression controls, $X_o$; $X = [X_p \;\; X_o]$; $\beta$ is a vector of regression parameters; and $\Delta$ is a known exogenous (usually policy-driven) increment to $X_p$. Once the regression parameter estimates are obtained (e.g., $\hat{\beta}$ estimated via the nonlinear least squares [NLS] method), under fairly general conditions the formulations in equations (2), (3), and (4), in conjunction with equation (1), respectively yield consistent estimators of the ATE when $X_p$ is binary; the AME when $X_p$ is continuous and interest is in the effect attributable to an infinitesimal policy change; and the AIE when $X_p$ is discrete or continuous and the relevant policy increment is $\Delta$. In this note, we focus on the specification and computation of the correct "t-statistic" for equation (1) as derived from standard asymptotic theory. This t-statistic has the following general form

$$\frac{\sqrt{N}\,(\hat{\gamma} - \gamma^{\dagger})}{\mathrm{se}(\hat{\gamma})} \tag{5}$$

where $\gamma^{\dagger}$ is the relevant "null" value of $\gamma$ (as in a test of the null hypothesis $H_0: \gamma = \gamma^{\dagger}$), and $\mathrm{se}(\hat{\gamma})$ is the asymptotic standard error of equation (1), defined as $\sqrt{\widehat{\mathrm{avar}}(\hat{\gamma})}$, with $\widehat{\mathrm{avar}}(\hat{\gamma})$ being a consistent estimator of the asymptotic variance of $\hat{\gamma}$. Under slightly stronger conditions than those required for the consistency of equation (1), it can be shown that equation (5) has an asymptotic standard normal distribution. In the remainder of this note, we take the consistency and asymptotic normality of $\hat{\gamma}$ as given, and concentrate on the correct formulation of $\mathrm{se}(\hat{\gamma})$ as derived from standard asymptotic theory. In the Appendix, we show that for most (if not all) of the useful forms of equation (1)

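$$\widehat{\mathrm{avar}}(\hat{\gamma}) = A + B, \qquad A = \left[\frac{1}{N}\sum_{i=1}^{N} \nabla_{\beta} g(\hat{\beta}, X_i)\right] \widehat{AVAR}(\hat{\beta}) \left[\frac{1}{N}\sum_{i=1}^{N} \nabla_{\beta} g(\hat{\beta}, X_i)\right]', \qquad B = \frac{1}{N}\sum_{i=1}^{N} \left(g(\hat{\beta}, X_i) - \hat{\gamma}\right)^2 \tag{6}$$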

where $\nabla_{\beta} g(\hat{\beta}, X_i)$ (a row vector) denotes the gradient of $g(\beta, X)$ with respect to $\beta$ evaluated at $X_i$ and $\hat{\beta}$, and $\widehat{AVAR}(\hat{\beta})$ is an estimator of the asymptotic covariance matrix of $\hat{\beta}$. Dowd, Greene, and Norton (2014) opine that inclusion of B in equation (6) "seems incorrect to us" and exclude it from the suggested formulation of the asymptotic standard error of $\hat{\gamma}$ given in their equation (18). That the inclusion of B is not "incorrect" is proven by the derivation in the appendix included among the supplementary materials for this paper (see Notes 1 and 2).
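As a rough illustration of how the pieces of equations (1), (5), and (6) fit together, the following Python sketch computes $\hat{\gamma}$, the two variance components, and the t-statistic for a user-supplied $g$. It is a minimal sketch under stated assumptions, not the authors' code. The function names `mean_effect_inference` and `aie_exponential`, the finite-difference gradient, and the exponential-mean example with placeholder numbers are all hypothetical. A is assumed to be the delta-method term (mean gradient times the covariance estimate times the transposed mean gradient) and B the sample variance of $g(\hat{\beta}, X_i)$. The input `avar_beta` is assumed to be on the $\sqrt{N}$ scale, that is, $N$ times the covariance matrix that regression software typically reports.

```python
import numpy as np

def mean_effect_inference(g, beta_hat, X, avar_beta, gamma_null=0.0, eps=1e-6):
    """Sample-mean estimator of gamma = E[g(beta, X)] with an A + B standard error.

    g         : callable (beta, X) -> array of the N values g(beta, X_i)
    beta_hat  : (K,) pre-estimate of beta
    X         : (N, ...) observed data, one row per sample member
    avar_beta : (K, K) estimate of the asymptotic covariance of
                sqrt(N) * (beta_hat - beta)  (assumption: sqrt(N) scale)
    """
    beta_hat = np.asarray(beta_hat, dtype=float)
    N = X.shape[0]
    g_i = g(beta_hat, X)
    gamma_hat = g_i.mean()                      # equation (1)

    # Mean over i of the gradient of g with respect to beta,
    # approximated by central differences (a hypothetical shortcut).
    K = beta_hat.size
    grad_bar = np.empty(K)
    for k in range(K):
        step = np.zeros(K)
        step[k] = eps
        up = g(beta_hat + step, X).mean()
        down = g(beta_hat - step, X).mean()
        grad_bar[k] = (up - down) / (2.0 * eps)

    A = grad_bar @ avar_beta @ grad_bar         # delta-method component
    B = np.mean((g_i - gamma_hat) ** 2)         # sample variance of g
    se = np.sqrt(A + B)
    t_stat = np.sqrt(N) * (gamma_hat - gamma_null) / se   # equation (5)
    return {"gamma_hat": gamma_hat, "A": A, "B": B, "se": se, "t": t_stat}


def aie_exponential(beta, X, delta=1.0):
    """Hypothetical AIE transformation for an assumed mean exp(b0 + b1*Xp + b2*Xo)."""
    xp, xo = X[:, 0], X[:, 1]
    base = beta[0] + beta[2] * xo
    return np.exp(base + beta[1] * (xp + delta)) - np.exp(base + beta[1] * xp)


rng = np.random.default_rng(0)
X_demo = rng.normal(size=(500, 2))
beta_demo = np.array([0.1, 0.3, -0.2])   # placeholder values, not real estimates
avar_demo = 0.5 * np.eye(3)              # placeholder covariance, sqrt(N) scale
result = mean_effect_inference(aie_exponential, beta_demo, X_demo, avar_demo)
print(result)
```

In this sketch, replacing `np.sqrt(A + B)` with `np.sqrt(A)` would give the standard error that omits B, so the two versions can be compared directly in a given application.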

So how does our derivation of equation (6) differ from the approach taken by Dowd, Greene, and Norton (2014) in deriving their equation (18)? To answer, we must take a closer look at the sampling assumptions underlying the respective derivations. Unlike Dowd, Greene, and Norton (2014), we assume that the sample observations for all the relevant variables, including $X$, are drawn randomly from the relevant joint distribution for the population of interest. We impose no sampling restriction on $X$ and allow it to be random, which is the same assumption that we make for the other elements of the data vector. For example, for the case in which $\hat{\beta}$ is obtained by regressing $Y$ on $X$ (based on a correctly specified nonlinear model), we treat both $Y$ and $X$ as random in sampling. What we have described here is simple random sampling (SRS), which is clearly the most commonly encountered type of sampling in empirical HSR. Moreover, we adopt the conventional approach to deriving the asymptotic properties of $\hat{\gamma}$ (in particular, its asymptotic standard error). Conventional asymptotic theory assumes SRS and focuses on the limiting properties of estimators as the sample size ($N$ in our case) approaches $\infty$. (The analysis assumes that the same sample is used to estimate $\beta$ and the mean effect on $m(\cdot)$, and that the objective of the analysis involves generalization to a population that is potentially large compared to the sample; equation (6) can be modified for alternative assumptions.)
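A heuristic sketch may help show why treating $X$ as random adds a second variance component. The expansion below is an informal mean-value argument under SRS, not a reproduction of the Appendix derivation; the symbol $\bar{\beta}$ (a point between $\hat{\beta}$ and $\beta$) is introduced only for this illustration.

```latex
\sqrt{N}\,(\hat{\gamma} - \gamma)
  = \underbrace{\sqrt{N}\left[\frac{1}{N}\sum_{i=1}^{N} g(\beta, X_i) - \gamma\right]}_{\text{variation from sampling } X}
  + \underbrace{\left[\frac{1}{N}\sum_{i=1}^{N} \nabla_{\beta}\, g(\bar{\beta}, X_i)\right]
      \sqrt{N}\,(\hat{\beta} - \beta)}_{\text{variation from estimating } \beta}
```

Under SRS the first term is a scaled sample mean of independent draws, so its limiting variance is $\mathrm{Var}[g(\beta, X)]$, while the second term gives the familiar delta-method component. Provided the two terms are asymptotically uncorrelated (Note 2 indicates that the most general case requires an additional term), the two variances simply add. If $X$ were instead held fixed and the target were the average over those fixed rows, the first term would not arise.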

Dowd, Greene, and Norton (2014), on the other hand, supplant SRS with an unrealistic fixed-in-repeated-sampling (FIRS) assumption in which the matrix of observations on $X$ (say $\chi$, with $N$ rows and $K$ columns, $K$ being the number of regressors in $X$) is fixed (nonrandom) so that, in sampling, only the observations on $Y$ are randomly drawn. Moreover, they assume that increases in the sample size are effected not by increasing $N$ but by holding $N$ and $\chi$ fixed and drawing repeated observations on $Y$ for each of the fixed rows of $\chi$. Denote the number of such FIRS observations on $Y$ as $N^{*}$. Their formulation of the asymptotic standard error of $\hat{\gamma}$ is obtained while fixing $\chi$ and $N$ and allowing $N^{*}$ to approach $\infty$. Both the FIRS assumption and its attendant asymptotics are unrealistic and irrelevant in the present context because there are no empirical contexts in HSR for which this assumption could reasonably be maintained. The FIRS assumption is characteristic of an experiment in which no generalization is intended beyond the specific designed distribution of $X$, rather than of an analysis of a random sample drawn from, and intended to be representative of, a larger population.
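The practical difference between the two sampling schemes can be explored with a small Monte Carlo experiment. The sketch below is illustrative only and rests on several assumptions that are not from the text: a quadratic mean $E[Y \mid X] = b_0 + b_1 X + b_2 X^2$ estimated by ordinary least squares (chosen so that only NumPy is needed), homoskedastic normal errors, hypothetical function names `ame_and_ses` and `simulate`, and a fixed-$X$ design implemented by reusing one draw of $X$ across replications rather than by letting $N^{*}$ grow. Standard errors are reported on the scale of $\hat{\gamma}$ itself, that is, $\sqrt{\widehat{\mathrm{avar}}/N}$, which yields the same t-statistic as equation (5).

```python
import numpy as np

def ame_and_ses(y, x):
    """AME of x in the assumed model E[y|x] = b0 + b1*x + b2*x**2.

    Returns (gamma_hat, se using A only, se using A + B).
    """
    N = x.size
    Z = np.column_stack([np.ones(N), x, x ** 2])
    ZtZ_inv = np.linalg.inv(Z.T @ Z)
    b = ZtZ_inv @ Z.T @ y                        # OLS estimate of beta
    resid = y - Z @ b
    s2 = resid @ resid / (N - 3)                 # homoskedastic error variance
    avar_b = N * s2 * ZtZ_inv                    # covariance on the sqrt(N) scale

    g_i = b[1] + 2.0 * b[2] * x                  # g(beta_hat, X_i) = dm/dx at X_i
    gamma_hat = g_i.mean()
    grad_bar = np.array([0.0, 1.0, 2.0 * x.mean()])  # mean gradient wrt beta

    A = grad_bar @ avar_b @ grad_bar
    B = np.mean((g_i - gamma_hat) ** 2)
    return gamma_hat, np.sqrt(A / N), np.sqrt((A + B) / N)


def simulate(fixed_x, N=200, reps=1000, seed=0):
    """Empirical SD of gamma_hat versus the average of each standard error."""
    rng = np.random.default_rng(seed)
    beta = np.array([1.0, 1.0, 0.5])             # assumed true parameters
    x_fixed = rng.normal(size=N)                 # used only when fixed_x is True
    out = np.empty((reps, 3))
    for r in range(reps):
        x = x_fixed if fixed_x else rng.normal(size=N)
        y = beta[0] + beta[1] * x + beta[2] * x ** 2 + rng.normal(size=N)
        out[r] = ame_and_ses(y, x)
    return out[:, 0].std(ddof=1), out[:, 1].mean(), out[:, 2].mean()


for label, fixed in [("random X (SRS)", False), ("fixed X", True)]:
    sd, se_a, se_ab = simulate(fixed_x=fixed)
    print(f"{label}: empirical SD={sd:.4f}  mean SE(A)={se_a:.4f}  mean SE(A+B)={se_ab:.4f}")
```

In this design one would expect the empirical standard deviation of $\hat{\gamma}$ to track the A + B standard error when $X$ is redrawn in every replication, and to track the A-only standard error when $X$ is held fixed, which is the distinction the two sampling assumptions turn on.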

Given that (a) the formulation of the asymptotic standard error of $\hat{\gamma}$ in equation (6) is realistic, relevant, and indeed correct; (b) the practical significance of including B can only be conclusively evaluated in the context of each particular empirical application after it has been estimated; and (c) the calculation of B imposes only a minimal marginal computational burden, there remains no reasonable justification for excluding it from equation (6) as Dowd, Greene, and Norton (2014) recommend in their equation (18).

DOI: 10.1111/1475-6773.12494

ACKNOWLEDGMENTS

Joint Acknowledgment/Disclosure Statement: This research was supported by a grant from the Agency for Healthcare Research and Quality (R01 HSO1743401) and by grants from the National Institutes of Health (NIH-1 R01 CA155329-01 and NIH-1RC4AG038635-01).

Disclosures: None.

Disclaimers: None.

NOTES

(1.) Basu and Rathouz (2005) and Wooldridge (2010) derive equation (6) via standard asymptotic theory. See Appendix C of Basu and Rathouz (2005), and problem 12.17 of Wooldridge (2010), the solution for which is on pp. 184-186 of Wooldridge (2011).

(2.) In the Supplementary Appendix, we also show that for the most general version of equation (1), an additional term would need to be included in equation (6). Such general versions of equation (1) coincide with models that do not afford a causal interpretation of the estimates obtained from equation (1) and are, therefore, of very limited empirical analytic interest. See the Supplementary Appendix for details.

References

Basu, A., and P. J. Rathouz. 2005. "Estimating Marginal and Incremental Effects on Health Outcomes Using Flexible Link and Variance Function Models." Biostatistics 6: 93-109.

Dowd, B. E., W. H. Greene, and E. C. Norton. 2014. "Computation of Standard Errors." Health Services Research 49: 731-50.

Wooldridge, J. M. 2010. Econometric Analysis of Cross Section and Panel Data, 2nd Edition. Cambridge, MA: MIT Press.

Wooldridge, J. M. 2011. Solutions Manual and Supplementary Materials for Econometric Analysis of Cross Section and Panel Data, 2nd Edition. Cambridge, MA: MIT Press.
