Endogeneity in logistic regression models

To the Editor: Ethelberg et al. (1) report on a study of the determinants of hemolytic uremic syndrome resulting from Shiga toxin-producing Escherichia coli. The dataset is relatively small, and the authors use stepwise logistic regression models to detect small differences. This indicates that the authors were aware of the limitations of the statistical power of the study. Despite this, the study has an analytic flaw that seriously reduces its statistical power.

An often overlooked problem in building statistical models is that of endogeneity, a term arising from econometric analysis, in which the value of one independent variable is dependent on the value of other predictor variables. Because of this endogeneity, significant correlation can exist between the unobserved factors contributing to both the endogenous independent variable and the dependent variable, which results in biased estimators (incorrect regression coefficients) (2).
Additionally, the correlation among the predictor variables can create significant multicollinearity, which violates the assumptions of standard regression models and results in inefficient estimators. This problem is shown by model-generated coefficient standard errors that are larger than true standard errors, which biases the interpretation towards the null hypothesis and increases the likelihood of a type II error. As a result, the power of the test of significance for an independent variable $X_1$ is reduced by a factor of $(1 - r^2_{(1|2,3,\ldots)})$, where $r_{(1|2,3,\ldots)}$ is defined as the multiple correlation coefficient for the model $X_1 = f(X_2, X_3, \ldots)$, and all $X_i$ are independent variables in the larger model (3,4).

The results of this study clearly show that the presence of bloody diarrhea is an endogenous variable in the model showing predictors of hemolytic uremic syndrome, in that the diarrhea is shown to be predicted by, and therefore strongly correlated with, several other variables used to predict hemolytic uremic syndrome.
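
As a rough illustration of the power-reduction factor described above, the following Python sketch regresses one predictor on the others by ordinary least squares and reports 1 - r^2. The eight-patient indicator data and all function names are hypothetical, invented for illustration only; they are not taken from Ethelberg et al.

```python
from statistics import mean


def solve_linear_system(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def r_squared(target, others):
    """R^2 from a least-squares fit of `target` on the other predictors."""
    rows = [[1.0] + list(vals) for vals in zip(*others)]  # intercept + predictors
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, target)) for i in range(k)]
    beta = solve_linear_system(xtx, xty)
    fitted = [sum(b * x for b, x in zip(beta, r)) for r in rows]
    y_bar = mean(target)
    ss_res = sum((y - f) ** 2 for y, f in zip(target, fitted))
    ss_tot = sum((y - y_bar) ** 2 for y in target)
    return 1.0 - ss_res / ss_tot


def power_reduction_factor(target, others):
    """The factor 1 - r^2 by which power for `target` is reduced."""
    return 1.0 - r_squared(target, others)


# Hypothetical 0/1 indicators for eight patients (illustration only).
bloody_diarrhea = [1, 1, 1, 0, 1, 0, 0, 0]
stx2_positive = [1, 1, 1, 1, 0, 0, 0, 0]
o157 = [1, 1, 0, 0, 0, 0, 0, 0]

factor = power_reduction_factor(bloody_diarrhea, [stx2_positive, o157])
print(round(factor, 3))  # 0.625 for this toy data
```

The reciprocal of this factor is the familiar variance inflation factor, so a value well below 1 signals that the standard error of the corresponding coefficient is being inflated by collinearity.
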
Shiga toxin 1 and 2 (stx1, stx2) genes are likewise expected to be key predictors of the presence of bloody diarrhea, independent of strain, due to the known biochemical effects of that toxin (5,6). Because the strain is in part determined by the presence of these toxins, including both strain and genotype in the model means that the standard errors for variables for the Shiga-containing strains and bloody diarrhea symptom are likely to be too high, and hence the significance levels (p values) obtained from the regression models are higher than the true probability of a type I error.

This flaw is a particular problem with studies that use a conditional stepwise technique for including or excluding variables. The authors note that they excluded variables from the final model if those variables did not reach significance in initial models at an α level (p value) of 0.05. Given the inefficiencies due to the endogeneity of bloody diarrhea, as well as those that may result from other collinearities, significant predictors were likely excluded from the study, although this cannot be confirmed from the data presented.

The problems associated with the endogeneity of bloody diarrhea can be overcome by a number of approaches. For example, the simultaneous equations approach, such as that outlined by Greene (7), would have used predicted values of bloody diarrhea from the first stage of the model as instrumental variables for the actual value in the model for hemolytic uremic syndrome.
Structural equations approaches, such as those suggested by Greenland (8), would also be appropriate. However, bloody diarrhea is not the only endogenous variable in their models, and extensive modeling would be necessary to isolate the independent effects of the various predictor variables. Given the small sample size, this may not be possible.

The underlying problem in the study is the theoretical specification of the model, in which genotypes, strains, and symptoms are mixed, despite reasonable expectations that differences in 1 level may predict differences in another. For example, the authors' data demonstrate that all O157 strains contain the stx2 gene and have higher rates of causing hemolytic uremic syndrome and bloody diarrhea. This calls into question the decision to build an analytic model combining 3 distinct levels of analysis. Such a model depends on the independence of the variables to gain unbiased, efficient estimators. The model of the relationships one would develop from a theoretical perspective would predict the opposite (Figure). We expect that the genotypes (by definition) will predict the strain, and that strains have a differential effect on symptoms. The high level of intervariable correlation due to these relationships, coupled with the decision to exclude variables based on likely inefficient p values, raises questions concerning the reliability of the results and conclusions. In particular, the conclusions that strains O157 and O111 are not predictors of hemolytic uremic syndrome deserve to be revisited; other excluded variables may also be significant predictors when considered under an appropriate model. These problems point to the need to ensure proper specification of analytic models and to demonstrate due regard for the underlying assumptions of statistical models used.

[FIGURE OMITTED]

George Avery *

* University of Minnesota, Duluth, Minnesota, USA

References

(1.) Ethelberg S, Olsen KEP, Scheutz F, Jensen C, Schiellerup P, Engberg J, et al. Virulence factors for hemolytic uremic syndrome, Denmark. Emerg Infect Dis. 2004;10:842-7.
(2.) Dowd B, Town R. Does X really cause Y? Washington: Academy Health; 2002.
(3.) Hsieh F, Bloch D, Larsen M. A simple method of sample size calculation for linear and logistic regression. Stat Med. 1998;17:1623-34.
(4.) Menard S. Applied logistic regression analysis, 2nd ed. Thousand Oaks (CA): Sage Publications; 2002. p. 75-8.
(5.) Blackall DP, Marques
(6.) Harrison LM, van Haaften WC, Tesh VL. Regulation of proinflammatory cytokine expression by Shiga toxin 1 and/or lipopolysaccharides in the human monocytic cell line THP-1. Infect Immun. 2004;72:2618-27.
(7.) Greene W. Gender economics courses in liberal arts colleges: further results. J Econ Ed. 1998;29:291-300.
(8.) Greenland S, Brumback B. An overview of relations among causal modelling methods. Int J Epidemiol. 2002;31:1030-7.

Address for correspondence: George Avery, 1207 Ordean Ct., BohH 320, University of Minnesota Duluth, Duluth, MN 55812, USA; fax: 218-726-7186; email: aver0042@umn.edu

In response: We appreciate Avery's interest (1) in our article (2), although we believe the critique of the methods is largely based on misunderstandings. We developed a model for the risk of progression to hemolytic uremic syndrome (HUS) containing 3 variables: whether the infecting Shiga toxin-producing Escherichia coli isolate had the stx2 gene, age of the patient, and occurrence of bloody diarrhea. The critique relates to the fact that bloody diarrhea and stx2 are not independent, since we showed that stx2 was strongly associated with progression to HUS (odds ratio [OR] = 18.9) and also weakly associated with development of bloody diarrhea (OR = 2.5) (2).
Avery uses the term endogeneity as it is used in econometric analyses; however, the term "intermediary variable," i.e., a factor in the causal pathway leading from exposure to disease, is more frequently used in epidemiology. In this context, we chose to consider bloody diarrhea as a potential confounder (3). A confounder is a risk factor that is also independently associated with the exposure variable of interest and is not regarded as part of the causal pathway (see online Figure at http://www.cdc.gov/ncidod/EID/vol11no03/05-0071-G.htm). Bloody diarrhea may act as a confounder if patients with bloody stools are treated differently by the examining physicians or if, for instance, unknown virulence factors contribute to the risk of having bloody stools.

A second line of critique of our methods apparently develops from the idea that virulence factors determine the serogroup. This idea, however, is a biological misconception. In fact, virulence genes and serogroup are independent at the genetic level, and an important point of our article is that HUS is determined by the virulence gene composition of the strain rather than the serogroup.

Regardless of the status of the bloody diarrhea variable, excluding it from the model does not change the conclusions of the article. A revised model contains only the significant variables age and stx2 (Table). Serotype O157 is still not an independent predictor of HUS, and this result is robust.

Steen Ethelberg * and Kåre Mølbak *

* Statens Serum Institut, Copenhagen, Denmark

Table. Risk factors for HUS among 343 STEC patients, Denmark, 1997-2003: comparison of models with and without bloody diarrhea as a variable *

Determinant        No. of     No. (%)     Original model,    New model,
                   patients   with HUS    OR (95% CI)        OR (95% CI)
eae
  Negative         111        0 (0.0)
  Positive         232        21 (9.1)    NI                 NI
stx2
  Negative         159        1 (0.6)     1                  1
  Positive         184        20 (10.9)   18.9 (2.4-146)     24.6 (3.2-187)
Age
  ≥8 y             178        3 (1.7)     1                  1
  ≤7 y             165        18 (10.9)   11.4 (3.2-41.3)    9.7 (2.7-34.1)
Bloody diarrhea
  No               218        6 (2.8)
  Yes              125        15 (12.0)   4.5 (1.6-12.7)     EX
O157
  No               262        10 (3.8)
  Yes              81         11 (13.6)   NS                 NS

* HUS, hemolytic uremic syndrome; STEC, Shiga toxin-producing Escherichia coli; OR, odds ratio; CI, confidence interval; NI, not included (test not appropriate); NS, not significant; EX, excluded from model.
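
For readers who want to relate the counts in the Table to odds ratios, here is a minimal Python sketch that computes crude (unadjusted) odds ratios with Woolf-type 95% confidence intervals from the "No. of patients" and "No. (%) with HUS" columns. The function and variable names are hypothetical, and the use of the Woolf interval is an assumption made for illustration; the published odds ratios come from multivariable logistic regression, so these crude values need not match them.

```python
import math


def crude_odds_ratio(exposed_cases, exposed_total,
                     unexposed_cases, unexposed_total, z=1.96):
    """Unadjusted odds ratio with a Woolf (log-based) confidence interval."""
    a = exposed_cases                       # exposed, with HUS
    b = exposed_total - exposed_cases       # exposed, without HUS
    c = unexposed_cases                     # unexposed, with HUS
    d = unexposed_total - unexposed_cases   # unexposed, without HUS
    if min(a, b, c, d) == 0:
        raise ValueError("a zero cell makes the crude odds ratio undefined")
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# (HUS cases, patients) for the exposed group, then the reference group,
# copied from the Table. eae is omitted because its reference group has
# no HUS cases, which leaves the crude odds ratio undefined.
table_counts = {
    "stx2 positive": (20, 184, 1, 159),
    "age <= 7 y": (18, 165, 3, 178),
    "bloody diarrhea": (15, 125, 6, 218),
    "O157": (11, 81, 10, 262),
}

crude = {name: crude_odds_ratio(*counts) for name, counts in table_counts.items()}
for name, (or_value, lower, upper) in crude.items():
    print(f"{name}: crude OR {or_value:.1f} (95% CI {lower:.1f}-{upper:.1f})")
```

Comparing a crude ratio with its adjusted counterpart is one informal way to see how much of an association is shared with the other variables in the model.
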

References

(1.) Avery G. Endogeneity in logistic regression models. Emerg Infect Dis. 2005;11:499-500.
(2.) Ethelberg S, Olsen KEP, Scheutz F, Jensen C, Schiellerup P, Engberg J, et al. Virulence factors for hemolytic uremic syndrome, Denmark. Emerg Infect Dis. 2004;10:842-7.
(3.) Griffin PM, Mead PS, Sivapalasingam S. Escherichia coli O157:H7 and other enterohaemorrhagic E. coli. In: Blaser MJ, Smith PD, Ravdin JI, Greenberg HB, Guerrant RL, editors. Infections of the gastrointestinal tract. Philadelphia: Lippincott Williams & Wilkins; 2002. p. 627-42.

Address for correspondence: Steen Ethelberg, Department of Bacteriology, Mycology and Parasitology, Statens Serum Institut, Artillerivej 5, DK-2300 Copenhagen S, Denmark; fax: 45-3268-8238; email: set@ssi.dk