Mendelian randomization in the era of genomewide association studies.
A shortcoming regularly arising from observational epidemiologic studies is the reporting of an apparent causal association that cannot be seen in subsequent large, well-designed randomized controlled trials (RCTs). Dramatically affecting such studies is confounding, in which a particular factor is not causal but only correlated with a number of environmental factors that influence the probability of developing a particular disease. Such a noncausal factor ends up being incorrectly implicated directly in the pathogenesis of the disease through its relatively indirect relationship with the trait. Another source of an incorrect outcome is the phenomenon of "reverse causation," in which the disease subsequently influences the environmental exposure, leading to the exposure being wrongly implicated in the disease's pathogenesis.
A good example of an observational epidemiologic study throwing up a noncausal link is the connection between homocysteine concentrations and the presentation of cardiovascular disease (7, 8). Although a great source of excitement at the time, this report did not ultimately hold up in subsequent RCTs because of the adverse effects of both confounding factors and reverse causation. Homocysteine concentrations are confounded by an array of environmental factors, including smoking and socioeconomic status, and are also subject to a reverse-causality effect in which the presence of cardiovascular disease actually leads to increased homocysteine concentrations (9).
In general, observed relationships between environmental exposures and diseases that are not confirmed in RCTs can be explained by socioeconomic status, behavioral factors, and the degree of access to healthcare. Attempts have been made, however, to achieve more accurate conclusions from such observational studies, including improved control of anticipated phenotype-specific confounders (which generally are limited to a well-established set), more measurements on a smaller number of study participants (6), and more-extensive replication attempts to make the data interpretations more robust.
One major problem remains, however: RCTs do not always test the same exposures as observational studies. For example, trials may compare differences in exposure in the short term, whereas lifelong exposure differences may be far more relevant to the trait in question. In addition, consistent confounding and reverse-causation effects across multiple studies can produce strong and replicable but bogus associations. Another cause for concern is that exposures are rarely ascertained repeatedly, so exposures that show considerable within-person variation cannot be properly characterized. This practice increases the apparent random measurement error, and the strength of associations between truly causal risk factors and disease in observational studies is thereby underestimated. This "attenuation by errors" of associations has more recently been renamed "regression dilution bias" (10).
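The attenuation caused by random measurement error can be illustrated with a small simulation. The following is a minimal sketch in Python and is not drawn from the article: the sample size, the true slope of 0.5, and the error standard deviation are arbitrary assumptions, and the helper names `ols_slope` and `simulate_dilution` are hypothetical. Under these assumptions, the slope estimated from an error-prone exposure measurement is expected to shrink by roughly the factor var(true exposure) / [var(true exposure) + var(measurement error)].

```python
import random


def ols_slope(xs, ys):
    """Least-squares slope of ys regressed on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


def simulate_dilution(n=20000, true_slope=0.5, error_sd=1.0, seed=1):
    """Return (slope using the true exposure, slope using a noisy measurement).

    All parameter values are illustrative assumptions, not estimates
    from any real study.
    """
    rng = random.Random(seed)
    # True long-term exposure, standardized to variance 1.
    true_x = [rng.gauss(0.0, 1.0) for _ in range(n)]
    # The outcome depends causally on the true exposure.
    outcome = [true_slope * x + rng.gauss(0.0, 1.0) for x in true_x]
    # A single measurement of the exposure carries within-person random error.
    measured_x = [x + rng.gauss(0.0, error_sd) for x in true_x]
    return ols_slope(true_x, outcome), ols_slope(measured_x, outcome)


if __name__ == "__main__":
    slope_true, slope_measured = simulate_dilution()
    print("slope with true exposure:    ", round(slope_true, 3))
    print("slope with measured exposure:", round(slope_measured, 3))
```

With equal variances assumed for the true exposure and the measurement error, the slope based on the single noisy measurement should come out at about half the slope based on the true exposure.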
Given the practices we have outlined above, finding interventions that reduce disease risk in various populations can be delayed substantially when observational epidemiology gets it wrong. Indeed, the scientific credibility of this once-revered discipline has been heavily undermined as a consequence.
Genetic epidemiology is seen as the great hope for a healthy future for the field of epidemiology as the limitations of observational epidemiology become increasingly apparent. To ultimately unravel the molecular mechanisms underlying a given phenotype, genetic epidemiology focuses on the inheritance of disease and susceptibility at the individual level.
A growing consensus that the interplay between genetic and environmental factors (i.e., gene-environment interaction) is extremely important in understanding the onset and progression of disease has justified the combining of the genetic and environmental epidemiologic approaches to elucidate causes of human disorders. This discipline also has the potential to uncover environmental factors underlying common diseases that can be modified to improve public health in a much greater way than classic genetics ever did.
Genetic epidemiology, however, also has faced a barrage of criticism over the last 30 years, because of both a lack of consensus and the lack of replication of the vast majority of reported associations between genes and common diseases (7). One could go further and say that if a member of the public read in a newspaper before 2005 that a gene had been associated with a predisposition to a particular common disease, there was a 99% chance that the association was spurious. This state of affairs is attributable to the fact that the only data available at the time were candidate-gene studies and familial linkage-analysis studies. Such methodologies achieved only very limited success in identifying genetic determinants of human disease for a number of reasons, including the generic problem that the linkage-analysis approach is generally poor at identifying common genetic variants that have modest effects and the fact that the candidate gene-association approach relies on a suspected disease-causing gene(s), the identity of which derives from a particular biological hypothesis based on the pathogenesis of the given disease. Thus, given that the pathophysiological mechanisms underlying a disease are unknown, continued use of the hypothesis-driven candidate gene-association approach is likely to identify only a relatively small fraction of the genetic risk factors for the disease.
Genomewide genotyping of >500 000 single-nucleotide polymorphisms (SNPs) can now be readily achieved in an efficient and highly accurate manner. Given that much of human diversity is due to variations in single base pairs distributed throughout the genome, current advances in single base-extension biochemistry and SNP detection via hybridization to synthetic oligonucleotides now permit accurate genotyping on a large scale. This crucial development has made possible the application of large-scale genomewide association studies (GWAS) in the past 4 years, with the development of array-based technology having enabled non-hypothesis-driven investigations of the genome. With this approach, investigators have identified >400 genetic variants that are associated with complex diseases in ophthalmology, endocrinology, cardiology, oncology, immunology, and neurology. A complete catalog of these studies is available at the NIH Web site (http://www.genome.gov/gwastudies).
For the first time in the era of the genetics of common diseases, there is a clear consensus on key signals associated with key phenotypes. Time will tell whether all of these strong associations are fundamental to the biology of the underlying traits, but it is strongly believed that these studies will help define the key molecular pathways influencing common diseases. Functional investigations of the in vitro and in vivo biology of the genes identified in such studies will undoubtedly be fascinating areas of exploration for many scientists. Ultimately, the goal is to translate these findings into more efficacious therapeutic interventions.
In addition to these new hopes for better drugs for society's common health problems, genetic findings with firm scientific support are strongly believed to be useful in genetic epidemiology for assessing more thoroughly and accurately the environmental determinants of disease (such as diet, environmental factors, and behavior) across entire populations rather than only within genetically susceptible subpopulations (11-13). The epidemiologic approach with the most promise in this regard is mendelian randomization (14-16), which is an instrumental variable approach to control for unobserved confounding factors in an observational setting. This approach optimally needs large sample sizes, a requirement that has often been a constraint, but it has become feasible in the GWAS era, with multiple investigative teams collaborating to combine their data sets for metaanalyses, and is now delivering substantial replicable findings (17).
Traditional genetic epidemiology addresses the genetic basis of a given phenotype through the study of the association between genetic and phenotypic variation in a predefined cohort. The markers most often used for such studies are SNPs. Informative markers are defined as those that are sufficiently variable in a population that has a phenotype sufficiently frequent to allow meaningful comparisons to be drawn. An additional approach can make use of the random assignment of genes to reduce the effect of confounders in exploring associations between environmental exposures and disease. This is the essence of mendelian randomization.
The term "mendelian randomization," however, was initially coined in a slightly different context, in which the random assorting of genetic variation at conception was used to generate a study design lacking confounders for the purpose of obtaining estimates of treatment influences in malignancies of childhood (18). Because RCTs were not feasible, the investigators intended to obtain unbiased estimates of the magnitudes of bone marrow transplantation effects in acute myeloid leukemia. Their concept was that comparisons of survival among leukemic children according to whether they had genetically compatible siblings (i.e., a potential donor), regardless of whether transplantation actually occurred, were equivalent to an intention-to-treat analysis in RCTs.
The relatively modern meaning of mendelian randomization is based on Mendel's second law, the law of independent assortment, which states that the inheritance of 1 trait is independent of the inheritance of other traits. The concept of genetic variation indicating the action of exposures that are environmentally modifiable has now been used in multiple settings. An example of this concept is the autosomal dominant trait of lactase persistence, which was associated with milk consumption. The subsequent demonstration of an association between lactase persistence and bone-related traits provided evidence that drinking milk protected against osteoporotic fractures (19-22). This example is a clear application of mendelian randomization, which can avoid the effects of confounding, reverse causation, reporting bias, and underestimation of risk associations caused by phenotypic and behavioral variation.
Mendelian randomization leverages genetic variation so that it acts as a proxy for an environmental factor. This approach circumvents the classic issues that have blighted epidemiology for decades, because the inheritance of genetic variation is independent of the inheritance of other traits; i.e., it is randomized. The concept is based on the notion that genetic variation reflects the biological effect of a modifiable environmental factor, which in turn affects disease risk. That is, the association with disease risk should be equivalent to the influence of the environmental factor under consideration. SNPs have been leveraged in such a manner, albeit they are not typically the functional culprit, but rather a good surrogate, or "tag," for the key variant (23-25). Therefore, mendelian randomization has the potential to provide a window on the influences of modifiable environmental factors on disease that conventional observational epidemiology has classically failed to do.
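The proxy logic can be made concrete with a simulation. The sketch below, in Python, uses the Wald ratio, a standard instrumental-variable estimator that this article does not itself name: the gene-outcome slope divided by the gene-exposure slope. Every number in it (allele frequency, effect sizes, sample size) is an illustrative assumption, and the function names are hypothetical. It also assumes that the variant influences the outcome only through the exposure and is independent of the confounder.

```python
import random


def ols_slope(xs, ys):
    """Least-squares slope of ys regressed on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


def simulate_wald_ratio(n=100000, allele_freq=0.3, gene_effect=0.5,
                        causal_effect=0.3, seed=7):
    """Return (naive observational slope, Wald ratio estimate).

    Assumed data-generating model (illustration only):
      exposure = gene_effect * genotype + confounder + noise
      outcome  = causal_effect * exposure + confounder + noise
    """
    rng = random.Random(seed)
    genotype, exposure, outcome = [], [], []
    for _ in range(n):
        # Two independent allele draws give a genotype of 0, 1, or 2.
        g = (rng.random() < allele_freq) + (rng.random() < allele_freq)
        u = rng.gauss(0.0, 1.0)  # unobserved confounder, independent of g
        x = gene_effect * g + u + rng.gauss(0.0, 1.0)
        y = causal_effect * x + u + rng.gauss(0.0, 1.0)
        genotype.append(g)
        exposure.append(x)
        outcome.append(y)
    # Conventional observational estimate, distorted by the confounder.
    naive = ols_slope(exposure, outcome)
    # Wald ratio: gene-outcome association over gene-exposure association.
    wald = ols_slope(genotype, outcome) / ols_slope(genotype, exposure)
    return naive, wald


if __name__ == "__main__":
    naive, wald = simulate_wald_ratio()
    print("naive exposure-outcome slope:", round(naive, 3))
    print("Wald ratio estimate:         ", round(wald, 3))
```

Under these assumptions the naive slope overstates the assumed causal effect of 0.3 because the confounder raises both exposure and outcome, whereas the Wald ratio should land close to 0.3. If the variant affected the outcome through another route, the ratio would no longer be trustworthy.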
Parent-offspring designs clearly demonstrate the fundamentals of mendelian randomization because they provide an opportunity to observe directly how phenotypes and genetic variation (alleles) are transmitted to children (26, 27). Heterozygous parents are informative in that they allow investigators to count the number of times 1 of 2 alleles is transmitted to an offspring with a given phenotype. If no association exists between the genetic variant under study and the trait, then each allele has a 50% chance of being transmitted to the offspring; however, a deviation from this percentage would represent an association of the genetic variant with the disease.
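One common way to formalize the comparison against the expected 50% transmission is a McNemar-type chi-square statistic on transmission counts from heterozygous parents, as used in transmission disequilibrium testing. The Python sketch below is an illustration under that assumption. The function name `transmission_test` and the counts in the usage example are hypothetical, not data from any study.

```python
import math


def transmission_test(transmitted, untransmitted):
    """McNemar-type test of 50:50 allele transmission.

    transmitted:   times heterozygous parents passed the allele of interest
                   to an affected offspring
    untransmitted: times they passed the other allele instead
    Returns (chi-square statistic with 1 degree of freedom, p-value).
    """
    total = transmitted + untransmitted
    if total == 0:
        raise ValueError("at least one informative transmission is required")
    chi_square = (transmitted - untransmitted) ** 2 / total
    # Upper tail of a chi-square distribution with 1 degree of freedom,
    # which equals the two-sided tail of a standard normal at sqrt(chi_square).
    p_value = math.erfc(math.sqrt(chi_square / 2.0))
    return chi_square, p_value


if __name__ == "__main__":
    # Hypothetical counts: 60 transmissions versus 40 non-transmissions.
    chi_square, p_value = transmission_test(60, 40)
    print("chi-square:", chi_square, "p-value:", round(p_value, 4))
```

Equal counts give a statistic of zero, and the statistic grows as transmission departs from 50%. Only heterozygous parents contribute, because a homozygous parent transmits the same allele every time.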
We are clearly well beyond the notion that a common disease is determined purely by either environmental or genetic factors, so it is no surprise that mendelian randomization has achieved some traction among epidemiologists. This approach allows one to study the influence of a genetic variant in the context of varying environmental exposure. Indeed, it may ultimately allow one to genetically identify individuals who should avoid prespecified environmental exposures. This approach should not be confused with the discipline of gene-environment interaction studies, however, which is currently popular in the post-GWAS era; rather, its focus is to compare genetic effects and environmental influences (28).
If mendelian randomization can deliver on its promise to overcome the setbacks that have blighted conventional genetic epidemiology, it could play a vital role in answering key questions before decisions are made to commit large amounts of resources to an RCT or a drug-development program. Indeed, if independent SNPs at different genomic locations affect a given phenotype, it should be feasible to investigate the relationship of these SNP combinations to the trait, such that influencing the most extreme scenarios could dramatically affect disease outcomes.
Mendelian randomization is therefore a means to address the phenocopy (a phenotype similar to a genetic syndrome but due to an environmental exposure) and the genocopy (a genetic effect that could as readily be caused by an environmental exposure) in investigating an observed association. This approach, however, should not be interpreted as a hunt for the actual causative mechanism underpinning a given disease phenotype; rather, it simply leverages genetic variants as proxies to address environmental influences on common diseases.
With respect to genetic findings in the GWAS era, one of the first mendelian-randomization studies to leverage a discovered variant has been in the obesity arena. The genetic variant most robustly associated with obesity to date is at the fat mass and obesity associated (FTO) locus (29), and work with the mendelian-randomization approach has now implicated a role for FTO in cancer (30), although further work is still required to fully establish this association. In addition, a locus strongly associated with type 2 diabetes, CDK5 regulatory subunit associated protein 1-like 1 (CDKAL1), has clearly been shown to operate through birth weight (31, 32).
Limitations of Mendelian Randomization
Although mendelian randomization offers much promise in furthering our understanding of the epidemiology of common diseases, it also has limitations. In many cases, the highly complex data generated in such studies are difficult to analyze, impeding their interpretation and, of most concern, increasing their vulnerability to speculation. It is crucial that these potential impediments be avoided during the design stage, because it is the promise of transparency that makes mendelian randomization attractive to the investigator.
Considering sample size in a mendelian-randomization study is crucial from the outset. Many studies have been rendered uninterpretable simply because of an insufficient sample size and therefore a lack of statistical power for drawing clear conclusions. It is becoming clear that very large sample sizes will be required to leverage usable information from investigations of variants identified in GWAS. Although such variants may be robustly associated with a given trait and be common in the population, the risks they confer on an individual are often very modest (relative risks commonly conferred are in the range of only 1.1-1.3). As described above, the use of mendelian randomization in these circumstances is feasible only in a metaanalysis setting, in which investigative groups collaborate to combine their data sets to achieve sample sizes sufficient for reliably interpreting results.
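A rough calculation shows why effects in the 1.1-1.3 range demand large samples. The Python sketch below rests on several simplifying assumptions that are not from the article: the relative risk is treated as a per-allele odds ratio, cases and controls are equal in number, allele frequencies are compared with a two-sided two-proportion z-test, and the control allele frequency of 0.30, the significance level, and the power are arbitrary illustrative choices. The function names are hypothetical. It addresses only detection of the variant-disease association. A full mendelian-randomization estimate also depends on the variant-exposure association and generally needs more data still.

```python
import math
from statistics import NormalDist


def case_allele_frequency(control_freq, odds_ratio):
    """Risk-allele frequency in cases implied by a per-allele odds ratio."""
    return odds_ratio * control_freq / (1.0 + control_freq * (odds_ratio - 1.0))


def individuals_per_group(control_freq, odds_ratio, alpha=0.05, power=0.80):
    """Approximate number of cases (with equally many controls) needed to
    detect the allele-frequency difference with a two-sided z-test."""
    case_freq = case_allele_frequency(control_freq, odds_ratio)
    z_alpha = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    z_power = NormalDist().inv_cdf(power)
    variance_sum = (control_freq * (1.0 - control_freq)
                    + case_freq * (1.0 - case_freq))
    alleles = ((z_alpha + z_power) ** 2 * variance_sum
               / (case_freq - control_freq) ** 2)
    # Each individual contributes two alleles.
    return math.ceil(alleles / 2.0)


if __name__ == "__main__":
    for odds_ratio in (1.1, 1.3):
        n = individuals_per_group(control_freq=0.30, odds_ratio=odds_ratio)
        print("odds ratio", odds_ratio, "-> about", n, "cases and", n, "controls")
```

Under these assumptions the smaller effect requires several times as many participants as the larger one, and a stricter significance level (a smaller `alpha`) raises both figures further.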
Another blight in this area of investigation is the nonreplication of findings. Although this possibility is less likely with variants uncovered in GWAS, incorrect conclusions can still be drawn. In addition, the vast majority of variants uncovered in GWAS are simply surrogate tag-SNPs rather than the actual functional variants. This fact could lead to looser correlations and possible misinterpretation.
Despite the best efforts by practitioners of mendelian randomization, confounding can still occur. The variants under investigation might not be independent and could be in linkage disequilibrium, possibly leading to spurious outcomes. In addition, behavioral confounders can still exert an effect if the genetic variant affects behavior directly or indirectly, such as with smoking (33, 34).
There are more-complex complications based on deviations from mendelian principles, in which the allelic-transmission ratio from parent to offspring becomes distorted (35). Such complications can be due to events during meiosis or to selective survival after conception. They can get even more complex if strong genetic variation influences fetal development and the consequent gestational responses. The latter effect is known as "canalization" or "developmental compensation," which involves the buffering of potential effects of either environmental or genetic-factor perturbations occurring during development. The most clear-cut example of developmental compensation is from the study of knock-out animals, in which a functional gene is effectively removed from the study organism. The anticipated consequence of such a removal would be strong phenotypic effects, which, in fact, are not always apparent because of a compensatory mechanism built into the biological system (36, 37). For example, myoglobin is known to be crucial for heart function in mice, but the knock-out model animal experiences no disruption in cardiac function (38).
Another problem related to the transmission of alleles from parent to child is epigenetic effects. For example, one of the parents may be vital to the functional effects of an allele because of imprinting; however, as more knowledge in this discipline is gathered, it can be more accurately factored into any mendelian-randomization study design.
All of the limitations outlined above require further attention to optimize the sensitivity of mendelian-randomization approaches. This necessity could be a source of frustration, in that epidemiologists still have these important caveats to account for and it is currently unclear how to exactly implement them until these concerns are fully resolved.
In summary, mendelian randomization, as has been seen in many different genetic epidemiology studies, is susceptible to sample size limitations, lack of replication, and the use of genetic surrogates. In addition, confounders can include genetic variants in linkage disequilibrium, population stratification, and canalization during development; however, a correctly designed mendelian-randomization study that is carefully analyzed and interpreted can produce powerful evidence to support or refute causal hypotheses with respect to environmental influences on the pathogenesis of common diseases.
Outcomes from GWAS are giving new momentum to the portfolio of genetic variants that can be leveraged in mendelian-randomization studies. Until recently, such studies were extremely hampered by the lack of strongly established variants in common diseases and relevant effects with respect to environmental exposures. As the number of loci grows, investigators will become increasingly empowered to identify modifiable causes of common diseases in large population-based studies that are sufficiently powered and extremely well phenotyped.
What is becoming very evident in the GWAS era is a major need for standardization in the reporting of mendelian-randomization findings for distinguishing true signals from background noise. The Human Genome Epidemiology Network (HuGENet; http://www.cdc.gov/genomics/hugenet/default.htm) has been set up to assist in this aim and to aid in the ultimate goal of disease intervention (39, 40). In this era of large-scale analyses, the GWAS approach is revealing new loci through multistage replication efforts involving multiple research centers. Going back to the component cohorts and digging deeper with respect to mendelian randomization have great utility after establishing that a particular variant plays a role in a given trait.
The mission of mendelian randomization, however, is to help inform public health programs going forward through causality testing rather than implicitly outlining specific genetic-screening methodologies. This feature therefore sets mendelian randomization apart from the more conventional goals of genetic epidemiology as a whole. Therefore, combining the use of mendelian randomization with the new knowledge being gleaned from the GWAS revolution may well enlighten us to environmental factors that are modifiable in the context of human health.
Author Contributions: All authors confirmed they have contributed to the intellectual content of this paper and have met the following 3 requirements: (a) significant contributions to the conception and design, acquisition of data, or analysis and interpretation of data; (b) drafting or revising the article for intellectual content; and (c) final approval of the published article.
Authors' Disclosures of Potential Conflicts of Interest: No authors declared any potential conflicts of interest.
Role of Sponsor: The funding organizations played no role in the design of study, choice of enrolled patients, review and interpretation of data, or preparation or approval of manuscript.
(1.) Feinstein AR. Scientific standards in epidemiologic studies of the menace of daily life. Science 1988;242:1257-63.
(2.) Taubes G. Epidemiology faces its limits. Science 1995;269:164-9.
(3.) Davey Smith G. Reflections on the limitations to epidemiology. J Clin Epidemiol 2001;54:325-31.
(4.) Davey Smith G, Phillips AN. Confounding in epidemiological studies: why "independent" effects may not be all they seem. BMJ 1992;305:757-9.
(5.) Phillips AN, Davey Smith G. How independent are "independent" effects? Relative risk estimation when correlated exposures are measured imprecisely. J Clin Epidemiol 1991;44:1223-31.
(6.) Phillips AN, Davey Smith G. The design of prospective epidemiological studies: more subjects or better measurements? J Clin Epidemiol 1993;46:1203-11.
(7.) Davey Smith G, Ebrahim S. What can mendelian randomisation tell us about modifiable behavioural and environmental exposures? BMJ 2005;330:1076-9.
(8.) Bazzano LA, Reynolds K, Holder KN, He J. Effect of folic acid supplementation on risk of cardiovascular diseases: a meta-analysis of randomized controlled trials. JAMA 2006;296:2720-6.
(9.) Zoccali C, Testa A, Spoto B, Tripepi G, Mallamaci F. Mendelian randomization: a new approach to studying epidemiology in ESRD. Am J Kidney Dis 2006;47:332-41.
(10.) Davey Smith G, Phillips AN. Inflation in epidemiology: "the proof and measurement of association between two things" revisited. BMJ 1996;312:1659-61.
(11.) Katan MB. Apolipoprotein E isoforms, serum cholesterol, and cancer. Lancet 1986;1:507-8.
(12.) Ames BN. Cancer prevention and diet: help from single nucleotide polymorphisms. Proc Natl Acad Sci U S A 1999;96:12216-8.
(13.) Rothman N, Wacholder S, Caporaso NE, Garcia-Closas M, Buetow K, Fraumeni JF Jr. The use of common genetic polymorphisms to enhance the epidemiologic study of environmental carcinogens. Biochim Biophys Acta 2001;1471:C1-10.
(14.) Clayton D, McKeigue PM. Epidemiological methods for studying genes and environmental factors in complex diseases. Lancet 2001;358:1356-60.
(15.) Keavney B. Genetic epidemiological studies of coronary heart disease. Int J Epidemiol 2002;31:730-6.
(16.) Davey Smith G, Ebrahim S. Data dredging, bias, or confounding. BMJ 2002;325:1437-8.
(17.) Lawlor DA, Harbord RM, Sterne JA, Timpson N, Davey Smith G. Mendelian randomization: using genes as instruments for making causal inferences in epidemiology. Stat Med 2008;27:1133-63.
(18.) Wheatley K, Gray R. Commentary: mendelian randomization--an update on its use to evaluate allogeneic stem cell transplantation in leukaemia. Int J Epidemiol 2004;33:15-7.
(19.) Birge SJ Jr, Keutmann HT, Cuatrecasas P, Whedon GD. Osteoporosis, intestinal lactase deficiency and low dietary calcium intake. N Engl J Med 1967;276:445-8.
(20.) Newcomer AD, Hodgson SF, McGill DB, Thomas PJ. Lactase deficiency: prevalence in osteoporosis. Ann Intern Med 1978;89:218-20.
(21.) Honkanen R, Kroger H, Alhava E, Turpeinen P, Tuppurainen M, Saarikoski S. Lactose intolerance associated with fractures of weight-bearing bones in Finnish women aged 38-57 years. Bone 1997;21:473-7.
(22.) Corazza GR, Benati G, Di Sario A, Tarozzi C, Strocchi A, Passeri M, Gasbarrini G. Lactose intolerance and bone mass in postmenopausal Italian women. Br J Nutr 1995;73:479-87.
(23.) Davey Smith G, Ebrahim S. 'Mendelian randomization': Can genetic epidemiology contribute to understanding environmental determinants of disease? Int J Epidemiol 2003;32:1-22.
(24.) Davey Smith G, Ebrahim S. Mendelian randomization: prospects, potentials, and limitations. Int J Epidemiol 2004;33:30-42.
(25.) Davey Smith G, Ebrahim S. Folate supplementation and cardiovascular disease. Lancet 2005;366:1679-81.
(26.) Spielman RS, McGinnis RE, Ewens WJ. Transmission test for linkage disequilibrium: the insulin gene region and insulin-dependent diabetes mellitus (IDDM). Am J Hum Genet 1993;52:506-16.
(27.) Ahsan H, Hodge SE, Heiman GA, Begg MD, Susser ES. Relative risk for genetic associations: the case-parent triad as a variant of case-cohort design. Int J Epidemiol 2002;31:669-78.
(28.) Brennan P. Commentary: mendelian randomization and gene-environment interaction. Int J Epidemiol 2004;33:17-21.
(29.) Frayling TM, Timpson NJ, Weedon MN, Zeggini E, Freathy RM, Lindgren CM, et al. A common variant in the FTO gene is associated with body mass index and predisposes to childhood and adult obesity. Science 2007;316:889-94.
(30.) Brennan P, McKay J, Moore L, Zaridze D, Mukeria A, Szeszenia-Dabrowska N, et al. Obesity and cancer: mendelian randomization approach utilizing the FTO genotype. Int J Epidemiol 2009;38:971-5.
(31.) Freathy RM, Bennett AJ, Ring SM, Shields B, Groves CJ, Timpson NJ, et al. Type 2 diabetes risk alleles are associated with reduced size at birth. Diabetes 2009;58:1428-33.
(32.) Zhao J, Li M, Bradfield JP, Wang K, Zhang H, Sleiman P, et al. Examination of type 2 diabetes loci implicates CDKAL1 as a birth weight gene. Diabetes 2009;58:2414-8.
(33.) Thorgeirsson TE, Geller F, Sulem P, Rafnar T, Wiste A, Magnusson KP, et al. A variant associated with nicotine dependence, lung cancer and peripheral arterial disease. Nature 2008;452:638-42.
(34.) Hung RJ, McKay JD, Gaborieau V, Boffetta P, Hashibe M, Zaridze D, et al. A susceptibility locus for lung cancer maps to nicotinic acetylcholine receptor subunit genes on 15q25. Nature 2008;452:633-7.
(35.) Bochud M, Chiolero A, Elston RC, Paccaud F. A cautionary note on the use of Mendelian randomization to infer causation in observational epidemiology. Int J Epidemiol 2008;37:414-6; author reply 416-7.
(36.) Bolon B, Galbreath E. Use of genetically engineered mice in drug discovery and development: wielding Occam's razor to prune the product portfolio. Int J Toxicol 2002;21:55-64.
(37.) Williams RS, Wagner PD. Transgenic animals in integrative biology: approaches and interpretations of outcome. J Appl Physiol 2000;88:1119-26.
(38.) Garry DJ, Ordway GA, Lorenz JN, Radford NB, Chin ER, Grange RW, et al. Mice without myoglobin. Nature 1998;395:905-8.
(39.) Khoury MJ, Dorman JS. The Human Genome Epidemiology Network. Am J Epidemiol 1998;148:1-3.
(40.) Khoury MJ. Human genome epidemiology: translating advances in human genetics into population-based data for medicine and public health. Genet Med 1999;1:71-3.
Patrick M.A. Sleiman * and Struan F.A. Grant [1,2,3]
Center for Applied Genomics and Division of Human Genetics, The Children's Hospital of Philadelphia Research Institute, Philadelphia, PA; Department of Pediatrics, University of Pennsylvania School of Medicine, Philadelphia, PA.
Nonstandard abbreviations: RCT, randomized controlled trial; SNP, single-nucleotide polymorphism; GWAS, genomewide association studies.
Human genes: FTO, fat mass and obesity associated; CDKAL1, CDK5 regulatory subunit associated protein 1-like 1.
* Address correspondence to: P.M.A.S. at 1016J, Children's Hospital of Philadelphia Research Institute, 3615 Civic Center Blvd., Philadelphia, PA 19104. E-mail firstname.lastname@example.org. S.F.A.G. at 1216F, Children's Hospital of Philadelphia Research Institute, 3615 Civic Center Blvd., Philadelphia, PA 19104. E-mail email@example.com.
Received December 3, 2009; accepted February 23, 2010.
Previously published online at DOI: 10.1373/clinchem.2009.141564
Author: Sleiman, Patrick M.A.; Grant, Struan F.A.
Date: May 1, 2010