Advanced Whole-Genome Sequencing and Analysis of Fetal Genomes from Amniotic Fluid.

Amniocentesis is a common procedure performed on >200000 women a year in the US alone. It is currently performed on women who are considered to be at a higher risk for pregnancy complications because of their advanced age or to further investigate an abnormal blood or ultrasound test result. The procedure involves the insertion of a needle through the wall of the uterus and the amniotic sac to collect approximately 20 mL of amniotic fluid. Cells from the fluid are collected through centrifugation, cultured, and after approximately 2 weeks analyzed by fluorescent in situ hybridization or a microarray to detect abnormal chromosomal copy number changes or large chromosomal structural rearrangements. In some cases, a small number of specific genes are examined for single- to multibase changes. These tests have become the gold standard for detecting Down syndrome and several other serious birth defects because they have a low false-positive rate; however, they are unable to detect the majority of birth defects.

Currently, there are >5000 genes associated with a genetic disease for which a gene test is offered (GeneTests). In addition, recent studies have defined large sets of genes in which coding variants within these genes are associated with autism (1-3), severe intellectual disability (4), and other congenital disabilities (5-13). A large population study (14) discovered thousands of genes that are intolerant to coding variants, adding to the set of genes that could be disease causing. Finally, the BabySeq project has recently published a curated list of 954 genes for which they recommend that variants found in these genes be reported (15). Taken together, these data provide strong evidence that there could be thousands of genes for which coding changes or complete loss of function of the gene product are incompatible with life or could result in a serious disease phenotype. In other words, testing for large chromosomal changes or examining just a few genes is insufficient for detecting most disease-causing genetic defects.

To date, few studies have performed analyses of the amniotic fluid beyond large structural variations or a few targeted regions of the genome. The first study to apply next-generation sequencing to amniocentesis samples (16) used low-coverage sequence data to detect a balanced translocation disrupting the gene chromodomain helicase DNA binding protein 7 (CHD7) [5] and most likely resulting in a severe congenital syndrome. More recently, it has been shown that whole-exome sequencing is possible on DNA isolated from the cell pellet (17), and that cell-free DNA (cfDNA) [6] from the amniotic fluid can be used to detect copy number variations by use of low-coverage next-generation sequencing data (18). Here, we demonstrate the first high-coverage whole-genome sequence (WGS) from both the cellular and cell-free portions of the amniotic fluid.

Materials and Methods


The institutional review boards of BGI-Shenzhen and the Peking University Shenzhen Hospital provided approval for this study, and all patients signed an informed consent (see the Methods file in the Data Supplement that accompanies the online version of this article). WGS results were not shared with the families. For each amniocentesis procedure, 4 mL of amniotic fluid were sampled and frozen. A full description of DNA isolation, processing, and analysis methods can be found in the Materials file in the online Data Supplement.

Results



For each amniotic sample, we isolated DNA from both the amniotic fluid and the cell pellet. We also collected DNA from the blood of each parent. WGS libraries were made and sequenced on the Complete Genomics nanoarray platform (19, 20) for each DNA sample, providing a rich set of genomic data for reproducibility analyses and clinical variant annotation. In total, 28 cfDNA and all 31 cell pellet DNA samples yielded high-quality data. Across samples, both alleles were confidently called for approximately 97% of the genome and approximately 98% of the exome from both DNA sources (Fig. 1, A and B, and see Table 1 in the online Data Supplement). For 12 of the cell pellet DNA samples, a long fragment read (LFR) (21) library could successfully be made. In these libraries, a mean of approximately 96% of the genome and exome was called confidently, and >99% of all single nucleotide polymorphisms were assembled into long contigs with an N50 > 400 kb. These results are similar to those of previous studies using LFR (21-23) and suggest that, for at least some samples, cell pellet DNA from uncultured cells is of sufficient quality to enable advanced genomic analyses like LFR.
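The N50 statistic quoted for the LFR contigs can be illustrated with a short sketch. This is not the pipeline used in the study; the function name and the contig lengths are hypothetical and serve only to show how an N50 is computed from a list of contig lengths.

```python
def n50(contig_lengths):
    """Return the N50: the length L such that contigs of length >= L
    together cover at least half of the total assembled length."""
    if not contig_lengths:
        raise ValueError("need at least one contig length")
    lengths = sorted(contig_lengths, reverse=True)
    half_total = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half_total:
            return length


# Hypothetical phased-contig lengths in base pairs (not study data).
example_contigs = [900_000, 600_000, 400_000, 300_000, 200_000]
example_n50 = n50(example_contigs)
print(f"N50 = {example_n50} bp")
```

An N50 above 400 kb therefore means that at least half of the phased sequence lies in contigs of 400 kb or longer.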

Similar to previous studies on Asian genomes (24), approximately 4 million variants per individual were called (Fig. 1). For most amniocentesis samples, a standard library was made from both the cfDNA and the cell pellet. For the purposes of determining variant call reproducibility, these replicate libraries were ideal (see Table 2 in the online Data Supplement). In general, >96% of calls were shared between pairwise comparisons at all locations for which both libraries were covered with sufficient reads (Fig. 2A). Additionally, as both parents were sequenced, the fetal genome from each library could be compared with parental calls at each locus as further confirmation that the correct variants were being called at all positions. This showed that approximately 99% of calls were consistent with variant calls made in the parents (Fig. 2B). Taken together, these results showed that high-quality fetal genomes could be generated from either cfDNA in the amniotic fluid or high molecular weight DNA isolated from the cell pellet.
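The two checks described above, concordance between replicate libraries and consistency with parental genotypes, can be sketched as follows. This is a minimal illustration under stated assumptions, not the analysis code of the study: genotypes are represented as pairs of alleles keyed by position, only biallelic autosomal sites are considered, and all function names and data are hypothetical.

```python
def normalize(genotype):
    """Sort alleles so that ("G", "A") and ("A", "G") compare equal."""
    return tuple(sorted(genotype))


def replicate_concordance(calls_a, calls_b):
    """Fraction of identical genotypes at positions called in both libraries."""
    shared_positions = calls_a.keys() & calls_b.keys()
    if not shared_positions:
        raise ValueError("no positions are called in both libraries")
    matches = sum(
        normalize(calls_a[pos]) == normalize(calls_b[pos])
        for pos in shared_positions
    )
    return matches / len(shared_positions)


def is_mendelian_consistent(child, mother, father):
    """True if one child allele can come from each parent."""
    a, b = child
    return (a in mother and b in father) or (b in mother and a in father)


def parental_consistency(child_calls, mother_calls, father_calls):
    """Fraction of child genotypes explainable by the parental genotypes."""
    positions = child_calls.keys() & mother_calls.keys() & father_calls.keys()
    if not positions:
        raise ValueError("no positions are called in all three genomes")
    consistent = sum(
        is_mendelian_consistent(child_calls[p], mother_calls[p], father_calls[p])
        for p in positions
    )
    return consistent / len(positions)


# Hypothetical toy genotypes (not study data).
cell_pellet = {("chr1", 100): ("A", "G"), ("chr1", 200): ("C", "C"),
               ("chr2", 50): ("T", "T"), ("chr2", 90): ("G", "A")}
cfdna = {("chr1", 100): ("G", "A"), ("chr1", 200): ("C", "T"),
         ("chr2", 50): ("T", "T")}
mother = {("chr1", 100): ("A", "A"), ("chr1", 200): ("C", "C"),
          ("chr2", 50): ("T", "T"), ("chr2", 90): ("A", "A")}
father = {("chr1", 100): ("G", "G"), ("chr1", 200): ("C", "C"),
          ("chr2", 50): ("C", "T"), ("chr2", 90): ("A", "A")}

print(replicate_concordance(cell_pellet, cfdna))
print(parental_consistency(cell_pellet, mother, father))
```

Positions that fail the parental check are either calling errors or candidate de novo mutations, which is why additional filtering is needed before a de novo call is accepted.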


Previous studies (4, 23, 25) using Complete Genomics genome data have shown that de novo mutations (DNMs) can be detected with a low false-positive error rate by use of appropriate filters (see Methods file in the online Data Supplement). Following similar analysis steps, we found approximately 65, 65, and 50 DNMs per fetal genome in the libraries from a cell pellet, in the cfDNA libraries, and in the LFR libraries, respectively (see Table 3 in the online Data Supplement). Pairwise comparisons between cfDNA and cell pellet libraries demonstrated that approximately 88% of DNMs were shared between libraries for each amniocentesis sample (Fig. 2C). Overall there were 300 DNMs called only in the cell pellet DNA libraries, and 315 called only in the libraries made from cfDNA. However, examination of these calls in the opposite library type showed that 105 and 116 were called in the cfDNA and cell pellet DNA libraries, respectively, but with a quality score that did not pass the filtering criteria (see Table 4 in the online Data Supplement), indicating that many of the DNM calls made by a single library were probably real variants and that there did not appear to be a systematic bias in calls between the 2 different sources of DNA.
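The comparison of DNM calls across the two library types can be sketched as a three-way classification: calls that pass the filter in both libraries, calls that pass in one library and are present in the other below the quality threshold, and calls seen in one library only. The variant keys, quality scores, and threshold below are hypothetical; the actual filtering criteria are those described in the online Data Supplement.

```python
def classify_dnms(lib_a, lib_b, min_quality):
    """Classify candidate DNMs from two replicate libraries.

    lib_a, lib_b: dicts mapping a variant key to its quality score for every
    candidate observed in that library, whether or not it passes the filter.
    """
    passing_a = {v for v, q in lib_a.items() if q >= min_quality}
    passing_b = {v for v, q in lib_b.items() if q >= min_quality}
    result = {"shared": set(), "below_threshold_in_other": set(),
              "single_library": set()}
    for variant in passing_a | passing_b:
        if variant in passing_a and variant in passing_b:
            result["shared"].add(variant)
        elif variant in lib_a and variant in lib_b:
            # Observed in both libraries, but one call failed the filter.
            result["below_threshold_in_other"].add(variant)
        else:
            result["single_library"].add(variant)
    return result


# Hypothetical candidates and scores (not study data).
pellet_calls = {"chr1:1000:A>G": 80, "chr2:500:C>T": 75, "chr3:42:G>A": 90}
cfdna_calls = {"chr1:1000:A>G": 85, "chr2:500:C>T": 20, "chr7:77:T>C": 70}

classes = classify_dnms(pellet_calls, cfdna_calls, min_quality=40)
for label, variants in classes.items():
    print(label, sorted(variants))
```

Calls in the middle category correspond to the 105 and 116 variants described above, which were present in the opposite library type but did not pass the filtering criteria.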

To further determine the accuracy of DNM calls and confirm that DNMs identified in the fetal genome were present in the child's genome, we collected buccal samples from 13 of the study participants. Potential DNMs were randomly selected for confirmation, and several hundred base pairs surrounding each candidate DNM were amplified by PCR. In total, 175 regions were successfully amplified and Sanger sequenced. Of these, 162 (92.5%) were found to harbor the potential DNM (Table 5 in the online Data Supplement). Candidates shared between 2 replicate libraries had a much higher confirmation rate (99.1%), in agreement with inherited variant rates (Fig. 2) and supporting the use of replicate libraries to confirm inherited or de novo variant calls as a robust method of evaluation. Potential DNMs were further confirmed to be true DNMs by Sanger sequencing the DNA of each parent. Only 3 of the 89 potential DNMs for which Sanger sequencing was successful for both parents were found to be inherited (see Table 5 in the online Data Supplement), showing that the overall false-negative rate of our sequencing process was quite low (approximately 3.4%).
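The two validation proportions above are based on modest sample sizes, so it can help to see them with an uncertainty estimate. The sketch below recomputes the proportions from the counts given in the text and adds a Wilson score interval; the interval is an addition for illustration and was not part of the reported analysis.

```python
from math import sqrt


def wilson_interval(successes, total, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 for ~95%)."""
    if total <= 0 or not 0 <= successes <= total:
        raise ValueError("need 0 <= successes <= total and total > 0")
    p = successes / total
    denom = 1 + z**2 / total
    centre = (p + z**2 / (2 * total)) / denom
    half_width = z * sqrt(p * (1 - p) / total + z**2 / (4 * total**2)) / denom
    return centre - half_width, centre + half_width


# Counts taken from the text above.
confirmed, amplified = 162, 175        # candidate DNMs confirmed by Sanger
inherited, both_parents = 3, 89        # candidates found in a parent

confirmation_rate = confirmed / amplified
inherited_rate = inherited / both_parents

print(f"confirmed: {confirmation_rate:.1%}", wilson_interval(confirmed, amplified))
print(f"inherited: {inherited_rate:.1%}", wilson_interval(inherited, both_parents))
```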

The mean age of mothers and fathers in our cohort at the time of the amniocentesis procedure was 37.8 and 43.3 years, respectively (see Table 2 in the online Data Supplement). Older fathers have previously been shown to contribute a higher number of DNMs to their children than younger fathers (26, 27). To examine if this correlation could be seen in our cohort, we plotted the total number of DNMs by maternal and paternal age. In our cohort, the total number of DNMs per fetus correlated with paternal age (P = 0.00389), with approximately 1.3 DNMs per additional year of age of the father (Fig. 3A). The mother's age, however, showed no significant correlation (P = 0.9232). When both father's age and mother's age were included to fit a multiple linear regression model, we observed a similar pattern with only the father's age significantly correlated to DNM count (P = 0.00359), but the mother's age having no significant correlation (P = 0.59672). This result is consistent with the previously described pattern between DNMs and parental age (26). The base spectrum of DNMs did not differ significantly from that of inherited variants (see Fig. 1 in the online Data Supplement). For samples with LFR data, the parental origin of most DNMs could be determined. This analysis showed the expected pattern of approximately 1.6-fold more DNMs from the father (see Table 6 in the online Data Supplement).
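The slope of DNM count on paternal age is the coefficient of a simple least-squares fit. The sketch below shows that calculation on hypothetical ages and counts chosen only for illustration; it does not reproduce the cohort data, the P values, or the multiple regression reported above.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equally long sequences with >= 2 points")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x values must not all be identical")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical paternal ages (years) and DNM counts per fetus (not study data).
paternal_age = [30, 35, 40, 45, 50]
dnm_count = [59, 66, 72, 78, 85]

slope, intercept = fit_line(paternal_age, dnm_count)
print(f"{slope:.2f} additional DNMs per year of paternal age")
```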


To determine our ability to detect larger structural variants, we first compared our read coverage across all chromosomes to karyotyping results from the amniocentesis procedure. Two fetuses were found by karyotyping to carry an extra copy of chromosome 21, which was also confirmed in our read coverage data for libraries made from cfDNA and cell pellet DNA (see Fig. 2 in the online Data Supplement). In addition, there were 3 fetal genomes with known benign polymorphisms in heterochromatin and satellite DNA that were poorly covered by our WGS reads. Thus, for many types of structural changes, karyotyping appears necessary until WGS can be improved in these difficult-to-sequence parts of the genome. However, many smaller changes (< 1 Mb) are difficult for karyotyping or array CGH to detect, but they should be much easier for WGS.
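A whole-chromosome gain such as trisomy 21 shows up as a chromosome whose mean read coverage is about 1.5 times that of the other autosomes. The sketch below illustrates the idea with a median-normalized coverage ratio; the function name, the 1.25 cutoff, and the coverage values are hypothetical and are not the method or data of the study.

```python
from statistics import median


def flag_gained_chromosomes(mean_coverage, min_ratio=1.25):
    """Return chromosomes whose coverage ratio to the autosomal median
    is at least min_ratio (an illustrative cutoff, not a validated one)."""
    baseline = median(mean_coverage.values())
    if baseline <= 0:
        raise ValueError("median coverage must be positive")
    ratios = {chrom: cov / baseline for chrom, cov in mean_coverage.items()}
    return {chrom: r for chrom, r in ratios.items() if r >= min_ratio}


# Hypothetical mean coverage per autosome (not study data).
coverage = {f"chr{i}": 40.0 for i in range(1, 23)}
coverage["chr21"] = 60.0  # three copies instead of two gives about 1.5x

flagged = flag_gained_chromosomes(coverage)
print(flagged)
```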

Similar to the analyses we did with small variants, we compared copy number variants (CNVs) and structural variants (SVs) between replicate libraries and also compared them to parental genome calls (see Tables 7 and 8 in the online Data Supplement). Over 94% of CNVs and 96% of SVs were found in at least one of the parents (Fig. 4A). In addition, approximately 66% of CNVs and approximately 74% of SVs overlapped with CNVs and SVs identified as part of the 1KG project (28) (Fig. 4B). Of those CNVs and SVs that were inherited, 97% and 85% were called between replicate libraries, respectively (Fig. 4C), demonstrating a high level of reproducibility and that most CNV/SV calls were true positives. A total of 8 de novo CNV/SVs of >1 kb were identified within the fetal genomes, excluding trisomy 21 within the fetal genomes from families 21 and 22 (Tables 9 and 10 in the online Data Supplement). The largest identified was 14 kb. Based on previous studies, de novo CNVs larger than 100 kb are rare in healthy individuals (29-34).
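Deciding whether a fetal CNV or SV is also present in a parent requires a rule for matching intervals whose breakpoints differ slightly between genomes. One common choice is reciprocal overlap, sketched below. The 50% threshold, the half-open coordinates, and the example intervals are assumptions for illustration, and the sketch ignores the variant type (gain, loss, or rearrangement), which a real comparison would also need to match.

```python
def reciprocal_overlap(a, b):
    """Smaller of the two overlap fractions for intervals (chrom, start, end).
    Coordinates are treated as half-open; returns 0.0 if there is no overlap."""
    if a[0] != b[0]:
        return 0.0
    overlap = min(a[2], b[2]) - max(a[1], b[1])
    if overlap <= 0:
        return 0.0
    return min(overlap / (a[2] - a[1]), overlap / (b[2] - b[1]))


def split_inherited(fetal_calls, parental_calls, min_overlap=0.5):
    """Separate fetal calls matched in a parent from de novo candidates."""
    inherited, candidates = [], []
    for call in fetal_calls:
        if any(reciprocal_overlap(call, p) >= min_overlap for p in parental_calls):
            inherited.append(call)
        else:
            candidates.append(call)
    return inherited, candidates


# Hypothetical intervals (not study data).
fetal = [("chr1", 1000, 5000), ("chr2", 2000, 3000), ("chr5", 100, 14100)]
parents = [("chr1", 1200, 5200), ("chr2", 2900, 6000)]

inherited, de_novo_candidates = split_inherited(fetal, parents)
print(inherited)
print(de_novo_candidates)
```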


We searched our list of variants for entries in ClinVar (35), a well-known database for associating genomic variants with disease. On average, each fetal genome contained approximately 1 pathogenic or likely pathogenic variant with assertion criteria and no conflicting interpretations. Most of the potentially disease-causing variants appear to act in a recessive manner (see Table 11 in the online Data Supplement), and no homozygous or compound heterozygous variants with these criteria were discovered. However, this means on average each child is a carrier for a potentially serious disease. Specifically, in family 24, both children inherited a potentially defective copy of adenosine deaminase from their father, making them carriers for severe combined immunodeficiency. In the fetal genomes of 6 different families, we identified autosomal recessive deafness alleles in the genes GJB2 and TMPRSS3; these alleles are known to be more prevalent in Asian populations. We did not identify any known disease-causing variants in the genes responsible for some of the more common serious inherited diseases such as CFTR (cystic fibrosis), HEXA (Tay-Sachs), F8 and F9 (hemophilia A and B), and HBB (sickle cell disease and β-thalassemia; see Table 11 in the online Data Supplement). None of the DNMs identified in our study were found in ClinVar.

To further analyze potential disease-causing variants that were not found in ClinVar, we determined the combined annotation-dependent depletion (CADD) (36), sorting intolerant from tolerant (SIFT) (37), and Polyphen2 (38) scores for all rare inherited coding variants (see Table 12 in the online Data Supplement) and DNMs (Table 1). We also used the ExAC (14) database to determine, for genes harboring variants and/or CNVs/SVs, the probability of being loss-of-function intolerant and the missense Z-score (see Tables 9 and 10 in the online Data Supplement). As a control, these steps were repeated on the genomes of healthy Asian participants of the Personal Genome Project (22). Based on this analysis, the majority of variants appeared to be benign (see Fig. 3 in the online Data Supplement). There were, however, a small number of variants that merited further examination based on their scores, notably a detrimental DNM in CHD8 in the fetal genome of family 12. Mutations in this gene have recently been described as being one of the more common causes of autism spectrum disorder (ASD) and define a particular subtype of the disease (39). Contact with the physician of this now 2-year-old boy revealed that he does show at least 1 of the common phenotypes, macrocephaly, but at this time he has not shown, nor has he been evaluated for, symptoms of ASD. In the fetal genome of family 26, two different heterozygous missense variants, 1 from each parent, were identified in the gene LRP1 (see Table 12 in the online Data Supplement). Both are predicted to be detrimental by Polyphen2 and SIFT and have CADD scores above 24. Both variants are listed in ExAC, but are rare, 1 having been found in only 2 individuals in the database. In addition, the missense Z-score for this gene is 10.62, suggesting that it is highly intolerant to variation. Variants in this gene have been associated with keratosis pilaris atrophicans, a skin disease that is not expected to severely affect the health of this child. The remainder of the children, as predicted by this genetic screen, have not been reported to have any serious illness.
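The LRP1 finding is an example of a compound heterozygous candidate: two different heterozygous, predicted-damaging variants in the same gene, one inherited from each parent. When parental genomes are available, such candidates can be listed with a simple grouping step, sketched below. The record fields, gene labels, and the boolean "damaging" flag (standing in for the CADD, SIFT, and Polyphen2 criteria) are hypothetical simplifications.

```python
from collections import defaultdict


def compound_het_candidates(variants):
    """Return genes carrying heterozygous damaging variants from both parents.

    variants: iterable of dicts with keys "gene", "origin" ("maternal" or
    "paternal"), "zygosity" ("het" or "hom"), and "damaging" (bool).
    """
    origins_by_gene = defaultdict(set)
    for v in variants:
        if v["zygosity"] == "het" and v["damaging"]:
            origins_by_gene[v["gene"]].add(v["origin"])
    return sorted(
        gene for gene, origins in origins_by_gene.items()
        if {"maternal", "paternal"} <= origins
    )


# Hypothetical variant records (not study data).
fetal_variants = [
    {"gene": "GENE_A", "origin": "maternal", "zygosity": "het", "damaging": True},
    {"gene": "GENE_A", "origin": "paternal", "zygosity": "het", "damaging": True},
    {"gene": "GENE_B", "origin": "maternal", "zygosity": "het", "damaging": True},
    {"gene": "GENE_B", "origin": "maternal", "zygosity": "het", "damaging": True},
    {"gene": "GENE_C", "origin": "paternal", "zygosity": "het", "damaging": False},
]

candidates = compound_het_candidates(fetal_variants)
print(candidates)
```

GENE_B is not reported because both of its variants come from the same parent and may therefore lie on the same haplotype, leaving the other copy of the gene intact.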


Apart from diagnosing serious diseases, there are other phenotypes that are important to identify in these fetal genomes. A tragic example is the case of a child who died from respiratory depression associated with excessive levels of morphine in the blood after elective adenotonsillectomy (40). It was later determined he had a duplication of CYP2D6, making him an ultrafast metabolizer of codeine and thus the increased levels of its metabolite, morphine. He would have likely been prescribed a different dose or drug for pain management had this information been known (40). WGS analysis of amniotic material could potentially eliminate these types of severe interactions between a drug and an individual's genetics.

Each fetal genome in our study was analyzed against a list of potential drug interactions cataloged in the DrugBank database (41). This resulted in >400 coding variants per fetal genome in genes that are known targets of drugs. Analysis of Asian genomes and other ethnic groups from the 1KG project resulted in a similar number of coding drug target variants (see Fig. 4 in the online Data Supplement). The vast majority of these variants would not be expected to alter the protein product of these genes in such a way as to cause a serious adverse drug reaction. However, we discovered 381 instances of a variant with a low frequency in the population that resulted in complete loss of 1 copy of a gene listed in the DrugBank database in at least 1 of our fetal genomes. Again, it is unclear what effect, if any, these variants would have, and improvements in our understanding of the interaction between drugs and specific variants will be necessary before this type of data can be fully used.

Currently, there are a few well-known gene-drug interactions that we investigated, specifically the cytochrome P450 family involved with metabolizing most drugs and the genes involved in severe reactions to anesthesia. The results of this analysis are summarized in Table 2. Importantly, we discovered that a large number of these children had at least one copy of an inactive or reduced-activity cytochrome P450. There are a number of drugs (e.g., warfarin) in which dosing would be altered based on this information. In addition, we identified 4 rare damaging variants in RYR1 and 1 in CACNA1S. Variants in these genes have been associated with malignant hyperthermia, a serious and sometimes fatal response to anesthesia (42). While it is unlikely that all 5 individuals are at risk for malignant hyperthermia, this information would alert an anesthesiologist to use additional precautions and avoid malignant hyperthermia-triggering medications during the management of anesthetic care. A caffeine halothane contracture test on a muscle biopsy might also be recommended to confirm malignant hyperthermia.

Discussion


In this study, we demonstrate for the first time the complete WGS analysis of amniotic samples from pregnant women. We show that up to 97% of the fetal genome can be confidently called using either DNA isolated from the fetal cell pellet or the amniotic fluid, with virtually no difference in quality or coverage between the 2 sources. This is an important discovery, as leftover amniotic fluid is considered a waste product and WGS could be added to amniocentesis testing without interfering with the current standard of care tests. An additional advantage to our approach is that our analysis is possible without the need for the time-consuming process of amniocyte culturing. It is also possible that the act of culturing amniocytes could cause the selection of particular clones from the heterogeneous ensemble of cells in the amniotic fluid and cause potentially misleading results. While this is unlikely to affect many cases, it is almost certainly better to have a process, such as ours, that does not introduce any selective steps before analysis. Also, we only used 4 mL of a total 20 mL extracted for amniocentesis, pointing toward the possibility that in the future less material could be collected from the mother, if there was a benefit to doing this. Finally, we demonstrate that LFR libraries of high quality can be made from DNA isolated from the cell pellet, allowing for haplotyping in these samples. Haplotype information can be extremely important in cases in which detrimental compound heterozygous variants exist and the parents' genomes are not sequenced.

We discovered within the fetal genome almost all small variants, CNVs, and SVs found in the parental genomes. As an additional form of validation, many of these variations overlapped with 1KG project samples. We identified 65 DNMs per genome from both cell pellet and cfDNA sources, in agreement with previous studies (34), and showed that most of these are shared between the 2 libraries. In our data, we find a previously described (26, 27) significant trend toward more DNMs in the genomes of those fetuses with older fathers. Importantly, we find that >92% of the DNMs identified by sequencing a single library from either the cell pellet or cfDNA exist in the newborn child, proving that this type of analysis is accurate and that the fetal genome is sufficiently predictive of the genome of the child.

In this cohort, we discovered a single fetus with a DNM in CHD8. Damaging DNMs in CHD8 are one of the most common causes of simplex cases of ASD, and 80% of individuals with ASD and a DNM in CHD8 also display macrocephaly (39). According to the child's physician, he is already showing signs of macrocephaly. This suggests that the child in our study should be monitored for development of ASD, as there are early intervention programs that can improve outcomes for children with ASD. We also identified a child with compound heterozygous detrimental variants in LRP1, which could cause keratosis pilaris atrophicans, although we were unable to obtain any additional information about the health of this child beyond that she was born healthy. We also found that many of the children in this study are carriers for a severe disease and importantly identified many that could have potential drug dosage issues due to reduced CYP450 activity. Finally, we identified 5 individuals we would consider to be at risk for malignant hyperthermia due to rare variants in RYR1 and CACNA1S. For those individuals, additional testing before general anesthesia or avoidance of anesthetic medications that are contraindicated for malignant hyperthermia-susceptible patients may be advisable.

Through this analysis, we show that much more information can be acquired from a routine amniocentesis procedure. In addition, we show that either the cell pellet or the amniotic fluid itself can be used for this analysis, suggesting that it could be added to current amniocentesis processes without the need to alter protocols. At the current cost of about $1000, we suggest the methods we have described here should be considered as an additional analysis that can augment current karyotyping data. This type of additional information has the potential to identify many of the causes of serious birth defects that are currently missed. Finally, we believe a high-quality genome should be considered an investment in the child's future, and having this information before the child's birth can be enormously beneficial should any medical emergencies arise.

Author Declaration: This paper was previously posted as a preprint on bioRxiv as

Author Contributions: All authors confirmed they have contributed to the intellectual content of this paper and have met the following 3 requirements: (a) significant contributions to the conception and design, acquisition of data, or analysis and interpretation of data; (b) drafting or revising the article for intellectual content; and (c) final approval of the published article.

B.A. Peters, R. Drmanac, and F. Chen conceived the study. Y. Deng, W. Xie, and F. Chen collected the amniocentesis samples. B.A. Peters, R. Drmanac, and R.Y. Zhang developed the lab processes and made the libraries for sequence analysis. Q. Mao, N. Gulbahce, Z. Li, H. Xu, Q. Shi, E.E. Peters, and B.A. Peters performed analyses. B.A. Peters, W. Xie, F. Chen, W. Zhang, and R. Drmanac coordinated the study. B.A. Peters, R. Chin, and Q. Mao wrote the paper. All authors contributed to revision and review of the manuscript.

Authors' Disclosures or Potential Conflicts of Interest: Upon manuscript submission, all authors completed the author disclosure form. Disclosures and/or potential conflicts of interest:

Employment or Leadership: Q. Mao, Complete Genomics, Inc.; R. Chin, Complete Genomics; H. Xu, BGI-Shenzhen; R. Drmanac, Complete Genomics; B.A. Peters, Complete Genomics/BGI-Shenzhen.

Consultant or Advisory Role: None declared.

Stock Ownership: Q. Mao, BGI, Complete Genomics, Inc.; R. Chin, BGI, Complete Genomics; W. Xie, BGI; W. Zhang, BGI; H. Xu, BGI; R.Y. Zhang, BGI; Q. Shi, BGI; N. Gulbahce, BGI; Z. Li, BGI; F. Chen, BGI; R. Drmanac, BGI, Complete Genomics; B.A. Peters, BGI.

Honoraria: None declared.

Research Funding: B.A. Peters, the Research Fund for International Young Scientists, National Natural Science Foundation of China (31550110216). The Shenzhen Municipal Government of China Peacock Plan (NO.KQTD20150330171505310).

Expert Testimony: None declared.

Patents: R. Drmanac, 9650673.

Role of Sponsor: The funding organizations played no role in the design of study, choice of enrolled patients, review and interpretation of data, or final approval of manuscript.

Acknowledgment: The authors would like to acknowledge the ongoing contributions and support of all Complete Genomics and BGI-Shenzhen employees, in particular the many highly skilled individuals that work in the libraries, reagents, and sequencing groups that make it possible to generate high-quality whole-genome data. The authors wish to thank Ou Wang for help in preparing the manuscript.

References


(1.) Michaelson JJ, Shi Y, Gujral M, Zheng H, Malhotra D, Jin X, et al. Whole-genome sequencing in autism identifies hot spots for de novo germline mutation. Cell 2012; 151:1431-42.

(2.) lossifov I, O'Roak BJ, Sanders SJ, Ronemus M, Krumm N, Levy D, et al. The contribution of de novo coding mutations to autism spectrum disorder. Nature 2014;515: 216-21.

(3.) De RubeisS, He X, Goldberg AP, Poultney CS, Samocha K, Cicek AE, et al. Synaptic, transcriptional and chromatin genes disrupted in autism. Nature 2014;515:20915.

(4.) Gilissen C, Hehir-Kwa JY, Thung DT, vande Vorst M, van Bon BW, Willemsen MH, et al. Genome sequencing identifies major causes of severe intellectual disability. Nature 2014;511:344-7.

(5.) Epi KC, Epilepsy Phenome/Genome P, Allen AS, Berkovic SF, Cossette P, Delanty N, et al. De novo mutations in epileptic encephalopathies. Nature 2013;501:21721.

(6.) de Ligt J, Willemsen MH, van Bon BW, Kleefstra T, Yntema HG, Kroes T, et al. Diagnostic exome sequencing in persons with severe intellectual disability. N Engl J Med 2012;367:1921-9.

(7.) Veltman JA, Brunner HG. De novo mutations in human genetic disease. Nat Rev Genet 2012;13:565-75.

(8.) Yang Y, Muzny DM, Reid JG, Bainbridge MN, Willis A, Ward PA, et al. Clinical whole-exome sequencing for the diagnosis of Mendelian disorders. N Engl J Med 2013; 369:1502-11.

(9.) AlTurki S, Manickaraj AK, Mercer CL, Gerety SS, Hitz MP, Lindsay S, et al. Rare variants in NR2F2 cause congenital heart defects in humans. Am J Hum Genet 2014;94: 574-85.

(10.) Fromer M, PocklingtonAJ, Kavanagh DH, WilliamsHJ, Dwyer S, Gormley P, et al. De novo mutations inschizophrenia implicate synaptic networks. Nature 2014; 506:179-84.

(11.) Purcell SM, Moran JL, Fromer M, Ruderfer D, Solovieff N, Roussos P, et al. A polygenic burden of rare disruptive mutations in schizophrenia. Nature 2014;506: 185-90.

(12.) McCarthy SE, Gillis J, Kramer M, Lihm J, Yoon S, Berstein Y, et al. De novo mutations in schizophrenia implicate chromatin remodeling and support a genetic overlap with autism and intellectual disability. Mol Psychiatry 2014;19:652-8.

(13.) Deciphering Developmental Disorders S. Large-scale discovery of novel genetic causes of developmental disorders. Nature 2015;519:223-8.

(14.) Lek M, Karczewski KJ, Minikel EV, Samocha KE, Banks E, Fennell T, et al. Analysis of protein-coding genetic variation in 60,706 humans. Nature 2016;536:285-91.

(15.) Ceyhan-Birsoy 0, Machini K, Lebo MS, Yu TW, Agrawal PB, Parad RB, et al. A curated gene list for reporting results of newborn genomic sequencing. Genet Med 2017;19:809-18.

(16.) Talkowski ME,Ordulu Z, Pillalamarri V, Benson CB, Blumenthal I, Connolly S, et al. Clinical diagnosis by wholegenome sequencing of a prenatal sample. N Engl J Med 2012;367:2226-32.

(17.) Zhao R, Ruan Y, Wang X. Whole-exome sequencing and whole genome re-sequencing for prenatal diagnosis of achondroplasia. Int J Clin Exp Med 2015;8:19241-9.

(18.) Qi Q, Lu S, Zhou X, Yao F, Hao N, Yin G, et al. Copy number variation sequencing-based prenatal diagnosis using cell-free fetal DNA in amniotic fluid. Prenat Diagn2016;36:576-83.

(19.) Carnevali P, Baccash J, Halpern AL, Nazarenko I, Nilsen GB, Pant KP, et al. Computational techniques for human genome resequencing using mated gapped reads. J Comput Biol 2012;19:279-92.

(20.) Drmanac R, Sparks AB, Callow MJ, Halpern AL, Burns NL, Kermani BG, et al. Human genome sequencing using unchained base reads on self-assembling DNA nanoarrays. Science 2010;327:78-81.

(21.) Peters BA, Kermani BG, Sparks AB, Alferov 0, Hong P, Alexeev A, et al. Accurate whole-genome sequencing and haplotyping from 10 to 20 human cells. Nature 2012;487:190-5.

(22.) Mao Q, CiotlosS, Zhang RY, Ball MP, Chin R, Carnevali P, et al. The whole genome sequences and experimentally phased haplotypes of over 100 personal genomes. Gigascience 2016;5:1-9.

(23.) Peters BA, Kermani BG, Alferov 0, Agarwal MR, McElwain MA, Gulbahce N, et al. Detection and phasing of single basede novo mutations in biopsies from human in vitro fertilized embryos by advanced whole-genome sequencing. Genome Res 2015;25:426-34.

(24.) Genomes Project C, Auton A, Brooks LD, Durbin RM, Garrison EP, Kang HM, et al. A global reference for human genetic variation. Nature 2015;526:68-74.

(25.) Yuen RK, Thiruvahindrapuram B, Merico D, Walker S, Tammimies K, Hoang N, et al. Whole-genome sequencing of quartet families with autism spectrum disorder. Nat Med 2015;21:185-91.

(26.) Kong A, Frigge ML, Masson G, Besenbacher S, Sulem P, Magnusson G, et al. Rate of de novo mutations and the importance of father's age to disease risk. Nature 2012; 488:471-5.

(27.) Jiang YH, Yuen RK, Jin X, Wang M, Chen N, Wu X, et al. Detection of clinically relevant genetic variants in autism spectrum disorder by whole-genome sequencing. Am J Hum Genet 2013;93:249-63.

(28.) Sudmant PH, Rausch T, Gardner EJ, Handsaker RE, Abyzov A, Huddleston J, et al. An integrated map of structural variation in 2,504 human genomes. Nature 2015;526:75-81.

(29.) Sebat J, Lakshmi B, Malhotra D, Troge J, Lese-Martin C, WalshT, et al. Strong association of de novo copy number mutations with autism. Science 2007;316:445-9.

(30.) Xu B, Roos JL, Levy S, van Rensburg EJ, Gogos JA, Karayiorgou M. Strong association of de novo copy number mutations with sporadic schizophrenia. Nat Genet 2008;40:880-5.

(31.) Conrad DF, Pinto D, Redon R, Feuk L, Gokcumen O, Zhang Y, et al. Origins and functional impact of copy number variation in the human genome. Nature 2010; 464:704-12.

(32.) Itsara A, Wu H, Smith JD, Nickerson DA, Romieu I, London SJ, Eichler EE. De novo rates and selection of large copy number variation. Genome Res 2010;20:146981.

(33.) Oskoui M, Gazzellone MJ, Thiruvahindrapuram B, Zarrei M, Andersen J, Wei J, et al. Clinically relevant copy number variations detected in cerebral palsy. Nat Commun 2015;6:1-7.

(34.) Acuna-Hidalgo R, Veltman JA, Hoischen A. New insights into the generation and role of de novo mutations in health and disease. Genome biology 2016;17: 241.

(35.) Landrum MJ, Lee JM, Riley GR, Jang W, Rubinstein WS, Church DM, Maglott DR. Clin Var: Public archive of relationships among sequence variation and human phenotype. Nucleic Acids Res 2014;42:D980 -5.

(36.) Kircher M, Witten DM, Jain P, O'Roak BJ, Cooper GM, Shendure J. A general framework for estimating the relative pathogenicity of human genetic variants. Nat Genet 2014;46:310 -5.

(37.) Kumar P, Henikoff S, Ng PC. Predicting the effects of coding non-synonymous variants on protein function using the sift algorithm. Nat Protoc 2009;4:1073-81.

(38.) Adzhubei IA, Schmidt S, Peshkin L, Ramensky VE, Gerasimova A, Bork P, et al. A method and server for predicting damaging missense mutations. Nat Methods 2010;7:248 -9.

(39.) Bernier R, Golzio C, Xiong B, Stessman HA, Coe BP, Penn 0, et al. Disruptive CHD8 mutations define a subtype of autism early in development. Cell 2014;158: 263-76.

(40.) Ciszkowski C, Madadi P, Phillips MS, Lauwers AE, Koren G. Codeine, ultrarapid-metabolism genotype, and postoperative death. N Engl J Med 2009;361:827-8.

(41.) Law V, Knox C, Djoumbou Y, Jewison T, Guo AC, Liu Y, et al. DrugBank 4.0: Shedding new light on drug metabolism. Nucleic Acids Res 2014;42:D1091-7.

(42.) Rosenberg H, Pollock N, Schiemann A, Bulger T, Stowell K. Malignant hyperthermia: A review. Orphanet J Rare Dis 2015;10:93.

Qing Mao, [1]† Robert Chin, [1]† Weiwei Xie, [2]† Yuqing Deng, [3]† Wenwei Zhang, [2]† Huixin Xu, [2] Rebecca Yu Zhang, [1] Quan Shi, [2] Erin E. Peters, [4] Natali Gulbahce, [1] Zhenyu Li, [2] Fang Chen, [2] Radoje Drmanac, [1,2] and Brock A. Peters [1,2] *

[1] Advanced Genomics Technology Lab, Complete Genomics, Inc., San Jose, CA; [2] BGI-Shenzhen, Shenzhen, China; [3] Peking University Shenzhen Hospital, Shenzhen, China; [4] Department of Anesthesiology, Keck Medical Center of the University of Southern California, Los Angeles, CA.

[5] Human Genes: CHD7, chromodomain helicase DNA binding protein 7; CHD8, chromodomain helicase DNA binding protein 8; LRP1, LDL receptor related protein 1; GJB2, gap junction protein beta 2; TMPRSS3, transmembrane serine protease 3; CFTR, cystic fibrosis transmembrane conductance regulator; HEXA, hexosaminidase subunit alpha; F8, coagulation factor VIII; F9, coagulation factor IX; HBB, hemoglobin subunit beta; CYP2D6, cytochrome P450 family 2 subfamily D member 6; RYR1, ryanodine receptor 1; CACNA1S, calcium voltage-gated channel subunit alpha1 S; CDK5, cyclin dependent kinase 5; NRG3, neuregulin 3; SPTBN4, spectrin beta, non-erythrocytic 4; CPSF6, cleavage and polyadenylation specific factor 6; PNPLA6, patatin like phospholipase domain containing 6; ZNF781, zinc finger protein 781; KIF15, kinesin family member 15; WDPCP, WD repeat containing planar cell polarity effector; MGAT4B, mannosyl (alpha-1,3-)glycoprotein beta-1,4-N-acetylglucosaminyltransferase, isozyme B; KDM5B, lysine demethylase 5B; CCDC62, coiled-coil domain containing 62; TEX2, testis expressed 2; SLC20A2, solute carrier family 20 member 2; PARP14, poly(ADP-ribose) polymerase family member 14; OSBPL11, oxysterol binding protein like 11; PPP1R9A, protein phosphatase 1 regulatory subunit 9A; CYP2C9, cytochrome P450 family 2 subfamily C member 9; CYP3A5, cytochrome P450 family 3 subfamily A member 5.

[6] Nonstandard abbreviations: cfDNA, cell-free DNA; WGS, whole-genome sequencing; LFR, long fragment read; DNM, de novo mutation; CNV, copy number variant; SV, structural variant; CADD, combined annotation-dependent depletion; ASD, autism spectrum disorder.

* Address correspondence to this author at: Complete Genomics, Inc., 2904 Orchard Parkway, San Jose, CA 95134. E-mail:

† These authors contributed equally to this work.

Received August 25, 2017; accepted January 12, 2018.

Previously published online at DOI: 10.1373/clinchem.2017.281220

© 2018 American Association for Clinical Chemistry

Caption: Fig. 1. Genome and variant call performance.

We calculated the percent of the genome (A) and the exome (B) called for both alleles for each library. The total number of single nucleotide polymorphisms for each library is plotted (C). The red dashed line in all figures indicates the average rate seen in Complete Genomics' whole-genome sequencing data. Cell, DNA isolated from the cell pellet; cfDNA, cell-free DNA; LFR, samples sequenced with long fragment read technology using cell pellet DNA.

Caption: Fig. 2. Small variant library concordance.

Percent overlap between replicate samples (A) and the parents and fetus (B) are plotted. Each dot represents a pairwise comparison. Blue dots are single nucleotide polymorphisms only, and red dots represent all variants. The overlap of de novo mutations (DNMs) between replicate samples (C) is also plotted. Cell, DNA isolated from the cell pellet; cfDNA, cell-free DNA; LFR, samples sequenced with long fragment read technology using cell pellet DNA.

Caption: Fig. 3. Parental age versus fetal de novo mutations (DNMs).

Paternal (A) and maternal (B) age were plotted on the x-axis versus the total number of DNMs on the y-axis. Each dot represents a fetal genome library. Blue circles denote libraries from cell pellets; red circles denote long fragment read (LFR) libraries; green circles denote libraries made from cell-free DNA. A trend line is plotted on each graph.

Caption: Fig. 4. Copy number variant (CNV) and structural variant (SV) library concordance.

The percent of CNVs and SVs called by standard libraries from the cell pellet (Cell) or amniotic fluid [cell-free DNA (cfDNA)] and by at least one parent was plotted (A). The percent of CNVs and SVs called in fetal genomes and also called by the 1KG project was also plotted (B). Of those CNVs and SVs called by at least one parent, the percent shared between the Cell and cfDNA libraries was plotted (C).
Table 1. Coding DNMs.

Individual   Chromosomal coordinates      Ref     Var   Gene

2            chr7:150753860-150753861     C       T     CDK5
5            chr10:84745342-84745343      C       G     NRG3
6            chr19:41081400-41081401      A       G     SPTBN4
12           chr14:21868326-21868327      C       G     CHD8
12           chr12:69653901-69653902      G       A     CPSF6
15           chr19:7624035-7624036        A       G     PNPLA6
15           chr19:38160418-38160419      C       T     ZNF781
17           chr3:44872501-44872502       G       A     KIF15
17           chr2:63609084-63609085       G       G     WDPCP
18           chr5:179225540-179225541     T       C     MGAT4B
24R          chr1:202722082-202722083     G       A     KDM5B
24R          chr12:123276605-123276610    GAGAA   del   CCDC62
26           chr17:62230319-62230320      G       A     TEX2
28           chr8:42295080-42295081       T       C     SLC20A2
29           chr3:122418851-122418852     G       A     PARP14
30           chr3:125257391-125257392     A       G     OSBPL11
30           chr7:94539852-94539853       G       A     PPP1R9A

Individual   A.A. Change   ExAC AF    CADD   SIFT   PP2

2            D72N                     23.7   T      B
5            D690E                    26.6   D      D
6            T2540A        8.24E-06   12.9   T      P
12           L1290F                   24.7   D      D
12           R464H                    35.0   D      D
15           Y1167C                   21.0   T      B
15           G210S                    11.4   T      B
17           D1054N                   25.7   T      B
17           I526T         8.28E-06   28.5   D      D
18           N464S                    22.7   D      D
24R          Q550*                    41.0   N/A    N/A
24R          EEI237DHS*               35.0   N/A    N/A
26           P1048L                   34.0   D      D
28           M316V                    8.5    T      B
29           S483N                    12.2   T      D
30           Y642H                    23.2   D      B
30           R142Q                    24.4   D      D

Individual   Z-score   pLI    DDG2P   SFARI

2            2.92      0.95
5            1.45      0.21
12           5.54      1.00   yes     yes
12           4.35      1.00
15           5.99      0.00
15           -0.79     0.00
17           0.48      0.00
17           -0.47     0.00   yes
18           2.24      0.81
24R          1.99      0.00           yes
24R          0.53      0.00
26           1.41      0.44
28           3.03      0.86
29           1.11      0.01
30           1.25      0.90
30           0.01      0.05

Individual   Known disease association

2            Complete loss of function appears to be lethal
12           DNMs in this gene are a major cause of ASD
15           Complete loss of function can cause Boucher-Neuhauser
             syndrome and other PNPLA6-related disorders
17           Loss of function potentially causes Bardet-Biedl syndrome
28           Missense mutations can cause idiopathic basal ganglia
             calcification-1 by a dominant mechanism

Coding DNMs in each fetal genome were annotated using publicly
available databases. Listed for each DNM are the fetal genome in
which it was found (Individual); the genomic location at which it
was found (Chromosomal coordinates); the reference base at that
position (Ref); the de novo base change at that position (Var);
the gene in which it was found (Gene); the amino acid change in
the protein (A.A. Change); the frequency of this change in a
database of over 100 000 exomes (ExAC AF); the combined
annotation-dependent depletion (CADD) score; the likelihood that
the change affects the protein function as determined by sorting
intolerant from tolerant (SIFT) and Polyphen2 (PP2); the
missense Z-score of the gene (Z-score); the probability of
being loss-of-function (LoF) intolerant (pLI); if the gene was
found in the Developmental Disorders Genotype-to-Phenotype
database (DDG2P) or the Simons Foundation Autism Research
Initiative database (SFARI); and if the gene is known to be
associated with a disease (Known disease association). * refers
to an introduced termination codon.
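
As a rough illustration of how the per-variant (CADD) and per-gene (pLI) annotations in Table 1 might be combined, the following minimal Python sketch flags DNMs that score highly on both. The CADD and pLI values are transcribed from Table 1; the `CodingDNM` type, the `prioritize` helper, and the cutoffs (CADD >= 20, pLI >= 0.9) are assumptions made for this example and are not criteria stated by the authors. The SPTBN4 variant is left out because Table 1 lists no Z-score or pLI row for it.

```python
from typing import List, NamedTuple


class CodingDNM(NamedTuple):
    """One coding de novo mutation row (hypothetical container for this sketch)."""
    individual: str
    gene: str
    cadd: float  # variant-level CADD score from Table 1
    pli: float   # gene-level pLI from Table 1


# Values transcribed from Table 1 (rows with both a CADD and a pLI entry).
TABLE1: List[CodingDNM] = [
    CodingDNM("2", "CDK5", 23.7, 0.95),
    CodingDNM("5", "NRG3", 26.6, 0.21),
    CodingDNM("12", "CHD8", 24.7, 1.00),
    CodingDNM("12", "CPSF6", 35.0, 1.00),
    CodingDNM("15", "PNPLA6", 21.0, 0.00),
    CodingDNM("15", "ZNF781", 11.4, 0.00),
    CodingDNM("17", "KIF15", 25.7, 0.00),
    CodingDNM("17", "WDPCP", 28.5, 0.00),
    CodingDNM("18", "MGAT4B", 22.7, 0.81),
    CodingDNM("24R", "KDM5B", 41.0, 0.00),
    CodingDNM("24R", "CCDC62", 35.0, 0.00),
    CodingDNM("26", "TEX2", 34.0, 0.44),
    CodingDNM("28", "SLC20A2", 8.5, 0.86),
    CodingDNM("29", "PARP14", 12.2, 0.01),
    CodingDNM("30", "OSBPL11", 23.2, 0.90),
    CodingDNM("30", "PPP1R9A", 24.4, 0.05),
]


def prioritize(rows: List[CodingDNM],
               min_cadd: float = 20.0,
               min_pli: float = 0.9) -> List[CodingDNM]:
    """Keep DNMs whose CADD and pLI both meet the (assumed) cutoffs."""
    return [r for r in rows if r.cadd >= min_cadd and r.pli >= min_pli]


if __name__ == "__main__":
    for row in prioritize(TABLE1):
        print(f"{row.individual}\t{row.gene}\tCADD={row.cadd}\tpLI={row.pli}")
```

A filter of this kind only narrows a candidate list. Genes with low pLI, such as PNPLA6 and WDPCP, still carry known disease associations in Table 1, so the cutoffs should be treated as adjustable rather than as a clinical rule.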

Table 2. Inherited variants in drug response genes.

          Individual            Chromosomal coordinates    Ref

Rare      3                     chr19:38976581-38976582    C
          6                     chr19:39010063-39010064    C
          10                    chr19:39008322-39008323    G
          16                    chr19:38976507-38976508    C
          22                    chr1:201019609-201000000   T
          26                    chr22:42525034-42525035    C

Common    3 hets                chr10:96741052-96741053    A
          10 hets and 10 homs   chr22:42526693-42526694    G
          13 hets and 3 homs    chr7:99270538-99270539     C

          Individual            Var   dbSNP         Gene

Rare      3                     T     rs202225176   RYR1
          6                     T                   RYR1
          10                    A                   RYR1
          16                    A                   RYR1
          22                    C                   CACNA1S
          26                    T     rs5030865     CYP2D6

Common    3 hets                C     rs1057910     CYP2C9
          10 hets and 10 homs   A     rs1065852     CYP2D6
          13 hets and 3 homs    T     rs776746      CYP3A5

          Individual            A.A. Change   ExAC AF    gnomAD AF

Rare      3                     P1763S        6.60E-05   1.62E-04
          6                     P3410L        2.47E-05   1.29E-04
          10                    R3337Q
          16                    T1738K
          22                    N1383S        8.24E-06
          26                    G169R         9.52E-04   2.93E-04

Common    3 hets                I359L         0.064      0.048
          10 hets and 10 homs   P34S          0.204      0.192
          13 hets and 3 homs    Disrupt                  0.265

          Individual            CADD    SIFT   PP2

Rare      3                     22.9    T      D
          6                     25.8    D      D
          10                    23.2    D      D
          16                    25.5    T      D
          22                    26.1    D      D
          26                    24.4    D      D

Common    3 hets                20.4    D      B
          10 hets and 10 homs   24.9    D      D
          13 hets and 3 homs    3.375

          Individual            Drug interaction

Rare      3                     Some variants in this gene can
                                cause malignant hyperthermia in
                                response to some anesthesia.
          26                    This variant is part of the
                                CYP2D6*14 haplotype; it has no
                                activity and can affect the dosage
                                of many different drugs.

Common    3 hets                This variant is part of the
                                CYP2C9*3 haplotype; it has low
                                activity and can affect the dosage
                                of many different drugs.

          10 hets and 10 homs   This variant is part of the
                                CYP2D6*10 haplotype; it has low
                                activity and can affect the dosage
                                of many different drugs.

          13 hets and 3 homs    This variant is part of the
                                CYP3A5*3 haplotype; it has no
                                activity and can affect the dosage
                                of many different drugs.

Well-known gene-drug interactions were examined. Listed for each
variant in a gene with certain alleles that are known to cause an
interaction with a certain drug or class of drugs are the fetal
genome in which it was found (Individual), the genomic location at
which it was found (Chromosomal coordinates), the reference base at
that position (Ref), the variant base at that position (Var), the
dbSNP identifier (dbSNP), the gene in which it was found (Gene), the
amino acid change in the protein (A.A. Change), the frequency of
this change in a database of over 100 000 exomes (ExAC AF), the
allele frequency in a database of over 15 000 genomes (gnomAD AF),
the combined annotation-dependent depletion score (CADD), the
likelihood that the change affects the protein function as
determined by SIFT (SIFT) and Polyphen2 (PP2), and the known drug
interaction (Drug interaction).
COPYRIGHT 2018 American Association for Clinical Chemistry, Inc.

Title Annotation:Molecular Diagnostics and Genetics
Author:Mao, Qing; Chin, Robert; Xie, Weiwei; Deng, Yuqing; Zhang, Wenwei; Xu, Huixin; Zhang, Rebecca Yu; Sh
Publication:Clinical Chemistry
Date:Apr 1, 2018