
Solexa sequencing: decoding genomes on a population scale.

It has now been some 17 years since David Klenerman and I (Fig. 1) conceived a project that would subsequently enable rapid whole human genome sequencing. The core thinking that led to Solexa sequencing was sparked by observations made during basic exploratory research carried out in our laboratories during the mid to late 1990s, and the redirection of these observations toward DNA sequencing was an unintended consequence. The founding ideas and proof-of-concept experiments were carried out at the University of Cambridge. The first commercial sequencing system was subsequently developed at Solexa Limited, a company we founded in 1998. Illumina Inc. acquired Solexa in early 2007 and made further improvements that led to several new sequencing systems. Today, the technology is being used to routinely decode human genomes for medical research, clinical decision-making, and basic science, all at a cost and speed that make population-scale sequencing practical. The journey from concept to reduction to practice has already gone beyond my expectations in terms of performance, adoption, and early insights, yet the era of clinical whole genome sequencing is perhaps only just beginning.

In the mid-1990s, in the University Chemical Laboratories, Cambridge, David Klenerman and I were using fluorescence single-molecule spectroscopy to observe the synthesis of DNA by a polymerase enzyme using fluorescently encoded nucleotides. The work itself and the grant proposal that supported it were fundamental in nature and said nothing of sequencing. The struggle and attempts to improve the experimental design to optimize what we wished to observe suggested to us a means to decode a strand of immobilized DNA by single-molecule fluorescence imaging. It was not immediately clear what the benefits of decoding DNA on a surface would be, until our awareness of DNA microarrays prompted the realization that the decoding of immobilized DNA could be made parallel on an array-type format, giving the potential to sequence DNA on a massive scale. These purely technical ideas were stimulating, but a greater purpose was needed to drive us toward committing them to a serious effort. Klenerman and I were aware of the Human Genome Project activities at the nearby Sanger Institute at Hinxton Hall, and we made a trip to visit 3 of the key scientists, David Bentley, Richard Durbin, and Jane Rogers, in early 1998. We were most impressed by the scale and organization of the sequencing activities at the Sanger Institute. It was also striking that even with all this capacity (along with the other participating genome centers in the world), it would take about a decade to produce the first human reference genome. In tearoom discussions with our 3 hosts, we felt confident that the reference human genome would definitely emerge during the subsequent 10 years. We described our strong desire to create a new method capable of decoding human genomes improved by many orders of magnitude over the methods being used at that time. The Human Genome Project was, after all, going to produce only 1 human genome, and it was evident that very many genomes would be needed to pin down the genetic basis of biological function, dysfunction, and genetic diseases. Equipped with new insights and an enthusiastic vote of confidence from our hosts, we returned to Lensfield Road, to the task of reducing our ideas to practice.

The early work at Lensfield Road focused on chemically adapting fluorescently tagged nucleotide triphosphates so that they could be incorporated one at a time with complete chemical control (Fig. 2). This also involved screening all DNA polymerases that we could lay our hands on, to understand which classes of polymerases would tolerate the types of changes we wanted to make to our nucleotides. In parallel, we also investigated methods for immobilizing DNA to surfaces that would allow stable single-molecule fluorescence imaging. Our early attempts to address these key questions delivered sufficient proof of concept to warrant a more serious effort to integrate all the necessary factors to build a sequencing system. I thought the best vehicle for this would be via a start-up company, providing the impetus to raise the necessary funds and build an interdisciplinary team that could be scaled up rapidly to develop a robust commercial system. We raised venture capital investment and started Solexa Limited in the summer of 1998. Initially, Solexa operated in virtual mode, with all of the experimental work being incubated in our laboratories in the university, where success and further proof of concept led to further investment and a move to external premises near to the Sanger Institute in 2000. Details of the technical aspects have been described elsewhere (1-3). The first whole genome to be sequenced by the Solexa approach was that of φX174 in 2005. An important change from our original technical vision was to move away from single-molecule to multimolecule sequencing by forming clusters from an array seeded by single molecules of DNA. This decision was prompted by a conversation I had with Sydney Brenner in December 2002, just a few days before he received his Nobel Prize in Stockholm, when he convinced me that clusters would lead to a more pragmatic way of reducing stochastic errors that are generated from single-molecule sequencing. He was right! And furthermore, this provided improved signal strength, leading to systems with relatively inexpensive cameras compared with single-molecule detection.

In 2006, Solexa released its first commercial sequencing system, the Genome Analyzer. It could decode a billion bases of human DNA sequence accurately in a single run, a milestone that was proudly announced in early January 2007. Shortly after that announcement, Illumina completed the acquisition of Solexa and its technology. At Illumina, the technology was subjected to continuous further improvement, culminating in the HiSeq platform being released in 2011, initially delivering 600 billion bases per experimental run, later improving to a trillion bases. Other formats of the technology included the first desktop version, the MiSeq, which premiered in 2012; the desktop whole genome sequencer, the NextSeq 500, which was released in 2014; and the HiSeq X Ten systems for population-scale sequencing, which were also released in 2014.

In November 1997, Klenerman and I had claimed (to investors) that our method would be scalable to a billion bases of DNA per experiment (a calculation originally done on the back of a beer mat in a pub). This capacity had been realized in a commercial system in early January 2007, and then exceeded by 3 orders of magnitude by 2014. The cost of accurate whole genome sequencing fell from about 1 billion US dollars for the Human Genome Project, down to about 1000 US dollars today. This is about a million-fold improvement in speed and cost achieved over 17 years.
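The arithmetic behind these figures can be checked with a short sketch. It uses only the approximate numbers quoted above; the variable names are illustrative, and the yearly factor is a hypothetical average that assumes the improvement compounded evenly over the 17 years, which the account above does not claim.

```python
import math

# Approximate figures quoted in the text above.
HGP_COST_USD = 1_000_000_000   # "about 1 billion US dollars"
CURRENT_COST_USD = 1_000       # "about 1000 US dollars today"
BASES_PER_RUN_2007 = 10**9     # a billion bases per run (Genome Analyzer)
BASES_PER_RUN_2014 = 10**12    # a trillion bases per run (later HiSeq runs)
YEARS = 17


def fold_change(larger: int, smaller: int) -> float:
    """Return how many times larger one quantity is than the other."""
    return larger / smaller


# Cost reduction from the Human Genome Project to today.
cost_fold = fold_change(HGP_COST_USD, CURRENT_COST_USD)

# Throughput gain per run, expressed in orders of magnitude (powers of 10).
throughput_orders = math.log10(fold_change(BASES_PER_RUN_2014, BASES_PER_RUN_2007))

# Assumption for illustration only: the average yearly factor implied
# if the cost reduction had compounded evenly each year.
yearly_factor = cost_fold ** (1 / YEARS)

print(f"cost reduction: {cost_fold:,.0f}-fold")
print(f"throughput gain: {throughput_orders:.0f} orders of magnitude")
print(f"implied average yearly factor: {yearly_factor:.2f}x")
```

Under that even-compounding assumption, a million-fold change over 17 years corresponds to costs falling by a little more than half each year.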

I will now reflect on where clinical whole human genome sequencing stands and where it may be heading in the coming years. There are a good number of clinical areas where genome (or high-depth) sequencing has demonstrable potential to alter the course of clinical management in the future, and I will mention just a few. Perhaps the most obvious case for whole genome sequencing is cancer, given the genetic causation and genetic uniqueness of every human cancer. Huge strides have been made toward building our understanding of the genetic signatures of cancer. Large-scale efforts such as the International Cancer Genome Consortium compiled huge and important data sets, from which have been extracted common genetic signatures of cancers (4). Real-time monitoring of the evolution of a patient's cancer genome, under particular drug treatment, can provide guidance to optimize and monitor the effectiveness of therapy (5). I had not, until recently, appreciated the potential impact of whole genome sequencing on rare diseases, which collectively afflict 1 in 17 people and are mostly genetic in origin, with the dominant cases expressed early in life, during childhood years (6). There are already some published, well-documented examples of diagnosis of childhood rare diseases by whole genome sequencing of the child and both parents (trio), and some pediatric clinics use whole genome sequencing routinely as part of their standard of care (7, 8). The third area where whole genome sequencing is beginning to make an impact is with infectious diseases. Whole genome sequencing of pathogens in a clinical setting can provide early diagnosis and prevention of outbreaks and is likely to form a routine part of clinical practice (9). Noninvasive analysis of cell-free DNA circulating in plasma, by genome-wide or high-depth targeted sequencing, has huge potential for prenatal diagnosis of fetal genetic disorders (10) and also early detection and diagnosis of cancers (11).

It has been extraordinary to experience the transformation from concepts that stemmed from basic research to a widely used technology, in less than 20 years. It has also been remarkable to observe the rapid adoption of genome sequencing, and early signs of promise, in various segments of the clinical sector. Although one must be cautious not to overstate how far the successful implementation into routine clinical practice will proceed, to me it is now beyond question that genome sequencing will be a lasting part of medicine. The recent launch by the UK National Health Service of a project to sequence the whole genomes of 100 000 UK patients (approximately 0.2% of the population) and integrate the resulting data with the classical clinical records (12) is a pioneering step toward the implementation of genomic medicine on a population scale.

I wish to acknowledge my collaborator David Klenerman, with whom I share the founding inventions and initiation of the Solexa project. I thank coworkers who were courageous enough to embark on this journey, particularly at the early stages when the risks were high. I also acknowledge the talented and dedicated people of Solexa and Illumina, for the commercialization and continued improvement of this technology. I thank the Biotechnology and Biological Sciences Research Council of the UK for funding the basic science that provided the foundation for Solexa sequencing.

Author Contributions: All authors confirmed they have contributed to the intellectual content of this paper and have met the following 3 requirements: (a) significant contribution to the conception and design, acquisition of data, or analysis and interpretation of data; (b) drafting or revising the article for intellectual content; and (c) final approval of the published article.

Authors' Disclosures or Potential Conflicts of Interest: Upon manuscript submission, all authors completed the author disclosure form. Disclosures and/or potential conflicts of interest:

Employment or Leadership: S. Balasubramanian, Solexa.

Consultant or Advisory Role: S. Balasubramanian, Illumina.

Stock Ownership: S. Balasubramanian, Illumina.

Honoraria: None declared.

Research Funding: S. Balasubramanian, Biotechnology and Biological Sciences Research Council of the UK.

Expert Testimony: None declared.

Patents: US 8,158,346, US 7,772,384, US 7,427,673, US 7,057,026, US 8,623,628, US 8,394,586, US 8,148,064, US 7,785,796, US 7,566,537, and US 6,787,308.

References

(1.) Balasubramanian S. Decoding genomes at high speed: implications for science and medicine. Angew Chem Int Ed 2011; 50:12406.

(2.) Balasubramanian S. Sequencing nucleic acids: from chemistry to medicine. RSC Chem Commun 2011; 47:7281.

(3.) Bentley DR, Balasubramanian S, Swerdlow HP, Smith GP, Milton J, Brown CG, et al. Accurate whole human genome sequencing using reversible terminator chemistry. Nature 2008; 456:53-9.

(4.) Alexandrov LB, Nik-Zainal S, Wedge DC, Aparicio SAJR, Behjati S, Biankin AV, et al. Signatures of mutational processes in human cancer. Nature 2013; 500:415-21.

(5.) Jones SJM, Laskin J, Li YY, Griffith OL, An J, Bilenky M, et al. Evolution of an adenocarcinoma in response to selection by targeted kinase inhibitors. Genome Biol 2010; 11:R82.

(6.) Rare Disease UK [home page]. http://www.raredisease.org.uk/ (Accessed October 2014).

(7.) Saunders CJ, Miller NA, Soden SE, Dinwiddie DL, Noll A, Alnadi N, et al. Rapid whole-genome sequencing for genetic-disease diagnosis in neonatal intensive care units. Sci Transl Med 2012; 4:154ra135.

(8.) Jacob HJ, Abrams K, Bick DP, Brodiel K, Dimmock DP, Farrell M, et al. Genomics in clinical practice: lessons from the front lines. Sci Transl Med 2013; 5:194cm5.

(9.) Koser CU, Bryant JM, Becq J, Torok ME, Ellington MJ, Marti-Renom MA, et al. Whole-genome sequencing for rapid susceptibility testing of M. tuberculosis. N Engl J Med 2013; 369:290-2.

(10.) Lo YM, Chan KC, Sun H, Chen EZ, Jiang P, Lun FM, et al. Maternal plasma DNA sequencing reveals the genomewide genetic and mutational profile of the fetus. Sci Transl Med 2010; 2:61ra91.

(11.) Forshew T, Murtaza M, Parkinson C, Gale D, Tsui DW, Kaper F, et al. Noninvasive identification and monitoring of cancer mutations by targeted deep sequencing of plasma DNA. Sci Transl Med 2012; 4:136ra68.

(12.) Genomics England [home page]. http://www.genomicsengland.co.uk/ (Accessed October 2014).

Shankar Balasubramanian [1,2] *

[1] Professor, Department of Chemistry, University of Cambridge; [2] CRUK Cambridge Institute, University of Cambridge, Cambridge, UK.

* Address correspondence to the author at: Department of Chemistry, University of Cambridge, Lensfield Road, Cambridge, UK CB2 1EW. E-mail: sb10031@cam.ac.uk.

Received August 22, 2014; accepted September 12, 2014.

Previously published online at DOI: 10.1373/clinchem.2014.221747
COPYRIGHT 2015 American Association for Clinical Chemistry, Inc.
No portion of this article can be reproduced without the express written permission from the copyright holder.

Article Details
Title Annotation: Reflections
Author: Balasubramanian, Shankar
Publication: Clinical Chemistry
Date: Jan 1, 2015