Genome 2.0: mountains of new data are challenging old views.
When scientists unveiled a draft of the human genome in early 2001, many cautioned that sequencing the genome was only the beginning. The long list of the four chemical components that make up all the strands of human DNA would not be a finished book of life, but a road map of an undiscovered country that would take decades to explore.
Only 6 years later, the landscape of the genome is already proving to be dramatically different from what most scientists had expected.
The established view of the genome began to take shape in 1958, just 5 years after Francis Crick and James D. Watson worked out the structure of DNA. In that year, Crick expounded what he called the "central dogma" of molecular biology: DNA's genetic information flows strictly one way, from a gene through a series of steps that ends in the creation of a protein. That principle developed into a modern orthodoxy, according to which a genome is a collection of discrete genes located at specific spots along a strand of DNA. This old view got the basics right: that genes encode proteins and that proteins do the myriad work necessary to keep an organism alive.
Researchers slowly realized, however, that genes occupy only about 1.5 percent of the genome. The other 98.5 percent, dubbed "junk DNA," was regarded as useless scraps left over from billions of years of random genetic mutations. As geneticists' knowledge progressed, this basic picture remained largely unquestioned. "At one time, people said, 'Why even bother to sequence the whole genome? Why not just sequence the [protein-coding part]?'" says Anindya Dutta, a geneticist at the University of Virginia in Charlottesville.
Closer examination of the full human genome is now causing scientists to return to some questions they thought they had settled. For one, they're revisiting the very notion of what a gene is. Rather than being distinct segments of code amid otherwise empty stretches of DNA--like houses along a barren country road--single genes are proving to be fragmented, intertwined with other genes, and scattered across the whole genome.
Even more surprisingly, the junk DNA may not be junk after all. Most of this supposedly useless DNA now appears to be copied into RNA transcripts, boosting the raw information output of the genome to about 62 times what genes alone would produce. If these active nongene regions don't carry code for making proteins, just what does their activity accomplish?
"What we thought was important before was really just the tip of the iceberg," says Hui Ge of the Whitehead Institute for Biomedical Research in Cambridge, Mass.
With the genome sequence in hand, exploration has moved at a brisk pace during the past 6 years. A milestone was reached in June, when a project called the Encyclopedia of DNA Elements (ENCODE) thoroughly mapped the functional regions in 1 percent of the human genome. The effort involved was staggering: Thirty-five teams of scientists from around the world worked for 4 years and compiled more than 600 million data points, the consortium reported in the June 14 Nature.
From the accumulating mountains of data, scientists are building a new picture of how the genome works as a whole. They have found mutations in nongene regions of DNA that are linked to common diseases such as diabetes and forms of cancer. And some researchers propose that DNA once labeled junk could have spawned the complex bodies of higher organisms--even the complexities of the human brain.
SECOND FIDDLE TO SUPERSTAR In the emerging picture of the genome's functioning, many of the key elements identified so far are molecules of RNA, a chemical cousin of DNA.
In the old central dogma, RNA had a strictly subservient role in the all-important task of making proteins. An RNA molecule is made from units of genetic code strung together, much like DNA. But while DNA has two strands twisted together into a double helix, RNA usually has only a single strand.
Protein synthesis begins when a stretch of the DNA double helix unzips. Units of RNA then pair up with their counterparts on one of the DNA strands, forming a complementary messenger RNA (mRNA) molecule. The mRNA detaches and floats off to other parts of the cell, where it hooks up with machinery that translates its coded message into a protein.
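The pairing step described above follows simple base-pairing rules, which a few lines of Python can illustrate. This is only a toy sketch: it assumes the DNA template strand is written 3' to 5', so the mRNA comes out 5' to 3' in the same left-to-right order, and it ignores everything else a real cell does (promoters, splicing, editing). The function name `transcribe` and the example sequence are made up for illustration.

```python
# Base-pairing rules for building RNA on a DNA template strand.
# RNA uses uracil (U) where DNA would use thymine (T).
DNA_TO_RNA_PAIR = {"A": "U", "T": "A", "G": "C", "C": "G"}


def transcribe(template_3_to_5: str) -> str:
    """Return the mRNA complementary to a DNA template strand.

    Toy model (hypothetical helper): the template is written 3'->5',
    so the mRNA reads 5'->3' in the same left-to-right order.
    """
    return "".join(DNA_TO_RNA_PAIR[base] for base in template_3_to_5.upper())


if __name__ == "__main__":
    template = "TACGGATTC"
    print(transcribe(template))  # AUGCCUAAG
```

Each DNA unit dictates exactly one RNA unit, which is why the mRNA is described as a complementary copy of its stretch of DNA.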
If RNA's only job were making proteins, then nearly all the RNAs produced in cells should be transcripts of protein-coding genes. (A small fraction of RNAs serve in the protein-making machinery itself.) But in 2005, Jill Cheng and her colleagues at Affymetrix, a genomics company in Santa Clara, Calif., showed that less than half of the RNA produced by 10 of the chromosomes in human cells represented transcripts of traditional genes. In the team's experiments, 57 percent of the RNA was transcribed from noncoding, "junk" regions.
The results from ENCODE were even more striking. In the slice of DNA studied in that project, between 74 percent and 93 percent of the DNA produced RNA transcripts. What becomes of this tremendous output is uncertain. John M. Greally of the Albert Einstein College of Medicine in New York says it's likely that some portion of it is made accidentally and simply discarded. But the discovery that so much of the genome is being transcribed into RNA underscores how out-of-date the central dogma has become.
Indeed, the closer researchers look, the more functions they find for RNA transcripts. An alphabet soup of new acronyms describes the newfound roles of RNAs. First there were small nuclear RNAs (snRNAs) and small nucleolar RNAs (snoRNAs), both of which reside inside the nucleus and help control production of other RNAs. These were joined by microRNAs (miRNAs) and short interfering RNAs (siRNAs), which can modulate the activity of protein-coding genes. In mice, about 34,000 of the RNA transcripts produced by the genome are non-protein-coding, outnumbering the roughly 32,000 transcripts that code for proteins, according to a 2005 study by an international group of scientists called the Functional Annotation of Mouse Consortium.
These new families of RNAs add a layer of regulation that fine-tunes the production of proteins. While scientists already knew that some proteins influence the activity of other genes, "there are many more RNAs than proteins that play a regulatory role," Ge says.
Gene regulation may not sound sexy, but it's a powerful way for a cell to evolve complex behaviors using the tools--proteins--that it already has. Consider the difference between a one-bedroom bungalow and an ornate, three-story McMansion. Both are made from roughly the same materials--lumber, drywall, wiring, plumbing--and are put together with the same tools--hammers, saws, nails, and screws. What makes the mansion more complex is the way that its construction is orchestrated by rules that specify when and where each tool and material must be used.
In cells, regulation controls when and where proteins spring into action. If the traditional genome is a set of blueprints for an organism, RNA regulatory networks are the assembly instructions. In fact, some scientists think that these additional layers of complexity in genome regulation could be the answer to a long-standing puzzle.
GENOME AS NETWORK The biggest surprise in the first sequence of the human genome was how few protein-coding genes it contained.
"We humans do not have that many more genes than simpler organisms like flies or mice," Ge says. Earlier guesses of the number of genes in humans ran as high as 100,000, but the published sequence in fact contained only about 23,000. That's not much more than the roughly 21,000 genes possessed by the roundworm, a microscopic creature without a brain. If protein-coding genes are the only functional elements in an organism's DNA, where does the extra information come from that's needed to assemble and operate the complex bodies and brains of people, as compared with the simplicity of roundworms? "If we just look at the number of genes, it doesn't make sense," Ge says.
While the number of genes isn't much different in roundworms and people, the human genome is 30 times the size of the roundworms'. People have a much larger quantity of DNA beyond what codes for proteins. Since much of this "junk" DNA is being transcribed into RNA, perhaps it's responsible for much of the complexity of human bodies and brains. In fact, organisms simpler than roundworms, such as single-celled bacteria, carry little noncoding DNA and rely far less on regulatory RNA.
"Scientists have been suspecting that it is the regulatory networks that lead to this amazing complexity" in higher organisms, Ge says.
John S. Mattick of the University of Queensland in Brisbane, Australia, points to a known example of the importance of regulatory RNAs: their crucial role in embryonic development. Most multicellular animals, for instance, possess a gene called Notch that helps guide neural development. While the gene itself has much the same form in both simple and complex animals, its activity is regulated by miRNAs that are highly variable from one animal to another. Such miRNAs also influence a family of genes called Hox, which act in many animals to define an embryo's body axis and the placement of its limbs.
What's more, the changes that distinguish human brains from those of chimpanzees and other apes could be due in part to evolutionary changes in RNAs that don't encode proteins. A group led by Katherine S. Pollard of the University of California, Davis, identified DNA sequences that people and chimpanzees share but that differ substantially between the two, suggesting that they have evolved rapidly since the two species shared a common ancestor.
The researchers reported in the Sept. 14, 2006, Nature that one of these sequences is a noncoding region of DNA related to brain function. Pollard and her colleagues speculate that this region produces a regulatory RNA and that changes in this RNA contributed to the evolution of the human brain.
With regulatory RNAs appearing to play such an instrumental role in animal development, it's no surprise that scientists are finding disease-associated mutations in regions of the genome formerly regarded as junk.
David Altshuler of the Broad Institute in Cambridge, Mass., and his colleagues looked for DNA mutations in 1,464 patients with type 2 diabetes. Three of the mutations that correlated with the disease were in DNA segments that don't code for proteins, the team reported in the June 1 Science. Other scientists have found mutations in noncoding DNA that link to diseases such as autism, breast cancer, lung cancer, prostate cancer, and schizophrenia.
To be sure, the specific functions of most of the noncoding DNA remain unknown. Projects such as ENCODE have focused on identifying the broad functional categories for active regions of the genome without working out the specific cellular function of each transcript, a task that will take biologists years, if not decades.
In fact, scientists debate whether some fraction of the genome's copious RNA output might do nothing at all. It may simply be that once the cellular machinery that transcribes DNA into RNA gets started, it sometimes doesn't know when to stop. On the other hand, making lots of RNA that does nothing would be a waste of a cell's energy. That's something that natural systems tend to avoid, so the fact that this RNA is produced at all argues for at least some of it being biologically active.
THE GENE IS DEAD In the old view, each gene sat in splendid isolation on its segment of the genome. Other genes might be nearby, but scientists assumed that they didn't overlap each other.
Now it's clear that a single length of DNA can be transcribed in multiple ways to produce many different RNAs, some coding for proteins and others constituting regulatory RNAs. By starting and stopping in different places, the transcription machinery can generate a regulatory RNA from a length of DNA that overlaps a protein-coding gene. Moreover, the code for another regulatory RNA might run in the opposite direction on the facing strand of DNA. According to the ENCODE project results, up to 72 percent of known genes have transcripts on the facing DNA strand as well as the main strand.
"The same sequences are being used for multiple functions," says Thomas R. Gingeras of Affymetrix. That introduces complications into the evolution of the genome, which had until recently been assumed to act through single DNA mutations affecting single genes. Now, "a mutation in one of those sequences has to be interpreted not only in terms of [one gene], but [of] all the other transcripts going through the region," Gingeras explains.
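Gingeras's point can be made concrete by treating each transcript as a span of positions along a stretch of DNA and asking which spans a single mutated base falls inside. The sketch below is hypothetical throughout: the `Transcript` record, the `transcripts_hit` helper, and the coordinates in `REGION` are invented for illustration and do not come from ENCODE data. Because a point mutation changes a whole base pair, it is counted as hitting transcripts on either strand.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Transcript:
    """A hypothetical annotated transcript on one strand of a DNA region."""
    name: str
    start: int   # first base covered (0-based, inclusive)
    end: int     # last base covered (inclusive)
    strand: str  # "+" for the main strand, "-" for the facing strand
    kind: str    # e.g. "protein-coding" or "regulatory"


def transcripts_hit(position, transcripts):
    """Names of every transcript whose span covers the mutated base.

    Strand is deliberately ignored: a point mutation alters the base
    pair, so transcripts read from either strand are affected.
    """
    return [t.name for t in transcripts if t.start <= position <= t.end]


# Made-up annotations for illustration only: one protein-coding gene,
# a regulatory RNA that starts inside it, and an RNA on the facing strand.
REGION = [
    Transcript("gene_A_mRNA", 100, 900, "+", "protein-coding"),
    Transcript("reg_RNA_1", 700, 1200, "+", "regulatory"),
    Transcript("antisense_RNA", 650, 820, "-", "regulatory"),
]

if __name__ == "__main__":
    print(transcripts_hit(750, REGION))  # ['gene_A_mRNA', 'reg_RNA_1', 'antisense_RNA']
    print(transcripts_hit(300, REGION))  # ['gene_A_mRNA']
```

In this made-up region, a mutation at position 750 touches a protein-coding transcript and two regulatory ones at once, while one at position 300 touches only the gene. A real analysis would also have to know which distant genes those regulatory transcripts control, which is exactly what makes the net effect hard to predict.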
The implications of this single mutation-multiple consequence model are still a matter of debate. In some cases, the RNA transcripts from DNA that overlaps a protein-coding gene regulate that same gene, so a mutation could affect both the structure and the regulation of a protein. But often, those transcripts regulate genes that are far away, or even on different chromosomes. This complex interweaving of genes, transcripts, and regulation makes the net effect of a single mutation on an organism much more difficult to predict, Gingeras says.
More fundamentally, it muddies scientists' conception of just what constitutes a gene. In the established definition, a gene is a discrete region of DNA that produces a single, identifiable protein in a cell. But the functioning of a protein often depends on a host of RNAs that control its activity. If a stretch of DNA known to be a protein-coding gene also produces regulatory RNAs essential for several other genes, is it somehow a part of all those other genes as well?
To make things even messier, the genetic code for a protein can be scattered far and wide around the genome. The ENCODE project revealed that about 90 percent of protein-coding genes possessed previously unknown coding fragments that were located far from the main gene, sometimes on other chromosomes. Many scientists now argue that this overlapping and dispersal of genes, along with the swelling ranks of functional RNAs, renders the standard gene concept of the central dogma obsolete.
LONG LIVE THE GENE Offering a radical new conception of the genome, Gingeras proposes shifting the focus away from protein-coding genes. Instead, he suggests that the fundamental units of the genome could be defined as functional RNA transcripts.
Since some of these transcripts ferry code for proteins as dutiful mRNAs, this new perspective would encompass traditional genes. But it would also accommodate new classes of functional RNAs as they're discovered, while avoiding the confusion caused by several overlapping genes laying claim to a single stretch of DNA. The emerging picture of the genome "definitely shifts the emphasis from genes to transcripts," agrees Mark B. Gerstein, a bioinformaticist at Yale University.
Scientists' definition of a gene has evolved several times since Gregor Mendel first deduced the idea in the 1860s from his work with pea plants. Now, about 50 years after its last major revision, the gene concept is once again being called into question.