The new frontier in clinical laboratory science.
DNA structure and complementary base pairing
DNA is the chemical unit of genes, which are the single unit of heredity. With few exceptions, a single gene usually codes for a single protein. Genes never code for other chemical entities and are converted to proteins by transcription and translation (see illustration, p. 29). Transcription converts DNA to RNA, and translation converts RNA to protein. Activation of transcription is controlled by components of a gene regulatory unit. By reviewing the nucleotide structure of DNA, it's easier to understand the mechanism and regulation of transcription and translation.
DNA is composed of four structural units called nucleotides: adenosine (A), thymidine (T), guanosine (G), and cytidine (C). Each nucleotide (A, G, C, and T) contains 3 components [ILLUSTRATION FOR FIGURE 1 OMITTED]: a deoxyribose (a five-carbon sugar), a phosphate group, and a base, which can be adenine, thymine, guanine, or cytosine. The nucleotides are linked by what is called a 5' to 3' linkage, in which the fifth carbon of one deoxyribose bonds to the third carbon of the adjacent deoxyribose via the phosphate group. A phosphodiester bond is formed when the carbon atoms are linked by a phosphate group; by linking nucleotides together through this bond, single-stranded (ss) DNA is formed [ILLUSTRATION FOR FIGURE 1 OMITTED]. In the nucleus, however, DNA exists in a double-stranded (ds) configuration, which is produced when two ssDNA molecules undergo hydrogen bonding between A-T and G-C. Also known as complementary base pairing, the hydrogen bonding of both A to T and C to G is in a 1:1 ratio. The resulting three-dimensional structure of dsDNA is a double helix, which is the most stable energy configuration for a pair of ssDNA molecules linked by hydrogen bonds (between the base groups of nucleotides) and phosphodiester bonds (between deoxyribose groups).
When linked by complementary base pairing, the two strands that make up dsDNA are called sense and antisense strands, where the DNA antisense strand is the complementary base pair copy of the DNA sense strand [ILLUSTRATION FOR FIGURE 2 OMITTED]. The sense strand contains the nucleotide sequence for the amino acid sequence of a protein. The genetic code for each amino acid is a sequence of three nucleotides called a codon. For example, the codon GCA encodes the amino acid alanine. Because each codon is composed of 3 nucleotides, a protein constructed of 100 amino acids is encoded by 100 codons containing 300 nucleotides. A reading frame is the correct sequence of codons that encodes a specific bioactive protein. Codons are found within sections of a gene called exons, and introns are sections of a gene that do not contain codons. Exons are not contiguous; they are separated by introns [ILLUSTRATION FOR FIGURE 3 OMITTED].
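The pairing rules and the codon arithmetic above can be sketched in a few lines of Python. This is only an illustration: the nine-nucleotide sequence and the function names are made up for the example, and the antisense strand is written in the same left-to-right order as the sense strand, base under base.

```python
# Pairing rules from the text: A pairs with T, G pairs with C.
PAIRING = {"A": "T", "T": "A", "G": "C", "C": "G"}

def complement(strand):
    """Return the base-by-base complement of a DNA strand, position by position."""
    return "".join(PAIRING[base] for base in strand)

def split_codons(coding_sequence):
    """Split a coding sequence into consecutive three-nucleotide codons."""
    if len(coding_sequence) % 3 != 0:
        raise ValueError("sequence length must be a multiple of 3")
    return [coding_sequence[i:i + 3] for i in range(0, len(coding_sequence), 3)]

# Hypothetical sense strand: ATG, then GCA (alanine, as in the text), then TAA.
sense = "ATGGCATAA"
antisense = complement(sense)
codons = split_codons(sense)

# A protein of 100 amino acids needs 100 codons of 3 nucleotides each.
nucleotides_for_100_amino_acids = 100 * 3

print(antisense)
print(codons)
print(nucleotides_for_100_amino_acids)
```

Taking the complement of the antisense strand returns the sense strand, which is the 1:1 pairing relationship described above.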
Protein expression step 1: Transcription
Transcription is a multistep, enzymatically mediated process. The first step occurs within the nucleus and involves synthesis of the primary RNA transcript or heterogeneous nuclear RNA (hnRNA) [ILLUSTRATION FOR FIGURE 3 OMITTED]. The primary RNA transcript is synthesized by adding complementary base pair nucleotides in a 5' to 3' direction relative to the DNA antisense strand. This event is catalyzed by DNA-dependent RNA polymerase II. Three known RNA polymerases exist in eukaryotic cells, and all RNA polymerases require the presence of antisense DNA as a template. The resulting primary RNA strand is complementary to the DNA antisense strand and is identical to the nucleotide sequence expressed by the sense DNA strand, except that the nucleotide uridine (U) replaces T during complementary base pairing (A-U in RNA, A-T in DNA) [ILLUSTRATION FOR FIGURE 2 OMITTED]. Other differences between RNA and DNA include:
* RNA contains a ribose sugar hydroxylated at the second carbon, while DNA contains a deoxyribose sugar that carries only a hydrogen atom at the second carbon.
* RNA only exists in the single-stranded form.
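The template relationship described above (RNA is built against the antisense strand and therefore matches the sense strand, with U in place of T) can be checked with a short Python sketch. The sequences and the function name are hypothetical, and strand direction is ignored for simplicity.

```python
# Base pairing used when RNA is built on a DNA template: A-U, T-A, G-C, C-G.
DNA_TO_RNA = {"A": "U", "T": "A", "G": "C", "C": "G"}

def transcribe(antisense_template):
    """Build the RNA that pairs, base by base, with the DNA antisense strand."""
    return "".join(DNA_TO_RNA[base] for base in antisense_template)

# Hypothetical strands, written base under base.
sense = "ATGGCATAA"
antisense = "TACCGTATT"

rna = transcribe(antisense)
print(rna)
```

The resulting RNA is the sense sequence with every T replaced by U.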
Because the entire gene length of the DNA antisense strand is transcribed, the primary RNA transcript contains both exons and introns. The intron sequences are removed from the primary RNA transcript during RNA splicing, another step in the transcription process. Each intron displays boundary sequences at a 5' donor site and a 3' acceptor site. The nucleotide sequence flanked by the donor and acceptor sites loops out and is removed enzymatically; the remaining exons are then linked together [ILLUSTRATION FOR FIGURE 3 OMITTED].
Intron removal by RNA splicing permits the exons to become contiguous by phosphodiester 5' to 3' linkage. This results in the formation of messenger RNA (mRNA) that contains only exons. The completed (and normally intron-free) mRNA is then transported out of the nucleus and into the cytoplasm, where it will be translated enzymatically to form a single specific protein. RNA splicing is crucial, because mRNA that contains intron sequences will stop translation. Normally, any mRNA that contains introns or their remnants is enzymatically degraded within the nucleus. If translation of an intron-containing mRNA does occur, the protein will be either dysfunctional or non-functional, as it will express a weak or inappropriate biological function. A dysfunctional protein can result in disease (e.g., defective beta globin in beta thalassemia). An example of a non-functional protein is the adenomatous polyposis coli (APC) protein, caused by a mutation in the APC gene. The non-functional APC protein is found in colon cancer of the same name, as well as other types of colon cancer. This will be discussed in part 2 of this review series.
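As a rough illustration of splicing, the sketch below joins the exon segments of a primary transcript and drops whatever lies between them. The transcript, the exon coordinates, and the function name are all hypothetical; the sketch assumes the exon boundaries are already known and does not model how the cell recognizes the donor and acceptor sites.

```python
def splice(primary_transcript, exon_ranges):
    """Join the exon segments in order, discarding the sequence between them."""
    return "".join(primary_transcript[start:end] for start, end in exon_ranges)

# Hypothetical primary transcript: exon 1, one intron (lowercase), exon 2.
exon_1 = "AUGGCA"
intron = "guaagucuag"
exon_2 = "GAGUAA"
primary = exon_1 + intron + exon_2

# Exon positions as (start, end) slices; assumed to be known in advance.
exons = [(0, 6), (16, 22)]

mrna = splice(primary, exons)
print(mrna)
```

The lowercase intron disappears from the output, leaving the two exons contiguous, as in the mRNA described above.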
Transcriptional control. Many diseases, such as cancer and beta thalassemia, have been shown to be the result of defects in transcriptional control. Transcription is controlled by the gene regulatory unit, which is composed of 2 sites. The promoter is first and is defined by the nucleotide sequence TATA residing -10 to -30 base pairs (bp) upstream of (to the left of) the start site [ILLUSTRATION FOR FIGURE 4A OMITTED]. (Nucleotide positions are preceded by a minus sign when upstream from the start site.) The start site or start codon is the first codon of the first exon. Transcription begins when RNA polymerase II binds to the promoter site. Another set of proteins, called transcription factors, helps RNA polymerase bind to the promoter site.
The second regulatory site is the enhancer, which is a region -1 to -3 kilobases (kb; or 1,000 to 3,000 base pairs) upstream from the start site [ILLUSTRATION FOR FIGURE 4A OMITTED]. The enhancer region of the gene regulatory unit is targeted by DNA-binding proteins, also known as gene regulatory proteins. DNA-binding proteins recognize specific nucleotide sequences expressed by the enhancer site. These regulatory proteins bind to the enhancer site, causing the DNA strand to loop out and contact the promoter-RNA polymerase-transcription factor complex [ILLUSTRATION FOR FIGURE 4B OMITTED]. Because of the extreme distance of the enhancer from the promoter, this transcriptional control mechanism found in eukaryotic cells is called gene activation at a distance or action at a distance. Thus, when activated by DNA-binding proteins, the enhancer-promoter interaction is the control switch that initiates transcription by signaling the RNA polymerase to bind to the promoter. RNA polymerase activity begins at the start site of the gene, defined by the codon ATG, which is also the nucleotide sequence for the amino acid methionine. Therefore, all proteins begin with methionine, and all the codons are transcribed enzymatically by RNA polymerase until a stop codon (TAA, TGA, or TAG) is reached, where transcription is terminated.
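Following the text's description of a start codon (ATG) and the three stop codons, a reading frame can be located with a simple scan. The sketch below is a minimal illustration, not a gene-finding method: the input sequence and the function name are hypothetical, and the scan simply takes the first ATG it meets.

```python
STOP_CODONS = {"TAA", "TGA", "TAG"}

def find_reading_frame(sense_dna):
    """Return the codons from the first ATG through the first in-frame stop codon.

    Returns an empty list if there is no ATG or no in-frame stop codon.
    """
    start = sense_dna.find("ATG")
    if start == -1:
        return []
    codons = []
    for i in range(start, len(sense_dna) - 2, 3):
        codon = sense_dna[i:i + 3]
        codons.append(codon)
        if codon in STOP_CODONS:
            return codons
    return []  # ran off the end without meeting a stop codon

# Hypothetical sense-strand fragment with two bases before the start codon.
frame = find_reading_frame("CCATGGCAGAGTGACC")
print(frame)
```

Reading in steps of three from the ATG is what keeps the codons "in frame"; starting one base earlier or later would produce a different set of codons.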
Protein expression step 2: Translation
As previously mentioned, proteins must be present in the correct quantity and have the correct amino acid sequence for organ systems to function normally. By generating the correct sequence, translation is essentially the conversion of mRNA to protein. Translation also requires two other types of RNA: ribosomal RNA (rRNA) of the rough endoplasmic reticulum (present in the cytoplasm) and transfer RNA (tRNA). Ribosomal RNA is essentially the anchor for the interaction between mRNA and tRNA in the cytoplasm. tRNA is a cloverleaf-shaped structure that contains two important components: an anti-codon and the associated amino acid covalently linked to that same tRNA [ILLUSTRATION FOR FIGURE 5A OMITTED]. The anti-codon of tRNA is the complementary base sequence to the codon sequence of mRNA.
The assembly of amino acids connected to the tRNA results in the linkage of amino acids in the correct sequence prescribed by the codons of mRNA's exons. Catalyzed by the enzyme peptidyl transferase, the amino acids are linked by removing (1) a hydroxyl group from the carboxylate group (COOH) of the preceding amino acid and (2) a proton from the amine group (NH2) of the following amino acid [ILLUSTRATION FOR FIGURE 5A OMITTED]. By forming a peptide bond, this linkage of amino acids produces a polypeptide (or amino acid) chain [ILLUSTRATION FOR FIGURE 5B OMITTED]. The length of the amino acid chain, expressed as the number of amino acids, varies with the type of protein formed. Thus, protein synthesis can be summarized as beginning with methionine and sequentially linking together subsequent amino acids to construct a protein translated from mRNA.
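The codon-by-codon logic of translation can be sketched as a table lookup. The example below builds the standard genetic code from a compact 64-letter string (one-letter amino acid codes, with "*" for a stop codon) and reads a hypothetical mRNA three bases at a time. The function names are illustrative, the anti-codon is written base under base with its codon, and the sketch leaves out ribosomes, tRNA charging, and peptide bond chemistry.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code for codons enumerated in UCAG order (UUU, UUC, UUA, ...).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

RNA_PAIRING = {"A": "U", "U": "A", "G": "C", "C": "G"}

def anticodon(codon):
    """Return the tRNA bases that pair with an mRNA codon, position by position."""
    return "".join(RNA_PAIRING[base] for base in codon)

def translate(mrna):
    """Read an mRNA codon by codon and stop at the first stop codon."""
    peptide = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":
            break
        peptide.append(amino_acid)
    return "".join(peptide)

# Hypothetical mRNA: AUG (methionine), GCA (alanine), GAG (glutamic acid), UAA (stop).
peptide = translate("AUGGCAGAGUAA")
print(peptide)
print(anticodon("GCA"))
```

The output begins with M (methionine), consistent with the summary above that protein synthesis starts at the start codon.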
Form dictates function
Proteins have many functions in cell biology. Enzymes (e.g., lipase), antibodies, receptors, growth factors, structural proteins (e.g., collagen), and many hormones (e.g., insulin or growth hormone) are examples of the wide variety of proteins, each with a specialized function. This functional specificity is determined by a protein's unique amino acid sequence, known also as primary structure, which specifies the subsequent levels of secondary, tertiary, and quaternary structure [ILLUSTRATION FOR FIGURE 6 OMITTED]. These hierarchies of structure are actually levels of protein folding that determine the protein's 3-dimensional conformation and specific bioactivity.
When the polypeptide is completely assembled during translation, the resulting primary structure [ILLUSTRATION FOR FIGURE 6A OMITTED] then defines the secondary structure, which occurs when the amino acids form hydrogen bonds to create two basic forms: alpha helices and beta-pleated sheets [ILLUSTRATION FOR FIGURE 6B OMITTED]. Both spontaneously form globular structures that may combine to form domains; globular structures and domains define a protein's tertiary structure (the third level of protein folding) [ILLUSTRATION FOR FIGURE 6C OMITTED]. If a protein is composed of two or more domains, a subunit may be formed. When linked by disulfide bonds, subunits form quaternary structure [ILLUSTRATION FOR FIGURE 6D OMITTED]. The tertiary and quaternary structures of many hormones (e.g., thyroid stimulating hormone, human chorionic gonadotropin) are levels of protein folding that define function.
The specificity of protein function is established by either (1) the formation of a unique three-dimensional conformation imparted by the four levels of protein folding or (2) the protein's ability to change shape - that is, alter three-dimensional conformation in response to ambient conditions. This property is allostery, and a protein is allosteric if it can increase or decrease its ability to participate in a reaction. Allosteric changes can be induced by binding the protein to another molecule called a ligand, while the allosteric protein that changes shape is often a receptor. Typically, receptors are either bound to a cell membrane (e.g., insulin receptor), or cell-free and soluble (e.g., antibodies, hemoglobin). The allosteric properties of proteins were initially determined by studying the hemoglobin molecule. Changes in hemoglobin conformation determine its binding affinity for oxygen in response to differences in the partial pressure of oxygen in the blood (the basis for the oxyhemoglobin dissociation curve).
Clearly, the expression of proteins with the correct conformational and allosteric properties is essential for cells to perform their specific function(s). Additionally, only the proteins required for cell function are expressed. Gene mutations, which are changes in the nucleotide sequence of a gene, may (1) disrupt the reading frame of the gene's codons, or (2) produce an amino acid substitution within an intact reading frame. Therefore, genetic mutations result in either (1) the complete absence of an essential protein or (2) the expression of functionally altered proteins that do not subserve organ function. The identification and documentation of these disease-causing genetic mutations are the main focus of molecular pathology.
A molecular future for clinical lab scientists
The time between the elucidation of the structure of DNA and the present is highlighted by a constellation of discoveries in the genetic regulation of cell function. Since then, our understanding of how genes regulate cell function has dramatically increased in the wake of other discoveries that became the foundation for current concepts linking genes and health.
At the molecular level, the basis of health resides in the ability of cells to maintain their normal function programmed by at least four genetically controlled processes: inter- and intracellular signaling, cell cycle control, differentiation, and apoptosis. Apoptosis, which is regulated by the p53 gene and the protein of the same name, produces programmed cell death when DNA damage is not repaired. Entire volumes have been devoted to apoptosis and the other three processes. If the genes controlling any one or a combination of these four key processes contain mutations, translation will either not occur or will result in the expression of dysfunctional proteins.
Many types of mutations exist, but point mutations are the focus here. A point mutation is an insertion, deletion, or substitution of a single nucleotide or base. If the point mutation is an insertion or deletion, a shift occurs in the reading frame of the nucleotide sequence that defines the codons for a functional protein. The resulting frame shift will not permit translation since the nucleotide sequence does not conform to the codons for specific amino acids. If one base replaces another, a different amino acid or codon will occupy that position in the reading frame, producing an amino acid substitution. A classic example of this is sickle cell disease, where inheritance of the S gene results in valine (GUG) substituting for glutamic acid (GAG) at the 6th position of the amino acid sequence for beta globin. The point mutation in this example is the U that substitutes for A in the codon for glutamic acid. This mutation causes aberrations in the allosteric properties of globin. Consequently, deoxygenated red cells lose their normal biconcave shape and become sickle-shaped. Anemia and other clinical manifestations of sickle cell disease are produced when sickled red cells occlude small blood vessels.
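The difference between a substitution and a frame shift is easy to see when the codons are listed before and after the change. In the Python sketch below, the nine-base mRNA fragment is hypothetical (it merely starts with the GAG codon named in the text), and only the two codon assignments given in the text are used.

```python
def codons(sequence):
    """Split a sequence into complete codons, ignoring any leftover bases."""
    return [sequence[i:i + 3] for i in range(0, len(sequence) - 2, 3)]

# Codon assignments taken from the text.
AMINO_ACID = {"GAG": "glutamic acid", "GUG": "valine"}

# Hypothetical mRNA fragment whose first codon is GAG.
normal = "GAGAAGUCU"

# Substitution: U replaces the A in the middle of the first codon.
substituted = normal[:1] + "U" + normal[2:]
# Deletion: the same A is removed, shifting every later codon.
deleted = normal[:1] + normal[2:]
# Insertion: an extra C is added after the first base, also shifting the frame.
inserted = normal[:1] + "C" + normal[1:]

print(codons(normal))
print(codons(substituted), AMINO_ACID[codons(substituted)[0]])
print(codons(deleted))
print(codons(inserted))
```

After the substitution only the first codon changes, while the deletion and the insertion alter every codon downstream of the mutation.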
Sickle cell anemia is an example of how an event at the molecular level (point mutation) produces cellular changes that manifest as clinical disease. Other point mutations produce serious cellular disorders. For example, breast cancer has been linked with point mutations in two or more genes, one involved with the control of cell growth (BRCA-1 gene) and the other with apoptosis (p53 gene).[6-7] The pathogenesis of atherosclerosis and coronary artery disease is strongly associated with mutations in the gene for low density lipoprotein receptor.[8-10]
As previously stated, the mission of molecular pathology is to identify the genetic mutations that contribute to or cause disease. With rare exception, diseases not caused by pathogenic microorganisms are the result of a mutation of at least one gene involved in the regulation of cellular function. When comparing a mutated gene with its normal counterpart, the nucleotide sequence differences may be sufficient to cause synthesis of a protein with altered structure and function that will cause disease.
To identify genetic point mutations, a patient's DNA must be sequenced. In current clinical practice, the most common source of DNA and RNA is a whole blood sample, with additional sources from tissue biopsies and sections. The results of a patient's DNA nucleotide sequence are compared to nucleotide sequence results archived in a database of normal genes. Therefore, by employing techniques in molecular biology (e.g., DNA sequencing) to perform these comparisons, clinical laboratory scientists can identify some specific genetic diseases before clinical expression, based on a specific gene mutation or several associated mutations.
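In its simplest form, the comparison described above amounts to checking a patient's sequence against a reference, position by position. The sketch below is a toy version under strong assumptions: both sequences are hypothetical, they are already aligned and of equal length, and only substitutions are reported. Clinical sequence analysis involves alignment and quality steps that are not shown here.

```python
def find_substitutions(reference, patient):
    """List (position, reference_base, patient_base) for every mismatch.

    Positions are 1-based. The sequences must already be aligned and equal in length.
    """
    if len(reference) != len(patient):
        raise ValueError("sequences differ in length; align them first")
    return [
        (index + 1, ref_base, pat_base)
        for index, (ref_base, pat_base) in enumerate(zip(reference, patient))
        if ref_base != pat_base
    ]

# Hypothetical reference ("normal") and patient sequences.
reference = "ATGGCAGAGTGA"
patient = "ATGGCAGTGTGA"

differences = find_substitutions(reference, patient)
print(differences)
```

Each reported tuple identifies where the patient's sequence departs from the reference and which base was replaced.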
With the patient's consent, physicians can then treat some genetic diseases. Prophylaxis can include surgical intervention, drug therapy, or the use of specific antisense nucleotide sequences to block the translation of the sense mRNA transcript, an approach called antisense RNA therapy.
The completion of the human genome project will have an unprecedented impact on healthcare. Knowledge of the human genome's entire nucleotide sequence will deliver the holy grail of genetic databases - the human genetic codes for every protein will be unlocked. Using diagnostic techniques in molecular pathology, clinical laboratory scientists and other biotechnologists will be able to document the presence or absence of mutations that could preclude normal cell and organ function. Examples of these diagnostic techniques include:
* polymerase chain reaction (PCR), which is used to identify low quantities of DNA and increase the quantity of DNA
* immunochemistry, which uses antibodies that recognize specific proteins
* in situ hybridization, which uses nucleotide probes that hybridize to a target sequence of a cell's DNA or RNA.
Using the above key techniques, molecular pathologists study the relationship between disease and gene structure, expression, and regulation. The second part of this continuing education series on molecular pathology will review these diagnostic techniques and their support technologies. Additional information about molecular pathology and disease can be found at the following Web sites:
* American Cancer Society, Cancer Risk Report 1997, cancer screening at www.cancer.org/statistics/97crr/screentoc.html
* Association of Molecular Pathology at zapruder.path.med.umich.edu/users/amp/
* U.S. Department of Energy, Human Genome Program Publications at www.ornl.gov/hgmis/pulicat/publications.html
* Vanderbilt University Clinical Research Center, at www.mc.vanderbilt.edu/gcrc/gene/index.html
* BioChemNet: Biotechnology at schmidel.com/bionet.htm.
To earn CEUs, see test on page 38.
1. Define the terms codon, exon, intron, and reading frame.
2. Discuss the steps by which genetic information encoded in DNA directs the synthesis of a single, specific protein.
3. Describe the 4 structural levels of protein organization.
4. Give 1 example of a protein whose bioactivity depends upon that level of molecular organization.
5. Identify 3 clinical conditions that are linked to genetic mutation.
1. Judson HF. The Eighth Day of Creation: Makers of the Revolution in Biology. New York: Simon and Schuster; 1979.
2. Alberts B, et al. Part II, Molecular Genetics. Molecular Biology of the Cell. 3rd ed. New York: Garland Publishing; 1994; 195-474.
3. He TC, Sparks AB, Rago C, Hermeking H, Zawel L, da Costa LT, Morin PJ, Vogelstein B, Kinzler KW. Identification of c-MYC as a target of the APC pathway. Science. 1998;281(5382):1509-1512.
4. Pennisi E. How a growth control path takes a wrong turn to cancer. Science. 1998;281(5382):1438-1441.
5. Alberts B, et al. The Control of Gene Expression. Molecular Biology of the Cell. 3rd ed. New York: Garland Publishing; 1994;(9):401-474.
6. Cotran RS, et al. Chapter 7: Neoplasia. In: Schoen F, Ed. Pathologic Basis of Disease. 5th ed. Philadelphia: W.B. Saunders Company; 1994; 241-303.
7. Warmuth MA, Sutton LM, Winer EP. A review of hereditary breast cancer: from screening to risk factor modification. Am J Med. 1997;102(4):407-415.
8. Ludwig EH, Hopkins PN, Allen A, Wu LL, Williams RR, Anderson JL, Ward RH, Lalouel JM, Innerarity TL. Association of genetic variations in apolipoprotein B with hypercholesterolemia, coronary artery disease, and receptor binding of low density lipoproteins. J Lipid Res. 1997;38(7):1361-1373.
9. Morash BA, Tan MH, Nassar BA, Too CK, Guernsey DL. A novel mutation in exon 4 of the low density lipoprotein receptor gene resulting in heterozygous familial hypercholesterolemia associated with decreased ligand binding. Atherosclerosis. 1998;136(1):9-16.
10. Cotran RS, et al. Chapter 5: Genetic disorders. In: Schoen F, Ed. Pathologic Basis of Disease. 5th ed. Philadelphia: W.B. Saunders Company; 1994; 35-137.
11. Lynch HT, Fusaro RM, Lynch JF. Cancer genetics in the new era of molecular biology. Ann NY Acad Sci. 1997;833:1-28.
12. Mercola D, Cohen JS. Antisense approaches to cancer gene therapy. Cancer Gene Ther. 1995;2(1):47-59.
Anthony Capetandes is assistant professor, Department of Health Sciences, Long Island University/C.W. Post Campus, Greenvale, NY.
Title Annotation: Molecular Pathology, part 1
Publication: Medical Laboratory Observer
Date: Jan 1, 1999