In silico prediction of the deleterious effect of a mutation: proceed with caution in clinical genetics.

Today, determination of the sequence of a human gene is one of the most common approaches used for the diagnosis of hereditary diseases. When a nucleotide change leading to an amino acid substitution is identified in a putative disease gene, there always remains the difficulty of confirming that the resulting structural modification is the cause of the pathologic manifestations. How is it possible to distinguish between sequence variations that are deleterious for the stability or for the function of the protein, leading to a disorder, and neutral variations that do not modify the phenotype? An increasing number of programs available on the World Wide Web have been proposed to answer this question (1-5). They take into account, to various degrees, factors such as the general rules of protein chemistry (e.g., change in charge or in hydrophobicity, or helix-breaking residue), the three-dimensional structure of the protein, and homologies in amino acid sequences among various species or related proteins. By comparing a submitted sequence to these data, these programs predict what could be the functional significance of the amino acid change identified.

We tested four families of hereditary diseases, using two such programs. The programs tested were PolyPhen (1,2) and Sorting Intolerant From Tolerant (SIFT) (3), which have been developed to predict whether a nonsynonymous single-nucleotide polymorphism is likely to have a deleterious effect. The target genes were those causing hemoglobin disorders and glucose-6-phosphate dehydrogenase deficiency; the gene for receptor 1 for tumor necrosis factor-α (TNFRSF1A), responsible for tumor necrosis factor receptor-associated periodic syndrome (TRAPS; MIM 142680); and the MEFV gene, which is involved in familial Mediterranean fever (FMF; MIM 249100). In the first two cases, the abnormal phenotypes have always been demonstrated to result from disturbed stability, function, or expression of the protein. In the case of TNFRSF1A, functional tests strongly suggest that an alteration of this protein causes the disease. Conversely, for MEFV, the hypothesis that the disease is only in linkage disequilibrium with the mutation cannot be completely ruled out.

Materials and Methods


The SIFT program ( html) uses "sequence homology to predict whether an amino acid substitution will affect protein function and hence, potentially alter phenotype" (3). Results are reported as "deleterious or not" according to scores. The PolyPhen program (http://www.bork.embl-heidelberg.de/PolyPhen/) uses sequence homology and the "mapping of the substitution site to known protein 3-dimensional structures" (1, 2). In this case, results are given as "benign", "possibly damaging", "probably damaging", or "unknown".


The properties of hemoglobin variants involving the α- or β-chains were those described in the HbVar database (6), which at present includes some 900 variants whose structure and function have been determined. We checked 20 α- and 19 β-chain variants known to have no effect on the phenotype and most of the variants known to be responsible for severe disorders such as sickle cell disease, hemolytic anemia attributable to instability, erythrocytosis related to increased oxygen affinity, and methemoglobinemia.

A total of 112 glucose-6-phosphate dehydrogenase (G6PD) variants involving a single amino acid change have been reported (7), with their clinical impact classified according to the WHO from I, for the most severe, to IV, for those that are functionally neutral (8).

Sequence variants found in the MEFV and TNFRSF1A genes and responsible for FMF and TRAPS, respectively, are reported in the INFEVERS database (9). Thirty-eight substitutions affecting the TNFRSF1A gene have been described, but only 11 of those have been validated by functional tests (10-13). Forty-seven substitutions have been found in the MEFV gene, but no functional test is available.

Results and Discussion


Nearly 900 hemoglobin variants have been reported in the literature (6), but only ~25% of those have been fully documented either as being without clinical manifestation or as responsible for significant disorders. These variants were used as examples in this study.

Among the 20 α-chain variants described as clinically neutral, 16 were returned as benign by PolyPhen, 2 as possibly damaging, and 2 as probably damaging; 7 were predicted as "deleterious" by SIFT. Among the 19 clinically neutral β-chain variants, 17 were returned as benign by PolyPhen, 1 as possibly damaging, and 1 as probably damaging; 10 were predicted as deleterious by SIFT.

An overall correct prediction was obtained when we tested 80 unstable β-chain hemoglobin variants that lead to chronic hemolytic anemia: 70 were correctly recognized as probably, or at least possibly, damaging by PolyPhen, and all except 2 as deleterious by SIFT. These results were those expected from the localization of the abnormalities in regions strongly involved in the correct folding of the protein or close to the heme group. A good prediction was also obtained for variants that involve residues located in regions known to be crucial for oxygen binding and that are known causes of erythrocythemia.
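The counts in the two preceding paragraphs can be condensed into per-program rates. The following is a minimal sketch, not the authors' analysis: the terms "specificity" and "sensitivity" and the `Tally` class are introduced here for illustration, PolyPhen's "possibly damaging" is assumed to count as a damaging call, and the number of neutral variants SIFT accepted is assumed to be the total minus those it called deleterious.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tally:
    """Calls made by one program on variants whose clinical effect is known."""
    neutral_called_benign: int   # neutral variants reported as benign/tolerated
    neutral_total: int
    harmful_flagged: int         # harmful variants reported as damaging/deleterious
    harmful_total: int

    @property
    def specificity(self) -> float:
        return self.neutral_called_benign / self.neutral_total

    @property
    def sensitivity(self) -> float:
        return self.harmful_flagged / self.harmful_total


NEUTRAL = 20 + 19   # alpha-chain plus beta-chain clinically neutral variants
UNSTABLE = 80       # unstable beta-chain variants causing hemolytic anemia

tallies = {
    # 16 + 17 neutral variants returned as benign; 70 unstable variants flagged.
    "PolyPhen": Tally(16 + 17, NEUTRAL, 70, UNSTABLE),
    # 7 + 10 neutral variants called deleterious; all but 2 unstable ones flagged.
    "SIFT": Tally(NEUTRAL - (7 + 10), NEUTRAL, UNSTABLE - 2, UNSTABLE),
}

for name, t in tallies.items():
    print(f"{name}: specificity {t.specificity:.1%}, sensitivity {t.sensitivity:.1%}")
```

Under these assumptions SIFT flags nearly all unstable variants but also rejects 17 of the 39 neutral ones, whereas PolyPhen is more balanced on this data set.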

Nevertheless, a few major problems need to be pointed out concerning the β-chain variants. For example, the β-globin E6V mutation, which leads to hemoglobin S (HbS), the most common abnormal Hb, selected in many populations because of the protection it confers against malaria in the heterozygous state, is returned as benign by PolyPhen. Of course, in vitro, in dilute solutions, the functional properties of this hemoglobin are normal. In contrast, within the erythrocytes of homozygous or compound heterozygous patients, supramolecular interactions occur, which lead to the polymerization of HbS and cell deformation, producing sickle cell anemia (14). The prediction programs do not consider the possibility of these kinds of supramolecular interactions because they are ignored in the databanks, which consider only the structure of HbA or animal hemoglobins. This problem is also observed for variants such as Hb Punjab (E121Q), Hb O-Arab (E121K), and HbC (E6K), which lead to severe sickle cell syndromes when they interact with HbS. It is certainly dangerous to consider such variants benign: they are indeed not deleterious in the simple heterozygous phenotype, but they need to be taken into account in genetic counseling. In a similar way, variants such as HbE (E26K) or Hb Knossos (A27S), which lead to a hemoglobin molecule with normal function but unmask an alternative splice site and lead to a thalassemic defect, are returned as benign; from a clinical point of view they should be considered deleterious.

An additional aspect, not considered by these programs, is possible posttranslational modification. The amino acid exchange observed in the protein may differ from that deduced from the nucleotide sequence. For example, Hb J-Sardegna (HBA2; g.268C>G) is expected to lead to the replacement of a His by an Asn when characterized by DNA sequencing (15). This variant is proposed by PolyPhen to be possibly damaging. In fact, because of the presence of another His in the neighborhood, Asn-50 is rapidly deamidated into an Asp, and by protein chemistry methods only Asp is found. This substitution is considered by PolyPhen to be probably damaging. Finally, experience shows that this variant is hematologically normal, in agreement with SIFT, which found this change to be tolerated. The best example involves the unstable Hb Bristol, in which the mutation encodes a Met at position β67 (HBB; g.332G>A) but leads to the incorporation of an Asp in the protein (16). Another typical example concerns variants of the NH2 terminus of the protein, which may lead to retention of the initiator methionine and to possible N-terminal acetylation. As an example, Hb Long Island-Marseille is expected, according to the nucleotide change, to have His-2 replaced by a Pro; this variant is suspected by PolyPhen to be probably damaging. In fact, this mutation leads to a much larger change of the NH2 terminus (Ac-M-V-P-...) and does not modify the normal hematologic characteristics in the heterozygote.

In the case of hemoglobin variants, both programs led to an overall 75% accurate prediction of phenotypes. The quality of these results is certainly an effect of the large amount of data available on the three-dimensional structure of this protein and of possible alignment with many closely related species. To improve their prediction accuracy, these programs should certainly incorporate rules of posttranslational modification and protein N-terminal processing (17).


G6PD deficiency is one of the most frequent genetic abnormalities, affecting some 400 million people. The enzyme is encoded on the X chromosome; therefore, some mutations may be considered clinically silent in heterozygous women whereas they may cause severe hemolytic crises in males. Human G6PD is in equilibrium between homodimers and tetramers. Each monomer binds a molecule of NADP, which is important to stabilize the dimeric structure and is distinct from the one involved in the catalytic site (18). Among the 29 variants classified in group III or IV by the WHO, 20 were returned as benign by PolyPhen, 2 as possibly damaging, and 7 as probably damaging. With SIFT, 15 mutants were indicated as deleterious and 14 as benign. Among the 49 variants of group I, corresponding to the severe cases, PolyPhen returned 31 as probably damaging and 12 as benign. SIFT considered 13 as being benign. The answers of the two programs were not always identical (Fig. 1). Among the variants falsely considered benign, several were located in the interaction area between subunits and in the region that binds the structural NADP, which are not clearly documented in the three-dimensional structure database.


The "Mediterranean" mutation (S188F), which is responsible for one of the most common deficient phenotypes because it codes for an unstable protein with decreased activity (19) and is thus classified in group I-II, was erroneously returned as benign by both programs.

As shown in Fig. 1, the correct phenotype was predicted in ~70% of the cases.
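As a rough cross-check on the G6PD counts quoted above, the fraction of correct calls can be pooled over the group III/IV and group I variants. This is only a sketch under stated assumptions: the function `pooled_accuracy` is hypothetical, a group I variant is counted as correctly predicted whenever it was not returned as benign, and group III/IV variants are treated as expected-benign. Fig. 1 may have been computed differently, so the figures are indicative only.

```python
# Counts quoted in the text for G6PD variants of known WHO class.
MILD_TOTAL = 29     # WHO groups III and IV
SEVERE_TOTAL = 49   # WHO group I

# Number of variants each program returned as benign in each set.
called_benign = {
    "PolyPhen": {"mild": 20, "severe": 12},
    "SIFT": {"mild": 14, "severe": 13},
}


def pooled_accuracy(benign: dict) -> float:
    """Fraction of the 78 variants whose call matches the WHO class.

    Assumption: a severe variant is correctly predicted whenever it was
    not returned as benign.
    """
    correct = benign["mild"] + (SEVERE_TOTAL - benign["severe"])
    return correct / (MILD_TOTAL + SEVERE_TOTAL)


for program, benign in called_benign.items():
    missed = benign["severe"] / SEVERE_TOTAL
    print(f"{program}: pooled accuracy {pooled_accuracy(benign):.1%}, "
          f"severe variants called benign {missed:.1%}")
```

With these assumptions roughly one quarter of the severe variants are returned as benign by either program, which is the clinically worrying error.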


The described TNFRSF1A mutations are clustered in the region encoding the cysteine-rich extracellular domains of the protein, spanning exons 2-4 of the genomic sequence. Most of these structural modifications would disrupt conserved extracellular disulfide bonds, whereas missense substitutions that do not involve cysteines most likely prevent the formation of highly conserved intra-chain hydrogen bonds (10). Impaired cleavage of the TNFRSF1A ectodomain on cellular activation, with reduction in the plasma concentration of soluble receptor, has been proposed as the mechanism underlying the hyperinflammatory response in TRAPS (11). Among 38 amino acid substitutions coded by the TNFRSF1A gene, only 11 have been validated by functional tests (10-13). Thirty-two (84.2%) were predicted as deleterious/damaging by both programs (Table 1). The Y20D, H22Y, and T37I variants are returned as non deleterious by the SIFT program, and Y20H is returned as benign by the PolyPhen program. R92Q and I170D are predicted as non deleterious/benign by the two programs. Thus, both programs seem to correctly predict the deleterious effect of TNFRSF1A gene mutations; most of these mutations are indeed situated in regions that are essential to maintain the correct three-dimensional structure of the protein.
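The way paired results fall into the four columns of Table 1 can be sketched as follows. The function name `table1_column` is hypothetical, and rejecting PolyPhen's "unknown" result is an assumption, since the text does not say how such cases were tabulated. The SIFT call for Y20H is not stated in the prose and is inferred here from the Table 1 counts.

```python
from collections import Counter

# PolyPhen categories that Table 1, footnote (a), groups as "damaging".
POLYPHEN_DAMAGING = {"possibly damaging", "probably damaging"}


def table1_column(sift_deleterious: bool, polyphen_call: str) -> str:
    """Return the Table 1 column for one amino acid substitution."""
    if polyphen_call == "unknown":
        raise ValueError("PolyPhen returned no prediction")
    if polyphen_call not in POLYPHEN_DAMAGING and polyphen_call != "benign":
        raise ValueError(f"unexpected PolyPhen call: {polyphen_call!r}")
    sift = "deleterious" if sift_deleterious else "non deleterious"
    polyphen = "damaging" if polyphen_call in POLYPHEN_DAMAGING else "benign"
    return f"SIFT: {sift} / PolyPhen: {polyphen}"


# (SIFT deleterious?, PolyPhen call) for three TNFRSF1A variants named in the text.
calls = {
    "Y20H": (True, "benign"),     # SIFT call inferred from Table 1
    "R92Q": (False, "benign"),
    "I170D": (False, "benign"),
}

columns = Counter(table1_column(*call) for call in calls.values())
for column, n in columns.items():
    print(f"{column}: {n}")
```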


The predictive value is much more difficult to evaluate when sequence variations are found in a poorly studied gene that is only expected to be a possible "candidate" for a disease. This is the case for the MEFV gene in FMF, which encodes a protein named "pyrin" (20, 21). Since the discovery of MEFV, no functional test has become available to confirm the deleterious effect of an identified mutation. The implications of the most frequent MEFV mutations (M694V, M694I, M680I, and V726A) have been suggested by epidemiologic and segregation studies (20, 21), but opinions are less clear for other variants. In the INFEVERS database, among 47 reported missense substitutions of the MEFV gene, only 6 (12.8%) amino acid substitutions (R42W, S108R, E148V, E251K, R354W, and Y688C) are predicted as deleterious/damaging by the two programs (Table 1). Nevertheless, these "private" sequence variants are rarely found in FMF patients. The prediction is discordant (benign/deleterious or damaging/non deleterious) for 15 amino acid substitutions. Surprisingly, 4 very common MEFV mutations (M694V, M694I, M680I, and V726A) and 22 other MEFV mutations were predicted as benign/non deleterious by both programs.
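The percentages quoted for the two genes follow directly from the Table 1 counts. A minimal sketch of that arithmetic, with the dictionary layout and the `summarize` helper introduced here purely for illustration:

```python
# Table 1 counts keyed by (SIFT deleterious?, PolyPhen damaging?).
TABLE_1 = {
    "TNFRSF1A": {(True, True): 32, (False, True): 3, (True, False): 1, (False, False): 2},
    "MEFV": {(True, True): 6, (False, True): 8, (True, False): 7, (False, False): 26},
}


def summarize(counts: dict) -> dict:
    """Split one gene's substitutions into agreement classes, as fractions."""
    total = sum(counts.values())
    return {
        "both_damaging": counts[(True, True)] / total,
        "discordant": (counts[(False, True)] + counts[(True, False)]) / total,
        "both_benign": counts[(False, False)] / total,
    }


for gene, counts in TABLE_1.items():
    parts = ", ".join(f"{k} {v:.1%}" for k, v in summarize(counts).items())
    print(f"{gene} (n = {sum(counts.values())}): {parts}")
```

The contrast is stark: the two programs agree on a damaging call for most TNFRSF1A substitutions, but for MEFV more than half of the substitutions are returned as benign by both.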

The discrepancy between the clinical presentation and the predictions made by the programs suggests either that not enough experimental data are available on this protein or that it is not the true candidate for the disease. The function of pyrin is still unclear. It has been shown that pyrin could modulate apoptosis by interacting with apoptotic speck protein. Exon 1 of MEFV, which encodes the pyrin domain, was found to be necessary and sufficient for the pyrin/apoptotic speck protein interaction (22), but surprisingly, the most common MEFV mutations affect the C-terminal B30.2 domain of pyrin. Moreover, because of an ancient frame-shift mutation, the B30.2 domain is not present in rodents (23), and amino acids that cause human disease are often present as the wild type in primates (24). Finally, more thorough protein studies are required to demonstrate whether the identified MEFV mutations really affect pyrin function or are only sequence variations in linkage disequilibrium with some disorder carried by another gene.


These software programs are exciting new tools. They may be useful for scientists investigating a protein by suggesting regions that may be more interesting than others to be studied thoroughly for their function.

The few examples that we show nevertheless indicate that in no case should these in silico investigations replace a real functional test of the protein. There are two main reasons that confidence in these programs should be limited. The first is that they can only minimally consider supramolecular interactions with homologous molecules, as in HbS; with other macromolecules, such as chaperones; or with regions involved in control or signaling. The other reason is possible linkage disequilibrium between a given mutation and a disorder carried by another gene.

Such programs should therefore not be used, in the absence of other tests or arguments, to reach the conclusion that a sequence variation found in a patient is or is not responsible for the patient's disease. Some limits of these programs are clearly stated by their authors, and even if in some cases the level of confidence is high, it remains too low for clinical purposes. These considerations should certainly prevent the geneticist from using in silico predictions for clinical prognosis. These algorithms were nevertheless recently used to predict phenotypes with reduced DNA repair capacity and thus increased cancer risk (25), and to test pigmentation phenotypes in relation to the melanocortin-1 receptor gene and risk for melanoma (26). The only question is: how confident can a patient be in these predictions?


(1.) Sunyaev S, Ramensky V, Koch I, Lathe W, Kondrashov AS, Bork P. Prediction of deleterious human alleles. Hum Mol Genet 2001; 10:591-7.

(2.) Ramensky V, Bork P, Sunyaev S. Human non-synonymous SNPs: server and survey. Nucleic Acids Res 2002;30:3894-900.

(3.) Ng PC, Henikoff S. SIFT: predicting amino acid changes that affect protein function. Nucleic Acids Res 2003;31:3812-4.

(4.) Riva A, Kohane IS. SNPper: retrieval and analysis of human SNPs. Bioinformatics 2002;18:1681-5.

(5.) Stitziel NO, Binkowski TA, Tseng YY, Kasif S, Liang J. topoSNP: a topographic database of non-synonymous single nucleotide polymorphisms with and without known disease association. Nucleic Acids Res 2004;32(Database issue):D520-2.

(6.) Hardison RC, Chui DH, Giardine B, Riemer C, Patrinos GP, Anagnou N, et al. HbVar: a relational database of human hemoglobin variants and thalassemia mutations at the globin gene server. Hum Mutat 2002;19:225-33.

(7.) Beutler E, Vulliamy TJ. Hematologically important mutations: glucose-6-phosphate dehydrogenase. Blood Cells Mol Dis 2002;28:93-103.

(8.) Beutler E. G6PD deficiency. Blood 1994;84:3613-36.

(9.) Sarrauste de Menthiere C, Terriere S, Pugnere D, Ruiz M, Demaille J, Touitou I. INFEVERS: the Registry for FMF and hereditary inflammatory disorders mutations. Nucleic Acids Res 2003;31: 282-5.

(10.) Aksentijevich I, Galon J, Soares M, Mansfield E, Hull K, Oh HH, et al. The tumor-necrosis-factor receptor-associated periodic syndrome: new mutations in TNFRSF1A, ancestral origins, genotype-phenotype studies, and evidence for further genetic heterogeneity of periodic fevers. Am J Hum Genet 2001;69:301-14.

(11.) McDermott MF, Aksentijevich I, Galon J, McDermott EM, Ogunkolade BW, Centola M, et al. Germline mutations in the extracellular domains of the 55 kDa TNF receptor, TNFR1, define a family of dominantly inherited autoinflammatory syndromes. Cell 1999;97: 133-44.

(12.) Dode C, Papo T, Fieschi C, Pecheux C, Dion E, Picard F, et al. A novel missense mutation (C30S) in the gene encoding tumor necrosis factor receptor 1 linked to autosomal-dominant recurrent fever with localized myositis in a French family. Arthritis Rheum 2000;43:1535-42.

(13.) Aganna E, Hammond L, Hawkins PN, Aldea A, McKee SA, van Amstel HK, et al. Heterogeneity among patients with tumor necrosis factor receptor-associated periodic syndrome phenotypes. Arthritis Rheum 2003;48:2632-44.

(14.) Ferrone F, Nagel RL. Polymer structure and polymerization of deoxyhemoglobin S. In: Steinberg MH, Forget BG, Higgs DR, Nagel RL, eds. Disorders of hemoglobin. New York: Cambridge University Press, 2000:577-610.

(15.) Paleari R, Paglietti E, Mosca A, Mortarino M, Maccioni L, Satta S, et al. Posttranslational deamidation of proteins: the case of hemoglobin J Sardegna [α50(CD8)His→Asn→Asp]. Clin Chem 1999;45:21-8.

(16.) Rees DC, Rochette J, Schofield C, Green B, Morris M, Parker NE, et al. A novel silent posttranslational mechanism converts methionine to aspartate in hemoglobin Bristol (β67[E11]Val→Met→Asp). Blood 1996;88:341-8.

(17.) Boissel JP, Kasper TJ, Bunn HF. Cotranslational amino-terminal processing of cytosolic proteins. Cell-free expression of site-directed mutants of human hemoglobin. J Biol Chem 1988;263: 8443-9.

(18.) Au SW, Gover S, Lam VM, Adams MJ. Human glucose-6-phosphate dehydrogenase: the crystal structure reveals a structural NADP(+) molecule and provides insights into enzyme deficiency. Structure Fold Des 2000;8:293-303.

(19.) Piomelli S, Corash LM, Davenport DD, Miraglia J, Amorosi EL. In vivo lability of glucose-6-phosphate dehydrogenase in GdA- and GdMediterranean deficiency. J Clin Invest 1968;47:940-8.

(20.) The International FMF Consortium. Ancient missense mutations in a new member of the RoRet gene family are likely to cause familial Mediterranean fever. Cell 1997;90:797-807.

(21.) The French FMF Consortium. A candidate gene for familial Mediterranean fever. Nat Genet 1997;17:25-31.

(22.) Richards N, Schaner P, Diaz A, Stuckey J, Shelden E, Wadhwa A, et al. Interaction between pyrin and the apoptotic speck protein (ASC) modulates ASC-induced apoptosis. J Biol Chem 2001;276: 39320-9.

(23.) Chae JJ, Centola M, Aksentijevich I, Dutra A, Tran M, Wood G, et al. Isolation, genomic organization, and expression analysis of the mouse and rat homologs of MEFV, the gene for familial Mediterranean fever. Mamm Genome 2000;11:428-35.

(24.) Schaner P, Richards N, Wadhwa A, Aksentijevich I, Kastner D, Tucker P, et al. Episodic evolution of pyrin in primates: human mutations recapitulate ancestral amino acid states. Nat Genet 2001;27:318-21.

(25.) Xi T, Jones IM, Mohrenweiser HW. Many amino acid substitution variants identified in DNA repair genes during human population screenings are predicted to impact protein function. Genomics 2004;83:970-9.

(26.) Kanetsky PA, Ge F, Najarian D, Swoyer J, Panossian S, Schuchter L, et al. Assessment of polymorphic variants in the melanocortin-1 receptor gene with cutaneous pigmentation using an evolutionary approach. Cancer Epidemiol Biomarkers Prev 2004;13:808-19.

Dimitri Tchernitchko

Michel Goossens

Henri Wajcman *

Service de Biochimie et de Genetique Moleculaire and INSERM U468

Hopital Henri-Mondor

94010 Creteil, France

* Author for correspondence. Fax 33-1-4981-2842; e-mail

DOI: 10.1373/clinchem.2004.036053

Table 1. In silico prediction by the SIFT and PolyPhen programs of the deleterious effects of amino acid substitutions coded by the TNFRSF1A and MEFV genes.

                                                  Gene (no. of tested amino acid substitutions)
In silico prediction                              TNFRSF1A (n = 38)    MEFV (n = 47)

SIFT: deleterious / PolyPhen: damaging (a)        32                   6
SIFT: non deleterious / PolyPhen: damaging (a)    3                    8
SIFT: deleterious / PolyPhen: benign              1                    7
SIFT: non deleterious / PolyPhen: benign          2                    26

(a) Possibly or probably damaging.

COPYRIGHT 2004 American Association for Clinical Chemistry, Inc.
No portion of this article can be reproduced without the express written permission from the copyright holder.

Article Details
Title Annotation:Opinion
Author:Tchernitchko, Dimitri; Goossens, Michel; Wajcman, Henri
Publication:Clinical Chemistry
Date:Nov 1, 2004