Assessment of prediction confidence and domain extrapolation of two structure-activity relationship models for predicting estrogen receptor binding activity.

Quantitative structure-activity relationship (QSAR) methods have been widely applied in drug discovery, lead optimization, toxicity prediction, and regulatory decisions. Despite major advances in algorithms and software, QSAR models have inherent limitations associated with the size and chemical-structure diversity of the training set, experimental error, and many characteristics of structure representation and correlation algorithms. Whereas excellent fit to the training data may be readily attainable, models often fail to accurately predict chemicals that are outside their domain of applicability. A QSAR's utility and, in the case of regulatory decisions, justification for usage increasingly depend on the ability to quantify a model's potential for predicting unknown chemicals with some known degree of certainty. It is never possible to predict an unknown chemical with absolute certainty. Here we report on two QSAR models based on different data sets for classification of chemicals according to their ability to bind to
the estrogen receptor. The models were developed by using a novel QSAR method, Decision Forest, which combines the results of multiple heterogeneous but comparable Decision Tree models to produce a consensus prediction. We used an extensive cross-validation process to define an applicability domain for model predictions based on two quantitative measures: prediction confidence and domain extrapolation. Together, these measures quantify the accuracy of each prediction within and outside of the training domain. Despite being based on large and diverse training sets, both QSAR models had poor accuracy for chemicals within the domain of low confidence, whereas good accuracy was obtained for those within the domain of high confidence. For prediction in the high confidence domain, accuracy was inversely proportional to the degree of domain extrapolation.
The model with a larger training set of 1,092, compared with 232 for the other, was more accurate in predicting chemicals at larger domain extrapolation, and could be particularly useful for rapidly prioritizing potential endocrine disruptors from a large chemical universe.

Key words: applicability domain, Decision Forest, domain extrapolation, EDCs, endocrine-disrupting chemicals, estrogen receptor binding, QSAR, prediction confidence, regulatory application.

**********

Quantitative structure-activity relationships (QSARs) have been extensively applied in a broad range of scientific areas, including chemistry, biology, and toxicology (Hansch et al. 1995a, 1995b). QSAR is now an inexorably imbedded tool in drug development, from lead discovery to lead optimization (Hopfinger and Tokarski 1997; Kubinyi et al. 1998). There is increasing use of QSAR early in the drug discovery process as a screening and enrichment tool to eliminate from further development those chemicals lacking drug-like properties (Lipinski et al. 1997) or those chemicals predicted to elicit
a toxic response. The availability of powerful new algorithms and scientists trained in their usage suggests the eventual common use of QSAR beyond the pharmaceutical industry to human and environmental regulatory authorities (Benigni and Richard 1998; Bradbury 1994; Hansch et al. 1995a, 1995b; Russom et al. 1995; Schultz and Seward 2000; Tong et al. 2002, 2003a).

Any QSAR model produces some degree of error. This is partially due to the inherent limitation in predicting a biological activity based solely on the chemical structure. One can argue from the principles of chemistry that the molecular structure of a chemical is the key to understanding its physicochemical properties and ultimately its biological activity and influence on organisms (Johnson and Maggiora 1990). However, the biological activity of a chemical is an induced response that is influenced by numerous factors dictated by the level of biological complexity of the system under investigation. The relationship between structure and activity is thus more implicit and requires more thorough investigation and rigorous validation (Tong et al. 2004). Application of QSARs in regulation has proven to be cost effective for prioritizing untested chemicals for more extensive and costly experimental evaluation.
However, for QSARs to be accepted by the regulatory communities, their limitations for use need to be identified. This is important because a QSAR model's ability to predict unknown chemicals depends largely on the nature of the training set and the algorithm used to establish the structure-activity relationship (Eriksson et al. 2003). A model's predictive accuracy and confidence for different unknown chemicals vary according to how well the training set represents the unknown chemicals and how robust the model is in extrapolating beyond the chemistry space defined by the training set (i.e., training domain). Therefore, assessing a model's "prediction confidence," defined as the certainty for a prediction, and "domain extrapolation," defined as the prediction accuracy outside the training domain, is a vital step toward defining the applicability domain of a model for the regulatory acceptance of QSARs.

A large number of environmental chemicals known as endocrine-disrupting chemicals (EDCs) are suspected of disrupting endocrine functions by mimicking or antagonizing natural hormones in experimental animals, wildlife, and humans (Hileman 1997). EDCs may exert adverse effects through a variety of mechanisms, including estrogen receptor (ER)-mediated mechanisms of toxicity (Fang et al. 2003b). Accordingly, the U.S. Congress in 1996 mandated that the U.S. Environmental Protection Agency
(EPA) develop a strategy for screening and testing a large number of chemicals found in drinking water (Safe Drinking Water Act 1996) and food additives (Food Quality Protection Act 1996) for their endocrine disruption potential. Consequently, more than 58,000 environmental and industrial chemicals have been identified as candidates for possible experimental testing. QSARs could be used as an inexpensive prescreening tool to prioritize the chemicals for further testing (Tong et al. 2002).

In this article, we applied a novel consensus QSAR method, called Decision Forest (DF) (Tong et al. 2003b), to classify chemicals into active and inactive categories of ER binding as a priority-setting tool for EDCs. We assessed the applicability domain of the DF models by characterizing the prediction confidence and domain extrapolation for predicting unknown chemicals.
Material and Methods

Estrogen Receptor Data Sets and Structural Descriptors

Two data sets were used, and the ER binding activity for both data sets was obtained from the competitive ER binding assay (Blair et al. 2000; Branham et al. 2002). The first data set, designated ER232, contained 232 chemicals, 131 active and 101 inactive, that were tested in our lab (Fang et al. 2003a). This data set has been extensively used by us and others to develop SAR/QSAR models for predicting ER binding activity (Hong et al. 2002; Shi et al. 2001, 2002; Tong et al. 2002, 2003c). The second data set, designated ER1092, is an aggregation of data from the literature containing 1,092 chemicals, of which 350 are active and 736 are inactive. Inactive means that no activity was detectable in the assay. Both data sets span a wide range of structural diversity and activity.

Because a previous study indicated no significant difference in results between two-dimensional (2D) descriptors and 3D descriptors in DF (Tong et al. 2003b), only 2D descriptors were used in this study, and these were computed using Molconn-Z, version 4.07 (http://www.eslc.vabiotech.com/molconn/). After removing descriptors that were constant across all chemicals in a data set, more than 270 descriptors remained and were used in model development. The structural diversity of both data sets was compared in the chemistry space defined by the 2D descriptors on a plot of the first three principal components (Figure 1). Not surprisingly, ER1092 was found to span much greater structural diversity than ER232.

[FIGURE 1 OMITTED]

Decision Forest

DF is a consensus modeling technique (Tong et al. 2003b) that combines multiple Decision Tree models (hereafter called trees) in a manner that results in more accurate predictions.
Because combining several identical trees produces no gain, the rationale behind DF is to use individual trees that are different (i.e., heterogeneous) yet comparable in their prediction accuracy to represent the association of structure and biological activity. The heterogeneity requirement assures that each tree uniquely contributes to the combined prediction, whereas the quality comparability requirement assures that each tree contributes equally to the combined prediction. Because a certain degree of noise is always present in biological data, optimizing a tree inherently risks overfitting the noise. DF attempts to minimize overfitting by maximizing the difference among individual trees so that combining the trees cancels some of the random noise. The maximum difference was achieved by constructing each individual tree using a distinct set of descriptors.

Details of the DF algorithm have been reported by Tong et al. (2003b). Briefly, developing a DF model (called forest hereafter) comprises four steps: a) construct and prune a tree; b) develop the next tree based on only the descriptors that have not been used in the previous tree(s); c) repeat steps a and b until no more trees can be developed; d) classify (i.e., predict) a chemical based on the results of all trees. Each tree in a forest is developed using a variant of the Classification and Regression Tree (CART) method (Breiman et al.
1995) that has two steps: a) tree construction and b) tree pruning. During tree construction, the algorithm identifies the descriptors that best divide the chemicals in the parent node into two child nodes. The split maximizes the homogeneity of the activity population in each child node (e.g., one node predominantly contains active chemicals, whereas the other predominantly contains inactive chemicals). Then, the child nodes become parent nodes for further splits, and splitting continues until chemicals in each node are either in one classification category or cannot be split further to improve the quality of the tree. To avoid overfitting the training data, the tree is then cut down to a desired size using tree cost-complexity pruning (Clark and Pregibon 1997). At the end, each terminal node of a tree is generally populated by a different ratio of active to inactive chemicals. In each tree, the probability (0-1) for an "unknown" chemical to be active is taken to be the fraction of active chemicals in the terminal node to which the chemical belongs. The mean probability value for a chemical across all trees in the forest is calculated to assign the classification of the chemical. Chemicals that have a mean probability > 0.5 are designated active, whereas those that have a mean probability < 0.5 are designated inactive.
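The consensus step described above (per-tree leaf fractions averaged across the forest, then thresholded at 0.5) can be illustrated with a minimal Python sketch. The function names are hypothetical and not part of the DF software, and the text does not say how a mean probability of exactly 0.5 is handled, so the sketch labels that case "undetermined" as an assumption.

```python
from statistics import mean


def leaf_probability(n_active, n_total):
    """Fraction of active training chemicals in a terminal node."""
    return n_active / n_total


def forest_probability(tree_probabilities):
    """Mean of the per-tree probabilities that a chemical is active."""
    return mean(tree_probabilities)


def classify(tree_probabilities, threshold=0.5):
    """Assign a class from the mean probability across all trees."""
    p = forest_probability(tree_probabilities)
    if p > threshold:
        return "active"
    if p < threshold:
        return "inactive"
    # Assumption: the source does not define the outcome at exactly 0.5.
    return "undetermined"
```

For example, a chemical that lands in terminal nodes with active fractions 0.9, 0.8, and 0.4 in three trees has a mean probability of about 0.7 and would be designated active.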
Prediction Confidence

Past results have shown that DF predictions are of high confidence for active chemicals with a large probability value (approaching 1) and for inactive chemicals with a low probability value (approaching zero), whereas the low confidence predictions are mostly found for chemicals with probability approaching 0.5 (Tong et al. 2003b). Based on this observation, the following equation was used to calculate the confidence level of a prediction:

[1] confidence level for chemical $i$ = $|P_i - 0.5| / 0.5$,

where $P_i$ is the probability value for chemical $i$. In this equation, the confidence associated with active and inactive predictions is scaled in parallel to the range between zero and 1. If we assume that a high confidence prediction is defined as confidence level > 0.4, both probability ranges of 0.0-0.3 and 0.7-1.0 will be considered the high confidence (HC) region, and 0.3-0.7 is the low confidence (LC) region. In other words, a high prediction certainty is expected when a chemical with predicted probability in the range 0.0-0.3 is classified as inactive, or when a chemical with probability in the range 0.7-1.0 is predicted as active. In contrast, prediction confidence is lower for chemicals with probabilities in the range 0.3-0.7.

Domain Extrapolation

Suppose there is a forest that contains $n$ trees ($i = 1, \ldots, n$). For the $i$th tree, the classification of an unknown chemical is determined by only one terminal node that is descended from the root node through a set of "IF-THEN" rules based on $k$ descriptors $x_{ij}$ ($j = 1, \ldots, k$) (Figure 2).
Let $x_{ij}(\max)$ and $x_{ij}(\min)$ denote the maximum and minimum values for $x_{ij}$ across the entire data set, and let $y_{ij}$ denote the descriptor values of the unknown chemical corresponding to $x_{ij}$. If $y_{ij}$ is either $> x_{ij}(\max)$ or $< x_{ij}(\min)$, then it is outside the range of the training domain defined by $x_{ij}$ in the "IF-THEN" rule in the path from the root to the terminal node in the $i$th tree. Thus, the distance beyond the training domain for the unknown chemical in tree $i$ can be calculated by

$$d_{ij} = \begin{cases} |y_{ij} - x_{ij}(\max)| & \text{if } y_{ij} > x_{ij}(\max), \\ |y_{ij} - x_{ij}(\min)| & \text{if } y_{ij} < x_{ij}(\min), \\ 0 & \text{if } x_{ij}(\min) \le y_{ij} \le x_{ij}(\max) \text{ (within the training domain).} \end{cases}$$

For the forest, the total percentage of extrapolation outside the training domain is:

[2] [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]

[FIGURE 2 OMITTED]

The prediction accuracy within domain d is calculated by dividing the number of correct predictions by the total number of chemicals in this extrapolated domain.

Cross-Validation for Assessing Prediction Confidence and Domain Extrapolation

We used 10-fold cross-validation to assess a forest's prediction accuracy for unknown chemicals in different domains of prediction confidence and extrapolation.
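As a rough illustration of the 10-fold cross-validation used in this section, repeated over many random partitions, the following Python sketch may help. It is not the authors' implementation: `train` and `predict` are hypothetical callables standing in for forest construction and prediction, and the default of 2,000 runs simply mirrors the number reported in this section.

```python
import random


def random_folds(n_items, n_folds, rng):
    """Shuffle indices 0..n_items-1 and deal them into n_folds near-equal portions."""
    indices = list(range(n_items))
    rng.shuffle(indices)
    return [indices[k::n_folds] for k in range(n_folds)]


def repeated_cross_validation(chemicals, labels, train, predict,
                              n_folds=10, n_runs=2000, seed=0):
    """Return one overall accuracy per run of n_folds-fold cross-validation."""
    rng = random.Random(seed)
    run_accuracies = []
    for _ in range(n_runs):
        correct = 0
        for held_out in random_folds(len(chemicals), n_folds, rng):
            held_out_set = set(held_out)
            train_idx = [i for i in range(len(chemicals))
                         if i not in held_out_set]
            # The model is rebuilt from scratch on the remaining portions,
            # so the held-out chemicals never influence training.
            model = train([chemicals[i] for i in train_idx],
                          [labels[i] for i in train_idx])
            correct += sum(predict(model, chemicals[i]) == labels[i]
                           for i in held_out)
        run_accuracies.append(correct / len(chemicals))
    return run_accuracies
```

Averaging the returned list gives the kind of run-averaged accuracy the text refers to; a per-confidence or per-extrapolation breakdown would additionally need each prediction's probability and distance to be recorded inside the inner loop.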
In this procedure the data set is randomly divided into 10 equal portions, and each portion is excluded once and predicted by the forest trained on the remaining nine portions. Because the 10-fold cross-validation results vary for each run due to random partitioning of the data set, we repeated the process 2,000 times. The average result of the multiple cross-validation runs provides an unbiased assessment of a forest for predicting unknown chemicals with respect to prediction confidence and extrapolation sensitivity.

Results

Table 1 summarizes the fitting results of the forests for both the ER232 and ER1092 data sets. The forests had concordances around 95% with high specificity and sensitivity. Since a statistically sound fitted model provides limited indication of its capability for predicting chemicals that are not included in training, we applied 2,000 runs of 10-fold cross-validation to assess the prediction confidence and extrapolation sensitivity of the model for predicting unknown chemicals. Figure 3 plots forest prediction accuracy versus prediction confidence for ER232 (Figure 3A) and ER1092 (Figure 3B). For comparison, the results of the first tree in each forest are also plotted in Figure 3. It is readily apparent that the forests have substantially higher prediction accuracy than the tree across the entire range of confidence levels. Importantly, there is a strong trend of higher accuracy with increasing confidence level. We arbitrarily defined two confidence regions, HC and LC, corresponding to confidence levels > 0.4 and < 0.4, respectively.
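To make the HC/LC split concrete, here is a minimal Python sketch of the confidence level, reading Equation 1 as |P_i - 0.5| / 0.5, which agrees with the probability ranges given in the Methods (0.0-0.3 and 0.7-1.0 for confidence > 0.4). The function names are hypothetical. The text leaves a confidence of exactly 0.4 unassigned, so placing it in LC is an assumption of this sketch.

```python
def confidence_level(p_active):
    """Equation 1: distance of the mean probability from 0.5, scaled to 0-1."""
    return abs(p_active - 0.5) / 0.5


def confidence_region(p_active, cutoff=0.4):
    """Label a prediction HC (high confidence) or LC (low confidence)."""
    # Assumption: a confidence exactly equal to the cutoff is treated as LC.
    return "HC" if confidence_level(p_active) > cutoff else "LC"
```

A mean probability of 0.9 (predicted active) and one of 0.1 (predicted inactive) both give a confidence level of 0.8 and fall in the HC region, whereas a mean probability of 0.6 gives 0.2 and falls in the LC region.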
Table 2 compares the HC, LC, and overall prediction accuracy. The HC prediction accuracy is approximately 86%, about 22 percentage points higher than the prediction accuracy for the LC regions (~ 64%). Prediction accuracy for the HC regions is about 5-7% higher than the overall prediction accuracy (Table 2). The HC predictions account for approximately 80% of chemicals for ER232 and approximately 70% for ER1092.

[FIGURE 3 OMITTED]

On the basis of the same cross-validation results, we also assessed the prediction accuracy for the chemicals as a function of extrapolation outside the training domain. Figure 4 compares, for both ER232 and ER1092, the overall prediction accuracy for chemicals within the domain defined by the training set chemicals with the accuracy for chemicals falling several degrees of extrapolation outside the training domain, as defined by Equation 2. Generally, the farther away the chemicals were from the training domain, the greater the loss in prediction accuracy. For ER232 the prediction accuracy was reduced by some 10% for chemicals with a 10% extrapolation. In contrast, for ER1092 a major decrease in accuracy occurred only beyond a 30% extrapolation.

[FIGURE 4 OMITTED]

Table 3 further breaks down the overall prediction accuracy shown in Figure 4 into the accuracies for the HC and LC regions and also gives the distribution of predictions within the extrapolated domains. For the HC prediction region the trend of decreasing prediction accuracy with increasing extrapolation is consistent with Figure 4 for both ER232 and ER1092. In the HC region for both data sets, prediction accuracy is comparable when extrapolation does not exceed 10%. Prediction accuracy declines more notably for chemicals with > 10% extrapolation for ER232 (some with > 16%), and for chemicals with > 30% extrapolation for ER1092. In contrast, the LC region prediction accuracy is consistently lower, as expected, and exhibits no discernible trend with extent of extrapolation.
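The extrapolation figures above build on the per-descriptor distance d_ij from the Methods. A minimal Python sketch of that distance follows. The function names and the dictionary layout are hypothetical, and the sketch stops at d_ij: the forest-level percentage (Equation 2) is not reproduced in this text, so no aggregation is attempted.

```python
def distance_outside_domain(y, x_min, x_max):
    """d_ij: how far descriptor value y lies beyond the training range."""
    if y > x_max:
        return y - x_max
    if y < x_min:
        return x_min - y
    return 0.0  # within the training domain


def path_distances(unknown, training_ranges):
    """Distances for every descriptor used on one tree's root-to-leaf path.

    unknown: descriptor name -> value for the unknown chemical
    training_ranges: descriptor name -> (min, max) over the training set
    """
    return {name: distance_outside_domain(unknown[name], low, high)
            for name, (low, high) in training_ranges.items()}
```

For instance, a descriptor value of 12.0 against a training range of 0.0 to 10.0 gives a distance of 2.0, while any value inside the range gives 0.0.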
Discussion

We used the novel QSAR method DF to develop two classification models to predict ER binding. Such models could be important in prioritizing chemicals for testing based on likelihood of activity. Furthermore, we objectively and quantitatively assessed the applicability domains of the models by computing prediction confidence and domain extrapolation for predicting unknown chemicals with an extensive cross-validation. We found that accuracy in classifying unknown chemicals is dependent on both prediction confidence and domain extrapolation, with the dependence most pronounced for prediction confidence. The prediction accuracy is notably higher for the chemicals in the HC domain than for those in the LC domain. In the HC domain, the forest model based on the large data set ER1092 is much better able to extrapolate outside the structural domain defined by the training data than is the forest model based on the small data set ER232, specifically by some 30% compared with 10%. We propose that the ER1092 model is most suitable for aiding in prioritizing chemicals for testing as possible EDCs. The consistently lower prediction accuracy in the LC domain compared with that of the HC domain seems minimally affected by the extent of extrapolation.

For many repeated runs of cross-validation with random partitioning, chemicals in the HC domain average 70-80% of the total for both data sets. It should be noted that the distribution of the chemicals between the high and low confidence regions could vary when applying the model to a test set. Actual distribution depends largely on how well the training set represents the test set chemicals. In the cross-validation, however, the proportion of chemicals in the HC domain is sensitive to the structural diversity and quality of the training set. The ability to quantify confidence greatly enhances the utility of any classification or QSAR method.
The ability to accurately gauge confidence of predictions may also determine how best to apply the model. For example, considering the forest models presented here for use in screening and testing for potential EDCs, the HC and LC domain predictions could be used in separate ways. Chemicals in the HC domain are candidates for applying more rigorous quantitative models (Shi et al. 2001) to calculate binding affinities that are, in turn, used to rank-order chemicals for experimental evaluation. However, for chemicals in the LC domain, more thorough evaluation based on other types of models (Hong et al. 2002; Shi et al. 2002; Tong et al. 2003a) and/or assays should be required.

Validation is an important step in developing a useful QSAR model. There are two common validation methods: cross-validation and external validation (Tong et al. 2003c). For most classification methods, descriptor selection is normally executed prior to model training. Without preselection of the descriptor variables, the computational expense of cross-validation could be prohibitive. However, preselection of descriptors also constitutes a bias, suggesting that cross-validation may overestimate a model's true predictive accuracy for unknown chemicals. For such cases, preselecting an external test set not used in training becomes critical to estimating predictive accuracy. But setting aside an external test set detrimentally reduces the size of the available training set, resulting in the loss of data that would likely improve the model. Ideally, the external test set would be rationally selected to represent the chemicals to which the model would be applied.
In reality, however, because of the difficulty of such a task, we are unaware of any model development and test set selection in the literature that incorporates a systematic selection of a representative test set.

Bias in descriptor selection is not a factor in DF, where in each step of the cross-validation a new set of descriptors is selected that forms the best forest to represent each random split between training and testing data. The full integration of variable selection with forest construction means that the cross-validation accuracy is more likely to represent the true predictivity. Of course, a prediction test on external data is always desirable because it is a real-world application of the model, but very rarely are sufficient data available to warrant complete exclusion of some data from the training data. In a sense, cross-validation closely resembles the conduct of multiple tests on external data. Thus, we chose a rigorous and extensive cross-validation method to validate the models' predictivities in this study; it assesses many possible partitions of the training and test sets and can provide an unbiased and objective means for assessing a model's quality.

A large number of QSAR models for ER binding are reported in the literature (Bradbury et al. 1996; Sadler et al. 1998; Waller et al. 1996; Wiese et al. 1997; Zheng and Tropsha 2000), including our own (Tong et al. 1997a, 1997b, 1998; Xing et al. 1999). Although these models yield good statistical results, none explicitly addresses and assesses the confidence in predicting unknown chemicals. We demonstrated in this study that there could be more than a 22% difference in prediction accuracy for the chemicals with high confidence compared with those with low confidence.
Thus, for practical applications, having prediction confidence together with the actual predictions greatly extends the usefulness of QSAR and classification models. In regulatory application, the justification for using such models may very well depend on having measures of confidence in the predictions.

Four types of uncertainty are generally recognized as affecting the prediction confidence of a QSAR model (Tong et al. 2003c), and all generally are dependent on either the nature of the data set or the choice of the statistical algorithm. First, predictions from any model are intrinsically no better than the experimental data employed to develop the model. Any limitations of the assay used to generate the training data extend equally to the model's predictions. Second, commonly employed statistical methods vary in their abilities to appropriately capture the functional relationship of structural descriptors and activity. Third, for classification models specifically, class assignment is sensitive to a defined cutoff value to distinguish active from inactive. As the cutoff value is lowered, it is likely that the error will increase, even for a well-designed and well-executed assay. The increased experimental error in close proximity to the cutoff value will be transferred to the classification model, which in turn will increase the false prediction rate for chemicals with activity in this region. Fourth, a chemical can be represented by different types of descriptors. We often find that, even for a simple mechanism such as ER binding, some descriptors may well represent binding dependencies for one structural class, whereas other features will better represent binding dependencies for a different structural class. In such cases, regardless of how rigorously the validation procedure is employed, the model may give incorrect predictions for some chemicals, as the entire chemistry space of active chemicals is unknown.
These four types of uncertainty determine the applicability domain of a QSAR model, and adequate assessment of this domain that bounds and guides the model's usage, especially in regulatory application, is paramount. The assessment procedure proposed in this study should be equally applicable to other QSAR methods.
REFERENCES
Benigni R, Richard AM. 1998. Quantitative structure-based modeling applied to characterization and prediction of chemical toxicity. Methods 14:264-276.
Blair R, Fang H, Branham WS, Hass B, Dial SL, Moland CL, et al. 2000. Estrogen receptor relative binding affinities of 188 natural and xenochemicals: structural diversity of ligands. Toxicol Sci 54:138-153.
Bradbury S, Mekenyan O, Ankley G. 1996. Quantitative structure-activity relationships for polychlorinated hydroxybiphenyl estrogen receptor binding affinity--an assessment of conformer flexibility. Environ Toxicol Chem 15:1945-1954.
Bradbury SP. 1994. Predicting modes of toxic action from chemical structure: an overview. SAR QSAR Environ Res 2:89-104.
Branham WS, Dial SL, Moland CL, Hass BS, Blair RM, Fang H, et al. 2002. Phytoestrogens and mycoestrogens bind to the rat uterine estrogen receptor. J Nutr 132:658-664.
Breiman L, Friedman J, Olshen R, Stone C, Steinberg D, Colla P. 1995. CART: Classification and Regression Trees. Stanford, CA:Salford System.
Clark LA, Pregibon D. 1997. Tree-based models. In: Modern Applied Statistics with S-Plus (Venables WN, Ripley BD, eds). New York:Chambers and Hasties, 413-430.
Eriksson L, Jaworska J, Worth A, Cronin M, McDowell RM, Gramatica P. 2003. Methods for reliability and uncertainty assessment and for applicability evaluations of classification- and regression-based QSARs. Environ Health Perspect 111:1361-1375.
Fang H, Tong W, Branham WS, Moland CL, Dial SL, Hong H, et al. 2003a. Study of 202 natural, synthetic, and environmental chemicals for binding to the androgen receptor. Chem Res Toxicol 16:1338-1358.
Fang H, Tong W, Sheehan D. 2003b. QSARs in receptor-mediated effects: the nuclear receptor super-family. J Mol Struct (THEOCHEM) 622:113-125.
Food Quality Protection Act of 1996. 1996. Public Law 104-170.
Hansch C, Hoekman D, Leo A, Zhang L, Li P. 1995a. The expanding role of quantitative structure-activity relationships (QSAR) in toxicology. Toxicol Lett 79:45-53.
Hansch C, Telzer BR, Zhang L. 1995b. Comparative QSAR in toxicology: examples from teratology and cancer chemotherapy of aniline mustards. Crit Rev Toxicol 25:67-89.
Hileman B. 1997. Hormone disrupter research expands. Chem Eng News 75:24-25.
Hong H, Tong W, Fang H, Shi L, Xie Q, Wu J, et al. 2002. Prediction of estrogen receptor binding for 58,000 chemicals using an integrated system of a tree-based model with structural alerts. Environ Health Perspect 110:29-36.
Hopfinger AJ, Tokarski JS. 1997. Practical applications of computer-aided drug design. In: Practical Applications of Computer-Aided Design (Charifson PS, ed). New York:Marcel Dekker, 105-164.
Johnson M, Maggiora GM. 1990. Concepts and Applications of Molecular Similarity. New York:Wiley.
Kubinyi H, Folkers G, Martin YC. 1998. 3D QSAR in drug design--recent advances. Perspect Drug Disc Design 12:R5-R7.
Lipinski CA, Lombardo F, Dominy BW, Feeney PJ. 1997. Experimental and computational approaches to estimate solubility and permeability in drug discovery and development settings. Adv Drug Delivery Rev 23:3-25.
Russom CL, Bradbury SP, Carlson AR. 1995. Use of knowledge bases and QSARs to estimate the relative ecological risk of agrichemicals: a problem formulation exercise. SAR QSAR Environ Res 4:83-95.
Sadler BR, Cho SJ, Ishaq KS, Chae K, Korach KS. 1998. Three-dimensional quantitative structure-activity relationship study of nonsteroidal estrogen receptor ligands using the comparative molecular field analysis/cross-validated r²-guided region selection approach. J Med Chem 41:2261-2267.
Safe Drinking Water Act of 1996. 1996. Public Law 104-182.
Schultz TW, Seward JR. 2000. Health-effects related structure-toxicity relationships: a paradigm for the first decade of the new millennium. Sci Total Environ 249:73-84.
Shi L, Tong W, Fang H, Xie Q, Hong H, Perkins R, et al. 2002. An integrated "4-phase" approach for setting endocrine disruption screening priorities--Phase I and II predictions of estrogen receptor binding affinity. SAR QSAR Environ Res 13:69-88.
Shi LM, Fang H, Tong W, Wu J, Perkins R, Blair RM, et al. 2001. QSAR models using a large diverse set of estrogens. J Chem Inf Comput Sci 41:186-195.
Tong W, Fang H, Hong H, Xie Q, Perkins R, Anson JF, et al. 2003a. Regulatory application of SAR/QSAR for priority setting of endocrine disruptors--a perspective. Pure and Appl Chem 75:2375-2388.
Tong W, Fang H, Hong H, Xie Q, Perkins R, Sheehan D. 2004. Receptor-mediated toxicity: QSARs for oestrogen receptor binding and priority setting of potential oestrogenic endocrine disruptors. In: Predicting Chemical Toxicity and Fate (Cronin MTB, Livingstone D, eds). Boca Raton, FL:CRC Press, 285-314.
Tong W, Hong H, Fang H, Xie Q, Perkins R. 2003b. Decision forest: Combining the predictions of multiple independent decision tree models. J Chem Inf Comput Sci 43:525-531.
Tong W, Lowis DR, Perkins R, Chen Y, Welsh WJ, Goddette DW, et al. 1998. Evaluation of quantitative structure-activity relationship methods for large-scale prediction of chemicals binding to the estrogen receptor. J Chem Inf Comput Sci 38:669-677.
Tong W, Perkins R, Fang H, Hong H, Xie Q, Branham SW, et al. 2002. Development of quantitative structure-activity relationships (QSARs) and their use for priority setting in the testing strategy of endocrine disruptors. Regul Res Perspect 1:1-16.
Tong W, Perkins R, Strelitz R, Collantes ER, Keenan S, Welsh WJ, et al. 1997a. Quantitative structure-activity relationships (QSARs) for estrogen binding to the estrogen receptor: predictions across species. Environ Health Perspect 105:1116-1124.
Tong W, Perkins R, Xing L, Welsh WJ, Sheehan DM. 1997b. QSAR models for binding of estrogenic compounds to estrogen receptor alpha and beta subtypes. Endocrin 138:4022-4025.
Tong W, Welsh WJ, Shi L, Fang H, Perkins R. 2003c. Structure-activity relationship approaches and applications. Environ Toxicol Chem 22:1680-1695.
Waller CL, Oprea TI, Chae K, Park HK, Korach KS, Laws SC, et al. 1996. Ligand-based identification of environmental estrogens. Chem Res Toxicol 9:1240-1248.
Wiese TE, Polin LA, Palomino E, Brooks SC. 1997. Induction of the estrogen specific mitogenic response of MCF-7 cells by selected analogues of estradiol-17 beta: a 3D QSAR study. J Med Chem 40:3659-3669.
Xing L, Welsh WJ, Tong W, Perkins R, Sheehan DM. 1999. Comparison of estrogen receptor alpha and beta subtypes based on comparative molecular field analysis (CoMFA). SAR QSAR Environ Res 10:215-237.
Zheng W, Tropsha A. 2000. A novel variable selection QSAR approach based on the k-nearest neighbor principle. J Chem Inf Comput Sci 40:185-194.
Weida Tong, (1) Qian Xie, (2) Huixiao Hong, (2) Leming Shi, (1) Hong Fang, (2) and Roger Perkins (2)
(1) Center for Toxicoinformatics, and (2) Bioinformatics Laboratory, National Center for Toxicological Research, Food and Drug Administration, Jefferson, Arkansas, USA
Address correspondence to W. Tong, Center for Toxicoinformatics, Division of Biometry and Risk Assessment, National Center for Toxicological Research, 3900 NCTR Rd., HFT 020, Jefferson, AR 72079, USA. Telephone: (870) 543-7142. Fax: (870) 543-7662. E-mail: wtong@nctr.fda.gov
The authors declare they have no competing financial interests.
Received 26 March 2001; accepted 15 July 2004.
Table 1. Statistics of the forest models based on ER232 and ER1092.

                                     ER232         ER1092
Number of chemicals                  232           1092
Number (%) of misclassifications     5 (2.16%)     50 (4.58%)
Number of trees combined             6             4
Number of descriptors used           79            138
Accuracy                             96.6%         95.4%
Specificity                          96.0%         91.0%
Sensitivity                          96.9%         97.6%
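For readers less familiar with the statistics in Table 1, the sketch below spells out the usual definitions of accuracy, sensitivity, and specificity for a two-class (active/inactive) model. The confusion-matrix counts in the usage lines are hypothetical, since the table does not report them, and the helper names `classification_rates` and `misclassification_pct` are introduced here only for illustration.

```python
def classification_rates(tp, fn, tn, fp):
    """Standard two-class rates (in %) from confusion-matrix counts.

    tp/fn: actives predicted active/inactive.
    tn/fp: inactives predicted inactive/active.
    """
    total = tp + fn + tn + fp
    return {
        "accuracy": 100 * (tp + tn) / total,
        "sensitivity": 100 * tp / (tp + fn),   # share of actives recovered
        "specificity": 100 * tn / (tn + fp),   # share of inactives recovered
        "misclassified": fn + fp,
    }


def misclassification_pct(n_wrong, n_total):
    """Percentage of chemicals assigned to the wrong class."""
    return 100 * n_wrong / n_total


# Hypothetical counts, for illustration only (not taken from the paper).
print(classification_rates(tp=90, fn=10, tn=80, fp=20))

# ER1092 column of Table 1: 50 misclassified chemicals out of 1092.
print(round(misclassification_pct(50, 1092), 2))
print(round(100 - misclassification_pct(50, 1092), 1))
```

For the ER1092 column, 50 misclassifications out of 1092 chemicals correspond to 4.58% misclassified, that is, 95.4% accuracy.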
Table 2. The HC and LC predictions from 2,000 runs of 10-fold cross-validation for ER232 and ER1092.

                      ER232                                       ER1092
Confidence region     Accuracy (%)   Percentage of chemicals      Accuracy (%)   Percentage of chemicals
HC                    86.6           79.2                         86.3           69.9
LC                    63.8           20.8                         64.7           30.1
All                   81.9           100                          79.7           100

Abbreviations: HC, high confidence; LC, low confidence.
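The "All" row of Table 2 can be read as an average of the HC and LC accuracies weighted by the share of chemicals in each region. The short sketch below recomputes that weighted average from the rounded table entries; the function name `weighted_accuracy` and the dictionary layout are assumptions made for this illustration.

```python
def weighted_accuracy(regions):
    """Overall accuracy (%) from (accuracy %, share of chemicals %) pairs."""
    total_share = sum(share for _, share in regions)
    return sum(acc * share for acc, share in regions) / total_share


# Values copied from Table 2: (accuracy %, percentage of chemicals) for HC, LC.
table2 = {
    "ER232": {"regions": [(86.6, 79.2), (63.8, 20.8)], "all": 81.9},
    "ER1092": {"regions": [(86.3, 69.9), (64.7, 30.1)], "all": 79.7},
}

for name, row in table2.items():
    recomputed = weighted_accuracy(row["regions"])
    print(name, "recomputed:", round(recomputed, 2), "reported:", row["all"])
```

Because the inputs are rounded to one decimal place, the recomputed values agree with the reported "All" row only to within about 0.1 percentage point.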
Table 3. Prediction accuracy in different regions of confidence and extrapolation derived from 2,000 runs of 10-fold cross-validation for ER232 and ER1092.

                                        HC region                              LC region
Data set    Extrapolation [(d) %]       Accuracy (%)   No. of predictions      Accuracy (%)   No. of predictions
ER232       0                           87.8           349,595                 64.7           83,393
            0-10                        79.7           10,442                  55.8           6,216
            10-20                       61.1           1,325                   50.5           2,651
            20-30                       65.3           853                     63.9           1,086
            > 30                        39.7           5,614                   65.9           2,825
ER1092      0                           86.4           1,511,180               64.4           645,177
            0-10                        89.4           6,896                   61.5           5,135
            10-20                       88.5           3,914                   68.1           3,453
            20-30                       96.8           1,209                   75.3           959
            > 30                        48.9           3,560                   54.1           2,517
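The prediction counts in Table 3 can be cross-checked against the design of the study: each chemical is predicted once per cross-validation run, so the counts for a data set should sum to the number of chemicals times 2,000 runs. The sketch below performs that check and pools the per-bin accuracies, weighting by the number of predictions. The layout of `table3` and the helper names are assumptions for this illustration, and the pooled accuracies are only approximate because the bin accuracies are rounded.

```python
RUNS = 2000  # cross-validation runs reported in the table caption

# (accuracy %, number of predictions) per extrapolation bin, from Table 3.
table3 = {
    "ER232": {
        "n_chemicals": 232,
        "HC": [(87.8, 349595), (79.7, 10442), (61.1, 1325), (65.3, 853), (39.7, 5614)],
        "LC": [(64.7, 83393), (55.8, 6216), (50.5, 2651), (63.9, 1086), (65.9, 2825)],
    },
    "ER1092": {
        "n_chemicals": 1092,
        "HC": [(86.4, 1511180), (89.4, 6896), (88.5, 3914), (96.8, 1209), (48.9, 3560)],
        "LC": [(64.4, 645177), (61.5, 5135), (68.1, 3453), (75.3, 959), (54.1, 2517)],
    },
}


def n_predictions(bins):
    """Total number of predictions across extrapolation bins."""
    return sum(n for _, n in bins)


def pooled_accuracy(bins):
    """Accuracy (%) over all bins, weighted by the number of predictions."""
    return sum(acc * n for acc, n in bins) / n_predictions(bins)


def summarize(entry):
    hc, lc = n_predictions(entry["HC"]), n_predictions(entry["LC"])
    return {
        "total": hc + lc,
        "expected_total": entry["n_chemicals"] * RUNS,
        "hc_share": 100 * hc / (hc + lc),
        "hc_accuracy": pooled_accuracy(entry["HC"]),
        "lc_accuracy": pooled_accuracy(entry["LC"]),
    }


for name, entry in table3.items():
    print(name, summarize(entry))
```

For both data sets the counts sum exactly to the number of chemicals times 2,000, and the share of predictions falling in the HC region can be compared with the "Percentage of chemicals" column of Table 2.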