Discriminating different classes of toxicants by transcript profiling.

Male rats were treated with various model compounds or the appropriate vehicle controls. Most substances were either well-known hepatotoxicants or showed hepatotoxicity during preclinical testing. The aim of the present study was to determine if biological samples from rats treated with various compounds can be classified based on gene expression profiles. In addition to gene expression analysis using microarrays, a complete serum chemistry profile and liver and kidney histopathology were performed. We analyzed hepatic gene expression profiles using a supervised learning method (support vector machines; SVMs) to generate classification rules and combined this with recursive feature elimination to improve classification performance and to identify a compact subset of probe sets with potential use as biomarkers. Two different SVM algorithms were tested, and the models obtained were validated with a compound-based external cross-validation approach. Our predictive models were able to discriminate between hepatotoxic and nonhepatotoxic compounds.
Furthermore, they predicted the correct class of hepatotoxicant in most cases. We provide an example showing that a predictive model built on transcript profiles from one rat strain can successfully classify profiles from another rat strain. In addition, we demonstrate that the predictive models identify nonresponders and are able to discriminate between gene changes related to pharmacology and toxicity. This work confirms the hypothesis that compound classification based on gene expression data is feasible.

Key words: liver, microarray, predictive toxicology, rat, support vector machines, toxicogenomics.

**********

Microarray technology is a powerful tool allowing simultaneous investigation of gene expression changes of thousands of genes in response to various stimuli. Large-scale and even whole transcriptome analyses have successfully been applied in various fields including variation in budding yeast (Brem et al. 2002), development of Drosophila (Arbeitman et al. 2002), variation in primates (Enard et al. 2002), and human cancer (Ramaswamy et al. 2003). Class identification and prediction of defined end points using gene expression arrays have shown promising results in oncology (Alizadeh et al. 2001; Ramaswamy et al. 2001; Van de Vijver et al. 2002).

The application of gene expression analysis in toxicology has led to the emergence of the discipline of toxicogenomics. We anticipate that toxicogenomics will greatly improve the sensitivity, accuracy, and speed of toxicologic investigations. Toxicogenomics assumes that toxicity is accompanied by changes in gene expression that are either causally linked or represent a response to toxicity. Indeed, researchers have been able to link toxicity with expression changes of single genes or whole groups of genes (Hamadeh et al. 2002c; Ruepp et al. 2002; Suter et al. 2003). A transcriptome-wide overview of altered expression patterns can assist the mechanistic understanding of underlying changes induced by chemicals (Hamadeh et al. 2002b). This requires a comprehensive knowledge of the biological system under investigation, and only known genes are considered for analysis. This functional approach is also promising for the generation and testing of toxicity hypotheses (Donald et al. 2002; Zhang et al. 2002) or the identification of perturbed pathways (Wang et al. 1999; Zimmermann et al. 2003). Furthermore, identification of toxic mechanisms is valuable for risk assessment because it allows extrapolation of the hazard in humans.

Predictive toxicology is based on the hypothesis that similar treatments leading to the same end point will share comparable changes in gene expression. Several investigators have used gene expression profiling for the classification of toxicants in rodents (Bulera et al. 2001; Hamadeh et al. 2002a; Thomas et al. 2001; Waring et al. 2001b). These studies varied in design and number of compounds investigated, but all indicated the potential of toxicogenomics in predictive risk assessment. A major challenge in predicting toxicologic end points based on transcriptional data lies in discriminating changes due to interanimal variation or experimental background noise from treatment-related changes. Compounds may directly affect expression of certain well-characterized, compound-specific genes. These compound-specific genes are not suited for discrimination between different classes of compounds. Drugs, in contrast to other toxic substances, have pharmacologic as well as toxicologic effects that might affect gene expression. These two effects can, but need not, be related. Despite these confounding factors, gene expression analysis after treatment with various compounds that result in the same toxicologic end point should enable identification of a toxic fingerprint.

Various methods are used to analyze large-scale gene expression data. Unsupervised methods widely reported in the literature include agglomerative clustering (Eisen et al. 1998), divisive clustering (Alon et al. 1999), K-means clustering (Everitt 1974), self-organizing maps (Kohonen 1995), and principal component analysis (Joliffe 1986). Support vector machines (SVMs), on the other hand, belong to the class of supervised learning algorithms. Originally introduced by Vapnik and co-workers (Boser et al. 1992; Vapnik 1998), they perform well in different areas of biological analysis (Scholkopf and Smola 2002). Given a set of training examples, SVMs are able to recognize informative patterns in input data and make generalizations on previously unseen samples. Like other supervised methods, SVMs require prior knowledge of the classification problem, which has to be provided in the form of labeled training data. Used in a growing number of applications, SVMs are particularly well suited for the analysis of microarray expression data because of their ability to handle situations where the number of features (genes) is very large compared with the number of training patterns (microarray replicates). Several studies have shown that SVMs tend to outperform other classification techniques in this area (Brown et al. 2000; Furey et al. 2000; Yeang et al. 2001). In addition, the method proved effective in discovering informative features such as genes that are especially relevant for the classification and therefore might be critically important for the biological processes under investigation.
A significant reduction of the gene number used for classification is also crucial if reliable classifiers are to be obtained from microarray data. A proposed method to discriminate the most relevant gene changes from background biological and experimental variation is gene shaving (Hastie et al. 2000). However, we chose another method, recursive feature elimination (RFE) (Guyon et al. 2002), to create sets of informative genes.

The liver is a primary site for drug metabolism and is frequently involved in adverse drug reactions. Thus, hepatotoxic compounds were chosen for our toxicogenomic studies. In this study 28 hepatotoxic compounds and 3 nonhepatotoxic compounds were investigated. Time-matched controls dosed with the corresponding vehicles were used to allow discrimination between temporal and compound-induced changes. This is essential for large-scale transcriptome analysis, as extensive circadian gene expression patterns have recently been reported in the liver and heart of the mouse (Kita et al. 2002; Panda et al. 2002; Storch et al. 2002).
Depending on the substance and category of toxicity, different time points were chosen for classification, as manifestation of toxicity was observed earlier for certain compounds than for others. Clinical chemistry, hematology, and histopathology were used to assess toxicity of each individual animal. Models for discrimination of toxic and nontoxic substances as well as models specifying the category of toxicity were built using data from a variety of toxicity studies. The hypothesis that unknown blinded compounds could accurately be classified based solely on gene expression profiles was subsequently tested. In the majority of cases, SVMs were able to predict toxicity as well as the mode of toxicity. The potential for obtaining the same level of predictivity with only a small number of carefully selected genes was investigated. This subset of genes includes potential biomarkers for hepatotoxicity.

Materials and Methods

Animal Treatment

Permission for animal studies was obtained from the local regulatory agencies, and all study protocols were in compliance with animal welfare guidelines. Male HanBrl:Wistar rats approximately 12 weeks of age (300 g ± 20%) were obtained from BRL (Fullinsdorf, Switzerland). The animals were housed individually in Macrolone (Tecniplast GmbH, Hohenpeissenberg, Germany) cages with wood shavings as bedding at 20°C and 50% relative humidity in a 12-hr light/dark rhythm with free access to water and Kliba 3433 rodent pellets (Provimi Kliba AG, Kaiseraugst, Switzerland). For the WY14643 study, male Sprague-Dawley Crl:CD(SD)IGS.BR rats approximately 6 weeks of age (200 g ± 20%) were obtained from Charles River Ltd. (Margate, U.K.). Animals were dosed with test compounds or the corresponding vehicles orally or by ip, iv, or sc injections and sacrificed at specified times by CO₂ inhalation (Table 1). Immediately preceding sacrifice, terminal blood samples for clinical chemistry investigations were collected from the retroorbital sinus. Liver samples from the left medial lobe were removed immediately and placed into RNALater (Ambion, Austin, TX, USA) for RNA extraction and gene expression analysis (Table 1).
The exposure period for each compound was based on reports in the literature and results from pilot studies using histopathology and clinical chemistry anchoring to assess toxicity. Thus, for unknown compounds best results are expected if several time points (e.g., 6 hr, 1 day, and 1 week) are tested.

Clinical Chemistry

The following determinations were made from the serum: blood urea nitrogen (BUN), alanine aminotransferase (ALT), aspartate aminotransferase (AST), γ-glutamyltransferase (GGT), lactate dehydrogenase (LDH), sorbitol dehydrogenase (SDH), alkaline phosphatase (ALP), 5'-nucleotidase (5'-NT), glutamate dehydrogenase (GLD), urea, glucose, creatinine, bilirubin, total protein, albumin, globulins, total cholesterol, triglycerides, phospholipids, fatty acids, bile acids, sodium, potassium, chloride, calcium, and phosphorus.

Histology

Representative liver samples were fixed in 10% neutral-buffered formalin. One additional liver sample from the cranial half of the left lateral lobe was placed in Carnoy fixative for glycogen staining. All samples were processed using routine procedures and embedded in Paraplast (Sherwood Medical Ltd., Tullamore, Ireland).
Tissue sections approximately 2-3 µm were cut and stained with hematoxylin and eosin or periodic acid-Schiff for glycogen. Fat Red 7B stain (Fluka, Buchs, Switzerland) was performed on frozen formalin-fixed sections to visualize lipid deposits.

Sample Preparation and Hybridization

RNA isolation, processing, and hybridization were essentially carried out as recommended by Affymetrix (Affymetrix, Santa Clara, CA, USA) with minor modifications [Supplemental data (http://ehp.niehs.nih.gov/txg/members/2004/7036/7036supplement.pdf)].

Data Acquisition and Preprocessing

Primary data were obtained by laser scanning (Hewlett Packard, Palo Alto, CA, USA) and collated using the Affymetrix Microarray Suite Version 5.0 software (Affymetrix). Before performing any downstream analysis, data were preprocessed in a standardized way. First, the gene expression values of every single microarray experiment were rescaled to a mean value of zero and a standard deviation of 1 to establish comparability across all samples. Because single outlying expression values occur rather frequently and are likely to affect any analysis method, a modified version of the Nalimov outlier test (Kaiser and Gottschalk 1972) was applied to identify these potential artifacts. Expression values reported as outliers were replaced by the respective mean values. The test was performed separately for each classification group (i.e., class of toxicity). In contrast to the published method, our modified version does only one round of outlier removal rather than multiple iterations. A normal distribution model is calculated for the expression levels to be tested, and outliers are removed at a 99% confidence level. As a final preprocessing step, the expression values were rescaled so that the expression of each single gene across multiple arrays has a mean value of zero and a standard deviation of 1. This transformation increases the numerical stability of the SVM algorithm and facilitates the assessment of the relative importance (weight) of single genes within a reduced feature set. Again, this was performed separately for each classification group.

Support Vector Machines

A detailed introduction into theory and application of SVMs is beyond the scope of this article. We refer the interested reader to the available literature (Cristianini and Shawe-Taylor 2000; Scholkopf and Smola 2002) and the Supplemental Data (http://ehp.niehs.nih.gov/txg/members/2004/7036/7036supplement.pdf). All SVM classifications were based on the freely available software package LIBSVM, version 2.36, which was downloaded from the World Wide Web (Chang and Lin 2001). The source code was modified according to our needs and compiled to run on the operating system IRIX, version 6.5 (Silicon Graphics, Inc., Mountain View, CA, USA). Extensions such as parameter optimization, feature selection, enhanced cross-validation (CV) options, the one-versus-all training scheme, and report generation were implemented in a C library on top of LIBSVM.
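The three preprocessing steps described under Data Acquisition and Preprocessing can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function names are hypothetical, the sample standard deviation is assumed, and the outlier step uses a plain z-score cutoff of 2.576 (the two-sided 99% quantile of a normal distribution) as a stand-in for the modified Nalimov test. Replacing a flagged value with the mean of the remaining values of that gene within the group is likewise an assumption about what "respective mean values" refers to.

```python
from statistics import mean, stdev


def standardize(values):
    """Rescale a list of numbers to mean 0 and (sample) standard deviation 1."""
    m, s = mean(values), stdev(values)
    if s == 0:
        return [0.0 for _ in values]
    return [(v - m) / s for v in values]


def replace_outliers(values, z_cut=2.576):
    """One single round of outlier replacement (assumed z-score rule, not Nalimov)."""
    m, s = mean(values), stdev(values)
    if s == 0:
        return list(values)
    keep = [v for v in values if abs(v - m) / s <= z_cut]
    fill = mean(keep) if keep else m  # assumed replacement value
    return [v if abs(v - m) / s <= z_cut else fill for v in values]


def preprocess(arrays, groups, min_group_size=3):
    """arrays: one list of expression values per microarray (same gene order).
    groups: the class label of each array."""
    # Step 1: rescale every array to mean 0, SD 1.
    data = [standardize(a) for a in arrays]
    n_genes = len(data[0])
    # Step 2: per gene, within each class, replace outliers once.
    for label in set(groups):
        idx = [i for i, g in enumerate(groups) if g == label]
        if len(idx) < min_group_size:
            continue  # too few replicates to judge outliers
        for j in range(n_genes):
            cleaned = replace_outliers([data[i][j] for i in idx])
            for i, v in zip(idx, cleaned):
                data[i][j] = v
    # Step 3: rescale every gene across the arrays to mean 0, SD 1.
    for j in range(n_genes):
        column = standardize([row[j] for row in data])
        for row, v in zip(data, column):
            row[j] = v
    return data
```

Note that a z-score rule cannot flag anything in very small groups, because the largest attainable z-score with n values is (n - 1)/sqrt(n). With fewer than about 10 replicates per class the cutoff of 2.576 is never reached, which is one reason a dedicated small-sample test such as Nalimov's is used in practice.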
Choice of Parameters

A linear kernel $k(x_i, x_j) = \langle x_i, x_j \rangle$ was chosen for the SVM, as higher order correlation functions could easily lead to overinterpretation of the data, given the unfavorable ratio of features and replicates. LIBSVM offers two different SVM formulations for classification: C-SVM and ν-SVM. These formulations use different parameters for adjusting the accuracy versus margin tradeoff but should produce comparable solutions. We tried both formulations and tuned their respective parameters for optimal CV performance.

To handle the multiple class situation, we applied the one-versus-all training paradigm. Using this approach, a set of binary SVMs is created, each of which separates the samples of one class (positive examples) from all remaining training data (negative examples). Because the number of negative examples usually outweighs the number of positive examples in this scheme, there is always a risk of losing sensitivity for the smaller class. However, practice showed that no additional class bias had to be introduced after appropriate values for the C or ν parameter had been determined for each single SVM. Optimization of these crucial parameters was done in an iterative manner. We typically started with either a C value set to 1.0 or a ν value of 0.5 and performed a complete gene selection run. Optimization of SVMs using different feature numbers suggested improvements in the initial settings as well as a sensible range for the parameters. Feature selection was then repeated with the new settings, and individual SVMs were again tuned to determine good parameters for different gene numbers. This process led to a noticeably improved classification performance.

Classifier Validation

The predictive power of individual SVMs was primarily rated by their CV performance. However, as our main interest was to estimate the generalization properties of classifiers with respect to new compounds, we did not select the frequently applied leave-one-out or randomization-based schemes.
Instead, all microarrays that resulted from the treatment with a certain compound (regardless of dose and time point) were left out as a whole group in one CV cycle. Whenever CV is combined with feature selection, special care must be taken to avoid any bias leading to overoptimistic performance estimates. Therefore, we applied external CV exclusively, where feature selection was done separately for each group of left-out examples, thereby avoiding the use of information from the excluded examples in the feature selection process. Although the final classifier is built on all available training examples, the described method was used to determine the optimal number of genes as well as the parameter settings. As a consequence, we expect that the resulting classifiers are less influenced by the given selection of compounds and that CV provides a more realistic estimate for the generalization on new compounds.

Quantitative measures for training and CV performance were sensitivity and specificity values as well as the Matthews correlation coefficient (MCC):

$$\text{Sensitivity} = \frac{TP}{TP + FN}$$

$$\text{Specificity} = \frac{TN}{TN + FP}$$

$$\text{MCC} = \frac{TP \times TN - FP \times FN}{\sqrt{(TP + FP)(TP + FN)(TN + FP)(TN + FN)}}$$

where TP = number of true positives, TN = number of true negatives, FP = number of false positives, and FN = number of false negatives. The MCC is commonly used as a measure of the predictive power of a system that gives categoric variables as output (Matthews 1975). It was our main performance indicator. When several SVMs showed exactly the same CV result, performance on the training set was also taken into account. If this still yielded equal results, we finally selected the simplest model (i.e., smallest number of support vectors and smallest number of features).

Gene Selection

Although SVMs can easily tolerate the high-dimensional gene space typical of microarray studies, most of the features are usually irrelevant for the classification task and only introduce noise. To obtain a meaningful decision function that generalizes well, the number of variables must be reduced as much as possible.
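As an aside on the validation scheme and performance measures described under Classifier Validation, the following Python sketch shows how compound-wise CV folds and the three measures might be computed. The function names are hypothetical, class labels are assumed to be coded as +1 and -1, and the MCC follows its standard definition (Matthews 1975). Returning 0.0 when a denominator is zero is a convention chosen here, not something stated in the text.

```python
from math import sqrt


def leave_compound_out(compounds):
    """Yield (train_indices, test_indices); each fold holds out every
    array of one compound, regardless of dose and time point."""
    for held_out in sorted(set(compounds)):
        test = [i for i, c in enumerate(compounds) if c == held_out]
        train = [i for i, c in enumerate(compounds) if c != held_out]
        yield train, test


def confusion(y_true, y_pred):
    """Count TP, TN, FP, FN for labels coded as +1 (positive) and -1."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == -1 and p == -1)
    fp = sum(1 for t, p in pairs if t == -1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == -1)
    return tp, tn, fp, fn


def sensitivity(tp, fn):
    return tp / (tp + fn) if tp + fn else 0.0


def specificity(tn, fp):
    return tn / (tn + fp) if tn + fp else 0.0


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient; 0.0 if undefined (assumed convention)."""
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0
```

In an external CV of the kind described, any feature selection would be rerun on the `train` indices of each fold before the held-out compound is predicted, so that the left-out arrays never influence which genes are kept.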
Various methods exist for selecting discriminating features for classification purposes; most deal with variables individually. RFE overcomes some deficiencies of this univariate approach (Guyon et al. 2002). Basically, RFE is a greedy backward elimination method. Starting with all features (except for Affymetrix control genes), a ranking is produced based on the relative importance of a particular feature in the SVM decision function. A certain fraction of the least important variables is then removed, and the process is repeated iteratively until the feature list is empty. The precise order of features might change from iteration to iteration. Because of the multivariate properties of the SVM algorithm, each feature ranking takes into account (at least to some extent) correlations between single variables. Evaluating the classification performance at each step makes it possible not only to identify a suitable subset of descriptors but also to determine how many of them are actually needed for a reliable classification. Redundant features also tend to be eliminated during RFE, typically resulting in very compact feature sets (Guyon et al. 2002). We implemented RFE on top of the libsvm software. In the beginning, a user-definable fraction of the least important genes is removed in each iteration. After reaching a certain threshold number, only one more gene is eliminated in each step. We experimented with several values for the fraction and lower threshold values to further improve the classification performance of our classifiers.
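The elimination schedule described above (remove a fixed fraction of the lowest-ranked genes until a threshold is reached, then one gene per step) can be sketched as follows. This is a minimal illustration, not the libsvm-based implementation used in the study: the function name is hypothetical, and the ranking is delegated to a caller-supplied `rank_weights` callback, which in a real setting would retrain the SVM on the current feature set and return one importance value per feature. The sketch stops at a single feature, because a model needs at least one input.

```python
def recursive_feature_elimination(features, rank_weights, fraction, threshold):
    """Yield successively smaller feature lists, most important features kept.

    rank_weights(current_features) must return a dict mapping each feature to
    an importance score (hypothetical callback; higher means more important).
    """
    current = list(features)
    while current:
        yield list(current)
        if len(current) == 1:
            break
        weights = rank_weights(current)  # re-rank in every iteration
        ranked = sorted(current, key=lambda f: weights[f], reverse=True)
        if len(current) > threshold:
            # Fast phase: drop a fraction, but never jump below the threshold.
            n_remove = max(1, int(len(current) * fraction))
            n_keep = max(threshold, len(current) - n_remove)
        else:
            # Slow phase: drop exactly one feature per step.
            n_keep = len(current) - 1
        current = ranked[:n_keep]


# Toy example with fixed, made-up importance scores.
toy_weights = {f"gene_{i}": float(i) for i in range(8)}
for subset in recursive_feature_elimination(
    list(toy_weights), lambda feats: toy_weights, fraction=0.5, threshold=3
):
    print(len(subset), subset)
```

Because the ranking is recomputed on every pass, the order of the surviving features can change between iterations, as noted in the text.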
Presentation of Support Vector Machine Results

A binary SVM discriminating between two classes is trained by presenting the training samples of one class (A) as positive examples while samples belonging to the other class (B) act as negative examples. An SVM prediction (i.e., the value of the decision function when a new data example is tested) is simply a real number called the discriminant. If the discriminant is positive, the example is considered belonging to class A. Similarly, a negative number would indicate membership in class B. The absolute value of the discriminant can be regarded as a measure of confidence for the classification. If there are more than two distinct classes, several binary classifiers must be combined to obtain a prediction for a new sample. When applying the one-versus-all scheme (see above), n classifiers have to be created for n classes. A new data example is then tested with all these SVMs and therefore the result consists of n real values from which the most probable class assignment must be inferred. Classification results can be presented as plots of discriminant values that were obtained from a set of SVMs (Figure 1). A unique assignment is possible if only one SVM produces a positive output for a certain sample. If a treatment group is not classified uniformly, we assign the corresponding compound to a category by majority vote, with 60% as the cutoff.

[FIGURE 1 OMITTED]

Sometimes two- or three-dimensional scatter plots were produced for visualizing the class separation of one model (Figure 2). These diagrams map all training and test examples into one coordinate system
and often reveal some (expected or unexpected) internal structure of the data such as subclusters or single outliers. The dimensionality reduction is achieved by plotting linear combinations of features against each other. Coefficients are obtained from the SVM decision function [Supplemental data (http://ehp.niehs.nih.gov/txg/members/2004/7036/7036supplement.pdf)].

[FIGURE 2 OMITTED]

Results

Histopathology and Clinical Chemistry--Profiles Used for Training Support Vector Machine Models

We used SVMs as a supervised learning method to generate classification rules. It was of crucial importance to provide training labels on the basis of solid evidence. Therefore, a complete serum chemistry profile and liver histopathology were performed on virtually all rats treated with various model compounds or the appropriate vehicle controls. This information in conjunction with published data provided the basis to allocate gene expression profiles to a specific training class (Table 1).

Gene Expression Analysis

Gene expression profiles from the livers of individual rats treated with vehicle or test compounds were analyzed using the Affymetrix U34A GeneChip. All microarrays included in the analysis fulfilled our established quality parameters [Supplemental data (http://ehp.niehs.nih.gov/txg/members/2004/7036/7036supplement.pdf)]. All treatments caused transcriptional changes with respect to their corresponding time-matched controls. In all studies, > 150 genes were expressed above background and showed at least a 2-fold modulation with a p-value < 0.05 (two-tailed, unpaired t test).

Assessment of Time Effects in Vehicle Control-Treated Rat Livers (Early versus Late)

Supervised analysis of gene expression data suffers if parameters other than the investigated effects correlate with the classes for which one tries to identify typical fingerprints. The studies evaluated in this article differed in vehicle, application route, and time point (Table 1).
We assumed that time-dependent effects could be the confounding factor with the most noticeable impact on the results. Thus, we analyzed gene expression patterns from vehicle-treated animals (i.e., controls) at various time points. A classification attempt was made using the same time points used for the toxicity classifications in this article (early class is 6 hr, late class is 24 hr up to several days). We obtained a prediction accuracy of 70% and an MCC of 0.41, whereas random shuffling of the analyzed microarrays gave MCC values close to zero, indicating that the observed variations can be attributed to time effects. Using this approach, 14 genes were selected as the best set of discriminative features. These results confirmed that there are indeed some observable time-related effects [Supplemental data, Table 1 (http://ehp.niehs.nih.gov/txg/members/2004/7036/7036supplement.pdf)]. However, because time points of the control microarrays typically vary within one class of toxicity, we expected the SVM to identify the time-dependent genes as not relevant for the toxicity predictions. Actually, only one gene of this subset appeared in one toxicity classifier (rc_AA799616_at) but with a low weight.

Effect of the Rat Strain

Because different rat strains are widely used in toxicology, we investigated the effect of strain differences of Wistar and Sprague-Dawley rats for classification based on transcript profiles. Our database consisting of Wistar rat data was used to generate an SVM. Subsequently, gene expression profiles from vehicle control and WY14643-treated Sprague-Dawley livers were used to assess whether the model would correctly classify individual animals from another rat strain. All five controls were clearly identified as controls (Figure 1).
Their transcript profiles yielded negative discriminants for all SVMs except the control SVM, where positive values marked those profiles as controls. Animals treated with 250 mg/kg WY14643 were unanimously assigned to the peroxisomal proliferator class. Here, discriminant values were positive only for the peroxisomal SVM and negative for all other categories. As those results indicated that gene expression profiles from Sprague-Dawley and Wistar rats are comparable, transcript profiles from WY14643 were included in further models.

Generation of Toxic/Nontoxic and Multitoxicity Models

We generated a binary classifier for the discrimination of vehicle controls and animals treated with a toxic compound. In addition, to predict the mode of action, multitoxicity models were also created. For both the binary and the multiclass case, we used the same data set for training the SVMs, applying either the C-formulation or the [upsilon]-formulation of the algorithm. To extract smaller sets of truly discriminative genes, we integrated RFE in the process of model building. This enabled us to study performance parameters such as sensitivity and specificity under CV in relation to the number of genes used. All the models were evaluated with an external CV scheme that omits a whole treatment group (typically five animals) per cycle. Therefore, a complete RFE run had to be carried out for each of the 60 groups (see "Materials and Methods"). Furthermore, all SVMs were validated with an independent test set that contained different doses and time points of the same substances used for training as well as some new compounds.
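The external CV scheme described above holds out one complete treatment group per cycle, so that no animal from the held-out group contributes to training or to feature selection. A minimal Python sketch of such a splitter is given below; the function name and the group labels are hypothetical, and the model training and RFE run that would happen inside each cycle are only indicated by a comment.

```python
from collections import defaultdict


def leave_one_group_out(group_labels):
    """Yield (held_out_group, train_indices, test_indices) for each group.

    group_labels[i] names the treatment group of sample i (one microarray).
    """
    by_group = defaultdict(list)
    for index, group in enumerate(group_labels):
        by_group[group].append(index)
    for group, test_indices in by_group.items():
        train_indices = [i for i, g in enumerate(group_labels) if g != group]
        yield group, train_indices, test_indices


# Hypothetical layout: three treatment groups with a few animals each.
groups = ["vehicle_a"] * 2 + ["compound_x"] * 3 + ["compound_y"] * 2
for held_out, train_idx, test_idx in leave_one_group_out(groups):
    # A full feature-selection run and SVM training would use train_idx only;
    # the held-out animals in test_idx are used solely for evaluation.
    print(held_out, train_idx, test_idx)
```

Splitting by group rather than by individual array avoids the optimistic bias that arises when animals from the same treatment group appear on both sides of the split.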
The results obtained were similar for C-SVMs and [upsilon]-SVMs, although the number of used genes at the point of optimal performance seemed to be smaller for the [upsilon]-SVM. However, C-SVM most often outperformed [upsilon]-SVM in terms of classification accuracy. Of all 26 toxic substances, C-SVM could not detect 4, whereas [upsilon]-SVM missed 6 compounds. A clear majority of the control groups were correctly classified under CV. In this respect there was no pronounced difference between the two formulations.

Toxic/Nontoxic Model

Results for the binary toxic/nontoxic classification are summarized in Table 2. The test set of 63 vehicle-control groups demonstrates how well those models generalize on previously unseen data [details in Supplemental data (http://ehp.niehs.nih.gov/txg/members/2004/7036/7036supplement.pdf)]. Although not every single microarray was classified correctly, all groups were correctly identified as controls using the described voting procedure. Almost 90% of the toxic test groups were correctly classified as toxic using C-SVM. The model did not produce any false-positive predictions. However, there were some false negatives, as not all toxic treatments could be recognized as toxic.

Multiclass Model

As the next step we aimed at predicting the mode of toxicity. For this purpose, a control class and three categories of toxicity were initially defined: cholestasis, steatosis, and direct acting.
Subsequently, we added peroxisome proliferator-activated receptor [alpha] (PPAR-[alpha]) agonists as a separate class without any loss in prediction accuracy (Table 3). We refer to this as the 4-modes-of-toxicity (4MOT) model. This is an imperfect simplification of the classification task, as some of the compounds show more than one form of hepatotoxicity, depending on dose and time. Therefore, time points at which a specific toxicity was most apparent were selected for the analysis. We generated five different SVMs following the one-versus-all approach, that is, each of the models was trained to discriminate between a certain class of toxicity and the set union of all other expression profiles. In a first step, the individual classifiers were built and optimized separately using the same CV procedure described before. Subsequently, a class assignment for each single microarray in the training or test set was done by combining the output of the five models (Tables 4 and 5). In most cases the prediction was unanimous, that is, just one SVM delivered a positive discriminant and the others returned negative values (e.g., Figure 3A-C). In cases where a profile obtained more than one positive discriminant value or only negative numbers, the biggest value determined the classification (e.g., Figure 4). The optimal gene number for classification depends on the category of toxicity. For example, peroxisomal proliferation/PPAR agonists could be recognized with one single marker gene.
Nevertheless, the final classifier used four features because of our strategy of simplifying the model by also minimizing the number of support vectors (see "Materials and Methods"). The four top probe sets represent only two distinct genes, acetyl-Coenzyme A acyltransferase 1 (peroxisomal 3-oxoacyl-Coenzyme A thiolase) and cytochrome P450 4A1. Both genes are well-known PPAR-[alpha] responsive genes, and the corresponding upregulation has been described extensively in the literature (Hansmannel et al. 2003; Lee et al. 2003).

[FIGURES 3&4 OMITTED]

The model for the control group required the most features (122). Performance again was rather similar for [upsilon]-SVM and C-SVM. In the case of [upsilon]-SVM, 274 distinct features were used altogether for discriminating among the five classes of toxicity. However, a reduction to 86 features did not lead to a significant loss in predictivity, indicating that this set could be used for an assay in a 96-well format (data not shown). Categories of toxicity differ not only in optimal feature number but also in prediction accuracy. Under CV as well as in the test set, all toxicant categories are recognized with a very high specificity, whereas the controls are identified with a high sensitivity. This means that our model produces virtually no false-positive outcomes but at the cost of some false-negative results. All treatment groups within the direct-acting category are either correctly classified or, in the case of aflatoxin, at least recognized as treated with a toxic substance (Table 4).
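The class-assignment rules used for these results (one discriminant per one-versus-all SVM, the largest value deciding when the call is not unanimous, and a majority vote with a 60% cutoff for whole treatment groups) can be sketched in Python as follows. The function names, class labels, and discriminant values are hypothetical; treating the 60% cutoff as inclusive and returning `None` when no class reaches it are assumptions of this sketch.

```python
from collections import Counter


def assign_class(discriminants):
    """Return (label, unanimous) for one sample.

    discriminants maps each class name to the output of its one-versus-all
    SVM. The call is unanimous when exactly one discriminant is positive;
    in every case the class with the largest discriminant is chosen.
    """
    n_positive = sum(1 for value in discriminants.values() if value > 0)
    label = max(discriminants, key=discriminants.get)
    return label, n_positive == 1


def assign_group(sample_labels, cutoff_percent=60):
    """Majority vote over the animals of one treatment group."""
    label, count = Counter(sample_labels).most_common(1)[0]
    if 100 * count >= cutoff_percent * len(sample_labels):
        return label
    return None  # no class reaches the cutoff (assumed handling)


# Made-up discriminant values for a single expression profile.
profile = {"control": -1.2, "cholestatic": -0.4, "steatotic": 0.9,
           "direct_acting": -0.7, "ppar_agonist": -1.1}
print(assign_class(profile))
print(assign_group(["steatotic"] * 3 + ["cholestatic"] * 2))
```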
Phalloidin is another example that was identified as toxic, but profiles were classified into two toxicity categories--cholestatic and direct acting. Amiodarone, glibenclamide, and chlorpromazine were not recognized as toxic. Classification of our test set again confirmed the good performance of the model. The classification of 332 test control microarrays with an error rate of 0.6% is remarkable. Using the criteria described above, the success rate in classifying the corresponding 63 control groups is 100% (as seen before in the binary classification).

Identification of Nonresponding Animals

Galactosamine treatment of rats usually leads to hepatitis associated with necrosis and inflammation. Animals were treated once with 400 mg/kg galactosamine or vehicle only and sacrificed after 24 hr. In four of five galactosamine-treated animals, there was clear evidence of toxicity assessed by hematology, clinical chemistry, and histopathology; one animal was a nonresponder. Gene expression profiles of individual animals were tested using the 4MOT model described previously.
Classification results are in perfect agreement with the assessment using conventional end points. However, gene expression profiling seems to be more sensitive than clinical chemistry and histopathology, as the data point corresponding to the nonresponding rat is clearly shifted toward the direct-acting group (Figure 2).

Pharmacologic Effects Are Differentiated from Toxicologic Effects

Pharmacologically active substances can alter gene expression, but a predictive model for hepatotoxicity should not confuse a substance with a desired pharmacologic effect with an unwanted toxic outcome. Three nonhepatotoxic but pharmacoactive substances were tested with the SVM models. Although 100 mg/kg gentamicin (sc) led to nephrotoxicity at 24 hr, no hepatotoxicity was associated with it, nor was hepatotoxicity detected with deprenyl or lazabemide. All three nonhepatotoxic substances were correctly classified as nontoxic using both the toxic/nontoxic and the 4MOT models (Figure 3). These results show that our toxicity classifiers can distinguish well between pharmacologic effects without toxicity and toxicologically relevant transcriptional changes.

Classification of Hepatotoxic Compounds with Mechanisms of Toxicity not Represented in Our Database--Lipopolysaccharide, Phenobarbital, and Indomethacin
Compounds with mechanisms of toxicity (MOTs) not represented in our training set were used to investigate how they would be classified by our models. The toxic/nontoxic model had the easier task, as dissimilarity with control profiles would already indicate some toxicity-related abnormality in gene expression. The 4MOT model had to assign an unrepresented profile to one of the five available classes. Lipopolysaccharide (LPS; 4 mg/kg iv) was investigated 6 and 24 hr after dosing and identified as toxic by the toxic/nontoxic model. The 4MOT model classified four animals as steatotic and one as cholestatic after 6 hr (Figure 4A). After 24 hr four animals were also classified as steatotic and one as direct acting. No sample was misclassified as a control. Phenobarbital (80 mg/kg ip) was also investigated at 6 and 24 hr. At 24 hr all animals fit into the steatotic category (Figure 4B). At 6 hr, four profiles were most similar to the steatotic group and one to the cholestatic group. In this case most discriminant values were very low, indicating differences with respect to the existing classes. Another interesting example was indomethacin, which was administered either as a single high dose (20 mg/kg po; sample collection at 6 or 24 hr) or as a repeated low dose (5 mg/kg po; daily dosing during 1 week). In the liver, minimal to slight hepatocellular hypertrophy
and decreased glycogen deposition were observed in animals treated with 20 mg/kg at 24 hr and in animals treated for 1 week with 5 mg/kg/day. The repeated dosing also caused tubular dilation in the kidney and erosive and/or ulcerative inflammations in the gastrointestinal tract. At 6 hr the substance was classified as predominantly cholestatic and at 24 hr clearly as steatotic. After 7 days of dosing, three animals were classified as steatotic and two as cholestatic. These profiles had positive discriminants for three toxicity categories (cholestatic, steatotic, direct acting). This indicates that indomethacin is different from our predefined toxicity categories and displays mixed toxicity (Figure 4C). Most important, a very clear dissimilarity from the control group indicated that the indomethacin-treated animals had been exposed to a toxic compound, although the mode of toxicity could not be unequivocally defined.

Discussion

Gene Expression Profiling

The present work aimed to provide evidence that transcript profiles can be used to distinguish compound-treated rat livers from controls and to discriminate between different MOTs. Rats were treated with a variety of vehicles and with hepatotoxic or nonhepatotoxic but pharmacologically active compounds. We focused on hepatotoxicity, as the liver is a main target for toxic reactions.
Various questions were addressed in the context of predictive toxicity modeling, including animal variability, rat strain differences, effect of time, and discrimination of pharmacologically from toxicologically induced gene changes. Several authors have described the use of gene expression profiling to classify toxicants in rodent liver and thereby demonstrated the potential of toxicogenomics in predictive risk assessment (Bulera et al. 2001; Hamadeh et al. 2002a; Thomas et al. 2001; Waring et al. 2001b). We used a larger number of compounds and selected a different bioinformatics approach to analyze the data. New in this study is the modeling of different categories of toxicity in conjunction with numeric measures for the classification confidence. Our results demonstrate that for different compounds with similar MOT, the likely toxicologic end point can be inferred from gene expression profiles using a database of model compounds as a training set. Moreover, we found good correlation of gene expression changes with histopathologic findings. These results are consistent with those of a previously reported study where methapyrilene toxicity correlated with the severity of pathologic changes (Hamadeh et al. 2002c).

Feature Selection

SVMs can handle very high-dimensional feature spaces, so there is no pressing need to filter out a small number of genes in a first step. In contrast to many published microarray studies, we did not apply strict cutoffs like 2-fold changes, p-value thresholds, or similar criteria. These approaches could easily spoil one of the main advantages of a multivariate classification method such as SVMs, as prefiltering of features by common univariate methods (such as the t test) might remove genes that do not reach significance when tested individually but provide useful information when taken together with other, correlated variables.
In contrast, RFE allowed us to combine feature selection and model building in a consistent framework, making use of the mutual information between genes (Guyon et al. 2002). We leave it to the method to eliminate noisy, irrelevant variables in the process of forming smaller and smaller subsets of genes with discriminatory power. The approach also helped to avoid the introduction of a feature selection bias, which occurs if information from all experiments is used to reduce the number of genes before any CV is done. However, it is important to remember that the gene lists we obtained are in no way a complete picture of the cellular response but a redundancy-reduced selection of markers that together allow maximum predictivity. The relationship between gene number and classification performance was studied using RFE, and subsequently the optimal iteration was chosen. Our results indicate that accurate prediction of toxicity (including the category of toxicity) can be achieved using a small set of features, ranging from a few to several dozen (Table 3). In the case of the 4MOT model, the feature number can be reduced from 274 to 86 without major performance impairment. The observation that more genes do not necessarily translate into higher predictive accuracy is consistent with previous findings (Ramaswamy et al. 2003; Thomas et al. 2001), indicating that it is not necessary to measure the whole transcriptome or thousands of genes to predict toxicity. Once initial experiments have led to an optimized set of relevant informative features, a potentially faster and cheaper assay could be developed providing essentially the same classification performance. Interestingly, using only the selected features for hierarchical clustering also resulted in a toxicologically meaningful result, whereas unsupervised clustering with all genes often failed at classifying the animals according to the criteria of interest (data not shown).
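Choosing the optimal RFE iteration, as mentioned above, amounts to comparing the models produced at each feature count. The Python sketch below applies the selection order given in the Methods: best CV MCC first, then training performance, then the simplest model. The `Candidate` record, the function name, and all numbers are hypothetical, and ranking support-vector count ahead of feature count within "simplest" is an assumption, since the text names both without ordering them.

```python
from typing import NamedTuple


class Candidate(NamedTuple):
    """Summary of one RFE iteration (hypothetical record layout)."""
    n_features: int
    cv_mcc: float
    train_mcc: float
    n_support_vectors: int


def select_model(candidates):
    """Pick the best iteration: CV MCC, then training MCC, then simplicity."""
    return max(
        candidates,
        key=lambda c: (c.cv_mcc, c.train_mcc,
                       -c.n_support_vectors, -c.n_features),
    )


# Made-up iterations, for illustration only.
iterations = [
    Candidate(n_features=200, cv_mcc=0.85, train_mcc=1.0, n_support_vectors=60),
    Candidate(n_features=50, cv_mcc=0.85, train_mcc=1.0, n_support_vectors=48),
    Candidate(n_features=20, cv_mcc=0.85, train_mcc=1.0, n_support_vectors=48),
    Candidate(n_features=5, cv_mcc=0.70, train_mcc=0.9, n_support_vectors=30),
]
print(select_model(iterations))
```

With these values the first three iterations tie on both MCC scores, so the choice falls to the model with the fewest support vectors and then the fewest features.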
However, it is worth mentioning that none of the genes in the final set is guaranteed to act as a good toxicity marker on its own, because we do not rank features according to their suitability as single markers (univariate approach) but rather optimize whole subsets of features (multivariate approach). In this setting it is possible that a gene that does not appear differentially expressed in two groups can still contribute useful information by combination with other genes. Therefore, it is often the signature taken as a whole that provides the decisive discriminatory power. Marker gene sets identified with the described method are especially prone to show this effect because of the multivariate nature of SVMs and the tendency of the RFE algorithm to eliminate redundant features from the set (Guyon et al. 2002). As gene expression analysis can also be applied in vitro (Burczynski et al. 2000; Waring et al. 2001a), the question arises whether the list of features obtained could be used in a cell-based assay. This seems questionable, as significant differences in gene expression in vitro compared with in vivo were reported (Boess et al. 2003). Therefore, we expect that results concerning discriminative features and their weights cannot be directly transferred to in vitro classification systems. In addition, the evaluation of the compound effects in vivo is especially important when multiple cell types and possibly multiple organs are involved in the toxicologic response.

Confounding Effects

A crucial issue when using supervised classification methods is that there must be solid evidence for the initial assignment of gene expression profiles to each category.
Therefore, we included only microarrays from animals where independent evidence justified allocation to a specific class. In most cases, histopathologic anchoring was used, but clinical chemistry and occasionally additional biochemical assays (triglyceride assays, data not shown) were also considered. Anchoring to conventional end points was the reason for the heterogeneity of time points used in the training procedure. This kind of heterogeneity might act as a confounding factor, introducing signatures not related to the toxicity classification problem itself. Special care must be taken to ensure that these confounding factors do not exhibit decisive influence on the model. The potentially confounding effect of time was addressed first, as several authors have highlighted extensive circadian gene expression changes (Kita et al. 2002; Panda et al. 2002; Storch et al. 2002). For this purpose, the same time points (6 hr, 24 hr, and several days) used within our toxicity models were used to train a two-class SVM model for classification of early or late time points. A classifier based on 14 genes was obtained, but predictivity was far from perfect and resulted in a relatively low MCC of 0.41. (Test MCC values for the toxicity classifiers were all > 0.80.) Although these results confirm some time dependency in our experiments, we have no reason to assume that this strongly affects our toxicity models, as we always combined control profiles from all time points in the same group for training. Together with the fact that none of the genes from the time classifiers appeared at a prominent position (with significant weight) in the toxicity models, these results suggest that there is no distinct time bias.
In fact, classification of vehicle controls from the test set (originating from independent studies and including various time points) was correct in more than 99% of the cases, which confirms the absence of time bias for the control component of the classifier. Wistar, Sprague-Dawley, and Fischer rats are all frequently used in risk assessment. There is ample evidence that those strains vary in their susceptibilities to various toxicants or mutagens (Asamoto et al. 1989; Kulkarni et al. 1996). Therefore, we investigated whether a model built with Wistar rat expression profiles would be predictive for treatment effects in Sprague-Dawley rats. PPAR agonists were chosen for this comparison for pragmatic reasons. At the time we studied proprietary PPAR agonists, we were also involved in the Consortium for Metabonomic Toxicology (COMET), where liver tissue collection of WY14643-treated Sprague-Dawley rats could be included. [COMET has been formed by Imperial College (London) and six major pharmaceutical companies. The objective is to apply metabonomics to the toxicologic assessment of compounds (Lindon et al. 2003).] Treated rats as well as controls fit perfectly into the anticipated classes. The classification was successful despite the additional confounding factor introduced by the fact that the Sprague-Dawley rats were approximately 6 weeks younger than the Wistar rats. This successful class prediction was the rationale for including those expression profiles in our predictive models. As the results suggest, the discriminative transcriptional changes are largely conserved across strains, although the doses required to produce comparable toxicity may vary. Another confounding factor for the classification task is that pharmaceuticals not only show a toxic effect on gene regulation but might also influence gene expression according to their pharmacologic action.
A crucial test for the classification of toxicants based on gene expression profiles is certainly the ability to separate pharmacologic from true toxic effects. Our models succeeded at classifying three pharmacologically active, nonhepatotoxic compounds. In the case of gentamicin, not even the observed nephrotoxicity led to a false prediction of hepatotoxicity. The classification of these three compounds as nonhepatotoxic was not due to a general lack of effects on hepatic gene expression; more than 100 genes were differentially expressed for these compounds, as assessed by fold change together with a t test (at least 2-fold change and p-value < 0.05).
Mixed Toxicities
All transcript profiles were assigned to a specific category, implying that they fit exactly into one class. However, in reality, substances often cause mixed toxicities. We aimed to allocate substances to the best-fitting class, knowing the limitations due to the potential overlap of effects. Our results indicate that characteristic gene expression changes are indeed associated with distinct classes of toxicants. However, as compounds cannot be put into exclusive bins in a strict sense, some substances (aflatoxin, indomethacin, and phalloidin) were predicted to be associated with multiple toxicities. Aflatoxin, for example, needs metabolic activation to exert its toxic effect. It causes generation of reactive oxygen species, lipid peroxidation, glutathione
depletion, and necrosis and therefore has a direct effect on cells (Liu et al. 1999). On the other hand, it is a well-known carcinogen (Smela et al. 2001) and is reported to induce both cholestasis (Unger et al. 1977) and steatosis (Amaya-Farfan 1999). Based on classical end points, we decided to allocate aflatoxin to the direct-acting group. The SVM classification of gene expression profiles, however, indicated a greater similarity to cholestatic than to direct-acting compounds. One possible way to address this problem might be to generate several one-versus-control categories and include the aflatoxin samples in both the direct-acting and the cholestatic classes. Another option would be to exclude all compounds from training that do not unambiguously fit into one single category. Reported effects of indomethacin in rats are immediately direct, like adenosine triphosphate depletion in hepatocytes (Masubuchi et al. 1998) and a marked decrease in the hepatic monooxygenase system (Fracasso et al. 1990). Gene expression profiles of rats dosed with indomethacin were classified as cholestatic and steatotic but also matched the direct-acting group. Clinical chemistry supported this mixed toxicity prediction to some extent, as ALP, GGT, AST, and LDH were increased.
Histopathology revealed hypertrophy and minimal to slight necrosis, but changes were considered to be adaptive rather than reflecting an adverse effect. In patients, however, cases of cholestasis and steatosis have been reported (Farrell 1994). It remains to be confirmed whether the genomics approach is more sensitive than histopathology in detecting liabilities. Results classifying galactosamine-treated rats using the multitoxicity model support this hypothesis. Galactosamine treatment leads to hepatitis associated with necrosis and inflammation, but a high degree of interanimal variation is well known (Vomel and Platt 1986). In our study four of five expression profiles were identified as toxic while the fifth was classified as control. This classification as nonresponder was in agreement with the absence of findings using conventional end points. However, a three-dimensional plot of the SVM results revealed a shift of the expression profile of the nonresponder toward the direct-acting group (Figure 2), suggesting increased sensitivity of the toxicogenomics approach.
Model Assessment
We used a compound-based external CV scheme (see "Materials and Methods") to obtain more realistic estimates for the classification performance and to select a model from which we can expect a good generalization power. It has to be kept in mind that the compound database is still limited in size, and we do not know whether our set of substances is a representative sampling of the complete toxicology space. Therefore, we cannot completely rule out some sampling bias, which would render our performance estimates too optimistic. Conversely, our CV procedure intrinsically tends to deliver a rather conservative assessment of the performance, as at least some of the compounds provide vital information that is lost as soon as a whole treatment group is withheld from training.
For example, glibenclamide (dosed at 25 mg/kg) was not recognized as a cholestatic compound under CV; the final SVMs correctly classified two of five animals in the test set, despite the fact that these had been treated with a lower dose (2.5 mg/kg) and histopathology was only evident in animals that received the high dose. Because of the partial overlap of compounds in the training and test set, one would expect a smaller fraction of misclassifications under test conditions than with the more rigorous CV method. This was indeed observed in most cases (Tables 2 and 3) and emphasizes the extent to which interpretation of results depends on the details of the applied evaluation method. Although in this case the number of CV errors can provide information about the generalization ability of a model, the test performance should be regarded as a measure for its consistency with respect to a certain selection of compounds. For our application we clearly wanted to optimize the former; therefore, we used the described CV scheme to select the best SVMs. When we tentatively switched to a more standard, leave-one-out procedure, not a single CV error occurred. However, the test accuracy was significantly decreased, indicating that a classifier with less generalization ability had been generated by this standard CV method. The current model was based on histopathologic and clinical chemistry data and performs best on data comparable to the training data. If there is no evidence of toxicity (see deprenyl, lazabemide, gentamicin), the gene expression profiles are not wrongly assigned to a toxicity category. Classification of lower but still somewhat toxic doses
was successful with dichlorobenzene (1,500 mmol/kg), amineptine (0.25 mmol/kg), acetaminophen (1,000 mg/kg), bromobenzene (1 mmol/kg), Rx50 (1 mg/kg/day), Rx51 (0.13 mg/kg/day), and Rx60 (0.38 mg/kg/day). Nevertheless, detection of toxic substances applied at subtoxic doses can be successful; examples include Rx10 (125 mg/kg/day) and some animals treated with glibenclamide 2.5 mg/kg. However, it is important to remember that the current model was based on solid pathology and therefore optimized for specificity. If borderline doses or doses just below detectable pathology were used to generate the model, correct classification at subtoxic doses could be expected in more cases. However, an increase in sensitivity is expected to be paid for by a reduced specificity (i.e., greater number of false positives). If there was evidence for toxicity, although with a lower histopathologic score and less-pronounced clinical chemistry changes, the model generally performed well, as indicated by the relatively high test MCC values. Examples of successful classification of earlier time points are Rx99 (24 hr) and methylene dianiline (3 hr). However, very early times can often affect a different set of genes than those noted at later times (Heijne et al. 2004; Ruepp et al. 2002). Although the test set contained some of the same compounds as the training set, the experiments in the test set used lower doses or samples collected at earlier time points. Therefore, the high classification accuracy observed with the test data indicates good sensitivity of our models.
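The trade-off between sensitivity and specificity noted above can be illustrated with a short Python sketch. It assumes that a classifier returns a continuous decision value per animal and that a profile is called toxic when this value reaches a chosen threshold; the function name, the decision values, and the labels are hypothetical and do not represent data or code from this study.

```python
def sensitivity_specificity(scores, labels, threshold):
    """Sensitivity and specificity when score >= threshold is called toxic.

    labels: True for toxic, False for nontoxic. Both classes are assumed
    to be present, so neither denominator is zero.
    """
    pairs = list(zip(scores, labels))
    tp = sum(1 for s, y in pairs if y and s >= threshold)
    fn = sum(1 for s, y in pairs if y and s < threshold)
    tn = sum(1 for s, y in pairs if not y and s < threshold)
    fp = sum(1 for s, y in pairs if not y and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical decision values (e.g., signed distances from a hyperplane).
scores = [1.8, 1.1, 0.4, -0.2, 0.3, -0.6, -1.0, -1.5]
labels = [True, True, True, True, False, False, False, False]

strict = sensitivity_specificity(scores, labels, threshold=0.5)
lenient = sensitivity_specificity(scores, labels, threshold=-0.3)
print("strict :", strict)
print("lenient:", lenient)
```

In this toy example the stricter threshold produces no false positives but misses weakly affected animals, whereas the more lenient threshold detects all toxic profiles at the cost of one false positive.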
An interesting observation was that the two experiments using chlorpromazine were not equally well classified. Chlorpromazine was expected to have a cholestatic effect at the tested doses (Knodell 1975), but the animals were classified predominantly as nontoxic in the first experiment and as cholestatic in the second experiment. However, these differences in gene expression profiles are in agreement with differences in conventional end points, probably because of biological and/or experimental differences, as both experiments were performed at different sites and with slightly different sample processing protocols.
Summarizing, we demonstrated that classification problems in toxicogenomics can be effectively addressed by a supervised learning approach. We applied SVMs to microarray data from a set of model hepatotoxicants. Combining SVM parameter optimization with a compound-based external CV scheme and recursive feature elimination (RFE), we were able to obtain accurate classification (i.e., high sensitivity and specificity) of the compounds included in the training set as well as for previously unseen compounds. In addition, RFE allowed us to select a relatively compact subset of probe sets with potential use as biomarkers. Thus, our results show that toxicogenomics is a very powerful tool for classification of compounds according to their toxicity mechanism when a well-designed database is combined with appropriate bioinformatics tools.
Despite these promising results, further investigations must be performed to increase the usefulness of transcript profiling in toxicology. A larger database and refined analysis methods are anticipated to further improve prediction accuracy. We focused mainly on high doses that led to clear toxicity as assessed by conventional end points. However, it has been reported that a compound affects different genes and pathways depending on the administered dose (Andrew et al. 2003).
Thus, a next step will be to include expression profiles from lower doses in the model-building process. Earlier time points should also be considered. This will allow us to assess whether gene expression changes are already indicative of toxic liabilities when standard parameters do not yet detect toxicity. In addition, for classification purposes it is irrelevant whether the gene expression changes considered good discriminants for a toxic response are causally linked to the toxicity. Nevertheless, to gain further insight into a specific MOT, it is valuable to interpret results in a biological context, analyzing the altered pathways and their relationship to observed pathology or phenotype. These investigations could help separate transcriptional changes that are relevant for the mode of toxicity from mere bystander effects.
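The difference between the compound-based external CV scheme and a standard leave-one-out procedure, as discussed under Model Assessment, can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation used in this study: the function names and the assignment of profiles to compounds are hypothetical, and the handling of control profiles is omitted.

```python
from collections import defaultdict


def leave_one_compound_out(compounds):
    """Yield (held_out, train_idx, test_idx); all profiles of one
    compound are withheld from training in each fold."""
    by_compound = defaultdict(list)
    for idx, name in enumerate(compounds):
        by_compound[name].append(idx)
    for name, test_idx in by_compound.items():
        held = set(test_idx)
        train_idx = [i for i in range(len(compounds)) if i not in held]
        yield name, train_idx, test_idx


def leave_one_out(n_samples):
    """Yield (train_idx, test_idx), withholding a single profile per fold."""
    for i in range(n_samples):
        yield [j for j in range(n_samples) if j != i], [i]


# Hypothetical layout: one compound label per expression profile.
compounds = ["aflatoxin", "aflatoxin", "hydrazine", "hydrazine",
             "hydrazine", "glibenclamide", "glibenclamide"]

compound_folds = list(leave_one_compound_out(compounds))
loo_folds = list(leave_one_out(len(compounds)))

for name, train_idx, test_idx in compound_folds:
    # No profile of the held-out compound remains in the training set.
    assert all(compounds[i] != name for i in train_idx)
    print(name, "train:", train_idx, "test:", test_idx)
```

Under leave-one-out, replicate profiles of the held-out animal's compound remain in the training set, which may explain why that procedure yields fewer CV errors while generalizing less well to unseen compounds.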
REFERENCES
Alizadeh AA, Ross DT, Perou CM, Van De Rijn M. 2001. Towards a novel classification of human malignancies based on gene expression patterns. J Pathol 195:41-52.
Alon U, Barkai N, Notterman DA, Gish K, Ybarra S, Mack D, et al. 1999. Broad patterns of gene expression revealed by clustering analysis of tumor and normal colon tissues probed by oligonucleotide arrays. Proc Natl Acad Sci USA 96:6745-6750.
Amaya-Farfan J. 1999. Aflatoxin [B.sub.1]-induced hepatic steatosis: role of carbonyl compounds and active diols on steatogenesis. Lancet 353:747-748.
Andrew AS, Warren AJ, Barchowsky A, Temple KA, Klei L, Soucy NV, et al. 2003. Genomic and proteomic profiling of responses to toxic metals in human lung cells. Environ Health Perspect 111:825-838.
Arbeitman MN, Furlong EE, Imam F, Johnson E, Null BH, Baker BS, et al. 2002. Gene expression during the life cycle of Drosophila melanogaster. Science 297:2270-2275.
Asamoto M, Tsuda H, Kato T, Ito N, Masuko T, Hashimoto Y, et al. 1989.
Strain differences in susceptibility to 2-acetylaminofluorene and phenobarbital promotion of hepatocarcinogenesis: immunohistochemical analysis of cytochrome P-450 isozyme induction by 2-acetylaminofluorene and phenobarbital. Jpn J Cancer Res 80:1041-1046.
Boess F, Kamber M, Romer S, Gasser R, Muller D, Albertini S
Boser BE, Guyon IM, Vapnik VN. 1992. A training algorithm for optimal margin classifiers. In: Proceedings of the 5th Annual International Conference on Computational Learning Theory, 27-29 July 1992, Pittsburgh, Pennsylvania. Pittsburgh, PA:ACM Press, 144-152.
Brem RB, Yvert G, Clinton R, Kruglyak L. 2002. Genetic dissection of transcriptional regulation in budding yeast. Science 296:752-755.
Brown MPS, Grundy WN, Lin D, Cristianini N, Sugnet CW, Furey TS, et al. 2000. Knowledge-based analysis of microarray gene expression data by using support vector machines. Proc Natl Acad Sci USA 97:262-267.
Bulera SJ, Eddy SM, Ferguson E, Jatkoe TA, Reindel JF, Bleavins MR, et al. 2001. RNA expression in the early characterization of hepatotoxicants in Wistar rats by high-density DNA microarrays. Hepatology 33:1239-1258.
Burczynski ME, McMillian M, Ciervo J, Li L, Parker JB, Dunn RT II, et al. 2000.
Toxicogenomics-based discrimination of toxic mechanism in HepG2 human hepatoma cells. Toxicol Sci 58:399-415.
Chang CC, Lin CJ. 2001. LIBSVM: a library for support vector machines. Available: http://www.csie.ntu.edu.tw/~cjlin/libsvm [1 January 2003].
Cristianini N, Shawe-Taylor J. 2000. An Introduction to Support Vector Machines and Other Kernel-Based Learning Methods. Cambridge, UK:Cambridge University Press.
Donald S, Verschoyle RD, Edwards R, Judah DJ, Davies R, Riley J, et al. 2002. Hepatobiliary damage and changes in hepatic gene expression caused by the antitumor drug ecteinascidin-743 (ET-743) in the female rat. Cancer Res 62:4256-4262.
Eisen MB, Spellman PT, Brown PO, Botstein D. 1998. Cluster analysis and display of genome-wide expression patterns. Proc Natl Acad Sci USA 95:14863-14868.
Enard W, Khaitovich P, Klose J, Zollner S, Heissig F, Giavalisco P, et al. 2002. Intra- and interspecific variation in primate gene expression patterns. Science 296:340-343.
Everitt B. 1974. Cluster Analysis. London:Heinemann.
Farrell GC. 1994.
Drug-Induced Liver Disease. London:Churchill Livingstone.
Fracasso ME, Leone R, Cuzzolin L, Del Soldato P, Velo GP, Benoni G. 1990. Indomethacin induced hepatic alterations in mono-oxygenase system and faecal Clostridium perfringens enterotoxin in the rat. Agents Actions 31:313-316.
Furey TS, Cristianini N, Duffy N, Bednarski DW, Schummer M, Haussler D. 2000. Support vector machine classification and validation of cancer tissue samples using microarray expression data. Bioinformatics 16:906-914.
Guyon IM, Weston J, Barnhill S
Hamadeh HK, Bushel PR, Jayadev S, DiSorbo O, Bennett L, Li L, et al. 2002a. Prediction of compound signature using high density gene expression profiling. Toxicol Sci 67:232-240.
Hamadeh HK, Bushel PR, Jayadev S, Martin K, DiSorbo O, Sieber S, et al. 2002b. Gene expression analysis reveals chemical-specific profiles. Toxicol Sci 67:219-231.
Hamadeh HK, Knight BL, Haugen AC, Sieber S, Amin RP, Bushel PR, et al. 2002c. Methapyrilene toxicity: anchorage of pathologic observations to gene expression alterations. Toxicol Pathol 30:470-482.
Hansmannel F, Clemencet MC, Le Jossic-Corcos C, Osumi T, Latruffe N, Nicolas-Frances V. 2003. Functional characterization of a peroxisome proliferator response-element located in the intron 3 of rat peroxisomal thiolase B gene. Biochem Biophys Res Commun 311:149-155.
Hastie T, Tibshirani R, Eisen MB, Alizadeh A, Levy R, Staudt L, et al. 2000. 'Gene shaving' as a method for identifying distinct sets of genes with similar expression patterns. Genome Biol 1:3-21.
Heijne WH, Slitt AL, van Bladeren PJ, Groten JP, Klaassen CD, Stierum RH, et al. 2004. Bromobenzene-induced hepatotoxicity at the transcriptome level. Toxicol Sci 79:411-422.
Joliffe IT. 1986. Principal Component Analysis. New York:Springer.
Kaiser R, Gottschalk G. 1972. Elementare Tests zur Beurteilung von Messdaten [in German].
Mannheim/Wien/Zuerich:Bibliographisches Institut, 18-21.
Kita Y, Shiozawa N, Jin WH, Majewski RR, Besharse JC, Greene AS, et al. 2002. Implications of circadian gene expression in kidney, liver and the effects of fasting on pharmacogenomic studies. Pharmacogenetics 12:55-65.
Knodell RG. 1975. Effects of chlorpromazine on bilirubin metabolism and biliary secretion in the rat. Gastroenterology 69:965-972.
Kohonen T. 1995. Self Organizing Maps. Berlin:Springer.
Kulkarni SG, Duong H, Gomila R, Mehendale HM. 1996. Strain differences in tissue repair response to 1,2-dichlorobenzene. Arch Toxicol 70:714-723.
Lee CH, Olson P, Evans RM. 2003. Minireview: lipid metabolism, metabolic diseases, and peroxisome proliferator-activated receptors. Endocrinology 144:2201-2207.
Lindon JC, Nicholson JK, Holmes E, Antti H, Bollard ME, Keun H, et al. 2003. Contemporary issues in toxicology: the role of metabonomics in toxicology and its evaluation by the COMET project. Toxicol Appl Pharmacol 187:137-146.
Liu J, Yang CF, Lee BL, Shen HM, Ang SG, Ong CN. 1999. Effect of Salvia miltiorrhiza on aflatoxin [B.sub.1]-induced oxidative stress in cultured rat hepatocytes. Free Radic Res 31:559-568.
Masubuchi Y, Saito H, Horie T. 1998. Structural requirements for the hepatotoxicity of nonsteroidal anti-inflammatory drugs in isolated rat hepatocytes. J Pharmacol Exp Ther 287:208-213.
Matthews BW. 1975. Comparison of the predicted and observed secondary structure of T4 phage lysozyme. Biochim Biophys Acta 405:442-451.
Panda S, Antoch MP, Miller BH, Su AI, Schook AB, Straume M, et al. 2002. Coordinated transcription of key pathways in the mouse by the circadian clock. Cell 109:307-320.
Ramaswamy S, Ross KN, Lander ES, Golub TR. 2003.
A molecular signature of metastasis in primary solid tumors. Nat Genet 33:49-54.
Ramaswamy S, Tamayo P, Rifkin R, Mukherjee S, Yeang CH, Angelo M, et al. 2001. Multiclass cancer diagnosis using tumor gene expression signatures. Proc Natl Acad Sci USA 98:15149-15154.
Ruepp S, Tonge RP, Shaw J, Wallis N, Pognan F. 2002. Genomics and proteomics analysis of acetaminophen toxicity in mouse liver. Toxicol Sci 65:135-150.
Scholkopf B, Guyon IM, Weston J. 2003. Statistical learning and kernel methods in bioinformatics. In: Artificial Intelligence and Heuristic Methods in Bioinformatics (Frasconi P, Shamir R, eds). Amsterdam:IOS Press, 1-21.
Scholkopf B, Smola A. 2002. Learning with Kernels. Cambridge, MA:MIT Press.
Smela ME, Currier SS, Bailey EA, Essigmann JM. 2001. The chemistry and biology of aflatoxin B(1): from mutational spectrometry to carcinogenesis.
Carcinogenesis 22:535-545.
Storch KF, Lipan O, Leykin I, Viswanathan N, Davis FC, Wong WH, et al. 2002. Extensive and divergent circadian gene expression in liver and heart. Nature 417:78-83.
Suter L, Haiker M, De Vera MC, Albertini S. 2003. Effect of two 5-HT(6) receptor antagonists on the rat liver: a molecular approach. Pharmacogenomics J 3:320-334.
Thomas RS, Rank DR, Penn SG, Zastrow GM, Hayes KR, Pande K, et al. 2001. Identification of toxicologically predictive gene sets using cDNA microarrays. Mol Pharmacol 69:1189-1194.
Unger PD, Mehendale HM, Hayes AW. 1977. Hepatic uptake and disposition of aflatoxin [B.sub.1] in isolated perfused rat liver. Toxicol Appl Pharmacol 41:523-534.
Van de Vijver MJ, He YD, van't Veer LJ, Dai H, Hart AA, Voskuil DW, et al. 2002. A gene-expression signature as a predictor of survival in breast cancer. N Engl J Med 347:1999-2009.
Vapnik VN. 1998. Statistical Learning Theory.
Vomel T, Platt D. 1986. Age-dependent phagocytosis of erythrocytes by the isolated perfused rat liver after galactosamine hepatitis and alpha-naphthylisothiocyanate cholestasis. Arch Gerontol Geriatr 5:351-539.
Wang Y, Rea T, Bian J, Gray S, Sun Y. 1999. Identification of the genes responsive to etoposide-induced apoptosis: application of DNA microarray technology. FEBS Lett 445:269-273.
Waring JF, Ciurlionis R, Jolly RA, Heindel M, Ulrich RG. 2001a. Microarray analysis of hepatotoxins in vitro reveals a correlation between gene expression profiles and mechanisms of toxicity. Toxicol Lett 120:359-368.
Waring JF, Jolly RA, Ciurlionis R, Lum PY, Praestgaard JT, Morfitt DC, et al. 2001b. Clustering of hepatotoxins based on mechanism of toxicity using gene expression profiles. Toxicol Appl Pharmacol 175:28-42.
Yeang C, Ramaswamy S, Tamayo P, Mukherjee S, Rifkin RM, Angelo M, et al. 2001. Molecular classification of multiple tumor types. Bioinformatics 17:316-322.
Zhang W, Wang H, Song SW, Fuller GN. 2002. Insulin-like growth factor binding protein 2: gene expression microarrays and the hypothesis-generation paradigm. Brain Pathol 12:87-94.
Zimmermann N, King NE, Laporte J, Yang M, Mishra A, Pope SM, et al. 2003. Dissection of experimental asthma with DNA microarray analysis identifies arginase in asthma pathogenesis. J Clin Invest 111:1863-1874.
Guido Steiner, (1,2) Laura Suter, (1) Franziska Boess, (1) Rodolfo Gasser, (1) Maria Cristina de Vera, (1) Silvio Albertini (1)
(1) Non-Clinical Drug Safety and Bioinformatics, F. Hoffmann-La Roche Ltd., Basel, Switzerland
Address correspondence to S. Ruepp, F. Hoffmann-La Roche Ltd., PRBN-S (90/5.18), CH-4070 Basel, Switzerland. Telephone: 41 61 688 3315. Fax: 41 61 688 8101. E-mail: stefan.ruepp@roche.com
Supplemental data is available online (http://ehp.niehs.nih.gov/txg/members/2004/7036/7036supplement.pdf).
We thank M. Haiker, N. Flint, S. Romer, K. Rupp, K. Schad, and C. Zihlmann for their excellent technical support and the General Toxicology group for their support. We are also deeply indebted to C. Broger, M. Neeb, B. Gaisser, and D. Wolf from the Bioinformatics group for their excellent support.
The authors declare they have no competing financial interests.
Received 17 February 2004; accepted 1 July 2004.
Table 1. Histopathology and clinical chemistry results for rats
included in the SVM training set.
Substance; dose; CAS no.; supplier
  Vehicle/route of administration; expected binary class/4-MOT class
  Liver histopathology
  Serum clinical chemistry

Aflatoxin [B.sub.1]; 4 mg/kg, 24 hr; 1162-65-8; Sigma
  Saline + 0.5% DMSO/ip; Toxic/direct
  Liver histopathology: Hepatocellular hypertrophy, apoptosis, inflammation, glycogen depletion, bile duct proliferation
  Serum clinical chemistry: Increased bile acids, bilirubin, AST, ALT, LDH, ALP, 5'-NT
Bromobenzene; 3 mmol/kg, 24 hr; 108-86-1; Aldrich
  Corn oil/ip; Toxic/direct
  Liver histopathology: Centrilobular to midzonal hepatocellular hydropic swelling, necrosis with mixed inflammation
  Serum clinical chemistry: Increased bilirubin, 5'-NT, albumin; decreased triglycerides
Carbon tetrachloride (C[Cl.sub.4]); 2 mg/kg, 24 hr; 56-23-5; Fluka
  Corn oil/po; Toxic/direct
  Liver histopathology: Hepatocellular degeneration, single-cell necrosis, inflammation, microvesicular steatosis
  Serum clinical chemistry: Increased GGT, liver triglycerides; decreased glucose, albumin
Hydrazine; 60 mg/kg, 24 hr; 302-01-2; Sigma
  Saline/ip; Toxic/direct
  Liver histopathology: Hepatocellular necrosis with inflammation, mild microvesicular steatosis
  Serum clinical chemistry: Increased 5'-NT
Thioacetamide; 50 mg/kg, 24 hr; 62-55-5; Sigma-Aldrich
  Saline/ip; Toxic/direct
  Liver histopathology: Hepatocellular vacuolation and necrosis
  Serum clinical chemistry: Increased GGT, AST, ALT, ALP, 5'-NT; decreased glucose, triglycerides, cholesterol, protein
1,2-Dichlorobenzene; 4,500 mmol/kg, 24 hr; 95-50-1; Fluka
  Corn oil/ip; Toxic/direct
  Liver histopathology: Centrilobular to midzonal hepatocellular hydropic swelling, necrosis with mixed inflammation
  Serum clinical chemistry: Increased ALP, albumin; decreased triglycerides
Coumarin; 200 mg/kg, 24 hr; 91-64-5; Sigma
  Corn oil/po; Toxic/direct
  Liver histopathology: Hepatocellular hypertrophy, single-cell necrosis, lymphocytic infiltration
  Serum clinical chemistry: Increased total protein, GLD
Acetaminophen; 2 g/kg, 24 hr; 103-90-2; Fluka
  Saline + 0.5% DMSO/po; Toxic/direct
  Liver histopathology: Centrilobular hepatocellular vacuolation, single-cell necrosis, inflammation
  Serum clinical chemistry: Increased albumin; decreased triglycerides
Amineptine; 0.5 mmol/kg/day, 2 days; 57574-09-1; Servier Laboratories
  Saline/ip; Toxic/steatosis
  Liver histopathology: Hepatocellular microvesicular steatosis, glycogen depletion
  Serum clinical chemistry: Increased GGT, ALP, cholesterol; decreased triglycerides
Amiodarone; 100 mg/kg/day, 4 days; 1951-25-3; Sigma
  7.5% gelatine/ip; Toxic/steatosis
  Liver histopathology: Hepatocellular microvesicular steatosis, glycogen depletion
  Serum clinical chemistry: Increased GGT, 5'-NT; decreased serum and increased liver triglycerides
Rx74 (Antidiabetic); 250 mg/kg/day, 5 days; Not available; Roche
  Klucel/po; Toxic/steatosis
  Liver histopathology: ND (a)
  Serum clinical chemistry: ND (a)
Rx75 (Antidiabetic); 100 mg/kg/day, 5 days; Not available; Roche
  Klucel/po; Toxic/steatosis
  Liver histopathology: ND (a)
  Serum clinical chemistry: ND (a)
Rx10 (Antidiabetic); 500 mg/kg/day, 5 days; Not available; Roche
  Klucel/po; Toxic/steatosis
  Liver histopathology: ND (b)
  Serum clinical chemistry: ND (b)
Rx99 (5-[HT.sub.6] antagonist); 400 mg/kg/day, 14 days; Not available; Roche
  [H.sub.2]O/po; Toxic/steatosis
  Liver histopathology: Hepatocellular microvesicular steatosis
  Serum clinical chemistry: Increased ALT, GGT; increased liver lipids and phospholipids
Chlorpromazine 1; 15 mg/kg, 6 hr; 69-09-0; Sigma
  Saline/iv; Toxic/cholestasis
  Liver histopathology: ND
  Serum clinical chemistry: Increased bilirubin, glucose; decreased triglycerides
Chlorpromazine 2; 15 mg/kg, 6 hr; 69-09-0; Sigma
  Saline/iv; Toxic/cholestasis
  Liver histopathology: Hepatocellular microvesicular steatosis, glycogen depletion
  Serum clinical chemistry: Increased glucose; decreased triglycerides, protein
Cyclosporin A; 30 mg/kg, 6 hr; 59865-13-3; Alexis
  10% intralipid/iv; Toxic/cholestasis
  Liver histopathology: NSF
  Serum clinical chemistry: Increased bile acids, bilirubin, GGT
Glibenclamide; 25 mg/kg, 6 hr; 10238-21-8; Roche
  7.5% gelatine/iv; Toxic/cholestasis
  Liver histopathology: Hepatocellular hypertrophy
  Serum clinical chemistry: Increased ALT; decreased glucose
Phalloidin; 0.8 mg/kg, 6 hr; 17466-45-4; Sigma
  Saline/iv; Toxic/cholestasis
  Liver histopathology: Hepatocellular necrosis, hemorrhage, glycogen depletion
  Serum clinical chemistry: Increased bilirubin, bile acids, 5'-NT, ALP, AST, ALT, LDH, SDH; decreased cholesterol, phospholipids
Methylene dianiline; 100 mg/kg, 6 hr; 101-77-9; Fluka
  Corn oil/po; Toxic/cholestasis
  Liver histopathology: Single-cell necrosis of bile duct epithelium, inflammation
  Serum clinical chemistry: Increased bilirubin, bile acids, GGT, 5'-NT, glucose, phospholipids
WY14643; 250 mg/kg, 14 days; 50892-23-4; Sigma-Aldrich
  Corn oil/po; Toxic/PP
  Liver histopathology: Increased hepatocellular mitoses, slight glycogen depletion, increased liver weight (7 days)
  Serum clinical chemistry: Increased ALP, glucose, SDH
Rx90 (PPAR-[delta] agonist); 180 mg/kg/day, 14 days; Not available; Roche
  PBS/po; Toxic/PP
  Liver histopathology: Liver enlargement, diffuse hepatocellular hypertrophy
  Serum clinical chemistry: Increased AST, ALT
Rx53 (PPAR-[alpha],[gamma] co-agonist); 0.9 mg/kg/day, 14 days; Not available; Roche
  PBS/po; Toxic/PP
  Liver histopathology: Increased liver weight, hepatocellular hypertrophy and cytoplasmic granulation
  Serum clinical chemistry: Decreased cholesterol, protein
Rx60 (PPAR-[alpha],[gamma] co-agonist); 1.5 mg/kg/day, 14 days; Not available; Roche
  PBS/po; Toxic/PP
  Liver histopathology: Increased liver weight, hepatocellular hypertrophy and cytoplasmic granulation, increased mitoses, single-cell necrosis with mixed inflammation
  Serum clinical chemistry: Increased serum ALP; decreased protein, bilirubin
Rx51 (PPAR-[alpha],[gamma] co-agonist); 0.5 mg/kg/day, 14 days; Not available; Roche
  PBS/po; Toxic/PP
  Liver histopathology: Increased liver weight, hepatocellular hypertrophy and cytoplasmic granulation
  Serum clinical chemistry: Increased ALP; decreased cholesterol, bilirubin, protein
Rx50 (PPAR-[alpha],[gamma] co-agonist); 4 mg/kg/day, 14 days; Not available; Roche
  PBS/po; Toxic/PP
  Liver histopathology: Increased liver weight, hepatocellular hypertrophy and cytoplasmic granulation
  Serum clinical chemistry: Increased ALP, glucose; decreased protein, bilirubin, cholesterol
Abbreviations: DMSO, dimethylsulfoxide; ND, not done; NSF, no
significant findings; PBS, phosphate-buffered saline; PP, peroxisome
proliferator. (a) No clinical chemistry or histopathology data were
available from animals used for gene profiling, but repeated dosing
with this compound in animals used for other measurements resulted
in microvesicular steatosis. (b) No clinical chemistry or
histopathology data were available from animals used for gene
profiling. Microvesicular steatosis was not detected in rats with
this treatment schedule. However, in vitro treatment of primary rat
hepatocytes inhibited [beta]-oxidation and resulted in fat accumulation.
Table 2. Performance of the toxic/nontoxic models and summarized
results of the binary (toxic/nontoxic) classification. (a)
Arrays/groups for classification   v-SVM                          C-SVM
Classification under external CV
26 treatment groups                20 of 26 treatments correct    22 of 26 treatments correct
116 arrays                         89 of 116 arrays correct       90 of 116 arrays correct
34 control groups                  32 of 34 groups correct        32 of 34 groups correct
163 arrays                         154 of 163 arrays correct      154 of 163 arrays correct
Classification of test set
19 treatment groups                16 of 19 treatments correct    17 of 19 treatments correct
91 arrays                          74 of 91 arrays correct        74 of 91 arrays correct
63 control groups                  63 of 63 (all groups correct)  63 of 63 (all groups correct)
332 arrays                         322 of 332 arrays correct      327 of 332 arrays correct
(a) During RFE, the least informative 5% of genes were removed in
each iteration starting with all features (genes) down to 64 genes.
After that, only a single gene was removed in one step. The number
of features finally selected was 63 for the v-SVM and 228 for the
C-SVM. In the case of v-SVM, RFE was carried out with v = 0.1. The
optimized v of the selected iteration (using 63 genes) is 0.203. For C-SVM we
set C to 0.008 during RFE and ended up with C = 0.00429 for the
selected iteration. Both SVMs were equally successful in classifying
vehicle controls, but the C-SVM was slightly better in identifying
toxic treatments.
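The elimination schedule described in footnote (a) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name rfe_schedule, its parameters, the rounding of the 5% step, and the choice to continue down to a single gene are all hypothetical, and the criterion used to rank genes by informativeness is not shown.

```python
def rfe_schedule(n_features, fraction=0.05, switch_at=64):
    """Return the feature-set sizes visited during recursive feature elimination.

    Assumed schedule: while more than `switch_at` features remain, drop
    `fraction` of them per iteration (rounded down, at least one, never
    going below `switch_at`); afterwards drop one feature per iteration.
    """
    sizes = [n_features]
    n = n_features
    while n > 1:
        if n > switch_at:
            # Coarse phase: remove a fixed fraction of the remaining features.
            step = max(1, int(n * fraction))
            n = max(switch_at, n - step)
        else:
            # Fine phase: remove a single feature per iteration.
            n -= 1
        sizes.append(n)
    return sizes


if __name__ == "__main__":
    sizes = rfe_schedule(1000)  # 1000 is a made-up starting feature count
    print(len(sizes), "feature-set sizes; first five:", sizes[:5])
```

In such a scheme, a model would be trained and assessed at each size in the list, and the iteration with the best performance selected.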
Table 3. Performance assessment of the five SVMs that form
the MOT model. (a)
Class        Features  CV specificity  CV sensitivity  CV MCC  Optimized parameter  Test specificity  Test sensitivity  Test MCC
Classification with [upsilon]-SVM (optimized parameter: [upsilon])
Direct       101       1               0.86            0.92    0.0377               1                 0.75              0.83
PP           4         1               1               1       0.01                 1                 1                 1
Cholestasis  19        0.99            0.60            0.71    0.0193               0.99              0.83              0.82
Steatosis    28        1               0.54            0.72    0.0744               1                 0.91              0.95
Control      122       0.78            0.94            0.75    0.111                0.84              0.98              0.86
Classification with C-SVM (optimized parameter: C)
Direct       38        1               0.84            0.90    0.0176               1                 0.75              0.83
PP           16        1               1               1       0.0222               1                 1                 1
Cholestasis  32        0.98            0.57            0.61    0.10                 0.98              0.83              0.81
Steatosis    50        0.99            0.67            0.75    0.00869              1                 0.91              0.95
Control      228       0.78            0.94            0.74    0.00429              0.80              0.98              0.83
(a) Results are shown for [upsilon]-SVM and C-SVM. The RFE procedure
was identical to that described in Table 2. The number of features
selected was typically smaller for [upsilon]-SVM than for C-SVM. Both types
of SVM were comparably successful in classification.
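For readers who want to reproduce the kind of figures reported in Table 3, the sketch below computes sensitivity, specificity, and MCC from confusion-matrix counts. It assumes that MCC denotes the Matthews correlation coefficient; the function name binary_metrics and the example counts are hypothetical and are not taken from the study.

```python
import math


def binary_metrics(tp, fp, tn, fn):
    """Return (sensitivity, specificity, MCC) from confusion-matrix counts."""
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention MCC is reported as 0 when any marginal total is zero.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return sensitivity, specificity, mcc


if __name__ == "__main__":
    # Hypothetical counts for one class-versus-rest classifier.
    sens, spec, mcc = binary_metrics(tp=6, fp=1, tn=99, fn=4)
    print(f"sensitivity={sens:.2f} specificity={spec:.2f} MCC={mcc:.2f}")
```

Unlike sensitivity or specificity alone, MCC uses all four cells of the confusion matrix, which makes it informative when classes are unbalanced, as they are when a small treatment class is compared against many controls.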
Table 4. Classification of individual microarrays and treatment
groups in training set and overview of test results for a
[upsilon]-SVM-based model discriminating between different MOTs. (a)
Treatment                     Expected toxicity category  CV accuracy (binary)  CV accuracy (4MOT)  Misclassification in 4MOT
Chlorpromazine 1              Cholestatic                 1/5 (b)               1/5 (b)             4 controls (b)
Chlorpromazine 2              Cholestatic                 4/5                   4/5                 1 control
Cyclosporin A                 Cholestatic                 4/5                   4/5                 1 control
Glibenclamide                 Cholestatic                 0/5 (b)               0/5 (b)             5 controls (b)
Methylene dianiline           Cholestatic                 5/5                   5/5                 -
Phalloidin                    Cholestatic                 3/5                   2/5                 1 direct acting, 2 controls
Aflatoxin [B.sub.1]           Direct acting               2/3                   1/3                 1 cholestatic, 1 control
1,2-Dichlorobenzene           Direct acting               5/5                   5/5                 -
APAP                          Direct acting               3/5                   3/5                 2 controls
Bromobenzene                  Direct acting               5/5                   5/5                 -
C[Cl.sub.4]                   Direct acting               5/5                   5/5                 -
Coumarin                      Direct acting               5/5                   5/5                 -
Hydrazine                     Direct acting               5/5                   5/5                 -
Thioacetamide 1               Direct acting               3/5                   3/5                 2 controls
Rx50 (PPAR-[alpha], [gamma])  PP                          5/5                   5/5                 -
Rx53 (PPAR-[alpha], [gamma])  PP                          2/4 (b)               2/4 (b)             2 controls (b)
Rx51 (PPAR-[alpha], [gamma])  PP                          5/5                   5/5                 -
Rx60 (PPAR-[alpha], [gamma])  PP                          5/5                   5/5                 -
WY14643                       PP                          5/5                   5/5                 -
Rx90 (PPAR-[delta])           PP                          5/5                   5/5                 -
Rx99 (5-[HT.sub.6])           Steatotic                   3/5                   3/5                 2 controls
Amineptine                    Steatotic                   4/5                   4/5                 1 control
Amiodarone                    Steatotic                   0/5 (b)               0/5 (b)             5 controls (b)
Rx74 (antidiabetic)           Steatotic                   3/3                   3/3                 -
Rx75 (antidiabetic)           Steatotic                   2/3                   2/3                 1 control
Rx10 (antidiabetic)           Steatotic                   3/3                   3/3                 -
(a) Predictions for individual microarrays and treatment groups as
a whole were obtained using different voting schemes described in the
text. A compound-based external CV method was used for the assessment
of model quality. The upper part of the table reports the number of
microarrays correctly classified under CV conditions, either with
correct mechanism of action predicted (column 4) or with at least a
toxic effect recognized (column 3). (b) Misclassifications.
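The compound-based splitting mentioned in footnote (a) can be sketched as below, assuming that "compound-based" means all microarrays from one compound are held out together. The function names compound_folds and majority_vote are hypothetical, and the simple majority vote is only a stand-in: the voting schemes actually used are described in the text and may differ.

```python
from collections import Counter


def compound_folds(array_compounds):
    """Yield (compound, train_idx, test_idx), holding out one compound at a time.

    `array_compounds[i]` names the compound that microarray i belongs to.
    All arrays of the held-out compound go to the test side together, so no
    compound contributes to both training and testing within a fold.
    """
    for compound in dict.fromkeys(array_compounds):  # unique, first-seen order
        test_idx = [i for i, c in enumerate(array_compounds) if c == compound]
        train_idx = [i for i, c in enumerate(array_compounds) if c != compound]
        yield compound, train_idx, test_idx


def majority_vote(array_predictions):
    """Return the most frequent array-level label as the group-level call."""
    return Counter(array_predictions).most_common(1)[0][0]


if __name__ == "__main__":
    # Made-up array annotation; the compound names are placeholders.
    compounds = ["A", "A", "B", "C", "C", "C"]
    for name, train_idx, test_idx in compound_folds(compounds):
        print(name, "train:", train_idx, "test:", test_idx)
    print(majority_vote(["toxic", "control", "toxic"]))
```

For the validation to count as external in the usual sense, feature selection and parameter tuning would be repeated on the training indices of every fold, so that the held-out compound never influences the model that predicts it.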
Table 5. Performance summary of the [upsilon]-SVM-based model
discriminating between different MOTs.
Arrays/groups for classification   Summary
26 treatment groups                20 of 26 treatment groups correct MOT identified
                                   22 of 26 treatment groups correctly identified as toxic
116 microarrays                    85 of 116 microarrays correctly classified
34 control groups                  33 of 34 groups correctly identified as vehicle controls
163 microarrays                    160 of 163 microarrays correctly classified
Classification of independent test set
19 treatment groups                15 of 19 treatment groups correct MOT identified
                                   15 of 19 treatment groups correctly identified as toxic
91 microarrays                     74 of 91 microarrays correctly classified
63 control groups                  63 of 63 (all groups correctly identified)
332 microarrays                    330 of 332 microarrays correctly classified