Systems toxicology and the chemical effects in biological systems (CEBS) knowledge base.

The National Center for Toxicogenomics is developing the first public toxicogenomics knowledge base that combines molecular expression data sets from transcriptomics, proteomics, metabonomics, and conventional toxicology with metabolic, toxicological pathway, and gene regulatory network information relevant to environmental toxicology and human disease. It is called the Chemical Effects in Biological Systems (CEBS) knowledge base and is designed to meet the information needs of "systems toxicology," involving the study of perturbation by chemicals and stressors, monitoring changes in molecular expression and conventional toxicological parameters, and iteratively integrating biological response data to describe the functioning organism. Based upon functional genomics approaches used successfully in analyzing yeast gene expression data sets, relational and descriptive compendia will be assembled for toxicologically important genes, groups of genes, single nucleotide polymorphisms (SNPs), and mutant and knockout phenotypes. CEBS data sets will be fully documented in the experimental protocol and therefore searchable by compound, structure, toxicity end point, pathology end point, gene, gene group, SNP, pathway, and network as a function of dose, time, and the phenotype of the target tissue. A knowledge base is being developed by assimilating toxicological, biological, and chemical information from multiple public domain databases and by progressively refining that information about gene, protein, and metabolite expression for classes of chemicals and their biological effects in various species. By analogy to the GenBank database for genome sequences, researchers will globally query (or BLAST) CEBS using a transcriptome of a tissue of interest (or a list of outliers) to have the knowledge base return information on genes, groups of genes, metabolic and toxicological pathways, and contextually associated phenotypic information for compounds that display similar response profiles. With high-quality data content, CEBS will ultimately become a resource to support hypothesis-driven and discovery research that contributes effectively to drug safety and the improvement of risk assessments for chemicals in the environment. The CEBS development effort will span a decade or more.

Key words: bioinformatics, compendia, database, global query, gene expression, heuristic algorithms, knowledge base, linkage-disequilibrium, metabonomics, microarray, molecular expression, ontologies, phenotype, phenotypic anchoring, proteomics, sequence, single nucleotide polymorphisms, systems biology, systems toxicology, toxicogenomics, transcription factors.

Environ Health Perspect 111:811-824 (2003). doi:10.1289/txg.5971 available via http://dx.doi.org/ [Online 7 November 2002]

Toxicogenomics is a new scientific field in which researchers study how the genome responds to environmental stressors or toxicants (Aardema and MacGregor 2002; Afshari 2002; Burchiel et al. 2001; Fielden and Zacharewski 2001; Hamadeh et al. 2002a; Nuwaysir et al. 1999; Olden 2002; Tennant 2002; Thomas et al. 2001; Ulrich and Friend 2002). It combines studies of genetics, genomic-scale mRNA expression (transcriptomics), cell and tissuewide protein expression (proteomics), metabolite profiling (metabonomics), and bioinformatics with conventional toxicology in an effort to understand the role of gene-environment interactions in disease. New molecular technologies such as DNA microarray analysis and protein chips can measure the expression of hundreds to thousands of genes and proteins at a time, providing the potential to accelerate discovery of toxicant pathways and specific chemical and drug targets. These new toxicogenomics methods have the potential to revolutionize the field of toxicology.
In recognition of this fact, the National Institute of Environmental Health Sciences (NIEHS) has created the National Center for Toxicogenomics (NCT; http://www.niehs.nih.gov/nct/concept.htm). The NCT has five major goals:

1) To facilitate the application of gene and protein expression technology
2) To understand the relationship between environmental exposures and human disease susceptibility
3) To identify useful biomarkers of disease and exposure to toxic substances
4) To improve computational methods for understanding the biological consequences of exposure and responses to exposure
5) To create a public database of environmental effects of toxic substances in biological systems

The NCT was formally established in September 2000 and is working to implement a strategy through which these goals can be achieved. This article is an initial response to goal 5. It delineates the conceptual framework and some major design considerations for the proposed Chemical Effects in Biological Systems (CEBS) knowledge base. The concept is open for discussion and debate.

Ideker et al. (2001) have used the phrase "systems biology" to describe the integrated study of biological systems at the molecular level, involving perturbation of systems, monitoring molecular expression, integrating response data, and modeling the system structure and function. Here we similarly use the phrase "systems toxicology" to describe the toxicogenomics evaluation of biological systems, involving perturbation by toxicants and stressors, monitoring molecular expression and conventional toxicological parameters, and iteratively integrating response data. CEBS will incorporate high-quality data sets from each of the new toxicogenomics technologies as well as from contemporary molecular and cellular toxicology.

The goals of CEBS are a) to create a reference toxicogenomic information system of studies on environmental chemicals/stressors and their effects; b) to develop relational and descriptive compendia on toxicologically important genes, groups of genes, single nucleotide polymorphisms (SNPs), and mutant and knockout phenotypes in animal models relevant to human health and environmental disease; and c) to support hypothesis-driven research and discovery research in environmental toxicology. We must approach these goals in an incremental fashion, recognizing that in the face of rapid technological change it is impossible to anticipate all opportunities and problems that can develop. The conceptual design framework for CEBS is based upon functional genomics approaches that have been used successfully in analyzing yeast gene expression data sets (Hughes et al. 2000). The proposed framework is illustrated in Figure 1.
[FIGURE 1 OMITTED]

Because CEBS will contain data on global gene expression, protein expression, metabolite profiles, and associated chemical/stressor-induced effects in multiple species (e.g., from yeast to humans), it will be possible to derive functional pathway and network information based on cross-species homology. CEBS data sets will be fully documented in experimental protocols and therefore searchable by compound, structure, toxicity end point, pathology end point, gene, gene group, etc., as a function of dose, time, and the condition of the target tissue. Controlled vocabularies, dictionaries, and descriptive explanatory text or metadata (that can be processed by a computer) will guide researchers in understanding toxicogenomics data sets. A knowledge base will be developed by carefully assimilating toxicological, biological, and chemical information from multiple public domain databases and by progressively refining that information about classes of chemicals and their biological effects in various species (Tennant 2002; Zweiger 1999). By analogy to the GenBank database for genome sequences, ultimately it will be possible to query the CEBS globally using a transcriptome of a tissue of interest (or a list of outliers from a gene expression analysis) to BLAST (Altschul et al. 1990) the knowledge base and have it return information on genes, groups of genes, metabolic and toxicological pathways, and associated phenotypic information observed in data sets for hits (i.e., compounds that display similar effects in multiple tissues and species, and the dose, time, and phenotypic severity with which these effects are observed).
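The global query described above can be illustrated with a minimal sketch. It assumes that each stored profile is a mapping from gene identifier to log2 expression ratio and that similarity is scored by Pearson correlation over shared genes; both are assumptions made for illustration, not the CEBS design. The function names, compound labels, and numeric values are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def query_compendium(query, compendium, top_n=3, min_shared=3):
    """Rank stored profiles by correlation with the query over shared genes."""
    hits = []
    for label, profile in compendium.items():
        shared = sorted(set(query) & set(profile))
        if len(shared) < min_shared:
            continue  # too few genes in common to compare meaningfully
        r = pearson([query[g] for g in shared], [profile[g] for g in shared])
        hits.append((label, r))
    hits.sort(key=lambda hit: hit[1], reverse=True)
    return hits[:top_n]


# Hypothetical compendium keyed by (compound, dose, time); all values invented.
COMPENDIUM = {
    ("compound_A", "high dose", "24 h"): {"g1": 2.0, "g2": -1.0, "g3": 0.5, "g4": 1.5},
    ("compound_B", "high dose", "24 h"): {"g1": -2.0, "g2": 1.0, "g3": -0.5, "g4": -1.5},
    ("compound_C", "low dose", "6 h"): {"g1": 0.1, "g2": 0.0},
}
QUERY = {"g1": 1.0, "g2": -0.5, "g3": 0.25, "g4": 0.75}
HITS = query_compendium(QUERY, COMPENDIUM)
```

Keying each stored profile by compound, dose, and time means that a hit carries its experimental context with it, which is the point of the query: the return value is not only a similar profile but the conditions under which it was observed.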
With the expected high-quality data content, CEBS will rapidly become an important scientific resource that provides users with the suite of tools needed to interpret toxicogenomics data and a toxicological reference information system with which to model biological responses across species. As compendia of expression profiles are indexed and compared to discern diagnostic signatures, it will become increasingly possible to characterize an unknown physical or chemical exposure by comparing its gene or protein expression profile to profiles in the database.

Joint research by scientists at the NIEHS Microarray Center (NMC) and Boehringer-Ingelheim Pharmaceuticals has shown that global gene expression profiles for chemicals from different mode-of-action classes can provide gene expression "signatures" of chemical exposures in male rats (Hamadeh et al. 2002b, 2002c). These studies were performed on acutely exposed animals, and the expression patterns appear to be representative of the adaptive or pharmacological activity of the chemicals. Using a small training set, Hamadeh et al. (2002d) were able to correctly ascertain chemical class signatures based on pattern recognition of genes induced acutely. This study, in essence, validated the toxicogenomics hypothesis that knowledge can be gained regarding the nature of blinded samples using an initial training set of chemicals.

NCT Intramural Research

Current NCT research aims to formally discriminate between "chemical signatures" reflecting early adaptive or pharmacological responses with no ensuing pathology and "effects signatures" that entail altered tissue steady state, toxicity, histopathology, or disease (Bartosiewicz et al. 2001). We are therefore developing learning sets of genomic profiling data for various classes of agents, with doses ranging from those that are pharmacologic to those that are toxic. We will also perform comparative studies that address cross-species differences in toxicological responses as well as susceptibility differences in human subgroups. The combined and integrated data on gene/protein/metabolite changes collected in the context of dose, time, target tissue, and phenotypic severity across species are providing the interpretive information needed to define the molecular basis for chemical toxicity and to model the resulting toxicological and pathological outcomes (Boorman et al. 2002). It will then be feasible to search for evidence of exposure or injury prior to any clinical or pathological manifestation, facilitating identification of early biomarkers of exposure, toxic injury, or susceptibility.

We anticipate that toxicogenomics research will lead to the identification, measurement, and evaluation of biomarkers that are more accurate, quantitative, and specific. These biomarkers will be recognized as important factors in a sequence of key events that will help to define the way in which specific chemicals or environmental exposures cause disease. In other words, toxicogenomics will help to delineate the mode of action of various classes of agents and the unique attributes of certain species and population subgroups that render them susceptible to toxicants as an important step in comparatively assessing potential human health risk (Farland 1992).

NCT intramural scientists are now performing additional proof-of-principle experiments designed to establish how effects signatures can be defined and to link the patterns of altered gene expression to specific parameters of well-defined, conventional indices of toxicity. For example, experiments are being designed to correlate gene expression patterns with liver pathologies such as hepatomegaly, hepatocellular necrosis, or inflammation. It is also possible to look for correlative patterns, for example, in enzyme levels, in liver, and in other tissues or cells such as blood. Changes in serum enzymes provide diagnostic markers of organ function that are commonly used in medicine and in toxicology. This "phenotypic anchoring" of gene expression data to conventional indices removes some of the subjectivity of conventional molecular expression analyses and helps to distinguish the toxicological signal from other gene expression changes that may be unrelated to toxicity, such as the varied pharmacological or therapeutic effects of a compound (Tennant 2002). Future NCT studies will define molecular perturbations caused by environmental chemicals in terms of phenotypic severity, dose, and time (Hamadeh et al. 2002b).
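One simple way to picture phenotypic anchoring is to correlate each gene's expression across animals with a conventional index measured in the same animals. The sketch below assumes a serum enzyme level as the index and Pearson correlation with a fixed cutoff as the selection rule; these choices, the gene names, and all values are hypothetical and are not drawn from the NCT experiments.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def anchor_genes(expression, phenotype, threshold=0.9):
    """Return (gene, r) pairs whose |r| with the phenotype meets the threshold."""
    anchored = []
    for gene, values in expression.items():
        r = pearson(values, phenotype)
        if abs(r) >= threshold:
            anchored.append((gene, r))
    # Strongest associations first, regardless of direction.
    anchored.sort(key=lambda item: abs(item[1]), reverse=True)
    return anchored


# Hypothetical serum enzyme levels, one value per animal (invented numbers).
SERUM_ENZYME = [40.0, 55.0, 120.0, 400.0, 900.0]
# Hypothetical log2 expression ratios for the same five animals.
EXPRESSION = {
    "gene_up": [0.1, 0.2, 0.6, 2.0, 4.4],
    "gene_down": [-0.1, -0.1, -0.5, -1.9, -4.6],
    "gene_flat": [0.3, -0.2, 0.1, -0.3, 0.2],
}
ANCHORED = anchor_genes(EXPRESSION, SERUM_ENZYME)
```

Genes that track the index in either direction are retained, while a gene whose expression varies independently of the index is set aside as a change that may be unrelated to toxicity.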
We will explore quantitative or absolute gene expression profiling (Dudley et al. 2002) and consider combining such an approach with physiologically based pharmacokinetic (PB/PK) and pharmacodynamic modeling. PB/PK modeling can be used to derive a quantitative estimate of target tissue dose at any time after treatment, thus creating the possibility of anchoring molecular expression profiles in internal dose as well as in time and phenotypic severity. Relationships among gene, protein, and metabolite expression may then be described as a function of the applied dose of an agent and the ensuing kinetic and dynamic dose-response behavior in various tissue compartments. In addition, the species under study and the interspecies and interindividual differences must be considered. With the aid of the knowledge systematically generated and assembled (Zweiger 1999) through literature mining, comparative analysis, and iterative biological modeling of molecular expression data sets over time, the adaptive responses of biological systems will be differentiated from those changes associated with or precedent to clinical or visible adverse effects. We anticipate that our understanding of mechanisms of toxicity and disease will improve as these new methods are used more extensively and toxicogenomics databases are developed more fully. The expected result will be the emergence of toxicology as an information science that will enable thorough analysis, iterative modeling, and discovery across biological species and chemical classes.
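The idea of anchoring expression profiles in internal dose can be sketched with a deliberately simplified stand-in for a PB/PK model. A true PB/PK model describes multiple physiologically parameterized compartments; the example below instead assumes a single well-mixed compartment with first-order elimination after a bolus dose, so that C(t) = (dose / V) exp(-kt). The dose, volume, half-life, and sampling times are hypothetical values chosen only to make the arithmetic easy to follow.

```python
import math


def concentration(dose_mg, volume_l, k_elim_per_h, t_h):
    """One-compartment bolus model: C(t) = (dose / V) * exp(-k * t), in mg/L."""
    return (dose_mg / volume_l) * math.exp(-k_elim_per_h * t_h)


def internal_dose_at_samples(sampling_times_h, dose_mg, volume_l, k_elim_per_h):
    """Map each tissue sampling time to the modeled concentration at that time."""
    return {
        t: concentration(dose_mg, volume_l, k_elim_per_h, t)
        for t in sampling_times_h
    }


# Hypothetical parameters (not measured values for any real compound).
HALF_LIFE_H = 2.0
K_ELIM = math.log(2) / HALF_LIFE_H  # first-order elimination rate constant
INTERNAL_DOSE = internal_dose_at_samples(
    [0, 2, 4, 8], dose_mg=100.0, volume_l=10.0, k_elim_per_h=K_ELIM
)
```

Each expression profile collected at a sampling time could then be annotated with the modeled concentration for that time, which is what allows profiles from different applied doses and times to be compared on an internal-dose axis.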
CEBS is being designed to meet the information and modeling requirements of an integrated systems toxicology illustrated conceptually in Figure 2.

[FIGURE 2 OMITTED]

A key priority for NCT intramural toxicogenomics studies is the profiling of specific compounds and disease processes that lead to target organ toxicities (e.g., hepatotoxicity, nephrotoxicity). These studies will entertain the following considerations, and emphasis will be on the early steps in the disease processes. Multiple compounds that elicit a particular hepatotoxicity or nephrotoxicity will be studied at multiple sampling times after exposure. Subtoxic as well as toxic doses will be used, and nontoxic isomers and related compounds will be included to assess the specificity of effects observed. Drugs and chemicals will be selected for study on the basis of criteria such as human exposure and recent toxicology studies demonstrating consistent cross-species effects. Ideally, a drug will show a therapeutic effect, and chemicals will display mechanism(s) of toxicity that are prototypical for other agents, including those in our proof-of-principle studies.

For example, acetaminophen, or paracetamol, is the first agent to be studied comprehensively by the NCT. Selection was based on an extensive literature (Bessems and Vermeulen 2001) showing that liver toxicity from this agent is a common response in rodents and in humans; its metabolism is similar in rodents and in humans; it displays both therapeutic and toxic effects; and there are opportunities for clinical investigation. Furthermore, acetaminophen has been studied by several laboratories using toxicogenomic methods (Cunningham et al. 2000; Reilly et al. 2001a, 2001b; Ruepp et al. 2002; Yamazaki et al. 2002), which offers the possibility of comparative assessment of observed molecular expression, toxicology, and pathology.

Toxicogenomics Research Partnerships

The magnitude and complexity of the science underlying the broad goals of the NCT are such that no one organization has the technical, fiscal, or intellectual resources to accomplish them alone. A central strategy of the NCT, therefore, is the development of partnerships with universities, other federal research and regulatory agencies, and the private sector through the formation of consortia that will address critical scientific challenges in toxicogenomics. The NCT is, in fact, a synergistic collaboration between intramural and extramural scientists based on research partnerships. Operating under a National Institutes of Health cooperative agreement mechanism, the Toxicogenomics Research Consortium (TRC) is a key model for achieving the strategic objectives of the NCT.
The TRC consists of five academic centers in addition to the NMC: University of North Carolina at Chapel Hill; Fred Hutchinson Cancer Research Center; Oregon Health and Science University, Portland, Oregon; Duke University, Durham, North Carolina; and Massachusetts Institute of Technology, Cambridge, Massachusetts. The consortium members provide specialized expertise in gene expression profiling and bioinformatics; they will perform both independent and cooperative research on various aspects of toxicogenomics. In the current state of gene expression technology, various methodologies for arraying genes and assessing mRNA expression, as well as multiple bioinformatics tools, are being applied in the analysis and management of such data. Therefore, an initial goal of the TRC is to perform a series of "standardization" experiments for gene expression in order to address sources of variation, develop standard practices, and establish data quality criteria and bioinformatics standards. Initial proof-of-principle experiments are being performed to assess the ability of the consortium members to perform standardized toxicogenomics experiments and to exchange and interpret data across multiple microarray platforms. Data generated from such experiments will be incorporated into the CEBS knowledge base and ultimately will be used to design further hypothesis-driven research.
The TRC will build on these standardization experiments in performing additional collaborative studies to investigate molecular responses to various environmental stressors. These efforts of the TRC will make a unique contribution to the field of toxicogenomics and to the quality of the CEBS knowledge base.

The NCT participates in a second consortium that addresses many of the same platform and bioinformatics issues as the TRC: the Health and Environmental Sciences Institute of the International Life Sciences Institute (ILSI/HESI) (http://hesi.ilsi.org/activities/index.cfm?pubentityid=8). ILSI/HESI is coordinating the efforts of approximately 30 pharmaceutical companies in a worldwide effort to harmonize cross-platform gene expression data and analysis methods. The ILSI Genomics Project is focusing on three categories of toxicants: in vivo hepatotoxins, in vivo nephrotoxins, and in vitro genotoxins. The NCT is involved in the first two categories of study, in which animals were dosed, tissues were taken for histopathology and RNA extraction, and RNA samples were then distributed to participating laboratories for microarray analysis using methods chosen by the respective participating laboratories. This type of collaboration will minimize problems associated with RNA extraction and quality control issues and provide a basis for direct comparisons among various microarray platforms.
CEBS will house data from NIEHS intramural and extramural research programs and will accept high-quality data sets from other federal, academic, and industrial partners. For example, through the courtesy of Abbott Laboratories, Rosetta Inpharmatics, Rosetta Biosoftware, and Merck Pharmaceuticals, a set of data from hepatotoxicity experiments on more than 60 chemicals and drugs (52 hepatotoxins) is being made available to the NCT (Waring et al. 2003). By agreement with these private sector partners, this learning data set will be made publicly available via CEBS to the research community.

Microarray Analysis

Microarray data resulting from intramural NCT toxicogenomics experiments are currently captured in the NIEHS MicroArray Project System (MAPS). MAPS is a laboratory management information system developed at NIEHS (Bushel et al. 2001) in which approximately 40 data fields are defined to a) manage microarray project information; b) detail experimental design; c) track clones, sample preparation, labeling and hybridization; and d) survey the quality control and assurance of processed microarray chips. The NMC currently produces Yeast Chip v. 1 (6.2 K clones) and four mammalian chips: the Human ToxChip v. 3. (2.2 K clones), the Rat ToxChip v. 2. (6.8 K clones), and human and mouse oligonucleotide discovery chips (17.0 K and 16.0 K oligonucleotides, respectively).
Gene accession numbers for each gene or expressed sequence tag (EST) on each chip are automatically updated biweekly from http://www.ncbi.nlm.nih.gov/UniGene/ to reflect the current National Center for Biotechnology Information (NCBI) UniGene build. Hundreds of thousands of novel EST sequences have been included in NCBI's UniGene. NMC cDNA chips include a substantial proportion of ESTs, thus offering the potential to discover novel genes involved in important biological or toxicological outcomes and disease processes. To provide some perspective on the information management requirements of gene expression analysis, we have illustrated in Figure 3 the image and data analysis processes for microarray experiments.

[FIGURE 3 OMITTED]

Implementation of a CEBS Prototype

With the assistance of the NIEHS Computer Technology Branch, the NCT is currently implementing a prototype version of the CEBS database through the application and integration of software developed for the NMC and the National Toxicology Program (NTP). Toxicology and pathology data from intramural NCT toxicogenomics experiments are currently being captured in the NTP's Toxicology Database Management System (TDMS) in an Oracle database and are being integrated with microarray gene expression data (Figure 4).

[FIGURE 4 OMITTED]

Prototype CEBS (Model A) will be a temporary workbench for concept definition and systems integration in the development of CEBS. Nevertheless, this model will provide early public web access to NCT data sets and will implement software applications and statistical server routines required to analyze microarray data and associated toxicological information. It will provide MIAME (minimal information about a microarray experiment) (Brazma et al. 2001), supporting the MIAME standard of the Microarray Gene Expression Database (MGED) Society (http://www.mged.org/). The underlying motivation for MIAME is to enable the establishment of public repositories for microarray data and to serve as a basis for designing a microarray data exchange format or markup language (microarray gene expression markup language, MAGE-ML).
Many additional database standards are under review for use in the development of CEBS, but perhaps the most important ones are those under the purview of MGED. MGED has expert working groups on a) experimental description and data representation standards; b) microarray data extensible markup language (XML) exchange format (CEBS will use XML for data exchange); c) ontology (Karp 2000) for sample description (CEBS will follow the gene ontologies of the Gene Ontology (GO) Consortium at http://www.geneontology.org/ for biological process, molecular function, and cellular component); d) normalization, quality control, and cross-platform comparison; and e) future user group queries, query language, and data mining, all of which will provide important input for the development of CEBS. In addition, MGED has developed the microarray gene expression-object model (MAGE-OM), which will be used to develop proteomics and toxicology object models for CEBS. The NIEHS Scientific Computing Laboratory is currently pursuing requirements definition and object modeling for proteomics and toxicology to facilitate a seamless future integration of gene, protein, and toxicology/pathology databases.

It should be noted also that the TRC is fully operational and is presently receiving database and bioinformatics support through Srinivasa Nagalla at the Oregon Health and Science University (OHSU) (http://medir.ohsu.edu/~geneview/). OHSU has implemented an Oracle version of GeneX developed at the National Center for Genome Resources. The resource contractors who will support the TRC will come on line early in 2003. They will begin to receive samples from the TRC and to provide data sets to the CEBS prototype in 2004. With the simultaneous development of the NCT proteomics resource contract, the metabonomics research effort, and further expansion of NCT programs, data, and information resources, the CEBS prototype will begin to evolve into the CEBS knowledge base.

Systems Toxicology--Bioinformatics and Interpretive Challenges

To develop a toxicogenomics knowledge base that will support the requirements of systems toxicology, we must address bioinformatics and interpretive challenges at multiple levels of biological organization and phenotypic severity. Figure 5 illustrates some of these challenges as molecular expression analysis is used to monitor the sequential adaptive, pharmacological, toxicological, and pathological events observable in biological systems after exposure to a chemical.

[FIGURE 5 OMITTED]

The lower levels of complexity (genes, gene groups, functional pathways) reflect our current levels of understanding and our ability to describe and package that knowledge using what might be termed "linear bioinformatics." In fact, risk assessors seek to define a sequence of key events and common (linear) modes of action for environmental chemicals and drugs (Farland 1992, 1996; Larsen et al. 2000). The networks and systems level of biological organization reflects global bioinformatics challenges, wherein the cell expresses global change constantly in response to environmental stimuli. This is a systems biology reality that can only be addressed using fully context-documented toxicogenomics data sets properly assembled with appropriate statistical and mathematical modeling to develop an integrated systems toxicology. However, a substantial amount of data entry, data processing, and knowledge building must be performed before such advanced bioinformatics approaches can be applied.
It should be recognized that the development of a knowledge base to accurately reflect global molecular expression and to aid systems biological interpretation is a complex issue dealt with only superficially in the present discussion. Keeping these challenges and concepts in mind, we now present some conceptual arguments regarding the phased development of the CEBS knowledge base--a process that undoubtedly will require a decade or more to complete. Progress in the development of CEBS can be monitored at http://www.niehs.nih.gov/nct/.

Phased Development of the CEBS Knowledge Base

The CEBS knowledge base will be developed in four substantially overlapping phases: Phase I involves the gathering of microarray gene expression, toxicology and pathology data, and development of gene and protein annotation and bioinformatics tools. Phase II incorporates corresponding proteomics data sets with similar annotation and bioinformatics tools and develops a temporary proteomics database. Phase III integrates gene, protein, and (ideally) metabolite databases and links them with numerous internet resources for metabolic and functional pathway discovery. Phase IV adds two additional databases, one on gene and protein groups and one on SNPs, to what has been described above. The three databases then are integrated with a series of bioinformatics tools (data and literature mining) and computational algorithms designed to generate new knowledge.
CEBS Phase I: Microarray Gene Expression Data, Toxicology/Pathology Data, and Associated Analysis Tools

CEBS Phase I will be a public toxicogenomics database containing data sets from the TRC, the intramural NCT research program, and from industrial and governmental partners. It will comprise mainly microarray and toxicology data and information. To assist the TRC in populating CEBS with microarray data, the NCT awarded a resource contract to provide access to high-throughput microarray gene expression analysis. As illustrated in Figure 6, CEBS Phase I will track all microarray technical and experimental components relating to chip design, construction, data acquisition, image analysis, and data analysis. It will also track clone set gene sequences, descriptors and other genomic annotations, and associated toxicological/pathological end points, and will provide basic bioinformatics tools for data analysis and biological interpretation.

[FIGURE 6 OMITTED]

CEBS Phase I will be protocol driven. All data sets within CEBS will be linked by reference to an experimental protocol number and metadata that will specify standard operating procedures, observations, and measurements to be recorded.
CEBS Phase I will include complete sample annotation (e.g., sample name, organism, biosource provider, sample source, developmental stage, age and units, time points, organ/tissue, growth conditions, medium, culture temperature, genetic variation, individual name or ID, disease state, additional clinical information and units, target cell type, cell line, treatment application, treatment type, separation technique, sample extraction method, amplification method, label, etc.). All the data types (numbers, graphs, observations, images, etc.) will be related by the experimental protocol. The data to be stored and their location will be similarly identified in the process of defining the experimental protocol, as will reports to be generated and analyses to be performed. The purpose of this high degree of context documentation is to facilitate extensive query and biological interpretation. Domain-specific metadata will introduce experimental data sets in each analytical domain: transcriptomics, toxicology, pathology, etc. CEBS Phase I will incorporate raw microarray image files as well as fully processed outlier gene lists together with appropriate visualization tools. Results will be displayed or juxtaposed in various "views," or graphic user interfaces, that will provide insights, facilitate further analysis, and suggest new hypotheses to test.
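The protocol-driven design described above can be illustrated with a small sketch in which every data record, whatever its type or analytical domain, carries a reference to the experimental protocol that governs it. This is only an illustration under stated assumptions: the class names, field names, and the protocol number format are hypothetical and are not taken from the CEBS schema.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Protocol:
    """Experimental protocol that every data record refers back to."""
    protocol_id: str         # hypothetical protocol number format
    sops: List[str]          # standard operating procedures
    measurements: List[str]  # observations and measurements to be recorded


@dataclass
class DataRecord:
    """One stored item (number, graph, observation, image, ...)."""
    protocol_id: str  # link to the governing protocol
    domain: str       # e.g. "transcriptomics", "toxicology", "pathology"
    data_type: str    # e.g. "image", "number", "observation"
    location: str     # where the data are stored


def records_for_protocol(records: List[DataRecord],
                         protocol_id: str,
                         domain: Optional[str] = None) -> List[DataRecord]:
    """Return all records tied to one protocol, optionally for one domain."""
    return [
        r for r in records
        if r.protocol_id == protocol_id and (domain is None or r.domain == domain)
    ]
```

Because the protocol number is the only join key, a query across domains (for example, all pathology observations that belong to the same study as a given microarray image) reduces to a filter on that one field.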
CEBS also will access biological, chemical, and toxicological resources in public domain databases, as well as pathway information such as that available in the Kyoto Encyclopedia of Genes and Genomes (KEGG) at http://www.genome.ad.jp/kegg/ (Ogata et al. 1999) and What Is There? (WIT) at http://wit.mcs.anl.gov/WIT2/ (Selkov et al. 1998). Links will be built to other databases such as the European Bioinformatics Institute (EBI) ArrayExpress database (http://www.ebi.ac.uk/microarray/ArrayExpress/arrayexpress.html), the National Library of Medicine's Gene Expression Omnibus (GEO) Database at http://www.ncbi.nlm.nih.gov/geo/ (Edgar et al. 2002), and the NTP's new Oracle toxicology information bank.

To address the first of the bioinformatics and interpretive challenges mentioned above, basic gene annotation in CEBS Phase I will be largely automated; annotation resources will be routinely consulted to provide a complete range of updated gene/protein information. The process of gene annotation is illustrated in Figure 7, and some major biological data and information resources for gene annotation are shown in Table 1. The links for these annotation resources were operational at the time of publication of this article. However, please consult the NCT website for a current list of links (http://www.niehs.nih.gov/nct/).

[FIGURE 7 OMITTED]

Continuous refinement of gene annotation and sequence definition will improve the interoperability of cross-platform data sets (Zweiger 1999).
Steps for keeping sequence data current can be as follows: a) sequence all cDNA clone sets and refer to the known sequences of oligonucleotide sets; b) reference GenBank accession numbers and UniGene ID numbers for genes, and GenBank accession numbers and dbEST cluster ID numbers for ESTs; and c) reference TIGR Gene Indices (http://www.tigr.org/tdb/tgi.shtml) for EST or oligonucleotide consensus sequence (Quackenbush et al. 2001) and MegaBLAST against Trace Archives for genomes of interest. MegaBLAST against Trace Archives compares nucleotide sequence data against the current raw data underlying first-pass sequence generated by various genome sequencing centers. This is particularly important for the rat genome, which is presently very incomplete. This effort to derive new information about incomplete genomes will substantially enhance the discovery value of ESTs on cDNA chips and will facilitate cross-species investigation of gene/protein functional analogies, which we will discuss further.

Functional characterization presents a second bioinformatics and interpretive challenge. Functional characterization can involve the grouping of similar genes and gene products. A number of conventional means can accomplish this, including supervised and unsupervised classification/prediction, artificial intelligence, and various genetic algorithms, as well as a number of the annotation resources just discussed. We propose to use these methods and resources in concert with query of the scientific literature to develop knowledge of the function of genes and gene products.
Literature queries can facilitate gene annotation as well as biological interpretation of microarray expression results. The challenge is to deal not only with accepted microarray gene annotation names but also with legacy data in the earlier scientific literature, with the ultimate objective of making linkages of gene and protein annotations with literature on the basis of sequence information. MEDLINE, the most widely accessible repository of the biomedical literature, currently contains over 11 million abstracts and is growing rapidly. Unfortunately, it is difficult to use the gene name found in a nucleotide sequence database record (or as presented on a list of outliers) to search the biomedical literature effectively. The generation of names for genes and gene products based on sequence information is a significant challenge. Ultimately, genes and gene products must be linked by sequence data. Sequence-based synonym naming requires expertise in both data extraction and bioinformatics. Expertise in bioinformatics is required, as much of the searching will need to be done using BLAST (http://www.ncbi.nlm.nih.gov/BLAST/; Altschul et al. 1990).
Genomic BLAST pages are available for human, mouse, rat, zebrafish, and other eukaryotic and microbial genomes at the NCBI's BLAST website just mentioned. Nucleotide sequence databases, e.g., GenBank or UniGene, do not contain a "gene product" name field. Instead, the name is embedded in other information. For example, the GenBank nucleotide definition for "estrogen receptor 1" (the HUGO recognized name for this receptor) is "Homo sapiens estrogen receptor 1 (ESR1), mRNA." Extraction of the appropriate search terms "estrogen receptor 1" and "ESR1" from the GenBank definition is a trivial task for a single record, but it becomes intractable when a large number of genes or protein products are being searched in the literature, or when the process is being automated, as is being contemplated in the development of the CEBS knowledge base. To improve the interoperability between microarray gene annotation and the scientific literature, all genes in the clone lists are being provided with vetted name lists. By vetting, we mean that each gene name is searched in MEDLINE, and the way in which MEDLINE parses the name is examined to ensure that it is being searched in the desired manner.
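To illustrate why the extraction step is easy for one record yet fragile at scale, the following sketch pulls the gene product name and symbol out of a GenBank-style definition such as "Homo sapiens estrogen receptor 1 (ESR1), mRNA." The regular expression is an assumption made for this example; real definition lines vary much more widely (transcript variants, partial sequences, organisms with longer names), which is exactly the difficulty the text describes.

```python
import re

# Assumed pattern for the simplest definition lines only:
#   "<Genus species> <gene product name> (<SYMBOL>), mRNA."
DEFLINE = re.compile(
    r"^(?P<organism>[A-Z][a-z]+ [a-z]+) "      # binomial species name
    r"(?P<name>.+?) "                          # gene product name
    r"\((?P<symbol>[A-Za-z0-9\-]+)\), mRNA\.?$"  # gene symbol in parentheses
)


def parse_definition(definition):
    """Return (name, symbol) or None if the line does not fit the pattern."""
    match = DEFLINE.match(definition.strip())
    if match is None:
        return None
    return match.group("name"), match.group("symbol")
```

Lines that do not match return `None` rather than a guess, so that unparsed records can be set aside for manual vetting.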
For example, searching MEDLINE via Entrez (http://www.ncbi.nlm.nih.gov/Entrez/) with the query phrase "estrogen receptor 1" does not return any abstracts. Closer inspection of the search results indicates this is because this phrase does not occur in the MEDLINE phrase index. The vast literature (more than 10,000 abstracts) concerning this receptor is only accessible with the legacy names of "estrogen receptor" and "estrogen receptor alpha." Once name lists suitable for searching MEDLINE are available, we have two tools to help mine the literature data: OmniViz and PDQ_MED. OmniViz (Battelle Memorial Laboratory, Columbus, Ohio) is a global literature search and visualization software package that can help greatly in obtaining an overview of relevant biomedical publications. The proximity-of-data query software, InPharmix's PDQ_MED (Sluka 2002), can facilitate rapid access to relevant abstracts in MEDLINE for multiple genes (as from a list of outliers). In CEBS Phase I, a database of gene identifiers, gene sequence, and synonym names suitable for searching the scientific literature will be available; such a database is currently in beta test at NIEHS for human, mouse, rat, and yeast chips printed at the NMC.
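The idea of a vetted name list can be sketched as a simple mapping from the accepted gene name to the legacy names that actually retrieve abstracts, from which a literature query string is assembled. The dictionary contents come from the estrogen receptor example above; the function name and the plain quoted-phrase `OR` query format are assumptions for illustration, not the format used by the NIEHS database, OmniViz, or PDQ_MED.

```python
# Accepted name -> vetted legacy names known to retrieve abstracts.
# Only the entry from the example in the text is included here.
VETTED_NAMES = {
    "estrogen receptor 1": ["estrogen receptor", "estrogen receptor alpha"],
}


def literature_query(accepted_name, vetted=VETTED_NAMES):
    """Build an OR query of quoted phrases from the vetted names.

    Falls back to the accepted name itself when no vetted list exists.
    """
    names = vetted.get(accepted_name, [accepted_name])
    return " OR ".join('"{}"'.format(name) for name in names)
```

For the example in the text, `literature_query("estrogen receptor 1")` produces a query on the two legacy names instead of the HUGO name that returns no abstracts.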
A web interface to the database will be provided allowing CEBS users to enter a chip name and a list of gene IDs or GenBank accession numbers. The output from the interface will be the list of names suitable for searching in MEDLINE or for use with literature mining tools such as PDQ_MED or OmniViz. This is an important step toward improving the interoperability between microarray gene annotations and the scientific literature and ultimately toward building knowledge in CEBS.

CEBS Phase II: Protein Expression Database and Metabonomics Data Sets

The proteomics efforts within the NCT consist of an intramural research program, a proteomics resource contract, and extramural and innovative research grant awards in proteomics. The close association of the NCT microarray and proteomics research groups and the NTP provides a unique opportunity for integrating genomics, proteomics, and toxicology data sets. The proteomics group and mass spectrometry group perform hypothesis-driven research on differentially expressed proteins in key tissues and biological fluids of interest to toxicogenomics. A primary platform used by NCT proteomics research groups to separate and identify proteins is two-dimensional (2D) gel protein separation and mass spectrometry (MS), or 2D-MS. Analysis by 2D-MS creates protein maps where proteins for a specific tissue are organized by isoelectric point (pI) and molecular weight (MW).
To assist the NCT in populating CEBS with proteomics data, the NCT has awarded a proteomics resource contract that will allow access to high-throughput 2D-MS capabilities on an industrial scale. Critical target tissue and serum from toxicology studies are being analyzed for differential protein expression. As discussed earlier, a primary goal of NCT intramural and contract proteomic studies is biomarker discovery for proteins (including serum/plasma proteins) indicative of chemical exposure and/or to provide mechanistic insight into chemical toxicity. Therefore, concurrent analysis of serum/plasma is being performed in addition to specific target organs for each study. In addition to 2D-MS proteomics, a new platform called surface enhanced laser desorption ionization (SELDI) is being developed intramurally to screen serum from experimental animals and clinical sources to find new biomarkers (Issaq et al. 2002). Serum proteins are selectively bound to chemically active surfaces on SELDI biochips and are rapidly scanned with high mass accuracy.
The normalized serum mass spectra from chemical treatment or disease groups can be compared for differences in specific proteins or in key clusters of protein masses to serve as biomarkers of chemical exposure or disease process. Two other important aspects of NCT proteomics are the extramural proteomics granting activities through the Division of Extramural Research (DERT) and Small Business Innovation Research (SBIR) awards, which will engage promising academic research projects in proteomics and also harness new innovative proteomics technologies for toxicology.

An interim protein expression database (PED) will support the intramural proteomics group and the extramural proteomics resource and resource contract. PED will be developed based on microarray standards and proteomics best practices. PED will develop in parallel with the prototypic version of CEBS, and the analytical integration of transcriptomics and proteomics data will be studied. Many of the standards and practices applied in the interpretation of microarray and gene expression data are also applicable to the interpretation of protein expression data sets. Thus, we anticipate that the object models built by MGED in the microarray gene expression database arena also will be applicable to proteomics and metabonomics.
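As a hedged illustration of what a proteomics object model along these lines might look like, the sketch below links a biological sample to gel images and protein spots through one-to-many relationships. The class and field names are hypothetical; they are not taken from MAGE-OM, PED, or CEBS, and a real model would carry many more objects (enzyme digests, mass spectra, search results).

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ProteinSpot:
    """One feature (protein spot) detected on a 2D gel."""
    feature_number: int
    mw_kda: float                     # estimated molecular weight
    pi: float                         # estimated isoelectric point
    protein_id: Optional[str] = None  # set once MS identification succeeds


@dataclass
class GelImage:
    """One stained 2D gel image; a gel has many spots."""
    image_file: str
    spots: List[ProteinSpot] = field(default_factory=list)


@dataclass
class BiologicalSample:
    """One sample; a sample may be run on many gels."""
    sample_id: str
    tissue: str
    gels: List[GelImage] = field(default_factory=list)


def identified_spots(sample: BiologicalSample) -> List[ProteinSpot]:
    """Walk the sample -> gel -> spot chain and keep identified proteins."""
    return [
        spot
        for gel in sample.gels
        for spot in gel.spots
        if spot.protein_id is not None
    ]
```

Keeping each relationship as a list owned by its parent makes the linear chain explicit, so that any derived object (for example, a differentially expressed protein list) can be traced back to the sample it came from.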
As mentioned previously, object modeling for proteomics is currently being pursued. Proteomics objects that may be linked in a linear chain by one-to-many relationships might include the biological sample, raw 2D-stained gel image, enzyme digest, feature number (protein spot), MW, pI, matrix-assisted laser desorption/ionization time-of-flight mass spectrometry (MALDI-MS), m/z ions for protein fingerprint identification, sequence tag from tandem MS analysis, MS search data results, and protein identification search results. The derived objects in the database might include the study parameters, including experimental, biological, and toxicological details; processed gel images; annotated master gel images for each specific tissue or biological fluid; differentially expressed protein list determined from image analysis; feature (protein spot) table of estimated pI; MW; accession numbers; and protein functional groups.

CEBS Phase III: Integrate Microarray Gene Expression and Protein Expression Databases Using a Gene/Protein Group Strategy

The integration of microarray/gene expression and protein expression data is a critical step that will require development of knowledge of gene/protein functional relationships, gene/protein groups, and the development of algorithms that will increase our knowledge of the functions of these groups through actual experimentation.
To build knowledge, we are mining the published literature for genes and groups of functionally related genes or protein products relevant to known end points in toxicology, pathology, cell regulatory processes, metabolism, and the like. This literature mining and analysis process is using vetted gene names, and the output will be groups of genes/proteins that represent putative functional groups based on the literature. We will then develop algorithms to test these putative functional gene groups derived from the literature against treatment-related expression profiles and against clustered genes (and coregulated ESTs) to confirm gene grouping on the basis of phenotype (Figure 8).

[FIGURE 8 OMITTED]

This literature-based functional classification of gene groups and their association with known toxicant-responsive pathways will begin to define the relationships between gene and protein expression and our conventional understanding of metabolism, toxicology/pathology, modulation and homeostasis, cell regulation, and cell signaling. It will also offer an opportunity for discovery of yet unidentified genes (ESTs) that are coregulated with known genes. To the extent possible, we will confirm gene group membership by sequence analysis and develop statistical procedures and algorithms (Wolfinger et al. 2001) to continually refine our knowledge of gene/protein groups and their relationship to functional pathways.
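One simple way to test a literature-derived gene group against an expression cluster is to ask whether the two share more genes than chance would predict. The sketch below uses a hypergeometric tail probability for that purpose. This is an assumption chosen for illustration, not the statistical procedure the authors cite (Wolfinger et al. 2001), and the function name and arguments are hypothetical.

```python
from math import comb


def overlap_p_value(universe_size, group, cluster):
    """Probability of an overlap at least as large as the one observed.

    universe_size: number of genes on the chip
    group:         literature-derived putative functional group
    cluster:       genes found clustered (coregulated) in an experiment
    Assumes the cluster is a random draw from the chip under the null.
    """
    group, cluster = set(group), set(cluster)
    observed = len(group & cluster)
    n_group, n_cluster = len(group), len(cluster)
    total = comb(universe_size, n_cluster)
    tail = sum(
        comb(n_group, i) * comb(universe_size - n_group, n_cluster - i)
        for i in range(observed, min(n_group, n_cluster) + 1)
    )
    return tail / total
```

A small p-value suggests that the putative group and the cluster are related; ESTs that fall in the same cluster then become candidates for membership in the group, subject to confirmation by sequence analysis.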
With sequence definition of genes, proteins, and gene/protein group members, it will be possible to begin to BLAST outlier genes and proteins from new experimental data sets against data sets already contained in the CEBS database. This will begin to facilitate and inform the integration of transcriptomics and proteomics data sets across treatment, dose, time, tissue type, and phenotypic severity. We also propose to integrate metabonomics data sets into CEBS Phase III because of the pivotal role that metabolism plays in experimental and clinical toxicology as well as in hazard identification and risk assessment (Bundy et al. 2002; Holmes et al. 2000, 2001; Nicholson et al. 1999, 2002).

CEBS Phase IV: Knowledge Technology

The development of a knowledge base for systems toxicology will require merging several different knowledge-building strategies. In addition to mining the literature for chemical-specific functionally characterized gene/protein groups, testing putative functional gene/protein groups against treatment-related gene and protein expression profiles, and determining the relationships of these gene/protein groups to functional pathways, we will consult the gene ontology from the GO Consortium (http://www.geneontology.org/) and attempt to verify the accuracy of the ontologies in terms of biological process, molecular function, and cellular component. This standard gene ontology reflects broad biological goals accomplished by ordered assemblies of molecular functions, tasks performed by individual gene products, and subcellular structures, locations, and macromolecular complexes, respectively.
Standardized gene and protein ontological relationships are significant in that they can help to define functional relationships among genes and groups of genes and proteins. Therefore, we will attempt to confirm the putative functional relationships across multiple molecular expression data sets in the evolving knowledge base. Gene/protein ontology is an important corollary to the gene/protein group strategy and may prove to be an effective approach to the integration of gene and protein expression data sets, especially if it can effectively be converted to a heuristic process. As a further adjunct to the knowledge building, a more complete and heuristic data compendium strategy will be devised based on statistical classification and clustering algorithms (to look for co-regulation) of genes and proteins as a function of dose, time, and target site (Figure 9). Here the experimental protocol defines the doses and the time course as well as the bioassays and biological measurements that will be made. The bioinformatics protocol specifies the various statistical and clustering algorithms that will be applied to look for correlated and coregulated genes. Ontologies will be used as described above. Note that an ontology lists similar elements, whereas a pathway describes an interaction among diverse elements. Using literature-derived putative gene groups (ideally vetted in appropriate gene ontologies), an iterative and heuristic gene/protein group phenotype analysis is expected to yield validated gene/protein groups that map to known functional pathways and, in terms of toxicology, to define the sequence of key events and common modes of action for environmental chemicals and drugs. Compendia of data will be assembled within each toxicogenomic and toxicological/pathological domain. 
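The search for coregulated genes across dose and time can be illustrated with a very small correlation screen. This is a minimal sketch under stated assumptions, not the bioinformatics protocol itself: each gene is assumed to have one expression value per dose/time condition, coregulation is approximated by Pearson correlation, and the threshold of 0.9 and all names are hypothetical.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a flat profile carries no coregulation signal
    return sxy / sqrt(sxx * syy)


def coregulated_pairs(profiles, min_r=0.9):
    """Return (gene_a, gene_b, r) for every pair whose profiles correlate.

    profiles -- dict mapping gene name to a list of expression values,
                one value per dose/time condition, in the same order.
    """
    pairs = []
    for gene_a, gene_b in combinations(sorted(profiles), 2):
        r = pearson(profiles[gene_a], profiles[gene_b])
        if r >= min_r:
            pairs.append((gene_a, gene_b, r))
    return pairs


# Hypothetical profiles over four dose/time conditions.
profiles = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [2.0, 4.0, 6.0, 8.0],   # rises with geneA
    "geneC": [4.0, 3.0, 2.0, 1.0],   # falls as geneA rises
}
pairs = coregulated_pairs(profiles)
```

Pairs that pass the threshold are candidates for a shared gene/protein group; a real analysis would use a proper clustering method and account for replicate variability.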
[FIGURE 9 OMITTED]

Thus, CEBS Phase IV will enable query by compound, structure and class, toxic or pathologic effects, gene annotation, gene/protein groups, and functional (e.g., metabolic and toxicological) pathways that lead to toxicity and disease. To facilitate integration of compound-specific data sets, all genes, proteins, and gene/protein groups will be linked to the gene/protein name and sequence database that was described earlier. This will facilitate query using any of the query categories listed above. Ultimately, one will globally query (or BLAST) the CEBS knowledge base using a transcriptome of a tissue of interest (or a list of outliers from gene expression, or proteins from proteomics analysis) and have the knowledge base return all similar toxicogenomics data and data sets as well as contextually associated phenotypic information for compounds tested in various species and tissues represented in the knowledge base. This will be possible because of the derivation and maintenance of up-to-date sequence information on all genes and proteins represented in the knowledge base. In a sequence-driven knowledge base, a global query can return comparative genomic information (discussed below) based on BLAST cutoff values selected by the user. For example, a BLAST -log10(E-value) cutoff for human-to-human comparisons might be 250, whereas rat-to-human may be 150, and yeast-to-human may be as low as 100 or less; that is, the cutoff values are significantly organism related and may not be related to the assigned names of genes. The actual cutoffs used must also consider the nature of the query sequence; in particular, 3' tails (poly-A containing) are more difficult to match across species than are full-length coding sequences.
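The organism-dependent cutoffs above can be sketched as a small filter over BLAST hits. The cutoff values are the example figures from the text; everything else (the hit dictionaries, the key order of the cutoff table, and the function names) is a hypothetical illustration, not part of CEBS or of any BLAST tool.

```python
import math

# Example cutoffs on -log10(E-value) from the text, keyed here as
# (organism of the database sequence, organism of the query sequence).
CUTOFFS = {
    ("human", "human"): 250,
    ("rat", "human"): 150,
    ("yeast", "human"): 100,
}


def neg_log10_evalue(evalue):
    """Convert an E-value to -log10(E-value); an E-value of 0 maps to infinity."""
    if evalue <= 0.0:
        return math.inf
    return -math.log10(evalue)


def filter_hits(hits, query_species, cutoffs=CUTOFFS):
    """Keep hits whose score meets the cutoff for their species pair.

    hits -- iterable of dicts with 'id', 'species', and 'evalue' keys
            (an assumed record layout for this sketch).
    Hits from species pairs with no configured cutoff are dropped.
    """
    kept = []
    for hit in hits:
        cutoff = cutoffs.get((hit["species"], query_species))
        if cutoff is None:
            continue
        if neg_log10_evalue(hit["evalue"]) >= cutoff:
            kept.append(hit)
    return kept


# Hypothetical hits for a human query sequence.
hits = [
    {"id": "rat_1", "species": "rat", "evalue": 1e-160},
    {"id": "rat_2", "species": "rat", "evalue": 1e-120},
    {"id": "yeast_1", "species": "yeast", "evalue": 0.0},
    {"id": "mouse_1", "species": "mouse", "evalue": 0.0},
]
kept_ids = [hit["id"] for hit in filter_hits(hits, "human")]
```

Keeping the cutoffs in a user-editable table reflects the point made in the text that the thresholds are selected by the user and depend on the organisms compared rather than on gene names.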
A Dose/Phenotype Strategy

Another strategy to be carefully considered in the development of the CEBS knowledge base is one based on the lowest effective dose required to produce a particular molecular expression phenotype or phenotype severity. We believe that quantitative structure-activity relationships (QSARs) can be developed only for discrete toxicogenomic events and outcomes that can be anchored in effective dose and a particular toxicological/pathological response or outcome. Precise phenotypic anchoring of discrete toxicogenomic events (derivation of unique gene/protein group signatures) at their lowest effective dose will be possible only if the internal dose can be established or modeled for the particular agent or its metabolites in the target tissue. This lowest effective dose/toxicant signature strategy has been employed successfully in the development of the U.S. Environmental Protection Agency/International Agency for Research on Cancer genetic activity profile database (Waters et al. 1991). Graphic profiles and corresponding data listings of lowest effective/highest ineffective doses for genotoxic agents in various cell types and organisms and for various end points are available in this database of approximately 700 compounds. To develop a similar database for toxicogenomic end points, one annotates and organizes gene expression data sets as a function of compound, organism, end point, dose, and time for select verified gene groups and coregulated ESTs.
One then plots, for example, as a histogram, outlier upregulated and downregulated genes for any appropriate toxicological or pathological end point as a function of lowest effective dose. Note that unidentified but coregulated ESTs (i.e., ESTs associated with other genes seen to be upregulated or downregulated in response to an environmental toxicant) can contribute to the histogram and potentially to the generation of new knowledge about the mechanism of action of the compound. It should be noted that there will be primary, secondary, and tertiary effects of the same toxicant that will be distinguished from one another on the basis of the molecular and toxicological/pathological phenotypes described and documented in the knowledge base. Resulting histogram plots are phenotypically anchored in dose and condition of target tissue and facilitate ready development of global QSARs for compounds and specific end points under consideration. Such a quantitative end point profiling approach can readily be combined with PB/PK and pharmacodynamic modeling. (In fact, such modeling can be used to derive an estimate of internal dose in the target tissue.) One then has the possibility to develop quantitative descriptions of the relationships among gene, protein, and metabolite expression profiles as a function of applied dose of the agent under consideration and to model ensuing kinetic and dynamic dose-response parameters in various tissue compartments. This is an important strategy for CEBS, as it will contribute directly to future advancements in PB/PK and pharmacodynamic modeling and support a formal quantitative risk-assessment process (Simmons and Portier 2002).
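The lowest-effective-dose histogram described at the start of this passage can be sketched in a few lines. This is an illustration under assumptions that the text does not specify: expression is given as a treated/control fold change per dose, a gene counts as an outlier when it changes at least twofold in either direction, and all names and values are hypothetical.

```python
from collections import Counter


def lowest_effective_dose(dose_to_fold, threshold=2.0):
    """Return (dose, direction) for the smallest dose with an outlier response.

    dose_to_fold -- dict mapping applied dose to treated/control fold change.
    A fold change >= threshold is 'up'; <= 1/threshold is 'down'.
    Returns None when no dose produces an outlier response.
    """
    for dose in sorted(dose_to_fold):
        fold = dose_to_fold[dose]
        if fold >= threshold:
            return dose, "up"
        if fold <= 1.0 / threshold:
            return dose, "down"
    return None


def led_histogram(profiles, threshold=2.0):
    """Count genes (or ESTs) per (lowest effective dose, direction) bin."""
    counts = Counter()
    for dose_to_fold in profiles.values():
        led = lowest_effective_dose(dose_to_fold, threshold)
        if led is not None:
            counts[led] += 1
    return counts


# Hypothetical dose-response profiles for three genes and one EST.
profiles = {
    "gene1": {1: 1.1, 10: 2.5, 100: 4.0},
    "gene2": {1: 0.4, 10: 0.3},
    "gene3": {1: 1.0, 10: 1.2},      # never an outlier
    "EST1": {1: 1.0, 10: 3.0},       # unidentified but coregulated with gene1
}
histogram = led_histogram(profiles)
```

Each bin of the resulting counter corresponds to one bar of the histogram, so an unidentified EST contributes in exactly the same way as a named gene. Anchoring the bins to internal rather than applied dose would require the PB/PK modeling mentioned in the text.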
Cross-Species Gene/Protein Comparative Expression Profiling

With the availability of full genome sequences for several model organisms, there is intensive research toward the prediction, annotation, and mapping of genes across species. Of particular interest are the protein-coding genes and the intracellular signaling networks and their interactions. Similarities among novel protein sequences in model organisms have become an important and extremely useful source for hypotheses concerning protein function. Drosophila and Caenorhabditis elegans are attractive animal model systems for studying human genes because of their genetic tractability and their phenotypically well-characterized genes (Chervitz et al. 1998; Culetto and Sattelle 2000; Nelson 1999a; Rubin and Merchant 2000). The genome database at the NCBI has assembled Clusters of Orthologous Groups (http://www.ncbi.nlm.nih.gov/COG/) for homologous nucleotide sequences in more than 40 species, mainly microbial but including D. melanogaster, C. elegans, and Saccharomyces cerevisiae.
The functional analysis of homologous genes in diverse genetic models is particularly relevant for proteins involved in human diseases to gain rapid understanding of human disease mechanisms and to enhance the probability for development of novel therapies (Rubin et al. 2000). A number of cell functions are regulated by similar gene families across organisms (e.g., genes for the regulation of the cell cycle, cytoskeleton, cell adhesion, cell signaling, and apoptosis). This conservation of essential genes is also observed for transcription factors and many downstream signaling processes. It is believed that the completion of mouse, rat, and zebrafish genome sequencing efforts will provide information not only for the characterization of novel genes but also for the existence of homologous genes involved in every aspect of cell growth and functional differentiation. Gaining an understanding of the evolution and function of stress-response genes from yeasts to humans, for example, could be extremely valuable.
Thus, we will provide within CEBS links to appropriate genome information resources and eventually develop a comprehensive inventory of homologous genes/proteins across species from yeast to humans that may be important in toxicology and human disease. We anticipate that many of these homologous genes may be expressed similarly in response to environmental exposures that display similar modes of action. Strategically, these stressor-responsive genes and gene clusters could be crucial for the interpretation of cross-genome expression profiles in an integrated health and ecological risk assessment. A core set of homologous genes should include genes involved in xenobiotic activation/detoxification mechanisms, perturbations in cell homeostasis mechanisms, oxidative damage, cell injury, death, and regeneration, and genes controlling critical signaling mediator molecules for these biological processes. Phase I and Phase II enzymes metabolize most environmental xenobiotic chemicals, and much is known about their chemical substrates, inducers, and inhibitors. Phase I enzymes, the cytochromes P450 (CYPs), bioactivate as well as detoxify xenobiotics. The primary step involved in the activation process mediated by CYP proteins is oxidation, or bioactivation of xenobiotics to electrophiles. Phase II enzymes conjugate some of these oxidized metabolites to water-soluble excretable substances. We will begin our compilation of cross-species gene/protein comparative expression analysis by focusing on xenobiotic metabolic enzymes, the CYPs. Approximately 2,500 CYP genes have been characterized from many organisms (http://drnelson.utmem.edu/CytochromeP450.html), including bacteria and mammalian systems (Nebert and McKinnon 1994; Nelson 1999b; Nelson et al. 1996), and their substrate, inducer, and inhibitor specificities must be studied in relation to alterations in molecular expression across species and across classes of xenobiotics. We anticipate that as homologous genes are identified, as compendia of gene/protein expression profiles are developed, and as functional pathways are derived and studied across species, we will be able to begin defining the networks and systems level of biological organization, wherein the cell expresses global change in response to environmental stimuli. Again, we believe that fully context-documented toxicogenomics data sets and mathematical modeling will enable development of an integrated systems toxicology and bioinformatics. In summary, CEBS Phase IV will create the capability to assess the global toxicogenomic responses of biological systems to environmental stressors and to relationally link toxicogenomics data to conventional effects data.
Because CEBS Phase IV will include data sets on multiple experimental organisms, cross-species comparisons and extrapolations will be possible at molecular, subcellular, cellular, organ, and systems levels.

Further Development of the Phase IV CEBS Knowledge Base

On the basis of the foregoing discussion and advances in the field, we have attempted to describe the basic strategies for the development of the core of the CEBS knowledge base as it is conceptualized at the present time. Two additional CEBS Phase IV modules are envisioned for the future. One is a transcription module that may be used to predict the expression of genes a priori, and the other is a haplotype linkage-disequilibrium module that may be used to predict the differential expression of genes in human haplotypes and to estimate the relative sensitivity of population subgroups. The transcription module will build upon rapidly developing knowledge of transcription factors and their pivotal importance in gene regulation. Because the number of transcription factors appears limited (around 2,000 for humans), their study to include sequence definition and binding sites can be developed into a predictive science as related to gene and protein expression (Forde et al. 2002; Schrem et al. 2002; Wingender et al. 2000, 2001). The haplotype linkage-disequilibrium module, on the other hand, will take advantage of our evolving knowledge of human haplotypes and associated SNPs that confer differential responses within human population subgroups to various classes of environmental toxicants and stressors (Li 2001).
This module will require the addition of a SNPs database. NIEHS has for some time been engaged in the development of the GeneSNPs Database (http://www.genome.utah.edu/genesnps/). It should be noted that SNPs represent only approximately 90% of all DNA sequence variants. The remainder includes insertions, deletions, inversions, and duplications (1 base or many bases or kilobases). Any or all of these can be important in any gene being studied. We anticipate that the addition of a SNPs database will enable an understanding of the relationship between environmental exposures and human disease susceptibility (Li 2001). This module is important, therefore, both in a toxicological and in a risk-assessment context. Field and clinical research applications of toxicogenomics methods are anticipated by the NCT. It is well known that a single nucleotide polymorphism, a single base change in the message of a gene, can cause a protein to malfunction. Experimentally, it is possible to construct panels of mutants that enable discovery of the impacts of malfunctions in transcription and translation. Preliminary data indicate that gene expression profiles will be useful as diagnostic tools for identifying early stages of various pathologies, including cancer (Alaiya et al. 2002; Alizadeh et al. 2001; Golub et al. 1999; Perou et al. 2000). If this approach enables earlier detection of disease than is currently possible through other approaches, it may allow earlier initiation of therapeutic interventions.
Additionally, gene expression profiling may become an important tool for predicting therapeutic outcome and may be particularly useful in addressing the significant variability that has been observed in how well patients respond to different types of drug therapy. Such patterns of variability have been studied using expression profiling and, in some cases, expression signatures have been associated with individuals who are responders or nonresponders for a particular type of drug therapy. Once this kind of result is validated, it may be possible to use expression profiling to optimize the therapeutic regimen for individual patients, thus increasing the chance of a good treatment outcome. It may also be possible to identify susceptible subpopulations for purposes of quantitative risk assessment.

Conclusions

The NCT and other organizations (Castle et al. 2002; Pennie and Kimber 2002; Waring et al. 2001) are performing experiments to validate the concept of gene expression profiles as signatures of toxicant classes, disease subtypes, or other biological end points. Initial studies indicate that classes of toxicants and toxic responses can be recognized as gene expression signatures using microarray technology. Such experiments have begun to correlate gene expression profiles with other well-defined parameters, including toxicant class, chemical structure, pathological or physiological response, or other validated indices of toxicity. For example, experiments have been designed to correlate gene expression patterns with liver pathologies such as necrosis, apoptosis, fibrosis, or inflammation. It is also possible to look for correlative patterns in surrogate tissues such as blood. Changes in serum enzymes provide diagnostic markers of organ function that are routinely used in medicine and in toxicology.
Such phenotypic anchoring of gene expression data using conventional indices will distinguish the toxicological signal from other gene expression changes that may be unrelated to toxicity, such as the adaptive, pharmacological, or therapeutic effects of a compound. By constructing and populating the CEBS knowledge base, the NCT is assisting the field of environmental health research to evolve into an information science in which experimental gene and protein expression data sets are compiled and made readily available to the scientific community. Analysis of these expression profiles for different chemicals from different classes over dose and time can be used to identify expression profiles consistently and mechanistically linked to specific exposures and disease outcomes. Once sufficient high-quality data have been accumulated and assimilated, it will be possible to characterize an unknown biological or physical sample by comparing its gene and/or protein expression profile to compendia of expression profiles in the database (Hughes et al. 2000). The NCT will develop the capacity to use gene expression signatures to facilitate toxicological characterization of toxicants and their biological effects. As the field of toxicogenomics evolves, toxicogenomics databases will begin to support predictive toxicology and hazard assessment. This will help scientists predict the toxicological impact of suspected toxicants and calculate how much of a hazard these toxicants actually represent to human and environmental health. Infrastructure development is essential to facilitate the integration of the existing public toxicology and structure-activity databases with those under development in toxicogenomics (Richard and Williams 2002).
In this way, conventional toxicology and structure-activity databases and the CEBS public knowledge base can realize their full potential in supporting mechanistic interpretations and risk assessment (Simmons and Portier 2002) in the future. Development of the databases must be concomitant with the evolution of bioinformatics and data mining tools and the individuals trained to apply them. The NIEHS is committed to the development of the CEBS knowledge base with which to initiate this evolutionary process. This article attempts to provide a vision of what the CEBS knowledge base will offer and, in general terms, how it will be constructed. The magnitude of the effort required to develop and populate such a knowledge base requires a collective will and collaborative efforts. Therefore, we will pursue the interoperability of CEBS with other databases elsewhere (e.g., those on cell signaling, protein-protein interactions, and biological and metabolic pathways) to enhance our ability to interpret toxicogenomic data sets. We will seek to develop additional mechanisms through which partnerships with scientists in academic, private sector, and other governmental organizations can be created, and we welcome advice, criticism, and participation in this enterprise. As the CEBS knowledge base expands to include structurally or functionally related agents and as gene identity and annotation progress, it will be possible to search in a comprehensive way for common, critical, or causal relationships. It will then be possible to create pathway maps of common cellular processes, to map partial genome arrays to pathways, and to link such changes to known phenotypic markers of toxicity.
The proposed knowledge base and its relational linkages must grow incrementally, and the developers and users must have the patience and dedication to stay the course. Such incremental growth will eventually become exponential growth, and the field of toxicology will be profoundly changed. In the realm of molecular epidemiology, our growing understanding of genomic anatomy (gene sequence and polymorphisms) will form the basis for characterizing person-to-person and ethnogeographic sequence variations in genes that affect responses to drugs and chemicals that affect human susceptibility/vulnerability. Eventually, gene and protein expression profiles from exposed humans (and from organisms in the environment) will be compared with reference expression profiles based on national or international gene expression databases (Ermolaeva et al. 1998). Studying and analyzing patterns of gene expression across species will help us understand the relationship between DNA sequence variation and the phenotype, which in turn will help us understand and integrate the assessment of human and ecological risk. Given the vast numbers and diversity of drugs, chemicals, and environmental agents, the diversity of species in which they act, the time and dose factors critical to the induction of beneficial and adverse effects, the diversity of phenotypic consequences of exposures, etc., it is only through the development of a profound knowledge base that toxicology and environmental health can rapidly advance. Toxicogenomics has the potential to change how environmental toxicology is performed.
It will contribute new methods, new data, and new interpretation to the field. The ultimate goal of the NCT is to create a knowledge base that allows environmental health scientists and practitioners to understand and prevent adverse environmental exposures in the 21st century.

"Where is the wisdom we have lost in knowledge? Where is the knowledge we have lost in information?" --T. S. Eliot
Table 1. Some major biological data and information resources for gene annotation. (a)

Subject                  Source       Link
Biomedical literature    PubMed       http://www.ncbi.nlm.nih.gov/entrez/query.fcgi
Nucleic acid sequence    GenBank      http://www.ncbi.nlm.nih.gov:80/entrez/query.fcgi?db=nucleotide
(e.g., for the rat)      RGD          http://rgd.mcw.edu/ebEST; http://rgd.mcw.edu/EBEST/
Annotation (mouse)       MGI          http://www.informatics.jax.org/
Genome sequence          GenBank      http://www.ncbi.nlm.nih.gov:80/entrez/query.fcgi?db=genome
                                      http://www.ncbi.nih.gov/PMGifs/Genomes/euk_g.html
                         TIGR         http://www.tigr.org/tdb/; http://www.tigr.org/tdb/tgi/
Protein sequence         GenBank      http://www.ncbi.nlm.nih.gov:80/entrez/query.fcgi?db=protein
                         Swiss-Prot   http://www.expasy.ch/sprot/
Protein structure        Protein DB   http://www.rcsb.org/pdb/
                         PIR          http://www-nbrf.georgetown.edu
Protein mass spectra     PROWL        http://prowl.rockefeller.edu
Posttranslational mods   RESID        http://www-nbrf.georgetown.edu/pirwww/search/textresid.html
Biochemical pathways     KEGG         http://www.genome.ad.jp/kegg/
                         WIT          http://wit.mcs.anl.gov/WIT2/; http://emp.mcs.anl.gov
                         PathDB       http://www.ncgr.org/software/pathdb/
(a) Adapted from Gibas and Jambeck (2001).
REFERENCES Aardema MJ, MacGregor JT. 2002. Toxicology and genetic toxicology in the new era of "toxicogenomics": impact of "-omics" technologies. Mutat Res 499(1):13-25. Afshari CA. 2002. Perspective: microarray technology, seeing more than spots. Endocrinology 143(6):1983-1989. Alaiya AA, Franzen B, Hagman A, Dysvik 8, Roblick UJ, Becker S, et al. 2002. Molecular classification of borderline ovarian ovarian /ovar·i·an/ (o-var´e-an) pertaining to an ovary or ovaries. ovarian pertaining to an ovary. ovarian agenesis tumors using hierarchical cluster analysis Cluster analysis A statistical technique that identifies clusters of stocks whose returns are highly correlated within each cluster and relatively uncorrelated across clusters. Cluster analysis has identified groupings such as growth, cyclical, stable, and energy stocks. of protein expression profiles. Int J Cancer 98(6):895-899. Alizadeh AA, Ross DT, Perou CM, van de Rijn M. 2001. Towards a novel classification of human malignancies based on gene expression patterns. J Pathol 195(1):41-52. Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ. 1990. Basic local alignment search tool. J Mol Biol 215(3):403-410. Bartosiewicz MJ, Jenkins D, Penn S, Emery J, Buckpitt A. 2001. Unique gene expression patterns in liver and kidney associated with exposure to chemical toxicants. J Pharmacol Exp Ther 297(3):895-905. Bessems JG, Vermeulen NP. 2001. Paracetamol (acetaminophen)-induced toxicity: molecular and biochemical mechanisms, analogues and protective approaches. Crit Rev Toxicol 31(1):55-138. Boorman GA, Anderson SP, Casey WM, Brown RH, Crosby LM, Gottschalk K, et al. 2002. Toxicogenomics, drug discovery, and the pathologist. Toxicol Pathol 30(1):15-27. Brazma A, Hingamp P, Quackenbush J, Sherlock G, Spellman P, Stoeckert C, et al. 2001. Minimum information about a microarray experiment (MIAME)-toward standards for microarray data. Nat Genet genet: see civet. 29(4):365-371. 
Bundy JG, Spurgeon DJ, Svendsen C, Hankard PK, Osborn D, Lindon JC, et al. 2002. Earthworm species of the genus Eisenia can be phenotypically differentiated by metabolic profiling. FEBS Lett 521(1-3):115-120.
Burchiel SW, Knall CM, Davis JW II, Paules RS, Boggs SE, Afshari CA. 2001. Analysis of genetic and epigenetic mechanisms of toxicity: potential roles of toxicogenomics and proteomics in toxicology. Toxicol Sci 59(2):193-195.
Bushel PR, Hamadeh H, Bennett L, Sieber S, Martin K, Nuwaysir EF, et al. 2001. MAPS: a microarray project system for gene expression experiment information and data validation. Bioinformatics 17(6):564-565.
Castle AL, Carver MP, Mendrick DL. 2002. Toxicogenomics: a new revolution in drug safety. Drug Discov Today 7(13):728-736.
Chervitz SA, Aravind L, Sherlock G, Ball CA, Koonin EV, Dwight SS, et al. 1998. Comparison of the complete protein sets of worm and yeast: orthology and divergence. Science 282(5396):2022-2028.
Culetto E, Sattelle DB. 2000. A role for Caenorhabditis elegans in understanding the function and interactions of human disease genes. Hum Mol Genet 9(6):869-877.
Cunningham MJ, Liang S, Fuhrman S, Seilhamer JJ, Somogyi R. 2000. Gene expression microarray data analysis for toxicology profiling. Ann N Y Acad Sci 919:52-67.
Dudley AM, Aach J, Steffen MA, Church GM. 2002. Measuring absolute expression with microarrays with a calibrated reference sample and an extended signal intensity range. Proc Natl Acad Sci U S A 99(11):7554-7559.
Edgar R, Domrachev M, Lash AE. 2002. Gene Expression Omnibus: NCBI gene expression and hybridization array data repository. Nucleic Acids Res 30(1):207-210.
Ermolaeva O, Rastogi M, Pruitt KD, Schuler GD, Bittner ML, Chen Y, et al. 1998. Data management and analysis for gene expression arrays. Nat Genet 20(1):19-23.
Farland WH. 1992. The U.S. Environmental Protection Agency's Risk Assessment Guidelines: current status and future directions. Toxicol Ind Health 8(3):205-212.
Farland WH. 1996. Cancer risk assessment: evolution of the process. Prev Med 25(1):24-25.
Fielden MR, Zacharewski TR. 2001. Challenges and limitations of gene expression profiling in mechanistic and predictive toxicology. Toxicol Sci 60(1):6-10.
Forde CE, Gonzales AD, Smessaert JM, Murphy GA, Shields SJ, Fitch JP, et al. 2002. A rapid method to capture and screen for transcription factors by SELDI mass spectrometry. Biochem Biophys Res Commun 290(4):1328-1335.
Gibas C, Jambeck P. 2001. Developing Bioinformatics Computer Skills. Sebastopol, CA:O'Reilly & Associates, Inc.
Golub TR, Slonim DK, Tamayo P, Huard C, Gaasenbeek M, Mesirov JP, et al. 1999. Molecular classification of cancer: class discovery and class prediction by gene expression monitoring. Science 286(5439):531-537.
Hamadeh HK, Amin RP, Paules RS, Afshari CA. 2002a. An overview of toxicogenomics. Curr Issues Mol Biol 4(2):45-56.
Hamadeh HK, Bushel PR, Jayadev S, DiSorbo O, Bennett L, Li L, et al. 2002b. Prediction of compound signature using high density gene expression profiling. Toxicol Sci 67(2):232-240.
Hamadeh HK, Bushel PR, Jayadev S, Martin K, DiSorbo O, Sieber S, et al. 2002c. Gene expression analysis reveals chemical-specific profiles. Toxicol Sci 67(2):219-231.
Hamadeh HK, Knight BL, Haugen AC, Sieber S, Amin RP, Bushel PR, et al. 2002d. Methapyrilene toxicity: anchorage of pathologic observations to gene expression alterations. Toxicol Pathol 30:470-482.
Holmes E, Nicholls AW, Lindon JC, Connor SC, Connelly JC, Haselden JN, et al. 2000. Chemometric models for toxicity classification based on NMR spectra of biofluids. Chem Res Toxicol 13(6):471-478.
Holmes E, Nicholson JK, Tranter G. 2001. Metabonomic characterization of genetic variations in toxicological and metabolic responses using probabilistic neural networks. Chem Res Toxicol 14(2):182-191.
Hughes TR, Marton MJ, Jones AR, Roberts CJ, Stoughton R, Armour CD, et al. 2000. Functional discovery via a compendium of expression profiles. Cell 102(1):109-126.
Ideker T, Galitski T, Hood L. 2001. A new approach to decoding life: systems biology. Annu Rev Genomics Hum Genet 2:343-372.
Issaq HJ, Veenstra TD, Conrads TP, Felschow D. 2002. The SELDI-TOF MS approach to proteomics: protein profiling and biomarker identification. Biochem Biophys Res Commun 292(3):587-592.
Karp PD. 2000. An ontology for biological function based on molecular interactions. Bioinformatics 16(3):269-285.
Larsen JC, Farland W, Winters D. 2000. Current risk assessment approaches in different countries. Food Addit Contam 17(4):359-369.
Li H. 2001. A permutation procedure for the haplotype method for identification of disease-predisposing variants. Ann Hum Genet 65:189-196.
Nebert DW, McKinnon RA. 1994. Cytochrome P450: evolution and functional diversity. Prog Liver Dis 12:63-97.
Nelson DR. 1999a. Cytochrome P450 and the individuality of species. Arch Biochem Biophys 369(1):1-10.
Nelson DR. 1999b. A second CYP26 P450 in humans and zebrafish: CYP26B1. Arch Biochem Biophys 371(2):345-347.
Nelson DR, Koymans L, Kamataki T, Stegeman JJ, Feyereisen R, Waxman DJ, et al. 1996. P450 superfamily: update on new sequences, gene mapping, accession numbers and nomenclature. Pharmacogenetics 6(1):1-42.
Nicholson JK, Connelly J, Lindon JC, Holmes E. 2002. Metabonomics: a platform for studying drug toxicity and gene function. Nat Rev Drug Discov 1(2):153-161.
Nicholson JK, Lindon JC, Holmes E. 1999. 'Metabonomics': understanding the metabolic responses of living systems to pathophysiological stimuli via multivariate statistical analysis of biological NMR spectroscopic data. Xenobiotica 29(11):1181-1189.
Nuwaysir EF, Bittner M, Trent J, Barrett JC, Afshari CA. 1999. Microarrays and toxicology: the advent of toxicogenomics. Mol Carcinog 24(3):153-159.
Ogata H, Goto S, Sato K, Fujibuchi W, Bono H, Kanehisa M. 1999. KEGG: Kyoto Encyclopedia of Genes and Genomes. Nucleic Acids Res 27(1):29-34.
Olden K. 2002. New opportunities in toxicology in the post-genomic era. Drug Discov Today 7(5):273-276.
Pennie WD, Kimber I. 2002. Toxicogenomics; transcript profiling and potential application to chemical allergy. Toxicol In Vitro 16(3):319-326.
Perou CM, Sorlie T, Eisen MB, van de Rijn M, Jeffrey SS, Rees CA, et al. 2000. Molecular portraits of human breast tumours. Nature 406(6797):747-752.
Quackenbush J, Cho J, Lee D, Liang F, Holt I, Karamycheva S, et al. 2001. The TIGR Gene Indices: analysis of gene transcript sequences in highly sampled eukaryotic species. Nucleic Acids Res 29(1):159-164.
Reilly TP, Bourdi M, Brady JN, Pise-Masison CA, Radonovich MF, George JW, et al. 2001a. Expression profiling of acetaminophen liver toxicity in mice using microarray technology. Biochem Biophys Res Commun 282(1):321-328.
Reilly TP, Brady JN, Marchick MR, Bourdi M, George JW, Radonovich MF, et al. 2001b. A protective role for cyclooxygenase-2 in drug-induced liver injury in mice. Chem Res Toxicol 14(12):1620-1628.
Richard AM, Williams CR. 2002. Distributed structure-searchable toxicity (DSSTox) public database network: a proposal. Mutat Res 499(1):27-52.
Rubin GM, Yandell MD, Wortman JR, Gabor Miklos GL, Nelson CR, Hariharan IK, et al. 2000. Comparative genomics of the eukaryotes. Science 287(5461):2204-2215.
Rubin RB, Merchant M. 2000. A rapid protein profiling system that speeds study of cancer and other diseases. Am Clin Lab 19(8):28-29.
Ruepp SU, Tonge RP, Shaw J, Wallis N, Pognan F. 2002. Genomics and proteomics analysis of acetaminophen toxicity in mouse liver. Toxicol Sci 65(1):135-150.
Schrem H, Klempnauer J, Borlak J. 2002. Liver-enriched transcription factors in liver function and development. Part I: The hepatocyte nuclear factor network and liver-specific gene expression. Pharmacol Rev 54(1):129-158.
Selkov E Jr., Grechkin Y, Mikhailova N, Selkov E. 1998. MPW: the Metabolic Pathways Database. Nucleic Acids Res 26(1):43-45.
Simmons PT, Portier CJ. 2002. Toxicogenomics: the new frontier in risk analysis. Carcinogenesis 23(6):903-905.
Sluka JP. 2002. Extracting knowledge from genomic experiments by incorporating the biomedical literature. In: Methods of Microarray Data Analysis, II (Lin SM, Johnson KF, eds). Boston:Kluwer.
Tennant RW. 2002. The National Center for Toxicogenomics: using new technologies to inform mechanistic toxicology. Environ Health Perspect 110(1):A8-A10.
Thomas RS, Rank DR, Penn SG, Zastrow GM, Hayes KR, Pande K, et al. 2001. Identification of toxicologically predictive gene sets using cDNA microarrays. Mol Pharmacol 60(6):1189-1194.
Ulrich R, Friend SH. 2002. Toxicogenomics and drug discovery: will new technologies help us produce better drugs? Nat Rev Drug Discov 1(1):84-88.
Waring JF, Cavet G, Jolly RA, McDowell J, Dai H, Ciurlionis R, Zhang C, Stoughton R, Lum P, Ferguson A, et al. 2003. Development of a DNA microarray for toxicology based on hepatotoxin-regulated sequences. Environ Health Perspect 111:863-870.
Waring JF, Jolly RA, Ciurlionis R, Lum PY, Praestgaard JT, Morfitt DC, et al. 2001. Clustering of hepatotoxins based on mechanism of toxicity using gene expression profiles. Toxicol Appl Pharmacol 175(1):28-42.
Waters MD, Stack HF, Garrett NE, Jackson MA. 1991. The Genetic Activity Profile database. Environ Health Perspect 96:41-45.
Wingender E, Chen X, Fricke E, Geffers R, Hehl R, Liebich I, et al. 2001. The TRANSFAC system on gene expression regulation. Nucleic Acids Res 29(1):281-283.
Wingender E, Chen X, Hehl R, Karas
Wolfinger RD, Gibson G, Wolfinger ED, Bennett L, Hamadeh H, Bushel P, et al. 2001. Assessing gene significance from cDNA microarray expression data via mixed models. J Comput Biol 8(6):625-637.
Yamazaki K, Kuromitsu J, Tanaka I. 2002. Microarray analysis of gene expression changes in mouse liver induced by peroxisome proliferator-activated receptor alpha agonists. Biochem Biophys Res Commun 290(3):1114-1122.
Zweiger G. 1999. Knowledge discovery in gene-expression-microarray data: mining the information output of the genome. Trends Biotechnol 17(11):429-436.
This article was previously published in the inaugural Toxicogenomics Section of EHP.
Address correspondence to M.D. Waters, NIEHS, PO Box 12233, MD F1-05, 111 Alexander Drive, Research Triangle Park, NC 27709 USA. Telephone: (919) 316-4589. Fax: (919) 541-1460. E-mail: waters2@niehs.nih.gov
We thank the following scientists for valuable discussions on the proposed development of the CEBS knowledge base: C. Afshari on basic design characteristics, R. DeWoskin on PB/PK modeling, H. Hamadeh on user interfaces and gene annotation, P.H.M. Lohman on dose profiling, dynamic linkage, and P450 metabolism, S. Nadadur on comparative genomics, N. Stegman, J. Nehls, and J. Doherty on database design and information technology, and J. Sluka on literature mining and PDQ_MED. We also thank A. Abu-Shakra, N. Cariello, S. Eastin, J. Grovenstein, P. Nettesheim, L. Tomatis, S. Sansone, H. Wan, and L. Wright for their many helpful ideas and comments on the manuscript. We are indebted to S. Wilson and L. Birnbaumer for their continuing support of the NCT program.
Received 30 August 2002; accepted 25 October 2002.
Michael Waters, Gary Boorman, Pierre Bushel, Michael Cunningham, Rick Irwin, Alex Merrick, Kenneth Olden, Richard Paules, James Selkirk, Stanley Stasiewicz, Brenda Weis, Ben Van Houten
National Center for Toxicogenomics, National Institute of Environmental Health Sciences, National Institutes of Health, Research Triangle Park, North Carolina, USA