Gene interaction network suggests dioxin induces a significant linkage between aryl hydrocarbon receptor and retinoic acid receptor beta.Gene expression arrays (gene chips) have enabled researchers to roughly quantify the level of mRNA expression for a large number of genes in a single sample. Several methods have been developed for the analysis of gene array data including clustering, outlier outlier /out·li·er/ (out´li-er) an observation so distant from the central mass of the data that it noticeably influences results. outlier an extremely high or low value lying beyond the range of the bulk of the data. detection, and correlation studies. Most of these analyses are aimed at a qualitative identification of what is different between two samples and/or the relationship between two genes. We propose a quantitative, statistically sound methodology for the analysis of gene regulatory networks using gene expression data sets. The method is based on Bayesian networks for direct quantification of gene expression networks. Using the gene expression changes in HPL HPL - Language used in HP9825A/S/T "Desktop Calculators", 1978(?) and ported to the early Series 200 family (9826 and 9836, 68000). Fairly simple and standard, but with extensive I/O support for data acquisition and control (BCD, Serial, 16 bit custom and IEEE 488 interfaces), 1A lung airway epithelial cells Epithelial cells Cells that form a thin surface coating on the outside of a body structure. Mentioned in: Corneal Transplantation after exposure to 2,3,7,8-tetrachlorodibenzo-p-dioxin at levels of 0.1, 1.0, and 10.0 nM for 24 hr, a gene expression network was hypothesized and analyzed. The method clearly demonstrates support for the assumed network and the hypothesis linking the usual dioxin dioxin Aromatic compound, any of a group of contaminants produced in making herbicides (e.g., Agent Orange), disinfectants, and other agents. Their basic chemical structure consists of two benzene rings connected by a pair of oxygen atoms; when substituents on the rings are expression changes to the retinoic acid receptor The retinoic acid receptor (RAR) is a type of nuclear receptor[1] which is activated by both all-trans retinoic acid and 9-cis retinoic acid.[2] There are three retinoic acid receptors (RAR), RAR-alpha, RAR-beta, and RAR-gamma encoded by the RARA system. Simulation studies demonstrated the method works well, even for small samples. Key words: Bayesian networks, dioxin, gene regulatory networks, Markov chain Monte Carlo Markov chain Monte Carlo (MCMC) methods (which include random walk Monte Carlo methods), are a class of algorithms for sampling from probability distributions based on constructing a Markov chain that has the desired distribution as its equilibrium distribution. , retinoic acid receptor, risk assessment, systems biology Systems biology, a field of study in the biosciences, focuses on the systematic study of complex interactions in biological systems. Particularly from 2000 onwards, the term is used widely in the biosciences, and in a variety of contexts. , toxicogenomics. ********** Gene expression arrays (gene chips) have enabled researchers to simultaneously monitor the approximate level of mRNA expression for a large number of genes. These mRNA expression levels are one component of the machinery that controls the function and survival of cells; the other components constitute the other major biochemical constituents of a cell such as the actual DNA sequence DNA sequence Genetics The precise order of bases–A,T,G,C–in a segment of DNA, gene, chromosome, or an entire genome. See Base pair, Base sequence analysis, Chromosome, Gene, Genome. , protein levels, and cellular substructures. Signal transduction Signal transduction The transmission of molecular signals from a cell's exterior to its interior. Molecular signals are transmitted between cells by the secretion of hormones and other chemical factors, which are then picked up by different cells. pathways have long been used to describe the sequence of biochemical events that control cellular function and generally include all aspects of the biochemistry of a cell. In the absence of full proteomic data (both primary proteins and modified proteins), it is valuable to understand the quantitative relationship between genes, which we will refer to as gene expression networks. The rates derived from the quantification of gene expression networks provide crude estimates for the overall rates linking genes through complicated signaling pathways. In addition, hypothesized linkages between genes will aid in focusing research efforts in other areas such as proteomics, metabolomics, and toxicologic assays. We recently used toxicogenomic analysis to examine the response of human peripheral lung epithelial cells to 2,3,7,8-tetrachlorodibenzo-p-dioxin (TCDD TCDD tetrachlorodibenzodioxin. , dioxin) in vitro in vitro /in vi·tro/ (in ve´tro) [L.] within a glass; observable in a test tube; in an artificial environment. in vi·tro adj. In an artificial environment outside a living organism. (Martinez et al. 2002). Exposure to this persistent environmental pollutant has been associated in human populations with increased risk of lung cancer lung cancer, cancer that originates in the tissues of the lungs. Lung cancer is the leading cause of cancer death in the United States in both men and women. Like other cancers, lung cancer occurs after repeated insults to the genetic material of the cell. and chronic obstructive pulmonary disease chronic obstructive pulmonary disease n. Abbr. COPD A chronic lung disease, such as asthma or emphysema, in which breathing becomes slowed or forced. ; therefore, understanding its mechanism of action may provide insights into the risk of persistent human exposure not only to TCDD but to other ligands of the aryl hydrocarbon receptor The Aryl hydrocarbon receptor (AhR) is member of the family of basic-helix-loop-helix transcription factors. AhR is a cytosolic transcription factor that is normally inactive, bound to several co-chaperones. (AhR). In this study we showed a variety of cell-signaling pathways that exhibited a dose-dependent alteration by TCDD. One observation in this study was an alteration in retinoic acid retinoic acid /ret·i·no·ic ac·id/ (ret?i-no´ik) an oxidized derivative of retinol, believed to be the form of vitamin A that plays a role in the development and growth of bone and in the maintenance of normal epithelial structures. (RA)-responsive genes. Alterations in RA homeostasis homeostasis Any self-regulating process by which a biological or mechanical system maintains stability while adjusting to changing conditions. Systems in dynamic equilibrium reach a balance in which internal change continuously compensates for external change in a feedback have been observed previously in rodents, leading to a retinoid-deficient state. In addition TCDD exposure in rats has been associated with increased incidence of squamous squamous /squa·mous/ (skwah´mus) scaly or platelike. squa·mous or squa·mose adj. 1. Covered with or formed of scales; scaly. 2. neoplastic neoplastic /neo·plas·tic/ (ne?o-plas´tik) 1. pertaining to a neoplasm. 2. pertaining to neoplasia. neoplastic pertaining to neoplasia or a neoplasm. and nonneoplastic lesions including squamous cell carcinoma squamous cell carcinoma n. A carcinoma that arises from squamous epithelium and is the most common form of skin cancer. Also called cancroid, epidermoid carcinoma. of the lung and hard palate hard palate n. The anterior part of the palate, consisting of the bony palate covered above by the mucous membrane of the nose, and below by the mucoperiosteum of the roof of the mouth. region the oral mucosa The oral mucosa is the mucous membrane epithelium of the mouth. It can be divided into three categories.
1. resembling the retina. 2. retinal, retinol, or any structurally similar natural derivative or synthetic compound, with or without vitamin A activity. signaling can affect the differentiation of squamous epithelia ep·i·the·li·a n. A plural of epithelium. , it is possible that the increase in these squamous lesions may be due to a retinoid-deficient state induced by the alteration in retinoid homeostasis. Identification of the retinoid-responsive genes in the TCDD microarray analyses suggested a functional relationship between AhR activation and retinoid homeostasis and/or signaling in the human lung The human lungs are the human organs of respiration. Humans have two lungs, with the left being divided into two lobes and the right into three lobes. Together, the lungs contain approximately 1500 miles (2,400 km) of airways and 300 to 500 million alveoli, having a total epithelial cells. Although such relationships can be tested empirically, invariably in·var·i·a·ble adj. Not changing or subject to change; constant. in·var i·a·bil a large number of functional relationships are possible
within a given microarray data set; therefore, priority setting for
functional validation studies is often a challenge. In this article we
develop a computational approach for evaluating the likelihood that
observed changes in gene expression are due to hypothesized functional
relationships. We then test the AhR-retinoid interaction using this
method.Several methods have already been proposed for the analysis of gene expression data. The most commonly used methods rely on description of simple fold increases in expression, phylogenetic tree phylogenetic tree Diagram showing the evolutionary interrelations of a group of organisms that usually originated from a shared ancestral form. The ancestor is in the tree trunk; organisms that have arisen from it are placed at the ends of tree branches. analyses, clustering methods, classification methods, or combinations of these. Methods have also been proposed to develop gene expression networks using dynamical systems Dynamical Systems A system of equations where the output of one equation is part of the input for another. A simple version of a dynamical system is linear simultaneous equations. Non-linear simultaneous equations are nonlinear dynamical systems. defined by ordinary differential equations (Chen et al. 1999), modified linear regression Linear regression A statistical technique for fitting a straight line to a set of data points. methods (Gardner et ai. 2003), Boolean networks (Akutsu et al. 2000) where gene expression data are converted to two states (ON and OFF), discrete networks (Hartemink et al. 2002), and many others. Bayesian networks (Friedman et al. 2000; Pe'er et al. 2001) have been proposed as a means of identifying gene interaction networks (Imoto et al. 2002; Tamayo et al. 1999) and for predicting protein-protein interactions using a combination of different types of genomic data (Jansen 2003). Many of the available methods are discussed in a recent review article (Lockhart and Winzeler 2000). Few methods exist that combine careful statistical estimation and hypothesis testing hypothesis testing In statistics, a method for testing how accurately a mathematical model based on one set of data predicts the nature of other data sets generated by the same process. with quantitative gene quantitative gene n. See polygene. interaction models to provide a systems biology-based approach for the analysis of microarray data. In this article, a Bayesian network approach (Friedman et al. 2000; Imoto et al. 2002) previously suggested is modified to provide direct quantification of gene expression networks using microarray data for a known network. This analytical approach provides a model that can be used for mechanism-based mathematical models and for formal analyses of biological hypotheses. Materials and Methods Definition of Gene Expression Network The basic concept for Bayesian networks in the analysis of gene expression data has been described previously (Friedman et al. 2000; Imoto et al. 2002; Tamada et al. 2003). A gene expression network consists of a collection of P genes, denoted by [X.sub.1], [X.sub.2], ... [X.sub.p], linked by weighting functions, [w.sub.i] ([[theta Theta A measure of the rate of decline in the value of an option due to the passage of time. Theta can also be referred to as the time decay on the value of an option. If everything is held constant, then the option will lose value as time moves closer to the maturity of the option. ].sub.i]) (i = 1,2, ... P), where the subscript i denotes that this weighting function pertains to the control of gene [X.sub.i] by all genes linked to it and [[theta].sub.i] denotes the vector of parameters defining the functional relationship. In cases where the relationship between individual genes is monotonic monotonic - In domain theory, a function f : D -> C is monotonic (or monotone) if for all x,y in D, x <= y => f(x) <= f(y). ("<=" is written in LaTeX as \sqsubseteq). (i.e., [X.sub.i] either stimulates or inhibits [X.sub.j] but cannot have mixed effect), such a network can be easily represented graphically as in Figure l. Figure 1 is a simple gene expression network consisting of four genes (squares) and four weighting functions (circles), with lines linking the genes and the weighting functions. Two kinds of lines appear in the model. A line with a bar implies inhibition (e.g., gene [X.sub.3] inhibits gene [X.sub.4] in Figure 1), and a line with no bar implies stimulation (e.g., gene [X.sub.1] stimulates gene [X.sub.4]). No line between genes implies these genes have no direct relationship to each other ([X.sub.2] and [X.sub.4] are not directly linked). The weighting function combining the effects of genes [X.sub.1] and [X.sub.3] on gene [X.sub.4] is denoted by [w.sub.4] ([[theta].sub.4]) in Figure 1. [FIGURE 1 OMITTED] The vector W([theta]) - [[w.sub.1]([[theta].sub.1]) [w.sub.2]([[theta].sub.2]) ... [w.sub.p]([[theta].sub.p])] fully characterizes the functional relationships between genes in a gene expression network and is the target of any estimation effort to identify and quantify a network. The functional form that can be used for any individual [w.sub.i]([[theta].sub.i]) is not restricted. One example is the log-linear gene expression network. Log-Linear Gene Expression Network One of the simplest types of weighting function used to describe a gene expression network is the log-linear weighting function given by the following form: [1] [MATHEMATICAL EXPRESSION A group of characters or symbols representing a quantity or an operation. See arithmetic expression. NOT REPRODUCIBLE IN ASCII ASCII or American Standard Code for Information Interchange, a set of codes used to represent letters, numbers, a few symbols, and control characters. Originally designed for teletype operations, it has found wide application in computers. ], where [x.sub.j] is the observed level of expression (or ratios of expression) of gene [X.sub.j], [[beta].sub.ji], is the magnitude by which a change in one log unit of gene [X.sub.j] will affect the level of expression of gene [X.sub.i], and [I.sub.ji], is an indicator variable describing the direction of the change denoted by [[beta].sub.ji], where [I.sub.ji] = 1 for stimulation, [I.sub.ji] = -1 for inhibition, and [I.sub.ji] = 0 for no effect. For simplicity of notation, we define B = [[beta].sub.ji]].sub.j=1,2 ... p, i=1,2 ... p], T = [[I.sub.ji].sub.j=1,2 ... p, i=1,2 ... p], and A = [log([x.sub.1]), log([x.sub.2]), ... log([x.sub.p])], where we refer to T as the transition matrix and B as the parameter matrix. It is then possible to rewrite Equation 1 in its matrix form given by [2] W ([theta]) = [e.sup.A(B x T)], where [theta] = [[beta].sub.11] [[beta].sub.12] ... [[beta].sub.pp], and the dot represents element-by-element multiplication of B and T. In the example given by Figure 1, the matrices B and Tare 4 x 4 matrices and have only 4 nonzero non·ze·ro adj. Not equal to zero. nonzero Not equal to zero. elements each [(1,3), (1,4), (2,3), and (3,4)], so the vector of parameters is [theta] = [[beta].sub.13] [beta].sub.14] [beta].sub.23] [beta].sub.34]. Tamada et al. (2003) used a nonparametric B-spline for [w.sub.i]([[theta].sub.i]). Such a method could be used in this context as well, where the breakpoints in the splines are at individual doses or times used for an experimental design. The transition matrix provides the qualitative structure of the gene expression network, and the parameter matrix quantities the strength of the relationship between the genes. In the following we use [N.sub.p] ([theta]) to represent a general gene expression network with P genes and [N.sub.TP], ([theta]) to specifically represent a log-linear gene expression network with P genes. Bayesian Network Estimation Procedure Like any other biological measurement, it can be presumed that two observations taken from seemingly identical examples may differ because of uncontrolled variables or simple random fluctuation; this difference is traditionally defined as random variation about the mean behavior in a model. With random variation, x = [[x.sub.1], [x.sub.2] ... [x.sub.p]] is an observation from a random matrix X = [[X.sub.1], [X.sub.2] ... [X.sub.p]]. The simplest method by which random variation can be included in a gene interaction network is to assume that [X.sub.i] is conditional on knowledge of the other X's and [theta] follows a prescribed probability density function Probability density function The function that describes the change of certain realizations for a continuous random variable. . Define [X.sub.i] = [[X.sub.1], [X.sub.2], ... [X.sub.i-1], [X.sub.i+1], ... [X.sub.p]] and define [f.sub.i]([X.sub.i], |[bar.X].sub.i] [theta]) to be the conditional density of [X.sub.i]. If a gene has a regulatory effect on gene [X.sub.i], that gene is referred to as a "parent of gene [X".sub.i]; in other words Adv. 1. in other words - otherwise stated; "in other words, we are broke" put differently , it belongs to the set referred to by Pa([X.sub.i]). Hence, for exampie, in the model depicted by Figure 1, Pa([X.sub.3]) = [[X.sub.1], [X.sub.2]]. This notation has been used in other cases and in the context of this modeling, the distribution could then be written as [f.sub.i] ([X.sub.i]|Pa([X.sub.i]), [theta]). A greater level of statistical complexity is possible by also presuming pre·sum·ing adj. Having or showing excessive and arrogant self-confidence; presumptuous. pre·sum ing·ly adv. that the parameters have probability density functions;
[h.sub.i]([[theta].sub.i]) is referred to as the prior distribution of
[[theta].sub.i]. This formulation places the network defined by
[N.sub.I],([theta]) and the data into the context of classical Bayesian
networks (Jensen 1996).Suppose that we have to sets of microarray data [[[x.sub.1j], [x.sub.2j], ... [x.sub.pj].sub.j=1,2, ... m] from gene expression network [N.sub.p]([theta]), where individual arrays are independent random samples from the joint density function for the genes. The joint density function for the parameters given the gene expression data, denoted g([theta]|X), is referred to as the posterior distribution and can be estimated using the Markov chain Monte Carlo (MCMC MCMC Markov Chain Monte Carlo MCMC Malaysian Communications and Multimedia Commission MCMC Mid-Continent Mapping Center McMC McMaster-Carr MCMC Marine Corps Maintenance Contractor ) method (Hastings 1970). In the examples given in this article, the Metropolis algorithm (Andrec and Prestegard 1998) is used to sample from the MCMC to generate samples from the joint density. Specific Cases Used in This Analysis In all analyses that follow, the gene expression network is presumed to be a log-linear network defined by [N.sub.TP]([theta]) in Equations 1 and 2. It is assumed that data arise from microarrays using a relative comparison between two samples (no change results in a value of 1, increased expression > 1, decreased expression < 1), and the distributions for the log of the individual relative gene expression levels conditional on knowledge of T,[theta] and the other X's, [f.sub.i]([X.sub.i]|[bar.X].sub.p], [theta]), are assumed to be normal, with mean defined as the exponent of e in Equation 1 and with standard deviation In statistics, the average amount a number varies from the average number in a series of numbers. (statistics) standard deviation - (SD) A measure of the range of values in a set of numbers. (SD) [sigma]. All parameters in [theta] = (B,S), where S = [([[sigma].sub.1], ([[sigma].sub.2] .... [[sigma].sub.p]] are assumed to have prior distributions (normal for the elements of B and uniform for the elements of S). Assume that the structure of T (transition matrix) is known without error. In this situation, the qualitative relationship between genes in the gene expression network is known. Taking Figure 1 as an example, expectation of each log([X.sub.i]) (i : 1,2,3) becomes E[log([X.sub.1])| T, B, [bar.X].sub.1]] = 0, E[log([X.sub.2]|T,B, [[bar.X].sub.2]] = 0,E[log([X.sub.3])|T,B,[[bar.X].sub.3]] = [[beta].sub.13]log([X.sub.1]) + [[beta].sub.23] log([X.sub.2]), and E[log([X.sub.4])|T,B,[[bar.X].sub.4]] = [[beta].sub.14] log([[bar.X].sub.1]) - [[beta].sub.34]log([X.sub.3]). The ultimate goal of defining a Bayesian network is to derive the posterior distribution for the parameters of interest. To derive the posterior, we must first calculate the conditional likelihood of the data, denoted [L.sub.N][X|N.sub.TP([theta])]. The likelihood is the product of the individual conditional densities and is written [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]. In the MCMC analysis we must assume a mean and variance for the prior normal distributions for the [beta]'s and bounds on the prior uniform distributions for the [sigma]'s. Several options were chosen for the prior means of the [beta]'s and an uninformative un·in·for·ma·tive adj. Providing little or no information; not informative. un in·for SD
(10) was chosen for the prior variance. To develop bounds on the prior
uniform distributions for the [sigma]'s, SDs were calculated for
each gene across replicates, and the maximum SD observed was multiplied
by 2 to set the upper bound, with 0 set as the lower bound. Given these
priors and the data, MCMC iterations for each data set analyzed are run
until the estimates for the posterior distributions for the
[beta]'s and the [sigma]'s are stabilized.Other distributions and methods could be used to define the priors and generate the posterior distributions for the likelihood and the parameters in the model. In considering a more complicated functional relationship between genes, Michaelis-Menten-type equations could be used to develop networks with restricted maximum and minimum linkages. Such networks would require substantially more data. A user-friendly software package for these analyses is available from the corresponding author. Application to Microarray Gene Expression Data Martinez et al. (2002) evaluated the change in expression of 2,091 genes in triplicate samples of HPL1A and A549 cells exposed to differing levels (0, 0.1, 1.0, and 10 nM) TCDD for 24 hr. Total RNA RNA: see nucleic acid. RNA in full ribonucleic acid One of the two main types of nucleic acid (the other being DNA), which functions in cellular protein synthesis in all living cells and replaces DNA as the carrier of genetic was extracted and, using methods described by Martinez et al., hybridized to NIEHS NIEHS National Institute of Environmental Health Sciences (NIH, DHHS) Human ToxChip, version 1.0 (http://dir. niehs.nih.gov/microarray/chips.htm) to obtain changes in gene expression in dioxin-treated cells (one channel) relative to the controls (second channel). They identified 68 genes that were altered in at least one cell line and 15 genes that were altered in both cell lines. Of these, they identified 11 genes that appear to be involved in the effects of TCDD on the retinoid-signaling pathway. In this article we hypothesize hy·poth·e·size v. hy·poth·e·sized, hy·poth·e·siz·ing, hy·poth·e·siz·es v.tr. To assert as a hypothesis. v.intr. To form a hypothesis. a gene interaction network defining the quantitative role of TCDD in altering retinoid signaling based on the current available literature. The data for these 11 genes from the HPL1A cells and the hypothesized network are analyzed using the methods described above. Results Dioxin Analysis 2,3,7,8-Tetrachlorodibenzo-p-dioxin is a known human carcinogen carcinogen: see cancer. carcinogen Agent that can cause cancer. Exposure to one or more carcinogens, including certain chemicals, radiation, and certain viruses, can initiate cancer under conditions not completely understood. , a suspected teratogen teratogen /ter·a·to·gen/ (ter´ah-to-jen) any agent or factor that induces or increases the incidence of abnormal prenatal development.teratogen´ic te·rat·o·gen n. , and highly toxic highly toxic Occupational medicine adjective Referring to a chemical that 1. Has a median lethal dose–LD50 of ≤ 50 mg/kg when administered orally to 200-300 g albino rats 2. in most mammalian species. There has been considerable speculation that TCDD alters the retinoic acid receptor (RAR RAR Retinoic Acid Receptor RAR Resource Adapter Archive (J2EE) RAR Royal Australian Regiment RAR Risk Assessment Report RAR Roshal Archive (WinRAR compressed file format; file extension) )-dependent signaling pathway via alteration of the synthesis and metabolism of RA. Microarray data (Martinez et al. 2002) on changes in gene expression in HPL1A lung airway epithelial cells after exposure to TCDD at levels of 0.1, 1.0, and 10.0 nM for 24 hr identified 11 genes with significant changes at the 99% confidence level. The gene identifiers and data are given in Tables 1 and 2. Figure 2 hypothesizes a gene interaction network linking the traditional TCDD-induced genes and genes in the RAR-dependent signaling pathway. [FIGURE 2 OMITTED] Vitamin A vitamin A also called retinol Fat-soluble alcohol, most abundant in fatty fish and especially in fish-liver oils. It is not found in plants, but many vegetables and fruits contain beta-carotene (see (retinol retinol: see Vitamin A under vitamin. ) is taken up from blood and binds to the CRBP CRBP Cellular Retinol Binding Protein CRBP Centre de Recherches Regionales sur Bananiers et Plantains (Cameroon) CRBP Connection Reset by Peer in the cytoplasm cytoplasm: see protoplasm. cytoplasm Portion of a eukaryotic cell outside the nucleus. The cytoplasm contains all the organelles (see eukaryote). . Retinol and alcohol dehydrogenases convert the sequestered se·ques·ter v. se·ques·tered, se·ques·ter·ing, se·ques·ters v.tr. 1. To cause to withdraw into seclusion. 2. To remove or set apart; segregate. See Synonyms at isolate. 3. retinol to retinal, which is then converted to RA by retinal dehydrogenases such as ALDH ALDH Aldehyde Dehydrogenase 6 (Rexer et al. 2001). It is also possible that cytochrome P450s such as CYP CYP In currencies, this is the abbreviation for the Cyprus Pound. Notes: The currency market, also known as the Foreign Exchange market, is the largest financial market in the world, with a daily average volume of over US $1 trillion. 1AI may also convert retinal to RA (Zhang et al. 2000). Once RA is synthesized, it binds to cytosolic RA binding proteins (such as CRABP CRABP Cellular Retinoic Acid-Binding Protein ). RA enters the nucleus, where it binds to two types of ligand-activated nuclear transcription factors, the RA receptors (e.g., RARB RARB Risk Assessment and Review Board ) and the retinoid X receptors retinoid X receptor One of 2 receptors for retinoids; RXR plays a key role in organ development, in particular of the skin. Cf Retinoic acid receptor. . Several groups have hypothesized that changes observed in RA levels from dioxin exposures are mediated through increased metabolism of retinal to RA through retinal dehydrogenases or cytochrome P450s or both (Schmidt et al. 2003). Using these data together with the known AhR gene battery, we developed a hypothetical gene interaction network (Figure 2). The predominant linkage to RAR is through upregulation of ALDH6 and CFP 1. CFP - Constraint Functional Programming. 2. CFP - Communicating Functional Processes. 3. CFP - Call For Papers (for a conference). 1A 1, which synthesize RA. TCDD alters the metabolism of all-trans-RA (Schmidt et al. 2003), suggesting the linkage between ALDH6 and RARB in Figure 2. RARB has been shown to play a role in the inhibition of cellular replication (Sun et al. 2000). RARB is assumed to modify the expression levels of four genes: ELF3, NCOA NCOA National Change Of Address (USPS) NCOA National Council On the Aging NCOA Nuclear Receptor Coactivator NCOA National Corvette Owners Association NCoA New Care-Of Address NCOA Non-Commissioned Officer Academy 2, ZNF ZNF Zinc Finger (protein; biology) 42, and CDKN CDKN Cyclin-Dependent Kinase Inhibitor 1A. These genes have been shown to be parts of the differentiation pathways of various cell types and are hypothesized to be modified by changes in the RA-signaling pathway. ELF3 is an epithelial-specific transcriptional regulator that may play a role in lung carcinogenesis car·ci·no·gen·e·sis n. The production of cancer. carcinogenesis production of cancer. biological carcinogenesis viruses and some parasites are capable of initiating neoplasia. (Tymms et al. 1997). NCOA2, also known as GRIP1, interacts with the five steroid hormone receptor Steroid hormone receptors are intracellular receptors (typically cytoplasmic) that perform signal transduction for steroid hormones. Steroid hormone receptors are part of the nuclear receptor family that include a group of homologous structured receptors (type II receptors) that types (Hong et al. 1997; Schmidt et al. 1998). ZNF42, also known as MZF-1, is a putative transcriptional regulator induced by RA in human myeloid myeloid /my·eloid/ (mi´e-loid) 1. medullary; pertaining to, derived from, or resembling bone marrow or the spinal cord. 2. having the appearance of myelocytes, but not derived from bone marrow. cells (Hromas et al. 1991). CDKN1A is induced by RA through RARB in human neuroblastoma Neuroblastoma Definition Neuroblastoma is a type of cancer that usually originates either in the tissues of the adrenal gland or in the ganglia of the abdomen or in the ganglia of the nervous system. tumors (Cheung et al. 1998; Liu et al. 1996). Both ALDH6 and RARB affect the regulation of NCOA2, which in turn alters the regulation of ZNF42 and CDKN1A. ACOXI, the human peroxisomal acylcoenzyme A oxidase oxidase /ox·i·dase/ (ok´si-das) any enzyme of the class of oxidoreductases in which molecular oxygen is the hydrogen acceptor. ox·i·dase n. , is hypothesized in the model to upregulate both RARI RARI Reporting Activity Routing Identifier 3 and NCOA2. ACOX1 is the first enzyme of the fatty acid fatty acid, any of the organic carboxylic acids present in fats and oils as esters of glycerol. Molecular weights of fatty acids vary over a wide range. The carbon skeleton of any fatty acid is unbranched. Some fatty acids are saturated, i.e. beta-oxidation pathway (Varanasi et al. 1994), and changes in this gene are likely to affect endogenous levels of fatty acids known to activate the retinoic X receptor, thereby modulating gene expression (Issemann et al. 1993). The second major linkage occurs between cytochrome cytochrome (sī`təkrōm'), protein containing heme (see coenzyme) that participates in the phase of biochemical respiration called oxidative phosphorylation. P4501A1, CYP1A CYP1A Cytochrome P450 1A 1, and RARB. An inducer inducer /in·duc·er/ (in-dldbomacs´er) a molecule that causes a cell or organism to accelerate synthesis of an enzyme or sequence of enzymes in response to a developmental signal. in·duc·er n. of CYP1A1 ([beta]-naphtboflavone) induced the metabolism of all-trans-RA in human intestinal epithelial cells (Lampen et al. 2000). CYP1A1 is upregulated by TCDD (Portier et al. 1993), suggesting the linkage between TCDD and genes in the RAR-signaling pathway such as RARB, NCOA2, and CRAt3P, a specific carrier protein for vitamin A that influences metabolism of RA and increases the sensitivity of a cell to vitamin A signaling (Boylan and Gudas 1992; Ong 1987). The CRABP promoter contains an enhancer region through which RA inhibits CRABP transcription (Means et al. 2000). The slope parameters for all the linkages between genes ([[beta].sub.ij] in Equation 1) in Figure 2 were estimated using the Bayesian gene interaction network approach as described above. Prior probability prior probability, n the extent of belief held by a patient and practitioner in the ability of a specific therapeutic approach to produce a positive outcome before treatment begins. distributions for the log gene expression values were assumed to be normally distributed with a mean of zero and a variance of 1. The SDs ([[sigma].sub.1], [[sigma].sub.2] .... [[sigma].sub.p]) were assumed to have uniform priors ranging from zero to two times the largest SD observed for any one gene (an uninformative prior). A total of 100,000 MCMC samples were obtained, and the last 80% (80,000) were used to estimate the posterior density of the parameters. Although no formal MCMC stopping rule In probability theory, in particular in the study of stochastic processes, a stopping time is a specific type of "random time". The theory of stopping rules and stopping times can be analysed in probability and statistics, notably in the optional stopping theorem. was used, analyses of the last 80,000 MCMC samples clearly supported convergence. Figure 2 also illustrates the type of results routinely obtained in Bayesian analyses. The four histograms shown in Figure 2 are the empirical posterior densities for 4 of the 19 linkages. The linkage between TCDD and CFP1A1 (A) has a distribution for which there are virtually no values below zero, indicating a strong statistical relationship in these data. The mean value, 0.269, indicates the degree of change in CYP1A1 expression as a function of the change in TCDD concentration. The means, SDs, and percentages of values below zero are summarized for all 19 linkages in Table 3. The distributions for the variances are presented in Table 4. Other uninformative priors were tried with no significant alteration in the results presented in Table 3. In our Bayesian analysis Bayesian analysis A decision-making analysis that '…permits the calculation of the probability that one treatment is superior based on the observed data and prior beliefs…subjectivity of beliefs is not a liability, but rather explicitly allows we estimate the posterior distributions for each parameter given the data. If a distribution for a given parameter has a small probability of being < 0 (such as [greater than or equal to] 0.1), that parameter supports a linkage between genes. The network depicted in Figure 2 was developed to test the hypothesis of a linkage between dioxin-responsive genes CYP1A1 and ALDH6, and the RAR-signaling gene RARB. The distribution for [[beta].sub.CYP1A1 [right arrow] RARB] had a substantial mass less than zero (26% < 0, Table 3), suggesting a lack of support for the linkage between changes in message for these two genes. Similar results were seen for [[beta].sub.ALDH6 [right arrow] RARB] (20% < 0, Table 3). Examination of the joint density for [[beta].sub.CYP1A1 [right arrow] RARB] and [[beta].sub.ALDH6 [right arrow] RARB] suggested a negative correlation Noun 1. negative correlation - a correlation in which large values of one variable are associated with small values of the other; the correlation coefficient is between 0 and -1 indirect correlation , indicating that the data may not support both linkages simultaneously. This is not surprising, as they are both acting upon the same component of RA synthesis. By forcing [[beta].sub.CYP1A1 [right arrow] RARB] = 0 and again estimating the remaining parameters, we can examine the distribution of [[beta].sub.ALDH6 [right arrow] RARB] under the condition that the other linkage is not present; in this case, [[beta].sub.ALDH6 [right arrow] RARB] had no estimates less than zero (0%) and there was no change in the posterior distribution for the log-likelihood, suggesting almost no change in the fit of the network to the data, even though we dropped the linkage between CYP1A1 and RARB. Conversely, we can set [[beta].sub.ALDH6 [right arrow] RARB] = 0 and examine the distribution of [[beta].sub.CYP1A1 [right arrow] RARB]; here also we see 0% < 0 and no change in the log-likelihood. These two analyses support the hypothesized linkage between TCDD-responsive genes and RARB-responsive genes, but only through either ALDH6or CYP1A1, not both. Finally, setting both [[beta].sub.CYP1A1 [right arrow] RARB] = 0 and [[beta].sub.ALDH6 [right arrow] RARB] = 0 significantly shifts the distribution of the posterior log-likelihood to smaller values (10% reduction overall), suggesting that at least one of these linkages is needed to explain these data. The only other linkage that did not appear to be supported by these data was the hypothesized linkage between NCOA2 and ZNF42. The distribution for [[beta].sub.NCOA2 [right arrow] ZNF42] had a mean estimate of zero, with 48.5% of the estimates less than zero. Assuming [[beta].sub.NCOA2 [right arrow] ZNF42] = 0 had no impact on the log-likelihood, suggesting this linkage was not needed in the model and that there was no correlation offset with other parameters. Given the sample size and the number of genes in the network, it is surprising that all other linkages appeared to be supported by these data, with the percentage of [beta] values less than zero ranging from 0% for several pairs ([[beta].sub.TCDD [right arrow] CYP1A1], [[beta].sub.RARB [right arrow] CDKN1A], [[beta].sub.RARB [right arrow] ELF3], [[beta].sub.RARB [right arrow] NCOA2], [[beta].sub.RARB [right arrow] ZNF42, and [[beta].sub.ACOX1 [right arrow] NCOA2]) to 9.4% ([[beta].sub.ACOX1 [right arrow] RARB]). Simulation Studies Although the TCDD example is illustrative of the method, it does not address how well this method works under diverse conditions; this is best addressed by Monte Carlo simulations Monte Carlo Simulation A problem solving technique used to approximate the probability of certain outcomes by running multiple trial runs, called simulations, using random variables. . One thousand (1,000) simulated experiments from the simple four-gene network in Figure 1 were generated by the computer using sample sizes of 50, 25, and 10 gene chips in each experiment. Twenty-two combinations of the model parameters ([theta] = [[beta].sub.13], [[beta].sub.14], [[beta].sub.23], [[beta].sub.34], [[sigma].sub.1], [[sigma].sub.2], [[sigma].sub.3], [[sigma].sub.4]) were considered. For each simulation, posterior distributions were calculated and summarized by their means, medians, and SDs. The MCMC process used was identical to that used for the dioxin example, with the exception that only 8,000 iterations of the Metropolis algorithm were performed, and the last 20% (1,600) values were used to calculate the summary statistics. Multiple runs with different starting points were used, with no difference in the final results (not shown). Table 5 provides representative results from two of the simulation studies. The results indicate that, when sample sizes are sufficiently large In mathematics, the phrase sufficiently large is used in contexts such as:
n. 1. The condition of not existing. 2. Something that does not exist. non link is approximately zero, and one could discard this link. Similar results were seen for all the cases studied. To further challenge the estimation procedure, an eight-gene network was simulated and estimated. This network had 18 parameters and was a greater challenge to the Bayesian method. Because of the increase in the number of parameters, SDs were substantially larger than in the four-gene model, but the estimation was still effectively unbiased. Discussion Many methods have been developed for the analysis of gene expression microarray data, but few methods exist for using these data to quantify the interrelated in·ter·re·late tr. & intr.v. in·ter·re·lat·ed, in·ter·re·lat·ing, in·ter·re·lates To place in or come into mutual relationship. in behavior of genes within gene interaction networks. Most network-based methods are focused on network identification, not quantification. Given a hypothesized gene interaction network, this article develops and demonstrates the use of Bayesian network models as a tool for the analysis of a network using microarray data. The method allows for evaluating the strength of relationships within a hypothesized network and could also be used to test for additional linkages within the network. There were two key points raised by these analyses. First, the application of this quantitative approach to the experimental data on TCDD effects in human lung epithelial cells clearly identified two subnetworks as significantly related to the AhR battery and the retinoid signaling. This indicates that the observed gene expression changes are consistent with the underlying hypothesized mechanism of action. In one sense this represents an alternate validation step in a tiered approach to evaluating microarray analyses. For example, ZNF42, although annotated as a retinoid-responsive gene, has not been previously validated as a retinoid-responsive gene in this cell system. The quantitative modeling suggests a highly significant relationship between ZNF42 expression and other genes in the retinoid-signaling subnetwork See subnet. , which provides confidence that its alteration was indeed due to activation of the retinoid-signaling pathway. The testing of other subnetworks within a given data set can further serve to increase confidence that inferences on relationships between genes obtained from other types of analyses (evaluation of gene annotation, clustering, pathway analysis, informatic-based network mapping Network mapping or Internet mapping is the study of the physical connectivity of the Internet. It is not to be confused with the remote discovery of which operating system a computer is running, an activity more akin to hacking. , literature searches) are real. The second point from these analyses is that we were able to test the interaction between the two subnetworks (AhR and retinoid) and illustrated that a functional relationship was likely real. Such an analysis is useful in that it supports further testing of this mechanism experimentally, it may be that some fraction of the toxicity associated with chronic exposure to TCDD could be the direct result of TCDD-induced increases in RA in the cells. The hypothesized network clearly supports a significant change in gene expression associated with signaling through the RARB pathway. The quantitative linkages observed in this experiment are unlikely to hold for an in vivo in vivo /in vi·vo/ (ve´vo) [L.] within the living body. in vi·vo adj. Within a living organism. in vivo adv. system but suggest that an experiment exposing laboratory animals to TCDD, which includes both TCDD and RA measurements with gene expression measurements, would be useful. Two recent experiments address these issues to a limited extent. Schmidt et al. (2003) examined RA levels and changes in expression of CRBPI in male Sprague-Dawley rats and saw significant changes in RA levels in kidney, liver, and serum, and a marginal change in liver CRBP1 after 28 days. They did not examine any of the genes in the network shown in Figure 2, so it is difficult to compare directly with our results. Johnson et al. (2004) used in vitro data from three experiments with AhR ligands activating genes in the heart, kidney, and thoracic aorta of mouse embryos. They used an exhaustive search of three linkages for each gene to identify the most likely gene-gene interactions. They also identified linkages to genes in the RA-signaling pathway (IGFBP-3 and IGFBP-6), but again, not the specific genes used in Figure 2. The simulation experiments were different from the analysis of the TCDD study. In the TCDD study, the network linkages were perturbed per·turb tr.v. per·turbed, per·turb·ing, per·turbs 1. To disturb greatly; make uneasy or anxious. 2. To throw into great confusion. 3. to cause significant quantitative changes in expression, which then could be used to quantify the linkages between genes. In contrast, the simulation study used only the random variation in expression levels to quantify the network. The simulation studies indicate that the proposed method appears to be unbiased and, on average, produces the correct results. However, sample size could be a problem for small experiments with minor changes in gene expression. When the sample size is only 10 microarrays, the SD can be large relative to the expected value Expected value The weighted average of a probability distribution. Also known as the mean value. of the linkage between two genes, suggesting one might misinterpret mis·in·ter·pret tr.v. mis·in·ter·pret·ed, mis·in·ter·pret·ing, mis·in·ter·prets 1. To interpret inaccurately. 2. To explain inaccurately. a linkage as having little statistical support. This problem gets worse as the number of genes in the network increases. In contrast, large sample sizes of 50 microarrays are unlikely to have this problem. Directed changes in the network, as in the dioxin experiment, can help overcome this problem and allow the quantification of significant linkages by as few as nine microarrays. To address this question, two additional simulations were conducted. Using the network shown in Figure 2 and the parameters estimated for the TCDD network shown in Table 3, we simulated 500 data sets consisting of nine microarrays--three for each dioxin dose; that is, we replicated the experiment 500 times using the predicted model. On average the resulting parameter estimates were identical to those observed from fitting the original data but appeared to have a slightly smaller SD than that estimated in the model. This decrease in SD could indicate a degree of model misspecification, as the simulated data appear to fit better than the observed data. In addition, whereas the observed data showed a nonsignificant non·sig·nif·i·cant adj. 1. Not significant. 2. Having, producing, or being a value obtained from a statistical test that lies within the limits for being of random occurrence. linkage between CYP1A1 to RARB, 48% of the simulated data sets found this linkage to be significant. Similarly, 54% found the linkage between ALDH6 and RARB to be significant. In contrast, the simulations found a significant linkage between NCOA2 and ZNF42 in only 6% of the cases (hence the Type I error appears to be good) and between TCDD and CYP1A1 in 100% of the cases (power is high). In a second simulation, the network shown in Figure 2 was again simulated, this time without TCDD included in the experimental design and using just random variation in the genes to produce the data. Again, the results were unbiased, but the SDs more than doubled. In addition, the probability of observing a significant linkage was reduced by about 20% for most linkages. This illustrates the value of stimulating the system when trying to identify gene interaction networks. Clearly, this type of modeling approach is limited in terms of interpretation. First, the model cannot be cyclic; hence, increases in CRABP as a function of RARB that might then result in greater binding of RA in the cytosol cytosol /cy·to·sol/ (sit´ah-sol) the liquid medium of the cytoplasm, i.e., cytoplasm minus organelles and nonmembranous insoluble components.cytosol´ic cy·to·sol n. , reducing RARB expression, could not be included. Given time-course data, it could be possible to explore this linkage using a more complicated modeling form or some other method of analysis such as semicyclic Bayesian networks. Second, the method is dependent on a parametric model In statistics, a parametric model is a parametrized family of probability distributions, one of which is presumed to describe the way a population is distributed. Examples
a statistical model which models frequency counts in contingency tables by using an analysis of variance approach. could underestimate low-dose changes while overestimating high-dose changes. This, in turn, could lead one to accept or reject a given model incorrectly. It should be noted that this type of criticism applies to all the other network analysis methods as well. Finally, although not seen in this analysis, it is possible that the resulting distributions for the linkages between the genes could be sensitive to the choice of prior distributions, and one should be careful to evaluate if such an impact might exist with the data. Although the approach presented here involves only gene expression data, it can easily be expanded to include other data relevant to the linkages between genes and the quantification of signal transduction pathways in cells. Data quantifying protein levels in cells could easily be folded into a general likelihood, linked via a similar model, and analyzed to quantify the entire network. Such an approach leads to rational, mechanism-driven simultaneous analyses of genomics, proteomics, and metabolomics data. In addition, the networks identified through this type of analysis can easily be combined with other mechanism-based mathematical models such as physiologically based pharmacokinetic and pharmacodynamic models to present a true, systems-biology approach for the quantification of risks from exposures to xenobiotics like dioxin. This analysis would form one module of an overall model for TCDD toxicity. For example, if microarray data were available in rats exposed to TCDD, existing models like that of Kohn et al. (2001) could easily be linked to the gene interaction network discussed above. These, in turn, could be linked to cancer data using a mechanistic model to test hypotheses regarding cancer incidence and the mechanisms involved, as shown by Brooks et al. (1999). The method proposed here is not restricted to the log-linear model used in this analysis, nor is it linked to the statistical likelihood chosen for the analysis. Other models such as dynamic models (Chen et al. 1999) and other statistical likelihoods (Wolfinger et al. 2001) could easily be incorporated into the analysis methods. Bayesian networks have been used in a number of settings to provide insight into the complicated linkage between variables that interact. Quantifying the distributions linking genes into networks and expanding this to include proteins and protein modifications will make it possible to quantify the impact of a given chemical agent on the signal transduction pathways in a cell. Although many different methods could be used for this, Bayesian networks have the advantage of flexibility, which will make it possible to build on existing knowledge while bringing new data into the analysis. For the dioxin study presented here, the limitations of the sample size preclude an overall conclusion concerning the validity of the final model for predictions about the role of dioxin in changes to the RAR-signaling pathway. However, this analysis has strengthened the underlying hypothesis that changes in RAR signaling may play an important role in dioxin-mediated toxicity and suggest a number of experiments that could lead to a better-characterized network; this is left for future work. In this article we used known scientific inferences and gene annotation to develop the initial tested network. This approach can also be applied to evaluating the likelihood of any hypothesized network developed by other approaches. As such, it can be applied to networks developed using other types of analyses including Bayesian, Boolean, and informatics-based approaches, as well as other known networks in the scientific literature. The ability to test hypotheses in the context of the network and to build modules that can be quantitatively linked to toxicity are first steps in a true systems-biology approach to mechanism-based use of genomics in risk assessment. This analysis is unique in that it directly addresses these uses. REFERENCES Akutsu T, Miyano S, Kuhara S. 2000. Algorithms for identifying Boolean networks and related biological networks based on matrix multiplication and fingerprint function. J Comput Biol 7(3-4):331-343. Andrec M, Prestegard JH. 1998. A Metropolis Monte Carlo implementation of Bayesian time-domain parameter estimation: application to coupling constant estimation from antiphase multiplets. J Magn Reson 130(2):217-232. Biankin AV, Kench JG, Morey AL, Lee CS, Biankin SA, Head DR, et al. 2001. Overexpression of p21(WAF WAF 1 or Waf n. A member of the Women in the Air Force, organized after World War II, but now no longer a separate branch. [From W(omen in the) A(ir) F(orce).] 1/CIP1) is an early event in the development of pancreatic intraepithelial neoplasia intraepithelial neoplasia An in situ carcinoma confined to an epithelium that may superficially penetrate adnexal glands, measuring < either 3 mm or 5 mm depending on the criteria used; IN is adjectivally modified according to site of origin See CIN . Cancer Res 61(24):8830-8837. Boylan JF, Gudas LJ. 1992. The level of CRABP-I expression influences the amounts and types of all-trans-retinoic acid metabolites Metabolites Substances produced by metabolism or by a metabolic process. Mentioned in: Interactions in F9 teratocarcinoma teratocarcinoma /ter·a·to·car·ci·no·ma/ (-kahr?si-no´mah) a malignant neoplasm consisting of elements of teratoma with those of embryonal carcinoma or choriocarcinoma, or both; occurring most often in the testis. stem cells stem cells, unspecialized human or animal cells that can produce mature specialized body cells and at the same time replicate themselves. Embryonic stem cells are derived from a blastocyst (the blastula typical of placental mammals; see embryo), which is very young . J Biol Chem 267(30): 21486-21491. Brooks EA, Kohn M, van Birgelen A, Lucier G, Portier C. 1999. Stochastic models Stochastic models Liability-matching models that assume that the liability payments and the asset cash flows are uncertain. Related: Deterministic models. for papilloma papilloma /pap·il·lo·ma/ (pap?il-o´mah) a benign tumor derived from epithelium.papillo´matous fibroepithelial papilloma a type containing extensive fibrous tissue. formation following exposure to TCDD. Organohalogen Compd 41:521-524. Chen T, He H, Church G. 1999. Modeling gene expression with differential equations. Pac Symp Biocomput 4:29-40. Cheung B, Hocker JE, Smith SA, Norris MD, Haber M, Marshall GM. 1998. Favorable prognostic significance of high-level retinoic acid receptor beta expression in neuroblastoma mediated by effects on cell cycle regulation. Oncogene oncogene Gene that can cause cancer. It is a sequence of DNA that has been altered or mutated from its original form, the proto-oncogene (see mutation). Proto-oncogenes promote the specialization and division of normal cells. 17(6):751-759. Friedman N, Linial M, Nachman I, Pe'er D. 2000. Using Bayesian networks to analyze expression data. J Comput Biol 7(3-4):601-620. Gardner TS, di Bernardo D, Lorenz D, Collins JJ. 2003. Inferring genetic networks and identifying compound mode of action via expression profiling. Science 301(5629):102-105. Hartemink AJ, Gifford DK, Jaakkola TS, Young RA. 2002. Bayesian methods for elucidating genetic regulatory networks. IEEE (Institute of Electrical and Electronics Engineers, New York, www.ieee.org) A membership organization that includes engineers, scientists and students in electronics and allied fields. Intell Syst 17(2):37-43. Hastings WK. 1970. Monte Carlo sampling methods using Markov chains and their applications. Biometrika 57(1):97-109. Hayashi K, Yokozaki H, Naka K, Yasui W, Lotan R, Tahara E. 2001. Overexpression of retinoic acid receptor beta induces growth arrest and apoptosis in oral cancer cell lines. Jpn J Cancer Res 92(1):42-50. Hong H, Kohli K, Garabedian MJ, Stallcup MR. 1997. GRIP1, a transcriptional coactivator for the AF-2 transactivation Transactivation is an increased rate of gene expression triggered either by endogenous cellular or viral proteins - transactivators. These protein factors act in trans (i.e., intermolecularly). domain of steroid, thyroid, retinoid, and vitamin D vitamin D Any of a group of fat-soluble alcohols important in calcium metabolism in animals to form strong bones and teeth and prevent rickets and osteoporosis. It is formed by ultraviolet radiation (sunlight) of sterols (see steroid) present in the skin. receptors. Mol Cell Biol 17(5):2735-2744. Hromas R, Collins SJ, Hickstein D, Raskind W, Deaven LL, O'Hara P, et al. 1991. A retinoic acid-responsive human zinc finger gene, MZF-1, preferentially expressed in myeloid cells. J Biol Chem 266(22):14183-14187. Imoto S, Goto T, Miyano S. 2002. Estimation of genetic networks and functional structures between genes by using Bayesian networks and nonparametric regression. Pac Symp Biocomput 175-186. Issemann I, Prince RA, Tugwood JB, Green S. 1993. The peroxisome proliferator-activated receptor In cell biology, peroxisome proliferator-activated receptors (PPARs) are a group of nuclear receptor isoforms that exist across biology. They are intimately connected to cellular metabolism (carbohydrate, lipid and protein) and cell differentiation. : retinoid X receptor heterodimer is activated by fatty acids and fibrate hypolipidaemic drugs. J Mol Endocrinol 11(1):37-47. Jansen R, Yu H, Greenbaum D, Kluger Y, Krogan N, Chung S, et al. 2003. A Bayesian networks approach for predicting protein-protein interactions from genomic data. Science 302:449-453. Jensen FV. 1996. An Introduction to Bayesian Networks. London:UCL UCL University College London UCL Université Catholique de Louvain UCL UEFA Champions League UCL Upper Confidence Limit UCL University of Central Lancashire UCL Upper Control Limit UCL Unfair Competition Law UCL Ulnar Collateral Ligament Press. Johnson CD, Balagurunathan Y, Tadesse MG, Falahatpisheh MH, Brun M, Walker MK, et al. 2004. Unraveling gene-gene interactions regulated by ligands of the aryl hydrocarbon receptor. Environ Health Perspect 112(4):403-412. Kociba RJ, Keyes DG, Beyer JE, Carreon RM, Wade CE, Dittenber DA, et al. 1978. Results of a two-year chronic toxicity chronic toxicity Toxicology A condition caused by repeated or long-term exposure to low doses of a toxic substance and oncogenicity oncogenicity The capacity to induce tumors study of 2,3,7,8-tetrachlorodibenzo-p-dioxin in rats. Toxicol Appl Pharmacol 46(2):279-303. Kohn MC, Walker NJ, Kim AH, Portier CJ. 2001. Physiological modeling of a proposed mechanism of enzyme induction by TCDD. Toxicology 162(3):193-208. Lampen A, Meyer S, Arnhold T, Nau H. 2000. Metabolism of vitamin A and its active metabolite active metabolite Therapeutics A drug metabolite with therapeutic activity similar to the parent compound, which must be considered in therapeutic pharmacokinetics all-trans-retinoic acid in small intestinal enterocytes. J Pharmacol Exp Ther 295(3):979-985. Lin RJ, Evans RM. 2000. Acquisition of oncogenic oncogenic /on·co·gen·ic/ (-jen´ik) giving rise to tumors or causing tumor formation; said especially of tumor-inducing viruses. on·co·gen·ic or on·cog·e·nous adj. potential by RAR chimeras in acute promyelocytic leukemia acute pro·my·e·lo·cyt·ic leukemia n. A severe bleeding disorder that is a form of leukemia and is characterized by low concentrations of plasma fibrigen, defective coagulation, and infiltration of the bone marrow with abnormal promyelocytes and through formation of homodimers. Mol Cell 5(5):821-830. Liu M, Lee MH, Cohen cohen or kohen (Hebrew: “priest”) Jewish priest descended from Zadok (a descendant of Aaron), priest at the First Temple of Jerusalem. The biblical priesthood was hereditary and male. M, Bommakanti M, Freedman LP. 1996. Transcriptional activation of the Cdk inhibitor p21 by vitamin B vitamin B n. 1. Vitamin B complex. 2. A member of the vitamin B complex, especially thiamine. vitamin B, vitamin B complex a group of water-soluble substances described separately. 3 leads to the induced differentiation of the myelomonocytic cell line U937. Genes Dev 10(2):142-153. Lockhart BJ, Winzeler EA. 2000. Genomics, gene expression and DNA DNA: see nucleic acid. DNA or deoxyribonucleic acid One of two types of nucleic acid (the other is RNA); a complex organic compound found in all living cells and many viruses. It is the chemical substance of genes. arrays. Nature 405(6788): 827-836. Martinez JM, Afshari CA, Bushel bushel: see English units of measurement. PR, Masuda A, Takahashi T, Walker NJ. 2002. Differential toxicogenomic responses to 2,3,7,8-tetrachlorodibenzo-p-dioxinin malignant and nonmalignant human airway epithelial cells. Toxicol Sci 69(2):409-423. Means, AL, Thompson JR, Gudas LJ. 2000. Transcriptional regulation of the cellular retinoic acid binding protein I gene in F9 teratocarcinoma cells. Cell Growth Differ 11(2):71-82. Mysorekar IU, Mulvey MA, Hultgren SJ, Gordon JI. 2002. Molecular regulation of urothelial renewal and host defenses during infection with uropathogenic Escherichia coli Escherichia coli (ĕsh'ərĭk`ēə kō`lī), common bacterium that normally inhabits the intestinal tracts of humans and animals, but can cause infection in other parts of the body, especially the urinary tract. . J Biol Chem 277(9):7412-7419. Nebert DW, McKinnon RA, Puga A. 1996. Human drug-metabolizing enzyme polymorphisms: effects on risk of toxicity and cancer. DNA Cell Biol 15(4):273-280. Ong DE. 1987. Cellular retinoid-binding proteins. Arch Dermatol 123(12):1693-1695a. Pe'er D, Regev A, Elidan G, Friedman N. 2001. Inferring subnetworks from perturbed expression profiles. Bioinformatics 17(suppl 1):S215-S224. Picard F, Gehin M, Annicotte J, Rocchi S, Champy MF, O'Malley BW, et al. 2002. SRC-1 and TIF TIF Tagged Image File (file name extension) TIF Tax Increment Financing TIF Temporary Internet Files TIF Transport Innovation Fund (UK) TIF Telecommunications Infrastructure Fund 2 control energy balance between white and brown adipose tissues brown adipose tissue see brown adipose tissue. . Cell 111(7):931-941. Portier C, Tritscher A, Kohn M, Sewall C, Clark G, Edler L, et al. 1993. Ligand/receptor binding for 2,3,7,8-TCDD: implications for risk assessment. Fundam Appl Toxicol 20(1):48-56. Rexer BN, Zheng WL, Ong DE. 2001. Retinoic acid biosynthesis Biosynthesis The synthesis of more complex molecules from simpler ones in cells by a series of reactions mediated by enzymes. The overall economy and survival of the cell is governed by the interplay between the energy gained from the breakdown of compounds by normal human breast epithelium is via aldehyde dehydrogenase 6, absentin MCF-7 cells. Cancer Res 61(19):7065-7070. Schmidt CK, Hoegberg P, Fletcher N, Nilsson CB, Trossvik C, Hakansson H, et al. 2003. 2,3,7,8-Tetrachlorodibenzo-p-dioxin (TCDD) alters the endogenous metabolism of all-trans-retinoic acid in the rat. Arch Toxicol 77(7):371-383. Schmidt S, Baniahmad A, Eggert M, Schneider S, Renkawitz R. 1998. Multiple receptor interaction domains of GRIP1 function in synergy. Nucleic Acids Nucleic acids The cellular molecules DNA and RNA that act as coded instructions for the production of proteins and are copied for transmission of inherited traits. Res 26(5):1191-1197. Sun SY, Wan H, Yue P, Hong WK, Lotan R. 2000. Evidence that retinoic acid receptor beta induction by retinoids Retinoids A derivative of synthetic Vitamin A. Mentioned in: Ichthyosis retinoids (reˑ·t is important for tumor cell growth inhibition. J Biol Chem 275(22):17149-17153. Tamada Y, Kim S, Bannai H, Imoto S, Tashiro K, Kuhara S, et al. 2003. Estimating gene networks from gene expression data by combining Bayesian network model with promoter element detection. Bioinformatics 19(suppl 2):11227-11236. Tamayo P, Slonim D, Mesirov J, Zhu Q, Kitareewan S, Dmitrovsky E, et al. 1999. Interpreting patterns of gene expression with self-organizing maps: methods and application to hematopoietic hematopoietic /he·ma·to·poi·et·ic/ (-poi-et´ik) 1. pertaining to hematopoiesis. 2. an agent that promotes hematopoiesis. hematopoietic 1. pertaining to or affecting the formation of blood cells. differentiation. Proc Natl Acad Sci USA 96(6):2907-2912. Tymms MJ, Ng AY, Thomas RS, Schutte BC, Zhou J, Eyre HJ, et al. 1997. A novel epithelial-expressed ETS ETS Educational Testing Service (nonprofit private educational testing and measurement organization) ETS Emergency Telecommunications Service ETS Electronic Trading System ETS Engineering (&) Technical Services gene, ELF3: human and murine murine /mu·rine/ (mur´en) pertaining to, derived from, or characteristic of mice or rats. mu·rine adj. cDNA sequences, murine genomic organization, human mapping to 1q32.2 and expression in tissues and cancer. Oncogene 15(20):2449-2462. Unkila M, Pohjanvirta R, Honkakoski P, Torronen R, Tuomisto J. 1993. 2,3,7,8-Tetrachlorodibenzo-p-dioxin (TCDD)induced ethoxyresorufin-O-deethylase (EROD EROD Education Resource Organizations Directory EROD Ethoxyresorufin-O-deethylation EROD Early Return of Dependents EROD Electronic Record of Deposit (pending tranfer) ) and aldehyde dehydrogenase (ALDH3) activities in the brain and liver. A comparison between the most TCDD-susceptible and the most TCDD-resistant rat strain. Biochem Pharmacol 46(4):651-659. Varanasi U, Chu R, Chu S, Espinosa R, LeBeau MM, Reddy JK. 1994. Isolation of the human peroxisomal acyl-CoA oxidase gene: organization, promoter analysis, and chromosomal localization Customizing software and documentation for a particular country. It includes the translation of menus and messages into the native spoken language as well as changes in the user interface to accommodate different alphabets and culture. See internationalization and l10n. . Proc Natl Acad Sci USA 91(8):3107-3111. Vasiliou V, Pappa A, Petersen DR. 2000. Role of aldehyde dehydrogenases in endogenous and xenobiotic xen·o·bi·ot·ic adj. Foreign to the body or to living organisms. Used of chemical compounds. n. A xenobiotic chemical. xenobiotic any substance, harmful or not, that is foreign to the animal's biological system. metabolism. Chem Biol Interact 129(1-2):1-19. Wolfinger RD, Gibson G, Wolfinger ED, Bennett L, Hamadeh H, Bushel P, et al. 2001. Assessing gene significance from cDNA microarray expression data via mixed models. J Comput Biol 8(6):625-637. Xu J, Li Q. 2003. Review of the in vivo functions of the p160 steroid receptor coactivator family. Mol Endocrinol 17(9):1681-1692. Yoshida N, Yoshida S, Araie M, Handa H, Nabeshima Y. 2000. Ets family transcription factor ESE-1 is expressed in corneal corneal pertaining to the cornea. See also keratitis, keratopathy. corneal anomaly includes microcornea, coloboma, megalocornea, dermoid, congenital opacity. corneal black body see corneal sequestrum (below). epithelial cells and is involved in their differentiation. Mech Dev 97(1-2):27-34. Zhang QY, Dunbar D, Kaminsky L. 2000. Human cytochrome P-450 metabolism of retinals to retinoic acids. Drug Metab Dispos 28(3):292-297. Hiroyoshi Toyoshiba, Takeharu Yamanaka, Hideko Sone, Frederick M. Parham, Nigel J. Walker, Jeanelle Martinez, and Christopher J. Portier Laboratory of Computational Biology and Risk Analysis, National Institute of Environmental Health Sciences The National Institute of Environmental Health Sciences (NIEHS) is one of 27 Institutes and Centers of the National Institutes of Health (NIH),which is a component of the Department of Health and Human Services (DHHS). The Director of the NIEHS is Dr. David A. Schwartz. , Research Triangle Park Research Triangle Park, research, business, medical, and educational complex situated in central North Carolina. It has an area of 6,900 acres (2,795 hectares) and is 8 × 2 mi (13 × 3 km) in size. Named for the triangle formed by Duke Univ. , North Carolina North Carolina, state in the SE United States. It is bordered by the Atlantic Ocean (E), South Carolina and Georgia (S), Tennessee (W), and Virginia (N). Facts and Figures Area, 52,586 sq mi (136,198 sq km). Pop. , USA Address correspondence to C.J. Portier, National Institute of Environmental Health Sciences, PO Box 12233, Research Triangle Park, NC 27709. Telephone: (919) 541-3802. Fax: (919) 541-3647. E-mail: portier@nichs.nih.gov The authors declare they have no competing financial interests. Received 10 February 2004; accepted 23 June 2004.
Table 1. Description of genes included in the gene interaction network
shown in Figure 2.
Gene symbol (alternate Accession no. (a) Gene name (a)
symbols) (a)
ALDH3 (ALDH3A1) AA069024 Aldehyde dehydrogenase 3
family, memberA1
ALDH6(ALDH1A3) AA054748 Aldehyde dehydrogenase 1
family, member A3
ALDH10 (ALDH3A2) H63779 Aldehyde dehydrogenase 3
family, member A2
CYP1A1 AA418907 Cytochrome P450, subfamily
I, polypeptide 1A
CRABP N23941 Cellular retinoic acid
binding protein 1
NCOA2 (SRC-2, TIF2, 877770 Nuclear receptor
GRIP1) coactivator 2
RARB W93713 Retinoic acid receptor,
beta
CDKNIA (p21, Cip1) N23941 Cyclin-dependent kinase
inhibitor 1A
ZNF42 (MZF1, MZF-1, 883364 Zinc finger protein 42
MZF1B)
ELF3 (ESX, ESE1) H27939 E74-like factor 3 lets
domain transcription
factor, epithelial-
specific)
ACOX1(ACOX, PALMCOX) AA040205 Human peroxisomal acyl-
CoA oxidase
Gene symbol (alternate Biological role
symbols) (a)
ALDH3 (ALDH3A1) May play a role in the oxidation of lipid
aldehydes, especially those generated by lipid
peroxidation (Vasiliou et al. 2000), is
induced in rat liver by TCDD (Unkila et
al. 1993)
ALDH6(ALDH1A3) Has ability to synthesize retinoic acid from
both retinol and retinal (Rexer et al. 2001)
ALDH10 (ALDH3A2) Oxidizes long-chain aliphatic aldehydes to
fatty acid
CYP1A1 Phase I enzyme; its expression is controlled
by the AhR. Metabolically activates
procarcinogens to genotoxic electrophilic
intermediates (Nebert et al. 1996)
CRABP Small intracellular protein that is a carrier
for RA (vitamin A)
NCOA2 (SRC-2, TIF2, Transcription coactivator of retinoid/thyroid
GRIP1) receptors; a histone acetyltransferase that
plays an important role in lipid metabolism
and energy balance (Picard et al. 2002, Xu
and Li 2003)
RARB Hetero/homodimers associated with oncogenicity
(Lin and Evans 2000), overexpression in oral
squamous carcinoma cell lines; leads to growth
arrest and apoptosis (Hayashi et al. 2001)
CDKNIA (p21, Cip1) Functions as a regulator of cell-cycle
progression; overexpression linked to
carcinogenesis (Biankin et al. 2001)
ZNF42 (MZF1, MZF-1, Transcription factor that belongs to the
MZF1B) Kruppel family of zinc finger proteins;
RA-responsive; plays a role in cell
proliferation (Hromas et al. 1991)
ELF3 (ESX, ESE1) Transcription factor that transactivates genes
involved in epithelial differentiation and
host defense and mediators of proinflammatory
responses (e.g., Socs3, Cebp/delta, Bcl3, and
CC/CXC chemokines) (Mysorekar et al. 2002;
Yoshida et al. 2000)
ACOX1(ACOX, PALMCOX) First enzyme of the fatty acid [beta]-
oxidation pathway (Varanasi et al. 1994);
changes in this gene are likely to affect
endogenous levels of fatty acids known to
activate the retinoic X receptor, thereby
modulating gene expression (Issemann et
al 1993)
(a) From the NCBI (National Center for Biotechnology Information)
Unigene database (http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?
db=unigene).
Table 2. Relative expression level (to control) of genes in the
HPL1A cells exposed in replicate to three different concentrations
of TCDD. (a)
Genes
ALDH10 1.56 1.33 1.24 1.42
ALDH3 2.10 2.09 2.34 3.88
ALDH6 2.42 2.00 1.77 3.40
CRABP 0.63 0.69 0.74 0.51
CDKN1A 1.56 1.16 1.49 1.30
CVP1A1 3.07 2.63 1.31 14.45
ELF3 1.56 1.37 1.18 2.19
NCOA2 1.42 1.41 0.82 1.34
RARB 1.64 1.42 0.93 1.77
ZNF42 1.88 1.47 1.11 1.62
ACOX1 1.94 1.50 0.78 1.93
TCDD (b) 0.10 0.10 0.10 1.00
Genes
ALDH10 1.56 1.40 1.69 1.47 1.25
ALDH3 2.94 4.09 3.11 3.91 3.76
ALDH6 4.12 3.37 3.76 4.60 3.66
CRABP 0.48 0.47 0.29 0.46 0.41
CDKN1A 1.34 1.58 1.51 1.49 1.63
CVP1A1 6.85 6.09 15.35 14.91 8.08
ELF3 1.70 1.91 3.15 2.00 2.02
NCOA2 1.07 0.92 1.42 1.22 0.82
RARB 1.56 1.21 1.48 1.63 1.15
ZNF42 1.43 1.32 1.60 1.45 1.22
ACOX1 1.03 0.84 10.87 1.22 0.59
TCDD (b) 1.00 1.00 10.0 10.0 10.0
(a) Data from Martinez et al. (2002). (b) TCDD dose unit is measured
in nanomolars. Actual doses are used for TCDD in the analysis.
Table 3. Type of linkage, mean, SD, and percentage of the posterior
distribution below zero for all gene-gene relationships in Figure 2.
From To Type Mean SD % < 0
TCDD ALDH3 A 0.140 0.037 0.03
ALDH6 A 0.150 0.035 0.01
ALDH10 A 0.041 0.013 0.23
CVP1A1 A 0.269 0.056 0.003
ALDH6 CRABP R 0.348 0.152 1.27
NCOA2 R 0.132 0.062 2.10
RARB A 0.149 0.191 19.81
CVP1A1 CRABP R 0.235 0.099 1.03
NCOA2 R 0.046 0.038 11.40
RARB A 0.072 0.120 26.34
NCOA2 CDKN1A R 0.847 0.298 0.50
ZNF42 A 0.000 0.168 48.53
RARB CRABP A 0.418 0.234 3.13
CDKN1A A 1.186 0.199 0.00
ELF3 A 1.423 0.220 0.00
NCOA2 A 0.912 0.085 0.00
ZNF42 A 0.975 0.113 0.00
ACOX1 NCOA2 A 0.125 0.021 0.00
RARB A 0.081 0.065 9.38
Abbreviations: A, activate, R, repress.
Table 4. Estimated mean and median SD ([sigma]) for
genes included in the gene interaction network
shown in Figure 2.
Posterior distribution for [sigma]
Genes Mean (median) SD
ALDH10 0.22 (0.22) 0.04
ALDH3 0.63 (0.61) 0.12
ALDH6 0.62 (0.61) 0.12
CRABP 0.11 (0.11) 0.02
CDKNIA 0.15 (0.14) 0.03
CYP1A1 0 94 (0.92) 0.17
ELF3 0.25 (0.24) 0.05
NCOA2 0.04 (0.04) 0.01
RARB 0.13 (0.12) 0.03
ZNF42 0.08 (0.08) 0.02
ACOX1 0.86 (0.82) 0.20
Table 5. Mean, median, and SD from two simulation studies of
the simple four-gene model (Figure 1).
Sample
Model size Estimation
[[beta].sub.14] = -2 50 Mean (SD)
[[beta].sub.13] = 0.8 Median (SD)
[[beta].sub.23] = 0.8 25 Mean (SD)
[[beta].sub.34] = -1.3 Median (SD)
[[sigma].sub.i] = 1, i = 1,2,3,4 10 Mean (SD)
Median (SD)
[[beta].sub.14] = -2 50 Mean (SD)
[[beta].sub.13] = 0 Median (SD)
[[beta].sub.23] = 0.8 25 Mean (SD)
[[beta].sub.34] = -1.3 Median (SD)
[[sigma].sub.i] = 1, i = 1,2,3,4 10 Mean (SD)
Median (SD)
Model [[beta].sub.14] [[beta].sub.13]
[[beta].sub.14] = -2 -1.98 (0.22) 0.81 (0.21)
[[beta].sub.13] = 0.8 -1.97 (0.22) 0.81 (0.21)
[[beta].sub.23] = 0.8 -2.00 (0.29) 0.80 (0.25)
[[beta].sub.34] = -1.3 -1.98 (0.29) 0.80 (0.28)
[[sigma].sub.i] = 1, i = 1,2,3,4 -1.97 (0.45) 0.79 (0.40)
-1.95 (0.45) 0.79 (0.40)
[[beta].sub.14] = -2 2.00 (0.17) 0.01 (0.17)
[[beta].sub.13] = 0 2.0 (0.18) 0.01 (0.18)
[[beta].sub.23] = 0.8 2.0 (0.22) 0 01 (0.22)
[[beta].sub.34] = -1.3 2.01 (0.23) 0.00 (0.23)
[[sigma].sub.i] = 1, i = 1,2,3,4 2.02 (0.40) -0.02 (0.40)
1.99 (0.40) -0.02 (0.39)
Model [[beta].sub.23] [[beta].sub.34]
[[beta].sub.14] = -2 0.83 (0.19) 1.32 (0.13)
[[beta].sub.13] = 0.8 0.82 (0.19) 1.33 (0.13)
[[beta].sub.23] = 0.8 0.81 (0.23) -1.29 (0.19)
[[beta].sub.34] = -1.3 0.78 (0.24) -1.30 (0.19)
[[sigma].sub.i] = 1, i = 1,2,3,4 0.80 (0.38) -1.29 (0.29)
0.81 (0.37) -1 31 (0.29)
[[beta].sub.14] = -2 0.80 (0.17) -1.30 (0.12)
[[beta].sub.13] = 0 0.81 (0.18) -1.31 (0.13)
[[beta].sub.23] = 0.8 0.79 (0.21) -1.31 (0.16}
[[beta].sub.34] = -1.3 0.77 (0.21) -1.30 (0.16)
[[sigma].sub.i] = 1, i = 1,2,3,4 0.83 (0.40) -1.30 (0.32)
0.85 (0.39) -1.29 (0.31)
Model [[sigma].sub.1] [[sigma].sub.2]
[[beta].sub.14] = -2 1.01 (0.12) 1.00 (0.12)
[[beta].sub.13] = 0.8 1.01 (0.13) 1.00 (0.12)
[[beta].sub.23] = 0.8 1.05 (0.15) 1.03 (0.15)
[[beta].sub.34] = -1.3 1.02 (0.15) 1.01 (0.15)
[[sigma].sub.i] = 1, i = 1,2,3,4 1.13 (0.26) 1.10 (0.27)
1.08 (0.24) 1.04 (0.26)
[[beta].sub.14] = -2 1.02 (0.11) 1.03 (0.12)
[[beta].sub.13] = 0 1.01 (8.12) 1.03 (0.12)
[[beta].sub.23] = 0.8 1.04 (0.15) 1.05 (0.15)
[[beta].sub.34] = -1.3 1.02 (0.15) 1.03 (0.15)
[[sigma].sub.i] = 1, i = 1,2,3,4 1.14 (0.25) 1.13 (0.27)
1.08 (0.24) 1.06 (0.27)
Model [[sigma].sub.3] [[sigma].sub.4]
[[beta].sub.14] = -2 1.03 (0.12) 1.03 (0.13)
[[beta].sub.13] = 0.8 1.02 (0.13) 1.02 (0.14)
[[beta].sub.23] = 0.8 1.05 (0.16) 1.06 (0.16)
[[beta].sub.34] = -1.3 1.02 (0.16) 1.03 (0.17)
[[sigma].sub.i] = 1, i = 1,2,3,4 1.15 (0.32) 1.19 (0.29)
1.04 (0.31) 1.11 (0.28)
[[beta].sub.14] = -2 1.04 (0.12) 1.01 (0.12)
[[beta].sub.13] = 0 1.04 (0.14) 1.00 (0.13)
[[beta].sub.23] = 0.8 1.06 (0.16) 1.04 (0.17)
[[beta].sub.34] = -1.3 1.03 (0.16) 1.03 (0.17)
[[sigma].sub.i] = 1, i = 1,2,3,4 1.16 (0.30) 1.18 (0.26)
1.10 (0.30) 1.10 (0.25)
|
|
||||||||||||||||||

i·a·bil
ing·ly adv.
is true for sufficiently large
Printer friendly
Cite/link
Email
Feedback
Reader Opinion