Annotation and cross-indexing of array elements on multiple platforms.


On the surface, transcript profiling using microarrays seems to offer a way of looking at the global response of the cell to perturbation, with a focus on changes in gene expression. The difficulty, however, is that the response of a particular gene is actually measured on the array by an element that is a short, defined nucleic acid sequence. Sequences that map back to the same genetic locus may be given different names and descriptions when they are deposited in public sequence databases; when such sequences are used in microarray construction, elements that monitor the same genetic locus may therefore carry different names and descriptions. The algorithm described here uses a hierarchical approach to assign a single best annotation to the elements in a given microarray in such a fashion that elements from one microarray platform may be cross-indexed with those of another. The algorithm relies on the nucleic acid accession number for a given array element and uses it to retrieve annotation from the most recent versions of LocusLink and UniGene. Both database resources are searched, with priority given to annotation derived from the curated LocusLink database. When no annotation is found in these databases, the default GenBank annotation is used. As a final outcome, a cross-chip identifier is generated that may be used to cross-index array elements. The program is available as a Practical Extraction and Report Language (Perl) script that can run under any Perl interpreter.

Key words: annotation, cross-platform, indexing, LocusLink, microarray, UniGene.

Environ Health Perspect 112:506-510 (2004). doi:10.1289/txg.6698 available via http://dx.doi.org/ [Online 15 January 2004]

**********

On the surface, microarrays and other genomic technologies offer the toxicologist a look at the transcript levels for hundreds to thousands of genes. However, although toxicologists and cell biologists think in terms of genes and pathways, these technologies actually measure nucleic acid sequences. Thus, the challenge is to clearly associate a given nucleic acid sequence with the most current and consistent information on the gene of which it is part. This association is complicated by the fact that the same sequence can be submitted to public databases from several sources that may assign it different names and descriptions. For example, the gene N-myc downstream regulated (Ndrg1) (LocusID 10397; http://www.ncbi.nih.gov/LocusLink/) was originally cloned and submitted by three laboratories as different sequences with different names: RTP (accession no. D87953; http://www.ncbi.nih.gov/GenBank), a homocysteine-responsive gene in vascular endothelial cells (Kokame et al. 1996); DRG1 (GenBank accession no. X92845), a gene upregulated during colon epithelial cell differentiation (Van et al. 1997); and CAP43 (GenBank accession no. AF004162), a gene specifically induced by Ni²⁺ compounds (Zhou et al. 1998). All three sequences are identical and represent the same gene. Microarrays are built using individual sequences or clones that are annotated in this fashion, and thus identifying the microarray elements (i.e., spots), on a single array or on different arrays, that represent a certain gene can be a frustrating exercise.

Our approach to annotating microarray elements makes use of two public databases: UniGene (http://www.ncbi.nih.gov/UniGene/; Wheeler et al. 2000) and LocusLink (http://www.ncbi.nih.gov/LocusLink/; Pruitt and Maglott 2001). Whereas UniGene is an experimental system for grouping GenBank sequences (http://www.ncbi.nih.gov/GenBank/) into gene-oriented clusters, LocusLink is a database of curated sequence and descriptive information about genetic loci. Together these resources allow us to map a given microarray element to a certain gene, using UniGene and the GenBank accession number of the element, and to annotate that gene using LocusLink information. Furthermore, the process is automated with a computer script that can be run on a regular basis to make use of current database information. Although our approach appears to be similar to that taken by the DRAGON database (http://pevsnerlab.kennedykrieger.org/dragon.htm; Bouton and Pevsner 2000) and the DAVID software (http://apps1.niaid.nih.gov/david/upload.asp) (Dennis et al. 2003), ours seeks to create a single best annotation for a sequence and, based upon this hierarchical process, to generate a cross-chip ID. Although there are caveats to this approach, the results show that it generally allows for intra- and interplatform identification of microarray elements representing a single gene. This approach has been applied to comparing results generated in the multi-laboratory genomics research program coordinated by the International Life Sciences Institute (ILSI) Health and Environmental Sciences Institute (HESI) Committee on the Application of Genomics to Mechanism-Based Risk Assessment.

Materials and Methods

Algorithm rationale. Most developers of microarrays, either private or commercial (e.g., Affymetrix, Inc., Santa Clara, CA), provide for each array element (i.e., probe) a GenBank accession number indicating the sequence or clone that the element represents or is derived from. However, the descriptive information for such GenBank entries, or for the locus with which they are associated, may change as new information is deposited in the public databases, especially UniGene and LocusLink. Furthermore, UniGene and LocusLink can serve as sequence "Rosetta stones," in that a) UniGene collates accession numbers, b) UniGene integrates with LocusLink, c) LocusLink serves as a curated annotation database with canonical gene names and curated gene information, and d) LocusLink integrates with other information such as OMIM (Online Mendelian Inheritance in Man). To represent the best information for a particular microarray element, a cross-chip ID (XChipID) can be created based upon UniGene and LocusLink information, as described below.

Algorithm and logic flow. The logic flow of annotation is illustrated in Figure 1. Essentially, the program searches the UniGene database for the accession number in question. If the accession number is referenced in UniGene, the next step is to seek information in LocusLink, using the UniGene cluster ID. If the accession number is not referenced in UniGene, then the LocusLink database is checked for the accession number. (Some accession numbers are referenced in LocusLink but not in UniGene.) If the accession number is referenced in neither UniGene nor LocusLink, then the GenBank annotation associated with that accession number is used. An XChipID is then constructed from the best ID available: a LocusID is preferred to a UniGene ID, and if neither is found, the GenBank accession number is used. The prefix of the XChipID indicates the origin of the identifier (LL., LocusLink; Rn., rat UniGene; Ac., GenBank).
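The lookup hierarchy described above can be illustrated with a short Python sketch. It is not the authors' Perl code (ChipXAnno.pl); the function name and the three dictionaries are hypothetical stand-ins for tables that would be loaded from the NCBI files listed in Table 1, and the exact "Ac." formatting of GenBank-derived identifiers is an assumption.

```python
from typing import Dict, Optional, Tuple


def assign_xchip_id(
    accession: str,
    acc_to_unigene: Dict[str, str],
    unigene_to_locus: Dict[str, str],
    acc_to_locus: Dict[str, str],
) -> Tuple[str, str]:
    """Return (XChipID, source) for one array element's GenBank accession.

    Preference order: LocusID > UniGene cluster ID > GenBank accession.
    All three dictionaries are hypothetical stand-ins for NCBI-derived tables.
    """
    unigene_id: Optional[str] = acc_to_unigene.get(accession)
    if unigene_id is not None:
        # Accession is clustered in UniGene: look up LocusLink via the cluster ID.
        locus_id = unigene_to_locus.get(unigene_id)
        if locus_id is not None:
            return f"LL.{locus_id}", "LocusLink"
        # No LocusLink entry for this cluster; UniGene IDs already carry the
        # species prefix (e.g., "Rn." for rat), so the cluster ID is used as-is.
        return unigene_id, "UniGene"
    # Not in UniGene: some accessions are referenced only in LocusLink.
    locus_id = acc_to_locus.get(accession)
    if locus_id is not None:
        return f"LL.{locus_id}", "LocusLink"
    # Found in neither database: fall back to the GenBank accession itself
    # (the "Ac." formatting here is an assumption).
    return f"Ac.{accession}", "GenBank"
```

For example, with the toy tables acc_to_unigene = {"AI172064": "Rn.1"} and unigene_to_locus = {"Rn.1": "56646"} (the cluster ID Rn.1 is made up), the element AI172064 receives LL.56646, matching its XChipID in Table 2.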

[FIGURE 1 OMITTED]

Input files and software programs. The files obtained from the National Center for Biotechnology Information (NCBI) are listed in Table 1, along with the key value and cross-indexed values obtained from each file. Data reported here made use of Rattus norvegicus UniGene Build no. 117 and LocusLink data current to 27 May 2003. Scripts (i.e., program code) were written in Perl, version 5.6.1, a programming language first released in 1987 by Larry Wall as open-source software (http://www.perl.org). Perl scripts are text-based programs run by an interpreter program, which has been developed for almost every operating system (e.g., Mac, PC, UNIX). Six Perl scripts were developed: UgXRef.pl, to extract data from the Rn.data UniGene file; UgDupe.pl, to examine UniGene data for duplicate entries; ChipXAnno.pl, to collate data from the LocusLink, UniGene, and microarray definition files and carry out the annotation; ChipDupe2.pl, to examine the microarray annotation file for multiple entries based on the XChipID; ChipCompare8.pl, to compare two different microarray annotation files for overlapping entries; and XChipData.pl, to merge data sets from two different microarray platforms. The outputs of all programs are simple text files, most of which are tab delimited and can be imported into analysis programs such as Microsoft's Excel and Access (Microsoft Corp., Bellevue, WA) and Spotfire DecisionSite. All these programs have been run in a disk operating system (DOS) command-line window using ActivePerl (binary build 629; http://www.activestate.com); after conversion of the end-of-line sequence, they also run under UNIX. UgXRef.pl processes UniGene files and as such is memory intensive: for large UniGene files (e.g., for mouse and human), it must be run on DOS or UNIX systems with more than 1 GB of RAM. The scripts are small and are available from the web site for the HESI Committee on the Application of Genomics to Mechanism-Based Risk Assessment (http://hesi.ilsi.org/publications/index.cfm?pubentityid=120).
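As a rough illustration of the kind of preprocessing UgXRef.pl performs, the Python sketch below builds accession-to-cluster and cluster-to-description maps (the contents of the two Rn.data-derived files in Table 1) from a UniGene .data file. It assumes the record layout of UniGene .data files (ID, TITLE, and SEQUENCE ACC=... lines, each cluster terminated by //, with space-separated keys); the function name is hypothetical, and the format should be checked against the UniGene build actually used. Both maps are held entirely in memory, which is consistent with the large RAM requirement noted above for mouse and human files.

```python
from typing import Dict, Iterable, Optional, Tuple


def parse_unigene_data(lines: Iterable[str]) -> Tuple[Dict[str, str], Dict[str, str]]:
    """Return (accession -> cluster ID, cluster ID -> title) from UniGene .data lines.

    Assumed layout: "ID", "TITLE", and "SEQUENCE ACC=...;" lines per cluster,
    each cluster record terminated by a line beginning with "//".
    """
    acc_to_unigene: Dict[str, str] = {}
    unigene_to_title: Dict[str, str] = {}
    cluster_id: Optional[str] = None
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line.startswith("//"):
            cluster_id = None  # end of the current cluster record
            continue
        key, _, value = line.partition(" ")
        value = value.strip()
        if key == "ID":
            cluster_id = value
        elif key == "TITLE" and cluster_id is not None:
            unigene_to_title[cluster_id] = value
        elif key == "SEQUENCE" and cluster_id is not None:
            # e.g. "ACC=AI172064.1; NID=...; ..." -> "AI172064"
            for field in value.split(";"):
                field = field.strip()
                if field.startswith("ACC="):
                    accession = field[len("ACC="):].split(".")[0]  # drop version
                    acc_to_unigene[accession] = cluster_id
    return acc_to_unigene, unigene_to_title
```

Because the function accepts any iterable of lines, an open file handle for Rn.data can be passed to it directly.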

Microarray definition files listing each microarray element and its associated accession number and description were obtained from individual vendors through the ILSI consortium.

The Blast2 program (http://www.ncbi.nlm.nih.gov/blast/bl2seq/bl2.html) was used to investigate the similarity and identity of various sequences at the protein level.

Results

Array annotation. The algorithm replaces sequence descriptions, which are frequently minimal, with biologically meaningful annotation. Thus, elements originally annotated as ESTs (expressed sequence tags) are identified as corresponding to Gstm2 and Lgals1 (Table 2). Importantly, in doing so the algorithm identifies multiple elements, including ESTs, that query the same locus. Examples given in Table 2 include cytochrome P450 1b1 (Cyp1b1), phosphodiesterase 4B (Pde4b), Cyp4a10, and endothelin receptor type B (Ednrb). Conversely, the algorithm can highlight elements that are incorrectly annotated. Thus, U39571, an element described as phosphatidylinositol 4-kinase (Pik4ca), was not annotated by the algorithm as Pik4ca. In fact, BLAST analysis (http://www.ncbi.nlm.nih.gov/BLAST/) showed that U39571 does not share significant sequence homology with the other Pik4ca sequences.

Occasionally microarray elements are incorrectly annotated and grouped. Although accession numbers X81395 and U10697 were both annotated as carboxylesterase 1 (Ces1) (presumably because of DNA sequence homology), the amino acid sequences are divergent enough to suggest that these are indeed two different proteins (data not shown). However, as UniGene clusters and LocusLink information are updated, incorrect groupings can be resolved. For example, when UniGene and LocusLink information from September 2001 was used, X14552 (alpha-2μ globulin, type 1) and M83298 (phosphatase 2A 55-kD regulatory subunit alpha) were annotated as caldesmon (LocusID 25687) on the basis of short sequence overlaps. Using February 2002 UniGene and LocusLink data, the sequences identified by these GenBank accession numbers were distinguished from caldesmon (data not shown). As with any system using these resources, the annotation is only as current as the UniGene and LocusLink files used for input.
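Because the annotation depends on the UniGene and LocusLink release used, it can be useful to rerun it and list the elements whose XChipID changed between releases, as in the caldesmon example above. The Python sketch below does this for two element-to-XChipID mappings; the function name is a hypothetical illustration, not one of the published scripts.

```python
from typing import Dict, List, Tuple


def changed_xchip_ids(
    old: Dict[str, str], new: Dict[str, str]
) -> List[Tuple[str, str, str]]:
    """List (element, old XChipID, new XChipID) for elements whose ID changed.

    `old` and `new` map array element IDs to XChipIDs computed from two
    different UniGene/LocusLink releases; elements absent from either run
    are ignored here.
    """
    changes: List[Tuple[str, str, str]] = []
    for element in sorted(old.keys() & new.keys()):
        if old[element] != new[element]:
            changes.append((element, old[element], new[element]))
    return changes
```

Elements that were merged under one XChipID in the older run and split apart in the newer one (or the reverse) show up in this list and are natural candidates for manual review.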

The XChipID represents the best available identifier for a given sequence element and as such offers a means to a) group elements that actually represent the same gene and b) estimate the number of unique genes queried by the microarray. Thus, the 8,740 elements on the Affymetrix RG_U34a array are estimated to query a total of 6,385 unique genes (Table 3). Of course, the actual sequence queried by each element is different; as such, these sequences may have different hybridization characteristics and give rise to quantitatively different signals.
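Grouping elements by XChipID, and counting the distinct XChipIDs to estimate the number of genes queried, amounts to inverting the element-to-XChipID mapping. The Python sketch below (function name hypothetical; not ChipDupe2.pl itself) applies it to the first seven rows of Table 2, keyed by accession number.

```python
from collections import defaultdict
from typing import Dict, List


def group_by_xchip_id(element_to_xchip: Dict[str, str]) -> Dict[str, List[str]]:
    """Invert an element -> XChipID map into XChipID -> list of elements."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for element, xchip_id in element_to_xchip.items():
        groups[xchip_id].append(element)
    return dict(groups)


# First seven rows of Table 2 (accession number -> XChipID).
table2 = {
    "AI172064": "LL.56646",
    "J02810": "LL.24423",
    "X04229": "LL.24423",
    "H32189": "LL.24423",
    "S65355": "LL.50672",
    "X57764": "LL.50672",
    "AA818970": "LL.50672",
}
groups = group_by_xchip_id(table2)
unique_genes = len(groups)  # estimated number of distinct genes queried
redundant = {xid: elems for xid, elems in groups.items() if len(elems) > 1}
```

Here the seven elements collapse to three XChipIDs, two of which (LL.24423 and LL.50672) are each queried by three elements.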

Identification of homologous targets across array platforms. Using the XChipID, one can determine the genes queried in common by two different microarray platforms and compare results at a relatively simple level.

Cross-array comparisons of the Affymetrix RG_U34a, the NIEHS 7K array (National Institute of Environmental Health Sciences, Research Triangle Park, NC), and the Clontech Atlas Tox2 array (Clontech, Palo Alto, CA, USA) reveal overlaps, as well as a substantial number of genes uniquely queried by each array (Figure 2).
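Overlap counts of the kind summarized in Figure 2 can be derived by first collapsing each platform's elements to a set of XChipIDs and then partitioning the XChipIDs by the exact combination of platforms that query them. A minimal Python sketch follows; the function name and the toy identifier sets are hypothetical.

```python
from typing import Dict, FrozenSet, Set


def venn_regions(platforms: Dict[str, Set[str]]) -> Dict[FrozenSet[str], Set[str]]:
    """Map each exact combination of platforms to the XChipIDs found on exactly those."""
    regions: Dict[FrozenSet[str], Set[str]] = {}
    all_ids: Set[str] = set().union(*platforms.values())
    for xchip_id in all_ids:
        members = frozenset(
            name for name, ids in platforms.items() if xchip_id in ids
        )
        regions.setdefault(members, set()).add(xchip_id)
    return regions
```

The size of the region keyed by all platform names is the number of genes queried in common; the single-platform regions hold the uniquely queried genes.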

[FIGURE 2 OMITTED]

In fact, the three arrays query only 209 genes in common, and even the Clontech array queries a significant number of genes not queried by the other two arrays. On a case-by-case basis, the results for a given gene on one platform can be compared with those for the same gene on a different platform using the XChipID (Thompson et al. 2004), taking into account that each platform may query the same gene more than once. It is critical to reiterate, however, that the quality and intensity of the signal from any given microarray element querying a given gene will depend on the sequence of that element, the preparation of the target hybridization material, and technical aspects of the hybridization and signal processing. Furthermore, comparing platforms on the basis of XChipIDs requires that these platforms be annotated from the same input UniGene and LocusLink files. When these files are updated, the annotation process must be repeated for all platforms to be compared. Finally, comparing data from one array platform to another on a whole-array level is not a trivial effort, because the redundancy of genes queried on each platform creates what is called in database terminology a "many-to-many" relationship. XChipData.pl was designed to merge such data, and an example of the output from this program is given in Table 4.
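The many-to-many merge can be sketched as an inner join on XChipID, in which every element of one platform is paired with every element of the other platform that shares its XChipID. The Python version below is an illustrative assumption rather than the published XChipData.pl; its sample rows are taken from Table 4.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# (XChipID, platform element ID, expression ratio)
Row = Tuple[str, str, float]


def merge_platforms(
    a_rows: List[Row], b_rows: List[Row]
) -> List[Tuple[str, str, float, str, float]]:
    """Inner many-to-many join of two platforms' results on XChipID."""
    b_by_id: Dict[str, List[Row]] = defaultdict(list)
    for row in b_rows:
        b_by_id[row[0]].append(row)
    merged = []
    for xchip_id, a_element, a_ratio in a_rows:
        # Every element of platform B sharing this XChipID yields one output row.
        for _, b_element, b_ratio in b_by_id.get(xchip_id, []):
            merged.append((xchip_id, a_element, a_ratio, b_element, b_ratio))
    return merged


affy = [
    ("LL.83783", "L19998_g_at", 1.36),
    ("LL.24791", "rc_AA891204_s_at", 0.74),
    ("LL.24791", "Y13714_at", 0.78),
]
niehs = [("LL.83783", "AA874816", 1.30), ("LL.24791", "AA963036", 0.81)]
merged = merge_platforms(affy, niehs)
```

As in Table 4, the single NIEHS element for LL.24791 is repeated against each Affymetrix element with that XChipID; if a second NIEHS element also mapped to LL.24791, each of those Affymetrix rows would pair with both, so the output grows multiplicatively.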

Discussion

As microarrays are used more and more to investigate questions of biology and toxicology, a key technical issue becomes increasingly problematic: that of associating the signals from each microarray sequence element with the known literature and biological context associated with that sequence. This issue is complicated because element descriptions are current only at the time of array construction and must be updated to reflect evolving information on the gene associated with the element. Such information can include an updated description, a standard gene/locus name (Wain et al. 1999; White et al. 1999), and Gene Ontology information (Ashburner et al. 2000). Several automated annotation systems have been described, including the DRAGON system (Bouton and Pevsner 2000), the DAVID software (Dennis et al. 2003), and the NetAffx resource specifically for Affymetrix arrays (http://www.affymetrix.com; Liu et al. 2003). Information from this latter resource can be automatically retrieved using the ChipInfo software (http://biosun1.harvard.edu/complab/chipinfo/; Zhong et al. 2003). The ChipXAnno script described here differs in that it is designed to create a single best annotation and an XChipID. Although conceptually simple, the XChipID does group elements that, by annotation, should be querying the same gene, and in doing so allows for comparison of data across a microarray, between different versions of a microarray, and between different microarray platforms.
This annotation can be carried out on a regular basis as public database information is updated. In addition, this annotation procedure requires only the GenBank accession number for a microarray element, not the actual sequence, and does not require extensive computer resources. The RESOURCERER database (http://pga.tigr.org/tigr-scripts/nhgi_scripts/resourcerer.pl; Tsai et al. 2001) carries out a similar annotation approach using the TIGR Gene Indices and extends this cross-indexing across species. In contrast to ChipXAnno, RESOURCERER focuses on a number of selected common microarray platforms and is accessible through a web interface.

A limitation of this approach, and of any approach that groups accession numbers on the basis of UniGene clusters, is that any given build of UniGene may incorrectly cluster certain sequences. Sequence homology can cause closely related but nonidentical genes to cluster together and hence be given the same annotation by this approach. Thus, discordant results for microarray elements having the same annotation (i.e., XChipID) are best resolved by a rigorous BLAST comparison of element sequences with each other and with the target gene sequence. Although a BLAST comparison of each microarray element sequence with the entire sequence database is technically daunting, a comparison of such a sequence with a single target sequence is straightforward using the LALIGN program (part of the FASTA package; ftp://ftp.virginia.edu/pub/fasta/) (Chao et al. 1992) and could be automated as a quality control check for the annotation of the entire microarray.
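A quality control check of this kind could be automated along the lines of the Python sketch below. It does not call LALIGN; as a simple stand-in, it computes a plain Smith-Waterman local alignment score with a linear gap penalty. The scoring parameters, the 80% pass threshold, and both function names are arbitrary assumptions chosen for illustration.

```python
def local_alignment_score(
    a: str, b: str, match: int = 2, mismatch: int = -1, gap: int = -2
) -> int:
    """Best Smith-Waterman local alignment score (linear gap penalty)."""
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Scores are floored at zero so an alignment can start anywhere.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


def passes_qc(
    element_seq: str, target_seq: str, min_fraction: float = 0.8, match: int = 2
) -> bool:
    """Hypothetical QC rule: the element must align locally to the target
    with at least `min_fraction` of its maximum attainable score."""
    max_score = match * len(element_seq)
    score = local_alignment_score(element_seq.upper(), target_seq.upper(), match=match)
    return score >= min_fraction * max_score
```

Elements sharing an XChipID that fail such a check against the target gene sequence would be the ones to examine with a full BLAST or LALIGN comparison.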

Another serious limitation in comparing different microarray platforms is encountered if one array uses sequences from several species, for example, a rat cDNA-based microarray that includes mouse sequences. Although these sequences may hybridize with a rat transcript, annotation by this method is not feasible, because each species is clustered separately in UniGene. Such cross-species comparisons are desirable but may be best handled by large public database resources that link individual sequences with genomic information (Mattes et al. 2004).

Although any automated procedure to group and annotate DNA sequences is inherently limited by the absence of human judgment, such an automated approach is simply required to handle the vast amount of information contained within and generated by microarray technology. The approaches described in this article do help reduce the complexity and redundancy of microarray annotation in a straightforward fashion. The files required by this approach are readily available, and the output files generated may be used and manipulated directly with a variety of software packages such as Excel, Access, or Spotfire. Although microarray results are always best considered on a sequence-by-sequence basis, global annotation procedures can offer a way to provide an initial sift and analysis of the data with biological context.

Table 1. Input NCBI files for annotation.

File name          Source             Key value         Indexed values

loc2acc         LocusLink (a)   GenBank accession no.   LocusID
loc2UG          LocusLink       UniGene ID              LocusID
ll.out          LocusLink       LocusID                 Gene symbol,
                                                        LocusLink
                                                        description
Rn.data         UniGene (b)     Used to create
                                Acc2Ug_Rn.prn,
                                Ug2Tit_Rn.prn
Acc2Ug_Rn.prn   Rn.data         GenBank accession no.   UniGene ID
Ug2Tit_Rn.prn   Rn.data         UniGene ID              UniGene
                                                        description

(a) ftp://ftp.ncbi.nlm.nih.gov/refseq/LocusLink/.

(b) ftp://ftp.ncbi.nih.gov/repository/UniGene/.

Table 2. Summary of annotation results for Affymetrix Rat RG_U34a
genome chip. (a)

                                                             Updated
                                                            NCBI-based
                                                            annotation
GenBank (a)
accession      Original Affymetrix description (b)          XChipID (c)
no.

AI172064       EST218059 Rattus norvegicus cDNA,            LL.56646
               end/clone=RMUBU47
J02810         RATGSTYBX Rat prostate glutathione           LL.24423
               transferase mRNA, complete cds
X04229         RNGSTYBR Rat mRNA for glutathione            LL.24423
               S-transferase (GST) Y(b) subunit
H32189         EST107045 Rattus norvegicus cDNA,            LL.24423
               5' end /clone=RPCBK23
S65355         Nonselective-type endothelin receptor        LL.50672
X57764         Rat mRNA for ET-B endothelin receptor        LL.50672
AA818970       UI-R-AO-as-g-05-0-UI.s1 Rattus               LL.50672
               norvegicus cDNA, 3' end
U09540         RNU09540 Rattus norvegicus                   LL.25426
               Sprague-Dawley cytochrome P450
               (CYP1B1) mRNA, complete cds
X83867         CYP1B1 Rattus norvegicus CYP1B1 mRNA         LL.25426
               for cytochrome P450
AI176856       EST220459 Rattus norvegicus cDNA,            LL.25426
               3' end /clone=ROVBX74
M14972         Rat cytochrome P-450-LA-omega                LL.50549
               (lauric acid omega-hydroxylase)
               mRNA, complete cds
AA924267       UI-R-Al-ds-g-03-0-UI.sl Rattus               LL.50549
               norvegicus cDNA, 3' end
D83538         Rat mRNA for 230 kDa                         LL.64161
               Phosphatidylinositol 4-kinase,
               complete cds
U39572         RNU39572 Rattus norvegicus                   LL.64161
               phosphatidylinositol 4-kinase mRNA,
               complete cds
J04563         Rat cAMP phosphodiesterase mRNA,             LL.24626
               3' end
M25350         RATPHOCAMB Rat cAMP                          LL.24626
               phosphodiesterase (PDE4) mRNA,
               partial cds
X81395         Rattus norvegicus mRNA for pI 5.5 esterase   LL.29225
               (ES-3)
U10697         Rattus norvegicus kidney microsomal          LL.29225
               carboxylesterase mRNA

                         Updated NCBI-based annotation

GenBank (a)                      Gene        LocusLink
accession      LocusID (d)    symbols (d)    description (d)
no.

AI172064          56646         Lgals1       Lectin, galactose
                                             3' binding, soluble 1
J02810            24423         Gstm1        Glutathione
                                             S-transferase, mu 1
X04229
H32189
S65355            50672          Ednrb       Endothelin
                                             receptor type B
X57764
AA818970
U09540            25426         Cyp1b1       Cytochrome P450 1b1
X83867
AI176856
M14972            50549         Cyp4a10      Cytochrome P450, 4a10
AA924267
D83538            64161         Pik4ca       Phosphatidylinositol
                                             4-kinase
U39572
J04563            24626          Pde4b       Phosphodiesterase
                                             4B, cAMP-specific
                                             [dunce (Drosophila)-
                                             homolog phospho-
                                             diesterase E4]
M25350
X81395            29225          Ces1        Carboxylesterase 1
U10697

(a) http://www.ncbi.nih.gov/GenBank/. (b) Affymetrix descriptions are
those provided with the original chip definition file (RG_U34.GIN).
(c) Data represent selected output from XChipData.pl. (d) (From
LocusLink (http://www.ncbi.nih.gov

Table 3. Summary of annotation results for
Affymetrix Rat RG_U34a genome chip. (a)

No. of probe sets              8,740

LocusLink annotated            62.6%
UniGene-only annotated         22.4%
Unique                         73.0%
Ambiguous ESTs                 31.2%

(a) Summary output from ChipXAnnol.pl and ChipCompare8.pl.
Control probe sets were not included in the analysis.
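The percentages in Table 3 are fractions of the non-control probe sets. A minimal Python sketch of that tally, assuming each probe set has already been labeled with the database that supplied its best annotation ("LocusLink", "UniGene", or "GenBank"), and assuming control probe sets can be recognized by the Affymetrix "AFFX" prefix. The probe-set IDs in the toy input are hypothetical, and the "Unique" and "Ambiguous ESTs" rows are not reproduced because their exact definitions are not given here.

```python
from collections import Counter


def annotation_summary(sources):
    """Percentage of probe sets whose best annotation came from each source.

    `sources` maps probe-set ID -> "LocusLink", "UniGene", or "GenBank".
    Control probe sets (IDs starting with "AFFX") are skipped, mirroring
    the exclusion of controls from Table 3.
    """
    kept = {ps: src for ps, src in sources.items() if not ps.startswith("AFFX")}
    if not kept:
        return {}
    counts = Counter(kept.values())
    return {src: round(100.0 * n / len(kept), 1) for src, n in counts.items()}


toy = {
    "AFFX-hyp_control": "GenBank",  # control probe set, excluded
    "hyp_ps1_at": "LocusLink",
    "hyp_ps2_at": "LocusLink",
    "hyp_ps3_at": "UniGene",
    "hyp_ps4_at": "GenBank",
}
print(annotation_summary(toy))  # {'LocusLink': 50.0, 'UniGene': 25.0, 'GenBank': 25.0}
```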

Table 4. Comparison of data from two platforms using the XchipID. (a)

                                     Ratio
XChipID      Affymetrix ID       (Affymetrix)   Change

LL.83783     L19998_g_at             1.36         I
LL.24791     rc_AA891204_s_at        0.74         D
LL.24791     rc_AA946313_s_at        0.76         D
LL.24791     U75928UTR#1_s_at        0.63         D
LL.24791     U75929UTR#1_f_at        0.64         D
LL.24791     Y13714_at               0.78         D
LL.171341    J03752_at               1.31         I
LL.299331    rc_AA944397_at          1.77         I
LL.299331    rc_AI176546_at          1.86         I
LL.83687     AF093536_at             0.91         0
LL.24854     M64733mRNA_s_at         2.50         I
LL.113902    L46791_at               1.80         I
LL.113902    X65296cds_s_at          2.48         I
LL.29144     L18889_at               1.27         I
LL.29144     rc_AA893328_at          1.98         I
LL.29144     rc_AI010725_at          1.41         I
LL.64202     D78308_at               1.22         I
LL.64202     D78308_g_at             1.32         I
LL.64202     X53363cds_s_at          2.04         I

             GenBank (b)
              accession        Gene                    Ratio
XChipID          no.        symbol (c)    NIEHS_ID    (NIEHS)

LL.83783     L19998         Sult1a1       AA874816      1.30
LL.24791     AA891204       Sparc         AA963036      0.81
LL.24791     AA946313       Sparc         AA963036      0.81
LL.24791     U75928         Sparc         AA963036      0.81
LL.24791     U75929         Sparc         AA963036      0.81
LL.24791     Y13714         Sparc         AA963036      0.81
LL.171341    J03752         Mgst1         AA818422      1.42
LL.299331    AA944397       Hsp86         AA819777      1.80
LL.299331    AI176546       Hsp86         AA819777      1.80
LL.83687     AF093536       Defb1         AA999116      0.85
LL.24854     M64733         Clu           AA818413      1.54
LL.113902    L46791         Ces3          AA955163      1.42
LL.113902    X65296         Ces3          AA955163      1.42
LL.29144     L18889         Canx          AA858850      1.33
LL.29144     AA893328       Canx          AA858850      1.33
LL.29144     AI010725       Canx          AA858850      1.33
LL.64202     D78308         Calr          AA859488      1.39
LL.64202     D78308         Calr          AA859488      1.39
LL.64202     X53363         Calr          AA859488      1.39

Abbreviations: D, decrease; I, increase.

(a) Data represent selected output from XChipData.pl. Both data sets
were derived from RNA pooled from the kidneys of rats treated for 7 days
with 80 mg/kg/day gentamicin (Kramer et al. 2004).
(b) (http://www.ncbi.nih.gov/GenBank/). (c) From LocusLink
(http://www.ncbi.nih.gov/LocusLink/).
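Table 4 works because both platforms carry the same XChipID, so their rows can be joined on it and the direction of change compared. A minimal Python sketch of that join, using a few rows transcribed from Table 4. The rule that a ratio above 1 is an increase and below 1 a decrease is an assumption made for illustration, not a threshold stated in the source, and the "NC" label is hypothetical.

```python
def direction(ratio: float) -> str:
    """Assumed call: ratio > 1 is an increase (I), < 1 a decrease (D)."""
    if ratio > 1:
        return "I"
    if ratio < 1:
        return "D"
    return "NC"  # hypothetical label for no change


# A few rows transcribed from Table 4.
affy_rows = [
    ("LL.83783", "L19998_g_at", 1.36),
    ("LL.24791", "rc_AA891204_s_at", 0.74),
    ("LL.24791", "Y13714_at", 0.78),
    ("LL.64202", "X53363cds_s_at", 2.04),
]
niehs_by_xchip = {
    "LL.83783": ("AA874816", 1.30),
    "LL.24791": ("AA963036", 0.81),
    "LL.64202": ("AA859488", 1.39),
}


def cross_index(affy, niehs):
    """Join Affymetrix probe sets to NIEHS elements on the shared XChipID."""
    joined = []
    for xchip, probe_set, ratio in affy:
        if xchip not in niehs:
            continue  # no counterpart on the NIEHS array
        niehs_id, niehs_ratio = niehs[xchip]
        agree = direction(ratio) == direction(niehs_ratio)
        joined.append((xchip, probe_set, ratio, niehs_id, niehs_ratio, agree))
    return joined


for row in cross_index(affy_rows, niehs_by_xchip):
    print(row)
```

Because several probe sets can share one XChipID (five for LL.24791), the single NIEHS value is repeated for each of them, which is why 0.81 recurs in Table 4.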


REFERENCES

Ashburner M, Ball CA, Blake JA, Botstein D, Butler H, Cherry JM, et al. 2000. Gene ontology: tool for the unification of biology. The Gene Ontology Consortium. Nat Genet 25:25-29.

Bouton CM, Pevsner J. 2000. DRAGON: Database referencing of array genes online. Bioinformatics 16:1038-1039.

Chao KM, Pearson WR, Miller W. 1992. Aligning two sequences within a specified diagonal band. Comput Appl Biosci 8:481-487.

Dennis G Jr, Sherman BT, Hosack DA, Yang J, Gao W, Lane HC, et al. 2003. DAVID: Database for Annotation, Visualization, and Integrated Discovery. Genome Biol 4:P3.

Kokame K, Kato H, Miyata T. 1996. Homocysteine-respondent genes in vascular endothelial cells identified by differential display analysis. GRP78/BiP and novel genes. J Biol Chem 271:29659-29665.

Kramer JA, Pettit SD, Amin RP, Bertram TA, Car BD, Cunningham M, et al. 2004. Overview of the application of transcription profiling using selected nephrotoxicants for toxicology assessment. Environ Health Perspect 112:495-505.

Liu G, Loraine AE, Shigeta R, Cline M, Cheng J, Valmeekam V, et al. 2003. NetAffx: Affymetrix probesets and annotations. Nucleic Acids Res 31:82-86.

Mattes WB, Pettit SD, Sansone S-A, Bushel PR, Waters M, et al. 2004. Database development in toxicogenomics: issues and efforts. Environ Health Perspect 112:495-505.

Pruitt KD, Maglott DR. 2001. RefSeq and LocusLink: NCBI gene-centered resources. Nucleic Acids Res 29:137-140.

Thompson KL, Afshari CA, Amin R, Bertram TA, Car B, Cunningham M, et al. 2004. Identification of platform-independent gene expression markers of cisplatin nephrotoxicity. Environ Health Perspect 112:488-494.

Tsai J, Sultana R, Lee Y, Pertea G, Karamycheva S, Antonescu V, et al. 2001. Resourcerer: a database for annotating and linking microarray resources within and across species. Genome Biol 2:1-4.

Van BN, Dinjens WN, Diesveld MP, Green NA, Van der Made AC, Nozawa Y, et al. 1997. A novel gene which is up-regulated during colon epithelial cell differentiation and down-regulated in colorectal neoplasms. Lab Invest 77:85-92.

Wain H, White J, Povey S. 1999. The changing challenges of nomenclature. Cytogenet Cell Genet 86:162-164.

Wheeler DL, Chappey C, Lash AE, Leipe DD, Madden TL, Schuler GD, et al. 2000. Database resources of the National Center for Biotechnology Information. Nucleic Acids Res 28:10-14.

White J, Wain H, Bruford E, Povey S. 1999. Promoting a standard nomenclature for genes and proteins. Nature 402:347.

Zhong S, Li C, Wong WH. 2003. ChipInfo: software for extracting gene annotation and gene ontology information for microarray analysis. Nucleic Acids Res 31:3483-3486.

Zhou D, Salnikow K, Costa M. 1998. Cap43, a novel gene specifically induced by Ni2+ compounds. Cancer Res 58:2182-2189.

William B. Mattes

Investigative Toxicology, Pfizer Inc, Kalamazoo, Michigan, USA

This article is part of the mini-monograph "Application of Genomics to Mechanism-Based Risk Assessment."

Address correspondence to W.B. Mattes, Gene Logic Inc., 610 Professional Dr., Gaithersburg, MD 20879 USA. Telephone: (240) 364-6238. Fax: (240) 364-6262. E-mail: wmattes@genelogic.com

The author thanks the many colleagues who offered support and advice. The input of B. Pennie (Pfizer Inc), P. Lord (Johnson & Johnson Pharmaceutical Research Division), R. Paules [National Institute of Environmental Health Sciences (NIEHS)], and D. Robinson (Pfizer Inc) from the International Life Sciences Institute Health and Environmental Sciences Institute Committee on the Application of Genomics to Mechanism-Based Risk Assessment was critical to the initiation and continuation of this effort. J. Fostel (NIEHS), I. Reardon (Pfizer Inc), C. Storer (Pfizer Inc), and M. Lawton (Pfizer Inc) offered especially helpful comments over the course of this project on the algorithm and on Perl programming in general. The author also thanks C. Bradfield (McArdle Laboratory for Cancer Research, University of Wisconsin) for a careful review of this article. Finally, the author is indebted to S. Pettit (ILSI HESI) for her constant support and suggestions.

The author declares he has no competing financial interests.

Received 25 August 2003; accepted 12 January 2004.
COPYRIGHT 2004 National Institute of Environmental Health Sciences
No portion of this article can be reproduced without the express written permission from the copyright holder.
Copyright 2004, Gale Group. All rights reserved. Gale Group is a Thomson Corporation Company.

Article Details
Title Annotation: Genomics and Risk Assessment: Mini-Monograph
Author: Mattes, William B.
Publication: Environmental Health Perspectives
Date: Mar 15, 2004
Words: 4263