Preliminary Analysis of Expressed Sequence Tags for Sugarcane.

SUGARCANE is a perennial monocotyledonous grass belonging to the Saccharum genus. It is a crop of substantial economic importance, providing approximately two thirds of the world's sugar with an estimated annual worth of about $143 billion (Gallo-Meagher and Irvine, 1996). It is not a simple plant on a genetic level, being a very complex polyploid with chromosome numbers ranging between 100 and 130 (Lu et al., 1994). Genetic research into sugarcane has discovered numerous agronomically important phenotypic traits. However, very little information is available about the genes responsible for these traits. Prior to the initiation of this project only 10 sugarcane gene sequences had been identified. Seven had been published (Alix et al., 1998; Albert et al., 1995; Bugos and Thom, 1993a,b; Grof et al., 1995; Henrik et al., 1992; Tang and Sun, 1993) while the remainder had been submitted directly to the GenBank database (Dharmasiri and Harrington, 1997; Sugiharto et al., 1997a,b).

The last decade has seen a rapid proliferation in knowledge about plant and animal genomes through the application of large-scale partial sequencing of anonymous cDNA clones from cDNA libraries and their subsequent identification through homology searches of public databases. This approach, commonly referred to as Expressed Sequence Tag (EST) analysis, has been extensively applied in large-scale cDNA sequencing projects for a variety of both plant and animal species such as humans (Adams et al., 1991, 1992), nematodes (McCombie et al., 1992; Waterston et al., 1992), Arabidopsis [Arabidopsis thaliana (L.) Heynh.] (Newman et al., 1994), and rice (Oryza sativa L.) (Sasaki et al., 1994). These groups have shown that partial cDNA sequences, or ESTs, can be used successfully to identify putative clones for a wide range of gene products. ESTs have been reported both in the literature and public databases for 47 690 rice cDNAs (Uchimaya et al., 1992; Sasaki et al., 1994, dbEST release February 2000), 193 090 Arabidopsis cDNAs (Hofte et al., 1993; Newman et al., 1994, dbEST release February 2000), and 55 466 maize (Zea mays L.) cDNAs (Keith et al., 1993, dbEST release February 2000). However, the availability of plant ESTs in the public databases is substantially less than that available for animal systems. This results in many plant gene identifications being based upon their sequence similarity to animal rather than plant species. There is a need, therefore, to identify and characterize new plant genes in order to increase the availability of plant genes in the international public databases.

Sugarcane biotechnology research world-wide is focused primarily on two main areas, genetic manipulation and identification of markers. One of the problems associated with genetic manipulation of sugarcane is the lack of homologous gene sequences, especially important for antisense work. Similarly, the lack of known sugarcane genes also has implications for molecular marker programs. The most recently published sugarcane maps have been constructed by means of anonymous restriction fragment length polymorphism (RFLP) and random amplified polymorphic DNA (RAPD) probes as well as heterologous probes from species such as maize, oat (Avena sativa L.), and rice (da Silva et al., 1995; Grivet et al., 1996). The identification of sugarcane genes could thus have significant consequences for sugarcane mapping and genetic manipulation and is therefore of great importance.

As a first step to address this issue, we have prepared cDNA libraries from different tissue types in the sugarcane plant. Here we report on the preliminary analysis of 250 anonymous cDNA clones from a library composed of mRNA isolated from the leaf roll (meristematic region) of the commercial sugarcane cultivar NCo376. This work will make a significant contribution towards sugarcane biotechnology.

MATERIALS AND METHODS

Total and poly (A+) RNA Isolation

RNA was extracted from the leaf roll (tissue section comprising apical meristem plus approximately 5 cm of etiolated immature leaf whorl) of mature field-grown sugarcane plants (Saccharum spp. hybrid, cultivar NCo376) by a modified method of Thompson et al. (1993). Approximately 4 g of tissue was used for each extraction. Tissue was ground to a fine powder under liquid nitrogen and transferred to a 50-mL Corning tube on ice. To each sample, 4 mL of RNA extraction buffer [1% (w/v) sodium dodecyl sulphate, 1 mM aurin tricarboxylic acid (ATA), 4% (w/v) p-aminosalicylic acid, 10 mM Tris-HCl pH 7.5, 1 mM ethylenedinitrilotetraacetic acid, and 2% (v/v) 2-mercaptoethanol] and 4 mL phenol:chloroform:isoamyl alcohol (50:49:1) were added. Samples were homogenized with an Ultra-Turrax vertical homogenizer (IKA Works, Inc., Wilmington, NC) for 3 to 4 min and then centrifuged at 4300 g for 20 min at 4 °C. The aqueous layer was removed, added to 2 M LiCl and 1 mM ATA (final concentration), and allowed to precipitate overnight at 4 °C. Samples were then centrifuged at 4300 g for 20 min at 4 °C. The pellet was suspended in 1 mL of 50 µM ATA and transferred to a microcentrifuge tube. Samples were centrifuged at 3000 g for 2 min to remove particulate matter and the supernatant transferred to a fresh tube. RNA was precipitated overnight at 4 °C with 2 M LiCl (final concentration). Samples were then centrifuged at 5000 g for 10 min at 4 °C, the supernatant discarded and the pellet rinsed with ice-cold 70% (v/v) ethanol. The pellet was resuspended in 250 µL of 50 µM ATA. The RNA was precipitated by the addition of 0.5 volumes 7.5 M ammonium acetate and 3 volumes 95% (v/v) ethanol with incubation for at least 2 h at -20 °C. After centrifugation at 5000 g for 30 min at 4 °C, the purified RNA was resuspended in 50 µM ATA.
mRNA was isolated by means of Hybond mAP (messenger affinity paper) (Amersham Pharmacia Biotech, Little Chalfont, UK), according to the manufacturer's instructions.

Construction of a Leaf Roll cDNA Library

cDNA Synthesis

First-strand cDNA synthesis was performed according to a modification of the method described in the Promega Protocols and Applications Guide (1990). Approximately 1 µg of poly(A⁺) RNA was used in a first-strand synthesis reaction catalyzed by the RNase H⁻ M-MuLV (Moloney murine leukemia virus) reverse transcriptase enzyme (Stratagene, La Jolla, CA) with oligo d(T)₁₈ as the primer. Final reaction conditions for first-strand synthesis were as follows: 1 µg mRNA; 0.5 µg/µg mRNA of oligo d(T)₁₈; 50 mM Tris-HCl, pH 8.3; 75 mM KCl; 3 mM MgCl₂; 10 mM DTT; 1 mM each of dATP, dCTP, dGTP, dTTP; 1.6 u/µL ribonuclease inhibitor; 50 u/µg mRNA of RNase H⁻ M-MuLV reverse transcriptase. The reaction was incubated at 37 °C for 1 h. Second-strand synthesis was performed directly following first-strand synthesis and proceeded according to the method described in the Promega Protocols and Applications Guide (1990). Components for the second-strand synthesis reaction were added directly to the same tube following first-strand synthesis. Final reaction conditions for second-strand synthesis were 50 mM Tris-HCl (pH 7.6); 100 mM KCl; 5 mM MgCl₂; 5 mM DTT; 0.1 mM NAD; 10 mM (NH₄)₂SO₄; 8 u/mL RNase H; 230 u/mL DNA polymerase I; 5 u/mL E. coli DNA ligase; 50 µg/mL BSA; 0.2 mM each of dATP, dCTP, dGTP, dTTP from first-strand reaction. The reaction was incubated at 14 °C for 2 h. After heat inactivation (70 °C, 10 min), second-strand synthesis was completed by the addition of T4 DNA polymerase (2 u/µg mRNA) and incubated for 10 min at 37 °C. The ds cDNA product was phenol:chloroform extracted and purified through a QIAquick Spin column (Qiagen GmbH, Hilden, Germany) according to the manufacturer's instructions. cDNA was ethanol precipitated prior to ligation to amplification adaptors.

Ligation to Amplification Adaptors

cDNA was blunt-end ligated to an annealed amplification adaptor set (Jepson et al., 1991). This adaptor set consisted of the following two oligonucleotides:

Oligonucleotide 1 (29-mer):

5'-ATGCTTAGGAATTCCGATTTAGCCTCATA-3'

Oligonucleotide 2 (12-mer):

5'-TATGAGGCTAAA-3'.

Ligation was allowed to proceed overnight at 14 °C. After ligation, cDNA was size fractionated through a Quick-Spin, Linkers 6 column (Roche Molecular Biochemicals, Indianapolis, IN).
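The relationship between the two adaptor oligonucleotides can be checked directly from the sequences given above. The following is a minimal sketch, not part of the original protocol: the helper names (reverse_complement, adaptor_summary) are hypothetical, and the only inputs are the two sequences as printed. It tests whether the 12-mer is complementary to the 3' end of the 29-mer and locates the EcoRI recognition site (GAATTC) that is later used for cloning.

```python
# Adaptor oligonucleotides as printed in the text (5' -> 3').
OLIGO_1 = "ATGCTTAGGAATTCCGATTTAGCCTCATA"  # 29-mer, also used as the PCR primer
OLIGO_2 = "TATGAGGCTAAA"                   # 12-mer

ECORI_SITE = "GAATTC"  # EcoRI recognition sequence

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.translate(COMPLEMENT)[::-1]


def adaptor_summary(long_oligo: str, short_oligo: str, site: str = ECORI_SITE) -> dict:
    """Summarize how a short oligo pairs with a long oligo (hypothetical helper)."""
    rc_short = reverse_complement(short_oligo)
    return {
        "long_length": len(long_oligo),
        "short_length": len(short_oligo),
        # True if the short oligo base-pairs with the 3' end of the long oligo.
        "anneals_at_3prime_end": long_oligo.endswith(rc_short),
        # Bases of the long oligo left single-stranded after annealing.
        "single_stranded_length": len(long_oligo) - len(short_oligo),
        # 0-based index of the restriction site in the long oligo (-1 if absent).
        "site_index": long_oligo.find(site),
    }


summary = adaptor_summary(OLIGO_1, OLIGO_2)
for key, value in summary.items():
    print(f"{key}: {value}")
```

With the sequences as printed, the 12-mer pairs with the last 12 bases of the 29-mer, leaving 17 bases single-stranded, and the EcoRI site lies within that single-stranded region.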

PCR Amplification of cDNA

Ligated, size-fractionated cDNA was PCR amplified with Oligonucleotide 1 as the primer. The final reaction conditions were as follows: 1x Taq DNA polymerase buffer [50 mM KCl, 10 mM Tris-HCl (pH 9.0), 0.1% (v/v) Triton X-100]; 600 ng Oligonucleotide 1; 1.25 mM each deoxynucleotide triphosphate (dNTP); 3.5 mM MgCl₂; 1 unit Taq DNA polymerase; 1 µL ds cDNA template. PCR amplification was performed in a Hybaid OmniGene Thermal Cycler (OmniGene Bioproducts, Inc., Cambridge, MA) under the following conditions: 1 cycle at 73 °C for 1 min, followed by 35 cycles of 94 °C, 0.8 min; 68 °C, 1.1 min; 73 °C, 3.0 min. An aliquot of each amplified cDNA sample was analyzed on a 1.5% (w/v) agarose gel to confirm that amplification was successful. The remainder was used for cloning.
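As a rough planning aid, the programmed hold time of this cycling profile can be added up. The sketch below is illustrative only: the variable and function names are hypothetical, the denaturation time of 0.8 min is an assumed reading of the stated value, and ramp times between temperatures are ignored because they are not given in the text.

```python
# Cycling profile as described in the text; times in minutes.
INITIAL_STEP_MIN = 1.0   # 1 cycle at 73 C
CYCLES = 35
STEP_MINUTES = {
    "94 C": 0.8,  # assumed reading of the denaturation time
    "68 C": 1.1,
    "73 C": 3.0,
}


def total_hold_time(initial_min: float, cycles: int, steps: dict) -> float:
    """Sum programmed hold times only; ramping between steps is not included."""
    return initial_min + cycles * sum(steps.values())


total_min = total_hold_time(INITIAL_STEP_MIN, CYCLES, STEP_MINUTES)
print(f"Programmed hold time: {total_min:.1f} min ({total_min / 60:.2f} h)")
```

Under these assumptions the program holds for a little under three hours, so the actual run time on the instrument would be somewhat longer once ramping is included.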

Library Construction

All individual PCR-amplified cDNA samples were pooled and ethanol precipitated. cDNA was digested with 30 units EcoRI for 2.5 h and approximately 150 to 200 ng removed for cloning. cDNA was cloned into the EcoRI site of the Lambda ZAP II cloning vector and packaged according to the manufacturer's instructions (Stratagene, La Jolla, CA).

Template Preparation

Aliquots of the constructed leaf roll library were plated out onto solid NZY medium and single plaques randomly picked and stored in SM buffer [100 mM NaCl, 8 mM MgSO₄·7H₂O, 20 mM Tris-HCl pH 7.5, 0.01% (w/v) gelatin] at 4 °C. The insert sizes of individual recombinant phages were examined by specific PCR amplification by means of the M13 reverse and T7 primers followed by 1.5% (w/v) agarose gel electrophoresis. Templates for the ESTs from the leaf roll library were prepared in two ways. Phagemids [pBluescript SK(-)] plus inserts were excised from individual phages with the ExAssist helper phage system according to the manufacturer's instructions (Stratagene). Individual phagemid clones were plated out onto solid Luria-Bertani (LB) medium containing 50 µg/mL ampicillin. For phagemid DNA isolation, a single colony of each clone was removed and inoculated into a 10-mL overnight culture of LB broth containing 50 µg/mL ampicillin. Phagemid DNA was isolated from a 5-mL aliquot of the overnight culture using a Rapid Plasmid Isolation Protocol (Holmes and Quigly, 1981) and purified through QIAquick spin columns (Qiagen). Templates for DNA sequencing were also prepared by specific PCR amplification of cDNA inserts directly from individual phage suspensions in SM buffer by means of the M13 reverse and the T7 primers. Amplified inserts were purified with QIAquick spin columns (Qiagen) prior to sequencing.

Sequencing

Both phagemid and amplified insert cDNA were sequenced by dye terminator cycle sequencing by means of either the Taq DyeDeoxy Terminator Cycle Sequencing kit (PE Applied Biosystems, Foster City, CA), followed by purification through Centri-Sep Spin columns (Princeton Separations, Adelphia, NJ), or the AmpliTaq DNA polymerase, FS ready reaction kit (PE Applied Biosystems). In both cases, all procedures were performed according to the manufacturer's instructions. The M13 Reverse (5') primer was used to generate single-pass partial sequences for all isolated cDNAs. Cycle sequencing was performed in a Hybaid OmniGene Thermal Cycler and sequence analysis was performed with an ABI Prism 310 Genetic Analyzer (PE Applied Biosystems).

Sequence Data Analysis

Sequences were edited manually to remove vector and ambiguous sequences. The EST sequences were compared with the nonredundant protein databases by using the BLASTX (Altschul et al., 1990) e-mail server provided by NCBI (blast@ncbi.nlm.nih.gov). Database proteins giving a Point Accepted Mutation (PAM) 120 similarity score of over 80 were considered homologous to the clones (Altschul et al., 1990), while those with scores below 80 were regarded as showing sequence similarity only. Each EST was assigned the identity of the candidate protein showing the highest score.
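The decision rule described above can be written out as a short function. This is a minimal sketch under stated assumptions, not the procedure actually used (the searches were run through the NCBI e-mail server and interpreted manually): the Hit class and the classify function are hypothetical names, and a score of exactly 80, which the text does not address, is treated here as "similar".

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

HOMOLOGY_THRESHOLD = 80  # PAM120 similarity score cutoff stated in the text


@dataclass
class Hit:
    """One candidate protein returned for an EST (hypothetical container)."""
    description: str
    score: int


def classify(hits: List[Hit]) -> Tuple[str, Optional[Hit]]:
    """Pick the highest-scoring candidate and label the EST.

    Returns ("homologous", hit) for scores above the threshold,
    ("similar", hit) otherwise, and ("no match", None) when there are no hits.
    Treating a score of exactly 80 as "similar" is an assumption.
    """
    if not hits:
        return ("no match", None)
    top = max(hits, key=lambda h: h.score)
    if top.score > HOMOLOGY_THRESHOLD:
        return ("homologous", top)
    return ("similar", top)


# Scores taken from Table 2 for illustration.
examples = {
    "AA080586": [Hit("enolase [P42895]", 511)],
    "AA080670": [Hit("protein kinase [L27821]", 64)],
    "unmatched": [],
}
for est, hits in examples.items():
    label, top = classify(hits)
    print(est, label, top.description if top else "-")
```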

RESULTS

Characteristics of the Constructed LR cDNA Library

The titer of the constructed LR cDNA library was 2.96 × 10⁵ pfu/mL (unamplified). This titer is comparatively low, relative to the complexity of the polyploid sugarcane genome, but was considered to be sufficiently representative for preliminary analysis of the expressed genes present in sugarcane leaf roll. The titer of the amplified library was 4.2 × 10⁹ pfu/mL. Blue/white plaque selection following incubation of an aliquot of the library in the presence of X-gal (5-bromo-4-chloro-3-indolyl-β-D-galactopyranoside) and IPTG (isopropyl-β-D-thiogalactoside) revealed 95% recombinant plaques. The quality of the library was assessed by examining the insert sizes of 468 randomly selected recombinant plaques by specific PCR amplification with the T7 and M13 reverse primers. Of the 468 selected plaques, 0.09% were found to have no inserts. Insert sizes were found to range between 400 and 2500 bp with an average insert size of 600 bp. Sequence analysis of 250 randomly selected clones from the library indicated an absence of contaminating rRNA sequences in the library. In addition, both full-length and near full-length sequences were detected, indicating that the leaf roll cDNA library was suitable for the generation of expressed sequence tags.

Generation of Expressed Sequence Tags

For generation of the ESTs, only clones with an insert larger than 400 bp were selected for sequencing. Altogether 250 clones were subjected to single-run partial sequencing, 60 of these using plasmid DNA as sequencing template, and the remaining 190 using DNA obtained by specific PCR amplification of insert DNA from recombinant phages using the T7 and M13 reverse primers. The amount of template DNA used per sequencing reaction differed depending on the source. For plasmid-derived DNA, 1 µg of template was used and for PCR-amplified DNA, 100 to 200 ng was required. For all sequencing reactions, only the M13 reverse primer (5') was used. As the cDNA library was not a directional library, the orientation of the cDNA inserts was random. This meant that it was not known from which end (5' or 3') the clones had been sequenced. To identify individual clones, each of the edited sequences was translated into all six translational reading frames and compared to the nonredundant protein sequence databases in GenBank. Deduced amino acid sequence homology between a sugarcane EST and a known sequence was deemed significant if the BLASTX PAM 120 similarity score was greater than 80 (Altschul et al., 1990). All sugarcane ESTs have been deposited in the GenBank database for ESTs, dbEST.
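Because insert orientation was unknown, each sequence had to be read in three frames on each strand. BLASTX performs this step internally; the sketch below only illustrates the idea of a six-frame translation with the standard genetic code. The function names are hypothetical, the short test sequence is invented for illustration, and any codon containing an ambiguous base is reported as X.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G at each position.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement; characters other than A, C, G, T pass through."""
    return seq.translate(COMPLEMENT)[::-1]


def translate(seq: str) -> str:
    """Translate complete codons from the start of seq; '*' marks a stop."""
    codons = (seq[i:i + 3] for i in range(0, len(seq) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


def six_frame_translation(seq: str) -> dict:
    """Return translations for frames +1..+3 and -1..-3."""
    seq = seq.upper()
    rc = reverse_complement(seq)
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(seq[offset:])
        frames[f"-{offset + 1}"] = translate(rc[offset:])
    return frames


# Invented 9-base example, not a sugarcane sequence.
for frame, peptide in six_frame_translation("ATGGCCTAA").items():
    print(frame, peptide)
```

Each of the six peptide strings would then be compared with the protein database, and the frame giving the best-scoring match indicates both the likely reading frame and the orientation of the insert.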

Sequencing Template

A small investigation was conducted to determine whether variation occurred in the amino acid sequence homology results when different forms of template DNA were used for sequence analysis. Conventionally, high quality plasmid DNA is the preferred form of template for sequencing reactions. However, the in vivo excision of phagemids from recombinant cDNA clones housed in a Lambda ZAP II vector and the subsequent isolation of phagemid DNA is a time-consuming process which can negatively affect large-scale sequencing efforts. It has been recognized that while direct sequencing of recombinant clones without isolation of plasmid DNA is a favorable alternative, results are often inconsistent. This is because the amount and quality of template DNA generated during PCR amplification of inserts may vary, which in turn can lead to unreliable results. In this study, a comparison was performed between sequencing results obtained using template DNA derived either from recombinant plasmids or PCR-amplified cDNA inserts from recombinant phages. Four different clones were selected arbitrarily. All sequencing reactions and sequence analysis were performed at the same time to minimize experimental error. It is evident that the length of the analyzed sequences is similar, regardless of template source (Table 1). After editing of sequences to remove the vector component, a final analyzed sequence length of approximately 400 bp was obtained for both plasmid and PCR-amplified insert DNA.

Table 1. Comparison between the length of analyzed DNA sequence and BLASTX PAM120 homology score using two different sources of template DNA.

                                 Length of
                                 analyzed    PAM120
                                 sequence    homology
Clone   DNA template             (bases)     score      Putative identification

B63     PCR-amplified fragment     483         409      SuSy
pB63    plasmid                    547         486      SuSy
B81     PCR-amplified fragment     541         136      glutathione S-transferase
pB81    plasmid                    410         141      glutathione S-transferase
A73     PCR-amplified fragment     420         160      small nuclear ribonucleoprotein E homolog C29
pA73    plasmid                    403          83      small nuclear ribonucleoprotein E homolog C29
B21     PCR-amplified fragment     457          --      none
pB21    plasmid                    459          --      none


Identification of Genes

Analysis of 250 randomly selected clones revealed that 38% were homologous to peptide sequences present in the NCBI nonredundant protein databases (Tables 2 and 3). Of the remaining 62% of the ESTs, 49% did not appear to exhibit sequence similarity to any sequence on the databases according to the search criteria used, and thus were interpreted as possibly representing new genes not only in sugarcane but also in all organisms. The other 13% did not show significant homology to previously identified genes in the databases (i.e., similarity scores below 80) and thus were putatively identified on the basis of sequence similarity only. Of the 250 clones analyzed, 25% showed significant deduced amino acid sequence homology to previously identified plant genes (Table 2). Ten clones, although similar to plant genes, did not have PAM 120 scores above 80 and thus could not be considered as homologous. As only 10 previously identified sugarcane genes were registered with GenBank at the time of the database searches (commencing in 1996), all putative clone identities to plant genes came from plants other than sugarcane. Of the 62 identified homologous clones, 31% showed homology to monocotyledonous plant species such as rice, maize, and wheat (Triticum spp.). As expected, these proteins gave high similarity scores. One hundred thirty-seven ESTs (54%) showed sequence similarity to previously identified genes from species other than higher plants, and 20% of these were considered homologous (Table 3). The targeted species were widely distributed from bacteria to human.

Table 2. Sugarcane ESTs with sequence homology or similarity to known plant genes. The EST no. is the accession number assigned by dbEST. The numbers in the columns designated ID, Similar, and Overlap refer to the number of identical (ID) or similar (Similar) amino acids in a region of a particular length (Overlap). The column designated Organism refers to the source of the protein that exhibits homology or similarity to the sugarcane EST.
EST no.    Putative Identification and dbEST Accession   ID

AA080648   60S ribosomal protein L5 [P42796]             69
AA080649   60S ribosomal protein L5 [P46287]             43
AA080650   calcium-dependent protein kinase              33
             [P28583]
AA080655   vacuolar H⁺-ATPase subunit B                  55
             [U07052]
AA080670   protein kinase [L27821]                       10
AA080674   3-oxoacyl-[acyl-carrier protein] reductase    46
             [S22417]
AA080657   unknown [687677]                              26
AA080659   athila ORF1 [AC007505]                        18
AA080580   sucrose synthase [S22537]                     32
AA080581   receptor-like protein kinase [Z17991]         19
AA080582   ADP-ribosylation factor [S49325]              67
AA080583   H2B histone [577825]                          17
AA080585   translation elongation factor eEF-1 beta-A1   28
             chain [S37103]
AA080586   enolase [P42895]                              97
AA080589   casein kinase II, alpha chain [P28523]        74
AA080590   acyl-CoA-binding protein [U35015]             50
AA080599   protein phosphatase 2C [S55457]               51
AA080605   60S ribosomal protein L32 [Z17739]            40
AA080606   small nuclear ribonucleoprotein E             14
             homolog C29 [P24715]
AA080610   sucrose synthase [X81974]                     40
AA080615   pyruvate kinase, plastid [S44287]             27
AA080634   sucrose synthase [JT0280]                     55
AA080636   GTP-binding protein [D12542]                  69
AA080640   mitochondrial processing peptidase            45
             [X80236]
AA080642   stage III sporulation protein [S39321]        57
AA080646   glutathione S-transferase [P46422]            31
AA080668   proteasome C2 subunit [D37886]                73
AA269154   pectin methylesterase [Y08155]                49
AA269161   auxin response factor 1 [U83245]              18
AA269164   hypothetical protein (beta-1,3-glucanase)     29
             [S31196]
AA269165   vacuolar processing enzyme precursor          35
             [P49045]
AA269289   alcohol dehydrogenase [L08591]                72
AA269290   disease resistance protein RPM1 [X87851]      18
AA269291   chloroplast 30S ribosomal protein S7          32
             [P46292]
AI216928   hypothetical polyprotein [S57908]              6
AI216930   ER lumen protein retaining Receptor           15
             [P35402]
AI216931   unknown protein [AC004138]                    13
AA269292   cathepsin B [X66012]                          14
AA269294   sucrose synthase [JT0280]                     32
AA525640   actin depolymerizing factor [X97726]          70
AA525645   5' end not determined experimentally          21
             [U68408]
AA525649   glutathione S-transferase [P42761]            39
AA525651   3-oxoacyl-[acyl-carrier protein] reductase    84
             precursor [P28643]
AA525652   bicolor membrane intrinsic (Mip 1) protein    36
             [U87981]
AA525655   pyrophosphate-fructose 6-phosphate            17
             1-phosphotransferase (PFP) beta subunit
             [P21343]
AA525658   UDP-glucose dehydrogenase [U53418]            72
AA525660   RNA helicase isolog [AC002337]                15
AA525661   2-oxoglutarate/malate translocator            67
             [D45075]
AA525664   acetyl-CoA carboxylase [U10187]                7
AA525666   translation initiation factor 5A [Y07920]     89
AA525669   Clp protease [AF032123]                       15
AA525677   ras-related protein RIC1 (GTP binding         37
             protein) [S66160]
AA525679   aspartic proteinase precursor [P42211]        75
AA525680   similar to hypothetical protein from A.       10
             thaliana [AC002986]
AA525686   germin like protein [U75205]                  19
AA525688   aspartic proteinase precursor [P42211]        58
AA525692   unknown protein [AC004122]                    27
AA525697   pectin methylesterase [Y08155]                34
AA577634   farnesyl-diphosphate farnesyltransferase      30
             (squalene synthase) [JC5031]
AA577635   cellulase homolog OR16pep [S71215]            15
AA577636   contains similarity to S. cerevisiae          18
             hypothetical protein YOR197w [3152597]
AA577639   lysine-ketoglutarate reductase/saccharopine   36
             dehydrogenase bifunctional
             enzyme [AF003551]
AA577641   protein kinase isolog [U90439]                32
AA577644   Bowman-Birk protease inhibitor                12
             [2123385A]
AA577653   triose-phosphate isomerase, cytosolic         60
             [P12863]
AA577658   GDP-associated inhibitor [Y07961]             71
AA577659   chloroplast 50S ribosomal protein L32         10
             [P12197]
AA577663   voltage-dependent anion-selective channel     45
             protein (VDAC) [34 kDa outer
             mitochondrial membrane protein,
             porin] [P42055]
AA577664   hypothetical protein [Z97339]                 31
AA577666   nucleolar histone deacetylase HD2             22
             [U82815]
AA577669   unknown protein [U93215]                      20
AA525685   5-methyltetrahydropteroyl-triglutamate-       19
             homocysteine S-methyltransferase
             [S57636]
AI216932   cellulose synthase [U58284]                   52
AA577633   ATP synthase 6 kD subunit, mitochondrial      17
             [P80497]

Putative Identification and dbEST Accession   Similar

60S ribosomal protein L5 [P42796]                73
60S ribosomal protein L5 [P46287]                43
calcium-dependent protein kinase                 42
  [P28583]
vacuolar H⁺-ATPase subunit B                     56
  [U07052]
protein kinase [L27821]                          19
3-oxoacyl-[acyl-carrier protein] reductase       53
  [S22417]
unknown [687677]                                 35
athila ORF1 [AC007505]                           25
sucrose synthase [S22537]                        36
receptor-like protein kinase [Z17991]            30
ADP-ribosylation factor [S49325]                 68
H2B histone [577825]                             31
translation elongation factor eEF-1 beta-A1      33
  chain [S37103]
enolase [P42895]                                101
casein kinase II, alpha chain [P28523]           75
acyl-CoA-binding protein [U35015]                55
protein phosphatase 2C [S55457]                  60
60S ribosomal protein L32 [Z17739]               43
small nuclear ribonucleoprotein E                17
  homolog C29 [P24715]
sucrose synthase [X81974]                        44
pyruvate kinase, plastid [S44287]                27
sucrose synthase [JT0280]                        61
GTP-binding protein [D12542]                     77
mitochondrial processing peptidase               59
  [X80236]
stage III sporulation protein [S39321]           70
glutathione S-transferase [P46422]               43
proteasome C2 subunit [D37886]                   77
pectin methylesterase [Y08155]                   63
auxin response factor 1 [U83245]                 22
hypothetical protein (beta-1,3-glucanase)        42
  [S31196]
vacuolar processing enzyme precursor             44
  [P49045]
alcohol dehydrogenase [L08591]                   75
disease resistance protein RPM1 [X87851]         26
chloroplast 30S ribosomal protein S7             38
  [P46292]
hypothetical polyprotein [S57908]                12
ER lumen protein retaining Receptor              17
  [P35402]
unknown protein [AC004138]                       21
cathepsin B [X66012]                             17
sucrose synthase [JT0280]                        39
actin depolymerizing factor [X97726]             73
5' end not determined experimentally             27
  [U68408]
glutathione S-transferase [P42761]               54
3-oxoacyl-[acyl-carrier protein] reductase       93
  precursor [P28643]
bicolor membrane intrinsic (Mip 1) protein       37
  [U87981]
pyrophosphate-fructose 6-phosphate               17
  1-phosphotransferase (PFP) beta subunit
  [P21343]
UDP-glucose dehydrogenase [U53418]               79
RNA helicase isolog [AC002337]                   18
2-oxoglutarate/malate translocator               72
  [D45075]
acetyl-CoA carboxylase [U10187]                  11
translation initiation factor 5A [Y07920]        90
Clp protease [AF032123]                          19
ras-related protein RIC1 (GTP binding            38
  protein) [S66160]
aspartic proteinase precursor [P42211]           79
similar to hypothetical protein from A.          17
  thaliana [AC002986]
germin like protein [U75205]                     22
aspartic proteinase precursor [P42211]           64
unknown protein [AC004122]                       39
pectin methylesterase [Y08155]                   38
farnesyl-diphosphate farnesyltransferase         48
  (squalene synthase) [JC5031]
cellulase homolog OR16pep [S71215]               19
contains similarity to S. cerevisiae             24
  hypothetical protein YOR197w [3152597]
lysine-ketoglutarate reductase/saccharopine      41
  dehydrogenase bifunctional
  enzyme [AF003551]
protein kinase isolog [U90439]                   40
Bowman-Birk protease inhibitor                   14
  [2123385A]
triose-phosphate isomerase, cytosolic            60
  [P12863]
GDP-associated inhibitor [Y07961]                81
chloroplast 50S ribosomal protein L32            16
  [P12197]
voltage-dependent anion-selective channel        59
  protein (VDAC) [34 kDa outer
  mitochondrial membrane protein,
  porin] [P42055]
hypothetical protein [Z97339]                    35
nucleolar histone deacetylase HD2                23
  [U82815]
unknown protein [U93215]                         27
5-methyltetrahydropteroyl-triglutamate-          20
  homocysteine S-methyltransferase
  [S57636]
cellulose synthase [U58284]                      60
ATP synthase 6 kD subunit, mitochondrial         17
  [P80497]

Putative Identification and dbEST Accession   Overlap   Score

60S ribosomal protein L5 [P42796]                84      329
60S ribosomal protein L5 [P46287]                44      202
calcium-dependent protein kinase                 49      200
  [P28583]
vacuolar H⁺-ATPase subunit B                     60      280
  [U07052]
protein kinase [L27821]                          24       64
3-oxoacyl-[acyl-carrier protein] reductase       61      238
  [S22417]
unknown [687677]                                 61      105
athila ORF1 [AC007505]                           48       76
sucrose synthase [S22537]                        52      171
receptor-like protein kinase [Z17991]            48       90
ADP-ribosylation factor [S49325]                 69      360
H2B histone [577825]                             62       75
translation elongation factor eEF-1 beta-A1      36      151
  chain [S37103]
enolase [P42895]                                105      511
casein kinase II, alpha chain [P28523]           78      404
acyl-CoA-binding protein [U35015]                68      260
protein phosphatase 2C [S55457]                  87      228
60S ribosomal protein L32 [Z17739]               54      205
small nuclear ribonucleoprotein E                19       72
  homolog C29 [P24715]
sucrose synthase [X81974]                        52      202
pyruvate kinase, plastid [S44287]                43      126
sucrose synthase [JT0280]                        68      301
GTP-binding protein [D12542]                     82      356
mitochondrial processing peptidase               72      237
  [X80236]
stage III sporulation protein [S39321]           82      314
glutathione S-transferase [P46422]               82      138
proteasome C2 subunit [D37886]                   78      400
pectin methylesterase [Y08155]                   90      261
auxin response factor 1 [U83245]                 29       90
hypothetical protein (beta-1,3-glucanase)        61      161
  [S31196]
vacuolar processing enzyme precursor             63      189
  [P49045]
alcohol dehydrogenase [L08591]                   94      384
disease resistance protein RPM1 [X87851]         49       86
chloroplast 30S ribosomal protein S7             44      159
  [P46292]
hypothetical polyprotein [S57908]                16       42
ER lumen protein retaining receptor              25       77
  [P35402]
unknown protein [AC004138]                       24       89
cathepsin B [X66012]                             25       79
sucrose synthase [JT0280]                        44      189
actin depolymerizing factor [X97726]             75      371
5' end not determined experimentally             37      119
  [U68408]
glutathione S-transferase [P42761]               77      199
3-oxoacyl-[acyl-carrier protein] reductase       97      441
  precursor [P28643]
bicolor membrane intrinsic (Mip 1) protein       41      203
  [U87981]
pyrophosphate-fructose 6-phosphate               21       89
  1-phosphotransferase (PFP) beta subunit
  [P21343]
UDP-glucose dehydrogenase [U53418]               88      408
RNA helicase isolog [AC002337]                   21       86
2-oxoglutarate/malate translocator               89      374
  [D45075]
acetyl-CoA carboxylase [U10187]                  18       45
translation initiation factor 5A [Y07920]        92      480
Clp protease [AF032123]                          27       95
ras-related protein RIC1 (GTP binding            41      179
  protein) [S66160]
aspartic proteinase precursor [P42211]           89      390
similar to hypothetical protein from A.          23       62
  thaliana [AC002986]
germin like protein [U75205]                     32       90
aspartic proteinase precursor [P42211]           68      327
unknown protein [AC004122]                       55      143
pectin methylesterase [Y08155]                   59      178
farnesyl-diphosphate farnesyltransferase         70      160
  (squalene synthase) [JC5031]
cellulase homolog OR16pep [S71215]               23       86
contains similarity to S. cerevisiae             29      105
  hypothetical protein YOR197w [3152597]
lysine-ketoglutarate reductase/saccharopine      42      181
  dehydrogenase bifunctional
  enzyme [AF003551]
protein kinase isolog [U90439]                   50      177
Bowman-Birk protease inhibitor                   26       71
  [2123385A]
triose-phosphate isomerase, cytosolic            63      302
  [P12863]
GDP-associated inhibitor [Y07961]                94      376
chloroplast 50S ribosomal protein L32            26       44
  [P12197]
voltage-dependent anion-selective channel        77      233
  protein (VDAC) [34 kDa outer
  mitochondrial membrane protein,
  porin] [P42055]
hypothetical protein [Z97339]                    39      175
nucleolar histone deacetylase HD2                24      111
  [U82815]
unknown protein [U93215]                         38      100
5-methyltetrahydropteroyl-triglutamate-          24      100
  homocysteine S-methyltransferase
  [S57636]
cellulose synthase [U58284]                      64      303
ATP synthase 6 kD subunit, mitochondrial         21      103
  [P80497]

Putative Identification and dbEST Accession   Organism

60S ribosomal protein L5 [P42796]             Arabidopsis thaliana
60S ribosomal protein L5 [P46287]             Medicago sativa
calcium-dependent protein kinase              Glycine max
  [P28583]
vacuolar H+-ATPase subunit B                  Gossypium hirsutum
  [U07052]
protein kinase [L27821]                       Oryza sativa
3-oxoacyl-[acyl-carrier protein] reductase    Brassica napus
  [S22417]
unknown [687677]                              Arabidopsis thaliana
athila ORF1 [AC007505]                        Arabidopsis thaliana
sucrose synthase [S22537]                     Oryza sativa
receptor-like protein kinase [Z17991]         Arabidopsis thaliana
ADP-ribosylation factor [S49325]              Zea mays
H2B histone [577825]                          Zea mays
translation elongation factor eEF-1 beta-A1   Arabidopsis thaliana
  chain [S37103]
enolase [P42895]                              Zea mays
casein kinase II, alpha chain [P28523]        Zea mays
acyl-CoA-binding protein [U35015]             Gossypium hirsutum
protein phosphatase 2C [S55457]               Arabidopsis thaliana
60S ribosomal protein L32 [Z17739]            Arabidopsis thaliana
small nuclear ribonucleoprotein E             Medicago sativa
  homolog C29 [P24715]
sucrose synthase [X81974]                     Beta vulgaris
pyruvate kinase, plastid [S44287]             Nicotiana tabacum
sucrose synthase [JT0280]                     Triticum aestivum
GTP-binding protein [D12542]                  Pisum sativum
mitochondrial processing peptidase            Solanum tuberosum
  [X80236]
stage III sporulation protein [S39321]        Arabidopsis thaliana
glutathione S-transferase [P46422]            Arabidopsis thaliana
proteasome C2 subunit [D37886]                Oryza sativa
pectin methylesterase [Y08155]                Melandrium album
auxin response factor 1 [U83245]              Arabidopsis thaliana
hypothetical protein (beta-1,3-glucanase)     Solanum tuberosum
  [S31196]
vacuolar processing enzyme precursor          Glycine max
  [P49045]
alcohol dehydrogenase [L08591]                Zea mays
disease resistance protein RPM1 [X87851]      Arabidopsis thaliana
chloroplast 30S ribosomal protein S7          Cuscuta europaea
  [P46292]
hypothetical polyprotein [S57908]             Oryza sativa
ER lumen protein retaining receptor           Arabidopsis thaliana
  [P35402]
unknown protein [AC004138]                    Arabidopsis thaliana
cathepsin B [X66012]                          Triticum aestivum
sucrose synthase [JT0280]                     Triticum aestivum
actin depolymerizing factor [X97726]          Zea mays
5' end not determined experimentally          Zea mays
  [U68408]
glutathione S-transferase [P42761]            Arabidopsis thaliana
3-oxoacyl-[acyl-carrier protein] reductase    Cuphea lanceolata
  precursor [P28643]
bicolor membrane intrinsic (Mip 1) protein    Sorghum bicolor
  [U87981]
pyrophosphate-fructose 6-phosphate            Solanum tuberosum
  1-phosphotransferase (PFP) beta subunit
  [P21343]
UDP-glucose dehydrogenase [U53418]            Glycine max
RNA helicase isolog [AC002337]                Arabidopsis thaliana
2-oxoglutarate/malate translocator            Panicum miliaceum
  [D45075]
acetyl-CoA carboxylase [U10187]               Triticum aestivum
translation initiation factor 5A [Y07920]     Zea mays
Clp protease [AF032123]                       Arabidopsis thaliana
ras-related protein RIC1 (GTP binding         Oryza sativa
  protein) [S66160]
aspartic proteinase precursor [P42211]        Oryza sativa
similar to hypothetical protein from A.       Arabidopsis thaliana
  thaliana [AC002986]
germin like protein [U75205]                  Arabidopsis thaliana
aspartic proteinase precursor [P42211]        Oryza sativa
unknown protein [AC004122]                    Arabidopsis thaliana
pectin methylesterase [Y08155]                Melandrium album
farnesyl-diphosphate farnesyltransferase      Glycyrrhiza glabra
  (squalene synthase) [JC5031]
cellulase homolog OR16pep [S71215]            Arabidopsis thaliana
contains similarity to S. cerevisiae          Arabidopsis thaliana
  hypothetical protein YOR197w [3152597]
lysine-ketoglutarate reductase/saccharopine   Zea mays
  dehydrogenase bifunctional
  enzyme [AF003551]
protein kinase isolog [U90439]                Arabidopsis thaliana
Bowman-Birk protease inhibitor                Pisum sativum
  [2123385A]
triose-phosphate isomerase, cytosolic         Zea mays
  [P12863]
GDP-associated inhibitor [Y07961]             Arabidopsis thaliana
chloroplast 50S ribosomal protein L32         Oryza sativa
  [P12197]
voltage-dependent anion-selective channel     Solanum tuberosum
  protein (VDAC) [34 kDa outer
  mitochondrial membrane protein,
  porin] [P42055]
hypothetical protein [Z97339]                 Arabidopsis thaliana
nucleolar histone deacetylase HD2             Zea mays
  [U82815]
unknown protein [U93215]                      Arabidopsis thaliana
5-methyltetrahydropteroyl-triglutamate-       Catharanthus roseus
  homocysteine S-methyltransferase
  [S57636]
cellulose synthase [U58284]                   Gossypium hirsutum
ATP synthase 6 kD subunit, mitochondrial      Solanum tuberosum
  [P80497]
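
As a reading aid for these tables, the ID, Similar, and Overlap counts can be converted to percent identity and percent similarity over the aligned region. The short Python sketch below is illustrative only and is not part of the original analysis. The function names `percent` and `summarize` are hypothetical, and the example values (ID 41, Similar 57, Overlap 87) are those listed for EST AA080647 in Table 3.

```python
def percent(count: int, overlap: int) -> float:
    """Express a residue count as a percentage of the overlap length (0.1 precision)."""
    if overlap <= 0:
        raise ValueError("overlap must be a positive number of residues")
    if not 0 <= count <= overlap:
        # A count larger than the overlap cannot occur in a real alignment,
        # so this usually signals a transcription error in the table row.
        raise ValueError("count must lie between 0 and overlap")
    return round(100.0 * count / overlap, 1)


def summarize(identical: int, similar: int, overlap: int) -> dict:
    """Return percent identity and percent similarity for one table row."""
    return {
        "identity_pct": percent(identical, overlap),
        "similarity_pct": percent(similar, overlap),
    }


# Hypothetical usage with the values listed for AA080647 (ID 41, Similar 57, Overlap 87).
row = summarize(41, 57, 87)
print(row)
```

Because the helper rejects counts that exceed the overlap length, it can also serve as a quick consistency check on individual rows.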


Table 3. Sugarcane ESTs with sequence homology or similarity to non-plant genes. The EST no. is the accession number assigned by dbEST. The numbers in the columns designated ID, Similar, and Overlap refer to the number of identical (ID) or similar (Similar) amino acids in a region of a particular length (Overlap). The column designated Organism refers to the source of the protein that exhibits homology or similarity to the sugarcane EST.
                   Putative Identification            ID
EST no.              and dbEST Accession

AA080647   yeast hypothetical 16.2 kd protein         41
             [P36053]
AI376340   LIM homeobox protein [Z97340]              29
AA080651   ribosomal protein L30 [B24028]             34
AA080653   alpha-fetoprotein enhancer-binding          8
             protein [A41948]
AA080669   mouse B-cell receptor CD22-Beta             7
             precursor [P35329]
AA080671   influenza virus hemagglutinin 5'           21
             epitope tag [S71745]
AA080672   erythrocyte membrane protein band          14
             4.2 [U04056]
AA080673   ornan sperm protamine P1 [P35307]          12
AA080675   hypothetical protein YJL076W [S56852]       8
AA080676   proline-rich polypeptide precursor         14
             [A42663]
AA080677   putative vitellogenin receptor              6
             [U13637]
AA080678   argininosuccinate synthase [P13257]        12
AA080656   mucin [M188878]                             9
AA080658   Cii-1=beta-toxin [165350]                   7
AA080660   GAG polyprotein [P31622]                   13
AA080661   OATL1 [L08238]                              6
AA080663   human U1 small ribonucleoprotein            8
             [P09234]
AA080664   Fin29 [AB007447]                           14
AA080665   60S ribosomal protein L28 [P46779]         13
AA080667   PAC1 protein [P39946]                      12
AA080587   oligodendrocyte-specific proline-rich      16
             protein 2 [C55663]
AA080591   drome broad-complex core-NS-Z3 protein      8
             [Q01293]
AA080592   similar to Saccharomyces cerevisiae ORF    10
             YCR028 [D89224]
AA080595   competence protein S [P80355]               7
AA080598   F55A11.4 [Z72511]                          18
AA080604   60S ribosomal protein L28 [P46779]         12
AA080608   gamma-glutamyl transpeptidase 2 [P36268]   15
AA080609   yeast probable ribosomal protein in        38
             VMA3-RIP1 intergenic region [P39990]
AA080611   hypothetical protein L9122.5 [S59413]       8
AA080612   tenascin-X precursor [A40701]               6
AA080617   surface antigen [M92048]                   15
AA080618   mec1p [U31109]                              8
AA080619   similar to calcium channel alpha subunit    9
             [U61951]
AA080621   probable G protein-coupled receptor        14
             GPR21 [Q99679]
AA080622   faf=fat facets gene [A49132]                7
AA080624   putative gtg start codon [X90711]           7
AA080625   51C surface protein [M65164]                6
AA080626   hypothetical 92.1 kD protein C24H6.03 in   17
             chromosome 1 [Q09760]
AA080628   phosphoribosylformylglycinamidine           7
             synthase [P35421]
AA080629   ORF 1130 [U20247]                           6
AA080630   human thrombospondin 1 precursor            7
             [P07996]
AA080632   URP5 [1204259B]                            16
AA080631   ERCC5 [D16305]                              6
AA080635   murine erythroleukemia cardiac calcium      5
             channel [U17869]
AA080637   ORF gene product [X95373]                   7
AA080638   latent transforming growth factor           8
             beta-binding protein 3 precursor
             [A57293]
AA080641   aminopeptidase [U35646]                    28
AA080643   T08A11.2 [Z50875]                          25
AA080644   peroxisomal membrane protein 47B           19
             [U53145]
AA080645   coded for by C. elegans cDNA [U50191]       6
AA269152   DNA-binding protein [JQ1058]               18
AA269153   F15B9.7 [Z78018]                            6
AA269155   HIV-EP2 enhancer-binding protein            9
             [X65644]
AA269156   60S ribosomal protein L13A [P40429]        27
AA269157   copper transporting P-type ATPase           6
             [U38477]
AA269159   unknown [Z69239]                           25
AA269160   small nuclear ribonucleoprotein            23
             [P43330]
AA269163   glycine-rich [U23453]                      17
AA269166   ZK593.7 [Z69385]                           44
AA269167   zinc finger protein ZMS1 [P46974]          10
AA269168   F25F2.2 [Z35599]                            8
AA269170   ryanodine receptor (skeletal muscle)        7
             [P21817]
AA269171   R07E5.13 [Z32683]                          41
AA269172   HC1 ORF [X66285]                            7
AA269173   BmGATA beta isoform 3 [U16274]             10
AA269174   pyruvate dehydrogenase E1 component,       61
             alpha subunit [D90915]
AA269175   glycoprotein GP330, renal-rat               7
             (fragments) [A30363]
AA269176   ZC374.2 [Z72518]                            7
AA269177   guanine nucleotide-binding protein G(O),    5
             alpha subunit [P30033]
AI216928   ORF N118 [D84656]                          20
AA269293   Xenlia DG42 protein [P13563]                7
AA269295   ran=25 kda ras-related protein [239838]     7
AA269296   POLIM genome polyprotein [P03299]           5
AA269297   f549 [AE000312]                            12
AA269298   RNA-directed RNA polymerase (ORF 1A)        8
             [P19751]
AA269299   AT-motif binding factor [D26046]            7
AA525639   thioredoxin [P34723]                       23
AA525641   dTDP-glucose 4-6-dehydratase [D90911]      35
AA525642   hypothetical protein ZC84.1 [S28291]       11
AA525643   von Willebrand factor [L76227]              5
AA525646   envelope protein (human immunodeficiency    8
             virus type 1) [U20673]
AA525647   spliceosome associated protein [U41371]    26
AA525648   lozenge [U47849]                           16
AA525650   pericentrin [P48725]                       16
AA525653   human BDNF/NT-3 growth factors receptor    12
             precursor [Q16620]
AA525654   metallothionein (MT) [P07216]               9
AA525659   tropomyosin 1, fusion protein 33           20
             [P49455]
AA525662   chaperone protein SEFB precursor            6
             [P33387]
AA525665   adenosine kinase [U33936]                  38
AA525668   hypothetical 24.8 kD protein in            13
             FAA3-BET1 intergenic region [P40555]
AA525671   coded for by C. elegans cDNA yk89e9.5      25
             [U50199]
AA525672   human small proline-rich protein 2B         6
             [P35325]
AA525673   yeast verprolin [P37370]                    8
AA525674   Ig alpha chain C region [S03297]            5
AA525676   DNA polymerase [D50489]                    11
AA525681   human histone H3.1 [P16106]                33
AA525682   spike protein (porcine respiratory          7
             corona virus) [D00658]
AA525687   hypothetical protein C23D3.15 in           12
             chromosome 1 [AB004534]
AA525689   sequence 3 from Patent WO 8912462           5
             [I1349]
AA525690   endothelin-2 precursor (ET-2) [P12064]      9
AA525691   yeast hypothetical 32.8 kd protein in      28
             NCE3-HHT2 intergenic region [P53965]
AA525693   rjs [AF061529]                             13
AA525694   ORF N150 [D84656]                          10
AA577629   hypothetical 337.6 kD protein T20G5.3 in    6
             chromosome III [P34576]
AA577630   yeast hypothetical 65.3 kD protein in      20
             PRE3-SAG1 intergenic region [P47082]
AA577631   similarity to C. elegans retinoic acid     19
             receptors [Z92825]
AA577637   alk 8 [Y14766]                             14
AA577640   hypothetical protein in OGT 5' region      12
             [P46133]
AA577647   GRK4c [X97568]                             20
AA577648   similar to hypothetical proteins           16
             [Z99115]
AA577649   SNAP-25 interacting protein hrs-2           7
             [U87863]
AA577651   membrane protein CD40 [U57745]              9
AA577654   hypothetical protein [D63999]              11
AA577655   C32A3.1 [Z48241]                           12
AA577660   mataxin [AF059277]                         17
AA577661   60S ribosomal protein L10A [P53026]        60
AA577667   probable ubiquitin carboxyl-terminal        8
             hydrolase [ubiquitin-specific
             processing protease] [P34547]
AA577668   transmembrane glycoprotein [M76753]         7

        Putative Identification
          and dbEST Accession              Similar

yeast hypothetical 16.2 kd protein           57
  [P36053]
LIM homeobox protein [Z97340]                29
ribosomal protein L30 [B24028]               38
alpha-fetoprotein enhancer-binding           10
  protein [A41948]
mouse B-cell receptor CD22-Beta               8
  precursor [P35329]
influenza virus hemagglutinin 5'             22
  epitope tag [S71745]
erythrocyte membrane protein band            18
  4.2 [U04056]
ornan sperm protamine P1 [P35307]            17
hypothetical protein YJL076W [S56852]        16
proline-rich polypeptide precursor           17
  [A42663]
putative vitellogenin receptor                6
  [U13637]
argininosuccinate synthase [P13257]          18
mucin [M188878]                              11
Cii-1=beta-toxin [165350]                    13
GAG polyprotein [P31622]                     17
OATL1 [L08238]                                7
human U1 small ribonucleoprotein             11
  [P09234]
Fin29 [AB007447]                             19
60S ribosomal protein L28 [P46779]           25
PAC1 protein [P39946]                        15
oligodendrocyte-specific proline-rich        22
  protein 2 [C55663]
drome broad-complex core-NS-Z3 protein        9
  [Q01293]
similar to Saccharomyces cerevisiae ORF      13
  YCR028 [D89224]
competence protein S [P80355]                14
F55A11.4 [Z72511]                            30
60S ribosomal protein L28 [P46779]           23
gamma-glutamyl transpeptidase 2 [P36268]     18
yeast probable ribosomal protein in          46
  VMA3-RIP1 intergenic region [P39990]
hypothetical protein L9122.5 [S59413]        17
tenascin-X precursor [A40701]                 9
surface antigen [M92048]                     23
mec1p [U31109]                               15
similar to calcium channel alpha subunit     11
  [U61951]
probable G protein-coupled receptor          23
  GPR21 [Q99679]
faf=fat facets gene [A49132]                  7
putative gtg start codon [X90711]            11
51C surface protein [M65164]                  8
hypothetical 92.1 kD protein C24H6.03 in     21
  chromosome 1 [Q09760]
phosphoribosylformylglycinamidine             8
  synthase [P35421]
ORF 1130 [U20247]                             8
human thrombospondin 1 precursor              9
  [P07996]
URP5 [1204259B]                              22
ERCC5 [D16305]                               10
murine erythroleukemia cardiac calcium       10
  channel [U17869]
ORF gene product [X95373]                    12
latent transforming growth factor             9
  beta-binding protein 3 precursor
  [A57293]
aminopeptidase [U35646]                      46
T08A11.2 [Z50875]                            31
peroxisomal membrane protein 47B             31
  [U53145]
coded for by C. elegans cDNA [U50191]        10
DNA-binding protein [JQ1058]                 19
F15B9.7 [Z78018]                             10
HIV-EP2 enhancer-binding protein              9
  [X65644]
60S ribosomal protein L13A [P40429]          37
copper transporting P-type ATPase            10
  [U38477]
unknown [Z69239]                             29
small nuclear ribonucleoprotein              29
  [P43330]
glycine-rich [U23453]                        25
ZK593.7 [Z69385]                             62
zinc finger protein ZMS1 [P46974]            17
F25F2.2 [Z35599]                             15
ryanodine receptor (skeletal muscle)         10
  [P21817]
R07E5.13 [Z32683]                            53
HC1 ORF [X66285]                             12
BmGATA beta isoform 3 [U16274]               11
pyruvate dehydrogenase E1 component,         70
  alpha subunit [D90915]
glycoprotein GP330, renal-rat                 7
  (fragments) [A30363]
ZC374.2 [Z72518]                              9
guanine nucleotide-binding protein G(O),      8
  alpha subunit [P30033]
ORF N118 [D84656]                            37
Xenlia DG42 protein [P13563]                 11
ran=25 kda ras-related protein [239838]       8
POLIM genome polyprotein [P03299]            11
f549 [AE000312]                              17
RNA-directed RNA polymerase (ORF 1A)         11
  [P19751]
AT-motif binding factor [D26046]             10
thioredoxin [P34723]                         29
dTDP-glucose 4-6-dehydratase [D90911]        46
hypothetical protein ZC84.1 [S28291]         15
von Willebrand factor [L76227]               12
envelope protein (human immunodeficiency     10
  virus type 1) [U20673]
spliceosome associated protein [U41371]      37
lozenge [U47849]                             19
pericentrin [P48725]                         22
human BDNF/NT-3 growth factors receptor      15
  precursor [Q16620]
metallothionein (MT) [P07216]                11
tropomyosin 1, fusion protein 33             28
  [P49455]
chaperone protein SEFB precursor              7
  [P33387]
adenosine kinase [U33936]                    45
hypothetical 24.8 kD protein in              22
  FAA3-BET1 intergenic region [P40555]
coded for by C. elegans cDNA yk89e9.5        50
  [U50199]
human small proline-rich protein 2B           6
  [P35325]
yeast verprolin [P37370]                     10
Ig alpha chain C region [S03297]              6
DNA polymerase [D50489]                      14
human histone H3.1 [P16106]                  34
spike protein (porcine respiratory            8
  corona virus) [D00658]
hypothetical protein C23D3.15 in             19
  chromosome 1 [AB004534]
sequence 3 from Patent WO 8912462             5
  [I1349]
endothelin-2 precursor (ET-2) [P12064]       11
yeast hypothetical 32.8 kd protein in        43
  NCE3-HHT2 intergenic region [P53965]
rjs [AF061529]                               22
ORF N150 [D84656]                            14
hypothetical 337.6 kD protein T20G5.3 in      7
  chromosome III [P34576]
yeast hypothetical 65.3 kD protein in        38
  PRE3-SAG1 intergenic region [P47082]
similarity to C. elegans retinoic acid       28
  receptors [Z92825]
alk 8 [Y14766]                               20
hypothetical protein in OGT 5' region        17
  [P46133]
GRK4c [X97568]                               30
similar to hypothetical proteins             20
  [Z99115]
SNAP-25 interacting protein hrs-2             9
  [U87863]
membrane protein CD40 [U57745]               14
hypothetical protein [D63999]                15
C32A3.1 [Z48241]                             15
mataxin [AF059277]                           28
60S ribosomal protein L10A [P53026]          82
probable ubiquitin carboxyl-terminal         11
  hydrolase [ubiquitin-specific
  processing protease] [P34547]
transmembrane glycoprotein [M76753]           9

        Putative Identification
          and dbEST Accession              Overlap   Score

yeast hypothetical 16.2 kd protein            87      245
  [P36053]
LIM homeobox protein [Z97340]                 29      146
ribosomal protein L30 [B24028]                47      189
alpha-fetoprotein enhancer-binding            15       51
  protein [A41948]
mouse B-cell receptor CD22-Beta               10       35
  precursor [P35329]
influenza virus hemagglutinin 5'              29       97
  epitope tag [S71745]
erythrocyte membrane protein band             29       68
  4.2 [U04056]
ornan sperm protamine P1 [P35307]             31       44
hypothetical protein YJL076W [S56852]         26       36
proline-rich polypeptide precursor            43       45
  [A42663]
putative vitellogenin receptor                 7       41
  [U13637]
argininosuccinate synthase [P13257]           31       62
mucin [M188878]                               19       55
Cii-1=beta-toxin [165350]                     23       40
GAG polyprotein [P31622]                      27       54
OATL1 [L08238]                                13       39
human U1 small ribonucleoprotein              17       50
  [P09234]
Fin29 [AB007447]                              30       81
60S ribosomal protein L28 [P46779]            31       76
PAC1 protein [P39946]                         21       69
oligodendrocyte-specific proline-rich         44       75
  protein 2 [C55663]
drome broad-complex core-NS-Z3 protein        19       42
  [Q01293]
similar to Saccharomyces cerevisiae ORF       18       70
  YCR028 [D89224]
competence protein S [P80355]                 20       41
F55A11.4 [Z72511]                             59       87
60S ribosomal protein L28 [P46779]            31       68
gamma-glutamyl transpeptidase 2 [P36268]      33       71
yeast probable ribosomal protein in           76      163
  VMA3-RIP1 intergenic region [P39990]
hypothetical protein L9122.5 [S59413]         28       42
tenascin-X precursor [A40701]                 16       38
surface antigen [M92048]                      44       68
mec1p [U31109]                                20       46
similar to calcium channel alpha subunit      21       34
  [U61951]
probable G protein-coupled receptor           37       70
  GPR21 [Q99679]
faf=fat facets gene [A49132]                   8       40
putative gtg start codon [X90711]             17       41
51C surface protein [M65164]                  16       34
hypothetical 92.1 kD protein C24H6.03 in      35       90
  chromosome 1 [Q09760]
phosphoribosylformylglycinamidine              8       38
  synthase [P35421]
ORF 1130 [U20247]                             12       41
human thrombospondin 1 precursor              12       40
  [P07996]
URP5 [1204259B]                               42       69
ERCC5 [D16305]                                14       37
murine erythroleukemia cardiac calcium        13       35
  channel [U17869]
ORF gene product [X95373]                     19       40
latent transforming growth factor             15       42
  beta-binding protein 3 precursor
  [A57293]
aminopeptidase [U35646]                       82      142
T08A11.2 [Z50875]                             43      137
peroxisomal membrane protein 47B               4      105
  [U53145]
coded for by C. elegans cDNA [U50191]         14       35
DNA-binding protein [JQ1058]                  20       93
F15B9.7 [Z78018]                              12       42
HIV-EP2 enhancer-binding protein              15       44
  [X65644]
60S ribosomal protein L13A [P40429]           76      123
copper transporting P-type ATPase             13       39
  [U38477]
unknown [Z69239]                              40      129
small nuclear ribonucleoprotein               30      124
  [P43330]
glycine-rich [U23453]                         42       93
ZK593.7 [Z69385]                              81      242
zinc finger protein ZMS1 [P46974]             21       62
F25F2.2 [Z35599]                              28       43
ryanodine receptor (skeletal muscle)          12       43
  [P21817]
R07E5.13 [Z32683]                             75      227
HC1 ORF [X66285]                              23       57
BmGATA beta isoform 3 [U16274]                21       43
pyruvate dehydrogenase E1 component,          93      313
  alpha subunit [D90915]
glycoprotein GP330, renal-rat                  9       37
  (fragments) [A30363]
ZC374.2 [Z72518]                              12       50
guanine nucleotide-binding protein G(O),       8       43
  alpha subunit [P30033]
ORF N118 [D84656]                             54      119
Xenlia DG42 protein [P13563]                  17       50
ran=25 kda ras-related protein [239838]        8       41
POLIM genome polyprotein [P03299]             15       37
f549 [AE000312]                               28       52
RNA-directed RNA polymerase (ORF 1A)          17       42
  [P19751]
AT-motif binding factor [D26046]              15       40
thioredoxin [P34723]                          46      124
dTDP-glucose 4-6-dehydratase [D90911]         53      198
hypothetical protein ZC84.1 [S28291]          26       55
von Willebrand factor [L76227]                20       46
envelope protein (human immunodeficiency      26       48
  virus type 1) [U20673]
spliceosome associated protein [U41371]       54      125
lozenge [U47849]                              36      102
pericentrin [P48725]                          51       61
human BDNF/NT-3 growth factors receptor       25       69
  precursor [Q16620]
metallothionein (MT) [P07216]                 21       61
tropomyosin 1, fusion protein 33              68       75
  [P49455]
chaperone protein SEFB precursor              13       34
  [P33387]
adenosine kinase [U33936]                     65      208
hypothetical 24.8 kD protein in               35       74
  FAA3-BET1 intergenic region [P40555]
coded for by C. elegans cDNA yk89e9.5         76      133
  [U50199]
human small proline-rich protein 2B           11       41
  [P35325]
yeast verprolin [P37370]                      15       41
Ig alpha chain C region [S03297]               8       29
DNA polymerase [D50489]                       26       53
human histone H3.1 [P16106]                   41      167
spike protein (porcine respiratory            12       43
  corona virus) [D00658]
hypothetical protein C23D3.15 in              29       68
  chromosome 1 [AB004534]
sequence 3 from Patent WO 8912462              8       31
  [I1349]
endothelin-2 precursor (ET-2) [P12064]        32       40
yeast hypothetical 32.8 kd protein in         63      167
  NCE3-HHT2 intergenic region [P53965]
rjs [AF061529]                                35       61
ORF N150 [D84656]                             23       60
hypothetical 337.6 kD protein T20G5.3 in      10       35
  chromosome III [P34576]
yeast hypothetical 65.3 kD protein in         71       89
  PRE3-SAG1 intergenic region [P47082]
similarity to C. elegans retinoic acid        54       69
  receptors [Z92825]
alk 8 [Y14766]                                34       68
hypothetical protein in OGT 5' region         28       69
  [P46133]
GRK4c [X97568]                                71       79
similar to hypothetical proteins              39       68
  [Z99115]
SNAP-25 interacting protein hrs-2             12       58
  [U87863]
membrane protein CD40 [U57745]                22       41
hypothetical protein [D63999]                 33       70
C32A3.1 [Z48241]                              25       44
mataxin [AF059277]                            60       71
60S ribosomal protein L10A [P53026]          112      311
probable ubiquitin carboxyl-terminal          17       54
  hydrolase [ubiquitin-specific
  processing protease] [P34547]
transmembrane glycoprotein [M76753]           13       43

        Putative Identification
          and dbEST Accession              Organism

yeast hypothetical 16.2 kd protein         Saccharomyces cerevisiae
  [P36053]
LIM homeobox protein [Z97340]              Caenorhabditis elegans
ribosomal protein L30 [B24028]             Rattus rattus
alpha-fetoprotein enhancer-binding         Homo sapiens
  protein [A41948]
mouse B-cell receptor CD22-Beta            Mus musculus
  precursor [P35329]
influenza virus hemagglutinin 5'           Saccharomyces cerevisiae
  epitope tag [S71745]
erythrocyte membrane protein band          Mus musculus
  4.2 [U04056]
ornan sperm protamine P1 [P35307]          Ornithorhynchus anatinus
hypothetical protein YJL076W [S56852]      Saccharomyces cerevisiae
proline-rich polypeptide precursor         Rattus rattus
  [A42663]
putative vitellogenin receptor             Drosophila melanogaster
  [U13637]
argininosuccinate synthase [P13257]        Methanosarcina barkeri
mucin [M188878]                            Homo sapiens
Cii-1=beta-toxin [165350]                  Centruroides infamatus
GAG polyprotein [P31622]                   sheep pulmonary
                                             adenomatosis virus
OATL1 [L08238]                             Homo sapiens
human U1 small ribonucleoprotein           Homo sapiens
  [P09234]
Fin29 [AB007447]                           Homo sapiens
60S ribosomal protein L28 [P46779]         Homo sapiens
PAC1 protein [P39946]                      Saccharomyces cerevisiae
oligodendrocyte-specific proline-rich      Homo sapiens
  protein 2 [C55663]
drome broad-complex core-NS-Z3 protein     Drosophila melanogaster
  [Q01293]
similar to Saccharomyces cerevisiae ORF    Schizosaccharomyces pombe
  YCR028 [D89224]
competence protein S [P80355]              Bacillus subtilis
F55A11.4 [Z72511]                          Caenorhabditis elegans
60S ribosomal protein L28 [P46779]         Homo sapiens
gamma-glutamyl transpeptidase 2 [P36268]   Homo sapiens
yeast probable ribosomal protein in        Saccharomyces cerevisiae
  VMA3-RIP1 intergenic region [P39990]
hypothetical protein L9122.5 [S59413]      Saccharomyces cerevisiae
tenascin-X precursor [A40701]              Homo sapiens
surface antigen [M92048]                   Trypanosoma cruzi
mec1p [U31109]                             Saccharomyces cerevisiae
similar to calcium channel alpha subunit   Caenorhabditis elegans
  [U61951]
probable G protein-coupled receptor        Homo sapiens
  GPR21 [Q99679]
faf=fat facets gene [A49132]               Drosophila melanogaster
putative gtg start codon [X90711]          Bordetella pertussis
51C surface protein [M65164]               Paramecium tetraurelia
hypothetical 92.1 kD protein C24H6.03 in   Schizosaccharomyces
  chromosome 1 [Q09760]                      pombe
phosphoribosylformylglycin amidine         Drosophila melanogaster
  synthase [P35421]
ORF 1130 [U20247]                          Dichelobacter nodosus
human thrombospondin 1 precursor           Homo sapiens
  [P07996]
URP5 [1204259B]                            Chlamydomonas
                                             reinhardtii
ERCC5 [D16305]                             Homo sapiens
murine erythroleukemia cardiac calcium     Mus musculus
  channel [U17869]
ORF gene product [X95373]                  Plasmodium falciparum
latent transforming growth factor          Mus musculus
  beta-binding protein 3 precursor
  [A57293]
aminopeptidase [U35646]                    Mus musculus
T08A11.2 [Z50875]                          Caenorhabditis elegans
peroxisomal membrane protein 47B           Candida boidinii
  [U53145]
coded for by C. elegans cDNA [U50191]      Caenorhabditis elegans
DNA-binding protein [JQ1058]               Mus musculus
F15B9.7 [Z78018]                           Caenorhabditis elegans
HIV-EP2 enhancer-binding protein           Homo sapiens
  [X65644]
60S ribosomal protein L13A [P40429]        Homo sapiens
copper transporting P-type ATPase          Mus musculus
  [U38477]
unknown [Z69239]                           Schizosaccharomyces
                                             pombe
small nuclear ribonucleoprotein            Homo sapiens
  [P43330]
glycine-rich [U23453]                      Caenorhabditis elegans
ZK593.7 [Z69385]                           Caenorhabditis elegans
zinc finger protein ZMS1 [P46974]          Candida boidinii
F25F2.2 [Z35599]                           Caenorhabditis elegans
ryanodine receptor (skeletal muscle)       Homo sapiens
  [P21817]
R07E5.13 [Z32683]                          Caenorhabditis elegans
HC1 ORF [X66285]                           Mus musculus
BmGATA beta isoform 3 [U16274]             Bombyx mori
pyruvate dehydrogenase E1 component,       Synechocystis spp.
  alpha subunit [D90915]
glycoprotein GP330, renal-rat              Rattus rattus
  (fragments) [A30363]
ZC374.2 [Z72518]                           Caenorhabditis elegans
guanine nucleotide-binding protein G(O),   Rattus rattus
  alpha subunit [P30033]
ORF N118 [D84656]                          Schizosaccharomyces
                                             pombe
Xenlia DG42 protein [P13563]               Xenopus laevis
ran=25 kda ras-related protein [239838]    Homo sapiens
POLIM genome polyprotein [P03299]          human poliovirus 1
f549 [AE000312]                            Escherichia coli
RNA-directed RNA polymerase (ORF 1A)       murine hepatitis virus
  [P19751]
AT-motif binding factor [D26046]           Mus musculus
thioredoxin [P34723]                       Penicillium chrysogenum
dTDP-glucose 4-6-dehydratase [D90911]      Synechocystis sp.
hypothetical protein ZC84.1 [S28291]       Caenorhabditis elegans
von Willebrand factor [L76227]             Canis familiaris
envelope protein (human immunodeficiency   Homo sapiens
  virus type 1) [U20673]
spliceosome associated protein [U41371]    Homo sapiens
lozenge [U47849]                           Drosophila melanogaster
pericentrin [P48725]                       Mus musculus
human BDNF/NT-3 growth factors receptor    Homo sapiens
  precursor [Q16620]
metallothionein (MT) [P07216]              Pleuronectes platessa
tropomyosin 1, fusion protein 33           Drosophila melanogaster
  [P49455]
chaperone protein SEFB precursor           Salmonella enteritidis
  [P33387]
adenosine kinase [U33936]                  Homo sapiens
hypothetical 24.8 kD protein in            Saccharomyces cerevisiae
  FAA3-BET1 intergenic region [P40555]
coded for by C. elegans cDNA yk89e9.5      Caenorhabditis elegans
  [U50199]
human small proline-rich protein 2B        Homo sapiens
  [P35325]
yeast verprolin [P37370]                   Saccharomyces cerevisiae
Ig alpha chain C region [S03297]           Gorilla gorilla
DNA polymerase [D50489]                    Hepatitis B virus
human histone H3.1 [P16106]                Homo sapiens
spike protein (porcine respiratory         pig
  corona virus) [D00658]
hypothetical protein C23D3.15 in           Schizosaccharomyces
  chromosome 1 [AB004534]                    pombe
sequence 3 from Patent WO 8912462          --
  [I1349]
endothelin-2 precursor (ET-2) [P12064]     Canis familiaris
yeast hypothetical 32.8 kd protein in      Saccharomyces cerevisiae
  NCE3-HHT2 intergenic region [P53965]
rjs [AF061529]                             Mus musculus
ORF N150 [D84656]                          Schizosaccharomyces
                                             pombe
hypothetical 337.6 kD protein T20G5.3 in   Caenorhabditis elegans
  chromosome III [P34576]
yeast hypothetical 65.3 kD protein in      Saccharomyces cerevisiae
  PRE3-SAG1 intergenic region [P47082]
similarity to C. elegans retinoic acid     Caenorhabditis elegans
  receptors [Z92825]
alk 8 [Y14766]                             Candida albicans
hypothetical protein in OGT 5' region      Escherichia coli
  [P46133]
GRK4c [X97568]                             Rattus norvegicus
similar to hypothetical proteins           Bacillus subtilis
  [Z99115]
SNAP-25 interacting protein hrs-2          Rattus norvegicus
  [U87863]
membrane protein CD40 [U57745]             Bos taurus
hypothetical protein [D63999]              Synechocystis sp.
C32A3.1 [Z48241]                           Caenorhabditis elegans
mataxin [AF059277]                         Mus musculus
60S ribosomal protein L10A [P53026]        Mus musculus
probable ubiquitin carboxyl-terminal       Caenorhabditis elegans
  hydrolase [ubiquitin-specific
  processing protease] [P34547]
transmembrane glycoprotein [M76753]        human T-cell lymphocyte
                                             virus type 1


During the course of sequencing analysis, several redundant clones were detected. These clones are presented in Table 4. The frequency of these clones in the total pool of leaf roll ESTs analyzed ranged from 0.8 to 4%. Analysis of the database search results indicated that the DNA sequences of the redundant clones were not identical; that is, they did not simply represent copies of the same clone in the cDNA library (data not shown). In most cases, the sequences were homologous to different regions of the gene sequence in the database, although in some cases small regions of sequence overlap between clones were observed (data not shown).

A slight exception occurred for the ESTs homologous to SuSy [sucrose synthase (EC 2.4.1.13)]. Four clones were identified as being homologous to SuSy. These clones were first sequenced with the M13 Forward primer to obtain the full-length sequences of the four individual cDNA clones. Sequence overlap analysis indicated that high sequence homology occurred between clones AA080610 and AA080580, and between clones AA080634 and AA269294 (Table 2). Consensus sequences for each of these pairs were generated with Sequence Navigator (PE Applied Biosystems). Overlap alignment of these two consensus sequences generated a total sequence length of 1450 bp with a near-identical overlap of 286 bp. GenBank database searches with the 1450-bp fragment revealed high homology to SuSy isoform I (data not shown). These results suggest that all four SuSy clones represent only one of the known SuSy isoforms. For all of the redundant clones identified, the redundancy may be indicative of increased expression of those genes in the leaf roll.
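The overlap arithmetic described above can be illustrated with a minimal sketch. It is not the Sequence Navigator procedure used in this study: the function name `merge_by_overlap`, the `min_overlap` parameter, and the toy sequences are hypothetical, and the sketch requires an exact suffix/prefix match, whereas the reported 286-bp overlap was only near-identical. The merged length equals the sum of the two input lengths minus the overlap, so a 1450-bp product with a 286-bp overlap implies that the two consensus sequences together spanned 1450 + 286 = 1736 bp.

```python
def merge_by_overlap(left, right, min_overlap=1):
    """Join two sequences on their longest exact suffix/prefix overlap.

    Returns (merged_sequence, overlap_length), or None when the end of
    `left` and the start of `right` share fewer than `min_overlap` bases.
    Exact matching is a simplifying assumption for illustration only.
    """
    longest_possible = min(len(left), len(right))
    # Try the longest candidate overlap first and shrink until one matches.
    for size in range(longest_possible, min_overlap - 1, -1):
        if size > 0 and left[-size:] == right[:size]:
            return left + right[size:], size
    return None


if __name__ == "__main__":
    # Toy sequences (hypothetical, not sugarcane data).
    result = merge_by_overlap("ACGTACGT", "ACGTTTT", min_overlap=3)
    if result is not None:
        merged, overlap = result
        # Merged length = len(left) + len(right) - overlap.
        print(merged, overlap, len(merged))
```

Scanning from the longest candidate overlap downward means the first match found is the longest one, which mirrors the idea of joining the two consensus sequences on their maximal shared region.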

Functional Identification of Sugarcane ESTs

All identified ESTs were categorized into general biochemical and metabolic function (Fig. 1). The leaf roll cDNA clones exhibited homology to a broad diversity of genes, including enzymes and proteins associated with ubiquitous metabolic pathways, structural proteins, and components of transcriptional and translational apparatus. The largest group of clones (35%) matched proteins that are as yet uncharacterized. There are several high-throughput gene sequencing programs currently in progress and many expressed sequences deposited in the GenBank databases by these groups do not yet have an identity. This results in many putative identities to unknown or hypothetical proteins. Of the remaining 65% of clones that were identified, 12.4% were enzymes. Sucrolytic enzymes were the most common, with nine clones representing six different enzymes being identified. These included key regulatory enzymes such as SuSy (AA080580, AA080610, AA080634, AA269294) and triose phosphate isomerase (AA577653). Several other metabolic pathways were represented including the citric acid cycle, fatty acid metabolism, anaerobic metabolism, and amino acid biosynthesis. A further 10.8% of ESTs were involved in protein modification and 9.7% in protein synthesis. These included eight different ribosomal proteins, represented by 10 individual clones, and a variety of protein kinases.

Membrane-associated proteins contributed a further 5.9% of the total identified clones. Fewer genes were involved in DNA binding (4.9%), regulation (4.9%), structural proteins (4.3%), RNA modification (3.2%), cell wall metabolism (2.7%), secretory proteins (1.1%), and ATP synthesis and electron transport (1.6%). Only one clone was identified as being stress- or defense-related (disease resistance protein RPM1, AA269290). A small percentage of clones (3.2%) were identified as having sequence similarity to proteins involved in functions that are not known to exist in plants. For example, ESTs were putatively identified for Caenorhabditis elegans retinoic acid receptors (AA577631), Canis familiaris von Willebrand factor (AA525643), human alpha-fetoprotein enhancer-binding protein (AA080653), and several others which cannot be immediately assigned probable functions in plants.

DISCUSSION

The use of an EST approach was found to be a very efficient and successful way of identifying genes in sugarcane. Of all the leaf roll cDNA clones identified by database homology searches, 38% had statistically significant similarities to known gene sequences. This value is comparable with that observed for the analysis of clones from maize endosperm and seedling cDNA libraries (39.3%, Shen et al., 1994) but was slightly greater than results from EST projects using cDNA libraries prepared from maize leaf (20%, Keith et al., 1993), various tissues and growth stages in rice (25%, Yamamoto and Sasaki, 1997), and equal portions of poly(A+) RNA from etiolated seedlings, roots, leaves, and flowering inflorescences of Arabidopsis (32%, Newman et al., 1994). Several reasons have been cited for the apparent differences in results between various EST projects. For example, van de Loo and coworkers have indicated significantly higher values of identification when the tissue used for cDNA library construction was specialized for processes involving well-characterized classes of proteins (van de Loo et al., 1995). In addition, it has been shown that sequencing from the 5' terminus of the mRNA instead of the 3' is more informative, and thus the use of directionally cloned cDNA libraries will result in more significant matches (Shen et al., 1994). In this study, a non-directional cDNA library was prepared, so the relatively high percentage of clone identification is probably related to the use of leaf roll tissue for library construction. The leaf roll is the meristematic region of the plant and is metabolically highly active. It is expected that a high proportion of the genes expressed in the leaf roll are involved in core housekeeping metabolic processes, for which DNA sequence information is available on international databases.
However, it should be noted that a considerable proportion of clones with significant homology to sequences in the database (20%) have been identified on the basis of homology to non-plant genes. It is possible that these gene sequences have not been well characterized in plants. Because of the rapid growth in numbers of partially sequenced or completely sequenced animal and yeast genes, it is likely that there will always be a significant proportion of sugarcane (and other plant) genes identified by homology to non-plant genes. During the course of this study, it was also found that routine resubmission of clones with no sequence similarity usually resulted in several more identifications, simply due to new additions to the databases in the interim. It is likely that with the continual rapid escalation of databank submissions from a whole array of organisms, the rate of genes identified will increase simply based on repeated database searches.

During cDNA library construction, it is assumed that all cDNAs present are equally likely to be cloned. The relative frequency of cDNAs in sugarcane leaf roll tissue would therefore reflect the steady-state levels of the mRNA in the leaf roll. Thus the analysis of cDNA abundance may not only identify fundamental housekeeping genes, but also tissue-specific genes. Because of the small sample size of 250 clones in this study, random sequencing resulted primarily in the identification of genes belonging to the superabundant and abundant classes. To identify rare genes by this approach, it will be necessary either to sequence all the clones in the library or to prepare a normalized library. However, the high cost in both resources and labor required for large-scale sequencing of total cDNA libraries makes it an impractical option for many small laboratories.
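The sampling limitation noted above can be made concrete with a simple binomial sketch. Assuming, as the text does, that every cDNA is equally likely to be cloned, and further assuming that clones are drawn independently from a large library, a transcript making up a fraction f of the library is seen at least once among n clones with probability 1 - (1 - f)^n. The function names and the example frequencies below are illustrative assumptions, not values measured in this study.

```python
import math


def prob_detected(freq, n_clones):
    """Probability that a transcript at library frequency `freq`
    appears at least once among `n_clones` independent random picks."""
    if not 0.0 < freq < 1.0:
        raise ValueError("freq must lie strictly between 0 and 1")
    return 1.0 - (1.0 - freq) ** n_clones


def clones_needed(freq, target_prob):
    """Smallest number of random clones giving at least `target_prob`
    chance of sampling the transcript once or more."""
    if not 0.0 < freq < 1.0 or not 0.0 < target_prob < 1.0:
        raise ValueError("freq and target_prob must lie strictly between 0 and 1")
    return math.ceil(math.log(1.0 - target_prob) / math.log(1.0 - freq))


if __name__ == "__main__":
    sample_size = 250  # number of clones sequenced in this study
    # Hypothetical library frequencies: abundant (1%) versus rare (0.01%).
    for freq in (0.01, 0.0001):
        print(freq, round(prob_detected(freq, sample_size), 3),
              clones_needed(freq, 0.95))
```

Under these assumptions a transcript at 1% of the library is very likely to turn up among 250 clones, whereas one at 0.01% is usually missed, which is consistent with the observation that random sequencing at this scale mainly recovers the abundant classes.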

A variety of studies have shown that the composition of clones identified in cDNA libraries reflects the regulation of gene expression related to differentiation, growth condition, or environmental stress. In a recent review of the Rice Genome Project (Yamamoto and Sasaki, 1997), results were presented from EST identification of clones from a variety of tissues subjected to different growth conditions. This research has indicated, for example, that many ribosomal proteins and histone genes were found in growth-phase callus while genes encoding globulin and seed storage proteins such as glutelin and prolamine were identified in ripening panicles. Similarly, in developing castor endosperm a significant proportion of identified clones showed homology to storage proteins or components of the protein biosynthetic apparatus (van de Loo et al., 1995). In this study, the distribution of identified genes between the various metabolic pathways indicated that, in sugarcane leaf roll, genes involved in protein synthesis, protein modification, and glycolysis were the most abundant (Fig. 1). In addition, there was also a significant proportion of genes coding for structural and cell wall proteins. These results probably reflect the high metabolic rate of the leaf roll. In addition, it was not surprising that only one clone was identified as being stress induced (disease resistance protein, RPM1). Because the leaf roll is protected by several leaf sheaths, it is not normally subject to insect or pathogen attack and will therefore not be adversely affected by environmental stresses except under extreme conditions. Some unexpected genes were also detected. Two clones were identified with homologies to a germin-like protein and a stage III sporulation protein, both involved in processes not considered to occur in sugarcane.
A similar phenomenon has been observed in maize where proteins involved in nodulation and other processes specifically present in legumes were identified (Shen et al., 1994). These authors suggested that genes with specific functions in some species may have been "borrowed" through evolution to form new genes with different functions, or which simply share some common functional domain.

[Figure 1 ILLUSTRATION OMITTED]

During the course of the sequencing of the 250 cDNA clones, it was found that several types of clones were identified more than once. It is acknowledged that, compared with many other EST projects, a sample size of 250 is very small. It is also assumed that during the construction of the cDNA library, the PCR amplification of the cDNA was proportional and thus the library is representative of the mRNA pool. On this basis, it may be inferred that the occurrence of multiple copies of specific genes may be indicative of their relative frequency and reflect possible trends in level of expression in the leaf roll. Ten of the ESTs showed similarity to eight different ribosomal proteins (Table 4). Seven of these were large subunit proteins and one was a small subunit protein; the set also included two chloroplast ribosomal proteins. This result was not unexpected because of the vigorous growth state of the leaf roll. Ribosomal proteins are fundamental proteins for living systems and are thought to play a specific regulatory role during development. Many ribosomal genes have been identified in growth-phase callus of rice (Yamamoto and Sasaki, 1997), so it seems likely that in sugarcane, ribosomal proteins would be specifically involved in differentiation and growth in the meristematic leaf roll region. Of particular interest in sugarcane is the identification of clones homologous to the SuSy gene. Expression of SuSy in the leaf roll was found to be quite high (1.6% of total genes identified) compared with 0.6% expression in rice endosperm (Liu et al., 1995). Although the reaction catalyzed by SuSy is readily reversible, there is evidence that it is primarily involved in the breakdown of sucrose (Kruger, 1990). It has been shown that in actively growing tissues where there is high demand for hexose sugars as respiratory substrates, SuSy activity is high (Kruger, 1990).
The apparent high expression of SuSy in sugarcane leaf roll could therefore be expected to be primarily related to the breakdown of sucrose in order to meet the demand for respiratory metabolites. The homology search results indicate that all the SuSy ESTs might be from the same expressed gene. However, more research is needed to establish whether this is the case. It is interesting to note that the sugarcane cDNA exhibited the highest homologies to the SuSy gene sequences from dicotyledonous species, despite the presence of SuSy gene sequences from other monocotyledonous plants in the database. The reasons for this observation are not immediately apparent. Other clones that were identified more than once could also be related to the active metabolic state of the leaf roll (Table 4). For example, expression of pectin methylesterase is related to cell wall biosynthesis during cell division. Likewise, 3-oxoacyl-(acyl-carrier protein) reductase expression is essential for cell membrane biosynthesis. Further work aimed at analyzing expression profiles of leaf roll cDNA clones using macroarrays is currently in progress. These results will supplement the trends observed from the random sequencing.
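The redundancy frequencies quoted in this paper can be reproduced with a short tally, sketched here under the assumption that each frequency is the number of clones sharing a putative identification divided by the 250 clones sequenced. On that assumption, four SuSy clones correspond to 1.6% and ten ribosomal protein clones to 4%, in line with the values reported. The function name `redundancy_table` and the input list are hypothetical and serve only to illustrate the calculation.

```python
from collections import Counter


def redundancy_table(identifications, total_clones):
    """Return {identity: (clone_count, percent_of_total)} for every
    putative identity assigned to more than one clone."""
    counts = Counter(identifications)
    return {
        name: (n, 100.0 * n / total_clones)
        for name, n in counts.items()
        if n > 1  # singletons are not redundant
    }


if __name__ == "__main__":
    # Illustrative input only: one label per identified clone.
    labels = (["sucrose synthase"] * 4
              + ["ribosomal protein"] * 10
              + ["triose phosphate isomerase"])
    for name, (n, pct) in redundancy_table(labels, total_clones=250).items():
        print(f"{name}: {n} clones, {pct:.1f}% of clones sequenced")
```

Dividing by the total number of clones sequenced, rather than by the number of clones with a database match, is a design choice made to match the 1.6% figure given for SuSy; a different denominator would give different percentages.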

No similar work on the construction of an EST database has yet been reported for sugarcane. This research has indicated that genes may be easily identified in sugarcane and has provided information about the metabolic state of the leaf roll, independent of the complexity of the sugarcane genome. It has also provided a resource of gene sequence information for sugarcane that may be applied to sugarcane biotechnology research. Further work is underway to develop an EST database for mature internodal tissue, the region in the plant where sucrose accumulation occurs.

ACKNOWLEDGMENTS

The authors acknowledge the technical assistance of Avril Harvey. The South African Sugar Association Experiment Station (SASEX) and the Foundation for Research Development (FRD) are gratefully acknowledged for financial support.

REFERENCES

Adams, M.D., M. Dubnick, A.R. Kerlavage, R. Moreno, J.M. Kelley, T.R. Utterback, J.W. Nagle, C. Fields, and J.C. Venter. 1992. Sequence identification of 2375 human brain genes. Nature 355:632-634.

Adams, M.D., J.M. Kelley, J.D. Gocayne, M. Dubnick, M.H. Polymeropoulos, H. Xiao, C.R. Merril, A. Wu, B. Olde, R.F. Moreno, A.R. Kerlavage, W.R. McCombie, and J.C. Venter. 1991. Complementary DNA sequencing: expressed sequence tags and the human genome project. Science 252:1651-1656.

Albert, H.H., J.B. Carr, and P.H. Moore. 1995. Nucleotide sequence of sugarcane polyubiquitin cDNA. Plant Physiol. 109(1):337.

Alix, K., F.C. Baurens, F. Paulet, J.C. Glaszmann, and A. D'Hont. 1998. Isolation and characterization of a satellite DNA family in the Saccharum complex. Genome 41:854-864.

Altschul, S.F., W. Gish, W. Miller, E.W. Myers, and D. Lipman. 1990. Basic local alignment search tool. J. Mol. Biol. 215:403-410.

Bugos, R.C., and M. Thom. 1993a. A cDNA encoding a membrane protein from sugarcane. Plant Physiol. 102:1367.

Bugos, R.C., and M. Thom. 1993b. Glucose transporter cDNAs from sugarcane. Plant Physiol. 103:1469-1470.

da Silva, J., R.J. Honeycutt, W. Burnquist, S.M. Al-Janabi, M.E. Sorrells, S.D. Tanksley, and B.W.S. Sobral. 1995. Saccharum spontaneum L. 'SES 208' genetic linkage map combining RFLP- and PCR-based markers. Mol. Breeding 1:165-179.

Dharmasiri, S., and H.M. Harrington. 1997. Nucleotide sequence of a cDNA clone encoding nucleoside diphosphate kinase from sugarcane. GenBank Accession: U55019 (http://www4.ncbi.nlm.nih.gov/irx/cgi-bin/birx_doc?genbank+4595549; verified June 28, 2000).

Gallo-Meagher, M., and J.E. Irvine. 1996. Herbicide resistant sugarcane plant containing the bar gene. Crop Sci. 36:1367-1374.

Grivet, L., A. D'Hont, D. Roques, P. Feldmann, C. Lanaud, and J.C. Glaszmann. 1996. RFLP mapping in cultivated sugarcane (Saccharum spp.): Genome organisation in a highly polyploid and aneuploid interspecific hybrid. Genetics 142:987-1000.

Grof, C.P.L., D. Glassop, R.C. Bugos, and P.H. Moore. 1995. Sequence of a cDNA encoding cytosolic fructose-1,6-bisphosphatase from sugarcane (Saccharum L.). Plant Physiol. 109:339.

Henrik, A.H., T. Martin, and S.S.M. Sun. 1992. Structure and expression of a sugarcane gene encoding a housekeeping phosphoenolpyruvate carboxylase. Plant Mol. Biol. 20:663-671.

Hofte, H., T. Desprez, J. Amselem, H. Chiapello, M. Caboche, A. Moisan, M.F. Jourjon, J.L. Charpenteau, P. Berthomieu, D. Guerier, J. Giraudat, F. Quigley, F. Thomas, D.Y. Yu, R. Mache, M. Raynal, R. Cooke, F. Grellet, M. Delseny, Y. Parmentier, G. Marcillac, C. Gigot, J. Fleck, G. Philipps, M. Axelos, C. Bardet, D. Tremousaygue, and B. Lescure. 1993. An inventory of 1152 expressed sequence tags obtained by partial sequencing of cDNAs from Arabidopsis thaliana. Plant J. 4:1051-1061.

Holmes, D.S., and M. Quigley. 1981. A rapid boiling method for the preparation of bacterial plasmids. Anal. Biochem. 114:193-197.

Jepson, I., J. Bray, G. Jenkins, W. Schuch, and K. Edwards. 1991. A rapid procedure for the construction of PCR cDNA libraries from small amounts of plant tissue. Plant Mol. Biol. Reporter 9(2):131-138.

Keith, C.S., D.O. Hoang, B.M. Barrett, B. Feigelman, M.C. Nelson, H. Thai, and C. Baysdorfer. 1993. Partial sequence analysis of 130 randomly selected maize cDNA clones. Plant Physiol. 101:329-332.

Kruger, N.J. 1990. Carbohydrate synthesis and degradation, p. 59-76. In D.T. Dennis and D.H. Turpin (ed.) Plant physiology, Biochemistry and molecular biology. Longman Singapore Publishers (Pte) Ltd., Singapore.

Liu, J., C. Hara, M. Umeda, Y. Zhao, T.W. Okita, and H. Uchimiya. 1995. Analysis of randomly isolated cDNAs from developing endosperm of rice (Oryza sativa L.): evaluation of expressed sequence tags, and expression levels of mRNAs. Plant Mol. Biol. 29:685-689.

Lu, Y.H., A. D'Hont, F. Paulet, L. Grivet, M. Arnaud, and J.C. Glaszmann. 1994. Molecular diversity and genome structure in modern sugarcane varieties. Euphytica 78:217-226.

McCombie, W.R., M.D. Adams, J.M. Kelley, M.G. FitzGerald, T.R. Utterback, M. Kahn, M. Dubnick, A.R. Kerlavage, J.C. Venter, and C. Fields. 1992. Caenorhabditis elegans expressed sequence tags identify gene families and potential disease gene homologues. Nature Genet. 1:124-131.

Newman, T., F.J. de Bruijn, P. Green, K. Keegstra, H. Kende, L. McIntosh, J. Ohlrogge, N. Raikhel, S. Somerville, M. Thomashow, E. Retzel, and C. Somerville. 1994. Genes galore: A summary of methods for accessing results from large-scale partial sequencing of anonymous Arabidopsis cDNA clones. Plant Physiol. 106:1241-1255.

Sasaki, T., J. Song, Y. Koga-Ban, E. Matsui, F. Fang, H. Higo, H. Nagasaki, M. Hori, M. Miya, E. Murayama-Kayano, T. Takiguchi, A. Takasuga, T. Niki, K. Ishimaru, H. Ikeda, Y. Yamamoto, Y. Mukai, I. Ohta, N. Miyadera, I. Havukkala, and Y. Minobe. 1994. Towards cataloguing all rice genes: large-scale sequencing of randomly chosen rice cDNAs from a callus cDNA library. Plant J. 6(4):615-624.

Shen, B., N. Carneiro, I. Torres-Jerez, B. Stevenson, T. McCreery, T. Helentjaris, C. Baysdorfer, E. Almira, R.J. Ferl, J.E. Habben, and B. Larkins. 1994. Partial sequencing and mapping of clones from two maize cDNA libraries. Plant Mol. Biol. 26:1085-1101.

Sugiharto, B., H. Sakakibara, and T. Sugiyama. 1997a. Differential expression of two genes for sucrose-phosphate synthase in sugarcane: Molecular cloning of the cDNAs and comparative analysis of gene expression. GenBank Accession: AB001337 (http://www4.ncbi.nlm.nih.gov/irx/cgi-bin/birx_doc?genbank+4505184; verified June 28, 2000).

Sugiharto, B., H. Sakakibara, and T. Sugiyama. 1997b. Differential expression of two genes for sucrose-phosphate synthase in sugarcane: Molecular cloning of the cDNAs and comparative analysis of gene expression. GenBank Accession: AB001338 (http://www4.ncbi.nlm.nih.gov/irx/cgi-bin/birx_doc?genbank+4505185; verified June 28, 2000).

Tang, W., and S.S.M. Sun. 1993. Sequence of a sugarcane ribulose-1,5-bisphosphate carboxylase/oxygenase small subunit gene. Plant Mol. Biol. 21:949-951.

Thompson, W.F., M. Everett, N.O. Polans, R.A. Jorgensen, and J.D. Palmer. 1993. Phytochrome control of RNA levels in developing pea and mung-bean leaves. Planta 158:487-500.

Uchimiya, H., S. Kidou, T. Shimazaki, S. Takamatsu, H. Hashimoto, R. Nishi, S. Aotsuka, Y. Matsubayashi, N. Kidou, M. Umeda, and A. Kato. 1992. Random sequencing of cDNA libraries reveals a variety of expressed genes in cultured cells of rice (Oryza sativa L.). Plant J. 2:1005-1009.

van de Loo, F.J., S. Turner, and C. Somerville. 1995. Expressed sequence tags from developing castor seeds. Plant Physiol. 108:1141-1150.

Waterston, R., C. Martin, M. Craxton, A. Coulson, L. Hillier, R. Durbin, P. Green, R. Shownkeen, N. Halloran, M. Metzstein, T. Hawkins, R. Wilson, M. Berks, Z. Du, K. Thomas, J. Thierry-Mieg, and J. Sulston. 1992. A survey of expressed genes in Caenorhabditis elegans. Nature Genet. 1:114-123.

Yamamoto, K., and T. Sasaki. 1997. Large-scale EST sequencing in rice. Plant Mol. Biol. 35:135-144.

Abbreviations: bp, base pair; EST, expressed sequence tag; NCBI, National Centre for Biotechnology Information; SuSy, sucrose synthase; pfu, plaque forming units; PAM, Point Acceptable Mutation.

Deborah L. Carson and Frederik C. Botha(*)

D.L. Carson (xtecdc@sugar.org.za), Biotechnology Dep., South African Sugar Association Exp. Stn., Private Bag X02, Mount Edgecombe, 4300, South Africa, and Institute for Plant Biotechnology, University of Stellenbosch, Private Bag X1, Matieland, 7602, South Africa; F.C. Botha, Institute for Plant Biotechnology, Univ. of Stellenbosch, Private Bag X1, Matieland, 7602, South Africa. Received 1 Nov. 1999. (*) Corresponding author (FCB@land.sun.ac.za).

Published in Crop Sci. 40:1769-1779 (2000).
COPYRIGHT 2000 Crop Science Society of America
