Identification and characterization of 66 EST-SSR markers in the eastern oyster Crassostrea virginica (Gmelin).
ABSTRACT Large numbers of genetic markers are needed for genomic analyses in the eastern oyster (Crassostrea virginica). We previously identified 53 simple sequence repeat (SSR) markers from an expressed sequence tag (EST) database using a stringent selection standard. We mined the same EST database again using a lower threshold (at least five di-nucleotide and four other repeats) and identified 330 new SSR-containing ESTs. Primers were designed for 201 suitable sequences, and PCR was successful for 137. Screening of the 113 primer pairs that produced fragments shorter than 800 bp yielded 66 polymorphic SSR markers, which were characterized in 30 oysters from three populations and in a full-sib family. The SSRs had an average of 5.4 alleles per locus, ranging from 2 to 12. Thirty-four loci segregated in the family, with seven showing significant deviation from Mendelian ratios after Bonferroni correction. Null alleles were observed at 17 loci. The EST-derived SSRs are part of expressed genes, and most of them should be useful for gene and genome mapping. This study shows that more SSR markers can be developed from ESTs using lower selection standards.
KEY WORDS: expressed sequence tags, simple sequence repeats, linkage mapping, population genetics, oyster, Crassostrea virginica
The eastern oyster (Crassostrea virginica Gmelin, 1791) is an economically important mollusc that has supported major fishery industries in the United States. However, eastern oyster populations and fisheries in much of the Mid-Atlantic region have been devastated by overfishing, habitat destruction, and diseases (MacKenzie 1996). Aquaculture production of the eastern oyster is on the rise and increasingly demands superior stocks. Genetic improvement of oyster stocks can greatly benefit from a better understanding of the oyster genome and of the genes that control economically important traits such as growth and disease resistance. An important step in genomic research is the development of a large set of genetic markers for genetic and quantitative trait loci (QTL) mapping.
For a long time, there were few genetic markers for the eastern oyster. Only recently have DNA-based genetic markers become available. Genetic linkage maps have been constructed for the eastern oyster using primarily amplified fragment length polymorphisms (AFLPs) (Yu & Guo 2003). Although AFLPs are efficient markers and widely used in aquaculture species, they are dominant, less informative, and not readily transferable among populations. AFLP-based genetic maps have limited applications unless codominant markers are added. Codominant markers such as simple sequence repeats (SSRs) and single nucleotide polymorphisms (SNPs) are better suited for genome mapping because they are more informative and easily transferable. Although codominant markers are ideal, they are also more difficult and expensive to develop. At present, there are only about 95 SSRs available in C. virginica, including seven from Brown et al. (2000), four from Yu and Guo (2003, 2006), 31 from Reece's laboratory (Reece et al. 2004, Carlsson et al. 2006, Carlsson & Reece 2007), and 53 from Wang & Guo (2007). Although this number of SSR markers is sufficient for population genetics studies, it is inadequate for genome mapping. Large numbers of SSR markers (hundreds) are needed for genome mapping and population-wide association studies.
SSR markers are typically developed from genomic libraries enriched for SSRs. Recently, expressed sequence tags (ESTs) have been shown to be good sources of SSR markers (Zhan et al. 2005, Wang et al. 2007, Wang & Guo 2007). ESTs are part of expressed genes, and EST-derived SSRs can be considered type I markers and used to map genes of known function. We previously identified and characterized 53 EST-SSRs by screening a database of 9,101 C. virginica ESTs with stringent criteria of at least eight di-nucleotide and five other repeats (Wang & Guo 2007). These 53 EST-SSRs are highly polymorphic and useful for mapping and population studies. In this study, we mined the same EST database again using a lower threshold (at least five di-nucleotide and four other repeats) in an attempt to obtain more SSR markers. Here we report the development of 66 new polymorphic EST-SSR markers and their characterization in selected individuals from three populations and in a full-sib family.
MATERIALS AND METHODS
We downloaded all C. virginica ESTs from GenBank (http://www.ncbi.nlm.nih.gov/dbEST) and screened them for SSRs with the software MISA (MIcroSAtellite, http://pgrc.ipk-gatersleben.de/misa/). The selection threshold for SSRs used in this study was five to seven di-nucleotide repeats and four tri-, tetra-, penta-, or hexa-nucleotide repeats, lower than the respective eight and five repeats used in the previous study (Wang & Guo 2007).
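As a rough illustration of how such thresholds translate into a sequence scan, the following Python sketch finds perfect repeats with regular expressions. It is not MISA and does not reproduce its handling of compound or interrupted SSRs; the function name `find_ssrs`, the `MIN_REPEATS` table, and the example sequence are hypothetical. The sketch applies the lower bounds only (at least five di-nucleotide and four longer repeats) and does not exclude the longer repeats already captured in the previous study.

```python
import re

# Minimum repeat counts per motif length, following the lower thresholds in the
# text: at least five di-nucleotide repeats and four repeats of 3-6 bp motifs.
MIN_REPEATS = {2: 5, 3: 4, 4: 4, 5: 4, 6: 4}


def find_ssrs(seq):
    """Return (start, motif, repeat_count) tuples for perfect SSRs in seq.

    Simplified illustration only: compound and imperfect repeats are ignored,
    and when hits overlap, the one that starts first is kept.
    """
    seq = seq.upper()
    hits = []
    for size, min_rep in sorted(MIN_REPEATS.items()):
        # ([ACGT]{size}) captures one motif; \1{n,} requires n further copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (size, min_rep - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            # Skip motifs that are repeats of a shorter unit (e.g. "TATA", "AAA").
            if any(motif == motif[:k] * (size // k)
                   for k in range(1, size) if size % k == 0):
                continue
            hits.append((m.start(), motif, len(m.group(0)) // size))
    hits.sort()
    kept, last_end = [], -1
    for start, motif, n in hits:
        if start >= last_end:  # drop hits overlapping an earlier one
            kept.append((start, motif, n))
            last_end = start + len(motif) * n
    return kept


# Hypothetical sequence containing (TA)6 and (TTG)4.
example = "GGCC" + "TA" * 6 + "CGCGTTAGC" + "TTG" * 4 + "ACGT"
print(find_ssrs(example))
```

A (TA)4 stretch, for example, would not be reported because it falls below the five-repeat minimum for di-nucleotide motifs.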
Primers were designed for SSR-containing ESTs with good and sufficient flanking sequences using PRIMER 3 (http://frodo.wi.mit.edu/cgi-bin/primer3/primer3_www.cgi), as previously described. An M13 tail (TGTAAAACGACGGCCAGT) was added to the 5' end of each forward primer (Schuelke 2000). PCR for each primer pair was performed in a 10-µL solution including 1X PCR buffer (Promega) with 1.5-2.5 mM MgCl2, 1.0 mg/mL bovine serum albumin, 0.2 mM each dNTP, 0.025 U Taq DNA polymerase, 0.025 µM forward primer, 0.1 µM reverse primer, 0.1 µM of the WellRED dye-labeled M13 primer, and 5-30 ng of oyster genomic DNA. The PCR cycling protocol consisted of an initial denaturation for 5 min at 94°C; 34 cycles of 94°C for 30 s, annealing at the locus-specific temperature (Table 1) for 45 s, and 72°C for 45 s; 18 cycles of 94°C for 30 s, 53°C for 45 s, and 72°C for 45 s; and a final extension at 72°C for 10 min. PCR was conducted on either a GeneAmp 9700 thermocycler (Perkin Elmer, Weiterstadt, CA) or a PTC-200 DNA engine (MJ Research Inc., Watertown, MA).
All primer pairs were initially evaluated for consistent amplification in six oysters from three populations (two each): a wild Delaware Bay population (DB), a hatchery population of Rutgers University (NEH), and a wild population from Louisiana (LA). DNA was extracted from adductor muscle or mantle/gill tissues of each oyster with the E.Z.N.A. mollusc DNA kit (Omega Bio-tek, GA) following the supplied protocols. PCR products were visualized on 2% agarose gels to check whether amplification was successful. PCR conditions were optimized for loci with complex banding patterns or low yields by adjusting annealing temperatures and/or MgCl2 concentrations.
For primer pairs that produced specific and reproducible fragments, polymorphism was further assessed using denaturing polyacrylamide gels (4% to 8% polyacrylamide, AA:BIS = 19:1, with 7 M urea, in 0.5X TBE) as described in Wang & Guo (2007). PCR products were denatured at 95°C for 5 min, loaded onto preheated polyacrylamide gels, and run for 3-5 h at 150 V. PCR fragments were stained with ethidium bromide and visualized under UV illumination.
Polymorphic SSRs were further genotyped and characterized in 30 oysters from the same three populations mentioned above (10 each). PCR was conducted as described above. PCR products (about 0.5-1.0 µL) that were labeled with different WellRED fluorescent dyes were mixed with 30 µL of deionized formamide and 0.4 µL of size standard for electrophoresis on a CEQ 8000 Genetic analyzer (Beckman Coulter). Allele size was determined by the software onboard the genetic analyzer, and genotypes for each oyster were recorded.
To verify Mendelian inheritance, all polymorphic markers were tested in a full-sib family (HB4) consisting of two parents and 100 one-year-old progeny. DNA was extracted with the E.Z.N.A. mollusc DNA kit as described above. All segregating loci were tested for goodness of fit to the expected Mendelian ratios using a chi-square test.
To determine the function of genes associated with the SSR markers, GenBank homology searches were conducted for all EST sequences that contained polymorphic SSRs using BLASTX and BLASTN (http://www.ncbi.nlm.nih.gov/BLAST), at a significance level of e-value < 1.00E-8.
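The tailing step can be sketched in a few lines of Python. This is an illustration only: the helper name `add_m13_tail` is hypothetical, and the example assumes that the first primer listed for RUCV102 in Table 1 is the forward primer, which the table header does not state explicitly.

```python
# M13 tail sequence given in the text (18 nt).
M13_TAIL = "TGTAAAACGACGGCCAGT"


def add_m13_tail(forward_primer):
    """Prepend the M13 tail to the 5' end of a forward primer written 5'-3'."""
    return M13_TAIL + forward_primer.upper()


# First primer listed for RUCV102 in Table 1 (assumed to be the forward primer).
tailed = add_m13_tail("ACCTGATCCTTCCTTTGTGG")
print(tailed, len(tailed))
```

Because the tail is 18 nt long, a product amplified with the tailed primer is 18 bp longer than the amplicon defined by the untailed primer pair.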
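A minimal sketch of such a goodness-of-fit test, using only the Python standard library, is given below. The paper does not say which software was used, so the function names `chi2_sf` and `mendelian_test` are hypothetical; the example counts (58:32 against an expected 1:1 ratio) are those listed for RUCV108 in Table 2.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable with integer df >= 1."""
    half = x / 2.0
    if df % 2:   # odd df: start from the closed form for df = 1
        q, k = math.erfc(math.sqrt(half)), 1
    else:        # even df: start from the closed form for df = 2
        q, k = math.exp(-half), 2
    while k < df:
        # Recurrence: Q(x; k+2) = Q(x; k) + (x/2)^(k/2) * exp(-x/2) / Gamma(k/2 + 1)
        q += half ** (k / 2.0) * math.exp(-half) / math.gamma(k / 2.0 + 1.0)
        k += 2
    return q


def mendelian_test(observed, ratio):
    """Chi-square goodness of fit of genotype counts to an expected ratio."""
    n = sum(observed)
    expected = [n * r / sum(ratio) for r in ratio]
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return chi2, chi2_sf(chi2, len(observed) - 1)


chi2, p = mendelian_test([58, 32], [1, 1])
print(round(chi2, 3), round(p, 4))
```

The degrees of freedom are taken as the number of genotype classes minus one, so a 1:1 test has one degree of freedom and a 1:1:1:1 test has three.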
RESULTS
Using the threshold mentioned above, the screening of 9,101 ESTs identified 330 SSR-containing sequences and 398 SSRs. Fifty-two EST sequences had more than one SSR, and 44 ESTs contained compound SSR motifs. Among the 398 SSRs, 201 (50.5%) were di-nucleotide, 164 (41.2%) were trinucleotide, 28 (7.0%) were tetra-nucleotide, one was penta-nucleotide, and four were hexa-nucleotide repeats. Di- and trinucleotide repeats in combination accounted for 91.7% of all SSRs. Among di-nucleotide repeats, the motif AG/CT (60.7%) was most common, whereas CG/CG was least abundant (only one SSR). Among trinucleotide repeats, ACT/AGT (27.4%) and AAG/CTT (24.4%) were the most common motifs, whereas AAAC/GTTT was the most frequent tetra-nucleotide motif.
Primers were designed for 201 SSR-containing ESTs that had good and sufficient flanking sequences. PCR amplification was successful for 137 (68.2%) primer pairs after optimization, and the remaining 64 (31.8%) failed to amplify under various annealing temperatures and/or MgCl2 concentrations. Among the 137 amplified primer pairs, 93 (67.9%) produced PCR products of the expected size, and 44 (32.1%) produced products longer than expected, probably because of the presence of introns. Because the size standard used for the genetic analyzer can only detect PCR products shorter than 800 bp, 24 primer pairs that produced PCR fragments longer than 800 bp were excluded from further characterization. Subsequently, the 113 primer pairs that produced fragments shorter than 800 bp were screened for polymorphism in the six oysters with polyacrylamide gel electrophoresis, producing 66 (58.4%) polymorphic loci. The remaining 47 (41.6%) loci were monomorphic and may not be true SSRs. The primer sequences and PCR conditions for the 66 polymorphic SSRs are listed in Table 1.
Polymorphism of the 66 SSRs was characterized in 30 oysters from three populations as mentioned above. Two SSRs amplified more than two fragments in some individuals, probably because of nonspecific amplification. The remaining 64 SSRs showed no more than two alleles per individual (Table 1). Null genotypes were observed at nine loci in 4-15 of the 30 oysters (13.3-50%). In the 30 oysters genotyped, the SSRs had an average of 5.4 alleles per locus, ranging from 2 to 12 (Table 1).
All 66 polymorphic SSRs were tested for Mendelian segregation in a full-sib family (HB4) with 100 progeny, although 32 loci were monomorphic in this family (31 loci had the AA x AA genotype and one marker was BB x AA). The remaining 34 loci were polymorphic and segregated in HB4 (Table 2). Null alleles were observed at 10 loci (29.4%). For some loci, the two alleles showed different amplification efficiency. At RUCV241 (AB x AC), for example, the C allele amplified visibly less strongly than the A and B alleles; it may be considered a partial null allele and requires careful attention in scoring. Ten (29.4%) loci showed significant (P < 0.05) deviation from Mendelian ratios, and seven (20.6%) remained significant after Bonferroni correction (Table 2). Among the 34 segregating loci, four (RUCV156, RUCV230, RUCV227, and RUCV270) amplified more than two fragments in some or all oysters. These loci could still be scored for mapping purposes, as some of the fragments were fixed, leaving only one locus segregating. For example, in RUCV227 (AB x CC), the extra D allele at 639 bp showed up in both parents and all the progeny, and the A, B, and C alleles showed Mendelian inheritance. Likewise, locus RUCV156 had two extra alleles that showed up in both parents and all progeny and could be ignored. RUCV270 amplified three extra alleles that showed no variation in all oysters, and the three alleles (243, 395 and 421 bp) differed by multiples of 26 bp, possibly because of a tandem mini-satellite. The extra fragment at RUCV230 was only present in some individuals. Further studies are needed to verify whether the extra alleles represent duplicated loci or are caused by nonspecific amplification.
GenBank BLAST searches found that 26 of the 66 SSR-containing ESTs (39.4%) had significant (e-value < 1.00E-08) homology to known genes or predicted proteins from other organisms (Table 3). The 26 genes included ribosomal proteins, ribosomal RNA methyltransferase, beta-actin, erg gene, cytoplasmic actin, actin binding protein, heat shock protein, aspartate racemase, and receptor tyrosine kinase. Twelve of these genes were segregating in HB4 and could potentially be mapped as Type I markers (Table 2).
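The percentages in this paragraph follow directly from the reported counts. The short Python sketch below recomputes them as a consistency check; the variable names and the helper `pct` are illustrative, and all counts are taken from the text.

```python
# Counts reported in the text for each step of marker development.
funnel = [
    ("SSR-containing ESTs", 330),
    ("primer pairs designed", 201),
    ("amplified after optimization", 137),
    ("products shorter than 800 bp", 113),
    ("polymorphic loci", 66),
]


def pct(part, whole):
    """Percentage rounded to one decimal place."""
    return round(100.0 * part / whole, 1)


# Yield of each step relative to the step before it.
for (_, prev), (step, count) in zip(funnel, funnel[1:]):
    print(f"{step}: {count} ({pct(count, prev)}% of previous step)")

# Split of the 137 amplified primer pairs by product size.
expected_size, longer_than_expected, over_800_bp = 93, 44, 24
assert expected_size + longer_than_expected == 137
assert 137 - over_800_bp == 113
print(pct(expected_size, 137), pct(longer_than_expected, 137))
```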
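The summary statistics can be recomputed from the allele numbers listed in Table 1. The sketch below does this in Python; the list simply transcribes the "Allele No." column in table order, and the variable names are illustrative.

```python
# Allele numbers for the 66 loci, in the order listed in Table 1
# (RUCV102 first, RUCV297 last).
alleles = [
    5, 4, 3, 3, 7, 6, 3, 7, 5, 8,
    4, 3, 9, 2, 11, 3, 6, 3, 6, 9,
    5, 5, 10, 4, 8, 3, 4, 6, 2, 2,
    4, 3, 5, 11, 9, 5, 3, 2, 5, 8,
    2, 2, 5, 4, 8, 7, 6, 2, 3, 6,
    2, 4, 7, 3, 6, 5, 11, 4, 12, 6,
    8, 6, 5, 7, 4, 12,
]

mean_alleles = sum(alleles) / len(alleles)
print(len(alleles), round(mean_alleles, 1), min(alleles), max(alleles))
```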
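The Bonferroni step can be illustrated with the P-values reported in Table 2. The sketch below assumes that the correction was applied across the 34 segregating loci (one test per locus); the paper does not state the number of tests, so `N_TESTS` is an assumption, although it reproduces the seven loci marked in the table.

```python
# P-values from Table 2 for the 10 loci with P < 0.05
# (values printed as 0.0000 in the table are entered as 0.0).
p_values = {
    "RUCV108": 0.0061, "RUCV114": 0.0004, "RUCV152": 0.0000,
    "RUCV156": 0.0008, "RUCV165": 0.0491, "RUCV183": 0.0144,
    "RUCV220": 0.0007, "RUCV270": 0.0000, "RUCV284": 0.0000,
    "RUCV297": 0.0002,
}

ALPHA = 0.05
N_TESTS = 34  # assumption: one test per segregating locus

threshold = ALPHA / N_TESTS
still_significant = sorted(
    locus for locus, p in p_values.items() if p < threshold
)
print(round(threshold, 5), len(still_significant), still_significant)
```

Under this assumption the per-test threshold is about 0.0015, so RUCV108, RUCV165, and RUCV183 are significant at P < 0.05 but not after correction.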
DISCUSSION
We previously developed 53 SSR markers from the ESTs of the eastern oyster (Wang & Guo 2007). The present study was designed to determine whether additional SSR markers can be developed from the same EST database by lowering the SSR selection threshold. Determining the lowest threshold for detecting SSRs is important for bioinformatic mining of SSR markers. By lowering the threshold from at least eight di-nucleotide and five other types of repeats to five and four, respectively, this study identified an additional 330 SSR-containing sequences and developed 66 new polymorphic SSRs from the same database of 9,101 ESTs. Most studies have used a threshold of at least six di-nucleotide and five other nucleotide repeats (Gupta et al. 2003, Thiel et al. 2003, Barrett et al. 2004, Gao et al. 2004, Nicot et al. 2004, Perez et al. 2005), and some have used five di-nucleotide and four other repeats (Yu et al. 2004, Han et al. 2006). Our results suggest that the lower threshold of at least five di-nucleotide and four other types of repeats works in the eastern oyster.
Using the lower threshold, this and the previous study in combination identified 537 SSRs in 456 ESTs (5.0% of all ESTs) totaling 5.2 million base pairs. Our results suggest that there is about one SSR in every 9.7 kb of expressed sequence in the eastern oyster, which is similar to what has been reported for wheat (9.2 kb, Gupta et al. 2003) and barley (6.3 kb, Thiel et al. 2003). Such a comparison should be viewed with caution, as slightly different criteria are used in different studies.
Lowering the SSR threshold had a clear effect on the success rate of SSR discovery. In our previous study, 53 polymorphic SSRs were developed by screening 66 primer pairs, corresponding to a success rate of 80% (Wang & Guo 2007). In this study, the success rate was 58% (66 polymorphic SSRs from 113 primer pairs). Further, the level of polymorphism of the SSRs developed in this study was also reduced. In the same set of 30 oysters, the 53 SSRs from the previous study had 9.3 alleles per locus, whereas the 66 SSRs in this study had 5.4 alleles per locus. Many of the SSRs identified in this study, especially those with 2-3 alleles, may be indels rather than true SSRs. Clearly, we are approaching the limit of discovering more SSRs from the same EST database.
In this study, 68% of the primer pairs amplified successfully, a rate similar to the 67% reported by Reece et al. (2004) for genomic SSRs in the same species, and higher than the 47% reported for genomic SSRs in the Pacific oyster C. gigas (Thunberg) (Li et al. 2003). Failure of PCR amplification can be caused by many factors, including primer design, sequence quality, and polymorphism at the priming site. For EST-SSRs, failure can also be caused by the presence of introns. The presence of introns was suggested by the larger-than-expected PCR fragments observed at 32% of the loci. This is a conservative estimate, as some introns might be too large to be amplified. It seems that in the eastern oyster at least 32% of EST amplicons between 100 and 300 bp contain introns, and the upper limit should be about 54%, as only 93 of the 201 primer pairs produced products of the expected size (also see Wang & Guo 2007).
Polymorphism at the priming site may explain the null alleles observed in this study. Because of exceptionally high levels of polymorphism, null alleles are common in oysters (Huvet et al. 2000, McGoldrick et al. 2000, Launey et al. 2002, Li et al. 2003, Hedgecock et al. 2004, Reece et al. 2004). The finding of null alleles at 17 (25.8%) of 66 loci (nine in populations and 10 in the family, with two overlapping) is not unusual. Hedgecock et al. (2004) observed null alleles at 49 of 96 loci (51%) in three families of C. gigas. The presence of null alleles may complicate population genetics studies, as null alleles cause deviation from Hardy-Weinberg equilibrium. We did not test for fit to Hardy-Weinberg equilibrium in this study because the 30 oysters did not come from one population. We used 30 oysters from three diverse populations to estimate allele diversity across a wide geographic range.
As expected, most polymorphic SSR loci developed here segregated in Mendelian ratios. The proportion of distorted loci observed in this study (29% before and 21% after Bonferroni correction) was slightly higher than that reported in the same species for EST-SSRs (21% before and 7% after Bonferroni correction) (Wang & Guo 2007) and for genomic SSRs (29% before and 11% after correction) (Reece et al. 2004), but similar to what has been reported in C. gigas (Launey & Hedgecock 2001, McGoldrick et al. 2000) and the flat oyster Ostrea edulis (Naciri et al. 1995). Segregation distortion is common in oysters and is probably caused by the high levels of polymorphism or recessive lethal genes.
As part of or immediately adjacent to expressed genes, EST-derived SSRs are particularly useful in mapping and interrogating functional genes. Unfortunately, most oyster ESTs do not have homology to known genes at this time; in this study, only 26 of the 66 SSR-containing ESTs could be matched to known genes. More of these genes may be identified over time as more genes from molluscs are annotated. Some of the genes, such as heat shock proteins, may be involved in host response to stress and diseases. Markers derived from these genes should be useful for gene and comparative mapping, and for associating their variation with phenotypes.
In conclusion, 66 new SSRs were successfully developed from a database of eastern oyster ESTs with a low selection threshold of at least five di-nucleotide and four other nucleotide repeats. Although some of the SSRs have low levels of polymorphism, most of them are moderately polymorphic and segregate in Mendelian ratios. They should be useful for genome mapping and population genetics studies. Extremely high levels of polymorphism for SSR markers may not be advantageous for some applications. Population genetics analysis with highly variable SSRs may require a large sample size to cover a large number of rare alleles. Moderately variable SSRs with few or no null alleles may be more appropriate for population genetics studies. This study brings the total number of available SSR markers to 161. Still, hundreds more SSRs and other types of genetic markers are needed for genome mapping in the eastern oyster. The finding that more SSRs can be developed from EST databases using a low selection threshold should encourage similar efforts in this and other species.
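The density estimate is simple arithmetic on the reported totals, as the following Python sketch shows. The variable names are illustrative, and the 5.2 million bp figure is the rounded total given in the text, so the result is approximate.

```python
total_bp = 5.2e6    # total EST sequence length reported in the text (rounded)
n_ssrs = 537        # SSRs identified in this and the previous study combined
n_ssr_ests = 456    # ESTs containing at least one SSR
n_ests = 9101       # ESTs screened

kb_per_ssr = total_bp / n_ssrs / 1000.0
pct_ests_with_ssr = 100.0 * n_ssr_ests / n_ests
print(round(kb_per_ssr, 1), round(pct_ests_with_ssr, 1))
```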
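The two bounds on intron frequency can be reproduced from the counts in the Results, as sketched below in Python. The lower bound counts only amplified products that were longer than expected; the upper bound makes the additional assumption, implicit in the text, that every primer pair without an expected-size product (including all 64 failures) was affected by an intron. Variable names are illustrative.

```python
designed = 201        # primer pairs designed
amplified = 137       # primer pairs that amplified
expected_size = 93    # amplified pairs with products of the expected size
longer = 44           # amplified pairs with products longer than expected

# Lower bound: introns inferred only from longer-than-expected products.
lower_pct = 100.0 * longer / amplified

# Upper bound: assume every pair lacking an expected-size product has an intron.
upper_pct = 100.0 * (designed - expected_size) / designed

print(round(lower_pct, 1), round(upper_pct, 1))
```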
This study is supported by grants from NOAA Sea Grant Oyster Disease Research Program (NJMSC-6742-0001) and New Jersey Marine Science Consortium (NJMSC-6840-0005). This is publication No. 2009-3 of IMCS and NJSG-09-713.
Barrett, B., A. Griffiths, M. Schreiber, N. Ellison, C. Mercer, J. Bouton, B. Ong, J. Forster, T. Sawbridge, G. Spangenberg, G. Bryan & D. Woodfield. 2004. A microsatellite map of white clover. Theor. Appl. Genet. 109:596-608.
Brown, B. L., D. E. Franklin, P. M. Gaffney, M. Hong, D. Dendanto & I. Kornfield. 2000. Characterization of microsatellite loci in the eastern oyster, Crassostrea virginica. Mol. Ecol. 9:2217-2219.
Carlsson, J., C. L. Morrison & K. S. Reece. 2006. Wild and aquaculture populations of the eastern oyster compared using microsatellites. J. Hered. 97:595-598.
Carlsson, J. & K. S. Reece. 2007. Eight PCR primers to amplify EST-linked microsatellites in the Eastern oyster, Crassostrea virginica genome. Mol. Ecol. Notes 7:257-259.
Gao, L. F., R. L. Jing, N. X. Huo, Y. Li, X. P. Li, R. H. Zhou, X. P. Chang, J. F. Tang, Z. Y. Ma & J. Z. Jia. 2004. One hundred and one new microsatellite loci derived from ESTs (EST-SSRs) in bread wheat. Theor. Appl. Genet. 108:1392-1400.
Gupta, P. K., S. Rustgi, S. Sharma, R. Singh, N. Kumar & H. S. Balyan. 2003. Transferable EST-SSR markers for the study of polymorphism and genetic diversity in bread wheat. Mol. Genet. Genomics 270:315-323.
Han, Z. G., C. B. Wang, X. L. Song, W. Z. Guo, J. Y. Gou, C. H. Li, X. Y. Chen & T. Z. Zhang. 2006. Characteristics, development and mapping of Gossypium hirsutum derived EST-SSRs in allotetraploid cotton. Theor. Appl. Genet. 112:430-439.
Hedgecock, D., G. Li, S. Hubert, K. Bucklin & V. Ribes. 2004. Widespread null alleles and poor cross-species amplification of microsatellite DNA loci cloned from the Pacific oyster, Crassostrea gigas. J. Shellfish Res. 23:379-385.
Huvet, A., P. Boudry, M. Ohresser, C. Delsert & F. Bonhomme. 2000. Variable microsatellites in the Pacific oyster Crassostrea gigas and other cupped oyster species. Anim. Genet. 31:71-72.
Launey, S. & D. Hedgecock. 2001. High genetic load in the Pacific oyster Crassostrea gigas. Genetics 159:255-265.
Launey, S., C. Ledu, P. Boudry, F. Bonhomme & Y. Naciri-Graven. 2002. Geographic structure in the flat oyster Ostrea edulis L. as revealed by microsatellite polymorphism. J. Hered. 93:331-338.
Li, G., S. Hubert, K. Bucklin, V. Ribes & D. Hedgecock. 2003. Characterization of 79 microsatellite DNA markers in the Pacific oyster Crassostrea gigas. Mol. Ecol. Notes 3:228-232.
MacKenzie, C. L., Jr. 1996. History of oystering in the United States and Canada, featuring the eight greatest oyster estuaries. Mar. Fish. Rev. 58:1-78.
McGoldrick, D., D. Hedgecock, L. J. English, P. Baoprasertkul & R. D. Ward. 2000. The transmission of microsatellite alleles in Australian and North American stocks of the Pacific oyster (Crassostrea gigas): selection and null alleles. J. Shellfish Res. 19:779-788.
Naciri, Y., Y. Vigouroux, J. Dallas, E. Desmarais, C. Delsert & F. Bonhomme. 1995. Identification and inheritance of (GA/TC)n and (AC/GT)n repeats in the European flat oyster Ostrea edulis (L.). Mol. Mar. Biol. Biotechnol. 4:83-89.
Nicot, N., V. Chiquet, B. Gandon, L. Amilhat, F. Legeai, P. Leroy, M. Bernard & P. Sourdille. 2004. Study of simple sequence repeat (SSR) markers from wheat expressed sequence tags (ESTs). Theor. Appl. Genet. 109:800-805.
Perez, F., J. Ortiz, M. Zhinaula, C. Gonzabay, J. Calderon & F. A. M. J. Volckaert. 2005. Development of EST-SSR markers by data mining in three species of shrimp: Litopenaeus vannamei, Litopenaeus stylirostris, and Trachypenaeus birdy. Mar. Biotechnol. 7:554-569.
Reece, K. S., W. L. Ribeiro, P. M. Gaffney, R. B. Carnegie & S. K. Allen. 2004. Microsatellite marker development and analysis in the eastern oyster (Crassostrea virginica): Confirmation of null alleles and non-Mendelian segregation ratios. J. Hered. 95:346-352.
Schuelke, M. 2000. An economic method for the fluorescent labeling of PCR fragments. Nat. Biotechnol. 18:233-234.
Thiel, T., W. Michalek, R. K. Varshney & A. Graner. 2003. Exploiting EST databases for the development and characterization of gene-derived SSR-markers in barley (Hordeum vulgare L.). Theor. Appl. Genet. 106:411-422.
Wang, D., X. Liao, L. Cheng, X. Yu & J. Tong. 2007. Development of novel EST-SSR markers in common carp by data mining from public EST sequences. Aquaculture 271:558-574.
Wang, Y. & X. Guo. 2007. Development and characterization of EST-SSR markers in the eastern oyster Crassostrea virginica. Mar. Biotechnol. 9:500-511.
Yu, J. K., T. M. Dake, S. Singh, D. Benscher, W. L. Li, B. Gill & M. E. Sorrells. 2004. Development and mapping of EST-derived simple sequence repeat markers for hexaploid wheat. Genome 47:805-818.
Yu, Z. & X. Guo. 2003. Genetic linkage map of the eastern oyster Crassostrea virginica Gmelin. Biol. Bull. 204:327-338.
Yu, Z. & X. Guo. 2006. Identification and mapping of disease-resistance QTLs in the eastern oyster, Crassostrea virginica Gmelin. Aquaculture 254:160-170.
Zhan, A. B., Z. M. Bao, X. L. Wang & J. J. Hu. 2005. Microsatellite markers derived from bay scallop Argopecten irradians expressed sequence tags. Fish. Sci. 71:1341-1346.
YONGPING WANG, (1) YAOHUA SHI (1,2) AND XIMING GUO (1) *
(1) Haskin Shellfish Research Laboratory, Institute of Marine and Coastal Sciences, Rutgers University, 6959 Miller Avenue, Port Norris, New Jersey 08349; (2) Ocean College, Hainan University, Key Laboratory of Tropic Biological Resources, MOE, Hainan Key Laboratory of Tropical Hydrobiological Technology, 58 Renmin Road, Haikou, Hainan 570228, People's Republic of China
* Corresponding author. E-mail: firstname.lastname@example.org
TABLE 1. Locus name, sequence identity, motif type, primer sequences, PCR conditions and allele number (in 30 oysters) of 66 EST-SSRs developed in Crassostrea virginica. Locus GenBank ID Motif Primer Sequence (5'-3') RUCV102 51877088 (TTG)4 ACCTGATCCTTCCTTTGTGG GCTCTTGATAGGAGTCATTTTCC RUCV108 51876414 (ACA)4 CATCAATGCAGCAGAAAAGC CACTGCCTACACCGACAACC RUCV109 51876365 (CGA)4 TCAGTTCCTCGCCTCCTACC TCGTGTGCTTGTGATGATCC RUCV112 51568977 (CAACCA)4 CAGCCTCCAGGAAGTAGTGC CGGTAGTAGTTGGGGCTTCC RUCV113 51568962 (ATC)4 AAAAACAACGCTGGAAAAGC GCTGGATTTTGTCCAAGAGG RUCV114 51568799 (ATTG)4 GTGAGAAGGGATTGGAGTGC ATGAAATAATGGCGATACGG RUCV116 51568721 (GAT)4(GAC) AGGGACGTCTCGTCTTACGG CTCTCTCTGCCTTTGCATCC RUCV119 51568681 (GAG)4 AGTCCGACGAATCAATCACC CTTCCTGCTGCAGACCTAGC RUCV121 51568504 (GCT)4 CCTTAGTGGCAACAACAATCC AGAGACCAAATCCCATGTCG RUCV122 51568396 (AGA)4 AAACATGGCAGAAGTTGACG TGGACCTGCTTCTTTCTTGG RUCV125 51567984 (ACA)4 GGACGGTTCTACTCTGTGTTTCC ACTGATGACCCACAGCAAGC RUCV126 51567665 (CAG)4 TTGTCTGTGAAGTCCGTTGG TCGTCGGGTAGATAGTTGTCG RUCV131 51567118 (GCA)4 CTCTGGAGACAAATCCATGC CATTTCTCTGTGCTGATGACG RUCV133 51566676 (AAAC)4 CATGAATGGCTAACTGAAAAGG CCTTGTAGAGGATGCTTCACC RUCV135 31908515 (GGA)4 CTGTCTGAGTCCCCAGAAGC TCTTGCACAGCTTTTCTTCC RUCV138 31907361 (AT)6 AAGCATTCCAACCTCTGTCC TGGCATCCAAACAACATACC RUCV148 31906049 (CAAT)4 CGGATGGGACGTTAAATGG CATTGGTCACATCTCACACG RUCV150 31905846 (TA)6 GTGGGGGTCATTCTATGTGG AGGTTGACGTTCCCTCTGC RUCV152 31905701 (GATT)4 AAATGTGAGTCACGGTCAGG GCTTTTCGAGGGAGTGAGG RUCV156 31905336 (TA)6 CATACGGTATCCTCTTATTTCAGC TTTTTCTTGCATTCGCTAGG RUCV159 31904931 (TAT)4 GGGCACATTGAAGTGTTGG GAGGGGGAGAAATAGTGAAGG RUCV164 31904689 (TC)6 GGAAGAGTGTTTTGAATTGACG ATATGTGATCCCCACACAAGG RUCV165 31904655 (GA)6 GCCAAAGAAGCTCAAAAAGG GCTATGACGCATCCTTCTCC RUCV168 31904205 (ATT)4 GTGGTTCAGCTTTTATCTGTCC TTACACGAATGTGGTGATGC RUCV172 31903867 (GAG)4 CAACGCTATGAAGGGACAGG ATCTTGCACAGCTCCTTTGG RUCV173 31903680 (CAA)4 GGAAGGGTGACCTAATGTGG GGACAGAATGTAAGCGTCTGC RUCV176 31903249 (AAG)4 
GGACTGTGAGTGGGAAGTGG CAGTGGTTTTGGGCTTTCC RUCV183 31900834 (GA)6 GTGTGAAGTCAGGCTGTATGG GCACCACAAATACTGCATACC RUCV185 31900617 (CAT)4 AGCGTGGCTACTCTTTCACC CACCTGAATCGCTCATTTCC RUCV186 31900433 (AGC)4 GAAAAACGCAAAGAGGAAGG TGTTCTTTCTCGGGGTTAGC RUCV190 14581219 (AGA)4 TTTGCTTCAAAAGTGGTTGG TCATTCAGCATGTTAGATGTGG RUCV191 14581217 (TTA)4 GGGACTAGGTCGAAAAGACC CACGAAGAACATCGCTTCC RUCV195 14581037 (ACAG)4 GACAAGACGTAGCCATCAACC AAGGTCCAACCATCCAACC RUCV197 14580980 (TA)6 GTGACTGTACAAAGGCTGTGC GGAAAATGAGCCATCTACTGC RUCV199 14580606 (CGGA)4 GACATGGCCAATCATCTCC TACCCCTTTATGTCCGTTCG RUCV204 14581280 (AC)5 CTATGCTCGGCACTTCAGG AAAATGTAGGTTGGGTGTTCG RUCV206 31901037 (AG)5 GGTGTGAAAAACATGCAACG GGCAGTAACAGCATTGAGTCC RUCV210 31903448 (AT)5 ATAATTCAGGGATGGGTTGG GAATACAAGGCAAAATGGAAGC RUCV212 31904211 (GT)5 AAAACCTACCCCTGGTTTCC TTAGTTGTCATTCGCGTTGC RUCV216 31904635 (TA)5 GGACATCCGGGTCCTATACC GGAGACACATGTATGCAAAGAGC RUCV218 31904852 (CA)5 TCTACCCACCCTGAGTCACC GATGGATCTCCTGGCCTAGC RUCV220 31905003 (AT)5 ACAGGAGAATGCAGGAATGG GCGAGTTTTGACTTTACAATGC RUCV221 31905130 (AT)5 CGAGATCGAAGGACAAAAGC GTGTCACAAGGAATAAAGATCACC RUCV226 31905913 (GA)5 AAGCTAAAGCGTGTGTGTGC GCTTTCTCCAGGTTTTCTCG RUCV227 31906040 (GA)5 CTATGCCACCACCACAGAGG CCTTCTCCTCCTCCAGTTCC RUCV228 31906092 (TC)5 TCTCATGTTGGATGGAATGC TACCGAACGCCACTATCAGG RUCV230 31907050 (CAT)4 GGACTTTGAGCAGGAAATGG GAAAGATGGCTGGAACATGG RUCV235 31908461 (CT)5 CCAAACACGAGGAGTCTAACC GAGCCTCGATACACAACACC RUCV237 31908791 (CT)5 GTGGGAGACAGAGGGAAGC AACATAGCAGTGGGCAGTCC RUCV241 51172880 (AT)5 TGCAGCAAATTCAAAACAGC CAAGGGGAGAGAATTTTATTGG RUCV243 51567484 (AG)5 TTCTGGGTTGTTTTTGTGAGG AACACTTGGCTCCAGCTACC RUCV246 51567784 (AT)5 CCCAACAGACATTGGACTGC GAGATGACAGTGAGCCTTTCG RUCV253 51876088 (TA)5 GGGTCCATGTTCTCTGACG TCACTGCTACATGGTAACAAGC RUCV256 51877114 (CA)5 CAGGGGAAAACTTGTCATGC CCCCTCCATTTGTCGTAGG RUCV263 51568075 (TTG)4 GTAGTAAGCTCCAGGGGAAGGA CACGGAGATCTCACATGCAA RUCV265 31908056 (TA)5 ATCACCGATGGAAACAGTCC CCTAAATGTACAGATACAGCAGAAG RUCV270 31902968 (TTTA)4 
GGACCAAATATTCCACATCACAC AAGCTGAATGCCCAAACATC RUCV272 31900597 (TA)6 AGCAATTCTGTGCTGATTCAAG TGAAAAATCCATGAGCGATG RUCV274 14581106 (TTA)4 CCAACAACAAAACGTGGAAAC AGGGAGGGATGTTTATCGTG RUCV277 14580875 (AC)5 GGCTGAGTTCAAATTCATGTTC GTTGGTTTGGTGCCATCTG RUCV279 14581306 (AT)5 GTCATTTTGGCCCTAATCTTACAC AACAACACGAGGCGGTTATAC RUCV280 14581359 (AT)5 GTGCGCACTTGATTTAGC GGGTTTTACGCCGTATTGTC RUCV282 31903779 (AT)5 AATGCATTAGCGTCTGAAG CAAAGCATCCTTGGGTGAAG RUCV284 31904125 (TA)5 ATTTCTTTCCGCAAGCAGTG CAGATTCAACCGAGATTGTGAG RUCV287 31904707 (GTG)4 TCCAATGACGACCTTTAGAATG GGAAATGGGTGAGTTTTTGC RUCV297 51567947 (TC)5 CATAAACCGGTGGAATACCC TTCCTCTAACTTGGCCGTTC Mg[CI.sub.2] Tm Expected Observed Allele Locus (MM) ([degrees]C) Size Size No. RUCV102 1.5 60 224 243-250 5 RUCV108 1.5 60 153 318-337 4 RUCV109 1.5 60 213 225-231 3 RUCV112 1.5 60 175 168-192 3 RUCV113 1.5 60 116 345-364 7 RUCV114 1.5 60 234 230-258 6 RUCV116 1.5 60 121 137-143 3 RUCV119 1.5 56 255 483-553 7 RUCV121 2.5 56 296 628-646 5 RUCV122 1.5 56 229 621-695 8 RUCV125 2.0 60 261 277-283 4 RUCV126 2.0 60 240 255-260 3 RUCV131 2.0 60 266 468-502 9 RUCV133 1.5 60 122 135-139 2 RUCV135 2.0 55 196 465-543 11 RUCV138 1.5 56 281 304-311 3 RUCV148 1.5 60 247 216-270 6 RUCV150 1.5 60 278 288-302 3 RUCV152 1.5 60 280 298-310 6 RUCV156 1.5 60 299 314-332 9 RUCV159 1.5 60 258 275-283 5 RUCV164 1.5 60 232 253-260 5 RUCV165 2.0 60 252 355-433 10 RUCV168 2.0 55 277 282-298 4 RUCV172 2.0 55 256 497-522 8 RUCV173 2.0 60 230 231-249 3 RUCV176 2.0 55 287 626-636 4 RUCV183 1.5 56 136 144-165 6 RUCV185 2.0 55 187 206-209 2 RUCV186 2.0 55 261 263-278 2 RUCV190 2.0 55 292 310-322 4 RUCV191 2.0 55 286 301 3 RUCV195 1.5 60 267 408-435 5 RUCV197 2.5 60 177 183-232 11 RUCV199 1.5 60 266 275-294 9 RUCV204 2.0 60 199 190-226 5 RUCV206 2.0 60 272 287-292 3 RUCV210 1.5 60 298 320-328 2 RUCV212 1.5 60 297 310-320 5 RUCV216 1.5 60 233 204-260 8 RUCV218 1.5 60 205 223-225 2 RUCV220 1.5 60 168 176-188 2 RUCV221 1.5 60 126 131-146 5 RUCV226 1.5 60 293 283-298 4 
RUCV227 1.5 60 278 527-660 8 RUCV228 1.5 60 206 476-507 7 RUCV230 2.0 55 136 152-167 6 RUCV235 1.5 60 135 142-154 2 RUCV237 1.5 60 279 299-303 3 RUCV241 1.5 60 226 245-264 6 RUCV243 1.5 60 217 235-237 2 RUCV246 1.5 60 185 196-205 4 RUCV253 1.5 60 224 231-263 7 RUCV256 1.5 60 195 419-452 3 RUCV263 1.5 60 225 227-252 6 RUCV265 1.5 60 347 365-377 5 RUCV270 1.5 60 576 522-601 11 RUCV272 1.5 60 154 168-179 4 RUCV274 2.0 55 235 227-232 12 RUCV277 1.5 60 354 374-381 6 RUCV279 1.5 60 395 412-432 8 RUCV280 1.5 60 413 428-450 6 RUCV282 1.5 60 433 408-453 5 RUCV284 1.5 60 519 504-547 7 RUCV287 2.0 55 408 413-431 4 RUCV297 1.5 60 355 359-382 12 Locus Note RUCV102 RUCV108 RUCV109 RUCV112 RUCV113 15/30 nulls RUCV114 RUCV116 RUCV119 RUCV121 RUCV122 RUCV125 RUCV126 5/30 nulls RUCV131 RUCV133 4/30 nulls RUCV135 RUCV138 RUCV148 RUCV150 RUCV152 8/30 nulls RUCV156 RUCV159 RUCV164 RUCV165 RUCV168 RUCV172 RUCV173 RUCV176 RUCV183 RUCV185 RUCV186 RUCV190 RUCV191 EF * RUCV195 RUCV197 RUCV199 RUCV204 RUCV206 RUCV210 15/30 nulls RUCV212 RUCV216 EF * RUCV218 RUCV220 RUCV221 RUCV226 RUCV227 13/30 nulls RUCV228 RUCV230 RUCV235 RUCV237 10/30 nulls RUCV241 RUCV243 RUCV246 RUCV253 RUCV256 RUCV263 RUCV265 5/30 nulls RUCV270 RUCV272 RUCV274 RUCV277 RUCV279 RUCV280 8/30 nulls RUCV282 RUCV284 RUCV287 RUCV297 * EF, more than two fragments in some individuals TABLE 2. Segregation of 34 C. virginica EST-SSRs in a full-sib family tested against Mendelian ratios. The presence of null-alleles (O) was deduced based on parental and progeny genotypes. 
Locus    Mother  Father  Progeny (N)  Progeny genotype  Expected ratio  Observed ratio  P-value
RUCV108  AA      BO      90           AB:AO             1:1             58:32           0.0061
RUCV109  AA      AB      100          AA:AB             1:1             51:49           0.8415
RUCV113  BO      AC      99           AB:AO:BC:CO       1:1:1:1         24:21:30:24     0.6309
RUCV114  BB      AB      97           AB:BB             1:1             31:66           0.0004 *
RUCV116  AB      AA      95           AB:AA             1:1             54:41           0.1823
RUCV119  CC      AB      81           AC:BC             1:1             40:41           0.9115
RUCV121  CC      AB      79           AC:BC             1:1             39:40           0.9104
RUCV126  BC      AB      98           AB:AC:BC:CC       1:1:1:1         28:23:26:19     0.4854
RUCV131  AC      BB      99           AB:BC             1:1             59:40           0.0562
RUCV135  AB      OO      94           AO:BO             1:1             42:52           0.3023
RUCV150  BB      AB      100          AB:BB             1:1             49:51           0.8415
RUCV152  AO      AO      100          AA+AO:OO          3:1             57:43           0.0000 *
RUCV156  BB      AB      97           AB:BB             1:1             65:32           0.0008 *
RUCV159  AA      BC      95           AB:AC             1:1             40:55           0.1238
RUCV164  AA      AB      94           AA:AB             1:1             38:56           0.0634
RUCV165  AC      AB      89           AA:AB:AC:BC       1:1:1:1         27:29:12:21     0.0491
RUCV168  AB      BB      97           AB:BB             1:1             48:49           0.9191
RUCV183  AC      BC      100          AB:AC:BC:CC       1:1:1:1         21:21:39:19     0.0144
RUCV190  AB      BO      97           AB:AO:BB+BO       1:1:2           24:24:49        0.4016
RUCV197  BD      AC      99           AB:BC:AD:CD       1:1:1:1         22:22:32:23     0.4139
RUCV210  AO      OO      97           AO:OO             1:1             47:50           0.7607
RUCV216  AB      BB      99           AB:BB             1:1             52:47           0.6153
RUCV218  AA      AB      98           AA:AB             1:1             55:43           0.2254
RUCV220  AB      BB      100          AB:BB             1:1             33:67           0.0007 *
RUCV227  AB      CC      92           AC:BC             1:1             45:47           0.8348
RUCV230  AB      BB      96           AB:BB             1:1             51:45           0.5403
RUCV241  AB      AC      97           AA:AB:AC:BC       1:1:1:1         23:19:29:26     0.5207
RUCV246  BB      AO      100          AB:BO             1:1             51:49           0.8415
RUCV270  AO      AB      100          AA+AO:AB:BO       2:1:1           70:28:2         0.0000 *
RUCV274  AB      CO      94           AC:BC:AO:BO       1:1:1:1         18:31:19:26     0.1864
RUCV277  BB      AB      94           AB:BB             1:1             50:44           0.5360
RUCV279  BB      AC      100          AB:BC             1:1             51:49           0.8415
RUCV284  OO      AO      100          AO:OO             1:1             25:75           0.0000 *
RUCV297  CD      AB      95           AC:AD:BC:BD       1:1:1:1         35:21:22:7      0.0002 *

* Designates significant deviation from expected Mendelian ratios after Bonferroni correction.

TABLE 3.
SSR-containing ESTs of C. virginica with significant homology to known genes from other organisms.

Locus    Sequence ID          E-value    Gene function
RUCV102  gb|AA134580.1|       7.00E-15   SLC6A9 protein (Bos taurus)
RUCV113  ref|NP_001086845.1|  3.00E-45   MGC83353 protein (Xenopus laevis)
RUCV116  ref|NP_001075868.1|  3.00E-73   elongation factor 1 beta (Oryctolagus cuniculus)
RUCV119  ref|XP_001200338.1|  3.00E-48   PREDICTED: similar to Pes1-prov protein (Strongylocentrotus purpuratus)
RUCV121  gb|AAV84269.1|       2.00E-19   ribosomal protein P2-like (Culicoides sonorensis)
RUCV122  gb|ABZ04266.1|       4.00E-70   ribosomal protein rps15 (Lineus viridis)
RUCV131  gb|EDL30000.1|       4.00E-27   transmembrane protein 57 (Mus musculus)
RUCV135  ref|XP_001622481.1|  2.00E-11   predicted protein (Nematostella vectensis)
RUCV150  dbj|BAE78960.1|      2.00E-30   aspartate racemase (Scapharca broughtonii)
RUCV172  ref|XP_001630508.1|  1.00E-18   predicted protein (Nematostella vectensis)
RUCV173  ref|XP_683651.2|     4.00E-29   PREDICTED: hypothetical protein (Danio rerio)
RUCV176  ref|NP_990272.1|     3.00E-36   chromodomain helicase DNA binding protein 1 (Gallus gallus)
RUCV185  gb|ABW97741.1|       1.00E-140  beta-actin (Crassostrea ariakensis)
RUCV186  ref|XP_795301.1|     2.00E-18   PREDICTED: hypothetical protein (Strongylocentrotus purpuratus)
RUCV210  ref|XP_001607815.1|  3.00E-47   PREDICTED: similar to vacuolar proton atpases isoform 3 (Nasonia vitripennis)
RUCV216  ref|XP_001079310.1|  2.00E-63   erg gene (erg_E) (Xenopus laevis)
RUCV220  ref|XP_780065.1|     1.00E-12   PREDICTED: similar to receptor tyrosine kinase (Strongylocentrotus purpuratus)
RUCV221  BC003832.1           1.00E-15   Mus musculus S100 calcium binding protein A6 (calcyclin), mRNA
RUCV226  ref|XP_785663.2|     2.00E-61   PREDICTED: similar to zinc finger protein (Strongylocentrotus purpuratus)
RUCV227  ref|XP_001199159.1|  8.00E-58   PREDICTED: similar to KIAA0445 protein, partial (Strongylocentrotus purpuratus)
RUCV228  ref|NP_001009557.1|  5.00E-81   solute carrier family 6 (neurotransmitter transporter, glycine), member 5 (Danio rerio)
RUCV230  dbj|BAE80701.1|      4.00E-154  cytoplasmic actin (Pinctada fucata)
RUCV235  gb|ABS57447.1|       4.00E-10   heat shock protein hsp21.4 (Heliconius erato)
RUCV243  ref|XP_001658956.1|  2.00E-33   ribosomal RNA methyltransferase (Aedes aegypti)
RUCV256  ref|XP_001192377.1|  1.00E-104  PREDICTED: similar to dihydropyrimidinase, partial (Strongylocentrotus purpuratus)
RUCV263  ref|NP_001038592.1|  6.00E-13   hypothetical protein LOC567109 (Danio rerio)
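
The P-values in Table 2 can be checked with a chi-square goodness-of-fit test of observed progeny counts against the expected Mendelian ratio. The sketch below is an illustration under assumptions, not the authors' stated procedure: the test type and the Bonferroni threshold of 0.05/34 are inferred from the table (they reproduce, for example, the tabulated values for RUCV108, RUCV113 and RUCV114), and the function names `chi2_sf` and `mendelian_test` are hypothetical.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable (closed form for integer df)."""
    half = x / 2.0
    if df % 2 == 0:
        # Even df: finite Poisson-type sum.
        terms = sum(half ** i / math.factorial(i) for i in range(df // 2))
        return math.exp(-half) * terms
    # Odd df: erfc term plus a finite sum with half-integer gamma values.
    tail = sum(half ** (i - 0.5) / math.gamma(i + 0.5)
               for i in range(1, (df - 1) // 2 + 1))
    return math.erfc(math.sqrt(half)) + math.exp(-half) * tail

def mendelian_test(observed, ratio):
    """Return (chi-square statistic, P-value) for counts tested against a ratio."""
    n = sum(observed)
    total = sum(ratio)
    expected = [n * r / total for r in ratio]
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return chi2, chi2_sf(chi2, len(observed) - 1)

# Assumed Bonferroni threshold: nominal alpha divided by the 34 loci tested.
ALPHA = 0.05
N_LOCI = 34
threshold = ALPHA / N_LOCI

# Observed counts and expected ratios copied from Table 2.
examples = [
    ("RUCV108", [58, 32], [1, 1]),
    ("RUCV113", [24, 21, 30, 24], [1, 1, 1, 1]),
    ("RUCV114", [31, 66], [1, 1]),
]

for locus, observed, ratio in examples:
    chi2, p = mendelian_test(observed, ratio)
    print(f"{locus}: chi2={chi2:.3f} P={p:.4f} significant={p < threshold}")
```

With a per-test threshold of about 0.0015, RUCV114 (P = 0.0004) is flagged while RUCV108 (P = 0.0061) is not, which agrees with the asterisks in Table 2.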