A leucine aminopeptidase gene of the Pacific oyster Crassostrea gigas exhibits an unusually high level of sequence variation, predicted to affect structure, and hence activity, of the enzyme.

ABSTRACT Leucine aminopeptidase (LAP) belongs to a family of ubiquitous peptidases, with roles in growth and development, stress responses and adaptation to changing environmental conditions. The LAP gene was sequenced from a commercially important marine bivalve: the Pacific oyster Crassostrea gigas, and sequence polymorphisms were identified. This study identified 21 single nucleotide polymorphisms (SNPs), which would alter the encoded amino acid sequence, one 3 base deletion, one single base deletion, which would create a truncated protein (predicted to be nonfunctional), and a further 50 silent SNPs. The 23 polymorphisms altering protein sequence were found to occur in 34 different combinations, which we designated as 34 alleles: many more than the 6 alleles predicted previously by allozyme analysis. Predictions of protein structure and stability were used to identify which of the changes in the protein sequence are most likely to affect enzyme function. The sequence of the LAP gene noncoding regions was also analyzed and simple sequence repeats (microsatellites) were found in introns 1 and 9. The intron 1 microsatellite region was also found in another species of oyster: Crassostrea hongkongensis. We have demonstrated that the LAP gene region of the Pacific oyster, C. gigas is highly variable, and have identified new, potentially useful, genetic markers.

KEY WORDS: pacific oyster, Crassostrea gigas, leucine aminopeptidase, single nucleotide polymorphisms, simple sequence repeats (microsatellites), protein structure, genetic markers


Pacific Oyster

The Pacific cupped oyster, Crassostrea gigas (Thunberg 1793), is native to Japan, but having been widely introduced to other areas it is now a major aquaculture species worldwide, including in Europe and North America, where it has a cultivation range from Baja California to South East Alaska. For example, in Washington state, United States, where it was first introduced in the early 1900s, it is currently the most valuable shellfish resource.

Single Nucleotide Polymorphisms (SNPs)

SNPs have been the focus of much attention in many species, because they are abundant and well suited to automated, large-scale genotyping and are increasingly becoming the marker of choice for genetic analysis. They are used in studying genome evolution and in selective breeding programs in agriculture.

Various approaches have been used for the discovery of novel SNPs. Identifying SNPs from DNA sequence data usually requires experimental verification to exclude errors that may occur during cloning and sequencing procedures. Researchers studying maize developed a computer-based method for identifying SNPs from 102,551 expressed sequence tags (ESTs) (Batley et al. 2003). Unfortunately, EST programs produce a relatively high error rate. However, Batley et al. (2003) removed false SNPs from their data by recording only those that occurred in 2 or more sequences. The validity of this method was determined by direct sequencing of 264 SNPs, with over 90% of candidate SNPs being confirmed.

Crassostrea gigas exhibits a high level of genetic polymorphism. PCR-based studies determining the level of sequence polymorphism in primer binding sites suggested a conservative estimate of one SNP every 82 base pairs (Hedgecock et al. 2004). More recent estimates, based on studies of ESTs, indicated the level of genetic polymorphism in C. gigas to be one SNP every 40 bp. This compares with one SNP per 1,000 bp (approx.) in the domestic mouse (Lindblad-Toh et al. 2000) and one SNP per 1-2 kb in humans (Clifford et al. 2000, Deutsch et al. 2001). The high SNP frequency estimated for C. gigas has led researchers to conclude that this species has one of the highest rates of protein polymorphism in any animal (Buroker et al. 1979).

A large number of deleterious recessive mutations (i.e., high genetic load) has also been demonstrated in C. gigas, with 15-20 lethal mutations per oyster, which is about 5 times the number found in humans or fruit flies (Launey & Hedgecock 2001). This may be because of an increased chance of mutation, associated with the large number of cell divisions necessary to produce millions or billions of gametes (10^6 to 10^8 eggs per female per season). This is consistent with the Elm-Oyster evolutionary model of G. C. Williams (1975) and with other experimental evidence for a high mutational load. Marine bivalves seem to have similar evolutionary biology to highly fecund plants because of intense natural selection on their early life stages. SNPs are also found at high levels in plants. In maize (Zea mays) a SNP is believed to occur every 48 bp in untranslated regions and every 130 bp in coding regions (Tenaillon et al. 2001, Rafalski 2002).

Leucine Aminopeptidases

The present study focused on SNPs within a gene encoding leucine aminopeptidase (LAP). LAPs are members of the M17 family of peptidases and have been widely studied. They are ubiquitous, being found in animals, plants, and prokaryotes. For a review, see Strater & Lipscomb (2004). The enzyme is an exopeptidase, which completes the hydrolysis of polypeptide fragments produced by intracellular proteinases.

The best characterized LAPs are those from Escherichia coli (LAP also known as XerB, PepA, and CarP) and Bos taurus (bovine: LAP active as a homohexamer), and X-ray crystal structures have been determined for both (Burley et al. 1992, Strater et al. 1999a). LAP is believed to be involved in a wide range of processes, including stress response and osmoregulation in plants (Chao et al. 1999). There is also medical interest in LAP, because altered activity can be associated with conditions including cataracts (Taylor et al. 1982) and cancers (Gupta et al. 1989, Scott et al. 1986).

In marine bivalves, LAP variation is believed to be of selective importance through a role in adaptation to the changes in salinity, which occur as a result of tides, and varying freshwater input (Koehn et al. 1980, Moore et al. 1980, Koehn & Hilbish 1987). However, the major role of LAP is in protein turnover: the continuous breakdown, replacement and/or renewal of proteins. Protein turnover has essential roles in processes such as metabolism, development (such as the body changes from a larval to adult form), adaptation and repair (Hawkins, 1991, Ciechanover & Schwartz 1994). An individual's level of protein turnover can vary greatly, depending on stage of development, environmental factors, and genotype. Elevated energy costs associated with higher rates of protein turnover have been proposed to lead to reduced growth efficiency (Hawkins & Day 1996). Thus LAP variation is likely to be responsible for some of the variation observed in the rates of growth of different marine bivalve individuals (Mallet & Haley 1983, Mason et al. 1998).

Earlier studies looking at LAP variations between different individuals of bivalves used allozyme electrophoresis. Allozyme studies provide an insight into genetic diversity by assessing changes in physical properties of the protein (Fujino & Nagaya 1977, McGoldrick & Hedgecock 1997, Day et al. 2000). Although providing useful information, allozyme data have certain limitations. Some genetic changes might not alter the mass or charge of the protein enough to produce a band shift, and hence would be undetectable. Also, because allozyme electrophoresis requires an active enzyme to produce a band, genetic variations causing nonfunctional enzymes (null mutations) would not be observed (Michinina & Rebordinos 1997).

The study described here aimed to assess the level of genetic diversity in a C. gigas LAP gene. Two LAP genes have been identified in C. gigas. This study refers to the one defined by accession number AF288678 (sequence submitted by J.-M. Escoubas), sometimes also referred to as LAP2. We were most interested in those variations affecting the amino acid sequence of the encoded protein, with the longer-term goals of finding a marker for altered LAP activity, and of ultimately seeing how peptidase activity influences growth rate. We present data indicating an unusually high level of genetic variation within this important gene, and make predictions about the effect of some of these variations on protein activity. We also show that simple sequence repeat (microsatellite) regions exist within the LAP gene of two different species of Crassostrea. The genetic markers we have identified will be useful in future studies. It is possible that some of the markers could be linked to beneficial traits, and hence could be used for future genetic enhancement, for example of growth performance, for use in aquaculture.


C. gigas Samples

Crassostrea gigas individuals were obtained from stocks originating from the following locations: Pendrell Sound and Seymour Inlet (Canada), "Kumamoto" (Japan), Guernsey Sea Farms, Limosa Farms (River Yealm), and River Lynher (all UK). Following arrival at the laboratory, animals were acclimated (7 days minimum in recirculated seawater, 15°C, without food other than phytoplankton occurring naturally in sea water). Individual oysters were removed from the tank, shucked, and the tissue was immediately ground in liquid nitrogen and stored at -80°C.

Amplification, Cloning, and DNA Sequence Analysis of LAP Coding Regions

Standard molecular techniques were used. RNA was extracted from approximately 100-160 µg ground, frozen tissue using an RNeasy MIDI kit (Qiagen, according to the manufacturer's instructions), homogenizing tissue thoroughly with a 5 ml syringe and G18 needle prior to cell lysis. RNA was eluted in a total of 250 µl RNAse-free H2O and stored at -20°C.

The LAP gene (accession number AF288678) was amplified by reverse transcriptase (RT)-PCR using primers PLPF1 (forward) and PLPR10 (reverse) (see Table 1) and a Qiagen One Step RT-PCR Kit, used according to the manufacturer's instructions. cDNA synthesis (50°C, 30 min) and denaturation (95°C, 15 min) were followed by amplification: 35 cycles of 95°C (1 min), 52°C (1 min) and 72°C (1.5 min), with a final extension step of 72°C (10 min), followed by cooling to 4°C. RT-PCR products were analyzed by agarose gel electrophoresis. Desired bands were excised from the gels and the DNA fragments cleaned using a Wizard SV gel and PCR cleanup kit (Promega), according to the manufacturer's instructions. DNA was eluted in nuclease-free H2O and stored at -20°C.

Resulting RT-PCR products were cloned using a pGEMTeasy kit (Promega) and sequenced using standard techniques (Big Dye Terminator Sequencing Kit v.3.1 (Applied Biosystems), according to manufacturer's instructions for sequencing double stranded DNA).

Each clone was sequenced with multiple primers to ensure good coverage, with overlap, on each strand to eliminate sequencing errors. M13 primers (M13 uni (-21), forward and M13 reverse) were used, together with specific primers along the oyster LAP gene: PLPF2, PLPF3, PLPF8, PLPF9, PLPR1, PLPR3 (Table 1).

Analysis of the LAP Gene Sequence from Multiple Individuals of Pacific Oyster Crassostrea gigas to Identify Single Nucleotide Polymorphisms (SNPs) Causing Amino Acid Substitutions

The LAP gene was sequenced from 24 C. gigas individuals, derived from across the locations listed above. One-hundred and forty clones were sequenced in total; the number of individuals and clones varying between locations.

Sequence Alignments

Clustal-W (EMBL-European Bioinformatics Institute) (Thompson et al. 1994) analyses were performed using the Bioedit Sequence Alignment Editor v. (Hall 1999) and later refined manually. Analysis was performed for the DNA sequences and the translated amino acid sequences. Sequences have been submitted to Genbank (accession numbers FJ347720-FJ347754).

Designation of SNPs and Assessment of SNP Linkages

SNPs causing amino acid substitutions were of greatest interest to this study; therefore, the translated sequences were used primarily. Amino acid substitutions were only deemed to be real if the same substitution was observed in at least 3 clones, which were derived from more than one RT-PCR reaction. The entire sequence of each clone was analyzed in Bioedit, and patterns of SNP linkages were then identified. To avoid errors, the same criteria were used while looking at the DNA sequences to identify SNPs that would not alter the amino acid sequence.
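
The acceptance rule described above can be sketched in a few lines of Python. This is only an illustration and not the procedure actually run for this study, which was carried out by inspecting alignments in Bioedit. The record layout (clone identifier, RT-PCR reaction identifier, substitution label), the function name, and the example data are all assumptions invented for the sketch.

```python
from collections import defaultdict

def accepted_substitutions(observations, min_clones=3, min_reactions=2):
    """Keep substitutions seen in >= min_clones clones from >= min_reactions reactions.

    observations: iterable of (clone_id, reaction_id, substitution) tuples.
    This layout is hypothetical and chosen only for the example.
    """
    clones = defaultdict(set)      # substitution -> clones carrying it
    reactions = defaultdict(set)   # substitution -> RT-PCR reactions it came from
    for clone_id, reaction_id, substitution in observations:
        clones[substitution].add(clone_id)
        reactions[substitution].add(reaction_id)
    return sorted(
        s for s in clones
        if len(clones[s]) >= min_clones and len(reactions[s]) >= min_reactions
    )

# Invented example records (not data from this study)
observations = [
    ("c1", "rt1", "120 V/E"),
    ("c2", "rt1", "120 V/E"),
    ("c3", "rt2", "120 V/E"),  # 3 clones from 2 reactions: accepted
    ("c4", "rt1", "55 A/G"),
    ("c5", "rt1", "55 A/G"),
    ("c6", "rt1", "55 A/G"),   # 3 clones but only 1 reaction: rejected
    ("c7", "rt2", "300 D/N"),  # a single clone: rejected
]
print(accepted_substitutions(observations))
```

Requiring more than one independent reaction is what separates a true variant from an error introduced early in a single RT-PCR, which could otherwise be propagated into several clones.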

Structural Predictions

Initial predictions of the importance of each SNP were made using the Cn3D4.1 sequence/structure alignment viewer, downloaded from the National Center for Biotechnology Information (NCBI) web site (, aligning the C. gigas amino acid sequence into the crystal structures of bovine lens LAP (accession number 1LAM, GI:1311324) and aminopeptidase A (PepA) from E. coli (accession number 1GYT, GI:21730291) so that the positions of residues altered by C. gigas SNPs could be identified.

Detailed predictions on the effect of each SNP were performed as a service by the Exeter Biocatalysis Center (University of Exeter, EX4 4QD, UK). The structures of each of the allelic variants of the oyster LAP were modeled independently using the bovine LAP as a template. In addition, the wild type allele was also modeled using the E. coli PepA as a template. Modeling was carried out using the Molecular Operating Environment (MOE) software with CHARMM22 as the forcefield. Ten models for each protein sequence were built, with the "best" intermediate model subjected to further refinement. The final model for each was further refined to correct stereochemical problems.

Sequencing Microsatellite Repeat Regions within C. gigas LAP Gene Introns

Genomic DNA was extracted from C. gigas Limosa Farms individuals (River Yealm, Devon, UK) using a Qiagen kit, used according to the manufacturer's instructions, and was amplified by PCR using primers PLPF1 & PLPR5 (intron 1) and PLPF8 & PLPR10 (intron 9) (see Table 1).

Amplification was performed using a mixture of high fidelity (Pfu) polymerase to minimize the incorporation of sequence errors and standard Taq polymerase to facilitate cloning by the addition of an A tail to the products. PCR reactions, set up in duplicate, comprised 5.0 µL 10x Pfu buffer (Promega), 1.0 µL dNTP mix (10 mM), 1.0 µL each of forward and reverse primers (10 mM), 1.5 µL DNA template, 1.0 µL Pfu polymerase (3 u/µL, Promega), 0.33 µL Taq polymerase (5 u/µL, Promega), made up to 50 µL final volume with H2O. Cycling parameters consisted of a denaturation step at 95°C (1 min), followed by 35 cycles of 95°C (30 sec), 55°C (30 sec), 74°C (4 min). After a final extension step at 72°C (5 min), samples were cooled to 4°C. Purified PCR products were cloned using a pGEMTeasy kit (Promega) and sequenced by standard techniques.

Amplification of the LAP Gene of Crassostrea hongkongensis

Total DNA from Crassostrea hongkongensis (Lam & Morton 2003) was a kind gift from Katherine Lam of the Hoi Ha Wan Marine Life Center, City University of Hong Kong. Individuals of C. hongkongensis were taken from a wild population originating from Lau Fau Shan, Hong Kong. This population grew on a rocky shore approximately 1.5 km from any oyster farms.

LAP amplification was performed using DNA from a single individual of C. hongkongensis. PCR amplification was carried out using various combinations of PCR primers designed for the C. gigas LAP gene sequence (not listed). Primers PLPF1 and PLPF2 comp were selected to amplify and clone a region of the C. hongkongensis LAP gene for DNA sequence analysis. The PCR product was cloned using a pGEMTeasy kit (Promega). Eight clones were sequenced using M13 forward and reverse primers. All procedures were performed using standard molecular techniques.


Polymorphisms in the C. gigas LAP Coding Sequence

The LAP gene was sequenced from multiple individuals of the Pacific oyster, C. gigas, to identify single nucleotide polymorphisms (SNPs). Twenty-one SNPs were identified that affect the amino acid sequence, together with a 3-nucleotide deletion affecting amino acid residues 133 and 134 (replacing K and A with a T residue) and a 1-base deletion creating a premature stop codon, which would produce a truncated (presumably inactive) protein, as summarized in Figure 1. The polymorphisms altering the amino acid sequence occurred in 34 different combinations, which we describe as 34 alleles, as summarized in Table 2. We have listed alleles 20 and 21 separately, even though they differ from each other only after the premature stop codon; both would encode the same truncated protein, meaning that we have actually identified 33 different protein sequences. This compares with only 6 alleles predicted from allozyme studies (Smith et al. 1986). This difference is presumably because many of the altered proteins would not show differential electrophoretic mobility. In addition to having limitations in resolving power, allozyme studies require a protein to be active for visualization. It is possible that some variants (including, but not limited to, the truncated protein) are nonfunctional, and hence would not be detectable as allozymes.
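
The distinction between the number of alleles and the number of distinct protein sequences can be illustrated with a short Python sketch. It is a simplified model under stated assumptions, not the analysis used here: each allele is represented as a set of (residue, change) pairs, a truncating change is treated as hiding every change downstream of it, and the three example alleles are invented combinations rather than entries from Table 2.

```python
def encoded_protein(variants, truncating):
    """Reduce an allele to the changes that would actually be translated.

    variants: set of (residue_number, change) pairs (hypothetical layout).
    truncating: set of pairs that introduce a premature stop codon.
    """
    stops = [pos for pos, change in variants if (pos, change) in truncating]
    if not stops:
        return frozenset(variants)
    first_stop = min(stops)
    # Changes after the first stop codon never reach the protein.
    return frozenset((p, c) for p, c in variants if p <= first_stop)

def count_alleles(alleles):
    """Number of distinct combinations of protein-altering changes."""
    return len({frozenset(a) for a in alleles})

def count_proteins(alleles, truncating):
    """Number of distinct protein sequences those alleles would encode."""
    return len({encoded_protein(a, truncating) for a in alleles})

# Invented example: two alleles differ only downstream of the stop at residue 216.
truncating = {(216, "L/*")}
alleles = [
    {(120, "V/E")},
    {(216, "L/*")},
    {(216, "L/*"), (322, "A/V")},
]
print(count_alleles(alleles), count_proteins(alleles, truncating))
```

In this toy case three alleles collapse to two proteins, in the same way that alleles 20 and 21 collapse to one truncated protein in the data described above.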

A further 50 "silent" SNPs (not affecting amino acid sequence) were identified within the 1593 bp coding sequence, as shown in Figure 1. The total number of LAP alleles would therefore be far greater than 34. Although the amino acid sequence of many of the encoded proteins would be identical, it appears that synonymous SNPs may not be "silent" as previously supposed. In addition to possible effects on gene expression, the presence of a rare codon has been shown to alter in vivo folding, and consequently protein function (Kimchi-Sarfaty et al. 2007). Taken together, the total number of SNPs identified suggests an overall frequency of one SNP every 22 bp within the coding region of the C. gigas LAP gene: a high frequency even for C. gigas genes (Sauvage et al. (2007) recently reported a range of one SNP every 20-100 bases in expressed sequences).
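
The figure of one SNP every 22 bp can be reproduced from the counts reported in this paper. The short Python calculation below assumes that the two deletions are counted together with the SNPs, which matches the total of 73 quoted in the concluding summary; the variable names are ours.

```python
# Counts taken from the text; the grouping into one total is an assumption.
nonsynonymous_snps = 21    # SNPs altering the amino acid sequence
three_base_deletions = 1   # deletion affecting residues 133 and 134
single_base_deletions = 1  # deletion creating a premature stop codon
silent_snps = 50           # SNPs not affecting the amino acid sequence
coding_length_bp = 1593

total_polymorphisms = (
    nonsynonymous_snps + three_base_deletions + single_base_deletions + silent_snps
)
bp_per_polymorphism = coding_length_bp / total_polymorphisms

print(total_polymorphisms)            # 73
print(round(bp_per_polymorphism, 1))  # 21.8
```

Rounded to the nearest base, 21.8 bp gives the quoted frequency of one SNP every 22 bp.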


We are reasonably confident that the sequence variations identified do represent true SNPs. It has previously been shown for expressed sequence tags (ESTs) that by only focusing on those polymorphisms observed in two or more of the aligned sequences, the majority of sequencing errors will be removed (Batley et al. 2003). Our decision to designate as SNPs only those variations seen in clones derived from at least two independent RT-PCR reactions effectively removed the likelihood of counting RT-PCR errors as SNPs. However, it is possible that some true genetic variation may have been ignored, making the actual SNP frequency even higher.

LAP Structural Predictions and the Importance of Sequence Polymorphisms

All the amino acid substitutions were considered, both separately and in the combinations represented by the different alleles, to predict those nonsynonymous polymorphisms most likely to have functional effects. Functional effects were considered possible if substitutions were predicted to affect the active site, access to the active site via solvent channels, sites of interaction between monomers involved in forming the active hexamer and/or overall protein stability. Assessment of each mutation was made by analysis of its position in the hexamer structure. Combinations of the different amino acid substitutions as encoded by the allelic variants appeared unlikely to have a compounding effect.

Analyses of the effects of changes at amino acid residues 10 and 17 could not be made, as these positions could not be modeled using either of the template protein structures. However, the N-terminal region of the protein is far from the active site, the active site entrance channel and the interface between subunits. These amino acid substitutions therefore appear unlikely to affect activity.

Only those substitutions predicted most likely to have an effect will be discussed further. In the following text, they are defined by the number of the amino acid residue, followed by the alternative residues identified.

216 L/* is predicted to produce a severely truncated, hence nonfunctional, protein.

SNPs Predicted to Affect Protein-Protein Interactions

Several substitutions were identified, which could affect the interaction of the LAP hexamer with other proteins, if such interactions occur. These SNPs are: 30 S/F, 198 K/E, 278 K/E, 281 Q/H, and 528 K/N.

SNPs Predicted to Affect Catalysis

Only six of the amino acid substitutions appeared likely to affect catalysis directly: four through changes to the entrance of the active site channel (Fig. 2A), which may alter substrate binding (120 V/E, 174 E/K, 180 A/T, and 133 ΔKA/T) and two through disruption of active site coordination (322 A/V, 477 L/P) (Fig. 2D). Each is discussed below.

120 V/E

This residue is positioned on the surface of the protein at the entrance to the active site channel. The change from valine to glutamate results in the loss of a hydrophobic residue and the gain of a negative charge, which could affect the entrance of substrate peptide to the active site.


133 ΔKA/T

These residues are located within a helix (Fig. 2B), and the deletion of a residue here is likely to cause some local changes in the helix structure. Loss of the lysine residue results in the loss of a positive charge. Because these residues are in a positively charged patch at the entrance to the active site channel, this change may affect entrance of substrate to the active site channel.

174 E/K

This residue is within the entrance channel to the active site (Fig. 2C). Substitution from glutamate to lysine changes the charge from negative to positive, which could significantly affect access of substrate to the active site. However, comparison with other bovine LAP structures indicates the peptide substrate may bind on the opposite side of the channel, meaning the substitution might have less effect.

180 A/T

This lies within the entrance cavity to the active site channel. Substitution may affect substrate entry, as threonine has a larger side chain than alanine.

322 A/V

The change from alanine to valine could cause unfavorable hydrophobic interactions between helices, resulting in some movement of a helix (residues 316-331), the N-terminus of which is involved in zinc binding in the active site. However, the valine residue is at a distance of ~17 Å from the zinc, which may be too far to affect the structure of the active site.

477 L/P

In the LAP structure, residue 477 is close to the arginine residue (Fig. 2D), which is implicated in activity (Strater et al. 1999b). The adjacent residue, phenylalanine 476, is important in limiting the flexibility of the conserved arginine, and it is possible that the 477 L/P change may perturb this arrangement slightly. Of all the amino acid substitutions identified, 477 L/P is the closest to the catalytic site and possibly more likely to affect the catalytic mechanism of the LAP enzyme.
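
Several of the arguments above rest on a change in side-chain charge (for example 120 V/E and 174 E/K). The Python sketch below makes that bookkeeping explicit. It is a rough approximation for illustration only: it assigns formal charges of -1 to D and E and +1 to K and R, treats every other residue (including histidine) as neutral, and ignores the local environment, so it is no substitute for the structural modeling described here. The function name and lookup table are ours.

```python
# Formal side-chain charges near neutral pH (simplifying assumption:
# histidine and all other residues are treated as uncharged).
SIDE_CHAIN_CHARGE = {"D": -1, "E": -1, "K": 1, "R": 1}

def charge_change(substitution):
    """Net change in formal charge for a label such as '174 E/K'."""
    _, residues = substitution.split()
    reference, variant = residues.split("/")
    return SIDE_CHAIN_CHARGE.get(variant, 0) - SIDE_CHAIN_CHARGE.get(reference, 0)

for label in ["120 V/E", "174 E/K", "180 A/T", "322 A/V", "477 L/P"]:
    print(label, charge_change(label))
```

On this simple count, 120 V/E adds one negative charge and 174 E/K shifts the charge by two units, whereas 180 A/T, 322 A/V, and 477 L/P are charge-neutral, so their predicted effects depend on side-chain size and backbone geometry instead.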

Predictions of Protein Stability

The amino acid sequences encoded by each allele (representing all combinations of SNPs causing amino acid substitutions) were input into ProtParam (ExPASy server, Gasteiger et al. 2005) and some of the output is included in Table 2. The instability index is a measure of protein stability in vitro, calculated from the occurrence of the 400 possible dipeptides in the protein sequence (Guruprasad et al. 1990). Proteins with an instability index <40 are predicted to be stable, whereas those with an index >40 are predicted to be unstable. Based on this, although the differences are small, four of the alleles are predicted to be unstable (see Table 2), whereas most of the allelic variants are likely to be stable. It was not possible to predict which specific substitutions are most likely to affect stability. The truncated proteins were excluded from this analysis.
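
For readers unfamiliar with the instability index, the calculation has the form II = (10/L) x (sum of dipeptide weights over all adjacent residue pairs), where L is the sequence length. The Python sketch below shows that form only. The published table of 400 dipeptide weights (Guruprasad et al. 1990) is not reproduced here, so the caller must supply it; the two weights and the four-residue sequence in the example are invented toy values, and the function names are ours.

```python
def instability_index(sequence, dipeptide_weights):
    """Instability index: (10 / L) * sum of weights of adjacent residue pairs.

    dipeptide_weights: mapping from two-letter dipeptides to weight values.
    The real table has 400 entries and must be supplied by the caller.
    """
    if len(sequence) < 2:
        raise ValueError("sequence must contain at least two residues")
    total = sum(
        dipeptide_weights[sequence[i:i + 2]] for i in range(len(sequence) - 1)
    )
    return 10.0 / len(sequence) * total

def predicted_stable(index, threshold=40.0):
    """Apply the threshold used in the text: below 40 is predicted stable."""
    return index < threshold

# Toy values invented for illustration; NOT the published weights.
toy_weights = {"AK": 20.0, "KA": 10.0}
toy_index = instability_index("AKAK", toy_weights)
print(toy_index, predicted_stable(toy_index))
```

Because every adjacent pair contributes to the sum, a single substitution changes at most two dipeptide terms, which is consistent with the small differences between alleles noted above.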

The SNPs we have identified in the C. gigas LAP gene may be used in the future as genetic markers, for example to study different populations. Because we have also identified which of the LAP gene variations are most likely to have physiological effects, by focusing on these SNPs it may be possible to develop an assay to screen for altered LAP activity.


We wanted to know whether the LAP gene is also highly variable in other organisms. There are few reports of LAP gene diversity in the literature, although variations have been found within the LapN gene of Lycopersicon esculentum (tomato): changes affecting the N-terminal and catalytic regions of the protein are predicted to make the enzyme inactive (Tu et al. 2003). Otherwise, from information available in databases, it seems that the LAP gene is highly conserved, even in plants. The rice genome web site ( lists SNPs in the LAP flanking sequences and introns, but none within the coding sequence. For Arabidopsis, there is an insertion within the 3' flanking region, but no SNPs within the LAP coding sequence or introns (from The Arabidopsis Information Resource web site). Viewing other genes (not LAP), chosen at random, showed large numbers of coding-region SNPs, suggesting that LAP is unusually highly conserved in both rice and Arabidopsis.

As stated above, C. gigas has a highly variable genome. Even considering this, C. gigas may be unusual in having such a variable LAP gene. We can only speculate that there are greater benefits of LAP variation for an estuarine animal, which must tolerate different levels of salinity, resulting from tidal effects and varying freshwater input.

LAP Intron Microsatellite Repeats

Introns 1 and 9 were shown to contain simple sequence repeat (SSR), or microsatellite (MS), regions: (CT)n, where n = 15-32 (MS1, intron 1) and n = 6-23 (perfect repeats, MS2, intron 9), as shown in Figure 3. For MS1, the CT repeats were perfect in all clones, with only their number varying. MS2 is an imperfect microsatellite region, with an extra C being added and/or 1 or more Ts being replaced by Cs in the different clones.
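
Counting perfect CT repeats in a cloned intron sequence is straightforward to automate. The Python sketch below is one possible approach and is not the method used in this study; the function name and the example sequences are invented for illustration.

```python
import re

def longest_ct_repeat(sequence):
    """Return n for the longest perfect (CT)n run in a DNA sequence."""
    runs = re.findall(r"(?:CT)+", sequence.upper())
    return max((len(run) // 2 for run in runs), default=0)

# Invented sequences, not clones from this study.
perfect = "GGA" + "CT" * 15 + "AAG"      # uninterrupted run of 15 repeats
imperfect = "CT" * 6 + "CC" + "CT" * 4   # a T replaced by C splits the run

print(longest_ct_repeat(perfect))    # 15
print(longest_ct_repeat(imperfect))  # 6
```

An interruption such as the T-to-C replacements seen in MS2 splits the array into shorter perfect runs, which is why the perfect-repeat count for an imperfect microsatellite can be much smaller than the overall length of the repeat region.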

The identification of 2 microsatellite regions may enable the future development of a simple method to screen for C. gigas LAP variants. A greater number of SSR variations would probably be found if more sequence analyses were performed, looking at these regions in numerous different individuals. It is possible that particular variants of the microsatellites are linked to other LAP SNPs and, indeed, to variations in growth or other physiological traits.


Microsatellite DNA exhibits higher rates of sequence change than any other part of eukaryotic chromosomes, making it useful as a molecular marker for taxonomic and phylogenetic studies. Microsatellites have already been used in various bivalves: abalone (Muchmore et al. 1998), scallop (Canapa et al. 2000), mussels (Martinez-Lage et al. 2002), and oyster (Lopez-Flores et al. 2004).


The function of microsatellites has been reviewed, and intron microsatellites are reported to affect gene transcription, mRNA splicing, or export to the cytoplasm (Li et al. 2004). In humans, intronic microsatellites are involved in diseases such as cancers of the breast and colon, Friedreich's ataxia, spinocerebellar ataxia, cystic fibrosis, and muscular dystrophy. Interestingly, some microsatellites located in the first intron, as is the case for C. gigas MS1, are involved in transcription regulation. Polymorphisms in microsatellites located at the 5'-end of introns, as is the case for C. gigas MS2, can result in abnormal splicing. It is therefore entirely possible that both of the microsatellite regions in the LAP gene of C. gigas described here have direct effects on LAP activity.

Crassostrea hongkongensis also Contains a Microsatellite Repeat in Intron 1

Several pairs of primers designed for C. gigas produced specific PCR products from C. hongkongensis genomic DNA (between them covering approximately 60% of the entire LAP region), which suggests a high degree of LAP sequence conservation between the two species (data not shown). Initial DNA sequence analysis of one fragment of the LAP gene from C. hongkongensis revealed a (CT)n microsatellite region (n = 20-22 perfect repeats). This is located at the same position, within intron 1, as the microsatellite region identified in C. gigas (see Fig. 4), further indicating the similarity of the LAP gene in the 2 species. Based on the limited data available (good sequence data were obtained for only 3 C. hongkongensis clones), there appeared to be fewer CT repeats in the C. hongkongensis SSR. Although all the microsatellite region 1 sequences obtained for C. gigas showed perfect CT repeats, the C. hongkongensis MS1 region contained an A residue. The discovery of an SSR within intron 1 of both the species investigated supports the theory that the microsatellite region is functional, possibly affecting gene regulation.


DNA sequence analysis has identified a large number of polymorphisms within the LAP gene of C. gigas. Our data suggest that the LAP gene may show more variation than other C. gigas genes previously analyzed for SNPs. In addition, we suggest that the C. gigas LAP gene is far more variable than the LAP genes of other organisms studied to date. In total, we identified 73 SNPs within the LAP gene coding sequence, indicating a frequency of one SNP every 22 bp: a high SNP frequency even for C. gigas. Of these SNPs, 22 are predicted to alter the amino acid sequence of the encoded protein and one to generate a truncated protein. These represent 33 different protein sequences, compared with only six alleles previously predicted by allozyme studies. Protein structural predictions allowed us to speculate that some of the amino acid substitutions affect enzyme function. We have also identified microsatellite repeat regions within two of the LAP gene introns, and demonstrated that variation occurs within these regions.

We demonstrated that there is substantial sequence similarity between the LAP genes of two different species of oyster: C. gigas and C. hongkongensis, and that they both have a microsatellite repeat region within intron 1. This suggests that some of the data obtained from C. gigas may also apply to other species of oyster.

Both the SNPs and microsatellites are suitable for further study and for developing genotype-screening methods. Although beyond the scope of this study, it is possible that some of the synonymous SNPs may affect LAP activity, hence these too could be the subject of future study. Given the ecological importance of the LAP gene, sequence variation identified in this study may, in the future, be useful as genetic markers. These markers could be used to compare C. gigas individuals and/or populations and to further our understanding of stress responses and growth efficiency, with applications in genetic-based stock enhancement and breeding design.


Support was provided by Plymouth Marine Laboratory's Core Strategic Research Program. The authors thank The Exeter Biocatalysis Centre, in particular Prof. J. Littlechild and Dr K. Line for their service in LAP structural analysis and prediction of the effects of LAP amino acid substitutions, and Dr K. Lam (Hoi Ha Wan Marine Life Centre, City University of Hong Kong) for providing C. hongkongensis DNA. The authors remember Dr Anna Goostrey with gratitude for helpful discussions, advice, and friendship during the course of this project, and dedicate this paper to her memory.


Batley, J., G. Barker, H. O'Sullivan, K. J. Edwards & D. Edwards. 2003. Mining for single nucleotide polymorphisms and insertions/deletions in maize expressed sequence tag data. Plant Physiol. 132:84-91.

Buroker, N. E., W. K. Hershberger & K. K. Chew. 1979. Population genetics of the family Ostreidae. I. Intraspecific studies of Crassostrea gigas and Saccostrea commercialis. Mar. Biol. 54:157-169.

Burley, S. K., P. R. David, R. M. Sweet, A. Taylor & W. N. Lipscomb. 1992. Structure determination and refinement of bovine lens leucine aminopeptidase and its complex with betastatin. J. Mol. Biol. 224:113-140.

Canapa, A., M. Barucca, P. N. Cerioni & E. Olmo. 2000. A satellite DNA containing CENP-B box-like motifs is present in the Antarctic scallop Adamussium colbecki. Gene 247:175 180.

Chao, W. S., Y.-Q. Gu, V. Pautot, E. A. Bray & L. L. Walling. 1999. Leucine aminopeptidase RNAs, proteins, and activities increase in response to water deficit, salinity, and the wound signals system in, methyl jasmonate, and abscisic acid. Plant Physiol. 120:979-992.

Ciechanover, A. & A. L. Schwartz. 1994. The ubiquitin-mediated proteolytic pathway: machanisms of recognition of the proteolytic substrate and involvement in the degradation of native cellular proteins. FASEB J. 8:182-191.

Clifford, R., M. Edmonson, Y. Hu, C. Nguyen, T. Scherpbier & K. H. Buetow. 2000. Expression-based genetic / physical maps of single nucleotide polymorphisms identified by the cancer genome anatomy project. Genome Res. 10:1259-1265.

Day, A. J., A. J. S. Hawkins & P. Visootiviseth. 2000. The use of allozymes and shell morphology to distinguish among sympatric species of the rock oyster Saccostrea in Thailand. Aquaculture 187:51-72.

Deutsch, S., C. Iseli, P. Bucher, S. E. Antonarakis & H. S. Scott. 2001. A cSNP map and database for human chromosome 21. Genome Res. 11:300-307.

Fujino, K. & N. Nagaya. 1977. Biochemical polymorphism in the Pacific oyster--II Variants in tetrazolium oxidase and leucine aminopeptidase. Bull. Japan. Soc. of Sci. Fish. 43:1455-1459.

Gasteiger, E., C. Hoogland, A. Gattiker, S. Duvaud, M. R. Wilkins, R. D. Appel & A. Bairoch. 2005. Protein identification and analysis tools on the ExPASy server. In: J. M. Walker, editor. The proteomics protocols handbook. Totowa, N J: Humana Press. pp. 571-607.

Gupta, S. K., M. Aziz & A. A. Kahn. 1989. Serum leucine aminopeptidase estimation: a sensitive prognostic indicator of invasiveness in breast carcinoma. Indian J. Pathol. Microbiol. 32:301-305.

Guruprasad, K., B. V. B. Reddy & M. W. Pandit. 1990. Correlation between the stability of a protein and its dipeptide composition: a novel approach for predicting in vivo stability from its primary structure. Protein Eng. 4:155-161.

Hall, T. A. 1999. BioEdit: a user-friendly biological sequence alignment editor and analysis program for Windows 95/98/NT. Nucleic Aeids Symp. Ser. 41:95-98.

Hawkins, A. J. S. 1991. Protein turnover: a functional appraisal. Funct. Ecol. 5:222-233.

Hawkins, A. J. S. & A. J. Day. 1996. The metabolic bais of genetic differences in growth efficiency among marine animals. J. Exp. Mar. Biol. Ecol. 203:93-115.

Hedgecock, D., G. Li, S. Hubert, K. Bucklin & V. Ribes. 2004. Widespread null alleles and poor cross-species amplification of microsatellite DNA loci cloned from the Pacific Oyster, Crassostrea gigas. J. Shellfish Res. 23:379-385.

Kimchi-Sarfaty, C., J. M. Oh, I.-W. Kim, Z. E. Sauna, A. M. Calcagno, S. V. Ambudkar & M. M. Gottesman. 2007. A "silent" polymorphism in the MDR1 gene changes substrate specificity. Science 315:525-528.

Koehn, R. K. & T. J. Hilbish. 1987. The adaptive importance of genetic variation. Am. Sci. 75:134-141.

Koehn, R. K., B. L. Bayne, N. M. Moore & J. S. Siebenaller. 1980. Salinity related physiological and genetic differences between populations of Mytilus edulis. Biol. J. Linn. Soc. 14:319-334.

Lato, K. & B. Morton. 2003. Mitochondrial DNA and morphological identification of a new species of Crassostrea (Bivalvia: Ostreidae) cultured for centuries in the Pearl River delta, Hong Kong, China. Aquaculture 228:1-13.

Launey, S. & D. Hedgecock. 2001. High genetic load in the Pacific Oyster Crassostrea gigas. Genetics 159:255-265.

Li, Y.-C., A. B. Korol, T. Fahima & E. Nevo. 2004. Microsatellites within genes: structure, function and evolution. Mol. Biol. Evol. 21:991-1007.

Lindblad-Toh, K., E. Winchester, M. J. Daly, D. G. Wang, J. N. Hirschhorn, J.-P. Laviolette, K. Ardlie, D. E. Reich, E. Robinson, P. Sklar, N. Shah, D. Thomas, J.-B. Fan, T. Gingeras, J. Warrington, N. Patil, T. J. Hudson & E. S. Lander. 2000. Large-scale discovery and genotyping of single-nucleotide polymorphisms in the mouse. Nat. Genet. 24:381-386.

Lopez-Flores, I., R. de la Herran, M. A. Garrido-Ramos, P. Boudry, C. Ruiz-Rejon & M. Ruiz-Rejon. 2004. The molecular phylogeny of oysters based on a satellite DNA related to transposons. Gene 339:181-188.

Mallet, A. L. & L. H. Haley. 1983. Growth rate and survival in pure population matings and crosses of the oyster Crassostrea virginica. Can. J. Fish. Aquat. Sei. 40:948-954.

Martinez-Lage, A., F. Rodriguez, A. Gonzalez-Tizon, E. Prats, L. Cornudella & J. Mendez. 2002. Comparative analysis of different satellite DNAs in four Mytilus species. Genome 45:922-929.

Mason, C. J., D. D. Reid & J. A. Nell. 1998. Growth characteristics of Sydney rock oysters Saccostrea commercialis in relation to size and temperature. J. Exp. Mar. Biol. Ecol. 227:155-168.

McGoldrick, D. J. & D. Hedgecock. 1997. Fixation, segregation and linkage of allozyme loci in inbred families of the Pacific Oyster Crassostrea gigas (Thunberg): implications for the causes of inbreeding depression. Genetics 146:321-334.

Michinina, S. R. & L. Rebordinos. 1997. Genetic differentiation in marine and estuarine populations of Crassostrea angulata. Mar. Ecol. Prog. Ser. 154:167-174.

Moore, M. N., R. K. Koehn & B. L. Bayne. 1980. Leucine aminopeptidase (aminopeptidase-I), N-acetyl-[beta]-hexosamidase and lysosomes in the mussel, Mytilus edulis, in salinity changes. J. Exp. Zool. 214:239 249.

Muchmore, M. E., G. W. Moy, W. J. Swanson & V. Vacquier. 1998. Direct sequencing of genomic DNA for characterisation of a satellite DNA in five species of Eastern Pacific abalone. Mol. Mar. Biol. Biotechnol. 7:1-6.

Rafalski, A. 2002. Applications of single nucleotide polymorphisms in crop genetics. Curr. Opin. Plant Biol. 5:94-100.

Sauvage, C. C. S., S. Lapegue & P. Boudry. 2007. Identification of SNPs and their mapping for the identification of QTLs of resistance to summer mortality in the Pacific oyster Crassostrea gigas. Aquaculture 272:S309.

Scott, C. S., M. Davey, A. Hamilton & D. R. Norfolk. 1986. Serum enzyme concentrations in untreated acute myeloid leukaemia. Ann. Hematol. 52:297-303.

Smith, P. J., H. Ozaki & Y. Fujio. 1986. No evidence for reduced genetic variation in the accidentally introduced oyster Crassostrea gigas in New Zealand. N. Z. J. Mar. Freshw. Res. 20:569-574.

Strater, N. & W. N. Lipscomb. 2004. Leucyl aminopeptidase (animal). Review, Handbook of Proteolytic Enzymes, 2 ed. In: A. J. Barrett, N. D. Rawlings & J. F. Woessner, editors. London: Elsevier. pp. 896-901.

Strater, N., D. J. Sherratt & S. D. Collums. 1999a. X-ray structure of aminopeptidase A from Escherichia coli and a model for the nucleoprotein complex in Xer site-specific recombination. EMBO J. 18:4513-4522.

Strater, N., L. Sun, E. R. Kantrowitz & W. N. Lipscomb. 1999b. A bicarbonate ion as a general base in the mechanism of peptide hydrolysis by dizinc leucine aminoeptidase. Proc. Natl. Acad. Sci. USA 96:11151-11155.

Taylor, A., M. Daims, J. Lee & T. Surgenor. 1982. Identification and quantification of leucine aminopeptidase in aged normal and cataractous human lenses and ability of bovine lens LAP to cleave bovine crystallins. Curr. Eye Res. 2:47-56.

Tenaillon, M. I., M. C. Sawkins, A. D. Long, R. L. Gaut, J. F. Doebley & B. S. Gaut. 2001. Patterns of DNA sequence polymorphism along chromosome 1 of maize (Zea mays spp mays L.). Proc. Natl. Acad. Sci. USA 98:9161-9166.

Thompson, J. D., D. G. Higgins & T. J. Gibson. 1994. CLUSTAL W: improving the sensitivity of progressive multiple sequence alignments through sequence weighting, positions-specific gap penalties and weight matrix choice. Nucleic Acids Res. 22:4673-4680.

Tu, C.-J., S.-Y. Park & L. L. Walling. 2003. Isolation and characterization of the neutral leucine aminopeptidase (LapN) of tomato. Plant Physiol. 132:243-255.

Williams, G. C. 1975. Sex and evolution: monographs in population biology. Princetown, N.J.: Publ. Princetown University Press.


Plymouth Marine Laboratory, Plymouth PL1 2DH, United Kingdom

(1) Current address: The Bigelow Laboratory for Ocean Sciences, Maine 04575.

(2) Current address: The University of Sheffield, S10 2TN, United Kingdom.

(3) Current address: The University of Warwick, Coventry, CV4 7AL, United Kingdom.

(4) Current address: University of Otago, Dunedin, New Zealand.

(5) Current address: The Hyperbaric Medical Centre, Plymouth, PL6 8BU, United Kingdom.

* Corresponding author. E-mail:
Primers used in this study.

Forward primers
  PLPF1           5' - TGG CTG CTT CGA TTG TAA AAC - 3'
  PLPF2           5' - TGG ACA AAG GGA GAG AAT CC - 3'
  PLPF3           5' - CAC CCA CAA AAT TTG CAG AG - 3'
  PLPF8           5' - CAG CTA CCT GTT CAT GTC AAA G - 3'
  PLPF9           5' - CAG GTG CTG CTG GTG TTT T - 3'
Reverse primers
  PLPR1           5' - GTT GGT CGT CCT GAC ATG C - 3'
  PLPR3           5' - ATT CCG CCC AGT CTT TAT CC - 3'
  PLPR5           5' - CAG CCA GTT GTT CTT TGA CG - 3'
  PLPR10          5' - TCG TGA TAC AAC TCA AAG TTC A - 3'
  PLPf2 comp      5' - GGA TTC TCT CCC TTT GTC CA - 3'
M13 uni (-21)     5' - TGT AAA ACG ACG GCC AGT - 3'
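
The primer labelled "PLPf2 comp" appears to be the reverse complement of forward primer PLPF2. The following minimal Python sketch checks this from the two sequences as listed in the table (spaces removed). The function and variable names are illustrative only and are not part of the original methods.

```python
# Primer sequences transcribed from the primer table (5' to 3', spaces removed).
PLPF2 = "TGGACAAAGGGAGAGAATCC"
PLPF2_COMP = "GGATTCTCTCCCTTTGTCCA"

# Translation table mapping each base to its Watson-Crick partner.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence containing only A, C, G, T."""
    return seq.upper().translate(COMPLEMENT)[::-1]


if __name__ == "__main__":
    # Prints True if "PLPf2 comp" is the reverse complement of PLPF2.
    print(reverse_complement(PLPF2) == PLPF2_COMP)
```

The same helper could be applied to any other primer pair in the table to check orientation before designing further assays.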

C. gigas LAP alleles.

         Amino acid sequence polymorphisms (1)

10    17    26    30    41    72    96    120   Allele
S/N   V/I   L/F   S/F   K/N   K/Q   V/D   V/E     (2)

 X           X                 X     X              2
 X           X                 X     X              3
 X           X                       X              4
       X           X                 X              5
       X           X                 X              6
       X           X                 X              7
       X           X                 X              8
       X           X                 X              9
       X           X                 X             10
             X                 X     X             11
                   X                 X             12
      nd           X                 X             13
                   X                 X             14
                   X                 X             15
                   X                 X             16
                   X                 X             17
                   X                 X             18
                   X                 X             19
                         X           X     X       20
nd    nd    nd    nd    nd    nd    nd    nd       21
                         X           X             22
                         X           X             23
                         X           X             24
                                     X             25
                                     X             26
                                     X             27
                                     X             28
                                     X             29
                                     X             30
                                     X             31
nd    nd    nd    nd    nd                         32
                                     X             34

           Amino acid sequence polymorphisms (1)

   Δ      174   180   195   198   206   233   278   Allele
 KA/T     E/K   A/T   S/C   K/E   L/*   V/I   K/E     (2)

                                         X              4
                 X                       X              5
                 X                       X              6
                 X                       X              7
                       X     X                          8
                                         X              9
                 X                       X             13
                 X                       X             14
                                         X             15
                                         X             16
                                         X             17
                                         X             18
   X                               X     X     X       20
  nd                               X     X     X       21
                                         X             22
                                         X             23
           X                                           26
                       X     X                         27
                 X                                     32
   X                                                   34

              Amino acid sequence polymorphisms (1)

281   322   421   422   477   528   530   Allele     Instability
Q/H   A/V   E/D   N/D   L/P   K/N   */R     (2)      index (†)

                                              1         40.22#
                   X           X              2         39.77
                               X              3         39.40
                                              4         39.42
             X     X                          5         39.36
             X     X           X              6         39.08
                               X              7         39.44
             X     X                          8         41.47#
                                              9         40.09#
                               X             10         39.75
                               X             11         39.77
                                             12         39.21
                                             13         38.90
                               X             14         38.61
                                             15         39.26
             X     X                         16         38.90
             X     X           X             17         38.61
                               X             18         38.98
                               X             19         38.92
                         X     X     X       20          NA
                               X             21          NA
                                             22         39.18
                               X     X       23         38.84
 X                             X     X       24         37.95
                                             25         39.57
             X     X           X             26         39.92
             X     X                         27         41.00#
 X                             X     X       28         38.40
       X     X     X                         29         39.05
             X     X                         30         39.21
                               X             31         39.29
                                             32         39.85
                               X             33         39.93
                               X             34         39.29

(1) Sequence polymorphisms are listed by residue number
and amino acid substitution. (2) An allele was defined by
a particular combination of DNA polymorphisms causing amino
acid substitutions, with X indicating that an amino acid was
substituted. This was not determined (nd) in some cases.
Instability index (†) is a measure of protein stability
in vitro, with an index >40 indicating an unstable protein
(highlighted; marked # here). *: translation stop. NA: not applicable.
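
As an illustration of the >40 threshold described in the table notes, the short Python sketch below flags the alleles whose instability index exceeds 40. The values are transcribed from the allele table, with NA entries stored as None. The names `INSTABILITY_INDEX` and `predicted_unstable` are hypothetical and introduced here only for illustration.

```python
from typing import Dict, List, Optional

# Threshold from the table notes: an index above 40 indicates an unstable protein.
UNSTABLE_THRESHOLD = 40.0

# Instability index per allele, transcribed from the table (None = NA).
INSTABILITY_INDEX: Dict[int, Optional[float]] = {
    1: 40.22, 2: 39.77, 3: 39.40, 4: 39.42, 5: 39.36, 6: 39.08, 7: 39.44,
    8: 41.47, 9: 40.09, 10: 39.75, 11: 39.77, 12: 39.21, 13: 38.90,
    14: 38.61, 15: 39.26, 16: 38.90, 17: 38.61, 18: 38.98, 19: 38.92,
    20: None, 21: None, 22: 39.18, 23: 38.84, 24: 37.95, 25: 39.57,
    26: 39.92, 27: 41.00, 28: 38.40, 29: 39.05, 30: 39.21, 31: 39.29,
    32: 39.85, 33: 39.93, 34: 39.29,
}


def predicted_unstable(
    index_by_allele: Dict[int, Optional[float]],
    threshold: float = UNSTABLE_THRESHOLD,
) -> List[int]:
    """Return allele numbers whose index exceeds the threshold, skipping NA entries."""
    return sorted(
        allele
        for allele, value in index_by_allele.items()
        if value is not None and value > threshold
    )


if __name__ == "__main__":
    print(predicted_unstable(INSTABILITY_INDEX))  # prints [1, 8, 9, 27]
```

The flagged alleles correspond to the four highlighted entries in the table.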
COPYRIGHT 2008 National Shellfisheries Association, Inc.

Author: Wharam, Susan D.; Wardill, Trevor J.; Goddard, Victoria; Donald, Kirsten M.; Parry, Helen; Pascoe, P
Publication: Journal of Shellfish Research
Date: Dec 1, 2008