Extensive heteroplasmy and evidence for fragmentation in the Callinectes sapidus mitochondrial genome.
KEY WORDS: blue crab, mitochondrial, heteroplasmy, inheritance, editing, Callinectes sapidus
Mitochondrial DNA (mtDNA) has been extensively used in the study of interspecies phylogenies and intraspecies population genetics. It is generally agreed that mtDNA has a rapid evolutionary rate, lacks recombination, and is homoplasmic within individuals (Avise et al. 1987), making it a useful tool for examining genetic differences and phylogeography over relatively short evolutionary time. These are generalizations, however, and the assumption of homoplasmy fails when a cell or individual possesses multiple mitochondrial genotypes, giving rise to heteroplasmy. Heteroplasmy can confound population studies because it effectively creates a subpopulation of organelles within each individual of the organismal population. This can make the effective population size appear larger than it really is and can create false negatives when determining lineages using mitochondrial (maternal) inheritance. Mitochondrial heteroplasmy has been reported in a wide range of species including insects [fruit flies: (Solignac et al. 1983), beetles: (Boyce et al. 1989)]; birds [gulls: (Crochet & Desmarais 2000, Kvist et al. 2003)]; fish [red drums: (Gold & Richardson 1990)]; mammals (Hauswirth & Laipis 1982, Boursot et al. 1987, Moraes et al. 1989, Holt et al. 1990); isopods, where it is linked to an atypical genome structure (Doublet et al. 2008, Doublet et al. 2012); and consistently in mussels (Fisher & Skibinski 1990, Quesada et al. 2003, Breton et al. 2006, Sano et al. 2010). The initial description of heteroplasmy usually relies on identification of multiple peaks in a sequence chromatogram. Heteroplasmy was first observed in Holstein cows when researchers noticed mtDNA polymorphism within a maternal lineage (Hauswirth & Laipis 1982). Wu et al. (2000) confirmed that multiple mitochondrial genome sequences existed within individual cows by sequencing cloned large fragments including 12S, 16S, tRNA, and the D-loop.
Heteroplasmy can be benign, but because of the small size and high density of open reading frames for essential genes within the mitochondrial genome, sequence inconsistencies can prove fatal. Many cases of heteroplasmy have been documented in humans in both coding genes and noncoding regions that have been linked to pathogenic mitochondrial diseases (Van Hove et al. 2008, Wang et al. 2008, Ye et al. 2008).
Although heteroplasmy is normally attributed to accumulated point mutations, the mtDNA heteroplasmy in mussels in the genus Mytilus is due to an unusual type of inheritance called doubly uniparental inheritance (Fisher & Skibinski 1990), by which both paternal and maternal mtDNA may be passed on to the offspring. In other cases of mtDNA heteroplasmy, the reason is thought to be either paternal leakage from incomplete condensation of sperm (Kvist et al. 2003), germ-line mitochondrial molecule segregation (Hauswirth & Laipis 1982, Wu et al. 2000), mitochondrial gene duplication (Abbott et al. 2005), somatic mutation (Chatterjee et al. 2006), or sometimes, a false mtDNA heteroplasmy caused by nuclear copies of mitochondrial genes. Among them, nuclear pseudogenes have been increasingly reported (Lopez et al. 1996, Williams & Knowlton 2001, Parr et al. 2006). Another common form of mtDNA heteroplasmy involves length variation caused by variable numbers of tandem repeats in the noncoding control region of the mitochondrial genome (Nesbe et al. 1998). This length variation is thought to occur through slipped strand mispairing during replication (Densmore et al. 1985). In some lineages, extensive recombination and site replication resulted in genome fragmentation and encoding of the genome on multiple minicircles (Schuster & Brennicke 1994, Mao et al. 2014). The fragmented transcripts that often contain degenerate sequences are then glued back together and edited using a guide RNA encoded on a separate chromosome (Cruz-Reyes & Sollner-Webb 1996). Recent data concerning the population structure of Callinectes sapidus derived using mitochondrial markers demonstrated an extremely high level of diversity even within isolated water bodies (Feng et al. 2017b). Sequence discrepancies in the mitochondrial control region among C. sapidus siblings were also detected in unpublished studies following the sequencing of the mitochondrial genome (Place et al. 2005). 
It is unclear whether this sequence diversity within populations and broods results from an extremely high mutation rate or is an artifact of heteroplasmy. In this study, a known lineage of three C. sapidus individuals (a male, a female, and an F1 megalopa) was used to determine the source and extent of heteroplasmy by cloning polymerase chain reaction (PCR) products of three loci, nad2, nad4, and coI, located in different regions of the mitochondrial genome. To further validate this work, a comparison of somatic and germline tissues was performed using Illumina sequencing of circular 16-kb amplified DNA templates as well as cDNA templates at four loci from a single individual.
MATERIALS AND METHODS
DNA Extraction and PCR Amplification
Deoxyribonucleic acid for familial analysis was extracted from two hatchery raised naively mated adult blue crabs and a megalopa from their F1 offspring. Given the available resources, it was decided to sequence three loci from these individuals, making artifacts from pseudogenes less likely, rather than to sequence a single locus and include multiple progeny. For each of the two adult crabs, one walking leg was preserved in 95% ethanol until the DNA extraction. Approximately 100 mg of muscle tissue from the two adults was used for DNA extraction with the Qiagen DNeasy Blood and Tissue Kit (Valencia, CA). The megalopa was also stored in 95% ethanol until DNA extraction. The entire megalopa was placed into 200 µl DNA extraction buffer consisting of 3 M urea, 4 M guanidine thiocyanate, 50 mM Tris pH 8.0, 20 mM ethylenediaminetetraacetic acid, and 1% Tween 20 and homogenized with a wide bore pipette tip. The megalopa was incubated for 3 h at 60°C and spun at 12,000 × g for 10 min. The supernatant was mixed with 600 µl binding buffer and extracted using a Zymo Research DNA clean and concentrate-5 column according to the instructions of the manufacturer (Irvine, CA). Deoxyribonucleic acid from all three individuals was assessed for quality, quantified, and diluted to 10 ng/µl using a Nanodrop ND-1000 spectrophotometer (Wilmington, DE). Deoxyribonucleic acid for genome sequencing was extracted from the testes and swimming leg muscle from an adult male following euthanasia on ice using the same procedure as the megalopa. Deoxyribonucleic acid concentration was assessed using a Nanodrop ND-1000 and diluted to 10 ng/µl. Ribonucleic acid from testes and muscle from the same individual was extracted using Tri Reagent (Sigma-Aldrich, St. Louis, MO) according to the instructions of the manufacturer. The RNA (1 µg) was converted to cDNA using random primers and Superscript II (Invitrogen, Carlsbad, CA) according to the instructions of the manufacturer.
Polymerase chain reaction primers were designed based on the published complete Callinectes sapidus mitochondrial genome sequence (Place et al. 2005; GenBank accession NC_006281), targeting four genes: 12S, nad2, nad4, and coI. Primers were designed to generate a product of easily sequenced length (around 650 bases) and to lie within the open reading frame of each locus, more than 100 base pairs from each end. Primer sequences and product lengths are shown in Table 1. The outward facing 16S primers from Place et al. (2005) were also used for genome amplification. Primer locations and orientation are shown with the genome map in Figure 1. Polymerase chain reactions for familial analysis and cDNA library generation were carried out in a final volume of 20 µl using the high-fidelity cloned Phusion polymerase (New England Biolabs), using appropriate buffers as the instructions indicated. The thermal cycling conditions were as follows: initial denaturation and heat activation at 98°C for 5 min followed by 40 cycles of denaturation at 95°C for 15 s, annealing at 48°C for 30 s, and extension at 68°C for 30 s. Genome amplification was performed using the outward facing 16S PCR primers in a 20-µl volume, also using the Phusion polymerase. This method uses site-specific primers to exponentially amplify circular templates. The cycling conditions for genome amplification consisted of an initial denaturation and heat activation at 96°C for 2 min followed by 30 cycles of denaturation at 96°C for 15 s, annealing at 48°C for 30 s, and extension at 68°C for 8 min. Genome amplification by rolling circle amplification was performed using the Templiphi Kit (GE Life Sciences, Marlborough, MA) according to the instructions of the manufacturer. This method uses random priming of circular templates with continuous amplification.
The size of amplified products for all methods was estimated by electrophoresis in 1% agarose gels.
Cloning and Library Generation and Sequencing
Polymerase chain reaction products of each of the three genes for familial analysis were cloned using the StrataClone Blunt PCR Cloning Kit (Stratagene, Santa Clara, CA). Colonies were screened for inserts using PCR, and positive colonies were selected and cultured for plasmid extraction using the protocol described in Molecular Cloning (Sambrook et al. 1989). The extracted plasmids were sequenced in a 10-µl volume, consisting of 40-150 ng of PCR product, 3 pmol of primer, 0.5 µl Big Dye v3.1 sequencing mix, and 1.5 µl 5X sequencing buffer (Applied Biosystems, Carlsbad, CA). The cycling parameters were 95°C for 5 min, followed by 50 cycles at 95°C for 15 s, 50°C for 15 s, and 60°C for 4 min. The sequencing reaction product was purified according to the instructions provided by the Big Dye v3.1 Sequencing Kit. The pellet was air-dried for 20-30 min before 10 µl HI-DI formamide was added (Applied Biosystems). The mixture was heated at 95°C for 2 min and snap cooled on ice. The denatured sequencing product was loaded into an ABI 3130xl genetic analyzer and sequence chromatograms were produced. Each PCR product was sequenced in both the forward and reverse directions.
Polymerase chain reaction products from the cDNA templates as well as the two amplified genome libraries from DNA templates were fragmented, labeled, and purified using the Nextera XT Library Preparation Kit from Illumina (San Diego, CA) according to the instructions of the manufacturer. Libraries were normalized and sequenced on a MiSeq at the Bioanalytical Services laboratory at IMET using a 2 × 300 MiSeq Reagent Kit v3.
The forward and reverse sequence reads from the clone libraries were aligned to the reference mitochondrial genome sequence (NCBI accession NC_006281.1) and curated using Sequencher 4.8 (Gene Codes, Ann Arbor, MI). The sequences were also checked for double peaks to ensure that only single colonies had been picked during the cloning procedure. The consensus sequence of the two reads was exported for further analysis. The polymorphism diversity calculation and neutrality test were performed using the program DnaSP v5 (Librado & Rozas 2009). For the haplotype/nucleotide diversity, the following measures were calculated: S, the number of segregating (polymorphic) sites; haplotype (gene) diversity and its sampling variance (Nei 1987); the standard deviation (or standard error) as the square root of the variance; nucleotide diversity, Pi (π), the average number of nucleotide differences per site between two sequences, and its sampling variance (Nei 1987); the mutation parameter Theta (θ, per site or per gene) from S or Eta (η), the total number of mutations (Watterson 1975, Nei 1987); and k, the average number of nucleotide differences between two sequences (Tajima 1993). The free software Network 188.8.131.52 (fluxus-engineering.com) was used to produce minimum spanning networks of all haplotypes with the Median Joining calculation (Bandelt et al. 1999).
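The summary statistics listed above can be illustrated with a short sketch. This is not the DnaSP implementation; it is a minimal Python example that assumes equal-length, gap-free aligned clone sequences and uses the standard textbook estimators (Nei's haplotype diversity with the n/(n - 1) correction, and Watterson's per-site θ computed from S). Sampling variances are omitted, and the function name diversity_stats is hypothetical.

```python
from collections import Counter
from itertools import combinations


def diversity_stats(seqs):
    """Summary statistics for equal-length aligned sequences (gaps not handled)."""
    n = len(seqs)
    if n < 2:
        raise ValueError("need at least two sequences")
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned to equal length")

    # S: number of segregating (polymorphic) alignment columns
    s_sites = sum(1 for column in zip(*seqs) if len(set(column)) > 1)

    # Hd: haplotype diversity, n/(n-1) * (1 - sum of squared haplotype frequencies)
    counts = Counter(seqs)
    hd = n / (n - 1) * (1 - sum((c / n) ** 2 for c in counts.values()))

    # k: average number of nucleotide differences over all sequence pairs
    pairs = list(combinations(seqs, 2))
    k = sum(sum(a != b for a, b in zip(x, y)) for x, y in pairs) / len(pairs)

    # pi: nucleotide diversity, k expressed per site
    pi = k / length

    # Watterson's theta per site: S / a1 / L, with a1 = sum_{i=1}^{n-1} 1/i
    a1 = sum(1 / i for i in range(1, n))
    theta_w = s_sites / a1 / length

    return {"S": s_sites, "Hd": hd, "k": k, "pi": pi, "theta_W": theta_w}


if __name__ == "__main__":
    toy_clones = ["ACGT", "ACGT", "ACGA", "TCGA"]  # illustrative data only
    print(diversity_stats(toy_clones))
```

In practice each clone consensus sequence would first be trimmed to a common alignment window; the toy sequences here serve only to show the arithmetic.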
Fastq files generated by the MiSeq were imported into the CLC Genomics Workbench version 9 (Qiagen) for analysis. Remaining Illumina adaptors and low-quality bases (<40 PHRED) were trimmed from the read ends and the reads from each library (two tissues, two DNA templates, and one cDNA template for six total libraries) were mapped onto the reference genome with a minimum match length of 30% and minimum match identity of 60%. Variants were called and tabulated using a cutoff of 5% of the total reads, a minimum quality score of 30, and a minimum neighborhood score of 30 within 5 bases. Read mappings were used to generate SAM files and a Perl script was written to search for split read mappings with greater than 10 bases unmapped on either end. This was done to determine the number of reads in each sample that contain long unmapped portions as a proxy for estimating fragmentation.
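The split-read search described above was implemented by the authors in Perl, and that script is not reproduced here. The following Python sketch shows one way such a screen could work, under the assumption that unmapped read ends are recorded as soft (S) or hard (H) clips in the CIGAR field of the SAM records. The function names and the handling of unmapped reads are illustrative choices, not details taken from the original script.

```python
import re

# One CIGAR operation: a length followed by an operation code
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def clipped_ends(cigar):
    """Return (left, right) counts of clipped bases at the ends of a CIGAR string."""
    ops = [(int(n), op) for n, op in CIGAR_OP.findall(cigar)]
    left = right = 0
    for n, op in ops:  # leading clips
        if op not in "SH":
            break
        left += n
    for n, op in reversed(ops):  # trailing clips
        if op not in "SH":
            break
        right += n
    return left, right


def count_split_reads(sam_lines, min_unmapped=10):
    """Count mapped reads with more than min_unmapped clipped bases on either end.

    Returns (split_reads, mapped_reads).
    """
    split = mapped = 0
    for line in sam_lines:
        if line.startswith("@"):  # header line
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:  # not a complete alignment record
            continue
        flag, cigar = int(fields[1]), fields[5]
        if flag & 0x4 or cigar == "*":  # read is unmapped
            continue
        mapped += 1
        left, right = clipped_ends(cigar)
        if left > min_unmapped or right > min_unmapped:
            split += 1
    return split, mapped
```

Applied to an open SAM file (for example, count_split_reads(open(path)) with a user-supplied path), the ratio of the two returned counts gives a per-library proportion of split reads.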
Familial Analysis of Cloned Sequences
The total number of clone sequences used in this study for each gene and individual is shown in Table 2. In total, 217, 105, and 32 clones were sequenced in both directions using partial sequences from the nad2, coI, and nad4 loci for three Callinectes sapidus individuals. Among them, multiple haplotypes were observed for each gene in each organism. The maximum number of different genotypes occurring at a single locus in one individual was 24 for nad2 in the megalopa from 66 clones, whereas the minimum number was 2 for nad4 in the female from 8 clones. In general, unique haplotypes were obtained in more than 20% of the sequenced clones except in the male for nad2 (8.7%) and for nad4 (15.8%). For nad2, there were 13 and 22 unique haplotypes between the female and the megalopa, respectively; 2 and 16 for coI, respectively; and 1 and 2, respectively, for nad4 (Figs. 2-4). All haplotypes from the male were unique with respect to the female and the megalopa except for a single sequence from the male that was identical to the coI-dominant sequences of the female and megalopa. The presence of unique haplotypes between the female and the megalopa is an immediate indication that sequencing saturation was not reached. At all three loci, a sufficient number of sequences was generated to show identity of dominant sequences between the female and the megalopa without evidence of the dominant sequence of the male in the megalopa.
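The tallies reported above (number of haplotypes per individual, dominant haplotype frequency, and haplotypes unique to one individual of a pair) follow from simple counting over the clone sequences. A minimal Python sketch is given below, assuming each clone is represented by its trimmed consensus sequence as a string; the function names and the toy sequences are hypothetical and are not data from this study.

```python
from collections import Counter


def haplotype_summary(clones):
    """Tally haplotypes among the clone sequences of one individual at one locus."""
    counts = Counter(clones)
    dominant, dominant_n = counts.most_common(1)[0]
    return {
        "n_clones": len(clones),
        "n_haplotypes": len(counts),
        "dominant": dominant,
        "dominant_freq": dominant_n / len(clones),
    }


def private_haplotypes(clones_a, clones_b):
    """Return haplotypes found only in individual A and only in individual B."""
    set_a, set_b = set(clones_a), set(clones_b)
    return set_a - set_b, set_b - set_a


if __name__ == "__main__":
    # Toy sequences for illustration only
    female = ["AAC", "AAC", "AAC", "AAT"]
    megalopa = ["AAC", "AAC", "ACT", "AGT"]
    print(haplotype_summary(female))
    print(private_haplotypes(female, megalopa))
```

Comparing the "dominant" entries of two summaries gives the dominant-haplotype identity check described for the female and the megalopa.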
The haplotype diversity (Hd) varied from 16.7% to 78.7% (Table 3). For all three genes (nad2, coI, and nad4), the megalopa showed the highest haplotype diversity (0.787, 0.622, and 0.700, respectively) and the male had the lowest (0.167, 0.568, and 0.205, respectively) except for coI, where the female had the lowest diversity (0.257). Comparing the average nucleotide diversity (Pi) and pairwise difference (k) in three genes across all clones, nad2 and nad4 had much higher values of Pi (0.00838 and 0.00736, respectively) and k (5.732 and 5.246, respectively) than coI (Pi = 0.00363 and k = 2.453), showing an overall higher genetic diversity of the two NADH dehydrogenase subunits versus coI.
For nad2 (Fig. 2), the frequencies of the dominant haplotype in the female, the male, and the megalopa were, respectively, 72.0%, 91.3%, and 43.9%; for coI (Fig. 3), the dominant haplotype frequencies were 86.7%, 62.8%, and 61.7%; and for nad4 (Fig. 4), they were 87.5%, 89.5%, and 60.0%. The megalopa had the lowest relative abundance in its dominant haplotype for all three genes when compared with the two other individuals. This result is consistent with the higher haplotype diversity of the megalopa. The major haplotypes for all three genes of the female were always identical to those of the megalopa, whereas the major haplotypes of the male were always distinct and separated by many substitutions. The female and the megalopa also shared a few secondary haplotypes in the nad2 locus, which had the highest sequence coverage. The male did not share any minor haplotypes with either the female or the megalopa, with the one exception of a single sequence that was the same as the dominant haplotype of the female and the megalopa at the coI locus (Fig. 3). None of the dominant haplotypes included any stop codons, but there were three minor haplotypes (five clones) of the nad2 locus and two minor haplotypes (two clones) of the coI locus that had stop codons in the expected open reading frame. The remaining 212 nad2 sequences (41 DNA haplotypes) were without premature stop codons, encoding 18 different protein sequences. Including stop codons, several haplotypes coded for amino acid substitutions when compared with the dominant haplotype at the nad2 locus as shown in Figure 5. The dominant haplotype of the female and the megalopa has one amino acid difference from the dominant haplotype of the male in this figure.
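Screening clone sequences for premature stop codons, as reported above, can be sketched as follows. This minimal Python example assumes that each sequence is given on the coding strand, that the reading-frame offset of the amplicon within the open reading frame is known (the frame parameter, a hypothetical name), and that the invertebrate mitochondrial genetic code applies (NCBI translation table 5, in which TAA and TAG are stops and TGA encodes tryptophan). It illustrates the screen and is not the procedure used in the study.

```python
# Stop codons under NCBI translation table 5 (invertebrate mitochondrial);
# TGA is read as tryptophan in this code and is therefore not listed.
STOP_CODONS = {"TAA", "TAG"}


def premature_stops(seq, frame=0):
    """Return 0-based codon indices of in-frame stop codons.

    seq   -- coding-strand nucleotide sequence
    frame -- offset (0, 1, or 2) of the first complete codon in seq
    """
    seq = seq.upper()
    hits = []
    for i in range(frame, len(seq) - 2, 3):
        if seq[i:i + 3] in STOP_CODONS:
            hits.append((i - frame) // 3)
    return hits


if __name__ == "__main__":
    # Toy sequence: codons ATG TGA GCT TAA GGA; only TAA (index 3) is a stop
    print(premature_stops("ATGTGAGCTTAAGGA"))
```

Because the amplicons are internal to each open reading frame, any in-frame stop found this way would be premature.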
Variant Analysis of Illumina Library Sequences
Read mapping statistics for each of the six libraries are shown in Table 4. In general, the DNA-based libraries had a low percentage of total reads mapped, especially the Templiphi-generated libraries (Fig. 6). The unmapped reads consisted mainly of long stretches of dinucleotide and trinucleotide repeats. Although the rolling circle amplification of the Templiphi Kit is especially sensitive to spontaneous loop formation and repeat sequences, these same repeated sequences were observed in the Phusion-generated libraries. The same repeated sequences appear at the ends of multiple mapped reads at specific locations (Fig. 7). Along with repeated sequences are stretches of sequence for which an annotation could not be determined using BLAST and the NCBI nonredundant database. The read mapping showed read stacking near the 16S locus in the Phusion-generated libraries and near the heavy strand origin of replication in the Templiphi libraries (Fig. 6). These are likely artifacts of the amplification methods, arising from spontaneous loop formation at the heavy strand replication origin in the Templiphi library and from the use of 16S for priming in the Phusion library. Although a product size of approximately 16 kb was produced for both genome amplification methods, many minor products seem to have also been sequenced, especially in the Phusion-generated libraries. Results of variant calling are shown in Table 5, including single nucleotide variants (SNV), multiple nucleotide variants (MNV), and insertions and deletions (INDEL). In total, the DNA-based libraries produced 1,029 unique variants in both tissues and the cDNA libraries produced 241 unique variants in both tissues across the four open reading frames. Generally, the Templiphi libraries gave much more even coverage, whereas the large number of reads mapped in the cDNA libraries produced very deep coverage (Fig. 6).
Also, although there were variants across the genome, the frequency of variants was highest in the control region, 7.3% in muscle and 7.8% in testes, relative to the rest of the genome, 2.2% in the muscle and 1.4% in the testes (Fig. 7). On the basis of positions identified as variant, there was 28.8% concordance combining both DNA methods among the two tissue types and 42.5% concordance in the cDNA libraries among the two tissue types. For each concordant position, there was no disagreement in the variants detected or the dominant variant based on frequency between the two tissue types. When comparing cDNA and DNA, there was 28.7% concordance of sites within the testes and 10.5% concordance in the muscle. When comparing cDNA and DNA libraries at concordant sites, however, there was frequently a disagreement as to the actual sequence variants as well as the number of variants and frequencies. Also, the split read mapping shown in Figure 6 was almost completely absent in the cDNA libraries, less than 2% compared with greater than 40% in each of the DNA-based libraries (Table 6). Although the unmapped portions of the reads were commonly simple repeats, other more complex sequences were present but could not be annotated using BLAST against the NCBI nonredundant database.
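The concordance figures above compare the sets of variant positions called in two libraries. The exact denominator used for these percentages is not stated in the text, so the Python sketch below makes an explicit assumption: concordance is the number of positions called in both libraries divided by the number called in either. It also reports the shared positions at which the called alleles differ, which corresponds to the allele-level disagreement described for the cDNA versus DNA comparison. All names and values in the example are hypothetical.

```python
def position_concordance(variants_a, variants_b):
    """Compare two variant call sets.

    Each argument maps a reference position to the set of variant alleles
    called there. Returns (concordance, disagreeing_positions), where
    concordance is shared positions / positions in either set (an assumed
    definition) and disagreeing_positions lists shared positions whose
    allele sets differ.
    """
    positions_a, positions_b = set(variants_a), set(variants_b)
    shared = positions_a & positions_b
    either = positions_a | positions_b
    concordance = len(shared) / len(either) if either else 0.0
    disagreeing = sorted(p for p in shared if variants_a[p] != variants_b[p])
    return concordance, disagreeing


if __name__ == "__main__":
    # Toy call sets for illustration only
    muscle = {10: {"A"}, 20: {"C", "T"}, 30: {"G"}}
    testes = {20: {"C", "T"}, 30: {"A"}, 40: {"T"}}
    print(position_concordance(muscle, testes))
```

A different denominator (for example, the positions called in one reference library only) would change the percentage but not the list of disagreeing positions.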
The Extent and Source of Heteroplasmy in the Blue Crab Mitochondrial Genome
Mitochondrial heteroplasmy has been described in many eukaryotic lineages, but aside from plants and mussels the incidence is often low, and efforts have often not been made to determine the extent and source of heteroplasmy. This study focused on sequencing protein-coding regions from members of a known family as well as high throughput sequencing of DNA and cDNA from somatic and gonadal tissue to explore the origin, extent, and inheritability of heteroplasmic sequences in the blue crab. This study began with observations of double peaks in the sequence chromatograms of blue crab mitochondrial protein-coding genes. This was an unexpected finding because heteroplasmic sequences are often in very low abundance or limited to noncoding regions such as the D-loop (Wirgin et al. 1995). The D-loop is an especially common area for heteroplasmy because repeated sequences can easily be amplified at the site of heavy strand replication initiation by recombination, and the D-loop region of the Callinectes sapidus mitochondrial genome was previously shown to have frequent insertions/deletions between siblings and to be maternally inherited (Place et al. 2005). The remainder of the genome consists of genes encoding essential proteins for respiration and ATP synthesis. Thus, strong selection pressure exists against mutation of protein-coding genes in the mitochondrion. Regions of the mitochondria that exhibit abnormal redox potential are usually pinched off and targeted for destruction (Lee et al. 2004), thereby preventing accumulation of abnormal genotypes, but previous studies using mitochondrial gene markers to track hatchery produced crabs encountered double peaks in the sequence chromatograms shared by brood stock females and the resultant offspring (Feng et al. 2017a), pointing to the presence and inheritability of heteroplasmic sequences.
This study confirms previous observations of heteroplasmy but with a much higher frequency than any other recorded instance of heteroplasmy, save for plant mitochondrial genomes, and with signs of genome fragmentation.
Heteroplasmy has been demonstrated to occur via a variety of mechanisms, but the number of haplotypes shown in the blue crab is unique among metazoans. In other metazoans only one or a few secondary haplotypes have been shown to occur. Thus, existing explanations for the occurrence of heteroplasmy in other metazoans fall short of explaining the results of this study in a variety of ways. The mitochondrial genome was sequenced (Place et al. 2005) using outward facing primers relying on a circular template, and both DNA-based amplification methods used to construct Illumina libraries in this study also relied on circular templates. This makes nuclear pseudogenes an unlikely explanation for the observed sequence degeneracy. The presence of premature stop codons in some of the nad2 and coI sequences of the familial analysis is striking and is the strongest argument for nuclear pseudogenes, but pseudogenes in the nuclear genome would have to occur at a low enough frequency for the male contribution to be unobservable within these data and also have undergone enough duplication to account for the observed haplotype diversity. Extensive duplication of mitochondrial genes transferred to the nucleus, especially for multiple noncontiguous regions of the mitochondrion, has never been documented. Also, the raw number of disparate haplotypes in the three open reading frames used in the familial analysis, as well as disparities in overlapping reads in the Illumina libraries, makes nuclear copies even more unlikely because it would require multiple transfer events. Most of the haplotypes observed arise from substitutions with synonymous amino acid translations to the dominant haplotype (Fig. 5), and estimates of substitution rates within blue crab populations have shown a trend toward synonymous substitutions at the third codon position (Feng et al. 2017b).
This bias in sequence polymorphisms generally argues for selective pressure acting on a coding gene and not a pseudogene, barring the unlikely import of either the transcript or the protein into the organelle.
If heteroplasmy in the blue crab has arisen from accumulation of sequence variants, then fertilization by sperm that have not fully condensed remains a viable explanation. Although paternal contribution was not observed in this dataset, paternal haplotypes could be transferred to offspring at a very low frequency. This would mean that retention of these sequences is high, given the observed number of haplotypes. It is also possible that mitochondrial heteroplasmy has arisen due to accumulated mutations and replication errors because the mitochondrion itself experiences extremely high oxidative potential. Mitochondria appear to possess a competent set of DNA repair enzymes, but a reduction or breakdown in the ability to remove or repair damaged material would result in the accumulation of divergent sequences within each organelle (Alexeyev et al. 2013). Also, heterologous sequences within a single mitochondrion appear to remain segregated (Gilkerson et al. 2008), making the nonuniform transfer of mitochondrial genomes to offspring and an increase in generational genetic diversity more likely.
Amelioration of Heteroplasmy and Evidence for Transcript Editing
Regardless of the source of these sequence variants, it is unclear why heteroplasmy is not harmful in Callinectes sapidus as it is in other metazoan species. One possibility, albeit unlikely, is gene silencing. Reads mapping to mitochondrial genes in a published transcriptome of the blue crab do not show any evidence of heteroplasmy or fragmentation (Yednock et al. 2015). These reads were generated from a poly-A selected library and may represent transcripts selected for translation with all other degenerate or fragmented transcripts silenced and degraded. Mitochondrial sequences in this transcriptome could also be nuclear pseudogenes, although there was no evidence of sequence degeneracy other than a few synonymous substitutions compared with the reference genome. The haplotypes in the familial analysis that contain stop codons occur at multiple different amino acid residues relative to the reference genome and in different positions in the sequence, making selective read-through or degenerate codon usage unlikely. It could also be possible that the dominant haplotype is being used as a guide RNA to modify the other sequences posttranscriptionally. This may explain why this degeneracy and fragmentation were not observed in the original C. sapidus mitochondrial genome sequencing, because the 16-kb product used for shotgun sequencing was gel isolated (personal communication, C. R. Steven) and the libraries in this study were generated without selection. The use of a guide RNA, often present on a "megacircle," to edit degenerate and possibly fragmented transcripts encoded on minicircles has evolved within several lineages (Schuster & Brennicke 1994, Cruz-Reyes & Sollner-Webb 1996). This may also explain why the split read mapping was almost absent from the cDNA-based Illumina libraries, as this could represent an intermediate in the editing process where many of the transcript fragments have been reconnected and some bases have been edited.
Thus, some form of editing is the most likely explanation for the observations presented in this study. This is in contrast to terrestrial isopods where heteroplasmy has been retained and linearly inherited due to the unique trimeric genomic organization of the mitochondria of these organisms, but there is still a clear retention of purifying selection and heteroplasmy has been limited to one tRNA locus (Doublet et al. 2012). If the degenerate haplotypes coded by the blue crab mitochondrion were being expressed literally it would indicate a shift in our understanding of the coevolution between the nuclear and mitochondrial genomes. Although the oxidative phosphorylation complexes of the mitochondrion are composed of hundreds of genes, only a handful of these are coded for on the mitochondrial genome with the remaining coded for by the nuclear genome and transported to the mitochondrion. Heteroplasmy in the open reading frames of these genes would demonstrate an uncoupling of this nuclear/mitochondrial relationship and would require a plasticity in both genomes that has not been described.
Effects of Heteroplasmy on Physiological and Population Biology
The blue crab Callinectes sapidus is known to inhabit a wide range of salinities including the ocean floor, estuaries, and rivers. Although some members of the megalopae that come into an estuary settle at the mouth, many more migrate up to lower salinity waters. Even over short periods of time this species can readily adapt to large changes in salinity. Because this is accomplished by altering the blood CO2/carbonate content to compensate for changes in hemolymph osmolyte concentrations, there is a metabolic adjustment that is made, demonstrated by changes in respiration rate (Engel & Eggert 1974). This requires a tremendous metabolic output by the mitochondrion, and the oxidative stress within the organelle can cause extensive DNA damage. If the genome has been fragmented and expanded based on recombination events in an analogous manner to plant organellar genomes, this may be a way of protecting against damage by having "extra" genetic material that can be modified by oxidative damage and/or recombination and subsequently edited back to the canonical sequence. There seem to be limits to the ability of the editing machinery to make a canonical transcript, evidenced by the large number of synonymous substitutions primarily at the third codon position. It is difficult to say exactly which strategy is more energetically favorable for maintaining genome integrity: a small genome that is robust but requires constant proofreading and elimination of errors, or an expanded genome that produces degenerate transcripts that require editing. It is also unclear what really is the "canonical" sequence in an individual and how this affects population dynamics. Population studies using mitochondrial markers show an extremely high level of diversity within a geographically or temporally bounded population (Feng et al. 2017b).
It is possible that some of the low frequency haplotypes within a single individual selectively become the dominant amplified product, but the sequence chromatogram generated from the template amplicon used for subcloning in the familial analysis matched the most common haplotype from the clone libraries for all three individuals. Thus, there is no reason to believe that sequencing PCR products amplified from the mitochondrial genome does not represent the most common sequence within an individual. It is also not necessarily the case that the most common sequence within an individual is being used as a guide RNA and represents the version of the gene sequence ultimately used for translation. Although RNA editing is the most parsimonious explanation for the existence of observed heteroplasmy and fragmentation in the blue crab mitochondrial genome, confirmation of this hypothesis and the repercussions for population dynamics and genome evolution cannot be fully evaluated until a potential guide RNA has been identified. It is also unclear how heterogeneous the transmission of mitochondrial haplotypes is across many generations. If heteroplasmy is playing a role in the extremely high sequence diversity observed in blue crab populations, which is likely considering the extraordinarily high prevalence of heteroplasmy within the mitochondrial genome, then there will be interplay between increased drift due to relaxed selection pressure on the genomic sequence and uneven inheritance of mitochondrial haplotypes across generations. The added variables from heteroplasmic sequences in the blue crab can be circumvented by using cDNA barcoding, however, so although DNA sequencing can be useful for identifying discrete differences in populations (Feng et al. 2017b), RNA sequencing with mitochondrial loci may be better suited for future studies of population connectivity in this species.
This is contribution #16-187 from the Institute of Marine and Environmental Technology and #5281 from the University of Maryland Center for Environmental Science. Raw sequence reads are available from NCBI as sequence read archives under BioProject #PRJNA357714. This work was funded by NOAA award number NA17FU2841 (Blue Crab Advanced Research Consortium Project) to A.R.P.
Abbott, C. L., M. C. Double, J. W. Trueman, A. Robinson & A. Cockburn. 2005. An unusual source of apparent mitochondrial heteroplasmy: duplicate mitochondrial control regions in Thalassarche albatrosses. Mol. Ecol. 14:3605-3613.
Alexeyev, M., I. Shokolenko, G. Wilson & S. LeDoux. 2013. The maintenance of mitochondrial DNA integrity--critical analysis and update. Cold Spring Harb. Perspect. Biol. 5:a012641.
Avise, J. C., J. Arnold, R. M. Ball, E. Bermingham, T. Lamb, J. E. Neigel, C. A. Reeb & N. C. Saunders. 1987. Intraspecific phylogeography: the mitochondrial DNA bridge between population genetics and systematics. Annu. Rev. Ecol. Syst. 18:489-522.
Bandelt, H. J., P. Forster & A. Rohl. 1999. Median-joining networks for inferring intraspecific phylogenies. Mol. Biol. Evol. 16:37-48.
Boursot, P., H. Yonekawa & F. Bonhomme. 1987. Heteroplasmy in mice with deletion of a large coding region of mitochondrial DNA. Mol. Biol. Evol. 4:46-55.
Boyce, T. M., M. E. Zwick & C. F. Aquadro. 1989. Mitochondrial DNA in the bark weevils: size, structure and heteroplasmy. Genetics 123:825-836.
Breton, S., G. Burger, D. T. Stewart & P. U. Blier. 2006. Comparative analysis of gender-associated complete mitochondrial genomes in marine mussels (Mytilus spp.). Genetics 172:1107-1119.
Chatterjee, A., E. Mambo & D. Sidransky. 2006. Mitochondrial DNA mutations in human cancer. Oncogene 25:4663-4674.
Crochet, P. A. & E. Desmarais. 2000. Slow rate of evolution in the mitochondrial control region of gulls (Aves: Laridae). Mol. Biol. Evol. 17:1797-1806.
Cruz-Reyes, J. & B. Sollner-Webb. 1996. Trypanosome U-deletional RNA editing involves guide RNA-directed endonuclease cleavage, terminal U exonuclease, and RNA ligase activities. Proc. Natl. Acad. Sci. USA 93:8901-8906.
Densmore, L. D., J. W. Wright & W. M. Brown. 1985. Length variation and heteroplasmy are frequent in mitochondrial DNA from parthenogenetic and bisexual lizards (genus Cnemidophorus). Genetics 110:689-707.
Doublet, V., R. Raimond, F. Grandjean, A. Lafitte, C. Souty-Grosset & I. Marcade. 2012. Widespread atypical mitochondrial DNA structure in isopods (Crustacea, Peracarida) related to a constitutive heteroplasmy in terrestrial species. Genome 55:234-244.
Doublet, V., C. Souty-Grosset, D. Bouchon, R. Cordaux & I. Marcade. 2008. A thirty million year-old inherited heteroplasmy. PLoS One 3:e2938.
Engel, D. W. & L. D. Eggert. 1974. The effect of salinity and sex on the respiration rates of excised gills of the blue crab, Callinectes sapidus. Comp. Biochem. Physiol. A 47:1005-1011.
Feng, X., E. P. Williams & A. R. Place. 2017a. Successful identification and discrimination of hatchery reared blue crabs (Callinectes sapidus) released into the Chesapeake Bay using a genetic tag. J. Shellfish Res. 36:277-282.
Feng, X., E. P. Williams & A. R. Place. 2017b. High genetic diversity and implications for determining population structure in the blue crab Callinectes sapidus. J. Shellfish Res. 36:231-242.
Fisher, C. & D. O. F. Skibinski. 1990. Sex-biased mitochondrial DNA heteroplasmy in the marine mussel Mytilus. Proc. R. Soc. Lond. B Biol. Sci. 242:149-156.
Gilkerson, R. W., E. A. Schon, E. Hernandez & M. M. Davidson. 2008. Mitochondrial nucleoids maintain genetic autonomy but allow for functional complementation. J. Cell Biol. 181:1117-1128.
Gold, J. R. & L. R. Richardson. 1990. Restriction site heteroplasmy in the mitochondrial DNA of the marine fish Sciaenops ocellatus (L.). Anim. Genet. 21:313-316.
Hauswirth, W. W. & P. J. Laipis. 1982. Mitochondrial DNA polymorphism in a maternal lineage of Holstein cows. Proc. Natl. Acad. Sci. USA 79:4686-4690.
Holt, I. J., A. E. Harding, R. K. Petty & J. A. Morgan-Hughes. 1990. A new mitochondrial disease associated with mitochondrial DNA heteroplasmy. Am. J. Hum. Genet. 46:428-433.
Kvist, L., J. Martens, A. A. Nazarenko & M. Orell. 2003. Paternal leakage of mitochondrial DNA in the great tit (Parus major). Mol. Biol. Evol. 20:243-247.
Lee, Y. J., S. Y. Jeong, M. Karbowski, C. L. Smith & R. J. Youle. 2004. Roles of the mammalian mitochondrial fission and fusion mediators Fis1, Drp1, and Opa1 in apoptosis. Mol. Biol. Cell 15:5001-5011.
Librado, P. & J. Rozas. 2009. DnaSP v5: a software for comprehensive analysis of DNA polymorphism data. Bioinformatics 25:1451-1452.
Lopez, J. V., S. Cevario & S. J. O'Brien. 1996. Complete nucleotide sequences of the domestic cat (Felis catus) mitochondrial genome and a transposed mtDNA tandem repeat (Numt) in the nuclear genome. Genomics 33:229-246.
Mao, M., A. D. Austin, N. F. Johnson & M. Dowton. 2014. Coexistence of minicircular and a highly rearranged mtDNA molecule suggests that recombination shapes mitochondrial genome organization. Mol. Biol. Evol. 31:636-644.
Moraes, C. T., E. A. Schon, S. DiMauro & A. F. Miranda. 1989. Heteroplasmy of mitochondrial genomes in clonal cultures from patients with Kearns-Sayre syndrome. Biochem. Biophys. Res. Commun. 160:765-771.
Nei, M. 1987. Molecular evolutionary genetics. New York, NY: Columbia University Press.
Nesbo, C. L., M. O. Arab & K. S. Jakobsen. 1998. Heteroplasmy, length and sequence variation in the mtDNA control regions of three percid fish species (Perca fluviatilis, Acerina cernua, Stizostedion lucioperca). Genetics 148:1907-1919.
Parr, R. L., J. Maki, B. Reguly, G. D. Dakubo, A. Aguirre, R. Wittock, K. Robinson, J. P. Jakupciak & R. E. Thayer. 2006. The pseudomitochondrial genome influences mistakes in heteroplasmy interpretation. BMC Genomics 7:185.
Place, A. R., X. Feng, C. R. Steven, H. M. Fourcade & J. L. Boore. 2005. Genetic markers in blue crabs (Callinectes sapidus) II. Complete mitochondrial genome sequence and characterization of genetic variation. J. Exp. Mar. Biol. Ecol. 319:15-27.
Quesada, H., H. Stuckas & D. O. Skibinski. 2003. Heteroplasmy suggests paternal co-transmission of multiple genomes and pervasive reversion of maternally into paternally transmitted genomes of mussel (Mytilus) mitochondrial DNA. J. Mol. Evol. 57:S138-S147.
Sambrook, J., E. F. Fritsch & T. Maniatis. 1989. Molecular cloning: a laboratory manual. New York, NY: Cold Spring Harbor Laboratory Press.
Sano, N., M. Obata & A. Komaru. 2010. Mitochondrial DNA transmitted from sperm in the blue mussel Mytilus galloprovincialis showing doubly uniparental inheritance of mitochondria, quantified by real-time PCR. Zoolog. Sci. 27:611-614.
Schuster, W. & A. Brennicke. 1994. The plant mitochondrial genome: physical structure, information content, RNA editing, and gene migration to the nucleus. Annu. Rev. Plant Biol. 45:61-78.
Solignac, M., M. Monnerot & J. C. Mounolou. 1983. Mitochondrial DNA heteroplasmy in Drosophila mauritiana. Proc. Natl. Acad. Sci. USA 80:6942-6946.
Tajima, F. T. 1993. Introduction to molecular paleopopulation biology. Tokyo, Japan: Scientific Societies Press: Sinauer Associates. pp. 37-59.
Van Hove, J. L., C. Freehauf, S. Miyamoto, G. D. Vladutiu, J. Pancrudo, E. Bonilla, M. A. Lovell, G. W. Mierau, J. A. Thomas & S. Shanske. 2008. Infantile cardiomyopathy caused by the T14709C mutation in the mitochondrial tRNA glutamic acid gene. Eur. J. Pediatr. 167:771-776.
Wang, J. Y., Y. S. Gu, J. Wang, Y. Tong, Y. Wang, J. B. Shao & M. Qi. 2008. MGB probe assay for rapid detection of mtDNA 11778 mutation in the Chinese LHON patients by real-time PCR. J. Zhejiang Univ. Sci. B 9:610-615.
Watterson, G. A. 1975. On the number of segregating sites in genetical models without recombination. Theor. Popul. Biol. 7:256-276.
Williams, S. T. & N. Knowlton. 2001. Mitochondrial pseudogenes are pervasive and often insidious in the snapping shrimp genus Alpheus. Mol. Biol. Evol. 18:1484-1493.
Wirgin, I., M. Pedersen, S. Maceda, B. Jessop, S. Courtenay & J. R. Waldman. 1995. Mixed-stock analysis of striped bass in two rivers of the Bay of Fundy as revealed by mitochondrial DNA. Can. J. Fish. Aquat. Sci. 52:961-970.
Wu, J., R. K. Smith, A. E. Freeman, D. C. Beitz, B. T. McDaniel & G. L. Lindberg. 2000. Sequence heteroplasmy of D-loop and rRNA coding regions in mitochondrial DNA from Holstein cows of independent maternal lineages. Biochem. Genet. 38:323-335.
Ye, C., Y. T. Gao, W. Wen, J. P. Breyer, X. O. Shu, J. R. Smith, W. Zheng & Q. Cai. 2008. Association of mitochondrial DNA displacement loop (CA)n dinucleotide repeat polymorphism with breast cancer risk and survival among Chinese women. Cancer Epidemiol. Biomarkers Prev. 17:2117-2122.
Yednock, B. K., T. J. Sullivan & J. E. Neigel. 2015. De novo assembly of a transcriptome from juvenile blue crabs (Callinectes sapidus) following exposure to surrogate Macondo crude oil. BMC Genomics 16:521.
ERNEST P. WILLIAMS, XIAOJUN FENG AND ALLEN R. PLACE *
Institute of Marine and Environmental Technology, University of Maryland Center for Environmental Science, 701 East Pratt Street, Baltimore, MD 21202
* Corresponding author. E-mail: firstname.lastname@example.org
Caption: Figure 1. Annotated map of the Callinectes sapidus mitochondrial genome. Heavy strand open reading frames are shown in green and complementary strand open reading frames are shown in blue. Structural RNA are shown as arrows with red heads. Primer binding sites for the primers used in this study are shown as arrows with black heads with the direction of the arrow indicating the direction of polymerase extension.
Caption: Figure 2. Median joining network of the nad2 locus. Each circle represents a unique haplotype where the diameter of the circle is the total haplotype frequency. Overlapping sequence of multiple individuals is indicated by pie slices where the size of the slice is the percentage frequency of that individual. Red circles or pie slices indicate paternal sequence, yellow is maternal sequence, and green is offspring sequence.
Caption: Figure 3. Median joining network of the nad4 locus. Each circle represents a unique haplotype where the diameter of the circle is the total haplotype frequency. Overlapping sequence of multiple individuals is indicated by pie slices where the size of the slice is the percentage frequency of that individual. Red circles or pie slices indicate paternal sequence, yellow is maternal sequence, and green is offspring sequence.
Caption: Figure 4. Median joining network of the coI locus. Each circle represents a unique haplotype where the diameter of the circle is the total haplotype frequency. Overlapping sequence of multiple individuals is indicated by pie slices where the size of the slice is the percentage frequency of that individual. Red circles or pie slices indicate paternal sequence, yellow is maternal sequence, and green is offspring sequence.
Caption: Figure 5. Median joining network of the nad2 locus depicting amino acid translations of the nucleotide sequences. Amino acid sequences identical to the dominant sequence of the female and the megalopa are shown in white. Nonsynonymous changes are colored according to the new amino acid state where blue is a positive amino acid, red is a negative amino acid, orange is a polar amino acid, green is a hydrophobic amino acid, gray is a neutral change to a similar amino acid, and black indicates a premature stop codon.
Caption: Figure 6. Coverage map showing the number of reads at each sequence position in the blue crab mitochondrial genome for the (A) Phusion-generated library, (B) Templiphi-generated library, and (C) cDNA-based library. The X axis corresponds to the sequence position of the reference genome GenBank accession NC_006281 and the Y axis is the read count given on a log10 scale.
Caption: Figure 7. Read mappings from the Templiphi-generated muscle library for the (A) coI open reading frame, (B) nd4 open reading frame, (C) D-loop, and (D) nd6 open reading frame. Forward-mapped reads are shown in red with reverse reads in green. Bases in disagreement with the reference are highlighted red for adenine, yellow for guanine, green for thymidine, and blue for cytidine. Portions of the read that did not map to the reference are translucent and gaps are shown with a hyphen.
TABLE 1. Primers used in this study.

Gene    Amplicon length (bp)    Forward primer (5'-3')              Reverse primer (5'-3')
nad2    750                     TGCTTTATTATTCAACCCCG                CCGAATAGATTGATTGAAGT
coI     678                     TTGCTGCTGCTATTGCTCAC                AATACAGCGCCCATGGATAG
nad4    816                     CATTTACTTTTCCTGAACAACATGA           TGCCTTTGTTGGTGTCTTTG
12S     790                     CCAGGTTCACTTTCCAGTAA                ATGAGAGTTGTAGCGGGTAA
16S     16kb                    ATGCTACCTTTGCACGGTCAAGATACCGCGGC    CTTATCAAAGGAAAAGTTTGCGACCTCGATGTTG

TABLE 2. Haplotype frequency and total number of colonies sequenced for the nad2, nad4, and coI loci from the three individuals. Values are number of haplotypes (number of colonies).

Gene (bp)     Total       Female     Male       Megalopa
nad2 (685)    44 (217)    16 (82)    6 (69)     24 (66)
coI (678)     28 (105)    3 (15)     10 (43)    17 (47)
nad4 (713)    7 (32)      2 (8)      3 (19)     3 (5)

TABLE 3. Diversity indices for sequences from the nad2, nad4, and coI loci for the three individuals.

Gene    Diversity index    Total      Female     Male       Megalopa
nad2    Hd                 0.747      0.481      0.167      0.787
        Pi                 0.00838    0.00528    0.00142    0.0081
        k                  5.732      3.62       0.973      5.539
coI     Hd                 0.763      0.257      0.568      0.622
        Pi                 0.00363    0.00059    0.00258    0.00161
        k                  2.453      0.4        1.748      1.09
nad4    Hd                 0.635      0.25       0.205      0.7
        Pi                 0.00736    0.00541    0.00236    0.00112
        k                  5.246      2.5        1.684      0.8

TABLE 4. Read mapping statistics for the six Illumina libraries with average read length given in bases.

Tissue    Source       Reads mapped (%)    Number of reads mapped    Average read length
Testes    Phusion      35.7                14K                       56
          Templiphi    1.1                 23K                       208
          cDNA         59.8                1.58M                     150
Muscle    Phusion      21.8                218K                      107
          Templiphi    1.3                 110K                      143
          cDNA         95.8                9.22M                     159

TABLE 5. Results of variant detection for each library showing SNV, MNV, and INDEL.

Tissue    Source       SNV    MNV    INDEL
Testes    Phusion      29     2      2
          Templiphi    369    148    84
          cDNA         66     13     4
Muscle    Phusion      104    20     11
          Templiphi    524    152    77
          cDNA         278    59     18

TABLE 6. Total number of reads and the percent of mapped reads for each library annotated with split read mapping of more than 10 bases on the 5'-end, the 3'-end, or both ends.

Tissue    Source       5'-split > 10 bases    3'-split > 10 bases    Both ends split    Split of total reads mapped (%)
Testes    Phusion      10,082                 3,961                  1,138              80.9
          Templiphi    17,833                 18,490                 13,376             40.0
          cDNA         19,242                 16,138                 5,474              1.5
Muscle    Phusion      171,191                142,231                110,690            42.2
          Templiphi    86,829                 70,505                 53,170             46.1
          cDNA         49,727                 36,342                 1,267              0.9
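The diversity indices in Table 3 can be related to the clone counts in Table 2 with a short sketch. The Python example below assumes the standard definitions (Nei 1987): haplotype diversity Hd = n/(n-1) x (1 - sum of squared haplotype frequencies), k as the mean number of pairwise differences, and Pi as k divided by the aligned length. It is offered as an illustration under those assumptions, not as the procedure used to generate the table, and the function names and toy sequences are hypothetical. The tabulated values appear consistent with these definitions: for nad2, k = 5.732 over 685 bp gives approximately 0.0084, close to the reported Pi of 0.00838.

```python
from collections import Counter
from itertools import combinations


def haplotype_diversity(sequences):
    """Hd = n/(n-1) * (1 - sum of squared haplotype frequencies)."""
    n = len(sequences)
    if n < 2:
        raise ValueError("need at least two sequences")
    freqs = [count / n for count in Counter(sequences).values()]
    return n / (n - 1) * (1 - sum(f * f for f in freqs))


def mean_pairwise_differences(sequences):
    """k: average number of differing sites over all pairs of sequences.

    Assumes the sequences are aligned and of equal length.
    """
    pairs = list(combinations(sequences, 2))
    if not pairs:
        raise ValueError("need at least two sequences")
    total = sum(sum(a != b for a, b in zip(s1, s2)) for s1, s2 in pairs)
    return total / len(pairs)


def nucleotide_diversity(sequences):
    """Pi: mean pairwise differences divided by the aligned length."""
    return mean_pairwise_differences(sequences) / len(sequences[0])


# Hypothetical toy clone library (not data from this study).
clones = ["AAAA", "AAAA", "AAAT", "AATT"]
print(round(haplotype_diversity(clones), 3))
print(round(mean_pairwise_differences(clones), 3))
print(round(nucleotide_diversity(clones), 4))

# Inferred, not reported: two haplotypes among eight colonies split 7:1.
print(haplotype_diversity(["hapA"] * 7 + ["hapB"]))
```

The last line uses an inferred split rather than reported counts: two haplotypes among eight colonies divided 7:1 give Hd = 0.25, which matches the nad4 entry for the female (2 haplotypes, 8 colonies, Hd 0.25).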
Author: Williams, Ernest P.; Feng, Xiaojun; Place, Allen R.
Publication: Journal of Shellfish Research
Date: Apr 1, 2017