Development and Validation of Targeted Next-Generation Sequencing Panels for Detection of Germline Variants in Inherited Diseases.

With the advent of massively parallel sequencing, commonly called next-generation sequencing (NGS), methodologies, the number of genes implicated in human disease has increased substantially in the last decade. The increase in gene discovery has led to a surge in the number of clinical laboratory tests offered to detect genetic variants associated with inherited disorders. Many disorders, such as sensorineural hearing loss, cardiomyopathy, and RASopathies, are genetically and clinically heterogeneous with variants in numerous genes resulting in the overlapping phenotypes. In contrast to a sequential (gene-by-gene) testing approach, such as Sanger sequencing, a disease-targeted NGS panel focused on the simultaneous analysis of a set of genes associated with a specific clinical indication is often a suitable cost-effective alternative. A laboratory must take gene- and disease-specific parameters into consideration when designing and analytically validating NGS-based gene panels for clinical testing. General guidelines for clinical NGS assays have been published by the American College of Medical Genetics and Genomics, the College of American Pathologists, the National Committee for Clinical Laboratory Standards, and the Association for Molecular Pathology. (1-3) The College of American Pathologists Biochemical and Molecular Genetics Committee has previously published examples of assay validation for molecular genetic testing, including a methods-based approach for validation of laboratory-developed testing by Sanger sequencing and verification of a US Food and Drug Administration-approved assay for cystic fibrosis mutation testing. (4,5) In this manuscript, we describe examples of the design and validation of NGS targeted panels for inherited disorders. Key considerations for test design, assessment of the validity of gene-disease relationships, validation criteria, and quality measures are addressed. 
Specifically, a methods-based validation approach (6) that was implemented at the Children's Hospital of Philadelphia, Philadelphia, Pennsylvania, is described. We report an integrated validation strategy using HapMap samples and samples with specific disease variants for a combined targeted NGS panel for 5 diseases, including early infantile epileptic encephalopathy, craniofacial disorders, RASopathy disorders, hearing loss, and hereditary cancer. Additional test design considerations are highlighted using examples from the Laboratory for Molecular Medicine at Partners HealthCare Personalized Medicine, Boston, Massachusetts.

FAMILIARIZATION AND PLANNING

Disorders with significant locus and allelic heterogeneity, that is, those caused by multiple different sequence variants in one of several different genes, are typically prioritized for panel testing. Key considerations including specimen volume, expected turnaround time, and calculations of labor, time, and cost of reagents to perform and analyze the test can inform decision making about clinical test development. As with other molecular genetic tests, NGS germline panels can have several useful applications, such as confirming a clinical or prenatal diagnosis, facilitating presymptomatic surveillance, and developing strategies for management and early intervention. It is important to select genes with sufficient scientific evidence of a causative role in the disease, as variants in genes that are not yet established as disease causing are difficult to interpret and can lead to inconclusive results. Determination of the clinical validity of genes and corresponding variants often relies on the evidence presented in published literature. However, the type and depth of published evidence vary greatly for different genes, and objectively assessing the clinical validity of the disease association of genes can therefore be challenging. Thus, it is important for clinical laboratories to establish an objective method to curate and classify the evidence used to determine the strength of gene-disease associations. At the present time there are no expert or regulatory guidelines as to what level of evidence warrants inclusion of a gene in a test designed for diagnosis of a specific inherited disorder, and as a consequence, test content can vary substantially across testing laboratories. A comprehensive evidence-based framework for evaluating gene-disease association validity has been recently made available by the Clinical Genome Resource or ClinGen (https://www.clinicalgenome.org/knowledge-curation/gene-curation/).

Once the gene content for a targeted NGS panel is determined, the next step is to determine the genomic region of interest. Many genes have multiple, alternatively spliced transcripts whose spatiotemporal expression can vary. Currently, there is no consensus among laboratories or specifications by regulators in selection of which transcript should be used for sequencing analysis and annotation. Existing approaches range from using a single transcript (eg, the one with the most exons or the one that is predominantly expressed in the tissue of interest) to a more inclusive "all-exon" approach. The constraint to the latter approach is that the relative importance of individual transcripts is often not well defined. Because multiple transcripts may be defined for a particular gene, it is important to include the transcript used for variant reporting in the laboratory report by referencing the messenger RNA transcript and protein sequence numbers for complementary DNA and protein nomenclature, respectively. Once the transcripts for each gene have been selected, coding exons with flanking intronic regions are used to define the region of interest. There is currently no consensus on the length of intronic sequence that should be included in analysis, although most laboratories include sequences from ±10 to 20 bases past the intron-exon boundary in order to detect intronic mutations in the splice donor and acceptor sequences. However, it may be important to include deeper intronic regions, for example if known pathogenic mutations occur within an intron in a specific gene.
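The definition of a region of interest from coding exons plus flanking intronic bases can be sketched as follows. This is a minimal illustration, not the design procedure used by either laboratory: the function name `pad_and_merge`, the interval representation, and the example coordinates are all hypothetical, and a flank of 10 bases is used only because it falls within the range cited above.

```python
from typing import List, Tuple

# (chromosome, start, end), 1-based inclusive coordinates (an assumed convention)
Interval = Tuple[str, int, int]


def pad_and_merge(exons: List[Interval], flank: int = 10) -> List[Interval]:
    """Extend each exon by `flank` bases on both sides, then merge
    intervals that overlap or directly abut after padding."""
    padded = sorted(
        (chrom, max(1, start - flank), end + flank) for chrom, start, end in exons
    )
    merged: List[Interval] = []
    for chrom, start, end in padded:
        if merged and merged[-1][0] == chrom and start <= merged[-1][2] + 1:
            prev = merged[-1]
            merged[-1] = (chrom, prev[1], max(prev[2], end))
        else:
            merged.append((chrom, start, end))
    return merged


# Hypothetical exon coordinates, for illustration only.
example_exons = [("chr4", 100, 200), ("chr4", 215, 300), ("chr4", 500, 600)]
region_of_interest = pad_and_merge(example_exons, flank=10)
target_size = sum(end - start + 1 for _, start, end in region_of_interest)
```

Merging after padding avoids counting bases twice when two exons are separated by a short intron, which matters when the total target size is reported.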

A thorough review of the variant spectrum associated with each gene is essential to identify common pathogenic variants or hot spots and pathogenic variants located outside of typically covered exonic regions, such as deep intronic or untranslated regions. This information is essential to determine the targeted genomic region of interest and can be valuable when selecting validation specimens. It is also important for determining the clinical sensitivity of testing, that is, what percentage of patients with disease will have a mutation that is detectable by the targeted region. For example, CFTR-related disorders are caused by single-nucleotide variants (SNVs) in exons, 5' and 3' untranslated regions, deletions, insertions/duplications, complex rearrangements, and intronic repeat variations. Another example is a common pathogenic variant in the Fabry disease gene (GLA) that would be missed if a standard exon-targeted design is applied. The deep intronic c.640-801G>A variant in GLA is a frequent cause of the X-linked cardiac type of Fabry disease in the Taiwan Chinese population. (7)

Information about the disease, including key clinical indicators, disease mechanism, prevalence, mode of inheritance, penetrance, and expressivity, should be investigated at the test design stage. All of these factors play a critical role when interpreting results and writing a clear, concise report.

Example

For the methods-based validation approach described below (see Analytical Sensitivity, Specificity, and Precision section), 151 genes associated with 5 diseases were combined in one panel (12 genes for RASopathy, 97 for hearing loss, 17 for craniofacial disorders, 11 for hereditary cancer, and 15 for early infantile epileptic encephalopathy; Supplemental Table 1). The supplemental digital content (containing 5 tables and 1 figure) is available at www.archivesofpathology.org in the June 2017 table of contents. One gene, MSX2, was present in both the craniofacial and hearing loss gene lists. Genes and diseases were reviewed and scored for clinical validity. Information about the disease mechanism, mode of inheritance, and transcripts was noted in a central database (data not shown here). All coding exons (±10 base pairs [bp] into the intron) were targeted for each gene. In total a ~0.5-Mb region was targeted for panel design and development.

Using the general guidelines listed above, the following is an example of gene curation for FGFR3 in designing a targeted NGS panel for a craniosynostosis panel. Variants in FGFR3 have been observed in 100% of individuals with Crouzon syndrome with acanthosis nigricans and in 100% of individuals diagnosed with Muenke syndrome. (8) FGFR3 has 3 transcripts and 18 coding exons. In severe presentations, de novo pathogenic variants in affected individuals are observed. Advanced paternal age has been reported to be associated with de novo pathogenic variants in Muenke syndrome. (9) The majority of pathogenic variants in FGFR3 are missense changes that result in an autosomal dominant gain of function effect. There is one recurring pathogenic variant, c.749C>G; p.Pro250Arg, that is the single cause of Muenke syndrome. (10) Based on this information, the targeted capture panel was designed to include genomic regions that encompass the coding region. Validation experiments were designed to include a positive control to confirm that the c.749C>G; p.Pro250Arg variant could be detected using the laboratory-based approach (Table 1).

TECHNICAL CONSIDERATIONS

During the design of an NGS gene panel, it is important for the laboratory to be aware of technical limitations of NGS technology. Many of these limitations may be inherent to all technologies, but some are specific to particular enrichment, sequencing, or bioinformatics techniques.

Technical Limitations

Interference of Homologous Sequences.--There are significant challenges in interrogating medically significant genes with high sequence homology. Some genes, or parts of genes, may not be adequately captured or sequenced to allow for confidence in quality of the data. These include genes with complex sequence contexts such as pseudogenes, genetic rearrangements, and a high GC content. Regions of high homology with other genomic regions, such as pseudogenes or gene duplication events, may lead to false-positive and/or false-negative results due to mismapped reads. It is critical that laboratories assess regions of homology to identify genomic regions within the targeted gene panel that may not be uniquely present in the genome. Variant calls in highly homologous regions that cannot be accurately detected by NGS can often be resolved by other methods such as Sanger sequencing if gene-specific primers can be designed. If that is not possible, affected regions may need to be excluded from the panel. If the excluded gene or region is critical to diagnosis of the disease, other methods such as long-range polymerase chain reaction may need to be used. Regions that are difficult to interrogate by NGS, such as those with high or low GC content and homologous regions, are particularly important to assess during the assay validation, especially in a methods-based validation approach. Short read lengths can make sequence assembly and alignment challenging when homology to other loci is present. Target sequences therefore need to be carefully examined to determine if the sequence context is amenable to short-read NGS. If significant challenges are evident, non-NGS assays may need to be added to the test to ensure optimal clinical validity. (11)

Exon Level Deletions and Duplications.--Deletion or duplication at exon level can be detected via NGS using several commercially available bioinformatics tools; however, the analytical sensitivity and specificity of these changes must be determined by the laboratory. (12)

Repeat Regions.--Clinically significant homopolymer tracts and triplet repeat expansions are usually not able to be detected by standard NGS and are better analyzed using other methods.

Other Limitations.--Mosaicism and low levels of heteroplasmy for mitochondrial DNA variants may not be detected, depending on the depth of sequence coverage and limit of detection that is validated. Depending on the availability of parental DNA, the chromosomal phase of identified pathogenic variants may not be determined (ie, whether variants are in cis or trans). Rare variants in primer or probe hybridization sites may compromise analytical sensitivity. Because a multigene panel is typically focused on the coding regions of the gene, regulatory region and deep intronic variants may not be identified.

Based on the design and validation data, a laboratory may decide whether to use another method to fill in for genomic regions that cannot be accurately analyzed by NGS or to exclude the region from analysis. Genomic regions that are not covered by testing should be included in the assay description and laboratory report and clinical sensitivity calculations should be adjusted accordingly.

Example

Below we provide an example from the Laboratory for Molecular Medicine during the development of a hearing loss panel, which lists genes with high homology and/or high GC content (see Supplemental Table 2 for complete gene list). Table 2 shows (1) the number of exons that are affected by high homology to other loci (here defined as 90% of all bases with a mappability score (11) of <1) and (2) the number of exons with unusually high (>75%) or low (<35%) GC content. The STRC gene illustrates a gene that cannot be analyzed with standard short-read NGS methods (28 of 29 exons have high homology with a pseudogene, with long stretches being 100% identical across exons and introns between the 2 genes). (13) Because this gene is a key contributor to nonsyndromic hearing loss (14) and clinical specificity (the ability to deduce the genetic cause based on the patient's clinical features) is low, it was deemed critical to be included in a diagnostic gene panel for inherited hearing loss, and this can be accomplished by supplementing the NGS assay with a long-range polymerase chain reaction assay that discriminates between the gene and its pseudogene. (13) The TMC1 gene is also affected by homology and GC issues; however, in this case homology is restricted to 1 exon and 4 exons have low GC content. Based on whether these exons are critical, the laboratory director may decide to drop them from the test, particularly if unique Sanger sequencing primers cannot be designed for confirmatory purposes. In this case, unique Sanger sequencing primers were available for the TMC1 exon with homology to another genomic region. Much of the OTOGL gene (34 of 58 exons) has low GC content. Although this may result in poor coverage/data quality, it was considered worth generating validation data first to gauge the true extent of this problem. If the number of exons that fail NGS analysis is low, Sanger sequencing may be used to fill in insufficiently covered regions.
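The GC-content screen described above can be sketched in a few lines. This is an illustrative sketch only: the function names and example sequences are hypothetical, the thresholds (>75% and <35%) are taken from the text, and the homology/mappability assessment is not covered because it requires genome-wide alignment data.

```python
def gc_fraction(sequence: str) -> float:
    """Return the fraction of G and C bases among called (A/C/G/T) bases."""
    called = [base for base in sequence.upper() if base in "ACGT"]
    if not called:
        raise ValueError("sequence contains no A/C/G/T bases")
    return sum(base in "GC" for base in called) / len(called)


def gc_flag(sequence: str, low: float = 0.35, high: float = 0.75) -> str:
    """Classify an exon sequence using the GC thresholds quoted in the text."""
    gc = gc_fraction(sequence)
    if gc > high:
        return "high_gc"
    if gc < low:
        return "low_gc"
    return "ok"


# Hypothetical exon sequences, for illustration only.
exon_sequences = {
    "exon_a": "GGGCCCGCGA",
    "exon_b": "ATATATATGC",
    "exon_c": "ATGCATGC",
}
flags = {name: gc_flag(seq) for name, seq in exon_sequences.items()}
```

Counting the exons per gene that receive a `high_gc` or `low_gc` flag would give a summary of the kind shown in Table 2.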

For genomic regions that are difficult to analyze by NGS, it may be advisable to investigate the feasibility of developing robust Sanger sequencing primers in parallel, for example to explore whether primer design is possible in these regions. The development of companion Sanger sequencing assays can be approached in different ways depending on the laboratory's general operational approach. For small gene panels, it is often possible and efficient to predevelop orthogonal assays for all exons covered by the test. This strategy does not scale with increasing gene content, as many assays will never be needed (either because no variant is ever detected that needs confirmation and/or because the region performs robustly by NGS and does not need confirmation by Sanger sequencing). It may be more practical to restrict Sanger predevelopment to vulnerable regions that have an increased likelihood to fail. A pilot run will identify problem regions (ie, those that always fail), but the scope of most test development efforts is usually insufficient to allow identifying all genomic regions of reduced robustness. It is for those regions that an upfront in silico ascertainment of the targeted test region is most useful. Figure 1 summarizes key concepts and provides a decision matrix for dealing with genomic regions that are difficult to sequence by NGS technology.

Selection of Target Enrichment and Sequencing Techniques

A critical step in test development is choosing which sequencing technology and enrichment techniques to use. Several commercial NGS platforms are available. Each sequencing platform has specific parameters that differ in sequence capacity, sequence read length, sequence run time, and quality and accuracy of the data. Size of the targeted region, type of variation detected, required depth of coverage, projected sample volume, turnaround time requirements, and costs are all considered when choosing a sequencer. For a comprehensive review on NGS technologies, the reader is referred to Mardis (15) and Metzker. (16)

All NGS targeted panels require enrichment of targeted genomic regions prior to sequencing. There are several strategies by which target enrichment can be achieved. These methods include polymerase chain reaction-based capture, molecular inversion probe-based capture, and hybrid capture methods. Each approach varies in sensitivity (percentage of target bases that are represented by one or more sequence reads), specificity (percentage of sequences that map to the intended targets), uniformity (variability in sequence coverage across target regions), reproducibility (correlation of results obtained from replicate experiments), cost, ease of use, and amount of DNA required. Mamanova et al (17) provide a comprehensive review of target-enrichment strategies. The data shown here were generated with a targeted hybridization-based approach using SureSelect for target enrichment (Agilent Technologies, Santa Clara, California) and sequenced using the MiSeq system (Illumina, Inc, San Diego, California).

Bait Design Strategy

Vendors of target capture assays typically allow custom design of baits. One key consideration is bait density, as this will impact capture efficiency (especially in difficult regions). To ensure reliable capture, it is advantageous to choose a baiting strategy that covers each base more than once. However, for very large targets this may not be practical for economic reasons. If complete coverage is critical (which is typically the case for diagnostic NGS testing) an iterative design process may be an option, where a less dense bait tiling is tested first and underperforming regions are then optimized.

A second key consideration is the total bait territory. For certain hybridization-based enrichment techniques, off-target capture is expected. For regions that capture well, it may not be necessary to cover the entire targeted region with baits, as captured fragments typically extend beyond both ends of a given bait. However, coverage at the edges of the targeted region will always be significantly lower, and these bases often are insufficiently covered. If complete coverage above the minimal acceptable number of reads is desired, it can therefore be beneficial to extend the baited region beyond the actual region of interest. Figure 2 shows the impact of these baiting strategies on final coverage for a representative exon.

BIOINFORMATICS PIPELINE FOR ALIGNMENT, VARIANT CALLING, ANNOTATION, AND FILTRATION

Next-generation sequencing produces an extensive amount of sequence data that is typically processed and analyzed in 3 major steps. The primary step, executed by onboard instrument software, translates sequencing signals into linear sequence with associated individual nucleotide base quality scores analogous to Phred scores. This information is compiled into a file format termed .fastq, which is the input for the secondary step during which sequence reads are aligned to a reference sequence. Aligned reads are compiled into a file format termed .bam. Key information in the .bam file includes read alignment location relative to the reference, read mapping quality, depth of read coverage per mapped location, and forward and reverse read distribution when bidirectional sequencing has been performed. The .bam files can be viewed in genome browsers that also allow visualization of variants in reads relative to the reference. The tertiary step uses the .bam file as input into software that determines differences between the aligned reads and the reference sequence and compiles those differences into a variant call file format. (18) The tertiary step also includes annotation of the variants (eg, assignment of c. and p. nomenclature) and association of variants with metadata (eg, variant frequency in populations). Each step is complex, and accomplishing it requires a combination of algorithms and software that may be open source or commercial. The choices of algorithms and software are influenced by sequencing chemistry and instrumentation, the application and types of variants to be detected (eg, SNVs or copy number variants), and the bioinformatics expertise of the laboratory. Critically, it has been shown that different bioinformatics pipelines generate differences in variant outputs and accuracy. (19-21) The imperfect and evolving state of NGS bioinformatics poses challenges for clinical laboratories with regard to choice and evaluation of bioinformatics tools. Further discussion on NGS bioinformatics principles can be found in O'Rawe et al, (19) Reumers et al, (20) and Ross et al. (21)
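To make the per-base quality scores produced in the primary step concrete, the sketch below decodes the quality line of a single .fastq record. It assumes the widely used ASCII offset of 33 and the standard Phred relationship Q = -10 log10(P); the record itself and the function names are hypothetical, and real pipelines rely on established parsers rather than code like this.

```python
from typing import List, Tuple


def parse_fastq_record(record: str, offset: int = 33) -> Tuple[str, str, List[int]]:
    """Split one 4-line FASTQ record into (read name, sequence, Phred scores).

    Assumes quality characters are encoded as ASCII code minus `offset`.
    """
    header, sequence, _separator, quality = record.strip().split("\n")
    scores = [ord(char) - offset for char in quality]
    if len(scores) != len(sequence):
        raise ValueError("sequence and quality strings differ in length")
    return header.lstrip("@"), sequence, scores


def error_probability(phred: int) -> float:
    """Convert a Phred score to the estimated probability of a wrong base call."""
    return 10 ** (-phred / 10)


# Hypothetical record, for illustration only.
name, seq, quals = parse_fastq_record("@read1\nACG\n+\nI5!\n")
worst_base_error = error_probability(min(quals))
```

A Phred score of 20 corresponds to an estimated 1 in 100 chance that the base call is wrong, and a score of 30 to 1 in 1000.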

Once the bioinformatics pipeline has been optimized, a comprehensive validation is performed using sequence reads generated from samples with known variants covering the spectrum of the diagnostic test. As described above, a sufficient number of samples should be analyzed to assess the pipeline's analytic and diagnostic sensitivity and specificity as well as precision. Once a pipeline has been validated, any changes to the protocol need to be documented and revalidated.

Example

For the validation data shown below (see Analytical Sensitivity, Specificity, and Precision), a methods-based validation was performed that encompassed the bioinformatic elements. Read alignment and variant calling were performed with an in-house bioinformatics pipeline that incorporated NovoAlign (Novocraft, Selangor, Malaysia) for read alignment and Picard (for duplicate removal) and the Genome Analysis Toolkit (Broad Institute, Cambridge, Massachusetts) for downstream processing and variant calling (reference sequence: hg19v37). Variant annotation and initial variant filtration were performed with Bench Lab NGS software (Cartagenia, Cambridge, Massachusetts) for variants with coverage of 5X or more. This filtration restricts the data to variants in the Human Genome Mutation Database and/or rare variants with a coding effect such as nonsynonymous, stop loss, stop gain, start loss, insertions/deletions (indels), frameshifts, and variants within the consensus splice site (6 bases in the intron and 2 bases in the exon). Additional information about the alignment and variant calling pipeline is available in Supplemental Figure 1.
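The filtration logic described above can be approximated as a simple rule. This is a sketch under stated assumptions, not the Bench Lab NGS implementation: the `Variant` fields, the effect labels, and both function names are hypothetical; the rarity threshold is not given in the text, so rarity is represented as a precomputed flag; and splice-site variants are assumed to be subject to the same rarity requirement. The splice window ignores strand and first/last exon special cases.

```python
from dataclasses import dataclass

# Hypothetical labels for the coding effects listed in the text.
CODING_EFFECTS = {
    "nonsynonymous", "stop_loss", "stop_gain", "start_loss", "indel", "frameshift",
}


@dataclass
class Variant:
    depth: int        # read depth at the variant position
    in_hgmd: bool     # listed in the Human Genome Mutation Database
    is_rare: bool     # rarity decided upstream; threshold not stated in the text
    effect: str       # one of CODING_EFFECTS, or another label such as "synonymous"
    in_splice_consensus: bool


def in_splice_window(pos: int, exon_start: int, exon_end: int,
                     intron_bases: int = 6, exon_bases: int = 2) -> bool:
    """True if `pos` lies within 6 intronic or 2 exonic bases of an exon boundary."""
    near_start = exon_start - intron_bases <= pos <= exon_start + exon_bases - 1
    near_end = exon_end - exon_bases + 1 <= pos <= exon_end + intron_bases
    return near_start or near_end


def passes_initial_filter(variant: Variant, min_depth: int = 5) -> bool:
    """Keep variants with >=5X coverage that are in HGMD and/or rare with a relevant effect."""
    if variant.depth < min_depth:
        return False
    if variant.in_hgmd:
        return True
    return variant.is_rare and (
        variant.effect in CODING_EFFECTS or variant.in_splice_consensus
    )
```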

All algorithms, software, customizations, and databases used in the analysis of NGS data were documented and versioned. Quality control parameters were developed and documented. Parameters and thresholds that determine the overall quality of a successful sequencing run were established (see Quality Assurance and Quality Control and Supplemental Table 3).

ANALYTICAL VALIDATION

Analytical Sensitivity, Specificity, and Precision

Once the methodology is established and the protocol is optimized in the laboratory, the entire test should be validated, including all steps in the process (wet bench as well as the bioinformatics analyses) using all sample types that will be accepted for the panel (eg, whole blood; saliva; formalin-fixed, paraffin-embedded tissue; buccal swab; cultured amniocytes and chorionic villi). Regulatory requirements and quality management system standards require that laboratories determine assay performance characteristics including analytical sensitivity and specificity and precision or reproducibility. (5,22,23) All 3 of these measures are determined by testing samples that are from individuals with known sequence variants and known negative controls. For validation of an NGS panel, it is not feasible to identify and analyze controls for every possible mutation within the targeted genes; therefore, a methods-based validation approach was taken. The methods-based validation approach incorporates samples with known mutations, particularly targeted to common mutations and specific types of variants or genomic regions that may be more difficult to detect, such as indels, GC-rich regions, and regions of repetitive sequence. Positive control samples that have high-confidence SNV and indel calls by whole genome sequencing, such as NA12878, are available through the Coriell Institute for Medical Research, Camden, New Jersey, and the National Institute of Standards and Technology, Gaithersburg, Maryland. These data were generated by the Genome in a Bottle Consortium by integrating and arbitrating among 14 data sets. (24,25) Positive controls may also be obtained through clinical and/or research laboratories by using previously tested methods such as NGS or Sanger sequencing or SNP arrays.

Analytical sensitivity is the likelihood that the assay will detect a sequence variant when present within the targeted region (1 - false-negative rate). This is determined by dividing the number of known variants (true positives) detected by the NGS targeted panel by the total number of known variants detected by a reference method or data set. It is recommended that recurrent disease-causing variants be included in the analyses because these may be seen frequently in a disease cohort. (6)

Analytical specificity is the likelihood that the assay will be negative when no variant is present (1 - false-positive rate). This measurement is established by dividing the number of true negatives (known reference alleles) by the sum of true negatives and false positives, typically obtained by comparison with the results obtained by a reference method such as Sanger sequencing (or the National Institute of Standards and Technology's high-confidence sequence generated for NA12878).

Knowing that current sequencing platforms and bioinformatic pipelines exhibit differences in their capacity to detect different classes of genetic variations, it is recommended that analytical sensitivity and specificity be established separately for each type of sequence variation such as SNVs, indels, and copy number variants, if applicable.
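The two definitions above, applied separately to each variant class, can be written out as a short calculation. The counts below are invented purely for illustration and are not results from this study; the class and function names are likewise hypothetical.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class Counts:
    tp: int  # known variants detected by the panel
    fn: int  # known variants missed by the panel
    tn: int  # reference positions correctly called as reference
    fp: int  # variant calls not supported by the reference method


def sensitivity(c: Counts) -> float:
    """TP / (TP + FN), i.e. 1 - false-negative rate."""
    if c.tp + c.fn == 0:
        raise ValueError("no known variants available for this class")
    return c.tp / (c.tp + c.fn)


def specificity(c: Counts) -> float:
    """TN / (TN + FP), i.e. 1 - false-positive rate."""
    if c.tn + c.fp == 0:
        raise ValueError("no known negative positions available for this class")
    return c.tn / (c.tn + c.fp)


# Invented counts per variant class, for illustration only.
by_class: Dict[str, Counts] = {
    "SNV": Counts(tp=99, fn=1, tn=9998, fp=2),
    "indel": Counts(tp=18, fn=2, tn=9999, fp=1),
}
report = {
    name: (sensitivity(counts), specificity(counts))
    for name, counts in by_class.items()
}
```

Keeping the counts per class makes it obvious when one class (often indels) rests on far fewer known variants than another, which affects how much confidence the resulting estimate deserves.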

Example

For the methods-based validation study conducted at the Children's Hospital of Philadelphia, 30 samples were used. Among these, 15 samples were previously characterized to carry pathogenic mutations in various target genes across the 5 disease groups (Table 1). The remaining 15 samples were negative controls, which included 13 DNA specimens that tested negative for mutations in selected genes and 2 HapMap samples (NA12878 and NA19240). Genomic DNA was extracted from blood or other patient tissues following standard DNA extraction protocols in the laboratory. Coding regions with 10-bp flanking intronic sequences of genes of interest were enriched using the SureSelectXT Target Enrichment System (Agilent Technologies) for Illumina Paired-End Sequencing Library. Differentially indexed postcapture libraries were sequenced using the MiSeq 2 × 150-bp V2 Reagent Kit (Illumina).

In order to determine sensitivity and specificity of this assay, additional data sets were obtained for select patient samples that had been analyzed previously by alternative technologies. Single-nucleotide variant array data were obtained for 10 of the samples from either the Children's Hospital of Philadelphia cytogenomics laboratory or public databases (such as the 1000 Genomes and EVS database). In addition, whole-genome sequencing data for the 2 HapMap samples were obtained from the Broad Institute and Illumina, respectively, and the consolidated SNV data by the Genome in a Bottle Consortium (24) for one of the HapMap samples (NA12878). For 5 patients, variant information was available on the Noonan panel of genes through a previously validated NGS protocol. Detailed information for samples used in this validation study and corresponding reference data sets are listed in Table 1.

Analytical sensitivity and specificity of the assay were calculated by comparing variants identified in this assay with variants identified in the reference data sets. For samples with SNV array results available, every position with an array call was analyzed for concordance with the result obtained through the NGS assay (ie, the MiSeq result). Discordant variants were further resolved using Sanger sequencing analysis. Results of sensitivity and specificity studies are shown in Table 3 and Supplemental Tables 4 and 5.

Recurring false-positive variants in HRAS and MAP2K2 were identified (Supplemental Table 4). Both of these variants were flagged as having poor quality scores and would have been flagged by the laboratory for confirmation by Sanger dideoxy sequencing analysis. Repeating the assay with the same specimen with a new enrichment kit led to elimination of the 2 false-positive variants. It is unclear whether the original enrichment kit or a sample preparation error in the original assay was responsible for the discordant variants. Based on these results, it is recommended that laboratories leverage validation studies to understand the sources of false negatives and false positives and develop strategies to address them. For example, laboratories may choose to review the quality and alignment of the data using tools such as the Integrative Genomics Viewer. (26) It is recommended that laboratories develop quality metrics for acceptability of variant calls and a policy on when to confirm variants by an orthogonal method such as Sanger sequencing. Based on our experience with this validation and other additional data not shown here, we have set the following parameters for confirmation of variants by Sanger sequencing: (1) any variant with a read depth less than 20, (2) call quality less than 500 (Phred score of confidence P value), (3) genotype quality less than 99, (4) strand bias, with more than 80% of variant reads aligning to a single strand, (5) an allele frequency less than 40% for heterozygous variants or less than 95% for homozygous variants, and (6) any reportable disease-causing variant (classified as variant of uncertain significance, likely pathogenic, or pathogenic; Table 4). To be noted, these parameters are not meant for universal use because they are specific and unique to the sequencing and bioinformatics pipeline being used in a laboratory and are likely to undergo modifications as chemistries and informatics tools get updated. It is recommended that every laboratory determine these parameters based on their experience with their internal laboratory protocols.
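The six confirmation criteria listed above can be encoded as a rule check, sketched below. The thresholds are those stated in the text for this laboratory's pipeline and, as noted, are not intended for universal use. The `VariantCall` fields and the function name are hypothetical, and how each metric is extracted from a laboratory's variant call files is outside the scope of this sketch.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class VariantCall:
    depth: int              # total read depth at the position
    call_quality: float     # Phred-scaled call quality
    genotype_quality: int
    fwd_alt_reads: int      # variant-supporting reads on the forward strand
    rev_alt_reads: int      # variant-supporting reads on the reverse strand
    allele_fraction: float  # fraction of reads supporting the variant
    zygosity: str           # "het" or "hom"
    reportable: bool        # VUS, likely pathogenic, or pathogenic


def confirmation_reasons(call: VariantCall) -> List[str]:
    """Return the criteria (if any) that trigger Sanger confirmation."""
    reasons: List[str] = []
    if call.depth < 20:
        reasons.append("read depth < 20")
    if call.call_quality < 500:
        reasons.append("call quality < 500")
    if call.genotype_quality < 99:
        reasons.append("genotype quality < 99")
    alt_reads = call.fwd_alt_reads + call.rev_alt_reads
    if alt_reads > 0 and max(call.fwd_alt_reads, call.rev_alt_reads) / alt_reads > 0.80:
        reasons.append("strand bias > 80%")
    min_fraction = 0.40 if call.zygosity == "het" else 0.95
    if call.allele_fraction < min_fraction:
        reasons.append("allele fraction below threshold for zygosity")
    if call.reportable:
        reasons.append("reportable variant")
    return reasons


# Hypothetical calls, for illustration only.
clean = VariantCall(150, 2000.0, 99, 40, 35, 0.50, "het", False)
poor = VariantCall(15, 300.0, 60, 14, 1, 0.30, "het", True)
```

Returning the list of triggered criteria, rather than a single yes/no flag, keeps a record of why a call was sent for confirmation, which is useful when thresholds are revisited.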

Two HapMap samples were also used during the validation study. For NA12878, variants within the targeted regions were compared with a reference variant list. Discrepancy among the reference data sets was resolved by further examining the GATK filter and quality score for the Broad WGS data set, variant context, and the filter information in the GIAB variant list. For this study, true negatives were defined as positions without variants in the comparison reference sample. True positives were defined as positions with heterozygous or homozygous variant calls in the comparison reference sample. Comparison between the reference data set and the NGS panel data showed 1 false-positive SNV call and 1 false-positive indel in the panel data set. The false-positive SNV call showed very strong strand bias in the .bam file and the indel call was identified within a homopolymer region (>20 A), indicating that both variants were unlikely to be true positives. In summary, for NA12878, more than 469 121 positions within the targeted region were correctly called as true negatives and 264 variants were called correctly as true positives. For HapMap sample NA19240, the WGS variant data from Illumina and the 1000 Genomes Omni2.5 array data set were obtained and used as the reference data sets. Comparison between the reference data set and the variant set from the panel indicated that 469 047 positions were correctly called as true negatives and 339 variants were called correctly as true positives.

For samples with only one or a few genes previously tested in this laboratory, variant information for those genes was extracted from this NGS assay and compared with the previous test result (Table 1). All SNVs and small indels (<5 bp) that were sufficiently covered (ie, with >30X minimum per-base coverage) were successfully identified. Known pathogenic variants were compared, and the results are shown in Table 1. A mutation in the ARX gene was not identified in the positive control because of low coverage (<30X). Exon 2 of the ARX gene is GC rich and is traditionally a region that is difficult to sequence. Greater coverage increases the probability of correctly calling a variant; however, there are platform-specific upper limits to coverage. In targeted regions with low coverage, Sanger sequencing or another method may need to be incorporated in order to maximize sensitivity. In this study, all low-coverage exons were also analyzed by complementary Sanger sequencing; therefore, this variant was correctly identified.
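Identifying the regions that need complementary Sanger sequencing amounts to scanning per-base depth for runs below the coverage threshold. The sketch below assumes per-base depths are already available as a list for one target region; the function name and input layout are hypothetical, and treating exactly 30X as adequate is an assumption, because the text describes low coverage as <30X.

```python
def low_coverage_intervals(depths, start=1, min_depth=30):
    """Return inclusive (first, last) coordinate intervals with depth < min_depth.

    depths    -- per-base read depths for one contiguous target region
    start     -- genomic coordinate of the first base in `depths`
    min_depth -- bases below this depth are treated as low coverage
    """
    intervals = []
    run_start = None
    for offset, depth in enumerate(depths):
        pos = start + offset
        if depth < min_depth:
            if run_start is None:
                run_start = pos          # open a new low-coverage run
        elif run_start is not None:
            intervals.append((run_start, pos - 1))  # close the current run
            run_start = None
    if run_start is not None:            # run extends to the end of the region
        intervals.append((run_start, start + len(depths) - 1))
    return intervals


# Illustrative depths only; not data from the study.
print(low_coverage_intervals([50, 28, 12, 30, 45, 10], start=100))
```

The resulting intervals could then be passed to primer design for Sanger fill-in or listed in the report as regions that failed analysis.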

In summary, analytical sensitivity and specificity for this method were more than 99% for SNV detection. For indel variants, detection sensitivity and specificity were more than 99% for small indels (<5 bp) and variants within nonhomopolymer regions (<7 of the same nucleotide in a row).
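As a worked illustration of how such figures are derived, the sketch below computes sensitivity and specificity from true/false positive and negative counts and attaches a 95% Wilson score interval. The article does not state which interval method was used for Table 3, so the Wilson method is an assumption here and its bounds need not match the published ones; the function names are hypothetical.

```python
import math


def sensitivity(true_pos, false_neg):
    """Fraction of reference variants that the assay detected."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg, false_pos):
    """Fraction of reference non-variant positions called as non-variant."""
    return true_neg / (true_neg + false_pos)


def wilson_interval(successes, total, z=1.959964):
    """Two-sided Wilson score interval for a proportion (z=1.959964 gives 95%)."""
    if total <= 0:
        raise ValueError("total must be positive")
    p = successes / total
    denom = 1 + z * z / total
    center = (p + z * z / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z * z / (4 * total * total)) / denom
    # Clamp to [0, 1] to guard against floating-point overshoot.
    return max(0.0, center - half), min(1.0, center + half)


# Indel counts from Table 3: 18 of 18 reference indels detected.
print(sensitivity(18, 0), wilson_interval(18, 18))
```

With only 18 indels, the lower bound of the interval sits well below the point estimate of 100%, which is why a larger set of indel-positive samples tightens the claim more than additional SNVs would.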

Precision refers to the reproducibility or "robustness" of the assay, meaning the ability to obtain the same results from the same sample when the assay is performed repeatedly. Both intrarun and interrun reproducibility should be assessed. To evaluate intrarun precision (repeatability), 3 libraries were prepared from the HapMap DNA sample NA12878 in parallel, each with a unique index. An equimolar amount of each library was pooled and sequenced on the same MiSeq flow cell. To evaluate interrun precision (reproducibility), the HapMap NA12878 DNA was captured and sequenced in another independent run. Variants called for each sample/run were compared among the 3 intrarun library samples (NA12878I, NA12878II, and NA12878III) for assessing intrarun repeatability and between the interrun library sample (NA12878) and each of the 3 intrarun samples for assessing interrun reproducibility. Reproducibility was calculated as 1 minus the number of discordant calls divided by the total positions in the region of interest; results are shown in Table 5.
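The reproducibility figure is the fraction of positions in the region of interest at which two replicates agree. A minimal sketch of that arithmetic, using the discordant counts and total from Table 5 (the function and variable names are hypothetical):

```python
def reproducibility(discordant_calls, total_positions):
    """Fraction of positions with concordant calls between two replicates."""
    return 1.0 - discordant_calls / total_positions


TOTAL_POSITIONS = 469121  # total reported in Table 5

# Discordant calls per replicate pairing, from Table 5.
pairings = {
    "NA12878I-NA12878II": 14,
    "NA12878II-NA12878III": 14,
    "NA12878I-NA12878III": 12,
    "NA12878-NA12878I": 9,
    "NA12878-NA12878II": 11,
    "NA12878-NA12878III": 9,
}

for pair, discordant in pairings.items():
    print(f"{pair}: {reproducibility(discordant, TOTAL_POSITIONS):.4%}")
```

Because the denominator is the full set of targeted positions rather than only the variant sites, even a dozen discordant calls leave the figure above 99.99%.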

Reference and Reportable Range

Reference range is defined as the range of test values expected for a designated population of individuals (US CFR 493; February 28, 1992). The range is determined by testing a series of specimens from a given population who are known to be free of the disease of concern. (23) In the example, reference range is defined as the normal variation of sequence within the population that the assay is designed to detect. Variation in normal individuals can include single-base changes, insertions, deletions, and copy number variation. Reportable range is defined as the portion of the genome for which sequence information can be reliably derived for a defined test system. There may be areas of the targeted regions that cannot be sequenced reliably and thus would be excluded from the reportable range. For example, the targeted panel assay validated here is designed to detect only germline mutations and is not validated for detection of somatic mutations. Based on the validation results and the technical limitations of NGS, variants in homopolymer regions, indels more than 5 bp in size, genes with high homology to pseudogenes or within repetitive regions, and exon level copy number variation were determined to be beyond the reportable range of this assay and thus were excluded for calculation of sensitivity and specificity. Mutations within the promoter regions, deep intronic regions, or regulatory elements are outside of the targeted regions of this assay and thus would not be detected.
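Two of the reportable-range exclusions, indel size and homopolymer context, lend themselves to a programmatic check. The sketch below is illustrative only: the function names, the idea of passing a flanking reference sequence, and applying the homopolymer rule to every variant type are assumptions. The limits (indels of more than 5 bp excluded; homopolymers defined as 7 or more identical bases in a row) are taken from the text, which leaves the status of an indel of exactly 5 bp ambiguous.

```python
from itertools import groupby


def longest_homopolymer(sequence):
    """Length of the longest run of a single repeated base in `sequence`."""
    return max((sum(1 for _ in run) for _, run in groupby(sequence.upper())),
               default=0)


def in_reportable_range(ref, alt, flanking_sequence,
                        max_indel_bp=5, max_homopolymer_run=6):
    """Return True if the variant passes the size and homopolymer checks.

    ref, alt          -- reference and alternate alleles (VCF-style strings)
    flanking_sequence -- reference bases around the variant (assumed input)
    """
    indel_length = abs(len(ref) - len(alt))   # 0 for substitutions
    if indel_length > max_indel_bp:
        return False
    if longest_homopolymer(flanking_sequence) > max_homopolymer_run:
        return False
    return True
```

Homologous and repetitive regions and exon-level copy number changes, which the text also excludes, need separate region-based annotation and are not covered by this check.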

VARIANT INTERPRETATION

Genomic data have revealed the complexity of the human genome, and the concept of 1 gene-1 disease has changed. This has implications in all areas of medicine and is not limited to rare diseases. (27) Variant interpretation is typically performed using data from population frequency databases, segregation analysis, mutation databases, reported studies, and putative impact on protein function.

It is recommended that variants be interpreted using the recently published variant classification guidelines. (28,29) Those variants that occur at a high frequency (usually greater than 5%) in a population are often filtered out by bioinformatic analysis. (29) However, for many rare disorders or for particular genes a frequency of 5% may be too high; thus, more stringent thresholds may be used for filtering if information on prevalence and penetrance is available. For reporting a variant, it is important to determine if the effect of the variant is consistent with the patient's phenotype and also to examine the segregation of the variant within the proband's family (when family members are available).
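The frequency-based filtering step can be sketched as a threshold applied to each variant's population allele frequency. In the example below, the 5% default comes from the text; the stricter 0.1% threshold, the variant labels, the record layout, and the choice to retain variants absent from the frequency database are all hypothetical and shown only for illustration.

```python
def passes_frequency_filter(allele_frequency, max_frequency=0.05):
    """Keep a variant unless its population frequency exceeds the threshold.

    allele_frequency -- population allele frequency, or None if the variant
                        is absent from the frequency database (retained here)
    """
    return allele_frequency is None or allele_frequency <= max_frequency


# Hypothetical variants with made-up population frequencies.
variants = [
    {"id": "variant_A", "population_af": 0.12},
    {"id": "variant_B", "population_af": 0.004},
    {"id": "variant_C", "population_af": None},
]

default_pass = [v["id"] for v in variants
                if passes_frequency_filter(v["population_af"])]
strict_pass = [v["id"] for v in variants
               if passes_frequency_filter(v["population_af"], max_frequency=0.001)]
print(default_pass, strict_pass)
```

Keeping the threshold as a parameter lets a laboratory set it per disorder or per gene when prevalence and penetrance data support a more stringent cutoff.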

The American College of Medical Genetics and Genomics first published recommendations for sequence interpretation in 2005, and then again in 2008, with the most recent revision coming in 2015, introducing the 5-term classification system. (28-30) These guidelines are specifically directed toward inherited disease testing in clinical laboratories, though they have also been used for somatic variant classification. For population frequencies, data from large-scale sequencing projects, such as the 1000 Genomes Project, and from projects focused on data aggregation, such as the Exome Aggregation Consortium and Genome Aggregation databases, are now freely available for use in research and diagnostic settings. (31,32) The Clinical Genome Resource aims to improve our understanding of genomic variation through data sharing and collaboration, starting with aggregating sequence and structural variants in the National Center for Biotechnology Information's publicly available ClinVar knowledge base. (33-35) Several other databases can be very helpful in obtaining variant information, such as the Leiden Open Variation Database (www.lovd.nl/3.0/home), the Human Gene Mutation Database (www.hgmd.cf.ac.uk/ac/index.php), and disease-specific databases such as the Clinical and Functional Translation of CFTR Mutation Database (cftr2.org). All publications addressing segregation in families and controls must be carefully reviewed. Functional studies are helpful in determining if the variant impacts normal function or expression. However, these studies may be challenging to interpret because there are no perfect model systems and results may be contradictory among different analyses. The final report should include the variant classification and all the evidence supporting the variant classification, including references and whether the variant(s) detected fully or partially explain the patient's phenotype. (29)

As the number of sequencing variants grows, additional evidence may warrant variant reanalysis. For example, the access to sequence data on more than 60 000 individuals in the Exome Aggregation Consortium database (exac.broadinstitute.org) has led many variants of uncertain significance to be reclassified as benign. To set appropriate expectations, laboratories may develop policies on the reanalysis of genetic data.

QUALITY ASSURANCE AND QUALITY CONTROL

General quality assurance and quality control recommendations are stated in the Clinical Laboratory Improvement Amendments of 1988, and more specific molecular and sequencing quality assurance and quality control recommendations have been articulated by the Clinical and Laboratory Standards Institute (MM9-A2, MM20), the College of American Pathologists, the American College of Medical Genetics and Genomics, and the Association for Molecular Pathology. (3,6,22,36)

A quality assurance program for NGS testing will assess preanalytic, analytic, and postanalytic processes used from enrichment and sequence analysis through reporting. The program addresses problems that arise in the course of testing, such as events that can affect the test result or nonconformance with the laboratory's own policies and procedures. Documentation includes both review of the effectiveness of corrective actions taken and the revision of policies and procedures intended to prevent recurrence.

Documentation of all testing processes is a critical part of laboratory quality assurance. All standard operating protocols of DNA/RNA sample preparation, fragmentation, library preparation, bar coding (molecular indexing), sample pooling, and sequence generation are documented so that each step and subsequent manipulations can be traced. Metrics and quality control parameters used to assess run performance are also documented. Commonly used metrics include the fraction of bases meeting specified quality and coverage thresholds and average coverage/base and target region (Table 4; Supplemental Table 3). The laboratory should define and document acceptance and rejection criteria for each test step. It is critical to determine and summarize regions that failed analysis (eg, because of inadequate coverage) if they are not covered by orthogonal technologies such as Sanger sequencing. Assuring sample traceability throughout the whole analysis workflow is critical so that sample swaps can be easily detected.

The routine application of a validated bioinformatics pipeline is accompanied by monitoring of laboratory-determined quality control metrics. Divergence from expected quality metrics during the analysis of clinical samples requires investigation and resolution. These metrics are assessed per run/sample as well as routinely to detect trends. An example would be when the bioinformatics output of NGS data demonstrates an insufficient number of sequence reads passing an expected or required base quality score threshold. Deviations may indicate a technical aberration or process failure occurring during technical wet bench procedures or during a step in the bioinformatics pipeline. It is suggested that the clinical laboratory review a summary of the quality scores, metrics, and total number of reads to determine overall quality of the run before start of alignment given the time and other resources required. Quality control procedures are designed to ensure expected test performance, detect assay failure, and provide confidence that a reliable result is generated.

Example

For the NGS targeted panel performed at the Children's Hospital of Philadelphia, the quality control metrics along with the criteria used are listed in Table 4. Preanalytic, analytic, and postanalytic metrics of the wet bench as well as the bioinformatics pipeline are established, providing criteria for beginning to end of the NGS workflow. Metrics are monitored per sample/run and assessed monthly as part of a continuous quality improvement program.
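Per-run monitoring against such criteria can be automated as a table of pass rules. The sketch below encodes the sequencing and bioinformatics thresholds listed in Table 4; the metric key names and the function are hypothetical, and treating a missing metric as a failure is an assumption rather than stated laboratory policy.

```python
# Pass rules taken from the sequencing and bioinformatics rows of Table 4.
# Key names are illustrative; a real pipeline would use its own field names.
QC_CRITERIA = {
    "cluster_density_k_per_mm2": lambda v: v < 1350,
    "pct_bases_q30": lambda v: v > 80,
    "total_reads": lambda v: v > 4_000_000,
    "pct_reads_aligned": lambda v: v > 90,
    "mean_target_coverage": lambda v: v > 300,
    "pct_roi_bases_30x": lambda v: v > 95,
}


def failed_qc_metrics(run_metrics):
    """Return the names of metrics that are missing or outside their criterion."""
    failures = []
    for name, passes in QC_CRITERIA.items():
        if name not in run_metrics or not passes(run_metrics[name]):
            failures.append(name)
    return failures


# Made-up run metrics for illustration.
example_run = {
    "cluster_density_k_per_mm2": 1100,
    "pct_bases_q30": 88.5,
    "total_reads": 5_200_000,
    "pct_reads_aligned": 97.1,
    "mean_target_coverage": 280,
    "pct_roi_bases_30x": 96.4,
}
print(failed_qc_metrics(example_run))
```

Storing the list of failed metrics for every run also provides the raw data for the trend monitoring described above.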

IMPLEMENTATION

Before implementation, a validation report must be written and approved by the laboratory director. This report should include an introduction to the test; the diseases/genes being tested; a description of the samples, controls, and methodology used including bioinformatics; validation parameters such as precision, specificity, sensitivity, reportable range, and reference range; and clinical validity and utility of the test. A standard operating procedure is composed that includes test indication, intended use, test principle, specimen handling and storage, reagents and controls, equipment, the stepwise assay procedure, results interpretation and report generation, and references. Integration into the clinical workflow involves training technologists who will perform the test. Training technologists comprises not only technical aspects of running the test, but also disease information to aid in the understanding of a result and its interpretation. Report templates for negative, positive, and uncertain results are drafted; however, customization is often performed and determined by the classification of the observed genetic variants. All equipment that is used should be properly installed, inspected, and maintained continually as long as the test is offered. Procedures for instrument, operation, and performance qualification are available and in place. Quality control and quality assurance measures, including proficiency testing and archiving of records, reports, and tested specimens, should be performed. The billing mechanism and budgetary allocations should also be finalized before the test is operational. Appropriate regulatory agencies may need to be notified (and in some cases may require preapproval) before test implementation. All of these measures need to be ready before offering the test. It is important to keep in mind that validation is a continuous process of monitoring, documentation, and improvement. This is especially significant in the continually evolving field of NGS with frequent improvements in technology and informatics tools. Clinical laboratories must therefore carefully balance improvements in test performance with available resources.

The authors would like to thank Mahdi Sarmady, PhD; Kajia Cao, PhD; Laura Conlin, PhD, FACMG; and Hakon Hakonarson, MD, PhD for their support. We thank Patricia Vasalos, BS, and Jaimie Halley, BS, for providing support and coordination for all the next-generation sequencing validation manuscripts in this series; they both are employees of the College of American Pathologists (Northfield, Illinois).

References

(1.) Aziz N, Zhao Q, Bry L, et al. College of American Pathologists' laboratory standards for next-generation sequencing clinical tests. Arch Pathol Lab Med. 2015; 139(4):481-493.

(2.) Rehm HL, Bale SJ, Bayrak-Toydemir P, et al. ACMG clinical laboratory standards for next-generation sequencing. Genet Med. 2013; 15(9):733-747.

(3.) Gargis AS, Kalman L, Berry MW, et al. Assuring the quality of next-generation sequencing in clinical laboratory practice. Nat Biotechnol. 2012; 30(11):1033-1036.

(4.) Pont-Kingdon G, Gedge F, Wooderchak-Donahue W, et al. Design and analytical validation of clinical DNA sequencing assays. Arch Pathol Lab Med. 2012; 136(1):41-46.

(5.) Lacbawan FL, Weck KE, Kant JA, et al. Verification of performance specifications of a molecular test: cystic fibrosis carrier testing using the Luminex liquid bead array. Arch Pathol Lab Med. 2012; 136(1):14-19.

(6.) College of American Pathologists Molecular Pathology Resource Committee. Molecular checklist. In: College of American Pathologists, ed. Title. July 28, 2015 ed. Northfield, IL: College of American Pathologists; 2015:1-50.

(7.) Lin HY, Chong KW, Hsu JH, et al. High incidence of the cardiac variant of Fabry disease revealed by newborn screening in the Taiwan Chinese population. Circ Cardiovasc Genet. 2009; 2(5):450-456.

(8.) Robin NH, Falk MJ, Haldeman-Englert CR. FGFR-related craniosynostosis syndromes. In: Pagon RA, Adam MP, Ardinger HH, et al., eds. GeneReviews. Seattle: University of Washington, Seattle; 1993:1-42.

(9.) Rannan-Eliya SV, Taylor IB, De Heer IM, Van Den Ouweland AM, Wall SA, Wilkie AO. Paternal origin of FGFR3 mutations in Muenke-type craniosynostosis. Hum Genet. 2004; 115(3):200-207.

(10.) Agochukwu NB, Doherty ES, Muenke M. Muenke Syndrome. In: Pagon RA, Adam MP, Ardinger HH, et al., eds. GeneReviews. Seattle: University of Washington; 1993:1-29.

(11.) Mandelker D, Schmidt RJ, Ankala A, et al. Navigating highly homologous genes in a molecular diagnostic setting: a resource for clinical next-generation sequencing [published online ahead of print May 26, 2016]. Genet Med. 2016. doi:10.1038/gim.2016.58.

(12.) Mason-Suares H, Landry L, Lebo MS. Detecting copy number variation via next generation technology. Curr Genet Med Rep. 2016; 4(3):74-85.

(13.) Mandelker D, Amr SS, Pugh T, et al. Comprehensive diagnostic testing for stereocilin: an approach for analyzing medically important genes with high homology. J Mol Diagn. 2014; 16(6):639-647.

(14.) Francey LJ, Conlin LK, Kadesch HE, et al. Genome-wide SNP genotyping identifies the Stereocilin (STRC) gene as a major contributor to pediatric bilateral sensorineural hearing impairment. Am J Med Genet A. 2012; 158A(2):298-308.

(15.) Mardis ER. Next-generation sequencing platforms. Annu Rev Anal Chem (Palo Alto Calif). 2013; 6:287-303.

(16.) Metzker ML. Sequencing technologies--the next generation. Nat Rev Genet. 2010; 11(1):31-46.

(17.) Mamanova L, Coffey AJ, Scott CE, et al. Target-enrichment strategies for next-generation sequencing. Nat Methods. 2010; 7(2):111-118.

(18.) Danecek P, Auton A, Abecasis G, et al. The variant call format and VCFtools. Bioinformatics. 2011; 27(15):2156-2158.

(19.) O'Rawe J, Jiang T, Sun G, et al. Low concordance of multiple variant-calling pipelines: practical implications for exome and genome sequencing. Genome Med. 2013; 5(3):28.

(20.) Reumers J, De Rijk P, Zhao H, et al. Optimized filtering reduces the error rate in detecting genomic variants by short-read sequencing. Nat Biotechnol. 2012; 30(1):61-68.

(21.) Ross MG, Russ C, Costello M, et al. Characterizing and measuring bias in sequence data. Genome Biol. 2013; 14(5):R51.

(22.) College of American Pathologists Molecular Pathology Resource Committee. All common checklist. In: College of American Pathologists, ed. Title. July 28, 2015 ed. Northfield, IL: College of American Pathologists; 2015:1-39.

(23.) Jennings L, Van Deerlin VM, Gulley ML; College of American Pathologists Molecular Pathology Resource Committee. Recommended principles and practices for validating clinical molecular pathology tests. Arch Pathol Lab Med. 2009; 133(5):743-755.

(24.) Zook JM, Chapman B, Wang J, et al. Integrating human sequence data sets provides a resource of benchmark SNP and indel genotype calls. Nat Biotechnol. 2014; 32(3):246-251.

(25.) Zook JM, Catoe D, McDaniel J, et al. Extensive sequencing of seven human genomes to characterize benchmark reference materials. Sci Data. 2016; 3: 160025.

(26.) Robinson JT, Thorvaldsdottir H, Winckler W, et al. Integrative genomics viewer. Nat Biotechnol. 2011; 29(1):24-26.

(27.) Macarthur DG. Challenges in clinical genomics. Genome Med. 2012; 4(5): 43.

(28.) Richards CS, Bale S, Bellissimo DB, et al. ACMG recommendations for standards for interpretation and reporting of sequence variations: revisions 2007. Genet Med. 2008; 10(4):294-300.

(29.) Richards S, Aziz N, Bale S, et al. Standards and guidelines for the interpretation of sequence variants: a joint consensus recommendation of the American College of Medical Genetics and Genomics and the Association for Molecular Pathology. Genet Med. 2015; 17(5):405-424.

(30.) Maddalena A, Bale S, Das S, Grody W, Richards S; ACMG Laboratory Quality Assurance Committee. Technical standards and guidelines: molecular genetic testing for ultra-rare disorders. Genet Med. 2005; 7(8):571-583.

(31.) Lanthaler B, Wieser S, Deutschmann A, et al. Genotype-based databases for variants causing rare diseases. Gene. 2014; 550(1):136-140.

(32.) Watt S, Jiao W, Brown AM, et al. Clinical genomics information management software linking cancer genome sequence and clinical decisions. Genomics. 2013; 102(3):140-147.

(33.) Landrum MJ, Lee JM, Benson M, et al. ClinVar: public archive of interpretations of clinically relevant variants. Nucleic Acids Res. 2016; 44(D1): D862-D868.

(34.) Peterson TA, Doughty E, Kann MG. Towards precision medicine: advances in computational approaches for the analysis of human variants. J Mol Biol. 2013; 425(21):4047-4063.

(35.) Rehm HL, Berg JS, Brooks LD, et al. ClinGen--the Clinical Genome Resource. N Engl J Med. 2015; 372(23):2235-2242.

(36.) Endrullat C, Glokler J, Franke P, Frohme M. Standardization and quality management in next-generation sequencing. Appl Transl Genom. 2016; 10:2-9.

Avni Santani, PhD; Jill Murrell, PhD; Birgit Funke, PhD; Zhenming Yu, PhD; Madhuri Hegde, PhD; Rong Mao, MD; Andrea Ferreira-Gonzalez, PhD; Karl V. Voelkerding, MD; Karen E. Weck, MD

Accepted for publication November 30, 2016.

Published as an Early Online Release March 21, 2017.

Supplemental digital content is available for this article at www.archivesofpathology.org in the June 2017 table of contents.

From the Department of Pathology and Laboratory Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia (Dr Santani); the Division of Genomic Diagnostics, Children's Hospital of Philadelphia, Philadelphia, Pennsylvania (Drs Santani, Murrell, and Yu); the Department of Pathology, MGH/Harvard Medical School, Boston, Massachusetts (Dr Funke); the Laboratory for Molecular Medicine at Partners HealthCare, Personalized Medicine, Cambridge, Massachusetts (Dr Funke); the Department of Human Genetics, Emory University School of Medicine, Atlanta, Georgia (Dr Hegde); the Department of Pathology, ARUP Laboratories Institute for Clinical and Experimental Pathology (Dr Mao) and the Department of Pathology (Dr Voelkerding), University of Utah School of Medicine, Salt Lake City; the Division of Molecular Diagnostics, Department of Pathology, Virginia Commonwealth University, Richmond (Dr Ferreira-Gonzalez); Genomics and Bioinformatics, ARUP Laboratories, Salt Lake City, Utah (Dr Voelkerding); and the Department of Pathology and Laboratory Medicine and Genetics, University of North Carolina at Chapel Hill (Dr Weck).

Dr Santani received royalties from Agilent Technologies, was a consultant for Invitae, and has an honorarium from Arcadia University, Cambridge Healthtech Institute. The other authors have no relevant financial interest in the products or companies described in this article.

This manuscript is being submitted on behalf of the College of American Pathologists Biochemical and Molecular Genetics Committee and College of American Pathologists Next-Generation Sequencing Project Team.

Reprints: Avni Santani, PhD, Pathology and Laboratory Medicine, Children's Hospital of Philadelphia, 3615 Civic Center Blvd, 716ARC, Philadelphia, PA 19104 (email: SANTANI@email.chop.edu).

Caption: Figure 1. Decision matrix for dealing with homologous sequences. A sample process for evaluating the test contents for homology including possible approaches for dealing with homology is shown. Key decision nodes depend on whether or not alternate assays (such as unique Sanger sequencing assays that discriminate against pseudogenes) need to be developed to ensure maximum clinical utility of the test. * Mappability calculated using 250-bp stretches at 98% homology. (11) Abbreviations: NGS, next-generation sequencing; PCR, polymerase chain reaction.

Table 1. Samples With Previously Identified Heterozygous Variants That Were Used for Validation of the Combined Disease Panels

Gene      Transcript       Variant                        Type    Chr Pos (GRCh37)   Panel

CDKL5     NM_001037343.1   c.2384A>C, p.Asn795Thr         SNV     ChrX:18643255      Epileptic encephalopathy
CDKL5     NM_001037343.1   c.380A>G, p.His127Arg          SNV     ChrX:18598065      Epileptic encephalopathy
FGFR1     NM_023110.2      c.755C>G, p.Pro252Arg          SNV     Chr8:38282208      Craniofacial
FGFR2     NM_000141.4      c.1018T>C, p.Tyr340His         SNV     Chr10:123276899    Craniofacial
FGFR3     NM_001163213.1   c.749C>G, p.Pro250Arg          SNV     Chr4:1803571       Craniofacial
FLCN      NM_144997.5      c.1285delC, p.His429ThrfsX39   Indel   Chr17:17119709     Hereditary cancer
FLCN      NM_144997.5      c.1285dupC, p.His429ProfsX27   Indel   Chr17:17119709     Hereditary cancer
GJB2      NM_004004.5      c.35delG, p.Gly12ValfsX2       Indel   Chr13:20763686     Hearing loss
MAP2K1    NM_002755.3      c.388T>C, p.Tyr130His          SNV     Chr15:66729180     RASopathy
MYO7A     NM_000260.3      c.3719G>A, p.Arg1240Gln        SNV     Chr11:76901153     Hearing loss
MYO7A     NM_000260.3      c.1344-2A>G                    SNV     Chr11:76873164     Hearing loss
PCDH15    NM_033056.3      c.4024C>A, p.Gln1342Lys        SNV     Chr10:55591253     Hearing loss
SLC26A4   NM_000441.1      c.565G>T, p.Ala189Ser          SNV     Chr7:107314758     Hearing loss
SOS1      NM_005633.3      c.1654A>G, p.Arg552Gly         SNV     Chr2:39249915      RASopathy
TWIST1    NM_000474.3      c.396dupG, p.Lys133GlufsX105   Indel   Chr7:19156549      Craniofacial
USH1C     NM_153676.3      c.1069C>T, p.Arg357Trp         SNV     Chr11:17542909     Hearing loss
VHL       NM_000551.3      c.402delA, p.Glu134AspfsX25    Indel   Chr3:10188259      Hereditary cancer

Abbreviations: Chr Pos, chromosome position; indel, insertion-deletion; SNV, single-nucleotide variant.

Table 2. Analysis of Hearing Loss Panel Genes for
GC Content and Regions of Homology

Gene       No. of     No. of     No. of    No. of    Notes
           Exons    Exons With    Exons     Exons
                     Homology    <35% GC   >75% GC
                      Issue

STRC         29         28          0         0      Alert: special
                                                     assay because of
                                                     high homology
                                                     (long stretches
                                                     are 100%
                                                     identical)

TMC1         20         1           4         0      Alert: check if
                                                     exon can be
                                                     covered by
                                                     Sanger (can
                                                     unique primers
                                                     be designed). If
                                                     not, can it be
                                                     dropped?

CISD2        3          1           1         0

OTOGL        58         0          34         0      Watch list: low
                                                     GC-can be
                                                     problematic

MYO6         34         0          27         0      Watch list: low
                                                     GC-can be
                                                     problematic

GPR98        90         0          24         0      Watch list: low
                                                     GC-can be
                                                     problematic

MYO3A        33         0          18         0      Watch list: low
                                                     GC-can be
                                                     problematic

HSD17B4      26         0          13         0      Watch list: low
                                                     GC-can be
                                                     problematic

PCDH15       39         0          12         0      Watch list: low
                                                     GC-can be
                                                     problematic

RDX          14         0          10         0      Watch list: low
                                                     GC-can be
                                                     problematic

SERPINB6     9          0           1         1      Watch list: high
                                                     GC-can be
                                                     problematic

GIPC3        6          0           0         1      Watch list: high
                                                     GC-can be
                                                     problematic

KCNQ1        17         0           0         1      Watch list: high
                                                     GC-can be
                                                     problematic

P2RX2        10         0           0         1      Watch list: high
                                                     GC-can be
                                                     problematic

TMIE         4          0           0         1      Watch list: high
                                                     GC-can be
                                                     problematic

Table 3. Summary of Analytical Sensitivity and Specificity (MiSeq Platform)

Performance Measure                              % (No./Total)               95% CI

Analytical sensitivity (all variants)            100 (1600/1600)             0.99-1.0
Analytical specificity (all variants)            100 (1 986 875/1 986 875)   0.99-1.0
Analytical sensitivity (substitutions)           100 (1582/1582)             0.99-1.0
Analytical specificity (substitutions)           100 (1 048 132/1 048 132)   0.99-1.0
Analytical sensitivity (insertions/deletions)    100 (18/18)                 0.78-1.0
Analytical specificity (insertions/deletions)    100 (938 743/938 743)       0.99-1.0

Table 4. Quality Control (QC) Metrics Monitored for a Targeted Panel

                         Criteria            Metrics Measured per
                                            Sample and Annually for
                                              Continuous Quality
                                                  Improvement

Preanalytic QC

Rejected          Wrong sample, type of     % of specimens rejected
samples           tube, insufficient
                  quantity, clotted,
                  mislabeled

DNA extraction    Optical density 260/      % of specimens with DNA
                  280 between 1.6 and 2.2   extraction failures

                  Optical density 260/
                  230 between 1.6 and 3.0

Analytic QC

Library           Fragmentation size,       % of specimens with
preparation QC    precapture library size   library preparation
                  and concentration,        failures
                  final library
                  preparation size and
                  concentration, sample
                  pooling

Sequencing QC     Cluster density: <1350    % of specimens with
                  k/mm²                     sequencing run failures

                  Base quality: % of
                  bases with Q ≥30
                  should be >80

Bioinformatics    Total reads: >4M          % of specimens with
QC                                          bioinformatics failures

                  % Reads aligned: >90

                  Average target
                  coverage: >300X

                  % Region of interest: %
                  of bases ≥30X depth
                  should be >95

Confirmation      Sanger confirmation of    No. of variants that
using             all reported single-      failed confirmation
alternative       nucleotide variants and
techniques        insertions/deletions.
                  Variants flagged for
                  low quality:

                  Call quality <500

                  Genotype quality <99

                  Read depth <20

                  Strand ratio >80% of
                  variant reads align to
                  single strand

                  Heterozygous allele
                  ratio <40% for variant
                  (>60% for reference)

                  Homozygous allele ratio
                  <95% for variant (>5%
                  for reference)

Postanalytic QC

Average           6-8 weeks                 No. of specimens reported
turnaround time                             within stated turnaround time

Number of         Variable across disease   Monitor trends annually
positive and
negative cases

Amended or        Variable                  Monitor annually
corrected
reports
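
The numeric limits in the QC table above lend themselves to automated checks. The following Python sketch is one possible encoding, not the laboratories' actual pipeline. The dictionary keys, the `VariantCall` fields, and both function names are hypothetical. It also assumes that "between" is inclusive, that strand ratio is the share of variant reads on the dominant strand, and that allele ratio is variant reads divided by total read depth; the table does not define these terms.

```python
from dataclasses import dataclass
from typing import Dict, List


def sample_qc_failures(m: Dict[str, float]) -> List[str]:
    """Names of run- and sample-level metrics outside the tabulated limits."""
    # Keys are hypothetical names; "between" is assumed to be inclusive.
    checks = {
        "od_260_280": 1.6 <= m["od_260_280"] <= 2.2,
        "od_260_230": 1.6 <= m["od_260_230"] <= 3.0,
        "cluster_density_k_per_mm2": m["cluster_density_k_per_mm2"] < 1350,
        "pct_bases_q30": m["pct_bases_q30"] > 80,
        "total_reads": m["total_reads"] > 4_000_000,
        "pct_reads_aligned": m["pct_reads_aligned"] > 90,
        "mean_target_coverage": m["mean_target_coverage"] > 300,
        "pct_roi_bases_30x": m["pct_roi_bases_30x"] > 95,
    }
    return [name for name, passed in checks.items() if not passed]


@dataclass
class VariantCall:
    """Hypothetical per-variant metrics used only for this illustration."""
    call_quality: float
    genotype_quality: float
    read_depth: int
    variant_reads_fwd: int  # variant-supporting reads on the forward strand
    variant_reads_rev: int  # variant-supporting reads on the reverse strand
    zygosity: str           # "het" or "hom"


def low_quality_flags(v: VariantCall) -> List[str]:
    """Names of the low-quality criteria from the table that a call trips."""
    flags = []
    if v.call_quality < 500:
        flags.append("call_quality")
    if v.genotype_quality < 99:
        flags.append("genotype_quality")
    if v.read_depth < 20:
        flags.append("read_depth")

    variant_reads = v.variant_reads_fwd + v.variant_reads_rev
    if variant_reads > 0:
        # Assumed definition: share of variant reads on the dominant strand.
        dominant = max(v.variant_reads_fwd, v.variant_reads_rev)
        if dominant / variant_reads > 0.80:
            flags.append("strand_ratio")

    if v.read_depth > 0:
        # Assumed definition: variant reads as a fraction of total depth.
        allele_ratio = variant_reads / v.read_depth
        if v.zygosity == "het" and allele_ratio < 0.40:
            flags.append("het_allele_ratio")
        if v.zygosity == "hom" and allele_ratio < 0.95:
            flags.append("hom_allele_ratio")
    return flags
```

Returning the names of the failed criteria, rather than a single pass/fail value, makes it straightforward to tally failures by category for the monitoring percentages listed in the table's third column.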

Table 5. Intrarun and Interrun Reproducibility (a)

Intrarun or   Sample Pairing          Discordant
Interrun                             Variants (b)

Intrarun      NA12878I-NA12878II          14
              NA12878II-NA12878III        14
              NA12878I-NA12878III         12
Interrun      NA12878-NA12878I            9
              NA12878-NA12878II           11
              NA12878-NA12878III          9

(a) For all pairings, total variants = 469 121 and reproducibility =
99.99% (95% CI, 0.99-1.0).

(b) Discordant variants were identified as poor quality because of
low coverage or strand bias.
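
As a rough arithmetic check on Table 5, the sketch below computes a per-pairing concordance from the discordant counts. It assumes that the total of 469 121 variants applies to each pairing individually and that reproducibility is the fraction of variants that are not discordant; the source does not state the formula, so both are assumptions, as are the names `TOTAL_VARIANTS`, `DISCORDANT`, and `concordance`.

```python
# Total variants compared per pairing, read from footnote (a) (assumed per pairing).
TOTAL_VARIANTS = 469_121

# Discordant variant counts transcribed from Table 5.
DISCORDANT = {
    ("intrarun", "NA12878I-NA12878II"): 14,
    ("intrarun", "NA12878II-NA12878III"): 14,
    ("intrarun", "NA12878I-NA12878III"): 12,
    ("interrun", "NA12878-NA12878I"): 9,
    ("interrun", "NA12878-NA12878II"): 11,
    ("interrun", "NA12878-NA12878III"): 9,
}


def concordance(discordant: int, total: int = TOTAL_VARIANTS) -> float:
    """Fraction of compared variants that agree between the two replicates."""
    return 1.0 - discordant / total


rates = {pairing: concordance(n) for pairing, n in DISCORDANT.items()}

for (kind, pairing), rate in rates.items():
    print(f"{kind:9s} {pairing:22s} {rate:.6%}")
```

Under these assumptions every pairing, including the two with 14 discordant variants, comes out above 99.99% concordance.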

Figure 2. Impact of baiting strategy on coverage at the end of
target regions. Baiting deeper into the intronic regions (coding DNA
sequence [CDS] + 5 bp versus CDS - 50 bp) improved coverage at the
end positions of the variant reporting region.

CDS+15     Base Coverage      76X
CDS-15     Base Coverage      237X
CDS+/-15   Average Coverage   264X
COPYRIGHT 2017 College of American Pathologists
No portion of this article can be reproduced without the express written permission from the copyright holder.
Copyright 2017 Gale, Cengage Learning. All rights reserved.

Article Details
Author:Santani, Avni; Murrell, Jill; Funke, Birgit; Yu, Zhenming; Hegde, Madhuri; Mao, Rong; Ferreira-Gonza
Publication:Archives of Pathology & Laboratory Medicine
Article Type:Report
Date:Jun 1, 2017