Integration of Technical, Bioinformatic, and Variant Assessment Approaches in the Validation of a Targeted Next-Generation Sequencing Panel for Myeloid Malignancies.

Next-generation sequencing (NGS), or massively parallel sequencing, analysis of myeloid malignancies, including acute myeloid leukemias (AMLs), (1-5) myelodysplastic syndromes (MDSs), (6) myeloproliferative neoplasms (MPNs), (1) and MDS/MPNs, (7) has yielded a number of significant advances in the identification of diagnostic, prognostic, predictive, and therapeutic biomarkers for these disorders. (1,2,8) Genes and variants relevant to tumor progression, tumor evolution in response to therapy, and minimal residual disease assessment in myeloid malignancies have also been identified via whole-genome or whole-exome sequencing, often in concert with whole-transcriptome sequencing. (9-11) Recent advances define AML with biallelic CEBPA mutations as a separate disease entity, (12) with patients exhibiting this molecular profile having a significantly better prognosis (13) compared with patients with wild-type or single-variant CEBPA. Likewise, CALR exon 9 mutations are now considered diagnostic of primary myelofibrosis, (12) and CALR type 1-like (c.1099_1150del and similar variants) and type 2-like (c.1154_1155insTTGTC and similar) variants are known to have differential prognostic impact. (14) The presence of both SF3B1 and JAK2 mutations is diagnostic and prognostic for refractory anemia with ring sideroblasts with thrombocytosis, (15) now classified as a specific subtype of MDS/MPN. (12) Variants in BRAF, SMC1A, SMC3, RAD21, STAG2, NRAS, IDH1, IDH2, SRSF2, and SETBP1 have also recently been reported to be relevant for disease subclassification or treatment across multiple hematologic malignancies. (1,2,16,17) A number of these clinically important genes are known to contain variants with larger insertions/deletions (eg, FLT3, CALR). Other genes are particularly GC rich (CEBPA), and may therefore pose a challenge to amplification during library preparation and subsequent detection using NGS technology.

Several best-practice recommendations for the validation and implementation of NGS approaches have been developed for the clinical diagnostic laboratory setting. (18-20) As with all clinical tests, NGS tests need to be assessed by performance characteristics of analytical sensitivity, specificity, accuracy, reproducibility, linearity, limit of detection, and reportable range. (18) Next-generation sequencing testing has additional unique challenges that must be considered, such as genomic regions that are difficult to sequence and that may require complementary non-NGS analytic solutions or bioinformatics solutions to ensure complete analysis. In addition, NGS assays comprise several distinct parts: wet-bench components, bioinformatics approach(es), and clinical interpretation of the variant calls, each of which needs to be considered in designing and conducting NGS assay validation.

In this report, we describe the validation approach for implementation of an NGS diagnostics test for myeloid malignancies in a hospital-based clinical molecular diagnostic setting. We evaluated the performance characteristics and test limitations of the Illumina TruSight Myeloid Sequencing Panel (TSMP; 54 genes, 568 amplicons; Illumina, San Diego, California), and tested it against a range of sample types and from various hematologic malignancies. We assessed the TSMP for gaps in NGS output and developed complementary approaches to ensure evaluation of all clinically relevant regions. Finally, we assessed bioinformatics and variant assessment approaches to streamline the postanalytic workflow for panel implementation.

METHODS

Assay Design and Genes/Regions Included

The TSMP is an amplicon-based panel targeting regions of 54 genes with known involvement in various hematologic malignancies, including AML, MDS, and MPNs. These genes include variants that provide clinically relevant diagnostic, prognostic, predictive, or therapeutic information. The panel consists of 568 unique amplicons of ~250 base pairs (bp) in length, with a total genomic footprint covering ~141 kb, targeting the complete exonic regions of 15 genes and exonic hot spots of 39 genes.

Validation Samples Used

Samples used in the validation of this test included DNA from peripheral blood leukocytes and bone marrow samples from a number of different hematologic malignancy types, including AML, MDS, MPN, and MDS/MPN. Peripheral blood leukocyte and bone marrow samples were extracted using phenol/chloroform or an automated extraction method where DNA purification was performed using 350 µL of bone marrow or whole blood on the MagAttract DNA Blood Midi M48 Kit and processed on the BioRobot M48 workstation (Qiagen, Hilden, Germany). For analysis of several test performance characteristics, we also used DNA extracted from the reference HapMap samples NA12878 and NA19240. The HapMap cell lines NA12878 and NA19240 have been widely characterized on multiple NGS platforms in various clinical laboratories and are used globally as reference materials for evaluating assays in development. The variant SNP profiles for the HapMap samples NA12878 and NA19240 are known and documented in the Genetic Testing Reference Materials program (http://www.ncbi.nlm.nih.gov/variation/tools/get-rm/; accessed August 19, 2016). This repository houses data obtained by whole-genome sequencing, by whole-exome sequencing, or from targeted tests generated from 12 clinical and research laboratories.

NGS Library Preparation and Sequencing

Library preparation was performed using 50 ng of DNA, using the MiSeq Reagent Kit v3 chemistry according to Illumina's standard protocol. Each library preparation includes sample-specific indices, which allow for pooling of libraries prior to sequencing using 2 × 250-bp reads. All validation samples were run on the Illumina MiSeq platform. Run parameters and data output from each run were obtained and compared against specifications outlined by the manufacturer (Illumina). Cluster densities, reads passing filter, and output greater than Q30 values met and exceeded specifications issued by the manufacturer (>50 million reads passing filter per run; cluster density between 1000 and 1500 K/mm²; Q30 > 80%; median data yield per run, 13.9 gigabases).

Data Analysis and Bioinformatics

Data analysis steps were performed using NextGENe v.2.3.1 (SoftGenetics, State College, Pennsylvania), and included read quality trimming, alignment to the reference genome (version hg19), and calling of single-nucleotide variants and short insertions or deletions. Data were viewed on the NextGENe Viewer v2.3.1. Read alignment and variant calling were also performed using the MiSeq Reporter (MSR) v 2.4 (Illumina) software, and annotated using VariantStudio 2.2 (Illumina). Depth of coverage for all samples was obtained from the Genome Analysis Toolkit's (21) DepthOfCoverage walker with count_reads setting, which allows overlapping regions to contribute to the count (ie, counts all reads independently, even if from the same fragment); duplicate reads were not marked in the processing of the raw data for amplicon-based panels where reads have the same start and end. Depth-of-coverage analysis was then performed using a custom bioinformatics script. This script parses Genome Analysis Toolkit depth-of-coverage outputs, retrieves coverage values for all the loci, and generates locus-based, interval-based, gene-based, and sample-based graphical distributions of coverage. The script also generated lists of loci, intervals, and genes covered less than 500X in percentiles ranging from 5 to 100 by considering all the samples included in the analysis.
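The low-coverage listing step described above can be illustrated with a short sketch. This is not the laboratory's script: the in-memory dictionary layout, the function name low_coverage_loci, and the example loci and depths are all hypothetical, and parsing of the Genome Analysis Toolkit output files is omitted.

```python
from typing import Dict, List


def low_coverage_loci(depths: Dict[str, List[int]],
                      min_depth: int = 500,
                      min_sample_pct: float = 90.0) -> List[str]:
    """Return loci whose depth is below min_depth in at least
    min_sample_pct percent of the samples supplied."""
    flagged = []
    for locus, per_sample in depths.items():
        if not per_sample:
            continue  # no samples recorded for this locus
        below = sum(1 for d in per_sample if d < min_depth)
        pct_below = 100.0 * below / len(per_sample)
        if pct_below >= min_sample_pct:
            flagged.append(locus)
    return flagged


# Hypothetical per-sample depths for three placeholder loci (four samples each).
example = {
    "locus_1": [120, 80, 300, 450],
    "locus_2": [3200, 2900, 4100, 480],
    "locus_3": [5100, 4800, 6000, 5500],
}
print(low_coverage_loci(example))                       # ['locus_1']
print(low_coverage_loci(example, min_sample_pct=25.0))  # ['locus_1', 'locus_2']
```

Varying min_sample_pct from 5 to 100 would reproduce the idea of percentile-based lists; the same function could be applied to intervals or genes if their depths were aggregated first.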

Variants detected in certain genomic regions covering coding regions of genes including IKZF1 and CUX1 were found to be incompletely mapped and annotated on both software systems, NextGENe and VariantStudio. Variants in these genes were manually annotated using alternate sources including Alamut Visual v.2.7.2 software (Interactive Biosoftware, Rouen, France). All samples were further assessed for the presence of the type 1 (22) [c.1099_1150del (p.Leu367fs)] variant in CALR using a laboratory custom-designed CALR detection algorithm, independent of NextGENe and MSR software. Outputs were generated for each sample outlining the number of reads that match to wild-type CALR sequence and the 52-bp deletion variant.
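The details of the CALR detection algorithm are not given here. One simple way to count reads supporting a known deletion versus wild-type sequence is to look for short junction signatures in each read, as sketched below. The function name, the signature length k, and the toy sequences are assumptions for illustration only; the sequences are not CALR sequence, and a production tool would also need to handle reverse-complement reads and sequencing errors.

```python
from typing import Dict, List


def count_deletion_support(reads: List[str], left_flank: str, deleted: str,
                           right_flank: str, k: int = 8) -> Dict[str, int]:
    """Count reads spanning the wild-type junction or the deletion junction.

    The wild-type signature joins the end of the left flank to the start of
    the deleted segment; the deletion signature joins the left flank directly
    to the right flank. The approach assumes the two signatures differ.
    """
    wt_sig = left_flank[-k:] + deleted[:k]
    del_sig = left_flank[-k:] + right_flank[:k]
    if wt_sig == del_sig:
        raise ValueError("signatures are identical; increase k")
    counts = {"wild_type": 0, "deletion": 0, "uninformative": 0}
    for read in reads:
        if del_sig in read:
            counts["deletion"] += 1
        elif wt_sig in read:
            counts["wild_type"] += 1
        else:
            counts["uninformative"] += 1
    return counts


# Toy sequences (hypothetical, not CALR).
left, deleted, right = "ACGTACGTTGCA", "GGATCCGGATCCAATT", "TTAGCCTAGGCA"
wt_read = left + deleted + right
del_read = left + right
print(count_deletion_support([wt_read, wt_read, del_read, "NNNN"],
                             left, deleted, right))
# {'wild_type': 2, 'deletion': 1, 'uninformative': 1}
```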

Criteria for Variant Filtering and Selection for Review

Variant lists were evaluated in regions of coverage greater than 100X, with variant allele fraction greater than 5%. Variants that were present in the reference population data sets (1000 Genomes phase 1 release v3.20101123, 1000 Genomes phase 3 release v5.20130502, (23) variants in the ESP6500SI-V2 data set of the exome sequencing project [http://evs.gs.washington.edu/EVS/, accessed August 2016], annotated with SeattleSeqAnnotation137, Exome Aggregation Consortium (24) release 0.3, Database of Single Nucleotide Polymorphisms (25) build 141 GRCh37.p13) at a global minor allele frequency greater than 1% were filtered out from the analysis. Variants that repeatedly occurred in more than 10% of cases and were not located within a mutation hot-spot region or previously reported in the literature were compiled, recorded as suspected repeating artifacts, and excluded from further analysis. All variant calls that met reporting criteria for depth of coverage and allele fraction were investigated on output files from bioinformatics software packages. Variants that matched all reporting criteria but were detected on only one variant caller were selected for verification by orthogonal methods including Sanger sequencing or restriction fragment length polymorphism and assessed in detail to determine the reason for discordance between callers.
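The filtering criteria above can be expressed as a small decision function. The sketch below is illustrative only: the Variant fields, the triage function, and the example values are hypothetical, and the order in which the criteria are applied is an assumption.

```python
from dataclasses import dataclass


@dataclass
class Variant:
    name: str
    depth: int                 # depth of coverage at the variant position
    vaf: float                 # variant allele fraction, 0 to 1
    population_maf: float      # highest global MAF across reference data sets, 0 to 1
    cohort_fraction: float     # fraction of laboratory cases carrying this call
    hotspot_or_reported: bool  # in a mutation hot spot or reported in the literature


def triage(v: Variant) -> str:
    """Classify a call using the thresholds stated in the text."""
    if v.depth <= 100 or v.vaf <= 0.05:
        return "below_threshold"
    if v.population_maf > 0.01:
        return "polymorphism"
    if v.cohort_fraction > 0.10 and not v.hotspot_or_reported:
        return "suspected_artifact"
    return "review"


# Hypothetical calls, one per outcome.
calls = [
    Variant("variant_A", depth=3000, vaf=0.32, population_maf=0.0,
            cohort_fraction=0.15, hotspot_or_reported=True),
    Variant("variant_B", depth=2500, vaf=0.48, population_maf=0.12,
            cohort_fraction=0.30, hotspot_or_reported=False),
    Variant("variant_C", depth=4000, vaf=0.07, population_maf=0.0,
            cohort_fraction=0.40, hotspot_or_reported=False),
    Variant("variant_D", depth=90, vaf=0.20, population_maf=0.0,
            cohort_fraction=0.01, hotspot_or_reported=False),
]
for c in calls:
    print(c.name, triage(c))
```

Calls classified as "review" would then be compared across variant callers, with single-caller calls sent for orthogonal verification as described.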

Orthogonal Verification Methods

In order to have a high-throughput means to verify NGS-detected variants, we used a laboratory-developed Sequenom MassARRAY platform (Agena Bioscience, San Diego, California) assay, which analyzes 189 variants in 26 genes (hematologic malignancies panel [HMP]). This panel was previously validated using 32 cases that were known positive by other methods for the NPM1 p.Trp288fs, FLT3-TKD, and JAK2 p.Val617Phe variants, along with a panel of 7 cell lines positive for variants in BRAF, JAK2, KRAS, NRAS, and NPM1, with resultant sensitivity and specificity for the HMP of 100% in comparison with non-NGS methods. Other molecular tests routinely used in the clinical laboratory were used for detection of KIT, NPM1, and FLT3-ITD variants as described previously. (26,27) Testing of CALR was performed using DNA extracted from peripheral blood or bone marrow and analyzed for insertions or deletions in exon 9 using fluorescent polymerase chain reaction (PCR) followed by fragment analysis. All samples were also amplified by PCR and sequenced in both directions. Data from other NGS assays (TruSeq Amplicon Cancer Panel; Illumina) were also available for a subset of cases, (28) and were used for comparing results from overlapping regions.

RESULTS

Validation Samples Profiled

Data from a total of 139 cases were profiled to assess panel performance, and included data from 72 AML cases (51.8%), 10 MDS cases (7.2%), 26 MPN cases (18.7%), 6 controls (4.3%), and 25 samples (18.0%) from other hematologic malignancies (Figure 1, A). We detected 375 variants in 41 genes for the 139 cases in the validation cohort; variant types detected include single-nucleotide variants, insertions, deletions, and indels (Figure 1, B and C).

Coverage Profiles Across the Panel

The number of samples profiled in a single NGS run is dependent on the capacity of the sequencing instrument. We profiled up to 24 samples using the TSMP per sequencing run on the MiSeq, such that more than 95% of the targeted region achieves greater than 500X coverage in every sample. Data output from 4 TSMP sequencing runs, profiling 92 cases, indicated that average coverage per sample on the TSMP ranged from 2823X to 4801X (median, 3758X). To identify gaps in coverage, genomic regions with coverage of less than 500X in at least 90% of samples were extracted and analyzed. This includes portions of coding regions in BCOR, CEBPA, CUX1, GATA2, HRAS, RUNX1, and STAG2. Chromosome coordinates for all identified low-coverage regions were compared against those sites with recurrent mutations (>50 occurrences) reported in the COSMIC database in order to determine if there were mutation hot-spot regions located within the region of poor coverage. This analysis indicated that the low-coverage regions identified from the panel do not contain known mutation hot spots in any of the genes. Data from 4 TSMP sequencing runs were also compared at the amplicon level using average coverage per amplicon. Data indicated that 94.7% (538 of 568) of amplicons were covered at greater than 500X; only 5.3% (30 of 568) of amplicons had a coverage less than or equal to 500X (Figure 2, A). Of these, 2.14% (12 of 568) of all amplicons failed to reach a coverage depth of 100X. Gene regions included in these low-coverage amplicon sets (<100X) were consistent with those reported above (Figure 2, B).

Use of Orthogonal Methods for Verification

Seventy-one of 139 cases (51.1%) from the TSMP validation sample set were tested on one or more of the following orthogonal platforms: Sanger sequencing, HMP, the Illumina TruSeq Amplicon Cancer Panel, or routinely used and validated single-gene clinical laboratory tests as described in Methods. The data obtained from different platforms were compared to determine concordance among tests within mutually covered genomic regions. In the 71 samples, variant calls for 162 of 163 mutually covered single-nucleotide variants (99.4%) and insertions/deletions of up to 33 bp were concordant between the TSMP and any one of the other orthogonal testing platforms. The one discordant variant was reported as positive for a FLT3 tyrosine kinase domain (TKD) mutation (typically codon 835) by the single-gene FLT3 assay, but was reported by NGS as a 3-bp in-frame deletion of codon 836. A secondary verification of this variant by Sanger sequencing to clarify this discrepancy could not be performed as there was insufficient sample.

Test Performance Characteristics

We evaluated a range of test performance characteristics for the TSMP, including assessment of interfering substances, analytical sensitivity, analytical specificity, reproducibility and repeatability, accuracy, linearity, and limit of detection, defined as in Clinical and Laboratory Standards Institute MM09-A2, 2014 edition. (47)

Analytical Sensitivity and Specificity.--Analytical sensitivity is defined as the proportion of biological samples that have a positive test result or known variant and that are correctly classified as positive, that is, the likelihood that the assay will detect a sequence variant, if present; analytical specificity is defined as the ability of a test to detect only the target analytes, that is, the probability that the assay will not detect a sequence variation if not present. Samples for which variants were present and called within regions that are mutually covered on TSMP and HMP were identified as true positives. A total of 85 true-positive variants from 38 cases were detected mutually between the TSMP and the HMP. There were no false negatives for variants with greater than 5% allele fraction in regions covered at depth greater than 500X (analytical sensitivity = 100%; 95% CI, 95.75%-100%). Thirty-three samples previously tested on the HMP and known to be negative for variants were also tested on the TSMP. All samples (33 of 33) had no reportable variants within the regions covered in both panels (analytical specificity = 100%; 95% CI, 89.42%-100.00%). There were no variants in this data set that met the criteria for false positives.
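The method used to compute the confidence intervals is not named in the text. The reported lower bounds are consistent with an exact (Clopper-Pearson) binomial interval, which reduces to a closed form when every trial is a success. The sketch below assumes that method; the function name is illustrative.

```python
def exact_lower_bound_all_successes(n: int, confidence: float = 0.95) -> float:
    """Lower Clopper-Pearson bound when all n trials succeed.

    For x == n the exact lower bound is (alpha / 2) ** (1 / n),
    and the upper bound is 1.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    alpha = 1.0 - confidence
    return (alpha / 2.0) ** (1.0 / n)


# 85 of 85 true-positive variants detected; 33 of 33 negative samples negative.
for n in (85, 33):
    print(n, round(100 * exact_lower_bound_all_successes(n), 2))
# 85 95.75
# 33 89.42
```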

Test Precision.--Test precision is defined as the closeness of agreement between indications or measured quantity values obtained by replicate measurements on the same or similar objects under specific conditions, that is, the degree to which a repeated measurement gives the same result. Test precision includes 2 concepts: repeatability and reproducibility. Reproducibility is the degree to which the same sequence is derived when sequencing is performed by multiple operators, with multiple lots of reagents, by more than one instrument, and, when applicable, from site to site (also known as robustness). Repeatability is the degree to which the same sequence is derived in sequencing the same reference sample many times under the same conditions. Test precision was assessed via analysis of the same samples run multiple times in the same NGS run, on different runs, and between technologists.

Interrun Sample Reproducibility.--To test interrun sample reproducibility, allele frequencies for 31 variants from 9 samples profiled in duplicate on different runs were compared. There was 100% concordance in the ability to detect variants between the 2 runs for all samples (Figure 3, A; R² for correlation between variant allele frequencies = 0.94).

Intertechnologist Reproducibility.--To test intertechnologist reproducibility, selected samples were profiled by 2 different technologists on 2 different runs. Allele frequencies for 23 variants from 6 samples were compared. There was 100% concordance in our ability to detect variants between the 2 runs (Figure 3, B; R² for correlation between variant allele frequencies = 0.95).
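The correlation statistic reported for the interrun and intertechnologist comparisons can be computed from paired variant allele frequencies as the squared Pearson correlation coefficient, which equals the coefficient of determination of a simple linear regression. The sketch below assumes that definition; the function name and the example values are hypothetical and are not data from this study.

```python
from typing import Sequence


def r_squared(x: Sequence[float], y: Sequence[float]) -> float:
    """Squared Pearson correlation between paired measurements."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("no variation in one of the inputs")
    return (sxy * sxy) / (sxx * syy)


# Hypothetical allele fractions for the same variants on two runs.
run_1 = [0.10, 0.25, 0.48, 0.07, 0.33]
run_2 = [0.12, 0.24, 0.50, 0.06, 0.30]
print(round(r_squared(run_1, run_2), 3))
```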

Intrarun Sample Reproducibility.--To test intrarun sample reproducibility, 1 sample was profiled 3 times in the same run using 3 different library preparations (index sets 1-3; Figure 3, C). There was 100% concordance in the detection of all 53 variants between the samples.

Test Accuracy, Linearity, and Limit of Detection.--Test accuracy is defined as the closeness of agreement between a measured quantity value and a true quantity value--the degree of agreement between the nucleic acid sequences derived from the assay and a reference sequence. Linearity is the ability of the test to return values directly proportional to the concentration of the analyte in the sample. Limit of detection refers to the lowest amount of analyte measured that can be detected; the minimum detectable allelic fraction in a given sample. To test the accuracy of the TSMP, we used the reference Coriell cell lines NA12878 and NA19240 and compared concordance between the variants called by our informatics pipeline against those reported by the GetRM project. All publicly reported high-quality variants (true-positive variants identified on 2 distinct technologies) listed in the Genetic Testing Reference Materials project for the sample NA12878 were documented and compared with our list of variants. All high-quality variants reported for NA12878 were detected by the TSMP. All high-quality variants published for NA19240 were also identified on the TSMP (data not shown). There was 100% concordance between the variants detected on the TSMP and those previously reported in studies using the HapMap cell lines.

We also titrated the HapMap cell lines NA12878 and NA19240 against each other in a dilution series of 1X, 2X, 4X, 10X, and 100X. All dilutions were assayed using the TSMP to assess variant profiles, identify presence of known variants, and establish limit of detection. Allele frequencies of variants detected in both cell lines demonstrated a linear decline consistent with sample dilution. The TSMP therefore detects variants at frequencies directly proportional to the concentration of the analyte in the sample (Figure 4, A). Similar results were obtained using the cell line NA12878, using a set of 6 known variants (data not shown). The expected and observed allele frequencies for 21 previously described variants uniquely present in cell line NA19240 were identified and compared across 6 dilution levels. All variants (21 of 21) were present and reliably detected at dilutions down to a minimum of 5% variant allele fraction (1:9 dilution). All genomic positions were covered at a minimum depth of coverage of 500X (Figure 4, B). Our data also identified an additional variant that followed the same dilution and variant allele fraction as described above, but this was not a previously reported variant in this cell line. This new variant position was covered at a lower depth of coverage (average 480X). The relatively lower depth of coverage at this position may explain why it was not included in previously reported data sets.
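The expected allele fraction at each dilution follows from simple proportionality. The sketch below assumes that a variant is unique to the diluted cell line, that both cell lines contribute DNA in proportion to the mixing ratio, and that the listed dilution levels are fold-dilution factors; the function name is illustrative. Under these assumptions, a heterozygous variant (50% allele fraction undiluted) is expected at 5% in a 1:9 mixture, which matches the lowest level at which all variants were detected.

```python
def expected_vaf(undiluted_vaf: float, dilution_factor: float) -> float:
    """Expected allele fraction after diluting the source DNA
    dilution_factor-fold into DNA that lacks the variant."""
    if dilution_factor < 1:
        raise ValueError("dilution_factor must be at least 1")
    return undiluted_vaf / dilution_factor


# Heterozygous variant across the dilution factors named in the text.
for factor in (1, 2, 4, 10, 100):
    print(f"{factor}X: {100 * expected_vaf(0.5, factor):.1f}%")
# 1X: 50.0%
# 2X: 25.0%
# 4X: 12.5%
# 10X: 5.0%
# 100X: 0.5%
```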

Additional Analytical Procedures Required for Clinically Relevant Variants

Identifying Variants in CEBPA.--The presence of high GC content (75% in coding region) of CEBPA poses a challenge to sequencing this gene. A large proportion (5 of 6 amplicons) covering the clinically actionable CEBPA gene did not meet a minimum depth of coverage of 500X (Figure 5, A and B). We mapped the 6 amplicons covering the CEBPA regions against previously reported variants located in the gene and reported in the COSMIC database (Figure 5, C). Previously reported CEBPA variants are reported most frequently within codons 290 to 359 (Figure 5, C); the amplicon covering this region performed well to detect variants within the range (Figure 5, B). Amplicons with the lowest average depth of coverage were located in the regions between codons 88 and 160 and between codons 218 and 290 (Figure 5, B). We therefore determined that use of NGS for CEBPA assessment needs to be supplemented with Sanger sequencing, using a modification to previously published primer sets. (29)

Analysis of Variants in the ASXL1 Homopolymer Region (ASXL1 c.1934dupG; p.Gly646fs).--The ASXL1 c.1934dupG variant occurs in a homopolymer run of 8 G nucleotides, and its clinical relevance is controversial in the literature. (30) This variant has been reported to occur as a PCR artifact resulting from polymerase slippage. (30) It has also been detected in the normal population (it is reported in the Exome Sequencing Project at frequencies of 2.27%-3.19%, depending on the population, and in the Exome Aggregation Consortium data set at minor allele frequencies of 0%-0.22%), and may be a potential germline variant. There are recent reports of this variant as a somatic gain-of-function variant relevant to disease (31); in some cases, groups have verified validity of the variant by repeated sequencing, at times using different enzymes and primer sets. (32,33) Our data demonstrate that this variant occurs at a low variant allele fraction (~5%) in a large fraction of cases in the TSMP data set, where it is a suspected artifact (Figure 6, A). However, in 30 of 490 cases tested (6.1%), this variant was detected at a higher variant allele fraction (15%-35%; Figure 6, B inset), where it is a true-positive call. Given that ASXL1 is actionable in a number of disease sites, we sought to verify this variant in our cases where it appeared at a higher variant allele fraction, in the process of evaluating this variant for reporting. Unlike variants that occur repeatedly in our data set and are associated with either low variant allele fraction or low depth of coverage, such as STAG2, c.2124A>T (p.Leu708Phe) (Figure 7, A through C), the ASXL1 c.1934dupG variant has a consistently high depth of coverage (median, 5034X; range, 2351X-11 589X), and a bimodal distribution in variant allele fraction (Figure 6, B).

Analysis of Insertion Variants in FLT3 (FLT3-ITDs).--We assessed TSMP's ability to detect clinically actionable FLT3 insertions (found in ~25% of AMLs), using 31 known positive cases with FLT3-ITD sizes ranging from 24 to 90 bp. Next-generation sequencing detected 15 of 31 positive cases, including 4 cases with FLT3 insertion sizes greater than 25 bp, up to a maximum size of 33 bp (assay sensitivity = 48%; specificity = 100%), necessitating the use of other orthogonal assays to detect larger FLT3 insertions.

Analysis of the Common Type 1 Deletion Variant in CALR c.1099_1150del (p.Leu367fs).--The 52-bp deletion in CALR is a clinically actionable variant that we identified in our data set by using a specific script designed for this purpose, in addition to variant callers used in our laboratory at the time of assay validation. Read data from a total of 387 samples were directed through the script. Analysis indicated that a total of 12 cases were positive for the CALR deletion variant. Ten cases with available DNA were tested by Sanger sequencing or restriction fragment length polymorphism. All (10 of 10) tested variant positive cases identified by the script were successfully verified by orthogonal methods. We further verified that all positive cases that were identified by the script were also detected by the MSR variant caller after increasing the MSR default insertion/deletion size limit to 55 bp. Analysis of CALR variants in the laboratory currently is performed with the use of the MSR caller alone.

Comparison of Bioinformatics Approaches (NextGENe and MSR--VariantStudio)

Variant output from 2 software systems (NextGENe v.2.3.1 and MSR variant caller, annotated using VariantStudio 2.2) was compared for analysis of reportable variants selected for review. NextGENe was previously validated for use with the amplicon-based NGS panels in the laboratory, (28) and was therefore taken as the standard for comparison against MSR and VariantStudio. Using a set of 183 cases containing a total of 581 reviewed variants, we determined that 97% (567 of 581) of reported variants were detected by both software systems. Eight of 581 variants (1.5%) were not detected by the MSR software; all 8 were insertions/deletions greater than 25 bp in length, and therefore the discordance between software systems was attributed to the use of default instrument/software settings that restricted calling of insertions/deletions up to a maximum of 25 bp. Upon review and modification of the MSR indel size settings, all 8 previously undetected variants were identified by this software. Six of 581 variants (1%) were detected solely using the MSR caller, and not NextGENe. Upon reviewing the alignment data for all variants, we determined that all 6 variants were located toward the ends of amplicons, within regions that were covered by primer/probe sites from alternate amplicons that overlapped. Sanger sequencing confirmed that all 6 variants were true-positive calls. As the NextGENe analysis retained read sequence data located within primer sites for all amplicons, the target coverage in the regions containing these variants was oversampled, and therefore led to lower variant allele frequencies, which were therefore filtered out. In order to streamline analyses, we selected variant calls from a single variant caller for implementation. Because the MSR caller was more effective at capturing variants located at selected ends of amplicons, and there were no variants missed by the use of MSR alone, this was selected as a single caller for variant reporting.
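A caller comparison of this kind amounts to set operations on variant identifiers. The sketch below is illustrative: the function name and the placeholder identifiers are hypothetical, and only the counts (567 shared, 8 and 6 caller-specific) are taken from the text.

```python
from typing import Dict, Hashable, Iterable, Set


def compare_callers(calls_a: Iterable[Hashable],
                    calls_b: Iterable[Hashable]) -> Dict[str, Set[Hashable]]:
    """Partition variant identifiers into shared and caller-specific sets."""
    a, b = set(calls_a), set(calls_b)
    return {"both": a & b, "only_a": a - b, "only_b": b - a}


# Placeholder identifiers reproducing the counts reported above.
shared = {("shared", i) for i in range(567)}
nextgene_only = {("nextgene_only", i) for i in range(8)}
msr_only = {("msr_only", i) for i in range(6)}

result = compare_callers(shared | nextgene_only, shared | msr_only)
total = sum(len(s) for s in result.values())
for label, members in result.items():
    print(label, len(members), f"{100 * len(members) / total:.1f}%")
```

In practice each identifier would be a tuple such as (sample, chromosome, position, reference allele, alternate allele), normalized so that the same indel is represented identically by both callers.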

Variant Nomenclature for Some Genomic Regions May Be Incomplete in Commercially Available Software Tools

Using a combination of VariantStudio and NextGENe analysis methods, we were able to detect selected genomic regions that appeared to be incompletely annotated, including regions within IKZF1 and CUX1 that appeared to have detected variants but lacked associated protein amino acid change or complementary DNA changes. These regions required manual annotation by overlapping read alignments against known reference sequences for the genes of interest using software such as Alamut. Cartagenia Bench Lab NGS (Agilent, Santa Clara, California) was also able to provide complete annotations for these regions, with the use of the most updated reference sequences.

Methods for Automating Variant Filtering and Interpretation

As part of the validation and implementation process for the panel, we sought to design and evaluate methods for efficient prioritization of variants for interpretation and reporting, including methods to identify and remove polymorphisms in the absence of matched normal samples. We designed a triaging algorithm that could be used for automation to identify potentially clinically relevant variants from tumor-only analysis of NGS data in hematologic malignancies. Variants called using the Illumina MSR software package for each sample were uploaded into a commercially available tool, Cartagenia Bench NGS v4.2, for analysis (Figure 8). Of all variants detected by NGS (median, 427 variants/case; range, 338-643 variants/case), 35% (median, 150 variants/case; range, 125-172 variants/case) passed all MSR quality criteria. Applying a variant allele fraction threshold resulted in a median of 24 variants/case (range, 11-35 variants/case). Reporting was further restricted to well-covered, exonic nonsynonymous, intronic splice site, and known deleterious synonymous variants, resulting in a median of 2 variants/case for manual review (range, 0-10 variants/case; Figure 9, A). When combined with our internal data set of more than 600 unique variant interpretations across 8 hematologic malignancies, this approach enabled the review and interpretation of previously known variants, and allowed for prioritizing novel variants in order of clinical actionability. (34) By intersecting the sample variant file with known actionable variants previously interpreted in the laboratory, we were able to prioritize highly actionable variants even at low variant allele frequencies for reporting. Comparison of reportable variant output between the manual review process and the software-based automated review process indicated that all variants identified using the conventional variant review process were detected using the automated process if they were present in the input file. This approach recursively uses our laboratory-developed variant knowledge base, and enables us to organize and use variant interpretations easily for generation of clinical and research reports.
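The intersection of a sample's variant calls with a knowledge base of previously interpreted variants can be sketched as a dictionary lookup. The code below is a simplified illustration and does not represent the Cartagenia workflow: the record layout, the function name, the placeholder gene and variant names, and the 5% threshold applied to novel variants are assumptions.

```python
from typing import Dict, List, Tuple

Key = Tuple[str, str]  # (gene, cDNA change)


def prioritize(variants: List[dict], knowledge_base: Dict[Key, str],
               vaf_threshold: float = 0.05):
    """Split calls into previously interpreted variants, kept regardless of
    allele fraction, and novel variants above the allele fraction threshold."""
    known, novel = [], []
    for v in variants:
        key = (v["gene"], v["hgvs_c"])
        if key in knowledge_base:
            known.append({**v, "interpretation": knowledge_base[key]})
        elif v["vaf"] > vaf_threshold:
            novel.append(v)
    return known, novel


# Hypothetical knowledge base and sample calls (placeholder names).
kb = {("GENE_A", "c.100A>G"): "actionable"}
sample = [
    {"gene": "GENE_A", "hgvs_c": "c.100A>G", "vaf": 0.03},  # known, low VAF
    {"gene": "GENE_B", "hgvs_c": "c.200C>T", "vaf": 0.40},  # novel, above threshold
    {"gene": "GENE_C", "hgvs_c": "c.300G>A", "vaf": 0.02},  # novel, below threshold
]
known, novel = prioritize(sample, kb)
print([v["gene"] for v in known], [v["gene"] for v in novel])
# ['GENE_A'] ['GENE_B']
```

New interpretations made during review would be added back to the knowledge base, so that the same variant is recognized automatically in later cases.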

Germline variants are not called from our analysis in the absence of data from matched normal samples, and common polymorphisms were identified using multiple reference population databases. The number of variants identified as polymorphisms using reference databases (National Heart, Lung, and Blood Institute Exome Sequencing Project; Database of Single Nucleotide Polymorphisms; Exome Aggregation Consortium; 1000 Genomes Project, phases 1 and 3) singly and in combination are indicated (Figure 9, B). Data are from a cohort of 30 cases (16 AML, 5 MDS, 3 MPN, and 6 others), and are represented as mean and standard deviation per case.

Considerations About Sample Types in Variant Profile Interpretation

As we anticipated receiving both blood and bone marrow samples for testing on the TSMP, we sought to determine whether sample type impacted sample performance on NGS. DNA extracted from 34 matched cases with both blood and bone marrows as starting material were tested as input for library preparation and sequencing. Both sample types demonstrated equal performance in the assay, generating comparable total numbers of reads, aligned reads, and reads on target. No differences were detected in average depth of coverage or in the fraction of bases with coverage greater than 500X in the 2 sample types (data not shown).

To determine whether sample origin affected mutational abundance and variant profiles, 68 blood- and bone marrow-derived samples from 34 matched patients were run and variants compared (Figure 10, A and B). Thirteen of 34 cases (38%) had at least 1 variant identified in one sample type but not the other. Ten of 13 cases had a total of 14 variants that were identified in the bone marrow, but not the corresponding blood sample (Figure 10, A and B). Two of 13 cases had variants that were discordant in both the blood and the bone marrow sample; both of these cases had blood and bone marrow samples collected at different times (1.5 and 7 months apart). One case had a variant detected in the blood that was not detected in the bone marrow (Figure 10, A and B). Overall this indicates that although samples derived from both blood and bone marrow perform equally well in the assay, the variant profiles can be unique to each sample type, and may be different if the sample collection is temporally separated. Tumor cellularity is also expected to vary between the blood and bone marrow compartments, and is also likely to affect variant profiles.

Confirmatory Testing in Specific Cases

In the bioinformatics analysis and variant assessment workflow, we noted examples of variants where confirmatory testing was required for accurate nomenclature. This was particularly true for indel calls (such as the examples illustrated in Figure 11), where there were 2 alignment possibilities and the call was associated with strand bias. Sanger sequencing was required in these cases to clarify the technical accuracy of the call, its alignment, and the subsequent nomenclature. Sanger sequencing also proved beneficial for confirming the presence of insertion/deletion calls observed close to amplicon ends (within 5 bp). In the example outlined in Figure 12, a deletion of 2 nucleotides was identified in close proximity to the end of 2 amplicons. Sanger sequencing of this region confirmed the deletion at only 1 of the 2 nucleotide positions. Other complexities include the presence of multiallelic sites (identification of more than 2 alleles at the same nucleotide position; Figure 13); Sanger sequencing of this region confirmed the presence of 2 different variant alleles at the same nucleotide position, in addition to the wild-type allele, and also confirmed the presence of the insertion event that was located downstream of the detected triallelic variant site.
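The two situations described above that prompted Sanger confirmation (ambiguous indel alignment with strand bias, and indels within 5 bp of an amplicon end) lend themselves to a simple rule. The Python sketch below is an illustration under stated assumptions, not the laboratory's implementation: the `IndelCall` fields and function names are hypothetical, coordinates are assumed to be 1-based and inclusive, and "within 5 bp" is interpreted as a distance of 5 bp or less from either amplicon end.

```python
from dataclasses import dataclass


@dataclass
class IndelCall:
    # All fields are hypothetical names used for illustration.
    start: int                     # first affected base (1-based, inclusive)
    end: int                       # last affected base (1-based, inclusive)
    amplicon_start: int
    amplicon_end: int
    strand_bias: bool = False
    alternate_alignments: int = 1  # number of plausible alignments for the call


def near_amplicon_end(call, margin_bp=5):
    """True if the call lies within margin_bp of either amplicon end."""
    from_start = call.start - call.amplicon_start
    from_end = call.amplicon_end - call.end
    return from_start <= margin_bp or from_end <= margin_bp


def needs_sanger_confirmation(call, margin_bp=5):
    """Flag indels with ambiguous alignment plus strand bias, or near an amplicon end."""
    ambiguous = call.strand_bias and call.alternate_alignments >= 2
    return ambiguous or near_amplicon_end(call, margin_bp)
```

For example, a 2-bp deletion at positions 297 to 298 of an amplicon spanning positions 100 to 300 lies 2 bp from the amplicon end and would be flagged, whereas the same deletion at positions 150 to 151 would not be flagged unless it also showed strand bias with 2 possible alignments.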

DISCUSSION

In this report, we present details of applying previously described guidelines (18,19) to the validation of the Illumina TSMP in a clinical genome diagnostics setting. We highlight the performance characteristics of the wet-bench analytic test, as well as additional analytic procedures for clinically relevant variants and the use of complementary non-NGS-based tests to address some of the limitations associated with NGS analysis. Furthermore, we discuss the integration of the wet-bench analytic test with bioinformatics, variant assessment, and variant interpretation steps in the overall workflow. We also describe some examples of complex variant analyses and demonstrate the utility of confirmatory or alternate methods in select scenarios.

Others have previously reported on the validation of custom NGS panels for hematologic malignancies. (20,35-37) These studies have shown the range of application of NGS in the context of molecular diagnostics for hematologic malignancies; here, we add to this body of work by describing, in detail, key considerations around the integration of wet-bench, data analysis, and variant assessment approaches, which are all required for successful test implementation. A comprehensive validation approach needs to establish both test performance characteristics and test limitations. This approach allows for the appropriate application of the test to routine diagnostic conditions. For example, evaluating the ability of TSMP to detect variations in CEBPA enables an assessment of whether a separate test for CEBPA is needed. Because of the lack of appropriate coverage across all relevant exons of this gene, we chose to design a Sanger sequencing assay to complement the NGS analysis. Other groups have identified alternative approaches to improve sequence coverage at the CEBPA locus, by long-range PCR to capture the entire CEBPA exonic region followed by Nextera (Illumina) library preparation and inclusion for sequencing as part of the TSMP workflow. (38)

The TSMP was also not capable of detecting FLT3-ITDs greater than 33 bp in size because of limitations in the amplicon-based technology that restrict the generation/sequencing of amplicons containing large insertions, as well as limitations in bioinformatic tools in aligning and calling larger insertions/deletions. There are efforts to improve the informatics tools used for detection of insertions/deletions so that indels up to a size limit of 102 bp can be called. (39) Although these methods improve the variant calling rate at the FLT3 locus, they are unable to identify indels larger than the sequenced read pair, and in those situations, routine laboratory methods remain the gold standard for detection and reporting of these changes.

Analysis of NGS data also requires an evaluation of repeatedly occurring variant calls and suspected artifacts. We include an example of the ASXL1 c.1934dupG (p.Gly646fs) variant that appears as an artifact at low variant allele frequency and a true somatic variant at higher variant allele frequencies, because of the presence of sequencing artifact as a result of the 8-bp homopolymer that is located at the mutation site. We include an assessment of overall variant allele fraction, depth of coverage, call quality, and the frequency with which the variant appears in the data set to identify potential recurring artifacts and distinguish these from true somatic calls. Because of the high background at this site, the imposed detection limit for this locus was different from that of other regions of the panel, and in line with published reports. (20)
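The kind of cohort-level assessment described above (variant allele fraction, depth of coverage, and recurrence in the data set) can be sketched as follows. This Python fragment is a hypothetical illustration: the function names and all numeric thresholds are placeholders chosen for the example, not the criteria applied in this validation, and call quality is omitted for brevity.

```python
from statistics import median


def summarize_recurrent_call(observations, cohort_size):
    """Summarize one recurrent variant call across a cohort.

    observations: list of (vaf_percent, depth) tuples, one per case with the call.
    """
    vafs = [vaf for vaf, _ in observations]
    depths = [depth for _, depth in observations]
    return {
        "recurrence": len(observations) / cohort_size,
        "median_vaf": median(vafs),
        "median_depth": median(depths),
    }


def suspected_artifact(summary, min_recurrence=0.10,
                       max_median_vaf=8.0, min_median_depth=100):
    """Flag a call that recurs often and is typically low-VAF or poorly covered.

    The default thresholds are illustrative assumptions only.
    """
    recurrent = summary["recurrence"] >= min_recurrence
    weak = (summary["median_vaf"] <= max_median_vaf
            or summary["median_depth"] < min_median_depth)
    return recurrent and weak
```

A flag from a rule of this kind would only prompt review; as noted above, the same variant can be an artifact at low variant allele frequency and a true somatic call at higher frequencies, so each occurrence still needs to be judged against the site-specific background.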

Our current system of reporting includes verification of selected variant types: variants that occur close to the end of the target amplicons and complex indels in actionable genes should be sequence verified. Any variant that is detected in a homopolymeric region is verified, with close attention to the corresponding negative control sample to ensure that reported frequency of the variant is adequately above background signal (noise) detected within the region of interest. Minor disease clones may also be relevant to disease progression and patient management. (9-11) The NGS approach is capable of reproducibly identifying variants with a lower limit of detection of 5%; however, Sanger sequencing is not able to reliably verify variants at this level. Although not specifically addressed in this validation, potential solutions would be the incorporation of additional orthogonal tests, such as droplet digital PCR for relevant variants, with lower detection limits, into the verification workflow.

The density of variants detected by TSMP also necessitated a review of bioinformatics and data analysis practices to ensure repeatability and accuracy of analysis. Automated variant triage and assessment using commercially available tools such as Cartagenia Bench Lab NGS enabled rapid identification of reportable variants. This tool also provided an in silico solution to the absence of a matched normal by using multiple population databases, in combination, to identify and eliminate common polymorphisms.
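To make the triage logic concrete, the sketch below encodes one reading of the algorithm summarized in the Figure 8 caption (quality filter, actionable variant list, polymorphism filter at MAF > 1%, then VAF and depth cutoffs). It is a simplified Python illustration and not the Cartagenia Bench Lab NGS configuration used in this study: the `Variant` fields, the return labels, and the order in which retained variants are handled are assumptions, and the CEBPA rule is assumed to relax only the depth cutoff.

```python
from dataclasses import dataclass, field


@dataclass
class Variant:
    # Field names are hypothetical; VAF is in percent, MAF as a fraction.
    gene: str
    vaf: float
    depth: int
    passed_quality: bool
    exonic: bool = True
    coding_or_splice: bool = True
    is_indel: bool = False
    clinvar_pathogenic: bool = False
    predicted_deleterious: bool = False
    on_actionable_list: bool = False
    population_mafs: dict = field(default_factory=dict)  # database -> MAF


def triage(v):
    # Step 1: PASS calls advance; flagged calls advance only if exonic and >500X.
    if not v.passed_quality and not (v.exonic and v.depth > 500):
        return "filtered: quality"
    # Step 2: matches against the curated actionable variant list go to review.
    if v.on_actionable_list:
        return "review: actionable list"
    # Step 3: retain ClinVar pathogenic, predicted deleterious, and indel calls;
    # otherwise drop variants with MAF > 1% in any population database.
    if v.clinvar_pathogenic or v.predicted_deleterious or v.is_indel:
        return "review: retained"
    if any(maf > 0.01 for maf in v.population_mafs.values()):
        return "filtered: polymorphism"
    # Step 4: coding or splice-site variants must meet VAF and depth cutoffs.
    if not (v.exonic and v.coding_or_splice):
        return "filtered: noncoding"
    if v.gene == "CEBPA":
        min_depth = 50
    elif v.vaf > 15:
        min_depth = 100
    else:
        min_depth = 500
    if v.vaf >= 5 and v.depth >= min_depth:
        return "review: meets VAF and depth cutoffs"
    return "filtered: below cutoffs"
```

Checking every population database and filtering on a hit in any one of them mirrors the combined use of databases described above, which is what allows common polymorphisms to be removed without a matched normal sample.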

Finally, we assessed the applicability of our previously published somatic variant classification system (34) to variants identified in hematologic malignancies during the assessment and interpretation of clinically reportable variants. In many cases, for genes without mutational hot spots, the presence or absence of a variant in the gene, rather than the variant itself, is relevant to actionability. In the context of our classification system, variants in these genes are identified as class 3, and may be more biologically relevant than class 2 variants (known mutational hot spots in a gene that are known to be actionable in other indications). Furthermore, a larger body of evidence is available for the actionability of genes and variants in hematologic malignancies in the context of diagnosis, prognosis, and/or outcomes in response to treatment, which affects how the variant being assessed is interpreted and classified. In addition, some reports suggest that the actionability of a given variant depends upon the molecular profile of the patient; that is, a variant may be clinically actionable in the presence or absence of other specific changes. (1,2) For example, a patient with biallelic CEBPA mutations exhibits an actionable variant profile comprising 2 variants formally classified as class 3A; either variant in isolation, however, is not actionable. (40,41) Several other somatic variant classification systems have been recently reported in the literature. (42-46) Although these classification systems are adaptable to the hematologic malignancy context, applying those focused more specifically on drug trials, as well as those focused on variant-level interpretation and classification, poses greater challenges.

CONCLUSIONS

In summary, we describe the validation of a commercially available NGS panel with application to hematologic malignancies. We assessed the panel for its performance characteristics--analytic sensitivity and specificity; reproducibility and repeatability; test accuracy, linearity, and limit of detection--as well as for genomic regions that required additional analytic approaches in order to implement successful, comprehensive testing. We also highlighted considerations around bioinformatics analysis and requirements for automation of variant assessment that we undertook in order to streamline the workflow for this panel in the diagnostic setting. In evaluating an NGS panel for validation and implementation, we recommend that diagnostic laboratories consider a comprehensive evaluation of the test against the full scope of its intended application, and implement it alongside appropriate complementary testing to ensure no loss of clinically actionable information.

Funding for this work was provided by the Princess Margaret Cancer Foundation, Genome Canada (Genomic Applications Partnership Program), and the Ontario Genomics Institute. We thank Patricia Vasalos, BS, for providing support and coordination for all the NGS validation manuscripts in this series; she is an employee of the College of American Pathologists (Northfield, Illinois).

Please Note: Illustration(s) are not available due to copyright restrictions.

References

(1.) Patel JP, Gonen M, Figueroa ME, et al. Prognostic relevance of integrated genetic profiling in acute myeloid leukemia. N Engl J Med. 2012; 366(12):1079-1089.

(2.) Papaemmanuil E, Gerstung M, Bullinger L, et al. Genomic classification and prognosis in acute myeloid leukemia. N Engl J Med. 2016; 374(23):2209-2221.

(3.) Metzeler KH, Herold T, Rothenberg-Thurley M, et al. Spectrum and prognostic relevance of driver gene mutations in acute myeloid leukemia. Blood. 2016; 128(5):686-698.

(4.) Ruffalo M, Husseinzadeh H, Makishima H, et al. Whole-exome sequencing enhances prognostic classification of myeloid malignancies. J Biomed Inform. 2015; 58:104-113.

(5.) Cancer Genome Atlas Research Network. Genomic and epigenomic landscapes of adult de novo acute myeloid leukemia. N Engl J Med. 2013; 368(22):2059-2074.

(6.) Haferlach T, Nagata Y, Grossmann V, et al. Landscape of genetic lesions in 944 patients with myelodysplastic syndromes. Leukemia. 2014; 28(2):241-247.

(7.) Mason CC, Khorashad JS, Tantravahi SK, et al. Age-related mutations and chronic myelomonocytic leukemia. Leukemia. 2016; 30(4):906-913.

(8.) Grossmann V, Kohlmann A, Eder C, et al. Molecular profiling of chronic myelomonocytic leukemia reveals diverse mutations in >80% of patients with TET2 and EZH2 being of high prognostic relevance. Leukemia. 2011; 25(5):877-879.

(9.) Klco JM, Miller CA, Griffith M, et al. Association between mutation clearance after induction therapy and outcomes in acute myeloid leukemia. JAMA. 2015; 314(8):811-822.

(10.) Walter MJ, Shen D, Ding L, et al. Clonal architecture of secondary acute myeloid leukemia. N Engl J Med. 2012; 366(12):1090-1098.

(11.) Ding L, Ley TJ, Larson DE, et al. Clonal evolution in relapsed acute myeloid leukaemia revealed by whole-genome sequencing. Nature. 2012; 481(7382):506-510.

(12.) Arber DA, Orazi A, Hasserjian R, et al. The 2016 revision to the World Health Organization classification of myeloid neoplasms and acute leukemia. Blood. 2016; 127(20):2391-2405.

(13.) Pastore F, Kling D, Hoster E, et al. Long-term follow-up of cytogenetically normal CEBPA-mutated AML. J Hematol Oncol. 2014; 7:55.

(14.) Tefferi A, Lasho TL, Tischer A, et al. The prognostic advantage of calreticulin mutations in myelofibrosis might be confined to type 1 or type 1-like CALR variants. Blood. 2014; 124(15):2465.

(15.) Broseus J, Alpermann T, Wulfert M, et al. Age, JAK2(V617F) and SF3B1 mutations are the main predicting factors for survival in refractory anaemia with ring sideroblasts and marked thrombocytosis. Leukemia. 2013; 27(9):1826-1831.

(16.) Makishima H, Yoshida K, Nguyen N, et al. Somatic SETBP1 mutations in myeloid malignancies. Nat Genet. 2013; 45(8):942-946.

(17.) Hou HA, Liu CY, Kuo YY, et al. Splicing factor mutations predict poor prognosis in patients with de novo acute myeloid leukemia. Oncotarget. 2016; 7(8):9084-9101.

(18.) Gargis AS, Kalman L, Berry MW, et al. Assuring the quality of next-generation sequencing in clinical laboratory practice. Nat Biotechnol. 2012; 30(11):1033-1036.

(19.) Rehm HL, Bale SJ, Bayrak-Toydemir P, et al. ACMG clinical laboratory standards for next-generation sequencing. Genet Med. 2013; 15(9):733-747.

(20.) Kanagal-Shamanna R, Singh RR, Routbort MJ, Patel KP, Medeiros LJ, Luthra R. Principles of analytical validation of next-generation sequencing based mutational analysis for hematologic neoplasms in a CLIA-certified laboratory. Expert Rev Mol Diagn. 2016; 16(4):461-472.

(21.) McKenna A, Hanna M, Banks E, et al. The Genome Analysis Toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data. Genome Res. 2010; 20(9):1297-1303.

(22.) Tefferi A, Wassie EA, Lasho TL, et al. Calreticulin mutations and long-term survival in essential thrombocythemia. Leukemia. 2014; 28(12):2300-2303.

(23.) 1000 Genomes Project Consortium, Auton A, Brooks LD, et al. A global reference for human genetic variation. Nature. 2015; 526(7571):68-74.

(24.) Lek M, Karczewski KJ, Minikel EV, et al. Analysis of protein-coding genetic variation in 60 706 humans. Nature. 2016; 536(7616):285-291.

(25.) Sherry ST, Ward M-H, Kholodov M, et al. dbSNP: the NCBI database of genetic variation. Nucleic Acids Res. 2001; 29(1):308-311.

(26.) How J, Sykes J, Gupta V, et al. Influence of FLT3-internal tandem duplication allele burden and white blood cell count on the outcome in patients with intermediate-risk karyotype acute myeloid leukemia. Cancer. 2012; 118(24):6110-6117.

(27.) Brandwein JM, Hedley DW, Chow S, et al. A phase I/II study of imatinib plus reinduction therapy for c-kit-positive relapsed/refractory acute myeloid leukemia: inhibition of Akt activation correlates with complete response. Leukemia. 2011; 25(6):945-952.

(28.) Misyura M, Zhang T, Sukhai MA, et al. Comparison of next-generation sequencing panels and platforms for detection and verification of somatic tumor variants for clinical diagnostics. J Mol Diagn. 2016; 18(6):842-850.

(29.) Ahn JY, Seo K, Weinberg O, Boyd SD, Arber DA. A comparison of two methods for screening CEBPA mutations in patients with acute myeloid leukemia. J Mol Diagn. 2009; 11(4):319-323.

(30.) Abdel-Wahab O, Kilpivaara O, Patel J, Busque L, Levine RL. The most commonly reported variant in ASXL1 (c.1934dupG;p.Gly646TrpfsX12) is not a somatic alteration. Leukemia. 2010; 24(9):1656-1657.

(31.) Balasubramani A, Larjo A, Bassein JA, et al. Cancer-associated ASXL1 mutations may act as gain-of-function mutations of the ASXL1-BAP1 complex. Nat Commun. 2015; 6:7307.

(32.) Schnittger S, Eder C, Jeromin S, et al. ASXL1 exon 12 mutations are frequent in AML with intermediate risk karyotype and are independently associated with an adverse outcome. Leukemia. 2013; 27(1):82-91.

(33.) Gelsi-Boyer V, Brecqueville M, Devillier R, Murati A, Mozziconacci M-J, Birnbaum D. Mutations in ASXL1 are associated with poor prognosis across the spectrum of malignant myeloid diseases. J Hematol Oncol. 2012; 5(1):12.

(34.) Sukhai MA, Craddock KJ, Thomas M, et al. A classification system for clinical relevance of somatic variants identified in molecular profiling of cancer. Genet Med. 2016; 18(2):128-136.

(35.) Bartels S, Schipper E, Hasemeier B, Kreipe H, Lehmann U. Routine clinical mutation profiling using next generation sequencing and a customized gene panel improves diagnostic precision in myeloid neoplasms. Oncotarget. 2016; 7(21):30084-30093.

(36.) McKerrell T, Moreno T, Ponstingl H, et al. Development and validation of a comprehensive genomic diagnostic tool for myeloid malignancies. Blood. 2016; 128(1):e1-e9.

(37.) Shen W, Szankasi P, Sederberg M, et al. Concurrent detection of targeted copy number variants and mutations using a myeloid malignancy next generation sequencing panel allows comprehensive genetic analysis using a single testing strategy. Br J Haematol. 2016; 173(1):49-58.

(38.) Yan B, Hu Y, Ng C, et al. Coverage analysis in a targeted amplicon-based next-generation sequencing panel for myeloid neoplasms. J Clin Pathol. 2016; 69(9):801-804.

(39.) Kadri S, Zhen CJ, Wurst MN, et al. Amplicon Indel Hunter is a novel bioinformatics tool to detect large somatic insertion/deletion mutations in amplicon-based next-generation sequencing data. J Mol Diagn. 2015; 17(6):635-643.

(40.) Dufour A, Schneider F, Metzeler KH, et al. Acute myeloid leukemia with biallelic CEBPA gene mutations and normal karyotype represents a distinct genetic entity associated with a favorable clinical outcome. J Clin Oncol. 2010; 28(4):570-577.

(41.) Li HY, Deng DH, Huang Y, et al. Favorable prognosis of biallelic CEBPA gene mutations in acute myeloid leukemia patients: a meta-analysis. Eur J Haematol. 2015; 94(5):439-448.

(42.) Van Allen EM, Wagle N, Stojanov P, et al. Whole-exome sequencing and clinical interpretation of formalin-fixed, paraffin-embedded tumor samples to guide precision cancer medicine. Nat Med. 2014; 20(6):682-688.

(43.) Carr TH, McEwen R, Dougherty B, et al. Defining actionable mutations for oncology therapeutic development. Nat Rev Cancer. 2016; 16(5):319-329.

(44.) Meric-Bernstam F, Johnson A, Holla V, et al. A decision support framework for genomically informed investigational cancer therapy. J Natl Cancer Inst. 2015; 107(7).

(45.) Andre F, Mardis E, Salm M, Soria JC, Siu LL, Swanton C. Prioritizing targets for precision cancer medicine. Ann Oncol. 2014; 25(12):2295-2303.

(46.) Li MM, Datto M, Duncavage EJ, et al. Standards and Guidelines for the Interpretation and Reporting of Sequence Variants in Cancer: A Joint Consensus Recommendation of the Association for Molecular Pathology, American Society of Clinical Oncology, and College of American Pathologists. J Mol Diagn. 2017; 19(1):4-23.

(47.) Clinical and Laboratory Standards Institute. Nucleic Acid Sequencing Methods in Diagnostic Laboratory Medicine; Approved Guideline--Second Edition. Wayne, PA: Clinical and Laboratory Standard Institute; 2014. CLSI document MM09-A2.

Mariam Thomas, PhD; Mahadeo A. Sukhai, PhD; Tong Zhang, MSc; Roozbeh Dolatshahi, MSc; Djamel Harbi, MD, MSc; Swati Garg, PhD; Maksym Misyura, PhD; Trevor Pugh, PhD; Tracy L. Stockley, PhD; Suzanne Kamel-Reid, PhD

Accepted for publication January 12, 2017.

Published as an Early Online Release March 3, 2017.

From the Laboratory Medicine Program, Advanced Molecular Diagnostics Laboratory, Departments of Pathology and Genetics (Drs Thomas, Sukhai, Garg, Misyura, Stockley, and Kamel-Reid), the Princess Margaret Cancer Centre (Drs Thomas, Sukhai, Garg, Misyura, Pugh, Stockley, and Kamel-Reid and Ms Zhang), and High Performance Computing and Bioinformatics Services, Princess Margaret Genomics Centre (Dr Harbi and Mr Dolatshahi), University Health Network, Toronto, Ontario, Canada; and the Departments of Medical Biophysics (Drs Pugh and Kamel-Reid) and Laboratory Medicine and Pathobiology (Drs Stockley and Kamel-Reid), The University of Toronto, Toronto, Ontario, Canada.

The authors have no relevant financial interest in the products or companies described in this article.

Reprints: Suzanne Kamel-Reid, PhD, Clinical Laboratory Genetics, Department of Pathology, University Health Network, The University of Toronto, Room 11-418, 200 Elizabeth Street, Toronto, Ontario, Canada, M5G 2C4 (email: suzanne.kamel-reid@uhn.ca).

Caption: Figure 2. Determination of depth of coverage for amplicons in the TruSight Myeloid Sequencing Panel (TSMP; Illumina, San Diego, California), and evaluation of regions of poor average depth of coverage. A, A representative line graph depicting the distribution of the depth of coverage for all amplicons present in the TSMP from 23 cases run on a sequencing run. The line graph illustrates the cumulative frequency of amplicons with average coverage depth less than or equal to the minimum for each interval bin as indicated on the x-axis. Data indicated that 538 of 568 amplicons (94.7%) were covered at greater than 500X; 30 of 568 amplicons (5.3%) had a coverage less than or equal to 500X. B, Low-performing amplicons (average coverage depth <500X) in specific genes are indicated in the box plot depicting median coverage values and coverage ranges for 23 samples on a representative sequencing run.

Caption: Figure 3. Performance characteristics of the TruSight Myeloid Sequencing Panel (TSMP; Illumina, San Diego, California): interrun, intrarun, and intertechnologist reproducibility. Performance characteristics of the TSMP demonstrating interrun (A), intertechnologist (B), and intrarun (C) reproducibility. A, Allele frequencies for 31 variants from 9 samples profiled in duplicate on different runs are indicated in the scatter plot. There was 100% concordance in the detection of variants between the 2 runs for all samples (r² = 0.94). B, Allele frequencies for 23 variants from 6 samples were compared to test for intertechnologist reproducibility. All variants were detected with 100% concordance between the 2 runs with a high degree of correlation (r² = 0.95). C, Intrarun reproducibility is demonstrated by profiling 1 sample 3 times in the same run using 3 different library preparations. Variant allele frequencies for 53 variants identified in the sample were compared among the 3 runs, and the graph illustrates mean and standard deviation. There was 100% concordance in the detection of all 53 variants among the samples. For this analysis, all variants, including polymorphisms, were included.

Caption: Figure 4. Performance characteristics of the TruSight Myeloid Sequencing Panel (TSMP; Illumina, San Diego, California) demonstrating linearity and limit of detection of the TSMP using known variants in a dilution series of the HapMap cell line NA19240. A, Variant allele frequencies for 22 variants are depicted at each dilution level for the NA19240. Data indicate robust detection of 22 variants up to an expected allele fraction of 5% at the 1:9 dilution level. Data are shown for 1 homozygous variant call, and the remaining 21 heterozygous variants that are uniquely present in NA19240. B, Graph of depth of coverage assessed for each of the 22 variants at each dilution level reported in (A), demonstrating that all variant positions were covered at a minimum depth of coverage of 500X, with the exception of one that was covered at 480X. Bars represent mean ± standard deviation of the depth-of-coverage values for each variant in all samples (n = 5) illustrated in (A).

Caption: Figure 5. Evaluation of depth of coverage for amplicons for the clinically relevant CEBPA gene. A, Schematic representing the exonic coding region of the clinically relevant CEBPA gene, and the distribution of the 6 TruSight Myeloid Sequencing Panel (TSMP; Illumina, San Diego, California) amplicon sets across the length of the gene. B, Depth of coverage across each amplicon set (denoted by the codons they cover) is indicated. Data are derived from a total of 92 cases tested on the TSMP, and are shown as box-and-whisker plots; the lines in each box represent the 25th, 50th, and 75th percentiles of depth of coverage for each amplicon; error bars represent the 95% CIs for the data. C, Data representing the number of variant occurrences deposited in the publicly available Catalogue of Somatic Mutations in Cancer (COSMIC) database and their distribution along the length of the CEBPA gene.

Caption: Figure 6. Assessment of ASXL1 c.1934dupG (p.Gly646fs) variant occurrences in cases profiled on the TruSight Myeloid Sequencing Panel (TSMP; Illumina, San Diego, California). A, ASXL1 c.1934dupG (p.Gly646fs) variant occurrences within the larger cohort (N = 490) of cases profiled on the TSMP. This analysis used sample sets that extended beyond the validation cohort in order to draw on experience gained during panel implementation and routine running of cases, and to interrogate variants showing trends that may not have been observed in the relatively smaller sample sets used for assay validation. Data indicate the variant allele fraction (VAF) and corresponding depth of coverage for each sample tested. B, A histogram of VAF demonstrates the bimodal distribution of variant allele fractions, with a high frequency of variants detected at VAF between 1% and 8%, and an increased abundance of variants detected at higher VAF (12%-40%). A histogram restricted to variant calls that have allele fraction greater than 10% demonstrates presence of a small subset of variants detected at VAF between 12% and 40% (inset).

Caption: Figure 7. Assessment of a recurrent variant within a larger cohort of cases profiled on the TruSight Myeloid Sequencing Panel (Illumina, San Diego, California), and identified to be a suspected artifact. A, Distribution of variant allele fraction (VAF) and depth of coverage for each case identified with the STAG2 c.2124A>T (p.Leu708Phe) variant call. Note that these variant calls are associated with lower overall depth of coverage. B, A histogram of VAF demonstrates the distribution of the variant across different allele frequencies. C, A histogram demonstrating the distribution of depth-of-coverage values for the variant in consideration. Data indicate that more than 95% of variants were covered at a depth of less than 100X. The overall low coverage at this variant position, the relatively high recurrence of this variant in our data set, and its absence from publicly available data sets lead us to classify this call as a likely artifact of the assay rather than a true-positive call.

Caption: Figure 8. Variant triage algorithm applied to prioritize and assess the input list of variants from the TruSight Myeloid Sequencing Panel (Illumina, San Diego, California). Variants are first put through a quality filter step where they are assessed for call quality, and where variant calls that pass all quality filters (PASS) identified by the MiSeq Reporter (MSR; Illumina) software are advanced through the analysis pipeline. Selected variants that are flagged with quality concerns (No PASS) such as strand bias are assessed for location and coverage (exonic, coverage >500X). All variants that pass through the quality filter step are then compared against a curated list of known deleterious and actionable set of variants (actionable variant list), and selected for review if they match entries within this list. The next filter step aims to remove polymorphisms that are present in population variant databases at high minor allele frequency (MAF > 1%); any variants classified as pathogenic in the ClinVar database, predicted deleterious in silico, and any insertion/deletion variants are retained for manual review. All remaining variants located in exonic regions and leading to coding changes, or located at splice sites, are flagged for review if they meet a variant allele fraction (VAF) cutoff of 5% and a depth-of-coverage cutoff of 500X. A lower depth-of-coverage cutoff (100X) is used for variants with VAF greater than 15%. Variants in the CEBPA gene are reviewed and assessed at a lower depth-of-coverage threshold (50X) to account for known challenges in sequencing the gene.

Caption: Figure 9. Results of variant triaging algorithm. A, Number of variants meeting filter categories that are represented in Figure 8. Analysis of a validation set of 30 cases using this approach resulted in a median of 2 variants/case (range, 0-10 variants/case), prioritized for review. B, In the absence of data from matched normal samples, common polymorphisms were identified using multiple population variant databases. The number of variants identified as polymorphisms using reference databases (Exome Sequencing Project [ESP]; Single Nucleotide Polymorphism Database [dbSNP]; Exome Aggregation Consortium [ExAC]; and 1000 Genomes Project, phases 1 [1000G_P1] and 3 [1000G_P3]) singly and in combination are indicated. Data are from a cohort of 30 cases, and are represented as mean and standard deviation.

Caption: Figure 11. Example of a variant call with strand bias selected for confirmatory testing. Screenshot of reads indicating variant calls with 2 different alignments detected in the CALR gene. Sanger sequencing confirmed presence of the variant.

Caption: Figure 12. Example of variant calls located at end of amplicon selected for confirmatory testing. Two end-of-amplicon deletion calls in the ASXL1 gene are indicated, only one of which was verified by Sanger sequencing.

Caption: Figure 13. Example of a complex variant call selected for confirmatory testing. An insertion variant identified upstream of a multiallelic variant call in the WT1 gene is indicated; all variant calls were verified by Sanger sequencing.

Caption: Figure 1. Summary of cases used in the validation of the TruSight Myeloid Sequencing Panel (Illumina, San Diego, California), and the variant types detected in the genes present on the panel. A, Distribution of cases used in the validation cohort, including acute myeloid leukemia (AML), myeloproliferative neoplasm (MPN), myelodysplastic syndrome (MDS), control cases, and others. B, Distribution of variant types identified in the validation cohort, including single-nucleotide variants (SNV), insertions, deletions, and insertion/deletions (indels). C, Distribution of variant types identified in each gene. A diversity of variant types (including SNVs, insertions, and deletions) were identified in all clinically actionable genes present on the panel.

Figure 1 data (table made from pie chart). A, N = 139 validation cases; categories: controls, AML, MPN, MDS, others (proportions not available). B, N = 375 variants in 139 cases: SNV, 71%; insertions, 22%; deletions, 4%; indels, 3%. C, Legend: indels, insertions, deletions, SNV.

Caption: Figure 10. Comparison of variant profiles identified from matched blood (BC) and bone marrow (BM) samples. Each paired sample was assessed individually to identify variants. Variants were then compared between the cases to evaluate numbers of concordant and discordant calls between samples. A, Pie chart indicating distribution of cases that were compared and the resulting concordance between BC and BM. B, Number of variants identified in each pair of cases (color-coded in blue for concordant variants and red for variants that were present in one but not both sample types) are illustrated in the stacked bar graph.

Figure 10 data (table made from pie chart). A, Cases with concordant BC and BM variant profiles, 62%; cases with variants not detected in BC, 29%; cases with variant not detected in BM, 3%; cases with discordant variant in BC and BM, 6%. B, Legend: variants detected uniquely in BC or BM; variants detected in both BC and BM.
COPYRIGHT 2017 College of American Pathologists
No portion of this article can be reproduced without the express written permission from the copyright holder.
Copyright 2017 Gale, Cengage Learning. All rights reserved.

Author:Thomas, Mariam; Sukhai, Mahadeo A.; Zhang, Tong; Dolatshahi, Roozbeh; Harbi, Djamel; Garg, Swati; Mi
Publication:Archives of Pathology & Laboratory Medicine
Article Type:Report
Date:Jun 1, 2017