Diagnostic Accuracy Studies of Fine-Needle Aspiration Show Wide Variation in Reporting of Study Population Characteristics: Implications for External Validity.

Accurate diagnosis is essential for appropriate patient care, and diagnostic test accuracy (DTA) studies provide the basis for understanding test performance. However, when the items described in the DTA studies themselves are inadequate, comparison between studies is compromised. (1) Study comparisons rest on the assessment of applicability or external validity. A study is said to be applicable if the clinical problem addressed in the study is similar to a clinical problem of interest (the target). Assessment of external validity can be undertaken by using the PICO (population, index test, comparator, and outcome) format (2) to compare a study to its target. One essential element of applicability involves population descriptions. A DTA is useful only if one can determine whether the study population is relevant to a target population with a specific clinical question. This requires that the study population be described in more than generic terms and should include items such as the referral pattern, setting, and prior testing, in addition to characteristics of the patients themselves.

Lack of attention to external validity by studies is a frequent concern. (3) Study populations may differ considerably from the target population owing to restrictive selection criteria, (4,5) and though selection criteria are an important aspect of population descriptions, they have received little study. (6) Differences can arise from subtle sources such as referral patterns or patient choice, (3) and although guidelines and checklists, such as the STARD (standards of reporting diagnostic accuracy) statement, (7,8) have improved the quality of reporting, some authors (9) believe they emphasize internal validity over external validity. In general, there is a tradeoff between internal and external validity. Restrictive selection criteria reduce variance and improve internal validity but compromise external validity. There is consensus that more attention should be paid to external validity. This concern is reflected by the fact that indirectness (external validity) is included as a major component of the GRADE (Grading of Recommendations Assessment, Development and Evaluation) criteria. (10)

Population descriptions are useful in DTA studies only if they are detailed enough to support comparisons between studies and are based on similar items. Because the relevant items vary by disease entity and by test, complete uniformity is not realistic, but some degree of consistency would make population descriptions easier to assess.

In recent years there has been a heightened awareness of deficiencies in study design and reporting in DTA studies. (11-15) In 1999, the Cochrane Screening and Diagnostic Test Methods Working Group first convened to reduce deficiencies in diagnostic test reporting. Since then, several tools have been introduced to address these issues. The STARD checklist (8,16) was introduced to improve reporting and has been widely adopted by pathology journals. The QUADAS (Quality Assessment of Diagnostic Accuracy Studies) instrument (17,18) is an evidence-based survey used to assess applicability and risk of bias for studies included in systematic reviews of diagnostic accuracy studies. Though these checklists provide general guidance, we are unaware of any studies that have attempted to study population descriptions in DTA studies in detail.

Fine-needle aspiration cytology (FNAC) is a diagnostic test that is used to obtain samples for a range of diseases from a variety of anatomic sites. Fine-needle aspiration cytology DTA studies are frequently published, but even within a single anatomic site, FNAC DTA studies show considerable variability in accuracy. (19) It is not known whether the heterogeneity is due to variation in methodology or population parameters. In principle, the cause of variation can be determined by meta-analysis; however, studies must fully report the details of population characteristics to conduct such analyses. Two recent studies (20,21) have shown wide variation and many deficiencies in methods' reporting in fine-needle aspiration (FNA) studies. Given the deficiencies seen in methods' reporting, it is possible that similar deficiencies are common in population descriptions. To our knowledge, the quality of reporting of population parameters in FNA studies has not been studied. We therefore conducted a literature evaluation survey to assess the quality of reporting of population parameters in FNAC DTA studies and to evaluate the comparability (external validity) of studies focused on 1 anatomic site.


Conceptual Framework

A diagnostic accuracy study should be designed to be relevant to a particular population (target population). (1) Through the course of participant selection, the actual study population will differ from the target population. Internal validity refers to the conclusions drawn from the study population. External validity refers to the applicability of the results to the target population (Figures 1 and 2).

Population parameters are used both to select a study population (selection) and to describe the population in order to compare it with others (description). (22) Parameter descriptors include demographics, disease states, and referral patterns (Table 1).

Study Selection

We evaluated studies focusing on 4 anatomic sites: lung, thyroid, salivary glands, and pancreas. We conducted a separate MEDLINE (US National Library of Medicine) search for each anatomic site to identify all FNAC DTA studies conducted between January 2005 and October 2011. We used the MEDLINE MeSH term "Sensitivity and Specificity" to filter for DTA studies, "biopsy, fine needle," and the term for the particular anatomic site, for example, [pancreas AND "biopsy, fine needle" AND "Sensitivity and Specificity"]. Studies were eligible if they could be identified as DTA studies for FNAC (ie, comparing diagnoses obtained by FNAC against a reference standard--either histopathology or clinical follow-up). Reviews and studies with fewer than 10 cases were excluded. Full reports of eligible studies were obtained, and studies were selected for potential inclusion if they provided data on the sensitivity and specificity of FNAC. There was no restriction on language, study type, or location. For each anatomic site, we randomly selected a set of 20 studies for evaluation. Random selection was implemented as follows: we created a list of eligible studies in Microsoft Excel (Microsoft Corporation, Redmond, Washington), assigned a random number between 1 and 1000 to each study, sorted the studies by that number, and then selected the first 20 studies. Random selection was used because our objective was to sample the overall literature as would be required in a meta-analysis of diagnostic accuracy studies. High-quality meta-analyses generally require a systematic review of the literature and assessment of comparability (ie, external validity) as implemented by the QUADAS survey. (23,24) Meta-analysis requires the synthesis of all evidence. Random sampling provides an unbiased sample of the published literature as would be required for a meta-analysis of DTA studies. (25) The search and selection process is shown in Figure 3.
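The random selection step can be illustrated with a short sketch. It mirrors the spreadsheet procedure described above (random key between 1 and 1000, sort, keep the first 20) but is written in Python rather than Excel; the function name, the seed, and the study identifiers are hypothetical and are not taken from the study.

```python
import random


def select_random_studies(eligible, n=20, seed=None):
    """Assign each study a random integer in 1..1000, sort by it, keep the first n."""
    rng = random.Random(seed)
    keyed = [(rng.randint(1, 1000), study) for study in eligible]
    # Stable sort: studies that draw the same random number keep their list order.
    keyed.sort(key=lambda pair: pair[0])
    return [study for _, study in keyed[:n]]


# Hypothetical identifiers standing in for the eligible studies at one site.
thyroid = [f"thyroid_{i:03d}" for i in range(1, 89)]
sample = select_random_studies(thyroid, n=20, seed=2011)
print(len(sample))
```

With keys limited to 1 through 1000, two studies can draw the same number, and the sort then falls back on list order. In code, `random.sample(eligible, 20)` avoids ties altogether and would be the more direct choice.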

Survey Development

We conducted a preliminary survey to develop a list of parameters that were commonly used to describe the populations in our collection of studies. In this preliminary survey, all included studies were scanned for a list of parameters. This approach was used to identify all the population descriptors that had been reported in the literature. We are not aware of any studies demonstrating the impact of population characteristics on FNAC accuracy. Our approach identified parameters reported by authors of FNA DTA studies. Our method for identifying population descriptors reflects the collective opinion of 80 authors regarding the choices they made for reporting population parameters. Presumably, these parameters were reported because authors thought they may be important for interpreting the results of their studies. We did not include population descriptors that might be thought to affect FNAC but had not been reported in the literature. We then organized the parameters into categories as described above. We also developed a list of study characteristics that might be related to parameter usage (eg, journal type, study location). The survey instrument was pilot tested on a subset of studies and was revised from evaluation of discrepancies.

Survey Execution

Each study was independently evaluated by 2 different authors (R.L.S. and K.K.N.) using a standard form (Table 1). Each evaluator filled out the form by marking whether a parameter was used and the manner in which the parameter was used. For example, if a study mentioned the sex distribution of the sampled population, a checkmark would be placed next to "sex" and the use would be categorized as description (see Table 1). If the study was limited to males, a checkmark would be placed next to "sex" and the use would be categorized as selection.

Statistical Analysis

Survey data were recorded in a database (Microsoft Access 2010) and all statistical analysis was conducted by using Stata Statistical Software Release 12 (Stata Corporation, College Station, Texas). Results were considered statistically significant if P < .05. Multivariate logistic regression was used to assess the effect of various parameters (anatomic site, journal type, population parameter) on the specification rate. The specification rate is defined as the probability that a particular item was specified in the "Methods" section of studies.
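As a rough illustration of the specification rate, the sketch below computes the raw proportion of survey entries marked as specified, grouped by a chosen field. It is not the logistic regression model fitted in Stata; the record layout, field names, and values are assumptions made for the example and are not data from the study.

```python
from collections import defaultdict


def specification_rates(records, key):
    """Fraction of records marked as specified, grouped by the given field."""
    counts = defaultdict(lambda: [0, 0])  # group -> [specified, total]
    for rec in records:
        bucket = counts[rec[key]]
        if rec["specified"]:
            bucket[0] += 1
        bucket[1] += 1
    return {group: spec / total for group, (spec, total) in counts.items()}


# Made-up survey entries, for illustration only.
records = [
    {"study": "A", "site": "lung", "parameter": "age", "use": "description", "specified": True},
    {"study": "A", "site": "lung", "parameter": "imaging", "use": "selection", "specified": True},
    {"study": "B", "site": "lung", "parameter": "age", "use": "description", "specified": True},
    {"study": "B", "site": "lung", "parameter": "imaging", "use": "selection", "specified": False},
    {"study": "C", "site": "thyroid", "parameter": "age", "use": "description", "specified": False},
    {"study": "C", "site": "thyroid", "parameter": "imaging", "use": "selection", "specified": False},
]

by_site = specification_rates(records, "site")
by_parameter = specification_rates(records, "parameter")
print(by_site)
print(by_parameter)
```

Grouped proportions of this kind describe the data; a regression model is still needed to separate the effects of site, journal type, and parameter when they are correlated.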


Literature Search

We identified 236 eligible studies from 4 anatomic sites (lung, 49; thyroid, 88; salivary glands, 77; pancreas, 22) and randomly chose 20 studies from each site for evaluation.
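The sampling fraction at each site follows directly from these counts. The sketch below is a simple arithmetic check using only the numbers reported above; the variable names are illustrative.

```python
# Eligible studies per anatomic site, as reported in the literature search.
eligible = {"lung": 49, "thyroid": 88, "salivary glands": 77, "pancreas": 22}
SAMPLED_PER_SITE = 20

total = sum(eligible.values())
fractions = {site: SAMPLED_PER_SITE / n for site, n in eligible.items()}
lowest = min(fractions, key=fractions.get)

for site, frac in fractions.items():
    print(f"{site}: {SAMPLED_PER_SITE}/{eligible[site]} = {frac:.1%}")
print(f"total eligible: {total}; lowest sampling fraction: {lowest}")
```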

Survey Reliability

Per-item agreement averaged 83% (95% confidence interval: 77%-99%).

Characteristics of Evaluated Studies

Studies were conducted in a total of 30 countries but were most frequently conducted in the United States (37%) (Table 2). Clinical specialty of the author, journal type, and anatomic biopsy site were highly correlated (P = .001). Most studies were retrospective. Prospective designs were more common among lung and pancreas studies (37%) than among salivary gland and thyroid studies (12%). Many articles failed to mention whether cases were consecutive. Most studies were conducted in tertiary care settings.

Parameter Identification

We identified 12 items that were commonly used to describe the study populations (Table 1). These were grouped into 3 categories: demographics (age, sex, race, and comorbidities), disease characteristics (lesion location, disease stage, lesion size, resectability, and prior treatment), and referral path (imaging tests, failed test, and physician referral). We added a category of "other" in each group to capture low-frequency parameters that did not fit into the designated categories. The items in Table 1 correspond to those most frequently mentioned in the literature and do not represent items that could, in theory, affect accuracy.

Parameter Specification per Study

There was considerable variability in the number of parameters specified per study (Table 3). There were significant differences in the average number of parameters specified between anatomic sites (P = .001) and by item use (P = .001). There was high interrater agreement on the total number of parameters specified per study (observed agreement = 89.9%, expected agreement = 22.0%, κ = 0.87, P < .001).
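The reported kappa statistic can be checked from the observed and expected agreement, assuming the standard Cohen definition, kappa = (observed - expected) / (1 - expected). The function name below is illustrative.

```python
def cohen_kappa(observed, expected):
    """Chance-corrected agreement from observed and expected agreement proportions."""
    if not 0.0 <= expected < 1.0:
        raise ValueError("expected agreement must be in [0, 1)")
    return (observed - expected) / (1.0 - expected)


# Proportions reported for the number of parameters specified per study.
kappa = cohen_kappa(observed=0.899, expected=0.220)
print(round(kappa, 2))
```

The result rounds to 0.87, which matches the value reported above.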

Specification Rates by Parameter and Parameter Category

Overall, 94% of studies used at least 1 parameter to describe a population (Tables 4 and 5). Eighty-six percent of studies reported demographic characteristics, and 78% reported disease characteristics. Selection processes were described less frequently. Sixty-two percent of all studies specified at least 1 parameter to describe a selection process (Table 4). The relative frequency of parameter category specification varied by anatomic site. A more detailed view (Table 5) shows that age and sex were the most frequently specified parameters. Age, sex, lesion size, and location were frequently described but were infrequently used for selection. Imaging results were the most frequently specified parameter for selection. There was no significant difference in individual parameter specification by anatomic site after controlling for parameter type and use.

Other Reporting

For all studies on average, 33% mentioned the target population, 38% provided a comparison of features of positive and negative cases, and 21% provided a flow diagram. The differences between anatomic sites were not statistically significant (Table 6).


Our study shows considerable variability in the reporting of population characteristics. We found significant differences in population descriptions across anatomic sites. We expected that reporting of population parameters would vary by site owing to the differences in the underlying disease entities and referral patterns; however, we also found considerable variability of reporting for studies conducted within a particular site. In some cases, the number of parameters varied from 0 to 5. This variability suggests that there is lack of consensus regarding the parameters that should be reported even for a particular diagnostic test applied at a particular diagnostic site.

We found that population parameters were used most often to describe populations but were used less frequently to describe population selection. We also found that only a third of studies referred to a target population, which was surprising, because it is fundamental to the formulation of a clinical research question. For example, the specification of the target population is a key component of the commonly used PICO framework for problem formulation. (2) Additionally, the STARD guidelines recommend that DTA studies include a flow diagram, but only 21% of studies in our sample did so.

There are other indications that population reporting is often inadequate. A significant number of studies failed to provide any description of either the population or the selection process. We found that age and sex were, by far, the most frequently specified population parameters. Although these parameters can be important in some contexts, a diagnostic accuracy study should report on parameters that are likely to affect test performance. We suspect that age and sex are frequently specified owing to accessibility rather than importance. Studies typically used a total of 2 to 4 parameters to describe populations. If one discounts age and sex, populations are typically described by at most 1 or 2 parameters. Our survey counted any use of a parameter regardless of whether it was used in a meaningful way. For example, if a study mentioned that 50% of the population had a previous magnetic resonance imaging scan, we counted that as a description of the referral process despite the fact that the usefulness of that information is questionable. Thus, our survey probably overestimates the rate at which useful parameters are specified. Even using this liberal definition of parameter specification, we found that studies reported relatively few parameters. Thus, in addition to variability and lack of standardization, our survey raises concerns about the overall adequacy of population descriptions.

The quality of reporting of population parameters was generally lower in studies of salivary gland lesions than in studies of other anatomic sites. There were other indications that reporting quality may be generally lower in salivary gland studies. For example, salivary gland studies had weaker designs, had fewer prospective studies, and infrequently had longitudinal follow-up. Salivary gland studies are often based on convenience sampling of surgery cases and suffer from partial verification bias. (21) Salivary gland studies most often appeared in surgical journals. Thus, it is possible that reporting quality varies by journal type.

Population descriptions are important because they are used to assess the applicability (external validity) of a study. In the context of diagnostic accuracy studies, it is important to report population parameters that are likely to affect accuracy. Test accuracy can be affected by a variety of population factors such as prior testing, stage of disease, and prior treatment. (1) For diagnostic tests, the external validity of a study requires an assessment of whether the test will perform similarly in a target population as it did in the study sample population (Figure 1), which can be ascertained by comparing the subset of population parameters that affect accuracy (Figure 4), since not all population parameters will affect accuracy. The population parameters that affect accuracy may differ from the parameters that affect therapeutic efficacy. Thus, it is necessary for researchers to carefully consider which parameters may affect the accuracy of a particular diagnostic test and to report them in a way that facilitates assessment of external validity. Researchers should also be aware that the impact of various population parameters may be unknown and will have to be determined by meta-analysis of DTA studies. In our opinion, researchers should be conservative and report parameters that may affect accuracy so these can be tested later by meta-analysis.

Relatively few studies (38%) provided a comparison between the characteristics of the diseased and nondiseased population. This information is important for assessment of spectrum bias as shown in Figure 2. Spectrum bias can be viewed as a special case of external validity and requires a description of the nondiseased population.

Our study has identified several possible areas for improvement. Overall, it appears that studies could improve the description of population selection. As a starting point, it would be helpful if studies formulated a clear research question in the PICO format that specifies a target population. Studies should then describe the selection process that was used to obtain a sample population to answer the question. A well-formed research question not only improves the design of a study but also makes it easier to identify relevant studies. We believe the framework in Figure 1 provides a useful way to communicate the selection of the sample population and the applicability of the sample population to the research question. We also believe that there is significant potential for improvement in the description of the sample population. Often, the sample populations were described by only a few aggregate terms. It is doubtful that such sparse descriptions would enable a clinician or researcher to assess the applicability of a study to a specific clinical problem. It is not possible to provide specific guidelines because the requirements will differ by disease, by site, and by test. On the other hand, it is possible to provide some broad guidelines. For example, authors should fully report referral patterns (previous testing, referring physician type) because referral patterns are known to affect diagnostic accuracy. (1) Similarly, information to assess the pretest probability of malignancy would be helpful for comparing outcomes of DTA studies. Information about severity of disease (eg, lesion size, stage) would be helpful for assessing possible spectrum bias. While the specifics may vary by anatomic site, our results suggest that there is opportunity to improve the quality of reporting of population characteristics in FNA DTA studies.

There were many studies in which reporting was clearly inadequate (eg, those that reported no parameters), but it is difficult to say how many parameters are sufficient or what parameters should be required. Authors must be selective and cannot report every detail. Comparability and external validity are determined by comparing results across multiple studies and gauging the impact of various factors on outcomes. Eventually, key factors emerge and these should be reported. Clearly, the list of factors will depend on the anatomic site. Although there are theoretic reasons to believe that population differences may affect FNAC accuracy, very few studies have examined the impact of population characteristics on diagnostic accuracy of FNAC. For example, tumor size has been shown to affect the diagnostic yield. (26) Fine-needle aspiration DTA studies show wide variation in accuracy, and it is possible that population differences may account for some of this variability. The role of population differences will most likely be revealed through analysis of heterogeneity in meta-analytic studies. This type of analysis will depend upon careful documentation of population parameters in the included DTA studies.

Diagnostic test accuracy studies show significant heterogeneity in outcomes. In theory, part of this heterogeneity might be due to population factors. Although authors report population parameters in FNA DTA studies, little is known about the impact of population characteristics on diagnostic accuracy. The impact of population characteristics on accuracy could be studied by meta-analytic methods that compare diagnostic performance across studies and correlate diagnostic performance with population characteristics. Such meta-analytic studies of heterogeneity are only feasible if the primary studies report population characteristics that can be compared across studies. Our analysis shows that analysis of heterogeneity of diagnostic accuracy results would be difficult owing to the variability in reporting of population characteristics. This is a lost opportunity because meta-analyses of primary studies could provide insight into the variability in FNA DTA outcomes.

The need for meta-analyses of diagnostic studies is well recognized. An assessment of comparability is a key component of the evaluation of studies for inclusion in a meta-analysis. (23,24) Our study is a literature assessment study that was designed to determine whether the quality of reporting is generally sufficient to support such evaluations. We have found that the variability in reporting compromises the assessment of comparability as required for meta-analysis of DTA studies.

The need to compare the external validity of DTA studies also arises in clinical practice. For example, population characteristics might be important in interpreting differences in performance of 2 DTA studies or in determining whether the accuracy reported in a DTA study is applicable to a particular patient. Full reporting of population parameters is required for these assessments.

Our study has several limitations. We only identified parameters that were used within the set of included studies. It is possible that we missed some important parameters because they were not mentioned in our set of studies. More generally, a population parameter could be important but might never have been studied or mentioned in a study. Our method was sufficient to show variability in reporting but not sufficient to determine whether reporting was adequate. Our method was based on random sampling and it is possible that, by chance, our sample was not representative. Our lowest sampling rate was in thyroid studies (20 of 88 studies), so it is unlikely that our sample was unrepresentative. Also, because we used randomized sampling, the risk of bias is low.

Our study is one of the first to assess reporting of population parameters in the context of diagnostic accuracy studies. Unfortunately, there is little evidence regarding the impact of population parameters on FNAC accuracy. Thus, it was difficult to assess the adequacy of reporting. It would be useful to conduct a similar study for a diagnostic test in which there is more evidence regarding the impact of population parameters on test performance.


Fine-needle aspiration cytology DTA studies show considerable variability in the description of sample populations and the population selection process. Studies generally reported 2 to 4 parameters to describe the sample population. The selection process was generally described in less detail than sample populations. Studies often fail to provide flow diagrams or to provide a clear statement of the research problem. There is considerable opportunity for studies to improve both descriptions of sample populations and the process used to select them. Variation in reporting of study populations makes it difficult to compare and synthesize evidence from FNAC DTA studies, and to develop broad guidelines for clinical application of FNAC.

Please Note: Illustration(s) are not available due to copyright restrictions.


(1.) Knottnerus JA, Buntinx F. The Evidence Base of Clinical Diagnosis: Theory and Methods of Diagnostic Research. Oxford, England: Wiley; 2011.

(2.) Guyatt G, Rennie D, Meade MO, Cook DJ. Users' Guide To The Medical Literature: A Manual for Evidence-Based Clinical Practice. New York, NY: McGraw-Hill; 2008.

(3.) Rothwell PM. External validity of randomised controlled trials: "to whom do the results of this trial apply?". Lancet. 2005; 365(9453):82-93.

(4.) Travers J, Marsh S, Caldwell B, et al. External validity of randomized controlled trials in COPD. Respir Med. 2007; 101(6):1313-1320.

(5.) Uijen AA, Bakx JC, Mokkink HGA, van Weel C. Hypertension patients participating in trials differ in many aspects from patients treated in general practices. J Clin Epidemiol. 2007; 60(4):330-335.

(6.) Perrio M, Waller PC, Shakir SA. An analysis of the exclusion criteria used in observational pharmacoepidemiological studies. Pharmacoepidemiol Drug Saf. 2007; 16(3):329-336.

(7.) Bossuyt PM, Reitsma JB, Bruns DE, et al. Towards complete and accurate reporting of studies of diagnostic accuracy: the STARD initiative. Ann Clin Biochem. 2003; 40(pt 4):357-363.

(8.) Bossuyt PM, Reitsma JB, Bruns DE, et al. The STARD statement for reporting studies of diagnostic accuracy: explanation and elaboration. Clin Chem. 2003; 49(1):7-18.

(9.) Persaud N, Mamdani MM. External validity: the neglected dimension in evidence ranking. J Eval Clin Pract. 2006; 12(4):450-453.

(10.) Guyatt GH, Oxman AD, Kunz R, et al. GRADE guidelines, 8: rating the quality of evidence--indirectness. J Clin Epidemiol. 2011; 64(12):1303-1310.

(11.) Smidt N, Rutjes AWS, van der Windt DAWM, et al. The quality of diagnostic accuracy studies since the STARD statement: has it improved? Neurology. 2006; 67(5):792-797.

(12.) Smidt N, Rutjes AWS, van der Windt DAWM, et al. Quality of reporting of diagnostic accuracy studies. Radiology. 2005; 235(2):347-353.

(13.) Whiting P, Rutjes AWS, Dinnes J, Reitsma JB, Bossuyt PMM, Kleijnen J. A systematic review finds that diagnostic reviews fail to incorporate quality despite available tools. J Clin Epidemiol. 2005; 58(1):1-12.

(14.) Lijmer JG, Mol BW, Heisterkamp S, et al. Empirical evidence of design-related bias in studies of diagnostic tests [erratum in JAMA. 2000; 283(15):1963]. JAMA. 1999; 282(11):1061-1066.

(15.) Whiting P, Rutjes AWS, Reitsma JB, Glas AS, Bossuyt PMM, Kleijnen J. Sources of variation and bias in studies of diagnostic accuracy: a systematic review. Ann Intern Med. 2004; 140(3):189-202.

(16.) Bossuyt PM, Reitsma JB, Bruns DE, et al. Towards complete and accurate reporting of studies of diagnostic accuracy: the STARD initiative: Standards for Reporting of Diagnostic Accuracy. Clin Chem. 2003; 49(1):1-6.

(17.) Whiting P, Rutjes AWS, Reitsma JB, Bossuyt PMM, Kleijnen J. The development of QUADAS: a tool for the quality assessment of studies of diagnostic accuracy included in systematic reviews. BMC Med Res Methodol. 2003; 3:25.

(18.) Whiting PF, Westwood ME, Rutjes AWS, Reitsma JB, Bossuyt PMM, Kleijnen J. Evaluation of QUADAS, a tool for the quality assessment of diagnostic accuracy studies. BMC Med Res Methodol. 2006; 6:9.

(19.) Schmidt RL, Hall BJ, Wilson AR, Layfield LJ. A systematic review and meta-analysis of the diagnostic accuracy of fine-needle aspiration cytology for parotid gland lesions. Am J Clin Pathol. 2011; 136(1):45-59.

(20.) Schmidt RL, Factor RE, Affolter KE, et al. Methods specification for diagnostic test accuracy studies in fine-needle aspiration cytology: a survey of reporting practice. Am J Clin Pathol. 2012; 137(1):132-141.

(21.) Schmidt RL, Factor RE, Witt BL, Layfield LJ. Quality appraisal of diagnostic accuracy studies in fine-needle aspiration cytology: a survey of risk of bias and comparability. Arch Pathol Lab Med. In press.

(22.) Elwood J. Critical Appraisal of Epidemiological Studies and Clinical Trials. New York, NY: Oxford University Press; 2007.

(23.) Reitsma JB, Rutjes AWS, Whiting P, Vlassov VV, Leeflang MMG, Deeks JJ. Chapter 9: Assessing Methodological quality. In: Deeks JJ, Bossuyt PM, Gatsonis C (editors), Cochrane Handbook for Systematic Reviews of Diagnostic Test Accuracy Version 1.0.0. The Cochrane Collaboration, 2009. Available from:

(24.) Whiting PF, Rutjes AWS, Westwood ME, et al. QUADAS-2: a revised tool for the quality assessment of diagnostic accuracy studies. Ann Intern Med. 2011; 155(8):529-536.

(25.) Lohr SL. Sampling: Design and Analysis. Boston, MA: Brooks/Cole; 2009.

(26.) Siddiqui AA, Brown LJ, Hong SK, et al. Relationship of pancreatic mass size and diagnostic yield of endoscopic ultrasound-guided fine needle aspiration. Dig Dis Sci. 2011; 56(11):3370-3375.

Robert L. Schmidt, MD, PhD, MBA; Krishna K. Narra, MD, MS; Benjamin L. Witt, MD; Rachel E. Factor, MD, MHS

Accepted for publication April 6, 2014.

From the Department of Pathology, University of Utah School of Medicine and ARUP Laboratories, Salt Lake City, Utah.

The authors have no relevant financial interest in the products or companies described in this article.

Reprints: Robert L. Schmidt, MD, PhD, MBA, Department of Pathology, University of Utah School of Medicine, 15 North Medical Dr E, Salt Lake City, UT 84112 (e-mail: edu,

Caption: Figure 1. The figure (derived from Elwood (22) by permission of Oxford University Press) shows the selection processes that define various populations in the definition of a study sample and study participants. A source population (sampling frame) is the population that is available to study a clinical question posed in terms of a target population. The eligible population is derived from the source population by the application of selection (exclusion/inclusion) criteria. The study sample is obtained from the eligible population by using a sampling plan. Not all cases in a study sample contribute study data. Thus, the actual study participants (cases) may differ from the study sample. Internal validity relates to the specific conclusions drawn from the study participants. External validity is defined by the applicability of the results obtained from a set of study participants to a hypothetic target population. Internal validity is necessary but not sufficient for external validity.

Caption: Figure 2. The figure illustrates the definition of spectrum bias. The disease spectrum is the difference in characteristics between those with disease and those without disease. The spectrum may be described as wide (large difference) or narrow (small difference). The spectrum difference is a comparison of the disease spectrum of the study population (those actually studied) and the disease spectrum of a target population. Two studies are comparable if the spectrum difference is small.

Caption: Figure 3. The figure shows the results of the literature search and study selection process. Twenty studies were randomly selected from the set of studies identified for each anatomic site. Abbreviations: DTAs, diagnostic test accuracy studies; FNAC, fine-needle aspiration cytology.

Caption: Figure 4. The figure provides a conceptual model for evaluation of external validity of diagnostic tests. External validity involves a comparison of the parameters of a target population and the sample population. In the context of a diagnostic accuracy study, external validity is assessed by comparing factors that affect diagnostic accuracy.
Table 1. Survey Form and Parameter Classification (a)

                Parameter Application

                Description            Selection

                __Age                  __Age
                __Sex                  __Sex
Demographic     __Race                 __Race
                __Comorbidities        __Comorbidities
                __Other                __Other

                __Lesion location      __Lesion location
                __Stage                __Stage
Disease         __Lesion size          __Lesion size
                __Resectability        __Resectability
                __Prior treatment      __Prior treatment
                __Other                __Other

                __Imaging              __Imaging
Referral path   __Prior test           __Prior test
                __Physician referral   __Physician referral
                __Other                __Other

(a) The table shows the parameters that were identified as commonly
specified and the groups to which they were assigned. For example,
age is a parameter that was assigned to the demographics category.
The survey was used to record the frequency with which a particular
parameter or parameter category was used either to describe a
population or to describe the process by which the population was
selected. For example, a patient may have been selected on the basis
of imaging results (referral path) or the study may simply have
described imaging results in the selected patients (eg, 45% of the
study population had evidence of cancer in imaging studies) but did
not use imaging results to select patients.
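
As a rough illustration of how a survey of this kind could be tabulated, the sketch below records, for each study, which parameters were used to describe the population and which were used to select it, and then computes specification rates. The record structure, function names, and example data are hypothetical and are not taken from the studies reviewed; the "Other" items on the form are omitted for brevity.

```python
from dataclasses import dataclass, field

# Parameter categories as listed on the survey form ("Other" items omitted).
CATEGORIES = {
    "demographic": {"age", "sex", "race", "comorbidities"},
    "disease": {"lesion location", "stage", "lesion size",
                "resectability", "prior treatment"},
    "referral path": {"imaging", "prior test", "physician referral"},
}


@dataclass
class StudyRecord:
    """One completed survey form (hypothetical structure)."""
    description: set = field(default_factory=set)  # parameters merely reported
    selection: set = field(default_factory=set)    # parameters used to select patients


def parameter_rate(studies, parameter, use):
    """Percentage of studies that specified `parameter` for the given use."""
    hits = sum(parameter in getattr(s, use) for s in studies)
    return 100.0 * hits / len(studies)


def category_rate(studies, category, use):
    """Percentage of studies that specified at least one parameter in `category`."""
    members = CATEGORIES[category]
    hits = sum(bool(members & getattr(s, use)) for s in studies)
    return 100.0 * hits / len(studies)


# Invented example data: four studies, not drawn from the article.
studies = [
    StudyRecord(description={"age", "sex", "imaging"}, selection={"imaging"}),
    StudyRecord(description={"age", "lesion size"}),
    StudyRecord(description={"sex"}, selection={"physician referral"}),
    StudyRecord(),
]

print(parameter_rate(studies, "age", "description"))
print(category_rate(studies, "demographic", "description"))
print(category_rate(studies, "referral path", "selection"))
```

Keeping description and selection as separate sets mirrors the two columns of the form: a study that reports imaging findings without selecting on them contributes only to the description rate.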

Table 2. Characteristics of Included Studies (a)

                                            Salivary   Thyroid       Lung            Pancreas

Study location (n)   Most frequent          UK (3)     USA (10),     USA (6),        USA (11),
                       countries                       Italy (2),    Japan (3),      UK (2)
                                                       Turkey (2)    Australia (2),
                                                                     Korea (2)

Journal types        Pathology                 3          13            4               4
                     Radiology                 0           0            4               4
                     Surgery                  16           6            3               1
                     Gastroenterology          0           0            0              11
                     Other                     1           1            9               0
                     Avg impact factor        1.1         2.2          3.6             2.8
Author types         Pathology                 5           8            3               3
                     Radiology                 0           1            3               3
                     Surgery                  15           7            3               1
                     Gastroenterology          0           0            0              13
                     Other                     0           4           11               0
Study design         Prospective, %           10          15           40              35
                     Consecutive, %           20          50           55              45
                     IRB approval, %          10          35           70              60
                     Longitudinal FU, %        0          10           55              35
                     Tertiary setting, %      95         100           90              50

(a) Journals and authors were assigned to 5 categories (pathology-
cytology, radiology, surgery, gastroenterology, and other). Author
category was based on the departmental affiliation of the
corresponding author. Study design characteristics were counted as
positive only if they were specifically reported. Many studies did
not specify study characteristics, so the values given represent
minimum values. Study location indicates the countries in which
studies were conducted. There were 20 studies for each anatomic site.
For thyroid, studies were most frequently conducted in the United
States (10 studies), Italy (2 studies), and Turkey (2 studies).
Abbreviations: Avg, average; FU, follow-up; IRB, institutional review
board; UK, United Kingdom; USA, United States.

Table 3. Parameter Specification Rates by Tissue and Item Use (a)

                                    Parameter Specification Rate, %

Parameter Use   No. of Parameters   Salivary     Thyroid       Lung      Pancreas      Total

Description             0              32           9            5           5          13
                        1               5           9           10           0           6
                        2              18          36            5          24          21
                        3              27          27           15          29          25
                        4              14          14           45          10          20
                        5               5           5           20          24          13
                        6               0           0            0           5           1
                        7               0           0            0           5           1
                    Mean (SD)       1.8 (1.6)   2.4 (1.1)   3.8 (1.1)   3.4 (1.2)   2.9 (1.6)
Selection               0              82          41           35          57          54
                        1              14          45           25          38          31
                        2               5           9           25           0           9
                        3               0           0            5           5           2
                        4               0           0            5           0           1
                        5               0           5            5           0           2
                        6               0           0            0           0           0
                        7               0           0            0           0           0
                    Mean (SD)       0.2 (0.4)   0.6 (0.6)   0.9 (0.9)   0.5 (0.5)   0.5 (0.7)

(a) The table shows the percentage of studies that specified a
particular number of population description parameters by tissue
type and item use. For example, 32% of salivary gland studies
provided no parameters to describe the sample population and 5%
provided only a single parameter. Eighty-two percent of salivary
gland studies provided no specific parameters to describe selection
criteria. The mean is the average number of parameters specified.
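
A minimal sketch of how a distribution of this kind and its mean could be derived from per-study parameter counts is given below. The function name and the counts are invented for illustration, and the use of the sample standard deviation is an assumption, since the source does not state which form was used.

```python
from collections import Counter
from statistics import mean, stdev


def parameter_count_summary(counts, max_params=7):
    """Return (percentage distribution, mean, SD) for per-study parameter counts."""
    tally = Counter(counts)
    n = len(counts)
    # Percentage of studies specifying exactly k parameters, for k = 0..max_params.
    distribution = {k: 100.0 * tally.get(k, 0) / n for k in range(max_params + 1)}
    # Assumption: sample standard deviation (n - 1 in the denominator).
    return distribution, mean(counts), stdev(counts)


# Invented counts for eight hypothetical studies, not drawn from the article.
counts = [0, 2, 2, 3, 3, 3, 4, 5]
distribution, average, sd = parameter_count_summary(counts)

for k, pct in distribution.items():
    print(f"{k} parameters: {pct:.1f}% of studies")
print(f"Mean (SD): {average:.1f} ({sd:.1f})")
```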

Table 4. Specification Rates for Parameters and Parameter Categories (a)

Parameter                           Parameter Use (Proportion of Studies)
Category       Parameter            Description   Selection   Total

Demographics   Age                     0.82         0.07      0.86
               Sex                     0.84         0.04
               Race                    0.05         0.00
               Comorbidities           0.18         0.02
               Other                   0.06         0.09
               Total                   0.86         0.11
Disease        Lesion location         0.48         0.09      0.78
features       Stage                   0.15         0.05
               Lesion size             0.51         0.09
               Resectability           0.08         0.05
               Prior treatment         0.04         0.02
               Other                   0.28         0.06
               Total                   0.78         0.27
Referral       Imaging                 0.33         0.33      0.67
pattern        Prior test              0.06         0.05
               Physician referral      0.17         0.18
               Other                   0.04         0.11
               Total                   0.39         0.49
Total                                  0.94         0.62

(a) For parameter categories, the table indicates the rate at which
studies specified any parameter within the category. Thus 86% of
articles described at least 1 demographic feature (age, sex, race,
comorbidities, or other) of the sampled population. Only 11% of
articles specified a demographic parameter for selection. Ninety-
four percent of studies described at least some feature of the
sampled population. For parameters, the percentages indicate the
specification rate for a particular use (description versus
selection).

Table 5. Parameter Specification Rates by Parameter, Tissue, and
Parameter Use (a)

                                                                    Rate, %
Parameter     Parameter
Use           Category          Parameter            Salivary   Thyroid   Lung   Pancreas

Description   Demographics      Age                     73         91      80       86
                                Sex                     77         91      80       86
                                Race                     4          5       5        5
                                Comorbidities            0         23      10       38
                                Other                    4          9       5        5
              Disease           Lesion location         45          5      80       67
              characteristics   Stage                    9         91      30       14
                                Size                     9         59      80       57
                                Resectability            9          0      10       14
                                Prior treatment          0          9       5        0
                                Other                   27         18      40       28
              Referral path     Imaging                 14         36      40       43
                                Failed test              0          0      10       14
                                Physician referral       0          0      15       14
                                Other                    0          4       0       10
Selection     Demographics      Age                      4         18       5        0
                                Sex                      0         14       0        0
                                Race                     0          0       0        0
                                Comorbidities            0          0       5        5
                                Other                    0          4       0        0
              Disease           Lesion location          0          0      30       10
              characteristics   Stage                    0          0      20        0
                                Size                     0         14      20        5
                                Resectability            0          0      10       10
                                Prior treatment          4          0       0        5
                                Other                    4         14       0        5
              Referral path     Imaging                  4         14      75       42
                                Failed test              0          4      15        0
                                Physician referral      13          4      30       24
                                Other                    4         23       0       14

(a) The table shows the rate at which each parameter was specified by
the sample of studies. For example, 73% of salivary gland studies
described the age of the study population but only 4% of salivary
gland studies used age as a selection criterion.

Table 6. Reporting Characteristics (a)

                                                        Anatomic Site

Reporting Item                           Salivary   Thyroid   Lung   Pancreas   Total, %

Did the study mention the target
population?                                 13        51       30       36         33
Did the study provide a comparison of
the characteristics of positive and
negative cases?                             50        34       38       31         38
Did the study provide a flow diagram?       13        17       25       27         21

(a) The table entries indicate the percentage of studies that
reported on a particular item. For example, 13% of salivary gland
studies provided a flow diagram and 51% of thyroid studies mentioned
the target population.
COPYRIGHT 2014 College of American Pathologists
No portion of this article can be reproduced without the express written permission from the copyright holder.
Copyright 2014 Gale, Cengage Learning. All rights reserved.

Article Details
Author: Schmidt, Robert L.; Narra, Krishna K.; Witt, Benjamin L.; Factor, Rachel E.
Publication: Archives of Pathology & Laboratory Medicine
Date: Jan 1, 2014