Diagnostic Accuracy Studies of Fine-Needle Aspiration Show Wide Variation in Reporting of Study Population Characteristics: Implications for External Validity.
Lack of attention to external validity by studies is a frequent concern. (3) Study populations may differ considerably from the target population owing to restrictive selection criteria, (4,5) and though selection criteria are an important aspect of population descriptions, they have received little study. (6) Differences can arise from subtle sources such as referral patterns or patient choice, (3) and although guidelines and checklists, such as the STARD (standards of reporting diagnostic accuracy) statement, (7,8) have improved the quality of reporting, some authors (9) believe they emphasize internal validity over external validity. In general, there is a tradeoff between internal and external validity. Restrictive selection criteria reduce variance and improve internal validity but compromise external validity. There is consensus that more attention should be paid to external validity. This concern is reflected by the fact that indirectness (external validity) is included as a major component of the GRADE (Grading of Recommendations Assessment, Development and Evaluation) criteria. (10)
Population descriptions in diagnostic test accuracy (DTA) studies should be detailed enough to facilitate comparisons between studies and should be based on similar items, although the relevant items will vary by disease entity and by test. Overall, some degree of consistency would be useful when assessing population descriptions.
In recent years there has been a heightened awareness of deficiencies in study design and reporting in DTA studies. (11-15) In 1999, the Cochrane Screening and Diagnostic Test Methods Working Group first convened to reduce deficiencies in diagnostic test reporting. Since then, several tools have been introduced to address these issues. The STARD checklist (8,16) was introduced to improve reporting and has been widely adopted by pathology journals. The QUADAS (Quality Assessment of Diagnostic Accuracy Studies) instrument (17,18) is an evidence-based survey used to assess applicability and risk of bias for studies included in systematic reviews of diagnostic accuracy studies. Though these checklists provide general guidance, we are unaware of any studies that have attempted to study population descriptions in DTA studies in detail.
Fine-needle aspiration cytology (FNAC) is a diagnostic test that is used to obtain samples for a range of diseases from a variety of anatomic sites. Fine-needle aspiration cytology DTA studies are frequently published, but even within a single anatomic site, FNAC DTA studies show considerable variability in accuracy. (19) It is not known whether the heterogeneity is due to variation in methodology or in population parameters. In principle, the cause of variation can be determined by meta-analysis; however, studies must fully report the details of population characteristics to permit such analyses. Two recent studies (20,21) have shown wide variation and many deficiencies in the reporting of methods in fine-needle aspiration (FNA) studies. Given the deficiencies seen in methods reporting, it is possible that similar deficiencies are common in population descriptions. To our knowledge, the quality of reporting of population parameters in FNA studies has not been studied. We therefore conducted a literature evaluation survey to assess the quality of reporting of population parameters in FNAC DTA studies and to evaluate the comparability (external validity) of studies focused on 1 anatomic site.
A diagnostic accuracy study should be designed to be relevant to a particular population (target population). (1) Through the course of participant selection, the actual study population will differ from the target population. Internal validity refers to the conclusions drawn from the study population. External validity refers to the applicability of the results to the target population (Figures 1 and 2).
Population parameters are used both to select a study population (selection) and to describe the population in order to compare it with others (description). (22) Parameter descriptors include demographics, disease states, and referral patterns (Table 1).
We evaluated studies focusing on 4 anatomic sites: lung, thyroid, salivary glands, and pancreas. We conducted a separate MEDLINE (US National Library of Medicine) search for each anatomic site to identify all FNAC DTA studies conducted between January 2005 and October 2011. We combined the MEDLINE MeSH term "Sensitivity and Specificity" (to filter for DTA studies), the term "biopsy, fine needle," and the term for the particular anatomic site, for example, [pancreas AND "biopsy, fine needle" AND "Sensitivity and Specificity"]. Studies were eligible if they could be identified as DTA studies for FNAC (ie, comparing diagnoses obtained by FNAC against a reference standard, either histopathology or clinical follow-up). Reviews and studies with fewer than 10 cases were excluded. Full reports of eligible studies were obtained, and studies were selected for potential inclusion if they provided data on the sensitivity and specificity of FNAC. There was no restriction on language, study type, or location. For each anatomic site, we randomly selected a set of 20 studies for evaluation. Random selection was implemented as follows: we created a list of eligible studies in Microsoft Excel (Microsoft Corporation, Redmond, Washington), assigned a random number between 1 and 1000 to each study, sorted the studies by that number, and then selected the first 20 studies. Random selection was used because our objective was to sample the overall literature as would be required in a meta-analysis of diagnostic accuracy studies. High-quality meta-analyses generally require a systematic review of the literature and assessment of comparability (ie, external validity) as implemented by the QUADAS survey. (23,24) Meta-analysis requires the synthesis of all evidence. Random sampling provides an unbiased sample of the published literature as would be required for a meta-analysis of DTA studies. (25) The search and selection process is shown in Figure 3.
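The random-selection step described above can be sketched in a few lines of Python. This is an illustrative sketch only, not the procedure actually used (which was carried out in Microsoft Excel); the function name `select_random_studies`, the seed value, and the placeholder study identifiers are assumptions introduced here.

```python
import random

def select_random_studies(eligible, k=20, seed=None):
    """Assign a random key to each study, sort by key, and keep the first k.

    Mirrors the spreadsheet-style procedure: one random number per study,
    sort on that number, take the top k rows.
    """
    rng = random.Random(seed)
    # Continuous keys are used here (an assumption) to avoid the ties that
    # integer keys between 1 and 1000 could produce.
    keyed = [(rng.random(), study) for study in eligible]
    keyed.sort(key=lambda pair: pair[0])
    return [study for _, study in keyed[:k]]

# Hypothetical identifiers standing in for the 88 eligible thyroid studies.
thyroid_eligible = [f"thyroid_{i:03d}" for i in range(1, 89)]
thyroid_sample = select_random_studies(thyroid_eligible, k=20, seed=2011)
print(len(thyroid_sample))  # 20
```

Fixing the seed makes the draw reproducible, which a spreadsheet's volatile random-number function typically does not.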
We conducted a preliminary survey to develop a list of parameters that were commonly used to describe the populations in our collection of studies. In this preliminary survey, all included studies were scanned for a list of parameters. This approach was used to identify all the population descriptors that had been reported in the literature. We are not aware of any studies demonstrating the impact of population characteristics on FNAC accuracy. Our approach identified parameters reported by authors of FNA DTA studies. Our method for identifying population descriptors reflects the collective opinion of the authors of 80 studies regarding the choices they made for reporting population parameters. Presumably, these parameters were reported because authors thought they might be important for interpreting the results of their studies. We did not include population descriptors that might be thought to affect FNAC but had not been reported in the literature. We then organized the parameters into categories as described above. We also developed a list of study characteristics that might be related to parameter usage (eg, journal type, study location). The survey instrument was pilot tested on a subset of studies and was revised on the basis of an evaluation of discrepancies.
Each study was independently evaluated by 2 different authors (R.L.S. and K.K.N.) using a standard form (Table 1). Each evaluator filled out the form by marking whether a parameter was used and the manner in which the parameter was used. For example, if a study mentioned the sex distribution of the sampled population, a checkmark would be placed next to "sex" and the use would be categorized as description (see Table 1). If the study was limited to males, a checkmark would be placed next to "sex" and the use would be categorized as selection.
Survey data were recorded in a database (Microsoft Access 2010) and all statistical analysis was conducted by using Stata Statistical Software Release 12 (Stata Corporation, College Station, Texas). Results were considered statistically significant if P < .05. Multivariate logistic regression was used to assess the effect of various parameters (anatomic site, journal type, population parameter) on the specification rate. The specification rate is defined as the probability that a particular item was specified in the "Methods" section of studies.
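As a minimal sketch of the definition above, the specification rate for an item is the fraction of evaluated studies in which that item was marked on the survey form. The records below are hypothetical placeholders, not survey data, and the function name `specification_rate` and the dictionary layout are assumptions made for illustration. The logistic regression step, which was run in Stata, is not reproduced here.

```python
def specification_rate(forms, parameter, use):
    """Fraction of studies whose form marks `parameter` for the given `use`.

    forms: list of dicts mapping (parameter, use) -> bool, one dict per study.
    use:   "description" or "selection".
    """
    if not forms:
        raise ValueError("at least one study form is required")
    marked = sum(1 for form in forms if form.get((parameter, use), False))
    return marked / len(forms)

# Hypothetical forms for four studies (illustration only, not survey data).
forms = [
    {("age", "description"): True, ("sex", "description"): True},
    {("age", "description"): True, ("imaging", "selection"): True},
    {("sex", "selection"): True},
    {},
]
print(specification_rate(forms, "age", "description"))    # 0.5
print(specification_rate(forms, "imaging", "selection"))  # 0.25
```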
We identified 236 eligible studies from 4 anatomic sites (lung, 49; thyroid, 88; salivary glands, 77; pancreas, 22) and randomly chose 20 studies from each site for evaluation.
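The sampling fraction implied by these counts differs considerably between sites. The short calculation below uses only the eligible-study counts reported above and the fixed sample of 20 per site; the variable names are illustrative.

```python
# Eligible-study counts per anatomic site, as reported in the text.
eligible = {"lung": 49, "thyroid": 88, "salivary glands": 77, "pancreas": 22}
SAMPLED_PER_SITE = 20

total_eligible = sum(eligible.values())
sampling_fraction = {site: SAMPLED_PER_SITE / n for site, n in eligible.items()}

for site, frac in sampling_fraction.items():
    print(f"{site}: {SAMPLED_PER_SITE} of {eligible[site]} ({frac:.0%})")
print("total eligible:", total_eligible)  # 236
```

The lowest sampling fraction is for thyroid (20 of 88, about 23%) and the highest is for pancreas (20 of 22, about 91%).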
Per-item agreement averaged 83% (95% confidence interval: 77%-99%).
Characteristics of Evaluated Studies
Studies were conducted in a total of 30 countries but were most frequently conducted in the United States (37%) (Table 2). Clinical specialty of the author, journal type, and anatomic biopsy site were highly correlated (P = .001). Most studies were retrospective. Prospective designs were more common among lung and pancreas studies (37%) than among salivary gland and thyroid studies (12%). Many articles failed to mention whether cases were consecutive. Most studies were conducted in tertiary care settings.
We identified 12 items that were commonly used to describe the study populations (Table 1). These were grouped into 3 categories: demographics (age, sex, race, and comorbidities), disease characteristics (lesion location, disease stage, lesion size, resectability, and prior treatment), and referral path (imaging tests, failed test, and physician referral). We added a category of "other" in each group to capture low-frequency parameters that did not fit into the designated categories. The items in Table 1 correspond to those most frequently mentioned in the literature and do not necessarily represent the items that could, in theory, affect accuracy.
Parameter Specification per Study
There was considerable variability in the number of parameters specified per study (Table 3). There were significant differences in the average number of parameters specified between anatomic sites (P = .001) and by item use (P = .001). There was high interrater agreement on the total number of parameters specified per study (observed agreement = 89.9%, expected agreement = 22.0%, κ = 0.87, P < .001).
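The reported agreement statistic can be checked from the observed and expected agreement, assuming it is Cohen's kappa, which is defined as (observed - expected) / (1 - expected). The function name `cohen_kappa` below is an illustrative choice.

```python
def cohen_kappa(observed, expected):
    """Cohen's kappa from observed and chance-expected agreement (proportions)."""
    if not 0.0 <= expected < 1.0:
        raise ValueError("expected agreement must be in [0, 1)")
    return (observed - expected) / (1.0 - expected)

# Values reported in the text: observed 89.9%, expected 22.0%.
kappa = cohen_kappa(observed=0.899, expected=0.220)
print(round(kappa, 2))  # 0.87
```

The result, (0.899 - 0.220) / (1 - 0.220) = 0.679 / 0.780, rounds to 0.87 and matches the reported value.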
Specification Rates by Parameter and Parameter Category
Overall, 94% of studies used at least 1 parameter to describe a population (Tables 4 and 5). Eighty-six percent of studies reported demographic characteristics, and 78% reported disease characteristics. Selection processes were described less frequently. Sixty-two percent of all studies specified at least 1 parameter to describe a selection process (Table 4). The relative frequency of parameter category specification varied by anatomic site. A more detailed view (Table 5) shows that age and sex were the most frequently specified parameters. Age, sex, lesion size, and location were frequently described but were infrequently used for selection. Imaging results were the most frequently specified parameter for selection. There was no significant difference in individual parameter specification by anatomic site after controlling for parameter type and use.
For all studies on average, 33% mentioned the target population, 38% provided a comparison of features of positive and negative cases, and 21% provided a flow diagram. The differences between anatomic sites were not statistically significant (Table 6).
Our study shows considerable variability in the reporting of population characteristics. We found significant differences in population descriptions across anatomic sites. We expected that reporting of population parameters would vary by site owing to the differences in the underlying disease entities and referral patterns; however, we also found considerable variability of reporting for studies conducted within a particular site. In some cases, the number of parameters varied from 0 to 5. This variability suggests that there is lack of consensus regarding the parameters that should be reported even for a particular diagnostic test applied at a particular diagnostic site.
We found that population parameters were used most often to describe populations but were used less frequently to describe population selection. We also found that only a third of studies referred to a target population, which was surprising, because it is fundamental to the formulation of a clinical research question. For example, the specification of the target population is a key component of the commonly used PICO framework for problem formulation. (2) Additionally, the STARD guidelines recommend that DTA studies include a flow diagram, but only 21% of studies in our sample did so.
There are other indications that population reporting is often inadequate. A significant number of studies failed to provide any description of either the population or the selection process. We found that age and sex were, by far, the most frequently specified population parameters. Although these parameters can be important in some contexts, a diagnostic accuracy study should report on parameters that are likely to affect test performance. We suspect that age and sex are frequently specified owing to accessibility rather than importance. Studies typically used a total of 2 to 4 parameters to describe populations. If one discounts age and sex, populations are typically described by at most 1 or 2 parameters. Our survey counted any use of a parameter regardless of whether it was used in a meaningful way. For example, if a study mentioned that 50% of the population had a previous magnetic resonance imaging scan, we counted that as a description of the referral process despite the fact that the usefulness of that information is questionable. Thus, our survey probably overestimates the rate at which useful parameters are specified. Even using this liberal definition of parameter specification, we found that studies reported relatively few parameters. Thus, in addition to concerns about variability and lack of standardization, our survey raises concerns about the overall adequacy of population descriptions.
The quality of reporting of population parameters was generally lower in studies of salivary gland lesions than in studies of other anatomic sites. There were other indications that quality may be generally lower in salivary gland studies. For example, salivary gland studies had weaker designs, were less often prospective, and infrequently had longitudinal follow-up. Salivary gland studies are often based on convenience sampling of surgery cases and suffer from partial verification bias. (21) Salivary gland studies most often appeared in surgical journals. Thus, it is possible that reporting quality varies by journal type.
Population descriptions are important because they are used to assess the applicability (external validity) of a study. In the context of diagnostic accuracy studies, it is important to report population parameters that are likely to affect accuracy. Test accuracy can be affected by a variety of population factors such as prior testing, stage of disease, and prior treatment. (1) For diagnostic tests, the external validity of a study requires an assessment of whether the test will perform similarly in a target population as it did in the study sample population (Figure 1), which can be ascertained by comparing the subset of population parameters that affect accuracy (Figure 4), since not all population parameters will affect accuracy. The population parameters that affect accuracy may differ from the parameters that affect therapeutic efficacy. Thus, it is necessary for researchers to carefully consider which parameters may affect the accuracy of a particular diagnostic test and to report them in a way that facilitates assessment of external validity. Researchers should also be aware that the impact of various population parameters may be unknown and will have to be determined by meta-analysis of DTA studies. In our opinion, researchers should be conservative and report parameters that may affect accuracy so these can be tested later by meta-analysis.
Relatively few studies (38%) provided a comparison between the characteristics of the diseased and nondiseased population. This information is important for assessment of spectrum bias as shown in Figure 2. Spectrum bias can be viewed as a special case of external validity and requires a description of the nondiseased population.
Our study has identified several possible areas for improvement. Overall, it appears that studies could improve the description of population selection. As a starting point, it would be helpful if studies formulated a clear research question in the PICO format that specifies a target population. Studies should then describe the selection process that was used to obtain a sample population to answer the question. A well-formed research question not only improves the design of a study but also makes it easier to identify relevant studies. We believe the framework in Figure 1 provides a useful way to communicate the selection of the sample population and the applicability of the sample population to the research question. We also believe that there is significant potential for improvement in the description of the sample population. Often, the sample populations were described by only a few aggregate terms. It is doubtful that such sparse descriptions would enable a clinician or researcher to assess the applicability of a study to a specific clinical problem. It is not possible to provide specific guidelines because the requirements will differ by disease, by site, and by test. On the other hand, it is possible to provide some broad guidelines. For example, authors should fully report referral patterns (previous testing, referring physician type) because referral patterns are known to affect diagnostic accuracy. (1) Similarly, information to assess the pretest probability of malignancy would be helpful for comparing outcomes of DTA studies. Information about severity of disease (eg, lesion size, stage) would be helpful for assessing possible spectrum bias. While the specifics may vary by anatomic site, our results suggest that there is opportunity to improve the quality of reporting of population characteristics in FNA DTA studies.
There were many studies in which reporting was clearly inadequate (eg, those that reported no parameters), but it is difficult to say how many parameters are sufficient or what parameters should be required. Authors must be selective and cannot report every detail. Comparability and external validity are determined by comparing results across multiple studies and gauging the impact of various factors on outcomes. Eventually, key factors emerge and these should be reported. Clearly, the list of factors will depend on the anatomic site. Although there are theoretic reasons to believe that population differences may affect FNAC accuracy, very few studies have examined the impact of population characteristics on diagnostic accuracy of FNAC. For example, tumor size has been shown to affect the diagnostic yield. (26) Fine-needle aspiration DTA studies show wide variation in accuracy, and it is possible that population differences may account for some of this variability. The role of population differences will most likely be revealed through analysis of heterogeneity in meta-analytic studies. This type of analysis will depend upon careful documentation of population parameters in the included DTA studies.
Diagnostic test accuracy studies show significant heterogeneity in outcomes. In theory, part of this heterogeneity might be due to population factors. Although authors report population parameters in FNA DTA studies, little is known about the impact of population characteristics on diagnostic accuracy. The impact of population characteristics on accuracy could be studied by meta-analytic methods that compare diagnostic performance across studies and correlate diagnostic performance with population characteristics. Such meta-analytic studies of heterogeneity are only feasible if the primary studies report population characteristics that can be compared across studies. Our analysis shows that analysis of heterogeneity of diagnostic accuracy results would be difficult owing to the variability in reporting of population characteristics. This is a lost opportunity because meta-analyses of primary studies could provide insight into the variability in FNA DTA outcomes.
The need for meta-analyses of diagnostic studies is well recognized. An assessment of comparability is a key component of the evaluation of studies for inclusion in a meta-analysis. (23,24) Our study is a literature assessment study that was designed to determine whether the quality of reporting is generally sufficient to support such evaluations. We have found that the variability in reporting compromises the assessment of comparability as required for meta-analysis of DTA studies.
The need to compare the external validity of DTA studies also arises in clinical practice. For example, population characteristics might be important in interpreting differences in performance of 2 DTA studies or in determining whether the accuracy reported in a DTA study is applicable to a particular patient. Full reporting of population parameters is required for these assessments.
Our study has several limitations. We only identified parameters that were used within the set of included studies. It is possible that we missed some important parameters because they were not mentioned in our set of studies. More generally, a population parameter could be important but might never have been studied or mentioned in a study. Our method was sufficient to show variability in reporting but not sufficient to determine whether reporting was adequate. Our method was based on random sampling and it is possible that, by chance, our sample was not representative. However, even our lowest sampling rate, in thyroid studies, was 23% (20 of 88 studies), so it is unlikely that our sample was unrepresentative. Also, because we used random sampling, the risk of bias is low.
Our study is one of the first to assess reporting of population parameters in the context of diagnostic accuracy studies. Unfortunately, there is little evidence regarding the impact of population parameters on FNAC accuracy. Thus, it was difficult to assess the adequacy of reporting. It would be useful to conduct a similar study for a diagnostic test in which there is more evidence regarding the impact of population parameters on test performance.
Fine-needle aspiration cytology DTA studies show considerable variability in the description of sample populations and the population selection process. Studies generally reported 2 to 4 parameters to describe the sample population. The selection process was generally described in less detail than sample populations. Studies often fail to provide flow diagrams or to provide a clear statement of the research problem. There is considerable opportunity for studies to improve both descriptions of sample populations and the process used to select them. Variation in reporting of study populations makes it difficult to compare and synthesize evidence from FNAC DTA studies, and to develop broad guidelines for clinical application of FNAC.
(1.) Knottnerus JA, Buntinx F. The Evidence Base of Clinical Diagnosis: Theory and Methods of Diagnostic Research. Oxford, England: Wiley; 2011.
(2.) Guyatt G, Rennie D, Meade MO, Cook DJ. Users' Guides to the Medical Literature: A Manual for Evidence-Based Clinical Practice. New York, NY: McGraw-Hill; 2008.
(3.) Rothwell PM. External validity of randomised controlled trials: "to whom do the results of this trial apply?". Lancet. 2005; 365(9453):82-93.
(4.) Travers J, Marsh S, Caldwell B, et al. External validity of randomized controlled trials in COPD. Respir Med. 2007; 101(6):1313-1320.
(5.) Uijen AA, Bakx JC, Mokkink HGA, van Weel C. Hypertension patients participating in trials differ in many aspects from patients treated in general practices. J Clin Epidemiol. 2007; 60(4):330-335.
(6.) Perrio M, Waller PC, Shakir SA. An analysis of the exclusion criteria used in observational pharmacoepidemiological studies. Pharmacoepidemiol Drug Saf. 2007; 16(3):329-336.
(7.) Bossuyt PM, Reitsma JB, Bruns DE, et al. Towards complete and accurate reporting of studies of diagnostic accuracy: the STARD initiative. Ann Clin Biochem. 2003; 40(pt 4):357-363.
(8.) Bossuyt PM, Reitsma JB, Bruns DE, et al. The STARD statement for reporting studies of diagnostic accuracy: explanation and elaboration. Clin Chem. 2003; 49(1):7-18.
(9.) Persaud N, Mamdani MM. External validity: the neglected dimension in evidence ranking. J Eval Clin Pract. 2006; 12(4):450-453.
(10.) Guyatt GH, Oxman AD, Kunz R, et al. GRADE guidelines, 8: rating the quality of evidence--indirectness. J Clin Epidemiol. 2011; 64(12):1303-1310.
(11.) Smidt N, Rutjes AWS, van der Windt DAWM, et al. The quality of diagnostic accuracy studies since the STARD statement: has it improved? Neurology. 2006; 67(5):792-797.
(12.) Smidt N, Rutjes AWS, van der Windt DAWM, et al. Quality of reporting of diagnostic accuracy studies. Radiology. 2005; 235(2):347-353.
(13.) Whiting P, Rutjes AWS, Dinnes J, Reitsma JB, Bossuyt PMM, Kleijnen J. A systematic review finds that diagnostic reviews fail to incorporate quality despite available tools. J Clin Epidemiol. 2005; 58(1):1-12.
(14.) Lijmer JG, Mol BW, Heisterkamp S, et al. Empirical evidence of design-related bias in studies of diagnostic tests [erratum in JAMA. 2000; 283(15):1963]. JAMA. 1999; 282(11):1061-1066.
(15.) Whiting P, Rutjes AWS, Reitsma JB, Glas AS, Bossuyt PMM, Kleijnen J. Sources of variation and bias in studies of diagnostic accuracy: a systematic review. Ann Intern Med. 2004; 140(3):189-202.
(16.) Bossuyt PM, Reitsma JB, Bruns DE, et al. Towards complete and accurate reporting of studies of diagnostic accuracy: the STARD initiative: Standards for Reporting of Diagnostic Accuracy. Clin Chem. 2003; 49(1):1-6.
(17.) Whiting P, Rutjes AWS, Reitsma JB, Bossuyt PMM, Kleijnen J. The development of QUADAS: a tool for the quality assessment of studies of diagnostic accuracy included in systematic reviews. BMC Med Res Methodol. 2003; 3:25.
(18.) Whiting PF, Weswood ME, Rutjes AWS, Reitsma JB, Bossuyt PNM, Kleijnen J. Evaluation of QUADAS, a tool for the quality assessment of diagnostic accuracy studies. BMC Med Res Methodol. 2006; 6:9.
(19.) Schmidt RL, Hall BJ, Wilson AR, Layfield LJ. A systematic review and meta-analysis of the diagnostic accuracy of fine-needle aspiration cytology for parotid gland lesions. Am J Clin Pathol. 2011; 136(1):45-59.
(20.) Schmidt RL, Factor RE, Affolter KE, et al. Methods specification for diagnostic test accuracy studies in fine-needle aspiration cytology: a survey of reporting practice. Am J Clin Pathol. 2012; 137(1):132-141.
(21.) Schmidt RL, Factor RE, Witt BL, Layfield LJ. Quality appraisal of diagnostic accuracy studies in fine-needle aspiration cytology: a survey of risk of bias and comparability. Arch Pathol Lab Med. In press.
(22.) Elwood J. Critical Appraisal of Epidemiological Studies and Clinical Trials. New York, NY: Oxford University Press; 2007.
(23.) Reitsma JB, Rutjes AWS, Whiting P, Vlassov VV, Leeflang MMG, Deeks JJ. Chapter 9: Assessing methodological quality. In: Deeks JJ, Bossuyt PM, Gatsonis C (editors), Cochrane Handbook for Systematic Reviews of Diagnostic Test Accuracy Version 1.0.0. The Cochrane Collaboration, 2009. Available from: http://srdta.cochrane.org/.
(24.) Whiting PF, Rutjes AWS, Westwood ME, et al. QUADAS-2: a revised tool for the quality assessment of diagnostic accuracy studies. Ann Intern Med. 2011; 155(8):529-536.
(25.) Lohr SL. Sampling: Design and Analysis. Boston, MA: Brooks/Cole; 2009.
(26.) Siddiqui AA, Brown LJ, Hong SK, et al. Relationship of pancreatic mass size and diagnostic yield of endoscopic ultrasound-guided fine needle aspiration. Dig Dis Sci. 2011; 56(11):3370-3375.
Robert L. Schmidt, MD, PhD, MBA; Krishna K. Narra, MD, MS; Benjamin L. Witt, MD; Rachel E. Factor, MD, MHS
Accepted for publication April 6, 2014.
From the Department of Pathology, University of Utah School of Medicine and ARUP Laboratories, Salt Lake City, Utah.
The authors have no relevant financial interest in the products or companies described in this article.
Reprints: Robert L. Schmidt, MD, PhD, MBA, Department of Pathology, University of Utah School of Medicine, 15 North Medical Dr E, Salt Lake City, UT 84112 (e-mail: Robert.email@example.com. edu, Robert.firstname.lastname@example.org).
Caption: Figure 1. The figure (derived from Elwood (22) by permission of Oxford University Press) shows the selection processes that define various populations in the definition of a study sample and study participants. A source population (sampling frame) is the population that is available to study a clinical question posed in terms of a target population. The eligible population is derived from the source population by the application of selection (exclusion/inclusion) criteria. The study sample is obtained from the eligible population by using a sampling plan. Not all cases in a study sample contribute study data. Thus, the actual study participants (cases) may differ from the study sample. Internal validity relates to the specific conclusions drawn from the study participants. External validity is defined by the applicability of the results obtained from a set of study participants to a hypothetic target population. Internal validity is necessary but not sufficient for external validity.
Caption: Figure 2. The figure illustrates the definition of spectrum bias. The disease spectrum is the difference in characteristics between those with disease and those without disease. The spectrum may be described as wide (large difference) or narrow (small difference). The spectrum difference is a comparison of the disease spectrum of the study population (those actually studied) and the disease spectrum of a target population. Two studies are comparable if the spectrum difference is small.
Caption: Figure 3. The figure shows the results of the literature search and study selection process. Twenty studies were randomly selected from the set of studies identified for each anatomic site. Abbreviations: DTAs, diagnostic test accuracy studies; FNAC, fine-needle aspiration cytology.
Caption: Figure 4. The figure provides a conceptual model for evaluation of external validity of diagnostic tests. External validity involves a comparison of the parameters of a target population and the sample population. In the context of a diagnostic accuracy study, external validity is assessed by comparing factors that affect diagnostic accuracy.
Table 1. Survey Form and Parameter Classification (a)

| Parameter Type | Parameter Application: Description | Parameter Application: Selection |
| --- | --- | --- |
| Demographic | __Age | __Age |
| | __Sex | __Sex |
| | __Race | __Race |
| | __Comorbidities | __Comorbidities |
| | __Other * | __Other |
| Disease | __Lesion location | __Lesion location |
| | __Stage | __Stage |
| | __Lesion size | __Lesion size |
| | __Resectability | __Resectability |
| | __Prior treatment | __Prior treatment |
| | __Other | __Other |
| Referral path | __Imaging | __Imaging |
| | __Prior test | __Prior test |
| | __Physician referral | __Physician referral |
| | __Other | __Other |

(a) The table shows the parameters that were identified as commonly specified and the groups to which they were assigned. For example, age is a parameter that was assigned to the demographics category. The survey was used to record the frequency with which a particular parameter or parameter category was used either to describe a population or to describe the process by which the population was selected. For example, a patient may have been selected on the basis of imaging results (referral path) or the study may simply have described imaging results in the selected patients (eg, 45% of the study population had evidence of cancer in imaging studies) but did not use imaging results to select patients.

Table 2. Characteristics of Included Studies (a)

| Characteristic | Item | Salivary Gland | Thyroid | Lung | Pancreas |
| --- | --- | --- | --- | --- | --- |
| Study location (n) | Most frequent countries | UK (3) | USA (10), Italy (2), Turkey (2) | USA (6), Japan (3), Australia (2), Korea (2) | USA (11), UK (2) |
| Journal types | Pathology | 3 | 13 | 4 | 4 |
| | Radiology | 0 | 0 | 4 | 4 |
| | Surgery | 16 | 6 | 3 | 1 |
| | Gastroenterology | 0 | 0 | 0 | 11 |
| | Other | 1 | 1 | 9 | 0 |
| | Avg impact factor | 1.1 | 2.2 | 3.6 | 2.8 |
| Author types | Pathology | 5 | 8 | 3 | 3 |
| | Radiology | 0 | 1 | 3 | 3 |
| | Surgery | 15 | 7 | 3 | 1 |
| | Gastroenterology | 0 | 0 | 0 | 13 |
| | Other | 0 | 4 | 11 | 0 |
| Study design | Prospective, % | 10 | 15 | 40 | 35 |
| | Consecutive, % | 20 | 50 | 55 | 45 |
| | IRB approval, % | 10 | 35 | 70 | 60 |
| | Longitudinal FU, % | 0 | 10 | 55 | 35 |
| | Tertiary setting, % | 95 | 100 | 90 | 50 |

(a) Journals and authors were assigned to 5 categories (pathology-cytology, radiology, surgery, gastroenterology, and other). Author category was based on the departmental affiliation of the corresponding author. Study design characteristics were counted as positive only if they were specifically reported. Many studies did not specify study characteristics and the values given represent minimum values. Study location indicates the countries in which studies were conducted. There were 20 studies for each anatomic site. For thyroid, studies were most frequently conducted in the United States (10 studies), Italy (2 studies), and Turkey (2 studies). Abbreviations: Avg, average; FU, follow-up; IRB, institutional review board; UK, United Kingdom; USA, United States.

Table 3. Parameter Specification Rates by Tissue and Item Use (a)

| Parameter Use | No. of Parameters Specified | Salivary, % | Thyroid, % | Lung, % | Pancreas, % | Total, % |
| --- | --- | --- | --- | --- | --- | --- |
| Description | 0 | 32 | 9 | 5 | 5 | 13 |
| | 1 | 5 | 9 | 10 | 0 | 6 |
| | 2 | 18 | 36 | 5 | 24 | 21 |
| | 3 | 27 | 27 | 15 | 29 | 25 |
| | 4 | 14 | 14 | 45 | 10 | 20 |
| | 5 | 5 | 5 | 20 | 24 | 13 |
| | 6 | 0 | 0 | 0 | 5 | 1 |
| | 7 | 0 | 0 | 0 | 5 | 1 |
| | Mean (SD) | 1.8 (1.6) | 2.4 (1.1) | 3.8 (1.1) | 3.4 (1.2) | 2.9 (1.6) |
| Selection | 0 | 82 | 41 | 35 | 57 | 54 |
| | 1 | 14 | 45 | 25 | 38 | 31 |
| | 2 | 5 | 9 | 25 | 0 | 9 |
| | 3 | 0 | 0 | 5 | 5 | 2 |
| | 4 | 0 | 0 | 5 | 0 | 1 |
| | 5 | 0 | 5 | 5 | 0 | 2 |
| | 6 | 0 | 0 | 0 | 0 | 0 |
| | 7 | 0 | 0 | 0 | 0 | 0 |
| | Mean (SD) | 0.2 (0.4) | 0.6 (0.6) | 0.9 (0.9) | 0.5 (0.5) | 0.5 (0.7) |

(a) The table shows the percentage of studies that specified a particular number of population description parameters by tissue type and item use. For example, 32% of salivary gland studies provided no parameters to describe the sample population and 5% provided only a single parameter. Eighty-two percent of salivary gland studies provided no specific parameters to describe selection criteria. The mean is the average number of parameters specified.

Table 4. Specification Rates for Parameters and Parameter Categories (a)

| Parameter Category | Parameter | Description (Proportion of Studies) | Selection (Proportion of Studies) | Total (Proportion of Studies) |
| --- | --- | --- | --- | --- |
| Demographics | Age | 0.82 | 0.07 | |
| | Sex | 0.84 | 0.04 | |
| | Race | 0.05 | 0.00 | |
| | Comorbidities | 0.18 | 0.02 | |
| | Other | 0.06 | 0.09 | |
| | Total | 0.86 | 0.11 | 0.86 |
| Disease features | Lesion location | 0.48 | 0.09 | |
| | Stage | 0.15 | 0.05 | |
| | Lesion size | 0.51 | 0.09 | |
| | Resectability | 0.08 | 0.05 | |
| | Prior treatment | 0.04 | 0.02 | |
| | Other | 0.28 | 0.06 | |
| | Total | 0.78 | 0.27 | 0.78 |
| Referral pattern | Imaging | 0.33 | 0.33 | |
| | Prior test | 0.06 | 0.05 | |
| | Physician referral | 0.17 | 0.18 | |
| | Other | 0.04 | 0.11 | |
| | Total | 0.39 | 0.49 | 0.67 |
| Total | | 0.94 | 0.62 | |

(a) For parameter categories, the table indicates the rate at which studies specified any parameter within the category. Thus 86% of articles described at least 1 demographic feature (age, sex, race, comorbidities, or other) of the sampled population. Only 11% of articles specified a demographic parameter for selection. Ninety-four percent of studies described at least some feature of the sampled population. For parameters, the values indicate the specification rate for a particular use (description versus selection).

Table 5. Parameter Specification Rates by Parameter, Tissue, and Parameter Use (a)

| Parameter Use | Parameter Category | Parameter | Salivary, % | Thyroid, % | Lung, % | Pancreas, % |
| --- | --- | --- | --- | --- | --- | --- |
| Description | Demographics | Age | 73 | 91 | 80 | 86 |
| | | Sex | 77 | 91 | 80 | 86 |
| | | Race | 4 | 5 | 5 | 5 |
| | | Comorbidities | 0 | 23 | 10 | 38 |
| | | Other | 4 | 9 | 5 | 5 |
| | Disease characteristics | Lesion location | 45 | 5 | 80 | 67 |
| | | Stage | 9 | 91 | 30 | 14 |
| | | Size | 9 | 59 | 80 | 57 |
| | | Resectability | 9 | 0 | 10 | 14 |
| | | Prior treatment | 0 | 9 | 5 | 0 |
| | | Other | 27 | 18 | 40 | 28 |
| | Referral path | Imaging | 14 | 36 | 40 | 43 |
| | | Failed test | 0 | 0 | 10 | 14 |
| | | Physician referral | 0 | 0 | 15 | 14 |
| | | Other | 0 | 4 | 0 | 10 |
| Selection | Demographics | Age | 4 | 18 | 5 | 0 |
| | | Sex | 0 | 14 | 0 | 0 |
| | | Race | 0 | 0 | 0 | 0 |
| | | Comorbidities | 0 | 0 | 5 | 5 |
| | | Other | 0 | 4 | 0 | 0 |
| | Disease characteristics | Lesion location | 0 | 0 | 30 | 10 |
| | | Stage | 0 | 0 | 20 | 0 |
| | | Size | 0 | 14 | 20 | 5 |
| | | Resectability | 0 | 0 | 10 | 10 |
| | | Prior treatment | 4 | 0 | 0 | 5 |
| | | Other | 4 | 14 | 0 | 5 |
| | Referral path | Imaging | 4 | 14 | 75 | 42 |
| | | Failed test | 0 | 4 | 15 | 0 |
| | | Physician referral | 13 | 4 | 30 | 24 |
| | | Other | 4 | 23 | 0 | 14 |

(a) The table shows the rate at which each parameter was specified by the sample of studies. For example, 73% of salivary gland studies described the age of the study population but only 4% of salivary gland studies used age as a selection criterion.

Table 6. Reporting Characteristics (a)

| Reporting Item | Salivary, % | Thyroid, % | Lung, % | Pancreas, % | Total, % |
| --- | --- | --- | --- | --- | --- |
| Did the study mention the target population? | 13 | 51 | 30 | 36 | 33 |
| Did the study provide a comparison of the characteristics of positive and negative cases? | 50 | 34 | 38 | 31 | 38 |
| Did the study provide a flow diagram? | 13 | 17 | 25 | 27 | 21 |

(a) The table entries indicate the percentage of studies that reported on a particular item. For example, 13% of salivary gland studies provided a flow diagram and 51% of thyroid studies mentioned the target population.
Author: Schmidt, Robert L.; Narra, Krishna K.; Witt, Benjamin L.; Factor, Rachel E.
Publication: Archives of Pathology & Laboratory Medicine
Date: Jan 1, 2014