College of American Pathologists Cancer Protocols: Optimizing Format for Accuracy and Efficiency.
There are many parallels between the health care and aviation industries in communicating essential information, and some aviation safety procedures have been incorporated into the field of medicine. For example, surgeons and pathologists have adopted "call back" or "read back" methods to ensure that verbal information has been understood as intended. (4) Carefully prepared, preflight checklists have proven invaluable in ensuring safe flight conditions. Similarly, more than 20 years ago, the College of American Pathologists (CAP) developed the cancer protocols, the equivalent of aviation checklists, to communicate complex, written information. (5) This improved report format was designed to replace the prose of traditional pathology reports and took the form of a synopsis, which, according to the Merriam Webster Dictionary, is a "short description of the most important information about something; a summary or outline." (6)
The CAP protocols incorporate data elements that are rigorously scrutinized and evaluated for their level of evidence. The protocols are developed through a consensus process, in which experts provide content and specific language regarding a fixed set of response choices, and supporting literature. Once compiled by the expert panel, there is an open period when the proposed protocols are posted on the CAP Web site (http://www.cap.org) for comment. The information in the protocols is organized in a standardized synoptic format containing headers, required data elements (RDEs), and predefined response choices (Figure 1). Synopses have proven superior to prose for conveying diagnostic and prognostic information and facilitating comprehension, (7,8) and the header/RDE/response synoptic is now considered a valuable format for the inclusion of important pathologic information. (9-11)
Synoptic reports facilitate medical decision making but, to be effective, the information has to be presented in a manner that ensures efficient and accurate interpretation by the clinician. (12,13) Although there is evidence of user satisfaction with synoptic formats, (14,15) there is no consensus as to the ideal way to display information. In fact, the CAP protocol-development process is driven by pathologists who have organized synoptic reports following their thought processes, from macroscopic to microscopic examination, with little input from the health care provider. Because cancer protocols are independently created by site-specific content experts, there is inconsistency in the nomenclature used, the wording varies within and among protocols, and related information is located in several places within a single protocol. For example, lymph nodes with metastases can be described as involved, positive, or with metastasis, and the words lymph node or lymph nodes appear in 5 different locations in the fallopian tube protocol chosen for the current study. Thus, the focus has been on content, rather than on usability and readability, and because the CAP protocols do not include recommendations regarding formatting, the final products vary greatly. (53) That variation can be confusing and time consuming for the health care provider and may lead to errors in interpretation affecting patient safety and efficiency in the clinical setting. In this study, we have assessed the aesthetic appeal of various formats and, via a query analysis, the impact of various synoptic format styles on speed and accuracy of data extraction.
MATERIALS AND METHODS
This study was approved by the institutional review board of the University of South Florida and the Research and Development Committee of the James A. Haley Veterans' Hospital (Tampa, Florida). To reduce possible knowledge bias, we selected the CAP fallopian tube protocol, (16) a complex report of a rare neoplasm unfamiliar to most clinicians. This protocol was used as published on the CAP Web site, without content modification, but with changes in formatting. For the content of the protocol, we adopted the same terminology used by the CAP: header, RDE, and response (Figure 1). We conducted a survey with 2 main components: aesthetics (visual preference) and accuracy (correct answers in the context of time). Voluntary participants were recruited from the University of South Florida's colleges of medicine and nursing, with the concurrence of the offices of student affairs, the cancer program committee, and the tumor boards of the James A. Haley Veterans' Hospital, and from a national meeting of the Florida Cancer Registrars Association. The survey was conducted at multiple locations on different dates. Participants indicated their aesthetic preference using a paper with the 4 formats (Figure 2), whereas the accuracy component was performed using an audience response system, as described below. Participants included nurses, medical students, nursing students, dentists and dental assistants, cancer registrars, and physicians of various specialties, such as primary care, radiation, medical oncology, and surgery. The study consisted of 4 parts: (1) aesthetic appeal, (2) accuracy, (3) efficiency (accuracy in the context of time), and (4) readability or complexity of the RDE/response pair.
We evaluated the visual appeal of 4 formats with variation in capitalization (RDE versus response) and justification (left-justified versus column). In left-justified formats, the text was aligned along the left margin without separation between RDE and response. In column formats, the RDE was aligned along the left margin with the response separated by a blank space, creating 2 separate columns. In Figure 2, the upper-right and lower-left forms are in column format, whereas the upper-left and lower-right forms are left justified. Font size was intentionally decreased to emphasize organization rather than content. Participants received a sheet of paper showing the 4 formats (Figure 2) and were instructed to record their sex and to rank the formats using a Likert scale (1, most pleasing; 4, least pleasing) to determine if there was a relationship between sex and visual appeal of the format. The participants were given only 10 seconds to judge the formats because the goal was the immediate impression rather than an in-depth analysis. (17)
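The two layout variables described above (justification and capitalization) can be illustrated with a short sketch. This is not the software used in the study; the function name `render_synopsis`, its parameters, and the example values are hypothetical and serve only to show how the same RDE/response pairs look in a left-justified versus a columned layout.

```python
def render_synopsis(pairs, layout="columned", capitalize="response", gap=4):
    """Render (RDE, response) pairs as plain text.

    layout:     "columned" pads every RDE so all responses start in one column;
                "left" writes each pair flush left with a single space between.
    capitalize: "rde", "response", or None.
    """
    styled = []
    for rde, response in pairs:
        if capitalize == "rde":
            rde = rde.upper()
        elif capitalize == "response":
            response = response.upper()
        styled.append((rde + ":", response))

    if layout == "columned":
        # Column width is set by the longest RDE plus a fixed blank gap.
        width = max(len(rde) for rde, _ in styled) + gap
        lines = [rde.ljust(width) + response for rde, response in styled]
    else:
        lines = [rde + " " + response for rde, response in styled]
    return "\n".join(lines)


# Hypothetical values for illustration only; not taken from the CAP protocol.
EXAMPLE_PAIRS = [
    ("Tumor size", "2.5 cm"),
    ("Histologic grade", "G2"),
    ("Lymph nodes examined", "12"),
]

if __name__ == "__main__":
    print(render_synopsis(EXAMPLE_PAIRS, layout="columned", capitalize="response"))
    print()
    print(render_synopsis(EXAMPLE_PAIRS, layout="left", capitalize="rde"))
```

In the columned output every response begins at the same horizontal position, so the responses can be scanned vertically as a list; in the left-justified output the start of each response depends on the length of its RDE.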
We randomly distributed 8 formats (1 per participant), identified with the letters A through H (example in Figure 1). The formats contained all required data elements but with variations in capitalization (RDE versus response) and justification (left-justified versus columned) (Table 1). A summary box, with relevant data selected from the National Comprehensive Cancer Network clinical practice guidelines for treatment of fallopian tube cancer, (18) was also included at the top of some formats. We instructed participants that it was not necessary to understand the information, and no instructions were given regarding the content or purpose of the summary box. We asked 11 factual, multiple-choice questions, designed to anticipate the information that a health care provider would be looking for when faced with the protocol, which were worded to match the protocol's RDE/response pairs (Table 2). The questions and multiple-choice answers were projected using PowerPoint (Microsoft, Redmond, Washington) with integrated TurningPoint audience-response software (Turning Technologies, Youngstown, Ohio) to capture answers and time to answer. To ensure that the system was working correctly, to engage the participants, and to ensure familiarity with the technology, we began by asking generic questions not relevant to the CAP protocol, such as "In what season is your birthday?" The average time that participants took to answer a question for which they should know the answer was 11.3 seconds (range, 1-59 seconds). We concluded that 60 seconds was a sufficient amount of time for participants to answer the questions related to the protocol. No problems were identified with the technology, and it was, therefore, assumed that differences in answers and answering times were due solely to participants' interaction with the format. Some questions were felt to be straightforward, easily answerable, with clear terminology, and with answers located in only one place in the protocol. 
Other questions were designed to address inconsistencies, such as word variation and word repetition, within the protocol. The answer to 1 question (Q) (Q5, is there perineural invasion?) was not found in the protocol, and the answer to another question (Q4, were internal iliac lymph nodes sampled?) had to be deduced from information in the protocol. Answers were classified as correct (accuracy), incorrect (inaccuracy), and unanswered, if no response was given in the time allowed.
Efficiency was assessed by correlating a specific format with the time participants took to correctly respond to questions. Time was measured either as total response time (the entire time that it took participants to answer correctly) or in arbitrary periods of 10 seconds. (17)
Readability and Complexity of the RDE/Response Pair
To investigate the readability of the RDE/responses, 3 readability tests--the Flesch-Kincaid Grade Level Readability Test, (19) the SMOG index, (20) and the Automated Readability Index (21)--were used. These tests are commonly used in educational practices to assess both the readability and the understandability of a reading passage. Whether based on characters per word (Automated Readability Index) or syllables per word (Flesch-Kincaid Grade Level Readability Test and SMOG), the tests convert readability scores to the US grade-level system: grade 1 (ages 6-7 years) to grade 12 (ages 17-18 years). A final composite score, consisting of the average of these 3 tests, was used for the final analysis.
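For readers unfamiliar with these indices, the sketch below shows the standard published formulas and the simple averaging step. It is an illustration, not the tool used in the study: the function names are hypothetical, and the text counts (characters, words, sentences, syllables, polysyllabic words) are assumed to be supplied by the caller, because the article does not describe how short RDE/response fragments were segmented. The SMOG formula was designed for 30-sentence samples, so its use on short fragments is an approximation.

```python
import math


def automated_readability_index(characters, words, sentences):
    """ARI grade level from characters per word and words per sentence."""
    return 4.71 * characters / words + 0.5 * words / sentences - 21.43


def flesch_kincaid_grade(words, sentences, syllables):
    """Flesch-Kincaid grade level from words per sentence and syllables per word."""
    return 0.39 * words / sentences + 11.8 * syllables / words - 15.59


def smog_index(polysyllables, sentences):
    """SMOG grade from the count of words with 3 or more syllables."""
    return 1.0430 * math.sqrt(polysyllables * 30 / sentences) + 3.1291


def composite_grade(characters, words, sentences, syllables, polysyllables):
    """Unweighted mean of the 3 grade-level estimates."""
    scores = (
        automated_readability_index(characters, words, sentences),
        flesch_kincaid_grade(words, sentences, syllables),
        smog_index(polysyllables, sentences),
    )
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Made-up counts for a short passage, for illustration only.
    print(round(composite_grade(characters=100, words=20, sentences=2,
                                syllables=30, polysyllables=3), 2))
```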
To test group differences in the aesthetics analysis, we used a Student t test. For the accuracy analysis, data were collected as mean (SD) or, where appropriate, frequency and percentage. We used the item difficulty index as a measure of the proportion of participants answering a question correctly. We also used item discrimination analysis to assess correctness and to discriminate between participants presented with columned or justified formats. The index of discrimination (percentage correct in the columned minus the percentage correct in the justified format) was scored as columned versus justified for all items, except questions 2 and 7, which were scored as justified versus columned to facilitate positive indices. The ratio of the index of discrimination to maximum discrimination investigated the correlation between a participant's response to a single question and his or her total score when answering all questions. Correlation values ranged from -1.00 to +1.00, and a greater value indicated that more participants correctly answered a given question. If a question was so easy that nearly all participants answered it correctly, its discrimination value would be near zero. If a question was so difficult that nearly all participants answered it incorrectly, its discrimination value would also be near zero. The most effective question would have moderate difficulty and a high discrimination value. Thus, greater values indicated how effective a question was in discriminating between participants who performed well and those who did not. We used these values as an indirect measure of format performance and set our threshold at a correlation of more than 0.45 (an r of 0.45 corresponds to an r² of 0.20, or 20%). The Student t test was used to test group differences for items and total score, and an analysis of variance model was used to assess the association of formats with answering time.
All analyses were conducted using SAS version 9.3 software (SAS Institute, Cary, North Carolina) with a 2-tailed P < .05 considered statistically significant.
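The item difficulty index and the index of discrimination defined above reduce to simple arithmetic, sketched below. The study's analyses were run in SAS; this Python fragment is only an illustration with hypothetical function names. The counts are those reported for Q5 in the Results (151 of 218 correct with columned formats and 136 of 228 correct with justified formats).

```python
def item_difficulty(correct, total):
    """Proportion of participants who answered the item correctly (0 to 1)."""
    return correct / total


def discrimination_index(correct_a, total_a, correct_b, total_b):
    """Percentage correct in group A minus percentage correct in group B."""
    return 100 * (correct_a / total_a - correct_b / total_b)


# Q5 counts as reported in the Results section of this article.
q5_columned = item_difficulty(151, 218)
q5_justified = item_difficulty(136, 228)
q5_index = discrimination_index(151, 218, 136, 228)

if __name__ == "__main__":
    print(f"Q5 difficulty, columned:  {q5_columned:.3f}")
    print(f"Q5 difficulty, justified: {q5_justified:.3f}")
    print(f"Q5 index of discrimination: {q5_index:.1f} percentage points")
```

Swapping the order of the two groups, as was done for questions 2 and 7, simply changes the sign of the index.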
A total of 432 individuals participated in the aesthetic component of the study. Participants were cancer registrars (n = 60; 13.9%), nurses (n = 57; 13.2%), nursing students (n = 111; 25.7%), third-year medical students (n = 44; 10.2%), fourth-year medical students (n = 84; 19.4%), dental staff (n = 11; 2.5%), primary care physicians (n = 27; 6.3%), radiation therapy staff (n = 6; 1.4%), and members of the hospital cancer committee, including oncologists of several disciplines (n = 32; 7.4%). Of the 291 participants (67.4%) who answered the question about sex, 120 (41.2%) were women, and 171 (58.8%) were men. The data obtained from those who responded to the question of sex did not differ from the data of the population of respondents as a whole. Women and men did not respond differently in terms of aesthetic preference, and there was no significant difference in format choice among the different professions. Of the 4 formats presented to the participants, columned formats were preferred to left-justified formats (P < .001), regardless of capitalization in either the RDE or the response (Figure 3).
A total of 446 participants completed the accuracy analysis. The 8 formats (A-H) were distributed randomly, resulting in the following percentages: A (n = 64; 14.3%), B (n = 59; 13.2%), C (n = 52; 11.7%), D (n = 53; 11.9%), E (n = 55; 12.3%), F (n = 52; 11.7%), G (n = 55; 12.3%) and H (n = 56; 12.5%). No statistically significant differences were observed in this random distribution (P = .99). Of a total of 4906 questions, 4251 (92.9%) were correct, 326 (7.0%) were incorrect, and 329 (6.79%) were unanswered (Figure 4). In the group of participants using columned formats (n = 218), the total correct answers ranged from 151 (69.2%) for Q5 (is there perineural invasion?) to 208 (95.4%) for Q3 (Item Difficulty Index ranging from 0.75 to 1.0), whereas in the group of participants using justified formats (n = 228), the total correct answers ranged from 136 (59.6%) for Q5 to 217 (95.2%) for Q3 (Table 3). Thus, columned formats performed better with regard to accuracy (P = .02), and specifically for Q5 (is there perineural invasion?) (P = .04). The proportion ranged from 66 (justified, Q5) to 100 (Table 4). Of the 11 questions, 3 (Q3, Q8, and Q10) were answered correctly by all participants (100% accuracy) using columned formats, whereas only Q3 (what is the tumor size?) was answered correctly by all participants using the justified formats. For questions with an overall correct response of 50% or more, the maximum discrimination between columned and justified formats was 200% (Maximum discrimination = Columned 100% correct response rate + Justified 100% correct response rate). In this study, the index of discrimination (columned format percentage correct minus justified format percentage correct) ranged from 0 (Q3, what is the tumor size?) to 70.0% (Q10, how many lymph nodes were examined?) (Table 4).
The 446 participants were divided into 2 professional groups (estimated expertise determined by contact with cancer patients): 381 (85.4%) were in the nonexpert group and 65 (14.6%) were in the expert group. No statistically significant differences were observed between these 2 groups in their ability to extract accurate information from the formats. Of the 368 participants (82.5% of the total) who answered the question about sex, 241 (65.6%) were women and 127 (34.4%) were men. No sex differences were observed in accuracy.
Both the configuration of the RDE/response pair and the format type affect the ability of the participant to answer correctly. The most efficient format would be the one that allows the most correct answers in the shortest time. Participants were asked to record which format they received as indicated by a letter (A-H) in the upper left-hand corner (Figure 1), which took an average of 3.5 seconds. Participants took more time when answering incorrectly than when answering correctly (data not shown). No sex differences were observed in the time it took to answer. Participants presented with justified formats took a mean (SD) of 96.3 (30.1) seconds to complete the test, 4.1% longer than participants using the columned formats (92.5 [27.6] seconds) (Table 3). No statistically significant differences were observed among the formats within each of the 2 (columned and justified) groups (F₁₄₂₇ = 1.9, P = .17).
Three of the columned formats (C, G, and H) performed best in overall efficiency and allowed participants to correctly answer at least 9 of the 11 questions with the following percentages: C (n = 37 of 52; 71.1%), G (n = 40 of 55; 72.7%) and H (n = 39 of 56; 69.6%).
Using a cutoff of 10 seconds, of the total 446 participants, 253 (56.7%) correctly answered all 11 questions. Of those 253, 131 (51.8%) had received columned formats, and 122 (48.2%) had received justified formats. Of the 52 participants who received format C (columned, capitalized response, no box), 49 (94.2%) answered all questions correctly in the full time allowed (60 seconds). Of these 49, 43 (87.8%) correctly answered all 11 questions in the first 10 seconds. Of the 79 participants with other columned formats, 53 (67.1%) answered all 11 questions in the first 10 seconds. Of the 122 who received justified formats, only 65 (53.3%) correctly answered all 11 questions in the first 10 seconds. Columned formats performed better than justified formats (P = .001).
The other columned format with a capitalized response (format H) had a summary box. Thus, the presence of a summary box increased the overall response time, thereby decreasing efficiency, presumably because participants compared the information in the box with that in the protocol.
Readability and Complexity of the RDE/Response Pair
To respond to the 11 questions, participants read the PowerPoint question (number of characters ranging from 18 to 47) and scanned the RDEs (number of characters ranging from 9 to 30) and the responses (number of characters ranging from 1 to 65). An assessment of the readability of the PowerPoint questions and RDE/responses was made using an average of 3 tests. Readability levels were affected more by the RDE/response pair than by the PowerPoint question (data not shown). The average readability levels for all RDE/responses were Q11 = 1.75, Q1 = 2.65, Q3 = 2.65, Q2 = 5.05, Q9 = 6.80, Q8 = 7.40, Q10 = 7.75, Q6 = 8.80, Q4 = 11.90, Q7 = 12.15, and Q5 = 16.4. Questions 1 (what is the tumor site?), 3 (what is the tumor size?), and 11 (what is the pathologic stage?) were the easiest questions to understand, whereas Q5 (is there perineural invasion?) was the most difficult, followed by Q7 (were the obturator lymph nodes sampled?) and Q4 (were internal iliac lymph nodes sampled?). Question 5 does not have a correlating RDE/response pair in the protocol, and 138 of the 446 participants (30.9%) left the question unanswered or answered it incorrectly. Perineural invasion, a common and often expected data field in most cancer reports, is not a required data element in the fallopian tube protocol, (16) and even though the correct answer for Q5 was "Not in protocol/not found," participants answered "no" significantly more often than they answered "yes" (P < .001). We postulate that this is due to the assumption that if a feature is not explicitly stated as a positive finding, it must be negative or not present. In the case of Q4, the word internal was not included in the protocol, whereas the word external was. This is another potential source of confusion from related, but not identical, wording. Questions with numbers either in the RDE or in the response (Q3, what is the tumor size? and Q6, what is the p53 status?) were answered correctly using most formats.
The fundamental decision regarding which items are included in the CAP cancer protocols is based on evidence, but there are no official recommendations regarding the final configuration of the synopsis. As a result, formats vary among institutions and among pathologists within a single institution. Pathology departments with access to one of the software packages available can preconfigure their reports (computerized systems may have limitations regarding columned designs, which may be lost during transmission), whereas other pathologists have to create their own template on a case-by-case basis. Although some pathologists have engaged end users in the design of a synoptic report, this has been done without scientific evidence or the application of human factors engineering concepts.
Several studies have established the advantages of the synoptic report format. (7-11,22) In his excellent review on the topic, Valenstein (23) states that, since 1991, when Markel and Hirsch (10) first used a synoptic report configured as a vertical data list, no studies have been performed to assess whether that type of display correlates with speedier transmission of information, better retention of information, or fewer recall errors by the user. Valenstein (23) describes an operational definition of communication effectiveness, which, in the context of synoptic reporting, considers several parameters of better format performance: (1) better recognition of key facts tested by a multiple-choice test, (2) better recall of content without prompting by multiple-choice questions, (3) fewer errors in recognition or recall, (4) less time to assess the content without an increase in errors, (5) increased likelihood of taking a correct action, and (6) increased satisfaction with the communication experience.
In our study, we only compared synoptic formats with each other because several studies have already proven the superiority of synopses over narrative format. (7,8,24,25) We did not investigate effects on recall because we are planning further experiments targeting only oncologists. We addressed user satisfaction in the aesthetics component of our study and attempted to draw some conclusions regarding the 4 formatting principles addressed by Valenstein (23): (1) use of headlines to emphasize key findings, (2) maintenance of layout continuity, (3) optimization of information density, and (4) clutter reduction. We did not introduce changes to the continuous layout of the format proposed by the CAP. The density of information and the reduction of clutter were addressed by investigating the effects on efficiency related to complexity of the RDE/response pair, wording differences, and distractors related to a confusing display of data. Many format modifications could be investigated, but we reduced them to a few major changes and for only one of the cancer types.
Mental processes, such as perception, reasoning, emotion, motivation, memory, language, communication, and problem solving, have a critical role in the interaction of humans with the environment. Human capabilities also vary according to psychologic factors, experience, knowledge, mental or emotional state, and attitude, and these factors have a significant effect on the completion of any task. Perceptions, however, include not only the classical dimension of aesthetics, such as order and clarity, but also items related to the expressive dimension, such as visual clues based on groups of related items, separation of unrelated items, font style, alignment of visually connected elements, and headings and subheadings to facilitate scanning. (26) It is this expressive component that may facilitate recognition and recall of critical parameters. (27,28)
In our study, columned formats were preferred for their aesthetic appeal and, by several analytic methods, also outperformed justified formats in both accuracy and efficiency. This correlation between aesthetics and efficiency is not surprising, considering that recent neuroaesthetics research (26) identifies visual clues as a strong determinant of user satisfaction resulting in increased attention span and better performance. Interestingly, in general, fast readers benefit most from the 2-column, justified text, whereas slow readers perform best with the 1-column, left-justified text. (29-32) In our 10-second period analysis, participants who received columned formats performed better than participants who received justified formats.
In recent years, human factors engineering studies have guided the design and use of aviation checklists. Degani and Wiener (33) identified 2 critical parameters in the configuration of checklists: legibility of print and readability. Legible print allows the reader to quickly and positively identify each individual character, and depends on character width, form, and contrast between characters and background. Readability, on the other hand, allows the rapid recognition of words, word-groups, abbreviations, and symbols. Some of the proposed guidelines for aviation (34) apply to the configuration of synoptic reports: (1) restriction of sentence length, (2) increase in the sequential spacing between items, (3) reduction of word complexity, and (4) standardization of wording and nomenclature to reduce variability and ambiguity. Thus, desirable formatting principles include uniformity in format design, optimization of information density, and standardization of terminology.
The limited capacity of short-term memory is one of the most severe constraints on human performance. (35) Thus, when the number of tasks increases beyond a certain threshold, commitment to memory is difficult if not impossible. Long checklists, therefore, place unrealistic demands on short-term memory. (36,37) The average subject has limitations in visual span and processing and cannot store more than 7 bits of unrelated information in short-term or "working" memory. (27,36,38) Furthermore, memory centers process incoming information for a very brief period, usually on the order of 0.5 to 3 seconds, and the amount of information held at any given moment is limited to 5 to 7 discrete elements, such as letters of the alphabet. Thus, too much ungrouped information, packed together in a single line, will be forgotten or difficult to assimilate. The 60-second limit on answering time, used in this study, is considered more than adequate because cognitive research shows that the time used in gathering and recollection of data is actually in the tenths of a second. (17)
Headers facilitate content assimilation and should be used whenever possible. (31,37) In the CAP protocols, there are headers for main items, and the RDEs also act as headers to call the attention of the reader. In our study, using capital letters in the RDE or in the response to achieve item separation did not seem to make a significant difference, but it is advisable that one style be chosen and maintained throughout. In our proposed formats (Figure 5, A and B) we use capitals in the response because the data that the end user is trying to identify are located in the response.
The complexity of the RDE/response pair has a role in the ability of the participant to reach an answer within 60 seconds. Short questions, such as "what is the tumor size?" "what is the tumor grade?" and "what is the tumor site?" were rarely left unanswered, whereas more-complex items, such as "were the obturator lymph nodes sampled?" or "what is the status of the fimbria?" were left unanswered more often. Longer lines may result in faster reading but research suggests that medium to short lines typically result in higher comprehension. (32) Thus, the length of the RDE and response should be kept to a minimum, even if that means increasing the number of lines in the response or the number of RDE/response pairs.
It is also imperative to use consistent terminology. Idowu et al (39) addressed this question in a survey of clinicians and concluded that certain pathology terms had different meanings for clinicians than they do for pathologists. Lubin et al (40) also found that ambiguous terminology and lack of clarity in content and organization were among the challenges to be addressed in reporting molecular pathology results. We found that more errors were associated with lengthy descriptions, similar words (eg, inguinal, internal), inconsistent wording (eg, involved and positive), and duplication of information (lymph node status). For instance, positive should be preferred to involved, and negative should be preferred to uninvolved because positive and negative are words that can be easily distinguished from each other. Regardless of the word selected, that word should be consistently used in the synopsis. On the other hand, in our study, responses with easily recognized visual features, like numbers, were more often answered correctly and more rapidly. Thus, numbers should be used instead of words whenever possible because the use of numbers facilitates recognition.
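As one way to act on this recommendation, a report template could be screened for mixed wording before release. The sketch below is an assumption-laden illustration, not a validated tool: the synonym groups are hypothetical and are built only from the wording variants mentioned in this article, and the function name `find_mixed_terms` is invented for the example.

```python
import re

# Hypothetical synonym groups based on the wording variants discussed in the text.
# The first entry in each list is the preferred term.
SYNONYM_GROUPS = {
    "positive": ["positive", "involved", "with metastasis"],
    "negative": ["negative", "uninvolved"],
}


def find_mixed_terms(report_text):
    """Return {preferred term: variants found} for groups with more than 1 variant."""
    text = report_text.lower()
    mixed = {}
    for preferred, variants in SYNONYM_GROUPS.items():
        # Whole-word matching keeps "involved" from matching inside "uninvolved".
        found = [v for v in variants
                 if re.search(r"\b" + re.escape(v) + r"\b", text)]
        if len(found) > 1:
            mixed[preferred] = found
    return mixed


if __name__ == "__main__":
    # Made-up report lines for illustration only.
    sample = ("Right pelvic lymph nodes: involved\n"
              "Left pelvic lymph nodes: positive\n"
              "Margins: uninvolved")
    print(find_mixed_terms(sample))
```

A report that uses a single term per concept returns an empty result; any nonempty result points to wording that should be harmonized.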
Also, searching the synopsis for data that are expected but not included is prone to errors of assumption. In our study, Q5 (is there perineural invasion?) was often left unanswered and, when answered, was often answered incorrectly; hence the importance of accurately replicating the reports, as configured by the CAP, and answering all required parameters. National organizations should also make a point of routinely educating physicians and other providers that the CAP protocols contain the scientifically validated data elements and that items not included are not to be assumed negative.
Pathologists organize their reports following an order that replicates their daily operation: gross description, microscopic evaluation, final diagnosis, and comment. The end user, however, may be more interested in other items, such as final diagnosis, margin status, or pathologic stage. If that is the case, the design of the reports could be organized to place the most important items at the beginning of the synopses. In an attempt to investigate whether placing specific items at the top of the report reduces answering times, we included a box containing a few important parameters. The presence or absence of this box had no impact on accuracy, but it increased answering time, likely because the participants were cross-checking the contents of the box with the contents of the synopsis. An unpublished survey in our institution (data not shown) revealed that oncologists (n = 5) do not have a consistent preference for what parameters have to be emphasized in such a box. Repetition of data in reports is time consuming and ineffective because redundancy limits the usefulness of the design, and duplication of information increases transcription errors and confuses responders, as reported by Degani and Wiener. (33) They concluded that duplication of data enhanced redundancy and degraded the overall checklist performance. No differences were found in either aesthetic preference or performance between men and women. This is not surprising in light of recent studies (41) revealing that cognitive differences are greater within each sex than between the sexes.
In our study, correct answers were considered a surrogate for accuracy, and incorrect answers as errors in interpretation. To our knowledge, there are no studies citing medical errors as the result of misinterpretation of synoptic reports, but we all receive phone calls requesting clarification of certain items, staging, or pathology terminology, and we are left to wonder about the accuracy of the interpretation by those who did not call. Even pathologists responsible for presentation of cases at tumor boards are often confused by the synoptic reports generated by other pathologists from their own department. Thus, even when the content in the synoptic report is complete and accurate, lack of uniformity in formatting leads to confusion among clinicians and can be seen as relaxation in pathology methods and procedures. (42-46) Because synoptic reports are currently being evaluated for implementation by a variety of medical specialties and organizations, we should ensure homogeneity and user satisfaction. (47-51) Furthermore, the oncology care provider spends considerable time extracting information from a single synoptic pathology report. This time requirement aggregates each day for each patient and, taken together, it compounds not only for the individual care provider but for the institution as a whole.
Considering the results of this study, we propose a format that could be used to decrease cognitive errors and facilitate comprehension (Figure 5, A). This proposed format incorporates the human factors engineering principles outlined in this article, as well as those described by Valenstein. (23)
First, we begin with the posting date of the CAP protocol to indicate the version used. This is important because there is a window of time before implementation of the protocols is required. In the event of a staging change (eg, the American Joint Committee on Cancer, 7th edition, effective January 1, 2010), (52) the CAP protocols may reflect the stage change (ie, posted on the Internet in October 2009) but with a delay in reporting requirement (ie, August 2010). It should be clear to clinicians and other pathologists which edition of the protocol is used. There is also a need for education of both pathologists and providers as to the purpose of the data, the delay, and the possible discrepant staging. This could be done at a national level as well as at local tumor boards.
We follow with the summary statement of the pathologic staging, as it is the culmination of the report. All the included elements support the stage, and the stage is the basis for treatment. Therefore, we have relocated it from the end of the report, where it is in current CAP protocols, to the first line.
We have chosen a columned format because columns were not only favored in the aesthetic portion of our study (Figure 3) but also performed better in accuracy testing (Figure 4). Although our findings did not support a clear superiority of capitalization in either the required data element or the response, we chose to capitalize the response, to highlight those elements. When capitalized in a column, the short responses can be quickly read vertically as a list of critical, diagnostic bits of information.
Information in the protocol is grouped according to the American Joint Committee on Cancer staging, with T, N, and M sections, a significant change from current protocols. We found this to be a logical approach to presenting the information that supports the summary pathologic stage. Under the heading Tumor, we include data about size, location, and tumor type; under the heading Metastases, we include data about lymph nodes, lymph-vascular invasion, and distant metastases.
At the bottom are additional findings, including immunohistochemical studies, other diagnoses, and other such information. Clinical history was removed from the protocol because, in our opinion, this is information that pathologists cannot independently validate at the gross bench or the microscope. We do this realizing there could be exceptions, especially in the case of bone and soft tissue tumors, in which tumor classification relies heavily on radiologic correlation.
Finally, we suggest that the CAP protocol serve as the sole source of an aggregate diagnosis for independent specimens that may be received from a complex cancer resection, for example, several lymph node stations, lung wedge resections, and a completion lobectomy. Although these are received as independent specimens, the diagnoses can be summarized in one diagnosis and communicated most effectively in a synoptic format such as the one we have proposed. A shorter report requires less time for the provider to read and digest, and it eliminates redundancies: having multiple locations for the same information invites errors and discrepancies. Another benefit of a shorter report is the ability to display all of the diagnostic information in a single screen view, increasing user satisfaction and efficiency.
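The layout choices described above (protocol version first, pathologic stage on the first data line, headers, and capitalized responses aligned in a column) can be illustrated with a minimal Python sketch. This is not the authors' implementation or any laboratory information system's interface: the names `Section`, `capitalize_response`, and `render_synoptic` are hypothetical, and keeping units such as "cm" in lowercase is an assumption based on how Figure 5 prints measurements. The sample values are taken from the colon example in Figure 5, B.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Section:
    """A header with its (required data element, response) pairs."""
    header: str
    items: list[tuple[str, str]] = field(default_factory=list)


def capitalize_response(text: str) -> str:
    """Uppercase a response but keep 'cm'/'mm' after a number in lowercase."""
    return re.sub(r"(?<=\d )(CM|MM)\b", lambda m: m.group(1).lower(), text.upper())


def render_synoptic(title: str, version: str, stage: str,
                    sections: list[Section]) -> str:
    """Render a columned synopsis: version line, stage first, then sections."""
    labels = ["PATHOLOGIC STAGING"] + [rde for s in sections for rde, _ in s.items]
    width = max(len(label) for label in labels) + 2  # room for ':' and a space

    def row(label: str, response: str) -> str:
        # Pad the label so every response starts in the same column.
        return (label + ":").ljust(width) + response

    # The stage is printed as given so that prefixes such as 'pT3' keep their case.
    lines = [f"{title} (Protocol version {version})",
             row("PATHOLOGIC STAGING", stage)]
    for section in sections:
        lines.append(section.header.upper())
        lines.extend(row(rde, capitalize_response(resp))
                     for rde, resp in section.items)
    return "\n".join(lines)


# Values taken from the colon example in Figure 5, B.
report = render_synoptic(
    "Carcinoma of the Colon", "October 2013", "pT3 N2a M1a",
    [Section("Primary tumor", [("Site", "cecum"),
                               ("Histologic type", "adenocarcinoma"),
                               ("Greatest dimension", "4.5 cm")]),
     Section("Metastases", [("Distant metastases", "liver")])],
)
print(report)
```

Because the column width is computed from the longest label, the short capitalized responses line up vertically and can be scanned as a list, which is the property the columned formats were chosen for.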
Because the fallopian tube protocol is not used universally, we also include our proposal for a colon cancer specimen (Figure 5, B), designed using the same principles, to demonstrate how these principles can translate across all protocols. Before their release, protocols should be reviewed to ensure homogeneity in terminology across tumor sites. It is also important to emphasize aspects of human factors engineering in the training of future generations of pathologists to ensure optimal display of data.
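A terminology review of this kind could be partly automated. The following Python sketch flags variant wording in protocol text so that a reviewer can harmonize it; it is an illustration under stated assumptions, not a CAP tool. The variants come from the lymph node example given in the introduction (involved, positive, with metastasis), the choice of "positive" as the preferred term is an assumption that happens to match Figure 5, and the names `PREFERRED_TERMS` and `flag_variant_terms` are hypothetical.

```python
import re

# Hypothetical preferred-term table. The variants are those cited in the
# introduction; choosing "positive" as the preferred term is an assumption.
PREFERRED_TERMS = {
    "involved": "positive",
    "with metastasis": "positive",
}


def flag_variant_terms(protocol_text: str) -> list[tuple[str, str]]:
    """Return (variant, preferred) pairs for each variant found in the text."""
    lowered = protocol_text.lower()
    found = []
    for variant, preferred in PREFERRED_TERMS.items():
        # Word boundaries prevent a match inside words such as "uninvolved".
        if re.search(r"\b" + re.escape(variant) + r"\b", lowered):
            found.append((variant, preferred))
    return found


print(flag_variant_terms("Lymph nodes involved: 2"))
print(flag_variant_terms("Lymph nodes positive: 2"))
```

A check like this only finds wording that is already listed in the table; deciding which term should be preferred remains a matter for the content experts.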
We recognize some limitations of this study: (1) we assumed that participants did not have extensive oncologic knowledge, and we queried various users of cancer data but only a limited number of cancer specialists; (2) there may have been differences between participants who frequently use audience response systems and those who do not; (3) the testing method was somewhat artificial because most reports are read on a computer screen, and we did not use variations in fonts, bolding, and colors, which are currently available in laboratory information systems; (4) we maintained the structure of the CAP protocol and did not include variations in the order of the RDEs or in the grouping of the items; and (5) we only measured response and not comprehension. Some of these limitations could be addressed in future studies.
To our knowledge, this is the first study to examine both the visual appeal of varying formats of a synoptic CAP cancer protocol and the effects of formatting on the accuracy and efficiency of data extraction. Based on the results herein, we propose a model of efficient data presentation using human factors engineering. We use the fallopian tube and the colon templates and employ columns, headers, short phrases, and uniform fonts and terminology. These formatting principles can be easily applied to other cancer sites. Human factors engineering and methods used in this study would be helpful in optimizing the display of complex patient information.
Please Note: Illustration(s) are not available due to copyright restrictions.
(1.) Leape LL, Berwick DM. Five years after To Err Is Human: what have we learned? JAMA. 2005; 293(19):2384-2390.
(2.) Prepared for the WHO Patient Safety's Methods and Measures for Patient Safety Working Group. Human Factors in Patient Safety: Review of Topics and Tools. Geneva, Switzerland: World Health Organization WHO/IER/PSP/2009.05. http://www.who.int/patientsafety/research/methods_measures/human_factors/human_factors_review.pdf. Published April 2009. Accessed October 1, 2013.
(3.) Yeh M, Jo YJ, Donovan C, Gabree S. Human factors consideration in the design and evaluation of flight deck displays and controls. FAA Final Report, Version 1.0. DOT/FAA/TC-13/44, DOT-VNTSC-FAA-13-09, 2013. http://ntl.bts.gov/lib/50000/50700/50760/General_Guidance_Document_Nov_2013_v1.pdf. Published November 2013. Accessed February 5, 2014.
(4.) Sexton JB, Thomas EJ, Helmreich RL. Error, stress, and teamwork in medicine and aviation: cross sectional surveys. BMJ. 2000; 320(7237):745-749.
(5.) College of American Pathologists. Cancer protocols. http://www.cap.org/web/home/resources/cancer-reporting-tools/cancer-protocols. Accessed October 1, 2012.
(6.) Merriam-Webster. Synopsis. Merriam-Webster.com Web site. http://www.merriam-webster.com/dictionary/synopsis. Accessed May 8, 2014.
(7.) Mamykina L, Vawdrey DK, Stetson PD, Zheng K, Hripcsak G. Clinical documentation: composition or synthesis? J Am Med Inform Assoc. 2012; 19(6):1025-1031.
(8.) Schmidt RA. Synopses, systems and synergism. Am J Clin Pathol. 2007; 127(6):845-847.
(9.) Srigley JR, McGowan T, MacLean A, et al. Standardized synoptic cancer pathology reporting: a population-based approach. J Surg Oncol. 2009; 99(8):517-524.
(10.) Markel SF, Hirsch SD. Synoptic surgical pathology reporting. Hum Pathol. 1991; 22(8):807-810.
(11.) Leslie KO, Rosai J. Standardization of the surgical pathology report: formats, templates, and synoptic reports. Semin Diagn Pathol. 1994; 11(4):253-257.
(12.) Nygren E, Wyatt JC, Wright P. Helping clinicians to find data and avoid delays. Lancet. 1998; 352(9138):1462-1466.
(13.) Wyatt JC, Wright P. Design should help use of patients' data. Lancet. 1998; 352(9137):1375-1378.
(14.) Lankshear S, Srigley J, McGowan T, Yurcan M, Sawka C. Standardized synoptic cancer pathology reports--so what and who cares?: a population-based satisfaction survey of 970 pathologists, surgeons, and oncologists. Arch Pathol Lab Med. 2013; 137(11):1599-1602.
(15.) Kang HP, Devine LJ, Piccoli AL, Seethala RR, Amin W, Parwani AV. Usefulness of a synoptic data tool for reporting of head and neck neoplasms based on the College of American Pathologists cancer checklists. Am J Clin Pathol. 2009; 132(4):521-530.
(16.) Clarke BA, Crum CP, Nucci MR, Oliva E, Cooper K; for Members of the Cancer Committee, College of American Pathologists. Protocol for the examination of specimens from patients with carcinoma of the fallopian tube. College of American Pathologists Web site: FallopianTube22.214.171.124. http://www.cap.org/ShowProperty?nodePath=/UCMCon/Contribution Folders/WebContent/pdf/fallopiantube-12protocol-3100.pdf. Published February 1, 2011. Accessed October 1, 2012.
(17.) Nielsen J. Powers of 10: time scales in user experience. NN/g Nielsen Norman Group. http://www.nngroup.com/articles/powers-of-10-time-scales-in-ux/. Published October 5, 2009. Accessed October 1, 2012.
(18.) Morgan RJ Jr, Alvarez RD, Armstrong DK, et al; National Comprehensive Cancer Network. Epithelial ovarian cancer. J Natl Compr Canc Netw. 2011; 9(1):82-113.
(19.) Flesch R. A new readability yardstick. J Appl Psychol. 1948; 32(3):221-233.
(20.) Hedman AS. Using the SMOG formula to revise a health-related document. Am J Health Educ. 2008; 39(1):61-64.
(21.) Coleman M, Liau TL. A computer readability formula designed for machine scoring. J Appl Psychol. 1975; 60(2):283-284.
(22.) Renshaw SA, Mena-Allauca M, Touriz M, Renshaw A, Gould EW. The impact of template format on the completeness of surgical pathology reports. Arch Pathol Lab Med. 2014; 138(1):121-124.
(23.) Valenstein PN. Formatting pathology reports: applying four design principles to improve communication and patient safety. Arch Pathol Lab Med. 2008; 132(1):84-94.
(24.) Bouma H. Visual reading processes and the quality of text displays. In: Grandjean E, Vigliani E, eds. Ergonomic Aspects of Visual Display Terminals. London, England: Taylor & Francis; 1980:101-114.
(25.) Wright P, Jansen C, Wyatt JC. How to limit clinical errors in interpretation of data. Lancet. 1998; 352(9139):1539-1543.
(26.) Habekost T, Starrfelt R. Visual attention capacity: a review of TVA-based patient studies. Scand J Psychol. 2009; 50(1):23-32.
(27.) Jonides J, Lewis RL, Nee DE, Lustig CA, Berman MG, Moore KS. The mind and brain of short-term memory. Annu Rev Psychol. 2008; 59:193-224.
(28.) Sabih D, Sabih A, Sabih Q, Khan AN. Image perception and interpretation of abnormalities; can we believe our eyes?: can we do something about it? Insights Imaging. 2011; 2(1):47-55.
(29.) Dyson MC, Kipping GJ. The legibility of screen formats: are three columns better than one? Comput Graph. 1997; 21(6):703-712.
(30.) Dyson MC, Haselgrove M. The influence of reading speed and line length on the effectiveness of reading from screen. Int J Hum Comput Stud. 2001; 54(4):585-612.
(31.) De Bruijn D, De Mul S, Van Oostendorp H. The influence of screen size and text layout on the study of text. Behav Inform Technol. 1992; 11(2):71-78.
(32.) Dyson MC, Kipping GJ. The effects of line length and method of movement on patterns of reading from screen. Visible Lang. 1998; 32(2):150-181.
(33.) Degani A, Wiener EL. Human Factors of Flight-Deck Checklists: The Normal Checklist. Mountain View, CA: Ames Research Center; 1990. National Aeronautics and Space Administration Contract Report NCC2-377.
(34.) Degani A, Wiener EL. On the Design of Flight-Deck Procedures. Mountain View, CA: Ames Research Center; 1994. National Aeronautics and Space Administration Contract Report 177642.
(35.) Atkinson RC, Shiffrin RM. Human memory: a proposed system and its control processes. In: Spence KW, Spence JT, eds. The Psychology of Learning and Motivation: Advances in Research and Theory. Vol. 2. New York, NY: Academic Press; 1968:89-195.
(36.) Baddeley AD, Hitch G. The recency effect: implicit learning with explicit retrieval? Mem Cognit. 1993; 21(2):146-155.
(37.) Rogers Y, Sharp H, Preece J. Interaction Design: Beyond Human Computer Interaction. London, England: John Wiley & Sons; 2011.
(38.) Drury CG, Paramore B, Van Cott HP, Grey SM, Corlett EN. Task analysis. In: Salvendy G, ed. Handbook of Human Factors and Ergonomics. New York, NY: John Wiley & Sons; 1982:371-399.
(39.) Idowu MO, Wiles A, Wan W, Wilkinson DS, Powers CN. Equivocal or ambiguous terminologies in pathology: focus of continuous quality improvement? Am J Surg Pathol. 2013; 37(11):1722-1727.
(40.) Lubin IM, McGovern MM, Gibson Z, et al. Clinician perspectives about molecular genetic testing for heritable conditions and development of a clinician-friendly laboratory report. J Mol Diagn. 2009; 11(2):162-171.
(41.) Burstein B, Bank L, Jarvik LF. Sex differences in cognitive functioning: evidence, determinants, implications. Hum Dev. 1980; 23(5):289-313.
(42.) Foucar E. 'Individuality' in the specialty of surgical pathology: self-expression or just another source of diagnostic error? Am J Surg Pathol. 2000; 24(11):1573-1576.
(43.) Galloway M, Taiyeb T. The interpretation of phrases used to describe uncertainty in pathology reports. Patholog Res Int. 2011:656079. doi:10.4061/2011/656079.
(44.) Ruby SG. Clinician interpretation of pathology reports: confusion or comprehension? Arch Pathol Lab Med. 2000; 124(7):943-944.
(45.) Dawood S. Pathology vs medical oncology: the crucial exchange of accurate information. Br J Cancer. 2013; 108(4):745-747.
(46.) Attanoos RL, Bull AD, Douglas-Jones AG, Fligelstone LJ, Semararo D. Phraseology in pathology reports: a comparative study of interpretation among pathologists and surgeons. J Clin Pathol. 1996; 49(1):79-81.
(47.) Idowu MO, Bekeris LG, Raab S, Ruby SG, Nakhleh RE. Adequacy of surgical pathology reporting of cancer: a College of American Pathologists Q-Probes study of 86 institutions. Arch Pathol Lab Med. 2010; 134(7):969-974.
(48.) Messenger DE, McLeod RS, Kirsch R. What impact has the introduction of a synoptic report for rectal cancer had on reporting outcomes for specialist gastrointestinal and non-gastrointestinal pathologists? Arch Pathol Lab Med. 2011; 135(11):1471-1475.
(49.) Mohanty SK, Piccoli AL, Devine LJ, et al. Synoptic tool for reporting of hematological and lymphoid neoplasms based on World Health Organization classification and College of American Pathologists checklist. BMC Cancer. 2007; 7:144.
(50.) Murari M, Pandey R. A synoptic reporting system for bone marrow aspiration and core biopsy specimens. Arch Pathol Lab Med. 2006; 130(12):1825-1829.
(51.) Yunker WK, Matthews TW, Dort JC. Making the most of your pathology: standardized histopathology reporting in head and neck cancer. J Otolaryngol Head Neck Surg. 2008; 37(1):48-55.
(52.) Edge SB, Byrd DR, Compton CC, Fritz AG, Greene FL, Trotti A, eds. AJCC Cancer Staging Manual. 7th ed. New York, NY: Springer; 2010.
(53.) Ellis DW, Srigley J. Does standardised structured reporting contribute to quality in diagnostic pathology? The importance of evidence-based datasets. Virchows Arch. 2016; 468(1):51-59.
Leah B. Strickland-Marmol, MD; Carlos A. Muro-Cacho, MD; Scott D. Barnett, PhD; Matthew R. Banas, BSIE; Philip R. Foulis, MD, MPH
Accepted for publication September 25, 2015. From the Departments of Pathology and Laboratory Medicine (Drs Strickland-Marmol, Muro-Cacho, and Foulis), Systems Redesign (Mr Banas), and the Health Services Research & Development Center of Innovation on Disability and Rehabilitation Research (Dr Barnett), James A. Haley Veterans' Hospital, Tampa, Florida.
The authors have no relevant financial interest in the products or companies described in this article.
Reprints: Philip R. Foulis, MD, MPH, Department of Pathology and Laboratory Medicine, James A. Haley Veterans' Hospital (673), 13000 Bruce B. Downs Blvd, Room 1D-172b (VA 113), Tampa, FL 33612-4798 (e-mail: Philip.Foulis@va.gov).
Caption: Figure 1. One of the formats (format G) with components defined by the College of American Pathologists. A box with selected information is provided at the top. Abbreviation: RDE, required data element.
Caption: Figure 2. Example of a sheet with 4 randomly generated formats, given to participants for aesthetics analysis. The participants recorded their preference scores in the boxes using a Likert scale (1, most pleasing; 4, least pleasing).
Caption: Figure 3. Aesthetics. Y-axis: Likert scale of results (1, most pleasing; 4, least pleasing). X-axis: The columned formats (C and E) were preferred to the left justified (D and F) formats (P < .001).
Table 1. Characteristics of the 8 Formats (A to H) Used in the Study

Format Columned Justified Capitalization Box
A X RDE X
B X Response X
C X Response
D X Response
E X RDE
F X RDE
G X RDE X
H X Response X

Abbreviation: RDE, required data element.

Table 2. Questions Used in the Accuracy Component of the Study

1. What is the tumor site?
2. How many lymph nodes are involved?
3. What is the tumor size?
4. Were the internal iliac lymph nodes sampled?
5. Is there perineural invasion?
6. What is the p53 status?
7. Were the obturator lymph nodes sampled?
8. What is the tumor grade?
9. What is the status of the fimbria?
10. How many lymph nodes were examined?
11. What is the pathologic stage?

Table 3. Individual Item (Question) Characteristics (a)

Correct Score, Mean (SD)
Question      Columned        Justified       P Value
1             0.96 (0.19)     0.92 (0.28)     .05
2             0.86 (0.35)     0.88 (0.33)     .56
3             1.00 (0.00)     1.00 (0.00)     NA
4             0.86 (0.35)     0.84 (0.37)     .58
5             0.75 (0.44)     0.66 (0.48)     .04
6             0.99 (0.10)     0.98 (0.15)     .28
7             0.96 (0.19)     0.98 (0.14)     .32
8             1.00 (0.07)     0.99 (0.12)     .33
9             0.95 (0.21)     0.91 (0.29)     .06
10            1.00 (0.07)     0.97 (0.16)     .06
11            0.99 (0.07)     0.98 (0.14)     .20
Total score   0.94 (7.98)     0.92 (10.40)    .02

Time (s) to Respond, Mean (SD)
Question      Columned        Justified       P Value
1             16.13 (9.15)    17.63 (11.10)   .13
2             18.78 (11.98)   18.25 (12.21)   .66
3             11.49 (4.54)    11.77 (5.75)    .58
4             19.40 (10.09)   19.92 (9.33)    .59
5             32.33 (10.45)   33.41 (11.93)   .33
6             11.65 (7.58)    11.77 (7.32)    .89
7             13.77 (10.26)   13.26 (8.88)    .60
8             13.06 (5.77)    14.39 (7.24)    .04
9             20.52 (12.23)   21.78 (12.84)   .31
10            12.75 (6.81)    13.16 (6.72)    .53
11            14.09 (7.66)    14.81 (7.73)    .35
Total score   92.49 (27.61)   96.34 (30.07)   .17

Abbreviation: NA, not applicable.
(a) Participants with >2 missing items were excluded from the total score calculation. The range is from 0 to 1. Columned formats performed better in accuracy (P = .02) and, specifically, better in question 5 (is there perineural invasion?) (P = .04).

Table 4. Item Difficulty Index

Question (a)                        1           2           3           4
Columned                            (207)       (207)       (208)       (202)
  Proportion correct, %             199 (96)    178 (86)    208 (100)   174 (86)
  Proportion incorrect, %           8 (4)       29 (14)     0 (0)       28 (14)
Justified                           (215)       (215)       (217)       (215)
  Proportion correct, %             198 (92)    189 (88)    217 (100)   181 (84)
  Proportion incorrect, %           17 (8)      26 (12)     0 (0)       34 (16)
Discrimination index (b)            4           2           ... (c)     2
Maximum discrimination              12          26          ...         30
Discriminating efficiency (d), %    33          8           NA          7

Question (a)                        5           6           7           8
Columned                            (201)       (203)       (197)       (207)
  Proportion correct, %             151 (75)    201 (99)    189 (96)    207 (100)
  Proportion incorrect, %           50 (25)     2 (1)       8 (4)       0 (0)
Justified                           (206)       (212)       (206)       (217)
  Proportion correct, %             136 (66)    208 (98)    202 (98)    215 (99)
  Proportion incorrect, %           70 (34)     4 (2)       4 (2)       2 (1)
Discrimination index (b)            9           1           2           1
Maximum discrimination              59          3           6           1
Discriminating efficiency (d), %    15          33          33          100

Question (a)                        9           10          11
Columned                            (197)       (205)       (198)
  Proportion correct, %             187 (95)    205 (100)   196 (99)
  Proportion incorrect, %           10 (5)      0 (0)       2 (1)
Justified                           (216)       (217)       (209)
  Proportion correct, %             197 (91)    210 (97)    205 (98)
  Proportion incorrect, %           19 (9)      7 (3)       4 (2)
Discrimination index (b)            5           2           1
Maximum discrimination              14          3           3
Discriminating efficiency (d), %    36          67          33

Abbreviation: NA, not applicable.
(a) Number of participants who answered a given question: 1, 422; 2, 422; 3, 425; 4, 417; 5, 407; 6, 415; 7, 403; 8, 424; 9, 413; 10, 422; 11, 407.
(b) The discrimination index was scored as columned versus left justified for all questions, except questions 2 and 7, which were scored as justified versus columned to facilitate positive indices. The threshold was set at >0.45.
(c) Since question 3 had 100% correctly answered, the discrimination index and maximum discrimination cannot be calculated.
(d) Discriminating efficiency = Discrimination index/Maximum discrimination × 100%.

Figure 4. Accuracy. Y-axis: Percentages of correct (blue) and incorrect (red) answers for the 11 questions (question [Q]1 to Q11) (X axis). Unanswered questions are not included.

Question   Correct   Incorrect
Q1         91.9      6.3
Q2         86.3      11.2
Q3         98.6      0.2
Q4         81.9      13.7
Q5         65.8      28.8
Q6         94.9      1.4
Q7         84.4      9.3
Q8         96.5      0.9
Q9         90.2      6.0
Q10        82.6      1.6
Q11        93.5      1.2

Note: Table made from bar graph.

Figure 5. Example of proposed formats designed according to the results from this study. A, Fallopian tube cancer. B, Colon cancer. The Web posting date is included to indicate which edition of the protocol was used.

A
Carcinoma of the Colon (Protocol version October 2013)
PATHOLOGIC STAGING:            pT3 N1 M1
PRIMARY TUMOR
Site:                          LEFT FALLOPIAN TUBE
Location:                      FIMBRIA, AMPULLA
Histologic type:               SEROUS CARCINOMA
Histologic grade:              G2: MODERATELY DIFFERENTIATED
Greatest dimension:            3.0 cm
Integrity:                     RUPTURED
Relationship to ovary:         FUSED
Status of fimbriated end:      CLOSED
Microscopic extension:         UTERUS
Lymph-vascular invasion:       NOT IDENTIFIED
METASTASES
Lymph Nodes (total):           EXAMINED: 13   POSITIVE: 5
  Common iliac:                EXAMINED: 2    POSITIVE: 1
  External iliac:              EXAMINED: 3    POSITIVE: 2
  Obturator:                   EXAMINED: 4    POSITIVE: 0
  Inguinal:                    EXAMINED: 4    POSITIVE: 2
Distant Metastases:            LIVER
ADDITIONAL INFORMATION
Immunohistochemistry:          p53 POSITIVE
Other Findings:                NONE
SPECIMENS:                     UTERUS
                               FALLOPIAN TUBES
                               OVARIES
                               LYMPH NODES
                               LIVER BIOPSY

B
Carcinoma of the Colon (Protocol version October 2013)
PATHOLOGIC STAGING:            pT3 N2a M1a
PRIMARY TUMOR
Site:                          CECUM
Histologic type:               ADENOCARCINOMA
Histologic grade:              LOW GRADE
Greatest Dimension:            4.5 cm
Macroscopic tumor perforation: NOT IDENTIFIED
Lymph-vascular invasion:       NOT IDENTIFIED
Perineural invasion:           NOT IDENTIFIED
Tumor deposits:                NOT IDENTIFIED
MARGINS
Proximal:                      NEGATIVE FOR INVASIVE CARCINOMA
                               NEGATIVE FOR DYSPLASIA
Distal:                        NEGATIVE FOR INVASIVE CARCINOMA
                               NEGATIVE FOR DYSPLASIA
Mesenteric:                    NEGATIVE FOR INVASIVE CARCINOMA
Closest to invasive carcinoma: MESENTERIC (3.5 cm)
METASTASES
Lymph nodes:                   EXAMINED: 27   POSITIVE: 6
Distant metastases:            LIVER
ADDITIONAL INFORMATION:        CROHN DISEASE
SPECIMENS:                     TERMINAL ILEUM
                               CECUM
                               APPENDIX
                               ASCENDING COLON
                               LIVER BIOPSY
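
Footnote (d) of Table 4 defines discriminating efficiency as the discrimination index divided by the maximum discrimination, expressed as a percentage. As a worked check of that arithmetic, the short Python sketch below recomputes the efficiency row from the index and maximum values printed in Table 4. The names `TABLE_4` and `discriminating_efficiency` are hypothetical, and rounding to the nearest whole percent is an assumption inferred from the whole-number values in the table.

```python
# (discrimination index, maximum discrimination) per question, copied from
# Table 4. Question 3 is omitted because neither value can be calculated.
TABLE_4 = {
    1: (4, 12), 2: (2, 26), 4: (2, 30), 5: (9, 59), 6: (1, 3),
    7: (2, 6), 8: (1, 1), 9: (5, 14), 10: (2, 3), 11: (1, 3),
}


def discriminating_efficiency(index: float, maximum: float) -> int:
    """Footnote (d): index / maximum x 100%, rounded to a whole percent."""
    return round(100 * index / maximum)


efficiency = {q: discriminating_efficiency(di, mx)
              for q, (di, mx) in TABLE_4.items()}
for question, value in efficiency.items():
    print(f"Question {question}: {value}%")
```

For question 5, for example, 9/59 × 100% is approximately 15%, which agrees with the value reported in the table.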