An examination of different performance outcomes in an analytical procedures task.
To improve auditor judgments it is first necessary to understand and evaluate the judgment process (Libby 1981; Biggs et al. 1988; Bedard and Biggs 1991a, 1991b), including determining what successful auditors do differently from those who are less successful. For example, Bedard and Biggs (1991a) suggest that examining the information acquisition and evaluation process of auditors with correct and incorrect judgments for a task with a criterion variable may enable the evaluation of the components of such judgments. Our study incorporates a seeded error that allows us to examine the potential causes of different performance outcomes in an analytical procedures task through concurrent examination of four stages of the analytical procedures process (hypothesis generation, information search, hypothesis evaluation, and a final judgment).
To examine the differences in the judgment processes of auditors who make correct judgments versus those who do not, we use a computerized research instrument that records 82 experienced auditors' initial hypothesis generation and likelihood estimates, the order in which information was selected, additional hypotheses generated, resulting changes in the likelihood of hypotheses generated, and a judgment of the most likely cause of an analytical procedures fluctuation. With respect to hypothesis generation, this study provides further evidence relating to the importance of having a correct initial hypothesis set. (1) Specifically, the study identifies the extent to which correct participants either generate the correct cause initially or during information search, as well as the extent to which incorrect participants were able to self-generate the correct cause but subsequently chose a different cause. To examine differences in information search between correct and incorrect auditors, we examine the number of audit tests selected, the amount of budgetary time utilized, the breadth of testing, the depth of testing, and the focus of the tests selected. Finally, to examine information evaluation, we examine changes in the likelihoods of the correct and inherited hypotheses after evaluating specific tests.
Our study extends prior research examining performance differences in auditors' analytical procedures processes. In particular, this study extends the findings of the Bedard and Biggs (1991a) verbal protocol study through the use of a computerized research instrument that allows participants to select and evaluate additional audit tests and evidence, beyond the financial statements and other background material provided in the case. This allows us to examine in which of the various stages of analytical procedures auditors make less-than-optimal judgments. In addition, the concurrent examination of the four stages of analytical procedures extends previous research, which either has examined these stages independently or has examined less than the entire process. An exception is Asare and Wright (2003) where, for a subset of their participants, the four stages of the process were examined concurrently. (2)
Increasingly, auditors are using analytical procedures to investigate unexpected fluctuations noted in a client's financial statements (Bell et al. 1997). The analytical procedures process has been described by Koonce (1993) as a diagnostic sequential and iterative process involving hypothesis generation, information search, hypothesis evaluation, and a final judgment. That is, auditors generate or inherit explanatory hypotheses relating to the noted fluctuation (hypothesis generation), select additional tests and view the results of these tests (information search), evaluate evidence to discriminate between the competing hypotheses (hypothesis evaluation), and make a final judgment as to the most likely cause of the noted fluctuation. These stages of the analytical procedures process and their relationship to our design (discussed in detail later) are shown in Figure 1.
[FIGURE 1 OMITTED]
Early medical research suggested that, for diagnostic tasks, a correct hypothesis set was critical for successfully identifying the correct cause of a patient's problem. This research noted that the failure to determine the correct cause was due to not ever considering the true explanation and not due to poor evaluation processes (Elstein et al. 1978). Incorrect physicians were found to be unable to disconfirm early hypotheses and physicians who did not initially test for a given health condition were unlikely to subsequently make a correct final diagnosis (Patel and Groen 1986). However, a correct final diagnosis was always made by physicians who, at any time, had considered the correct cause. Further, the development of additional hypotheses was found to be poor, even where evidence contradictory to the initial hypotheses was examined (Joseph and Patel 1990). The importance of the hypothesis generation stage noted in the medical research has meant that a considerable amount of subsequent audit research has been based on the premise that having a correct hypothesis set will inevitably lead to selection of the correct cause (Libby 1985; Libby and Frederick 1990; Heiman 1990; Bedard and Biggs 1991a; Ismail and Trotman 1995; Bierstaker et al. 1999).
The importance of hypothesis generation in an analytical procedures setting was illustrated by Bedard and Biggs (1991a) and Bierstaker et al. (1999) using verbal protocols. This research showed that auditor performance was inhibited by hypothesis generation, with incorrect participants unable to construct the correct cause, even with extensive decision processes, due to a fixation on an initial problem representation. This finding held even where the initial or focus explanation was rejected, leading Bedard and Biggs (1991a) to conclude that audit effectiveness might be impaired when the correct cause is not included in the initial hypothesis set.
However, there is some evidence to suggest that an initial hypothesis set containing the correct cause is neither a necessary nor sufficient condition for selecting the correct cause. Several researchers have suggested that the noted deficiencies in hypothesis generation might be overcome by the auditors' selection of appropriate tests (Asare et al. 1998; Bedard et al. 1998). Asare and Wright (2003) found that a correct initial hypothesis set led to superior performance by auditors in an analytical procedures task, when the auditors either received a balanced set of evidence (3) or selected a limited amount of evidence from a provided set of evidence options. They also noted that having a correct hypothesis set did not inevitably lead to a correct conclusion. Thus, contrary to medical research, not all auditors with the correct initial hypothesis set identified the correct cause, while some with an initial incorrect hypothesis set were able to identify the cause after information search and evaluation. These results suggest that inclusion of the correct cause in the initial hypothesis set is neither a necessary nor sufficient condition for selection of the correct most likely cause.
The current study tests the robustness of these audit findings in a setting where the auditors' set of explanatory hypotheses is comprised of a single inherited incorrect hypothesis and an unlimited set of self-generated hypotheses and where participants are able to select and evaluate as many of the 18 provided audit tests as considered necessary. Hypothesis 1a examines whether inclusion of the correct cause in the initial hypothesis set is a necessary condition for the selection of the correct most likely cause. If some auditors (a significantly greater number than zero) can generate the correct cause during information search, then this would show that inclusion of the correct cause in the initial set was not a necessary condition for selection of that cause as the most likely cause.
H1a: Auditors who do not self-generate the correct cause in their initial hypothesis set will be able to generate the correct cause during information search.
To examine whether inclusion of the correct cause in the initial hypothesis set (or later hypothesis set) is a sufficient condition for selecting the correct cause, we test the following hypothesis:
H1b: Generating the correct cause either initially or during information search does not necessarily lead to the selection of that cause as the most likely cause.
Since previous auditing research has indicated that having a correct hypothesis set does not inevitably lead to the correct cause being selected (Bedard and Biggs 1991a; Asare and Wright 2003), it is likely that differences will exist in the information search processes of participants able to and not able to select the correct cause after generating it. Information search processes are also likely to vary with the type of error made by incorrect participants: selecting the inherited explanation or selecting another non-error explanation. Asare and Wright (2003) provide some evidence of the impact of different information search strategies on performance. They demonstrate that a balanced evidence set can result in improvements in performance, but it does not fully compensate for having an initial incorrect hypothesis set. Differences in information search for correct and incorrect participants are addressed in the current study through the following hypothesis:
H2: There are differences in the information search between auditors who selected the correct cause and those who selected each of the two incorrect causes.
This hypothesis is examined by considering the specific audit tests selected by the auditors including:
(1) the number of audit tests selected;
(2) the amount of budget time utilized;
(3) the number of hypothesis categories examined (breadth of testing);
(4) the number of tests per hypothesis category (depth of testing); and
(5) the focus of audit tests selected (i.e., focus of initial test selected, and focus on specified tests).
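The breadth and depth measures listed above can be illustrated with a short sketch. It is not the instrument used in the study. The category labels and the `search_measures` helper are hypothetical names, and the test-to-category map is deliberately partial: it contains only the assignments stated in the text (Tests 3, 16, and 17 for the inherited explanation; Tests 7, 9, 11, and 18 for the correct error).

```python
from collections import Counter

# Partial map, limited to assignments stated in the text. The labels
# "inherited" and "correct_error" are illustrative, not the paper's wording.
TEST_CATEGORY = {
    3: "inherited", 16: "inherited", 17: "inherited",
    7: "correct_error", 9: "correct_error",
    11: "correct_error", 18: "correct_error",
}


def search_measures(selected_tests, test_category=TEST_CATEGORY):
    """Return (number of tests, breadth, depth) for one participant.

    breadth = number of distinct hypothesis categories examined
    depth   = number of tests selected within each category
    """
    depth = Counter(test_category[t] for t in selected_tests)
    return len(selected_tests), len(depth), dict(depth)


# Hypothetical participant who selected five tests in this order.
n_tests, breadth, depth = search_measures([3, 11, 16, 18, 7])
print(n_tests, breadth, depth)
```

Under these assumptions the hypothetical participant examines two categories, with two inherited-explanation tests and three correct-error tests.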
Few studies have examined the hypothesis evaluation stage of the analytical procedures process. An exception is the series of papers by Asare and Wright (1995, 1997a, 1997b), which found that auditors do not consider the full inferential implications of evidence. Although these studies do not investigate potential differences in hypothesis evaluation performance between auditors able to and not able to identify a correct explanatory cause, previous research suggests such differences exist. Bedard and Biggs (1991a) noted that in addition to hypothesis generation problems, incorrect auditors had problems with hypothesis evaluation since several of the auditors were unable to disconfirm their self-generated hypotheses even though disconfirming cues had already been acquired. It has been noted that one way to overcome this focus on initial hypotheses is to consider additional explanations, either inherited from the experimenter or self-generated (Heiman 1990), or to consider explicitly why an inherited cause may not be correct (Koonce 1992). Further, both Heiman-Hoffman et al. (1995) and Asare and Wright (2003) noted belief perseverance for an initial hypothesis, even after evidence to the contrary was received during the information search and hypothesis evaluation phases of analytical procedures. Of particular interest to this study is the finding by Asare and Wright (2003) that not all auditors inheriting the same correct hypothesis set and the same balanced set of evidence were able to select the correct cause. This finding suggests that there are differences in the evaluation of individual audit test results between correct and incorrect auditors.
In our study, we expect two specific differences in information evaluation. First, on reviewing the results of tests related to the inherited hypothesis, participants can increase, decrease, or not change their likelihood assessment that the inherited hypothesis is correct. Correct evaluation of the test results would decrease the likelihood of the inherited hypothesis. Consequently, we expect that correct participants would be more likely than incorrect inherited participants to decrease the likelihood of the inherited hypothesis after viewing each of these tests, but less likely to increase it. Second, after reviewing the results of the correct error tests, participants can increase, decrease, or keep constant the likelihood of the correct cause. Correct evaluation of the test results would increase the likelihood of the correct cause. We expect that correct participants are more likely than incorrect participants to increase the likelihood of the correct cause, but less likely to decrease it. These expectations are tested with the following hypotheses:
H3a: After reviewing the results of tests related to the inherited hypothesis, correct participants are more likely (less likely) to decrease (increase) their likelihood of the inherited hypothesis compared to incorrect inherited participants.
H3b: After reviewing the results of tests related to the correct error, correct participants are more likely (less likely) to increase (decrease) their likelihood of the correct cause compared to incorrect participants.
Support for H1a and H1b suggests that inclusion of the correct cause in the initial set is neither a necessary nor sufficient condition for selection of the correct cause as the most likely cause. Therefore, after hypothesizing the correct cause, there must be some other differences between participants who ultimately select/do not select this hypothesis as the most likely cause. We examine the information search and hypothesis evaluation processes for those participants who, at some stage, generate the correct cause to address the following research question.
RQ1: For auditors who self-generate the correct cause, are there differences in the information search and hypothesis evaluation processes of those who select the correct cause and those who do not?
Participants were 82 auditors (4) from one Australian Big 5 firm. In terms of general experience, the mean audit experience of the participants was 3.8 years (range 2 to 8 years). All but two participants had at least three years of audit experience. Two participants had two years of experience; 41 participants had three years of experience; 19 participants had four years of experience; 17 participants had five years of experience; four participants had six years of experience; one participant had eight years of experience. The participants' self-described level in the firm was supervisor (n = 28), senior (n = 21) and staff (n = 35). All but 10 (11.9 percent) of the participants had some manufacturing audit experience (mean number of manufacturing audits 5.89, range 0-30; mean number of manufacturing clients 2.77, range 0-15).
The case used in this study was adapted from Asare and Wright (1995, 1997b). After pilot testing, amendments to the original Asare and Wright (1995, 1997b) case were made to (1) suit the Australian setting; (2) assist participants in achieving the correct outcome (the changes included adding clarifying information); and (3) add controls to the computer program. (5)
The auditors were provided with an instruction booklet containing background information, including a set of financial data with a seeded error, and results of analytical procedures, which included an unexpected material fluctuation in the gross margin ratio for a hypothetical manufacturing client.
The case materials told participants to focus on the material unexpected fluctuation in gross margin. Although there were multiple plausible explanations consistent with the initial problem information, the auditors were informed that only one cause was responsible for substantially all of the fluctuation, "accounting for 90 percent or more of the fluctuation." This approach is consistent with previous studies (e.g., Koonce 1992; Libby 1985; Bedard and Biggs 1991a; Asare and Wright 1997b).
The actual cause of the unexpected fluctuation was that the company incorrectly accounted for standard cost variances relating to labor and overhead. The entire variance was posted to the COGS account rather than being allocated across COGS and finished goods inventory, resulting in an overstatement of COGS and an understatement of finished goods inventory. All participants inherited the same plausible (that is, consistent with the available problem information) but incorrect non-error explanation suggesting that increased Japanese competition had led to shrinking margins. (6)
After participants read the background material, the remainder of the experiment was performed interactively using a personal computer. Participants generated hypotheses to explain the noted fluctuation, assessed the likelihood that each of the explanations accounted for substantially all of the fluctuation, then conducted a sequential and iterative information search under a realistic time budget constraint before making their judgment as to the most likely cause of the fluctuation. That is, participants selected the audit procedure they wished to conduct from a list of 18 possible audit tests (presented in a different random order for each participant), reviewed the results of the test, added hypotheses, if desired, and then assessed the likelihood for the complete set of potential explanations, as before. This process of selecting and reviewing individual tests and adding explanations continued until the auditors opted to nominate one of the potential explanations as the most likely cause of the noted fluctuation. Only after the auditors had examined as many tests as they considered necessary did they nominate their most likely cause. Finally, all auditors were asked several debriefing questions relating to their previous work experience. These steps are shown in Figure 1. On average, the experimental task took 60 minutes to complete.
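One way to picture the record such a procedure produces is sketched below. This is an illustrative data structure only, not the actual research instrument: the class names, field names, hypothesis labels, and likelihood values are all hypothetical, and the likelihood scale is an assumption. The helper shows how a log of this shape would let a researcher tell whether a cause was generated initially, during information search, or never.

```python
from dataclasses import dataclass, field


@dataclass
class Step:
    """One pass of the search loop: select a test, review it, reassess."""
    test_id: int              # which of the 18 audit tests was selected
    added_hypotheses: list    # hypotheses added after viewing the result
    likelihoods: dict         # hypothesis -> reassessed likelihood


@dataclass
class Session:
    """Hypothetical log for one participant."""
    initial_hypotheses: list
    initial_likelihoods: dict
    steps: list = field(default_factory=list)
    final_cause: str = ""


def when_generated(session, cause):
    """Return 'initial', 'during_search', or 'never' for a given cause."""
    if cause in session.initial_hypotheses:
        return "initial"
    for step in session.steps:
        if cause in step.added_hypotheses:
            return "during_search"
    return "never"


# Hypothetical participant: starts with the inherited explanation only,
# then adds another explanation after viewing Test 11.
s = Session(
    initial_hypotheses=["inherited"],
    initial_likelihoods={"inherited": 0.8},
)
s.steps.append(Step(11, ["variance_error"], {"inherited": 0.4, "variance_error": 0.5}))
s.final_cause = "variance_error"

print(when_generated(s, "variance_error"))  # during_search
```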
The provided list of 18 tests (see Exhibit 1) and associated test results covered six possible categories of hypotheses: sales mix or price change, inherited non-error, general related to gross margin changes, sales error, inventory error, and correct error. These categories and the related tests are described in Table 3. The categories are the same as those used by Asare and Wright (1997a, 1997b, 2003). Three of the tests (Tests 3, 16, and 17) related directly to the inherited explanation (i.e., increased Japanese competition leading to shrinking margins), while one test (Test 13) indirectly related to that explanation by testing for changes in sales prices. Selection and correct interpretation of Tests 3, 16, or 13 should have allowed participants to disconfirm the inherited explanation. (7)
List of Audit Tests
1. Prepare an analysis of gross margin per unit for the current year as compared to the prior year.
2. Inspect inventories or test perpetual records, making test counts and noting damaged, obsolete, and overstocked items.
3. Gather comparative financial data of clients in the same industry and determine whether their gross margins are declining.
4. Compare product sales mix by month with the previous year.
5. Test sales cut off at balance sheet date to ascertain the propriety of the corresponding entries.
6. Test raw material pricing by reference to perpetual inventory records and to vendor invoices.
7. Compare standard costs from the current year to the prior year.
8. Examine vendors' invoices and determine whether inventory in transit (FOB destination) has been improperly included in current year's inventory.
9. By product line, compare the percentage of material, labor, and overhead costs in ending inventory to that in cost of sales.
10. Determine proper cut-off by examination of transactions around inventory date, using information gathered during the physical examination.
11. Perform a review of the standard costs as compared to actual costs through price testing of raw materials and calculation of actual labor and overhead costs incurred.
12. Trace shipping documents to resultant sales invoices and entry into the sales journal to verify that sales transactions are recorded.
13. Prepare an analysis comparing the average sales price per unit by product line for the two-year period.
14. Compare sales prices in selected transactions to price lists and determine the extent of sales discounts granted.
15. Prepare an analysis of gross margin by month by product line as compared to the prior year.
16. Consult with the marketing manager about the extent of Japanese pricing competition and marketing strategies to deal with the situation.
17. Examine industry trade journals and economic indicators to determine the trends and extent of Japanese competition.
18. Compare standard cost variance accounts with previous year.
Coding of Data
The nominated most likely cause for each subject was coded into one of four performance outcomes: correct (i.e., the actual seeded error, including overstatement of COGS or understatement of finished goods inventory (8)), incorrect inherited (i.e., the inherited explanation), incorrect self-generated error (9) (i.e., an error other than a correct explanation), and incorrect self-generated non-error (i.e., a non-error other than the inherited non-error). In addition, consistent with the audit test categories used by Asare et al. (1998), the focus of the tests examined by the auditors was coded according to whether they related to the correct seeded explanation, the inherited explanation, or to other plausible but incorrect explanations. All classifications were performed by one of the authors and an independent coder. After performing the coding task independently, the two coders met and resolved all differences.
Twenty-four (29.3 percent) of the auditors proposed the correct seeded error cause. The remaining most likely causes were incorrect, with 28 (34.1 percent) auditors selecting the inherited explanation and 30 (36.6 percent) selecting an incorrect self-generated non-error explanation as their most likely cause.
Hypotheses 1a and 1b examine whether a correct initial hypothesis set is either a necessary or sufficient condition for selection of the correct cause. Table 1 shows that a large number of participants (n = 63) generated the correct cause during the task. While most (n = 50) generated the correct cause in the initial set of hypotheses, 13 participants were able to generate the correct cause during information search (13 is significantly different from zero, binomial test, p = 0.000). That is, the correct cause can be generated after the initial hypothesis set; initial generation of the correct cause is therefore not a necessary condition for selection of the correct cause, and H1a is supported. Also, including the correct cause in the hypothesis set did not ensure that it would be selected as the most likely cause, as 39 (61.9 percent) of the 63 participants considering the correct cause ultimately did not select it (39 is significantly different from zero, binomial test, p = 0.000). Thus, H1b is supported. These results indicate that generation of the correct cause in the initial hypothesis set is neither a necessary nor sufficient condition for selection of the correct cause. (10)
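The counts behind H1a and H1b fit together arithmetically, as the following sketch shows. It uses only figures reported in the text (82 participants, 50 generating the correct cause initially, 13 during search, 24 selecting it); the variable names and the `pct` helper are illustrative.

```python
# Counts reported in the text.
n_participants = 82
generated_initially = 50
generated_during_search = 13
selected_correct = 24

# Derived counts.
generated_at_all = generated_initially + generated_during_search
never_generated = n_participants - generated_at_all
generated_not_selected = generated_at_all - selected_correct


def pct(part, whole):
    """Percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)


print(generated_at_all)                                   # 63
print(never_generated)                                    # 19
print(generated_not_selected)                             # 39
print(pct(generated_not_selected, generated_at_all))      # 61.9
print(pct(selected_correct, n_participants))              # 29.3
```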
Hypothesis 2 addresses whether there are differences in the information searches of correct and incorrect participants. Table 2 presents descriptive statistics for the following measures of information search examined: the number of audit tests selected, the budgeted hours utilized, number of hypothesis categories tested (breadth of testing), and the number of tests per hypothesis category (depth of testing) by participants with different performance outcomes. Results of three separate one-way ANOVAs, with performance outcome as the independent variable, and number of tests selected, number of audit hours utilized, and number of hypothesis categories examined as the dependent variables indicated no significant differences in any of these three measures between the performance outcome groups (F = 0.095, p = 0.909; F = 0.269, p = 0.765; F = 0.624, p = 0.539, respectively).
One-way ANOVAs examining the depth of hypothesis testing (measured as the number of tests selected from each of the six hypothesis categories) revealed significant differences between participants with different performance outcomes (F = 0.745, p = 0.046). Participants selecting the correct cause and those selecting the incorrect self-generated non-error both selected fewer Category 2 tests (relating to the inherited explanation) than participants selecting the incorrect inherited explanation (p = 0.000; p = 0.003; means of 0.63, 1.89, and 1.23, respectively). Compared to participants selecting the inherited explanation, participants selecting either the correct cause or an incorrect self-generated non-error cause also selected more Category 6 tests (relating to the correct error) (p = 0.011; p = 0.007; means of 2.00, 1.21, and 2.00, respectively). The only difference in information search between correct participants and incorrect self-generated non-error participants is that the correct participants selected significantly more inventory tests on average (1.04 versus 0.37, p = 0.023).
Table 3 presents descriptive results for the number of times each of the 18 audit tests was selected by participants in the three performance outcome groups. Chi-square tests of proportions indicated that there are several key differences in the frequency of the selection of each test between the performance outcome groups: very few participants selecting the correct cause selected two of the tests relating to the inherited explanation (Tests 3 and 17); consistent with the findings noted for the depth of hypothesis testing, significantly more of the participants selecting the inherited explanation selected two tests directly related to the inherited explanation (Tests 3 and 17) than participants selecting either the correct or an incorrect self-generated non-error explanation; significantly fewer of the participants selecting the inherited explanation selected two tests directly examining the correct error explanation (Tests 11 and 18) compared to participants selecting the correct cause or an incorrect self-generated non-error explanation; and significantly fewer participants selecting an incorrect self-generated non-error cause chose Test 10 compared to the other two outcome groups.
To test Hypothesis 3a we examine changes in the likelihood of the inherited explanation for both correct and incorrect inherited participants after reviewing the results of four tests related to the inherited hypothesis (Tests 3, 13, 16, and 17). After reading the results of these tests, correct participants were much more likely to interpret the results correctly (decrease the likelihood) than incorrect inherited participants. Correct participants selected 18 of these tests and the impact on the likelihood of the inherited hypothesis was: decrease: 11; no change: 4; increase: 3. In contrast, incorrect inherited participants selected 55 of these tests and the impact on the likelihood of the inherited hypothesis was: decrease: 4; no change: 15; increase: 36 (Chi-squared test comparing the correct and incorrect inherited groups: χ² = 25.31; p = 0.000). The results provide support for H3a and show that not only did incorrect inherited participants choose these tests more often, but they also generally interpreted them incorrectly. The incorrect inherited participants generally increased the likelihood of the inherited hypothesis after viewing these tests, but few participants in the correct group did so.
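The reported statistic can be checked from the counts given above. The text does not say which chi-square variant was used; the sketch below assumes a plain Pearson chi-square on the 2 x 3 table of likelihood changes (decrease, no change, increase), which reproduces the reported value. The function name is illustrative. With 2 degrees of freedom the upper-tail probability of the chi-square distribution has the closed form exp(-x/2), so no statistics library is needed.

```python
import math

# Rows: outcome group; columns: decrease, no change, increase.
observed = {
    "correct": [11, 4, 3],
    "incorrect_inherited": [4, 15, 36],
}


def pearson_chi_square(rows):
    """Pearson chi-square statistic and degrees of freedom for a table."""
    row_totals = [sum(r) for r in rows]
    col_totals = [sum(c) for c in zip(*rows)]
    n = sum(row_totals)
    stat = 0.0
    for row, row_total in zip(rows, row_totals):
        for obs, col_total in zip(row, col_totals):
            expected = row_total * col_total / n
            stat += (obs - expected) ** 2 / expected
    df = (len(rows) - 1) * (len(col_totals) - 1)
    return stat, df


stat, df = pearson_chi_square(list(observed.values()))

# Valid only because df == 2: the chi-square survival function is exp(-x/2).
p_value = math.exp(-stat / 2)

print(round(stat, 2), df)  # 25.31 2
print(p_value < 0.001)     # True
```

The p-value is far below 0.001, consistent with the reported p = 0.000 at three decimal places.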
To test H3b, Table 4 shows the impact of examining the results of each of the four Category 6 tests (correct tests) on the likelihood of the correct cause. Overall results for the four tests are shown at the bottom of Table 4 (Chi-squared tests comparing the correct and the combined incorrect groups: χ² = 56.10; p = 0.000). These results provide support for H3b. Most correct participants increased the likelihood of the correct cause, while few incorrect participants did so. In fact, in the bottom section of Table 4 where we combine Tests 7, 9, 11, and 18, we find that the likelihood of the correct cause was decreased 28 times after seeing the results of these tests. Only four of these 28 decreases were by a correct participant (three of these related to Test 7). On the other hand, the likelihood was increased 47 times and 35 of these were by correct participants. Additional insights can also be gained by considering the impact of each of the four individual tests. Table 4 shows the differences in how the correct participants evaluated each of these tests compared to the other two outcome groups. For Test 7, there is little difference between the three outcome groups. However, for Tests 9, 11, and 18, it is clear that most participants in the correct group increased the likelihood of the correct cause after reviewing these test results, whereas most participants in the other two outcome groups did not.
Additional analysis shows that 50 percent (that is, 15) of the incorrect self-generated non-error group actually finished with a Category 6 test. It would be expected that the results of these Category 6 tests would lead participants to decrease the likelihood of their non-error hypotheses and increase the likelihood of the correct cause. However, of these 15 participants, 10 actually increased the likelihood of their non-error by an average of 0.26, two participants added a new non-error hypothesis, and three participants made no change. Again, these results suggest that poor hypothesis evaluation was an important reason for not identifying the correct cause.
To answer RQ1, further analysis was undertaken to investigate differences in the information search and hypothesis evaluation processes between the 24 participants (identified in Table 1) who ultimately were able to select the correct cause after self-generating it, and the 39 participants (see Table 1) who were unable to correctly identify that cause as the correct cause after self-generating it. Table 5 provides descriptive differences between these two groups. With respect to information search, no differences were noted between participants selecting the correct cause, the incorrect inherited cause or an incorrect self-generated non-error cause for the number of tests selected, the budget hours utilized, or the breadth of hypothesis testing (number of hypothesis categories tested). However, several differences were noted in the depth of testing (number of tests per hypothesis category). Participants selecting the correct cause selected significantly fewer tests relating to the inherited explanation (Category 2) than did participants selecting the incorrect inherited explanation and participants selecting an incorrect self-generated non-error cause (F = 21.900, p = 0.000; means of 0.63, 1.95, and 1.26, respectively). The correct participants also selected significantly more tests relating to the correct cause (Category 6) than incorrect inherited participants (F = 3.141, p = 0.050; means of 2.00 and 1.35, respectively).
With respect to measures of performance outcome, both the final likelihood for the inherited explanation and the likelihood for participants' selected cause differed significantly between the groups (F = 153.379, p = 0.000; F = 8.264, p = 0.001). Post hoc tests revealed that participants selecting the incorrect inherited explanation had a significantly higher final likelihood for the inherited explanation than did the correct participants (p = 0.000; 84.00, 2.88) and the incorrect self-generated non-error participants (p = 0.000; 84.00, 14.21). Participants selecting an incorrect self-generated non-error had a significantly lower likelihood for their selected cause than the correct participants (p = 0.021; 54.74, 75.04) and the incorrect inherited participants (p = 0.001; 54.74, 84.00). While in the direction expected, the number of participants selecting a Category 6 test as the last test did not differ significantly between the correct and incorrect participants. No significant differences were noted between the groups for the time taken to complete the experiment.
DISCUSSION AND CONCLUSION
This study uses a realistic analytical procedures task in which participants had the opportunity to search for and obtain additional evidence. It jointly examines hypothesis generation, information search, hypothesis evaluation, and the final judgment stages of analytical procedures, and how these stages differ between auditors with different levels of performance.
The fact that 19 out of 82 participants (23 percent) never considered the correct cause indicates that hypothesis generation deficiencies are one of the major causes of poor performance on this task. Of the 63 participants who generated the correct cause during the task, most (n = 50) generated it in the initial set, while relatively few (n = 13) generated it during testing. While the small number generating it during testing is consistent with the medical literature described earlier, it does show that subsequent generation is possible and that generation of the correct cause in the initial hypothesis set is not a necessary condition for selection of the correct cause. However, it appears that if auditors are going to generate the correct cause, it is most likely to occur in their initial hypothesis set. It was also found that including the correct cause in the initial or subsequent hypothesis set did not ensure that it would be selected as the correct cause: only 24 (38 percent) of the 63 participants who included the correct cause in their hypothesis set at some stage ultimately selected it. These results indicate that, contrary to findings reported previously in medical (Elstein et al. 1978) and auditing (Bedard and Biggs 1991a) research, the inclusion of the correct cause in the auditors' initial hypothesis set is neither a necessary nor a sufficient condition for selection of the correct cause.
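The proportions quoted in this paragraph follow directly from the counts in Table 1. The short sketch below simply redoes that arithmetic; the variable names are illustrative, not taken from the study.

```python
# Counts taken from Table 1 (82 participants in total).
total_participants = 82
never_generated = 19               # never considered the correct cause
generated_initial = 50             # correct cause in the initial hypothesis set
generated_during_search = 13       # correct cause generated during information search
selected_correct_from_initial = 19
selected_correct_from_search = 5

generated = generated_initial + generated_during_search
selected_correct = selected_correct_from_initial + selected_correct_from_search


def pct(part, whole):
    """Percentage of `whole` represented by `part`."""
    return 100 * part / whole


print(f"Never generated the correct cause: {pct(never_generated, total_participants):.0f}%")
print(f"Generated it in the initial set:   {pct(generated_initial, generated):.0f}% of generators")
print(f"Generated it during search:        {pct(generated_during_search, generated):.0f}% of generators")
print(f"Selected it after generating it:   {pct(selected_correct, generated):.0f}% of generators")
```

The same helper can be applied within each generation route (19 of 50 versus 5 of 13) to compare how often the correct cause was selected once it had been generated.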
Incorrect participants were divided into two categories: those who incorrectly selected the inherited hypothesis and those who incorrectly selected another non-error. The results addressing information search indicate that there were several differences in the depth of testing for specific hypotheses, as well as in the frequency of selection of specific tests, between the correct and the incorrect inherited participants. The former chose significantly fewer of the inherited non-error tests and significantly more of the correct error tests. However, there were few differences in information search between the correct participants and the incorrect self-generated non-error participants, with both choosing on average the same number of correct error tests. It therefore appears that the differences between these two groups' performance are more likely to relate to information evaluation than to information search.
Differences were found in the hypothesis evaluation performance of the correct and incorrect participants. The incorrect inherited group not only chose more tests related to the inherited hypothesis and fewer correct error tests, but also generally had difficulty evaluating these tests. These participants generally increased their likelihood of the inherited explanation rather than reducing it, and they more frequently decreased the likelihood of the correct error hypothesis rather than increasing it. This was not the case for the correct participants, who generally interpreted the results of these tests correctly.
While there were few differences in information search between correct participants and incorrect self-generated non-error participants, the differences in hypothesis evaluation were significant. In particular, for the evaluation of the correct error tests, most participants in the correct group increased the likelihood of the correct cause. However, the incorrect self-generated non-error group, on average, more often decreased the likelihood of the correct cause. This poor performance in hypothesis evaluation led to the final incorrect conclusion.
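The direction of likelihood revisions described here can be summarized from the "Total Tests 7, 9, 11, 18" rows of Table 4. The sketch below is illustrative only: it omits the "Not applicable" rows (cases where the correct cause was not in the hypothesis set), it counts test evaluations rather than participants, and the names used are hypothetical.

```python
# Combined counts for Tests 7, 9, 11, and 18 from Table 4, by performance outcome.
# "Not applicable" rows are excluded (assumption made for this illustration).
revisions = {
    "Correct": {"Decreased": 4, "No change": 10, "Increased": 35},
    "Incorrect Inherited": {"Decreased": 12, "No change": 10, "Increased": 5},
    "Incorrect Non-Error": {"Decreased": 12, "No change": 18, "Increased": 7},
}


def direction_shares(counts):
    """Share of evaluated tests falling in each revision direction."""
    total = sum(counts.values())
    return {direction: count / total for direction, count in counts.items()}


shares = {group: direction_shares(counts) for group, counts in revisions.items()}

for group, group_shares in shares.items():
    summary = ", ".join(f"{d}: {s:.0%}" for d, s in group_shares.items())
    print(f"{group}: {summary}")
```

Read this way, the correct group raised the likelihood of the correct cause after most of the correct error tests it evaluated, whereas both incorrect groups lowered it more often than they raised it.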
In summary, the results of this study indicate that although the correct and incorrect participants selected a similar number of tests, and spent a similar amount of budget and experimental time, there were significant differences in hypothesis generation, information search, and hypothesis evaluation. These findings therefore lend support to the suggestion by Asare and Wright (2003) that not only is hypothesis generation important, but information search and hypothesis evaluation are also important sources of incorrect judgments.
Our conclusions should be considered in light of the study's limitations. First, the case involved an income understatement, whereas in practice auditors may see income overstatement as a higher-risk situation. To the extent that different information search and hypothesis evaluation strategies may be used to investigate under- and overstatement errors, our results may not be generalizable. Second, while we provided extensive background data, financial data, and numerous tests, participants were told to focus on the gross margin ratio. Future research could provide less guidance, include a wider variety of patterns of accounts related to the error, and incorporate the technology that has recently been adopted by firms. Third, the study did not include all diagnostic tests, and participants could limit the number of audit procedures conducted. In practice, under an audit program where auditors cannot cut off work without approval, and where they may have more time and incentives to follow up with additional work, overall performance on the task may be higher. Fourth, this study did not examine the effect of technology changes, in the form of computerized checklists and decision aids, on the performance of analytical procedures. This issue provides scope for future research. Finally, although this study provides considerable evidence of the ways in which information search and hypothesis evaluation differ between auditors with different performance outcomes, the research design used does not allow conclusions to be made regarding why these differences occur. Future research using alternative methods, such as protocol analysis, may provide evidence on the reasons for such differences.
TABLE 1
Descriptive Statistics for Participants Generating and Not Generating the Correct Error Cause
Number of Participants in Each Group and Their Performance Outcome

Participants Not Generating the Correct Error Cause

| | Total (n = 19) | Selected Incorrect Inherited (n = 8) | Selected Incorrect Non-Error (n = 11) |
|---|---|---|---|
| Generated in initial set of hypotheses | | | |
| Generated during information search | | | |
| Generated during the whole task | | | |
| Not generated during the whole task | 19 | 8 | 11 |

Participants Generating the Correct Error Cause

| | Total (n = 63) | Selected Correct Cause (n = 24) | Selected Incorrect Inherited (n = 20) | Selected Incorrect Non-Error (n = 19) |
|---|---|---|---|---|
| Generated in initial set of hypotheses | 50 | 19 | 17 | 14 |
| Generated during information search | 13 | 5 | 3 | 5 |
| Generated during the whole task | 63 | 24 | 20 | 19 |
| Not generated during the whole task | | | | |

TABLE 2
Descriptive Statistics for Information Search
Mean No. (std. dev.)

| Category | Correct (C) (n = 24) | Incorrect Inherited (I) (n = 28) | Incorrect Self-Generated Non-Error (N-E) (n = 30) | Total (n = 82) |
|---|---|---|---|---|
| Audit Tests Selected (a) | 5.71 (1.30) | 5.82 (.94) | 5.83 (1.15) | 5.83 (1.14) |
| Budget Hours Utilized (a) | 22.13 (5.36) | 21.43 (3.94) | 21.23 (4.49) | 21.73 (4.63) |
| Number of Hypothesis Categories Tested (a) | 3.96 (.86) | 4.14 (.80) | 3.87 (1.14) | 4.02 (.97) |
| 1 Sales Mix or Price Change (Tests 4, 13 and 14) | .67 (.76) | .61 (.57) | .97 (.72) | .79 (.69) |
| 2 Inherited Non-error (Tests 3, 16 and 17) | .63 (.58) | 1.89 (.74) (b) | 1.23 (.86) | 1.29 (.89) |
| 3 General Gross Margin Changes (Tests 1 and 15) | .67 (.56) | .46 (.51) | .80 (.55) | .67 (.57) |
| 4 Sales Error (Tests 5 and 17) | .71 (.55) | .71 (.46) | .47 (.68) | .64 (.59) |
| 5 Inventory Error (Tests 2, 6, 8, and 10) | 1.04 (1.00) | .93 (1.09) | .37 (.56) (c) | .76 (.93) |
| 6 Correct error (Tests 7, 9, 11, and 18) | 2.00 (.93) | 1.21 (.79) (d) | 2.00 (1.02) | 1.71 (1.01) |

(a) No significant differences in number of categories tested between the performance outcome groups (F = 0.624, p = 0.539).
(b) I significantly more than C: p = 0.000; and I significantly more than N-E: p = 0.003.
(c) N-E significantly less than C: p = 0.023; and N-E significantly less than I: p = 0.059.
(d) I significantly less than C: p = 0.011; and I significantly less than N-E: p = 0.007.

TABLE 3
Descriptive Statistics for Frequency of Selection for Each Audit Test by Performance Outcome
Number (Proportion)

| Test | Correct (n = 24) | Incorrect Inherited (n = 28) | Incorrect Self-Generated Non-Error (n = 30) | Total (n = 82) |
|---|---|---|---|---|
| Category 1: Sales Mix or Price Change Tests | | | | |
| 4 | 7 (.29) | 7 (.25) | 8 (.27) | 22 (.26) |
| 13 | 4 (.17) (a)(b) | 3 (.11) (a) | 11 (.37) (b) | 18 (.23) |
| 14 | 5 (.21) | 7 (.25) | 10 (.33) | 22 (.27) |
| Category 2: Inherited Non-Error Tests | | | | |
| 3 | 2 (.08) (a) | 20 (.75) (b) | 11 (.37) (c) | 33 (.43) |
| 16 | 12 (.50) | 19 (.68) | 19 (.63) | 50 (.61) |
| 17 | 0 (.00) (a) | 13 (.46) (b) | 7 (.23) (c) | 20 (.25) |
| Category 3: General Tests for Gross Margin Changes | | | | |
| 1 | 12 (.50) | 9 (.32) | 17 (.57) | 38 (.48) |
| 15 | 4 (.17) | 4 (.14) | 7 (.23) | 15 (.19) |
| Category 4: Sales Error Tests | | | | |
| 5 | 11 (.46) | 12 (.43) | 9 (.30) | 32 (.41) |
| 12 | 6 (.25) | 8 (.29) | 5 (.17) | 19 (.24) |
| Category 5: Inventory Error Tests | | | | |
| 2 | 7 (.29) | 10 (.36) | 3 (.10) | 21 (.25) |
| 6 | 4 (.17) | 3 (.11) | 2 (.07) | 9 (.11) |
| 8 | 5 (.21) | 7 (.25) | 4 (.13) | 16 (.19) |
| 10 | 9 (.37) (a) | 6 (.21) (a)(b) | 2 (.07) (b) | 17 (.21) |
| Category 6: Correct Error Tests | | | | |
| 7 | 12 (.50) | 13 (.46) | 15 (.50) | 40 (.49) |
| 9 | 9 (.37) | 9 (.32) | 9 (.30) | 27 (.33) |
| 11 | 15 (.62) (a) | 8 (.29) (b) | 18 (.57) (a) | 41 (.48) |
| 18 | 13 (.50) (a) | 4 (.14) (b) | 19 (.63) (a) | 36 (.42) |
| Total | 137 | 164 | 176 | 477 |

(a)(b)(c) Entries in the same row with different letters are significantly different. For example, for test 11 the incorrect inherited is significantly different than the other two columns, but there is no difference between the correct subjects and the incorrect non-error.

TABLE 4
Impact of Correct Error Tests on Likelihood of Correct Cause

| Impact on Likelihood for Correct Cause | Correct | Incorrect Inherited | Incorrect Self-Generated Non-Error |
|---|---|---|---|
| Test 7 selected: n = 40 | | | |
| Decreased: n = 6 | 3 | 2 | 1 |
| No change: n = 16 | 5 | 5 | 6 |
| Increased: n = 8 | 4 | 2 | 2 |
| Not applicable (a): n = 10 | | 4 | 6 |
| Test 9 selected: n = 27 | | | |
| Decreased: n = 4 | 0 | 3 | 1 |
| No change: n = 8 | 2 | 4 | 2 |
| Increased: n = 8 | 7 | 0 | 1 |
| Not applicable: n = 7 | | 2 | 5 |
| Test 11 selected: n = 41 | | | |
| Decreased: n = 10 | 1 | 5 | 4 |
| No change: n = 9 | 2 | 1 | 6 |
| Increased: n = 15 | 12 | 1 | 2 |
| Not applicable: n = 7 | | 1 | 6 |
| Test 18 selected: n = 36 | | | |
| Decreased: n = 8 | 0 | 2 | 6 |
| No change: n = 5 | 1 | 0 | 4 |
| Increased: n = 16 | 12 | 2 | 2 |
| Not applicable: n = 7 | | 0 | 7 |
| Total Tests 7, 9, 11, 18: n = 144 | | | |
| Decreased: n = 28 | 4 | 12 | 12 |
| No change: n = 38 | 10 | 10 | 18 |
| Increased: n = 47 | 35 | 5 | 7 |
| Not applicable: n = 31 | | 7 | 24 |

Chi-square for total tests combining incorrect groups: chi-square = 56.10; p = 0.000.
(a) Some incorrect participants never included the correct cause in their hypothesis set and therefore we could not examine the impact on the likelihood of the correct cause.
TABLE 5
Participants Generating the Correct Error Cause

| | Selected Correct Cause (n = 24) | Selected Incorrect Inherited (n = 20) | Selected Incorrect Non-Error (n = 19) |
|---|---|---|---|
| Number of Tests Selected | 5.71 | 5.85 | 6.05 |
| Budgeted Hours Utilized | 22.13 | 21.50 | 22.42 |
| Breadth of Hypothesis Testing | 3.96 | 4.15 | 4.21 |
| Depth of Testing Differences | | | |
| Category 2 Tests (1) | 0.63 (a) | 1.95 (b) | 1.26 (c) |
| Category 6 Tests (2) | 2.00 (d) | 1.35 (e) | 1.95 (d)(e) |
| Final Likelihood of Inherited Explanation (3) | 2.88 (a) | 84.00 (b) | 14.21 (a) |
| Final Likelihood of Selected Cause (4) | 75.04 (a) | 84.00 (a) | 54.74 (b) |
| Time to Complete Experiment (minutes) | 42.94 | 40.61 | 45.65 |
| Number of Participants Choosing a Category 6 Test as Last Test (5) | 13 | 5 | 9 |

(1) F = 21.900, p = 0.000
(2) F = 3.141, p = 0.050
(3) F = 153.379, p = 0.000
(4) F = 8.264, p = 0.001
(5) chi-square = 4.016, p = 0.134
(a)(b)(c) Entries in same row with different letters are significantly different.
(d)(e) Entries in same row with different letters are marginally significantly different.
The authors thank Jean Bedard, Stan Biggs, Mario Maletta, Roger Simnett, Arnie Wright, Bill Messier (editor), and two anonymous reviewers for their helpful comments. We also thank Steve Asare and Arnie Wright for access to their original research instrument.
(1) A correct hypothesis set is a set of hypotheses that includes the correct cause/error.
(2) Asare and Wright (2003) randomly assigned participants to one of six cells created by crossing two levels of information search with three levels of hypothesis set. Only one of these six cells looked at all four stages of the analytical procedures process. Their analysis compared the judgments in this cell with judgments in the other cells. Our study addresses the situation where all participants are involved in the four stages and we compare the performance in these phases of correct and incorrect participants.
(3) Asare and Wright (2003) use the term "balanced information set" to indicate that evidence was provided to the auditors for each of the provided hypotheses.
(4) Originally, there were 84 participants. Two of the participants proposed an incorrect self-generated error explanation. Given the small number in this group, the remainder of the analysis does not consider these two participants, concentrating on the other three groups described in the "Results" section (i.e., 82 participants).
(5) These controls aimed to ensure: all sections of the instrument were completed; no cause could be deleted after inclusion; amount of budget time left was clearly displayed after each test; and that the cause selected was one of those included in the hypothesis set.
(6) The inherited hypothesis was manipulated using a 2 x 2 between-participants design, varying the source (either client management or audit superior) and the timing of receipt of the inherited explanation (either prior to or after self-generation of explanations). These manipulations and the results examining them are included in a related study by one of the authors. These manipulations had neither a significant impact on the correctness of the final judgment nor any impact on our hypotheses.
(7) The results for these four tests did not provide clear confirmation of the inherited explanation. Results for Test 3 confirmed that there had been some decline in gross margins through the industry due to increased Japanese competition; however, the extent of the decline did not account for substantially all (defined in the research instrument as 90 percent or more) of the noted fluctuation. Results for Test 16 confirmed the existence of increased Japanese competition; however, that test also noted that the client's sales prices had been unchanged for the past three years, indicating that the inherited explanation did not account for the noted fluctuation in the current period. Results for Test 17 did confirm the presence of increased Japanese competition in the market; however, the test results provided no client-specific details to enable confirmation of that factor as the cause of the noted fluctuation for the client for the current time period. Finally, Test 13 disconfirmed the inherited explanation since it indicated there had been no sales price changes. Therefore, although Test 17 provided evidence generally indicative of increased Japanese competition, since no evidence relating to client-specific effects was provided by this test, it is reasonable to suggest that the auditors should have seen a need for further attention to this explanation via additional tests to determine the impact on the client's financial statements. In fact, every participant who chose Test 17 also chose at least one other test of the inherited explanation.
(8) The actual seeded error was that standard cost variances relating to labor and overhead were posted to the COGS account rather than being allocated across COGS and finished goods inventory. The result of this error was an overstatement of COGS and an understatement of finished goods inventory.
(9) As discussed in footnote 4, this latter category included only two participants, who were not included in the analysis.
(10) Neither the general experience (years and level in the firm) nor the domain experience (number of manufacturing audits and number of manufacturing clients) was found to differ significantly between the participants able to and not able to generate the correct error cause. These measures also did not differ significantly between participants able to and not able to select the correct error cause after generating it.
Asare, S. K., and A. Wright. 1995. Normative and substantive expertise in multiple hypothesis evaluation. Organizational Behavior and Human Decision Processes 64: 171-184.
--, and --. 1997a. Hypothesis revision strategies in conducting analytical procedures. Accounting, Organizations and Society 22 (8): 737-755.
--, and --. 1997b. Evaluation of competing hypotheses in auditing. Auditing: A Journal of Practice & Theory 16 (Spring): 1-13.
--, --, and S. Wright. 1998. Utilizing analytical procedures as substantive evidence: The impact of a client explanation on hypothesis testing. Advances in Accounting Behavioral Research: 13-31.
--, and --. 2003. A note on the interdependence of hypothesis generation and information search in conducting analytical procedures. Contemporary Accounting Research 20 (2): 235-251.
Bedard, J. C., and S. F. Biggs. 1991a. Pattern recognition, hypothesis generation and auditor performance in an analytical task. The Accounting Review 66 (July): 622-642.
--, and --. 1991b. The effect of domain-specific experience on evaluation of management representations in analytical procedures. Auditing: A Journal of Practice & Theory 10 (Supplement): 77-90.
--, --, and J. DePietro. 1998. The effects of hypothesis quality, client management explanations and industry experience on audit planning decisions. Advances in Accounting 16: 49-73.
Bell, T., F. Marrs, I. Solomon, and H. Thomas. 1997. Auditing Organizations Through a Strategic-Systems Lens. Montvale, NJ: KPMG Peat Marwick LLP.
Bierstaker, J. L., J. C. Bedard, and S. F. Biggs. 1999. The role of problem representation shifts in auditor decision processes in analytical procedures. Auditing: A Journal of Practice & Theory 18 (Spring): 18-36.
Biggs, S. F., T. Mock, and P. Watkins. 1988. Auditors' use of analytical review in audit program design. The Accounting Review 63 (January): 148-161.
Elstein, A. S., L. E. Shulman, and S. A. Sprafka. 1978. Medical Problem Solving: An Analysis of Clinical Reasoning. Cambridge, MA: Harvard University Press.
Heiman, V. 1990. Auditors' assessments of the likelihood of error explanations in analytical review. The Accounting Review 65 (October): 875-890.
Heiman-Hoffman, V., D. V. Moser, and A. Joseph. 1995. The impact of an auditor's initial hypothesis on subsequent performance at identifying actual errors. Contemporary Accounting Research 11 (Spring): 763-779.
Ismail, Z., and K. T. Trotman. 1995. The impact of the review process in hypothesis generation tasks. Accounting, Organizations and Society 20 (5): 345-357.
Joseph, G. M., and V. L. Patel. 1990. Domain knowledge and hypothesis generation in diagnostic reasoning. Medical Decision Making 10: 31-46.
Koonce, L. 1992. Explanation and counterexplanation during audit analytical review. The Accounting Review 67 (January): 59-76.
--. 1993. A cognitive characterization of audit analytical review. Auditing: A Journal of Practice & Theory 12 (Supplement): 57-76.
Libby, R. 1981. Accounting and Human Information Processing: Theory and Applications. Englewood Cliffs, NJ: Prentice Hall.
--. 1985. Availability and the generation of hypotheses in analytical review. Journal of Accounting Research 23 (2): 648-667.
--, and D. Frederick. 1990. Experience and the ability to explain audit findings. Journal of Accounting Research 28 (Autumn): 348-367.
Patel, V., and G. Groen. 1986. Knowledge-based solution strategies in medical reasoning. Cognitive Science: 91-116.
Submitted: October 2001
Accepted: March 2003
Wendy J. Green is a Senior Lecturer and Ken T. Trotman is a Professor, both at The University of New South Wales.
Author: Green, Wendy J.; Trotman, Ken T.
Publication: Auditing: A Journal of Practice & Theory
Date: Sep 1, 2003