# Finding the bad egg: audit sampling risk.

You are engaged in an audit and you are about to do some substantive testing on a certain account balance. To your horror, you have found that this account contains 100 items. You are not sure whether you can (or want to) examine each and every one of them.

You approach your partner with this problem, and the partner makes the following remark: "This is an important account and we don't want to have too many mistakes. Why don't we do this? First, choose 10 items at random to examine. If you don't find any mistakes, we'll accept the entire batch. Otherwise, we must do a complete examination on all. This way even if the batch has a 10% error rate, you will find at least one wrong item and we'll be safe."

The advice you just received certainly sounds reasonable, but unfortunately it is quite wrong, as we shall see. The problem is that your partner has jumped to a conclusion about the manner in which errors are distributed among the 100 items in the population. It would be correct if the population consisted of not 100 individual items, but rather 10 groups of 10 items with each individual group having a 10% error rate. Sometimes, such a knee-jerk response could be dangerously misleading.

This article will take a closer look at audit sampling risk. Specifically, we will show how the risk should be calculated and, more importantly, we will develop a tool to do so on an electronic spreadsheet.

## Closer Look at Sampling Risk

Basically, sampling risk is the probability that (1) there is an error in the population and (2) your examination by sampling fails to reveal it. Before tackling our original problem, let's look at a simplified version to see how it works.

Suppose the population size is six instead of 100, and we are to draw a sample of two. There is one error among the six items. What then is the sampling risk? To answer this question we need to address two issues. First, how many ways can we choose a sample of two out of a total population of six? Secondly, how many ways can we choose a sample of two out of the same population without including the error (i.e., a population of five)?

The possible combinations can be visualized as follows:

(1) Label the items in the population as A, B, C, D, E and F. For illustration purposes, item F is the bad one.

(2) The total number of possible ways to form a sample of two out of the total six is 15.

AB, AC, AD, AE, AF; BC, BD, BE, BF; CD, CE, CF; DE, DF; EF

(3) Similarly, the total number of ways to choose a sample of two from only the good population is 10.

AB, AC, AD, AE; BC, BD, BE; CD, CE; DE

(4) Since F is the culprit, we will be right if our sample includes F and wrong if it does not. Thus, the probability of our making a mistake equals:

$$\text{sampling risk} = \frac{\text{number of ways to form a sample excluding the culprit}}{\text{number of ways to form a sample}}$$

For our simplified scenario, the risk of failing to find the faulty item is two-thirds (10 / 15).
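The counting argument above can be checked by brute force. The following is a minimal sketch in Python (an assumption of this illustration; the article itself works in a spreadsheet), with variable names chosen only for this example. It lists every two-item sample from A to F and counts those that miss item F.

```python
from fractions import Fraction
from itertools import combinations

items = "ABCDEF"   # six items, labelled as in the text
bad_item = "F"     # the single error in the population

# Every possible sample of two items, e.g. "AB", "AC", ...
all_samples = ["".join(pair) for pair in combinations(items, 2)]

# Samples that fail to reveal the error because they do not contain F
clean_samples = [s for s in all_samples if bad_item not in s]

risk = Fraction(len(clean_samples), len(all_samples))
print(len(all_samples), len(clean_samples), risk)  # 15 10 2/3
```

Listing samples this way is only practical for tiny populations, which is why a closed-form count is needed for the 100-item case.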

Returning to our original example, the challenge becomes one of calculating all the possible combinations without having to list them out one by one. Fortunately, a technique is available: the binomial coefficient.

The computation formula is:

$$\binom{n}{r} = \frac{n!}{r!\,(n-r)!}$$

where n is the population size, r is the sample size, and ! is the notation for the "factorial" calculation; for example, 3! = 3 x 2 x 1 = 6.

For our simplified example, it is represented as:

$$\text{risk} = \frac{\binom{5}{2}}{\binom{6}{2}} = \frac{5!/(2!\,3!)}{6!/(2!\,4!)} = \frac{10}{15} \approx 0.667$$

For the original example, the solution becomes:

$$\text{risk} = \frac{\binom{90}{10}}{\binom{100}{10}} = \frac{90!/(10!\,80!)}{100!/(10!\,90!)} \approx 0.3305$$

Thus, we have seen that our risk of "missing the boat" in the original scenario is about 33% (see the 10% column at a sample size of 10 in Table 1), much higher than 10%!
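For readers who want to verify the figure outside a spreadsheet, here is a small Python sketch. It assumes Python 3.8 or later for `math.comb`; the function name `sampling_risk` and its parameters are hypothetical labels for this illustration, not part of the article's appendix.

```python
from math import comb

def sampling_risk(population, errors, sample):
    """Probability that a random sample of the given size contains no error item.

    Ratio of (ways to pick the sample from good items only)
    to (ways to pick the sample from the whole population).
    """
    return comb(population - errors, sample) / comb(population, sample)

# Simplified scenario: 6 items, 1 error, sample of 2
print(round(sampling_risk(6, 1, 2), 4))       # 0.6667

# Original scenario: 100 items, 10 errors (10%), sample of 10
print(round(sampling_risk(100, 10, 10), 4))   # 0.3305
```

Python's integers have no size limit, so the large intermediate counts are handled exactly before the final division.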

Our computation using the binomial coefficient can become forbidding even if the population size is only modestly large. For example, 10! = 3,628,800 and 100! is approximately 9.33 x 10^157. However, timely rescue is on the way. Some electronic spreadsheet packages bundle such computing facilities as built-in functions, so sensitivity analysis can be done with minimal effort.
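Where a built-in combinatorial function is not available, one possible workaround is to cancel the common factorial terms before computing anything. Writing N for the population size, K for the number of errors and n for the sample size (symbols introduced here for illustration only), the ratio of the two binomial coefficients reduces to a running product of n small fractions:

$$\frac{\binom{N-K}{n}}{\binom{N}{n}} = \prod_{i=0}^{n-1} \frac{N-K-i}{N-i}$$

The Python sketch below follows that product; the function name `sampling_risk_product` is hypothetical.

```python
def sampling_risk_product(population, errors, sample):
    """Risk of drawing no error item, computed without any factorials.

    Each factor is the chance that the next item drawn (without
    replacement) is a good one, given that all earlier draws were good.
    """
    risk = 1.0
    for i in range(sample):
        good_left = population - errors - i
        if good_left <= 0:
            # More draws than good items: the sample must contain an error
            return 0.0
        risk *= good_left / (population - i)
    return risk

print(f"{sampling_risk_product(100, 10, 10):.4f}")        # 0.3305
print(f"{sampling_risk_product(10_000, 1_000, 10):.4f}")  # 0.3485
```

Because every intermediate value stays between 0 and 1, this form works even in tools that overflow on large factorials.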

Illustrations 1-3 show the basic risk analysis worksheets for three levels of population size (small, medium and large) using Quattro Pro for Windows®, version 5.0. Figures 1-3 graph the risk for the 1%, 5% and 10% levels.

For a relatively small population with a small expected error rate, the risk/sample size relationship is close to a straight line (see the 1% column). However, for a large expected error rate, it does not take a very large sample to control the risk at a small level. For example, with the expected error rate of 5%, a sample of 35 is sufficient to control our risk at approximately the 10% level. Similarly, if the expected population error rate is 20%, then only a sample of 10 is needed.

This phenomenon highlights a very important concept in audit risk reduction. Unless our goal is to discover and to correct all the errors in the population, having a large sample is not usually an economical way to learn about the population. As a preliminary screening, audit sampling techniques will provide a means to control our acceptable level of risk in a prudent manner.

The second illustration examines the risk behavior of our model when the population is increased to 1,000. Notice that all the probabilities remain virtually the same for the high expected population error categories (i.e., 20% or above). Therefore, when the population is error prone, there is no need to increase our sample size in proportion to the increase in population size. As a matter of fact, even for a small expected error rate of 5%, the sample size needed to hold risk at about 5% rises only from approximately 45 to 55. Although the population is now ten times larger, we need only an additional 10 units.

Illustration 3 confirms our previous findings. The most interesting phenomenon is that in order to maintain risk at 5% as before, only about five more sample units are needed (60), even when the population has increased to 10,000. Similarly, the high error rate columns are virtually unchanged.
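The population effect described above can be spot-checked with a few lines of Python. This is a sketch under stated assumptions: the expected error rate is converted to a whole number of error items by rounding, and the names `sampling_risk` and `cases` are hypothetical.

```python
from math import comb

def sampling_risk(population, error_rate, sample):
    """Risk that a sample finds no error, for a given expected error rate."""
    errors = round(population * error_rate)  # assumed whole number of error items
    return comb(population - errors, sample) / comb(population, sample)

# (population size, sample size) pairs discussed in the text, all at a 5% error rate
cases = [(100, 45), (1_000, 55), (10_000, 60)]

for population, sample in cases:
    risk = sampling_risk(population, 0.05, sample)
    print(f"population {population:>6}, sample {sample:>3}: risk {risk:.4f}")
# The three risks come out near 0.0462, 0.0549 and 0.0456
```

A hundredfold increase in population is absorbed by only 15 extra sample units, in line with the figures in the three illustrations.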

## What-if Sensitivity Analysis

Nowadays, spreadsheet packages have a command feature that is extremely useful for this type of risk analysis. Using such a "what-if" routine, we can specify any three of the four variables to find the fourth. For example, by specifying the population size, expected error rate and desired risk level, we can calculate sample size. Alternatively, using the same spreadsheet, we can determine the projected population error rate based on our risk level, sample size and estimated population size. The appendix provides the code to set up the spreadsheet and shows how to find our answers. Furthermore, for readers who do not have the latest version of Quattro Pro®, a more generic alternative is provided and it should work on any spreadsheet with the "what-if" feature.
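The same "what-if" idea can be sketched outside a spreadsheet with a simple search. The Python below is not the appendix code; it is an illustrative stand-in, and the function names `required_sample_size` and `detectable_errors` are hypothetical. Each function fixes three of the four variables and scans for the fourth.

```python
from math import comb

def sampling_risk(population, errors, sample):
    """Probability that the sample contains none of the error items."""
    return comb(population - errors, sample) / comb(population, sample)

def required_sample_size(population, errors, max_risk):
    """Smallest sample size whose risk does not exceed max_risk (None if no size works)."""
    for sample in range(population + 1):
        if sampling_risk(population, errors, sample) <= max_risk:
            return sample
    return None

def detectable_errors(population, sample, max_risk):
    """Smallest number of error items for which the risk does not exceed max_risk."""
    for errors in range(population + 1):
        if sampling_risk(population, errors, sample) <= max_risk:
            return errors
    return None

# Population 100 with 5 errors (5%): sample needed to hold risk at 5%
print(required_sample_size(100, 5, 0.05))    # 45

# Population 100 with 20 errors (20%): sample needed to hold risk at 10%
print(required_sample_size(100, 20, 0.10))   # 10
```

A linear scan is used here because the populations are small; dividing the result of `detectable_errors` by the population size gives the corresponding error rate.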
```
Table 1: Sampling Risk Calculation
Using Combinatorial Function in Quattro Pro for Windows 5.0
With Small Population Size

Population: 100

Sample  Expected Population Error Rate
Size    1.00%    5.00%   10.00%   20.00%   50.00%   75.00%

   0   1.0000   1.0000   1.0000   1.0000   1.0000   1.0000
   5   0.9500   0.7696   0.5838   0.3193   0.0281   0.0007
  10   0.9000   0.5838   0.3305   0.0951   0.0006   0.0000
  15   0.8500   0.4357   0.1808   0.0262   0.0000   0.0000
  20   0.8000   0.3193   0.0951   0.0066   0.0000   0.0000
  25   0.7500   0.2292   0.0479   0.0015   0.0000   0.0000
  30   0.7000   0.1608   0.0229   0.0003   0.0000   0.0000
  35   0.6500   0.1097   0.0103   0.0001   0.0000   0.0000
  40   0.6000   0.0725   0.0044   0.0000   0.0000   0.0000
  45   0.5500   0.0462   0.0017   0.0000   0.0000   0.0000
  50   0.5000   0.0281   0.0006   0.0000   0.0000   0.0000
  55   0.4500   0.0162   0.0002   0.0000   0.0000   0.0000
  60   0.4000   0.0087   0.0000   0.0000   0.0000   0.0000
  65   0.3500   0.0043   0.0000   0.0000   0.0000   0.0000
  70   0.3000   0.0019   0.0000   0.0000   0.0000   0.0000
  75   0.2500   0.0007   0.0000   0.0000   0.0000   0.0000
  80   0.2000   0.0002   0.0000   0.0000   0.0000   0.0000
  85   0.1500   0.0000   0.0000   0.0000   0.0000   0.0000
  90   0.1000   0.0000   0.0000   0.0000   0.0000   0.0000
  95   0.0500   0.0000   0.0000   0.0000   0.0000   0.0000
 100   0.0000   0.0000   0.0000   0.0000   0.0000   0.0000

Sampling Risk Calculation With Medium Population Size

Population: 1,000

Sample  Expected Population Error Rate
Size    1.00%    5.00%   10.00%   20.00%   50.00%   75.00%

   0   1.0000   1.0000   1.0000   1.0000   1.0000   1.0000
   5   0.9509   0.7734   0.5898   0.3269   0.0309   0.0009
  10   0.9040   0.5973   0.3469   0.1062   0.0009   0.0000
  15   0.8591   0.4607   0.2035   0.0343   0.0000   0.0000
  20   0.8163   0.3549   0.1190   0.0110   0.0000   0.0000
  25   0.7754   0.2730   0.0694   0.0035   0.0000   0.0000
  30   0.7364   0.2097   0.0403   0.0011   0.0000   0.0000
  35   0.6991   0.1608   0.0234   0.0003   0.0000   0.0000
  40   0.6636   0.1232   0.0135   0.0001   0.0000   0.0000
  45   0.6297   0.0942   0.0078   0.0000   0.0000   0.0000
  50   0.5973   0.0720   0.0045   0.0000   0.0000   0.0000
  55   0.5665   0.0549   0.0026   0.0000   0.0000   0.0000
  60   0.5371   0.0418   0.0015   0.0000   0.0000   0.0000
  65   0.5090   0.0318   0.0008   0.0000   0.0000   0.0000
  70   0.4823   0.0241   0.0005   0.0000   0.0000   0.0000
  75   0.4569   0.0183   0.0003   0.0000   0.0000   0.0000
  80   0.4327   0.0139   0.0002   0.0000   0.0000   0.0000
  85   0.4096   0.0105   0.0001   0.0000   0.0000   0.0000
  90   0.3877   0.0079   0.0000   0.0000   0.0000   0.0000
  95   0.3668   0.0060   0.0000   0.0000   0.0000   0.0000
 100   0.3469   0.0045   0.0000   0.0000   0.0000   0.0000

Sampling Risk Calculation With Large Population Size

Population: 10,000

Sample  Expected Population Error Rate
Size    1.00%    5.00%   10.00%   20.00%   50.00%   75.00%

   0   1.0000   1.0000   1.0000   1.0000   1.0000   1.0000
   5   0.9510   0.7737   0.5904   0.3276   0.0312   0.0010
  10   0.9043   0.5986   0.3485   0.1073   0.0010   0.0000
  15   0.8600   0.4630   0.2057   0.0351   0.0000   0.0000
  20   0.8177   0.3581   0.1213   0.0115   0.0000   0.0000
  25   0.7776   0.2770   0.0716   0.0037   0.0000   0.0000
  30   0.7394   0.2141   0.0422   0.0012   0.0000   0.0000
  35   0.7030   0.1656   0.0249   0.0004   0.0000   0.0000
  40   0.6684   0.1280   0.0147   0.0001   0.0000   0.0000
  45   0.6355   0.0989   0.0086   0.0000   0.0000   0.0000
  50   0.6043   0.0764   0.0051   0.0000   0.0000   0.0000
  55   0.5745   0.0591   0.0030   0.0000   0.0000   0.0000
  60   0.5462   0.0456   0.0018   0.0000   0.0000   0.0000
  65   0.5192   0.0353   0.0010   0.0000   0.0000   0.0000
  70   0.4936   0.0272   0.0006   0.0000   0.0000   0.0000
  75   0.4693   0.0210   0.0004   0.0000   0.0000   0.0000
  80   0.4461   0.0162   0.0002   0.0000   0.0000   0.0000
  85   0.4240   0.0125   0.0001   0.0000   0.0000   0.0000
  90   0.4031   0.0097   0.0001   0.0000   0.0000   0.0000
  95   0.3832   0.0075   0.0000   0.0000   0.0000   0.0000
 100   0.3642   0.0058   0.0000   0.0000   0.0000   0.0000
```
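
The worksheets above were built in Quattro Pro, but a comparable grid can be regenerated with a short script. The following Python sketch is an assumed equivalent, not the original worksheet: the names `ERROR_RATES` and `risk_table` are hypothetical, and each error rate is converted to a whole number of error items by rounding.

```python
from math import comb

ERROR_RATES = [0.01, 0.05, 0.10, 0.20, 0.50, 0.75]  # column headings of the tables

def risk_table(population, error_rates=ERROR_RATES, step=5, max_sample=100):
    """Return (sample size, [risk per error rate]) rows for one population size."""
    rows = []
    for sample in range(0, max_sample + 1, step):
        risks = []
        for rate in error_rates:
            errors = round(population * rate)  # assumed whole number of error items
            risks.append(comb(population - errors, sample) / comb(population, sample))
        rows.append((sample, risks))
    return rows

# Print the small-population grid in the same layout as the first table
for sample, risks in risk_table(100):
    print(f"{sample:>4}" + "".join(f"{risk:>9.4f}" for risk in risks))
```

Calling `risk_table(1_000)` or `risk_table(10_000)` produces the medium and large population grids in the same way.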

## Conclusion

The above demonstrations have two objectives in mind. The first is to provide an intuitive but basic understanding of audit sampling risk. This should be helpful for accountants who have been overwhelmed by the technical jargon of statistical sampling. With this basic understanding, accountants can better appreciate the power and capabilities of statistical auditing.

More importantly, despite the seeming complexity, many of these nice features are readily at hand. The second objective, therefore, is to show how to set up sampling risk analysis on an electronic spreadsheet. In this regard, accountants can put the conceptual knowledge to substantive use. Improved audit decision making and enhanced quality of work can be obtained with a small investment of time and effort.

Chak-Tong Chau, PhD, CPA, CMA, CIA, AHKSA, is assistant professor of accounting at the University of Texas at San Antonio.
COPYRIGHT 1994 National Society of Public Accountants