# Statistical sampling: a potential win for business taxpayers.

The IRS Large and Mid-Size Business (LMSB) Division has recently sought and received advice from Tax Executives Institute, the American Institute of Certified Public Accountants, and others on the appropriate guidelines for the use of statistical sampling in the filing of tax returns. Although this action has understandably created some apprehension for businesses, it is also an explicit recognition that taxpayers can use statistical sampling to allow them to claim deductions and credits they might not be able to claim otherwise.

The IRS has used statistical sampling as an audit tool for many years. At the same time, there was little acknowledgement that taxpayers could also use this tool, except in a few narrowly defined situations. Taxpayers, however, have been increasingly using sampling for the filing of tax returns in more and more situations.

## Tax Sampling Situations

How can businesses use sampling for filing tax returns? Sampling should be considered whenever facts-and-circumstances determinations are required and the population is too large to review in its entirety without incurring excessive costs and overtaxing the reviewers to the point of degrading the review process. One example is the review of a sample of meals and entertainment expenses currently subject to a 50-percent limitation. The underlying invoices and other documentation are reviewed to determine the percentage of such expenses that can be reclassified as 100-percent deductible. That percentage is then applied to the entire 50-percent limitation population. Statistical sampling has also been used on invoices from the earlier months of a taxpayer's fiscal year to determine the percentage that relates to work performed in the prior fiscal year. Another example is the use of a sample of records from documented costs associated with large fixed assets--property, plant and equipment--to determine which costs qualify as research or experimental and therefore are eligible for immediate deduction rather than for depreciation over a period of up to 39 years. In this case, the search is for detailed items, like carpeting, that are classified as 39-year property but can be depreciated over a shorter time. Sampling has also been used for many years for LIFO inventory determination, either to develop an index or to assign the inventory to published categories.
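The meals and entertainment projection described above can be sketched in a few lines. This is only an illustration under stated assumptions: the function name `project_reclassified`, the toy dollar amounts, and the choice to compute the percentage on a dollar basis (rather than by item count) are all hypothetical and are not taken from any IRS guidance.

```python
def project_reclassified(population_amounts, sample_ids, qualifies):
    """Estimate the reclassification rate from a sample and apply it to the population.

    population_amounts: dollar amount of every item in the 50-percent population
    sample_ids: indices of the items that were actually reviewed
    qualifies: dict mapping each sampled index to the reviewer's True/False finding
    """
    sampled_dollars = sum(population_amounts[i] for i in sample_ids)
    qualifying_dollars = sum(population_amounts[i] for i in sample_ids if qualifies[i])
    # Assumption: the rate is dollar-weighted, not a simple count of items.
    rate = qualifying_dollars / sampled_dollars
    projected = rate * sum(population_amounts)
    return rate, projected


# Toy data, invented for illustration only.
amounts = [100.0, 200.0, 300.0, 400.0]
reviewed = [0, 1]                  # in practice, chosen by a random number generator
decisions = {0: True, 1: False}    # reviewer findings for the sampled items

rate, projected = project_reclassified(amounts, reviewed, decisions)
print(f"rate = {rate:.3f}, projected reclassified dollars = {projected:.2f}")
```

A real study would also attach a standard error to the projected amount, which is what distinguishes a statistical projection from a simple extrapolation.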

In addition to situations where the population is so large that it would be essentially impossible to review it all, there are also situations where the population may be moderate in size but the cost of processing each sample or population unit is high. An example is the determination of eligibility for the R&D tax credit. A sample of projects or employee wages can be used to determine the eligible amounts spent. The IRS frequently uses sampling to review R&D credit claims during the course of an audit.

In fact, it makes sense for businesses to consider sampling for filing returns in any situation where the IRS tends to use sampling for the audit of such claims. Another example is the sampling of business properties for cost segregation purposes. In this case, the population may be quite modest in size, a few hundred, but because of the high cost of processing each property, sampling can save several hundred thousand dollars over the cost of having engineers and architects look at each property.

As the foregoing examples demonstrate, sampling has broad applicability in tax and should be considered whenever deductions or credits might be left on the table because the population is too large or the cost is too high to look at everything.

## Why Sample?

There is a common perception that sampling is inferior to reviewing the entire population, that it is some kind of necessary evil when the population is simply too large to review. The estimates from a sample will not be exactly the same as the true population value, except purely (and very rarely) by chance. Neither, however, will a review of the entire population. A review of the entire population, even when the population is only moderately large (say, 10,000 or so), is a tremendous undertaking. The level of effort needed is so large that fatigue, boredom, the need to meet deadlines, and other factors lead to error on the part of the reviewers. At the same time, the level of effort makes it difficult, if not impossible, to carefully monitor the work so that reviewers are held to standards and errors are kept to minimal levels.

The type of error introduced by the fact that we are looking at a sample, which is expected to differ from the true population value and from other samples, is called sampling error. The type of error introduced by fatigue, boredom, etc., is called non-sampling or measurement error.

Statistical sampling is all about identifying and minimizing sources of error. While the review of the entire population is not subject to sampling error, it is subject to measurement error. The review of a sample is subject to both. A major advantage of a well-designed statistical sample is that it minimizes total error. In a well-designed statistical sample, the sampling error is as small as we need it to be. It is reduced by simply increasing the sample size or by using a more efficient sample design or estimator. Its magnitude is known and under our control. Large improvements in sample precision can be achieved simply by changing the design. In one example involving an IRS audit, we were able to retain the same sampling error that the IRS had decided to tolerate in their sample but reduce the sample size by one-fifth simply by making their design more efficient. This reduced the burden for both the company and the IRS.

The non-sampling error is not easily measured. Its control is more difficult, requiring whatever training, monitoring, and process improvement are necessary to ensure a highly accurate and consistent review process. Needless to say, this type of quality is easier to achieve when the number of items to be reviewed is manageable. The total error in the sample can be smaller than the total error resulting from a review of the whole population.

## What is Statistical Sampling?

Statistical sampling is a way to take a small portion to learn something about the whole. The accuracy of what is learned depends on how similar the whole is throughout, how big the sampled portion is, and how well it is spread throughout the population.

What makes a sample a statistical sample? The key difference between statistical samples and others is that statistical samples are selected using a random process. This means that whether or not some element of the population is included in the sample is governed solely by having its number come up on the roll of a die, toss of a coin, or, more realistically, by being selected by a random number generator. This does not mean that all elements in the population have the same chance of selection; it does mean that the probability of selection is known for each element in the population and that all population elements have some chance of being in the sample. A major advantage of statistical sampling is that the accuracy of the estimates made from the sample can be quantified.

Statistical sampling accuracy can be quantified in terms of the confidence and precision of the estimates from the sample. If we were to select repeated samples from the same population, calculate the sample average for each, and put the sampled cases back in the population after we calculate the average, we would expect those sample averages to be close to each other but not identical. In fact, most would be fairly close but a few might be quite atypical if we selected enough samples. If we made our sample size larger for each of the repeated samples, we would expect their sample averages to cluster more closely together. Precision is the measure of how tightly clustered these averages are. (1) It can be calculated without actually doing the repeated sampling. The most common and useful measure of precision is the standard error. It is an indication of the average amount our sample averages would be expected to differ.
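The claim that precision can be calculated from one sample, without actually repeating the sampling, can be checked with a small simulation. The sketch below is illustrative only: the population is synthetic, the sample size of 400 is arbitrary, and the formula used (sample standard deviation divided by the square root of the sample size, with a finite population correction) is the textbook one for a simple random sample drawn without replacement, which is an assumption about the design.

```python
import math
import random
import statistics


def standard_error(sample, population_size):
    """Standard error of the sample mean for a simple random sample
    drawn without replacement from a population of known size."""
    n = len(sample)
    s = statistics.stdev(sample)                   # sample standard deviation
    fpc = math.sqrt(1 - n / population_size)       # finite population correction
    return s / math.sqrt(n) * fpc


random.seed(1)  # fixed seed so the illustration is reproducible

# Synthetic population of 10,000 amounts (hypothetical values).
population = [random.gauss(100, 25) for _ in range(10_000)]

# Precision computed from a single sample of 400.
one_sample = random.sample(population, 400)
se_from_one_sample = standard_error(one_sample, len(population))

# Precision observed by actually repeating the sampling 2,000 times,
# with every sampled case returned to the population between draws.
repeated_means = [statistics.mean(random.sample(population, 400)) for _ in range(2_000)]
spread_of_means = statistics.stdev(repeated_means)

print(f"standard error from one sample:  {se_from_one_sample:.3f}")
print(f"spread of repeated sample means: {spread_of_means:.3f}")
```

The two printed figures should be close to each other, which is the sense in which the standard error summarizes how tightly the repeated sample averages would cluster.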

We can use our estimate of precision, the standard error, to construct what is called a confidence interval around the estimate from a single sample. If we take the standard error and add that amount to the average from our sample, we get an upper range where we think most of the sample averages will fall. We can also subtract that amount to get a lower range of likely values. The interval from the lower limit to the upper limit is called the confidence interval. If we want to be more certain that the true population value falls somewhere in the interval, we can just widen the interval. Statistical theory allows us to associate a probability with confidence intervals by applying the correct multiple of the standard error. (2) Conventionally, we use 90 percent or 95 percent confidence levels but other levels are possible. The relevant consideration is that the higher the degree of confidence, the more likely we are to capture the true value in the interval but the interval also widens as we increase our confidence. The interval can be made shorter by designing a more efficient sample, by increasing the sample size, or by lowering the confidence level.
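As a rough illustration of the "correct multiple of the standard error," the sketch below derives the multiple from the normal distribution and builds the interval. The function names are hypothetical, and the normal approximation is an assumption that is reasonable for large samples; small samples or complex designs may call for a different multiple.

```python
from statistics import NormalDist


def multiplier(confidence_level):
    """Multiple of the standard error for a two-sided interval,
    assuming the estimate is approximately normally distributed."""
    return NormalDist().inv_cdf(0.5 + confidence_level / 2)


def confidence_interval(estimate, standard_error, confidence_level=0.95):
    """Return (lower limit, upper limit) around a point estimate."""
    half_width = multiplier(confidence_level) * standard_error
    return estimate - half_width, estimate + half_width


# Hypothetical figures: a sample average of 1,000 with a standard error of 50.
for level in (0.90, 0.95):
    lower, upper = confidence_interval(1000.0, 50.0, level)
    print(f"{level:.0%} multiple {multiplier(level):.3f}: {lower:.1f} to {upper:.1f}")
```

Raising the confidence level raises the multiple and widens the interval; shrinking the standard error, through a larger sample or a more efficient design, narrows it.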

Confidence intervals are likely to be an important element in the IRS LMSB guidelines. This ability to quantify the accuracy of the estimate from the sample is the primary advantage of statistical sampling over non-statistical sampling. Without statistical sampling, there is no way to tell whether the sample estimate is accurate or not.

Examples of non-statistical samples include reviewing one year (or one month) and applying the results to several years, selecting the ten largest accounts from several groups of accounts, or simply taking the fifth file from each drawer of a set of file cabinets. In all of these examples, the choice of which elements are included in the sample is not in any way governed by chance. Therefore, probability theory cannot be used to develop confidence and precision levels for the estimates from these samples. With simple modifications, these samples could be made statistical (or probabilistic). In the first case, instead of reviewing one year or one month, a sample could be spread throughout the years and randomly selected. If desired, a fixed sample size could be determined for each year and randomly selected within the years. (3) In the second example, instead of taking only the largest accounts, additional accounts could be randomly selected in each group of accounts. (4) In the third example, that of taking the fifth file from each drawer, the sample could be made random by numbering the files in the drawers and selecting them with a random number generator.
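The second fix, keeping the largest accounts and adding randomly selected ones, together with the weighting described in notes (3) and (4), can be sketched as follows. Everything here is hypothetical: the account values are synthetic, the group sizes are arbitrary, and `stratified_sample` and `estimate_total` are illustrative names. Each selected item carries a weight equal to the inverse of its selection probability, so items taken with certainty get a weight of one.

```python
import random


def stratified_sample(strata, sample_sizes, rng):
    """Draw a simple random sample within each stratum.

    strata: dict mapping stratum name -> list of amounts
    sample_sizes: dict mapping stratum name -> positive number of items to select
    Returns a list of (stratum, amount, weight), where weight = N_h / n_h.
    """
    selected = []
    for name, items in strata.items():
        n_h = min(sample_sizes[name], len(items))
        weight = len(items) / n_h          # inverse of the selection probability
        for amount in rng.sample(items, n_h):
            selected.append((name, amount, weight))
    return selected


def estimate_total(selected):
    """Weighted estimate of the population total."""
    return sum(amount * weight for _, amount, weight in selected)


rng = random.Random(7)  # fixed seed so the illustration is reproducible

# Synthetic group of 500 accounts, largest first.
accounts = sorted((rng.uniform(100, 50_000) for _ in range(500)), reverse=True)

# The ten largest are taken with certainty; 49 of the remaining 490 are random.
strata = {"certainty": accounts[:10], "remainder": accounts[10:]}
sample = stratified_sample(strata, {"certainty": 10, "remainder": 49}, rng)

print(f"estimated total: {estimate_total(sample):,.0f}")
print(f"actual total:    {sum(accounts):,.0f}")
```

The same function covers the first fix as well: treating each year as a stratum with its own fixed sample size yields the per-year weights that note (3) calls for.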

Non-statistical samples also have no built-in mechanism to control for non-representativeness or bias. The kind of haphazard or judgment sample drawn can be very atypical and there is no way to control for that. A statistical sample is based on randomization. That means that although an unusual statistical sample can occur, it is rare. With advanced sampling methods, we can virtually ensure that the sample is representative. We once did a meals and entertainment statistical sample following a preliminary feasibility non-random sample. The non-random sample had not been representative and resulted in a very different estimate of the rate of movement from the 50-percent limitation to 100-percent deductible. This meant that the company expected to find more value than was actually there--not a happy situation for the company or the tax consultants! The bias control of statistical samples makes them well worth the additional effort.

Although all statistical samples have the property that their sampling accuracy can be quantified, they are not all equal. The simple fact that a sample is statistical is not enough to make it a good sample. Good statistical samples are tailored to the specific population and sampling situation to make them as small as possible while still meeting the precision requirements. A simple random sample can be as much as two to three times larger than a well-designed complex sample. We once compared the sample sizes for a simple random sample and a stratified sample for the same sampling circumstances. The stratified sample size was around 400 and the simple random sample size to achieve the same confidence and precision was over 1,000.
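A back-of-the-envelope comparison shows why stratification can shrink the required sample. The sketch below uses standard textbook formulas for estimating a mean to within a chosen margin, ignoring the finite population correction, with the stratified sample allocated in proportion to each stratum's share times its standard deviation (Neyman allocation). The strata, shares, means, and standard deviations are invented for illustration and do not reproduce the comparison described above.

```python
import math
from statistics import NormalDist


def srs_sample_size(overall_sd, margin, z):
    """Simple random sample size so that z * SE <= margin (no finite population correction)."""
    return math.ceil((z * overall_sd / margin) ** 2)


def neyman_sample_size(shares, sds, margin, z):
    """Total stratified sample size under Neyman allocation for the same margin."""
    weighted_sd = sum(w * s for w, s in zip(shares, sds))
    return math.ceil((z * weighted_sd / margin) ** 2)


# Hypothetical population split into small, medium, and large items.
shares = [0.6, 0.3, 0.1]        # fraction of items in each stratum
means = [100.0, 300.0, 900.0]   # average amount per stratum
sds = [60.0, 120.0, 400.0]      # standard deviation within each stratum

# Overall variance = within-stratum variance + between-stratum variance.
overall_mean = sum(w * m for w, m in zip(shares, means))
overall_var = sum(
    w * (s ** 2 + (m - overall_mean) ** 2) for w, m, s in zip(shares, means, sds)
)
overall_sd = math.sqrt(overall_var)

z = NormalDist().inv_cdf(0.975)  # 95 percent confidence, two-sided
margin = 20.0                    # desired half-width of the interval

n_srs = srs_sample_size(overall_sd, margin, z)
n_stratified = neyman_sample_size(shares, sds, margin, z)
print(f"simple random sample: {n_srs}, stratified sample: {n_stratified}")
```

The gain comes from removing the between-stratum variation from the sampling error, so the more the strata differ from one another, the larger the saving.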

A good statistical sample is not achieved through a one-size-fits-all approach. It requires analysis of the population to ensure that sampling assumptions are met and to design the sample so it is best for that population. Best in this context means that all the tools available to the sampler are used to handle population characteristics that might otherwise increase the sample size needed to achieve the specified precision.

## What Is the IRS Guidance Likely to Include?

Although the IRS LMSB Division is still in the process of developing sampling guidelines, they have been willing to discuss what they are considering. They are likely to establish a "safe harbor" for taxpayers using sampling. The "safe harbor" would allow statistical sampling in filing situations. So long as the taxpayer uses the lower limit of the confidence interval described above when claiming deductions or credits based on a sample, the IRS will not challenge the sampling procedures. (If the taxpayer were using sampling to estimate income, the upper limit would be required.) Of course, the sampling procedures must be documented well enough that the IRS can verify that correct procedures were used. The practical implication of this is that the sample must be large enough and efficiently designed so the lower limit of the confidence interval is not too far from the actual estimate (the point estimate). If the business taxpayer does not wish to take advantage of the safe harbor, some mechanism will be developed to negotiate the sampling and estimation approach on a case-by-case basis before undertaking the sampling.
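The practical effect of the contemplated safe harbor can be illustrated with a short calculation. This is a sketch under assumptions that the guidelines, which have not yet been issued, may not share: a two-sided interval, a 95 percent confidence level, a normal-distribution multiple, and the hypothetical function name `safe_harbor_amount`.

```python
from statistics import NormalDist


def safe_harbor_amount(point_estimate, standard_error, kind, confidence_level=0.95):
    """Amount to report under the assumed safe harbor.

    Deductions and credits use the lower confidence limit;
    income uses the upper confidence limit.
    """
    z = NormalDist().inv_cdf(0.5 + confidence_level / 2)
    half_width = z * standard_error
    if kind == "deduction":
        return point_estimate - half_width
    if kind == "income":
        return point_estimate + half_width
    raise ValueError("kind must be 'deduction' or 'income'")


# Hypothetical point estimate of 1,000,000 under two different levels of precision.
estimate = 1_000_000.0
for se in (100_000.0, 25_000.0):
    claim = safe_harbor_amount(estimate, se, "deduction")
    print(f"standard error {se:,.0f}: claim {claim:,.0f} "
          f"({claim / estimate:.1%} of the point estimate)")
```

The smaller the standard error, the closer the claimable amount sits to the point estimate, which is why an efficient design has direct dollar value under a rule of this kind.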

Is this requirement to use the limits of the confidence interval rather than the point estimate statistically sound? The actual estimate itself (which is the mid-point of the confidence interval) is technically the best estimate in terms of having many desirable statistical properties. The statistical theory underlying the construction of the confidence interval, however, also indicates that the true value can fall anywhere in the interval and is not more likely to be at the mid-point than at the ends of the interval. We can interpret the lower limit of the confidence interval as the lowest value that we believe the true population deduction could have. In that sense, the lower limit is a statistically sound but conservative estimate of the deductible amount.

The harder question is whether this is a fair and reasonable requirement, which will be seen over time once guidelines are issued. One bit of comfort for the taxpayer is that the LMSB representatives stated that they will adhere to whatever guidelines are issued to taxpayers for their own samples. Current IRS policy is to use the lower limit of the confidence interval in determining an audit adjustment based on sampling. In practice, however, when the confidence interval is too wide, the IRS often uses the point estimate or uses non-statistical sampling to avoid dealing with it. The expectation is that, under the proposed guidelines, the IRS would always either use the lower limit or obtain taxpayer agreement to an alternative approach.

During discussions with TEI, AICPA, and other interested parties, the IRS also indicated that it would not disallow any methodology that could be defended by reference to published articles in refereed statistical journals, even if that methodology was not yet incorporated into textbooks. This would allow the sampler to take advantage of new developments that can reduce the sample sizes significantly below those in textbooks while still giving acceptable confidence and precision.

## Conclusion

The use of statistical sampling can be a win/win for both the IRS and taxpayers. It decreases the burden and cost of compliance for taxpayers and also reduces the audit burden for the IRS. Reasonable guidelines for sampling, a safe harbor procedure, and a process for dealing with exceptions remove the sampling itself as an issue and allow the parties to appropriately focus their efforts on correctly addressing the tax issues.

(1) There are several related measures of sampling precision. We are describing the most basic measures here, the standard error and confidence interval.

(2) The multiple is generally between 1.5 and 2.

(3) When a population is divided into groups or segments and a specific number of cases are sampled from each segment (year in this case), for the production of an overall estimate the sample segments must be correctly weighted to reflect their different probabilities of selection.

(4) In this case, the largest accounts in each group are sampled with certainty and represent only themselves. They would thus have a weight of one. The randomly selected accounts would each have a weight that is the inverse of their selection probability.

MARY BATCHER is the National Director for Statistics and Sampling for Ernst & Young LLP. She has successfully directed many sampling studies for tax filing purposes. She was formerly employed by the Internal Revenue Service. She holds a Ph.D. in statistics from the University of Maryland.

| Field | Value |
|---|---|
| Title Annotation | tax returns |
| Author | Batcher, Mary |
| Publication | Tax Executive |
| Date | Nov 1, 2001 |
| Words | 2892 |
