# It's a long way to Monte Carlo: simulated solutions of commonly used financial models do not converge very quickly.

This paper presents practical examples on how Monte Carlo simulation software must be evaluated for precision. The examples are from financial planning cases, but the relevance of the methodologies displayed here for other probabilistic problems, such as VaR and stress testing, is clear. Users must be comfortable not only with the assumptions of the models they use, but also with the reliability of percentiles, means, variances, and other moments of the final distributions that are produced by the software solving such models. Legal and reputation risks are the most obvious consequences of overlooking this issue.

**********

Monte Carlo simulation is one of the best-known solution approaches for probabilistic financial problems. Since its adoption by financial practitioners for stress testing and other problems has occurred rapidly, not everyone has had a chance to weigh pros and cons of the approach. This paper discusses the implications that the choice of the batch size (sometimes called the number of experiments, of replications, or of "runs") has on the reliability of a simulation. In the technical literature, this is called the convergence speed issue. This paper analyzes the impact of batch size on the solutions of two common financial planning problems. The results unfortunately show that, in practice, simulation accuracy can be easily overlooked. Thus, placing blind faith in simulation results could cause financial tragedy for clients or employers and, quite possibly, lawsuits and liabilities.

The practical implication is that, when running simulations, one should make sure that the batch size is sufficient to yield an estimate within an acceptable range of the true solution. When writing software, this can be achieved by using statistical software or a programming language offering double precision rather than a spreadsheet, and by obtaining error estimates using the methods reviewed in this paper. When obtaining the software from a third party, the producer should provide error estimates and indications of the batch size.

Solutions to probabilistic problems are generally moments of random variables (such as means and variances), which can be expressed as integrals. Integrals, in turn, may be solved either analytically (that is, by algebraically finding either an exact or an approximated "quadrature" solution) or numerically (that is, by computing an approximated solution). Since the solutions generally represent estimates, numerical solutions should be used when the exact analytical solutions are either very difficult or impossible to derive. Monte Carlo is one of several reasonable approaches to numerical integration, as explained by numerical integration manuals such as Davis and Rabinowitz (1984) or Zwillinger (1992).

The speed of convergence affects a very practical aspect of a Monte Carlo simulation, namely, its precision. As explained in the examples of this paper, if the batch size is too small, the simulation may yield a result that is substantially far from the truth. Press et al. (1992) point out in their famous book (p. 26) that Monte Carlo is appealing because it is relatively easy to program but has precision problems. Of course, for a planner, an incorrect answer implies incorrect financial advice. For a risk practitioner, a wrong estimate of VaR may imply shareholder lawsuits or the employer's financial distress.

In this paper we test models of retirement spending, that is, of the probability of outliving one's money given a wealth amount, an asset allocation, and an annual withdrawal amount. The problem is written as an example, and some of its assumptions are unrealistic, because the point of the paper is testing convergence speed and not replicating the results of studies such as Cooley et al. (1998) (also known as the "Trinity Study"), Jarrett and Stringfellow (2000), or Milevsky (2001). However, the results of the paper show that the authors of similar studies should provide detailed information about sample sizes and error estimates. This paper borrows examples from the financial planning literature, but its results can be clearly applied to VaR problems.

Models and Results

The models used in this paper are similar to those used in the papers listed above. There is a single person of age sixty, who can invest in a mix of three assets whose random returns in the same year are correlated (no serial correlation). The returns are lognormally distributed with constant moments (mean, variance, covariance). The person chooses a fixed allocation and performs annual rebalancing. There is no inflation, and the person withdraws a fixed dollar amount per year, starting at the end of this year, until death.

For simplicity, we test only one such allocation (i.e., sixty percent stocks, thirty percent bonds, and ten percent cash). It is important to remember that this paper is not concerned with the validity of an asset allocation or of a spending amount, but with the precision of the estimated outcomes.

There are two versions of the model. One version has certain lifetime (i.e., the investor knows that exactly forty-one annual withdrawals of \$60,000 shall be needed). The other version has uncertain lifetime (i.e., the investor dies in an unknown future period between today and forty-one years from now).

The model with certain lifetime is simple. We need to calculate the future value of the portfolio, given the assumptions above, of \$1,000,000 from which the user withdraws \$60,000 for forty-one years. For each of the replications, if the person has at least zero dollars after forty-one withdrawals, a success is tallied. If the person dies in debt, a failure is tallied. The percentage of successes is nothing but an empirical measure of probability of not outliving one's money given the stated assumptions.

The model is first calculated using Microsoft Excel. (1) Writing a correct model using Excel is not easy because the size of the worksheet is limited. One random number per year is needed to simulate the return of each asset in the portfolio. Thus, given that the model uses three assets for forty-one years, 123 random numbers are needed for each of the replications. Given that Excel worksheets have 255 columns, 250 replications are the chosen batch size. Thus, (41 x 3 x 250) = 30,750 random standard normal numbers must be generated.

Refreshing the numbers to generate a new batch of the same size takes several minutes. Moreover, extreme values of [+ or -] 5,000,000 sometimes appear in the output. The user needs to replace these by hand with either an acceptably large number or the expected value of the standard normal distribution (zero). The relatively frequent appearance of these extreme numbers casts doubt on the reliability of the random number generator.

Informal testing suggests that large samples from this random number generator have skewness and kurtosis consistent with those of a normal distribution. However, the period of the algorithm (which should ideally be a very large number) and the serial correlation of the sequence (that should be zero) are not stated in the Excel "Help" files. More extensive testing, of the kind explained by McCullough and Vinod (1999), is needed to verify these issues. Moreover, the issue of how to fit a larger batch size within an Excel worksheet remains.

Once generated, the random numbers must be given mean, variance, and covariance according to the assumptions of the model--a complicated procedure in Excel. (2) Finally, one needs to add the expected values to the random numbers (because they were generated with expected value equal to zero) and calculate the exponentials to transform the continuously compounded returns into log-normal gross returns.

The reader has already realized that a large number of computations are required to do a Monte Carlo simulation with Excel. Moreover, as stressed by McCullough and Vinod (1999), the numerical accuracy of large summations in Excel is not guaranteed because of the extensive rounding involved, which may add up to large errors. Thus, the user should seriously consider using statistical or mathematical software for this kind of problem, because such software is often optimized for numerical accuracy (and sometimes for speed, too).

Given the large amount of time required to obtain new batches of random numbers, the simulation is performed with twenty-five batches of 250 replications each. The results are summarized in Table 1, where the mean represents the percentage of success of a given allocation policy for a person of age sixty who has \$1,000,000 invested in a given asset allocation and withdraws real \$60,000 annually (adjusted for inflation), given an arbitrary mortality table. That is, in 52.26 percent of the experiments in the hypothetical scenario, the investor did not outlive his or her money--our criterion for success. (Obviously, the numbers are an example and should not be interpreted as investment or spending advice.) The results were calculated in Excel 97, using the random normal number generator found in the package's Data Analysis Tools.

As a second step, the model presented above was then written in the mathematical and statistical language Matlab and simulated with different numbers of Monte Carlo experiments. (3) The Matlab program was executed on a Pentium III, 650 MHz, with 128 MB of RAM, running Windows NT 4. Since Matlab interprets (does not compile) the program; and given that the code itself was not optimized for speed, the execution times mentioned in this section are intended mainly for comparison against each other. The results are summarized in Table 2.

These results are for exactly the same problem as addressed using Excel in Table 1. Columns indicate the batch size. For each of the columns, the simulation was carried out 500 times without changing any of the parameters, compared to only twenty-five times using Excel. The Jarque-B era test is a test of the null hypothesis of normality of a distribution. Under the null it is distributed according to a [[chi square].sub.(2)], whose five percent critical value is 5.99. (4)

The mean and variance of the Matlab simulations with batch size equal to 250 seen in Table 2 differ from those in Table 1 calculated with Excel. However, these results are not strictly comparable, since the experiments were run 500 times in Matlab, but only twenty-five times in Excel. Even without considering deeper statistical concepts, it is clear that we can get a better idea of the fairness of a coin if we toss it 500 rather than only twenty-five times. Therefore, these results suggest that there may be issues with the numerical accuracy and the goodness of the random number generator in Excel, but they are not a full scientific assessment of such numerical accuracy.

The information summarized in Table 2 is detailed in Figure 1. It is interesting to see that the distributions of the results do not look particularly symmetrical. However, all skewness statistics are near zero, and considering the kurtosis and Jarque-Bera statistics, the distributions of the results for several batch sizes do not appear to be significantly different from normal distributions.

This histogram show the distributions of the probability of success of a certain asset allocation and withdrawal strategy for a person knowing the exact year of his or her death. The batch size indicates the number of times each problem was simulated before calculating the probability of success. A higher batch size appears more likely to yield results more concentrated near the "real" result.

What is the importance of Figure 1 for analysts concerned with VaR issues? After all, they are interested in the left tail of the distribution of wealth, and keep the probability fixed (e.g., to five percent). The answer is that if the probability number is all over the place (as in the top panels of the figure), so will be the left tail of the distribution. If plus or minus two percent can be acceptable to a financial planner, who gives longterm estimates destined to be revised periodically, it may not necessarily be a good precision level for stress testing a bank's loan portfolio.

A further complication, mortality, is added to the model in this section. Assume that the person withdraws \$70,000 per year, (5) but is not sure of when, in the next forty-one years, he or she will die. This implies that nobody reaches the age of 101.

Instead of using mortality tables, a beta random number generator for mortality is used. (6) To add more realism, one should use a uniform random number generator and an inverted cumulative mortality distribution from an actuarial table (which is clearly beyond the scope of this paper and not necessary to illustrate the paper's basic principles).

First, the model was simulated 500 times, each time with 500 runs. This took 41 minutes. Second, the same model was simulated 500 more times, but this time with 1,000 runs each. This took 112 minutes--thus, twice the number of runs required more than twice as long, possibly because it requires time-consuming memory swapping to disk after exhausting the RAM space. Third, the model was simulated 500 times with 5,000 runs each. This took 692 minutes, that is, about eleven hours and thirty minutes.

Summary descriptive statistics of the results are listed in Table 3, conceptually identical to Tables 1 and 2, where the mean is the percentage of success of a given allocation policy for a person of age sixty who has \$1,000,000 invested in a given asset allocation and withdraws real \$70,000 annually (adjusted for inflation), given an arbitrary mortality table. The information summarized in Table 3 is detailed in Figure 2, similar in concept to Figure 1. Also in this case, the distributions of the results do not particularly look symmetrical, but the Jarque-Bera tests indicate that they are normal.

The variance of the results in the column with batch size equal to 500 is 4.13, while that of the column with batch size equal to 1,000 is 1.92, which is roughly half as much; the column with the results for 5,000 replications per batch is 0.80, which again is roughly half as much as the variance in the previous column. Thus, in this problem, doubling the batch size from 500 to 1,000 effectively halves the variance. However, to reduce the variance to one-fourth of that for 500 replications one needs to use a batch size of 5,000 (and not of 4 x 500 = 2,000) as a linear prediction could have suggested).

The practical implications of the results are easily explained. Leaving aside the unrealistic assumptions of the model for a moment, suppose that a well-intentioned financial planner using Matlab had applied the model to a specific client. Given the chosen asset allocation, if the planner had used a batch of 500 runs, she might by chance have obtained the lowest result. In this case, she could have reasonably told her client that there is only a sixty-seven percent probability of success if withdrawing "real" \$70,000 per year. Consequently, she could have tested a sharply lower withdrawal rate, for example \$50,000.

If, as it appears from the Jarque-Bera tests, the distribution of the results is normal, then about ninety-five percent of the answers (of running 500 runs once) will be within plus or minus two standard deviations from the mean. For the given assumptions (including a \$70,000 withdrawal, as well as allocation, age, etc.) this implies 73.54 [+ or -] 2 x [square root of (4.13)], that is the interval [69.48 77.60], which is a large enough interval to cause planning errors.

Of course, if the outcome had been the highest one, she could have observed a seventy-nine percent chance--a dramatically different number, suggesting that a slightly lower withdrawal rate (for example, \$65,000) might have been affordable. However, running the simulations with \$65,000 might have given a puzzling result (e.g., a sixtysix percent success rate), because of the relatively high variance of the simulation with only 500 runs.

If the planner is more patient and runs 5,000 replications, her answers are likely to be between seventy-one and seventy-six percent--a much narrower range. If again we suppose that the distribution of the results is normal, then about ninety-five percent of the answers (of running 5,000 runs once) will be within 73.52 [+ or -] 2 x [square root of (0.80)], that is, the interval [71.73 75.31].

What Precision Level?

As already mentioned in the text, because of normality we know that about ninety-five percent of the probability is in the range of plus or minus two standard deviations from the mean (see the examples above as well as the following one). The user of the software must therefore decide whether the precision is sufficient. For example, one could decide that a standard deviation of two percent is acceptable, because it leads to estimates that in about ninety-five cases out of one hundred are within plus or minus four percent of the true value.

While plus or minus four or five percent can be a reasonably acceptable rule in general, the planner can decide for stricter or more relaxed criteria. The fundamental thing is that the standard deviation of the error (as well as possible biases in the mean) should be clearly disclosed by the author of the software.

One final caveat: the number of simulations is not necessarily a good proxy for precision. Even if in general a larger batch size increases precision, one must consider that there are numerous techniques for variance reduction, which increase precision without requiring a larger batch size. This is why, instead of focusing too much on the batch size, one should concentrate on percentiles, mean, and variance of the estimate.

Moreover, since users of VaR software focus on the left tail of the distribution, they must pay particular attention to the quality of the random number generator. Quasi Monte Carlo techniques, such as those in Li (2000) and Siegl and Tichy (2000), overcome the problems of the random number generator by using well-spaced grids of points, which are particularly effective at mapping the boundary of the problem even with a relatively small batch size. (7)

Conclusion

The practical implications of this paper can be summarized in the following points:

* When using Monte Carlo software, one should know about the number of simulations, the convergence rates, and error estimates. Since the problem is clearly important, one should not be satisfied with generic statements.

* Calculating Monte Carlo simulations using a spreadsheet may not be very accurate in solving probabilistic problems such as those of retirement planning, since the batch size is limited by the worksheet. These models should be very carefully tested.

* Convergence speed is not the same for all problems. Doubling the batch size does not necessarily cut the variance of the result in half. Therefore, it is necessary that, when testing a problem, one try a number of scenarios (i.e., sets of parameters such as age and spending amount) and of batches (i.e., repeating the same simulation over and over with different realizations of the pseudo-random numbers). This is clearly time consuming, but it is worth doing if Monte Carlo is the chosen solution method. In case of purchased software, the purchaser should ask the producer for details about variance as well as batch size of the solution.

* The quality of the random number generator (period, correspondence with the requested distribution, serial correlation) should be investigated. If repeating the simulation, one obtains exactly the same result, it is likely that the program uses the same series of random numbers over and over, thus making testing for accuracy quite difficult.

* One should look for analytical solutions and for reasonable approximations when they exist. These are often faster and can be more reliable than simulations. Moreover, and contrary to popular belief, Monte Carlo is not the only simulation method available, and Quasi Monte Carlo methods have been experimented for several financial problems.

The methodology used in this paper can be easily adapted to test different solution approaches to financial problems. For example, suppose a planner is evaluating new software that quickly calculates solutions for a specific financial planning problem. In this case, he can calculate Monte Carlo simulations and study the distribution of their results for that problem, and then compare the Monte Carlo results to those obtained using the faster method (be it analytical or numerical). The advantage is that, if the new software package is found to give reliable solutions, it can be used by the planner to obtain correct solutions in a shorter amount of time than if using Monte Carlo.

Overlooking the problems and suggestions given in this paper is likely to lead to incorrect calculation and serious mistakes, with consequent reputation and legal risks. It is important to choose software and repetitions appropriate to the problem being investigated.

[FIGURE 1 OMITTED]

[FIGURE 2 OMITTED]
```TABLE 1

MODEL 1 -- EXCEL SIMULATION OUTCOMES

Statistic Value

Mean 52.26%
Variance 12.65%
Std. Dev 3.56%
Min 46.40%
Max 59.60%
Median 52.00%
Skewness 0.28
Kurtosis -0.79
TABLE 2

MODEL 1 -- MATLAB SIMULATION OUTCOMES

Statistic 250 Runs 500 Runs 1,000 Runs 5,000 Runs

Mean 53.58% 53.64% 53.62% 53.64%
Variance 9.6 4.67 2.33 0.51
Std. Dev. 3.1 2.16 1.53 0.74
Min 44.00% 47.40% 48.00% 51.66%
Max 65.20% 59.40% 57.30% 55.84%
Median 53.60% 53.60% 53.70% 53.60%
Skewness 0.16 -0.02 -0.21 0.03
Kurtosis 3.12 2.68 3.05 2.75
Jarque-Bera 0.32 0.32 0.17 0.47
Run Time (secs) 1,305 2,798 7,372 189,380
TABLE 3

MODEL 2--MATLAB SIMULATION OUTCOMES

Statistic 500 Runs 1,000 Runs 5,000 Runs

Mean 73.54% 73.43% 73.52%
Variance 4.13 1.92 0.80
Std. Dev. 2.03 1.38 0.89
Min 67.20% 69.60% 70.84%
Max 79.40% 77.80% 75.80%
Median 73.60% 73.40% 73.52%
Skewness -0.07 0.06 -0.01
Kurtosis 2.98 3.33 2.69
Jarque-Bera 0.43 2.36 2.08
Run time (secs) 2,470 6,715 41,514
```

(1.) The worksheet is available from the author.

(2.) To do this one can use the Cholesky factorization matrix C of the variance-covariance matrix [SIGMA], which loosely speaking is a way to calculate a "square root" of the variance-covariance matrix: [SIGMA] = C'C. That is, by multiplying the Cholesky matrix by its transpose one obtains the initial variance-covariance matrix. (This procedure is explained in many linear algebra and econometrics textbooks.) For each of the replications, one needs to multiply the matrix of the random returns (a matrix of 41 rows by 3 columns) by the transposed Cholesky matrix (in our case, a square triangular matrix of size 3). In this way, the returns not only have the correct variance, but also the correct correlation with the returns to the other assets in the same year. This is a step that should not be overlooked.

(3.) The worksheet is available from the author.

(4.) The Jarque-Bera statistic is described in Bern and Jarque (1981) as well as in many statistics and econometrics textbooks.

(5.) \$70,000 rather than \$60,000 in the "certain lifetime" is used since there is a high probablity that the investor will die in less than 41 years.

(6.) The beta distribution is similar to a Gaussian but has bounds. It is of course unrealistic as a mortality table, and it is just used for sake of argument and to point out that this paper does not provide answers to specific cases, but only focuses on the batch size issue.

(7.) The interested reader can learn about such techniques in any manual of numerical methods, including those mentioned in the introduction of this paper.

REFERENCES

Bera, A.K., and C.M. Jarque. 1981. "Efficient Tests for Normality, Heteroscedasticity, and Serial Independence of Regression Residuals: Monte Carlo Evidence." Economics Letters. Vol. 7. Pp. 313-318.

Cooley, Philip L., Carl M. Hubbard, and Daniel T. Walz. 1998. "Retirement Savings: Choosing a Withdrawal Rate That Is Sustainable." American Association of Individual Investors Journal, Vol. XX, No. 2. February, Available online at http://www.aaii.com/ promo/mstar/feature.shtml

Davis, P.J., and P. Rabinowitz. 1984. "Methods of Numerical Integration." Academic Press.

Jarrett. Jaye C., and Tom Stringfellow. 2000. "Optimum Withdrawals From an Asset Pool." Journal of Financial Planning. January. Available online at http://jjarrett.home.texas.net/resOptWD/index.htm

Li, Jenny X. 2000. "Quasi-Monte Carlo Algorithm for Pricing Options." Revista de Analisis Economico, Vol. 15, No. 1. June. Pp. 111-119.

McCullough, B.D. and H.D. Vinod. 1999. "The Numerical Reliability of Econometric Software." Journal of Economic Literature. Vol. 37, No. 2. June. Pp. 633-665.

Milevsky. Moshe Arye. 2001. "Spending Your Retirement in Monte Carlo." Retirement Planning. January-February. Pp. 21-29.

Press, William H., Saul A. Teukolsky, William T. Vetterling, and Brain P. Flannery. 1992. "Numerical Recipes in C--The Art of Scientific Computing." Cambridge University Press.

Siegl, Thomas. and Robert F. Tichy. 2000. "Ruin Theory with Risk Proportional to the Free Reserve and Securitization." Insurance: Mathematics and Economics. Vol. 26, No. 1. February. Pp. 59-73.

Zwillinger, Daniel. 1992. "Handbook of Integration." A.K. Peters.

Michele Gambera is a Senior Quantitative Analyst in the research department at Morningstar. Since joining Morningstar in June 2000, he has been actively involved in the development and testing of the mathematical and statistical models behind Morningstar's online investment advice service. He also conducts research in macroeconomics and finance. Previously he was an economist at the Federal Reserve Bank of Chicago. A graduate of the University of Trento (Italy), he received a MA and a Ph.D. in Economics from the Pennsylvania State University.

The views expressed here are those of the author and not necessarily those of Morningstar.