# Predictive aggregate claims distributions.

INTRODUCTION

The traditional inferential actuarial problem usually involves the passage from observed past data to unobserved future outcomes. It is well known that actuarial calculations, stop loss premiums, for example, are not robust to the distributional assumptions. Fitted distributions, in contrast to predictive distributions, do not incorporate parameter estimation uncertainty, or what North American actuaries call "risk parameter uncertainty". As a result, premium calculations, ruin probability calculations, surplus calculations etc., can be understated if they are based on the fitted distribution. In certain circumstances the understatement can be quite substantial. Incorporation of parameter uncertainty can make a substantial difference, even to premiums calculated by the expected value principle.

The traditional inferential actuarial problem is solved from a Bayesian point of view in a straightforward manner by finding the posterior predictive density of the unobserved random variable given the data. The Bayesian paradigm is often criticised because the choice of the prior distribution is subjective and removes the aura of objectivity from any analysis. We avoid this objection by using non-informative priors. A Bayesian uses the so-called Bayesian predictive density to forecast future observations. The approach is natural in that one bases one's predictions on the conditional distribution of the future given the past. Moreover, the Bayesian predictive distribution automatically incorporates both sources of uncertainty, namely process uncertainty and parameter estimation error.

The use of predictive distributions, in place of fitted distributions, is not new. Klugman (1992) describes the (objective) Bayesian paradigm and illustrates, inter alia, the difference between a fitted and a predictive Weibull distribution. The fitted distribution does not display as much variation as the predictive distribution. Indeed, the fitted distribution does not necessarily belong to the same family of distributions as the predictive distribution. Cairns (1995) describes the Bayesian approach as a way of incorporating parameter uncertainty in the "Modelling Process" and applies the approach to ruin theory's adjustment coefficient. Applications of predictive distributions to reinsurance have been given by Hesselager (1993) and Hurlimann (1993, 1995).

In the last decade attempts have also been made to develop non-Bayesian likelihood approaches to prediction via the concept of predictive likelihood. The concept of predictive likelihood is rather vague, reflected by the fact that there are over a dozen versions of predictive likelihood. For example, the profile predictive likelihood, first studied by Mathiasen (1979), yields the same predictive distribution for the mean of the normal distribution as the Bayesian predictive distribution with respect to a flat prior. This is because the Bayesian posterior predictive density with a flat prior can be thought of as an integrated marginal likelihood.

In the present paper we apply the Bayesian paradigm, and in certain circumstances illustrate that the profile likelihood produces the same predictive distribution. However, since the Bayesian approach is straightforward and easier to apply, we use it as a standard pragmatic device.

The outline of the paper is as follows. In the second and third sections, we present background results. In the fourth section, we compare fitted and predictive aggregate claims distributions and in the fifth section we consider reinsurance. Finally, in the penultimate section, we discuss the situation when the moments of the predictive aggregate claim amount distribution do not exist.

PRELIMINARIES

In this section predictive distributions for the exponential and Poisson distributions are provided. In broad outline we start with prior beliefs represented by proper prior distributions and take limits to obtain posterior beliefs and predictive distributions based on diffuse (or ignorant) priors. We will use these predictive distributions in our examples in subsequent sections.

Exponential Distribution with Gamma Prior

Let [X.sub.1], ..., [X.sub.n][where][Theta] be independent and identically distributed random variables with p.d.f.

f(x[where][Theta]) = [Theta] exp {-[Theta] x} for x [greater than] 0 (1)

where [Theta] [greater than] 0. Let D = ([X.sub.1], ..., [X.sub.n]) represent the data vector. When the prior distribution for [Theta] is G([Alpha], [Beta]), i.e. a gamma distribution with p.d.f.

f([Theta]) = [[Theta].sup.[Alpha]-1] [[Beta].sup.[Alpha]] [e.sup.-[Beta][Theta]]/[Gamma]([Alpha]) for [Theta] [greater than] 0

where [Alpha], [Beta] [greater than] 0, it is well known that

[Mathematical Expression Omitted]

where [similar to] is to be read "is distributed as." Now let g([Theta][where]D) be the p.d.f. of [Theta][where]D and let [X.sup.*] be a subsequent observation from the exponential distribution given by (1). Then the p.d.f. of [X.sup.*][where]D is given by

[Mathematical Expression Omitted]

so that

[Mathematical Expression Omitted].

Thus [X.sup.*][where]D has a Pareto distribution with parameters n + [Alpha] and [Mathematical Expression Omitted].

The diffuse prior is obtained by letting both [Alpha] and [Beta] go to zero in such a way that [Alpha]/[Beta] is a constant. Hence, with a diffuse prior [X.sup.*][where]D has a Pareto [Mathematical Expression Omitted] distribution.

Now, the profile predictive likelihood is given by the joint likelihood of [X.sub.1], ..., [X.sub.n] and [X.sup.*] evaluated at the maximum likelihood estimator [Mathematical Expression Omitted] based on the observations [X.sub.1], ..., [X.sub.n] and [X.sup.*]. So the profile likelihood is proportional to

[Mathematical Expression Omitted]

which is equivalent to the predictive distribution with respect to a diffuse prior.

Note that when the prior is diffuse both E([X.sup.*][where]D) and V([X.sup.*][where]D) exceed the mean and variance of an exponential distribution whose parameter is the maximum likelihood estimate of [Theta]. This may not be the case when the prior is not diffuse.
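As a rough numerical check of this remark, the following Python sketch compares the moments of the Pareto predictive distribution with those of the fitted exponential distribution. It assumes the Pareto parameterisation with survival function (d/(d + x))^a, which has mean d/(a - 1) for a > 1 and variance a d^2/((a - 1)^2 (a - 2)) for a > 2. The function names, and the use of alpha = beta = 0 to stand for the diffuse prior limit, are illustrative choices and not part of the original paper.

```python
def pareto_predictive_params(n, total, alpha=0.0, beta=0.0):
    """Return (shape, scale) of the Pareto predictive distribution of X*|D.

    n is the number of observed claims and total is their sum;
    (alpha, beta) are the gamma prior parameters, with alpha = beta = 0
    used here to represent the diffuse prior limit.
    """
    return n + alpha, beta + total


def pareto_moments(shape, scale):
    """Mean and variance of a Pareto with survival (scale/(scale+x))**shape."""
    if shape <= 2:
        raise ValueError("the variance requires shape > 2")
    mean = scale / (shape - 1)
    var = shape * scale ** 2 / ((shape - 1) ** 2 * (shape - 2))
    return mean, var


def fitted_exponential_moments(n, total):
    """Mean and variance of the exponential fitted by maximum likelihood."""
    mean = total / n  # reciprocal of the MLE of the rate parameter
    return mean, mean ** 2


# Illustrative figures: 106 claims summing to 104.81.
shape, scale = pareto_predictive_params(106, 104.81)
pred_mean, pred_var = pareto_moments(shape, scale)
fit_mean, fit_var = fitted_exponential_moments(106, 104.81)
print(pred_mean > fit_mean, pred_var > fit_var)
```

With a diffuse prior the predictive mean is the claim total divided by n - 1 rather than n, which is why it always exceeds the fitted mean.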

Poisson Distribution with Gamma Prior

Let N[where][Lambda] [similar to] Poisson([Lambda]) and let the prior distribution for [Lambda] be G([Alpha], [Beta]). Then the unconditional distribution of N is negative binomial NB([Alpha], [Beta]/(1 + [Beta])) and the posterior distribution for [Lambda][where]N is G([Alpha]+N, 1 + [Beta]). It therefore follows that if [N.sup.*] is a subsequent observation from the Poisson distribution then the predictive distribution for [N.sup.*][where]N is NB([Alpha] + N, (1 + [Beta])/(2 + [Beta])).

The diffuse prior then leads to a predictive distribution that is NB(N,1/2). Hence the predictive distribution has the same mean as the fitted Poisson(N) distribution, but has twice the variance of the fitted distribution.
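The predictive parameters and moments above can be sketched in a few lines of Python. The sketch assumes the NB(k, p) parameterisation with mean k(1 - p)/p and variance k(1 - p)/p^2, which is the one consistent with the statement that NB(N, 1/2) has mean N and variance 2N. The function names are hypothetical.

```python
def nb_predictive(n_obs, alpha=0.0, beta=0.0):
    """Parameters (k, p) of the negative binomial predictive for N*|N.

    alpha = beta = 0 stands for the diffuse prior limit.
    """
    k = alpha + n_obs
    p = (1.0 + beta) / (2.0 + beta)
    return k, p


def nb_moments(k, p):
    """Mean and variance of NB(k, p) in the parameterisation assumed above."""
    mean = k * (1.0 - p) / p
    var = k * (1.0 - p) / p ** 2
    return mean, var


# Diffuse prior with N = 106 observed claims: same mean as Poisson(106),
# twice the variance.
print(nb_moments(*nb_predictive(106)))
```

Passing non-zero values of alpha and beta gives the predictive distribution under a proper gamma prior, whose mean need not equal the observed N.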

In this particular case, the mean square error (MSE) of prediction using classical theory is

E[([N.sup.*] - N).sup.2] = E[([N.sup.*] - [Lambda] + [Lambda] - N).sup.2]

= E[([N.sup.*] - [Lambda]).sup.2] + E[(N - [Lambda]).sup.2]

= [Lambda] + [Lambda]

= 2[Lambda].

Hence, an estimate of the MSE is given by [Mathematical Expression Omitted] which is the same as the predictive variance of [N.sup.*] under the Bayesian paradigm with a diffuse prior. Accordingly, the estimate of the MSE of prediction using the classical approach is equivalent to the predictive variance using the objective Bayesian approach. However, the profile predictive distribution is very different from the Bayesian one, i.e. NB(N,1/2).

Normal Distribution with Normal Prior for Mean and Known Variance

Let [Mathematical Expression Omitted] with [[Sigma].sup.2] known and let the prior distribution for [Mu] be [Mathematical Expression Omitted]. Let D = ([X.sub.1], ..., [X.sub.n]) again represent the data vector. It is well known (see, for example, Lee 1989) that

[Mathematical Expression Omitted]

where [Mathematical Expression Omitted] is given by

[Mathematical Expression Omitted]

and the posterior variance or mean square error (MSE) is given by

[Mathematical Expression Omitted]

where Z (the credibility factor) gives the relative precision of the two sources of information, i.e.

[Mathematical Expression Omitted].

Let us assume for the remainder of this section that the prior is diffuse, i.e. that [Mathematical Expression Omitted]. Then we find that [Mathematical Expression Omitted] and [Mathematical Expression Omitted].

Now let [X.sup.*] be a subsequent observation from the N([Mu], [[Sigma].sup.2]) distribution. Then it is straightforward to show that the distribution of [Mathematical Expression Omitted]. Hence the predictive distribution contains the two sources of uncertainty or variability, viz., [Mathematical Expression Omitted] and [[Sigma].sup.2].

Now let [Y.sup.*] = exp([X.sup.*]). Then the distribution of [Y.sup.*][where]D is lognormal with parameters [Mathematical Expression Omitted] and ([[Sigma].sup.2]/n) + [[Sigma].sup.2], and so

[Mathematical Expression Omitted]

and

[Mathematical Expression Omitted].

Note that both E([Y.sup.*][where]D) and V([Y.sup.*][where]D) exceed the mean and variance of a lognormal distribution whose parameters are the maximum likelihood estimates of [Mu] and [[Sigma].sup.2]. The predictive mean incorporates the component exp([[Sigma].sup.2]/2n) where [[Sigma].sup.2]/n is the variance of the sample mean.
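To make the comparison concrete, here is a minimal Python sketch of the lognormal moments involved. It assumes, as stated above, that the predictive distribution of [X.sup.*] given D under the diffuse prior is normal with mean equal to the sample mean and variance [[Sigma].sup.2] + [[Sigma].sup.2]/n, and it uses the standard lognormal moment formulae. The function names and the numerical inputs are hypothetical.

```python
import math


def lognormal_moments(mu, sigma2):
    """Mean and variance of exp(X) when X is N(mu, sigma2)."""
    mean = math.exp(mu + sigma2 / 2.0)
    var = (math.exp(sigma2) - 1.0) * math.exp(2.0 * mu + sigma2)
    return mean, var


def fitted_lognormal_moments(xbar, sigma2):
    """Moments when the sample mean is plugged in for mu (sigma2 known)."""
    return lognormal_moments(xbar, sigma2)


def predictive_lognormal_moments(xbar, sigma2, n):
    """Moments of Y*|D: the log-scale variance is inflated by sigma2/n."""
    return lognormal_moments(xbar, sigma2 + sigma2 / n)


# Hypothetical inputs: sample mean 0, known variance 1, n = 100.
fit_mean, fit_var = fitted_lognormal_moments(0.0, 1.0)
pred_mean, pred_var = predictive_lognormal_moments(0.0, 1.0, 100)
print(pred_mean / fit_mean, math.exp(1.0 / 200))  # the two should agree
```

The ratio of the predictive mean to the fitted mean is exactly the factor exp([[Sigma].sup.2]/2n) mentioned in the text.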

Normal Distribution with Gamma Prior for Unknown Variance

Once more let [Mathematical Expression Omitted]. If [[Sigma].sup.2] is unknown the diffuse prior is given by

p([[Sigma].sup.2]) [varies] 1/[[Sigma].sup.2]

where [varies] means "is proportional to." Note that the prior is an improper distribution. However, both the posterior and predictive distributions are proper. We know (see, for example, Lee (1989)) that

[Mathematical Expression Omitted]

[Mathematical Expression Omitted]

and

[Mathematical Expression Omitted] (2)

where

[Mathematical Expression Omitted],

G([Alpha], [Beta]) denotes a gamma distribution with mean [Alpha]/[Beta] and t(v, r, s) denotes a t-distribution which is defined in Appendix 1. If we let s = [(S/(n-1)).sup.1/2] then an alternative way of writing (2) is

[Mathematical Expression Omitted].

The classical statistics (or sampling theory statistics) approach leads to a similar conclusion. In this case

[Mathematical Expression Omitted].

However, in the classical approach it is the data that are regarded as random and the parameters [Mu] and [[Sigma].sup.2] are fixed. By contrast, the objective Bayesian approach regards [Mu] and [[Sigma].sup.2] as random and the available data as being fixed.

If [X.sup.*] denotes a subsequent observation from the N([Mu], [[Sigma].sup.2]) distribution, the predictive distribution for [X.sup.*][where]D is

[Mathematical Expression Omitted].

This result is derived in Appendix 2. From results given in Appendix 1 it follows that [Mathematical Expression Omitted] and V([X.sup.*][where]D) = (n+1)S/(n(n - 3)). From the classical statistics standpoint, the distribution of

[Mathematical Expression Omitted]

is also t(n - 1,0,1) and is equivalent to the profile predictive likelihood. (Here we use S/(n - 3) as the estimator of [[Sigma].sup.2].)

If we again define [Y.sup.*] by [Y.sup.*] = exp([X.sup.*]), then the predictive distribution for [Y.sup.*][where]D is the distribution of exp([X.sup.*])[where]D.
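The following Python sketch summarises the predictive distribution of [X.sup.*] given D from a sample of log claim amounts. Because the expression itself is omitted above, the sketch assumes the standard result that the predictive distribution is a t-distribution with n - 1 degrees of freedom, located at the sample mean, with squared scale (1 + 1/n)S/(n - 1). This assumption reproduces the variance (n+1)S/(n(n - 3)) quoted in the text. The function name and the data are hypothetical.

```python
import math


def predictive_t_summary(xs):
    """Degrees of freedom, location, scale and variance of X*|D.

    xs holds the observations X_1, ..., X_n (for example log claim amounts).
    """
    n = len(xs)
    if n <= 3:
        raise ValueError("the predictive variance requires n > 3")
    xbar = sum(xs) / n
    S = sum((x - xbar) ** 2 for x in xs)   # sum of squared deviations
    nu = n - 1                             # degrees of freedom
    scale2 = (n + 1) * S / (n * nu)        # (1 + 1/n) * S/(n - 1)
    variance = nu / (nu - 2) * scale2      # equals (n + 1) S / (n (n - 3))
    return {"df": nu, "location": xbar,
            "scale": math.sqrt(scale2), "variance": variance}


print(predictive_t_summary([1.0, 2.0, 3.0, 4.0, 5.0]))
```

Since [Y.sup.*] = exp([X.sup.*]) is the exponential of a t-distributed quantity, no analogous moment calculation is available on the claim amount scale.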

PREDICTIVE AGGREGATE CLAIMS DISTRIBUTIONS

The traditional risk model for aggregate claims is as follows:

S = [Y.sub.1] + ... + [Y.sub.N]

where S represents the aggregate claim amount in a fixed time period (typically one year), N represents the number of claims occurring in that period, and [Y.sub.1], [Y.sub.2], ... represent the amounts of successive claims. We assume that [Mathematical Expression Omitted] is a sequence of i.i.d. random variables and that N is independent of this sequence.

In the last fifteen years much attention has been devoted to computing the aggregate claims distribution given the distributions of N and [Y.sub.i]. Examples of recursive algorithms to calculate the distribution of S are given by Panjer (1981), Schroter (1991) and Sundt (1992).

The approach in practice is to fit the distributions of N and [Y.sub.i] to data and, if the distribution of N belongs to the appropriate class, to apply the fitted distributions as inputs into these algorithms. There is a major drawback to this approach as the distributions do not incorporate parameter estimation error. The distributions are assumed to be known with certainty.

Consider the following statistical setting:

Data for one time period: D = {N; [Y.sub.1], ..., [Y.sub.N]}

"Future" observations for the next period are [N.sup.*]; [[Y.sub.1].sup.*], ..., [[Y.sup.*].sub.[N.sup.*]].

We are interested in computing the predictive distribution of

[S.sup.*] = [[Y.sub.1].sup.*] + ... + [[Y.sup.*].sub.[N.sup.*]]

(conditional on the data D).

We will compare this with the fitted aggregate claims distribution defined by

[Mathematical Expression Omitted]

where [Mathematical Expression Omitted] has the fitted distribution of N based on the data D, and [Mathematical Expression Omitted] has the fitted distribution of [Y.sub.i].

We will assume throughout that the claim count N[where][Lambda] has a Poisson distribution with mean [Lambda]. Accordingly, if [Mathematical Expression Omitted] is the fitted value of [Lambda], the mean and variance of the fitted and predictive aggregate claims are

[Mathematical Expression Omitted]

[Mathematical Expression Omitted]

for the fitted distribution and

E([S.sup.*][where]D) = E([N.sup.*][where]D)E([[Y.sub.i].sup.*][where]D) = NE ([[Y.sub.i].sup.*][where]D)

V([S.sup.*][where]D) = V([N.sup.*][where]D)[E.sup.2]([[Y.sub.i].sup.*][where]D) + E([N.sup.*][where]D)V([[Y.sub.i].sup.*][where]D)

for the predictive distribution.

In the special case when we have a diffuse prior for the Poisson parameter and the distribution of the individual claim amounts is assumed to be known we have

[Mathematical Expression Omitted] and

V([S.sup.*][where]D) = 2N[E.sup.2]([Y.sub.i]) + NV([Y.sub.i]) = NE([[Y.sub.i].sup.2]) + N[E.sup.2]([Y.sub.i])

Therefore,

[Mathematical Expression Omitted].

Thus, uncertainty in the value of the Poisson parameter results in equal means but greater variability in the predictive distribution. We will see in our examples in the next section that this is not true when the parameters of the individual claim amount distribution are unknown, or when the prior for the Poisson parameter is not diffuse.
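A short Python sketch of the moment formulae above, for the special case of a known claim amount distribution and a diffuse prior on the Poisson parameter. It uses only the identities E(S) = E(N)E(Y) and V(S) = V(N)[E.sup.2](Y) + E(N)V(Y) given in this section; the function names are hypothetical.

```python
def aggregate_moments(mean_n, var_n, mean_y, var_y):
    """Mean and variance of S = Y_1 + ... + Y_N with N independent of the Y_i."""
    mean_s = mean_n * mean_y
    var_s = var_n * mean_y ** 2 + mean_n * var_y
    return mean_s, var_s


def fitted_vs_predictive(n_obs, mean_y, var_y):
    """Fitted uses Poisson(N); predictive uses NB(N, 1/2) (diffuse prior)."""
    fitted = aggregate_moments(n_obs, n_obs, mean_y, var_y)
    predictive = aggregate_moments(n_obs, 2 * n_obs, mean_y, var_y)
    return fitted, predictive


# Exponential claims with mean 1 (variance 1) and N = 106 observed claims.
fitted, predictive = fitted_vs_predictive(106, 1.0, 1.0)
print(fitted, predictive)
```

For exponential claims the predictive variance is one and a half times the fitted variance, since E([Y.sup.2]) = 2[E.sup.2](Y).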

MOMENTS AND PERCENTILES OF AGGREGATE CLAIMS DISTRIBUTIONS

In this section we will consider moments and percentiles of aggregate claims distributions. We will compare values for fitted and predictive distributions under three scenarios for the claim number distribution. Recall that our basic model is N[where][Lambda] has a Poisson([Lambda]) distribution. Having observed the value of N, we will use the following three distributions for the claim number distribution in the next year:

(i) Poisson[Mathematical Expression Omitted], where [Mathematical Expression Omitted] is the maximum likelihood estimate of [Lambda] (which is just equal to the observed value of N).

(ii) Negative binomial with parameters [Alpha]+N and ([Beta]+1)/([Beta]+2). This is the predictive distribution resulting from a G([Alpha], [Beta]) prior for [Lambda].

(iii) Negative binomial with parameters N and 1/2. This is the predictive distribution for N resulting from a diffuse prior for [Lambda].

Hence case (i) results in a fitted aggregate claims distribution and cases (ii) and (iii) in predictive aggregate claims distributions. Note that all subsequent results are conditional on the observed data D.

Known Claim Amount Distribution

Let us first assume that the parameter of the exponential distribution is known and that the distribution has mean 1. Thus, we are initially considering the effect of uncertainty about the claim number distribution on the aggregate claims distribution.

Example 1. Let the observed value of N be 106. (This was in fact obtained as a simulation from the Poisson(100) distribution.) Let the parameters of the prior distribution in (ii) be [Alpha] = 4 and [Beta] = 0.04 so that the prior distribution has mean 100 and standard deviation 50. Table 1 shows the mean, variance and coefficient of skewness of the aggregate claims distributions for each of the three cases. Formulae for calculating these quantities can be found in Panjer and Willmot (1992). The pattern of figures in Table 1 is much as expected. As the variability of the counting distribution increases from case (i) through to case (iii) the variance and skewness of the aggregate claims distribution both increase. The mean for case (ii) is less than for the other two cases. Thus, parameter variability in the counting distribution alone impacts on all insurance calculations, such as setting premiums or surplus requirements, which hinge on the moments of the aggregate claims distribution.

Table 2 shows percentiles of the aggregate claims distribution in each case. These distributions were calculated according to Panjer's (1981) recursion formula and the exponential distribution was discretised on intervals of 0.05 using the method of de Vylder and Goovaerts (1988). We will use this discretisation method in all but one of our examples. Percentiles are denoted [C.sub.x] and the tabulated values show the least value of z such that the probability that the aggregate claim amount is no more than z is at least x. It is a consequence of our discretisation method that the percentiles are integer multiples of 0.05.
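The calculation described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' implementation: it uses a simple rounding discretisation in place of the de Vylder and Goovaerts (1988) method, it writes the Poisson and negative binomial distributions in the (a, b, 0) form required by Panjer's recursion, and all function names are hypothetical.

```python
import math


def discretise_exponential(mean, h, m):
    """Rounding method: mass at j*h is F(j*h + h/2) - F(j*h - h/2)."""
    def cdf(x):
        return 1.0 - math.exp(-x / mean) if x > 0 else 0.0
    return [cdf((j + 0.5) * h) - cdf((j - 0.5) * h) for j in range(m + 1)]


def panjer(a, b, g0, f, m):
    """Compound pmf g[0..m] when the claim count satisfies p_n = (a + b/n) p_{n-1}."""
    g = [g0]
    for s in range(1, m + 1):
        total = 0.0
        for j in range(1, min(s, len(f) - 1) + 1):
            total += (a + b * j / s) * f[j] * g[s - j]
        g.append(total / (1.0 - a * f[0]))
    return g


def compound_poisson(lam, f, m):
    """Poisson(lam) counts: a = 0, b = lam."""
    return panjer(0.0, lam, math.exp(-lam * (1.0 - f[0])), f, m)


def compound_negbin(k, p, f, m):
    """NB(k, p) counts: a = 1 - p, b = (k - 1)(1 - p)."""
    q = 1.0 - p
    g0 = (p / (1.0 - q * f[0])) ** k
    return panjer(q, (k - 1.0) * q, g0, f, m)


def percentile(g, h, x):
    """Least z = j*h with Pr(S <= z) >= x, or None if not reached on the grid."""
    cum = 0.0
    for j, prob in enumerate(g):
        cum += prob
        if cum >= x:
            return j * h
    return None


if __name__ == "__main__":
    # Coarse grid (h = 0.5) to keep the pure-Python double loop quick.
    h, m = 0.5, 400
    f = discretise_exponential(1.0, h, m)
    fitted = compound_poisson(106.0, f, m)          # case (i)
    predictive = compound_negbin(106.0, 0.5, f, m)  # case (iii)
    for x in (0.90, 0.95, 0.99):
        print(x, percentile(fitted, h, x), percentile(predictive, h, x))
```

Because of the coarser grid and the different discretisation, the percentiles this sketch prints should not be expected to match the tabulated values exactly.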

The figures in Table 2 confirm the findings from Table 1. As variability in the claim number distribution increases, percentiles of the aggregate claims distribution increase.

As a simple illustration of the effect of parameter uncertainty, consider the following problem. Suppose that the insurer calculates the premium, P, for a risk by the expected value principle using a premium loading factor of 10%, and wants to find the surplus U such that

Pr (U + P [greater than] S) = [Pi]

where S denotes aggregate claims from the risk. Table 3 shows values of U for different values of [Pi] when the distribution of S is given by each of cases (i), (ii) and (iii).

Table 3 shows that by applying a fitted distribution instead of a predictive distribution, the insurer can set up a surplus level that is quite inadequate - the worst case in Table 3 shows a surplus under the fitted distribution that is about 2/3rds of that required using the predictive distribution with a diffuse prior. This is a substantial margin, bearing in mind that the individual claim amount distribution is assumed known in this example!
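The surplus calculation can be sketched as follows for a discretised aggregate claims distribution. The sketch assumes the distribution is supplied as a probability function on the lattice 0, h, 2h, ..., and it applies the percentile convention used for Table 2 (the least lattice point at which the distribution function reaches [Pi]). The function name and the toy distribution are hypothetical.

```python
def required_surplus(g, h, loading, pi):
    """Surplus U such that Pr(S <= U + P) >= pi, with P = (1 + loading) E(S).

    g is a probability function on 0, h, 2h, ...; E(S) is computed from g.
    """
    mean = sum(j * h * prob for j, prob in enumerate(g))
    premium = (1.0 + loading) * mean  # expected value principle
    cum = 0.0
    for j, prob in enumerate(g):
        cum += prob
        if cum >= pi:
            return j * h - premium
    raise ValueError("pi is not reached on the supplied grid")


# Toy distribution, uniform on {0, 1, 2, 3}, with a 10% loading.
print(required_surplus([0.25, 0.25, 0.25, 0.25], 1.0, 0.10, 0.75))
```

Applying the same function to a fitted and to a predictive aggregate distribution gives the kind of comparison reported in Table 3.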

Unknown Claim Amount Distribution

Let us now assume that the parameter of the exponential individual claim amount distribution is unknown. In the following example we have used the same number of claims as in Example 1, then simulated this number of observations from an exponential distribution with mean 1.

Example 2. The maximum likelihood estimate of the parameter of the exponential distribution based on 106 simulated individual claim amounts is [Mathematical Expression Omitted]. We will use the same three counting distributions as before. The individual claim amount distributions will be:

(a) Exponential with mean 0.9888 for case (i), i.e. the fitted exponential distribution.

(b) Pareto(110,108.81) for case (ii), i.e. the predictive distribution based on a gamma prior for the exponential parameter with mean 1 and variance 0.25.

(c) Pareto(106,104.81) for case (iii), i.e. the predictive distribution based on the diffuse prior.

Tables 4 and 5 show the same quantities as Tables 1 and 2 respectively.

[TABULAR DATA FOR TABLE 5 OMITTED]

Comparing Tables 1 and 4 we see that both the variance and skewness of each aggregate claims distribution are slightly increased when we introduce parameter uncertainty to the individual claim amount distribution. However, these increases are not of a huge magnitude. Comparing Tables 2 and 5, we see that the same pattern is present in each table. The slightly smaller values in Table 5 simply reflect the lower means in Table 4. Figure 1 shows the three aggregate claims distributions.

These tables suggest that uncertainty in the claim number distribution is of much greater significance than uncertainty in the individual claim amount distribution. This is confirmed in Example 3 where we have considered a larger portfolio. This is not particularly surprising. In each case the coefficient of variation of the estimate of the parameter [Lambda] of the claim number distribution is very much greater than that of the estimate of the parameter [Theta] of the individual claim amount distribution.

Example 3. The maximum likelihood estimate of the parameter of the exponential distribution based on 515 simulated individual claim amounts is [Mathematical Expression Omitted]. We will consider the three cases used in Example 2. For case (ii) we have adopted the same prior distribution for the exponential parameter. The prior for the Poisson parameter has mean 500 and variance 2500. Table 6 shows percentiles of the fitted and predictive aggregate claims distributions. In addition we have shown percentiles when the individual claim amount distribution is assumed to be known (and has mean 1). These cases are denoted by K in the table.

This table shows that apart from a small change in location, the use of fitted and predictive individual claim amount distributions has little impact on the percentiles of the aggregate claims distribution calculated with the known individual claim amount distribution.

REINSURANCE PREMIUMS

In this section we make a comparison between the pure premiums for excess of loss reinsurance and for stop loss reinsurance for the three cases described in Example 2.

Excess of Loss Reinsurance

Let [S.sub.R] (M) denote the reinsurer's aggregate claim amount under an excess of loss reinsurance arrangement with retention level M. Figure 2 shows the pure excess of loss premium, E([S.sub.R] (M)), as a function of the retention level for each of the three cases. These functions were calculated from the following formulae. For case (i)

E([S.sub.R](M)) = ([Lambda]/[Theta]) exp{-[Theta]M}

where [Lambda] is the parameter of the fitted Poisson distribution and [Theta] is the parameter of the fitted exponential distribution. For cases (ii) and (iii)

E([S.sub.R](M)) = (k(1 - p)/p) [([Delta]/([Delta] + M)).sup.[Alpha]] ([Delta] + M)/([Alpha] - 1)

where k and p are the parameters of the predictive negative binomial claim number distribution and [Alpha] and [Delta] are the parameters of the predictive Pareto individual claim amount distribution. We can see that for some values of M there is not a great deal of difference between the pure premiums. However, the difference can be significant. For example, when M = 2 the value of E([S.sub.R](M)) under case (iii) is about 5% greater than under case (i). Figure 3 shows that there are much greater differences between the variances of the aggregate claims distributions for the reinsurer. Again considering the case M = 2, the variance under case (iii) is 16% greater than under case (i). The formulae underlying Figure 3 are

V ([S.sub.R] (M)) = (2[Lambda] / [[Theta].sup.2]) exp{-[Theta]M}

for case (i) and

[Mathematical Expression Omitted]

for cases (ii) and (iii).
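The formulae above can be evaluated with the following Python sketch. The mean and the case (i) variance follow the expressions given in the text. The variance for cases (ii) and (iii) is omitted in this copy, so the sketch reconstructs it from the standard identity V(S) = E(N)V(Z) + V(N)[E.sup.2](Z), where Z is the reinsurer's share of a single claim, with the Pareto survival function assumed to be ([Delta]/([Delta] + x)) raised to the power [Alpha]. That reconstruction, the function names, and the use of 104.81 as the claim total are assumptions made for illustration.

```python
import math


def xl_fitted(lam, theta, M):
    """Case (i): Poisson(lam) counts, exponential claims with rate theta."""
    tail = math.exp(-theta * M)
    mean = lam / theta * tail
    var = 2.0 * lam / theta ** 2 * tail
    return mean, var


def xl_predictive(k, p, alpha, delta, M):
    """Cases (ii)/(iii): NB(k, p) counts, Pareto(alpha, delta) claims."""
    mean_n = k * (1.0 - p) / p
    var_n = k * (1.0 - p) / p ** 2
    tail = (delta / (delta + M)) ** alpha
    ez = tail * (delta + M) / (alpha - 1.0)  # E[(Y - M)+]
    # E[((Y - M)+)^2], assumed form; requires alpha > 2
    ez2 = 2.0 * tail * (delta + M) ** 2 / ((alpha - 1.0) * (alpha - 2.0))
    mean = mean_n * ez
    var = mean_n * (ez2 - ez ** 2) + var_n * ez ** 2
    return mean, var


# Retention M = 2 with 106 claims totalling 104.81.
m1, v1 = xl_fitted(106.0, 106.0 / 104.81, 2.0)                # case (i)
m3, v3 = xl_predictive(106.0, 0.5, 106.0, 104.81, 2.0)        # case (iii)
print(m3 / m1, v3 / v1)
```

The two printed ratios can be compared with the differences of about 5% and 16% quoted in the text for M = 2.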

In both Figures 2 and 3 the values under case (ii) are very close to those under case (iii). Our experiments with other parameter values for the prior distribution for [Theta] in case (ii) indicate that the functions E([S.sub.R](M)) and V([S.sub.R](M)) are not particularly sensitive to the parameters of this prior distribution. Thus, the main reason for differences in values between the fitted and predictive aggregate claims distributions is the difference in the claim number distributions.

Stop Loss Reinsurance

Figure 4 shows the pure stop loss premiums as a function of the retention level, denoted d, for each of the three cases. These have been calculated recursively from the discrete aggregate claim amount distribution. (See, for example, Bowers et al 1986.) This figure shows that the premium calculated from the fitted aggregate claims distribution always understates that calculated from the predictive distribution. For example, when d = 120, the premium calculated from the fitted distribution is about 50% of that calculated from the predictive distribution of case (iii).
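A minimal Python sketch of a recursive stop loss calculation of this kind is given below. It assumes the aggregate claims distribution is a probability function on the lattice 0, h, 2h, ... that sums to (approximately) one, and it uses the identity E[(S - (d + h))+] = E[(S - d)+] - h(1 - F(d)) for lattice points d. The function name is hypothetical.

```python
def stop_loss_premiums(g, h):
    """Pure stop loss premiums E[(S - d)+] for d = 0, h, 2h, ...

    g is a probability function on the lattice 0, h, 2h, ...; it is
    assumed to sum to one, so any mass beyond the grid is ignored.
    """
    mean = sum(j * h * prob for j, prob in enumerate(g))
    premiums = [mean]  # d = 0: the premium is E(S)
    cdf = 0.0
    for prob in g[:-1]:
        cdf += prob    # F(d) at the current retention
        premiums.append(premiums[-1] - h * (1.0 - cdf))
    return premiums


# Toy distribution, uniform on {0, 1, 2, 3}.
print(stop_loss_premiums([0.25, 0.25, 0.25, 0.25], 1.0))
```

Feeding in a fitted and a predictive aggregate distribution on the same lattice gives two premium curves that can be compared retention by retention.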

DISTRIBUTIONS WHOSE MOMENTS DO NOT EXIST

In the second section we noted that when the model for individual claim amounts is lognormal with parameters [Mu] and [Sigma], with a diffuse prior, the predictive distribution is that of exp([X.sup.*])[where]D where [Mathematical Expression Omitted]. An immediate problem that arises with applying this predictive distribution to insurance problems is that its moments do not exist. Klugman (1992) notes this problem in relation to a Weibull distribution. It is possible to calculate the aggregate claims distribution with this predictive individual claim amount distribution, but it does not seem to be a very useful model, especially as insurance claim amounts are finite in practice. In this section we consider two pragmatic approaches to the problem of predictive individual claim amount distributions whose moments do not exist. Each approach approximates the predictive distribution by a distribution whose moments exist. The first is specific to the lognormal model, the second is more generally applicable.

Our first approach is to approximate the [Mathematical Expression Omitted] distribution by a [Mathematical Expression Omitted] distribution. This is a well-known approximation, and as the value of n increases, the quality of the approximation improves. An immediate consequence of this approximation is that the predictive individual claim amount distribution is lognormal and hence the moments of the predictive distribution exist. If we fit a lognormal distribution to data, then the maximum likelihood estimates of the parameters are [Mathematical Expression Omitted]. Hence the moments of our (approximate) predictive lognormal distribution exceed those of the fitted lognormal distribution.

The second, and more general, approach is to assume that there is a fixed amount, say w, which is the maximum possible claim. Thus, if [Y.sup.*][where]D = exp([X.sup.*])[where]D has distribution function F(x) we will approximate this distribution over the interval (0, w) by F(x)/F(w). There is of course an element of subjectivity in this approach, namely the choice of w. However, the advantage of this approach is that once again all the moments of the individual claim amount distribution exist. We refer to this distribution below as the truncated predictive distribution.
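The truncation step can be sketched in Python as follows. The sketch takes any distribution function F, returns the truncated version F(x)/F(w), and discretises it by rounding onto a lattice of span h. Because the Python standard library has no t-distribution function, a lognormal distribution function is used here purely as a stand-in for the true predictive F; that stand-in, the function names and the parameter values are assumptions for illustration only.

```python
import math


def lognormal_cdf(x, mu, sigma):
    """Stand-in distribution function (NOT the true predictive F)."""
    if x <= 0:
        return 0.0
    z = (math.log(x) - mu) / (sigma * math.sqrt(2.0))
    return 0.5 * (1.0 + math.erf(z))


def truncated_cdf(cdf, w):
    """Return the function x -> F(x)/F(w) on (0, w), equal to 1 for x >= w."""
    fw = cdf(w)

    def trunc(x):
        if x <= 0:
            return 0.0
        return min(cdf(x), fw) / fw

    return trunc


def discretise_by_rounding(cdf, h, w):
    """Mass at j*h is F(j*h + h/2) - F(j*h - h/2), for j*h = 0, h, ..., w."""
    m = int(round(w / h))
    return [cdf((j + 0.5) * h) - cdf((j - 0.5) * h) for j in range(m + 1)]


def base(x):
    return lognormal_cdf(x, 0.0, 1.0)


trunc = truncated_cdf(base, 300.0)
probs = discretise_by_rounding(trunc, 1.0, 300.0)
moments = [sum((j * 1.0) ** r * p for j, p in enumerate(probs)) for r in (1, 2, 3)]
print(sum(probs), moments)
```

Because the truncated distribution has bounded support, every moment of the discretised distribution is a finite sum, which is the point of the approach.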

To illustrate ideas, we consider a set of 100 observations which were simulated from a lognormal distribution with mean 1 and variance 3. These observations gave [Mathematical Expression Omitted] and S = 142.36. Table 7 below shows the first three moments of the three individual claim amount distributions. Case (i) is the fitted lognormal distribution, case (ii) is the (approximating) predictive lognormal distribution and case (iii) is the truncated predictive distribution. For case (iii) the moments are actually the moments of the discretised distribution used to calculate the aggregate claims distribution. For this case only, the individual claim amount distribution was discretised using the method of crude rounding. (See, for example, Panjer and Willmot 1992.) We assumed that w = 300. Under the true distribution of [Y.sup.*][where]D the probability of observing a claim in excess of 300 is less than [10.sup.-6].

As expected, the moments of both the approximate and the truncated predictive distributions exceed those of the fitted distribution. Table 8 shows moments of the aggregate claims distributions. For case (i) we have used a fitted Poisson(100) distribution for the claim number distribution, whereas for cases (ii) and (iii) we have used a predictive NB(100, 0.5) distribution.

Whilst the means of the three distributions are relatively close there is a considerable increase in variance and third central moment going from case (i) through to case (iii). Strangely, the coefficient of skewness is smaller in case (ii) than in case (i).

Finally, to get an idea of how appropriate these approaches are, let us consider percentiles of the aggregate claims distributions. Table 9 shows percentiles for four aggregate claims distributions. Cases (i) to (iii) represent the situations covered in Table 8. Case (iv) represents the true predictive distribution, using the true distribution of [Y.sup.*][where]D and a NB(100, 0.5) counting distribution.

The ordering of values in Table 9 is what we would expect from Table 8. Two features stand out. First, there is quite a difference between values in cases (ii) and (iv). Second, case (iii) proves to be a very good approximation to case (iv), in terms of percentiles at least.

Each of the above approaches has its advantages and disadvantages. The first approach has the advantage that the predictive individual claim amount distribution is easy to deal with, particularly if we wish to calculate moments. It also has greater variability than the fitted lognormal distribution. However, both the moments and percentiles are smaller than in case (iii). The second approach has the advantage that its percentiles provide a good match for those of the true predictive distribution. (It is largely a feature of our discretisation interval that there is exact correspondence in the figures given in Table 9.) The disadvantage is the subjective element introduced by w. Although there are major disadvantages to fitting parameters to distributions by matching percentiles, it does seem like a possible way of determining a suitable value for w. However, our experience has been that the moments of the predictive aggregate claims distribution are more sensitive to the value of w than the percentiles are. Nevertheless, we would suggest that truncated predictive distributions provide a good solution to the problem of predictive distributions whose moments do not exist.

CONCLUSIONS

The main conclusion to be drawn from the examples in the fourth section is that parameter uncertainty has a major impact on moments and percentiles of aggregate claims distributions. In particular, parameter uncertainty in the claim number distribution seems to be of more importance than in the individual claim amount distribution when the moments of the predictive individual claim amount distribution exist.

The objective Bayesian approach may lead to predictive distributions for which moments do not exist. However, we have shown in the penultimate section that given a Bayesian predictive distribution we can modify this distribution in such a way that it is suitable for insurance purposes.

APPENDIX 1

A random variable X is said to have a t-distribution with parameters [Nu], [Mu] and p, denoted t([Nu], [Mu], p), if the density of X is

[Mathematical Expression Omitted]

for -[infinity] [less than] x [less than] [infinity], where -[infinity] [less than] [Mu] [less than] [infinity], [Nu] [greater than] 0 and p [greater than] 0.

The quantity (X - [Mu])[square root of p] has a Student t-distribution with [Nu] degrees of freedom, denoted t([Nu], 0, 1) or [t.sub.[Nu]]. It is well known that if Y [similar to] [t.sub.[Nu]], then E(Y) = 0 and V(Y) = [Nu]/([Nu] - 2) provided that [Nu] [greater than] 2. It therefore follows that E(X) = [Mu] and V(X) = [Nu]/(([Nu] - 2)p) provided that [Nu] [greater than] 2.

APPENDIX 2

Suppose [Mathematical Expression Omitted] and let $\tau = 1/\sigma^2$ and [Mathematical Expression Omitted]. We have

[Mathematical Expression Omitted]

$\tau \mid D \sim G(\nu/2, S/2)$

where $\nu = n - 1$ and

[Mathematical Expression Omitted].

Suppose $X^*$ is the next observation from the $N(\mu, \sigma^2)$ distribution. Then the predictive density for $X^*$ is given by

[Mathematical Expression Omitted]

Consider the exponent in the inner integral. We have

[Mathematical Expression Omitted]

Therefore, the inner integral becomes

[Mathematical Expression Omitted]

Hence we have

[Mathematical Expression Omitted]

and so

[Mathematical Expression Omitted]
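Because the intermediate expressions are omitted in this copy, the following is only a sketch of how such an argument is usually laid out. It assumes the standard diffuse-prior results $\mu \mid \tau, D \sim N(\bar{X}, 1/(n\tau))$ and $\tau \mid D \sim G(\nu/2, S/2)$, where $\bar{X}$ (a symbol introduced here) denotes the sample mean. The grouping of constants is an assumption and is not taken from the authors' own lines.

```latex
\begin{aligned}
f(x^* \mid D)
  &= \int_0^\infty \left[ \int_{-\infty}^{\infty}
       f(x^* \mid \mu, \tau)\, g(\mu \mid \tau, D)\, d\mu \right]
       g(\tau \mid D)\, d\tau \\
  &\propto \int_0^\infty \tau^{1/2}
       \exp\left\{ -\frac{\tau}{2}\,\frac{n}{n+1}\,(x^* - \bar{X})^2 \right\}
       \tau^{\nu/2 - 1} \exp\left\{ -\frac{\tau S}{2} \right\} d\tau \\
  &\propto \left[ S + \frac{n}{n+1}\,(x^* - \bar{X})^2 \right]^{-(\nu+1)/2} \\
  &\propto \left[ 1 + \frac{(x^* - \bar{X})^2}{\nu p} \right]^{-(\nu+1)/2},
     \qquad p = \left(1 + \frac{1}{n}\right)\frac{S}{\nu} .
\end{aligned}
```

The last line is the kernel of a $t(\nu, \bar{X}, p)$ density in the notation of Appendix 1, with $\nu = n - 1$.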

REFERENCES

Bowers, Newton L., Hans U. Gerber, James C. Hickman, Donald A. Jones, and Cecil J. Nesbitt. 1986. Actuarial Mathematics. Itasca, IL: Society of Actuaries.

Cairns, Andrew J.G. 1995. Uncertainty in the Modelling Process. In Transactions of the XXV International Congress of Actuaries. Vol. 1. Leuven, Ceuterick.

De Vylder, Florian, and Marc J. Goovaerts. 1988. Recursive Calculation of Finite-Time Ruin Probabilities. Insurance, Mathematics & Economics 7(1): 1-8.

Hesselager, Ole. 1993. A Class of Conjugate Priors with Applications to Excess of Loss Reinsurance. ASTIN Bulletin 23(1): 77-95.

Hurlimann, Werner. 1993. Predictive Stop-Loss Premiums. ASTIN Bulletin 23(1): 55-76.

Hurlimann, Werner. 1995. Predictive Stop-Loss Premiums and Student's t-Distribution. Insurance, Mathematics & Economics 16(2): 151-159.

Klugman, Stuart A. 1992. Bayesian Statistics in Actuarial Science. Norwell, MA: Kluwer Academic Publishers.

Lee, Peter M. 1989. Bayesian Statistics: An Introduction. New York: Oxford University Press.

Mathiasen, Poul E. 1979. Prediction Functions. Scandinavian Journal of Statistics 6(1): 1-21.

Panjer, Harry H. 1981. Recursive Evaluation of a Family of Compound Distributions. ASTIN Bulletin 12(1): 22-26.

Panjer, Harry H., and Gordon E. Willmot. 1992. Insurance Risk Models. Schaumburg, IL: Society of Actuaries.

Schroter, Klaus J. 1991. On a Family of Counting Distributions and Recursions for Related Compound Distributions. Scandinavian Actuarial Journal (1991): 161-175.

Sundt, Bjorn. 1992. On Some Extensions of Panjer's Class of Counting Distributions. ASTIN Bulletin 22(1): 61-80.

David C. M. Dickson is an Associate Professor at the University of Melbourne. Leanna M. Tedesco is a Statistical Analyst at the ANZ Banking Group Ltd. Ben Zehnwirth is Managing Director of Insureware. The authors gratefully acknowledge financial support for this project from an Institute of Actuaries of Australia research grant and a Faculty of Economics and Commerce research grant (The University of Melbourne).

In the last decade attempts have also been made to develop non-Bayesian likelihood approaches to prediction via the concept of predictive likelihood. The concept of predictive likelihood is rather vague, reflected by the fact that there are over a dozen versions of predictive likelihood. For example, the profile predictive likelihood, first studied by Mathiasen (1979), yields the same predictive distribution for the mean of the normal distribution as the Bayesian predictive distribution with respect to a flat prior. This is because the Bayesian posterior predictive density with a flat prior can be thought of as an integrated marginal likelihood.

In the present paper we apply the Bayesian paradigm, and in certain circumstances illustrate that the profile likelihood produces the same predictive distribution. However, since the Bayesian approach is straightforward and easier to apply, we use it as a standard pragmatic device.

The outline of the paper is as follows. In the second and third sections, we present background results. In the fourth section, we compare fitted and predictive aggregate claims distributions and in the fifth section we consider reinsurance. Finally, in the penultimate section, we discuss the situation when the moments of the predictive aggregate claim amount distribution do not exist.

PRELIMINARIES

In this section predictive distributions for the exponential and Poisson distributions are provided. In broad outline we start with prior beliefs represented by proper prior distributions and take limits to obtain posterior beliefs and predictive distributions based on diffuse (or ignorant) priors. We will use these predictive distributions in our examples in subsequent sections.

Exponential Distribution with Gamma Prior

Let $X_1, \ldots, X_n \mid \theta$ be independent and identically distributed random variables with p.d.f.

$$f(x \mid \theta) = \theta \exp\{-\theta x\} \quad \text{for } x > 0 \qquad (1)$$

where $\theta > 0$. Let $D = (X_1, \ldots, X_n)$ represent the data vector. When the prior distribution for $\theta$ is $G(\alpha, \beta)$, i.e. a gamma distribution with p.d.f.

$$f(\theta) = \frac{\theta^{\alpha-1} \beta^{\alpha} e^{-\beta\theta}}{\Gamma(\alpha)} \quad \text{for } \theta > 0$$

where $\alpha, \beta > 0$, it is well known that

[Mathematical Expression Omitted]

where $\sim$ is to be read "is distributed as." Now let $g(\theta \mid D)$ be the p.d.f. of $\theta \mid D$ and let $X^*$ be a subsequent observation from the exponential distribution given by (1). Then the p.d.f. of $X^* \mid D$ is given by

[Mathematical Expression Omitted]

so that

[Mathematical Expression Omitted].

Thus $X^* \mid D$ has a Pareto distribution with parameters $n + \alpha$ and [Mathematical Expression Omitted].

The diffuse prior is obtained by letting both $\alpha$ and $\beta$ go to zero in such a way that $\alpha/\beta$ is a constant. Hence, with a diffuse prior $X^* \mid D$ has a Pareto [Mathematical Expression Omitted] distribution.
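The omitted steps can be sketched as follows. The sketch assumes the usual conjugate update $\theta \mid D \sim G(\alpha + n, \beta + \sum X_i)$ and a Pareto($a$, $\delta$) density of the form $a\delta^a/(\delta + x)^{a+1}$. Both are assumptions about the omitted expressions, although they agree with the Pareto(110, 108.81) and Pareto(106, 104.81) parameters quoted in Example 2.

```latex
\begin{aligned}
f(x^* \mid D)
  &= \int_0^\infty \theta e^{-\theta x^*}\, g(\theta \mid D)\, d\theta \\
  &= \int_0^\infty \theta e^{-\theta x^*}\,
     \frac{\theta^{\alpha+n-1} \left(\beta + \sum X_i\right)^{\alpha+n}
           e^{-(\beta + \sum X_i)\theta}}{\Gamma(\alpha+n)}\, d\theta \\
  &= \frac{(\alpha+n)\left(\beta + \sum X_i\right)^{\alpha+n}}
          {\left(\beta + \sum X_i + x^*\right)^{\alpha+n+1}},
     \qquad x^* > 0 .
\end{aligned}
```

Letting $\alpha$ and $\beta$ tend to zero in this expression would give a Pareto($n$, $\sum X_i$) predictive distribution.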

Now, the profile predictive likelihood is given by the joint likelihood of $X_1, \ldots, X_n$ and $X^*$ evaluated at the maximum likelihood estimator [Mathematical Expression Omitted] based on the observations $X_1, \ldots, X_n$ and $X^*$. So the profile likelihood is proportional to

[Mathematical Expression Omitted]

which is equivalent to the predictive distribution with respect to a diffuse prior.
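A sketch of this calculation, assuming the joint maximum likelihood estimator takes the usual form $\hat\theta = (n+1)/(\sum X_i + x^*)$ (the omitted expressions are not recoverable from this copy, and the symbol $L_p$ is introduced here for illustration):

```latex
\begin{aligned}
L_p(x^*)
  &= \hat\theta^{\,n+1}
     \exp\left\{ -\hat\theta \left( \textstyle\sum X_i + x^* \right) \right\},
     \qquad \hat\theta = \frac{n+1}{\sum X_i + x^*} \\
  &= (n+1)^{n+1} e^{-(n+1)} \left( \textstyle\sum X_i + x^* \right)^{-(n+1)}
  \;\propto\; \left( \textstyle\sum X_i + x^* \right)^{-(n+1)} .
\end{aligned}
```

This has the same kernel in $x^*$ as a Pareto($n$, $\sum X_i$) density, which is how the equivalence with the diffuse-prior predictive distribution would arise.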

Note that when the prior is diffuse both $E(X^* \mid D)$ and $V(X^* \mid D)$ exceed the mean and variance of an exponential distribution whose parameter is the maximum likelihood estimate of $\theta$. This may not be the case when the prior is not diffuse.

Poisson Distribution with Gamma Prior

Let $N \mid \lambda \sim \text{Poisson}(\lambda)$ and let the prior distribution for $\lambda$ be $G(\alpha, \beta)$. Then the unconditional distribution of $N$ is negative binomial $NB(\alpha, \beta/(1 + \beta))$ and the posterior distribution for $\lambda \mid N$ is $G(\alpha + N, 1 + \beta)$. It therefore follows that if $N^*$ is a subsequent observation from the Poisson distribution then the predictive distribution for $N^* \mid N$ is $NB(\alpha + N, (1 + \beta)/(2 + \beta))$.

The diffuse prior then leads to a predictive distribution that is NB(N,1/2). Hence the predictive distribution has the same mean as the fitted Poisson(N) distribution, but has twice the variance of the fitted distribution.
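The comparison with the fitted Poisson distribution can be checked numerically. The sketch below assumes the negative binomial NB($k$, $p$) has probability function $\binom{k+n-1}{n} p^k (1-p)^n$, which is the parameterisation under which the stated mean and variance hold; the function names are hypothetical.

```python
def nb_mean_var(k, p):
    """Mean and variance of NB(k, p) with pmf C(k+n-1, n) p**k (1-p)**n."""
    q = 1.0 - p
    return k * q / p, k * q / p**2


def predictive_nb_params(n_obs, alpha=0.0, beta=0.0):
    """Parameters of the predictive NB after observing n_obs claims.

    alpha = beta = 0 is the diffuse-prior limit, giving NB(n_obs, 1/2).
    """
    return alpha + n_obs, (1.0 + beta) / (2.0 + beta)


# Diffuse prior with N = 106: mean N, variance 2N.
k, p = predictive_nb_params(106)
diffuse_mean, diffuse_var = nb_mean_var(k, p)

# Gamma(4, 0.04) prior with N = 106: mean slightly below N.
k2, p2 = predictive_nb_params(106, alpha=4.0, beta=0.04)
prior_mean, prior_var = nb_mean_var(k2, p2)
```

With an informative prior the predictive mean is pulled towards the prior mean, so it need not equal the observed count.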

In this particular case, the mean square error (MSE) of prediction using classical theory is

$$\begin{aligned}
E[(N^* - N)^2] &= E[(N^* - \lambda + \lambda - N)^2] \\
  &= E[(N^* - \lambda)^2] + E[(N - \lambda)^2] \\
  &= \lambda + \lambda \\
  &= 2\lambda.
\end{aligned}$$

Hence, an estimate of the MSE is given by [Mathematical Expression Omitted] which is the same as the predictive variance of $N^*$ under the Bayesian paradigm with a diffuse prior. Accordingly, the estimate of the MSE of prediction using the classical approach is equivalent to the predictive variance using the objective Bayesian approach. However, the profile predictive distribution is very different from the Bayesian one, i.e. $NB(N, 1/2)$.

Normal Distribution with Normal Prior for Mean and Known Variance

Let [Mathematical Expression Omitted] with $\sigma^2$ known and let the prior distribution for $\mu$ be [Mathematical Expression Omitted]. Let $D = (X_1, \ldots, X_n)$ again represent the data vector. It is well known (see, for example, Lee 1989) that

[Mathematical Expression Omitted]

where [Mathematical Expression Omitted] is given by

[Mathematical Expression Omitted]

and the posterior variance or mean square error (MSE) is given by

[Mathematical Expression Omitted]

where Z (the credibility factor) gives the relative precision of the two sources of information, i.e.

[Mathematical Expression Omitted].

Let us assume for the remainder of this section that the prior is diffuse, i.e. that [Mathematical Expression Omitted]. Then we find that [Mathematical Expression Omitted] and [Mathematical Expression Omitted].
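The omitted expressions cannot be recovered from this copy. The sketch below assumes they are the standard conjugate-normal results, in which the credibility factor is the share of total precision contributed by the data. The argument names `prior_mean` and `prior_var` are hypothetical, and the numbers are illustrative rather than taken from the paper.

```python
def normal_posterior(xbar, n, sigma2, prior_mean, prior_var):
    """Posterior mean, posterior variance and credibility factor Z
    for a normal mean with known variance sigma2 and a normal prior."""
    data_precision = n / sigma2          # precision of the sample mean
    prior_precision = 1.0 / prior_var    # precision of the prior
    z = data_precision / (data_precision + prior_precision)
    post_mean = z * xbar + (1.0 - z) * prior_mean
    post_var = 1.0 / (data_precision + prior_precision)
    return post_mean, post_var, z


# Illustrative numbers only (not from the paper).
post_mean, post_var, z = normal_posterior(xbar=0.5, n=20, sigma2=4.0,
                                          prior_mean=0.0, prior_var=1.0)

# A very large prior variance mimics the diffuse prior.
d_mean, d_var, d_z = normal_posterior(0.5, 20, 4.0, 0.0, 1e12)
```

Under these assumptions the diffuse limit gives a credibility factor of 1, a posterior mean equal to the sample mean and a posterior variance of $\sigma^2/n$.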

Now let $X^*$ be a subsequent observation from the $N(\mu, \sigma^2)$ distribution. Then it is straightforward to show that the distribution of [Mathematical Expression Omitted]. Hence the predictive distribution contains the two sources of uncertainty or variability, viz., [Mathematical Expression Omitted] and $\sigma^2$.

Now let $Y^* = \exp(X^*)$. Then the distribution of $Y^* \mid D$ is lognormal with parameters [Mathematical Expression Omitted] and $(\sigma^2/n) + \sigma^2$, and so

[Mathematical Expression Omitted]

and

[Mathematical Expression Omitted].

Note that both $E(Y^* \mid D)$ and $V(Y^* \mid D)$ exceed the mean and variance of a lognormal distribution whose parameters are the maximum likelihood estimates of $\mu$ and $\sigma^2$. The predictive mean incorporates the component $\exp\{\sigma^2/(2n)\}$ where $\sigma^2/n$ is the variance of the sample mean.
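For orientation, applying the standard lognormal moment formulae to a lognormal distribution with variance parameter $(\sigma^2/n) + \sigma^2$ would give the expressions below. This is a sketch under the assumption that the omitted location parameter is the sample mean, written $\bar{X}$ here.

```latex
\begin{aligned}
E(Y^* \mid D) &= \exp\left\{ \bar{X} + \tfrac{1}{2}\sigma^2
                 + \frac{\sigma^2}{2n} \right\}, \\
V(Y^* \mid D) &= \exp\left\{ 2\bar{X}
                 + \sigma^2\left(1 + \tfrac{1}{n}\right) \right\}
                 \left[ \exp\left\{ \sigma^2\left(1 + \tfrac{1}{n}\right)
                 \right\} - 1 \right].
\end{aligned}
```

The factor $\exp\{\sigma^2/(2n)\}$ in the mean is the component attributed above to the variance of the sample mean.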

Normal Distribution with Gamma Prior for Unknown Variance

Once more let [Mathematical Expression Omitted]. If $\sigma^2$ is unknown the diffuse prior is given by

$$p(\sigma^2) \propto 1/\sigma^2$$

where $\propto$ means "is proportional to." Note that the prior is an improper distribution. However, both the posterior and predictive distributions are proper. We know (see, for example, Lee (1989)) that

[Mathematical Expression Omitted]

[Mathematical Expression Omitted]

and

[Mathematical Expression Omitted] (2)

where

[Mathematical Expression Omitted],

$G(\alpha, \beta)$ denotes a gamma distribution with mean $\alpha/\beta$ and $t(\nu, r, s)$ denotes a t-distribution which is defined in Appendix 1. If we let $s = (S/(n-1))^{1/2}$ then an alternative way of writing (2) is

[Mathematical Expression Omitted].

The classical statistics (or sampling theory statistics) approach leads to a similar conclusion. In this case

[Mathematical Expression Omitted].

However, in the classical approach it is the data that are regarded as random and the parameters $\mu$ and $\sigma^2$ are fixed. By contrast, the objective Bayesian approach regards $\mu$ and $\sigma^2$ as random and the available data as being fixed.

If $X^*$ denotes a subsequent observation from the $N(\mu, \sigma^2)$ distribution, the predictive distribution for $X^* \mid D$ is

[Mathematical Expression Omitted].

This result is derived in Appendix 2. From results given in Appendix 1 it follows that [Mathematical Expression Omitted] and $V(X^* \mid D) = (n+1)S/(n(n - 3))$. From the classical statistics standpoint, the distribution of

[Mathematical Expression Omitted]

is also $t(n - 1, 0, 1)$ and is equivalent to the profile predictive likelihood. (Here we use $S/(n - 3)$ as the estimator of $\sigma^2$.)
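The stated predictive variance can be checked against Appendix 1. As a sketch, assume the predictive distribution is $t(n-1, \bar{X}, (1 + 1/n)s^2)$ with $s^2 = S/(n-1)$, where $\bar{X}$ denotes the sample mean. This parameterisation is an assumption about the omitted expression, chosen because it reproduces the variance quoted in the text.

```latex
V(X^* \mid D) = \frac{\nu}{\nu - 2}\, p
  = \frac{n-1}{n-3} \cdot \left(1 + \frac{1}{n}\right) \frac{S}{n-1}
  = \frac{(n+1)\,S}{n\,(n-3)}, \qquad n > 3 .
```

Written this way, the predictive variance is $(1 + 1/n)$ times $S/(n-3)$, which is the estimator of $\sigma^2$ mentioned in the parenthetical remark.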

If we again define $Y^*$ by $Y^* = \exp(X^*)$, then the predictive distribution for $Y^* \mid D$ is the distribution of $\exp(X^*) \mid D$.

PREDICTIVE AGGREGATE CLAIMS DISTRIBUTIONS

The traditional risk model for aggregate claims is as follows:

$$S = Y_1 + \cdots + Y_N$$

where $S$ represents the aggregate claim amount in a fixed time period (typically one year), $N$ represents the number of claims occurring in that period, and $Y_1, Y_2, \ldots$ represent the amounts of successive claims. We assume that [Mathematical Expression Omitted] is a sequence of i.i.d. random variables and that $N$ is independent of this sequence.

In the last fifteen years much attention has been devoted to computing the aggregate claims distribution given the distributions of $N$ and $Y_i$. Examples of recursive algorithms to calculate the distribution of $S$ are given by Panjer (1981), Schroter (1991) and Sundt (1992).

The approach in practice is to fit the distributions of $N$ and $Y_i$ to data and, if the distribution of $N$ belongs to the appropriate class, to apply the fitted distributions as inputs into these algorithms. There is a major drawback to this approach as the distributions do not incorporate parameter estimation error. The distributions are assumed to be known with certainty.

Consider the following statistical setting:

Data for one time period: $D = \{N; Y_1, \ldots, Y_N\}$

"Future" observations for the next period are $N^*; Y_1^*, \ldots, Y_{N^*}^*$.

We are interested in computing the predictive distribution of

$$S^* = Y_1^* + \cdots + Y_{N^*}^*$$

(conditional on the data $D$).

We will compare this with the fitted aggregate claims distribution defined by

[Mathematical Expression Omitted]

where [Mathematical Expression Omitted] has the fitted distribution of $N$ based on the data $D$, and [Mathematical Expression Omitted] has the fitted distribution of $Y_i$.

We will assume throughout that the claim count $N \mid \lambda$ has a Poisson distribution with mean $\lambda$. Accordingly, if [Mathematical Expression Omitted] is the fitted value of $\lambda$, the mean and variance of the fitted and predictive aggregate claims are

[Mathematical Expression Omitted]

[Mathematical Expression Omitted]

for the fitted distribution and

$$E(S^* \mid D) = E(N^* \mid D)\,E(Y_i^* \mid D) = N\,E(Y_i^* \mid D)$$

$$V(S^* \mid D) = V(N^* \mid D)\,E^2(Y_i^* \mid D) + E(N^* \mid D)\,V(Y_i^* \mid D)$$

for the predictive distribution.

In the special case when we have a diffuse prior for the Poisson parameter and the distribution of the individual claim amounts is assumed to be known we have

[Mathematical Expression Omitted] and

$$V(S^* \mid D) = 2N\,E^2(Y_i) + N\,V(Y_i) = N\,E(Y_i^2) + N\,E^2(Y_i)$$

Therefore,

[Mathematical Expression Omitted].

Thus, uncertainty in the value of the Poisson parameter results in equal means but greater variability in the predictive distribution. We will see in our examples in the next section that this is not true when the parameters of the individual claim amount distribution are unknown, or when the prior for the Poisson parameter is not diffuse.
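The moment formulae above translate directly into a few lines of code. The sketch below uses $N = 106$ and a known exponential claim amount distribution with mean 1, which is the setting of Example 1; the function name is hypothetical.

```python
def compound_mean_var(n_mean, n_var, y_mean, y_var):
    """Mean and variance of S = Y_1 + ... + Y_N for i.i.d. Y_i independent of N."""
    mean = n_mean * y_mean
    var = n_var * y_mean**2 + n_mean * y_var
    return mean, var


N = 106
y_mean, y_var = 1.0, 1.0   # exponential claim amounts with mean 1

# Fitted: N-hat ~ Poisson(N), so mean and variance of the count are both N.
fitted = compound_mean_var(N, N, y_mean, y_var)

# Predictive with a diffuse prior: N* ~ NB(N, 1/2), mean N and variance 2N.
predictive = compound_mean_var(N, 2 * N, y_mean, y_var)

variance_ratio = predictive[1] / fitted[1]
```

For exponential claims the two means coincide while the predictive variance is one and a half times the fitted variance.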

MOMENTS AND PERCENTILES OF AGGREGATE CLAIMS DISTRIBUTIONS

In this section we will consider moments and percentiles of aggregate claims distributions. We will compare values for fitted and predictive distributions under three scenarios for the claim number distribution. Recall that our basic model is that $N \mid \lambda$ has a Poisson($\lambda$) distribution. Having observed the value of $N$, we will use the following three distributions for the claim number distribution in the next year:

(i) Poisson[Mathematical Expression Omitted], where [Mathematical Expression Omitted] is the maximum likelihood estimate of $\lambda$ (which is just equal to the observed value of $N$).

(ii) Negative binomial with parameters $\alpha + N$ and $(\beta + 1)/(\beta + 2)$. This is the predictive distribution resulting from a $G(\alpha, \beta)$ prior for $\lambda$.

(iii) Negative binomial with parameters N and 1/2. This is the predictive distribution for N resulting from a diffuse prior for [Lambda].

Hence case (i) results in a fitted aggregate claims distribution and cases (ii) and (iii) in predictive aggregate claims distributions. Note that all subsequent results are conditional on the observed data D.

Known Claim Amount Distribution

Let us first assume that the parameter of the exponential distribution is known and that the distribution has mean 1. Thus, we are initially considering the effect of uncertainty about the claim number distribution on the aggregate claims distribution.

Example 1. Let the observed value of $N$ be 106. (This was in fact obtained as a simulation from the Poisson(100) distribution.) Let the parameters of the prior distribution in (ii) be $\alpha = 4$ and $\beta = 0.04$ so that the prior distribution has mean 100 and standard deviation 50. Table 1 shows the mean, variance and coefficient of skewness of the aggregate claims distributions for each of the three cases. Formulae for calculating these quantities can be found in Panjer and Willmot (1992). The pattern of figures in Table 1 is much as expected. As the variability of the counting distribution increases from case (i) through to case (iii) the variance and skewness of the aggregate claims distribution both increase. The mean for case (ii) is less than for the other two cases. Thus, parameter variability in the counting distribution alone impacts on all insurance calculations, such as setting premiums or surplus requirements, which hinge on the moments of the aggregate claims distribution.

Table 1
Moments of Aggregate Claims Distributions

Case      Mean      Variance    Skewness
(i)       106.00    212.00      0.2060
(ii)      105.77    313.24      0.2598
(iii)     106.00    318.00      0.2617

Table 2 shows percentiles of the aggregate claims distribution in each case. These distributions were calculated according to Panjer's (1981) recursion formula and the exponential distribution was discretised on intervals of 0.05 using the method of De Vylder and Goovaerts (1988). We will use this discretisation method in all but one of our examples. Percentiles are denoted $C_x$ and the tabulated values show the least value of $z$ such that the probability that the aggregate claim amount is no more than $z$ is at least $x$. It is a consequence of our discretisation method that the percentiles are integer multiples of 0.05.

Table 2
Percentiles of Aggregate Claims Distributions

Case      $C_{0.90}$    $C_{0.95}$    $C_{0.99}$    $C_{0.995}$
(i)       124.95        130.80        142.05        146.30
(ii)      128.90        136.15        150.25        155.60
(iii)     129.30        136.60        150.85        156.25

The figures in Table 2 confirm the findings from Table 1. As variability in the claim number distribution increases, percentiles of the aggregate claims distribution increase.
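The calculation behind such percentiles can be sketched with Panjer's recursion for claim number distributions satisfying $p_n = (a + b/n)p_{n-1}$. This is a minimal illustration under stated assumptions. It discretises the exponential distribution by simple rounding rather than by the De Vylder and Goovaerts (1988) method, uses an interval of 0.5 rather than 0.05, and takes a small portfolio with 10 expected claims, so its output is not comparable with Table 2. All function names are hypothetical.

```python
import math


def panjer(a, b, g0, severity, n_points):
    """Aggregate claims probabilities g[0..n_points-1] on a lattice.

    a, b     : claim number parameters, p_n = (a + b/n) p_{n-1}
    g0       : Pr(S = 0), the claim number pgf evaluated at severity[0]
    severity : lattice probabilities f[0], f[1], ... of a single claim
    """
    f = severity
    g = [g0]
    for s in range(1, n_points):
        total = 0.0
        for j in range(1, min(s, len(f) - 1) + 1):
            total += (a + b * j / s) * f[j] * g[s - j]
        g.append(total / (1.0 - a * f[0]))
    return g


def poisson_abg0(lam, f0):
    """(a, b, g0) for a Poisson(lam) claim number distribution."""
    return 0.0, lam, math.exp(lam * (f0 - 1.0))


def negbin_abg0(k, p, f0):
    """(a, b, g0) for NB(k, p) with pmf C(k+n-1, n) p**k (1-p)**n."""
    q = 1.0 - p
    return q, (k - 1.0) * q, (p / (1.0 - q * f0)) ** k


def percentile(g, h, x):
    """Least lattice value z = i*h with Pr(S <= z) >= x."""
    cum = 0.0
    for i, prob in enumerate(g):
        cum += prob
        if cum >= x:
            return i * h
    raise ValueError("lattice too short for the requested percentile")


def exp_cdf(x):
    """Distribution function of an exponential with mean 1."""
    return 1.0 - math.exp(-x)


h = 0.5          # discretisation interval (coarser than the paper's 0.05)
m = 80           # number of lattice points kept for a single claim
sev = [exp_cdf(h / 2)] + [exp_cdf((j + 0.5) * h) - exp_cdf((j - 0.5) * h)
                          for j in range(1, m)]

g_fit = panjer(*poisson_abg0(10.0, sev[0]), sev, 400)       # fitted Poisson(10)
g_pred = panjer(*negbin_abg0(10.0, 0.5, sev[0]), sev, 400)  # predictive NB(10, 1/2)

c95_fit = percentile(g_fit, h, 0.95)
c95_pred = percentile(g_pred, h, 0.95)
```

The same two claim number families appear in cases (i) to (iii), so only the `(a, b, g0)` triple changes between a fitted and a predictive calculation.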

As a simple illustration of the effect of parameter uncertainty, consider the following problem. Suppose that the insurer calculates the premium, P, for a risk by the expected value principle using a premium loading factor of 10%, and wants to find the surplus U such that

$$\Pr(U + P < S) = \pi$$

where $S$ denotes aggregate claims from the risk. Table 3 shows values of $U$ for different values of $\pi$ when the distribution of $S$ is given by each of cases (i), (ii) and (iii).

Table 3
Required Initial Surplus

Case      $\pi = 0.1$    $\pi = 0.05$    $\pi = 0.01$    $\pi = 0.005$
(i)       8.35           14.20           25.45           29.70
(ii)      12.55          19.80           33.90           39.25
(iii)     12.70          20.00           34.25           39.65

Table 3 shows that by applying a fitted distribution instead of a predictive distribution, the insurer can set up a surplus level that is quite inadequate: the worst case in Table 3 shows a surplus under the fitted distribution that is about two-thirds of that required using the predictive distribution with a diffuse prior. This is a substantial margin, bearing in mind that the individual claim amount distribution is assumed known in this example!
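The entries of Table 3 follow from Tables 1 and 2 by taking the percentile at level $1 - \pi$ and subtracting the loaded premium, on the reading that $\pi$ is the probability that aggregate claims exceed $U + P$. A short sketch with hypothetical names, using the tabulated means and percentiles as inputs:

```python
LOADING = 0.10   # premium loading factor

# Means from Table 1, keyed by case.
mean = {"i": 106.00, "ii": 105.77, "iii": 106.00}

# Percentiles from Table 2, keyed by case and then by pi = 1 - level.
percentile = {
    "i":   {0.1: 124.95, 0.05: 130.80, 0.01: 142.05, 0.005: 146.30},
    "ii":  {0.1: 128.90, 0.05: 136.15, 0.01: 150.25, 0.005: 155.60},
    "iii": {0.1: 129.30, 0.05: 136.60, 0.01: 150.85, 0.005: 156.25},
}


def required_surplus(case, pi):
    """Surplus U with Pr(S > U + P) = pi, where P = (1 + loading) * E(S)."""
    premium = (1.0 + LOADING) * mean[case]
    return percentile[case][pi] - premium


# Fitted surplus as a fraction of the diffuse-prior predictive surplus.
worst_ratio = required_surplus("i", 0.1) / required_surplus("iii", 0.1)
```

The ratio for $\pi = 0.1$ is the "about two-thirds" figure referred to above.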

Unknown Claim Amount Distribution

Let us now assume that the parameter of the exponential individual claim amount distribution is unknown. In the following example we have used the same number of claims as in Example 1, then simulated this number of observations from an exponential distribution with mean 1.

Example 2. The maximum likelihood estimate of the parameter of the exponential distribution based on 106 simulated individual claim amounts is [Mathematical Expression Omitted]. We will use the same three counting distributions as before. The individual claim amount distributions will be:

(a) Exponential with mean 0.9888 for case (i), i.e. the fitted exponential distribution.

(b) Pareto(110,108.81) for case (ii), i.e. the predictive distribution based on a gamma prior for the exponential parameter with mean 1 and variance 0.25.

(c) Pareto(106,104.81) for case (iii), i.e. the predictive distribution based on the diffuse prior.

Tables 4 and 5 show the same quantities as Tables 1 and 2 respectively.

Table 4
Moments of Aggregate Claims Distributions

Case      Mean      Variance    Skewness
(i)       104.81    207.28      0.2060
(ii)      105.59    314.12      0.2616
(iii)     105.81    318.89      0.2635
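The means and variances in Table 4 can be approximately reproduced from the distributions listed in (a) to (c). The sketch assumes the Pareto($\alpha$, $\delta$) density $\alpha\delta^\alpha/(\delta + x)^{\alpha+1}$ and treats claim amounts as i.i.d. draws from the predictive distribution. It uses the parameters as printed, so results agree with the table only to within rounding of those inputs. Function names are hypothetical.

```python
def pareto_mean_var(alpha, delta):
    """Mean and variance for the density alpha*delta**alpha/(delta+x)**(alpha+1)."""
    mean = delta / (alpha - 1.0)
    var = alpha * delta**2 / ((alpha - 1.0) ** 2 * (alpha - 2.0))
    return mean, var


def negbin_mean_var(k, p):
    """Mean and variance of NB(k, p) with pmf C(k+n-1, n) p**k (1-p)**n."""
    q = 1.0 - p
    return k * q / p, k * q / p**2


def compound_mean_var(n_mean, n_var, y_mean, y_var):
    """Mean and variance of a random sum with i.i.d. terms independent of N."""
    return n_mean * y_mean, n_var * y_mean**2 + n_mean * y_var


N_OBS = 106
FITTED_MEAN = 0.9888   # fitted exponential mean quoted in (a)

table4 = {
    # (i) Poisson(106) counts with fitted exponential claims.
    "i": compound_mean_var(N_OBS, N_OBS, FITTED_MEAN, FITTED_MEAN**2),
    # (ii) NB(110, 1.04/2.04) counts with Pareto(110, 108.81) claims.
    "ii": compound_mean_var(*negbin_mean_var(110, 1.04 / 2.04),
                            *pareto_mean_var(110, 108.81)),
    # (iii) NB(106, 1/2) counts with Pareto(106, 104.81) claims.
    "iii": compound_mean_var(*negbin_mean_var(N_OBS, 0.5),
                             *pareto_mean_var(106, 104.81)),
}
```

Comparing these values with the corresponding figures for a known claim amount distribution shows how little the Pareto claim amounts add to the variance relative to the change in the counting distribution.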

[TABULAR DATA FOR TABLE 5 OMITTED]

Comparing Tables 1 and 4 we see that both the variance and skewness of each aggregate claims distribution are slightly increased when we introduce parameter uncertainty to the individual claim amount distribution. However, these increases are not of a huge magnitude. Comparing Tables 2 and 5, we see that the same pattern is present in each table. The slightly smaller values in Table 5 simply reflect the lower means in Table 4. Figure 1 shows the three aggregate claims distributions.

These tables suggest that uncertainty in the claim number distribution is of much greater significance than uncertainty in the individual claim amount distribution. This is confirmed in Example 3 where we have considered a larger portfolio. This is not particularly surprising. In each case the coefficient of variation of the estimate of the parameter $\lambda$ of the claim number distribution is very much greater than that of the estimate of the parameter $\theta$ of the individual claim amount distribution.

Example 3. The maximum likelihood estimate of the parameter of the exponential distribution based on 515 simulated individual claim amounts is [Mathematical Expression Omitted]. We will consider the three cases used in Example 2. For case (ii) we have adopted the same prior distribution for the exponential parameter. The prior for the Poisson parameter has mean 500 and variance 2500. Table 6 shows percentiles of the fitted and predictive aggregate claims distributions. In addition we have shown percentiles when the individual claim amount distribution is assumed to be known (and has mean 1). These cases are denoted by K in the table.

Table 6
Percentiles of Fitted and Predictive Aggregate Claims Distributions

Case      $C_{0.90}$    $C_{0.95}$    $C_{0.99}$    $C_{0.995}$
(i)K      556.45        568.65        591.85        600.45
(ii)K     561.80        576.40        604.35        614.75
(iii)K    565.85        580.95        609.85        620.60
(i)       548.95        560.95        583.85        592.35
(ii)      555.35        569.80        597.45        607.70
(iii)     559.30        574.25        602.80        613.45

This table shows that apart from a small change in location, the use of fitted and predictive individual claim amount distributions has little impact on the percentiles of the aggregate claims distribution calculated with the known individual claim amount distribution.

REINSURANCE PREMIUMS

In this section we make a comparison between the pure premiums for excess of loss reinsurance and for stop loss reinsurance for the three cases described in Example 2.

Excess of Loss Reinsurance

Let $S_R(M)$ denote the reinsurer's aggregate claim amount under an excess of loss reinsurance arrangement with retention level $M$. Figure 2 shows the pure excess of loss premium, $E(S_R(M))$, as a function of the retention level for each of the three cases. These functions were calculated from the following formulae. For case (i)

$$E(S_R(M)) = (\lambda/\theta) \exp\{-\theta M\}$$

where $\lambda$ is the parameter of the fitted Poisson distribution and $\theta$ is the parameter of the fitted exponential distribution. For cases (ii) and (iii)

$$E(S_R(M)) = \frac{k(1 - p)}{p} \left(\frac{\delta}{\delta + M}\right)^{\alpha} \frac{\delta + M}{\alpha - 1}$$

where $k$ and $p$ are the parameters of the predictive negative binomial claim number distribution and $\alpha$ and $\delta$ are the parameters of the predictive Pareto individual claim amount distribution. We can see that for some values of $M$ there is not a great deal of difference between the pure premiums. However, the difference can be significant. For example, when $M = 2$ the value of $E(S_R(M))$ under case (iii) is about 5% greater than under case (i). Figure 3 shows that there are much greater differences between the variances of the aggregate claims distributions for the reinsurer. Again considering the case $M = 2$, the variance under case (iii) is 16% greater than under case (i). The formulae underlying Figure 3 are

$$V(S_R(M)) = (2\lambda/\theta^2) \exp\{-\theta M\}$$

for case (i) and

[Mathematical Expression Omitted]

for cases (ii) and (iii).
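The comparisons quoted for $M = 2$ can be checked numerically. In the sketch below the premium formulae are those given above. The variance for cases (ii) and (iii) is omitted in this copy, so the code substitutes the general identity $V(N)E^2(Z) + E(N)V(Z)$ with $Z = \max(Y - M, 0)$ and Pareto excess moments. That substitution is an assumption rather than a transcription of the authors' formula. Function names are hypothetical.

```python
import math


def xl_fitted(lam, theta, m):
    """Pure premium and variance: Poisson(lam) counts, exponential(theta) claims."""
    tail = math.exp(-theta * m)
    return (lam / theta) * tail, (2.0 * lam / theta**2) * tail


def xl_predictive(k, p, alpha, delta, m):
    """Pure premium and variance: NB(k, p) counts, Pareto(alpha, delta) claims.

    The variance is an assumed stand-in for the omitted formula.
    """
    n_mean = k * (1.0 - p) / p
    n_var = k * (1.0 - p) / p**2
    tail = (delta / (delta + m)) ** alpha                 # Pr(Y > m)
    z1 = tail * (delta + m) / (alpha - 1.0)               # E(Z)
    z2 = tail * 2.0 * (delta + m) ** 2 / ((alpha - 1.0) * (alpha - 2.0))  # E(Z^2)
    return n_mean * z1, n_var * z1**2 + n_mean * (z2 - z1**2)


M = 2.0
prem_i, var_i = xl_fitted(lam=106.0, theta=1.0 / 0.9888, m=M)            # case (i)
prem_iii, var_iii = xl_predictive(k=106.0, p=0.5,
                                  alpha=106.0, delta=104.81, m=M)        # case (iii)

premium_ratio = prem_iii / prem_i
variance_ratio = var_iii / var_i
```

With the Example 2 parameters the two ratios come out close to the 5% and 16% differences stated in the text, which lends some support to the assumed variance identity.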

In both Figures 2 and 3 the values under case (ii) are very close to those under case (iii). Our experiments with other parameter values for the prior distribution for $\theta$ in case (ii) indicate that the functions $E(S_R(M))$ and $V(S_R(M))$ are not particularly sensitive to the parameters of this prior distribution. Thus, the main reason for differences in values between the fitted and predictive aggregate claims distributions is the difference in the claim number distributions.

Stop Loss Reinsurance

Figure 4 shows the pure stop loss premiums as a function of the retention level, denoted $d$, for each of the three cases. These have been calculated recursively from the discrete aggregate claim amount distribution. (See, for example, Bowers et al. 1986.) This figure shows that the premium calculated from the fitted aggregate claims distribution always understates that calculated from the predictive distribution. For example, when $d = 120$, the premium calculated from the fitted distribution is about 50% of that calculated from the predictive distribution of case (iii).
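A recursive calculation of this kind can be sketched as follows, using the relation $\pi(d + h) = \pi(d) - h\,(1 - F(d))$ for a distribution on the lattice $0, h, 2h, \ldots$, where $\pi(d)$ denotes the pure stop loss premium at retention $d$. The function name is hypothetical and the three-point distribution is purely illustrative.

```python
def stop_loss_premiums(g, h, n_retentions):
    """Pure stop loss premiums E[(S - d)+] at d = 0, h, 2h, ...

    g : probabilities of S on the lattice 0, h, 2h, ... (assumed to sum to 1)
    """
    premium = sum(i * h * prob for i, prob in enumerate(g))   # pi(0) = E(S)
    cdf = 0.0
    out = [premium]
    for i in range(n_retentions - 1):
        cdf += g[i]                      # F(i*h)
        premium -= h * (1.0 - cdf)       # pi((i+1)*h) = pi(i*h) - h*(1 - F(i*h))
        out.append(premium)
    return out


# Illustrative distribution: S takes values 0, 1, 2 with these probabilities.
premiums = stop_loss_premiums([0.5, 0.3, 0.2], h=1.0, n_retentions=3)
```

Applied to the output of an aggregate claims recursion, the same function would give the whole premium curve in a single pass.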

DISTRIBUTIONS WHOSE MOMENTS DO NOT EXIST

In the second section we noted that when the model for individual claim amounts is lognormal with parameters $\mu$ and $\sigma$, with a diffuse prior, the predictive distribution is that of $\exp(X^*) \mid D$ where [Mathematical Expression Omitted]. An immediate problem that arises with applying this predictive distribution to insurance problems is that its moments do not exist. Klugman (1992) notes this problem in relation to a Weibull distribution. It is possible to calculate the aggregate claims distribution with this predictive individual claim amount distribution, but it does not seem to be a very useful model, especially as insurance claim amounts are finite in practice. In this section we consider two pragmatic approaches to the problem of predictive individual claim amount distributions whose moments do not exist. Each approach approximates the predictive distribution by a distribution whose moments exist. The first is specific to the lognormal model, the second is more generally applicable.

Our first approach is to approximate the [Mathematical Expression Omitted] distribution by a [Mathematical Expression Omitted] distribution. This is a well-known approximation, and as the value of n increases, the quality of the approximation improves. An immediate consequence of this approximation is that the predictive individual claim amount distribution is lognormal and hence the moments of the predictive distribution exist. If we fit a lognormal distribution to data, then the maximum likelihood estimates of the parameters are [Mathematical Expression Omitted]. Hence the moments of our (approximate) predictive lognormal distribution exceed those of the fitted lognormal distribution.

The second, and more general, approach is to assume that there is a fixed amount, say $w$, which is the maximum possible claim. Thus, if $Y^* \mid D = \exp(X^*) \mid D$ has distribution function $F(x)$ we will approximate this distribution over the interval $(0, w)$ by $F(x)/F(w)$. There is of course an element of subjectivity in this approach, namely the choice of $w$. However, the advantage of this approach is that once again all the moments of the individual claim amount distribution exist. We refer to this distribution below as the truncated predictive distribution.
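The truncation step can be sketched in a few lines. The example below does not use the paper's predictive distribution. As a stand-in it takes the heavy-tailed distribution function $F(x) = x/(1 + x)$, whose mean does not exist, truncates it at $w = 300$, and discretises it by rounding to multiples of 0.5. All names and the choice of stand-in distribution are assumptions made for illustration.

```python
def truncate_cdf(cdf, w):
    """Return the distribution function F(x)/F(w) on (0, w), equal to 1 beyond w."""
    fw = cdf(w)

    def truncated(x):
        if x <= 0.0:
            return 0.0
        return min(cdf(x), fw) / fw

    return truncated


def heavy_tailed_cdf(x):
    """Stand-in distribution function x/(1+x); its mean is infinite."""
    return x / (1.0 + x)


W = 300.0    # assumed maximum possible claim
H = 0.5      # discretisation interval

trunc = truncate_cdf(heavy_tailed_cdf, W)

# Rounding discretisation: mass of (j*H - H/2, j*H + H/2] is placed at j*H.
n_points = int(W / H) + 1
probs = [trunc(H / 2)] + [trunc((j + 0.5) * H) - trunc((j - 0.5) * H)
                          for j in range(1, n_points)]

# First three raw moments of the discretised truncated distribution.
moments = [sum((j * H) ** r * pj for j, pj in enumerate(probs)) for r in (1, 2, 3)]
```

Every moment of the truncated distribution is finite, but the higher moments depend strongly on the choice of `W`.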

To illustrate ideas, we consider a set of 100 observations which were simulated from a lognormal distribution with mean 1 and variance 3. These observations gave [Mathematical Expression Omitted] and $S = 142.36$. Table 7 below shows the first three moments of the three individual claim amount distributions. Case (i) is the fitted lognormal distribution, case (ii) is the (approximating) predictive lognormal distribution and case (iii) is the truncated predictive distribution. For case (iii) the moments are actually the moments of the discretised distribution used to calculate the aggregate claims distribution. For this case only, the individual claim amount distribution was discretised using the method of crude rounding. (See, for example, Panjer and Willmot 1992.) We assumed that $w = 300$. Under the true distribution of $Y^* \mid D$ the probability of observing a claim in excess of 300 is less than $10^{-6}$.

Table 7
Moments of Individual Claim Amount Distributions

Case      1st moment    2nd moment    3rd moment
(i)       1.0232        4.3469        76.6781
(ii)      1.0537        4.8884        99.8625
(iii)     1.0598        5.3427        135.6334
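The relationship between rows (i) and (ii) of Table 7 can be checked from $n = 100$ and $S = 142.36$ alone. The sketch assumes that the fitted lognormal uses the maximum likelihood variance $S/n$, that the approximating normal distribution takes the variance $(n+1)S/(n(n-3))$ of the predictive t-distribution, and that both share the same location parameter. These are assumptions about the omitted expressions, and the function name is hypothetical.

```python
import math

n, S = 100, 142.36

sigma2_fitted = S / n                          # maximum likelihood estimate
sigma2_pred = (n + 1) * S / (n * (n - 3))      # variance of the predictive t


def moment_ratio(r):
    """Ratio of r-th raw moments, predictive lognormal over fitted lognormal.

    The r-th raw moment of a lognormal is exp(r*mu + r**2 * sigma2 / 2),
    so a common location parameter mu cancels in the ratio.
    """
    return math.exp(0.5 * r**2 * (sigma2_pred - sigma2_fitted))


ratios = [moment_ratio(r) for r in (1, 2, 3)]

# Corresponding ratios of the tabulated moments, case (ii) over case (i).
table_ratios = [1.0537 / 1.0232, 4.8884 / 4.3469, 99.8625 / 76.6781]
```

Under these assumptions the computed ratios match the tabulated ones to about three decimal places.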

As expected, the moments of both the approximate and the truncated predictive distributions exceed those of the fitted distribution. Table 8 shows moments of the aggregate claims distributions. For case (i) we have used a fitted Poisson(100) distribution for the claim number distribution, whereas for cases (ii) and (iii) we have used a predictive NB(100, 0.5) distribution.

Table 8
Moments of Aggregate Claims Distributions

Case     Mean      Variance    Skewness
(i)      102.32    434.69      0.8461
(ii)     105.37    599.86      0.8008
(iii)    105.98    646.59      0.9427
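The entries of Table 8 can be approximately recovered from Table 7 with the standard formulae for the mean, variance and third central moment of a compound distribution. The sketch below is a check under stated assumptions rather than a record of how the figures were produced: it assumes NB(100, 0.5) denotes the negative binomial on 0, 1, 2, ... with mean k(1-p)/p = 100, and the function names are chosen for this example.

```python
def raw_to_central(m1, m2, m3):
    """Variance and third central moment from the first three raw moments."""
    var = m2 - m1**2
    mu3 = m3 - 3.0 * m1 * m2 + 2.0 * m1**3
    return var, mu3

def poisson_count(lam):
    """(mean, variance, third central moment) of a Poisson(lam) count."""
    return lam, lam, lam

def negbin_count(k, p):
    """Same triple for NB(k, p) on 0, 1, 2, ... (ASSUMED parameterisation)."""
    q = 1.0 - p
    return k * q / p, k * q / p**2, k * q * (1.0 + q) / p**3

def compound_moments(count, sev_raw):
    """Mean, variance and skewness of S = X_1 + ... + X_N."""
    en, vn, mu3n = count
    m1 = sev_raw[0]
    vx, mu3x = raw_to_central(*sev_raw)
    mean = en * m1
    var = en * vx + vn * m1**2
    mu3 = en * mu3x + 3.0 * vn * m1 * vx + mu3n * m1**3
    return mean, var, mu3 / var**1.5

case_i = compound_moments(poisson_count(100.0), (1.0232, 4.3469, 76.6781))
case_ii = compound_moments(negbin_count(100.0, 0.5), (1.0537, 4.8884, 99.8625))
case_iii = compound_moments(negbin_count(100.0, 0.5), (1.0598, 5.3427, 135.6334))
```

With the Table 7 moments as input, the three results match Table 8 to within the rounding of the tabulated values.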

Whilst the means of the three distributions are relatively close, there is a considerable increase in variance and third central moment going from case (i) through to case (iii). Strangely, the coefficient of skewness is smaller in case (ii) than in case (i).

Finally, to get an idea of how appropriate these approaches are, let us consider percentiles of the aggregate claims distributions. Table 9 shows percentiles for four aggregate claims distributions. Cases (i) to (iii) represent the situations covered in Table 8. Case (iv) represents the true predictive distribution, using the true distribution of $Y^* \mid D$ and a NB(100, 0.5) counting distribution.

Table 9
Percentiles of Aggregate Claims Distributions

Case     C_0.90    C_0.95    C_0.99    C_0.995
(i)      129.10    139.10    161.70    171.90
(ii)     136.95    148.60    174.50    186.00
(iii)    138.35    150.75    179.50    193.20
(iv)     138.35    150.75    179.50    193.20
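Percentiles such as those in Table 9 can be read off an aggregate claims distribution computed recursively from a discretised severity distribution. The sketch below uses Panjer's (1981) recursion for the (a, b, 0) class as one standard way of doing this; it is not a statement of how Table 9 was actually produced. The toy severity distribution, the step h = 1 and the function names are assumptions made for illustration, and NB(k, p) is again taken to be the negative binomial on 0, 1, 2, ....

```python
import math

def panjer(a, b, g0, f, smax):
    """Aggregate pmf g[0..smax] for an (a, b, 0) claim number distribution
    and severity pmf f on the lattice 0, 1, 2, ... (in units of the step h)."""
    g = [g0] + [0.0] * smax
    denom = 1.0 - a * f[0]
    for s in range(1, smax + 1):
        total = 0.0
        for j in range(1, min(s, len(f) - 1) + 1):
            total += (a + b * j / s) * f[j] * g[s - j]
        g[s] = total / denom
    return g

def poisson_params(lam, f0):
    """(a, b, g0) for a Poisson(lam) claim number distribution."""
    return 0.0, lam, math.exp(-lam * (1.0 - f0))

def negbin_params(k, p, f0):
    """(a, b, g0) for NB(k, p) on 0, 1, 2, ... (ASSUMED parameterisation)."""
    q = 1.0 - p
    return q, (k - 1.0) * q, (p / (1.0 - q * f0)) ** k

def percentile(g, h, level):
    """Smallest lattice point s*h whose cumulative probability reaches level."""
    cum = 0.0
    for s, prob in enumerate(g):
        cum += prob
        if cum >= level:
            return s * h
    return None  # level not reached within the computed range

# Toy illustration (hypothetical severity on 1, 2, 3 with step h = 1).
f_toy = [0.0, 0.5, 0.3, 0.2]
g_toy = panjer(*negbin_params(100.0, 0.5, f_toy[0]), f_toy, 600)
c95 = percentile(g_toy, 1.0, 0.95)
```

Because the computed percentiles necessarily lie on the lattice 0, h, 2h, ..., two distributions whose true percentiles differ by less than the step can report identical values.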

The ordering of values in Table 9 is what we would expect from Table 8. Two features stand out. First, there is quite a difference between values in cases (ii) and (iv). Second, case (iii) proves to be a very good approximation to case (iv), in terms of percentiles at least.

Each of the above approaches has its advantages and disadvantages. The first approach has the advantage that the predictive individual claim amount distribution is easy to deal with, particularly if we wish to calculate moments. It also has greater variability than the fitted lognormal distribution. However, both the moments and percentiles are smaller than in case (iii). The second approach has the advantage that its percentiles provide a good match for those of the true predictive distribution. (It is largely a feature of our discretisation interval that there is exact correspondence in the figures given in Table 9.) The disadvantage is the subjective element introduced by w. Although there are major disadvantages to fitting parameters to distributions by matching percentiles, it does seem like a possible way of determining a suitable value for w. However, our experience has been that the moments of the predictive aggregate claims distribution are more sensitive to the value of w than the percentiles are. Nevertheless, we would suggest that truncated predictive distributions provide a good solution to the problem of predictive distributions whose moments do not exist.

CONCLUSIONS

The main conclusion to be drawn from the examples in the fourth section is that parameter uncertainty has a major impact on moments and percentiles of aggregate claims distributions. In particular, parameter uncertainty in the claim number distribution seems to be of more importance than in the individual claim amount distribution when the moments of the predictive individual claim amount distribution exist.

The objective Bayesian approach may lead to predictive distributions for which moments do not exist. However, we have shown in the penultimate section that given a Bayesian predictive distribution we can modify this distribution in such a way that it is suitable for insurance purposes.

APPENDIX 1

A random variable X is said to have a t-distribution with parameters $\nu$, $\mu$ and $p$, denoted $t(\nu, \mu, p)$, if the density of X is

[Mathematical Expression Omitted]

for $-\infty < x < \infty$, where $-\infty < \mu < \infty$, $\nu > 0$ and $p > 0$.

The quantity $(X - \mu)\sqrt{p}$ has a Student t-distribution with $\nu$ degrees of freedom, denoted $t(\nu, 0, 1)$ or $t_\nu$. It is well known that if $Y \sim t_\nu$, then $E(Y) = 0$ and $V(Y) = \nu/(\nu - 2)$ provided that $\nu > 2$. It therefore follows that $E(X) = \mu$ and $V(X) = \nu/((\nu - 2)p)$ provided that $\nu > 2$.
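For reference, the statement that $(X - \mu)\sqrt{p}$ is Student t with $\nu$ degrees of freedom pins down a density and the variance. The following is a reconstruction from that statement and the standard Student t density, offered as a sketch; the displayed expression in the original may be written differently.

```latex
% Density implied by (X - \mu)\sqrt{p} \sim t_\nu (change of variable from the
% standard Student t density; a reconstruction, not the original display):
f(x) = \frac{\Gamma\!\left(\frac{\nu+1}{2}\right)}{\Gamma\!\left(\frac{\nu}{2}\right)}
       \sqrt{\frac{p}{\nu\pi}}
       \left[1 + \frac{p\,(x-\mu)^2}{\nu}\right]^{-(\nu+1)/2},
       \qquad -\infty < x < \infty .

% Moments: writing X = \mu + Y/\sqrt{p} with Y \sim t_\nu,
E(X) = \mu + \frac{E(Y)}{\sqrt{p}} = \mu, \qquad
V(X) = \frac{V(Y)}{p} = \frac{\nu}{(\nu-2)\,p}, \qquad \nu > 2 .
```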

APPENDIX 2

Suppose [Mathematical Expression Omitted] and let $\tau = 1/\sigma^2$ and [Mathematical Expression Omitted]. We have

[Mathematical Expression Omitted]

$\tau \mid D \sim \Gamma(\nu/2, S/2)$

where $\nu = n - 1$ and

[Mathematical Expression Omitted].

Suppose $X^*$ is the next observation from the $N(\mu, \sigma^2)$ distribution. Then the predictive density for $X^*$ is given by

[Mathematical Expression Omitted]

Consider the exponent in the inner integral. We have

[Mathematical Expression Omitted]

Therefore, the inner integral becomes

[Mathematical Expression Omitted]

Hence we have

[Mathematical Expression Omitted]

and so

[Mathematical Expression Omitted]
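The displayed steps of this derivation are not legible here. As a guide only, the outline below shows the standard argument under two assumptions that are not confirmed by the surviving text: that $\mu \mid \tau, D \sim N(\bar{x}, 1/(n\tau))$ and that $S = \sum_{i=1}^{n}(x_i - \bar{x})^2$. It should be read as a sketch of the expected result, not as the original working.

```latex
% ASSUMED: \mu \mid \tau, D \sim N(\bar{x}, 1/(n\tau)), \quad
%          \tau \mid D \sim \Gamma(\nu/2, S/2), \quad \nu = n - 1.

% Step 1: integrate out \mu (a sum of independent normals given \tau):
X^* \mid \tau, D \sim N\!\left(\bar{x}, \frac{n+1}{n\,\tau}\right).

% Step 2: integrate out \tau against its gamma posterior:
f(x^* \mid D) \propto \int_0^\infty \tau^{(\nu+1)/2 - 1}
   \exp\!\left\{-\frac{\tau}{2}\left[S + \frac{n\,(x^*-\bar{x})^2}{n+1}\right]\right\} d\tau
 \propto \left[1 + \frac{n\,(x^*-\bar{x})^2}{(n+1)\,S}\right]^{-(\nu+1)/2}.

% Step 3: match with the t(\nu, \mu, p) density of Appendix 1:
X^* \mid D \sim t\!\left(\nu,\ \bar{x},\ \frac{n\,\nu}{(n+1)\,S}\right).
```

With n = 100 and S = 142.36, this would give $1/p = 101 \times 142.36 / 9900 \approx 1.4524$.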

REFERENCES

Bowers, Newton L., Hans U. Gerber, James C. Hickman, Donald A. Jones, and Cecil J. Nesbitt. 1986. Actuarial Mathematics. Itasca, IL: Society of Actuaries.

Cairns, Andrew J.G. 1995. Uncertainty in the Modelling Process. In Transactions of the XXV International Congress of Actuaries. Vol. 1. Leuven: Ceuterick.

De Vylder, Florian, and Marc J. Goovaerts. 1988. Recursive Calculation of Finite-Time Ruin Probabilities. Insurance, Mathematics & Economics 7(1): 1-8.

Hesselager, Ole. 1993. A Class of Conjugate Priors with Applications to Excess of Loss Reinsurance. ASTIN Bulletin 23(1): 77-95.

Hurlimann, Werner. 1993. Predictive Stop-Loss Premiums. ASTIN Bulletin 23(1): 55-76.

Hurlimann, Werner. 1995. Predictive Stop-Loss Premiums and Student's t-Distribution. Insurance, Mathematics & Economics 16(2): 151-159.

Klugman, Stuart A. 1992. Bayesian Statistics in Actuarial Science. Norwell, MA: Kluwer Academic Publishers.

Lee, Peter M. 1989. Bayesian Statistics: An Introduction. New York: Oxford University Press.

Mathiasen, Poul E. 1979. Prediction Functions. Scandinavian Journal of Statistics 6(1): 1-21.

Panjer, Harry H. 1981. Recursive Evaluation of a Family of Compound Distributions. ASTIN Bulletin 12(1): 22-26.

Panjer, Harry H., and Gordon E. Willmot. 1992. Insurance Risk Models. Schaumburg, IL: Society of Actuaries.

Schroter, Klaus J. 1991. On a Family of Counting Distributions and Recursions for Related Compound Distributions. Scandinavian Actuarial Journal (1991): 161-175.

Sundt, Bjorn. 1992. On Some Extensions of Panjer's Class of Counting Distributions. ASTIN Bulletin 22(1): 61-80.

David C. M. Dickson is an Associate Professor at the University of Melbourne. Leanna M. Tedesco is a Statistical Analyst at the ANZ Banking Group Ltd. Ben Zehnwirth is Managing Director of Insureware. The authors gratefully acknowledge financial support for this project from an Institute of Actuaries of Australia research grant and a Faculty of Economics and Commerce research grant (The University of Melbourne).

Author: Dickson, David C.M.; Tedesco, Leanna M.; Zehnwirth, Ben
Publication: Journal of Risk and Insurance
Date: Dec 1, 1998