# The impact of age on employment tenure: results from an employment discrimination case.

I. Introduction

In a recent Equal Employment Opportunity Commission class action suit, a company was charged with discriminating against applicants age 40 and over. During the first phase of the trial the company was found guilty of age discrimination. The second phase of the trial involved determining damages to be awarded to a class of approximately 152 members age 40 or over. Part of the damage calculation required estimating the amount of time members of the class would have been employed, absent the discrimination.

The employer in this case argued that there was a link between age and job tenure, and that if older workers had been hired, the company's experience suggests that they would not have been employed as long as the pool of current younger workers. Therefore, in calculating damages, it is important to examine the tenure/age relationship.

Information about current and previous employees is used to estimate the relationship between age and tenure. For previous employees, company records indicate the date of hire and the date of termination. For those employees, tenure is calculated as the number of weeks from hire to termination. For current employees, however, the calculation is not as straightforward. The date of hire is known, but since each is still employed at the time of sampling, termination will occur at some unknown time in the future. Measuring tenure from the date of hire to the date of sampling therefore understates each current employee's true job tenure. This is a classic censoring problem.

One way to address the censoring problem may be to remove the censored observations from the sample. This would leave a subsample of just uncensored observations, each with known job tenure. However, if the censored observations come from a different population than the uncensored observations, using only the uncensored observations for statistical analysis will lead to biased predictions. This means that a statistical procedure will have to account for censoring. Fortunately, a procedure exists to estimate job tenures when censoring is an issue. This procedure is called duration modeling.

A duration model is developed that accounts for censored data. The model allows for specifying tenure as a function of age. The model is estimated for a sample of 170 current and previous employees. The results indicate that tenure is decreasing in age. Someone starting employment at age 24, for example, would have an expected job tenure of 166 weeks. Someone starting employment at age 40 would have an expected tenure of 129 weeks.

The next section of this paper describes duration modeling. This is a statistical technique that can be used to estimate tenure as a function of age when censoring occurs. Section III describes the data related to this particular case. Results of the estimation are shown in section IV. These results are compared to results generated either by omitting censored observations or by using all observations but not accounting for censoring. This is followed by a conclusion.

II. Duration Estimation with a Censored Sample

Duration models can be used to estimate tenure when there is a censoring problem. The simplest form of the duration model makes tenure solely a function of time. Let T represent someone's tenure or duration in a job. Some employees will have a relatively short duration in a job. Others may have a relatively long duration in a job. Therefore, the duration or tenure variable, T, has some distribution associated with it. Let f(t) be the probability distribution associated with T.

The associated cumulative distribution function is

$$F(t) = \int_0^t f(s)\,ds = \mathrm{Prob}(T \le t).$$

This is just the probability that a particular individual's duration in a job is less than t weeks. Conversely, the probability that someone survives at least t weeks in a job is S(t) = 1 - F(t). Not surprisingly, this is called a survival function. Putting the probability function and the survival function together, it is possible to estimate the probability that someone who has already lasted t weeks in a job leaves before the next week is out. This is called the hazard function, h(t). If a particular individual's duration is T weeks, h(t) = Prob(t < T < t+1 | T > t); that is, h(t) is the probability that someone who has lasted t weeks in a job does not last t+1 weeks. Some straightforward manipulation shows that h(t) = f(t)/S(t).(1)
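The manipulation referred to above can be sketched as follows, treating time as continuous so that the hazard is a limiting rate over a short interval. The interval length $\Delta$ is a symbol introduced here for illustration; the weekly definition in the text corresponds to taking $\Delta$ equal to one week.

$$
\begin{aligned}
h(t) &= \lim_{\Delta \to 0} \frac{\mathrm{Prob}(t \le T < t+\Delta \mid T \ge t)}{\Delta} \\
     &= \lim_{\Delta \to 0} \frac{F(t+\Delta) - F(t)}{\Delta \, S(t)} \\
     &= \frac{f(t)}{S(t)}.
\end{aligned}
$$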

The simplest duration models treat the hazard function as a constant, h(t) = h. The probability of someone ending their employment by the end of the week is always the same, regardless of how many weeks they have already been employed. This necessarily means that the survival function is exponential, $S(t) = e^{-ht}$. Since h(t) = f(t)/S(t), a constant hazard rate implies that $f(t) = he^{-ht}$, and maximum likelihood techniques can be used to solve for h. Once h is estimated, the tenure probability function can be determined and the expected tenure calculated.

A more general form allows the hazard function to be either monotonically increasing or decreasing over time. The typical distribution used for this type of hazard function is the Weibull distribution. The Weibull is a commonly used distribution that meets duration model requirements and also creates a monotonically increasing or decreasing hazard function.(2) It implies that the hazard function takes the form $h(t) = hp(ht)^{p-1}$. If p in the Weibull distribution is greater than one, the probability of leaving a job by the end of the week increases as current tenure increases. If p is less than one, the probability of leaving a job by the end of the week decreases as current tenure increases. It is readily apparent that the constant hazard function is a special case of the Weibull hazard function with p = 1. The Weibull hazard leads to the survival function $S(t) = e^{-(ht)^{p}}$.
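Taking the Weibull hazard $h(t) = hp(ht)^{p-1}$ as given, a short derivation shows how the direction of the hazard depends on p and how the survival function follows. It uses only the relation h(t) = f(t)/S(t), which implies $h(t) = -d\ln S(t)/dt$, and assumes h > 0 and p > 0.

$$
\begin{aligned}
\frac{d}{dt}h(t) &= h^{2}\,p\,(p-1)\,(ht)^{p-2}, \\
S(t) &= \exp\!\left(-\int_0^t h(s)\,ds\right) = \exp\!\left(-(ht)^{p}\right).
\end{aligned}
$$

The sign of the first expression is the sign of p - 1, so the hazard rises over time when p > 1, falls when p < 1, and is constant when p = 1.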

Making tenure a function of age is a straightforward modification of the basic model.(3) If A represents age, then instead of treating h in the hazard function as a constant, it becomes $h(A) = e^{-(\beta_0 + \beta_1 A)}$. The hazard function becomes $h(A)$ for the exponential distribution, with density $h(A)e^{-h(A)t}$, and $h(A)p\{h(A)t\}^{p-1}$ for the Weibull. This necessarily implies forms for the survival function, S(A,t), and the probability distribution function, f(A,t). The probability distribution f(A,t) gives the probability distribution for tenure as a function of age. Hence, estimating the probability distribution as a function of $\beta_0$, $\beta_1$ and the Weibull parameter p is at the heart of duration modeling. Maximum likelihood techniques can be used to solve for the parameters $\beta_0$, $\beta_1$ and p. Censoring can be dealt with very easily in a duration model. It just alters the likelihood function: an uncensored observation contributes its density f(A,t), while a censored observation contributes only its survival probability S(A,t). Fortunately, statistical programs such as LIMDEP are available for censored maximum likelihood estimation.
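To make the role of censoring in the likelihood concrete, here is a minimal sketch of a Weibull log-likelihood with right-censoring. It is not the LIMDEP routine used for the paper. The function name, argument names, and toy data are hypothetical, and the link $h(A) = e^{-(\beta_0 + \beta_1 A)}$ is an assumption chosen because it is consistent with the expected tenures reported in the results.

```python
import math

def weibull_censored_loglik(beta0, beta1, p, ages, weeks, censored):
    """Log-likelihood of a Weibull duration model with right-censoring.

    Assumed link (not confirmed by the source): h(A) = exp(-(beta0 + beta1*A)).
    Uncensored spells contribute ln f(t) = ln h(t) + ln S(t);
    censored spells contribute only ln S(t).
    """
    total = 0.0
    for age, t, is_censored in zip(ages, weeks, censored):
        h = math.exp(-(beta0 + beta1 * age))
        log_survival = -((h * t) ** p)            # ln S(t) = -(h t)^p
        if is_censored:
            total += log_survival                 # still employed at sampling
        else:
            log_hazard = math.log(h * p) + (p - 1.0) * math.log(h * t)
            total += log_hazard + log_survival    # left the job at week t
    return total

# Toy data: one completed spell of 50 weeks, one ongoing spell of 200 weeks.
toy_ages = [30, 45]
toy_weeks = [50.0, 200.0]
toy_censored = [False, True]
toy_value = weibull_censored_loglik(math.log(100.0), 0.0, 1.0,
                                    toy_ages, toy_weeks, toy_censored)
print(toy_value)
```

Maximizing this function over the three parameters, for example by passing its negative to a general-purpose numerical optimizer, would give the censored maximum likelihood estimates. Treating every spell as uncensored amounts to always taking the second branch.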

Once the parameters of the duration model have been calculated, it is straightforward to calculate the expected value of tenure, given age. For the exponential hazard function, the expected value of tenure, given age, is just

(1) $E[T \mid A] = 1/h(A) = e^{\beta_0 + \beta_1 A}$.

For the Weibull hazard function, the expected value of tenure, given age, is

(2) $E[T \mid A] = e^{\beta_0 + \beta_1 A}\,\Gamma((1/p)+1)$.

Note that $\Gamma((1/p)+1)$ is the gamma function evaluated at (1/p)+1, where p is the parameter estimated as part of the Weibull function.(4)
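The appearance of the gamma function can be seen from a short derivation. It assumes the Weibull survival function $S(t) = e^{-(ht)^{p}}$ with h > 0 and p > 0, and uses the fact that the mean of a non-negative duration equals the integral of its survival function. The substitution variable u is introduced here for illustration.

$$
\begin{aligned}
E[T] &= \int_0^\infty S(t)\,dt = \int_0^\infty e^{-(ht)^{p}}\,dt \\
     &= \frac{1}{hp}\int_0^\infty u^{(1/p)-1}e^{-u}\,du \qquad \text{with } u = (ht)^{p} \\
     &= \frac{1}{hp}\,\Gamma\!\left(\frac{1}{p}\right) = \frac{1}{h}\,\Gamma\!\left(\frac{1}{p}+1\right).
\end{aligned}
$$

Setting p = 1 gives $E[T] = 1/h$, the exponential case.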

Duration models generally express tenure as an exponential function. This effectively restricts the dependent variable to being a positive number. This is desirable when estimating employment tenures. After all, people can only be employed for positive amounts of time. They cannot have negative job tenures. Hence this is a second advantage of duration modeling over linear modeling.

III. Data

The sample consists of 170 current and previous employees. Age is the employee's age at the date of hire. For uncensored observations, tenure is the number of weeks from the date of hire to the date of termination. For censored observations, tenure is the number of weeks from date of hire to date of sampling.

The average age at date of hire for the entire sample is 31 years old. For workers under 40, the average age at date of hire is 29. For workers 40 and over, the average age at date of hire is 45. For the pool of applicants that were discriminated against, the average age at date of application is 47.

The average tenure across the entire sample is almost 85 weeks. In this sample there are 127 non-censored and 43 censored observations. If the censored observations come from a different population than the uncensored observations, then using only the uncensored observations will lead to biased predictions. Sample statistics indicate that the populations may indeed be different. The mean tenure for the uncensored observations is 51 weeks. For the censored observations, the mean recorded tenure is 185 weeks. The two means are statistically different.(5, 6) Otherwise, the two samples are hard to distinguish. Workers 40 or over make up about 7% of each sample. There is no clear indicator that a worker 40 or over would have been more or less likely to be in the censored group than a worker under 40.

IV. Results

Table 1 shows results of duration model estimation. The first row shows results when the hazard function is exponential. In this case, the parameters of interest are $\beta_0$ and $\beta_1$, where $\beta_0$ is a constant and $\beta_1$ is the coefficient on age. The results show that the coefficient on age is negative and statistically significant at the 95% confidence level using a two-tailed test. Older workers have a shorter expected tenure than do younger workers.

The second row of Table 1 shows results when the Weibull distribution is used. In this case, the parameters of interest are $\beta_0$, $\beta_1$ and p, where, as before, $\beta_0$ is a constant, $\beta_1$ is the coefficient on age and p is the Weibull parameter. The results show that the coefficient on age is negative and statistically significant. Once again, older workers have a shorter expected tenure than do younger workers.

The exponential distribution results are just a special case of the Weibull distribution results with p assumed equal to one. The Weibull results show that the estimated value of p is 0.62 and statistically different from one. A 95% confidence interval on p ranges from 0.50 to 0.73. Because this interval excludes one, it seems reasonable to prefer the Weibull to the exponential results.
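The comparison of p with one can be checked with simple arithmetic. The sketch below assumes that the parenthesized value 0.057 reported beside p in Table 1 is its standard error and that a normal approximation applies; neither assumption is stated in the source. Under those assumptions the interval comes out close to the one reported.

```python
# Reported Weibull shape estimate from Table 1. The value 0.057 is ASSUMED
# to be the standard error of p; the table does not label it.
p_hat, se_p = 0.62, 0.057
z_crit = 1.96  # two-sided 95% critical value of the standard normal

ci_low = p_hat - z_crit * se_p
ci_high = p_hat + z_crit * se_p
z_stat = (p_hat - 1.0) / se_p  # distance of p from one, in standard errors

print(f"approximate 95% interval for p: ({ci_low:.2f}, {ci_high:.2f})")
print(f"z statistic for the hypothesis p = 1: {z_stat:.2f}")
```

The whole interval lies below one, which is what supports a hazard that declines with time on the job.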

Two measures can be developed to see how well the estimates fit the actual tenures for the uncensored observations.(7) The first is the root mean squared error, RMSE, measured as

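$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(T_i - \hat{T}_i\right)^{2}},$$

where $T_i$ is the actual tenure of uncensored observation i, $\hat{T}_i$ is its predicted tenure, and n is the number of uncensored observations.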

The other is the mean proportionate error, MPE, measured as

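$$\mathrm{MPE} = \frac{1}{n}\sum_{i=1}^{n}\frac{\left|T_i - \hat{T}_i\right|}{T_i},$$

with $T_i$ and $\hat{T}_i$ the actual and predicted tenures of the n uncensored observations.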

Table 1 shows that when the exponential distribution is used for the hazard function, the RMSE is 87; that is, the typical prediction error is about 87 weeks. The MPE is 15, suggesting that the mean error is about 15 times the associated tenure. When the Weibull distribution is used, the RMSE is 115 and the MPE is 20. Both of these measures suggest that there is quite a bit of error associated with the predictions.
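The two error measures can be sketched in a few lines. The definitions below are assumptions: RMSE is taken in its usual form, and MPE is taken to be the mean of the absolute error divided by actual tenure, which matches the verbal description but may differ from the exact formula used for Table 1. The function names and toy figures are hypothetical and are not the case data.

```python
import math

def rmse(actual, predicted):
    """Root mean squared error of predicted versus actual tenures."""
    n = len(actual)
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / n)

def mpe(actual, predicted):
    """Mean proportionate error: mean of |error| / actual tenure (assumed form)."""
    n = len(actual)
    return sum(abs(a - p) / a for a, p in zip(actual, predicted)) / n

# Toy uncensored tenures in weeks, all given the same predicted tenure.
actual_weeks = [2.0, 10.0, 50.0, 200.0]
predicted_weeks = [120.0, 120.0, 120.0, 120.0]

print(rmse(actual_weeks, predicted_weeks))
print(mpe(actual_weeks, predicted_weeks))
```

In this toy example the two-week spell alone contributes a proportionate error of 59, which illustrates how a handful of very short completed tenures can push the MPE far above one even when the RMSE is moderate.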

The relatively high errors suggest that there may be other unmeasured variables that influence tenure as well. Theory suggests several possibilities. Unfortunately the economic analysis of damages in this particular case was initiated rather late, after discovery was already complete. This precluded including additional variables. If anything, this suggests the importance of starting economic analysis before discovery closes.

It is interesting to see the expected tenures that result from each of these models. Since the results indicate that tenure is an inverse function of age, expected tenure is calculated for individuals starting employment from ages 24 to 61.(8) Table 2 shows expected tenures using the formulas in the previously developed equations (1) and (2), and the parameter results in Table 1. Using the Weibull results, someone first employed at age 24 would have an expected tenure of 166 weeks. A 40-year-old would have an expected tenure of 129 weeks, about 77% of the 24-year-old's expected tenure. Tenure drops by about two weeks for every additional year of age at hire. Someone first hired at age 50, for example, has an expected tenure of 110 weeks, just over two years. A 61-year-old has an expected tenure of 92 weeks, or 1.77 years.(9)

If workers starting at age 39 or younger are considered as a group, their sample average starting age is 29. A 29-year-old has an expected tenure of 153 weeks. The average age at application for workers 40 or older is 47. A 47-year-old would have an expected tenure of 115 weeks, about 25% less than the younger workers.

The exponential results lead to lower expected tenures. A person that was hired at age 24 has an expected tenure of 129 weeks. A 40-year-old's expected tenure drops to 70% of that, about 92 weeks. A 50-year-old has an expected tenure of 74 weeks. A 47-year-old has an expected tenure about 33% lower than a 29-year-old.
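The expected tenures quoted above can be checked from the point estimates in Table 1. The sketch assumes expected tenure equals exp(constant + age coefficient x age) for the exponential model, and that same quantity multiplied by the gamma function evaluated at (1/p)+1 for the Weibull model. The function names are hypothetical. Because the published coefficients are rounded, the output should agree with the text only to within about a week.

```python
import math

# Point estimates as reported in Table 1.
EXPONENTIAL = {"const": 5.36, "age": -0.021}
WEIBULL = {"const": 5.13, "age": -0.016, "p": 0.62}

def expected_tenure_exponential(age):
    """Expected weeks of tenure under the exponential model (assumed form)."""
    return math.exp(EXPONENTIAL["const"] + EXPONENTIAL["age"] * age)

def expected_tenure_weibull(age):
    """Expected weeks of tenure under the Weibull model (assumed form)."""
    scale = math.exp(WEIBULL["const"] + WEIBULL["age"] * age)
    return scale * math.gamma(1.0 / WEIBULL["p"] + 1.0)

for age in (24, 29, 40, 47, 50, 61):
    print(age,
          round(expected_tenure_weibull(age), 1),
          round(expected_tenure_exponential(age), 1))
```

Python's `math.gamma` plays the role of the spreadsheet expression described in footnote (4).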

As a matter of curiosity, it is interesting to compare the previous results to results that are generated when two potential mistakes are made. The first error omits censored observations from the data. With this omission, the duration model is estimated with just the 127 uncensored observations. This assumes the censored and uncensored observations come from the same population. The second error includes all 170 observations. However, no adjustment is made in the likelihood function for censoring. All 170 observations are treated as if they are uncensored data.
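A deliberately simplified calculation shows the direction of both mistakes. It assumes a constant hazard with no age effect, so that the maximum likelihood estimate of mean tenure has a closed form: total weeks observed divided by the number of observed departures. It uses only the rounded sample means reported in Section III. It is not the Weibull model with age that underlies Table 3, and the variable names are hypothetical.

```python
# Summary figures reported in Section III (rounded means, in weeks).
n_uncensored, mean_uncensored = 127, 51.0   # completed spells
n_censored, mean_censored = 43, 185.0       # spells still in progress

weeks_uncensored = n_uncensored * mean_uncensored
weeks_censored = n_censored * mean_censored
total_weeks = weeks_uncensored + weeks_censored

# Mistake 1: drop the censored observations entirely.
mean_dropping_censored = weeks_uncensored / n_uncensored

# Mistake 2: keep all 170 observations but treat each as a completed spell.
mean_ignoring_censoring = total_weeks / (n_uncensored + n_censored)

# Censoring-aware exponential estimate: all observed weeks count as exposure,
# but only the 127 observed departures count as exits.
mean_accounting_for_censoring = total_weeks / n_uncensored

print(mean_dropping_censored)
print(round(mean_ignoring_censoring, 1))
print(round(mean_accounting_for_censoring, 1))
```

Even in this stripped-down setting, both mistakes yield a lower mean tenure than the estimate that treats ongoing spells as lower bounds.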

Table 3 shows estimation results when the censored observations are omitted from the sample and when they are included but treated the same as uncensored observations. Both models are estimated assuming the Weibull distribution for the hazard function.(10) Both models show that tenure is a decreasing function of age. Both have a smaller intercept and a more negative slope than the Weibull results in Table 1 that account for censoring.

Notice that the error measures are somewhat smaller in Table 3 than Table 1. Given that the uncensored group has a mean tenure of 51 weeks and the censored sample has a mean tenure of 185 weeks, this reduction in error for the model estimated with just the 127 uncensored observations is understandable. In effect, regression on the subsample of uncensored observations fits a function through that subsample. However, because the observations have a much lower mean tenure than do the censored observations, adding the censored observations in effect forces an upward shift in the fitted line. This is reflected in the increase in the intercept and flattening of the slope between the duration results in Tables 1 and 3. The population model fits the sample as a whole better, but it fits the subsample of uncensored observations worse.

The relative errors when comparing the model that accounts for censoring to the model that treats all 170 observations as uncensored are reasonable as well. Censoring changes the likelihood function. Fitting a model to censored data is like setting a minimum boundary for censored tenures at their observed values. Since the censored observations have longer tenures than the uncensored observations, it raises the function. But this means that it fits the 127 uncensored observations with relatively shorter tenures less well.

Table 4 shows the expected tenures from the models for workers age 40 and older. Column (1) repeats the information from Table 2. These are the expected tenures using the Weibull distribution when accounting for censoring. Column (2) shows expected tenures when parameters are calculated excluding censored observations. Column (3) shows expected tenures when parameters are calculated using all 170 observations but not accounting for censoring. It is clear from Table 4 that tenures are grossly underestimated when censoring is not accounted for. Hence, it is important that correct statistical methods be used to estimate the age-tenure relationship.

V. Conclusion

In a recent Equal Employment Opportunity Commission class action suit, a firm was found guilty of age discrimination against workers age 40 and over. Estimating damages required estimating the period of time each member of the class would have worked had that member been hired. The employer alleged that the job was physically demanding and that if the firm had hired workers age 40 or over, those workers would have had job tenures significantly shorter than those of younger workers.

A sample of previous and current employees is developed showing the age and tenure of each employee. The employees in the sample fall into two groups. One group consists of people who had been employed previously but left before the sampling occurred. Individuals in this group have known tenures on the job. The second group consists of people that still remain at the time of the sampling. For this group, the starting date is known, but the ending date is unknown. This is a classic censoring problem.

Censoring is addressed within a duration model. Duration models estimate the underlying function that determines the probability that someone will leave a job after a certain amount of time, given they have been employed up to that point in time. For this particular application, the probability is hypothesized to be a function of age. Maximum likelihood techniques are used to estimate the parameters of the function when there are censored and uncensored observations in the data. Results indicate that tenure is an inverse function of age. The expected tenure of a worker that starts at age 40 would be about 23% lower than the expected tenure of a worker that starts at age 24. Additional results indicate that significant mis-estimating of the tenure-age relationship can occur if censoring is not accounted for properly. Hence it is important that theoretically and empirically justified methods be used to estimate expected tenures.

(1) For the appropriate manipulation, and for a description of duration modeling in general, see Greene (1997), pp. 986-988. For applications of duration models to employment, see Taylor (1999) or Dolton and van der Klaauw (1995).

(2) For more specifics on the Weibull distribution and its use in duration modeling, see Kalbfleisch and Prentice (1980) or Cox and Oakes (1985).

(3) The model is developed and results are shown with tenure a linear function of age. A quadratic form was tried with tenure a function of age and age squared. The squared term was statistically insignificant and tests indicate that the results were not statistically different from results using just a linear term.

(4) With Microsoft's EXCEL program, the function GAMMALN(x) returns the natural log of the gamma function evaluated at any value x. The command exp(GAMMALN((1/p)+1)) returns the desired value of the gamma function for any estimated value of p.

(5) Since the tenures for the censored group are underestimates, their true mean will be greater than 185 weeks.

(6) Testing for the equality of two means when the population variances may be unequal, a difficulty that is most acute when the samples are small, is known as the "Behrens-Fisher" problem. Nayak and Gastwirth (1997) show a statistical test to address this problem. For another small sample application to statistical analysis in employment litigation, see Piette and White (1999).

(7) Errors are estimated for only the uncensored observations, as tenures for censored observations are truncated at the time of sampling.

(8) The ages of 24 and 61 form the bounds of ages in the raw data used for the estimation.

(9) The actual employment data used to generate these results show that one person was hired at age 59 and worked for 138 weeks (2.6 years). This employee was still working at the time of sampling. The oldest employee in the sample was hired at age 61 and worked 90 weeks before leaving. This is just two weeks less than the expected tenure of a 61-year-old.

References

Cox, D., and D. Oakes, Analysis of Survival Data, New York: Chapman and Hall, 1985.

Dolton, P., and W. van der Klaauw, "Leaving Teaching in the UK: A Duration Analysis," 1995, The Economic Journal, 105, 431-444.

Greene, William H., Econometric Analysis, 3rd Edition, Upper Saddle River, NJ: Prentice-Hall, Inc., 1997.

Kalbfleisch, J., and R. Prentice, The Statistical Analysis of Failure Time Data, New York: John Wiley and Sons, 1980.

Nayak, Tapan K., and Joseph L. Gastwirth, "The Peters-Belson Approach to Measures of Economic and Legal Discrimination," in Norman L. Johnson and N. Balakrishnan, editors, Advances in the Theory and Practice of Statistics, New York: John Wiley and Sons, 1997.

Piette, Michael J., and Paul F. White, "Approaches for Dealing with Small Sample Sizes in Employment Discrimination Litigation," 1999, Journal of Forensic Economics, 12(1), 43-56.

Taylor, Mark P., "Survival of the Fittest? An Analysis of Self-Employment Duration in Britain," The Economic Journal, 1999, 109, C140-C155.

David I. Rosenbaum, Professor of Economics, Department of Economics, University of Nebraska-Lincoln, Lincoln, NE, drosenbaum@unl.edu.

Table 1. Estimation of Tenure as a Function of Age Using Censored Sample Techniques

| Predictor | Formula for Predictor | Root Mean Squared Error (RMSE) | Mean Proportionate Error (MPE) |
|---|---|---|---|
| Duration Model using Exponential Distribution | Constant = 5.36(\*) (0.286); Age = -0.021(\*) (0.009) | 87 | 15 |
| Duration Model using Weibull Distribution | Constant = 5.13(\*) (0.627); Age = -0.016(\*) (0.002); p = 0.62(+) (0.057) | 115 | 20 |

The last two columns are measures of errors in prediction.

(\*) = Statistically different from zero at the 95% confidence level using a two-tailed test.

(+) = Statistically different from one at the 95% confidence level using a one-tailed test.

The second row of Table 1 shows results when the Weibull distribution is used. In this case, the parameters of interest are $\beta_0$, $\beta_1$ and p, where, as before, $\beta_0$ is a constant, $\beta_1$ is the coefficient on age and p is the Weibull parameter. The results show that the coefficient on age is negative and statistically significant. Once again, older workers have a shorter expected tenure than do younger workers.

The exponential distribution results are just a special case of the Weibull distribution results with p assumed equal to one. The Weibull results show that the estimated value of p is 0.62 and statistically different from one. A 95% confidence interval on p ranges from 0.50 to 0.73. Because this interval excludes one, it seems reasonable to prefer the Weibull to the exponential results.
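The interval on p can be checked with a few lines of arithmetic. The sketch below is illustrative only: it assumes the value in parentheses under p in Table 1 (0.057) is a standard error and that a normal (Wald) approximation is used, neither of which is stated explicitly in the text. The function names are hypothetical.

```python
# Reported Weibull estimate of p and the value in parentheses from Table 1.
# ASSUMPTION: 0.057 is a standard error and a normal approximation applies.
P_HAT = 0.62
SE_P = 0.057
Z_95 = 1.96  # two-sided 95% critical value of the standard normal


def wald_interval(estimate, std_err, z=Z_95):
    """Return (lower, upper) bounds of a normal-approximation interval."""
    half_width = z * std_err
    return estimate - half_width, estimate + half_width


def z_statistic(estimate, std_err, null_value):
    """Distance between the estimate and a null value, in standard errors."""
    return (estimate - null_value) / std_err


if __name__ == "__main__":
    low, high = wald_interval(P_HAT, SE_P)
    print(f"95% interval on p: {low:.3f} to {high:.3f}")
    print(f"z for H0: p = 1: {z_statistic(P_HAT, SE_P, 1.0):.2f}")
```

Under these assumptions the interval lies well below one. Small differences in the last digit from the bounds reported in the text would be expected, since Table 1 gives rounded estimates.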

Two measures can be developed to see how well the estimates fit the actual tenures for the uncensored observations.(7) The first is the root mean squared error, RMSE, measured as

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII].

The other is the mean proportionate error, MPE, measured as

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII].

Table 1 shows that when the exponential distribution is used for the hazard function, the RMSE is 87, so the typical prediction error is about 87 weeks. The MPE for the exponential model is 15, suggesting that the mean error is about 15 times the associated tenure. When the Weibull distribution is used, the RMSE is 115 and the MPE is 20. Both of these measures suggest that there is quite a bit of error associated with the predictions.
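To make the two error measures concrete, the following minimal sketch computes them for a handful of made-up tenures. Because the original formulas are not reproducible here, the definitions are assumptions: RMSE is taken in its standard form, and MPE is taken to be the mean of the absolute error divided by the actual tenure, which matches the description of the error being a multiple of the associated tenure. The function names and the sample numbers are hypothetical and are not drawn from the case data.

```python
import math


def root_mean_squared_error(actual, predicted):
    """Square root of the mean squared difference between actual and predicted."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def mean_proportionate_error(actual, predicted):
    """ASSUMED definition: mean of |actual - predicted| / actual."""
    ratios = [abs(a - p) / a for a, p in zip(actual, predicted)]
    return sum(ratios) / len(ratios)


if __name__ == "__main__":
    # Hypothetical uncensored tenures (weeks) and model predictions.
    actual_weeks = [10, 50, 100]
    predicted_weeks = [120, 110, 90]
    print(root_mean_squared_error(actual_weeks, predicted_weeks))
    print(mean_proportionate_error(actual_weeks, predicted_weeks))
```

In this made-up example the worker with a 10-week tenure contributes a proportionate error of 11 on its own. Under the assumed definition, a few very short tenures can therefore push the MPE well above one even when the RMSE is of the same order as the mean tenure.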

The relatively high errors suggest that there may be other unmeasured variables that influence tenure as well. Theory suggests several possibilities. Unfortunately the economic analysis of damages in this particular case was initiated rather late, after discovery was already complete. This precluded including additional variables. If anything, this suggests the importance of starting economic analysis before discovery closes.

It is interesting to see the expected tenures that result from each of these models. Since the results indicate that tenure is an inverse function of age, expected tenure is calculated for individuals starting employment from ages 24 to 61.(8) Table 2 shows expected tenures using the formulas in the previously developed equations (1) and (2), and the parameter results in Table 1. Using the Weibull results, someone first employed at age 24 would have an expected tenure of 166 weeks. A 40-year-old would have an expected tenure of 129 weeks, about 77% of the 24-year-old's expected tenure. Expected tenure drops by about two weeks for every additional year of age at hire. Someone who was first hired at age 50, for example, has an expected tenure of 110 weeks, just over two years. A 61-year-old has an expected tenure of 92 weeks, or 1.77 years.(9)

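Table 2. Expected Job Tenures by Age

| Age | Expected Job Tenure Using Weibull Distribution | Expected Job Tenure Using Exponential Distribution |
|---|---|---|
| 24 | 166 | 129 |
| 25 | 164 | 126 |
| 26 | 161 | 123 |
| 27 | 158 | 121 |
| 28 | 156 | 118 |
| 29 | 153 | 116 |
| 30 | 151 | 113 |
| 31 | 149 | 111 |
| 32 | 146 | 109 |
| 33 | 144 | 106 |
| 34 | 142 | 104 |
| 35 | 139 | 102 |
| 36 | 137 | 100 |
| 37 | 135 | 98 |
| 38 | 133 | 96 |
| 39 | 131 | 94 |
| 40 | 129 | 92 |
| 41 | 127 | 90 |
| 42 | 125 | 88 |
| 43 | 123 | 86 |
| 44 | 121 | 84 |
| 45 | 119 | 83 |
| 46 | 117 | 81 |
| 47 | 115 | 79 |
| 48 | 113 | 78 |
| 49 | 111 | 76 |
| 50 | 110 | 74 |
| 51 | 108 | 73 |
| 52 | 106 | 71 |
| 53 | 104 | 70 |
| 54 | 103 | 68 |
| 55 | 101 | 67 |
| 56 | 100 | 66 |
| 57 | 98 | 64 |
| 58 | 96 | 63 |
| 59 | 95 | 62 |
| 60 | 93 | 60 |
| 61 | 92 | 59 |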
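The entries in Table 2 can be approximated from the rounded parameter estimates in Table 1. The sketch below assumes that equations (1) and (2) take the standard form in which expected tenure equals exp($\beta_0$ + $\beta_1$ x age) multiplied by the gamma function evaluated at (1/p) + 1, with p equal to one in the exponential case. That form is consistent with footnote 4, but it is an assumption here, and the function and variable names are hypothetical.

```python
import math

# Rounded estimates as reported in Table 1; p = 1 gives the exponential case.
EXPONENTIAL = {"constant": 5.36, "age": -0.021, "p": 1.0}
WEIBULL = {"constant": 5.13, "age": -0.016, "p": 0.62}


def expected_tenure(params, age):
    """Expected tenure in weeks under the ASSUMED form
    exp(b0 + b1 * age) * Gamma(1/p + 1)."""
    scale = math.exp(params["constant"] + params["age"] * age)
    # exp(lgamma(x)) plays the role of exp(GAMMALN(x)) in footnote 4.
    gamma_term = math.exp(math.lgamma(1.0 / params["p"] + 1.0))
    return scale * gamma_term


if __name__ == "__main__":
    for age in (24, 40, 50, 61):
        weibull_weeks = expected_tenure(WEIBULL, age)
        exponential_weeks = expected_tenure(EXPONENTIAL, age)
        print(age, round(weibull_weeks), round(exponential_weeks))
```

Because the published coefficients are rounded, values computed this way may differ from Table 2 by a week or so at some ages.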

If workers starting at age 39 or younger are considered as a group, their sample average starting age is 29. A 29-year-old has an expected tenure of 153 weeks. The average age at application for workers 40 or older is 47. A 47-year-old would have an expected tenure of 115 weeks, about 25% less than the younger workers.

The exponential results lead to lower expected tenures. A person who was hired at age 24 has an expected tenure of 129 weeks. A 40-year-old's expected tenure drops to about 71% of that, about 92 weeks. A 50-year-old has an expected tenure of 74 weeks. A 47-year-old has an expected tenure about 32% lower than a 29-year-old.

As a matter of curiosity, it is interesting to compare the previous results to results that are generated when two potential mistakes are made. The first error omits censored observations from the data. With this omission, the duration model is estimated with just the 127 uncensored observations. This assumes the censored and uncensored observations come from the same population. The second error includes all 170 observations. However, no adjustment is made in the likelihood function for censoring. All 170 observations are treated as if they are uncensored data.

Table 3 shows estimation results when the censored observations are omitted from the sample and when they are included but treated the same as uncensored observations. Both models are estimated assuming the Weibull distribution for the hazard function.(10) Both models show that tenure is a decreasing function of age. Both have a smaller intercept and a more negative slope than the Weibull results in Table 1 that account for censoring.

Table 3. Estimation of Tenure as a Function of Age Using Censored Sample Techniques

| Predictor | Formula for Predictor | Measure of Errors in Prediction: Root Mean Squared Error (RMSE) [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] | Measure of Errors in Prediction: Mean Proportionate Error (MPE) [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] |
|---|---|---|---|
| Duration Model using just the 127 Uncensored Observations | Constant = 4.53(**) (0.501); Age = -0.025(*) (0.016); p = 0.77(+) (0.058) | 62 | 7 |
| Duration Model using all 170 Observations but not Accounting for Censoring | Constant = 5.06(**) (0.447); Age = -0.026(**) (0.014); p = 0.77(+) (0.057) | 70 | 11 |

(**) = Statistically different from zero at the 95% confidence level using a two-tailed test.

(*) = Statistically different from zero at the 90% confidence level using a two-tailed test.

(+) = Statistically different from one at the 95% confidence level using a one-tailed test.

Notice that the error measures are somewhat smaller in Table 3 than in Table 1. Given that the uncensored group has a mean tenure of 51 weeks and the censored group has a mean recorded tenure of 185 weeks, this reduction in error for the model estimated with just the 127 uncensored observations is understandable. In effect, estimation on the subsample of uncensored observations fits a function through that subsample. However, because the uncensored observations have a much lower mean tenure than do the censored observations, adding the censored observations in effect forces an upward shift in the fitted line. This is reflected in the higher intercept and flatter slope of the Weibull results in Table 1 relative to the results in Table 3. The full-sample model that accounts for censoring fits the sample as a whole better, but it fits the subsample of uncensored observations worse.

The relative errors when comparing the model that accounts for censoring to the model that treats all 170 observations as uncensored are reasonable as well. Censoring changes the likelihood function. Fitting a model to censored data is like setting a minimum boundary for censored tenures at their observed values. Since the censored observations have longer recorded tenures than the uncensored observations, accounting for censoring raises the fitted function. But this means that it fits the 127 uncensored observations, with their relatively shorter tenures, less well.
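The way censoring changes the likelihood function can be sketched in code. The example below is a minimal illustration, not the estimation routine used in the case. It assumes the common Weibull parameterization with lambda = exp(-($\beta_0$ + $\beta_1$ x age)) and survival function exp(-(lambda x t)^p). An uncensored observation contributes its log density, while a censored observation contributes only the log probability of surviving past its recorded tenure. The function name, the flag, and the sample records are hypothetical.

```python
import math


def weibull_log_likelihood(b0, b1, p, records, account_for_censoring=True):
    """Log-likelihood for records of (tenure_weeks, age, censored).

    ASSUMED parameterization: lam = exp(-(b0 + b1 * age)),
    S(t) = exp(-(lam * t) ** p).
    """
    total = 0.0
    for tenure, age, censored in records:
        lam = math.exp(-(b0 + b1 * age))
        cumulative_hazard = (lam * tenure) ** p
        log_survival = -cumulative_hazard
        log_density = (
            math.log(p) + math.log(lam)
            + (p - 1.0) * math.log(lam * tenure)
            - cumulative_hazard
        )
        if censored and account_for_censoring:
            # Tenure is known only to exceed the recorded value.
            total += log_survival
        else:
            # Recorded tenure is treated as the completed tenure.
            total += log_density
    return total


if __name__ == "__main__":
    # Hypothetical records: (tenure in weeks, age at hire, still employed?).
    sample = [(40, 30, False), (60, 45, False), (190, 28, True)]
    print(weibull_log_likelihood(5.13, -0.016, 0.62, sample))
    print(weibull_log_likelihood(5.13, -0.016, 0.62, sample,
                                 account_for_censoring=False))
```

Setting account_for_censoring to False mimics the second mistake described above, in which every recorded tenure is treated as complete. Maximizing the function over the three parameters with a numerical optimizer would then give estimates under either treatment.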

Table 4 shows the expected tenures from the models for workers age 40 and older. Column (1) repeats the information from Table 2. These are the expected tenures using the Weibull distribution when accounting for censoring. Column (2) shows expected tenures when parameters are calculated excluding censored observations. Column (3) shows expected tenures when parameters are calculated using all 170 observations but not accounting for censoring. It is clear from Table 4 that tenures are grossly underestimated when censoring is not accounted for. Hence, it is important that correct statistical methods be used to estimate the age-tenure relationship.

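Table 4. Expected Tenure by Age

| Age | (1) Duration Model With Censoring | (2) 127 Uncensored Observations | (3) Treating All 170 Observations As Uncensored Data |
|---|---|---|---|
| 40 | 129 | 40 | 65 |
| 41 | 127 | 39 | 63 |
| 42 | 125 | 38 | 62 |
| 43 | 123 | 37 | 60 |
| 44 | 121 | 36 | 59 |
| 45 | 119 | 35 | 57 |
| 46 | 117 | 34 | 56 |
| 47 | 115 | 33 | 54 |
| 48 | 113 | 33 | 53 |
| 49 | 111 | 32 | 51 |
| 50 | 110 | 31 | 50 |
| 51 | 108 | 30 | 49 |
| 52 | 106 | 29 | 48 |
| 53 | 104 | 29 | 46 |
| 54 | 103 | 28 | 45 |
| 55 | 101 | 27 | 44 |
| 56 | 100 | 27 | 43 |
| 57 | 98 | 26 | 42 |
| 58 | 96 | 25 | 41 |
| 59 | 95 | 25 | 40 |
| 60 | 93 | 24 | 39 |
| 61 | 92 | 24 | 38 |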

V. Conclusion

In a recent Equal Employment Opportunity Commission class action suit, a firm was found guilty of age discrimination against workers age 40 and over. Estimating damages required estimating the period of time each member of the class would have worked had that member been hired. The employer alleged that the job was physically demanding and that if the firm had hired workers age 40 or over, those workers would have had job tenures significantly shorter than those of younger workers.

A sample of previous and current employees is developed showing the age and tenure of each employee. The employees in the sample fall into two groups. One group consists of people who had been employed previously but left before the sampling occurred. Individuals in this group have known tenures on the job. The second group consists of people who were still employed at the time of the sampling. For this group, the starting date is known, but the ending date is unknown. This is a classic censoring problem.

Censoring is addressed within a duration model. Duration models estimate the underlying function that determines the probability that someone will leave a job after a certain amount of time, given that they have been employed up to that point in time. For this particular application, the probability is hypothesized to be a function of age. Maximum likelihood techniques are used to estimate the parameters of the function when there are censored and uncensored observations in the data. Results indicate that tenure is an inverse function of age. The expected tenure of a worker who starts at age 40 would be about 23% lower than the expected tenure of a worker who starts at age 24. Additional results indicate that significant misestimation of the tenure-age relationship can occur if censoring is not accounted for properly. Hence it is important that theoretically and empirically justified methods be used to estimate expected tenures.

(1) For the appropriate manipulation, and for a description of duration modeling in general, see Greene (1997), pp. 986-988. For applications of duration models to employment, see Taylor (1999) or Dolton and van der Klaauw (1995).

(2) For more specifics on the Weibull distribution and its use in duration modeling, see Kalbfleisch and Prentice (1980) or Cox and Oakes (1985).

(3) The model is developed and results are shown with tenure a linear function of age. A quadratic form was tried with tenure a function of age and age squared. The squared term was statistically insignificant and tests indicate that the results were not statistically different from results using just a linear term.

(4) With Microsoft's EXCEL program, the function GAMMALN(x) returns the natural log of the gamma function evaluated at any value x. The command exp(GAMMALN((1/p)+1)) returns the desired value of the gamma function for any estimated value of p.

(5) Since the tenures for the censored group are underestimates, their true mean will be greater than 185 weeks.

(6) Testing for the equivalence of means when the samples are small is called the "Behrens-Fisher" problem. Nayak and Gastwirth (1997) show a statistical test to address this problem. For another small sample application to statistical analysis in employment litigation, see Piette and White (1999).

(7) Errors are estimated for only the uncensored observations, as tenures for censored observations are truncated at the time of sampling.

(8) The ages of 24 and 61 form the bounds of ages in the raw data used for the estimation.

(9) The actual employment data used to generate these results show that one person was hired at age 59 and worked for 138 weeks (2.6 years). This employee was still working at the time of sampling. The oldest employee in the sample was hired at age 61 and worked 90 weeks before leaving. This is just two weeks less than the expected tenure of a 61-year-old.

References

Cox, D., and D. Oakes, Analysis of Survival Data, New York: Chapman and Hall, 1985.

Dolton, P., and W. van der Klaauw, "Leaving Teaching in the UK: A Duration Analysis," 1995, The Economic Journal, 105, 431-444.

Greene, William H., Econometric Analysis, 3rd Edition, Upper Saddle River, NJ: Prentice-Hall, Inc., 1997.

Kalbfleisch, J., and R. Prentice, The Statistical Analysis of Failure Time Data, New York: John Wiley and Sons, 1980.

Nayak, Tapan K., and Joseph L. Gastwirth, "The Peters-Belson Approach to Measures of Economic and Legal Discrimination," in Norman L. Johnson and N. Balakrishnan, editors, Advances in the Theory and Practice of Statistics, New York: John Wiley and Sons, 1997.

Piette, Michael J., and Paul F. White, "Approaches for Dealing with Small Sample Sizes in Employment Discrimination Litigation," 1999, Journal of Forensic Economics, 12(1), 43-56.

Taylor, Mark P., "Survival of the Fittest? An Analysis of Self-Employment Duration in Britain," 1999, The Economic Journal, 109, C140-C155.

David I. Rosenbaum, Professor of Economics, Department of Economics, University of Nebraska-Lincoln, Lincoln, NE, drosenbaum@unl.edu.

Author: Rosenbaum, David I.

Publication: Journal of Forensic Economics

Date: Sep 22, 2000
