Measurement error in job evaluation and the gender wage gap.
Wage equations are widely used to estimate the size of the pay gap between male- and female-dominated jobs. When jobs are the unit of observation, job wage rates are regressed on job attributes (such as skill, effort, responsibility, and working conditions) and on the percent of female incumbents in the job. The coefficient on the percent female variable estimates the size of the gender pay gap, which is often taken as a measure of pay discrimination against women.(1) These estimates frequently become the metric for upward adjustment of female-dominated job pay to redress the discrimination. Summarizing the research which uses jobs as the unit of observation, Ehrenberg  concluded that female-dominated jobs are underpaid by 15%-34% relative to male-dominated jobs. Similar findings have been reported in studies which use individuals instead of jobs as the unit of observation.
Many criticisms have been raised of these two types of studies. One common criticism is that not all relevant control variables for job attributes or individual productivity may be measured (or even measurable) so that specification error arises. If these omitted variables are correlated with the gender variable, then the estimated gender pay gap incorporates legitimate differences in productivity and/or job attributes. However, few studies have analyzed the impact of omitted factors bias empirically.
Another common criticism is that there is measurement error in the control variables which in turn leads to biases in estimating the size of the pay gap. Measured job attributes are highly subjective in nature, making measurement error a major problem. Even human capital measures of education and training are subject to error. Although one may easily measure years of completed education and experience, it is much more difficult to measure years of effective learning or achievement. Only a few studies have analyzed the size of these biases empirically.
This paper provides the first estimates of the size of this measurement error bias in the context of job level (job attribute) wage equations. To our knowledge, only two previous studies have estimated measurement error bias using individual level, human capital wage equations, and both are tangential to the major question that we address.
In particular, we find that the estimated gender pay gap is overstated by 34%-44% when measurement error bias is ignored in job level wage equation analysis. Wage adjustments to correct the gender gap in pay which adhere strictly to such biased estimates may lead to millions of dollars in over-compensation. In the future, job evaluation should correct for this bias and we suggest appropriate procedures. In addition, we provide unique evidence that previous attempts to add measures of seemingly "omitted" job attributes need not have reduced the bias in the measured pay gap. We find that the number of job attributes which enter the wage regression is artificially increased because measurement errors reduce the measured collinearity among job factors.
The next section explains factor-point pay plans and summarizes the few related studies. Section III reviews the statistical theory of regression analysis when there is measurement error in the explanatory variables. We analyze the effect of independent and correlated measurement errors between job factors on the estimated magnitude of pay discrimination. Section IV discusses the data and reports the empirical results from implementing various measurement error corrections. The final section summarizes our findings and suggests how this study could be used to improve future factor-point pay analysis.
II. BACKGROUND AND RELATED RESEARCH
More than half of the workers in the United States are paid according to a pay system at least partly based on job evaluation. These systems establish hierarchies of jobs and pay levels using information on a common set of job attributes such as skill, effort, responsibility, and working conditions. Most frequently, a factor-point system is used. For each job, points are assigned to each job factor. Higher levels of compensable factors receive higher points. Weights are assigned to each factor to reflect its relative importance to the employer. The sum of the weighted job factor points creates a ranking of jobs by which relative pay is set.(2) Such plans are widely used in both the private and public sector.
A comparable worth pay plan is a factor-point system which sets pay according to the relative value of jobs to the firm without explicitly incorporating market wage information. Market wages are ignored because comparable worth advocates view them as distorted by gender discrimination. Since factor-point systems claim to establish the relative value of jobs without the direct influence of potentially discriminatory market wages, they have become the leading mechanisms for implementing comparable worth pay plans. A set of job factors (attributes) are defined, each having a set of levels and factor points. Teams of evaluators are given the task of assigning points to each job for each factor. Typically, each team is given information about the jobs (including job descriptions) and asked to reach an agreement on assigned points in a process that is inherently very subjective. Sometimes only one team evaluates each job. However, if two or more teams independently evaluate a set of jobs, then measurement error bias may be evaluated and corrected as emphasized in this paper.
After all jobs have been evaluated and factor points have been assigned, wage equations and the size of the gender pay gap can be estimated. The typical procedure is to regress job pay (y) on a vector of job factors (X) and the percentage of female incumbents (f) in a job classification. The regression is of the form y = X [Beta] + f[Gamma] + v.(3) A finding that [Gamma] is negative is interpreted to imply that the firm's pay structure undercompensates predominantly female jobs relative to male-dominated jobs. If [Gamma] = 0, then pay is determined entirely by the vector of job attributes (X). Therefore, if [Gamma] [not equal to] 0, a gender neutral pay structure can be created by setting pay equal to X [Beta] or, equivalently, setting pay equal to [y - f[Gamma]]. For example, such a policy would leave pay for 100% male jobs unchanged but would raise pay for 100% female jobs by -100 [Gamma].(4)
The consensus of numerous studies is that, at least in the public sector where most research has focused, [Gamma] [less than] 0 so that female-dominated jobs are underpaid.(5) Although pay has seldom been adjusted by the full amount (-f [Gamma]) in practice due to compromises made as part of the associated political process, the adjustment costs can be substantial.(6) Hence it is important that the estimates be as accurate as possible. As a consequence, the possibility of measurement error and specification error is a major concern.
The problems of measurement error are well-recognized in the literature, with several papers questioning the validity and reliability of job factor measurements.(7) Nevertheless, England [1992, 213] argued that presumed random errors "cannot explain the systematic tendency for job evaluation studies to find that female jobs have lower pay lines." However, such views are necessarily speculative because no studies have explicitly corrected for the measurement error problem in the context of job evaluation or comparable worth. As a consequence, the magnitude of the bias has not been established.
A few studies have examined the effect of measurement error on measured discrimination in other contexts. In an early paper, Hashimoto and Kochin  pointed out that human capital model estimates of race discrimination are biased if, as is likely, the reported levels of non-white schooling are greater than the actual levels of attained knowledge. They illustrated the problem by reestimating a human capital based earnings function using grouped data. They found that measurement error overstates the size of racial pay discrimination. However, they caution that the magnitude of the overstatement is uncertain due to data limitations.
More recently, Rapaport  concluded that measurement error in survey responses concerning years of work experience may explain most of the pay gap between male and female teachers in two California school districts. She uses survey responses concerning educational attainment and work experience to predict the level of actual earnings of individual teachers paid according to a standard pay plan. Although there should be little room for sex or racial discrimination in such "impersonal" pay plans, conventionally estimated wage equations imply that females and blacks are paid less. Through a process of elimination, Rapaport concludes that this finding is likely due to measurement error, although she cannot rule out the possibility of discrimination. Unlike Rapaport's study, we explicitly observe and control for measurement error in the control variables. Our study also differs in that we use job level data and control for job evaluation factors, which are likely to be more prone to measurement error than are education and experience. Like Rapaport, we show that the potential impact of measurement error can be substantial.
Disagreement on the methodology of pay analysis has not been restricted to issues of measurement error. Job evaluation analysts have also disagreed about which or how many job factors should be included. The presumption has been that adding some control for potentially important job attributes, even if subject to measurement error, will yield better estimates of [Gamma] than would result from ignoring these job attributes.(8) Our paper shows that adding noisy estimates of job factors may increase rather than decrease the bias in [Gamma].
III. SUMMARY OF STATISTICAL THEORY
A. Measurement Error Biases the Coefficient on Percent Female
To illustrate how measurement error in job factors can affect the coefficient on percent female incumbents, consider the linear regression model:(9)
(1) [y.sub.j] = [[Beta].sub.0] + [[Beta].sub.1][x.sub.1j] + [Gamma] [x.sub.2j] + [e.sub.j] j = 1, 2, . . ., n
[X.sub.1j] = [x.sub.1j] + [u.sub.1j] j = 1, 2, . . ., n
where [[Beta].sub.0], [[Beta].sub.1] and [Gamma] designate true parameters and [e.sub.j] is a random error with mean zero and variance [[Sigma].sub.e]. Lower-case variables represent true variables and upper-case variables are observed values. In equation (1), [X.sub.1j] measures [x.sub.1j] with error [u.sub.1j], where [u.sub.1j] [similar to] NI(0, [[Sigma].sub.u]). The variables [x.sub.2j] and [y.sub.j] are measured without error.(10)
Assuming [x.sub.ij], [u.sub.1j], and [e.sub.j] are mutually uncorrelated for i = 1, 2, model (1) can be rewritten as
(2) [y.sub.j] = [[Beta].sub.0] + [[Beta].sub.1]([X.sub.1j] - [u.sub.1j]) + [Gamma] [x.sub.2j] + [e.sub.j]
= [[Beta].sub.0] + [[Beta].sub.1] [X.sub.1j] + [Gamma] [x.sub.2j] + [v.sub.j]
j = 1, 2, . . ., n
where [v.sub.j] = [e.sub.j] - [[Beta].sub.1][u.sub.1j]. Note that in model (2), [X.sub.1j] and [v.sub.j] are not independent. The ordinary least squares (OLS) estimator is [(X[prime]X).sup.-1] (X[prime]y), where X = (l, [X.sub.1], [x.sub.2]), l is a vector of ones, and y, l, [X.sub.1], and [x.sub.2] are all nx1 vectors.
The least squares estimator is inconsistent. It can be shown(11) that in the limit, the bias approaches
(3) \operatorname{plim}(\hat{\delta} - \delta) = -\beta_1 \sigma_{u1} (a_{12}, a_{22}, a_{23})', \quad \delta = (\beta_0, \beta_1, \gamma)'
where \hat{\delta} is the OLS estimator of \delta, [a.sub.ij] is the ijth element of the matrix plim [[(X[prime]X)/n].sup.-1] and [[Sigma].sub.u1] is the measurement error variance for [X.sub.1]. From equation (3) one can see that, in general, every parameter estimated by OLS will be biased, whether or not the variable is measured with error. In particular, the regression coefficient on the explanatory variable which is measured without error, the OLS estimate of [Gamma], is also biased by the presence of measurement error in [X.sub.1j]. Because [a.sub.22] [greater than] 0, and [[Sigma].sub.u1] [greater than] 0, the OLS estimate of [[Beta].sub.1] is biased toward zero if [[Beta].sub.1] [greater than] 0. The direction of bias for the OLS estimate of [Gamma] depends on the sign of [a.sub.23] and cannot be established in general.
The implications of these results for pay analysis are clear. Random measurement error in a job factor (i.e. [x.sub.1j]) will bias the coefficient on percent female incumbents (i.e. [x.sub.2j]), provided [a.sub.23] [not equal to] 0, and provided the true job characteristic affects pay ([[Beta].sub.1] [not equal to] 0). The bias exists, even though female incumbency is observable without error. However, even in this simplest of cases with only one job factor, the direction of bias for [Gamma] is ambiguous.
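The contamination of the percent female coefficient can be illustrated with a small simulation. The sketch below is not the authors' procedure: the parameter values, the correlation of -0.6 between the true job factor and percent female, and the helper name `ols_two_regressors` are all assumptions chosen for illustration. Under this particular design the large-sample OLS values work out to roughly 0.78 for the job factor and -1.23 for percent female, against true values of 2.0 and -0.5; a different correlation structure could push the bias in the other direction, consistent with the ambiguity noted above.

```python
import random

def ols_two_regressors(x1, x2, y):
    """OLS slopes for y regressed on (1, x1, x2), using centered moments."""
    n = len(y)
    m1, m2, my = sum(x1) / n, sum(x2) / n, sum(y) / n
    s11 = sum((a - m1) ** 2 for a in x1) / n
    s22 = sum((b - m2) ** 2 for b in x2) / n
    s12 = sum((a - m1) * (b - m2) for a, b in zip(x1, x2)) / n
    s1y = sum((a - m1) * (c - my) for a, c in zip(x1, y)) / n
    s2y = sum((b - m2) * (c - my) for b, c in zip(x2, y)) / n
    det = s11 * s22 - s12 ** 2
    return (s22 * s1y - s12 * s2y) / det, (s11 * s2y - s12 * s1y) / det

# Hypothetical design: all numbers below are illustrative assumptions.
rng = random.Random(0)
n = 5000
beta0, beta1, gamma = 10.0, 2.0, -0.5

true_factor, observed_factor, pct_female, pay = [], [], [], []
for _ in range(n):
    x1 = rng.gauss(0, 1)                      # true job factor
    f = -0.6 * x1 + 0.8 * rng.gauss(0, 1)     # percent female, measured without error
    true_factor.append(x1)
    observed_factor.append(x1 + rng.gauss(0, 1))  # measured factor, reliability 0.5
    pct_female.append(f)
    pay.append(beta0 + beta1 * x1 + gamma * f + rng.gauss(0, 1))

b1_clean, g_clean = ols_two_regressors(true_factor, pct_female, pay)
b1_noisy, g_noisy = ols_two_regressors(observed_factor, pct_female, pay)

print("true factor used:     beta1 = %.3f, gamma = %.3f" % (b1_clean, g_clean))
print("measured factor used: beta1 = %.3f, gamma = %.3f" % (b1_noisy, g_noisy))
```

In this toy design the job factor coefficient is attenuated and the percent female coefficient is overstated in absolute value, even though percent female itself carries no error.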
B. Measurement Error in Job Factors Reduces Correlation Between Factors
The true population squared correlation between [x.sub.1j] and [x.sub.2j] is defined by
(R_{x_1 x_2})^2 = \sigma_{x_1 x_2}^2 / (\sigma_{x_1} \sigma_{x_2}),
while the observed population squared correlation between [X.sub.1j] and [x.sub.2j] is
(R_{X_1 x_2})^2 = \kappa_{x_1} (R_{x_1 x_2})^2,
where [[Kappa].sub.x1] = ([[Sigma].sub.x1] / [[Sigma].sub.X1]) is the reliability ratio for [X.sub.1], the ratio of the variance of the true factor to the variance of the measured factor. Because 0 [less than] [[Kappa].sub.x1] [less than] 1, measurement error causes the observed correlation, [R.sub.X1x2], to be less than the true correlation [R.sub.x1x2]. Similarly, if both explanatory variables are measured with mutually independent random errors, the observed population squared correlation between [X.sub.1j] and [X.sub.2j] is
(4) (R_{X_1 X_2})^2 = \kappa_{x_1} \kappa_{x_2} (R_{x_1 x_2})^2.
The existence of random measurement error means that observed correlation among factors is less than the true correlation among factors. This can lead to more job factors being used in the pay analysis than would be possible were the job factors measured without error.
For example, suppose that only one job factor entered the true model of wages, but there are two measures of this job factor. If the two measures were perfectly correlated, only one could be used in the regression. Inclusion of both would lead to a singular moment matrix. However, if both measures included random errors, the two factors would no longer be perfectly correlated and both could be used as regressors. The measurement error provides an artificial identification of the ordinary least squares coefficients, and artificially increases the dimensionality of the empirical model from one to two.(12)
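The two-measures example can be checked numerically. The following sketch assumes a single true factor with unit variance and two noisy measures whose reliability ratios are set to 0.8 and 0.5; these values, the seed, and the helper name `corr` are hypothetical. With independent errors the observed squared correlation should be close to the product of the two reliability ratios (0.4 here) even though the true squared correlation is one.

```python
import random

def corr(a, b):
    """Pearson correlation between two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    sab = sum((p - ma) * (q - mb) for p, q in zip(a, b))
    saa = sum((p - ma) ** 2 for p in a)
    sbb = sum((q - mb) ** 2 for q in b)
    return sab / (saa * sbb) ** 0.5

rng = random.Random(1)
n = 20000
kappa1, kappa2 = 0.8, 0.5          # assumed reliability ratios

# With true variance 1, error variance (1 - kappa) / kappa gives reliability kappa.
sd_u1 = ((1 - kappa1) / kappa1) ** 0.5
sd_u2 = ((1 - kappa2) / kappa2) ** 0.5

true_factor = [rng.gauss(0, 1) for _ in range(n)]
measure1 = [t + rng.gauss(0, sd_u1) for t in true_factor]
measure2 = [t + rng.gauss(0, sd_u2) for t in true_factor]

observed_r2 = corr(measure1, measure2) ** 2
predicted_r2 = kappa1 * kappa2 * 1.0   # true squared correlation is 1

print("observed squared correlation:  %.3f" % observed_r2)
print("kappa1 * kappa2 * true R^2:    %.3f" % predicted_r2)
```

Because the two measures are no longer perfectly correlated, both could enter a regression, which is the artificial identification described above.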
C. Generalization to Many Job Factors Measured with Error
Let us now consider the specific linear regression model used in this paper.
(5) [y.sub.j] = [[Beta].sub.0] + [[Beta].sub.1] [x.sub.1j] + [[Beta].sub.2] [x.sub.2j] + . . . + [[Beta].sub.k] [x.sub.kj] + [Gamma] [f.sub.j] + [e.sub.j] j = 1, 2, . . ., n
(6) [X.sub.ij] = [x.sub.ij] + [u.sub.ij] i = 1, 2, . . ., k j = 1, 2, . . ., n
where [y.sub.j] is pay in [j.sup.th] job classification (measured without error); [X.sub.ij] is "measured" [i.sup.th] evaluation factor in the jth job classification; [x.sub.ij] is "true" [i.sup.th] evaluation factor in the jth job classification; [u.sub.ij] is measurement error with [u.sub.ij] [similar to] NI(0, [[Sigma].sub.ui]); [f.sub.j] is the percentage of female incumbents in the jth job classification (measured without error); and [e.sub.j] is a random disturbance term with [e.sub.j] [similar to] (0, [[Sigma].sub.e]).
Combining equations (5) and (6), one obtains
(7) [y.sub.j] = [[Beta].sub.0] + [[Beta].sub.1] [X.sub.1j] + [[Beta].sub.2] [X.sub.2j] + ... + [[Beta].sub.k] [X.sub.kj] + [Gamma] [f.sub.j] + [v.sub.j] = [W.sub.j] [Delta] + [v.sub.j]
j = 1, 2, . . ., n
[W.sub.j] = (1, [X.sub.1j], [X.sub.2j], . . ., [X.sub.kj], [f.sub.j])
[Delta] = ([[Beta].sub.0], [[Beta].sub.1], [[Beta].sub.2], . . ., [[Beta].sub.k], [Gamma])[prime]
[v.sub.j] = [e.sub.j] - [[Beta].sub.1] [u.sub.1j] - [[Beta].sub.2] [u.sub.2j] - ... - [[Beta].sub.k] [u.sub.kj].
Under the assumption that [x.sub.ij], [u.sub.ij], [f.sub.j], and [e.sub.j] are mutually uncorrelated for all i and j, it can be shown that for large samples, the inconsistency from applying OLS to regression (7) is
(8) [Mathematical Expression Omitted]
where [Mathematical Expression Omitted].
As before, the OLS estimate of [Gamma], the regression coefficient for [f.sub.j], is inconsistent due to the presence of measurement error in the X's despite the fact that [f.sub.j] is measured without error. The only difference between equations (8) and (3) is that the direction of bias cannot be established for any of the parameters in equation (8).
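One standard way to see where the inconsistency comes from is sketched below. It is a derivation under the stated assumptions (errors independent across factors and uncorrelated with the true factors, [f.sub.j], and [e.sub.j]), and the symbol \Sigma_u is introduced here for illustration; it is not necessarily the exact form or notation of the omitted expression.

```latex
\begin{aligned}
\hat{\delta} &= (W'W)^{-1} W'y = \delta + (W'W)^{-1} W'v, \\
\operatorname{plim} \tfrac{1}{n} W'v &= -\Sigma_u \, \delta,
  \qquad \Sigma_u = \operatorname{diag}(0, \sigma_{u1}, \ldots, \sigma_{uk}, 0), \\
\operatorname{plim}(\hat{\delta} - \delta) &=
  -\Big( \operatorname{plim} \tfrac{1}{n} W'W \Big)^{-1} \Sigma_u \, \delta .
\end{aligned}
```

The last element of \Sigma_u \delta is zero because [f.sub.j] is measured without error, yet the last element of the product is generally nonzero because the inverse moment matrix has nonzero off-diagonal elements linking [f.sub.j] to the mismeasured factors. With k = 1 the expression reduces to the three-element form discussed in Section III.A.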
D. Generalization to Correlated Measurement Errors
All previous results are based upon the assumption of independent measurement errors. However, the measurement errors associated with evaluating job factors may be correlated. An individual evaluator may consistently give low scores or high scores to all job factors. Or evaluators may not maintain independence in their scoring, creating common errors across evaluators. Correlated measurement errors complicate the analysis considerably. Assume the previous general case in equations (5) and (6) but now let measurement errors associated with the job evaluation factors be positively correlated. The correlation can be expressed as follows:
(9) [X.sub.ij] = [x.sub.ij] + [u.sub.ij] j = 1, 2, . . ., n
[X.sub.kj] = [x.sub.kj] + [u.sub.kj] j = 1, 2, . . ., n
where [X.sub.ij] is measured job evaluation factor i; [u.sub.ij] is measurement error in the [i.sup.th] job evaluation factor; [X.sub.kj] is measured job evaluation factor k; [u.sub.kj] is measurement error in the [k.sup.th] job evaluation factor; and [x.sub.ij], [u.sub.kj] are uncorrelated for all i and j. Error terms [u.sub.ij] and [u.sub.kj] are correlated with covariance [[Sigma].sub.uiuk]. The observed population squared correlation between [X.sub.i] and [X.sub.k] under model (9) is:
(10) [Mathematical Expression Omitted],
where A = [[Rho].sub.uiuk] [[(1 - [[Kappa].sub.xi])(1 - [[Kappa].sub.xk]) [[Sigma].sub.xi] [[Sigma].sub.xk]].sup.0.5], and [[Rho].sub.uiuk] is the correlation in measurement error between [x.sub.i] and [x.sub.k]. If the measurement errors are uncorrelated so that [[Rho].sub.uiuk] = 0 in (10), the second expression on the right-hand-side vanishes and the equation collapses to the form in equation (4). However, if the measurement errors are correlated, then the second term on the right-hand-side of equation (10) is not zero. The second term in equation (10) can be positive or negative, so the true correlations can be larger or smaller than the observed correlations.
IV. DATA AND ESTIMATION PROCEDURES
A. Data and Empirical Methodology
In 1984, Arthur Young Company  conducted a factor point pay analysis of 758 jobs in the State of Iowa Merit Pay System. Arthur Young trained nine teams of state employees to conduct the evaluations. Each team was composed of two men and two women with one personnel specialist on each team. Most jobs were evaluated by only one team, but two or more teams evaluated 90 of the 758 jobs.(13) The multiple observations for specific jobs allowed Arthur Young to compute interrater reliability ratios for each of the 13 job factors.
To show how these reliability ratios are computed in practice, suppose we have M raters, each of whom measures the [i.sup.th] factor for J different jobs. All raters are unbiased by assumption.(14) The reliability ratio for factor i can be estimated as the [R.sup.2] from a regression of the form:
[X.sub.ijm] = [summation of] E([x.sub.ij]) [D.sub.j] + [u.sub.ijm] where j = 1 to J
E([x.sub.ij]) is the average estimate across raters for the [i.sup.th] factor for the jth job, estimated as the coefficient on the jth of J dummy variables for the job being evaluated. The [m.sup.th] rater's deviation from the mean estimate for factor i and job j is [u.sub.ijm]. Because of the unbiasedness assumption, E([u.sub.ijm]) = 0. If there is no uncertainty among the raters about the measure of factor i across the J jobs, [u.sub.ijm] = 0 [for every] j and m, so that the [R.sup.2] (and the reliability ratio) will equal one. More typically, there will be disagreement among the raters so that [u.sub.ijm] [not equal to] 0 for some j and m, and the [R.sup.2] (reliability ratio) will be between zero and one. These reliability ratios can be treated as estimates of the [[Kappa].sub.xi] in equations (4) and (10). The sample statistics and reliability ratios for the 13 job factors are reported in Table I.
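The [R.sup.2] from a regression on a full set of job dummies equals one minus the ratio of the within-job sum of squares to the total sum of squares, so it can be computed without fitting a regression. The sketch below illustrates this for a single factor; the function name `reliability_ratio` and the ratings shown are hypothetical and are not taken from the Iowa data.

```python
def reliability_ratio(ratings_by_job):
    """R-squared from regressing ratings on job dummies.

    ratings_by_job maps a job label to the list of scores that different
    raters assigned to that job for one factor.
    """
    all_scores = [s for scores in ratings_by_job.values() for s in scores]
    grand_mean = sum(all_scores) / len(all_scores)
    ss_total = sum((s - grand_mean) ** 2 for s in all_scores)

    # Residuals are each rater's deviation from the job's mean score.
    ss_within = 0.0
    for scores in ratings_by_job.values():
        job_mean = sum(scores) / len(scores)
        ss_within += sum((s - job_mean) ** 2 for s in scores)

    return 1.0 - ss_within / ss_total

# Hypothetical ratings: three jobs, two raters each.
example = {"job A": [10, 12], "job B": [20, 18], "job C": [30, 30]}
perfect = {"job A": [10, 10], "job B": [20, 20], "job C": [30, 30]}

kappa_example = reliability_ratio(example)
kappa_perfect = reliability_ratio(perfect)
print(kappa_example, kappa_perfect)
```

When raters agree exactly on every job, the within-job sum of squares is zero and the ratio equals one; disagreement lowers it.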
Reliability ratios provide the information needed to extract true correlation coefficients from observed data. Using equation (4) and the independence assumption, the true correlations [([R.sub.x1x2]).sup.2] can be extracted by dividing observed population correlation coefficients by the appropriate reliability ratios. With correlated measurement errors, the true correlations can be extracted using equation (10), the reliability ratios, and information on correlation coefficients between measurement errors.
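Under the independence assumption, the extraction step is a single division. The following sketch uses hypothetical inputs and a hypothetical function name, `true_squared_correlation`. A result at or near one indicates that the underlying factors are perfectly or nearly perfectly collinear; with estimated reliability ratios, a computed value could also exceed one, which is an observation about this calculation rather than a claim made in the text.

```python
def true_squared_correlation(observed_r2, kappa_i, kappa_k):
    """Invert equation (4): divide the observed squared correlation by the
    product of the two reliability ratios (independent errors assumed)."""
    for kappa in (kappa_i, kappa_k):
        if not 0 < kappa <= 1:
            raise ValueError("reliability ratios must lie in (0, 1]")
    return observed_r2 / (kappa_i * kappa_k)

# Hypothetical values: two factors with an observed squared correlation of
# 0.40 and reliability ratios of 0.8 and 0.5.
implied_r2 = true_squared_correlation(0.40, 0.8, 0.5)
print(implied_r2)
```

In this illustration a modest observed correlation is consistent with true factors that are perfectly correlated.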
Reliability ratios can also be used to extract the true coefficients by correcting the covariance matrix for the presence of measurement error. The estimator is:
(11) [Beta] = [(X[prime]X - D[prime][Lambda]D).sup.-1] X[prime]y
D = diag ([[Sigma].sub.x1], [[Sigma].sub.x2], . . ., [[Sigma].sub.xn])
[Lambda] = diag (1 - [[Kappa].sub.x1], 1 - [[Kappa].sub.x2], . . ., 1-[[Kappa].sub.xn]),
for the uncorrelated measurement error case. If measurement errors are correlated, the [Lambda] matrix has off-diagonal elements of the form [[Rho].sub.uiuk] [([[Sigma].sub.ui] [[Sigma].sub.uk]/[[Sigma].sub.xi] [[Sigma].sub.xk]).sup.1/2].(15) A computer program (EVCARP) developed by Schnell, Park and Fuller  was used to estimate the regression equations reported in this study.(16)
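The idea behind the correction can be sketched in a two-regressor simulation. This is not EVCARP and makes no claim to reproduce its output. It reads the correction as subtracting the estimated measurement error variance, (1 - [Kappa]) times the observed variance, from the diagonal moment of the mismeasured factor, which is one interpretation of equation (11) for uncorrelated errors. The reliability ratio is treated as known, and all parameter values and helper names are assumptions.

```python
import random

def centered_moments(columns):
    """Sample covariance matrix (divisor n) of the given data columns."""
    n = len(columns[0])
    means = [sum(col) / n for col in columns]
    dev = [[v - m for v in col] for col, m in zip(columns, means)]
    k = len(columns)
    return [[sum(a * b for a, b in zip(dev[i], dev[j])) / n for j in range(k)]
            for i in range(k)]

def solve_slopes(s11, s12, s22, s1y, s2y):
    """Solve the 2x2 normal equations for the two slope coefficients."""
    det = s11 * s22 - s12 ** 2
    return (s22 * s1y - s12 * s2y) / det, (s11 * s2y - s12 * s1y) / det

# Hypothetical design: one noisy job factor and percent female.
rng = random.Random(2)
n = 50000
beta1, gamma, kappa = 2.0, -0.5, 0.5

X1, f, y = [], [], []
for _ in range(n):
    x1 = rng.gauss(0, 1)                       # true factor, variance 1
    fj = -0.6 * x1 + 0.8 * rng.gauss(0, 1)     # percent female, no error
    X1.append(x1 + rng.gauss(0, 1))            # error variance 1, so kappa = 0.5
    f.append(fj)
    y.append(10 + beta1 * x1 + gamma * fj + rng.gauss(0, 1))

S = centered_moments([X1, f, y])

# Uncorrected OLS uses the observed variance of the noisy factor.
b1_ols, g_ols = solve_slopes(S[0][0], S[0][1], S[1][1], S[0][2], S[1][2])

# Corrected estimator: remove (1 - kappa) * observed variance from the diagonal.
b1_corr, g_corr = solve_slopes(kappa * S[0][0], S[0][1], S[1][1], S[0][2], S[1][2])

print("OLS:       beta1 = %.3f, gamma = %.3f" % (b1_ols, g_ols))
print("corrected: beta1 = %.3f, gamma = %.3f" % (b1_corr, g_corr))
```

In practice the reliability ratios are themselves estimated, and the corrected moment matrix can approach singularity, so standard errors may be much larger than in OLS.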
The original regression analysis conducted by Arthur Young Company  regressed pay grade and maximum salary for 758 jobs on the thirteen job factors and a variable measuring percent female incumbents. The analysis herein concentrates on predicting maximum salary since pay grade is difficult to interpret.(17) Use of maximum biweekly pay also has the advantage of holding constant incumbent step level across jobs. Variation in pay across jobs is therefore due solely to differences in how the job is rewarded by the pay system and not by differences in incumbent job tenure across jobs. The ordinary least squares regression is reported in the first column of Table II.
The functional form uses level of salary as the dependent variable. A Box-Cox regression strongly rejected the log specification of the dependent variable.(18) The results appear quite reasonable on the surface. The regressors explain 92% of the variation in maximum salary. Most job factors raise pay, eight of them significantly. Most importantly for this study, the coefficient on percent female is negative and significant. A ten point increase in percentage female incumbents reduces biweekly pay by $7.19.
The estimates correcting for reliability of job factors (column 2 labeled "EVCARP" of Table II) are very different. The standard errors explode so that all coefficients become insignificant. In addition, the test of singularity of the moment matrix easily fails to reject the null hypothesis of singularity.(19) Computation of the true correlation matrix using equation (4) reveals why. Several of the true job factors are perfectly correlated, as shown in Table III. The only reason these variables had estimable coefficients in the first column of Table II was that they were measured with error and the measurement error was sufficient to create independent variation in the measured factors. Measurement errors in the factors increased the apparent dimensionality of the job factor space.
[TABULAR DATA FOR TABLE I OMITTED]
The five factors which are perfectly or nearly perfectly correlated are difficult to distinguish on a conceptual as well as an empirical basis. The factors include required knowledge from experience; a job's complexity, judgment and problem-solving; supervision required on the job; the scope and effect of the job on the institution; and the impact of errors on the institution. All these factors deal with job requirements and skills that come with increased experience and responsibility.
The five collinear factors were combined into a single factor (JOINT) by taking the simple average of the five.(20) Because the combined factor is a linear combination of the original five factors, the reliability ratio for the combined factor can be constructed from the five individual reliability ratios.(21)
The ordinary least squares regression using JOINT in place of the five factors is reported in column three of Table II. The linear restrictions implied by using the combined factor in place of the five separate factors were tested and could not be rejected at standard significance levels. Most importantly, the coefficient on percent female incumbents is nearly the same as before (-.74 versus -.72) so that our use of the combined factor does not of itself affect the outcomes. Therefore, all the remaining regressions use JOINT in place of the five highly correlated factors.
[TABULAR DATA FOR TABLE II OMITTED]
The reliability corrected regressions are reported in Table IV. For comparison, the uncorrected regression is reported in the first column. The next four columns contain regressions which correct for measurement errors under progressively higher assumed error correlations across factors. In each of these columns, [[Rho].sub.uiuk] is assumed to equal a constant [Rho] for all job factors, i and k. Four specifications are reported, [Rho] = 0, [Rho] = .1, [Rho] = .2 and [Rho] = .3. The specifications with [Rho] [greater than] .1 create moment matrices that approach singularity. Therefore, the discussion will concentrate on the specifications with [Rho] [less than] .2.
Correcting for measurement error in the factors sharply decreases the impact of the percent female variable, although the coefficient remains significant. When we assume uncorrelated errors, this reduces the absolute value of the coefficient by 34% of the OLS estimate. Allowing measurement error correlation across factors further reduces the impact of percent female incumbents. With [Rho] = .1, the coefficient is reduced by 44% relative to the OLS coefficient. Clearly, the extent of implied pay bias against predominantly female jobs is sensitive to measurement error in job factors.
[TABULAR DATA FOR TABLE III OMITTED]
One way to correct the pay structure for bias against predominantly female jobs is to increase pay per job by -[Gamma]f (from equation (7)). Because the coefficient ([Gamma]) on percent female incumbents (f) is negative, this implies raising the pay for jobs having female incumbents. At sample means of percent female incumbents, the OLS coefficient implies an average increase of $645 per year. The measurement error corrected regression with [Rho] = .1 implies an average increase of only $359. For 100% female jobs, the dollar difference associated with measurement error is even larger: $1,924 versus $1,071.
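The dollar figures follow from simple arithmetic on the percent female coefficient, which can be sketched as below. The OLS coefficient of -.74 is taken from the text. The 26 pay periods per year is an assumption implied by biweekly pay. The corrected coefficient and the sample mean of percent female are not stated in the text, so they are back-computed here from the reported dollar amounts and should be read as implied values only.

```python
PAY_PERIODS_PER_YEAR = 26  # assumption: biweekly pay, 26 periods per year

def annual_adjustment(gamma, pct_female):
    """Annual pay increase implied by raising biweekly pay by -gamma * f."""
    return -gamma * pct_female * PAY_PERIODS_PER_YEAR

gamma_ols = -0.74  # OLS coefficient on percent female (biweekly dollars per point)

# Adjustment for a 100% female job under OLS.
full_ols = annual_adjustment(gamma_ols, 100)

# Back-computed (not reported) quantities implied by the published dollar figures.
implied_mean_pct_female = 645 / full_ols * 100
implied_gamma_corrected = -1071 / (100 * PAY_PERIODS_PER_YEAR)
implied_reduction = 1 - implied_gamma_corrected / gamma_ols

mean_corrected = annual_adjustment(implied_gamma_corrected, implied_mean_pct_female)

print("OLS adjustment, 100%% female job: $%.0f" % full_ols)
print("implied mean percent female: %.1f" % implied_mean_pct_female)
print("implied corrected coefficient: %.3f" % implied_gamma_corrected)
print("implied reduction relative to OLS: %.0f%%" % (100 * implied_reduction))
print("corrected adjustment at the mean: $%.0f" % mean_corrected)
```

The implied reduction of about 44% agrees with the reduction reported for the [Rho] = .1 specification, which suggests the published figures are internally consistent.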
The coefficients on job factors are also sensitive to measurement error corrections. The coefficient on JOINT, which is based on indicators of experience and responsibility, increases as measurement error corrections are introduced. Mental and visual demands, an indicator of coordination required on the job, also increases in importance. Supervisory responsibility and personal contacts lose importance.
Table V provides additional insights on the impact of measurement error in the job factors on the outcomes of comparable worth pay analysis. For each worker in Iowa state government, we predicted pay using the indicated regression equation from Table II or IV, setting the percent female variable equal to zero. We then grouped workers into various occupational, educational, and market wage rate level categories and computed the average group biweekly state government pay as reported in Table V.(22) The first two columns represent two baseline sets of pay structures (one using 13 job factors and the other using 8 factors plus JOINT) where no adjustment is made using the reliability coefficients. The average wages vary little between these two baseline pay structures.(23)
The other three columns of Table V report the pay distributions using wage equations (eight factors plus JOINT) corrected for measurement error. Correcting for measurement error makes little difference for highly educated and highly paid jobs. However, it does make a large difference for the less educated and lower paid jobs. Measurement corrected equations imply that lower biweekly pay is appropriate for these jobs than would be implied by the OLS estimates. Put another way, lower skilled and lower paid workers would see smaller upward comparable worth pay adjustments using measurement error adjusted wage equations.
[TABULAR DATA FOR TABLE IV OMITTED]
The magnitude of these adjustments is not trivial. For those with less than a high school education, the measurement error corrected biweekly pay was $24.20-$37.40 lower than the uncorrected pay, a difference of $629-$972 per year. Aggregating over approximately 20,000 state employees in Iowa, the measurement error corrected pay would have resulted in $4.2 million to $5.6 million lower adjustment in pay per year than the uncorrected pay based on the OLS estimates.
V. CONCLUSIONS AND IMPLICATIONS
This study evaluates the impacts of measurement errors on "policy capturing" regression coefficient weights using data from the State of Iowa's comparable worth system. Corrections for measurement error and multicollinearity in the original Arthur Young job evaluation factors are used to examine the sensitivity and statistical robustness of these factor weights. The empirical findings can be summarized as follows:
1. The presence of measurement error caused upward bias on the absolute value of the estimated coefficient on percent female incumbents in Iowa. The adverse impact of percent female on pay is reduced by 34%-44% when the problem of measurement error is corrected. If this coefficient is taken to be a measure of discrimination against predominantly female jobs, then measurement errors caused the implied discrimination to be overstated. As a consequence, proposed comparable worth wage adjustments necessary to bring female jobs to parity with male jobs in Iowa were too large. While it would be inappropriate to generalize from one study, measurement errors also led to greater implied discrimination in the Hashimoto and Kochin  and Rapaport  studies.
[TABULAR DATA FOR TABLE V OMITTED]
2. Measurement error reduces the collinearity among job factors, allowing too many factors to be included in the pay analysis. Using reliability ratios, the "true" correlation matrix was estimated. Five factors out of thirteen were found to be perfectly or nearly perfectly correlated once we correct for measurement error. The OLS coefficients for these five factors were only identified because of measurement error.
3. Only additional research can determine whether measurement error is a common flaw in studies of pay discrimination and whether any resulting bias tends to be in the same direction as we find. However, the current study demonstrates that measurement error correction models are tractable in pay analysis. It is straightforward for analysts to estimate reliability ratios for factors by conducting more than one independent evaluation per job. Measurement error variances, covariances and reliability coefficients can be estimated and applied to obtain unbiased estimates of factor weights using equation (11).
In the limit, the bias discussed in Section III.A approaches
(A1) [Mathematical Expression Omitted]
where v = e - [[Beta].sub.1][u.sub.1], [Delta] = ([[Beta].sub.0], [[Beta].sub.1], [Gamma])[prime] and [u.sub.1] = ([u.sub.11], [u.sub.12], . . ., [u.sub.1n])[prime]. Assume the following limits exist:
[Mathematical Expression Omitted]
[Mathematical Expression Omitted]
where l is an n x 1 vector of ones, and [[Sigma].sub.u1] is the measurement error variance for [X.sub.1]. The expression (3) can then be written as
(A2) [Mathematical Expression Omitted].
In Section III.B, assuming independent measurement errors, the new reliability ratio is
$$\kappa_{JOINT} = \frac{\sum_{i=1}^{5} \kappa_{x_i}\,\sigma_{x_i}}{\sum_{i=1}^{5} \sigma_{x_i}}.$$
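Read literally, the independent-error formula above is a variance-weighted average of the individual reliability ratios. A minimal sketch of that arithmetic follows; the function name and the numbers in the usage lines are hypothetical and are not taken from the Iowa data.

```python
def joint_reliability(kappas, variances):
    """Variance-weighted average of factor reliability ratios.

    kappas: reliability ratio of each factor combined into JOINT.
    variances: variance of each factor, in the same order.
    Assumes independent measurement errors, as in the formula above.
    """
    kappas = list(kappas)
    variances = list(variances)
    if len(kappas) != len(variances):
        raise ValueError("kappas and variances must have the same length")
    total = sum(variances)
    if total <= 0:
        raise ValueError("variances must sum to a positive number")
    return sum(k * v for k, v in zip(kappas, variances)) / total


if __name__ == "__main__":
    # Two made-up factors: the high-variance factor dominates the average.
    print(joint_reliability([0.5, 0.9], [1.0, 3.0]))  # about 0.8
```

If every factor has the same reliability ratio, the joint ratio equals that common value regardless of the variances.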
When measurement errors are assumed to be correlated, the new joint reliability ratio is
[Mathematical Expression Omitted]
OLS: ordinary least squares
We wish to gratefully acknowledge partial support from NSF Grant Number 8909479. We received helpful comments from Yasuo Amemiya, Wayne Fuller, and Wally Huffman on earlier drafts of this paper. Donna Otto prepared the manuscript.
1. See Treiman and Hartmann .
2. Schwab  estimated the share of the labor force covered by job evaluation. England [1992, Chapter 4] has a good review of job evaluation methods and related research.
3. It is common for analysts to set the coefficients, [Beta], on the basis of a priori presumed relative importance of the various job factors to the firm. In essence, this means establishing the [Beta]'s without appealing to statistical analysis. The Hay system and its variants are examples of a priori weighted systems in which the consultant reports an aggregate job index, [Mathematical Expression Omitted], where [Mathematical Expression Omitted] is the consultant's weights on the vector of job factors. Nevertheless, one can still regress y on I and f to get an estimate of [Gamma].
4. We should note that the pay equation does not require that the firm literally set pay according to the pay equation. The equation is often interpreted as "policy capturing," meaning that the [Beta] and [Gamma] coefficients capture the implicit returns to job attributes and female incumbency in the firm. In fact, given federal and state legislation, it is unlikely that there is ever an explicit policy to underpay predominantly female jobs, so the pay equation is unlikely to be a literal statement of the firm's official pay policy. The analogous case is that the standard earnings function does not require a conscious policy that wages increase with experience and education, but the earnings function captures market returns which demonstrate those tendencies.
5. In addition to the Ehrenberg  review cited earlier, Sorenson  provides estimates that strict adherence to job evaluation would raise female pay an average of 11%. Ames  found that job evaluation raised female pay in Ontario and Manitoba. England [1992, 205] concluded that job evaluation, "nearly always gives women's jobs higher wages relative to men's ..."
6. See Ames  or Orazem and Mattila  for studies of how the implementation process may divert the results away from the proposed gender-neutral pay structures.
7. See Collins and Muchinsky , Greig et al. , Madigan , Mount and Ellis  and Schwab  for various viewpoints and examples of the sources and consequences of measurement error in pay analysis.
8. Examples include Lucas , Filer  and Schuman, Ahlburg and Mahoney .
9. Judge et al. , Chapter 13 contains an introduction to measurement error problems. See Fuller  for a detailed discussion.
10. In the context here, job factors are not observable but are estimated by evaluators. If evaluators are unbiased, the mean evaluation of the same job factor across evaluators will be a consistent estimator of [x.sub.ij], and deviations from the mean can be used to estimate the measurement error variance, [Sigma][([X.sub.ij] - [x.sub.ij]).sup.2]/(n - 1).
11. See Appendix.
12. A likely example of this phenomenon is Filer's  study which includes 225 regressors in a model explaining variation in average occupational wages.
13. No information is available as to how the 90 jobs were selected except that Young  states 62 resulted from a policy of evaluating at least one job each day that had been done by another team. A referee raised the possibility that the most difficult jobs were selected so that the true reliability would be understated. We can't rule out this possibility but note that there were other practices that would have worked in the opposite direction. Greig, Orazem and Mattila  point out that analysts have an incentive to show their procedures are reliable and adopt practices such as sharing information between evaluation teams that tend to accomplish this.
14. Unbiased raters are a maintained assumption in job analysis because if one assumed the raters were biased, the job analysis would not be undertaken in the first place. To reduce potential bias, raters are sent through a training program to sensitize them to possible sources of bias and to disqualify bad raters. In addition, teams are chosen to represent diverse points of view to reduce the possibility of strategic ratings.
15. Note that when i = k, [[Rho].sub.uiuk][([[Sigma].sub.uiuk]/[[Sigma].sub.xi][[Sigma].sub.xk]).sup.1/2] = [[Sigma].sub.ui]/[[Sigma].sub.xi] = 1 - [[Kappa].sub.xi], the same diagonal elements of A as in equation (11). If there is no measurement error, all reliability ratios equal one, and all error variances are zero. In that case, [Lambda] becomes a null matrix, so that [Beta] = [[Beta].sub.OLS].
16. The program, EVCARP, has a routine designed to correct for measurement error, given information on reliability ratios.
17. Chen  reports results using pay grade as the dependent variable. Qualitative results are similar to those reported herein.
18. The Box-Cox regression (Judge ) finds the value of [Lambda] which best transforms y into a normally distributed variable. The transformation ([y.sup.[Lambda]] - 1)/[Lambda] approaches ln (y) as [Lambda] [approaches] 0 and approaches y - 1 as [Lambda] [approaches] 1. The maximum likelihood estimate of [Lambda] was .78, so the linear specification was preferred. Logarithmic specifications yielded similar qualitative results. See Chen  for details.
19. Fuller  shows that the test statistic for the null hypothesis that the rank of the true moment is k - 1 when there are k regressors is (n/[n - k - 1])r[prime] where r[prime] is the smallest characteristic root of the matrix X[prime]X-D[prime][Lambda]D in equation (12). The test statistic is distributed F(n - k + 1, n - 1).
20. The first principal component explained nearly 80% of the variation in these five factors using principal component analysis. Substituting this first principal component for JOINT yielded virtually identical results.
21. See Appendix.
22. State jobs were matched to market jobs using job titles and job qualification requirements. Details on the market wage data are contained in Orazem and Mattila .
23. While the magnitudes of the differences in means between OLS13 and OLS9 are small, most were statistically significant. The larger differences between the EVCARP predicted salaries and predicted salaries using OLS9 are also generally significant.
Ames, Lynda J. "Fixing Women's Wages: The Effectiveness of Comparable Worth Policies." Industrial and Labor Relations Review, July 1995, 709-725.
Arthur Young Company. Study to Establish an Evaluation System for State of Iowa Merit Employment System Classifications on the Basis of Comparable Worth: Final Report and Statistical Supplement. Milwaukee: Arthur Young Company, 1984.
Chen, Shih-Neng. "Two Applications of Measurement Error Correction in the Economics of Human Resources." Ph.D. dissertation, Ames: Iowa State University, 1995.
Collins, Judith M., and Paul M. Muchinsky. "An Assessment of the Construct Validity of Three Job Evaluation Methods: A Field Experiment." Academy of Management Journal, August 1993, 895-904.
Ehrenberg, Ronald. "Empirical Consequences of Comparable Worth," in Comparable Worth: Analysis and Evidence, edited by M. Anne Hill and Mark R. Killingsworth. Cornell, N.Y.: ILR Press, 1989, 90-106.
England, Paula. Comparable Worth: Theories and Evidence. New York: Aldine de Gruyter, 1992.
Filer, Randall K. "Occupational Segregation, Compensating Differentials, and Comparable Worth," in Pay Equity: Empirical Inquiries, edited by Robert T. Michael, Heidi I. Hartmann, and Brigid O'Farrell. Washington, D.C.: National Academy Press, 1989, 153-70.
Fuller, Wayne A. Measurement Error Models. New York: John Wiley & Sons, Inc., 1987.
Greig, Jeffrey J., Peter F. Orazem, and J. Peter Mattila. "Measurement Errors in Comparable Worth Pay Analysis: Causes, Consequences, and Corrections." Journal of Social Issues, Winter 1989, 135-51.
Hashimoto, Masanori, and Levis Kochin. "A Bias in the Statistical Estimation of the Effects of Discrimination." Economic Inquiry, July 1980, 478-86.
Judge, George G., W. E. Griffiths, R. Carter Hill, Helmut Lutkepohl, and Tsoung-Chao Lee. The Theory and Practice of Econometrics. 2nd ed. New York: John Wiley & Sons, Inc., 1985.
Lucas, Robert E. B. "Hedonic Wage Equations and Psychic Wages in the Returns to Schooling." American Economic Review, September 1977, 549-57.
Madigan, Robert M. "Comparable Worth Judgments: A Measure Properties Analysis." Journal of Applied Psychology, February 1985, 137-47.
Mount, Michael M., and Rebecca A. Ellis. "Sources of Bias in Job Evaluation: A Review and Critique of Research." Journal of Social Issues, Winter 1989, 153-68.
Orazem, Peter F., and J. Peter Mattila. "Comparable Worth and the Structure of Earnings: The Iowa Case," in Pay Equity: Empirical Inquiries, edited by Robert T. Michael, Heidi I. Hartmann, and Brigid O'Farrell. Washington, D.C.: National Academy Press, 1989, 179-99.
-----. "The Implementation Process of Comparable Worth: Winners and Losers." Journal of Political Economy, February 1990, 134-52.
Rapaport, Carol. "Apparent Wage Discrimination when Wages are Determined by Nondiscriminatory Contracts." American Economic Review, December 1995, 1,263-77.
Schnell, Daniel, Heon Jin Park, and Wayne A. Fuller. EV CARP. Ames: Iowa State University, Statistical Laboratory, 1987.
Schuman, Paul L., Dennis A. Ahlburg, and Christine Brown Mahoney. "The Effects of Human Capital and Job Characteristics on Pay." Journal of Human Resources, Spring 1994, 481-503.
Schwab, Donald P. "Using Job Evaluation to Obtain Pay Equity," in Comparable Worth: Issues for the 80's: A Consultation of the U.S. Commission on Civil Rights, Vol. 1. Washington, D.C.: U.S. Commission on Civil Rights, 1984, 83-92.
Sorenson, Elaine. "Effect of Comparable Worth Policies on Earnings." Industrial Relations, Fall 1987, 227-39.
Treiman, Donald J., and Heidi I. Hartmann, eds. Women, Work and Wages: Equal Pay for Jobs of Equal Value. Washington, D.C.: National Academy Press, 1981.
|Author:||Chen, Shih-Neng; Orazem, Peter F.; Mattila, J. Peter; Greig, Jeffrey J.|
|Date:||Apr 1, 1999|