How Valuable Are Matched Data Files? (A Comment on "Enhanced Demographic-Economic Data Sets")

"PROBABLY the most conspicuous and important fact to be found in the history of economic science during the last 30 years is this, namely while there has been no change in the objects to which it is directed... there has been a marked change in the methods according to which economic science is cultivated. It has ceased to be an abstract science--it has ceased to be a system of subtle and ingenious reasoning--and little by little, and by a process cautious and full of promise, become a science almost entirely experimental."

This assessment was voiced by William Newmarch in an address to the British Association for the Advancement of Science in 1861.(1) Quantitative economics was nourished by Henry L. Moore(2) and by Henry Schultz, and it has blossomed into what we now know as Econometrics. The development of this discipline owes much to the substantial public and private resources allocated to the collection and dissemination of economic statistics.

That facts are valuable is an unassailable fact. Frank Stafford (1986) contends that the large microdata sets that were collected over the last 20 to 30 years have contributed much, not only to our understanding of the determinants of wages, labor supply, and fertility, but also to the advances in theoretical economics and econometric methodology. The panel surveys and large data sets--such as the Current Population Survey, the Survey of Economic Opportunity, and the Survey of Income and Program Participation, or SIPP, which is the centerpiece of the present paper--have complemented the other bodies of statistics assembled by public and private agencies. Stafford argues that the available data are inadequate to study various aspects of the functioning of labor markets, especially on the demand side of the market.

Stafford's opinion is shared by Dan Hamermesh, who claims that the slower progress in the study of labor demand (employment dynamics and factor substitution) is due to "a failure to invest in the kinds of data that would allow us to obtain answers, a failure that continues today" (Hamermesh 1988, 10).

Hamermesh argues that a representative sample of establishments surveyed at monthly or quarterly intervals would yield useful data for studying adjustment costs, employment dynamics, etc. Household data could, evidently, be collected by sampling employees from the establishment's payroll records. This suggestion calls for a new longitudinal establishment survey rather than for combining existing files.

Hamermesh cautions us against trying to make do with what we already have when he writes, "Rather than rely on inappropriate data, those of us interested in empirical research... must adopt some of the sociologists' willingness to generate new sets of data" (p. 27). There is clearly a perceived demand for data that will aid in estimating labor demand functions and in analyzing the interaction of labor demand and supply functions.

Hamermesh is vague about the way in which establishments and workers will be sampled. Should we follow the Rees-Shultz (1970) procedure of sampling employees from the establishment payroll records? Or should we randomly select individuals and tie them to the establishments? The latter approach is examined in the paper "Enhanced Demographic-Economic Data Sets" (Herriot et al. 1988). These authors propose to supply us not with more new data but rather, by linking existing data files, with more information.

Their first project--linking SIPP to Social Security Administration (SSA) administrative records--is an exciting one. It will provide work histories for the SIPP respondents that will go back to their first jobs in which they contributed to social security. However, the authors tell us that they were able to find valid social security numbers for only 85 percent of the SIPP respondents. To solve this problem, would it be possible to begin with a sample drawn from the SSA files that could be included in the rotation for SIPP?
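The mechanics of such a linkage can be illustrated with a minimal sketch. It assumes, purely for illustration, that each survey record carries a validated social security number (or none when validation failed) and that the administrative file is keyed on the same number. The function name, field names, and toy records are hypothetical and do not describe the actual SIPP or SSA file layouts.

```python
def link_records(survey, admin):
    """Attach administrative work histories to survey respondents.

    survey: list of dicts with keys "person_id" and "ssn"
            ("ssn" is None when no valid number was obtained).
    admin:  dict mapping ssn -> list of (year, covered_earnings) tuples.
    Returns (linked, match_rate), where linked maps person_id -> history.
    """
    linked = {}
    for person in survey:
        ssn = person["ssn"]
        # Only respondents with a valid number found in the admin file link.
        if ssn is not None and ssn in admin:
            linked[person["person_id"]] = admin[ssn]
    match_rate = len(linked) / len(survey) if survey else 0.0
    return linked, match_rate


# Hypothetical toy data: four respondents, three administrative records.
survey = [
    {"person_id": 1, "ssn": "A"},
    {"person_id": 2, "ssn": None},   # no valid number reported
    {"person_id": 3, "ssn": "C"},
    {"person_id": 4, "ssn": "D"},    # number not found in the admin file
]
admin = {
    "A": [(1975, 9000), (1976, 9500)],
    "C": [(1980, 12000)],
    "Z": [(1960, 3000)],             # admin record with no survey respondent
}

linked, match_rate = link_records(survey, admin)
print(f"linked {len(linked)} of {len(survey)} respondents ({match_rate:.0%})")
```

The unlinked respondents are the ones that matter for the question raised above: if they differ systematically from the linked ones, the matched file is not a representative sample of the survey.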

Some 1,352 SIPP respondents were asked to sign releases so that employer-provided fringe benefits data could be obtained directly from the employer, but only 560 persons (42 percent) signed the releases. Two questions come to mind. First, were attempts made to follow up the 58 percent who did not sign the releases to ascertain whether there is a sample selection bias? Second, did the SIPP data include employee-provided estimates of fringe benefits that could be compared with employer-provided data in a manner analogous to a record-checking project?

The linkages of SIPP to economic data files go at least part of the way in addressing the concerns voiced by Hamermesh (1988) and Stafford (1986). Three data files are discussed by Herriot et al. The first, the Standard Statistical Establishment List (SSEL), appears to be a nearly complete canvass of all establishments; it includes nearly 5 million plants. SSEL contains relatively few variables and is maintained for only 2 years. If the SSEL records for the establishments that are linked to the SIPP respondents could be retained, one could construct a panel data set of worker-plant matches; constructing such a panel would probably involve retaining data for 100,000 to 200,000 establishments.

The second, the Longitudinal Research Database (LRD), covers only manufacturing establishments, but it contains a longer list of variables than the SSEL, especially for plants with 250 employees or more. However, only a subset of all manufacturing establishments is surveyed on an annual basis. The chances of finding a "match" with SIPP data are very small: only one in every six workers is employed in manufacturing, and an even smaller fraction of manufacturing establishments is included in the annual surveys.

The third economic file, enterprise statistics (ES), is only collected every 5 years. If the turnover of firms is important for employment continuity, links to the ES have very little value.

In the section on low-wage workers and low-wage firms, attention is directed to a hypothetical project that could be undertaken if the SIPP file were linked to the SSEL, the LRD, or the ES. Do these three existing economic data files provide enough information?

None of these three existing files has data to validate very many propositions about low-wage firms. They cannot tell us whether firms acquire new or used capital equipment, operate single or multiple shifts, and own or lease assets. Although the Census of Manufactures distinguishes between production and nonproduction workers, the latter group covers a wide range of employees--clerks, supervisors, salaried sales personnel, managers, etc. Indeed, Kochan, Katz, and McKersie (1986) describe one plant that produces agricultural implements and that has no employees on hourly rates of pay. I have argued elsewhere that firm size is a close proxy for the set of employers in a "low-wage" labor market (Oi 1985, 1988). The relationship between firm size and wages was initially examined by Henry Moore ([1911] 1967, Ch. 6), and it was more carefully documented by Mellow (1982) and Brown and Medoff (1986).(3)

It is regrettable that the SIPP did not include questions inquiring about the size of the firm and establishment in which the respondent was employed. These questions were included in the Current Population Survey for May 1979 and May 1983. I strongly urge that these questions be included in every wave of the SIPP for two reasons: (1) Without linkages, the response can be used to control for the effect of firm and plant sizes on wages, job tenure, fringe benefits, etc., and (2) if SIPP is linked to establishment files, we can determine the accuracy of employee estimates of firm and plant sizes. The relationship between firm size and wages varies across industries and, possibly, by occupation. If a longitudinal SSEL file could be matched with the SIPP, we could learn how establishment traits affect the firm-size profile of wages.

The problems that can be analyzed with matched files are limited by the information available in the establishment files. The preliminary projects that were conducted by Sater (1986) and Haber (1985) have to be studied to gauge the potential benefits of these matched files.

Herriot et al. identify two methodological issues in matching demographic and economic data: (1) tying workers to firms and (2) estimating missing data, notably assets and fringe benefits for the establishment. The authors argue that data on capital assets for small establishments can be interpolated from a relationship between capital assets and establishment size for the large establishments that report such data. It is a questionable procedure.
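One way to see the difficulty is with a minimal sketch of such an imputation. It assumes, for illustration only, a log-linear relationship between capital assets and employment fitted by ordinary least squares on the large establishments; the functional form, the function names, and the numbers are hypothetical and are not taken from Herriot et al. Because the small establishments lie below the range of sizes used in the fit, the imputed value is an extrapolation rather than an interpolation, and nothing in the data can confirm that the relationship holds there.

```python
import math


def fit_loglinear(employment, assets):
    """Least-squares fit of log(assets) = a + b * log(employment)."""
    xs = [math.log(e) for e in employment]
    ys = [math.log(k) for k in assets]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


def impute_assets(a, b, employment, min_observed):
    """Return (imputed assets, flag) for one establishment.

    The flag is True when the establishment is smaller than any
    establishment used in the fit, i.e. the value is extrapolated.
    """
    value = math.exp(a + b * math.log(employment))
    return value, employment < min_observed


# Hypothetical large establishments that report assets
# (constructed so that assets are 20,000 per employee).
large_emp = [250, 500, 1000, 2000]
large_assets = [5.0e6, 1.0e7, 2.0e7, 4.0e7]

a, b = fit_loglinear(large_emp, large_assets)

# Impute assets for a hypothetical 20-employee establishment.
value, extrapolated = impute_assets(a, b, 20, min(large_emp))
print(f"elasticity b = {b:.3f}, imputed assets = {value:,.0f}, "
      f"extrapolated = {extrapolated}")
```

In this constructed example the fit is exact, so the imputed figure simply carries the large-plant ratio of assets to employment down to the small plant. Whether small plants actually share that ratio is the very thing the missing data would have told us.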

Clark (1923) analyzed the implications of overhead costs and emphasized the proposition that "Sunk costs are sunk." The costs of collecting, editing, and coding the data for the SIPP, the SSEL, the Census of Manufactures, and other existing data files are sunk costs. The incremental cost of linking two or more existing data files is small compared to the cost of a new survey. Further, a new survey may render an existing survey obsolete.

But costs are only one side of the equation. One has to compare the incremental benefits of linkages to the incremental costs. But what are the incremental benefits?

Public agencies are very reluctant to abandon existing projects, especially when large sums have already been invested in them. That a data file exists is not, in itself, enough to justify its use, unless its use is costless.

(1) The address was quoted by Henry L. Moore ([1911] 1967, 170-171).

(2) The case is persuasively argued by Professor George Stigler who wrote: "If one seeks distinctive traits of modern economics, traits which are not shared to any important degree with the Marshallian or earlier period, he will find only one, the development of statistical estimation of economic relationships. Mathematical analysis became increasingly more common after Walras' first edition... But Statistical Economics, the name given by Henry Moore, is the one important modern development. Henry Moore was its founder in the sense in which most large movements have a founder." (Stigler 1965, 343-344.)

(3) W.I. King (1923) assembled data on hours, earnings, and employment by industry and establishment size for the period 1921-22. References to other studies of what I call the firm-size profile of wages can be found in Oi (1988) and Brown and Medoff (1986).

References

Brown, Charles, and J.M. Medoff (1986), "The Employer Size Wage Effect," Harvard Institute of Economic Research Discussion Paper No. 1202, January 1986. Forthcoming in Journal of Political Economy (1988).
Clark, J.M. (1923), Studies in the Economics of Overhead Costs, Chicago: University of Chicago Press.
Haber, S. (1985), Applications of a Matched File Linking the Bureau of the Census Survey of Income and Program Participation and Economic Data, Survey of Income and Program Participation Working Paper Series, No. 8502, U.S. Bureau of the Census, Washington, DC: GPO.
Hamermesh, Daniel S. (1988), "Data Difficulties in Labor Economics," Paper presented at the Conference on Research in Income and Wealth, May 1988. Forthcoming in Fifty Years of Economic Measurement, edited by Ernst R. Berndt, W. Erwin Diewert, and Jack E. Triplett.
Herriot, Roger, Chester Bowie, Daniel Kasprzyk, and Sheldon Haber (1988), "Enhanced Demographic-Economic Data Sets," Survey of Current Business 68 (November 1988).
King, W.I. (1923), Employment, Hours, and Earnings in Prosperity and Depression, United States, 1920-1922, New York: National Bureau of Economic Research.
Kochan, Thomas A., Harry C. Katz, and Robert E. McKersie (1986), The Transformation of American Industrial Relations, New York: Basic Books.
Mellow, W. (1982), "Employer Size and Wages," Review of Economics and Statistics 64 (August 1982): 495-501.
Moore, Henry L. ([1911] 1967), The Law of Wages, Reprint, New York: Augustus M. Kelley.
Oi, Walter Y. (1988), "Employment Relations in Dual Labor Markets," Journal of Labor Economics. Forthcoming.
Oi, Walter Y. (1985), "Low Wages and Small Firms," Report prepared for the U.S. Department of Labor, Washington, DC, November 1985.
Rees, Albert, and George P. Shultz (1970), Workers and Wages in An Urban Labor Market, Chicago: University of Chicago Press.
Sater, D.K. (1986), "SSN Response Rates and Results of SSN Validation/Improvement Operation," Memorandum for Roger Herriot, Population Division, U.S. Bureau of the Census, March 11, 1986.
Stafford, Frank (1986), "Forestalling the Demise of Empirical Economics: The Role of Microdata in Labor Economics Research," Handbook of Labor Economics, edited by O. Ashenfelter and R. Layard, North Holland, The Netherlands, 1986: 387-423.
Stigler, George J. (1965), Essays in the History of Economics, Chicago: University of Chicago Press.
COPYRIGHT 1988 U.S. Government Printing Office

Article Details
Author: Oi, Walter Y.
Publication: Survey of Current Business
Date: Nov 1, 1988