
The statistics corner: research with economic microdata: the Census Bureau's Center for Economic Studies.

In a typical year, the Census Bureau's Economic Programs area carries out scores of surveys or censuses of business establishments and firms. These surveys produce a myriad of "data products" (as the Census Bureau calls them), most of which consist of aggregated sums and cross tabulations, across various economic sectors (industries, states, size classes, etc.). The vast majority of the readers of this journal use one or more of these data products directly or indirectly (for example, through the National Income and Product Accounts or the Merchandise Trade Statistics).(1)

While most readers are familiar with Census Bureau data products, they may not be familiar with the Center for Economic Studies (CES), a part of the Economic Programs area. Most of the work of CES consists of analytical studies carried out by CES staff and by "research associates" (as they are called) from government, academia, and other research groups. Until now, CES work products have usually appeared in academic journals, not in Census Bureau publications.

CES conducts no surveys of its own. Instead, it links survey microdata over time to form longitudinal panels (called longitudinal micropanel databases), and it broadens these panels by linking them with other data sets from both within and outside the Census Bureau.

Table 1 describes most of the databases at CES. The primary database is the Longitudinal Research Database (LRD), which consists of annual cost and output data on manufacturing establishments (plants) from the Census of Manufactures (1963, 1967, 1972, 1977, 1982, and 1987) and from the Annual Survey of Manufactures (since 1972), linked to form an unbalanced longitudinal panel.(2) Among many microdata sets CES has linked to the LRD are the National Science Foundation/Census Research and Development (R&D) survey and the Pollution Abatement Costs and Expenditures (PACE) survey. These enriched data sets provide researchers with new tools to test hypotheses and examine policy options.
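To make the idea of an unbalanced longitudinal panel concrete, here is a minimal Python sketch. It is not CES code. It assumes plant-year records keyed by a hypothetical plant identifier, plus a second hypothetical survey keyed the same way. Real linkage across Census Bureau files is considerably more involved. The sketch does three things:

- groups records by plant;
- checks whether every plant appears in every year, which is footnote 2's definition of a balanced panel;
- attaches the second survey's values wherever a match exists.

```python
from collections import defaultdict

# Hypothetical plant-year records: (plant_id, year, shipments). Not real LRD fields or data.
census_records = [
    ("P1", 1977, 120.0), ("P1", 1982, 150.0), ("P1", 1987, 160.0),
    ("P2", 1977, 80.0), ("P2", 1982, 70.0),   # P2 exits after 1982 (a plant death)
    ("P3", 1982, 40.0), ("P3", 1987, 55.0),   # P3 enters in 1982 (a plant birth)
]

# Hypothetical values from a second survey, keyed by (plant_id, year).
supplement = {("P1", 1982): 9.5, ("P3", 1987): 1.2}


def build_panel(records):
    """Group plant-year observations by plant ID: {plant_id: {year: value}}."""
    panel = defaultdict(dict)
    for plant_id, year, value in records:
        panel[plant_id][year] = value
    return dict(panel)


def is_balanced(panel, years):
    """A panel is balanced only if every plant is observed in every year."""
    return all(set(obs) == set(years) for obs in panel.values())


def link(panel, extra):
    """Attach supplementary values by (plant_id, year); None where there is no match."""
    return {
        pid: {yr: (val, extra.get((pid, yr))) for yr, val in sorted(obs.items())}
        for pid, obs in panel.items()
    }


panel = build_panel(census_records)
years = [1977, 1982, 1987]
linked = link(panel, supplement)
print(is_balanced(panel, years))
print(linked["P1"][1982])
```

Plant P2 exits and plant P3 enters, so the panel is unbalanced, as the LRD is.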

The CES research program is broad, reflecting the diversity of Census Bureau data programs and the needs of researchers and policy makers. In this paper, we can only give examples that suggest the scope of the research. Recent studies find that:

1. Recessions are times in which job destruction rises sharply but job creation falls only slightly. Similarly, expansions are better characterized as reductions in job destruction than as increases in job creation. Variations in job destruction rates, therefore, appear to be the crucial difference between recessions and expansions, at least for the U.S. manufacturing sector in 1972-88 (Davis and Haltiwanger 1990).

2. The conventional view of recessions -- that jobs disappear temporarily while the creation of new jobs declines, and that most of the workers are recalled when aggregate demand recovers -- appears incorrect. Most jobs created are created permanently and most jobs lost are lost permanently (Davis and Haltiwanger 1992).

3. During the 1980s, the rise in total factor productivity in manufacturing industries resulted from increases in the market share of the most productive existing plants. Exit of plants with low productivity and entry of plants with high productivity played almost no role in the rise in productivity (Baily, Hulten and Campbell 1992).

4. Most large manufacturing plants use advanced technology, and those using the most advanced technology pay the highest average production worker wages. Moreover, the adoption of advanced technology has an effect on wages that operates separately from the relationship between plant size and wages (Dunne and Schmitz 1992).

5. Secondary products of plants producing the same primary products bear little relationship to one another unless they are produced by plants under common ownership (Streitwieser 1991).

6. Ownership changes tend to be associated with improvements in the productivity of surviving plants (Lichtenberg and Siegel 1992).

7. Recent takeovers involve related activities, particularly vertically related activities. However, the degree and type of relatedness vary greatly among takeovers and no particular type of takeover (horizontal, vertical, conglomerate) is most profitable (McGuckin, Nguyen, and Andrews 1991).

8. Leveraged buyouts (LBOs) typically are associated with low-tech firms and with reduced R&D intensity (R&D as a percentage of sales) after the buyout. The performance of LBO firms improves in spite of the declines in R&D intensity. Therefore, at least with respect to R&D, most LBOs do not pose a business or public policy problem (Long and Ravenscraft 1993a, 1993b).


The research program at CES takes advantage of a natural coincidence of interests between researchers and the Census Bureau. Researchers want access to Census Bureau economic microdata, and the Census Bureau wants to use researchers' perspectives to improve its data programs. By supporting research at CES using economic microdata, the Census Bureau satisfies both desires. While producing a body of significant economic research results (as illustrated above), the CES program has already enabled the Census Bureau to produce new data products and to make other improvements to its data programs, with the potential for many more.

This outcome is consistent with a substantial recent literature arguing that to improve its data programs, a statistical agency must undertake analytical research with its survey microdata (Triplett 1991, McGuckin 1992a). CES fills this role at the Census Bureau by: (1) developing integrated longitudinal micropanel databases from various surveys; (2) providing users access to these databases while protecting the confidentiality of the data; (3) undertaking its own program of research and analysis using these databases; and (4) developing alternative data and analytical products and services. The way CES carries out these tasks is called the "CES Research Model." Using this model, the Census Bureau learns a great deal, at a relatively modest cost, about how to improve its data programs.

Researchers who wish to become CES research associates must become Special Sworn Employees (SSEs) of the Bureau. To attain SSE status, the researchers must take a legal oath not to disclose confidential data(3), and the projects they conduct must benefit the Census Bureau's data collection activities. The research associates carry out their projects using secure CES facilities.

Project selection at CES involves several considerations. Proposed research projects must use Census Bureau microdata and have scientific merit. As part of their projects, the research associates must be willing to produce new data products or provide the Census Bureau with recommendations for improving its data programs -- improved survey concepts or questionnaires, better survey processing procedures, new data products, etc. For the most part, the research associates pay the cost of their projects in the form of laboratory fees.

CES employees support the research projects by becoming closely involved with them. CES staff researchers, augmented by a small computer staff, undertake an internal program of database development and economic research using the databases. Often, staff researchers and research associates carry out joint projects. This arrangement provides three types of benefits: (1) CES staff research improves; (2) staff researchers become more efficient at providing the advice, consultation, and related support that the research associates require; and (3) staff researchers become more efficient at providing the analytical user perspective to the rest of the Census Bureau.


The projects and activities at CES are diverse.(4) For purposes of this paper, we can divide them into database development, researcher results, data products, and data improvement. Of course, these four aspects are intertwined. Database development provides the raw materials for research projects. The research projects themselves generate other useful data products, and they provide an analytical user perspective that the Census Bureau uses to improve its data programs.

Database Development

CES does not conduct surveys; instead, it increases the worth and research potential of existing data by combining microdata sets from various sources and making them ready for research. CES builds research data sets in two ways: by linking the observations from corresponding survey units (establishments and firms) across time, and by linking together microdata from different surveys or other sources, both within and outside the Bureau. The following are the major databases at CES:

1. The Longitudinal Research Database (LRD), described above, is the primary database at CES. The file contains key variables that allow classification of establishments by ownership (company), industry, location, and more. An outstanding feature is that it identifies plant births, deaths, and ownership changes. For more details, see McGuckin and Pascoe (1988). In spite of its richness, the LRD is limited. It contains little or no information on many items of interest, and it covers only the manufacturing sector, which is a serious limitation for many types of studies. To overcome these problems, CES is currently broadening the LRD by linking it with other data sets, from the Census Bureau and other sources. In fact, CES research associates often supplement the data development process by bringing outside data sets with them when they conduct their projects at CES. The bulk of the studies at CES have used the LRD, often linked to one or more of the other data sets described next.

2. Planning and development have just begun on a Longitudinal Business Database (LBD) that will link the LRD with data from Census Bureau programs that cover a large part of the population of domestically located businesses with employees. Initial versions of the LBD will combine the LRD with data from the 1982 and 1987 Economic Censuses. This database will enable researchers to study restructuring that occurs within sectors outside of manufacturing, and reallocations of resources between the manufacturing and nonmanufacturing sectors.

3. The new Quarterly Financial Reports (QFR) database provides income statement and balance sheet information for public and private firms in manufacturing, mining, wholesaling, and retailing for the period 1977 through 1990. The QFR database was first developed and used for the Long and Ravenscraft project, described above, on leveraged buyouts (LBOs).

4. The Research and Development (R&D) database, sponsored by the National Science Foundation (NSF), includes annual data from 1972 through 1988 on firms performing R&D in the U.S. The database is well suited for studies of firms' investments in technology. Several research projects have developed and refined this database.

5. The 1982 and 1987 Characteristics of Business Owners (CBO) surveys, sponsored by the Small Business Administration (SBA) and the Minority Business Development Agency (MBDA), provide data on the demographic and economic characteristics of business owners and the economic performance of their firms. Because the CBO oversamples firms owned by minorities and women, it is particularly useful for studying small businesses owned by these groups. For more detail, see Nucci (1992).

6. Two new databases will provide a great deal of information for studies of issues related to energy and environmental pollution. Because energy production and consumption generate most environmental pollution, these topics are closely related. The Environmental Database, now under development, combines the LRD with information on pollution emissions and pollution abatement expenditures. These data come from several Census Bureau and Environmental Protection Agency data sets. The Manufacturing Energy Database (MED), sponsored by the Department of Energy (DOE), links the LRD with other Census Bureau data on plant-level energy consumption. The MED is potentially very useful for studying a host of energy issues in the manufacturing sector.

7. The Worker-Establishment Characteristic Database (WECD), also under development, is an attempt to combine information on worker characteristics, obtained from the Decennial Census, with information on the characteristics of the worker's plant, obtained from the LRD. It will allow labor market studies that include plant-level labor demand conditions; and it will enable researchers to investigate various labor market puzzles such as why some industries and some larger firms seem to pay wage premiums.

Research Results

Economic analysis usually is based on published aggregate data such as the Census Bureau's traditional data products. These aggregate data reduce the myriad of economic activity to manageable proportions and provide confidentiality protection. Unfortunately, information is lost or distorted in this aggregation process. Aggregation is carried out under the assumption of a "representative agent" model, which assumes that the behavior of all agents is essentially alike. The research results at CES show that, on the contrary, the behavior of firms and establishments varies greatly (i.e., it is idiosyncratic). This is true no matter what variables we analyze (e.g., output, employment, investment, or productivity), no matter how we classify sectors (e.g., by industry, size, or location), and no matter what topic we analyze (e.g., merger policy, job turnover, business cycle analysis, research and development, energy consumption, pollution emissions, or pollution abatement expenditures). When agents behave idiosyncratically, aggregation introduces error. Thus, one primary strand of research at CES evaluates the effect of aggregation error and develops new forms of analysis that take advantage of the extra information provided by the microdata available at CES.

Moreover, for many problems, analysis really only makes sense when using microdata. For example, Olley and Pakes (1991) studied the recently deregulated telecommunications equipment industry through a dynamic model of firm behavior that incorporates firm-specific technology differences and allows for entry and exit. Pakes and his associates are continuing this research in a new project that is developing models to evaluate how structural changes, such as price shocks or government gasoline mileage requirements, affect the automobile market. These models incorporate both the demand and supply sides of the market, in contrast to the telecommunications study, which focused on productivity alone. The project will estimate the dynamic relationships between automobile costs and automobile characteristics. It will use plant-level LRD production data and firm-level R&D data, together with estimates of demand based on publicly available data on the prices and characteristics of automobile models. This project is one of many examples in which the winners and losers from particular policies cannot be identified without reference to the microdata.

Other examples of this type of research are projects based on the recent idea in macroeconomics that to understand aggregate economic fluctuations, it may be necessary to analyze time series fluctuations in the cross-sectional distributions of economic activity across establishments (Davis and Haltiwanger 1990). A simple example illustrates the new approach.(5) In the aggregate we have observed increases in average labor productivity in the manufacturing sector and a decline in employment. A representative agent model would suggest that the "representative plant" has been able to increase productivity through downsizing. However, it is not clear whether this representative agent hypothesis holds up. It could be that the gain in productivity is primarily from plants that expanded output and employment but expanded output more than employment. The decline in employment obviously is driven by a significant number of plants downsizing, but it may not be the downsizing plants that account for the productivity gains. Several new projects at CES (by Haltiwanger and other investigators) hope to answer these types of questions.
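The hypothetical example above can be worked through with a shift-share decomposition. The Python sketch below is illustrative only, and it rests on three assumptions:

- the plant figures are invented;
- all plants operate in both years, so there is no entry or exit;
- the decomposition is a common textbook form, not necessarily the one used in the CES projects.

The decomposition splits the change into within-plant, between-plant and cross terms, with the between term measured relative to initial aggregate productivity. Aggregate labor productivity is total output divided by total employment. This equals the employment-share-weighted average of plant productivity.

```python
# Hypothetical plants: name -> ((output_t0, employment_t0), (output_t1, employment_t1)).
plants = {
    "A": ((100.0, 10.0), (80.0, 8.0)),    # downsizes, productivity unchanged
    "B": ((100.0, 10.0), (60.0, 6.0)),    # downsizes, productivity unchanged
    "C": ((100.0, 10.0), (180.0, 12.0)),  # expands output faster than employment
}


def aggregate(plants, t):
    """Aggregate labor productivity (total output / total employment) and employment in period t."""
    out = sum(p[t][0] for p in plants.values())
    emp = sum(p[t][1] for p in plants.values())
    return out / emp, emp


def decompose(plants):
    """Shift-share split of the change in aggregate labor productivity (continuing plants only).

    Returns {plant: (within, between, cross)}; across plants the terms sum to the aggregate change.
    """
    p_agg0, emp0 = aggregate(plants, 0)
    _, emp1 = aggregate(plants, 1)
    result = {}
    for name, ((y0, l0), (y1, l1)) in plants.items():
        s0, s1 = l0 / emp0, l1 / emp1            # employment shares in each period
        prod0, prod1 = y0 / l0, y1 / l1          # plant labor productivity in each period
        within = s0 * (prod1 - prod0)            # productivity change at fixed initial share
        between = (s1 - s0) * (prod0 - p_agg0)   # share shift, relative to initial aggregate
        cross = (s1 - s0) * (prod1 - prod0)      # interaction of share and productivity changes
        result[name] = (within, between, cross)
    return result


terms = decompose(plants)
p0, emp0 = aggregate(plants, 0)
p1, emp1 = aggregate(plants, 1)
print(f"employment {emp0:.0f} -> {emp1:.0f}, productivity {p0:.2f} -> {p1:.2f}")
for name, parts in terms.items():
    print(name, "contribution:", round(sum(parts), 3))
```

In these invented numbers, total employment falls from 30 to 26 while aggregate productivity rises. Yet the two downsizing plants contribute nothing to the gain. All of it is attributed to plant C, which expanded output faster than employment, and a "representative plant" reading of the aggregates would miss this.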

Data Products and Data Improvements

Until recently, CES focused primarily on developing its databases and putting its research programs into place. Nevertheless, analytic users have made specific suggestions for improvements to Census Bureau data programs and have provided their point of view on measurement issues important to the Census Bureau. Researchers have sent reports with suggestions for survey improvements to the Manufacturers' Shipments, Inventories, and New Orders (M3) survey and to the R&D survey; a similar report will be sent on the PACE survey. Almost as important as the reports are the informal working relationships built up between the Census Bureau's data production divisions and analytic users at CES. It is through such long-term working relationships that the Census Bureau learns how it can make its data programs more responsive to the needs of analytic users.

One important set of longstanding measurement issues concerns the problems the Standard Industrial Classification system has in trying to classify economic activity. Center work is supporting the "fresh slate" examination of economic classification recently endorsed by the Office of Management and Budget (OMB). CES has undertaken research designed to assess the feasibility and design of such systems (Abbott and Andrews 1990, McGuckin 1992b, Mattey 1993).

Attention is now turning to developing new data products. Three new data products have already appeared. The first data product was an index of manufacturing product diversification (Gollop and Monahan 1988, 1991). The index has several desirable properties and uses detailed product level data from the LRD to measure diversification at the plant and firm level for the five Census years from 1963 to 1982. The index, together with some more recent evidence (Streitwieser 1991), shows that over the past thirty years, firms became more diversified, but plants became more specialized in the products they produce.
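A deliberately simple measure can show how firms can be more diversified while their plants are more specialized. The Python sketch below uses one minus the Herfindahl index of product shipment shares. This is not the Gollop-Monahan index, which is more elaborate and is not reproduced here. The plants, the firm and the product codes are all hypothetical.

```python
from collections import defaultdict

# Hypothetical product-level shipments: plant -> {product code: value}. Codes are made up.
plants = {
    "plant_1": {"2011": 90.0, "2013": 10.0},
    "plant_2": {"3711": 95.0, "3714": 5.0},
}
firm_of = {"plant_1": "firm_X", "plant_2": "firm_X"}  # hypothetical common ownership


def diversification(shipments):
    """1 minus the Herfindahl index of product shares: 0 = single product, higher = more diversified."""
    total = sum(shipments.values())
    return 1.0 - sum((v / total) ** 2 for v in shipments.values())


def firm_shipments(plants, firm_of):
    """Sum each firm's product shipments across the plants it owns."""
    firms = defaultdict(lambda: defaultdict(float))
    for plant, products in plants.items():
        for code, value in products.items():
            firms[firm_of[plant]][code] += value
    return firms


for plant, products in plants.items():
    print(plant, round(diversification(products), 3))
for firm, products in firm_shipments(plants, firm_of).items():
    print(firm, round(diversification(products), 3))
```

Each plant is close to single-product, with scores of about 0.18 and 0.10. The firm that owns both plants spans four products and scores about 0.57.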

CES has also helped the Bureau's Foreign Trade Division develop improved statistics showing the volume of trade in advanced technology products (ATP; see McGuckin, Abbott, Herrick, and Norfolk 1992). Since January 1989, the ATP series has been a part of the Census Bureau's monthly trade statistics.

This summer, the Center will release a monograph by Davis, Haltiwanger, and Schuh (1993) that contains annual and quarterly measures of manufacturing job creation and destruction during the 1970s and 1980s by detailed industry, geographic location, and establishment characteristics. These measures, based on establishment employment data in the LRD, are the first longitudinal statistics provided by CES. The data provide three important types of new information about labor market behavior: (1) they estimate the total reallocation, or gross flow, of jobs across establishments and sectors; (2) they reveal distinguishing characteristics of manufacturers that lose and gain jobs; and (3) they yield new insights about business cycles and trends in the labor market. We are also developing a series of nontechnical Statistical Briefs based on these data (as well as on other policy-relevant research projects).
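The gross-flow measures described above can be sketched as follows. This is a minimal Python illustration with invented establishment data, not the code used for the monograph. It assumes the convention in Davis and Haltiwanger's work of expressing flows relative to average employment in the two adjacent periods. The measures are defined as follows:

- job creation sums employment gains at expanding and newly born establishments;
- job destruction sums losses at contracting and closing establishments;
- gross reallocation is the sum of creation and destruction.

```python
# Hypothetical establishment employment in two adjacent years: id -> (emp_prev, emp_curr).
# A zero in the first slot marks a birth; a zero in the second marks a death.
employment = {
    "E1": (100, 120),  # expansion
    "E2": (50, 30),    # contraction
    "E3": (0, 20),     # birth
    "E4": (40, 0),     # death
    "E5": (60, 60),    # no change
}


def job_flows(employment):
    """Gross job creation, destruction, and reallocation rates between two periods.

    Rates are expressed relative to average employment across the two periods.
    """
    creation = sum(max(cur - prev, 0) for prev, cur in employment.values())
    destruction = sum(max(prev - cur, 0) for prev, cur in employment.values())
    avg_emp = sum(prev + cur for prev, cur in employment.values()) / 2
    jc, jd = creation / avg_emp, destruction / avg_emp
    return {
        "creation": jc,
        "destruction": jd,
        "net": jc - jd,                                # equals net employment change / avg_emp
        "gross_reallocation": jc + jd,
        "excess_reallocation": jc + jd - abs(jc - jd),  # reallocation beyond the net change
    }


for key, value in job_flows(employment).items():
    print(f"{key:>20}: {value:.3f}")
```

With these numbers, net employment falls by about 8 percent of average employment. Gross reallocation is about 42 percent, five times the net change. Net employment figures alone conceal this kind of information.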


CES research projects have shown that with access to longitudinal micropanel data sets, researchers can generate significant research results and improve underlying data quality without compromising data confidentiality. As a result, demand for access has increased. To meet this demand, CES is moving in two directions:

1. We have developed working relationships with researchers from other government agencies. This makes sense because Census Bureau data underlie many of the aggregate data products that other federal statistical agencies use or produce. For example, the Federal Reserve Board has funded four projects at the Center. These projects will help the Fed to improve its estimates of industrial production and enable the Fed to learn more about how aggregation affects macroeconomic models, how real wages relate to the business cycle, and how small and large firms differ in their responses to changes in monetary policy.

2. We have established secure CES research facilities outside Washington, DC. Because facilities are limited at Census Bureau headquarters, significant expansion is not feasible there. Also, it is inconvenient and costly for many researchers outside the Washington area to come to headquarters to carry out projects. In August 1992, the Census Bureau's Executive Staff approved a major new pilot program to develop Research Data Centers (RDCs) at the Census Bureau's regional offices. Under a proposal currently under review by the National Science Foundation, the National Bureau of Economic Research and CES would set up a pilot RDC in the fall of 1993 at the Boston regional office. This pilot RDC would be able to accommodate up to eight research projects at a time. At least one other RDC in another city should be set up soon thereafter. If these pilot RDCs work out, the Census Bureau hopes to establish secure sites for RDCs outside the regional offices, e.g., at universities. This would achieve a primary goal for CES -- providing wide access to Census Bureau microdata for important economic research while protecting the privacy of those who provide the data.

Notes

1 For more descriptions of the data products, see the July 1988 and April 1991 "Statistics Corner" sections of Business Economics.

2 A balanced panel would include data for all establishments in all years. The LRD is unbalanced because establishments are born and die, and because the ASM, as a sample survey, does not cover all establishments.

3 The relevant law is Title 13, U.S.C., section 214. Violations are punishable by a fine of not more than $5,000, imprisonment of not more than five years, or both.

4 For a complete description, see the Center for Economic Studies (1992).

5 The example is from a recent research proposal by Haltiwanger.

References

Baily, Martin N., Charles Hulten, and David C. Campbell, "Productivity Dynamics in Manufacturing Plants," Brookings Papers on Economic Activity: Microeconomics, 1992, pp. 187-267 (with comments and discussion).

Center for Economic Studies, Center for Economic Studies Annual Report, Fiscal Year 1992, available from CES by calling (301) 763-2337.

Davis, Steven J. and John A. Haltiwanger, "Gross Job Creation and Destruction: Microeconomic Evidence and Macroeconomic Implications," NBER Macroeconomics Annual, 1990.

-----, "Gross Job Creation, Gross Job Destruction, and Employment Reallocation," Quarterly Journal of Economics, 1992, pp. 819-64.

Dunne, Timothy and James A. Schmitz, "Wages, Employer Size-Wage Premia and Employment Structure: Their Relationship to Advanced-Technology Usage at U.S. Manufacturing Establishments," CES Discussion Paper 92-15, December 1992.

Gollop, Frank M. and James L. Monahan, From Homogeneity to Heterogeneity: An Index of Diversification, Bureau of the Census Technical Paper 60, Washington, DC: U.S. Government Printing Office, 1988.

-----, "A Generalized Index of Diversification: Trends in U.S. Manufacturing," The Review of Economics and Statistics, LXXIII:2, May 1991, pp. 318-30.

Long, William F. and David J. Ravenscraft, "Decade of Debt: Lessons from LBOs in the 1980s" (with comments and discussion), in Margaret M. Blair, ed., The Deal Decade: What Takeovers and Leveraged Buyouts Mean for Corporate Governance, Washington, DC: The Brookings Institution, 1993, pp. 205-238. (1993a)

-----, "LBOs, Debt, and R&D Intensity," CES Discussion Paper 93-3, February 1993. (1993b)

Lichtenberg, Frank R., Corporate Takeovers and Productivity, Cambridge, MA: The MIT Press, 1992.

Mattey, Joseph, "Evidence on IO Technology Assumptions from the Longitudinal Research Database," mimeo, February 1993.

McGuckin, Robert H., "Analytic Use of Microdata: A Model for Researcher Access and Confidentiality Protection," paper presented at the Eurostat sponsored International Seminar on Statistical Confidentiality, September 1992, Dublin, Ireland. (1992a)

McGuckin, Robert H., "Multiple Classification Systems for Economic Data: Can a Thousand Flowers Bloom? And Should They?," Proceedings, 1991 International Conference on the Classification of Economic Activities, 1992, pp. 384-407. (1992b)

McGuckin, Robert H., Thomas A. Abbott, Paul Herrick, and I. Leroy Norfolk, "Measuring Advanced Technology Products Trade: A New Approach," Journal of Official Statistics, Volume 8, Number 2, 1992, pp. 223-233.

McGuckin, Robert H., Sang V. Nguyen, and Stephen H. Andrews, "The Relationships Among Acquiring and Acquired Firms' Product Lines," Journal of Law and Economics, Fall, 1991; CES Discussion Paper 90-12.

McGuckin, Robert H. and George A. Pascoe, "The Longitudinal Research Database (LRD): Status and Research Possibilities," Survey of Current Business, November 1988, pp. 30-37.

Streitwieser, Mary L., "The Extent and Nature of Establishment Level Diversification in Sixteen U.S. Manufacturing Industries," Journal of Law and Economics, Part 2, Fall, 1991, pp. 496-55.

Triplett, Jack, "The Federal Statistical System's Response to Emerging Data Needs," Journal of Economic and Social Measurement, Volume 17, 1992. With comments and discussion.
COPYRIGHT 1993 The National Association for Business Economists
No portion of this article can be reproduced without the express written permission from the copyright holder.

Author: McGuckin, Robert H.; Reznek, Arnold P.
Publication: Business Economics
Date: Jul 1, 1993