Implementing the New Sample Design for the Current Employment Statistics Survey.

In June 1995, the Bureau of Labor Statistics (BLS) announced plans for a comprehensive sample redesign of its monthly payroll survey--the first one ever undertaken in the program's long history, which dates back over sixty years. In June 2000, following completion of the research and production test phases of the project, the first estimates from the redesigned sample were published by BLS for the wholesale trade industry. Publication for the remaining industry divisions will be phased in between 2001 and 2003. The completion of the phase-in for the redesign in June 2003 will coincide with the conversion of the Current Employment Statistics (CES) series from industry coding based on the 1987 Standard Industrial Classification (SIC) system to industry coding based on the North American Industry Classification System (NAICS).


The CES survey is a federal/state cooperative program that provides monthly estimates of nonfarm payroll jobs and the hours and earnings of workers, derived from a sample of over 350,000 business establishments nationwide. These data are some of the most closely watched and widely used economic indicators among public and private policy makers. In addition to serving as major economic indicators in their own right, CES data series are also inputs to several other key economic statistics including the indexes of leading and coincident indicators, the national income and product accounts, productivity measures, and indexes of industrial production. The CES program offers several important attributes to its users: timely release of data, an abundance of industry and geographic detail, and an annual benchmark to full population counts from state unemployment insurance (UI) tax records, which helps to maintain overall survey accuracy.

However, the CES program also has faced some serious limitations. Most significantly, the CES survey has been based on a quota sample since its inception, which predated the introduction of probability sampling as the internationally recognized standard for sample surveys. Quota samples are known to be at risk for potentially significant biases. Thus, the sample redesign, which introduces a probability-based design for CES, more effectively ensures a proper representation of the universe of nonfarm business establishments through randomized selection techniques and improved estimation methodology.

In addition, the CES sample redesign addresses a second critical limitation in the program--lack of timely sample-based representation of employment from new business births. Procedures have been developed for regular sample updates that ensure better representation of new units in the CES sample. Time-series modeling techniques are used to estimate the residual portion of employment from business births that are not accounted for through improved sampling techniques.

Introduction of a probability-based sample for the CES survey also allows for the publication of sampling errors and confidence intervals--standard survey accuracy measures not directly applicable to the quota design.

Summary of Sample Redesign Implementation Plans

Initial implementation in June 2000

BLS began a phased-in implementation of the new CES sample design in June 2000, coincident with the publication of the March 1999 CES national benchmark revisions. The major effects from this introduction were:

* The wholesale trade industry series for CES national estimates were converted to the new probability-based procedures, and estimates for April 1998 forward were revised to incorporate these changes.

* Net birth/death modeling replaced bias adjustment for wholesale trade series.

* There were no series breaks in the wholesale trade estimates.

* There were no changes to the wholesale trade publication levels.

* Publication of sampling errors and confidence intervals for wholesale trade series began.

* There were no methodology changes to any industry series beyond the national wholesale trade estimates in June 2000.

Further implementation plans

Probability-based estimates for state and area wholesale trade series are targeted for introduction in March 2001 with the next state benchmark revision. After the initial conversion of wholesale trade, BLS will continue a phase-in of the new design by major industry division. Implementation of the new sample and estimators for each major division is scheduled to coincide with the publication of benchmark revisions, so that a continually changing sample composition does not disrupt published over-the-month changes for current-month estimates. The complete schedule for redesign implementation for the national, state, and metropolitan area CES series is shown in the table.

Publication and error measurement

Initial implementation for each major industry division will preserve existing publication levels for the majority of series. There are no major changes planned to publication levels in any industry divisions until the introduction of NAICS-based series in 2003.

The benchmark revision will continue to be the most comprehensive error measure for the all-employee estimates, but a measure of sampling error also will be available. Sampling errors will be calculated for hours and earnings estimates as well. This allows for the first time the calculation of standard confidence intervals for CES estimates.
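As an illustration of how a published sampling error translates into a confidence interval, the following is a minimal sketch. It assumes a normal approximation and a 90-percent confidence level; the article does not state the level BLS uses, and the estimate and standard error shown are hypothetical figures, not CES results.

```python
from statistics import NormalDist


def confidence_interval(estimate, standard_error, level=0.90):
    """Return (lower, upper) bounds under a normal approximation.

    The 90-percent default is an assumption made for this sketch.
    """
    # Two-sided critical value, about 1.645 for a 90-percent interval.
    z = NormalDist().inv_cdf(0.5 + level / 2)
    margin = z * standard_error
    return estimate - margin, estimate + margin


# Hypothetical over-the-month change and standard error (in jobs).
lower, upper = confidence_interval(estimate=150_000, standard_error=60_000)
print(f"90% interval: {lower:,.0f} to {upper:,.0f}")
```

An interval that excludes zero would indicate that the estimated change is statistically significant at the chosen level.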

Methodological Issues and How They Are Being Addressed

The new CES sample design

The new design is a stratified, simple random sample of worksites, clustered by UI account number. The UI account number is a major identifier on the BLS longitudinal database of employer records. This database serves as both the sample frame and the benchmark source for the CES employment estimates. The sample strata, or sub-populations, are defined by state, industry and employment size and yield a state-based design. The sampling rates for each stratum are determined through a method known as optimum allocation, which distributes a fixed number of sample units across a set of strata to minimize the overall variance, or sampling error, on the primary estimate of interest. The statewide total nonfarm employment level is the primary estimate of interest, so the new design gives top priority to minimizing the sampling error around that estimate.
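The idea behind optimum allocation can be sketched with the textbook (Neyman) form, in which each stratum's share of the fixed sample is proportional to its number of units multiplied by the standard deviation of employment within it. The article does not give the exact formula BLS applies, so this form is an assumption, and the strata, counts, and standard deviations below are hypothetical.

```python
from math import fsum


def optimum_allocation(strata, total_sample):
    """Neyman allocation: n_h is proportional to N_h * S_h.

    strata maps a stratum name to (N_h, S_h): the number of units on the
    frame and the standard deviation of employment within the stratum.
    """
    products = {name: n_units * std_dev
                for name, (n_units, std_dev) in strata.items()}
    total = fsum(products.values())
    return {name: total_sample * p / total for name, p in products.items()}


# Hypothetical size classes within one state and industry.
strata = {
    "small": (8000, 5.0),
    "medium": (1500, 40.0),
    "large": (500, 200.0),
}
allocation = optimum_allocation(strata, total_sample=1000)
for name, n_h in allocation.items():
    rate = n_h / strata[name][0]
    print(f"{name:7s} sample={n_h:6.1f} sampling rate={rate:.3f}")
```

In this example the large-employer stratum is sampled at a much higher rate than the small one, because its units are far more variable and so contribute more to the variance of the total.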

For the CES redesign, the number of sample units drawn for each state was fixed to the approximate size of the existing CES sample, the sample size supportable by current program resources. This sample size supports the publication of considerable industry and geographic detail within a state and provides for highly reliable national CES estimates at the total nonfarm and detailed industry levels.

The sampling frame and the CES sample are updated twice a year with new quarters of UI-based universe data. This helps to keep the sample up to date by adding business births and deleting business deaths. In addition, the new design specifies an annual update process that includes sample frame maintenance and the redrawing of the entire sample for the first quarter of each year. Frame maintenance provides for the updating of industry, size class, and metropolitan area designations and for the merging of semi-annual birth samples into the overall frame. A high degree of overlap is expected at each annual update because all UI accounts are ordered on the frame with permanent random numbers (PRN). This technique assigns random numbers to all UI accounts on the universe frame at the time they first appear and then orders the frame by PRN. The allocation for each sampling cell is fulfilled by working down the ordered PRN list until the full complement of needed units is drawn. Because the random numbers are permanent and thus remain in essentially the same order on the frame, this technique minimizes cancellation of existing sample units and the need to solicit replacement units.
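The permanent random number technique described above can be sketched for a single sampling cell as follows. This is a simplified illustration, not the BLS production procedure; the account identifiers, cell size, and counts of births and deaths are hypothetical.

```python
import random


def assign_prns(frame, account_ids, rng):
    """Assign a permanent random number to each account not yet on the frame."""
    for account in account_ids:
        if account not in frame:
            frame[account] = rng.random()


def draw_sample(frame, n_units):
    """Order the frame by PRN and take units from the top until n_units are drawn."""
    ordered = sorted(frame, key=frame.get)
    return set(ordered[:n_units])


rng = random.Random(2000)
frame = {}
assign_prns(frame, [f"UI{i:04d}" for i in range(200)], rng)
year1 = draw_sample(frame, 20)

# One year later: births join the frame with new PRNs, deaths are removed,
# and every surviving account keeps the PRN it was first given.
assign_prns(frame, [f"UI{i:04d}" for i in range(200, 220)], rng)
for dead in [f"UI{i:04d}" for i in range(10)]:
    del frame[dead]
year2 = draw_sample(frame, 20)

overlap = len(year1 & year2)
print(f"{overlap} of {len(year2)} units carried over from the previous draw")
```

Because surviving accounts keep their position in the PRN ordering, a unit leaves the sample only when it dies or when a birth with a lower PRN displaces it.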

Modifications to the research design

The new sample design, initially developed in a two-year research phase, was first tested under near-ideal conditions, using historical UI universe microdata files as a proxy for monthly collection of CES sample data from respondents. Thus, the research phase allowed BLS to define the best design and estimators from a purely methodological viewpoint.

The next phase in the redesign, a three-year production test, afforded the opportunity to evaluate the originally specified methodology in the actual operating environment of the CES survey, which includes sample nonresponse and short production timeframes. The production test was a valuable exercise in tempering an ideal statistical methodology with actual operating conditions and resource constraints. Several design modifications resulting from the production test are described below.

First, a 'swapping' procedure was developed for the annual sample redraw. This procedure increased the sample overlap from year to year, thereby reducing the new sample solicitation workload to conform to existing program resource constraints. The underlying principle of the swapping procedure is the retention of some of the previously selected sample units that were not selected in the redraw and that would otherwise have been dropped. These retained units are substituted or swapped with newly selected sample units from the redraw, thus reducing the overall amount of new unit solicitation associated with the annual sample update. The swapping procedure has a potential for introducing sample bias as it departs somewhat from purely random sampling, but simulations and evaluations done to date indicate that no bias has resulted. BLS will continue to study the issue.

As a second workload efficiency measure, the CES survey is retaining the largest current CES sample members (those units with one hundred or more employees), even in the cases where they were not selected for the probability-based sample. They are treated as certainty units, in other words assigned a weight of one, such that they represent only themselves in the estimates. This precludes them from introducing bias into the estimates. These units strengthen the CES estimation process by providing additional sample coverage without incurring solicitation costs.

Additionally, routines for subsampling of large multi-establishment UI accounts were developed. Subsampling is used only for cases where the reporting of a large number of worksites would be prohibitively expensive. Generally, this occurs if the selected firm has decentralized record keeping and if each of many worksites would have to be collected through a separate contact. Subsampling has been used only sparingly thus far; it has the potential to introduce additional variability into the estimates.

One final change to the original research design is adoption of a cluster sampling principle, in which worksites are the basic sample unit but are clustered by UI account. Originally, the UI account itself was selected as the basic sample unit. The advantage of the UI account as the sample unit was that it could implicitly provide a mechanism for the capture of worksite births and deaths within multi-establishment firms that are covered by a single UI account. However, capture of this information proved operationally infeasible during the production test, owing to a quick-turnaround CES monthly production environment that largely relies on respondent self-reporting.

After the full implementation of the sample redesign is completed in 2003, BLS plans to begin a regular program of ongoing sample rotation, primarily to reduce respondent burden and related sample attrition. The largest units in the sample (tentatively defined as those with employment of 1000 or more) will not be rotated. The exact rotation period, or length of time a unit is in the sample, will be determined based on solicitation and data collection cost constraints.

Estimation formulas

In order to support the new sample design, improved estimators also have been developed and tested for the CES survey. The primary difference from the current CES estimator is the application of a weight to each sample unit in the estimation process. The weights are derived from population sampling fractions and are a standard feature of probability sample estimators. A sampled unit's weight is the inverse of its probability of selection; for example, a sample unit selected from a cell where one in ten units is selected will have a weight of ten, because it represents itself and nine other units. Previously, CES used an unweighted ratio estimator, known as the link relative, for the all-employee estimates. The new estimator is defined as a weighted link relative. There will be no level shifts or series breaks in the all-employee series, because the redesign estimates, like the historical published series, are anchored once a year to the benchmark level derived from the UI universe count of employment.
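The article does not spell out the estimator's formula. A common form of a weighted link relative, assumed here, multiplies the previous month's estimate by the ratio of weighted current-month employment to weighted previous-month employment among units reporting in both months. The sample records and the starting level below are hypothetical.

```python
def selection_weight(units_selected, units_in_cell):
    """Weight is the inverse of the probability of selection."""
    return units_in_cell / units_selected


def weighted_link_relative(prev_estimate, matched_sample):
    """Move the previous estimate forward by the weighted sample trend.

    matched_sample holds (weight, employment_prev, employment_curr) for
    units that reported in both months.
    """
    curr = sum(w * e_curr for w, _, e_curr in matched_sample)
    prev = sum(w * e_prev for w, e_prev, _ in matched_sample)
    return prev_estimate * curr / prev


sample = [
    (selection_weight(1, 10), 20, 22),   # selected at one in ten: weight 10
    (selection_weight(1, 4), 100, 98),   # selected at one in four: weight 4
    (1.0, 1500, 1515),                   # certainty unit represents only itself
]
estimate = weighted_link_relative(prev_estimate=210_000, matched_sample=sample)
print(f"current-month estimate: {estimate:,.0f}")
```

Setting every weight to one reduces this to the unweighted link relative used under the old design, which shows why small units had too little influence in a sample dominated by large reporters.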

In order to prevent series breaks in the hours and earnings estimates, the initial implementation of the redesign also utilizes the weighted link relative estimator for these data types, and the first month's redesign estimate links to the final month's estimate produced under the old sample design. By utilizing the weighted link relative methodology for hours and earnings, the new sample design can be incorporated without subjecting these series to level shifts. BLS will implement any necessary level shifts in 2003, the final year of redesign implementation. In addition to introducing the final set of industries under a probability sample, 2003 also will feature the conversion from an SIC to a NAICS industry coding structure. Thus any series breaks, or level shifts, needed in the CES hours and earnings series from either the redesign or the coding structure changes will be incorporated simultaneously.

Business birth and death estimation

Regular updating of the CES sample frame with information from the UI universe files will help keep the CES survey current with respect to employment from business births and business deaths. The most timely UI universe files available, however, always will be a minimum of nine months out-of-date. The CES survey thus cannot rely on regular frame maintenance alone to provide estimates for business birth and death employment contributions. BLS has researched both sample-based and model-based approaches to measuring birth units that have not yet appeared on the UI universe frame. The research demonstrated that sampling for births was not feasible in the very short CES production timeframes. BLS is therefore utilizing a model-based approach for this component.

Exploratory research indicated that while both the business birth and death portions of total employment are generally significant, their net contribution is relatively small and stable. To account for this net birth/death portion of total employment, BLS has implemented an estimation procedure with two components.

The first component is incorporated into the sample-based link relative estimation procedure by simply not reflecting sample units going out of business, but imputing to them the same trend as the other firms in the sample. The second component uses ARIMA time series modeling to estimate the residual net business birth/death employment that is not accounted for by the first component. The historical time series used to create and test the ARIMA model was derived from the UI universe micro-level database and reflects the actual residual net of births and deaths over the past ten years.
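The way the two components combine can be sketched as follows. This is one reading of the procedure, not a BLS specification: units that have gone out of business are left out of the matched sample, so they implicitly receive the trend of the continuing units, and a model forecast of the residual is then added. The `model_residual` argument is a hypothetical stand-in for the ARIMA forecast, and all figures are invented for illustration.

```python
def estimate_with_net_birth_death(prev_estimate, reports, model_residual):
    """Apply the sample trend, then add the modeled net birth/death residual.

    reports holds (weight, employment_prev, employment_curr); employment_curr
    is None for a unit that has gone out of business.
    """
    # Component 1: drop business deaths from the link, which is equivalent
    # to imputing to them the same trend as the continuing sample units.
    continuing = [(w, p, c) for w, p, c in reports if c is not None]
    curr = sum(w * c for w, _, c in continuing)
    prev = sum(w * p for w, p, _ in continuing)
    sample_based = prev_estimate * curr / prev
    # Component 2: residual net birth/death employment from a time-series
    # model; it may be negative in some months.
    return sample_based + model_residual


reports = [
    (5.0, 40, 42),
    (5.0, 60, 60),
    (5.0, 30, None),  # went out of business this month
]
estimate = estimate_with_net_birth_death(50_000, reports, model_residual=120)
print(f"estimate including net birth/death: {estimate:,.0f}")
```

The imputed growth on the dead unit offsets part of the employment in births that the sample cannot yet see, which is why only a comparatively small residual is left for the model.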

The most significant potential drawback to this (or any) model-based approach is that time-series modeling assumes a predictable continuation of historical patterns and relationships. Therefore, it is likely to have some difficulty producing reliable estimates at economic turning points or during periods when there are sudden changes in trend. BLS will continue researching alternative model-based techniques for the net birth/death component; it is likely to remain the most problematic part of the estimation process.

The net birth/death models replace the bias adjustment modeling used in the CES program as each major industry division is phased into publication. The ARIMA model component will be updated and reviewed on a quarterly basis, as are the current bias adjustments. However, the net birth/death model component figures will be unique to each month, whereas the bias adjustments are identical for all three months of a given quarter. The net birth/death model components exhibit a seasonal pattern that can result in negative adjustments in some months.

An important conceptual and empirical distinction between bias adjustment models and the new net birth/death models is what they model. Although the primary purpose of the bias adjustment process is to account for new business birth employment, it also adjusts for other elements of non-sampling error, or bias, in the CES estimates, as the primary input to the model is total estimation error. Sample bias can be significant in the old sample because of its quota design, and therefore the bias adjustment component is relatively large. In contrast, the net birth/death models estimate only the residual component not measurable by the sample; they do not attempt to correct for deficiencies in sample design. Therefore, the net birth/death model component in the redesign series is expected to be significantly smaller overall than the bias adjustment component in the historical CES estimates.


Annual benchmark adjustments that revise two years of data will continue under the redesign, but there will be some changes to the process. Historically, when national series were benchmarked, sample links derived from the final (or third) set of monthly estimates were applied to the March benchmark level to re-estimate one year forward from the new benchmark levels. The year prior to the benchmark was adjusted by a simple wedge-back procedure that distributes the benchmark error in equal increments across the year preceding the March benchmark.
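The wedge-back procedure can be illustrated with a short sketch. It assumes twelve monthly estimates running from April through the March benchmark month and a linear distribution of the benchmark error, so that April receives one-twelfth of the error and March receives all of it. The employment levels are hypothetical.

```python
def wedge_back(estimates, benchmark_level):
    """Spread the benchmark error in equal increments across the year.

    estimates holds the monthly levels for the year ending in the March
    benchmark month; the last element is the March estimate.
    """
    error = benchmark_level - estimates[-1]
    n_months = len(estimates)
    return [level + error * (i + 1) / n_months
            for i, level in enumerate(estimates)]


# Hypothetical April-through-March levels (thousands of jobs).
estimates = [1000 + 5 * i for i in range(12)]
revised = wedge_back(estimates, benchmark_level=1175)
for before, after in zip(estimates, revised):
    print(f"{before:6d} -> {after:8.1f}")
```

In this example the March estimate of 1,055 is 120 below the benchmark, so each successive month is raised by a further 10 and the revised March level equals the benchmark.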

For initial implementation of redesign estimates for each major industry division, both the year prior to and the year following the March benchmark month are revised to incorporate sample-based estimates calculated from the new sample and estimators. Thus, there will be more revision in the benchmark period than experienced previously for all data types. In particular, basic cell-level hours and earnings estimates, which had no benchmark revision under the previous procedures, are subject to change.

Major Division                         National     State and Area
Wholesale Trade                        June 2000    March 2001
Manufacturing                          June 2001    March 2002
Mining                                 June 2001    March 2002
Construction                           June 2001    March 2002
Transportation and Public Utilities    June 2002    June 2003
Finance, Insurance and Real Estate     June 2002    June 2003
Retail Trade                           June 2002    June 2003
Services                               June 2003    June 2003

COPYRIGHT 2000 The National Association for Business Economists
No portion of this article can be reproduced without the express written permission from the copyright holder.
Copyright 2000 Gale, Cengage Learning. All rights reserved.

Article Details
Author:Getz, Patricia M.
Publication:Business Economics
Geographic Code:1USA
Date:Oct 1, 2000