Developing statistics to meet society's needs.

Three historical illustrations show how Government agencies adapt to changing social and economic needs by developing new concepts and methods

The development of statistics in the United States has been very much stimulated by the country's need for knowledge about its people, its economy, and the conditions of life. Beginning with the counting of the population as required by the Constitution, government data collection has expanded to cover employment, agriculture, industrial production, prices, earnings, consumption, health conditions, and a variety of other important areas. As the statistical system developed, data collection techniques became standardized and scientific sampling and estimation procedures were developed.

Although the history of this methodological progress is well known, it is surprising that so little attention has been paid to the development of the concepts and definitions that frame the issues and give substance to the results of statistical series. This is especially true when social and economic phenomena are measured, because definitions in these areas tend to change with society's view of the issue.

A statistical system, if it is to remain relevant, must build on the past but also must be prepared for change. Of course, there also must be order in the system for useful statistics to be developed; without consensus on what to measure and on the definitions and classifications involved, statistical knowledge cannot be developed.

Imagine the confusion if analysts were to compare statistics on the textile industry and some surveys included knitting mills while others restricted the information to weaving mills; or, if it had not been decided whether trucking firms that deliver textiles were part of the industry or separate from it, or whether the manufacture of machinery for textile production should be included as part of the industry. Ambiguities such as these led to the establishment of the Standard Industrial Classification system.

Even what would appear to be the simple counting of the people in the country has required the development of definitions and categories that are accepted as relevant to the characteristics of the population at the time of data collection. The earliest U.S. censuses enumerated slaves and free men. Slavery was abolished, but concerns about racial characteristics continued, and the categories for which counts would be made reflected those concerns. Later, the large waves of immigration that took place in the 19th and 20th centuries highlighted the need for additional racial and ethnic classifications.

As congressional legislation required the collection of information on conditions of work, and more particularly on the earnings of working men and women in the United States, further refinement of concepts occurred in that area. The point is that the phenomena underlying government statistics keep changing, the country's view of the concepts underlying data also changes, and those responsible for the measurement of these phenomena in official statistical series need to take account of the changes in the definitions used in the conduct of surveys.

As conditions in society have changed, new information needs have emerged, and new classification schemes and innovative approaches to the conceptual framework and the definitions within it have been developed and modified to meet those needs. This article discusses three examples of the conceptual contributions of Federal agencies to statistical development.

Employment by industry

National information on employment by detailed industry dates back to the 1899 Census of Manufactures, although the Bureau of Labor Statistics had conducted a number of special surveys in particular areas and industries in the 1880's. Earlier population censuses, such as the one in 1810, made broad distinctions among agriculture, commerce, and manufacturing. By 1910, the population census obtained information on the occupation and industry of every working person. The instructions to the enumerators noted that "the occupation, if any, followed by a child of any age, or by a woman is just as important for census purposes as the occupation followed by a man." The interviewers were further instructed to "describe the branch of industry, the kind of business or establishment, line of work or place in which this person works, as cotton mill, general farm, dry goods store, insurance office, bank."(1)

Some individual States began compiling intercensal employment estimates early in the 20th century, but these data were largely restricted to those industries dominant in each State's economy.(2)

The Bureau of Labor Statistics introduced publication of the Monthly Labor Review in 1915 and included employment statistics for about a dozen countries in the statistical section. Recognizing that the information for Great Britain, Germany, and France was superior to that for the United States, BLS began a program to collect and publish industry employment data. Beginning with four industries--boots and shoes, cotton goods, cotton finishing, and hosiery and underwear--the program was the forerunner of today's Current Employment Statistics Program, a Federal-State cooperative venture that covers all nonfarm establishments.

The depression of 1920-21 focused attention on the need for timely industry employment data, and funds were provided by the Congress to expand the survey. By 1923, the survey covered 52 industries grouped into 12 major categories, one of the first examples of industry classification.(3)

In the 1930's, several Federal agencies had their own systems of industrial classification, including the Bureau of the Census, the Internal Revenue Service, and the Social Security Board and its affiliated State Employment Security Agencies. However, the data system was fragmented and comparisons were difficult. To meet the need for a general industrial classification system for all Federal statistical agencies, developmental work was begun under the auspices of the U.S. Central Statistical Board, the predecessor of the Statistical Policy Office of the Office of Management and Budget (OMB). The first Standard Industrial Classification (SIC) Manual was issued in 1939.

The SIC manual has been revised several times, to reflect changes in the economy and in the consensus of how best to organize the information. For example, views have changed back and forth on the proper classification of government activities--either according to the particular function, such as education or health services, or separately as its own industry. Other issues have included the treatment of separate administrative offices, the type of organization (corporate, sole proprietor, for profit/not for profit), character of the work force, and use of technology.

The basic principle of the SIC system is that establishments are classified by type of economic activity. But under that umbrella come several different approaches. In most cases, the dominating factor is product or activity, but, in some instances, end use, nature of raw materials, or market structure may play a role. Thus, one can have the anomaly of one industry producing what seems to be several different products, while what appears to be a single product may be produced in several different industries. For example, SIC 3651--Household Audio and Video Equipment--consists of establishments that manufacture not only VCR's and clock radios for consumer use, but also juke boxes and loud speakers for public address systems. On the other hand, a simple product, chairs, may be produced in one of six different industries depending on whether the chair is wood or metal, upholstered or not, produced for home or for office use. Establishments that produce chairs that convert into beds would be classified in still another industry.

The latest SIC manual, the 1987 revision, is just now being introduced into the Federal statistical system, but discussions continue on many issues. Is the establishment still the best unit of measurement? Should the process of production carry more weight than the output? How do you best classify firms with many products or services? What is the nature of output in the service sector?

It is important to recognize, of course, that once a classification system has been set in place, change is often difficult to achieve. A tradeoff must be made between relevance to new conditions and continuity of time series analysis. Furthermore, the development of historical revisions or overlapping series can be very costly. The SIC has, over the years, provided the consistency and uniformity required for an organized system of Federal statistics. Nonetheless, as the statistical system comes to grips with changes in the economic system that have caused the bulk of its employment and a large part of its output to move to the service-producing sector, the need for a thorough review of the basic theory of the SIC and of the concepts underlying it has become increasingly apparent, and some work has begun in this direction.

Race and ethnicity

One important classification with a long history revolves around race and ethnicity. The subject is also one of considerable sensitivity because the availability of data for a particular demographic group may determine fund allocation or program development.

While at least a partial identification of whites and blacks goes back to the first population census, the underlying concepts and the salient aspects have changed markedly. For example, in the 1890 census, separate information for quadroons and octoroons--persons with one-quarter or one-eighth black parentage--was collected, while in 1930, any mixture of white and some other race was to be reported according to the race of the parent who was not white.

We often behave as though there were a uniform scientific basis for the racial definitions, yet the categories have changed markedly over the years, as has our understanding of them. In 1870, the census form instructed, "Be particularly careful in reporting the class Mulatto. The word here is generic, and includes quadroons, octoroons, and all persons having any perceptible trace of African blood. Important scientific results depend on the correct determination of this class...." A hundred years later, the Statistical Policy Division of OMB, in issuing Race and Ethnic Standards for Federal Statistics and Administrative Reporting, noted that "these classifications should not be interpreted as being scientific or anthropological in nature."(4) Similarly, a BLS-prepared Directory of Data Sources on Racial and Ethnic Minorities noted that "the concept of race as used in these data sources does not denote clear-cut scientific definitions of biological stock. Rather it reflects self-identification by respondents or determination of race by an interviewer."(5)

The issue of self-identification versus interviewer determination is an interesting one. In the early years of the census, the determination was always by observation. In the biographical novel, Sally Hemings, Barbara Chase-Riboud describes the 1830 visit of a census enumerator to the home of Sally Hemings, a former slave, widely believed to have been the mistress of Thomas Jefferson. The census taker "opened to a new page in his ledger. If Sally Hemings was who and what people said she was, then Thomas Jefferson had broken the law of Virginia.... He hesitated for a moment and then wrote: Sally Hemings, Female, between 50 and 60, Without occupation, Race: White."

The practice of racial classification by the interviewer rather than the respondent was carried over into the Current Population Survey (CPS) both for operational and conceptual reasons. Operationally, the fear was that in some face-to-face situations the asking of a person's race might be considered so offensive as to damage further respondent cooperation in the survey. Also, because a major objective was to obtain information on the number of persons in the study population who might be subject to discrimination because of the community's perception of their racial or ethnic heritage, the observation of the interviewer was thought to be a good proxy for community opinion. In the 1970 population census, data collection changed from being done exclusively or largely by personal visit to mail. This, of course, precluded determination by observation, and questions for self-identification were developed.(6) At the same time, rising consciousness among various segments of our society led to a strong demand for statistics based on self-identification. Thus, in 1978, the collection procedures in the CPS were officially changed to self-identification.

In the CPS, tabulation and publication of information separately for whites and all others began in 1948 but, without separate monthly population estimates, only rates and percentages were shown. In 1954, with the introduction of procedures to make monthly population estimates by race, absolute numbers were published for the first time. The nonwhite category--including blacks and other minorities--was used as a proxy for the labor market situation for what were then called Negroes. In the 1960's, it became clear that significant differences existed in labor market experiences within the overall nonwhite category, and the possibility of tabulating data separately for "Negroes" was explored. Procedures were developed to do this, and, beginning in 1972, data became available monthly for blacks as a separate group.

In the last two decades, rising interest in the extent of Hispanic immigration and the socio-economic conditions of this group has led to a desire for separate data on persons of Hispanic origin. Yet, there was considerable difficulty in developing an appropriate method of classification. The ethnic identifier with the longest history of use in household surveys is the birthplace of the individual or his or her parents. Obviously, this only identifies first- and second-generation Americans.

Other identifiers that have been used are Spanish surname, mother tongue, and Hispanic origin. A list of Spanish surnames was developed for use in the five Southwestern States with large concentrations of Mexican-Americans, many of whose ancestors had settled in the area centuries earlier and could not be identified by country of birth. The list of surnames was not useful elsewhere in the country because many of the names on the list are also common among persons of Italian, Portuguese, and other Latin but non-Hispanic origin.

Mother tongue--the language spoken at home during childhood--has also been used as an identifier. It also tends to be most successful for first- and second-generation Americans.

For the 1970 population census, a "Spanish heritage" definition was adopted which combined these various identifiers:

(1) Spanish surname or Spanish mother tongue for the five Southwestern States (Arizona, California, Colorado, New Mexico, and Texas);

(2) Puerto Rican birth or parentage in New York, New Jersey, and Pennsylvania; and

(3) Spanish mother tongue in the remaining States.
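The three-part rule above can be sketched as a small function, which makes the source of the confusion easier to see: the same person could be classified differently depending only on State of residence. This is a minimal illustration, not the Census Bureau's actual procedure; the function name, parameter names, and the use of simple true/false inputs are assumptions made for the example.

```python
# Hypothetical sketch of the 1970 "Spanish heritage" rule as described above.
# All names here are illustrative; they are not taken from census documentation.

SOUTHWESTERN_STATES = {"Arizona", "California", "Colorado", "New Mexico", "Texas"}
PUERTO_RICAN_RULE_STATES = {"New York", "New Jersey", "Pennsylvania"}


def spanish_heritage_1970(state, spanish_surname=False,
                          spanish_mother_tongue=False,
                          puerto_rican_birth_or_parentage=False):
    """Return True if a person falls under the definition for the given State."""
    if state in SOUTHWESTERN_STATES:
        # Rule (1): Spanish surname or Spanish mother tongue.
        return spanish_surname or spanish_mother_tongue
    if state in PUERTO_RICAN_RULE_STATES:
        # Rule (2): Puerto Rican birth or parentage.
        return puerto_rican_birth_or_parentage
    # Rule (3): Spanish mother tongue in the remaining States.
    return spanish_mother_tongue


# The same hypothetical person, identified only by surname, in two States.
in_texas = spanish_heritage_1970("Texas", spanish_surname=True)
in_ohio = spanish_heritage_1970("Ohio", spanish_surname=True)
```

In this sketch a person with a Spanish surname but no Spanish mother tongue is counted in Texas and not in Ohio, which is the kind of inconsistency that motivated a single origin question.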

The confusion and difficulty of using such mixed procedures led to efforts to develop a single, specific question to obtain Hispanic origin. This approach is now used in both the population census and the Current Population Survey. In the CPS, the respondent is asked the origin or descent of each member of the household while being shown a flashcard with such entries as German, Irish, Polish, Mexican, Puerto Rican, Cuban. The CPS interviewers' manual states that "origin or descent refers to the national or cultural group from which a person is descended and is determined by the nationality or lineage of a person's ancestors. There is no set rule as to how many generations are to be taken into account in determining origin."

Some of the issues we have faced in trying to develop appropriate classifications for race and ethnicity have also been faced in other countries. For example, in Great Britain where the evolution into a multiracial society is relatively recent, and historically there had been little large-scale immigration, the measurement of race and ethnicity has been problematic. In the 1950's and 1960's, questions on country of birth could be used to infer race/ethnicity, and a question on parents' country of birth was added in 1971 to identify the second generation. With recognition that this approach would not last another generation, work was begun on the development of a question on national or ethnic origin. The 1991 British census will probably have such a question--most likely with seven categories: white, black, Indian, Pakistani, Bangladeshi, Chinese, and other. But, there is concern about possible respondent objection, and discussion about the appropriate groups to identify continues.(7)

Wages and earnings

In the first 50 years after the American Statistical Association was established, occasional attempts were made to develop statistics on the social and economic status of American workers through wage surveys. Then, as now, however, the underlying concepts, purposes, and definitions were complex and sometimes difficult to understand. Even a century ago, survey programs had to meet more than one objective. In fact, about 100 years ago, a State Commissioner of Labor Statistics, in the first annual report of his agency, wrote:

Investigation about wages may have several distinct objects. One is, to find the rate of money wages actually paid. Another is, to compare it with the necessary expenses of living. A third is, to compare the laborer's share of the product with that of the capitalist's. A fourth question, perhaps most important of all, is to find in what direction things are moving.(8)

The early attempts collected information on wage rates--either per hour or per year--for different demographic groups--men, women, and children. As early as 1875, the collection of wage statistics was attempted in a State population census. Interestingly, in the State of Massachusetts, experiments were tried to collect wages from two different sources: from employers and from the workers themselves. Data collected from employers--$580 a year on average for males--was considerably higher than the data collected directly from workers--only $482.(9)

The feeling at the time was that both sources of reports might contain bias. The employer paying high wages is proud of that fact, it was thought, and would be happy to report this good treatment, whereas the low-wage employer would prefer to conceal the facts from the data collectors. On the workers' side, the bias could be upward or downward. A worker willing to report was generally thought to be a person of greater than average intelligence--and, therefore, someone likely to be earning a higher salary. On the other hand, a worker reporting his earnings never believes that they are adequate and might well underreport them.(10)

While modern society requires that employers maintain accurate records, our efforts to collect data directly from individuals still may suffer from some of these conditions. Studies have found that earnings collected from CPS households, for example, generally are slightly lower than those collected from business records. In addition, definitions have become more complex, and recall more difficult. Many people remember take-home pay--not the overall rate of pay before deductions for Social Security and income taxes, health insurance, and the employee's share of the cost of employer-provided benefits. The statistical community is making efforts to improve the questions asked in household surveys because this source is essential for understanding individual earnings in a family context.

The problem of developing averages and interpreting their meaning was also an issue that was discussed a century ago. Carroll Wright, the first Commissioner of BLS, wrote in the first report of his new Federal bureau in 1886: "A casual examination of these summaries will show that any attempt to prove an American rate of wages must necessarily result in failure. There is no such thing as an American rate of wages."(11)

Even then, it was clear that a way had to be found to separate wages by occupation and by hours of work if the data were to be meaningful for analytical purposes. In those early days, the Nation's railroads hired temporary workers, many of whom did not work full time. In discussing the question of the meaning of aggregate wages with his State colleagues, Mr. Wright expressed the view that it was very easy to collect two simple facts from the railroads--the aggregate wages paid and the total number of workers employed at a given time. Division of one number by the other produced, according to Wright, "a vicious quotient" to represent the average earnings of all railroad workers. This general average could be quite misleading, he maintained, and he insisted that those involved with data collection must find a way to "individualize" the account so that the actual earnings of each worker would be properly reported.(12)
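A small worked example may help show why Wright distrusted the quotient. The figures below are entirely hypothetical and are chosen only to make the arithmetic easy to follow; they are not drawn from any railroad's records. The sketch divides aggregate wages by the number of workers, then "individualizes" the account by looking at each worker's weeks of work.

```python
# Hypothetical payroll records: (worker, weeks worked, total earnings in dollars).
# These numbers are invented for illustration only.
payroll = [
    ("full-time A", 52, 520.0),
    ("full-time B", 52, 520.0),
    ("temporary C", 13, 104.0),
    ("temporary D", 13, 104.0),
]

# The two "simple facts" that were easy to collect.
aggregate_wages = sum(earnings for _, _, earnings in payroll)
workers_employed = len(payroll)

# The quotient: one number standing in for every worker.
quotient = aggregate_wages / workers_employed

# The individualized account: weekly earnings for each worker.
weekly_earnings = {name: earnings / weeks for name, weeks, earnings in payroll}

print(quotient)         # 312.0
print(weekly_earnings)  # full-time workers 10.0 a week, temporary workers 8.0
```

In this illustration no worker earned anything close to the $312 quotient: the full-time workers earned $520 and the temporary workers $104. The general average mixes differences in rates of pay with differences in time worked, which is why separate reporting by hours of work was thought necessary.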

From these beginnings, two types of wage and earnings statistics have evolved. The effort has involved both the collection of average earnings for business establishments and the study of occupational wages by industry and by geographic area.

The early requests for data often involved "rate of wages paid in different States of the Union ... for instance, for puddlers in New York or carpenters in Ohio."(13) These surveys, generally of straight-time hourly wage rates, have been collected for a changing group of occupations and industries ever since. Over the years, the surveys have been expanded to cover salary rates as well as wage rates of pay and to provide information on the structure of rates by region and locality, industry, union status, and sex.

The other early source of earnings statistics was from the monthly survey of establishments' employment and payroll. While this survey began in 1915, only payroll totals were available until 1933, when average hourly earnings and average weekly earnings were published for the first time. At about the same time, legislation was passed to establish the payroll survey as a Federal--State Cooperative program, enabling it to expand in size to its current position as the largest monthly establishment survey. This survey was an excellent vehicle for collection of aggregate wage data as well as payroll employment information at the detailed industry level, making its average hourly and weekly earnings series quite popular for general analytical purposes.

These data have been especially useful during recent decades, which have included periods of recession and expansion as well as years of very high inflation and concerns about the trend of unit labor costs. The average earnings series, while affected by problems of shifting mix--of changes in full-time and part-time workers as well as shifts in occupations and earnings--proved useful in gauging overall trends in the economy.

During the early 1970's, Federal Government efforts at wage and price controls highlighted the need for a general wage index based on occupational wage surveys of employers that would include the increasingly important supplements to wages, cover the entire economy, and be free from shifts of employment among occupations and industries. The Employment Cost Index (ECI), currently the best indicator of wage trends, was designed to cover all costs of workers' compensation--wages, salaries, and employer costs for workers' benefits. The ECI, like the Consumer Price Index, has a market basket with base-period weights; the ECI uses fixed employment weights by occupation and by industry. It has developed in stages to its current profile of more than 100 published series, including occupations, industries, geographic regions, and union status.
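The difference between an average earnings series and a fixed-weight index can be illustrated with a short sketch. It assumes a simple base-weighted (Laspeyres-type) form, which is consistent with the description of fixed employment weights above but is not presented as the official ECI formula; the occupation names, wage levels, and employment counts are hypothetical.

```python
def average_hourly_earnings(wages, employment):
    """Payroll divided by employment, so it moves when the job mix moves."""
    total_pay = sum(wages[job] * employment[job] for job in wages)
    return total_pay / sum(employment.values())


def fixed_weight_index(base_wages, current_wages, base_employment):
    """Cost of the base-period employment mix at current wages, base = 100."""
    base_cost = sum(base_wages[job] * base_employment[job] for job in base_wages)
    current_cost = sum(current_wages[job] * base_employment[job] for job in base_wages)
    return 100.0 * current_cost / base_cost


# Hypothetical data: wage rates do not change, but employment shifts
# toward the higher paid occupation.
wages = {"clerical": 8.0, "technical": 16.0}
employment_base = {"clerical": 50, "technical": 50}
employment_now = {"clerical": 25, "technical": 75}

avg_base = average_hourly_earnings(wages, employment_base)   # 12.0
avg_now = average_hourly_earnings(wages, employment_now)     # 14.0
index_now = fixed_weight_index(wages, wages, employment_base)  # 100.0
```

Here average hourly earnings rise from $12 to $14 although no wage rate changed, while the fixed-weight index stays at 100. That is the sense in which such an index is free from shifts of employment among occupations and industries.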

Discussion continues on such issues as the treatment of lump-sum and other nonrecurring payments, and the value of noncash payments such as health insurance, retirement contributions, and child care benefits. It is clear that the classification system in the wage area will continue to undergo further development.

Where we are

This article has focused on three examples which illustrate different aspects of the evolution of content in Federal statistics. The first, the system of industry classification, introduced order and relationship into survey design so that statistical data could be defined more precisely, presented more intelligently, and analyzed in a more meaningful fashion. Although a number of revisions and additions to the Standard Industrial Classification system have taken place, the system has promoted stability in data relationships over a long period of time. The industrial restructuring that has taken place, especially over the last few decades, and the challenges of new technology suggest that it may be time for a comprehensive reexamination of the concepts underlying the SIC structure and a modernization of the entire system.

The review of the definitions of race and ethnicity shows the evolution that occurred in collecting and processing these demographic data; it also demonstrates the use of innovative approaches to deal with societal change within the survey process. These issues remain with us. As the country's ethnic composition and the situation of our minority citizens change, our information data base must be kept relevant.

The final example deals with the historical development of an economic concept, clearly one of the most difficult of all the issues with which the survey statistician must deal. Compensation, which can be looked at as a cost to the employer as well as a benefit to the worker, has been measured in one form or another for more than a century, and studies on the issues are still going on. This example is intended to show how a clear understanding of the underlying concept is essential for the collection of meaningful data. The statistical system will need to give far more attention in the future than it has in the past to the identification and delineation of the concepts which underlie our data collection. Indeed, this area is one of the most important elements of nonsampling error that must be dealt with by the statistical system.

As we look to the future, we see emerging issues of economic growth, income distribution, potential labor shortages, illness, pollution, and a whole host of other important topics. Will the progress made in the three areas discussed here be sufficient to carry us into the year 2000 and beyond? Probably not. But we have seen from this brief review that the changing views of society force changes in survey concepts and definitions so that the Nation's data base can keep up with society's needs. We know that changes will occur in the future, and we believe that the statistical community will continue to be responsive to the need of our country for information that remains relevant to the critical issues of our time.

Footnotes

(1) Bureau of the Census, Twenty Censuses: Population and Housing Questions, 1790-1980 (Washington, Government Printing Office, 1979), pp. 43-44.

(2) "Thumbnail Sketches of BLS Statistical Series," Bureau of Labor Statistics, unpublished, Apr. 2, 1964.

(3) Ibid.

(4) Katherine K. Wallman and John Hodgdon, "Race and Ethnic Standards for Federal Statistics and Administrative Reporting," Statistical Reporter, 1977, pp. 450-54.

(5) Bureau of Labor Statistics, Directory of Data Sources on Racial and Ethnic Minorities, Bulletin 1879 (Washington, Government Printing Office, 1975).

(6) For an excellent discussion of the development of these questions, see Elizabeth Martin, Theresa DeMaio, and Pamela Campanelli, "Context Effects for Census Measures on Race and Hispanic Origin," Proceedings of the American Statistical Association Annual Meetings, 1988.

(7) Martin Bulmer, "A Controversial Census Topic: Race and Ethnicity in the British Census," Journal of Official Statistics, Vol. 2, No. 4, 1986, pp. 471-80.

(8) Connecticut Bureau of Labor Statistics, First Annual Report (Hartford, CT, Case, Lockwood & Brainard Co., 1885).

(9) Ibid.

(10) Ibid.

(11) U.S. Commissioner of Labor, First Annual Report, Industrial Depressions (Washington, Government Printing Office, 1886), p. 142.

(12) National Convention of Chiefs and Commissioners of the Various Bureaus of Statistics of Labor in the United States, Proceedings, 1889, p. 20.

(13) "Thumbnail Sketches."

Janet L. Norwood is Commissioner of Labor Statistics. Deborah P. Klein is an economist in the Office of the Commissioner. The article is drawn from "The Changing Focus of Government Statistics: A Historical Perspective," an invited paper prepared by the authors for the Sesquicentennial Program of the American Statistical Association. Summaries of other BLS papers presented at the 1989 ASA conference appear on pages 29-33.
COPYRIGHT 1989 U.S. Bureau of Labor Statistics
No portion of this article can be reproduced without the express written permission from the copyright holder.

Author:Norwood, Janet L.; Klein, Deborah P.
Publication:Monthly Labor Review
Date:Oct 1, 1989