
Building and validating an administrative records database for the United States.

The Statistical Administrative Records System (STARS 1999) is a prototype database of addresses and person records built from national administrative records sources. The impetus for creating STARS arose from an administrative records census experiment conducted during the 2000 Census. The needs of the experiment shaped the design of STARS, which had to be census-like to enable evaluation of the experiment. STARS consists of address records linked to the appropriate person records, which include race, Hispanic origin, sex, and age.

The administrative records that make up STARS are:

* Internal Revenue Service Individual Master file of tax returns

* Internal Revenue Service Information Returns Master file of reported income and other information

* Medicare enrollment database

* Department of Housing and Urban Development public housing assistance file

* Selective Service System registration file of potential young male draft candidates

* Indian Health Service patient file

These files were selected to maximize coverage of the United States population, and to facilitate the integration of these data. Most people file an income tax return for themselves and their dependents, or have wage or interest income reported to the government. Because there are segments of the population that do not file tax returns or that have unreported income, we attempt to fill in coverage gaps with targeted files. The Medicare file includes data for much of the elderly population. The Selective Service file provides nearly complete coverage of young males, who are required to register should a military draft be necessary. The Indian Health Service file includes many American Indians, and the public housing assistance file targets the poor population.

There are several challenges in integrating these large data sources, which, combined, total about 800 million records. Many people are represented in multiple files. For example, a young American Indian male might file a tax return, be registered for Selective Service, and have a record with the Indian Health Service. To facilitate integration, each person record in the administrative files includes the Social Security number, a unique personal identifier that allows us to avoid duplication. We compare each Social Security number to the administrative master list of numbers from the Social Security Administration, and remove invalid person records, such as those of the deceased, foreigners who filed taxes in the United States, and individuals with falsified Social Security numbers. To preserve privacy and confidentiality, in accordance with Census Bureau policy, the final STARS database does not include Social Security numbers or names.
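The unduplication and validation steps in this paragraph can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the Census Bureau's actual procedure: the record layout, the field names `ssn` and `source`, the `valid_ssns` set standing in for the Social Security Administration master list, and all sample values are hypothetical.

```python
from __future__ import annotations

from typing import Iterable


def unduplicate(records: Iterable[dict], valid_ssns: set[str]) -> dict[str, list[dict]]:
    """Group source records by SSN, dropping SSNs that are not on the master list."""
    persons: dict[str, list[dict]] = {}
    for rec in records:
        ssn = rec["ssn"]
        if ssn not in valid_ssns:
            # Hypothetical stand-in for removing invalid person records.
            continue
        # All records sharing an SSN collapse into one person entry.
        persons.setdefault(ssn, []).append(rec)
    return persons


# Made-up sample records; one person appears in three source files.
records = [
    {"ssn": "000-00-0001", "source": "tax return"},
    {"ssn": "000-00-0001", "source": "Selective Service"},
    {"ssn": "000-00-0001", "source": "Indian Health Service"},
    {"ssn": "000-00-0002", "source": "Medicare"},
    {"ssn": "000-00-0003", "source": "tax return"},  # not on the master list
]
valid_ssns = {"000-00-0001", "000-00-0002"}

persons = unduplicate(records, valid_ssns)
print(len(persons), "distinct valid persons")
```

In this toy input, five source records reduce to two person entries: the three records that share one identifier become a single person, and the record whose identifier is missing from the master list is dropped.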

We select only administrative records files that include an address for each person record, which allows us to allocate people to census blocks and expand the applicability of the STARS database. For example, independent statistics based on administrative data can be computed for census blocks or higher levels of geographic aggregation. However, a given person may have varying addresses across files. For example, someone might move after filing his taxes and seek health services under Medicare at his new address. We resolve multiple addresses with a complex algorithm that generally uses address quality and timeliness to determine a single address for that person record in STARS.
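The article does not spell out the address-resolution algorithm, saying only that it generally relies on address quality and timeliness. The sketch below shows one simple rule of that kind, assuming a hypothetical numeric `quality` score and an `as_of` date on each candidate address: prefer the highest quality, and break ties by the most recent date. The field names, scores, and sample addresses are invented for illustration, and the real algorithm is described as considerably more complex.

```python
from __future__ import annotations

from datetime import date


def select_address(candidates: list[dict]) -> dict:
    """Pick one address: highest quality first, most recent date as tie-breaker."""
    # Tuples compare element by element, so quality dominates and date breaks ties.
    return max(candidates, key=lambda a: (a["quality"], a["as_of"]))


# Made-up candidate addresses for one person across three source files.
candidates = [
    {"address": "12 Old Rd", "source": "tax return", "quality": 2, "as_of": date(1999, 4, 15)},
    {"address": "34 New St", "source": "Medicare", "quality": 2, "as_of": date(1999, 9, 1)},
    {"address": "PO Box 5", "source": "information return", "quality": 1, "as_of": date(1999, 12, 1)},
]

best = select_address(candidates)
print(best["address"])
```

Here the two higher-quality addresses tie on quality, so the more recent one is chosen; the lowest-quality address loses even though it is the newest. Whether quality should outrank timeliness is a design choice of this sketch, not something the source states.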

The source files that comprise the prototype STARS 1999 are generally of a vintage that precedes Census 2000 by about 15 months. This precludes validating STARS using absolute numbers from the census. For example, STARS 1999 has about 257 million person records, compared to about 284 million for Census 2000. It is more meaningful to compare relative distributions of the population by race, Hispanic origin, age and sex. The tables below show that STARS 1999 does reasonably well at getting the correct demographic distribution of the population.

An updated version of STARS using more recent files along with several other improvements is currently under construction. STARS 2000 includes additional files to increase coverage of the population in public housing and to obtain more complete reporting of mortality. These and myriad other improvements will make STARS 2000 a more complete and accurate representation of the population of the United States.

Age comparisons at the national level (%)

              0-17 years old   18-29 years old   30-49 years old   50+ years old
Census 2000        26.0             16.6              30.3             27.1
StARS 1999         22.6             16.5              31.5             29.4

Race comparisons at the national level (%)

              White or Other   Black   American Indian   Asian or Pacific Islander
Census 2000        81.5         12.7         1.4                   4.4
StARS 1999         83.1         11.9         0.9                   4.1

Hispanic origin comparisons at the national level (%)

              Hispanic   Not Hispanic
Census 2000     12.5         87.5
StARS 1999      10.9         89.1

Sex comparisons at the national level (%)

              Male   Female
Census 2000   49.1    50.9
StARS 1999    49.3    50.7
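
One way to summarize the four comparisons above is to compute, for each characteristic, the percentage-point difference per category and half the sum of the absolute differences (the index of dissimilarity). This summary measure is an addition for illustration and is not used by the authors; the figures are taken directly from the tables, while the dictionary layout and function names are assumptions of the sketch.

```python
from __future__ import annotations

# Percent distributions copied from the four tables above.
census = {
    "age": {"0-17": 26.0, "18-29": 16.6, "30-49": 30.3, "50+": 27.1},
    "race": {"White or Other": 81.5, "Black": 12.7,
             "American Indian": 1.4, "Asian or Pacific Islander": 4.4},
    "hispanic origin": {"Hispanic": 12.5, "Not Hispanic": 87.5},
    "sex": {"Male": 49.1, "Female": 50.9},
}
stars = {
    "age": {"0-17": 22.6, "18-29": 16.5, "30-49": 31.5, "50+": 29.4},
    "race": {"White or Other": 83.1, "Black": 11.9,
             "American Indian": 0.9, "Asian or Pacific Islander": 4.1},
    "hispanic origin": {"Hispanic": 10.9, "Not Hispanic": 89.1},
    "sex": {"Male": 49.3, "Female": 50.7},
}


def differences(a: dict[str, float], b: dict[str, float]) -> dict[str, float]:
    """Percentage-point difference b - a for each category."""
    return {k: round(b[k] - a[k], 1) for k in a}


def dissimilarity(a: dict[str, float], b: dict[str, float]) -> float:
    """Half the sum of absolute differences, in percentage points."""
    return round(0.5 * sum(abs(b[k] - a[k]) for k in a), 1)


summary = {name: dissimilarity(census[name], stars[name]) for name in census}
for name in census:
    print(name, differences(census[name], stars[name]), summary[name])
```

By this measure the age distribution differs most, driven mainly by the lower share of people aged 0 to 17 in StARS 1999, and the sex distribution differs least.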


James Farber and Charlene Leggieri. The authors are from the U.S. Census Bureau.
COPYRIGHT 2002 New Zealand Association of Economists

Article Details
Author: Farber, James; Leggieri, Charlene
Publication: New Zealand Economic Papers
Geographic Code: 1USA
Date: Jun 1, 2002