# Rising inequality: transitory or persistent? New evidence from a panel of U.S. tax returns.

ABSTRACT We use a new, large, and confidential panel of tax returns
to study the persistent-versus-transitory nature of rising inequality in
male labor earnings and in total household income, both before and after
taxes, in the United States over the period 1987-2009. We apply various
statistical decomposition methods that allow for different ways of
characterizing persistent and transitory income components. For male
labor earnings, we find that the entire increase in cross-sectional
inequality over our sample period was driven by an increase in the
dispersion of the persistent component of earnings. For total household
income, we find that most of the increase in inequality reflects an
increase in the dispersion of the persistent income component, but the
transitory component also appears to have played some role. We also show
that the tax system partly mitigated the increase in income inequality,
but not sufficiently to alter its broadly increasing trend over the
period.

**********

An extensive literature has documented a large increase in income inequality in the United States in recent decades. In this paper we ask to what extent this observed increase reflects an increase in persistent or in transitory inequality. By persistent inequality we mean long-run inequality, or the dispersion across the population in those components of income that are more or less stable over periods of more than a few years. By transitory inequality we mean the dispersion arising from short-run variability in incomes, as individuals move around within the income distribution at relatively short frequencies of one to a few years. (1)

The distinction between persistent and transitory inequality is important for various reasons. First, it is useful in evaluating proposed explanations for the documented increase in annual cross-sectional inequality. For example, if rising inequality reflects solely an increase in persistent inequality, then explanations consistent with this rise would include skill-biased technical change and long-lasting changes in employers' compensation policies. By contrast, an increase in transitory inequality could reflect increases in income mobility, driven perhaps by greater flexibility among workers to switch jobs. Second, the distinction is useful because it informs the welfare evaluation of changes in inequality. Lifetime income captures an individual's (or a household's) long-term available resources, and hence an increase in persistent inequality would reduce welfare according to most social welfare functions. By contrast, increasing transitory inequality would have less of an effect on welfare, especially in the absence of liquidity constraints restricting consumption smoothing.

One important aspect of our contribution is the use of a new and superior data source to shed new light on the decomposition of inequality and of changes in inequality into persistent and transitory components. We use a new, large, and confidential panel of tax returns from the Internal Revenue Service (IRS) to study the persistent-versus-transitory nature of rising inequality in individual male labor earnings and in total household income, both before and after taxes, in the United States over the period 1987-2009. (2) Our panel constitutes a 1-in-5,000 random sample of the population of U.S. taxpayers. It contains individual-level labor earnings information from W-2 forms as well as household-level income information from Form 1040. It also includes information on the age and sex of the primary and secondary tax filers from matched Social Security Administration (SSA) records. Our broadest sample consists of roughly 350,000 observations on 35,000 households and is therefore substantially larger than the publicly available, survey-based panels typically used to address related questions in the literature. In addition, our data are not subject to top-coding and are less likely than the survey data to be affected by measurement error.

We analyze the persistent-versus-transitory nature of rising inequality by decomposing income into persistent and transitory parts and examining how much each of these parts contributed to the increase in the cross-sectional variance of income (our measure of income inequality; see footnote 1) over our sample period. In reality, incomes are subject to many different types of shocks. Some of these might be truly persistent (or even permanent), and some entirely transitory, but many are likely to exhibit some degree of persistence (that is, serial correlation) in between the two extremes. As a result, decomposing income into persistent and transitory components requires taking a stand on what degree of serial correlation in income shocks will be considered "persistent" and what degree will be considered "transitory." This choice necessarily involves some arbitrariness.

Our analysis uses two sets of methods, each of which takes a somewhat different approach to separating income into persistent and transitory parts. First, we employ simple nonparametric decomposition methods that essentially separate income into a highly transitory piece that exhibits no serial correlation and one other piece, which we call "persistent." These methods then ask how much of the rise in the variance of income is coming from changes in the variance of the transitory piece and how much from changes in the variance of the persistent piece. Second, we employ rich non-stationary error components models of income dynamics. (3) These models fully specify the process that generates income over time and essentially decompose income into a highly persistent piece and another, transitory piece that allows for some (limited) degree of serial correlation. Here, too, we then ask how much of the rise in the variance of income is coming from changes in the variance of the persistent piece and of the transitory piece.

The two approaches can give somewhat different answers about the shares of income inequality at any given point in time that are attributed to the persistent and to the transitory income components. The more serial correlation that is allowed in the transitory income component, the larger the share of inequality at a given point in time that will be attributed to that component (because some of the short-duration persistence in the income data will be attributed to the transitory piece). The simple nonparametric methods, which use a stricter definition of transitory income, attribute the vast majority of the variance to the broadly defined persistent income component. Our error components models, which, as noted, allow for some serial correlation in transitory income, assign a somewhat larger fraction of total inequality to this more broadly defined transitory income.

However, and most important, both approaches yield very similar results for our main object of interest: the increase in income inequality and its components over time. For male labor earnings, both approaches imply that the entire increase in cross-sectional inequality over the 1987-2009 period was driven by an increase in the variance of the persistent component of earnings. Specifically, we find that the variance of the persistent component of log male labor earnings increased over this period but the variance of the transitory component did not.

For total household income--which in addition to male labor earnings includes spousal labor earnings, transfer income, investment income, and business income--both approaches imply that the increase in inequality over our sample period was mostly (although not entirely) persistent. For this broader category of income, the variance of both the persistent and the transitory components of income increased, but the persistent component contributed the bulk of the increase in the total variance. Furthermore, the increase in the variance of the transitory component of total household income reflects increases in the transitory variance of spousal labor earnings and of investment income.

Next, we use our data from tax returns to examine the role of the federal tax system in the observed trend in income inequality. In particular, we investigate whether the increase in inequality for after-tax household income differs materially from that for pre-tax income. Our measure of after-tax household income accounts for all federal personal income taxes (obtained from Form 1040), including all refundable tax credits, as well as payroll taxes (calculated using information from W-2 forms). We find that the cross-sectional variance of after-tax income is on average 0.10 squared log point, or roughly 15 percent, smaller than the variance of pre-tax income, reflecting the overall progressivity of the federal tax system. In terms of the trend, we find that the tax system helped mitigate somewhat the increase in household income inequality over the sample period, but this attenuating effect was insufficient to significantly alter the broad trend toward rising inequality.

Finally, we note that our paper is the first to estimate error components models of income dynamics using U.S. administrative data, and that the quality and significant size of our data set allow us to obtain very precise estimates of our models. Our paper is also among the first to apply non-stationary models to household-level income, which is arguably a more relevant income measure than individual earnings for questions regarding consumption and welfare. Additionally, our comparison of decompositions using different approaches should help clarify the connections as well as the differences that exist across the different methods.

The rest of the paper is organized as follows. Section I discusses the related literature and places our results in the context of existing studies. Section II describes our data set, our sample selection, and the trends in income inequality in our data. Section III outlines our methodological approach. Section IV introduces the simpler nonparametric methods and presents results for male earnings using those methods. Section V introduces our error components models, discusses their estimation, presents model estimates for male labor earnings, and uses the estimated model to decompose the cross-sectional variance of male earnings into persistent and transitory parts. Section VI presents results using our various methods for pre-tax total household income. Section VII investigates the role of the federal tax system in the increase in income inequality. Section VIII concludes.

I. Related Literature

An extensive literature has documented a large increase in labor earnings inequality in the United States in recent decades. (4) A small branch of this literature has attempted to determine whether this documented increase in cross-sectional earnings inequality reflects an increase in persistent or in transitory inequality, as these are defined in footnote 1. The earlier studies, including Peter Gottschalk and Robert Moffitt (1994), Moffitt and Gottschalk (1995), and Steven Haider (2001), all use data from the Panel Study of Income Dynamics (PSID) and generally conclude that a substantial part (as much as one half) of the increase in cross-sectional earnings inequality in the 1970s and early 1980s was transitory. (5)

Very few studies have analyzed the last two decades, although earnings inequality has continued to increase. Furthermore, the results across the more recent studies are not conclusive. For example, using the PSID, Moffitt and Gottschalk (2011) find that the transitory variance has not increased since the mid- to late 1980s, whereas Jonathan Heathcote, Fabrizio Perri, and Gianluca Violante (2010) conclude that the transitory variance rose substantially in the 1990s. (6) Wojciech Kopczuk, Emmanuel Saez, and Jae Song (2010), using Social Security earnings data, find that the increase in inequality from the 1970s to the early 2000s was entirely driven by the persistent component of earnings. However, they use only a simple nonparametric decomposition method, and their findings contradict the more established results of the earlier literature for the 1970s and early 1980s, raising some doubts about the factors driving their results for the more recent period as well. In this paper, our data clearly show that the increase in male earnings inequality since the mid- to late 1980s has been entirely driven by the persistent component of earnings. We confirm this finding with a variety of methods, obtaining very robust results. (7)

Inequality in total household income has also increased in recent decades, as documented by, among others, Dirk Krueger and Perri (2006) and Heathcote, Perri, and Violante (2010). Studies that have in some way attempted to decompose the increase in household income inequality into persistent and transitory parts include Gottschalk and Moffitt (2009), Giorgio Primiceri and Thijs van Rens (2009), and Richard Blundell, Luigi Pistaferri, and Ian Preston (2008). Gottschalk and Moffitt (2009) use a simple nonparametric method and provide only suggestive evidence of an increase in the transitory variance starting in the mid-1980s, without conducting a full analysis. By contrast, Primiceri and van Rens (2009), using repeated cross sections on income and consumption from the Consumer Expenditure Survey (CE), find that all of the increase in household income inequality in the 1980s and 1990s reflects an increase in the persistent (or permanent) component of the variance. Our results indicate that, for the increase in the cross-sectional variance of household income, the transitory variance does play some role, although not as prominent a role as Gottschalk and Moffitt (2009) seem to suggest. (8) Furthermore, we show that the (relatively small) increase in the transitory variance of household income reflects increases in the transitory variance of spousal labor earnings and of investment income.

Our paper is also related to a recent literature that has analyzed the trends in the dispersion of short-term income changes, or income volatility, where volatility is defined as the standard deviation of percentage changes in male earnings over, say, 1 year. The findings in this literature have been more consistent across different studies. For instance, Congressional Budget Office (2008), John Sabelhaus and Song (2009, 2010), Sule Celik and coauthors (2012), and Donggyun Shin and Gary Solon (2011) all find that the volatility of male earnings did not increase between the 1980s and the early 2000s. (9) Our male labor earnings data are consistent with the findings in this literature, as we document no increase in male earnings volatility. However, we do find an increase in the volatility of total household income.

Finally, our study also relates to a literature that examines changes in the distribution of household consumption expenditure in the United States. Economic theory predicts that increases in the dispersion of the persistent components of income are likely to lead to increases in the dispersion of consumption. A few studies have examined whether the well-documented increase in U.S. income inequality has indeed been accompanied by an increase in consumption inequality of similar magnitude. Some of the earlier studies in this literature, including Daniel Slesnick (2001), Krueger and Perri (2006), Heathcote, Perri, and Violante (2010), and perhaps to a lesser extent Orazio Attanasio, Eric Battistin, and Hide Ichimura (2007) and Attanasio, Battistin, and Mario Padula (2011), find that consumption inequality increased by only a fraction of the increase in income inequality. However, these studies relied on data from the CE, and it has been increasingly recognized in the literature that these data are subject to potentially severe measurement error problems. More recent studies, such as Mark Aguiar and Mark Bils (2012) and Attanasio, Erik Hurst, and Pistaferri (2012), attempt to control for these measurement problems and conclude that consumption inequality has increased by a similar magnitude as income inequality. Thus, the implication of our results, namely a significant increase in consumption inequality, appears to be borne out by the most recent evidence based on consumption data.

II. Data

This section describes our panel of income data from tax returns, the main variables we use, our sample selection, and the trends in income inequality observed in our data over the period 1987-2009.

II. A. Panel

We use a 23-year panel of income data from tax returns spanning the period 1987-2009. Our sample is a 1-in-5,000 random sample of the U.S. tax-filing population (with two exceptions noted below), (10) and inclusion of tax units in the sample is based on the last four digits of the Social Security number (SSN) of the primary tax filer. (11) The sample is kept representative of the tax-filing population by adding, each year, any new tax units that join the population of filers (for example, immigrants and young people entering the work force) and have an SSN with the sampled four-digit ending. Our panel is not subject to the usual attrition or nonresponse problems present in most survey-based panels. Tax units might leave the sample because of death, emigration, or income falling below the tax filing threshold, but these exits do not affect the representativeness of the sample. Additionally, the age distribution of our sample is representative, each year, of the age distribution in the population of tax filers in that year.

To create our 23-year panel, we started with tax returns from an existing panel, known as the 1987-96 Family Panel, constructed by the Statistics of Income (SOI) division of the IRS. We then extended this panel using returns contained in cross-sectional files from 1997 to 2009. From this extended sample we then selected those returns for which the primary filer had an SSN ending in one of two four-digit combinations. The resulting panel (again, with two exceptions noted below) is essentially a 1-in-5,000 random sample of tax units in each year of the period 1987-2009. Each of the original data sources is next described in turn.

The 1987-96 SOI panel started with a stratified random sample of taxpayers who filed in 1987, a subset of which was chosen based on the primary filer's SSN ending in one of two four-digit combinations. (12) All individuals represented on the tax return of a member of this cross section, including secondary taxpayers on joint returns and dependents, were considered to be members of the panel. Over the following 9 years, the SOI division included in the panel all returns that reported any panel member as a primary or secondary taxpayer, including returns filed by panel members who were dependents of another taxpayer. To keep the sample representative of the tax-filing population in subsequent years, returns from tax years 1988 through 1996 were added to the panel if the primary filer had an SSN ending in one of the two original four-digit combinations but did not file a return in 1987. In addition to information from each taxpayer's Form 1040, the data set includes information on the age and sex of the primary and secondary filers from matched SSA records, and information on wages and contributions to employer-based retirement plans from W-2 forms.

The 1997-2009 data come from yearly cross sections, also collected by the SOI division. As with the 1987 sample described above, a stratified random sample was collected in each of these years, consisting partly of a strictly random sample based on the last four digits of the primary filer's SSN. In each year the set of SSNs used for sampling included the original two four-digit endings from 1987, making it possible to extend the earlier panel using returns collected from the yearly cross sections. Each cross section contains information from the taxpayer's Form 1040 and from a number of other forms and schedules. Into these data we merged information on the age and sex of the primary and secondary filers from SSA records, and information on wages and contributions to employer-based retirement plans from W-2 forms.

We note, however, that there was a change in the sampling frame of our data in 1996. As a result of this change, we are missing two groups of filers in the pre-1996 period: dependent filers in 1987 over the period 1987-96, and nondependent primary filers in 1988-96 who were either dependent or secondary filers in 1987. These two groups primarily consist of young (in the case of dependents) or female (in the case of secondary) taxpayers. The effect of missing these returns is therefore likely to be very small when we examine the labor income of males in their earning years, although it may be larger when we examine household income.

II. B. Variable Description

The ideal measure of individual-level earnings for this study would be gross labor income before any amounts are deducted for health insurance premiums or retirement account contributions. However, our data do not contain such a variable, and hence we use a measure of labor income that is as close to gross labor income as is possible when using tax data. For this we start with taxable wages, as reported in the "Wages, tips, other compensation" box of taxpayers' W-2 forms, and add the contributions to retirement savings accounts reported on the W-2 forms. This measure of labor income will include all income that a taxpayer's employer has reported to the IRS, namely, wages, salaries, and tips, as well as the portion of these that is placed in a retirement account. Since our data do not include information on the health insurance premiums paid by the taxpayer and excluded from taxable wages, our measure of labor income will exclude those amounts. Our measure also excludes any income earned from self-employment.

For pre-tax total household income, we start with "total income" as reported on Form 1040. This variable includes wages and salaries; dividends; alimony; business income (from sole proprietorships, partnerships, or S corporations); income from rental real estate, royalties, and trusts; unemployment compensation; capital gains; and taxable amounts of interest, IRA distributions, pensions, and Social Security benefits. To this we add back nontaxable interest, IRA distributions, pensions, and Social Security benefits reported on Form 1040.

There is some debate as to whether capital gains should be included in the measure of household income. Capital gains realized and reported in a particular year may include gains that accrued in past years. Hence, including capital gains may make household income appear "lumpier" than it actually is, since income will be higher in years when gains from earlier years are realized, and lower in years when gains accrued but were not realized. However, excluding capital gains will result in the measure of household income being too low for any taxpayer who had gains in that year (whether or not they were realized), and this downward bias will be quite large for taxpayers whose primary source of income is from investments. On balance, we feel that this concern is more important, and therefore we include capital gains in our benchmark measure of household income. However, we have verified that our results are robust to the exclusion of capital gains.

For after-tax household income, we start with the measure of pre-tax household income described above. We then subtract the amount of "total tax" reported on Form 1040. This amount captures total income taxes (including self-employment taxes) after nonrefundable tax credits are taken into account. Next, we subtract the total amount of payroll (FICA) taxes owed on the earned income of the couple. This is done to ensure that all federal taxes (including income and payroll taxes) are included for all taxpayers, regardless of whether they are wage and salary workers or self-employed. Finally, we add refundable tax credits (including the earned income tax credit and the refundable portion of the child tax credit) to arrive at our measure of after-tax household income.
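The construction of after-tax household income described above reduces to simple arithmetic. The following is a minimal sketch only: the function name, argument names, and dollar amounts are hypothetical, and payroll taxes are taken as an input rather than computed, since the rates and wage bases used are not given here.

```python
def after_tax_household_income(pre_tax_income, total_tax,
                               payroll_tax, refundable_credits):
    """Pre-tax household income, minus Form 1040 "total tax" and
    payroll (FICA) taxes, plus refundable tax credits."""
    return pre_tax_income - total_tax - payroll_tax + refundable_credits


# Hypothetical household; all amounts are illustrative, not from the data.
example = after_tax_household_income(
    pre_tax_income=60_000.0,
    total_tax=5_000.0,
    payroll_tax=4_590.0,
    refundable_credits=1_000.0,
)
print(example)
```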

As is usually the case with administrative data, our data contain relatively few sociodemographic variables. Most important, although we have information on the age and sex of the primary and secondary filers, we do not have information on the education or race of either. We also lack information on hours of work, and hence our analysis will focus on annual earnings as opposed to hourly wage rates.

II. C. Sample Selection

For the case of individual earnings, we restrict our sample to males (whether they appear as the primary or the secondary filer in the tax form), as is standard in the literature, because the movements of females into and out of the labor force introduce discontinuities in the earnings process that are difficult for the statistical models of income to handle. For household income we carry out our analysis using two alternative samples. The first includes only households with a male primary or secondary filer and is thus similar to the sample we use to study male earnings. This avoids confounding the effects of moving to a broader measure of income (total household income) with the effects of moving to a broader sample of households. In addition, this sample is less likely to be affected by the change in sampling frame discussed in section II. A. In a slight abuse of terminology, we refer to this sample as our "male-headed households" sample. The second sample adds to this sample all other tax-filing households (that is, those without a male primary or secondary filer), a group that consists largely of single females. We are also interested in this broader sample because it is representative of the population of U.S. taxpayers.

For both male earnings and household income, we restrict our sample to individuals aged 25 to 60. We impose this restriction because individuals in this age group are likely to have completed most of their formal schooling and are sufficiently young not to be too strongly affected by early retirement. We also exclude earnings (or income) observations below a minimum threshold. For male earnings, since tax records do not provide information on employment status or hours of work, we can exclude individuals with presumably weak labor force attachment only by dropping low-earnings observations. For household income, we cannot simply exploit the fact that households with sufficiently low income are not required to file taxes, because many actually do so to claim refundable tax credits such as the earned income tax credit. Therefore, in order to treat low-income observations consistently, we exclude observations with reported household income below a minimum threshold. (13) We take the relevant threshold to be one-fourth of a full-year, full-time minimum wage. (14)

After imposing the restrictions above, we end up with a male earnings sample of 221,099 person-year observations on 20,859 individuals. For household income, our broader sample, which includes households without a male primary or secondary filer, contains 353,975 person-year observations on 33,730 households. We refer to this sample as our "all households" sample. Table 1 reports the number of observations and the mean and the standard deviation of the relevant income measure for our male earnings sample and for each of our household income samples.

II. D. Income Inequality Trends, 1987-2009

We begin by documenting the trends in inequality for male earnings and for household income, the latter before and after taxes, in our panel of tax returns. The top panel of figure 1 shows the cross-sectional variance of (the logs of) male earnings, pre-tax household income, and after-tax household income annually over 1987-2009, and the bottom panel the Gini coefficient for the same three measures of income. The figures show an increase in both measures of inequality for all three measures of income over the period. For example, the cross-sectional variance increases by 0.14 squared log point for male earnings (from 0.61 in 1987 to 0.75 in 2009), by 0.19 squared log point for pre-tax household income, and by 0.12 squared log point for after-tax household income. (15) In general, inequality in individual earnings is lower than inequality in household income. Furthermore, inequality in after-tax household income is lower than inequality in pre-tax household income, reflecting the progressivity of the federal tax system.

[FIGURE 1 OMITTED]
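To make the two inequality measures in figure 1 concrete, the sketch below computes the cross-sectional variance of log income and the Gini coefficient for a single year. It is an illustration under stated assumptions, not the code used for the figure: the function names and the toy incomes are hypothetical, the variance is computed as a population variance (the exact convention used in the paper is not stated here), and the Gini coefficient uses the standard sorted-rank formula.

```python
import math
from statistics import pvariance


def variance_of_logs(incomes):
    """Cross-sectional variance of log income (population variance)."""
    return pvariance([math.log(y) for y in incomes])


def gini(incomes):
    """Gini coefficient via the sorted-rank formula.

    G = 2 * sum_i(i * y_i) / (n * sum_i(y_i)) - (n + 1) / n,
    where y_1 <= ... <= y_n and i is the rank (1..n).
    Incomes are assumed to be non-negative with a positive total.
    """
    ys = sorted(incomes)
    n = len(ys)
    total = sum(ys)
    weighted = sum(rank * y for rank, y in enumerate(ys, start=1))
    return 2.0 * weighted / (n * total) - (n + 1.0) / n


# Toy cross sections (hypothetical incomes for one year).
equal = [50_000.0] * 4
unequal = [10_000.0, 20_000.0, 40_000.0, 130_000.0]
print(variance_of_logs(equal), gini(equal))
print(variance_of_logs(unequal), gini(unequal))
```

Both measures are zero when all incomes are equal and rise as incomes spread out, but only the variance of logs decomposes additively into components, which is why it is the more tractable of the two for the decompositions that follow.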

These inequality trends in our data are consistent with trends that have been documented in many other U.S. studies using different data sets. In the remainder of the paper, we focus on the cross-sectional variance of (the logs of) earnings and household income as our measure of inequality, because of its tractability for statistical decompositions, and we investigate to what extent the increase in the variance shown here represents an increase in the variance of the persistent or in the transitory component of income.

III. Methodological Approach

As discussed in the introduction, given that the degree of persistence (or serial correlation) of income shocks lies in a range between the two theoretical extremes, the choice of the dividing line between what degree of serial correlation will be considered "persistent" and what degree "transitory" is necessarily somewhat arbitrary. In our analysis we use two sets of methods, each of which takes a somewhat different approach to separating income into persistent and transitory parts.

First, in section IV we employ simple nonparametric decomposition methods that essentially decompose income into a highly transitory piece that exhibits no serial correlation and one other piece, which we call "persistent." These methods then ask, for each of these two pieces, how much of the rise in the variance of income is coming from changes in the variance of that piece. Second, in section V we employ nonstationary error components models of income dynamics. These models fully specify the process that generates income over time and essentially decompose income into a highly persistent piece and another, transitory piece that allows for some limited degree of serial correlation. Here, too, we then ask how much of the rise in the variance of income is coming from changes in the variances of the persistent and of the transitory piece. Note that neither approach is right or wrong: each is interesting in its own right. And as we show, both yield very similar qualitative results for the trends in inequality and its components.

Before turning to the specific methods and results, we note that throughout the paper we work with measures of income from which we have removed the predictable life-cycle variation in income, that is, the variation that can be explained by differences in age across individuals. For male earnings we work with residuals from least squares regressions (run separately for each calendar year) of log earnings against a full set of age dummy variables. For the two measures of household income, in addition to the age-related variation, we remove the income variation that is due to differences in household size and composition. We work with residuals from regressions (run separately for each calendar year) of log household income on a full set of age dummies for the primary tax filer, indicators of sex and marital status for the primary filer, and a full set of dummies for the number of children (up to 10) in the household. We have verified, however, that working directly with the raw measures of male earnings and total household income, rather than with these residuals, leads to qualitatively similar results. (16)
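For the male earnings case, the residualization step can be sketched compactly. A least squares regression on a full set of age dummies, run separately by year, has fitted values equal to the mean of log earnings within each year-by-age cell, so the residual is simply the deviation from that cell mean. The sketch below relies on this equivalence; the function name, record layout, and toy data are hypothetical. The household income regressions include additional controls (sex, marital status, number of children) and would require a general regression routine instead.

```python
import math
from collections import defaultdict


def residualize_by_age(records):
    """Residual log earnings from year-by-year regressions on age dummies.

    records: iterable of (person_id, year, age, earnings) tuples.
    Returns a dict mapping (person_id, year) to residual log earnings,
    computed as log earnings minus the mean of log earnings among all
    observations of the same age in the same year.
    """
    records = list(records)
    cells = defaultdict(list)
    for pid, year, age, earnings in records:
        cells[(year, age)].append(math.log(earnings))
    cell_mean = {key: sum(vals) / len(vals) for key, vals in cells.items()}
    return {
        (pid, year): math.log(earnings) - cell_mean[(year, age)]
        for pid, year, age, earnings in records
    }


# Toy data (hypothetical): two 30-year-olds and two 45-year-olds in 1990.
toy = [
    (1, 1990, 30, 20_000.0),
    (2, 1990, 30, 40_000.0),
    (3, 1990, 45, 60_000.0),
    (4, 1990, 45, 60_000.0),
]
residuals = residualize_by_age(toy)
print(residuals)
```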

IV. Simple Nonparametric Methods

We begin our analysis using simple nonparametric methods. In this section we introduce the methods and present the corresponding decompositions for male labor earnings. The methods used in this section are largely descriptive and do not explicitly rely on any model of the income process. In section V we turn to our analysis using error components models and again present the resulting decompositions for male labor earnings. Results of both approaches for total household income are presented in section VI.

IV.A. Volatility

We start with a simple, purely descriptive measure of the dispersion in the cross-sectional distribution of income changes that occur over short horizons, namely, the standard deviation of percentage changes in (residual) male earnings. Following Shin and Solon (2011), we refer to this measure as the "volatility" of earnings. This measure is closely related, although not equivalent, to the variance of the transitory component of income that we will discuss in the following sections. (17) Figure 2 plots over the sample period the standard deviations of both 1-year and 2-year percentage changes in residual male earnings. The figure shows no clear increasing or decreasing trend in either series. Although volatility increased in the last 3 years of our sample, there is no indication that this represents the beginning of a rising trend. In fact, regressing each of the two volatility series shown on a constant and a linear time trend yields an estimated coefficient on the latter that is essentially zero. (18) There is thus no evidence in our data of a trend in male earnings volatility for our sample period.
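A volatility series of this kind could be computed along the following lines. This is a minimal sketch, not the exact procedure behind figure 2: it uses the k-year change in residual log earnings as a stand-in for the percentage change, and the panel layout and function name are assumptions.

```python
import statistics

def volatility_by_year(panel, k=1):
    """Std. dev. across individuals of k-year changes, for each end year.

    `panel` maps person_id -> {year: residual log earnings}. Log changes are
    used as a stand-in for percentage changes (an assumption of this sketch).
    """
    changes = {}
    for series in panel.values():
        for year, value in series.items():
            if year - k in series:
                changes.setdefault(year, []).append(value - series[year - k])
    return {
        year: statistics.stdev(vals)
        for year, vals in sorted(changes.items())
        if len(vals) >= 2
    }
```

Calling the function with `k=1` and `k=2` would give the two series that are compared in the text.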

[FIGURE 2 OMITTED]

IV.B. Simple Nonparametric Decomposition Methods

We next consider two simple nonparametric methods that decompose the cross-sectional variance of income (our measure of income inequality) into persistent and transitory parts. The methods in this section essentially define the persistent component of income as the average of annual income over a certain number of years, and transitory income as the deviations of annual income from that average.

The first method, which is used in Kopczuk, Saez, and Song (2010, hereafter KSS), defines person i's persistent income component in year t as the average of person i's annual log income (or residual log income) over a P-year period centered around t. Transitory income for person i in year t is then defined as the difference between person i's current annual income at t and his or her persistent income in the same year. The persistent and transitory components of the variance are next calculated as the variances, across individuals, of persistent and transitory income, respectively.

For our decomposition of the cross-sectional variance of (residual) male earnings into persistent and transitory parts using the KSS method, we set parameter P = 5, the same value used by KSS. (19) Whereas they use raw (as opposed to residual) log earnings and restrict observations to individuals who are present in the sample for all 5 years, we use residual log earnings and do not require individuals to be present in the sample in all 5 years. However, the results are not materially different when we follow their treatment and restrictions.
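The KSS-style split just described can be sketched as follows. The panel layout and function name are hypothetical, and averaging over whichever window years are available is one possible way to handle individuals who are not present in all P years.

```python
import statistics

def kss_decomposition(panel, year, window=5):
    """KSS-style split of the cross-sectional variance in `year`.

    `panel` maps person_id -> {year: residual log earnings}. Persistent income
    is the mean over the available years of the centered window; transitory
    income is the deviation of the current year from that mean.
    """
    half = window // 2
    persistent, transitory = [], []
    for series in panel.values():
        if year not in series:
            continue
        in_window = [series[t] for t in range(year - half, year + half + 1) if t in series]
        avg = sum(in_window) / len(in_window)
        persistent.append(avg)
        transitory.append(series[year] - avg)
    # Variances across individuals of the two income components.
    return statistics.pvariance(persistent), statistics.pvariance(transitory)
```

Repeating the call for each calendar year would trace out the two variance series over the sample period.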

The top panel of figure 3 presents the results of this decomposition, showing that the persistent component of the variance in male earnings increased over our sample period but the transitory component did not. Hence the increase in the total cross-sectional variance was entirely driven by the persistent component. Table 2 formalizes this result, reporting estimates from a regression that fits a linear time trend, separately, to the persistent variance series and to the transitory variance series.

The first column in each of the two panels of table 2 corresponds to the KSS decomposition from figure 3. The dependent variable is either the persistent (left panel) or the transitory (right panel) variance component, and the explanatory variables are a constant (not shown) and a linear time trend. The table shows a statistically significant rising linear trend in the persistent variance: the estimated linear trend coefficient is 0.0037 with a standard error of 0.0002, implying an increase of 0.09 squared log point over 23 years. There is no trend in the transitory variance component (the estimated trend coefficient is 0.0000). That is, the entire increase in the total cross-sectional variance of (residual log) male earnings was driven by an increase in the variance of the persistent component of earnings, and thus reflects an increase in persistent inequality.
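The trend regressions and the implied changes quoted above follow from simple arithmetic, sketched here. The function name is hypothetical; the slope of 0.0037 and the 23-year span are the figures reported in the text.

```python
def linear_trend(years, values):
    """OLS intercept and slope from regressing `values` on a constant and `years`."""
    n = len(years)
    mean_x = sum(years) / n
    mean_y = sum(values) / n
    sxx = sum((x - mean_x) ** 2 for x in years)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, values))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

# Implied change over the sample: slope times the number of years.
# With the reported persistent-variance slope of 0.0037 and 23 years:
implied_change = 0.0037 * 23  # about 0.085, i.e. roughly 0.09 squared log point
```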

[FIGURE 3 OMITTED]

The second nonparametric decomposition method that we consider was introduced by Gottschalk and Moffitt (1994, hereafter GM). The GM method is similar, although not identical, to the KSS method, and we consider it separately because it relies (indirectly) on a simple model of income, which might provide a slightly more direct way of relating it to our error components models. The method is based on the simple specification of (residual) log earnings [[xi].sub.it] = [[alpha].sub.i] + [[epsilon].sub.it], where [[alpha].sub.i] is purely permanent (time-invariant) and [[epsilon].sub.it] is purely transitory (i.i.d.). For a P-year window centered around each year t, the method uses the standard formulas implied by this simple "random effects model" to compute the persistent variance of [[xi].sub.it] as the variance of the [[alpha].sub.i] component, and the transitory variance of [[xi].sub.it] as the variance of the [[epsilon].sub.it] component. (20) To obtain a series of persistent and transitory variance estimates over time, this procedure is repeated for consecutive, overlapping P-year moving windows. (21)
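For a single window, the random effects calculation might look like the sketch below. It uses the textbook formulas for a balanced window, in which a permanent individual effect is added to an i.i.d. transitory term; the treatment of individuals with missing years is not covered, and the function name and input layout are assumptions.

```python
import statistics

def gm_decomposition(window_data):
    """Random-effects split for one P-year window (balanced case).

    `window_data` is a list of per-person lists, each holding that person's
    residual log earnings for the same P years. Returns
    (persistent variance, transitory variance).
    """
    n_years = len(window_data[0])
    means = [sum(person) / n_years for person in window_data]
    # Transitory variance: average within-person sample variance.
    var_transitory = sum(statistics.variance(person) for person in window_data) / len(window_data)
    # Variance of person means overstates Var(alpha) by var_transitory / P.
    var_persistent = statistics.variance(means) - var_transitory / n_years
    return var_persistent, var_transitory
```

The correction term in the last step reflects that a P-year average of i.i.d. transitory terms still has variance equal to the transitory variance divided by P.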

The bottom panel of figure 3 presents the GM inequality decomposition. As with the KSS method, this decomposition implies that the persistent variance component increased over the sample period but the transitory component did not. This is confirmed in the second column in each panel in table 2. Here, too, the coefficient on the linear time trend is large and significant for the persistent variance component and is essentially zero for the transitory component. Both trend coefficients are quite precisely estimated. Thus, once again, the increase in the total cross-sectional variance was entirely driven by the increase in the variance of persistent earnings, constituting an increase in persistent inequality.

Note as well that both the KSS method and the GM method attribute a large fraction of the total variance (more than 80 percent on average across all years) to the persistent component. We will come back to this point below.

V. Error Components Models

In this section we turn to error components models (ECMs) of income dynamics to examine the role of persistent and transitory income components in determining the trend in inequality. These ECMs are statistical models (stochastic processes) that approximate the dynamic properties and the trajectory of income over time. Like the simpler nonparametric decomposition methods presented in section IV, ECMs typically specify income as consisting of a persistent component and a transitory component, and they can be used to decompose the variance of (log) income into persistent and transitory parts.

For example, the persistent component of income in the model will tend to capture differences in incomes across individuals that are due to differences in permanent characteristics such as education and unobserved ability. It will also capture income changes that have lasting effects on the path of the income process, such as the onset of a chronic illness or the permanent loss of a high-paying job. The transitory component will tend to capture changes in income that are less persistent but may have some serial correlation, such as a temporary illness or transitory unemployment. The model then essentially attributes variation in income to the persistent or the transitory component according to the strength in the correlations between individuals' current and future income in the data, and to how this strength changes as the periods move further apart. Statistically, the separate identification of the persistent and transitory components relies on the simple idea that the contribution of the transitory component to the autocovariance of income between two periods vanishes as the periods get further apart.

[FIGURE 4 OMITTED]

Flexible specifications of the income process, such as the ones we consider in this paper, can match the entire autocovariance structure of income in the data, as well as its changes over the life cycle and over calendar time. To illustrate, figure 4 shows two particular aspects of the autocovariance structure of male labor earnings in our data. Here we focus on the series labeled "empirical" in each of the two panels in the figure. (22) The top panel displays the variance (calculated across all individuals of the same age) of residual log male earnings as a function of age. To construct the series, we computed the variance of (residual) male labor earnings in the data for each combination of age and calendar year and regressed this variance against a full set of year and age indicators. The figure displays the estimated coefficients on the age indicators (normalized so that a = 1 in the figure corresponds to age 25).

The corresponding series in the bottom panel displays the empirical autocovariance function for our male earnings data, that is, how the strength of the autocovariance between current earnings and future earnings changes as the periods get further apart. In other words, the figure shows how the empirical autocovariance (the autocovariance of earnings in the data for observations that are k years apart) depends on the "lead" k. To construct the series, we computed the autocovariance of male labor earnings for each combination of age, calendar year, and lead k and then regressed the autocovariance against a full set of age, year, and lead indicators. We then calculated the value of the autocovariance that is implied by the estimated regression for individuals aged 35 in base year 1990. The implied autocovariances for different ages or different years look very similar. For now, we simply note that the goal of the ECMs is to match aspects of the data such as these. (23) We will return to these figures below.
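An empirical autocovariance function of this type can be approximated with a simple pooled calculation. The sketch below pools over all ages and years instead of computing cell-level autocovariances and regressing them on age, year, and lead indicators, so it is a simplification of the procedure described above; the names are hypothetical.

```python
def autocovariance_by_lead(panel, max_lead):
    """Pooled autocovariance of residual earnings at leads 0..max_lead.

    `panel` maps person_id -> {year: residual log earnings}. Residuals are
    treated as mean zero, so each autocovariance is the average cross product.
    Pooling over ages and years is a simplification of this sketch.
    """
    sums = [0.0] * (max_lead + 1)
    counts = [0] * (max_lead + 1)
    for series in panel.values():
        for year, value in series.items():
            for k in range(max_lead + 1):
                if year + k in series:
                    sums[k] += value * series[year + k]
                    counts[k] += 1
    # Leads with no observed pairs are reported as None.
    return [s / c if c else None for s, c in zip(sums, counts)]
```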

V.A. Stationary ECMs

We begin by presenting stationary models of the income process, that is, models in which the parameters are not allowed to change over calendar time. (24) In the next section we will present nonstationary ECMs, which allow certain parameters in the model to change over time, in order to capture changes in the distribution of income.

Let [y.sup.i.sub.a,t] denote log income, where i indexes individuals, a age, and t calendar years. (25) Log income is given by

(1) [y.sup.i.sub.a,t] = g([zeta]; [X.sup.i.sub.a,t]) + [[xi].sup.i.sub.a,t],

where [X.sup.i.sub.a,t] is a vector of observable characteristics, g(*) is the part of log income that is common to all individuals conditional on [X.sup.i.sub.a,t], [zeta] is a vector of parameters, and [[xi].sup.i.sub.a,t] is the unobservable error term. As is common in the literature on income dynamics, we control for the income variation that is due to observables, [X.sup.i.sub.a,t], and focus on the dynamics of the error term, [[xi].sup.i.sub.a,t]. (26)

The error [[xi].sup.i.sub.a,t] is modeled as consisting of a persistent and a transitory part:

(2) [[xi].sup.i.sub.a,t] = [[alpha].sup.i] + [p.sup.i.sub.a,t] + [[tau].sup.i.sub.a,t]

(3) [p.sup.i.sub.a,t] = [psi][p.sup.i.sub.a-1,t-1] + [[eta].sup.i.sub.a,t]

(4) [[tau].sup.i.sub.a,t] = [[epsilon].sup.i.sub.a,t] + [[theta].sub.1][[epsilon].sup.i.sub.a-1,t-1] + [[theta].sub.2][[epsilon].sup.i.sub.a-2,t-2]

(5) [[alpha].sup.i] ~ i.i.d.(0, [[sigma].sup.2.sub.[alpha]]), [[eta].sup.i.sub.a,t] ~ i.i.d.(0, [[sigma].sup.2.sub.[eta]]), [[epsilon].sup.i.sub.a,t] ~ i.i.d.(0, [[sigma].sup.2.sub.[epsilon]]).

The persistent part of income includes, first, an individual-specific, time-invariant component, [[alpha].sup.i], which captures differences in income across individuals due to factors that include education as well as unobserved ability or productivity. It also includes an autoregressive component, [p.sup.i.sub.a,t], which captures other components of income that are highly persistent. As is common in such models, our estimates of [psi] for the above specification will turn out to be quite close to 1, so it is appropriate to label component [p.sup.i.sub.a,t] as "persistent." These large values of [psi] allow the model to match both the nearly linear increase in the variance of (residual) income in the data as a function of age seen in the top panel of figure 4, and the very gradual decline (after the first 1 to 2 years) in the empirical autocovariance function seen in the bottom panel. (27)

We specify the transitory income component in the model, [[tau].sup.i.sub.a,t], as an MA(2) process. Several studies of income processes have found evidence for the presence of either an MA(1) or an MA(2) transitory component. (28) We choose an MA(2) process to err on the side of allowing the transitory income component to exhibit more persistence, but we have verified that our results are not sensitive to this choice.

The top panel of table 3 presents point estimates and standard errors for the model in equations 2 through 5 for our various measures of income and our various samples. (29) For instance, the first column reports the following point estimates (with standard errors in parentheses) for residual male earnings: [[sigma].sup.2.sub.[alpha]] = 0.1968 (0.0018), [psi] = 0.9623 (0.0010), [[sigma].sup.2.sub.[eta]] = 0.0293 (0.0007), [[sigma].sup.2.sub.[epsilon]] = 0.1826 (0.0034), [[theta].sub.1] = 0.2286 (0.0144), and [[theta].sub.2] = 0.1231 (0.0151). For (residual) pre-tax household income using the sample of all households (third column of the table), the estimates are [[sigma].sup.2.sub.[alpha]] = 0.1960 (0.0016), [psi] = 0.9669 (0.0007), [[sigma].sup.2.sub.[eta]] = 0.0269 (0.0006), [[sigma].sup.2.sub.[epsilon]] = 0.1577 (0.0032), [[theta].sub.1] = 0.2766 (0.0148), and [[theta].sub.2] = 0.1639 (0.0154). These estimates are broadly comparable to those obtained by other studies that use similar specifications. (30) Also, the estimated models match the main features of the data, such as those presented in figure 4, quite well. (31)
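To see what these estimates imply for the autocovariance structure, the textbook moment formulas for an MA(2) transitory component and an AR(1) persistent component can be evaluated at the reported male earnings estimates. This is a sketch only: the function names are hypothetical, and the initial condition for the autoregressive component (zero before the first age) is an assumption, not something stated in the text.

```python
def ma2_autocovariances(var_eps, theta1, theta2):
    """Autocovariances at lags 0, 1, 2, 3 of tau_t = eps_t + theta1*eps_{t-1} + theta2*eps_{t-2}."""
    return [
        var_eps * (1 + theta1 ** 2 + theta2 ** 2),
        var_eps * (theta1 + theta1 * theta2),
        var_eps * theta2,
        0.0,  # an MA(2) has no autocovariance beyond lag 2
    ]

def ar1_variance_by_age(psi, var_eta, n_ages):
    """Variance of p at ages 1..n_ages, assuming p is zero before the first age."""
    variances, current = [], 0.0
    for _ in range(n_ages):
        current = psi ** 2 * current + var_eta
        variances.append(current)
    return variances

# Reported point estimates for residual male earnings (table 3, first column).
transitory = ma2_autocovariances(0.1826, 0.2286, 0.1231)
persistent = ar1_variance_by_age(0.9623, 0.0293, 36)  # ages 25 to 60
```

The transitory autocovariance vanishes after two lags, whereas the persistent component keeps contributing at long leads, which is the identification idea described earlier in this section.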

The bottom panel of table 3 presents estimates for a version of the model that imposes the restriction that [psi] = 1, that is, that [p.sup.i.sub.a,t] follows a random walk, an assumption often made about the persistent component. Here we simply note that, in terms of matching the features of the data shown in figure 4, the random walk specification matches the nearly linear increase with age of the cross-sectional variance in the top panel of figure 4, but it does not match well the gradual decline in the autocovariance function shown in the bottom panel. By contrast, the unrestricted estimates of [psi] (which generally lie around 0.96 to 0.98 for our various income measures and samples) allow the unrestricted model to match the increase in the variance with age fairly well and the pattern of the autocovariance function of male earnings quite closely. In the analysis that follows, we do not impose the restriction [psi] = 1 on component [p.sup.i.sub.a,t], in part to better match the autocovariance function of income.

V.B. Nonstationary ECMs

Stationary models, however, cannot be used to study changes in the distribution of income (such as income inequality) over calendar time. This question requires the use of nonstationary models, which allow certain features of the income process (and hence of the income distribution) to change over time. Such models can capture (in addition to those features of the autocovariance structure of the data shown in the previous section) trends in the cross-sectional variance of income, such as that seen in the top panel of figure 1.

Our baseline nonstationary ECM is as follows. We model residual income, [[xi].sup.i.sub.a,t], as

(6) [[xi].sup.i.sub.a,t] = [[lambda].sub.t]([[alpha].sup.i] + [p.sup.i.sub.a,t]) + [[tau].sup.i.sub.a,t]

(7) [p.sup.i.sub.a,t] = [psi][p.sup.i.sub.a-1,t-1] + [[eta].sup.i.sub.a,t]

(8) [[tau].sup.i.sub.a,t] = [[pi].sub.t][[epsilon].sup.i.sub.a,t] + [[theta].sub.1][[pi].sub.t-1] [[epsilon].sup.i.sub.a-1,t-1] + [[theta].sub.2][[pi].sub.t-2][[epsilon].sup.i.sub.a-2,t-2]

(9) [[alpha].sup.i] ~ i.i.d.(0, [[sigma].sup.2.sub.[alpha]]), [[eta].sup.i.sub.a,t] ~ i.i.d.(0, [[sigma].sup.2.sub.[eta]]), [[epsilon].sup.i.sub.a,t] ~ i.i.d.(0, [[sigma].sup.2.sub.[epsilon]]).

In the equations above, both components of persistent income, [[alpha].sup.i] and [p.sup.i.sub.a,t], are multiplied by the year-specific factor loadings [[lambda].sub.t], which allow the relative importance of the persistent components of income to vary over calendar time (note that the parameter [[lambda].sub.t] can change from year to year). The transitory income component in the model, [[tau].sup.i.sub.a,t], is specified as an MA(2) process in which the transitory innovations, [[epsilon].sup.i.sub.a,t], are multiplied by the year-specific factor loadings [[pi].sub.t], which allow the variance of the innovations, and hence the relative importance of the transitory component, to vary by calendar year.

A few words about the interpretation of the [[lambda].sub.t] parameters are in order. Suppose, first, for simplicity that [[alpha].sup.i] represents solely education, and that [p.sup.i.sub.a,t] represents human capital (which changes slowly over time and is highly persistent). Then, the [[lambda].sub.t] parameters would represent the "price" that the economy attributes to these characteristics in year t. Note as well that the "price" of such characteristics can indeed change from year to year, as evidenced, for example, by the well-documented changes in the returns to education in recent decades. It seems reasonable to expect that the economy will assign a price not just to education, but also to other productive characteristics of individuals (including, but not restricted to, those embedded in human capital). (32) More generally, [[alpha].sup.i] will capture, in addition to education, other permanent characteristics of individuals (or households) such as unobserved ability or productivity, and [p.sup.i.sub.a,t] will capture characteristics that are slow-moving and persistent, such as human capital and social connections. A similar modeling approach of nonstationarity in the persistent component of income is followed, for example, in Moffitt and Gottschalk (1995, 2011), Haider (2001), and Baker and Solon (2003). (33)

A key element of the above specification is clearly the ability of the [[lambda].sub.t] parameters to change over time. One potential concern that this raises, however, is that the [[lambda].sub.t] parameters could in principle bounce around from year to year. Such transitory variation in [[lambda].sub.t] could muddle the labeling of [[lambda].sub.t]([[alpha].sup.i] + [p.sup.i.sub.a,t]) in equation 6 as the "persistent" component of income. To address this concern, when estimating the above model, we impose some smoothness on the movements of [[lambda].sub.t] over time by restricting [[lambda].sub.t] to lie on a fourth-degree polynomial. (34)
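The smoothness restriction can be pictured as follows: the 23 yearly loadings are generated from five polynomial coefficients, so they cannot bounce around freely from year to year. In this sketch the centering on a base year and the coefficient values are illustrative assumptions, not estimates from the paper.

```python
def lambda_path(coeffs, years, base_year=1987):
    """Factor loadings lambda_t lying on a polynomial in (t - base_year).

    `coeffs` holds five coefficients (constant through fourth power). The
    centering on `base_year` and the coefficient values used below are
    illustrative assumptions, not estimates from the paper.
    """
    path = {}
    for year in years:
        s = year - base_year
        path[year] = sum(c * s ** power for power, c in enumerate(coeffs))
    return path

# Purely illustrative coefficients: a loading that drifts up linearly.
lambdas = lambda_path([1.0, 0.01, 0.0, 0.0, 0.0], range(1987, 2010))
```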

V.C. Estimation

Estimation of our ECMs proceeds in two stages. In the first stage we construct residuals from regressions of log earnings (or log income) against observables, [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII], as discussed in section III. In the second stage we use those residuals to estimate all model parameters other than [zeta], using a minimum distance estimator. The estimator matches all of the theoretical variances and autocovariances implied by the model in equations 6 through 9 to their empirical counterparts. The procedure matches 7,912 variances and autocovariances in total. All variances and autocovariances are specified in levels. Appendix C provides details on the minimum distance estimation procedure, and appendix D shows the theoretical moments that are implied by the model and that are matched in estimation.
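The second stage can be thought of as minimizing the distance between a vector of empirical moments and the corresponding model-implied moments. The sketch below shows only the general shape of such an objective. The equal weighting of moments, the grid search, and all names are assumptions for illustration; the actual procedure is the one described in appendix C.

```python
def minimum_distance_objective(empirical, model_moments, params):
    """Sum of squared gaps between empirical and model-implied moments.

    `empirical` maps a moment key (for example (age, year, lead)) to its sample
    value; `model_moments(params, key)` returns the model counterpart. Equal
    weighting of moments is an assumption of this sketch.
    """
    return sum((value - model_moments(params, key)) ** 2 for key, value in empirical.items())

def grid_search(empirical, model_moments, candidates):
    """Return the candidate parameter set with the smallest objective."""
    return min(candidates, key=lambda p: minimum_distance_objective(empirical, model_moments, p))
```

In practice a numerical optimizer would replace the grid search, since the model has many parameters.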

V.D. ECM-Based Variance Decomposition for Male Earnings

Table 4 presents parameter estimates of our baseline nonstationary ECM for our various measures of income and our various samples. Note that the estimates of parameters [[sigma].sup.2.sub.[alpha]], [psi], [[sigma].sup.2.sub.[eta]], [[sigma].sup.2.sub.[epsilon]], [[theta].sub.1], and [[theta].sub.2] (those also present in the stationary version of the model) in table 4 are quite similar to the corresponding estimates in table 3 for the stationary model. The lines labeled "ECM-predicted" in figure 4 show the estimated nonstationary model's predictions for the variance of male earnings as a function of age (top panel) and for the autocovariance function of male earnings (bottom panel). (35) As the figure shows, the estimated model fits the data quite well.

In this section we use our estimated nonstationary ECM to decompose the cross-sectional variance of log (residual) male earnings into its persistent and transitory parts. For each calendar year between 1987 and 2009, and given an age distribution, the ECM in equations 6 through 9 implies a specific value for the total cross-sectional variance, the variance of the persistent component, and the variance of the transitory component of log (residual) earnings, as a function of the model parameters. We compute these variances implied by the estimated model using the actual empirical age distribution for each year in our sample. (36) Note that the trends in the persistent and the transitory variance components in our baseline model are primarily determined by the estimates of the [[lambda].sub.t] and [[pi].sub.t] parameters, respectively.
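A sketch of how such model-implied variances could be computed for one calendar year is given below. It assumes that the persistent part enters as the year-specific loading times the sum of the fixed effect and the autoregressive component, that the autoregressive component is zero before the first age in the sample, and that all components are mutually independent. All names are hypothetical, and the exact moment expressions used in estimation are those in appendix D.

```python
def decompose_year(year, age_shares, lam, pi, var_alpha, psi, var_eta, var_eps, theta1, theta2):
    """Persistent and transitory variances implied by the model for one year.

    `age_shares` maps age index (1 = first age in the sample) to its population
    share in `year`; `lam` and `pi` map years to factor loadings. Assumes p is
    zero before the first age and that pi is available for the two prior years.
    """
    var_p, var_p_by_age = 0.0, {}
    for a in range(1, max(age_shares) + 1):
        var_p = psi ** 2 * var_p + var_eta
        var_p_by_age[a] = var_p
    # Age-weighted variance of lambda_t * (alpha + p).
    persistent = sum(
        share * lam[year] ** 2 * (var_alpha + var_p_by_age[a])
        for a, share in age_shares.items()
    )
    # Variance of the MA(2) component with year-specific loadings on innovations.
    transitory = var_eps * (
        pi[year] ** 2 + theta1 ** 2 * pi[year - 1] ** 2 + theta2 ** 2 * pi[year - 2] ** 2
    )
    return persistent, transitory
```

Under these assumptions the total variance is the sum of the two returned values, and movements in the loadings over time drive the trends in each part.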

The decomposition of inequality implied by our estimated baseline ECM is presented in figure 5. The top line, which shows the total cross-sectional variance implied by the estimated model for each calendar year, is essentially identical to the empirical cross-sectional variance of log (residual) male earnings in our data. That is, our estimated model matches the evolution of the cross-sectional variance over calendar time very closely. (37)

[FIGURE 5 OMITTED]

The persistent component of the variance in figure 5 displays a clearly increasing trend, rising from 0.38 squared log point in 1987 to around 0.47 squared log point in 2009. The transitory component of the variance, by contrast, fluctuates over the 23-year period but does not exhibit any trend. The last column of table 2 shows that there is no trend for the transitory variance: the estimated trend coefficient is 0.0001 (with a standard error of 0.0004), which would imply a negligible increase of 0.003 squared log point over 23 years. In other words, the entire increase in the total cross-sectional variance of (residual log) male earnings as determined by the nonstationary ECM is driven by an increase in the variance of the persistent component of earnings, confirming the results obtained previously with the simpler nonparametric methods.

V.E. Comparison with Simple Nonparametric Decompositions

Here we briefly discuss the relationship between the model-based decomposition just presented and the simple nonparametric decompositions shown previously, in the hope of clarifying some of the connections and the differences that exist across the methods. So far we have shown that the different methods yield essentially the same answer regarding the trends in inequality, namely, that the rising trend in male earnings inequality over our sample period has been entirely driven by the persistent component of earnings. However, the different decompositions presented above yield somewhat different relative shares of persistent and transitory inequality at a given point in time. Specifically, the KSS and GM methods attribute, on average, more than 80 percent of the total variance to the persistent component, whereas the ECM attributes slightly less than 70 percent.

This difference reflects the feature of the KSS and GM decompositions that transitory income is defined as deviations from multiyear averages of annual income, and therefore captures only purely transitory income (that is, income that has no serial correlation whatsoever). As a result, basically all the persistence in the income data is attributed to the persistent income component. This implies in turn that even shocks that dissipate in 1 to 2 years, and that would generally be viewed as transitory but are somewhat serially correlated, will tend to be attributed to the persistent income component. Consequently, the persistent component is assigned a larger role overall and accounts for a large fraction of total inequality at any given point in time. In the ECM, by contrast, transitory income is allowed to have some degree of serial correlation, so it captures some of the short-duration persistence in the data, and thus the transitory component is assigned a slightly larger share of the total variance. It is reassuring that despite some differences in the persistent and transitory shares of inequality, both approaches yield essentially the same answer for the trends in income inequality. (38)

VI. Household Income

We next examine the trend in the variance of the persistent and transitory components of pre-tax total household income. As noted in the introduction, examining household income is important because it is a broader measure of a household's resources and therefore has a more direct bearing on household consumption and welfare. In going from individual male earnings to total household income, a number of income components are added. These can be grouped into four main categories: spousal labor earnings, transfer income, investment income, and business income. Transfers are defined here as the sum of alimony received, pensions and annuities, unemployment compensation, Social Security benefits, and tax refunds. Investment income includes interest, dividends, and capital gains. Business income includes income from sole proprietorships, partnerships, and S corporations. (39)

As already mentioned in section II, we carry out the analysis of household income using two alternative samples. The first, our "male-headed households" sample, consists of households with a male primary or secondary filer aged 25 to 60 whose annual labor earnings are above the minimum threshold. Our second, broader sample of "all households" essentially adds single females to the previous sample. (40) As table 1 shows, for pre-tax household income the broader sample has about 133,000 more observations than the sample of male-headed households.

As described in section III, the analysis here is performed on residuals from a first-stage regression of log household income on the sex, age, and filing status of the primary filer, and on a full set of dummies for the number of children. (41)

VI.A. Volatility

Figure 6 plots the standard deviation of 1-year and 2-year percentage changes in total household income for our sample of all households over the sample period. (The corresponding figure for the sample of male-headed households is very similar and is not shown.) As the figure shows, household income volatility, as measured here, rose 9 percent for 1-year income changes and 11 percent for 2-year income changes over the sample period, and there appears to be a clear rising trend. In fact, fitting a linear time trend to each of these two series yields coefficients on the time trend of 0.0022 (0.0003) for 1-year changes, and 0.0020 (0.0003) for 2-year changes, each implying an increase of about 0.05, or more than 10 percent, over the 23-year period. Thus, in contrast to male earnings, household income volatility appears to have increased over the sample period, which suggests that the transitory component of the variance might have played a role in the increase in the cross-sectional inequality of household income.

[FIGURE 6 OMITTED]

VI.B. Simple Nonparametric Variance Decompositions

Figure 7 shows the decomposition of the cross-sectional variance of (residual) pre-tax household income on the sample of all households, using the KSS method. (The decomposition using the GM method is very similar and is therefore not shown.) The figure shows a clear increase in the persistent part of the variance over the period of about 22 percent. The first column in the bottom panel of table 5 fits a linear time trend to the persistent variance. The estimated trend coefficient of 0.0056 (0.0004) is strongly significant and implies an increase in the variance of 0.13 squared log point over 23 years, explaining nearly the entire increase in the total variance shown in the figure.

However, the transitory variance component in the figure has also increased over the period, by about 15 percent. (This is somewhat hard to see in the figure because of the low level of the transitory variance.) The fourth column in the bottom panel of table 5 shows an estimated linear time trend coefficient of 0.0008 (0.0001) for the transitory variance, which is statistically significant but implies an increase in the variance of only 0.02 squared log point over 23 years. In other words, although the transitory component of the variance did increase, that increase had little effect on the total variance because the KSS method attributes only a very small fraction of the total variance to the transitory component (13 percent, on average, in this decomposition). Thus, the increase in the total variance is again driven by the increase in the persistent component. However, under a decomposition that assigned a larger share of the total variance to the transitory component, the transitory variance would likely play a somewhat larger role.

[FIGURE 7 OMITTED]

VI.C. ECM-Based Variance Decomposition

We next examine the decomposition of the variance of pre-tax household income based on our nonstationary ECM. The second and third columns of table 4 present point estimates and standard errors for our baseline specification estimated on pre-tax household income, for both our sample of households with a male head (second column) and our broader sample of all households (third column). Figure 8 presents the corresponding variance decompositions. (42)

The figure shows a clear increasing trend in the persistent component of the variance, which appears to have been concentrated in the first half of the 23-year sample period. The transitory component, by contrast, appears to have been relatively flat, although it increased somewhat in the last few years of the sample (the early to mid-2000s). The third and sixth columns of table 5 fit a linear time trend to the two variance components from figure 8 and confirm the rising trend for the persistent component of pre-tax household income. In the third column of the bottom panel, which corresponds to the sample of all households, the estimated linear trend coefficient of 0.0048 (0.0005) is strongly statistically significant and implies an increase of 0.11 squared log point over 23 years, accounting for roughly 80 percent of the increase in the total variance seen in the bottom panel of figure 8. The estimates in the sixth column of the bottom panel show a small rising trend in the transitory component of the variance, which has an estimated trend coefficient of 0.0013 (0.0005), implying an increase of 0.03 squared log point over 23 years and accounting for the remaining 20 percent of the increase in the total variance.

[FIGURE 8 OMITTED]

These results suggest that an increase in the variance of the persistent component of income accounted for the bulk of the increase in the cross-sectional variance of total pre-tax household income. The transitory component also contributed to the increase, but only a relatively small fraction, the precise contribution depending somewhat on the decomposition method used, on model specification in the case of the model-based decompositions, and on other factors such as the sample used. We conclude that the increase in household income inequality was mostly persistent. (43)

VI.D. The Increase in the Transitory Variance of Household Income

We have shown that the increase in the total variance of household income was mostly persistent, but that unlike with male earnings, the transitory variance appears to have played some role. Here we explore which source or category of household income might account for the increase in the transitory variance of total household income. As previously discussed, household income can be decomposed into male labor earnings, spousal labor earnings, transfer income, investment income, and business income. In this section we take male earnings and then sequentially (and cumulatively) add each of spousal earnings, transfer income, investment income, and business income. For each of the resulting income aggregates, we estimate our ECM and decompose the cross-sectional variance into persistent and transitory parts. (44) We then fit a linear time trend to the transitory variance component and estimate the increase in the transitory variance over 1987-2009 that is implied by the estimated time trend. Here we report results from decompositions based on our baseline ECM and our male-headed households sample, but the other methods lead to similar conclusions. (45) Starting with male earnings and moving along the series of increasingly broad income aggregates, the implied increases in the transitory variance over 1987-2009 (in squared log points) are 0.003, 0.015, 0.016, 0.035, and 0.038, respectively. That is, the addition of spousal labor earnings and of investment income leads to a larger change in the implied increase in the transitory variance component over the sample period. We conclude that both spousal labor earnings and investment income contributed to the (relatively small) increase in the transitory variance of total household income.
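The comparison across income aggregates amounts to taking successive differences of the reported figures, as the short sketch below illustrates. The labels and the function name are hypothetical; the numbers are those quoted in the text, and the differences are only a rough guide to each category's role because the aggregates are built cumulatively in a particular order.

```python
# Implied 1987-2009 increases in the transitory variance (squared log points),
# as reported in the text for each cumulative income aggregate.
aggregates = [
    ("male earnings", 0.003),
    ("+ spousal earnings", 0.015),
    ("+ transfer income", 0.016),
    ("+ investment income", 0.035),
    ("+ business income", 0.038),
]

def incremental_changes(series):
    """Change in the implied increase when each income category is added."""
    return [
        (label, round(value - series[i - 1][1], 3))
        for i, (label, value) in enumerate(series)
        if i > 0
    ]

increments = incremental_changes(aggregates)
# [('+ spousal earnings', 0.012), ('+ transfer income', 0.001),
#  ('+ investment income', 0.019), ('+ business income', 0.003)]
```

The two largest steps occur when spousal earnings and investment income are added, in line with the conclusion stated above.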

**********

One important aspect of our contribution is the use of a new and superior data source to shed new light on the decomposition of inequality and of changes in inequality into persistent and transitory components. We use a new, large, and confidential panel of tax returns from the Internal Revenue Service (IRS) to study the persistent-versus-transitory nature of rising inequality in individual male labor earnings and in total household income, both before and after taxes, in the United States over the period 1987-2009. (2) Our panel constitutes a 1-in-5,000 random sample of the population of U.S. taxpayers. It contains individual-level labor earnings information from W-2 forms as well as household-level income information from Form 1040. It also includes information on the age and sex of the primary and secondary tax filers from matched Social Security Administration (SSA) records. Our broadest sample consists of roughly 350,000 observations on 35,000 households and is therefore substantially larger than the publicly available, survey-based panels typically used to address related questions in the literature. In addition, our data are not subject to top-coding and are less likely than the survey data to be affected by measurement error.

We analyze the persistent-versus-transitory nature of rising inequality by decomposing income into persistent and transitory parts and examining how much each of these parts contributed to the increase in the cross-sectional variance of income (our measure of income inequality; see footnote 1) over our sample period. In reality, incomes are subject to many different types of shocks. Some of these might be truly persistent (or even permanent), and some entirely transitory, but many are likely to exhibit some degree of persistence (that is, serial correlation) in between the two extremes. As a result, decomposing income into persistent and transitory components requires taking a stand on what degree of serial correlation in income shocks will be considered "persistent" and what degree will be considered "transitory." This choice necessarily involves some arbitrariness.

Our analysis uses two sets of methods, each of which takes a somewhat different approach to separating income into persistent and transitory parts. First, we employ simple nonparametric decomposition methods that essentially separate income into a highly transitory piece that exhibits no serial correlation and one other piece, which we call "persistent." These methods then ask how much of the rise in the variance of income is coming from changes in the variance of the transitory piece and how much from changes in the variance of the persistent piece. Second, we employ rich non-stationary error components models of income dynamics. (3) These models fully specify the process that generates income over time and essentially decompose income into a highly persistent piece and another, transitory piece that allows for some (limited) degree of serial correlation. Here, too, we then ask how much of the rise in the variance of income is coming from changes in the variance of the persistent piece and of the transitory piece.

The two approaches can give somewhat different answers about the shares of income inequality at any given point in time that are attributed to the persistent and to the transitory income components. The more serial correlation that is allowed in the transitory income component, the larger the share of inequality at a given point in time that will be attributed to that component (because some of the short-duration persistence in the income data will be attributed to the transitory piece). The simple nonparametric methods, which use a stricter definition of transitory income, attribute the vast majority of the variance to the broadly defined persistent income component. Our error components models, which, as noted, allow for some serial correlation in transitory income, assign a somewhat larger fraction of total inequality to this more broadly defined transitory income.

However, and most important, both approaches yield very similar results for our main object of interest: the increase in income inequality and its components over time. For male labor earnings, both approaches imply that the entire increase in cross-sectional inequality over the 1987-2009 period was driven by an increase in the variance of the persistent component of earnings. Specifically, we find that the variance of the persistent component of log male labor earnings increased over this period but the variance of the transitory component did not.

For total household income--which in addition to male labor earnings includes spousal labor earnings, transfer income, investment income, and business income--both approaches imply that the increase in inequality over our sample period was mostly (although not entirely) persistent. For this broader category of income, the variance of both the persistent and the transitory components of income increased, but the persistent component contributed the bulk of the increase in the total variance. Furthermore, the increase in the variance of the transitory component of total household income reflects increases in the transitory variance of spousal labor earnings and of investment income.

Next, we use our data from tax returns to examine the role of the federal tax system in the observed trend in income inequality. In particular, we investigate whether the increase in inequality for after-tax household income differs materially from that for pre-tax income. Our measure of after-tax household income accounts for all federal personal income taxes (obtained from Form 1040), including all refundable tax credits, as well as payroll taxes (calculated using information from W-2 forms). We find that the cross-sectional variance of after-tax income is on average 0.10 squared log point, or roughly 15 percent, smaller than the variance of pre-tax income, reflecting the overall progressivity of the federal tax system. In terms of the trend, we find that the tax system helped mitigate somewhat the increase in household income inequality over the sample period, but this attenuating effect was insufficient to significantly alter the broad trend toward rising inequality.

Finally, we note that our paper is the first to estimate error components models of income dynamics using U.S. administrative data, and that the quality and significant size of our data set allow us to obtain very precise estimates of our models. Our paper is also among the first to apply non-stationary models to household-level income, which is arguably a more relevant income measure than individual earnings for questions regarding consumption and welfare. Additionally, our comparison of decompositions using different approaches should help clarify the connections as well as the differences that exist across the different methods.

The rest of the paper is organized as follows. Section I discusses the related literature and places our results in the context of existing studies. Section II describes our data set, our sample selection, and the trends in income inequality in our data. Section III outlines our methodological approach. Section IV introduces the simpler nonparametric methods and presents results for male earnings using those methods. Section V introduces our error components models, discusses their estimation, presents model estimates for male labor earnings, and uses the estimated model to decompose the cross-sectional variance of male earnings into persistent and transitory parts. Section VI presents results using our various methods for pre-tax total household income. Section VII investigates the role of the federal tax system in the increase in income inequality. Section VIII concludes.

I. Related Literature

An extensive literature has documented a large increase in labor earnings inequality in the United States in recent decades. (4) A small branch of this literature has attempted to determine whether this documented increase in cross-sectional earnings inequality reflects an increase in persistent or in transitory inequality, as these are defined in footnote 1. The earlier studies, including Peter Gottschalk and Robert Moffitt (1994), Moffitt and Gottschalk (1995), and Steven Haider (2001), all use data from the Panel Study of Income Dynamics (PSID) and generally conclude that a substantial part (as much as one half) of the increase in cross-sectional earnings inequality in the 1970s and early 1980s was transitory. (5)

Very few studies have analyzed the last two decades, although earnings inequality has continued to increase. Furthermore, the results across the more recent studies are not conclusive. For example, using the PSID, Moffitt and Gottschalk (2011) find that the transitory variance has not increased since the mid- to late 1980s, whereas Jonathan Heathcote, Fabrizio Perri, and Gianluca Violante (2010) conclude that the transitory variance rose substantially in the 1990s. (6) Wojciech Kopczuk, Emmanuel Saez, and Jae Song (2010), using Social Security earnings data, find that the increase in inequality from the 1970s to the early 2000s was entirely driven by the persistent component of earnings. However, they use only a simple nonparametric decomposition method, and their findings contradict the more established results of the earlier literature for the 1970s and early 1980s, raising some doubts about the factors driving their results for the more recent period as well. In this paper, our data clearly show that the increase in male earnings inequality since the mid- to late 1980s has been entirely driven by the persistent component of earnings. We confirm this finding with a variety of methods, obtaining very robust results. (7)

Inequality in total household income has also increased in recent decades, as documented by, among others, Dirk Krueger and Perri (2006) and Heathcote, Perri, and Violante (2010). Studies that have in some way attempted to decompose the increase in household income inequality into persistent and transitory parts include Gottschalk and Moffitt (2009), Giorgio Primiceri and Thijs van Rens (2009), and Richard Blundell, Luigi Pistaferri, and Ian Preston (2008). Gottschalk and Moffitt (2009) use a simple nonparametric method and provide only suggestive evidence of an increase in the transitory variance starting in the mid-1980s, without conducting a full analysis. By contrast, Primiceri and van Rens (2009), using repeated cross sections on income and consumption from the Consumer Expenditure Survey (CE), find that all of the increase in household income inequality in the 1980s and 1990s reflects an increase in the persistent (or permanent) component of the variance. Our results indicate that, for the increase in the cross-sectional variance of household income, the transitory variance does play some role, although not as prominent a role as Gottschalk and Moffitt (2009) seem to suggest. (8) Furthermore, we show that the (relatively small) increase in the transitory variance of household income reflects increases in the transitory variance of spousal labor earnings and of investment income.

Our paper is also related to a recent literature that has analyzed the trends in the dispersion of short-term income changes, or income volatility, where volatility is defined as the standard deviation of percentage changes in male earnings over, say, 1 year. The findings in this literature have been more consistent across different studies. For instance, Congressional Budget Office (2008), John Sabelhaus and Song (2009, 2010), Sule Celik and coauthors (2012), and Donggyun Shin and Gary Solon (2011) all find that the volatility of male earnings did not increase between the 1980s and the early 2000s. (9) Our male labor earnings data are consistent with the findings in this literature, as we document no increase in male earnings volatility. However, we do find an increase in the volatility of total household income.

Finally, our study also relates to a literature that examines changes in the distribution of household consumption expenditure in the United States. Economic theory predicts that increases in the dispersion of the persistent components of income are likely to lead to increases in the dispersion of consumption. A few studies have examined whether the well-documented increase in U.S. income inequality has indeed been accompanied by an increase in consumption inequality of similar magnitude. Some of the earlier studies in this literature, including Daniel Slesnick (2001), Krueger and Perri (2006), Heathcote, Perri, and Violante (2010), and perhaps to a lesser extent Orazio Attanasio, Eric Battistin, and Hide Ichimura (2007) and Attanasio, Battistin, and Mario Padula (2011), find that consumption inequality increased by only a fraction of the increase in income inequality. However, these studies relied on data from the CE, and it has been increasingly recognized in the literature that these data are subject to potentially severe measurement error problems. More recent studies, such as Mark Aguiar and Mark Bils (2012) and Attanasio, Erik Hurst, and Pistaferri (2012), attempt to control for these measurement problems and conclude that consumption inequality has increased by a similar magnitude as income inequality. Thus, the implication of our results, namely a significant increase in consumption inequality, appears to be borne out by the most recent evidence based on consumption data.

II. Data

This section describes our panel of income data from tax returns, the main variables we use, our sample selection, and the trends in income inequality observed in our data over the period 1987-2009.

II. A. Panel

We use a 23-year panel of income data from tax returns spanning the period 1987-2009. Our sample is a 1-in-5,000 random sample of the U.S. tax-filing population (with two exceptions noted below), (10) and inclusion of tax units in the sample is based on the last four digits of the Social Security number (SSN) of the primary tax filer. (11) The sample is kept representative of the tax-filing population by adding, each year, any new tax units that join the population of filers (for example, immigrants and young people entering the work force) and have an SSN with one of the sampled four-digit endings. Our panel is not subject to the usual attrition or nonresponse problems present in most survey-based panels. Tax units might leave the sample because of death, emigration, or income falling below the tax filing threshold, but these exits do not affect the representativeness of the sample. Additionally, the age distribution of our sample is representative, each year, of the age distribution in the population of tax filers in that year.

To create our 23-year panel, we started with tax returns from an existing panel, known as the 1987-96 Family Panel, constructed by the Statistics of Income (SOI) division of the IRS. We then extended this panel using returns contained in cross-sectional files from 1997 to 2009. From this extended sample we then selected those returns for which the primary filer had an SSN ending in one of two four-digit combinations. The resulting panel (again, with two exceptions noted below) is essentially a 1-in-5,000 random sample of tax units in each year of the period 1987-2009. Each of the original data sources is next described in turn.
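The 1-in-5,000 sampling rate follows directly from selecting on two of the 10,000 possible four-digit SSN endings. A minimal sketch of that selection rule is given below, assuming endings are spread evenly across filers; the two endings shown are made-up placeholders, since the actual combinations are not given in the text.

```python
from fractions import Fraction

POSSIBLE_ENDINGS = 10_000            # four-digit endings 0000 through 9999
SAMPLED_ENDINGS = {"1234", "5678"}   # hypothetical placeholders, not the real endings


def in_sample(primary_ssn: str) -> bool:
    """Return True if the primary filer's SSN ends in a sampled four-digit combination."""
    digits = primary_ssn.replace("-", "")
    return digits[-4:] in SAMPLED_ENDINGS


# Two endings out of 10,000 give a sampling rate of 1 in 5,000.
sampling_rate = Fraction(len(SAMPLED_ENDINGS), POSSIBLE_ENDINGS)
print(sampling_rate)
```

Because the rule depends only on the SSN, the same filer is selected in every year in which he or she files, which is what allows the yearly cross sections to be linked into a panel.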

The 1987-96 SOI panel started with a stratified random sample of taxpayers who filed in 1987, a subset of which was chosen based on the primary filer's SSN ending in one of two four-digit combinations. (12) All individuals represented on the tax return of a member of this cross section, including secondary taxpayers on joint returns and dependents, were considered to be members of the panel. Over the following 9 years, the SOI division included in the panel all returns that reported any panel member as a primary or secondary taxpayer, including returns filed by panel members who were dependents of another taxpayer. To keep the sample representative of the tax-filing population in subsequent years, returns from tax years 1988 through 1996 were added to the panel if the primary filer had an SSN ending in one of the two original four-digit combinations but did not file a return in 1987. In addition to information from each taxpayer's Form 1040, the data set includes information on the age and sex of the primary and secondary filers from matched SSA records, and information on wages and contributions to employer-based retirement plans from W-2 forms.

The 1997-2009 data come from yearly cross sections, also collected by the SOI division. As with the 1987 sample described above, a stratified random sample was collected in each of these years, consisting partly of a strictly random sample based on the last four digits of the primary filer's SSN. In each year the set of SSNs used for sampling included the original two four-digit endings from 1987, making it possible to extend the earlier panel using returns collected from the yearly cross sections. Each cross section contains information from the taxpayer's Form 1040 and from a number of other forms and schedules. Into these data we merged information on the age and sex of the primary and secondary filers from SSA records, and information on wages and contributions to employer-based retirement plans from W-2 forms.

We note, however, that there was a change in the sampling frame of our data in 1996. As a result of this change, we are missing two groups of filers in the pre-1996 period: dependent filers in 1987 over the period 1987-96, and nondependent primary filers in 1988-96 who were either dependent or secondary filers in 1987. These two groups primarily consist of young (in the case of dependents) or female (in the case of secondary) taxpayers. The effect of missing these returns is therefore likely to be very small when we examine the labor income of males in their earning years, although it may be larger when we examine household income.

II. B. Variable Description

The ideal measure of individual-level earnings for this study would be gross labor income before any amounts are deducted for health insurance premiums or retirement account contributions. However, our data do not contain such a variable, and hence we use a measure of labor income that is as close to gross labor income as is possible when using tax data. For this we start with taxable wages, as reported in the "Wages, tips, other compensation" box of taxpayers' W-2 forms, and add the contributions to retirement savings accounts reported on the W-2 forms. This measure of labor income will include all income that a taxpayer's employer has reported to the IRS, namely, wages, salaries, and tips, as well as the portion of these that is placed in a retirement account. Since our data do not include information on the health insurance premiums paid by the taxpayer and excluded from taxable wages, our measure of labor income will exclude those amounts. Our measure also excludes any income earned from self-employment.

For pre-tax total household income, we start with "total income" as reported on Form 1040. This variable includes wages and salaries; dividends; alimony; business income (from sole proprietorships, partnerships, or S corporations); income from rental real estate, royalties, and trusts; unemployment compensation; capital gains; and taxable amounts of interest, IRA distributions, pensions, and Social Security benefits. To this we add back nontaxable interest, IRA distributions, pensions, and Social Security benefits reported on Form 1040.

There is some debate as to whether capital gains should be included in the measure of household income. Capital gains realized and reported in a particular year may include gains that accrued in past years. Hence, including capital gains may make household income appear "lumpier" than it actually is, since income will be higher in years when gains from earlier years are realized, and lower in years when gains accrued but were not realized. However, excluding capital gains will result in the measure of household income being too low for any taxpayer who had gains in that year (whether or not they were realized), and this downward bias will be quite large for taxpayers whose primary source of income is from investments. On balance, we feel that the latter concern is more important, and therefore we include capital gains in our benchmark measure of household income. However, we have verified that our results are robust to the exclusion of capital gains.

For after-tax household income, we start with the measure of pre-tax household income described above. We then subtract the amount of "total tax" reported on Form 1040. This amount captures total income taxes (including self-employment taxes) after nonrefundable tax credits are taken into account. Next, we subtract the total amount of payroll (FICA) taxes owed on the earned income of the couple. This is done to ensure that all federal taxes (including income and payroll taxes) are included for all taxpayers, regardless of whether they are wage and salary workers or self-employed. Finally, we add refundable tax credits (including the earned income tax credit and the refundable portion of the child tax credit) to arrive at our measure of after-tax household income.
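The construction just described amounts to a single accounting identity, sketched below. The function and argument names are illustrative assumptions, and the payroll tax amount is taken as an input rather than computed, since the calculation from W-2 information is not spelled out here.

```python
def after_tax_income(pre_tax_income, total_tax, payroll_tax, refundable_credits):
    """After-tax household income following the steps in the text.

    total_tax          -- "total tax" from Form 1040 (after nonrefundable credits)
    payroll_tax        -- FICA taxes owed on the couple's earned income
    refundable_credits -- e.g. the EITC and the refundable part of the child tax credit
    """
    return pre_tax_income - total_tax - payroll_tax + refundable_credits


# Hypothetical household, for illustration only (not taken from the data).
example = after_tax_income(50_000, 4_000, 3_500, 1_000)
print(example)
```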

As is usually the case with administrative data, our data contain relatively few sociodemographic variables. Most important, although we have information on the age and sex of the primary and secondary filers, we do not have information on the education or race of either. We also lack information on hours of work, and hence our analysis will focus on annual earnings as opposed to hourly wage rates.

II. C. Sample Selection

For the case of individual earnings, we restrict our sample to males (whether they appear as the primary or the secondary filer in the tax form), as is standard in the literature, because the movements of females into and out of the labor force introduce discontinuities in the earnings process that are difficult for the statistical models of income to handle. For household income we carry out our analysis using two alternative samples. The first includes only households with a male primary or secondary filer and is thus similar to the sample we use to study male earnings. This avoids confounding the effects of moving to a broader measure of income (total household income) with the effects of moving to a broader sample of households. In addition, this sample is less likely to be affected by the change in sampling frame discussed in section II. A. In a slight abuse of terminology, we refer to this sample as our "male-headed households" sample. The second sample adds to this sample all other tax-filing households (that is, those without a male primary or secondary filer), a group that consists largely of single females. We are also interested in this broader sample because it is representative of the population of U.S. taxpayers.

For both male earnings and household income, we restrict our sample to individuals aged 25 to 60. We impose this restriction because individuals in this age group are likely to have completed most of their formal schooling and are sufficiently young not to be too strongly affected by early retirement. We also exclude earnings (or income) observations below a minimum threshold. For male earnings, since tax records do not provide information on employment status or hours of work, we can exclude individuals with presumably weak labor force attachment only by dropping low-earnings observations. For household income, we cannot simply exploit the fact that households with sufficiently low income are not required to file taxes, because many actually do so to claim refundable tax credits such as the earned income tax credit. Therefore, in order to treat low-income observations consistently, we exclude observations with reported household income below a minimum threshold. (13) We take the relevant threshold to be one-fourth of a full-year, full-time minimum wage. (14)
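These selection rules can be summarized as a simple filter. The sketch below is illustrative only: it assumes that a full-year, full-time schedule means 2,000 hours and leaves the hourly minimum wage as a parameter, because the exact values used are given in footnotes not reproduced here. The $7.25 figure in the example is the federal minimum wage in effect from July 2009 and serves only as an illustration.

```python
def keep_observation(age, income, hourly_min_wage, full_time_hours=2000):
    """Apply the age and minimum-income restrictions described in the text.

    full_time_hours=2000 is an assumption of this sketch, not a figure from the paper.
    """
    threshold = 0.25 * hourly_min_wage * full_time_hours  # one-fourth of a full-time minimum wage
    return 25 <= age <= 60 and income >= threshold


# Illustrative call: with a $7.25 hourly minimum wage the threshold is $3,625.
print(keep_observation(age=30, income=5_000, hourly_min_wage=7.25))
```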

After imposing the restrictions above, we end up with a male earnings sample of 221,099 person-year observations on 20,859 individuals. For household income, our broader sample, which includes households without a male primary or secondary filer, contains 353,975 person-year observations on 33,730 households. We refer to this sample as our "all households" sample. Table 1 reports the number of observations and the mean and the standard deviation of the relevant income measure for our male earnings sample and for each of our household income samples.

II. D. Income Inequality Trends, 1987-2009

We begin by documenting the trends in inequality for male earnings and for household income, the latter before and after taxes, in our panel of tax returns. The top panel of figure 1 shows the cross-sectional variance of (the logs of) male earnings, pre-tax household income, and after-tax household income annually over 1987-2009, and the bottom panel the Gini coefficient for the same three measures of income. The figures show an increase in both measures of inequality for all three measures of income over the period. For example, the cross-sectional variance increases by 0.14 squared log point for male earnings (from 0.61 in 1987 to 0.75 in 2009), by 0.19 squared log point for pre-tax household income, and by 0.12 squared log point for after-tax household income. (15) In general, inequality in individual earnings is lower than inequality in household income. Furthermore, inequality in after-tax household income is lower than inequality in pre-tax household income, reflecting the progressivity of the federal tax system.
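For readers who want to reproduce the two inequality measures on their own data, a minimal sketch follows. It uses the population variance of log income and the standard sorted-rank formula for the Gini coefficient; the function names are illustrative, and details such as weighting are not addressed.

```python
import math
import statistics


def variance_of_logs(incomes):
    """Cross-sectional variance of log income (incomes must be positive)."""
    return statistics.pvariance([math.log(x) for x in incomes])


def gini(incomes):
    """Gini coefficient from the sorted-rank formula (non-negative incomes, positive total)."""
    xs = sorted(incomes)
    n = len(xs)
    total = sum(xs)
    weighted = sum(rank * x for rank, x in enumerate(xs, start=1))
    return 2 * weighted / (n * total) - (n + 1) / n


# Change in the variance of log male earnings reported in the text (0.61 to 0.75).
male_change = round(0.75 - 0.61, 2)
print(male_change)
```

The variance of logs is scale invariant, so a proportional rise in all incomes leaves it unchanged; only changes in relative incomes move it.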

[FIGURE 1 OMITTED]

These inequality trends in our data are consistent with trends that have been documented in many other U.S. studies using different data sets. In the remainder of the paper, we focus on the cross-sectional variance of (the logs of) earnings and household income as our measure of inequality, because of its tractability for statistical decompositions, and we investigate to what extent the increase in the variance shown here represents an increase in the variance of the persistent or in the transitory component of income.

III. Methodological Approach

As discussed in the introduction, given that the degree of persistence (or serial correlation) of income shocks lies in a range between the two theoretical extremes, the choice of the dividing line between what degree of serial correlation will be considered "persistent" and what degree "transitory" is necessarily somewhat arbitrary. In our analysis we use two sets of methods, each of which takes a somewhat different approach to separating income into persistent and transitory parts.

First, in section IV we employ simple nonparametric decomposition methods that essentially decompose income into a highly transitory piece that exhibits no serial correlation and one other piece, which we call "persistent." These methods then ask, for each of these two pieces, how much of the rise in the variance of income is coming from changes in the variance of that piece. Second, in section V we employ nonstationary error components models of income dynamics. These models fully specify the process that generates income over time and essentially decompose income into a highly persistent piece and another, transitory piece that allows for some limited degree of serial correlation. Here, too, we then ask how much of the rise in the variance of income is coming from changes in the variances of the persistent and of the transitory piece. Note that neither approach is right or wrong: each is interesting in its own right. And as we show, both yield very similar qualitative results for the trends in inequality and its components.

Before turning to the specific methods and results, we note that throughout the paper we work with measures of income from which we have removed the predictable life-cycle variation in income, that is, the variation that can be explained by differences in age across individuals. For male earnings we work with residuals from least squares regressions (run separately for each calendar year) of log earnings against a full set of age dummy variables. For the two measures of household income, in addition to the age-related variation, we remove the income variation that is due to differences in household size and composition. We work with residuals from regressions (run separately for each calendar year) of log household income on a full set of age dummies for the primary tax filer, indicators of sex and marital status for the primary filer, and a full set of dummies for the number of children (up to 10) in the household. We have verified, however, that working directly with the raw measures of male earnings and total household income, rather than with these residuals, leads to qualitatively similar results. (16)
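For male earnings, the residualization step has a simple equivalent form: a regression of log earnings on a full set of age dummies, run separately by year, produces residuals equal to deviations from the mean log earnings of each year-by-age cell. The sketch below uses that equivalence; the record layout and the function name are assumptions made for illustration. It does not cover the household income case, where the additional sex, marital status, and children indicators enter additively and cell means no longer suffice.

```python
import math
from collections import defaultdict


def residualize(records):
    """Residual log earnings after removing year-specific age effects.

    records: list of dicts with keys 'year', 'age', 'earnings' (earnings > 0).
    Returns one residual per record, in the same order.
    """
    cells = defaultdict(list)
    for r in records:
        cells[(r["year"], r["age"])].append(math.log(r["earnings"]))
    # Fitted value from a saturated age-dummy regression is the cell mean.
    cell_mean = {key: sum(vals) / len(vals) for key, vals in cells.items()}
    return [math.log(r["earnings"]) - cell_mean[(r["year"], r["age"])] for r in records]


# Hypothetical records, for illustration only.
sample = [
    {"year": 1990, "age": 30, "earnings": 100.0},
    {"year": 1990, "age": 30, "earnings": 400.0},
    {"year": 1990, "age": 40, "earnings": 250.0},
]
residuals = residualize(sample)
print(residuals)
```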

IV. Simple Nonparametric Methods

We begin our analysis using simple nonparametric methods. In this section we introduce the methods and present the corresponding decompositions for male labor earnings. The methods used in this section are largely descriptive and do not explicitly rely on any model of the income process. In section V we turn to our analysis using error components models and again present the resulting decompositions for male labor earnings. Results of both approaches for total household income are presented in section VI.

IV. A. Volatility

We start with a simple, purely descriptive measure of the dispersion in the cross-sectional distribution of income changes that occur over short horizons, namely, the standard deviation of percentage changes in (residual) male earnings. Following Shin and Solon (2011), we refer to this measure as the "volatility" of earnings. This measure is closely related, although not equivalent, to the variance of the transitory component of income that we will discuss in the following sections. (17) Figure 2 plots over the sample period the standard deviations of both 1-year and 2-year percentage changes in residual male earnings. The figure shows no clear increasing or decreasing trend in either series. Although volatility increased in the last 3 years of our sample, there is no indication that this represents the beginning of a rising trend. In fact, regressing each of the two volatility series shown on a constant and a linear time trend yields an estimated coefficient on the latter that is essentially zero. (18) There is thus no evidence in our data of a trend in male earnings volatility for our sample period.
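A minimal sketch of this volatility measure is given below. As an assumption of the sketch, percentage changes are approximated by changes in residual log earnings, and the panel is represented as a nested dictionary; neither choice is taken from the paper.

```python
import statistics
from collections import defaultdict


def volatility_by_year(panel, horizon=1):
    """Standard deviation, by year, of `horizon`-year changes in residual log earnings.

    panel: dict mapping person id -> dict mapping year -> residual log earnings.
    Years with fewer than two observed changes are omitted.
    """
    changes = defaultdict(list)
    for series in panel.values():
        for year, value in series.items():
            if year - horizon in series:
                changes[year].append(value - series[year - horizon])
    return {
        year: statistics.pstdev(vals)
        for year, vals in sorted(changes.items())
        if len(vals) > 1
    }


# Hypothetical two-person panel, for illustration only.
toy_panel = {
    1: {2000: 0.0, 2001: 0.1, 2002: 0.1},
    2: {2000: 0.0, 2001: -0.1, 2002: -0.1},
}
print(volatility_by_year(toy_panel, horizon=1))
```

Setting `horizon=2` gives the 2-year series; a person contributes to a given year only if earnings are observed in both that year and the earlier one.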

[FIGURE 2 OMITTED]

IV.B. Simple Nonparametric Decomposition Methods

We next consider two simple nonparametric methods that decompose the cross-sectional variance of income (our measure of income inequality) into persistent and transitory parts. The methods in this section essentially define the persistent component of income as the average of annual income over a certain number of years, and transitory income as the deviations of annual income from that average.

The first method, which is used in Kopczuk, Saez, and Song (2010, hereafter KSS), defines person i's persistent income component in year t as the average of person i's annual log income (or residual log income) over a P-year period centered around t. Transitory income for person i in year t is then defined as the difference between person i's current annual income at t and his or her persistent income in the same year. The persistent and transitory components of the variance are next calculated as the variances, across individuals, of persistent and transitory income, respectively.

For our decomposition of the cross-sectional variance of (residual) male earnings into persistent and transitory parts using the KSS method, we set parameter P = 5, the same value used by KSS. (19) Whereas they use raw (as opposed to residual) log earnings and restrict observations to individuals who are present in the sample for all 5 years, we use residual log earnings and do not require individuals to be present in the sample in all 5 years. However, the results are not materially different when we follow their treatment and restrictions.
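A minimal sketch of a KSS-style decomposition, assuming a hypothetical panel stored as nested dictionaries, might look as follows. Persistent income is the average over the P-year window centered on the year of interest, computed over whichever years are observed (consistent with not requiring presence in all years). The toy example uses P = 3 only to keep the data small.

```python
from statistics import fmean, pvariance


def kss_decomposition(panel, year, P=5):
    """Persistent and transitory variance in `year`, KSS-style.

    panel: dict person_id -> dict year -> residual log earnings (hypothetical).
    Persistent income: mean over the P-year window centered on `year`,
    using the years actually observed for that person.
    Transitory income: current-year value minus persistent income.
    """
    half = P // 2
    persistent, transitory = [], []
    for obs in panel.values():
        if year not in obs:
            continue
        window = [obs[t] for t in range(year - half, year + half + 1) if t in obs]
        avg = fmean(window)
        persistent.append(avg)
        transitory.append(obs[year] - avg)
    return pvariance(persistent), pvariance(transitory)


# Toy panel, invented for illustration only.
toy_panel = {
    1: {1987: 0.0, 1988: 0.3, 1989: 0.0},
    2: {1987: 1.0, 1988: 1.0, 1989: 1.0},
}
var_persistent, var_transitory = kss_decomposition(toy_panel, 1988, P=3)
```

Repeating the call for each calendar year gives the two variance series to which a linear trend can then be fitted.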

The top panel of figure 3 presents the results of this decomposition, showing that the persistent component of the variance in male earnings increased over our sample period but the transitory component did not. Hence the increase in the total cross-sectional variance was entirely driven by the persistent component. Table 2 formalizes this result, reporting estimates from a regression that fits a linear time trend, separately, to the persistent variance series and to the transitory variance series.

The first column in each of the two panels of table 2 corresponds to the KSS decomposition from figure 3. The dependent variable is either the persistent (left panel) or the transitory (right panel) variance component, and the explanatory variables are a constant (not shown) and a linear time trend. The table shows a statistically significant rising linear trend in the persistent variance: the estimated linear trend coefficient is 0.0037 with a standard error of 0.0002, implying an increase of 0.09 squared log point over 23 years. There is no trend in the transitory variance component (the estimated trend coefficient is 0.0000). That is, the entire increase in the total cross-sectional variance of (residual log) male earnings was driven by an increase in the variance of the persistent component of earnings, and thus reflects an increase in persistent inequality.

[FIGURE 3 OMITTED]

The second nonparametric decomposition method that we consider was introduced by Gottschalk and Moffitt (1994, hereafter GM). The GM method is similar, although not identical, to the KSS method, and we consider it separately because it relies (indirectly) on a simple model of income, which might provide a slightly more direct way of relating it to our error components models. The method is based on the simple specification of (residual) log earnings [[xi].sub.it] = [[alpha].sub.i] + [[epsilon].sub.it], where [[alpha].sub.i] is purely permanent (time-invariant) and [[epsilon].sub.it] is purely transitory (i.i.d.). For a P-year window centered around each year t, the method uses the standard formulas implied by this simple "random effects model" to compute the persistent variance of [[xi].sub.it] as the variance of the [[alpha].sub.i] component, and the transitory variance of [[xi].sub.it] as the variance of the [[epsilon].sub.it] component. (20) To obtain a series of persistent and transitory variance estimates over time, this procedure is repeated for consecutive, overlapping P-year moving windows. (21)
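For a single window, the random effects calculation can be sketched as below. The formulas are the textbook ones for a balanced panel under the model in which residual log earnings equal a permanent person effect plus an i.i.d. transitory term; they are an assumption here and may differ in detail from the exact expressions the paper uses. The toy window is invented.

```python
from statistics import fmean, variance


def gm_decomposition(window):
    """Permanent and transitory variance for one balanced P-year window.

    window: list of per-person lists, each holding the same number T of
    residual log earnings values (hypothetical layout).
    """
    T = len(window[0])
    person_means = [fmean(x) for x in window]
    # Transitory variance: average within-person sample variance.
    var_transitory = fmean([variance(x) for x in window])
    # Permanent variance: between-person variance of the person means,
    # net of the part that is due to averaging T transitory draws.
    var_permanent = variance(person_means) - var_transitory / T
    return var_permanent, var_transitory


# Toy window with T = 3, invented for illustration only.
toy_window = [
    [1.0, 1.2, 0.8],
    [0.0, 0.1, -0.1],
    [2.0, 2.3, 1.7],
]
var_permanent, var_transitory = gm_decomposition(toy_window)
```

Sliding the window forward one year at a time and repeating the calculation yields the two series plotted for the GM method.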

The bottom panel of figure 3 presents the GM inequality decomposition. As with the KSS method, this decomposition implies that the persistent variance component increased over the sample period but the transitory component did not. This is confirmed in the second column in each panel in table 2. Here, too, the coefficient on the linear time trend is large and significant for the persistent variance component and is essentially zero for the transitory component. Both trend coefficients are quite precisely estimated. Thus, once again, the increase in the total cross-sectional variance was entirely driven by the increase in the variance of persistent earnings, constituting an increase in persistent inequality.

Note as well that both the KSS method and the GM method attribute a large fraction of the total variance (more than 80 percent on average across all years) to the persistent component. We will come back to this point below.

V. Error Components Models

In this section we turn to error components models (ECMs) of income dynamics to examine the role of persistent and transitory income components in determining the trend in inequality. These ECMs are statistical models (stochastic processes) that approximate the dynamic properties and the trajectory of income over time. Like the simpler nonparametric decomposition methods presented in section IV, ECMs typically specify income as consisting of a persistent component and a transitory component, and they can be used to decompose the variance of (log) income into persistent and transitory parts.

For example, the persistent component of income in the model will tend to capture differences in incomes across individuals that are due to differences in permanent characteristics such as education and unobserved ability. It will also capture income changes that have lasting effects on the path of the income process, such as the onset of a chronic illness or the permanent loss of a high-paying job. The transitory component will tend to capture changes in income that are less persistent but may have some serial correlation, such as a temporary illness or transitory unemployment. The model then essentially attributes variation in income to the persistent or the transitory component according to the strength in the correlations between individuals' current and future income in the data, and to how this strength changes as the periods move further apart. Statistically, the separate identification of the persistent and transitory components relies on the simple idea that the contribution of the transitory component to the autocovariance of income between two periods vanishes as the periods get further apart.

[FIGURE 4 OMITTED]

Flexible specifications of the income process, such as the ones we consider in this paper, can match the entire autocovariance structure of income in the data, as well as its changes over the life cycle and over calendar time. To illustrate, figure 4 shows two particular aspects of the autocovariance structure of male labor earnings in our data. Here we focus on the series labeled "empirical" in each of the two panels in the figure. (22) The top panel displays the variance (calculated across all individuals of the same age) of residual log male earnings as a function of age. To construct the series, we computed the variance of (residual) male labor earnings in the data for each combination of age and calendar year and regressed this variance against a full set of year and age indicators. The figure displays the estimated coefficients on the age indicators (normalized so that a = 1 in the figure corresponds to age 25).

The corresponding series in the bottom panel displays the empirical autocovariance function for our male earnings data, that is, how the strength of the autocovariance between current earnings and future earnings changes as the periods get further apart. In other words, the figure shows how the empirical autocovariance (the autocovariance of earnings in the data for observations that are k years apart) depends on the "lead" k. To construct the series, we computed the autocovariance of male labor earnings for each combination of age, calendar year, and lead k and then regressed the autocovariance against a full set of age, year, and lead indicators. We then calculated the value of the autocovariance that is implied by the estimated regression for individuals aged 35 in base year 1990. The implied autocovariances for different ages or different years look very similar. For now, we simply note that the goal of the ECMs is to match aspects of the data such as these. (23) We will return to these figures below.
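A simplified version of the empirical autocovariance function can be sketched as follows. Unlike the procedure described above, this sketch pools all ages and calendar years rather than computing cell-level autocovariances and regressing them on age, year, and lead indicators; the panel layout and values are hypothetical.

```python
from statistics import fmean


def autocovariance_by_lead(panel, max_lead):
    """Pooled autocovariance of residual log earnings at leads 0..max_lead.

    panel: dict person_id -> dict year -> residual log earnings (hypothetical).
    """
    acov = {}
    for k in range(max_lead + 1):
        # All (current, k-years-ahead) pairs available in the panel.
        pairs = [(obs[t], obs[t + k])
                 for obs in panel.values()
                 for t in obs
                 if (t + k) in obs]
        mean_now = fmean([x for x, _ in pairs])
        mean_later = fmean([y for _, y in pairs])
        acov[k] = fmean([(x - mean_now) * (y - mean_later) for x, y in pairs])
    return acov


# Toy panel with purely permanent differences, invented for illustration.
toy_panel = {
    "a": {1987: 1.0, 1988: 1.0, 1989: 1.0},
    "b": {1987: -1.0, 1988: -1.0, 1989: -1.0},
}
acov = autocovariance_by_lead(toy_panel, max_lead=2)
```

In this toy case the autocovariance is flat in the lead because all variation is permanent; a transitory component would instead add to the autocovariance only at short leads.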

V.A. Stationary ECMs

We begin by presenting stationary models of the income process, that is, models in which the parameters are not allowed to change over calendar time. (24) In the next section we will present nonstationary ECMs, which allow certain parameters in the model to change over time, in order to capture changes in the distribution of income.

Let [y.sup.i.sub.a,t] denote log income, where i indexes individuals, a age, and t calendar years. (25) Log income is given by

(1) [y.sup.i.sub.a,t] = g([zeta]; [X.sup.i.sub.a,t]) + [[xi].sup.i.sub.a,t],

where [X.sup.i.sub.a,t] is a vector of observable characteristics, g(*) is the part of log income that is common to all individuals conditional on [X.sup.i.sub.a,t], [zeta] is a vector of parameters, and [[xi].sup.i.sub.a,t] is the unobservable error term. As is common in the literature on income dynamics, we control for the income variation that is due to observables, [X.sup.i.sub.a,t], and focus on the dynamics of the error term, [[xi].sup.i.sub.a,t]. (26)

The error [[xi].sup.i.sub.a,t] is modeled as consisting of a persistent and a transitory part:

(2) [[xi].sup.i.sub.a,t] = [[alpha].sup.i] + [p.sup.i.sub.a,t] + [[tau].sup.i.sub.a,t]

(3) [p.sup.i.sub.a,t] = [psi][p.sup.i.sub.a-1,t-1] + [[eta].sup.i.sub.a,t]

(4) [[tau].sup.i.sub.a,t] = [[epsilon].sup.i.sub.a,t] + [[theta].sub.1][[epsilon].sup.i.sub.a-1,t-1] + [[theta].sub.2][[epsilon].sup.i.sub.a-2,t-2]

(5) [[alpha].sup.i] ~ i.i.d.(0, [[sigma].sup.2.sub.[alpha]]), [[eta].sup.i.sub.a,t] ~ i.i.d.(0, [[sigma].sup.2.sub.[eta]]), [[epsilon].sup.i.sub.a,t] ~ i.i.d.(0, [[sigma].sup.2.sub.[epsilon]]).

The persistent part of income includes, first, an individual-specific, time-invariant component, [[alpha].sup.i], which captures differences in income across individuals due to factors that include education as well as unobserved ability or productivity. It also includes an autoregressive component, [p.sup.i.sub.a,t], which captures other components of income that are highly persistent. As is common in such models, our estimates of [psi] for the above specification will turn out to be quite close to 1, so it is appropriate to label component [p.sup.i.sub.a,t] as "persistent." These large values of [psi] allow the model to match both the nearly linear increase in the variance of (residual) income in the data as a function of age seen in the top panel of figure 4, and the very gradual decline (after the first 1 to 2 years) in the empirical autocovariance function seen in the bottom panel. (27)

We specify the transitory income component in the model, [[tau].sup.i.sub.a,t], as an MA(2) process. Several studies of income processes have found evidence for the presence of either an MA(1) or an MA(2) transitory component. (28) We choose an MA(2) process to err on the side of allowing the transitory income component to exhibit more persistence, but we have verified that our results are not sensitive to this choice.

The top panel of table 3 presents point estimates and standard errors for the model in equations 2 through 5 for our various measures of income and our various samples. (29) For instance, the first column reports the following point estimates (with standard errors in parentheses) for residual male earnings: [[sigma].sup.2.sub.[alpha]] = 0.1968 (0.0018), [psi] = 0.9623 (0.0010), [[sigma].sup.2.sub.[eta]] = 0.0293 (0.0007), [[sigma].sup.2.sub.[epsilon]] = 0.1826 (0.0034), [[theta].sub.1] = 0.2286 (0.0144), and [[theta].sub.2] = 0.1231 (0.0151). For (residual) pre-tax household income using the sample of all households (third column of the table), the estimates are [[sigma].sup.2.sub.[alpha]] = 0.1960 (0.0016), [psi] = 0.9669 (0.0007), [[sigma].sup.2.sub.[eta]] = 0.0269 (0.0006), [[sigma].sup.2.sub.[epsilon]] = 0.1577 (0.0032), [[theta].sub.1] = 0.2766 (0.0148), and [[theta].sub.2] = 0.1639 (0.0154). These estimates are broadly comparable to those obtained by other studies that use similar specifications. (30) Also, the estimated models match the main features of the data, such as those presented in figure 4, quite well. (31)
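To see how such estimates translate into the age profile of the variance and the shape of the autocovariance function, the sketch below computes the moments implied by a model with a fixed effect, an AR(1) persistent component, and an MA(2) transitory component, evaluated at the male earnings point estimates quoted above. Two things are assumptions of this sketch rather than statements from the paper: the AR(1) component is taken to start at zero before the first working year (a = 1), and transitory innovations are assumed to exist in the two years before a = 1.

```python
def model_autocov(a, k, s2_alpha, psi, s2_eta, s2_eps, theta1, theta2):
    """Persistent and transitory parts of Cov(xi at age a, xi at age a + k).

    Assumes the AR(1) component is zero before a = 1, so that
    Var(p_a) = s2_eta * sum_{j=0}^{a-1} psi^(2j).
    """
    var_p = s2_eta * sum(psi ** (2 * j) for j in range(a))
    persistent = s2_alpha + psi ** k * var_p
    # MA(2) autocovariances, in units of s2_eps; zero beyond lead 2.
    ma_weights = {0: 1 + theta1 ** 2 + theta2 ** 2,
                  1: theta1 + theta1 * theta2,
                  2: theta2}
    transitory = s2_eps * ma_weights.get(k, 0.0)
    return persistent, transitory


# Point estimates for residual male earnings as quoted in the text.
male_earnings = dict(s2_alpha=0.1968, psi=0.9623, s2_eta=0.0293,
                     s2_eps=0.1826, theta1=0.2286, theta2=0.1231)

# Autocovariance by lead for a = 11 (age 35 if a = 1 corresponds to age 25).
profile = [sum(model_autocov(11, k, **male_earnings)) for k in range(6)]
```

The transitory part drops to zero after lead 2, while the persistent part decays only at rate psi per year, which is what produces a sharp initial drop followed by a very gradual decline.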

The bottom panel of table 3 presents estimates for a version of the model that imposes the restriction that [psi] = 1, that is, that [p.sup.i.sub.a,t] follows a random walk, an assumption often made about the persistent component. Here we simply note that, in terms of matching the features of the data shown in figure 4, the random walk specification matches the nearly linear increase with age of the cross-sectional variance in the top panel of figure 4, but it does not match well the gradual decline in the autocovariance function shown in the bottom panel. By contrast, the unrestricted estimates of [psi] (which generally lie around 0.96 to 0.98 for our various income measures and samples) allow the unrestricted model to match the increase in the variance with age fairly well and the pattern of the autocovariance function of male earnings quite closely. In the analysis that follows, we do not impose the restriction [psi] = 1 on component [p.sup.i.sub.a,t], in part to better match the autocovariance function of income.

V.B. Nonstationary ECMs

Stationary models, however, cannot be used to study changes in the distribution of income (such as income inequality) over calendar time. This question requires the use of nonstationary models, which allow certain features of the income process (and hence of the income distribution) to change over time. Such models can capture (in addition to those features of the autocovariance structure of the data shown in the previous section) trends in the cross-sectional variance of income, such as that seen in the top panel of figure 1.

Our baseline nonstationary ECM is as follows. We model residual income, [[xi].sup.i.sub.a,t], as

(6) [[xi].sup.i.sub.a,t] = [[lambda].sub.t]([[alpha].sup.i] + [p.sup.i.sub.a,t]) + [[tau].sup.i.sub.a,t]

(7) [p.sup.i.sub.a,t] = [psi][p.sup.i.sub.a-1,t-1] + [[eta].sup.i.sub.a,t]

(8) [[tau].sup.i.sub.a,t] = [[pi].sub.t][[epsilon].sup.i.sub.a,t] + [[theta].sub.1][[pi].sub.t-1][[epsilon].sup.i.sub.a-1,t-1] + [[theta].sub.2][[pi].sub.t-2][[epsilon].sup.i.sub.a-2,t-2]

(9) [[alpha].sup.i] ~ i.i.d.(0, [[sigma].sup.2.sub.[alpha]]), [[eta].sup.i.sub.a,t] ~ i.i.d.(0, [[sigma].sup.2.sub.[eta]]), [[epsilon].sup.i.sub.a,t] ~ i.i.d.(0, [[sigma].sup.2.sub.[epsilon]]).

In the equations above, both components of persistent income, [[alpha].sup.i] and [p.sup.i.sub.a,t], are multiplied by the year-specific factor loadings [[lambda].sub.t], which allow the relative importance of the persistent components of income to vary over calendar time (note that the parameter [[lambda].sub.t] can change from year to year). The transitory income component in the model, [[tau].sup.i.sub.a,t], is specified as an MA(2) process in which the transitory innovations, [[epsilon].sup.i.sub.a,t], are multiplied by the year-specific factor loadings [[pi].sub.t], which allow the variance of the innovations, and hence the relative importance of the transitory component, to vary by calendar year.

A few words about the interpretation of the [[lambda].sub.t] parameters are in order. Suppose, first, for simplicity that [[alpha].sup.i] represents solely education, and that [p.sup.i.sub.a,t] represents human capital (which changes slowly over time and is highly persistent). Then, the [[lambda].sub.t] parameters would represent the "price" that the economy attributes to these characteristics in year t. Note as well that the "price" of such characteristics can indeed change from year to year, as evidenced, for example, by the well-documented changes in the returns to education in recent decades. It seems reasonable to expect that the economy will assign a price not just to education, but also to other productive characteristics of individuals (including, but not restricted to, those embedded in human capital). (32) More generally, [[alpha].sup.i] will capture, in addition to education, other permanent characteristics of individuals (or households) such as unobserved ability or productivity, and [p.sup.i.sub.a,t] will capture characteristics that are slow-moving and persistent, such as human capital and social connections. A similar modeling approach of nonstationarity in the persistent component of income is followed, for example, in Moffitt and Gottschalk (1995, 2011), Haider (2001), and Baker and Solon (2003). (33)

A key element of the above specification is clearly the ability of the [[lambda].sub.t] parameters to change over time. One potential concern that this raises, however, is that the [[lambda].sub.t] parameters could in principle bounce around from year to year. Such transitory variation in [[lambda].sub.t] could muddle the labeling of [[lambda].sub.t]([[alpha].sup.i] + [p.sup.i.sub.a,t]) in equation 6 as the "persistent" component of income. To address this concern, when estimating the above model, we impose some smoothness on the movements of [[lambda].sub.t] over time by restricting [[lambda].sub.t] to lie on a fourth-degree polynomial. (34)

V.C. Estimation

Estimation of our ECMs proceeds in two stages. In the first stage we construct residuals from regressions of log earnings (or log income) against observables, [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII], as discussed in section III. In the second stage we use those residuals to estimate all model parameters other than [zeta], using a minimum distance estimator. The estimator matches all of the theoretical variances and autocovariances implied by the model in equations 6 through 9 to their empirical counterparts. The procedure matches 7,912 variances and autocovariances in total. All variances and autocovariances are specified in levels. Appendix C provides details on the minimum distance estimation procedure, and appendix D shows the theoretical moments that are implied by the model and that are matched in estimation.
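The idea of minimum distance estimation can be illustrated with a deliberately simple special case. The sketch below is not the paper's estimator: it uses the two-parameter random effects model (a permanent effect plus i.i.d. noise) instead of the full ECM, equal weights on all moments, and a handful of invented empirical moments. In that model the variance equals the sum of the two variance parameters and every autocovariance at lead 1 or more equals the permanent variance, so the problem is linear and can be solved from the normal equations.

```python
def min_distance_random_effects(moments):
    """Equally weighted minimum distance for xi_it = alpha_i + eps_it.

    moments: list of (lead, empirical_autocovariance) pairs (hypothetical).
    Theoretical moments: s2_alpha + s2_eps at lead 0, s2_alpha at lead >= 1.
    Minimizes the sum of squared gaps between empirical and theoretical
    moments by solving the 2x2 normal equations.
    """
    n = len(moments)
    n0 = sum(1 for k, _ in moments if k == 0)
    sum_all = sum(m for _, m in moments)
    sum_0 = sum(m for k, m in moments if k == 0)
    # Normal equations: [[n, n0], [n0, n0]] @ [s2_alpha, s2_eps] = [sum_all, sum_0]
    det = n * n0 - n0 * n0
    s2_alpha = (n0 * sum_all - n0 * sum_0) / det
    s2_eps = (n * sum_0 - n0 * sum_all) / det
    return s2_alpha, s2_eps


# Invented empirical moments: two variances and three autocovariances.
toy_moments = [(0, 0.50), (0, 0.52), (1, 0.30), (2, 0.32), (1, 0.31)]
s2_alpha_hat, s2_eps_hat = min_distance_random_effects(toy_moments)
```

In the full model the theoretical moments are nonlinear in the parameters, so the same objective would have to be minimized numerically rather than in closed form.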

V.D. ECM-Based Variance Decomposition for Male Earnings

Table 4 presents parameter estimates of our baseline nonstationary ECM for our various measures of income and our various samples. Note that the estimates of parameters [[sigma].sup.2.sub.[alpha]], [psi], [[sigma].sup.2.sub.[eta]], [[sigma].sup.2.sub.[epsilon]], [[theta].sub.1], and [[theta].sub.2] (those also present in the stationary version of the model) in table 4 are quite similar to the corresponding estimates in table 3 for the stationary model. The lines labeled "ECM-predicted" in figure 4 show the estimated nonstationary model's predictions for the variance of male earnings as a function of age (top panel) and for the autocovariance function of male earnings (bottom panel). (35) As the figure shows, the estimated model fits the data quite well.

In this section we use our estimated nonstationary ECM to decompose the cross-sectional variance of log (residual) male earnings into its persistent and transitory parts. For each calendar year between 1987 and 2009, and given an age distribution, the ECM in equations 6 through 9 implies a specific value for the total cross-sectional variance, the variance of the persistent component, and the variance of the transitory component of log (residual) earnings, as a function of the model parameters. We compute these variances implied by the estimated model using the actual empirical age distribution for each year in our sample. (36) Note that the trends in the persistent and the transitory variance components in our baseline model are primarily determined by the estimates of the [[lambda].sub.t] and [[pi].sub.t] parameters, respectively.
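The model-implied decomposition for a single calendar year can be sketched as follows, taking the persistent component to be the year-specific loading times the sum of the fixed effect and the AR(1) component, and the transitory component to be the MA(2) process with year-specific loadings on the innovations. Several inputs are assumptions made for illustration: the loading values, the uniform age distribution, the use of the stationary-model point estimates as placeholder parameters, and the initial condition that the AR(1) component is zero before a = 1.

```python
def implied_variances(age_shares, lam_t, pi_lags,
                      s2_alpha, psi, s2_eta, s2_eps, theta1, theta2):
    """Model-implied persistent, transitory, and total variance in one year.

    age_shares: dict a -> population share at age index a (shares sum to 1).
    lam_t: loading on the persistent component in year t.
    pi_lags: (pi_t, pi_{t-1}, pi_{t-2}), loadings on transitory innovations.
    """
    persistent = 0.0
    for a, share in age_shares.items():
        var_p = s2_eta * sum(psi ** (2 * j) for j in range(a))
        persistent += share * lam_t ** 2 * (s2_alpha + var_p)
    pi0, pi1, pi2 = pi_lags
    transitory = s2_eps * (pi0 ** 2 + (theta1 * pi1) ** 2 + (theta2 * pi2) ** 2)
    return persistent, transitory, persistent + transitory


# Placeholder parameters (stationary-model estimates quoted earlier in the text).
params = dict(s2_alpha=0.1968, psi=0.9623, s2_eta=0.0293,
              s2_eps=0.1826, theta1=0.2286, theta2=0.1231)
# Hypothetical uniform age distribution over ages 25 to 60 (a = 1..36).
uniform_ages = {a: 1 / 36 for a in range(1, 37)}

base = implied_variances(uniform_ages, 1.0, (1.0, 1.0, 1.0), **params)
higher_loading = implied_variances(uniform_ages, 1.1, (1.0, 1.0, 1.0), **params)
```

Raising the persistent loading by 10 percent scales the persistent variance by 1.21 and leaves the transitory variance unchanged, which is why the trend in the persistent variance is governed mainly by the persistent loadings and the trend in the transitory variance by the transitory loadings.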

The decomposition of inequality implied by our estimated baseline ECM is presented in figure 5. The top line, which shows the total cross-sectional variance implied by the estimated model for each calendar year, is essentially identical to the empirical cross-sectional variance of log (residual) male earnings in our data. That is, our estimated model matches the evolution of the cross-sectional variance over calendar time very closely. (37)

[FIGURE 5 OMITTED]

The persistent component of the variance in figure 5 displays a clearly increasing trend, rising from 0.38 squared log point in 1987 to around 0.47 squared log point in 2009. The transitory component of the variance, by contrast, fluctuates over the 23-year period but does not exhibit any trend. The last column of table 2 shows that there is no trend for the transitory variance: the estimated trend coefficient is 0.0001 (with a standard error of 0.0004), which would imply a negligible increase of 0.003 squared log point over 23 years. In other words, the entire increase in the total cross-sectional variance of (residual log) male earnings as determined by the nonstationary ECM is driven by an increase in the variance of the persistent component of earnings, confirming the results obtained previously with the simpler nonparametric methods.

V.E. Comparison with Simple Nonparametric Decompositions

Here we briefly discuss the relationship between the model-based decomposition just presented and the simple nonparametric decompositions shown previously, in the hope of clarifying some of the connections and the differences that exist across the methods. So far we have shown that the different methods yield essentially the same answer regarding the trends in inequality, namely, that the rising trend in male earnings inequality over our sample period has been entirely driven by the persistent component of earnings. However, the different decompositions presented above yield somewhat different relative shares of persistent and transitory inequality at a given point in time. Specifically, the KSS and GM methods attribute, on average, more than 80 percent of the total variance to the persistent component, whereas the ECM attributes slightly less than 70 percent.

This difference reflects the feature of the KSS and GM decompositions that transitory income is defined as deviations from multiyear averages of annual income, and therefore captures only purely transitory income (that is, income that has no serial correlation whatsoever). As a result, basically all the persistence in the income data is attributed to the persistent income component. This implies in turn that even shocks that dissipate in 1 to 2 years, and that would generally be viewed as transitory but are somewhat serially correlated, will tend to be attributed to the persistent income component. Consequently, the persistent component is assigned a larger role overall and accounts for a large fraction of total inequality at any given point in time. In the ECM, by contrast, transitory income is allowed to have some degree of serial correlation, so it captures some of the short-duration persistence in the data, and thus the transitory component is assigned a slightly larger share of the total variance. It is reassuring that despite some differences in the persistent and transitory shares of inequality, both approaches yield essentially the same answer for the trends in income inequality. (38)

VI. Household Income

We next examine the trend in the variance of the persistent and transitory components of pre-tax total household income. As noted in the introduction, examining household income is important because it is a broader measure of a household's resources and therefore has a more direct bearing on household consumption and welfare. In going from individual male earnings to total household income, a number of income components are added. These can be grouped into four main categories: spousal labor earnings, transfer income, investment income, and business income. Transfers are defined here as the sum of alimony received, pensions and annuities, unemployment compensation, Social Security benefits, and tax refunds. Investment income includes interest, dividends, and capital gains. Business income includes income from sole proprietorships, partnerships, and S corporations. (39)

As already mentioned in section II, we carry out the analysis of household income using two alternative samples. The first, our "male-headed households" sample, consists of households with a male primary or secondary filer aged 25 to 60 whose annual labor earnings are above the minimum threshold. Our second, broader sample of "all households" essentially adds single females to the previous sample. (40) As table 1 shows, for pre-tax household income the broader sample has about 133,000 observations more than the sample of male-headed households.

As described in section III, the analysis here is performed on residuals from a first-stage regression of log household income on the sex, age, and filing status of the primary filer, and on a full set of dummies for the number of children. (41)

VI.A. Volatility

Figure 6 plots the standard deviation of 1-year and 2-year percentage changes in total household income for our sample of all households over the sample period. (The corresponding figure for the sample of male-headed households is very similar and is not shown.) As the figure shows, household income volatility, as measured here, rose 9 percent for 1-year income changes and 11 percent for 2-year income changes over the sample period, and there appears to be a clear rising trend. In fact, fitting a linear time trend to each of these two series yields coefficients on the time trend of 0.0022 (0.0003) for 1-year changes, and 0.0020 (0.0003) for 2-year changes, each implying an increase of about 0.05, or more than 10 percent, over the 23-year period. Thus, in contrast to male earnings, household income volatility appears to have increased over the sample period, which suggests that the transitory component of the variance might have played a role in the increase in the cross-sectional inequality of household income.

[FIGURE 6 OMITTED]

VI.B. Simple Nonparametric Variance Decompositions

Figure 7 shows the decomposition of the cross-sectional variance of (residual) pre-tax household income on the sample of all households, using the KSS method. (The decomposition using the GM method is very similar and is therefore not shown.) The figure shows a clear increase in the persistent part of the variance over the period of about 22 percent. The first column in the bottom panel of table 5 fits a linear time trend to the persistent variance. The estimated trend coefficient of 0.0056 (0.0004) is strongly significant and implies an increase in the variance of 0.13 squared log point over 23 years, explaining nearly the entire increase in the total variance shown in the figure.

However, the transitory variance component in the figure has also increased over the period, by about 15 percent. (This is somewhat hard to see in the figure because of the low level of the transitory variance.) The fourth column in the bottom panel of table 5 shows an estimated linear time trend coefficient of 0.0008 (0.0001) for the transitory variance, which is statistically significant but implies an increase in the variance of only 0.02 squared log point over 23 years. In other words, although the transitory component of the variance did increase, that increase had little effect on the total variance because the KSS method attributes only a very small fraction of the total variance to the transitory component (13 percent, on average, in this decomposition). Thus, the increase in the total variance is again driven by the increase in the persistent component. However, under a decomposition that assigned a larger share of the total variance to the transitory component, the transitory variance would likely play a somewhat larger role.

[FIGURE 7 OMITTED]

VI.C. ECM-Based Variance Decomposition

We next examine the decomposition of the variance of pre-tax household income based on our nonstationary ECM. The second and third columns of table 4 present point estimates and standard errors for our baseline specification estimated on pre-tax household income, for both our sample of households with a male head (second column) and our broader sample of all households (third column). Figure 8 presents the corresponding variance decompositions. (42)

The figure shows a clear increasing trend in the persistent component of the variance, which appears to have been concentrated in the first half of the 23-year sample period. The transitory component, by contrast, appears to have been relatively flat, although it increased somewhat in the last few years of the sample (the early to mid-2000s). The third and sixth columns of table 5 fit a linear time trend to the two variance components from figure 8 and confirm the rising trend for the persistent component of pre-tax household income. In the third column of the bottom panel, which corresponds to the sample of all households, the estimated linear trend coefficient of 0.0048 (0.0005) is strongly statistically significant and implies an increase of 0.11 squared log point over 23 years, accounting for roughly 80 percent of the increase in the total variance seen in the bottom panel of figure 8. The estimates in the sixth column of the bottom panel show a small rising trend in the transitory component of the variance, which has an estimated trend coefficient of 0.0013 (0.0005), implying an increase of 0.03 squared log point over 23 years and accounting for the remaining 20 percent of the increase in the total variance.
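The arithmetic behind these shares can be checked directly from the reported trend coefficients. The short sketch below multiplies each coefficient by the 23 years of the sample and, as a simplifying assumption, treats the increase in the total variance as the sum of the two fitted increases.

```python
YEARS = 23  # calendar years 1987 through 2009

trend_persistent = 0.0048  # reported trend, persistent variance, all households
trend_transitory = 0.0013  # reported trend, transitory variance, all households

increase_persistent = trend_persistent * YEARS  # about 0.11 squared log point
increase_transitory = trend_transitory * YEARS  # about 0.03 squared log point

# Share of the combined fitted increase attributable to each component.
share_persistent = increase_persistent / (increase_persistent + increase_transitory)
share_transitory = 1 - share_persistent
```

The persistent share comes out at a little under 0.79, in line with the "roughly 80 percent" stated in the text.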

[FIGURE 8 OMITTED]

These results suggest that an increase in the variance of the persistent component of income accounted for the bulk of the increase in the cross-sectional variance of total pre-tax household income. The transitory component also contributed to the increase, but only a relatively small fraction, the precise contribution depending somewhat on the decomposition method used, on model specification in the case of the model-based decompositions, and on other factors such as the sample used. We conclude that the increase in household income inequality was mostly persistent. (43)

VI.D. The Increase in the Transitory Variance of Household Income

We have shown that the increase in the total variance of household income was mostly persistent, but that unlike with male earnings, the transitory variance appears to have played some role. Here we explore which source or category of household income might account for the increase in the transitory variance of total household income. As previously discussed, household income can be decomposed into male labor earnings, spousal labor earnings, transfer income, investment income, and business income. In this section we take male earnings and then sequentially (and cumulatively) add each of spousal earnings, transfer income, investment income, and business income. For each of the resulting income aggregates, we estimate our ECM and decompose the cross-sectional variance into persistent and transitory parts. (44) We then fit a linear time trend to the transitory variance component and estimate the increase in the transitory variance over 1987-2009 that is implied by the estimated time trend. Here we report results from decompositions based on our baseline ECM and our male-headed households sample, but the other methods lead to similar conclusions. (45) Starting with male earnings and moving along the series of increasingly broad income aggregates, the implied increases in the transitory variance over 1987-2009 (in squared log points) are 0.003, 0.015, 0.016, 0.035, and 0.038, respectively. That is, the addition of spousal labor earnings and of investment income leads to a larger change in the implied increase in the transitory variance component over the sample period. We conclude that both spousal labor earnings and investment income contributed to the (relatively small) increase in the transitory variance of total household income.
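Because the income aggregates are cumulative, the contribution associated with each added category can be read off as the difference between consecutive implied increases. The sketch below does this subtraction using the five figures reported above; the labels are shorthand introduced here for readability.

```python
# Implied increases in the transitory variance over 1987-2009 (squared log
# points), as reported in the text for the cumulative income aggregates.
aggregates = ["male earnings", "+ spousal earnings", "+ transfer income",
              "+ investment income", "+ business income"]
implied_increase = [0.003, 0.015, 0.016, 0.035, 0.038]

# Change in the implied increase when each category is added.
increments = {
    name: round(after - before, 3)
    for name, before, after in zip(aggregates[1:], implied_increase,
                                   implied_increase[1:])
}
largest_two = sorted(increments, key=increments.get, reverse=True)[:2]
```

The two largest increments are those for investment income (0.019) and spousal earnings (0.012), while transfers and business income add 0.001 and 0.003, respectively.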

Title Annotation: p. 67-106

Author: Debacker, Jason; Heim, Bradley; Panousi, Vasia; Ramnath, Shanthi; Vidangos, Ivan

Publication: Brookings Papers on Economic Activity

Article Type: Report

Geographic Code: 1USA

Date: Mar 22, 2013

Words: 10570
