# Data revisions and out-of-sample stock return predictability.

I. INTRODUCTION

Lettau and Ludvigson (2001; LL, hereafter) show that the consumption-wealth ratio (cay)--the error term from the cointegration relation among consumption, net worth, and labor income--is a strong predictor of stock market returns. Their findings are important because the forecasting power of cay is consistent with rational pricing theories, for example, Campbell and Cochrane (1999), and helps explain many puzzling phenomena in the equity market; see, for example, Lettau and Ludvigson (2003), for an overview. Moreover, by contrast with Bossaerts and Hillion (1999), Ang and Bekaert (2007), Goyal and Welch (2003), and others, cay also forecasts stock market returns out of sample. (1)

This paper investigates the out-of-sample predictive power of cay using real-time data instead of using revised data (as in LL). The information content of the two data sets could be quite different because of periodic revisions in consumption and labor income data provided by the Bureau of Economic Analysis (BEA) and in net worth data provided by the Federal Reserve Board. Therefore, the analysis has important implications for practitioners, for example, mutual funds managers and monetary policy makers, who might want to use cay to improve their forecasts with real-time information. (2)

Consistent with early studies, for example, Croushore and Stark (1999), I document substantial revisions to consumption and labor income data; consequently, cay varies considerably across vintages. For example, during the period 1996-97, cay is substantially below its sample average in real-time data, although it is above or around the sample average in the 2002:Q3 vintage, the latest release when this paper was written. That is, in hindsight, there was no irrational exuberance in stock markets until 1998, which is over 1 yr after the remarks by Fed Chairman Alan Greenspan. If investors had switched from stocks to bonds, as signaled by the low level of real-time cay, they would have missed the stock market run-ups over this period. The example illustrates the main finding of this paper that cay has negligible out-of-sample predictive power for stock market returns in real time. Similarly, given that stock prices continued to rise despite the irrational exuberance speech, Alan Greenspan adopted the new economy explanation in 1998 and stock prices rose further until the crash in 2000. This episode highlights the theoretical results in Bernanke and Gertler (1999, 2001): Although policy makers cannot ignore the dramatic movements in the equity market, it is tricky in practice for central banks to predict stock prices at the business-cycle frequency.

It is tempting to attribute the poor performance of real-time cay to the look-ahead bias suggested by Brennan and Xia (2005) and Avramov (2002). However, Guo (2006) argues that their results actually reflect an omitted variable problem: Recursively estimated cay regains its out-of-sample forecasting power when combined with a measure of realized stock market variance. His results are also consistent with an equilibrium model by Guo (2004), who argues that, in addition to a risk premium in the Capital Asset Pricing Model, investors also require a liquidity premium because of limited stock market participation. That is, realized market variance and cay forecast stock returns because they are proxies for the risk and liquidity premiums, respectively.

To disentangle the effect of the look-ahead bias and data revisions, I make forecasts with recursively estimated cay obtained from (1) the current vintage data and (2) real-time data. The two approaches are identical if there were no data revisions. I confirm the early results that current vintage cay outperforms a benchmark model of constant stock returns when combined with realized market variance, although it does not do so by itself. However, cay has negligible out-of-sample forecasting power in real-time data even after adding realized market variance to the forecasting equation. Thus, the poor performance of cay in real-time data is mainly due to data revisions but not the look-ahead bias. For example, in the 1999 comprehensive benchmark revision, the BEA reclassified the employer contributions of government employee retirement plans as "other labor income" instead of "transfer payments to persons." The change, which was intended to treat government plans consistently with those of the private sector and is thus appropriate, explains the difference between real-time and revised cay over the period 1996-97.

The poor performance in real-time data does not suggest that cay is a useless forecasting variable. This is because cay performs well in the revised data, which might be available to investors at the time of forecast since the BEA does not create any new information. For example, investors could have reclassified the employer contributions of government employee retirement plans from the very beginning and achieved a better investment outcome over the period 1996-97. Therefore, the results actually provide support for cay as a theoretically motivated variable because, as expected, it performs best in the most recent vintage data, which have the smallest measurement errors among all vintages.

Investors might also obtain similar information from alternative sources if cay is indeed related to economic activity, for example, business investments (Lettau and Ludvigson 2002). In particular, when cay is low or stocks are "overvalued," a firm's cost of capital is lower, leading to the acceptance of more new capital investment projects, which can result in an increase in the firm's idiosyncratic volatility. Consistent with this conjecture, Guo and Savickas (2006) find that value-weighted average idiosyncratic variance, which is directly observable and not subject to data revisions, forecasts stock market returns because of its strong co-movements with cay.

We cannot rule out that the predictive power of cay reflects data mining, for example, in Lo and MacKinlay (1990) and spurious regressions, for example, in Ferson, Sarkissian, and Simin (2003), and that there is no reliable out-of-sample predictability that investors can exploit with real-time information, as given by Cooper, Gutierrez, and Marcum (2005). These issues cannot be satisfactorily addressed until we have sufficient fresh data, and I leave them for future research.

The remainder of the paper is organized as follows. I discuss data in Section II and present the forecasting results in Section III. Section IV offers some concluding remarks.

II. DATA

A. Real-Time cay

I use exactly the same formula as that in LL to construct cay in real-time data. (3) I denote a variable, for example, consumption expenditure, as [C.sub.t,v], where t is the date of the observation and v is the date of the vintage. For example, [C.sub.1962:Q2,1962:Q3] is consumption expenditure of 1962:Q2 reported in the 1962:Q3 vintage. I define [C.sub.t,v] as

(1) [C.sub.t,v] = [CN.sub.t,v] + [CS.sub.t,v] - [CNL.sub.t,v],

where [CN.sub.t,v] is non-durable consumption, [CS.sub.t,v] is services, and [CNL.sub.t,v] is shoes and clothing. Labor income, [Y.sub.t,v], is defined as

(2) [Y.sub.t,v] = [YPW.sub.t,v] + [YPTP.sub.t,v] + [YPL.sub.t,v] - [YPSS.sub.t,v] - [YPW.sub.t,v] [YPX.sub.t,v]/([YPW.sub.t,v] + [YOP.sub.t,v] + [YRI.sub.t,v] + [YPDV.sub.t,v] + [YPIN.sub.t,v]),

where [YPW.sub.t,v] is wages and salaries, [YPTP.sub.t,v] is transfer payments, [YPL.sub.t,v] is other labor income, [YPSS.sub.t,v] is social security contributions, [YOP.sub.t,v] is proprietors' income with inventory valuation adjustment and capital consumption adjustment (CCAdj), [YRI.sub.t,v] is rental income with CCAdj, [YPDV.sub.t,v] is personal dividend income, [YPIN.sub.t,v] is personal interest income, and [YPX.sub.t,v] is personal tax and non-tax payment. Net worth, [A.sub.t,v], is directly available and does not require any transformation. I then divide [C.sub.t,v], [A.sub.t,v], and [Y.sub.t,v] by total population, [POP.sub.t,v], and by the corresponding price deflators. As in LL, I use the deflator of personal consumption expenditure, [JC.sub.t,v], for net worth and labor income, while each component of consumption in Equation (1) has its own deflator: [JCN.sub.t,v] for non-durable consumption, [JCS.sub.t,v] for services, and [JCNL.sub.t,v] for shoes and clothing. I can sum up the real components of consumption directly before 1996, when the BEA used the fixed weighting scheme; however, I would have to construct real consumption using the Fisher ideal index subsequently when the BEA uses the chained weighting scheme.

I obtain real-time net worth data from the Federal Reserve Board. (4) The vintages span from 1995:Q3 to 2002:Q3 and the observations of each vintage start from 1952:Q1. Net worth data are available to the public with about a 2-mo delay, for example, the 2002:Q3 vintage contains observations from 1952:Q1 to 2002:Q2.

I follow Croushore and Stark (1999) in the collection of all the other real-time data from various issues and supplements of the Survey of Current Business. However, the timing convention is different from theirs. For example, for the 2002:Q3 vintage, Croushore and Stark use information up to August 15, 2002, the middle point of that quarter. In contrast, I incorporate all the information available at the end of the quarter and collect the data on September 30, 2002. This approach is appropriate, given that the purpose of this paper is to forecast stock returns using all the available information. Similar to net worth data, consumption and labor income are also available to the public with about a 1-mo delay. The vintages of consumption and labor income data span from 1968:Q2 to 2002:Q3 and with a few exceptions, the observations of each vintage start from 1952:Q1 in order to match net worth data. (5) I compare the two common variables of the dataset, (1) real non-durable consumption and (2) real services, with those collected by Croushore and Stark; I find that they match very well except for the difference due to the timing convention. To conserve space, I put a detailed discussion of data in an appendix, which is available on request.

For each vintage v, I estimate the ordinary least squares regression for the equation

(3) [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]

where lowercase letters denote log, real, and per capita variables; [DELTA] denotes the first difference; [[alpha].sub.v], [[beta].sub.a,v], [[beta].sub.y,v], [b.sub.a,i,v], and [b.sub.y,i,v] are coefficients, and [[epsilon].sub.t,v] is the error term. It should be noted that, as in LL, [a.sub.t,v] is the net worth at the beginning of the period and I set k equal to 8. Thus [cay.sub.t,v] is the deviation from the trend or [cay.sub.t,v] = [c.sub.t,v] - [[beta]-hat.sub.a,v][a.sub.t,v] - [[beta]-hat.sub.y,v][y.sub.t,v], where hats denote the estimated parameters.
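Equation (3) is not reproduced in this copy of the text. Judging from the coefficient names listed above and from the dynamic least squares approach of LL, it plausibly takes the following form. This is a reconstruction offered as an assumption, not the published expression; in particular, the separate coefficients on the leads and lags of the change in net worth are inferred.

```latex
c_{t,v} = \alpha_v + \beta_{a,v}\, a_{t,v} + \beta_{y,v}\, y_{t,v}
  + \sum_{i=-k}^{k} b_{a,i,v}\, \Delta a_{t-i,v}
  + \sum_{i=-k}^{k} b_{y,i,v}\, \Delta y_{t-i,v}
  + \epsilon_{t,v}
```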

B. Data Revisions

It is important to understand data revisions because, as I report in this paper, they are quite substantial and account for the poor performance of cay in real-time data. However, given the larger literature on this issue, for example, in Croushore and Stark (1999), I provide only a brief summary of the results that are relevant to the main purpose of the paper.

After the release of the end of quarter data, as I collect from various issues of Survey of Current Business, the BEA revises data on a regular basis, including the 1-yr revision, the 3-yr revision, and the comprehensive benchmark revision about every 5 yr. I denote a revision

(4) [R.sub.v1,v2] ([x.sub.t]) = [x.sub.t,v2] - [x.sub.t,v1], t < v1, t < v2, and v1 < v2,

where [x.sub.t,v2] and [x.sub.t,v1] could be consumption growth, for example. Mankiw, Runkle, and Shapiro (1984) and Croushore and Stark (1999), among others, suggest that the revision can be characterized as (1) containing news or (2) reducing noise. In the first case, the revision [R.sub.v1,v2]([x.sub.t]) is correlated with the subsequent release, [x.sub.t,v2], but not related with the earlier release, [x.sub.t,v1], because [x.sub.t,v2] contains new information beyond [x.sub.t,v1]. Also, given that [x.sub.t,v1] is an efficient estimate of [x.sub.t,v2], the variance of [x.sub.t,v2] is larger than the variance of [x.sub.t,v1]. In the second case, however, the revision, [R.sub.v1,v2]([x.sub.t]), is correlated with the earlier release, [x.sub.t,v1], but not the subsequent release, [x.sub.t,v2], because the latter just eliminates the noise of the former. Similarly, the variance of [x.sub.t,v2] is smaller than the variance of [x.sub.t,v1].
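The two cases can be told apart with simple sample moments, in the spirit of the correlations reported later in Table 3. The sketch below is illustrative only: the function names and the toy numbers are hypothetical and are not taken from the paper's data.

```python
from math import sqrt


def pearson(x, y):
    """Sample Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def variance(x):
    """Population variance of a sequence."""
    mean_x = sum(x) / len(x)
    return sum((a - mean_x) ** 2 for a in x) / len(x)


def revision_diagnostics(early, late):
    """Moments used to judge whether the revision late - early is news or noise."""
    revision = [late_value - early_value
                for early_value, late_value in zip(early, late)]
    return {
        "corr_with_early": pearson(revision, early),
        "corr_with_late": pearson(revision, late),
        "var_early": variance(early),
        "var_late": variance(late),
    }


# Toy data (hypothetical): the revision is orthogonal to the early release,
# so it behaves like news and the later release is more variable.
early_release = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
late_release = [2.0, 1.0, 3.0, 4.0, 4.0, 7.0]
news_case = revision_diagnostics(early_release, late_release)

# Swapping the roles gives the noise case: the revision is orthogonal to the
# later release, and the later release is less variable.
noise_case = revision_diagnostics(late_release, early_release)
```

In the news case the revision is uncorrelated with the earlier release and the variance rises; in the noise case the revision is uncorrelated with the later release and the variance falls.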

Following Croushore and Stark (1999), I define the initial released growth rate of consumption as [DELTA][c.sup.p.sub.t] = [c.sub.t-1,t] - [c.sub.t-2,t], where [c.sub.t-1,t] is the last observation of vintage t. It should be noted that the notation reflects the fact that macrovariables are available with a one-quarter delay. To analyze the effect of the revisions, I also calculate the growth rate in the vintage 1 yr later as [DELTA][c.sup.1.sub.t] = [c.sub.t-1,t+4] - [c.sub.t-2,t+4], in the vintage 3 yr later as [DELTA][c.sup.3.sub.t] = [c.sub.t-1,t+12] - [c.sub.t-2,t+12], and in the current vintage as [DELTA][c.sup.c.sub.t] = [c.sub.t-1,c] - [c.sub.t-2,c], where c refers to the 2002:Q3 vintage (the latest release when this paper was written). I calculate the growth rates for labor income and net worth in the same fashion.
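As a minimal sketch of these definitions, the following code reads the same growth rate from four vintages. The nested dictionary layout, the integer quarter numbering, and the numbers themselves are hypothetical choices made for illustration.

```python
def growth_in_vintage(log_levels, t, v):
    """Growth of the observation dated t-1 as measured in vintage v."""
    series = log_levels[v]
    return series[t - 1] - series[t - 2]


def release_growth_rates(log_levels, t, current):
    """Initial, 1-yr-later, 3-yr-later, and current-vintage growth rates."""
    return {
        "initial": growth_in_vintage(log_levels, t, t),
        "one_year": growth_in_vintage(log_levels, t, t + 4),
        "three_year": growth_in_vintage(log_levels, t, t + 12),
        "current": growth_in_vintage(log_levels, t, current),
    }


# Hypothetical log consumption: log_levels[vintage][observation date],
# with quarters numbered consecutively as integers.
log_levels = {
    10: {8: 1.000, 9: 1.010},
    14: {8: 1.000, 9: 1.020},
    22: {8: 1.000, 9: 1.015},
    30: {8: 0.990, 9: 1.020},
}
rates = release_growth_rates(log_levels, t=10, current=30)
```

Each entry of `rates` measures growth for the same calendar quarter; only the vintage from which the two log levels are read changes.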

Figures 1-3 plot the growth rate of consumption, labor income, and net worth, respectively, in different vintages. Unless otherwise indicated, I report throughout this subsection the results of real per capita consumption and labor income; however, I use nominal net worth because I want to show that nominal net worth is not much revised. There is a substantial difference between the initial release (solid line) and revised data (dashed line) for both consumption and labor income. However, the difference is very small for net worth data because most variations in net worth are related to stock price movements, which are not subject to revisions. I investigate these issues in some detail below.

[FIGURE 1 OMITTED]

Table 1 reports the standard deviation of the growth rates for both the full sample of vintages 1968:Q2 to 1999:Q3 (upper panel) and the post-1996 subsample of vintages 1996:Q1 to 2001:Q3 (lower panel). The post-1996 subsample is special for two reasons. First, the BEA changed from fixed weighting to chained weighting after 1996. Second, real-time net worth data are available over this period. It should be noted that all numbers are reported in percent. In the full sample, the standard deviation increases from the initial release to the 1-yr-later release and falls from the 1-yr-later release to the 3-yr-later release for both consumption and labor income. From the 3-yr-later release to the latest release, the standard deviation rises for consumption and is about the same for labor income. Therefore, consistent with Croushore and Stark (1999), there are substantial data revisions to consumption and labor income due to incorporating news and reducing noise.

[FIGURE 2 OMITTED]

I do not consider the 3-yr-later release in the post-1996 subsample because of the relatively small number of vintages. Consistent with Figure 3, the relative change in the variance of net worth is very small, indicating that the net worth data are reliably measured in real time. Also, consumption exhibits the same pattern as in the full sample. Interestingly, the variance of labor income increases dramatically from .28 for the initial release to .50 for the current vintage, indicating that the revision incorporates substantial news. To get a closer look, I plot the initial and latest releases over the post-1996 period in Figure 4. While the revision is relatively small for consumption (Panel A), there is a substantial difference in labor income (Panel B), especially in 1996-97 and 2000-01. While it is difficult to pin down the exact source for the discrepancy, I note that the BEA has redefined components of labor income, as shown in Equation (2). For example, in the 1999 comprehensive benchmark revision, the BEA reclassified the employer contributions of government employee retirement plans as "other labor income" instead of "transfer payments to persons." Accordingly, the dividend and interest paid to these plans were reclassified as personal dividend income and personal interest income, respectively. As a result, labor income defined by Equation (2) was substantially revised downward over the period 1996-97. As I show in the next section, these revisions explain the poor performance of cay in real-time data.

[FIGURE 3 OMITTED]

Table 2 reports the correlation coefficient among the growth rates of the various releases and the results are consistent with those reported in Table 1. For example, the correlation coefficients of net worth are almost equal to 1 and are much larger than those of consumption and labor income. Also, in the post-1996 subsample, the correlation coefficient between the initial and latest releases of labor income is only .50, indicating that revisions incorporate substantial news. Table 3 presents the correlation coefficient between the revision and the growth rate, with bold denoting significance at the 5% level. Again, the results are consistent with those reported in Table 1. For example, in the full sample, the revision of consumption and labor income from the initial to the 1-yr-later releases reflects adding news rather than reducing noise; however, the subsequent revisions both incorporate news and reduce noise. In contrast, the correlation is never statistically significant for net worth because the revision of net worth is small. Also, the revision of labor income incorporates substantial new information in the post-1996 subsample. To summarize, I find substantial data revisions to consumption and labor income but not net worth.

C. Other Forecasting Variables and Stock Market Returns

I obtain data on stock market returns and the risk-free rate from Kenneth French at Dartmouth College, and excess stock market return is the difference between these two variables. (6) Return data are available at monthly frequency and I aggregate them into quarterly data through simple compounding. I also use realized market variance and the stochastically detrended risk-free rate as additional forecasting variables. Following Merton (1980) and many others, I construct quarterly realized market variance using daily stock market return data, which are assumed to be the return on the S&P 500 index. (7) As in Campbell et al. (2001), I adjust downward realized stock market variance for 1987:Q4 because the 1987 stock market crash has confounding effects on it. The stochastically detrended risk-free rate, rrel, is the difference between the nominal risk-free rate and its last four-quarter average. It should be noted that these financial variables are never revised.
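The construction of these variables can be sketched as follows. Several details are assumptions rather than statements from the paper: compounding is taken to mean the product of gross monthly returns, realized variance is taken as the sum of squared daily returns without a mean adjustment, and the four-quarter average in rrel is taken over the four quarters preceding the current one. The function names and numbers are hypothetical.

```python
def compound_quarterly(monthly_returns):
    """Compound simple monthly returns into one quarterly return."""
    gross = 1.0
    for r in monthly_returns:
        gross *= 1.0 + r
    return gross - 1.0


def realized_variance(daily_returns):
    """Sum of squared daily returns within the quarter (no mean adjustment)."""
    return sum(r * r for r in daily_returns)


def detrended_rate(rates):
    """Rate minus the average of the previous four quarters, where defined."""
    return [rates[t] - sum(rates[t - 4:t]) / 4.0 for t in range(4, len(rates))]


# Toy inputs (hypothetical numbers).
market = compound_quarterly([0.10, 0.00, -0.10])
risk_free = compound_quarterly([0.004, 0.004, 0.004])
excess_return = market - risk_free
variance_q = realized_variance([0.01, -0.02, 0.00])
rrel = detrended_rate([1.0, 2.0, 3.0, 4.0, 5.0])
```

The first value of `rrel` is only available from the fifth quarter on, because four earlier quarters are needed to form the average.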

III. EMPIRICAL RESULTS

In this section, I first assume that there are no revisions in net worth and use its latest (2002:Q3) release for all vintages. This assumption, which is unlikely to affect the results in any qualitative manner, allows us to investigate the performance of cay using vintages from 1968:Q2 to 2002:Q3 in forecasting stock market returns over the period 1968:Q3 to 2002:Q4, an updated sample analyzed by LL. For robustness, I also analyze a shorter subsample of vintages from 1996:Q1 to 2002:Q3, over which real-time net worth is available. (8) To disentangle the effect of data revisions from the look-ahead bias argued by Brennan and Xia (2005) and Avramov (2002), I compare two specifications. First, I estimate cay recursively using only real-time data collected in this paper. Second, following Brennan and Xia, I estimate cay recursively using the latest (2002:Q3) release. For example, at the beginning of 1996:Q1, I estimate the cointegration parameters using observations up to 1995:Q3. It should be noted that I allow macrovariables to be available with a one-quarter delay, as in real-time data. Therefore, the two approaches are identical if there were no data revisions. Lastly, I also include realized market variance and the stochastically detrended risk-free rate as additional forecasting variables in some specifications. Realized market variance improves the forecasting power of cay because of an omitted variable problem. The stochastically detrended risk-free rate provides additional information about future stock returns for the reason mentioned in footnote 1; however, I find qualitatively similar results if I exclude it from the forecasting equation.
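The recursive forecasting exercise can be sketched in simplified form. The code below uses an expanding window and a single predictor and compares the forecast with a constant-return benchmark given by the historical mean. It takes the predictor as given, whereas in the paper cay itself is re-estimated for each vintage, and the function names and data are hypothetical.

```python
def ols_fit(x, y):
    """Intercept and slope of a one-regressor least squares fit."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def recursive_forecasts(predictor, returns, first):
    """One-step-ahead forecasts for periods first, first + 1, ...

    predictor[t] is observed at the end of period t and returns[t] is realized
    over period t, so predictor[t - 1] is used to forecast returns[t].
    """
    model, benchmark = [], []
    for t in range(first, len(returns)):
        past_x = predictor[:t - 1]   # predictor values dated 0, ..., t-2
        past_r = returns[1:t]        # returns dated 1, ..., t-1
        intercept, slope = ols_fit(past_x, past_r)
        model.append(intercept + slope * predictor[t - 1])
        benchmark.append(sum(past_r) / len(past_r))
    return model, benchmark


# Toy data (hypothetical) in which returns[t] = 1 + 0.5 * predictor[t - 1].
predictor = [1.0, 2.0, 4.0, 3.0, 5.0, 2.0]
returns = [0.0, 1.5, 2.0, 3.0, 2.5, 3.5]
model, benchmark = recursive_forecasts(predictor, returns, first=3)
```

Only information dated before the forecast period enters each fit, which is what distinguishes the recursive exercise from a full-sample regression.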

[FIGURE 4 OMITTED]

A. Vintages 1968:Q2 to 2002:Q3

Figure 5 plots the recursively estimated cointegration parameters of Equation (3) for the vintages 1968:Q2 through 2002:Q3 across time. The thin solid line is [[beta].sub.y,v] and the thin dashed line is [[beta].sub.a,v]. For comparison, I also superimpose the thick solid line ([[beta].sub.y,v]) and the thick dashed line ([[beta].sub.a,v]), which are estimated recursively using the latest (2002:Q3) vintage, as in Brennan and Xia (2005). Real-time cointegration parameters change substantially after the comprehensive benchmark revisions denoted by the vertical bars. For both real-time and current vintage data, the estimated cointegration parameters appear to be relatively stable after 1991, although they fluctuate widely during the earlier periods because of the relatively small number of observations used in the estimation. Interestingly, the real-time estimates move closely with their current vintage counterparts after 1996 when the BEA switched to the chained weighting from the fixed weighting. Therefore, neither data revisions nor the look-ahead bias should have any sizeable effect on the cointegration parameters after 1996.

Figure 6 plots the adjusted [R.sup.2] of in-sample regression using vintages 1968:Q2 to 2002:Q3 for real-time data (solid line). Throughout the paper, I use an expanding sample for the in-sample regression. For comparison, I also plot the adjusted [R.sup.2] obtained from the current vintage (dashed line) over the corresponding period. In addition to cay, I also include realized market variance and the stochastically detrended risk-free rate in the forecasting equation. To conserve space, I simply note that, consistent with Guo (2006), past market variance improves the forecasting power of cay substantially in all vintages; however, the results are not sensitive to whether I include the stochastically detrended risk-free rate or not. These results are available on request. The adjusted [R.sup.2] of real-time cay is always above 15% and exhibits a similar pattern to that of the current vintage, especially after 1996. Again, I simply note that cay is statistically significant at the conventional level except in a few early vintages, in which I do not have a sufficient number of observations to estimate the cointegration parameters precisely, as shown in Figure 5. To summarize, consistent with LL, cay is a strong in-sample predictor of stock market returns in real-time data.

Table 4 compares the out-of-sample performance of the real-time consumption-wealth ratio ([cay.sub.RT]) with that of the current vintage ([cay.sub.CV]). I consider two specifications: (1) cay only (columns (2) and (3)) and (2) cay augmented by past market variance, [[sigma].sup.2], and the stochastically detrended risk-free rate, rrel (columns (4) and (5)). I also include a benchmark of constant stock market return (column (1)). Panel A presents results for the vintages 1968:Q2 to 2002:Q3.

The results are consistent with those found in early studies. As in Brennan and Xia (2005) and Avramov (2002), recursively estimated cay by itself (column (3)) generates a bigger root mean squared forecasting error (RMSFE) than the benchmark model (column (1)). However, consistent with Guo (2006), it regains the out-of-sample forecasting power when combined with realized market variance and the stochastically detrended risk-free rate (column (5)). Therefore, the look-ahead bias does not explain the forecasting power of cay in the current vintage data.

I also find some interesting new results. First, real-time data perform substantially worse than the current vintage does: [cay.sub.RT] (column (2)) has a RMSFE of .0957, which is much bigger than the RMSFE of .0916 for [cay.sub.CV] (column (3)). The Diebold and Mariano (1995) test indicates that the difference of the squared forecasting error between the two models is significant at the 10% level. Second, adding realized market variance and the stochastically detrended risk-free rate also improves the forecasting ability of real-time cay: RMSFE decreases substantially from .0957 to .0938. Third, real-time cay still performs much worse than the current vintage data in the augmented specification and the Diebold and Mariano test shows that the difference is statistically significant at the 5% level. More importantly, real-time cay has a much higher RMSFE than the benchmark model. Panel B reports qualitatively the same results for the forecast period 1996:Q2 to 2002:Q4. Therefore, data revisions have an important effect on the out-of-sample forecasting abilities of cay.
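The two evaluation tools used here can be sketched as follows. The Diebold and Mariano statistic below is the simplest one-step-ahead version under squared-error loss, with no correction for serial correlation in the loss differential and a normal approximation for the p-value; whether the paper applies further adjustments is not stated, so this is an assumption. The function names and numbers are hypothetical.

```python
from math import erfc, sqrt


def rmsfe(actual, forecast):
    """Root mean squared forecasting error."""
    n = len(actual)
    return sqrt(sum((a - f) ** 2 for a, f in zip(actual, forecast)) / n)


def diebold_mariano(actual, forecast_1, forecast_2):
    """DM statistic and two-sided normal p-value for squared-error loss."""
    d = [(a - f1) ** 2 - (a - f2) ** 2
         for a, f1, f2 in zip(actual, forecast_1, forecast_2)]
    n = len(d)
    d_bar = sum(d) / n
    var_d = sum((x - d_bar) ** 2 for x in d) / n
    stat = d_bar / sqrt(var_d / n)
    p_value = erfc(abs(stat) / sqrt(2.0))  # equals 2 * (1 - Phi(|stat|))
    return stat, p_value


# Toy forecasts (hypothetical numbers): model 1 has larger squared errors.
actual = [0.0, 0.0, 0.0, 0.0]
forecast_1 = [1.0, 2.0, 1.0, 2.0]
forecast_2 = [1.0, 1.0, 1.0, 1.0]
stat, p_value = diebold_mariano(actual, forecast_1, forecast_2)
error = rmsfe([0.0, 0.0], [0.3, 0.4])
```

A positive statistic indicates that the first forecast has the larger average squared error.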

Leitch and Tanner (1991), among many others, argue that we should also evaluate out-of-sample predictive abilities using economic criteria. To address this issue, I investigate whether one can exploit forecasting abilities of cay using a simple and popular trading strategy, that is, holding stocks (bonds) if the one-quarter-ahead forecast of excess return is positive (negative). For brevity, I report only the results of the augmented model, which are similar to the model that uses only cay. For comparison, in Figure 7, I first plot the return on the managed portfolio (thick solid line) based on the current vintage data, along with the return on the buy-and-hold portfolio (dashed line). I find that the switching strategy avoids several large downward movements in the stock market. Overall, based on this strategy, a $100 investment in 1968:Q3 grew to $7,621 by 2002:Q4, which is over three times as much as the $2,483 gained with the buy-and-hold strategy. In stark contrast, however, Figure 8 shows that the switching strategy (thick solid line) performs poorly in real time; for example, it misses most stock market run-ups in the 1990s. Overall, real-time switching strategy turned the $100 investment into $2,567, slightly above that of the buy-and-hold strategy.
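The switching rule described above can be sketched as follows. Treating a forecast of exactly zero as a signal to hold bonds, and ignoring transaction costs, are assumptions of this sketch; the function names and numbers are hypothetical.

```python
def switching_strategy(forecasts, stock_returns, bond_returns, initial=100.0):
    """Terminal wealth from holding stocks when the forecast excess return is
    positive and bonds otherwise."""
    wealth = initial
    for forecast, stock, bond in zip(forecasts, stock_returns, bond_returns):
        wealth *= 1.0 + (stock if forecast > 0 else bond)
    return wealth


def buy_and_hold(stock_returns, initial=100.0):
    """Terminal wealth from holding stocks in every period."""
    wealth = initial
    for stock in stock_returns:
        wealth *= 1.0 + stock
    return wealth


# Toy quarterly data (hypothetical numbers).
forecasts = [0.01, -0.02, 0.03]
stock_returns = [0.10, -0.20, 0.05]
bond_returns = [0.01, 0.01, 0.01]
managed = switching_strategy(forecasts, stock_returns, bond_returns)
passive = buy_and_hold(stock_returns)
```

In this toy example the managed portfolio avoids the one negative stock return, so it ends above the buy-and-hold portfolio.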

The BEA used a fixed weighting price index before 1996, which suffers from a so-called substitution bias. This bias is amplified in the calculation of cay because, as in LL, I use different price deflators for consumption, labor income, and net worth. Consequently, real-time cay is overly downward biased during the early 1990s relative to the current (2002:Q3) vintage, which is free of the substitution bias. Unfortunately, the data source does not provide enough details to allow us to construct the chained weighting price index prior to 1996. However, I might partially solve this problem by deflating labor income and net worth using the price deflator of consumption as defined by Equation (1), rather than that of the personal consumption expenditure. Figure 9 shows that, although the switching strategy based on modified real-time cay captured some stock market run-ups in the early 1990s, it still missed quite a few large gains during the period 1996-97. Overall, the $100 investment grew to $3,542, which is still far less than that from the current vintage. Also, the modified real-time cay has a bigger RMSFE than the benchmark of constant return as well. Therefore, the predictive ability of real-time cay is statistically and economically weaker than that of cay constructed from revised data.

[FIGURE 5 OMITTED]

[FIGURE 6 OMITTED]

[FIGURE 7 OMITTED]

[FIGURE 8 OMITTED]

B. Vintages 1996:Q1 to 2002:Q3

The post-1996 vintages are particularly interesting for the following reasons. First, over this period, I have all the required data, including net worth, to construct real-time cay. Second, the BEA switched to the chained weighting scheme in 1996. Focusing on the post-1996 subsample, I avoid the undesirable complication brought about by the substitution bias associated with the fixed weighting scheme used previously. Third, a relatively large number of observations are required to obtain sensible estimates of the cointegration parameters. As shown in Figure 5, the parameter estimates appear to be quite stable from the 1996 vintage on. Therefore, the vintages of 1996:Q1 to 2002:Q3 allow us to make a reliable assessment about the forecasting ability of cay in real time.

Table 5 reports the RMSFE. For comparison, I also include the results of the current vintage and the benchmark model of constant return, which are the same as those in Panel B of Table 4. Again, real-time data perform substantially worse than the current vintage: RMSFE is .1145 for [cay.sub.RT] and is .1123 for the augmented [cay.sub.RT], compared with .1063 and .1014 for [cay.sub.CV] and the augmented [cay.sub.CV], respectively. Also, the Diebold and Mariano test shows the difference is significant at the 5% level in both cases. Most importantly, they are both larger than the RMSFE of .1042 for the benchmark model of constant return, indicating that real-time cay has negligible out-of-sample forecasting power for stock returns. In contrast, the augmented [cay.sub.CV] beats the benchmark of constant return. Therefore, cay does not forecast stock market returns out of sample mainly because of data revisions, which I discuss further below.

[FIGURE 9 OMITTED]

[FIGURE 10 OMITTED]

[FIGURE 11 OMITTED]

Figure 10 plots the one-quarter-ahead forecast from both real-time data (solid line with square) and current vintage (dashed line with triangle), along with the realized stock market returns (thick solid line). Compared with the current vintage, the real-time forecast is downward biased during the period 1996-97 and is upward biased during the period 2000-01. This is because, as shown in Figure 11, real-time cay (solid line) is severely downward biased during the period 1996-97 and is upward biased during the period 2000-01 relative to the current vintage (dashed line). (9) In hindsight, the current vintage suggests that the stock market did not exhibit irrational exuberance until the middle of 1998, more than 1 yr after the remarks by Federal Reserve Bank Chairman Alan Greenspan. Similarly, LL claim that "over this period (last 5 yr), consumption often remained far below its trend relationship with assets and labor earnings." That is, cay has been below the sample average since 1995:Q4 in the 1998:Q3 vintage reported by LL. However, in the 2002:Q4 vintage constructed by the same authors, cay remains above the sample average until 1997:Q3 (not reported here). (10) Therefore, the concern about the stock market overvaluation is clearly exaggerated in real-time data. This difference also accounts for the poor performance of real-time switching strategy, as shown in Figure 12. A real-time investor would have missed stock market run-ups in 1996-97 and suffered from a big loss in 2000 (thick solid line with square), compared with the outcome from the current vintage (dashed line with triangle).

It is clear that the discrepancy between real-time data and the latest release reflects the ongoing revisions by the BEA. As shown in the preceding section, the final labor income data incorporate substantial new information in the post-1996 sample. In particular, the BEA reclassified the employer contributions of government employee retirement plans as "other labor income" instead of "transfer payments to persons" in the 1999 comprehensive revision. Accordingly, the dividend and interest paid to these plans were reclassified as personal dividend income and personal interest income, respectively. As a result, labor income defined by Equation (2) was substantially revised downward (Panel B in Figure 4) and cay was revised upward for the period 1996-97.

The 1996-97 episode provides some interesting insight into the debate about cay as a forecasting variable. On the one hand, it is clear that investors should be cautious in using cay if they must rely on the information provided by the BEA. On the other hand, it also provides support for cay as a theoretically motivated variable because, as expected, it has the best forecasting power when it is properly measured, that is, in the current vintage data. These two conclusions are consistent with one another. For example, if an investor had correctly reclassified the employer contributions of government employee retirement plans from the very beginning, he would have achieved a better investment outcome.

[FIGURE 12 OMITTED]
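The switching strategy compared in Figure 12 can be illustrated with a minimal sketch. It assumes a simple rule, which may differ in detail from the one used in this paper: hold stocks in a quarter when the one-quarter-ahead forecast of the excess return is positive, and hold the risk-free asset otherwise. The function name `switching_wealth` and all numbers in the usage lines are hypothetical and are not taken from the paper.

```python
from typing import List, Sequence


def switching_wealth(
    forecast_excess: Sequence[float],
    stock_return: Sequence[float],
    riskfree_return: Sequence[float],
    initial_wealth: float = 1.0,
) -> List[float]:
    """Cumulative wealth path of a stock/bond switching rule.

    Assumed rule (illustrative only): invest fully in stocks in a quarter
    when the forecast excess return is positive, otherwise invest fully in
    the risk-free asset.
    """
    if not (len(forecast_excess) == len(stock_return) == len(riskfree_return)):
        raise ValueError("all input series must have the same length")

    wealth = initial_wealth
    path: List[float] = []
    for fcst, r_stock, r_free in zip(forecast_excess, stock_return, riskfree_return):
        # The investor earns the stock return only when the forecast is bullish.
        realized = r_stock if fcst > 0 else r_free
        wealth *= 1.0 + realized
        path.append(wealth)
    return path


# Hypothetical quarterly returns: two strong quarters followed by a crash.
stock = [0.05, 0.08, -0.10]
bond = [0.01, 0.01, 0.01]
# A forecast that is too pessimistic early and too optimistic late,
# versus one with the opposite signs.
pessimistic_then_optimistic = [-0.02, -0.01, 0.03]
optimistic_then_pessimistic = [0.02, 0.01, -0.01]

print(switching_wealth(pessimistic_then_optimistic, stock, bond))
print(switching_wealth(optimistic_then_pessimistic, stock, bond))
```

In this toy example the first forecast series sits out the run-up and is invested during the crash, which is the qualitative pattern described for the real-time investor in 1996-97 and 2000.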

IV. CONCLUSION

In the past two decades, there has been an ongoing debate about stock return predictability in the U.S. data. While many authors find that some variables forecast stock market returns in sample, others argue that the in-sample evidence is mainly the result of data mining because these variables have negligible out-of-sample predictive power. Recently, Lettau and Ludvigson (2001) show that the consumption-wealth ratio forecasts stock returns out of sample. In this paper, I find that, although it forecasts stock market returns out of sample in the revised data, the consumption-wealth ratio has negligible forecasting ability in real-time data. The difference reflects the fact that the real-time consumption-wealth ratio is a biased estimate of its true value because of the ongoing revisions of consumption and labor income data. The results provide support for the consumption-wealth ratio as a theoretically motivated variable. Nevertheless, they also suggest that practitioners, for example, investors and monetary policy makers, should be cautious if they need to use real-time data to predict stock market movements.

ABBREVIATIONS

BEA: Bureau of Economic Analysis

CRSP: Center for Research in Security Prices

CWR: Consumption-Wealth Ratio

LL: Lettau and Ludvigson

OSF: Out-of-Sample Forecast

RMSFE: Root Mean Squared Forecasting Error

RTD: Real-Time Data

S&P: Standard & Poor's

SMTS: Stock Market Timing Strategies

SMV: Stock Market Volatility

SRP: Stock Return Predictability

REFERENCES

Ang, A., and G. Bekaert. "Stock Return Predictability: Is It There?" Review of Financial Studies, 20, 2007, 651-707.

Avramov, D. "Stock Return Predictability and Model Uncertainty." Journal of Financial Economics, 64, 2002, 423-58.

Bernanke, B., and M. Gertler. "Agency Costs, Net Worth, and Business Fluctuations." American Economic Review, 79, 1989, 14-31.

--. "Monetary Policy and Asset Price Volatility." Federal Reserve Bank of Kansas City Economic Review, Fourth Quarter, 1999, 17-51.

--. "Should Central Banks Respond to Movements in Asset Prices?" American Economic Review Papers and Proceedings, 91, 2001, 253-57.

Bossaerts, P., and P. Hillion. "Implementing Statistical Criteria to Select Return Forecasting Models: What Do We Learn?" Review of Financial Studies, 12, 1999, 405-28.

Brennan, M., and Y. Xia. "Tay's as Good as Cay." Finance Research Letters, 2, 2005, 1-14.

Campbell, J. "Stock Returns and the Term Structure." Journal of Financial Economics, 18, 1987, 373-99.

Campbell, J., and J. Cochrane. "By Force of Habit: A Consumption-Based Explanation of Aggregate Stock Market Behavior." Journal of Political Economy, 107, 1999, 205-51.

Campbell, J., A. Lo, and C. MacKinlay. The Econometrics of Financial Markets. Princeton, NJ: Princeton University Press, 1997.

Campbell, J., M. Lettau, B. Malkiel, and Y. Xu. "Have Individual Stocks Become More Volatile? An Empirical Exploration of Idiosyncratic Risk." Journal of Finance, 56, 2001, 1-43.

Cecchetti, S., H. Genberg, J. Lipsky, and S. Wadhwani. Geneva Reports on the World Economy, Vol. 2. Geneva: International Center for Monetary and Banking Studies, 2000.

Cooper, M., R. Gutierrez, and W. Marcum. "On the Predictability of Stock Returns in Real Time." Journal of Business, 78, 2005, 469-99.

Croushore, D., and T. Stark. "A Real-Time Data Set for Macroeconomists: Does the Data Vintage Matter?" Working Paper 1999-21, Federal Reserve Bank of Philadelphia, 1999.

Diebold, F., and R. Mariano. "Comparing Predictive Accuracy." Journal of Business and Economic Statistics, 13, 1995, 253-63.

Fama, E., and K. French. "Business Conditions and Expected Returns on Stocks and Bonds." Journal of Financial Economics, 25, 1989, 23-49.

Ferson, W., S. Sarkissian, and T. Simin. "Spurious Regressions in Financial Economics?" Journal of Finance, 58, 2003, 1393-413.

Goyal, A., and I. Welch. "Predicting the Equity Premium with Dividend Ratios." Management Science, 49, 2003, 639-54.

Guo, H. "Limited Stock Market Participation and Asset Prices in a Dynamic Economy." Journal of Financial and Quantitative Analysis, 39, 2004, 495-516.

--. "On the Out-of-Sample Predictability of Stock Market Returns." Journal of Business, 79, 2006, 645-70.

Guo, H., and R. Savickas. "Idiosyncratic Volatility, Stock Market Volatility, and Expected Stock Returns." Journal of Business and Economic Statistics, 24, 2006, 43-56.

Inoue, A., and L. Kilian. "In-Sample or Out-of-Sample Tests of Predictability: Which One Should We Use?" Econometric Reviews, 23, 2005, 371-402.

Leitch, G., and E. Tanner. "Economic Forecast Evaluation: Profits Versus Conventional Error Measures." American Economic Review, 81, 1991, 580-90.

Lettau, M., and S. Ludvigson. "Consumption, Aggregate Wealth, and Expected Stock Returns." Journal of Finance, 56, 2001, 815-49.

--. "Time-Varying Risk Premia and the Cost of Capital: An Alternative Implication of the q Theory of Investment." Journal of Monetary Economics, 49, 2002, 31-66.

--. "Measuring and Modeling Variation in the Risk-Return Tradeoff." Unpublished Working Paper, New York University, 2003.

Lo, A., and A. MacKinlay. "Data-Snooping Biases in Tests of Financial Asset Pricing Models." Review of Financial Studies, 3, 1990, 431-67.

Mankiw, G., D. Runkle, and M. Shapiro. "Are Preliminary Announcements of the Money Stock Rational Forecasts?" Journal of Monetary Economics, 14, 1984, 15-27.

Merton, R. "On Estimating the Expected Return on the Market: An Exploratory Investigation." Journal of Financial Economics, 8, 1980, 323-61.

Patelis, A. "Stock Return Predictability and the Role of Monetary Policy." Journal of Finance, 52, 1997, 1951-72.

Rudd, J., and K. Whelan. "A Note on the Cointegration of Consumption, Income, and Wealth." Finance and Economics Discussion Series 2002-53, Board of Governors of the Federal Reserve System, 2002.

HUI GUO

I thank the editor, Dennis Jansen, an anonymous referee, and the conference participants at the 2006 FMA annual meeting in Salt Lake City. Bill Bock, Kamyar Nasseh, and Allison Rodean provided excellent research assistance. The first draft of the paper was finished when Guo was senior economist at the Federal Reserve Bank of St. Louis.

Guo: Assistant Professor, Department of Finance and Real Estate, University of Cincinnati, 418 Carl H. Lindner Hall, PO Box 210195 Cincinnati, OH 45221-0195. Phone (513) 556-7077, Fax (513) 556-0979, E-mail hui.guo@uc.edu

(1.) Bossaerts and Hillion (1999), Ang and Bekaert (2007), and Goyal and Welch (2003) focus on the forecasting variables advocated by the early authors (e.g., Campbell 1987; Fama and French 1989) such as the dividend yield, the default premium, and the term premium. LL show that these variables lose their predictive abilities if cay is also included in the forecasting equation for stock market returns. There is an exception: The stochastically detrended risk-free rate advocated by Campbell, Lo, and MacKinlay (1997), among many others, provides additional information about future returns. Patelis (1997) suggests that variables such as the stochastically detrended risk-free rate forecast stock returns because they reflect the stance of monetary policies, which have state-dependent effects on real economic activities through a credit channel (e.g., Bernanke and Gertler 1989). Also note that Inoue and Kilian (2005) recently argue that out-of-sample forecast tests are not necessarily more reliable than in-sample forecast tests, while the latter have better power.

(2.) Cecchetti et al. (2000), among many others, argue that monetary authorities should incorporate information variables such as cay in the policy-making process because these variables provide a gauge about the deviation of stock prices from their fundamental values and thus forecast aggregate economic activity.

(3.) Rudd and Whelan (2002) suggest alternative measures of consumption, labor income, and asset wealth. This issue is certainly interesting; however, I adopt Lettau and Ludvigson's specification because the main focus of the paper is the effect of the data revision on stock return predictability.

(4.) I thank Michael Palumbo at the Federal Reserve Board for providing real-time net worth data.

(5.) The observations of some vintages start from a later date. For example, after the 1996 comprehensive revision, the BEA did not release the observations prior to 1959 until 1997:Q2.

(6.) http://mba.tuck.dartmouth.edu/pages/faculty/ken.french/data_library.html. I obtain very similar results using the return on the S&P 500 index as well as the Center for Research in Security Prices (CRSP) value-weighted stock market return. These results are available on request.

(7.) I obtain almost identical results using the CRSP daily value-weighted stock market return.

(8.) The BEA uses the fixed weighting scheme before 1996, which, as I show below, has a confounding effect on the forecasting ability of cay. To avoid this complication, I focus on the post-1996 sample, even though the vintages of net worth start from 1995:Q3.

(9.) For each date in Figure 7, for example, 1996:Q1, I first estimate cay using the 1996:Q1 vintage (observations up to 1995:Q4 for the current vintage data), then subtract the sample mean from it, and use the value of the last observation for 1996:Q1.

(10.) I obtain both vintages from Martin Lettau at New York University, which can also be replicated using the real-time data collected in this paper.

TABLE 1
Standard Deviations of Growth Rates

| Data Set | Consumption | Net Worth | Labor Income |
|---|---|---|---|
| Vintages 1968:Q2 to 1999:Q3 | | | |
| Initial | 0.47 | | 0.84 |
| 1 Yr later | 0.49 | | 0.90 |
| 3 Yr later | 0.44 | | 0.88 |
| Latest | 0.46 | | 0.88 |
| Vintages 1996:Q1 to 2001:Q3 | | | |
| Initial | 0.32 | 3.41 | 0.28 |
| 1 Yr later | 0.33 | 3.38 | 0.35 |
| Latest | 0.24 | 3.47 | 0.50 |

Notes: I define the initial released growth rate of consumption as $\Delta c^{p}_{t} = c_{t-1,t} - c_{t-2,t}$, where $c_{t-1,t}$ is the last observation of each vintage. It should be noted that the notation reflects the fact that macrovariables are available with a one-quarter delay. To analyze the effect of the revisions, I also calculate the growth rate in the vintage 1 yr later as $\Delta c^{1}_{t} = c_{t-1,t+4} - c_{t-2,t+4}$, in the vintage 3 yr later as $\Delta c^{3}_{t} = c_{t-1,t+12} - c_{t-2,t+12}$, and in the latest (2002:Q3) vintage as $\Delta c^{c}_{t} = c_{t-1,c} - c_{t-2,c}$. I define the growth rates for labor income and net worth in the same fashion. I report the results of real, per capita consumption and labor income; however, I use nominal net worth because I want to show that nominal net worth is not much revised. All numbers are reported in percentage.

TABLE 2
Correlation between Growth Rates of Different Vintages

| | Initial | 1 Yr Later | 3 Yr Later | Latest |
|---|---|---|---|---|
| Panel A. Consumption | | | | |
| Vintages 1968:Q2 to 1999:Q3 | | | | |
| Initial | 1.00 | | | |
| 1 Yr later | .86 | 1.00 | | |
| 3 Yr later | .75 | .53 | 1.00 | |
| Latest | .69 | .76 | .86 | 1.00 |
| Vintages 1996:Q1 to 2001:Q3 | | | | |
| Initial | 1.00 | | | |
| 1 Yr later | .93 | 1.00 | | |
| Latest | .78 | .87 | | 1.00 |
| Panel B. Labor income | | | | |
| Vintages 1968:Q2 to 1999:Q3 | | | | |
| Initial | 1.00 | | | |
| 1 Yr later | .88 | 1.00 | | |
| 3 Yr later | .84 | .91 | 1.00 | |
| Latest | .81 | .89 | .96 | 1.00 |
| Vintages 1996:Q1 to 2001:Q3 | | | | |
| Initial | 1.00 | | | |
| 1 Yr later | .55 | 1.00 | | |
| Latest | .50 | .34 | | 1.00 |
| Panel C. Net worth | | | | |
| Vintages 1996:Q1 to 2001:Q3 | | | | |
| Initial | 1.00 | | | |
| 1 Yr later | .99 | 1.00 | | |
| Latest | .99 | 1.00 | | 1.00 |

Notes: I define the initial released growth rate of consumption as $\Delta c^{p}_{t} = c_{t-1,t} - c_{t-2,t}$, where $c_{t-1,t}$ is the last observation of each vintage. It should be noted that the notation reflects the fact that macrovariables are available with a one-quarter delay. To analyze the effect of the revisions, I also calculate the growth rate in the vintage 1 yr later as $\Delta c^{1}_{t} = c_{t-1,t+4} - c_{t-2,t+4}$, in the vintage 3 yr later as $\Delta c^{3}_{t} = c_{t-1,t+12} - c_{t-2,t+12}$, and in the latest (2002:Q3) vintage as $\Delta c^{c}_{t} = c_{t-1,c} - c_{t-2,c}$. I define the growth rates for labor income and net worth in the same fashion. I report the results of real, per capita consumption and labor income; however, I use nominal net worth because I want to show that nominal net worth is not much revised.

TABLE 3
Correlation between Revisions and Growth Rates

| Revisions/Data Set | Initial | 1 Yr | 3 Yr | Final |
|---|---|---|---|---|
| Panel A. Consumption | | | | |
| Vintages 1968:Q2 to 1999:Q3 | | | | |
| Initial to 1 yr | -.17 (-1.45) | .35 (3.34) | .28 (3.12) | .20 (2.13) |
| Initial to 3 yr | -.42 (-5.00) | -.07 (-0.79) | .29 (2.88) | .18 (1.78) |
| Initial to final | -.41 (-4.72) | -.14 (-1.76) | .13 (1.44) | .38 (4.80) |
| Vintages 1996:Q1 to 2001:Q3 | | | | |
| Initial to 1 yr | -.10 (-0.53) | .26 (1.35) | | .32 (1.75) |
| Initial to final | -.66 (-4.88) | -.45 (-2.35) | | -.04 (-0.28) |
| Panel B. Labor income | | | | |
| Vintages 1968:Q2 to 1999:Q3 | | | | |
| Initial to 1 yr | -.11 (-1.89) | .38 (2.03) | .28 (1.66) | .28 (1.57) |
| Initial to 3 yr | -.21 (-3.63) | .13 (0.66) | .36 (2.40) | .33 (2.14) |
| Initial to final | -.25 (-3.60) | .07 (0.36) | .26 (1.56) | .37 (2.42) |
| Vintages 1996:Q1 to 2001:Q3 | | | | |
| Initial to 1 yr | -.30 (-3.17) | .64 (4.13) | | -.08 (0.44) |
| Initial to final | -.07 (-0.47) | .04 (-0.18) | | .83 (10.65) |
| Panel C. Net worth | | | | |
| Vintages 1996:Q1 to 2001:Q3 | | | | |
| Initial to 1 yr | -.13 (-0.33) | -.02 (-0.06) | | -.04 (0.15) |
| Initial to final | .08 (0.42) | .17 (0.95) | | .20 (1.15) |

Notes: I define the initial released growth rate of consumption as $\Delta c^{p}_{t} = c_{t-1,t} - c_{t-2,t}$, where $c_{t-1,t}$ is the last observation of each vintage. It should be noted that the notation reflects the fact that macrovariables are available with a one-quarter delay. To analyze the effect of the revisions, I also calculate the growth rate in the vintage 1 yr later as $\Delta c^{1}_{t} = c_{t-1,t+4} - c_{t-2,t+4}$, in the vintage 3 yr later as $\Delta c^{3}_{t} = c_{t-1,t+12} - c_{t-2,t+12}$, and in the latest (2002:Q3) vintage as $\Delta c^{c}_{t} = c_{t-1,c} - c_{t-2,c}$. The revision, for example, from initial to 1 yr, is defined as $\Delta c^{1}_{t} - \Delta c^{p}_{t}$. I define the growth rates and revisions for labor income and net worth in the same fashion. I report the results of real, per capita consumption and labor income; however, I use nominal net worth because I want to show that nominal net worth is not much revised. I report heteroskedasticity-consistent t-statistics in parentheses, and bold denotes significance at the 5% level.

TABLE 4
RMSFE: Current Vintage Net Worth

| | (1) Constant | (2) $cay_{RT}$ | (3) $cay_{CV}$ | (4) $cay_{RT} + rrel + \sigma^{2}$ | (5) $cay_{CV} + rrel + \sigma^{2}$ |
|---|---|---|---|---|---|
| Panel A. 1968:Q3 to 2002:Q4 | | | | | |
| RMSFE | .0907 | .0957 \* | .0916 | .0938 \*\* | .0895 |
| Panel B. 1996:Q2 to 2002:Q4 | | | | | |
| RMSFE | .1042 | .1149 \*\* | .1063 | .1122 \*\* | .1014 |

Notes: This table reports RMSFE of five forecasting models: (1) the benchmark of constant return (Constant); (2) real-time cay ($cay_{RT}$); (3) current vintage cay ($cay_{CV}$); (4) real-time cay augmented by the stochastically detrended risk-free rate and past stock market variance ($cay_{RT} + rrel + \sigma^{2}$); and (5) current vintage cay augmented by the stochastically detrended risk-free rate and past stock market variance ($cay_{CV} + rrel + \sigma^{2}$). I use real-time consumption and labor income data, and the latest vintage of net worth in the construction of real-time cay. In Panel A, I use vintages 1968:Q2 to 2002:Q3 to forecast stock market returns over the period 1968:Q3 to 2002:Q4. In Panel B, I use vintages 1996:Q1 to 2002:Q3 to forecast stock market returns over the period 1996:Q2 to 2002:Q4. I also conduct the Diebold and Mariano (1995) test of the equal forecasting error: \* and \*\* indicate that the squared forecasting error of the real-time data is significantly larger than that of the current vintage at the 10% and 5% levels, respectively.

TABLE 5
RMSFE: Real-Time Net Worth

| | (1) Constant | (2) $cay_{RT}$ | (3) $cay_{CV}$ | (4) $cay_{RT} + rrel + \sigma^{2}$ | (5) $cay_{CV} + rrel + \sigma^{2}$ |
|---|---|---|---|---|---|
| 1996:Q2 to 2002:Q4 | | | | | |
| RMSFE | .1042 | .1145 \*\* | .1063 | .1123 \*\* | .1014 |
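The RMSFE comparison and the Diebold and Mariano (1995) test reported in Tables 4 and 5 can be sketched as follows. This is a minimal illustration under stated assumptions, not the exact procedure used in the paper: forecasts are one step ahead, the loss is the squared error, and the long-run variance of the loss differential is estimated by its sample variance with no autocovariance terms. The function names `rmsfe` and `diebold_mariano` are hypothetical.

```python
import math
from typing import Sequence, Tuple


def rmsfe(actual: Sequence[float], forecast: Sequence[float]) -> float:
    """Root mean squared forecasting error of one forecast series."""
    if len(actual) != len(forecast) or len(actual) == 0:
        raise ValueError("series must be non-empty and of equal length")
    squared_errors = [(a - f) ** 2 for a, f in zip(actual, forecast)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def diebold_mariano(
    actual: Sequence[float],
    forecast_a: Sequence[float],
    forecast_b: Sequence[float],
) -> Tuple[float, float]:
    """One-sided test that forecast_a has a larger squared error than forecast_b.

    Returns the test statistic and the upper-tail p-value from the standard
    normal distribution. Assumes one-step-ahead forecasts (illustrative
    simplification), so no autocovariance correction is applied.
    """
    n = len(actual)
    if n == 0 or len(forecast_a) != n or len(forecast_b) != n:
        raise ValueError("series must be non-empty and of equal length")

    # Loss differential: squared error of A minus squared error of B.
    d = [(y - fa) ** 2 - (y - fb) ** 2
         for y, fa, fb in zip(actual, forecast_a, forecast_b)]
    d_bar = sum(d) / n
    var_d = sum((x - d_bar) ** 2 for x in d) / n
    if var_d == 0.0:
        raise ValueError("loss differential has zero variance")

    stat = d_bar / math.sqrt(var_d / n)
    # Upper-tail probability P(Z > stat) for a standard normal Z.
    p_value = 0.5 * math.erfc(stat / math.sqrt(2.0))
    return stat, p_value
```

In the setting of Table 4, `forecast_a` would play the role of the real-time forecast and `forecast_b` that of the current vintage forecast, so a small p-value corresponds to the starred entries.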


Author: | Guo, Hui |
---|---|

Publication: | Economic Inquiry |

Geographic Code: | 1USA |

Date: | Jan 1, 2009 |

Words: | 8234 |
