Drifts and volatilities: monetary policies and outcomes in the post WWII U.S.
JEL classification: C11, E31, E5
Key words: Bayesian analysis, inflation, monetary policy
This paper extends the model of Cogley and Sargent (2001) to incorporate stochastic volatility and then reestimates it for post World War II U.S. data in order to shed light on the following questions. Have aggregate time series responded via time-invariant linear impulse response functions to possibly heteroskedastic shocks? Or is it more likely that the impulse responses to shocks themselves have evolved over time because of drifting coefficients or other nonlinearities? We present evidence that shock variances evolved systematically over time, but that so did the autoregressive coefficients of VARs. One of our main conclusions is that much of our earlier evidence for drifting coefficients survives after we take stochastic volatility into account. We use our evidence about drift and stochastic volatility to infer that monetary policy rules have changed and that the persistence of inflation itself has drifted over time.
1.1 Time invariance versus drift
The statistical tests of Sims (1980, 1999) and Bernanke and Mihov (1998a, 1998b) seem to affirm a model that contradicts our findings. They failed to reject the hypothesis of time-invariance in the coefficients of VARs for periods and variables like ours. To shed light on whether our results are inconsistent with theirs, we examine the performance of various tests that have been used to detect deviations from time invariance. Except for one, we find that those tests have low power against our particular model of drifting coefficients. And that one test actually rejects time invariance in the data. These results about power help reconcile our findings with those of Sims and Bernanke and Mihov.
1.2 Bad policy or bad luck?
This paper accumulates evidence inside an atheoretical statistical model. (1) But we use the patterns of time variation that our statistical model detects to shed light on some important substantive and theoretical questions about post WWII U.S. monetary policy. These revolve around whether it was bad monetary policy or bad luck that made inflation-unemployment outcomes worse in the 1970s than before or after. The view of DeLong (1997) and Romer and Romer (2002), which they support by stringing together interesting anecdotes and selections from government reports, asserts that it was bad policy. Their story is that during the 1950s and early 1960s, the Fed basically understood the correct model (which in their view incorporates the natural rate theory that asserts that there is no exploitable trade off between inflation and unemployment); that Fed policy makers in the late 1960s and early 1970s were seduced by Samuelson and Solow's (1960) promise of an exploitable trade-off between inflation and unemployment; and that under Volcker's leadership, the Fed came to its senses, accepted the natural rate hypothesis, and focused monetary policy on setting inflation low.
Aspects of this "Berkeley view" receive backing from statistical work by Clarida, Gali, and Gertler (2000) and Taylor (1993), who fit monetary policy rules for subperiods that they choose to illuminate possible differences between the Burns and the Volcker-Greenspan eras. They find evidence for a systematic change of monetary policy across the two eras, a change that in Clarida, Gali, and Gertler's 'new-neoclassical-synthesis' macroeconomic model would lead to better inflation-unemployment outcomes.
But Taylor's and Clarida, Gali, and Gertler's interpretation of the data has been disputed by Sims (1980, 1999) and Bernanke and Mihov (1998a, 1998b), both of whom have presented evidence that the U.S. data do not prompt rejection of the time invariance of the autoregressive coefficients of a VAR. They also present evidence for shifts in the variances of the innovations to their VARs. If one equation of the VAR is interpreted as describing a monetary policy rule, then Sims's and Bernanke and Mihov's results say that it was not the monetary policy strategy but luck (i.e., the volatility of the shocks) that changed between the Burns and the non-Burns periods.
1.3 Inflation persistence and inferences about the natural rate
The persistence of inflation plays an important role in some widely used empirical strategies for testing the natural rate hypothesis and for estimating the natural unemployment rate. As we shall see, inflation persistence also plays an important role in lending relevance to instruments for estimating monetary policy rules. Therefore, we use our statistical model to portray the evolving persistence of inflation. We define a measure of persistence based on the normalized spectrum of inflation at zero frequency, then show how this measure increased during the 1960s and 1970s and fell during the 1980s and 1990s.
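The zero-frequency measure can be computed directly from a set of autoregressive coefficients. The following is a minimal sketch, not the paper's implementation: in the paper the measure is evaluated at the drifting VAR parameters, whereas here the AR coefficients and grid size are purely illustrative.

```python
import numpy as np

def normalized_spectrum_at_zero(ar_coeffs, sigma2=1.0, n_grid=2048):
    """Persistence measure: the spectral density of an AR process at
    frequency zero, normalized by the process variance so the measure
    depends only on the autocorrelation structure."""
    omegas = np.linspace(0.0, np.pi, n_grid)
    lags = np.arange(1, len(ar_coeffs) + 1)
    # AR transfer function: 1 - a_1 e^{-iw} - a_2 e^{-2iw} - ...
    transfer = 1.0 - np.exp(-1j * np.outer(omegas, lags)) @ np.asarray(ar_coeffs)
    spec = sigma2 / (2.0 * np.pi * np.abs(transfer) ** 2)
    # process variance = integral of the spectrum over (-pi, pi] (trapezoid rule)
    step = omegas[1] - omegas[0]
    variance = 2.0 * (np.sum(spec) - 0.5 * (spec[0] + spec[-1])) * step
    return spec[0] / variance
```

For an AR(1) with coefficient rho the closed form is (1 + rho) / ((1 - rho) 2 pi), so the measure rises sharply as persistence approaches a unit root.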
1.4 Drifting coefficients and the Lucas Critique
Drifting coefficients have been an important piece of unfinished business within macroeconomic theory since Lucas played them up in the first half of his 1976 Critique, but then ignored them in the second half. (2) In Appendix A, we revisit how drifting coefficients bear on the theory of economic policy in the context of recent ideas about self-confirming equilibria. This appendix provides background for a view that helps to bolster the time-invariance view of the data taken by Sims and Bernanke and Mihov.
We take a Bayesian perspective and report time series of posterior densities for various economically interesting functions of hyperparameters and hidden states. We use a Markov Chain Monte Carlo algorithm to compute posterior densities.
The remainder of this paper is organized as follows. Section 2 describes the basic statistical model that we use to develop empirical evidence. We consign to appendix B a detailed characterization of the priors and posterior for our model, and appendix C describes a Markov Chain Monte Carlo algorithm that we use to approximate the posterior density. Section 3 reports our results, and section 4 concludes. Appendix A pursues a theme opened in the Lucas Critique about how drifting coefficient models bear on alternative theories of economic policy.
2 A Bayesian Vector Autoregression with Drifting Parameters and Stochastic Volatility
The object of Cogley and Sargent (2001) was to develop empirical evidence about the evolving law of motion for inflation and to relate the evidence to stories about changes in monetary policy rules. To that end, we fit a Bayesian vector autoregression for inflation, unemployment, and a short term interest rate. We introduced drifting VAR parameters, so that the law of motion could evolve, but assumed the VAR innovation variance was constant. Thus, our measurement equation was
$$ y_t = X_t' \theta_t + \varepsilon_t, \qquad (1) $$
where $y_t$ is a vector of endogenous variables, $X_t$ includes a constant plus lags of $y_t$, and $\theta_t$ is a vector of VAR parameters. The residuals, $\varepsilon_t$, were assumed to be conditionally normal with mean zero and constant covariance matrix $R$.
The VAR parameters were assumed to evolve as driftless random walks subject to reflecting barriers. Let
$$ \theta^T = [\theta_1', \ldots, \theta_T']', \qquad (2) $$
represent the history of VAR parameters from dates 1 to T. The driftless random walk component is represented by a joint prior,
$$ f(\theta^T, Q) = f(\theta^T \mid Q)\, f(Q) = f(Q) \prod_{s=0}^{T-1} f(\theta_{s+1} \mid \theta_s, Q), \qquad (3) $$
$$ f(\theta_{t+1} \mid \theta_t, Q) = N(\theta_t, Q). \qquad (4) $$
Thus, apart from the reflecting barrier, $\theta_t$ evolves as
$$ \theta_t = \theta_{t-1} + v_t. \qquad (5) $$
The innovation $v_t$ is normal with mean zero and variance $Q$, and we allowed for correlation between the state and measurement innovations, $\operatorname{cov}(v_t, \varepsilon_t) = C$. The marginal prior $f(Q)$ makes $Q$ an inverse-Wishart variate.
The reflecting barrier was encoded in an indicator function, $I(\theta^T) = \prod_{s=1}^{T} I(\theta_s)$. The function $I(\theta_s)$ takes a value of 0 when the roots of the associated VAR lag polynomial are inside the unit circle (an explosive draw), and it is equal to 1 otherwise. This restriction truncates and renormalizes the random walk prior,
$$ p(\theta^T, Q) \propto I(\theta^T)\, f(\theta^T, Q). \qquad (6) $$
This is a stability condition for the VAR, reflecting an a priori belief about the implausibility of explosive representations for inflation, unemployment, and real interest. The stability prior follows from our belief that the Fed chooses policy rules in a purposeful way. Assuming that the Fed has a loss function that penalizes the variance of inflation, it will not choose a policy rule that results in a unit root in inflation, for that results in an infinite loss. (3)
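In practice the indicator can be evaluated by forming the companion matrix of a draw and checking that every eigenvalue lies strictly inside the unit circle. A minimal sketch, assuming an illustrative parameter layout (each equation stores its intercept first, then its lag coefficients); the paper's own storage order may differ:

```python
import numpy as np

def is_stable(theta, n_vars=3, n_lags=2):
    """Stability indicator for one VAR draw: True iff all eigenvalues of
    the companion matrix are strictly inside the unit circle."""
    k = n_vars * n_lags
    # each row: [intercept, coefficients on lag 1, ..., coefficients on lag n_lags]
    coeffs = np.asarray(theta, dtype=float).reshape(n_vars, 1 + k)
    companion = np.zeros((k, k))
    companion[:n_vars, :] = coeffs[:, 1:]              # autoregressive block
    companion[n_vars:, :-n_vars] = np.eye(k - n_vars)  # identities shift lags down
    return bool(np.max(np.abs(np.linalg.eigvals(companion))) < 1.0)
```

In a posterior sampler this is the natural acceptance check for the reflecting barrier: a proposed draw that fails it can be rejected and redrawn.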
In appendix B, we derive a number of relations between the restricted and unrestricted priors. Among other things, the restricted prior for $\theta^T \mid Q$ can be expressed as
$$ p(\theta^T \mid Q) = \frac{I(\theta^T)\, f(\theta^T \mid Q)}{m_\theta(Q)}, \qquad (7) $$
the marginal prior for $Q$ becomes
$$ p(Q) = \frac{m_\theta(Q)\, f(Q)}{m_Q}, \qquad (8) $$
and the transition density is
$$ p(\theta_{t+1} \mid \theta_t, Q) \propto I(\theta_{t+1})\, f(\theta_{t+1} \mid \theta_t, Q)\, \pi(\theta_{t+1}, Q). \qquad (9) $$
The terms $m_\theta(Q)$ and $m_Q$ are normalizing constants and are defined in the appendix. (4)
In (7), the stability condition truncates and renormalizes $f(\theta^T \mid Q)$ to eliminate explosive $\theta$'s. In (8), the marginal prior $f(Q)$ is re-weighted by $m_\theta(Q)$, the probability of a nonexplosive draw from $f(\theta^T \mid Q)$. This lessens the probability of $Q$-values that are likely to generate explosive $\theta$'s. Since large values of $Q$ make explosive draws more likely, this shifts the prior probability toward smaller values of $Q$. In other words, relative to $f(Q)$, $p(Q)$ is tilted in the direction of less time variation in $\theta$. Finally, in (9), $f(\theta_{t+1} \mid \theta_t, Q)$ is truncated and re-weighted by $\pi(\theta_{t+1}, Q)$. The latter term represents the probability that random walk paths emanating from $\theta_{t+1}$ will remain in the nonexplosive region going forward in time. Thus, the restricted transition density censors explosive draws from $f(\theta_{t+1} \mid \theta_t, Q)$ and down-weights those likely to become explosive. (5)
2.1 Sims's and Stock's criticisms
Sims (2001) and Stock (2001) were concerned that our methods might exaggerate the time variation in $\theta_t$. One comment concerned the distinction between filtered and smoothed estimates. Cogley and Sargent (2001) reported results based on filtered estimates, and Sims pointed out that there is transient variation in filtered estimates even in time-invariant systems. In this paper, we report results based on smoothed estimates of $\theta$.
More importantly, Sims and Stock questioned our assumption that $R$ is constant. They pointed to evidence developed by Bernanke and Mihov (1998a,b), Kim and Nelson (1999), McConnell and Perez Quiros (2000), and others that VAR innovation variances have changed over time. Bernanke and Mihov focused on monetary policy rules and found a dramatic increase in the variance of monetary policy shocks between 1979 and 1982. Kim and Nelson and McConnell and Perez Quiros studied the growing stability of the U.S. economy, which they characterize in terms of a large decline in VAR innovation variances after the mid-1980s. The reason for this decline is the subject of debate, but there is now much evidence against our assumption of constant $R$.
Sims and Stock also noted that there is little evidence in the literature to support our assumption of drifting $\theta$. Bernanke and Mihov, for instance, used a procedure developed by Andrews (1993) to test for shifts in VAR parameters and were unable to reject time invariance. Indeed, their preferred specification was the opposite of ours, with constant $\theta$ and varying $R$.
If the world were characterized by constant $\theta$ and drifting $R$, and we fit an approximating model with constant $R$ and drifting $\theta$, then it seems likely that our estimates of $\theta$ would drift to compensate for the misspecification of $R$, thus exaggerating the time variation in $\theta$. Stock suggested that this might account for our evidence on changes in inflation persistence. There is much evidence to support a positive relation between the level and variance of inflation, but the variance could be high either because of large innovation variances or because of strong shock persistence. A model with constant $\theta$ and drifting $R$ would attribute the high inflation variance of the 1970s to an increase in innovation variances, while a model with drifting $\theta$ and constant $R$ would attribute it to an increase in shock persistence. If Bernanke and Mihov are right, the evidence on inflation persistence reported in Cogley and Sargent (2001) may be an artifact of model misspecification.
2.2 Strategy for sorting out the issues
Of course, it is possible that both the coefficients and the volatilities vary, but most empirical models focus on one or the other. In this paper, we develop an empirical model that allows both to vary. We use the model to consider the extent to which drift in $R$ undermines our evidence on drift in $\theta$, and also to conduct power simulations for the Andrews-Bernanke-Mihov test. Their null hypothesis, which they were unable to reject, was that $\theta$ is time invariant. Whether this constitutes damning evidence against our vision of the world depends on the power of the test. Their evidence would be damning if the test reliably rejected a model like ours, but not so damning otherwise.
To put both elements in motion, we retain much of the specification described above, but now we assume that the VAR innovations can be expressed as
$$ \varepsilon_t = R_t^{1/2} \xi_t, \qquad (10) $$
where $\xi_t$ is a standard normal random vector. Because we are complicating the model by introducing a drifting innovation variance, we simplify in another direction to economize on free parameters. Thus, we also assume that standardized VAR innovations are independent of parameter innovations,
$$ E(\xi_t v_s') = 0 \quad \text{for all } t, s. \qquad (11) $$
To model drifting variances, we adopt a multivariate version of the stochastic volatility model of Jacquier, Polson, and Rossi (1994). (6) In particular, we assume that $R_t$ can be expressed as
$$ R_t = B^{-1} H_t (B^{-1})', \qquad (12) $$
where $H_t$ is diagonal and $B$ is lower triangular,
$$ H_t = \begin{bmatrix} h_{1t} & 0 & 0 \\ 0 & h_{2t} & 0 \\ 0 & 0 & h_{3t} \end{bmatrix}, \qquad (13) $$
$$ B = \begin{bmatrix} 1 & 0 & 0 \\ \beta_{21} & 1 & 0 \\ \beta_{31} & \beta_{32} & 1 \end{bmatrix}. \qquad (14) $$
The diagonal elements of $H_t$ are assumed to be independent, univariate stochastic volatilities that evolve as driftless, geometric random walks,
$$ \ln h_{it} = \ln h_{i,t-1} + \sigma_i \eta_{it}. \qquad (15) $$
The random walk specification is designed for permanent shifts in the innovation variance, such as those emphasized in the literature on the growing stability of the U.S. economy. The volatility innovations, $\eta_{it}$, are standard normal random variables that are independent of one another and of the other shocks in the model, $\xi_t$ and $v_t$. Each volatility innovation is scaled by a free parameter $\sigma_i$ that determines its magnitude. The factorization in (12) and log specification in (15) guarantee that $R_t$ is positive definite. The free parameters in $B$ allow for correlation among the elements of $\varepsilon_t$. The matrix $B$ orthogonalizes $\varepsilon_t$, but it is not an identification scheme.
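A sketch of how equations (12) and (15) generate a drifting, always positive definite innovation covariance; the parameter values here are illustrative, not estimates:

```python
import numpy as np

def simulate_Rt(T, sigma=(0.1, 0.1, 0.1), beta=(0.2, -0.1, 0.3),
                h0=(1.0, 1.0, 1.0), seed=0):
    """Simulate R_t = B^{-1} H_t B^{-1}' with H_t diagonal and each
    ln h_it following a driftless random walk, as in eq. (15)."""
    rng = np.random.default_rng(seed)
    B = np.array([[1.0,     0.0,     0.0],
                  [beta[0], 1.0,     0.0],
                  [beta[1], beta[2], 1.0]])
    B_inv = np.linalg.inv(B)
    ln_h = np.log(np.asarray(h0, dtype=float))
    R = np.empty((T, 3, 3))
    for t in range(T):
        ln_h = ln_h + np.asarray(sigma) * rng.standard_normal(3)  # eq. (15)
        H = np.diag(np.exp(ln_h))
        R[t] = B_inv @ H @ B_inv.T  # eq. (12): positive definite by construction
    return R
```

Because shocks to the log volatilities are permanent, simulated paths exhibit recurrent, permanent variance shifts rather than switching among a fixed set of regimes.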
This specification differs from others in the literature that assume finite-state Markov representations for [R.sub.t]. Our specification has advantages and disadvantages relative to hidden Markov models. One advantage of the latter is that they permit jumps, whereas our model forces the variance to adjust continuously. An advantage of our specification is that it permits recurrent, permanent shifts in variance. Markov representations in which no state is absorbing permit recurrent shifts, but the system forever switches between the same configurations. Markov representations with an absorbing state permit permanent shifts in variance, but such a shift can only occur once. Our specification allows permanent shifts to recur and allows new patterns to develop going forward in time.
We use Markov Chain Monte Carlo (MCMC) methods to simulate the posterior density. (7) Let
$$ Y^T = [y_1', \ldots, y_T']', \qquad (16) $$
$$ H^T = [h_1', \ldots, h_T']', \qquad (17) $$
represent the histories of data and stochastic volatilities up to date $T$, where $h_t = (h_{1t}, h_{2t}, h_{3t})'$. Let $\sigma = (\sigma_1, \sigma_2, \sigma_3)$ stand for the standard deviations of the log-volatility innovations, and let $\beta = [\beta_{21}, \beta_{31}, \beta_{32}]$ represent the free parameters in $B$. The posterior density,
$$ p(\theta^T, Q, \sigma, \beta, H^T \mid Y^T), \qquad (18) $$
summarizes beliefs about the model's free parameters, conditional on priors and the history of observations, $Y^T$.
3 Empirical Results
In order to focus on the influence of drift in $R$, we use the same data as in our earlier paper. Inflation is measured by the CPI for all urban consumers, unemployment by the civilian unemployment rate, and the nominal interest rate by the yield on 3-month Treasury bills. Inflation and unemployment data are quarterly and seasonally adjusted, and Treasury bill data are the average of daily rates in the first month of each quarter. The sample spans 1948.Q1 to 2000.Q4. We work with VAR(2) representations for nominal interest, inflation, and the logit of unemployment.
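The paper does not spell out the exact logit convention; a standard logit applied to the unemployment rate expressed as a fraction is one plausible reading, and it keeps fitted values of unemployment inside their natural bounds when mapped back:

```python
import numpy as np

def logit_unemployment(u_percent):
    """Map an unemployment rate in percent to its logit."""
    u = np.asarray(u_percent, dtype=float) / 100.0
    return np.log(u / (1.0 - u))

def inverse_logit_unemployment(x):
    """Map a logit value back to an unemployment rate in percent."""
    return 100.0 / (1.0 + np.exp(-np.asarray(x, dtype=float)))
```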
The hyperparameters and initial states are assumed to be independent across blocks, so that the joint prior can be expressed as the product of marginal priors,
$$ f(\theta_0, h_{10}, h_{20}, h_{30}, Q, \beta, \sigma_1, \sigma_2, \sigma_3) = f(\theta_0)\, f(h_{10})\, f(h_{20})\, f(h_{30})\, f(Q)\, f(\beta)\, f(\sigma_1)\, f(\sigma_2)\, f(\sigma_3). \qquad (19) $$
Our prior for $\theta_0$ is a truncated Gaussian density,
$$ p(\theta_0) \propto I(\theta_0)\, f(\theta_0) = I(\theta_0)\, N(\hat\theta, \hat P). \qquad (20) $$
The mean and variance of the Gaussian piece are calibrated by estimating a time-invariant vector autoregression using data for 1948.Q3-1958.Q4. The mean, $\hat\theta$, is set equal to the point estimate, and the variance, $\hat P$, is its asymptotic variance. Because the initial estimates are based on a short stretch of data, the location of $\theta_0$ is only weakly restricted.
The matrix $Q$ is a key parameter because it governs the rate of drift in $\theta$. We adopt an informative prior for $Q$, but we set its parameters to maximize the weight that the posterior puts on sample information. Our prior for $Q$ is inverse-Wishart,
$$ f(Q) = IW(\bar Q^{-1}, T_0), \qquad (21) $$
with degrees of freedom $T_0$ and scale matrix $\bar Q$. The degrees of freedom $T_0$ must exceed the dimension of $\theta_t$ in order for this to be proper. To put as little weight as possible on the prior, we set
$$ T_0 = \dim(\theta_t) + 1. \qquad (22) $$
To calibrate $\bar Q$, we assume
$$ \bar Q = \gamma^2 \hat P \qquad (23) $$
and set $\gamma^2 = 3.5 \times 10^{-4}$. This makes $\bar Q$ comparable to the value used in Cogley and Sargent (2001). (8) This setting can be interpreted as a weak version of a 'business as usual' prior, in the sense of Leeper and Zha (2001a,b). The prior is weak because it involves minimal degrees of freedom. It reflects a business-as-usual perspective because the implied value for $\bar Q$ results in little variation in $\theta$. Indeed, had we calibrated $Q = \bar Q$, or set $T_0$ so that a substantial weight was put on the prior, drift in posterior estimates of $\theta$ would be negligible. Thus, the setting for $\bar Q$ is conservative for our vision of the world.
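The calibration in (21)-(23) is mechanical once the training-sample estimate is in hand. A sketch, where `P_hat` stands for the asymptotic covariance of the training-sample VAR estimate (any positive definite matrix in this sketch):

```python
import numpy as np

def calibrate_q_prior(P_hat, gamma2=3.5e-4):
    """Inverse-Wishart prior for Q: minimal proper degrees of freedom
    T0 = dim(theta) + 1, eq. (22), and scale Qbar = gamma^2 * P_hat,
    eq. (23)."""
    T0 = P_hat.shape[0] + 1
    Q_bar = gamma2 * P_hat
    return T0, Q_bar
```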
The parameters governing priors for $R_t$ are set more or less arbitrarily, but also very loosely, so that the data are free to speak about this feature as well. The prior for $h_{i0}$ is log-normal,
$$ f(\ln h_{i0}) = N(\ln \hat h_i, 10), \qquad (24) $$
where $\hat h_i$ is the initial estimate of the residual variance of variable $i$. Notice that a variance of 10 is huge on a natural log scale, making this weakly informative for $h_{i0}$. Similarly, the prior for $\beta$ is normal with a large variance,
$$ f(\beta) = N(0, 10000 \times I_3). \qquad (25) $$
Finally, the prior for $\sigma_i^2$ is inverse gamma with a single degree of freedom,
$$ f(\sigma_i^2) = IG\!\left(\frac{0.01^2}{2}, \frac{1}{2}\right). \qquad (26) $$
The specification is designed to put a heavy weight on sample information.
3.3 Details of the Simulation
We executed 100,000 replications of a Metropolis-within-Gibbs sampler and discarded the first 50,000 to allow for convergence to the ergodic distribution. We checked convergence by inspecting recursive mean plots of various parameters and by comparing results across parallel chains starting from different initial conditions. Because the output files are huge, we saved every 10th draw from the Markov chain, to economize on storage space. This has a side benefit of reducing autocorrelation across draws, but it does increase the variance of ensemble averages from the simulation. This yields a sample of 5000 draws from the posterior density. The estimates reported below are computed from averages of this sample.
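The burn-in and thinning scheme amounts to a strided slice over the stored chain; a minimal sketch:

```python
def thin_chain(draws, burn_in=50_000, stride=10):
    """Discard the burn-in, then keep every `stride`-th draw. Thinning
    economizes on storage and lowers autocorrelation across retained
    draws, at the cost of noisier ensemble averages."""
    return draws[burn_in::stride]
```

With 100,000 replications, a 50,000-draw burn-in, and a stride of 10, this leaves the 5000 retained draws described above.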
3.4 The Posterior Mean of Q
We begin with evidence on the rate of drift in $\theta$, as summarized by posterior estimates of $Q$. Recall that $Q$ is the virtual innovation variance for the VAR parameters. Large values mean rapid movements in $\theta$, smaller values imply a slower rate of drift, and $Q = 0$ represents a time-invariant model. The following table addresses two questions: whether the results are sensitive to the VAR ordering, and how the stability prior influences the rate of drift in $\theta$.
Sims (1980) reported that the ordering of variables in an identified VAR mattered for a comparison of interwar and postwar business cycles. In particular, for one ordering he found minimal changes in the shape of impulse response functions, with most of the difference between interwar and postwar cycles being due to a reduction in shock variances. He suggested to us that the ordering of variables might matter in our model too because of the way VAR innovation variances depend on the stochastic volatilities. In our specification, the first and second variables share common sources of stochastic volatility with the other variables, but the third variable has an independent source of volatility. Shuffling the variables might alter estimates of VAR innovation variances.
Accordingly, we estimated all possible orderings to see whether there exists an ordering that mutes evidence for drift in $\theta$, as in Sims (1980). This seems not to be the case. With the stability condition imposed (our preferred specification), there are only minor differences in posterior estimates of $Q$. The ordering that minimizes the rate of drift in $\theta$ is $[i_t, u_t, \pi_t]'$, and the remainder of the paper focuses on this specification. This is conservative for our perspective, but results for the other orderings are similar.
The second question concerns how the stability prior influences drift in $\theta$. One might conjecture that the stability constraint amplifies evidence for drift in $\theta$ by pushing the system away from the unit root boundary, forcing the model to fit inflation persistence via shifts in the mean. Again, this seems not to be the case; posterior mean estimates for $Q$ are smaller when the stability condition is imposed. Withdrawing the stability prior increases the rate of drift in $\theta$.
The next table explores the structure of drift in $\theta$, focusing on the minimum-$Q$ ordering $[i, u, \pi]'$. Sargent's (1999) learning model predicts that reduced form parameters should drift in a highly structured way, because of the cross-equation restrictions associated with optimization and foresight. A formal treatment of cross-equation restrictions with parameter drift is a priority for future work. Here we report some preliminary evidence based on the principal components of $Q$.
The table confirms that drift in $\theta$ is highly structured. There are 21 free parameters in a trivariate VAR(2) model, but only three linear combinations vary significantly over time. The first principal component accounts for almost half the total variation, the first two components jointly account for more than 80 percent, and the first three account for roughly 95 percent. These components load most heavily on lags of nominal interest and unemployment in the inflation equation; they differ in the relative weights placed on various lags. The remaining principal components, and the coefficients in the nominal interest and unemployment equations, are approximately time invariant. Thus the model's departure from time invariance is not as great as it first may seem. There are two or three drifting components in $\theta$ that manifest themselves in a variety of ways.
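The decomposition behind these shares is an eigendecomposition of the posterior mean of $Q$; a sketch:

```python
import numpy as np

def principal_component_shares(Q):
    """Eigenvalues of a covariance matrix in descending order, their
    eigenvectors, and the cumulative share of total variation (trace)
    accounted for by the leading components."""
    eigvals, eigvecs = np.linalg.eigh(Q)
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    return eigvals, eigvecs, np.cumsum(eigvals) / np.sum(eigvals)
```

The loadings (columns of the eigenvector matrix) identify which VAR coefficients each drifting component moves, which is how the lag-coefficient pattern described above is read off.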
3.5 The Evolution of [R.sub.t]
Next we consider evidence on the evolution of $R_t$. Figure 1 depicts the posterior mean of $R_t$ for the minimum-$Q$ ordering $[i, u, \pi]'$. The left-hand column portrays standard deviations for VAR innovations, expressed in basis points at quarterly rates, and the right-hand column shows correlation coefficients.
[FIGURE 1 OMITTED]
The estimates support the contention that variation in [R.sub.t] is an important feature of the data. Indeed, the patterns shown here resemble those reported by Bernanke and Mihov, Kim and Nelson, McConnell and Perez Quiros, and others.
For example, there is a substantial reduction in the innovation variance for unemployment in the early 1980s. At that time, the standard deviation fell by roughly 40 percent, an estimate comparable to those of Kim and Nelson and McConnell and Perez Quiros. Indeed, this seems to be part of a longer-term trend of growing stability in unemployment innovations. Our estimates suggest that there was a comparable decrease in variance in the early 1960s and that the standard deviation has fallen by a total of roughly 60 percent since the late 1950s. The trend toward greater stability was punctuated in the 1970s and early 1980s by countercyclical increases in variance. Whether the downward drift or the business cycle pattern is likely to recur is an open question.
In addition, between 1979 and 1981, there is a spike in the innovation variances for nominal interest and inflation. The spike in the innovation variance for nominal interest resembles the estimates of Bernanke and Mihov. The two variances fell sharply after 1981 and reverted within a few years to levels achieved in the 1960s.
The right-hand column illustrates the evolution of correlations among the VAR innovations, calculated from the posterior mean, $E(R_{t|T})$. Unemployment innovations were negatively correlated with innovations in inflation and nominal interest throughout the sample. The correlations were largest in magnitude during the Volcker disinflation. At other times, the unemployment innovation was virtually orthogonal to the others. Inflation and nominal interest innovations were positively correlated throughout the sample, with the maximum degree of correlation again occurring in the early 1980s.
This correlation pattern has some bearing on one strategy for identifying monetary policy shocks. McCallum (1999) has argued that monetary policy rules should be specified in terms of lagged variables, on the grounds that the Fed lacks good current-quarter information about inflation, unemployment, and other target variables. This is especially relevant for decisions early in the quarter. If the Fed's policy rule depends only on lagged information, then it can be cast as the nominal interest equation in a VAR. Among other things, this means that nominal interest innovations are policy shocks and that correlations among VAR innovations represent unidirectional causation from policy shocks to the other variables.
The signs of the correlations in figure 1 suggest that this interpretation is problematic for our VAR. If nominal interest innovations were indeed policy shocks, conventional wisdom suggests they should be inversely correlated with inflation and positively correlated with unemployment, the opposite of what we find. A positive correlation with inflation and a negative correlation with unemployment suggest a policy reaction. There must be some missing information. (9)
Finally, figure 2 reports the total prediction variance, $\log|E(R_{t|T})|$. Following Whittle (1953), we interpret this as a measure of the total uncertainty entering the system at each date.
[FIGURE 2 OMITTED]
The smoothed estimates shown here are similar to the filtered estimates reported in our earlier paper. Both suggest a substantial increase in short-term uncertainty between 1965 and 1981 and an equally substantial decrease thereafter. The increase in uncertainty seems to have happened in two steps, one occurring between 1964 and 1972 and the other between 1977 and 1981. Most of the subsequent decrease occurred in the mid-1980s, during the latter years of Volcker's term. This picture suggests that the growing stability of the economy may reflect a return to stability, though the earlier period of stability proved to be short-lived.
3.6 The Evolution of [[theta].sub.t]
There is no question that variation in $R$ is an interesting and important feature of the data, but does it alter the patterns of drift in $\theta$ documented in our earlier paper? Our main interests concern movements in core inflation, the natural rate of unemployment, inflation persistence, the degree of policy activism, and how they relate to one another. Our interest in these features follows from their role in stories about how changes in monetary policy may have contributed to the rise and fall of inflation in the 1970s and 1980s.
3.6.1 Core Inflation and the Natural Rate of Unemployment
The first set of figures depicts movements in core inflation and the natural rate of unemployment, which are estimated from local linear approximations to mean inflation and unemployment, evaluated at the posterior mean, $E(\theta_{t|T})$. Write (1) in companion form as
$$ z_t = \mu_{t|T} + A_{t|T} z_{t-1} + u_t, \qquad (27) $$
where $z_t$ consists of current and lagged values of $y_t$, $\mu_{t|T}$ contains the intercepts in $E(\theta_{t|T})$, and $A_{t|T}$ contains the autoregressive parameters. By analogy with a time-invariant model, mean inflation at $t$ can be approximated by
$$ \bar\pi_t = s_\pi (I - A_{t|T})^{-1} \mu_{t|T}, \qquad (28) $$
where $s_\pi$ is a row vector that selects inflation from $z_t$. Similarly, mean unemployment can be approximated as
$$ \bar u_t = s_u (I - A_{t|T})^{-1} \mu_{t|T}, \qquad (29) $$
where $s_u$ selects unemployment from $z_t$.
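Equations (28) and (29) are the same calculation with different selection vectors. A sketch, checked against a scalar AR(1) written in companion form:

```python
import numpy as np

def local_mean(mu, A, select_row):
    """Local approximation to the mean of one variable in z_t:
    s (I - A)^{-1} mu, as in eqs. (28)-(29), with A and mu evaluated at
    the smoothed parameter estimates for date t."""
    k = A.shape[0]
    z_bar = np.linalg.solve(np.eye(k) - A, mu)  # fixed point of z = mu + A z
    return z_bar[select_row]
```

For an AR(1) with coefficient 0.5 and intercept 1, the implied mean is 1/(1 - 0.5) = 2, which the companion-form computation reproduces.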
Figure 3 portrays the evolution of $\bar\pi_t$ and $\bar u_t$ for the ordering $[i, u, \pi]'$. Two features are worth noting. First, allowing for drift in $R_t$ does not eliminate economically meaningful movements in core inflation or the natural rate. On the contrary, the estimates are similar to those in our earlier paper. Core inflation sweeps up from around 1.5 percent in the early 1960s, rises to a peak of approximately 8 percent in the late 1970s, and then falls to a range of 2.5 to 3.5 percent through most of the 1980s and 1990s. The natural rate of unemployment also rises in the late 1960s and 1970s and falls after 1980.
[FIGURE 3 OMITTED]
Second, it remains true that movements in [[??].sub.t] and [[??].sub.t] are highly correlated with one another, in accordance with the predictions of Parkin (1993) and Ireland (1999). The unconditional correlation is 0.748.
Table 3 and figures 4 and 5 characterize the main sources of uncertainty about these estimates. The table and figures are based on a method developed by Sims and Zha (1999) for constructing error bands for impulse response functions. We start by estimating the posterior covariance matrix for [[??].sub.t] via the delta method,
[FIGURES 4-5 OMITTED]
[V.sub.[??]] = [partial derivative][??]/[partial derivative][theta] [V.sub.[theta]] [partial derivative][??]/[partial derivative][theta]'. (30)
[V.sub.[theta]] is the KT x KT (10) covariance matrix for [[theta].sup.T] and [partial derivative][??]/[partial derivative][theta] is the T x KT matrix of partial derivatives of the function that maps VAR parameters into core inflation, evaluated at the posterior mean of [[theta].sup.T]. The posterior covariance [V.sub.[theta]] is estimated from the ensemble of Metropolis draws, and derivatives were calculated numerically. (11)
[V.sub.[??]] is a large object, and we need a tractable way to represent the information it contains. Sims and Zha recommend error bands based on the first few principal components. (12) Let [V.sub.[??]] = W [LAMBDA] W', where [LAMBDA] is a diagonal matrix of eigenvalues and
W is an orthonormal matrix of eigenvectors. A two-sigma error band for the ith principal component is
[[??].sub.t] [+ or -] 2[[lambda].sup.1/2.sub.i][W.sub.i], (31)
where [[lambda].sub.i] is the variance of the ith principal component and [W.sub.i] is the ith column of W.
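The band construction in (30)-(31) reduces to an eigendecomposition of a posterior covariance matrix. The sketch below uses a synthetic T x T covariance and a made-up mean path; only the mechanics of the Sims-Zha style bands mirror the text.

```python
import numpy as np

rng = np.random.default_rng(1)
T = 40
G = rng.standard_normal((T, 3 * T))
V = G @ G.T / (3 * T)                 # synthetic symmetric PSD covariance
path = np.linspace(1.5, 8.0, T)       # stand-in for the posterior mean path

lam, W = np.linalg.eigh(V)            # eigenvalues ascending, W orthonormal
order = np.argsort(lam)[::-1]         # re-sort descending
lam, W = lam[order], W[:, order]

# Two-sigma band for the i-th principal component, as in (31)
i = 0
upper = path + 2.0 * np.sqrt(lam[i]) * W[:, i]
lower = path - 2.0 * np.sqrt(lam[i]) * W[:, i]

# Cumulative share of total variation, the quantity tabulated in table 3
share = np.cumsum(lam) / lam.sum()
```

A nearly rank-one V would put essentially all of `share` in the first component; spread across several components signals variation in the shape of the path, as discussed below.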
Table 3 reports the cumulative proportion of the total variation for which the principal components account. The second column refers to [V.sub.[??]], the covariance matrix for core inflation, and the third column decomposes the covariance matrix for the natural rate, [V.sub.[??]]. The other columns are discussed below.
One interesting feature is the number of non-trivial components. The first principal component in [V.sub.[??]] and [V.sub.[??]] accounts for 40 to 50 percent of the total variation, and the first 5 jointly account for about 75 percent. This suggests an important departure from time invariance. In a time-invariant model, there would be a single factor representing uncertainty about the location of the terminal estimate, but smoothed estimates going backward in time would be perfectly correlated with the terminal estimate and would contribute no additional uncertainty. (13) V would be a T x T matrix with rank one, and the single principal component would describe uncertainty about the terminal location. In a nearly time-invariant model, i.e. one with small Q, the path to the terminal estimate might wiggle a little, but one would still expect uncertainty about the terminal estimate to dominate. That the first component accounts for a relatively small fraction of the total suggests there is also substantial variation in the shape of the path.
Error bands for core inflation are shown in figure 4. The central dotted line is the posterior mean estimate, reproduced from figure 3. The horizontal line is a benchmark, end-of-sample, time-invariant estimate of mean inflation.
The first principal component, which accounts for roughly half the total variation, describes uncertainty about the location of core inflation in the late 1960s and 1970s. As core inflation increased, so too did uncertainty about the mean, and by the end of the decade a two-sigma band ranged from 2 to 14 percent. The growing uncertainty about core inflation seems to be related to changes in inflation persistence. Core inflation can be interpreted as a long-horizon forecast, and the variance of long-horizon forecasts depends positively on the degree of persistence. As shown below, inflation also became more persistent as core inflation rose. Indeed, our estimates of inflation persistence are highly correlated with the width of the first error band.
Components 3 through 5 portray uncertainty about the number of local peaks in the 1970s, and they jointly account for about 15 percent of the total variation. Bands for these components cross several times, a sign that some paths had more peaks than others. For example, in panel 3, trajectories associated with a global peak at the end of the 1970s tended also to have a local peak at the end of the 1960s. In contrast, paths that reached a global peak in the mid-1970s tended to have a single peak.
Finally, the sixth component loads heavily on the last few years in the sample, describing uncertainty about core inflation in the late 1990s. At the end of 2000, a two-sigma band for this component ranged from approximately 1 to 5 percent.
Error bands for the natural rate are constructed in the same way, and they are shown in figure 5. Once again, the central dotted line is the posterior mean estimate, and the horizontal line is an end-of-sample, time-invariant estimate of mean unemployment. The first principal component in [V.sub.[??]] also characterizes uncertainty about the 1970s. The error band widens in the late 1960s when the natural rate began to rise, and it narrows around 1980 when the mean estimate fell. The band achieved its maximum width around the time of the oil shocks, when it ranged from roughly 4 to 11 percent. The width of this band also seems to be related to changes in the persistence of shocks to unemployment.
The second, third, and fourth components load heavily on the other years of the sample, jointly accounting for about 30 percent of the total variation. Roughly speaking, they cover intervals of plus or minus 1 percentage point around the mean. The fifth and sixth components account for 8 percent of the variation, and they seem to be related to uncertainty about the timing and number of peaks in the natural rate.
3.6.2 Inflation Persistence
Next we turn to evidence on the evolution of second moments of inflation. Second moments are measured by a local linear approximation to the spectrum for inflation,
[f.sub.[pi][pi]]([omega], t) = [s.sub.[pi]][(I - [A.sub.t|T][e.sup.-i[omega]]).sup.-1] [E([R.sub.t|T])/2[pi]] [(I - [A'.sub.t|T][e.sup.i[omega]]).sup.-1][s'.sub.[pi]], (32)
evaluated at the posterior mean of [theta] and R. An estimate of [f.sub.[pi][pi]]([omega], t) is shown in figure 6. Time is plotted on the x-axis, frequency on the y-axis, and power on the z-axis.
[FIGURE 6 OMITTED]
Again, the estimates are similar to those reported in Cogley and Sargent (2001). The introduction of drift in [R.sub.t] does not undermine our evidence on variation in the spectrum for inflation.
The most significant feature of this graph is the variation over time in the magnitude of low frequency power. In our earlier paper, we interpreted the spectrum at zero as a measure of inflation persistence. Here that interpretation is no longer quite right, because variation in low-frequency power depends not only on drift in the autoregressive parameters, [A.sub.t|T], but also on movements in the innovation variance, E([R.sub.t|T]). In this case, the normalized spectrum,
[g.sub.[pi][pi]]([omega], t) = [f.sub.[pi][pi]]([omega], t)/[[integral].sup.[pi].sub.-[pi]] [f.sub.[pi][pi]]([omega], t) d[omega], (33)
provides a better measure of persistence. The normalized spectrum is the spectrum divided by the variance in each year. The normalization adjusts for changes in innovation variances and measures autocorrelation rather than autocovariance. We interpret [g.sub.[pi][pi]](0, t) as a measure of inflation persistence.
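The normalization in (33) is easy to check numerically in a simple case. The sketch below computes the spectrum of a univariate AR(1), x_t = rho x_{t-1} + e_t (an illustrative process, not the paper's VAR), normalizes it by the variance, and compares the result at frequency zero with the closed form g(0) = (1 + rho)/(2[pi](1 - rho)); e.g. rho = 0.85 gives g(0) of about 2 and rho = 0.97 gives about 10.

```python
import numpy as np

rho, sigma2 = 0.9, 1.0
w = np.linspace(-np.pi, np.pi, 20001)

# AR(1) spectrum: f(w) = sigma^2 / (2 pi |1 - rho e^{-iw}|^2)
f = sigma2 / (2 * np.pi * np.abs(1 - rho * np.exp(-1j * w)) ** 2)

def trapezoid(y, x):
    """Trapezoidal rule, written out to avoid depending on a NumPy version."""
    return float(np.sum((y[1:] + y[:-1]) * np.diff(x)) / 2.0)

variance = trapezoid(f, w)        # integral of f over [-pi, pi] = Var(x_t)
g = f / variance                  # normalized spectrum, integrates to one

g0_closed_form = (1 + rho) / (2 * np.pi * (1 - rho))
```

Because the integral of f recovers the variance, dividing by it converts autocovariance into autocorrelation, which is why g(0, t) serves as a persistence measure.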
Estimates of the normalized spectrum are shown in figure 7. As in figure 6, the dominant feature is the variation over time in low-frequency power, though the variation in [g.sub.[pi][pi]](0, t) differs somewhat from that in [f.sub.[pi][pi]](0, t). Instead of sharp spikes in the 1970s, [g.sub.[pi][pi]](0, t) sweeps gradually upward in the latter half of the 1960s and remains high throughout the 1970s. The spectrum at zero falls sharply after 1980, and there is discernible variation throughout the remainder of the sample.
[FIGURE 7 OMITTED]
Figure 8 depicts two-sigma error bands for [g.sub.[pi][pi]](0, t), based on the principal components of its posterior covariance matrix, which was estimated in the same way as [V.sub.[??]] or [V.sub.[??]]. The fourth column in table 3 indicates that the first component of this covariance matrix accounts for only 37 percent of the total variation and that the first 5 components jointly account for 84 percent. Again, this signifies substantial variation in the shape of the path for [g.sub.[pi][pi]](0, t).
[FIGURE 8 OMITTED]
Error bands for the first two components load heavily on the 1970s. Although the bands suggest there was greater persistence than in the early 1960s or mid-1990s, the precise magnitude of the increase is hard to pin down. Roughly speaking, error bands for the first two components suggest that [g.sub.[pi][pi]](0, t) was somewhere between 2 and 10. For the sake of comparison, a univariate AR(1) process with coefficients of 0.85 to 0.97 has values of [g.sub.[pi][pi]](0) in this range. In contrast, the figure suggests that inflation was approximately white noise in the early 1960s and not far from white noise in the mid-1990s. Uncertainty about inflation persistence was increasing again at the end of the sample.
The third, fourth, and fifth components reflect uncertainty about the timing and number of peaks in [g.sub.[pi][pi]](0, t). For example, panels 3 and 5 suggest that paths on which there was a more gradual increase in persistence tended to have a big global peak in the late 1970s, while those on which there was a more rapid increase tended to have comparable twin peaks, first in the late 1960s and then again in 1980. Panel 4 suggests that some paths had twin peaks at the time of the oil shocks, while others had a single peak in 1980. These components jointly account for about 17 percent of the total variation.
One of the questions in which we are most interested concerns the relation between inflation persistence and core inflation. In Cogley and Sargent (2001), we reported evidence of a strong positive correlation. Here we also find a strong positive correlation, equal to 0.92.
The relation between the two series is illustrated in figure 9, which reproduces estimates from figures 3 and 7. As core inflation rose in the 1960s and 1970s, inflation also became more persistent. Both features fell sharply during the Volcker disinflation. This correlation is problematic for the escape route models of Sargent (1999) and Cho, Williams, and Sargent (2002), which predict that inflation persistence grows along the transition from high to low inflation. Our estimates suggest the opposite pattern.
[FIGURE 9 OMITTED]
3.6.3 Monetary Policy Activism
Finally, we consider evidence on the evolution of policy activism. Following Clarida, Gali, and Gertler (2000), we estimate this from a forward-looking Taylor rule with interest smoothing,
[i.sub.t] = [[beta].sub.0] + [[beta].sub.1][E.sub.t][[bar.[pi]].sub.t,t+[h.sub.[pi]]] + [[beta].sub.2][E.sub.t][[bar.u].sub.t,t+[h.sub.u]] + [[beta].sub.3][i.sub.t-1] + [e.sub.t], (34)
where [[bar.[pi]].sub.t,t+[h.sub.[pi]]] represents average inflation from t to t + [h.sub.[pi]] and [[bar.u].sub.t,t+[h.sub.u]] is average unemployment. The activism parameter is defined as A = [[beta].sub.1][(1 - [[beta].sub.3]).sup.-1], and the policy rule is said to be activist if A [greater than or equal to] 1. With a Ricardian fiscal policy, an activist monetary rule delivers a determinate equilibrium. Otherwise, sunspots may matter for inflation and unemployment.
We interpret the parameters of the policy rule as projection coefficients and compute projections from our VAR. This is done via two-stage least squares on a date-by-date basis. The first step involves projecting the Fed's forecasts [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] and [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] onto a set of instruments, and the second involves projecting current interest rates onto the fitted values. At each date, we parameterize the VAR with posterior mean estimates of [[theta].sub.t] and [R.sub.t] and calculate population projections associated with those values.
The instruments chosen for the first-stage projection must be elements of the Fed's information set. Notice that a complete specification of their information set is unnecessary; a subset of their conditioning variables is sufficient for forming first-stage projections, subject of course to the order condition for identification. Among other variables, the Fed observes lags of inflation, unemployment, and nominal interest when making current-quarter decisions, and we project future inflation and unemployment onto a constant and two lags of each. Thus, our instruments for the Fed's forecasts of average inflation and average unemployment are the corresponding VAR forecasts of [[bar.[pi]].sub.t,t+[h.sub.[pi]]] and [[bar.u].sub.t,t+[h.sub.u]], respectively.
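The date-by-date two-stage least squares described above can be sketched on simulated data. Everything below (the instrument set, the coefficient values, the variable names) is a made-up placeholder; the paper instead computes population projections implied by the posterior mean VAR at each date.

```python
import numpy as np

rng = np.random.default_rng(2)
T = 200

# Instruments: constant plus stand-ins for two lags each of inflation and
# unemployment (simulated, not actual data)
Z = np.column_stack([np.ones(T), rng.standard_normal((T, 4))])
pi_fut = Z @ np.array([0.5, 0.6, 0.2, -0.3, 0.1]) + rng.standard_normal(T)
u_fut = Z @ np.array([5.0, -0.1, 0.4, 0.2, -0.2]) + rng.standard_normal(T)
i_lag = rng.standard_normal(T)

def ols(y, X):
    return np.linalg.lstsq(X, y, rcond=None)[0]

# First stage: project the forecast targets onto the instrument set
pi_hat = Z @ ols(pi_fut, Z)
u_hat = Z @ ols(u_fut, Z)

# Second stage: project the interest rate onto fitted values and its own lag
i_t = 0.8 * i_lag + 1.5 * pi_fut - 0.5 * u_fut + rng.standard_normal(T)
X2 = np.column_stack([np.ones(T), pi_hat, u_hat, i_lag])
b0, b1, b2, b3 = ols(i_t, X2)

A_hat = b1 / (1.0 - b3)   # activism parameter A = beta_1 (1 - beta_3)^{-1}
```

The same arithmetic, repeated at every date with VAR-implied projections, yields the path of A shown in figure 10.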
Here we follow McCallum, who warns against the assumption that the Fed sees current quarter inflation and unemployment when making decisions. This strategy also sidesteps assumptions about how to orthogonalize current quarter innovations, which is an important advantage of the Clarida et al. approach relative to structural VAR methods. Establishing that the Fed can observe some variables is easier than compiling a complete list of what the Fed sees.
It does impose cross-equation restrictions on the VAR, however, since it relates one-step ahead forecasts for the nominal interest rate to averages of multi-step forecasts of inflation and unemployment. We checked these cross-equation restrictions by comparing one-step ahead VAR forecasts for the interest rate with those implied by the estimated Clarida et al. rule, and we found that the two forecasts track one another very closely. The mean difference between the two is only 1 basis point at an annual rate, and the standard deviation is only 8 basis points. The VAR predictions are marginally better, with an innovation standard deviation of 0.877 versus 0.879 for the Clarida et al. rule, but the difference is in the third decimal place. Thus, the cross-equation restrictions seem admissible.
Point estimates for A are shown in figure 10. Here we assume [h.sub.[pi]] = 4 and [h.sub.u] = 2, but the results for one-quarter ahead forecasts are similar. The estimates broadly resemble those reported by Clarida et al., as well as those in our earlier paper. The estimated policy rule was activist in the early 1960s, but became approximately neutral in the late 1960s. In the early 1970s, the policy rule turned passive, and it remained so until the early 1980s. The estimate of A rose sharply around the time of the Volcker disinflation and has remained in the activist region ever since. As shown in figure 11, the estimates of A are inversely related to core inflation and the normalized spectrum at zero, suggesting that changes in policy activism may have contributed to the rise and fall of inflation as well as to changes in its persistence.
[FIGURES 10-11 OMITTED]
Figure 12 suggests, however, that some qualifications are necessary, especially at the beginning and end of the sample. The figure portrays two-sigma error bands based on the principal components of the posterior covariance matrix, [V.sub.A]. The last column of table 3 shows that several principal components contribute to [V.sub.A], with the first component accounting for about two-thirds of the total variation. That there is more than one important component is evidence for variation in the path of A.
[FIGURE 12 OMITTED]
But the shape of the path is well determined only in the middle of the sample. The first four principal components record substantial uncertainty at the beginning and end. We interpret this as a symptom of weak identification. Substantial uncertainty about A occurs at times when inflation is weakly persistent. Our instruments have little relevance when future inflation is weakly correlated with lagged variables, and the policy rule parameters are weakly identified at such times. Thus, inferences about A are fragile at the beginning and end of the sample. There is better evidence of changes in A during the middle of the sample. Lagged variables are more relevant as instruments for the 1970s, when inflation and unemployment were very persistent, and for that period the estimates are more precise.
The next figure characterizes more precisely how the posterior for [A.sub.t] differs across the Burns and Volcker-Greenspan terms. It illustrates histograms for [A.sub.t] for the years 1975, 1985, and 1995. The histograms were constructed by calculating an activism parameter for each draw of [[theta].sub.t] and [R.sub.t] in our simulation, for a total of 5000 in each year. (14) Values for 1975 are shown in black, those for 1985 are in white, and estimates for 1995 are shown in gray.
In 1975, the probability mass was concentrated near 1, and the probability that [A.sub.t] > 1 was 0.208. By 1985, the center of the distribution had shifted to the right, and the probability that [A.sub.t] > 1 had increased to 0.919. The distribution for 1995 is similar to that for 1985, with a 0.941 probability that [A.sub.t] > 1. Comparing estimates along the same sample paths, the probability that [A.sub.t] increased between 1975 and 1985 is 0.923, and the probability that it increased between 1975 and 1995 is 0.943.
[FIGURE 13 OMITTED]
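The probabilities quoted above are straightforward Monte Carlo summaries of posterior draws. The sketch below reproduces the mechanics with synthetic draws of the activism parameter; the normal distributions are invented for illustration and are not the paper's posterior.

```python
import numpy as np

rng = np.random.default_rng(3)
n_draws = 5000

# Synthetic posterior draws of A_t for two dates (placeholder distributions)
A_1975 = rng.normal(0.8, 0.3, n_draws)     # mass concentrated near 1
A_1985 = rng.normal(1.6, 0.4, n_draws)     # shifted to the right

p_1975 = np.mean(A_1975 > 1.0)             # P(A_t > 1) in 1975
p_1985 = np.mean(A_1985 > 1.0)             # P(A_t > 1) in 1985
p_increase = np.mean(A_1985 - A_1975 > 0)  # pathwise comparison, draw by draw
```

Note the distinction between comparing marginal distributions and comparing draws along the same sample path; the latter is what "comparing estimates along the same sample paths" refers to.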
The estimates seem to corroborate those reported by Clarida et al.: monetary policy was passive in the 1970s and activist for much of the Volcker-Greenspan era. Estimates for the latter period are less precise, but it seems clear that the probability distribution for [A.sub.t] shifted to the right.
3.7 Tests for [theta] Stability
Finally, we consider classical tests for variation in [theta]. Bernanke and Mihov (1998a, b) were also concerned about the potential for shifts in VAR parameters arising from changes in monetary policy, and they applied a test developed by Andrews (1993) to examine stability of [theta]. For reduced form vector autoregressions similar to ours, they were unable to reject the hypothesis of time invariance.
We applied the same test to our data and found the same results. We considered two versions of Andrews's sup-LM test, one that examines parameter stability for the VAR as a whole and another that tests stability on an equation-by-equation basis. The results are summarized in table 4. Columns labelled with variable names refer to single-equation tests, and the column labelled 'VAR' refers to a test for the system as a whole. In each case, we fail to reject that [theta] is time invariant. (15)
Bernanke and Mihov correctly concluded that the test provides little evidence against stability of [theta]. But does the result constitute evidence against parameter instability? A failure to reject provides evidence against an alternative hypothesis only if it has reasonably high power. Whether this test has high power against a model like ours is an open question, so we decided to investigate it.
To check the power of the test, we performed a Monte Carlo simulation using our drifting parameter VAR as a data generating process. To generate artificial data, we parameterized equation (1) with draws of [[theta].sup.T], [H.sup.T], and B from the posterior density. For each draw of ([[theta].sup.T], [H.sup.T], B), we generated an artificial sample for inflation, unemployment, and nominal interest and then calculated the sup-LM statistics. We performed 10,000 replications and counted the fraction of samples in which the null hypothesis of constant [theta] is rejected at the 5 percent level. The results are summarized in the second row of table 4.
The power of the test is never very high. The VAR test has the highest success rate, detecting drift in [theta] in about one-fourth of the samples. The detection probabilities are lower in the single equation tests, which reject at the 5 percent level in only about 14 percent of the samples. Thus, even when [theta] drifts in the way we describe, a failure to reject is at least 3 times as likely as a rejection.
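The power calculation described above has a simple skeleton: simulate from a drifting-parameter data generating process, apply a stability test, and count rejections. The sketch below uses a much simpler drifting-mean process and a placeholder critical value rather than Andrews's sup-LM statistic and its tabulated critical values, so only the structure of the experiment carries over, not the numbers.

```python
import numpy as np

rng = np.random.default_rng(4)

def sup_break_stat(y, trim=0.15):
    """Largest two-sample |t| statistic over interior candidate break dates."""
    T = len(y)
    stats = []
    for k in range(int(trim * T), int((1 - trim) * T)):
        a, b = y[:k], y[k:]
        s2 = a.var(ddof=1) / len(a) + b.var(ddof=1) / len(b)
        stats.append(abs(a.mean() - b.mean()) / np.sqrt(s2))
    return max(stats)

def simulate(T=160, q=0.02):
    """Observables around a mean that drifts as a driftless random walk."""
    mu = np.cumsum(np.sqrt(q) * rng.standard_normal(T))
    return mu + rng.standard_normal(T)

crit = 3.0        # placeholder critical value, NOT Andrews's tabulated value
n_rep = 200
rejections = sum(sup_break_stat(simulate()) > crit for _ in range(n_rep))
power = rejections / n_rep
```

The rejection frequency `power` plays the role of the detection probabilities reported in the second row of table 4.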
Andrews's test is designed to have power against alternatives involving a single shift in [theta] at some unknown break date. The results of this experiment may just reflect that this test is less well suited to detect alternatives such as ours that involve continual shifts in parameters. Accordingly, we also investigate a test developed by Nyblom (1989) and Hansen (1992) that is designed to have power against alternatives in which parameters evolve as driftless random walks. Results for the Nyblom-Hansen test are summarized in table 5.
When applied to actual data, the Nyblom-Hansen test also fails to reject time invariance for [theta]. To examine its power, we conducted another Monte Carlo simulation using our drifting parameter VAR as a data generating mechanism, and we found that this test also has low power against our representation. Indeed, the detection probabilities are a bit lower than those for the sup-LM test.
Boivin (1999) conjectures that the sup-Wald version of Andrews's test may have higher power than the others, and so we also consider this procedure. The results, which are shown in table 6, provide some support for his conjecture. The detection probability is higher in each case, and it is substantially higher for the inflation equation. Indeed, this is the only case among the ones we study in which the detection probability exceeds 50 percent. It is noteworthy that in this case we also strongly reject time invariance in the actual data. Time invariance is also rejected for the VAR as a whole.
We made two other attempts to concentrate power in promising directions. The first focuses on parameters of the Clarida et al. policy rule. If drift in [theta] is indeed a manifestation of changes in monetary policy, then tests for stability of the latter should be more powerful than for stability of the former. The vector [theta] has high dimension, and the drifting components in [theta] should lie in a lower-dimensional subspace corresponding to drifting policy parameters. (16) To test stability of the Clarida et al. rule, we estimated a version for the period 1959-2000 using our data and instruments, and calculated Andrews's statistics. (17) Perhaps surprisingly in light of their results, the tests fail to reject time invariance (see table 7). We repeated the procedure for artificial samples generated from our VAR to check the power of the test. Once again, the results show that the tests have low detection probabilities.
We also tried to increase power by concentrating on a single linear combination of [theta] that we think is most likely to vary. The linear combination with greatest variance is the first principal component, and we used the dominant eigenvector of the sample variance of E([DELTA][[theta].sub.t|T]) to measure this component. As figures 14 and 15 illustrate, the first principal component dominates the variation in [[theta].sub.t|T]; (18) most of the other principal components are approximately time invariant. The first component is also highly correlated with variation in the features discussed above. Thus it seems to be a promising candidate on which to concentrate.
[FIGURES 14-15 OMITTED]
Yet the results of a Monte Carlo simulation, shown in table 8, suggest that power remains low, with a rejection probability of only about 15 percent. Indeed, the procedure is inferior to the VAR tests reported above; remaining agnostic about which components of [theta] drift seems to work better. Despite the low power, one of the tests rejects time invariance in actual data.
To summarize, most of our tests fail to reject time invariance of [theta], but most also have low power to detect the patterns of drift we describe above. In the one case where a test has a better-than-even chance of detecting drift in [theta], time invariance is rejected in the actual data at better than the one-percent level. One reasonable interpretation is that [theta] is drifting, but that most of the procedures are unable to detect it.
Perhaps low power should not be a surprise. Our model nests the null of time invariance as a limiting case, i.e. when Q = 0. One can imagine indexing a family of alternative models in terms of Q. For Q close to zero, size and power should be approximately the same. Power should increase as Q gets larger, and eventually the tests are likely to reject with high probability. But in between there is a range of alternative models, arrayed in terms of increasing Q, that the tests are unlikely to reject. The message of the Monte Carlo detection statistics is that a model such as ours with economically meaningful drift in [theta] often falls in the indeterminate range.
4 Conclusions
One respectable view is that either an erroneous model, insufficient patience, or an inability to commit to a better policy made Arthur Burns respond to the end of Bretton Woods by administering monetary policy in a way that produced the greatest peacetime inflation in U.S. history; and that an improved model, more patience, or greater discipline led Paul Volcker to administer monetary policy in a way that conquered American inflation. (19) Another respectable view is that what distinguished Burns and Volcker was not their models or policies but their luck. This paper and its predecessor (Cogley and Sargent (2001)) fit time series models that might help distinguish these views.
This paper also responds to Sims's (2001) and Stock's (2001) criticism of the evidence for drifting systematic parts of vector autoregressions in Cogley and Sargent (2001) by altering our specification to include stochastic volatility. While we have found evidence for drifting variances within our new specification, we continue to find evidence that the VAR coefficients have drifted, mainly along one important direction. Our model is atheoretical, but for reasons discussed in Appendix A and also by Sargent (1999) and Benati (2001), the presence of drifting coefficients contains clues about whether government policy makers' models or preferences have evolved over time.
It is appropriate to be cautious in accepting evidence either for or against drifting coefficients. For reasons that are most clear in continuous time (see Anderson, Hansen, and Sargent (2000)), it is much more difficult to detect evidence for movements in the systematic part of a vector autoregression than it is to detect stochastic volatility. This situation is reflected in the results of our experiments with implementing Bernanke and Mihov's tests under an artificial economy with drifting coefficients.
A Theories of economic policy
Contrasting visions of aggregate economic time series that we attribute to Lucas (1976), Sargent and Wallace (1976), and Sims (1982) can be represented within the following modification of the setting of Lucas and Sargent (1981). A state vector [x.sub.t] [member of] X evolves according to the possibly nonlinear stochastic difference equation
[x.sub.t+1] - [x.sub.t] = f([x.sub.t], t, [u.sub.t], [v.sub.t], [[epsilon].sub.t+1]) (35)
where [u.sub.t] [member of] U is a vector of decisions of private agents, [v.sub.t] [member of] V is a vector of decisions by the government, and [[epsilon].sub.t] [member of] E is an i.i.d. sequence of random variables with cumulative distribution function [PHI]. A particular example of (35) is
[x.sub.t+1] - [x.sub.t] = [mu]([x.sub.t], t, [u.sub.t], [v.sub.t]) + [sigma]([x.sub.t], t) [[epsilon].sub.t+1] (36)
where [PHI] is Gaussian. Borrowing terms from the corresponding continuous time diffusion specification, we call [mu] the drift and [sigma] the volatility.
Suppose that [u.sub.t] and [v.sub.t] are governed by the sequences of history-dependent policy functions
[u.sub.t] = h([x.sup.t], t), (37)
[v.sub.t] = g([x.sup.t], t), (38)
where [x.sup.t] denotes the history of [x.sub.s], s = 0, ..., t. Under the sequences of decision rules (37) and (38), equation (36) becomes
[x.sub.t+1] - [x.sub.t] = [mu]([x.sub.t], t, h([x.sup.t], t), g([x.sup.t], t)) + [sigma]([x.sub.t], t)[[epsilon].sub.t+1]. (39)
This is a nonlinear vector autoregression with stochastic volatility.
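A minimal simulation makes the structure of (36) and (39) concrete. The drift and volatility functions below (mean reversion and level-dependent volatility) are arbitrary placeholders chosen for the sketch, not functional forms from the paper.

```python
import numpy as np

rng = np.random.default_rng(5)

def mu(x, t):
    """Illustrative drift: simple mean reversion toward zero."""
    return -0.1 * x

def sigma(x, t):
    """Illustrative volatility: rises with the level of the state."""
    return 0.2 * (1.0 + 0.5 * np.abs(x))

# Simulate x_{t+1} - x_t = mu(x_t, t) + sigma(x_t, t) * eps_{t+1}
T = 500
x = np.zeros(T)
for t in range(T - 1):
    x[t + 1] = x[t] + mu(x[t], t) + sigma(x[t], t) * rng.standard_normal()
```

With the policy functions h and g substituted in, as in (39), [mu] becomes history dependent, which is the sense in which the closed-loop system is a nonlinear vector autoregression with stochastic volatility.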
Economic theory restricts h and g. Private agents' optimum problems and market equilibrium conditions imply a mapping (20)
h = [T.sub.h](f, g) (40)
from the technology and information process f and the government policy g to the private sector's equilibrium policy h. Given [T.sub.h], the normative theory of economic policy would have the government choose g as the solution of the problem
[max.sub.g] E [[summation].sup.[infinity].sub.t=0] [[delta].sup.t] W([x.sub.t], [u.sub.t], [v.sub.t]), (41)
where W is a one-period welfare criterion, [delta] [member of] (0, 1) is a discount factor, and the optimization is subject to (36) and (40). Notice that the government chooses both g and h, although its manipulation of h is subject to (40). Problem (41) is called a Stackelberg or Ramsey problem.
Lucas's (1976) Critique was directed against a faulty econometric policy evaluation procedure that ignores constraint (40). The faulty policy evaluation problem is (21)
[max.sub.g] E [[summation].sup.[infinity].sub.t=0] [[delta].sup.t] W([x.sub.t], [u.sub.t], [v.sub.t]) (42)
subject to (36) and h = [??], where [??] is a fixed sequence of decision rules for the private sector. Lucas pointed out first that problem (42) ignores (40) and second that a particular class of models that had been used for [??] were misspecified because they imputed irrational expectations to private decision makers. Let us express the government's possibly misspecified econometric model for [??] through
[??] = S(f, g, h), (43)
which maps the truth as embodied in the f, g, h that actually generate the data into the government's beliefs about private agents' behavior. The function S embodies the government's model specification and also its estimation procedures. See Sargent (1999) for a concrete example of S within a model of the Phillips curve.
The faulty policy evaluation problem (42) induces
g = [T.sub.g](f, [??]). (44)
The heart of the Lucas critique is that this mapping does not solve the appropriate policy problem (41).
A.1 Positive implications of imperfect policy making
What outcomes should we expect under the faulty econometric policy evaluation procedure? The answer depends partly on how the government's econometric estimates [??] respond to observed outcomes through the function (43). Suppose that the government begins with an initial specification [[??].sub.0] and consider the following iterative process for j [greater than or equal to] 1:
[g.sub.j] = [T.sub.g](f, [[??].sub.j-1]), (45)
[h.sub.j] = [T.sub.h](f, [g.sub.j]), (46)
[[??].sub.j] = S(f, [g.sub.j], [h.sub.j]). (47)
In step (45), for fixed [[??].sub.j-1], the government solves the faulty policy problem (42); in step (46) the private sector responds to the government policy [g.sub.j]; in step (47), the government adjusts its econometric model [[??].sub.j] to reflect outcomes under government policy [g.sub.j] . We can write the iterative process more compactly as
[g.sub.j] = B(f, [g.sub.j-1]) (48)
where B(f, [g.sub.j-1]) = [T.sub.g](f, S(f, [g.sub.j-1], [T.sub.h](f, [g.sub.j-1]))). Eventually, this iterative process might settle down to a fixed point
g = B(f, g). (49)
In the spirit of Fudenberg and Levine (1993), Fudenberg and Kreps (1995), and Sargent (1999), a self-confirming equilibrium is a government policy g that satisfies (49) and an associated government belief [??].
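The iteration (45)-(47), collapsed into the composite map (48), is repeated application of a single operator, and a self-confirming equilibrium is a fixed point of that operator. The toy sketch below makes only that structural point: the contraction standing in for B(f, .) is invented for illustration and has nothing to do with the economic content of [T.sub.g], S, or [T.sub.h].

```python
def B(g):
    """Placeholder for the composite mapping T_g(f, S(f, g, T_h(f, g))).
    A simple contraction with fixed point 2.0, purely illustrative."""
    return 0.5 * g + 1.0

# Iterate g_j = B(g_{j-1}) until it settles on a fixed point g = B(g)
g = 0.0
for _ in range(100):
    g_next = B(g)
    if abs(g_next - g) < 1e-12:
        break
    g = g_next
```

Whether the economic iteration converges at all, and to what, depends on the properties of the true composite mapping; the sketch only shows what "settling down to a fixed point" means mechanically.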
In the following subsections, we first use the iterative scheme (45), (46), (47) to make contact with part of Lucas's critique. Then we relate the fixed point (49) to the views and practices of Sims (1982, 1999).
A.2 Adaptation: reconciling two parts of the Lucas critique
Lucas's (1976) Critique consisted of two parts. The first part of Lucas's paper summarized empirical evidence for drift in representations like (36), that is, dependence of [mu] on t, and interpreted it as evidence against particular econometric specifications that had attributed suboptimal forecasts about (x, v) to private agents. The second part of his paper focused on three concrete examples designed to show how the mapping (40) from g to h would influence time series outcomes. Though Lucas didn't explicitly link the first and second parts, a reader can be forgiven for thinking that he meant to suggest that a substantial part of the drift in [mu] described in the first part of his paper came from drift in private agents' decision rules that had been induced through mapping (40) by drift in government decision rules.
If we could somehow make a version of the iterative process (45), (46), (47) occur in real time, we would get a model of coefficient drift that is consistent with this vision. The literature on least squares learning gets such a real time model by attributing to both private agents and the government a sophisticated kind of adaptive behavior in which the mappings [T.sub.g], [T.sub.h], S play key roles. This literature uses recursive versions of least squares learning to deduce drift in g whose average behavior can eventually be described by the ordinary differential equation (22)
dg/dt = B(f, g) - g. (50)
In this way it is possible to use the transition dynamics of adaptive systems based on (45), (46), (47) to explain the parameter drift that Lucas emphasized in the first part of his critique. Sargent (1999) and Cho, Williams, and Sargent (2002) pursue this line and use it to build models of drifting unemployment-inflation dynamics. (23,24)
A.3 Another view: asserting a self-confirming equilibrium
Another view takes the data generating mechanism to be the self-confirming equilibrium composed of (49) and (36), unadorned by any transition dynamics based on (45), (46), (47). (25) This view assumes that any adaptation had ended before the sample began. It would either exclude parameter drift or else would interpret it as consistent with a self-confirming equilibrium. (26) Thus, parameter drift would reflect nonlinearities in the law of motion (36) that are accounted for in decision making processes (i.e., hyperparameters would not be drifting). That g is a fixed point of (49) either excludes government policy regime shifts or requires that they be interpreted as equilibrium government best responses that are embedded in the mapping [T.sub.g] in (44) and that are discounted by private agents in the mapping [T.sub.h].
A.4 Empirical issues
Inspired by theoretical work within the adaptive tradition that permits shifts in policy outside of a self-confirming equilibrium, our earlier paper (Cogley and Sargent (2001)) used a particular nonlinear vector autoregression (39) to compile evidence about how the systematic part of the autoregression, [x.sub.t] + [mu]([x.sup.t]) in (39), has drifted over time. Our specification excluded stochastic volatility (we assumed that [sigma]([x.sub.t], t) = [??]). We appealed to adaptive models and informally interpreted the patterns of 'drifting coefficients' in our nonlinear time series model partly as reflecting shifting behavior rules of the Fed, shifts due to the Fed's changing preferences or views of the economy. (27)
Sims (1999) and Bernanke and Mihov (1998a, 1998b) analyzed a similar data set in a way that seems compatible with a self-confirming equilibrium within a linear time-invariant structure. They used specializations of the vector time series model (39) that incorporate stochastic volatility but not drift in the systematic part of a linear vector autoregression. Their models can be expressed as
[x.sub.t+1] - [x.sub.t] = A[x.sub.t] + [sigma]([x.sub.t], t)[[member of].sub.t+1], (51)
where we can regard [x.sub.t] as including higher order lags of variables and A is composed of companion submatrices. They compiled evidence that this representation fits post World War II data well and used it to interpret the behavior of the monetary authorities. They found that the systematic part of the vector autoregression A did not shift over time, but that there was stochastic volatility ([sigma]([x.sub.t], t) [not equal to] [??]). Thus, they reconciled the data with a linear autoregression in which shocks drawn from time-varying distributions nevertheless feed through the system linearly in a time-invariant way. They reported a lack of evidence for alterations in policy rules (in contrast to the perspective taken for example by Clarida, Gali, and Gertler (2000)).
In this paper, we fit a model of the form (39) that permits both drifting coefficients and stochastic volatility, thereby generalizing both our earlier model and some of the specifications of Bernanke and Mihov and Sims. We use this specification to confront criticisms from Sims (2001) and Stock (2001), both of whom suggested that our earlier results were mainly artifacts of our exclusion of stochastic volatility.
B The Relation Between the Restricted and Unrestricted Models
The prior for the unrestricted model is
f([[theta].sup.T],Q, [OMEGA]) = f([[theta].sup.T]|Q, [OMEGA]) f(Q, [OMEGA]), (52)
where [[theta].sup.T] represents the VAR parameters, Q is their innovation variance, and [OMEGA] stands for everything else. Because of the independence assumptions on the prior, this can be written as
f([[theta].sup.T],Q, [OMEGA]) = f([[theta].sup.T]|Q) f(Q)f([OMEGA]). (53)
The restricted model adds an a priori condition that rules out explosive values of [theta],
p([[theta].sup.T],Q, [OMEGA]) = I([[theta].sup.T]) f([[theta].sup.T], Q, [OMEGA])/[integral] [integral] [integral] I([[theta].sup.T]) f([[theta].sup.T],Q, [OMEGA]) d[[theta].sup.T]dQd[OMEGA]. (54)
Thus, the stability condition truncates and renormalizes the unrestricted prior.
We can factor f([[theta].sup.T],Q, [OMEGA]) as before to obtain
p([[theta].sup.T],Q, [OMEGA]) = I([[theta].sup.T]) f([[theta].sup.T]|Q) f(Q) f([OMEGA])/[[integral][integral] I([[theta].sup.T]) f([[theta].sup.T]|Q) f(Q) d[[theta].sup.T]dQ] [[integral] f([OMEGA]) d[OMEGA]] = I([[theta].sup.T]) f([[theta].sup.T]|Q) f(Q) f([OMEGA])/[integral][integral] I([[theta].sup.T]) f([[theta].sup.T]|Q) f(Q) d[[theta].sup.T]dQ, (55)
where the last equality follows from the fact that f([OMEGA]) is proper. Now define
[m.sub.[theta]](Q) [equivalent to] [integral] I([[theta].sup.T]) f([[theta].sup.T]|Q) d[[theta].sup.T], (56)
[m.sub.Q] [equivalent to] [integral] [m.sub.[theta]](Q) f(Q)dQ. (57)
The term [m.sub.[theta]](Q) is the conditional probability of a non-explosive draw from the unrestricted transition density, f([[theta].sup.T]|Q), as a function of Q. The number [m.sub.Q] is the mean of the conditional probabilities, averaged across draws from the marginal prior f(Q). Since both are probabilities, it follows that
0 [less than or equal to] [m.sub.[theta]](Q) [less than or equal to] 1, (58) 0 [less than or equal to] [m.sub.Q] [less than or equal to] 1.
The left-hand inequality for [m.sub.Q] is strict if there is some chance of a non-explosive draw for some value of Q. For finite T there always is.
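To illustrate (56)-(58) concretely, consider a deliberately simplified univariate case in which the unrestricted prior is f([theta]|Q) = N(0, Q) and the nonexplosive region is |[theta]| < 1. Then [m.sub.[theta]](Q) can be estimated by Monte Carlo (a sketch under those assumed forms, not the paper's trivariate VAR prior; the function name is ours):

```python
import numpy as np

def m_theta(Q, n=100_000, rng=None):
    """Estimate m_theta(Q): the probability that a draw from the
    unrestricted prior f(theta|Q) = N(0, Q) lands in the stable
    region |theta| < 1 (univariate illustration of eq. 56)."""
    rng = rng or np.random.default_rng(0)
    theta = np.sqrt(Q) * rng.standard_normal(n)
    return np.mean(np.abs(theta) < 1.0)
```

Averaging m_theta(Q) over draws of Q from the marginal prior f(Q) then estimates [m.sub.Q] in (57). For a tight prior (small Q) the probability is near 1; as Q grows it shrinks toward 0, which is why the inequalities in (58) hold.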
After re-arranging terms in (55), we find
p([[theta].sup.T],Q, [OMEGA]) = [I([[theta].sup.T]) f([[theta].sup.T]|Q)/[m.sub.[theta]](Q)] [[m.sub.[theta]](Q) f(Q)/[m.sub.Q]] f([OMEGA]). (59)
Thus the restricted conditional prior for [[theta].sup.T] given Q is
p([[theta].sup.T]|Q) = I([[theta].sup.T])f([[theta].sup.T]|Q)/[m.sub.[theta]](Q), (60)
(equation 7 in the text). Similarly, the restricted marginal prior for Q is
p(Q) = [m.sub.[theta]](Q)f(Q)/[m.sub.Q] (61)
(equation 8 in the text). The marginal prior for [OMEGA] remains the same as for the unrestricted model, p([OMEGA]) = f([OMEGA]). Notice that each term is normalized to integrate to 1; i.e., each component is proper.
From (7) we can derive the restricted transition density. This is defined as
p([[theta].sub.t+1]|[[theta].sub.t],Q) = p([[theta].sub.t+1], [[theta].sub.t]|Q)/p([[theta].sub.t]|Q). (62)
The numerator can be expressed as
p([[theta].sub.t+1], [[theta].sub.t]|Q) = [integral][integral] p([[theta].sup.T]|Q)d[[theta].sup.t-1] d[[theta].sup.t+2,T], (63)
where [[theta].sup.t-1] represents the history of [[theta].sub.s] up to date t -1 and [[theta].sup.t+2,T] represents the path from dates t + 2 to T. After substituting from equation (7), this becomes
p([[theta].sub.t+1], [[theta].sub.t]|Q) = 1/[m.sub.[theta]](Q) [integral][integral] [[PI].sup.T-1.sub.s=0] I([[theta].sub.s+1]) f([[theta].sub.s+1]|[[theta].sub.s],Q) d[[theta].sup.t-1] d[[theta].sup.t+2,T]. (64)
The integrand can be expanded as
[[PI].sup.T-1.sub.s=0] I([[theta].sub.s+1]) f([[theta].sub.s+1]|[[theta].sub.s],Q) = [[[PI].sup.t-1.sub.s=0] I([[theta].sub.s+1]) f([[theta].sub.s+1]|[[theta].sub.s],Q)] I([[theta].sub.t+1]) f([[theta].sub.t+1]|[[theta].sub.t],Q) [[[PI].sup.T-1.sub.s=t+1] I([[theta].sub.s+1]) f([[theta].sub.s+1]|[[theta].sub.s],Q)]. (65)
It follows that
p([[theta].sub.t+1], [[theta].sub.t]|Q) = 1/[m.sub.[theta]](Q) I([[theta].sub.t+1]) f([[theta].sub.t+1]|[[theta].sub.t],Q) [[integral] [[PI].sup.t-1.sub.s=0] I([[theta].sub.s+1]) f([[theta].sub.s+1]|[[theta].sub.s],Q) d[[theta].sup.t-1]] [[integral] I([[theta].sup.t+2,T]) f([[theta].sup.t+2,T]|[[theta].sub.t+1],Q) d[[theta].sup.t+2,T]]. (66)
The marginal density for [[theta].sub.t] can be expressed as
p([[theta].sub.t]|Q) = 1/[m.sub.[theta]](Q) [[integral] [[PI].sup.t-1.sub.s=0] I([[theta].sub.s+1]) f([[theta].sub.s+1]|[[theta].sub.s],Q) d[[theta].sup.t-1]] [[integral] I([[theta].sup.t+1,T]) f([[theta].sup.t+1,T]|[[theta].sub.t],Q) d[[theta].sup.t+1,T]]. (67)
The transition density is the ratio of (66) to (67),
p([[theta].sub.t+1]|[[theta].sub.t],Q) = I([[theta].sub.t+1]) f([[theta].sub.t+1]|[[theta].sub.t],Q) [integral] I([[theta].sup.t+2,T]) f([[theta].sup.t+2,T]|[[theta].sub.t+1],Q) d[[theta].sup.t+2,T]/[integral] I([[theta].sup.t+1,T]) f([[theta].sup.t+1,T]|[[theta].sub.t],Q) d[[theta].sup.t+1,T]. (68)
The integral in the numerator is the expectation of I([[theta].sup.t+2,T]) with respect to the conditional density f([[theta].sup.t+2,T]|[[theta].sub.t+1],Q). This represents the probability that random walk trajectories emanating from [[theta].sub.t+1] will remain in the nonexplosive region from date t + 2 through date T. In the text, this term is denoted [pi]([[theta].sub.t+1],Q). Hence the transition density is
p([[theta].sub.t+1]|[[theta].sub.t],Q) [proportional] I([[theta].sub.t+1]) f([[theta].sub.t+1]|[[theta].sub.t],Q) [pi]([[theta].sub.t+1],Q). (69)
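The factor [pi]([[theta].sub.t+1], Q) has no closed form, but it can be approximated by simulating random-walk coefficient paths and counting how many stay in the nonexplosive region. The sketch below does this for a univariate AR(1) illustration, where stability is simply |[theta]| < 1; the function name and the scalar simplification are ours, not the paper's:

```python
import numpy as np

def pi_hat(theta_start, sigma_q, n_steps, n_sims, rng):
    """Monte Carlo estimate of pi(theta, Q) in the univariate AR(1) case:
    the probability that random-walk coefficient paths started at
    theta_start stay in the stable region |theta| < 1 for n_steps more
    periods, when innovations have standard deviation sigma_q."""
    theta = np.full(n_sims, float(theta_start))
    alive = np.ones(n_sims, dtype=bool)
    for _ in range(n_steps):
        theta = theta + sigma_q * rng.standard_normal(n_sims)
        alive &= np.abs(theta) < 1.0   # once explosive, a path stays rejected
    return alive.mean()
```

As footnote 5 notes, what matters for the transition density is the relative size of this probability at [[theta].sub.t+1] versus [[theta].sub.t], not its absolute level.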
The posterior for the restricted model is
p([[theta].sup.T],Q, [OMEGA]|[Y.sup.T]) = p([Y.sup.T]|[[theta].sup.T],Q, [OMEGA])p([[theta].sup.T]|Q)p(Q)p([OMEGA])/ m([Y.sup.T]), (70)
where m([Y.sup.T]) is the marginal likelihood. After substituting from equations (7) and (8), we can express this as
p([[theta].sup.T],Q, [OMEGA]|[Y.sup.T]) = I([[theta].sup.T]) [p([Y.sup.T]|[[theta].sup.T],Q, [OMEGA]) f([[theta].sup.T]|Q) f(Q) f([OMEGA])]/[m.sub.Q] m([Y.sup.T]). (71)
The term in brackets in the numerator is the posterior kernel for the unrestricted model. After multiplying and dividing by the marginal likelihood for the unrestricted model, which we denote [m.sub.U]([Y.sup.T]), we find
p([[theta].sup.T],Q, [OMEGA]|[Y.sup.T]) = [m.sub.U]([Y.sup.T])/m([Y.sup.T])[m.sub.Q] I([[theta].sup.T]) [p.sub.U]([[theta].sup.T],Q, [OMEGA]|[Y.sup.T]), (72)
where [p.sub.U]([[theta].sup.T],Q, [OMEGA]|[Y.sup.T]) is the posterior corresponding to the unrestricted prior, f(*). The posterior for the restricted model is proportional to the truncation of the posterior of the unrestricted model, with a factor of proportionality depending on the normalizing constants [m.sub.Q], [m.sub.U]([Y.sup.T]), and m([Y.sup.T]).
C A Markov Chain Monte Carlo Algorithm for Simulating the Posterior Density
We use MCMC methods to simulate the restricted posterior density. As in our earlier paper, we simulate the unrestricted posterior [p.sub.U](*|[Y.sup.T]), and then use rejection sampling to rule out explosive outcomes. The first part of this appendix justifies rejection sampling, and the second describes the algorithm used for simulating draws from [p.sub.U](*|[Y.sup.T]).
C.1 Rejection Sampling
The target density is p([[theta].sup.T],Q, [OMEGA]|[Y.sup.T]), and the proposal is [p.sub.U]([[theta].sup.T],Q, [OMEGA]|[Y.sup.T]). Since the former is proportional to a truncation of the latter, the proposal is well-defined and positive on the support of the target. Since the proposal is a probability density, it integrates to 1. The importance ratio,
R([[theta].sup.T],Q, [OMEGA]) [equivalent to] p([[theta].sup.T],Q, [OMEGA]|[Y.sup.T])/[p.sub.U]([[theta].sup.T],Q, [OMEGA]|[Y.sup.T]) = [m.sub.U]([Y.sup.T])/m([Y.sup.T])[m.sub.Q] I([[theta].sup.T]), (73)
has a known, finite upper bound, [??]. The acceptance probability is
R([[theta].sup.T],Q, [OMEGA])/[??] = I([[theta].sup.T]). (74)
This says we accept if [[theta].sup.T] is non-explosive and reject otherwise. Thus, we can sample from the posterior of the restricted model by simulating the unrestricted model and discarding the explosive draws.
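A minimal sketch of this accept/reject step, for a VAR written in companion form. The helper names are hypothetical, and draw_unrestricted stands in for the Gibbs sampler of appendix C.2, treated here as a black box that returns one coefficient path:

```python
import numpy as np

def is_stable(theta, n_vars, n_lags):
    """I(theta): 1 if the VAR companion matrix implied by one date's
    coefficient draw has all eigenvalues inside the unit circle."""
    A_blocks = theta.reshape(n_lags, n_vars, n_vars)
    top = np.hstack(list(A_blocks))                  # [A1 A2 ... Ap]
    bottom = np.eye(n_vars * (n_lags - 1), n_vars * n_lags)
    companion = np.vstack([top, bottom])
    return np.max(np.abs(np.linalg.eigvals(companion))) < 1.0

def rejection_sample(draw_unrestricted, n_vars, n_lags, n_keep, rng):
    """Simulate the unrestricted posterior and keep only coefficient paths
    that are nonexplosive at every date (eq. 74). Note that this loops
    forever if stable paths have negligible posterior probability."""
    kept = []
    while len(kept) < n_keep:
        theta_path = draw_unrestricted(rng)          # one (T x dim) path
        if all(is_stable(th, n_vars, n_lags) for th in theta_path):
            kept.append(theta_path)
    return kept
```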
C.2 Sampling from [p.sub.U](*|[Y.sup.T])
We combine the techniques used in Cogley and Sargent (2001) with those of Jacquier, Polson, and Rossi (1994) to construct a Metropolis-within-Gibbs sampler. The algorithm consists of five steps, one each for [[theta].sup.T], Q, [beta], the elements of [sigma], and the elements of [H.sup.T]. Our prior is that the blocks of parameters are mutually independent, and we assume the marginal prior for each block has a natural conjugate form; details are given above. The first two steps of the algorithm are essentially the same as in our earlier paper; [beta] is treated as a vector of regression parameters, and the elements of [sigma] are treated as inverse-gamma variates. To sample [H.sup.T], we apply a univariate algorithm from Jacquier et al. to each element. This is possible because the stochastic volatilities are assumed to be independent.
C.2.1 VAR parameters, [[theta].sup.T]
We first consider the distribution of VAR parameters conditional on the data and other blocks of parameters. Conditional on [H.sup.T] and [beta], one can calculate the entire sequence of variances [R.sub.t]; we denote this sequence by [R.sup.T]. Conditional on [R.sup.T] and Q, the joint posterior density for VAR parameters can be expressed as (28)
[p.sub.U]([[theta].sup.T]|[Y.sup.T],Q,[R.sup.T]) = f([[theta].sub.T]|[Y.sup.T],Q,[R.sup.T]) [[PI].sup.T-1.sub.t=1] f([[theta].sub.t]|[[theta].sub.t+1], [Y.sup.t],Q,[R.sup.T]). (75)
The unrestricted model is a linear, conditionally Gaussian state-space model. Assuming a Gaussian prior for [[theta].sub.0], all the conditional densities on the right hand side of (75) are Gaussian. Their means and variances can be computed via a forward and backward recursion.
The forward recursion uses the Kalman filter. Let
[[theta].sub.t|t] [equivalent to] E([[theta].sub.t]|[Y.sup.t], Q, [R.sup.T]), [P.sub.t|t] [equivalent to] Var([[theta].sub.t]|[Y.sup.t], Q, [R.sup.T]), (76)
represent conditional means and variances going forward in time. These can be computed recursively, starting from the prior mean and variance for [[theta].sub.0],
[P.sub.t|t-1] = [P.sub.t-1|t-1] + Q, [K.sub.t] = [P.sub.t|t-1][X.sub.t][([X'.sub.t][P.sub.t|t-1][X.sub.t] + [R.sub.t]).sup.-1], [[theta].sub.t|t] = [[theta].sub.t-1|t-1] + [K.sub.t]([y.sub.t] - [X'.sub.t][[theta].sub.t-1|t-1]), [P.sub.t|t] = [P.sub.t|t-1] - [K.sub.t][X'.sub.t][P.sub.t|t-1]. (77)
At the end of the sample, the forward recursion delivers the mean and variance for [[theta].sub.T], and this pins down the first term in (75),
f([[theta].sub.T]|[Y.sup.T],Q,[R.sup.T]) = N([[theta].sub.T|T], [P.sub.T|T]). (78)
The remaining terms in (75) are derived from a backward recursion, which updates conditional means and variances to reflect the additional information about [[theta].sub.t] contained in [[theta].sub.t+1]. Let
[[theta].sub.t|t+1] [equivalent to] E([[theta].sub.t]|[[theta].sub.t+1], [Y.sup.t], Q, [R.sup.T]), [P.sub.t|t+1] [equivalent to] Var([[theta].sub.t]|[[theta].sub.t+1], [Y.sup.t], Q, [R.sup.T]), (79)
represent updated estimates of the mean and variance. Because [[theta].sub.t] is conditionally normal, these are
[[theta].sub.t|t+1] = [[theta].sub.t|t] + [P.sub.t|t][P.sup.-1.sub.t+1|t]([[theta].sub.t+1] - [[theta].sub.t|t]), (80) [P.sub.t|t+1] = [P.sub.t|t] - [P.sub.t|t][P.sup.-1.sub.t+1|t][P.sub.t|t].
The updated estimates determine the mean and variance for remaining elements in (75),
f([[theta].sub.t]|[[theta].sub.t+1], [Y.sup.t], Q, [R.sup.T]) = N([[theta].sub.t|t+1], [P.sub.t|t+1]). (81)
A random trajectory for [[theta].sup.T] is generated by iterating backward. The backward recursion starts with a draw of [[theta].sub.T] from (78). Then, conditional on its realization, [[theta].sub.T-1] is drawn from (81), [[theta].sub.T-2] is drawn conditional on the realization of [[theta].sub.T-1], and so on back to the beginning of the sample.
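The full forward-filter, backward-sample recursion of (76)-(81) can be sketched as follows for a measurement equation of the form [y.sub.t] = [X'.sub.t][[theta].sub.t] + [[epsilon].sub.t] with known variances [R.sub.t] and Q. The observation notation (y, X) is an assumption of the sketch, since the measurement equation itself appears earlier in the paper; this is a minimal implementation, not the authors' code:

```python
import numpy as np

def ffbs(y, X, R, Q, theta0, P0, rng):
    """Forward-filter, backward-sample one random-walk coefficient path.
    Observation: y_t = X_t' theta_t + e_t, Var(e_t) = R_t.
    State:       theta_t = theta_{t-1} + v_t, Var(v_t) = Q."""
    T, k = X.shape
    theta_f = np.zeros((T, k))
    P_f = np.zeros((T, k, k))
    th, P = theta0, P0
    for t in range(T):                               # forward Kalman pass
        P_pred = P + Q
        S = X[t] @ P_pred @ X[t] + R[t]              # innovation variance
        K = P_pred @ X[t] / S                        # Kalman gain
        th = th + K * (y[t] - X[t] @ th)
        P = P_pred - np.outer(K, X[t] @ P_pred)
        theta_f[t], P_f[t] = th, P
    draw = np.zeros((T, k))                          # backward sampling pass
    draw[-1] = rng.multivariate_normal(theta_f[-1], 0.5 * (P_f[-1] + P_f[-1].T))
    for t in range(T - 2, -1, -1):
        G = P_f[t] @ np.linalg.inv(P_f[t] + Q)       # P_{t|t} P_{t+1|t}^{-1}
        m = theta_f[t] + G @ (draw[t + 1] - theta_f[t])   # mean in eq. (80)
        V = P_f[t] - G @ P_f[t]                      # variance in eq. (80)
        draw[t] = rng.multivariate_normal(m, 0.5 * (V + V.T))
    return draw
```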
C.2.2 Innovation Variance for VAR Parameters, Q:
The next step involves the distribution of Q conditional on the data and other parameter blocks. Conditional on a realization for [[theta].sup.T], the VAR parameter innovations, [v.sub.t], are observable. Furthermore, the other conditioning variables are irrelevant at this stage,
f(Q|[Y.sup.T], [[theta].sup.T], [sigma], [beta], [H.sup.T]) = f(Q|[Y.sup.T], [[theta].sup.T]). (82)
Knowledge of [sigma] is redundant conditional on [H.sup.T], and [beta] and [H.sup.T] are irrelevant because [v.sub.t] is independent of [[epsilon].sub.t] and [[eta].sub.it].
Under the linear transition law, [v.sub.t] is iid normal. The natural conjugate prior in this case is an inverse-Wishart distribution, with scale parameter [??] and degrees of freedom [T.sub.0]. Given an inverse-Wishart prior and a normal likelihood, the posterior is inverse-Wishart,
f(Q|[Y.sup.T], [[theta].sup.T]) = IW([Q.sup.-1.sub.1], [T.sub.1]), (83)
with scale and degree-of-freedom parameters,
[Q.sub.1] = [??] + [[SIGMA].sup.T.sub.t=1] [v.sub.t][v'.sub.t], [T.sub.1] = [T.sub.0] + T. (84)
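A draw from the inverse-Wishart full conditional (83)-(84) needs only normal random numbers, using the fact that the inverse of a Wishart variate is inverse-Wishart. In this sketch Q_bar stands in for the prior scale parameter (denoted [??] above), and the Wishart convention assumed is W = [SIGMA] [x.sub.i][x'.sub.i] with [x.sub.i] ~ N(0, S):

```python
import numpy as np

def draw_Q(v_path, Q_bar, T0, rng):
    """Draw Q from its inverse-Wishart full conditional (eqs. 83-84):
    scale Q1 = Q_bar + sum_t v_t v_t', degrees of freedom T1 = T0 + T,
    where v_path stacks the observed parameter innovations v_t by row."""
    T = v_path.shape[0]
    Q1 = Q_bar + v_path.T @ v_path
    T1 = T0 + T
    # if W ~ Wishart(T1, Q1^{-1}) then W^{-1} ~ InvWishart(T1, Q1)
    L = np.linalg.cholesky(np.linalg.inv(Q1))
    Z = rng.standard_normal((T1, Q1.shape[0])) @ L.T  # rows ~ N(0, Q1^{-1})
    return np.linalg.inv(Z.T @ Z)
```

(scipy.stats.invwishart would accomplish the same draw in one call.)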
C.2.3 Standard Deviation of Volatility Innovations, [sigma]
The third step involves the full conditional distribution for [sigma],
f([sigma]|[[theta].sup.T],[H.sup.T], [beta], Q, [Y.sup.T]). (85)
Knowledge of Q is redundant conditional on [[theta].sup.T]. The latter conveys information about [v.sub.t] and [[epsilon].sub.t], but both are conditionally independent of the volatility innovations. (29) Thus, conditioning on [[theta].sup.T] is also irrelevant. [beta] orthogonalizes [R.sub.t] and therefore carries information about [H.sub.t], but this is redundant given direct observations on [H.sub.t]. Given a realization for [H.sup.T], one can compute the scaled volatility innovations, [[sigma].sub.i][[eta].sub.it], i = 1, 2, 3. Because the volatility innovations are mutually independent, we can work with the full conditional density for each. Therefore the density for [[sigma].sub.1] simplifies to
f([[sigma].sub.1]|[[sigma].sub.2], [[sigma].sub.3], [[theta].sup.T], [H.sup.T], [beta], Q, [Y.sup.T]) = f([[sigma].sub.1]|[h.sup.T.sub.1], [Y.sup.T]),
and similarly for [[sigma].sub.2] and [[sigma].sub.3].
The scaled volatility innovations are iid normal with mean zero and variance [[sigma].sup.2.sub.i]. Assuming an inverse-gamma prior with scale parameter [[delta].sub.0] and [v.sub.0] degrees of freedom, the posterior is also inverse gamma,
f([[sigma].sup.2.sub.i]|[h.sup.T.sub.i],[Y.sup.T]) = IG([v.sub.1]/2, [[delta].sub.1]/2), (86)
[v.sub.1] = [v.sub.0] + T, (87)
[[delta].sub.1] = [[delta].sub.0] + [[SIGMA].sup.T.sub.t=1] [([DELTA] ln [h.sub.it]).sup.2]. (88)
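Drawing [[sigma].sup.2.sub.i] from (86)-(88) reduces to a single gamma draw, since the reciprocal of a gamma variate is inverse-gamma. A sketch with hypothetical argument names, where log_h is the path of ln [h.sub.it]:

```python
import numpy as np

def draw_sigma2(log_h, delta0, v0, rng):
    """Draw sigma_i^2 from its inverse-gamma full conditional (eqs. 86-88):
    shape v1/2 with v1 = v0 + T, and scale delta1/2 with
    delta1 = delta0 + sum_t (Delta ln h_it)^2."""
    d = np.diff(log_h)                # volatility innovations Delta ln h_it
    v1 = v0 + d.shape[0]
    delta1 = delta0 + np.sum(d ** 2)
    # if G ~ Gamma(shape=v1/2, scale=2/delta1) then 1/G ~ IG(v1/2, delta1/2)
    return 1.0 / rng.gamma(shape=v1 / 2.0, scale=2.0 / delta1)
```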
C.2.4 Covariance Parameters, [beta]
Next, we consider the distribution of [beta] conditional on the data and other parameters. Knowledge of [[theta].sup.T] and [Y.sup.T] implies knowledge of [[epsilon].sub.t], which satisfies
B[[epsilon].sub.t] = [u.sub.t], (89)
where [u.sub.t] is a vector of orthogonalized residuals with known error variance [H.sub.t]. We interpret this as a system of unrelated regressions. The first equation in the system is the identity
[[epsilon].sub.1t] = [u.sub.1t]. (90)
The second and third equations can be expressed as transformed regressions,
[h.sup.-1/2.sub.2t][[epsilon].sub.2t] = -[[beta].sub.21]([h.sup.-1/2.sub.2t][[epsilon].sub.1t]) + [h.sup.-1/2.sub.2t][u.sub.2t], [h.sup.-1/2.sub.3t][[epsilon].sub.3t] = -[[beta].sub.31]([h.sup.-1/2.sub.3t][[epsilon].sub.1t]) - [[beta].sub.32]([h.sup.-1/2.sub.3t][[epsilon].sub.2t]) + [h.sup.-1/2.sub.3t][u.sub.3t], (91)
with independent standard normal residuals.
Once again, many of the conditioning variables drop out. Q and [sigma] are redundant conditional on [[theta].sup.T] and [H.sup.T], respectively, and [h.sup.T.sub.j], j [not equal to] i, are irrelevant because the elements of [u.sub.t] are independent. Assuming a normal prior for the regression coefficients in each equation,
f([[beta].sub.i]) = N([[beta].sub.i0], [V.sub.i0]), i = 2, 3, (92)
the posterior is also normal,
f([[beta].sub.i]|[Y.sup.T], [[theta].sup.T], [h.sup.T.sub.i]) = N([[beta].sub.i1], [V.sub.i1]), i= 2, 3, (93)
[V.sub.i1] = [([V.sup.-1.sub.i0] + [Z'.sub.i][Z.sub.i]).sup.-1], (94)
[[beta].sub.i1] = [V.sub.i1]([V.sup.-1.sub.i0] [[beta].sub.i0] + [Z'.sub.i][z.sub.i]). (95)
The variables [z.sub.i] and [Z.sub.i] refer to the left and right-hand variables, respectively, in the transformed regressions.
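Sampling [[beta].sub.i] from (93)-(95) is a standard conjugate normal regression update. The sketch below assumes [z.sub.i] and [Z.sub.i] have already been transformed (scaled by [h.sup.-1/2.sub.it]) so that the residuals are standard normal:

```python
import numpy as np

def draw_beta(z_i, Z_i, beta0, V0, rng):
    """Draw beta_i from its normal full conditional (eqs. 93-95):
    V1 = (V0^{-1} + Z'Z)^{-1},  b1 = V1 (V0^{-1} beta0 + Z'z)."""
    V0_inv = np.linalg.inv(V0)
    V1 = np.linalg.inv(V0_inv + Z_i.T @ Z_i)
    b1 = V1 @ (V0_inv @ beta0 + Z_i.T @ z_i)
    return rng.multivariate_normal(b1, 0.5 * (V1 + V1.T))
```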
C.2.5 Stochastic Volatilities, [H.sup.T]
The final step involves the conditional distribution of the elements of [H.sup.T]. To sample the stochastic volatilities, we apply the univariate algorithm of Jacquier et al. (1994) to each element of the orthogonalized VAR residuals, [u.sub.t]. The latter are observable conditional on [Y.sup.T], [[theta].sup.T], and B. We can proceed on a univariate basis because the stochastic volatilities are mutually independent.
Jacquier et al. adopted a date-by-date blocking scheme and developed the conditional kernel for
f([h.sub.it]|[h.sub.-it], [u.sup.T.sub.i], [[sigma].sub.i]) = f([h.sub.it]|[h.sub.it-1], [h.sub.it+1], [u.sup.T.sub.i], [[sigma].sub.i]), (96)
where [h.sub.-it] represents the vector of h's at all other dates. The simplification follows from the assumption that [h.sub.it] is Markov. Knowledge of Q is redundant given [[theta].sup.T], and [h.sup.T.sub.j] and [[sigma].sub.j], i [not equal to] j, are irrelevant because the stochastic volatilities are independent. By Bayes' theorem, the conditional kernel can be expressed as (30)
f([h.sub.it]|[h.sub.it-1], [h.sub.it+1], [u.sup.T.sub.i], [[sigma].sub.i]) [proportional] f([u.sub.it]|[h.sub.it]) f([h.sub.it]|[h.sub.it-1]) f([h.sub.it+1]|[h.sub.it]) [proportional] [h.sup.-1/2.sub.it] exp(-[u.sup.2.sub.it]/2[h.sub.it]) [h.sup.-1.sub.it] exp(-[(ln [h.sub.it] - [[mu].sub.it]).sup.2]/2[[sigma].sup.2.sub.ic]). (97)
Its form follows from the normal form of the conditional likelihood, f([u.sub.it]|[h.sub.it]), and the log-normal form of the log-volatility equation, (15). The parameters [[mu].sub.it] and [[sigma].sup.2.sub.ic] are the conditional mean and variance of [h.sub.it] implied by (15) and knowledge of [h.sub.it-1] and [h.sub.it+1]. In the random walk case, they are
[[mu].sub.it] = (1/2)(ln [h.sub.it+1] + ln [h.sub.it-1]), [[sigma].sup.2.sub.ic] = (1/2)[[sigma].sup.2.sub.i]. (98)
Notice that the normalizing constant is absent from (97). Jacquier et al. note that the normalizing constant is costly to compute, and they recommend a Metropolis step instead of a Gibbs step. One natural way to proceed is to draw a trial value for [h.sub.it] from the log-normal density implied by (15), and then use the conditional likelihood f([u.sub.it]|[h.sub.it]) to compute the acceptance probability. Thus, our proposal density is
q([h.sub.it]) [proportional] [h.sup.-1.sub.it] exp(-[(ln [h.sub.it] - [[mu].sub.it]).sup.2]/2[[sigma].sup.2.sub.ic]), (99)
and the acceptance probability for the mth draw is
[[alpha].sub.m] = min(f([u.sub.it]|[h.sup.new.sub.it])/f([u.sub.it]|[h.sup.m-1.sub.it]), 1) = min([([h.sup.new.sub.it]).sup.-1/2] exp(-[u.sup.2.sub.it]/2[h.sup.new.sub.it])/[([h.sup.m-1.sub.it]).sup.-1/2] exp(-[u.sup.2.sub.it]/2[h.sup.m-1.sub.it]), 1). (100)
We set [h.sup.m.sub.it] = [h.sup.m-1.sub.it] if the proposal is rejected. The algorithm is applied on a date-by-date basis to each of the elements of [u.sub.t].
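One sweep of this date-by-date Metropolis step might look as follows. This is a univariate sketch that updates interior dates only and holds the endpoints fixed; the function name and that endpoint treatment are simplifications of ours:

```python
import numpy as np

def sample_h_path(u, h, sigma2, rng):
    """One Metropolis sweep over a single volatility series (eqs. 99-100).
    u: orthogonalized residuals; h: current volatility path; sigma2: the
    variance of the log-volatility innovations."""
    h = h.copy()
    for t in range(1, len(h) - 1):
        # log-normal proposal implied by the random-walk law for ln h_t;
        # conditional moments from eq. (98)
        mu = 0.5 * (np.log(h[t - 1]) + np.log(h[t + 1]))
        h_new = np.exp(mu + np.sqrt(0.5 * sigma2) * rng.standard_normal())
        # accept with probability min(1, f(u_t|h_new)/f(u_t|h_old)),
        # where f(u|h) is proportional to h^{-1/2} exp(-u^2/(2h))
        log_ratio = (-0.5 * np.log(h_new) - u[t] ** 2 / (2.0 * h_new)) \
                    - (-0.5 * np.log(h[t]) - u[t] ** 2 / (2.0 * h[t]))
        if np.log(rng.uniform()) < min(0.0, log_ratio):
            h[t] = h_new
    return h
```

Because the proposal is the log-normal density implied by the volatility law of motion, its terms cancel in the Metropolis-Hastings ratio, leaving only the likelihood ratio in (100).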
For comments and suggestions, the authors are grateful to Jean Boivin, Marco Del Negro, Mark Gertler, Sergei Morozov, Simon Potter, Christopher Sims, Mark Watson, and Tao Zha. This paper was presented at the Monetary Policy and Learning Conference sponsored by the Federal Reserve Bank of Atlanta in March 2003. The views expressed here are the authors' and not necessarily those of the Federal Reserve Bank of Atlanta or the Federal Reserve System. Any remaining errors are the authors' responsibility.
Please address questions regarding content to Tim Cogley, Department of Economics, University of California, Davis, One Shields Avenue, Davis, California 95616, 530-752-1581, email@example.com or Tom Sargent, Department of Economics, New York University, 269 Mercer Street, 8th Floor, New York, New York 10003, 212-998-3548, thomas[[theta].sub.s]firstname.lastname@example.org.
Aguilar, Omar and Mike West, 2001, "Bayesian Dynamic Factor Models and Portfolio Allocation," Journal of Business and Economic Statistics.
Anderson, Evan, Lars Peter Hansen, and Thomas J. Sargent, 2000, "Robustness, Detection, and the Price of Risk," Mimeo, Department of Economics, Stanford University.
Andrews, Donald W.K., 1993, "Tests for Parameter Instability and Structural Change with Unknown Change Point," Econometrica 61, pp. 821-856.
Benati, Luca, 2001, "Investigating Inflation Dynamics Across Monetary Regimes: Taking the Lucas Critique Seriously," Bank of England working paper.
Bernanke, Ben S. and Ilian Mihov, 1998a, "The Liquidity Effect and Long-Run Neutrality." In Carnegie-Rochester Conference Series on Public Policy, 49, Bennett T. McCallum and Charles I. Plosser, eds. (Amsterdam: North Holland), pp. 149-194.
-- and --, 1998b, "Measuring Monetary Policy." Quarterly Journal of Economics, 113, August, pp. 869-902.
Boivin, Jean, 1999, "Revisiting the Evidence on the Stability of Monetary VARs," unpublished manuscript, Graduate School of Business, Columbia University.
Bray, Margaret M. and David Kreps, 1986, "Rational Learning and Rational Expectations," in W. Heller, R. Starr, and D. Starrett, eds., Essays in Honor of Kenneth J. Arrow (Cambridge University Press: Cambridge, UK).
Cho, In Koo, Noah Williams, and Thomas J. Sargent, 2002, "Escaping Nash Inflation," Review of Economic Studies, Vol. 69, January, pp. 1-40.
Clarida, Richard, Jordi Gali, and Mark Gertler, 2000, "Monetary Policy Rules and Macroeconomic Stability: Evidence and Some Theory," Quarterly Journal of Economics 115(1), pp. 147-180.
Cogley, Timothy and Thomas J. Sargent, 2001, "Evolving Post World War II U.S. Inflation Dynamics," NBER Macroeconomics Annual 16, pp. 331-373.
DeLong, J. Bradford, 1997, "America's Only Peacetime Inflation: the 1970's," in Christina Romer and David Romer (eds.), Reducing Inflation. NBER Studies in Business Cycles, Volume 30.
Evans, George W. and Seppo Honkapohja, 2001, Learning and Expectations in Macroeconomics (Princeton University Press: Princeton, New Jersey).
Fudenberg, Drew and David K. Levine, 1993, "Self-Confirming Equilibrium," Econometrica 61, pp. 523-545.
-- and David M. Kreps, 1995, "Learning in Extensive Games, I: Self-Confirming Equilibria," Games and Economic Behavior 8, pp. 20-55.
Hansen, Bruce E., 1992, "Testing For Parameter Instability in Linear Models," Journal of Policy Modeling 14, pp. 517-533.
Ireland, Peter, 1999, "Does the Time-Consistency Problem Explain the Behavior of Inflation in the United States?" Journal of Monetary Economics 44(2), pp. 279-292.
Jacquier, Eric, Nicholas G. Polson, and Peter Rossi, 1994, "Bayesian Analysis of Stochastic Volatility Models," Journal of Business and Economic Statistics 12, pp. 371-418.
--, --, and --, 1999, "Stochastic Volatility: Univariate and Multivariate Extensions," unpublished manuscript, Finance Department, Boston College and Graduate School of Business, University of Chicago.
Kim, Chang-Jin and Charles R. Nelson, 1999a, "Has The U.S. Economy Become More Stable? A Bayesian Approach Based on a Markov Switching Model of the Business Cycle," Review of Economics and Statistics 81(4), 608-661.
Kreps, David, 1998, "Anticipated Utility and Dynamic Choice," Mimeo, 1997 Schwartz Lecture, Northwestern University.
Kydland, Finn and Edward C. Prescott, 1977, "Rules Rather than Discretion: the Inconsistency of Optimal Plans," Journal of Political Economy 85, pp. 473-491.
Leeper, Eric and Tao Zha, 2001a, "Empirical Analysis of Policy Interventions," Mimeo, Department of Economics, Indiana University and Research Department, Federal Reserve Bank of Atlanta.
-- and --, 2001b, "Toward a Theory of Modest Policy Interventions," Mimeo, Department of Economics, Indiana University and Research Department, Federal Reserve Bank of Atlanta.
Lucas, Robert E., Jr., 1976, "Econometric Policy Evaluation: A Critique," in The Phillips Curve and Labor Markets, edited by Karl Brunner and Alan Meltzer, Carnegie-Rochester Series on Public Policy, vol. 1.
-- and Thomas J. Sargent, 1981, "Introduction" in Robert E. Lucas and Thomas J. Sargent (eds.) Rational Expectations and Econometric Practice (Minneapolis: University of Minnesota Press).
McCallum, Bennett T., 1999, "Issues in the Design of Monetary Policy Rules" in Taylor, John B. and Michael Woodford, eds., Handbook of Macroeconomics vol. 1C (Amsterdam: Elsevier Science).
McConnell, Margaret and Gabriel Perez Quiros, 2000, "Output Fluctuations in the United States: What Has Changed Since the Early 1980s?" American Economic Review 90(5), 1464-1476.
Nyblom, Jukka, 1989, "Testing for the Constancy of Parameters Over Time," Journal of the American Statistical Association 84, pp. 223-230.
Parkin, Michael, 1993, "Inflation in North America," in Price Stabilization in the 1990s, edited by Kumiharo Shigehara.
Pitt, Mark and Neil Shephard, 1999, "Time-Varying Covariances: A Factor Stochastic Volatility Approach," in Bayesian Statistics 6, J.M. Bernardo, J.O. Berger, A.P. Dawid, and A.F.M. Smith, eds., (Oxford University Press: Oxford).
Romer, Christina D. and David H. Romer, 2002, "The Evolution of Economic Understanding and Postwar Stabilization Policy," forthcoming in the 2002 Jackson Hole conference volume, Federal Reserve Bank of Kansas City.
Rudebusch, Glenn D. and Lars E.O. Svensson, 1999, "Policy Rules for Inflation Targeting," in Monetary Policy Rules, edited by John B. Taylor, NBER Conference Report (University of Chicago Press: Chicago, Illinois).
Samuelson, Paul A. and Robert M. Solow, 1960, "Analytical Aspects of Anti-Inflation Policy," American Economic Review, Vol. 50, May, pp. 177-184.
Sargent, Thomas J., 1999, The Conquest of American Inflation (Princeton University Press: Princeton, New Jersey).
--1984, "Autoregressions, Expectations, and Advice," American Economic Review, Papers and Proceedings 74, pp. 408-415.
-- and Neil Wallace, 1976, "Rational Expectations and the Theory of Economic Policy," Journal of Monetary Economics 2, pp. 169-183.
Sims, Christopher A., 1980, "Comparison of Interwar and Postwar Business Cycles: Monetarism Reconsidered," American Economic Review, pp. 250-257.
--, 1982, "Policy Analysis with Econometric Models," Brookings Papers on Economic Activity, Vol. 1, pp. 107-152.
--, 1988, "Projecting Policy Effects with Statistical Models," Revista de Analysis Economico 3, pp. 3-20.
--, 1999. "Drifts and Breaks in Monetary Policy," mimeo, Princeton University.
--, 2001, "Comment on Sargent and Cogley's 'Evolving Post World War II U.S. Inflation Dynamics,"' NBER Macroeconomics Annual 16, pp 373-379.
Sims, Christopher A. and Tao Zha, 1999, "Error Bands for Impulse Responses," Econometrica 67, pp. 1113-1155.
Stock, James H., 2001, "Discussion of Cogley and Sargent 'Evolving Post World War II U.S. Inflation Dynamics,'" NBER Macroeconomics Annual 16, pp. 379-387.
Stokey, Nancy L., 1989, "Reputation and Time Consistency," American Economic Review, Papers and Proceedings 79, pp. 134-139.
Taylor, John B., 1993, "Discretion versus Policy Rules in Practice," in Carnegie-Rochester Conference Series on Public Policy, Vol. 39, December, pp. 195-214.
-- 1997, "Comment on America's Only Peacetime Inflation: the 1970's," in Christina Romer and David Romer (eds.), Reducing Inflation. NBER Studies in Business Cycles, Volume 30.
Whittle, Peter, 1953, "The Analysis of Multiple Stationary Time Series," Journal of the Royal Statistical Society, Series B, vol. 15, pp. 125-139.
Timothy Cogley, University of California, Davis; Thomas J. Sargent, New York University and Hoover Institution
(1) By atheoretical we mean that the model's parameters are not explicitly linked to parameters describing decision makers' preferences and constraints.
(2) See Sargent (1999) for more about this interpretation of the two halves of Lucas's 1976 paper.
(3) To take a concrete example, consider the model of Rudebusch and Svensson (1999). Their model consists of an IS curve, a Phillips curve, and a monetary policy rule, and they endow the central bank with a loss function that penalizes inflation variance. The Phillips curve has adaptive expectations with the natural rate hypothesis being cast in terms of Solow and Tobin's unit-sum-of-the-weights form. That form is consistent with rational expectations only when there is a unit root in inflation. The autoregressive roots for the system are not, however, determined by the Phillips curve alone; they also depend on the choice of monetary policy rule. With an arbitrary policy rule, the autoregressive roots can be inside, outside, or on the unit circle, but they are stable under optimal or near-optimal policies. When a shock moves inflation away from its target, poorly chosen policy rules may let it drift, but well-chosen rules pull it back.
(4) These expressions supersede those given in Cogley and Sargent (2001). We are grateful to Simon Potter for pointing out an error in our earlier work and for suggesting ways to correct it.
(5) The probability that random walk trajectories will leave the nonexplosive region increases with the distance between t and T, but this tendency for [pi]([[theta].sub.t+1], Q) to decrease also affects the normalizing constant for equation (9). What matters is the relative likelihood of future instability, not the absolute likelihood.
(6) This formulation is closely related to the multi-factor stochastic volatility models of Aguilar and West (2001), Jacquier, Polson, and Rossi (1999), and Pitt and Shephard (1999).
(7) See appendix B for details.
(8) An earlier draft experimented with alternative values of [gamma] that push [??] toward zero, i.e. in the direction of less variation in [theta]. We found only minor sensitivity to changes in [gamma].
(9) Two possibilities come to mind. There may be omitted lagged variables, so that the nominal interest innovation contains a component that is predictable based on a larger information set. The Fed may also condition on current-quarter reports of commodity prices or long term bond yields that are correlated with movements in inflation or unemployment.
(10) K is the number of elements in [[theta].sub.t], and T represents the number of years. We focused on every fourth observation to keep V to a manageable size.
(11) This roundabout method for approximating [V.sub.[??]] was used because the direct estimate was contaminated by a few outliers, which dominated the principal components decomposition on which Sims-Zha bands are based. The outliers may reflect shortcomings of our linear approximations near the unit root boundary.
(12) If the elements of [[??].sub.t] were uncorrelated across t, it would be natural to focus instead on the diagonal elements of [V.sub.[pi]], e.g. by graphing the posterior mean plus or minus two standard errors at each date. But [[??].sub.t] is serially correlated, and Sims and Zha argue that a collection of principal components bands better represents the shape of the posterior in such cases.
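The principal-components construction can be sketched as follows on simulated data (a hypothetical serially correlated trajectory, not the paper's posterior): decompose the posterior covariance V of the trajectory into eigenvalue/eigenvector pairs and form each band as the mean plus or minus two standard deviations along an eigenvector, rather than pointwise plus or minus two standard errors at each date.

```python
import numpy as np

# Hypothetical posterior draws of a serially correlated trajectory: cumulated
# Gaussian shocks, which mimic the strong serial correlation in theta_t|T.
rng = np.random.default_rng(1)
T, n_draws = 40, 5000
draws = 0.1 * np.cumsum(rng.normal(size=(n_draws, T)), axis=1)

mean = draws.mean(axis=0)
V = np.cov(draws.T)                       # T x T posterior covariance
eigvals, eigvecs = np.linalg.eigh(V)      # eigenvalues in ascending order
lam, v = eigvals[-1], eigvecs[:, -1]      # first (largest) principal component

# Sims-Zha-style band along the first principal component.
upper = mean + 2.0 * np.sqrt(lam) * v
lower = mean - 2.0 * np.sqrt(lam) * v

share = lam / eigvals.sum()
print(f"first PC accounts for {share:.1%} of tr(V)")
```

Because the draws are highly serially correlated, the first component captures most of tr(V), which is why a few such bands summarize the posterior shape better than date-by-date intervals.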
(13) Setting Q = 0 in the Kalman filter implies [P.sub.t+1|t] = [P.sub.t|t]. Then the covariance matrix in the backward recursion of the Gibbs sampler would be [P.sub.t|t+1] = 0, implying a perfect correlation between draws of [[theta].sub.t+1] and [[theta].sub.t].
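The degeneracy is easy to verify in a scalar case (a hypothetical local-level illustration, not the paper's VAR). In the Carter-Kohn backward recursion the conditional variance is P_{t|t+1} = P_{t|t} - P_{t|t} P_{t+1|t}^{-1} P_{t|t} with P_{t+1|t} = P_{t|t} + Q, so Q = 0 drives it to zero:

```python
# Scalar sketch of why Q = 0 degenerates the backward sampler: with
# P_{t+1|t} = P_{t|t} + Q, the Carter-Kohn conditional variance
# P_{t|t+1} = P_{t|t} - P_{t|t} * P_{t+1|t}^{-1} * P_{t|t}
# collapses to zero when Q = 0, so theta_t and theta_{t+1} draws coincide.
def conditional_variance(P_filt, Q):
    P_pred = P_filt + Q                               # P_{t+1|t}
    return P_filt - P_filt * (1.0 / P_pred) * P_filt  # P_{t|t+1}

print(conditional_variance(0.5, 0.1))  # positive: the Gibbs sampler mixes
print(conditional_variance(0.5, 0.0))  # zero: perfectly correlated draws
```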
(14) Outliers are collected in the end bins.
(15) We also performed a Monte Carlo simulation to check the size of the Andrews test; the results confirmed that size distortions do not explain the failure to reject.
(16) This assumes that shifts in policy are the only source of drift in [theta].
(17) We chose Andrews's tests because the CGG rule is estimated by GMM. The Nyblom-Hansen test is based on ML estimates.
(18) More precisely, the figures illustrate partial sums of the first principal component for [DELTA][[theta].sub.t|T].
(19) See J. Bradford DeLong (1997) and John Taylor (1997).
(20) See Stokey (1989) for a description of how households' optimum problems and market clearing are embedded in the mapping (40). Stokey clearly explains why the policies h, g are history dependent.
(21) Sargent (1999) calls this a "Phelps problem".
(22) See Sargent (1999) and Evans and Honkapohja (2001) for examples and for precise statements of the meanings of 'average' and 'eventually'. Equation (50) embodies the 'mean dynamics' of the system. See Cho, Williams, and Sargent (2002) and Sargent (1999). They also describe how 'escape dynamics' can be used to perpetuate adaptation.
(23) As Bray and Kreps (1986) and Kreps (1998) describe, before it attains a self-confirming equilibrium, such an adaptive system embodies irrationality because, while the self-confirming equilibrium is a rational expectations equilibrium, the least squares transition dynamics are not. During the transition, both government and private agents are basing decisions on subjective models that ignore sources of time-dependence in the actual stochastic process that are themselves induced by the transition process. Bray and Kreps (1986) and Kreps (1998) celebrate this departure from rational expectations because they want models of learning about a rational expectations equilibrium, not learning within a rational expectations equilibrium.
(24) In their Phillips curve example, Kydland and Prescott (1977) explicitly use an example of system (45), (46), (47) and compute its limit to argue informally that inflation would converge to a suboptimal 'time consistent' level. Unlike Lucas (1976), Kydland and Prescott's mapping (47) was [??] = h. Lucas's focus was partly to criticize versions of mapping (47) that violated rational expectations, but that was not Kydland and Prescott's concern.
(25) The literature on least squares learning itself provides substantial support for this perspective by proving almost sure convergence to a self-confirming equilibrium. Sargent (1999) and Cho, Williams, and Sargent (2002) arrest such convergence by putting some forgetting or discounting into least squares.
(26) Sargent and Wallace (1976), Sims (1982), and Sargent (1984) have all expressed versions of this point of view.
(27) Partly we appealed to adaptive models like ones described by Sims (1988) and Sargent (1999), which emphasize changes in the Fed's understanding of the structure of the economy.
(28) The elements of [sigma] are redundant conditional on [H.sup.T].
(29) The measurement innovations are informative for [R.sub.t], which depends indirectly on [sigma], but this information is subsumed in [H.sub.t].
(30) The formulas are a bit different at the beginning and end of the sample.
Table 1: Posterior Mean Estimates of Q

                 Stability Imposed        Stability Not Imposed
VAR Ordering     tr(Q)   max([lambda])    tr(Q)   max([lambda])
i, [pi], u       0.055   0.025            0.056   0.027
i, u, [pi]       0.047   0.023            0.059   0.031
[pi], i, u       0.064   0.031            0.082   0.044
[pi], u, i       0.062   0.031            0.088   0.051
u, i, [pi]       0.057   0.026            0.051   0.028
u, [pi], i       0.055   0.024            0.072   0.035

Note: The headings tr(Q) and max([lambda]) refer to the trace of Q and to its largest eigenvalue.

Table 2: Principal Components of Q

          Variance   Percent of Total Variation
1st PC    0.0230     0.485
2nd PC    0.0165     0.832
3rd PC    0.0054     0.945
4th PC    0.0008     0.963
5th PC    0.0007     0.978

Note: The second column reports the variance of the nth component (the nth eigenvalue of Q), and the third states the fraction of the total variation (trace of Q) for which the first n components account. The results refer to the minimum-Q ordering [i, u, [pi]]'.

Table 3: Principal Component Decomposition for Sims-Zha Bands

           [V.sub.[??]]   [V.sub.[??]]   [V.sub.g[pi][pi]]   [V.sub.A]
1st PC     0.521          0.382          0.374               0.662
2nd PC     0.604          0.492          0.490               0.801
3rd PC     0.674          0.597          0.561               0.870
4th PC     0.715          0.685          0.612               0.906
5th PC     0.750          0.727          0.662               0.936
6th PC     0.778          0.767          0.701               0.949
8th PC     0.822          0.820          0.756               0.972
10th PC    0.851          0.856          0.800               0.984

Note: Entries represent the cumulative percentage of the total variation (trace of V) for which the first n principal components account.

Table 4: Andrews's sup-LM Test

         Nominal Interest   Unemployment   Inflation   VAR
Data     F                  F              F           F
Power    0.136              0.172          0.112       0.252

Note: An 'F' means the test fails to reject at the 10 percent level when applied to actual data. Entries in the second row refer to the fraction of artificial samples in which the null hypothesis is rejected at the 5 percent level.

Table 5: The Nyblom-Hansen Test

         Nominal Interest   Unemployment   Inflation   VAR
Data     F                  F              F           F
Power    0.076              0.170          0.086       0.234

See the note to table 4.
Table 6: Andrews's sup-Wald Test

         Nominal Interest   Unemployment   Inflation   VAR
Data     F                  F              R 1%        R 5%
Power    0.173              0.269          0.711       0.296

Note: 'R x%' signifies a rejection at the x percent level.

Table 7: Stability of the CGG Policy Rule

         sup-LM   sup-Wald
Data     F        F
Power    0.143    0.248

See the note to table 4.

Table 8: Stability of the First Principal Component

         sup-LM   sup-Wald
Data     F        R 5%
Power    0.220    0.087

See the note to table 4.
Author: Cogley, Timothy; Sargent, Thomas J.
Publication: Federal Reserve Bank of Atlanta, Working Paper Series
Date: Oct 25, 2003