# El Teorema del Limite Central Funcional con algunas aplicaciones a raices unitarias con cambios estructurales.

Understanding the Functional Central Limit Theorems with Some Applications to Unit Root Testing with Structural Change

1. INTRODUCTION

The application of different unit root statistics is by now standard practice in empirical work. Although their use is a practical matter, these statistics have complex nonstandard distributions that depend on functionals (1) of certain stochastic processes, and their derivations represent a challenge even for many theoretical econometricians. These derivations are based on rigorous and fundamental statistical tools that are not well known to many econometricians. This paper aims to fill this gap by explaining in a simple way one of these fundamental tools: the Functional Central Limit Theorem. To this end, this paper analyzes the foundations and applicability of two related versions of the Functional Central Limit Theorem within the framework of a unit root with a structural break.

Four decades ago, the empirical study of key macroeconomic variables was done through the use of the ARMA models proposed by Box and Jenkins (1970). In this type of model, first and second moments depend upon time separation but do not depend on the time variable itself. Hence, these models are covariance stationary (2): their behavior reverts to a time-invariant unconditional mean, and the associated methodology is based on the steps of identification, estimation and diagnostic checking (3).

However, the assumptions underlying ARMA models are not suitable for modeling macroeconomic series, which usually exhibit an upward trend over time. Hence, any model that aims at representing macroeconomic data must include such a trend. One of the most popular approaches to this task is the deterministic trend model: [y.sub.t] = [mu] + [delta]t + [u.sub.t], t = 1, ..., T, where [mu] and [delta] are constants, [u.sub.t] ~ N(0,[[sigma].sup.2.sub.u]) and [[sigma].sup.2.sub.u] > 0. Since a stationary process is obtained after subtracting [delta]t, this process is called trend stationary. Notice also that each realization of [u.sub.t] only has a contemporaneous effect on [y.sub.t].

An alternative approach considers the data generating process as autoregressive, containing a unit root: [y.sub.t] = [mu] + [alpha][y.sub.t-1] + [u.sub.t], where t = 1, ..., T, [mu] is a constant, [alpha] = 1, [y.sub.0] is an initial condition, [u.sub.t] ~ N(0,[[sigma].sup.2.sub.u]) and [[sigma].sup.2.sub.u] > 0. In this case, [y.sub.t] = [y.sub.0] + [mu]t + [[summation].sup.t.sub.(i=1)][u.sub.i], so the realization of any [u.sub.i] has a permanent effect on the level of [y.sub.t], and the appropriate procedure to obtain a stationary series is to work with first differences [DELTA][y.sub.t] = [y.sub.t] - [y.sub.t-1].

From an economic viewpoint, distinguishing between these two approaches requires identifying the type of process that represents macroeconomic data and understanding the long-term effects of shocks. From a predictive perspective, the distinction is also nontrivial: in the deterministic trend model the forecast error has constant variance, whereas in the stochastic trend case its variance increases with the forecast horizon (4).
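The contrast between the two trend specifications can be illustrated with a small simulation. The following Python sketch is not part of the original study: the parameter values (mu, delta, the number of paths, the seed) are arbitrary assumptions chosen for illustration, and y_0 = 0 is assumed in the unit root case. It compares the dispersion of y_t across simulated paths at several dates under each model.

```python
import numpy as np

def simulate_paths(n_paths=5000, T=200, mu=0.5, delta=0.1, sigma_u=1.0, seed=0):
    """Simulate trend-stationary and unit-root paths driven by N(0, sigma_u^2) shocks."""
    rng = np.random.default_rng(seed)
    u = rng.normal(0.0, sigma_u, size=(n_paths, T))
    t = np.arange(1, T + 1)
    # Deterministic trend model: y_t = mu + delta * t + u_t
    trend_stationary = mu + delta * t + u
    # Unit root with drift and y_0 = 0: y_t = mu * t + sum_{i<=t} u_i
    unit_root = mu * t + np.cumsum(u, axis=1)
    return trend_stationary, unit_root

ts, ur = simulate_paths()
var_ts = ts.var(axis=0)   # variance across paths at each date t
var_ur = ur.var(axis=0)   # variance across paths at each date t
for date in (50, 100, 200):
    print(date, round(var_ts[date - 1], 2), round(var_ur[date - 1], 2))
```

Under these assumptions, the variance across paths of the trend stationary series should stay close to sigma_u^2 at every date, whereas that of the unit root series should grow roughly in proportion to t, which is the source of the increasing forecast uncertainty.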

Turning back to empirical concerns, the unit root framework allows us to consider a series [{[y.sub.t]}.sup.T.sub.t = 0] that obeys a first order autoregressive process [y.sub.t] = [mu] + [alpha][y.sub.t - 1] + [u.sub.t], t = 1, ..., T, where [mu] and [alpha] are constants, [y.sub.0] is an initial condition, [u.sub.t] ~ N(0, [[sigma].sup.2.sub.u]) and [[sigma].sup.2.sub.u] > 0. A first conclusion to be arrived at is that the effect of shocks on the dependent variable is linked to the unrestricted value of [alpha], an assertion that can be confirmed after manipulating the previous expression: [y.sub.t] = [[alpha].sup.t][y.sub.0] + [mu] [[summation].sup.t.sub.(i=1)] [[alpha].sup.t - i] + [[summation].sup.t.sub.(i=1)] [[alpha].sup.t - i] [u.sub.i]. For [mu] = 0, the process reduces to

[y.sub.t] = [alpha][y.sub.t - 1] + [u.sub.t] (1)

and allows for testing

[H.sub.0]: [alpha] = 1 against [H.sub.1]: [absolute value of [alpha]] < 1. (2)

The study by White (1958) was the first to perform such a procedure: in order to test [H.sub.0] against [H.sub.1] with a sample of size T and the OLS estimator [??] for parameter [alpha], under the null hypothesis, he calculated that

(T/[square root of 2])([??] - 1) [right arrow] [[integral].sup.1.sub.0]W(r)dW(r)/[[integral].sup.1.sub.0]W[(r).sup.2]dr = (1/2)(W[(1).sup.2] - 1)/[[integral].sup.1.sub.0]W[(r).sup.2]dr. (3)

In the previous expression, (T/[square root of 2])([??] - 1) denotes a centered and standardized estimator of [alpha], which is a random variable, and [right arrow] denotes weak convergence of probability measures. This result was an application of a theorem due to Donsker (1951), and the asymptotic distribution was formulated in terms of functionals of a standard Wiener process W, whose details and properties are examined below. It is worth mentioning that this result depends on the correlation among the disturbance terms [u.sub.t] (assumed to be zero in this case for the sake of simplicity) and on the absence of specification error when estimating [alpha].

Another study in this line was that of Dickey and Fuller (1979), who assumed normal i.i.d. disturbances and developed several one-tailed tests with the following rejection rule: for a given significance level, if the (properly transformed) centered estimator [??] - 1 yields a value below the corresponding critical value, then the unit root hypothesis is rejected. In order to understand this rule, consider equation (1), which is equivalent to

[DELTA][y.sub.t] = [b.sub.0][y.sub.t - 1] + [u.sub.t], (4)

with [b.sub.0] = [alpha] - 1. Therefore, [alpha] = 1 holds true if and only if [b.sub.0] = 0. In this context, the Dickey-Fuller (DF) test is simply the t statistic (used when testing for unit roots) for the significance of [y.sub.t - 1] in (4). When lagged values of [DELTA][y.sub.t] are included in (4), the implied t statistic is known as the (lag) augmented Dickey-Fuller test or ADF test.
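As an illustration of how the statistic in (4) is computed, the following Python sketch estimates b_0 by OLS and forms its t ratio for the simplest case (no intercept, no trend, no lagged differences). The function name `df_t_statistic`, the sample size and the seed are hypothetical choices made for this example and do not come from Dickey and Fuller (1979).

```python
import numpy as np

def df_t_statistic(y):
    """OLS estimate and t ratio of b0 in  dy_t = b0 * y_{t-1} + u_t  (no intercept, no trend)."""
    y = np.asarray(y, dtype=float)
    dy = np.diff(y)                      # dy_t = y_t - y_{t-1}
    ylag = y[:-1]                        # y_{t-1}
    b0_hat = (ylag @ dy) / (ylag @ ylag)
    resid = dy - b0_hat * ylag
    s2 = (resid @ resid) / (len(dy) - 1) # residual variance with one regressor
    se_b0 = np.sqrt(s2 / (ylag @ ylag))
    return b0_hat, b0_hat / se_b0

rng = np.random.default_rng(1)
u = rng.normal(size=500)
random_walk = np.concatenate(([0.0], np.cumsum(u)))   # alpha = 1, y_0 = 0
stationary = np.zeros(501)
for t in range(1, 501):                                # alpha = 0.5, same shocks
    stationary[t] = 0.5 * stationary[t - 1] + u[t - 1]

b0_rw, t_rw = df_t_statistic(random_walk)
b0_st, t_st = df_t_statistic(stationary)
print(b0_rw, t_rw)
print(b0_st, t_st)
```

The t ratio must be compared with Dickey-Fuller critical values rather than with Student's t or normal ones; for this specification the tabulated 5% value is roughly -1.95.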

The analysis by Dickey and Fuller (1979) considers three types of autoregressive models: without an intercept or (deterministic) trend, with an intercept but without a trend, and with both an intercept and a trend. In this particular study, the assumptions allow the asymptotic distributions to be represented through moment generating functions. Monte Carlo simulations allow the authors to compare the power of these tests with that of the (autocorrelation-based) Q statistics proposed by Box and Pierce (1970). Their main results are: firstly, Q statistics are systematically less powerful; secondly, the performance of the Dickey-Fuller tests is uniformly superior when there is no misspecification error (5); and thirdly, there is evidence that Dickey-Fuller tests are biased towards not rejecting the null hypothesis for values of the autoregressive coefficient [alpha] arbitrarily close to 1.

For our purposes, a simple way to illustrate the role of specification is to generate samples from the data generating process [y.sub.t] = [y.sub.t-1] + [u.sub.t], [u.sub.t] ~ N(0,1). The distribution of T([??] - 1) is plotted under three cases (see Figure 1): when there is no specification error, when the intercept is redundant and when both intercept and trend are redundant. The simulated distributions move progressively to the left, and the tabulated critical values become larger (in absolute value), as redundant regressors are included. This biases the test towards not rejecting the null hypothesis and, in this sense, reduces its power.
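A Monte Carlo experiment of this kind can be sketched as follows. This is an illustrative reconstruction rather than the code behind the figure: the number of replications, the sample size T = 200, the seed and the helper names `normalized_bias` and `simulate` are assumptions made for this example.

```python
import numpy as np

def normalized_bias(y, spec):
    """T * (alpha_hat - 1) from OLS of y_t on y_{t-1} plus deterministic terms.

    spec: "none" (no deterministic terms), "const" (intercept),
          "trend" (intercept and linear trend).
    """
    y_t, y_lag = y[1:], y[:-1]
    T = len(y_t)
    cols = [y_lag]
    if spec in ("const", "trend"):
        cols.append(np.ones(T))
    if spec == "trend":
        cols.append(np.arange(1, T + 1, dtype=float))
    X = np.column_stack(cols)
    coef, *_ = np.linalg.lstsq(X, y_t, rcond=None)
    return T * (coef[0] - 1.0)

def simulate(n_rep=2000, T=200, seed=0):
    """Draw random walks with N(0,1) shocks and y_0 = 0; apply the three regressions."""
    rng = np.random.default_rng(seed)
    out = {spec: np.empty(n_rep) for spec in ("none", "const", "trend")}
    for r in range(n_rep):
        y = np.concatenate(([0.0], np.cumsum(rng.normal(size=T))))
        for spec in out:
            out[spec][r] = normalized_bias(y, spec)
    return out

draws = simulate()
q05 = {spec: np.quantile(v, 0.05) for spec, v in draws.items()}
print(q05)   # simulated 5% quantiles of T * (alpha_hat - 1) for each specification
```

The simulated 5% quantiles should become more negative as the intercept and then the trend are added, which is the leftward shift described in the text.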

So far, this brief review shows that, by the first half of the 1980s, unit root econometrics exhibited two well-defined limitations, namely vulnerability to misspecification and to local stationary alternatives, each of which implies an expected loss of power. Additionally, the recurrent assumption of normal i.i.d. disturbances considerably reduces the applicability of these approaches for applied researchers. Two important advances were made during the second half of that decade. Firstly, Phillips (1987a) proposed an asymptotic theory under very general conditions for integrated processes, which meant that the subsequent discussion was to be conducted under firmly established foundations, and secondly, Perron (1989) identified the presence of a structural break as an element that also reduces the power of the augmented Dickey-Fuller tests.

The reader must also take into account that neither of these two advances could have been devised without the notion of weak convergence of probability measures. To understand the need for this concept, it is necessary to consider first the classical Central Limit Theorem which, under conditions that vary across versions, ensures that the distribution of the centered and standardized sample mean converges to the standard normal distribution. An analogous property is desirable when dealing with dependent, heterogeneously distributed disturbances that do not satisfy the normal i.i.d. assumptions of conventional autoregressive models. Indeed, this idea is summarized by several versions of the Functional Central Limit Theorem which, in a wider sense, state that the distribution of standardized partial sums converges to that of a functional of a standard Wiener process W. As described in Brzezniak and Zastawniak (1999), for a fixed value of r [member of] (0,1], the density [f.sub.W(r)] of the random variable W(r) is given by

$$f_{W(r)}(x) = \frac{1}{\sqrt{2\pi r}} \exp\left(-\frac{x^{2}}{2r}\right), \quad x \in \mathbb{R}, \; r > 0.$$

Therefore, in order to fully understand the advances in this literature, two requisites are needed. Firstly, one must formally understand both the mathematical and probabilistic structure of the data-generating processes in order to state the main (weak) convergence results. Secondly, and most importantly, one needs to recognize the importance of incorporating particular problems faced by researchers into the analysis, because their formalization leads to the development of new specific procedures and testing statistics. This task is frequently undertaken by employing creative alternative hypotheses, which help to identify current limitations.

This paper reviews a selection of theoretical advances in the unit root literature, starting from the second half of the 1980s and finishing with several contemporary developments. The presentation emphasizes both the relevance of the Functional Central Limit Theorem to the discussion and the econometric considerations behind novel approaches. Although the time series literature also considers the case of multiple structural breaks, attention is focused here on a single structural break. An applied survey that considers multiple breaks can be found in Glynn and Perera (2007).

This paper is organized as follows. Section 2 describes the probabilistic structure of the disturbance sequences involved, a building block for this literature. Section 3 details a general version of the Functional Central Limit Theorem that covers a wide range of disturbance processes. Section 4 presents the asymptotic theory for integrated time series proposed by Phillips (1987a). Section 5 generalizes the former framework in order to consider near-integrated processes, as done by Phillips (1987b). Section 6 studies linear processes and the class of modified or M tests proposed by Stock (1999), which is employed in later developments. Section 7 presents three econometric applications of the above theories in the context of unit root testing when structural change is present. Section 7.1 details the warning made by Perron (1989) about the effects of structural breaks on the power of Dickey-Fuller statistics and the methodology proposed for dealing with an (assumed) exogenous break. Section 7.2 covers the critique of this exogeneity assumption made by Zivot and Andrews (1992) and the new test proposed by them, which involves estimating an endogenous structural break. Since neither of the two previous studies deals with the power loss due to local-to-unity alternatives, Section 7.3 illustrates the results of Perron and Rodriguez (2003), who develop efficient (power increasing) unit root tests under structural break and extend the results obtained by Elliott et al. (1996) for linear processes. Section 8 shows some empirical applications. Section 9 concludes with a retrospective overview of the developments in statistical inference with integrated series and the role played by the theory of diffusion processes.

2. ASYMPTOTIC THEORY: THE STRUCTURE OF WEAKLY DEPENDENT AND HETEROGENEOUSLY DISTRIBUTED DISTURBANCES

Most of the econometric theory reviewed in this paper is related to extensions of the following autoregressive model: [y.sub.t] = [alpha][y.sub.t - 1] + [u.sub.t], t = 1, 2, .... The main objective here is to test the null hypothesis [H.sub.0]: [alpha] = 1 when a sample of T observations [{[y.sub.t]}.sup.T.sub.t = 1] is available; the previous section introduced this task in some detail. However, a major limitation is imposed by the assumption that the unobservable disturbance sequence [{[u.sub.t]}.sup.[infinity].sub.t = 1] is composed of i.i.d. normal random variables. Under this assumption, the empirical applicability of several procedures would be heavily restricted, so it becomes desirable to cover a case that is as general as possible. This case is formalized by considering a sequence of disturbance terms [{[u.sub.t]}.sup.[infinity].sub.t = 1] that are dependent and heterogeneously distributed (6). A way to control the extent of this dependence, so that convergence results can be derived, is to define a measure of dependence among the random variables contained in a sequence. For this measure to be well-defined, it needs to be associated with a specific probabilistic structure. The conditions that bound the extent of dependence are called mixing conditions. The results stated below follow both White (1984) and Herrndorf (1984).

Consider a probability space ([OMEGA], F, P), where [OMEGA] is the sample space containing all possible outcomes of an experiment, F is a collection of events of [OMEGA] (a [sigma]-field) and P : F [right arrow] [0,1] is a probability measure (P([OMEGA]) = 1) over events contained in F. Next, consider a sequence of random variables [{[u.sub.t]}.sup.[infinity].sub.t = 1] (that is, [u.sub.t]: [OMEGA] [right arrow] R is a Borel-measurable real function for all t) on ([OMEGA], F, P). Let m and n denote two positive integers and consider a track of disturbances {[u.sub.t] : n [less than or equal to] t [less than or equal to] n + m}. Since we will need to assign probabilities to events involving the random variables contained in such a track, and since such events need to belong to a family with a [sigma]-field structure, it becomes necessary to define the [sigma]-field generated by the random variables contained in the track as the smallest [sigma]-field with respect to which each [u.sub.t], t = n, ..., n + m, is measurable.

Definition 1 Let B denote the Borel [sigma]-field on R. The Borel [sigma]-field generated by the random variables included in the track {[u.sub.t]: n [less than or equal to] t [less than or equal to] n + m}, [B.sup.n + m.sub.n] = [sigma]([u.sub.t]: n [less than or equal to] t [less than or equal to] n + m), is the smallest [sigma]-field that contains

1. all the sets of the form [X.sup.n - 1.sub.i = 1] R [X.sup.n + m.sub.i = n] [B.sub.i] [X.sup.[infinity].sub.i = n + m + 1] R with [B.sub.i] [member of] B,

2. the complement [A.sup.c] of each set A in [B.sup.n + m.sub.n], and

3. the union [[union].sup.[infinity].sub.i = 1] [A.sub.i] of each sequence {[A.sub.i]} contained in [B.sup.n + m.sub.n].

Intuitively, [B.sup.n + m.sub.n] is the smallest collection of events that allows probabilities to be assigned to events of the form, for example, {[omega] [member of] [OMEGA] : [u.sub.n] ([omega]) < [a.sub.1] and [u.sub.n + 1] ([omega]) < [a.sub.2]} [member of] F, where [a.sub.1], [a.sub.2] [member of] R.

The notion of mixing is needed to make explicit the fact that, although two arbitrary sets of random variables can exhibit dependence, this dependence vanishes as time separation increases (7).

In order to illustrate the former idea, consider the track composed of the first n elements of [{[u.sub.t]}.sup.[infinity].sub.t = 1] and denote it by [{[u.sub.t]}.sup.n.sub.t = 1]. Within this track, two non-overlapping subtracks can be identified: a first one starting at [u.sub.1] and a second one ending at [u.sub.n]. Let m [greater than or equal to] 1 denote the time index of the last element of the first subtrack, and let k [greater than or equal to] 1 denote the difference between the time index of the first element of the second subtrack and m (see Figure 2). Of course, this characterization does not completely determine both subtracks, since it allows for several choices of m. Indeed, the following definition of mixing coefficients employs these observations in order to quantify the dependence between random variables separated by at least k periods, given the first n elements of a sequence.

Definition 2 The mixing coefficients of the sequence [{[u.sub.t]}.sup.[infinity].sub.t = 1] are

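$$\alpha_n(k) = \sup_{1 \le m \le n-k} \; \sup\left\{ \left| P(g \cap h) - P(g)P(h) \right| : h \in \sigma(u_t : 1 \le t \le m),\; g \in \sigma(u_t : m+k \le t \le n) \right\}, \quad k \ge 1.$$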

Intuitively, given the first n [greater than or equal to] 1 elements of [{[u.sub.t]}.sup.[infinity].sub.t = 1], [[alpha].sub.n](k) measures how far events contained in the [sigma]-fields H = [sigma]([u.sub.t]: 1 [less than or equal to] t [less than or equal to] m) and G = [sigma]([u.sub.t]: m + k [less than or equal to] t [less than or equal to] n) are from independence. Here, k [greater than or equal to] 1 denotes the time separation between these two sets of random variables (see Figure 2). If H and G were independent, then for any h [member of] H and g [member of] G the condition P(g [intersection] h) = P(g)P(h) would hold, so that [[alpha].sub.n](k) = 0.

Since the mixing coefficients [[alpha].sub.n](k) only take into account a finite number of disturbances (i.e. the first n random variables), this notion is extended to consider the highest magnitude of dependence among random variables separated by at least k periods.

Definition 3 The strong mixing coefficient of the sequence [{[u.sub.t]}.sup.[infinity].sub.t = 1] is

[alpha](k) = [sup over (n[member of]N)] [[alpha].sub.n](k), for k [member of] N.

Therefore, [alpha](k) provides a measure of dependence. If [alpha](k) = 0 for some k, events separated by at least k periods are independent. Also, if [alpha](k) [right arrow] 0 as k [right arrow] [infinity], the sequence [{[u.sub.t]}.sup.[infinity].sub.t = 1] is said to be strong mixing, which formalizes the notion of asymptotic independence. For future reference, it is useful to emphasize, for a strong mixing sequence, the rate at which [alpha](k) tends to zero or, equivalently, the rate of decay of [alpha](k). This will be denoted by [alpha](k) = O([k.sup.-v]) for some v > 0.
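A simple case may help to fix ideas. The following worked example is an illustration added here, not taken from White (1984) or Herrndorf (1984); the moving average process and the symbols \varepsilon_t and \theta are assumptions introduced for the example. Because disturbances more than one period apart share no innovation, the two sigma-fields in the definition are independent once k is at least 2.

```latex
% MA(1) disturbances built from i.i.d. innovations (assumed for illustration)
u_t = \varepsilon_t + \theta\,\varepsilon_{t-1}, \qquad \{\varepsilon_t\}\ \text{i.i.d.}

% \sigma(u_t : 1 \le t \le m) depends only on \varepsilon_0, \dots, \varepsilon_m,
% \sigma(u_t : t \ge m+k) depends only on \varepsilon_{m+k-1}, \varepsilon_{m+k}, \dots
% For k \ge 2 these two sets of innovations are disjoint, hence
\alpha(k) = 0 \quad \text{for all } k \ge 2,
\qquad\text{so that}\qquad
\alpha(k) = O(k^{-v}) \ \text{for every } v > 0 .
```

More generally, the same argument gives \alpha(k) = 0 for k > q whenever the disturbances follow an MA(q) process with independent innovations.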

3. THE FUNCTIONAL CENTRAL LIMIT THEOREM

3.1. THE SKOROHOD TOPOLOGY

The logic behind the Functional Central Limit Theorem relies on the convergence of a sequence of standardized partial sums of disturbances [u.sub.t]. The limit of this new sequence is W, a standard Wiener process. Correspondingly, the elements of this sequence of partial sums are contained in D = D[0,1], which is the space of right-continuous functions whose left limit exists everywhere on the unit interval, also referred to as cadlag (8) functions.

The concept of convergence mentioned above must be understood as the weak convergence of a sequence of random functions. As will be shown, in order to guarantee the convergence results, it is sufficient to endow D with a metric d in such a way that (D,d) is a complete separable space, so that the limit of any convergent sequence of elements contained in D is also contained in D. The concepts and results discussed here are strongly based upon Billingsley (1968), although our presentation follows Davidson (1994). The following definition characterizes the properties of the functions to be considered hereafter.

Definition 4 D[0,1] is the space of functions x: [0,1] [right arrow] R satisfying the following conditions:

1. [lim.sub.t [right arrow] r+] x(t) = x(r) for r [member of] [0,1),

2. [lim.sub.t [right arrow] r-] x(t) exists for r [member of] (0,1],

3. x(1) = [lim.sub.t [right arrow] 1-] x(t).

Therefore, only discontinuities of the first kind (jumps) are admitted. A first candidate for a suitable metric for D is the uniform metric [d.sub.U], defined as

[d.sub.U](x,y) = [sup.sub.r][absolute value of (x(r) - y(r))], x, y [member of] D.

The above metric states that two functions are arbitrarily close if the maximum difference between ordinates corresponding to the same abscissa is small. The metric space (C, [d.sub.U]) (9) is complete and separable but, since C [subset] D, these properties do not necessarily carry over to (D, [d.sub.U]). In fact, although the uniform limit of a sequence of cadlag functions is again cadlag, (D, [d.sub.U]) is not separable: the indicator functions of the intervals [s,1], s [member of] (0,1), form an uncountable family whose members lie at uniform distance 1 from one another. Thus, (D, [d.sub.U]) is not a complete separable space, and the strategy adopted by Billingsley (1968) consists in metrizing D as a separable complete space by introducing the Skorohod metric.

Definition 5 (Skorohod metric) Let [LAMBDA] be the collection of all homeomorphisms (10) [lambda]: [0,1] [right arrow] [0,1] with [lambda](0) = 0 and [lambda](1) = 1. The Skorohod metric is defined as

[d.sub.S](x,y) = [inf.sub.[lambda][member of][LAMBDA]] {[epsilon] > 0: [sup.sub.r][absolute value of ([lambda](r) - r)] [less than or equal to] [epsilon] and [sup.sub.r][absolute value of (x(r) - y([lambda](r)))] [less than or equal to] [epsilon]}.

This metric is defined in order to overcome the following key limitation of the (D, [d.sub.U]) space: given two cadlag functions x, y [member of] D, under the uniform metric x and y are close to each other only if the vertical distance between the functions is uniformly small, whereas the Skorohod metric also allows for a small deformation [lambda] of the time argument, so that two functions whose jumps occur at slightly different points can still be close.
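The difference between the two metrics can be seen with two step functions whose jumps are slightly apart. The Python sketch below is an illustration under stated assumptions: the jump locations (0.5 and 0.5 + delta), the piecewise-linear time change and the helper names `step` and `time_change` are choices made here, and suprema are approximated on a finite grid. Because the Skorohod metric is an infimum over all admissible time changes, evaluating a single one only yields an upper bound.

```python
import numpy as np

def step(jump):
    """Cadlag step function on [0, 1]: 0 before `jump`, 1 from `jump` onwards."""
    return lambda r: (np.asarray(r) >= jump).astype(float)

def time_change(a, b):
    """Piecewise-linear homeomorphism of [0, 1] with lam(0) = 0, lam(a) = b, lam(1) = 1."""
    return lambda r: np.interp(r, [0.0, a, 1.0], [0.0, b, 1.0])

delta = 0.01
x, y = step(0.5), step(0.5 + delta)
lam = time_change(0.5, 0.5 + delta)
grid = np.linspace(0.0, 1.0, 2**16 + 1)   # dyadic grid, contains 0.5 exactly

d_uniform = np.max(np.abs(x(grid) - y(grid)))       # sup |x(r) - y(r)| on the grid
time_shift = np.max(np.abs(lam(grid) - grid))       # sup |lam(r) - r| on the grid
path_gap = np.max(np.abs(x(grid) - y(lam(grid))))   # sup |x(r) - y(lam(r))| on the grid
skorohod_upper_bound = max(time_shift, path_gap)

print(d_uniform, skorohod_upper_bound)
```

Under the uniform metric the two functions stay one unit apart however small delta is, while the time change that maps one jump onto the other shows that their Skorohod distance is at most delta.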

The metric space (D, [d.sub.S]) induces a topological space. As usual, an open ball of radius r > 0 around x [member of] D is defined as B(x, r) = {y [member of] D: [d.sub.S](x,y) < r}. Open balls like the previous one generate a topology on (D, [d.sub.S]) that is referred to as the Skorohod topology and denoted by [T.sub.S]. In this sense the topological space (D, [T.sub.S]) is a metrizable topological space.

However, D is not complete under [d.sub.S] yet. For this purpose, a new equivalent (11) metric (the Billingsley metric) to [d.sub.S] is introduced in such a way that these two metrics induce the same topology in D, the Skorohod topology. The only difference now lies in the fact that the new metric space is complete.

Definition 6 (Billingsley metric) Let [LAMBDA] be the collection of all homeomorphisms [lambda]: [0,1] [right arrow] [0,1], with [lambda](0) = 0 and [lambda](1) = 1 satisfying

[parallel] [lambda] [parallel] = [sup.sub.t [not equal to] s] [absolute value of log(([lambda](t) - [lambda](s))/(t - s))] < [infinity].

The Billingsley metric is

[d.sub.B](x,y) = [inf.sub.[lambda][member of][LAMBDA]] {[epsilon] > 0: [parallel] [lambda] [parallel] [less than or equal to] [epsilon], [sup.sub.t][absolute value of (x(t) - y([lambda](t)))] [less than or equal to] [epsilon]}.

The next two results formalize the facts mentioned above.

Theorem 1 In D, metrics [d.sub.B] and [d.sub.S] are equivalent.

Proof. See Davidson (1994), Theorem 28.7, p. 464.

Theorem 2 The space (D, [d.sub.B]) is complete.

Proof. See Davidson (1994), Theorem 28.8, p. 464.

3.2. THE MAIN RESULT (HERRNDORF, 1984)

The main result to be considered in this section is a generalization of the Central Limit Theorem to function spaces such as D, known as the Functional Central Limit Theorem. In order to understand the theorem, the concepts previously defined are complemented with additional conditions for the disturbance sequence [{[u.sub.t]}.sup.[infinity].sub.t = 1] and, specifically, for the sequence of partial sums [{[S.sub.T]}.sup.[infinity].sub.T = 1], where [S.sub.T] = [[summation].sup.T.sub.(t = 1)] [u.sub.t]. First, the disturbances are required to have zero mean and finite variance

E([u.sub.t]) = 0 and E([u.sup.2.sub.t]) < [infinity], for t = 1,2, .... (5)

Second, the variance of partial sums must converge

[lim.sub.T[right arrow][infinity]]E([T.sup.-1] [S.sup.2.sub.T]) = [[sigma].sup.2] > 0, for some [sigma] > 0. (6)

Consider now the space D endowed with the Skorohod topology with Borel [sigma]-field B and define the random functions [W.sub.T]: [OMEGA] [right arrow] D by

[W.sub.T](r) = [1/([sigma][square root of T])] [S.sub.[rT]], r [member of] [0,1], T = 1,2, ...

where [rT] denotes the integer part of rT. Each [W.sub.T] is a measurable map from ([OMEGA],F) into (D,B). The sequence [{[W.sub.T]}.sup.[infinity].sub.T = 1] is said to satisfy the invariance principle if it is weakly convergent to a standard Wiener process W on D. For the development of this result, let [[parallel]u[parallel].sub.[beta]] be defined as

[[parallel]u[parallel].sub.[beta]] = [(E [[absolute value of u].sup.[beta]]).sup.1/[beta]] for [beta] [member of] [1, [infinity]),

[[parallel]u[parallel].sub.[beta]] = ess sup [absolute value of u] for [beta] = [infinity].

As will be shown in the next section, the following version of the Functional Central Limit Theorem is the starting point for recent literature on unit roots. This result is due to Herrndorf (1984).

Theorem 3 (Herrndorf, 1984 Corollary 1 p. 142) Let [beta] [member of] (2, [infinity]] and [gamma] = 2/[beta].

If [{[u.sub.t]}.sup.[infinity].sub.t = 1] satisfies (5)-(6),

[[summation].sup.[infinity].sub.(k = 1)] [alpha][(k).sup.1 - [gamma]] < [infinity] and lim [sup.sub.t[member of]N][[parallel][u.sub.t][parallel].sub.[beta]] < [infinity],

then [W.sub.T] [right arrow] W as T [right arrow] [infinity].

Proof. See Herrndorf (1984), Corollary 1, p. 148.
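Theorem 3 can be illustrated numerically with weakly dependent disturbances. The Python sketch below is only a sanity check under stated assumptions, not part of Herrndorf (1984): it uses MA(1) disturbances u_t = e_t + theta * e_{t-1} with i.i.d. N(0,1) innovations, for which the long-run variance in (6) is sigma^2 = (1 + theta)^2, and the function names, theta, T and the number of replications are hypothetical choices. If W_T behaves like a standard Wiener process, the variances of W_T(0.5) and W_T(1) should be close to 0.5 and 1.

```python
import numpy as np

def partial_sum_process(u, sigma):
    """Return r -> W_T(r) = S_[rT] / (sigma * sqrt(T)) for a disturbance vector u."""
    T = len(u)
    S = np.concatenate(([0.0], np.cumsum(u)))            # S_0 = 0, S_1, ..., S_T
    def W_T(r):
        idx = np.floor(np.asarray(r) * T).astype(int)    # integer part [rT]
        return S[idx] / (sigma * np.sqrt(T))
    return W_T

def simulate_points(n_rep=4000, T=400, theta=0.5, seed=0):
    """Draw W_T(0.5) and W_T(1) when u_t = e_t + theta * e_{t-1}, e_t i.i.d. N(0, 1)."""
    rng = np.random.default_rng(seed)
    sigma = 1.0 + theta              # long-run std. dev.: sigma^2 = (1 + theta)^2
    half, full = np.empty(n_rep), np.empty(n_rep)
    for i in range(n_rep):
        e = rng.normal(size=T + 1)
        u = e[1:] + theta * e[:-1]
        W = partial_sum_process(u, sigma)
        half[i], full[i] = W(0.5), W(1.0)
    return half, full

half, full = simulate_points()
print(half.var(), full.var())
```

Note that standardizing by the short-run standard deviation of u_t instead of sigma would not give unit variance at r = 1; the long-run variance is what matters for the invariance principle.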

4. ASYMPTOTICS FOR INTEGRATED PROCESSES (PHILLIPS, 1987A)

The two previous sections stated the probabilistic foundations for the econometric developments to be considered in the following lines. The first of these works is due to Phillips (1987a), who developed a rather general asymptotic theory for processes that contain a unit root.

4.1. PROBABILISTIC STRUCTURE OF TIME SERIES WITH A UNIT ROOT

The first study to develop a general framework for testing unit roots was due to Phillips (1987a). This study established weak dependence conditions for the disturbance sequence in order to propose a new asymptotic theory and develop new testing statistics. Exposition here is focused on the first task because of its application in subsequent studies. One starts by considering a data generating process for a sequence [{[y.sub.t]}.sup.[infinity].sub.t = 1] that satisfies

[y.sub.t] = [alpha][y.sub.t - 1] + [u.sub.t], t = 1,2, ..., (7)

with

[alpha] = 1. (8)

Under such a representation, [y.sub.t] = [S.sub.t] + [y.sub.0], where [S.sub.t] = [[summation].sup.t.sub.(i = 1)] [u.sub.i] and [y.sub.0] is a random initial state whose distribution is assumed to be known. Interest is placed here on the limiting distribution of the standardized partial sums defined by

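$$W_T(r) = \frac{1}{\sigma\sqrt{T}}\, S_{[Tr]} = \frac{1}{\sigma\sqrt{T}}\, S_{j-1}, \quad \frac{j-1}{T} \le r < \frac{j}{T}, \; j = 1, \dots, T, \qquad W_T(1) = \frac{1}{\sigma\sqrt{T}}\, S_T, \quad S_0 = 0, \tag{9}$$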

where [sigma] is a positive constant. Note that the sample paths [W.sub.T](r) lie in D. It is worth emphasizing that Phillips (1987a) endows D with the uniform metric [d.sub.U] and this is done in order to show that each random function [W.sub.T](r) lies on D. In addition, by adopting assumptions about the disturbances that are less restrictive than i.i.d., Phillips (1987a) showed that [W.sub.T](r) weakly converges to a standard Wiener process W(r) through a direct application of the Functional Central Limit Theorem developed by Herrndorf (1984). Assumptions regarding [{[u.sub.t]}.sup.[infinity].sub.t = 1] are grouped in the following statement and are intended to be as general as possible.

Assumption 1 (Phillips, 1987a, p. 280) The disturbance sequence [{[u.sub.t]}.sup.[infinity].sub.t = 1] satisfies the following conditions:

1. E([u.sub.t]) = 0 for t = 1,2, ...,

2. [sup.sub.t]E[[absolute value of [u.sub.t]].sup.[beta]] < [infinity] for some [beta] > 2,

3. [[sigma].sup.2] = [lim.sub.T[right arrow][infinity]] E([T.sup.-1][S.sup.2.sub.T]) exists and [[sigma].sup.2] > 0, with [S.sub.T] = [[summation].sup.T.sub.(t = 1)][u.sub.t],

4. [{[u.sub.t]}.sup.[infinity].sub.t = 1] is strong mixing, with strong mixing coefficients [alpha](k) that satisfy

[[summation].sup.[infinity].sub.(k = 1)] [alpha][(k).sup.1 - 2/[beta]] < [infinity]. (10)

As usual, condition 1 imposes a zero mean disturbance for every t. Condition 2 bounds the probability of outliers: the higher [beta], the lower the probability of outliers. As long as such [beta] > 2 exists, all of the lower absolute moments of each [u.sub.t] (including the second one) are finite. Condition 3 is conventional in central limit theory and concerns the convergence of the average variance of the partial sums [S.sub.T]. Condition 4 bounds the temporal dependence among the disturbances contained in [S.sub.T] = [[summation].sup.T.sub.(t = 1)][u.sub.t], and the elements covered in previous sections allow us to assert that, although dependence can exist between any pair of disturbances, it vanishes as time separation increases. Hence, any two random disturbances sufficiently distant in time are almost independent. Finally, the summability condition (10) is satisfied as long as the mixing decay rate is [alpha](k) = O([k.sup.-v]) for some v > 0 such that -v(1 - 2/[beta]) < -1 or, equivalently, v > [beta]/([beta] - 2).
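The link between the decay rate of the mixing coefficients and the summability condition (10) follows from a comparison with a p-series. The short derivation below is added for illustration; the constant C and the numerical value of \beta in the example are assumptions introduced here.

```latex
% If \alpha(k) \le C k^{-v} for all k \ge 1, then
\sum_{k=1}^{\infty} \alpha(k)^{1-2/\beta}
  \;\le\; C^{\,1-2/\beta} \sum_{k=1}^{\infty} k^{-v(1-2/\beta)} ,
% and the series on the right converges if and only if
v\left(1-\frac{2}{\beta}\right) > 1
  \quad\Longleftrightarrow\quad
v > \frac{\beta}{\beta-2} .
% Example: \beta = 4 requires v > 2; as \beta \to \infty the requirement relaxes to v > 1.
```

This makes the trade-off in Assumption 1 explicit: the fewer moments the disturbances have (the closer \beta is to 2), the faster the dependence must die out.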

It is interesting to notice that, as T increases, the constant segments that make up [W.sub.T](r) [member of] D shrink and their discontinuities become less perceptible (see Figure 3), reflecting how this sequence of random functions in D converges to a random function in C, the standard Wiener process. This property is exploited by Phillips (1987a) through two lemmas. The first lemma is the Functional Central Limit Theorem stated in Theorem 3. The second result is widely known as the Continuous Mapping Theorem and states conditions under which weak convergence is preserved under transformations that are continuous almost everywhere.

Lemma 4 (Phillips, 1987a, p. 281) If [{[u.sub.t]}.sup.[infinity].sub.t = 1] satisfies Assumption 1 then, as T [right arrow] [infinity], [W.sub.T] [right arrow] W a standard Wiener process on C.

Proof. See Herrndorf (1984), Corollary 1, p. 142.

Lemma 5 (Phillips, 1987a, p. 281) If [W.sub.T] [right arrow] W(r) as T [right arrow] [infinity] and h is a continuous functional on D almost everywhere (a.e.) then h([W.sub.T]) [right arrow] h(W) as T [right arrow] [infinity].

Proof. See Billingsley (1968), Corollary 1, p. 31.

4.2. AN ASYMPTOTIC THEORY FOR ECONOMETRICIANS

The importance of the two previous lemmas lies in the fact that they allow the derivation of convergence rules often employed by theoretical econometricians. These rules are summarized in the next theorem.

Theorem 6 (Phillips, 1987a, p. 282) If [{[u.sub.t]}.sup.[infinity].sub.t = 1] satisfies Assumption 1 and if

[sup.sub.t]E[[absolute value of [u.sub.t]].sup.[beta] + [epsilon]] < [infinity] for some [epsilon] > 0

(where [beta] > 2 is the same as that in Assumption 1), then as T [right arrow] [infinity]:

1. [T.sup.-2] [[summation].sup.T.sub.(t = 1)] [y.sup.2.sub.t - 1] [right arrow] [[sigma].sup.2] [[integral].sup.1.sub.0] W[(r).sup.2] dr,

2. [T.sup.-1] [[summation].sup.T.sub.(t = 1)] [y.sub.t - 1] ([y.sub.t] - [y.sub.t - 1]) [right arrow] ([[sigma].sup.2] / 2)(W[(1).sup.2] - [[sigma].sup.2.sub.u]/[[sigma].sup.2]),

3. T([??] - 1) [right arrow] (1/2)(W[(1).sup.2] - [[sigma].sup.2.sub.u] / [[sigma].sup.2])/[[integral].sup.1.sub.0] W[(r).sup.2] dr,

4. [??] [right arrow] 1 in probability,

5. [t.sub.[??]] [right arrow] ([sigma] / 2[[sigma].sub.u])(W[(1).sup.2] - [[sigma].sup.2.sub.u] / [[sigma].sup.2])/[{[[integral].sup.1.sub.0]W[(r).sup.2]dr}.sup.1/2],

where [[sigma].sup.2.sub.u] = [lim.sub.T [right arrow] [infinity]] [T.sup.-1] [[summation].sup.T.sub.(t = 1)] E([u.sup.2.sub.t]), [[sigma].sup.2] = [lim.sub.T [right arrow] [infinity]]E([T.sup.-1][S.sup.2.sub.T]) and W is a standard Wiener process on C.

Proof. See Phillips (1987a), Theorem 3.1 p. 296.

In the previous theorem, results 1 and 2 constitute derivation rules for limiting distributions. Result 3 gives the limiting distribution of the statistic T([??] - 1), which corrects the results of White (1958) (12), among others. Result 4 states the consistency of the OLS estimator [??] in the presence of a unit root and under the general case of dependent and heterogeneously distributed disturbances. Finally, result 5 shows the asymptotic distribution of the t statistic used when testing for unit roots. It is worth mentioning that under (7) and (8) the t statistic does not follow a Student's t distribution. Since W(1) follows a standard normal distribution, W[(1).sup.2] follows a chi-squared distribution with one degree of freedom. However, the functional [[integral].sup.1.sub.0] W[(r).sup.2] dr is a random variable with a rather complex distribution, which implies that the usual distributions (normal, chi-squared, t and F) employed in the stationary case are not relevant for the subsequent analysis.
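The nonstandard shape of the limit of the normalized bias can be seen in a small simulation. The sketch below is illustrative only: it assumes i.i.d. standard normal disturbances and a zero initial value, so that the two variances in the theorem coincide and the limit reduces to (1/2)(W[(1).sup.2] - 1) divided by the integral of W[(r).sup.2]. That limit is negative whenever W[(1).sup.2] < 1, which happens with the chi-squared(1) probability of about 0.683, so the distribution is far from symmetric around zero. The function names are hypothetical.

```python
import random


def normalized_bias(y):
    """T * (alpha_hat - 1) from regressing y_t on y_{t-1} without intercept.

    y[0] is the initial value y_0 and y[1:] holds y_1, ..., y_T.
    """
    T = len(y) - 1
    num = sum(y[t - 1] * (y[t] - y[t - 1]) for t in range(1, T + 1))
    den = sum(y[t - 1] ** 2 for t in range(1, T + 1))
    return T * num / den


def fraction_negative(T, reps, seed=0):
    """Share of simulated random walks for which T * (alpha_hat - 1) < 0."""
    rng = random.Random(seed)
    negatives = 0
    for _ in range(reps):
        y = [0.0]
        for _ in range(T):
            y.append(y[-1] + rng.gauss(0.0, 1.0))
        if normalized_bias(y) < 0.0:
            negatives += 1
    return negatives / reps
```

A symmetric limit such as the normal would give a share near 0.5; the simulated share is instead close to two thirds, in line with the left-skewed distribution described above.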

Following the previous considerations, Phillips (1987a) proposed (after developing consistent estimators for the parameters [[sigma].sup.2.sub.u] and [[sigma].sup.2]) two new test statistics for the unit root hypothesis, often referred to as the Z tests. Although both (7) and (8) correspond only to the case of a unit root without drift or deterministic trend, the importance of this study lies in providing a general theory of test statistics for the unit root hypothesis. The distributions considered here differ from those involved in the stationary case ([absolute value of [alpha]] < 1). This methodology is well suited for extensions that include both drift and deterministic trend, derived by Phillips and Perron (1988), and it constitutes the starting point for the study of unit root tests under structural break developed in the following sections.

5. ASYMPTOTICS FOR NEAR-INTEGRATED PROCESSES (PHILLIPS, 1987B)

For later discussion of the asymptotic power of unit root tests against alternative hypotheses that consider autoregressive coefficients close but not equal to one, it will be useful to consider generalizations of integrated processes often referred to as near-integrated processes and studied in detail by Phillips (1987b). The focus is on the time series [{[y.sub.t]}.sup.[infinity].sub.t = 1] which is assumed to be generated according to the following model

[y.sub.t] = [alpha][y.sub.t - 1] + [u.sub.t], t = 1,2,.... (11)

[alpha] = [e.sup.c/T], -[infinity] < c < [infinity]. (12)

In the above model, initial condition [y.sub.0] is allowed to be any random variable whose distribution is fixed and independent of T. The constant c is interpreted as a noncentrality parameter that quantifies deviations from the unit root null hypothesis that holds true when c = 0:

[H.sub.0]: [alpha] = 1. (13)

Under (13), [{[y.sub.t]}.sup.[infinity].sub.t = 0] is an integrated process of order 1 or an I(1) process. Additionally, any c [not equal to] 0 in (12) represents a local alternative to [H.sub.0]. For future reference, the next definition formally establishes this distinction.

Definition 7 A time series [{[y.sub.t]}.sup.[infinity].sub.t = 1] that is generated by (11) and (12) with c [not equal to] 0 is called near-integrated. When c = 0 (i.e. [alpha] = 1) in (12), [{[y.sub.t]}.sup.[infinity].sub.t = 1] is called integrated.

The main objective of the present section is to present an asymptotic theory for these types of processes. Naturally, results and properties are indexed by the parameter c.

5.1. PROBABILISTIC STRUCTURE OF TIME SERIES WITH A NEAR-TO-UNIT ROOT

To make this asymptotic theory widely applicable, some general assumptions concerning the disturbance sequence [{[u.sub.t]}.sup.[infinity].sub.t = 0] are necessary. For this reason, the following mixing conditions on the behaviour of the disturbances (by now familiar) are adopted and summarized in the next statement.

Assumption 2 (Phillips, 1987b, p. 537) The disturbance sequence [{[u.sub.t]}.sup.[infinity].sub.t = 1] satisfies

1. E([u.sub.t]) = 0 for t = 1, 2, ...,

2. [sup.sub.t]E[[absolute value of [u.sub.t]].sup.[beta] + [epsilon]] < [infinity] for some [beta] > 2 and [epsilon] > 0,

3. [[sigma].sup.2] = [lim.sub.T [right arrow] [infinity]] [T.sup.-1] E([S.sub.T.sup.2]) exists and [[sigma].sup.2] > 0 with [S.sub.T] = [[summation].sup.T.sub.(t = 1)] [u.sub.t],

4. [{[u.sub.t]}.sup.[infinity].sub.t = 1] is strong mixing, with strong mixing coefficients [alpha](k) that satisfy

[[summation].sup.[infinity].sub.(k = 1)] [alpha][(k).sup.1 - 2/[beta]] < [infinity]. (14)

Notice that Assumptions 1 and 2 are quite similar; the only difference is that Assumption 2 requires the existence of some [epsilon] > 0 such that [sup.sub.t]E[[absolute value of [u.sub.t]].sup.[beta] + [epsilon]] is finite.

In addition, it will be useful to express the limit theory in terms of a certain diffusion process, which can be interpreted as the continuous time version of an AR(1) process.

Definition 8 (Ornstein-Uhlenbeck process) An Ornstein-Uhlenbeck process is a functional [W.sub.c] of the form [W.sub.c](r) = [[integral].sup.r.sub.0] [e.sup.(r - s)c] dW(s) that satisfies the stochastic differential equation

d[W.sub.c](r) = c[W.sub.c](r)dr + dW(r), [W.sub.c](0) = 0. (15)

Equation (15) is called the Ornstein-Uhlenbeck or Langevin equation. It is a particular case of the following differential equation in terms of a continuous-time stochastic process X(t)

dX(t) = b (t, X(t)) dt + [sigma](t, X(t)) dW(t), (16)

where b(t,X(t)), [sigma](t,X(t)) [member of] R and W(t) is a Wiener process with t [member of] [0, [infinity]) (Oksendal, 2000). Equation (15) can also be written as

[W.sub.c](r) = W(r) + c [[integral].sup.r.sub.0] [e.sup.(r - s)c]W(s)ds

and the effect of the non-centrality parameter c becomes even more evident.
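The step from the stochastic integral in Definition 8 to this last representation is an integration by parts, which is valid here because the integrand is a smooth deterministic function of s. A short derivation, offered as a sketch:

```latex
\begin{aligned}
W_c(r) &= \int_0^r e^{(r-s)c}\,dW(s)
        = \Big[e^{(r-s)c}\,W(s)\Big]_{s=0}^{s=r}
          - \int_0^r W(s)\,\frac{\partial}{\partial s}e^{(r-s)c}\,ds \\
       &= W(r) + c\int_0^r e^{(r-s)c}\,W(s)\,ds , \qquad W(0)=0 .
\end{aligned}
```

Setting c = 0 removes the second term and recovers the standard Wiener process, which is the unit root case.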

5.2. MORE ASYMPTOTIC THEORIES FOR ECONOMETRICIANS

For a fixed value of the parameter c, (12) implies that [alpha] [right arrow] 1 as T [right arrow] [infinity]. However, within this framework the speed of convergence of [alpha] towards 1 is controlled at O([T.sup.-1]). This rate is slow enough that the effect of c on the main results of Section 4 does not vanish asymptotically (13). This observation leads to the following derivation of rules and properties for regression-based statistics.

Lemma 7 (Phillips, 1987b, p. 539) If {[y.sub.t]} is a near-integrated time series generated by (11) and (12) then, as T [right arrow] [infinity]:

1. [T.sup.-1/2][y.sub.[Tr]] [right arrow] [sigma][W.sub.c](r),

2. [T.sup.-3/2] [[summation].sup.T.sub.(t = 1)] [y.sub.t] [right arrow] [sigma] [[integral].sup.1.sub.0] [W.sub.c](r)dr,

3. [T.sup.-2] [[summation].sup.T.sub.(t = 1)] [y.sup.2.sub.t] [right arrow] [[sigma].sup.2] [[integral].sup.1.sub.0] [W.sub.c][(r).sup.2]dr,

4. [T.sup.-1] [[summation].sup.T.sub.(t = 1)] [y.sub.t - 1] [u.sub.t] [right arrow] [[sigma].sup.2] [[integral].sup.1.sub.0] [W.sub.c](r)dW(r) + 1/2 ([[sigma].sup.2] - [[sigma].sup.2.sub.u]), where [[sigma].sup.2.sub.u] = [lim.sub.T [right arrow] [infinity]] [T.sup.-1] [[summation].sup.T.sub.(t = 1)] E([u.sup.2.sub.t]).

Proof. See Phillips (1987b), Lemma 1, p. 539.

Theorem 8 (Phillips, 1987b, p. 540) If [{[y.sub.t]}.sup.[infinity].sub.t = 0] is a near-integrated time series generated by (11) and (12) then, as T [right arrow] [infinity]:

1. T([??] - [alpha]) [right arrow] {[[integral].sup.1.sub.0] [W.sub.c](r)dW(r) + 1/2 (1 - [[sigma].sup.2.sub.u]/[[sigma].sup.2])}/[[integral].sup.1.sub.0] [{[W.sub.c](r)}.sup.2] dr,

2. [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII],

3. [t.sub.[alpha]] [right arrow] ([sigma] / [[sigma].sub.u]) {[[integral].sup.1.sub.0] [W.sub.c](r)dW(r) + 1/2 (1 - [[sigma].sup.2.sub.u]/[[sigma].sup.2])}/[{[[integral].sup.1.sub.0] [W.sub.c][(r).sup.2] dr}.sup.1/2].

Proof. See Phillips (1987b), Theorem 1, p. 540.

Up to this point, the theory presented can be employed in the analysis of the power of unit root tests under local alternatives. For a non-centrality parameter c arbitrarily close to 0 it is easy to show that [e.sup.c/T] [approximately equal to] 1 +c/T and this is the approach usually adopted in unit root testing. A brief illustration of this procedure can be found, for example, in Phillips (1988).
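The role of the non-centrality parameter can be checked numerically. The sketch below assumes i.i.d. unit-variance disturbances and a zero initial value, so that the long run variance equals one, and compares the variance of [T.sup.-1/2][y.sub.T] with the variance of the Ornstein-Uhlenbeck limit at r = 1, which is ([e.sup.2c] - 1)/(2c). The function names are hypothetical.

```python
import math
import random


def ou_variance(c):
    """Var(W_c(1)) = (exp(2c) - 1) / (2c), with the limit 1 at c = 0."""
    return 1.0 if c == 0 else (math.exp(2.0 * c) - 1.0) / (2.0 * c)


def exact_scaled_variance(c, T):
    """Var(T^{-1/2} y_T) for y_t = alpha * y_{t-1} + u_t, alpha = exp(c/T),
    y_0 = 0 and i.i.d. unit-variance u_t (a finite geometric sum)."""
    alpha = math.exp(c / T)
    return sum(alpha ** (2 * j) for j in range(T)) / T


def simulated_scaled_variance(c, T, reps, seed=0):
    """Monte Carlo counterpart of exact_scaled_variance with normal u_t."""
    rng = random.Random(seed)
    alpha = math.exp(c / T)
    draws = []
    for _ in range(reps):
        y = 0.0
        for _ in range(T):
            y = alpha * y + rng.gauss(0.0, 1.0)
        draws.append(y / math.sqrt(T))
    mean = sum(draws) / reps
    return sum((d - mean) ** 2 for d in draws) / reps
```

With c < 0 the limiting variance is below one, so the scaled series is less dispersed than under the unit root; this difference is what gives unit root tests their local power.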

6. LINEAR PROCESSES AND MODIFIED UNIT ROOT TESTS

6.1. MOTIVATION

Although mixing conditions are powerful tools that allow the derivation of weak convergence results for a wide range of processes, Phillips and Solo (1992) pointed out that they present a major drawback, since much of time series analysis is concerned with parametric models that fall into the linear process class. The reason is quite simple: not all linear processes are strong mixing. For this reason, they proposed a return to linear processes as the main focus for developing time series asymptotics.

Under the class of linear models, Phillips and Solo (1992) make extensive use of the algebraic Beveridge-Nelson decomposition (see Appendix A) to demonstrate the Functional Central Limit Theorem once provided with a disturbance sequence [{[[epsilon].sub.t]}.sup.[infinity].sub.t = 0] that is a A (see Appendix B), strongly uniformly integrable (see Appendix C) with dominating random variables [{[Z.sub.t]}.sup.[infinity].sub.t = 0] in such a way that E([Z.sup.2 + [eta].sub.t]) < [infinity] for some [eta] > 0, as well as [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII], where [F.sub.t] denotes the [sigma]-field generated by {[[epsilon].sub.t], [[epsilon].sub.t - 1], ...}. Given the latter notation it is now possible to establish the following:

Theorem 9 (Phillips and Solo, 1992) Suppose that [{[u.sub.t]}.sup.[infinity].sub.t = 0] is the linear process described by

[u.sub.t] = [[summation].sup.[infinity].sub.(j = 0)] [c.sub.j] [[epsilon].sub.t - j] = C(L)[[epsilon].sub.t], C(L) = [[summation].sup.[infinity].sub.(j = 0)] [c.sub.j] [L.sup.j]

with 0 < C(1) [equivalent to] [[summation].sup.[infinity].sub.(j = 0)] [c.sub.j] < [infinity] and [[summation].sup.[infinity].sub.(j = 0)] [c.sup.2.sub.j] < [infinity]. If [[summation].sup.[infinity].sub.(j = 1)] j [absolute value of ([c.sub.j])] < [infinity], then

1/[square root of T] [[summation].sup.[rT].sub.(t = 1)] [u.sub.t] [right arrow] [[sigma].sub.[epsilon]] C(1)W(r).

Proof. See Phillips and Solo (1992) Theorem 3.4 p. 983.

Although the latter Functional Central Limit Theorem is less general than the versions previously presented, it has been used frequently in subsequent work, especially in the developments by Stock (1999).
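The mechanism behind this theorem is the Beveridge-Nelson decomposition, which writes the partial sum of [u.sub.t] as C(1) times the partial sum of the innovations plus two boundary terms that become negligible after division by the square root of T. The sketch below verifies this algebraic identity for a finite-order moving average; the restriction to finitely many coefficients and the function names are assumptions made only for illustration.

```python
import random


def bn_tilde(c):
    """Beveridge-Nelson coefficients c~_j = sum_{k > j} c_k, j = 0..q-1."""
    return [sum(c[j + 1:]) for j in range(len(c) - 1)]


def apply_filter(coeffs, eps, t, offset):
    """sum_j coeffs[j] * eps_{t-j}, where eps_s is stored at eps[s + offset]."""
    return sum(cj * eps[t - j + offset] for j, cj in enumerate(coeffs))


def partial_sum_two_ways(c, eps, T):
    """Return (direct sum of u_t, Beveridge-Nelson form) for t = 1..T.

    eps must hold eps_{1-q}, ..., eps_T in order, with q = len(c) - 1.
    """
    q = len(c) - 1
    offset = q - 1
    direct = sum(apply_filter(c, eps, t, offset) for t in range(1, T + 1))
    c1, ct = sum(c), bn_tilde(c)
    bn = (c1 * sum(eps[t + offset] for t in range(1, T + 1))
          + apply_filter(ct, eps, 0, offset)
          - apply_filter(ct, eps, T, offset))
    return direct, bn


def long_run_variance(c, sigma2_eps=1.0):
    """sigma_eps^2 * C(1)^2, the variance of the limit process at r = 1."""
    return sigma2_eps * sum(c) ** 2
```

Because the two boundary terms do not grow with T, only C(1) and the innovation variance survive in the limit, which is why the limiting process is scaled by the product of the innovation standard deviation and C(1).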

6.2. THE M CLASS OF INTEGRATION TESTS (STOCK, 1999)

Stock (1999) proposed a new class of statistics that directly test the implication that an integrated process has a growing variance, so that its order in probability (14) is [T.sup.1/2], that is, [O.sub.p]([T.sup.1/2]). Since the remainder of this paper deals with this class of tests under several frameworks, the general class is examined in some detail. First, suppose the following data generating process for [{[y.sub.t]}.sup.[infinity].sub.t = 1]:

[y.sub.t] = [[delta].sub.t]([beta]) + [[summation].sup.t.sub.(i = 1)] [u.sub.i],

for t = 1, ..., T. That is, under the null hypothesis the series [{[y.sub.t]}.sup.[infinity].sub.t = 1] can be written as the sum of a purely deterministic component [[delta].sub.t]([beta]) (with the finite dimensional vector [beta] estimated by [??]) and an integrated or I(1) component that is the partial sum of weakly stationary or I(0) terms. Let the long term variance of [u.sub.t] be denoted by [[sigma].sup.2] = 2[pi][s.sub.u](0), where [s.sub.u](0) is the spectral density of [u.sub.t] at frequency zero; then, for r [member of] [0,1] we define the following functionals:

[S.sub.T](r) = 1/[square root of T] [[summation].sup.[rT].sub.(i = 1)] [u.sub.i], and

[D.sub.T] (r, [beta]) = [[delta].sub.[rT]]([beta]),

which are both cadlag versions of the components of the discrete time process. Our goal is to apply the Functional Central Limit Theorem. Such functionals are assumed to satisfy the following:

Assumption 3 (Stock, 1999 p. 137) The following two conditions hold:

1. [S.sub.T] [right arrow] [sigma]W, where 0 < [[sigma].sup.2] < + [infinity], and

2. [1/[square root of T]] {[D.sub.T](.,[??]) - [D.sub.T](.,[beta])} [right arrow] [sigma]D, where D [member of] D[0,1] has a distribution that does not depend on [beta] or on c, the nuisance parameters describing the distribution of {[u.sub.t]}.

In line with the proposal of Phillips and Solo (1992), Stock (1999) focuses on linear processes

[u.sub.t] = C(L)[[epsilon].sub.t], [[summation].sup.[infinity].sub.(j = 1)] j[absolute value of [c.sub.j]] < [infinity], C(1) [not equal to] 0,

where [[epsilon].sub.t] is a martingale difference sequence (m.d.s.) with

E[[[epsilon].sub.t]|[F.sub.t - 1]] = 0, and (17)

[sup.sub.t]E[[[epsilon].sup.2 + [kappa].sub.t] | [F.sub.t - 1]] < + [infinity] for some [kappa] > 0. (18)

As usual, condition (17) imposes zero-mean disturbances, whereas condition (18) bounds the probability of outliers in a similar fashion to condition 2 presented in Assumption 1. Also, although the deterministic component [[delta].sub.t]([beta]) is designed to potentially contain polynomial and more general trends, we consider the following three cases:

1. No deterministic trend: [[delta].sub.t]([beta]) = 0. In this case there is no need for detrending. For the sake of completeness, let the "detrended" series be [y.sup.0.sub.t] = [y.sub.t].

2. Constant: [[delta].sub.t]([beta]) = [[beta].sub.0]. In this case [[beta].sub.0] is estimated by [[??].sub.0] = [bar.y] = [T.sup.-1] [[summation].sup.T.sub.(t = 1)] [y.sub.t] and the demeaned series is [y.sup.[mu].sub.t] [equivalent to] [y.sub.t] - [bar.y].

3. Linear trend: [[delta].sub.t]([beta]) = [[beta].sub.0] + [[beta].sub.1](t/T). If ([[beta].sub.0], [[beta].sub.1]) is estimated by the OLS estimator ([[??].sub.0], [[??].sub.1]) then the detrended series is [y.sup.[tau].sub.t] [equivalent to] [y.sub.t] - [[??].sub.0] - [[??].sub.1](t/T). The known part of the deterministic component is normalized so that its continuous-time analogue lies in the interval [0,1].

These three cases are enough for the subsequent analysis. Since the limiting representation in Assumption 3 depends on the nuisance parameter [[sigma].sup.2], it is assumed that there exists a consistent estimator [[??].sup.2] for [[sigma].sup.2].

Assumption 4 (Stock, 1999 p. 137) Under the null hypothesis, [[??].sup.2] [right arrow] [[sigma].sup.2] in probability.

The elements for the development of the new class of tests are based on both the Functional Central Limit Theorem and the Continuous Mapping Theorem. For each of the cases considered, we define [S.sup.d.sub.T] as the scaled stochastic process formed using the respective detrended series:

[S.sup.d.sub.T](r) = [1/[square root of (T[[??].sup.2])]] [y.sup.d.sub.[rT]], d = 0, [mu], [tau],

for r [member of] [0,1]. If Assumptions 3 and 4 hold, then

[S.sup.d.sub.T] [right arrow] [S.sup.d] = W - [??], for a certain [??] [member of] D[0,1]. (19)

For the three functional forms of the deterministic component, the following theorem shows the specific form that D adopts.

Theorem 10 (Stock, 1999 p. 137) Suppose that Assumptions 3 and 4 hold.

1. If [[delta].sub.t]([beta]) = 0, then

[S.sup.0.sub.T](r) = [1/[square root of (T[[sigma].sup.2])]] [y.sup.0.sub.[rT]] [right arrow] W(r).

2. If [[delta].sub.t]([beta]) = [[beta].sub.0], then

[S.sup.[mu].sub.T](r) = [1/[square root of (T[[sigma].sup.2])]] [y.sup.[mu].sub.[rT]] [right arrow] [S.sup.[mu]](r) = W(r) - [[integral].sup.1.sub.0]W(s)ds.

3. If [[delta].sub.t]([beta]) = [[beta].sub.0] + [[beta].sub.1](t/T), then

[S.sup.[tau].sub.T](r) = [1/[square root of (T[[sigma].sup.2])]] [y.sup.[tau].sub.[rT]] [right arrow] [S.sup.[tau]](r) = W(r) - (4 - 6r)[[integral].sup.1.sub.0]W(s)ds - (-6 + 12r) [[integral].sup.1.sub.0]s W(s)ds.

Proof. See Stock (1999), Theorem 1 p. 139.
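The weights in the linear trend case are the continuous-time least squares projection of a path on (1, r): inverting the moment matrix of (1, r) on [0,1] gives the intercept 4 times the integral of f minus 6 times the integral of s f(s), and the slope -6 times the integral of f plus 12 times the integral of s f(s). The following sketch checks this on a grid for an arbitrary smooth test path; the path, the grid size and the function names are assumptions chosen only for illustration.

```python
import math


def ols_detrend(values):
    """Residuals from least squares of values on (1, r), r = i/n, i = 1..n."""
    n = len(values)
    r = [(i + 1) / n for i in range(n)]
    rbar, fbar = sum(r) / n, sum(values) / n
    b = (sum((ri - rbar) * (fi - fbar) for ri, fi in zip(r, values))
         / sum((ri - rbar) ** 2 for ri in r))
    a = fbar - b * rbar
    return [fi - a - b * ri for ri, fi in zip(r, values)]


def formula_detrend(values):
    """f(r) - (4 - 6r) * int f - (-6 + 12r) * int s f(s) ds, via Riemann sums."""
    n = len(values)
    r = [(i + 1) / n for i in range(n)]
    i0 = sum(values) / n
    i1 = sum(ri * fi for ri, fi in zip(r, values)) / n
    return [fi - (4 - 6 * ri) * i0 - (-6 + 12 * ri) * i1
            for ri, fi in zip(r, values)]


def max_gap(values):
    """Largest absolute difference between the two detrended paths."""
    return max(abs(x - y)
               for x, y in zip(ols_detrend(values), formula_detrend(values)))
```

The two residual paths agree up to a discretization error of order 1/n, which confirms the signs of the two weights.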

This latter result is one of the cornerstones for the class of tests proposed. Also, it follows from the Continuous Mapping Theorem that if (19) holds and g is a continuous function g: D[0,1] [right arrow] R, then

g([S.sup.d.sub.T]) [right arrow] g([S.sup.d]). (20)

Let [M.sup.d] = {m: D[0,1] [right arrow] R} be the collection of functionals that satisfy the following conditions:

1. m is continuous,

2. there exists [c.sub.v], [absolute value of [c.sub.v]] < + [infinity], in such a way that P[m([S.sup.d]) [less than or equal to] [c.sub.v]] = v for all v [member of] (0,1) and

3. m(0) < [c.sub.v] for all v [member of] (0,1).

The class of tests [M.sup.d] refers only to continuous functionals of [S.sup.d.sub.T], and it groups test statistics for the null hypothesis that [y.sub.t] is I(1) against the alternative that it is I(0). Since [S.sup.d.sub.T] represents any of the three detrended series mentioned, under the null hypothesis m([S.sup.d.sub.T]) has an asymptotic distribution with critical values that depend on the functional m, whereas under a fixed alternative [y.sub.t] is I(0), so that [S.sup.d.sub.T] collapses to zero. This suggests the construction of a one-tailed test of level v of the form:

reject [H.sub.0]: [y.sub.t] ~ I(1) if m([S.sup.d.sub.T]) [less than or equal to] [c.sub.v].

This approach, as Stock (1999) asserted, suggests working backwards from the desired asymptotic representation to the actual test statistic. The fact that the form of the functional m does not depend on the type of detrending emphasizes that the steps of eliminating the deterministic components and testing for a unit root are distinct; detrending a series when it is not required does not affect the size of the tests (although it can affect power) since [??] does not depend on [beta]. In contrast, failing to detrend a series that contains a trend typically leads to a loss of consistency and an incorrect asymptotic size.

In summary, detrending a series before hypothesis testing does not affect the size of the test, which is a desirable property. Once the size is guaranteed to be fixed, power-increasing procedures can be applied. The next two subsections illustrate the following idea: if a given test statistic [V.sub.T] has a limiting distribution characterized as the functional m of a certain diffusion process [S.sup.d], that is

[V.sub.T] [right arrow] m([S.sup.d]),

this asymptotic distribution is also the limit of a corresponding modified test statistic for detrended data, m([S.sup.d.sub.T]):

m([S.sup.d.sub.T]) [right arrow] m([S.sup.d]),

in such a way that [V.sub.T] and its modified version m([S.sup.d.sub.T]) are asymptotically equivalent.

6.3. THE MODIFIED SARGAN-BHARGAVA TEST

One of the test statistics to be covered in Section 7.3 is that of Sargan and Bhargava (1983) for the model

[y.sub.t] = [[beta].sub.0] + [[summation].sup.t.sub.(s = 1)] [[alpha].sup.t - s] [[epsilon].sub.s] (21)

where [[epsilon].sub.t] ~ N(0, [[sigma].sup.2]), t = 1, ..., T, and ([alpha], [[beta].sub.0], [[sigma].sup.2]) is a vector of unknown parameters. The authors proposed the following Durbin-Watson statistic for a regression of [y.sub.t] against a constant

S[B.sub.[mu]] = [[[summation].sup.T.sub.(t = 2)][([DELTA][y.sub.t]).sup.2]/[[summation].sup.T.sub.(t = 1)][([y.sup.[mu].sub.t]).sup.2]]

where [y.sup.[mu].sub.t] [equivalent to] [y.sub.t] - [bar.y]. For the case where there exists a linear deterministic trend, Bhargava (1986) considered the following extension:

[y.sub.t] = [[beta].sub.0] + [[beta].sub.1]t + [[summation].sup.t.sub.(s = 1)] [[alpha].sup.t - s][[epsilon].sub.s], (22)

where [[epsilon].sub.t] ~ N(0, [[sigma].sup.2]), t = 1, ..., T, and ([alpha], [[beta].sub.0], [[beta].sub.1], [[sigma].sup.2]) is a vector of unknown parameters. A similar test was proposed:

S[B.sub.[tau]] = [[[summation].sup.T.sub.(t = 2)][([DELTA][y.sub.t]).sup.2]/[[summation].sup.T.sub.(t = 1)][([y.sup.[tau].sub.t]).sup.2]]

where [y.sup.[tau].sub.t] [equivalent to] [y.sub.t] - [[beta].sub.0] - [[beta].sub.1] (t/T), with the coefficients computed as

[[beta].sub.0] = [bar.y] - [1/2][(T + 1)/(T - 1)]([y.sub.T] - [y.sub.1]),

[[beta].sub.1] = [T/(T - 1)]([y.sub.T] - [y.sub.1]).

For both tests S[B.sub.[mu]] and S[B.sub.[tau]], Stock (1999) derived their limiting distribution

[T.sup.-1] S[B.sup.-1.sub.d] [right arrow] [[[sigma].sup.2]/var([DELTA][y.sub.t])] [[integral].sup.1.sub.0] [S.sup.d][(r).sup.2] dr, for d = [mu], [tau]. (23)

After noticing in (21) and (22) that, under the null hypothesis [alpha] = 1, [[sigma].sup.2] = var([DELTA][y.sub.t]), (23) can be rewritten as

[T.sup.-1] S[B.sup.-1.sub.d] [right arrow] [[integral].sup.1.sub.0] [S.sup.d][(r).sup.2] dr, for d = [mu], [tau].

Now, note that the functional

[m.sub.SB](f) = [[integral].sup.1.sub.0] f[(r).sup.2] dr

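is also involved in the limiting distribution of the following statistic, obtained by evaluating it at [S.sup.d.sub.T]:

[T.sup.-1] [[summation].sup.T.sub.(t = 1)] [S.sup.d.sub.T][(t/T).sup.2] = [1/([T.sup.2][[??].sup.2])][[summation].sup.T.sub.(t = 1)][([y.sup.d.sub.t]).sup.2].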

This latter statistic will be referred to as the modified Sargan-Bhargava, or MSB test.
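To see why small values of such a statistic speak against the unit root, the sketch below computes the sum of squared demeaned observations divided by [T.sup.2] times a variance estimate, for a random walk and for white noise. It assumes i.i.d. innovations, so that the mean squared first difference can stand in for the long run variance estimator; that shortcut, the demeaning case and the function names are illustrative assumptions and not the estimator of the original papers.

```python
import random


def demean(y):
    """Subtract the sample mean (the constant-only detrending case)."""
    ybar = sum(y) / len(y)
    return [v - ybar for v in y]


def msb(y_detrended, sigma2_hat):
    """T^{-2} * sum_t (y_t^d)^2 / sigma2_hat, i.e. m_SB applied to S_T^d."""
    T = len(y_detrended)
    return sum(v * v for v in y_detrended) / (T ** 2 * sigma2_hat)


def mean_msb(T, reps, integrated, seed=0):
    """Average statistic over simulated series: integrated=True gives a
    random walk, integrated=False gives white noise (an I(0) alternative)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(reps):
        e = [rng.gauss(0.0, 1.0) for _ in range(T)]
        if integrated:
            y, level = [], 0.0
            for x in e:
                level += x
                y.append(level)
        else:
            y = e
        # Assumption: sigma^2 is estimated by the mean squared first
        # difference, adequate only for i.i.d. innovations under the null.
        d = [y[t] - y[t - 1] for t in range(1, T)]
        sigma2_hat = sum(v * v for v in d) / len(d)
        total += msb(demean(y), sigma2_hat)
    return total / reps
```

Under the random walk the statistic stays of order one, whereas under white noise it shrinks at rate 1/T, which is why the rejection region lies in the lower tail.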

6.4. A MODIFIED Z TEST

For a model that contains a constant deterministic component, Phillips (1987a) and Phillips and Perron (1988) proposed the following test statistic:

[Z.sub.[alpha]] = T([??] - 1) - [1/2]([[??].sup.2] - [[??].sup.2.sub.u])/([T.sup.-2][[summation].sup.T.sub.(t = 1)][y.sup.2.sub.t - 1]), (24)

where

[alpha] = [[summation].sup.T.sub.(t = 2)] [y.sup.[mu].sub.t] [y.sup.[mu].sub.t - 1]/[[summation].sup.T.sub.(t = 2)][([y.sup.[mu].sub.t - 1]).sup.2], (25)

[u.sub.t] = [y.sup.[mu].sub.t] - [alpha][y.sup.[mu].sub.t - 1], (26)

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] and (27)

[[sigma].sup.2.sub.u] = [T.sup.-1] [[summation].sup.T.sub.(t = 1)] [([y.sub.t] - [alpha][y.sub.t - 1]).sup.2]. (28)

Since [[summation].sup.T.sub.(t = 1)][y.sub.t - 1][DELTA][y.sub.t] = (1/2)([y.sup.2.sub.T] - [y.sup.2.sub.0] - [[summation].sup.T.sub.(t = 1)][([DELTA][y.sub.t]).sup.2]), the test [Z.sub.[alpha]] in (24) can be rewritten (taking [y.sub.0] = 0) as

[Z.sub.[alpha]] = [1/2]([S.sub.T][(1).sup.2] - 1)/([T.sup.-1][[summation].sup.(T - 1).sub.(t = 1)][S.sub.T][(t/T).sup.2]) - [1/2]T[([??] - 1).sup.2]

and, provided that [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII], its asymptotic distribution is

[Z.sub.[alpha]] [right arrow] [1/2](W[(1).sup.2] - 1)/[[integral].sup.1.sub.0]W[(r).sup.2]dr.

This latter expression suggests the use of the following functional:

[m.sub.Z[alpha]](f) = [1/2](f[(1).sup.2] - 1)/[[integral].sup.1.sub.0]f[(r).sup.2]dr

as shown by Stock (1999). For the study of Perron and Rodriguez (2003) to be covered in section 7.3, the modified [Z.sub.[alpha]] test will be referred to as the [M.sub.Z[alpha]] test.
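The relation between the original and the modified statistic is purely algebraic and can be verified on any data set. The sketch below computes the Z statistic from its definition and the modified statistic as one half of ([T.sup.-1]([y.sup.2.sub.T] - [y.sup.2.sub.0]) minus the long run variance estimate) divided by [T.sup.-2] times the sum of squared lagged levels; the two differ exactly by (T/2) times the squared deviation of the autoregressive estimate from one. The regression without intercept, the arbitrary value used for the long run variance estimate and the function name are assumptions made for illustration.

```python
import random


def z_alpha_pieces(y, sigma2_hat):
    """Return (Z_alpha, MZ_alpha, alpha_hat) for levels y[0], ..., y[T].

    alpha_hat comes from regressing y_t on y_{t-1} without intercept and
    sigma2_hat is any positive long run variance estimate supplied by the user.
    """
    T = len(y) - 1
    lag, cur = y[:-1], y[1:]
    ssq = sum(a * a for a in lag)
    alpha_hat = sum(a * b for a, b in zip(lag, cur)) / ssq
    s2u = sum((b - alpha_hat * a) ** 2 for a, b in zip(lag, cur)) / T
    denom = ssq / T ** 2
    z = T * (alpha_hat - 1.0) - 0.5 * (sigma2_hat - s2u) / denom
    mz = 0.5 * ((y[-1] ** 2 - y[0] ** 2) / T - sigma2_hat) / denom
    return z, mz, alpha_hat
```

Since T times the squared deviation is asymptotically negligible under the null hypothesis, the two statistics share the same limiting distribution.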

7. ECONOMETRIC APPLICATIONS

7.1. EXOGENOUS STRUCTURAL BREAK (PERRON, 1989)

The previous sections have established the foundations for studying inference with nonstationary time series. We now extend the analysis to the case in which a structural break is present. This literature starts with the identification of key limitations of ADF tests.

After the work of Dickey and Fuller (1979), several empirical studies were conducted in order to test for the existence of unit roots in macroeconomic variables. Most of these empirical results favored such a hypothesis and the perception that macroeconomic variables were characterized by stochastic trends became popular. One of the most influential studies in this empirical literature was conducted by Nelson and Plosser (1982), who studied fourteen macroeconomic variables for the US economy. Under the stochastic trend perspective, a series that exhibits an upward sloping behavior and an abrupt reduction (see Figure 4a) is interpreted as the consequence of an atypical realization of [u.sub.t] (situated on the left tail of its distribution) for the process [y.sub.t] = [mu] + [y.sub.t - 1] + [u.sub.t]. However, the same behaviour can be interpreted as the trend stationary process [y.sub.t] = [[mu].sub.t] + [delta]t + [u.sub.t] whose intercept changes its value from, say, [[mu].sub.1] to [[mu].sub.2] < [[mu].sub.1] (see Figure 4b).

Indeed, Perron (1989) switched emphasis to this latter interpretation and asserted that

"... most macroeconomic variables are trend stationary if one allows a single change in the intercept of the trend function after 1929 and a single change in the slope of the trend function after 1973" (Perron, 1989, p. 1962-63).

Perron (1989) considered atypical events as interventions on the deterministic component of the model, and this allowed him to distinguish what can be explained by the disturbance term from what cannot. Additionally, the date of this intervention is assumed to be known by the researcher. Because there are two competing interpretations mentioned above for time series with an abrupt shift, the models considered by Perron (1989) are summarized in Table 1.

In Table 1, [theta], [mu], [[mu].sub.1], [[mu].sub.2], [delta], [[delta].sub.1] and [[delta].sub.2] are parameters, and A(L)[u.sub.t] = B(L)[e.sub.t] where [e.sub.t] ~ i.i.d. (0, [[sigma].sup.2.sub.[epsilon]]). A(L) and B(L) are pth and qth order polynomials, respectively. That is, {[u.sub.t]} is an ARMA(p, q) process with p and q possibly unknown. This assumption allows {[y.sub.t]} to represent quite general processes. In this sense, different specifications allow for different models:

1. Under the null hypothesis, model A contains a dummy variable that equals 1 only immediately after [T.sub.B] (a one time change of the intercept), whereas under the alternative hypothesis the series is trend stationary with a permanent shift in the intercept of the trend function after [T.sub.B] (see Figure 5).

2. For model B, under the null hypothesis, a permanent change in the intercept is allowed after [T.sub.B]; whereas under the alternative hypothesis only a permanent shift is allowed in the slope of the deterministic component.

3. Finally, model C allows both types of shift simultaneously: a shift in level accompanied by a shift in slope.

In this way, Perron (1989) introduced a third interpretation to the discussion (see Figure 5) in order to identify limitations present in already known testing statistics.

A first attempt to discriminate between the two approaches in Figure 4 could rely on DF tests. However, Perron (1989) examined the performance of this class of tests under the alternative hypothesis by means of numerical experiments. Specifically, Monte Carlo simulations revealed that when the data generating process was described by model A under the alternative, DF tests tended to detect a spurious unit root that did not vanish, even asymptotically. Therefore, a power loss was expected. Perron (1989, Theorem 1) also derived this property at the theoretical level, and he adopted Phillips's (1987a) assumptions concerning the innovation sequence {[u.sub.t]} in order for his results to be as general as possible.
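The direction of this bias can be reproduced in a few lines. The sketch below is not Perron's experiment: it generates a trend stationary series with an optional level shift, removes a linear trend that ignores the shift, and reports the average first-order autoregressive coefficient of the residuals. The trend coefficients, the break size, the i.i.d. normal disturbances and the function names are all assumptions chosen for illustration.

```python
import random


def detrend_linear(y):
    """Residuals from an OLS regression of y_t on a constant and t."""
    n = len(y)
    t = list(range(1, n + 1))
    tbar, ybar = sum(t) / n, sum(y) / n
    b = (sum((ti - tbar) * (yi - ybar) for ti, yi in zip(t, y))
         / sum((ti - tbar) ** 2 for ti in t))
    a = ybar - b * tbar
    return [yi - a - b * ti for ti, yi in zip(t, y)]


def ar1_coefficient(e):
    """OLS slope from regressing e_t on e_{t-1} without intercept."""
    num = sum(e[i - 1] * e[i] for i in range(1, len(e)))
    den = sum(e[i - 1] ** 2 for i in range(1, len(e)))
    return num / den


def mean_ar1_after_detrending(T, break_size, lam, reps, seed=0):
    """Average AR(1) estimate when a level shift at fraction lam is ignored."""
    rng = random.Random(seed)
    tb = int(lam * T)
    total = 0.0
    for _ in range(reps):
        y = [1.0 + 0.1 * t + (break_size if t > tb else 0.0)
             + rng.gauss(0.0, 1.0) for t in range(1, T + 1)]
        total += ar1_coefficient(detrend_linear(y))
    return total / reps
```

Without a break the average estimate is close to zero, as expected for white noise around a trend; with a large ignored shift it moves towards one, which is the spurious persistence that misleads DF-type tests.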

Assumption 5 (Perron, 1989, p. 1371) Disturbance sequence [{[u.sub.t]}.sup.[infinity].sub.t = 1] satisfies

1. E([u.sub.t]) = 0 for all t;

2. [sup.sub.t] E[[absolute value of [u.sub.t]].sup.[beta] + [epsilon]] < [infinity] for some [beta] > 2 and [epsilon] > 0;

3. [[sigma].sup.2] = [lim.sub.T[right arrow][infinity]] [T.sup.-1] E([S.sup.2.sub.T]) exists and [[sigma].sup.2] > 0, where [S.sub.T] = [[summation].sup.T.sub.(t = 1)] [u.sub.t];

4. [{[u.sub.t]}.sup.[infinity].sub.t = 1] is strong mixing with strong mixing coefficients [alpha](k) that satisfy

[[summation].sup.[infinity].sub.(k = 1)] [alpha][(k).sup.1 - 2/[beta]] < [infinity].

As expected, the Functional Central Limit Theorem by Herrndorf (1984) can still be employed in this case. Specifically, Assumption 5 allows for the generalization of the asymptotic theory included in Theorem 6 (Perron 1989, Lemma A.3), now in the presence of a breakfraction [lambda] [member of] (0,1). The next subsection presents the strategy adopted and the main results.

7.1.1. Structure of the Model and Main Findings

Because of the caveats when using DF tests, the strategy adopted by Perron (1989) consisted of developing a unit root test under a structural break. That is, the null hypothesis specified the model as an autoregressive model that simultaneously contained both a unit root and a sudden shift (either on slope, intercept or both).

The two test statistics of interest were generalizations of the Z-tests proposed by Phillips (1987a). The intuition behind them was simple: since the researcher was assumed to know the breakfraction [lambda], the effect of the break must be removed from the data. Thus, let {[y.sup.i.sub.t]} denote the detrended data under model i (i = A, B, C). Furthermore, let [[??].sup.i] be the least squares estimator of [[alpha].sup.i] in the following regression:

[y.sup.i.sub.t] = [[alpha].sup.i] [y.sup.i.sub.t - 1] + [e.sub.t] (29)

where i = A, B, C; t = 1,2, ..., T. If the null hypothesis were in fact true, the value of [[??].sup.i] should lie sufficiently close to one or, equivalently, the bias [[??].sup.i] - 1 should lie close to zero. Formally, the next theorem presents the asymptotic distribution of both the standardized bias T([[??].sup.i] - 1) and the t statistic [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] under the three specifications.

Theorem 11 (Perron, 1989, p. 1373) Let the process {[y.sub.t]} be generated under the null hypothesis of model i (i = A, B, C), with the innovation sequence {[u.sub.t]} satisfying Assumption 5. Let [right arrow] denote weak convergence in distribution and [lambda] = [T.sub.B]/T for all T. Then, as T [right arrow] [infinity]:

a) T([[??].sup.i] - 1) [right arrow] [H.sub.i]/[K.sub.i]; b) [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] ([sigma]/[[sigma].sub.u])[H.sub.i]/[([g.sub.i][K.sub.i]).sup.1/2];

where

[H.sub.A] = [g.sub.A] [D.sub.1] - [D.sub.5] [[psi].sub.1] - [D.sub.6] [[psi].sub.2]; [K.sub.A] = [g.sub.A] [D.sub.2] - [D.sub.4] [[psi].sub.2] - [D.sub.3] [[psi].sub.1];

[H.sub.B] = [g.sub.B] [D.sub.1] + [D.sub.5] [[psi].sub.3] + [D.sub.8] [[psi].sub.4]; [K.sub.B] = [g.sub.B] [D.sub.2] + [D.sub.7] [[psi].sub.4] + [D.sub.3] [[psi].sub.3];

[H.sub.C] = [g.sub.C] [D.sub.9] + [D.sub.13] [[psi].sub.5] - [D.sub.14] [[psi].sub.6]; [K.sub.C] = [g.sub.C] [D.sub.10] - [D.sub.12][[psi].sub.6] + [D.sub.11] [[psi].sub.5];

with

[[psi].sub.1] = 6[D.sub.4] + 12[D.sub.3]; [[psi].sub.2] = 6[D.sub.3] + [(1 - [lambda]).sup.-1][[lambda].sup.-1][D.sub.4];

[[psi].sub.3] = (1 + 2[lambda]) [(1 - [lambda]).sup.-1] [D.sub.7] - (1 + 3[lambda]) [D.sub.3];

[[psi].sub.4] = (1 + 2[lambda]) [(1 - [lambda]).sup.-1] [D.sub.3] - [(1 - [lambda]).sup.-3] [D.sub.7];

[[psi].sub.5] = [D.sub.12] - [D.sub.11]; [[psi].sub.6] = [[psi].sub.5] + [(1 - [lambda]).sup.2] [D.sub.12] /[[lambda].sup.3];

Proof. See Perron (1989), Theorem 2 p. 1393, and Appendix D for an extended definition of coefficients.

The reader should note that the previous limiting distributions depend, besides [lambda], on the nuisance parameters [[sigma].sup.2] and [[sigma].sup.2.sub.u]. Finding consistent estimators for the variance of the innovations [[sigma].sup.2.sub.u] and the long run variance of the partial sums [[sigma].sup.2] is an empirical issue. In the case of weakly stationary innovations, [[sigma].sup.2] = 2[pi]f(0) where f(0) is the spectral density of {[u.sub.t]} evaluated at the zero frequency. Moreover, Perron (1989) mentioned that when the sequence {[u.sub.t]} was independent and identically distributed, [[sigma].sup.2] = [[sigma].sup.2.sub.u], and in that case the limiting distributions were invariant with respect to the nuisance parameters, except [lambda].

With these theoretical results and the tabulation of critical values through Monte Carlo simulations, evidence was found against the unit root hypothesis for the series studied by Nelson and Plosser (1982). Thus, the relevance of the results by Perron (1989) lies in the analysis of the performance of ADF tests when misspecification was present. As will be shown below, misspecification becomes crucial for the identification of desirable properties of new tests to be proposed. On the other hand, these results generalized the tests by Phillips (1987a) and the inference procedure assumed knowledge of both the existence of structural break and the breakfraction value. Subsequent studies progressively avoided these two assumptions and included desirable properties.

7.2. ENDOGENOUS STRUCTURAL BREAK (ZIVOT AND ANDREWS, 1992)

7.2.1. A Simple Reason for Relaxing Exogeneity

Before the formal analysis corresponding to this section, it is important to illustrate the main argument held by Zivot and Andrews (1992) against Perron (1989) through the following example. First, consider two sample paths as described in Figure 6. From Perron's perspective, applied researchers will choose a breakfraction near to 0.25 for the first sample path, whereas they are more likely to choose a breakfraction near to 0.75 for the second one. Thus, the breakfraction is no longer exogenous since the previous selections are based on an a priori inspection of data, which incorporates an implicit selection rule behind it. This fact is going to be exploited formally and will lead to the use of the Functional Central Limit Theorem under somewhat different conditions.

7.2.2. The Approach

The first of the abovementioned two assumptions is avoided by Zivot and Andrews (1992). They consider not an exogenous breakfraction, but an endogenous one that has to be estimated. As they assert:

"If one takes the view that these events are endogenous, then the correct unit root testing procedure would have to account for the fact that that the breakpoints in Perron's regressions are data dependent. The null hypothesis of interest in these cases is a unit root process with drift that excludes any structural change. The relevant alternative hypothesis is still a trend stationary process that allows for a one time break in the trend function. Under the alternative, however, we assume that we do not know exactly when the breakpoint occurs" (Zivot and Andrews, 1992, p. 252).

As noticed, attention is turned back to the competing approaches shown in Figure 4 and formalized in Table 2. Additionally, while the tests developed by Perron (1989) are conditional on a given breakfraction [lambda] [member of] (0,1), Zivot and Andrews (1992) attempted to transform these tests into unconditional ones by designing an estimation method for [lambda].

It is important to mention that conventional wisdom in applied econometrics considers the Zivot-Andrews tests as unit root tests under structural break. By definition, this is not true since the null hypothesis considers only a unit root and no other deterministic component. On the other hand, in line with the structural change literature under an unknown changepoint, Zivot and Andrews (1992) suggested choosing the breakfraction [lambda] that gives the least favorable result for the null hypothesis [H.sub.0]: [[alpha].sup.i] = 1 (i = A, B, C) using the one sided t statistic [t.sub.[alpha]]([lambda]), where small values of the statistic lead to the rejection of the null. Let [[lambda].sup.i.sub.inf] denote such a value for model i, then [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] where [LAMBDA] is a specified closed subset of (0,1). For models A, B and C, t statistics are obtained from the following regression equations:

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] (30)

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] (31)

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] (32)

respectively, where parameter estimates are denoted with a hat and [[??].sub.t] is the residual term. In (30)-(32), D[U.sub.t]([lambda]) = 1 if t > T[lambda] and 0 otherwise, while D[T.sup.*.sub.t]([lambda]) = t - T[lambda] if t > T[lambda] and 0 otherwise. The k extra lags are included to account for possible correlation between disturbances, and [??] denotes the estimated value of [lambda]. In order to make the results as simple as possible, the authors considered first the case where k = 0 (no correlation among disturbances). In contrast with the work of Perron (1989), when correlation between disturbances is present, it is restricted to an ARMA structure. It is worth mentioning that this structure is a particular case of mixing processes, which implies that the Functional Central Limit Theorem can still be applied.
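As an illustration, the two break dummies can be built directly from their definitions. This is a minimal sketch; the function name `break_dummies` is hypothetical and the time index is assumed to run over t = 1, ..., T.

```python
def break_dummies(T, lam):
    """Return (DU, DT) for t = 1, ..., T and breakfraction lam in (0, 1).

    DU_t = 1 if t > T*lam, 0 otherwise          (level shift dummy)
    DT_t = t - T*lam if t > T*lam, 0 otherwise  (slope change dummy)
    """
    tb = T * lam  # break date implied by the breakfraction
    du = [1.0 if t > tb else 0.0 for t in range(1, T + 1)]
    dt = [t - tb if t > tb else 0.0 for t in range(1, T + 1)]
    return du, dt
```

For T = 4 and [lambda] = 0.5 the break date is 2, so the level dummy switches on at t = 3 and the slope dummy then counts the periods elapsed since the break.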

For unit root testing, intuition relies on the following reasoning: if [H.sub.0] were in fact true, then the minimum t statistic should not significantly differ from zero, whereas if [H.sub.1] were true then [H.sub.0] should be rejected and an estimated value for [lambda] would be provided for the alternative trend stationary specification. When [lambda] is estimated, the critical values in Perron (1989) cannot be employed for unit root testing. Consider an estimated [lambda] with minimum t statistic. Then, the decision rule can be summarized as

reject [H.sub.0] if [inf.sub.[lambda][member of][LAMBDA]] [t.sub.[??].i] ([lambda]) < [k.sup.i.sub.inf,v], i = A, B, C,

where [k.sup.i.sub.inf,v] denotes the asymptotic critical value of [inf.sub.[lambda][member of][LAMBDA]] [t.sub.[??].i] ([lambda]) for a size equal to v. By construction, these critical values are larger (in absolute value) than those calculated on the basis of an arbitrary fixed [lambda]. Thus, using the critical values of Perron (1989) biases the tests towards rejecting the null. In order to formally establish this distinction, distributions for the statistics [inf.sub.[lambda][member of][LAMBDA]] [t.sub.[??].i] (i = A, B, C) are needed.
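The infimum procedure can be sketched for Model A with k = 0. Since the regression equations are not reproduced here, the sketch assumes the usual form in which [y.sub.t] is regressed on a constant, D[U.sub.t]([lambda]), a linear trend and [y.sub.t-1]; the function names, the 15% trimming and the use of integer break dates are illustrative assumptions.

```python
import numpy as np

def t_stat_alpha(y, tb):
    """t statistic for alpha = 1 in an assumed Model A regression (k = 0):
    y_t = mu + theta*DU_t + beta*t + alpha*y_{t-1} + e_t, break date tb."""
    y = np.asarray(y, dtype=float)
    T = len(y) - 1                       # y[0] is the initial observation
    t = np.arange(1, T + 1)
    du = (t > tb).astype(float)          # level shift dummy
    X = np.column_stack([np.ones(T), du, t, y[:-1]])
    beta, *_ = np.linalg.lstsq(X, y[1:], rcond=None)
    resid = y[1:] - X @ beta
    s2 = resid @ resid / (T - X.shape[1])
    cov = s2 * np.linalg.inv(X.T @ X)
    return (beta[3] - 1.0) / np.sqrt(cov[3, 3])

def zivot_andrews_inf_t(y, trim=0.15):
    """Return (minimum t statistic, breakfraction at which it is attained)."""
    T = len(y) - 1
    first = int(np.ceil(trim * T))
    last = int(np.floor((1.0 - trim) * T))
    # evaluate the t statistic at every admissible break date
    stats = [(t_stat_alpha(y, tb), tb / T) for tb in range(first, last + 1)]
    return min(stats)
```

The returned statistic would then be compared with the critical value for the infimum statistic, not with the fixed-[lambda] critical values.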

7.2.3. Asymptotic Distribution Theory

In order to obtain the limiting distribution for their proposed statistic, Zivot and Andrews (1992) made use of the framework suggested by Ouliaris, Park and Phillips (1989), which allowed for a compact form for their results. It is worth mentioning that this framework was also used by Perron (1989) when the objective was to generalize his main theorem to the case of disturbances that exhibit autocorrelation. Attention is here focused on i.i.d. disturbances. The following two definitions are necessary for the understanding of the main theorem.

Definition 9 [L.sub.2] [0,1] is the Hilbert space of square integrable functions on [0,1] with inner product [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII].

Definition 10 [W.sup.i] ([lambda], r) is the stochastic process on [0,1] that is the projection residual in [L.sub.2] [0,1] of a Wiener process projected onto the subspace generated by the following:

1. for i = A: 1, r, du([lambda], r);

2. for i = B: 1, r, d[t.sup.*]([lambda], r); and

3. for i = C: 1, r, du([lambda], r), d[t.sup.*]([lambda], r),

where du([lambda], r) = 1 if r > [lambda] and 0 otherwise, and d[t.sup.*]([lambda], r) = r - [lambda] if r > [lambda] and 0 otherwise.
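A discrete analogue of Definition 10 may help to fix ideas. The sketch below assumes that the Wiener process is approximated on the grid r = 1/n, ..., 1 and that the [L.sub.2] projection is approximated by least squares on that grid; the function name and the grid are hypothetical.

```python
import numpy as np

def projection_residual(w, lam, model="A"):
    """Least squares residual of a discretized path w (sampled at r = 1/n, ..., 1)
    projected on the functions that define model A, B or C."""
    w = np.asarray(w, dtype=float)
    n = len(w)
    r = np.arange(1, n + 1) / n
    du = (r > lam).astype(float)             # du(lam, r)
    dt = np.where(r > lam, r - lam, 0.0)     # dt*(lam, r)
    columns = {
        "A": [np.ones(n), r, du],
        "B": [np.ones(n), r, dt],
        "C": [np.ones(n), r, du, dt],
    }[model]
    X = np.column_stack(columns)
    coef, *_ = np.linalg.lstsq(X, w, rcond=None)
    return w - X @ coef
```

By construction the residual is orthogonal to every function in the chosen subspace, which is the property that defines [W.sup.i]([lambda], r).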

The asymptotic distribution is presented in the next theorem (15).

Theorem 12 (Zivot and Andrews, 1992, Theorem 1 p. 256) Let {[y.sub.t]} be generated under the null hypothesis and let the disturbances {[u.sub.t]} be i.i.d., mean 0, and variance [[sigma].sup.2] random variables with 0 < [[sigma].sup.2] < [infinity]. Let [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] denote the t statistic for testing [[alpha].sup.i] = 1 computed from either (30), (31) or (32) with k = 0 for Models i = A, B and C, respectively. Let [LAMBDA] be a closed subset of (0,1). Then,

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]

for i = A, B and C, where [right arrow] denotes convergence in distribution.

Proof. See Zivot and Andrews (1992), Appendix A, p. 266.

It is worth mentioning that when a correlation of the ARMA type is allowed, the previous result can be extended in order to obtain an autoregressive estimate of the spectral density of [e.sub.t] at the zero frequency. This empirical issue is addressed by the authors with the help of an assumption similar to assumption 2 in Phillips (1987a). That is, the probability of outliers is controlled, and such an assumption was also adopted in subsequent work.

7.3. EFFICIENT UNIT ROOT TESTING UNDER STRUCTURAL BREAK (PERRON AND RODRIGUEZ, 2003)

Based on the elements contained in the previous sections, one can identify two features along the unit root literature:

1. Deterministic trend and size. Most of the earlier unit root tests under less restrictive assumptions were extensions of augmented Dickey-Fuller tests and therefore their asymptotic distributions depended on whether or not a deterministic component had been added to the regression equation. According to Stock (1999) this problem could be solved by first detrending the series and next performing (robust) modified unit root tests in such a way that size is not affected.

2. Structural break and power. Perron (1989) illustrated how deterministic trends that contained a break could induce spurious unit roots in Dickey-Fuller tests. Following Stock (1999), a trend with structural break could be incorporated in the detrending process. Since it is guaranteed that size will not be affected, it becomes desirable to increase the power of the tests against local alternatives. Such a procedure can be done by following the near-integrated time series approach proposed by Phillips (1987b) and developed by Elliott et al. (1996) for the case of no structural break. Therefore, an extension is called for.

Within this framework, Perron and Rodriguez (2003) extended the modified or M tests (analyzed in detail by Ng and Perron, 2001) to the case in which a structural break in the trend function exists.

7.3.1. Data Generating Process

The observed series [{[y.sub.t]}.sup.T.sub.t = 0] is assumed to be generated according to

[y.sub.t] = [psi]' [z.sub.t] + [u.sub.t], (33)

[u.sub.t] = [alpha][u.sub.t-1] + [v.sub.t], (34)

for t = 1, ..., T. Perron and Rodriguez (2003) considered two models for testing the presence of structural change, summarized in Table 3. A model with structural change in the intercept is not considered, since its limiting distribution is the same as those corresponding to both intercept and slope. For disturbances, the authors, following Phillips and Solo (1992), adopted the following specification:

Assumption 6 (Perron and Rodriguez, 2003 p. 3) The following conditions must hold:

1. [u.sub.0] = 0, and

2. The noise function is [v.sub.t] = [[summation].sup.[infinity].sub.i = 0] [[gamma].sub.i][[epsilon].sub.t-i] where [[summation].sup.[infinity].sub.i = 0] i[absolute value of [[gamma].sub.i]] < [infinity] and where {[[epsilon].sub.t]} is an m.d.s. The process {[v.sub.t]} has a non-normalized spectral density at frequency zero given by [[sigma].sup.2] = [[sigma].sup.2.sub.[epsilon]][gamma][(1).sup.2], with [gamma](1) = [[summation].sup.[infinity].sub.i = 0] [[gamma].sub.i] and [[sigma].sup.2.sub.[epsilon]] = [lim.sub.T[right arrow][infinity]] [T.sup.-1] [[summation].sup.T.sub.t = 1] E([[epsilon].sup.2.sub.t]). Furthermore, [T.sup.-1/2] [[summation].sup.[rT].sub.t = 1] [v.sub.t] [right arrow] [sigma]W(r), where [right arrow] denotes weak convergence in distribution and W(r) is the standard Wiener process defined on C[0,1], the space of continuous functions on the interval [0,1].
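The convergence of the scaled partial sums can be checked by simulation. The sketch below assumes, purely for illustration, an MA(1) noise [v.sub.t] = [[epsilon].sub.t] + [theta][[epsilon].sub.t-1] with standard normal [[epsilon].sub.t], for which the long run variance is [(1 + [theta]).sup.2]; the function name and the parameter values are hypothetical.

```python
import numpy as np

def scaled_partial_sums(T, theta, reps, seed=0):
    """Simulate T^{-1/2} * sum_{t <= rT} v_t for an assumed MA(1) noise.

    Row i of the result is one path evaluated at r = 1/T, 2/T, ..., 1.
    """
    rng = np.random.default_rng(seed)
    eps = rng.standard_normal((reps, T + 1))
    v = eps[:, 1:] + theta * eps[:, :-1]        # MA(1) disturbances
    return np.cumsum(v, axis=1) / np.sqrt(T)    # scaled partial sums
```

Under the functional central limit theorem, the cross-sectional variance of the simulated paths at a given r should be close to r[(1 + [theta]).sup.2], the variance of [sigma]W(r).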

7.3.2. GLS Detrending and M Tests

First, denote the transformed data by

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]

and let [??] be the estimator that minimizes (35)

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] (35)

The data are transformed in order to make the results depend on the parameter [bar.[alpha]]. The goal here is to derive an optimal unit root test against a local alternative hypothesis, for which a specific value of [bar.[alpha]] will later be needed. Based on Phillips (1987b), both null and alternative hypotheses can be summarized by means of a near-integrated process. In (34), the autoregressive coefficient can be written as

[alpha] = 1 + c/T.

Then, under the null c = 0, whereas under the alternative c < 0 and the power function can be explicitly obtained. The M tests, studied in section 6, are defined by

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] (36)

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] and (37)

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] (38)

with local detrended data defined by [[??].sub.t] = [y.sub.t] - [psi]'[z.sub.t] where [??] minimizes (35). The term [[??].sup.2] is an autoregressive estimate of the spectral density at frequency zero of [v.sub.t], defined as

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII],

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII],

[??] = [[summation].sup.k.sub.j = 1] [[??].sub.j]

where [[??].sub.j] and {[[??].sub.tk]} are obtained from the following auxiliary ADF regression

[DELTA][[??].sub.t] = [b.sub.0] [[??].sub.t-1] + [[summation].sup.k.sub.j = 1] [b.sub.j] [DELTA][[??].sub.t - j] + [v.sub.tk]. (39)
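The steps of this subsection can be sketched in code. Because the displayed expressions are not reproduced here, the sketch relies on assumptions that should be checked against Perron and Rodriguez (2003): the regressors are taken to be a constant, a trend and a slope change at a known break date, the data are quasi-differenced with the first observation left unchanged, and the M statistics follow the standard forms in Ng and Perron (2001). The function names are hypothetical and the estimate s2 of [[sigma].sup.2] is supplied by the user.

```python
import numpy as np

def gls_detrend(y, tb, c_bar=-22.5):
    """Local GLS detrending at alpha_bar = 1 + c_bar/T (assumed regressors:
    constant, trend, and slope change after the break date tb)."""
    y = np.asarray(y, dtype=float)
    n = len(y)
    alpha_bar = 1.0 + c_bar / (n - 1)
    t = np.arange(n, dtype=float)
    z = np.column_stack([np.ones(n), t, np.maximum(t - tb, 0.0)])
    # quasi-difference data and regressors, keeping the first observation
    ya = np.concatenate([y[:1], y[1:] - alpha_bar * y[:-1]])
    za = np.vstack([z[:1], z[1:] - alpha_bar * z[:-1]])
    psi_hat, *_ = np.linalg.lstsq(za, ya, rcond=None)
    return y - z @ psi_hat               # locally detrended series

def m_tests(y_tilde, s2):
    """Return (MZ_alpha, MSB, MZ_t) in their assumed standard forms."""
    y_tilde = np.asarray(y_tilde, dtype=float)
    T = len(y_tilde) - 1
    ssq = np.sum(y_tilde[:-1] ** 2) / T**2
    mz_alpha = (y_tilde[-1] ** 2 / T - s2) / (2.0 * ssq)
    msb = np.sqrt(ssq / s2)
    return mz_alpha, msb, mz_alpha * msb
```

In applied work s2 would come from the auxiliary regression (39); it is left as an argument here so that the sketch stays short.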

7.3.3. Asymptotic Distributions

The next theorem presents the limiting distribution of the testing statistics for fixed values of c, [bar.c] and [lambda].

Theorem 13 (Perron and Rodriguez, 2003 p. 7) Let [{[y.sub.t]}.sup.T.sub.t=0] be generated by model (33) with [alpha] = 1 + c/T, let M[Z.sup.GLS.sub.[alpha]], MS[B.sup.GLS] and M[Z.sup.GLS.sub.t] be defined by (36)-(38), with data obtained from local GLS detrending ([[??].sub.t]) at [??] = 1 + [bar.c]/T, and let AD[F.sup.GLS] be the t statistic for testing [b.sub.0] = 0 in the regression (39). Also, let [[??].sup.2] be a consistent estimate of [[sigma].sup.2]. For models A and B,

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]

where

[K.sub.1](c, [bar.c], [lambda]) = [V.sup.(1).sub.c[bar.c]][(1, [lambda]).sup.2] - 2[V.sup.(2).sub.c[bar.c]](1, [lambda]) - 1,

[K.sub.2](c, [bar.c], [lambda]) = [[integral].sup.1.sub.0] [V.sup.(1).sub.c[bar.c]] [(r, [lambda]).sup.2] dr - 2 [[integral].sup.1.sub.[lambda]] [V.sup.(2).sub.c[bar.c]](r, [lambda])dr

and [V.sup.(1).sub.c[bar.c]](r, [lambda]) = [W.sub.c](r) - r[b.sub.3], [V.sup.(2).sub.c[bar.c]](r, [lambda]) = [b.sub.4](r - [lambda])[[W.sub.c](r) - r[b.sub.3] - (1/2)(r - [lambda])[b.sub.4]], with [W.sub.c](r) the Ornstein-Uhlenbeck process that solves the stochastic differential equation

d[W.sub.c](r) = c[W.sub.c](r)dr + dW(r) with [W.sub.c](0) = 0.

Also, [b.sub.3] and [b.sub.4] are defined by

[b.sub.1] = (1 - [bar.c]) [W.sub.c](1) + [[bar.c].sup.2] [[integral].sup.1.sub.0] r[W.sub.c](r)dr,

[b.sub.2] = (1 - [bar.c] + [lambda][bar.c]) [W.sub.c](1) + [[bar.c].sup.2] [[integral].sup.1.sub.[lambda]] [W.sub.c](r)(r - [lambda])dr - [W.sub.c]([lambda]),

[b.sub.3] = [[lambda].sub.1] [b.sub.1] + [[lambda].sub.2] [b.sub.2],

[b.sub.4] = [[lambda].sub.2] [b.sub.1] + [[lambda].sub.3] [b.sub.2],

[[lambda].sub.1] = d/[THETA],

[[lambda].sub.2] = -m/[THETA],

[[lambda].sub.3] = a/[THETA],

d = 1 - [lambda] - [bar.c] + 2[bar.c][lambda] - [bar.c][[lambda].sup.2] - [[bar.c].sup.2][lambda] + [[bar.c].sup.2][[lambda].sup.2] + ([[bar.c].sup.2]/3)(1 - [[lambda].sup.3]),

m = 1 - [lambda] - [bar.c] + [bar.c][lambda] + ([[bar.c].sup.2]/2)[[lambda].sup.3] - ([[bar.c].sup.2]/2)[lambda] + ([[bar.c].sup.2]/3)(1 - [[lambda].sup.3]),

a = 1 - [bar.c] + [[bar.c].sup.2]/3 and [THETA] = ad - [m.sup.2].

Proof. See Perron and Rodriguez (2003), Theorem 1, p. 22.
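The Ornstein-Uhlenbeck process that appears in these limits can be approximated on a grid. The following minimal sketch assumes an Euler discretization of d[W.sub.c](r) = c[W.sub.c](r)dr + dW(r) started at zero; the discretization scheme and the function names are illustrative choices, not part of the theorem.

```python
import math
import random

def simulate_ou(c, shocks):
    """Euler scheme for dW_c = c*W_c dr + dW on [0, 1], started at zero.

    shocks holds n standard normal draws; the result has n + 1 points,
    W_c(0), W_c(1/n), ..., W_c(1).
    """
    n = len(shocks)
    h = 1.0 / n                      # step size
    path = [0.0]
    for z in shocks:
        w = path[-1]
        # drift c*W_c*h plus a Wiener increment with variance h
        path.append(w + c * w * h + math.sqrt(h) * z)
    return path

def draw_ou_path(c, n, seed=0):
    """Draw one approximate path using a seeded generator."""
    rng = random.Random(seed)
    return simulate_ou(c, [rng.gauss(0.0, 1.0) for _ in range(n)])
```

Setting c = 0 reduces the recursion to scaled partial sums, that is, to an approximation of the standard Wiener process, which corresponds to the null hypothesis.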

7.3.4. A Feasible Point Optimal Test with Known Breakdate

As Phillips (1988) pointed out, the discriminatory power of unit root tests is low against local alternatives near but not equal to unity, because the distributions under both hypotheses are quite similar. The main idea behind efficiency relies on increasing power, that is, the probability of rejecting a false null hypothesis. As mentioned by Elliott et al. (1996), if the data distribution were known, then the Neyman-Pearson Lemma would yield the optimal test of the null against any given point alternative, and in such circumstances a power envelope could be derived (16).

However, although within this framework a uniformly most powerful (UMP) test is not attainable, it is possible to define an optimal test for [alpha] = 1 against the alternative [alpha] = [bar.[alpha]]. Moreover, if [v.sub.t] were i.i.d., then such a test would be given by the likelihood ratio statistic which, under the normality assumption, equals the following difference

L([lambda]) [equivalent to] S([bar.[alpha]], [lambda]) - S(1, [lambda]),

where S([bar.[alpha]], [lambda]) and S(1, [lambda]) are the sums of squared residuals from GLS detrending under [alpha] = [bar.[alpha]] and [alpha] = 1, respectively. Under the assumption of a known breakfraction [lambda], different values for [bar.[alpha]] lead to a family of point optimal tests and a Gaussian envelope for testing [alpha] = 1. Furthermore, in order to allow for correlation between errors [v.sub.t], Elliott et al. (1996) proposed a feasible optimal point test [P.sup.GLS.sub.T] defined by

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] (40)

where its distribution is derived in the following theorem:

Theorem 14 (Perron and Rodriguez, 2003 p. 7) Let {[y.sub.t]} be generated by (33) with [alpha] = 1 + c/T. Let [P.sup.GLS.sub.T] be defined by (40) with data obtained from local GLS detrending ([[??].sub.t]) at [bar.[alpha]] = 1 + [bar.c]/T. Also, let [[??].sup.2] be a consistent estimate of [[sigma].sup.2]. The limit distribution of [P.sup.GLS.sub.T] under Models A and B is given by

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]

where M(c, [bar.c], [lambda]) = A(c, [bar.c], [lambda])'B[([bar.c], [lambda]).sup.-1] A(c, [bar.c], [lambda]) with A(c, [bar.c], [lambda]) a 2 x 1 vector defined by

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]

and B([bar.c], [lambda]) is a symmetric matrix with entries

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]

Proof. See Perron and Rodriguez (2003), Theorem 2, p. 34.
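The construction of the point optimal statistic from the two sums of squared residuals can be sketched as follows. Since (40) is not reproduced here, the sketch assumes the form used by Elliott et al. (1996), namely [S([bar.[alpha]], [lambda]) - [bar.[alpha]]S(1, [lambda])] divided by the estimate of [[sigma].sup.2]. The regressors (constant, trend and slope change), the function names and the user-supplied s2 are likewise assumptions.

```python
import numpy as np

def gls_ssr(y, tb, alpha):
    """Sum of squared residuals S(alpha, lambda) from regressing the
    quasi-differenced data on the quasi-differenced regressors."""
    y = np.asarray(y, dtype=float)
    n = len(y)
    t = np.arange(n, dtype=float)
    # assumed deterministic terms: constant, trend, slope change at tb
    z = np.column_stack([np.ones(n), t, np.maximum(t - tb, 0.0)])
    ya = np.concatenate([y[:1], y[1:] - alpha * y[:-1]])
    za = np.vstack([z[:1], z[1:] - alpha * z[:-1]])
    psi, *_ = np.linalg.lstsq(za, ya, rcond=None)
    resid = ya - za @ psi
    return float(resid @ resid)

def point_optimal_stat(y, tb, s2, c_bar=-22.5):
    """Assumed feasible point optimal statistic for a known break date tb."""
    T = len(y) - 1
    alpha_bar = 1.0 + c_bar / T
    return (gls_ssr(y, tb, alpha_bar) - alpha_bar * gls_ssr(y, tb, 1.0)) / s2
```

Small values of the statistic favor the stationary alternative, so rejection takes place in the left tail.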

The reader must remember that any test statistic is also a random variable, and rejecting the unit root hypothesis is an event in which the test statistic lies below a certain critical value. Since the distribution for the tests was derived both under the null and the alternative hypothesis, the (asymptotic) power function can be made explicit as the probability of rejecting the null under the alternative. Such a function is given by

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]

where the critical value [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] (c, [lambda]) is determined by the probability of Type I error

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]

and v is the size of the test. Therefore, different values of [lambda] generate different power functions.
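In simulations, these two probabilities are replaced by empirical frequencies. The following minimal sketch assumes that draws of the statistic under the null (c = 0) and under a given alternative (c < 0) are already available; the function names and the simple empirical quantile rule are illustrative choices.

```python
def critical_value(null_stats, size):
    """Empirical left-tail critical value: roughly a fraction `size`
    of the null draws lie at or below the returned value."""
    s = sorted(null_stats)
    k = max(int(size * len(s)) - 1, 0)
    return s[k]

def power(alt_stats, crit):
    """Fraction of draws under the alternative that fall below the
    critical value, that is, the empirical rejection frequency."""
    return sum(1 for x in alt_stats if x < crit) / len(alt_stats)
```

Repeating the calculation over a grid of values of c, for a fixed [lambda], traces out the simulated power function.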

7.3.5. A Feasible Point Optimal Test with Unknown Breakdate and the Power Envelope

The previous subsections refer to the case in which the breakfraction [lambda] is known. In practice, however, this parameter must be estimated by applied researchers. For this reason the feasible version of the statistic in (40) is given by

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] (41)

The principle behind (41) is the same as in (40). The main difference lies in the introduction of the trimming parameter [epsilon]. This parameter is usually set to 0.15 in order to keep the critical values bounded, a situation that also arises in the context of tests for structural change. Using Theorem 14, the following result is obtained:

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]

Accordingly, the asymptotic Gaussian power envelope is given by

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]

where the critical value is [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] (c) so that [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]. It must be pointed out that Elliott et al. (1996) recommended a value for [bar.c] so that [[pi].sup.*](c) = 0.5. Using Monte Carlo simulations, Perron and Rodriguez (2003) found that [bar.c] = -22.5.

It must be emphasized that, within this literature, the idea behind power increasing unit root tests is related to the extent to which power functions lie close to the Gaussian power envelope (the benchmark case). When [lambda] is known, only one set of simulations is performed in order to obtain the power function corresponding to that value. When [lambda] is unknown, however, several sets of simulations are performed (one for each value of [lambda] in [[epsilon], 1 - [epsilon]]).

8. EMPIRICAL APPLICATIONS

A set of nine macroeconomic variables has been selected for the Peruvian economy: gross domestic product (GDP), absorption, consumption, consumer price index (CPI), nominal exchange rate (domestic currency per US dollar), unemployment rate, real exchange rate, export price index and import price index. The frequency of the data varies according to availability, and the sample period covered appears below the name of each variable. Table 4 presents the results of applying four different unit root statistics. The first is the standard augmented Dickey-Fuller (ADF) statistic without structural change. The second statistic is the ADF proposed by Perron (1989) with an exogenous breakpoint. The third one is the ADF proposed by Zivot and Andrews (1992), based on the infimum method of selecting the break point. In the corresponding two columns, two models are estimated: the crash and the breaking trend models, denoted by C and BT, respectively. The remaining column is based on Perron and Rodriguez (2003) using the breaking trend model. Statistics in Column (4) are based on the infimum method to select the break point using the M[Z.sup.GLS.sub.t] and AD[F.sup.GLS] statistics, respectively. The lag length has been selected using the MAIC proposed by Ng and Perron (2001).

The results show that different conclusions may be reached depending on which unit root statistic is applied, with or without structural change. They therefore illustrate the difficulty of obtaining a unique answer regarding the stationarity or nonstationarity of the variables. Overall, the evidence for or against stationarity is mixed. For the most part, the null hypothesis of a unit root is not rejected for consumption, the CPI and the export price index. On the other hand, the null hypothesis is mostly rejected for the unemployment rate, the import price index and the nominal exchange rate. Mixed evidence is found for GDP and the real exchange rate.

9. CONCLUSIONS

The application of different unit root statistics is by now a standard practice in empirical work. Even though this is a practical issue, these statistics have complex nonstandard distributions that depend on functionals of certain stochastic processes, and their derivations represent a barrier even for many theoretical econometricians. These derivations are based on rigorous and fundamental statistical tools which are not (very) well known to standard econometricians. This paper aims to fill this gap by explaining in a simple way one of these fundamental tools: namely, the Functional Central Limit Theorem. To this end, this paper analyzes the foundations and applicability of two versions of the Functional Central Limit Theorem within the framework of a unit root with a structural break. As shown, unit root tests can be described as functionals of stochastic processes such as the standard Wiener process and the Ornstein-Uhlenbeck process.

A general framework involving mixing conditions (Phillips 1987a) generalizes the results obtained under the assumption of normal i.i.d. disturbances (Dickey and Fuller 1979). Also, the analysis of modified tests (Stock 1999) allows the size of unit root tests to be separated from the specific form of the deterministic component, a problem not solved in earlier works. The tools developed also allow the analytical treatment of several problems within this literature: the presence of structural breaks and the low power against local alternatives. For the issue of structural breaks (Perron 1989), first detrending the series has shown itself to be a robust procedure, so that asymptotic size is not affected (Stock 1999). For the issue of increasing power, the asymptotic distribution can be derived by means of Ornstein-Uhlenbeck processes, both under the null and under local alternatives (Phillips 1987b), so that a power function can be derived and maximized. When the two issues are combined, the result is an efficient test (Perron & Rodriguez 2003).

Caption: Figure 1. Asymptotic distributions for several specifications

Caption: Figure 2. Dependence and mixing coefficients

Caption: Figure 3. Convergence of standardized sums

Caption: Figure 4. Shifts under stochastic and deterministic trend frameworks

Caption: Figure 5. The "Crash" model

Caption: Figure 6. Sample paths under different breakfractions

APPENDIX

A. BEVERIDGE NELSON DECOMPOSITION

Based on Phillips and Solo (1992), let the operator C(L) = [[summation].sup.[infinity].sub.j=0] [c.sub.j] [L.sup.j] be a lag polynomial. Then

C(L) = C(1) - (1 - L)[??](L),

where

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]

If p [greater than or equal to] 1, then,

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]

If p < 1, then

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]
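For a finite lag polynomial the decomposition C(L) = C(1) - (1 - L)[??](L) can be verified coefficient by coefficient. The sketch below assumes the usual definition in Phillips and Solo (1992), in which the j-th coefficient of [??](L) is the sum of all [c.sub.k] with k > j; the function names are hypothetical.

```python
def bn_decomposition(c):
    """Given coefficients c_0, ..., c_q of C(L), return C(1) and the
    coefficients of C_tilde(L), assumed to be c_tilde_j = sum_{k > j} c_k."""
    c1 = sum(c)
    c_tilde = [sum(c[j + 1:]) for j in range(len(c) - 1)]
    return c1, c_tilde

def recompose(c1, c_tilde):
    """Expand C(1) - (1 - L) * C_tilde(L) back into coefficients of L^j."""
    out = [0.0] * (len(c_tilde) + 1)
    out[0] += c1
    for j, ct in enumerate(c_tilde):
        out[j] -= ct        # contribution of -c_tilde_j * L^j
        out[j + 1] += ct    # contribution of +c_tilde_j * L^(j+1)
    return out
```

For instance, the coefficients (1, 0.5, 0.25) give C(1) = 1.75 and tail sums (0.75, 0.25), and expanding the right-hand side returns the original coefficients.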

B. MARTINGALE DIFFERENCE SEQUENCE

Let {[x.sub.t]} and {[y.sub.t]} denote two stochastic processes. Then {[y.sub.t]} is a martingale difference sequence with respect to {[x.sub.t]} if its expectation, conditional on past values of {[x.sub.t]}, is zero. Formally,

E[[y.sub.t]|[x.sub.t-1], [x.sub.t-2],...] = 0, for all t.

When the expectation of {[y.sub.t]}, conditional on its own past values, is zero, then {[y.sub.t]} is said to be a martingale difference sequence, or m.d.s.

C. STRONGLY UNIFORM INTEGRABILITY

Let [{[Z.sub.t]}.sup.[infinity].sub.t = 1] be a sequence of random variables adapted to the filtration [{[F.sub.t]}.sup.[infinity].sub.t = 1]. Following Phillips and Solo (1992), {[Z.sub.t]} is said to be strongly uniformly integrable (s.u.i.) if there exists a dominating random variable Z for which E([absolute value of Z]) < [infinity] and

P([absolute value of ([Z.sub.t])] [greater than or equal to] x) [less than or equal to] cP([absolute value of Z][greater than or equal to]x)

for each x [greater than or equal to] 0, t [greater than or equal to] 1 and for some constant c.

D. SOME THEOREMS OF UNIT ROOT WITH STRUCTURAL CHANGE

Functionals in Perron (1989), Theorem 2 p. 1393 are defined as follows:

[D.sub.1] = (1/2) (W[(1).sup.2] - [[sigma].sup.2.sub.u]/[[sigma].sup.2]) - W(1)[[integral].sup.1.sub.0] W(r)dr;

[D.sub.2] = [[integral].sup.1.sub.0] W[(r).sup.2] dr - [[[[integral].sup.1.sub.0] W(r)dr].sup.2];

[D.sub.3] = [[integral].sup.1.sub.0] rW(r)dr - (1/2) [[integral].sup.1.sub.0] W(r)dr;

[D.sub.4] = [[integral].sup.[lambda].sub.0] W(r)dr - [lambda] [[integral].sup.1.sub.0] W(r)dr;

[D.sub.5] = W(1)/2 - [[integral].sup.1.sub.0] W(r)dr;

[D.sub.6] = W([lambda]) - [lambda]W(1);

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]

[D.sub.8] = ([(1 - [lambda]).sup.2]/2)W(1) - [[integral].sup.1.sub.[lambda]] W(r)dr;

[D.sub.9] = [[integral].sup.1.sub.0] W[(r).sup.2] dr - [[lambda].sup.-1] [[[[integral].sup.[lambda].sub.0] W(r)dr].sup.2] - [(1 - [lambda]).sup.-1] [[[[integral].sup.1.sub.[lambda]] W(r)dr].sup.2];

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]

[D.sub.12] = [[integral].sup.[lambda].sub.0]rW(r)dr - ([lambda]/2) [[integral].sup.[lambda].sub.0]W(r)dr;

[D.sub.13] = (1 - [lambda])W(1)/2 + W([lambda])/2 - [[integral].sup.1.sub.0] W(r)dr;

[D.sub.14] = [lambda]W([lambda])/2 - [[integral].sup.[lambda].sub.0] W(r)dr;

[g.sub.A] = 1 - 3(1 - [lambda])[lambda]; [g.sub.B] = 3[[lambda].sup.3]; [g.sub.C] = 12[(1 - [lambda]).sup.2];

where

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]

and W is a standard Wiener process on C.
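Functionals of this kind are evaluated in practice by replacing the Wiener process with a scaled random walk and the integrals with Riemann sums. The sketch below computes [D.sub.2], [D.sub.3], [D.sub.4] and [D.sub.6] from a path sampled at r = 1/n, ..., 1; the function name, the grid and the rounding of [lambda]n to an integer are illustrative assumptions.

```python
import numpy as np

def perron_functionals(w, lam):
    """Riemann-sum approximations of D2, D3, D4 and D6 for a path w
    sampled at r = 1/n, ..., 1 and a breakfraction lam in (0, 1)."""
    w = np.asarray(w, dtype=float)
    n = len(w)
    r = np.arange(1, n + 1) / n
    k = int(round(lam * n))             # grid index of the break
    int_w = w.mean()                    # integral of W over [0, 1]
    int_w_lam = w[:k].sum() / n         # integral of W over [0, lam]
    return {
        "D2": np.mean(w**2) - int_w**2,
        "D3": np.mean(r * w) - 0.5 * int_w,
        "D4": int_w_lam - lam * int_w,
        "D6": w[k - 1] - lam * w[-1],
    }
```

Evaluating the functionals over many simulated paths is the basis for tabulating critical values by Monte Carlo methods.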

REFERENCES

Aquino, J. C. (2011). Functional Central Limit Theorems and Unit Root Testing. Unpublished M.Sc. Dissertation, Department of Sciences, Section of Mathematics, Pontificia Universidad Catolica del Peru.

Banerjee, A., R. L. Lumsdaine & J. H. Stock (1992). Recursive and Sequential Tests of the Unit-Root and Trend-Break Hypotheses: Theory and International Evidence. Journal of Business & Economic Statistics, 10(3), 271-287.

Bhargava, A. (1986). On the Theory of Testing for Unit Roots in Observed Time Series. Review of Economic Studies, 53(3), 369-384.

Billingsley, P. (1968). Convergence of Probability Measures. John Wiley & Sons, Inc.

Box, G. E. P. & G. M. Jenkins (1970). Time Series Analysis: Forecasting and Control. John Wiley & Sons, Inc.

Box, G. E. P. & D. A. Pierce. (1970). Distribution of the Autocorrelations in Autoregressive Moving Average Time Series Models. Journal of the American Statistical Association, 65, 1509-26.

Brzezniak, Z. & T. Zastawniak (1999). Basic Stochastic Processes. Springer.

Davidson, J. (1994). Stochastic Limit Theory: An Introduction for Econometricians. Oxford University Press.

Dickey, D. A. & W. A. Fuller (1979). Distribution of the Estimators for Autoregressive Time Series with a Unit Root. Journal of the American Statistical Association, 74, 427-431.

Donsker, M. (1951). An Invariance Principle for Certain Probability Limit Theorems. Memoirs of the American Mathematical Society, 6, 1-12.

Elliott, G., T. J. Rothenberg & J. H. Stock. (1996). Efficient Tests for an Autoregressive Unit Root. Econometrica, 64(4), 813-836.

Enders, W. (2004). Applied Econometric Time Series. John Wiley & Sons, Inc.

Glynn, J. & N. Perera. (2007). Unit Root Tests and Structural Breaks: A Survey with Applications. Revista de Metodos Cuantitativos para la Economia y la Empresa, 3(1), 63-79.

Hamilton, J. D. (1994). Time Series Analysis. Princeton University Press.

Herrndorf, N. (1984). A Functional Central Limit Theorem for Weakly Dependent Sequences of Random Variables. The Annals of Probability, 12(1), 141-53.

Nelson, C. R. & C. I. Plosser. (1982). Trends and Random Walks in Macroeconomic Time Series: Some Evidence and Implications. Journal of Monetary Economics, 10(2), 139-162.

Ng, S. & P. Perron. (2001). Lag Length Selection and the Construction of Unit Root Tests with Good Size and Power. Econometrica, 69(6), 1519-54.

Oksendal, B. (2000). Stochastic Differential Equations. Springer.

Ouliaris, S., J. Y. Park & P. C. B. Phillips. (1989). Testing for a Unit Root in the Presence of a Maintained Trend. In: B. Raj (ed.). Advances in Econometrics and Modeling, Kluwer Academic.

Perron, P. (1989). The Great Crash, the Oil Price Shock, and the Unit Root Hypothesis. Econometrica, 57(6), 1361-1401.

Perron, P. & G. Rodriguez (2003). GLS Detrending, Efficient Unit Root Tests and Structural Change. Journal of Econometrics, 115(1), 1-27.

Phillips, P. C. B. (1987a). Time Series Regression with a Unit Root. Econometrica, 55(2), 277-301.

Phillips, P. C. B. (1987b). Towards a Unified Asymptotic Theory for Autoregression. Biometrika, 74(3), 535-547.

Phillips, P. C. B. (1988). Regression Theory for Near-Integrated Time Series. Econometrica, 56(5), 1021-1043.

Phillips, P. C. B. & P. Perron. (1988). Testing for a Unit Root in Time Series Regression. Biometrika, 75(2), 335-346.

Phillips, P. C. B. & V. Solo. (1992). Asymptotics for Linear Processes. The Annals of Statistics, 20(2), 971-1001.

Sargan, J. D. & A. Bhargava. (1983). Testing Residuals from Least Squares Regression for Being Generated by the Gaussian Random Walk. Econometrica, 51(1), 153-174.

Stock, J. (1999). A Class of Tests for Integration and Cointegration. In R. F. Engle and H. White (eds.). Cointegration, Causality, and Forecasting: A Festschrift in Honour of Clive W. J. Granger, Oxford University Press, pp. 135-167.

White, J. S. (1958). The Limiting Distribution of the Serial Correlation Coefficient in the Explosive Case. The Annals of Mathematical Statistics, 29(4), 1188-1197.

White, H. (1984). Asymptotic Theory for Econometricians. Academic Press.

Zivot, E. & D. W. K. Andrews (1992). Further Evidence on the Great Crash, the Oil-Price Shock, and the Unit-Root Hypothesis. Journal of Business & Economic Statistics, 10(3), 251-270.

Document received on May 8, 2012 and approved on January 25, 2013.

* This paper is drawn from Juan Carlos Aquino's M.Sc. Dissertation at the Department of Mathematics of the Pontificia Universidad Catolica del Peru (Aquino, 2011). The authors are grateful to Loretta Gasco and Luis Valdivieso and two anonymous referees for their valuable comments and suggestions. The editorial assistance of Brenda Oliva in some sections of this paper is acknowledged.

(1) A functional can be understood as a map that takes a real-valued function as the input argument and returns a real number. Naturally, this idea extends to the case of random real-valued functions.

(2) Hereafter, any reference to a stationary process will be understood on this basis.

(3) See Enders (2004) for an applied approach to this methodology.

(4) See Hamilton (1994) for further details.

(5) For example, this occurs because the knowledge of the true value for the intercept (equal to zero) is being exploited.

(6) Typically, textbook treatment of time series analysis assumes a sequence of independent and identically distributed (i.i.d.) disturbances. Within our exposition, the independence assumption is relaxed by dealing with potentially dependent disturbances. Additionally, homogeneity of distributions corresponding to disturbances is relaxed by considering a wider family of distributions.

(7) Note that the idea of progressive lack of dependence includes ergodicity and asymptotic independence.

(8) In French: "continue a droite, limitee a gauche".

(9) C = C([0,1]) is the space of all continuous functions on the unit interval.

(10) A homeomorphism (or bicontinuous function) is a continuous function that has a continuous inverse function.

(11) Consider a set X and two metrics [g.sub.1] and [g.sub.2] defined on X. The metrics [g.sub.1] and [g.sub.2] are said to be topologically equivalent (or simply equivalent) if they generate the same metric topology on X.

(12) See equation (3).

(13) Since c = T ln([alpha]).

(14) Let {[y.sub.t]} denote a sequence of random variables and let {[a.sub.t]} denote a sequence of positive non-stochastic real numbers. Then [y.sub.t] = [O.sub.p]([a.sub.t]) if for each [epsilon] > 0 there exists M > 0 such that P([absolute value of ([y.sub.t])]/[a.sub.t] > M) < [epsilon] for all t.

(15) Although obtained independently, the derivation presented here proceeds in a fashion similar to that reported by Banerjee et al. (1992).

(16) As Elliot et al. (1996) mention, the Gaussian power envelope is an upper bound to the asymptotic power function of tests of the unit root hypothesis when the data are generated by [y.sub.t] = [d.sub.t] + [u.sub.t] and [u.sub.t] = [alpha][u.sub.t - 1] + [v.sub.t] under ideal conditions: the process {[v.sub.t]} has a moving average representation involving independent standard normal random variables, the initial condition [u.sub.0] is 0 and the deterministic component [d.sub.t] is known. Such unrealistic assumptions are made in order to employ the Neyman-Pearson theory.
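
As an informal illustration of the order-in-probability definition in footnote (14), the following minimal sketch simulates a driftless Gaussian random walk and checks that [absolute value of ([y.sub.T])]/[square root of T] rarely exceeds a fixed bound M, whatever the sample size. The choice of a Gaussian random walk, the bound M = 3, the sample sizes and the function name tail_fraction are assumptions made only for this example and do not come from the paper.

```python
import math
import random


def tail_fraction(T, M, reps=500, seed=0):
    """Share of simulated random walks with |y_T| / sqrt(T) > M.

    Hypothetical helper: y_T is the sum of T independent N(0, 1) shocks,
    and a_T = sqrt(T) is the candidate order of magnitude.
    """
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    exceed = 0
    for _ in range(reps):
        y = 0.0
        for _ in range(T):
            y += rng.gauss(0.0, 1.0)  # one random-walk step
        if abs(y) / math.sqrt(T) > M:
            exceed += 1
    return exceed / reps


# Estimated P(|y_T| / sqrt(T) > M) for several sample sizes.
fractions = {T: tail_fraction(T, M=3.0) for T in (25, 100, 400)}
for T, share in fractions.items():
    print(f"T = {T:4d}: share above M = {share:.3f}")
```

If the estimated shares stay small and do not grow with T, this is consistent with [y.sub.T] = [O.sub.p]([square root of T]) for this particular process; a simulation of this kind is only suggestive and does not replace the formal argument.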

Juan Carlos Aquino, Banco Central de Reserva del Peru.

Gabriel Rodriguez, Mailing address: Gabriel Rodriguez, Department of Economics, Pontificia Universidad Catolica del Peru, Av. Universitaria 1801, Lima 32, Lima, Peru; Telephone: +511-626-2000 (4998); Fax: +511-626-2874; E-mail address: gabriel.rodriguez@pucp.edu.pe.
```
Table 1. Null and alternative hypotheses considered by Perron (1989)

Model A
  Null hypothesis:         [y.sub.t] = [mu] + [y.sub.t - 1] + [theta]D[([T.sub.B]).sub.t] + [u.sub.t]
  Alternative hypothesis:  [y.sub.t] = [[mu].sub.1] + ([[mu].sub.2] - [[mu].sub.1])D[U.sub.t] + [delta]t + [u.sub.t]

Model B
  Null hypothesis:         [y.sub.t] = [[mu].sub.1] + [y.sub.t - 1] + ([[mu].sub.2] - [[mu].sub.1])D[U.sub.t] + [u.sub.t]
  Alternative hypothesis:  [y.sub.t] = [mu] + [[delta].sub.1]t + ([[delta].sub.2] - [[delta].sub.1])D[T.sup.*.sub.t] + [u.sub.t]

Model C
  Null hypothesis:         [y.sub.t] = [[mu].sub.1] + [y.sub.t - 1] + [theta]D[([T.sub.B]).sub.t] + ([[mu].sub.2] - [[mu].sub.1])D[U.sub.t] + [u.sub.t]
  Alternative hypothesis:  [y.sub.t] = [[mu].sub.1] + [[delta].sub.1]t + ([[mu].sub.2] - [[mu].sub.1])D[U.sub.t] + ([[delta].sub.2] - [[delta].sub.1])D[T.sub.t] + [u.sub.t]

where
  D[([T.sub.B]).sub.t] = 1 if t = [T.sub.B] + 1, 0 otherwise
  D[U.sub.t] = 1 if t > [T.sub.B], 0 otherwise
  D[T.sup.*.sub.t] = t - [T.sub.B] if t > [T.sub.B], 0 otherwise
  D[T.sub.t] = t if t > [T.sub.B], 0 otherwise

Table 2. Null and alternative hypotheses considered by Zivot
and Andrews (1992)

Model A
  Null hypothesis:         [y.sub.t] = [mu] + [y.sub.t - 1] + [u.sub.t]
  Alternative hypothesis:  [y.sub.t] = [[mu].sub.1] + ([[mu].sub.2] - [[mu].sub.1])D[U.sub.t] + [delta]t + [u.sub.t]

Model B
  Null hypothesis:         [y.sub.t] = [mu] + [y.sub.t - 1] + [u.sub.t]
  Alternative hypothesis:  [y.sub.t] = [mu] + [[delta].sub.1]t + ([[delta].sub.2] - [[delta].sub.1])D[T.sup.*.sub.t] + [u.sub.t]

Model C
  Null hypothesis:         [y.sub.t] = [mu] + [y.sub.t - 1] + [u.sub.t]
  Alternative hypothesis:  [y.sub.t] = [[mu].sub.1] + [[delta].sub.1]t + ([[mu].sub.2] - [[mu].sub.1])D[U.sub.t] + ([[delta].sub.2] - [[delta].sub.1])D[T.sub.t] + [u.sub.t]

where
  D[U.sub.t] = 1 if t > [T.sub.B], 0 otherwise
  D[T.sub.t] = t if t > [T.sub.B], 0 otherwise
  D[T.sup.*.sub.t] = t - [T.sub.B] if t > [T.sub.B], 0 otherwise

Table 3. Deterministic components considered by Perron and
Rodriguez (2003)

Model A (structural change in slope)
  [psi]'[z.sub.t] = [[beta].sub.1]t + [[beta].sub.2]D[T.sup.*.sub.t]

Model B (structural change in trend and slope)
  [psi]'[z.sub.t] = [[mu].sub.1] + [[mu].sub.2]D[U.sub.t] + [[beta].sub.1]t + [[beta].sub.2]D[T.sup.*.sub.t]

where
  D[T.sup.*.sub.t] = t - [T.sub.B] if t > [T.sub.B], 0 otherwise
  D[U.sub.t] = 1 if t > [T.sub.B], 0 otherwise

Table 4. Empirical Application

                            (1)            (2)
Series                      statistic      statistic
                            C + T          C              BT

1. GDP                      -1.638         -0.119         -2.077
   (1980:I-2011:II)                        [1990:III]     [1990:III]
2. Absorption               -1.471         -0.502         -2.129
   (1980:I-2011:II)                        [1990:III]     [1990:III]
3. Consumption              -1.521         -1.638         -2.326
   (1980:I-2011:II)                        [1990:III]     [1990:III]
4. CPI                      -0.955         -4.273 **      -1.712
   (1994:I-2011:II)                        [2000:IV]      [2000:IV]
5. Exchange Rate            -0.756         -4.039 **      -4.738 ***
   (1991:I-2011:II)                        [2005:I]       [2005:I]
6. Unemployment             -9.615 ***     -10.089 ***    -10.225 ***
   (Jan2001-Jun2011)                       [Dec2000]      [Dec2005]
7. Real Exchange Rate       -3.397         -3.805 ***     -3.391
   (Jan1994-Jun2011)                       [Sep1998]      [Sep1998]
8. Exports Price Index      -1.221         -2.043         -1.764
   (Jan1991-Jun2011)                       [Dec2008]      [Dec2008]
9. Imports Price Index      -1.194         -1.715         -1.713
   (Jan1991-Jun2011)                       [Dec2008]      [Dec2008]

                            (3)
                            Zivot-Andrews
Series                      statistic
                            C              BT

1. GDP                      -4.447         -4.707 **
   (1980:I-2011:II)         [1988:III]     [1988:III]
2. Absorption               -4.048         -3.992
   (1980:I-2011:II)         [1988:III]     [1988:III]
3. Consumption              -5.368 **      -5.347 ***
   (1980:I-2011:II)         [1988:III]     [1988:III]
4. CPI                      -4.386         -3.921
   (1994:I-2011:II)         [2001:II]      [2007:IV]
5. Exchange Rate            -5.687 ***     -4.967 ***
   (1991:I-2011:II)         [1997:III]     [2006:II]
6. Unemployment             -10.684 ***    -11.245 ***
   (Jan2001-Jun2011)        [Nov2005]      [Sep2005]
7. Real Exchange Rate       -4.672 *       -4.659 **
   (Jan1994-Jun2011)        [Oct2005]      [Oct2005]
8. Exports Price Index      -3.013         -3.379
   (Jan1991-Jun2011)        [Jun1997]      [Aug2008]
9. Imports Price Index      -2.707         -4.390 *
   (Jan1991-Jun2011)        [Nov2006]      [Nov2000]

                            (4)
                            Perron-Rodriguez
Series                      statistic
                            BT             BT

1. GDP                      -5.405 ***     -3.226
   (1980:I-2011:II)         [1988:III]     [1990:II]
2. Absorption               -3.164         -3.184
   (1980:I-2011:II)         [1990:I]       [1990:I]
3. Consumption              -3.258         -3.340
   (1980:I-2011:II)         [2004:II]      [2004:II]
4. CPI                      -2.910         -3.149
   (1994:I-2011:II)         [1994:I]       [1994:I]
5. Exchange Rate            -2.060         -2.163
   (1991:I-2011:II)         [2002:IV]      [2002:IV]
6. Unemployment             -3.163         -3.936 *
   (Jan2001-Jun2011)        [Sep2005]      [Sep2005]
7. Real Exchange Rate       -3.300         -3.274
   (Jan1994-Jun2011)        [Feb2006]      [Feb2006]
8. Exports Price Index      -2.979         -2.909
   (Jan1991-Jun2011)        [Jan2004]      [Jan2004]
9. Imports Price Index      -3.851 **      -3.772 *
   (Jan1991-Jun2011)        [May2003]      [May2003]

Notes: Variables that exhibited a seasonal pattern were adjusted
with the programs TRAMO and SEATS in automatic mode. Except for
unemployment, all variables are expressed in logs. For the
augmented Dickey-Fuller statistics in column (1), *, ** and ***
indicate that the null hypothesis is rejected at the 10%, 5% and
1% level of significance, respectively. For column (2), the
exogenous (fixed) break date is reported within square brackets;
C stands for the "Crash" model whereas BT stands for the
"Breaking Trend" model. For column (3), estimated break dates are
reported within square brackets, with C and BT defined as in
column (2). For column (4), estimated break dates are reported
within square brackets. The modified AIC is employed as in Perron
and Rodriguez (2003). Source: authors' calculations.
```
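
The dummy variables defined under Tables 1 to 3 can be generated mechanically from the sample size T and the break date [T.sub.B]. The following is a minimal sketch in Python, offered only as an illustration: the function name break_dummies and the dictionary keys are hypothetical labels chosen here, not notation or code from the paper.

```python
def break_dummies(T, TB):
    """Build the break dummies of Tables 1-3 for t = 1, ..., T.

    T  : sample size (assumed positive integer)
    TB : break date, with 1 <= TB < T (assumed)
    """
    rows = []
    for t in range(1, T + 1):
        rows.append({
            "t": t,
            "D_TB": 1 if t == TB + 1 else 0,     # D(T_B)_t: pulse at TB + 1
            "DU": 1 if t > TB else 0,            # DU_t: level shift after TB
            "DT_star": t - TB if t > TB else 0,  # DT*_t: t - TB after TB
            "DT": t if t > TB else 0,            # DT_t: t after TB
        })
    return rows


# Small usage example with T = 6 observations and a break at TB = 3.
example = break_dummies(T=6, TB=3)
for row in example:
    print(row)
```

Under these definitions, D[T.sup.*.sub.t] starts at 1 in the first period after the break, while D[T.sub.t] jumps directly to the value of t, which is why the two dummies enter different models in the tables above.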
