
Determining the maximum number of uncorrelated strategies in a global portfolio.

In a portfolio composed of different assets from the same class, or of numerous asset classes, the drivers of return variation may appear elusive. Although idiosyncratic factors influence the variation of returns, there also exist common factors that account for the portfolio's collective variation. Uncovering these common drivers and decomposing their importance assists in cross-asset strategy building in portfolio management--a topic of interest to investment managers. One method to formalize the task of determining the maximum number of uncorrelated strategies to include in a global portfolio is the selection of the number of factors in a large-dimensional factor model.

Factor models have been widely studied, mainly in macroeconomics and asset pricing. In macroeconomics, they are used to determine the factors that influence measures of the economy, or in policy analyses. For example, Bernanke et al. [2005] introduce the factor-augmented vector autoregressive (FAVAR) model to analyze monetary policy. Forni et al. [2003] study the structure of the macroeconomy, while Favero et al. [2005] compare static and dynamic principal components in estimating macroeconomic variables. In consumer demand theory, Lewbel [1991] applied factor models to budget share data to reveal information about the demand system. In finance, Chamberlain and Rothschild [1983] extend arbitrage pricing theory with factor models, which have since been used not only to decompose risk and return into explicable and inexplicable components but also to describe the returns' covariance structure for prediction and to construct portfolios with desired characteristics, among other uses.

Factor models are categorized by three types of factors. There are i) macroeconomic factors (observable economic or financial time series), ii) fundamental factors (observable asset characteristics), and iii) statistical factors (unobservable asset characteristics). An example of a well-known single-factor model is the capital asset pricing model (CAPM), which describes the relationship between risk and return. The observation that stocks with small capitalization and high book-to-market ratio tend to perform better led Fama and French to refine CAPM as a three-factor model. In this article, the focus is on statistical factor models, and the factors are estimated using principal component analysis (PCA).

PCA is frequently applied in financial research, especially in studies on systemic risk. For instance, Kritzman et al. [2011] use principal components as an implied measure of systemic risk. This measure is extended in Kinlaw et al. [2012] to include centrality, a feature encompassing an entity's vulnerability to failure, its connectivity to other entities, and the risk of the entities to which it is connected. More recently, Billio et al. [2012] use PCA to investigate the connectedness of hedge funds, mutual funds, insurance companies, and banks. These works' objective is to capture changes in correlation and causality among financial institutions, whereas ours is to investigate correlation between the returns of different assets. As another example, PCA is also applied by Pukthuanthong and Roll [2009] in their measure of global market integration. In all instances, PCA, as a method to decompose returns into factors, proves to be an invaluable tool, as its use by financial institutions such as the U.S. Office of Financial Research makes clear.

The operational value of exposing common factors driving variances in these markets is immense and is of fundamental interest to any asset and risk manager. In addition to theoretical interest regarding the suitability of methodologies on empirical data, portfolio managers would be better able to gauge the potential risk factors from which they could profit or lose. Given that this factor analysis comes with a measure of the share of the price action that is due to each factor, it could also help investment managers decide where to put their efforts to build their market forecasts. For instance, because the level factor explains more than 80% of the interest rate curve's movements, devoting a large part of the forecasting effort to the level of rates makes more sense than investing time in anticipating the slope's movements, which explain less than 10% of the rates' variations. Similarly, the risk appetite factor in the Global Macro Hedge Fund accounts for up to 45% of the data's variance and hence deserves heightened attention. This is in contrast to Asian market movements, which explain only about 4% of variances; thus tracking them yields only a tenth of the insights of tracking investors' sentiment.

Knowledge of a portfolio's factor structure is also advantageous in risk management. In the event that the portfolio's variance is explained by few factors that surround a theme, then the portfolio manager should be wary about the high susceptibility of the portfolio's return to events on that theme. Our factor analysis and the dynamic measurement of the time-varying correlation between factors are of high importance, as a sudden increase in the correlations can spontaneously increase the risk of a seemingly well-diversified portfolio. For all these reasons, measuring and tracking those factors are essential steps in the construction of any portfolio.

Numerous methods to determine the number of factors in the case of unobserved factors have been proposed. Arguably the most popular is the information criteria (IC) method. IC is based on the idea that an (r+1)-factor model has to fit at least as well as an r-factor model but is less efficient. The well-known Akaike Information Criterion (AIC) and Bayesian Information Criterion (BIC) cannot be directly adopted when we have two-dimensional data (i.e., N as the number of assets, T as the time span) because they are functions of N or T alone and hence fail to consistently estimate the number of factors when the factors are unobserved. Bai and Ng [2002] (BN) propose a set of six penalty functions to replace the ones in AIC and BIC and establish conditions ensuring consistency of their method. Despite its wide empirical adoption, BN's criterion often does not converge, as demonstrated in Forni et al. [2007]. Alessi et al. [2009] (ABC) refine BN's criterion and demonstrate that their criterion has superior performance. Alternate approaches include analyzing the factor loadings (Connor and Korajczyk [1993]), tests on the matrix of the covariance eigenvalues of returns (Kapetanios [2005], Onatski [2009], Onatski [2010]), numerous tests on the rank of the covariance matrix (Lewbel [1991], Forni and Reichlin [1998]), and a graphical method that is rarely adopted due to its lack of theoretical basis (Donald [1997]).

By comparison of the criteria in a Monte Carlo study, ABC's is deemed to be the overall best in terms of accuracy and precision. Application of ABC's criterion to five datasets yields the following number of factors: five for the Global Macro Hedge Fund (GMHF), three for U.S. Treasury bond rates (USTB), two for commodity prices, and one each for U.S. credit spreads (USCS) and currencies. Total variation explained by the factors varies: 74% for GMHF, 94% for USTB, 49% for USCS, 27% for commodity prices, and 59% for currencies. Economic interpretation is attached to the factors according to correlation between the factors and the assets whose variances they describe. The five factors for GMHF are associated with risk appetite, commodities, the U.S. dollar, the Japanese market, and Asian stock markets. Those for USTB fit the description of factors found in previous research and are labelled as level, slope, and curvature. USCS's sole factor corresponds to midrange risky assets, and that for currencies is labelled as the carry factor. The pair of factors for commodity prices is linked to energy and metal.

Stability of the number of factors over time is investigated by testing the significance of correlation between factors. Even though by definition of principal components the factors are in theory orthogonal to each other, the estimated factors may not be so. Instantaneous correlation between these estimated factors can be uncovered by considering a rolling window over the time dimension. Correlation between the factors over rolling windows does not yield an overarching conclusion for all datasets regarding the hypothesis that during periods of economic downturn, fewer factors are required to explain variances in the data because of increased cross-market correlation. For traditional asset classes, such as USTB, it is evident that correlation between factors increased. Yet for commodity prices, it is less clear. When all asset classes are considered together in the GMHF dataset, the different factor interactions that are observed within each asset class manifest in a more complex manner and suggest that cross-asset correlation may be a leading indicator of economic cycles, because a spike in correlation is observed prior to the 2007-2010 financial crisis.

This article is organized as follows. We first present the factor model and briefly explain the numerous criteria proposed to determine the number of factors. After a Monte Carlo study for selected methods, the methods are applied to datasets relevant to the investment management industry. We then interpret the results and analyze their stability over time.


The Factor Model

In the analysis of a large pool of assets' returns, a basic mean--variance analysis requires estimation of a quadratically increasing number of expected returns, standard deviations, and correlation coefficients. To overcome the difficulty of having to estimate the statistics using long historical data to attain low standard deviations, a linear factor model that relates the returns of the portfolio to a finite number of factors can be introduced. An r-factor approximate factor model is specified as

$X_{it} = \lambda_i' F_t + e_{it}, \quad i = 1, \ldots, N \text{ and } t = 1, \ldots, T$ (1)

whereby $X_{it}$ is the observed data for the $i$th cross-section at time $t$, $F_t$ is the $r \times 1$ vector of common factors, $\lambda_i$ is the $r \times 1$ vector of factor loadings, $e_{it}$ is the idiosyncratic component, $\lambda_i' F_t$ is the common component of $X_{it}$, and $'$ denotes the complex conjugate transpose of the matrix. $F_t$ and $e_t = (e_{1t}, e_{2t}, \ldots, e_{Nt})$ are assumed to be uncorrelated, and the matrix composed of $\mathrm{cov}(e_i, e_j)$ is not necessarily diagonal (i.e., it allows correlation between the asset returns' idiosyncratic components), but the largest eigenvalue of the idiosyncratic component's covariance matrix is bounded to limit the degree of correlation. Hence, $X_t = (X_{1t}, X_{2t}, \ldots, X_{Nt})$ is explained by both common factors and specific components.

Principal Component Analysis

Because $F_t$ is unobservable in our setup, we estimate its value using principal component analysis. PCA relies on the decomposition of the correlation matrix of $X_t$ into its eigenvalues and eigenvectors as $\mathrm{var}[X_t] = \Sigma = Q \Theta Q'$. $Q$ is the matrix of eigenvectors and $\Theta$ the diagonal matrix of eigenvalues of $\mathrm{var}[X_t]$, with $\theta_1, \ldots, \theta_N$ along its diagonal.


For $i = 1, \ldots, N$, $\theta_i / \sum_{k=1}^{N} \theta_k$ can be interpreted as the proportion of variation explained by the $i$th factor. The eigenvector corresponding to the largest eigenvalue is a scalar transformation of the first principal component, or factor. The eigenvector corresponding to the second-largest eigenvalue is in the same direction as the second factor, and so on. (2) Thus, estimating $F_t$ involves computing the eigenvectors of the correlation matrix of $X_t$.
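The decomposition described above can be illustrated with a minimal sketch. It assumes NumPy and uses simulated data; the dimensions, the noise level, and all variable names (`X`, `Lam`, `F_hat`, and so on) are hypothetical choices for illustration and are not taken from the article's datasets.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: T observations of N return series driven by r common factors.
T, N, r = 500, 6, 2
F = rng.standard_normal((T, r))            # latent factors
Lam = rng.standard_normal((N, r))          # factor loadings
X = F @ Lam.T + 0.5 * rng.standard_normal((T, N))

# Eigendecomposition of the sample correlation matrix, R = Q Theta Q'.
R = np.corrcoef(X, rowvar=False)
theta, Q = np.linalg.eigh(R)               # eigh returns eigenvalues in ascending order
order = np.argsort(theta)[::-1]            # reorder from largest to smallest
theta, Q = theta[order], Q[:, order]

# Share of variation attributed to each principal component.
explained = theta / theta.sum()
cumulative = np.cumsum(explained)

# Estimated factors: standardized data projected onto the leading eigenvectors.
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
F_hat = Z @ Q[:, :r]

print(np.round(explained, 3), np.round(cumulative, 3))
```

Because the correlation matrix has ones on its diagonal, the eigenvalues sum to N, so each share is simply the eigenvalue divided by N. Over the full sample the estimated factors are uncorrelated by construction, which is why correlation between them can only be observed on sub-windows.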

Methods Implemented to Determine the Number of Factors

The next natural question in the setup of the factor model is the number of factors, r, to include. For the CAPM, r = 1 (i.e., the well-known beta), whereas in the Fama--French model, r = 3 (i.e., the same as in CAPM plus capitalization and book-to-market ratio). Unlike the CAPM and the Fama--French model, we do not pre-specify r. Instead, r is determined statistically via tests that rely on different properties of the factor models. The first, introduced by Bai and Ng [2002], is an information criterion similar to the well-known Akaike or Bayesian criterion in regression analysis. The number of factors corresponds to the r-factor model with the lowest value of a loss function. The loss function is the sum of the squared residuals from the r-factor model plus a penalty function that is increasing in r. (3) The second criterion is by Alessi et al. [2009] (ABC), which applies Bai and Ng [2002] repeatedly to refine the information criterion. We also implemented the tests introduced by Connor and Korajczyk [1993] that rely on comparing the squared residuals of ordinary least squares models with various numbers of factors via a statistical test. The fourth and final criterion that we implement is by Onatski [2009]. This is a statistical test based on the properties of eigenvalues of an r-factor model being bounded, whereas its (r+1)th is unbounded.
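To make the information-criterion idea concrete, the following is a minimal sketch of one member of the Bai and Ng [2002] family, the logarithmic criterion usually written IC_p2, which adds the penalty k((N+T)/(NT)) ln(min(N, T)) to the log of the average squared residual. Which of the six penalties the article relies on is not stated here, so this choice is an assumption, and the sketch is not the ABC refinement. The function name `bai_ng_icp2`, the simulated data, and `kmax` are hypothetical.

```python
import numpy as np

def bai_ng_icp2(X, kmax):
    """Return (k_hat, ic_values) for k = 0..kmax using the IC_p2 criterion.

    X is a T x N data matrix. V(k) is the average squared residual left
    after removing the first k principal components.
    """
    T, N = X.shape
    Xc = X - X.mean(axis=0)
    s = np.linalg.svd(Xc, compute_uv=False)      # singular values, descending
    total = np.sum(s ** 2)
    penalty = (N + T) / (N * T) * np.log(min(N, T))
    ic = []
    for k in range(kmax + 1):
        v_k = (total - np.sum(s[:k] ** 2)) / (N * T)
        ic.append(np.log(v_k) + k * penalty)
    return int(np.argmin(ic)), ic

# Hypothetical simulated panel with three strong factors.
rng = np.random.default_rng(1)
T, N, r_true = 200, 50, 3
F = rng.standard_normal((T, r_true))
Lam = rng.standard_normal((N, r_true))
X = F @ Lam.T + 0.5 * rng.standard_normal((T, N))

r_hat, ic_values = bai_ng_icp2(X, kmax=8)
print(r_hat)
```

The residual sum of squares is obtained from the singular values because the best rank-k approximation of the demeaned data is given by its first k principal components. On strongly structured simulated data such as this, the criterion has a clear interior minimum; the article reports that on its empirical datasets the BN criterion instead always selects the maximum number of factors allowed.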

All four methods are compared in a Monte Carlo study. The criteria are evaluated based on their accuracy on simulated data. Seven different sets of data are used, each with unique statistical characteristics. This approach allows us to assess the conditions under which these criteria would perform well on empirical data. The Monte Carlo study reveals that the ABC criterion is superior in accuracy and precision across data with various characteristics, and thus we will apply it to empirical data in the next section.


In this section, ABC's criterion is applied to five datasets covering major asset classes such as equities, U.S. Treasury bonds, credit spreads, currencies, and commodities. The estimated number of factors is first identified and labelled. Then the stability of factors over time is analyzed by testing the significance of correlation between them and re-estimating the number of factors after splitting each dataset along the time dimension according to the economic cycle.

Global Results

The Global Macro Hedge Fund dataset consists of major indexes, government bonds, currency exchange rates, and futures of currency exchange rates and oil, from January 1999 to March 2012. The data is of weekly frequency because of its cross-boundary nature, so the difficulty of reconciling market closing times worldwide is mitigated. Next is a dataset of daily closing rates of U.S. Treasury bonds (USTB) with maturity of three months to thirty years, from January 1997 to March 2012. The U.S. credit spreads (USCS) data consists of daily closing rates categorized by industry (e.g., financial corporation, insurance, and energy), financial rating (e.g., AAA, AA, BBB), and duration (e.g., one to three years, three to five years), from January 1997 to March 2012. Daily closing prices of commodities such as gold, aluminum, natural gas, corn, wheat, the S&P GSCI (Goldman Sachs Commodity Index) and so on from October 1998 to March 2012 are compiled in the dataset of commodity prices. The fifth dataset, called Currencies, includes daily prices of the euro, pound, Swiss franc, Japanese yen, Canadian dollar, Australian dollar, New Zealand dollar, Norwegian krone, and Swedish krona in terms of U.S. dollars from January 1999 to December 2012. BN, ABC, and CK criteria are applied to log prices or log spread variations for the GMHF, USCS, Commodity Prices, and Currencies datasets, whereas for USTB, variation of rates is used because with two-year rates close to zero or negative, their percentage variations are extreme. Composition of each dataset is presented in Exhibit 1.

EXHIBIT 1 Composition of All Datasets and Results Summary

Range: January 1999 to March 2012
List of Assets in GMHF Dataset

Stock Indexes (13 assets)   Dow Jones, Nasdaq 100, Eurostoxx, FTSE, CAC 40, DAX, IBEX, SMI,
                            Nikkei, TOPIX, Hang Seng, MSCI Singapore Free Index.
Bonds (12 assets)           US 30Y, US 10Y, US 5Y, US 2Y, CAN 10Y, OK 10Y, GE 5Y, GE 2Y,
                            UK 10Y, JP 10Y, OZ 10Y, ED 4 (Euro-Dollar bond).
Futures (5 assets)          CHF futures, JPY futures, AUD futures, CAD futures, GBP futures.
Commodities (2 assets)      Oil, S&P GSCI.
Others (1 asset)            Mini SP 500.

Range: January 1997 to March 2012
List of Assets in U.S. Treasury Bond Rates Dataset
(Maturity) 3M, 6M, 1Y, 2Y, 3Y, 4Y, 5Y, 7Y, 8Y, 9Y, 10Y, 15Y, 20Y, 25Y, 30Y.

Range: January 1997 to March 2012
List of Assets in U.S. Credit Spreads Dataset
Master index, financial corporates, banks, insurance, industrials, capital goods, energy, utilities, consumer cyclicals, consumer non-cyclicals, healthcare, AAA, AA, A, BBB, 1-3 years, 3-5 years, 5-7 years, 7-10 years.

Range: October 1998 to March 2012
List of Assets in Commodity Prices Dataset
Gold, silver, platinum, aluminum, copper, nickel, zinc, lead, WTI, Brent, gas-oil, natural gas, heating oil, corn, wheat, coffee, sugar, cocoa, cotton, soybean, rice, S&P GSCIs: agriculture, energy, industrial metals, precious metals.

Range: January 1999 to December 2012
List of Assets in Currencies Dataset
Daily price in USD of: EUR, GBP, NOK, SEK, CHF, JPY, AUD, NZD, CAD.

Aside from implementing the BN, ABC, and CK criteria on the datasets, we evaluate the influence of cross-section correlation on the accuracy of the results, because the Monte Carlo study indicates that the selected criteria generally perform worse when such correlation exists. To do so, a vector autoregressive (VAR) model is fitted to remove dynamic linear dependence, and the same criteria are then applied to the residuals. Using the Akaike Information Criterion to determine the number of lags to include in the VAR model, the USTB and USCS datasets are fitted with VAR(3), commodity prices with VAR(2), and GMHF and currencies with VAR(1). The BN and ABC criteria provide the same outcome when the analysis is done on the residuals as on the returns data. CK's criterion yields slightly different estimates. Despite its commendable performance in the Monte Carlo study, BN's criterion fails to converge on all datasets: $r_{\max}$ is always estimated. CK's estimates do not always coincide with ABC's estimates, but the latter are taken to be more accurate because of ABC's better performance in the Monte Carlo study and because they are invariant to whether linear dependencies exist in the data. Results are presented in Exhibit 2.
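EXHIBIT 2 Summary Table of Results Using Various Criteria

                                         Number of Factors Estimated by

Dataset                     Bai and Ng (BN)                    Connor and Korajczyk (CK) *   Alessi et al. (ABC)

Global Macro Hedge Fund     $r_{\max}$ is always estimated     12 (5)                        5
U.S. Treasury Bond Rates    $r_{\max}$ is always estimated     4 (5)                         3
U.S. Credit Spreads         $r_{\max}$ is always estimated     1 (2)                         1
Commodity Prices            $r_{\max}$ is always estimated     3 (6)                         2
Currencies                  $r_{\max}$ is always estimated     1 (2)                         1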
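The VAR pre-whitening step mentioned before Exhibit 2 can be sketched as follows. This is a minimal illustration under stated assumptions: the VAR is estimated equation by equation with ordinary least squares and an intercept, the lag order `p` is supplied by hand rather than chosen by AIC, and the data are simulated. The function name `var_residuals` is hypothetical.

```python
import numpy as np

def var_residuals(X, p):
    """OLS residuals of a VAR(p) with intercept; X is a T x N data matrix."""
    T, N = X.shape
    Y = X[p:]                                            # observations t = p..T-1
    lags = [X[p - j:T - j] for j in range(1, p + 1)]     # X_{t-1}, ..., X_{t-p}
    Z = np.hstack([np.ones((T - p, 1))] + lags)          # regressors with constant
    B, *_ = np.linalg.lstsq(Z, Y, rcond=None)            # least-squares coefficients
    return Y - Z @ B

# Hypothetical VAR(1) data: each series depends on its own first lag.
rng = np.random.default_rng(2)
T, N, p = 400, 4, 1
X = np.zeros((T, N))
for t in range(1, T):
    X[t] = 0.6 * X[t - 1] + rng.standard_normal(N)

resid = var_residuals(X, p)
print(resid.shape)
```

A factor-number criterion would then be applied to `resid` instead of `X`, and the two estimates compared, which is the robustness check the text describes.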

In Exhibit 7, the number of factors estimated and the proportion of variance explained by each factor, i.e., the ratio of the $i$th eigenvalue to the sum of all eigenvalues of the covariance matrix of the returns, are presented. The proportion of variances explained by the estimated number of factors suggests the concentration of correlation. An asset class or portfolio for which correlations across assets' variations are high should have a high percentage explained. Using ABC's criterion, the GMHF is estimated to have five factors, which collectively explain about 74% of the variances in the dataset. USTB has three factors that explain 94% of the variances. Commodity prices are estimated to have two factors. Together, they explain 27% of the variances--the lowest among the datasets considered. Commodities' low concentration of correlation is consistent with the findings of Gorton and Rouwenhorst [2004] on the asset class's diversification potential and becomes the rationale for investors to increase portfolio allocation to commodity assets (Daskalaki and Skiadopoulos [2011]). Next, one factor each is estimated for U.S. credit spreads and currencies, accounting for around 49% and 59% of the variances, respectively.
EXHIBIT 7 Results Summary--All Datasets

Global Macro Hedge Fund

Factor           1                 2             3             4                 5
Label            Global Equities   Commodities   U.S. dollar   Japanese Market   Asian
Eigenvalue       0.135             0.041         0.022         0.015             0.014
Proportion (%)   45                13            7             4.8               4.2
Cumulative (%)   45                58            65            69.8              74

US Treasury Bond Rates

Factor           1        2       3
Label            Level    Slope   Curvature
Eigenvalue       12.305   1.469   0.676
Proportion (%)   80       10      4
Cumulative (%)   80       90      94

US Credit Spreads

Factor           1
Label            Mid-range risky assets
Eigenvalue       0.895
Proportion (%)   49

Commodity Prices

Factor           1        2
Label            Energy   Metals
Eigenvalue       0.206    0.181
Proportion (%)   15       13
Cumulative (%)   15       27

Currencies

Factor           1
Label            Carry factor
Eigenvalue       0.11
Proportion (%)   59

After obtaining the number of factors, the sensitivity of each asset's return to the factors--i.e., the factor loadings--is investigated and presented in bar charts. This notion of sensitivity can be extended to that of a portfolio, as it is merely the sum of the sensitivities of the corresponding assets in the portfolio. Moreover, portfolios that are unsusceptible to a particular factor can be constructed by selecting assets such that their weighted sum (i.e., loadings treated as weights) is zero. Factor loadings also help in identifying the factors--that is, in labelling the latent, hypothetical factors according to their relationship with the assets. The absolute correlations of the factors with the assets' returns are first sorted in descending order and progressively added to the selection, i.e., adding first those with the largest absolute correlation, until the selection achieves an R^2 of at least 95% when used as explanatory variables in a simple regression model for the asset returns. These correlations are presented as bar plots in Exhibits 3 through 6.
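One simple way to obtain a portfolio with zero exposure to a given factor is sketched below. It is only an illustration: the loadings and starting weights are made-up numbers, and the projection used here is one of many possible constructions rather than the procedure followed in the article. The function name `neutralize` is hypothetical.

```python
def neutralize(weights, loadings):
    """Adjust weights so that the weighted sum of factor loadings is zero.

    The exposure is removed by subtracting the component of the weight
    vector that lies along the loading vector (an orthogonal projection).
    """
    exposure = sum(w * b for w, b in zip(weights, loadings))
    norm2 = sum(b * b for b in loadings)
    return [w - exposure / norm2 * b for w, b in zip(weights, loadings)]

# Hypothetical loadings of four assets on a single factor, and equal starting weights.
loadings = [0.9, 0.7, -0.2, 0.4]
weights = [0.25, 0.25, 0.25, 0.25]

exposure_before = sum(w * b for w, b in zip(weights, loadings))
neutral = neutralize(weights, loadings)
exposure_after = sum(w * b for w, b in zip(neutral, loadings))

print(exposure_before, exposure_after)
```

The adjusted weights no longer sum to one, so in practice a budget constraint would be imposed alongside the neutrality condition.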





Global macro hedge fund. Exhibit 3 shows that Factor 1 of GMHF is highly correlated with major equity indices worldwide. This finding suggests that Factor 1 corresponds to a risk appetite factor. During bullish periods, investors are more willing to take risks and hence prefer to invest in equities. Conversely, during bearish periods, investments are diverted into bonds, which have lower risk. Factor 2 relates to oil and GSCI, and hence it is labelled as a commodities factor. Factor 3 is a dollar factor because of its association with the U.S. dollar. Factor 4 represents the Japanese market, whereas Factor 5 is linked to Asian markets.

U.S. treasury bonds. Prior research by Dai and Singleton [2000] and Litterman and Scheinkman [1991] has shown that observed variation in bond prices is explained by three factors: level, slope, and curvature. These terms describe the shift of the yield curve in response to a shock. A level shock shifts the curve in a parallel manner, resulting in an almost equal effect on bonds of all maturities. The slope factor implies larger shocks for bonds with small maturity compared with bonds with longer maturity. Its name is derived from the effect of the yield curve becoming less steep as a result of a slope shock. Curvature affects medium-term interest rates, and hence it presents itself as a "hump" on the yield curve. (5) Indeed, Exhibit 4 shows that Factor 1 has relatively uniform correlation across bonds of all maturities, just as a level factor would. Correlation for Factor 2 changes sign once and has a larger correlation, and hence a larger impact, on bonds with smaller maturity, as should a slope factor. Factor 3 has a hump in its correlation figure, fitting the description of a curvature factor. Thus, the findings are consistent with existing results.

U.S. credit spreads. The sole factor for USCS is correlated with the A-rated investments, financial corporates, and industrial credit spreads that dominate those with utilities and consumer cyclicals as shown in Exhibit 4, implying that it is associated with assets having mid to low credit risk. Investments with low credit risk and conventionally small credit spreads, such as AAA-rated assets and Treasury bonds, are not among those that possess the highest correlation with the factor, suggesting that they play a relatively smaller role in the movement of returns. Similarly, industries responsible for providing goods with low substitutability, such as utilities, healthcare, and energy, along with those that are highly dependent on the state of the economy, such as consumer cyclicals (e.g., entertainment, automotive, and so on), possess lower correlation with the factor. In comparison with other datasets, assets in USCS have a relatively more uniform correlation (i.e., all above 50%) with the factor.

Commodity prices. Interpretation for the two factors of commodity prices is straightforward: Factor 1 is the energy factor. Factor 2, being correlated with numerous metals, is the metals factor, as is evident in Exhibit 5. Similar to the case of USTB, the number of factors in commodity datasets is commonly studied. Daskalaki et al. [2013] investigate common components of a cross-section of commodity futures data, using numerous asset pricing models intended for equities, macro and equity-motivated factor models, and principal component factor models, to find that none of them satisfactorily prices commodity assets. The authors attribute this poor pricing model performance to heterogeneity in commodity markets, as well as segmentation of equity and commodity markets. Our results are consistent with Daskalaki et al.'s because among all datasets, commodity prices' factors collectively explain the least amount of variance in the data, an observation that supports the commodity diversification effect made popular by Gorton and Rouwenhorst [2004], and has motivated investors to increase their portfolio allocation in this asset class (Daskalaki and Skiadopoulos [2011]). Furthermore, identification of factors by commodity group and low correlation across these groups suggest a sectorial framework when studying the commodity market.

Currencies. For the currencies dataset, a single factor is estimated by ABC. It is labelled as the carry factor, because it is correlated to currencies with high average rates, such as the Australian dollar, Norwegian krone, and Swedish krona, as shown in Exhibit 6. It could also be understood as the dollar factor, because all currency prices are positively correlated with the U.S. dollar. To the best of our knowledge, no empirical evidence exists in the literature regarding the number of factors in currency datasets.


Because all datasets span more than 10 years, including the most recent financial crisis in 2008, it is of interest to know whether the number of estimated factors is stable over time. Literature on the stability of factor models is sparse. Most authors have either assumed stability or relied on some graphical method. Bliss [1997] divided his sample into three subperiods and investigated the factor loadings on each. His hypothesis is that if the factor loadings appear similar through all subperiods, then the factor structure is stable. Perignon and Villa [2002] found that factor loadings are stable but factor volatility varies over time. Chantziara and Skiadopoulos [2008] analyzed the term structure of petroleum futures by splitting the sample into two as well. Because the PCA results are similar in both samples, the authors conclude that the factor structure is stable. Attempts to devise formal tests include those by Audrino et al. [2005] and Philip et al. [2007], which focus on the term structure of interest rates. The former relies on testing equivalence of the factor loadings on subperiods, whereas the latter involves constructing a bootstrap distribution for the test statistic. Evaluation of the aforementioned research is a free-standing topic worthy of a full-length research paper and thus is not done here.

To investigate the stability of the factors on our dataset, we attempt two approaches. The first assumes that the factor structure is stable but the correlation between the factors may not be. Correlation between factors when the factor structure is stable is closely related to the concentration of eigenvalues. To reveal the evolution of the relationship between factors, the t-test for significance of correlation is used. Next, discarding the assumption of a stable factor structure, the dataset is split into expansion and contraction economic periods, and the corresponding number of estimated factors in each subperiod is obtained. To take into consideration that financial market performance is a leading indicator of the macroeconomic situation, the periods of contraction and expansion are lagged by a negative number of months, from -1 to -12. In other words, the number of factors prior to the start of a contraction or expansion period is computed to determine if the change in number of factors occurs before the economy takes a turn in its cycle. The key difference between the two approaches is that by assuming the factor structure is stable in the former, estimation of the number of factors is done only once, and the focus is on their dynamics, specifically correlation, over time. In the latter, the number of factors is estimated for each variation in the economic cycle using ABC's criterion.

Significance of Correlation between Factors

In this section, the factor structure estimated in the previous section is assumed to be stable, but the dynamics between factors evolve over time. Correlation between the factors is tested over a six-month rolling window for GMHF and a one-year rolling window for U.S. Treasury bond rates as well as for commodity prices, using the t-test for significance of correlation. Exhibits 8 and 9 plot the number of uncorrelated factors as a result of the t-test using the test statistic $t = \rho \sqrt{(N-2)/(1-\rho^2)}$, with $\rho$ = correlation between two factors and $N$ = sample size. Under the null hypothesis that $\rho = 0$--that is, the correlation between the factors is insignificant--$t$ follows a Student's t distribution with $N-2$ degrees of freedom.
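As a worked illustration of the test statistic, the sketch below evaluates it for a single window. The window length of 26 observations (roughly six months of weekly data) and the two sample correlations are assumptions chosen for the example, and the 5% two-sided critical value for 24 degrees of freedom, about 2.064, is taken from standard Student's t tables rather than computed. The function name `correlation_t_stat` is hypothetical, and `rho` in the code stands for the correlation between two factors.

```python
import math

def correlation_t_stat(rho, n):
    """t statistic for H0: correlation = 0, given sample correlation rho and n observations."""
    return rho * math.sqrt((n - 2) / (1 - rho ** 2))

# Assumed inputs for illustration only.
n_obs = 26                  # about six months of weekly observations
critical_5pct = 2.064       # two-sided 5% critical value, Student's t with 24 d.o.f. (table lookup)

t_strong = correlation_t_stat(0.5, n_obs)   # 0.5 * sqrt(24 / 0.75)
t_weak = correlation_t_stat(0.3, n_obs)     # 0.3 * sqrt(24 / 0.91)

strong_is_significant = abs(t_strong) > critical_5pct
weak_is_significant = abs(t_weak) > critical_5pct

print(round(t_strong, 3), strong_is_significant)
print(round(t_weak, 3), weak_is_significant)
```

With these inputs a correlation of 0.5 is judged significant and a correlation of 0.3 is not, which shows how much a short rolling window raises the threshold for declaring two factors correlated.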



Plots of the number of uncorrelated factors suggest that the factors tend to be correlated during recession periods. Increased correlation between factors is particularly obvious for USTB. Exhibit 8 shows a marked drop in the number of uncorrelated factors to only one factor during the most recent financial crisis. This result is attributable to historically low interest rates as investors sought safe investments. Because the factors were identified by their impact on bonds of different maturity--i.e., Factor 1 affects bonds of all maturities evenly, Factor 2 has a large impact on small maturity bonds, and Factor 3 has the highest influence on bonds of midterm maturity--having all factors collapse onto a single factor suggests that the prolonged near-zero short-term interest rates and low long-maturity rates have removed the disproportionate impact of shocks on the yield curve on bonds of varying maturity. Hence, the level, slope, and curvature factors merged into a single factor.

Factors for commodity prices appear to be unsusceptible to the economic condition, because the number of uncorrelated factors remains stable at two throughout the period studied. This result is in line with that of Kat and Oomen [2006], who find weak correlation across commodity groups. Indeed, energy and metals, the two factors identified, are distinct commodity groups. Moreover, energy and metals are essential to many industries and are not substitutable; they represent immutable drivers of return among commodity assets. Thus, the sectorial view of commodity market stays valid through ups and downs in the economy.

Because USCS and currencies have only one factor, stability analysis is not applicable. For curiosity, however, we overestimate the number of factors at five each and apply the same t-tests to investigate the evolution of the number of uncorrelated factors over time. Exhibit 9 for USCS shows that the number of uncorrelated factors fluctuates between one and two, and thus it does not provide conclusive evidence that the number of factors is lower during economic crisis. As for currencies, Exhibit 9 demonstrates that the number of factors stays constant at one for most of the period, except in early 1999, when occasionally two factors are estimated. This period coincides with the introduction of the euro, which could have given rise to an interim factor influencing the returns.

For the 5% test on GMHF, the number of uncorrelated factors is never five--the number estimated using ABC's criterion over the entire horizon--during recession. Because this number is also fewer than five in many other instances, it is less clear whether higher correlation between factors is a unique feature of recessions. To further investigate this idea, the mean absolute correlation between factors is plotted in Exhibit 10, on which there is an indisputable spike in correlation in 2007. Hence, preceding the most recent financial crisis, correlation between factors rose drastically, followed by fluctuations in correlation. It could also be argued that the number of factors fell before the recession, an observation that motivated the analysis in the next subsection. With as many as five factors, the dynamics between factors do evolve over time, as shown by the mean absolute correlation plot, but do not necessarily materialize as a clearly lower number of factors during recession.
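The mean absolute correlation plotted in Exhibit 10 can be illustrated with a short Python sketch. It is a minimal example under assumptions, not the authors' code: the function name and the T x r layout of the factor matrix are hypothetical, and the windowing used to produce a time series of values is not specified in the text.

```python
import numpy as np

def mean_abs_correlation(factors):
    """Mean of |correlation| over all distinct pairs of factor columns.

    `factors` is assumed to be a T x r array (T observations, r >= 2 factors).
    """
    corr = np.corrcoef(factors, rowvar=False)   # r x r correlation matrix
    upper = np.triu_indices_from(corr, k=1)     # indices of distinct pairs only
    return float(np.mean(np.abs(corr[upper])))
```

Applying the function to successive windows of estimated factors would give one value per window, which is one plausible way to obtain a plot of this kind.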


Stability of the factors depends on the asset class. For USTB, which is considered to carry the lowest risk among the datasets considered, the number of factors that affect its return is lower during recessions, especially for the one between late 2007 and 2010. Commodity prices' factors seem invariant to the economic climate. When combined in the GMHF portfolio, these interactions become more complex and are realized as fluctuating correlation between factors.

Estimated Factors by Economic Cycles

Because financial markets are often thought of as leading indicators of economic cycles, it is possible that the change in the number of factors occurs prior to contractions in the economy. To investigate this, the contraction periods as determined by NBER are lagged by a negative number of months. For example, "Lag = -1 month" of the most recent financial crisis between December 2007 and June 2009 refers to the interval November 2007 to May 2009. The results are presented in Exhibit 11. For USTB, the number of factors fluctuates between one and five during expansion periods, supporting the view that correlation between factors changes prior to recessions. On the contrary, GMHF and U.S. credit spreads have a constant number of factors throughout expansion periods. As for commodities, the number of factors is stable, consistent with the results of the t-tests for significance of correlation. The number of factors in the currencies dataset falls to zero prior to contraction periods. Therefore, the view of financial market performance as a leading indicator of economic cycles is supported only by USTB rates and, to a lesser extent, by currencies.
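The lagging of contraction periods can be made concrete with a small Python sketch. The helper names are hypothetical and periods are assumed to be represented by first-of-month dates; the only date pair taken from the text is the December 2007 to June 2009 contraction.

```python
from datetime import date

def shift_month(d, months):
    """Shift a first-of-month date by a (possibly negative) number of months."""
    index = d.year * 12 + (d.month - 1) + months
    return date(index // 12, index % 12 + 1, 1)

def lag_period(start, end, lag_months):
    """Shift both endpoints of a contraction period by the same lag."""
    return shift_month(start, lag_months), shift_month(end, lag_months)

# "Lag = -1 month" applied to December 2007 - June 2009
lagged = lag_period(date(2007, 12, 1), date(2009, 6, 1), -1)
# lagged == (date(2007, 11, 1), date(2009, 5, 1))
```

The lagged windows then replace the original NBER windows when observations are split into expansion and contraction subsamples.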


This article investigates the number of cross-asset uncorrelated strategies available to portfolio managers. After reviewing the literature on existing approaches to determine the number of factors, four methods are selected and tested. The Monte Carlo analysis suggests that the criterion by Alessi et al. [2009] is most reliable. Implementation of the criteria on five datasets yields the following numbers of factors, as estimated by ABC and shown in parentheses: Global Macro Hedge Fund (5), U.S. Treasury bond rates (3), U.S. credit spreads (1), commodity prices (2), and currencies (1). Plots of the number of uncorrelated factors do not all support the hypothesis of increased cross-market correlation during economic recession. Evidence is strongest for the U.S. Treasury bond rates, which is most likely the result of U.S. macroeconomic policies after the financial crisis, and weakest for commodity prices, in line with the low correlation between commodity groups reported in previous studies. GMHF, composed of a mix of these assets except for credit spreads, demonstrates a combination of the outcomes observed on the rest of the datasets, yielding fluctuating correlation between the factors during economic downturns. The results regarding stability must be considered with caution, however, because not every financial crisis is succeeded by a recession period. Thus, there could be a change in correlation between assets occurring outside the NBER-determined recession periods. With more information on the factors driving the returns in different asset classes, investors would have a better understanding of the common sources of risk. Depending on the asset class in mind, a strategy built on exploiting these common sources of risk may have to take the economic climate into account.

EXHIBIT 11 Number of Estimated Factors by Dataset, by Business Cycle Expansions and Contractions, Lagged by Negative Number of Months

Dataset                Global Macro Hedge Fund     U.S. Treasury Bond Rates
Lag (No. of Months)    Expansion   Contraction     Expansion   Contraction
  0                    5           6               3           4
 -3                    6           6               5           3
 -6                    5           5               2           3
-12                    5           4               2           3

Dataset                U.S. Credit Spreads
Lag (No. of Months)    Expansion   Contraction
  0                    1           1
 -3                    1           1
 -6                    1           1
-12                    1           2

Dataset                Commodities                 Currencies
Lag (No. of Months)    Expansion   Contraction     Expansion   Contraction
  0                    2           2               1           1
 -3                    2           2               1           0
 -6                    2           2               1           1
-12                    2           2               1           0



The tests considered in this study are as follows.

Bai and Ng [2002] (BN)

The number of factors estimated by BN is the integer r that yields the lowest value of the loss function $V(r, F^r) + r\,g(N,T)$, or $\log(V(r, F^r)) + r\,\sigma^2 g(N,T)$, with $V(r, F^r) = \min_{\Lambda} \frac{1}{NT} \sum_{i=1}^{N} \sum_{t=1}^{T} (X_{it} - \lambda_i^{r\prime} F_t^r)^2$, whereby $F^r$ is the matrix of r factors, $\Lambda = (\lambda_1^r, \ldots, \lambda_N^r)'$, g(N,T) is the penalty for over-fitting, and r is a constant. $\sigma^2$ is a consistent estimate of $\frac{1}{NT} \sum_{i=1}^{N} \sum_{t=1}^{T} E[e_{it}^2]$, which in practice can be replaced by $V(r_{\max}, F^{r_{\max}})$, with $r_{\max}$ a pre-specified maximum number of factors considered. Two examples of g(N,T) are $g_1(N,T) = \frac{N+T}{NT} \ln(\min\{\sqrt{N}, \sqrt{T}\}^2)$, which is frequently used in empirical works, and $g_2(N,T) = \frac{(N + T - k)\ln(NT)}{NT}$, which has been shown to possess good properties when errors are cross-correlated (Bai and Ng [2008]).
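As a rough illustration of how a criterion of this type can be evaluated, the Python sketch below estimates the number of factors by principal components. It is not the authors' code and makes several assumptions that should be read as such: the panel is a T x N array, only column means are removed (any standardization is left to the caller), the log form of the loss is used with the variance scaling set to one, and the penalty is $g_1$ written as $\frac{N+T}{NT}\ln(\min\{N,T\})$. The function name is hypothetical.

```python
import numpy as np

def bn_num_factors(X, r_max):
    """Return r in 0..r_max minimizing log V(r) + r * g1(N, T).

    X is assumed to be a T x N panel with r_max < min(N, T).
    """
    T, N = X.shape
    Z = X - X.mean(axis=0)                      # remove column means only
    s2 = np.linalg.svd(Z, compute_uv=False) ** 2
    total = s2.sum()
    g1 = (N + T) / (N * T) * np.log(min(N, T))  # over-fitting penalty
    losses = []
    for r in range(r_max + 1):
        # Residual sum of squares of the best rank-r fit equals the sum of
        # the squared singular values beyond the first r.
        v_r = (total - s2[:r].sum()) / (N * T)
        losses.append(np.log(v_r) + r * g1)
    return int(np.argmin(losses))
```

The singular value decomposition is used here only as a convenient way to obtain V(r) for every r at once; an eigendecomposition of the sample covariance matrix would give the same values.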

Alessi et al. [2009] (ABC)

Alessi et al. [2009] propose a refinement of BN that multiplies the penalty function by a constant, c, as follows: $V(r, F^r) + r\,c\,g(N,T)$, or $\log(V(r, F^r)) + r\,c\,\sigma^2 g(N,T)$. The estimated number of factors remains the one yielding the lowest value of these modified loss functions. Furthermore, the authors suggest evaluating the loss functions over random subsamples of the data to find an estimate that is insensitive to the sample size and to neighboring values of c. Detailed explanations of the role of c are provided in Hallin and Liska [2007], while generation of the random subsamples is described in Alessi et al. [2009]. This criterion has been shown to provide a solution when BN's criterion fails, and it is not any more complex in implementation because it requires, in essence, multiple repetitions of BN.
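The idea of repeating the criterion over subsamples and over a grid of c can be sketched in Python as follows. This is only an illustration under stated assumptions, not the procedure of Alessi et al. [2009] itself: subsamples are random subsets of the series with sizes spaced between 3N/4 and N, the full time dimension is kept, the log form of the loss is used without variance scaling, and the final step of choosing the estimate from the stable region of c is left out. All names are hypothetical.

```python
import numpy as np

def _log_v(Z, r_max):
    """log V(r) for r = 0..r_max, from the singular values of a T x n panel."""
    T, n = Z.shape
    s2 = np.linalg.svd(Z, compute_uv=False) ** 2
    tail = s2.sum() - np.concatenate(([0.0], np.cumsum(s2[:r_max])))
    return np.log(tail / (n * T))

def abc_stability(X, r_max, c_grid, n_sub=5, seed=0):
    """Estimated r for every c on the full panel, and its variance across subsamples."""
    rng = np.random.default_rng(seed)
    T, N = X.shape
    Z = X - X.mean(axis=0)
    sizes = np.linspace(0.75 * N, N, n_sub).astype(int)  # last subsample is the full panel
    r_range = np.arange(r_max + 1)
    r_hat = np.empty((n_sub, len(c_grid)), dtype=int)
    for j, n_j in enumerate(sizes):
        cols = rng.choice(N, size=n_j, replace=False)    # random subset of series
        g = (n_j + T) / (n_j * T) * np.log(min(n_j, T))
        loss = _log_v(Z[:, cols], r_max)[None, :] + np.outer(c_grid, r_range) * g
        r_hat[j] = loss.argmin(axis=1)                   # one estimate per value of c
    return r_hat[-1], r_hat.var(axis=0)
```

A variance of zero over a range of c indicates that all subsamples agree on the estimate there, which is the kind of insensitivity to sample size and to neighboring values of c that the text describes.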

Connor and Korajczyk [1993] (CK)

An alternate approach developed by Connor and Korajczyk [1993] (CK) is based on the idea that, in an r-factor model, the (r + 1)th factor can have nontrivial factor loadings for some assets but only for a small proportion of them. A statistical test is developed to determine whether the (r + 1)th factor is pervasive. It proceeds by running two regressions by ordinary least squares (OLS), one with r factors and another with r + 1 factors. The adjusted squared residuals, $\sigma_{it} = \epsilon_{it}^2 / (1 - (r+1)/T - r/N)$, with $\epsilon_{it}$ as the OLS estimated residuals, are computed. A cross-sectional mean of the adjusted squared residuals, defined as $\mu_t^N = \sum_{i=1}^{N} \sigma_{it}/N$, is calculated next for both regression models. Then the even month's $\mu_t^N$ for the regression with r + 1 factors is subtracted from the odd month's $\mu_t^N$ for the r-factor model, giving a value $\Delta^N$. Under the null hypothesis that the model has r factors, $\Delta^N \pi^{-1/2}$, with $\pi$ as the covariance matrix of $\Delta^N$, is asymptotically standard normal as $N \to \infty$; hence, in practice, a t-test is carried out on the estimates $\Delta^N \pi^{-1/2}$. To establish the distribution of the idiosyncratic components, the authors assume homoskedasticity across time periods--pointed out in Bai and Ng [2002] to be undesirable--and that $e_i = (e_{i1}, e_{i2}, \ldots, e_{iT})$, $i = 1, \ldots, \infty$, is a mixing process. (6)

Onatski [2009]

Drawing upon the property that an r-factor panel of data has unbounded first r largest eigenvalues of the covariance matrix of $X_t$ and a bounded (r + 1)st eigenvalue, Onatski [2009] developed a statistical test for $H_0: r = r_0$ versus $H_1: r_0 < r \le r_1$, with r as the number of factors and $r_0$ and $r_1$ as the lower and upper bounds for the number of factors, which are determined by prior knowledge. Beginning from $r_0$, for each successive r, the discrete Fourier transforms (DFTs), [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] are computed at prespecified frequencies $w_j$. By El Karoui [2006], $X_t$ is asymptotically distributed as Tracy--Widom. The test statistic is [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] whereby $\gamma_i$ is the i-th largest eigenvalue of the covariance matrix of $X_t$. R is essentially a measure of the curvature at the would-be breakpoint of the frequency-domain scree plot postulated by the alternative hypothesis that the model has more than $r_0$ factors but no more than $r_1$ factors. Critical values for the test are provided in Onatski [2009] for up to r = 18 factors. Similar to the case of CK, imposing a valid distribution requires fairly strong assumptions, such as having idiosyncratic components that follow a Gaussian distribution.



Before choosing one method over the others, we perform a Monte Carlo test to evaluate their relative performance on simulated data with various qualities. The experimental design employs seven data-generating processes (DGPs) that differ in their relationship between elements of the idiosyncratic components of the following model:

$F_{tj}$ and $\lambda_{ij}$ are normally distributed with zero mean and unit variance. This is similar to the DGPs used in Alessi et al. [2009].

1. Homoskedastic idiosyncratic component, same variance for the common and idiosyncratic component: $e_{it} \sim N(0,1)$ and $r = \theta$.

2. Heteroskedastic idiosyncratic component, same variance for the common and idiosyncratic component:

3. Homoskedastic idiosyncratic component, common component has a larger variance than the idiosyncratic component: $e_{it} \sim N(0,1)$ and $r = 2\theta$.

4. Homoskedastic idiosyncratic component, common component has a smaller variance than the idiosyncratic component: $e_{it} \sim N(0,1)$ and $r = \theta/2$.

5. Small cross-section correlation across idiosyncratic parts, same variance for the common and idiosyncratic component: $e_{it} = v_{it} + \sum_{h=-H, h \neq 0}^{H} \beta v_{i-h,t}$, $v_{it} \sim N(0,1)$, and $r = \theta$.

6. Serial correlation across idiosyncratic parts, common component has a smaller variance than the idiosyncratic component: $e_{it} = \rho e_{i,t-1} + v_{it}$, $v_{it} \sim N(0,1)$, $r = \theta$, and $r < \theta/(1-\rho^2)$.

7. Serial and small cross-section correlation across idiosyncratic parts, common component has a larger variance than the idiosyncratic component: $e_{it} = \rho e_{i,t-1} + v_{it} + \sum_{h=-H, h \neq 0}^{H} \beta v_{i-h,t}$, $v_{it} \sim N(0,1)$, $r = \theta$, and $r < \theta/(1-\rho^2)$.

For all seven DGPs, we test for the pairs of cross-section and time dimensions (N,T) = {(70,70), (100,120), (150,500)}. The true number of factors, r, is chosen to be 1, 3, 5, 8, 10, and 15, all of which are consistent with the requirement r < min{N,T}. The corresponding $r_{\max}$ for BN and ABC, and the upper bound on CK's and Onatski's tests, is $r_{\max}$ = 8 when r = 1, 3, 5; $r_{\max}$ = 15 when r = 8, 10; and $r_{\max}$ = 20 when r = 15. CK's and Onatski's tests always begin with a lower bound of 1, reflecting that in many financial datasets it is unlikely to have prior knowledge beyond the belief that there should be at least one factor, given that the data indeed have a factor structure. The correlation parameters of the idiosyncratic components are $\rho$ = 0.5 and $\beta$ = 0.2, while H = max{N/20, 10}. Five hundred Monte Carlo replications are performed for each instance. Additionally, to test ABC's criterion, the parameters that determine the random subsamples are $n_j$ = 3/4 N (see Alessi et al. [2009]) and $c_{\max}$ = 13 with step size 0.01.
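A panel with the idiosyncratic structures listed above could be simulated along the following lines. The sketch is hedged in several ways: the model equation is not reproduced in the text, so the form $X_{it} = \sum_j \lambda_{ij} F_{tj} + \sqrt{\theta}\, e_{it}$ is an assumption, as are the function name, the T x N layout, and the initialization of the serially correlated errors at their first innovation. The heteroskedastic case (DGP 2) is not covered because its specification is missing.

```python
import numpy as np

def simulate_panel(N, T, r, theta, rho=0.0, beta=0.0, H=0, seed=0):
    """Simulate a T x N panel with r factors (assumed model, see lead-in).

    rho > 0 adds serial correlation and beta > 0 with H > 0 adds
    cross-section correlation to the idiosyncratic component.
    """
    rng = np.random.default_rng(seed)
    F = rng.standard_normal((T, r))             # factors
    L = rng.standard_normal((N, r))             # loadings
    v = rng.standard_normal((T, N + 2 * H))     # innovations, padded by H on each side
    u = v[:, H:H + N].copy()
    for h in range(1, H + 1):
        # add beta * v_{i-h,t} and beta * v_{i+h,t}
        u += beta * (v[:, H - h:H - h + N] + v[:, H + h:H + h + N])
    e = np.empty_like(u)
    e[0] = u[0]                                 # assumed initial condition
    for t in range(1, T):
        e[t] = rho * e[t - 1] + u[t]
    return F @ L.T + np.sqrt(theta) * e

# DGP 1 style panel: homoskedastic errors, r = theta
X1 = simulate_panel(N=70, T=70, r=3, theta=3.0)
# DGP 7 style panel: rho = 0.5, beta = 0.2, H = max(N/20, 10)
X7 = simulate_panel(N=70, T=70, r=3, theta=3.0, rho=0.5, beta=0.2, H=10)
```

Repeating such draws with different seeds and applying each estimator to every draw would give the kind of Monte Carlo comparison described in the text.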

BN's criterion has perfect performance for DGPs 1 to 4, correctly identifying the number of factors. When the DGP demonstrates cross-section or serial correlation across the idiosyncratic components, however, such as in DGPs 5 to 7, the criterion slightly overestimates the number of factors. There is no obvious effect of dimensions (i.e., N and T) on the results, which is consistent with BN's claim that their criterion yields precise estimates for min{N,T} > 40. Although ABC does not display perfect performance as does BN for DGPs 1 to 4, as it is generally plagued by mild overestimation, its performance is more accurate for DGP 5. This result is similar to ABC's own Monte Carlo study. Even though BN has stellar performance in most cases, adoption of ABC is justified because most financial portfolio time series demonstrate cross-section and serial correlation.

The performance of Onatski's criterion pales in comparison with the rest of the criteria in almost all cases: it estimates the number of factors to be 1 close to 60% of the time, 2 about 30% of the time, and 3 about 10% of the time, regardless of the true number of factors. This outcome could result from having the lower bound of the test always set at one, a choice that reflects the fact that when the test is implemented on actual data, no prior knowledge is available to determine the lower bound. An upper bound, however, can be set because it must be less than min{N,T}, and methods such as ABC are developed to be less sensitive to the upper bound; hence a larger upper bound can always be selected.

CK's test has a similar tendency to underestimate the true number of factors, estimating it as 1 close to or more than 50% of the time when r [greater than or equal to] 5 and the time dimension is small. However, for the sample with N = 150, T = 500, i.e., large cross-section and time dimensions, the test performs reasonably well for all DGPs except DGPs 2 and 7, correctly identifying the number of factors at least 50% of the time for DGPs 1-4 and 6, and overestimating the number of factors by 1 for DGP 5. In the case of DGP 2, the number of estimated factors is 1 more than 80% of the time, regardless of the actual number of factors and of the time and cross-section dimensions.

In general, the Monte Carlo study substantiates BN's criterion as superior not only in accuracy of estimates but also in ease of implementation. (7) When BN's criterion does not perform well, either because of cross-sectional or serial dependence or because of difficulty in choosing $r_{\max}$, ABC's criterion should be used. CK's test may perform well when the time and cross-section dimensions are large.


Alessi, L., M. Barigozzi, and M. Capasso. "A Robust Criterion for Determining the Number of Factors in Approximate Factor Models." Technical report, 2009.

Audrino, F., G. Barone-Adesi, and A. Mira. "The Stability of Factor Models of Interest Rates." Journal of Financial Econometrics, Vol. 3, No. 3 (2005), pp. 422-441.

Bai, J., and S. Ng. "Determining the Number of Factors in Approximate Factor Models." Econometrica, Vol. 70, No. 1 (2002), pp. 191-221.

---. "Large Dimensional Factor Analysis." Foundations and Trends in Econometrics, Vol. 3, No. 2 (2008), pp. 89-163.

Bernanke, B., J. Boivin, and P.S. Eliasz. "Measuring the Effects of Monetary Policy: A Factor-Augmented Vector Autoregressive (FAVAR) Approach." Quarterly Journal of Economics, Vol. 120, No. 1 (2005), pp. 387-422.

Billio, M., M. Getmansky, A.W. Lo, and L. Pelizzon. "Econometric Measures of Connectedness and Systemic Risk in the Finance and Insurance Sectors." Journal of Financial Economics, Vol. 104, No. 3 (2012), pp. 535-559.

Bliss, R.R. "Movements in the Term Structure of Interest Rates." Economic Review, 4Q (1997), pp. 16-33.

Chamberlain, G., and M. Rothschild. "Arbitrage, Factor Structure, and Mean--Variance Analysis on Large Asset Markets." Econometrica, Vol. 51, No. 5 (1983), pp. 1281-1304.

Chantziara, T., and G.S. Skiadopoulos. "Can the Dynamics of the Term Structure of Petroleum Futures Be Forecasted? Evidence from Major Markets." Energy Economics, Vol. 30, No. 3 (2008), pp. 962-985.

Connor, G., and R.A. Korajczyk. "A Test for the Number of Factors in an Approximate Factor Model." Journal of Finance, Vol. 48, No. 4 (1993), pp. 1263-1291.

Dai, Q., and K.J. Singleton. "Specification Analysis of Affine Term Structure Models." Journal of Finance, Vol. 55, No. 5 (2000), pp. 1943-1979.

Daskalaki, C., A. Kostakis, and G. Skiadopoulos. "Are There Common Factors in Commodity Futures Returns?" Working paper, 2013. Available at

Daskalaki, C., and G. Skiadopoulos. "Should Investors Include Commodities in Their Portfolios After All? New Evidence." Journal of Banking and Finance, Vol. 35, No. 10 (2011), pp. 2606-2626.

Donald, S.G. "Inference Concerning the Number of Factors in a Multivariate Nonparametric Relationship." Econometrica, Vol. 65, No. 1 (1997), pp. 103-132.

El Karoui, N. "Tracy--Widom Limit for the Largest Eigenvalue of a Large Class of Complex Wishart Matrices." Annals of Probability, Vol. 35, No. 2 (2006), pp. 663-714.

Favero, C.A., M. Marcellino, and F. Neglia. "Principal Components at Work: The Empirical Analysis of Monetary Policy with Large Data Sets." Journal of Applied Econometrics, Vol. 20, No. 5 (2005), pp. 603-620.

Forni, M., D. Giannone, M. Lippi, and L. Reichlin. "Opening the Black Box: Structural Factor Models with Large Cross-Sections." Working Paper Series 712, European Central Bank, 2007.

Forni, M., M. Lippi, and L. Reichlin. "Opening the Black Box: Structural Factor Models versus Structural VARs." CEPR Discussion Paper 4133, Centre for Economic Policy Research, 2003.

Forni, M., and L. Reichlin. "Let's Get Real: A Factor Analytical Approach to Disaggregated Business Cycle Dynamics." ULB Institutional Repository 2013/10147, Universite Libre de Bruxelles, 1998.

Goff, J., ed. "Economic Letter: What Makes the Yield Curve Move?" No. 2003-15, Federal Reserve Bank of San Francisco, 2003.

Gorton, G., and K.G. Rouwenhorst. "Facts and Fantasies about Commodity Futures." NBER Working Papers 10595, National Bureau of Economic Research, Inc., 2004.

Hallin, M., and R. Liska. "Determining the Number of Factors in the General Dynamic Factor Model." Journal of the American Statistical Association, 102 (2007), pp. 603-617.

Jolliffe, I. Principal Component Analysis, 2nd ed. New York: Springer, 2002.

Kapetanios, G. "A Testing Procedure for Determining the Number of Factors in Approximate Factor Models with Large Datasets." Journal of Business & Economic Statistics, Vol. 28, No. 3 (2010), pp. 397-409.

Kat, H.M., and R.C.A. Oomen. "What Every Investor Should Know about Commodities, Part II: Multivariate Return Analysis." Technical Report 33, Cass Business School, 2006.

Kinlaw, W., M. Kritzman, and D. Turkington. "Toward Determining Systemic Importance." The Journal of Portfolio Management, Vol. 38, No. 4 (2012), pp. 100-111.

Kritzman, M., Y. Li, S. Page, and R. Rigobon. "Principal Components as a Measure of Systemic Risk." The Journal of Portfolio Management, Vol. 37, No. 4 (2011), pp. 112-126.

Lewbel, A. "The Rank of Demand Systems: Theory and Nonparametric Estimation." Econometrica, Vol. 59, No. 3 (1991), pp. 711-730.

Litterman, R., and J. Scheinkman. "Common Factors Affecting Bond Returns." The Journal of Fixed Income, Vol. 1, No. 1 (1991), pp. 54-61.

National Bureau of Economic Research. "U.S. Business Cycle Expansions and Contractions." 2012. Available online at

Onatski, A. "Testing Hypotheses about the Number of Factors in Large Factor Models." Econometrica, Vol. 77, No. 5 (2009), pp. 1447-1479.

---. "Determining the Number of Factors from Empirical Distribution of Eigenvalues." Review of Economics and Statistics, Vol. 92, No. 4 (2010), pp. 1004-1016.

Perignon, C., and C. Villa. "Permanent and Transitory Factors Affecting the Dynamics of the Term Structure of Interest Rates." Technical report, International Center for Financial Asset Management and Engineering, 2002.

Philip, D., C. Kao, and G. Urga. "Testing for Instability in Factor Structure of Yield Curves." Technical report, Cass Business School, 2007.

Pukthuanthong, K., and R. Roll. "Global Market Integration: An Alternative Measure and Its Application." Journal of Financial Economics, Vol. 94, No. 2 (2009), pp. 214-232.




The content of this article is the sole responsibility of the authors. It does not necessarily reflect the views of Amundi, Lombard Odier Asset Management, their staff members or clients.


The authors thank participants of the Computational and Financial Econometrics (CFE) Conference 2012 for insightful comments.

(1.) Mean--variance analysis of an N-asset portfolio requires $(N^2 + 3N)/2$ estimates.

(2.) See Jolliffe [2002] for the theory and applications of PCA.

(3.) For more details on Bai and Ng [2002] and all tests described below, please refer to the appendixes.

(4.) More details on the simulated data are provided in the appendixes.

(5.) Refer to Goff [2003] for more details and figures illustrating these factors.

(6.) Definition of a Mixing Process: Let [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] whereby $\mathcal{G}$ and $\mathcal{H}$ are $\sigma$-algebras. So $\alpha$ is the maximal difference between the joint probability of events in $\mathcal{G}$ and $\mathcal{H}$. The mixing coefficient is [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] whereby $\mathcal{F}_a^b$ is the $\sigma$-algebra generated [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]. If $\alpha(m) \to 0$ as $m \to \infty$, then $\{e_i\}_{i=1}^{\infty}$ is called strong mixing, or $\alpha$-mixing.

(7.) Links to the codes are,, and, respectively (last accessed November 28, 2012).

LING-NI BOON is a research analyst at Amundi in Paris, France.

FLORIAN IELPO is a fund manager at Lombard Odier Investment Management in Geneva, Switzerland.
COPYRIGHT 2013 American Oriental Society
No portion of this article can be reproduced without the express written permission from the copyright holder.
Copyright 2013 Gale, Cengage Learning. All rights reserved.

Article Details
Author:Boon, Ling-Ni; Ielpo, Florian
Publication:The Journal of the American Oriental Society
Article Type:Report
Geographic Code:1USA
Date:Oct 1, 2013