EVALUATING COMPETING DATA SERIES: A TELECOMMUNICATIONS APPLICATION.
Victor Glass [*] and Susanna Cahn [**]
Successful application of theory to practice requires good data. Yet, data can vary widely in quality, consistency, and even longevity of a particular data source. Measurement techniques can change, thus changing the definition of a data series, perhaps because of a business reorganization.
In part because of industry change caused by divestiture, there were several alternative sources of data on minutes of telephone demand during the time period covered by this study. A decision had to be made about which series, or what combination of series, to use for forecasting access rates.
A canonical correlations model was used to test the similarity of two competing demand series. The number of statistically significant linear combinations indicates the number of demand series that are statistically distinct. If only one linear combination is significant, the canonical correlation algorithm will produce one optimal linear combination of these series to measure demand.
The exogenous variables were the ones chosen by the FCC for its aggregate demand model. They included: trend, seasonal dummies, lagged dependent variable, and subscriber lines. The two competing demand series were the dependent variables.
Results of the canonical correlation estimation showed two significant linear combinations, which indicates that the two competing demand series were statistically distinct and did not both measure the true demand series.
Separate regressions of each demand series on the set of exogenous variables showed that one of them was predominantly explained by the lag of its own dependent variable; the other showed marked seasonality. A comparison with predivestiture data supported the seasonal pattern. As a result of this research, the latter series was considered the more realistic measure of demand.
This methodology could be applied similarly where, as was the case with divestiture in telecommunications, organizational discontinuity produces competing data series.
With the breakup of the Bell system in 1984, a single long distance telephone call became a transaction that might involve several companies: the caller's local telephone company ("local exchange carrier" or LEC), the receiver's LEC and a long distance telephone company ("Interexchange carrier" or IC). The caller pays the long distance company for these calls, but the long distance company is charged an "access rate" for using the local telephone network. Access rates were introduced by the FCC at the time of the divestiture of AT&T. In 1983, the FCC also mandated the development of the National Exchange Carrier Association (NECA) to administer access rates. An organization unknown outside the telephone industry, NECA has replaced AT&T as the focal point for setting rates and clearing revenues to local exchange carriers who are members of NECA.
NECA prepares and justifies tariffs and distributes revenues on behalf of its member companies in much the same way that AT&T did prior to divestiture. Access rates are set on a unit cost basis with costs including an FCC authorized rate of return on net investment. Rates are set on a prospective basis: costs and volumes are forecasted to set rates. Net investment is multiplied by the authorized rate of return set by the FCC and added to operating expenses to derive revenue requirements. For those costs that are sensitive to the volume of telephone calls, access rates are derived by dividing revenue requirements by volumes (e.g. minutes). One of these rates, the carrier common line rate, is set on a nationwide basis. All other access rates are set on a state-specific basis.
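The unit cost calculation described above can be sketched in a few lines. This is only an illustration, not NECA's actual procedure: the function name `access_rate` and every dollar and minute figure are hypothetical, chosen to keep the arithmetic easy to follow.

```python
def access_rate(net_investment, authorized_return, operating_expenses, forecast_minutes):
    """Per-minute rate: (return on net investment + operating expenses) / forecast minutes."""
    revenue_requirement = net_investment * authorized_return + operating_expenses
    return revenue_requirement / forecast_minutes


# Hypothetical inputs (not taken from the study).
net_investment = 10_000_000_000.0       # dollars
authorized_return = 0.12                # illustrative rate of return
operating_expenses = 5_800_000_000.0    # dollars
forecast_minutes = 200_000_000_000.0    # forecast minutes of use

rate = access_rate(net_investment, authorized_return, operating_expenses, forecast_minutes)

# If actual minutes come in 5 percent below forecast, the rate set in advance
# collects less than the revenue requirement.
actual_minutes = 0.95 * forecast_minutes
revenue_requirement = rate * forecast_minutes
shortfall = revenue_requirement - rate * actual_minutes

print(f"rate per minute: ${rate:.4f}")
print(f"revenue shortfall: ${shortfall:,.0f}")
```

Because volumes sit in the denominator, an error in the minutes forecast passes directly into the revenues collected, which is why the choice of historical minutes series matters.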
NECA is responsible for achieving the authorized rate of return for its members. The authorized rate of return is set by the FCC and is considered a target rate by the Commission. If the rate rises much above the target rate, NECA must refund some of the excess to interstate companies. If the pool does not reach the authorized level, however, the FCC does not guarantee that it will allow a rate of return increase to recover the shortfall. Achieving the authorized rates of return for the regulated pools depends in part on the accuracy of the forecast data.
These access rates generate billions of dollars of revenues for the local exchange carriers. For example, the common line rate was expected to generate approximately $7 billion of revenue in 1986. Therefore, an accurate forecast of minutes of telephone use is imperative because that forecast is used in turn to calculate the common line rate. Missing the authorized rate of return is costly to members. A 1 percent decline in the common line rate of return will lower member earnings by $25 million per month before taxes.
One of the prerequisites for an accurate forecast of total interstate minutes is an accurate total minutes historical series, as would be the case with any time series forecast. At the time of this study, there were three competing sources of data: AT&T, Tier 1 Tariff Review Plan data (TRP data), and NECA Pooling data. These data sources do not cover identical time periods; they do not always reconcile; as a result, they do not always yield the same annual historical growth rates.
AT&T Data: Predivestiture, AT&T was a complete source of message and line data. It reported message levels for MTS (message telecommunications service) and WATS (wide area telecommunications service), and it reported a count of lines leased to other common carriers (OCCs). To derive total access minutes, all three series had to be converted to access minutes by weighting them. Postdivestiture, AT&T chose not to publish its WATS and leased line series, so the weighting method could no longer be used to derive a total minutes of use estimate after 1984. In any case, weighting AT&T data was no longer an appropriate method for estimating total minutes because AT&T was no longer the sole provider of facilities for completing interstate messages. MCI and other OCCs were busily completing their own networks. Nevertheless, an estimate of AT&T's interstate access minutes could still be derived, but by a new method. AT&T was now required to publish its common line access expenses. Prior to June 1986, dividing line expenses by the common line rate yielded an estimate of AT&T's access minutes. After June 1986, the common line rate was split into two rates, and the "closed-end of WATS" was no longer charged the common line rate. This change makes the derivation of AT&T's minutes less reliable.
NECA Data: In the postdivestiture period there also appeared two other sources of use data. NECA's Pooling data are collected by NECA from all local exchange carriers (LECs) as part of its common line settlements process. Each LEC submits to NECA demand and associated expense data, then NECA allocates the net proceeds among the LECs. Two of the key data series are premium minutes of use and nonpremium revenues (from leased lines to the OCCs), which are then converted to nonpremium minutes by taking into account the charge per line and the estimated minutes flowing through a line. Besides the potential problems associated with the conversion formula, the LECs have 24 months to revise a reported figure. Therefore, although the data are comprehensive, the reported minutes are suspect.
TRP Data: The Tariff Review Plan data are collected by Bellcore (an organization spun off from Bell Labs at the time of divestiture) from Tier 1 exchange carriers. These are the largest exchange carriers in the industry; they carry approximately 92 percent of all long distance traffic on their local networks. The series collected are product oriented. For example, some of the series collected are AT&T MTS and WATS and OCC MTS-like and WATS-like services. The most obvious problem with using these series is the inadequate coverage: the smaller Tier 2 and Tier 3 companies are excluded. A less obvious but important problem is that the data series appear to have been smoothed. Marked seasonal patterns in demand that were visible in predivestiture series are missing.
Not surprisingly, there are discrepancies in the total minutes of use data series derived from the different data sources, and controversy about AT&T's share of the long distance market. An illustration is given below using 1985 data:
NECA claims that 193.5 billion minutes of use flowed through the local networks in 1985. When only Tier 1 companies are considered, the level is 181.8 billion minutes. This is 3.3 billion minutes of use below the level in the TRP data series.
Approximately two-thirds of this discrepancy (or 2.2 billion minutes) is attributed to accounting differences: NECA minutes are booked in the month in which they occurred, whereas TRP data are booked in the month in which they were billed.
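The reconciliation above is simple arithmetic, restated here as a short sketch. All figures are the 1985 values quoted in the text, in billions of minutes; the variable names are illustrative, and the implied TRP level is derived from those figures rather than reported separately.

```python
# 1985 figures quoted in the text, in billions of minutes of use.
neca_all_carriers = 193.5
neca_tier1 = 181.8
gap_to_trp = 3.3            # NECA Tier 1 level is this far below the TRP series
accounting_part = 2.2       # portion attributed to booking-month differences

# Minutes NECA reports for carriers outside Tier 1.
neca_non_tier1 = neca_all_carriers - neca_tier1

# Implied TRP level for the Tier 1 companies.
trp_tier1 = neca_tier1 + gap_to_trp

# Share of the discrepancy explained by accounting, and what remains unexplained.
accounting_share = accounting_part / gap_to_trp
unexplained = gap_to_trp - accounting_part

print(f"NECA minutes outside Tier 1: {neca_non_tier1:.1f} billion")
print(f"implied TRP Tier 1 level: {trp_tier1:.1f} billion")
print(f"accounting share of gap: {accounting_share:.0%}")
print(f"unexplained remainder: {unexplained:.1f} billion")
```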
In its Order in CC Docket 86-125, Phase 1, the Federal Communications Commission began to link statistical measurement and regulatory policy. At issue was the reasonableness of LEC forecasts and similar forecasts by interexchange carriers (IXCs). Since interstate minutes flowing through the LECs' networks should also be flowing through the IXCs' networks, there was reason to believe that the historical data on which the forecasts are based should be similar. Neither expectation was realized. The opinion widely held in the industry was that all the available series were biased measures. This research was undertaken to develop a methodology that overcomes the arbitrary selection of a data source.
When the observed series are unbiased, the optimal linear combination is to weight each inversely to its variance (Granger and Newbold, 1977: 270-271). But this approach is not relevant here because the series may be biased. When bias exists, there is no general solution to the weighting problem. Weighting would have to be based on the distributional differences among the series and the loss function selected.
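For the unbiased case, inverse-variance weighting can be sketched as follows. The sketch assumes, in addition to unbiasedness, that the measurement errors of the series are uncorrelated with one another; the function names and the numbers are hypothetical.

```python
def inverse_variance_weights(variances):
    """Weights proportional to 1/variance, scaled to sum to one."""
    inverse = [1.0 / v for v in variances]
    total = sum(inverse)
    return [w / total for w in inverse]


def combine(values, variances):
    """Weighted combination of unbiased measurements of the same quantity."""
    weights = inverse_variance_weights(variances)
    return sum(w * x for w, x in zip(weights, values))


def combined_variance(variances):
    """Variance of the combination, assuming uncorrelated errors."""
    return 1.0 / sum(1.0 / v for v in variances)


# Hypothetical example: two measurements of the same quantity,
# the second four times as precise as the first.
values = [100.0, 110.0]
variances = [4.0, 1.0]

print("weights:", inverse_variance_weights(variances))
print("combined estimate:", combine(values, variances))
print("combined variance:", combined_variance(variances))
```

The combined variance is smaller than either input variance, which is the sense in which the weighting is optimal when no bias is present.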
We used canonical correlations to test whether the observed demand series are statistically biased and, if they are not, to weight the series optimally. The maintained hypothesis is that all relationships among the observed demand series are stable over time. The canonical correlations methodology tests that assumption. The measure of demand most suitable for the forecasting model should be the one which is best explained by the exogenous variables specified. That is, if one measure is essentially unrelated to the exogenous variables given in the forecasting model, one would like to assign it a low weight in the linear combination because to do so will minimize the standard error of the forecast.
Canonical correlations is a statistical technique used to find linear combinations of two sets of variables that are most highly correlated. Normally, this is an exploratory method because the meaning of the linear combinations may be unclear. In our situation, however, the weights assigned to the dependent variables are a measure of the contribution of a dependent variable towards minimizing the model's forecast errors.
One variation of canonical correlations begins with a stochastic vector of dependent variables assumed to be normally distributed and a nonstochastic vector of exogenous variables. The first set of parameters maximizes the correlations between the vector of dependent variables and the vector of exogenous variables. The second set of parameters yields the highest correlation between the two vectors that is uncorrelated with the first fitted relationship. This is continued until the number of linear combinations estimated is equal to the minimum of either the number of dependent variables or the number of independent variables. Typically, the parameter weights assigned to the dependent variables sum to unity as do the parameter weights assigned to the exogenous variables.
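The successive linear combinations described above can be computed in several ways. The following is a minimal sketch using a QR decomposition followed by a singular value decomposition, which is one common numerical route and not necessarily the algorithm used in the study. The function name and the simulated data are hypothetical. The weights it returns are scaled so that the canonical variates have unit length, rather than summing to unity, and the exogenous matrix must not contain a constant column because both blocks are centered first.

```python
import numpy as np


def canonical_correlations(Y, X):
    """Canonical correlations between the columns of Y (n x p) and X (n x q).

    Returns the correlations in decreasing order together with weight
    matrices A (p x k) and B (q x k), where k = min(p, q).
    """
    Yc = Y - Y.mean(axis=0)          # center each dependent variable
    Xc = X - X.mean(axis=0)          # center each exogenous variable
    Qy, Ry = np.linalg.qr(Yc)        # orthonormal basis for the Y block
    Qx, Rx = np.linalg.qr(Xc)        # orthonormal basis for the X block
    U, s, Vt = np.linalg.svd(Qy.T @ Qx, full_matrices=False)
    A = np.linalg.solve(Ry, U)       # weights on the dependent variables
    B = np.linalg.solve(Rx, Vt.T)    # weights on the exogenous variables
    return np.clip(s, 0.0, 1.0), A, B


# Simulated data for illustration only: two dependent series, three regressors.
rng = np.random.default_rng(0)
n = 200
X = rng.normal(size=(n, 3))
Y = np.column_stack([
    X[:, 0] + 0.5 * rng.normal(size=n),
    X[:, 1] - X[:, 2] + 2.0 * rng.normal(size=n),
])

rho, A, B = canonical_correlations(Y, X)
print("canonical correlations:", np.round(rho, 3))
print("weights on dependent variables, first combination:", np.round(A[:, 0], 3))
```

With two dependent variables the procedure returns two linear combinations, matching the rule that the number estimated equals the smaller of the two variable counts.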
Wilks' lambda is a test statistic used to determine the number of statistically significant linear combinations by utilizing the distributional characteristics of the characteristic roots of the covariance matrix of the data. A logarithmic transformation of it is approximately distributed as a chi-square with gu degrees of freedom, where g equals the number of restrictions on the independent variables and u is the number of restrictions on the dependent variables (Morrison 1976: 222).
For this analysis, the number of statistically significant linear combinations indicates the number of interstate minutes series that are statistically distinct. If only one linear combination is significant, the canonical correlation algorithm will optimally weight the series.
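A sequential version of the Wilks' lambda test can be sketched as below. It uses Bartlett's chi-square approximation, a standard large-sample form; the text does not state which variant was applied in the study. The canonical correlations, the sample size, and the variable counts in the example are hypothetical placeholders, not the values in Table 1.

```python
import math


def bartlett_tests(canon_corrs, n_obs, p, q):
    """Sequential tests that canonical correlations k+1, ..., min(p, q) are zero.

    canon_corrs: canonical correlations in decreasing order
    n_obs:       number of observations
    p, q:        number of dependent and exogenous variables
    """
    scale = n_obs - 1 - (p + q + 1) / 2.0
    results = []
    for k in range(len(canon_corrs)):
        # Wilks' lambda for the correlations that remain after removing the first k.
        wilks = math.prod(1.0 - r * r for r in canon_corrs[k:])
        results.append({
            "removed": k,
            "wilks_lambda": wilks,
            "chi2": -scale * math.log(wilks),
            "df": (p - k) * (q - k),
        })
    return results


# Hypothetical inputs: two dependent series, six exogenous variables, 22 quarters.
for row in bartlett_tests([0.9, 0.5], n_obs=22, p=2, q=6):
    print(row)
```

Each chi-square statistic is compared with the critical value for its degrees of freedom. If the test with one combination removed is still significant, the second linear combination is significant as well, and the two dependent series are treated as statistically distinct.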
Two measures of interstate demand were used: the NECA data and the TRP data. (The AT&T data was discontinued in 1984.) For convenience, each variable was normalized (its mean was subtracted and the result was divided by its variance).
Let the true value of interstate demand, $Y^*$, be related to the observed values of interstate demand, $Y_1$ and $Y_2$, and an explanatory vector of variables, $X$, as follows:
$Y^*_t = b_1 Y_{1t} + e_{1t}$ (1)
$Y^*_t = b_2 Y_{2t} + e_{2t}$ (2)
$Y^*_t = X_t D + n_t$ (3)
where the error terms $e_{1t}$, $e_{2t}$ and $n_t$ are normal iid variables.
Substituting (1) and (2) into (3) yields:
$b_1 Y_{1t} + b_2 Y_{2t} = X_t D + n_t - e_{1t} - e_{2t}$ (4)
Defining $v_t = n_t - e_{1t} - e_{2t}$ produces a model that is now in the standard canonical correlations form (Anderson 1984: Chapter 12):
$b_1 Y_{1t} + b_2 Y_{2t} = X_t D + v_t$.
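The text does not spell out the substitution step. One possible reading, offered here as an assumption rather than as the authors' derivation, is to average (1) and (2) and set the result equal to (3):

```latex
\begin{align*}
Y^*_t &= \tfrac{1}{2}\left(b_1 Y_{1t} + b_2 Y_{2t}\right)
         + \tfrac{1}{2}\left(e_{1t} + e_{2t}\right)
  && \text{average of (1) and (2)} \\
\tfrac{1}{2} b_1 Y_{1t} + \tfrac{1}{2} b_2 Y_{2t}
      &= X_t D + n_t - \tfrac{1}{2} e_{1t} - \tfrac{1}{2} e_{2t}
  && \text{equate with (3)}
\end{align*}
```

Relabeling $\tfrac{1}{2} b_i$ as $b_i$ and $\tfrac{1}{2} e_{it}$ as $e_{it}$ gives the form of (4). Canonical weights are determined only up to scale, so this rescaling does not change which linear combinations are found.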
Two total minutes of use series were constructed: The first series, called the NECA series, is constructed from MTS messages, WATS messages, and ENFIA lines (leased from LECs by OCCs) from predivestiture AT&T monthly reports for the period 1980Q1-1983Q4, TRP data for 1984Q1 and 1984Q2, and NECA pooling for the period 1984Q3 through 1985Q2. The second series was constructed from the June 1986 version of TRP data for the period 1980Q1 through 1985Q2. The TRP data was inflated by a constant factor to convert it to total market data. The two series are graphed in Figure 1.
The exogenous variables were the ones chosen by the FCC for its aggregate demand model estimated for 1980Q1-1985Q2. (The results of their investigation are in the FCC Order Designating Issues for Investigation, Adopted: April 4, 1986.) They included: trend, seasonal dummies, lagged dependent variable, and subscriber lines.
Therefore, the specification used was:
$\log Y^*_t = b_0 + b_1\,\mathrm{trend} + b_2\,\mathrm{seasdum1} + b_3\,\mathrm{seasdum2} + b_4\,\mathrm{seasdum3} + b_5\,Y^*_{t-1} + b_6\,\mathrm{logsublines} + e_t$. (5)
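A specification of this kind can be estimated by ordinary least squares. The sketch below builds the regressors and fits the model on simulated quarterly data; it is not the FCC's code, the data are synthetic, and the helper name `build_design` is hypothetical. It also assumes that the lagged dependent variable enters in logs, which the printed equation leaves ambiguous.

```python
import numpy as np


def build_design(log_y, log_sublines):
    """Regressors for observations t = 1, ..., T-1 (the first is lost to the lag).

    Columns: constant, trend, three quarterly dummies, lagged log_y, log_sublines.
    """
    log_y = np.asarray(log_y, dtype=float)
    log_sublines = np.asarray(log_sublines, dtype=float)
    T = len(log_y)
    t = np.arange(1, T)
    quarter = t % 4                      # 0 = first quarter, ..., 3 = fourth quarter
    dummies = np.column_stack([(quarter == k).astype(float) for k in range(3)])
    X = np.column_stack([
        np.ones(T - 1), t.astype(float), dummies, log_y[:-1], log_sublines[1:],
    ])
    return X, log_y[1:]


# Synthetic quarterly data, 22 quarters to mirror 1980Q1-1985Q2 in length only.
rng = np.random.default_rng(1)
T = 22
time = np.arange(T)
season = np.array([0.03, -0.01, -0.02, 0.0])[time % 4]
log_y = 3.0 + 0.02 * time + season + 0.01 * rng.normal(size=T)
log_sublines = 4.5 + 0.005 * time + 0.002 * rng.normal(size=T)

X, y = build_design(log_y, log_sublines)
coef, *_ = np.linalg.lstsq(X, y, rcond=None)

names = ["const", "trend", "seasdum1", "seasdum2", "seasdum3", "lag_dep", "logsublines"]
for name, b in zip(names, coef):
    print(f"{name:12s} {b: .4f}")
```

With only 21 usable observations and seven coefficients, estimates from a sample of this size are imprecise, which is worth keeping in mind when reading significance levels.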
The restrictive linear functional form and the restrictive loglinear functional form were compared using the Box-Cox methodology (Judge 1985: 840). This approach was used in the interest of having interpretable functional forms, rather than using the fully flexible Box-Cox model (Judge 1985: 841) as was used by Okunade, Haryanto, and Means. Results here showed that both functional forms produce roughly the same empirical results (see Table 3).
The FCC model was estimated using TRP data; therefore, a priori, one would expect the empirical results to favor the TRP data, that is, the TRP data should be assigned a relatively large weight in the first linear combination of the dependent variables. Less obvious is whether the two series are statistically different. Both have the same general appearance although the TRP data looks smoother and has a higher growth rate toward the end of the sample period.
The results of the canonical correlation estimation are displayed in Table 1. The second set of linear combinations is significant at the 7.3 percent level. This was sufficient for this application to conclude that the two series are statistically distinct. The estimated parameters (known also as canonical variables) for the dependent variables in the first canonical correlation are 1.02406 for TRP data and -0.02438 for NECA data.
Cointegration analysis for the two series and their difference is shown in Table 4. Dickey-Fuller tests fail to reject the hypothesis of a unit root for each series as well as for the difference, indicating that the series are not cointegrated (Greene 1997: 850). This analysis again confirms the results of the canonical correlation discussed above.
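The basic Dickey-Fuller regression behind such a test can be sketched as follows. This is the simple version with a constant and no lagged differences; the text does not say which variant underlies Table 4. The function name and the simulated series are hypothetical. Applying the same function to the difference of two series gives the test on the difference mentioned above.

```python
import numpy as np


def dickey_fuller_t(y):
    """t-statistic on rho in the regression  dy_t = alpha + rho * y_{t-1} + u_t."""
    y = np.asarray(y, dtype=float)
    dy = np.diff(y)
    X = np.column_stack([np.ones(len(dy)), y[:-1]])
    beta, *_ = np.linalg.lstsq(X, dy, rcond=None)
    resid = dy - X @ beta
    sigma2 = resid @ resid / (len(dy) - X.shape[1])   # residual variance
    cov = sigma2 * np.linalg.inv(X.T @ X)             # OLS covariance matrix
    return beta[1] / np.sqrt(cov[1, 1])


# Simulated series for illustration: one with a unit root, one stationary.
rng = np.random.default_rng(2)
random_walk = np.cumsum(rng.normal(size=200))
white_noise = rng.normal(size=200)

t_walk = dickey_fuller_t(random_walk)
t_noise = dickey_fuller_t(white_noise)
print(f"random walk t-statistic: {t_walk:.2f}")
print(f"white noise t-statistic: {t_noise:.2f}")
```

The statistic is compared with Dickey-Fuller critical values rather than the usual t table; for the regression with a constant the large-sample 5 percent value is roughly -2.86, and short samples require a more negative value. A statistic above the critical value means the unit root hypothesis is not rejected.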
The complete dominance of TRP data is best understood by regressing TRP and NECA data separately on the FCC exogenous variables. Highlights of the results are displayed in Table 2. The main explanatory variable in the TRP model is the lagged value of TRP, whereas in the NECA model, seasonal dummies and subscriber lines have roughly the same statistical significance as the lagged dependent variable. The implication is that the TRP series is best explained by its own history. This is not the case for NECA data. Given that predivestiture MTS and WATS message data exhibited stable seasonal patterns, one has to be suspicious of the lack of seasonality of the TRP data. In fact, this suspicion is shared by members of the FCC. In a letter addressed to the Interstate Access Data Task Force, July 28, 1986, Albert Halprin, the Chief of the Common Carrier Bureau, assessed the TRP data as follows in his address to NYNEX: "the data submitted by many LECs have been grossly inaccurate, inconsistent, tardy, irrelevant or presented in ways that impede comprehension."
Since the two series were shown to be statistically distinct, a linear combination of these two series could not be used to represent the true underlying series. Therefore, an optimal weighting scheme was not produced. Because of this research the TRP data was discredited and a more realistic data series was developed by NECA.
Canonical correlations may be used as a means of testing the similarity of competing data series and, when the competing series can be shown to be alternate measures of the same underlying variable, optimally weighting these series. A critical step is to specify exogenous variables that have economic meaning.
This method was used at NECA to analyze two series for minutes of use (demand for telephone services), with an eye toward weighting these series to produce one measure of demand for use in setting rates. The two series were shown to be distinct, indicating that they did not both measure the true demand series.
(*.) Director of Demand Forecasting and Rate Development, National Exchange Carrier Association.
(**.) Associate Professor of Management, Lubin School of Business, Pace University.
Anderson, T. W. Introduction to Multivariate Statistical Analysis. 2nd Ed. New York: John Wiley and Sons (1984).
Dhrymes, Phoebus J. Introductory Econometrics. New York: Springer-Verlag (1978).
Granger, C. W. J., and Paul Newbold, Forecasting Economic Time Series. Orlando, FL: Academic Press, Inc. (1977).
Greene, William H. Econometric Analysis. 3rd Ed. Upper Saddle River, New Jersey: Prentice Hall (1997).
Judge, et al. The Theory and Practice of Econometrics. New York: John Wiley and Sons (1985).
Morrison, Donald F. Multivariate Statistical Methods. New York: McGraw Hill (1976).
Okunade, Albert A., Haryanto, H., and Means, Dwight B. Jr., "Testing the Unbiasedness Hypothesis of Foreign Exchange Rates and the Analysis of Transformations" Review of Quantitative Finance and Accounting, Vol. 6 (1996),
Author: Glass, Victor; Cahn, Susanna
Date: Sep 22, 2000