
Comparison of short-term weather forecasting models for model predictive control.


Model predictive control (MPC) of commercial buildings is highly appealing as it holds the promise of energy and cost savings through intelligent building operation. Progress toward the ultimate goal of real-time optimal control of buildings is made through a systematic inquiry of each process component. Although a wealth of research has been conducted in building physics, systems, and component modeling, there remains a great opportunity for the proper integration and control of each of these elements from a supervisory control perspective. Among the obstacles is the accuracy with which the stochastic local weather processes can be predicted. It is necessary to investigate suitable forecasting models to uncover the complexity required for short-term prediction of local weather, while observing the applicability within an MPC framework. This paper strives toward this goal through a systematic evaluation of various climates, weather variables, and forecasting models.

Time series prediction of local weather is crucial for many aspects of energy conservation, economic operation, and improved thermal comfort in commercial buildings. In particular, significant motivation exists for HVAC supervisory control applications (e.g., building thermal mass control to respond to utility pricing signals, increased free cooling through superior economizer control, and thermal load prediction for uniform chiller loading and improved part-load performance). The desire is to optimize operation based on a short-term prediction of weather and building utilization to yield energy and cost savings through the minimization of an objective function over the prediction horizon. Identifying an MPC strategy and determining the required forecasting model complexity will support the example applications above as well as additional ones. Taking a generalized data-driven perspective over physical modeling, the relevance of this study ranges from utilization within embedded controllers to high-level setpoint adjustment within building automation systems; the availability of accurate dynamic building energy simulation programs further promotes future application of MPC in commercial buildings.

In terms of the weather forecasting task investigated in this study, it is well known that government and private institutions alike provide forecasting services involving complex meteorological models and supercomputer computation. The arguments against relying on these services rest on five main concerns: availability of all variables of interest, service interruption, service availability, system integration, and the data-driven perspective. First, the presented models can be applied to predicting local solar data such as global horizontal and direct normal insolation, which is not commonly offered by commercial service providers. Second, service interruption could seriously negate the benefits of model predictive control. For example, with electrical demand charges typically being levied over monthly billing periods, and ratchet clauses extending this charge to future billing periods, any failure of a demand-limiting strategy caused by a forecasting malfunction could have large cost implications. Third, the availability of hourly (or subhourly) weather forecasts is not pervasive. That is, forecast updates required in real time, necessary for error minimization, are not available through most forecasting services. Error can additionally be introduced through localized deviations (e.g., the grid point for which the forecast was generated does not accurately represent local climatic observations). Fourth, system integration concerns stem from the authors' belief that the smart building benefits from an on-site weather station for appropriate model predictive supervisory control. Fifth, a practical, data-driven (on-site) perspective is favored over devising complex (physical) probability models of, say, solar irradiation and/or cloud cover.
Therefore, a range of weather forecasting models for MPC have been evaluated, from simple models that only capture the shape of the weather processes to complex models that attempt to capture the stochastic processes and their underlying dynamics.


A number of relevant case studies with weather prediction aspects were reviewed. Many of the studies came from the very active research domain of electrical power forecasting used to optimally dispatch central power generation equipment, with weather forecasting typically influencing the unit commitments. Despite overlaps in forecasting strategies and algorithms, the present concerns limit the literature review to studies involving commercial buildings with weather forecasting and/or prediction features to corroborate the unique investigation presented herein. In both domains, a significant fraction of researchers are convinced that nonlinear forecasting models based on neural networks (NNs) provide superior forecasting performance. Nonetheless, many have found success using traditional time series analysis. For a detailed presentation of time series analysis and NN, the reader is referred to Box et al. (1994) and Bishop (1995).

Time Series Analysis Models

In forecasting hourly cooling loads, MacArthur et al. (1989) used an autoregressive moving average (ARMA) model. A recursive least squares (RLS) algorithm was used for on-line model parameter adjustment with an exponential forgetting factor to ensure a maximum covariance between the predicted and actual loads. The quality of the load forecast was innately tied to ambient weather prediction, because the method used historical data and external forecasts of maximum and minimum values for rescaling the profile.

Seem and Braun (1991) developed a model for electrical load forecasting by combining an autoregressive model for the stochastic part of the prediction task and an adaptive look-up table for the deterministic part. Using an exponentially weighted moving average (EWMA), hourly electrical power demand was determined using previous forecasts and measured values. The deterministic look-up table is often referred to as a cerebellar model articulation controller. Its performance showed improvement by introducing electrical load profile temperature dependence, with daily peak values modified according to the maximal daily temperature forecast as provided by the National Weather Service (NWS).

Chen and Athienitis (1996) discuss an optimal predictive control methodology for building heating systems in real-time dynamic operation. The system consisted of three loops: a conventional feedback control loop with predictive regulator, a parameter estimator using an RLS technique, and a setpoint optimizer. The weather prediction made use of both historical records and local weather forecasts. Simple rules were devised to modify historic shape factors based on external weather service forecasts.

Henze et al. (2004) investigated the impact of forecasting accuracy on predictive optimal control of active and passive building thermal storage inventory. The short-term weather forecasting models included bin, unbiased random walk, and seasonal integrated ARMA predictors. It was shown that model complexity does not imply accuracy. This paper builds on these results in the context of the MPC framework.

Neural Network Models

Ferrano and Wong (1990) used a feed-forward NN software package to predict the next day's cumulative cooling load by mapping hourly (24 input units) ambient dry-bulb temperatures, with results subsequently used in a real-time expert system. In order to allow for adequate generalization and avoid memorization, it was determined that training sets should contain a difference of at least 3% in the temperature patterns. This effectively reduced training time and reduced the prediction error from a maximum of 12% to only 4% for the validation sets.

Kreider and Wang (1992) used a feed-forward NN with nine input units in the prediction of hourly electrical consumption integrated from 15 min power values. The network was capable of responding to unusual weather phenomena (e.g., during a noncooling month (December) the weather was unusually hot for two days and the cooling plant had to come on during that time period). The network picked up this unusual condition and predicted the required power consumption for this period. Conventional methods of regression missed this event. It was shown that the quality of the forecast improves when the network is trained on data of the same season for which the prediction is made (i.e., predicting energy consumption for a noncooling month such as December should be done with a network that is trained on November data rather than data from a typical cooling month such as July).

Gibson and Kraft (1993) used a recurrent NN for electric load prediction in conjunction with a thermal energy storage system. The network was trained using electric and cooling load data from an office building monitored over a cooling season. The ability to generalize was noted by the network's ability to achieve similar prediction errors for atypical days as compared to normal days when supplied with detailed occupancy load information. Using a recurrent network architecture allowed the network to not only pick up the steady-state physical processes, but also the temporal relationships between system states (i.e., dynamic aspects).

Dodier and Henze (2004) used NNs as a general nonlinear regression tool in predicting building energy use data as part of the Energy Prediction Shootout II contest (described below). By combining a NN with a version of Wald's test, the relevance of free parameter inputs was determined. Time-lagged input variables were found by inspecting the autocovariance function. The strategy was the most accurate predictor in the competition.

Previous Comparative Studies in Prediction

The 1993 Great Energy Predictor Shootout, detailed in Kreider and Haberl (1994), was a competition in the prediction of hourly building energy data available to worldwide data analysts and competitors alike. The results showed that connectionist approaches were used in some form by all winners (i.e., NN approaches using different architectures and learning algorithms proved to achieve superior accuracy when compared with traditional statistical methods). Kawashima et al. (1995) compared hourly thermal load prediction for the coming day by means of linear regression, ARMA, EWMA, and NN models. The NN models were confirmed to have excellent prediction accuracy and were considered by the authors to be superior methods for utilization in HVAC system control, thermal storage optimal control, and load/demand management. The Great Energy Predictor Shootout II extended the understanding of prediction methods for building applications, evaluating the prediction of hourly whole-building energy baselines after energy conservation retrofits. The effectiveness of prediction models was compared for the top entries in Haberl and Thamilseran (1996). NN models were shown to be the most accurate in nearly all cases, but unique statistical approaches were shown to compete, in terms of accuracy, with the NN models.


Overview of Short-Term Weather Prediction

To ensure the forecasting models were evaluated for a wide spectrum of geographic locations, 11 climates in the United States, Europe, and Asia were investigated, as shown in Table 1. For each city, International Weather for Energy Calculations (IWEC) data was used to train and test the models' forecasting of weather variables. Due to their prevalence as required inputs to dynamic building and component simulation models, and for practical engineering purposes, the following four weather variables were predicted: global horizontal radiation I_gh (Wh/m^2), dry-bulb temperature T_db (°C), wet-bulb temperature T_wb (°C), and relative humidity φ. Each forecasting model was used separately and seasonally to predict each of these weather variables for each city.
Table 1. Geographic Locations Considered for Short-Term Weather Prediction

United States            Europe              Asia
Atlanta, Georgia         London, England     Beijing, China
Los Angeles, California  Paris, France       Tokyo, Japan
New York, New York       Stockholm, Sweden
Omaha, Nebraska          Stuttgart, Germany
Phoenix, Arizona

The seasonal testing months for this study were specified as: March (spring), June (summer), September (fall), and December (winter). To account for realistic, continuous forecasting model operation, the previous two months of the yearly IWEC data were used as training data and the third month as unseen testing data in the appraisal of seasonal forecasting performance: the spring evaluation used January and February for training data and March for testing forecasting performance, the summer evaluation used April and May for training and June for testing, and so forth in the fall and winter seasons. To achieve the research goals through a robust evaluation, the reasoning in the selection of the training and testing strategy followed a tripartite structure: 1) forecasting models were globally and seasonally evaluated to ensure their performance was not biased by a single climate and/or season, 2) the specified testing months coincided with weather phenomena typical of the respective season, and 3) the forecasting models had to perform in a data-driven, continuous manner by training on recent (historic) weather data and operating in contiguous months--on simple systems such as embedded controllers.

The discrete prediction occurred in hourly intervals, a typical simulation timestep length. The metrics used to assess forecasting accuracy are the coefficient-of-variation (CV) and mean bias error (MBE), both in (%), as shown in the following equations:

CV = \frac{\sqrt{\frac{1}{N} \sum_i (x_i - \hat{x}_i)^2}}{\frac{1}{N} \sum_i x_i}    (1)

MBE = \frac{\frac{1}{N} \sum_i (x_i - \hat{x}_i)}{\frac{1}{N} \sum_i x_i}    (2)

where \hat{x}_i is the predicted value of the monitored dependent variable x_i, and the denominator is the mean of the dependent variable over the testing set. In all cases, the metrics are calculated for an execution horizon of dL = 6 h, successively concatenated to encompass the complete testing period. The execution of the prediction constitutes only a portion of the planning horizon of L hours because most of the moving average (MA) models allow for corrections as time progresses, and the NN models must be compared on the same prediction-task basis.
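As a concrete illustration, the two metrics reduce to a few lines of code. The sketch below is list-based and assumes paired hourly series; the function name and interface are illustrative, not from the paper:

```python
# Sketch of Equations 1 and 2: CV and MBE, both in percent.
def cv_mbe(actual, predicted):
    """Return (CV, MBE) for a series of observations and their forecasts."""
    n = len(actual)
    mean_actual = sum(actual) / n                      # denominator of both metrics
    rmse = (sum((x - xh) ** 2 for x, xh in zip(actual, predicted)) / n) ** 0.5
    bias = sum(x - xh for x, xh in zip(actual, predicted)) / n
    return 100.0 * rmse / mean_actual, 100.0 * bias / mean_actual
```

For instance, forecasts of [9, 11, 9, 11] against observations of [10, 10, 10, 10] yield CV = 10% and MBE = 0%, showing how an unbiased predictor can still exhibit dispersion error.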

The length of planning horizon L in a given prediction is primarily a function of the longest time constant of the dynamics to be controlled. That is, when performing MPC of building thermal mass, a longer planning horizon is required over which a strategy can be formulated, as compared to problems with faster dynamics, such as chilled-water plant control involving smaller time constants. Based on Henze et al. (1997), planning horizons in excess of 18 h (better, 24 h) were suggested for building thermal mass control. On the other hand, the quality of the planned strategy depends on the quality of the forecasts it rests on. This, in turn, forces the choice of a planning horizon that accounts for the fact that forecast uncertainty grows with the length of time predicted. This paper offers guidance for any MPC application within commercial buildings, and thus the choice of execution horizon dL would be a function of the specific application. A value of dL = 6 h was chosen with the thermal mass application in mind, recognizing that forecasts tend to be of sufficiently high quality for the next 6 h. However, both shorter and longer execution horizons could have been chosen for the analysis. Based on considerable testing of reasonable dL lengths, however, it is believed that the findings would not have changed significantly.

Investigated Forecasting Algorithms

A central question was whether the more computationally intensive requirements of NNs are warranted when compared to the performance of a simple and easy-to-implement time series analysis applied to the cyclical two-stage MPC process of prediction and execution. With various levels of sophistication, both methods were applied in a comparative analysis. Altogether, 14 different short-term forecasting models were proposed, and Table 2 provides the naming convention. The respective modeling details are outlined in following sections.
Table 2. Labeling of the Various Models Used in the Analysis

Model  Simple   Exponentially  Neural   External      Absolute      Relative
       Prior    Weighted       Network  Forecast      Deviation     Deviation
       Moving   Moving                  Modification  Modification  Modification
       Average  Average

 1       X
 2       X                              X
 3       X                                            X
 4       X                                                          X
 5       X                              X             X
 6       X                              X                           X
 7               X
 8               X                      X
 9               X                                    X
10               X                                                  X
11               X                      X             X
12               X                      X                           X
13                              FTDNN
14                              NARX
Simple Prior Moving Average (SPMA)

Historic observations are often drawn upon for developing characteristic profiles (forecasts) of weather variables, generically known as past-horizon predictors. The finite past horizon can be variable and may encompass the last day, week, month, or season to generate the forecasts. Equation 3 generates a characteristic profile for the next L hour planning horizon on the basis of a SPMA over the past horizon by performing uniformly weighted averages for each hour of the day:

MA_{sp,t}(d) = \frac{1}{d} \sum_{i=1}^{d} x_{t - \nabla i}    (3)

The number of days in the past horizon is indicated by d, x is the weather variable of interest, t is the specific hour of the day under investigation, and the backshift operator ∇ = 24 h is scaled by the summation index i to account for the diurnal nature of the weather variables. The investigated SPMA models used d = 60; weighting concerns and dynamic adaptation are considered below.
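A minimal sketch of Equation 3, assuming an hour-indexed history list (oldest value first); the helper name is mine:

```python
# SPMA characteristic profile: uniformly weighted hourly means over the
# last d days of history (Equation 3).
def spma_profile(history, d):
    """Return a 24-value profile from an hourly history list, oldest first."""
    assert len(history) >= 24 * d
    recent = history[-24 * d:]                 # the finite past horizon
    return [sum(recent[day * 24 + h] for day in range(d)) / d
            for h in range(24)]
```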

Planning and Executing Predictions

Due to the stochastic nature of weather, a measured weather variable datum for a particular hour will deviate from its forecast value. To allow dynamic adaptation to such an event, as the predictor moves through time, it accounts for any discrepancy at update time k*. It is assumed that the deviation will persist for a number of hours into the future. Therefore, the shape of the forecast weather variable is determined by the characteristic profile for a planning horizon L, but the forecast is anchored (described later) to the measured current value. In essence, the predictor assumes persistence of the deviation from the static forecast and adapts to measured values in real time.

With the dynamic strategy, forecasting of weather variables is accomplished using historic events in the planning horizon L, adaptation according to current measurements at the update instance k*, and then executed over dL. At the end of the execution horizon, it is likely that another deviation will exist between the measured and forecast weather variables, as shown in Figure 1, and the process moves through time. The cyclical two-stage process of policy planning and execution is highly applicable to the requirements of MPC because recent historic data and current weather knowledge are considered, but only short-term forecasts are executed (e.g., an optimal temperature setpoint in MPC).


Anchoring of the Characteristic Profile

The deviation of the measured weather variable from its forecast value at time k* is described by the following equation:

\delta x_{k^*} = x_{k^*} - MA_{sp,k^*}(d)

Modifying the SPMA based on [delta][x.sub.k*] can be accomplished in a number of ways, but the underlying assumption is that the deviation will persist for the next dL hours of the execution horizon, and thus, the prediction will be improved. Absolute and relative [delta][x.sub.k*] modifications are considered.

Absolute Deviation Modification. At the instant in time labeled k*, the actual measurement is compared with the forecast value. The calculated deviation [delta][x.sub.k*] is added to (or subtracted from) each element of the characteristic profile to produce the forecast for the next dL hours in the execution horizon. The basic elements described are combined to produce the dynamic short-term weather forecasting, according to the strategy of absolute deviation SPMA without external forecasts as seen in Equation 4:

MA_{asp,t}(d) = \frac{1}{d} \sum_{i=1}^{d} x_{t - \nabla i} + x_{k^*} - \frac{1}{d} \sum_{i=1}^{d} x_{k^* - \nabla i},  t \in [k^*, k^* + L]

or, equivalently,

MA_{asp,t}(d) = MA_{sp,t}(d) + \delta x_{k^*},  t \in [k^*, k^* + L]    (4)

where \delta x_{k^*} will be positive or negative according to the value measured at k^*.
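Equation 4 amounts to shifting the whole profile by the deviation observed at k*. A hedged sketch (the helper name and the 6 h default horizon are mine):

```python
# Absolute deviation modification (Equation 4): persist the deviation
# observed at update time k* across the execution horizon.
def absolute_anchor(profile, x_now, k_star, horizon=6):
    """Shift the 24-value hourly profile by the current deviation."""
    delta = x_now - profile[k_star % 24]       # delta_x at k*
    return [profile[t % 24] + delta for t in range(k_star, k_star + horizon)]
```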

Relative Standard Deviation Modification. The absolute vertical translation according to [delta][x.sub.k*] neglects the stochastic dispersion of weather data that has occurred in the past horizon. With the inclusion of standard deviation calculations for each weather variable and each hour of the day, it is possible to capture how much the measurement deviates from its mean value [mu], in a relative sense. For example, if the observed deviation is 3 and the standard deviation [sigma] for the current hour of the day is 4, then the SPMA for each hour in the prediction horizon L is corrected by moving each hourly average up or down the same number of standard deviations (here 3/4). Of course, if [sigma] were constant for all hours of the day, the relative deviation modification reduces to the absolute deviation.

The modification factor r_{k^*} for any given time k^* can be described by the following equation:

r_{k^*} = \frac{x_{k^*} - \mu}{\sigma_{k^*}} = \frac{x_{k^*} - MA_{sp,k^*}(d)}{\sigma_{k^*}}

The relative deviation modified SPMA model can then be calculated as

MA_{rsp,t}(d) = MA_{sp,t}(d) + r_{k^*}\,\sigma_t(d),    (5)

where \sigma_t is the standard deviation of weather variable x at time of day t.
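Equation 5 can be sketched analogously, given precomputed hourly standard deviations (names are mine). With constant σ for all hours it reduces to the absolute modification, as noted above:

```python
# Relative deviation modification (Equation 5): move each hourly mean by
# the same number of standard deviations as observed at update time k*.
def relative_anchor(profile, sigma, x_now, k_star, horizon=6):
    r = (x_now - profile[k_star % 24]) / sigma[k_star % 24]   # r at k*
    return [profile[t % 24] + r * sigma[t % 24]
            for t in range(k_star, k_star + horizon)]
```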

SPMA Including External Forecasts

This paper takes an on-site, data-driven perspective. However, the literature has shown that external forecasts provided by weather services could improve performance. Common external predictions are typically limited to the next day's extreme temperatures and dew points. Scaling the SPMA with external forecast knowledge could yield improved results when the coming day's weather is irregular as compared to recent observations. Through the use of IWEC data, the extremes in weather are assumed to be perfectly predicted for the next 24 h.

The swing or range (extreme value difference) of the forecast weather variables is described by

\delta x_{sw} = x_{max,ext} - x_{min,ext}.

Scaling of the forecast depends on the forecast swing ratio R_{sw} (i.e., the ratio of the externally forecast swing to that predicted by the SPMA). Scaling the characteristic profile according to this strategy is accomplished by multiplying each hourly value by

R_{sw} = \frac{\delta x_{sw,ext}}{\delta x_{sw,sp}},

where \delta x_{sw,sp} is the swing of the SPMA characteristic profile over the coming day.
Even with the swing modification, the forecast weather variable for the next hour may deviate from the measured data. Again, an adjustment for this deviation is accomplished by anchoring to the measured value, described by \delta x_{k^*}; because the scaling distorts the profile, \delta x_{k^*} is calculated after the R_{sw} scaling is applied.

When scaling based on external forecasts is applied to the SPMA, the absolute deviation model becomes

MA_{asp,e,t}(d) = R_{sw} MA_{sp,t}(d) + \delta x_{k^*},  t \in [k^*, k^* + L],    (6)

and the relative deviation model becomes:

MA_{rsp,e,t}(d) = R_{sw} MA_{sp,t}(d) + r_{k^*}\,\sigma_t(d),  t \in [k^*, k^* + L].    (7)

At time k*, anchoring can proceed using either the absolute or relative modifications. The distortion of the characteristic curve, both in terms of the dilation based on external swing forecasts and the absolute (or relative) translation according to \delta x_{k^*}, allows response to recent weather observations and can improve forecast accuracy.
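The dilation and translation can be sketched together. The example below scales by the ratio of the externally forecast swing to the profile's own swing and then applies absolute anchoring; the names, and taking the profile swing as the difference of its extremes, are my assumptions:

```python
# External-forecast swing scaling followed by absolute anchoring
# (Equation 6): dilate the profile by R_sw, then translate by delta_x.
def swing_scaled_anchor(profile, x_max_ext, x_min_ext, x_now, k_star, horizon=6):
    r_sw = (x_max_ext - x_min_ext) / (max(profile) - min(profile))
    scaled = [r_sw * p for p in profile]       # dilated characteristic profile
    delta = x_now - scaled[k_star % 24]        # anchoring after scaling
    return [scaled[t % 24] + delta for t in range(k_star, k_star + horizon)]
```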

Exponentially Weighted Moving Average

Accurate forecasts may be more heavily influenced by recent observations. Seasonality reinforces this concept, and the chaotic, nonstationary nature of weather may further emphasize it. The SPMA relied on a finite past horizon of two months for developing characteristic profiles. In an exponentially weighted moving average (EWMA) with a theoretically infinite past horizon, more recent observations have a greater impact on the characteristic profile. As the weighting tends toward zero, older observations have a diminishing influence, and at some point the infinite past horizon becomes finite for all practical purposes.

The weighting scheme is a simple, exponentially (geometrically) decreasing weight applied to each observation according to its temporal distance from t. The discount factor acts to diminish the influence of older observations, and weights \alpha(1-\alpha)^i are successively applied according to the forecast model equation:

MA_{ew,t} = \sum_{i=0}^{\infty} \alpha (1-\alpha)^i x_{t - \nabla i},  0 < \alpha \le 1    (8)

The sum of all weights must be unity, that is

\sum_{i=0}^{\infty} \alpha (1-\alpha)^i = 1.

The choice of \alpha is influenced by the system under consideration so as to dampen older observations in a suitable fashion. For the models considered in this analysis, the \alpha value was optimized during the training phase via an exhaustive search from \alpha = 0.05 to \alpha = 0.95, in steps of \Delta\alpha = 0.05, to find the minimum CV value.
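Equation 8 has a convenient recursive form (one update per day for each hour of the day), over which the exhaustive α search is a simple grid scan. A sketch assuming a 24·D-hour history, oldest day first; the names and the `score` interface are mine:

```python
# EWMA characteristic profile (Equation 8) via its recursive update,
# seeded with the oldest day so newer days dominate as alpha grows.
def ewma_profile(history, alpha):
    profile = list(history[:24])
    for day in range(1, len(history) // 24):
        for h in range(24):
            profile[h] = alpha * history[day * 24 + h] + (1 - alpha) * profile[h]
    return profile

def best_alpha(train, validation, score):
    """Grid search alpha in {0.05, ..., 0.95} minimizing a scoring callable."""
    grid = [round(0.05 * i, 2) for i in range(1, 20)]
    return min(grid, key=lambda a: score(ewma_profile(train, a), validation))
```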

EWMA, Including Modifications

The modifications applied to the SPMA, including absolute/relative anchoring and scaling based upon external forecasts, were also investigated for EWMA models. The absolute deviation model can be stated as

MA_{aew,t}(d) = MA_{ew,t}(d) + \delta x_{k^*},    (9)

and the relative deviation model as

MA_{rew,t}(d) = MA_{ew,t}(d) + r_{k^*}\,\sigma_t(d).    (10)

Similarly, the expressions utilizing external weather forecasts are for the absolute deviation model

MA_{aew,e,t}(d) = R_{sw} MA_{ew,t}(d) + \delta x_{k^*},    (11)

and for the relative deviation model

MA_{rew,e,t}(d) = R_{sw} MA_{ew,t}(d) + r_{k^*}\,\sigma_t(d).    (12)

For each of these models, t \in [k^*, k^* + L].

Neural Networks

With the ability to perform arbitrary nonlinear mappings of input-output patterns, NNs appear well suited for forecasting in data-rich but theory-poor domains, complementing the data-driven perspective of this paper. NNs set themselves apart from sequential computation by distributing the computational tasks of a problem onto many identical simple units (neurons) that are highly interconnected and can work in parallel. NNs can approximate any continuous function to a high degree of accuracy and are, therefore, well suited for time series prediction. Time series prediction can be seen as the task of finding regularities and dependencies in the data set, and NNs can be taught to emulate the underlying dynamics of the system. Due to the unpredictable nature of noise, only the deterministic part of the problem can be predicted; the network establishes the nonlinear functionality.

A network is trained on a time series by presenting it with a fixed number of previous data points x_t, x_{t-1}, ..., x_{t-n}, forming a fixed time window (or tapped-delay line) used to predict future values. The parameter n determines the dimensionality of the prediction problem. Increasing dimensionality improves performance only to a point of diminishing returns, formally described by Takens' theorem: no more than m = 2d_f + 1 measurements of a variable are necessary to correctly predict its future value, where d_f is the effective number of the system's degrees of freedom (Rand and Young 1981). However, Takens' theorem is derived for noise-free differential equations, so trial-and-error solutions are required in realistic implementations.
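The fixed time window amounts to sliding an n-wide tap over the series to build training patterns. A minimal sketch (names are mine):

```python
# Tapped-delay-line training patterns: each input row holds the previous
# n values, and the target is the value that follows them.
def tapped_delay(series, n):
    inputs, targets = [], []
    for t in range(n, len(series)):
        inputs.append(series[t - n:t])
        targets.append(series[t])
    return inputs, targets
```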

NNs can be classified into dynamic and static categories. Static (feedforward) networks have no feedback elements and contain no delays; the output is calculated directly from the input through feedforward connections. In dynamic networks, the output depends not only on the current input to the network, but also on the current or previous inputs, outputs, or states of the network. Moreover, dynamic networks also can be divided into two categories: those that have only feedforward connections, and those that have feedback, or recurrent, connections. Dynamic networks are generally more powerful than static networks (although somewhat more difficult to train). Because dynamic networks have memory, they can be trained to learn sequential or time-varying patterns.

Focused Time Delay Neural Network

The most straightforward dynamic network, which consists of a feedforward network with a tapped delay line at the input, is called the focused time-delay NN. This is part of a general class of dynamic networks, called focused networks, in which the dynamics appear only at the input layer of a static multilayer feedforward network. The architecture is equivalent to that of Figure 2, but devoid of the (square) exogenous input units. This network is well-suited to time-series prediction and will be thus adopted and named FTDNN.


Nonlinear Autoregressive with Exogenous Input Neural Network

The second NN investigated in this study is the nonlinear autoregressive with exogenous input (NARX) NN, shown in Figure 2, which uses a tapped-delay line for both the recurrent output as well as clock representation (sine and cosine) of the time of the day and day of the year as exogenous inputs per the recommendation of Dodier and Henze (2004). The motivation behind the choice of time lags is as follows: Past values of environmental variables are relevant to prediction of current variables, so including past values of inputs should improve the accuracy of predictions. However, it is clear that there are limits to the useful time lag. If the time lag is very long (e.g., a month), the time-lagged input will be almost completely uncorrelated with the current variable. If, on the other hand, the time lag is very short (e.g., a minute), the time-lagged input will be almost completely redundant with the current input variable, providing no new information. To choose an appropriate intermediate time lag, the autocovariance function was inspected, as seen in Equation 13, for each input variable at multiples of one hour.

C(\tau) = \frac{1}{N} \sum_t (x_t - \bar{x})(x_{t-\tau} - \bar{x})    (13)

The first zero of the autocovariance represents a time lag at which the lagged input is uncorrelated with the current value (i.e., the two points share no linear dependence), so it is the shortest time lag that brings in the most new information.
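The lag-selection rule of Equation 13 can be sketched with a direct O(N·τ) loop (an FFT-based estimator would be faster; names are mine):

```python
# Sample autocovariance (Equation 13) and the first lag at which it
# crosses zero, used to pick the tapped-delay-line length.
def autocovariance(series, tau):
    n = len(series)
    mean = sum(series) / n
    return sum((series[t] - mean) * (series[t - tau] - mean)
               for t in range(tau, n)) / n

def first_zero_lag(series, max_lag):
    for tau in range(1, max_lag + 1):
        if autocovariance(series, tau) <= 0.0:
            return tau
    return max_lag                             # no crossing found in range
```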

Optimization. For both the FTDNN and the NARX models, the tapped-delay line length was bracketed around the first zero of the autocovariance function of the dry-bulb temperature T_db, 11 h, as reported by Dodier and Henze (2004). The optimal network architecture was found by exhaustively searching the range of 9 to 13 h for the lag and 20 to 30 neurons for the size of the hidden layer until a minimum CV was found in training. Limiting the optimization of network architecture to the stated ranges balanced the extremely long training times, the large scope of the study, and practical implementation. It should be noted that the Levenberg-Marquardt training algorithm was adopted for the FTDNN architecture and Bayesian regularization for the NARX architecture. In addition, early stopping and data scaling were utilized in both NNs.


The 14 different forecasting models of Table 2 were programmed in a commercially available, technical computing environment. For each of the 11 geographic locations in Table 1, the four weather variables (global horizontal radiation, [T.sub.db], [T.sub.wb], and [phis]) were separately predicted, and the performance was described in terms of the metrics of Equations 1 and 2; the evaluation was accomplished separately for each of the four seasons. Nearly 2500 prediction tasks were conducted to ascertain the required complexity of a short-term weather forecasting model for MPC.
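Assuming the conventional definitions of the performance metrics of Equations 1 and 2 (CV as the root-mean-square error and MBE as the mean bias, each normalized by the mean of the observed series and expressed in percent), they can be computed as in the following illustrative Python sketch, which is not the original implementation:

```python
import numpy as np

def cv_rmse(measured, predicted):
    """Coefficient of variation of the RMSE, in percent of the measured mean."""
    measured = np.asarray(measured, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    rmse = np.sqrt(np.mean((measured - predicted) ** 2))
    return 100.0 * rmse / measured.mean()

def mbe(measured, predicted):
    """Mean bias error, in percent of the measured mean."""
    measured = np.asarray(measured, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return 100.0 * np.mean(measured - predicted) / measured.mean()
```

Note that an unbiased but oscillating forecast yields a nonzero CV with an MBE near zero, which is why both metrics are reported side by side.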

A general impression of each forecasting model's performance was obtained by collating the metrics for an exclusive prediction task (e.g., relative humidity under the expected circumstances, i.e., on a yearly basis). Therefore, for each forecasting model, the mean of the performance metrics was taken across all geographic locations and all seasons for the exclusive prediction task. Tables 3 and 4 present the results in percentage CV and MBE, respectively. Entries labeled NA indicate particularly poor prediction performance, exceeding 100% CV or MBE, such that the metrics lose their interpretation other than signaling inferior performance. Observing Tables 3 and 4, the magnitude of the values may appear large. However, the dL = 6 h execution of prediction was intentionally chosen as a formidable challenge to thoroughly examine the forecasting models' performance. It should be kept in mind that the values are calculated by concatenating all 6 h predictions for the defined exclusive prediction task. Furthermore, the largest metrics are associated with global horizontal radiation and [T.sub.wb], which have a diurnal nature exhibiting more sporadic behavior than [T.sub.db] or [phis].
Table 3. Forecasting Models' Mean %CV Over All Geographic Locations for
Each Weather Variable Predicted

Model   Glob. Horiz. Rad.   [T.sub.db]   [T.sub.wb]   RH

1       NA                  NA           NA           27
2       69                  NA           NA           90
3       91                  19           55           12
4       78                  19           55           12
5       57                  20           92           13
6       83                  20           95           13
7       61                  39           NA           21
8       57                  60           NA           65
9       49                  20           62           11
10      69                  21           62           12
11      45                  22           92           12
12      67                  22           94           12
13      NA                  78           NA           19
14      NA                  31           93           10

Table 4. Forecasting Models' Mean %MBE Over All Geographic Locations for
Each Weather Variable Predicted

Model   Glob. Horiz. Rad.   [T.sub.db]   [T.sub.wb]   RH

1       74                  NA           NA           22
2       35                  NA           NA           76
3       53                  13           35           8
4       35                  12           36           8
5       32                  13           59           8
6       42                  13           61           8
7       30                  29           NA           16
8       24                  45           NA           49
9       24                  13           40           7
10      28                  13           40           7
11      21                  14           62           8
12      29                  14           63           8
13      NA                  17           50           10
14      56                  22           68           7

Performance Comparison

It was necessary to analyze the performance (metric) variance in order to compare forecasting models and make recommendations for application in distinct prediction tasks. This investigation of variance is only possible within a given geographic location (i.e., a unique data source), and furthermore, it was assumed the model is used for predicting one weather variable throughout the entire year. Therefore, changes in the seasonal performance of a given model are considered a nuisance factor effect (i.e., a background effect that needs to be considered but is not of primary interest). The concern is the difference in forecasting performance among the 14 models, not the difference in an individual model's season-to-season forecasting performance, although the latter must still be considered.

As described in Hollander and Wolfe (1999), the Friedman test is a nonparametric procedure for randomized block designs. It was used to compare the performance effects of the 14 models while blocking the nuisance seasonal effects. The two-way analysis of variance (ANOVA) was initially considered, but the models' residuals were neither normally distributed nor of equal treatment (forecasting model) variance, as required by the ANOVA assumptions. Friedman's test is a rank sum procedure that relies on the comparison of parameter locations (medians), allowing determination of the models' relative forecasting performance. The test assumes the data come from a continuous population distribution (satisfied by the IWEC weather data) and that all observations are mutually independent (satisfied by programming independence). The null hypothesis states that, apart from minor nuisance block effects, the parameter location is equal for each treatment (i.e., there is no difference in forecasting model performance). The alternative hypothesis is that a significant difference exists among the treatments.

The Friedman statistic can be calculated as

Fr = [12/[mn(n + 1)]][n.summation over (j = 1)][R.sub.*j.sup.2] - 3m(n + 1), (14)

where separately within m independent blocks (seasons) the n treatments (forecasting models) are ranked. The treatment with the best performance is assigned rank 1 and the worst performance rank n; there is an adjustment for ties, but it was not a concern here. [R.sub.i,j] is the rank within the ith (i [member of] 1, 2, ..., m) block for the jth (j [member of] 1, 2, ..., n) treatment, and [R.sub.*j] is the rank sum of the jth treatment over all blocks. Within a given block, there are n! possible ranking arrangements, leading to [(n!).sup.m] ranking possibilities when considering m blocks. If the null hypothesis is valid, then each ranking arrangement is equally likely. The test statistic Fr approximately follows a chi-square distribution with n - 1 degrees of freedom. In an upper-tailed critical value test, the Fr statistic is compared with the respective [chi square] at an [alpha] level of significance when deciding whether to reject the null hypothesis; a common value of [alpha] = 0.05 was selected. Using the Friedman test at the stated level of significance, statistically significant differences in (yearly) forecasting performance were found for nearly every combination of weather variable and geographic location.
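As an illustrative sketch (pure NumPy, assuming no ties; not the authors' implementation), the Friedman statistic of Equation 14 can be computed from an m-by-n array of seasonal performance metrics:

```python
import numpy as np

def friedman_statistic(perf):
    """perf: (m blocks x n treatments) array of CV or MBE values, lower is
    better. Treatments are ranked within each block (season), and Fr of
    Equation 14 is returned for comparison against chi-square, n - 1 df."""
    perf = np.asarray(perf, dtype=float)
    m, n = perf.shape
    # rank 1 = best within each block (double argsort; assumes no ties)
    ranks = perf.argsort(axis=1).argsort(axis=1) + 1
    rank_sums = ranks.sum(axis=0)  # R_.j for each treatment j
    return 12.0 / (m * n * (n + 1)) * np.sum(rank_sums ** 2) - 3.0 * m * (n + 1)
```

For four seasonal blocks and a perfectly consistent ranking of three treatments, Fr equals 8.0, which exceeds the chi-square critical value of 5.99 (two degrees of freedom, alpha = 0.05), so the null hypothesis would be rejected.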

For those combinations found to have statistically significant differences by Friedman's test, a (post hoc) multiple comparison technique was used to determine a specific model's superiority over another. Hochberg and Tamhane (1987) recommend the following procedure as the most powerful for all pairwise comparisons:

|[[bar.R].sub.i] - [[bar.R].sub.i']| > [[Q.sub.[n,df].sup.[([alpha])]]/[square root of 2]][square root of [n(n + 1)]/[6m]], 1 [less than or equal to] i < i' [less than or equal to] n (15)

When the absolute difference of any two mean ranks [[bar.R].sub.i] and [[bar.R].sub.i'] exceeds the critical value, the corresponding models are deemed statistically different; [Q.sub.[n,df].sup.[([alpha])]] is the studentized range statistic obtained from tables or computational environments. The method relies on all pairwise differences having the same variance (i.e., being pairwise balanced), in which case (1 - [alpha]) confidence intervals for all comparisons can be calculated. The method is an extension of Tukey's honestly significant difference method.
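Because the design is balanced, Equation 15 reduces to a single critical difference for all pairs. The following sketch assumes the studentized-range quantile is supplied from tables (e.g., approximately 3.31 for n = 3 treatments at alpha = 0.05 with infinite degrees of freedom) and flags the statistically different pairs of mean ranks:

```python
import numpy as np

def critical_difference(n_treatments, n_blocks, q_alpha):
    """Critical difference of Equation 15: (Q/sqrt(2)) * sqrt(n(n+1)/(6m)).
    q_alpha is the studentized-range quantile Q(alpha; n, df) from tables."""
    n, m = n_treatments, n_blocks
    return (q_alpha / np.sqrt(2.0)) * np.sqrt(n * (n + 1) / (6.0 * m))

def significantly_different(mean_ranks, cd):
    """All index pairs (i, i') whose mean-rank difference exceeds cd."""
    mean_ranks = np.asarray(mean_ranks, dtype=float)
    n = len(mean_ranks)
    return [(i, j) for i in range(n) for j in range(i + 1, n)
            if abs(mean_ranks[i] - mean_ranks[j]) > cd]
```

With mean ranks of 1, 2, and 3 over four blocks, only the best-versus-worst pair exceeds the critical difference, mirroring how a model can be statistically superior to some competitors but not others.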

The multiple comparison procedure allowed for determination of a forecasting model's statistically significant superiority or inferiority for the yearly prediction of a specific weather variable within a given geographic location. For example, the EWMA with absolute anchoring (model 9) was found to be the best CV performer for the Atlanta dry-bulb temperature prediction and, for the same prediction task, was superior to model 2 with statistical significance. It is possible for one model to show statistically significant superiority over another yet still not have the lowest CV or MBE. Figures 3 and 4 present the frequency of statistically significant superiority or inferiority for each model for both the CV and MBE metrics. The frequencies were determined over all geographic locations. The figures complement Tables 3 and 4 and compactly summarize the results from the nearly 2500 prediction tasks.




To shed light on the results, Table 2 is used to distinguish among forecasting models, Tables 3 and 4 provide the mean CV and MBE performance, and the multiple comparison results are summarized in Figures 3 and 4. For instance, models 9 and 11 differ only in that the latter uses external (extreme) weather forecasts. Using the scaling factor [R.sub.sw] actually causes a 30% and 22% mean increase in CV and MBE, respectively, for the [T.sub.wb] prediction. The corresponding frequency-of-superiority investigation shows that the model's superiority count drops from 6 to 0 for CV and from 5 to 0 for MBE. Thus, scaling EWMA forecasts based on external wet-bulb temperature forecast extremes is generally not recommended, even with perfect prediction.

The short-term forecasting of global horizontal radiation is a difficult task, with the best predictor achieving a mean CV value of 45% and MBE of 21%. The three best predictors (models 11, 9, and 8) are all EWMA models, featuring external forecasts (11 and 8), the absolute deviation modification (11 and 9), and no anchoring (8). That an EWMA without anchoring to the current observed value fares this well comes as a surprise when compared to the other prediction variables. In Figures 3 and 4 it appears that EWMA models provide consistently superior performance. It is interesting to note that the FTDNN (13) is consistently inferior. The NNs do not seem to provide adept pattern matching for the complex, sporadic behavior of global horizontal radiation and are not recommended without exogenous inputs.

The prediction of dry-bulb ambient temperature is without a doubt the most important forecasting task. The best prediction performances, with mean CV values on the order of 20% (13% MBE), were achieved by SPMA models 4, 3, 6, and 5. The EWMA models 9, 10, 11, and 12 offered only slightly inferior performance of about 22% (14% MBE), while the complex NARX (14) NN model achieved only 31% CV and 22% MBE. Anchoring, both absolute and relative, improves prediction performance and is highly recommended for temperature prediction. The availability of an external forecast of the next day's extreme values offers insignificant performance benefits. Again the FTDNN fared very poorly.

With the increase in magnitude of the CV and MBE values, it becomes apparent that the prediction of wet-bulb temperature alone is an extremely difficult task, with none of the models achieving a mean CV of less than 50% or a mean MBE of less than 35%. The frequency of superiority is higher for the more complex models, but the performance gains do not justify the effort. In short, the use of [T.sub.wb] in determining ambient humidity levels is discouraged; the use of relative humidity [phis] is recommended instead, owing to the dependence of relative humidity on dry-bulb temperature. The prediction task is simplified when the variables assume a fairly anticipated shape, rather than being a completely stochastic pursuit; the argument extends to humidity ratio predictions.

Relative humidity emerged as the easiest prediction task, with the best model (NARX, model 14) achieving a mean CV value of 10% and MBE of 7%. However, a wide range of models provides only slightly inferior performance: EWMA models 9, 10, 11, and 12 offer acceptable performance, followed by SPMA models 4, 3, 5, and 6. The frequency of superiority reinforces these results. Neither the FTDNN nor the MA models without anchoring are recommended for the prediction of relative humidity; the inferiority is particularly high for models 2 and 8.

Example Weather Predictions. Time series plots are shown in Figures 5 through 7 to illustrate the closed-loop prediction performance over the dL = 6 h prediction horizon for increasingly stochastic weather phenomena. A nearly sinusoidal dry-bulb temperature prediction, with a mild trend, is offered as an example in Figure 5 for the location of Stuttgart, Germany, during the summer. For this case, model 4 (SPMA with relative anchoring) achieves CV = 8.6% and MBE = 5.8%, while model 14 (NARX) yields CV = 11.2% and MBE = 8.0%. The simple SPMA models perform as well as, or better than, the much more complex NARX model, because the daily profiles are close to the characteristic profile and the trend is accounted for through the anchoring process.




An example of relative humidity prediction in Phoenix, Arizona, during the summer season is shown in Figure 6 as a more complex time series. The daily maximum value changes drastically from 50% to 90% and back from 90% to 30%, and oscillatory behavior on days 2 and 3 complicates the prediction task. For this case, the best model is the NARX NN model 14, which yields CV = 14.9% and MBE = 11.0%, followed by the EWMA model with absolute anchoring (model 9) achieving CV = 19.8% and MBE = 11.0%.

A highly complex prediction task, the forecasting of wet-bulb temperature in Beijing, China, during the spring season is presented in Figure 7. The best performer is the NARX NN with a very poor performance of CV = 74.6% and MBE = 51.1%. Interestingly, the much simpler SPMA model 3 follows the NN, with metrics CV = 78.0% and MBE = 44.6%, as a close second.


Conclusions

The performance benefits of using the much more complicated FTDNN or NARX models do not appear to warrant the additional effort in predictor model development and training. SPMA models are superior in predicting dry- and wet-bulb temperatures, whereas the exponentially weighted moving average models show performance gains in predicting solar radiation and relative humidity; the two methods are recommended for the respective prediction tasks. However, for all practical purposes, the differences in performance between SPMA and EWMA models are small. This important conclusion can only be gleaned from this analysis when it is ensured that forecasts can be updated at every instance dL and anchored to the known observed value of the prediction variable. Furthermore, the use of MPC in buildings with a data-driven perspective should rely on variables that vary diurnally and thus reduce the prediction task difficulty (i.e., favor the prediction of relative humidity over wet-bulb temperature, as even complex NNs cannot establish the underlying dynamics solely from data). As such, the use of simple time series analysis within the cyclical two-stage model predictive control process of policy planning followed by execution outperforms even the most complicated NARX model.


Acknowledgments

The authors would like to express their sincere gratitude to ASHRAE for supporting this research through a grant-in-aid.


Nomenclature

C = autocovariance function

d = number of days in the past horizon or dimension

df = degrees of freedom

dL = execution (prediction) horizon

Fr = Friedman statistic

[] = global horizontal radiation, Wh/[m.sup.2]

i,j = indices

k* = update time instant

L = planning horizon or characteristic profile

MA = moving average

m,n = number of blocks, treatments

Q = studentized range statistic

R = ranking

[] = relative standard deviation modification

[R.sub.sw] = swing ratio scaling modification

[T.sub.db] = dry-bulb temperature, [degrees]C

[T.sub.wb] = wet-bulb temperature, [degrees]C

t = specific hour of the day under investigation

x = general variable or weather variable of interest

[alpha] = discount factor or level of significance

[delta] = deviation

[mu] = mean value

[sigma] = standard deviation

[tau] = time lag

[phis] = relative humidity, %

[chi square] = chi-square statistic

[nabla] = backshift operator

[bar.x] = mean value

[[^.x].sub.i] = predicted value

Subscripts, Superscripts, Etc.

aew = absolute deviation and exponentially weighted

asp = absolute deviation and simple prior

d = number of days in the past horizon or dimension

e = external

ew = exponentially weighted

i,j = indices

k* = update time instant

M,N = total number of terms

m,n = indices

rew = relative (standard) deviation and exponentially weighted

rsp = relative (standard) deviation and simple prior

sp = simple prior

sw = swing or range

t = specific hour of the day under investigation

[nabla] = backshift operator


References

Bishop, C.M. 1995. Neural Networks for Pattern Recognition. Oxford, England: Oxford University Press.

Box, G.E.P., G.M. Jenkins, and G.C. Reinsel. 1994. Time Series Analysis: Forecasting & Control. Upper Saddle River, New Jersey: Prentice Hall.

Chen, T.Y., and A.K. Athienitis. 1996. Ambient temperature and solar radiation prediction for predictive control of HVAC systems and a methodology for optimal building heating dynamic operation. ASHRAE Transactions 102(1):26-35.

Dodier, R.H., and G.P. Henze. 2004. Statistical analysis of neural networks as applied to building energy prediction. Journal of Solar Energy Engineering 126(1):592-600.

Ferrano, F.J., and K.-F.V. Wong. 1990. Prediction of thermal storage loads using a neural network. ASHRAE Transactions 96(2):723-26.

Gibson, F.J., and T.T. Kraft. 1993. Electric demand prediction using artificial neural network technology. ASHRAE Journal 35(3):60-68.

Haberl, J.S., and S. Thamilseran. 1996. Great energy predictor shootout II: Measuring retrofit savings--Overview and discussion of results. ASHRAE Transactions 102(2):419-35.

Henze, G.P., R.H. Dodier, and M. Krarti. 1997. Development of a predictive optimal controller for thermal energy storage systems. HVAC&R Research 3(3):233-64.

Henze, G.P., D.E. Kalz, C. Felsmann, and G. Knabe. 2004. Impact of forecasting accuracy on predictive optimal control of active and passive building thermal storage inventory. HVAC&R Research 10(2):153-78.

Hochberg, Y., and A.C. Tamhane. 1987. Multiple Comparison Procedures. New York: John Wiley & Sons, Inc.

Hollander, M., and D.A. Wolfe. 1999. Nonparametric Statistical Methods. New York: John Wiley & Sons, Inc.

Kawashima, M., C.E. Dorgan, and J.W. Mitchell. 1995. Hourly thermal load prediction for the next 24 hours by ARIMA, EWMA, LR, and an artificial neural network. ASHRAE Transactions 101(1):186-200.

Kreider, J.F., and J.S. Haberl. 1994. Predicting hourly building energy usage. ASHRAE Journal 36(6):72-81.

Kreider, J.F., and X.A. Wang. 1992. Improved artificial neural networks for commercial building energy use prediction. Solar Engineering: Proceedings of Annual ASME International Solar Energy Conference, Maui, HI, pp. 361-66.

MacArthur, J.W., A. Mathur, and J. Chao. 1989. On-line recursive estimation for load profile prediction. ASHRAE Transactions 95(1):621-28.

Takens, F. 1981. Detecting strange attractors in turbulence. In Dynamical Systems and Turbulence, Lecture Notes in Mathematics, eds. D.A. Rand and L.S. Young. Berlin: Springer-Verlag.

Seem, J.E., and J.E. Braun. 1991. Adaptive methods for real-time forecasting of building electrical demand. ASHRAE Transactions 97(1):710-21.

Anthony R. Florita

Student Member ASHRAE

Gregor P. Henze, PhD, PE


Anthony Florita is a doctoral student and Gregor Henze is a professor in the Department of Civil, Environmental and Architectural Engineering at the University of Colorado at Boulder, Boulder, CO.

Received August 18, 2008; accepted May 12, 2009
COPYRIGHT 2009 American Society of Heating, Refrigerating, and Air-Conditioning Engineers, Inc.
No portion of this article can be reproduced without the express written permission from the copyright holder.

Author: Florita, Anthony R.; Henze, Gregor P.
Publication: HVAC&R Research
Date: Sep 1, 2009