Intelligent model based fault detection and diagnosis for HVAC system using statistical machine learning methods.


Few energy systems are as widely used across industrial, commercial and domestic settings as HVAC systems. Moreover, HVAC systems usually consume the largest portion of energy in buildings for most sectors. It is reported (EIA, 2006) that commercial buildings account for almost 20% of the US national energy consumption, or 12% of the national contribution to annual global greenhouse gas emissions. From 15% to 30% of the energy waste in commercial buildings is due to the performance degradation, improper control strategy and malfunctions of HVAC systems (Wang et al., 2010).

Regular checks and maintenance are usually the key to limiting this waste. However, due to the high cost of reactionary maintenance, preventive or predictive maintenance practices play an important role. A cost effective strategy is the development of fault detection and diagnosis (FDD) based on a new classifier (Dehestani et al., 2011). During the past decades, the system maintenance strategy has experienced three development stages (Hyvarinen and Karki, 1996; Yoshimura and Ito, 1989): breakdown maintenance, time-based maintenance and condition-based maintenance. Presently, condition-based intelligent preventive maintenance is gaining more and more interest for HVAC systems.

Building Management & Control Systems (BMCSs) were developed to monitor and control the HVAC systems, whilst a number of FDD methods and applications were assembled by the International Energy Agency (IEA) to detect and prevent further impact from faults (Dexter and Pakanen, 2001). Generally, the FDD methods can be divided into three types (Du et al., 1996): feature-based methods (Pandit and Wu, 1983; Du et al., 1996), model-based methods (Simani et al., 2003), and a combination of both. Although these methods have been applied in a number of industrial processes with good performance (Chen and Patton, 1999), their application in HVAC systems is still largely at the research stage in laboratories, and many of the studies focus on vapor compression based systems. For example, Kim (2005) investigated the effect of four artificial faults on the performance of a variable speed vapor compression system by using a rule-based fault classification method. Tassou and Grace (2005) also presented a fault diagnosis and refrigerant leak detection method for vapor compression refrigeration systems by using a neural network and an expert system. Furthermore, several studies were also presented to deal with the faults in the air handling unit (AHU) (Dexter and Pakanen, 2001; Lee et al., 2004; Pakanen and Sundquist, 2003; Yoshida et al., 1999, 2001) and sensors (Hou et al., 2006; Wang and Xiao, 2004), respectively. At the same time, some advanced algorithms, such as transient pattern analysis (Cho et al., 2005), general regression neural networks, and a feed forward control scheme (Salsbury and Diamond, 2001), were also utilized to detect and diagnose the faults in HVAC systems. It is now well recognized that FDD is very important in ensuring the safety of HVAC systems, improving user comfort and energy efficiency, and reducing operating and maintenance costs (House and Kelly, 2000). However, efficient FDD methods for HVAC systems still remain a challenge, and commercial FDD systems have only begun to emerge in recent years (Hyvarinen and Karki, 1996).

Motivated by these facts, the Commonwealth Scientific and Industrial Research Organisation (CSIRO) is developing a novel statistical machine-learning based technique for automated fault detection & diagnosis (AFDD) in HVAC systems. Preliminary results were presented in (Wall et al., 2011) and (Guo et al., 2012), showing the performance of the statistical machine learning-based technique in detecting air-handling unit (AHU) faults based on fault data obtained from ASHRAE Project 1312-RP - Tools for Evaluating Fault Detection and Diagnostic Methods for Air-Handling Units.

This paper proposes a new approach by combining a Hidden Markov Model (HMM) based FDD method and a data fusion method. The approach also includes clustering methods and an optimization process to avoid the modeling process converging to a local minimum. When used in conjunction with statistical techniques, this approach has several advantages for data analysis. On one hand, the approach uses probabilistic models that consist of variables and probabilistic links between the variables. These links can denote the physical relationship as in the model-based approaches. On the other hand, the probabilistic links are learnt from the datasets as in the data-based approaches. Since the links are pre-set as prior knowledge, the learning process is more efficient. Hence, the approach is an ideal representation for combining prior knowledge and data. It does not need a very detailed understanding of the physical system as in model-based approaches. It also does not need huge data sets as in the black-box approaches. Compared with pure model-based or data-based approaches, it can take the strengths of both areas and reduce the weaknesses by balancing the dependency on physical models and datasets.

The paper is organized as follows. Section 2 provides an overview of the methodology, including the theoretical introduction of HMMs, and clustering methods. The experimental results for discussion are available in section 3. The conclusions of this study are drawn in section 4.


Fault Detection and Diagnosis Process for HVAC Systems

Generally speaking, HVAC systems are configured and used to control the environment of a building or room. The environmental variables controlled may, for example, include temperature, air-flow, humidity etc. The desired values/set-points of the environmental variables will depend on the intended use of the HVAC system. By way of broad example, if the HVAC system is being used in an office building, the environmental variables will be set to make the building and the rooms therein comfortable to humans. A HVAC system typically services a number of zones within a building. The system normally includes a central plant which includes a hydronic heater and hydronic chiller. A pump system, which may include dedicated heated and chilled water pumps, circulates heated and chilled water from the heater and chiller through a circuit of interconnected pipes. A valve system, which may include dedicated heated and chilled water valves, controls the flow of water into a heat exchange system (which may include dedicated heated and chilled water coils). The heated and/or chilled water circulates through the heat exchange system before being returned to the central plant where the process repeats (i.e. the water is heated or chilled and recirculated). In the heat exchange system, energy from the heated/chilled water is exchanged with air being circulated through an air distribution system.

The HVAC system also includes a sensing system which typically includes a number of sensors located throughout the system, such as temperature sensors, humidity sensors, air velocity sensors, volumetric flow sensors, pressure sensors, gas concentration sensors, position sensors, and occupancy detection sensors. The HVAC system is controlled by a control system that may be a stand alone system, or may form part of a building automation system (BAS) or building management and control system (BMCS). The control system includes a computing system which is in communication with the various components of the HVAC system. The control system controls and/or receives feedback from the various components of the HVAC system in order to regulate environmental conditions for the inhabitancy or functional purpose of the building.

In a fault detection and diagnosis (FDD) process, data from the components of the HVAC system is received. This data may, for example, include sensed data from various sensors within the system and feedback data from various components of the system. Additional data from external data sources can also be received, such as external weather data. Fault detection is processed in accordance with at least one specific fault detection model. In this paper, we implemented a machine-learning based process, the Hidden Markov Model.

In order to develop specific fault detection models, the HMMs are trained to learn patterns of either normal or faulty operation. The theoretical introduction of HMMs and the training process details are described in the following section. Once the models are trained, the correspondence with and/or deviations from the learnt patterns of operation of the system/component(s) thereof can be used to detect and/or diagnose faults. Where a model is trained on normal operation of the system/component(s), a deviation can be identified as a fault in the system or relevant component(s). Conversely, where a model is trained on faulty operation of the system/component(s), correspondence with the learnt model can indicate faulty operation of that system/component(s). Applying machine-learning based techniques to fault detection is advantageous in that such techniques do not rely on fixed rules or models to determine a fault.

The specific fault detection models may include: models for detecting generally abnormal/faulty operation of the HVAC system; models for detecting generally normal operation of the HVAC system; models for detecting generally abnormal/faulty operation of a specific component or set of components of the HVAC system; models for detecting generally normal operation of a specific component/set of components of the HVAC system; or a combination thereof.

Multiple models can be generated for one HVAC system, based on different conditions. For instance, the HVAC system can operate very differently in different seasons. The way in which the models are built (using the general fault detection processes) and/or trained will determine the nature of the specific fault detection models, and whether they detect faulty or normal operation in general or the existence of one or more specific faults.

Even for the same training dataset, using multiple models also helps to avoid overfitting and convergence to a local minimum. Hence, clustering and data-fusion algorithms are also implemented to achieve the final FDD results.

Hidden Markov Models (HMMs)

HMMs are chosen to model the HVAC system because they can infer optimal hidden states from observed sensor data, whereas many modelling techniques can only predict observations from observations. Generally speaking, a HMM is a statistical model in which the system being modelled is assumed to be a Markov process with unknown parameters. A Markov process is a mathematical model for the random evolution of a memoryless system. That is, the likelihood of a given future state at any given moment depends only on its present state and not on any past states.
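As a compact restatement of the memoryless property, the following expression may help. It is a standard way of writing the first-order Markov assumption rather than a formula taken from the paper, and it borrows the symbol $q_t$ (the state at time $t$) from the definitions that follow.

```latex
P\{q_{t+1} = j \mid q_t = i,\; q_{t-1},\; \ldots,\; q_0\} \;=\; P\{q_{t+1} = j \mid q_t = i\}
```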

The most common HMM structure is a finite set of states, each of which is associated with a (generally multidimensional) probability distribution (Rabiner 1989). Transitions among the states are governed by a set of probabilities called transition probabilities. In a particular state an outcome or observation can be generated, according to the associated probability distribution. It is only the outcome, not the state, that is visible to an external observer and therefore states are "hidden".

To define a HMM, three basic components are needed:

1. A vector containing the prior probability of each hidden state: the initial state distribution, $\pi = \{\pi_i\}$, where $\pi_i = p\{q_0 = i\}$, for $1 \le i \le N$. Here $N$ is the number of states of the model, and $q_0$ denotes the initial state.

2. A set of state transition probabilities $A = \{a_{ij}\}$. Define

$a_{ij} = p\{q_{t+1} = j \mid q_t = i\}, \quad 1 \le i, j \le N$, (1)

where $q_t$ denotes the current state. Transition probabilities should satisfy the normal stochastic constraints, $a_{ij} \ge 0$ for $1 \le i, j \le N$, and $\sum_{j=1}^{N} a_{ij} = 1$ for $1 \le i \le N$.

3. The probability of the observation given a state, $B = \{b_j(k)\}$. Define

$b_j(k) = p\{O_t = v_k \mid q_t = j\}, \quad 1 \le j \le N, \; 1 \le k \le M$, (2)

where $v_k$ denotes the kth observation symbol, $M$ the number of observation symbols, and $O_t$ the current parameter vector. The following stochastic constraints must be satisfied: $b_j(k) \ge 0$ for $1 \le j \le N$, $1 \le k \le M$, and $\sum_{k=1}^{M} b_j(k) = 1$ for $1 \le j \le N$.
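The three components and their stochastic constraints can be sketched in a few lines of code. The snippet below is only an illustration, assuming discrete observation symbols; the function names `is_stochastic_row` and `validate_hmm` and all numeric values are hypothetical and do not come from the paper.

```python
import math


def is_stochastic_row(row, tol=1e-9):
    """True if every entry is non-negative and the entries sum to one."""
    return all(p >= 0.0 for p in row) and math.isclose(sum(row), 1.0, abs_tol=tol)


def validate_hmm(pi, A, B):
    """Check shapes and stochastic constraints of the triple (pi, A, B).

    pi: N initial state probabilities
    A:  N x N state transition probabilities a_ij
    B:  N x M observation probabilities b_j(k)
    """
    n = len(pi)
    if n == 0 or len(A) != n or len(B) != n:
        return False
    if any(len(row) != n for row in A):
        return False
    m = len(B[0])
    if m == 0 or any(len(row) != m for row in B):
        return False
    return (
        is_stochastic_row(pi)
        and all(is_stochastic_row(row) for row in A)
        and all(is_stochastic_row(row) for row in B)
    )


# Illustrative values only: N = 2 hidden states, M = 3 observation symbols.
pi = [0.6, 0.4]
A = [[0.7, 0.3],
     [0.4, 0.6]]
B = [[0.5, 0.4, 0.1],
     [0.1, 0.3, 0.6]]

print(validate_hmm(pi, A, B))
```

A check of this kind is cheap to run after every training pass, since a row that no longer sums to one usually signals a numerical problem in the re-estimation step.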

In modelling a HVAC system, transitions between the same or different hidden states can be predicted using the state transition matrix A and the state-dependent observation matrices B. For fault detection, the state transition matrix A and state-dependent observation matrices B are based on measurement of the system during normal operation. A HMM fault detection process for a HVAC system can be easily implemented. The only pre-defined parameters of the HMM are the number of observation states and the number of hidden states. Generally speaking, the observation states are what can be measured. The hidden states may not have real-world/physical meaning, and are generally selected based on experience and/or experimentation. Figure 1 shows an example of a HMM with five sensor readings as the observed states (above the dashed line), and four hidden states (below the dashed line). The arrows show the transition probabilities between hidden states.

The training process is used to find the HMM parameters that maximise the probability $P(O \mid \pi, A, B)$. This process is performed using the recursive Baum-Welch algorithm, as described in (Rabiner, 1989). In the fault detection process, new data/information (e.g. sensor measurements and other component feedback) are input, and based on this the HMM calculates the likelihood of the new measurements' fitness to the learnt HMM. If the likelihood is low, a potential fault is detected. Hence, the output of the HMM fault detection process can indicate the fit between the current measurements and the learnt model.
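To make the likelihood calculation concrete, the sketch below evaluates log P(O | pi, A, B) with the scaled forward algorithm described in Rabiner's tutorial. It assumes discrete observation symbols and an already trained model; Baum-Welch training itself is not shown. The function name `log_likelihood`, the parameter values and the two example sequences are hypothetical and chosen only for illustration.

```python
import math


def log_likelihood(obs, pi, A, B):
    """Scaled forward algorithm: return log P(obs | pi, A, B).

    obs is a non-empty list of observation symbol indices (0 .. M-1).
    """
    n = len(pi)
    # Initialisation: alpha_0(i) = pi_i * b_i(O_0)
    alpha = [pi[i] * B[i][obs[0]] for i in range(n)]
    log_prob = 0.0
    for t in range(len(obs)):
        if t > 0:
            # Induction: alpha_t(j) = (sum_i alpha_{t-1}(i) * a_ij) * b_j(O_t)
            alpha = [
                sum(alpha[i] * A[i][j] for i in range(n)) * B[j][obs[t]]
                for j in range(n)
            ]
        scale = sum(alpha)
        if scale == 0.0:
            return float("-inf")  # the sequence is impossible under the model
        log_prob += math.log(scale)          # accumulate log of scaling factors
        alpha = [a / scale for a in alpha]   # rescale to avoid underflow
    return log_prob


# Illustrative model, imagined as trained on normal operation (assumed values).
pi = [0.9, 0.1]
A = [[0.95, 0.05],
     [0.10, 0.90]]
B = [[0.80, 0.15, 0.05],
     [0.10, 0.20, 0.70]]

normal_seq = [0, 0, 1, 0, 0]     # resembles the patterns the model expects
abnormal_seq = [2, 2, 2, 2, 2]   # dominated by a symbol that is rare in training

print(log_likelihood(normal_seq, pi, A, B))
print(log_likelihood(abnormal_seq, pi, A, B))
```

With these values the abnormal sequence receives the lower log-likelihood, which is the signal the detection step relies on. Working in the log domain with per-step scaling keeps the calculation stable for long sensor sequences.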

Clustering and Data Fusion

For the same datasets, the fault detection result of the trained HMM (the likelihood) can vary from one training run to another, hence it is difficult to detect the fault based on the likelihood using a pre-defined threshold. Sometimes the fault detection results (the likelihood) can even converge to local minima, especially when the training datasets are not big enough. Figure 2 shows ten different testing results on ten different HMMs trained on the same training dataset, where the green, blue and brown lines are those that do not converge to the global minimum.

In order to increase reliability, the HMM training and detection processes can be repeated a number of times (N). K-means clustering is then applied to the N detection results (N likelihood sequences) in order to identify faults. For fault detection in HVAC systems, one does not know in advance whether there is any fault in the testing datasets. To determine the number of clusters within the datasets (how many different faults), we defined the following distances in the K-means method.
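The clustering step can be illustrated with a deliberately simplified sketch. It clusters scalar log-likelihood scores with a plain one-dimensional K-means, whereas the paper clusters N likelihood sequences; the function name `kmeans_1d`, the deterministic initialisation and the score values are all assumptions made for this example. The class labels follow the convention used in the results section, one for normal and two for fault.

```python
def kmeans_1d(values, k, iters=100):
    """Plain K-means on scalar values; returns (labels, centres)."""
    srt = sorted(values)
    # Deterministic initialisation (an assumption): spread the initial
    # centres evenly across the sorted values.
    centres = [srt[(len(srt) - 1) * c // max(k - 1, 1)] for c in range(k)]
    labels = [0] * len(values)
    for _ in range(iters):
        # Assignment step: each value goes to its nearest centre.
        labels = [min(range(k), key=lambda c: abs(v - centres[c])) for v in values]
        # Update step: each centre becomes the mean of its members.
        new_centres = []
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            new_centres.append(sum(members) / len(members) if members else centres[c])
        if new_centres == centres:
            break
        centres = new_centres
    return labels, centres


# Hypothetical per-window log-likelihood scores from a model of normal operation.
scores = [-1.0, -1.2, -0.9, -1.1, -8.0, -7.5, -8.2, -1.0]

labels, centres = kmeans_1d(scores, 2)
# The cluster with the higher mean likelihood is taken as normal operation.
normal_cluster = max(range(2), key=lambda c: centres[c])
class_ids = [1 if lab == normal_cluster else 2 for lab in labels]
print(class_ids)
```

Clustering the scores avoids fixing a likelihood threshold by hand, which is the difficulty described in the preceding paragraph.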

Firstly, assuming the cluster number is M, K-means clustering is applied to find these M clusters. The distance between any two clusters Ci and Cj is calculated as D(i, j). The radius of each cluster Ci is defined as the maximum distance between a point within the cluster Ci and the centre of the cluster. With $R_1$ and $R_2$ denoting the radii of Ci and Cj, three ratios are calculated as: $\gamma_1(i,j) = |(D(i,j) - R_2)/R_1|$; $\gamma_2(i,j) = |(D(i,j) - R_1)/R_2|$; and $\gamma_3(i,j) = D(i,j)/(R_1 + R_2)$. When $\gamma_1(i,j)$, $\gamma_2(i,j)$ and $\gamma_3(i,j)$ are all less than one, clusters Ci and Cj are merged into a single cluster, and the new centre of this cluster is calculated. The process stops when these three ratios are larger than one between any two clusters. The final cluster number minus one is the number of faults in the testing datasets, as one cluster is for the normal situation.
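The merging rule can be sketched as follows. Several details are assumptions rather than statements from the paper: D(i, j) is taken as the Euclidean distance between cluster centres, the three ratios are read as |(D - R2)/R1|, |(D - R1)/R2| and D/(R1 + R2) with R1 and R2 the radii of the two clusters, clusters of zero radius are never merged, and merging simply stops once no pair has all three ratios below one. The function names and sample points are hypothetical.

```python
import math
from itertools import combinations


def centre(points):
    """Mean of a list of equal-length coordinate tuples."""
    dim = len(points[0])
    return tuple(sum(p[k] for p in points) / len(points) for k in range(dim))


def radius(points):
    """Maximum distance from any point in the cluster to its centre."""
    c = centre(points)
    return max(math.dist(p, c) for p in points)


def should_merge(ci, cj):
    """True when all three ratios for the pair of clusters are below one."""
    d = math.dist(centre(ci), centre(cj))
    r1, r2 = radius(ci), radius(cj)
    if r1 == 0.0 or r2 == 0.0:
        return False  # assumption: degenerate clusters are left alone
    g1 = abs((d - r2) / r1)
    g2 = abs((d - r1) / r2)
    g3 = d / (r1 + r2)
    return g1 < 1.0 and g2 < 1.0 and g3 < 1.0


def merge_clusters(clusters):
    """Merge pairs repeatedly until no pair satisfies the merge test."""
    clusters = [list(c) for c in clusters]
    merged = True
    while merged:
        merged = False
        for i, j in combinations(range(len(clusters)), 2):
            if should_merge(clusters[i], clusters[j]):
                clusters[i] = clusters[i] + clusters[j]
                del clusters[j]
                merged = True
                break  # restart the pairwise scan with updated centres
    return clusters


# Hypothetical K-means output with M = 3: two overlapping clusters, one distant.
initial = [
    [(0.0,), (1.0,), (2.0,)],
    [(1.5,), (2.5,), (3.5,)],
    [(20.0,), (21.0,), (22.0,)],
]
final = merge_clusters(initial)
print(len(final), "clusters, so", len(final) - 1, "fault type(s)")
```

Starting with a generous M and merging downward means the number of fault types does not have to be known before testing.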


In this section, the results of two real-world tests will be discussed: one for fault detection and one for fault diagnosis.

Online Fault Detection Results

In this test, real time sensor readings were received online, and the learnt HMMs needed to detect the fault in real time. When the real data were read in, as shown in the figure, seven groups of pre-trained HMMs were used to calculate the likelihood, as shown in Figure 3-middle. K-means clustering is then used to provide the classID, as shown in Figure 3-bottom. The classID is defined as one for normal and two for fault. In this test, the chilled water valve in the HVAC system was set at 30% open from 11:00am to 11:30am on the day, and reset to normal operation after 11:30am. The clustering result identified this fault successfully.

Fault Diagnosis Result

In the second test, the faults from the ASHRAE 1312-RP Summer 2007 dataset (Li and Wen, 2007) were used for fault diagnosis to analyze the proposed technique. The datasets were collected from 20th August to 24th August 2007. The faults on these days are listed in Table 1. We trained a group of HMMs using the EA Damper Stuck fault data from 21st August 2007, and used it to diagnose the other datasets. As shown in Figure 4, the whole dataset from 20th August 2007 was identified as "similar" to the training dataset, which means that it is the same type of fault - EA Damper Stuck. On the other hand, the datasets from the other three days were detected as "not similar" to the training dataset, hence different class IDs were identified for the different types of fault.

Table 1. Fault Description for Part of the ASHRAE 1312-RP Summer 2007 Dataset

Fault description                                            Date
EA Damper Stuck (Fully Open)                                 8/20/2007
EA Damper Stuck (Fully Closed)                               8/21/2007
Return Fan at fixed speed (30% spd)                          8/22/2007
Return Fan complete failure                                  8/23/2007
Cooling Coil Valve Control unstable (Reduce PID PB by half)  8/24/2007


The paper presents a statistical machine-learning based AFDD approach, which can be seen as a good combination of model-based methods and data-based methods. The main approaches in the paper are based on Hidden Markov Model techniques, which encode probabilistic relationships among variables of interest. Two different tests were presented, one for real time fault detection and one for fault diagnosis. Preliminary test results on a commercial building in Australia have demonstrated that the approach performs well for AHU FDD applications. Future research will include integration of the approach into an on-line system for real time automated fault detection and diagnosis, covering a more comprehensive set of HVAC systems and fault conditions.


Chen, J., R. J. Patton. 1999. Robust Model-based Fault Diagnosis for Dynamic Systems, Kluwer Academic Publishers, Boston, 1999.

Cho, S.H., H. Yang, M. Zaheer-uddin, B. Ahn. 2005. Transient pattern analysis for fault detection and diagnosis of HVAC system. Energy Conversion and Management 46: 3103-3116.

Dehestani, D., S. Su. 2011. Online Support Vector Machine Application for Model Based Fault Detection and Isolation of HVAC System, International Journal of Machine Learning and Computing, IJMLC, 1(1):68-73.

Dexter, A., J. Pakanen. 2001. Demonstrating Automated Fault Detection and Diagnosis Methods in Real Buildings, IEA Annex 34, VTT, Espoo, Finland, 2001.

Du, R., Y. D. Chen, Y. B. Chen. 1996. Four dimensional holospectrum: a new method for analyzing force distributions, ASME Transactions: Journal of Manufacturing Engineering and Science 119.

Du, R. 2005. Monitoring and Diagnosis of Sheet Metal Stamping Processes, Condition-based Monitoring and Control for Intelligent Manufacturing, in: R. Gao, L.H. Wang (Eds.), Springer-Verlag, New York, 2005.

Guo, Y., D. Dehestani, J. Li, J. Wall, S. West, and S. Su. 2012. Intelligent Outlier Detection for HVAC System Fault Detection, the 10th International Healthy Buildings Conference, July 2012.

Hou, Z.J., Z. Lian, H. Yao, X. Yuan. 2006. Data mining based sensor fault diagnosis and validation for building air conditioning system, Energy Conversion and Management 47: 2479-2490.

House J.M., G.E. Kelly. 2000. An overview of building diagnostics, in: National Conference on Building Commissioning and Diagnostics for Commercial Buildings: Research to Practice, Gaithersburg, USA, 2000: 1-9.

Hyvarinen, J., S. Karki. 1996. Building Optimization and Fault Diagnosis, IEA Annex 25, McGraw-Hill, Finland, 1996.

Kim M. 2005. Performance investigation of a variable speed vapor compression system for fault detection and diagnosis. International Journal of Refrigeration 28: 481-488.

Lee W.Y., J. House, N. Kyong. 2004. Subsystem level fault diagnosis of a building's air-handling unit using general regression neural networks, Applied Energy 77: 153-170.

Pandit, S.M., S. M. Wu. 1983. Time Series and System Analysis with Applications, John Wiley and Sons, New York, 1983.

Pakanen, J.E., T. Sundquist. 2003. Automation-assisted fault detection of an air-handling unit: implementing the method in a real building, Energy and Buildings 35: 193-202.

Rabiner, L. R. 1989. A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition. Proceedings of the IEEE, 77(2):257-286.

Salsbury, T.I., R. Diamond. 2001. Fault detection in HVAC systems using model-based feedforward control, Energy and Buildings 33: 403-415.

Simani, S., C. Fantuzzi, R. J. Patton. 2003. Model-based Fault Diagnosis in Dynamic Systems using Identification Techniques, Springer, New York, 2003.

Tassou S.A., I. N. Grace. 2005. Fault diagnosis and refrigerant leak detection in vapour compression refrigeration systems. International Journal of Refrigeration 28: 680-688.

Wang, S.W., Q. Zhou, and F. Xiao. 2010. A system-level fault detection and diagnosis strategy for HVAC systems involving sensor faults. Energy and Buildings. 42(4): 477-490.

Wang S.W., F. Xiao. 2004. Detection and diagnosis of AHU sensor faults using principle component analysis method. Energy Conversion and Management 45: 2667-2686.

Wall, J., Y. Guo, J. Li, and S. West. 2011. A Dynamic Machine Learning-based Technique for Automated Fault Detection in HVAC Systems, ASHRAE Annual Conference, June 25-29, 2011, Montreal.

Yoshida H., S. Kumar, Y. Morita. 2001. Online fault detection and diagnosis in VAV air handling unit by RARX modeling. Energy and Buildings 33: 391-401.

Yoshida H., S. Kumar. 1999. ARX and AFMM model-based on-line real-time data base diagnosis of sudden fault in AHU of VAV system. Energy Conversion and Management 40: 1191-1206.

Yoshimura, M., N. Ito. 1989. Effective diagnosis methods for air conditioning equipment in telecommunication buildings, in: Eleventh International Telecommunications Energy Conference. Florence, Italy, 1989.

Ying Guo, Ph.D.

Josh Wall, Ph.D.


Jiaming Li, Ph.D.

Sam West

Dr Ying Guo and Dr Jiaming Li are research scientists at the CSIRO ICT Centre, Sydney NSW Australia. Dr Josh Wall is a research project leader and Sam West is a research engineer at the CSIRO Division of Energy Technology, Newcastle NSW Australia.
COPYRIGHT 2013 American Society of Heating, Refrigerating, and Air-Conditioning Engineers, Inc. (ASHRAE)
No portion of this article can be reproduced without the express written permission from the copyright holder.

Article Details
Title Annotation:DA-13-C018
Author:Guo, Ying; Wall, Josh; Li, Jiaming; West, Sam
Publication:ASHRAE Transactions
Article Type:Report
Geographic Code:8AUST
Date:Jan 1, 2013