A Combined Markov Chain Model and Generalized Projection Nonnegative Matrix Factorization Approach for Fault Diagnosis.

1. Introduction

As industrial process becomes more and more complex, process monitoring and diagnosis techniques are gaining importance for plant safety, maintenance cost, and profit margins. Multivariate statistical process monitoring (MSPM) techniques have been widely used to build statistical models for some unmeasured variables and for establishing online monitoring schemes for industrial process [1, 2]. These models extract a small number of latent variables, which in a manner better summarize the properties contained in the original variables. Monitoring and diagnosis using these latent variables are both simpler and more powerful than using the original variables [3].

It is well known that the traditional MSPM models usually require that the process data on all variables must be complete. In practice, however, one of the challenges in applying these models is to deal with the process data sets that contain some missing observations. Sometimes, more than 50% of industrial process data would contain the missing data [4]. Since the missing data in the incomplete measurements is usually correlated with some of the available variables, the conventional MSPM methods generally eliminate the sample data in the data matrix that contain them, but doing so will leave the corresponding nature of process unknown.

It is of great importance to determine how to use the incomplete process data sets to build the normal operating model. The well-known Markov chain model (MCM), a typical stochastic process model, is one of commonly used prediction approaches. The most basic feature of MCM is "Markov property," also known as "no aftereffect." The next process variable state is predicted by the transition probability matrix obtained using MCM to predict the mobility of measurements if the original time series satisfies the conditions of the Markov chain (MC) [5,6]. The MC prediction method has been widely used in various fields. In Jeon and Lee's research, MC-based prediction routing methods are proposed to select the optimal behavior nodes [7]. In order to keep balance of electric power system, Yoder et al. use the MCM to predict short-term wind power [8].

Nonnegative matrix factorization (NMF) is a novel multivariate data analysis and dimension reduction technique that has many applications in spectroscopy, data mining, and pattern recognition [9]. NMF and its variant methods are typically applied to high-dimensional data where each element has a non-negative value, and they find a low rank approximation from the historical process data sets. Unlike the traditional MSPM methods, the NMF-based algorithms do not have any assumption about the nature of the process variables except for nonnegativity. The nonnegativity restriction lets only additions in the factorization process. This property makes NMF obtain sparse and part-based subspace representations of the original data sets [10, 11]. Therefore, NMF-based methods have potential superiority to solve the monitoring and diagnosis problem of complex industrial process.

Normally, the NMF-based algorithms require that the measurement data be nonnegative. In practice, however, the process data of industrial process may be not fulfilling this constraint. Due to the difference in unit selection, the collected data is likely to contain negative numbers. Although it is possible by adjusting the unit to make negative data satisfy the nonnegative constraints, in order to make NMF method have a wider range of applications, we want to relax the nonnegative constraints on the original data sets. In this research, we will propose a new variant of NMF to solve the above problem. It can be called generalized projection nonnegative matrix factorization (GPNMF). Then, MCM and GPNMF are combined to extract useful information from incomplete process data and to combine them with statistical process monitoring techniques.

The rest of the article is organized as follows: Section 2 introduces MCM-based prediction method. Section 3 pro poses the GPNMF method. In Section 4, the MCM-GPNMF based process monitoring method is introduced. In Section 5, a case study in a 1000 MW unit boiler process is shown and discussed. Section 6 contains the conclusions.

2. MCM-Based Prediction Method

2.1. Markov Chain Model. Markov chain model is a stochastic process which can be used to estimate the transition prob ability between the discrete states of the system, and it can be expressed by some parameters. In the first-order Markov chain model, the state at a certain moment only depends on the state of its previous moment [12]. In the further research, the second-order or even higher-order Markov chain process was proposed [12,13]. In these second-order or higher-order Markov chain processes, the state of a moment depends on two or more states before it. In this paper, we will use the second-order Markov chain model to estimate the missing values in the data set.

For a second-order Markov chain model, let [X.sub.t] be a stochastic process. The set of all possible states in the stochastic process is called state space. The state space is defined as S = {1,2, ..., M}.

For a given time series [t.sub.1] < [t.sub.2] < ... < [t.sub.n-]1 < [t.sub.n], its conditional probability distribution can be expressed as

[mathematical expression not reproducible] (1)

Markov transition probability [p.sub.i,j,k] is defined as follows:

[P.subv.i,j,k] = P {[X.sub.t+1] = k | [X.sub.t-1] = i, [X.sub.t] = j}, (2)

where j represents the current state of the moment and i and k represent the previous and next states of i, respectively

If a stochastic process contains M states, the matrix P e [R.sup.MxMxM] consisting of all transition probabilities is called the state transition matrix of the second-order Markov chain. The state transition matrix P can be expressed as follows:

[mathematical expression not reproducible] (3)

The second-order Markovian transfer matrix has the following property. For any state i, the sum of the probability of its transition to all states is 1. Consider

[m.summation over (k=1)] [P.sub.i,j,k] = 1 [for all]i,j. (4)

The maximum likelihood estimation of transfer probability [p.sub.i,j,k] for second-order Markov chains is defined as follows:

[P.sub.i,j,k] = [[eta].sub.i,j,k]/[[summation].sub.m.sub.k=1] [[eta].sub.i,j,k] (5)

where [[eta].sub.i,j,k] is the transition frequency that the state is transferred from the previous moment state i and the current time state j to the next moment state k.

The cumulative distribution function (CDF), PCDF is also used in the process of generating the time series. The CDF of the second-order Markov chain model is calculated as follows:

[mathematical expression not reproducible] (6)

The second-order Markov chain model is used to generate the process variable time series, where the initial state is completely random. Generate a random number [epsilon] between 0 and 1 by the random number generator. For the second-order Markov chain model, the current time state j and the previous time state i are known. If e satisfies [mathematical expression not reproducible] then the next time state is k.

Only a time series of state is obtained by the above steps. Next, it is necessary to transform the resulting time series into the time series of actual process variable. In addition to the fact that the first and last states do not need to be transformed, all other states need to use formula (7) to transform.

P = Pt +Y, x(Pu -P) (7)

where [P.sub.u] and Pt represent the upper and lower limits of a state, respectively. [Y.sub.i] represents a random number evenly distributed between 0 and 1. P represents the actual value of the process variable.

2.2. Markov Property Test. When using the Markov chain to predict the value of a variable, it is not necessary to consider the change of the variable in the past. If the current state of the variable is known, the state of the next moment can be predicted. The Markov property test of the original time series should be carried out before the establishment of the Markov chain model. The specific method of Markov property test is given as follows: we assume that all states of the original time series are m. The marginal probability [p.sub.j] is defined as follows:

[mathematical expression not reproducible]

where [[eta].sub.ij] stands for the frequency that the variable transitions from state i to state j.

When the amount of data is large enough, the statistic [xi] obeys the [chi square] distribution and the degree of freedom of [chi square] distribution is [(m - 1).sup.2]. The statistic [xi] can be calculated as follows:

[mathematical expression not reproducible] (9)

where stands for the probability that the variable transitions from state i to state j.

When the significance level a is given, the value of [chi square].sub.[alpha]] ([(ml).sup.2]) can be obtained by looking up the table. The Markov chain model can be used to process the variable if [x] > [chi square].sub.[alpha]] ([(ml).sup.2]) 1)2) is satisfied.

2.3. Numerical Example. In this section, the active power of a 1000 MW unit is chosen for numerical experiment. Under the influence of random noise, the power measurement is randomly fluctuating near the set point. Therefore, the active power can be regarded as a stochastic process. The normal operation data of the active power is used as experiment data vector with 500 samples. The maximum and minimum values of power in all samples are 743.86 MW and 749.94 MW, respectively. Therefore, the stochastic process is divided into 16 states. In these states, the power values 743 MW and 750 MW are defined as state 1 and state 16, respectively. The other states are evenly distributed between state 1 and state 16, and the power interval between each two states is 500 kW. The significance level is given by 0.05 in the present work. The conclusion can be easily obtained through the calculation; that is, [xi] [much greater than] [[chi square].sub.0.05] (22 5). Therefore, The Markov chain model can be used to process this time series.

Next, we will select 50 samples from the experimental data randomly and their values will be set to zero. These 50 samples represent the missing data in experiment data vector. Then, the second-order Markov chain model is used to deal with the missing data in experiment data vector. The incomplete data can be replaced by the predicted value, which is shown in Figure 1.

As seen in Figure 1, the predicted data are very close to the corresponding original data. In other words, the prediction value by second-order MCM is very effective. Therefore, this example can illustrate the feasibility of the second-order MCM algorithm.

3. Generalized Projection Nonnegative Matrix Factorization

3.1. NMF Algorithm. The mathematical formulation of NMF is expressed as follows. Given a process data matrix X = [[x.sub.1], [x.sub.2], ..., [x.sub.n]] [member of] [R.sup.mxn] (each column of X is a sample vector), where all elements are nonnegative and a natural number k < {m, n}, NMF aims to find two low rank nonnegative matrices W [member of] [R.sup.mxk] and H [member of] [R.sup.kxn] such that X [approximately equal to] WH. Here, each column of W is called basis vector. Each column of H is called an encoding and is in one-to-one correspondence with a sample vector in X [10]. In other words, each sample vector [x.sub.i] is approximated by a linear combination of the columns of W, weighted by the components of [h.sub.i] [11].

The approximate factorization X [approximately equal to] WH problem can be formulated as an optimization problem that uses a cost function to measure the quality of the approximation. Lee and Seung constructed a simple objective function which utilizes the square of the Euclidean to measure the distance between X and WH [11]. The objective function of the optimization problem can be expressed as follows:

[mathematical expression not reproducible] (10)

It is well known that the function [parallel]X - WH[[parallel].sup.2] is convex in W only or H only but is not convex in both variables mean while. Therefore, it is realistic to find local minima by solving the above optimization problem. Lee and Seung presented an iterative algorithm to obtain the local minima. The objective function [parallel]X - WH[[parallel].sup.2] is monotonically nonincreasing under the following rules [11]:

[mathematical expression not reproducible] (11)

Heretofore, the multiplicative iterative algorithm is the most classic and widely used monotone algorithm. The contradiction of the optimization algorithm between convergence speed and easy use can be better coordinated by using above iterative rules.

3.2. GPNMF Algorithm. Many improvements of NMF have been presented based on the objective function of the basic NMF algorithm by adding different regularization terms so as to increase the constraint conditions to NMF from different perspectives [14, 15]. On the contrary, Yuan and Oja proposed an improved NMF algorithm based on linear projection [16]. It can be called projective nonnegative matrix factorization (PNMF). However, the PNMF algorithm does not guarantee that the objective function is monotonically decreasing during the iteration, and there maybe oscillations. In order to solve this problem and make the NMF algorithm better adapt to the actual industrial process, we propose a new variation of NMF method named generalized projection nonnegative matrix factorization (GPNMF). This method will not only guarantee the objective function monotone nonincreasing under the new iterative rule but also make the process data not necessarily constrained to be nonnegative. In this approach, the approximate factorization X [approximately equal to] WH will be rewritten as X [approximately equal to] X[H.sup.T]H and the optimization problem can be expressed as follows:

The above constrained optimization problem can be regarded as a typical application of regularized least squares problem proposed by Tikhonov in 1963 [17]. The objective function of (12) can be expanded to a function about the variable H which ignores the constant term XTX. Hence, (12) can be rewritten as follows:

[mathematical expression not reproducible]. (13)

The gradient matrix of (13) can be obtained by using matrix differential:

[partial derivative]F (H)/[partial derivative]H (-2H[T.sup.T] X + 2H[X.sup.T][XH.sup.T]H). (14_

When the value of (14) is equal to 0, the solution of the least squares problem in (12) is given by

[HX.sup.T]X = [HX.sup.T]XHTH. (15)

Since the nonnegative constraint for the original data matrix X is relaxed in the GPNMF algorithm, now consider the case where X contains positive and negative elements. The factors [X.sub.+] and [X.sub.-] are defined as the absolute values of all positive elements and negative elements in X, respectively. [X.sub.+] and [X.sub.-] are calculated separately as follows:

[mathematical expression not reproducible] (16)

where [absolute value of X] represents the absolute value for all the elements of the matrix X. Then the original data matrix X can be expressed as [X.sub.[+ or -]] = [X.sub.+] -[X.sub.-] and (15) can be adapted as follows:

[mathematical expression not reproducible] (17)

It can be seen from (13) that the objective function F is a quartic function with respect to the coefficient matrix H. Here, a new iterative rule for the optimization problem in (12) is presented as follows:

[mathematical expression not reproducible] (18)

The objective function in (12) is invariant under this update if and only if H is at local minima of the constrained optimization problem.

4. Fault Diagnosis Algorithm Based on MCM-GPNMF

4.1. Initialization of GPNMF Algorithm. For the moment, NMF and its improved algorithms are solved by iteration. It is well known that a good iteration initial value can improve the convergence speed and accuracy of the NMF based algorithm. Although many researchers have pointed out the importance of a good initial value for the NMF algorithm in their article, most people still use the random method to initialize the NMF algorithm in the practical application. Since the NMF can only converge to the local optimal solution, the different initial values will result in different results. Langville et al. compared several commonly used initialization methods [18]. For the presented GPNMF algorithm, there is only one unknown factor, that is, coefficient matrix H. Besides, the approximate factorization X [approximately equal to] XHTH can clearly indicate that the coefficient matrix H is calculated to satisfy HTH ~ I, where I is identity matrix of size k. In this research, therefore, the coefficient matrix H is initialized with a fc-by-n-dimensional matrix with ones on the diagonal and zeros elsewhere.

4.2. Statistical Monitoring Model Based on GPNMF. The GPNMF-based statistical process monitoring model can be described as

[mathematical expression not reproducible], (19)

where the matrix W[??] and E represent the eigensubspace and the residual subspace, respectively. The matrix [mathematical expression not reproducible] [([W.sup.T]W).sup.-1] [W.sup.T]X is the reconstructed value of the coefficient matrix H. It reflects the changes in process variables. Due to the fact that the eigensubspace and the residual subspace of GPNMF algorithm are similar to the principal compo nent subspace and residual subspace of PCA-based method, according to the definition of monitoring statistic [T.sup.2] and SPE in PCA method, we construct two new monitoring statistics based on GPNMF statistical monitoring model to monitor the change of eigensubspace and residual subspace. They are calculated as follows:

[mathematical expression not reproducible] (20)

where the vector [??](i) and x(i) represent the ith column of matrices [??] and X, respectively. The column vector x(i) is the reconstructed value of x(i). It can be calculated as follows:

[mathematical expression not reproducible] (21)

For the two new monitoring statistics proposed in this section, it is obviously unreasonable to assume that these two metrics are subject to a particular distribution and calculate their upper control limits like traditional monitoring algorithm.

Therefore, we use kernel density estimation (KDE) which is a more general method to calculate the actual distribution information of the monitoring statistics and then determine their upper control limits. The confidence interval d is given by 99% during the calculation process.

4.3. Contribution Plot Method. It is well known that contribution plot method has been widely used in fault isolation research field. Once a fault is detected by the fault detection method, this method will be used to isolate the variables that are most likely to cause the failure. Contribution plot method is a commonly used fault separation method in PCA based fault diagnosis algorithm. The monitoring statistics [T.sub.2] and SPE are often used in the contribution plot method. The corresponding contributions to the [T.sup.2] statistic and SPE statistic can be expressed using the following equations, respectively:

[mathematical expression not reproducible] (22)

where [mathematical expression not reproducible]. The vector [[xi].sub.i] represents the ith column of unit matrix [I.sub.m]. The integer m represents the number of process variables in PCA. The parameters [mathematical expression not reproducible] represent the contribution of each process variable to the monitoring statistics T2 and SPE, respectively.

Alcala and Qin made a detailed analysis and summary of the contribution plot method [19]. Besides, a basic framework to construct contribution graph method was given in their research. Based on this framework, the corresponding contributions to the [T.sup.2.sub.G] statistic and SPEG statistic can be designed as follows:

[mathematical expression not reproducible] (23)

where the parameters [mathematical expression not reproducible] represent the contribution of each process variable to the monitoring statistics [T.sup.2.sub.G] and SPEG, respectively. The vector [[delta].sub.i] represents the ith column of unit matrix [I.sub.m]. The natural number m represents the number of process variables in GPNMF method.

5. Monitoring Performance

5.1. Fault Detection. In this section, the proposed method is applied to a 1000 MW unit boiler process. The boiler process for monitoring experiment contains three control systems: main steam temperature control system, main steam pressure control system, and feedwater control system. This process mainly consists of 33 consecutive process measurements variables, including 14 temperature signals, 9 pressure signals, 9 flow signals, and 1 power signal. All the 33 process variables are used for fault detection in this experiment. The normal operation history data of the boiler process is used as training data set which has 500 samples. For the fault data, the corresponding testing data set consists of 500 observations. Remarkably, the testing data set begins with normal operation data and the abnormal data is introduced from sample 51. All these process data are sampled every 3 sec.

In order to display the monitoring performance of MCM- GPNMF-based method, a bias fault in the main steam temperature A is taken into account. The magnitude of the fault is 1% of the actual measured value. Next, we consider the condition that 10% of the measurement values in testing matrix are missing randomly. The testing matrix contains 33 variables and each variable has 500 samples. In other words, 1650 values of testing matrix are missing. The monitoring result of bias fault when 10% of testing data are missing is shown in Figure 2.

Figures 2(a) and 2(d) show the monitoring result based on the GPNMF algorithm when the testing data set is complete. Figures 2(b) and 2(e) represent the detection result when the same method is used and 10% of the measurement values in testing matrix are missing. Figures 2(c) and 2(f) represent the monitoring result of MCM-GPNMF-based algorithm when 10% of the measurement values in testing matrix are missing. As shown in Figure 2, the GPNMF based monitoring algorithm has an excellent monitoring performance, if all the process data is complete. However, the monitoring performance of the method has a serious decline when the testing data set contains missing data. When the fault occurs, the localized features of missing values will have a little change. So it is a huge challenge for [T.sup.2.sub.G] statistic of GPNMF, which is shown in Figure 2. Many fault samples are error-detected by [T.sup.2.sub.G] statistic. For the SPEG statistic, although it can detect almost all of the faulty samples, many normal samples are error-detected. On the contrary, the [T.sup.2.sub.G] statistic and [SPE.sub.G] statistic of MCM-GPNMF-based method can detect almost all the faulty samples (over 98% actually). Meanwhile, the MCM-GPNMF-based method has the lower false alarm rate of [SPE.sub.G] statistic than GPNMF, when the testing matrix is incomplete. However, in all three cases, it should be mentioned that several normal samples are error detected, but the monitoring result of normal samples is still acceptable. In summary, the presented MCM-GPNMF based monitoring method can deal with the fault detection problem, which contains incomplete data.

5.2. Fault Isolation. Based on the monitoring result of the previous section, in this part, we will use the contribution plot based on [T.sup.2.sub.G] statistic and SPEG statistic to identify the variable that most likely leads to the fault. Due to the fact that the accuracy of fault variable identification is closely related to the accuracy of fault detection, it is meaningful to calculate the contribution value of each variable after the system fault is detected accurately. The most obvious change when bias fault occurs is that the value of main steam temperature A suddenly changes by the reason of a step temperature change at itself. The main steam temperature A is the 13th variable of the chosen boiler process. Both of the contribution plots corresponding to TG statistic and SPEG statistic, shown in Figure 3, identify this variable accurately. It could illustrate the effectiveness of the presented algorithm.

6. Conclusions

In recent years, the NMF-based algorithm, which is still under development, has shown great potential in the field of industrial process fault detection. However, there are still some weaknesses for these methods in dealing with the problem of missing data. This paper proposed a new variant of NMF named generalized projection nonnegative matrix factorization which combines with the second-order Markov chain model for the situation that the process data contain missing data, which significantly improves the detection rate of industrial process. Meanwhile, two types of monitoring indices, [T.sup.2.sub.G] statistic and SPEG statistic, and the corresponding contribution plots were designed, respectively. As a result, the simulation in a 1000 MW unit boiler process showed that the proposed method yielded better results for fault detection and fault isolation. In other words, the monitoring method based on MCM-GPNMF proposed in this work is valuable for research and application.

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

https://doi.org/10.1155/2017/7067025

References

[1] M. Kano and Y. Nakagawa, "Data-based process monitoring, process control, and quality improvement: recent developments and applications in steel industry," Computers & Chemical Engineering, vol. 32, no. 1, pp. 12-24, 2008.

[2] S. J. Qin, "Survey on data-driven industrial process monitoring and diagnosis," Annual Reviews in Control, vol. 36, no. 2, pp. 220-234, 2012.

[3] P. R. C. Nelson, J. F. MacGregor, and P. A. Taylor, "The impact of missing measurements on PCA and PLS prediction and monitoring applications," Chemometrics and Intelligent Laboratory Systems, vol. 80, no. 1, pp. 1-12, 2006.

[4] K. Lakshminarayan, S. A. Harp, and T. Samad, "Imputation of missing data in industrial databases," Applied Intelligence, vol. 11, no. 3, pp. 259-275,1999.

[5] A. A. Nudel'man, The Markov Moment Problem And Extremal Problems: Ideas And Problems of PL Cebysev And AA Markov And Their Further Development, American Mathematical Society, 1977

[6] L. E. Baum and J. A. Eagon, "An inequality with applications to statistical estimation for probabilistic functions of Markov processes and to a model for ecology," Bulletin of the American Mathematical Society, vol. 73, pp. 360-363,1967

[7] I.-K. Jeon and K.-W. Lee, "A dynamic Markov chain prediction model for delay-tolerant networks," International Journal of Distributed Sensor Networks, vol. 12, no. 9, pp. 1-7, 2016.

[8] M. Yoder, A. S. Hering, W. C. Navidi, and K. Larson, "Short term forecasting of categorical changes in wind power with Markov chain models," Wind Energy, vol. 17, no. 9, pp. 1425 1439, 2014.

[9] Z. Li, X. Wu, and H. Peng, "Nonnegative Matrix Factorization on Orthogonal Subspace," Pattern Recognition Letters, vol. 31, no. 9, pp. 905-911, 2010.

[10] D. D. Lee and H. S. Seung, "Learning the parts of objects by non-negative matrix factorization," Nature, vol. 401, no. 6755, pp. 788-791, 1999.

[11] D. D. Lee and H. S. Seung, Algorithms for Non-Negative Matrix Factorization, Advances in Neural Information Processing Systems, 2001.

[12] G. DAmico, F. Petroni, and F. Prattico, "First and second order semi-Markov chains for wind speed modeling," Physica A. Statistical Mechanics and its Applications, vol. 392, no. 5, pp. 1194-1201, 2013.

[13] A. E. Raftery, "A model for high-order Markov chains," Journal of the Royal Statistical Society, Series B: Methodological, pp. 528 539, 1985.

[14] D. Cai, X. He, J. Han, and T. S. Huang, "Graph regularized nonnegative matrix factorization for data representation," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 33, no. 8, pp. 1548-1560, 2011.

[15] H. Lee, J. Yoo, and S. Choi, "Semi-supervised nonnegative matrix factorization," IEEE Signal Processing Letters, vol. 17, no. 1, pp. 4-7, 2010.

[16] Z. Yuan and E. Oja, "Projective nonnegative matrix factorization for image compression and feature extraction," in Proceedings of the Scandinavian Conference on Image Analysis, Springer, Berlin, Germany, 2005.

[17] A. Tikhonov, "Solution of incorrectly formulated problems and the regularization method," Soviet Mathematics Doklady, vol. 5, 1963.

[18] A. N. Langville et al., "Algorithms, initializations, and convergence for the nonnegative matrix factorization," https://arxiv .org/abs/14077299.

[19] C. F. Alcala and S. J. Qin, "Reconstruction-based contribution for process monitoring," Automatica. A Journal of IFAC, the International Federation of Automatic Control, vol. 45, no. 7, pp. 1593-1600, 2009.

Niu Yuguang, (1) Wang Shilin, (2) and Du Ming (2)

(1) State Key Laboratory for Alternate Electric Power System with Renewable Energy Source, North China Electric Power University, Beijing 102206, China

(2) School of Control and Computer Engineering, North China Electric Power University, Beijing 102206, China

Correspondence should be addressed to Wang Shilin; wslncepu@163.com

Received 9 March 2017; Revised 9 May 2017; Accepted 22 May 2017; Published 13 June 2017

Caption: Figure 1: Comparison of predicted and real values.

Caption: Figure 2: The monitoring result of bias fault.

Caption: Figure 3: The contribution plots when bias fault occurs.