# Online Outlier Detection for Time-varying Time Series on Improved ARHMM in Geological Mineral Grade Analysis Process.

Introduction

Mineral composition analysis is a key factor in deciding whether a deposit is worth mining. Over the years, many scholars have proposed new ideas and methods for accurate mineral grade assessment, many of which rely on chemical or physical test equipment to obtain the data for grade analysis (Kameshwara, Rao, & Narayana, 2014; De'nan, Naaim, & Leong, 2017). The accuracy of these data is therefore critical to ore grade analysis. At present, automated testing equipment is used in ore grade analysis, such as the BOX-A type on-stream X-ray fluorescence analyzer, which derives the ore grade from the spectrum obtained by irradiating the pulp with X-rays. It is worth noting that the BOX-A analyzer assumes by default that the spectral data it obtains are correct. However, both chemical and physical testing equipment inevitably produce abnormal data. These outliers directly affect the results of the mineral grade analyzer (Clarke, & Levis, 1998; Rivoirard, Demange, & Freulon, 2013). Detecting and eliminating these abnormal data is therefore a prerequisite for the ore grade analysis described above.

This paper proposes a new algorithm for outlier detection in ore inspection data obtained from chemical or physical testing equipment. The algorithm uses an AR model to fit the time series and an HMM as the basic detection tool, which avoids the need to preset a threshold as in traditional detection methods. To update the ARHMM parameters online, the structure of the traditional BDT (Brockwell-Dahlhaus-Trindade) algorithm is improved into a double iterative structure that iterates over both time and model order. To reduce the influence of outliers on the ARHMM parameter updates, detection-before-update and detection-based-update strategies are adopted, which also improve the robustness of the algorithm. Simulations on model data and a practical application verify the accuracy, robustness, and online detection capability of the algorithm.

The innovations of this paper are as follows:

1. Unlike other outlier detection methods (such as the traditional AR model detection method), the method proposed in this paper does not require a detection threshold to be set.

2. Because the model order of chemical or physical testing equipment is hard to determine, the new detection method, which is based on the residual error, learns the model order by itself.

3. To avoid the influence of outliers on the test results, this paper proposes detection-before-update and detection-based-update strategies.

Previous Work on Outlier Detection

Many ideas and methods have been put forward for the outlier detection problem. Barnett and Lewis proposed a statistics-based method in their work 'Outliers in Statistical Data' (Barnett & Lewis, 1994). Knorr and Ng proposed a distance-based method (Knorr & Ng, 1999; Edwin & Raymond, 1998), and Ramaswamy et al. (2000) suggested a density-based method. For ore inspection data, however, methods based on distance, density, or variance are not feasible, because the ore test data require online, real-time detection. As research on anomaly detection progressed, further techniques were introduced, such as clustering analysis (Almeida & Barbosa, 2007) and neural networks (Bullen, Cornford, & Nabney, 2003; Prakobphol, & Zhan, 2008). Clustering analysis is also unsuitable for online outlier detection on extensive test data, and neural networks require a large amount of data for model learning. In 1995, Ragaran and Argrawal put forward the concept of "sequence anomaly" (Han & Micheline, 2001) and proposed a detection method based on deviation (Takeuchi, & Yamanishi, 2006). Because this method requires the model order to be known, it cannot be used directly for outlier detection in mineral grade analysis data.

Structure of Double iteration in BDT

To allow the BDT algorithm to be computed online, this paper proposes an improved BDT algorithm with a double iteration structure.

Traditional BDT algorithm

The traditional BDT algorithm, proposed by Brockwell et al. (2002), is an improvement on the Levinson-Durbin algorithm. It uses all the data in the iterative calculation over model order to obtain the order of the forward and backward AR models.

Let $x_t$, $t = 1, 2, \ldots$ be the test data awaiting detection, where each $x_t$ is an $m$-dimensional vector. The forward AR model can then be expressed as Equation 1.

[mathematical expression not reproducible] (1)

Here, $\varepsilon_k(t)$ is the forward residual under the $k$-order model, which follows a zero-mean Gaussian distribution, and $a_k(i)$ is the coefficient of the forward AR model under the $k$-order model. The backward AR model can be written as Equation 2.

[mathematical expression not reproducible] (2)
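Since Equations 1 and 2 are not reproduced here, the following is only a minimal sketch of how forward and backward residuals of a vector AR model might be computed. The sign convention (prediction subtracted from the observation), the backward time indexing, and the function names `forward_residual` and `backward_residual` are assumptions made for illustration, not the authors' exact definitions.

```python
import numpy as np

def forward_residual(x, a, t):
    """Forward residual at time t: x[t] minus its prediction from the past.

    x : array of shape (n, m), one m-dimensional sample per row
    a : list of k coefficient matrices of shape (m, m); a[i-1] plays the role of a_k(i)
    Assumed convention: eps_k(t) = x_t - sum_i a_k(i) x_{t-i}.
    """
    pred = sum(a[i] @ x[t - 1 - i] for i in range(len(a)))
    return x[t] - pred

def backward_residual(x, b, t):
    """Backward residual at time t: x[t] minus its prediction from the future.

    Assumed convention: eta_k(t) = x_t - sum_j b_k(j) x_{t+j}.
    """
    pred = sum(b[j] @ x[t + 1 + j] for j in range(len(b)))
    return x[t] - pred

# Tiny usage example with m = 2 and model order k = 1.
x = np.array([[2.0, 4.0], [1.0, 2.0], [3.0, 3.0]])
coef = [0.5 * np.eye(2)]
eps = forward_residual(x, coef, t=1)   # uses x[0] to predict x[1]
eta = backward_residual(x, coef, t=1)  # uses x[2] to predict x[1]
```

Both residuals are vectors of the same dimension as $x_t$, so they can be fed directly into a weighted sum of squares such as the objective described next.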

Taking the minimization of the forward and backward residuals over all data as the target, the generalized objective function is written as Equation 3 (Trindade, 2003).

[mathematical expression not reproducible] (3)

Here, $n$ is the number of data points, and $\omega_1$ and $\omega_2$ are the weighting coefficient matrices for the forward and backward AR models; in the BDT algorithm, both $\omega_1$ and $\omega_2$ are set to 1. $\varepsilon_k(t)$ and $\eta_k(t)$ are the estimated residuals of the forward and backward AR models, respectively.

[mathematical expression not reproducible] (4)

[mathematical expression not reproducible] (5)

Here, $a_k(i)$, $i = 1, 2, \ldots, k$, and $b_k(j)$, $j = 1, 2, \ldots, k$, are $m$-dimensional matrices. The traditional BDT algorithm can be written in full as follows:

[mathematical expression not reproducible] (6)

[mathematical expression not reproducible] (7)

[mathematical expression not reproducible] (8)

[mathematical expression not reproducible] (9)

[mathematical expression not reproducible] (10)

[mathematical expression not reproducible] (11)

[mathematical expression not reproducible] (12)

[mathematical expression not reproducible] (13)

In Equations 6 to 13, $U_k$ and $V_k$ are the estimated variances of the forward and backward noise. The initial conditions for the traditional BDT algorithm are:

[mathematical expression not reproducible] (14)

[mathematical expression not reproducible] (15)

[mathematical expression not reproducible] (16)

The subscript $\emptyset$ indicates that the model order set is empty at the initial iteration.
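Because Equations 6 to 16 are not reproduced, the recursion can only be illustrated by analogy. The sketch below is the classical scalar Burg recursion, which minimizes the sum of squared forward and backward residuals (the scalar counterpart of Equation 3 with unit weights) and raises the model order one step at a time in Levinson-Durbin fashion. It is a swapped-in simplification, not the multivariate BDT algorithm itself; the function name `burg_ar` and the polynomial convention $x_t + a_1 x_{t-1} + \dots + a_k x_{t-k} = e_t$ are assumptions.

```python
import numpy as np

def burg_ar(x, order):
    """Scalar Burg recursion (illustrative stand-in for the multivariate BDT).

    Returns (a, E): a = [1, a_1, ..., a_order] with
    x_t + a_1 x_{t-1} + ... + a_order x_{t-order} = e_t,
    and E, the estimated variance of e_t.
    """
    x = np.asarray(x, dtype=float)
    f = x[1:].copy()    # forward errors  f_0(t),   t = 1..n-1
    b = x[:-1].copy()   # backward errors b_0(t-1), t = 1..n-1
    a = np.array([1.0])
    E = float(np.dot(x, x)) / len(x)
    for _ in range(order):
        # Reflection coefficient minimizing sum(f_new^2 + b_new^2).
        k = -2.0 * np.dot(f, b) / (np.dot(f, f) + np.dot(b, b))
        # Levinson-type order update of the coefficient vector.
        a = np.concatenate([a, [0.0]])
        a = a + k * a[::-1]
        # Order update of the residuals, then re-align them in time.
        f_new = f + k * b
        b_new = b + k * f
        f, b = f_new[1:], b_new[:-1]
        E *= 1.0 - k * k
    return a, E
```

Each pass of the loop uses every sample in `f` and `b`, which mirrors the remark above that the traditional algorithm needs all the data for each order and is therefore a batch method.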

Double iteration BDT algorithm

The objective function of the improved BDT algorithm is also Equation 3. The dynamic performance of the algorithm is enhanced by a forgetting factor. The improved BDT algorithm has a double loop structure in which the model order is the inner loop and time is the outer loop.

We define [mathematical expression not reproducible], which is part of Equation 6, so that:

[mathematical expression not reproducible] (17)

In Equation 7, $k$ is a preset maximum value for the model order. Considering the time-varying characteristics of the model parameters, the forgetting factor is added to the outer loop (time loop) updates.

[mathematical expression not reproducible] (18)

[mathematical expression not reproducible] (19)

In Equation 18, [mathematical expression not reproducible] is the mean of the covariance matrix of $\varepsilon_{i-1}(t)$ and $\eta_{i-1}(t-i)$ in time. Similarly, Equation 13 can be rewritten as:

$$\eta_k(t-k) = \eta_{k-1}(t-k) - b_k(k)\,\varepsilon_{k-1}(t) \qquad (20)$$

The calculation process of the double iteration algorithm is illustrated in Figure 1.
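The double loop described above (time as the outer loop, model order as the inner loop, with a forgetting factor on the time updates) can be sketched for the scalar case as follows. This is a generic Burg-type adaptive lattice written for illustration only; it is not the authors' Equations 17 to 20, and the class name `OnlineLattice`, the parameter `forgetting`, and the small initial value of the denominators are assumptions.

```python
import numpy as np

class OnlineLattice:
    """Scalar lattice predictor updated one sample at a time (illustrative)."""

    def __init__(self, max_order, forgetting=0.99):
        self.K = max_order
        self.lam = forgetting
        self.num = np.zeros(max_order)          # running sums of 2 * f * b
        self.den = np.full(max_order, 1e-6)     # running sums of f^2 + b^2
        self.refl = np.zeros(max_order)         # reflection coefficients
        self.b_prev = np.zeros(max_order + 1)   # backward errors from time t-1

    def step(self, x):
        """Process one new sample; return the order-K forward residual."""
        f = float(x)
        b_cur = np.zeros(self.K + 1)
        b_cur[0] = f
        for m in range(self.K):                 # inner loop: model order
            bp = self.b_prev[m]
            # Outer-loop (time) update with exponential forgetting.
            self.num[m] = self.lam * self.num[m] + 2.0 * f * bp
            self.den[m] = self.lam * self.den[m] + f * f + bp * bp
            k = -self.num[m] / self.den[m]
            self.refl[m] = k
            # Order update of forward and backward residuals.
            f, b_cur[m + 1] = f + k * bp, bp + k * f
        self.b_prev = b_cur
        return f
```

Calling `step` once per incoming sample gives the outer time loop; the cost per sample is proportional to the maximum order, so no past data need to be stored or revisited.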

Implementation of Order Self-learning ARHMM Detection Algorithm

The traditional ARHMM structure is composed of two parts (Wang, & Chiang, 2008). The first is a Markov chain, expressed by the initial state probability $\pi$ and the state transition matrix [mathematical expression not reproducible], in which $S_t$ is the state at time $t$, $N$ is the total number of HMM states, and [mathematical expression not reproducible] is a conditional probability. The second is the observation probability matrix $B = (b_{tj})_{N \times N}$, calculated by the AR model.

[mathematical expression not reproducible] (21)

In Equation 21, $N(\cdot)$ is the Gaussian function, and $\Sigma_k$ is the estimated variance of the Gaussian distribution.

The ARHMM outlier detection algorithm is likewise composed of two steps:

Step one: Preliminary detection

From Equation 1, there is a deviation between the process data estimated by the AR model and the real process data.

$$x_t = \hat{x}_t + \varepsilon_k(t) \qquad (22)$$

If the deviation $\varepsilon_k(t)$ is only noise, it follows a Gaussian distribution. The preliminary criterion for outlier detection is therefore the probability that the deviation follows this Gaussian distribution.

[mathematical expression not reproducible] (23)

In Equation 23, $s_t = 1$ indicates that the observed data point is normal, and $s_t = 0$ indicates that it is an outlier. The detection criterion can therefore be expressed as:

$$s_t = \begin{cases} 1, & P\{x_t, s_t = 1\} \geq 0.5 \\ 0, & P\{x_t, s_t = 1\} < 0.5 \end{cases} \qquad (24)$$
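Equation 23 is not reproduced, so the exact form of the probability in Equation 24 is unknown. As a rough sketch of the idea only, the code below assumes a two-component model: the residual of a normal point follows the fitted Gaussian, while the residual of an outlier follows a much wider Gaussian with the same mean. The widening factor `outlier_scale`, the prior `prior_normal`, and the function names are hypothetical choices, not part of the paper.

```python
import math

def gaussian_pdf(e, mean, var):
    """Density of a scalar Gaussian N(mean, var) evaluated at e."""
    return math.exp(-0.5 * (e - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

def preliminary_state(residual, mean, var, outlier_scale=10.0, prior_normal=0.5):
    """Return (s_t, posterior) with s_t = 1 for normal data and 0 for an outlier.

    Assumption: outlier residuals follow N(mean, (outlier_scale^2) * var).
    The 0.5 cut-off is the one stated in Equation 24.
    """
    p_normal = prior_normal * gaussian_pdf(residual, mean, var)
    p_outlier = (1.0 - prior_normal) * gaussian_pdf(
        residual, mean, var * outlier_scale ** 2
    )
    posterior = p_normal / (p_normal + p_outlier)
    return (1 if posterior >= 0.5 else 0), posterior

# A small residual is labelled normal, a large one is labelled an outlier.
s_small, p_small = preliminary_state(0.1, mean=0.0, var=1.0)
s_large, p_large = preliminary_state(8.0, mean=0.0, var=1.0)
```

The cut-off of 0.5 is a comparison between two hypotheses rather than a tuned magnitude threshold on the residual, which is one way to read the claim that no detection threshold has to be preset.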

In Equation 23, the subscript $p$ is the optimal model order calculated by the KICvc criterion, whose expression is:

[mathematical expression not reproducible] (25)

In Equation 25, [mathematical expression not reproducible] is the mean of the residual $\varepsilon_k(t)$ under each model order (Bilmes, 2006).

[mathematical expression not reproducible] (26)

[mathematical expression not reproducible] (27)
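The KICvc expression in Equation 25 is not reproduced, so the sketch below substitutes a simpler, uncorrected Kullback-type score for a scalar series, $n \ln \hat{\sigma}_k^2 + 3(k+1)$, purely to show how residual-based order selection works. The score, the least-squares fit, and the names `residual_variance` and `select_order` are assumptions and should not be read as the authors' criterion.

```python
import numpy as np

def residual_variance(x, k, start):
    """Least-squares AR(k) fit on targets x[start:]; returns mean squared residual."""
    y = x[start:]
    if k == 0:
        return float(np.mean(y ** 2))
    # Column i holds the series lagged by i samples, aligned with y.
    X = np.column_stack([x[start - i: len(x) - i] for i in range(1, k + 1)])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    return float(np.mean(resid ** 2))

def select_order(x, max_order):
    """Pick the order with the smallest score among 0..max_order.

    Score (assumed stand-in for KICvc): n * ln(variance) + 3 * (k + 1).
    All orders are fitted on the same n = len(x) - max_order targets.
    """
    x = np.asarray(x, dtype=float)
    n = len(x) - max_order
    scores = [
        n * np.log(residual_variance(x, k, max_order)) + 3.0 * (k + 1)
        for k in range(max_order + 1)
    ]
    return int(np.argmin(scores)), scores
```

In an online setting the residual variances would come from the recursive updates rather than from a batch refit, but the selection step (score every candidate order, keep the minimum) is the same.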

Step two: Final detection

In the final detection, the result of the preliminary detection serves as the observed value of the HMM. The final detection result is obtained by the Viterbi algorithm (Abd-Krim, 2006):

[mathematical expression not reproducible] (28)

For the improved ARHMM algorithm, by the time the data at time $t$ is examined, all data before time $t$ have already been classified. The traditional Viterbi algorithm is therefore rewritten as:

[mathematical expression not reproducible] (29)

$$x_t \text{ is } \begin{cases} \text{normal}, & \phi_t(1) > \phi_t(0) \\ \text{outlier}, & \phi_t(1) \leq \phi_t(0) \end{cases} \qquad (30)$$
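Equation 29 is not reproduced. One plausible reading of the statement that earlier data have already been classified is that the maximization over previous states disappears, leaving $\phi_t(j) = a_{s_{t-1} j}\, b_j$ for the already decided previous state $s_{t-1}$. The sketch below implements that assumed one-step form together with the decision rule of Equation 30; the function name `online_viterbi_step` and the example probabilities are hypothetical.

```python
import numpy as np

OUTLIER, NORMAL = 0, 1  # state labels as used in the text

def online_viterbi_step(prev_state, A, b):
    """One assumed online Viterbi step for a two-state HMM.

    prev_state : state already decided at time t-1 (0 or 1)
    A          : 2x2 transition matrix, A[i, j] = P(S_t = j | S_{t-1} = i)
    b          : length-2 observation probabilities of the current datum
    Probabilities must be strictly positive because logs are taken.
    Returns (state, log_phi), where log_phi[j] = log(A[prev_state, j] * b[j]).
    """
    A = np.asarray(A, dtype=float)
    b = np.asarray(b, dtype=float)
    log_phi = np.log(A[prev_state]) + np.log(b)
    # Equation 30: normal only if phi_t(1) is strictly larger than phi_t(0).
    state = NORMAL if log_phi[NORMAL] > log_phi[OUTLIER] else OUTLIER
    return state, log_phi

A = [[0.5, 0.5], [0.05, 0.95]]
state_a, log_phi_a = online_viterbi_step(NORMAL, A, [0.2, 0.8])
state_b, log_phi_b = online_viterbi_step(NORMAL, A, [0.99, 0.01])
```

Because the transition row weights the observation evidence, a single mildly unusual point after a long normal run is less likely to be flagged than it would be by the preliminary criterion alone.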

Parameters Updating by Outlier

The parameters of the order self-learning ARHMM algorithm must be updated online. These parameters are the estimated residual mean [mathematical expression not reproducible], the state transition matrix $A$, and the quantities [mathematical expression not reproducible] used in the improved BDT algorithm. The specific update algorithm is as follows:

(A) [mathematical expression not reproducible]: If $x_t$ is normal data, then [mathematical expression not reproducible] is updated by Equations 30, 31; otherwise, [mathematical expression not reproducible] is not updated at time $t$.

(B) $A = (a_{ij})_{2 \times 2}$: since there are two states in the ARHMM algorithm, the update is:

[mathematical expression not reproducible] (31)

In Equation 31, $N(a_{ij})$ denotes the number of times the transition $S_{t-1} = i$, $S_t = j$ has occurred (Lou, 1995).

(C) $\varepsilon_\phi(t)$, $\eta_\phi(t)$: If $x_t$ is normal data, then $\varepsilon_\phi(t)$ and $\eta_\phi(t)$ are calculated by Equation 14; otherwise, the data mean is used in place of $x_t$.

(D) $\Gamma(0)^t$: If $x_t$ is normal data, then $\Gamma(0)^t$ is calculated by Equation 19; otherwise, it is calculated by Equation 32.

$$\Gamma(0)^t = \eta \cdot r \cdot \Gamma(0)^{t-1} + (1 - \eta \cdot r) \cdot x_t x_t' \qquad (32)$$

(E) $R_t$: If $x_t$ is normal data, then $R_t$ is calculated by Equation 18; otherwise, it is calculated by Equation 33.

[mathematical expression not reproducible] (33)
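Two of the updates above can be sketched directly from the text. The transition matrix is assumed to be the row-normalized transition counts, which is the usual reading of the counts $N(a_{ij})$ since Equation 31 itself is not reproduced, and the covariance update follows Equation 32 as printed. The meaning of the scalars $\eta$ and $r$ is not given in the text, so they are treated here simply as two numbers whose product weights the previous estimate; the function names and the uniform fallback for unvisited rows are assumptions.

```python
import numpy as np

def transition_matrix(counts):
    """Row-normalize transition counts: a_ij = N(a_ij) / sum_j N(a_ij).

    Rows with no observed transitions fall back to a uniform row (assumption).
    """
    counts = np.asarray(counts, dtype=float)
    A = np.full(counts.shape, 1.0 / counts.shape[1])
    for i, row in enumerate(counts):
        total = row.sum()
        if total > 0:
            A[i] = row / total
    return A

def count_transitions(states, n_states=2):
    """Count how often S_{t-1} = i is followed by S_t = j in a state sequence."""
    counts = np.zeros((n_states, n_states))
    for prev, cur in zip(states[:-1], states[1:]):
        counts[prev, cur] += 1
    return counts

def gamma0_outlier_update(gamma_prev, x_t, eta, r):
    """Equation 32: Gamma(0)^t = eta*r*Gamma(0)^(t-1) + (1 - eta*r) * x_t x_t'."""
    x_col = np.asarray(x_t, dtype=float).reshape(-1, 1)
    w = eta * r
    return w * np.asarray(gamma_prev, dtype=float) + (1.0 - w) * (x_col @ x_col.T)

# Usage: states decided so far (1 = normal, 0 = outlier).
A_hat = transition_matrix(count_transitions([1, 1, 0, 1, 1, 1]))
G = gamma0_outlier_update(np.eye(2), [1.0, 2.0], eta=0.9, r=1.0)
```

Keeping the counts rather than the matrix itself makes the update a constant-time increment per sample, which fits the online setting.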

Results and Discussion

Model-Based Validations

To verify the accuracy of the algorithm in detecting the model order, a third-order model that reflects the interference process in ore detection is used to generate the data. The data are shown in Figure 2-(a), and the order detection results are shown in Figure 2-(b).

As the order detection results show, the proposed algorithm detects the model order accurately after a short adjustment period.

To verify that the proposed algorithm can not only identify the optimal model order but also detect abnormal data accurately, we modified the open-loop model described by Alexandridis et al. to obtain the second set of data (Alexandridis, Sarimveis, & Bafas, 2003; Rabiner, 1989). The modified model is as follows:

[mathematical expression not reproducible] (34)

In which:

[mathematical expression not reproducible] (35)

As can be seen from Equation 34, there are time-varying parameters in the denominator. To verify the robustness of the algorithm, we add 10% white noise and eight anomalies. The data are shown in Figure 3-(a), and the order detection and outlier detection results are shown in Figure 3-(b) and Figure 3-(c).

Figure 3-(a) shows the data awaiting detection, Figure 3-(b) the estimated order, and Figure 3-(c) the outlier detection result. The results show that, for this third-order nonlinear time-varying system, the model-order self-learning algorithm proposed in this paper finds the optimal model order and accurately detects all the anomalous data.

Application

To further verify the practicality of the improved ARHMM outlier detection method, it is applied to the on-stream X-ray fluorescence analyzer. This type of analyzer was first developed in the 1970s by the Finnish company Outotec and successfully put into mineral processing practice. Outotec still holds more than 80% of the market. In China, the Beijing Institute of Mining and Metallurgy, following the same analytical principles, developed in 2014 a grade analyzer with the same function, the BOX-A type on-stream X-ray fluorescence analyzer (Hekimoglu, Eernoglu, & Kalina, 2009). According to foreign reports, a 1% increase in the measurement accuracy of the analyzer improves the metal recovery rate by 0.2 or more. Improving measurement accuracy through hardware has become very difficult, or seriously disproportionate in cost and benefit, so more scholars have turned to improving the analysis and modeling technology. In this situation, the accuracy of the data used for model learning is essential.

Our comparative tests are as follows:

The first set of data is obtained by using the improved ARHMM outlier detection algorithm proposed in this paper to pre-process the spectral signal that serves as the input of the analyzer; the resulting ore grades form the first set of data.

The second set of data is obtained by using the traditional AR outlier detection algorithm described by Northey, Mohr, & Mudd (2014) to pre-process the same spectral signal; the resulting ore grades form the second set of data.

Finally, the two sets of data are compared with the results of the laboratory ore grade test. The errors of the two sets of test results are shown in Table 1.

The table shows that pre-processing the spectrum with the improved ARHMM outlier detection algorithm gives higher accuracy than the traditional AR outlier detection method, because the improved ARHMM algorithm is more robust and better suited to nonlinear systems.
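The text does not state how the relative error in Table 1 is computed. Assuming the conventional definition, with the laboratory assay taken as the reference value, it would be:

$$\delta = \frac{\left| g_{\text{analyzer}} - g_{\text{lab}} \right|}{g_{\text{lab}}} \times 100\%$$

Here $g_{\text{analyzer}}$ and $g_{\text{lab}}$ are illustrative symbols for the grade reported by the analyzer after pre-processing and the grade measured in the laboratory; neither symbol appears in the original paper.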

Conclusions

Taking into account the shortcomings of the ARHMM algorithm for the ore grade analysis process, this paper proposes an order self-learning ARHMM algorithm. Its innovations are as follows. First, unlike other outlier detection methods (such as the traditional AR model detection method), the proposed method does not require a detection threshold to be set. Second, because the model order of the control system is hard to determine, the new detection method, which is based on the residual error, learns the model order by itself. Third, to avoid the influence of outliers on the test results, detection-before-update and detection-based-update strategies are proposed. With these improvements, the ARHMM algorithm can analyze the data in the geological mineral grade analysis process more accurately; in other words, the application field of the ARHMM algorithm has been expanded. Simulations on model data and a practical application verify the accuracy, robustness, and online detection capability of the algorithm. The results indicate that the proposed algorithm is better suited to outlier detection in the geological mineral grade analysis process.

Acknowledgments

This research is partially supported by the National Natural Science Foundation of China under Grants 51607122 and 61602343.

References

Abd-Krim, S. (2006). Vector Autoregressive Model-Order Selection From Finite Samples Using Kullback's Symmetric Divergence. IEEE Transactions on Circuits and Systems I: Regular Papers, 53(10), 2327.

Alexandridis, A., Sarimveis, H., & Bafas, G. (2003). A new algorithm for online structure and parameter adaptation of RBF networks. Neural Networks, 16(7), 1003-1017.

Almeida, J. A. S., & Barbosa L. M. S. (2007). A new method with outlier detection and automatic clustering. Chemometrics and Intelligent Laboratory Systems, 87(2), 208.

Barnett, V., & Lewis, T. (1994). Outliers in Statistical Data. John Wiley & Sons, Chichester.

Bilmes, A. J. (2006). What HMMs Can Do. IEICE - Transactions on Information and Systems, E89-d(3), 1.

Brockwell, P. J., Dahlhaus, R., & Trindade, A. A. (2002). Modified Burg Algorithms for Multivariate Subset Autoregression. Technical Report 2002-015, Department of Statistics, University of Florida.

Bullen, R. J., Cornford, D., & Nabney, I. T. (2003). Outlier detection in scatterometer data: neural network approaches. Neural Networks, 16(3-4), 419.

Clarke, B. R., & Levis, T. (1998). An outlier problem in the determination of ore grade. Journal of Applied Statistics, 25(6), 751-762.

De'nan, F., Naaim, N., & Leong, L. C. (2017). Behaviour of flush end-plate connection for perforated section. Engineering Heritage Journal, 1, 11-20.

Edwin, M. K. & Raymond, T. N. (1998). Algorithms for Mining Distance-based Outliers. Proceedings of the twenty-fourth international conference on very large data bases, C. 392.

Han, J. W., & Micheline, K. (2001). Data mining concepts and techniques. Machinery Industry Press, China.

Hekimoglu, S., Eernoglu, R. C., & Kalina, J. (2009). Outlier detection by means of robust regression estimators for use in engineering science. Journal of Zhejiang university-science A, 10(6), 909.

Kameshwara, R., Rao, C. R., & Narayana, A. C. (2014). Assessing grade domain of iron ore deposit using geostatistical modelling: A case study. Journal of the Geological Society of India. 83(5), 549-554.

Knorr, E. M. & Ng, R. T. (1999). Finding Intentional Knowledge of Distance-based Outliers. Proceedings of the twenty-fifth international conference on very large data bases, C. 211.

Lou, H. L. (1995). Implementing the Viterbi Algorithm. IEEE Signal Processing Magazine, 1053-5888, 42.

Northey, S., Mohr, S., & Mudd, G. M. (2014). Modeling future copper ore grade decline based on a detailed assessment of copper resources and mining. Resources conservation and recycling, 83, 190-201.

Prakobphol, K., & Zhan J. T. (2008). A novel outlier detection scheme for network intrusion detection systems. Proceedings of the second international conference on information security and assurance, C. 555.

Rabiner, R. L. (1989). A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition. Proceedings of the IEEE. 77(2), 257.

Ramaswamy, S., Rastogi, R., & Kyuseok, S. (2000). Efficient algorithms for mining outliers from large data sets. Proceedings of the ACM SIGMOD International Conference on Management of Data Dallas, C. 427.

Rivoirard, J., Demange, C., & Freulon, X. (2013). A Top-Cut Model for Deposits with Heavy-Tailed grade distribution. Mathematical geosciences, 45(8), 967-982.

Takeuchi, J., & Yamanishi, K. (2006). A Unifying Framework for Detecting Outliers and Change Points from Time Series. IEEE Transactions on Knowledge and Data Engineering, 18(4), 482.

Trindade, A. A. (2003). Implementing Modified Burg Algorithms in Multivariate Subset Auto-regressions Modeling. Department of Statistics, University of Florida.

Wang, J. S. & Chiang, J. C. (2008). A cluster validity measure with Outlier detection for support vector clustering. IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics), 38(1), 78.

Jianjun Zhao (a,b), Junwu Zhou (b), Weixing Su (c*), Fang Liu (c)

(a) School of Information Science & Engineering, Northeastern University, Shenyang 110004, China;

(b) Beijing General Research Institute of Mining & Metallurgy, Beijing 100160, China

(c) School of Computer Science & Software Engineering, Tianjin Polytechnic University, Tianjin 300387, China

(*) Email of Corresponding Author: 15900201597@163.com

Record

Manuscript received: 21/02/2017

Accepted for publication: 28/07/2017

How to cite item

Zhao, J., Zhou, J., Su, W., & Liu, F. (2017). Online Outlier Detection for Time-varying Time Series on Improved ARHMM in Geological Mineral Grade Analysis Process. Earth Sciences Research Journal, 21(3), 135-139.

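Table 1. Comparison of the relative errors of the ore grade results obtained with the traditional AR and the improved ARHMM outlier detection methods

| Sample | Element | Relative error (%) by traditional AR method | Relative error (%) by improved ARHMM method |
|---|---|---|---|
| Lead concentrate | Pb | 3.25 | 2.31 |
| Lead concentrate | Zn | 4.43 | 3.32 |
| Lead tailings | Pb | 5.21 | 4.52 |
| Lead tailings | Zn | 4.17 | 3.07 |
| Zinc concentrate | Pb | 5.S9 | 4.21 |
| Zinc concentrate | Zn | 1.75 | 1.02 |
| Ore | Pb | 3.21 | 2.52 |
| Ore | Zn | 3.75 | 2.57 |
| Total tailings | Pb | 5.27 | 4.72 |
| Total tailings | Zn | 4.S7 | 3.25 |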


Author: Zhao, Jianjun; Zhou, Junwu; Su, Weixing; Liu, Fang

Publication: Earth Sciences Research Journal

Article Type: Report

Date: Sep 1, 2017
