# A particle filter approach to identification of nonlinear processes under missing observations.

INTRODUCTIONMany chemical processes can be modeled using nonlinear stochastic differential equations arising from fundamental physical laws. These differential equations are usually continuous in time, and various approaches exist for their discretization (Wahlberg et al., 1993; Yuz and Goodwin, 2005). For instance, the dynamics of polymerization and other chemical reactors, which are highly nonlinear, can be discretized and represented by the following general stochastic state-space model

[x.sub.t+1] = f ([x.sub.t], [u.sub.t], [theta]) + [w.sub.t] (1)

[y.sub.t] = g([x.sub.t], [u.sub.t], [theta]) + [v.sub.t] (2)

where [x.sub.t] [member of] [R.sup.n] is the n-dimensional state vector, [u.sub.t] [member] [R.sup.s] the s-dimensional input vector, [y.sub.t] [member of] [R.sup.m] the in-dimensional output or measurement vector, and [w.sub.t], [v.sub.t] are independent and identically distributed Gaussian noise sequences of appropriate dimension, [theta] [member of] [R.sup.P] is a p-dimensional parameter vector and f(x), g(x) are known nonlinear functions. A good model--in other words, a good estimate of [theta] in the above model--is required for state estimation, control, performance monitoring and assessment, fault detection and diagnosis of such processes.

While there is extensive literature on state estimation for such processes using extended Kalman filters, unscented Kalman filters, particle filters, and other approaches, parameter estimation has received comparatively less attention. A thorough discussion of the state estimation approaches is beyond the scope of this article, however, a good review of currently popular state estimation procedures is provided in Rawlings and Bakshi (2006) and in Soroush (1998), and the references therein. Practical issues in implementing particle filters are discussed in Imtiaz et al. (2006). This article focuses on parameter estimation of models of the form presented above.

In parameter estimation, the problem is to obtain an estimate of the parameter vector, [theta], by using the measured observations, [y.sub.t], and the known inputs, [u.sub.t]. The nonlinear parameter estimation approaches can be broadly divided into input-output and state-space based approaches. Input-output approaches are suitable for black-box identification, where a structure of the process model from physical laws is not available and all process variables are measured. Input-output approaches are discussed elsewhere (Haber and Keviczky, 1999; Ljung and Vicino, 2005). On the other hand, many variables in chemical processes are either not measurable or measured irregularly. Processes with such measurements can be represented naturally using state-space models. This article proposes a new method for parameter estimation of nonlinear state-space models under missing observations by combining expectation-maximization (EM) algorithm with particle filters.

EM is a standard algorithm for parameter estimation in state-space models (Shumway and Stoffer, 2000). It involves two steps, where one estimates the joint probability density of the states and the observations based on an initial estimate of parameters in the first step, and maximizes the expected value of the joint density in the second step to obtain a new estimate of the parameter vector (Dempster et al., 1977). These two steps are repeated until the change in parameters after each iteration is within a specified tolerance level. For linear systems with Gaussian noise, the expectation and maximization steps in the algorithm can be solved analytically, and explicit recursive equations for parameter estimation can be developed (Shumway and Stoffer, 1982). On the other hand, for most nonlinear state-space models with Gaussian or non-Gaussian noise, the expectation and maximization steps do not have explicit solutions.

A number of approximations of the expectation and maximization steps have been proposed in the literature for nonlinear processes. In Roweis and Ghahramani (2001) an extended Kalman filter is used to approximate the filtered states and the expected value of the complete likelihood function. In Goodwin and Aguero (2005) a similar linearized approximation of the process around a maximum a posteriori estimate of the state vector is used. In Doucet et al. (2001), Doucet and Tadic (2003), Poyiadjis et al. (2005) a particle approximation of the expectation step is used. While linearization methods fail to perform well if the nonlinearities are strong (Chen et al., 2005), particle filter approaches require large number of particles for good approximation of the expected likelihood function (Andrieu et al., 2004). Most particle based approaches use a state-path based density function to approximate the likelihood function, that is, a density function of the form p([x.sub.1], [x.sub.2], ..., [x.sub.T]) (Andrieu et al., 2004). It is known that the variance of path based density function increases rapidly with the data length, T (Andrieu et al., 2004; Poyiadjis et al., 2005). In this article, a new approximation, based on point-wise state density functions of the form p([x.sub.t]), is proposed and extended to handle missing data in the observations.

The second step in the EM algorithm involves maximization of the expected complete likelihood function with respect to the parameter vector. For most nonlinear processes, this maximization step does not have an explicit solution. However, in some special cases, such as bilinear models (Gibson et al., 2005) or processes defined by radial basis functions (Roweis and Ghahramani, 2001), it is possible to find an explicit solution. Depending on the structure of the nonlinear process, any of the above mentioned approaches can be used with the method proposed in this article. If an explicit solution cannot be obtained, one can use either line search methods or gradient-based methods for maximization.

Missing observations are commonplace in the chemical industry. For linear systems, EM algorithm has been adapted to handle missing observations (Isaksson, 1993; Shumway and Stoffer, 2000), and also applied in practice (Raghavan et al., 2006). Other approaches for linear systems based on lifting techniques (Li et al., 2001, 2003) and continuous time identification (Wang and Gawthrop, 2001) have also been reported. While the importance of parameter estimation for nonlinear processes under missing observations has long been recognized (Gudi et al., 1995; Tatiraju et al., 1999), to the best knowledge of the author, no work has been reported for nonlinear stochastic processes considered in this article. Missing observations can be elegantly handled using the EM algorithm. To the best knowledge of the author, the published work on parameter estimation for nonlinear systems treats only states as missing data. In this article, the EM algorithm is adapted to handle missing observations also.

Before proceeding with this article, it is pointed out that there is extensive literature in Bayesian statistics on parameter estimation for nonlinear systems. Many such approaches consist of imposing a prior distribution on the parameter vector and augmenting the state vector with the parameters. However, these approaches are known to be inefficient as they result in large parameter covariance (Andrieu et al., 2004). In Kitagawa (1998) the above problem was solved by introducing artificial dynamics on the parameter vector and augmenting the state vector and in Doucet et al. (2001) a kernel based approach is used. An application of kernel based approach is presented in Chen et al. (2005). These approaches can potentially be extended to handle missing observations. However, the problems mentioned above still remain, and hence the choice of EM algorithm in this article.

The following are the main contributions of this article: (a) A new approach to maximum likelihood estimation (MLE) under missing observations is presented. (b) An expression for the expectation step in the EM algorithm involving only point-wise (as opposed to path-based) density functions is derived. (c) Particle filter approximations of various density functions involved in approximating the expectation step in the EM algorithm are derived. (d) Numerical examples to illustrate the methodology developed in this article are presented.

This article is divided into the following sections. In Expectation-Maximization Algorithm Section, a brief description of the EM algorithm is presented, in The Q-Function Section, an approximation of the Q-function is presented, in Bayesian Filtering and Smoothing Section, Bayesian filtering and smoothing algorithms for missing observations are presented, in Identification Algorithm Section an approximation of the EM algorithm is presented, in Computational Cost Section computational cost of the developed algorithms is discussed, and in Examples Section a few simulation examples are presented.

EXPECTATION-MAXIMIZATION ALGORITHM

Algorithm

MLE, where the likelihood function of the observed data is maximized with respect to the parameter vector, is a popular approach for parameter estimation (Ljung, 1999). In input-output models, where there are no hidden states or missing observations, it is often possible to construct a likelihood function and maximize it. On the other hand, in state-space models, it is difficult to construct such a likelihood function due to the presence of hidden states and/or missing observations. Expectation-maximization is an elegant optimization algorithm that constructs a complete likelihood function including the hidden states and missing observations, and maximizes the likelihood function of observed data through iterations. A brief description of the EM algorithm is presented in this section to facilitate the development of the algorithm in later sections.

Consider the state-space model described in (1-2). Let p([y.sub.1:T]|[theta]) denote the joint likelihood function of the observed data. The maximum likelihood estimate of the parameter vector is obtained by maximizing this observed data likelihood function. Since, the likelihood function and the log-likelihood function have the same maximizer, henceforth, the log-likelihood function is maximized. For certain classes of state-space models, such as linear systems, it is possible to derive an explicit expression for this joint density. However, for the model in (1-2), it is difficult to develop such an expression. Instead, using the Markov property of the model it is straightforward to develop an expression for the complete (including states and observations) likelihood function, p([x.sub.1:T], [y.sub.1:T]|[theta]), In light of this feeature of the Markovian state-space model, p([y.sub.1:T]|[theta]) can be decomposed into two parts as follows:

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]

where [E.sub.[theta]'] (x) denotes the expectation operator with respect to p([x.sub.1:T]|[y.sub.1:T], [theta]'), and [theta]' is any fixed value of the parameter vector. The first term, Q([theta]', [theta])--henceforth called Q-function, in the above decomposition is the expected value of the joint likelihood function of the hidden states and the observations. It will be shown, in a later section, that this term can be easily evaluated due to the Markovian property of the state-space model. Let [[theta].sub.i] be an estimate of the parameter vector. Then

log[p([y.sub.1:T]|[theta])] = Q([[theta].sub.i], [theta]) - H([[theta].sub.i], [theta])

Let a new parameter vector be defined as

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]

where [OMEGA] is the set of all feasible values of [theta]. Then

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] (3)

where the first inequality follows from the definition of [[theta].sub.i+1] and the second inequality from Jensen's inequality (Casella and BergeL 1990). Therefore, by simply maximizing the complete likelihood function, it is possible to obtain a new estimate of [theta] such that the likelihood function increases. This maximization approach is called EM algorithm and is summarized below:

(1) Choose an initial guess of the parameter vector [[theta].sub.0] [member of] [OMEGA].

(2) Estimate the states given the parameter vector and the observations and evaluate Q([[theta].sub.i], [theta]).

(3) Maximize Q ([[theta].sub.i], [theta]) with respect to [theta]. Call the maximizing value [[theta].sub.i+1].

(4) Repeat the above two steps until the change in parameter vector is within a specified tolerance level.

The second step in the above algorithm is called E-step and the third step is called M-step. The likelihood function increases monotonically through these iterations. Under some regularity conditions, it has been shown that the EM parameter estimates converge to a local maximum of the likelihood function (Wu, 1983).

Approximations

In many situations, it is not possible to obtain an explicit expression for Q([[theta].sub.i], [theta]). Hence, a number of approximations have been proposed in the literature and it is beyond the scope of this article to discuss them in detail. Interested readers may refer to McLachlan and Krishnan (1997). In this section a brief summary of the particle based approximations and the associated disadvantages are explained.

The Q-function is an integral given by

Q([[theta].sub.i], [theta]) = [summation] p([x.sub.1:T]|[y.sub.1:T], [[theta].sub.i])log[p([x.sub.1:T], [y.sub.1:T]|[y.sub.1:T], [theta])][dx.sub.1:T]. (4)

A variant of EM called stochastic EM uses a single realization of the joint density function p([x.sub.1:T],[y.sub.1:T]|[y.sub.1:T], [theta]) to approximate the above integral (Celeux and Diebolt, 1985)

Q([[theta].sub.i], [theta]) [approximately equal to] log[p([x.sup.i.sub.1:T], [y.sub.1:T]|[y.sub.1:T], [theta])]

where [x.sup.i.sub.1:T] is a realization of the states drawn from the distribution p([x.sub.1:T]|[y.sub.1:T], [[theta].sub.i]). Another stochastic version of EM algorithm called Monte-Carlo EM algorithm makes use of the following better approximation (Wei and Tanner, 1990)

Q([[theta].sub.i], [theta]) [approximately equal to] 1/N [N.summation over (i=1)] log[p([x.sup.i.sub.1:T], [y.sub.1:T]|[y.sub.1:T], [[theta]].

This approximation does not take into account the likelihood of occurrence of each state trajectory [x.sub.i.sub.1:T]. A particle based approximation shown below accounts for this (Andrieu et al., 2004)

Q([[theta].sub.i], [theta]) = [N.summation over (i=1)] [w.sup.i]log[p([x.sup.i.sub.1:T], [y.sub.1:T]|[y.sub.1:T], [[theta]]. (5)

where [w.sup.i] are weights proportional to the likelihood of occurrence of [x.sup.i]. In Andrieu et al. (2004), it is pointed out this approach is inefficient as it requires sampling the state trajectory from a large dimensional density function of states, p([x.sub.1:T]|[y.sub.1:T], [theta]). As the data size (T) increases, the dimensionality of this density function also increases making it extremely inefficient in approximating the Q-function. In this article, a new approach for approximating the Q-function that uses only point-wise state density functions, such as p([x.sub.t]|[y.sub.1:T], [theta]), is presented. Since the dimensionality of p([x.sub.t]|[y.sub.1:T], [theta]) does not increase with time, the accuracy of this density function, and hence that of the Q-function do not deteriorate with increase in data size.

THE Q-FUNCTION

In this section an approximation of the Q-function that is free from the dimensionality problems explained earlier, and that can handle missing observations, will be developed using the Markovian property of the state-space model.

In order to approximate the Q-function in (4), one needs an approximation of the density function p([x.sub.1:T]|[y.sub.1:T], [[theta].sub.i]). It is possible to approximate this path-based density function directly (Godsill et al., 2004). However, since the density function is high-dimensional (its dimension depends on the data length), extremely large number of particles are required to approximate it reliably (Andrieu et al., 2004; Poyiadjis et al., 2005). In the following paragraphs, an alternative approach where the Q-function is written as a sum of point-wise density functions, such as p([x.sub.t]|[y.sub.1:T], [theta]), is presented. Note that the dimensionality of p([x.sub.t]|[y.sub.1:T], [theta]) is much smaller than the dimensionality of the joint density function of the states.

Full Data Case

Consider the case where all the observations {[y.sub.1], ..., [y.sub.T]} and the inputs {[u.sub.1], ..., [u.sub.T]} are available. In the rest of this article, it is assumed that the inputs are known and all the density functions of the form p(.|., ., [u.sub.1:T} are denoted by p(.|., .) without explicitly showing the input dependence. Then, using the Markov property of the state-space model, the joint density function of states and outputs can be written as (repeatedly using (35), (36)),

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] (6)

Therefore, by substituting (G) in (4), the Q-function can be expanded as

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] (7)

Using the following expressions, the Q-function can be further simplified (using multi-variable version of (3G)).

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] (8)

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] (9)

[integral] log[p([y.sub.t]|[x.sub.t], [theta])]p([x.sub.1:T]|[y.sub.1:T], [[theta].sub.i])][dx.sub.1:T]

= [integral] log[p([y.sub.t]|[x.sub.t], [theta])]p([x.sub.t]|[y.sub.1:T], [[theta].sub.i])[dx.sub.t]. (10)

Now substituting (8), (9), (10) in (7), the following form of Q-function can be obtained

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] (11)

From the above expression, notice that approximations of the density functions p([x.sub.1]|[y.sub.1:T], [theta].sub.i]), p([x.sub.t-1:t]|[y.sub.1:T], [[theta].sub.i]), p([x.sub.t]|[y.sub.1:T], [[theta].sub.i]) would allow one to approximate the Q-function.

Missing Data in Output

Suppose that only a portion of the output measurements at time instants {[t.sub.1], ..., [t.sub.[gamma]]} are available and are not available at time instants ([s.sub.1], ..., [s.sub.[beta]]). In other words only [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] out of {[Y.sub.1], ..., [y.sub.T]} are available. For notational simplicity, it is also assumed that [y.sub,1] and [Y.sub.T] are available. Then the Q-function can be written as

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] (12)

Using expressions similar to (8), (9), and (10), one can simplify (12) to the following expression

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] (13)

In order to approximate the Q-functions in (7) and (13), approximations of the following density functions are required,

Full data case

(1) p([x.sub.t]|[y.sub.1:T], [theta])

(2) p([x.sub.t-1], [x.sub.t]|[y.sub.1:T], [theta])

Missing data case

(1) [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]

(2) [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]

(3) [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]

Notice that the maximum dimensionality of the above density functions is max(2n, n + m), and hence the accuracy of these density functions does not deteriorate with increase in the size of available measurements as is the case with the method suggested in Andrieu et al. (2004).

BAYESIAN FILTERING AND SMOOTHING

In this section, Bayesian algorithms to generate approximations of the above density functions are presented.

Filtering

Full data

The density function of the states given the past and current outputs, p([x.sub.t]|[y.sub.1:t], [theta]) is called a filter. Applying Bayes' rule in a straightforward manner, one can derive recursive expressions for the density function of the filter. The following predictor density function can be derived using Bayes' rule (see (35), (36)),

p([x.sub.t]|[y.sub.1:t-1], [theta]) = [integral] p([x.sub.t], [x.sub.t-1]|[y.sub.1:t-1], [theta])[dx.sub.t-1], = [integral] p([x.sub.t]|[x.sub.t-1]r, [theta])p([x.sub.t-1]|[y.sub.1:t-1, [theta])[dx.sub.t-1], (14)

Now using the predictor and (35), one can write the following expression for the filter,

p([x.sub.t]|[y.sub.1:t], [theta]) = P([Y.sub.t]|[x.sub.t], [theta])p([x.sub.t]|[y.sub.1:t-I], [theta])/[integral] p ([y.sub.t]|[x.sub.t], [theta])p([x.sub.t]|[y.sub.1:t-1], [theta])[dx.sub.t] (15)

The filter density can be evaluated recursively by substituting (14) in (15). The above integrals needed to estimate the filter density are often intractable, and need to be approximated.

Although numerous approximations are available, in this article a particle filter approach is used. The basic idea behind particle filters is to approximate a density function using dirac-delta functions. The filter density at t - 1, could be approximated as

P([x.sub.t1]|[y.sub.1:t-1], [theta]) = [N.summation over (i=1)][w.sup.(i).sub.t-1|t-1][delta]([x.sub.t-1] - [x.sup.(i).sub.t-1]) (16)

where [w.sup.(i).sub.t-1|t-1] are weights proportional to the filter density at [x.sup.(i).sub.t-1], and [delta](x) is a dirac-delta function. Substituting (1G) in (14), an approximation of the predictor can be obtained as follows:

p([x.sub.t]|[y.sub.1:t-1], [theta]) = [N.summation over (j=1)p([x.sub.t]|[x.sup(j).sub.t-1], [theta])[w.sup.(j).sub.t-1|t-1] (17)

Similarly, substituting (17) in (15), one can approximate the filter density function (Poyiadjis et al., 2005)

p([x.sub.t]|[y.sub.1:t], [theta]) = [N.summation over (i=1)] [w.sup.(i).sub.t\t] [delta]([x.sup.t] - [x.sup.(i).sub.t] (18)

where [x.sup.(i).sub.t] are chosen from an importance sampling function p([x.sub.t]|[y.sub.1:t-1], [theta]), and therefore weights are given by

[w.sup.(i).sub.t\t] = p([y.sub.t]|[x.sup.(i).sub.t], [theta])/[N.summation over (j=1)]p([y.sub.t]|[x.sup.(j).sub.t]. [theta]) (19)

Particle Filter Algorithm--Full Data

1. Initialization: Generate N samples of the initial state [x.sub.1], from an initial distribution, p([x.sub.1]). Set [w.sup.(i).sub.1|1] = 1/N for i [member of] {1, ..., N}. Set t = 2.

2. Prediction: Sample N values of [x.sub.t] from the distributions p([x.sub.t]|[x.sub.(i).sub.t-1]), [theta]) for each i.

3. Update: Using (19), find the weights of filter density, [w.sup.(i).sub.t\t].

4. Resampling: Resample N particles from the set {[x.sup.(1).sub.t], ..., [x.sup.(N).sub.t]} with the probability of picking [x.sup.(i).sub.t] being [w.sup.(i).sub.t\t]. Assign [w.sup.(i).sub.t\t] = 1/N for all i.

5. Set t = t + 1. Repeat the above steps (2), (3), and (4) for t [less than or equal to] T.

Missing data

For the missing data case, the prediction equation is used recursively until an observation is available. Once an observation is available, the update equation is also used. If an observation is not available at time t, then the following filter equation (ideally it should be called a predictor since no observation is available at time t. However, to be consistent with the full data case, it is called a filter) is used (which is obtained by repeatedly applying (35))

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] (20)

where [t.sub.[alpha]] is the last observation available up to time t. Now assuming that the following approximation of the filter at time [t.sub.[alpha]] is available,

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] (21)

one can write an approximation of (20) at time t, by substituting (21) in (20), as

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]

which can be represented using particles as follows:

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] (22)

where

[[bar.w].sup.(i).sub.t\t] = 1/N, (23)

and if an observation is available at time t, then the filtered density is

p([x.sub.t]|[y.sub.1:t], [theta]) = [N.summation over (i=1)] [[bar.w].sup.(i).sub.t\t] [delta]([x.sub.t] - [x.sup.(i).sub.t]) (24)

where [[bar.w].sup(i).sub.t\t] are given by (19) and [x.sup.(i).sub.t] are drawn from the density function [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII].

Particle Filter Algorithm--Missing Data

1. Initialization: Generate N samples of the initial state [x.sub.1], from an initial distribution, p([x.sub.1]). Set [[bar.w].sup.(i).sub.1|1] = 1/N for i [member of] {1, ..., N}.

2. Prediction: Sample N values of [x.sub.t] from the distributions p([x.sub.t]|[x.sup.(i).sub.t-1], [theta]) for each i.

3. Update: If [y.sub.t] is available, using (19), find the weights of filter density, [[bar.w].sup.(i).sub.t\t]. If not, use the density in (22) as the filtered density.

4. Resampling: Resample N particles from the set {[x.sup.(1).sub.t], ..., [x.sup.(N).sub.t]} with the probability of picking [x.sup.(i).sub.t] being [[bar.w].sup(i).sub.t\t]. Assign [[bar.w].sup.(i).sub.t\t] = 1/N for all i.

5. Set t = t + 1. Repeat the above steps (2), (3), and (4) for t [less than or equal to] T.

Smoothing

Full data

The density function of a state given the past and future observations is called a smoother. In the above expressions for the Q-function, p([x.sub.t]|[y.sub.1:T], [theta]) is the smoothed density functions of the states. There are many approaches to estimate the density function of the smoothed states (Klaas et al., 2006). A forward-backward smoother algorithm, which is explained below, is used in this article. The smoothed density can be factored as (see Appendix for a complete derivation)

p([x.sub.t]|y./sub.1:T], [theta]) = [integral] p([x.sub.t], [x.sub.t+1]|[y.sub.1:T], [theta])[dx.sub.t+1] (25)

p([x.sub.t]|[y.sub.1:T], [theta]) = [integral] p([x.sub.t+1]|[y.sub.1:T], [theta])p([x.sub.t]|[x.sub.t+I], [y.sub.1:t], [theta])[dx.sub.t+l] (26)

p([x.sub.t]|[y.sub.1:T], [theta]) = p([x.sub.t]|[y.sub.1:t], [theta]) [integral] p([x.sub.t+1]|[y.sub.1:T], [theta])p([x.sub.t+1]|[x.sup.t], [theta])[dx.sub.t+1]/[integral] p([x.sub.t+1]|[x.sup.t], [theta])p([x.sub.t]|[y.sub.1:t], [theta]) [dx.sub.t] (27)

Hence the smoothed density function can be obtained as a function of filtered state density at time t, smoothed density at t + 1 and the state prediction density p([x.sub.t+1]|[x.sub.t], [theta]). Clearly, this approach to smoothing involves a forward filtering step and a backward smoothing step. Assuming that the smoothed density function at time t, can be approximated using the following particle approximation

p([x.sub.t]|[y.sub.1:T], [theta]) = [N.summation over (i=1)] [w.sup.(i).sub.t\T] [delta]([x.sub.t] - [x.sup.(i).sub.t])

one can derive the following recursive particle approximation of the smoothed density function (Klaas et al., 2006),

[w.sup.(i).sub.t\T] = [w.sup.(i).sub.t\T][[N.summation over (j=1)] [w.sup.(i).sub.t+1\T] p([w.sup.(i).sub.t+1|[w.sup.(i).sub.t]), [theta])/[N.summation over (k=1)] [w.sup.(k).sub.t\t]p[x.sup.(j).sub.t+1]| [x.sup.(k).sub.t], [theta]] (28)

Particle Smoother Algorithm-Full Data

1. Filtering: For t = 1 to t = T perform filtering according to the algorithm in the previous section and obtain weights [w.sup.(i).sub.t\t] for all i.

2. Initialization: Initialize the smoother weights at t = T to [w.sup.(i).sub.T\T] = 1/N for all i.

3. Snwothing: Find the smoothed weights recursively using (28).

Missing data

It is easy to see that the backward recursion of the smoothing algorithm does not depend on the observations while the forward recursion depends on the observations. Therefore, the only modification needed in the smoothing algorithm is usage of missing data weights of the filtering density. The algorithm is summarized for the sake of completeness.

Particle Smoother Algorithm--Missing Data

1. Filtering: For t = 1 to t = T perform filtering according to the algorithm in the previous section and obtain weights wrier for all i.

2. Initialization: Initialize the smoother weights at t = T to [[bar.w].sup.(i).sub.T\T = 1/N.

3. Smoothing: Find the smoothed weights using (28) by replacing [w.sup.(i).sub.t\t] by [[bar.w].sup.(i).sub.t\t].

Joint Distribution of [x.sub.t], [x.sub.t+1]

The joint density function between [x.sub.t] and [x.sub.t+1] can be obtained by using (27)

p([x.sub.t], [x.sub.t+1]|y.sub.1:T], [theta]) = p([x.sub.t]|[y.sub.1:t], [theta]) p([x.sub.t+1]|[y.sub.1:T], [theta])p([x.sub.t+1]|[x.sub.t], [theta])/[integral] p([x.sub.t+1]|[x.sub.t], [theta])p([x.sub.t]|[y.sub.1:t], [theta])[dx.sub.t] (29)

Substituting particle approximations of p([x.sub.t]|[y.sub.1:t], [theta]) and P([x.sub.t+1]|[y.sub.1:T], [theta]) in (29), the following approximation of the joint distribution can be obtained,

p([x.sub.t], [x.sub.t+1]|[y.sub.1:T], [theta]) = [N.summation over (j=1)] [N.summation over (j=1)] [w.sup.(ij).sub.t,t+1} [delta]([x.sub.t] - [x.sup.(i).sub.t]) [delta]([x.sub.t+l] - [x.sup.(j).sub.t+1)

where

[w.sup.(ij).sub.t,t+1] = [w.sup.(i).sub.t\t] [w.sup.(j).sub.t+1\T] p[w.sup.(j).sub.t+1]|[w.sup.(i).sub.t], [theta])/[N.summation over (k=1)] [w.sup.(K).sub.t\t]p [w.sup.(j).sub. t+1]| [x.sup.(k).sub.t], {theta]]

The above approximation is found to be computationally very expensive, and hence it is replaced with the following approximation that has fewer particles,

p([x.sub.t], [x.sub.t+1]1|[y.sub.1:T], [theta]) = [N.summation over [i=1] [w.sup.(i).sub.t,t+1] [delta]([x.sub.t] - [x.sup.(i).sub.t+1])

where

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]

Similarly, for the missing data case simply replace the weights of the filter in the above equation by those corresponding to missing data.

Joint Distribution of [x.sub.t], [y.sub.t]

The joint distribution, [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] is required at sample times where the observations are missing.

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] (30)

In the above expression, the first term on the right hand side, P([y.sub.t]|[x.sub.t], [theta]), is the density function of the observations given the state, and the second term, [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII], is the smoother. Since [y.sub.t] is missing, it is possible to obtain an estimate of [y.sub.t] for a given [x.sub.t] from the density function p([y.sub.t]|[x.sub.t], [theta]). Therefore, one can use the following particle approximation of p([y.sub.t]|[x.sub.t], [theta])

p([y.sub.t]|[x.sub.t], [theta]) ~ p([y.sup.(i).sub.t]|[x.sup.(i).sub.t]), [theta])[delta]([x.sub.t] - [x.sup.(i).sub.t])[delta]([y.sub.t] - [y.sup.(i).sub.t]). (31)

It is in place to mention that a better approximation of p([y.sub.t]|[x.sub.t], [theta]) can be obtained by using more than one estimate of [y.sub.t] for every [x.sup.(i).sub.t]. While this approach increases the accuracy, it comes at an increase in computational burden, and hence the approximation in (31) is used. The required joint distribution can now be approximated as

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]

where

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] (32)

While basic algorithms for particle filter approximations of various density functions are presented in this section, there are a number of implementational issues that need to be considered and are problem specific. For instance, a prior that spans the support of the likelihood function needs to be chosen for convergence of the particle filter approximations (Doucet et al., 2001). Some of these implementational issues, including suggestions for tuning the state and measurement noise, are discussed in Imtiaz et al. (2006).

IDENTIFICATION ALGORITHM

By combining the equations for the filter, smoother and the joint density functions, one can approximate the Q-function. Once an approximation of the Q-function is available, it is possible to maximize it with respect to the parameter vector and obtain the next iterate of the EM algorithm. By substituting the particle approximations of various density functions derived in the previous section in (13), an approximate Q-function can be written as

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] (33)

Then the EM algorithm can be summarized as

1. Initialization: Initialize the parameter vector to [theta].sub.0]. Set i = 0.

2. Expectation: Evaluate the approximate Q-function according to (33) using [theta]' = [[theta].sub.i].

3. Maximization: Maximize the Q-function with respect to 0 and call the maximizing parameter, [[theta].sub.i+1]. Then set [theta] = [[theta].sub.i+1].

4. Iterate: Repeat steps 1 and 2 until the change in parameter vector is within a specified tolerance level.

For certain structures of the nonlinear state-space models, it is possible to explicitly evaluate the maximum [theta]. However, this is not always the case. If explicit solutions are not possible, then any standard nonlinear maximization routine can be used. The downside of this approach is that there are two levels of iterations--one at the EM algorithm level and the other at the maximization level. These nested iterations increase the computational burden. Therefore, this approach may not be suitable for online model estimation. It is efficient in handling nonlinear processes when batch identification is needed.

As mentioned earlier, the EM iterations are guaranteed to converge if the exact value of the Q-function is used. Since an approximation of the Q-function is used in this example, the iterations do not converge to a single value of [theta]. Instead [theta] converges to a distribution. In other words, after a large number of iterations, the sequence of parameter estimates, [[theta].sub.i], appear to be drawn from a probability density function whose mean is the true parameter vector. While EM algorithm is guaranteed to converge in its original form, there is no guarantee that it converges to the global optimal value. However, stochastic EM algorithms, such as the one proposed in this article, appear to converge to the global optimal value more often than other optimization methods. This feature has been observed through simulations in Schbn et al. (2006).

An inherent disadvantage with any method for identification under missing observations is that it may lead to biased parameter estimates. However, in Little and Rubin (1987) it is shown that unbiased parameter estimates can be obtained for Markov models with no external input, if the observations are missing randomly. Similar results could potentially be derived for the state-space model with external input used in this article. Intuitively speaking, one would suspect that if the input is white and the missing data are completely random, as is the case with the examples presented in this article, there would be no bias. Although there is no proof as yet, this is corroborated by simulation examples in the article. Despite this disadvantage, the proposed method is useful, and is one of the very few methods available for identification of nonlinear state-space models under missing observations. Therefore, caution is advised in applying the proposed approach on data with a large fraction of missing observations. It is felt that appropriately designed experiments would potentially mitigate the effect of missing data on the parameter bias. Experiment design under missing observations remains an unsolved problem.

In many processes, the model parameters often change due to a variety of reasons. The proposed approach could potentially be extended to adaptively update model parameters. However, a discussion of such an extension to the method is beyond the scope of this article.

COMPUTATIONAL COST

In general, particle algorithms incur high computational cost. The particle filter algorithm presented in this article requires calculations of the order of O(NT) while the particle smoother algorithm and the algorithm for the joint distribution between [x.sub.t], [y.sub.t] incur approximately O([N.sup.2]T) calculations. Alternative smoother algorithms, such as the one in Godsill et al. (2004), which incur only O(NT) have also been developed. However, these algorithms provide samples of the smoothed state trajectories, and not the distribution of the individual states. Therefore, their use in the identification method developed in this article is quite inefficient (Andrieu et al., 2004). Even though the algorithms presented in this article are computationally intensive, computational intensity or the time taken to run these algorithms may not be of any consequence if they are run as batch algorithms. However, particles sizes of the order of 1000 can result in many hours of running time on the fastest desktops available when the algorithms are implemented in MATLAB.

The most time consuming step in these particle based algorithms in presence of exponential density functions, is the evaluation of functions of following form,

[f.sup.(i)] = [N.summation over (j=1) [g.sup.(j)]p([x.sup.(i).sub.t]|[x.sup.(j).sub.t-1]).

where [g.sup.(j)] are some known constants. This function can be approximated to a specified accuracy, using fast computational methods (Klaas et al., 2006). Due to lack of space, instead of delving into these methods, it is simply pointed out that the algorithms developed in this article are speeded up considerably by using fast approximations of the function form shown above.

Furthermore, incorporating the recently developed fast filtering and smoothing algorithms and using code written in C++ is likely to considerably decrease the computational time. In fact, the authors in Klaas et al. (2006) reported running fast filtering and smoothing algorithms with millions of particles in a matter of few minutes.

EXAMPLES

In this section, two examples are presented to illustrate the algorithms developed in this article. The first example is taken from Goodwin and Aguero (2005) and the second example is a chemical reactor from Momingred et al. (1992).

Example: Synthetic

Consider the following nonlinear process (Goodwin and Agiiero, 2005)

[x.sub.t+1] = [ax.sub.t] + [bu.sub.t] + [w.sub.t] [Y.sub.t] = ccos([x.sub.t]) + [v.sub.t] (34)

where [w.sub.t] ~ (0, Q), [v.sub.t] ~ (0,R),and a = 0.9, b = c = 1, Q = R = 0.1. In order to compare the results from the proposed algorithm with those reported in Goodwin and Aguero (2005), similar simulation conditions are used to the extent possible. As in Goodwin and Aguero (2005), the following initial parameter estimates are used [??] = [??] = [??] = 0.5, [??] = [??] = 0.05 with an input variance of unity. The algorithm presented in this article is then applied on data simulated from the model in (34). In the first simulation experiment, T = 100 measurements are collected, and all the available data are used in the algorithm with N = 150 particles. The model parameters converged to a neighbourhood of the true parameters in about 100 iterations. The trajectory of the parameters is shown in Figure 1. The trajectory of the log-likelihood function is shown in Figure 2. In the original form of EM algorithm, the likelihood function is expected to increase after each iteration. However, as seen in Figure 2, the likelihood function is not a monotonically increasing function. This feature of the approximate EM algorithm proposed in this article is due to the fact that only an approximation of the expectation algorithm is used and not the exact expected value of the complete likelihood function. In the second experiment on this model, 10% of the available measurements are removed randomly. In other words, an observation is taken if a uniformly distributed random variable, q, in the interval [0,1] is less than 0.1. Similar experiments are conducted with 25% and 50% of the data missing. The trajectories of the log-likelihood function are shown in Figure 2. The parameter values after 100 iterations from each of these experiments are shown in Table 1. In all the experiments, estimated parameters settled into a neighbourhood of the true parameters. Unlike traditional EM algorithm, where the variance in the estimated parameter is due to the noise in the process, there is an additional source of variance due to the particle filter approximation. The variance due to noise can be decreased by increasing the data size, and the variance due to particles can be decreased by increasing the number of particles.

[FIGURE 1 OMITTED]

[FIGURE 2 OMITTED]

As the percentage of missing data increases, the algorithm takes lot more iterations to settle close to the true parameters (see Figure 2). In fact, the experiment with 50% missing data does not settle into a neighbourhood of the true log-likelihood even after 100 iterations. The relatively noticeable variance in the Q-function is due to the small data length. In the next example, a much larger data set is chosen resulting in smaller variance in the Q-function.

Example: Adiabatic CSTR

The governing equations of a popular CSTR shown in Figure 3 are given below (Morningred et al., 1992; Chen, 2004)

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]

where [C.sub.A] is the concentration of the reactant in the reactor, T the temperature in the reactor, q the flow rate, V the volume of the reactor, [C.sub.Ai] and [T.sub.i] are inflow concentration and temperature, [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] is the reaction rate, [DELTA]H is the reaction heat, [rho] and [[rho.sub.c] are the densities of the reactant and the cooling fluid respectively, [C.sub.p] and [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] are the corresponding specific heats, h and A are the effective heat transfer coefficient and area respectively, [T.sub.c] and [q.sub.c] are the temperature and flow rate of the cooling fluid. Finite difference discretization of the above continuous time differential equations results in the model in (1)-(2), where the state vector is [x.sub.t] = [[C.sub.A](t) T(t)],and

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]

where [[theta].sub.1] = [k.sub.0], [[theta].sub.2] = ([k.sub.0] [DELTA]H)/([rho][C.sub.p]), [[theta].sub.3] = hA, R([x.sub.t], [u.sub.t], [theta]) = [x.sub.t] and [DELTA]t is the discretization sample time. The parameters and operating conditions of this CSTR are given in Henson and Seborg (1997) and are reproduced in Table 2. For simulation purposes, the CSTR is operated around a steady state corresponding to [C.sub.A] = 0.1 mol/L and T = 438.54 K with the following noise covariance matrices Q = 2.5 x [10.sup.-7] [0.1 0;0 1] and R = 2.5 x [10.sup.-5] [0.1 0;0 1] and discretizing sample time, [DELTA]t = 0.02 is chosen. In order to reduce the number of parameters, the state and measurement covariance matrices are parameterized as follows: Q = [q.sup.2] x [10.sup.-5] [0.1 0;0 1] and R = [r.sup.2] x [10.sup.-3] [0.1 0;0 1]. The initial guess for the parameter vector is [[theta].sub.1] = 6 x [10.sup.10], [[theta].sub.2] = 14.4 x [10.sup.12], [[theta].sub.3] = 6 x [10.sup.2], q = [square root of 0.05], and r = [square root of 0.05].

[FIGURE 3 OMITTED]

This example posed a couple of unforeseen challenges. It is found that 'large' levels of noise (of the order of variance greater than [10.sup.-8]) in the state equation lead to an unstable system. On the other hand, it is well-known that small noise levels result in an EM algorithm that is extremely slow at converging (Petersen et al., 2005). In fact during simulations, the Q-function barely changes even after 20 iterations. Therefore, in order to speed up the EM algorithm, an overrelaxed EM algorithm (Salakhutdinov et al., 2003) is implemented. The idea behind overrelaxed algorithm is to hasten the movement of parameter vector in the direction in which it is moving by taking steps larger than those suggested by the EM algorithm. If these large steps result in a decrease in the value of Q-function, then the corresponding value of [theta] is thrown away and the basic version of EM algorithm is started again from the previous value of parameter vector. The overrelaxed EM algorithm is summarized below:

Overrelaxed EM Algorithm

(1) Set [zeta] = 1; Q ([[theta].sub.0], [[theta].sub.0]) = -[infinity]; [delta] = tol; [gamma] > 1 and Define delta] = (Q([[theta].sub.i], [[theta].sub.i]) - Q([[theta].sub.i-1], [[theta].sub.i-1))/|Q([[theta].sub.i], [[theta].sub.i])|

(2) Evaluate Q([[theta].sub.i], [[theta].sub.i]).

(3) Maximize Q([[theta].sub.i], [theta]) with respect to [theta]. Call the maximizing value, [[theta].sup.EM.sub.i+1].

(4) If i > 1 then--If [delta] > tol, then update [zeta] = [gamma][zeta] else if [delta] < tol set [zeta] = 1.

(5) Set [[theta].sub.i+1] = [[theta].sub.i] + [zeta]([[theta].sup.EM.sub.i+1] - [[theta].sub.i]).

(6) Repeat steps (2), (3), (4), (5) until EM algorithm convergence criteria are satisfied.

A plot showing the Q-function as a function of the iterations is shown in Figure 4. As in the previous example, as the percentage of missing data increases, the algorithm gets slower. The parameter values after 300 iterations are shown in Table 3. All the parameters, except [[theta].sub.3], converge to a neighbourhood of the true parameter values. It is found that the sensitivity of the estimated log-likelihood function to changes in [[theta].sub.3] is smaller than its variance. This is supported by the fact that the true average log-likelihood function is 24.2631, while the Q-function in the simulations converges to a neighbourhood of this true value (about 24.1 from Figure 4) even though [[theta].sub.3] is not in the neighbourhood of its true value. A plot of the [zeta] value as a function of the iterations, is shown in Figure 5. The larger values of [zeta] initially speeded up the EM algorithm by "pushing harder" the parameter vector in the direction suggested by the EM algorithm.

[FIGURE 4 OMITTED]

[FIGURE 5 OMITTED]

CONCLUSIONS

An identification algorithm based on expectation-maximization is developed for nonlinear state-space models to handle missing observations. The expectation step in the EM algorithm is performed by using particle approximations of state filter, smoother, and a joint density function between the state and observations. The convergence of EM algorithm depends on the percentage of missing observations. The higher the missing observations, the slower the EM algorithm. The proposed algorithm is computationally intensive, however, this problem is mitigated to an extent by using fast computational algorithms for exponential functions.

ACKNOWLEDGEMENTS

The author would like to thank Vidya Koramraju and Vinay Kotamraju for suggestions and proofreading.

END NOTES

The variable plotted is proportional to the average log-likelihood function.

APPENDIX

The following formulae, where x and y are any two random variables, are repeatedly used in this article (Papoulis, 2002):

p(x,Y) = p(x|y)p(y) = p(y|x)p(x) (35)

p(x) = [integral] p(x, y)dy (36)

p([x.sub.1], [x.sub.2]) = p([x.sub.1]) if [x.sub.1], is independent of [x.sub.2] (37)

Derivation of (27): Using (36), one can write the smoother as,

p([x.sub.t]|[y.sub.1:T, [theta]) = [integral] p([x.sub.t], [x.sub.t+1]|y.sub.1:T], [theta])[dx.sub.t+1] (38)

and now using (35) and (38),

p([x.sub.t]|[y.sub.1:T], [theta]) = [integral] p([x.sub.t+1]|[y.sub.1:T], [theta])p([x.sub.t][x.sub.t+1], [theta])[dx.sub.t+1] (39)

Using (35) and observing that

P([x.sub.t+1]|[x.sub.t], [y.sub.1:t], [theta]) = p([x.sub.t+1]|[x.sub.t], [theta]),

one can write,

p([x.sub.t]|[x.sub.t+1], [y.sub.1:t], [theta]) = p([x.sub.t][y.sub.1:t], [theta])p([x.sub.t+1]|[x.sub.t], [theta])/ p([x.sub.t+1]|[y.sub.1:t], theta]) (40)

The expression in (27) is obtained by noting that,

P([x.sub.t+1]|[y.sub.1:t, [theta]) = [integral] p([x.sub.t], [x.sub.t+1]|[y.sub.1:t, [theta])[dx.sub.t] = [integral] p([x.sub.t+1]|[x.sub.t], [theta])p([x.sub.t|[y.sub.1:t], [theta])[d.sub.t] (41)

and substituting (41) in (40), and (40) in (39), the expression in (27) is obtained.

Manuscript received October 5, 2007; revised manuscript received July 13, 2008; accepted for publication May 31, 2008.

REFERENCES

Andrieu, C., A. Doucet, S. S. Singh and V B. Tadic, "Particle Methods for Change Detection, System Identification, and Control," Proc. IEEE 92, 423-438 (2004).

Casella, G. and R. L. Berger, "Statistical Inference," Brooks/Cole Publishing Company, Pacific Grove, California (1990).

Celeux, G. and J. Diebolt, "The SEM Algorithm: A Probabilistic Teacher Algorithm Derived From the em Algorithm for the Mixture Problem," Comput. Stat. Q. 2, 73-82 (1985).

Chen, W S., "Bayesian Estimation by Sequential Monte Carlo Sampling For Nonlinear Dynamical Systems," Ph.D. Thesis, Ohio State University, 2004.

Chen, T., J. Morris and E. Martin, "Particle Filters for State and Parameter Estimation in Batch Processes," J. Process Control 15(6), 665-673 (2005).

Dempster, A. P., N. M. Laird and D. B. Rubin, "Maximum Likelihood From Incomplete Data Via the EM Algorithm," J. R. Stat. Soc. B 39, 1-38 (1977).

Doucet, A. and V B. Tadic, "Parameter Estimation in General State-Space Models Using Particle Methods," Ann. Inst. Stat. Math. 55(2), 409-422 (2003).

Doucet, A. and V B. Tadic, "Maximum Likelihood Parameter Estimation in General State-Space Models Using Particle Methods," Annals Inst. Stat. Math. 55, 409-422 (2003).

Doucet, A., N. de Freitas and N. Gordon, "Sequential Monte Carlo Methods in Practice," Springer, New York (2001).

Gibson, S., A. Wills and B. Ninness, "Maximum-Likelihood Parameter Estimation of Bilinear Systems," IEEE Trans. Inf. Theory 50(10), 1581-1596 (2005).

Godsill, S. J., A. Doucet and M. West, "Monte Carlo Smoothing for Nonlinear Time Series," J. Am. Stat. Assoc. 99(465), 156-168 (2004).

Goodwin, G. C. and J. C. Aguero, Approximate EM algorithms for parameter and state estimation in nonlinear stochastic models. "Proceedings of IEEE Conference on Decision and Control," 2005, pp. 368-373.

Gudi, R. D., S. L. Shah and M. R. Gray, "Adaptive multirate state and parameter estimation strategies with application to a bioreactor," AIChE J. 41(11), 2451-2464 (1995).

Haber, R. and L. Keviczky, "Nonlinear System Identification: Input-Output Modelling Approach," Kluwer Academic Publishers, New York (1999).

Henson, M. A. and D. E. Seborg, "Nonlinear Process Control," Prentice Hall, New Jersey, USA (1997).

Imtiaz, S. A., R. Kallol, B. Huang, S. L. Shah and P. Jampana, Estimation of states of nonlinear systems using a particle filter. "Proceedings of IEEE International Conference on Industrial Technology," 2006, pp. 2432-2437.

Isaksson, A. J., "Analysis of Identified 2-D Noncausal Models," IEEE Trans. Inf. Theory 39, 525-534 (1993).

Kitagawa, G., "A Self-Organizing State-Space Model," J. Am. Stat. Assoc. 93, 1203-1215 (1998).

Klaas, M., M. Briers, D. Nando and A. Doucet, Fast particle smoothing: If I had a million particles. "International Conference on Machine Learning," 2006.

Li, D., S. L. Shah and T. Chen, "Identification of Fast Rate Models From Multi-Rate Data," Int. J. Control 680-689 (2001).

Li, D., S. L. Shah, T. Chen and K. Qi, "Application of Dual-Rate Modeling to CCR Octane Quality Inferential Control," IEEE Trans. Control Syst. Technol. 11(1), 43-51 (2003).

Little, R. J. A. and D. B. Rubin, "Statistical Analysis With Missing Data," J. Wiley & Sons, Hoboken, NJ (1987).

Ljung, L., "System Identification: Theory for the User," Prentice Hall, NJ (1999).

Ljung, L. and A. Vicino, (Eds). "Special Issue on System Identification," IEEE Trans. Automatic Control 50, 10 (2005).

McLachlan, G. J. and T. Krishnan, "The EM Algorithm and Extensions," Wiley, Hoboken, NJ (1997).

Momingred, J. D., B. E. Paden, D. E. Seborg and D. A. Mellichamp, "An Adaptive Nonlinear Predictive Controller," Chem. Eng. Sci. 47, 755-762 (1992).

Papoulis, A., "Probability, Random Variables and Stochastic Processes," McGraw Hill, New York (2002).

Petersen, K. B., O. Winther and K. L. Hansen, "On the Slow Convergence of em and vbem in Low-Noise Linear Models," Neural. Comput. 17(9), 1921-1926 (2005).

Raghavan, H., A. K. Tangirala, R. B. Gopaluni and S. L. Shah, "Identification of Chemical Processes With Irregular Output Sampling," Control Eng. Pract. 14(5), 467-480 (2006).

Rawlings, J. B. and B. R. Bakshi, "Particle Filtering and Moving Horizon Esimaion," Comput. Chem. Eng. 30(10-12), 1529-1541 (2006).

Roweis, S. and Z. Ghahramani, "Kalman Filtering and Neural Networks," chapter Learning Nonlinear Dynamical Systems Using the EM algorithm. John Wiley, Hoboken, NJ (2001).

Salakhutdinov, R., S. T. Roweis and Z. Ghahramani, Adaptive overrelaxed bound optimization methods. "Proceedings of International Conference on Machine Learning," 2003, pp. 664-671.

Schon, T. B., A. Wills and B. Ninness, Maximum likelihood nonlinear system estimation. "Proceedings of IFAC Symposium on System Identification," 2006, pp. 1003-1008.

Seborg, D. E., T. F. Edgar and D. A. Mellichamp, "Process Dynamics and Control," John Wiley, New Jersey (2004).

Shumway, R. H. and D. S. Stoffer, "An Approach to Time Series Smoothing and Forecasting Using the em Algorithm," J. Time Ser. Anal. 3, 253-264 (1982).

Shumway, R. H. and D. S. Stoffer, "Time Series Analysis and Its Applications," Springer, New York (2000).

Soroush, M., "State and Parameter Estimations and Their Applications in Process Control," Comput. Chem. Eng. 23 (2), 229-245 (1998).

Tatiraju, S., M. Soroush and R. Mutharasan, "Multi-Rate Nonlinear State and Parameter Estimation in a Bioreactor," Biotechnol. Bioeng. 63(1), 22-32 (1999).

Wahlberg, B., L. Ljung and T. Soderstrom, "On Sampling of Continuous Time Stochastic Processes," Control Theory Adv. Technol. 9(1), 99-112 (1993).

Wang, L. and P. Gawthrop, "On the Estimation of Continuous Transfer Functions," Int. J. Control 74(9), 889-904 (2001).

Wei, G. C. G. and M. Tanner, "A Monte-Carlo Implementation of the em Algorithm and the Poor Man's Data Augmentation Algorithms," J. Am. Stat. Assoc. 85(411), 699-704 (1990).

Wu, C. F., "On the Convergence Properties of the EM Algorithm," Ann. Stat. 11, 95-103 (1983).

Yuz, J. I. and G. C. Goodwin, "On Sampled-Data Models for Nonlinear Systems," IEEE Trans. Automatic Control 50(10), 1477-1489 (2005).

R. B. Gopaluni * Department of Chemical and Biological Engineering, University of British Columbia, Vancouver, Canada V6T 1Z3

* Author to whom correspondence may be addressed. E-mail address: gopaluni@achml.ubc.ca

Table 1. Parameter values after 100 iterations Parameter % missing data 0% 10% 25% 50% a 0.8985 0.8969 0.8936 0.8972 b 1.0366 1.0353 1.0511 0.9997 c 0.9712 0.9763 1.0043 0.9819 Q 0.0992 0.0968 0.1082 0.0942 R 0.0817 0.0859 0.0769 0.0772 Table 2. CSTR parameters Param Value q 100 L/min V 100 L [C.sub.Ai] 1 mol/L [k.sub.0] 7.2 x [10.sup.10] [min.sup.-1] [E.sub.a]/R 10000 K [T.sub.i] 1350 K Param Value [DELTA]H -2 x [10.sup.4] cal/mol [rho], [[rho].sub.c] 1000 g/L [C.sub.p], [C.sub.pc] 1 cal/g/K hA 7 x [10.sup.5] cal/[cm.sup.2]/min/K A 10 [cm.sup.2] [T.sub.c] 305 K Table 3. Parameter values after 300 iterations Parameter % missing data 0% 10% 25% 50% [[theta].sub.1] x [10.sup.-10] 7.2001 7.1996 7.1999 7.2019 [[theta].sub.2] x [10.sup.-12] 14.4031 14.4065 14.4067 14.4069 [[theta].sub.3] x [10.sup.-2] 8.6 45.25 13.56 8.92 q 0.1629 0.1657 0.1624 0.1795 r 0.1601 0.1587 0.1585 0.1594

Printer friendly Cite/link Email Feedback | |

Author: | Gopaluni, R.B. |
---|---|

Publication: | Canadian Journal of Chemical Engineering |

Date: | Dec 1, 2008 |

Words: | 9358 |

Previous Article: | Flow regime determination in horizontal hydrotransport using non-intrusive acoustic probes. |

Next Article: | Optimization by response surface methodology (RSM) for toluene adsorption onto prepared acid activated clay. |