


In probability theory and related fields, a Markov process (named after the Russian mathematician Andrey Markov) is a stochastic process that satisfies the "memorylessness" property: predictions about the future of the process can be made based solely on its present state, independently of its history. A Markov chain is a Markov process with a discrete state space. Markov chains have many applications as statistical models of real-world problems, such as counting processes, queuing systems, exchange rates of currencies, storage systems, population growth and other applications in Bayesian statistics.

Monte Carlo methods perform many simulations using random numbers and probability in order to approximate the answer to a problem that is otherwise too complicated to solve analytically. Such methods rely on "long run" approximations, based on computer random number generators. Monte Carlo methods can be used for (but are not restricted to) the computation of probabilities, expected values and other distribution characteristics.

1.1. Preliminaries

We recall a few notions from Probability Theory that will be needed later.

Let S be the sample space of some experiment, i.e. the set of all possible outcomes of that experiment (called elementary events and denoted by $e_i$). Let P be a probability mapping (see [4]).

Definition 1.1. A random variable is a function $X : S \to \mathbb{R}$ for which $P(X \le x)$ exists, for all $x \in \mathbb{R}$.

If $X(S) \subseteq \mathbb{R}$ is at most countable, then X is a discrete random variable; otherwise, it is a continuous random variable.

If X is a discrete random variable, then a better way of describing it is to give its probability distribution function (pdf) or probability mass function (pmf), an array that contains all its values $x_i$ and the corresponding probabilities $p_i = P(X = x_i)$ with which each value is taken,

$$X \sim \begin{pmatrix} x_1 & x_2 & \cdots \\ p_1 & p_2 & \cdots \end{pmatrix}, \qquad \sum_i p_i = 1. \qquad (1.1)$$

Of the discrete probability laws, we recall two of the most widely used.

Bernoulli distribution Bern(p), with parameter $p \in (0,1)$. This is the simplest of distributions, with pdf

$$X \sim \begin{pmatrix} 0 & 1 \\ 1-p & p \end{pmatrix}. \qquad (1.2)$$

It is used to model "success/failure" (i.e. a Bernoulli trial), since many distributions are described in such terms.

Binomial distribution B(n, p), with parameters $n \in \mathbb{N}$, $p \in (0,1)$. Consider a series of n Bernoulli trials with probability of success p in every trial ($q = 1 - p$). Let X be the number of successes that occur in the n trials. Then X has a Binomial distribution, with pdf

$$P(X = k) = \binom{n}{k} p^k q^{n-k}, \qquad k = 0, 1, \dots, n. \qquad (1.3)$$

Note that a Binomial B(n, p) variable is the sum of n independent Bern(p) variables and that Bern(p) = B(1, p).
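This additivity is easy to verify by simulation. The sketch below (in Python with NumPy, used here purely for illustration, since the paper's own code is MATLAB) compares the empirical distribution of a sum of n Bern(p) variables with the Binomial probability from (1.3); all numeric values are illustrative choices.

```python
import numpy as np
from math import comb

rng = np.random.default_rng(0)
n, p, reps = 10, 0.3, 100_000

# sum of n independent Bern(p) variables, repeated 'reps' times
sums = rng.binomial(1, p, size=(reps, n)).sum(axis=1)

# empirical P(X = 3) vs. the Binomial probability C(n,3) p^3 q^(n-3)
k = 3
empirical = np.mean(sums == k)
exact = comb(n, k) * p**k * (1 - p)**(n - k)
print(empirical, exact)  # the two agree to about two decimal places
```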

Let us also recall the notion of conditional probability and related properties.

Definition 1.2. Let A and B be two events with $P(B) \ne 0$. The conditional probability of A, given B, is defined as

$$P(A \mid B) = \frac{P(A \cap B)}{P(B)}.$$

The next result is known as the total probability rule.

Proposition 1.3. Let $\{A_i\}_{i \in I}$ be a partition of S, i.e. $\bigcup_{i \in I} A_i = S$ (the events $\{A_i\}_{i \in I}$ are exhaustive) and $A_i \cap A_j = \emptyset$ for $i \ne j$ (the events $\{A_i\}_{i \in I}$ are mutually exclusive). Let E be any event and B any event with $P(B) \ne 0$. Then

$$P(E) = \sum_{i \in I} P(A_i) P(E \mid A_i), \qquad P(E \mid B) = \sum_{i \in I} P(A_i \mid B) P(E \mid A_i \cap B). \qquad (1.4)$$
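A quick numeric sanity check of the first formula in (1.4) can be sketched as follows (Python, for illustration; the partition of size two and the probability values are made up):

```python
# Partition S into A_1, A_2 (a two-stage experiment: pick A_i,
# then E occurs with probability P(E | A_i)).
P_A = [0.4, 0.6]          # P(A_1), P(A_2) -- a partition, sums to 1
P_E_given_A = [0.1, 0.5]  # P(E | A_1), P(E | A_2)

# total probability rule: P(E) = sum_i P(A_i) P(E | A_i)
P_E = sum(pa * pe for pa, pe in zip(P_A, P_E_given_A))
print(P_E)  # 0.4*0.1 + 0.6*0.5 = 0.34
```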


Random variables describe random phenomena at a particular moment in time, but many quantities change and evolve in time (think of air temperatures, stock prices, currency rates, CPU usage, etc.). Stochastic processes are, essentially, random variables that develop and change in time.

Definition 2.1. A stochastic process is a random variable that also depends on time. It is denoted by $X(t, e)$ or $X_t(e)$, where $t \in T$ is time and $e \in S$ is an outcome. The values of $X(t, e)$ are called states.

If $t \in T$ is fixed, then $X_t$ is a random variable, whereas if we fix $e \in S$, then $X(\cdot, e)$ is a function of time, called a realization or sample path of the process $X(t, e)$.

Definition 2.2. A stochastic process is called discrete-state if $X_t(e)$ is a discrete random variable for all $t \in T$, and continuous-state if $X_t(e)$ is a continuous random variable for all $t \in T$.

Similarly, a stochastic process is said to be discrete-time if the set T is discrete, and continuous-time if the set of times T is a (possibly unbounded) interval in $\mathbb{R}$.

Throughout the paper, we will omit writing e as an argument of a stochastic process (as is customary when writing random variables).

2.1. Markov Processes and Markov Chains; Transition Probability Matrix

Definition 2.3. A stochastic process $X_t$ is Markov if for any times $t_1 < t_2 < \cdots < t_n < t$ and any sets $A_1, A_2, \dots, A_n, A$,

$$P(X_t \in A \mid X_{t_1} \in A_1, \dots, X_{t_n} \in A_n) = P(X_t \in A \mid X_{t_n} \in A_n). \qquad (2.1)$$

This means that the conditional distribution of $X_t$, given observations of the process at several moments in the past, is the same as the one given only the latest observation.

Definition 2.4. A discrete-state, discrete-time Markov stochastic process is called a Markov chain.

To simplify the writing, we use the following notations. Since a Markov chain is a discrete-time process, we can see it as a sequence of random variables $\{X_0, X_1, \dots\}$, where $X_k$ describes the situation at time $t = k$. It is also a discrete-state process, so we denote the states by $1, 2, \dots, n$ (they may start at 0 or some other value, and n may possibly be $\infty$).

Then the random variable $X_k$ has the pdf

$$X_k \sim \begin{pmatrix} 1 & 2 & \cdots & n \\ P_k(1) & P_k(2) & \cdots & P_k(n) \end{pmatrix}, \qquad (2.2)$$

where $P_k(1) = P(X_k = 1), \dots, P_k(n) = P(X_k = n)$. Since the states (the values of the random variable $X_k$) are the same for each k, one only needs the second row to describe the pdf. So, let

$$P_k = [P_k(1) \ P_k(2) \ \cdots \ P_k(n)] \qquad (2.3)$$

denote the vector on the second row of (2.2).

The Markov property (2.1) can now be written as

$$P(X_{t+1} = j \mid X_t = i, X_{t-1} = l, \dots) = P(X_{t+1} = j \mid X_t = i), \quad \text{for all } t \in T. \qquad (2.4)$$

We summarize this information in a matrix.

Definition 2.5.

- The conditional probability

$$p_{ij}(t) = P(X_{t+1} = j \mid X_t = i) \qquad (2.5)$$

is called a transition probability; it is the probability that the Markov chain transitions from state i to state j at time t. The matrix

$$P(t) = \begin{pmatrix} p_{11}(t) & p_{12}(t) & \cdots & p_{1n}(t) \\ \vdots & \vdots & & \vdots \\ p_{n1}(t) & p_{n2}(t) & \cdots & p_{nn}(t) \end{pmatrix} \qquad (2.6)$$

is called the transition probability matrix at time t.

- Similarly, the conditional probability

$$p^{(h)}_{ij}(t) = P(X_{t+h} = j \mid X_t = i) \qquad (2.7)$$

is called an h-step transition probability, i.e. the probability that the Markov chain moves from state i to state j in h steps, and the matrix

$$P^{(h)}(t) = \left( p^{(h)}_{ij}(t) \right)_{i,j=\overline{1,n}} \qquad (2.8)$$

is the h-step transition probability matrix at time t.

Definition 2.6. A Markov chain is homogeneous if all transition probabilities are independent of time,

$$p_{ij}(t) = p_{ij}, \qquad p^{(h)}_{ij}(t) = p^{(h)}_{ij}, \qquad \text{for all } t \in T,$$

in which case we simply write P and $P^{(h)}$ for the corresponding matrices.

Throughout the rest of the paper, we will only refer to homogeneous Markov chains (even when not explicitly stated).

Proposition 2.7. Let $\{X_0, X_1, \dots\}$ be a homogeneous Markov chain. Then the following relations hold:

$$P^{(h)} = P^h, \quad \text{for all } h = 1, 2, \dots,$$

$$P_t = P_0 \cdot P^t, \quad \text{for all } t = 0, 1, \dots \qquad (2.9)$$


Proof. The first relation in (2.9) is proved by induction. It is obviously true for $h = 1$. Assume $P^{(h-1)} = P^{h-1}$. For a matrix M, we use the notation $[M]_{ij} = M(i,j)$ and, similarly, for a vector v, $[v]_i = v(i)$. Since the events $\{(X_{h-1} = k)\}_{k=\overline{1,n}}$ form a partition, using the total probability rule (1.4) for $p^{(h)}_{ij} = P(X_h = j \mid X_0 = i)$ and the Markov property, we have

$$p^{(h)}_{ij} = \sum_{k=1}^{n} P(X_{h-1} = k \mid X_0 = i)\, P(X_h = j \mid X_{h-1} = k) = \sum_{k=1}^{n} p^{(h-1)}_{ik}\, p_{kj} = [P^{h-1} P]_{ij} = [P^h]_{ij},$$

so $P^{(h)} = P^h$.

To prove the second relation in (2.9), note that for each $j = \overline{1,n}$, $[P_t]_j = P_t(j) = P(X_t = j)$. Again, using (1.4) for the events $\{(X_0 = k)\}_{k=\overline{1,n}}$, we get

$$P_t(j) = \sum_{k=1}^{n} P(X_0 = k)\, P(X_t = j \mid X_0 = k) = \sum_{k=1}^{n} P_0(k)\, p^{(t)}_{kj} = [P_0 P^{(t)}]_j,$$

so, by the relation just proved, $P_t = P_0 \cdot P^t$.

Remark 2.8. We used the fact that state destinations are mutually exclusive and exhaustive events, thus forming a partition. That is because from each state, a Markov chain makes a transition to one and only one state. As a consequence, in the matrices P and $P^{(h)} (= P^h)$, the probabilities on each row sum to 1. Such matrices are called stochastic.
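Proposition 2.7 and Remark 2.8 are easy to check numerically. The sketch below (Python/NumPy, for illustration only; the two-state matrix is the one from the character-generation example discussed later in the paper) verifies that powers of a stochastic matrix remain stochastic and that $P_h = P_0 P^h$ is again a probability vector:

```python
import numpy as np

P = np.array([[0.7, 0.3],
              [0.4, 0.6]])   # a stochastic matrix: each row sums to 1
P0 = np.array([0.2, 0.8])    # an initial distribution

h = 3
Ph = np.linalg.matrix_power(P, h)  # P^(h) = P^h, by Proposition 2.7
Pt = P0 @ Ph                       # P_h = P_0 * P^h

row_sums = Ph.sum(axis=1)
print(row_sums, Pt)  # rows of P^h sum to 1, and so does P_h
```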

2.2. Steady-State Distribution; Regular Markov Chains

It is sometimes necessary to make long-term forecasts, i.e. to find $\lim_{h \to \infty} P_h = P_0 \lim_{h \to \infty} P^h$, so we need to compute $\lim_{h \to \infty} P^h$.

Definition 2.9. Let X be a Markov chain. The vector $\pi = [\pi_1, \dots, \pi_n]$, consisting of the limiting probabilities $\pi_k = \lim_{h \to \infty} P_h(k)$, $k = 1, \dots, n$, if it exists, is called a steady-state distribution of X.

In order to find it, notice that

$$P_h P = (P_0 P^h) P = P_0 P^{h+1} = P_{h+1}.$$

Taking the limit as h [right arrow] [infinity] on both sides, we get

[pi]P = [pi]. (2.10)

Notice that (2.10) is an $n \times n$ singular linear system (any scalar multiple of a solution is again a solution, so there are infinitely many). However, since $\pi$ is a probability distribution, the sum of its components must equal 1. We state the following result, without proof.

Proposition 2.10. The steady-state distribution of a homogeneous Markov chain X, $\pi = [\pi_1, \dots, \pi_n]$, if it exists, is unique and is the solution of the $(n+1) \times n$ linear system

$$\begin{cases} \pi P = \pi, \\ \displaystyle\sum_{k=1}^{n} \pi_k = 1. \end{cases} \qquad (2.11)$$

Remark 2.11.

1. When we need to make predictions after a large number of steps, instead of the lengthy computation of [P.sub.h], it may be easier to try to find the steady-state distribution, [pi], directly.

2. If a steady-state distribution exists, then $P^{(h)} = P^h$ also has a limiting matrix, given by

$$\Pi = \lim_{h \to \infty} P^h = \begin{pmatrix} \pi_1 & \pi_2 & \cdots & \pi_n \\ \vdots & \vdots & & \vdots \\ \pi_1 & \pi_2 & \cdots & \pi_n \end{pmatrix},$$

all of whose rows are equal to $\pi$.

Notice that $\pi$ and $\Pi$ do not depend on the initial state $X_0$. Indeed, in the long run the probabilities of transitioning from any state to a given state are the same, $\lim_{h \to \infty} p^{(h)}_{ik} = \lim_{h \to \infty} p^{(h)}_{jk}$, for all $i, j, k = \overline{1,n}$ (all the rows of $\Pi$ coincide). Then it is just a matter of "reaching" a certain state (from anywhere), rather than "transitioning" to it (from another state). That should, indeed, depend only on the pattern of changes, i.e. only on the transition probability matrix.
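This convergence can be observed numerically. The sketch below (Python/NumPy, for illustration) raises the two-state transition matrix of the encryption example to a large power; both rows of the result coincide and equal the steady-state vector $[4/7, 3/7] \approx [0.5714, 0.4286]$ found for this matrix later in the paper:

```python
import numpy as np

P = np.array([[0.7, 0.3],
              [0.4, 0.6]])

# for large h, P^h approaches a matrix Pi whose rows all equal pi
Pi = np.linalg.matrix_power(P, 100)
print(Pi)

# the two rows coincide, so the limit does not depend on the initial state
```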

As stated earlier, a steady-state distribution may not always exist. We will mention (without proof) one case, which is really easy to check, when such a distribution does exist.

Definition 2.12. A Markov chain is called regular if there exists $h \ge 0$ such that

$$p^{(h)}_{ij} > 0, \quad \text{for all } i, j = 1, \dots, n.$$

This says that at some step h, $P^{(h)}$ has only non-zero entries, meaning that h-step transitions from any state to any state are possible.

Proposition 2.13. Any regular Markov chain has a steady-state distribution.

Remark 2.14.

1. Regularity of a Markov chain does not mean that all $p^{(h)}_{ij}$ must be positive for all h. The transition probability matrix P, or some of its powers, may have some 0 entries, but there must exist some power h for which $P^{(h)}$ has all non-zero entries.

2. If there exists a state i with $p_{ii} = 1$, then that Markov chain cannot be regular. There is no exit (no transition possible) from state i. Such a state is called an absorbing state.

3. Another example of a non-regular chain is a periodic Markov chain, i.e. one for which there exists $T > 0$ such that $X_{t+T} = X_t$, for all $t \ge 0$. Obviously, in this case, $\lim_{h \to \infty} P^h$ does not exist, and neither does a steady-state distribution.


Monte Carlo methods are a class of computational algorithms that can be applied to a vast range of problems where the computation of probabilities and other characteristics of interest is too complicated, resource- or time-consuming, or simply not feasible. They are based on computer simulations involving random number generators and are used to make predictions about processes involving random variables. Computer code that replicates a certain phenomenon can be put in a loop and simulated any number of times; based on the outcomes, conclusions about the real-life behaviour of the phenomenon can then be drawn. The longer the simulated run, the more accurate the predictions. Monte Carlo methods can be used for the estimation of probabilities, other distribution characteristics, lengths, areas, integrals, etc.

Many important characteristics of stochastic processes require lengthy, complex computations, so it is preferable to estimate them by Monte Carlo methods. For a Markov chain, in order to predict its future behaviour, all that is required is the distribution of $X_0$, i.e. $P_0$ (the initial situation), and the pattern of change at each step, i.e. the transition probability matrix P.

Once $X_0$ is generated, it takes some value $X_0 = i$ (according to its pdf $P_0$). Then, at the next step, $X_1$ is a discrete random variable taking the values $j = 1, \dots, n$ with probabilities $p_{ij}$ from row i of the matrix P. Its pdf will be

$$X_1 \sim \begin{pmatrix} 1 & 2 & \cdots & n \\ p_{i1} & p_{i2} & \cdots & p_{in} \end{pmatrix}.$$

The next steps are simulated similarly.

Since, at each step, a discrete random variable has to be generated, we can use any algorithm that simulates an arbitrary discrete distribution. Let

$$X \sim \begin{pmatrix} x_1 & x_2 & \cdots \\ p_1 & p_2 & \cdots \end{pmatrix}$$

be any discrete random variable. We use the following simple algorithm (see [4]).

Algorithm 3.1.

1. Divide the interval [0, 1] into subintervals $\{A_i\}_{i \in I}$, with $A_i$ of length $p_i$:

$$A_1 = [0, p_1), \qquad A_i = \Big[ \sum_{k=1}^{i-1} p_k, \ \sum_{k=1}^{i} p_k \Big), \quad i \ge 2.$$

2. Let $U \sim U(0,1)$ be a Standard Uniform random variable.

3. If $U \in A_i$, let $X = x_i$.

Indeed, then X takes the values $x_i$ and $P(X = x_i) = \mathrm{length}(A_i) = p_i$. We put Algorithm 3.1 in a loop to generate a Markov chain.
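Algorithm 3.1 can be sketched in a few lines of Python (for illustration; the function name `sample_discrete` is ours):

```python
import random

def sample_discrete(values, probs, u=None):
    """Algorithm 3.1: split [0,1] into subintervals A_i of length p_i
    and return x_i when the uniform draw U falls in A_i."""
    if u is None:
        u = random.random()      # U ~ U(0,1)
    cum = 0.0
    for x, p in zip(values, probs):
        cum += p                 # right endpoint of the current A_i
        if u < cum:
            return x
    return values[-1]            # guard against rounding when u is near 1

# X takes value x_i with probability p_i = length(A_i)
x = sample_discrete([1, 2, 3], [0.2, 0.5, 0.3], u=0.65)
print(x)  # 0.65 lies in A_2 = [0.2, 0.7), so x = 2
```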

Algorithm 3.2.

1. Given:

$N_M$ = sample path size (length of the Markov chain),

$P_0 = [P_0(1) \ P_0(2) \ \dots \ P_0(n)]$, the initial distribution,

$P = (p_{ij})_{i,j=\overline{1,n}}$, the transition probability matrix.

2. Generate $X_0$ from its pdf $P_0$.

3. Transition: if $X_t = i$, generate $X_{t+1}$ from the distribution given by the probabilities $p_{ij}$, $j = \overline{1,n}$ (row i of P), using Algorithm 3.1.

4. Return to step 3 until a Markov chain of length $N_M$ is generated.
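The steps of Algorithm 3.2 can be sketched as follows in Python (an illustrative port; the function names are ours, and the discrete-sampling step of Algorithm 3.1 is inlined as `draw`):

```python
import random

def simulate_chain(P0, P, Nm, rng=random):
    """Algorithm 3.2: generate a sample path of length Nm of a
    homogeneous Markov chain with initial distribution P0 and
    transition probability matrix P (states are 1..n)."""
    def draw(probs):                 # Algorithm 3.1 for a single step
        u, cum = rng.random(), 0.0
        for state, p in enumerate(probs, start=1):
            cum += p
            if u < cum:
                return state
        return len(probs)            # rounding guard

    path = [draw(P0)]                # X_0 from its pdf P_0
    while len(path) < Nm:
        i = path[-1]
        path.append(draw(P[i - 1]))  # X_{t+1} from row i of P
    return path

random.seed(1)
P = [[0.7, 0.3], [0.4, 0.6]]
path = simulate_chain([0.2, 0.8], P, 100)
print(len(path), set(path))  # 100 states, each either 1 or 2
```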


Let us consider the following example:

An encrypting program generates sequences of letters, such that a vowel is followed by a consonant with probability 0.3, while a consonant is followed by a vowel with probability 0.4.

(1) If the first character is a consonant, make predictions for the second and third character.

This stochastic process, say X, has two states, 1 = "vowel" and 2 = "consonant", so it is discrete-state. The time set consists of the positions of the characters in the sequence, so X is also discrete-time. Since the prognosis for each character depends only on the previous one, it is a Markov process and, hence, a Markov chain. Finally, the probability of transitioning from a vowel or from a consonant is the same at any position in the sequence, hence X is a homogeneous Markov chain.

The initial situation (first character) is

$$X_0 \sim \begin{pmatrix} 1 & 2 \\ 0 & 1 \end{pmatrix}, \qquad \text{i.e. } P_0 = [0 \ 1].$$

The transition probability matrix is

$$P = \begin{pmatrix} 0.7 & 0.3 \\ 0.4 & 0.6 \end{pmatrix}.$$

For the second character, at t = 1, the pdf is given by $P_1 = P_0 \cdot P = [0.4 \ 0.6]$, i.e.

$$X_1 \sim \begin{pmatrix} 1 & 2 \\ 0.4 & 0.6 \end{pmatrix}.$$

So, the second character has a 40% chance of being a vowel and a 60% chance of being a consonant. For the third character, the pdf is given by $P_2 = P_1 \cdot P = [0.52 \ 0.48]$, i.e.

$$X_2 \sim \begin{pmatrix} 1 & 2 \\ 0.52 & 0.48 \end{pmatrix}.$$

The third character is a vowel with probability 0.52 and a consonant with probability 0.48.
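These hand computations are easy to verify (Python/NumPy sketch, for illustration):

```python
import numpy as np

P0 = np.array([0.0, 1.0])    # the first character is a consonant
P = np.array([[0.7, 0.3],
              [0.4, 0.6]])

P1 = P0 @ P                  # pdf of the second character
P2 = P1 @ P                  # pdf of the third character
print(P1, P2)                # [0.4 0.6] and [0.52 0.48]
```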

(2) Suppose now that the first character is a consonant with probability 0.8. What is the prognosis for the third and the 100th character?

In this case, P is the same, but $X_0$ (i.e. $P_0$) changes:

$$P_0 = [0.2 \ 0.8],$$

so for the third character

$$P_2 = P_0 \cdot P^2 = [0.538 \ 0.462].$$

The 100th character is many steps away, so instead of computing $P_{100}$, we find the steady-state distribution. Notice that P has all non-zero entries, so the Markov chain is regular, which means that a steady-state distribution does exist. We find it by solving the system (2.11), i.e.,

$$\begin{cases} 0.7\pi_1 + 0.4\pi_2 = \pi_1, \\ 0.3\pi_1 + 0.6\pi_2 = \pi_2, \\ \pi_1 + \pi_2 = 1, \end{cases}$$

with solution $\pi_1 = 4/7 \approx 0.5714$ and $\pi_2 = 3/7 \approx 0.4286$. So, in the "long run", the pdf of the situation is

$$X \sim \begin{pmatrix} 1 & 2 \\ 0.5714 & 0.4286 \end{pmatrix},$$

i.e., about 57% of the characters are vowels and around 43% are consonants.
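The system (2.11) can also be solved numerically, e.g. as a least-squares problem on the overdetermined (but consistent) system formed by $\pi P = \pi$ and $\sum_k \pi_k = 1$ (Python/NumPy sketch, for illustration):

```python
import numpy as np

P = np.array([[0.7, 0.3],
              [0.4, 0.6]])

# (2.11): pi P = pi together with pi_1 + pi_2 = 1, written as
# the stacked system [P^T - I; 1 1] pi^T = [0; 0; 1]
A = np.vstack([P.T - np.eye(2), np.ones(2)])
b = np.array([0.0, 0.0, 1.0])
pi, *_ = np.linalg.lstsq(A, b, rcond=None)
print(pi)  # [0.5714 0.4286], i.e. [4/7, 3/7]
```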

(3) It was found that if more than 15 vowels or more than 12 consonants are generated in a row, in a sequence of 100 characters, then the code becomes vulnerable to cracking. Assuming that the first character is a consonant with probability 0.8, conduct a Monte Carlo study for estimating the probability of the code becoming vulnerable.

We use Algorithm 3.2 to generate a sample path of length 100, for a large number of simulations. The MATLAB code is given below.
% Simulate Markov chain
clear all
Nm = input('length of sample path = ');
N = input('nr. of simulations = ');
for j = 1 : N
  p = [0.2 0.8];          % initial distr. of vowels/consonants
  P = [0.7 0.3; 0.4 0.6]; % trans. prob. matrix
  for t = 1 : Nm
    U = rand;
    X(t) = 1*(U < p(1)) + 2*(U >= p(1));
    % simulate X(1), ..., X(Nm), each a two-valued variable
    p = P(X(t), :);       % the pdf of X(t+1) is the (X(t))th row of P
  end
  i_change = [find(X(1:end-1) ~= X(2:end)), Nm];
  % indices where X changes states (plus the end of the path)
  longstr = zeros(1, length(i_change)); % streak lengths of
                                        % consecutive vowels/consonants
  longstr(1) = i_change(1);             % length of the first streak
  for i = 2 : length(i_change)
    longstr(i) = i_change(i) - i_change(i-1); % all remaining streaks
  end
  if X(1) == 1  % odd-indexed streaks have the same state as X(1)
    vowel = longstr(1:2:end);   % streaks of vowels
    conson = longstr(2:2:end);  % streaks of consonants
  else
    vowel = longstr(2:2:end);
    conson = longstr(1:2:end);
  end
  maxv(j) = max(vowel);   % longest streak of vowels
  maxc(j) = max(conson);  % longest streak of consonants
end
fprintf('probability of more than 15 vowels in a row is %1.4f\n', mean(maxv > 15))
fprintf('probability of more than 12 consonants in a row is %1.4f\n', mean(maxc > 12))

After running this code several times, for a sample path of length 100 and for $10^4$ and $10^5$ simulations, it was found that the probability of having more than 15 vowels in a row is approximately 0.07, whereas the chance of getting more than 12 consonants in a row is around 0.04. Based on these results, the encrypting technique can be properly adjusted.
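For comparison, the same Monte Carlo study can be sketched in Python/NumPy (an illustrative port, not the authors' code); it estimates the probability of the combined vulnerability event, more than 15 vowels or more than 12 consonants in a row, directly:

```python
import numpy as np

rng = np.random.default_rng(2)
Nm, N = 100, 5_000             # sample path length, number of simulations
P = np.array([[0.7, 0.3],
              [0.4, 0.6]])
P0 = np.array([0.2, 0.8])

vulnerable = 0
for _ in range(N):
    X = np.empty(Nm, dtype=int)
    p = P0
    for t in range(Nm):
        X[t] = 1 if rng.random() < p[0] else 2
        p = P[X[t] - 1]        # next pdf: row X[t] of P
    # split the path into streaks at the points where the state changes
    ends = np.concatenate([np.flatnonzero(X[:-1] != X[1:]), [Nm - 1]])
    lengths = np.diff(np.concatenate([[-1], ends]))
    states = X[ends]           # the state of each streak
    maxv = lengths[states == 1].max(initial=0)  # longest vowel streak
    maxc = lengths[states == 2].max(initial=0)  # longest consonant streak
    vulnerable += (maxv > 15) or (maxc > 12)

p_vuln = vulnerable / N
print(p_vuln)  # estimated probability that the code is vulnerable
```

With the per-streak probabilities implied by P, the estimate lands near the sum of the two individual probabilities reported above (around 0.1), since the two events rarely occur together.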

References
[1] C. Andrieu, A. Doucet, R. Holenstein, Particle Markov chain Monte Carlo methods, J. Royal Statist. Soc. B, Vol. 72(3), 2010, 269-342.

[2] M. Baron, Probability and Statistics for Computer Scientists, 2nd Edition, CRC Press, Taylor & Francis, Boca Raton, FL, USA, 2014.

[3] L. Gurvits, J. Ledoux, Markov property for a function of a Markov chain: A linear algebra approach, Linear Algebra and its Applications, Vol. 404, 2005, 85-117.

[4] D. V. Khmelev, F. J. Tweedie, Using Markov Chains for Identification of Writers, Literary and Linguistic Computing, Vol. 16(4), 2001, 299-307.

[5] T. Liu, Application of Markov Chains to Analyse and Predict the Time Series, Modern Applied Science, Vol. 4(5), 2010, 161-166.

[6] S. Micula, Probability and Statistics for Computational Sciences, Cluj University Press, 2009.

[7] S. Micula, Statistical Computer Simulations and Monte Carlo Methods, J. of Information Systems and Operations Management, Vol. 9(2), 2015, 384-394.

[8] J. S. Milton, J. C. Arnold, Introduction to Probability and Statistics: Principles and Applications for Engineering and the Computing Sciences, 3rd Edition, McGraw-Hill, New York, 1995.

[9] J. Pan, A. Nagurney, Using Markov chains to model human migration in a network equilibrium framework, Mathematical and Computer Modelling, Vol. 19(11), 1994, 31-39.


Sanda Micula (1) Rodica Sobolu (2*)

(1) Department of Mathematics and Computer Science, Babes-Bolyai University, Cluj-Napoca, Romania.

(2*) corresponding author, Department of Land Measurements and Exact Sciences, University of Agricultural Sciences and Veterinary Medicine, Cluj-Napoca, Romania.
COPYRIGHT 2017 Romanian-American University
Author:Micula, Sanda; Sobolu, Rodica
Publication:Journal of Information Systems & Operations Management
Article Type:Report
Date:Nov 1, 2017
