
STAT 5703

Statistical Inference and Modeling
for Data Science
Dobrin Marchev

Motivation
• Until now, all models assumed i.i.d.
random sample framework
• We now consider a dependence structure
between the observations X1, … , Xn
• This can be done with two approaches:
Markov chains or time series
• Markov models provide a very rich set of
tools for handling dependence

Prof. Andrei A. Markov (1856-1922) published his result in 1906.
Markov later used his theory to study the distribution of vowels in Onegin, written by Pushkin.
Data Science Example
• A common application of Markov chains in data science is
text prediction.
• It’s an area of NLP (Natural Language Processing) that is
commonly used in the tech industry by companies like
Google, LinkedIn and Instagram. When you’re writing
emails, Google predicts and suggests words or phrases to
autocomplete your email. And when you receive messages
on Instagram or LinkedIn, those apps suggest potential
replies.
• These are not the applications of a Markov chain we will
explore. The types of models these large-scale companies
use in production for these features are more
complicated. We will stick to simpler examples.
Example: the taxi problem
A taxi company has divided the city into three regions –
Northside, Downtown, and Southside. By keeping track of
pickups and drop-offs, the company has found that:

• Of the fares picked up in Northside, 50% stay in that region, 20% are taken to Downtown, and 30% go to Southside.
• Of the fares picked up Downtown, only 10% go to
Northside, 40% stay in Downtown, and 50% go to
Southside.
• Of the fares picked up in Southside, 30% go to each of
Northside and Downtown, while 40% stay in Southside.

We would like to know what the distribution of taxis will be over time as they pick up and drop off successive fares.
For example, what is the probability that a taxi starting off Downtown will be Downtown after letting off its seventh fare?
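A minimal sketch (not part of the original slides) of how this question can be answered numerically: propagate the taxi's location distribution fare by fare using the percentages above. The state ordering (N, D, S) and the variable names are my own choices.

```python
import numpy as np

# One-step transition probabilities taken from the bullets above,
# rows/columns ordered (Northside, Downtown, Southside).
P = np.array([[0.5, 0.2, 0.3],   # fare picked up in Northside
              [0.1, 0.4, 0.5],   # fare picked up Downtown
              [0.3, 0.3, 0.4]])  # fare picked up in Southside

dist = np.array([0.0, 1.0, 0.0])  # the taxi starts Downtown
for _ in range(7):                # one pickup/drop-off per step
    dist = dist @ P
print(dist[1])                    # P(Downtown after the 7th fare), roughly 0.30
```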
State Diagram
This information can be represented in a state diagram which includes
• the three states D, N, and S corresponding to the three regions of the city
• the probabilities of a taxi transitioning from one region/state to another
Properties
• If the location of the taxi at time n is denoted by Xn, then the sequence X1, X2, … consists of dependent variables and can be modeled with a Markov chain. The values the random variable Xn can take are known as the states of the chain.

• The probabilities of moving from state to state are constant and independent of the past behavior – this property of the system is called the Markov property. That is,

$$P(X_n = s_n \mid X_{n-1} = s_{n-1}, \ldots, X_0 = s_0) = P(X_n = s_n \mid X_{n-1} = s_{n-1})$$

• We assume that a transition – picking up and dropping off a fare – occurs each time the system is observed, and that observations occur at regular intervals. Systems with these characteristics are called Markov chains or Markov processes (when time is continuous).
Computing Transition Probabilities
• What is the probability that a taxi that starts off Downtown ends up in Northside after two fares?
• One possibility is that the taxi stays Downtown for the first fare and then transitions to Northside for the second. The probability of this occurring is then 0.4 × 0.1 = 0.04.
• But we could also have the taxi going to either Northside or Southside first, then transitioning to Northside: 0.1 × 0.5 = 0.05 and 0.5 × 0.3 = 0.15.
• Since the taxi could follow the first, second or third path, the probability of starting Downtown and ending in Northside after two fares is 0.04 + 0.05 + 0.15 = 0.24.
Transitions in More Steps
• If we want to know the probability of a taxi transitioning from one region to another after just three fares, the computation will have more possible paths.
• Suppose we were interested in the probability of a taxi both starting and ending up Downtown. We can use a tree diagram to represent this calculation.
• More generally, we might want to determine the probability of moving from state i to state j over m steps.
Tree Diagram: if we multiply along all the paths and sum the results, we find that this probability is 0.309.
Transition Matrix

• We can create a square matrix, P, called the transition matrix, by constructing rows for the probabilities going from Southside and Northside as well.

• An entry pij of this matrix is the probability of a transition from region i to region j. With the states ordered (D, S, N), for example, p32 is the probability of a fare that originates in Northside going to Southside. (Note that the entries across each row of P sum to 1.)
Transition Matrix

What results when we multiply the transition matrix by itself?

$$P^2 = \begin{pmatrix} 0.4 & 0.5 & 0.1 \\ 0.3 & 0.4 & 0.3 \\ 0.2 & 0.3 & 0.5 \end{pmatrix}^2 = \begin{pmatrix} 0.4 & 0.5 & 0.1 \\ 0.3 & 0.4 & 0.3 \\ 0.2 & 0.3 & 0.5 \end{pmatrix} \times \begin{pmatrix} 0.4 & 0.5 & 0.1 \\ 0.3 & 0.4 & 0.3 \\ 0.2 & 0.3 & 0.5 \end{pmatrix} = \begin{pmatrix} 0.33 & 0.43 & 0.24 \\ 0.30 & 0.40 & 0.30 \\ 0.27 & 0.37 & 0.36 \end{pmatrix}$$

The highlighted entry, 0.24, results from the same computation that we already considered for a taxi going from D to N in two fares.

What are the other entries of P²? What are the entries of P³? Of Pⁿ?
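These questions are easy to answer numerically. A minimal sketch (not from the slides), again with the states ordered (D, S, N):

```python
import numpy as np

P = np.array([[0.4, 0.5, 0.1],
              [0.3, 0.4, 0.3],
              [0.2, 0.3, 0.5]])  # taxi chain, states ordered (D, S, N)

print(np.linalg.matrix_power(P, 2))   # matches the P^2 shown above
print(np.linalg.matrix_power(P, 3))   # three-fare transition probabilities
print(np.linalg.matrix_power(P, 20))  # rows become nearly identical as n grows
```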
Statistical Model
• A Markov chain is a stochastic process {Xt} taking values in a (finite discrete)
state space 𝒳 = {1, . . . , s} such that the distribution of Xt depends only on Xt-1.
In the taxi example, 𝒳 = {N, D, S}. Note that the state space can also be infinite.
• The observed sample data $X_{t_1}, \ldots, X_{t_n}$ are of the form $\{X_0 = s_0, X_{t_1} = s_1, \ldots, X_{t_n} = s_n\}$, where $0 < t_1 < \cdots < t_n$.

• Note that we can always write the likelihood (joint pdf) for any sample, using the multiplication rule for dependent random variables, as:

$$P(X_0 = s_0, \ldots, X_{t_n} = s_n) = P(X_0 = s_0) \prod_{i=1}^{n} P(X_{t_i} = s_i \mid X_0 = s_0, X_{t_1} = s_1, \ldots, X_{t_{i-1}} = s_{i-1})$$

• What makes {Xt} a Markov chain is the extra assumption that each $P(X_{t_i} = s_i \mid X_0 = s_0, X_{t_1} = s_1, \ldots, X_{t_{i-1}} = s_{i-1})$ depends only on the most recent observation $X_{t_{i-1}} = s_{i-1}$.
Statistical Model
Definition: The (first-order) Markov property assumes that “given the present,
the future is independent of the past”, meaning that

$$P(X_{t_i} = s_i \mid X_0 = s_0, X_{t_1} = s_1, \ldots, X_{t_{i-1}} = s_{i-1}) = P(X_{t_i} = s_i \mid X_{t_{i-1}} = s_{i-1}), \quad \forall i$$

This simplifies the likelihood to:

$$P(X_0 = s_0, \ldots, X_{t_n} = s_n) = P(X_0 = s_0) \prod_{i=1}^{n} P(X_{t_i} = s_i \mid X_{t_{i-1}} = s_{i-1})$$

For example,
$$P(X_0 = a, X_1 = b, X_2 = c) = P(X_0 = a)\, P(X_1 = b \mid X_0 = a)\, P(X_2 = c \mid X_1 = b)$$
If the process does not possess the Markov property, then we can only write:
$$P(X_0 = a, X_1 = b, X_2 = c) = P(X_0 = a)\, P(X_1 = b \mid X_0 = a)\, P(X_2 = c \mid X_0 = a, X_1 = b)$$
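As a small numerical illustration (not on the slide), using the one-step probabilities from the taxi example and assuming the taxi starts Downtown with probability 1:

```python
# P(X0=D, X1=S, X2=N) = P(X0=D) * P(X1=S | X0=D) * P(X2=N | X1=S)
p_DS = 0.5   # Downtown -> Southside, from the taxi example
p_SN = 0.3   # Southside -> Northside, from the taxi example
prob = 1.0 * p_DS * p_SN
print(prob)  # 0.15
```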

The theory of Markov chains is very rich and complex. We must get through many definitions before we can do anything interesting.
Questions to be answered
• When does a Markov chain "settle down" into some sort of equilibrium?
• What does this even mean?
• How do we estimate the parameters of a Markov chain?
• What are the parameters of a Markov chain?
• How can we construct Markov chains that converge to a given equilibrium distribution?
• Why would we want to do that??
• Time to think: do you know what the following sequence converges to?

$$x_{n+1} = \frac{1}{2} x_n + \frac{1}{x_n}$$

[Figure: Two Markov chains. One of the chains does not settle down into an equilibrium. The other one does. Which one is which?]
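A quick worked step (not on the slide): if the sequence converges to some limit x > 0, the limit must satisfy the fixed-point equation obtained by setting $x_{n+1} = x_n = x$:

$$x = \frac{1}{2}x + \frac{1}{x} \;\Longrightarrow\; \frac{1}{2}x = \frac{1}{x} \;\Longrightarrow\; x^2 = 2 \;\Longrightarrow\; x = \sqrt{2}$$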
Transition Matrix
Definition: A Markov chain is a Markov process in discrete time, so we can
simply write ti = i, i = 1, … , n.

Definition: A transition matrix Pt for a Markov chain {Xt} at time t is a matrix containing information on the probability of transitioning between states. Given an ordering of the matrix's rows and columns by the state space 𝒳, the (i, j)th element of the matrix Pt is given by:

$$(\mathbf{P}_t)_{ij} = P(X_{t+1} = j \mid X_t = i)$$

This means each row of the matrix is a probability vector, and the sum of its
entries is 1.
Transition matrices have the property that the product of subsequent ones
describes a transition along the time interval spanned by the transition
matrices.
Note: The matrix depends on the time t.
Time homogeneous Markov chains
Definition: A Markov chain is time-independent, aka homogeneous, if

$$P(X_{t+1} = s \mid X_t = r) = P(X_1 = s \mid X_0 = r) = p_{rs}, \quad \forall t$$

That is, the conditional probabilities only depend on the time difference.
Note: $p_{rs}$ does not depend on t.

For a homogeneous Markov chain Xt, observed at discrete, equally spaced times t = 0, 1, . . . , n, we define the constant s×s transition matrix P such that the (i, j)th element pij is given by:

$$\mathbf{P}_{ij} = P(X_{t+1} = j \mid X_t = i) = P(X_1 = j \mid X_0 = i), \quad \forall t$$

Note: The transition matrix must satisfy
1. $p_{ij} \geq 0, \; \forall i, j \in \mathcal{X}$
2. $\sum_{j=1}^{s} p_{ij} = 1, \; \forall i \in \mathcal{X}$
Transition Matrix Properties
Denote the initial probabilities by $p_j = P(X_0 = j), \; j = 1, \ldots, s$. That is, we can form a vector p of all initial probabilities: $\mathbf{p} = (p_1, \ldots, p_s)'$.

Then, by the law of total probability, the marginal probability of the next step being k is

$$P(X_1 = k) = \sum_{j=1}^{s} P(X_0 = j)\, P(X_1 = k \mid X_0 = j) = \sum_{j=1}^{s} p_j\, p_{jk}$$

Notice this is the kth element of $\mathbf{p}'\mathbf{P}$.

Theorem: By induction, we can prove that the pmf of Xn is given by $\mathbf{p}'\mathbf{P}^n$. That is, $P(X_n = k)$ is the kth element of the vector $\mathbf{p}'\mathbf{P}^n$.

Note: This is different from $\mathbf{P}^n$ itself. The element in the ith row, jth column of $\mathbf{P}^n$ is equal to $P(X_n = j \mid X_0 = i)$.

Example: Given the chain is in state s, the probability of a run of exactly m ≥ 1 successive stays in state s is $p_{ss}^{m-1}(1 - p_{ss})$, so the run length follows a Geometric$(1 - p_{ss})$ distribution.
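A small simulation sketch (not from the slides) checking the run-length claim for the taxi chain. Starting Downtown, where the probability of staying is $p_{ss} = 0.4$, the run length should be Geometric(0.6) with mean 1/0.6 ≈ 1.67.

```python
import numpy as np

rng = np.random.default_rng(0)
p_ss = 0.4                       # probability of staying Downtown in the taxi chain
runs = []
for _ in range(100_000):
    m = 1                        # the chain is currently in state s
    while rng.random() < p_ss:   # stays with probability p_ss, leaves with 1 - p_ss
        m += 1
    runs.append(m)
print(np.mean(runs))             # close to 1 / (1 - p_ss), about 1.67
```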
Stationary Distribution
To understand the long-term behavior of our Markov chain, we introduce the
concept of a stationary distribution.
Definition: A Markov chain with transition matrix P is said to have a stationary distribution $\boldsymbol{\pi}$ (aka equilibrium or invariant distribution) if there exists a vector $\boldsymbol{\pi}$ such that $\boldsymbol{\pi}'\mathbf{P} = \boldsymbol{\pi}'$.
This equation implies that multiplying the equilibrium vector by the transition matrix P yields the same vector: if the chain is started from the distribution $\boldsymbol{\pi}$, then its distribution at every subsequent time is still $\boldsymbol{\pi}$.
Notice this means that $\boldsymbol{\pi}$ must be a left eigenvector of P, corresponding to eigenvalue 1. Note also that all transition matrices possess a right eigenvector corresponding to eigenvalue 1, since $\mathbf{P}\mathbf{1}_S = \mathbf{1}_S$, where $\mathbf{1}_S$ is a vector of S 1's.
Is the stationary distribution unique? Will the chain converge to it?
Theorem: All Markov chains defined on finite state spaces always possess at least one equilibrium distribution.
Theorem: Under some conditions (the Markov chain is ergodic), the stationary distribution satisfies
$$\lim_{n \to \infty} \mathbf{P}^n = \mathbf{1}\,\boldsymbol{\pi}'$$
Irreducibility
Definition: A state sj is said to be accessible from state si if a chain starting in state si has a positive probability to reach state sj at some future time point n. That is,
$$\exists\, n > 0: \; p_{ij}^{(n)} > 0$$
If sj is accessible from si and si is accessible from state sj then we say that si and sj
communicate.
A communicating class is defined to be a set of states that communicate.
Definition: If a discrete-time Markov chain is composed of only one communicating class (i.e., if all states in the chain communicate), then it is said to be irreducible.
Example: $P = \begin{pmatrix} 0.5 & 0.5 & 0 & 0 \\ 0.9 & 0.1 & 0 & 0 \\ 0 & 0 & 0.2 & 0.8 \\ 0 & 0 & 0.7 & 0.3 \end{pmatrix}$ is reducible and has 2 classes.

Note: Irreducibility does not guarantee the presence of limiting probabilities.
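A small sketch (not from the slides) that finds communicating classes by checking mutual reachability in the directed graph with an edge i → j whenever $p_{ij} > 0$. Applied to the example matrix above it reports two classes, corresponding to states {1, 2} and {3, 4} (the code uses 0-based indices).

```python
import numpy as np

def communicating_classes(P):
    """Group states by mutual reachability (i <-> j) in the graph with edges where p_ij > 0."""
    P = np.asarray(P)
    s = P.shape[0]
    adj = (P > 0).astype(int)
    # reach[i, j] is True if j can be reached from i in some number of steps
    reach = np.linalg.matrix_power(np.eye(s, dtype=int) + adj, s) > 0
    communicate = reach & reach.T
    classes, seen = [], set()
    for i in range(s):
        if i not in seen:
            cls = {j for j in range(s) if communicate[i, j]}
            classes.append(cls)
            seen |= cls
    return classes

P = [[0.5, 0.5, 0, 0], [0.9, 0.1, 0, 0], [0, 0, 0.2, 0.8], [0, 0, 0.7, 0.3]]
print(communicating_classes(P))  # [{0, 1}, {2, 3}]
```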

Periodicity
Definition: We define the period of state i, denoted d(i), to be the greatest common divisor of
$$J_i = \{\, n \geq 0 : p^{(n)}(i, i) > 0 \,\}$$
That is, any return to state i must occur in multiples of d(i) time steps.
We call an irreducible transition matrix P aperiodic if $d(i) = 1, \; \forall i$.
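A minimal sketch (not from the slides) that approximates d(i) as the gcd of the return times found among the first few powers of P; the horizon max_n is an arbitrary choice of mine.

```python
import numpy as np
from math import gcd
from functools import reduce

def state_periods(P, max_n=50):
    """Approximate d(i) = gcd{ n >= 1 : P^n[i, i] > 0 } using powers of P up to max_n."""
    P = np.asarray(P, dtype=float)
    s = P.shape[0]
    return_times = [[] for _ in range(s)]
    Q = np.eye(s)
    for n in range(1, max_n + 1):
        Q = Q @ P                      # Q is now P^n
        for i in range(s):
            if Q[i, i] > 1e-12:
                return_times[i].append(n)
    return [reduce(gcd, times) if times else 0 for times in return_times]

print(state_periods([[0, 1], [1, 0]]))  # [2, 2]: returns only at even times
```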
Periodicity
Theorem: If a stochastic matrix is irreducible and aperiodic, then there exists a
unique invariant distribution and it satisfies the ergodic theorem.

Examples:
$P = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}$ has infinitely many invariant distributions.
$P = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}$ has a unique invariant distribution but is not ergodic.
Which condition is violated in each case?

Exercise: Is the following transition matrix irreducible and aperiodic?

$$\mathbf{P} = \begin{pmatrix} \tfrac{1}{2} & \tfrac{1}{2} & 0 & 0 \\ 0 & 0 & \tfrac{1}{3} & \tfrac{2}{3} \\ 1 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 \end{pmatrix}$$
The fine print

Definition: $f_{ii}^{(n)} = P(X_n = i, X_1 \neq i, \ldots, X_{n-1} \neq i \mid X_0 = i)$ is the probability of first recurrence to i at the nth step.
Then $f_i = f_{ii} = \sum_{n=1}^{\infty} f_{ii}^{(n)}$ is the probability of recurrence.

Definition: A state i is called recurrent if $f_i = 1$, that is, eventual return is certain. Otherwise, if $f_i < 1$, the state is called transient.

Definition: If the mean time for recurrence is finite, the state is called positive
recurrent, otherwise it is null-recurrent.
Note: For infinite-state Markov chains, all 12 combinations of (irreducible or
not), (aperiodic or not), and (positive recurrent or null recurrent or transient)
are possible.

Theorem: If a general Markov chain is positive recurrent, then there exists a unique finite invariant measure.

Finding the stationary distribution

• Using a computer, you can apply “brute force”, meaning compute some very high power of P; each row should then be approximately equal to $\boldsymbol{\pi}'$.

• Eigen decomposition: Notice that $\boldsymbol{\pi}'\mathbf{P} = \boldsymbol{\pi}'$ implies that $\boldsymbol{\pi}$ is a left eigenvector of P, corresponding to eigenvalue 1. Therefore, if you can find the left eigenvector of P associated with eigenvalue 1 (and normalize it to sum to 1), you have found $\boldsymbol{\pi}$.

• You can solve $\boldsymbol{\pi}'\mathbf{P} = \boldsymbol{\pi}'$ as a system of linear equations subject to $\sum_i \pi_i = 1$.

A small sketch illustrating all three approaches follows.
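This is a minimal numpy sketch (not from the slides) applying the three approaches to the taxi matrix, with states ordered (D, S, N); the variable names are my own.

```python
import numpy as np

P = np.array([[0.4, 0.5, 0.1],
              [0.3, 0.4, 0.3],
              [0.2, 0.3, 0.5]])  # taxi chain, states ordered (D, S, N)

# 1) Brute force: a high power of P; every row approximates pi'.
pi_power = np.linalg.matrix_power(P, 100)[0]

# 2) Eigen decomposition: left eigenvector of P for eigenvalue 1
#    (a right eigenvector of P transpose), normalized to sum to 1.
vals, vecs = np.linalg.eig(P.T)
v = np.real(vecs[:, np.argmin(np.abs(vals - 1))])
pi_eig = v / v.sum()

# 3) Linear system: solve pi'(P - I) = 0 together with sum(pi) = 1.
s = P.shape[0]
A = np.vstack([P.T - np.eye(s), np.ones(s)])
b = np.concatenate([np.zeros(s), [1.0]])
pi_lin, *_ = np.linalg.lstsq(A, b, rcond=None)

print(pi_power, pi_eig, pi_lin)  # all three give approximately [0.3, 0.4, 0.3]
```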
Rate of convergence
Theorem: The eigenvalues $\lambda_1, \ldots, \lambda_s$ of a Markov transition matrix P satisfy the inequality $|\lambda_i| \leq 1, \; \forall i = 1, \ldots, s$.
Theorem: All Markov chain transition matrices have at least one eigenvalue equal to 1.

If all eigenvalues are positive and $\lambda_1 > \cdots > \lambda_s$, then $\lambda_1 = 1$ and the size of the next eigenvalue $\lambda_2$ indicates the speed of convergence as we approach equilibrium. The reason is that it describes how quickly the largest of the vanishing terms will approach zero in

$$\mathbf{P}^n = \sum_{i=1}^{s} \lambda_i^n\, \mathbf{r}_i\, \mathbf{l}_i'$$

where $\mathbf{r}_i$ are the right eigenvectors and $\mathbf{l}_i$ are the left eigenvectors.
Note: There is no direct formula for the left eigenvectors in terms of the individual right eigenvectors; in practice they are obtained by inverting the matrix of right eigenvectors, or by computing the eigenvectors of P′.
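A small numerical check (not from the slides) for the taxi chain: its eigenvalues come out as roughly 1, 0.3 and 0, and the distance of $\mathbf{P}^n$ from its limit $\mathbf{1}\boldsymbol{\pi}'$ shrinks like $|\lambda_2|^n = 0.3^n$.

```python
import numpy as np

P = np.array([[0.4, 0.5, 0.1],
              [0.3, 0.4, 0.3],
              [0.2, 0.3, 0.5]])   # taxi chain, states (D, S, N)
pi = np.array([0.3, 0.4, 0.3])   # its stationary distribution

print(sorted(np.abs(np.linalg.eigvals(P)), reverse=True))  # about [1.0, 0.3, 0.0]

# Distance to equilibrium shrinks at the rate |lambda_2|^n = 0.3^n.
for n in (1, 2, 4, 8):
    dist = np.max(np.abs(np.linalg.matrix_power(P, n) - np.outer(np.ones(3), pi)))
    print(n, dist, 0.3 ** n)
```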
Business Example
Coke and Pepsi are the only companies in country X. A soda company wants to tie up with one of these competitors. They hire a market research company to find which of the brands will have a higher market share after 1 month and after 2 months. Currently, Pepsi owns 55% and Coke owns 45% of the market share. The following are the conclusions drawn by the market research company:
Likelihood Inference
Assume we have observed data s0, s1, … , sn at times 0, 1, … , n from a stationary Markov chain Xt. Given that sequence, we can estimate the transition matrix using the MLE method.
Then the likelihood is:

$$L(\mathbf{P}) = P(X_0 = s_0, \ldots, X_n = s_n) = P(X_0 = s_0) \prod_{i=0}^{n-1} P(X_{i+1} = s_{i+1} \mid X_i = s_i)$$

$$= P(X_0 = s_0) \prod_{i=0}^{n-1} p_{s_i s_{i+1}} = p_0 \prod_{i=1}^{s} \prod_{j=1}^{s} p_{ij}^{n_{ij}}$$

where $n_{ij}$ is the observed frequency of transitions from i to j.
The loglikelihood is:

$$\ell(\mathbf{P}) = \sum_{i=1}^{s} \sum_{j=1}^{s} n_{ij} \log p_{ij} + \log p_0$$
MLE
If you take the derivatives,

$$\frac{\partial \ell(\mathbf{P})}{\partial p_{ij}} = \frac{n_{ij}}{p_{ij}}$$

It can be shown that the MLE of $p_{ij}$ is

$$\hat{p}_{ij} = \frac{n_{ij}}{n_{i\cdot}}$$

where $n_{i\cdot} = \sum_{j=1}^{s} n_{ij}$.

Note: there is a chi-square test of whether the transition probabilities are independent of the current state, i.e. $p_{ij} = p_j$. It can be used to test whether a data sequence is independent or a Markov chain, and also to test convergence to equilibrium.
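A minimal sketch (not from the slides) of the MLE by transition counting, checked against a simulated path of the taxi chain; the function name and simulation details are my own.

```python
import numpy as np

def fit_transition_matrix(seq, n_states):
    """MLE of a homogeneous transition matrix: p_hat[i, j] = n_ij / n_i."""
    counts = np.zeros((n_states, n_states))
    for a, b in zip(seq[:-1], seq[1:]):
        counts[a, b] += 1            # n_ij: observed i -> j transitions
    row_sums = counts.sum(axis=1, keepdims=True)
    return np.divide(counts, row_sums, out=np.zeros_like(counts), where=row_sums > 0)

# Simulate the taxi chain (states 0=D, 1=S, 2=N), then re-estimate P from the path.
rng = np.random.default_rng(0)
P = np.array([[0.4, 0.5, 0.1], [0.3, 0.4, 0.3], [0.2, 0.3, 0.5]])
x = [0]
for _ in range(5000):
    x.append(rng.choice(3, p=P[x[-1]]))
print(fit_transition_matrix(x, 3))   # close to P
```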
Extensions

One can extend the idea of a first-order Markov chain to chains of order m, where the probability of transition into s depends on the m previous states:

$$P(X_j = s \mid X_0 = s_0, \ldots, X_{j-1} = s_{j-1}) = P(X_j = s \mid X_{j-m} = s_{j-m}, \ldots, X_{j-1} = s_{j-1})$$

When m = 1 we have s(s − 1) free parameters. For a chain of order m there are $s^m$ such conditional probability vectors now!
Extensions

The state space S can be countably infinite. Then the transition “matrix” is infinite as well! The theoretical results are a lot more complicated.
A famous example is the random walk.
For example, a random walk on the integers ℤ can be defined as being at position z and changing to either z + 1 or z − 1 with probabilities

$$p_{z, z-1} = \frac{1}{2} + \frac{1}{2}\,\frac{z}{c + |z|}, \qquad p_{z, z+1} = 1 - p_{z, z-1}, \quad z \in \mathbb{Z}$$

where c > 0.
Note that the set of all these p forms an infinite matrix where most of the elements are zeros.
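A small simulation sketch (not from the slides) of this walk, using the transition probabilities as reconstructed above; the drift toward 0 (moving down is more likely when z > 0, less likely when z < 0) keeps the walk near the origin.

```python
import numpy as np

def simulate_walk(n_steps, c=1.0, z0=0, seed=0):
    """Random walk on Z with p(z, z-1) = 1/2 + (1/2) * z / (c + |z|)."""
    rng = np.random.default_rng(seed)
    z, path = z0, [z0]
    for _ in range(n_steps):
        p_down = 0.5 + 0.5 * z / (c + abs(z))
        z = z - 1 if rng.random() < p_down else z + 1
        path.append(z)
    return path

path = simulate_walk(10_000)
print(min(path), max(path))   # stays within a modest range around 0
```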
Road ahead

• Markov chain theory is most widely used in Markov chain Monte Carlo (MCMC) methods in statistics.
• However, in MCMC the state space is usually uncountable, for
example, 𝑆 = (−∞, ∞), that is, 𝑋𝑛 ∈ 𝑆 would be a continuous
random variable.
• The transition matrix becomes a transition kernel
(A matrix can be regarded as an operator from vectors in ℝ𝑠 to ℝ𝑠 ,
whereas the kernel transforms one density function to another
density function)
• The equilibrium distribution is not a vector but a pdf.
• The conditions for ergodicity are more complicated
• The eigenvectors are called eigenfunctions.
• There are infinitely many eigenvalues!
Example

• Consider (again) a random walk defined as

$$X_n = \sum_{i=0}^{n} Z_i$$

where $Z_i \sim N(0, 1)$ are independent.
Then the transition kernel is a conditional density given by

$$k(\cdot \mid x) = N(x, 1)$$

This should be interpreted as the conditional density of $X_{n+1}$ given that $X_n = x$.
What does this mean?

$$P(a < X_{n+1} < b \mid X_n = x) = \int_a^b \frac{1}{\sqrt{2\pi}}\, e^{-\frac{(y - x)^2}{2}}\, dy$$

Note that the Lebesgue measure is the invariant measure in this case.
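A small sketch (not from the slides) evaluating this one-step probability with the normal cdf and checking it by simulating one step of the walk; it assumes scipy is available.

```python
import numpy as np
from scipy.stats import norm

def kernel_prob(a, b, x):
    """P(a < X_{n+1} < b | X_n = x) for the Gaussian random-walk kernel N(x, 1)."""
    return norm.cdf(b, loc=x, scale=1.0) - norm.cdf(a, loc=x, scale=1.0)

# Monte Carlo check: one step of the walk from x, i.e. X_{n+1} = x + Z with Z ~ N(0, 1).
rng = np.random.default_rng(0)
x, a, b = 0.5, 0.0, 2.0
draws = x + rng.standard_normal(100_000)
print(kernel_prob(a, b, x), np.mean((a < draws) & (draws < b)))
```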
