
An Inequality and Associated Maximization Technique in Statistical Estimation for Probabilistic Functions of Markov Processes

LEONARD E. BAUM

Institute for Defense Analyses, Princeton, New Jersey

We say that $\{Y_t\}$ is a probabilistic function of the Markov process $\{X_t\}$ if

$$P(X_{t+1} = j \mid X_t = i, X_{t-1}, \ldots, Y_t, \ldots) = a_{ij}, \qquad i, j = 1, \ldots, s;$$
$$P(Y_{t+1} = k \mid X_{t+1} = j, X_t = i, X_{t-1}, \ldots, Y_t, Y_{t-1}, \ldots) = b_{ij}(k), \qquad i, j = 1, \ldots, s, \quad k = 1, \ldots, r.$$

We assume that $\{a_{ij}\}$, $\{b_{ij}(k)\}$ are unknown and restricted to lie in the manifold $M$:

$$a_{ij} \geq 0, \qquad \sum_{j=1}^{s} a_{ij} = 1, \qquad i = 1, \ldots, s,$$
$$b_{ij}(k) \geq 0, \qquad \sum_{k=1}^{r} b_{ij}(k) = 1, \qquad i, j = 1, \ldots, s.$$

We see a $Y$ sample $\{Y_1 = y_1, Y_2 = y_2, \ldots, Y_T = y_T\}$ but not an $X$ sample, and desire to estimate $\{a_{ij}, b_{ij}(k)\}$.
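For concreteness, a minimal simulation sketch of this setup, assuming NumPy (the function name `sample_path` and the fixed seed are illustrative, not from the paper): it draws a hidden state sequence $\{x_t\}$ and an output sequence $\{y_t\}$, of which only $\{y_t\}$ is available to the estimator.

```python
import numpy as np

def sample_path(a0, A, B, T, rng=None):
    """Draw (x_0..x_T, y_1..y_T) from a probabilistic function of a Markov chain.

    a0 : (s,)      initial probabilities a_i
    A  : (s, s)    transition probabilities a_ij (rows sum to 1)
    B  : (s, s, r) output probabilities b_ij(k) (B[i, j] sums to 1 over k)
    """
    rng = rng or np.random.default_rng(0)
    s = len(a0)
    x = [rng.choice(s, p=a0)]                   # hidden states, never observed
    y = []                                      # observed outputs
    for _ in range(T):
        i = x[-1]
        j = rng.choice(s, p=A[i])               # X_{t+1} = j with probability a_ij
        k = rng.choice(B.shape[2], p=B[i, j])   # Y_{t+1} = k with probability b_ij(k)
        x.append(j)
        y.append(k)
    return x, y
```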

We would like to choose maximum likelihood parameter values, i.e., $\{a_{ij}, b_{ij}(k)\}$ which maximize the probability of the observed sample $\{y_t\}$:

$$P_{\{y_t\}}(\{a_{ij}, b_{ij}(k)\}) = P(\{a_{ij}, b_{ij}(k)\}) = \sum_{i_0, i_1, \ldots, i_T = 1}^{s} a_{i_0}\, a_{i_0 i_1} b_{i_0 i_1}(y_1)\, a_{i_1 i_2} b_{i_1 i_2}(y_2) \cdots a_{i_{T-1} i_T} b_{i_{T-1} i_T}(y_T), \tag{1}$$

where the $a_i$ are initial probabilities for the Markov process.
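As a sanity check, (1) can be evaluated by explicit summation over all $s^{T+1}$ state sequences. A brute-force sketch (my own naming; feasible only for tiny $s$ and $T$, which is exactly what motivates the recursions introduced below):

```python
from itertools import product

def likelihood_bruteforce(a0, A, B, y):
    """Evaluate Eq. (1) by summing over all s**(T+1) state sequences."""
    s, T = len(a0), len(y)
    total = 0.0
    for path in product(range(s), repeat=T + 1):    # all (i_0, ..., i_T)
        p = a0[path[0]]
        for t in range(T):                          # factor a_{i_t i_{t+1}} b_{i_t i_{t+1}}(y_{t+1})
            i, j = path[t], path[t + 1]
            p *= A[i][j] * B[i][j][y[t]]            # y[t] encodes y_{t+1}, 0-based
        total += p
    return total
```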

For this purpose we define a transformation $\tau\{a_{ij}, b_{ij}(k)\} = \{\bar{a}_{ij}, \bar{b}_{ij}(k)\}$ of $M$ into itself:

$$\bar{a}_{ij} = \frac{\sum_t P(X_t = i, X_{t+1} = j \mid \{y_t\}, \{a_{ij}, b_{ij}(k)\})}{\sum_t P(X_t = i \mid \{y_t\}, \{a_{ij}, b_{ij}(k)\})} = \frac{\sum_t \alpha_t(i)\, \beta_{t+1}(j)\, a_{ij} b_{ij}(y_{t+1})}{\sum_t \alpha_t(i)\, \beta_t(i)} = \frac{a_{ij}\, \partial P/\partial a_{ij}}{\sum_j a_{ij}\, \partial P/\partial a_{ij}}, \tag{2a}$$

$$\bar{b}_{ij}(k) = \frac{\sum_{t:\, y_{t+1} = k} P(X_t = i, X_{t+1} = j \mid \{y_t\}, \{a_{ij}, b_{ij}(k)\})}{\sum_t P(X_t = i, X_{t+1} = j \mid \{y_t\}, \{a_{ij}, b_{ij}(k)\})} = \frac{\sum_{t:\, y_{t+1} = k} \alpha_t(i)\, \beta_{t+1}(j)\, a_{ij} b_{ij}(y_{t+1})}{\sum_t \alpha_t(i)\, \beta_{t+1}(j)\, a_{ij} b_{ij}(y_{t+1})} = \frac{b_{ij}(k)\, \partial P/\partial b_{ij}(k)}{\sum_k b_{ij}(k)\, \partial P/\partial b_{ij}(k)}. \tag{2b}$$

The second of the equivalent forms in Eqs. (2) contains quantities $\alpha_t(i)$, $\beta_t(j)$, which are defined inductively forwards and backwards, respectively, in $t$ by

$$\alpha_{t+1}(j) = \sum_{i=1}^{s} \alpha_t(i)\, a_{ij} b_{ij}(y_{t+1}), \qquad j = 1, \ldots, s, \quad t = 0, 1, \ldots, T - 1,$$
$$\beta_t(i) = \sum_{j=1}^{s} \beta_{t+1}(j)\, a_{ij} b_{ij}(y_{t+1}), \qquad i = 1, \ldots, s, \quad t = T - 1, T - 2, \ldots, 0. \tag{3}$$

Note that the $\alpha_t(i)$, $\beta_t(i)$, $i = 1, \ldots, s$, $t = 0, \ldots, T$, can all be computed with $4s^2T$ multiplications. Hence

$$P(\{a_{ij}, b_{ij}(k)\}) = \sum_{i=1}^{s} \alpha_t(i)\, \beta_t(i)$$

(identically in $t$) can be computed with $4s^2T$ multiplications rather than the $2Ts^{T+1}$ multiplications indicated in the defining formula (1). Similarly, the partial derivatives of $P$ needed for defining the image in (2) are computed from the $\alpha$'s and $\beta$'s with a work factor linear in $T$, not exponential in $T$.

There are three ways of rationalizing the use of this transformation, defined in (2):

(a) Bayesian a posteriori reestimation suggested the transformation $\tau$ originally and is embodied in the first expressions for $\bar{a}_{ij}$ and $\bar{b}_{ij}(k)$.

(b) An attempt to solve the likelihood equations, obtained by setting the partial derivatives of $P$ with respect to the $a_{ij}$ and $b_{ij}(k)$ to zero while taking due account of the restraints, is indicated in the third expressions for $\bar{a}_{ij}$ and $\bar{b}_{ij}(k)$, since the likelihood equations can be put into the form

$$a_{ij} = \frac{a_{ij}\, \partial P/\partial a_{ij}}{\sum_j a_{ij}\, \partial P/\partial a_{ij}}, \qquad b_{ij}(k) = \frac{b_{ij}(k)\, \partial P/\partial b_{ij}(k)}{\sum_k b_{ij}(k)\, \partial P/\partial b_{ij}(k)}.$$

THEOREM 1. [1] $P(\tau\{a_{ij}, b_{ij}(k)\}) > P(\{a_{ij}, b_{ij}(k)\})$ unless $\tau\{a_{ij}, b_{ij}(k)\} = \{a_{ij}, b_{ij}(k)\}$, which is true if and only if $\{a_{ij}, b_{ij}(k)\}$ is a critical point of $P$, i.e., a solution of the likelihood equations.

Note that $\tau$ depends only on the first derivatives of $P$. Now if one moves a sufficiently small distance in the gradient direction, one is guaranteed to increase $P$, but how small a distance depends on the second partials. It is somewhat unexpected to find that it is possible to specify a point at which $P$ increases without any mention of higher derivatives.

Eagon and the author [1] originally observed that $P(\{a_{ij}, b_{ij}(k)\})$ is a homogeneous polynomial of degree $2T + 1$ in the $a_i$, $a_{ij}$, $b_{ij}(k)$ and obtained the result as an application of the following theorem.

THEOREM 2. [1] Let

$$P(z_1, \ldots, z_n) = \sum_{\mu_1, \mu_2, \ldots, \mu_n} c_{\mu_1, \mu_2, \ldots, \mu_n}\, z_1^{\mu_1} z_2^{\mu_2} \cdots z_n^{\mu_n}, \qquad \text{where } c_{\mu_1, \mu_2, \ldots, \mu_n} \geq 0$$

and $\mu_1 + \cdots + \mu_n = d$. Then

$$\tau: \{z_i\} \to \left\{ \frac{z_i\, \partial P/\partial z_i}{\sum_i z_i\, \partial P/\partial z_i} \right\}$$

maps $D: z_i \geq 0$, $\sum_i z_i = 1$ into itself and satisfies $P(\tau\{z_i\}) \geq P(\{z_i\})$. In fact, strict inequality holds unless $\{z_i\}$ is a critical point of $P$ in $D$.

For the proof, the partial derivatives were evaluated as

$$z_i\, \partial P/\partial z_i = \sum_{\mu_1, \mu_2, \ldots, \mu_n} c_{\mu_1, \mu_2, \ldots, \mu_n}\, \mu_i\, z_1^{\mu_1} z_2^{\mu_2} \cdots z_n^{\mu_n}$$

and substituted for the variables $z_i$ in the expression for $P$. An elementary though very tricky juggling of the inequality between geometric and arithmetic means and Hölder's inequality then led to the desired result through a
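The recursions (3) and the reestimation maps (2a)-(2b) translate directly into code. A sketch under stated assumptions (NumPy arrays, 0-based observation codes, my own function names, and no rescaling against numerical underflow for long $T$):

```python
import numpy as np

def forward_backward(a0, A, B, y):
    """Compute the alpha, beta of Eq. (3) and the likelihood P of Eq. (1)."""
    s, T = len(a0), len(y)
    alpha = np.zeros((T + 1, s))
    beta = np.ones((T + 1, s))
    alpha[0] = a0                                   # alpha_0(i) = a_i
    for t in range(T):                              # forwards
        alpha[t + 1] = alpha[t] @ (A * B[:, :, y[t]])
    for t in range(T - 1, -1, -1):                  # backwards
        beta[t] = (A * B[:, :, y[t]]) @ beta[t + 1]
    P = float(alpha[0] @ beta[0])                   # = sum_i alpha_t(i) beta_t(i), any t
    return alpha, beta, P

def reestimate(a0, A, B, y):
    """One application of tau, i.e., Eqs. (2a) and (2b); a0 is held fixed here."""
    s, r, T = A.shape[0], B.shape[2], len(y)
    alpha, beta, P = forward_backward(a0, A, B, y)
    # xi[t, i, j] is proportional to P(X_t = i, X_{t+1} = j | {y_t}):
    #   alpha_t(i) a_ij b_ij(y_{t+1}) beta_{t+1}(j)
    xi = np.array([alpha[t][:, None] * A * B[:, :, y[t]] * beta[t + 1][None, :]
                   for t in range(T)])
    gamma = alpha * beta                            # proportional to P(X_t = i | {y_t})
    A_new = xi.sum(axis=0) / gamma[:T].sum(axis=0)[:, None]     # Eq. (2a)
    num = np.zeros((s, s, r))
    for t in range(T):
        num[:, :, y[t]] += xi[t]                    # restrict to t with y_{t+1} = k
    B_new = num / xi.sum(axis=0)[:, :, None]        # Eq. (2b); assumes positive denominators
    return A_new, B_new, P
```

Comparing `P` here against `likelihood_bruteforce` above, and checking that `P` never decreases across repeated calls to `reestimate`, exercises the $4s^2T$ cost claim and Theorem 1 numerically.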

route which cast no light on what was actually happening. The author believes the following derivation, due to Baum et al. [2], which greatly generalizes the applicability of the transformation $\tau$, lays bare the essence of the situation.

We adopt a simplified notation. We write

$$P(\lambda) = \sum_{x \in X} p(x, \lambda)$$

where $\lambda$ specifies an $[s - 1 + s(s - 1) + s^2(r - 1)]$-dimensional parameter point $\{a_i, a_{ij}, b_{ij}(k)\}$ in $[s + s^2 + s^2 r]$-dimensional space, and $x = \{x_0, x_1, \ldots, x_T\}$ is a sequence of states of the unseen Markov process. The summation is over $X$, the space of all possible sequences of states of length $T + 1$, and

$$p(x, \lambda) = a_{x_0}\, a_{x_0 x_1} b_{x_0 x_1}(y_1) \cdots a_{x_{T-1} x_T} b_{x_{T-1} x_T}(y_T)$$

is the probability of the Markov process following that sequence of states and producing the observed $\{y_t\}$ sample for the parameter values $\{a_i, a_{ij}, b_{ij}(k)\}$. More generally, we write

$$P(\lambda) = \int_{x \in X} p(x, \lambda)\, d\mu(x)$$

where $\mu$ is a finite nonnegative measure and $p(x, \lambda)$ is positive a.e. with respect to $\mu$. In the main application of interest $\mu$ is a counting measure: $\mu(x) = 1$ for each of the $s^{T+1}$ points $x$.

We wish to define a transformation $\tau$ on the $\lambda$-space and show that $P(\tau(\lambda)) \geq P(\lambda)$. For this purpose we define an auxiliary function of two variables

$$Q(\lambda, \lambda') = \int_{x \in X} p(x, \lambda) \log p(x, \lambda')\, d\mu(x).$$

THEOREM 3. [2] If $Q(\lambda, \bar{\lambda}) \geq Q(\lambda, \lambda)$, then $P(\bar{\lambda}) > P(\lambda)$ unless $p(x, \bar{\lambda}) = p(x, \lambda)$ a.e. with respect to $\mu$.

Proof. We shall apply Jensen's inequality to the concave function $\log x$. We wish to prove $P(\bar{\lambda}) \geq P(\lambda)$ or, equivalently, $\log[P(\bar{\lambda})/P(\lambda)] \geq 0$. Now

$$\log \frac{P(\bar{\lambda})}{P(\lambda)} = \log \left[ \frac{1}{P(\lambda)} \int_X p(x, \bar{\lambda})\, d\mu(x) \right] = \log \int_X \frac{p(x, \bar{\lambda})}{p(x, \lambda)} \cdot \frac{p(x, \lambda)}{P(\lambda)}\, d\mu(x)$$
$$\geq \int_X \frac{p(x, \lambda)}{P(\lambda)} \log \frac{p(x, \bar{\lambda})}{p(x, \lambda)}\, d\mu(x) = \frac{1}{P(\lambda)} \left[ Q(\lambda, \bar{\lambda}) - Q(\lambda, \lambda) \right] \geq 0$$

by hypothesis. Jensen's inequality is applicable to the first inequality since $p(x, \lambda)\, d\mu(x)/P(\lambda)$ is a nonnegative measure with total mass 1. Since $\log$ is strictly concave ($\log'' < 0$), equality can hold only if $p(x, \bar{\lambda})/p(x, \lambda)$ is constant a.e. with respect to $d\mu(x)$.

We now have a way of increasing $P(\lambda)$. For each $\lambda$ we need only find a $\bar{\lambda}$ with $Q(\lambda, \bar{\lambda}) \geq Q(\lambda, \lambda)$. This may not seem any easier than directly finding a $\bar{\lambda}$ with $P(\bar{\lambda}) \geq P(\lambda)$. However, the author shall show that under natural assumptions, and in particular in the cases of interest:

(a) For fixed $\lambda$, $Q(\lambda, \lambda')$ assumes its global maximum as a function of $\lambda'$ at a unique point $\tau(\lambda)$.

(b) $\tau(\lambda)$ is continuous.

(c) $\tau(\lambda)$ is effectively computable.

(d) $P(\tau(\lambda)) \geq P(\lambda)$, which follows from Theorem 3 and the definition of $\tau(\lambda)$, since $\lambda' = \lambda$ is one of the competitors for the global maximum of $Q(\lambda, \lambda')$ as a function of $\lambda'$.

We apply Theorem 3 to the principal case of interest. Letting $\{a, A, B\}$ denote $\{a_i, a_{ij}, b_{ij}(k)\}$, we have

$$P(a, A, B) = \sum_x p(x, a, A, B)$$

where

$$p(x, a, A, B) = a_{x_0} \prod_{t=0}^{T-1} a_{x_t x_{t+1}} \prod_{t=0}^{T-1} b_{x_t x_{t+1}}(y_{t+1}).$$

Also

$$Q(a, A, B;\ a', A', B') = \sum_x p(x, a, A, B) \left[ \log a'_{x_0} + \sum_t \log a'_{x_t x_{t+1}} + \sum_t \log b'_{x_t x_{t+1}}(y_{t+1}) \right].$$

For fixed $a, A, B$ we seek to maximize $Q$ as a function of $a', A', B'$. We observe that for $a, A, B$ fixed, $Q$ is a sum of three functions, one involving only $\{a_i'\}$, the second involving only $\{a_{ij}'\}$, and the third involving only $\{b_{ij}'(k)\}$, which can be maximized separately.

We consider the second of these. Observe that

$$\sum_{x \in X} p(x, a, A, B) \sum_t \log a'_{x_t x_{t+1}} = \sum_{i=1}^{s} \left[ \sum_{x \in X} p(x, a, A, B) \sum_{t:\, x_t = i} \log a'_{x_t x_{t+1}} \right]$$

is itself a sum of $s$ functions, the $i$th of which involves only the $a'_{ij}$, $j = 1, \ldots, s$, which can be maximized separately.
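To make Theorem 3 and the decomposition of $Q$ concrete, a brute-force sketch (my own; it enumerates all $s^{T+1}$ paths, so it is only a check for tiny problems, and it assumes $p(x, \lambda') > 0$ wherever $p(x, \lambda) > 0$):

```python
import math
from itertools import product

def path_prob(a0, A, B, y, x):
    """p(x, lambda) for one state sequence x = (x_0, ..., x_T)."""
    p = a0[x[0]]
    for t in range(len(y)):
        p *= A[x[t]][x[t + 1]] * B[x[t]][x[t + 1]][y[t]]
    return p

def Q(lam, lam2, y):
    """Q(lambda, lambda') = sum_x p(x, lambda) log p(x, lambda')."""
    a0, A, B = lam
    total = 0.0
    for x in product(range(len(a0)), repeat=len(y) + 1):
        p = path_prob(a0, A, B, y, x)
        if p > 0.0:                     # 0 * log(...) terms contribute nothing
            total += p * math.log(path_prob(*lam2, y, x))
    return total
```

With this one can check numerically that any `lam2` with `Q(lam, lam2, y) >= Q(lam, lam, y)` also satisfies $P(\lambda') \geq P(\lambda)$, and that the point produced by the `reestimate` sketch above maximizes `Q(lam, ., y)`.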
If we let $n_{ij}(x)$ be the number of $t$'s with $x_t = i$, $x_{t+1} = j$ in the sequence of states specified by $x$, we can write the $i$th function as

$$\sum_{x \in X} \sum_{j=1}^{s} n_{ij}(x)\, p(x, a, A, B) \log a'_{ij} = \sum_{j=1}^{s} A_{ij} \log a'_{ij}$$

where $A_{ij} = \sum_{x \in X} n_{ij}(x)\, p(x, a, A, B)$. But

$$\sum_{j=1}^{s} A_{ij} \log a'_{ij},$$

as a function of $\{a'_{ij}\}$ subject to the restraints

$$\sum_{j=1}^{s} a'_{ij} = 1, \qquad a'_{ij} \geq 0,$$

attains a global maximum at the single point

$$\bar{a}_{ij} = A_{ij} \Big/ \sum_{j=1}^{s} A_{ij}.$$

This $\{\bar{a}_{ij}\}$ agrees with the first expression of (2); i.e.,

$$\sum_{t=0}^{T-1} P(X_t = i, X_{t+1} = j \mid \{y_t\}, \{a_{ij}, b_{ij}(k)\}) = A_{ij}/P(\{y_t\} \mid \{a_{ij}, b_{ij}(k)\}).$$

Similarly we obtain

$$\bar{a}_i = \sum_{x:\, x_0 = i} p(x, a, A, B) \Big/ \sum_x p(x, a, A, B),$$

$$\bar{b}_{ij}(k) = \sum_x \sum_{\substack{t:\, x_t = i,\ x_{t+1} = j \\ y_{t+1} = k}} p(x, a, A, B) \Big/ \sum_x \sum_{t:\, x_t = i,\ x_{t+1} = j} p(x, a, A, B),$$

in agreement with (2). Of course $\bar{a}_i$, $\bar{a}_{ij}$, $\bar{b}_{ij}(k)$ are computed by inductive calculations as indicated in the second expression of (2) and in (3), not as in the above formulas.

We have now shown that the transformation $\tau$ increases $P$ in the case where the output observables $Y$ take values in a finite state space.

We can also consider the case [2] where the output observables $Y_t$ are real-valued. For example, imagine that

$$P(Y_t = y \mid X_t = i) = \frac{1}{(2\pi)^{1/2} \sigma_i} \exp\left( -\frac{(y - m_i)^2}{2\sigma_i^2} \right) = b(m_i, \sigma_i, y);$$

i.e., associated with state $i$ of an unseen Markov process there is a normally distributed variable with an unknown mean $m_i$ and standard deviation $\sigma_i$. Now we wish to maximize the likelihood density of an observation $y_1, \ldots, y_T$,

$$P(a, A, m, \sigma) = \sum_x p(a, A, m, \sigma, x)$$

where

$$p(a, A, m, \sigma, x) = a_{x_0}\, a_{x_0 x_1} b(m_{x_1}, \sigma_{x_1}, y_1) \cdots a_{x_{T-1} x_T} b(m_{x_T}, \sigma_{x_T}, y_T).$$

With

$$Q(a, A, m, \sigma;\ a', A', m', \sigma') = \sum_{x \in X} p(x, a, A, m, \sigma) \log p(x, a', A', m', \sigma'),$$

Theorem 3 applies since everything is nonnegative; it is sufficient to find $\bar{a}, \bar{A}, \bar{m}, \bar{\sigma}$ such that

$$Q(a, A, m, \sigma;\ \bar{a}, \bar{A}, \bar{m}, \bar{\sigma}) \geq Q(a, A, m, \sigma;\ a, A, m, \sigma).$$

An argument similar to one given previously shows that:

THEOREM 4. [2] For each fixed $\{a, A, m, \sigma\}$, the function $Q(a, A, m, \sigma;\ a', A', m', \sigma')$ attains a global maximum at a unique point. This point $\tau(a, A, m, \sigma)$, the transform of $\{a, A, m, \sigma\}$, is given by

$$\bar{a}_{ij} = \frac{\sum_t \alpha_t(i)\, a_{ij}\, \beta_{t+1}(j)\, b(m_j, \sigma_j, y_{t+1})}{\sum_{j=1}^{s} \sum_t \alpha_t(i)\, a_{ij}\, \beta_{t+1}(j)\, b(m_j, \sigma_j, y_{t+1})},$$

$$\bar{m}_j = \frac{\sum_t \alpha_t(j)\, \beta_t(j)\, y_t}{\sum_t \alpha_t(j)\, \beta_t(j)}, \qquad \bar{\sigma}_j^2 = \frac{\sum_t \alpha_t(j)\, \beta_t(j)\, (y_t - \bar{m}_j)^2}{\sum_t \alpha_t(j)\, \beta_t(j)}.$$

The last two can be interpreted, respectively, as a posteriori means and variances.
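A sketch of the Theorem 4 reestimation for normal outputs, under the same caveats as before (my own names, the emission attached to the state entered at each step, $a$ held fixed, and no underflow protection):

```python
import numpy as np

def normal_density(m, sig, y):
    """b(m, sigma, y), the normal density defined above."""
    return np.exp(-(y - m) ** 2 / (2.0 * sig ** 2)) / (np.sqrt(2.0 * np.pi) * sig)

def reestimate_normal(a0, A, m, sig, y):
    """One step of tau for normally distributed outputs."""
    y = np.asarray(y, dtype=float)
    s, T = len(a0), len(y)
    emit = normal_density(m[None, :], sig[None, :], y[:, None])  # emit[t, j] = b(m_j, sig_j, y_{t+1})
    alpha = np.zeros((T + 1, s)); alpha[0] = a0
    beta = np.ones((T + 1, s))
    for t in range(T):                  # alpha_{t+1}(j) = sum_i alpha_t(i) a_ij b(m_j, sig_j, y_{t+1})
        alpha[t + 1] = (alpha[t] @ A) * emit[t]
    for t in range(T - 1, -1, -1):      # beta_t(i) = sum_j a_ij b(m_j, sig_j, y_{t+1}) beta_{t+1}(j)
        beta[t] = A @ (emit[t] * beta[t + 1])
    # xi[t, i, j] is proportional to P(X_t = i, X_{t+1} = j | y)
    xi = np.array([alpha[t][:, None] * A * (emit[t] * beta[t + 1])[None, :]
                   for t in range(T)])
    A_new = xi.sum(axis=0) / xi.sum(axis=(0, 2))[:, None]        # bar a_ij of Theorem 4
    gamma = alpha[1:] * beta[1:]        # gamma[t-1, j] proportional to P(X_t = j | y), t = 1..T
    m_new = (gamma * y[:, None]).sum(axis=0) / gamma.sum(axis=0)                   # a posteriori means
    var_new = (gamma * (y[:, None] - m_new) ** 2).sum(axis=0) / gamma.sum(axis=0)  # a posteriori variances
    return A_new, m_new, np.sqrt(var_new)
```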
More generally, let $b(y)$ be a strictly log concave density, i.e., $(\log b)'' < 0$. We introduce a two-parameter family involving location and scale parameters $m_i$, $\sigma_i$ in state $i$ by defining $b(m, \sigma, y) = b((y - m)/\sigma)$ as we did for the normal density above. The following theorem is somewhat harder to prove than the previous results for the discrete and normal output variables:

THEOREM 5. [2] For fixed $a, A, m, \sigma$ the function $Q(a, A, m, \sigma;\ a', A', m', \sigma')$ attains a global maximum at a single point $(\bar{a}, \bar{A}, \bar{m}, \bar{\sigma})$. The transformation $\tau(a, A, m, \sigma) = (\bar{a}, \bar{A}, \bar{m}, \bar{\sigma})$ thus defined is continuous, and $P(\tau(a, A, m, \sigma)) \geq P(a, A, m, \sigma)$ with equality if and only if $\tau(a, A, m, \sigma) = (a, A, m, \sigma)$, which, in turn, holds if and only if $(a, A, m, \sigma)$ is a critical point of $P$.

However, the new $\bar{m}_i$, $\bar{\sigma}_i$ do not have obvious probabilistic interpretations as in the normal case above. Moreover, these $\bar{m}_i$ and $\bar{\sigma}_i$ cannot be inductively computed as in the finite and normal output cases. These facts greatly decrease the interest in this last transformation $\tau$.
We now consider convergence properties of the iterates of the transformation $\tau$. We have $P(\tau(\lambda)) \geq P(\lambda)$, equality holding if and only if $\tau(\lambda) = \lambda$, which holds if and only if $\lambda$ is a critical point of $P$. It follows that if $\lambda_0$ is a limit point of the sequence $\tau^n(\lambda)$, then $\tau(\lambda_0) = \lambda_0$. [In fact, if $\tau^{n_i}(\lambda) \to \lambda_0$, then $P(\lambda_0) \leq P(\tau(\lambda_0)) = \lim_i P(\tau^{n_i + 1}(\lambda)) \leq \lim_i P(\tau^{n_{i+1}}(\lambda)) = P(\lambda_0)$.] We want to conclude that $\tau^n(\lambda) \to \lambda_0$. If $P$ has only finitely many critical points, so that $\tau$ has only finitely many fixed points, this follows as an elementary point set topology exercise. However, at least theoretically, if $P$ has infinitely many critical points, limit cycle behavior is possible.

However, $\tau$ has additional properties beyond those just used, and it is possible that a theorem guaranteeing convergence to a point is provable under suitable hypotheses. For related material see References [3] and [4].
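The iteration analyzed here is simply repeated application of $\tau$. A minimal driver (my own, reusing the `reestimate` sketch above, with an arbitrary stopping tolerance) that also checks the monotone increase of $P$ along the iterates:

```python
def fit(a0, A, B, y, max_iter=100, tol=1e-10):
    """Iterate tau until the likelihood P stops increasing appreciably."""
    P_old = 0.0
    for _ in range(max_iter):
        A, B, P = reestimate(a0, A, B, y)   # P is evaluated at the pre-update parameters
        assert P >= P_old - 1e-12           # P(tau(lambda)) >= P(lambda)
        if P - P_old < tol:
            break
        P_old = P
    return A, B, P
```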

REFERENCES

1. L. E. Baum and J. A. Eagon, An inequality with applications to statistical estimation for probabilistic functions of Markov processes and to a model for ecology. Bull. Amer. Math. Soc. 73 (1967), 360-363.
2. L. E. Baum, T. Petrie, G. Soules, and N. Weiss, A maximization technique occurring in the statistical analysis of probabilistic functions of Markov chains. Ann. Math. Statist. 41 (1970), 164-171.
3. G. R. Blakley, Homogeneous non-negative symmetric quadratic transformations. Bull. Amer. Math. Soc. 70 (1964), 712-715.
4. L. E. Baum and G. R. Sell, Growth transformations for functions on manifolds. Pacific J. Math. 27 (1968), 211-227.
