

1.7.1 Moments and Moment Generating Functions


Definition 1.12. The nth moment ($n \in \mathbb{N}$) of a random variable $X$ is defined as
$$\mu'_n = E(X^n).$$
The nth central moment of $X$ is defined as
$$\mu_n = E(X - \mu)^n,$$
where $\mu = \mu'_1 = E(X)$. □

Note that the second central moment is the variance of a random variable $X$, usually denoted by $\sigma^2$.

Moments give an indication of the shape of the distribution of a random variable.


Skewness and kurtosis are measured by the following functions of the third and
fourth central moment, respectively:
the coefficient of skewness is given by
$$\gamma_1 = \frac{E(X - \mu)^3}{\sigma^3} = \frac{\mu_3}{\mu_2^{3/2}};$$
the coefficient of kurtosis is given by
$$\gamma_2 = \frac{E(X - \mu)^4}{\sigma^4} - 3 = \frac{\mu_4}{\mu_2^2} - 3.$$
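As a quick numerical illustration (a sketch added here, not part of the original notes), the snippet below estimates $\gamma_1$ and $\gamma_2$ from a simulated Exp(1) sample, for which the theoretical values are $\gamma_1 = 2$ and $\gamma_2 = 6$; the sample size and seed are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)                  # arbitrary seed for reproducibility
x = rng.exponential(scale=1.0, size=1_000_000)  # sample from Exp(1)

mu = x.mean()
m2 = np.mean((x - mu) ** 2)             # second central moment (variance)
m3 = np.mean((x - mu) ** 3)             # third central moment
m4 = np.mean((x - mu) ** 4)             # fourth central moment

gamma1 = m3 / m2 ** 1.5                 # coefficient of skewness
gamma2 = m4 / m2 ** 2 - 3               # coefficient of (excess) kurtosis

print(gamma1, gamma2)                   # roughly 2 and 6 for Exp(1)
```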
Moments can be calculated from the definition or by using the so-called moment generating function.

Definition 1.13. The moment generating function (mgf) of a random variable $X$
is a function $M_X : \mathbb{R} \to [0, \infty)$ given by
$$M_X(t) = E(e^{tX}),$$
provided that the expectation exists for $t$ in some neighborhood of zero. □
More explicitly, the mgf of $X$ can be written as
$$M_X(t) = \int_{-\infty}^{\infty} e^{tx} f_X(x)\,dx, \quad \text{if } X \text{ is continuous},$$
$$M_X(t) = \sum_{x \in \mathcal{X}} e^{tx} P(X = x), \quad \text{if } X \text{ is discrete}.$$
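As a rough illustration of the continuous case (a sketch added here, not from the original notes), the mgf of an Exp(1) random variable can be approximated by numerical integration and compared with the closed form $\lambda/(\lambda - t)$ obtained in Example 1.14 below; the values of $\lambda$ and $t$ are arbitrary.

```python
import numpy as np
from scipy.integrate import quad

lam = 1.0   # rate of the Exp(1) density, used purely as an illustration
t = 0.5     # any t with t < lam keeps the integral finite

# M_X(t) = integral of e^{tx} f_X(x) dx over the support (0, infinity)
mgf_numeric, _ = quad(lambda x: np.exp(t * x) * lam * np.exp(-lam * x), 0, np.inf)

print(mgf_numeric)              # approximately 2.0
print(lam / (lam - t))          # closed form lambda / (lambda - t) = 2.0
```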

The method to generate moments is given in the following theorem.



Theorem 1.7. If $X$ has mgf $M_X(t)$, then
$$E(X^n) = M_X^{(n)}(0),$$
where
$$M_X^{(n)}(0) = \frac{d^n}{dt^n} M_X(t)\Big|_{t=0}.$$
That is, the $n$-th moment is equal to the $n$-th derivative of the mgf evaluated at
$t = 0$.

Proof. Assuming that we can differentiate under the integral sign, we may write
$$\frac{d}{dt} M_X(t) = \frac{d}{dt} \int_{-\infty}^{\infty} e^{tx} f_X(x)\,dx = \int_{-\infty}^{\infty} \frac{d}{dt}\, e^{tx} f_X(x)\,dx = \int_{-\infty}^{\infty} x e^{tx} f_X(x)\,dx = E(X e^{tX}).$$
Hence, evaluating the last expression at zero we obtain
$$\frac{d}{dt} M_X(t)\Big|_{t=0} = E(X e^{tX})\big|_{t=0} = E(X).$$
For $n = 2$ we will get
$$\frac{d^2}{dt^2} M_X(t)\Big|_{t=0} = E(X^2 e^{tX})\big|_{t=0} = E(X^2).$$
Analogously, it can be shown that for any $n \in \mathbb{N}$ we can write
$$\frac{d^n}{dt^n} M_X(t)\Big|_{t=0} = E(X^n e^{tX})\big|_{t=0} = E(X^n).$$
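As a small symbolic check of Theorem 1.7 (a sketch added here; the Bernoulli distribution is just a convenient choice), differentiating the Bernoulli($p$) mgf $M(t) = 1 - p + p e^t$ at $t = 0$ returns $E(X^n) = p$ for every $n$:

```python
import sympy as sp

t, p = sp.symbols('t p', positive=True)
M = 1 - p + p * sp.exp(t)                  # mgf of a Bernoulli(p) random variable

for n in range(1, 4):
    moment = sp.diff(M, t, n).subs(t, 0)   # n-th derivative of the mgf at t = 0
    print(n, sp.simplify(moment))          # every moment E(X^n) equals p
```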


Example 1.14. Find the mgf of $X \sim \mathrm{Exp}(\lambda)$ and use the results of Theorem 1.7 to
obtain the mean and variance of $X$.

By definition the mgf can be written as
$$M_X(t) = E(e^{tX}) = \int_{-\infty}^{\infty} e^{tx} f_X(x)\,dx.$$

For the exponential distribution we have
$$f_X(x) = \lambda e^{-\lambda x} I_{(0,\infty)}(x),$$
where $\lambda \in \mathbb{R}^+$. Here we used the notation of the indicator function $I_{\mathcal{X}}(x)$, whose
meaning is as follows:
$$I_{\mathcal{X}}(x) = \begin{cases} 1, & \text{if } x \in \mathcal{X}; \\ 0, & \text{otherwise.} \end{cases}$$
That is,
$$f_X(x) = \begin{cases} \lambda e^{-\lambda x}, & \text{if } x \in (0, \infty); \\ 0, & \text{otherwise.} \end{cases}$$
Hence, integrating by the method of substitution, we get
$$M_X(t) = \int_0^{\infty} e^{tx}\, \lambda e^{-\lambda x}\,dx = \lambda \int_0^{\infty} e^{(t-\lambda)x}\,dx = \frac{\lambda}{\lambda - t}, \quad \text{provided that } |t| < \lambda.$$
Now, using Theorem 1.7 we obtain the first and the second moments, respectively:
$$E(X) = M_X'(0) = \frac{\lambda}{(\lambda - t)^2}\Big|_{t=0} = \frac{1}{\lambda},$$
$$E(X^2) = M_X^{(2)}(0) = \frac{2\lambda}{(\lambda - t)^3}\Big|_{t=0} = \frac{2}{\lambda^2}.$$
Hence, the variance of $X$ is
$$\mathrm{var}(X) = E(X^2) - [E(X)]^2 = \frac{2}{\lambda^2} - \frac{1}{\lambda^2} = \frac{1}{\lambda^2}.$$
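The hand computation above can be reproduced symbolically; the sketch below (added here, not part of the original notes) differentiates $M_X(t) = \lambda/(\lambda - t)$ and recovers $E(X) = 1/\lambda$, $E(X^2) = 2/\lambda^2$ and $\mathrm{var}(X) = 1/\lambda^2$:

```python
import sympy as sp

t, lam = sp.symbols('t lambda', positive=True)
M = lam / (lam - t)                      # mgf of Exp(lambda), valid for t < lambda

m1 = sp.diff(M, t, 1).subs(t, 0)         # first moment E(X)
m2 = sp.diff(M, t, 2).subs(t, 0)         # second moment E(X^2)
var = sp.simplify(m2 - m1**2)            # variance E(X^2) - [E(X)]^2

print(m1, m2, var)                       # 1/lambda, 2/lambda**2, lambda**(-2)
```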

Exercise 1.10. Calculate the mgf of the Binomial and the Poisson distributions.

Moment generating functions provide methods for comparing distributions or finding their limiting forms. The following two theorems give us the tools.

Theorem 1.8. Let $F_X(x)$ and $F_Y(y)$ be two cdfs all of whose moments exist. Then

1. If $F_X$ and $F_Y$ have bounded support, then $F_X(u) = F_Y(u)$ for all $u$ iff
$E(X^n) = E(Y^n)$ for all $n = 0, 1, 2, \ldots$.

2. If the mgfs of $X$ and $Y$ exist and are equal, i.e., $M_X(t) = M_Y(t)$ for all $t$ in
some neighborhood of zero, then $F_X(u) = F_Y(u)$ for all $u$. □

Theorem 1.9. Suppose that $\{X_1, X_2, \ldots\}$ is a sequence of random variables, each
with mgf $M_{X_i}(t)$. Furthermore, suppose that
$$\lim_{i \to \infty} M_{X_i}(t) = M_X(t), \quad \text{for all } t \text{ in a neighborhood of zero},$$
and $M_X(t)$ is an mgf. Then, there is a unique cdf $F_X$ whose moments are determined
by $M_X(t)$ and, for all $x$ where $F_X(x)$ is continuous, we have
$$\lim_{i \to \infty} F_{X_i}(x) = F_X(x).$$

This theorem means that the convergence of mgfs implies convergence of cdfs.

Example 1.15. We know that the Binomial distribution can be approximated by a
Poisson distribution when $p$ is small and $n$ is large. Using the above theorem we
can confirm this fact.

The mgfs of $X_n \sim \mathrm{Bin}(n, p)$ and of $Y \sim \mathrm{Poisson}(\lambda)$ are, respectively:
$$M_{X_n}(t) = [p e^t + (1 - p)]^n, \qquad M_Y(t) = e^{\lambda(e^t - 1)}.$$

We will show that the mgf of $X_n$ tends to the mgf of $Y$, where $\lambda = np$.

We will need the following useful result, given in the lemma below:

Lemma 1.1. Let $a_1, a_2, \ldots$ be a sequence of numbers converging to $a$, that is,
$\lim_{n \to \infty} a_n = a$. Then
$$\lim_{n \to \infty} \left(1 + \frac{a_n}{n}\right)^n = e^a.$$
Now, we can write
$$M_{X_n}(t) = \left[p e^t + (1 - p)\right]^n = \left[1 + \frac{1}{n}\, np(e^t - 1)\right]^n = \left[1 + \frac{\lambda(e^t - 1)}{n}\right]^n \xrightarrow[n \to \infty]{} e^{\lambda(e^t - 1)} = M_Y(t).$$

Hence, by Theorem 1.9 the Binomial distribution converges to a Poisson distribution. □
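A quick numerical illustration of this convergence (a sketch added here, with arbitrary values $\lambda = 2$ and $t = 0.3$) shows the Binomial mgf approaching the Poisson mgf as $n$ grows while $p = \lambda/n$:

```python
import numpy as np

lam, t = 2.0, 0.3                       # arbitrary rate and evaluation point
poisson_mgf = np.exp(lam * (np.exp(t) - 1))

for n in [10, 100, 1000, 10000]:
    p = lam / n                         # keep np = lambda fixed
    binom_mgf = (p * np.exp(t) + (1 - p)) ** n
    print(n, binom_mgf, poisson_mgf)    # binomial mgf approaches the Poisson mgf
```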

1.8 Functions of Random Variables


If $X$ is a random variable with cdf $F_X(x)$, then any function of $X$, say $Y = g(X)$,
is also a random variable. The question then is "what is the distribution of $Y$?"

The function $y = g(x)$ is a mapping from the induced sample space of the random
variable $X$, $\mathcal{X}$, to a new sample space, $\mathcal{Y}$, of the random variable $Y$, that is,
$$g(x) : \mathcal{X} \to \mathcal{Y}.$$
The inverse mapping $g^{-1}$ acts from $\mathcal{Y}$ to $\mathcal{X}$ and we can write
$$g^{-1}(A) = \{x \in \mathcal{X} : g(x) \in A\}, \quad \text{where } A \subset \mathcal{Y}.$$
Then we have
$$P(Y \in A) = P(g(X) \in A) = P(\{x \in \mathcal{X} : g(x) \in A\}) = P(X \in g^{-1}(A)).$$


The following theorem relates the cumulative distribution functions of $X$ and $Y = g(X)$.

Theorem 1.10. Let $X$ have cdf $F_X(x)$, let $Y = g(X)$, and let the domain and codomain
of $g$, respectively, be
$$\mathcal{X} = \{x : f_X(x) > 0\} \quad \text{and} \quad \mathcal{Y} = \{y : y = g(x) \text{ for some } x \in \mathcal{X}\}.$$

(a) If $g$ is an increasing function on $\mathcal{X}$, then $F_Y(y) = F_X\big(g^{-1}(y)\big)$ for $y \in \mathcal{Y}$.

(b) If $g$ is a decreasing function on $\mathcal{X}$, then $F_Y(y) = 1 - F_X\big(g^{-1}(y)\big)$ for $y \in \mathcal{Y}$.




Proof. The cdf of $Y = g(X)$ can be written as
$$F_Y(y) = P(Y \le y) = P(g(X) \le y) = P(\{x \in \mathcal{X} : g(x) \le y\}) = \int_{\{x \in \mathcal{X} :\, g(x) \le y\}} f_X(x)\,dx.$$

(a) If $g$ is increasing, then
$$\{x \in \mathcal{X} : g(x) \le y\} = \{x \in \mathcal{X} : g^{-1}(g(x)) \le g^{-1}(y)\} = \{x \in \mathcal{X} : x \le g^{-1}(y)\}.$$




So, we can write
$$F_Y(y) = \int_{\{x \in \mathcal{X} :\, g(x) \le y\}} f_X(x)\,dx = \int_{\{x \in \mathcal{X} :\, x \le g^{-1}(y)\}} f_X(x)\,dx = \int_{-\infty}^{g^{-1}(y)} f_X(x)\,dx = F_X\big(g^{-1}(y)\big).$$
(b) Now, if $g$ is decreasing, then
$$\{x \in \mathcal{X} : g(x) \le y\} = \{x \in \mathcal{X} : g^{-1}(g(x)) \ge g^{-1}(y)\} = \{x \in \mathcal{X} : x \ge g^{-1}(y)\}.$$




So, we can write
$$F_Y(y) = \int_{\{x \in \mathcal{X} :\, g(x) \le y\}} f_X(x)\,dx = \int_{\{x \in \mathcal{X} :\, x \ge g^{-1}(y)\}} f_X(x)\,dx = \int_{g^{-1}(y)}^{\infty} f_X(x)\,dx = 1 - F_X\big(g^{-1}(y)\big).$$


Example 1.16. Find the distribution of $Y = g(X) = -\log X$, where $X \sim U([0, 1])$. The cdf of $X$ is
$$F_X(x) = \begin{cases} 0, & \text{for } x < 0; \\ x, & \text{for } 0 \le x \le 1; \\ 1, & \text{for } x > 1. \end{cases}$$
On $\mathcal{X} = (0, 1)$ the function $g(x) = -\log x$ is decreasing and maps $\mathcal{X}$ onto $\mathcal{Y} = (0, \infty)$.

For $y > 0$, $y = -\log x$ implies that $x = e^{-y}$, i.e., $g^{-1}(y) = e^{-y}$, and
$$F_Y(y) = 1 - F_X\big(g^{-1}(y)\big) = 1 - F_X\big(e^{-y}\big) = 1 - e^{-y}.$$
Hence we may write
$$F_Y(y) = \big(1 - e^{-y}\big) I_{(0,\infty)}(y).$$
This is the exponential distribution function with $\lambda = 1$. □
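A brief simulation (a sketch added here, not part of the original notes; the seed and sample size are arbitrary) supports this: the empirical cdf of $-\log U$ for $U \sim U([0, 1])$ is close to $1 - e^{-y}$:

```python
import numpy as np

rng = np.random.default_rng(1)
u = rng.uniform(size=500_000)
y = -np.log(u)                          # transformed sample Y = -log X

for q in [0.5, 1.0, 2.0]:
    empirical = np.mean(y <= q)         # empirical cdf of Y at q
    print(q, empirical, 1 - np.exp(-q)) # compare with F_Y(q) = 1 - e^{-q}
```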



For continuous rvs we have the following result.


Theorem 1.11. Let $X$ have pdf $f_X(x)$ and let $Y = g(X)$, where $g$ is a monotone
function. Suppose that $f_X(x)$ is continuous on its support $\mathcal{X} = \{x : f_X(x) > 0\}$
and that $g^{-1}(y)$ has a continuous derivative on the support $\mathcal{Y} = \{y : y = g(x) \text{ for some } x \in \mathcal{X}\}$. Then the pdf of $Y$ is given by
$$f_Y(y) = f_X\big(g^{-1}(y)\big)\, \Big|\frac{d}{dy} g^{-1}(y)\Big|\, I_{\mathcal{Y}}(y).$$
Proof.
$$f_Y(y) = \frac{d}{dy} F_Y(y) = \begin{cases} \frac{d}{dy} F_X\big(g^{-1}(y)\big), & \text{if } g \text{ is increasing}; \\[4pt] \frac{d}{dy}\Big[1 - F_X\big(g^{-1}(y)\big)\Big], & \text{if } g \text{ is decreasing}, \end{cases}$$
$$= \begin{cases} f_X\big(g^{-1}(y)\big)\, \frac{d}{dy} g^{-1}(y), & \text{if } g \text{ is increasing}; \\[4pt] -f_X\big(g^{-1}(y)\big)\, \frac{d}{dy} g^{-1}(y), & \text{if } g \text{ is decreasing}, \end{cases}$$
which gives the thesis of the theorem. □
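Applied to Example 1.16 above (a symbolic sketch added here, not part of the original notes), the theorem reproduces the exponential density directly: with $f_X \equiv 1$ on $(0, 1)$, $g(x) = -\log x$ and $g^{-1}(y) = e^{-y}$, the formula gives $f_Y(y) = e^{-y}$:

```python
import sympy as sp

y = sp.symbols('y', positive=True)

g_inv = sp.exp(-y)                        # inverse of g(x) = -log(x)
f_X = sp.Integer(1)                       # U([0,1]) density evaluated on its support
f_Y = f_X * sp.Abs(sp.diff(g_inv, y))     # f_Y(y) = f_X(g^{-1}(y)) |d/dy g^{-1}(y)|

print(sp.simplify(f_Y))                   # exp(-y), the Exp(1) density on (0, infinity)
```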

Example 1.17. Suppose that $Z \sim N(0, 1)$. What is the distribution of $Y = Z^2$?

For $y > 0$, the cdf of $Y = Z^2$ is
$$F_Y(y) = P(Y \le y) = P(Z^2 \le y) = P(-\sqrt{y} \le Z \le \sqrt{y}) = F_Z(\sqrt{y}) - F_Z(-\sqrt{y}).$$

The pdf can now be obtained by differentiation:
$$f_Y(y) = \frac{d}{dy} F_Y(y) = \frac{d}{dy}\Big[F_Z(\sqrt{y}) - F_Z(-\sqrt{y})\Big] = \frac{1}{2\sqrt{y}} f_Z(\sqrt{y}) + \frac{1}{2\sqrt{y}} f_Z(-\sqrt{y}).$$

Now, for the standard normal distribution we have
$$f_Z(z) = \frac{1}{\sqrt{2\pi}} e^{-z^2/2}, \quad -\infty < z < \infty.$$


This gives
$$f_Y(y) = \frac{1}{2\sqrt{y}}\left(\frac{1}{\sqrt{2\pi}} e^{-(\sqrt{y})^2/2} + \frac{1}{\sqrt{2\pi}} e^{-(-\sqrt{y})^2/2}\right) = \frac{1}{\sqrt{y}}\, \frac{1}{\sqrt{2\pi}}\, e^{-y/2}, \quad 0 < y < \infty.$$
This is a well-known pdf, which we will use in statistical inference. It is the pdf of a
chi-squared random variable with one degree of freedom, denoted by $\chi^2_1$.

Note that $g(Z) = Z^2$ is not a monotone function, but the range of $Z$, $(-\infty, \infty)$,
can be partitioned into subsets on which $g$ is monotone. □
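As a numerical check of this example (a sketch added here; the seed and sample size are arbitrary), squaring standard normal draws and comparing the empirical cdf with scipy's $\chi^2_1$ cdf:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
z = rng.standard_normal(500_000)
y = z ** 2                                   # Y = Z^2

# Compare a few empirical cdf values with the chi-squared(1) cdf
for q in [0.5, 1.0, 3.0]:
    print(q, np.mean(y <= q), stats.chi2.cdf(q, df=1))
```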

Exercise 1.11. The pdf obtained in Example 1.17 is also the pdf of a Gamma rv for
some specific values of its parameters. What are these values?

Exercise 1.12. Suppose that $Z \sim N(0, 1)$. Find the distribution of $Y = \mu + \sigma Z$ for constants $\mu$ and $\sigma$.

Exercise 1.13. Let $X$ be a random variable with moment generating function $M_X$.

(i) Show that the moment generating function of $Y = a + bX$, where $a$ and $b$ are
constants, is given by
$$M_Y(t) = e^{ta} M_X(tb).$$

(ii) Derive the moment generating function of $Y \sim N(\mu, \sigma^2)$. Hint: First find
$M_Z(t)$ for a standard normal rv $Z$.
