
On Continuous Distributions

Tristan Chaang
February 8, 2022

In this article I will write about continuous distributions, including their properties and their derivations, without using advanced concepts such as measure theory.

Contents

1 PDFs and CDFs

2 Multiple Continuous Random Variables

3 Normal Distribution
  3.1 Properties of the Normal Distribution
  3.2 Deriving the pdf for N(0, 1) from B(n, p)

4 Sample Statistics

5 Chi-Squared Distribution
  5.1 Deriving the pdf for the Chi-Squared Distribution

6 Student’s t-Distribution
  6.1 Deriving the pdf for the t-Distribution

1 PDFs and CDFs
Suppose we want to randomly choose a number in the interval [0, 100] so that ‘every number is equally
likely to be chosen’. The natural question to ask is, what do we mean exactly by equally likely? The set
[0, 100] has infinitely many elements, so the probability of choosing an exact given value is zero. How do
we resolve this? The answer is by using arbitrary intervals: The probability of choosing a number in the
interval [a, b] is (b − a)/100 for any 0 ≤ a ≤ b ≤ 100.
The distribution above is called the uniform distribution on [0, 100], denoted as U[0,100]. However, that is not the only distribution out there. We can have distributions on other intervals, and even if we are just taking a distribution on [0, 100], there can be distributions where some numbers are more likely to be chosen than others, e.g. numbers near 0 being more likely than numbers near 100.
The way we describe a distribution is by using probability density functions (pdf ). For a continuous
random variable X with distribution D (written as X ∼ D), the pdf fX (x) of X is the function such that
\[
P(a \le X \le b) = \int_a^b f_X(x)\,dx
\]

for any a ≤ b. For example, the pdf of X ∼ U[0,100] is

\[
f_X(x) =
\begin{cases}
1/100 & \text{if } 0 \le x \le 100;\\
0 & \text{otherwise.}
\end{cases}
\]

Figure 1: Uniform distribution U[0,100]. The pdf is constant at 1/100 on [0, 100], and P(a ≤ X ≤ b) = (b − a)/100 is the area under the pdf between a and b.

There are a few properties to notice. Since P(X ∈ ℝ) = 1 and P(a ≤ X ≤ b) ≥ 0,

\[
\int_{\forall x} f_X(x)\,dx = 1 \qquad\text{and}\qquad \forall x \in \mathbb{R}:\ f_X(x) \ge 0
\]

for any pdf¹ f_X(x).


The cumulative distribution function (cdf), or distribution function for short, of X is

\[
F_X(x) = P(X \le x) = \int_{-\infty}^{x} f_X(t)\,dt
\]

Notice we used t as the variable of integration instead of x because we have already used the symbol x in the upper bound. The cdf for X ∼ U[0,100] is thus

\[
F_X(x) =
\begin{cases}
0 & \text{if } x < 0;\\
x/100 & \text{if } 0 \le x \le 100;\\
1 & \text{if } x > 100.
\end{cases}
\]

¹We assume these pdfs are Riemann integrable. You can ignore this footnote if you haven’t heard of the term; roughly, it rules out pathological functions such as f = 0 at rational numbers and f = 1 otherwise, which cannot be integrated in the usual way.

A simple corollary is that F_X'(x) = f_X(x). Using this corollary, we can find the pdf of g(X) where g is a function taking X as input:

\[
f_{g(X)}(x) = \frac{d}{dx} P(g(X) \le x).
\]

Writing the pdf of 3X + 1 in terms of the pdf of X

\[
f_{3X+1}(x) = \frac{d}{dx} P(3X + 1 \le x) = \frac{d}{dx} P\!\left(X \le \frac{x-1}{3}\right) = \frac{1}{3} f_X\!\left(\frac{x-1}{3}\right).
\]
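As a quick sanity check (my own illustration, not part of the article), the following Python sketch compares a histogram of 3X + 1, with X ∼ U[0,100], against the formula above; the pdf should be flat at 1/300 on [1, 301].

```python
# Monte Carlo check of f_{3X+1}(x) = (1/3) f_X((x-1)/3) for X ~ U[0,100].
# Here (1/3) f_X((x-1)/3) = 1/300 on [1, 301], so every histogram bin should sit near 1/300.
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0, 100, size=1_000_000)
y = 3 * x + 1

counts, edges = np.histogram(y, bins=50, range=(1, 301))
density = counts / (y.size * (edges[1] - edges[0]))
print(density.min(), density.max(), 1 / 300)   # all three ≈ 0.00333
```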

Writing the pdf of X² in terms of the pdf of X

\[
f_{X^2}(x) = \frac{d}{dx} P(X^2 \le x).
\]
We have two cases:
\[
\begin{aligned}
x \le 0:\quad \frac{d}{dx} P(X^2 \le x) &= \frac{d}{dx}(0) = 0\\
x > 0:\quad \frac{d}{dx} P(X^2 \le x) &= \frac{d}{dx} P(-\sqrt{x} \le X \le \sqrt{x})\\
&= \frac{d}{dx}\left(F_X(\sqrt{x}) - F_X(-\sqrt{x})\right)\\
&= \frac{f_X(\sqrt{x})}{2\sqrt{x}} + \frac{f_X(-\sqrt{x})}{2\sqrt{x}}.
\end{aligned}
\]

In the last line we used the chain rule. Therefore,

\[
f_{X^2}(x) =
\begin{cases}
0 & \text{if } x \le 0;\\[2pt]
\dfrac{f_X(\sqrt{x}) + f_X(-\sqrt{x})}{2\sqrt{x}} & \text{if } x > 0.
\end{cases}
\]
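Again as an illustration only (not from the article), here is a small numerical check of the X² formula, taking X ∼ N(0, 1); this choice also previews the chi-squared distribution of section 5.

```python
# Compare a histogram of X^2 (X standard normal) with (f_X(sqrt(x)) + f_X(-sqrt(x))) / (2 sqrt(x)).
import numpy as np

def f_X(x):
    return np.exp(-x**2 / 2) / np.sqrt(2 * np.pi)    # standard normal pdf

def f_X2(x):                                          # formula derived above, x > 0
    return (f_X(np.sqrt(x)) + f_X(-np.sqrt(x))) / (2 * np.sqrt(x))

rng = np.random.default_rng(1)
samples = rng.standard_normal(1_000_000) ** 2

counts, edges = np.histogram(samples, bins=np.linspace(0.2, 4.0, 39))
centres = (edges[:-1] + edges[1:]) / 2
density = counts / (samples.size * (edges[1] - edges[0]))
print(np.max(np.abs(density - f_X2(centres))))        # small (sampling/binning noise)
```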

The expectation, or mean, of a continuous random variable X is defined as

\[
E(X) = \int_{\forall x} x f_X(x)\,dx
\]

The Law of the Unconscious Statistician (LOTUS)

\[
E(g(X)) = \int_{-\infty}^{\infty} x f_{g(X)}(x)\,dx = \int_{-\infty}^{\infty} g(x) f_X(x)\,dx
\]

The proof of this is complicated. I will write a new article about this.

and its variance and standard deviation are defined as

\[
Var(X) = E(X^2) - E(X)^2 \qquad\text{and}\qquad Sd(X) = \sqrt{Var(X)}
\]
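For instance, the following sketch (my own, for illustration) discretises these integrals for X ∼ U[0,100]; the exact values are E(X) = 50 and Var(X) = 100²/12 ≈ 833.3.

```python
# Approximate E(X), E(X^2) (via LOTUS) and Var(X) for X ~ U[0,100] with a Riemann sum.
import numpy as np

x = np.linspace(0, 100, 200_001)
dx = x[1] - x[0]
pdf = np.full_like(x, 1 / 100)          # f_X(x) = 1/100 on [0, 100]

mean = np.sum(x * pdf) * dx             # E(X)   ≈ 50
second = np.sum(x**2 * pdf) * dx        # E(X^2) ≈ 3333.3
var = second - mean**2                  # Var(X) ≈ 833.3
print(mean, var)
```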

2 Multiple Continuous Random Variables

If we have two variables X and Y, we can construct a pdf with two inputs f_{X,Y}(x, y) so that

\[
P((X, Y) \in A) = \iint_A f_{X,Y}(x, y)\,dx\,dy \quad\text{for any closed region } A.
\]

This means that instead of calculating a probability as the area under the curve of a 2D pdf graph, we calculate it as the volume under the surface of a 3D pdf graph. Such a pdf must also satisfy analogous properties: the total volume under the surface is 1, and the pdf is always nonnegative.
If fX,Y (x, y) = fX (x) · fY (y) for all x, y, we say that the random variables X and Y are independent.
We will normally deal with independent variables.
If X ∼ D₁ and Y ∼ D₂ are independent variables, we would like to find the pdf of Z = X + Y:

\[
\begin{aligned}
f_Z(z) &= \frac{d}{dz} P(Z \le z)\\
&= \frac{d}{dz} P(Y \le z - X)\\
&= \frac{d}{dz} \int_{-\infty}^{\infty} \int_{-\infty}^{z-x} f_{X,Y}(x, y)\,dy\,dx\\
&= \int_{-\infty}^{\infty} f_X(x) \cdot \frac{d}{dz}\left(\int_{-\infty}^{z-x} f_Y(y)\,dy\right) dx\\
&= \int_{-\infty}^{\infty} f_X(x) \cdot f_Y(z - x)\,dx
\end{aligned}
\]

This will be important later on. Also, for some positive constant c, the pdf of Z = cX is

\[
f_Z(z) = \frac{d}{dz} P(Z \le z) = \frac{d}{dz} P\!\left(X \le \frac{z}{c}\right) = \frac{1}{c} f_X\!\left(\frac{z}{c}\right)
\]
Finally, let’s find the pdf of Z = X/Y, assuming Y takes only positive values (this is the case we will need later):

\[
\begin{aligned}
f_Z(z) &= \frac{d}{dz} P(Z \le z)\\
&= \frac{d}{dz} P(X \le Yz)\\
&= \frac{d}{dz} \int_{-\infty}^{\infty} \int_{-\infty}^{yz} f_{X,Y}(x, y)\,dx\,dy\\
&= \int_{-\infty}^{\infty} f_Y(y) \cdot \frac{d}{dz}\left(\int_{-\infty}^{yz} f_X(x)\,dx\right) dy\\
&= \int_{-\infty}^{\infty} y \cdot f_Y(y) f_X(yz)\,dy
\end{aligned}
\]
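As a concrete check of the convolution formula for X + Y (my own sketch, not from the article), take X, Y ∼ U[0,1] independent; the exact pdf of Z = X + Y is the triangle min(z, 2 − z) on [0, 2].

```python
# Evaluate f_Z(z) = ∫ f_X(x) f_Y(z - x) dx numerically for X, Y ~ U[0,1] independent.
import numpy as np

def f_U(x):                                       # pdf of U[0,1]
    return np.where((x >= 0) & (x <= 1), 1.0, 0.0)

xs = np.linspace(-1.0, 3.0, 8001)
dx = xs[1] - xs[0]

def f_Z(z):
    return np.sum(f_U(xs) * f_U(z - xs)) * dx

for z in (0.25, 0.5, 1.0, 1.5, 1.75):
    print(z, round(f_Z(z), 3), min(z, 2 - z))     # numerical value vs exact triangle pdf
```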

If X and Y are two random variables (which might not be independent!), we have:

Linearity of Expectation

\[
\begin{aligned}
E(X + Y) &= \iint_{\forall x,y} (x + y) f_{X,Y}(x, y)\,dy\,dx\\
&= \iint_{\forall x,y} x f_{X,Y}(x, y)\,dy\,dx + \iint_{\forall x,y} y f_{X,Y}(x, y)\,dy\,dx\\
&= \int_{\forall x} x \left(\int_{\forall y} f_{X,Y}(x, y)\,dy\right) dx + \int_{\forall y} y \left(\int_{\forall x} f_{X,Y}(x, y)\,dx\right) dy\\
&= \int_{\forall x} x f_X(x)\,dx + \int_{\forall y} y f_Y(y)\,dy\\
&= E(X) + E(Y),
\end{aligned}
\]

where in the second-to-last line the inner integrals are precisely the single-variable pdfs f_X and f_Y. Also,

\[
E(cX) = \int_{\forall x} cx f_X(x)\,dx = c \int_{\forall x} x f_X(x)\,dx = cE(X).
\]

If X and Y are two independent variables,

Multiplicity of Expectation for Independent Variables

\[
E(X)E(Y) = \int_{\forall x} x f_X(x)\,dx \int_{\forall y} y f_Y(y)\,dy = \iint_{\forall x,y} (xy) f_{X,Y}(x, y)\,dx\,dy = E(XY).
\]

(Semi)linearity of Variance for Independent Variables

\[
\begin{aligned}
Var(X + Y) &= E((X + Y)^2) - E(X + Y)^2\\
&= E(X^2) + 2E(XY) + E(Y^2) - E(X)^2 - 2E(X)E(Y) - E(Y)^2\\
&= Var(X) + Var(Y).
\end{aligned}
\]
and
\[
Var(cX) = E(c^2 X^2) - E(cX)^2 = c^2 Var(X).
\]

Proving that X and X + Y are Not Independent if X and Y are Independent

\[
\begin{aligned}
f_{X,X+Y}(x, y) &= \lim_{\delta x \to 0}\lim_{\delta y \to 0} \frac{P(x \le X \le x + \delta x \text{ and } y - X \le Y \le y - X + \delta y)}{\delta x\,\delta y}\\
&= \lim_{\delta x \to 0}\lim_{\delta y \to 0} \frac{\int_x^{x+\delta x} f_X(u)\left[\int_{y-u}^{y-u+\delta y} f_Y(v)\,dv\right] du}{\delta x\,\delta y}\\
&= \lim_{\delta x \to 0} \frac{\int_x^{x+\delta x} f_X(u) f_Y(y - u)\,du}{\delta x}\\
&= f_X(x) f_Y(y - x) \ne f_X(x) f_Y(y)
\end{aligned}
\]

3 Normal Distribution
We come to our first famous distribution: the normal distribution. Any continuous random variable X having a pdf of the form²

\[
f_X(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\!\left(-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^{\!2}\right) \quad\text{for some } \mu \text{ and positive } \sigma
\]

is said to belong to a normal distribution, denoted as X ∼ N(µ, σ²). Figure 2 below shows the pdf of N(µ, σ²). Even though this distribution is determined by the mean µ and variance σ², we can always apply a transformation to the pdf of N(µ, σ²) by denoting Z = (X − µ)/σ, transforming X ∼ N(µ, σ²) into Z ∼ N(0, 1), as shown in Figure 3.
This transformation is called standardisation. The resultant pdf is much simpler, given by

\[
f_Z(z) = \frac{1}{\sqrt{2\pi}} \exp\!\left(-\frac{1}{2} z^2\right)
\]

²Here exp(x) means e^x.

Figure 2: Normal distribution N(µ, σ²), a bell curve centred at µ

Figure 3: Standard normal distribution N(0, 1), a bell curve centred at 0

3.1 Properties of the Normal Distribution


These properties are fun to prove using what we have learnt so far:

• The distribution is symmetric about µ.

• The mean of X ∼ N(µ, σ²) is µ.

• The variance of X ∼ N(µ, σ²) is σ².

The Sum of Two Independent Normal Variables is Normal

Say X ∼ N(µ₁, σ₁²), Y ∼ N(µ₂, σ₂²), then the pdf of Z = X + Y is

\[
\begin{aligned}
f_Z(z) &= \int_{-\infty}^{\infty} f_X(x) f_Y(z - x)\,dx\\
&= \int_{-\infty}^{\infty} \frac{\exp\!\left(-\frac{1}{2}\left(\frac{x-\mu_1}{\sigma_1}\right)^2\right)}{\sigma_1\sqrt{2\pi}} \cdot \frac{\exp\!\left(-\frac{1}{2}\left(\frac{z-x-\mu_2}{\sigma_2}\right)^2\right)}{\sigma_2\sqrt{2\pi}}\,dx\\
&= \frac{1}{2\pi\sigma_1\sigma_2} \int_{-\infty}^{\infty} \exp\!\left[-\frac{1}{2}\left(\frac{x-\mu_1}{\sigma_1}\right)^{\!2} - \frac{1}{2}\left(\frac{z-x-\mu_2}{\sigma_2}\right)^{\!2}\right] dx\\
&= (\text{constant}) \int_{-\infty}^{\infty} \exp\!\left[-A(x + \text{some linear in } z)^2 + \text{some quadratic in } z\right] dx\\
&= (\text{constant})\, e^{\text{quadratic in } z} \int_{-\infty}^{\infty} e^{-A(x + \text{some linear in } z)^2}\,dx\\
&= (\text{constant})\, e^{\text{quadratic in } z}
\end{aligned}
\]

Therefore Z is normal. We do not have to specifically find out what the constants are because we know E(Z) = E(X) + E(Y) = µ₁ + µ₂ and Var(Z) = Var(X) + Var(Y) = σ₁² + σ₂². Thus X + Y ∼ N(µ₁ + µ₂, σ₁² + σ₂²).
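A quick simulation (illustration only, with arbitrarily chosen parameters) confirms the conclusion:

```python
# Simulate Z = X + Y with X ~ N(1, 4) and Y ~ N(-3, 2.25); Z should be N(-2, 6.25).
import numpy as np

rng = np.random.default_rng(2)
mu1, sd1, mu2, sd2 = 1.0, 2.0, -3.0, 1.5
z = rng.normal(mu1, sd1, 1_000_000) + rng.normal(mu2, sd2, 1_000_000)

print(z.mean(), mu1 + mu2)                       # ≈ -2
print(z.var(), sd1**2 + sd2**2)                  # ≈ 6.25

# Compare the empirical density of z with the N(-2, 6.25) pdf over a grid of bins.
counts, edges = np.histogram(z, bins=60)
centres = (edges[:-1] + edges[1:]) / 2
density = counts / (z.size * (edges[1] - edges[0]))
v = sd1**2 + sd2**2
pdf = np.exp(-(centres - (mu1 + mu2))**2 / (2 * v)) / np.sqrt(2 * np.pi * v)
print(np.max(np.abs(density - pdf)))             # small
```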

3.2 Deriving the pdf for N(0, 1) from B(n, p)

One of the reasons we study normal distributions is due to the binomial distribution B(n, p). Recall the binomial distribution formula

\[
B_{n,p}(x) = \binom{n}{x} p^x q^{n-x} \quad\text{where } x = 0, \cdots, n \text{ and } q = 1 - p.
\]

If we plot the graphs of B_{n,p}(x) with fixed p and varying n, we get the following:

Figure 4: Binomial distributions B(5, 2/5), B(7, 2/5), B(11, 2/5).

As we see above, the graph gets wider as n increases because the domain enlarges. The height of the graph also falls because, as the domain enlarges, the sum of the probabilities must still equal 1.
We know that the mean and standard deviation of B(n, p) are µ = np and σ = √(npq) respectively. So, in order to get a converging sequence of graphs, let’s shift the distribution of B(n, p) to the left by µ, shrink it horizontally by σ, and then stretch it vertically by σ to maintain the sum-equals-one property:

Figure 5: The graphs of y = σB_{n,p}(σx + µ) for n = 5, 7, 11.

Now we’re converging (or at least we seem to)! Suppose they converge to N(x). For fixed n, the point (x, σB_{n,p}(σx + µ)) on the adjusted Binomial graph corresponds to (u, B_{n,p}(u)) = (σx + µ, B_{n,p}(σx + µ)) on the original Binomial graph. Therefore, the gradient of the line connecting (u, B_{n,p}(u)) and (u + 1, B_{n,p}(u + 1)) is

\[
\frac{B_{n,p}(u+1) - B_{n,p}(u)}{(u+1) - u} = \binom{n}{u+1} p^{u+1} q^{n-u-1} - \binom{n}{u} p^u q^{n-u}
\]

This line, after transforming it onto the adjusted Binomial graph, is shrunk horizontally by σ but stretched vertically by σ, so its gradient is σ² times larger than the expression above, i.e.

\[
N'(x) = \lim_{n\to\infty} \sigma^2 \left[\binom{n}{u+1} p^{u+1} q^{n-u-1} - \binom{n}{u} p^u q^{n-u}\right]
\]
This is hard to handle, so let’s divide this by N(x) = lim_{n→∞} σB_{n,p}(σx + µ) = lim_{n→∞} σB_{n,p}(u).

\[
\begin{aligned}
\therefore\ \frac{N'(x)}{N(x)} &= \lim_{n\to\infty} \sigma^2 \cdot \frac{\binom{n}{u+1} p^{u+1} q^{n-u-1} - \binom{n}{u} p^u q^{n-u}}{\sigma \binom{n}{u} p^u q^{n-u}}\\
&= \lim_{n\to\infty} \sqrt{npq} \cdot \frac{np - u - q}{q(u+1)}\\
&= \lim_{n\to\infty} \sqrt{npq} \cdot \frac{np - x\sqrt{npq} - np - q}{qx\sqrt{npq} + npq + q}\\
&= \lim_{n\to\infty} \frac{-(pqx)n - (q\sqrt{pq})\,n^{1/2}}{(pq)n + (qx\sqrt{pq})\,n^{1/2} + q}\\
&= -x
\end{aligned}
\]

but \(\frac{N'(x)}{N(x)} = \frac{d}{dx}\ln(N(x))\), so

\[
\ln(N(x)) = -x^2/2 + c \qquad\Longrightarrow\qquad N(x) = C\exp(-x^2/2)
\]

where C is a constant. Now we just have to solve for C in

\[
\int_{-\infty}^{\infty} C \exp\!\left(-\frac{x^2}{2}\right) dx = 1.
\]

However, \(\int_{-\infty}^{\infty} \exp(-x^2)\,dx = \sqrt{\pi}\) is a classic calculus problem (perhaps I will write about it); substituting x ↦ x/√2 gives \(\int_{-\infty}^{\infty} \exp(-x^2/2)\,dx = \sqrt{2\pi}\). Therefore, in this case C must be 1/√(2π). In conclusion,

\[
N(x) = \frac{1}{\sqrt{2\pi}} \exp\!\left(-\frac{x^2}{2}\right)
\]

is the desired curve.
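The convergence can also be seen numerically. The following sketch (my own, for illustration) evaluates the rescaled binomial points (u − µ)/σ ↦ σB_{n,p}(u) and measures how far they are from N(x); the gap shrinks as n grows.

```python
# For each u = 0, ..., n, compare sigma * B_{n,p}(u) with N((u - mu)/sigma).
import math

def B(n, p, u):                                   # binomial pmf
    return math.comb(n, u) * p**u * (1 - p)**(n - u)

def N(x):                                         # limiting curve derived above
    return math.exp(-x**2 / 2) / math.sqrt(2 * math.pi)

p = 0.4
for n in (11, 101, 1001):
    mu, sigma = n * p, math.sqrt(n * p * (1 - p))
    gap = max(abs(sigma * B(n, p, u) - N((u - mu) / sigma)) for u in range(n + 1))
    print(n, gap)                                 # the gap shrinks as n grows
```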

4 Sample Statistics
Suppose the variables X₁, ⋯, Xₙ are independent but all have the same distribution D. We say that X₁, ⋯, Xₙ are independent observations of X ∼ D. A set of independent observations {X₁, ⋯, Xₙ} is a sample. Given a sample S, we can construct any function taking inputs from S. These functions are called statistics. For example,

\[
\text{Sample mean:}\quad \bar{X} = \frac{X_1 + \cdots + X_n}{n}
\qquad\qquad
\text{Sample variance:}\quad s^2 = \frac{(X_1 - \bar{X})^2 + \cdots + (X_n - \bar{X})^2}{n-1}
\]

are commonly-used statistics. From section 2, we know that we can find the distribution of X̄ by finding its pdf.
Suppose D = N(µ, σ²). Since the sum of two normal variables is normal, by induction, X₁ + ⋯ + Xₙ is normal. By the linearity of expectation and variance,

\[
E(\bar{X}) = \frac{1}{n}\sum E(X_i) = \frac{1}{n}\cdot n\mu = \mu,
\qquad
Var(\bar{X}) = \frac{1}{n^2}\sum Var(X_i) = \frac{1}{n^2}\cdot n\sigma^2 = \frac{\sigma^2}{n}.
\]

Therefore X̄ ∼ N(µ, σ²/n).

Central Limit Theorem (CLT)

Even when X is not normal, if X₁, ⋯, Xₙ are independent observations of X then the distribution of X̄ converges to N(µ, σ²/n) as n → ∞.

This is also a hard theorem to prove; I will write about it in the future.
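As a rough illustration (my own sketch, not a proof), here is the CLT at work for a clearly non-normal distribution, Exponential(1), which has µ = σ² = 1:

```python
# Sample means of Exponential(1) observations: for n = 50 they already look close to N(1, 1/50).
import numpy as np

rng = np.random.default_rng(3)
n, trials = 50, 200_000
xbar = rng.exponential(1.0, size=(trials, n)).mean(axis=1)

print(xbar.mean(), 1.0)                           # ≈ mu
print(xbar.var(), 1.0 / n)                        # ≈ sigma^2 / n
# Normal predictions: P(Xbar <= mu) ≈ 0.5 and P(|Xbar - mu| < sigma/sqrt(n)) ≈ 0.683.
print(np.mean(xbar <= 1.0))
print(np.mean(np.abs(xbar - 1.0) < 1.0 / np.sqrt(n)))
```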

Suppose we want to estimate the mean µ and variance σ² of a population. If we were given access to every possible data value of the population, then we could get µ and σ² easily. However, when we only take a sample, we can only evaluate X̄ and s². First of all, why is the denominator of s², the sample variance, n − 1 instead of n, the number of values taken? Second of all, how are we sure that X̄ and s² give good estimates of µ and σ²? Third of all, how confident are we in these estimates?
We answer the first and second questions together. We say that a statistic T is an unbiased estimate of a value θ if E(T) = θ. For example, X̄ is an unbiased estimate of µ because E(X̄) = µ as shown above. Since the formula of X̄ looks exactly the same as that for µ, one would expect that S² = Σᵢ(Xᵢ − X̄)²/n

is also an unbiased estimate for σ². However, since σ² = E(X²) − E(X)² = E(X²) − µ²,

\[
\begin{aligned}
E(S^2) &= E\!\left(\frac{1}{n}\sum_{i=1}^{n} X_i^2\right) - E\!\left(\left(\frac{\sum_{i=1}^{n} X_i}{n}\right)^{\!2}\right)\\
&= E(X^2) - \frac{1}{n^2} E\!\left(\sum_{i=1}^{n} X_i^2 + 2\sum_{i<j} X_i X_j\right)\\
&= E(X^2) - \frac{1}{n} E(X^2) - \frac{2}{n^2}\cdot\binom{n}{2}\cdot E(X)E(X)\\
&= \left(1 - \frac{1}{n}\right)(\sigma^2 + \mu^2) - \frac{n-1}{n}\mu^2\\
&= \frac{n-1}{n}\sigma^2 \ne \sigma^2
\end{aligned}
\]

suggests otherwise. However, in light of the (n − 1)/n factor, we see that by replacing the denominator of S² with n − 1, we obtain E(s²) = σ² instead. Therefore s² is an unbiased estimator for σ², while S² is not.
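A simulation of the bias we just computed (illustration only), using n = 5 and σ² = 4:

```python
# With denominator n the estimator S^2 averages to (n-1)/n * sigma^2; with n-1 it averages to sigma^2.
import numpy as np

rng = np.random.default_rng(4)
n, trials, sigma2 = 5, 200_000, 4.0
samples = rng.normal(0.0, np.sqrt(sigma2), size=(trials, n))

S2 = samples.var(axis=1, ddof=0)                  # denominator n
s2 = samples.var(axis=1, ddof=1)                  # denominator n - 1
print(S2.mean(), (n - 1) / n * sigma2)            # ≈ 3.2
print(s2.mean(), sigma2)                          # ≈ 4.0
```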
To answer the third question, we previously proved that X̄ ∼ N(µ, σ²/n), so for large values of n the standard deviation is very small and hence we are very confident that X̄ is close to µ. The problem is, how confident is confident, especially when n is small? If we want to find an interval so that X̄ has a 0.9 probability to lie in it, we just standardise and solve

\[
P\!\left(-z_{0.95} < \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} < z_{0.95}\right) = 0.9
\]
\[
P\!\left(\bar{X} - z_{0.95}\frac{\sigma}{\sqrt{n}} < \mu < \bar{X} + z_{0.95}\frac{\sigma}{\sqrt{n}}\right) = 0.9
\]

where z₀.₉₅ = 1.645 is the value of z such that P(Z < z) = 0.95 for Z ∼ N(0, 1). This is mathematically correct, but we have no idea what the value of σ is in the first place. We will resolve this problem in the t-distribution section.
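To make the problem concrete, here is a short simulation (my own sketch): with σ known the z-interval covers µ about 90% of the time, but naively substituting s for σ under-covers when n is small, which is exactly what the t-distribution will fix.

```python
# Coverage of the interval Xbar ± 1.645 * (scale)/sqrt(n) for n = 5 samples from N(10, 9).
import numpy as np

rng = np.random.default_rng(5)
n, trials, mu, sigma = 5, 200_000, 10.0, 3.0
x = rng.normal(mu, sigma, size=(trials, n))
xbar, s = x.mean(axis=1), x.std(axis=1, ddof=1)
z = 1.645

print(np.mean(np.abs(xbar - mu) < z * sigma / np.sqrt(n)))   # ≈ 0.90 (sigma known)
print(np.mean(np.abs(xbar - mu) < z * s / np.sqrt(n)))       # noticeably below 0.90
```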

5 Chi-Squared Distribution
In section 1, we found that the pdf of Z = X² is

\[
f_Z(z) =
\begin{cases}
0 & \text{if } z \le 0;\\[2pt]
\dfrac{f_X(\sqrt{z}) + f_X(-\sqrt{z})}{2\sqrt{z}} & \text{if } z > 0.
\end{cases}
\]

In this section, we will study the case when X ∼ N(0, 1), so f_X(x) = exp(−x²/2)/√(2π), giving

\[
f_Z(z) =
\begin{cases}
0 & \text{if } z \le 0;\\[2pt]
\dfrac{\exp(-z/2)}{\sqrt{2\pi z}} & \text{if } z > 0.
\end{cases}
\]

Therefore, the distribution of X² looks like Figure 6 (asymptote at x = 0):
This is called the chi-squared distribution with 1 degree of freedom. In general, the chi-squared distribution with n degrees of freedom, denoted as χ²ₙ, is the distribution of Qₙ = X₁² + ⋯ + Xₙ² where X₁, ⋯, Xₙ are independent observations of X ∼ N(0, 1).
Firstly, the sum of two independent chi-squared distributed variables with a and b degrees of freedom respectively is chi-squared distributed with a + b degrees of freedom. This is simply because (X₁² + ⋯ + Xₐ²) + (Y₁² + ⋯ + Y_b²) is just the sum of squares of a + b standard normal variables.

Figure 6: Distribution of X² if X ∼ N(0, 1)

\[
\frac{(n-1)s^2}{\sigma^2} \sim \chi^2_{n-1} \quad\text{if } X_1, \cdots, X_n \sim N(\mu, \sigma^2) \text{ are } n \text{ independent observations.}
\]

Standardise Zᵢ = (Xᵢ − µ)/σ. Then (n − 1)s²/σ² can be written as a sum of n − 1 squares:

\[
\begin{aligned}
\frac{(n-1)s^2}{\sigma^2} &= \sum_{1\le i\le n} Z_i^2 - \frac{(Z_1 + \cdots + Z_n)^2}{n}\\
&= \frac{n-1}{n}\sum_{1\le i\le n} Z_i^2 - \frac{2}{n}\sum_{1\le i<j\le n} Z_i Z_j\\
&= \left(\sqrt{\frac{n-1}{n}}\,Z_n - \frac{1}{\sqrt{n(n-1)}}Z_{n-1} - \cdots - \frac{1}{\sqrt{n(n-1)}}Z_1\right)^{\!2}\\
&\qquad + \frac{n-2}{n-1}\sum_{1\le i\le n-1} Z_i^2 - \frac{2}{n-1}\sum_{1\le i<j\le n-1} Z_i Z_j\\
&= \sum_{k=2}^{n}\left(\sqrt{\frac{k-1}{k}}\,Z_k - \frac{1}{\sqrt{k(k-1)}}Z_{k-1} - \cdots - \frac{1}{\sqrt{k(k-1)}}Z_1\right)^{\!2} \quad\text{by induction}
\end{aligned}
\]

It is a fun exercise to prove that each squared term at the end is standard normal. However, it is extremely difficult to prove that they are mutually independent using the tools we’ve learnt so far (we can proceed similarly to how we proved X and X + Y are dependent in section 2). We will accept the fact that the above expression is a sum of n − 1 independent standard normal variables, and hence has the chi-squared distribution with n − 1 degrees of freedom.

Since its pdf for x ≤ 0 is obviously 0, we have

\[
f_{Q_n}(x) =
\begin{cases}
0 & \text{if } x \le 0;\\
g_n(x) & \text{if } x > 0
\end{cases}
\]

for some function gₙ(x). Henceforth we just need to find gₙ(x).

5.1 Deriving the pdf for the Chi-Squared Distribution

Finding the pdf for χ²₂

From above, g₁(x) = \(\dfrac{\exp(-x/2)}{\sqrt{2\pi x}}\). Let’s find the pdf for Q₂ ∼ χ²₂ for x > 0, which is

\[
\begin{aligned}
g_2(x) &= \int_{-\infty}^{\infty} f_{X_1^2}(t)\, f_{X_2^2}(x - t)\,dt\\
&= \int_0^x g_1(t)\, g_1(x - t)\,dt\\
&= \int_0^x \frac{\exp(-t/2)}{\sqrt{2\pi t}} \cdot \frac{\exp(-(x - t)/2)}{\sqrt{2\pi(x - t)}}\,dt\\
&= \frac{\exp(-x/2)}{2\pi} \int_0^x \frac{1}{\sqrt{t(x - t)}}\,dt\\
&= \frac{\exp(-x/2)}{2\pi} \left[\arcsin\!\left(\frac{2t}{x} - 1\right)\right]_{t=0}^{x}\\
&= \frac{1}{2}\exp(-x/2).
\end{aligned}
\]

The distribution for χ²₂ looks like Figure 7:

Figure 7: Chi-Squared Distribution χ²₂

Now that we have g₁ and g₂, we can construct a recursive relation for gₙ. Since X₁² + ⋯ + Xₙ² = (X₁² + ⋯ + X_{n−2}²) + (X_{n−1}² + Xₙ²), and X₁² + ⋯ + X_{n−2}² and X_{n−1}² + Xₙ² are independent, the random variable Qₙ is just the sum of two independent variables Q₂ and Q_{n−2}. Hence

\[
g_n(x) = \int_{-\infty}^{\infty} f_{Q_{n-2}}(t) \cdot f_{Q_2}(x - t)\,dt = \frac{1}{2}\int_0^x g_{n-2}(t)\exp\!\left(-\frac{x - t}{2}\right) dt
\]

If we write gₙ(x) = 2^{−n/2} hₙ(x) exp(−x/2), the above expression simplifies to

\[
h_n(x) = \int_0^x h_{n-2}(t)\,dt
\]
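The recursion can also be iterated numerically. The sketch below (an illustration of mine) follows the even chain h₂ → h₄ → h₆ → h₈ with a simple running sum and compares it against the closed form x^{k−1}/(k−1)! derived in the next box.

```python
# Iterate h_n(x) = integral from 0 to x of h_{n-2}(t) dt, starting from h_2(x) = 1.
import math
import numpy as np

x = np.linspace(0.0, 10.0, 100_001)
dx = x[1] - x[0]
h = np.ones_like(x)                               # h_2(x) = 1
for n in (4, 6, 8):
    h = np.cumsum(h) * dx                         # numerical running integral
    k = n // 2
    exact = x ** (k - 1) / math.factorial(k - 1)  # closed form for h_{2k}
    print(n, np.max(np.abs(h - exact)))           # tiny compared with the size of h_n(10)
```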

Finding the pdf for Chi-Squared Distributions with Even Degrees of Freedom

Since h₂(x) = 1, we can prove by induction that h_{2k}(x) = \(\dfrac{x^{k-1}}{(k-1)!}\), because

\[
h_{2k}(x) = \frac{x^{k-1}}{(k-1)!} \quad\Longrightarrow\quad h_{2(k+1)}(x) = \int_0^x \frac{t^{k-1}}{(k-1)!}\,dt = \frac{x^k}{k!}.
\]

In conclusion, we have that the pdf for χ²₂ₖ is

\[
\begin{cases}
0 & \text{if } x \le 0;\\[4pt]
\dfrac{1}{2^k (k-1)!}\, x^{k-1} e^{-x/2} & \text{if } x > 0.
\end{cases}
\]

Here, we found the pattern by integrating hᵢ(x) repeatedly. A similar method can be used to find the pdf for odd degrees of freedom.

Finding the pdf for Chi-Squared Distributions with Odd Degrees of Freedom

Since h₁(x) = \(\dfrac{x^{-0.5}}{\sqrt{\pi}}\), we can prove by induction that h_{2k+1}(x) = \(\dfrac{2^{2k} k!\, x^{k-0.5}}{(2k)!\sqrt{\pi}}\), because

\[
h_{2k+1}(x) = \frac{2^{2k} k!\, x^{k-0.5}}{(2k)!\sqrt{\pi}}
\;\Longrightarrow\;
h_{2k+3}(x) = \int_0^x \frac{2^{2k} k!\, t^{k-0.5}}{(2k)!\sqrt{\pi}}\,dt
= \frac{2^{2k} k!\, x^{k+0.5}}{(k+0.5)(2k)!\sqrt{\pi}}
= \frac{2^{2(k+1)} (k+1)!\, x^{(k+1)-0.5}}{(2(k+1))!\sqrt{\pi}}.
\]

In conclusion, we have that the pdf for χ²₂ₖ₊₁ is

\[
\begin{cases}
0 & \text{if } x \le 0;\\[4pt]
\dfrac{2^{k-0.5}\, k!}{(2k)!\sqrt{\pi}}\, x^{k-0.5} e^{-x/2} & \text{if } x > 0.
\end{cases}
\]

Combining both cases, we have that the pdf of χ²ₙ in general is

\[
f_{Q_n}(x) =
\begin{cases}
0 & \text{if } x \le 0;\\[6pt]
\dfrac{1}{2^{n/2}(n/2 - 1)!} \cdot x^{n/2-1} e^{-x/2} & \text{if } n \text{ is even and } x > 0;\\[10pt]
\dfrac{2^{n/2-1}\left(\frac{n-1}{2}\right)!}{(n-1)!\sqrt{\pi}} \cdot x^{n/2-1} e^{-x/2} & \text{if } n \text{ is odd and } x > 0.
\end{cases}
\]

The Gamma Function

The gamma function Γ : ℝ₍₀,∞₎ → ℝ₍₀,∞₎ is defined as

\[
\Gamma(x) = \int_0^{\infty} t^{x-1} e^{-t}\,dt.
\]

Some properties include

• Γ(1) = 1, Γ(0.5) = √π

• Γ(x) = (x − 1)Γ(x − 1)

• Γ(x)Γ(x + 0.5) = 2^{1−2x} √π Γ(2x)

Using these facts, we can prove easily by induction that for n = 0, 1, ⋯,

• Γ(n + 1) = n!

• Γ(n + 0.5) = \(\dfrac{\sqrt{\pi}\,(2n)!}{2^{2n}\, n!}\)

Using the Gamma function, you can simplify the pdf as

\[
f_{Q_n}(x) =
\begin{cases}
0 & \text{if } x \le 0;\\[6pt]
\dfrac{1}{2^{n/2}\,\Gamma(n/2)} \cdot x^{n/2-1} e^{-x/2} & \text{if } x > 0.
\end{cases}
\]

Figure 8: Chi-Squared Distributions χ²₁, χ²₂, χ²₃, χ²₄, χ²₅, χ²₆
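As a sanity check of the whole derivation (illustration only), the Γ-form pdf can be compared with a Monte Carlo histogram of Qₙ = X₁² + ⋯ + Xₙ²:

```python
# Compare the derived chi-squared pdf (via the gamma function) with simulated Q_n, n = 5.
import math
import numpy as np

def chi2_pdf(x, n):
    return x ** (n / 2 - 1) * np.exp(-x / 2) / (2 ** (n / 2) * math.gamma(n / 2))

rng = np.random.default_rng(6)
n, trials = 5, 1_000_000
q = (rng.standard_normal((trials, n)) ** 2).sum(axis=1)

counts, edges = np.histogram(q, bins=np.linspace(0.5, 15.0, 59))
centres = (edges[:-1] + edges[1:]) / 2
density = counts / (trials * (edges[1] - edges[0]))
print(np.max(np.abs(density - chi2_pdf(centres, n))))   # small (sampling/binning noise)
```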

6 Student’s t-Distribution
We answer the question raised at the end of section 4: Let X₁, ⋯, Xₙ be n independent observations of N(µ, σ²). How do we find an interval, in terms of X̄, such that µ has a 0.9 probability to lie in it? Previously, we considered

\[
\frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \sim N(0, 1)
\quad\Longrightarrow\quad
P\!\left(\bar{X} - z_{0.95}\frac{\sigma}{\sqrt{n}} < \mu < \bar{X} + z_{0.95}\frac{\sigma}{\sqrt{n}}\right) = 0.9
\]

but we noticed that σ is unknown. Therefore, we will consider the distribution of

\[
\frac{\bar{X} - \mu}{s/\sqrt{n}} \sim t_{n-1}
\]

instead, where s is the sample standard deviation. This distribution differs according to n and is known as the Student’s t-distribution, or t-distribution for short, with ν = n − 1 degrees of freedom.

6.1 Deriving the pdf for the t-Distribution

Note that

\[
\frac{\bar{X} - \mu}{s/\sqrt{n}} = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \Big/ \frac{s}{\sigma}
\]

Recall from sections 4 and 5 that

\[
R = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \sim N(0, 1)
\qquad\text{and}\qquad
T = \frac{(n-1)s^2}{\sigma^2} \sim \chi^2_{n-1}
\]

We will accept the fact that X̄ and s² are independent without proof (informally, they measure completely different things: X̄ measures location and s² measures spread, so one does not affect the other). Therefore R and T are independent. It remains to find the pdf of W = √(n − 1) · R/√T where R ∼ N(0, 1) and T ∼ χ²ₙ₋₁. Since the t-distribution is symmetric, let’s just study w > 0 below. Denote ν = n − 1.

\[
\begin{aligned}
f_W(w) &= \frac{d}{dw} P\!\left(R \le w\sqrt{T/\nu}\right)\\
&= \int_0^{\infty} f_T(t)\,\frac{d}{dw}\int_{-\infty}^{w\sqrt{t/\nu}} f_R(r)\,dr\,dt\\
&= \int_0^{\infty} f_T(t) \cdot \sqrt{\frac{t}{\nu}}\, f_R\!\left(w\sqrt{\frac{t}{\nu}}\right) dt\\
&= \int_0^{\infty} \frac{t^{\nu/2-1}\exp(-t/2)}{2^{\nu/2}\,\Gamma\!\left(\frac{\nu}{2}\right)} \cdot \frac{1}{\sqrt{2\pi}}\exp\!\left(-\frac{w^2}{2}\cdot\frac{t}{\nu}\right)\sqrt{\frac{t}{\nu}}\,dt\\
&= \frac{1}{2^{\nu/2}\,\Gamma\!\left(\frac{\nu}{2}\right)\sqrt{2\pi\nu}} \int_0^{\infty} t^{(\nu-1)/2}\exp\!\Bigg(-\underbrace{\frac{1}{2}\left(1 + \frac{w^2}{\nu}\right)}_{K}\,t\Bigg)\,dt\\
&= \frac{1}{2^{\nu/2}\,\Gamma\!\left(\frac{\nu}{2}\right)\sqrt{2\pi\nu}\cdot K^{(\nu+1)/2}} \int_0^{\infty} (Kt)^{(\nu-1)/2}\exp(-Kt)\,d(Kt)\\
&= \frac{1}{2^{\nu/2}\,\Gamma\!\left(\frac{\nu}{2}\right)\sqrt{2\pi\nu}} \cdot K^{-\frac{\nu+1}{2}} \cdot \Gamma\!\left(\frac{\nu+1}{2}\right)\\
&= \frac{\Gamma\!\left(\frac{\nu+1}{2}\right)}{\Gamma\!\left(\frac{\nu}{2}\right)\sqrt{\pi\nu}} \left(1 + \frac{w^2}{\nu}\right)^{-\frac{\nu+1}{2}}
\end{aligned}
\]

Figure 9: t-Distributions t₁, t₂, t₃, t₄, t₅, t₆

In Figure 9, the one with the lower y-intercept has the lower degree of freedom.
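Finally, as an illustration of mine (not part of the article), the derived pdf can be checked against a direct simulation of (X̄ − µ)/(s/√n) for normal samples of size n = 6, so ν = 5:

```python
# Compare the derived t pdf with a histogram of (Xbar - mu) / (s / sqrt(n)) for N(3, 4) samples.
import math
import numpy as np

def t_pdf(w, nu):
    c = math.gamma((nu + 1) / 2) / (math.gamma(nu / 2) * math.sqrt(math.pi * nu))
    return c * (1 + w ** 2 / nu) ** (-(nu + 1) / 2)

rng = np.random.default_rng(7)
n, trials, mu, sigma = 6, 1_000_000, 3.0, 2.0
x = rng.normal(mu, sigma, size=(trials, n))
w = (x.mean(axis=1) - mu) / (x.std(axis=1, ddof=1) / math.sqrt(n))

counts, edges = np.histogram(w, bins=np.linspace(-5.0, 5.0, 81))
centres = (edges[:-1] + edges[1:]) / 2
density = counts / (trials * (edges[1] - edges[0]))
print(np.max(np.abs(density - t_pdf(centres, n - 1))))   # small: w behaves like t with 5 d.o.f.
```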


