Statistical Inference
Class 2
Sugata Sen Roy
Department of Statistics,
University of Calcutta,
Calcutta, INDIA.
Probability Distributions : Binomial Distribution
I Suppose there are two possible outcomes - Success & Failure.
I Let there be n independent trials.
I Let π = P[success in a given trial] , 0 ≤ π ≤ 1.
I π is constant for each trial
I Let X = number of successes in the n trials.
I Then X ∼ Bin(n, π), i.e. X has the p.m.f.
\[ f(x) = \binom{n}{x} \pi^{x} (1-\pi)^{n-x}, \quad x = 0, 1, \ldots, n. \]
I E(X) = nπ
I Var(X) = nπ(1 − π)
I If n = 1 (a Bernoulli trial),
\[ f(x) = \pi^{x} (1-\pi)^{1-x}, \quad x = 0 \text{ or } 1. \]
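The p.m.f., mean and variance above can be checked numerically. The following is a minimal sketch using only the Python standard library; the function name `binom_pmf` and the values n = 10, π = 0.3 are illustrative assumptions, not part of the lecture.

```python
from math import comb

def binom_pmf(x, n, pi):
    """P[X = x] for X ~ Bin(n, pi)."""
    return comb(n, x) * pi**x * (1 - pi)**(n - x)

n, pi = 10, 0.3  # illustrative values (assumed)
pmf = [binom_pmf(x, n, pi) for x in range(n + 1)]

total = sum(pmf)                                            # probabilities add to 1
mean = sum(x * p for x, p in enumerate(pmf))                # compare with n*pi
var = sum((x - mean) ** 2 * p for x, p in enumerate(pmf))   # compare with n*pi*(1-pi)
print(total, mean, var)
```

Up to floating-point rounding, the printed mean and variance agree with nπ = 3 and nπ(1 − π) = 2.1.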
Some similar Distributions
I Negative Binomial Distribution
I Here X is the number of trials needed to obtain r (given) successes.
\[ f(x) = \binom{x-1}{r-1} \pi^{r} (1-\pi)^{x-r}, \quad x = r, r+1, \ldots \]
I Geometric Distribution (Special case when r = 1).
I Hypergeometric Distribution
I Let X be the number of S-cards when drawing n cards from
an urn having N cards, M of which are S-cards.
\[ f(x) = \frac{\binom{M}{x}\binom{N-M}{n-x}}{\binom{N}{n}}, \quad x = 0, 1, \ldots, \min(n, M). \]
I Inverse Hypergeometric Distribution
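As a rough check on the negative binomial and hypergeometric p.m.f.s, the sketch below evaluates both with the standard library. The function names and all numerical values (r = 3, π = 0.4, a 52-card deck with 13 S-cards, 5 cards drawn) are assumptions chosen for illustration.

```python
from math import comb

def negbin_pmf(x, r, pi):
    """P[X = x]: the r-th success occurs on trial x."""
    return comb(x - 1, r - 1) * pi**r * (1 - pi)**(x - r)

def hypergeom_pmf(x, N, M, n):
    """P[X = x]: x S-cards among n cards drawn without replacement."""
    # math.comb(a, b) is 0 when b > a, so impossible x get probability 0.
    return comb(M, x) * comb(N - M, n - x) / comb(N, n)

r, pi = 3, 0.4                       # assumed values
negbin_total = sum(negbin_pmf(x, r, pi) for x in range(r, 400))  # truncated sum

N, M, n = 52, 13, 5                  # assumed deck: 13 S-cards out of 52
hyper = [hypergeom_pmf(x, N, M, n) for x in range(min(n, M) + 1)]
hyper_total = sum(hyper)
hyper_mean = sum(x * p for x, p in enumerate(hyper))  # compare with n*M/N

print(negbin_total, hyper_total, hyper_mean)
```

Both totals come out at 1 up to rounding, and the hypergeometric mean matches the standard value nM/N = 1.25.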
Probability Distributions : Poisson Distribution
I Let X be the number of occurrences of an event in a fixed interval or region.
I Examples : X can be the number of
I particles emitted from a radio-active source in a given time
I road accidents in a year on a given road
I annual visits of a patient to a doctor
I rivers in a country
I Then X follows the Poisson distribution P(λ) if, for some λ > 0,
\[ f(x) = \frac{e^{-\lambda} \lambda^{x}}{x!}, \quad x = 0, 1, 2, \ldots \]
I E(X) = λ
I Var(X) = λ
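The equality of mean and variance can be verified numerically. This is a minimal sketch with an assumed rate λ = 2.5 and the sum truncated at 60 terms (the tail beyond that is negligible for this λ); `poisson_pmf` is an illustrative name.

```python
from math import exp, factorial

def poisson_pmf(x, lam):
    """P[X = x] for X ~ P(lam)."""
    return exp(-lam) * lam**x / factorial(x)

lam = 2.5                      # assumed rate
probs = [poisson_pmf(x, lam) for x in range(60)]  # truncated support

pois_mean = sum(x * p for x, p in enumerate(probs))
pois_var = sum((x - pois_mean) ** 2 * p for x, p in enumerate(probs))
print(pois_mean, pois_var)     # both close to lam
```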
Probability Distributions : Normal Distribution
I Let X be a continuous random variable (theoretically) taking
values from −∞ to ∞.
I Examples : Height, Weight, IQ, measurement error.
I X is said to follow the normal distribution N(µ, σ²) if the
probability density function of X is of the form
\[ f(x) = \frac{1}{\sqrt{2\pi}\,\sigma} \, e^{-\frac{(x-\mu)^{2}}{2\sigma^{2}}}, \quad -\infty < x < \infty. \]
I E(X) = µ, −∞ < µ < ∞
I Var(X) = σ², σ > 0
Normal Distribution
[Figure: normal density curve]
Properties of the Normal Distribution
I Symmetric about µ.
I Moderately peaked and thin tailed (mesokurtic).
I About 99.73% of the probability lies within [µ − 3σ, µ + 3σ].
I If X ∼ N(µ, σ²), then the random variable
\[ \tau = \frac{X-\mu}{\sigma} \sim N(0, 1), \]
i.e. a normal distribution with mean zero and variance 1.
I τ is known as the standard normal variable and its p.d.f. is
\[ \phi(\tau) = \frac{1}{\sqrt{2\pi}} \, e^{-\tau^{2}/2}, \quad -\infty < \tau < \infty. \]
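Standardisation is what makes normal probabilities easy to compute: any interval probability for X reduces to the standard normal c.d.f. The sketch below uses the error function from the standard library; the function names and the values µ = 170, σ = 8 are assumptions for illustration.

```python
from math import erf, sqrt

def std_normal_cdf(z):
    """P[tau <= z] for tau ~ N(0, 1), written via the error function."""
    return 0.5 * (1 + erf(z / sqrt(2)))

def normal_prob(a, b, mu, sigma):
    """P[a <= X <= b] for X ~ N(mu, sigma^2), obtained by standardising."""
    return std_normal_cdf((b - mu) / sigma) - std_normal_cdf((a - mu) / sigma)

mu, sigma = 170.0, 8.0  # assumed values, e.g. heights in cm
within_3_sigma = normal_prob(mu - 3 * sigma, mu + 3 * sigma, mu, sigma)
print(round(within_3_sigma, 4))
```

Whatever µ and σ are chosen, the three-sigma probability is the same, because the standardised limits are always −3 and 3.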
Probability Distribution and Sampling Distribution
I The distributions studied above are referred to as
probability (or theoretical) distributions.
I Probability distributions characterize the population.
I Next we take a sample from the population and compute a
statistic.
I This statistic varies from sample to sample and hence is itself
a random variable.
I The distribution of this statistic is referred to as its sampling
distribution.
A Small Experiment
I Let a population be of size 5 with unknown X-values
?, ?, ?, ?, ? (one each for red, green, blue, brown and purple).
I Our interest is in the unknown population mean
µ = (1/5)(sum of the five numbers).
I Lack of time forces us to pick just two members at random and
study the sample mean
\[ \bar{X} = \frac{X_1 + X_2}{2}, \]
hoping that it would be a good representation of µ.
I I pick red and green and they are 2 and 8. Thus my
X̄ = (2 + 8)/2 = 5.
I You pick blue and brown and they are 10 and 3. Thus your
X̄ = (10 + 3)/2 = 6.5.
I Someone else picks green and purple and they are 8 and 12.
Thus their X̄ = (8 + 12)/2 = 10.
I How many such samples are possible ?
A Small Experiment (contd.)
I With Replacement (WR) Sampling, i.e. a chosen unit is
returned to the population before the next draw; here
the same unit can be chosen several times.
I This in our case will lead to 5 × 5 = 25 possible (ordered) samples.
I In general, for a sample of size n from a population of size N,
possible number of samples = N^n.
I Without Replacement (WOR) Sampling, i.e. a chosen unit is
not returned to the population; here the sample
has all distinct units.
I This in our case will lead to $\binom{5}{2} = 10$ possible samples (ignoring order).
I In general, for a sample of size n from a population of size N,
\[ \text{possible number of samples} = \binom{N}{n}. \]
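The two counts can be confirmed by listing the samples explicitly. The sketch below uses the five values revealed in the example (2, 8, 10, 3, 12 for red, green, blue, brown, purple); the variable names are illustrative.

```python
from itertools import combinations, product
from math import comb

population = [2, 8, 10, 3, 12]  # red, green, blue, brown, purple
N, n = len(population), 2

wr_samples = list(product(population, repeat=n))   # ordered draws, repeats allowed
wor_samples = list(combinations(population, n))    # unordered, all units distinct

print(len(wr_samples), N ** n)       # WR count against N^n
print(len(wor_samples), comb(N, n))  # WOR count against "N choose n"
```

Note that `product` counts ordered draws while `combinations` ignores order, which matches how the two formulas are usually stated.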
A Small Experiment (contd.)
I In either case we can get several possible samples and
correspondingly several values of X̄
(like 5, 6.5, 10, ... as in the above example).
I The distribution of these X̄ values is referred to as the sampling
distribution of X̄.
I Observe that in practice we will get only one sample.
I But theoretically we can study this distribution and make our
conclusions based on it.
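For a population this small, the whole sampling distribution of the sample mean can be written out. The sketch below does so for WOR samples of size 2, using the five values revealed in the example; exact fractions are used to avoid rounding, and the variable names are illustrative.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations

population = {"red": 2, "green": 8, "blue": 10, "brown": 3, "purple": 12}
mu = Fraction(sum(population.values()), len(population))  # population mean

# One sample mean per WOR sample of size 2 (each sample is equally likely).
means = [Fraction(a + b, 2) for a, b in combinations(population.values(), 2)]
sampling_dist = Counter(means)            # sample mean -> number of samples
mean_of_means = sum(means) / len(means)   # average over all possible samples

for value in sorted(sampling_dist):
    print(float(value), Fraction(sampling_dist[value], len(means)))
print(mu, mean_of_means)
```

Individual sample means range from 2.5 to 11, but their average over all ten samples equals the population mean µ = 7.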
Sampling from Binomial Distribution
I Sampling distribution of the sum of successes in n Bernoulli
trials :
I Suppose X1 , X2 , ..., Xn is a sample from a Bernoulli
experiment with
probability of success = π
and probability of failure = 1 − π.
I Let $S = \sum_{i=1}^{n} X_i$ = number of successes in the sample.
Then S ∼ Bin(n, π).
I In fact, if the X_i's are identically and independently distributed
(i.i.d.) as Bin(m, π),
\[ S = \sum_{i=1}^{n} X_i \sim \mathrm{Bin}(nm, \pi). \]
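One way to see why the sum of Bernoulli variables is binomial is to build the p.m.f. of S by repeated convolution and compare it with the Bin(n, π) formula. This is a minimal sketch; the helper `convolve` and the values n = 6, π = 0.3 are assumptions for illustration.

```python
from math import comb

def convolve(p, q):
    """p.m.f. of the sum of two independent variables whose p.m.f.s are
    given as lists indexed by the values 0, 1, 2, ..."""
    out = [0.0] * (len(p) + len(q) - 1)
    for i, p_i in enumerate(p):
        for j, q_j in enumerate(q):
            out[i + j] += p_i * q_j
    return out

n, pi = 6, 0.3                 # assumed values
bernoulli = [1 - pi, pi]       # P[X = 0], P[X = 1]

pmf_S = [1.0]                  # empty sum: S = 0 with probability 1
for _ in range(n):
    pmf_S = convolve(pmf_S, bernoulli)

binomial = [comb(n, x) * pi**x * (1 - pi)**(n - x) for x in range(n + 1)]
max_gap = max(abs(a - b) for a, b in zip(pmf_S, binomial))
print(max_gap)                 # zero up to rounding error
```

Replacing `bernoulli` with a Bin(m, π) p.m.f. in the same loop gives Bin(nm, π), which is the more general statement above.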
Sampling from a Poisson Distribution
I Sampling distribution of the sum of Poisson variables :
I Suppose X1, X2, ..., Xn is a sample from a Poisson
distribution, P(λ).
I Let $S = \sum_{i=1}^{n} X_i$ = total count in the sample.
I Then S ∼ P(nλ).
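The additivity of the Poisson distribution can be checked numerically in the same spirit: convolve the P(λ) p.m.f. with itself and compare with P(nλ). In this sketch λ = 1.5, n = 3 and the cut-off K = 20 are assumed values; entries 0, ..., K of the convolution are exact because a sum equal to k only involves terms no larger than k.

```python
from math import exp, factorial

def poisson_pmf(x, lam):
    """P[X = x] for X ~ P(lam)."""
    return exp(-lam) * lam**x / factorial(x)

lam, n, K = 1.5, 3, 20         # assumed rate, sample size and cut-off
single = [poisson_pmf(x, lam) for x in range(K + 1)]

pmf_S = single                 # p.m.f. of X1 on 0..K
for _ in range(n - 1):         # add X2, ..., Xn one at a time
    pmf_S = [sum(pmf_S[i] * single[k - i] for i in range(k + 1))
             for k in range(K + 1)]

target = [poisson_pmf(x, n * lam) for x in range(K + 1)]
max_gap = max(abs(a - b) for a, b in zip(pmf_S, target))
print(max_gap)                 # zero up to rounding error
```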
Sampling from a N(µ, σ²) Distribution
I Let X1, X2, ..., Xn be a sample from N(µ, σ²) (i.i.d.).
I The n observations are independently drawn.
I Then $S = \sum_{i=1}^{n} X_i \sim N(n\mu, n\sigma^{2})$.
I Also, the Sample Mean,
\[ \bar{X} = \frac{1}{n} \sum_{i=1}^{n} X_i = \frac{S}{n} \sim N\!\left(\mu, \frac{\sigma^{2}}{n}\right). \]
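I What about the Sample Variance
\[ S^{2} = \frac{1}{n} \sum_{i=1}^{n} (X_i - \mu)^{2} \quad \text{if } \mu \text{ is known,} \]
\[ \text{or} \quad s^{2} = \frac{1}{n-1} \sum_{i=1}^{n} (X_i - \bar{X})^{2} \quad \text{if } \mu \text{ is unknown?} \]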
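A small simulation gives a feel for how the sample mean and the two variance statistics behave over repeated samples. This is only a sketch: the values µ = 10, σ = 2, n = 5, the number of repetitions and the seed are all assumptions, and the results are approximate because they come from random draws.

```python
import random
from statistics import fmean, pvariance

random.seed(0)                       # fixed seed so the run is repeatable
mu, sigma, n, reps = 10.0, 2.0, 5, 20000   # assumed values

xbars, S2_values, s2_values = [], [], []
for _ in range(reps):
    x = [random.gauss(mu, sigma) for _ in range(n)]
    xbar = fmean(x)
    xbars.append(xbar)
    S2_values.append(sum((xi - mu) ** 2 for xi in x) / n)          # mu known
    s2_values.append(sum((xi - xbar) ** 2 for xi in x) / (n - 1))  # mu unknown

print(fmean(xbars), pvariance(xbars))      # near mu and sigma^2 / n
print(fmean(S2_values), fmean(s2_values))  # both near sigma^2
```

The spread of the simulated sample means is close to σ²/n = 0.8, and both variance statistics average out near σ² = 4.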
Chi-square Distribution
I Define
\[ \tau_i = \frac{X_i - \mu}{\sigma}, \quad i = 1, \ldots, n. \]
I The τ_i's are i.i.d. standard normal, N(0, 1).
I The sum of the squares of the standard normal variables
\[ \chi^{2} = \sum_{i=1}^{n} \tau_i^{2} = \sum_{i=1}^{n} \left( \frac{X_i - \mu}{\sigma} \right)^{2} \]
is said to follow the Chi-square ($\chi^{2}_{n}$) distribution with n
degrees of freedom.
I p.d.f. of $\chi^{2}_{n}$:
\[ f(\chi^{2}) = \frac{1}{2^{n/2}\,\Gamma(n/2)} \, e^{-\chi^{2}/2} \, (\chi^{2})^{\frac{n}{2}-1}, \quad 0 < \chi^{2} < \infty. \]
I positively skewed distribution
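The chi-square density can be sanity-checked by numerical integration. The sketch below uses a simple midpoint rule; the helper names, the choice n = 4 and the upper limit 100 (beyond which the density is negligible here) are assumptions. It also computes the mean, which for a chi-square variable is known to equal its degrees of freedom.

```python
from math import exp, gamma

def chi2_pdf(x, n):
    """Density of the chi-square distribution with n degrees of freedom."""
    return x ** (n / 2 - 1) * exp(-x / 2) / (2 ** (n / 2) * gamma(n / 2))

def integrate(f, a, b, steps=200_000):
    """Midpoint rule on [a, b]."""
    h = (b - a) / steps
    return h * sum(f(a + (k + 0.5) * h) for k in range(steps))

n = 4                                   # assumed degrees of freedom
area = integrate(lambda x: chi2_pdf(x, n), 0.0, 100.0)
chi2_mean = integrate(lambda x: x * chi2_pdf(x, n), 0.0, 100.0)
print(area, chi2_mean)                  # close to 1 and to n
```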
t-Distribution
I Let τ ∼ N(0, 1) be independent of a $\chi^{2}_{n}$ variable (chi-square with n d.f.).
I Then
\[ t = \frac{\tau}{\sqrt{\chi^{2}_{n}/n}} \]
is said to follow the t-distribution with n degrees of freedom.
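I p.d.f. of $t_n$:
\[ f(t) = \frac{\Gamma\!\left(\frac{n+1}{2}\right)}{\sqrt{n\pi}\,\Gamma\!\left(\frac{n}{2}\right)} \left(1 + \frac{t^{2}}{n}\right)^{-\frac{n+1}{2}}, \quad -\infty < t < \infty. \]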
I symmetric about 0
I leptokurtic (heavier tails than the normal).
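The symmetry and heavier tails of the t-distribution show up directly when its density is compared with the standard normal density. This is a minimal sketch; the function names and the choice of n = 5 degrees of freedom are assumptions for illustration.

```python
from math import exp, gamma, pi, sqrt

def t_pdf(t, n):
    """Density of the t-distribution with n degrees of freedom."""
    const = gamma((n + 1) / 2) / (sqrt(n * pi) * gamma(n / 2))
    return const * (1 + t * t / n) ** (-(n + 1) / 2)

def std_normal_pdf(z):
    """Density of N(0, 1)."""
    return exp(-z * z / 2) / sqrt(2 * pi)

n = 5  # assumed degrees of freedom
for point in (0.0, 1.5, 4.0):
    print(point, t_pdf(point, n), t_pdf(-point, n), std_normal_pdf(point))
```

At 0 the t density is lower than the normal one, while far out in the tail (for example at 4) it is considerably higher.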
F-Distribution
I Let $\chi^{2}_{1}$ and $\chi^{2}_{2}$ be two independent chi-square variables with
$n_1$ and $n_2$ degrees of freedom respectively.
I Then
\[ F = \frac{\chi^{2}_{1}/n_1}{\chi^{2}_{2}/n_2} \]
is said to follow the F-distribution with degrees of freedom
($n_1$, $n_2$).
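I p.d.f. of $F_{n_1, n_2}$:
\[ f(F) = \frac{n_1/n_2}{\beta\!\left(\frac{n_1}{2}, \frac{n_2}{2}\right)} \left(\frac{n_1}{n_2} F\right)^{\frac{n_1}{2}-1} \left(1 + \frac{n_1}{n_2} F\right)^{-\frac{n_1+n_2}{2}}, \quad 0 < F < \infty. \]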
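A quick numerical check that the F density integrates to one is sketched below. It writes the density with a negative exponent −(n1 + n2)/2 on the last factor and computes the beta function from gamma functions; the helper names, the degrees of freedom (4, 10) and the integration limit 200 are assumptions.

```python
from math import gamma

def beta_fn(a, b):
    """Beta function expressed through the gamma function."""
    return gamma(a) * gamma(b) / gamma(a + b)

def f_pdf(F, n1, n2):
    """Density of the F-distribution with (n1, n2) degrees of freedom."""
    r = n1 / n2
    return (r / beta_fn(n1 / 2, n2 / 2)
            * (r * F) ** (n1 / 2 - 1)
            * (1 + r * F) ** (-(n1 + n2) / 2))

def integrate(f, a, b, steps=400_000):
    """Midpoint rule on [a, b]."""
    h = (b - a) / steps
    return h * sum(f(a + (k + 0.5) * h) for k in range(steps))

n1, n2 = 4, 10   # assumed degrees of freedom
area = integrate(lambda F: f_pdf(F, n1, n2), 0.0, 200.0)
print(area)      # close to 1; the tail beyond 200 is negligible here
```

With a positive exponent on the last factor the integral would diverge, so the sign matters.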
Distribution of the Sample Mean and Variance
I X1, X2, ..., Xn is a sample from a N(µ, σ²).
I $\bar{X} \sim N(\mu, \sigma^{2}/n)$.
I Hence, if σ is known,
\[ \tau = \frac{\bar{X}-\mu}{\sigma/\sqrt{n}} = \frac{\sqrt{n}(\bar{X}-\mu)}{\sigma} \sim N(0, 1). \]
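I If µ is known,
\[ \frac{nS^{2}}{\sigma^{2}} = \frac{\sum_{i=1}^{n} (X_i-\mu)^{2}}{\sigma^{2}} = \sum_{i=1}^{n} \left(\frac{X_i-\mu}{\sigma}\right)^{2} = \sum_{i=1}^{n} \tau_i^{2} \sim \chi^{2}_{n}. \]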
Distribution of the Sample Mean and Variance (contd.)
I If µ is unknown,
\[ \frac{(n-1)s^{2}}{\sigma^{2}} = \frac{\sum_{i=1}^{n} (X_i-\bar{X})^{2}}{\sigma^{2}} \sim \chi^{2}_{n-1}. \]
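I If σ is unknown,
\[ \frac{\bar{X}-\mu}{s/\sqrt{n}} = \frac{\sqrt{n}(\bar{X}-\mu)/\sigma}{\sqrt{\frac{(n-1)s^{2}}{\sigma^{2}} \big/ (n-1)}} = \frac{\tau}{\sqrt{\chi^{2}_{n-1}/(n-1)}} \sim t_{n-1}, \]
since X̄ and s² are independent for a normal sample.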
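To make the last statistic concrete, the sketch below computes it for a small sample. The six observations and the value µ = 5.0 are made-up numbers used purely for illustration; `statistics.stdev` uses the divisor n − 1, which matches the definition of s.

```python
from math import sqrt
from statistics import fmean, stdev

sample = [4.9, 5.3, 5.1, 4.7, 5.4, 5.0]  # made-up data, for illustration only
mu0 = 5.0                                # assumed value of the population mean

n = len(sample)
xbar = fmean(sample)
s = stdev(sample)                        # square root of s^2 (divisor n - 1)

t_stat = (xbar - mu0) / (s / sqrt(n))    # (Xbar - mu) / (s / sqrt(n))
df = n - 1                               # degrees of freedom of the t-distribution
print(xbar, s, t_stat, df)
```

If the data really are a normal sample with mean µ, this statistic behaves like a draw from the t-distribution with n − 1 = 5 degrees of freedom.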
Statistical Inference
I Two distinct areas
I Problem of Estimation
I Problem of Hypothesis Testing
I The problem of estimation arises when we have no a priori idea
of the population parameter(s), or the characteristic(s) of the
population that we are interested in.
I We then use the sample observations to obtain a statistic
which can be used as a substitute (or estimator) for this
parameter.
I For example, we want to study the per capita income of
Egyptians.
I Do we even have a remote idea about it? How do we find it?
Statistical Inference (contd.)
I The estimation problem itself can be of two types
I Point Estimation - a single value is quoted as the substitute for
the unknown parameter.
I Interval Estimation - an interval is given that we believe, with
high confidence, contains the parameter.
I In hypothesis testing, on the other hand, we have some
tentative idea or hypothesis about the population
parameter(s) under study.
I We then use the sample to assess whether this idea is
supported by the data or not.
I For example, suppose we believe that the Indian annual per
capita income is Rs 10000/-.
I Is this correct or not? We have a hypothesis testing
problem.