Slide MathStat DS 2025
August 3, 2025
[Link]/buiduonghai Mathematical Statistics August 3, 2025 3 / 273
Topics
1 Introduction
2 Descriptive Statistics
3 Sampling Distribution
4 Point Estimation
5 Confidence Interval
6 Hypothesis Testing - One Sample
7 Inferences on Two Samples
8 Analysis of Variance
9 Non-parametric Test
10 Linear regression
11 Basic Bayes Statistics
1.1. Concept
1.2. Branches of Statistics
1.3. Data Sources
1.4. Population and Sample
1.5. Structure of Classical Data
1.6. Types of Variable
1.7. Revision of Probability
Reference
Book [1] Chapter 1, pp.1 - 9
Revision of Probability: Book [1] Chapter 2,3,4,5
Inferential Statistics
predict, forecast, verify knowledge by analyzing data
[Diagram: Descriptive Statistics and Inferential Statistics, related to Data and Probability]
              Population                         Sample
Definition    Set of all elements of interest    Subset of the population
Size          N, possibly infinite               n, finite
Value         Parameter                          Statistic
[Figure: Population, N = 100, with Sample 1 and Sample 2 drawn from it]
Includes: Observation (in rows); Variable (in columns); Value (in cells)
1 variable: univariate
2 variables: bivariate
n variables: multivariate
Big data: methodology for many more data types, complex structures, and unstructured data
Quantitative (Scale)
Discrete: the number of values is countable
Continuous: the values are uncountable: time, length, weight, ...
∗ By level of calculation
Interval: no true zero value; +, − but not ×, ÷
Ratio: true zero value; all mathematical operations
In practice: Nominal, Ordinal, Scale
1.6. Type of Variables
Ratio
Probability
$$P(A) = \frac{\mathrm{mes}(A)}{\mathrm{mes}(\Omega)}$$
Probability of intersection
P(A ∩ B) = P(A)P(B|A)
Expected value:
$$E(X) = \sum_i x_i p_i ; \qquad E(X) = \int_{-\infty}^{+\infty} x f(x)\,dx$$
Standard deviation:
$$\sigma_X = \sqrt{V(X)}$$
Covariance
$$\mathrm{Cov}(X, Y) = E\big\{[X - E(X)][Y - E(Y)]\big\} = E(XY) - E(X)E(Y)$$
Correlation coefficient
$$\rho_{X,Y} = \frac{\mathrm{Cov}(X, Y)}{\sigma_X \sigma_Y}$$
$$E(X) = \mu_1 ; \qquad V(X) = \mu_2^c = \mu_2 - \mu_1^2$$
Skewness and Kurtosis
$$\mathrm{Skew} = \frac{E[X - E(X)]^3}{\sigma^3} ; \qquad \mathrm{Kurt} = \frac{E[X - E(X)]^4}{\sigma^4}$$
$E(c) = c$ ; $V(c) = 0$
$E(X + c) = E(X) + c$ ; $V(X + c) = V(X)$
$E(cX) = cE(X)$ ; $V(cX) = c^2 V(X)$
$E(X \pm Y) = E(X) \pm E(Y)$ ; $V(X \pm Y) = V(X) + V(Y) \pm 2\,\mathrm{Cov}(X, Y)$
$E\left(\sum_i X_i\right) = \sum_i E(X_i)$ ; $V\left(\sum_i X_i\right) = \sum_i V(X_i)$ if the $X_i$ are independent
Example 1.1
Players A and B play a game that has no draws, and they are equally likely to
win each match. They intend to play 9 matches; whoever wins more takes a
prize of 1 thousand USD. However, play stops after 7 matches with the score A:B at 4:3.
How should the money be distributed?
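One way to approach Example 1.1 is the classical "problem of points": split the prize in proportion to each player's probability of winning had the series continued. The sketch below assumes that reading (the slide does not state the division rule) and enumerates the remaining matches; `win_probability` is a name introduced only for illustration.

```python
from fractions import Fraction
from itertools import product

def win_probability(wins_needed_a, wins_needed_b):
    """Probability that A wins the series when every remaining match is a fair coin flip."""
    # Playing this many further matches always settles the series.
    remaining = wins_needed_a + wins_needed_b - 1
    favourable = 0
    for outcome in product("AB", repeat=remaining):
        if outcome.count("A") >= wins_needed_a:
            favourable += 1
    return Fraction(favourable, 2 ** remaining)

# Score 4:3 after 7 of 9 matches: A needs 1 more win to reach 5, B needs 2.
p_a = win_probability(1, 2)
share_a = 1000 * p_a          # USD for A
share_b = 1000 * (1 - p_a)    # USD for B
print(p_a, share_a, share_b)
```

Under this assumption A wins unless B takes both remaining matches, so the split follows the 3:1 odds.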
Example 1.2
A couple has an online appointment between 0:00 and 1:00; whoever arrives first
waits only 20 minutes. Find the probability that they meet each other.
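A possible worked solution of Example 1.2 using the geometric definition $P(A) = \mathrm{mes}(A)/\mathrm{mes}(\Omega)$, assuming (this is not stated on the slide) that the two arrival times $X$ and $Y$, in minutes, are independent and uniform on $[0, 60]$:

```latex
\Omega = [0,60]\times[0,60], \qquad A = \{(x,y)\in\Omega : |x-y|\le 20\}
% The complement of A consists of two right triangles with legs of length 40.
\mathrm{mes}(\Omega) = 60^2 = 3600, \qquad
\mathrm{mes}(A) = 3600 - 2\cdot\tfrac{1}{2}\cdot 40^2 = 2000
P(A) = \frac{2000}{3600} = \frac{5}{9} \approx 0.556
```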
Example 1.3
Consider the Vietnamese gamble “danh de” and its profit X
Find E(X) and V(X) when playing 1 ([Link]) in one day;
Compare playing 10 mil in one day with playing for 10 days, 1 mil each day.
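$$X \sim N(\mu, \sigma^2) \Rightarrow Z = \frac{X - \mu}{\sigma} \sim N(0, 1)$$
$$P(X < b) = P\left(Z < \frac{b - \mu}{\sigma}\right) ; \qquad P(Z < b^\star) = P(X < \mu + b^\star\sigma)$$
[Figure: densities of $Z \sim N(0,1)$ and $X \sim N(\mu, \sigma^2)$, linked by $Z = (X - \mu)/\sigma$ and $X = \mu + Z\sigma$; the point $b$ on the $x$ axis corresponds to $b^\star = (b - \mu)/\sigma$, and $b = \mu + b^\star\sigma$]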
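$P(Z > z_\alpha) = \alpha$
$z_{1-\alpha} = -z_\alpha$
$z_0 = +\infty$; $z_1 = -\infty$; $z_{0.5} = 0$
$z_{0.05} = 1.645$; $z_{0.025} = 1.96$
[Figure: density of $Z \sim N(0,1)$ with upper-tail area $\alpha$ to the right of $z_\alpha$, and $z_{1-\alpha} = -z_\alpha$ on the left]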
Example 1.4
Income X is Normal distributed with mean of 500 USD and variance of
400 USD².
(a) Find the probability that X > 510
(b) With probability of 0.95, find the upper limit of X
(c) With probability of 0.95, find the lower limit of X
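Example 1.4 can be checked numerically. The sketch below uses `statistics.NormalDist` from the Python standard library and reads "upper limit" as the value $b$ with $P(X < b) = 0.95$ and "lower limit" as the value $a$ with $P(X > a) = 0.95$; that reading is an assumption about the intended question.

```python
from statistics import NormalDist

income = NormalDist(mu=500, sigma=400 ** 0.5)   # variance 400, so standard deviation 20

p_above_510 = 1 - income.cdf(510)     # (a) P(X > 510)
upper_limit = income.inv_cdf(0.95)    # (b) P(X < upper_limit) = 0.95
lower_limit = income.inv_cdf(0.05)    # (c) P(X > lower_limit) = 0.95

# Critical values z_alpha defined by P(Z > z_alpha) = alpha
z_005 = NormalDist().inv_cdf(1 - 0.05)
z_0025 = NormalDist().inv_cdf(1 - 0.025)

print(round(p_above_510, 4), round(upper_limit, 1), round(lower_limit, 1))
```

The two limits are symmetric around the mean because $z_{1-\alpha} = -z_\alpha$.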
Example 1.5
X ∼ N(µ, σ 2 ). With probability of (1 − α)
(a) Find the upper limit of X
(b) Find the lower limit of X
(c) Find an interval around the mean that X falls into
[Figure: density of $\chi^2(v)$ with upper-tail area $\alpha$ to the right of the critical value $\chi^2_{(v)\alpha}$]
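Definition
If $Z \sim N(0, 1)$ and $X \sim \chi^2(v)$ are independent, then $T = \dfrac{Z}{\sqrt{X/v}}$ is Student
distributed with $v$ degrees of freedom, denoted by $T \sim T(v)$.
[Figure: Student densities for $v = 2$ and $v = 100$, with the critical value $t_{(100)\alpha}$ marked]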
Definition
If $X_1 \sim \chi^2(v_1)$ and $X_2 \sim \chi^2(v_2)$ are independent, then $F = \dfrac{X_1/v_1}{X_2/v_2}$ is Fisher
distributed with $v_1, v_2$ degrees of freedom, denoted by $F \sim F(v_1, v_2)$.
Reference
For χ2 , T , F distribution, reference the book [1] pp. 315 - 325.
Example 1.6
Find the following critical values and their probability meaning
$\chi^2_{(20)0.05}$, $\chi^2_{(20)0.95}$
$t_{(20)0.05}$, $t_{(20)0.95}$
$t_{(200)0.025}$
$f_{(2,10)0.05}$, $f_{(10,2)0.05}$, $f_{(2,10)0.95}$
Example 1.7
The error of a measurement is N(0, 1), and the repair cost is the square of the error.
With probability of 0.95, find the upper limit of the cost of a measurement
Find the probability that the total cost of 3 independent measurements is
greater than 6.49
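Theorem (simplified)
If $X_1, X_2, ..., X_n$ are independent and identically distributed with mean
$E(X)$ and variance $V(X)$, then
$$T = \sum_{i=1}^{n} X_i \xrightarrow{\,n\to\infty\,} \sim N\big(nE(X),\, nV(X)\big)$$
and
$$\bar X = \frac{\sum_{i=1}^{n} X_i}{n} \xrightarrow{\,n\to\infty\,} \sim N\left(E(X),\, \frac{V(X)}{n}\right)$$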
Value x1 x2 ··· xk
Frequency f1 f2 ··· fk
Relative Freq. p̂1 p̂2 ··· p̂k
Percentage p̂1 % p̂2 % ··· p̂k %
Example 2.1
Data from customer survey, n = 50 observations
E = Excellent, G = Good, F = Fair, B = Bad
Evaluation   Excel.   Good   Fair   Bad
Freq.        10       25     9      6
%            20%      50%    18%    12%
Table: Evaluation distribution
[Figure: pie chart of the evaluation distribution (Excellent 20%, Good 50%, Fair 18%, Bad 12%)]
[Figure: bar chart of the evaluation frequencies (Excel. 10, Good 25, Fair 9, Bad 6)]
[Figure: charts comparing Male and Female percentages]
Age 23 26 28 32 35 36 38 40 43 47 50 54 58 63 Sum
Freq. 1 2 2 2 4 3 5 8 7 4 5 4 2 1 50
% 2 4 4 4 8 6 10 16 14 8 10 8 4 2 100%
Figure: Distribution of Age (bar chart of the frequencies above)
[Figure: the same Distribution of Age plotted on a continuous age axis from 23 to 63, with zero frequency at ages not observed]
Age 20-24 25-29 30-34 35-39 40-44 45-49 50-54 55-59 60-64 Sum
Freq. 1 4 2 12 15 4 9 2 1 50
% 2 8 4 24 30 8 18 4 2 100%
Figure: Distribution of Age (bar chart of the grouped frequencies above)
Age     20-29   30-39   40-49   50-59   60-69
Freq.   5       14      19      11      1
Figure: Histogram of Age
Distribution of Waiting time
Time      Freq.   %     Cumm.
0 - 5     15      30%   30%
5 - 10    20      40%   70%
10 - 15   8       16%   86%
15 - 20   5       10%   96%
20+       2       4%    100%
Table: Distribution of Waiting time
[Figure: bar chart of the percentages with the cumulative line (30%, 70%, 86%, 96%, 100%)]
[Figure: Distribution of Age with cumulative percentages]
[Figure: distribution shapes: Sharp, Flat, Symmetrical]
Data: 21, 22, 32, 34, 34, 35, 40, 42, 46, 48, 49, 52, 52, 57, 61
Stem | Leaf
2    | 1 2
3    | 2 4 4 5
4    | 0 2 6 8 9
5    | 2 2 7
6    | 1
Data: 329, 332, 335, 337, 339, 340, 344, 345, 347, 350, 352, 355,
358, 361, 364, 365, 372
Labor   Output      Labor   Output
12      150         17      210
13      110         17      260
13      150         18      240
13      200         18      200
15      170         19      240
14      180         19      280
Table: Output - Labor data
[Figure: scatter plot of Output against Labor]
Applied for multi-variate data
Project   R&D   Adv.   Profit
A         30    35     50
B         40    25     40
C         45    50     23
D         15    45     30
E         50    10     15
F         10    30     20
Table: Projects' data
[Figure: bubble chart of Advertising against R&D, each project labeled with its Profit]
Mean
Median
Mode
Quartile and Quantile
Boxplot and Outlier
Example 2.2
Compare the wage in 3 firms ($ thousand)
with histograms on the right
[Figure: wage histograms for three firms (Red, Green, Blue) on a scale from 1 to 10]
Example 2.3
Compare the means of households’ income ($ thousand) in two areas
Area (A): 10, 11, 15, 17, 12
Area (B): 5, 5, 6, 8, 10, 50
Median
Median of data
The median is the middle value of ordered data, denoted by $m_e$ or $\tilde{x}$.
The median is the value at position $\dfrac{n+1}{2}$.
Example 2.4
Find the median
Data (A): 10, 11, 15, 17, 12
Data (B): 5, 5, 6, 8, 10, 50
Data (C): XS, XS, S, M, L, XL, XL
Mode of data
The mode is the most frequent value, denoted by $m_0$. Data may have 0, 1, or
more than 1 mode.
Example 2.5
Formula of Quantile
In ordered data of size n, the quantile of level β, denoted by $q_\beta$, is computed as
follows:
$(n + 1)\beta = \{\text{integer}.\text{decimal}\}$
$q_\beta = x_{\text{int.}} + (\text{dec.})(x_{\text{int.}+1} - x_{\text{int.}})$
Example 2.6
Find quartiles and quantile 30% of data
Data (A): 10, 11, 15, 17, 12
Data (B): 5, 5, 6, 8, 10, 50
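The quantile formula can be sketched in code as follows. The function name `quantile` and the clamping to the smallest or largest value when the position $(n+1)\beta$ falls outside $1..n$ are assumptions added for illustration; the slide only gives the interpolation rule.

```python
import math

def quantile(data, beta):
    """Quantile of level beta at position (n + 1) * beta with linear interpolation."""
    x = sorted(data)
    n = len(x)
    position = (n + 1) * beta
    k = math.floor(position)      # integer part of the position
    frac = position - k           # decimal part of the position
    if k < 1:                     # assumption: clamp below the first observation
        return x[0]
    if k >= n:                    # assumption: clamp above the last observation
        return x[-1]
    return x[k - 1] + frac * (x[k] - x[k - 1])

data_a = [10, 11, 15, 17, 12]
data_b = [5, 5, 6, 8, 10, 50]
for name, data in (("A", data_a), ("B", data_b)):
    print(name, [round(quantile(data, b), 2) for b in (0.25, 0.5, 0.75, 0.3)])
```

For Data (A), for instance, the first quartile sits at position 1.5, halfway between the ordered values 10 and 11.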
[Figure: BOXPLOT; the box spans the IQR, the whiskers extend up to 1.5 · IQR beyond the quartiles, and points farther out are marked as Outliers]
Example 2.7
Data is: 3, 3, 3, 4, 4, 5, 5, 7, 10
Is the data approximately Normal distributed?
x     q       Normal
3     q0.1    -1.28
3     q0.2    -0.84
3     q0.3    -0.52
4     q0.4    -0.25
4     q0.5    0
5     q0.6    0.25
5     q0.7    0.52
7     q0.8    0.84
10    q0.9    1.28
[Figure: plot of the data values (3 to 10) against the Normal quantiles (-2 to 2)]
Range
Interquartile Range
Variance
Standard Deviation
Coefficient of Variation
Standardized Value
Example 2.8
Compare variability of the data (E), (F), (G), (H)
[Figure: dot plots of data sets (E), (F), (G), (H) on a scale from 1 to 9]
Formula
Range = $x_{max} - x_{min}$: width of the interval covering 100% of the values
IQR = $Q_3 - Q_1$: width of the interval covering the middle 50% of the values
Example 2.9
Find range and IQR of data
Data (A): 10, 11, 15, 17, 12
Data (B): 5, 5, 6, 8, 10, 50
Definition of Variance
Variance of sample $s^2$ and Variance of Population $\sigma^2$
$$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar x)^2}{n - 1} ; \qquad \sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}$$
Definition of S.D
Standard Deviation of sample $s$ and of Population $\sigma$
$$s = \sqrt{s^2} ; \qquad \sigma = \sqrt{\sigma^2}$$
Example 2.10
Compare variance and standard deviation of samples
Data (A): 10, 11, 15, 17, 12
Data (B): 5, 5, 6, 8, 10, 50
Definition of CV
Coefficient of Variation of Sample and Population
$$CV = \frac{\text{Standard Deviation}}{\text{Mean}} \times 100\%$$
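As a quick check of the variance, standard deviation, and CV definitions, the sketch below evaluates them for Data (A) and Data (B) with the standard-library `statistics` module, whose `variance` and `stdev` use the sample divisor $n - 1$. The dictionary names are illustrative only.

```python
from statistics import mean, stdev, variance

samples = {"A": [10, 11, 15, 17, 12], "B": [5, 5, 6, 8, 10, 50]}
summary = {}
for name, data in samples.items():
    s2 = variance(data)            # sample variance, divisor n - 1
    s = stdev(data)                # sample standard deviation
    cv = s / mean(data) * 100      # coefficient of variation in percent
    summary[name] = (s2, s, cv)
    print(f"{name}: s^2 = {s2:.3f}, s = {s:.3f}, CV = {cv:.1f}%")
```

The single value 50 in Data (B) dominates its variance, which is why its CV is far larger than that of Data (A) even though the two means are close.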
Example 2.11
Compare variability of samples (A) income: $ thousand, (B) income: $
thousand, (J) profit: $ million
Definition
Standardized value, or z-score, of value $x_i$, denoted by $z_i$
$$z_i = \frac{x_i - \text{mean}}{\text{S.D}}$$
Standardized data has zero mean and unit variance:
$$\bar z = 0 ; \qquad s_z^2 = 1$$
Example 2.12
Compare wage/week and work-time/week of a worker in company
Skewness
Kurtosis and Adjusted Kurtosis
Skewness = 0; Kurtosis = 3
Normal distribution
Symmetric, bell-shaped
Definition
Sample Skewness
$$\text{skew} = \frac{\sum_{i=1}^{n} (x_i - \bar x)^3 / n}{s_x^3}$$
kurt < 3, kurt ∗ < 0 kurt = 3, kurt ∗ = 0 kurt > 3, kurt ∗ > 0
Platykurtic Mesokurtic Leptokurtic
Example
Example 2.13
Statistics for 4 data sets (E), (F), (G), (H)
Statistic   (E)     (F)     (G)     (H)
n           5       5       9       9
x̄           5       5       5       5
Range       4       8       8       4
q0.25       3.5     2       2.5     4
q0.5        5       5       5       5
q0.75       6.5     8       7.5     6
IQR         3       6       5       2
s²          2.5     10      7.5     1.5
s           1.58    3.16    2.74    1.22
CV          32%     63%     55%     24%
skew        0       0       0       0
kurt*       -1.2    -1.2    -1.2    -0.61
[Figure: dot plots and box plots of (E), (F), (G), (H) on a scale from 1 to 9]
2.5. GROUPED DATA
Interval grouped frequency table
Value       a0 - a1   a1 - a2   ···   ak−1 - ak
Frequency   f1        f2        ···   fk
Using the middle value of each interval: $x_1 = \dfrac{a_0 + a_1}{2}$, ...
Value       x1   x2   ···   xk
Frequency   f1   f2   ···   fk
Formula
$$\bar x = \frac{\sum_i f_i x_i}{n} = \frac{\sum_i f_i x_i}{\sum_i f_i}$$
$$s^2 = \frac{\sum_i f_i (x_i - \bar x)^2}{n - 1} = \frac{n}{n - 1}\left[\frac{\sum_i f_i x_i^2}{n} - (\bar x)^2\right]$$
Example
Descriptives
xi   1    2    3    4    5    6    7    8    9
fi   10   22   25   18   13   7    3    1    1
[Figure: bar chart of the frequencies with the box plot below]
Central      Quartile       Variability    Shape
x̄ = 3.46    q0.25 = 2      s² = 2.857     skew = 0.73
me = 3       q0.5 = 3       s = 1.690      kurt = 3.42
m0 = 3       q0.75 = 4.25   CV = 0.489     kurt* = 0.42
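The grouped-data formulas for $\bar x$ and $s^2$ can be applied to the frequency table of this example as in the sketch below (variable names are illustrative).

```python
values = [1, 2, 3, 4, 5, 6, 7, 8, 9]
freqs = [10, 22, 25, 18, 13, 7, 3, 1, 1]

n = sum(freqs)
grouped_mean = sum(f * x for x, f in zip(values, freqs)) / n
mean_of_squares = sum(f * x * x for x, f in zip(values, freqs)) / n
# s^2 = n / (n - 1) * (mean of squares - square of mean)
grouped_var = n / (n - 1) * (mean_of_squares - grouped_mean ** 2)
grouped_sd = grouped_var ** 0.5
cv = grouped_sd / grouped_mean

print(n, round(grouped_mean, 2), round(grouped_var, 3), round(cv, 3))
```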
Example 2.15
Sample 1 of 4 elements has mean of 10; sample 2 of 6 elements has mean
of 12. What is the mean of pooled sample that is combination of two?
Example 2.16
Find sample mean and sample variance of pooled sample
(a) of two samples
sample size mean variance
(1) n1 x̄1 s12
(2) n2 x̄2 s22
(b) of k samples whose sample size, mean, and variance are $n_j$, $\bar x_j$, $s_j^2$, $j = 1, ..., k$,
respectively.
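One possible answer to Examples 2.15 and 2.16 is sketched below. It relies on the identity that the pooled sum of squares equals the within-sample part $\sum_j (n_j - 1)s_j^2$ plus the between-sample part $\sum_j n_j(\bar x_j - \bar x)^2$. The function name `pool` is hypothetical, and the raw-data check reuses Data (A) and Data (B) from the earlier examples purely for verification.

```python
from statistics import mean, variance

def pool(groups):
    """Combine (n, mean, variance) summaries into size, mean and variance of the pooled sample."""
    n = sum(nj for nj, _, _ in groups)
    pooled_mean = sum(nj * mj for nj, mj, _ in groups) / n
    within = sum((nj - 1) * vj for nj, _, vj in groups)
    between = sum(nj * (mj - pooled_mean) ** 2 for nj, mj, _ in groups)
    return n, pooled_mean, (within + between) / (n - 1)

# Example 2.15: only sizes and means are given, so the variances here are placeholders.
_, mean_215, _ = pool([(4, 10, 0.0), (6, 12, 0.0)])

# Check the variance formula against raw data.
s1, s2 = [10, 11, 15, 17, 12], [5, 5, 6, 8, 10, 50]
summaries = [(len(s), mean(s), variance(s)) for s in (s1, s2)]
n_all, mean_all, var_all = pool(summaries)
print(mean_215, n_all, round(mean_all, 3), round(var_all, 3))
```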
No.   X     Y
1     x1    y1
2     x2    y2
⋮     ⋮     ⋮
n     xn    yn
[Figure: scatter plot of Y against X]
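Definition
Sample Covariance
$$\mathrm{cov}(x, y) = \frac{\sum_{i=1}^{n} (x_i - \bar x)(y_i - \bar y)}{n - 1}$$
Correlation Coefficient
$$r_{xy} = \frac{\mathrm{cov}(x, y)}{s_x s_y} = \frac{\sum_i (x_i - \bar x)(y_i - \bar y)}{\sqrt{\sum_i (x_i - \bar x)^2}\,\sqrt{\sum_i (y_i - \bar y)^2}}$$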
Definition
Population Covariance
$$\mathrm{Cov}(x, y) = \frac{\sum_{i=1}^{N} (x_i - \mu_x)(y_i - \mu_y)}{N}$$
Correlation Coefficient
$$\rho_{xy} = \frac{\mathrm{Cov}(x, y)}{\sigma_x \sigma_y}$$
Income (x)        10   10   11   13   15   15   16   18   19   20
Expenditure (y)   9    5    10   10   12   13   9    14   14   15
No.    xi    yi    xi²    yi²    xiyi
1      10    9     100    81     90
2      10    5     100    25     50
3      11    10    121    100    110
4      13    10    169    100    130
5      15    12    225    144    180
6      15    13    225    169    195
7      16    9     256    81     144
8      18    14    324    196    252
9      19    14    361    196    266
10     20    15    400    225    300
Sum    147   111   2281   1317   1717
$$\bar x = \frac{\sum x_i}{n} ; \qquad s_x^2 = \frac{n}{n-1}\left[\overline{x^2} - (\bar x)^2\right]$$
$$\bar y = \frac{\sum y_i}{n} ; \qquad s_y^2 = \frac{n}{n-1}\left[\overline{y^2} - (\bar y)^2\right]$$
$$\mathrm{cov}(x, y) = \frac{n}{n-1}\left[\overline{xy} - \bar x\,\bar y\right] ; \qquad r_{x,y} = \frac{\mathrm{cov}(x, y)}{s_x s_y}$$
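The column sums in the table are all that is needed for the shortcut formulas. The sketch below recomputes the sums from the raw Income and Expenditure data and then derives the sample variances, covariance, and correlation (variable names are illustrative).

```python
from math import sqrt

income = [10, 10, 11, 13, 15, 15, 16, 18, 19, 20]
expenditure = [9, 5, 10, 10, 12, 13, 9, 14, 14, 15]
n = len(income)

sum_x, sum_y = sum(income), sum(expenditure)
sum_x2 = sum(x * x for x in income)
sum_y2 = sum(y * y for y in expenditure)
sum_xy = sum(x * y for x, y in zip(income, expenditure))

x_bar, y_bar = sum_x / n, sum_y / n
var_x = (sum_x2 - n * x_bar ** 2) / (n - 1)      # n/(n-1) * [mean of x^2 - x_bar^2]
var_y = (sum_y2 - n * y_bar ** 2) / (n - 1)
cov_xy = (sum_xy - n * x_bar * y_bar) / (n - 1)
r_xy = cov_xy / sqrt(var_x * var_y)

print(round(var_x, 3), round(var_y, 3), round(cov_xy, 3), round(r_xy, 3))
```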
Effect on each statistic when all elements are transformed:
Statistic                  All elements +a   All elements ×b
Mean                       +a                ×b
Median, mode               +a                ×b
Min, Max                   +a                ×b
Quartile, quantile         +a                ×b
Interquartile range        constant          ×b
Variance                   constant          ×b²
Standard deviation         constant          ×|b|
Coefficient of variation   change            constant
Skewness                   constant          constant
Kurtosis                   constant          constant
Covariance
$$\mathrm{Cov}(x, y) = \frac{\sum_i (x_i - \mu_x)(y_i - \mu_y)}{N} = \mu_{xy} - \mu_x \mu_y$$
Correlation
$$\rho_{x,y} = \frac{\mathrm{Cov}(x, y)}{\sigma_x \sigma_y}$$
Sample statistics vs Population parameters
In Microsoft Excel
Install Data Analysis Toolpak
Using Descriptive Statistics
Using Covariance
Using Correlation
In R
> summary(var)
>
3.1 Population
3.2 Random Sample [1] p.285
3.3 Sample Mean [1] p.296
3.4 Sample Variance [1] p.317
3.5 Sample Proportion
Reference
Book [1] Chapter 6, pp.284 - 381.
Book [2] Chapter 8
Book [4] Chapter 6
Example 3.1
Number of dots when tossing a die
Population: {..., 1, 3, 6, 4, 2, ...}: infinite number of values
Rv X , X ∈ {1, 2, 3, 4, 5, 6}
Probability distribution
X 1 2 3 4 5 6
P 1/6 1/6 1/6 1/6 1/6 1/6
E (X ) = 3.5 is population mean
V (X ) = 2.917 is population variance
Random sample
With random variable X, a random sample of size n is a set of n random
variables, denoted by $\mathbf{X} = (X_1, X_2, ..., X_n)$, such that
the $X_i$ are independent
the $X_i$ are identically distributed with X ($X_i$ are iid., $i = 1, ..., n$).
Example 3.2
Rolling a die, X is the number of dots; consider a sample of size 3 and its sample mean
              Random sample                     Observed sample
Sample        $\mathbf{X} = (X_1, X_2, X_3)$    $x_1 = (1, 3, 2)$; $x_2 = (1, 3, 4)$
Sample mean   $\bar X = (X_1 + X_2 + X_3)/3$    $\bar x_1 = 2$; $\bar x_2 = 8/3$
Probability   $P(\bar X = 1) = 1/216$           $P(\bar x_1 = 1) = 0$
              $P(\bar X = 2) = ?$               $P(\bar x_1 = 2) = ?$
              $P(\bar X = 8/3) = ?$             $P(\bar x_1 = 8/3) = ?$
Statistic
Statistic: a proxy in the sample for a population parameter; it should satisfy
Stat. $\xrightarrow{\,n\to\infty\,}$ Parameter
Mean of Stat. = Parameter
Variance of Stat. is small
Variance of Stat. $\xrightarrow{\,n\to\infty\,}$ 0
Probability distribution is specified
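Sample Mean
From a population with mean $\mu$ and variance $\sigma^2$, with sample $(X_1, X_2, ..., X_n)$,
the sample mean
$$\bar X = \frac{\sum_i X_i}{n}$$
is a random variable with
$$E(\bar X) = \mu ; \qquad V(\bar X) = \frac{\sigma^2}{n} ; \qquad \sigma_{\bar X} = \frac{\sigma}{\sqrt n}$$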
Example 3.3
Find the distribution of the sample mean when rolling a die 1, 2, 3 times
n = 1:
X    1     2     3     4     5     6
P    1/6   1/6   1/6   1/6   1/6   1/6
n = 2 (sample mean for each pair of rolls):
      1     2     3     4     5     6
1     1     1.5   2     2.5   3     3.5
2     1.5   2     2.5   3     3.5   4
3     2     2.5   3     3.5   4     4.5
4     2.5   3     3.5   4     4.5   5
5     3     3.5   4     4.5   5     5.5
6     3.5   4     4.5   5     5.5   6
n = 3:
X̄    1       4/3     5/3     2        7/3      8/3      ···   16/3    17/3    6
P    1/216   3/216   6/216   10/216   15/216   21/216   ···   6/216   3/216   1/216
[Figure: distributions of the sample mean for n = 1, 2, 3 and for n = 4, 10, 25, concentrating around µ as n grows]
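The exact distribution of the sample mean for any small n can be obtained by enumerating all equally likely outcomes. The sketch below does this for n = 3 and confirms $E(\bar X) = \mu$ and $V(\bar X) = \sigma^2/n$ with exact fractions; `sample_mean_distribution` is a name introduced here for illustration.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

def sample_mean_distribution(n):
    """Exact distribution of the mean of n fair-die rolls as {mean: probability}."""
    counts = Counter(Fraction(sum(rolls), n) for rolls in product(range(1, 7), repeat=n))
    total = 6 ** n
    return {m: Fraction(c, total) for m, c in sorted(counts.items())}

dist3 = sample_mean_distribution(3)
mean3 = sum(m * p for m, p in dist3.items())
var3 = sum((m - mean3) ** 2 * p for m, p in dist3.items())

print(dist3[Fraction(2)], dist3[Fraction(8, 3)], mean3, var3)
```

The same function answers the open questions of Example 3.2, since $P(\bar X = 2)$ and $P(\bar X = 8/3)$ are entries of `dist3`.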
Example
Example 3.4
Wage is Normal distributed with mean of 200 and SD of 40.
(a) Find the probability that the average wage of a sample is greater than
210, with n = 1; n = 4; n = 16; n = 100
(b) With probability of 0.9, find the upper limit of the average wage of 16
workers
(c) With probability of 0.9, find at least 3 intervals around the population mean
that the average wage of 16 workers falls in
Example 3.5
For a Normal population, $X \sim N(\mu, \sigma^2)$, and sample size n, with probability of
$(1 - \alpha)$, build formulas for intervals that the sample mean falls in. Which of
them is narrowest?
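Non-Normal population
If the population has mean $\mu$ and variance $\sigma^2$ and is not Normal, then with a large
sample (n > 30) the Central Limit Theorem applies and $\bar X$ is approximately Normal
distributed:
$$\bar X \sim N\left(\mu_{\bar X}, \sigma_{\bar X}^2\right), \text{ or } N\left(\mu, \frac{\sigma^2}{n}\right), \text{ or } N\left(\mu, \left(\frac{\sigma}{\sqrt n}\right)^2\right)$$
[Figure: sampling distributions of $\bar X$ centered at µ]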
Sample Variance
$$S^2 = \frac{n}{n-1} MS = \frac{\sum_i (X_i - \bar X)^2}{n - 1} \Rightarrow E(S^2) = \sigma^2$$
Distribution of Sample Variance: Normality population
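$S^2$ has mean $\sigma^2$ and variance $\dfrac{2\sigma^4}{n-1}$
$\dfrac{(n-1)S^2}{\sigma^2} \sim \chi^2(n-1)$, with mean $n - 1$
Or $S^2$ is proportional to a Chi-squared distributed variable, denoted: $S^2 \propto \chi^2(n-1)$.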
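Example 3.7
Known $\sigma^2$: $\dfrac{\bar X - \mu}{\sigma/\sqrt n} \sim N(0, 1)$
$\dfrac{nS_\mu^2}{\sigma^2} \sim \chi^2(n)$ ; $\dfrac{(n-1)S^2}{\sigma^2} \sim \chi^2(n-1)$
Unknown $\sigma^2$: $\dfrac{\bar X - \mu}{S_\mu/\sqrt n} \sim T(n)$ ; $\dfrac{\bar X - \mu}{S/\sqrt n} \sim T(n-1)$
$\dfrac{(n-1)S^2}{\sigma^2} \sim \chi^2(n-1)$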
Example 3.8
The probability that a voter says ‘Yes’ is 0.7. X = 1 if the voter says Yes, and 0
otherwise. Data is {Yes, Yes, No, No, Yes}.
(a) Write the sample in terms of X; find the sample mean and sample proportion
(b) Find the probability that the above sample occurs
Distribution of Sample Proportion
[Figure: distribution of the sample proportion for p = 0.7 with n = 1, n = 4, n = 16, n = 32; the possible values are k/n for k = 0, 1, ..., n]
Example 3.9
The probability that a customer of a commercial bank is in bad debt is 0.2. With
a sample of 200 customers, find
(a) the probability that the proportion of bad debt is greater than 22%
(b) the upper limit of the proportion of bad debt, with probability level of 0.9
(c) the narrowest interval of the bad-debt proportion, with probability of 0.95
Example 3.10
The population proportion is p; in a sample of size n, with probability level of
$(1 - \alpha)$, find intervals for the sample proportion
Reference
Book [1] Chapter 7, pp.284 - 381.
Book [2] Chapter 10
Book [3] Chapter 7
Example 4.1
Estimate for mean: $\bar X = \dfrac{\sum X_i}{n}$ ; $X_{mid} = \dfrac{X_{min} + X_{max}}{2}$
Estimate for variance: $S^2 = \dfrac{\sum (X_i - \bar X)^2}{n - 1}$ ; $MS = \dfrac{\sum (X_i - \bar X)^2}{n}$
Estimator vs Estimate
An estimator is a random statistic on a random sample
An estimate is the observed value of the statistic from an observed sample
The S.D of an estimator is called the Standard error (S.E)
Example 4.2
               Estimator                                                  Estimate
for mean       $\bar X = \dfrac{X_1 + X_2 + X_3}{3}$                      $\bar x = 15$
for variance   $S^2 = \dfrac{\sum_{i=1}^{3} (X_i - \bar X)^2}{3 - 1}$     $s^2 = 31$
Unbiasedness
θ̂ is unbiased estimator of θ if
E (θ̂) = θ, or bias = 0
Biased estimator
Over-estimate: E (θ̂) > θ, or bias > 0
Under-estimate: E (θ̂) < θ, or bias < 0
[Figure: scatter of estimates illustrating Biased, Unbiased, and Efficient estimators]
Example
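Example 4.3
From a population with mean $\mu$ and variance $\sigma^2$,
(a) random sample $(X_1, X_2, X_3)$, find the MVUE of $\mu$ among
$M_1 = \frac{1}{3}X_1 + \frac{1}{3}X_2$ ; $M_4 = \frac{1}{2}X_1 + \frac{1}{3}X_2 + \frac{1}{6}X_3$
$M_2 = \frac{1}{2}X_1 + \frac{1}{2}X_2$ ; $M_5 = \frac{1}{2}X_1 + \frac{1}{4}X_2 + \frac{1}{4}X_3$
$M_3 = \frac{1}{3}X_1 + \frac{2}{3}X_2$ ; $M_6 = \frac{1}{3}X_1 + \frac{1}{3}X_2 + \frac{1}{3}X_3$
$\hat\mu = \alpha_1 X_1 + \alpha_2 X_2 + \cdots + \alpha_n X_n$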
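For a linear estimator $\hat\mu = \sum_i \alpha_i X_i$ of independent observations, $E(\hat\mu) = \mu\sum_i \alpha_i$ and $V(\hat\mu) = \sigma^2\sum_i \alpha_i^2$, so the comparison in Example 4.3(a) reduces to arithmetic on the weights. A minimal sketch with exact fractions (dictionary names are illustrative):

```python
from fractions import Fraction as F

estimators = {
    "M1": [F(1, 3), F(1, 3), 0],
    "M2": [F(1, 2), F(1, 2), 0],
    "M3": [F(1, 3), F(2, 3), 0],
    "M4": [F(1, 2), F(1, 3), F(1, 6)],
    "M5": [F(1, 2), F(1, 4), F(1, 4)],
    "M6": [F(1, 3), F(1, 3), F(1, 3)],
}

# Unbiased iff the weights sum to 1; the variance is (sum of squared weights) * sigma^2.
unbiased = {name: sum(w * w for w in ws) for name, ws in estimators.items() if sum(ws) == 1}
best = min(unbiased, key=unbiased.get)

for name, factor in unbiased.items():
    print(name, "variance =", factor, "* sigma^2")
print("smallest variance among the unbiased candidates:", best)
```

$M_1$ drops out because its weights sum to 2/3, and equal weights give the smallest variance among the rest.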
Estimator     $S_\mu^2$                $MS$                              $S^2$
Expectation   $\sigma^2$               $\dfrac{n-1}{n}\sigma^2$          $\sigma^2$
Variance      $\dfrac{2\sigma^4}{n}$   $\dfrac{2(n-1)\sigma^4}{n^2}$     $\dfrac{2\sigma^4}{n-1}$
Example 4.6
From a Normal population, sample 1 has size 3 and variance $S_1^2$, sample 2
has size 5 and variance $S_2^2$.
(a) Find the MVUE of $\sigma^2$ among all linear combinations of $S_1^2$ and $S_2^2$
(b) In general, with k samples of size $n_j$ and sample variance $S_j^2$, $j = 1, ..., k$, find the
MVUE of $\sigma^2$ among all linear combinations of the $S_j^2$
4.3. PERCENTILE MATCHING ESTIMATOR
Example 4.7
Data is: 4, 4, 5, 6, 8, 9, 10. Find the percentile matching estimate of
(a) parameter $\lambda$ of the Exponential distribution, in which the median is $\dfrac{\ln 2}{\lambda}$
(b) parameters a, b of the Uniform distribution
(c) parameters $\mu$, $\sigma^2$ of the Normal distribution
Example 4.8
Data is: 4, 4, 5, 6, 8, 9, 10. Find the moment estimate of
(a) parameter $\lambda$ of the Poisson distribution
(b) parameter $\lambda$ of the Exponential distribution
(c) parameters $\mu$, $\sigma^2$ of the Normal distribution
(d) parameters a, b of the Uniform distribution
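A possible sketch for Example 4.8: equate the population moments to the sample moments and solve for the parameters. It assumes that the second moment is matched through the central sample moment with divisor n (the $MS$ of these slides); a course convention using $S^2$ instead would change the variance-based answers slightly.

```python
from math import sqrt

data = [4, 4, 5, 6, 8, 9, 10]
n = len(data)
m1 = sum(data) / n                           # first sample moment (sample mean)
ms = sum((x - m1) ** 2 for x in data) / n    # second central sample moment, divisor n

poisson_lambda = m1                # Poisson: E(X) = lambda
exponential_lambda = 1 / m1        # Exponential: E(X) = 1 / lambda
normal_mu, normal_var = m1, ms     # Normal: E(X) = mu, V(X) = sigma^2
half_width = sqrt(3 * ms)          # Uniform(a, b): E(X) = (a + b)/2, V(X) = (b - a)^2 / 12
uniform_a, uniform_b = m1 - half_width, m1 + half_width

print(round(poisson_lambda, 3), round(exponential_lambda, 3),
      round(normal_var, 3), round(uniform_a, 3), round(uniform_b, 3))
```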
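Likelihood function
For a random variable X with parameter $\theta$ and random sample $\mathbf{X} = (X_1, X_2, ..., X_n)$, the
likelihood function is
$$L(\mathbf{X}, \theta) = \begin{cases} \prod_{i=1}^{n} P(X_i, \theta) & : \text{discrete} \\ \prod_{i=1}^{n} f(X_i, \theta) & : \text{continuous} \end{cases}$$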
Example 4.9
Probability is p = 0.7; which of the following samples is more likely?
x1 = (1, 0, 1); x2 = (0, 1, 0); x3 = (0, 0, 1); x4 = (1, 1, 1)
p      L(x, p)
0.01   0.000099
0.02   0.000392
⋮      ⋮
0.66   0.148104
0.67   0.148137
0.68   0.147968
⋮      ⋮
0.98   0.019208
0.99   0.009801
[Figure: likelihood L(p) plotted against p from 0 to 1, peaking near p = 0.67]
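The tabulated values agree with $L(p) = p^2(1 - p)$, the Bernoulli likelihood of a sample with two successes and one failure such as $x_1 = (1, 0, 1)$; treating the table as belonging to $x_1$ is an inference from the numbers. The sketch below evaluates the four samples of Example 4.9 at p = 0.7 and repeats the grid search (function and variable names are illustrative).

```python
def likelihood(sample, p):
    """Bernoulli likelihood: product of p for each 1 and (1 - p) for each 0."""
    value = 1.0
    for x in sample:
        value *= p if x == 1 else 1 - p
    return value

samples = {"x1": (1, 0, 1), "x2": (0, 1, 0), "x3": (0, 0, 1), "x4": (1, 1, 1)}
at_p07 = {name: likelihood(s, 0.7) for name, s in samples.items()}
most_likely = max(at_p07, key=at_p07.get)

# Grid search over p = 0.01, ..., 0.99 for the sample x1.
grid = [k / 100 for k in range(1, 100)]
p_hat = max(grid, key=lambda p: likelihood(samples["x1"], p))

print(at_p07, most_likely, p_hat)
```

On this grid the maximum is at 0.67, the grid point closest to the analytic maximizer 2/3 of $p^2(1-p)$.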
Maximum Likelihood Estimator (MLE)
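MLE
The MLE of $\theta$ is the $\hat\theta$ that maximizes the Likelihood function or the logarithm of the
Likelihood function
$$L(\mathbf{X}, \theta) \to \max \iff \begin{cases} \dfrac{\partial L(\mathbf{X}, \theta)}{\partial\theta} = 0 \\[2mm] \dfrac{\partial^2 L(\mathbf{X}, \theta)}{\partial\theta^2} < 0 \end{cases}$$
$$\ln L(\mathbf{X}, \theta) \to \max \iff \begin{cases} \dfrac{\partial \ln L(\mathbf{X}, \theta)}{\partial\theta} = 0 \\[2mm] \dfrac{\partial^2 \ln L(\mathbf{X}, \theta)}{\partial\theta^2} < 0 \end{cases}$$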
Example 4.11
Find MLE of parameter(s) in distributions
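Properties
$E(\text{Score}) = 0$
Fisher's information on one element is $I_X(\theta)$; for a random sample of n elements, $I_n(\theta) = n I_X(\theta)$
Cramer-Rao Inequality
For every unbiased estimator $\hat\theta$ of $\theta$ on a sample of n elements,
$$V(\hat\theta) \ge \frac{1}{I_n(\theta)}$$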
Example 4.12
Prove that
(a) $\hat p$ is the MVUE of p in the Bernoulli distribution
(b) $\bar X$ is the MVUE of $\mu$ in the Normal distribution
(c) $S^2$ is not the MVUE of $\sigma^2$ in the Normal distribution
$$\lim_{n\to\infty} E(\hat\theta) = \theta$$
Consistent estimator
Estimator                                                  Small sample   Large sample
Unbiased, efficient                                        :))            :)))
Unbiased, not efficient, asymptotically efficient          :|             :))
Unbiased, not efficient, not asymptotically efficient      :|             :|
Biased, asymptotically unbiased                            :((            :|
Biased, not asymptotically unbiased                        :((            :((
Consistent                                                 :((            :)
Reference
Book [1] Chapter 8, pp.382 - 424.
Book [2] Chapter 11
Book [4] Chapter 8
Confidence width
w = UL − LL
One-sided C.I.
Right tail (Lower bounded) and Left tail (Upper bounded) C.I.
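$$P\left(\bar X - z_\alpha\frac{\sigma}{\sqrt n} < \mu\right) = 1 - \alpha \quad\text{and}\quad P\left(\mu < \bar X + z_\alpha\frac{\sigma}{\sqrt n}\right) = 1 - \alpha$$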
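Which is correct?
$$P\left(\bar X - z_{\alpha/2}\frac{\sigma}{\sqrt n} < \mu < \bar X + z_{\alpha/2}\frac{\sigma}{\sqrt n}\right) = 1 - \alpha$$
$$P\left(\bar X - 1.96\frac{\sigma}{\sqrt n} < \mu < \bar X + 1.96\frac{\sigma}{\sqrt n}\right) = 0.95$$
$$P\left(\bar X - 1.96\cdot\frac{5}{4} < \mu < \bar X + 1.96\cdot\frac{5}{4}\right) = 0.95$$
$$P\left(50 - 1.96\cdot\frac{5}{4} < \mu < 50 + 1.96\cdot\frac{5}{4}\right) = 0.95$$
$$P(50 - 2.45 < \mu < 50 + 2.45) = 0.95$$
[Figure: 95% interval around $\bar X$]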
Example 5.2
Assume that the wage per hour ($) of financial experts is Normal distributed.
Data of 16 randomly surveyed experts is below
Wage (x)   20   22   24   26
Freq.      1    6    5    4
And $\sum_i x_i = 376$; $\sum_i x_i^2 = 8888$
(a) Calculate the sample mean and standard deviation
(b) Find the 95% confidence interval of the mean (population mean)
(c) Find the 90% prediction interval
(d) Find the 80% upper confidence interval of the mean
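Parts (a) and (b) of Example 5.2 can be sketched as below, assuming the usual t-based interval $\bar x \pm t_{(n-1)\alpha/2}\, s/\sqrt n$ for a Normal population with unknown variance. The Python standard library has no Student quantile function, so the critical value $t_{(15)0.025} = 2.131$ is entered by hand from a Student table.

```python
from math import sqrt

wages = [20, 22, 24, 26]
freqs = [1, 6, 5, 4]
n = sum(freqs)
total = sum(f * x for x, f in zip(wages, freqs))           # sum of the 16 wages
total_sq = sum(f * x * x for x, f in zip(wages, freqs))    # sum of the squared wages

x_bar = total / n
s2 = (total_sq - n * x_bar ** 2) / (n - 1)
s = sqrt(s2)

t_crit = 2.131                     # t(15) at alpha/2 = 0.025, read from a table
margin = t_crit * s / sqrt(n)
ci_95 = (x_bar - margin, x_bar + margin)

print(x_bar, round(s, 3), tuple(round(v, 3) for v in ci_95))
```

The resulting variance is the value 3.467 quoted in Example 5.3.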
Example 5.3
A wage survey of 16 experts shows a mean of $23.5 and a variance of
3.467 ($²). Assume that wage is Normally distributed.
(a) Find the 95% confidence interval of the variance
(b) Find the 90% upper confidence interval of the standard deviation
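With n large enough, $\hat p \sim N\left(p, \dfrac{p(1-p)}{n}\right)$
$$P\left(p - z_{\alpha/2}\sqrt{\frac{p(1-p)}{n}} < \hat p < p + z_{\alpha/2}\sqrt{\frac{p(1-p)}{n}}\right) = 1 - \alpha$$
Two-sided C.I.
$$p \in \frac{\hat p + z_{\alpha/2}^2/(2n)}{1 + z_{\alpha/2}^2/n} \pm z_{\alpha/2}\,\frac{\sqrt{\hat p(1-\hat p)/n + z_{\alpha/2}^2/(4n^2)}}{1 + z_{\alpha/2}^2/n}$$
One-sided C.I.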
Example 5.4
In 200 insurance customers, there were 48 claims.
(a) Find the 90% C.I. for claim proportion in customers
(b) Keep ME = 3%, C.I 90%, how many customers should be surveyed?
(c) Keep ME = 3%, with 200 customers, what is confidence level?
(d) Find the 95% C.I of number of claims in 1000 customers.
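Parts (a) to (c) of Example 5.4 can be explored numerically. The sketch below computes both the simple interval $\hat p \pm z_{\alpha/2}\sqrt{\hat p(1-\hat p)/n}$ and the two-sided formula given above; using $\hat p$ in place of $p$ for the sample-size and confidence-level questions is an assumption, and the course may intend a different convention.

```python
from math import ceil, sqrt
from statistics import NormalDist

n, claims = 200, 48
p_hat = claims / n
z = NormalDist().inv_cdf(1 - 0.10 / 2)       # z_{alpha/2} for a 90% interval

# (a) simple interval and the interval from the two-sided formula
simple_me = z * sqrt(p_hat * (1 - p_hat) / n)
simple_ci = (p_hat - simple_me, p_hat + simple_me)
centre = (p_hat + z ** 2 / (2 * n)) / (1 + z ** 2 / n)
half = z * sqrt(p_hat * (1 - p_hat) / n + z ** 2 / (4 * n ** 2)) / (1 + z ** 2 / n)
score_ci = (centre - half, centre + half)

# (b) sample size giving ME = 3% at 90%, plugging in p_hat
n_needed = ceil(z ** 2 * p_hat * (1 - p_hat) / 0.03 ** 2)

# (c) confidence level reached with n = 200 and ME = 3%
level = 2 * NormalDist().cdf(0.03 / sqrt(p_hat * (1 - p_hat) / n)) - 1

print(simple_ci, score_ci, n_needed, round(level, 3))
```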
Example 5.5
To estimate a proportion on a sample of n = 400, with confidence level 90%,
what is the maximum value that the ME of the C.I. could take?
Example 5.6
Catch 500 fish in the lake, mark them, then release them. Then catch 1000
fish, of which 50 are marked. Find the right-sided (lower bounded)
90% C.I. of the total number of fish in the lake.
Exercise - Lecture 05
Reference
Book [1] Chapter 9, pp. 425 – 483.
Book [2] Hypothesis Testing, pp.337 – 390.
Example 6.1
Determine the hypotheses pair, and the conclusion when rejecting and accepting H0
(a) Last year, average income was 120. This year, income has increased
(b) Average price does not differ from 15
(c) The proportion of bad debt has been over 10%
(d) The variability of stock price is greater than 25 $²
Types of Error
Example 6.2
Testing for 'capacity' of an employee.
H0: capacity level is 'Good'
Using the 'Entry test'
Statistic: Test's score (0 to 100)
Critical value: 80
Rejection region: 'Score < 80' → Error type 1: P(ET 1) = α
Statistic ≥ 80: do not reject H0 → Error type 2: P(ET 2) = β
Changing α changes the critical value and the rejection region, and therefore changes β
Reducing α increases β
Given α, choose the statistic that minimizes β
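Example 6.3
Assume that $X \sim N(\mu, \sigma^2)$, $\sigma = 2$, and $\mu$ is unknown. Test for the single
hypothesis
$$\begin{cases} H_0: \mu = 6 \\ H_1: \mu = 9 \end{cases}$$
(a) Random survey with n = 1, critical value 7, rejection region: X > 7. Find
α = P(ET.1), β = P(ET.2)
α = P(X > 7 | µ = 6) =
β = P(X ≤ 7 | µ = 9) =
[Figure: densities centered at 6 and 9 with the critical value 7; α is the area to the right of 7 under H0 and β is the area to the left of 7 under H1]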
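The two error probabilities in Example 6.3(a) are Normal tail areas and can be evaluated with `statistics.NormalDist`, as in this short sketch (variable names are illustrative):

```python
from statistics import NormalDist

sigma, critical = 2, 7
alpha = 1 - NormalDist(mu=6, sigma=sigma).cdf(critical)   # P(X > 7 | mu = 6)
beta = NormalDist(mu=9, sigma=sigma).cdf(critical)        # P(X <= 7 | mu = 9)
power = 1 - beta

print(round(alpha, 4), round(beta, 4), round(power, 4))
```

Moving the critical value to the right lowers α and raises β, which is the trade-off stated in Example 6.2.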
Example 6.4
[Figure: densities centered at 6 and 9 with the critical value 7]
Example 6.6
Price is Normal distributed with variance of 25 ($²). A survey of 100 observations gives
a sample mean of 24 ($).
(a) Test the hypothesis that the average price is higher than 23 ($), with
significance levels of 5% and 1%;
(b) Find the P-value of the test in (a)
(c) If the true mean is 24.8 ($), find the power of the test in (a)
(d) Test the hypothesis that the average price is 24.5 ($), at the 5% significance level,
and find the P-value
(e) Find the P-value of the test that the mean price is less than 25.5 ($)
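Parts (a) to (c) of Example 6.6 can be sketched with a one-sided Z-test, since the population variance is known. The power in (c) is computed for the 5% test; that choice of level is an assumption.

```python
from math import sqrt
from statistics import NormalDist

Z = NormalDist()
mu0, sigma, n, x_bar = 23, sqrt(25), 100, 24
se = sigma / sqrt(n)                        # standard error of the sample mean

z_stat = (x_bar - mu0) / se                 # (a) H0: mu = 23 against H1: mu > 23
reject_5 = z_stat > Z.inv_cdf(0.95)
reject_1 = z_stat > Z.inv_cdf(0.99)
p_value = 1 - Z.cdf(z_stat)                 # (b)

crit_mean = mu0 + Z.inv_cdf(0.95) * se      # (c) rejection threshold for the sample mean at 5%
power = 1 - NormalDist(mu=24.8, sigma=se).cdf(crit_mean)

print(z_stat, reject_5, reject_1, round(p_value, 4), round(power, 4))
```

H0 is rejected at 5% but not at 1%, consistent with a P-value between the two levels.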
Example 6.7
A random survey of the wage of 16 persons gives a sample mean of 23.5 and
a sample variance of 3.467. Assume that wage is Normal distributed.
(a) At the 5% significance level, test the hypothesis that the mean wage is higher
than 22.5.
(b) Estimate the P-value of the test in (a)
(c) Compute the P-value of the test in (a) with Microsoft Excel or R
(d) At the 10% significance level, test the hypothesis that the mean wage is less than 24, and
estimate the P-value of the test
Example 6.8
On 16 workers, the sample mean is 23.5 and the sample variance is 3.467. At 5%, test
the hypothesis that the variance is greater than 2.6, and estimate the P-value
of the test.
6.6. TEST FOR PROPORTION
Example 6.9
In a random survey of 400 customers choosing between products A and B,
223 favor A and the others favor B.
(a) At 5%, test the hypothesis that more than 50% favor A
(b) Find the P-value of the test in question (a)
(c) If the true value is p = 0.6, find the power of the test in question (a)
(d) Suppose instead that, among the 400 respondents, 164 favor A, 158 favor B, and the others are
indifferent. Test the hypothesis that respondents prefer A to B, at 5%,
and find the P-value of the test.
Example 6.11
Applying the Central Limit Theorem, build the rejection region for the test
(a) $X \sim U(0, b)$, test $b > b_0$
(b) $X \sim E(\lambda)$, test $\lambda > \lambda_0$
Example 6.12
With the following sample
12, 12, 14, 15, 15, 17, 18, 18, 21, 22, 22, 24, 25, 25, 25, 26, 28, 28, 29, 30
(a) Test the hypothesis H0: median = 20 against H1: median > 20, at
significance level 5%
(b) Test the hypothesis H0: median = 20 against H1: median ≠ 20, at
significance level 10%
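The slide does not say which test Example 6.12 expects. One common choice is the sign test: under H0 each observation falls above the hypothesized median with probability 1/2, so the number of observations above 20 is Binomial(n, 1/2). The sketch below computes exact P-values under that assumption.

```python
from fractions import Fraction
from math import comb

sample = [12, 12, 14, 15, 15, 17, 18, 18, 21, 22, 22, 24, 25, 25, 25, 26, 28, 28, 29, 30]
m0 = 20
above = sum(x > m0 for x in sample)
below = sum(x < m0 for x in sample)
n = above + below                 # observations equal to m0 would be dropped

def tail(lo, hi):
    """P(lo <= B <= hi) for B ~ Binomial(n, 1/2)."""
    return Fraction(sum(comb(n, k) for k in range(lo, hi + 1)), 2 ** n)

p_right = tail(above, n)                           # (a) H1: median > 20
p_left = tail(0, above)
p_two_sided = min(1, 2 * min(p_left, p_right))     # (b) H1: median != 20

print(above, below, float(p_right), float(p_two_sided))
```

With these data both P-values exceed the stated significance levels, so this version of the test does not reject H0 in either part.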
Most powerful
Given a hypotheses pair and significance level α, the rejection region that maximizes the power of
the test is the 'most powerful rejection region', and the test is the 'most powerful test'
Simple pair $\begin{cases} H_0: \theta = \theta_0 \\ H_1: \theta = \theta_1 \end{cases}$ → Uniformly most powerful test (UMP)
Neyman-Pearson Lemma
For $\begin{cases} H_0: \theta = \theta_0 \\ H_1: \theta = \theta_1 \end{cases}$ and sample $(x_1, x_2, ..., x_n)$, let $\Lambda = \dfrac{L(x_1, x_2, ..., x_n, \theta_0)}{L(x_1, x_2, ..., x_n, \theta_1)}$
With significance level α and constant k, the rejection region R is UMP if
$\Lambda \le k$ inside R
$\Lambda \ge k$ outside R
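Example 6.13
Prove that the rejection region in part 6.3 is UMP
$X \sim N(\mu, \sigma^2)$, $\sigma$ is known, test for
$$\begin{cases} H_0: \mu = \mu_0 \\ H_1: \mu = \mu_1 \end{cases}$$
$$\Lambda = \frac{L(X_1, X_2, ..., X_n, \theta_0)}{L(X_1, X_2, ..., X_n, \theta_1)} = \exp\left\{\frac{\sum (X_i - \mu_1)^2 - \sum (X_i - \mu_0)^2}{2\sigma^2}\right\}$$
$$\Lambda \le k \iff (\mu_0 - \mu_1)\bar X \le \frac{\sigma^2 \ln k}{n} + \frac{\mu_0^2 - \mu_1^2}{2} = a$$
If $\mu_1 < \mu_0$: $P\left(\bar X \le \dfrac{a}{\mu_0 - \mu_1} \,\middle|\, \mu = \mu_0\right) = \alpha \iff \bar X \le \mu_0 - z_\alpha\dfrac{\sigma}{\sqrt n}$
If $\mu_1 > \mu_0$: $P\left(\bar X \ge \dfrac{a}{\mu_0 - \mu_1} \,\middle|\, \mu = \mu_0\right) = \alpha \iff \bar X \ge \mu_0 + z_\alpha\dfrac{\sigma}{\sqrt n}$
Equivalent to Z-test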
Likelihood Ratio: $\Lambda = \dfrac{L(\Omega_0)}{L(\Omega)}$
Rejection region with significance level α
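Example 6.14
$X \sim N(\mu, \sigma^2)$, $\sigma$ is unknown, test for
$$\begin{cases} H_0: \mu = \mu_0 \\ H_1: \mu \ne \mu_0 \end{cases}$$
$\Omega_0 = \{\mu_0\}$, $\Omega_1 = \mathbb{R} \setminus \{\mu_0\} \Rightarrow \Omega = \mathbb{R}$
$L(\Omega_0) = L(\mathbf{X}, \mu = \mu_0)$, replacing $\sigma^2$ by $\hat\sigma_0^2 = \dfrac{\sum (X_i - \mu_0)^2}{n}$
$L(\Omega) = L(\mathbf{X}, \mu = \bar X)$, replacing $\sigma^2$ by $MS = \dfrac{\sum (X_i - \bar X)^2}{n}$
Since the MLE of $\sigma^2$ is MS,
$$\Lambda = \frac{L(\mathbf{X}, \mu_0)}{L(\mathbf{X}, \bar X)} = \left[\frac{\sum (X_i - \bar X)^2}{\sum (X_i - \mu_0)^2}\right]^{n/2}$$
It is proven that
$$\Lambda \le k \iff \left|\frac{\bar X - \mu_0}{S/\sqrt n}\right| \ge \sqrt{(n - 1)\left[\left(\frac{1}{k}\right)^{2/n} - 1\right]}$$
Equivalent to the T-test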
Example 6.15
From a Normal population, the sample is
(17, 18, 18, 18, 19, 19, 20, 22, 22, 23, 25, 25)
Use the likelihood ratio test to test the hypothesis that the mean is 22 and
the variance is 20.
Reference
Book [1] Chapter 10, pp. 484 - 551; Chapter 12, pp. 668 - 669
Book [2] Chapter 13
Book [4] Chapter 10, Chapter 11
Example 7.2
Paired or independent samples?
(a) Income and Expenditure of households?
(b) Income of households in urban and rural areas?
(c) Wage of undergraduates and graduates?
(d) Microeconomics and Macroeconomics scores of students?
(e) Microeconomics scores of K60 and K61 students?
7.2. INFERENCES FOR TWO MEANS
Statistic
$$T = \frac{\bar d}{S_d/\sqrt n}$$
Rejection region with significance level α
$$|T| > t_{(n-1)\alpha/2}$$
Example 7.3
Revenue
Shop   Jan   Feb   d
(1)    72    76    4
(2)    75    79    4
(3)    70    77    7
(4)    82    80    -2
(5)    70    75    5
(6)    83    89    6
⇒ $\bar d = 4$ ; $s_d^2 = 10$
Hypotheses pair
$$\begin{cases} H_0: \mu_{Feb} = \mu_{Jan} \\ H_1: \mu_{Feb} > \mu_{Jan} \end{cases} \iff \begin{cases} H_0: \mu_d = 0 \\ H_1: \mu_d > 0 \end{cases}$$
(a) At 5%, is the mean of Revenue in Feb higher than that in Jan?
(b) C.I. 90% of the difference?
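The paired statistic $T = \bar d / (S_d/\sqrt n)$ for Example 7.3 can be computed as below. The critical value $t_{(5)0.05} = 2.015$ is typed in from a Student table, because the Python standard library does not provide Student quantiles; the same value serves the one-sided 5% test and the two-sided 90% interval.

```python
from math import sqrt
from statistics import mean, variance

jan = [72, 75, 70, 82, 70, 83]
feb = [76, 79, 77, 80, 75, 89]
d = [f - j for j, f in zip(jan, feb)]      # differences Feb - Jan

n = len(d)
d_bar, s2_d = mean(d), variance(d)
t_stat = d_bar / sqrt(s2_d / n)

t_crit = 2.015                             # t(5) at 0.05, read from a table
reject_h0 = t_stat > t_crit                # (a) H1: mu_d > 0 at 5%
half_width = t_crit * sqrt(s2_d / n)       # (b) 90% two-sided C.I.
ci_90 = (d_bar - half_width, d_bar + half_width)

print(d_bar, s2_d, round(t_stat, 3), reject_h0, tuple(round(v, 2) for v in ci_90))
```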
Example 7.4
Assume that wages in industries A and B are Normal distributed with
standard deviations of 10 and 15, respectively. A random survey of 20 workers
in industry A and 25 workers in industry B gives sample means of
240 and 260.
(a) At 5%, test the hypothesis that the average wage in A is lower than that
in B
(b) Find the P-value of the above test
(c) Find the 90% confidence interval of the difference between the two average
wages
If $n_1, n_2 > 30$ → $t_{(v)\alpha} \approx z_\alpha$
C.I. for the difference
$$(\mu_1 - \mu_2) \in (\bar X_1 - \bar X_2) \pm t_{(v)\alpha/2}\sqrt{\frac{S_1^2}{n_1} + \frac{S_2^2}{n_2}}$$
Example
Example 7.5
Sales
Firm A: 77, 79, 76, 80, 82, 83
Firm B: 97, 86, 85, 93, 81, 72, 88, 90, 82
$\bar x_A = 79.5$; $s_A^2 = 7.5$
$\bar x_B = 86.0$; $s_B^2 = 53.5$
(a) Test for equal means between A and B, assuming equal variances, at
significance level 5%, and estimate the P-value of the test
(b) Test for equal means between A and B, assuming unequal variances, at
significance level 5%, and estimate the P-value of the test
(c) Find the 95% C.I. of the difference between the two means
Sales are Normal distributed
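The two test statistics needed in Example 7.5 can be computed as follows. The degrees of freedom $v$ for the unequal-variance case are not defined on the slide; the sketch assumes the usual Welch-Satterthwaite approximation. Critical values would still come from a Student table.

```python
from math import sqrt
from statistics import mean, variance

firm_a = [77, 79, 76, 80, 82, 83]
firm_b = [97, 86, 85, 93, 81, 72, 88, 90, 82]
n1, n2 = len(firm_a), len(firm_b)
m1, m2 = mean(firm_a), mean(firm_b)
v1, v2 = variance(firm_a), variance(firm_b)

# (a) equal variances: pooled variance, T with n1 + n2 - 2 degrees of freedom
sp2 = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)
t_pooled = (m1 - m2) / sqrt(sp2 / n1 + sp2 / n2)

# (b) unequal variances: T with Welch-Satterthwaite degrees of freedom (assumed)
se2 = v1 / n1 + v2 / n2
t_welch = (m1 - m2) / sqrt(se2)
df_welch = se2 ** 2 / ((v1 / n1) ** 2 / (n1 - 1) + (v2 / n2) ** 2 / (n2 - 1))

print(m1, m2, v1, v2, round(t_pooled, 3), round(t_welch, 3), round(df_welch, 1))
```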
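Paired sample: $d = X_1 - X_2$, $T = \dfrac{\bar d}{S_d/\sqrt n}$
Independent samples:
Known $\sigma_1^2, \sigma_2^2$: $Z = \dfrac{\bar X_1 - \bar X_2}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}}$
Unknown $\sigma_1^2, \sigma_2^2$, with $\sigma_1^2 = \sigma_2^2$: $T = \dfrac{\bar X_1 - \bar X_2}{\sqrt{\dfrac{S_p^2}{n_1} + \dfrac{S_p^2}{n_2}}}$
Unknown $\sigma_1^2, \sigma_2^2$, with $\sigma_1^2 \ne \sigma_2^2$: $T = \dfrac{\bar X_1 - \bar X_2}{\sqrt{\dfrac{S_1^2}{n_1} + \dfrac{S_2^2}{n_2}}}$
F-test for variances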
Example 7.6
Example 7.7
Of 200 male customers, 140 say “satisfied”; of 300 female customers, 156
say “satisfied”.
(a) Test the hypothesis that males are more satisfied than females, at 5%
(b) Find the P-value of the test
(c) Find the 90% C.I. of the difference between the two proportions
Example 7.8
The sample correlation of Present and Midterm scores of 83 students is 0.1726. At
5%, test for correlation, and find the P-value of the test.
Summary Example
Example 7.9
Result table of wage from a firm. Each cell lists $n$; $\bar x$; $s^2$ on the first line and $\sum x_i$; $\sum x_i^2$ on the second.
              Senior                   Junior
Post grad.    40; .......; .......     60; .......; .......
              1000; 25202.8            1020; 17882.8
Grad          50; ......; .......      80; ......; .......
              1150; 27077.2            1200; 18971.7
(a) Test for equality of variance between Post and Grad in Senior; in Junior
(b) Test for equality of mean between Post and Grad in Senior; in Junior
(c) Test for equality of variance and mean between Senior and Junior in
Post; in Grad
(d) Confidence interval of the difference between means of Post and Grad in
Senior; Junior; between means of Senior and Junior in Post; Grad
Exercises - Lecture 07
Reference
Book [1] Chapter 10, pp. 553-612
Book [4] Chapter 15
Example 8.1
Analyze the deviation of Wage in two firms A and B
Quantitative variable: Wage
Factor: “training”
Firm A                              Firm B
Wage       5    6    8    9         Wage       5    6    8    9
Trained?   No   No   Yes  Yes       Trained?   No   Yes  No   Yes
Decomposition for Firm A:
Values $x_j$                                   5      6      8      9
Total mean $\bar{\bar x}$                      7
Deviations $x_j - \bar{\bar x}$                −2     −1     +1     +2
Groups                                         No     No     Yes    Yes
Group mean $\bar x_i$                          $\bar x_{No} = 5.5$   $\bar x_{Yes} = 8.5$
Dev. by Treatment $\bar x_i - \bar{\bar x}$    −1.5   −1.5   +1.5   +1.5
Error $x_{ij} - \bar x_i$                      −0.5   +0.5   −0.5   +0.5
Total Sum of Squares: $SST = \sum (x_{ij} - \bar{\bar x})^2 = 10$
Treatment Sum of Squares: $SSTr = \sum (\bar x_i - \bar{\bar x})^2 = 9$
Degrees of freedom
Df in total: 4 − 1 = 3
Df of Treatment (Between-sample): 2 − 1 = 1
Df of Error (Within-sample): 4 − 2 = 2
Table
Sources     SS         df   MS
Treatment   SSTr = 9   1    MSTr = SSTr/df = 9/1 = 9
Error       SSE = 1    2    MSE = SSE/df = 1/2 = 0.5
Total       SST = 10   3
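The decomposition can be reproduced for both firms of Example 8.1 with a few lines of code. The function name `one_way_anova` is introduced here for illustration; it returns the treatment and error sums of squares and the ratio $F = MSTr/MSE$.

```python
from statistics import mean

def one_way_anova(groups):
    """Return (SSTr, SSE, F) for a dict {label: list of observations}."""
    values = [x for obs in groups.values() for x in obs]
    grand = mean(values)
    sstr = sum(len(obs) * (mean(obs) - grand) ** 2 for obs in groups.values())
    sse = sum((x - mean(obs)) ** 2 for obs in groups.values() for x in obs)
    df_tr, df_e = len(groups) - 1, len(values) - len(groups)
    return sstr, sse, (sstr / df_tr) / (sse / df_e)

firm_a = {"No": [5, 6], "Yes": [8, 9]}
firm_b = {"No": [5, 8], "Yes": [6, 9]}
result_a = one_way_anova(firm_a)
result_b = one_way_anova(firm_b)
print(result_a, result_b)
```

Both firms share the same four wages and the same SST, but in Firm A almost all of the variation lies between the training groups, while in Firm B it lies within them.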
Sources     SS     df      MS                    F
Treatment   SSTr   I − 1   MSTr = SSTr/(I − 1)   F = MSTr/MSE
Error       SSE    n − I   MSE = SSE/(n − I)
Assumption
Normality: in each category $X_i \sim N(\mu_i, \sigma_i^2)$
Homoscedasticity: the variances $V(X_i) = \sigma_i^2$ are equal
Samples are random and independent
Testing
Hypotheses pair
$$\begin{cases} H_0: \mu_1 = \mu_2 = \cdots = \mu_I \\ H_1: \text{not } H_0 \end{cases}$$
Example 8.2
Sources SS df MS F
Treatment
Error
Total
Hypothesis testing
$H_0: \mu_1 = \mu_2 = \mu_3$: Zone does not affect the wage
$F_{stat} =$
$f_{(2,13)0.05} = 3.8$; $f_{(2,13)0.01} = 6.7$
Example 8.2
> zone <- c(rep('z1',5), rep('z2',6), rep('z3',5))
> wage <- c(8,6,9,8,7,9,6,9,7,6,8,5,6,5,7,6)
> fit <- aov(wage ~ zone)   # one-way ANOVA of wage by zone
> summary(fit)
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
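       B1                      B2                      ···   BJ                      Mean
A1     $x_{1,1,1}, \ldots$     $x_{1,2,1}, \ldots$     ···   $x_{1,J,1}, \ldots$     $\bar x_{A1}$
A2     $x_{2,1,1}, \ldots$     $x_{2,2,1}, \ldots$     ···   $x_{2,J,1}, \ldots$     $\bar x_{A2}$
⋮      ⋮                       ⋮                             ⋮                       ⋮
AI     $x_{I,1,1}, \ldots$     $x_{I,2,1}, \ldots$     ···   $x_{I,J,1}, \ldots$     $\bar x_{AI}$
Mean   $\bar x_{B1}$           $\bar x_{B2}$           ···   $\bar x_{BJ}$           $\bar{\bar x}$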
Sources    SS    df              MS                          F
Factor A   SSA   I − 1           MSA = SSA/(I − 1)           FA = MSA/MSE
Factor B   SSB   J − 1           MSB = SSB/(J − 1)           FB = MSB/MSE
Error      SSE   n − I − J + 1   MSE = SSE/(n − I − J + 1)
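Sales by Advertising (rows) and Zone (columns), two observations per cell:
Advertising   Inner    Sub      Outer
High          12, 14   11, 12   10, 11
Medium        10, 9    9, 7     8, 8
Low           9, 10    5, 6     7, 6
None          6, 5     7, 6     5, 5
The same data in long format (Adv, Zone, Sales): ..., (M, I, 10), (M, I, 9), (M, S, 9), (M, S, 7), (M, O, 8), (M, O, 8), (L, I, 9), (L, I, 10), ...
(a) Two-way ANOVA Table without interaction?
(b) At 5%, test for effect of Factors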
Two-Factor ANOVA without Interaction: Example
Example 8.3
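Sales by Advertising (rows) and Zone (columns)
Advertising   Inner    Sub      Outer    Mean
High          12, 14   11, 12   10, 11   11.667
Medium        10, 9    9, 7     8, 8     8.5
Low           9, 10    5, 6     7, 6     7.167
None          6, 5     7, 6     5, 5     5.667
Mean          9.375    7.875    7.5      8.25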
Sources SS df MS F
Advertising
Zone
Error
Total
Example 8.3
> adv <- c(rep('h',6), rep('m',6), rep('l',6), rep('n',6))
> z1 <- c('i','i','s','s','o','o')
> zone <- rep(z1, 4)
> sales <- c(12,14,11,12,10,11,10,9,9,7,8,8,9,10,
+ 5,6,7,6,6,5,7,6,5,5)
> # two-way ANOVA without interaction (the object name fit is illustrative)
> fit <- aov(sales ~ adv + zone)
> summary(fit)
Factor A, Factor B
Interaction of A and B (A ∗ B)
       B1                   B2                   ···   BJ                   Mean
A1     x1,1,1, ... ⇒ x̄11    x1,2,1, ... ⇒ x̄12    ···   x1,J,1, ... ⇒ x̄1J    x̄A1
A2     x2,1,1, ... ⇒ x̄21    x2,2,1, ... ⇒ x̄22    ···   x2,J,1, ... ⇒ x̄2J    x̄A2
...    ...                  ...                  ···   ...                  ...
AI     xI,1,1, ... ⇒ x̄I1    xI,2,1, ... ⇒ x̄I2    ···   xI,J,1, ... ⇒ x̄IJ    x̄AI
Mean   x̄B1                  x̄B2                  ···   x̄BJ                  x̄¯
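$SST = \sum_i \sum_j \sum_k (x_{ijk} - \bar{\bar{x}})^2$
$SSA = \sum_i n_{A_i} (\bar{x}_{A_i} - \bar{\bar{x}})^2$
$SSB = \sum_j n_{B_j} (\bar{x}_{B_j} - \bar{\bar{x}})^2$
$SSI = \sum_i \sum_j \sum_k (\bar{x}_{ij} - \bar{x}_{A_i} - \bar{x}_{B_j} + \bar{\bar{x}})^2$
$SSE = \sum_i \sum_j \sum_k (x_{ijk} - \bar{x}_{ij})^2$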
Sources       SS    df               MS                           F
Factor A      SSA   I − 1            MSA = SSA/(I − 1)            FA = MSA/MSE
Factor B      SSB   J − 1            MSB = SSB/(J − 1)            FB = MSB/MSE
Interaction   SSI   (I − 1)(J − 1)   MSI = SSI/((I − 1)(J − 1))   FI = MSI/MSE
Error         SSE   n − IJ           MSE = SSE/(n − IJ)
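The sums of squares in the table can be computed directly from their definitions. The sketch below uses the sales data listed in the R code of Example 8.3 and checks that the five quantities add up; the helper names (`mean_a`, `mean_cell`, and so on) are illustrative, not from the slides.

```r
# Sales data as entered in the Example 8.3 listing (balanced, 2 per cell)
adv   <- factor(rep(c("h", "m", "l", "n"), each = 6))
zone  <- factor(rep(c("i", "i", "s", "s", "o", "o"), times = 4))
sales <- c(12, 14, 11, 12, 10, 11, 10, 9, 9, 7, 8, 8,
           9, 10, 5, 6, 7, 6, 6, 5, 7, 6, 5, 5)

grand     <- mean(sales)
mean_a    <- ave(sales, adv)        # Factor A (advertising) mean per observation
mean_b    <- ave(sales, zone)       # Factor B (zone) mean per observation
mean_cell <- ave(sales, adv, zone)  # cell mean per observation

sst <- sum((sales - grand)^2)
ssa <- sum((mean_a - grand)^2)      # same as sum_i n_Ai (xbar_Ai - grand)^2
ssb <- sum((mean_b - grand)^2)
ssi <- sum((mean_cell - mean_a - mean_b + grand)^2)
sse <- sum((sales - mean_cell)^2)

I <- nlevels(adv); J <- nlevels(zone); n <- length(sales)
mse <- sse / (n - I * J)
f   <- c(A = (ssa / (I - 1)) / mse,
         B = (ssb / (J - 1)) / mse,
         I = (ssi / ((I - 1) * (J - 1))) / mse)

c(SST = sst, SSA = ssa, SSB = ssb, SSI = ssi, SSE = sse)
f
```

Because the design is balanced, these values should agree with `summary(aov(sales ~ adv * zone))`.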
Example 8.4
Sales by Advertising (rows) and Zone (columns), two observations per cell:
Advertising   Inner    Sub      Outer
H             12, 14   11, 12   10, 11
M             10, 9    9, 7     8, 8
L             9, 10    5, 6     7, 6
N             6, 5     7, 6     5, 5
⇒ Cell means and marginal means of Sales:
Adv.   I       S       O       Mean
H      13      11.5    10.5    11.667
M      9.5     8       8       8.5
L      9.5     5.5     6.5     7.167
N      5.5     6.5     5       5.667
Mean   9.375   7.875   7.5     8.25
Sources SS df MS F
Advertising
Zone
Interaction
Error
Total
Example 8.4
> adv <- ...
> zone <- ...
> sales <- ...
> # two-way ANOVA with interaction (the object name fit is illustrative)
> fit <- aov(sales ~ adv*zone)
> summary(fit)
Reference
Book [1] Chapter 13, pp. 723 - 757
Book [3] Chapter 10
Book [4] Chapter 14
Example 9.1
Testing for
Proportions of products on the market in four quality levels A, B, C, D are
10%, 20%, 30%, 40%, respectively.
Distribution of customers over the days of the week is uniform (same proportion)
Number of claims in one day at an insurance company follows a Poisson
distribution
Hypothesis pair
H0 : Distribution is R
H1 : Distribution is not R
Chi-squared statistic
$\chi^2 = \sum_{i=1}^{k} \frac{(F_i - E_i)^2}{E_i}$
Example 9.3
Testing for uniform distribution of the week-day data
Day                 Mon   Tue   Wed   Thu   Fri   Sat   Sun
Num. of customers   123   115   118   130   148   154   150
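As a sketch of how the chi-squared statistic applies to this weekday data, the code below computes it by hand under the uniform hypothesis and compares it with `chisq.test()`. It assumes no parameters are estimated, so the degrees of freedom are k − 1.

```r
# Weekday customer counts from Example 9.3
customers <- c(Mon = 123, Tue = 115, Wed = 118, Thu = 130,
               Fri = 148, Sat = 154, Sun = 150)

k        <- length(customers)
expected <- rep(sum(customers) / k, k)       # uniform H0: equal expected counts
chi2     <- sum((customers - expected)^2 / expected)
crit     <- qchisq(0.95, df = k - 1)         # 5% critical value

# Built-in equivalent: chisq.test() defaults to equal probabilities
res <- chisq.test(customers)

c(chi2 = chi2, critical = crit, p_value = res$p.value)
```

Here the statistic falls just below the 5% critical value, so the uniform hypothesis is not rejected at that level.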
Levels   Observed       Expected        Expected       (Fi − Ei)²/Ei
         Frequency Fi   Proportion pi   Frequency Ei
A        50
B        60
C        125
D        165
Sum      400
H0 : Proportion of A, B, C, D is 10%, 20%, 30%, 40%
χ2stat =
Critical value χ2(k−m−1)α =
Example 9.2
> freq <- c(50,60,125,165)
> prop <- c(0.1,0.2,0.3,0.4)
> chisq.test(freq, p = prop)
Example 9.5
At the 5% significance level, use the following data to test the hypothesis
X 0 1 2 3 4 5 6
Freq. 70 74 36 13 3 3 1
Hypothesis
H0 : A and B are independent
H1 : A and B are not independent
Expected proportion of Ai is estimated by $R_i / n$
Expected proportion of Bj is estimated by $C_j / n$
Expected proportion of (Ai, Bj) is estimated by $\frac{C_j}{n} \times \frac{R_i}{n}$
Expected frequency in (Ai, Bj) is
$E_{ij} = \frac{C_j}{n} \times \frac{R_i}{n} \times n = \frac{R_i C_j}{n}$
Degree of freedom: IJ − (I − 1) − (J − 1) − 1 = (I − 1)(J − 1)
Observed frequencies:
       B1     ···   BJ     Σ
A1     F11    ···   F1J    R1
...    ...    ···   ...    ...
AI     FI1    ···   FIJ    RI
Σ      C1     ···   CJ     n
⇒ Expected frequencies:
       B1               ···   BJ               Σ
A1     E11 = R1C1/n     ···   E1J = R1CJ/n     R1
...    ...              ···   ...              ...
AI     EI1 = RIC1/n     ···   EIJ = RICJ/n     RI
Σ      C1               ···   CJ               n
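If $\chi^2 = \sum_i \sum_j \frac{(F_{ij} - E_{ij})^2}{E_{ij}} > \chi^2_{((I-1)(J-1)),\alpha}$: reject H0
Cramér's V: $V = \sqrt{\frac{\chi^2 / n}{\min\{I - 1, J - 1\}}}$: the smaller V is, the weaker the association (closer to independence).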
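Vote (rows) by Year (columns), counts:
Vote       1st   2nd   3rd
Disagree   10    16    5
Neutral    3     5     3
Agree      8     20    30
The same data in long format (Year, Vote): ..., (1st, A), (2nd, D), ..., (2nd, D), (2nd, N), ...
(a) At 5%, test the hypothesis that Year and Vote are independent
(b) Estimate the P-value of the test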
Independence Test: Example
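Observed frequencies (Vote by Year):
Vote   1st   2nd   3rd   Σ
D      10    16    5     31
N      3     5     3     11
A      8     20    30    58
Σ      21    41    38    100
Expected frequencies (cells to be filled in):
Vote   1st   2nd   3rd   Σ
D                        31
N                        11
A                        58
Σ      21    41    38    100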
Example 9.6
> year <- c(rep('1st',21), rep('2nd',41),
+ rep('3rd',38))
> vote <- c(rep('d',10), rep('n',3), rep('a',8),
+ rep('d',16), rep('n',5), rep('a',20), rep('d',5),
+ rep('n',3), rep('a',30))
> data <- data.frame(year, vote)
> tab <- table(data$vote, data$year)
> chisq.test(tab)
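The expected frequencies, the chi-squared statistic and Cramér's V for this example can also be computed step by step. The sketch below enters the counts as a matrix (Vote in rows, Year in columns), following the counts in the listing above; the variable names are illustrative.

```r
# Observed counts from Example 9.6: rows = Vote (d, n, a), columns = Year
obs <- matrix(c(10, 16,  5,
                 3,  5,  3,
                 8, 20, 30),
              nrow = 3, byrow = TRUE,
              dimnames = list(vote = c("d", "n", "a"),
                              year = c("1st", "2nd", "3rd")))

n        <- sum(obs)
expected <- outer(rowSums(obs), colSums(obs)) / n   # E_ij = R_i * C_j / n
chi2     <- sum((obs - expected)^2 / expected)
df       <- (nrow(obs) - 1) * (ncol(obs) - 1)
p_value  <- pchisq(chi2, df, lower.tail = FALSE)    # part (b): P-value
cramer_v <- sqrt((chi2 / n) / min(dim(obs) - 1))

round(c(chi2 = chi2, df = df, p_value = p_value, V = cramer_v), 4)
```

Some expected counts in the Neutral row are below 5, which is why `chisq.test()` warns that the chi-squared approximation may be inaccurate for this table.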
Example 9.7
Customer’s response about design and quality, using Likert scale:
Design 2 3 5 4 3 4 3 5 2 3
Quality 3 4 4 3 5 3 4 5 4 2
(1: very bad; 2: bad; 3: neutral; 4: good; 5: very good)
Example 9.8
Testing for association between economic and social effect rank in 10 firms
Economic 2 4 5 7 6 9 1 8 10 3
Social 6 5 7 8 4 10 3 9 2 1
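Since both variables in Example 9.8 are ranks, one natural measure of association is Spearman's rank correlation. The slide does not show which test it uses, so this is an assumption. A minimal sketch:

```r
# Ranks from Example 9.8
economic <- c(2, 4, 5, 7, 6, 9, 1, 8, 10, 3)
social   <- c(6, 5, 7, 8, 4, 10, 3, 9, 2, 1)

n   <- length(economic)
d   <- economic - social
r_s <- 1 - 6 * sum(d^2) / (n * (n^2 - 1))   # shortcut formula, valid with no ties

# Built-in equivalents
r_builtin <- cor(economic, social, method = "spearman")
p_value   <- cor.test(economic, social, method = "spearman")$p.value

c(r_s = r_s, r_builtin = r_builtin, p_value = p_value)
```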
Jarque-Bera Test
H0 : Variable is Normally distributed
Sample size: n
Skewness: $Skew = \frac{\sum (x_i - \bar{x})^3 / n}{s^3}$
Kurtosis: $Kurt^* = \frac{\sum (x_i - \bar{x})^4 / n}{s^4} - 3$
Statistic
Example 9.9
Test for Normality with the following sample
x 20 21 22 23 24 25
freq. 10 16 23 22 17 12
Calculation table
xi    fi    xi·fi   fi(xi − x̄)²   fi(xi − x̄)³   fi(xi − x̄)⁴
20    10    200     65.54          −167.77        429.50
21    16    336     38.94          −60.74         94.76
22    23    506     7.21           −4.04          2.26
23    22    506     4.26           1.87           0.82
24    17    408     35.25          50.76          73.10
25    12    300     71.44          174.32         425.34
sum   100   2256    222.64         −5.60          1025.78
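The column sums in the calculation table lead to the skewness and kurtosis. The sketch below then combines them using the standard Jarque-Bera form JB = n(Skew²/6 + Kurt*²/24), compared with a χ²(2) critical value. That form is an assumption here, because the slide's own statistic is not shown, and so is the divisor n used for s².

```r
# Grouped sample from Example 9.9
x <- c(20, 21, 22, 23, 24, 25)
f <- c(10, 16, 23, 22, 17, 12)

n    <- sum(f)
xbar <- sum(x * f) / n
m2   <- sum(f * (x - xbar)^2) / n    # assumption: s^2 uses divisor n
m3   <- sum(f * (x - xbar)^3) / n
m4   <- sum(f * (x - xbar)^4) / n

skew <- m3 / m2^1.5                  # Skew  = m3 / s^3
kurt <- m4 / m2^2 - 3                # Kurt* = m4 / s^4 - 3

jb   <- n * (skew^2 / 6 + kurt^2 / 24)   # assumed standard Jarque-Bera form
crit <- qchisq(0.95, df = 2)             # 5% critical value

round(c(xbar = xbar, skew = skew, kurt = kurt, JB = jb, critical = crit), 4)
```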
x̄ = 25.4; ms = 4.8
pi = P(xiL < X < xiU | X ∼ N(µ = x̄, σ² = ms))
Kolmogorov-Smirnov test
Example 9.11
Test the hypothesis that X is Normally distributed with mean 3 and SD
1, using the following data
Reference
Book [1] Chapter 12, pp.613 - 757
Book
Example 10.1
X is experience (year), Y is wage, sample of 5 staff
x   1   2   2   3   4
y   4   6   5   7   9
Correlation: rxy = 0.98
[Figure: scatter plot of y: wage against x: experience]
In sample, linear regression, in general
ŷ = b0 + b1 x
For each observation
yi = b0 + b1 xi + ei
Coefficient: b0 , b1
[Figure: y: wage]
Example 10.1
With data of 5 observations
b1 = 1.6538; b0 = 2.231
Regression: yi = 2.231 + 1.6538xi + ei
On average, the wage of staff with no experience is 2.231 units; when
experience increases by 1 year, wage increases by 1.6538 units
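The coefficients of Example 10.1 can be checked against the usual closed-form least-squares expressions b1 = Sxy/Sxx and b0 = ȳ − b1·x̄. These formulas are the standard ones and are assumed here, since the derivation is not shown above.

```r
# Data from Example 10.1
x <- c(1, 2, 2, 3, 4)   # experience (years)
y <- c(4, 6, 5, 7, 9)   # wage

sxx <- sum((x - mean(x))^2)
sxy <- sum((x - mean(x)) * (y - mean(y)))

b1 <- sxy / sxx                 # slope
b0 <- mean(y) - b1 * mean(x)    # intercept
r  <- cor(x, y)                 # sample correlation

round(c(b0 = b0, b1 = b1, r = r), 4)
coef(lm(y ~ x))                 # same coefficients from R's built-in fit
```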
OLS Estimation — Standard error
Assumption: V(ε) = σ²
Estimate for σ² is
$\hat{\sigma}^2 = \frac{\sum e_i^2}{n - 2}$
Standard error of Regression: $\hat{\sigma} = \sqrt{\hat{\sigma}^2}$
Accuracy of the estimators b0, b1 around β0, β1 is measured by V(b0), V(b1)
Standard error of estimated coefficient
$se(b_0) = \sqrt{V(b_0)}, \quad se(b_1) = \sqrt{V(b_1)}$
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 2.2308 0.5015 4.448 0.02113 *
x 1.6538 0.1923 8.600 0.00331 **
--
Multiple R-squared: 0.961, Adjusted R-squared: 0.948
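The standard errors and t value in this output can be reproduced from the residuals. The sketch assumes the textbook simple-regression variances V(b1) = σ̂²/Sxx and V(b0) = σ̂²(1/n + x̄²/Sxx), which are not written out in the slides.

```r
# Data from Example 10.1
x <- c(1, 2, 2, 3, 4)
y <- c(4, 6, 5, 7, 9)
n <- length(x)

fit <- lm(y ~ x)
e   <- residuals(fit)

sigma2_hat <- sum(e^2) / (n - 2)          # estimate of sigma^2
sxx        <- sum((x - mean(x))^2)

se_b1 <- sqrt(sigma2_hat / sxx)
se_b0 <- sqrt(sigma2_hat * (1 / n + mean(x)^2 / sxx))

t_b1 <- coef(fit)[["x"]] / se_b1          # t value for H0: beta1 = 0
p_b1 <- 2 * pt(abs(t_b1), df = n - 2, lower.tail = FALSE)

round(c(se_b0 = se_b0, se_b1 = se_b1, t_b1 = t_b1, p_b1 = p_b1), 5)
```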
y = β0 + β1 x1 + · · · + βk xk + ε
In sample of n observations:
yi = b0 + b1 x1i + · · · + bk xki + ei
OLS estimator: that $\sum e_i^2 \to \min$
Assumption V(ε) = σ², then $\hat{\sigma}^2 = \frac{\sum e_i^2}{n - k - 1}$
Standard error of estimated coefficient: se(bj), j = 0, ..., k
βj is unknown parameter
Point estimate of βj is bj
Use bj and se(bj) to establish a confidence interval at level (1 − α) and
to test βj at significance level α.
Assumption: Normal distribution of the random error: ε ∼ N(0, σ²)
Confidence interval
$\beta_j \in b_j \pm se(b_j)\, t_{(n-k-1),\alpha/2}$
Model: y = β0 + β1 x1 + · · · + βk xk + ε
Hypotheses pair
H0 : βj = 0 : coefficient is insignificant
H1 : βj ≠ 0 : coefficient is significant
If $|T| = \left| \frac{b_j - 0}{se(b_j)} \right| > t_{(n-k-1),\alpha/2}$ then reject H0.
General T-test
Hypotheses pair
H0 : βj = b*
H1 : βj {≠, >, <} b*
Statistic: $T = \frac{b_j - b^*}{se(b_j)}$
Model: y = β0 + β1 x1 + · · · + βk xk + ε
Hypotheses pair
H0 : β1 = · · · = βk = 0 : model is overall insignificant
H1 : At least one βj ≠ 0 : model is overall significant
If $F = \frac{R^2 / k}{(1 - R^2)/(n - k - 1)} > f_{(k,\,n-k-1),\alpha}$ then reject H0.
Example 10.2
Data of 12 workers; exp is experience (years), edu is education (years)
> exp <- c(1, 2, 2, 3, 4, 5, 7, 10, 10, 12, 15, 16)
> edu <- c(13, 12, 16, 11, 15, 15, 10, 15, 13, 11, 13, 15)
> wage <- c(6, 6, 12, 6, 11, 8, 8, 10, 11, 10, 15, 13)
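One possible way to carry the example through is to fit the model with `lm()`, then rebuild the overall F statistic from R² and the confidence intervals from bj ± se(bj)·t. In this sketch the experience vector is renamed `exper` (an illustrative name) so it is not confused with R's `exp()` function.

```r
# Data from Example 10.2 (12 observations)
exper <- c(1, 2, 2, 3, 4, 5, 7, 10, 10, 12, 15, 16)
edu   <- c(13, 12, 16, 11, 15, 15, 10, 15, 13, 11, 13, 15)
wage  <- c(6, 6, 12, 6, 11, 8, 8, 10, 11, 10, 15, 13)

fit <- lm(wage ~ exper + edu)
s   <- summary(fit)

n  <- length(wage)
k  <- 2                                          # number of regressors
r2 <- s$r.squared

f_stat <- (r2 / k) / ((1 - r2) / (n - k - 1))    # overall F from R^2
f_crit <- qf(0.95, k, n - k - 1)                 # 5% critical value

# 95% confidence intervals: b_j +/- se(b_j) * t quantile
est <- s$coefficients[, "Estimate"]
se  <- s$coefficients[, "Std. Error"]
ci  <- cbind(lower = est - qt(0.975, n - k - 1) * se,
             upper = est + qt(0.975, n - k - 1) * se)

s$coefficients
c(F = f_stat, critical = f_crit)
ci
```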
Example 10.2
Variable X     Variable Y: Qualitative      Variable Y: Quantitative
Qualitative    Pie chart, bar chart         Histograms, bar chart
               Z-test for p1, p2            T-test for µ1, µ2
               Independence Chisq test      ANOVA, F-test for means
Quantitative                                Spearman's correlation test
                                            Scatter plot, bubble chart
                                            Pearson's correlation test
                                            Regression
11.1. Concept
11.2. Bayesian distribution
11.3. Beta-Binomial inference
11.4. Gamma-Poisson inference
11.5. Normal-normal inference
Reference
Book [1] Chapter 14, pp.776 - 783
Book
A statistician who uses traditional statistical inference is a frequentist.
Frequentist statistics: the parameter is a fixed, unknown constant.
Bayesian statistics: the parameter is a random variable.
Example 11.1
The proportion of people who have the virus is 20%. People are checked by a test that gives the
correct result with prob. 0.9. Those testing (+) are kept in hospital; those testing (-) are free to go.
(a) Find the proportions of Has and No virus among people in hospital
(b) People in hospital are checked again; find the prob. of Has and No
virus among people with (++)
Bayes probability
Example 11.1
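$P(\theta_j | x) = \frac{P(\theta_j) P(x | \theta_j)}{\sum_i P(\theta_i) P(x | \theta_i)}$
If θ is continuous, with prior distribution $f_\theta(\theta)$ and $\int_{-\infty}^{+\infty} f_\theta(\theta)\, d\theta = 1$, then the
posterior distribution is
$f_{\theta|x}(\theta | x) = \frac{P(x | \theta) f_\theta(\theta)}{\int_{-\infty}^{+\infty} P(x | \theta) f_\theta(\theta)\, d\theta}$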
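The discrete form of Bayes' formula can be applied to the virus test of Example 11.1 as sketched below. Two assumptions are made that the example does not state explicitly: the 0.9 accuracy holds for both infected and non-infected people, and repeated tests are independent given the true status. The helper `bayes_update` is an illustrative name.

```r
prior    <- c(Has = 0.2, No = 0.8)   # P(theta): virus status before testing
accuracy <- 0.9                      # assumed to hold for both groups

# P(+ | theta): probability of a positive result for each status
lik_pos <- c(Has = accuracy, No = 1 - accuracy)

bayes_update <- function(prior, likelihood) {
  joint <- prior * likelihood        # P(theta_j) * P(x | theta_j)
  joint / sum(joint)                 # divide by the total probability of x
}

post_one <- bayes_update(prior, lik_pos)      # (a) after one positive test
post_two <- bayes_update(post_one, lik_pos)   # (b) after a second positive test

round(rbind(prior, post_one, post_two), 4)
```

Using the first posterior as the prior for the second test is what makes the update sequential.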
Example 11.2
Discrete p: which value is most plausible among 0.1, 0.3, 0.5, 0.7, 0.9?
Sample x = (Y, Y, Y, N)
"Belief": P(p = 0.1) = P(p = 0.3) = · · · = P(p = 0.9) = 0.2
Prior and Posterior distribution:
pi     P(pi)     P(x|pi)   P(x ∩ pi)   P(pi|x)
0.1    0.2       0.0009    0.0002      0.0035
0.3    0.2       0.0189    0.0038      0.0732
0.5    0.2       0.0625    0.0125      0.2422
0.7    0.2       0.1029    0.0206      0.3987
0.9    0.2       0.0729    0.0146      0.2824
sum    1                   0.0516      1
       (prior)                         (posterior)
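The table can be reproduced in a few lines; this is a minimal sketch in which the likelihood of the ordered sample (Y, Y, Y, N) is p³(1 − p).

```r
p     <- c(0.1, 0.3, 0.5, 0.7, 0.9)
prior <- rep(0.2, length(p))          # equal prior belief

lik   <- p^3 * (1 - p)                # P(x | p) for three Y and one N
joint <- prior * lik                  # P(x and p)
posterior <- joint / sum(joint)       # P(p | x)

round(data.frame(p, prior, lik, joint, posterior), 4)
p[which.max(posterior)]               # value of p with the highest posterior
```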
Conjugacy
If the distribution of X (P(x) or fX(x)) is such that the posterior fθ|x(θ|x) and the prior
fθ(θ) are in the same class, then the distribution of X and the distribution of θ are
‘conjugate’.
Credible Interval
An interval (θ1, θ2) such that P(θ1 < θ < θ2) = 1 − α is a credible interval at
level (1 − α).
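Beta distribution Beta(α, β)
$f_p(p) = \frac{\Gamma(\alpha + \beta)}{\Gamma(\alpha)\Gamma(\beta)}\, p^{\alpha - 1} (1 - p)^{\beta - 1}$
0 ≤ p ≤ 1, α > 0, β > 0
$\Gamma(\alpha) = \int_0^{\infty} x^{\alpha - 1} e^{-x}\, dx$
integer α: Γ(α) = (α − 1)!
$E(p) = \frac{\alpha}{\alpha + \beta}$
$V(p) = \frac{\alpha\beta}{(\alpha + \beta)^2 (\alpha + \beta + 1)}$
[Figure: density f(p) on 0 ≤ p ≤ 1 for (α, β) = (1, 1), (1, 3), (2, 2), (2, 5), (4, 1)]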
Binomial - Beta conjugacy
Proposition
X ∼ B(n, p), prior distribution p ∼ Beta(α, β). If a sample of n observations
has x successes and n − x failures, then the posterior distribution is
p|x ∼ Beta(α + x, β + n − x)
Example 11.3
There is no information about p (prob. of Yes), i.e. the prior belief about p is Uniform
on (0, 1).
(a) First sample is (N, Y, N, N, N). Find the posterior belief, expectation,
variance and 95% upper-bounded credible interval of p
(b) Do the same when a second sample (Y, N, Y, N, N) is added to the
data.
Solution 11.3
p ∼ U(0, 1) = Beta(1, 1), E(p) = 0.5; V(p) = 1/12 = 0.0833
(a) Sample x1 with n = 5, x = 1
Posterior distribution: p|x1 ∼ Beta(1 + 1, 1 + 4) = Beta(2, 5)
E(p) = 2/7; V(p) = 10/392 = 0.0255
$P(p < UL) = 0.95 \Leftrightarrow \int_0^{UL} 30\, p (1 - p)^4\, dp = 0.95 \Leftrightarrow UL = 0.5818$
(b) Sample x2 with n = 5, x = 2
Posterior distribution: p|x1, x2 ∼ Beta(2 + 2, 5 + 3) = Beta(4, 8)
E(p) = 1/3; V(p) = 32/1872 = 0.017
$P(p < UL) = 0.95 \Leftrightarrow \int_0^{UL} 1320\, p^3 (1 - p)^7\, dp = 0.95 \Leftrightarrow UL = 0.5644$
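The two updating steps of Solution 11.3 can be checked numerically. In this sketch `qbeta()` gives the upper bound UL directly instead of solving the integral equation; the helper names `beta_update` and `beta_summary` are illustrative.

```r
# Beta(a, b) posterior after x successes in n Bernoulli trials
beta_update <- function(a, b, x, n) c(a = a + x, b = b + n - x)

beta_summary <- function(ab) {
  a <- ab[["a"]]; b <- ab[["b"]]
  c(mean  = a / (a + b),
    var   = a * b / ((a + b)^2 * (a + b + 1)),
    upper = qbeta(0.95, a, b))        # UL with P(p < UL) = 0.95
}

post_1 <- beta_update(1, 1, x = 1, n = 5)                            # (N, Y, N, N, N)
post_2 <- beta_update(post_1[["a"]], post_1[["b"]], x = 2, n = 5)    # (Y, N, Y, N, N)

round(rbind(first = beta_summary(post_1), both = beta_summary(post_2)), 4)
```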
Distribution Gamma(α, β)
$f_\lambda(\lambda) = \frac{1}{\beta^{\alpha}\Gamma(\alpha)}\, \lambda^{\alpha - 1} e^{-\lambda/\beta}$
λ > 0, α > 0, β > 0
$\Gamma(\alpha) = \int_0^{\infty} x^{\alpha - 1} e^{-x}\, dx$
integer α: Γ(α) = (α − 1)!
E(λ) = αβ
V(λ) = αβ²
[Figure: density f(λ) for (α, β) = (1, 1), (2, 1), (3, 1), (1, 2), (2, 2)]
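Proposition
X ∼ P(λ), prior distribution λ ∼ Gamma(α, β) with scale parameter β, so that E(λ) = αβ. Sample x = (x1, x2, ..., xn)
then posterior distribution
$\lambda | x \sim Gamma(\alpha^*, \beta^*)$ with $\alpha^* = \alpha + \sum_{i=1}^{n} x_i$, $\frac{1}{\beta^*} = n + \frac{1}{\beta}$
(in terms of the rate 1/β, the update is simply rate + n)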
Example 11.5
The number of customers in one hour is Poisson distributed with mean λ.
The prior belief is that λ is Gamma distributed with mean 10 and variance
20. Data from 5 hours show the numbers of customers are: 8, 9, 12, 13, 15.
(a) Posterior distribution, mean and variance of λ?
(b) 90% upper-bounded credible interval of λ?
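A sketch of the calculation for Example 11.5 follows. It assumes the scale parametrisation, E(λ) = αβ and V(λ) = αβ², to recover the prior parameters, and it performs the conjugate update on the rate 1/β, where the update is "rate + n".

```r
# Prior Gamma(shape, scale) matched to mean 10 and variance 20:
# mean = shape * scale, variance = shape * scale^2
prior_mean <- 10
prior_var  <- 20
scale0 <- prior_var / prior_mean
shape0 <- prior_mean / scale0

x <- c(8, 9, 12, 13, 15)     # customers per hour
n <- length(x)

# Conjugate update for Poisson data
shape1 <- shape0 + sum(x)
rate1  <- 1 / scale0 + n
scale1 <- 1 / rate1

post_mean <- shape1 * scale1
post_var  <- shape1 * scale1^2
upper_90  <- qgamma(0.90, shape = shape1, rate = rate1)   # P(lambda < UL) = 0.90

c(shape = shape1, scale = scale1, mean = post_mean, var = post_var, UL = upper_90)
```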
Proposition
X ∼ N(µ, σ²), σ is known, prior distribution µ ∼ N(η, τ²). Sample
x = (x1, x2, ..., xn) then posterior distribution
$\mu | x \sim N(\eta^*, \tau^{2*})$ with $\eta^* = \frac{\eta\sigma^2 + n\bar{x}\tau^2}{\sigma^2 + n\tau^2}$, $\tau^{2*} = \frac{\sigma^2\tau^2}{\sigma^2 + n\tau^2}$
Example 11.6
Price is normally distributed with mean µ and variance 9. The prior belief
is that the mean is distributed N(20, 4). A random sample is (22, 23, 24, 28, 30).
(a) What is the posterior distribution of µ?
(b) Find the narrowest 95% credible interval of µ
(c) A second sample is (20, 24, 23, 21, 20). Compare the posterior
distributions of µ obtained by (i) updating in a second step with the new sample, and (ii)
combining the two samples into one.
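The Normal-normal update can be written as a small function and applied to Example 11.6. This is a sketch in which `normal_update` is an illustrative helper. Because the posterior is Normal, the narrowest 95% credible interval is taken to be the symmetric one around the posterior mean.

```r
# Posterior of mu for N(mu, sigma2) data with known sigma2 and prior N(eta, tau2)
normal_update <- function(eta, tau2, x, sigma2) {
  n <- length(x)
  c(eta  = (eta * sigma2 + n * mean(x) * tau2) / (sigma2 + n * tau2),
    tau2 = (sigma2 * tau2) / (sigma2 + n * tau2))
}

sigma2 <- 9
x1 <- c(22, 23, 24, 28, 30)
x2 <- c(20, 24, 23, 21, 20)

post_1 <- normal_update(20, 4, x1, sigma2)                                 # (a)
ci_95  <- post_1[["eta"]] + c(-1, 1) * qnorm(0.975) * sqrt(post_1[["tau2"]])  # (b)

post_seq <- normal_update(post_1[["eta"]], post_1[["tau2"]], x2, sigma2)   # (c)(i)
post_all <- normal_update(20, 4, c(x1, x2), sigma2)                        # (c)(ii)

rbind(post_1, post_seq, post_all)
ci_95
```

Updating in two steps and updating once with the pooled sample give the same posterior, which is the comparison that part (c) asks for.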
11.5. Predictive Inference
THE END