Intro To Probability and Statistics
STATISTICS
Statistics is like a bikini: what it reveals is suggestive, what it conceals is vital.
Definition of Statistics
Plural sense: numerical facts, e.g., the CPI or the peso-dollar exchange rate.
Singular sense: a scientific discipline consisting of theory and methods for processing numerical information that one can use when making decisions in the face of uncertainty.
History of Statistics
Application of Statistics
Diverse applications
Areas of Statistics
Descriptive statistics: methods concerned with collecting, describing, and analyzing a set of data without drawing conclusions (or inferences) about a large group.
Inferential statistics: methods concerned with the analysis of a subset of data leading to predictions or inferences about the entire set of data.
Based on the results, it was concluded that the new milk formulation is
effective in improving the psychomotor development of infants.
Inferential Statistics
[Diagram: Larger Set (N units/observations) → Smaller Set (n units/observations) → Inferences and Generalizations]
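To make the diagram concrete, here is a minimal Python sketch of drawing a smaller set of n units from a larger set of N units and using the sample mean as an estimate of the population mean. Simple random sampling without replacement is an assumption here, and the population values, sample size, and seed are hypothetical.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Draw n distinct units from the population without replacement."""
    rng = random.Random(seed)
    return rng.sample(population, n)

def mean(values):
    return sum(values) / len(values)

# Hypothetical larger set of N = 100 observations.
population = list(range(1, 101))
# Smaller set of n = 10 observations.
sample = simple_random_sample(population, 10, seed=42)

# The sample mean is used to make an inference about the population mean.
print("sample:", sample)
print("sample mean:", mean(sample))
print("population mean:", mean(population))  # 50.5; normally unknown in practice
```

In practice only the sample would be observed; the population mean is printed here just to compare it with the estimate.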
Key Definitions
Types of Variables
Qualitative variable: non-numerical values
Quantitative variable: numerical values
a. Discrete: countable
b. Continuous: measurable
Constant
Levels of Measurement
1. Nominal scale
2. Ordinal scale
3. Interval scale
4. Ratio scale
Objective Method
Subjective Method
Summary Measures
Location: Minimum, Maximum, Percentile, Decile, Quartile, Central Tendency (Mean, Median, Mode)
Variation: Range, Interquartile Range, Variance, Standard Deviation, Coefficient of Variation
Skewness, Kurtosis
Measures of Location
A measure of location summarizes a data set by giving a typical value within the range of the data values that describes its location relative to the entire data set.
Some common measures:
Minimum, Maximum
Central Tendency
Percentiles, Deciles, Quartiles
Mean
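Population Mean: $\mu = \frac{X_1 + X_2 + \cdots + X_N}{N} = \frac{\sum_{i=1}^{N} X_i}{N}$
Sample Mean: $\bar{x} = \frac{x_1 + x_2 + \cdots + x_n}{n} = \frac{\sum_{i=1}^{n} x_i}{n}$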
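Both formulas perform the same arithmetic (sum the observations, divide by their count); only the notation differs between a population and a sample. A minimal Python sketch on hypothetical data:

```python
def arithmetic_mean(values):
    """Sum of the observations divided by the number of observations."""
    if not values:
        raise ValueError("the mean of an empty data set is undefined")
    return sum(values) / len(values)

# Hypothetical data.
population = [2, 4, 4, 4, 5, 5, 7, 9]   # N = 8 values
sample = [2, 4, 9]                      # n = 3 values

mu = arithmetic_mean(population)   # 40 / 8
x_bar = arithmetic_mean(sample)    # 15 / 3
print(mu, x_bar)                   # 5.0 5.0
```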
[Figure: dot plots on a 0 to 10 axis (Mean = 5) and on a 0 to 14 axis (Mean = 6)]
Median
The sample median is denoted $\tilde{x}$.
Properties of a Median
[Figure: dot plots on a 0 to 10 axis and on a 0 to 14 axis; Median = 5]
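A minimal Python sketch of the sample median under the usual definition: the middle value of the sorted data, or the average of the two middle values when n is even. The data are hypothetical.

```python
def median(values):
    """Middle value of the sorted data; average of the two middle values if n is even."""
    if not values:
        raise ValueError("the median of an empty data set is undefined")
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

print(median([3, 1, 2]))      # odd n: 2
print(median([4, 1, 3, 2]))   # even n: (2 + 3) / 2 = 2.5
```

Because only the middle position matters, stretching the largest values further out leaves the median unchanged, unlike the mean.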
Mode
[Figure: dot plots on a 0 to 14 axis and on a 0 to 6 axis; labels: No Mode, Mode = 9]
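The mode is the most frequently occurring value. The Python sketch below assumes that a data set in which every value occurs exactly once has no mode (returned as an empty list), and that ties produce several modes; the data are hypothetical.

```python
from collections import Counter

def modes(values):
    """Return the most frequent value(s); an empty list means 'no mode'.

    Assumed convention: if every value occurs exactly once, there is no mode.
    """
    counts = Counter(values)
    if not counts:
        return []
    top = max(counts.values())
    if top == 1 and len(counts) > 1:
        return []
    return sorted(v for v, c in counts.items() if c == top)

print(modes([1, 2, 2, 3, 9, 9, 9]))   # [9]
print(modes([0, 1, 2, 3, 4, 5, 6]))   # [] -> no mode
print(modes([1, 1, 2, 2]))            # [1, 2] -> two modes
```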
Properties of a Mode
Percentiles
EXAMPLE
Suppose LJ was told that, relative to the other scores on a certain test, his score was at the 95th percentile.
This means that 95% of those who took the test had scores less than or equal to LJ's score, while 5% had scores higher than LJ's.
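The interpretation above amounts to counting the percentage of scores at or below a given score. A minimal Python sketch with a hypothetical class of 20 examinees:

```python
def percentile_rank(scores, score):
    """Percentage of scores that are less than or equal to the given score."""
    at_or_below = sum(1 for s in scores if s <= score)
    return 100 * at_or_below / len(scores)

# Hypothetical test results: 20 examinees with distinct scores 1..20.
scores = list(range(1, 21))
rank = percentile_rank(scores, 19)
print(rank)          # 95.0 -> 19 of the 20 scores are <= 19
print(100 - rank)    # 5.0  -> percentage of scores higher than 19
```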
Deciles
Quartiles
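Deciles split ordered data into ten equal parts and quartiles into four. Several conventions exist for locating the cut points; the Python sketch below assumes the standard library's `statistics.quantiles` with its default "exclusive" method, which places the i-th of k cut points at position i(n + 1)/k of the sorted data and interpolates when that position is fractional. The data are hypothetical.

```python
import statistics

# Hypothetical data set of n = 19 values.
data = list(range(1, 20))

quartiles = statistics.quantiles(data, n=4)   # Q1, Q2 (median), Q3
deciles = statistics.quantiles(data, n=10)    # D1 .. D9

print(quartiles)   # [5.0, 10.0, 15.0]
print(deciles)     # [2.0, 4.0, 6.0, 8.0, 10.0, 12.0, 14.0, 16.0, 18.0]
```

Other textbooks and software may interpolate differently, so small differences in quartile values between tools are expected.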
Measures of Variation
A measure of variation is a single value that is used to describe the spread of the distribution.
A measure of central tendency alone does not uniquely describe a distribution.
A look at dispersion
[Figure: dot plots of Data A, Data B, and Data C on an 11 to 21 axis]
Data A: Mean = 15.5, s = 3.338
Data B: Mean = 15.5, s = 0.9258
Data C: Mean = 15.5, s = 4.57
Range (R)
The difference between the maximum and minimum values in a data set, i.e.,
R = MAX - MIN
Example: Pulse rates of 15 male residents of a certain village:
54, 58, 58, 60, 62, 65, 66, 71, 74, 75, 77, 78, 80, 82, 85
R = 85 - 54 = 31
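The range calculation for the pulse-rate example, as a minimal Python sketch:

```python
pulse_rates = [54, 58, 58, 60, 62, 65, 66, 71, 74, 75, 77, 78, 80, 82, 85]

def data_range(values):
    """R = MAX - MIN."""
    return max(values) - min(values)

print(data_range(pulse_rates))   # 85 - 54 = 31
```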
Using the same pulse-rate data, Q1 = 60 and Q3 = 78, so
IQR = Q3 - Q1 = 78 - 60 = 18
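A minimal Python sketch reproducing the IQR above, assuming `statistics.quantiles` with its default "exclusive" method. For n = 15 that method puts Q1 at position (15 + 1)/4 = 4 and Q3 at position 3(15 + 1)/4 = 12 of the sorted data, which gives the same quartiles as the example.

```python
import statistics

pulse_rates = [54, 58, 58, 60, 62, 65, 66, 71, 74, 75, 77, 78, 80, 82, 85]

# Quartiles via the default "exclusive" method (positions i * (n + 1) / 4).
q1, q2, q3 = statistics.quantiles(pulse_rates, n=4)
iqr = q3 - q1

print(q1, q3, iqr)   # 60.0 78.0 18.0
```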
Variance
An important measure of variation; it shows variation about the mean.
Population variance: $\sigma^2 = \frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N}$
Sample variance: $s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$
Population SD: $\sigma = \sqrt{\frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N}}$
Sample SD: $s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}$
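The population and sample formulas differ only in the divisor (N versus n - 1). A minimal Python sketch of both, on hypothetical data whose mean is 5 and whose squared deviations sum to 32:

```python
import math

def variance(values, sample=True):
    """Sum of squared deviations from the mean, divided by n - 1 (sample) or N (population)."""
    n = len(values)
    if n < (2 if sample else 1):
        raise ValueError("not enough observations")
    center = sum(values) / n
    squared_deviations = sum((v - center) ** 2 for v in values)
    return squared_deviations / (n - 1 if sample else n)

def std_dev(values, sample=True):
    """Square root of the corresponding variance."""
    return math.sqrt(variance(values, sample))

data = [2, 4, 4, 4, 5, 5, 7, 9]   # hypothetical data
print(variance(data, sample=False), std_dev(data, sample=False))  # 4.0 2.0
print(variance(data), std_dev(data))                              # 32/7 and its square root
```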
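Data: 10, 12, 14, 15, 17, 18, 18, 24 (n = 8)
Mean: $\bar{x} = 16$
$s = \sqrt{\frac{(10-16)^2 + (12-16)^2 + (14-16)^2 + (15-16)^2 + (17-16)^2 + (18-16)^2 + (18-16)^2 + (24-16)^2}{8 - 1}} = \sqrt{\frac{130}{7}} \approx 4.309$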
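The worked example can be checked step by step in Python: mean, squared deviations, their sum, and the sample SD with divisor n - 1.

```python
import math

data = [10, 12, 14, 15, 17, 18, 18, 24]
n = len(data)
x_bar = sum(data) / n                                # 128 / 8 = 16.0
squared_devs = [(x - x_bar) ** 2 for x in data]      # 36, 16, 4, 1, 1, 4, 4, 64
s = math.sqrt(sum(squared_devs) / (n - 1))           # sqrt(130 / 7)

print(x_bar, sum(squared_devs), round(s, 3))         # 16.0 130.0 4.309
```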
Data: 65, 65, 65, 65, 65
Mean = 65, s = 0
Data: 62, 67, 66, 70, 60
Mean = 65, s = 4.0
Chebyshev's Rule
It permits us to make statements about the minimum percentage of observations that lie within a given number of standard deviations of the mean.
For any data set with mean ($\mu$) and standard deviation (SD), the following statements apply:
At least 75% of the observations are within 2 SD of the mean.
At least 88.9% of the observations are within 3 SD of the mean.
Illustration
[Figure: the interval within 2 SD of the mean, labelled "At least 75%"]
Example
The midterm exam scores of 100 STAT 1 students last semester had a mean of 65 and a standard deviation of 8 points.
Applying Chebyshev's Rule, we can say that:
1. At least 75% of the students had scores between 49 and 81 (65 ± 2 × 8).
2. At least 88.9% of the students had scores between 41 and 89 (65 ± 3 × 8).
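Both statements are special cases of the general form of Chebyshev's Rule: at least 1 - 1/k² of the observations lie within k standard deviations of the mean (for k > 1). A minimal Python sketch applying it to the exam example:

```python
def chebyshev_bound(mean, sd, k):
    """Interval mean ± k*SD and the minimum fraction of data inside it (k > 1)."""
    if k <= 1:
        raise ValueError("the bound is only informative for k > 1")
    return mean - k * sd, mean + k * sd, 1 - 1 / k ** 2

for k in (2, 3):
    low, high, frac = chebyshev_bound(65, 8, k)
    print(f"k={k}: at least {frac:.1%} of scores between {low} and {high}")
# k=2: at least 75.0% of scores between 49 and 81
# k=3: at least 88.9% of scores between 41 and 89
```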
Comparing CVs
Stock A: Average Price = P50, SD = P5, CV = (5 / 50) × 100% = 10%
Stock B: Average Price = P100, SD = P5, CV = (5 / 100) × 100% = 5%
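The CV expresses the SD as a percentage of the mean, so the two stocks can be compared even though their prices differ. A minimal Python sketch using the figures above:

```python
def coefficient_of_variation(mean, sd):
    """CV = (SD / mean) * 100%, a unit-free measure of relative variation."""
    if mean == 0:
        raise ValueError("CV is undefined when the mean is 0")
    return 100 * sd / mean

cv_a = coefficient_of_variation(50, 5)    # Stock A
cv_b = coefficient_of_variation(100, 5)   # Stock B
print(cv_a, cv_b)   # 10.0 5.0
```

Both stocks have the same SD, but Stock A varies more relative to its average price.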
Measure of Skewness
$SK = \frac{3(\text{Mean} - \text{Median})}{SD}$
What is Symmetry?
A distribution is said to be symmetric about the mean if the distribution to the left of the mean is the mirror image of the distribution to the right of the mean. In addition, a symmetric distribution has SK = 0, since its mean is equal to its median and its mode.
Measure of Skewness
SK > 0: positively skewed
SK < 0: negatively skewed
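A minimal Python sketch of the skewness measure SK = 3(Mean - Median)/SD. It assumes the sample SD (divisor n - 1) is used; the data sets are hypothetical and chosen to show each sign.

```python
import statistics

def pearson_skewness(values):
    """SK = 3 * (mean - median) / SD, using the sample SD (assumption)."""
    sd = statistics.stdev(values)
    if sd == 0:
        raise ValueError("SK is undefined when SD = 0")
    return 3 * (statistics.mean(values) - statistics.median(values)) / sd

print(pearson_skewness([1, 2, 3, 4, 5]))    # 0.0 -> symmetric
print(pearson_skewness([1, 2, 2, 3, 10]))   # > 0 -> positively skewed (long right tail)
print(pearson_skewness([1, 8, 9, 9, 10]))   # < 0 -> negatively skewed (long left tail)
```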
Measure of Kurtosis
K = 0: mesokurtic
K > 0: leptokurtic
K < 0: platykurtic
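The classification above uses a kurtosis measure centred at 0. As an assumption (the formula itself is not reproduced here), the Python sketch below uses one common moment-based version of excess kurtosis, K = m4/m2² - 3, where m2 and m4 are the second and fourth moments about the mean computed with divisor n. The data sets are hypothetical.

```python
def excess_kurtosis(values):
    """Moment-based excess kurtosis K = m4 / m2**2 - 3 (assumed formula)."""
    n = len(values)
    center = sum(values) / n
    m2 = sum((v - center) ** 2 for v in values) / n
    m4 = sum((v - center) ** 4 for v in values) / n
    if m2 == 0:
        raise ValueError("kurtosis is undefined when all values are equal")
    return m4 / m2 ** 2 - 3

def classify_kurtosis(k, tol=1e-9):
    """Label K as mesokurtic (about 0), leptokurtic (> 0), or platykurtic (< 0)."""
    if abs(k) <= tol:
        return "mesokurtic"
    return "leptokurtic" if k > 0 else "platykurtic"

flat = [1, 2, 3, 4, 5]                      # evenly spread, no peak
peaked = [0, 0, 0, 0, 0, 0, 0, 0, -5, 5]    # tall centre, heavy tails
print(excess_kurtosis(flat), classify_kurtosis(excess_kurtosis(flat)))      # about -1.3 platykurtic
print(excess_kurtosis(peaked), classify_kurtosis(excess_kurtosis(peaked)))  # 2.0 leptokurtic
```

Other kurtosis formulas (for example, sample-adjusted versions) give different numbers, but the sign-based classification works the same way.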
Box-and-Whiskers Plot
The diagram is made up of a box that lies between the first and third quartiles.
The whiskers are the straight lines extending from the ends of the box to the smallest and largest values that are not outliers.
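A minimal Python sketch of the quantities a box-and-whiskers plot needs. It assumes the common rule that outliers are values more than 1.5 IQR below Q1 or above Q3, and that quartiles come from `statistics.quantiles` with its default method; it reuses the pulse-rate data and adds a hypothetical data set containing an outlier.

```python
import statistics

def box_plot_summary(values, whisker_factor=1.5):
    """Quartiles, whisker ends, and outliers for a box-and-whiskers plot.

    Assumed rule: outliers lie beyond Q1 - 1.5*IQR or Q3 + 1.5*IQR.
    """
    q1, median, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low_fence = q1 - whisker_factor * iqr
    high_fence = q3 + whisker_factor * iqr
    inside = [v for v in values if low_fence <= v <= high_fence]
    outliers = sorted(v for v in values if v < low_fence or v > high_fence)
    return {
        "q1": q1, "median": median, "q3": q3,
        "whiskers": (min(inside), max(inside)),  # smallest/largest non-outliers
        "outliers": outliers,
    }

pulse_rates = [54, 58, 58, 60, 62, 65, 66, 71, 74, 75, 77, 78, 80, 82, 85]
print(box_plot_summary(pulse_rates))
# fences are 60 - 27 = 33 and 78 + 27 = 105, so no outliers; whiskers run from 54 to 85

hypothetical = list(range(1, 20)) + [100]
print(box_plot_summary(hypothetical)["outliers"])   # [100]
```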
[Figures: box-and-whiskers plots with Q1, Md (median), and Q3 marked on the box; one version marks a distance of 1.5 IQR beyond the box]