MAS202 Formulas - Summary: Probability and Statistics for Business
Probability and Statistics for Business (FPT University)
Downloaded by Qu?nh Nh? Tr?n (
[email protected])
CHAP 1: DEFINING & COLLECTING DATA
I. Measurement Scales
1. Nominal scale: no ranking is implied.
E.g.: Yes, No; Type of investment: Growth, Value, …
2. Ordinal scale: ranking is implied.
E.g.: Very unsatisfied, Fairly unsatisfied, Neutral, Fairly satisfied, Very satisfied;
Freshman, Sophomore, Junior, Senior;
A, B, C, D, F
3. Interval scale: an ordered scale in which the difference between measurements is a
meaningful quantity, but the measurements do not have a true zero point.
E.g.: temperature, exam score, …
4. Ratio scale: an ordered scale with meaningful differences and a true zero point.
E.g.: height, age, salary, …
II. Collecting Data Via Sampling Is Used When Doing So Is
- Less time consuming
- Less costly
- Less cumbersome and more practical
III. Collecting data from
- Ongoing business activities: data generated by the company's internal operations
- Compiled by organizations/individuals: industry data from market research firms,
trade associations, newspapers, …
- Survey
- Experiment: trial use, e.g. test-marketing alternative product promotions to
determine which promotion should be rolled out more widely
- Observational study: e.g. measuring traffic flow through an intersection to
determine whether certain forms of advertising at that intersection are worthwhile
IV. Sources of Data
- Primary Source: data from survey, experiment, observation
- Secondary Source: data from print journals, published on the internet
V. Types of Samples
1. Nonprobability Sample
- Judgment: get the opinions of experts
- Convenience: items that are easy to reach, e.g. family, friends, …
2. Probability Sample:
- Simple Random
- Systematic (every kth item)
- Stratified
Divide the population -> subgroups (called strata) according to a common
characteristic -> select a simple random sample from each subgroup ->
combine the samples into one
- Cluster
Divide the population -> clusters -> select a simple random sample of clusters
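The four probability sampling methods above can be sketched in Python as follows. This is only an illustration: the function names, the `seed` parameter, and the choice to keep every item of each selected cluster are assumptions, not part of the original notes.

```python
import random

def simple_random_sample(population, n, seed=0):
    """Every item has the same chance of being selected."""
    return random.Random(seed).sample(population, n)

def systematic_sample(population, n, seed=0):
    """Pick a random start among the first k items, then take every k-th item."""
    k = len(population) // n                      # sampling interval (assumes n <= len)
    start = random.Random(seed).randrange(k)
    return [population[start + i * k] for i in range(n)]

def stratified_sample(strata, n_per_stratum, seed=0):
    """Draw a simple random sample from each stratum and combine them."""
    rng = random.Random(seed)
    combined = []
    for items in strata.values():
        combined.extend(rng.sample(items, n_per_stratum))
    return combined

def cluster_sample(clusters, n_clusters, seed=0):
    """Randomly select whole clusters and keep every item in them (assumed variant)."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)
    return [item for name in chosen for item in clusters[name]]

population = list(range(100))
print(systematic_sample(population, 10))
```

The stratified version guarantees that every stratum is represented, while the cluster version only visits the clusters that happen to be drawn.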
VI. Types of Survey Error
1. Coverage error or selection bias
Exists if some groups are excluded from the frame and have no chance of being selected.
2. Nonresponse error or bias
Some of the selected people do not respond.
3. Sampling error
Random differences from sample to sample.
4. Measurement error
Due to weaknesses in question design and/or respondent error.
CHAP 2: ORGANIZING & VISUALIZING VARIABLES
I. Organizing
1. Categorical Data
- One categorical variable => Summary table: tallies frequencies/
percentages of items in a set of categories -> see differences between
categories.
- Two categorical variables => Contingency table (two-way table)
- Three/ More categorical variables => Multidimensional contingency table
2. Numerical Data
- Ordered Array: rank order, from the smallest value to the largest value
- Frequency Distribution: a summary table in which the data are arranged into
numerically ordered classes.
+ Determine a suitable width for the class groupings and establish the boundaries of
each class
+ A frequency distribution should have at least 5 but no more than 15 classes
+ Width of a class interval = (Highest value – Lowest value) / Number of classes
+ Relative Frequency = Frequency / Total
- Cumulative Distribution
+ Cumulative Percentage = Cumulative Frequency / Total * 100
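The class width, relative frequency, and cumulative percentage formulas can be tried out with a short Python sketch. It assumes equal-width classes that start at the lowest value and puts the highest value into the last class; in practice the width is often rounded up to a convenient number, which this sketch does not do. The function name and the dictionary keys are hypothetical.

```python
def frequency_distribution(values, num_classes):
    """Build a frequency table with equal-width classes (assumes max > min)."""
    low, high = min(values), max(values)
    width = (high - low) / num_classes            # (highest - lowest) / number of classes
    counts = [0] * num_classes
    for v in values:
        # The highest value would fall just past the last class, so clamp it.
        idx = min(int((v - low) / width), num_classes - 1)
        counts[idx] += 1

    total = len(values)
    rows, cumulative = [], 0
    for i, freq in enumerate(counts):
        cumulative += freq
        rows.append({
            "lower": low + i * width,
            "upper": low + (i + 1) * width,
            "frequency": freq,
            "relative": freq / total,                   # frequency / total
            "cumulative_pct": cumulative / total * 100,  # cumulative frequency / total * 100
        })
    return rows

for row in frequency_distribution([0, 1, 2, 3, 4, 5, 6, 7, 8, 10], 5):
    print(row)
```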
II. Visualizing
1. Categorical Data
a. Summary table
- Bar chart
- Pareto chart
+ categorical data (nominal scale)
+ Vertical bar chart: descending order of frequency.
+ cumulative polygon
+ separate the “vital few” from the “trivial many.”
- Pie chart/ Doughnut chart
b. Contingency table
- Side by side bar chart
- Doughnut chart
2. Numerical Data
a. Ordered Array => Stem and Leaf display
- Separate sorted data series: leading digits (stems) & trailing digits (leaves).
b. Frequency Distribution & Cumulative Distribution
- Histogram: no gaps between adjacent bars
[Figure: histogram "Age of Students"; vertical axis: Frequency (0 to 7); horizontal axis: classes 5, 15, 25, 35, 45, 55, More]
- Polygon: the midpoint of each class represents the data in that class
- Scatter plot (Two numerical variables)
[Figure: scatter plot "Cost per Day vs. Production Volume"; horizontal axis: Volume per Day; vertical axis: Cost per Day]
- Time Series plot (Two numerical variables)
III. Graphical Error
- No relative basis
- Compressing vertical axis
- No zero point on vertical axis
- Chart junk
CHAP 3: NUMERICAL DESCRIPTIVE MEASURES
I. Median
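- Rank the data, then find the position L = (n + 1) / 2
- If L is a whole number, Median = x_L
- If L = a.5, Median = (x_a + x_(a+1)) / 2, the average of the two middle ranked values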
II. Z-score
- A data value is considered an extreme outlier if its Z-score is less than -3.0
or greater than +3.0.
|Z| > 3 => X : outlier
- The larger the absolute value of the Z-score, the farther the data value is
from the mean.
Z1 > Z2 => X1 : higher relative position than X2
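A small Python sketch of the Z-score outlier rule. The notes do not spell out the formula, so this assumes the usual definition Z = (X - mean) / S with the sample standard deviation S; the function names are hypothetical.

```python
import statistics

def z_scores(values):
    """Z = (X - mean) / S, using the sample standard deviation (n - 1 divisor)."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return [(x - mean) / sd for x in values]

def extreme_outliers(values, threshold=3.0):
    """Return the values whose |Z| exceeds the threshold."""
    return [x for x, z in zip(values, z_scores(values)) if abs(z) > threshold]

data = [10] * 20 + [100]
print(extreme_outliers(data))
```

Comparing two Z-scores from different data sets is what makes the "relative position" statement possible, since both are expressed in standard deviation units.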
III. Shape of Distribution
1. Skewness: Measures the extent to which data values are not symmetrical
- Mean = Median => Symmetric
- Mean < Median => Left-skewed
- Mean > Median => Right-skewed
2. Kurtosis: measures the peakedness of the curve of the distribution
IV. Quartile
- Split the ranked data into 4 segments with an equal number of values
per segment
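- Find Q1: position L1 = (n + 1) / 4 -> Q1 is the ranked value at position L1
- Find Q2 (the median): position L2 = (n + 1) / 2 -> Q2 is the ranked value at position L2
- Find Q3: position L3 = 3(n + 1) / 4 -> Q3 is the ranked value at position L3
Interquartile range (IQR) = Q3 – Q1
Data values outside the interval (Q1 – 1.5 IQR ; Q3 + 1.5 IQR) -> outliers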
Comparison                                        Left-Skewed   Symmetric   Right-Skewed
(Median – Xsmallest) vs. (Xlargest – Median)           >            ≈            <
(Q1 – Xsmallest) vs. (Xlargest – Q3)                   >            ≈            <
(Median – Q1) vs. (Q3 – Median)                        >            ≈            <
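The quartile positions and the 1.5 IQR outlier rule from this section can be sketched in Python. The notes only describe the ".5" case explicitly; rounding any other fractional position to the nearest whole position is an assumption here (a common textbook convention), and the function names are hypothetical.

```python
def value_at_position(sorted_values, position):
    """Return the value at a 1-based ranked position such as (n + 1) / 4."""
    whole = int(position)
    frac = position - whole
    if frac == 0:
        return sorted_values[whole - 1]
    if frac == 0.5:
        # Halfway between two ranks: average the two neighbouring values.
        return (sorted_values[whole - 1] + sorted_values[whole]) / 2
    # Assumed convention: otherwise round to the nearest whole position.
    return sorted_values[round(position) - 1]

def quartiles(values):
    data = sorted(values)
    n = len(data)
    q1 = value_at_position(data, (n + 1) / 4)
    q2 = value_at_position(data, (n + 1) / 2)
    q3 = value_at_position(data, 3 * (n + 1) / 4)
    return q1, q2, q3

def iqr_outliers(values):
    """Values outside (Q1 - 1.5 IQR, Q3 + 1.5 IQR)."""
    q1, _, q3 = quartiles(values)
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [x for x in values if x < low or x > high]

print(quartiles(range(1, 11)))
print(iqr_outliers([1, 2, 3, 4, 5, 6, 7, 100]))
```

Other software (spreadsheets, statistics libraries) may interpolate between ranks instead, so small differences from these values are expected.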
V. Covariance
- Measures the strength of the linear relationship between two
numerical variables (X, Y)
- Sample covariance: cov(X,Y) = Σ(Xi – X̄)(Yi – Ȳ) / (n – 1)
cov(X,Y) > 0 -> X, Y tend to move in the same direction.
cov(X,Y) < 0 -> X, Y tend to move in opposite directions.
cov(X,Y) = 0 -> X, Y have no linear relationship (this alone does not prove independence).
- Coefficient of Correlation: r = cov(X,Y) / (S_X . S_Y), with –1 ≤ r ≤ 1
The closer to –1, the stronger the negative linear relationship.
The closer to 1, the stronger the positive linear relationship.
The closer to 0, the weaker the linear relationship.
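A short Python sketch of the sample covariance and the coefficient of correlation, assuming the usual (n - 1) divisor and r = cov(X,Y) / (S_X S_Y). The function names are hypothetical. The last example uses Y = X^2 on symmetric X values to show a zero covariance even though Y is completely determined by X.

```python
import math

def sample_covariance(xs, ys):
    """Sum of (x - mean_x)(y - mean_y) divided by n - 1."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)

def correlation(xs, ys):
    """r = cov(X, Y) / (S_X * S_Y); the variance of X is cov(X, X)."""
    s_x = math.sqrt(sample_covariance(xs, xs))
    s_y = math.sqrt(sample_covariance(ys, ys))
    return sample_covariance(xs, ys) / (s_x * s_y)

print(correlation([1, 2, 3, 4], [2, 4, 6, 8]))                    # perfect positive line
print(sample_covariance([-2, -1, 0, 1, 2], [4, 1, 0, 1, 4]))      # Y = X^2
```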
CHAP 4: BASIC PROBABILITY
I. Basic Probability concepts
- Probability: 0 ≤ P ≤ 1
- Impossible Event: P = 0
- Certain Event: P = 1
II. Mutually Exclusive: Events that cannot occur simultaneously
A∩B=∅
III. Collectively Exhaustive: One of the events must occur.
A∪B=S
IV. Multiplication rule
P ( A and B) = P ( A ∩ B) = P(A) x P(B|A) = P(B) x P(A|B)
V. Conditional probability: probability of one event, given that another event has
occurred
P(A|B) = P(A and B) / P(B)
VI. Independent Events
A and B are independent if and only if
P(A|B) = P(A)
P(A ∩ B) = P(A) x P(B)
Bayes' theorem (for any events A, B with P(B) > 0):
P(A|B) = P(B|A) x P(A) / P(B)
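The multiplication rule, conditional probability, the independence check, and Bayes' theorem can be verified on a small example. The sketch below uses one roll of a fair six-sided die; the events and helper names are made up for illustration.

```python
from fractions import Fraction

SAMPLE_SPACE = {1, 2, 3, 4, 5, 6}   # one roll of a fair die

def prob(event):
    """P(event) with equally likely outcomes."""
    return Fraction(len(event & SAMPLE_SPACE), len(SAMPLE_SPACE))

def conditional(a, b):
    """P(A|B) = P(A and B) / P(B)."""
    return prob(a & b) / prob(b)

def independent(a, b):
    """A and B are independent when P(A and B) = P(A) x P(B)."""
    return prob(a & b) == prob(a) * prob(b)

even = {2, 4, 6}
at_least_4 = {4, 5, 6}
at_most_2 = {1, 2}

print(conditional(even, at_least_4))                                   # P(even | >= 4)
print(conditional(at_least_4, even) * prob(even) / prob(at_least_4))   # same value via Bayes
print(independent(even, at_least_4), independent(even, at_most_2))
```

Using `Fraction` keeps the probabilities exact, so the equality checks are not affected by floating-point rounding.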
VII. Counting Rule
1. Counting Rule 1:
If any one of k different mutually exclusive and collectively exhaustive
events can occur on each of n trials, the number of possible outcomes is
k^n
2. Counting Rule 2:
If there are k1 events on the first trial, k2 events on the second trial, … and kn
events on the nth trial, the number of possible outcomes is
(k1)(k2)…(kn)
3. Counting Rule 3:
The number of ways that n items can be arranged in order is
n! = (n)(n – 1)…(1)
4. Counting Rule 4:
Permutations: the number of ways of arranging X objects selected
from n objects in order is
nPx = n! / (n – X)!
5. Counting Rule 5:
Combinations: the number of ways of selecting X objects from n
objects, irrespective of order, is
nCx = n! / (X! (n – X)!)
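The five counting rules map directly onto functions in Python's standard `math` module (`math.perm` and `math.comb` need Python 3.8 or later). The wrapper names `rule1` to `rule5` are hypothetical and only mirror the numbering above.

```python
import math

def rule1(k, n):
    """k possible events on each of n trials: k^n outcomes."""
    return k ** n

def rule2(*ks):
    """k1 events on trial 1, k2 on trial 2, ...: (k1)(k2)...(kn) outcomes."""
    return math.prod(ks)

def rule3(n):
    """Number of orderings of n items: n!"""
    return math.factorial(n)

def rule4(n, x):
    """Permutations: ordered selections of x objects from n, n! / (n - x)!"""
    return math.perm(n, x)

def rule5(n, x):
    """Combinations: unordered selections of x objects from n, n! / (x!(n - x)!)"""
    return math.comb(n, x)

print(rule1(6, 2), rule2(3, 4, 2), rule3(5), rule4(5, 2), rule5(5, 2))
# prints: 36 24 120 20 10
```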
CHAP 5: DISCRETE PROBABILITY DISTRIBUTIONS
I. Binomial Distribution
X ~ B(n, π)
X = number of successes in n trials
n trials: independent
Each trial: success with probability p = π, failure with probability q = 1 – π
P(X = k) = nCk . π^k . (1 – π)^(n–k)
Mean: E(X) = n.π
Variance: V(X) = n.π.(1 – π)
Standard deviation: σ = √V(X)
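A minimal Python sketch of the binomial formulas above. The function names are hypothetical, and `pi` here is the success probability π, not the constant 3.14….

```python
import math

def binomial_pmf(k, n, pi):
    """P(X = k) = nCk * pi^k * (1 - pi)^(n - k)."""
    return math.comb(n, k) * pi ** k * (1 - pi) ** (n - k)

def binomial_summary(n, pi):
    """Return (mean, variance, standard deviation) of B(n, pi)."""
    mean = n * pi
    variance = n * pi * (1 - pi)
    return mean, variance, math.sqrt(variance)

print(binomial_pmf(2, 4, 0.5))     # 0.375
print(binomial_summary(4, 0.5))    # (2.0, 1.0, 1.0)
```

A quick sanity check is that the probabilities for k = 0, 1, …, n add up to 1.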
II. Poisson Distribution
X = number of events that occur in an interval of length T (in time units)
P(X = k) = e^(–λT) . (λT)^k / k!
λ: mean number of events per unit of time
E(X) = V(X) = λT
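The Poisson probability can be computed directly from the formula. In this sketch `lam` stands for λ and `t` for the interval length T; both parameter names are assumptions made for the example.

```python
import math

def poisson_pmf(k, lam, t=1.0):
    """P(X = k) = e^(-lam*t) * (lam*t)^k / k! for an interval of length t."""
    mu = lam * t                      # expected number of events, also the variance
    return math.exp(-mu) * mu ** k / math.factorial(k)

# Example: on average 3 events per hour, observed for 2 hours.
print(poisson_pmf(0, 3, t=2))         # probability of no events at all
print(sum(k * poisson_pmf(k, 3, t=2) for k in range(100)))   # close to lam * t = 6
```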
III. Covariance of a Probability Distribution
σxy = ∑(x - µx)(y - µy). P(x,y)
σxy > 0 => same direction
σxy < 0 => opposite direction
IV. Portfolio Expected Return
E(P) = w.E(X) + (1 – w).E(Y), where w = proportion of the portfolio value in asset X
V. Portfolio Risk
σp = √( w^2.σX^2 + (1 – w)^2.σY^2 + 2w(1 – w).σXY )
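The covariance, portfolio expected return, and portfolio risk formulas can be combined in one small Python sketch. The joint distribution below (two equally likely scenarios with made-up returns) and all function names are purely illustrative.

```python
import math

def joint_moments(outcomes):
    """outcomes: list of (x, y, probability). Returns E(X), E(Y), V(X), V(Y), cov."""
    ex = sum(x * p for x, _, p in outcomes)
    ey = sum(y * p for _, y, p in outcomes)
    var_x = sum((x - ex) ** 2 * p for x, _, p in outcomes)
    var_y = sum((y - ey) ** 2 * p for _, y, p in outcomes)
    cov = sum((x - ex) * (y - ey) * p for x, y, p in outcomes)
    return ex, ey, var_x, var_y, cov

def portfolio(w, outcomes):
    """Expected return and risk when a share w is invested in X and 1 - w in Y."""
    ex, ey, var_x, var_y, cov = joint_moments(outcomes)
    expected = w * ex + (1 - w) * ey
    risk = math.sqrt(w ** 2 * var_x + (1 - w) ** 2 * var_y + 2 * w * (1 - w) * cov)
    return expected, risk

scenarios = [(-100, 200, 0.5), (300, -100, 0.5)]   # hypothetical returns of X and Y
print(joint_moments(scenarios))    # (100.0, 50.0, 40000.0, 22500.0, -30000.0)
print(portfolio(0.5, scenarios))   # (75.0, 25.0)
```

Because the covariance is negative in this example, the portfolio risk (25) is far below the standard deviation of either asset alone (200 and 150).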
CHAP 6: CONTINUOUS RANDOM VARIABLES
I. Uniform Distribution
X ~ U[a, b]
P(a' ≤ X ≤ b') = (b' – a') . 1 / (b – a), for a ≤ a' ≤ b' ≤ b
E(X) = (a + b) / 2 = Median
V(X) = (b – a)^2 / 12
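A brief Python sketch of the uniform distribution formulas. The function names and the clipping of the interval to [a, b] are assumptions added for the example.

```python
def uniform_prob(a, b, lo, hi):
    """P(lo <= X <= hi) for X ~ U[a, b]: overlap length times 1 / (b - a)."""
    lo, hi = max(lo, a), min(hi, b)          # only the part inside [a, b] counts
    return max(hi - lo, 0) / (b - a)

def uniform_mean(a, b):
    return (a + b) / 2

def uniform_variance(a, b):
    return (b - a) ** 2 / 12

print(uniform_prob(0, 10, 2, 5))     # 3 / 10
print(uniform_mean(0, 10), uniform_variance(0, 10))
```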
II. Normal distribution
X ~ N(µ, σ^2)
P(X ≤ a) = P(Z ≤ (a – µ) / σ), where Z ~ N(0, 1)
a > µ => P(X ≤ a) > 0.5
a < µ => P(X ≤ a) < 0.5
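The standardization step can be checked with `statistics.NormalDist` from the Python standard library (Python 3.8 or later). The function name and the example numbers (mean 100, standard deviation 15) are hypothetical.

```python
from statistics import NormalDist

def normal_cdf(a, mu, sigma):
    """P(X <= a) for X ~ N(mu, sigma^2), computed via Z = (a - mu) / sigma."""
    z = (a - mu) / sigma
    return NormalDist().cdf(z)        # standard normal N(0, 1)

print(normal_cdf(115, 100, 15))       # a > mu, so the result is above 0.5
print(normal_cdf(85, 100, 15))        # a < mu, so the result is below 0.5
print(NormalDist(100, 15).cdf(115))   # same probability without standardizing by hand
```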
CHAP 7: SAMPLING DISTRIBUTION
I. Sampling distribution of Mean
If the population is normal, or n ≥ 30:
X̄ ~ N(µ, σ^2 / n) (approximately, in the n ≥ 30 case)
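As a rough illustration, the sketch below uses the standard result that the sample mean has mean µ and standard error σ / √n, and is normal (or approximately normal for n ≥ 30). The function name and the numbers are assumptions for the example.

```python
import math
from statistics import NormalDist

def sample_mean_cdf(a, mu, sigma, n):
    """P(sample mean <= a) when the sample mean is N(mu, sigma^2 / n)."""
    standard_error = sigma / math.sqrt(n)
    return NormalDist(mu, standard_error).cdf(a)

# Population mean 100, population standard deviation 15, sample size 36:
# the standard error is 15 / 6 = 2.5, so 102.5 is one standard error above the mean.
print(sample_mean_cdf(102.5, 100, 15, 36))
```

The larger the sample size, the smaller the standard error, so sample means cluster more tightly around µ than individual observations do.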
CHAP 9: TEST OF HYPOTHESIS FOR A SINGLE SAMPLE
(1) H0: µ = µ0
H1: µ ≠ µ0 (two-tailed)
(2) H0: µ ≥ µ0
H1: µ < µ0 (left-tailed)
(3) H0: µ ≤ µ0
H1: µ > µ0 (right-tailed)
Remark:
α = P(Type I error) = P(rejecting H0 when H0 is true)
β = P(Type II error) = P(failing to reject H0 when H0 is false)
For a fixed sample size, as α increases, β decreases
p-value < α => Reject H0
p-value ≥ α => Fail to reject H0
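The p-value decision rule can be sketched for the three pairs of hypotheses above. The notes do not give a test statistic, so this example assumes a Z test for the mean with a known population standard deviation σ; the function names, the `tail` argument, and the numbers are hypothetical.

```python
import math
from statistics import NormalDist

def z_test_p_value(sample_mean, mu0, sigma, n, tail="two"):
    """Z test for a mean with known sigma (assumed). tail: 'two', 'left' or 'right'."""
    z = (sample_mean - mu0) / (sigma / math.sqrt(n))
    std = NormalDist()
    if tail == "left":            # H1: mu < mu0
        p = std.cdf(z)
    elif tail == "right":         # H1: mu > mu0
        p = 1 - std.cdf(z)
    else:                         # H1: mu != mu0
        p = 2 * (1 - std.cdf(abs(z)))
    return z, p

def decide(p_value, alpha=0.05):
    return "reject H0" if p_value < alpha else "fail to reject H0"

z, p = z_test_p_value(52, 50, 6, 36)       # Z = 2.0
print(z, p, decide(p))
```

With the same data, a left-tailed test gives a large p-value because the sample mean lies above µ0, so H0: µ ≥ µ0 would not be rejected.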
CHAP 11: ANOVA
I. One Way ANOVA
II. Two Way ANOVA
CHAP 13: SIMPLE REGRESSION
Prediction Line (Best fit line): ŷ = b0 + b1.X
b0 is the estimated mean value of Y when X = 0
b1 estimates the change in the mean value of Y as a result of a one-unit increase in X.
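|Coefficient of Correlation r| = Multiple R (r has the same sign as b1)
R Square (coefficient of determination): r^2 = SSR / SST = 1 – SSE / SST
Standard Error of Estimate: S_YX = √( SSE / (n – 2) )
SSE = error sum of squares, SSR = regression sum of squares, SST = total sum of squares
n = sample size
Residual: difference between the observed and predicted values of Y, e = Y – ŷ.
T Test for slope: t = (b1 – β1) / S_b1, with n – 2 degrees of freedom
F Test for slope: F = MSR / MSE
T test for correlation coefficient: t = (r – ρ) / √( (1 – r^2) / (n – 2) ), with n – 2 degrees of freedom
r = correlation coefficient (Multiple R with the sign of b1), ρ = 0 under H0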
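A compact Python sketch of simple linear regression. It uses the standard least-squares formulas, which the notes do not spell out, so treat the formulas and all names here as assumptions: b1 = SSXY / SSX, b0 = Ȳ - b1.X̄, r^2 = 1 - SSE / SST, S_YX = √(SSE / (n - 2)), and t = b1 / S_b1 for H0: β1 = 0. A perfect fit (SSE = 0) is not handled.

```python
import math

def simple_regression(xs, ys):
    """Least-squares fit of y = b0 + b1 * x with basic summary statistics."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    ssx = sum((x - mean_x) ** 2 for x in xs)
    ssxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b1 = ssxy / ssx                          # slope
    b0 = mean_y - b1 * mean_x                # intercept
    residuals = [y - (b0 + b1 * x) for x, y in zip(xs, ys)]   # observed - predicted
    sse = sum(e ** 2 for e in residuals)     # error sum of squares
    sst = sum((y - mean_y) ** 2 for y in ys) # total sum of squares
    s_yx = math.sqrt(sse / (n - 2))          # standard error of estimate
    s_b1 = s_yx / math.sqrt(ssx)             # standard error of the slope
    return {
        "b0": b0,
        "b1": b1,
        "r_square": 1 - sse / sst,
        "s_yx": s_yx,
        "t_slope": b1 / s_b1,                # test of H0: slope = 0, n - 2 df
    }

print(simple_regression([1, 2, 3, 4, 5], [2, 4, 5, 4, 5]))
```

In simple regression the F statistic for the slope equals the square of this t statistic, which is why the two tests always agree.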