
Sườn Bài Kiểm Tra Tự Luận Ba

The document outlines various statistical analysis methods including descriptive statistics, correlation, t-tests, ANOVA, and regression analysis. It provides step-by-step methodologies for conducting these tests, interpreting results, and understanding concepts like standard deviation, mean, and significance levels. Additionally, it discusses the implications of findings and the importance of statistical significance in comparing groups and variables.
EXAM OUTLINE

1. Analyze -> Descriptive Statistics -> Frequencies -> choose the variables by name -> run mean, median, mode

2. Pivot table: Custom Tables -> assign variables to columns and rows -> run the pivot table

3. A small standard deviation means your product is quite standardized.

4. CV = std/mean -> provides a relative measure of risk to return; the higher it is, the riskier it gets. -> 1/CV is the reverse (return relative to risk).

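5. Cp = (upper specification - lower specification) / (6 * std) -> how well a manufacturing process can achieve specifications.

If Cp < 1, the process variation is wider than the specification range. What should the company do?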

6. A cause-and-effect relationship implies correlation, but the reverse cannot be inferred: correlation does not imply causation.

7. Analyze -> Correlate -> Bivariate; a reference table is used to judge whether the correlation is strong or weak.
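
The short notes above (mean/median/mode, CV, and the bivariate correlation) can be sketched outside SPSS as follows. This is only an illustrative sketch: the data lists and the function names are invented for the example and do not come from the exam data set.

```python
import math
import statistics

# Hypothetical sample data, invented purely for illustration.
returns = [4.0, 6.0, 8.0, 10.0, 12.0]
risk_scores = [1.0, 2.0, 2.5, 4.0, 5.5]
ratings = [3, 4, 4, 5, 2]

def coefficient_of_variation(values):
    """CV = sample standard deviation / mean (item 4)."""
    return statistics.stdev(values) / statistics.mean(values)

def pearson_r(xs, ys):
    """Pearson correlation coefficient, as reported by a bivariate correlation (item 7)."""
    mean_x, mean_y = statistics.mean(xs), statistics.mean(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

print("mean  :", statistics.mean(ratings))
print("median:", statistics.median(ratings))
print("mode  :", statistics.mode(ratings))
print("CV    :", round(coefficient_of_variation(returns), 4))
print("r     :", round(pearson_r(returns, risk_scores), 4))
```

A CV computed this way is unit-free, which is why it can be compared across variables measured on different scales.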

8. One-sample t-test: mean vs. a constant C - 4 STEPS

- compares a mean with a given number
• The data are randomly sampled.
• The variable is continuous.
step 1
- H0: there is no significant difference between the average household income and 60k USD
- H1: there is a significant difference between the average household income and 60k USD
step 2
- methodology: one-sample t-test
step 3: descriptive statistics
- the average household income of 6400 respondents is 69.4748 and the standard deviation is 78.71856
step 4: result
- t = 9.629
- p < 0.001
-> the average household income is bigger than 60k; the
difference is 9.47484, and it is significant at the 0.001 level
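
As a check on step 4, the t statistic can be recomputed from the descriptive statistics in step 3. The sketch below assumes the usual one-sample formula t = (mean - C) / (s / sqrt(n)); the function name is made up for illustration, and the p-value is not computed here (it would come from the t distribution with n - 1 degrees of freedom).

```python
import math

def one_sample_t(sample_mean, sample_sd, n, test_value):
    """Return the t statistic and degrees of freedom for a one-sample t-test."""
    standard_error = sample_sd / math.sqrt(n)
    t = (sample_mean - test_value) / standard_error
    return t, n - 1

# Figures taken from step 3 of the notes (income in thousands of USD).
t, df = one_sample_t(sample_mean=69.4748, sample_sd=78.71856, n=6400, test_value=60)
print(f"t = {t:.3f}, df = {df}")  # prints: t = 9.629, df = 6399
```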

Two-sample test (two-tailed)

step 1
H0: mean of Male = mean of Female
H1: mean of Male ≠ mean of Female -> bigger or smaller
step 2: independent-samples t-test
step 3: descriptive statistics: N, mean, s for each group
step 4
9. Independent-samples t-test: xA and xB - 5 STEPS
(HOMOGENEITY OF VARIANCE)
- Measurement scale:
- Used to compare the means of two groups (two independent groups within one population).
Examples:
The difference between males and females in an exam score
The difference in life-satisfaction scores between married and unmarried people
Test variable: continuous; Grouping variable: discrete

step 1:
H0: there is no difference between the average household income of
wireless non-users and wireless users
H1: there is a significant difference between the average household
income of wireless non-users and wireless users
step 2: methodology: independent-samples t-test
step 3: descriptive statistics
- the average household income of 3853 non-wireless users is 62.97355 and the standard deviation is 67.03762
- the average household income of 2547 wireless users is 79.3098
and the standard deviation is 92.81293
step 4: test of homogeneity of variance (EQUALITY OF VARIANCES)

Levene's test:
- F = 73.179
- p < 0.001 -> equal variances not assumed (if p > 0.05, equal variances are assumed)
step 5
- t = -7.66
- p < 0.001
-> Rejecting the null hypothesis (H0) means there is a
statistically significant difference between the means of the two groups.
-> the average household income of non-wireless users is
smaller than that of wireless users; the difference is
16.33625, and it is significant at the 0.001 level.
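
Because Levene's test rejects equal variances, the t in step 5 corresponds to the "equal variances not assumed" (Welch) row. A minimal sketch of that calculation from summary statistics follows. The function name is hypothetical, and the non-user mean is derived here from the reported difference of 16.33625 rather than read from the output.

```python
import math

def welch_t(mean1, sd1, n1, mean2, sd2, n2):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    v1, v2 = sd1 ** 2 / n1, sd2 ** 2 / n2
    t = (mean1 - mean2) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df

wireless_mean = 79.3098
non_user_mean = wireless_mean - 16.33625  # implied by the reported mean difference

t, df = welch_t(non_user_mean, 67.03762, 3853, wireless_mean, 92.81293, 2547)
print(f"t = {t:.2f}")  # prints: t = -7.66
```

The negative sign only reflects the order of subtraction (non-users minus users).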

10. Paired-samples t-test: xA and xA': do they move together or not - 5 STEPS (CORRELATION)

Two measurements on the same group -> check whether an estimate is right or wrong, or compare two different times (one group on two different categories); the independent test, by contrast, is two groups on one category.
- Variables: same as for the independent test

• Compare students' scores before and after extra tutoring.
• Evaluate the effectiveness of a drug before and after use on
the same group of patients.

s1: pair 1:
- H0: there is no difference between the average scores of writing
and reading
- H1: there is a difference between the average scores of writing
and reading
pair 2:
- H0: there is no difference between the average scores of math and
science
- H1: there is a difference between the average scores of math and
science
s2: methodology: paired-samples t-test
s3: descriptive statistics
pair 1:
- the average writing score of 32 respondents is 84.75 and the
standard deviation is 9.880
- the average reading score of 32 respondents is 81.22 and the
standard deviation is 10.235
pair 2:
- the average math score of 32 respondents is 64.06 and the
standard deviation is 14.856
- the average science score of 32 respondents is 71.59 and the
standard deviation is 12.924

s4:
pair 1:
- r = 0.8 (correlation)
- p < 0.001 (sig)
- There is a strong positive correlation between the
Writing and Reading scores
pair 2:
- r = 0.935
- p < 0.001
- There is a strong positive correlation between the
Math and Science scores

s5:
pair 1:
- t = 3.531
- p < 0.05
- the average score of Writing is bigger than that of
Reading; the difference is 3.531, and it is significant at the 0.05 level
pair 2:
- t = -7.531
- p < 0.001
- the average score of Science is bigger than that of
Math; the difference is 7.531, and it is significant at the 0.001 level
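
A paired-samples test works on the per-person differences: it is a one-sample t-test of those differences against 0. The sketch below illustrates this on a small made-up data set (the scores and the function name are hypothetical, not the 32 respondents from the notes).

```python
import math
import statistics

def paired_t(first, second):
    """Return (mean difference, t statistic, degrees of freedom) for paired data."""
    diffs = [a - b for a, b in zip(first, second)]
    mean_diff = statistics.mean(diffs)
    standard_error = statistics.stdev(diffs) / math.sqrt(len(diffs))
    return mean_diff, mean_diff / standard_error, len(diffs) - 1

# Hypothetical scores for five students, before and after extra tutoring.
before = [60, 65, 70, 75, 80]
after = [62, 68, 71, 79, 85]

mean_diff, t, df = paired_t(after, before)
print(f"mean difference = {mean_diff}, t = {t:.3f}, df = {df}")
```

Note that the mean difference and the t statistic are different quantities: t is the mean difference divided by its standard error.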

11. ANOVA: signal phrase "for different types of ...". It is like several t-tests
combined, because it compares several means.
Nominal (a single factor)
Variables: nominal (factor), ratio (dependent) (ANOVA)

step 1
H0: there is no significant difference between the mean responses for
Value for the Dollar for different types of cell phones
H1: there is at least one significant difference between the mean
responses for Value for the Dollar for different types of cell phones

step 2: methodology
ONE-WAY ANOVA

step 3: descriptive statistics

- the average response for Value for the Dollar for the Smart type (n = 21) is the
highest at 3.81, and the standard deviation is 1.03
- the average response for Value for the Dollar for the Camera type (n = 19) is
the lowest at 3.16, and the standard deviation is 0.958

step 4: homogeneity of variance

- F of Levene's test = 3.171
- p of Levene's test = 0.051 > 0.05 -> equal variances assumed ->
go directly to the ANOVA test (based on mean)
- If instead p < 0.05 (for example p < 0.001) -> equal variances not assumed -> the
ANOVA could be wrong, so we need an additional test -> Welch's
test

step 5: hypothesis test

- F of the ANOVA test = 3.111
- p = 0.053
- H0 is supported: there is no significant difference between the
mean responses for Value for the Dollar for different types of cell
phones

step 6: post-hoc tests

- the average response for Value for the Dollar of the Smart type is bigger
than that of the Camera type; the mean difference is 0.652, and it is significant
at the 0.05 level.
- there is no significant difference between the average response
for Value for the Dollar of the Smart type and the Basic type.
- there is no significant difference between the average response
for Value for the Dollar of the Camera type and the Basic type.

Combined conclusion, no need to compare each pair one by one:

-> Example conclusion: "Those who do not finish high school have
lower incomes compared to all other groups. However, when
compared to high school graduates, the difference is not
statistically significant."

-> Overall conclusion: for two adjacent groups the difference is usually
not statistically significant, while for groups that are far
apart the difference is statistically significant. In general, though,
education makes a difference.

Case where equal variances are not assumed in step 4


practice:

s1:
H0: mean NH = mean H = mean SC = mean CL = mean PG
H1: there is at least one significant difference among NH, H, SC, CL,
and PG
s2: methodology: one-way ANOVA
s3: descriptive statistics:

The PG group has the highest average income, further emphasizing the
economic benefits of advanced education.
The NH group has the lowest average income, indicating a significant
economic disadvantage.
As the level of education increases, so does the average household
income.
While the average household income increases with higher education,
the standard deviation also grows. This suggests that income is more
spread out among individuals with higher education levels.

s4: homogeneity of variance:

• F of Levene's test = 14.766 (based on mean)
• p < 0.001 -> equal variances not assumed -> the ANOVA could
be wrong, so we need an additional test -> Welch's test

s5: hypothesis test

F of ANOVA: 15.309
p < 0.001

Welch test of equality of means: 14.622
p < 0.001
-> H0 is not supported.
s6: post-hoc test (not part of the ANOVA itself)
• 3 or 4 sentences are enough
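
The F value in the hypothesis-test step is the ratio of between-group to within-group variance. The sketch below shows that calculation on a tiny made-up data set (three hypothetical phone types rated 1 to 5). It covers the classic one-way ANOVA only, not Levene's or Welch's test, and the function name is invented for the example.

```python
import statistics

def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a classic one-way ANOVA."""
    all_values = [x for group in groups for x in group]
    grand_mean = statistics.mean(all_values)
    ss_between = sum(len(g) * (statistics.mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum(sum((x - statistics.mean(g)) ** 2 for x in g) for g in groups)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within

# Hypothetical ratings, invented for illustration only.
basic = [2.0, 3.0, 4.0]
camera = [1.0, 2.0, 3.0]
smart = [3.0, 4.0, 5.0]

f, df_between, df_within = one_way_anova_f([basic, camera, smart])
print(f"F({df_between}, {df_within}) = {f:.3f}")
```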

12. Two-way ANOVA: 2 factors

To extend the analysis to differences in one quantitative variable
based on two qualitative variables at the same time, one-way ANOVA
is no longer suitable; we need two-factor analysis of variance
(two-way ANOVA).

13. TRENDLINE AND REGRESSION ANALYSIS - PREDICTION
- the dependent variable should be a ratio (scale, continuous) variable;
it cannot be a multi-category nominal variable
- ratio -> ordinal data (ranked, put in order) -> nominal
(1 and 0, yes or no: binary, dummy variables; 1 and 2; with >= 3 categories ->
convert the nominal variable into dummy variables); otherwise we cannot compare
between the groups -> don't put it into the regression as is
- ordinal (can go into the regression model), ratio (regression); a variable with m
categories needs m-1 dummy variables
- R square: hope it is > 0.4; 0.7 -> 0.9: good; > 0.9: strong; 1: perfect
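
To illustrate the "m categories -> m-1 dummy variables" rule, here is a minimal sketch. The category labels, the choice of baseline, and the function name are all assumptions made for the example; in SPSS the recoding would be done through its own menus.

```python
def dummy_encode(values, baseline):
    """Return one dict of 0/1 dummy variables per value, leaving out the baseline category."""
    levels = sorted(set(values) - {baseline})
    return [{f"is_{level}": int(value == level) for level in levels} for value in values]

# Hypothetical nominal variable with m = 3 categories -> 2 dummy variables.
phone_types = ["Basic", "Camera", "Smart", "Camera"]
rows = dummy_encode(phone_types, baseline="Basic")

for phone, row in zip(phone_types, rows):
    print(phone, row)
```

The baseline category is the one whose dummies are all 0; every coefficient is then read as a difference from that baseline.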

step 0: MV = B0 + B1*HA

step 1:

- R square = 0.131
- adjusted R square = 0.109
-> the trend line fits the data set weakly;
-> variation explained: 10.9% of the variation in market value
can be explained by house age -> 89.1% of the variation in
market value is explained by other factors (no need to
mention)
R square: hope it is > 0.4; 0.7 -> 0.9: good; > 0.9: strong;
1: perfect

step 2: ANOVA
- H0: all the coefficients are equal to 0
- H1: there is at least one coefficient different from 0 -> there will
be a regression line

- F = 6.010
- p < 0.05
-> H1 is supported: there is at least one coefficient different
from 0. There should be a regression line.

step 3: Regression model

assessment of coefficients
- beta of constant = 2.358 (constant) (not needed)
- p < 0.05
-> the coefficient is significant (independent factor)
- beta of house age = 2.458
- p < 0.05
-> the coefficient is significant
3.1. unstandardized coefficients

- MV = B0 + B1*HA
- the function can forecast the market value
- given HA = 10 -> MV = 45217 + 1570.4*10 = 60921
3.2. standardized coefficients

- MV = 0.361*HA
-> house age has a positive impact on market value
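
The forecast in 3.1 can be written as a small function. The sketch below simply plugs the unstandardized coefficients quoted in the notes (B0 = 45217, B1 = 1570.4) into MV = B0 + B1*HA; the function name is invented for illustration.

```python
def predict_market_value(house_age, intercept=45217.0, slope=1570.4):
    """Forecast market value from house age using MV = B0 + B1*HA."""
    return intercept + slope * house_age

print(round(predict_market_value(10)))  # prints: 60921
```

Only the unstandardized coefficients are used for forecasting; the standardized coefficient in 3.2 is for comparing the strength of effects.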

Theory for multiple regression:
VIF: common cut-offs for flagging multicollinearity are >= 2.5, 3, or 10; if VIF < 2
(in practice the cut-off used is 10) there is no multicollinearity.
The smaller the VIF, the less likely multicollinearity is.

1. Choose the model with the largest R^2
2. If R^2 is equal, pick the model with the smaller VIF
3. Standardized coefficients are used to judge which variable has a stronger or weaker effect

ANSWER: If VIF < 2, there is no multicollinearity.
If 2.5 <= VIF < 10, there should be multicollinearity. For
2 < VIF < 2.5, multicollinearity is possible,
but with a very low probability.
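
The VIF rule of thumb above can be sketched as a small classifier. The thresholds are the ones given in the notes; how to treat a VIF of exactly 2, or of 10 and above, is not stated there, so the handling of those cases below is an assumption. The helper `vif_from_r_squared` uses the standard definition VIF = 1 / (1 - R_j^2), where R_j^2 comes from regressing predictor j on the other predictors.

```python
def vif_from_r_squared(r_squared_j):
    """VIF of predictor j, given R^2 from regressing it on the other predictors."""
    return 1.0 / (1.0 - r_squared_j)

def classify_vif(vif):
    """Apply the rule of thumb from the notes (boundary handling is an assumption)."""
    if vif < 2:
        return "no multicollinearity"
    if vif < 2.5:
        return "possible, but unlikely"
    return "multicollinearity likely"

for value in (1.280, 2.173, 3.0):
    print(value, "->", classify_vif(value))
```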

Last assignment
INDEPENDENT        MODEL 1               MODEL 2
Employ             0.484***              0.484***
Wireless           -0.013                -0.012
Employ*Wireless                          0.025*
--------
R^2                0.234                 0.235
F(p)               980.098, p<0.001      655.602, p<0.001

Model Summary (c)
Model   R          R Square   Adjusted R Square   Std. Error of the Estimate   Durbin-Watson
1       .484 (a)   .235       .234                .87503568
2       .485 (b)   .235       .235                .87474216                    2.018
a. Predictors: (Constant), Zscore: Wireless service, Zscore: Years with current employer
b. Predictors: (Constant), Zscore: Wireless service, Zscore: Years with current employer, EMPLOY_WIRELESS
c. Dependent Variable: Zscore: Job satisfaction
Coefficients (a)
Model                                    Unstandardized B   Std. Error   Standardized Beta   t        Sig.
1  (Constant)                            7.765E-016         .011                             .000     1.000
   Zscore: Wireless service              -.013              .011         -.013               -1.154   .248
   Zscore: Years with current employer   .484               .011         .484                44.209   .000
2  (Constant)                            .001               .011                             .065     .948
   Zscore: Wireless service              -.012              .011         -.012               -1.126   .260
   Zscore: Years with current employer   .484               .011         .484                44.267   .000
   Employ_Wireless                       .025               .011         .025                2.301    .021
a. Dependent Variable: Zscore: Job satisfaction

ANOVA (a)
Model            Sum of Squares   df     Mean Square   F         Sig.
1  Regression    1500.897         2      750.449       980.098   .000 (b)
   Residual      4898.103         6397   .766
   Total         6399.000         6399
2  Regression    1504.948         3      501.649       655.602   .000 (c)
   Residual      4894.052         6396   .765
   Total         6399.000         6399
a. Dependent Variable: Zscore: Job satisfaction
b. Predictors: (Constant), Zscore: Years with current employer, Zscore: Wireless service
c. Predictors: (Constant), Zscore: Years with current employer, Zscore: Wireless service, Employ_Wireless
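
The R Square, Adjusted R Square and F values in the output can be recovered from the sums of squares and degrees of freedom in the ANOVA table. The sketch below does this for both models, assuming the standard formulas R^2 = SSR/SST, adjusted R^2 = 1 - (1 - R^2)(n - 1)/df_residual, and F = MSR/MSE. The function name is made up for the example.

```python
def regression_fit_from_anova(ss_regression, ss_residual, df_regression, df_residual):
    """Return (R^2, adjusted R^2, F) from the rows of a regression ANOVA table."""
    ss_total = ss_regression + ss_residual
    r_squared = ss_regression / ss_total
    df_total = df_regression + df_residual  # equals n - 1
    adjusted = 1 - (1 - r_squared) * df_total / df_residual
    f = (ss_regression / df_regression) / (ss_residual / df_residual)
    return r_squared, adjusted, f

# Sums of squares and df copied from the ANOVA table in the notes.
model_1 = regression_fit_from_anova(1500.897, 4898.103, 2, 6397)
model_2 = regression_fit_from_anova(1504.948, 4894.052, 3, 6396)

for name, (r2, adj, f) in (("Model 1", model_1), ("Model 2", model_2)):
    print(f"{name}: R^2 = {r2:.3f}, adjusted R^2 = {adj:.3f}, F = {f:.3f}")
```

This also shows that the 0.234 reported for Model 1 in the summary table is the adjusted R^2, while the unadjusted value rounds to 0.235.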

Model 2: Jobsat = B1*Employ + B2*Wireless + B3*(Employ*Wireless)

Step 1: Compute the z-scores ZJobsat, ZEmploy, ZWIRELESS
Step 2: Compute the interaction term
WIRELESS_Employ = ZWIRELESS*ZEmploy
Step 3: Create 2 models
Model 1: ZJobsat = ZEmploy + ZWIRELESS
Model 2: ZJobsat = ZEmploy + ZWIRELESS + ZEmploy*ZWIRELESS
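
Steps 1 and 2 (standardize, then multiply) can be sketched as below. The data values are invented for illustration, and the sketch assumes z-scores are computed with the sample standard deviation; fitting the two regression models themselves is left to the statistics package.

```python
import statistics

def zscores(values):
    """Standardize values to mean 0 and sample standard deviation 1."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return [(v - mean) / sd for v in values]

# Hypothetical raw data: years with current employer and wireless service (0/1).
employ = [1.0, 3.0, 5.0, 7.0, 9.0]
wireless = [0.0, 1.0, 0.0, 1.0, 1.0]

z_employ = zscores(employ)
z_wireless = zscores(wireless)

# Interaction term: product of the two standardized predictors.
employ_wireless = [e * w for e, w in zip(z_employ, z_wireless)]
print([round(x, 3) for x in employ_wireless])
```

Standardizing before multiplying keeps the interaction term from being dominated by the scale of either predictor.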

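- There is a positive interaction effect of Employ and Wireless on Job Satisfaction (beta = 0.025, p = .021).
- Wireless has a negative but not statistically significant effect on Job Satisfaction (beta = -0.012, p = .260).
- There is a positive effect of Employ on Job Satisfaction (beta = 0.484, p < .001).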

DEPENDENT: CALORIES
INDEPENDENT   MODEL 1              MODEL 2              MODEL 3
SODIUM        0.104                                     0.345**
FIBER         0.088                                     -0.329**
CARBS         0.706***             0.690***
SUGARS        0.926***             0.917***
R^2           0.726                0.717                0.228
F(p)          44.652 (p<0.001)     84.702 (p<0.001)     10.721 (p<0.001)
VIF RANGE     1.174 -> 2.173       1.280 -> 1.280       1.011 -> 1.011

Model 1 fits better than Model 2 (R^2 = 0.726, F = 44.652, p < 0.001).
There is NO multicollinearity between any independent variables (VIF < 2.5).
Sodium has no significant impact on calories at the 0.01 significance level.
Fiber has no significant impact on calories at the 0.01 significance level.
Carbs have a strong positive impact on calories at the 0.001 significance level.
Sugars have a strong positive impact on calories at the 0.001 significance level.
ADDITIONAL EXERCISE - REGRESSION

1. Regression equation

Healthcare rating = 1.040 + 0.005 AGE + 0.041 EDUCATION + 0.018 GENDER
+ 0.041 DOCTOR TRUST + 0.278 URGENT HEALTHCARE + ε
2.
At the 5% significance level
We have H0: Bi = 0 (Gender is not a significant variable)
H1: Bi ≠ 0 (Gender is a significant variable)
p = 0.015 < 0.05 => reject H0
=> Gender has a statistically significant positive impact on Rating of general health services.
Interpretation (unsure): at the 95% confidence level, women have a 0.018-unit higher Rating of general health
services than men (on a scale of 1 to 4).
3.
Age: p = 0.000 < 0.05
Interpretation: At the 95% confidence level, for every 1-unit increase in age, the average health service
rating increases by 0.005 units. This shows that older people tend to rate health services higher.
Doctor Trust: p = 0.001 < 0.05
Interpretation: At the 95% confidence level, for every 1-unit increase on the trust-in-doctors scale (1–4), the
health service evaluation score increases by 0.041 units. This shows that the higher the level of trust
in doctors, the more positive the health service evaluation.
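
The "for every 1-unit increase" readings above follow directly from the regression equation. The sketch below encodes the coefficients from the equation in part 1 and shows that raising one predictor by one unit, with the others held fixed, changes the predicted rating by exactly that predictor's coefficient. The dictionary keys and the function name are invented for illustration, and the respondent values are hypothetical.

```python
INTERCEPT = 1.040
COEFFICIENTS = {
    "AGE": 0.005,
    "EDUCATION": 0.041,
    "GENDER": 0.018,
    "DOCTOR_TRUST": 0.041,
    "URGENT_HEALTHCARE": 0.278,
}

def predict_rating(respondent):
    """Predicted healthcare rating; predictors missing from the dict count as 0."""
    return INTERCEPT + sum(
        coefficient * respondent.get(name, 0)
        for name, coefficient in COEFFICIENTS.items()
    )

# Hypothetical respondent; only DOCTOR_TRUST differs between the two predictions.
base = {"AGE": 40, "EDUCATION": 2, "GENDER": 1, "DOCTOR_TRUST": 2, "URGENT_HEALTHCARE": 3}
more_trust = dict(base, DOCTOR_TRUST=3)

print(round(predict_rating(more_trust) - predict_rating(base), 3))  # prints: 0.041
```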
OTHER QUESTIONS

1. adj R square: ….. of the variation in home market can be explained by ……, ……. % is affected by other factors
R-square=0.385: 38.5% of the variation in the dependent variable Purchase Intention is
explained by the independent variables in the model CH3, PR1, TR2, PV1. This shows
that the model has a medium level of explanation, meaning that it has captured a part of
the factors affecting purchase intention, but there is still 61.5% of the variation that is not
explained, which may be due to factors outside the model or random errors.
Adj R-square = 0.377: 37.7% of the variation in Purchase Intention is reliably explained
by the independent variables, while not inflating the model fit. This value reflects the
model fit more accurately as it adjusts for the number of predictors and decreases if
additional variables do not contribute to the model's significance.
+ Although the Adj R-square is lower than 0.5, this does not mean that the model is not
meaningful. It indicates that the current model has a moderate level of fit but still needs
further improvement, for example by adding more important factors to better explain
purchase intention.

2.
Sig (PV1) = 0.165
Sig (PR1) = 0.264
Sig (TR2) = 0.000
Sig (CH3) = 0.000
- Variables with impact:
+ Trust in the store (TR2): Since p = 0.000 < 0.05, we reject the hypothesis H0 (the regression
coefficient of TR2 is 0); that is, the regression coefficient of TR2 is different
from 0. This shows that Trust in the store has an impact on Purchase
Intention. The high standardized Beta coefficient (0.385) shows a strong, positive
impact: when trust in the store increases, purchase intention
also increases.

+ Product quality (CH3): Since p = 0.000 < 0.05, we reject the null hypothesis H0
(the regression coefficient of CH3 is 0), which means that the regression coefficient of
CH3 is different from 0. This shows that Product quality has an
impact on Purchase Intention. The standardized Beta coefficient (0.390) shows a strong,
positive influence: when product quality increases,
purchase intention increases accordingly.
- Variables with no impact:
+ Perceived benefits of purchasing (PV1): Since p = 0.165 > 0.05, we fail to reject the hypothesis
H0 (the regression coefficient of PV1 is 0). This shows that this variable has no statistically
significant impact on Purchase Intention.
+ Risks of purchasing (PR1): Since p = 0.264 > 0.05, we fail to reject the hypothesis H0
(the regression coefficient of PR1 is 0). This shows that this variable has no statistically
significant impact on Purchase Intention.
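
The sorting of variables into "impact" and "no impact" is a plain comparison of each Sig value with the 0.05 level, followed by ranking the significant ones by standardized Beta. A minimal sketch using the values quoted above (the variable names in the code are invented for the example):

```python
ALPHA = 0.05

# Sig values and standardized Betas as quoted in the notes.
p_values = {"PV1": 0.165, "PR1": 0.264, "TR2": 0.000, "CH3": 0.000}
standardized_beta = {"TR2": 0.385, "CH3": 0.390}

significant = [name for name, p in p_values.items() if p < ALPHA]
not_significant = [name for name in p_values if name not in significant]

# Among significant predictors, a larger |Beta| means a stronger effect.
ranked = sorted(significant, key=lambda name: abs(standardized_beta[name]), reverse=True)

print("significant     :", significant)
print("not significant :", not_significant)
print("ranked by |Beta|:", ranked)
```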
3. Proposed solutions:
a) Improve Trust in the store:
- Provide clear return and refund policies to reduce customer concerns.
- Transparently display customer reviews and ratings.
- Promote reputable certifications to increase trust.
b) Improve Product quality:
- Implement product quality warranties and commitments.
- Collaborate with reputable brands and suppliers to improve quality perceptions.
c) Pay attention to non-significant variables
Although Perceived benefits (PV1) and Risks of purchasing (PR1) are not statistically significant,
improving them can still enhance the shopping experience:
- Increase Perceived benefits: Provide competitive prices, loyalty programs or exclusive offers.
- Reduce Risks of purchasing: Ensure transparency about delivery times and provide real-time order
tracking.
