TYPE 1. COMMENTING ON PERCENTAGES (FREQUENCY TABLE)
(Analyze > Descriptive Statistics > Frequencies > select the variable to analyze > drag it into the Variable(s) box)
Comments:
- Frequency: The Age variable has a total of 18 observations. The value 18 appears most often (6 times) and the value 22 appears least often (2 times), so the highest and lowest frequencies differ by 4 (6 - 2).
- Percent: The ages of the survey participants are unevenly distributed, so the percentages differ considerably, specifically: Age 18: 6/18 = 33.3%; Age 19: 4/18 = 22.2%; Age 20: 3/18 = 16.7%; Age 21: 3/18 = 16.7%; Age 22: 2/18 = 11.1%.
- Cumulative Percent: Shows the cumulative percentage of valid values. It is calculated by adding each category's valid percent to the running total, in the order the categories are listed, until the total reaches 100%.
In the Age variable table, the cumulative percentage accumulates across ages, specifically:
Age 18: 33.3% (the first cumulative percentage always equals the first valid percentage).
Age 19: (6 + 4)/18 = 55.6%; Age 20: (10 + 3)/18 = 72.2%; Age 21: (13 + 3)/18 = 88.9%; Age 22: (16 + 2)/18 = 100%.
(Adding the rounded percentages instead, e.g. 33.3% + 22.2% = 55.5%, can differ from the table by 0.1% because of rounding.)
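The Frequency, Percent, and Cumulative Percent columns can be reproduced by hand. Below is a minimal Python sketch, assuming the 18 Age values are rebuilt from the counts in the table (6, 4, 3, 3, 2) and that there are no missing values, so Percent equals Valid Percent. The function name `frequency_table` is hypothetical.

```python
from collections import Counter

# Age values rebuilt from the counts in the frequency table (18 observations)
ages = [18] * 6 + [19] * 4 + [20] * 3 + [21] * 3 + [22] * 2


def frequency_table(values):
    """Return (value, frequency, percent, cumulative percent) rows, sorted by value."""
    counts = Counter(values)
    n = len(values)
    rows = []
    running_count = 0
    for value in sorted(counts):
        running_count += counts[value]
        percent = 100 * counts[value] / n
        # Cumulative percent from the running count, not from rounded percents
        cumulative_percent = 100 * running_count / n
        rows.append((value, counts[value], round(percent, 1), round(cumulative_percent, 1)))
    return rows


for row in frequency_table(ages):
    print(row)
```

Accumulating counts rather than rounded percentages is what keeps the Age 19 row at 55.6% instead of 55.5%.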
(Same procedure: Analyze > Descriptive Statistics > Frequencies > select the variable > drag it into the Variable(s) box)
Which level of education accounts for the highest percentage? Which level of education accounts for the lowest percentage?
In the research sample, 66% of survey participants have a Bachelor's degree, 23% have a high school education, 7% have a Master's degree, and 4% have a PhD.
Thus, in the research sample, the most common educational level is university (Bachelor's) and the least common is PhD.
(Note: the small Statistics table is missing here.) Analyzing the research sample, which ethnic group has the highest proportion? Which ethnic group has the lowest proportion?
From the frequency table of ethnicity, the groups with the highest and lowest percentages can be determined as follows:
+ The ethnic group with the highest proportion in the research sample is “WHITE” (55.6%).
+ The ethnic group with the lowest proportion in the research sample is “BLACK” (2.5%).
In conclusion, the “WHITE” ethnic group has a much higher proportion than the “BLACK” ethnic group.
TYPE 2. COMMENTING ON CHARTS
(Graphs > Chart Builder > choose a chart type and drag it onto the canvas > drag the variable into Slice by > in the panel on the right, change Count to Percentage > when the chart appears, double-click it and click the bar-chart icon in the top-right corner to show the percentage labels)
2.1. General approach for commenting on Bar, Line, Area, Pie, and Histogram charts.
- Give an overall comment on the differences or changes:
The difference between the average GPA of Native students and Asian students is about 0.41 points.
In the Age frequency table, the age values are unevenly distributed, and the frequency generally decreases as age increases.
- State the value of each component in the chart and its percentage or unit:
The chart shows that the average GPA of Native students is 2.97, that of Asian students is 2.56, …
- Compare the components with one another: which one has the highest value, which the lowest, and which fall in between, …
Among the 5 ethnic groups, the highest average GPA belongs to Native students (2.97) and the lowest to Asian students (2.56). Next are White and Hispanic students, with average GPAs of 2.89 and 2.87 respectively, followed by Black students with an average GPA of 2.70. The difference between the highest average GPA (Native) and the lowest (Asian) is 0.41 points.
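The compare-and-rank step above (highest, lowest, in between, and the gap) can be sketched in a few lines of Python. The group means are the ones quoted in the text; the variable names are illustrative only.

```python
# Mean GPA by ethnic group, as read from the chart described in the text
mean_gpa = {"Native": 2.97, "White": 2.89, "Hispanic": 2.87, "Black": 2.70, "Asian": 2.56}

# Rank the groups from highest to lowest mean GPA
ranking = sorted(mean_gpa.items(), key=lambda item: item[1], reverse=True)
highest_group, highest_gpa = ranking[0]
lowest_group, lowest_gpa = ranking[-1]
gap = round(highest_gpa - lowest_gpa, 2)  # 2.97 - 2.56

for group, gpa in ranking:
    print(f"{group}: {gpa:.2f}")
print(f"Gap between {highest_group} and {lowest_group}: {gap:.2f} points")
```

Rounding the gap to two decimals hides floating-point noise in the subtraction.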
2.2. Line chart:
Comments: As shown in the chart, the average GPA differs across ethnic groups. Of the 5 ethnic groups, the Native group has the highest mean GPA at 2.97, whereas the Asian group has the lowest at 2.56. The difference between the average GPA of Native and Asian students is 0.41 points.
2.3. Pie charts:
Comments: From the pie chart above, we can observe that 62 people (62%) are female and the remaining 38 people (38%) are male. The number of female participants is about 1.6 times the number of male participants (62/38 ≈ 1.63).
2.4. Scatter charts:
Scatter charts show the relationship between two variables.
Comments:
+ The scatter chart shows the graduation rate (Y variable) versus the median SAT score (X variable) for the Colleges and Universities data.
+ In the linear equation shown on the chart, the slope (the coefficient of x) is 0.03. This positive coefficient indicates a positive relationship between median SAT score and graduation rate: as the median SAT score increases, the graduation rate tends to increase.
+ The R-squared value of approximately 1.9% indicates that only a small proportion (about 1.9%) of the variation in graduation rates can be explained by variation in median SAT scores. This suggests a weak linear relationship between the two variables (r ≈ √0.019 ≈ 0.14).
+ The remaining 98.1% of the variability in graduation rates is likely influenced by other factors not considered in this analysis.
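The slope and R-squared printed on a scatter chart come from an ordinary least-squares line. Below is a minimal Python sketch of that computation. The toy data are hypothetical because the Colleges and Universities data are not reproduced in the text, and `simple_linear_fit` is an illustrative name, not an SPSS function.

```python
import math


def simple_linear_fit(xs, ys):
    """Least-squares line y = intercept + slope * x, plus its R-squared."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)                        # spread of X
    syy = sum((y - mean_y) ** 2 for y in ys)                        # spread of Y
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))  # co-movement
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = sxy ** 2 / (sxx * syy)  # share of Y's variation explained by X
    return slope, intercept, r_squared


# Hypothetical toy data, only to exercise the function
slope, intercept, r2 = simple_linear_fit([1, 2, 3, 4, 5], [2, 1, 4, 3, 5])

# With a single predictor, |r| = sqrt(R^2); for the R^2 of 0.019 in the text:
r_from_text = math.sqrt(0.019)
```

With one predictor, R-squared equals the squared Pearson correlation. That is why an R-squared of 1.9% corresponds to a correlation of only about 0.14.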
TYPE 3. T-TEST
3.1. Independent-Samples T-Test
Independent Samples t-test: Analyze → Compare Means → Independent-samples T-test
- When to use:
+ The independent-samples t-test is used to compare the means of two different (independent) groups.
+ Examples: the difference between males and females in exam scores; the difference in life-satisfaction scores between married and unmarried people.
- How to comment:
+ Write out the steps performed, then interpret the results.
- Example:
Compare the mean income of the two gender groups. Is there a statistically significant difference in Income between the two gender groups?
Comments: In Levene’s Test for Equality of Variances, Sig. = 0.809 > 0.05, so the assumption of equal variances is not rejected and we use the results in the "Equal variances assumed" row. In the t-test for Equality of Means, Sig. (2-tailed) = 0.777 > 0.05, so we conclude that there is no statistically significant difference between the means of the two populations. In other words, there is no evidence of a difference in mean Income between males and females.
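SPSS prints two rows in the Independent Samples Test table. They differ only in how the standard error and degrees of freedom are computed. Below is a minimal Python sketch of both t statistics, assuming raw scores for both groups are available (the toy scores are hypothetical) and that Levene's Sig. is read from the SPSS output rather than computed here. `choose_row` only mirrors the reading rule described above. p-values are left out because they need the t distribution (for example from SciPy), which is beyond this stdlib sketch.

```python
import math
import statistics


def independent_t(group1, group2):
    """Return (pooled_t, welch_t, welch_df) for two independent samples."""
    n1, n2 = len(group1), len(group2)
    m1, m2 = statistics.mean(group1), statistics.mean(group2)
    v1, v2 = statistics.variance(group1), statistics.variance(group2)  # sample variances
    # "Equal variances assumed" row: pooled variance, df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)
    pooled_t = (m1 - m2) / math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    # "Equal variances not assumed" row: Welch's t with Welch-Satterthwaite df
    a, b = v1 / n1, v2 / n2
    welch_t = (m1 - m2) / math.sqrt(a + b)
    welch_df = (a + b) ** 2 / (a ** 2 / (n1 - 1) + b ** 2 / (n2 - 1))
    return pooled_t, welch_t, welch_df


def choose_row(levene_sig, alpha=0.05):
    """Reading rule from the text: Levene Sig. > alpha -> equal variances assumed."""
    if levene_sig > alpha:
        return "Equal variances assumed"
    return "Equal variances not assumed"


# Hypothetical scores for two groups
pooled_t, welch_t, welch_df = independent_t([1, 2, 3, 4, 5], [3, 4, 5, 6, 7])
row = choose_row(0.809)  # Levene Sig. from the Income example
```

With equal group sizes the two t values coincide. Only the degrees of freedom, and therefore the p-value, can differ.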
3.2. Paired-Samples T-Test
- When to use:
+ The paired-samples t-test is used when the same individuals are measured under both conditions (or at two points in time), so each observation in one set is paired with an observation in the other.
+ Example: students’ scores on the first quiz versus the same students’ scores on the second quiz.
Example:
Table 1: the Mean column gives the mean values. The pretest mean is 17.78 and the posttest mean is 21.28, so the posttest mean is clearly higher.
Table 3: the last column, Sig. (2-tailed), is 0.000 < 0.05 (5%). We therefore conclude that there is a statistically significant difference between the pretest and posttest values. The mean difference is -3.500 (pretest minus posttest, 17.78 - 21.28), taken from the Mean column of Table 3.
Comments: This paired-samples t-test analysis indicates that, for the 50 subjects, there is a significant difference between the mean blood glucose index on the pretest and on the posttest (Sig. = 0.000 < 0.05). In other words, the mean blood glucose index on the posttest (M = 21.28) was significantly greater than the mean on the pretest (M = 17.78).
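The paired test works on the per-subject differences. Its t statistic is the mean difference divided by its standard error. Below is a minimal Python sketch, assuming the differences are taken as first minus second, which matches the sign of the -3.500 above. The five pretest/posttest scores are hypothetical.

```python
import math
import statistics


def paired_t(first, second):
    """Paired-samples t statistic on the differences first - second.

    Returns (mean_difference, t, df).
    """
    diffs = [a - b for a, b in zip(first, second)]
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample SD of the differences
    t_stat = mean_diff / (sd_diff / math.sqrt(n))
    return mean_diff, t_stat, n - 1


# Hypothetical pretest/posttest scores for 5 subjects
pretest = [10, 12, 9, 11, 13]
posttest = [12, 15, 10, 14, 14]
mean_diff, t_stat, df = paired_t(pretest, posttest)
```

A negative mean difference simply means the second measurement (posttest) is higher on average.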
TYPE 4. ONE-WAY ANOVA
One-way ANOVA: Analyze → Compare Means → One-way ANOVA
First consider the "Based on Mean" row of Levene's test (Test of Homogeneity of Variances):
+ If Sig. < 0.05 (e.g., Sig. = 0.002), the variances differ across age groups, so we use the results of the Robust Tests of Equality of Means table.
+ If Sig. > 0.05 (e.g., Sig. = 0.5), there is no evidence that the variances differ across age groups, so we use the results of the ANOVA table.
If the test is significant: e.g., the model has an F-value of 9.963 with Sig. = 0.000 (< 0.05), suggesting that at least two of the three groups differ significantly in mean overall price/performance satisfaction.
If the test is not significant: e.g., Sig. = 0.5 (> 0.05), suggesting that there is no statistically significant difference between the group means.
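The F value in the ANOVA table compares variation between group means with variation within groups. Below is a minimal Python sketch of the classic (equal-variance) F statistic. It does not reproduce the Welch statistic in the Robust Tests table, and the three toy groups are hypothetical.

```python
def one_way_anova_f(groups):
    """Classic one-way ANOVA F statistic (equal-variance version).

    groups: list of lists of numbers, one list per group.
    Returns (F, df_between, df_within).
    """
    all_values = [v for g in groups for v in g]
    n, k = len(all_values), len(groups)
    grand_mean = sum(all_values) / n
    group_means = [sum(g) / len(g) for g in groups]
    # Variation of the group means around the grand mean
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    # Variation of observations around their own group mean
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, group_means) for v in g)
    df_between, df_within = k - 1, n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


f_stat, df_b, df_w = one_way_anova_f([[1, 2, 3], [4, 5, 6], [7, 8, 9]])  # hypothetical data
```

A large F (group means far apart relative to the spread inside each group) is what produces a small Sig. value.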
TYPE 5. CORRELATIONS
Note: If the question asks you to build a multiple linear regression model, first check whether the independent variables in the model are multicollinear.
Correlation (multicollinearity check): Analyze → Correlate → Bivariate
+ Multicollinearity occurs when independent variables are strongly correlated with each other. As a rule of thumb, an absolute correlation coefficient greater than 0.7 between two independent variables indicates multicollinearity.
+ Example (Colleges and Universities correlation matrix): none of the coefficients exceed the recommended threshold of ±0.7, so there is no multicollinearity.
+ Example (Income data): the correlation coefficient between Education and Age is 0.927 > 0.7, so we conclude that there is multicollinearity between the variables Education and Age.
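The ±0.7 screening rule above can be applied to every pair of independent variables automatically. Below is a minimal Python sketch that computes Pearson's r for each pair and flags those above the threshold. The variables X1, X2, and X3 and their values are hypothetical, and `flag_multicollinearity` is an illustrative name.

```python
import math
from itertools import combinations


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equally long lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def flag_multicollinearity(variables, threshold=0.7):
    """Return (name_a, name_b, r) for every pair with |r| > threshold."""
    flagged = []
    for (name_a, a), (name_b, b) in combinations(variables.items(), 2):
        r = pearson_r(a, b)
        if abs(r) > threshold:
            flagged.append((name_a, name_b, round(r, 3)))
    return flagged


# Hypothetical independent variables, for illustration only
data = {
    "X1": [1, 2, 3, 4, 5],
    "X2": [2, 4, 6, 8, 10],  # exactly 2 * X1, so r = 1
    "X3": [2, 5, 1, 4, 3],
}
flagged = flag_multicollinearity(data)
```

Using the absolute value catches strong negative correlations as well as positive ones.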
TYPE 6. REGRESSION
Multiple linear regression: Analyze > Regression > Linear…
6.1. No multicollinearity:
Commenting on the Model Summary table:
We see that Adjusted R Square (0.872) is slightly lower than R Square (0.877). We report Adjusted R Square because, unlike R Square, it penalizes additional predictors and therefore does not inflate the model's fit. Adjusted R Square = 0.872 indicates that the model fits the data quite well.
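Adjusted R Square depends on R Square, the sample size n, and the number of predictors k. This is why it never exceeds R Square and why it drops when a useless predictor is added. Below is a minimal Python sketch of the standard formula. The sample size is not reported in the text, so n and k here are hypothetical.

```python
def adjusted_r_squared(r_squared, n, k):
    """Adjusted R^2 for n observations and k predictors (intercept not counted)."""
    return 1 - (1 - r_squared) * (n - 1) / (n - k - 1)


# Hypothetical: the same R^2 = 0.5 with n = 11, first with 1 predictor, then 2
one_predictor = adjusted_r_squared(0.5, n=11, k=1)
two_predictors = adjusted_r_squared(0.5, n=11, k=2)
```

The extra predictor lowers the adjusted value even though R Square stays the same.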
When predicting Income using the variables Doing_Exercises, Marital_Status, Gender, Education, and Age, the results show that:
The Adjusted R Square value of 0.872 shows that the independent variables Doing_Exercises, Marital_Status, Gender, Education, and Age included in the regression analysis explain 87.2% of the variation in the dependent variable Income; the remaining 12.8% is due to variables outside the model and random error.
Commenting on the ANOVA table:
The model has an F value of 169.691 with Sig. = 0.000 < 0.05. This indicates that, taken together, the independent variables Age, Gender, Education, Marital_Status, and Doing_Exercises explain a statistically significant part of the variation in Income, i.e., at least one of them is significantly related to Income. Sig. < 0.05 shows that this regression model is suitable for analysis.
If Sig. > 0.05, the regression model is not suitable for analysis (stop and do not continue the analysis).
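The F value in the ANOVA table tests whether all slope coefficients are jointly zero. For an OLS model with an intercept, it can be recovered from R Square. Below is a minimal Python sketch of that identity. The n and k are hypothetical because the sample size is not given in the text.

```python
def overall_f(r_squared, n, k):
    """Overall F of an OLS model with an intercept, computed from its R^2.

    F = (R^2 / k) / ((1 - R^2) / (n - k - 1)), with df = (k, n - k - 1).
    """
    return (r_squared / k) / ((1 - r_squared) / (n - k - 1))


f_small = overall_f(0.5, n=11, k=1)  # hypothetical n and k
f_large = overall_f(0.5, n=21, k=1)  # same fit, more observations
```

The same R Square gives a larger F with more observations. This is why a model can be "significant" overall even when individual coefficients are not.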
Commenting on the Coefficients table:
Income = -88759306.1 + 3941283.020 × Age + 1439975.573 × Gender + 18485615.55 × Marital_Status + 2011495.145 × Doing_Exercises
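The fitted equation can be used to make a prediction. It also shows what "holding other factors constant" means for each coefficient. Below is a minimal Python sketch using the unstandardized coefficients above. It assumes Gender and Doing_Exercises are coded 0/1 (the text only states the coding of Marital_Status), and `predict_income` is a hypothetical name.

```python
# Unstandardized coefficients (B) from the Coefficients table in the text
INTERCEPT = -88759306.1
B_AGE = 3941283.020
B_GENDER = 1439975.573
B_MARITAL_STATUS = 18485615.55
B_DOING_EXERCISES = 2011495.145


def predict_income(age, gender, marital_status, doing_exercises):
    """Predicted Income in VND from the fitted equation.

    Assumes Gender and Doing_Exercises are 0/1 dummies; Marital_Status is
    0 = single, 1 = married, as stated in the text.
    """
    return (INTERCEPT
            + B_AGE * age
            + B_GENDER * gender
            + B_MARITAL_STATUS * marital_status
            + B_DOING_EXERCISES * doing_exercises)


# One extra year of Age, everything else fixed: the prediction moves by B_AGE
age_effect = predict_income(31, 0, 0, 0) - predict_income(30, 0, 0, 0)
# Single -> married, everything else fixed: the prediction moves by B_MARITAL_STATUS
marital_effect = predict_income(30, 0, 1, 0) - predict_income(30, 0, 0, 0)
```

The Gender and Doing_Exercises terms stay in the estimated equation even though they are not statistically significant. Their non-significance only means their effects cannot be distinguished from zero.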
Regarding the variables Gender and Doing_Exercises, it can be observed that:
The coefficient of Gender (Sig. = 0.400 > 0.05) is not statistically significant. We conclude that Gender has no statistically significant effect on Income. The coefficient of Doing_Exercises (Sig. = 0.226 > 0.05) is not statistically significant. We conclude that doing exercise has no statistically significant effect on Income.
With respect to the variables Age and Marital_Status, we can observe that:
The coefficient of Age (Sig. = 0.000 < 0.05) is statistically significant and positive. This shows that age has a positive effect on income: if age increases by 1 year, income is expected to increase by 3941283.020 VND, holding other factors constant.
The coefficient of Marital_Status (Sig. = 0.000 < 0.05) is statistically significant and positive. This indicates that marital status has a positive effect on income: if Marital_Status switches from 0 (single) to 1 (married), income is expected to increase by 18485615.55 VND, holding other factors constant.
6.2. Multicollinearity present: split into two models.
Because Education and Age are multicollinear, we estimate two separate regressions, each including only one of these two variables.
Model 1: Age
Income = b0 + b1 × Age + b2 × Gender + b3 × Marital_Status + b4 × Doing_Exercises
We see that Adjusted R Square (0.872) is slightly lower than R Square (0.877). We report Adjusted R Square because, unlike R Square, it penalizes additional predictors and therefore does not inflate the model's fit. Adjusted R Square = 0.872 indicates that the model fits the data quite well.
When predicting Income using the variables Doing_Exercises, Marital_Status, Gender, and Age, the results show that:
The Adjusted R Square value of 0.872 shows that the independent variables Doing_Exercises, Marital_Status, Gender, and Age included in the regression analysis explain 87.2% of the variation in the dependent variable Income; the remaining 12.8% is due to variables outside the model and random error.
The model has an F value of 169.691 with Sig. = 0.000 < 0.05. This indicates that, taken together, the independent variables Age, Gender, Marital_Status, and Doing_Exercises explain a statistically significant part of the variation in Income. Sig. < 0.05 shows that this regression model is suitable for analysis.
Regarding the variables Gender and Doing_Exercises, it can be observed that:
The coefficient of Gender (Sig. = 0.400 > 0.05) is not statistically significant. We conclude that Gender has no statistically significant effect on Income.
The coefficient of Doing_Exercises (Sig. = 0.226 > 0.05) is not statistically significant. We conclude that doing exercise has no statistically significant effect on Income.
With respect to the variables Age and Marital_Status, we can observe that:
The coefficient of Age (Sig. = 0.000 < 0.05) is statistically significant and positive. This shows that age has a positive effect on income: if age increases by 1 year, income is expected to increase by 3941283.020 VND, holding other factors constant.
The coefficient of Marital_Status (Sig. = 0.000 < 0.05) is statistically significant and positive. This indicates that marital status has a positive effect on income: if Marital_Status switches from 0 (single) to 1 (married), income is expected to increase by 18485615.55 VND, holding other factors constant.
Model 2: Education
Income = b0 + b1 × Education + b2 × Gender + b3 × Marital_Status + b4 × Doing_Exercises
We see that Adjusted R Square (0.760) is slightly lower than R Square (0.770). We report Adjusted R Square because, unlike R Square, it penalizes additional predictors and therefore does not inflate the model's fit. Adjusted R Square = 0.760 indicates that the model fits the data quite well.
The Adjusted R Square value indicates that 76% of the variation in the dependent variable (Income) is explained by the independent variables (Marital_Status, Gender, Education, Doing_Exercises).
In the ANOVA table, Sig. = 0.000 < 0.05 indicates that this regression model is suitable for analysis.
Regarding the variables Gender and Doing_Exercises, it can be observed that:
The coefficient of Gender (Sig. = 0.171 > 0.05) is not statistically significant. This indicates that Gender has no statistically significant effect on Income.
The coefficient of Doing_Exercises (Sig. = 0.353 > 0.05) is not statistically significant. This indicates that doing exercise has no statistically significant effect on Income.
With respect to the variables Education and Marital_Status, we can observe that:
The coefficient of Education (Sig. = 0.000 < 0.05) is statistically significant and positive. This indicates that education level has a positive effect on income: if Education increases by 1 unit (one step in the encoded education level), Income is expected to increase by 19858595.03 VND, holding other factors constant.
The coefficient of Marital_Status (Sig. = 0.000 < 0.05) is statistically significant and positive. This indicates that marital status has a positive effect on income: if Marital_Status switches from 0 (single) to 1 (married), income is expected to increase by 37529998.56 VND, holding other factors constant.
In conclusion, the multiple linear regression analysis examined the effects of Age, Gender, Education, Marital_Status, and Doing_Exercises on Income. The results indicate that Age, Education, and Marital_Status positively influence Income, whereas the effects of Gender and Doing_Exercises on Income are not statistically significant.
TYPE 7. MODERATING VARIABLES
Create an interaction variable: Interaction = Age × Doing_Exercises
In the ANOVA table, Sig. = 0.000 < 0.05 indicates that this regression model is suitable for analysis.
The coefficient of Interaction is statistically significant (Sig. = 0.027 < 0.05), which indicates that Doing_Exercises moderates the relationship between Age and Income. Because this coefficient is positive, the moderating effect is positive: the relationship between Age and Income is stronger for those who do exercises.
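Why a positive interaction coefficient means a "stronger" Age effect: in Income = b0 + b_age × Age + b_ex × D + b_int × (Age × D), the slope of Age is b_age + b_int × D. Below is a minimal Python sketch of building the interaction column and computing that slope. It assumes Doing_Exercises (D) is coded 0 = no, 1 = yes. The records and coefficients are hypothetical because the text reports only the Sig. value.

```python
def add_interaction(records):
    """Add Interaction = Age * Doing_Exercises to each record (a dict)."""
    for record in records:
        record["Interaction"] = record["Age"] * record["Doing_Exercises"]
    return records


def age_slope(b_age, b_interaction, doing_exercises):
    """Change in predicted Income per extra year of Age.

    From Income = b0 + b_age*Age + b_ex*D + b_interaction*(Age*D),
    the slope of Age is b_age + b_interaction*D.
    """
    return b_age + b_interaction * doing_exercises


# Hypothetical records and coefficients, for illustration only
records = add_interaction([{"Age": 30, "Doing_Exercises": 1},
                           {"Age": 40, "Doing_Exercises": 0}])
slope_no_exercise = age_slope(2.0, 0.5, 0)
slope_exercise = age_slope(2.0, 0.5, 1)
```

With a positive b_int, the Age slope is larger in the D = 1 group. That is the moderating effect described above.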
Question 2. Market research company Y wants to compare the satisfaction level with toothpaste product B between 2 gender groups (Male, Female). Which of the following tests can be used?
a. Paired-Samples T-test
b. R-squared
c. One-Sample T-test
d. Independent-Samples T-test
Question 3. In regression analysis, when constructing the independent variables, how many dummy variables should be created from a categorical variable with 4 values?
a. 3
b. 4
c. 1
d. 2
Question 4. Multiple linear regression is used to analyze:
a. The relationship between more than one dependent variable and only one independent …
b. The relationship between a dependent variable and many independent variables
c. The relationship between more than one independent variable
d. The relationship between one or more dependent variables and only one independent variable
Question 5. A study of students' cell phone use behavior in HCMC shows that the Pearson correlation coefficient between the variables “Time using cell phones” and “Academic Performance” is 0.42 (statistically significant). Thus, it can be concluded that:
a. Time using cell phones and academic performance are negatively correlated
b. Time using cell phones and academic performance are not correlated
c. Time using cell phones and academic performance are positively correlated
d. All of the above statements are incorrect
Question 6. Prescriptive analytics can help businesses answer which of the following questions?
a. How to deal with customer complaints
b. What is the best way to ship goods from factories to agents to minimize costs
Question 7. Students' academic ranking, (1) Average (2) Fair (3) Good (4) Excellent, is which type of scale?
a. Binary
b. Ratio
c. Ordinal
d. Continuous
Question 8. Which of the following is a statistical measure of dispersion?
a. Median
b. Mode
c. Mean
d. Variance
Question 9. Which of the following is an example of a continuous variable?
a. Working departments (Marketing, Human Resources, Sales, Accounting, …) of employees
b. Weight of …
c. Gender of employees in the enterprise
d. Hotel rating … from 1 to 5
Question 10. Which of the following is true about multiple linear regression?
a. Multiple linear regression uses least squares to estimate the intercept and slope coefficients.
b. Multiple linear regression uses ANOVA to test the significance of each variable separately.
c. The regression coefficients are called fractional regression coefficients.
d. This is a linear regression model with more than one dependent variable.
Question 11. Which of the following is true for the median?
a. Median is the number that occurs most often in a data set
b. For an even number of observations, the median is the mean of the two middle numbers.
c. The median can be calculated regardless of how the data is sorted
d. A median is only meaningful for interval or ordinal data and not for ratio data.
Question 12. Which of the following is true for the R-squared (R^2) value in multiple linear regression?
a. An R-squared value of 1 indicates the maximum deviation of the data from the regression line
b. The higher the value of R-squared, the better the regression line fits the data
c. If the value of R^2 is greater than 1, the regression line will fit the data perfectly
d. The value of R-squared (R^2) will always be between -1 and 1
1.
We see that Adjusted R Square (0.414) is lower than R Square (0.469). We report Adjusted R Square because, unlike R Square, it penalizes additional predictors and therefore does not inflate the model's fit. Adjusted R Square = 0.414 reflects a moderate level of fit.
The Adjusted R Square value of 0.414 shows that the independent variables Content, Social Media, and Email included in the regression analysis explain 41.4% of the variation in the dependent variable Customer intention; the remaining 58.6% is due to variables outside the model and random error.
Regarding the variable Content, it can be observed that:
The coefficient of Content (Sig. = 0.136 > 0.05) is not statistically significant. We conclude that Content has no statistically significant effect on Customer intention.
With respect to the variables SocialMedia and Email, we can observe that:
The coefficient of SocialMedia (Sig. = 0.000 < 0.05) is statistically significant and positive. This shows that social media marketing has a positive effect on customer intention.
The coefficient of Email (Sig. = 0.013 < 0.05) is statistically significant and positive. This indicates that email marketing has a positive effect on customer intention.
2. Based on the regression results, both Email and Social Media significantly affect customer intention, with Email emerging as the preferred channel for Bamboo's investment. Specifically, if the Likert-scale score for email marketing increases by 1 unit, customer intention on the Likert scale is expected to increase by 0.34 units, holding all other factors constant.