3.0 METHODOLOGY
This section outlines the statistical methods and procedures used to examine the relationships between
key variables related to gym members' exercise behaviours. The dataset for this study was originally
obtained from Kaggle, contributed by Vala Khorasani, and contained 974 individual records. To
ensure the analysis remained focused, relevant, and manageable, the dataset was filtered down to 500
observations, while still maintaining its representativeness and data quality. The dataset includes key
variables such as gender, workout type, session duration (in hours), and calories burned.
This study employed three statistical techniques using Minitab to address the research objectives: the
Chi-square test of independence, simple linear regression, and one-way ANOVA. Each method was
chosen based on the types of variables involved and the nature of the hypothesis being tested.
3.1 Chi-square Test
We used Minitab to carry out a Chi-square test of independence to address the first research objective:
determining whether there is a significant association between gender and workout type. Both
variables are categorical, so the Chi-square test of independence is an appropriate method.
In Minitab, we first entered the data with gender and workout type in separate columns. Then, we
clicked on Stat > Tables > Chi-Square Test (Two-Way Table). We selected gender as the row variable
and workout type as the column variable. Minitab then created a table and gave us the Chi-square
value, degrees of freedom, and the p-value.
The test statistic can also be calculated manually using the formula:
\[ \chi^2 = \sum \frac{(O - E)^2}{E} \]
where 𝑂 is the observed frequency, and 𝐸 = (row total) × (column total) / (grand total) is the expected
frequency. The calculated value is compared to the critical value at (𝑟 − 1)(𝑐 − 1) degrees of
freedom. In this case, the test did not reject the null hypothesis of independence, indicating no
significant association between gender and workout type.
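The manual calculation described above can be sketched in Python. This is a minimal illustration, not the Minitab procedure used in the study: the counts in the table, the workout type labels, and the function name chi_square_independence are hypothetical and chosen only to show the arithmetic. The critical value 5.991 is the standard Chi-square critical value for 2 degrees of freedom at the 0.05 level.

```python
# Hypothetical gender-by-workout-type counts, for illustration only.
# These are NOT the study's data.
observed = {
    "Female": {"Cardio": 40, "Strength": 35, "Yoga": 45},
    "Male": {"Cardio": 50, "Strength": 45, "Yoga": 35},
}


def chi_square_independence(table):
    """Return (statistic, degrees of freedom, expected counts) for a two-way table."""
    rows = list(table)
    cols = list(next(iter(table.values())))
    row_totals = {r: sum(table[r][c] for c in cols) for r in rows}
    col_totals = {c: sum(table[r][c] for r in rows) for c in cols}
    grand_total = sum(row_totals.values())

    # E = (row total x column total) / grand total for each cell
    expected = {
        r: {c: row_totals[r] * col_totals[c] / grand_total for c in cols}
        for r in rows
    }

    # Chi-square statistic: sum of (O - E)^2 / E over all cells
    statistic = sum(
        (table[r][c] - expected[r][c]) ** 2 / expected[r][c]
        for r in rows
        for c in cols
    )
    dof = (len(rows) - 1) * (len(cols) - 1)
    return statistic, dof, expected


statistic, dof, expected = chi_square_independence(observed)

# Critical value for df = 2 at the 0.05 significance level (from a Chi-square table).
CRITICAL_VALUE_DF2 = 5.991
reject_null = statistic > CRITICAL_VALUE_DF2

print(f"chi-square = {statistic:.3f}, df = {dof}, reject H0: {reject_null}")
```

With these hypothetical counts the statistic falls below the critical value, so the null hypothesis of independence would not be rejected, which mirrors the kind of conclusion reported for the study's data.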
3.2 Simple Linear Regression
Minitab was used to perform a simple linear regression to address the second research objective:
determining whether there is a significant linear relationship between session duration (in minutes) and
calories burned. Since both variables are numerical, this method is suitable for examining how the
response (calories burned) changes with the predictor (session duration). In Minitab, we entered the
data with session duration as the independent variable (X) and calories burned as the dependent
variable (Y). To run the analysis, we went to Stat > Regression > Regression > Fit Regression Model.
We selected calories burned as the response variable and session duration as the predictor, then
clicked "OK". From the output, the regression equation was obtained in the form:
𝑌 = 𝛼 + 𝛽𝑋
where 𝑌 is the estimated calories burned, 𝛼 is the intercept, and 𝛽 is the slope coefficient indicating
the change in calories burned per additional minute of session duration. The output also gave a
p-value for the model, which we compared to the significance level of 0.05. Since the p-value was less
than 0.05, we rejected the null hypothesis of no linear relationship. This means there is a significant
positive linear relationship between session duration and calories burned. In other words, longer
workout sessions tend to burn more calories.
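The least-squares estimates of the intercept and slope in the regression equation can be sketched in Python as follows. This is an illustrative calculation only: the five data points and the function name fit_simple_linear_regression are hypothetical, and the sketch does not reproduce Minitab's p-value output, which requires the t distribution.

```python
def fit_simple_linear_regression(x, y):
    """Return (intercept, slope, r_squared) for the least-squares line Y = a + bX."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n

    # Sums of squares and cross-products about the means
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))

    slope = sxy / sxx
    intercept = mean_y - slope * mean_x

    # R-squared: proportion of variation in Y explained by the fitted line
    ss_total = sum((yi - mean_y) ** 2 for yi in y)
    ss_resid = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    r_squared = 1 - ss_resid / ss_total
    return intercept, slope, r_squared


# Hypothetical session durations (minutes) and calories burned; NOT the study's data.
duration_min = [30, 45, 60, 75, 90]
calories = [310, 450, 590, 760, 890]

intercept, slope, r_squared = fit_simple_linear_regression(duration_min, calories)
print(f"Y = {intercept:.2f} + {slope:.2f} X, R-squared = {r_squared:.4f}")
```

A positive slope in such a fit corresponds to the interpretation given above: each additional minute of session duration is associated with a higher estimated number of calories burned.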
3.3 ANOVA
A one-way Analysis of Variance (ANOVA) was conducted to fulfil the third research objective, which
aimed to determine whether there are significant differences in mean session duration across different
levels of calories burned. This statistical method is appropriate when comparing the means of a
continuous variable (session duration) across more than two independent categorical groups (calories
burned categories).
The analysis was performed using Minitab software. The data was first entered into the worksheet
with session duration recorded in one column and calories burned categories (e.g., low, medium, high)
entered in another column as the grouping variable. To perform the test, we selected Stat > ANOVA >
One-Way. In the dialog box, the option “Response data are in one column for all factor levels” was
chosen. Session duration was set as the response variable and calories burned category as the factor.
To examine which groups were significantly different from each other, the Tukey method was selected
under the “Comparisons” button to perform post-hoc analysis, assuming equal variances. The Tukey
comparisons are interpreted only when the null hypothesis is rejected. After running the test, Minitab
generated the ANOVA summary table, model summary, and group means.
The p-value was then compared with the significance level 𝛼 = 0.05. Since the p-value was less than
0.05, the null hypothesis of equal group means was rejected. This indicates that the mean session
duration of at least one calories burned category differs significantly from the others.
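The F statistic behind the one-way ANOVA table can be sketched in Python. This is a minimal illustration under stated assumptions: the three groups of session durations, the category labels, and the function name one_way_anova are hypothetical, and the Tukey comparisons are not reproduced. The critical value 5.14 is the standard F critical value for (2, 6) degrees of freedom at the 0.05 level and applies only to this small example.

```python
def one_way_anova(groups):
    """Return (F statistic, df between, df within) for a dict of group -> values."""
    all_values = [v for values in groups.values() for v in values]
    n_total = len(all_values)
    k = len(groups)
    grand_mean = sum(all_values) / n_total
    group_means = {name: sum(values) / len(values) for name, values in groups.items()}

    # Between-group sum of squares: variation of group means around the grand mean
    ss_between = sum(
        len(values) * (group_means[name] - grand_mean) ** 2
        for name, values in groups.items()
    )
    # Within-group sum of squares: variation of observations around their group mean
    ss_within = sum(
        (v - group_means[name]) ** 2
        for name, values in groups.items()
        for v in values
    )

    df_between = k - 1
    df_within = n_total - k
    f_statistic = (ss_between / df_between) / (ss_within / df_within)
    return f_statistic, df_between, df_within


# Hypothetical session durations (hours) by calories burned category; NOT the study's data.
durations_by_category = {
    "low": [0.8, 0.9, 1.0],
    "medium": [1.1, 1.2, 1.3],
    "high": [1.5, 1.6, 1.7],
}

f_statistic, df_between, df_within = one_way_anova(durations_by_category)

# Critical value for F(2, 6) at the 0.05 significance level (from an F table).
F_CRITICAL_2_6 = 5.14
reject_null = f_statistic > F_CRITICAL_2_6

print(f"F = {f_statistic:.2f}, df = ({df_between}, {df_within}), reject H0: {reject_null}")
```

In this hypothetical example the F statistic exceeds the critical value, so the null hypothesis of equal means would be rejected and a post-hoc method such as Tukey's would then be used to identify which categories differ.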