To: Joe Teng
From: Khanh Hoang-1684391
Date: 25/11/2025
Course: BA 3310
Title: Predicting Company Profit Using Multiple Linear Regression
Step 1: Objective of the Regression Analysis
The objective of this multiple linear regression analysis is to evaluate how
three major categories of company expenditures—R&D Spend,
Administration, and Marketing Spend—predict the Profit of 1,000 companies.
By examining the influence of each expenditure type, the study seeks to
quantify the contributions of these financial activities and determine how
effectively they explain variations in profitability. This model ultimately
provides insight into which investment areas appear most strongly
associated with profit outcomes.
Step 2: Sample Size and Statistical Power
The dataset contains 1,000 observations, which provides a robust foundation
for conducting multiple linear regression. With only three predictors, this
sample size far exceeds common statistical recommendations for stable
coefficient estimation and high test power. The large number of observations
strengthens the reliability of the model, reduces the likelihood of Type II
errors, and enhances confidence in the resulting regression estimates.
Therefore, the dataset is well-suited to support an accurate and meaningful
predictive model.
Step 3: Checking and Reporting Model Assumptions
Linearity:
Scatterplots and the Actual vs Predicted plot showed that each independent
variable—R&D Spend, Administration, and Marketing Spend—has an
approximately linear relationship with Profit. No nonlinear patterns were
observed.
Normality of Residuals:
The Shapiro–Wilk test indicated that residuals were not normally distributed
(W = 0.9186, p < .001). However, with a large sample size (N = 1000), the
Central Limit Theorem implies that the coefficient estimates are approximately
normally distributed, so this violation has minimal impact on the significance
tests.
Homoscedasticity:
The Breusch–Pagan test suggested heteroscedasticity in the residuals (χ² =
260.91, p < .001), meaning the variance of errors is not constant across
predicted values. The coefficient estimates remain unbiased, but the
conventional standard errors may be unreliable.
Independence of Errors:
The Durbin–Watson statistic was 1.688, which is reasonably close to 2 (the
value expected when residuals are uncorrelated) and indicates no severe
autocorrelation among residuals.
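As a minimal sketch of how the Durbin–Watson statistic is computed, the
function below divides the sum of squared differences between consecutive
residuals by the sum of squared residuals. The residual lists are made-up
values for illustration only; they are not taken from the dataset used in this
report.

```python
def durbin_watson(residuals):
    """Return the Durbin-Watson statistic for residuals in observation order."""
    if len(residuals) < 2:
        raise ValueError("need at least two residuals")
    # Numerator: squared differences between consecutive residuals.
    numerator = sum(
        (residuals[i] - residuals[i - 1]) ** 2 for i in range(1, len(residuals))
    )
    # Denominator: sum of squared residuals.
    denominator = sum(e ** 2 for e in residuals)
    return numerator / denominator


# Hypothetical residuals, chosen only to show the two extremes of the statistic.
alternating = [1.0, -1.0, 1.0, -1.0]  # sign flips every step (negative autocorrelation)
constant = [2.0, 2.0, 2.0, 2.0]       # identical values (positive autocorrelation)

print(durbin_watson(alternating))
print(durbin_watson(constant))
```

The statistic ranges from 0 to 4. Values near 2 suggest little first-order
autocorrelation, values toward 0 suggest positive autocorrelation, and values
toward 4 suggest negative autocorrelation.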
Multicollinearity:
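VIF results revealed very high multicollinearity among predictors (R&D =
71.55, Administration = 16.26, Marketing = 127.87), all above the commonly
used threshold of 10. The predictors are highly correlated with each other,
which inflates the standard errors of the individual coefficients, though the
overall model fit remains strong.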
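To make the VIF values easier to read, the sketch below uses the standard
definition VIF = 1 / (1 − R²), where R² comes from regressing one predictor on
the remaining predictors, and inverts it to recover the implied R². The
function name is hypothetical; only the three VIF values come from this
report.

```python
def implied_r_squared(vif):
    """Invert VIF = 1 / (1 - R^2) to get the auxiliary-regression R^2."""
    if vif < 1:
        raise ValueError("a VIF cannot be smaller than 1")
    return 1.0 - 1.0 / vif


# VIF values as reported in the Multicollinearity paragraph above.
reported_vif = {
    "R&D Spend": 71.55,
    "Administration": 16.26,
    "Marketing Spend": 127.87,
}

for name, vif in reported_vif.items():
    r2 = implied_r_squared(vif)
    print(f"{name}: VIF = {vif}, implied R^2 on other predictors = {r2:.4f}")
```

Under this definition, each reported VIF implies that roughly 94% to 99% of
the variance in that predictor is explained by the other two predictors.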
Step 4: Statistical Significance of the Regression Model
The overall regression model was highly significant, as shown by an F-
statistic of 6290 with a p-value below 0.001. This result demonstrates that,
when considered together, the predictors explain a significant amount of
variation in Profit. The model therefore provides strong evidence that
expenditure patterns are meaningful predictors of company profitability.
Step 5: Coefficient of Determination
The model demonstrated excellent explanatory power, accounting for
approximately 95% of the variance in company Profit. Both the R² and
Adjusted R² values were 0.950, indicating a very strong fit with minimal
overfitting. In addition, the overall regression F-statistic confirmed that the
model was highly significant. The table below summarizes the key model
evaluation statistics:
Model Statistic     Value
R²                  0.950
Adjusted R²         0.950
F-statistic         6290.0
p-value (Model)     < 0.001
Durbin–Watson       1.688
These results show that the model performs exceptionally well in explaining
the variation in Profit and is statistically robust.
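The reported statistics can be cross-checked against one another with the
standard formulas for the overall F-statistic and Adjusted R². The sketch below
assumes a model with an intercept, k = 3 predictors, and that all n = 1000
observations were used in the fit; the function names are illustrative.

```python
def f_statistic(r2, n, k):
    """Overall F-statistic implied by R^2 for n observations and k predictors."""
    return (r2 / k) / ((1.0 - r2) / (n - k - 1))


def adjusted_r_squared(r2, n, k):
    """Adjusted R^2, which penalizes R^2 for the number of predictors."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)


def r_squared_from_f(f, n, k):
    """Invert the F formula to recover the R^2 implied by a reported F."""
    return k * f / (k * f + (n - k - 1))


n, k = 1000, 3  # assumed: every observation was used, three predictors

print(round(f_statistic(0.950, n, k), 1))
print(round(adjusted_r_squared(0.950, n, k), 4))
print(round(r_squared_from_f(6290.0, n, k), 4))
```

Under these assumptions, the rounded R² of 0.950 implies an F-statistic of
about 6308, while the reported F of 6290 implies an R² of about 0.9499. The two
figures therefore agree to within rounding of R².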
Step 6: Statistical Significance of Predictors
The regression results revealed that all three predictors—R&D Spend,
Administration, and Marketing Spend—were statistically significant
contributors to company Profit. Each variable had a p-value less than 0.001,
indicating strong evidence that these expenditures are meaningfully
associated with changes in profitability. Administration had the largest
coefficient and t-value, followed by R&D Spend, while Marketing Spend showed
a smaller but still positive and significant relationship.
The table below summarizes the coefficients, t-values, and significance levels
for each predictor:
Predictor         Coefficient (β)   t-value   p-value   Significant?
R&D Spend         0.5539            15.93     < 0.001   Yes
Administration    1.0266            33.06     < 0.001   Yes
Marketing Spend   0.0806            4.79      < 0.001   Yes
These results indicate that increases in all three expenditure categories are
associated with increases in Profit. Although the magnitude of their
contributions differs, each predictor provides statistically reliable information
for estimating company profitability.
Step 7: Regression Coefficients and Regression Equation
The regression equation derived from the model is:
Profit = -70160.16 + 0.5539 × (R&D Spend) + 1.0266 × (Administration) +
0.0806 × (Marketing Spend)
This equation shows how each predictor relates to Profit. Specifically, a
one-unit increase in R&D Spend is associated with an increase in Profit of
approximately 0.5539 units, holding the other variables constant.
Administration shows a larger coefficient: each one-unit increase is
associated with a rise in Profit of about 1.0266 units. Marketing Spend also
has a positive coefficient, with each additional unit predicting a 0.0806
unit increase in Profit. Collectively, these coefficients describe how
expenditure levels are associated with company profitability.
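The regression equation can be applied directly to produce a prediction. The
sketch below hard-codes the intercept and coefficients reported above; the
function name and the input spending figures are hypothetical and chosen only
to illustrate the arithmetic.

```python
# Intercept and coefficients as reported in the regression equation above.
INTERCEPT = -70160.16
COEFFICIENTS = {
    "rd_spend": 0.5539,
    "administration": 1.0266,
    "marketing_spend": 0.0806,
}


def predict_profit(rd_spend, administration, marketing_spend):
    """Predicted Profit from the fitted equation (same units as the inputs)."""
    return (
        INTERCEPT
        + COEFFICIENTS["rd_spend"] * rd_spend
        + COEFFICIENTS["administration"] * administration
        + COEFFICIENTS["marketing_spend"] * marketing_spend
    )


# Hypothetical company: these spending figures are not from the dataset.
example = predict_profit(
    rd_spend=100_000, administration=120_000, marketing_spend=200_000
)
print(f"Predicted Profit: {example:.2f}")
```

For these illustrative inputs the terms are 55,390 (R&D), 123,192
(Administration), and 16,120 (Marketing), which together with the intercept
give a predicted Profit of about 124,541.84.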
Step 8: Discussion of Model Fit and Limitations
The regression model demonstrates excellent predictive performance,
supported by a high adjusted R² and statistically significant predictors. Error
metrics such as MAE, MSE, and RMSE also indicate strong predictive
accuracy. However, the model has limitations. The presence of severe
multicollinearity weakens the interpretability of individual coefficients, and
the detected heteroscedasticity suggests that variance in errors is
inconsistent across predicted values. Additionally, residuals deviate from
normality, and the model cannot establish causal relationships. Other factors
affecting profit—such as market conditions, competition, or pricing strategy—
are not included, which may limit the generalizability of the model.
Step 9: Additional Diagnostics and Visualizations
To further evaluate model performance, prediction error metrics were
calculated. Overall, the errors were relatively small compared to the scale of
company Profit values, indicating strong predictive performance. The
scatterplot of Actual vs. Predicted Profit further supported this conclusion,
showing points closely aligned along the diagonal reference line.
Metric                           Value
Mean Absolute Error (MAE)        1904.49
Mean Squared Error (MSE)         9.20 × 10⁷
Root Mean Squared Error (RMSE)   9595.6
These diagnostics confirm that the model provides accurate
predictions, despite limitations such as heteroscedasticity and
multicollinearity.
A scatterplot comparing Actual and Predicted Profit values was
included to assess model performance visually. The plotted points
aligned closely along the diagonal line, demonstrating strong
predictive accuracy. Error metrics, shown in the table above, further
confirmed the robustness of the model.
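For reference, the three error metrics can be computed as in the sketch below.
The actual and predicted values are made-up numbers for illustration, not
observations from the dataset, and the function names are assumptions.

```python
import math


def _errors(actual, predicted):
    """Pairwise prediction errors; both sequences must have equal length."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return [a - p for a, p in zip(actual, predicted)]


def mae(actual, predicted):
    """Mean Absolute Error: average size of the errors."""
    errs = _errors(actual, predicted)
    return sum(abs(e) for e in errs) / len(errs)


def mse(actual, predicted):
    """Mean Squared Error: average squared error."""
    errs = _errors(actual, predicted)
    return sum(e ** 2 for e in errs) / len(errs)


def rmse(actual, predicted):
    """Root Mean Squared Error: square root of the MSE."""
    return math.sqrt(mse(actual, predicted))


# Hypothetical values, used only to demonstrate the calculations.
actual = [100.0, 200.0, 300.0]
predicted = [110.0, 190.0, 330.0]

print(mae(actual, predicted), mse(actual, predicted), rmse(actual, predicted))
```

Because RMSE is the square root of MSE, the reported figures can be checked
against each other: the square root of 9.20 × 10⁷ is about 9592, which matches
the reported RMSE of 9595.6 to within rounding of the MSE.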