Assignment 4: Linear Regression

Work within this document and provide a complete response to each of the items. Once
complete, submit it to the Assignments area.

Answer the following: (5 points)

1. List out the assumptions for regression and briefly explain each.

2. What are the accepted measurement scale(s) for the data when conducting a
regression for the following:
a. Dependent Variable
b. Independent Variable

Conduct a multiple regression using the following example: (10 points)

A cancer specialist from the Los Angeles County General Hospital (LACGH) rated
patient optimism in 20 to 40 year old patients with incurable cancer in 1970 (N = 244). In
1990, the researcher examined hospital records to gather the following:

Independent Variables
- SES (1-7)
- Age in 1970
- Optimism in 1970 (1-100)

Dependent Variable
- Longevity (years lived after the 1970 diagnosis)

1. Include the output from the Model Summary

2. Include the output from the ANOVA

3. Include the output from the Coefficients

4. Summarize the results by including the adjusted R2, the significance of the overall
model, and, for each of the independent variables, the beta and the significance
level. Include at least two sentences practically explaining the findings.

SPSS Directions:

Analyze > Regression > Linear, then place the three IVs in the Independent(s) box
and Years Lived in the Dependent box, then select OK.
Qs-1

In regression analysis, particularly linear regression, several key assumptions must be
met for the model to produce valid, reliable, and interpretable results. Below are the main
assumptions, along with brief explanations:
1. Linearity
The relationship between the independent (predictor) variables and the dependent
(response) variable is linear. This means that changes in the predictors are
associated with proportional changes in the outcome. Violations can be checked
using residual plots or scatterplots of predictors vs. outcome.
2. Independence of Errors
The residuals (errors) should be uncorrelated with each other. In other words, the
error for one observation should not be predictable from the error of another. This
is especially important in time series data, where autocorrelation often occurs.
The Durbin-Watson test is commonly used to detect autocorrelation.
3. Homoscedasticity (Constant Variance of Errors)
The variance of the residuals should be constant across all levels of the predicted
values or independent variables. If the spread of residuals changes with the
predicted value (heteroscedasticity), standard errors and hypothesis tests may be
unreliable. Residual vs. fitted plots help diagnose this.
4. Normality of Residuals
The residuals should be approximately normally distributed, especially for valid
inference (e.g., confidence intervals and p-values). This assumption is less critical
for large samples due to the Central Limit Theorem but important for small
samples. Normality can be assessed with Q-Q plots or statistical tests like
Shapiro-Wilk.
5. No (or Little) Multicollinearity
Predictor variables should not be highly correlated with each other. High
multicollinearity inflates the variance of coefficient estimates, making them
unstable and hard to interpret. It can be detected using Variance Inflation Factor
(VIF) or correlation matrices.
6. No Perfect Multicollinearity
No independent variable should be an exact linear combination of other
independent variables (e.g., including both “height in inches” and “height in
centimeters”). This would make the model mathematically unsolvable (singular
matrix).
7. Correct Model Specification
The model includes all relevant variables and excludes irrelevant ones, and the
functional form is appropriate (e.g., no missing interaction terms or necessary
transformations). Omitting key predictors can lead to biased estimates.
8. Independence of Observations
Each observation should be independent of the others (distinct, not repeated
measures or clustered data without accounting for it). Violations often occur in
hierarchical or longitudinal data and require specialized models (e.g., mixed-effects
models).
Meeting these assumptions ensures that the regression estimates are unbiased, efficient
(minimum variance), and valid for statistical inference.

Qs-2

When conducting regression analysis, particularly linear regression, the measurement
scales of the variables matter for both interpretation and statistical validity. Here is what is
generally accepted:

a. Dependent Variable
Accepted scale: Interval or Ratio (i.e., continuous numeric data)
- Explanation: Linear regression models the dependent variable as a continuous
outcome that can take any real number within a range. Interval and ratio scales have
equal intervals between values and (in the case of ratio) a true zero point, which
allows meaningful interpretation of differences and predictions.
- Examples:
  - Income (ratio)
  - Temperature in Celsius (interval)
  - Test scores (often treated as interval)

b. Independent Variable(s)
Accepted scales: Nominal, Ordinal, Interval, or Ratio
- Explanation: Regression can accommodate any level of measurement for predictors,
as long as they are properly encoded:
  - Continuous predictors (interval/ratio): used directly (e.g., age, income).
  - Ordinal predictors: often treated as continuous if the ordering is meaningful and
the intervals are roughly equal; otherwise, they may be dummy-coded.
  - Nominal (categorical) predictors: must be converted into dummy variables
(indicator variables) using techniques like one-hot encoding.
    - Example: a variable "Color" with levels Red, Blue, Green becomes two (or
more) binary variables.
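As a sketch of the dummy-coding step for the "Color" example above, using pandas (an assumption; any one-hot encoder works the same way):

```python
# Sketch: converting a nominal predictor into dummy (indicator) variables.
import pandas as pd

df = pd.DataFrame({"color": ["Red", "Blue", "Green", "Red"]})

# drop_first=True keeps k-1 dummies for k levels, avoiding perfect multicollinearity;
# the dropped level (here Blue, first alphabetically) becomes the baseline category.
dummies = pd.get_dummies(df["color"], prefix="color", drop_first=True)
print(dummies.columns.tolist())  # ['color_Green', 'color_Red']
```

Dropping one level matters: keeping all three dummies plus an intercept would make one column an exact linear combination of the others (the "no perfect multicollinearity" assumption).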

Qs-3

Model Summary
Model   R       R Square   Adjusted R Square   Std. Error of the Estimate
1       .988a   .976       .965                .6555
a. Predictors: (Constant), optimism, age_1970, ses
ANOVAa
Model          Sum of Squares   df   Mean Square   F        Sig.
1 Regression   106.647          3    35.549        82.723   <.001b
  Residual     2.578            6    .430
  Total        109.225          9
a. Dependent Variable: longevity
b. Predictors: (Constant), optimism, age_1970, ses

Coefficientsa
Model          B        Std. Error   Beta     t        Sig.
1 (Constant)   3.713    3.707                 1.002    .355
  ses          1.220    .832         .670     1.466    .193
  age_1970     -.090    .075         -.136    -1.189   .279
  optimism     .041     .088         .204     .465     .658
(B and Std. Error are unstandardized coefficients; Beta is standardized.)
a. Dependent Variable: longevity

Summary

Regression Summary
- Adjusted R²: 0.965. This means that 96.5% of the variance in longevity is
explained by the predictors: SES, Age in 1970, and Optimism.
- Overall model significance: F(3, 6) = 82.723, p < .001. This indicates the overall
model is statistically significant: the predictors together significantly explain
variation in longevity.
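As a quick arithmetic check of the adjusted R² reported above, the standard formula uses R², the sample size n, and the number of predictors k; with R² = .976, n = 10, and k = 3 it gives about .964, matching SPSS's .965 up to rounding (SPSS works from the unrounded R²):

```python
# Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - k - 1)
r2, n, k = 0.976, 10, 3
adj_r2 = 1 - (1 - r2) * (n - 1) / (n - k - 1)
print(round(adj_r2, 3))  # 0.964
```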

Coefficients Table
Predictor   Unstandardized B   Std. Error   Standardized Beta   t        Sig. (p-value)
Intercept   3.713              3.707        —                   1.002    .355
SES         1.220              0.832        0.670               1.466    .193
Age_1970    -0.090             0.075        -0.136              -1.189   .279
Optimism    0.041              0.088        0.204               0.465    .658
Interpretation
While the regression model explains a very high proportion of variance in longevity
(96.5%), none of the individual predictors is statistically significant on its own (all
p-values > 0.05). This pattern (a very high R² with no individually significant
predictors) is a classic sign of multicollinearity among the predictors and/or a small
sample (here n = 10).
- SES has the strongest positive effect on longevity (B = 1.220, β = 0.670),
suggesting that higher socioeconomic status is associated with living longer.
However, this relationship is not statistically significant (p = .193).
- Age in 1970 has a slight negative effect (B = -0.090), possibly indicating that
individuals who were older in 1970 tended to live fewer years after diagnosis, but
again this effect is not significant (p = .279).
- Optimism shows a small positive association with longevity (B = 0.041), but with
the weakest effect size and highest p-value (p = .658).

Reference Data

ID   Longevity   SES   Age_1970   Optimism
1    8.2         4     32         67
2    5.1         2     28         45
3    12.7        6     25         82
4    3.4         1     39         38
5    9.0         5     30         71
6    6.8         3     35         55
7    14.3        7     22         88
8    4.9         2     37         41
9    10.5        5     29         76
10   7.6         4     33         63
