LAB EXERCISE – SIMPLE REGRESSION ANALYSIS
Regression models estimate the relationship between variables and can be used to predict one variable from another.
H1: Higher levels of smog in the city lead to an increase in the incidence of respiratory illnesses
in the city.
Steps:
1. Make a scatter plot – helps to visualize the relationship.
x-axis – independent variable
y-axis – dependent variable
2. Compute the correlation between the study variables.
Use the Excel CORREL function.
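To see what CORREL computes, the Pearson correlation coefficient can be sketched in a few lines of Python. The AQI and illness values below are made up for illustration only; they are not the lab data, and the names aqi and illness_cases are placeholders.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation coefficient (the quantity Excel's CORREL returns)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-deviations and sums of squared deviations from the means
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    return sxy / sqrt(sxx * syy)

# Made-up illustrative values, NOT the lab data
aqi = [80, 120, 150, 200, 260]
illness_cases = [12, 18, 20, 31, 38]

r = pearson_r(aqi, illness_cases)
print(f"r = {r:.3f}")
```

A value of r close to +1 or -1 indicates a strong linear association; a value close to 0 indicates a weak one.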
3. Regression Analysis
Go to Data >> Data Analysis >> Regression
In the ‘Regression’ dialog box, set _______ as the independent variable (X) and _______
as the dependent variable (Y). Select a cell for the regression output as well.
Check “Labels” if the selected ranges include the column headers. Click “OK” or “Run”.
4. Regression Output
The regression output table includes the values of the coefficients, other statistics and some
charts (if you have selected the chart outputs).
Focus on the coefficients: the intercept and the coefficient of the independent variable.
Plug the intercept (a) and the coefficient (b) into the equation of the regression line.
Y = a + bX
You should know how to interpret the results you get.
For every additional unit of the IV, the DV is expected to change by b units (an increase if b is positive, a decrease if b is negative).
For every one-unit increase in the AQI, by how much does the incidence of respiratory illnesses change?
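The intercept a and the coefficient b in Y = a + bX can also be computed by hand with the least-squares formulas. The sketch below uses made-up values (not the lab data) and placeholder names, and should give the same kind of numbers that appear in the coefficients column of the Excel output.

```python
def fit_line(x, y):
    """Least-squares intercept a and slope b for the line Y = a + bX."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b = sxy / sxx              # slope: change in Y per one-unit change in X
    a = mean_y - b * mean_x    # intercept: predicted Y when X = 0
    return a, b

# Made-up illustrative values, NOT the lab data
aqi = [80, 120, 150, 200, 260]
illness_cases = [12, 18, 20, 31, 38]

a, b = fit_line(aqi, illness_cases)
print(f"Y = {a:.3f} + {b:.3f} X")

# Prediction for a hypothetical AQI of 180
predicted = a + b * 180
print(f"Predicted cases at AQI 180: {predicted:.1f}")
```

Here b is read as: for every one-unit increase in the AQI, the expected number of cases changes by b.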
5. Goodness of Fit Measure/Coefficient of Determination.
R-squared (R²) measures how well the regression line fits the data: it is the share of the variation in the dependent variable that the model explains. Interpret what the value
of R² tells us about the model's goodness of fit.
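As a rough sketch of where R² comes from, it can be computed as 1 minus the ratio of the unexplained variation (sum of squared residuals) to the total variation in Y. The data below are made up for illustration and the names are placeholders.

```python
def r_squared(x, y):
    """R-squared of the least-squares line Y = a + bX."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    b = (sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
         / sum((xi - mean_x) ** 2 for xi in x))
    a = mean_y - b * mean_x
    # Unexplained variation: squared distances of the points from the line
    sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    # Total variation: squared distances of the points from the mean of Y
    sst = sum((yi - mean_y) ** 2 for yi in y)
    return 1 - sse / sst

# Made-up illustrative values, NOT the lab data
aqi = [80, 120, 150, 200, 260]
illness_cases = [12, 18, 20, 31, 38]

r2 = r_squared(aqi, illness_cases)
print(f"R-squared = {r2:.3f}")
```

An R² near 1 means the line explains most of the variation in Y; an R² near 0 means it explains very little.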
LAB EXERCISE – MULTIPLE REGRESSION ANALYSIS
Steps:
1. Go to the “Data” tab.
Click on “Data Analysis” and choose “Regression”.
Set “Air Quality in Lahore” as the dependent variable (Y) and the other three variables as
independent variables (X1, X2, X3).
Click “OK” or “Run”
2. Regression Output
Focus on the coefficients for each independent variable, as they represent the contribution of
each variable to the model.
For every additional ton of crop burning, the AQI in Lahore is expected to change by ‘b’
units, while holding the other factors constant.
Coefficient Significance - Identify which coefficients are statistically significant.
Compare each coefficient's p-value with the significance level you have set (for example, 0.05).
A coefficient whose p-value is below that level is statistically significant.
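To illustrate how the coefficients of a multiple regression are obtained, the sketch below solves the least-squares normal equations for three predictors in plain Python. Everything in it is an assumption made for illustration: the rows are invented, the first column stands in for crop burning, x2 and x3 are unnamed placeholders, and Y is constructed from known coefficients so the result can be checked. It does not compute p-values; read those from the Excel output.

```python
def solve(A, rhs):
    """Solve the linear system A x = rhs by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [r] for row, r in zip(A, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

def fit_multiple(x_rows, y):
    """Least-squares coefficients [intercept, b1, b2, ...] via the normal equations."""
    rows = [[1.0] + list(r) for r in x_rows]   # leading 1.0 is the intercept term
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)

# Invented rows (x1 = crop burning, x2 and x3 = placeholder predictors), NOT the lab data
x_rows = [(10, 100, 5), (20, 120, 3), (15, 90, 8),
          (30, 150, 7), (25, 110, 4), (5, 80, 9)]
# Y built exactly as 50 + 2*x1 + 0.5*x2 - 3*x3 so the true coefficients are known
y = [50 + 2 * x1 + 0.5 * x2 - 3 * x3 for x1, x2, x3 in x_rows]

coefficients = fit_multiple(x_rows, y)
for name, value in zip(["intercept", "b1", "b2", "b3"], coefficients):
    print(f"{name} = {value:.3f}")
```

Each b is read the same way as in the lab: the expected change in Y for a one-unit increase in that predictor, holding the other predictors constant.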
3. Goodness of Fit measures:
Look at both R-squared (R²) and adjusted R-squared.
Adjusted R-squared gives a more realistic picture when there are multiple predictors in a model, because it penalizes predictors that do not improve the fit.
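The standard adjustment can be written as adjusted R² = 1 - (1 - R²)(n - 1)/(n - k - 1), where n is the number of observations and k is the number of predictors. A minimal sketch, using hypothetical input values rather than results from the lab data:

```python
def adjusted_r_squared(r2, n, k):
    """Adjusted R-squared for a model with k predictors fitted to n observations."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

# Hypothetical values for illustration: R-squared of 0.80, 30 observations, 3 predictors
adj = adjusted_r_squared(0.80, n=30, k=3)
print(f"Adjusted R-squared = {adj:.3f}")
```

The adjusted value is never larger than R², and the gap widens as more predictors are added relative to the number of observations.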
Dummy Variables
Dummy variables allow categorical data to be included in a regression model.
Discuss the coefficient of your dummy variable. When the ‘weekend’ dummy equals 1, by how much does the
AQI increase or decrease (holding all other variables constant)?
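A simplified sketch of how a dummy coefficient is read: with the ‘weekend’ dummy as the only predictor, the intercept equals the mean AQI on weekdays and the dummy coefficient equals the difference between the weekend and weekday means. The values below are made up, and the 0/1 coding (1 = weekend) is an assumption. In the lab's multiple regression the coefficient is additionally adjusted for the other variables.

```python
def fit_line(x, y):
    """Least-squares intercept a and slope b for the line Y = a + bX."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b

# Made-up week of observations, NOT the lab data; weekend is coded 1, weekday 0
weekend = [0, 0, 0, 0, 0, 1, 1]
aqi = [180, 175, 190, 185, 170, 150, 160]

a, b = fit_line(weekend, aqi)

weekday_mean = sum(v for d, v in zip(weekend, aqi) if d == 0) / weekend.count(0)
weekend_mean = sum(v for d, v in zip(weekend, aqi) if d == 1) / weekend.count(1)

print(f"intercept = {a:.1f} (weekday mean = {weekday_mean:.1f})")
print(f"dummy coefficient = {b:.1f} (weekend mean - weekday mean = {weekend_mean - weekday_mean:.1f})")
```

In this made-up example the coefficient is negative, which would be read as a lower AQI on weekends than on weekdays.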