How to Calculate Studentized Residuals in Python?
Last Updated: 19 Jan, 2023
A studentized residual is a residual divided by its estimated standard deviation. It is a crucial tool in the detection of outliers: as a practical rule of thumb, any observation in a dataset whose studentized residual exceeds 3 in absolute value is treated as an outlier.
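Formally, the externally studentized residual for observation i (the quantity reported by the outlier_test() method used later in this article) can be written as

$$t_i = \frac{e_i}{\hat{\sigma}_{(i)}\sqrt{1 - h_{ii}}}$$

where e_i is the raw residual, h_ii is the leverage of observation i, and the standard deviation estimate in the denominator is computed with observation i excluded from the fit.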
The following Python libraries should already be installed on our system: pandas, NumPy, statsmodels, and Matplotlib.
You can install these packages by running the following command in the terminal.
pip3 install pandas numpy statsmodels matplotlib
Steps to calculate studentized residuals in Python
Step 1: Import the libraries.
First, we need to import the libraries that we installed above.
Python3
# Importing necessary packages
import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.formula.api import ols
import matplotlib.pyplot as plt
Step 2: Create a data frame.
Next, we need to create a DataFrame, which we can do with the pandas package. The snippet is given below.
Python3
# Creating dataframe
dataframe = pd.DataFrame({'Score': [80, 95, 80, 78, 84,
                                    96, 86, 75, 97, 89],
                          'Benchmark': [27, 28, 18, 18, 29, 30,
                                        25, 25, 24, 29]})
Step 3: Build a simple linear regression model.
Now we need to fit a simple linear regression model to the created dataset. For this, the statsmodels package provides the ols() function in statsmodels.formula.api.
Syntax:
statsmodels.formula.api.ols(formula, data)
Parameters:
- formula : an R-style formula string such as 'y ~ x', where y is the dependent variable and x is the independent variable
- data : the DataFrame containing the variables named in the formula
Example:
Python3
# Building simple linear regression model
simple_regression_model = ols('Score ~ Benchmark', data=dataframe).fit()
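Before moving on, it can be useful to confirm the fit. The fitted results object exposes the estimated coefficients through params and a full report through summary(); a minimal, optional check, continuing from the model built above, might look like this:
Python3
# Inspect the fitted intercept and slope
print(simple_regression_model.params)

# Print the full regression summary (optional)
print(simple_regression_model.summary())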
Step 4: Produce the studentized residuals.
To produce a DataFrame containing the studentized residual of each observation in the dataset, we can use the outlier_test() method of the fitted model.
Syntax:
simple_regression_model.outlier_test()
This method returns a DataFrame containing the studentized residual for each observation in the dataset.
Python3
# Producing studentized residual
stud_res = simple_regression_model.outlier_test()
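To apply the rule of thumb from the introduction (an absolute studentized residual above 3 suggests an outlier), you can filter the DataFrame returned by outlier_test(). A minimal sketch, continuing from the code above:
Python3
# Keep only observations whose absolute studentized
# residual exceeds 3
outliers = stud_res[stud_res['student_resid'].abs() > 3]
print(outliers)
Any rows returned by this filter are candidate outliers under that rule.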
Below is the complete implementation.
Python3
# Python program to calculate studentized residual
# Importing necessary packages
import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.formula.api import ols
import matplotlib.pyplot as plt
# Creating dataframe
dataframe = pd.DataFrame({'Score': [80, 95, 80, 78, 84,
                                    96, 86, 75, 97, 89],
                          'Benchmark': [27, 28, 18, 18, 29, 30,
                                        25, 25, 24, 29]})
# Building simple linear regression model
simple_regression_model = ols('Score ~ Benchmark', data=dataframe).fit()
# Producing studentized residual
result = simple_regression_model.outlier_test()
print(result)
Output:

The output is a data frame that contains:
- The studentized residual
- The unadjusted p-value of the studentized residual
- The Bonferroni-corrected p-value of the studentized residual
We can see that the studentized residual for the first observation in the dataset is -1.121201, the studentized residual for the second observation is 0.954871, and so on.
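The same externally studentized residuals can also be obtained through the influence diagnostics of the fitted results object, which may be convenient if the p-values are not needed. A minimal sketch, assuming the model fitted earlier:
Python3
# Alternative route: influence diagnostics of the fitted model
influence = simple_regression_model.get_influence()

# Externally studentized residuals (the same values as the
# 'student_resid' column from outlier_test())
print(influence.resid_studentized_external)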
Visualization:
Now let us visualize the studentized residuals. With the help of Matplotlib, we can plot the predictor variable (Benchmark) values against the corresponding studentized residuals.
Example:
Python3
# Python program to draw the plot
# of studentized residuals
# Importing necessary packages
import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.formula.api import ols
import matplotlib.pyplot as plt
# Creating dataframe
dataframe = pd.DataFrame({'Score': [80, 95, 80, 78, 84,
                                    96, 86, 75, 97, 89],
                          'Benchmark': [27, 28, 18, 18, 29, 30,
                                        25, 25, 24, 29]})
# Building simple linear regression model
simple_regression_model = ols('Score ~ Benchmark', data=dataframe).fit()
# Producing studentized residual
result = simple_regression_model.outlier_test()
# Defining predictor variable values and
# studentized residuals
x = dataframe['Benchmark']
y = result['student_resid']
# Creating a scatterplot of predictor variable
# vs studentized residuals
plt.scatter(x, y)
plt.axhline(y=0, color='black', linestyle='--')
plt.xlabel('Benchmark')
plt.ylabel('Studentized Residuals')
# Save the plot
plt.savefig("Plot.png")
Output:
[Plot.png: a scatter plot of the Benchmark values against the studentized residuals, with a dashed horizontal reference line at zero.]
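Because the outlier rule of thumb uses a threshold of ±3, it can also help to draw those reference lines on the plot. Below is a small, optional variation of the plotting code above (it assumes x and y are already defined as in the previous program; the output file name Plot_with_thresholds.png is just an example):
Python3
# Scatter plot with the +/-3 outlier thresholds marked
plt.scatter(x, y)
plt.axhline(y=0, color='black', linestyle='--')
plt.axhline(y=3, color='red', linestyle=':')
plt.axhline(y=-3, color='red', linestyle=':')
plt.xlabel('Benchmark')
plt.ylabel('Studentized Residuals')

# Save the plot
plt.savefig("Plot_with_thresholds.png")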