Logistic Regression using Statsmodels
Last Updated: 25 Oct, 2025
Logistic regression is a statistical technique used for predicting outcomes that have two possible classes like yes/no or 0/1. Using Statsmodels in Python, we can implement logistic regression and obtain detailed statistical insights such as coefficients, p-values and confidence intervals.
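For reference, the standard form of the model is shown below. Here $x_1, \dots, x_k$ are the predictors and $\beta_0, \dots, \beta_k$ are the coefficients that statsmodels estimates. The notation is ours, not the article's. Each coefficient is the change in the log-odds of the positive class per unit increase in its predictor, so $e^{\beta_j}$ is an odds ratio.

```latex
P(y = 1 \mid x) = \frac{1}{1 + e^{-(\beta_0 + \beta_1 x_1 + \dots + \beta_k x_k)}},
\qquad
\log \frac{P(y = 1 \mid x)}{1 - P(y = 1 \mid x)} = \beta_0 + \beta_1 x_1 + \dots + \beta_k x_k
```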

Need for Statsmodels
Some of the reasons to use Statsmodels for logistic regression are:
- Detailed Statistical Output: Shows p-values, confidence intervals and model fit metrics.
- Ease of Interpretation: Allows analysts to understand the effect of each variable on predictions.
- Flexibility: Supports categorical variables, interactions and transformations.
- Integration with Python Libraries: Works seamlessly with pandas and NumPy for data handling.
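The flexibility point can be illustrated with statsmodels' formula interface, `statsmodels.formula.api.logit`, which builds the design matrix from a formula string. The sketch below uses synthetic data, and the column names `score`, `track` and `y` are hypothetical. `C(track)` dummy-codes a categorical column, `*` adds main effects plus their interaction, and `I(...)` applies an inline transformation. Unlike `sm.Logit`, the formula interface adds an intercept automatically.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic data for illustration only: 'score' is numeric, 'track' is categorical
rng = np.random.default_rng(0)
n = 500
data = pd.DataFrame({
    "score": rng.normal(0, 1, n),
    "track": rng.choice(["A", "B", "C"], n),
})

# Assumed true log-odds: depends on score, track, and a score-by-track interaction
shift = data["track"].map({"A": 0.0, "B": 0.5, "C": -0.5})
log_odds = -0.2 + 1.0 * data["score"] + shift + 0.5 * data["score"] * (data["track"] == "B")
data["y"] = rng.binomial(1, 1 / (1 + np.exp(-log_odds)))

# C(...) = categorical dummies, '*' = main effects + interaction, I(...) = transformation
model = smf.logit("y ~ score * C(track) + I(score ** 2)", data=data).fit(disp=0)
print(model.params)
```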
Building the Logistic Regression model
In this example, we predict whether a student will be admitted to a college based on their GMAT score, GPA and work experience. The target variable is binary: admitted (1) or not admitted (0).
Step 1: Importing Libraries
Import the required libraries: statsmodels and pandas.
Python
import statsmodels.api as sm
import pandas as pd
Step 2: Loading Training Dataset
Here we load the training dataset. You can download the dataset from here. Passing index_col=0 uses the file's first column as the row index.
Python
df = pd.read_csv('logit_train1.csv', index_col=0)
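If the CSV file is not at hand, you can generate a synthetic stand-in with the same column names to follow along. The value ranges and the rule that produces `admitted` are assumptions made for illustration, so results on this data will not match the outputs shown in this article.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for logit_train1.csv: same column names, randomly generated values
rng = np.random.default_rng(42)
n = 40
df = pd.DataFrame({
    "gmat": rng.integers(550, 760, n),            # assumed GMAT range
    "gpa": rng.uniform(2.5, 4.0, n).round(1),     # assumed GPA range
    "work_experience": rng.integers(1, 7, n),     # assumed years of experience (1-6)
})

# Assumed relationship: higher GMAT, GPA and experience raise the odds of admission
score = 0.02 * (df["gmat"] - 650) + 2.0 * (df["gpa"] - 3.2) + 0.4 * (df["work_experience"] - 3)
df["admitted"] = rng.binomial(1, 1 / (1 + np.exp(-score)))
print(df.head())
```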
Step 3: Define Dependent and Independent Variables
Select GMAT score, GPA and work experience as the independent variables (features) and admitted as the dependent variable (target).
Python
Xtrain = df[['gmat', 'gpa', 'work_experience']]
ytrain = df[['admitted']]
Step 4: Build the Model
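Build the model with the statsmodels module. sm.Logit() specifies the logistic regression model, and .fit() estimates its coefficients by maximum likelihood. Note that sm.Logit does not add an intercept term automatically, so this model is fitted without one.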
Python
log_reg = sm.Logit(ytrain, Xtrain).fit()
Step 5: Perform Predictions
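Load the test data, which is assumed to have the same columns as the training set, and predict on it. predict() returns the probability of admission for each student, and a 0.5 threshold turns these probabilities into 0/1 labels.
Python
# Test split; 'logit_test1.csv' is an assumed file name mirroring the training file
df_test = pd.read_csv('logit_test1.csv', index_col=0)
Xtest = df_test[['gmat', 'gpa', 'work_experience']]
ytest = df_test['admitted']
yhat = log_reg.predict(Xtest)  # predicted probability of admission
prediction = (yhat >= 0.5).astype(int).tolist()  # 0.5 threshold -> 0/1 labels
print('Actual values:', ytest.tolist())
print('Predictions:', prediction)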
Output:
Actual values: [0, 0, 0, 0, 0, 1, 1, 0, 1, 1]
Predictions: [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
Step 6: Confusion Matrix
Measure the model's accuracy on the test data and summarize its predictions in a confusion matrix.
Python
from sklearn.metrics import confusion_matrix, accuracy_score

# Rows are actual classes (0, 1); columns are predicted classes (0, 1)
cm = confusion_matrix(ytest, prediction)
print("Confusion Matrix:\n", cm)
print("Test accuracy:", accuracy_score(ytest, prediction))
Output:
Confusion Matrix:
[[6 0]
[2 2]]
Test accuracy: 0.8
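The model classifies 8 of the 10 test students correctly. All of its errors are false negatives. It predicts "not admitted" for 2 of the 4 students who were actually admitted and produces no false positives.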
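Accuracy alone does not show where the errors fall. The sketch below computes precision, recall and F1 for the admitted class from the actual values and predictions printed in Step 5, using scikit-learn's metric functions.

```python
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score

# Values copied from the Step 5 output
actual = [0, 0, 0, 0, 0, 1, 1, 0, 1, 1]
predicted = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]

accuracy = accuracy_score(actual, predicted)    # fraction of all students classified correctly
precision = precision_score(actual, predicted)  # of predicted admits, fraction actually admitted
recall = recall_score(actual, predicted)        # of actual admits, fraction the model caught
f1 = f1_score(actual, predicted)                # harmonic mean of precision and recall

print(f"accuracy={accuracy:.2f} precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
# prints: accuracy=0.80 precision=1.00 recall=0.50 f1=0.67
```

Precision is perfect because the model never predicts "admitted" wrongly. Recall of 0.5 reflects the two admitted students it missed.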