Logistic Regression using Statsmodels

Last Updated : 25 Oct, 2025

Logistic regression is a statistical technique for predicting outcomes that have two possible classes, such as yes/no or 0/1. With Statsmodels in Python, we can fit a logistic regression model and get detailed statistical output such as coefficients, p-values and confidence intervals.
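Logistic regression models the log-odds of the positive class as a linear function of the predictors. The predicted probability is the sigmoid (logistic function) of that linear predictor. In the formula below, x_1, …, x_k are the predictors and β_0, …, β_k are the coefficients that Statsmodels estimates. β_0 is the intercept, and it is only estimated when the design matrix contains a constant column.

```latex
\log\frac{p}{1-p} = \beta_0 + \beta_1 x_1 + \cdots + \beta_k x_k,
\qquad
p = P(y = 1 \mid x) = \frac{1}{1 + e^{-(\beta_0 + \beta_1 x_1 + \cdots + \beta_k x_k)}}
```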

Need for Statsmodels

Some of the reasons to use Statsmodels for logistic regression are:

Detailed Statistical Output: Shows p-values, confidence intervals and model fit metrics.
Ease of Interpretation: Allows analysts to understand the effect of each variable on predictions.
Flexibility: Supports categorical variables, interactions and transformations.
Integration with Python Libraries: Works seamlessly with pandas and NumPy for data handling.
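The Flexibility point refers mainly to Statsmodels' formula interface (`statsmodels.formula.api`). In a formula, `C()` dummy-encodes a categorical column and `a:b` adds an interaction term. The sketch below is for illustration only. It uses simulated data, and the `track` column and all coefficients are hypothetical, not part of this article's dataset. It only shows the syntax.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Simulated data: 'track' and every coefficient below are hypothetical
rng = np.random.default_rng(0)
n = 500
data = pd.DataFrame({
    "gpa": rng.normal(3.0, 0.4, n),
    "track": rng.choice(["arts", "science", "commerce"], n),
})
# Invented log-odds that depend on gpa and on the categorical 'track'
log_odds = -6 + 2 * data["gpa"] + np.where(data["track"] == "science", 0.8, 0.0)
data["admitted"] = rng.binomial(1, 1 / (1 + np.exp(-log_odds)))

# C(track) dummy-encodes the category (reference level: 'arts');
# gpa:C(track) adds gpa-by-track interaction terms.
# Unlike sm.Logit, the formula interface adds an intercept automatically.
model = smf.logit("admitted ~ gpa + C(track) + gpa:C(track)", data=data).fit(disp=0)
print(model.params)
```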

Building the Logistic Regression model

In this example, we predict whether a student will be admitted to a college based on their GMAT score, GPA and work experience. The target variable admitted is binary: 1 means admitted and 0 means not admitted.

Step 1: Importing Libraries

Import statsmodels and pandas.

Python

import statsmodels.api as sm
import pandas as pd

Step 2: Loading the Training Dataset

Load the training data into a pandas DataFrame. You can download the dataset from here.

Python

df = pd.read_csv('logit_train1.csv', index_col=0)  # first CSV column becomes the row index
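If logit_train1.csv is not at hand, you can build a simulated stand-in with the same column names so the remaining steps still run. The value ranges and log-odds coefficients below are invented for illustration, and this is not the article's dataset. Fitted coefficients and predictions will therefore differ from the outputs shown later.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for logit_train1.csv: same column names, simulated values
rng = np.random.default_rng(42)
n = 40
df = pd.DataFrame({
    "gmat": rng.integers(540, 781, n),          # GMAT-like scores
    "gpa": rng.uniform(2.5, 4.0, n).round(1),   # GPA on a 4-point scale
    "work_experience": rng.integers(0, 7, n),   # years of work experience
})
# Invented log-odds, used only to generate a 0/1 'admitted' label
log_odds = -20 + 0.02 * df["gmat"] + 1.5 * df["gpa"] + 0.3 * df["work_experience"]
df["admitted"] = rng.binomial(1, 1 / (1 + np.exp(-log_odds)))

print(df.head())
print(df["admitted"].value_counts())
```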

Step 3: Define Dependent and Independent Variables

Select gmat, gpa and work_experience as the independent variables (predictors) and admitted as the dependent (target) variable.

Python

Xtrain = df[['gmat', 'gpa', 'work_experience']]  # independent variables
ytrain = df['admitted']                           # dependent 0/1 variable

Step 4: Build the Model

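Build the model with Statsmodels. sm.Logit() takes the target and the predictors, and .fit() estimates the coefficients by maximum likelihood. Note that sm.Logit() does not add an intercept automatically, so the model below has no constant term. To include one, pass sm.add_constant(Xtrain) instead of Xtrain.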

Python

log_reg = sm.Logit(ytrain, Xtrain).fit()

Step 5: Perform Predictions

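Load a test set with the same columns, predict the probability of admission for each student and convert each probability to a 0/1 class label using a 0.5 threshold. The test file name logit_test1.csv below is an assumption; point it at your own test data.

Python

# Load the held-out test set (file name assumed to mirror the training file)
dftest = pd.read_csv('logit_test1.csv', index_col=0)
Xtest = dftest[['gmat', 'gpa', 'work_experience']]
ytest = dftest['admitted']

# predict() returns P(admitted = 1); threshold at 0.5 to get class labels
yhat = log_reg.predict(Xtest)
prediction = (yhat >= 0.5).astype(int).tolist()
print('Actual values:', ytest.tolist())
print('Predictions:', prediction)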

Output:

Actual values: [0, 0, 0, 0, 0, 1, 1, 0, 1, 1]
Predictions: [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]

Step 6: Confusion Matrix

Evaluate the predictions with a confusion matrix and the overall test accuracy.

Python

from sklearn.metrics import confusion_matrix, accuracy_score

# Rows are actual classes (0, 1); columns are predicted classes (0, 1)
cm = confusion_matrix(ytest, prediction)
print('Confusion Matrix:')
print(cm)
print('Test accuracy:', accuracy_score(ytest, prediction))

Output:

Confusion Matrix:
[[6 0]
 [2 2]]
Test accuracy: 0.8

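The model classifies 8 of the 10 test students correctly (accuracy 0.8). All 6 students who were not admitted are predicted correctly, but 2 of the 4 admitted students are predicted as not admitted (false negatives). The model therefore misses half of the actual admissions on this small test set.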
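Accuracy alone hides the false negatives. You can read precision and recall for the admitted class directly from the confusion matrix above. scikit-learn's convention puts actual classes in rows and predicted classes in columns, so `ravel()` yields TN, FP, FN, TP in that order.

```python
import numpy as np

# Confusion matrix from Step 6 (rows: actual 0/1, columns: predicted 0/1)
cm = np.array([[6, 0],
               [2, 2]])
tn, fp, fn, tp = cm.ravel()

accuracy = (tp + tn) / cm.sum()  # (2 + 6) / 10 = 0.8
precision = tp / (tp + fp)       # 2 / (2 + 0) = 1.0
recall = tp / (tp + fn)          # 2 / (2 + 2) = 0.5
print(accuracy, precision, recall)
```

A precision of 1.0 with a recall of 0.5 means every predicted admission was correct, but only half of the real admissions were found.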

cosine1509
