
Logistic Regression using Statsmodels

Last Updated : 25 Oct, 2025

Logistic regression is a statistical technique used for predicting outcomes that have two possible classes like yes/no or 0/1. Using Statsmodels in Python, we can implement logistic regression and obtain detailed statistical insights such as coefficients, p-values and confidence intervals.
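As a rough illustration of the idea, logistic regression passes a weighted sum of the inputs through the logistic (sigmoid) function to obtain a probability between 0 and 1. The sketch below uses made-up coefficients and inputs, not values from this article's dataset.

```python
import numpy as np

def sigmoid(z):
    """Map a real-valued score to a probability in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical coefficients and one hypothetical observation (illustration only)
coef = np.array([0.8, -0.5])
x = np.array([2.0, 1.0])

z = np.dot(coef, x)        # linear score: 0.8*2.0 - 0.5*1.0 = 1.1
prob = sigmoid(z)          # probability of the positive class
label = int(prob >= 0.5)   # classify with a 0.5 cut-off
print(round(float(prob), 3), label)
```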

Need for Statsmodels

Some of the reasons to use Statsmodels for logistic regression are:

  1. Detailed Statistical Output: Shows p-values, confidence intervals and model fit metrics.
  2. Ease of Interpretation: Allows analysts to understand the effect of each variable on predictions.
  3. Flexibility: Supports categorical variables, interactions and transformations.
  4. Integration with Python Libraries: Works seamlessly with pandas and NumPy for data handling.

Building the Logistic Regression model

In this example, we predict whether a student will be admitted to a college based on their GMAT score, GPA and work experience. The target variable is binary: admitted or not admitted.

Step 1: Importing Libraries

Import statsmodels and pandas.

Python
import statsmodels.api as sm
import pandas as pd 

Step 2: Loading Training Dataset

Load the training dataset from a CSV file. You can download the dataset from here.

Python
df = pd.read_csv('logit_train1.csv', index_col = 0)

Step 3: Define Dependent and Independent Variable

Select the predictor columns (independent variables) and the target column (dependent variable) for training.

Python
Xtrain = df[['gmat', 'gpa', 'work_experience']]
ytrain = df[['admitted']]

Step 4: Build the Model

Build the model with statsmodels. Here sm.Logit() defines the logistic regression model and fit() estimates its coefficients by maximum likelihood.

Python
log_reg = sm.Logit(ytrain, Xtrain).fit()

Step 5: Perform Predictions

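Load the test data and predict on it. predict() returns probabilities, which are rounded to 0 or 1 to get class labels. The test file name below is an assumption that mirrors the training file name; change it to match your copy of the data.

Python
df_test = pd.read_csv('logit_test1.csv', index_col=0)  # assumed test file name
Xtest = df_test[['gmat', 'gpa', 'work_experience']]
ytest = df_test['admitted']

yhat = log_reg.predict(Xtest)          # predicted probabilities
prediction = list(map(round, yhat))    # round to 0/1 labels
print('Actual values:', ytest.tolist())
print('Predictions:', prediction)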

Output:

Actual values: [0, 0, 0, 0, 0, 1, 1, 0, 1, 1]
Predictions: [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
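Rounding applies an implicit 0.5 cut-off. A small sketch of making the threshold explicit with NumPy is shown below; the probabilities are hypothetical stand-ins for the output of predict().

```python
import numpy as np

# Hypothetical predicted probabilities (stand-in for log_reg.predict output)
yhat = np.array([0.12, 0.48, 0.50, 0.73, 0.91])

threshold = 0.5
prediction = (yhat >= threshold).astype(int)   # 1 when probability >= threshold
print(prediction.tolist())

# Python's built-in round() rounds halves to the nearest even integer,
# so a probability of exactly 0.5 becomes 0 rather than 1
rounded = [round(p) for p in yhat.tolist()]
print(rounded)
```

An explicit threshold also makes it easy to trade false positives against false negatives by choosing a value other than 0.5.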

Step 6: Confusion Matrix

Evaluate the model on the test data by computing the confusion matrix and the accuracy. In the matrix, rows are the actual classes and columns are the predicted classes.

Python
from sklearn.metrics import confusion_matrix, accuracy_score

cm = confusion_matrix(ytest, prediction)
print("Confusion Matrix:\n", cm)
print('Test accuracy:', accuracy_score(ytest, prediction))

Output:

Confusion Matrix:
[[6 0]
[2 2]]
Test accuracy: 0.8

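The model classifies 8 of the 10 test samples correctly. All six rejected students are predicted correctly, but two of the four admitted students are missed, so accuracy alone overstates how well the model identifies admissions.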
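The confusion matrix reported above can be broken down into per-class metrics. The sketch below re-enters the matrix by hand and uses scikit-learn's layout (rows are actual classes, columns are predicted classes).

```python
import numpy as np

# Confusion matrix from the output above: rows = actual, columns = predicted
cm = np.array([[6, 0],
               [2, 2]])
tn, fp, fn, tp = cm.ravel()

accuracy = (tp + tn) / cm.sum()   # 8 correct out of 10
precision = tp / (tp + fp)        # share of predicted admissions that were right
recall = tp / (tp + fn)           # share of actual admissions that were found
specificity = tn / (tn + fp)      # share of actual rejections that were found

print(accuracy, precision, recall, specificity)
```

With only 10 test samples these figures are coarse, so they are best read as a sanity check rather than a reliable estimate of performance.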

