
Extracting Regression Coefficients from statsmodels.api

Last Updated : 08 Aug, 2024

In data analysis and machine learning, regression analysis is a fundamental tool for understanding relationships between variables. Python's statsmodels library provides a powerful framework for performing regression analysis. This article shows how to extract regression coefficients using statsmodels.api, with a focus on practical implementation.

Introduction to statsmodels

Statsmodels is a Python module that provides classes and functions for the estimation of many different statistical models, as well as for conducting statistical tests and data exploration. It is particularly useful for econometric and statistical analyses, providing a comprehensive suite of tools for linear regression, time series analysis, and more.

Extracting Regression Coefficients Using statsmodels

Once a model is fitted, its regression coefficients, which include the intercept and the slope of the regression line, are stored in the params attribute of the fitted results object.

Let's walk through a simple example of performing linear regression using statsmodels. First, prepare your data. For demonstration, let's create a synthetic dataset:

Python
import numpy as np
import pandas as pd

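# Fix the seed so the synthetic data (and the results below) are reproducible
np.random.seed(0)
# 100 values of a single feature, drawn uniformly from [0, 1)
X = np.random.rand(100, 1)
# True relationship y = 2 + 3X, plus Gaussian noise with standard deviation 0.5
y = 3 * X.squeeze() + 2 + np.random.randn(100) * 0.5

data = pd.DataFrame({'X': X.squeeze(), 'y': y})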

Building the Regression Model

To perform regression analysis, you need to define your dependent (endog) and independent (exog) variables. In this example, y is the dependent variable, and X is the independent variable.

Python
import statsmodels.api as sm

# Prepend a column of ones (named 'const') so the model estimates an intercept
X = sm.add_constant(data['X'])

# Fit the Ordinary Least Squares (OLS) model
model = sm.OLS(data['y'], X)
results = model.fit()

Extracting Regression Coefficients

Once the model is fitted, you can extract the regression coefficients, which include the slope and intercept of the regression line.

Python
# Extract coefficients
coefficients = results.params
print("Intercept:", coefficients['const'])
print("Slope:", coefficients['X'])
# Print the summary of the regression model
print(results.summary())

Output:

Intercept: 2.1110755387236146
Slope: 2.9684675107010206
                            OLS Regression Results
==============================================================================
Dep. Variable:                      y   R-squared:                       0.747
Model:                            OLS   Adj. R-squared:                  0.744
Method:                 Least Squares   F-statistic:                     289.3
Date:                Thu, 08 Aug 2024   Prob (F-statistic):           5.29e-31
Time:                        13:16:46   Log-Likelihood:                -72.200
No. Observations:                 100   AIC:                             148.4
Df Residuals:                      98   BIC:                             153.6
Df Model:                           1
Covariance Type:            nonrobust
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
const          2.1111      0.097     21.843      0.000       1.919       2.303
X              2.9685      0.175     17.009      0.000       2.622       3.315
==============================================================================
Omnibus:                       11.746   Durbin-Watson:                   2.083
Prob(Omnibus):                  0.003   Jarque-Bera (JB):                4.097
Skew:                           0.138   Prob(JB):                        0.129
Kurtosis:                       2.047   Cond. No.                         4.30
==============================================================================

When the data are passed as pandas objects, as in this example, the params attribute returns a pandas.Series in which each coefficient is indexed by the name of the corresponding variable. If you added a constant with sm.add_constant(), the intercept is labeled 'const'.

Automating Coefficient Extraction

For larger models, you might want to automate the extraction of coefficients into a more structured format, such as a DataFrame:

Python
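def get_coef_table(lin_reg):
    # Collect per-coefficient statistics into one DataFrame indexed by variable name
    ci = lin_reg.conf_int()  # columns 0 and 1: lower and upper 95% bounds
    coef_df = pd.DataFrame({
        'coef': lin_reg.params,
        'pvalue': lin_reg.pvalues.round(4),  # rounded, so tiny p-values print as 0.0
        'ci_lower': ci[0],
        'ci_upper': ci[1]
    }, index=lin_reg.params.index)
    return coef_df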

coef_table = get_coef_table(results)
print(coef_table)

Output:

           coef  pvalue  ci_lower  ci_upper
const  2.111076     0.0  1.919284  2.302867
X      2.968468     0.0  2.622125  3.314810

Conclusion

Extracting regression coefficients with statsmodels.api is straightforward: fit the model, then read the estimates from the params attribute of the results object. The same object also exposes p-values (pvalues) and confidence intervals (conf_int()), which show how precisely each coefficient is estimated and whether it is statistically significant. This article covered setting up an OLS model, extracting its coefficients, and collecting them together with their significance statistics in a DataFrame.

