How to Obtain ANOVA Table with Statsmodels

Last Updated : 07 Aug, 2024

Analysis of Variance (ANOVA) is a statistical method used to analyze the differences among group means in a sample. It is particularly useful for comparing three or more groups for statistical significance. In Python, the statsmodels library provides robust tools for performing ANOVA. This article will guide you through obtaining an ANOVA table using statsmodels, covering both one-way and two-way ANOVA, as well as repeated measures ANOVA.

Understanding ANOVA

ANOVA tests whether the means of two or more independent groups differ by more than chance variation would explain (with exactly two groups, one-way ANOVA is equivalent to an equal-variance two-sample t-test). It is widely used in various fields, including medicine, social sciences, and engineering. ANOVA can be one-way, two-way, or multi-way, depending on the number of factors being analyzed. The key components of an ANOVA table include:

  • Sum of Squares (SS): Measures the variability attributed to each source (a factor or the residual).
  • Degrees of Freedom (DF): Number of independent pieces of information behind each sum of squares (for example, k - 1 for a factor with k levels).
  • Mean Square (MS): Sum of squares divided by its degrees of freedom.
  • F-Statistic: Ratio of a factor's mean square to the residual mean square, i.e. systematic to unsystematic variance.
  • P-Value: Probability of obtaining an F-statistic at least as large as the one observed if all group means were truly equal.
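
To make these components concrete, the following minimal sketch computes each of them by hand for a one-way layout. The three groups and their values are made up for illustration, and the variable names are not part of any library.

```python
# Hypothetical data: three groups of three observations each.
groups = {
    "a": [4.0, 5.0, 6.0],
    "b": [6.0, 7.0, 8.0],
    "c": [8.0, 9.0, 10.0],
}

all_values = [v for values in groups.values() for v in values]
grand_mean = sum(all_values) / len(all_values)

# Between-group SS: group size times the squared distance of each
# group mean from the grand mean.
ss_between = sum(
    len(values) * (sum(values) / len(values) - grand_mean) ** 2
    for values in groups.values()
)

# Within-group (residual) SS: squared distance of each observation
# from its own group mean.
ss_within = sum(
    (v - sum(values) / len(values)) ** 2
    for values in groups.values()
    for v in values
)

df_between = len(groups) - 1               # k - 1
df_within = len(all_values) - len(groups)  # N - k

ms_between = ss_between / df_between
ms_within = ss_within / df_within
f_stat = ms_between / ms_within

print(ss_between, ss_within, df_between, df_within, f_stat)
```

The p-value is then the upper-tail probability of an F distribution with (df_between, df_within) degrees of freedom, evaluated at f_stat.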

Performing ANOVA with Statsmodels

1. One-Way ANOVA

One-way ANOVA is used when you have one categorical independent variable (factor) and one continuous dependent variable. The steps below perform a one-way ANOVA with statsmodels on the PlantGrowth dataset from R (get_rdataset downloads the data, so an internet connection is required):

1. Import Libraries and Data:

Python
import statsmodels.api as sm
from statsmodels.formula.api import ols
import pandas as pd

data = sm.datasets.get_rdataset("PlantGrowth").data

2. Fit the Model and Obtain the ANOVA Table:

Python
model = ols('weight ~ C(group)', data=data).fit()
anova_table = sm.stats.anova_lm(model, typ=2)
print(anova_table)

Output:

            sum_sq    df         F   PR(>F)
C(group)   3.76634   2.0  4.846088  0.01591
Residual  10.49209  27.0       NaN      NaN
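
The returned table is a pandas DataFrame, so individual statistics can be pulled out by row and column label. The sketch below is a minimal illustration that uses a small made-up dataset instead of PlantGrowth so that it runs without a download; the column names group and weight mirror the example above, but the values are hypothetical. Since the Type II table has no mean-square column, the sketch derives one.

```python
import pandas as pd
from statsmodels.formula.api import ols
from statsmodels.stats.anova import anova_lm

# Hypothetical data: three groups, three observations each.
df = pd.DataFrame({
    "group": ["a"] * 3 + ["b"] * 3 + ["c"] * 3,
    "weight": [4.0, 5.0, 6.0, 6.0, 7.0, 8.0, 8.0, 9.0, 10.0],
})

model = ols("weight ~ C(group)", data=df).fit()
table = anova_lm(model, typ=2)

# Mean square = sum of squares / degrees of freedom.
table["mean_sq"] = table["sum_sq"] / table["df"]

# Rows are labelled by term, columns by statistic.
f_value = table.loc["C(group)", "F"]
p_value = table.loc["C(group)", "PR(>F)"]

print(table)
print(f"F = {f_value:.3f}, p = {p_value:.4f}")
```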

2. Two-Way ANOVA

Two-way ANOVA is used when you have two categorical independent variables. Besides the main effect of each factor, it tests whether the two factors interact, i.e. whether the effect of one factor on the dependent variable depends on the level of the other. The steps below perform a two-way ANOVA with statsmodels on the Moore dataset:

1. Import Libraries and Data:

Python
import statsmodels.api as sm
from statsmodels.formula.api import ols
import pandas as pd

# Example dataset
data = sm.datasets.get_rdataset("Moore", "carData").data
data = data.rename(columns={"partner.status": "partner_status"})

2. Fit the Model and Obtain the ANOVA Table:

Python
model = ols('conformity ~ C(fcategory) * C(partner_status)', data=data).fit()
anova_table = sm.stats.anova_lm(model, typ=2)
print(anova_table)

Output:

                                    sum_sq    df          F    PR(>F)
C(fcategory)                     11.614700   2.0   0.276958  0.759564
C(partner_status)               212.213778   1.0  10.120692  0.002874
C(fcategory):C(partner_status)  175.488928   2.0   4.184623  0.022572
Residual                        817.763961  39.0        NaN       NaN
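
One way to look at the interaction on its own is to compare the model with and without the interaction term: anova_lm also accepts several nested fitted models and tests each against the previous one. The following is a minimal sketch on a made-up balanced 2x2 design; the names factor_a, factor_b, and y are placeholders, not columns of the Moore dataset.

```python
import pandas as pd
from statsmodels.formula.api import ols
from statsmodels.stats.anova import anova_lm

# Hypothetical balanced design: 2 x 2 cells, two observations per cell.
df = pd.DataFrame({
    "factor_a": ["lo", "lo", "lo", "lo", "hi", "hi", "hi", "hi"],
    "factor_b": ["x", "x", "y", "y", "x", "x", "y", "y"],
    "y": [1.0, 3.0, 3.0, 5.0, 5.0, 7.0, 11.0, 13.0],
})

# "+" fits main effects only; "*" adds the interaction term as well.
additive = ols("y ~ C(factor_a) + C(factor_b)", data=df).fit()
full = ols("y ~ C(factor_a) * C(factor_b)", data=df).fit()

# Row 1 tests whether adding the interaction reduces the residual
# sum of squares by more than chance would explain.
comparison = anova_lm(additive, full)
print(comparison)

interaction_ss = comparison.loc[1, "ss_diff"]
interaction_f = comparison.loc[1, "F"]
```

A large F in the second row suggests that the additive model is missing a real interaction between the two factors.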

3. Repeated Measures ANOVA

Repeated measures ANOVA is used when the same subjects are measured under every treatment (i.e., repeated measurements), so observations from one subject are not independent of each other. statsmodels provides the AnovaRM class for this design.

For this example, let's assume we have a dataset where we measured the reaction time of four subjects under three different conditions.

Python
import pandas as pd
from statsmodels.stats.anova import AnovaRM

data = {
    'subject': ['S1', 'S1', 'S1', 'S2', 'S2', 'S2', 'S3', 'S3', 'S3', 'S4', 'S4', 'S4'],
    'condition': ['A', 'B', 'C', 'A', 'B', 'C', 'A', 'B', 'C', 'A', 'B', 'C'],
    'reaction_time': [10, 12, 14, 8, 9, 11, 14, 15, 16, 7, 9, 10]
}

df = pd.DataFrame(data)

# Perform Repeated Measures ANOVA
aovrm = AnovaRM(df, 'reaction_time', 'subject', within=['condition'])
res = aovrm.fit()
print(res.summary())

Output:

                Anova
======================================
          F Value Num DF Den DF Pr > F
--------------------------------------
condition 40.5000 2.0000 6.0000 0.0003
======================================
Interpreting the ANOVA Table

The ANOVA table provides several key statistics:

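  • Sum of Squares (sum_sq): For a factor row, the variation between group means that the factor accounts for; for the Residual row, the sum of the squared differences between each observation and the mean of its group.
  • Degrees of Freedom (df): The number of independent pieces of information used to calculate the sum of squares.
  • Mean Square (MS): The sum of squares divided by the degrees of freedom. It is not shown in the Type II tables above but can be computed from the sum_sq and df columns.
  • F-statistic (F): The ratio of a term's mean square to the mean square of the residuals. It is NaN in the Residual row, which serves only as the denominator.
  • p-value (PR(>F)): The probability of observing an F-statistic at least this large under the null hypothesis that all group means are equal. For example, the p-value of 0.01591 in the one-way table is below the common 0.05 threshold, so the hypothesis of equal mean weight across groups would be rejected at that level.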
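
These relationships can be checked against the one-way output shown earlier. The sketch below recomputes the mean squares, the F-statistic, and the p-value from the printed sums of squares and degrees of freedom. It relies on scipy.stats.f, which is an addition not used elsewhere in this article.

```python
from scipy import stats

# Values copied from the PlantGrowth ANOVA table printed earlier.
ss_group, df_group = 3.76634, 2
ss_resid, df_resid = 10.49209, 27

ms_group = ss_group / df_group  # mean square of the factor
ms_resid = ss_resid / df_resid  # mean square of the residuals
f_stat = ms_group / ms_resid

# Upper-tail probability of the F distribution: P(F >= f_stat)
# under the null hypothesis of equal group means.
p_value = stats.f.sf(f_stat, df_group, df_resid)

print(f"F = {f_stat:.4f}, p = {p_value:.5f}")
```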

Customizing the ANOVA Table

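You can customize the ANOVA table by choosing the type of sums of squares. In statsmodels, this is set with the typ argument of anova_lm (note the spelling: typ, not type):

  • Type I (typ=1): Sequential sums of squares. Each term is adjusted only for the terms listed before it in the formula, so the result depends on term order.
  • Type II (typ=2): Each main effect is adjusted for the other main effects, but not for interactions that contain it.
  • Type III (typ=3): Each term is adjusted for all other terms in the model, including interactions. The results are only meaningful with sum-to-zero contrasts, e.g. C(fcategory, Sum).

Python
moore = sm.datasets.get_rdataset("Moore", "carData").data.rename(columns={"partner.status": "partner_status"})
moore_lm = ols('conformity ~ C(fcategory, Sum) * C(partner_status, Sum)', data=moore).fit()
table = sm.stats.anova_lm(moore_lm, typ=3)  # Type III ANOVA
print(table)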
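
Type I and Type II sums of squares differ when the design is unbalanced. As an illustration, the sketch below uses a made-up unbalanced dataset (the factor names a and b and the response y are placeholders) to show that Type I results depend on the order of terms in the formula, whereas Type II results do not.

```python
import pandas as pd
from statsmodels.formula.api import ols
from statsmodels.stats.anova import anova_lm

# Hypothetical unbalanced design: cell counts are 3, 1, 1, 3.
df = pd.DataFrame({
    "a": ["lo", "lo", "lo", "lo", "hi", "hi", "hi", "hi"],
    "b": ["x", "x", "x", "y", "x", "y", "y", "y"],
    "y": [1.0, 2.0, 3.0, 6.0, 4.0, 8.0, 9.0, 10.0],
})

# Same additive model, with the two factors listed in opposite order.
a_first = ols("y ~ C(a) + C(b)", data=df).fit()
b_first = ols("y ~ C(b) + C(a)", data=df).fit()

# Type I (sequential): each term is adjusted only for the terms before it.
ss_a_entered_first = anova_lm(a_first, typ=1).loc["C(a)", "sum_sq"]
ss_a_entered_last = anova_lm(b_first, typ=1).loc["C(a)", "sum_sq"]

# Type II: each main effect is adjusted for the other, regardless of order.
ss_a_type2 = anova_lm(a_first, typ=2).loc["C(a)", "sum_sq"]

print(ss_a_entered_first, ss_a_entered_last, ss_a_type2)
```

In this sketch the Type II value for a matches the Type I value obtained when a is entered last, because in both cases a is adjusted for b.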

Conclusion

Obtaining an ANOVA table in statsmodels is a straightforward process. By following the steps outlined above, you can perform one-way, two-way, and repeated measures ANOVA, and interpret the results to understand the significance of your factors. This powerful method allows you to analyze complex datasets and draw meaningful conclusions about the relationships between variables.

