Fairness Metrics - Demographic Parity, Equalized Odds

Last Updated : 23 Jul, 2025

As machine learning is used in important areas like hiring, lending and healthcare, concerns about fairness and bias have grown. Since algorithms learn from data, they can pick up and even amplify unfair patterns present in that data.

Fairness metrics like Demographic Parity and Equalized Odds help check if an AI system treats different groups fairly based on factors like race, gender or income. Using these metrics can make AI more fair, ethical and responsible.

Why Fairness Matters

The goal of machine learning is to derive patterns from data and make decisions that generalize well to new, unseen examples. However, when the data used to train these systems contains biases, the predictions may disproportionately favor or disadvantage certain groups. Such situations can lead to ethical dilemmas and legal challenges.

For instance:

  1. Hiring Algorithms: A hiring algorithm might inadvertently favor male candidates if trained on historical data where men were overrepresented in managerial roles.
  2. Loan Approvals: A loan approval system might deny applications disproportionately based on zip codes if zip codes correlate with race or ethnicity.
  3. Criminal Justice: Predictive policing models may unfairly target specific communities if historical arrest data reflects biased practices.

Fairness metrics offer a quantitative way to detect such issues. Measuring them is the first step toward addressing them, and it helps teams check that machine learning systems uphold ethical principles and comply with legal frameworks like the Equal Credit Opportunity Act (ECOA) or the General Data Protection Regulation (GDPR).

Demographic Parity

Demographic Parity ensures that the probability of receiving a positive outcome (e.g., a loan approval or a job offer) is the same across all groups defined by a sensitive attribute. This metric focuses on the model's output, regardless of the actual labels.

Mathematical Formulation

Let \hat{Y} denote the predicted outcome (1 for positive and 0 for negative) and let A be the sensitive attribute (e.g., gender or race).

Demographic Parity is satisfied if:

P(\hat{Y} = 1 | A = a) = P(\hat{Y} = 1 | A = b) \quad \forall \, a, b

This implies that the rate of positive predictions is equal across all groups defined by A .

Illustrative Example

Consider a model predicting loan approvals. If men ( A = \text{male} ) have a loan approval rate of 70% but women ( A = \text{female} ) have only a 40% approval rate, the model does not satisfy Demographic Parity.

Challenges

  • Trade-off with Accuracy: Achieving demographic parity might lead to suboptimal predictions for some groups, particularly if the underlying data reflects real-world disparities in target labels.
  • Over- and Under-correction: Imposing strict demographic parity can unintentionally benefit one group at the cost of another, leading to "reverse discrimination."
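The accuracy trade-off can be seen in a small sketch. The data below is hypothetical (the names labels, groups and preds are chosen here for illustration): the two groups have different rates of positive labels, so even a model that predicts every label correctly cannot satisfy Demographic Parity.

```python
import numpy as np

# Hypothetical data: group "a" has a 60% base rate of positive labels,
# group "b" has a 20% base rate.
labels = np.array([1, 1, 1, 0, 0, 1, 0, 0, 0, 0])
groups = np.array(["a"] * 5 + ["b"] * 5)

# A perfectly accurate model predicts exactly the true labels.
preds = labels.copy()

accuracy = float((preds == labels).mean())
rate_a = float(preds[groups == "a"].mean())
rate_b = float(preds[groups == "b"].mean())
dp_gap = abs(rate_a - rate_b)

print(f"accuracy = {accuracy:.2f}")
print(f"positive rate: a = {rate_a:.2f}, b = {rate_b:.2f}")
print(f"demographic parity gap = {dp_gap:.2f}")
```

Here the accuracy is 1.00 while the positive rates are 0.60 and 0.20, so closing the gap would require changing some correct predictions.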

Implementation of the Demographic Parity check in Python

Python
import numpy as np

def check_dp(preds, sens_at):
    uniq_g = np.unique(sens_at)
    pos_rates = []  # List to store positive rates

    for g in uniq_g:
        group_in = (sens_at == g)
        pos_rate = np.mean(preds[group_in])  # Calculate the positive rate for the group
        pos_rates.append(pos_rate) 
    return np.allclose(pos_rates, pos_rates[0])

preds = np.array([1, 0, 1, 1, 0, 0, 1])
sens_at = np.array(['male', 'male', 'female', 'female', 'female', 'male', 'female'])

res = check_dp(preds, sens_at)
print(f"Does the model satisfy Demographic Parity? {res}")

Output
Does the model satisfy Demographic Parity? False
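On finite samples, exact equality of positive rates is rare, so a yes/no check is often less informative than the size of the gap. The following is a minimal sketch, not a standard API: the helper names positive_rates and dp_difference are assumptions introduced here. It reports the per-group rates and the largest difference between them for the same data as above.

```python
import numpy as np

def positive_rates(preds, sens_at):
    """Return a dict mapping each group to its positive prediction rate."""
    return {str(g): float(np.mean(preds[sens_at == g])) for g in np.unique(sens_at)}

def dp_difference(preds, sens_at):
    """Largest gap in positive prediction rate between any two groups."""
    rates = list(positive_rates(preds, sens_at).values())
    return max(rates) - min(rates)

preds = np.array([1, 0, 1, 1, 0, 0, 1])
sens_at = np.array(['male', 'male', 'female', 'female', 'female', 'male', 'female'])

rates = positive_rates(preds, sens_at)
gap = dp_difference(preds, sens_at)

print(f"Positive rates by group: {rates}")
print(f"Demographic parity difference: {gap:.3f}")
```

A gap of 0 corresponds to exact Demographic Parity. In practice a tolerance is chosen for the application, and that choice is a policy decision rather than something the metric provides.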

Equalized Odds

Unlike Demographic Parity, Equalized Odds focuses on the relationship between predictions and the true labels. It requires that the true positive rate (TPR) and false positive rate (FPR) are equal across all groups. In other words, among people with the same true label, the model's chance of predicting a positive outcome must not depend on the sensitive group.

Mathematical Formulation

For a model to satisfy Equalized Odds:

1. True Positive Rate (TPR) Equality:

P(\hat{Y} = 1 | Y = 1, A = a) = P(\hat{Y} = 1 | Y = 1, A = b) \quad \forall \, a, b

2. False Positive Rate (FPR) Equality:

P(\hat{Y} = 1 | Y = 0, A = a) = P(\hat{Y} = 1 | Y = 0, A = b) \quad \forall \, a, b

Here:

  • Y is the actual label
  • \hat{Y} is the predicted outcome
  • A is the sensitive attribute

Illustrative Example

Consider a medical diagnostic system predicting diseases. If the system has a higher TPR for men than for women, it may result in women being underdiagnosed.

Challenges

  • Implementation Complexity: Ensuring Equalized Odds often requires more advanced techniques like reweighting or adversarial learning.
  • Trade-offs with Performance: Balancing TPR and FPR across groups may lead to reduced overall accuracy.
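As one illustration of the reweighting idea mentioned above, the sketch below computes per-sample training weights in the style of the reweighing pre-processing scheme: each sample with group a and label y is weighted by P(A = a) P(Y = y) / P(A = a, Y = y). The function name reweighing_weights and the toy data are assumptions made here. Note that this balances labels across groups in the training data; on its own it does not guarantee that the trained model satisfies Equalized Odds.

```python
import numpy as np

def reweighing_weights(labels, sens_at):
    """Weight each sample by P(A=a) * P(Y=y) / P(A=a, Y=y)."""
    weights = np.ones(len(labels), dtype=float)
    for a in np.unique(sens_at):
        for y in np.unique(labels):
            cell = (sens_at == a) & (labels == y)
            if cell.any():  # skip empty (group, label) combinations
                expected = np.mean(sens_at == a) * np.mean(labels == y)
                observed = np.mean(cell)
                weights[cell] = expected / observed
    return weights

labels = np.array([1, 0, 1, 0, 1, 0, 0])
sens_at = np.array(['male', 'male', 'female', 'female', 'female', 'male', 'female'])

w = reweighing_weights(labels, sens_at)

# After reweighing, the weighted positive-label rate is the same in every group.
for g in np.unique(sens_at):
    mask = sens_at == g
    print(g, round(float(np.average(labels[mask], weights=w[mask])), 3))
```

Weights like these can be passed to any estimator that accepts per-sample weights during training.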

Implementation of the Equalized Odds check in Python

Python
from sklearn.metrics import confusion_matrix
import numpy as np

def check_eo(preds, labels, sens_at):
    """Return True if TPR and FPR are (numerically) equal across all groups."""
    uniq_g = np.unique(sens_at)
    tpr, fpr = [], []

    for g in uniq_g:
        grp_idx = (sens_at == g)  # Boolean mask selecting this group's samples
        grp_labels = labels[grp_idx]
        grp_preds = preds[grp_idx]

        # labels=[0, 1] keeps the matrix 2x2 even if a class is absent in the group
        tn, fp, fn, tp = confusion_matrix(grp_labels, grp_preds, labels=[0, 1]).ravel()
        # Note: a group with no positives (or no negatives) gives a zero denominator
        tpr.append(tp / (tp + fn))  # True Positive Rate
        fpr.append(fp / (fp + tn))  # False Positive Rate

    return np.allclose(tpr, tpr[0]) and np.allclose(fpr, fpr[0])

labels = np.array([1, 0, 1, 0, 1, 0, 0])
preds = np.array([1, 0, 1, 0, 0, 1, 0])
sens_at = np.array(['male', 'male', 'female', 'female', 'female', 'male', 'female'])

res = check_eo(preds, labels, sens_at)
print(f"Does the model satisfy Equalized Odds? {res}")

Output

Does the model satisfy Equalized Odds? False
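To see why the check fails, it helps to look at the per-group rates themselves. The sketch below uses only NumPy and the same toy data. The helper name group_rates is an assumption introduced here, and a group with no positives or no negatives is reported as NaN instead of raising a division error.

```python
import numpy as np

def group_rates(preds, labels, sens_at):
    """Return a dict mapping each group to its (TPR, FPR) pair."""
    rates = {}
    for g in np.unique(sens_at):
        mask = sens_at == g
        y, p = labels[mask], preds[mask]
        n_pos, n_neg = np.sum(y == 1), np.sum(y == 0)
        # Rates are undefined (NaN) when a group has no positives or no negatives.
        tpr = np.sum((p == 1) & (y == 1)) / n_pos if n_pos > 0 else float("nan")
        fpr = np.sum((p == 1) & (y == 0)) / n_neg if n_neg > 0 else float("nan")
        rates[str(g)] = (float(tpr), float(fpr))
    return rates

labels = np.array([1, 0, 1, 0, 1, 0, 0])
preds = np.array([1, 0, 1, 0, 0, 1, 0])
sens_at = np.array(['male', 'male', 'female', 'female', 'female', 'male', 'female'])

rates = group_rates(preds, labels, sens_at)
tpr_gap = abs(rates["male"][0] - rates["female"][0])
fpr_gap = abs(rates["male"][1] - rates["female"][1])

print(rates)
print(f"TPR gap = {tpr_gap:.2f}, FPR gap = {fpr_gap:.2f}")
```

For this data the male group has TPR 1.0 and FPR 0.5, while the female group has TPR 0.5 and FPR 0.0, so both conditions of Equalized Odds are violated.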

Trade-offs and Practical Considerations

1. Fairness vs. Utility: Striving for fairness often comes at a cost to accuracy. Organizations must carefully consider this trade-off based on the context and goals.

2. Choice of Metric: There’s no universal fairness metric. The choice depends on the specific ethical and practical constraints of the application.

3. Biased Data: Even if a model satisfies fairness metrics, it may still reflect the biases in the underlying data.

4. Legal and Social Implications: Fairness definitions must align with relevant regulations and societal values.
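The point about metric choice can be made concrete with a small sketch on hypothetical data (all names and values below are illustrative assumptions). Both groups receive positive predictions at the same rate, so Demographic Parity holds, yet the predictions are entirely correct for one group and entirely wrong for the other, so Equalized Odds fails.

```python
import numpy as np

# Hypothetical toy data: both groups have the same base rate of positive labels.
labels = np.array([1, 1, 0, 0, 1, 1, 0, 0])
preds = np.array([1, 1, 0, 0, 0, 0, 1, 1])
groups = np.array(["a"] * 4 + ["b"] * 4)

def rate(mask):
    """Share of positive predictions among the samples selected by mask."""
    return float(preds[mask].mean())

summary = {}
for g in ("a", "b"):
    in_g = groups == g
    summary[g] = {
        "pos_rate": rate(in_g),             # used by Demographic Parity
        "tpr": rate(in_g & (labels == 1)),  # used by Equalized Odds
        "fpr": rate(in_g & (labels == 0)),  # used by Equalized Odds
    }

for g, stats in summary.items():
    print(g, stats)
```

Group a and group b both have a positive rate of 0.5, but group a has TPR 1.0 and FPR 0.0 while group b has TPR 0.0 and FPR 1.0. Which of these properties matters more depends on the application.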
