As machine learning is used in high-stakes areas like hiring, lending and healthcare, concerns about fairness and bias have grown. Since algorithms learn from data, they can pick up and even amplify unfair patterns.
Fairness metrics like Demographic Parity and Equalized Odds help check whether an AI system treats groups defined by attributes such as race, gender or income fairly. Using these metrics can help make AI systems fairer, more ethical and more responsible.
Why Fairness Matters
The goal of machine learning is to derive patterns from data and make decisions that generalize well to new, unseen examples. However, when the data used to train these systems contains biases, the predictions may disproportionately favor or disadvantage certain groups. Such situations can lead to ethical dilemmas and legal challenges.
For instance:
- Hiring Algorithms: A hiring algorithm might inadvertently favor male candidates if trained on historical data where men were overrepresented in managerial roles.
- Loan Approvals: A loan approval system might deny applications disproportionately based on zip codes if zip codes correlate with race or ethnicity.
- Criminal Justice: Predictive policing models may unfairly target specific communities if historical arrest data reflects biased practices.
Fairness metrics offer a quantitative way to detect such issues. They help teams address them and support compliance with legal frameworks like the Equal Credit Opportunity Act (ECOA) or the General Data Protection Regulation (GDPR).
Demographic Parity
Demographic Parity requires that the probability of receiving a positive outcome (e.g., a loan approval or a job offer) is the same across all groups defined by a sensitive attribute. This metric looks only at the model's output, regardless of the actual labels.
Mathematical Formulation
Let \hat{Y} be the predicted outcome and A the sensitive attribute.
Demographic Parity is satisfied if:
P(\hat{Y} = 1 | A = a) = P(\hat{Y} = 1 | A = b) \quad \forall \, a, b
This implies that the rate of positive predictions is equal across all groups defined by A.
Illustrative Example
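Consider a model predicting loan approvals. If men and women are approved at the same rate, the model satisfies Demographic Parity, regardless of how many applicants in each group would actually repay.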
Challenges
- Trade-off with Accuracy: Achieving demographic parity might lead to suboptimal predictions for some groups, particularly if the underlying data reflects real-world disparities in target labels.
- Over- and Under-correction: Imposing strict demographic parity can unintentionally benefit one group at the cost of another, leading to "reverse discrimination."
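The accuracy trade-off can be seen in a small sketch. The data below is made up for illustration (the names `labels`, `groups` and the group values "a" and "b" are assumptions, not part of the original example): group "a" has a higher share of positive true labels than group "b", so even a predictor that reproduces the labels perfectly produces unequal positive prediction rates.

```python
import numpy as np

# Hypothetical labels: group "a" has 3 of 4 positives, group "b" has 1 of 4.
labels = np.array([1, 1, 1, 0, 1, 0, 0, 0])
groups = np.array(["a", "a", "a", "a", "b", "b", "b", "b"])

# A perfectly accurate predictor simply reproduces the labels.
preds = labels.copy()

accuracy = np.mean(preds == labels)
rate_a = preds[groups == "a"].mean()  # positive prediction rate, group "a"
rate_b = preds[groups == "b"].mean()  # positive prediction rate, group "b"

print(f"accuracy={accuracy:.2f}, rate_a={rate_a:.2f}, rate_b={rate_b:.2f}")
# prints: accuracy=1.00, rate_a=0.75, rate_b=0.25
```

Equalizing the two rates here would mean changing predictions that are currently correct, which is the sense in which strict demographic parity can cost accuracy.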
Implementation of the Demographic Parity check in Python
import numpy as np

def check_dp(preds, sens_at):
    """Return True if every group has the same positive prediction rate."""
    uniq_g = np.unique(sens_at)
    pos_rates = []  # Positive prediction rate of each group
    for g in uniq_g:
        group_in = (sens_at == g)  # Boolean mask selecting members of group g
        pos_rate = np.mean(preds[group_in])  # Share of positive predictions in the group
        pos_rates.append(pos_rate)
    # Parity holds when all rates match the first one (within floating-point tolerance)
    return np.allclose(pos_rates, pos_rates[0])

preds = np.array([1, 0, 1, 1, 0, 0, 1])
sens_at = np.array(['male', 'male', 'female', 'female', 'female', 'male', 'female'])
res = check_dp(preds, sens_at)
print(f"Does the model satisfy Demographic Parity? {res}")
Output
Does the model satisfy Demographic Parity? False
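Exact equality of rates rarely holds on real data, so it can be more informative to report the size of the gap than a yes/no answer. The sketch below does this for the same predictions; the helper name `dp_difference` is an assumption introduced here, and what counts as an acceptable gap depends on the application.

```python
import numpy as np

def dp_difference(preds, sens_at):
    """Largest gap in positive prediction rate between any two groups."""
    rates = [np.mean(preds[sens_at == g]) for g in np.unique(sens_at)]
    return max(rates) - min(rates)

preds = np.array([1, 0, 1, 1, 0, 0, 1])
sens_at = np.array(['male', 'male', 'female', 'female', 'female', 'male', 'female'])

gap = dp_difference(preds, sens_at)
print(f"Demographic parity difference: {gap:.3f}")
# prints: Demographic parity difference: 0.417
```

Here the female group receives positive predictions at a rate of 3/4 and the male group at 1/3, giving a gap of about 0.417.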
Equalized Odds
Unlike Demographic Parity, Equalized Odds focuses on the relationship between predictions and the true labels. It requires that the true positive rate (TPR) and false positive rate (FPR) are equal across all groups. This ensures that predictions are equally accurate (or equally erroneous) for different sensitive groups.
Mathematical Formulation
For a model to satisfy Equalized Odds:
1. True Positive Rate (TPR) Equality:
P(\hat{Y} = 1 | Y = 1, A = a) = P(\hat{Y} = 1 | Y = 1, A = b) \quad \forall \, a, b
2. False Positive Rate (FPR) Equality:
P(\hat{Y} = 1 | Y = 0, A = a) = P(\hat{Y} = 1 | Y = 0, A = b) \quad \forall \, a, b
Here:
- Y is the actual label
- \hat{Y} is the predicted outcome
- A is the sensitive attribute
Illustrative Example
Consider a medical diagnostic system predicting diseases. If the system has a higher TPR for men than for women, it may result in women being underdiagnosed.
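The underdiagnosis scenario can be made concrete with a few hypothetical records. The data and the helper `tpr` below are made up for illustration: both groups contain four patients who have the disease, but the system flags three of them among men and only two among women.

```python
import numpy as np

# Hypothetical data: in labels, 1 = has the disease; in preds, 1 = flagged by the system.
labels = np.array([1, 1, 1, 1, 0, 1, 1, 1, 1, 0])
preds = np.array([1, 1, 1, 0, 0, 1, 0, 0, 1, 0])
sex = np.array(["m"] * 5 + ["f"] * 5)

def tpr(preds, labels, mask):
    """Share of truly positive cases within `mask` that the model flags."""
    positives = mask & (labels == 1)
    return preds[positives].mean()

tpr_m = tpr(preds, labels, sex == "m")
tpr_f = tpr(preds, labels, sex == "f")
print(f"TPR men={tpr_m:.2f}, TPR women={tpr_f:.2f}")
# prints: TPR men=0.75, TPR women=0.50
```

In this toy data, half of the women who have the disease are missed, compared with a quarter of the men.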
Challenges
- Implementation Complexity: Ensuring Equalized Odds often requires more advanced techniques like reweighting or adversarial learning.
- Trade-offs with Performance: Balancing TPR and FPR across groups may lead to reduced overall accuracy.
Implementation of the Equalized Odds check in Python
import numpy as np
from sklearn.metrics import confusion_matrix

def check_eo(preds, labels, sens_at):
    """Return True if TPR and FPR are equal across all groups."""
    uniq_g = np.unique(sens_at)
    tpr, fpr = [], []
    for g in uniq_g:
        grp_idx = (sens_at == g)  # Boolean mask selecting members of group g
        grp_labels = labels[grp_idx]
        grp_preds = preds[grp_idx]
        # labels=[0, 1] fixes the matrix layout so ravel() yields tn, fp, fn, tp
        tn, fp, fn, tp = confusion_matrix(grp_labels, grp_preds, labels=[0, 1]).ravel()
        # Assumes each group contains at least one positive and one negative label
        tpr.append(tp / (tp + fn))  # True Positive Rate
        fpr.append(fp / (fp + tn))  # False Positive Rate
    return np.allclose(tpr, tpr[0]) and np.allclose(fpr, fpr[0])

labels = np.array([1, 0, 1, 0, 1, 0, 0])
preds = np.array([1, 0, 1, 0, 0, 1, 0])
sens_at = np.array(['male', 'male', 'female', 'female', 'female', 'male', 'female'])
res = check_eo(preds, labels, sens_at)
print(f"Does the model satisfy Equalized Odds? {res}")
Output
Does the model satisfy Equalized Odds? False
Trade-offs and Practical Considerations
1. Fairness vs. Utility: Striving for fairness often comes at a cost to accuracy. Organizations must carefully consider this trade-off based on the context and goals.
2. Choice of Metric: There’s no universal fairness metric. The choice depends on the specific ethical and practical constraints of the application.
3. Biased Data: Even if a model satisfies fairness metrics, it may still reflect the biases in the underlying data.
4. Legal and Social Implications: Fairness definitions must align with relevant regulations and societal values.
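The point about choice of metric can be illustrated with a minimal sketch on made-up data (the arrays and the helper `rate` are assumptions for illustration). Both groups receive positive predictions at the same rate, so Demographic Parity holds, yet the predictions are all correct for group "a" and all wrong for group "b", so Equalized Odds is violated.

```python
import numpy as np

# Hypothetical data: identical label distribution in both groups.
labels = np.array([1, 1, 0, 0, 1, 1, 0, 0])
preds = np.array([1, 1, 0, 0, 0, 0, 1, 1])
groups = np.array(["a"] * 4 + ["b"] * 4)

def rate(values, mask):
    """Mean of `values` over the rows selected by `mask`."""
    return values[mask].mean()

summary = {}
for g in ("a", "b"):
    in_g = groups == g
    pos_rate = rate(preds, in_g)              # P(pred = 1 | group)
    tpr = rate(preds, in_g & (labels == 1))   # P(pred = 1 | label = 1, group)
    fpr = rate(preds, in_g & (labels == 0))   # P(pred = 1 | label = 0, group)
    summary[g] = (pos_rate, tpr, fpr)
    print(f"group {g}: positive rate={pos_rate:.2f}, TPR={tpr:.2f}, FPR={fpr:.2f}")
# prints:
# group a: positive rate=0.50, TPR=1.00, FPR=0.00
# group b: positive rate=0.50, TPR=0.00, FPR=1.00
```

A model can therefore pass one fairness check while failing another, which is why the metric has to be chosen with the application in mind.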