AIML-411T
Advances in Machine Learning
UNIT II
B.Tech (AIML)
August, 2025
Megha Gupta
Assistant Professor, VIPS-TC
[email protected]
Syllabus
Model Interpretability and Explainability: Feature importance and SHAP
values, LIME (Local Interpretable Model-agnostic Explanations),
Explainable AI (XAI) techniques
Explainable AI (XAI) Techniques
Explainable AI (XAI) refers to methods and tools that help make machine
learning models more understandable and transparent to humans. As
complex models (like neural networks and ensemble models) are often
hard to interpret, XAI techniques aim to provide insights into how these
models make decisions, which is essential for trust, accountability,
and improving model performance.
Key XAI Techniques:
1. Feature Importance
2. SHAP (SHapley Additive exPlanations) Values
3. LIME (Local Interpretable Model-Agnostic Explanations)
Explainable artificial intelligence (XAI) refers to a collection of
procedures and techniques that enable machine learning algorithms to
produce output and results that are understandable and reliable for
human users. Explainable AI is a key component of the fairness,
accountability, and transparency (FAT) machine learning paradigm
and is frequently discussed in connection with deep learning.
Organizations looking to establish trust when deploying AI can benefit
from XAI. XAI can help them understand the behavior of an AI model
and identify potential problems such as bias.
Why is Explainable AI needed?
The need for explainable AI arises from the fact that traditional
machine learning models are often difficult to understand and
interpret. These models are typically black boxes that make
predictions based on input data but do not provide any insight into
the reasoning behind their predictions. This lack of transparency
and interpretability can be a major limitation of traditional
machine learning models and can lead to a range of problems and
challenges.
One major challenge of traditional machine learning models is that
they can be difficult to trust and verify. Because these models are
opaque and inscrutable, it can be difficult for humans to understand
how they work and how they make predictions. This lack of trust and
understanding can make it difficult for people to use and rely on
these models and can limit their adoption and deployment.
Origin of Explainable AI
The origins of explainable AI can be traced back to the early days of machine
learning research when scientists and engineers began to develop algorithms and
techniques that could learn from data and make predictions and inferences. As
machine learning algorithms became more complex and sophisticated, the need for
transparency and interpretability in these models became increasingly important,
and this need led to the development of explainable AI approaches and methods.
One of the key early developments in explainable AI was the work of Judea
Pearl, who introduced the concept of causality in machine learning, and
proposed a framework for understanding and explaining the factors that are most
relevant and influential in the model's predictions. This work laid the foundation
for many of the explainable AI approaches and methods that are used today and
provided a framework for transparent and interpretable machine learning.
Another important development in explainable AI was LIME (Local
Interpretable Model-agnostic Explanations), introduced by Ribeiro et al. in 2016,
which provides interpretable, model-agnostic explanations. The method
uses a local approximation of the model to provide insights into the factors
that are most relevant and influential in the model's predictions and has been
widely used in a range of applications and domains.
Benefits of explainable AI
1. Improved decision-making:- Explainable AI can provide valuable
insights and information that can be used to support and improve
decision-making. For example, explainable AI can provide insights into
the factors that are most relevant and influential in the model's
predictions, and can help to identify and prioritize the actions and
strategies that are most likely to achieve the desired outcome.
2. Increased trust and acceptance:- Explainable AI can help to build
trust and acceptance of machine learning models, and can overcome the
challenges and limitations of traditional machine learning models, which
are often opaque and inscrutable. This increased trust and acceptance
can help to accelerate the adoption and deployment of machine learning
models and can provide valuable insights and benefits in different
domains and applications.
3. Reduced risks and liabilities:- Explainable AI can help to reduce
the risks and liabilities of machine learning models, and can provide a
framework for addressing the regulatory and ethical considerations of
this technology. This reduced risk and liability can help to mitigate the
potential impacts and consequences of machine learning, and can
provide valuable insights and benefits in different domains and
applications.
How does Explainable AI work?
In general, explainable AI architecture can be thought of as a combination of three
key components:
Machine learning model:- The machine learning model is the core component of
explainable AI, and represents the underlying algorithms and techniques that are
used to make predictions and inferences from data. This component can be based
on a wide range of machine learning techniques, such as supervised,
unsupervised, or reinforcement learning, and can be used in a range of
applications, such as medical imaging, natural language processing, and computer
vision.
Explanation algorithm:- The explanation algorithm is the component of
explainable AI that is used to provide insights and information about the factors
that are most relevant and influential in the model's predictions. This component
can be based on different explainable AI approaches, such as feature importance,
attribution, and visualization, and can provide valuable insights into the workings
of the machine learning model.
Interface:- The interface is the component of explainable AI that is used to
present the insights and information generated by the explanation algorithm to
humans. This component can be based on a wide range of technologies and
platforms, such as web applications, mobile apps, and visualizations, and can
provide a user-friendly and intuitive way to access and interact with the insights
and information generated by the explainable AI system.
Explainable AI principles
Some of the key XAI principles include:
Transparency:- XAI should be transparent and should provide
insights and information about the factors that are most relevant and
influential in the model's predictions. This transparency can help to
build trust and acceptance of XAI and can provide valuable insights
and benefits in different domains and applications.
Interpretability:- XAI should be interpretable and should provide a
clear and intuitive way to understand and use the insights and
information generated by XAI. This interpretability can help to
overcome the challenges and limitations of traditional machine
learning models, which are often opaque and inscrutable, and can
provide valuable insights and benefits in different domains and
applications.
Accountability:- XAI should be accountable and should provide a
framework for addressing the regulatory and ethical considerations of
machine learning. This accountability can help to ensure that XAI is
used in a responsible and accountable manner, and can provide
valuable insights and benefits in different domains and applications.
Explainable AI approaches
Some of the most common explainable AI approaches include:
Feature importance:- This approach is based on the idea that each
input feature or variable contributes to the model's predictions in a
different way, and that some features are more important than others.
Feature importance techniques aim to identify and rank the
importance of each feature, and can provide insights into the factors
that are most relevant and influential in the model's predictions.
Attribution:- This approach is based on the idea that each input feature
or variable contributes to the model's predictions in a different way, and
that these contributions can be measured and quantified. Attribution
techniques aim to attribute the model's predictions to each input
feature and can provide insights into the factors that are most relevant
and influential in the model's predictions.
Visualization:- This approach is based on the idea that graphical and
visual representations can be more effective and intuitive than numerical
and textual representations in explaining and interpreting machine
learning models. Visualization techniques aim to represent the model's
structure, parameters, and predictions in a visual and interactive way and
can provide insights into the model's behavior and performance.
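To make the feature importance approach described above concrete, here is a minimal sketch,
assuming scikit-learn is available; it trains a random forest on the iris data (the same setup
reused in the LIME example later in this unit) and ranks the inputs by the model's
impurity-based feature_importances_ attribute.

# Minimal feature-importance sketch (assumes scikit-learn is installed).
# A random forest is trained on the iris data and its impurity-based
# importances are printed in descending order.
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

X, y = load_iris(return_X_y=True)
feature_names = ['sepal length', 'sepal width', 'petal length', 'petal width']

model = RandomForestClassifier(random_state=0)
model.fit(X, y)

# Rank features by their impurity-based importance scores.
ranked = sorted(zip(feature_names, model.feature_importances_),
                key=lambda pair: pair[1], reverse=True)
for name, score in ranked:
    print(f"{name}: {score:.3f}")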
Explainable AI (XAI) Techniques
LIME (Local Interpretable Model-agnostic Explanations):- LIME
is a popular XAI approach that uses a local approximation of the
model to provide interpretable and explainable insights into the
factors that are most relevant and influential in the model's
predictions. To implement LIME in Python, you can use the lime
package, which provides a range of tools and functions for
generating and interpreting LIME explanations.
SHAP (SHapley Additive exPlanations):- SHAP is an XAI approach
that uses the Shapley value from game theory to provide
interpretable and explainable insights into the factors that are most
relevant and influential in the model's predictions. To implement
SHAP in Python, you can use the shap package, which provides a
range of tools and functions for generating and interpreting SHAP
explanations.
Current Limitations of XAI
There are several current limitations of explainable AI (XAI) that are important to consider. Some
of the key limitations of XAI include:
Computational complexity:- Many XAI approaches and methods are computationally
complex, and can require significant resources and processing power to generate and
interpret the insights and information that they provide. This computational complexity can
be a challenge for real-time and large-scale applications and can limit the use and
deployment of XAI in these contexts.
Limited scope and domain-specificity:- Many XAI approaches and methods are limited in
scope and domain-specificity, and may not be applicable or relevant to all machine learning
models and applications. This limited scope and domain-specificity can be a challenge for XAI
and can limit the use and deployment of this technology in different domains and
applications.
Lack of standardization and interoperability:- There is currently a lack of
standardization and interoperability in the XAI field, and different XAI approaches and
methods may use different metrics, algorithms, and frameworks, which can make it difficult
to compare and evaluate these approaches and can limit the use and deployment of XAI in
different domains and applications.
Overall, there are several current limitations of XAI that are important to consider, including
computational complexity, limited scope and domain-specificity, and a lack of standardization
and interoperability. These limitations can be challenging for XAI and can limit the use and
deployment of this technology in different domains and applications.
Explainable AI Case studies
Medical imaging:- In the medical imaging domain, explainable AI techniques can
be used to provide insights into the factors that are most relevant and influential in
the diagnosis of diseases and conditions. For example, explainable AI techniques
can be used to identify and visualize the features that are most important in the
diagnosis of cancer and can provide insights into the factors that are most
predictive of a positive or negative outcome.
Natural language processing:- In the natural language processing domain,
explainable AI techniques can be used to provide insights into the factors that are
most relevant and influential in the interpretation and analysis of the text. For
example, explainable AI techniques can be used to identify and visualize the words
and phrases that are most important in the classification of sentiment and can
provide insights into the factors that are most predictive of positive or negative
sentiment.
Computer vision:- In the computer vision domain, explainable AI techniques can
be used to provide insights into the factors that are most relevant and influential in
the recognition and classification of images. For example, explainable AI
techniques can be used to identify and visualize the regions of an image that are
most important in the classification of objects and can provide insights into the
factors that are most predictive of a specific object class.
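As an illustrative sketch of the natural language processing case study above, the snippet
below uses lime's LimeTextExplainer on a toy sentiment classifier; the training texts, labels,
and pipeline are invented purely for illustration.

# Minimal sketch: highlighting influential words in a sentiment classifier
# (toy data; any text classifier exposing predict_proba would work here).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from lime.lime_text import LimeTextExplainer

texts = ["great movie, loved it", "terrible plot, boring",
         "wonderful acting", "awful and dull"]
labels = [1, 0, 1, 0]  # 1 = positive, 0 = negative (toy data)

clf = make_pipeline(TfidfVectorizer(), LogisticRegression())
clf.fit(texts, labels)

explainer = LimeTextExplainer(class_names=['negative', 'positive'])
exp = explainer.explain_instance("boring movie but wonderful acting",
                                 clf.predict_proba, num_features=4)
print(exp.as_list())  # (word, weight) pairs for the explained class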
LIME (or Local Interpretable Model-agnostic Explanations)
The beauty of LIME lies in its accessibility and simplicity. The core idea behind LIME,
though thorough, is really intuitive and simple! Let's dive in and see what
the name itself represents:
Model agnosticism refers to the property of LIME that lets it give
explanations for any given supervised learning model by treating it as a
'black box'. This means that LIME can handle almost any model
that exists out there in the wild!
Local explanations mean that LIME gives explanations that are locally
faithful within the surroundings or vicinity of the observation/sample being
explained.
Though LIME limits itself to supervised Machine Learning and
Deep Learning models in its current state, it is one of the most popular
and widely used XAI methods out there.
How does LIME work?
Broadly speaking, when given a prediction model and a test
sample, LIME does the following steps:
Sampling and obtaining a surrogate dataset: LIME provides locally
faithful explanations around the vicinity of the instance being explained. By
default, it produces 5000 samples (see the num_samples parameter) of the
feature vector following the normal distribution. Then it obtains the target
variable for these 5000 samples using the prediction model, whose
decisions it's trying to explain.
Feature Selection from the surrogate dataset: After obtaining the
surrogate dataset, it weighs each row according to how close it is to the
original sample/observation. Then it uses a feature selection technique
like Lasso to obtain the top important features.
LIME then fits a Ridge regression model on the weighted samples, using only the
selected features. The surrogate's output should theoretically be similar in
magnitude to the output of the original prediction model. This is done
to stress the relevance and importance of these selected features.
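The steps above can be illustrated with a hand-rolled local surrogate. This is only a
simplified sketch of the idea (perturb around the instance, weight the perturbations by
proximity, fit a regularized linear model), not the lime library's actual implementation;
the sampling scale and kernel width chosen here are arbitrary.

# Simplified sketch of LIME's idea for one instance (not the lime library itself):
# 1) sample a surrogate dataset around x0, 2) weight samples by proximity,
# 3) fit a regularized linear model whose coefficients act as the explanation.
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import Ridge

X, y = load_iris(return_X_y=True)
black_box = RandomForestClassifier(random_state=0).fit(X, y)

x0 = X[0]                                        # instance to explain
rng = np.random.default_rng(0)
samples = x0 + rng.normal(scale=X.std(axis=0), size=(5000, X.shape[1]))
preds = black_box.predict_proba(samples)[:, 0]   # probability of class 'setosa'

# Proximity weights: closer perturbations count more.
distances = np.linalg.norm(samples - x0, axis=1)
weights = np.exp(-(distances ** 2) / (2 * distances.std() ** 2))

# Local interpretable model fitted on the weighted surrogate dataset.
surrogate = Ridge(alpha=1.0).fit(samples, preds, sample_weight=weights)
print(surrogate.coef_)   # per-feature local effect on the class probability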
Coming to the installation part, we can use pip to install LIME in Python.
pip install lime
Import the required modules, such as lime, NumPy, and sklearn, by
running the following code:
import lime
import lime.lime_tabular
import numpy as np
import sklearn.datasets
import sklearn.ensemble
Load the data and train the machine learning model
# load the data and train the model
X, y = sklearn.datasets.load_iris(return_X_y=True)
model = sklearn.ensemble.RandomForestClassifier()
model.fit(X, y)
Create a LIME explainer instance
# create a LIME explainer instance
explainer = lime.lime_tabular.LimeTabularExplainer(
    X,
    feature_names=['sepal length', 'sepal width', 'petal length', 'petal width'],
    class_names=['setosa', 'versicolor', 'virginica']
)
Generate the LIME explanation
# generate the LIME explanation
exp = explainer.explain_instance(X[0], model.predict_proba,
num_features=4)
Save the generated LIME explanation output to an HTML file
named op.html
# write the explanation to an HTML file
with open('op.html', 'w', encoding="utf-8") as file:
    file.write(exp.as_html())
Output: the generated explanation rendered as an HTML page (op.html).
Shapley Additive Explanations
(SHAP)
SHapley Additive exPlanations (SHAP) is a model-agnostic method, which means
that it is not restricted to a certain model type, and it is a local method,
which means that it provides explanations for individual samples.
However, the individual explanations can also be used to
obtain global interpretations. SHAP was introduced in 2017 by Lundberg et al.
To summarize, SHAP is a method that enables a fast computation of
Shapley values and can be used to explain the prediction of an instance x
by computing the contribution (Shapley value) of each feature to the
prediction. We get contrastive explanations that compare the prediction
with the average prediction. The fast computation makes it possible to
compute the many Shapley values needed for the global model
interpretations. With SHAP, global interpretations are consistent with the
local explanations, since the Shapley values are the “atomic unit” of the
global interpretations. If you use LIME for local explanations and
permutation feature importance for global explanations, you lack a
common foundation. SHAP provides KernelSHAP, an alternative, kernel-
based estimation approach for Shapley values inspired by local surrogate
models, as well as TreeSHAP, an efficient estimation approach for tree-
based models.
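As a minimal sketch of how the shap package is typically used, the snippet below applies
TreeExplainer (the TreeSHAP estimator) to a random forest regressor trained on scikit-learn's
diabetes data set; the data set and model are chosen only for illustration.

# Minimal SHAP sketch (requires: pip install shap): TreeSHAP on a tree-based regressor.
import shap
from sklearn.datasets import load_diabetes
from sklearn.ensemble import RandomForestRegressor

X, y = load_diabetes(return_X_y=True, as_frame=True)
model = RandomForestRegressor(random_state=0).fit(X, y)

# TreeExplainer implements TreeSHAP, the efficient estimator for tree models.
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X)       # shape: (n_samples, n_features)

# Local explanation: per-feature contributions for the first instance,
# which (together with the baseline expected_value) sum to its prediction.
print(dict(zip(X.columns, shap_values[0])))
print("baseline:", explainer.expected_value)

# Global view: aggregate the per-instance Shapley values across the data set.
shap.summary_plot(shap_values, X)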
How to compute Shapley Values?
To summarize, focusing on a specific instance j, we can compute the Shapley value for each
feature k = 1, …, K. Let's choose feature A (a player, in game-theory terms) and compute its
Shapley value ϕ_A as the weighted sum of its marginal contributions over the different
subsets s of the remaining features 𝓕 \ {A} (𝓕 being the set of all K features):

$$\phi_A = \sum_{s \subseteq \mathcal{F} \setminus \{A\}} \frac{|s|!\,(K-|s|-1)!}{K!}\,\bigl[\, f(s \cup \{A\}) - f(s) \,\bigr]$$

where each marginal contribution f(s ∪ {A}) − f(s) is the difference between the prediction on a
subset containing the feature of interest and the prediction on the same subset without that feature.
For the weights, note that the smallest and the largest subsets receive the highest weights.
To calculate each of those marginal contributions, we’re required to determine the
predicted value across all potential feature subsets. This necessitates computing
the prediction a total of 2^K times, considering every possible combination of
features. Theoretically, model retraining for each subset is necessary, but SHAP
approximates these predictions by leveraging the model trained on the entire feature set.
Indeed, we can use this model to run inference on various coalitions by
manipulating feature values (often by shuffling or altering them) and estimate
the impact of different feature subsets without the need for full retraining.
Once we compute the Shapley value ϕ_k for each feature k, we can use them to
understand the impact each feature has on the prediction for the specific instance j we
are focusing on. Moreover, a very important property of the Shapley values is that
they add up to the difference between the model prediction (with all the features)
and the baseline, giving a way to estimate how much each feature contributes to
deviating the prediction from the baseline:

$$f(x_j) = \phi_0 + \sum_{k=1}^{K} \phi_k$$

where f(x_j) is the prediction for the instance (the one considering all the available
features) and ϕ_0 is called the baseline, i.e. the prediction computed when all features
are excluded. In a tabular data set it is often taken to be the average prediction
over all the instances in the data set. In practical situations, because of the
numerous approximations involved, this property may not be fulfilled exactly.
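To make the weighted sum concrete, here is a small self-contained sketch that computes exact
Shapley values by enumerating every coalition of features, given a lookup table of subset
predictions; the numbers in the table are made-up placeholders, not the values used in the
exercise that follows.

# Exact Shapley values by enumerating all coalitions of K features.
# `predictions` maps each subset of features to the model's prediction on it;
# the numbers here are illustrative placeholders only.
from itertools import combinations
from math import factorial

features = ['A', 'B', 'C']
predictions = {
    frozenset(): 22.0,                      # baseline (no features)
    frozenset('A'): 23.0, frozenset('B'): 22.5, frozenset('C'): 21.5,
    frozenset('AB'): 24.0, frozenset('AC'): 22.8, frozenset('BC'): 22.2,
    frozenset('ABC'): 24.2,                 # prediction with all features
}

def shapley(feature):
    K = len(features)
    others = [f for f in features if f != feature]
    value = 0.0
    for size in range(len(others) + 1):
        for subset in combinations(others, size):
            s = frozenset(subset)
            weight = factorial(len(s)) * factorial(K - len(s) - 1) / factorial(K)
            value += weight * (predictions[s | {feature}] - predictions[s])
    return value

phi = {f: shapley(f) for f in features}
print(phi)
# Sanity check: the values add up to prediction(ABC) minus the baseline.
print(sum(phi.values()), predictions[frozenset('ABC')] - predictions[frozenset()])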
Exercise
Consider a simplified version of the Boston housing data set (link) that collects
information about the percentage of the population that is working class, the
number of rooms, and the nitric oxides concentration (parts per million) of a
house. For simplicity, we will call these features A, B, and C. Table 1 shows a
small example of the dataset.
Let’s imagine you trained a machine learning model to predict the house
price (regression problem) and you want to explain the results. Specifically,
you want to understand which features are impacting your new prediction.
Considering that you trained the model and you are able to run inference,
you can have the value of the prediction for all the combinations of features
(we exclude features by shuffling them); the values of the prediction on all
the possible subsets for features are summarized in Figure 1.
Task: Compute the Shapley values ϕ_A, ϕ_B, ϕ_C for the instance house_1.
Figure 1: Values of inference for different feature combinations.
Solution
Let’s compute the Shapley value of each feature for the first
instance, house_1, knowing that the model prediction is
F(house_1) = 24.2K$; this corresponds to the prediction made
considering all the available features (model(A, B, C)).
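As a reminder of the structure of the computation, with K = 3 features the weighted sum for
feature A, specialised from the general formula above, reads:

$$\phi_A = \tfrac{1}{3}\bigl[f(\{A\}) - f(\varnothing)\bigr] + \tfrac{1}{6}\bigl[f(\{A,B\}) - f(\{B\})\bigr] + \tfrac{1}{6}\bigl[f(\{A,C\}) - f(\{C\})\bigr] + \tfrac{1}{3}\bigl[f(\{A,B,C\}) - f(\{B,C\})\bigr]$$

with the analogous expressions for features B and C obtained by exchanging the roles of the
features; the empty set and the two-feature complement carry weight 1/3, while the
single-feature subsets carry weight 1/6.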
Feature A
We compute all the marginal contributions first:
Thus,
where the sets s₁ and s₄ have higher weights (1/3 each).
Feature B
Similarly:
Thus,
Feature C
Similarly:
Thus:
Putting everything together:
We can conclude that features A and B contribute positively (increasing
predicted value), while feature C contributes negatively (reducing the
prediction).
SHAP values plotted in red (positive contributions) and blue
(negative contributions).
FEATURE EXTRACTION