MLquestions
What is cross-validation and why is it used?
What is a hyperparameter?
Hyperparameters are parameters that are not learned from the data
but are set manually before training a machine learning model.
They define the characteristics of the model and affect its
learning process, performance, and generalization ability.
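A small illustrative sketch (assuming scikit-learn; the model and values are placeholders, not from the text above) showing the difference between hyperparameters, which are set by hand before training, and parameters, which are learned during fitting:

```python
# Hyperparameters vs. learned parameters: a minimal sketch.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=200, random_state=0)

# Hyperparameters: chosen manually (or via a search) before training starts.
model = RandomForestClassifier(n_estimators=50, max_depth=3, random_state=0)

# Parameters: the split thresholds inside each tree are learned from the data here.
model.fit(X, y)
print("training accuracy:", model.score(X, y))
```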
Boosting
- Boosting involves creating a sequence of models, where each
subsequent model focuses on correcting the mistakes made
by the previous models
- Each model in the sequence is trained on a modified version of the training data, where instances that were misclassified by previous models are given higher weights or importance
- The models are typically created sequentially, with each model
trying to improve the overall performance of the ensemble
- During prediction, each model's prediction is weighted based on its performance, and the final prediction is determined by combining the weighted predictions of all the models (a minimal sketch of this process follows the list)
- Examples of boosting algorithms include AdaBoost, Gradient Boosting, and XGBoost
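A minimal AdaBoost-style sketch of the reweighting idea (not a full implementation; assumes scikit-learn decision stumps as weak learners and labels encoded as -1/+1, with illustrative data and round counts):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=300, random_state=0)
y = np.where(y == 0, -1, 1)              # encode labels as -1 / +1

n_rounds = 5
weights = np.full(len(y), 1.0 / len(y))  # start with equal instance weights
models, alphas = [], []

for _ in range(n_rounds):
    stump = DecisionTreeClassifier(max_depth=1)
    stump.fit(X, y, sample_weight=weights)           # train on reweighted data
    pred = stump.predict(X)
    err = np.sum(weights * (pred != y)) / np.sum(weights)
    alpha = 0.5 * np.log((1 - err) / (err + 1e-10))  # model weight from its error
    weights *= np.exp(-alpha * y * pred)             # misclassified points get heavier
    weights /= weights.sum()
    models.append(stump)
    alphas.append(alpha)

# Final prediction: sign of the alpha-weighted sum of the weak learners' votes.
ensemble_pred = np.sign(sum(a * m.predict(X) for a, m in zip(alphas, models)))
print("training accuracy:", np.mean(ensemble_pred == y))
```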
What is an ensemble?
An ensemble is a combination of multiple individual models whose predictions are aggregated to produce a single, usually stronger, prediction. Ensemble learning is widely used across various domains and has proven to be an effective technique in improving machine learning models' performance.
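As a rough sketch of the idea (assuming scikit-learn; the specific models are arbitrary choices for illustration), three different classifiers can be combined so that their majority vote becomes the ensemble's prediction:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=300, random_state=0)

ensemble = VotingClassifier(
    estimators=[
        ("logreg", LogisticRegression(max_iter=1000)),
        ("tree", DecisionTreeClassifier(max_depth=3)),
        ("nb", GaussianNB()),
    ],
    voting="hard",  # each model casts one vote; the majority class wins
)
ensemble.fit(X, y)
print("training accuracy:", ensemble.score(X, y))
```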
Theory
Explanation
In the given example of AdaBoost with three sequential models, each model's prediction is
assigned a weight based on its performance on the training data. The weights reflect the
model's ability to correctly classify instances in the training set. The final prediction is
obtained by combining the weighted predictions of all the models.
1. Model 1 predicts class A with a weight of 0.5: Model 1 is the first model in the
boosting process. It makes predictions on the training data, and based on its
performance, it is assigned a weight of 0.5. This weight indicates the model's
influence on the final prediction. If Model 1 performs well, its weight will be higher,
indicating that its predictions are more reliable.
2. Model 2 predicts class B with a weight of 0.2: Model 2 is built to correct the
mistakes made by Model 1. It focuses on the instances that Model 1 struggled to
classify correctly. After training, Model 2 makes predictions on the training data, and
its performance determines its weight. In this case, Model 2 is assigned a weight of
0.2, indicating its lower influence on the final prediction compared to Model 1.
3. Model 3 predicts class A with a weight of 0.3: Model 3 is built to further improve the
accuracy by focusing on the instances that both Model 1 and Model 2 failed to
classify correctly. After training, Model 3 makes predictions on the training data, and
its performance determines its weight. In this example, Model 3 is assigned a
weight of 0.3.
To obtain the final prediction, the weighted predictions of all the models are combined. The
weights indicate the importance of each model's prediction in the final outcome. In this
case, the final prediction would involve a weighted combination of the predictions made by
Model 1, Model 2, and Model 3, with weights of 0.5, 0.2, and 0.3, respectively.
The purpose of assigning weights is to emphasize the predictions of models that perform
well on the training data while downweighting the ones that struggle. By iteratively training
new models to focus on the challenging instances, boosting aims to improve the overall
performance and reduce bias. The weights allow the ensemble to give more consideration
to the models that have demonstrated better predictive ability on the training data.
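As a quick worked illustration of this weighted combination, using the three predictions and weights from the example above: class A accumulates 0.5 + 0.3 = 0.8 versus 0.2 for class B, so the ensemble predicts class A.

```python
# Weighted vote from the three-model example above.
predictions = ["A", "B", "A"]      # Model 1, Model 2, Model 3
weights = [0.5, 0.2, 0.3]

# Sum the weights behind each class and pick the class with the largest total.
scores = {}
for cls, w in zip(predictions, weights):
    scores[cls] = scores.get(cls, 0.0) + w

print(scores)                        # {'A': 0.8, 'B': 0.2} (up to floating-point rounding)
print(max(scores, key=scores.get))   # 'A' wins the weighted vote
```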
In bagging, each individual model is trained on a different subset of the training data using
the same learning algorithm. During prediction, the individual models independently make
their predictions, and the final prediction is determined by aggregating the predictions
through voting (for classification) or averaging (for regression).
Let's consider an example of bagging in a classification problem with three individual
models. Each model is a decision tree trained on a different subset of the training data:
In the case of averaging, the final prediction is obtained by averaging the predicted
probabilities or scores assigned to each class by the individual models. Let's assume the
predicted probabilities for class A by the three models are as follows:
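A small sketch with assumed probabilities (0.9, 0.6, and 0.75, illustrative values only) showing both averaging and majority voting:

```python
from collections import Counter

# Averaging: mean of the per-model probabilities for class A.
probs_class_a = [0.9, 0.6, 0.75]            # P(class A) from the three trees (assumed values)
avg_prob_a = sum(probs_class_a) / len(probs_class_a)
print(avg_prob_a)                            # roughly 0.75
print("A" if avg_prob_a >= 0.5 else "B")     # averaged prediction: class A

# Majority voting works on the hard class labels instead of probabilities.
votes = ["A", "A", "B"]                      # hard predictions from the three trees
print(Counter(votes).most_common(1)[0][0])   # 'A' wins the vote
```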
For example, consider a boosting algorithm like AdaBoost with three sequential models, as in the weighted-prediction example described above.
Voting/Averaging in bagging and boosting allows the ensemble models to benefit from the
diversity and collective wisdom of the individual models. It helps to improve the overall
predictive accuracy, reduce bias, and handle uncertainties in the data. The specific voting
or averaging mechanism depends on the problem type (classification or regression) and
the ensemble algorithm being used.
1. Bagging
Bagging algorithms, such as Random Forest, aim to reduce variance while
maintaining low bias. Here's how the bias-variance tradeoff plays out:
Bias: Each individual model in the ensemble is typically trained on a subset of the
data and has limited access to the entire dataset. Therefore, each model has a
certain degree of bias. However, as the ensemble combines multiple models, the
collective bias tends to decrease.
Example: Consider a classification problem where a Random Forest ensemble with 100
decision trees is used. Each tree is trained on a different subset of the training data.
Individually, the trees might have high variance and overfit the training data. However, by
averaging their predictions, the ensemble can reduce variance and provide more robust
predictions.
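A rough sketch of this comparison (assuming scikit-learn; the synthetic dataset and any resulting scores are illustrative only):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_informative=10, random_state=0)

single_tree = DecisionTreeClassifier(random_state=0)
forest = RandomForestClassifier(n_estimators=100, random_state=0)

# The forest's averaged predictions are typically more stable than a single tree's.
print("single tree:", cross_val_score(single_tree, X, y, cv=5).mean())
print("random forest:", cross_val_score(forest, X, y, cv=5).mean())
```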
2. Boosting
Boosting algorithms, such as AdaBoost and Gradient Boosting, aim to reduce bias
while controlling variance. Let's see how the bias-variance tradeoff applies:
Bias: Boosting algorithms initially start with a weak model, which typically has high
bias. The subsequent models are then trained to focus on the instances that the
previous models struggled with, thereby reducing bias. As the boosting process
continues, the ensemble gradually reduces the bias and improves the overall
model's accuracy.
Example: Suppose we have a binary classification problem where AdaBoost is employed.
Initially, the weak model (e.g., a decision stump) might have limited predictive power and
high bias. Subsequent models are then built to address the misclassified instances. Each
model focuses on different regions of the data, trying to reduce bias. However, as the
ensemble grows, there is a potential for higher variance due to the complex nature of the
model.
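A rough sketch of this setup (assuming scikit-learn; the synthetic data is a placeholder), comparing a single decision stump with an AdaBoost ensemble of stumps:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, random_state=0)

stump = DecisionTreeClassifier(max_depth=1)                      # high-bias weak learner
boosted = AdaBoostClassifier(n_estimators=100, random_state=0)   # uses stumps by default

# Boosting should recover much of the accuracy the single stump leaves behind.
print("single stump:", cross_val_score(stump, X, y, cv=5).mean())
print("AdaBoost:", cross_val_score(boosted, X, y, cv=5).mean())
```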
In summary, bagging aims to reduce variance by combining models with different biases,
while boosting aims to reduce bias by iteratively improving the models at the cost of
potentially increased variance. The tradeoff between bias and variance in bagging and
boosting algorithms depends on the specific ensemble method used, the number of
models in the ensemble, and the characteristics of the training data. The goal is to strike a
balance that minimizes the overall error and provides good generalization to unseen data.
The purpose of dimensionality reduction in machine learning is to
reduce the number of features or variables in a dataset while
preserving the most relevant information. Dimensionality
reduction techniques are employed when working with high-
dimensional datasets that contain a large number of features,
which can lead to various challenges and limitations. Some of
the main purposes of dimensionality reduction include:
- Reducing overfitting: with fewer input features, the model becomes less prone to overfitting and focuses on the most informative features.
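A minimal dimensionality-reduction sketch (assuming scikit-learn and PCA as one common technique; the feature counts are arbitrary):

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA

X, y = make_classification(n_samples=300, n_features=20, random_state=0)

# Project the 20 original features onto the 5 directions that retain the most variance.
pca = PCA(n_components=5)
X_reduced = pca.fit_transform(X)

print(X.shape, "->", X_reduced.shape)        # (300, 20) -> (300, 5)
print(pca.explained_variance_ratio_.sum())   # fraction of the variance that was kept
```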
5. **Training the Model**: Feed the training data into the chosen
model and allow it to learn the patterns and relationships in the
data. The model adjusts its internal parameters during the training
process to minimize the chosen loss or cost function.
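A minimal sketch of this training step (assuming scikit-learn and logistic regression as the chosen model; the split and settings are placeholders):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=400, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)                 # iteratively minimizes the logistic loss
print("learned coefficients shape:", model.coef_.shape)
print("held-out accuracy:", model.score(X_test, y_test))
```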
Q)-What are precision and recall, and how are they related to the
concept of accuracy?
Precision, recall, and accuracy are related to each other, but they
focus on different aspects of the classification performance:
- Precision measures how many of the instances predicted as positive are actually positive (TP / (TP + FP)).
- Recall measures how many of the actual positive instances are correctly identified (TP / (TP + FN)).
- Accuracy measures the overall proportion of correct predictions ((TP + TN) / (TP + TN + FP + FN)).
Precision and recall become especially important when classes are imbalanced or when the
cost of false positives or false negatives differs. Accuracy, while
important, may not fully capture the performance in such cases. It is
essential to consider precision, recall, and accuracy together to gain
a comprehensive understanding of a classifier's performance.
Confusion matrix:

|                 | Predicted Positive  | Predicted Negative  |
|-----------------|---------------------|---------------------|
| Actual Positive | True Positive (TP)  | False Negative (FN) |
| Actual Negative | False Positive (FP) | True Negative (TN)  |

In this table, TP and TN are correct predictions, while FP and FN are the two kinds of errors that precision and recall focus on.
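A small sketch computing precision, recall, and accuracy from such a confusion matrix (the counts below are assumed values for illustration):

```python
# Illustrative confusion-matrix counts.
tp, fn = 80, 20   # actual positives: correctly vs. incorrectly classified
fp, tn = 10, 90   # actual negatives: incorrectly vs. correctly classified

precision = tp / (tp + fp)                  # 80 / 90  ≈ 0.889
recall = tp / (tp + fn)                     # 80 / 100 = 0.800
accuracy = (tp + tn) / (tp + fp + fn + tn)  # 170 / 200 = 0.850

print(f"precision={precision:.3f} recall={recall:.3f} accuracy={accuracy:.3f}")
```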
Q)-How do you evaluate an ML model?