ML-21AI63
Definition:
Machine Learning (ML) is a branch of artificial intelligence in which systems learn patterns from data and improve their performance on a task with experience, without being explicitly programmed.
Importance:
- Predictive Analytics: ML predicts future trends based on historical data, crucial for
financial forecasting, medical diagnosis, etc.
Applications:
- Supervised Learning: The algorithm learns from labeled data to make predictions.
Example: Spam email detection.
Issues:
- Bias and Fairness: ML models can inherit biases from training data, leading to unfair or
discriminatory outcomes.
- Data Privacy: Collecting and using large amounts of personal data can lead to privacy
concerns.
- Overfitting: Models might perform well on training data but poorly on unseen data due
to excessive complexity.
Perspectives:
Steps:
1. Problem Definition: Clearly define the problem and objectives. Example: Predicting
house prices.
2. Data Collection: Gather relevant data needed for training. Example: Historical
property sales data.
3. Data Preparation: Clean and preprocess the data to ensure quality. Example:
Handling missing values and normalizing features.
4. Model Selection: Choose an appropriate ML model. Example: Linear regression for
predicting house prices.
5. Training: Train the model using the prepared data. Example: Fitting the linear
regression model on the historical data.
6. Evaluation: Assess the model's performance using metrics. Example: Mean squared
error for regression tasks.
7. Deployment: Deploy the trained model into a production environment to make predictions on new data.
8. Monitoring and Maintenance: Continuously monitor and update the model to ensure its relevance and accuracy. (A minimal code sketch of steps 2-6 follows this list.)
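A minimal end-to-end sketch of steps 2-6, using a synthetic dataset in place of real property-sales data (the dataset, split ratio, and metric choice here are illustrative assumptions, not from the original notes):
python
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
# Steps 2-3: collect and prepare data (synthetic stand-in for property sales data)
X, y = make_regression(n_samples=200, n_features=3, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
# Steps 4-5: select and train a model
model = LinearRegression()
model.fit(X_train, y_train)
# Step 6: evaluate with mean squared error
mse = mean_squared_error(y_test, model.predict(X_test))
print('MSE:', mse)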
4. Well-Posed Learning Problems
Definition:
A well-posed learning problem is one in which the task, the performance measure, and the training experience are clearly specified, so it is unambiguous what the learner must improve and how that improvement is measured.
Examples:
- Classification Problem: Given a dataset of emails labeled as spam or not spam, a well-
posed problem is to classify new emails into these categories based on their features.
- Regression Problem: Predicting house prices based on features like size and location.
The problem is well-posed if there is a clear relationship between features and prices.
5. Concept Learning
Definition:
Concept learning is a type of supervised learning where the goal is to learn a general
concept or class from examples.
Task:
The task is to generalize from specific examples to make predictions about new
instances.
Concept learning can be viewed as a search task where the algorithm searches through
a space of possible hypotheses to find the one that best fits the training data. Example:
Learning the concept of “triangle” by searching through geometric shapes to identify
those that meet the criteria of having three sides.
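One classic way to carry out this search is the Find-S algorithm, which starts from the most specific hypothesis and generalizes it only as far as the positive examples require. Below is a minimal sketch on made-up attribute vectors (the attributes, labels, and examples are illustrative assumptions):
python
def find_s(examples):
    """Return the most specific hypothesis consistent with the positive examples."""
    hypothesis = None
    for attributes, label in examples:
        if label != 'yes':
            continue  # Find-S ignores negative examples
        if hypothesis is None:
            hypothesis = list(attributes)  # start with the first positive example
        else:
            # Generalize any attribute that disagrees to the wildcard '?'
            hypothesis = [h if h == a else '?' for h, a in zip(hypothesis, attributes)]
    return hypothesis

# Toy data: (shape, size) -> does the instance belong to the target concept?
examples = [(('triangle', 'small'), 'yes'),
            (('triangle', 'large'), 'yes'),
            (('square', 'small'), 'no')]
print(find_s(examples))  # ['triangle', '?']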
6. Definitions
Unsupervised Learning: A model that learns patterns from unlabeled data. Example: Clustering algorithms like K-means, which group similar data points without predefined labels.
Inductive Learning Hypothesis: A hypothesis in machine learning that assumes the learning process can generalize from specific examples to broader patterns. It guides the creation of models that make predictions on unseen data.
Version Space: The set of all hypotheses that are consistent with the training data. Example: If the training data consists of positive and negative examples of fruits, the version space includes all hypotheses that classify fruits correctly as positive or negative.
- Specific Boundary: Represents the most specific hypotheses consistent with the data. Example: "A fruit is an apple" is a specific boundary.
- General Boundary: Represents the most general hypotheses still consistent with the data. Example: "Any fruit" is a general boundary.
- Multilabel Classification:
python
from sklearn.multioutput import MultiOutputClassifier
from sklearn.svm import SVC
# X_train: features; y_train: a (n_samples, n_labels) matrix with one column per label
clf = MultiOutputClassifier(SVC())
clf.fit(X_train, y_train)
- Multiclass Classification:
python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
# Load dataset
data = load_iris()
X = data.data
y = data.target
# Train-test split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Train model
clf = SVC()
clf.fit(X_train, y_train)
y_pred = clf.predict(X_test)
- Multioutput Classification:
python
from sklearn.multioutput import MultiOutputClassifier
from sklearn.linear_model import LogisticRegression
# X: features; y: a 2-D target array with one column per output
clf = MultiOutputClassifier(LogisticRegression())
clf.fit(X, y)
- Confusion Matrix:
python
import numpy as np
from sklearn.metrics import confusion_matrix
# y_true, y_pred: true and predicted labels from a fitted classifier
cm = confusion_matrix(y_true, y_pred)
print('Confusion Matrix:\n', cm)
- Precision and Recall:
python
from sklearn.metrics import precision_score, recall_score
precision = precision_score(y_true, y_pred, average='macro')
recall = recall_score(y_true, y_pred, average='macro')
print(f'Precision: {precision}')
print(f'Recall: {recall}')
- Cross-Validation:
python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC
# Load dataset
data = load_iris()
X = data.data
y = data.target
# Initialize model
clf = SVC()
# Perform cross-validation
scores = cross_val_score(clf, X, y, cv=5)
print('Cross-validation scores:', scores)
- Batch Gradient Descent: Uses the entire dataset to compute the gradient of the loss
function. It's precise but can be slow for large datasets.
- Stochastic Gradient Descent (SGD): Updates the parameters using one or a few
samples at a time. This provides faster convergence and can handle larger datasets but
introduces more variance in updates.
Pros: Faster updates, good for large datasets.
- Mini-batch Gradient Descent: Updates the parameters using small batches of samples, striking a balance between batch gradient descent and SGD (a minimal sketch of the update rules follows this list).
Pros: Faster than batch gradient descent, less noisy than SGD.
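A minimal NumPy sketch contrasting the two update styles on a one-parameter linear model (the synthetic data, learning rate, and epoch counts are illustrative assumptions):
python
import numpy as np

np.random.seed(0)
X = np.random.rand(200)
y = 3 * X + np.random.randn(200) * 0.1  # true slope is 3

def batch_gd(X, y, lr=0.1, epochs=100):
    w = 0.0
    for _ in range(epochs):
        grad = -2 * np.mean((y - w * X) * X)  # gradient over the full dataset
        w -= lr * grad
    return w

def sgd(X, y, lr=0.1, epochs=100):
    w = 0.0
    for _ in range(epochs):
        for i in np.random.permutation(len(X)):
            grad = -2 * (y[i] - w * X[i]) * X[i]  # gradient from a single sample
            w -= lr * grad
    return w

print(batch_gd(X, y), sgd(X, y))  # both estimates should approach 3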
Error Analysis: Involves evaluating where a classification model makes errors and
understanding the reasons behind them. It helps in improving the model by identifying
patterns or specific classes where the model performs poorly.
Example: Consider a multiclass classification problem with three classes (A, B, C). If
the model frequently misclassifies Class A as Class B, the confusion matrix will show a
high number of false negatives for Class A and false positives for Class B. Analyzing this
can help identify if there are overlapping features or if the model needs additional
training data.
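A short sketch of this kind of inspection, assuming y_true and y_pred already exist for a fitted multiclass model (the variable names are placeholders):
python
import numpy as np
from sklearn.metrics import confusion_matrix

# Row i = true class, column j = predicted class; off-diagonal cells are errors
classes = np.unique(np.concatenate([y_true, y_pred]))
cm = confusion_matrix(y_true, y_pred, labels=classes)
for i, true_cls in enumerate(classes):
    for j, pred_cls in enumerate(classes):
        if i != j and cm[i, j] > 0:
            print(f'{cm[i, j]} samples of class {true_cls} misclassified as {pred_cls}')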
- L1 Regularization (Lasso): Adds the absolute values of the weights to the loss function, encouraging sparsity by driving some weights to exactly zero.
python
from sklearn.linear_model import Lasso
model = Lasso(alpha=0.1)
model.fit(X_train, y_train)
- L2 Regularization (Ridge): Adds the squared values of the weights to the loss function,
encouraging smaller weights but not necessarily zeroing them out.
python
from sklearn.linear_model import Ridge
model = Ridge(alpha=1.0)
model.fit(X_train, y_train)
- Polynomial Kernel: Computes the dot product of the input features raised to a
polynomial degree, allowing for learning non-linear relationships.
python
from sklearn.svm import SVC
model = SVC(kernel='poly', degree=3)
model.fit(X_train, y_train)
- Gaussian (RBF) Kernel: Measures the similarity between points using a Gaussian function, effective for capturing complex relationships.
python
from sklearn.svm import SVC
model = SVC(kernel='rbf', gamma='scale')
model.fit(X_train, y_train)
Training Process:
python
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
# Create a dataset and split it
X, y = make_regression(n_samples=100, n_features=1, noise=0.1)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Fit a linear regression model and predict on the test set
model = LinearRegression()
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
Quadratic Programming (QP): Used in SVMs to solve the optimization problem of finding
the optimal hyperplane by maximizing the margin and minimizing classification errors.
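In standard soft-margin notation, the QP being solved is: minimize (1/2)·||w||² + C·Σᵢ ξᵢ subject to yᵢ(w·xᵢ + b) ≥ 1 − ξᵢ and ξᵢ ≥ 0, where minimizing ||w|| maximizes the margin and the slack variables ξᵢ account for classification errors weighted by C.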
Kernelized SVM: Uses kernel functions to map data into higher-dimensional space to
handle non-linearly separable data.
python
from sklearn.svm import SVC
model = SVC(kernel='rbf')  # any non-linear kernel, e.g. RBF
model.fit(X_train, y_train)
16. Decision Trees and CART Training Algorithm
(i) Decision Trees: A model that splits data into subsets based on feature values, leading
to a tree structure with branches representing decisions.
How they are used: Decision trees are used for both classification and regression tasks,
providing a visual representation of decision-making.
python
from sklearn.tree import DecisionTreeClassifier
clf = DecisionTreeClassifier()
clf.fit(X_train, y_train)
(ii) CART (Classification and Regression Trees): A decision tree algorithm that uses
binary splits to partition data and is used for classification and regression tasks.
python
from sklearn.tree import DecisionTreeClassifier
clf = DecisionTreeClassifier(criterion='gini')  # scikit-learn trees use the CART algorithm
clf.fit(X_train, y_train)
- Bagging (Bootstrap Aggregating): Trains multiple models on random subsets of the data drawn with replacement and aggregates their predictions.
python
from sklearn.ensemble import BaggingClassifier
model = BaggingClassifier(n_estimators=50, bootstrap=True)  # sampling with replacement
model.fit(X_train, y_train)
- Pasting: Similar to bagging but uses subsets of the data without replacement.
python
from sklearn.ensemble import BaggingClassifier
model = BaggingClassifier(n_estimators=50, max_samples=0.8, bootstrap=False)  # without replacement
model.fit(X_train, y_train)
Voting Classifier: Combines predictions from multiple models by voting, either using hard voting (majority vote) or soft voting (average probability).
python
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import VotingClassifier
# Initialize classifiers
clf1 = LogisticRegression()
clf2 = SVC(probability=True)
clf3 = DecisionTreeClassifier()
# Voting classifier (soft voting averages predicted class probabilities)
voting_clf = VotingClassifier(estimators=[('lr', clf1), ('svc', clf2), ('dt', clf3)], voting='soft')
voting_clf.fit(X_train, y_train)
(i) AdaBoost: A boosting technique that trains weak learners sequentially, increasing the weights of misclassified samples so that later learners focus on the hard cases.
python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import AdaBoostClassifier
# Load data and split
data = load_iris()
X = data.data
y = data.target
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Initialize AdaBoost with a decision stump as the weak learner
base_model = DecisionTreeClassifier(max_depth=1)
model = AdaBoostClassifier(estimator=base_model, n_estimators=50)
# Train model
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
(ii) Gradient Boosting: A boosting technique that builds models sequentially. Each model tries to correct the errors of the previous one by focusing on the residual errors.
python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.ensemble import GradientBoostingClassifier
# Load data and split
data = load_iris()
X = data.data
y = data.target
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Train model (each new tree fits the residual errors of the ensemble so far)
model = GradientBoostingClassifier(n_estimators=50)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn import tree
import matplotlib.pyplot as plt
# Load data and split
data = load_iris()
X = data.data
y = data.target
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Train a decision tree and visualize it
clf = DecisionTreeClassifier()
clf.fit(X_train, y_train)
y_pred = clf.predict(X_test)
plt.figure(figsize=(12, 8))
tree.plot_tree(clf, feature_names=data.feature_names,
               class_names=data.target_names, filled=True)
plt.show()
Stacking: Combines several base models by training a meta-model on their predictions.
python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.svm import SVC
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import StackingClassifier
# Load data and split
data = load_iris()
X = data.data
y = data.target
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Define base models
base_models = [
    ('dt', DecisionTreeClassifier()),
    ('svc', SVC(probability=True))
]
# Define meta-model
meta_model = LogisticRegression()
# Initialize and train stacking classifier
stacking_clf = StackingClassifier(estimators=base_models,
                                  final_estimator=meta_model)
stacking_clf.fit(X_train, y_train)
y_pred = stacking_clf.predict(X_test)
Bayes' Theorem: A principle used to update the probability estimate for a hypothesis
based on new evidence.
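In formula form, P(H|E) = P(E|H)·P(H) / P(E). A small numeric sketch with made-up numbers (a 1% prior, a 95% true-positive rate, and a 10% false-positive rate; none of these come from the original notes):
python
# Hypothetical numbers for illustration only
p_h = 0.01              # prior P(H)
p_e_given_h = 0.95      # P(E | H)
p_e_given_not_h = 0.10  # P(E | not H)
# Total probability of the evidence E
p_e = p_e_given_h * p_h + p_e_given_not_h * (1 - p_h)
# Bayes' Theorem: updated belief in H after observing E
p_h_given_e = p_e_given_h * p_h / p_e
print(p_h_given_e)  # about 0.088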
Cross Entropy: Measures the difference between two probability distributions for a given random variable. In machine learning, it is used to quantify the performance of classification models.
Cross Entropy Formula: H(p, q) = -Σ p(x) log q(x), summed over all outcomes x, where p is the true distribution and q is the predicted distribution.
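A minimal NumPy check of the formula on a hypothetical three-class example (the distributions below are made up):
python
import numpy as np

p = np.array([1.0, 0.0, 0.0])   # true distribution (one-hot label)
q = np.array([0.7, 0.2, 0.1])   # predicted class probabilities
# H(p, q) = -sum over x of p(x) * log q(x)
cross_entropy = -np.sum(p * np.log(q))
print(cross_entropy)  # -log(0.7) ≈ 0.357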
Minimum Description Length (MDL): A principle of model selection based on the idea of
minimizing the total length of the description of the data and the model. It balances the
model complexity and the fit to the data, aiming to find a model that provides the
simplest explanation for the observed data.
Expectation-Maximization (EM) Algorithm: An iterative method for estimating parameters when the data involves hidden or latent variables. It alternates between two steps:
- E-Step (Expectation): Compute the expected value of the log-likelihood given the current parameters.
- M-Step (Maximization): Update the parameters to maximize the expected log-likelihood computed in the E-step.
K-Means Clustering: A special case of EM algorithm where the expectation step assigns
data points to clusters, and the maximization step updates the cluster centroids.
K-Means Algorithm: 1) Initialize k centroids; 2) assign each point to its nearest centroid (the expectation-like step); 3) recompute each centroid as the mean of its assigned points (the maximization-like step); 4) repeat until the assignments stop changing.
python
from sklearn.datasets import make_blobs
from sklearn.cluster import KMeans
import matplotlib.pyplot as plt
# Create sample data (synthetic blobs, since no dataset is given in the notes)
X, _ = make_blobs(n_samples=300, centers=4, random_state=42)
# Apply K-Means
kmeans = KMeans(n_clusters=4)
kmeans.fit(X)
y_kmeans = kmeans.predict(X)
# Visualize clustering
plt.scatter(X[:, 0], X[:, 1], c=y_kmeans)
plt.show()
Naive Bayes Classifier: A probabilistic classifier based on Bayes' Theorem with the
assumption of feature independence. It's commonly used for text classification, spam
filtering, and more.
Example:
python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
# Load data and split
data = load_iris()
X = data.data
y = data.target
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Train a Gaussian Naive Bayes classifier
nb_classifier = GaussianNB()
nb_classifier.fit(X_train, y_train)
y_pred = nb_classifier.predict(X_test)
Explanation:
- The classifier computes probabilities for each class based on the feature values and
class frequencies.
Example (simplified):
python
import numpy as np
# Hypothetical numbers: priors from class frequencies, likelihoods from feature values
priors, likelihoods = np.array([0.6, 0.4]), np.array([0.2, 0.5])
posteriors = priors * likelihoods / np.sum(priors * likelihoods)
print(posteriors)  # [0.375, 0.625]
Maximum Likelihood Estimation (MLE): A method for estimating parameters by choosing the values that make the observed data most probable under the model.
Least Squares Error (LSE): A method for estimating parameters by minimizing the sum of squared differences between observed and predicted values.
Relationship:
- In linear regression, minimizing the least squares error is equivalent to maximizing the
likelihood under the assumption of normally distributed errors.
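A one-line sketch of why this holds, assuming i.i.d. Gaussian noise with fixed variance σ²: the log-likelihood is log L(w) = Σᵢ log N(yᵢ; w·xᵢ, σ²) = −(1/(2σ²)) Σᵢ (yᵢ − w·xᵢ)² + constant, so maximizing the likelihood over w is exactly the same as minimizing the sum of squared errors Σᵢ (yᵢ − w·xᵢ)².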
Example:
python
import numpy as np
from sklearn.linear_model import LinearRegression
# Generate data with normally distributed noise around a known slope
np.random.seed(42)
X = np.random.rand(100, 1)
y = 3 * X.squeeze() + np.random.randn(100)
# Fitting by least squares here is equivalent to maximizing the Gaussian likelihood
model = LinearRegression()
model.fit(X, y)
y_pred = model.predict(X)