Mid 1 Answer

Uploaded by

mount3172

Q.1 Define Machine Learning


Machine Learning (ML):

● Machine → A device or system created by humans to perform tasks.

● Learning → The process of acquiring knowledge, behaviors, skills, or values.

● Machine Learning → A computer system’s ability to learn from experience, using algorithms and statistical models, by analyzing patterns in data and improving its performance over time.

Simple Definition:

Machine Learning is when computers learn from data (experience) to perform tasks better without being explicitly programmed.

Example:

●​ When you type on your phone, it predicts the next word → the system
learns from previous text patterns.​

✅ Applications of Machine Learning


Machine learning is applied in almost every field of life. Below are key domains
and how they benefit from ML:

📊 1. Banking and Finance


●​ Challenge: Credit card fraud and customer churn.​

●​ How ML helps:​

○​ Detects fraudulent transactions by identifying abnormal patterns.​


○​ Analyzes customer behavior and helps banks retain customers by
offering personalized plans.​

🛡 2. Insurance
●​ Challenge: Managing risks and claims.​

●​ How ML helps:​

○​ Predicts customer risk during onboarding by analyzing past data.​

○ Improves claims management by spotting unusual claims or predicting fraud.

🏥 3. Healthcare
●​ Challenge: Monitoring patient health.​

●​ How ML helps:​

○ Uses data from wearable devices to predict health issues in real time.

○ Alerts doctors or users if a critical health condition is detected, allowing preventive action.

📦 Other Examples (Additional Applications):


● Self-driving cars: Learn from environment data to navigate safely.

● AI personal assistants (e.g., Google Assistant): Understand speech and schedule tasks.

● Recommendation systems (e.g., Amazon): Suggest products based on user preferences.

● Spam filters: Detect and classify unwanted emails.

● Image recognition: Identifies objects, faces, and scenes in pictures.

✅ Key Points to Remember


1. Machine + Learning = ML → Computers learning from data.

2. ML is everywhere → Finance, insurance, healthcare, transport, e-commerce.

3. ML helps in prediction, classification, risk management, and personalization.

4. Example → Fraud detection, health monitoring, self-driving cars.

5. Its main learning paradigms are supervised, unsupervised, and reinforcement learning.

Q.2 Bayes’ Theorem – Introduction
●​ What is it?​
A mathematical formula that helps compute the probability of a hypothesis
being true given some evidence.​

●​ Key idea:​
We update our belief (probability) about a hypothesis when we see new
data.​

● Who discovered it?
Named after Thomas Bayes.

●​ Used in:​
Classification, decision making, spam filtering, medical diagnosis,
recommendation systems.​

✅ Important Terms in Bayes’ Theorem


➤ Prior Probability (P(H))

●​ Represents what we believe about the hypothesis before seeing any new
data.​

● Example → Probability that a patient has a malignant tumor, based on general population statistics.

➤ Likelihood (P(X|H))

●​ The probability of observing the evidence given that the hypothesis is true.​

●​ Example → If a patient has a malignant tumor, what’s the chance the lab
test shows positive?​
➤ Posterior Probability (P(H|X))

●​ The updated belief after observing the evidence.​

●​ It combines prior knowledge and the new evidence.​

●​ Example → After seeing the lab test result, what’s the chance the patient
actually has a malignant tumor?​

➤ Evidence (P(X))

● The total probability of observing the evidence, regardless of which hypothesis is true.

✅ Bayes’ Theorem Formula


$$P(H \mid X) = \frac{P(X \mid H) \times P(H)}{P(X)}$$

●​ Posterior = (Likelihood × Prior) / Evidence​


✅ Example – Email Spam Filtering


Scenario:

You receive an email and want to check whether it’s spam or not based on
certain keywords like “Buy now”, “Free offer”, etc.
➤ Given Information:

●​ 20% of all emails are spam → P(Spam) = 0.20​

●​ 80% of all emails are not spam → P(Not Spam) = 0.80​

● If an email is spam, there’s a 70% chance it contains the keyword “Buy now” → P(Buy now | Spam) = 0.70

●​ If an email is not spam, there’s a 10% chance it contains the keyword “Buy
now” → P(Buy now | Not Spam) = 0.10​

➤ Question:

If an email contains the keyword “Buy now”, what is the probability that it’s
actually spam?

That is, find P(Spam | Buy now).
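
The worked answer does not appear in these notes. As a minimal sketch, the posterior can be computed from the four given probabilities with Bayes' theorem, expanding the evidence P(Buy now) over the two cases (spam and not spam). The variable names are illustrative assumptions.

```python
# Given probabilities from the spam example above.
p_spam = 0.20                # P(Spam), the prior
p_not_spam = 0.80            # P(Not Spam)
p_kw_given_spam = 0.70       # P(Buy now | Spam), the likelihood
p_kw_given_not_spam = 0.10   # P(Buy now | Not Spam)

# Evidence: total probability of seeing the keyword in any email.
p_kw = p_kw_given_spam * p_spam + p_kw_given_not_spam * p_not_spam

# Posterior via Bayes' theorem: (likelihood x prior) / evidence.
p_spam_given_kw = p_kw_given_spam * p_spam / p_kw

print(f"P(Buy now) = {p_kw:.2f}")
print(f"P(Spam | Buy now) = {p_spam_given_kw:.3f}")
```

With these numbers, P(Buy now) = 0.14 + 0.08 = 0.22 and P(Spam | Buy now) = 0.14 / 0.22, which is roughly 0.64.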


✅ Key Points to Remember:
✔ Even though only 20% of all emails are spam, the presence of the keyword “Buy now” raises the probability of spam to about 64% (0.14 / 0.22).
✔ The prior probability (P(Spam)) represents how common spam is in general.​
✔ The likelihood (P(Buy now | Spam)) represents how often spam emails use
that keyword.​
✔ The posterior probability tells us how suspicious the email is after seeing the
keyword.

✅ Applications
✔ Spam filtering in email systems​
✔ Identifying phishing attempts​
✔ Recommender systems analyzing user activity​
✔ Fraud detection in financial transactions

✅ Key Insights
✔ Bayes’ theorem helps update beliefs based on new evidence.​
✔ Prior knowledge is critical in determining the outcome.​
✔ Likelihood shows how strongly the evidence supports the hypothesis.​
✔ Posterior gives the final updated probability after considering evidence.​
✔ Even with accurate tests, a positive result for a rare condition can still give a low posterior, because the prior probability is small.

✅ Use Cases
✔ Medical diagnosis (e.g., cancer, diseases)​
✔ Spam filtering in emails​
✔ Sentiment analysis in social media​
✔ Recommendation systems (e.g., shopping sites)​
✔ Decision-making in AI models

✅ Final Summary – Easy to Remember


●​ Prior → What you knew before seeing the data.​

●​ Likelihood → How well the data supports the hypothesis.​

●​ Posterior → What you believe after seeing the data.​

●​ Formula → Posterior = (Likelihood × Prior) / Evidence.​


Q.3 Types of Machine Learning – Overview
Machine learning can be grouped into three main types, depending on how the
learning is done and how the data is used:

1.​ Supervised Learning​

2.​ Unsupervised Learning​

3.​ Reinforcement Learning​

Each type has its own use cases, methods, and examples.

✅ 1. Supervised Learning
👉 Also called predictive learning.
✅ Definition:
●​ The model learns from labeled data (input + correct output).​

●​ It uses past data to predict or classify new data.​

✅ Key Characteristics:
✔ The data has inputs and outputs clearly labeled.​
✔ The model learns the relationship between inputs and outputs.​
✔ It’s used when the goal is to predict outcomes.

✅ Examples:
●​ Predicting whether a tumor is malignant or benign.​

●​ Predicting house prices based on area, location, etc.​

●​ Classifying emails as spam or non-spam.​

●​ Forecasting demand or stock prices.​

✅ Types within Supervised Learning:


1.​ Classification:​
✔ Predicts a category or class.​
✔ Example: Image classification → Cat, Dog, or Bird.​

2.​ Regression:​
✔ Predicts a continuous value.​
✔ Example: Forecasting weather temperatures or sales numbers.​

✅ Important Notes:
●​ Accuracy depends on the quality of labeled data.​

●​ Better data → better prediction.​

✅ 2. Unsupervised Learning
👉 Also called descriptive learning.
✅ Definition:
●​ The model learns from unlabeled data without known outputs.​

●​ It finds patterns or structures within the data.​

✅ Key Characteristics:
✔ The data is unlabeled → no correct answers provided.​
✔ The model groups data or finds patterns.​
✔ It’s used to explore hidden structures in data.

✅ Examples:
●​ Customer segmentation → Grouping customers based on purchase
behavior.​

●​ Credit card fraud detection → Identifying suspicious patterns.​

●​ Recommendation systems → Finding similar products for users.​

✅ Types within Unsupervised Learning:


1.​ Clustering:​
✔ Groups similar data points together.​
✔ Example: Grouping customers with similar shopping habits.​

2.​ Association:​
✔ Finds rules or relationships between data items.​
✔ Example: Market basket analysis → People who buy bread also buy
butter.​
✅ Important Notes:
●​ Good for exploratory data analysis.​

●​ Helps in identifying hidden relationships.​

✅ 3. Reinforcement Learning
👉 Also called feedback-based learning.
✅ Definition:
●​ The model learns by trial and error, interacting with the environment.​

●​ It receives rewards or penalties based on actions.​

✅ Key Characteristics:
✔ There is no fixed labeled dataset → the model learns by interacting with the environment.
✔ The model learns from consequences of actions.​
✔ It’s used where long-term goals matter.

✅ Examples:
●​ Self-driving cars → Learn to navigate safely.​

●​ Robots → Learn optimal paths or actions.​

●​ Game-playing AI → Learns strategies by competing against itself.​


✅ Important Notes:
●​ Works in dynamic environments.​

●​ Optimizes performance over time.​

✅ Comparison of Types of Machine Learning


| Feature | Supervised Learning | Unsupervised Learning | Reinforcement Learning |
|---|---|---|---|
| Type of data | Labeled data | Unlabeled data | Interaction data |
| Output | Prediction/classification | Patterns/clusters | Actions & rewards |
| Use case | Forecasting, classification | Grouping, discovering patterns | Navigation, gaming |
| Feedback | Supervised signal | No feedback | Reward/Penalty signal |
| Example | Spam detection | Customer segmentation | Self-driving cars |

✅ Final Summary – Easy to Remember
✔ Supervised → “Teacher-guided learning”​
✔ Unsupervised → “Finding hidden patterns”​
✔ Reinforcement → “Learning from rewards & penalties”

Q.4 Support Vector Machine (SVM) – Basics


✔ SVM is a supervised learning algorithm​
✔ Mainly used for classification, but also applied in regression​
✔ It finds the best decision boundary (called a hyperplane) that separates
classes​
✔ The goal is to maximize the margin, the distance between the hyperplane and the nearest data points of each class
✔ The data points closest to the hyperplane are called support vectors, and
they define the hyperplane

📌 Key Terms
➤ Hyperplane

●​ The decision boundary that separates different classes​

●​ In 2D, it’s a line; in 3D, it’s a plane; in higher dimensions, it’s a hyperplane​

●​ We aim to find the best hyperplane that separates the classes with the
largest margin​

➤ Support Vectors

●​ Data points closest to the hyperplane​


●​ They “support” or define the boundary​

●​ These are the most critical elements that the SVM uses to construct the
decision boundary​

➤ Maximum Margin Hyperplane (MMH)

●​ The hyperplane that maximizes the distance (margin) between the closest
data points of both classes​

● A wider margin helps the model generalize better and avoid misclassification

● For linearly separable data, the MMH can be found geometrically: it is the perpendicular bisector of the shortest line joining the convex hulls of the two classes

● For non-linearly separable data, SVM uses the kernel trick to implicitly map the data into a higher-dimensional space where it becomes separable
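
For the linearly separable case, the margin idea can be written as an optimization problem. The block below is the standard hard-margin formulation, given here as a sketch. The symbols are notation introduced for illustration and do not come from the notes: $w$ is the weight vector, $b$ the bias, $x_i$ a training point and $y_i \in \{-1, +1\}$ its class label.

```latex
\begin{aligned}
\text{Hyperplane:}\quad & w \cdot x + b = 0 \\
\text{Margin width:}\quad & \frac{2}{\lVert w \rVert} \\
\text{Training problem:}\quad & \min_{w,\,b}\ \tfrac{1}{2}\lVert w \rVert^{2}
\quad \text{subject to}\quad y_i\,(w \cdot x_i + b) \ge 1 \ \text{ for all } i
\end{aligned}
```

Minimizing $\lVert w \rVert$ is the same as maximizing the margin. The training points for which the constraint holds with equality are the support vectors.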

✅ Types of SVM
1️⃣ Linear SVM

●​ Used when the data can be separated with a straight line or hyperplane​

● Example → If two classes can be divided by a line in a 2D plot, Linear SVM is applicable

2️⃣ Non-Linear SVM

●​ Used when data cannot be separated by a straight line​


●​ SVM uses kernel functions (like polynomial or radial basis function) to
transform data into higher dimensions​

● Example → Complex datasets like circles, spirals, or overlapping clusters can be handled using Non-linear SVM
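
The idea of moving to a higher dimension can be illustrated with a toy case. This is a sketch with hypothetical 1-D data and an explicit feature map. A real kernel SVM would not build the mapped features; it would compute inner products in that space through a kernel function.

```python
# Hypothetical 1-D data: class A sits in the middle, class B on both sides,
# so no single threshold on x separates them.
class_a = [-1.0, 0.0, 1.0]
class_b = [-3.0, -2.0, 2.0, 3.0]


def feature_map(x):
    """Map a 1-D point to 2-D: (x, x squared)."""
    return (x, x * x)


def predict(x, threshold=2.5):
    """Linear rule in the mapped space: compare the second feature to a threshold."""
    _, x_squared = feature_map(x)
    return "A" if x_squared < threshold else "B"


print([predict(x) for x in class_a])  # every point labelled A
print([predict(x) for x in class_b])  # every point labelled B
```

In the mapped space the classes are split by the horizontal line x² = 2.5, which lies midway between the largest class-A value (1) and the smallest class-B value (4).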

✅ Algorithm Steps (Basic Idea)


1.​ Input: Training data with labeled classes​

2.​ Find the best hyperplane that maximizes the margin between classes​

3.​ Identify the support vectors closest to the boundary​

4. For non-linear problems, apply kernel functions to map data to a higher-dimensional space

5.​ Optimize the hyperplane to reduce misclassification​

6.​ Classify new data points based on their position relative to the hyperplane​

✅ Strengths of SVM
✔ Works for both classification and regression​
✔ Relatively robust to noisy data and outliers when a soft margin is used
✔ Provides promising prediction results​
✔ Well-suited for binary classification tasks​
✔ Maximizes margin for better generalization

✅ Weaknesses of SVM
✔ Natively a binary classifier; multi-class problems need extensions such as one-vs-rest or one-vs-one
✔ Complex and hard to interpret in high-dimensional spaces
✔ Slow to train on large datasets with many features or instances
✔ Memory-intensive computations required for large datasets
✔ The resulting model is hard to explain and behaves like a “black box”

✅ Applications of SVM
✔ Bioinformatics → Detecting cancer or genetic disorders by classifying data
into two groups​
✔ Face Detection → Separating images into face vs. non-face​
✔ Image Classification → Identifying objects in images​
✔ Text Categorization → Classifying documents, emails, or news articles​
✔ Financial predictions → Credit risk assessment, stock trends

✅ Final Summary – Easy to Remember


✔ SVM = Finding the best boundary to separate data​
✔ Support Vectors = Critical points that define the boundary​
✔ MMH = Largest margin between classes → better generalization​
✔ Linear SVM → straight line separation​
✔ Non-linear SVM → kernel trick for complex data​
✔ Strengths → robust, accurate, margin maximization​
✔ Weaknesses → binary focus, complex, slow for big data​
✔ Applications → healthcare, image analysis, text classification
Q.5 k-Nearest Neighbors (kNN) – Basics
✔ kNN is a simple, supervised learning algorithm​
✔ Used for classification and regression, but primarily for classification​
✔ It works by comparing a new data point with its k closest neighbors from the
training set​
✔ The output is determined by the majority class among the neighbors (for
classification) or the average of neighbors (for regression)

📌 How kNN Works


1.​ Choose k → The number of neighbors to consider​

2.​ Measure distance → Find the distance between the new data point and
all training points (common methods: Euclidean, Manhattan)​

3.​ Select neighbors → Pick the k closest training points​

4.​ Vote or average →​

○​ Classification → Take the majority class among neighbors​

○​ Regression → Take the average of the neighbors’ values​

5.​ Assign the label → Based on the vote or average, assign the class or
value to the new point​
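
The five steps above can be sketched in a few lines. This is a minimal illustration rather than a tuned implementation. The function name `knn_classify` and the toy data are hypothetical, and Euclidean distance is assumed.

```python
import math
from collections import Counter


def knn_classify(train, query, k=3):
    """Classify `query` by majority vote among its k nearest training points.

    `train` is a list of (features, label) pairs; distance is Euclidean.
    """
    # Step 2: distance from the query to every training point.
    distances = [(math.dist(features, query), label) for features, label in train]
    # Step 3: keep the k closest points.
    nearest = sorted(distances, key=lambda pair: pair[0])[:k]
    # Steps 4-5: the majority class among the neighbors becomes the label.
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical toy data: two well-separated groups in 2-D.
training_data = [
    ((1.0, 1.0), "low risk"),
    ((1.5, 2.0), "low risk"),
    ((2.0, 1.0), "low risk"),
    ((6.0, 6.0), "high risk"),
    ((7.0, 5.5), "high risk"),
    ((6.5, 7.0), "high risk"),
]

print(knn_classify(training_data, (1.2, 1.4), k=3))
print(knn_classify(training_data, (6.4, 6.1), k=3))
```

An odd k avoids tied votes when there are two classes. For regression, the vote would be replaced by the average of the neighbors' values.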

✅ Key Characteristics
✔ Instance-based → Learns by comparing examples, not by building a model​
✔ Lazy learning → Doesn’t train a model upfront; computes at the time of
classification​
✔ Non-parametric → Doesn’t make assumptions about data distribution
✅ Strengths of kNN
✔ Simple and intuitive – easy to understand and implement​
✔ No training phase – stores training data and makes predictions during
classification​
✔ Effective for small datasets – works well when data is not huge​
✔ Adaptable to non-linear data – doesn’t require the data to be linearly
separable​
✔ Works with multi-class problems – can classify into more than two
categories

✅ Weaknesses of kNN
✔ Computationally expensive – requires distance calculation with all data
points for each prediction​
✔ Sensitive to irrelevant or noisy features – irrelevant data can mislead
classification​
✔ Needs careful choice of k – too small leads to noise influence; too large
leads to smoothing over differences​
✔ Memory-intensive – stores all training data in memory​
✔ Curse of dimensionality – performance drops on high-dimensional data, where points become sparse and distances to near and far neighbors look increasingly similar

✅ Applications of kNN
✔ Pattern Recognition → Handwriting, digit recognition​
✔ Medical Diagnosis → Classifying diseases based on patient attributes​
✔ Recommendation Systems → Suggest products similar to previous ones​
✔ Credit Risk Analysis → Classifying loan applicants as high or low risk​
✔ Customer Behavior Prediction → Understanding customer preferences in
marketing​
✔ Image Recognition → Identifying objects based on similar images
✅ Final Summary – Easy to Remember
✔ kNN = “Find your neighbors and ask them what they are!”​
✔ Strengths → Simple, no training, flexible​
✔ Weaknesses → Slow, needs memory, sensitive to noise and irrelevant
data​
✔ Applications → Medical, finance, marketing, image recognition


✅ What is Regression?
✔ Regression is a supervised learning technique used to predict a
continuous value based on input features.​
✔ It finds the relationship between the dependent variable (output) and
independent variables (inputs).​
✔ Used when the target variable is numerical, such as price, temperature, or
salary.​
✔ The goal is to fit a model that best explains how the output depends on the
inputs.

📌 Key Points
✔ It’s not about classifying into categories → it predicts numbers​
✔ It helps in forecasting, trend analysis, and estimating unknown values​
✔ It assumes that there is some underlying pattern in the data
✅ Types of Regression
1.​ Linear Regression​
✔ Relationship is modeled with a straight line​

2.​ Multiple Linear Regression​


✔ More than one input feature is used to predict the output​

3.​ Polynomial Regression​


✔ Models non-linear relationships by using polynomial functions​

4.​ Ridge Regression​


✔ A regularization method to reduce overfitting by penalizing large
coefficients​

5.​ Lasso Regression​


✔ Another regularization technique that can shrink some coefficients to
zero​

6.​ Logistic Regression (though technically classification)​


✔ Predicts probability for categorical outcomes​

7.​ Support Vector Regression (SVR)​


✔ Uses Support Vector Machine principles for regression problems​

Q.6 Linear Regression – Detailed Explanation


✔ Definition:

Linear regression tries to find the best-fitting straight line that predicts the output
(Y) from one or more inputs (X).

✔ Equation of a line:

Y = mX + c
Where:
✔ Y = predicted output
✔ X = input feature
✔ m = slope of the line (how much Y changes with X)
✔ c = intercept (value of Y when X is 0)

✅ Example – Predicting House Prices


Problem:​
You want to predict the price of a house based on its area (in square feet).

Given Data:

| Area (sq ft) | Price (in ₹1000s) |
|---|---|
| 1000 | 200 |
| 1500 | 250 |
| 2000 | 300 |
| 2500 | 350 |
| 3000 | 400 |

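
The notes do not show the fitted line for this data. As a sketch, the slope m and intercept c can be obtained with the ordinary least-squares formulas. The helper `predict_price` and the 2200 sq ft query are illustrative assumptions.

```python
# Data from the table above: area in sq ft, price in thousands of rupees.
areas = [1000, 1500, 2000, 2500, 3000]
prices = [200, 250, 300, 350, 400]

n = len(areas)
mean_x = sum(areas) / n
mean_y = sum(prices) / n

# Least-squares slope m and intercept c for Y = mX + c.
m = sum((x - mean_x) * (y - mean_y) for x, y in zip(areas, prices)) / sum(
    (x - mean_x) ** 2 for x in areas
)
c = mean_y - m * mean_x


def predict_price(area):
    """Predicted price (in thousands of rupees) for a given area in sq ft."""
    return m * area + c


print(f"m = {m:.2f}, c = {c:.1f}")
print(f"Predicted price for 2200 sq ft: {predict_price(2200):.1f}")
```

Because the five points lie exactly on a line, the fit works out to m = 0.1 and c = 100, that is, Price = 0.1 × Area + 100. Real data would scatter around the line instead of sitting on it.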
✅ Strengths of Linear Regression
✔ Simple and easy to understand​
✔ Provides clear relationship between input and output​
✔ Works well when data shows a linear trend​
✔ Good for prediction and forecasting​
✔ Helps identify which features affect the output the most
✅ Weaknesses of Linear Regression
✔ Doesn’t work well if the relationship is not linear​
✔ Sensitive to outliers → extreme values can skew the results​
✔ Assumes constant variance and normal distribution of errors​
✔ Not suitable for complex, multi-dimensional problems without transformation

✅ Applications of Linear Regression


✔ House price prediction​
✔ Stock market forecasting​
✔ Salary estimation based on experience​
✔ Demand forecasting in business​
✔ Agricultural yield prediction​
✔ Temperature and rainfall analysis

✅ Final Summary – Easy to Remember


✔ Regression = Predicting numbers, not categories​
✔ Linear Regression = Best straight line through data points​
✔ Formula → Y = mX + c
✔ Example → Predict house price from area​
✔ Strengths → Simple, interpretable, fast​
✔ Weaknesses → Sensitive to outliers, assumes linearity​
✔ Applications → Finance, real estate, agriculture, weather forecasting
Q.7​
1. Hierarchical Clustering
👉 Groups data into a hierarchy of clusters without predefining the number of
clusters.

Key Points

1.​ Definition​

○​ Builds a tree-like structure (dendrogram) of nested clusters.​

○ Clusters are formed from a distance matrix rather than from a pre-specified k.

2.​ Types​

○​ Agglomerative (Bottom-Up)​

■​ Start: Each data point = its own cluster.​

■​ At each step: Merge the two most similar clusters.​

■​ Stop: When all objects merge into one big cluster.​

■​ Example: AGNES (Agglomerative Nesting).​

○​ Divisive (Top-Down)​

■​ Start: All data in one cluster.​

■​ At each step: Split the most heterogeneous cluster.​

■ Stop: When each object is a separate cluster.


■​ Example: DIANA (Divisive Analysis).​

3.​ Distance Measures Between Clusters​

○ Single Link → Minimum distance between two points of different clusters.

○ Complete Link → Maximum distance between two points of different clusters.

○​ Average Link → Average distance between points across clusters.​

○​ Centroid → Distance between centroids.​

○​ Medoid → Distance between most central points (medoids).​

4.​ Dendrogram​

○​ A tree diagram showing how clusters are merged/split.​

○​ By “cutting” at a desired level → final clusters are obtained.​

5.​ Strengths​

○​ Easy to understand and interpret.​

○​ No need to pre-define k.​

○​ Good visualization via dendrogram.​

6.​ Weaknesses​

○​ Once merged/split → cannot be undone.​

○​ Poor with large datasets and mixed data types.​


○​ Sensitive to missing data.​

○​ Dendrograms are often misinterpreted.​
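
The distance measures listed under point 3 above can be compared on a small example. This is a sketch assuming Euclidean distance. The function names and the two clusters are hypothetical, chosen so the values are easy to verify by hand.

```python
import math
from itertools import product


def pairwise_distances(cluster_a, cluster_b):
    """All Euclidean distances between a point in A and a point in B."""
    return [math.dist(p, q) for p, q in product(cluster_a, cluster_b)]


def single_link(cluster_a, cluster_b):
    return min(pairwise_distances(cluster_a, cluster_b))


def complete_link(cluster_a, cluster_b):
    return max(pairwise_distances(cluster_a, cluster_b))


def average_link(cluster_a, cluster_b):
    distances = pairwise_distances(cluster_a, cluster_b)
    return sum(distances) / len(distances)


def centroid(cluster):
    """Coordinate-wise mean of the points in a cluster."""
    return tuple(sum(coords) / len(cluster) for coords in zip(*cluster))


def centroid_link(cluster_a, cluster_b):
    return math.dist(centroid(cluster_a), centroid(cluster_b))


# Hypothetical clusters on a line.
a = [(0.0, 0.0), (1.0, 0.0)]
b = [(4.0, 0.0), (6.0, 0.0)]

print(single_link(a, b), complete_link(a, b), average_link(a, b), centroid_link(a, b))
```

The four cross-cluster distances here are 4, 6, 3 and 5, so single link gives 3, complete link gives 6 and average link gives 4.5. The centroids sit at x = 0.5 and x = 5, so the centroid distance is also 4.5.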

2. K-Means Clustering
👉 A partitioning clustering method based on centroids.
Key Points

1.​ Definition​

○​ Groups n objects into k clusters.​

○​ Each cluster is represented by a centroid (mean point).​

○​ Objective: Minimize Sum of Squared Errors (SSE).​

2.​ Algorithm Steps​

○​ Choose k (number of clusters).​

○​ Initialize → Randomly select k objects as initial centroids.​

○​ Assignment → Assign each object to the nearest centroid.​

○​ Update → Recompute centroids of clusters.​

○ Repeat the assignment and update steps until the centroids no longer change (convergence).

3.​ Concept​

○​ Uses Euclidean distance (commonly) to measure similarity.​


○​ Works by iterative relocation (objects may be reassigned
repeatedly).​

4.​ Choosing k​

○​ Done using Elbow Method (plot SSE vs. k, choose elbow point).​

○​ Or Silhouette score.​

5.​ Advantages​

○​ Simple and fast.​

○​ Works well on large datasets.​

○ Tends to produce tighter clusters than hierarchical clustering when clusters are roughly spherical.

6.​ Limitations​

○​ Must specify k beforehand.​

○​ Only works when mean is defined (not categorical data).​

○​ Struggles with non-convex clusters or different sized clusters.​

○​ Sensitive to noise and outliers.​

7.​ Example (from notes)​

○ Data points: A1(2,10), A2(2,5), A3(8,4), B1(5,8), B2(7,5), B3(6,4), C1(1,2), C2(4,9).

○ k = 3, initial centers chosen → Iteratively refine until stable clusters are formed.
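
The example above can be run end to end as a sketch. The notes do not say which points were used as initial centers, so A1, B1 and C1 are assumed here purely for illustration; other starting points may converge to different clusters. Function names are hypothetical.

```python
import math

points = {
    "A1": (2, 10), "A2": (2, 5), "A3": (8, 4), "B1": (5, 8),
    "B2": (7, 5), "B3": (6, 4), "C1": (1, 2), "C2": (4, 9),
}


def assign(points, centroids):
    """Assignment step: index of the nearest centroid for every point."""
    return {
        name: min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i]))
        for name, p in points.items()
    }


def update(points, labels, k):
    """Update step: each centroid becomes the mean of its assigned points.

    Simplification: assumes no cluster ever becomes empty.
    """
    centroids = []
    for i in range(k):
        members = [points[name] for name, label in labels.items() if label == i]
        centroids.append(tuple(sum(c) / len(members) for c in zip(*members)))
    return centroids


def k_means(points, initial_centroids):
    centroids = list(initial_centroids)
    labels = None
    while True:
        new_labels = assign(points, centroids)
        if new_labels == labels:  # no point changed cluster: converged
            return labels, centroids
        labels = new_labels
        centroids = update(points, labels, len(centroids))


def sse(points, labels, centroids):
    """Sum of squared distances from each point to its cluster centroid."""
    return sum(math.dist(p, centroids[labels[name]]) ** 2 for name, p in points.items())


# Assumption: A1, B1 and C1 as starting centroids (not stated in the notes).
labels, centroids = k_means(points, [points["A1"], points["B1"], points["C1"]])
clusters = [sorted(name for name, lab in labels.items() if lab == i) for i in range(3)]

print(clusters)
print([tuple(round(c, 2) for c in centre) for centre in centroids])
print(round(sse(points, labels, centroids), 2))
```

Under this assumed initialization the clusters settle at {A1, B1, C2}, {A3, B2, B3} and {A2, C1}, with centroids near (3.67, 9), (7, 4.33) and (1.5, 3.5). The `sse` helper is the quantity one would plot against k for the Elbow Method.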
✅ Easy to Remember Tip
●​ Hierarchical → Tree (AGNES & DIANA)​

●​ K-Means → Centroid & Iteration (Partitioning)​

Q.7

🔹 How Overfitting in Decision Trees Can Be Avoided


1.​ Pruning the Tree​

○ Pre-pruning (early stopping): Stop splitting when nodes become too small or improvement is negligible.

○ Post-pruning: Grow full tree first, then remove unnecessary branches.

2.​ Restrict Tree Depth​

○ Limit maximum depth → avoids too many levels → reduces complexity.

3.​ Minimum Samples per Split/Leaf​

○​ Require a minimum number of samples before splitting a node.​

○​ Prevents tree from fitting noise in small sample subsets.​

4.​ Limit Number of Features​


○​ Restrict number of features considered at each split to avoid overly
complex boundaries.​

5.​ Use Ensemble Methods​

○ Combine multiple trees (Random Forest, Gradient Boosting) → reduces variance and prevents overfitting.

👉 Easy Tip to Remember:​


Think of “PRUNE + LIMIT” → Prune tree, Limit depth, Limit samples, Limit
features, Use ensembles.

🔹 Out-of-Bag (OOB) Error in Random Forest


1.​ Bootstrap Sampling​

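○ Each tree is trained on a random sample (drawn with replacement) of the dataset.

○ On average about 63% (roughly two-thirds) of the distinct samples appear in a tree's bootstrap sample → the remaining 37% or so (roughly one-third) are left out and are called Out-of-Bag samples.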

2.​ OOB Testing​

○​ The left-out samples (not used for training a tree) are used as a test
set for that tree.​

○ Gives a nearly unbiased estimate of prediction error.

3.​ OOB Error Rate​

○ Each sample is predicted using only the trees that did not see it during training; the OOB error is the error rate of these predictions over all samples.
○​ Acts like a built-in cross-validation for Random Forest.​

4.​ Advantages of OOB​

○​ No need for separate validation dataset.​

○​ Saves computation time.​

○​ Provides reliable error estimate.​
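
The "about one-third left out" figure can be checked with a quick simulation of bootstrap sampling. This is a sketch with hypothetical dataset and forest sizes. It only counts which rows each tree would skip and does not train any trees.

```python
import math
import random


def oob_fraction(n_samples, rng):
    """Draw one bootstrap sample of size n and return the share of rows never drawn."""
    drawn = {rng.randrange(n_samples) for _ in range(n_samples)}
    return 1 - len(drawn) / n_samples


rng = random.Random(0)          # fixed seed so the run is repeatable
n_samples, n_trees = 1000, 200  # hypothetical dataset and forest sizes

fractions = [oob_fraction(n_samples, rng) for _ in range(n_trees)]
average_oob = sum(fractions) / n_trees

print(f"average OOB share: {average_oob:.3f}")
print(f"theoretical limit 1/e: {1 / math.e:.3f}")
```

The chance that a given row is never picked in n draws is (1 − 1/n)^n, which approaches 1/e (about 0.368) as n grows. That is where the rough one-third figure comes from.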

👉 Easy Tip to Remember:​


OOB = “Free Test Set” → Each tree ignores some data → That ignored data
tests the tree → Gives error estimate.
