Assignment m2 Machine Learning Final

The document discusses the application of machine learning (ML) concepts in real-world scenarios, including supervised learning for sales prediction, unsupervised learning for user grouping in video streaming, and reinforcement learning for drone delivery routes. It highlights the benefits and challenges of each approach while addressing ethical concerns in ML, particularly in healthcare. Additionally, it emphasizes the importance of model evaluation metrics beyond accuracy, suggesting methods like cross-validation to enhance reliability.


Assignment M2: Machine Learning

Abdul Sahil Ansari


24/5

Objective:

This assignment will help you connect the concepts of ML with real-world scenarios.
You are expected to think critically, analyze situations, and explain how ML can be
applied without writing any code.

Part A - Real-World Scenarios

1. Supervised Learning: Supermarket Sales Prediction

Which type of supervised learning (regression or classification) would be suitable here? Why?

Regression would be suitable here. The objective is to predict monthly sales, which is a continuous numerical value. Regression models are designed to predict continuous outcomes, unlike classification models, which predict discrete categories.

Suggest one benefit and one challenge of using supervised learning in this case.

Benefit: Supervised learning can provide highly accurate sales forecasts, enabling the supermarket to optimize inventory management, reduce waste, and plan marketing campaigns more effectively. By learning from historical data, the model can identify complex relationships between factors such as season, advertisements, and pricing, and their impact on sales.

Challenge: A significant challenge is the need for a large amount of high-quality, labeled historical sales data. If the data is incomplete, inconsistent, or lacks relevant features (e.g., competitor pricing, local events), the model's accuracy can be severely impacted. Additionally, the model might struggle to adapt to sudden, unforeseen market changes or new product introductions without retraining.
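As an illustrative sketch of the regression framing (the assignment itself requires no code), a one-variable least-squares fit of monthly sales against a hypothetical feature such as advertising spend might look like this; all numbers are invented:

```python
# Minimal one-variable least-squares regression, illustrating the
# "predict a continuous value" framing. All data is invented.
ad_spend = [10, 15, 20, 25, 30]          # hypothetical monthly ad spend (k$)
sales    = [120, 150, 185, 210, 245]     # hypothetical monthly sales (k$)

n = len(ad_spend)
mean_x = sum(ad_spend) / n
mean_y = sum(sales) / n

# Closed-form slope and intercept for simple linear regression.
slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(ad_spend, sales)) \
        / sum((x - mean_x) ** 2 for x in ad_spend)
intercept = mean_y - slope * mean_x

def predict(x):
    return intercept + slope * x

print(round(predict(35), 1))  # → 275.0, the forecast for a new ad-spend level
```

A real forecaster would use many features (season, pricing, promotions) and a library model, but the goal is the same: output a continuous number.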

2. Unsupervised Learning: Video Streaming Platform User Grouping

Which unsupervised learning technique could be useful?

Clustering techniques, such as K-Means clustering or hierarchical clustering, would be useful. These algorithms can group users into distinct segments based on similarities in their viewing habits (e.g., genres watched, watch times, frequency of viewing, interaction with recommendations) without requiring pre-defined labels.
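A minimal sketch of the K-Means idea on toy data, assuming each user is summarized by a single feature (say, hours watched per week); a real platform would use many features and a library implementation:

```python
import random

# Toy K-Means on one feature: hours watched per week (invented values).
hours = [1.0, 1.5, 2.0, 9.0, 9.5, 10.0]
k = 2
random.seed(0)
centroids = random.sample(hours, k)

for _ in range(10):  # a few Lloyd's-algorithm iterations
    # Assignment step: attach each user to the nearest centroid.
    clusters = [[] for _ in range(k)]
    for h in hours:
        nearest = min(range(k), key=lambda i: abs(h - centroids[i]))
        clusters[nearest].append(h)
    # Update step: move each centroid to the mean of its cluster.
    centroids = [sum(c) / len(c) if c else centroids[i]
                 for i, c in enumerate(clusters)]

print(sorted(round(c, 2) for c in centroids))  # → [1.5, 9.5]
```

The two centroids settle on a "light viewer" group and a "heavy viewer" group without any labels being provided, which is the essence of unsupervised segmentation.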

How could this grouping improve customer experience?

Grouping users based on viewing habits allows the platform to provide highly
personalized movie recommendations. Instead of generic suggestions, users
would receive recommendations tailored to their specific cluster's preferences,
leading to a more relevant and enjoyable content discovery experience. This
personalization can increase user engagement, satisfaction, and retention, as
users feel the platform understands their tastes and offers content they are
genuinely interested in.

3. Reinforcement Learning: Drone Delivery Routes

How does reinforcement learning apply here?

Reinforcement learning (RL) is highly applicable here because the drones need to
learn optimal delivery routes through trial and error in a dynamic environment.
The drone acts as an agent, the environment is the delivery area (including
obstacles, traffic, weather), and the actions are movements and route choices.
The drone receives rewards for successful and efficient deliveries (e.g., reaching
the destination quickly, avoiding obstacles, minimizing fuel consumption) and
penalties for undesirable outcomes (e.g., delays, crashes, inefficient routes). Over
time, through continuous interaction with the environment and receiving
feedback (rewards/penalties), the RL algorithm will learn a policy that dictates
the best sequence of actions to take to optimize delivery routes.
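The agent/environment/reward loop described above can be sketched with tabular Q-learning on a toy one-dimensional corridor standing in for the delivery area; real drone routing would involve continuous states, rich sensors, and simulators, so this is purely illustrative:

```python
import random

random.seed(1)
# Toy corridor: states 0..4; the depot is state 0, the customer is state 4.
# Actions: 0 = move left, 1 = move right. Reward +10 on delivery, -1 per step.
N_STATES, GOAL = 5, 4
q = [[0.0, 0.0] for _ in range(N_STATES)]
alpha, gamma, epsilon = 0.5, 0.9, 0.1

for episode in range(200):
    s = 0
    while s != GOAL:
        # Epsilon-greedy: mostly exploit the best-known action, sometimes explore.
        a = random.randrange(2) if random.random() < epsilon \
            else max((0, 1), key=lambda x: q[s][x])
        s_next = max(0, s - 1) if a == 0 else min(N_STATES - 1, s + 1)
        reward = 10.0 if s_next == GOAL else -1.0
        # Q-learning update toward reward plus discounted best future value.
        q[s][a] += alpha * (reward + gamma * max(q[s_next]) - q[s][a])
        s = s_next

# The learned greedy policy should be "move right" in every state.
policy = [max((0, 1), key=lambda x: q[s][x]) for s in range(GOAL)]
print(policy)  # → [1, 1, 1, 1]
```

The per-step penalty plays the role of fuel and delay costs, and the terminal reward the successful delivery; the policy emerges purely from this feedback, which is exactly the trial-and-error learning the answer describes.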

What could be one possible risk of using this approach?


One significant risk is the potential for unforeseen or unsafe behaviors during
the learning process, especially in real-world deployment. Since RL involves
exploration and trial-and-error, the drones might initially attempt inefficient or
even dangerous routes or actions that could lead to accidents, property damage,
or injury to people. Ensuring safety during the training phase and implementing
robust safety protocols, such as simulation-based training and strict real-world
testing with human oversight, is crucial to mitigate this risk.

Part B - Case Study Reflection

Case Study: Helping in Early Disease Detection

Machine Learning (ML) holds immense potential in revolutionizing early disease detection, offering a proactive approach to healthcare. In this scenario, supervised learning would be the most suitable ML type. Specifically, classification algorithms would be employed to categorize individuals into discrete groups, such as 'diseased' or 'healthy', or to identify the presence of specific conditions based on various input features.

Examples of data that might be used include a wide array of patient information. This
could encompass demographic data (age, gender, ethnicity), medical history (pre-
existing conditions, family history of diseases), lifestyle factors (diet, exercise, smoking
habits), and crucially, diagnostic test results. The diagnostic data could range from
blood test markers, genetic sequences, imaging scans (e.g., X-rays, MRIs, CT scans), to
physiological measurements (e.g., blood pressure, heart rate). For instance, a model
could be trained on thousands of anonymized patient records, where each record
includes these features along with a confirmed diagnosis (the 'label').
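As an illustrative sketch only (the features and values are invented, and real diagnostic models are far more involved and heavily validated), a nearest-neighbour classifier over two numeric features shows the "features plus confirmed diagnosis label" framing:

```python
# Toy labeled records: (age, blood_marker) -> diagnosis label. Invented data.
records = [
    ((45, 1.2), "healthy"),
    ((50, 1.1), "healthy"),
    ((62, 3.8), "diseased"),
    ((70, 4.1), "diseased"),
]

def classify(patient):
    """1-nearest-neighbour: copy the label of the closest training record."""
    def dist(features):
        return sum((a - b) ** 2 for a, b in zip(features, patient))
    return min(records, key=lambda r: dist(r[0]))[1]

print(classify((65, 3.9)))  # → diseased
```

Each training record pairs features with a confirmed diagnosis, and the model assigns new patients to a discrete category, which is the classification setup the case study calls for.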

However, the application of ML in early disease detection raises significant ethical and
social concerns. A primary concern is data privacy and security. Medical data is highly
sensitive, and its collection, storage, and processing must adhere to stringent privacy
regulations (e.g., HIPAA, GDPR). There's also the risk of algorithmic bias. If the training
data disproportionately represents certain demographics or lacks diversity, the model
might perform poorly or inaccurately for underrepresented groups, leading to
disparities in healthcare access and outcomes. For example, a model trained primarily
on data from one ethnic group might misdiagnose or delay diagnosis for individuals
from another. Furthermore, the issue of false positives and false negatives is critical.
A false positive could lead to unnecessary anxiety, costly follow-up tests, and even
invasive procedures, while a false negative could delay crucial treatment, with
potentially life-threatening consequences. Ensuring transparency in how these models
arrive at their predictions and establishing clear accountability for their outcomes are
paramount to building trust and ensuring equitable healthcare.

Part C - Thinking About Model Evaluation

1. Why might accuracy alone not be enough to evaluate this model?

Accuracy alone might not be enough to evaluate a model predicting whether a student
will pass or fail an exam, especially if there's an imbalance in the dataset (e.g.,
significantly more students pass than fail). If 95% of students typically pass, a model
that simply predicts every student will pass would achieve 95% accuracy. While
seemingly high, this model is useless as it fails to identify any failing students.
Accuracy doesn't differentiate between the types of errors (false positives vs. false
negatives), which can have different implications. In this context, incorrectly predicting
a failing student will pass (false negative) is far more critical than incorrectly predicting
a passing student will fail (false positive), as it prevents timely intervention.
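The 95% example can be made concrete with a tiny calculation (the counts are invented): a "predict pass for everyone" model evaluated on 100 students of whom 5 actually fail:

```python
# 100 students: 95 pass (1), 5 fail (0). The model predicts "pass" for everyone.
actual    = [1] * 95 + [0] * 5
predicted = [1] * 100

accuracy = sum(a == p for a, p in zip(actual, predicted)) / len(actual)
failing_caught = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 0)

print(accuracy)        # → 0.95: looks strong
print(failing_caught)  # → 0: yet not a single at-risk student is flagged
```

This is why metrics that break errors down by class, such as precision and recall, are needed alongside accuracy.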

2. Between precision and recall, which would matter more in this situation? Explain your choice.

In this situation, recall would generally matter more than precision. Recall measures
the proportion of actual positive cases (students who will fail) that were correctly
identified by the model. A high recall means the model is good at catching most of the
students who are at risk of failing. The consequence of a false negative (predicting a
failing student will pass) is severe: the student might not receive the necessary support
or intervention, potentially leading to actual failure. While a low precision (many false
positives – predicting a passing student will fail) might lead to unnecessary
interventions for some students, it is less detrimental than missing a student who
genuinely needs help. The priority is to identify as many at-risk students as possible to
provide support.
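Treating "will fail" as the positive class, precision and recall follow directly from the confusion counts; the counts below are invented for illustration:

```python
# Confusion counts for the "will fail" class (invented example):
tp = 8   # students predicted to fail who did fail
fp = 12  # students flagged as at-risk who actually passed
fn = 2   # failing students the model missed

precision = tp / (tp + fp)  # of flagged students, how many really fail
recall    = tp / (tp + fn)  # of failing students, how many were caught

print(round(precision, 2))  # → 0.4: many unnecessary interventions
print(round(recall, 2))     # → 0.8: but most at-risk students are reached
```

A model with these numbers over-flags students, yet reaches 8 of the 10 who genuinely need help, which is the trade-off the answer argues for.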
3. Suggest one simple method (like cross-validation or A/B testing) to
make the model more reliable.

Cross-validation is a simple yet effective method to make the model more reliable.
Instead of training and evaluating the model on a single split of data (e.g., 80% train,
20% test), cross-validation involves partitioning the dataset into multiple subsets
(folds). The model is then trained and tested multiple times, with each fold serving as
the test set exactly once. For example, in 5-fold cross-validation, the data is divided
into five parts. The model is trained on four parts and tested on the remaining one, and
this process is repeated five times. The performance metrics (like recall) are then
averaged across all folds. This approach provides a more robust and less biased
estimate of the model's performance on unseen data, reducing the chance of
overfitting to a specific data split and giving a more reliable indication of its
generalization ability.
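The 5-fold procedure described above can be sketched as follows; `evaluate`-style scoring is stubbed out with a placeholder (here just the test-fold size) to keep the focus on the data flow, since the real pipeline's train-and-score step is not specified:

```python
def k_fold_splits(data, k=5):
    """Yield (train, test) lists, each fold serving as the test set exactly once."""
    fold_size = len(data) // k
    for i in range(k):
        test = data[i * fold_size:(i + 1) * fold_size]
        train = data[:i * fold_size] + data[(i + 1) * fold_size:]
        yield train, test

data = list(range(20))  # stand-in for 20 labeled student records
scores = []
for train, test in k_fold_splits(data, k=5):
    # Placeholder "score": in practice, fit on `train` and measure recall on `test`.
    scores.append(len(test))

print(len(scores))                # → 5: one score per fold
print(sum(scores) / len(scores))  # averaged across folds, as the text describes
```

Averaging a real metric (such as recall) across the five folds is what gives the more robust performance estimate the answer describes.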
