1. Compare the adaptability of traditional programming and machine learning
programming. How do these approaches handle changes in input or requirements?
Traditional programming is like giving exact instructions for every step, so if anything
changes, you have to rewrite those instructions. Machine learning programming learns from
examples, so it can adjust to changes by using new examples instead of rewriting everything.
It's like learning a new game by playing different rounds instead of memorizing new rules
each time.
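The contrast above can be sketched in Python. The "urgent message" task, keywords, and labels below are invented purely for illustration:

```python
# Traditional programming: the rule is written by hand. If the notion of
# "urgent" changes, the code itself must be rewritten.
def is_urgent_rule_based(message):
    return "asap" in message.lower() or "urgent" in message.lower()

# Machine-learning style: the "rule" (here, a keyword set) is derived from
# labeled examples. Adapting to change means supplying new examples,
# not editing logic.
def learn_urgent_keywords(examples):
    """Collect words that appear only in urgent examples."""
    urgent_words, normal_words = set(), set()
    for text, label in examples:
        words = set(text.lower().split())
        (urgent_words if label == "urgent" else normal_words).update(words)
    return urgent_words - normal_words

def is_urgent_learned(message, keywords):
    return bool(keywords & set(message.lower().split()))

training_examples = [
    ("server down fix asap", "urgent"),
    ("meeting notes attached", "normal"),
    ("urgent outage in production", "urgent"),
    ("lunch plans for friday", "normal"),
]
keywords = learn_urgent_keywords(training_examples)
```

To change the rule-based version, someone edits `is_urgent_rule_based`; to change the learned version, someone adds rows to `training_examples` and re-runs `learn_urgent_keywords`.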
2. For a “bird species recognition learning” problem, identify <T, P, E>.
In the context of a "bird species recognition learning" problem:
T (Task): Identifying and classifying different bird species based on input data.
P (Performance): The accuracy with which the model classifies bird species, e.g. the
percentage of images labeled with the correct species.
E (Experience): Training the model using a labeled dataset of bird images to improve
its recognition capabilities over time.
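As a rough sketch, the three components might map onto code like this; the feature vectors and species names are synthetic stand-ins for real image features:

```python
# Toy sketch of <T, P, E> for bird species recognition.

def classify(features, prototypes):
    """T (Task): assign a feature vector to the nearest species prototype."""
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(prototypes, key=lambda sp: dist(features, prototypes[sp]))

def learn_prototypes(examples):
    """E (Experience): labeled examples -> per-species mean feature vector."""
    sums, counts = {}, {}
    for feats, species in examples:
        acc = sums.setdefault(species, [0.0] * len(feats))
        for i, v in enumerate(feats):
            acc[i] += v
        counts[species] = counts.get(species, 0) + 1
    return {sp: [v / counts[sp] for v in acc] for sp, acc in sums.items()}

def accuracy(test_set, prototypes):
    """P (Performance): fraction of correct predictions on labeled data."""
    correct = sum(classify(f, prototypes) == sp for f, sp in test_set)
    return correct / len(test_set)

labeled = [([1.0, 1.0], "sparrow"), ([1.2, 0.8], "sparrow"),
           ([5.0, 5.0], "hawk"), ([4.8, 5.2], "hawk")]
prototypes = learn_prototypes(labeled)
```

More labeled examples (E) refine the prototypes, which is what "improving with experience" means here.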
3. Discuss how machine learning programming focuses on generalization, allowing models to make
predictions on unseen data.
Machine learning programming places a strong emphasis on generalization, which refers to
the ability of a model to perform well on unseen data that wasn't part of its training dataset.
This is a key objective because the ultimate goal of machine learning is to develop models
that can make accurate predictions or decisions in real-world scenarios.
The focus on generalization is achieved through various techniques and principles:
1. Training-Validation-Testing Split: The dataset is divided into three sets: training,
validation, and testing. The training set is used to train the model, the validation set is
used to fine-tune hyperparameters and prevent overfitting, and the testing set assesses
the model's performance on completely unseen data.
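A minimal sketch of such a three-way split; the fractions and seed below are arbitrary choices:

```python
import random

def train_val_test_split(data, val_frac=0.15, test_frac=0.15, seed=0):
    """Shuffle once, then carve out validation and test sets so the model
    is never trained on the examples used to evaluate it."""
    rng = random.Random(seed)          # fixed seed for reproducibility
    shuffled = data[:]
    rng.shuffle(shuffled)
    n = len(shuffled)
    n_test = int(n * test_frac)
    n_val = int(n * val_frac)
    test = shuffled[:n_test]
    val = shuffled[n_test:n_test + n_val]
    train = shuffled[n_test + n_val:]
    return train, val, test

train, val, test = train_val_test_split(list(range(100)))
```

The key property is that the three sets are disjoint: no example used for tuning or final evaluation ever appears in training.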
2. Overfitting Prevention: Overfitting occurs when a model learns noise in the training
data rather than the underlying patterns. Regularization techniques such as L1 and L2
penalties, along with dropout, are used to prevent overfitting and help the model
generalize better.
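To illustrate the L2 idea, here is a toy one-variable linear regression fit by gradient descent with and without an L2 penalty; the data points and hyperparameters are made up:

```python
def fit_ridge(xs, ys, lam=0.1, lr=0.01, steps=2000):
    """Minimize mean((w*x + b - y)^2) + lam * w^2.
    The lam * w^2 term shrinks the weight toward zero, discouraging
    the model from fitting noise too aggressively."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
        grad_w += 2 * lam * w          # gradient of the L2 penalty
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

xs = [0.0, 1.0, 2.0, 3.0]
ys = [0.1, 1.1, 1.9, 3.0]              # roughly y = x
w_reg, _ = fit_ridge(xs, ys, lam=1.0)  # penalized fit: smaller weight
w_free, _ = fit_ridge(xs, ys, lam=0.0) # unpenalized least-squares fit
```

Setting `lam=0.0` recovers plain least squares; increasing `lam` pulls the fitted weight toward zero, trading a little training error for less sensitivity to noise.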
3. Cross-Validation: This technique involves repeatedly splitting the dataset into
different training and validation sets. It helps provide a more robust estimate of a
model's performance on unseen data and aids in hyperparameter tuning.
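The splitting step of k-fold cross-validation can be sketched as follows (index bookkeeping only; the per-fold model-fitting loop is omitted):

```python
def k_fold_indices(n, k):
    """Yield (train_idx, val_idx) pairs: each of the n examples appears
    in the validation fold exactly once across the k splits."""
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        val_idx = list(range(start, start + size))
        train_idx = [i for i in range(n) if i < start or i >= start + size]
        yield train_idx, val_idx
        start += size

folds = list(k_fold_indices(10, 5))
```

Averaging the validation score over all k folds gives a more robust performance estimate than a single split, because every example gets a turn as unseen data.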
4. In a healthcare context, suggest a machine learning problem that can be addressed using
supervised learning. Explain the features, labels, and potential benefits of solving this problem.
Problem: Heart Disease Risk Prediction
Objective: To predict the likelihood of a patient having heart disease based on various medical and
lifestyle factors.
Features:
Age
Gender
Cholesterol levels
Blood pressure
Fasting blood sugar levels
Body Mass Index (BMI)
Family history of heart disease
Exercise habits
Labels:
Binary label: 0 (No heart disease) or 1 (Heart disease)
Benefits:
1. Preventive Measures: Predicting heart disease risk can help individuals take
preventive actions, such as adopting healthier lifestyles, managing cholesterol levels,
and seeking medical advice to reduce the risk of developing heart-related conditions.
2. Focused Screening: Healthcare providers can target screenings and tests towards
individuals with higher predicted risk, leading to earlier detection and intervention.
3. Treatment Planning: Doctors can design treatment plans tailored to a patient's
predicted risk, optimizing medication, lifestyle changes, and monitoring strategies.
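A minimal sketch of the supervised setup, using logistic regression fit by gradient descent. The two features stand in for the longer list above, and every number here is synthetic illustration, not medical data or validated coefficients:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(rows, labels, lr=0.1, steps=3000):
    """rows: feature vectors (e.g. [scaled age, scaled cholesterol]);
    labels: 0 = no heart disease, 1 = heart disease."""
    w = [0.0] * len(rows[0])
    b = 0.0
    n = len(rows)
    for _ in range(steps):
        gw = [0.0] * len(w)
        gb = 0.0
        for x, y in zip(rows, labels):
            p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
            err = p - y                      # gradient of log-loss
            for i, xi in enumerate(x):
                gw[i] += err * xi
            gb += err
        w = [wi - lr * gi / n for wi, gi in zip(w, gw)]
        b -= lr * gb / n
    return w, b

def predict_risk(x, w, b):
    """Probability-like risk score between 0 and 1."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)

# Synthetic patients: [scaled age, scaled cholesterol] -> label
rows = [[0.2, 0.1], [0.3, 0.2], [0.8, 0.9], [0.9, 0.7]]
labels = [0, 0, 1, 1]
w, b = fit_logistic(rows, labels)
```

The output is a risk score rather than a hard yes/no, which is what lets clinicians prioritize screening for high-score patients.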
6. Why should testing data be separate from the training data? How does this separation help
ensure unbiased evaluation?
Unbiased Evaluation: Think of testing data as a completely new set of challenges for the
model. If the model has seen this data before, it might just remember the answers without
truly understanding the underlying patterns. By keeping testing data separate, we're checking
if the model can handle new situations it hasn't encountered before.
Avoiding Overfitting: Imagine if you studied only the questions you knew would be on a
test. You'd do well on those questions but struggle if the test had different ones. Similarly, if
a model learns too much from the training data, it might perform perfectly on those examples
but fail when faced with different ones. Testing data helps us spot if the model has been
overly focused on training data.
Real-world Performance: Suppose you practiced a sport only on a specific type of field.
You might be great there, but how would you perform on different fields? Testing data acts
like those different fields, giving us an idea of how the model will do in real situations it
hasn't practiced extensively.
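A toy illustration of why held-out data matters: a "model" that simply memorizes its training set looks perfect on training data but reveals itself on a test set. The even/odd labeling rule below is made up:

```python
def memorize(train_pairs):
    """A degenerate 'model' that just looks up answers it has seen."""
    table = dict(train_pairs)
    # fall back to a constant guess on inputs never seen during training
    return lambda x: table.get(x, 0)

def accuracy(model, pairs):
    return sum(model(x) == y for x, y in pairs) / len(pairs)

# underlying rule: label is 1 when x is even
train_pairs = [(x, int(x % 2 == 0)) for x in range(0, 10)]
test_pairs = [(x, int(x % 2 == 0)) for x in range(10, 20)]

model = memorize(train_pairs)
train_acc = accuracy(model, train_pairs)   # looks perfect
test_acc = accuracy(model, test_pairs)     # exposes the lack of learning
```

Evaluated only on training data the memorizer scores 100%; on unseen inputs it does no better than a constant guess, which is exactly the gap that a separate test set exposes.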
5. Imagine you have a dataset containing customer purchase history. Which type of
learning (supervised or unsupervised) would you use to analyze this data? Justify your
choice.
To analyze customer purchase history, I would use unsupervised learning. Unsupervised
learning can uncover patterns and structure in the data without requiring predefined labels
or outcomes. For example, it can identify segments of similar customers or reveal hidden
patterns in their purchasing behavior.
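A minimal k-means sketch over invented (monthly purchase frequency, average spend) pairs; note that no labels are involved, the grouping emerges from the data alone:

```python
def dist2(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def mean(points):
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(len(points[0])))

def kmeans(points, k, steps=20):
    """Alternate assigning points to the nearest centroid and
    recomputing centroids until the clusters settle."""
    centroids = [points[i] for i in range(k)]      # naive initialization
    for _ in range(steps):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: dist2(p, centroids[c]))
            clusters[nearest].append(p)
        centroids = [mean(cl) if cl else centroids[i]
                     for i, cl in enumerate(clusters)]
    return centroids, clusters

# (monthly purchase frequency, average spend) -- synthetic customers
customers = [(1, 20), (2, 25), (1, 18), (10, 200), (12, 220), (11, 210)]
centroids, clusters = kmeans(customers, k=2)
```

Here the algorithm separates low-spend occasional buyers from high-spend frequent buyers without ever being told those segments exist.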