Lecture 3 Design of a ML System

Design of ml system

Uploaded by

Paawan Sharma

1/6/2025

NIMS University
NIET
DEPARTMENT OF CSE

Faculty: Prof. (Dr.) Vineet Mehan

Lecture – 3 Design of a ML System

Machine Learning (CSC601B)

ML: Course Objectives

1. Understand the concept of learning in computer science.
2. Compare and contrast different paradigms for learning (supervised, unsupervised, etc.).
3. Design experiments to evaluate and compare different machine learning techniques on real-world problems.

COURSE OUTCOMES

On completion of this course, students shall be able to:

1. Comprehend core machine learning concepts (supervised/unsupervised learning, models) for data analysis and prediction.
2. Implement various machine learning algorithms (e.g., linear regression, kNN, decision trees) to solve real-world problems.
3. Evaluate and compare model performance using appropriate metrics (accuracy, precision, recall).
4. Preprocess and prepare data for machine learning tasks (cleaning, normalization, feature engineering).
5. Communicate machine learning results effectively, interpreting model behavior and limitations.

Syllabus

SUGGESTED READINGS

• Text/Reference Books:

1. Kevin P. Murphy, "Machine Learning: A Probabilistic Perspective", MIT Press, 2012.
2. Ethem Alpaydin, "Introduction to Machine Learning", Third Edition, MIT Press, 2014.
3. Tom Mitchell, "Machine Learning", McGraw-Hill, 1997.
4. Sebastian Raschka, Vahid Mirjalili, "Python Machine Learning and deep learning", 2nd edition, Kindle book, 2018.
5. Carol Quadros, "Machine Learning with Python, scikit-learn and TensorFlow", Packt Publishing, 2018.
6. Gavin Hackeling, "Machine Learning with scikit-learn", Packt Publishing, O'Reilly, 2018.
7. Stanford Lectures of Prof. Andrew Ng on Machine Learning.

MODE OF EVALUATION

Theory
Components    Internal Assessment    End Term Examination
Marks         30                     70
Total Marks   100

Lab
Components    Internal Assessment    End Term Examination
Marks         15                     35
Total Marks   50


Index

1. Design
2. Example

Design of a ML System

• The design of an ML system is covered here in nine steps.
• The steps are taken one at a time.
• Each step comes with a relevant example.

1. Problem Definition

• Theory:

• Clearly identify the problem to solve and its scope.
• Specify the input, output, and type of ML task (e.g., classification, regression, clustering).

• Example:

• Objective: Predict whether a customer will churn (stop using a service) based on their usage patterns and demographics.
• Inputs: Customer attributes (age, location, subscription type) and behavioural data (session duration, payment history).

• Output: Binary outcome: 1 (churn) or 0 (no churn).
• Type of ML Task: Supervised binary classification.

2. Data Collection and Preprocessing

• Theory:

• Gather data relevant to the problem.
• Clean and preprocess it for model readiness:
• Handle missing data, noise, and outliers.
• Transform and normalize features.
• Split data into training, validation, and testing sets.


• Example:

• Data Sources:
• Customer demographics from the CRM database.
• Behavioral data (e.g., login frequency, average session time) from app usage logs.
• Payment history from billing systems.

Demographics: the number and characteristics of people who live in a particular area or form a particular group, especially in relation to their age, how much money they have, and what they spend it on.

• Preprocessing:
• Handle missing age values by imputing the median age.
• Normalize session duration to the range 0 to 1.
• Encode subscription type (e.g., "Basic" = 0, "Premium" = 1).
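The three preprocessing steps in this example (median imputation of age, scaling session duration to the range 0 to 1, and encoding subscription type) can be sketched with the Python standard library alone. The records, field names, and the `preprocess` helper below are hypothetical and only illustrate the idea; a real project would more likely use a data library.

```python
from statistics import median

# Hypothetical customer records; None marks a missing age.
customers = [
    {"age": 34, "session_minutes": 12.0, "subscription": "Basic"},
    {"age": None, "session_minutes": 45.0, "subscription": "Premium"},
    {"age": 52, "session_minutes": 30.0, "subscription": "Basic"},
    {"age": 41, "session_minutes": 5.0, "subscription": "Premium"},
]

# Encoding taken from the slide: "Basic" = 0, "Premium" = 1.
SUBSCRIPTION_CODES = {"Basic": 0, "Premium": 1}


def preprocess(records):
    """Return cleaned copies of the records; the input list is left unchanged."""
    # Median of the ages that are actually present.
    known_ages = [r["age"] for r in records if r["age"] is not None]
    median_age = median(known_ages)

    # Min-max scaling parameters for session duration.
    durations = [r["session_minutes"] for r in records]
    lo, hi = min(durations), max(durations)
    span = (hi - lo) or 1.0  # avoid dividing by zero when all values are equal

    cleaned = []
    for r in records:
        cleaned.append({
            "age": median_age if r["age"] is None else r["age"],
            "session_scaled": (r["session_minutes"] - lo) / span,
            "subscription": SUBSCRIPTION_CODES[r["subscription"]],
        })
    return cleaned


processed = preprocess(customers)
for row in processed:
    print(row)
```

In practice the median and the min/max values would normally be computed on the training subset only and then reused for the validation and testing subsets, so that no information from held-out data leaks into training.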

• Example:

• Splitting: Split the dataset into 70% training, 15% validation, and 15% testing subsets.

• 1. Training Subset (70%)

• Purpose: The training set is used to teach the model to identify patterns and learn from the data. It forms the foundation of the model's understanding of the problem.
• Size: Allocating 70% of the dataset ensures that the model has a sufficient amount of data to learn from, reducing the risk of underfitting (where the model doesn't learn enough).

• 2. Validation Subset (15%)

• Purpose: The validation set is used to fine-tune the model. This subset helps:
• Monitor the model's performance during training.
• Tune hyperparameters (e.g., learning rate, number of layers).
• Detect overfitting, which occurs when the model performs well on the training data but poorly on unseen data.
• Size: A 15% allocation provides a good balance to evaluate the model during training without sacrificing too much data from the training subset.

• 3. Testing Subset (15%)

• Purpose: The testing set evaluates the model's performance on unseen data after training is complete. It gives an unbiased estimate of how the model will perform in real-world scenarios.
• Size: Reserving 15% ensures enough data to reliably assess the model's generalization capability.
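The 70% / 15% / 15% split described above can be sketched as a shuffle followed by two cuts. The `split_dataset` helper, the fixed seed, and the 100 stand-in records are assumptions made for illustration only.

```python
import random


def split_dataset(rows, train_frac=0.70, val_frac=0.15, seed=42):
    """Shuffle a copy of rows and cut it into train, validation, and test lists."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split repeatable
    n_train = round(len(shuffled) * train_frac)
    n_val = round(len(shuffled) * val_frac)
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]  # whatever remains, about 15%
    return train, val, test


rows = list(range(100))  # stand-in for 100 customer records
train, val, test = split_dataset(rows)
print(len(train), len(val), len(test))  # 70 15 15
```

Shuffling before cutting matters when the records are stored in a meaningful order (for example by sign-up date), because otherwise the three subsets could differ systematically.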


3. Model Selection

• Theory:

• Choose an algorithm suitable for the task and data type.
• Compare traditional ML models (e.g., Random Forest, SVM) and deep learning models (e.g., CNNs, RNNs).
• Select a baseline model for benchmarking.

• Example:

• Algorithm: Start with Logistic Regression as a baseline due to its simplicity. Then move to Random Forest for better handling of mixed data types and non-linearity.
• Baseline Model: Logistic Regression to establish a minimum expected accuracy.
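To make the baseline concrete, here is a minimal logistic regression trained by batch gradient descent, written with the standard library only. This is a teaching sketch, not the lecture's implementation: the six toy customers, the two features (scaled session duration and subscription code), the learning rate, and the number of epochs are all assumptions, and a real project would normally rely on an established ML library.

```python
import math


def sigmoid(z):
    """Logistic function, mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(weights, bias, x):
    """Estimated probability that a customer with features x churns."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)


def train_logistic_regression(rows, labels, learning_rate=0.5, epochs=5000):
    """Fit weights and a bias by batch gradient descent on the log loss."""
    n_features = len(rows[0])
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in zip(rows, labels):
            error = predict_proba(weights, bias, x) - y
            for j, xi in enumerate(x):
                grad_w[j] += error * xi
            grad_b += error
        # Step against the average gradient over the whole training set.
        weights = [w - learning_rate * g / len(rows) for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b / len(rows)
    return weights, bias


# Hypothetical toy data: [scaled session duration, subscription code].
rows = [[0.10, 0], [0.20, 0], [0.15, 1], [0.80, 1], [0.90, 0], [0.70, 1]]
labels = [1, 1, 1, 0, 0, 0]  # 1 = churn, 0 = no churn

weights, bias = train_logistic_regression(rows, labels)
for x, y in zip(rows, labels):
    print(x, y, round(predict_proba(weights, bias, x), 3))
```

Whatever accuracy such a baseline reaches on the validation subset becomes the figure that a more complex model, such as a Random Forest, has to beat to justify its extra complexity.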

4. Model Design and Training

• Theory:

• Define the architecture or configuration of the selected model.
• Train the model using the training dataset and tune hyperparameters.
• Monitor metrics during training to avoid overfitting or underfitting.

• Example:

• Model: Random Forest with the following hyperparameters:
• Number of trees: 100.
• Max depth: 10.
• Minimum samples per leaf: 2.
• Training Process: Train on the 70% training set.

5. Evaluation

• Theory:

• Use appropriate metrics to measure the model's performance.
• Conduct cross-validation to ensure robustness.
• Perform error analysis to identify areas for improvement.

• Let's use a simple example to explain cross-validation, specifically 3-fold cross-validation, with a small dataset.

• Dataset:
• Imagine we have a dataset of 6 data points:
• Data: [A, B, C, D, E, F]
• Labels: [1, 1, 0, 0, 1, 0]


• Goal

• We want to evaluate a model's performance using cross-validation. We'll use 3-fold cross-validation, which means:
1. The dataset will be split into 3 equal parts (folds).
2. Each fold will take turns as the test set, while the other two are used as the training set.

• Step-by-Step Process

• Step 1: Split Data into 3 Folds
• We divide the dataset into 3 parts (folds):
• Fold 1: [A, B]
• Fold 2: [C, D]
• Fold 3: [E, F]

Final Result

The cross-validation process tells us the model's average accuracy is 50%. This is a more reliable estimate of the model's performance than using a single train-test split, as it tests the model on all parts of the dataset.
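The mechanics of 3-fold cross-validation on the six-point dataset can be sketched as below. The slides do not show which model was used or what each fold scored, so the code plugs in a hypothetical stand-in model that always predicts 1. With that stand-in the average happens to come out at 0.5, but this should not be read as a reconstruction of the slide's own calculation.

```python
data = ["A", "B", "C", "D", "E", "F"]
labels = [1, 1, 0, 0, 1, 0]


def make_folds(n_items, k):
    """Split indices 0..n_items-1 into k contiguous folds of equal size."""
    fold_size = n_items // k
    return [list(range(i * fold_size, (i + 1) * fold_size)) for i in range(k)]


def always_predict_one(train_labels, test_size):
    """Hypothetical stand-in model: ignores the training data and predicts 1."""
    return [1] * test_size


def cross_validate(labels, k, model):
    """Return the accuracy on each fold when that fold is held out as the test set."""
    scores = []
    for test_idx in make_folds(len(labels), k):
        train_labels = [y for i, y in enumerate(labels) if i not in test_idx]
        test_labels = [labels[i] for i in test_idx]
        predictions = model(train_labels, len(test_labels))
        correct = sum(p == y for p, y in zip(predictions, test_labels))
        scores.append(correct / len(test_labels))
    return scores


folds = [[data[i] for i in fold] for fold in make_folds(len(data), 3)]
print(folds)  # [['A', 'B'], ['C', 'D'], ['E', 'F']]

fold_scores = cross_validate(labels, k=3, model=always_predict_one)
average_accuracy = sum(fold_scores) / len(fold_scores)
print(fold_scores, average_accuracy)  # [1.0, 0.0, 0.5] 0.5
```

The spread between folds (1.0, 0.0, 0.5) is the reason for averaging: any single train-test split of this dataset would give a very different impression of the same model.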

• Results:
• Accuracy: 92%.
• Precision: 85%.
• Recall: 78%.
• F1-score: 81%.

6. Deployment

• Theory:

• Deploy the trained model into a production environment.
• Make predictions accessible via APIs or integrated systems.
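Looking back at the evaluation results (accuracy 92%, precision 85%, recall 78%, F1-score 81%), the sketch below shows how the four metrics follow from confusion-matrix counts. The counts are hypothetical, chosen for 3,800 customers so that they reproduce the reported percentages; the lecture does not give the underlying counts.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute accuracy, precision, recall, and F1 from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp)  # share of predicted churners who really churn
    recall = tp / (tp + fn)     # share of real churners the model catches
    f1 = 2 * precision * recall / (precision + recall)
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical counts: 850 real churners among 3,800 customers.
metrics = classification_metrics(tp=663, fp=117, fn=187, tn=2833)
for name, value in metrics.items():
    print(f"{name}: {value:.0%}")
```

With these counts the model misses 187 of 850 churners even though accuracy is 92%, which illustrates why recall deserves attention in a churn problem where non-churners are the majority.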


• Example:

• Deployment:
• Use Flask to create an API that takes customer data and returns churn probability.
• Host the model on AWS Lambda for scalability.

Flask is a lightweight and simple framework in Python that helps you build web applications and APIs.

AWS Lambda is a service from Amazon Web Services (AWS) that lets you run your code without needing to worry about managing servers. It automatically handles the underlying infrastructure for you, so you can focus on writing and deploying your code.

• Workflow:
1. CRM system sends customer data to the API.
2. API returns a churn probability for each customer.
3. System triggers retention campaigns for high-risk customers.

7. Monitoring and Maintenance

• Theory:

• Continuously monitor model performance to detect data drift or degradation.
• Set up alerts for drops in accuracy or significant changes in input distributions.
• Plan for regular retraining with new data.
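The two kinds of alert mentioned in the monitoring theory, a drop in a performance metric and a shift in an input distribution, can be sketched as simple checks. Everything here is an assumption for illustration: the function names, the sample data, and the drift rule (mean shift measured in reference standard deviations) are one simple choice among many. The default precision threshold of 0.80 mirrors the alert level used in this lecture's maintenance example.

```python
from statistics import mean, stdev


def precision_alert(true_positives, false_positives, threshold=0.80):
    """Return True when precision on recent predictions drops below threshold."""
    predicted_positive = true_positives + false_positives
    if predicted_positive == 0:
        return False  # no positive predictions, so precision is undefined
    return true_positives / predicted_positive < threshold


def drift_alert(reference, recent, max_shift=2.0):
    """Return True when the recent mean lies more than max_shift reference
    standard deviations away from the reference mean."""
    spread = stdev(reference)
    if spread == 0:
        return mean(recent) != mean(reference)
    return abs(mean(recent) - mean(reference)) / spread > max_shift


# Hypothetical session durations (minutes) at training time and in production.
training_sessions = [10, 12, 11, 13, 9, 12, 10, 11]
production_sessions = [25, 27, 24, 26, 28, 25, 26, 27]

print(precision_alert(true_positives=70, false_positives=30))  # True
print(drift_alert(training_sessions, production_sessions))     # True
```

A check like `precision_alert` needs true churn outcomes, which arrive only after some delay, whereas `drift_alert` works on inputs alone and can therefore warn earlier.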

• Example:

• Monitoring:
• Use a dashboard (e.g., Grafana) to track key metrics like prediction accuracy, latency, and data distribution.

Grafana is an open-source platform used for visualizing data and monitoring systems. It lets you create interactive dashboards to track metrics, logs, and other key data points in real-time, making it ideal for system performance analysis and troubleshooting.

• Maintenance:
• Retrain the model quarterly using the most recent customer data.
• Alert the team if precision falls below 80%.

8. Ethical and Regulatory Compliance

• Theory:

• Ensure the system adheres to ethical guidelines and legal requirements.
• Address biases in the model and explain decisions transparently.

• Example:

• Bias Mitigation:
• Check if the model unfairly predicts churn for specific demographics (e.g., age or location).

• Compliance:
• Follow Government regulations by anonymizing customer data.
• Provide explanations for churn predictions to stakeholders using SHAP (SHapley Additive exPlanations) values.

9. Scalability

• Theory:

• Design the system to handle growing amounts of data and users.
• Use techniques like caching, parallel processing, and distributed systems for scalability.
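For the bias check described under Ethical and Regulatory Compliance, a first-pass sketch is to compare how often the model predicts churn in each demographic group. The records, the `age_band` field, and the helper name are hypothetical, and the comparison of predicted churn rates is only one of several possible fairness checks.

```python
from collections import defaultdict


def churn_rate_by_group(records, group_key):
    """Return the share of customers predicted to churn within each group."""
    totals = defaultdict(int)
    flagged = defaultdict(int)
    for record in records:
        group = record[group_key]
        totals[group] += 1
        flagged[group] += record["predicted_churn"]
    return {group: flagged[group] / totals[group] for group in totals}


# Hypothetical model outputs with an age band attached to each customer.
predictions = [
    {"age_band": "18-30", "predicted_churn": 1},
    {"age_band": "18-30", "predicted_churn": 1},
    {"age_band": "18-30", "predicted_churn": 0},
    {"age_band": "18-30", "predicted_churn": 1},
    {"age_band": "31-50", "predicted_churn": 0},
    {"age_band": "31-50", "predicted_churn": 1},
    {"age_band": "31-50", "predicted_churn": 0},
    {"age_band": "31-50", "predicted_churn": 0},
]

rates = churn_rate_by_group(predictions, "age_band")
gap = max(rates.values()) - min(rates.values())
print(rates, gap)  # {'18-30': 0.75, '31-50': 0.25} 0.5
```

A large gap is a prompt for investigation, not proof of unfairness: the groups may genuinely churn at different rates, so the next step would be to compare error rates (for example precision and recall) per group against actual outcomes.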


• Example:

• Scaling:
• Deploy the API on a Kubernetes cluster for load balancing.
• Use caching for commonly queried customer segments to reduce latency.

Kubernetes, often abbreviated as K8s, is an open-source platform for managing and orchestrating containerized applications. It automates tasks like deploying, scaling, and managing applications across a cluster of machines.

Summary

• Summary of the Example:

1. Problem: Predict customer churn for a subscription business.
2. Data: Behavioral, demographic, and payment history data.
3. Model: Started with Logistic Regression, then improved with Random Forest.
4. Evaluation: Achieved 92% accuracy with a focus on recall.
5. Deployment: Flask API hosted on AWS Lambda.
6. Monitoring: Grafana dashboard and retraining every quarter.
7. Ethics: Checked for demographic bias and ensured Govt. compliance.
8. Scalability: Used Kubernetes for scaling and caching for efficiency.

Task

• Explain the steps involved in preprocessing data for a machine learning model. How would you handle missing values, categorical variables, and scaling for numerical features in a churn prediction system?

• Given a classification task where the goal is to predict customer churn, how would you select and evaluate the performance of different machine learning models? Discuss the metrics you would use and why they are important for this specific problem.

THANK YOU
