
IU 3.6.4 ML 101

Machine learning algorithms analyze large amounts of data to identify patterns and make predictions without being explicitly programmed. There are three key conditions for applying machine learning: 1) a pattern must exist in the input data, 2) there must be ample data to analyze, and 3) the problem behavior can be expressed mathematically. The machine learning process involves collecting data, preparing it, selecting an algorithm, training the model, and evaluating performance. Algorithms either understand relationships between inputs and outputs or identify intrinsic patterns in the input data.

Uploaded by

anunair.viji

IU 3.6.4 Machine Learning 101


RISE 2.0

SEP 2022
13 hours

Contents

Agenda for Today
• Where Are We in the Journey & Learning Objectives (5 mins)
• Course Intro + Machine Learning Techniques (25 mins)
• Chapter 1 – Linear Regression (3.5 hrs.)
• Chapter 2 – Logistic Regression (3 hrs.)
• Chapter 3 – Clustering (3 hrs.)
• Chapter 4 – Recommender Systems (3 hrs.)

Where are we in the learning journey?

IU 1.0, IU 2.0, IU 3.0

• Orientation (1 week)
• Business Essentials (1 week)
• Digital Essentials (1 week)
• Business and Data Analytics core (14 weeks)
• Capstone (4 weeks)

IU 3.6.4 – Machine Learning 101

Career Development Journey: 5 Career Spotlights + 4 Career Buddies

Leadership and Personal Development Journey: Networking + Enrichment Sessions

Re-Cap

In the previous session, we covered the following topics:

• Why machine learning matters
• Introduction to machine learning techniques
• Supervised and unsupervised learning

Note: The following slides repeat the previous session. The trainer can move directly to the Jupyter
notebooks and come back for the project de-briefing.

Learning objectives

• Understand the importance & applications of machine learning
• Overview of different machine learning techniques & associated steps
• Practice the use of machine learning for varied business applications

Course Introduction
Machine Learning

• Machine learning is a data analytics technique that teaches computers to do what comes
naturally to humans and animals: learn from experience
• Machine learning algorithms use computational methods to “learn” information directly from
data without relying on a predetermined equation as a model
• The algorithms adaptively improve their performance as the number of samples available for
learning increases

* More reference on the external link library


Why machine learning matters?
With the rise in big data, machine learning has become a key technique for solving problems in areas, such as:

Retail & CPG
Understanding the future potential demand and sales for products is a key task for any retailer to
better plan for inventory, cut down on production of unnecessary products, and decide pricing strategy.
• Refer here for example on price forecasting

Manufacturing
The monitoring of manufacturing equipment is vital to any industrial process. Sometimes it is critical
that equipment be monitored in real time for faults and anomalies to prevent damage and to correlate
equipment faults with production line issues. Fault detection is the precursor to predictive
maintenance.
• Refer here for more details

Natural language processing
Natural language processing (NLP) is about developing applications and services that are able to
understand human languages.
• Refer here for more details on NLP

Computational finance
Computational finance is also sometimes referred to as "financial engineering," "financial
mathematics," "mathematical finance," or "quantitative finance." It uses the tools of mathematics,
statistics, and computing to solve problems in finance like credit scoring and algorithmic trading.
• Refer here for basic understanding about credit risk models
• Refer here for more about trading

Image processing & computer vision
Image processing & computer vision is a method to perform some operations on an image, in order to
get an enhanced image or to extract some useful information.
• Refer here for basic understanding about facial recognition and a quick tutorial
• Refer here to understand more about motion detection

Computational biology
Can be used for tumor detection, drug discovery, and DNA sequencing.
• Refer here for more on Tumor detection or try github
• Refer here to understand more about DNA sequencing

Course Outline
Course Outline (I/II)

Chapter 1 (3.5 hours)
• Types of Regression
• Linear Regression model
• Model Training, Evaluation and Validation

Chapter 2 (3 hours)
• Logistic Regression model
• Model Training, Iteration and Validation
• Model Fit Statistic
• Class Imbalance

Course Outline (II/II)

Chapter 3 (3 hours)
• Supervised vs Unsupervised
• Clustering model
• Common Methods: K-means

Chapter 4 (3 hours)
• Recommender Systems
• Common methods: Association rules learning, Market Basket Analysis, Content-based recommendation

Introduction to machine learning techniques
When should we use machine learning?

We consider using machine learning algorithms when we have a complex task or problem involving a large amount of
data and lots of variables, but no existing formula or equation.

For example, machine learning is a good option if you need to handle situations like these:

Three conditions must be met to apply machine learning to a problem

A pattern must exist in the input data that would help to arrive at a conclusion
• For instance, if we concluded the product reviews are random and do not offer any meaning, then it
would be difficult to arrive at a decision by using them
• To solve a problem with machine learning, the machine learning algorithm must have a pattern to
infer from

There must exist an ample amount of data (examples, samples) to apply machine learning to a problem
• For instance, if there are no product reviews for the webcam, it will be difficult to arrive at a
decision whether or not to buy the product
• Handling such situations requires simplifying the hypotheses & models (use non-parametric
approaches). *

The behavior in the problem can be formulated as a mathematical expression
• Machine learning is used to derive meaning from the data and perform “structured learning” to
arrive at a mathematical approximation to describe the behavior of the problem

* More reference on the external link library

How Machine Learning Works?
Process Flow of Machine Learning

How Machine Learning Algorithm Works?

A machine learning algorithm performs a learning task where it either:

Understands relationships between input & an output
• Given input data x & an output Y, the machine learning algorithm tries to find a relationship
between x & Y, which can be represented as: Y = f(x)
• The goal of machine learning algorithm would be to learn the properties of this target function f,
based on the given data x

Identifies intrinsic patterns in input data
• The machine learning algorithm tries to find underlying structure or distributions in the data x
• Since there is no output Y defined, there are no perfect answers

* More reference on the external link library
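
As a rough illustration of learning a target function from data, the sketch below fits a straight line to a handful of (x, Y) pairs by ordinary least squares. The data points, the hidden rule Y = 2x + 1, and the names `fit_line` and `f_hat` are made up for this example; it uses only the Python standard library, and a straight line is just one possible form for f.

```python
# Toy (made-up) observations that follow the hidden rule Y = 2x + 1.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [3.0, 5.0, 7.0, 9.0, 11.0]


def fit_line(xs, ys):
    """Ordinary least squares for a single feature: returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    slope = cov_xy / var_x
    intercept = mean_y - slope * mean_x
    return slope, intercept


slope, intercept = fit_line(xs, ys)


def f_hat(x):
    """Learned approximation of the target function f."""
    return slope * x + intercept


# Predict Y for an input that was not in the training data.
print(f_hat(6.0))
```

The algorithm never sees the rule itself; it only sees examples and recovers an approximation of f from them.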

Overview of different techniques
There are five main categories of machine learning techniques used in industry

Supervised & unsupervised learning
What is supervised learning?

Supervised learning is where you have input variables (x) and an output variable (Y) and you use an
algorithm to learn the mapping function from the input to the output: Y = f(x)

The goal is to approximate the mapping function so well that when you have new input data (x), you
can predict the output variable (Y) for that data.

It is called supervised learning because the process of an algorithm learning from the training dataset can
be thought of as a teacher supervising the learning process. We know the correct answers; the algorithm
iteratively makes predictions on the training data and is corrected by the teacher. Learning stops when
the algorithm achieves an acceptable level of performance.
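
The predict, correct, and stop cycle can be sketched with a one-parameter model trained by gradient descent. This is only one way to realise the idea: the data, the model Y = w·x, the learning rate, and the stopping threshold are all assumptions chosen for illustration.

```python
# Made-up training data: the known "correct answers" follow Y = 3x.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 6.0, 9.0, 12.0]


def train(xs, ys, learning_rate=0.01, acceptable_mse=1e-6, max_rounds=10_000):
    """Fit Y = w * x by repeatedly predicting and correcting w."""
    w = 0.0  # initial guess for the single weight
    for rounds in range(1, max_rounds + 1):
        # Predict on the training data and compare with the known answers.
        errors = [w * x - y for x, y in zip(xs, ys)]
        mse = sum(e * e for e in errors) / len(xs)
        # Stop once performance is acceptable.
        if mse <= acceptable_mse:
            break
        # The "teacher": the gradient of the mean squared error says how to correct w.
        gradient = 2 * sum(e * x for e, x in zip(errors, xs)) / len(xs)
        w -= learning_rate * gradient
    return w, mse, rounds


w, mse, rounds = train(xs, ys)
print(f"w = {w:.4f} after {rounds} rounds (mse = {mse:.2e})")
```

Here the mean squared error plays the role of the teacher's feedback, and `acceptable_mse` is the "acceptable level of performance" at which learning stops.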

Supervised learning problems

Supervised learning problems can be further grouped into regression and classification problems:
Regression Vs Classification

• Classification: A classification problem is when the output variable is a discrete category, such as “will
a customer default on a loan payment or not?” or “was a transaction anomalous or not?” or "is the growth
on the brain shown in an MRI scan a tumor or not?"

• Regression: A regression problem is when the output variable is a real value, such as “estimating future
demand of a product” or “predicting revenue based on advertising spend”.

Regression Vs Classification algorithms

What is unsupervised learning?

In unsupervised learning, we only have input data (X) and no corresponding output variables

The goal for unsupervised learning is to model the underlying structure or distribution in the data in order
to learn more about the data

This is called unsupervised learning because, unlike supervised learning above, there are no correct
answers and there is no teacher. Algorithms are left to their own devices to discover and present the
interesting structure in the data
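
A minimal sketch of discovering structure without any output variable, using a one-dimensional version of k-means (the method named in the course outline). The spend figures, the starting centroids, and the function name `k_means_1d` are hypothetical, and a real analysis would normally use several features and a library implementation.

```python
def k_means_1d(values, centroids, iterations=10):
    """Very small k-means for one numeric feature."""
    for _ in range(iterations):
        # Assignment step: each value joins the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for v in values:
            nearest = min(range(len(centroids)), key=lambda i: abs(v - centroids[i]))
            clusters[nearest].append(v)
        # Update step: each centroid moves to the mean of its cluster
        # (an empty cluster keeps its previous centroid).
        centroids = [
            sum(c) / len(c) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    return centroids, clusters


# Made-up monthly spend per customer; no labels are provided.
spend = [10, 12, 11, 95, 100, 105]
centroids, clusters = k_means_1d(spend, centroids=[0.0, 50.0])
print(centroids)
print(clusters)
```

No one tells the algorithm which customers are low or high spenders; the two groups emerge from the data alone.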

Unsupervised learning problems

Unsupervised learning problems can be further grouped into clustering and association mining problems: Clustering Vs
Association

• Clustering: A clustering problem is where you want to discover the inherent groupings in the data, such as "grouping
customers by purchasing behavior & demographic features"

• Association Mining: An association rule learning problem is where you want to discover rules that describe large
portions of your data, such as "if a customer bought milk, which other products would he/she likely buy?"
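
The milk question can be made concrete with the two standard measures used in association rule learning, support and confidence. The baskets below are invented for illustration, and the helper names are not from any particular library.

```python
# Made-up shopping baskets, one set of products per transaction.
baskets = [
    {"milk", "bread"},
    {"milk", "bread", "eggs"},
    {"milk", "eggs"},
    {"bread", "butter"},
    {"milk", "bread", "butter"},
]


def count_containing(itemset, baskets):
    """Number of baskets that contain every item in itemset."""
    return sum(1 for basket in baskets if itemset <= basket)


def support(itemset, baskets):
    """Share of all baskets that contain the itemset."""
    return count_containing(itemset, baskets) / len(baskets)


def confidence(antecedent, consequent, baskets):
    """Of the baskets with the antecedent, the share that also hold the consequent."""
    both = count_containing(antecedent | consequent, baskets)
    return both / count_containing(antecedent, baskets)


# "If a customer bought milk, which other products would they likely buy?"
for product in ["bread", "eggs", "butter"]:
    rule_confidence = confidence({"milk"}, {product}, baskets)
    print(f"milk -> {product}: confidence {rule_confidence:.2f}")
```

Rules with high confidence (and enough support to be meaningful) are the ones worth reporting.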

Clustering Vs Association algorithms

Choosing between supervised & unsupervised ML

The basic steps in using any machine learning technique

Step 1 - Identify if we have a target variable

Step 2 - Identify if the target variable is continuous or categorical (not valid if there is no target variable)

Step 3 - Identify the independent features which can explain the target variable

Step 4 - Make necessary transformations of data

Step 5 - Perform modeling based on data characteristics
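
Steps 1 and 2 amount to a small decision rule, sketched below. The function name and its return strings are hypothetical; Steps 3 to 5 still depend on the data itself.

```python
def choose_technique_family(has_target, target_is_continuous=None):
    """Map Steps 1 and 2 to a family of techniques.

    has_target: True if the data contains a target variable (Step 1).
    target_is_continuous: True for a continuous target, False for a
        categorical one; ignored when there is no target (Step 2).
    """
    if not has_target:
        return "unsupervised (clustering or association mining)"
    if target_is_continuous is None:
        raise ValueError("state whether the target is continuous or categorical")
    return "regression" if target_is_continuous else "classification"


print(choose_technique_family(True, target_is_continuous=True))   # e.g. sales
print(choose_technique_family(True, target_is_continuous=False))  # e.g. default status
print(choose_technique_family(False))                             # no target variable
```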

Example 1 - Regression
Objective: Predict sales for every product-store combination for the next quarter


Step 1 - Identify if we have a target variable


• From the data, we can observe that we have a target variable – sales

Step 2 - Identify if the target variable is continuous or categorical (not valid if there is no target variable)
• The target variable sales is continuous. This means we should go with regression

Step 3 - Identify the independent features which can explain the target variable
• Discount, visitor count, store area, holiday status can influence sales

Step 4 - Make necessary transformations of data


• For example, the holiday status can be changed from Yes / No to 1 / 0 so the algorithm can understand it

Step 5 - Perform modeling based on data characteristics


• After the data is cleaned & pre-processed, we can choose a regression algorithm based on how the data is structured
• If we observe a linear relationship between the response (sales) and the other independent features, we can choose
linear regression
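
Steps 4 and 5 of this example might look as follows in a minimal form: the holiday flag is recoded from Yes / No to 1 / 0, and a Pearson correlation gives a first check on whether sales move linearly with one feature. The rows, column names, and numbers are hypothetical, and a correlation on a single feature is only a quick indication, not a full model.

```python
import math

# Hypothetical product-store rows; the values are invented for illustration.
rows = [
    {"discount": 0, "holiday": "No", "sales": 100.0},
    {"discount": 5, "holiday": "No", "sales": 110.0},
    {"discount": 10, "holiday": "Yes", "sales": 135.0},
    {"discount": 15, "holiday": "No", "sales": 130.0},
    {"discount": 20, "holiday": "Yes", "sales": 155.0},
]

# Step 4: change holiday status from Yes / No to 1 / 0.
HOLIDAY_CODES = {"Yes": 1, "No": 0}
for row in rows:
    row["holiday"] = HOLIDAY_CODES[row["holiday"]]


def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long lists."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Step 5: a correlation close to +1 or -1 suggests a linear relationship.
discounts = [row["discount"] for row in rows]
sales = [row["sales"] for row in rows]
correlation = pearson(discounts, sales)
print(f"correlation between discount and sales: {correlation:.2f}")
```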

Example 2 - Classification
Objective: Predict if a customer will default on loan payment in the next year


Step 1 - Identify if we have a target variable


• From the data, we can observe that we have a target variable - default status

Step 2 - Identify if the target variable is continuous or categorical (not valid if there is no target variable)
• The target variable default status is categorical (yes/no). This means we should go with classification

Step 3 - Identify the independent features which can explain the target variable
• Sex, Education, Income, previous default indicator, state of origin can influence the default behavior

Step 4 - Make necessary transformations of data


• For example, categorical data columns like sex, education, state can be changed to numerical values so the
algorithm can understand them better

Step 5 - Perform modeling based on data characteristics


• After the data is cleaned & pre-processed, we can choose a classification algorithm
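
One common way to carry out the transformation in Step 4 is one-hot encoding, sketched below with the standard library only. The customer records, column names, and category values are hypothetical, and other encodings (for example ordinal codes for education) may suit some columns better.

```python
def one_hot(rows, column):
    """Replace a categorical column with one 0/1 column per category."""
    categories = sorted({row[column] for row in rows})
    encoded = []
    for row in rows:
        new_row = {key: value for key, value in row.items() if key != column}
        for category in categories:
            new_row[f"{column}_{category}"] = 1 if row[column] == category else 0
        encoded.append(new_row)
    return encoded


# Hypothetical customer records.
customers = [
    {"education": "graduate", "state": "A", "income": 52000, "default": "no"},
    {"education": "school", "state": "B", "income": 31000, "default": "yes"},
    {"education": "graduate", "state": "B", "income": 47000, "default": "no"},
]

encoded = one_hot(one_hot(customers, "education"), "state")

# The yes/no target is mapped to 1/0 as well.
for row in encoded:
    row["default"] = 1 if row["default"] == "yes" else 0

print(encoded[0])
```

After this step every column is numeric, so a classification algorithm can consume the table directly.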

Course Deep Dive
Chapter 1 – Linear Regression

Exit to Demo Workbook


Chapter 2 – Logistic Regression

Exit to Demo Workbook


Chapter 3 - Clustering

Exit to Demo Workbook


Chapter 4 – Recommender Systems

Exit to Demo Workbook


Project Details post Machine Learning
Project De-Brief

Identify the level of income qualification needed for the families in Latin America.

Points to note:
• Many social programs have a hard time ensuring that the right people are given enough aid.
• The client believes that new ML methods, beyond traditional econometrics, might help improve the
model for this problem.
• The project involves these main tasks:

Phase           Task
EDA             Identify the output variable.
EDA             Understand the type of data.
EDA             Check if there are any biases in your dataset.
EDA             Check whether all members of the house have the same poverty level.
EDA             Check if there is a house without a family head.
EDA             Set poverty level of the members and the head of the house within a family.
EDA             Count how many null values exist in each column.
Data Cleaning   Remove null value rows of the target variable.
Modeling        Measure prediction accuracy using a random forest classifier and 2 other algorithms.
Modeling        Discuss parameter tuning and find the optimal parameters for each algorithm.
Modeling        Check the accuracy with cross validation.
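
Three of the tasks above (counting nulls, removing rows with a null target, and splitting the data for cross validation) can be sketched with the standard library alone. The column names and rows are hypothetical stand-ins for the project dataset, and the sketch does not train a random forest; in each fold a model would be fitted on the training indices and scored on the test indices.

```python
def count_nulls(rows):
    """Count missing (None) values per column."""
    counts = {}
    for row in rows:
        for column, value in row.items():
            counts[column] = counts.get(column, 0) + (1 if value is None else 0)
    return counts


def drop_null_target(rows, target):
    """Keep only the rows where the target variable is present."""
    return [row for row in rows if row[target] is not None]


def k_fold_indices(n_rows, k):
    """Yield (train_indices, test_indices) for each of k consecutive folds."""
    fold_sizes = [n_rows // k + (1 if i < n_rows % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = [i for i in range(n_rows) if i < start or i >= start + size]
        yield train, test
        start += size


# Hypothetical household rows; None marks a missing value.
rows = [
    {"rooms": 3, "poverty_level": 1},
    {"rooms": None, "poverty_level": 2},
    {"rooms": 4, "poverty_level": None},
    {"rooms": 2, "poverty_level": 1},
]

print(count_nulls(rows))
clean_rows = drop_null_target(rows, "poverty_level")
for train, test in k_fold_indices(len(clean_rows), k=3):
    print("train:", train, "test:", test)
```

Averaging the per-fold accuracy gives the cross-validated estimate asked for in the last task.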
