Unit 1: Introduction to Machine Learning
Prepared By:
Ridhdhi Naik
Outline
●Introduction
●Overview of Human Learning
●Overview of Machine Learning
●Types of Machine Learning
●Applications of Machine Learning
●Tools & Technologies
Human Learning
●Learning is typically referred to as the process of gaining information through observation.
●To do a task in a proper way, we need prior information on one or more things related to the task.
●Also, as we keep learning more, or in other words acquiring more information, our efficiency in doing the task keeps improving.
Types of Human Learning
●Learning directly from an expert: somebody who is an expert in the subject directly teaches us.
●Knowledge gained from an expert: we build our own notion indirectly, based on what we have learnt from the expert in the past.
●Self-learning: we do it ourselves, maybe after multiple attempts, some of them unsuccessful.
Machine Learning
‘A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E.’
Tom M. Mitchell,
Professor, Machine Learning Department,
School of Computer Science,
Carnegie Mellon University
How do machines learn?
●The basic machine learning process can be divided into three parts:
●Data Input: past data or information is used as a basis for future decision-making.
●Abstraction: the input data is represented in a broader way through the underlying algorithm.
●Generalization: the abstracted representation is generalized to form a framework for making decisions.
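The three parts above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the method from these slides: the past data is a hypothetical list of (features, label) pairs, abstraction summarises each class as its mean point (a nearest-centroid model), and generalization decides on a new input by picking the closest mean. The function names `abstract` and `generalize` are hypothetical.

```python
def abstract(training_data):
    """Abstraction: summarise the labelled examples as one mean point (centroid) per class."""
    sums, counts = {}, {}
    for features, label in training_data:
        if label not in sums:
            sums[label] = [0.0] * len(features)
            counts[label] = 0
        sums[label] = [s + f for s, f in zip(sums[label], features)]
        counts[label] += 1
    return {label: [s / counts[label] for s in sums[label]] for label in sums}

def generalize(model, features):
    """Generalization: decide on an unseen input by picking the nearest class centroid."""
    def squared_distance(centroid):
        return sum((c - f) ** 2 for c, f in zip(centroid, features))
    return min(model, key=lambda label: squared_distance(model[label]))

# Data input: hypothetical past observations (hours studied, classes attended) -> outcome.
past_data = [
    ([2.0, 3.0], "fail"),
    ([3.0, 2.0], "fail"),
    ([8.0, 9.0], "pass"),
    ([9.0, 8.0], "pass"),
]
model = abstract(past_data)
print(model)                          # {'fail': [2.5, 2.5], 'pass': [8.5, 8.5]}
print(generalize(model, [7.0, 7.5]))  # pass
```

The abstraction step throws away the individual records and keeps only a compact summary; generalization then works entirely from that summary, which is why it can handle inputs it has never seen.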
Well-posed learning problem
Step 1: What is the problem? (Task T)
●Describe the problem informally and formally, and list assumptions and similar problems.
Step 2: Why does the problem need to be solved? (Performance Measure P)
●List the motivation for solving the problem, the benefits that the solution will provide, and how the solution will be used.
Step 3: How would I solve the problem? (Experience E)
●Describe how the problem would be solved manually, to flush out domain knowledge.
Well-posed learning problem
Step 1: What is the Problem?
●A range of information should be collected to understand what the problem is. Start with an informal description of the problem, e.g. "I need a program that will prompt the next word as and when I type a word."
Formalism:
●Use Tom Mitchell's machine learning formalism stated above to define T, P, and E for the problem.
●For example:
●Task (T): Prompt the next word when I type a word.
●Experience (E): A corpus of commonly used English words and phrases.
●Performance (P): The number of correct words prompted, considered as a percentage (which in the machine learning paradigm is known as learning accuracy).
Assumptions:
●Create a list of assumptions about the problem.
Similar problems:
●What other problems have you seen, or can you think of, that are similar to the problem you are trying to solve?
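The next-word example can be made concrete with a tiny bigram model, sketched below in Python under assumptions that are not in the slides: a handful of made-up phrases stand in for the corpus (experience E), the task T is answered by suggesting the word that most often followed the typed word, and performance P is the percentage of correct suggestions on a few hypothetical test pairs. All function names are illustrative.

```python
from collections import Counter, defaultdict

def train_bigram_model(corpus):
    """Experience E: count which word follows which in the training text."""
    follows = defaultdict(Counter)
    for sentence in corpus:
        words = sentence.lower().split()
        for current, nxt in zip(words, words[1:]):
            follows[current][nxt] += 1
    return follows

def prompt_next_word(model, word):
    """Task T: suggest the most frequent follower of `word`, or None if it was never seen."""
    followers = model.get(word.lower())
    if not followers:
        return None
    return followers.most_common(1)[0][0]

def accuracy(model, test_pairs):
    """Performance P: percentage of correctly prompted next words."""
    correct = sum(prompt_next_word(model, w) == expected for w, expected in test_pairs)
    return 100.0 * correct / len(test_pairs)

# Hypothetical corpus standing in for "commonly used English words and phrases".
corpus = [
    "thank you very much",
    "thank you for coming",
    "see you soon",
    "see you tomorrow",
    "see you soon",
]
model = train_bigram_model(corpus)
test_pairs = [("thank", "you"), ("see", "you"), ("you", "soon"), ("good", "morning")]
print(prompt_next_word(model, "thank"))  # you
print(accuracy(model, test_pairs))       # 75.0
```

Adding more phrases to the corpus (more experience E) gives the model more word pairs to count, which is how its accuracy P on the task T would be expected to improve.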
Well-posed learning problem
Step 2: Why does the problem need to be solved?
Motivation:
●What is the motivation for solving the problem? What requirement will it fulfil?
●For example, does this problem solve a longstanding business issue, like finding potentially fraudulent transactions? Or is the purpose more trivial, like suggesting some movies for the upcoming weekend?
Solution benefits:
●Consider the benefits of solving the problem. What capabilities does it enable?
●It is important to clearly understand the benefits of solving the problem. These benefits can be articulated to sell the project.
Solution use:
●How will the solution to the problem be used, and what lifetime is the solution expected to have?
Well-posed learning problem
Step 3: How would I solve the problem?
●Try to explore how to solve the problem manually.
●Detail out step-by-step data collection, data preparation, and program design to solve the problem.
●Collect all these details and update the previous sections of the problem definition, especially the assumptions.
Types of Machine Learning
●Machine learning can be classified into three broad categories:
1. Supervised learning:
●Also called predictive learning. A machine predicts the class of unknown objects based on prior class-related information of similar objects.
2. Unsupervised learning:
●Also called descriptive learning. A machine finds patterns in unknown objects by grouping similar objects together.
3. Reinforcement learning:
●A machine learns to act on its own to achieve the given goals.
Supervised Learning
For example:
●Predicting the results of a game
●Predicting whether a tumour is malignant or benign
●Predicting prices in domains like real estate, stocks, etc.
●Classifying texts, such as classifying a set of emails as spam or non-spam
Supervised Learning: Classification
●Classification is a type of supervised learning where a target feature, which is categorical, is predicted for test data based on the information imparted by training data.
●The target categorical feature is known as the class.
Some typical classification problems include:
●Image classification
●Prediction of disease
●Win–loss prediction of games
●Prediction of natural calamities like earthquakes, floods, etc.
●Recognition of handwriting
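A minimal sketch of classification, assuming the k-nearest-neighbour (kNN) algorithm, one of the standard supervised algorithms. The two numeric features per email and all the training records are hypothetical; the class of a new record is taken as the majority class among its k closest training records (Euclidean distance).

```python
import math
from collections import Counter

def knn_classify(training_data, query, k=3):
    """Predict the class of `query` by majority vote among its k nearest training records."""
    by_distance = sorted(training_data, key=lambda record: math.dist(record[0], query))
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]

# Hypothetical training data: (number of links, number of spam keywords) -> class.
training_data = [
    ((0, 0), "not spam"),
    ((1, 0), "not spam"),
    ((0, 1), "not spam"),
    ((6, 5), "spam"),
    ((7, 6), "spam"),
    ((5, 7), "spam"),
]
print(knn_classify(training_data, (6, 6)))  # spam
print(knn_classify(training_data, (1, 1)))  # not spam
```

Choosing an odd k avoids tied votes when there are only two classes.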
Supervised Learning: Regression
●In linear regression, the objective is to predict numerical features like real estate or stock price, temperature, marks in an examination, sales revenue, etc.
●The underlying predictor variable and the target variable are continuous in nature.
●In linear regression, a straight-line relationship is 'fitted' between the predictor variables and the target variable, using the statistical concept of the least squares method.
●In the least squares method, the sum of the squares of the errors between the actual and predicted values of the target variable is minimized.
●In simple linear regression, there is only one predictor variable, whereas in multiple linear regression, multiple predictor variables can be included in the model.
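For multiple linear regression, the least squares fit can be computed with NumPy's `numpy.linalg.lstsq`. The sketch assumes NumPy is installed and uses made-up data generated from y = 1 + 2*x1 + 3*x2, so the fitted coefficients should come out close to 1, 2 and 3. The `predict` helper is hypothetical.

```python
import numpy as np

# Hypothetical data: two predictor variables and one target variable.
# The target was generated as y = 1 + 2*x1 + 3*x2, so the fit should recover those coefficients.
x1 = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
x2 = np.array([1.0, 0.0, 2.0, 1.0, 3.0])
y = np.array([4.0, 3.0, 11.0, 10.0, 18.0])

# Design matrix: a column of ones for the intercept, then one column per predictor.
X = np.column_stack([np.ones_like(x1), x1, x2])

# Least squares: find the coefficients that minimise the sum of squared errors ||X @ coef - y||^2.
coef, _, _, _ = np.linalg.lstsq(X, y, rcond=None)
intercept, b1, b2 = coef

def predict(new_x1, new_x2):
    """Predict the target for a new record using the fitted coefficients."""
    return intercept + b1 * new_x1 + b2 * new_x2

print(coef)  # values close to [1, 2, 3]
```

With a single predictor column instead of two, the same code performs simple linear regression.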
Supervised Learning: Regression
●Let's take the example of the yearly budgeting exercise of sales managers. They have to give a sales prediction for the next year based on the sales figures of previous years vis-à-vis the investment being put in.
●Obviously, the data related to the past as well as the data to be predicted are continuous in nature. In a basic approach, a simple linear regression model can be applied with investment as the predictor variable and sales revenue as the target variable.
Supervised Learning: Regression
●The figure shows a typical simple regression model, where the regression line is fitted based on values of the target variable with respect to different values of the predictor variable. A typical simple linear regression model can be represented in the form
$y = a + bx$
where 'x' is the predictor variable, 'y' is the target variable, 'a' is the intercept and 'b' is the slope of the regression line.
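For a simple linear regression model of the form y = a + bx, the least squares estimates have a well-known closed form: b = sum((x - mean_x) * (y - mean_y)) / sum((x - mean_x)^2) and a = mean_y - b * mean_x. The Python sketch below applies it to the sales example; the investment and sales figures are hypothetical, chosen so that sales = 5 + 2 × investment exactly.

```python
def fit_simple_linear_regression(xs, ys):
    """Return (a, b) for y = a + b*x using the least squares closed form."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope: covariance of x and y divided by the variance of x (the common 1/n factors cancel).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx
    # Intercept: the fitted line always passes through the point (mean_x, mean_y).
    a = mean_y - b * mean_x
    return a, b

# Hypothetical yearly figures (same currency unit): investment is the predictor, sales the target.
investment = [10, 20, 30, 40, 50]
sales = [25, 45, 65, 85, 105]

a, b = fit_simple_linear_regression(investment, sales)
print(a, b)          # 5.0 2.0
print(a + b * 60)    # 125.0, the predicted sales for a hypothetical investment of 60
```

On real data the points will not lie exactly on a line; the same formulas still give the line with the smallest sum of squared errors.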
Supervised Learning: Regression
Typical applications of regression can be seen in:
●Demand forecasting in retail
●Sales prediction for managers
●Price prediction in real estate
●Weather forecasting
●Skill demand forecasting in the job market
Unsupervised Learning
●Unlike supervised learning, in unsupervised learning there is no labelled training data to learn from and no prediction to be made.
●In unsupervised learning, the objective is to take a dataset as input and try to find natural groupings or patterns within the data elements or records.
●Therefore, unsupervised learning is often termed a descriptive model, and the process of unsupervised learning is referred to as pattern discovery or knowledge discovery.
Unsupervised Learning: Clustering
●Clustering is the main type of unsupervised learning. It intends to group or organize similar objects together.
●For that reason, objects belonging to the same cluster are quite similar to each other, while objects belonging to different clusters are quite dissimilar.
●Hence, the objective of clustering is to discover the intrinsic grouping of unlabelled data and form clusters, as depicted in the figure.
Unsupervised Learning: Clustering
●Different measures of similarity can be applied for clustering.
●One of the most commonly adopted similarity measures is distance.
●Two data items are considered part of the same cluster if the distance between them is small.
●In the same way, if the distance between the data items is large, the items generally do not belong to the same cluster.
●This is also known as distance-based clustering.
Figure 1.8 depicts the process of clustering at a high level.
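Distance-based clustering can be illustrated with k-means, one of the standard unsupervised algorithms. This is a minimal sketch assuming Euclidean distance as the similarity measure, two clusters, and hypothetical 2-D points; the initial centroids are simply taken from the data.

```python
import math

def k_means(points, initial_centroids, max_iterations=10):
    """Distance-based clustering: alternate between assigning points and moving centroids."""
    centroids = [tuple(c) for c in initial_centroids]
    for _ in range(max_iterations):
        # Assignment step: each point joins the cluster of its nearest centroid (Euclidean distance).
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its assigned points.
        new_centroids = [
            tuple(sum(coords) / len(members) for coords in zip(*members)) if members else c
            for c, members in zip(centroids, clusters)
        ]
        if new_centroids == centroids:  # no centroid moved, so the grouping is stable
            break
        centroids = new_centroids
    return centroids, clusters

# Hypothetical unlabelled 2-D data: two visibly separate groups of points.
points = [(1, 1), (1.5, 2), (1, 0.5), (8, 8), (9, 8.5), (8.5, 9)]
centroids, clusters = k_means(points, initial_centroids=[points[0], points[3]])
print(clusters)  # [[(1, 1), (1.5, 2), (1, 0.5)], [(8, 8), (9, 8.5), (8.5, 9)]]
```

Points that are close to each other end up in the same cluster and distant points end up in different clusters, which is exactly the distance-based rule described above.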
Unsupervised Learning: Association Analysis
●As a part of association analysis, the association between data elements is identified.
Market basket analysis:
●From past transaction data in a grocery store, it may be observed that most of the customers who have bought item A have also bought item B and item C, or at least one of them.
●This means that there is a strong association of the event 'purchase of item A' with the event 'purchase of item B' or 'purchase of item C'.
●Identifying these sorts of associations is the goal of association analysis.
●This helps in boosting the sales pipeline and is hence a critical input for the sales group.
●Critical applications of association analysis include market basket analysis and recommender systems.
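Association strength is commonly quantified with support and confidence. The sketch below computes both for hypothetical grocery baskets: support is the fraction of baskets containing an itemset, and the confidence of the rule A → B is the share of baskets containing A that also contain B. The transactions and helper names are made up for illustration; a full algorithm such as Apriori would additionally search over many candidate itemsets.

```python
# Hypothetical grocery transactions; each set is one customer's basket.
transactions = [
    {"A", "B", "C"},
    {"A", "B"},
    {"A", "C"},
    {"A", "B", "C"},
    {"B", "D"},
]

def count(itemset, transactions):
    """Number of baskets that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= basket for basket in transactions)

def support(itemset, transactions):
    """Fraction of baskets that contain every item in `itemset`."""
    return count(itemset, transactions) / len(transactions)

def confidence(antecedent, consequent, transactions):
    """Share of baskets containing `antecedent` that also contain `consequent`."""
    both = set(antecedent) | set(consequent)
    return count(both, transactions) / count(antecedent, transactions)

print(support({"A"}, transactions))            # 0.8
print(confidence({"A"}, {"B"}, transactions))  # 0.75
print(confidence({"A"}, {"C"}, transactions))  # 0.75
```

A high confidence for A → B means that a customer who buys item A is likely to buy item B as well, which is the kind of association a store can use for product placement or recommendations.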
Reinforcement Learning
●One contemporary example of reinforcement learning is self-driving cars.
●The critical information the car needs to take care of includes speed and the speed limit in different road segments, traffic conditions, road conditions, weather conditions, etc.
●The tasks that have to be taken care of are start/stop, accelerate/decelerate, turn left/right, etc.
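Self-driving cars are far beyond a short example, but the reward-driven learning loop can be sketched with tabular Q-learning, one of the standard reinforcement learning algorithms, on a hypothetical toy environment: a corridor of five positions where the agent is rewarded for reaching the last one and penalised slightly for every other move. The environment, the reward values and the hyperparameters (`alpha`, `gamma`, `epsilon`) are all assumptions chosen for illustration.

```python
import random

N_STATES = 5            # hypothetical corridor: positions 0..4
GOAL = N_STATES - 1     # reaching position 4 ends an episode
ACTIONS = (-1, +1)      # move left or move right

def step(state, action):
    """Hypothetical environment: return (next_state, reward, done)."""
    next_state = min(max(state + action, 0), GOAL)
    if next_state == GOAL:
        return next_state, 1.0, True   # reward for reaching the goal
    return next_state, -0.01, False    # small penalty for every other move

def q_learning(episodes=200, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # Epsilon-greedy: usually exploit the best-known action, occasionally explore.
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)
            else:
                action = max(ACTIONS, key=lambda a: q[(state, a)])
            next_state, reward, done = step(state, action)
            # Q-learning update: move Q(s, a) towards reward + discounted best future value.
            best_next = 0.0 if done else max(q[(next_state, a)] for a in ACTIONS)
            q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])
            state = next_state
    return q

q = q_learning()
policy = {s: max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(GOAL)}
print(policy)  # {0: 1, 1: 1, 2: 1, 3: 1}, i.e. always move right
```

No labelled "correct move" is ever given: the agent discovers that moving right is best purely from the rewards and penalties it receives, which is the defining feature of reinforcement learning.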
Difference
Supervised learning:
• Used when you know how to classify the given data; classes or labels are available.
• Labelled training data is needed; the model is built based on the training data.
• Model performance can be evaluated based on how many misclassifications have been made, by comparing predicted and actual values.
Unsupervised learning:
• Used when there is no idea about the class or label of a particular data item; the model has to find patterns in the data.
• Any unknown and unlabelled dataset is given to the model as input, and the records are grouped.
• It is difficult to measure whether the model did something useful or interesting; the homogeneity of the records grouped together is the only measure.
Reinforcement learning:
• Used when there is no idea about the class or label of a particular data item; the model has to do the classification. It gets rewarded if the classification is correct, else it gets punished.
• The model learns and updates itself through rewards/punishments.
• The model is evaluated by means of the reward function after it has had some time to learn.
Difference
Supervised learning:
• 2 types: classification and regression.
• Simplest one to understand.
• Standard algorithms: Naïve Bayes, kNN, Decision Tree, Linear regression, Logistic regression, SVM, etc.
• Applications: handwriting recognition, stock market prediction, disease prediction, fraud detection.
Unsupervised learning:
• 2 types: clustering and association.
• More difficult to understand and implement than supervised learning.
• Standard algorithms: K-means, PCA, SOM, Apriori algorithm, DBSCAN, etc.
• Applications: market basket analysis, recommender systems, customer segmentation.
Reinforcement learning:
• No such types.
• Most complex to understand and apply.
• Standard algorithms: Q-learning, Sarsa.
• Applications: self-driving cars, intelligent robots, AlphaGo Zero.
Applications of Machine Learning
Banking & Finance:
●Fraudulent transaction detection
●Using predictive learning, the set of vulnerable customers who may leave the bank very soon can be identified. Proper action can then be taken to make sure that these customers stay.
Insurance:
●Risk prediction during new customer onboarding
●Claims management
Healthcare:
●Predicting a person's health conditions in real time
●Disease prediction
Tools / Technologies
Python:
●Python has very strong libraries:
●Advanced mathematical functionality (NumPy),
●Algorithms and mathematical tools (SciPy), and
●Numerical plotting (matplotlib).
●Built on these libraries, there is a machine learning library named scikit-learn, which has various classification, regression, and clustering algorithms embedded in it.
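As a sketch of how scikit-learn packages these algorithm families behind a common `fit`/`predict` interface, the example below fits a regression model, a classifier and a clustering model on tiny hypothetical datasets. It assumes scikit-learn is installed; `LinearRegression`, `KNeighborsClassifier` and `KMeans` are real scikit-learn estimators, but the data, class names and parameter choices are illustrative only.

```python
from sklearn.cluster import KMeans
from sklearn.linear_model import LinearRegression
from sklearn.neighbors import KNeighborsClassifier

# Regression: hypothetical investment -> sales data (generated as sales = 5 + 2 * investment).
X_reg = [[10], [20], [30], [40], [50]]
y_reg = [25, 45, 65, 85, 105]
reg = LinearRegression().fit(X_reg, y_reg)
print(reg.intercept_, reg.coef_[0])  # close to 5 and 2

# Classification: hypothetical 2-feature points with made-up class labels.
X_clf = [[1, 1], [1, 2], [8, 8], [9, 8]]
y_clf = ["class A", "class A", "class B", "class B"]
clf = KNeighborsClassifier(n_neighbors=1).fit(X_clf, y_clf)
print(clf.predict([[2, 1], [8, 9]]))  # ['class A' 'class B']

# Clustering: the same points without labels; which group is numbered 0 or 1 is arbitrary.
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X_clf)
print(km.labels_)
```

Passing `n_init` explicitly avoids a default-change warning in some scikit-learn versions, and `random_state` makes the clustering result repeatable.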
Tools / Technologies
R Language:
●R is a very simple programming language with a huge set of libraries available for different stages of machine learning.
●Some of the libraries standing out in terms of popularity are:
●plyr/dplyr (for data transformation),
●caret ('Classification and Regression Training', for classification and regression),
●rJava (to facilitate integration with Java),
●tm (for text mining),
●ggplot2 (for data visualization).
●Other than the libraries, certain packages like Shiny and R Markdown have been developed around R to build interactive web applications, documents and dashboards in R without much effort.
Tools / Technologies
Matlab:
●MATLAB (matrix laboratory) is licensed commercial software with robust support for a wide range of numerical computing.
●MATLAB has a huge user base across industry and academia.
●MATLAB is developed by MathWorks, a company founded in 1984.
●Being proprietary software, MATLAB is developed much more professionally, tested rigorously, and has comprehensive documentation.
●MATLAB also provides extensive support for statistical functions and has a large number of built-in machine learning algorithms.
●It also has the ability to scale up to large datasets by parallel processing on clusters and the cloud.
Assignment
1. Explain the difference between supervised learning and unsupervised learning.
2. Explain the concept of penalty and reward in reinforcement learning.
3. What do you mean by a well-posed learning problem? Explain the important features that are required to define a learning problem well.
4. What is machine learning? Briefly explain the types of learning.
5. Draw and explain the flow diagram of the machine learning procedure.
6. List and explain the types of machine learning in brief.
7. What is regression? Define it with a regression line.
Thank You