Unsupervised learning
PRESENTED BY:
NAME :
ROLL NO :
CLASS : MBA
DIV :A
SUBJECT : ML with Python
GUIDE : Hema Darne
Introduction
• Unsupervised learning, also known as unsupervised machine learning, uses machine learning algorithms to analyze and cluster unlabeled datasets. These algorithms discover hidden patterns or data groupings without the need for human intervention. Their ability to discover similarities and differences in information makes them an ideal solution for exploratory data analysis, cross-selling strategies, customer segmentation, and image recognition.
History
• Horace Barlow (see Barlow, 1992), who sought ways of characterizing neural codes.
• David Marr (1970), who made an early unsupervised learning postulate about the goal of learning in his model of the neocortex.
• The Hebb rule (Hebb, 1949), which links statistical methods to neurophysiological experiments on plasticity.
• Geoffrey Hinton and Terrence Sejnowski, who invented a model of learning called the Boltzmann machine (1986).
Clustering
Clustering is a data mining technique which
groups unlabeled data based on their
similarities or differences. Clustering
algorithms are used to process raw,
unclassified data objects into groups
represented by structures or patterns in the
information. Clustering algorithms can be
categorized into a few types, specifically
exclusive, overlapping, hierarchical, and
probabilistic.
What is clustering good for
• Market segmentation - group customers into different market
segments
• Social network analysis - Facebook "smart lists"
• Organizing computer clusters and data centers for network layout
and location
• Astronomical data analysis - Understanding galaxy formation
Hierarchical clustering
• Hierarchical clustering, also known as hierarchical cluster analysis (HCA), is an unsupervised clustering algorithm that comes in two forms: agglomerative and divisive. Agglomerative clustering is considered a "bottom-up" approach: each data point starts as its own group, and groups are then merged iteratively on the basis of similarity until a single cluster remains. Four different methods are commonly used to measure similarity: Ward's linkage, average linkage, complete (maximum) linkage, and single (minimum) linkage.
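The agglomerative ("bottom-up") procedure described above can be sketched in plain Python. This is a minimal illustration using single (minimum) linkage; the function name and toy data are assumptions for the example, not a library API:

```python
def agglomerative(points, n_clusters):
    """Bottom-up (agglomerative) clustering: start with every point in
    its own cluster, then repeatedly merge the two closest clusters
    (single linkage: distance between the closest pair of members)."""
    clusters = [[p] for p in points]

    def linkage(a, b):
        # squared Euclidean distance between the closest members of a and b
        return min(sum((x - y) ** 2 for x, y in zip(p, q))
                   for p in a for q in b)

    while len(clusters) > n_clusters:
        # find the closest pair of clusters and merge them
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = linkage(clusters[i], clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] += clusters[j]
        del clusters[j]
    return clusters

# toy 1-D data: two tight groups and an outlier
groups = agglomerative([(1,), (2,), (10,), (11,), (50,)], n_clusters=3)
```

Running the merge loop until three clusters remain groups the nearby points together and leaves the outlier on its own.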
k-means clustering
• k-means clustering is a method of vector quantization, originally from signal processing, that aims to partition n observations into k clusters in which each observation belongs to the cluster with the nearest mean (cluster center or centroid), which serves as a prototype of the cluster.
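The assign-then-update loop at the heart of k-means can be sketched in pure Python. The helper name and sample points below are illustrative assumptions, not a real library call:

```python
import random

def kmeans(points, k, iters=100, seed=0):
    """Minimal k-means: assign each point to its nearest centroid, then
    move each centroid to the mean of its assigned points; repeat until
    the centroids stop changing."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)      # pick k data points as initial centroids
    clusters = [[] for _ in range(k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k),
                          key=lambda c: sum((a - b) ** 2
                                            for a, b in zip(p, centroids[c])))
            clusters[nearest].append(p)
        new = [tuple(sum(dim) / len(pts) for dim in zip(*pts)) if pts else c
               for c, pts in zip(centroids, clusters)]
        if new == centroids:               # converged
            break
        centroids = new
    return centroids, clusters

# two well-separated blobs in 2-D
pts = [(1, 1), (1.5, 2), (1, 1.5), (8, 8), (8.5, 9), (9, 8)]
centers, assignments = kmeans(pts, k=2)
```

On this toy data the loop converges in a few iterations, with one centroid near each blob. In practice a library such as scikit-learn's `KMeans` would be used instead.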
Applications of unsupervised learning in
multiple domains
• Unsupervised learning helps in a variety of ways and can be used to solve various real-world problems.
• It helps us understand patterns that can be used to cluster data points based on various features.
• It helps uncover defects in a dataset that we would not be able to detect initially.
• It helps in mapping various items based on their dependencies on each other.
• It helps in cleansing datasets by removing features that are not really required for the machine to learn from.
Example
• Amazon: Amazon uses unsupervised learning to learn customers' purchases and recommend products that are most frequently bought together, which is an example of association rule mining.
• Airbnb: This application helps host stays and experiences, connecting people all over the world. It uses unsupervised learning: the user queries his or her requirements, and Airbnb learns these patterns and recommends stays and experiences that fall under the same group or cluster.
• Credit-card fraud detection: Unsupervised learning algorithms learn the various patterns of a user's credit-card usage. If the card is used in a way that does not match this behaviour, an alarm is generated, the transaction can be flagged as possible fraud, and a call is made to confirm whether it was really you using the card.
• Dimensionality reduction: Dimensionality reduction is the process of reducing the number of random variables under consideration by obtaining a set of principal variables.
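As a sketch of dimensionality reduction, principal component analysis (PCA), a standard technique for finding such principal variables, can be written with NumPy. The `pca` helper and the sample data are assumptions made for this example, not a library API:

```python
import numpy as np

def pca(X, n_components):
    """Project X onto its top principal components -- the orthogonal
    directions of maximum variance -- reducing its dimensionality."""
    Xc = X - X.mean(axis=0)            # center each feature
    cov = np.cov(Xc, rowvar=False)     # feature covariance matrix
    vals, vecs = np.linalg.eigh(cov)   # eigenvalues in ascending order
    top = np.argsort(vals)[::-1][:n_components]
    return Xc @ vecs[:, top]

# 2-D sample data projected down to a single principal variable
X = np.array([[2.5, 2.4], [0.5, 0.7], [2.2, 2.9], [1.9, 2.2], [3.1, 3.0],
              [2.3, 2.7], [2.0, 1.6], [1.0, 1.1], [1.5, 1.6], [1.1, 0.9]])
Z = pca(X, n_components=1)
```

By construction the first principal component captures at least as much variance as any single original feature, which is why the two columns can be replaced by one with little information loss.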