
Machine Learning

(PC 603 IT)


Unit-3

Unsupervised Learning: Clustering, the k-Means Algorithm, Dealing with Noise, the k-Means Neural Network, Normalization, a Better Weight Update Rule, Using Competitive Learning for Clustering, Vector Quantization, the Self-Organizing Feature Map, the SOM Algorithm, Neighbourhood Connections, Self-Organization, Network Dimensionality and Boundary Conditions, Examples of Using the SOM.
Unsupervised Learning - Clustering
Introduction
• The aim of unsupervised learning is to find clusters of similar inputs in the data
without being explicitly told that these datapoints belong to one class and those
to a different class.
• Instead, the algorithm has to discover the similarities for itself.
• Unsupervised learning algorithms can be further categorized into two types:
Clustering and Association.
Clustering – The k-Means Algorithm
• Suppose that we want to divide our input data into k categories, where we know
the value of k.
• We allocate k cluster centers to our input space, and we would like to position
these centers so that there is one cluster center in the middle of each cluster.
• The k-Means algorithm is used to find the middle of each cluster.
• Usually the Euclidean distance is used as the distance metric.
• We can compute the central point of a set of datapoints using the mean average.
• We compute the mean point of each cluster, μc(i), and put the cluster center
there.
• This is equivalent to minimizing the Euclidean distance from each datapoint to its
cluster center.
Positioning the Cluster center
• We start by positioning the cluster centers randomly through the input space,
since we don’t know where to put them, and then we update their positions
according to the data.
• We decide which cluster each datapoint belongs to by computing the distance
between each datapoint and all of the cluster centers, and assigning it to the
cluster that is the closest.
• For all of the points that are assigned to a cluster, we then compute the mean of
them, and move the cluster center to that place.
• We iterate the algorithm until the cluster centers stop moving.
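• A minimal sketch of this loop in Python (assuming NumPy; the function name k_means and the stopping test are illustrative, not part of the lecture):

import numpy as np

def k_means(X, k, n_iters=100, seed=0):
    rng = np.random.default_rng(seed)
    # Start with k cluster centers placed at randomly chosen datapoints.
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iters):
        # Assign each datapoint to its closest center (Euclidean distance).
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Move each center to the mean of the points assigned to it.
        new_centers = np.array([X[labels == c].mean(axis=0) if np.any(labels == c)
                                else centers[c] for c in range(k)])
        if np.allclose(new_centers, centers):   # centers stopped moving
            break
        centers = new_centers
    return centers, labels

• A typical call would be centers, labels = k_means(X, k=3), where X is an array of shape (number of datapoints, number of features).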
Dealing with Noise
• Unfortunately, the mean average, which is central to the k-means algorithm, is
very susceptible to outliers, i.e., very noisy measurements.
• To avoid this problem, we can replace the mean average with the median, which is
what is known as a robust statistic, meaning that it is not affected by outliers (the
mean of (1, 2, 1, 2, 100) is 21.2, while the median is 2).
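• A quick check of this example with Python's standard statistics module:

from statistics import mean, median

data = [1, 2, 1, 2, 100]    # one very noisy measurement (100)
print(mean(data))           # 21.2 -- dragged towards the outlier
print(median(data))         # 2    -- unaffected by the outlier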
The k-Means Neural Network
• If we think about the cluster centers that we optimize the positions of as
locations in weight space, then we could position neurons in those places and use
neural network training.
• Now, the location of each neuron is its position in weight space which matches
the value of its weights.
• The k-Means algorithm is implemented using a set of neurons as follows:
• We will use just one layer of neurons, together with some input nodes, and no
bias node.
• The first layer will be the inputs, which don’t do any computation, as usual, and
the second layer will be a layer of competitive neurons, that is, neurons that
‘compete’ to fire, with only one of them actually succeeding.
The k-Means Neural Network – cont..
• Only one cluster center can represent a particular input vector, and so we will
choose the neuron with the highest activation h to be the one that fires.
• This is known as winner-takes-all activation, and it is an example of competitive
learning.
• The weight of the winning neuron is adjusted to be closer to the input, using the
following rule:

Δwij = η (xj − wij), where η is the learning rate.

• This has the effect of moving the weight wij directly towards the current input.
• Note that the only weights that are being updated are those of the winning unit.
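• A sketch of one such training step (assuming NumPy; weights is an array of shape (number of neurons, number of inputs) and eta is an assumed learning rate):

import numpy as np

def competitive_update(weights, x, eta=0.1):
    # Activations are the dot products between the input and each weight vector.
    activations = weights @ x
    winner = np.argmax(activations)          # only the winning neuron fires
    # Move the winner's weights directly towards the current input:
    # delta w_ij = eta * (x_j - w_ij)
    weights[winner] += eta * (x - weights[winner])
    return winner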
Normalization
• Suppose an input vector with values (0.2, 0.3, -0.1) is presented and it happens to
be an exact match for one of the neurons.
• Then, the activation of that neuron will be:
0.2*0.2 + 0.3*0.3 + (-0.1)*(-0.1) = 0.14
• However, consider a neuron with large weights (10, 9, 8). Its activation will be:
0.2*10 + 0.3*9 + (-0.1)*8 = 3.9
• This second neuron will be the winner. However, it and all of the other neurons are
not perfect matches, so their activations should all be lower.
• Thus, we can only compare activations if we know that the weights for all of the
neurons are the same size. We do this by normalizing all the weight vectors.
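• A short NumPy illustration of the problem and the fix, reproducing the numbers above (illustrative only):

import numpy as np

x = np.array([0.2, 0.3, -0.1])
w_match = np.array([0.2, 0.3, -0.1])    # an exact match for the input
w_large = np.array([10.0, 9.0, 8.0])    # a poor match, but with large weights

print(w_match @ x, w_large @ x)          # ~0.14 vs ~3.9 -- the wrong neuron wins

def normalize(w):
    # Scale a weight vector to unit length so activations can be compared fairly.
    return w / np.linalg.norm(w)

print(normalize(w_match) @ x, normalize(w_large) @ x)   # ~0.37 vs ~0.25 -- the match wins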
Using competitive learning for clustering
• Deciding which cluster each new datapoint belongs to is now an easy task.
• We present it to the trained network and observe which neuron (i.e., cluster
center) is activated.
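• As a sketch (using the same NumPy weight array as above), this is just an argmax over the activations:

import numpy as np

def predict_cluster(weights, x):
    # The most strongly activated neuron gives the cluster label for x.
    return int(np.argmax(weights @ x))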
Vector Quantization
• Consider the example of Data Communication.
• We need to reduce the amount of data transmitted in order to keep the
transmission cost to a minimum.
• Instead of sending whole datapoints, we can encode the data and send only the
indices of the datapoints.
• The codebook can be shared with the receiver so that the data can be decoded.
• The codebook will not contain every possible datapoint. Now, if we want to send
a datapoint which is not in the codebook, the index of the prototype vector which
is closest to it is sent. This is known as Vector Quantization. This same idea is used
in lossy compression.
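• A minimal sketch of this encode/decode idea (assuming NumPy and an already-chosen codebook of prototype vectors; the names are illustrative):

import numpy as np

def encode(X, codebook):
    # For each datapoint, send only the index of the nearest prototype vector.
    dists = np.linalg.norm(X[:, None, :] - codebook[None, :, :], axis=2)
    return dists.argmin(axis=1)

def decode(indices, codebook):
    # The receiver looks the indices up in the shared codebook.
    return codebook[indices]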
Vector Quantization (cont..)
• We need to accept that the received data will not look exactly the same as the
original data.
• In a Voronoi tessellation of space, the dots at the centre of each cell are the
prototype vectors and any datapoint that lies within a cell is represented by the
dot.
Vector Quantization (cont..)
• The question is how to choose the prototype vectors.
• We need to choose prototype vectors that are as close as possible to all of the
possible inputs that we might see.
• The self-organizing feature map is used to solve this problem.
Self-Organizing Feature Maps
• The SOM is a neural network in which the relative locations of the neurons in the
network matter. This property is known as feature mapping, whereby nearby
neurons correspond to similar input patterns.
• The neurons are arranged in a grid with connections between the neurons, rather
than in layers with connections only between the different layers.
• The SOM demonstrates relative ordering preservation, which is sometimes
known as topology preservation.
• The relative ordering of the inputs should be preserved by the ordering in the
neurons, so that neurons that are close together represent inputs that are close
together, while neurons that are far apart represent inputs that are far apart.
Self-Organizing Feature Maps (cont..)
• The winning neuron should pull other neurons that are close to it in the network
closer to itself in weight space, which means that we need positive connections.
• Likewise, neurons that are further away should represent different features, and
so should be a long way off in weight space, so the winning neuron ‘repels’ them,
by using negative connections to push them away.
• Neurons that are very far away in the network should already represent different
features, so we just ignore them.
• This is known as the ‘Mexican Hat’ form of lateral connections.
Neighbourhood Connections
• If we start our network off with random weights, then at the beginning of
learning, the network is unordered.
• As the weights are random, two nodes that are very close in weight space could
be on opposite sides of the map and vice versa.
• Therefore, at the beginning of the algorithm, the neighborhood size should be
large.
• Once the network has been learning for a while, the algorithm starts to fine-tune
the individual local regions of the network. At this stage, the neighborhood
should be small.
• These two phases of learning are also known as ordering and convergence.
Ordering
• Initially, similar input vectors excite neurons that are far apart, so that the
neighborhood (shown as a circle) needs to be large.
Convergence
• Later on during training the neighborhood can be smaller, because similar input
vectors excite neurons that are close together.
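• A compact sketch of the SOM training loop described above (assuming NumPy; a 1D line of neurons is used to keep the example short, and a Gaussian neighbourhood that shrinks over time stands in for the Mexican-hat interactions; all names are illustrative):

import numpy as np

def train_som(X, n_neurons=20, n_epochs=100, eta0=0.5, sigma0=5.0, seed=0):
    rng = np.random.default_rng(seed)
    weights = rng.random((n_neurons, X.shape[1]))    # random initial weights
    positions = np.arange(n_neurons)                 # neuron positions on the map
    for t in range(n_epochs):
        # Learning rate and neighbourhood size both decay: ordering, then convergence.
        eta = eta0 * np.exp(-t / n_epochs)
        sigma = sigma0 * np.exp(-t / n_epochs)
        for x in X[rng.permutation(len(X))]:
            # The winner is the neuron whose weights are closest to the input.
            winner = np.argmin(np.linalg.norm(weights - x, axis=1))
            # Neurons close to the winner on the map are pulled towards the input;
            # neurons far away on the map are barely affected.
            influence = np.exp(-(positions - winner) ** 2 / (2 * sigma ** 2))
            weights += eta * influence[:, None] * (x - weights)
    return weights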
Self-Organization
• A particularly interesting aspect of feature mapping is that we get a global
ordering of the neurons in the network, despite the fact that the interactions are
all local, since neurons that are very far apart do not interact with each other.
• We thus get a global ordering of the space using only a set of local interactions.
This is known as self-organization.
• Consider a flock of birds flying in formation. The birds cannot possibly know
exactly where all of the others are, so how do they keep in formation?
• If each bird just tries to stay diagonally behind the bird to its right, and fly at the
same speed, then they form perfect flocks, no matter how they start off and what
objects are placed in their way.
• So the global ordering of the whole flock can arise from the local interactions of
each bird looking to the one on its right (or left).
Network Dimensionality and Boundary Conditions
• The SOM algorithm is usually applied to a 2D rectangular array of neurons.
• There are cases where a line of neurons (1D) works better, or where three
dimensions are needed. It depends on the dimensionality of the inputs.
• We also need to consider the boundaries of the network. For example, if we are
arranging sounds from low pitch to high pitch, then the lowest and highest
pitches we can hear are obvious endpoints.
• However, it is not always the case that such boundary conditions are clearly
defined. In this case, we might want to remove the boundary conditions.
Circular boundary conditions
• Using circular boundary conditions in 1D turns a line into a circle.
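• For example, on a 1D map of n neurons with circular boundary conditions, the distance between two neurons wraps around the ends (a small illustrative helper):

def circular_distance(i, j, n):
    # On a ring of n neurons, neuron 0 and neuron n-1 are neighbours,
    # rather than sitting at opposite ends of a line.
    d = abs(i - j)
    return min(d, n - d)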
End of Unit-3
