ML Clustering

- The document describes the k-means clustering algorithm which groups together data points into k clusters.
- It involves randomly selecting k data points as initial cluster centers, assigning each remaining point to the closest center, then recalculating the centers as the means of the points in each cluster.
- This process repeats iteratively until the cluster centers stabilize and no longer change with additional iterations.

clustering

- group together data

- divide data into clusters

- kmeans

- dbscan

--------------------------
row = record = tuple = observation = instance = datapoint

col1  col2  col3   group      point

12    24    20     G2         Dp1
14    10    20     G1         Dp2
17    18    25     G2         Dp3
17    24    10*    Cr1 (G1)   Dp4
11    10    10     G1
12    28    15     G1
18    24    10     G2
12    31    10     G2
15    12    35*    Cr2 (G2)
14    23    30     G1
19    50    10     G2
11    21    15     G1

* = row picked as an initial center

k = 2
- divide above records into k groups

1. pick any k rows as centers
   randomly - Cr1, Cr2
2. find dist from Dp1 to Cr1
   and from Dp1 to Cr2

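d1 = (12-17)**2 + (24-24)**2 + (20-10)**2 = 25 + 0 + 100 = 125

squared dist, Dp1 to Cr1 (G1)

d2 = (12-15)**2 + (24-12)**2 + (20-35)**2 = 9 + 144 + 225 = 378

squared dist, Dp1 to Cr2 (G2)

d1 < d2, so by these numbers Dp1 is closer to Cr1
(the G1/G2 labels written in the tables are illustrative and do not all follow from the distances)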
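
The two sums in step 2 can be checked with a few lines of Python. This is only a sketch: the helper name `squared_distance` is made up for illustration, and the points are the Dp1, Cr1 and Cr2 rows from the table.

```python
def squared_distance(p, q):
    """Sum of squared coordinate differences between two equal-length points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

dp1 = (12, 24, 20)
cr1 = (17, 24, 10)  # row picked as the center of G1
cr2 = (15, 12, 35)  # row picked as the center of G2

d1 = squared_distance(dp1, cr1)
d2 = squared_distance(dp1, cr2)
print("Dp1 to Cr1:", d1)
print("Dp1 to Cr2:", d2)
```

No square root is taken here, matching the d1 and d2 expressions in the notes.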

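distance in 1 dimension: |5 - 9| = 4

distance in 2 dimensions:

4,5      7,9
x1,y1    x2,y2

sqrt( (x1-x2)**2 + (y1-y2)**2 )

if a > b is true (a, b not negative),
a**2 > b**2 is also true

sqrt(5) > sqrt(3)

5 > 3

5**2 > 3**2

so the sqrt can be skipped when only comparing distances

distance in 3 dimensions:

x1,y1,z1    x2,y2,z2

sqrt( (x1-x2)**2 + (y1-y2)**2 + (z1-z2)**2 )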
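
A small Python check of the two ideas above: the distance formula itself, and the fact that comparing squared distances gives the same ranking as comparing true distances. It assumes Python 3.8+ for `math.dist`; `squared_distance` is an illustrative helper name, and the three sample points are the first three rows of the table.

```python
import math

def squared_distance(p, q):
    """Euclidean distance without the final square root."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

# 2-D example from the notes: (4,5) and (7,9)
dist_2d = math.dist((4, 5), (7, 9))  # sqrt(3**2 + 4**2)
print(dist_2d)

# Ranking points by distance to a center: with or without sqrt, same order
center = (17, 24, 10)
points = [(12, 24, 20), (14, 10, 20), (17, 18, 25)]
by_euclid = sorted(points, key=lambda pt: math.dist(pt, center))
by_squared = sorted(points, key=lambda pt: squared_distance(pt, center))
same_order = by_euclid == by_squared
print(same_order)
```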

3. assign Dp1 to the group whose center
   is closest (distance is lowest)
4. repeat steps 2 and 3
   for all datapoints
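
Steps 2 to 4 could be sketched in Python as below. The names `squared_distance` and `nearest_group` are illustrative, not from the notes. The labels this produces come purely from the distances to Cr1 and Cr2, so they may differ from the hand-written G1/G2 labels in the tables.

```python
rows = [
    (12, 24, 20), (14, 10, 20), (17, 18, 25), (17, 24, 10),
    (11, 10, 10), (12, 28, 15), (18, 24, 10), (12, 31, 10),
    (15, 12, 35), (14, 23, 30), (19, 50, 10), (11, 21, 15),
]
centers = {"G1": (17, 24, 10), "G2": (15, 12, 35)}  # Cr1, Cr2

def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))

def nearest_group(point, centers):
    """Return the name of the group whose center is closest to point."""
    return min(centers, key=lambda g: squared_distance(point, centers[g]))

# steps 2 and 3 repeated for every datapoint (step 4)
labels = [nearest_group(r, centers) for r in rows]
for row, label in zip(rows, labels):
    print(row, label)
```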

col1  col2  col3   group      note

11    21    15     G1
14    10    20     G1         Dp2
17    24    10*    Cr1 (G1)   Dp4
11    10    10     G1
12    28    15     G1         (NewCr1)
14    23    30     G1

15    12    35*    Cr2 (G2)
19    50    10     G2
12    24    20     G2         Dp1
17    18    25     G2         Dp3
18    24    10     G2
12    31    10     G2         (NewCr2)

(18-12)**2 + (24-31)**2 + (10-11)**2

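5. find the new center of each group
   by doing mean/avg operation on each group

for G1 => new center is

(11+14+17+11+12+14) / 6 = 79 / 6  = 13.17
(21+10+24+10+28+23) / 6 = 116 / 6 = 19.33
(15+20+10+10+15+30) / 6 = 100 / 6 = 16.67

for G2 => new center is

(15+19+12+17+18+12) / 6 = 93 / 6  = 15.50
(12+50+24+18+24+31) / 6 = 159 / 6 = 26.50
(35+10+20+25+10+10) / 6 = 110 / 6 = 18.33

the new center is a mean, so it need not be one of the rows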
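
The averaging in step 5 could be sketched as follows. The function name `centroid` is illustrative, and the G1 and G2 rows are taken as listed in the grouped table, which is an assumption about which rows belong to each group.

```python
g1 = [(11, 21, 15), (14, 10, 20), (17, 24, 10),
      (11, 10, 10), (12, 28, 15), (14, 23, 30)]
g2 = [(15, 12, 35), (19, 50, 10), (12, 24, 20),
      (17, 18, 25), (18, 24, 10), (12, 31, 10)]

def centroid(points):
    """Column-wise mean of a non-empty list of equal-length points."""
    n = len(points)
    return tuple(sum(col) / n for col in zip(*points))

new_cr1 = centroid(g1)
new_cr2 = centroid(g2)
print([round(v, 2) for v in new_cr1])
print([round(v, 2) for v in new_cr2])
```

`zip(*points)` turns the list of rows into columns, so each column is averaged on its own.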

6. repeat steps 2,3,4,5
   again and again
   until centers are not changing
   and datapoints are not changing groups
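
Putting steps 1 to 6 together, a minimal pure-Python sketch might look like this. It is one possible reading of the notes, not a reference implementation: the function names, the `seed` and `max_iter` parameters, and the choice to keep the old center when a group ends up empty are all assumptions added for illustration.

```python
import random

def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))

def centroid(points):
    n = len(points)
    return tuple(sum(col) / n for col in zip(*points))

def kmeans(rows, k, seed=0, max_iter=100):
    rng = random.Random(seed)
    centers = rng.sample(rows, k)  # step 1: pick k rows at random
    labels = None
    for _ in range(max_iter):
        # steps 2-4: assign every row to the index of its nearest center
        new_labels = [
            min(range(k), key=lambda j: squared_distance(r, centers[j]))
            for r in rows
        ]
        # step 5: new center of each group = mean of its rows
        new_centers = []
        for j in range(k):
            members = [r for r, lab in zip(rows, new_labels) if lab == j]
            # assumption: an empty group keeps its previous center
            new_centers.append(centroid(members) if members else centers[j])
        # step 6: stop when neither labels nor centers change
        if new_labels == labels and new_centers == centers:
            break
        labels, centers = new_labels, new_centers
    return centers, labels

rows = [
    (12, 24, 20), (14, 10, 20), (17, 18, 25), (17, 24, 10),
    (11, 10, 10), (12, 28, 15), (18, 24, 10), (12, 31, 10),
    (15, 12, 35), (14, 23, 30), (19, 50, 10), (11, 21, 15),
]
centers, labels = kmeans(rows, k=2)
print(centers)
print(labels)
```

The result depends on which rows are picked in step 1, so a different `seed` can give different groups.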

=============== ============== K-MEANS ============== =================

N = 100 data points

d1 d2 d3 d4 d5 ... d100

k = 3 (number of clusters)
pick k centroids at random:

d1 d2 d3

d4 to d1 5
d4 to d2 7
d4 to d3 3

d4 belongs to d3

d5 to d1 3
d5 to d2 7
d5 to d3 5

d5 belongs to d1
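
The "belongs to" decision is just picking the centroid with the smallest distance. A tiny Python sketch, using the distances listed above (the dictionary layout and variable names are illustrative):

```python
# distances from each point to the three starting centroids
dist_d4 = {"d1": 5, "d2": 7, "d3": 3}
dist_d5 = {"d1": 3, "d2": 7, "d3": 5}

# min over the keys, compared by their distance values
nearest_d4 = min(dist_d4, key=dist_d4.get)
nearest_d5 = min(dist_d5, key=dist_d5.get)
print("d4 belongs to", nearest_d4)
print("d5 belongs to", nearest_d5)
```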

d6

...

d100
-----------------------------------

c1              c2              c3
d1 d3* d9 ..    d2 d5 d7 ..     d4 d6 d8 ...

calc mean of c1 data points -> new c1 center
(same for c2 and c3, then reassign and repeat)
