2024 Fo Cluster 4

Chapter 7 of 'Data Mining: Concepts and Techniques' discusses cluster analysis, focusing on density-based clustering methods such as DBSCAN, OPTICS, and DENCLUE. It explains the fundamental concepts of density-reachability and density-connectivity, and how these methods can discover clusters of arbitrary shapes while handling noise. The chapter also highlights the algorithms and parameters essential for effective clustering in spatial databases.

Data Mining: Concepts and Techniques, Chapter 7
Jiawei Han
Department of Computer Science, University of Illinois at Urbana-Champaign
[Link], [Link]/~hanj
©2006 Jiawei Han and Micheline Kamber. All rights reserved.

Cluster Analysis
1. What is Cluster Analysis?
2. Types of Data in Cluster Analysis
3. A Categorization of Major Clustering Methods
4. Partitioning Methods
5. Hierarchical Methods
6. [truncated in source]

Density-Based Clustering Methods
- Clustering based on density (a local cluster criterion), such as density-connected points
- Major features:
  - Discover clusters of arbitrary shape
  - Handle noise
  - One scan
  - Need density parameters as termination condition
- Several interesting studies:
  - DBSCAN: Ester et al. (1996)
  - OPTICS: Ankerst et al. (1999)
  - DENCLUE: Hinneburg & Keim (1998)

Density-Based Clustering: Basic Concepts
- Two parameters:
  - Eps: maximum radius of the neighborhood
  - MinPts: minimum number of points in an Eps-neighborhood of that point
- $N_{Eps}(p) = \{ q \in D \mid d(p, q) \le Eps \}$
- Directly density-reachable: a point p is directly density-reachable from a point q w.r.t. Eps, MinPts if
  - p belongs to $N_{Eps}(q)$, and
  - q satisfies the core point condition: $|N_{Eps}(q)| \ge MinPts$
- [Figure: points p and q, with MinPts = 5 and Eps = 1 cm]

Density-Based Clustering: Basic Concepts (continued)
- Density-reachable: a point p is density-reachable from a point q w.r.t. Eps, MinPts if there is a chain of points $p_1, \ldots, p_n$ with $p_1 = q$ and $p_n = p$ such that $p_{i+1}$ is directly density-reachable from $p_i$
- Density-connected: a point p is density-connected to a point q w.r.t. Eps, MinPts if there is a point o such that both p and q are density-reachable from o w.r.t. Eps and MinPts

Explanation on whiteboard
[Handwritten whiteboard notes; not recoverable from the scan]
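The basic concepts above (Eps-neighborhood, core point condition, direct density-reachability, density-reachability) can be sketched in a few lines of Python. This is a minimal illustration, not code from the slides: the function names are hypothetical, points are assumed to be coordinate tuples, and d(p, q) is assumed to be Euclidean distance.

```python
from collections import deque
from math import dist  # Euclidean distance (assumed choice of d)


def eps_neighborhood(points, p, eps):
    """N_Eps(p): indices q with d(p, q) <= eps. Note that p itself is included."""
    return [q for q in range(len(points)) if dist(points[p], points[q]) <= eps]


def is_core(points, q, eps, min_pts):
    """Core point condition: |N_Eps(q)| >= MinPts."""
    return len(eps_neighborhood(points, q, eps)) >= min_pts


def directly_density_reachable(points, p, q, eps, min_pts):
    """True if p is directly density-reachable from q."""
    return p in eps_neighborhood(points, q, eps) and is_core(points, q, eps, min_pts)


def density_reachable_from(points, q, eps, min_pts):
    """Indices reachable from q by a chain of direct density-reachability steps.

    Only core points are expanded, so border points end a chain.
    Returns an empty set when q is not a core point.
    """
    reached, expanded = set(), set()
    queue = deque([q])
    while queue:
        cur = queue.popleft()
        if cur in expanded or not is_core(points, cur, eps, min_pts):
            continue
        expanded.add(cur)
        for n in eps_neighborhood(points, cur, eps):
            if n not in reached:
                reached.add(n)
                queue.append(n)
    return reached


# Hypothetical toy data: four points on a line plus one far-away point.
points = [(0, 0), (1, 0), (2, 0), (3, 0), (10, 0)]
from_core = density_reachable_from(points, 1, eps=1, min_pts=3)
from_border = density_reachable_from(points, 0, eps=1, min_pts=3)
```

The toy data shows that direct density-reachability is not symmetric: a border point can be reached from a core point, but not the other way around. Two border points reached from the same core point are density-connected even though neither is density-reachable from the other.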
DBSCAN: Density Based Spatial Clustering of Applications with Noise
- Relies on a density-based notion of cluster: a cluster is defined as a maximal set of density-connected points
- Discovers clusters of arbitrary shape in spatial databases with noise
- [Figure: example points including an outlier; Eps = 1 cm, MinPts = 5]

DBSCAN: The Algorithm
- Arbitrarily select a point p
- Retrieve all points density-reachable from p w.r.t. Eps and MinPts
- If p is a core point, a cluster is formed containing p and all the points density-reachable from p. Mark these points as processed.
- Mark p as processed
- Continue this process until all of the points have been processed

DBSCAN: Sensitive to Parameters
[Figure 8: DBSCAN results for a data set (name illegible) with MinPts at 4 and Eps at (a) 0.5 and (b) 0.4]
[Figure 9: DBSCAN results for DS2 with MinPts at 4 and Eps at (a) 5.0, (b) 3.5, and (c) 3.0]

OPTICS: A Cluster-Ordering Method
- OPTICS: Ordering Points To Identify the Clustering Structure
  - Ankerst, Breunig, Kriegel, and Sander (1999)
  - Produces a special order of the database w.r.t. its density-based clustering structure
  - This cluster-ordering contains information equivalent to the density-based clusterings corresponding to a broad range of parameter settings
  - Good for both automatic and interactive cluster analysis, including finding intrinsic clustering structure
  - Can be represented graphically or using visualization techniques

OPTICS: Basic Concepts
- Core distance of p w.r.t. MinPts: the smallest distance eps' between p and an object in its eps-neighborhood such that p would be a core object for eps' and MinPts. Otherwise, undefined.
- Reachability distance of p w.r.t. o: max(core-distance(o), d(o, p)) if o is a core object.
  Undefined otherwise.
- [Figure: example with MinPts = 5 and eps = 3 cm; r(p1, o) = 1.5 cm, r(p2, o) = 4 cm]

OPTICS
(1) Select a non-processed object o
(2) Find neighbors (eps-neighborhood)
  - Compute the core distance for o
  - Write object o to the ordered file and mark o as processed
  - If o is not a core object, restart at (1)
  - (o is a core object ...)
  - Put the neighbors of o in SeedList and order them:
    - If neighbor n is not yet in SeedList, then add (n, reachability from o)
    - Else, if reachability from o < current reachability, then update the reachability and re-order SeedList w.r.t. reachability
  - Take the new object from SeedList with the smallest reachability and restart at (2)

Example on whiteboard
[Handwritten whiteboard notes; not recoverable from the scan]
[Figure: reachability plot; vertical axis: reachability-distance (with an "undefined" level); horizontal axis: cluster-order of the objects]

DENCLUE: Using Statistical Density Functions
- DENsity-based CLUstEring by Hinneburg & Keim (1998)
- Using statistical density functions
- Major features:
  - Solid mathematical foundation
  - Good for data sets with large amounts of noise
  - Allows a compact mathematical description of arbitrarily shaped clusters in high-dimensional data sets
  - Significantly faster than DBSCAN
  - But needs a large number of parameters

Denclue: Technical Essence
- Uses grid cells, but only keeps information about grid cells that actually contain data points, and manages these cells in a tree-based access structure
- Influence function: describes the impact of a data point within its neighborhood
  $f_{Gaussian}(x, y) = e^{-\frac{d(x, y)^2}{2\sigma^2}}$
- Overall density of the data space can be calculated as the sum of the influence functions of all data points
  $f^{D}_{Gaussian}(x) = \sum_{i=1}^{N} e^{-\frac{d(x, x_i)^2}{2\sigma^2}}$
- Clusters can be determined mathematically by identifying density attractors. Density attractors are local maxima of the overall density function.
  $\nabla f^{D}_{Gaussian}(x, x_i) = \sum_{i=1}^{N} (x_i - x) \cdot e^{-\frac{d(x, x_i)^2}{2\sigma^2}}$
- Note the minus sign in each exponent.
- [Figure: a data set and its density attractors]

Denclue: Technical Essence (continued)
- Significant density attractor for threshold k: a density attractor with density larger than or equal to k
- Center-defined cluster for a significant density attractor x for threshold k: the points that are density attracted by x
- Points that are attracted to a density attractor with density less than k are called outliers
- Set of significant density attractors X for threshold k: for each pair of density attractors x1, x2 in X there is a path from x1 to x2 such that each point on the path has density larger than or equal to k
- Arbitrary-shape cluster for a set of significant density attractors X for threshold k: the points that are density attracted to some density attractor in X

Center-Defined and Arbitrary-Shape Clusters
[Figure 3: Example of center-defined clusters for different σ; panel labels partly illegible, legible values include σ = 0.2 and σ = 1.5]
[Figure 4: Example of arbitrary-shape clusters for different ξ; panel labels partly illegible, legible values include ξ = 2]
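The DENCLUE definitions above (Gaussian influence function, overall density as a sum of influences, density attractors as local maxima, threshold k) can be illustrated with a small pure-Python sketch. It is not the published DENCLUE implementation: it omits the grid cells and tree structure, uses a plain fixed-step hill climb along the density gradient, and all function names, the step size, and the toy data are assumptions made for illustration.

```python
import math


def gaussian_influence(x, y, sigma):
    """Influence of data point y at location x: exp(-d(x, y)^2 / (2 sigma^2))."""
    return math.exp(-math.dist(x, y) ** 2 / (2 * sigma ** 2))


def density(x, data, sigma):
    """Overall density at x: sum of the influences of all data points."""
    return sum(gaussian_influence(x, xi, sigma) for xi in data)


def gradient(x, data, sigma):
    """Gradient direction: sum over i of (x_i - x) * influence(x, x_i)."""
    grad = [0.0] * len(x)
    for xi in data:
        w = gaussian_influence(x, xi, sigma)
        for k in range(len(x)):
            grad[k] += (xi[k] - x[k]) * w
    return grad


def climb_to_attractor(x, data, sigma, step=0.05, max_iter=1000):
    """Fixed-step hill climb (an assumed, simplified search) toward a local maximum."""
    x = list(x)
    for _ in range(max_iter):
        g = gradient(x, data, sigma)
        norm = math.sqrt(sum(c * c for c in g))
        if norm == 0.0:
            break
        candidate = [xk + step * gk / norm for xk, gk in zip(x, g)]
        # Stop as soon as a step no longer increases the density.
        if density(candidate, data, sigma) <= density(x, data, sigma):
            break
        x = candidate
    return x


# Hypothetical toy data: two tight groups and one isolated point.
data = [(0, 0), (0.2, 0), (-0.2, 0), (5, 5), (5.2, 5), (4.8, 5), (10, 10)]
sigma, k = 0.5, 2.0

attractor = climb_to_attractor((0.6, 0.3), data, sigma)
is_significant = density(attractor, data, sigma) >= k

lonely = climb_to_attractor((10.1, 10.0), data, sigma)
is_outlier = density(lonely, data, sigma) < k
```

In this sketch, a starting point near a tight group climbs to an attractor whose density meets the threshold k, while a point near the isolated data point ends at an attractor below k and would be treated as an outlier. Varying sigma changes how many attractors exist, which is the effect the center-defined cluster figure illustrates.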
