
Scipy Cluster VQ KMeans2 Method
scipy.cluster.vq.kmeans2(data, k, iter=10, thresh=1e-05, minit='random', missing='warn', check_finite=True)− The kmeans2() method classifies a set of observation vectors into k clusters by running the k-means algorithm. The kmeans2() method does not use a threshold value to check for convergence; it runs for a fixed number of iterations instead. It has additional parameters that select the method used to initialize the centroids, control how empty clusters are handled, and validate that the input matrices contain only finite numbers.
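As a quick illustration of the call described above, here is a minimal sketch. The synthetic two-blob data set and the variable names are assumptions made for this example only; since the starting centroids are chosen at random, the exact centroid values vary from run to run, so only the shapes are shown.

```python
import numpy as np
from scipy.cluster.vq import kmeans2

# Synthetic data (an assumption for illustration): two well-separated 2-D blobs.
rng = np.random.default_rng(0)
blob_a = rng.normal(loc=0.0, scale=0.5, size=(50, 2))
blob_b = rng.normal(loc=5.0, scale=0.5, size=(50, 2))
data = np.vstack([blob_a, blob_b])  # M=100 observations, N=2 dimensions

# Cluster into k=2 groups; 'points' picks the initial centroids from the data rows.
centroid, label = kmeans2(data, 2, iter=10, minit='points')

print(centroid.shape)  # (2, 2): k centroids with N coordinates each
print(label.shape)     # (100,): one centroid index per observation
```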
Its parameters are explained in detail below −
Parameters
-
data− ndarray
It is an ‘M’ by ‘N’ array of M observations in N dimensions.
-
k− int or ndarray
This parameter represents the number of clusters to form and, therefore, the number of centroids to generate. It is instead interpreted as the initial centroids to use in either of the two cases given below −
When the minit initialization string is ‘matrix’.
When an ndarray is given instead of an int.
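The second interpretation of k can be sketched as follows. The tiny data set and the name initial_centroids are made up for illustration; because the starting centroids are fixed, this run involves no randomness.

```python
import numpy as np
from scipy.cluster.vq import kmeans2

# Six 2-D observations forming two obvious groups (illustrative data).
data = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
                 [10.0, 10.0], [10.0, 11.0], [11.0, 10.0]])

# Passing an ndarray as k supplies the starting centroids directly:
# one row per cluster, one column per dimension of the data.
initial_centroids = np.array([[0.0, 0.0], [10.0, 10.0]])

centroid, label = kmeans2(data, initial_centroids, minit='matrix')

print(label)     # first three rows get index 0, last three get index 1
print(centroid)  # each row is the mean of the observations assigned to it
```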
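-
iter− int, optional
This parameter represents the number of iterations of the k-means algorithm to run. Its meaning differs from that of the iter parameter of the kmeans() method. The default value is 10.
-
thresh− float, optional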
This parameter represents a threshold value, but the kmeans2() method does not use it yet. The algorithm always runs for the number of iterations given by iter and is not terminated early when the change in distortion falls below this value.
-
minit− str, optional
This parameter represents the method for initialization. The available methods are −
random− It generates k centroids from a Gaussian whose mean and variance are estimated from the data.
points− It chooses k observations (i.e., rows) at random from data as the initial centroids.
++− It chooses k observations (i.e., rows) according to the kmeans++ method, also called careful seeding.
matrix− It interprets the k parameter as a ‘k’ by ‘N’ array of initial centroids.
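The three randomized initialization methods can be tried side by side, as in the sketch below. The data set and the shapes dictionary are assumptions for this example; results depend on the random starting points, so only the output shapes are printed. Warnings are silenced here because an unlucky random start can leave a cluster empty.

```python
import warnings

import numpy as np
from scipy.cluster.vq import kmeans2

# Illustrative data: two 2-D blobs of 40 observations each.
rng = np.random.default_rng(1)
data = np.vstack([rng.normal(0.0, 0.3, size=(40, 2)),
                  rng.normal(4.0, 0.3, size=(40, 2))])

shapes = {}
for method in ('random', 'points', '++'):
    with warnings.catch_warnings():
        # An unlucky start can leave a cluster empty, which warns by default.
        warnings.simplefilter('ignore')
        centroid, label = kmeans2(data, 2, minit=method)
    shapes[method] = (centroid.shape, label.shape)
    print(method, centroid.shape, label.shape)
```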
-
missing− str, optional
This parameter represents the method to deal with empty clusters. The available methods are −
warn− This method, as its name implies, gives a warning and continues.
raise− This method will raise an error (ClusterError) and terminate the algorithm.
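The difference between the two settings can be seen by forcing an empty cluster. In this sketch the data and the far-away starting centroid are contrived for illustration: no observation is ever closest to the second centroid, so its cluster stays empty.

```python
import warnings

import numpy as np
from scipy.cluster.vq import kmeans2, ClusterError

data = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
# The second starting centroid is far from every observation (contrived).
init = np.array([[0.5, 0.5], [100.0, 100.0]])

# missing='warn': a warning is issued and the algorithm carries on.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter('always')
    centroid, label = kmeans2(data, init, minit='matrix', missing='warn')
warned = len(caught) > 0

# missing='raise': the same situation raises ClusterError instead.
try:
    kmeans2(data, init, minit='matrix', missing='raise')
    raised = False
except ClusterError:
    raised = True

print(warned, raised)
print(label)  # every observation is assigned to centroid 0
```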
-
check_finite− bool, optional
This parameter specifies whether to check that the input matrices contain only finite numbers. Disabling the check may give you a performance gain, but it may also result in problems such as crashes or non-termination if the observations contain infinities or NaNs. The default value of this parameter is True.
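With the default check_finite=True, non-finite input is rejected before clustering starts. The sketch below uses a made-up array containing a NaN and then shows one possible way to clean the data first (dropping the affected rows is an assumption here, not the only option).

```python
import numpy as np
from scipy.cluster.vq import kmeans2

# Illustrative data with one non-finite entry.
data = np.array([[0.0, 0.0], [1.0, 1.0], [np.nan, 2.0], [5.0, 5.0]])

try:
    kmeans2(data, 2, minit='points')  # check_finite defaults to True
    rejected = False
except ValueError as exc:
    rejected = True
    print("rejected:", exc)

# One possible clean-up: keep only the rows whose values are all finite.
clean = data[np.isfinite(data).all(axis=1)]
centroid, label = kmeans2(clean, 2, minit='points')
print(clean.shape, centroid.shape, label.shape)
```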
Returns
-
centroid− ndarray
It is a ‘k’ by ‘N’ array of the centroids found at the last iteration of k-means.
-
label− ndarray
label[i] is the code or index of the centroid that the i-th observation is closest to.
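The two return values are typically used together. The following sketch, built on a small made-up data set with fixed starting centroids, uses label to count and select the members of each cluster and compares each centroid with the mean of its members.

```python
import numpy as np
from scipy.cluster.vq import kmeans2

# Illustrative data: three observations near (1, 1), two near (8, 8).
data = np.array([[1.0, 1.0], [1.2, 0.8], [0.8, 1.1],
                 [8.0, 8.0], [8.1, 7.9]])
init = np.array([[1.0, 1.0], [8.0, 8.0]])

centroid, label = kmeans2(data, init, minit='matrix')

# label[i] is the row of `centroid` that data[i] was assigned to.
sizes = np.bincount(label, minlength=len(centroid))

for i in range(len(centroid)):
    members = data[label == i]
    # Each centroid is the mean of the observations assigned to it.
    print(i, sizes[i], centroid[i], members.mean(axis=0))
```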