K-means Clustering Algorithm Implemented in C
Date: 2025-07-01 11:05:36
### K-means Clustering Algorithm Implementation in C
K-means clustering is an iterative algorithm that partitions a dataset into `k` clusters by repeatedly assigning each point to its nearest centroid and recomputing the centroids. Below is a detailed implementation of the K-means algorithm in the **C programming language**, which includes functions for:
- Initializing centroids randomly.
- Calculating Euclidean distance between points.
- Assigning data points to the nearest centroid.
- Updating centroids based on cluster members.
- Repeating the assignment and update steps until convergence.
---
### Code Example: K-means in C
```c
#include <stdio.h>
#include <stdlib.h>
#include <math.h>
#include <time.h>

#define MAX_ITER 100
#define K 2   // Number of clusters
#define N 6   // Number of data points
#define D 2   // Dimensionality of data

// Function to calculate Euclidean distance
double euclidean_distance(double *a, double *b, int dim) {
    double dist = 0.0;
    for (int i = 0; i < dim; i++) {
        dist += (a[i] - b[i]) * (a[i] - b[i]);
    }
    return sqrt(dist);
}

// Function to assign each point to the nearest centroid
void assign_clusters(double data[N][D], double centroids[K][D], int assignments[N]) {
    for (int i = 0; i < N; i++) {
        double min_dist = INFINITY;
        int closest = 0;
        for (int j = 0; j < K; j++) {
            double dist = euclidean_distance(data[i], centroids[j], D);
            if (dist < min_dist) {
                min_dist = dist;
                closest = j;
            }
        }
        assignments[i] = closest;
    }
}

// Function to update centroids based on current cluster assignments
void update_centroids(double data[N][D], double centroids[K][D], int assignments[N]) {
    int count[K] = {0};
    double sum[K][D] = {{0}};
    for (int i = 0; i < N; i++) {
        int cluster = assignments[i];
        for (int j = 0; j < D; j++) {
            sum[cluster][j] += data[i][j];
        }
        count[cluster]++;
    }
    for (int i = 0; i < K; i++) {
        if (count[i] > 0) {
            for (int j = 0; j < D; j++) {
                centroids[i][j] = sum[i][j] / count[i];
            }
        } else {
            // Re-initialize an empty cluster to a random data point
            int rand_index = rand() % N;
            for (int j = 0; j < D; j++) {
                centroids[i][j] = data[rand_index][j];
            }
        }
    }
}

// Check for convergence (centroids moved less than a small tolerance)
int has_converged(double old_centroids[K][D], double new_centroids[K][D]) {
    for (int i = 0; i < K; i++) {
        if (euclidean_distance(old_centroids[i], new_centroids[i], D) > 1e-4) {
            return 0; // Not converged
        }
    }
    return 1; // Converged
}

int main(void) {
    srand((unsigned)time(NULL));

    // Sample data points (N x D)
    double data[N][D] = {
        {1.0, 2.0},
        {1.5, 1.8},
        {5.0, 8.0},
        {8.0, 8.0},
        {1.0, 0.6},
        {9.0, 11.0}
    };

    // Initialize centroids randomly from data points
    double centroids[K][D];
    for (int i = 0; i < K; i++) {
        int idx = rand() % N;
        for (int j = 0; j < D; j++) {
            centroids[i][j] = data[idx][j];
        }
    }

    int assignments[N];
    double old_centroids[K][D];
    for (int iter = 0; iter < MAX_ITER; iter++) {
        // Save current centroids
        for (int i = 0; i < K; i++) {
            for (int j = 0; j < D; j++) {
                old_centroids[i][j] = centroids[i][j];
            }
        }
        // Assign clusters
        assign_clusters(data, centroids, assignments);
        // Update centroids
        update_centroids(data, centroids, assignments);
        // Check for convergence
        if (has_converged(old_centroids, centroids)) {
            printf("Converged at iteration %d\n", iter);
            break;
        }
    }

    // Output final results
    printf("Final Centroids:\n");
    for (int i = 0; i < K; i++) {
        printf("Centroid %d: ", i + 1);
        for (int j = 0; j < D; j++) {
            printf("%.2f ", centroids[i][j]);
        }
        printf("\n");
    }
    printf("Cluster Assignments:\n");
    for (int i = 0; i < N; i++) {
        printf("Data point %d -> Cluster %d\n", i, assignments[i]);
    }
    return 0;
}
```
---
### Explanation of Key Steps
The core idea behind K-means involves two major steps:
1. **Assignment Step**: Each data point is assigned to the nearest centroid using Euclidean distance[^1].
2. **Update Step**: The centroid positions are recalculated as the mean of all data points assigned to that cluster[^3].
This process repeats until the centroids stabilize, indicating convergence[^1].
---
### Time Complexity
The time complexity of K-means is generally expressed as $ O(T \cdot k \cdot n \cdot d) $, where:
- $ T $: Number of iterations.
- $ k $: Number of clusters.
- $ n $: Number of data points.
- $ d $: Dimensionality of the data.
Comparable expressions arise for other clustering techniques such as Border-Peeling, which additionally incurs overhead that depends on the data structures used[^2].
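Plugging the sample program above into this bound gives a concrete worst case:

$$ T \cdot k \cdot n \cdot d = 100 \cdot 2 \cdot 6 \cdot 2 = 2400 $$

coordinate operations across all distance evaluations. In practice the loop exits far earlier, as soon as the convergence check succeeds.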
---
### Notes on Optimization
- **Initialization Strategy**: Instead of purely random initialization, you can use **K-means++** for better initial centroid placement.
- **Distance Calculation**: Optimize by precomputing distances or using spatial indexing structures like k-d trees[^4].
- **Early Stopping**: Add thresholds for minimal centroid movement to reduce unnecessary iterations.
---