K-Means Clustering— Kevin Jose Thomas

The most popular technique used for grouping data points or objects that are familiar. It can group unsupervised data (no trained model) based on the similarity of the data to each other. It is a partitioning-based algorithm that divides into k non-overlapping subsets (clusters) without any label references to train. It serves to minimize the intra-cluster distances and to maximize the inter-cluster distances.

Algorithm

Randomly places K markers (called centroids), one for each cluster
Calculates the distance of each point from the centroid (eg: euclidean distance, manhattan distance)
Assigns each data point to its closest centroid, creating a cluster
Recalculates the position of the K centroids
Repeats steps 2-4 until the centroids no longer move
Calculate variances and then repeat steps 1-5 using K new randomly placed centroids until N user-defined iterations are complete
Assign the data to clusters which had the smallest saved variance

🪴 Knowledgebase

Explorer

K-Means Clustering

Algorithm

Graph View

Backlinks