What is k-means?
A popular clustering algorithm that groups examples in unsupervised learning.
k-means explained in plain English
The k-means algorithm basically does the following:
- Iteratively determines the best k center points (known as centroids).
- Assigns each example to the closest centroid. Those examples nearest the same centroid belong to the same group.

The k-means algorithm picks centroid locations to minimize the cumulative square of the distances from each example to its closest centroid.

For example, consider a plot of dog height versus dog width. If k=3, the k-means algorithm will determine three centroids. Each example is assigned to its closest centroid, yielding three groups.

Imagine that a manufacturer wants to determine the ideal sizes for small, medium, and large sweaters for dogs. The three centroids identify the mean height and mean width of the dogs in each cluster. So, the manufacturer should probably base sweater sizes on those three centroids. Note that the centroid of a cluster is typically not an example in the cluster.

The dog example uses only two features (height and width), but k-means can group examples across many features.

See "What is k-means clustering?" in the Clustering course for more information.
Example
Practitioners use k-means when they need to group unlabeled examples, as in the dog sweater sizing scenario described above. The term appears in research papers, product documentation, and technical discussions of machine learning systems.
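
The two steps described above (assign each example to its closest centroid, then move each centroid to the mean of its group) can be sketched in a few lines of plain Python. This is a minimal illustration, not a reference implementation. The dog measurements are made up for this example, the helper names (`squared_distance`, `farthest_first_init`, `k_means`) are hypothetical, and the deterministic farthest-first starting points are an assumption chosen to keep the sketch reproducible; real libraries typically use randomized initialization.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def farthest_first_init(examples, k):
    """Pick k starting centroids: the first example, then repeatedly the
    example farthest from all centroids chosen so far (an assumption made
    for reproducibility, not part of the k-means definition)."""
    centroids = [examples[0]]
    while len(centroids) < k:
        next_centroid = max(
            examples,
            key=lambda p: min(squared_distance(p, c) for c in centroids),
        )
        centroids.append(next_centroid)
    return centroids


def k_means(examples, k, max_iters=100):
    """Alternate between assigning examples and updating centroids."""
    centroids = farthest_first_init(examples, k)
    labels = None
    for _ in range(max_iters):
        # Assignment step: index of the closest centroid for each example.
        new_labels = [
            min(range(k), key=lambda j: squared_distance(p, centroids[j]))
            for p in examples
        ]
        if new_labels == labels:
            break  # No assignment changed, so the centroids are stable.
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(examples, labels) if lab == j]
            if members:  # Keep the old centroid if a cluster is empty.
                centroids[j] = tuple(
                    sum(coords) / len(members) for coords in zip(*members)
                )
    return centroids, labels


# Made-up (height_cm, width_cm) measurements for nine dogs.
dogs = [
    (25, 12), (27, 13), (26, 11),
    (45, 22), (47, 24), (44, 21),
    (70, 35), (72, 36), (68, 33),
]

centroids, labels = k_means(dogs, k=3)

# The quantity k-means tries to minimize: the cumulative squared distance
# from each example to its assigned centroid.
total = sum(squared_distance(p, centroids[lab]) for p, lab in zip(dogs, labels))

for centroid in sorted(centroids):
    print("centroid (mean height, mean width):", centroid)
print("labels:", labels)
print("total squared distance:", total)
```

With this toy data, each centroid is the mean height and mean width of one group of dogs, which is the figure the sweater manufacturer in the example would use for sizing. Note that the centroid of the smallest group does not coincide with any of the nine dogs.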
People also read
- A/B testing
A statistical way of comparing two (or more) techniques—the A and the B.
- ablation
A technique for evaluating the importance of a feature or component by temporarily removing it from a model.
- accuracy
The number of correct classification predictions divided by the total number of predictions.
- activation function
A function that enables neural networks to learn nonlinear (complex) relationships between features and the label.
- active learning
A training approach in which the algorithm chooses some of the data it learns from.
- adaptation
Synonym for tuning or fine-tuning.
- agglomerative clustering
See hierarchical clustering.
- anomaly detection
The process of identifying outliers.
- area under the PR curve
See PR AUC (Area under the PR Curve).
- area under the ROC curve
See AUC (Area under the ROC curve).