K-Nearest Neighbors (KNN) is a supervised learning algorithm that classifies new data points based on the closest existing labeled examples. To measure how “close” samples are, KNN relies on distance metrics that quantify how similar two feature vectors are. Choosing an appropriate metric improves classification accuracy, robustness, and generalization.

Need for the Right Distance Metric
The choice of distance metric is important in KNN because it:
- Impacts how neighbors are selected and ranked.
- Influences decision boundaries created by the classifier.
- Improves handling of different data types, such as continuous and categorical features.
- Reduces misclassification in high-dimensional spaces.
- Ensures fair comparison when features vary in scale.
Common Distance Metrics

1. Euclidean Distance
Euclidean distance measures the straight-line distance between two points in continuous numerical space. It works best when all features are continuous and similarly scaled.
Formula:
d(p, q) = \sqrt{\sum_{i=1}^{n} (p_i - q_i)^2}
Where p and q are data points with n features.
Properties
- Sensitive to large differences in feature values.
- Performs well on low-dimensional, normalized data.
- Commonly used for geometric interpretation.
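As a quick sketch (the two vectors below are made-up illustrative values, not from any particular dataset), Euclidean distance is a one-liner in NumPy:

```python
import numpy as np

# Two made-up feature vectors (illustrative values only)
p = np.array([1.0, 2.0, 3.0])
q = np.array([4.0, 6.0, 3.0])

# Euclidean distance: square root of the summed squared differences
d = np.sqrt(np.sum((p - q) ** 2))
print(d)  # 5.0
```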
2. Manhattan Distance (L1 Norm)
Manhattan distance is computed by summing the absolute differences across dimensions. It is useful when features represent directions, steps, or grid-based movement.
d(p, q) = \sum_{i=1}^{n} |p_i - q_i|
Where p and q are data points.
Properties
- More robust to outliers than Euclidean distance.
- Preferred for high-dimensional data.
- Works well in sparse feature environments.
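Using the same made-up vectors as the Euclidean sketch above, the only change is swapping squared differences for absolute ones:

```python
import numpy as np

p = np.array([1.0, 2.0, 3.0])
q = np.array([4.0, 6.0, 3.0])

# Manhattan (L1) distance: sum of per-dimension absolute differences
d = np.sum(np.abs(p - q))
print(d)  # 7.0
```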
3. Minkowski Distance
Minkowski distance is a generalized version of both the Euclidean and Manhattan distances, controlled by an order parameter p.
d(p, q) = \left( \sum_{i=1}^{n} |p_i - q_i|^{\,p} \right)^{\frac{1}{p}}
Where,
- p and q are data points,
- When p=1: Manhattan distance,
- When p=2: Euclidean distance.
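A small sketch makes the generalization concrete. The `minkowski` helper below is hypothetical (SciPy provides an equivalent in scipy.spatial.distance), and its order parameter is named `order` to avoid clashing with the data point p:

```python
import numpy as np

def minkowski(u, v, order=2):
    # Minkowski distance of the given order:
    # (sum of |u_i - v_i|^order) ** (1 / order)
    return np.sum(np.abs(u - v) ** order) ** (1.0 / order)

u = np.array([1.0, 2.0, 3.0])
v = np.array([4.0, 6.0, 3.0])

print(minkowski(u, v, order=1))  # 7.0 -> matches Manhattan
print(minkowski(u, v, order=2))  # 5.0 -> matches Euclidean
```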
4. Chebyshev Distance (Maximum Norm)
Chebyshev Distance measures the maximum absolute difference between two points across all features. It focuses on the largest deviation among dimensions.
d(p,q) = \max_i (|p_i - q_i|)
Where p and q are data points.
Properties
- Uses the maximum feature difference.
- Ignores smaller variations in the remaining dimensions.
- Produces a square-shaped distance boundary.
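Continuing with the same illustrative vectors, Chebyshev distance keeps only the largest per-dimension gap:

```python
import numpy as np

p = np.array([1.0, 2.0, 3.0])
q = np.array([4.0, 6.0, 3.0])

# Chebyshev distance: maximum absolute difference across dimensions
d = np.max(np.abs(p - q))
print(d)  # 4.0 (from the second dimension, |2 - 6|)
```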
5. Cosine Similarity
Cosine similarity measures the cosine of the angle between two vectors rather than their magnitude, capturing how closely their directions align. Since it is a similarity rather than a distance, KNN implementations typically convert it to a distance as 1 - cosine similarity.
Formula
\cos \theta = \frac{\vec{p} \cdot \vec{q}}{\|\vec{p}\| \, \|\vec{q}\|}
Where,
- p⋅q is the dot product,
- ∥p∥,∥q∥ are magnitudes of vectors.
Range: -1 to 1
- 1: vectors point in the same direction (high similarity)
- 0: vectors are orthogonal (no relation)
- -1: vectors point in opposite directions (high dissimilarity)
Properties
- Measures angle, not magnitude.
- Good for text and other high-dimensional data.
- Scale-independent.
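A short sketch with made-up vectors shows both the similarity and the usual conversion to a distance, 1 - cosine similarity, which is the convention scikit-learn uses for its "cosine" metric:

```python
import numpy as np

# b is a scaled copy of a, so their directions are identical
a = np.array([1.0, 2.0, 3.0])
b = np.array([2.0, 4.0, 6.0])

# Cosine similarity: dot product divided by the product of magnitudes
cos_sim = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
print(cos_sim)  # ~1.0 -> same direction despite different magnitudes

# Cosine distance, as used by KNN implementations
cos_dist = 1.0 - cos_sim
print(cos_dist)  # ~0.0
```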
Choosing the Right Distance Metric in KNN
| Distance Metric | When to Use | Not Ideal When | Use Case Scenario |
|---|---|---|---|
| Euclidean Distance | Data is continuous and evenly scaled | Features vary greatly in scale | Image recognition, sensor data |
| Manhattan Distance | High-dimensional or grid-based data | Features are highly correlated | City-block routing, clustering |
| Minkowski Distance | Need flexible distance tuning via the parameter p | Unsure how to choose p | Generalized KNN experiments |
| Chebyshev Distance | Max difference matters across dimensions | Small variations are important | Chessboard moves, quality control |
| Cosine Similarity | Angle matters more than magnitude | Numerical size matters | Text similarity, embeddings, recommendations |
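To tie the table together, here is a minimal end-to-end sketch with scikit-learn's KNeighborsClassifier. The Iris dataset, k = 5, and the brute-force search are illustrative assumptions, not recommendations:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Standardize features so scale-sensitive metrics compare fairly
scaler = StandardScaler().fit(X_train)
X_train, X_test = scaler.transform(X_train), scaler.transform(X_test)

# Try each metric from the table; "brute" search keeps cosine usable
for metric in ["euclidean", "manhattan", "minkowski", "chebyshev", "cosine"]:
    # p only matters for the Minkowski metric (p=3 here as an example)
    kwargs = {"p": 3} if metric == "minkowski" else {}
    knn = KNeighborsClassifier(n_neighbors=5, metric=metric,
                               algorithm="brute", **kwargs)
    knn.fit(X_train, y_train)
    print(f"{metric:>10} accuracy: {knn.score(X_test, y_test):.3f}")
```

Running a loop like this on your own data is a practical way to pick a metric: the right choice is ultimately empirical, guided by the considerations in the table above.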