Manhattan and Hamming Distance Use Cases

Manhattan (L1) distance is useful when features are inherently sparse or when outlier dimensions should contribute linearly rather than quadratically to the distance. For example, in recommendation systems where user behavior vectors are high-dimensional and sparse (e.g., click counts), L1 is more robust to noisy dimensions. Hamming distance is designed for binary or categorical data, such as comparing hashed fingerprints (e.g., SimHash for near-duplicate detection) or evaluating binary feature vectors (e.g., presence/absence of attributes in product catalogs). For instance, Hamming counts mismatches between equal-length DNA sequences and underpins error-detecting and error-correcting codes.
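To make the contrast concrete, here is a minimal NumPy-only sketch of both metrics; the click-count vectors and fingerprints are illustrative, not real data:

```python
import numpy as np

# L1 distance between two users' sparse click-count vectors:
# the one outlier dimension (12 vs. 0) contributes linearly, not quadratically.
u = np.array([0, 3, 0, 0, 12, 0, 1], dtype=np.float32)
v = np.array([0, 1, 0, 2,  0, 0, 1], dtype=np.float32)
l1 = np.abs(u - v).sum()                # -> 16.0

# Hamming distance between two SimHash-style 16-bit fingerprints:
# XOR the codes, then count the differing bits.
fp_a = 0b1011_0110_1100_0011
fp_b = 0b1011_0010_1101_0011
hamming = (fp_a ^ fp_b).bit_count()     # -> 2 (int.bit_count needs Python >= 3.10)
```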
Computational Cost Differences

Manhattan distance sums absolute differences, which is computationally lighter than Euclidean (L2) distance, since L2 involves squaring and a square root. For a 1000-dimensional vector, L1 needs 1000 subtractions, 1000 absolute values, and a sum, while L2 replaces the absolute values with squarings and adds a final sqrt; in practice the sqrt is often skipped for nearest-neighbor ranking, since it is monotonic. Hamming distance is even cheaper for binary data: comparing two 256-bit vectors takes four 64-bit XORs plus popcounts, each a single-cycle instruction on modern CPUs, which is far faster than the floating-point arithmetic behind L1/L2. However, Hamming is not applicable to non-binary data. Cosine similarity is equivalent to L2 ranking on unit-normalized vectors (||a - b||^2 = 2 - 2*cos(a, b)), so it adds normalization overhead up front but shares L2's computational profile during search.
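The operation counts above can be sketched directly in NumPy (sizes and data are illustrative; np.unpackbits serves as a portable popcount here):

```python
import numpy as np

rng = np.random.default_rng(0)
d = 1000
a = rng.random(d, dtype=np.float32)
b = rng.random(d, dtype=np.float32)

# L1: subtract, take absolute values, reduce.
l1 = np.abs(a - b).sum()

# Squared L2: subtract, square, reduce. The final sqrt is monotonic,
# so nearest-neighbor ranking can skip it entirely.
l2_sq = np.square(a - b).sum()

# Hamming over two packed 256-bit codes (32 bytes each): XOR, then popcount.
# NumPy >= 2.0 also offers np.bitwise_count for the popcount step.
x = rng.integers(0, 256, size=32, dtype=np.uint8)
y = rng.integers(0, 256, size=32, dtype=np.uint8)
hamming = np.unpackbits(np.bitwise_xor(x, y)).sum()

# Cosine/L2 equivalence on unit vectors: ||a - b||^2 = 2 - 2*cos(a, b).
an = (a / np.linalg.norm(a)).astype(np.float64)
bn = (b / np.linalg.norm(b)).astype(np.float64)
assert np.isclose(np.square(an - bn).sum(), 2 - 2 * np.dot(an, bn))
```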
Index Support and Practical Tradeoffs

Most vector databases and libraries (e.g., FAISS, Annoy) optimize for L2/cosine. For example, FAISS's IVF-PQ indexes quantize vectors assuming L2-like geometry. Manhattan can sometimes reuse L2-optimized indexes at reduced accuracy, but dedicated L1 support (e.g., via metric trees such as VP-trees) is less common. Hamming requires specialized binary indexes, such as FAISS's IndexBinaryFlat or bitmap-based structures, which exploit bitwise operations for speed. However, binary indexes lack the scalability of L2/cosine methods in high dimensions. In practice, L2/cosine benefit from decades of optimization (GPU acceleration, approximate algorithms), while Hamming and L1 remain niche but critical for specific data types. Developers should default to L2/cosine unless data properties (e.g., binary values, robustness to outliers) explicitly demand alternatives.
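As a hedged sketch of what this looks like in FAISS, assuming a build that exposes IndexBinaryFlat and the METRIC_L1 option for flat indexes (dataset sizes here are illustrative):

```python
import numpy as np
import faiss

rng = np.random.default_rng(0)

# Hamming search over 256-bit codes, each stored packed as 32 uint8 bytes.
nbits = 256
codes = rng.integers(0, 256, size=(10_000, nbits // 8), dtype=np.uint8)
bin_index = faiss.IndexBinaryFlat(nbits)   # distances are integer bit counts
bin_index.add(codes)
D_bin, I_bin = bin_index.search(codes[:3], 5)

# Brute-force L1 search over float vectors; most ANN structures
# (e.g., IVF-PQ) assume L2/inner-product geometry instead.
xb = rng.random((10_000, 128), dtype=np.float32)
l1_index = faiss.IndexFlat(128, faiss.METRIC_L1)
l1_index.add(xb)
D_l1, I_l1 = l1_index.search(xb[:3], 5)
```

Note that both indexes here are flat (exhaustive) scans; the approximate, scalable index families in FAISS are built around L2 and inner-product metrics, which is exactly the tradeoff described above.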