J. Imaging, Vol. 12, Pages 55: Capacity-Limited Failure in Approximate Nearest Neighbor Search on Image Embedding Spaces


J. Imaging, Vol. 12, Pages 55: Capacity-Limited Failure in Approximate Nearest Neighbor Search on Image Embedding Spaces

Journal of Imaging doi: 10.3390/jimaging12020055

Authors:
Morgan Roy Cooper
Mike Busch

Similarity search on image embeddings is a common practice for image retrieval in machine learning and pattern recognition systems. Approximate nearest neighbor (ANN) methods enable scalable similarity search on large datasets, often approaching sub-linear complexity. Yet, little empirical work has examined how ANN neighborhood geometry differs from that of exact k-nearest neighbors (k-NN) search as the neighborhood size increases under constrained search effort. This study quantifies how approximate neighborhood structure changes relative to exact k-NN search as k increases across three experimental conditions. Using multiple random subsets of 10,000 images drawn from the STL-10 dataset, we compute ResNet-50 image embeddings, perform an exact k-NN search, and compare it to a Hierarchical Navigable Small World (HNSW)-based ANN search under controlled hyperparameter regimes. We evaluated the fidelity of neighborhood structure using neighborhood overlap, average neighbor distance, normalized barycenter shift, and local intrinsic dimensionality (LID). Results show that exact k-NN and ANN search behave nearly identically when efSearch>k. However, as the neighborhood size grows and efSearch remains fixed, ANN search fails abruptly, exhibiting extreme divergence in neighbor distances at approximately k≈2–3.5×efSearch. Increasing index construction quality delays this failure, and scaling search effort proportionally with neighborhood size (efSearch=α×k with α≥1) preserves neighborhood geometry across all evaluated metrics, including LID. The findings indicate that ANN search preserves neighborhood geometry within its operational capacity but abruptly fails when this capacity is exceeded. Documenting this behavior is relevant for scientific applications that approximate embedding spaces and provides practical guidance on when ANN search is interchangeable with exact k-NN and when geometric differences become nontrivial.



Source link

Morgan Roy Cooper www.mdpi.com