Clustering-Based Similarity Search in Metric Spaces with Sparse Spatial Centers
- Resource Type
- Authors
- Roberto Uribe; Roberto Solar; Diego Seco; Oscar Pedreira; Nieves R. Brisaboa
- Source
- SOFSEM 2008: Theory and Practice of Computer Science ISBN: 9783540775652
SOFSEM
- Subject
- Metric space
Tree (data structure)
Similarity (network science)
Nearest neighbor search
Search engine indexing
Data mining
Cluster analysis
computer.software_genre
computer
Subspace topology
Mathematics
Curse of dimensionality
- Language
Metric spaces are a very active research field which offers efficient methods for indexing and searching by similarity in large data sets. In this paper we present a new clustering-based method for similarity search called SSSTree. Its main characteristic is that the centers of each cluster are selected using Sparse Spatial Selection (SSS), a technique initially developed for the selection of pivots. SSS is able to adapt the set of selected points (pivots or cluster centers) to the intrinsic dimensionality of the space. Using SSS, the number of clusters in each node of the tree depends on the complexity of the subspace it represents. The space partition in each node will be made depending on that complexity, improving thus the performance of the search operation. In this paper we present this new method and provide experimental results showing that SSSTree performs better than previously proposed indexes.