Accelerating Large-Scale Molecular Similarity Search through Exploiting High Performance Computing
- Resource Type
- Conference
- Authors
- Zhu, Chun Jiang; Zhu, Tan; Li, Haining; Bi, Jinbo; Song, Minghu
- Source
- 2019 IEEE International Conference on Bioinformatics and Biomedicine (BIBM), pp. 330-333, Nov. 2019
- Subject
- Bioengineering
Computing and Processing
Molecular Similarity Search
Indexing
High Performance Computing
Molecular similarity search is a simple but powerful chemoinformatics tool for rapidly finding molecules in a large molecular database that are structurally similar to a known reference compound. A variety of indexing structures have been developed to improve the performance of similarity search over large compound databases. However, those algorithms often incur a high computational cost to build indices and process queries, especially for large-scale molecular datasets. We study the problem of accelerating similarity search using high performance computing (HPC) and design general algorithms to speed up existing indexing algorithms. We first propose a parallel algorithm based on data chunking that works for all indexing algorithms for similarity search. We theoretically analyze its computation cost and the relationship between the speedup and the number of data chunks. We further propose a parallel query algorithm that accelerates query processing for all graph-based indexing algorithms in HPC. Both of our algorithms consistently offer a greater speedup than the baseline algorithms when evaluated with different datasets and parameter settings.
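The data-chunking idea can be illustrated with a minimal sketch. This is not the authors' algorithm: it assumes fingerprints are stored as sets of "on" bit positions, uses Tanimoto similarity as the measure, replaces the per-chunk index with a brute-force scan, and uses a thread pool as a stand-in for HPC workers. The names `tanimoto`, `search_chunk`, and `chunked_search` are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
import heapq


def tanimoto(a: frozenset, b: frozenset) -> float:
    """Tanimoto (Jaccard) similarity between two sets of fingerprint bits."""
    if not a and not b:
        return 1.0
    inter = len(a & b)
    return inter / (len(a) + len(b) - inter)


def search_chunk(chunk, query, k):
    """Top-k within one chunk. Assumption: a brute-force scan stands in for
    whatever index would be built over the chunk."""
    scored = ((tanimoto(fp, query), mol_id) for mol_id, fp in chunk)
    return heapq.nlargest(k, scored)


def chunked_search(database, query, k, num_chunks):
    """Split the database into chunks, search each chunk concurrently,
    then merge the per-chunk top-k lists into a global top-k."""
    chunks = [database[i::num_chunks] for i in range(num_chunks)]
    # Assumption: threads are used only for illustration; a real HPC setup
    # would distribute chunks across processes or nodes.
    with ThreadPoolExecutor(max_workers=num_chunks) as pool:
        partial = pool.map(search_chunk, chunks,
                           [query] * num_chunks, [k] * num_chunks)
        merged = [hit for hits in partial for hit in hits]
    return heapq.nlargest(k, merged)


if __name__ == "__main__":
    # Toy database of (molecule id, fingerprint bits); values are made up.
    database = [
        ("a", frozenset({1, 2, 3})),
        ("b", frozenset({1, 2})),
        ("c", frozenset({4, 5})),
        ("d", frozenset({1, 2, 3, 4})),
        ("e", frozenset({3})),
    ]
    print(chunked_search(database, frozenset({1, 2, 3}), k=2, num_chunks=3))
```

Because each chunk returns its own top-k, the merged candidate list always contains the global top-k, so the result does not depend on the number of chunks. Only the work per worker changes.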