Spectrum: fast density-aware spectral clustering for single and multi-omic data.
- Resource Type
- Article
- Authors
- John, Christopher R; Watson, David; Barnes, Michael R; Pitzalis, Costantino; Lewis, Myles J
- Source
- Bioinformatics. 2/15/2020, Vol. 36 Issue 4, p1159-1166. 8p.
- Subject
- *INTERNET servers
*TENSOR products
*NEAREST neighbor analysis (Statistics)
*CELL analysis
*INDIVIDUALIZED medicine
*INTEGRATED software
*DATA integration
*TRANSCRIPTOMES
- Language
- ISSN
- 1367-4803
Motivation: Clustering patient omic data is integral to developing precision medicine because it allows the identification of disease subtypes. A current major challenge is the integration of multi-omic data to identify a shared structure and reduce noise. Cluster analysis is also increasingly applied to single-omic data, for example in single-cell RNA-seq analysis, where the transcriptomes of individual cells are clustered. This technology has clinical implications. Our motivation was therefore to develop a flexible and effective spectral clustering tool for both single and multi-omic data.

Results: We present Spectrum, a new spectral clustering method for complex omic data. Spectrum uses a self-tuning, density-aware kernel that we developed, which enhances the similarity between points that share common nearest neighbours. It uses a tensor product graph data integration and diffusion procedure to reduce noise and reveal underlying structures. Spectrum contains a new method for finding the optimal number of clusters (K) based on analysis of eigenvector distributions, and it can find K automatically for both Gaussian and non-Gaussian structures. We demonstrate across 21 real expression datasets that Spectrum gives improved runtimes and better clustering results relative to other methods.

Availability and implementation: Spectrum is available as an R software package from CRAN https://cran.r-project.org/web/packages/Spectrum/index.html.

Supplementary information: Supplementary data are available at Bioinformatics online.

[ABSTRACT FROM AUTHOR]
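The abstract describes the density-aware kernel and the choice of K only in words. The Python sketch below illustrates the general idea under stated assumptions; it is not code from the Spectrum R package. It assumes a kernel of the form A_ij = exp(-d_ij^2 / (sigma_i * sigma_j * (CNN_ij + 1))), where sigma_i is a self-tuning local scale (the distance from point i to its n-th nearest neighbour, in the style of Zelnik-Manor and Perona) and CNN_ij counts the common nearest neighbours of i and j. Pairs that share neighbours therefore get a wider bandwidth and a higher affinity. For choosing K it substitutes the classic eigengap heuristic, which is simpler than Spectrum's eigenvector distribution analysis and is mainly suited to Gaussian-like structure. The names `density_aware_kernel`, `eigengap_k`, `n_neighbours`, and `max_k` are hypothetical.

```python
import numpy as np


def density_aware_kernel(X, n_neighbours=3):
    """Hypothetical sketch of a self-tuning, density-aware affinity matrix.

    Assumed form (illustrative, not taken from the Spectrum package):
        A_ij = exp(-d_ij**2 / (sigma_i * sigma_j * (CNN_ij + 1)))
    sigma_i: distance from point i to its n-th nearest neighbour.
    CNN_ij: number of points found in both i's and j's neighbour sets.
    Assumes no duplicate points (a zero sigma would divide by zero).
    """
    X = np.asarray(X, dtype=float)
    n = len(X)
    # Pairwise Euclidean distance matrix, shape (n, n).
    D = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1))
    # Indices of each point's nearest neighbours; column 0 is the point itself.
    nn = np.argsort(D, axis=1)[:, 1:n_neighbours + 1]
    # Self-tuning local scale: distance to the n-th nearest neighbour.
    sigma = D[np.arange(n), nn[:, -1]]
    # M[i, k] is True when k is among i's nearest neighbours.
    M = np.zeros((n, n), dtype=bool)
    M[np.arange(n)[:, None], nn] = True
    # Shared-neighbour counts; more shared neighbours widen the bandwidth
    # and so raise the affinity between the pair.
    cnn = M.astype(int) @ M.T.astype(int)
    A = np.exp(-D ** 2 / (np.outer(sigma, sigma) * (cnn + 1)))
    np.fill_diagonal(A, 0.0)  # no self-loops
    return A


def eigengap_k(A, max_k=10):
    """Classic eigengap heuristic, a simpler stand-in for Spectrum's
    eigenvector distribution analysis."""
    deg = A.sum(axis=1)
    inv_sqrt = 1.0 / np.sqrt(deg)
    # Symmetrically normalised affinity: deg^-1/2 A deg^-1/2.
    S = inv_sqrt[:, None] * A * inv_sqrt[None, :]
    vals = np.sort(np.linalg.eigvalsh(S))[::-1]  # largest first
    m = min(max_k, len(vals) - 1)
    gaps = vals[:m] - vals[1:m + 1]
    # The largest drop after the k-th eigenvalue suggests k clusters.
    return int(np.argmax(gaps)) + 1
```

For a data matrix `X` with one row per sample, `eigengap_k(density_aware_kernel(X))` returns a candidate K. Spectrum's own procedure, which also applies tensor product graph diffusion and handles non-Gaussian structures, will generally behave differently from this sketch.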