High-Performance SVD Partial Spectrum Computation
- Resource Type
- Conference
- Authors
- Keyes, David; Ltaief, Hatem; Nakatsukasa, Yuji; Sukkari, Dalal
- Source
- SC23: International Conference for High Performance Computing, Networking, Storage and Analysis, pp. 1-12, Nov. 2023
- Subject
- Communication, Networking and Broadcast Technologies
Computing and Processing
Energy consumption
Software libraries
Heuristic algorithms
High performance computing
Software algorithms
Benchmark testing
Robustness
Singular Value Decomposition
Partial Spectrum
Parallel Numerical Algorithms
Distributed-Memory Systems
- Language
- ISSN
- 2167-4337
We introduce a new singular value decomposition (SVD) solver, QDWHpartial-SVD, based on the QR-based Dynamically Weighted Halley (QDWH) algorithm, for computing a partial spectrum of the SVD. By optimizing the rational function underlying the algorithm only in the desired part of the spectrum, QDWHpartial-SVD efficiently computes a fraction (say 1-20%) of the leading singular values and vectors. We develop a high-performance implementation of QDWHpartial-SVD (https://github.com/ecrc/qdwhpartial-svd) on distributed-memory manycore systems and demonstrate its numerical robustness. We perform a benchmarking campaign against counterparts from state-of-the-art numerical libraries across various matrix sizes using up to 36K MPI processes. Experimental results show speedups for QDWHpartial-SVD of up to 6X and 2X against the vendor-optimized PDGESVD from ScaLAPACK and KSVD, respectively, on a Cray XC40 system using 1152 nodes, each with two-socket 16-core Intel Haswell CPUs. We also port our QDWHpartial-SVD software library to a system composed of 256 nodes with two-socket 64-core AMD EPYC Milan CPUs and achieve speedups of up to 4X compared to the vendor-optimized PDGESVD from ScaLAPACK. We also compare the energy consumption of the two algorithms and demonstrate how QDWHpartial-SVD can further outperform PDGESVD in that regard by performing fewer memory-bound operations.
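
For readers unfamiliar with QDWH, the following is a minimal single-node NumPy sketch of the general idea the abstract builds on: compute the polar factor of A with the QR-based dynamically weighted Halley iteration, then obtain singular triplets from the symmetric factor. It is not the authors' QDWHpartial-SVD. In particular, it computes the full polar decomposition and simply truncates to the k leading triplets, whereas the paper restricts the rational function to the desired part of the spectrum. The function names `qdwh_polar` and `leading_svd_via_polar`, the tolerance, and the Frobenius-norm scaling bounds are assumptions made for illustration, and the sketch assumes a real matrix with full column rank.

```python
import numpy as np

def qdwh_polar(A, tol=1e-12, max_iter=30):
    """Orthonormal polar factor of a real, full-column-rank A (m >= n).

    Illustrative sketch of the QR-based dynamically weighted Halley
    iteration; names and tolerances are assumptions, not the paper's code.
    """
    m, n = A.shape
    # Scale so that all singular values of X lie in (0, 1]:
    # the Frobenius norm is an upper bound on the 2-norm.
    X = A / np.linalg.norm(A, "fro")
    # Lower bound on the smallest singular value of X via its R factor.
    R = np.linalg.qr(X, mode="r")
    l = 1.0 / np.linalg.norm(np.linalg.inv(R), "fro")
    I = np.eye(n)
    for _ in range(max_iter):
        l2 = l * l
        # Dynamically chosen weights (a, b, c) for the current bound l.
        d = np.cbrt(4.0 * (1.0 - l2) / (l2 * l2))
        a = np.sqrt(1.0 + d) + 0.5 * np.sqrt(
            8.0 - 4.0 * d + 8.0 * (2.0 - l2) / (l2 * np.sqrt(1.0 + d))
        )
        b = (a - 1.0) ** 2 / 4.0
        c = a + b - 1.0
        # QR-based update, equivalent to X (aI + bX^T X)(I + cX^T X)^{-1}.
        Q = np.linalg.qr(np.vstack([np.sqrt(c) * X, I]))[0]
        Q1, Q2 = Q[:m], Q[m:]
        X_new = (b / c) * X + ((a - b / c) / np.sqrt(c)) * (Q1 @ Q2.T)
        # Update the lower bound on the smallest singular value.
        l = min(1.0, l * (a + b * l2) / (1.0 + c * l2))
        done = np.linalg.norm(X_new - X, "fro") <= tol * np.linalg.norm(X_new, "fro")
        X = X_new
        if done:
            break
    return X

def leading_svd_via_polar(A, k):
    """k leading singular triplets of A from its polar decomposition A = Up H."""
    Up = qdwh_polar(A)
    H = Up.T @ A
    H = 0.5 * (H + H.T)          # enforce symmetry against rounding
    w, V = np.linalg.eigh(H)     # eigenvalues of H are singular values of A
    idx = np.argsort(w)[::-1][:k]
    V_k = V[:, idx]
    return Up @ V_k, w[idx], V_k
```

Calling `leading_svd_via_polar(A, k)` on a dense NumPy array returns the left vectors, singular values, and right vectors for the k largest singular values. The sketch favors readability over performance: a distributed implementation such as the one described above would replace the dense NumPy calls with parallel kernels.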