Optimized Mappings for Symmetric Range-Limited Molecular Force Calculations on FPGAs

Resource Type: Conference
Authors: Wu, Chunshu; Bandara, Sahan; Geng, Tong; Guo, Anqi; Haghi, Pouya; Sachdeva, Vipin; Sherman, Woody; Herbordt, Martin
Source: 2022 32nd International Conference on Field-Programmable Logic and Applications (FPL), pp. 101-108, Aug. 2022
Subject: Computing and Processing; Degradation; Program processors; Filtering; Force; Distributed databases; Bandwidth; Data transfer; high-performance computing; FPGA; molecular dynamics
Language
ISSN: 1946-1488

Online Access

Full Text (IEEE)

Abstract

In N-body applications, the efficient evaluation of range-limited forces depends on applying certain constraints, including a cut-off radius and force symmetry (Newton's Third Law). When computing the pair-wise forces in parallel, finding the optimal mapping of particles and computations to memories and processors is surprisingly challenging, but can result in greatly reduced data movement and computation. Although FPGAs have a compute model (BRAMs/network/pipelines) distinct from that of CPUs and ASICs, mappings on FPGAs have not previously been studied in depth: it was thought that the half-shell method was preferred. In this work, we find that the Manhattan method is surprisingly compatible with FPGA hardware. With the cache overlapping technique proposed in this paper, the ultra-fine-grained data access demanded by the Manhattan method can be satisfied, despite the fact that the memory blocks on FPGAs appear to be insufficiently fine-grained. We further demonstrate that, compared to the traditional baseline half-shell method, approximately half of the filters (preprocessors) can be removed without performance degradation. For communication, the amount of data transferred can be reduced by 40% to 75% in the most common multi-FPGA scenarios. Moreover, data transfers are almost perfectly balanced along all directions, and the optimization requires only minimal hardware resources. The practical consequence is that nearly 2x to 4x the workload can be handled without upgrading the network connections between FPGAs. This is a critical finding given the relatively limited bandwidth available in many common accelerator boards and the strong-scaling applications to which FPGA clusters are being applied.
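The two constraints named in the abstract, a cut-off radius and force symmetry, can be illustrated with a minimal software sketch. This is not the authors' FPGA design and it does not implement the Manhattan method. It only shows, under stated assumptions, how Newton's Third Law halves the number of pair evaluations and how a half-shell scheme selects 13 of the 26 neighboring cells so that each cell pair is visited once. The Lennard-Jones force law, the function names, and the absence of periodic boundaries are all assumptions made for illustration.

```python
import itertools


def half_shell_offsets():
    """Pick 13 of the 26 neighbor-cell offsets (the lexicographically
    positive ones), so that exactly one of d and -d is selected."""
    return [d for d in itertools.product((-1, 0, 1), repeat=3) if d > (0, 0, 0)]


def lj_force(rij, r2, epsilon=1.0, sigma=1.0):
    """Lennard-Jones force on particle i due to j, where rij = ri - rj.
    The force law is an assumption; the abstract does not name one."""
    s6 = (sigma * sigma / r2) ** 3
    scale = 24.0 * epsilon * (2.0 * s6 * s6 - s6) / r2
    return tuple(scale * c for c in rij)


def forces_symmetric(positions, cutoff):
    """Evaluate each unordered pair once and apply the result to both
    particles with opposite sign (Newton's Third Law)."""
    n = len(positions)
    forces = [[0.0, 0.0, 0.0] for _ in range(n)]
    evaluations = 0
    for i in range(n):
        for j in range(i + 1, n):
            rij = tuple(a - b for a, b in zip(positions[i], positions[j]))
            r2 = sum(c * c for c in rij)
            if r2 > cutoff * cutoff:
                continue  # outside the cut-off radius: pair is filtered out
            evaluations += 1
            f = lj_force(rij, r2)
            for k in range(3):
                forces[i][k] += f[k]
                forces[j][k] -= f[k]
    return forces, evaluations


def forces_full(positions, cutoff):
    """Reference version without symmetry: every ordered pair is evaluated."""
    n = len(positions)
    forces = [[0.0, 0.0, 0.0] for _ in range(n)]
    evaluations = 0
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            rij = tuple(a - b for a, b in zip(positions[i], positions[j]))
            r2 = sum(c * c for c in rij)
            if r2 > cutoff * cutoff:
                continue
            evaluations += 1
            f = lj_force(rij, r2)
            for k in range(3):
                forces[i][k] += f[k]
    return forces, evaluations


# Hypothetical sample data: three nearby particles and one far away.
positions = [(0.0, 0.0, 0.0), (1.1, 0.0, 0.0), (0.0, 1.2, 0.0), (3.0, 3.0, 3.0)]
sym_forces, sym_evals = forces_symmetric(positions, cutoff=2.0)
full_forces, full_evals = forces_full(positions, cutoff=2.0)
```

Both routines produce the same forces, but the symmetric version performs half as many pair evaluations. In a parallel setting the open question, which the paper addresses for FPGAs, is where each of those single evaluations should be computed and which particle data must be moved to make that possible.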
