Using the in-network computation capabilities of network devices (also known as hardware collectives) to optimize MPI collectives has become popular in high-performance computing and offers significant performance advantages. However, hardware collectives are not flawless in practice; one problem is that they are difficult to use. To obtain their performance advantage, the network management software needs to create a dedicated aggregate tree for each MPI communicator, which is a complicated task. One workaround is to have MPI communicators share the global, imprecise aggregate trees created by the management software during network initialization, but this leads to heavy interference between MPI communicators and causes significant performance degradation. A tradeoff must therefore be made between performance and ease of use. We propose a hybrid approach that optimizes MPI collectives with both in-network computation and point-to-point messages. On the one hand, we use the pre-created aggregate trees within each super-node rather than asking the network management software to create dedicated aggregate trees. On the other hand, hardware-collective traffic is confined to the local super-node, so it cannot disturb jobs running on other super-nodes. We provide a cost model to evaluate the overhead of the hybrid collective algorithms and evaluate their performance on the new-generation Sunway supercomputer. The results show that our approach reduces median latency by 18%~74% compared with collectives implemented with point-to-point messages, while the performance decreases only slightly compared with the original hardware collectives. In addition, the tail latency of our approach is significantly lower than that of the original hardware collectives in the presence of heavy interference.
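
To make the hierarchical structure of the hybrid approach concrete, the following is a minimal sketch, not the paper's implementation: the intra-super-node step, which in the real system would run over the pre-created hardware aggregate tree, is only modeled here with MPI_Allreduce on a sub-communicator, and names such as super_node_id and the assumption of 256 ranks per super-node are illustrative, not part of the Sunway software stack.

/*
 * Hypothetical sketch of a hierarchical (super-node-aware) allreduce.
 * Step 1 stands in for the intra-super-node hardware collective;
 * steps 2 and 3 use ordinary point-to-point-backed MPI collectives
 * across super-nodes, mirroring the hybrid scheme described above.
 */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int world_rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);

    /* Illustrative assumption: 256 ranks per super-node. */
    const int ranks_per_super_node = 256;
    int super_node_id = world_rank / ranks_per_super_node;

    /* Communicator of ranks inside the local super-node. */
    MPI_Comm intra_comm;
    MPI_Comm_split(MPI_COMM_WORLD, super_node_id, world_rank, &intra_comm);
    int intra_rank;
    MPI_Comm_rank(intra_comm, &intra_rank);

    /* Communicator of the super-node leaders (intra_rank == 0). */
    MPI_Comm inter_comm;
    MPI_Comm_split(MPI_COMM_WORLD, intra_rank == 0 ? 0 : MPI_UNDEFINED,
                   world_rank, &inter_comm);

    double local_value = (double)world_rank, partial = 0.0, global = 0.0;

    /* Step 1: reduce within the super-node (stand-in for the hardware
     * collective over the pre-created aggregate tree). */
    MPI_Allreduce(&local_value, &partial, 1, MPI_DOUBLE, MPI_SUM, intra_comm);

    /* Step 2: leaders combine partial results across super-nodes with
     * point-to-point-based collectives, keeping hardware-collective
     * traffic confined to each super-node. */
    if (inter_comm != MPI_COMM_NULL) {
        MPI_Allreduce(&partial, &global, 1, MPI_DOUBLE, MPI_SUM, inter_comm);
        MPI_Comm_free(&inter_comm);
    }

    /* Step 3: broadcast the final result back inside each super-node. */
    MPI_Bcast(&global, 1, MPI_DOUBLE, 0, intra_comm);

    if (world_rank == 0)
        printf("global sum = %f\n", global);

    MPI_Comm_free(&intra_comm);
    MPI_Finalize();
    return 0;
}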