Attention-based models such as Transformers represent the state of the art for a wide range of machine learning (ML) tasks. Their superior accuracy, however, comes at the cost of substantial memory requirements and limited data reuse. Processing in Memory (PIM) is a promising approach to accelerating Transformer models thanks to its massive parallelism, low data movement cost, and high memory bandwidth utilization. Existing PIM accelerators, though, lack support for algorithmic optimizations such as dynamic token pruning that can significantly improve the efficiency of Transformers. We identify two challenges to enabling dynamic token pruning on PIM-based architectures: the lack of an in-memory top-k token selection mechanism and the memory underutilization caused by pruning. To address these challenges, we propose PRIMATE, a software-hardware co-designed PIM framework based on High Bandwidth Memory (HBM). We introduce minor hardware modifications to conventional HBM to enable Transformer model computation and top-k selection. On the software side, we introduce a pipelined mapping scheme and an optimization framework that maximize throughput and efficiency. PRIMATE achieves a $30.6\times$ improvement in throughput, a $29.5\times$ improvement in space efficiency, and $4.3\times$ higher energy efficiency compared to the current state-of-the-art PIM accelerator for Transformers.
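For readers unfamiliar with the optimization, the sketch below illustrates dynamic top-k token pruning in plain NumPy. The scoring heuristic (cumulative attention each token receives) and the function name `topk_token_pruning` are illustrative assumptions for exposition only; the paper's contribution is performing this selection inside HBM, which this host-side sketch does not model.

```python
import numpy as np

def topk_token_pruning(hidden_states, attn_probs, k):
    """Keep the k most important tokens and drop the rest.

    hidden_states: (seq_len, d_model) token representations
    attn_probs:    (num_heads, seq_len, seq_len) attention probabilities
    k:             number of tokens to retain

    NOTE: scoring tokens by total attention received is one common
    heuristic, assumed here for illustration.
    """
    # Importance of token j = attention it receives, summed over
    # heads and query positions.
    importance = attn_probs.sum(axis=(0, 1))      # shape: (seq_len,)
    # Indices of the k highest-scoring tokens, restored to original order.
    keep = np.sort(np.argsort(importance)[-k:])
    return hidden_states[keep], keep

# Toy usage: 8 tokens, 2 heads, keep the top 4.
rng = np.random.default_rng(0)
h = rng.standard_normal((8, 16))
a = rng.random((2, 8, 8))
a /= a.sum(axis=-1, keepdims=True)                # normalize attention rows
pruned, kept = topk_token_pruning(h, a, k=4)
print(kept, pruned.shape)                         # e.g. [0 2 5 7] (4, 16)
```

After pruning, downstream layers operate on only k tokens, which is what reduces compute and memory traffic, and also what leaves previously allocated memory idle, the underutilization problem the abstract refers to.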