We propose Speculative Decoding (SpecDec), the first work to formally study exploiting the idea of speculative execution to accelerate autoregressive (AR) decoding. SpecDec has two innovations: Spec-Drafter -- an independent model specially optimized for efficient and accurate drafting -- and Spec-Verification -- a reliable method for efficiently verifying the drafted tokens in the decoding paradigm. Experimental results on various seq2seq tasks, including machine translation and abstractive summarization, show that our approach achieves around a $5\times$ speedup for popular Transformer architectures with generation quality comparable to beam search decoding, overturning the common impression that the draft-then-verify paradigm yields only a $1.4\times$$\sim$$2\times$ speedup. Beyond the remarkable speedup, we demonstrate three additional advantages of SpecDec, revealing its practical value for accelerating generative models in real-world applications. Our models and code are available at https://github.com/hemingkx/SpecDec.
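The draft-then-verify paradigm the abstract refers to can be sketched in a few lines. This is a minimal illustrative sketch, not the paper's actual Spec-Drafter or Spec-Verification: `draft` and `verify_next` are hypothetical stand-in functions for the fast drafter and the target AR model, and verification here is strict greedy matching (the paper's Spec-Verification relaxes this check to accept more drafted tokens).

```python
# Toy sketch of draft-then-verify decoding. `draft` stands in for a fast
# drafter model; `verify_next` stands in for the target AR model's greedy
# next-token choice. Both are hypothetical placeholders, not the paper's models.

def draft(prefix, k):
    """Hypothetical fast drafter: propose k next tokens in one shot."""
    return [(prefix[-1] + i + 1) % 10 for i in range(k)]

def verify_next(prefix):
    """Hypothetical target AR model: the token it would emit after prefix."""
    return (prefix[-1] + 1) % 10

def speculative_decode(prefix, target_len, k=4):
    out = list(prefix)
    while len(out) < target_len:
        candidates = draft(out, k)
        # Check each drafted token against the target model and keep the
        # longest agreeing prefix; at the first mismatch, emit the target
        # model's own token instead, so the output matches pure AR decoding.
        # (In a real system all k positions are verified in one parallel
        # forward pass, which is where the speedup comes from.)
        for tok in candidates:
            expected = verify_next(out)
            out.append(expected)
            if expected != tok or len(out) >= target_len:
                break
    return out[:target_len]
```

With these toy stand-ins the drafter always agrees with the verifier, so each round accepts all `k` drafted tokens; with real models, the acceptance rate of the drafter determines how much of the $5\times$ speedup is realized.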
Comment: $\textbf{v1-v4}$ (Early 2022): Initially announced under the name "Generalized Aggressive Decoding"; $\textbf{v5}$ (September 2022): Renamed to "Speculative Decoding" as the ICLR'23 submission (https://openreview.net/pdf?id=H-VlwsYvVi), marking $\textbf{the first time}$ "Speculative Decoding" was publicly proposed; $\textbf{v6}$: EMNLP'23 Findings camera-ready version.