Parameter-efficient fine-tuning of large-scale pre-trained language models

Resource Type: Original Paper
Authors: Ding, Ning; Qin, Yujia; Yang, Guang; Wei, Fuchao; Yang, Zonghan; Su, Yusheng; Hu, Shengding; Chen, Yulin; Chan, Chi-Min; Chen, Weize; Yi, Jing; Zhao, Weilin; Wang, Xiaozhi; Liu, Zhiyuan; Zheng, Hai-Tao; Chen, Jianfei; Liu, Yang; Tang, Jie; Li, Juanzi; Sun, Maosong
Source: Nature Machine Intelligence. 5(3):220-235
Language: English
ISSN: 2522-5839

Abstract

With the prevalence of pre-trained language models (PLMs) and the pre-training–fine-tuning paradigm, it has been continuously shown that larger models tend to yield better performance. However, as PLMs scale up, fine-tuning and storing all the parameters is prohibitively costly and eventually becomes practically infeasible. This necessitates a new branch of research focusing on the parameter-efficient adaptation of PLMs, which optimizes a small portion of the model parameters while keeping the rest fixed, drastically cutting down computation and storage costs. In general, it demonstrates that large-scale models could be effectively stimulated by the optimization of a few parameters. Despite the various designs, here we discuss and analyse the approaches under a more consistent and accessible term, ‘delta-tuning’, where ‘delta’, a mathematical notation often used to denote changes, is borrowed to refer to the portion of parameters that are ‘changed’ during training. We formally describe the problem and propose a unified categorization criterion for existing delta-tuning methods to explore their correlations and differences. We also discuss the theoretical principles underlying the effectiveness of delta-tuning and interpret them from the perspectives of optimization and optimal control. Furthermore, we provide a holistic empirical study on over 100 natural language processing tasks and investigate various aspects of delta-tuning. With comprehensive study and analysis, our research demonstrates the theoretical and practical properties of delta-tuning in the adaptation of PLMs.
Training a deep neural network can be costly but training time is reduced when a pre-trained network can be adapted to different use cases. Ideally, only a small number of parameters needs to be changed in this process of fine-tuning, which can then be more easily distributed. In this Analysis, different methods of fine-tuning with only a small number of parameters are compared on a large set of natural language processing tasks.
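
The core idea described in the abstract, optimizing a small portion of parameters while keeping the rest fixed, can be illustrated with a toy sketch. This is not the paper's code or any of the specific methods it surveys: the linear model, the choice of a single bias term as the "delta", and the names `frozen_w`, `delta_b`, `predict` and `tune_delta` are all assumptions made for illustration.

```python
# Toy sketch of the delta-tuning idea: freeze the "pre-trained" weights
# and optimise only a small set of parameters (here, a single bias term).
# All names and numbers are hypothetical, chosen for illustration only.

def predict(frozen_w, delta_b, x):
    """Linear model: frozen weights plus one trainable bias offset."""
    return sum(w * xi for w, xi in zip(frozen_w, x)) + delta_b

def tune_delta(frozen_w, data, lr=0.1, steps=200):
    """Gradient descent on mean squared error, updating only delta_b."""
    delta_b = 0.0
    for _ in range(steps):
        # d/d(delta_b) of mean((pred - y)^2) = mean(2 * (pred - y))
        grad = sum(2 * (predict(frozen_w, delta_b, x) - y) for x, y in data) / len(data)
        delta_b -= lr * grad
    return delta_b

frozen_w = [0.5, -1.0, 2.0]  # stands in for pre-trained parameters

# Hypothetical downstream task: outputs of the frozen model shifted by +3.0
data = [
    ([1.0, 0.0, 1.0], 5.5),
    ([0.0, 1.0, 1.0], 4.0),
    ([2.0, 1.0, 0.0], 3.0),
]

weights_before = list(frozen_w)
delta_b = tune_delta(frozen_w, data)

trainable, total = 1, len(frozen_w) + 1
print(f"trainable parameters: {trainable} of {total}")
print(f"learned delta_b: {delta_b:.4f}")
```

Only `delta_b` needs to be stored per task in this sketch; the frozen weights are shared and never modified, which is the storage saving the abstract refers to, shown here at toy scale.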
