Why Is Prompt Tuning for Vision-Language Models Robust to Noisy Labels?

Resource Type: Conference
Authors: Wu, Cheng-En; Tian, Yu; Yu, Haichao; Wang, Heng; Morgado, Pedro; Hu, Yu Hen; Yang, Linjie
Source: 2023 IEEE/CVF International Conference on Computer Vision (ICCV), pp. 15442-15451, Oct. 2023
Subject: Computing and Processing
Signal Processing and Analysis
Knowledge engineering
Adaptation models
Training data
Robustness
Data models
Noise robustness
Noise measurement
Language
ISSN: 2380-7504

Online Access

Full Text (IEEE)

Abstract

Vision-language models such as CLIP [27] learn a generic text-image embedding from large-scale training data. A vision-language model can be adapted to a new classification task through few-shot prompt tuning. We find that such a prompt tuning process is highly robust to label noise. This motivates us to study the key reasons behind the robustness of the prompt tuning paradigm. We conduct extensive experiments to explore this property and find that the key factors are: 1) the fixed classname tokens provide strong regularization for the optimization of the model, reducing the gradients induced by noisy samples; 2) the powerful pre-trained image-text embedding, learned from diverse and generic web data, provides strong prior knowledge for image classification. Further, we demonstrate that noisy zero-shot predictions from CLIP can be used to tune its own prompt, significantly enhancing prediction accuracy in the unsupervised setting. The code is available at https://github.com/CEWu/PTNL.
