Open-domain Visual Entity Recognition: Towards Recognizing Millions of Wikipedia Entities

Resource Type: Conference
Authors: Hu, Hexiang; Luan, Yi; Chen, Yang; Khandelwal, Urvashi; Joshi, Mandar; Lee, Kenton; Toutanova, Kristina; Chang, Ming-Wei
Source: 2023 IEEE/CVF International Conference on Computer Vision (ICCV), pp. 12031-12041, Oct. 2023
Subject: Computing and Processing; Signal Processing and Analysis; Visualization; Image recognition; Text recognition; Computational modeling; Encyclopedias; Tail; Benchmark testing; Language
ISSN: 2380-7504

Abstract

Large-scale multi-modal pre-training models such as CLIP [30] and PaLI [8] exhibit strong generalization on various visual domains and tasks. However, existing image classification benchmarks often evaluate recognition on a specific domain (e.g., outdoor images) or a specific task (e.g., classifying plant species), which falls short of evaluating whether pre-trained foundational models are universal visual recognizers. To address this, we formally present the task of Open-domain Visual Entity recognitioN (Oven), where a model needs to link an image to a Wikipedia entity with respect to a text query. We construct Oven-Wiki by repurposing 14 existing datasets with all labels grounded onto a single label space: Wikipedia entities. Oven-Wiki challenges models to select among six million possible Wikipedia entities, making it a general visual recognition benchmark with the largest number of labels. Our study on state-of-the-art pre-trained models reveals large headroom in generalizing to the massive-scale label space. We show that a PaLI-based auto-regressive visual recognition model performs surprisingly well, even on Wikipedia entities that have never been seen during fine-tuning. We also find that existing pre-trained models have different strengths: while PaLI-based models obtain higher overall performance, CLIP-based models are better at recognizing tail entities.
