WikiWeb2M: A Page-Level Multimodal Wikipedia Dataset

Resource Type: Working Paper
Authors: Burns, Andrea; Srinivasan, Krishna; Ainslie, Joshua; Brown, Geoff; Plummer, Bryan A.; Saenko, Kate; Ni, Jianmo; Guo, Mandy
Subject: Computer Science - Computation and Language
Computer Science - Computer Vision and Pattern Recognition

Abstract

Webpages have been a rich resource for language and vision-language tasks. Yet only pieces of webpages are kept: image-caption pairs, long text articles, or raw HTML, never all in one place. As a result, webpage tasks have received little attention and structured image-text data has been underused. To study multimodal webpage understanding, we introduce the Wikipedia Webpage 2M (WikiWeb2M) suite, the first to retain the full set of images, text, and structure data available in a page. WikiWeb2M can be used for tasks like page description generation, section summarization, and contextual image captioning.
Comment: Accepted at the WikiWorkshop 2023. Data is readily available at https://github.com/google-research-datasets/wit/blob/main/wikiweb2m.md. arXiv admin note: text overlap with arXiv:2305.03668
