Academic Paper


Is Your LLM Outdated? Evaluating LLMs at Temporal Generalization

Resource Type: Working Paper
Authors: Zhu, Chenghao; Chen, Nuo; Gao, Yufei; Zhang, Yunyi; Tiwari, Prayag; Wang, Benyou
Subject: Computer Science - Computation and Language; Computer Science - Artificial Intelligence

Abstract

The rapid advancement of Large Language Models (LLMs) highlights the urgent need for evolving evaluation methodologies that keep pace with improvements in language comprehension and information processing. However, traditional benchmarks, which are often static, fail to capture the continually changing information landscape, leading to a disparity between the perceived and actual effectiveness of LLMs in ever-changing real-world scenarios. Our study examines temporal generalization, which includes the ability to understand, predict, and generate text relevant to past, present, and future contexts, revealing significant temporal biases in LLMs. We propose an evaluation framework for dynamically generating benchmarks from recent real-world predictions. Experiments demonstrate that LLMs struggle with temporal generalization, showing a decline in performance over time. These findings highlight the necessity for improved training and updating processes to enhance adaptability and reduce biases. Our code, dataset and benchmark are available at https://github.com/FreedomIntelligence/FreshBench.
Comment: Preprint
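The abstract reports that performance declines over time but does not say here how that decline is measured. The sketch below is one plausible way to quantify it, not the authors' method: it assumes each benchmark question carries the date its real-world outcome was resolved and a flag for whether the model answered correctly, keeps only questions resolved after an assumed training cutoff, buckets accuracy by calendar month, and fits an ordinary least-squares slope. The function names (`after_cutoff`, `monthly_accuracy`, `trend_slope`) and the toy records are hypothetical and are not taken from the FreshBench repository.

```python
from collections import defaultdict
from datetime import date


def after_cutoff(records, cutoff):
    """Keep only (resolution_date, is_correct) records resolved strictly
    after a model's assumed training cutoff (hypothetical helper)."""
    return [r for r in records if r[0] > cutoff]


def monthly_accuracy(records):
    """Group (resolution_date, is_correct) pairs by calendar month and return
    [((year, month), accuracy), ...] sorted chronologically."""
    totals = defaultdict(lambda: [0, 0])  # (year, month) -> [num_correct, num_total]
    for resolved_on, is_correct in records:
        key = (resolved_on.year, resolved_on.month)
        totals[key][0] += int(is_correct)
        totals[key][1] += 1
    return [(key, correct / total) for key, (correct, total) in sorted(totals.items())]


def trend_slope(values):
    """Ordinary least-squares slope of values against their index 0..n-1.
    A negative slope means accuracy falls on more recently resolved questions."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two time buckets to fit a trend")
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    cov = sum((i - mean_x) * (y - mean_y) for i, y in enumerate(values))
    var = sum((i - mean_x) ** 2 for i in range(n))
    return cov / var


if __name__ == "__main__":
    # Toy, made-up records for illustration only; these are NOT results from the paper.
    records = [
        (date(2023, 10, 3), True), (date(2023, 10, 20), True),
        (date(2023, 11, 5), True), (date(2023, 11, 25), False),
        (date(2023, 12, 1), False), (date(2023, 12, 15), False),
    ]
    monthly = monthly_accuracy(after_cutoff(records, date(2023, 9, 30)))
    print(monthly)  # [((2023, 10), 1.0), ((2023, 11), 0.5), ((2023, 12), 0.0)]
    print(trend_slope([acc for _, acc in monthly]))  # -0.5
```

A linear slope is only one summary of "decline over time"; per-month accuracy curves or comparisons across models with different cutoffs would be equally reasonable under the same assumptions.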

DAU Library