As large language models (LLMs) rapidly advance, rigorous testing and evaluation of these models become increasingly crucial. To address this need, we developed three types of questions: Chinese contextual, English contextual, and language-context-independent. Testing in both Chinese and English probes the LLMs' hallucination tendencies. We investigate the impact of language on hallucination from two perspectives: the language of the input prompt and the cultural context underlying the prompt's content. Additionally, 52 multi-domain single-choice questions from C-EVAL are presented in both original and randomized order to assess robustness to perturbations. Among the five LLMs tested, GPT-4 demonstrates the strongest anti-hallucination and robustness capabilities, answering with greater accuracy, consistency, and reliability. ChatGLM ranks second overall and outperforms GPT-4 on Chinese context-dependent questions. Emergent testing phenomena are analyzed from the user's perspective, hallucinated responses are categorized, and potential causal factors behind hallucination and fragility are examined. Based on these findings, viable avenues for improvement are proposed.
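The order-perturbation protocol above can be sketched as follows. This is a minimal illustrative sketch, not the paper's actual evaluation code: the function names and the simple answer-consistency metric are our assumptions.

```python
import random


def shuffled_order(n, seed=0):
    """Return a random permutation of question indices 0..n-1.

    Re-presenting the same single-choice questions in randomized order is a
    simple perturbation for probing robustness: a robust model should answer
    consistently regardless of presentation order.
    """
    order = list(range(n))
    random.Random(seed).shuffle(order)
    return order


def consistency_rate(answers_original, answers_shuffled, order):
    """Fraction of questions answered identically under both presentations.

    answers_shuffled[i] is the model's answer to the question whose original
    index is order[i].
    """
    matches = sum(
        answers_original[order[i]] == answers_shuffled[i]
        for i in range(len(order))
    )
    return matches / len(order)
```

A perfectly robust model would score 1.0; any drop indicates sensitivity to presentation order rather than to question content.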