The abuse of clickbait headlines by online news media has continued to increase, degrading the reader experience and reducing engagement with online news. Since the advent of self-attention models, BERT and its variants have been considered state-of-the-art methods for many natural language processing (NLP) tasks. In this experiment, we focus on one BERT variant, RoBERTa, an improved model over BERT. The goal of this experiment is to compare how well currently available RoBERTa models perform in detecting clickbait in Indonesian news headlines. Using a total of 6632 annotated news headlines sampled from the CLICK-ID dataset, we experimented with several of the Hugging Face community's top-voted Indonesian RoBERTa and BERT language models and compared their performance in classifying Indonesian clickbait headlines. We evaluated the accuracy, precision, recall, and F1 score of each model and found cahya/XLM-RoBERTa-large and indobenchmark/IndoBERT-p1 to be the top-performing models in this experiment, each reaching 92% accuracy. In terms of resources and performance, we recommend indobenchmark/IndoBERT-p1 as the more suitable model; however, XLM-RoBERTa-large has its own merit in producing more consistent output across the validation and unseen sets. In this paper, we present the configuration of each model, along with the exploratory data analysis, the preprocessing applied to the data, and the training performance of each model configuration.
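The evaluation above relies on accuracy, precision, recall, and F1 score. As a brief illustrative sketch (not the paper's actual evaluation code), these four metrics can be computed from binary clickbait predictions as follows; the labels used are hypothetical:

```python
def binary_metrics(y_true, y_pred):
    """Compute accuracy, precision, recall, and F1 for binary labels
    (1 = clickbait, 0 = non-clickbait)."""
    # Count the four confusion-matrix cells.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

    accuracy = (tp + tn) / len(y_true)
    # Guard against division by zero for degenerate prediction sets.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return accuracy, precision, recall, f1

# Toy example with five hypothetical headline labels.
acc, prec, rec, f1 = binary_metrics([1, 0, 1, 1, 0], [1, 0, 0, 1, 1])
```

In practice a library such as scikit-learn provides equivalent metric functions; the sketch simply makes the definitions explicit.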