Toxic comment classification is a core natural language processing task for combating toxic content online. It follows the supervised learning paradigm, which requires labelled data for training, and a large amount of high-quality training data is empirically beneficial to model performance. Transferring a pre-trained language model (PLM) to a downstream task allows the downstream model to benefit from far more data without creating new labelled data. Despite growing research on PLMs across NLP tasks, there remains a fundamental lack of understanding of how to apply PLMs to toxic comment classification. This work addresses this gap from three perspectives.

First, we investigate different transfer strategies for toxic comment classification and highlight the importance of transfer efficiency: an efficient transfer keeps the computational requirements reasonable while achieving comparable model performance. To this end, we explore in-domain continued pre-training, which further pre-trains a PLM on an in-domain corpus, and we compare different PLMs and different settings for this continued pre-training.

Second, we investigate the limitations of PLMs for toxic comment classification. Taking the most popular PLM, BERT, as the representative model for our study, we focus on identity term bias, i.e. prediction bias towards comments containing identity terms such as "Muslim" and "Black". To investigate this bias, we conduct both quantitative and qualitative analyses and study the model's explanations. We also propose a hypothesis that relates identity term bias to the subjectivity of comments.

Third, building on this hypothesis, we propose a novel BERT-based model to mitigate identity term bias. Unlike previous methods, which try to suppress the model's attention to identity terms, our method injects the subjectivity of a comment into the model together with an indication of the presence of identity terms. It yields consistent improvements across a range of toxic comment classification tasks.
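The continued pre-training in-domain mentioned above typically reuses the PLM's original masked language modelling (MLM) objective on an unlabelled in-domain corpus. As a minimal illustration (not the exact setup of this work), the standard BERT-style masking procedure can be sketched in plain Python; the toy vocabulary and example sentence are hypothetical:

```python
import random

MASK = "[MASK]"
VOCAB = ["community", "people", "hate", "love", "online", "comment"]  # toy vocabulary
IGNORE = -100  # label for positions that do not contribute to the MLM loss

def mask_tokens(tokens, mask_prob=0.15, rng=None):
    """BERT-style MLM masking: each token is selected with probability
    mask_prob; of the selected positions, 80% become [MASK], 10% become a
    random vocabulary token, and 10% stay unchanged."""
    rng = rng or random.Random()
    inputs, labels = [], []
    for tok in tokens:
        if rng.random() < mask_prob:
            labels.append(tok)  # the model must predict the original token here
            r = rng.random()
            if r < 0.8:
                inputs.append(MASK)          # 80%: replace with [MASK]
            elif r < 0.9:
                inputs.append(rng.choice(VOCAB))  # 10%: random token
            else:
                inputs.append(tok)           # 10%: keep the original token
        else:
            inputs.append(tok)
            labels.append(IGNORE)            # unselected positions are ignored
    return inputs, labels

# Example: corrupt a short in-domain comment for one MLM training step.
rng = random.Random(0)
toks = "online comments can contain toxic language about identity groups".split()
inp, lab = mask_tokens(toks, rng=rng)
```

Continued pre-training then simply runs gradient steps on this corrupted-input/label pair with the PLM's existing MLM head before fine-tuning on the labelled toxicity data.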
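One common quantitative probe of identity term bias (illustrative only; not necessarily the exact metric used in this work) compares the false positive rate on non-toxic comments that mention an identity term against those that do not. A sketch, with a hypothetical term list and toy data:

```python
IDENTITY_TERMS = {"muslim", "black", "gay", "jewish"}  # illustrative subset

def fpr(preds, labels):
    """False positive rate: flagged-as-toxic among truly non-toxic comments."""
    fp = sum(p and not y for p, y in zip(preds, labels))
    neg = sum(not y for y in labels)
    return fp / neg if neg else 0.0

def identity_fpr_gap(comments, preds, labels):
    """FPR on identity-mentioning comments minus FPR on the rest; a positive
    gap means the model over-flags comments that mention identity terms."""
    has_id = [any(w in IDENTITY_TERMS for w in c.lower().split()) for c in comments]
    def fpr_of(flag):
        pairs = [(p, y) for p, y, h in zip(preds, labels, has_id) if h == flag]
        return fpr([p for p, _ in pairs], [y for _, y in pairs])
    return fpr_of(True) - fpr_of(False)

# Toy example of a biased classifier that flags every identity mention:
comments = ["i am muslim", "nice weather today", "black people are great", "you are awful"]
labels = [False, False, False, True]   # gold toxicity
preds = [True, False, True, True]      # predictions
gap = identity_fpr_gap(comments, preds, labels)  # both non-toxic identity comments flagged
```

A gap near zero would indicate the classifier's error rate does not depend on the mere presence of an identity term.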