A Dictionary-based Oversampling Approach to Clinical Document Classification on Small and Imbalanced Dataset

Resource Type: Conference
Authors: Abdollahi, Mahdi; Gao, Xiaoying; Mei, Yi; Ghosh, Shameek; Li, Jinyan
Source: 2020 IEEE/WIC/ACM International Joint Conference on Web Intelligence and Intelligent Agent Technology (WI-IAT), pp. 357-364, Dec. 2020
Subject: Communication, Networking and Broadcast Technologies
Components, Circuits, Devices and Systems
Computing and Processing
Geoscience
Power, Energy and Industry Applications
Robotics and Control Systems
Signal Processing and Analysis
Transportation
Obesity
Machine learning
Probabilistic logic
Biology
Intelligent agents
Informatics
Diseases
Oversampling
Medical Document Classification
WordNet
Language

Abstract

Medical document classification is a prominent research problem in the document classification domain. Because medical discharge notes are collected from real patients, the resulting datasets are often imbalanced. Moreover, they are usually too small for data-hungry models, especially for rare diseases. Both issues can lead to poor classification performance. This work proposes a new probabilistic dictionary-based data augmentation approach that addresses these issues by oversampling the minority class. The method creates new, highly varied documents by using synonyms extracted from WordNet, taking into account each synonym's similarity to the original word. To verify the effectiveness of the proposed oversampling approach, three different machine learning methods are used to learn classifiers from the augmented clinical text datasets it generates. The experimental results show that the proposed method not only yields better classification accuracy than training on the original imbalanced dataset, but can also outperform some existing augmentation methods on the dataset of the 2008 Integrating Informatics with Biology and the Bedside (I2B2) obesity challenge.
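The abstract does not give the algorithm's details, so the following is only a minimal sketch of one plausible reading of similarity-aware synonym oversampling, not the authors' implementation. The small `SYNONYMS` table is a hypothetical stand-in for WordNet lookups, and the function names, the `replace_prob` parameter, and the choice to sample synonyms in proportion to their similarity scores are all assumptions made for illustration.

```python
import random

# Hypothetical stand-in for WordNet: word -> {synonym: similarity score in (0, 1]}.
# A real pipeline would look these up in WordNet instead of hard-coding them.
SYNONYMS = {
    "obese": {"overweight": 0.8, "corpulent": 0.6},
    "pain": {"ache": 0.9, "hurting": 0.5},
    "patient": {"case": 0.4},
}


def augment(tokens, synonyms, replace_prob, rng):
    """Copy a tokenized document, swapping some words for weighted synonyms."""
    out = []
    for tok in tokens:
        candidates = synonyms.get(tok.lower())
        if candidates and rng.random() < replace_prob:
            words = list(candidates)
            weights = [candidates[w] for w in words]
            # Assumption: more similar synonyms are proportionally more likely.
            out.append(rng.choices(words, weights=weights, k=1)[0])
        else:
            out.append(tok)
    return out


def oversample(docs, labels, minority_label, synonyms, replace_prob=0.5, seed=0):
    """Append synthetic minority documents until both classes have equal size."""
    rng = random.Random(seed)
    minority = [d for d, y in zip(docs, labels) if y == minority_label]
    if not minority:
        raise ValueError("no documents carry the minority label")
    # Binary setting assumed: every other document counts as the majority class.
    n_needed = (len(docs) - len(minority)) - len(minority)
    new_docs, new_labels = list(docs), list(labels)
    for _ in range(max(0, n_needed)):
        source = rng.choice(minority)
        new_docs.append(augment(source, synonyms, replace_prob, rng))
        new_labels.append(minority_label)
    return new_docs, new_labels
```

With one minority document and three majority documents, `oversample(docs, labels, 1, SYNONYMS)` would append two synthetic minority documents, each a copy of the original with some words replaced by synonyms. The augmented set can then be passed to any classifier in place of the imbalanced one.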
