A K-means Improved CTGAN Oversampling Method for Data Imbalance Problem

Resource Type: Conference
Authors: An, Chunsheng; Sun, Jingtong; Wang, Yifeng; Wei, Qingjie
Source: 2021 IEEE 21st International Conference on Software Quality, Reliability and Security (QRS), pp. 883-887, Dec 2021
Subject: Computing and Processing; Measurement; Data privacy; Conferences; Software algorithms; Clustering algorithms; Software quality; Probability distribution; K-means; CTGAN; oversampling; data imbalance
ISSN: 2693-9177

Abstract

CTGAN is a tabular data synthesis method designed for privacy preservation; this paper applies it to the data imbalance problem. Oversampling with CTGAN alone can leave the synthesized minority-class examples unevenly distributed, so the paper proposes a method for imbalanced datasets that combines K-means clustering with CTGAN. Experiments with the LightGBM algorithm on home loan and online shopping datasets show that the CTGAN method achieves better F1-score and G-mean than interpolation-based oversampling techniques, represented by SMOTE. These results indicate that applying the proposed method to an imbalanced dataset yields a dataset with more examples, a more uniform distribution, and less overfitting, while still following the original dataset's probability distribution.
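
The two metrics named in the abstract can be computed from a binary confusion matrix. The following is a minimal sketch in plain Python, not code from the paper. It assumes the usual definitions (F1 as the harmonic mean of precision and recall, G-mean as the geometric mean of sensitivity and specificity); the function names and the toy labels are hypothetical.

```python
import math


def confusion_counts(y_true, y_pred, positive=1):
    """Count TP, FP, TN, FN, treating `positive` as the minority class."""
    tp = fp = tn = fn = 0
    for t, p in zip(y_true, y_pred):
        if t == positive and p == positive:
            tp += 1
        elif t != positive and p == positive:
            fp += 1
        elif t != positive and p != positive:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def f1_score(y_true, y_pred, positive=1):
    """F1 = 2*TP / (2*TP + FP + FN); returns 0.0 when undefined."""
    tp, fp, _, fn = confusion_counts(y_true, y_pred, positive)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def g_mean(y_true, y_pred, positive=1):
    """G-mean = sqrt(sensitivity * specificity)."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred, positive)
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return math.sqrt(sensitivity * specificity)


# Toy imbalanced labels (hypothetical): 2 minority examples, 8 majority.
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]

# A classifier that always predicts the majority class is 80% accurate,
# yet both metrics are 0.0 because no minority example is found.
always_majority = [0] * 10

# One minority example found, one missed, one false alarm.
y_pred = [1, 0, 0, 0, 0, 0, 0, 0, 0, 1]
```

Both metrics are zero whenever no minority example is recovered, which is why they are more informative than plain accuracy for comparing oversampling methods on imbalanced data.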
