SMOTified-GAN for Class Imbalanced Pattern Classification Problems
- Resource Type
- Periodical
- Authors
- Sharma, A.; Singh, P.K.; Chandra, R.
- Source
- IEEE Access. 10:30655-30665, 2022
- Subject
- Aerospace
Bioengineering
Communication, Networking and Broadcast Technologies
Components, Circuits, Devices and Systems
Computing and Processing
Engineered Materials, Dielectrics and Plasmas
Engineering Profession
Fields, Waves and Electromagnetics
General Topics for Engineers
Geoscience
Nuclear Engineering
Photonics and Electrooptics
Power, Energy and Industry Applications
Robotics and Control Systems
Signal Processing and Analysis
Transportation
Generative adversarial networks
Training
Interpolation
Generators
Costs
Prediction algorithms
Training data
Generative adversarial network (GAN)
synthetic minority over-sampling technique (SMOTE)
SMOTified-GAN
class imbalance problem
- Language
- ISSN
- 2169-3536
Class imbalance in a dataset is a major problem for classifiers: it results in poor prediction, with a high true positive rate (TPR) but a low true negative rate (TNR) when the training dataset is majority positive. Generally, the pre-processing technique of oversampling the minority class(es) is used to overcome this deficiency. Our focus is on hybridizing the Generative Adversarial Network (GAN) and the Synthetic Minority Over-Sampling Technique (SMOTE) to address class imbalance problems. We propose a novel two-phase oversampling approach involving knowledge transfer that combines the strengths of SMOTE and GAN. The unrealistic or overgeneralized samples produced by SMOTE are transformed by the GAN into a realistic data distribution in cases where there is not enough minority class data for the GAN to process effectively by itself. We name the approach SMOTified-GAN because the GAN works on pre-sampled minority data produced by SMOTE rather than generating the samples from random input itself. The experimental results show that the sample quality of the minority class(es) is improved on a variety of benchmark datasets. The F1-score improves by up to 9% over the next best algorithm tested. The time complexity is also reasonable, at around $O(N^{2}d^{2}T)$ for a sequential algorithm.
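The first phase described in the abstract, SMOTE pre-sampling of the minority class, can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `smote_samples`, the parameters `k` and `seed`, and the toy data are assumptions made for illustration, and the second phase (training a GAN whose generator receives these pre-sampled points in place of random noise) is not shown.

```python
import math
import random


def smote_samples(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points by interpolating between a randomly
    chosen minority point and one of its k nearest minority neighbours.
    (Hypothetical helper; names and defaults are illustrative only.)"""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority points by Euclidean distance to base.
        others = [p for p in minority if p is not base]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        # The new point lies on the segment between base and neighbour.
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Toy minority class with two features (made-up values).
minority = [[1.0, 2.0], [1.5, 1.8], [2.0, 2.2], [1.2, 2.5]]

# In a SMOTified-GAN-style pipeline these points, rather than random
# noise, would be passed to the GAN generator in the second phase.
generator_input = smote_samples(minority, n_new=6)
```

Because every synthetic point is a linear interpolation between two real minority points, it stays inside the region spanned by the minority class; the abstract's argument is that the GAN phase then moves such interpolated points toward a more realistic distribution.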