An Empirical Study on Data Balancing in Machine Learning Based Software Traceability Methods

Resource Type: Conference
Authors: Wang, Bangchao; Wang, Zihan; Wan, Hongyan; Li, Xingfu; Deng, Yang
Source: 2023 International Joint Conference on Neural Networks (IJCNN), pp. 1-8, Jun 2023
Subject: Components, Circuits, Devices and Systems
Computing and Processing
Power, Energy and Industry Applications
Robotics and Control Systems
Signal Processing and Analysis
Support vector machines
Analytical models
Costs
Neural networks
Manuals
Maintenance engineering
Software
Machine learning
Data balancing
Software traceability
Software engineering
ISSN: 2161-4407

Abstract

Machine learning (ML) has been widely used in trace link recovery (TLR) to reduce the cost of manually maintaining trace links. However, the imbalanced distribution of valid and invalid links seriously degrades classifier performance. Although a few studies have applied data balancing techniques (DBT) to ML-based TLR, none has systematically analyzed which combinations are more effective. We therefore perform an empirical study with three groups of controlled experiments to explore how combining different ML methods with and without DBT affects TLR efficiency. We compare the performance of supervised and unsupervised ML-based TLR with and without DBT, and then analyze the performance of the ensemble learning model (EM) with DBT on TLR. The experimental results on the seven imbalanced datasets of CoEST indicate that DBT has a positive effect on ML-based TLR. Specifically, the recall of the LR model increased by 0.5517 after combining with most DBTs on EasyClinic(ID-TC), while Tomek-link significantly improves the precision of K-Nearest Neighbor (KNN), Decision Tree (DT), LR, and Support Vector Machine (SVM). The precision of LR increased from 0.5036 to 1.0. BalanceRF is best at increasing recall, reaching 1.0 on four datasets. Moreover, the degree of improvement of ML-based TLR with DBT varies with the size of the dataset and the proportion of valid links.
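
The abstract names Tomek-link as one of the data balancing techniques but does not describe how it is applied. As a rough illustration only, the sketch below shows the general idea of Tomek-link undersampling: two samples of opposite classes that are each other's nearest neighbor form a Tomek link, and the majority-class member (here, an invalid link) is dropped. The feature vectors, the labels, and the function names `nearest_neighbor` and `remove_tomek_links` are hypothetical and are not taken from the paper, and the sketch assumes a binary labeling.

```python
from math import dist


def nearest_neighbor(i, points):
    """Return the index of the point closest to points[i], excluding i itself."""
    return min(
        (j for j in range(len(points)) if j != i),
        key=lambda j: dist(points[i], points[j]),
    )


def remove_tomek_links(points, labels, majority_label):
    """Drop the majority-class member of every Tomek link (binary labels assumed)."""
    nn = [nearest_neighbor(i, points) for i in range(len(points))]
    drop = set()
    for i, j in enumerate(nn):
        # Tomek link: i and j are mutual nearest neighbors with different labels.
        if nn[j] == i and labels[i] != labels[j]:
            drop.add(i if labels[i] == majority_label else j)
    keep = [k for k in range(len(points)) if k not in drop]
    return [points[k] for k in keep], [labels[k] for k in keep]


# Hypothetical 2-D features for candidate trace links:
# label 1 = valid link (minority), label 0 = invalid link (majority).
points = [
    (0.0, 0.0), (0.2, 0.1),              # valid links
    (0.25, 0.1),                         # invalid link sitting next to a valid one
    (2.0, 2.0), (2.1, 2.0), (2.0, 2.2),  # invalid links
    (3.0, 3.0),                          # invalid link
]
labels = [1, 1, 0, 0, 0, 0, 0]

kept_points, kept_labels = remove_tomek_links(points, labels, majority_label=0)
print(len(points), "->", len(kept_points), "samples")
```

Removing only the borderline invalid link cleans the class boundary without touching the bulk of the majority class, which is one plausible reason a classifier trained afterwards could become more precise. Whether that explains the precision gains reported here is not stated in the abstract.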
