IMPROVING PERFORMANCE FOR IMBALANCED DATA CLASSIFICATION USING OVERSAMPLING AND CHARACTERISTICS OF EACH CLUSTER
Authors: Phan Anh Phong, Le Van Thanh
VINH UNIVERSITY JOURNAL OF SCIENCE (VUJS)
Vol. 53, No. 3A
Published: 9/2024
This paper proposes a method for improving classification performance on imbalanced data. Its main contribution is combining the K-means clustering algorithm with the minority oversampling technique VCIR to generate synthetic samples that closely reflect the characteristics of the actual data. Experimental results show that the proposed method performs better on several metrics than popular current methods for handling imbalanced data, such as SMOTE, Borderline-SMOTE, Kmeans-SMOTE, and SVM-SMOTE.
Keywords: Data Classification, Imbalanced Data, Oversampling, K-Means, SMOTE
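
The general idea of clustering the minority class before oversampling can be illustrated with a minimal sketch. The abstract does not describe how VCIR generates samples, so the code below is not the authors' method: it swaps in SMOTE-style linear interpolation between two points of the same cluster as a stand-in. The function names `kmeans` and `oversample_by_cluster`, the parameters, and the toy data are all hypothetical and chosen only for illustration.

```python
import random
from math import dist


def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's algorithm; returns a cluster label for each point."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        # Assign each point to its nearest center.
        labels = [min(range(k), key=lambda c: dist(p, centers[c])) for p in points]
        # Move each center to the mean of its members (keep it if empty).
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:
                centers[c] = tuple(sum(x) / len(members) for x in zip(*members))
    return labels


def oversample_by_cluster(minority, n_new, k=2, seed=0):
    """Create n_new synthetic minority points, interpolating only within a cluster.

    Assumption: linear interpolation stands in for VCIR, which the
    abstract names but does not define.
    """
    rng = random.Random(seed)
    labels = kmeans(minority, k, seed=seed)
    clusters = [[p for p, l in zip(minority, labels) if l == c] for c in range(k)]
    # Only clusters with at least two points can supply an interpolation pair.
    usable = [c for c in clusters if len(c) >= 2]
    if not usable:
        raise ValueError("no cluster has two or more minority samples")
    synthetic = []
    for _ in range(n_new):
        # Pick a cluster with probability proportional to its size.
        cluster = rng.choices(usable, weights=[len(c) for c in usable])[0]
        a, b = rng.sample(cluster, 2)
        t = rng.random()
        synthetic.append(tuple(x + t * (y - x) for x, y in zip(a, b)))
    return synthetic


# Toy minority class with two well-separated groups (made-up values).
minority = [(0.0, 0.0), (0.2, 0.1), (0.1, 0.3),
            (5.0, 5.0), (5.2, 5.1), (5.1, 4.8)]
new_points = oversample_by_cluster(minority, n_new=4, k=2, seed=0)
```

Because each synthetic point is drawn between two members of the same cluster, it stays inside that cluster's region instead of falling in the empty space between groups, which is the motivation for clustering before oversampling. A cluster-specific generator could replace the interpolation step without changing the surrounding structure.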