Improving performance for imbalanced data classification using oversampling and characteristics of each cluster
Authors: Phan Anh Phong, Le Van Thanh
Vinh University Journal of Science
Vol. 53, pp. 5-15
Published: September 2024
This paper proposes a method for improving the classification of imbalanced data. Its main contribution is the integration of the K-means clustering algorithm with the minority oversampling technique VCIR, so that the generated synthetic samples closely reflect the characteristics of the actual data. Experimental results show that, on several evaluation metrics, the proposed method performs better than popular methods for handling imbalanced data such as SMOTE, Borderline-SMOTE, Kmeans-SMOTE, and SVM-SMOTE.
Keywords: Data classification; imbalanced data; oversampling; K-Means; SMOTE
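The abstract does not give VCIR's sample-generation rule, so the sketch below does not implement VCIR or the authors' method. It illustrates only the generic "cluster first, then oversample inside each cluster" idea the abstract describes. The minority class is grouped with a plain K-means, and each synthetic sample is a linear interpolation between two random minority points from the same cluster. This is SMOTE-like interpolation, but unlike SMOTE it does not restrict the pair to nearest neighbours. The names `kmeans` and `cluster_oversample`, the farthest-point initialisation, and the size-proportional choice of cluster are all illustrative assumptions.

```python
import random
from typing import List

Point = List[float]


def sq_dist(a: Point, b: Point) -> float:
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points: List[Point], k: int, iters: int = 100, seed: int = 0) -> List[int]:
    """Plain K-means (illustrative); returns a cluster label for each point."""
    rng = random.Random(seed)
    # Farthest-point initialisation (an assumption): start from a random point,
    # then repeatedly add the point farthest from all centres chosen so far.
    centers = [list(rng.choice(points))]
    while len(centers) < k:
        far = max(points, key=lambda p: min(sq_dist(p, c) for c in centers))
        centers.append(list(far))
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point goes to its nearest centre.
        labels = [min(range(k), key=lambda c: sq_dist(p, centers[c])) for p in points]
        # Update step: each centre moves to the mean of its members.
        new_centers = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centers.append([sum(col) / len(members) for col in zip(*members)])
            else:
                new_centers.append(centers[c])  # keep an empty cluster's centre
        if new_centers == centers:
            break  # converged
        centers = new_centers
    return labels


def cluster_oversample(minority: List[Point], n_new: int, k: int = 2, seed: int = 0) -> List[Point]:
    """Hypothetical cluster-based oversampler (not VCIR): interpolate only
    between minority points that share a K-means cluster."""
    rng = random.Random(seed)
    labels = kmeans(minority, k, seed=seed)
    clusters = [[p for p, lab in zip(minority, labels) if lab == c] for c in range(k)]
    # Interpolation needs at least two points in a cluster.
    usable = [c for c in clusters if len(c) >= 2]
    if not usable:
        raise ValueError("no cluster has two or more minority samples")
    weights = [len(c) for c in usable]  # larger clusters receive more samples
    synthetic = []
    for _ in range(n_new):
        cluster = rng.choices(usable, weights=weights, k=1)[0]
        a, b = rng.sample(cluster, 2)
        gap = rng.random()  # position along the segment from a to b, in [0, 1)
        synthetic.append([x + gap * (y - x) for x, y in zip(a, b)])
    return synthetic
```

For example, with two well-separated minority groups around (0, 0) and (10, 10), `cluster_oversample(points, 20, k=2)` returns 20 new points that each lie inside one of the two groups. Interpolating across the whole minority class could instead place points in the empty region between the groups, where they may not represent the real data. Keeping synthetic samples inside a cluster is the motivation for combining clustering with oversampling.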