VinhUni-Science

Improving performance for imbalanced data classification using oversampling and characteristics of each cluster

Authors: Phan Anh Phong, Le Van Thanh


Vinh University Journal of Science

Volume: 53; Pages: 5-15

URL: https://vujs.vn/vi/all-issues/copy/A.0054-2024

Publication date: 9/2024

This paper proposes a method for improving classification performance on imbalanced data. Its main contribution is to combine the K-means clustering algorithm with the minority oversampling technique VCIR, so that the generated synthetic samples closely reflect the characteristics of the actual data. Experimental results show that the proposed method outperforms popular existing methods for handling imbalanced data, such as SMOTE, Borderline-SMOTE, Kmeans-SMOTE, and SVM-SMOTE, on several metrics.
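The general idea of cluster-aware oversampling can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not describe how VCIR generates samples, so the sketch substitutes plain SMOTE-style linear interpolation between two minority points of the same K-means cluster. The function names (`sqdist`, `kmeans`, `oversample_by_cluster`) and all parameters are hypothetical and chosen only for illustration.

```python
import random


def sqdist(p, q):
    """Squared Euclidean distance between two points given as lists of floats."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's K-means; returns one cluster label per input point."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: attach each point to its nearest center.
        labels = [min(range(k), key=lambda c: sqdist(p, centers[c])) for p in points]
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def oversample_by_cluster(minority, n_new, k=2, seed=0):
    """Generate n_new synthetic minority samples, each interpolated inside one cluster.

    Assumption: SMOTE-style interpolation stands in for VCIR, whose details
    are not given in the abstract.
    """
    rng = random.Random(seed)
    labels = kmeans(minority, k, seed=seed)
    clusters = [[p for p, lab in zip(minority, labels) if lab == c] for c in range(k)]
    # Interpolation needs at least two points, so skip smaller clusters.
    clusters = [c for c in clusters if len(c) >= 2]
    if not clusters:
        raise ValueError("no cluster has at least two minority samples")
    synthetic = []
    for i in range(n_new):
        cluster = clusters[i % len(clusters)]  # spread new samples over clusters
        a, b = rng.sample(cluster, 2)
        t = rng.random()
        # The new point lies on the segment between a and b.
        synthetic.append([x + t * (y - x) for x, y in zip(a, b)])
    return synthetic


if __name__ == "__main__":
    minority = [[0, 0], [0, 1], [1, 0], [1, 1],
                [10, 10], [10, 11], [11, 10], [11, 11]]
    for point in oversample_by_cluster(minority, n_new=4, k=2):
        print(point)
```

Because each synthetic point is interpolated between two members of the same cluster, it stays inside that cluster's region instead of falling in the empty space between distant groups of minority samples, which is the motivation for clustering before oversampling.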

Keywords: Data classification; imbalanced data; oversampling; K-Means; SMOTE

Scientific article