COMPSTAT 2022
A0594
Title: Discriminant analysis with corrupted label data using subject similarity
Authors: Masaaki Okabe - Doshisha University (Japan) [presenting]
Hiroshi Yadohisa - Doshisha University (Japan)
Abstract: In classification tasks, the labels of the training data are usually assumed to be correct. However, labels assigned by humans may be incorrect because of mislabeling. If the labels in the training data are incorrect, the classification accuracy of the discriminant model may be reduced. Previous work assumes that, given the true label, the features and the occurrence of label corruption are independent of one another; in other words, mislabeling is assumed to occur at random. Under this assumption, when the balanced error rate (BER) is used as the objective function, it has been shown that the discriminant model that optimizes the BER on data with corrupted labels also optimizes the BER on data without corrupted labels. However, classification may not work well when label corruption is correlated with the features, for example when the probability of a label error depends on the value of a feature. The aim is to solve this problem by accounting for corrupted labels and weighting objects according to their features.
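The BER property under feature-independent mislabeling can be illustrated with a small sketch. It is not the authors' method, and every concrete choice in it is an assumption made for illustration: two one-dimensional Gaussian classes N(+1, 1) and N(-1, 1), a threshold classifier, a positive-class prior of 0.3, and flip rates of 0.2 and 0.1. Under the conditional independence assumption, each corrupted class-conditional distribution is a mixture of the two clean ones, so the corrupted BER is an affine function of the clean BER and both are minimized by the same threshold whenever the mixing rates sum to less than 1.

```python
import math


def normal_cdf(z):
    """Standard normal CDF, computed from the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def clean_rates(t):
    """Error rates of the rule 'predict +1 if x > t' on clean labels.

    Illustrative assumption: x | y=+1 ~ N(+1, 1) and x | y=-1 ~ N(-1, 1).
    """
    fnr = normal_cdf(t - 1.0)        # P(x <= t | y = +1)
    fpr = 1.0 - normal_cdf(t + 1.0)  # P(x >  t | y = -1)
    return fnr, fpr


def ber(fnr, fpr):
    """Balanced error rate: mean of the two class-wise error rates."""
    return 0.5 * (fnr + fpr)


def contamination(pi, rho_pos, rho_neg):
    """Mixing rates of the corrupted class-conditional distributions.

    pi      : P(y = +1)
    rho_pos : P(label flipped | y = +1), independent of x
    rho_neg : P(label flipped | y = -1), independent of x
    """
    alpha = (1 - pi) * rho_neg / (pi * (1 - rho_pos) + (1 - pi) * rho_neg)
    beta = pi * rho_pos / (pi * rho_pos + (1 - pi) * (1 - rho_neg))
    return alpha, beta


def corrupted_rates(t, alpha, beta):
    """Error rates of the same rule measured against corrupted labels."""
    fnr, fpr = clean_rates(t)
    fnr_c = (1 - alpha) * fnr + alpha * (1 - fpr)
    fpr_c = beta * (1 - fnr) + (1 - beta) * fpr
    return fnr_c, fpr_c


# Hypothetical prior and flip rates, chosen only for this example.
alpha, beta = contamination(pi=0.3, rho_pos=0.2, rho_neg=0.1)

thresholds = [i / 100 for i in range(-300, 301)]
best_clean = min(thresholds, key=lambda t: ber(*clean_rates(t)))
best_corrupted = min(
    thresholds, key=lambda t: ber(*corrupted_rates(t, alpha, beta))
)
print(best_clean, best_corrupted)
```

If the flip probability depended on x, the corrupted class-conditionals would no longer be fixed mixtures of the clean ones, the affine relation would fail, and the two minimizers could differ. That is the setting the abstract targets.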