View Submission - COMPSTAT2023
A0236
Title: A fuzzy cluster-scaled principal component analysis for mixed high-dimension and low-sample size data
Authors: Mika Sato-Ilic - University of Tsukuba (Japan) [presenting]
Abstract: A fuzzy cluster-scaled principal component analysis (fuzzy cluster-scaled PCA) for mixed high-dimension and low-sample size (mixed HDLSS) data is proposed. Mixed HDLSS data comprise numerical data on quantitative variables and categorical data on qualitative variables, and the number of variables is much larger than the number of objects. Fuzzy cluster-scaled PCA was previously proposed for analyzing HDLSS data because ordinary PCA cannot be applied to such data: theoretically, ordinary PCA cannot yield a correct solution when the variables far outnumber the objects. The essence of fuzzy cluster-scaled PCA is that it utilizes the result of fuzzy clustering. It is based on the fuzzy cluster-scaled correlation, which is decomposed into two parts. The first is the correlation of the classification structures obtained as a result of fuzzy clustering, which can be computed from the dissimilarity of the categorical data on the qualitative variables. The second is the ordinary correlation between variables, for which the numerical data on the quantitative variables can be used. In constructing the fuzzy cluster-scaled correlation, the two parts, which use different kinds of data (numerical and categorical), are combined in a reasonable way, so that a PCA result can be obtained for mixed HDLSS data. Several numerical examples show the better performance of the proposed method.
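As a rough illustration of why ordinary PCA is problematic for HDLSS data, the sketch below builds the ordinary correlation matrix between variables for a synthetic data set and checks its rank. It is not the proposed method and does not implement the fuzzy cluster-scaled correlation. The data are random, and the sizes (10 objects, 200 variables) and all variable names are assumptions chosen only for illustration. The point it shows is one well-known difficulty: with far fewer objects than variables, the sample correlation matrix is rank deficient, so at most (number of objects - 1) principal components carry any variance.

```python
import numpy as np

# Synthetic HDLSS data (an assumption for illustration, not data from the abstract):
# far more variables than objects.
rng = np.random.default_rng(0)
n_objects, n_variables = 10, 200
X = rng.normal(size=(n_objects, n_variables))

# Ordinary correlation matrix between variables (n_variables x n_variables).
R = np.corrcoef(X, rowvar=False)

# Centering each variable removes one degree of freedom, so the rank of R
# cannot exceed n_objects - 1, however many variables there are.
rank = np.linalg.matrix_rank(R)

# Equivalent view: count the eigenvalues that are numerically non-zero.
eigenvalues = np.linalg.eigvalsh(R)
n_nonzero = int(np.sum(eigenvalues > 1e-8))

print("shape of R:", R.shape)
print("rank of R:", rank)
print("non-zero eigenvalues:", n_nonzero)
```

In this setting the vast majority of the 200 eigenvalues are numerically zero, which is one way to see why a correlation structure estimated from so few objects is unreliable for ordinary PCA.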