CMStatistics 2023
B0954
Title: Principal component analysis for mixed high-dimension low-sample size data based on fuzzy-cluster scale
Authors: Mika Sato-Ilic - University of Tsukuba (Japan) [presenting]
Abstract: High-dimension, low-sample-size (HDLSS) data, in which the number of dimensions is much larger than the number of objects, are difficult to handle with conventional statistical methods such as principal component analysis (PCA), because the eigenvalues of the sample covariance matrix of the variables are inconsistent for such data. If the HDLSS data are also of mixed type, comprising both numerical and categorical variables, the difficulty increases significantly. The focus here is on mixed-type HDLSS data, for which a PCA method is proposed. The proposed PCA uses a fuzzy cluster-scaled correlation, which decomposes into two parts: the first is the correlation of classification structures between variables, and the second is the correlation between variables. The first part can be adapted to the categorical data and the second part can be used for the numerical data, both over the same objects. Several numerical examples using real data show better performance of the proposed PCA for mixed HDLSS data.
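The background claim about the sample covariance matrix can be illustrated with a minimal sketch. It does not implement the proposed fuzzy cluster-scaled PCA, which the abstract does not specify in detail. It only simulates numerical data whose true covariance is the identity and shows that, when the number of variables p far exceeds the number of objects n, at most n - 1 sample eigenvalues are nonzero and those are far from the true value of 1. The function name `hdlss_spectrum` and the sizes n = 10, p = 200 are assumptions chosen for illustration.

```python
import numpy as np

def hdlss_spectrum(n=10, p=200, seed=0):
    """Nonzero-eigenvalue spectrum of the sample covariance for n objects, p variables.

    Hypothetical helper for illustration only: the data are simulated
    standard normal, so every true eigenvalue equals 1.
    """
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n, p))
    Xc = X - X.mean(axis=0)  # center each variable
    # The n x n Gram matrix shares its nonzero eigenvalues with the
    # p x p sample covariance matrix, and is far cheaper when p >> n.
    gram = Xc @ Xc.T / (n - 1)
    return np.linalg.eigvalsh(gram)[::-1]  # sorted in descending order

eigvals = hdlss_spectrum()
rank = int(np.sum(eigvals > 1e-8))  # number of numerically nonzero eigenvalues
print("numerical rank:", rank)
print("largest sample eigenvalue:", eigvals[0])
```

Centering removes one degree of freedom, so the rank is n - 1 rather than n. Because the nonzero eigenvalues sum to the total sample variance (roughly p), each one is inflated well above 1, which is one way to see why conventional PCA is unreliable in this setting.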