View Submission - EcoSta2017
A0431
Title: PCA based clustering for ultrahigh-dimensional data
Authors: Makoto Aoshima - University of Tsukuba (Japan) [presenting]
Kazuyoshi Yata - University of Tsukuba (Japan)
Abstract: High-dimension, low-sample-size (HDLSS) data arise in many areas of modern science, such as genetic microarrays, medical imaging, text recognition, finance, and chemometrics. We consider clustering based on principal component analysis (PCA) for HDLSS data. We give theoretical reasons why PCA is effective for clustering HDLSS data. First, we derive a geometric representation of HDLSS data taken from a two-class mixture model. With the help of the geometric representation, we give geometric consistency properties of sample principal component scores in the HDLSS context. We then extend the idea of geometric representations and provide geometric consistency properties for multiclass mixture models. We show that PCA can classify HDLSS data under certain conditions in a surprisingly explicit way. Finally, we demonstrate the performance of the clustering by using microarray data sets. We show that HDLSS data sets consisting of several types of lung carcinoma hold the geometric consistency properties.
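As a rough illustration of the two-class setting described in the abstract, the following sketch simulates an HDLSS sample and clusters it by the sign of the first sample principal component score. It is not the authors' procedure. The Gaussian mixture, the mean shift of 1.0 per coordinate, the sample sizes, the sign rule, and all function names (`simulate_two_class`, `first_pc_scores`, `cluster_by_sign`, `clustering_accuracy`) are assumptions made for this example. Because the dimension far exceeds the sample size, the scores are computed from the n-by-n Gram matrix of the centered data, using plain power iteration for the leading eigenvector.

```python
import random


def simulate_two_class(n_per_class=10, dim=1000, shift=1.0, seed=0):
    """Draw a toy two-class Gaussian mixture with dim >> sample size.

    Class 0 has mean 0 and class 1 has mean `shift` in every coordinate
    (an assumed model, chosen only for illustration).
    """
    rng = random.Random(seed)
    data, labels = [], []
    for label in (0, 1):
        for _ in range(n_per_class):
            data.append([rng.gauss(label * shift, 1.0) for _ in range(dim)])
            labels.append(label)
    return data, labels


def first_pc_scores(data, n_iter=200):
    """Return the first sample PC score of each observation.

    Works with the n x n Gram matrix of the centered data, whose leading
    unit eigenvector u and eigenvalue lam give the scores sqrt(lam) * u.
    """
    n, dim = len(data), len(data[0])
    mean = [sum(row[j] for row in data) / n for j in range(dim)]
    centered = [[x - m for x, m in zip(row, mean)] for row in data]
    gram = [[sum(a * b for a, b in zip(ci, cj)) for cj in centered]
            for ci in centered]

    # Power iteration for the leading eigenvector of the Gram matrix.
    v = [1.0] + [0.0] * (n - 1)
    for _ in range(n_iter):
        w = [sum(g * x for g, x in zip(row, v)) for row in gram]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]

    # Rayleigh quotient gives the leading eigenvalue.
    lam = sum(v[i] * sum(gram[i][j] * v[j] for j in range(n))
              for i in range(n))
    return [lam ** 0.5 * x for x in v]


def cluster_by_sign(scores):
    """Assign each observation to a cluster by the sign of its score."""
    return [1 if s > 0 else 0 for s in scores]


def clustering_accuracy(clusters, labels):
    """Agreement with the true labels, up to swapping the cluster names."""
    agree = sum(c == l for c, l in zip(clusters, labels)) / len(labels)
    return max(agree, 1.0 - agree)


data, labels = simulate_two_class()
scores = first_pc_scores(data)
clusters = cluster_by_sign(scores)
accuracy = clustering_accuracy(clusters, labels)
print(f"clustering accuracy: {accuracy:.2f}")
```

In this toy setup the between-class separation grows with the dimension, so the first principal component score tends to split the sample by class. The conditions under which this holds in general are the subject of the abstract and are not reproduced here.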