A0158
Title: High-dimension, low-sample-size analysis: Automatic sparse estimation and geometric insights in non-sparse spiked models
Authors: Makoto Aoshima - University of Tsukuba (Japan) [presenting]
Abstract: High-dimensional, low-sample-size (HDLSS) data presents unique challenges, particularly with non-sparse structures and spiked eigenvalues. Traditional sparse estimation methods often fail to fully capture the complexity of such data, which includes intrinsic information and large amounts of noise. The focus is on automatic sparse estimation techniques, leveraging geometric insights, data transformations, and methods for estimating the number of principal components to handle non-sparse spiked models effectively. Two key models are discussed: the strongly spiked eigenvalue (SSE) model and the non-SSE model. The SSE model applies to non-sparse, low-rank structures with strongly spiked eigenvalues, often observed in real-world datasets. However, spiked noise may prevent asymptotic normality in inference problems. To address this, new PCA methods for noise reduction, cross-data-matrix analysis, and transformations converting SSE models to non-SSE models are introduced. These techniques are demonstrated using a multi-million-dimensional genomic dataset with only a few dozen samples, tackling feature selection, classification, outlier detection, and high-dimensional clustering. The importance of non-sparse modeling in HDLSS analysis is highlighted, providing insights into geometric consistency and robust classification.
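The noise-reduction idea for PCA mentioned in the abstract can be illustrated with a minimal sketch. It assumes the corrected eigenvalue takes the form lambda_tilde_j = lambda_hat_j - (tr(S_D) - sum_{i<=j} lambda_hat_i) / (n - 1 - j), where S_D is the n x n dual sample covariance matrix and lambda_hat_j are its eigenvalues. This form, the function name `noise_reduced_eigenvalues`, and the simulated data are assumptions made for illustration; the abstract does not specify them.

```python
import numpy as np


def noise_reduced_eigenvalues(X):
    """Return (raw, corrected) leading eigenvalues for an (n, d) data matrix.

    Hypothetical helper: the correction formula is an assumed form of
    noise reduction, not a method confirmed by the abstract.
    """
    n, _ = X.shape
    Xc = X - X.mean(axis=0, keepdims=True)  # center each feature
    # The n x n dual matrix has the same nonzero eigenvalues as the
    # d x d sample covariance, but is far cheaper to handle when d >> n.
    S_dual = Xc @ Xc.T / (n - 1)
    raw = np.linalg.eigvalsh(S_dual)[::-1]  # sort in descending order
    total = np.trace(S_dual)
    j = np.arange(1, n - 1)  # j = 1, ..., n - 2
    # Subtract the average of the remaining eigenvalues as a noise estimate.
    tail_mean = (total - np.cumsum(raw)[: n - 2]) / (n - 1 - j)
    corrected = raw[: n - 2] - tail_mean
    return raw, corrected


# Simulated example (assumed setup): one strongly spiked direction plus noise.
rng = np.random.default_rng(0)
n, d = 20, 2000
X = rng.standard_normal((n, d))
X[:, 0] *= np.sqrt(500.0)  # population variance 500 in the first coordinate
raw, corrected = noise_reduced_eigenvalues(X)
print("raw leading eigenvalue:      ", raw[0])
print("corrected leading eigenvalue:", corrected[0])
```

In this simulated setting the raw leading eigenvalue is inflated by the accumulated noise of the remaining d - 1 coordinates, and the correction removes an estimate of that inflation. Working with the dual matrix keeps the cost at the order of n^2 d, which matters when d is in the millions and n is a few dozen.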