A1718
Title: Improving finite sample estimates of principal components for high-dimensional data
Authors: Nuwan Weeraratne - University of Waikato (New Zealand) [presenting]
Lynette Hunt - University of Waikato (New Zealand)
Jason Kurz - University of Waikato (New Zealand)
Abstract: Principal Component Analysis (PCA) compresses high-dimensional data into a lower-dimensional representation that captures the essential structure of the original data. PCA is a matrix decomposition technique based on the eigendecomposition of the covariance matrix: the covariance quantifies the relationships between variables, while the eigenvalues rank the importance of the corresponding directions. Accurate variance-covariance estimation is therefore crucial for PCA. Although the traditional Maximum Likelihood Estimator (MLE) of the covariance is asymptotically unbiased, it yields poor estimates of the principal components and poorly conditioned covariance estimates in high-dimensional settings where the number of variables (p) exceeds the number of observations (n). To address these issues, we propose a novel covariance estimator, the Pairwise Differences Covariance (PDC) estimator, together with four regularized versions of PDC (i.e., the Standardized PDC (SPDC), Local Scaled PDC (LPDC), Scaled by Maximum Absolute Value PDC (MAXPDC), and Scaled by Range PDC (RPDC) methods). In empirical comparisons with the MLE and its best-known existing alternative, the Ledoit-Wolf estimator, the SPDC and the other regularized PDC estimators perform well in estimating the variance-covariance structure and principal components while minimizing PC overdispersion and cosine similarity error (CSE). Real data applications are presented.
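The p > n failure mode described in the abstract can be illustrated with a minimal sketch of classical PCA via eigendecomposition of the MLE covariance (this only reproduces the standard baseline the abstract criticizes, not the authors' PDC estimators; data dimensions are arbitrary illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(0)

# High-dimensional setting: more variables (p) than observations (n).
n, p = 20, 50
X = rng.standard_normal((n, p))

# MLE (sample) covariance: center the data, then average outer products.
Xc = X - X.mean(axis=0)
S_mle = Xc.T @ Xc / n  # MLE uses 1/n rather than the unbiased 1/(n-1)

# Classical PCA: eigendecomposition of the covariance, sorted by
# decreasing eigenvalue (eigh returns ascending order).
eigvals, eigvecs = np.linalg.eigh(S_mle)
eigvals = eigvals[::-1]
eigvecs = eigvecs[:, ::-1]

# With p > n the sample covariance has rank at most n - 1, so it is
# singular (ill-conditioned) and the trailing principal directions
# are not identified -- the motivation for regularized estimators.
rank = np.linalg.matrix_rank(S_mle)

# Project onto the first k principal components.
k = 2
scores = Xc @ eigvecs[:, :k]
```

Shrinkage estimators such as Ledoit-Wolf (available as `sklearn.covariance.LedoitWolf`) replace `S_mle` with a well-conditioned convex combination of the sample covariance and a scaled identity before the eigendecomposition step.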