A0454
Title: Principal component regression in high dimension
Authors: Alden Green - Stanford University (United States)
Elad Romanov - Stanford University (United States) [presenting]
Abstract: Principal component regression (PCR) is a classical two-step approach to linear regression, where one first reduces the data dimension by projecting onto its leading principal components and then performs ordinary least squares regression. PCR is studied in an asymptotic high-dimensional regression setting, where the number of data points is proportional to the dimension. The main deliverables are asymptotically exact limiting formulas for the estimation and prediction risks, which depend in a nuanced way on the eigenvalues of the population covariance, the alignment between the population principal components and the true signal, and the number of selected components. A key challenge in the high-dimensional regime is that the sample covariance matrix is an inconsistent estimate of its population counterpart, and thus, sample principal components may fail to capture potential latent low-dimensional structure in the data. This point is demonstrated through several case studies, including that of a spiked covariance matrix.
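Illustration (not part of the original abstract): the two-step procedure described above can be sketched in Python with NumPy. This is a minimal sketch, not the authors' code. The function name `pcr_fit`, the spike strength `theta`, the spike direction `u`, and all sizes are hypothetical choices made for the example. It assumes mean-zero data, a single-spike population covariance of the form I + theta * u u^T, and a true signal aligned with the spike.

```python
import numpy as np

def pcr_fit(X, y, k):
    """Principal component regression with k components (illustrative sketch).

    Assumes the columns of X are already centered (assumption, not from the abstract).
    """
    # Step 1: the leading k sample principal components are the top right
    # singular vectors of X (eigenvectors of the sample covariance X^T X / n).
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    V_k = Vt[:k].T                      # shape (p, k)
    # Step 2: ordinary least squares of y on the projected data X V_k.
    Z = X @ V_k                         # shape (n, k)
    gamma, *_ = np.linalg.lstsq(Z, y, rcond=None)
    # Map the k coefficients back to the original p coordinates.
    return V_k @ gamma

# Hypothetical spiked-covariance example with n proportional to p.
rng = np.random.default_rng(0)
n, p, k, theta = 200, 100, 1, 5.0
u = np.zeros(p)
u[0] = 1.0                                      # spike direction (assumed)
Sigma = np.eye(p) + theta * np.outer(u, u)      # population covariance
X = rng.standard_normal((n, p)) \
    + np.sqrt(theta) * rng.standard_normal((n, 1)) * u
beta = u.copy()                                 # signal aligned with the spike (assumed)
y = X @ beta + rng.standard_normal(n)

beta_hat = pcr_fit(X, y, k)
diff = beta_hat - beta
estimation_error = diff @ diff                  # squared Euclidean error
prediction_error = diff @ Sigma @ diff          # excess prediction risk under Sigma
print(estimation_error, prediction_error)
```

Because n and p are of the same order here, the leading sample principal component is only partially aligned with `u`, so the errors printed by this sketch generally do not vanish even though the signal lies exactly along the population spike.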