A1508
Title: Variable selection in high-dimensional survival analysis
Authors: Pilar Gonzalez-Barquero - Universidad Carlos III de Madrid (Spain) [presenting]
Rosa Lillo - Universidad Carlos III de Madrid (Spain)
Alvaro Mendez-Civieta - Columbia University (United States)
Abstract: The rise of high-dimensional datasets, characterized by a large number of covariates, introduces significant challenges to traditional models. These large datasets, while rich in information, complicate the decision-making process. In this context, variable selection methods are necessary to reduce dimensionality and make the problem feasible. The focus is on survival analysis, particularly on the performance of Cox proportional hazards models in high-dimensional settings with a substantial proportion of censored data. In such scenarios, where covariates can outnumber observations, the unpenalized model admits infinitely many solutions for the regression coefficients, so regularization techniques such as the Lasso and the adaptive Lasso are required. Various methods are proposed and evaluated for determining the adaptive Lasso weights, including principal component analysis, ridge regression, univariate Cox regression, and the random survival forest algorithm. These methods are then applied to genomic data: a real high-dimensional dataset comprising clinical and genetic information on patients with triple-negative breast cancer (TNBC), a type of breast cancer with low survival rates due to its aggressive nature, is used to identify variables influencing survival outcomes.
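For orientation, the adaptive Lasso estimator for the Cox model can be written as below. This is a sketch in standard textbook notation, which is an assumption and does not appear in the abstract: x_i are the covariates, delta_i the event indicator (1 if the event is observed, 0 if censored), R(t_i) the risk set at time t_i, lambda the penalty level, gamma > 0 an exponent, and \tilde{\beta}_j an initial estimate of the j-th coefficient. Ties in event times are assumed absent.

\[
\hat{\beta} = \arg\min_{\beta} \Big\{ -\ell(\beta) + \lambda \sum_{j=1}^{p} w_j \, |\beta_j| \Big\},
\qquad
\ell(\beta) = \sum_{i=1}^{n} \delta_i \Big[ x_i^{\top}\beta - \log \sum_{k \in R(t_i)} \exp\big(x_k^{\top}\beta\big) \Big],
\qquad
w_j = \frac{1}{|\tilde{\beta}_j|^{\gamma}}.
\]

Setting w_j = 1 for all j recovers the ordinary Lasso. With ridge or univariate Cox regression, \tilde{\beta}_j can be taken directly as the fitted coefficient. How scores from principal component analysis or a random survival forest are mapped to weights is not specified in the abstract, so the inverse-coefficient form of w_j above should be read as one common choice, not as the authors' definition.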