EcoSta 2021
A0563
Title: Sparse common and distinctive covariates logistic regression: Classification method for high-dimensional multiblock data
Authors: Soogeun Park - Tilburg University (Netherlands) [presenting]
Eva Ceulemans - University of Leuven (Belgium)
Katrijn Van Deun - Tilburg University (Netherlands)
Abstract: Datasets comprising large sets of variables from multiple sources on the same observation units are becoming increasingly widespread. Constructing a classification model for such high-dimensional, multiblock datasets involves several challenges: selecting variables, classifying the response variable, and identifying the processes underlying the predictors. These processes are of particular interest for multiblock data because they can be associated either with a single data block or jointly with multiple blocks. Many methods address classification with high-dimensional data from a single block. However, the additional challenge of capturing and distinguishing distinctive and joint (common) processes in multiblock data has received insufficient attention. To this end, we propose Sparse Common and Distinctive Covariates Logistic Regression (SCD-Cov-logR). The method extends principal covariates regression to multiblock settings and combines it with the generalized linear modeling framework, allowing classification of a categorical response while revealing predictive processes that involve single or multiple data blocks. In a simulation study, SCD-Cov-logR outperformed related methods commonly used in the behavioural sciences.
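The abstract does not state the estimation criterion, so the following is only a rough illustration under explicit assumptions, not the authors' SCD-Cov-logR. It assumes a principal-covariates-regression-style loss in which component scores T = XW are weighted sums of the concatenated blocks, a weight alpha trades off reconstruction of X against a Bernoulli (logistic) negative log-likelihood for a binary y, and a lasso penalty on W provides sparsity. "Common" and "distinctive" components are encoded as a zero pattern in W: a common component may load on every block, while a distinctive one loads on a single block only. The names `block_mask` and `objective`, the normalization of each term, and the penalty form are all hypothetical choices made for this sketch.

```python
import numpy as np

def block_mask(block_sizes, structure):
    """Build a 0/1 mask for W (sum(block_sizes) x R).

    structure[r] lists the blocks component r may load on:
    all blocks -> common component; one block -> distinctive component.
    """
    total = sum(block_sizes)
    starts = np.cumsum([0] + list(block_sizes))
    mask = np.zeros((total, len(structure)))
    for r, blocks in enumerate(structure):
        for k in blocks:
            mask[starts[k]:starts[k + 1], r] = 1.0
    return mask

def objective(X, y, W, P, beta, intercept, alpha, lam):
    """Hypothetical PCovR-style loss with a logistic response term (assumption)."""
    T = X @ W                                   # component scores (n x R)
    # Reconstruction error of X, scaled by the total sum of squares of X
    recon = np.sum((X - T @ P.T) ** 2) / np.sum(X ** 2)
    eta = intercept + T @ beta                  # linear predictor for y
    # Bernoulli negative log-likelihood: log(1 + exp(eta)) - y * eta, averaged
    nll = np.mean(np.logaddexp(0.0, eta) - y * eta)
    # Lasso penalty on the weights encourages variable selection
    return alpha * recon + (1 - alpha) * nll + lam * np.sum(np.abs(W))

# Usage on simulated data: two blocks with 4 and 6 variables
rng = np.random.default_rng(0)
n, sizes = 50, [4, 6]
X1 = rng.standard_normal((n, sizes[0]))
X2 = rng.standard_normal((n, sizes[1]))
X = np.hstack([X1, X2])
y = (rng.random(n) < 0.5).astype(float)

# Component 0 is common; component 1 is distinctive to block 0; component 2 to block 1
mask = block_mask(sizes, [[0, 1], [0], [1]])
W = rng.standard_normal(mask.shape) * mask      # zeros enforce distinctiveness
P = rng.standard_normal((X.shape[1], 3))
beta = rng.standard_normal(3)
loss = objective(X, y, W, P, beta, 0.0, alpha=0.5, lam=0.01)
```

Because the mask zeroes out the rows of W belonging to other blocks, the score of a distinctive component depends only on its own block, which is what lets such a model attribute a predictive process to a single data source. An actual estimator would minimize this kind of loss over W, P, beta and the intercept while keeping the masked entries at zero; that optimization step is omitted here.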