View Submission - COMPSTAT2024
A0361
Title: Sparse multivariate Gaussian mixture regression with covariance estimation
Authors: Michael Fop - University College Dublin (Ireland)
Marco Vitelli - University of Bologna (Italy)
Gabriele Soffritti - University of Bologna (Italy) [presenting]
Abstract: Gaussian clusterwise regression models and Gaussian cluster-weighted models are useful tools for simultaneously performing multivariate linear regression analysis and model-based cluster analysis with continuous variables. They apply when the population from which the sample is drawn is composed of a certain number of sub-populations and the sub-population to which each sample observation belongs is unknown. As the number of parameters scales quadratically with the number of variables, such models can be over-parameterized in the case of high-dimensional data. To mitigate this problem, lasso penalties are introduced in the model log-likelihood function so as to simultaneously obtain sparse estimators of the regression coefficients and the covariance structure within each component of the mixture. For this purpose, an efficient optimization procedure is embedded into the expectation-maximization algorithm usually employed to perform maximum likelihood estimation. The resulting penalized Gaussian clusterwise linear regression models and Gaussian cluster-weighted models allow both variable selection and regularization to be performed. This approach is also expected to enhance the prediction accuracy and interpretability of regression analysis and cluster analysis based on these models. The performance of the new methodology is studied through simulated and real data.
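For orientation, the kind of objective described in the abstract can be sketched as follows. This is an illustrative form under stated assumptions, not necessarily the authors' exact formulation. All notation is introduced here and does not come from the abstract: q responses y_i, p predictors x_i, K mixture components with weights \pi_k, intercepts \beta_{0k}, coefficient matrices B_k, covariance matrices \Sigma_k, and tuning parameters \lambda_1, \lambda_2. The sketch also assumes that the second lasso penalty acts on the off-diagonal entries of the precision matrices \Omega_k = \Sigma_k^{-1}, as in the graphical lasso. The abstract only states that the covariance structure is estimated sparsely, so the penalty could equally be placed on \Sigma_k itself.

```latex
% Penalized log-likelihood of a K-component Gaussian clusterwise regression
% model (illustrative; notation and penalty placement are assumptions).
\ell_{P}(\theta)
  = \sum_{i=1}^{n} \log \sum_{k=1}^{K}
      \pi_k \, \phi_q\!\left(y_i;\ \beta_{0k} + B_k^{\top} x_i,\ \Sigma_k\right)
    - \lambda_1 \sum_{k=1}^{K} \lVert B_k \rVert_1
    - \lambda_2 \sum_{k=1}^{K} \sum_{j \neq l} \lvert \omega_{k,jl} \rvert ,
\qquad \Omega_k = \Sigma_k^{-1} = (\omega_{k,jl}).

% Free parameters of the unpenalized model: mixing weights, regression
% coefficients (with intercepts), and component covariance matrices.
\#\{\text{parameters}\}
  = (K - 1) + K\, q\,(p + 1) + K\, \frac{q\,(q + 1)}{2}.
```

The parameter count makes the over-parameterization concrete: the covariance term grows quadratically in q, and the regression term grows with the product of q and p. A cluster-weighted model additionally specifies a Gaussian distribution for the predictors in each component, which adds K(p + p(p + 1)/2) parameters, so the count also grows quadratically in p.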