A0389
Title: Component and feature selection in mixtures of generalised linear models
Authors: Sollie Millard - University of Pretoria (South Africa) [presenting]
Salomi Millard - University of Pretoria (South Africa)
Frans Kanfer - University of Pretoria (South Africa)
Mohammad Arashi - Ferdowsi University of Mashhad (Iran)
Gaonyalelwe Maribe - University of Pretoria (South Africa)
Abstract: Datasets with a relatively large number of highly correlated features often arise in applications of finite mixtures of regression models, and the resulting multicollinearity leads to unstable parameter estimates. Because the contribution of each feature to the response variable differs across the components of the mixture model, feature selection becomes a complex problem. Penalised regression methods are frequently used to perform feature selection whilst addressing the issues that arise from multicollinearity. We consider the estimation of a mixture of generalised linear models in the presence of multicollinearity, addressing both feature and component selection. Selecting the optimal number of components is important, since traditional maximum likelihood estimation faces difficulty when an incorrect number of components is specified. We propose a novel penalised-likelihood approach to model selection for finite mixtures of generalised linear models. Penalties are imposed on both the mixing proportions and the regression coefficients, so that order selection of the mixture and variable selection within each component are achieved simultaneously. A modified EM algorithm is proposed, and a novel modified elastic-net penalty is used for feature selection. An extensive simulation study demonstrates the component and feature selection properties of this approach.
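
Illustration: the abstract does not give the form of the penalised likelihood, so the following is only a minimal sketch of the general idea, under stated assumptions. It evaluates a penalised log-likelihood for a mixture of Poisson GLMs with a log link, using the standard elastic-net penalty on the regression coefficients (not the authors' modified version) and an illustrative log-type penalty on the mixing proportions. The function names, the Poisson family, the penalty scaling, and the parameters lam_beta, alpha, lam_pi and eps are all hypothetical choices made for this example.

```python
import math


def elastic_net(beta, lam, alpha):
    """Standard elastic-net penalty (assumed form); beta[0] is an unpenalised intercept."""
    l1 = sum(abs(b) for b in beta[1:])
    l2 = sum(b * b for b in beta[1:])
    return lam * (alpha * l1 + 0.5 * (1.0 - alpha) * l2)


def poisson_logpmf(y, eta):
    """Log-probability of count y under a Poisson GLM with log link, mean exp(eta)."""
    return y * eta - math.exp(eta) - math.lgamma(y + 1)


def mixture_loglik(X, y, pis, betas):
    """Log-likelihood of a finite mixture of Poisson GLMs.

    Components with mixing proportion 0 are skipped; at least one
    proportion is assumed to be positive.
    """
    total = 0.0
    for x_i, y_i in zip(X, y):
        terms = []
        for pi_k, beta_k in zip(pis, betas):
            if pi_k <= 0.0:
                continue
            eta = sum(b * x for b, x in zip(beta_k, x_i))
            terms.append(math.log(pi_k) + poisson_logpmf(y_i, eta))
        m = max(terms)  # log-sum-exp for numerical stability
        total += m + math.log(sum(math.exp(t - m) for t in terms))
    return total


def penalised_objective(X, y, pis, betas, lam_beta, alpha, lam_pi, eps=1e-6):
    """Log-likelihood minus both penalties (illustrative scaling, an assumption)."""
    pen_beta = sum(elastic_net(beta_k, lam_beta, alpha) for beta_k in betas)
    # Log-type penalty: zero when pi_k = 0, so dropping a component is rewarded.
    pen_pi = lam_pi * sum(math.log((eps + pi_k) / eps) for pi_k in pis)
    return mixture_loglik(X, y, pis, betas) - pen_beta - pen_pi
```

In an EM-type scheme, an objective of this kind would be maximised over the mixing proportions and coefficients; components whose proportions are driven to zero are removed, and coefficients shrunk to zero drop the corresponding features within a component. Each row of X is assumed to start with a 1 for the intercept.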