EcoSta 2018: Registration
A0411
Title: A mixture regression model of multivariate generalized Bernoulli distributions
Authors: Shu-Kay Ng - Griffith University (Australia) [presenting]
Abstract: In healthcare research, outcome variables in a categorical form are commonplace. Data collection often involves the acquisition of information on a spectrum of individuals' feature variables (risk factors) that may influence the outcomes. Clustering of individuals based on the categorical outcome variables ($p$ dimensions) and the associated $q$-dimensional vector of risk factors can be obtained via a mixture regression model-based approach. With this approach, each component-density function is specified by a multivariate generalized Bernoulli distribution consisting of one draw on $d_i$ categories for each outcome variable $i=1,\ldots,p$. Moreover, the risk factors are included in the mixing proportions via a logistic model. The proposed mixture regression model is thus able to simultaneously cluster individuals into groups with different patterns of outcomes and identify the characteristics of individuals that are relevant for explaining the heterogeneity in outcome patterns. Parameter estimation is based on maximum likelihood via the expectation-maximization (EM) algorithm. This model can also be adopted to cluster mixed categorical and continuous data, and applied to consensus clustering, where each categorical outcome variable represents a partition of individuals based on a number of different sets of feature variables. The method is illustrated using simulated data and a publicly available data set concerning comorbidity patterns among alcohol- and drug-dependent adults.
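
To make the model structure concrete, the following is a minimal sketch of the E-step for a single individual. It is not the author's implementation. It rests on two assumptions that the abstract does not state: the $p$ outcomes are treated as conditionally independent within a component, and the last component is the reference group of the logistic model (its coefficients are fixed at zero). The function names and the toy parameter values are hypothetical.

```python
import math

def mixing_proportions(x, W):
    """Logistic (softmax) mixing proportions for one individual.

    x: list of q risk factors.
    W: one coefficient row per component, intercept first (length q + 1).
       Assumption: the last row is all zeros (reference component).
    """
    scores = [w[0] + sum(wj * xj for wj, xj in zip(w[1:], x)) for w in W]
    m = max(scores)  # subtract the maximum for numerical stability
    expd = [math.exp(s - m) for s in scores]
    total = sum(expd)
    return [e / total for e in expd]

def component_density(y, theta_h):
    """Density of one component at the observed outcome vector y.

    y: list of p category indices, one draw per outcome variable.
    theta_h[i][c]: probability that outcome i falls in category c.
    Assumption: outcomes are independent given the component.
    """
    dens = 1.0
    for i, c in enumerate(y):
        dens *= theta_h[i][c]
    return dens

def e_step(y, x, W, theta):
    """Return posterior component probabilities and the mixture density."""
    pi = mixing_proportions(x, W)
    joint = [pi[h] * component_density(y, theta[h]) for h in range(len(theta))]
    mix = sum(joint)
    return [j / mix for j in joint], mix

# Toy setting: 2 components, p = 2 outcomes with d = (2, 3) categories, q = 1.
theta = [
    [[0.9, 0.1], [0.7, 0.2, 0.1]],  # component 1
    [[0.2, 0.8], [0.1, 0.3, 0.6]],  # component 2
]
W = [[0.0, 1.0], [0.0, 0.0]]  # second row is the reference component
post, mix = e_step(y=[0, 0], x=[0.0], W=W, theta=theta)
```

In a full EM fit, these posterior probabilities would weight the updates of the category probabilities and of the logistic coefficients in the M-step. The latter update generally has no closed form and needs an iterative solver.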