View Submission - COMPSTAT2024
A0265
Title: The use of mixture models for clustering data with structured dependence
Authors: Shu-Kay Angus Ng - Griffith University (Australia) [presenting]
Richard Tawiah - University of Queensland (Australia)
Geoffrey McLachlan - University of Queensland (Australia)
Abstract: Identifying (disadvantaged) subgroups is fundamental and decisive in solving many real-world problems. Mixture models underpin a variety of statistical methods in cluster and latent class analysis for finding subgroups, outliers, and distinctive features between subgroups. Standard statistical inference for mixture models assumes that the observed data are independent of one another. However, modern study designs often generate data with non-negligible dependence among observations (e.g., in multilevel studies, patients treated in the same hospital share a common hospital effect), so the independence assumption becomes invalid. Clustering methods (or mixture models) that ignore this structured dependence (by assuming zero hospital effect) can overlook the significance of such effects and of data variability, resulting in misleading findings or failure to identify important risk factors. We present a statistical framework based on mixtures of generalised linear mixed models (GLMMs) for clustering data with complex structured dependence. We introduce random-effect modelling techniques that can effectively capture complex within- and between-subject correlations among observations arising from various forms of dependence. Efficient estimation of model parameters is achieved using extended best linear unbiased prediction (BLUP) and approximate residual maximum likelihood (REML) procedures. We consider several data sets to demonstrate the capacity of this clustering approach.
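The hospital example in the abstract can be illustrated with a minimal sketch. This is not the authors' mixture-of-GLMMs method: it assumes a much simpler Gaussian random-intercept model (one shared effect per hospital plus independent patient-level noise), and the function names, group sizes, and variance values are hypothetical choices made only for illustration. It shows that a shared hospital effect induces correlation among patients in the same hospital, which is the dependence that an independence-based mixture model would ignore.

```python
import random
from statistics import mean


def simulate(n_hospitals=200, n_patients=20, sigma_u=1.0, sigma_e=1.0, seed=0):
    """Simulate patient outcomes under an assumed Gaussian random-intercept model."""
    rng = random.Random(seed)
    data = []
    for _ in range(n_hospitals):
        # Hospital effect shared by every patient treated in this hospital.
        u = rng.gauss(0.0, sigma_u)
        # Patient outcome = shared hospital effect + independent patient noise.
        data.append([u + rng.gauss(0.0, sigma_e) for _ in range(n_patients)])
    return data


def intraclass_correlation(data):
    """One-way ANOVA estimate of the intraclass correlation for balanced groups."""
    n = len(data)      # number of hospitals
    k = len(data[0])   # patients per hospital (assumed equal across hospitals)
    grand_mean = mean(y for group in data for y in group)
    group_means = [mean(group) for group in data]
    # Between-hospital and within-hospital mean squares.
    msb = k * sum((m - grand_mean) ** 2 for m in group_means) / (n - 1)
    msw = sum(
        (y - m) ** 2 for group, m in zip(data, group_means) for y in group
    ) / (n * (k - 1))
    return (msb - msw) / (msb + (k - 1) * msw)


if __name__ == "__main__":
    with_effect = intraclass_correlation(simulate(sigma_u=1.0))
    without_effect = intraclass_correlation(simulate(sigma_u=0.0))
    print(f"ICC with hospital effect:    {with_effect:.3f}")
    print(f"ICC without hospital effect: {without_effect:.3f}")
```

Under these assumed settings the theoretical intraclass correlation is sigma_u^2 / (sigma_u^2 + sigma_e^2) = 0.5 when the hospital effect is present and 0 when it is absent, so the first estimate should be well away from zero while the second should be close to it.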