CFE 2019
View Submission - CMStatistics
Title: Model-based clustering of high-dimensional longitudinal data
Authors: Tongtong Wu - University of Rochester (United States)
Luoying Yang - University of Rochester (United States) [presenting]
Abstract: A model-based clustering method with model and variable selection is introduced for high-dimensional longitudinal data. The motivation comes from the Trial of Activity in Adolescent Girls (TAAG), which aimed to examine multi-level factors related to changes in physical activity by following a cohort of 783 girls over 10 years, from adolescence to early adulthood. The goal is to identify the intrinsic grouping of subjects with similar physical activity trajectories and, within each group, the most relevant predictors among over 800 candidate variables. Previous analyses could only carry out clustering and variable selection in two separate steps, whereas this method performs both tasks simultaneously. Assuming each subject is drawn from a finite Gaussian mixture distribution, model effects and cluster labels are estimated based on the restricted maximum log-likelihood, with a SCAD penalty applied to the fixed effects and a group lasso penalty applied to the random effects to induce sparsity in the predictors for efficient parameter estimation and identification. The Bayesian Information Criterion is used to determine the optimal number of clusters and the tuning parameter values for the penalties. Our numerical studies show that the new model offers faster computation and more accurate clustering than existing clustering methods and can accommodate complex data with multi-level and longitudinal effects.
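
The abstract names its penalties and selection criterion without giving formulas. As a rough illustration only, the sketch below writes out the textbook forms of the SCAD penalty (with the commonly used default a = 3.7), a group lasso penalty, and the BIC. The function names, the square-root group-size weighting, and the plain parameter count in the BIC are assumptions made for illustration; the authors' actual objective and degrees-of-freedom definition may differ.

```python
import math


def scad_penalty(theta, lam, a=3.7):
    """Textbook SCAD penalty for one coefficient (assumed form, a > 2)."""
    t = abs(theta)
    if t <= lam:
        # Near zero, SCAD behaves like the lasso penalty.
        return lam * t
    if t <= a * lam:
        # Quadratic transition that tapers the penalty growth.
        return (2 * a * lam * t - t * t - lam * lam) / (2 * (a - 1))
    # Large coefficients receive a constant penalty.
    return lam * lam * (a + 1) / 2


def group_lasso_penalty(groups, lam):
    """Group lasso penalty: lam * sum over groups of sqrt(p_g) * ||beta_g||_2.

    `groups` is a list of coefficient lists; the sqrt(p_g) weight is a
    common convention and an assumption here.
    """
    total = 0.0
    for g in groups:
        norm = math.sqrt(sum(b * b for b in g))
        total += math.sqrt(len(g)) * norm
    return lam * total


def bic(log_likelihood, n_params, n_obs):
    """Standard BIC: -2 * logL + k * log(n). Lower is better."""
    return -2.0 * log_likelihood + n_params * math.log(n_obs)
```

In a setting like the one described, such a criterion would be evaluated over a grid of candidate cluster numbers and tuning parameter values, keeping the combination with the smallest BIC. A group whose coefficients are all zero contributes nothing to the group lasso term, which is how an entire random effect can be dropped at once.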