CMStatistics 2023
B1140
Title: On an approach for performing model-based clustering and imputation for multivariate data sets with asymmetric features
Authors: Brian Franczak - MacEwan University (Canada) [presenting]
Abstract: Classification can be defined as the process of sorting similar objects into groups. It can be performed in an unsupervised, semi-supervised, or fully supervised setting. In the unsupervised setting, also known as clustering, no a priori knowledge is used, while the other two settings use some a priori knowledge. Model-based clustering is the process of using a finite mixture model for unsupervised classification. An approach is presented for performing model-based clustering on incomplete multivariate data sets that exhibit asymmetric features. The approach models asymmetry either directly or via a transformation of the observed data while simultaneously performing imputation. An expectation-maximization (EM) based scheme is used for parameter estimation. The scheme iteratively performs single imputation while computing maximum likelihood estimates of the parameters of the model of interest. At convergence, traditional likelihood-based criteria such as the Bayesian information criterion (BIC) or the integrated completed likelihood (ICL) are used for model selection. Classification performance is assessed using the adjusted Rand index (ARI), and other relevant statistics demonstrating the overall performance of the parameter estimation scheme are given. The proposed approach is illustrated using simulated data sets, real data sets, or a combination of the two.
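The adjusted Rand index mentioned in the abstract can be illustrated with a minimal sketch. This is not the author's code: the function name `adjusted_rand_index` and the toy label vectors are assumptions made for illustration, and the formula is the standard pair-counting ARI computed from the contingency table of two partitions.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_true, labels_pred):
    """Pair-counting adjusted Rand index between two partitions.

    Hypothetical helper for illustration; label values are arbitrary
    hashable objects and only the grouping they induce matters.
    """
    if len(labels_true) != len(labels_pred):
        raise ValueError("label sequences must have the same length")

    n = len(labels_true)
    total_pairs = comb(n, 2)
    if total_pairs == 0:
        return 1.0  # fewer than two observations: partitions agree trivially

    # Contingency table cells and its row and column totals.
    cells = Counter(zip(labels_true, labels_pred))
    rows = Counter(labels_true)
    cols = Counter(labels_pred)

    # Number of pairs that fall together in each cell, row, and column.
    sum_cells = sum(comb(c, 2) for c in cells.values())
    sum_rows = sum(comb(c, 2) for c in rows.values())
    sum_cols = sum(comb(c, 2) for c in cols.values())

    expected = sum_rows * sum_cols / total_pairs  # agreement expected by chance
    max_index = (sum_rows + sum_cols) / 2         # largest attainable agreement

    if max_index == expected:
        return 1.0  # degenerate case, e.g. both partitions have one cluster
    return (sum_cells - expected) / (max_index - expected)


# Toy example: the predicted labels are a relabelling of the true groups,
# so the two partitions are identical and the index equals 1.
true_groups = [0, 0, 0, 1, 1, 1]
predicted = [1, 1, 1, 0, 0, 0]
print(adjusted_rand_index(true_groups, predicted))
```

Because the index depends only on which observations are grouped together, it is unaffected by the arbitrary cluster labels that a mixture model assigns. Values near 1 indicate close agreement with the reference partition, values near 0 indicate agreement no better than chance, and negative values are possible.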