B0645

Title: Clustering with missing data using normal-scale mixture models
Authors: Cristina Tortora - San Jose State University (United States) [presenting]
Abstract: Cluster analysis is an unsupervised data analysis technique whose goal is to group data into homogeneous clusters. Model-based clustering, one of the most widely used approaches, assumes that the data are generated from a convex combination of distributions, so the choice of component distribution is crucial. The normal distribution is the usual choice, but it has limited flexibility: it assumes symmetric clusters and is sensitive to outlying observations. Several heavy-tailed alternatives can be used, many of which can be obtained as normal-scale mixtures. Specifically, a link function and a weight function determine the shape of the resulting distribution, which remains symmetric. One of the obtainable distributions is the multivariate Student-t. The talk illustrates how to treat data missing at random when using these distributions.
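
As a rough illustration of the normal-scale mixture idea, the sketch below uses the standard stochastic representation of the multivariate Student-t: a normal vector whose covariance is divided by a Gamma-distributed weight. This is not taken from the talk. The function name `sample_mvt_scale_mixture`, the parameter values, and the choice of a Gamma weight with shape and rate both equal to nu/2 are assumptions made for this example only.

```python
import numpy as np


def sample_mvt_scale_mixture(n, mu, sigma, nu, seed=0):
    """Draw n samples from a multivariate Student-t via a normal-scale mixture.

    Hypothetical helper for illustration: X = mu + Z / sqrt(W), where
    Z ~ N(0, sigma) and W ~ Gamma(shape=nu/2, rate=nu/2) are independent.
    """
    rng = np.random.default_rng(seed)
    mu = np.asarray(mu, dtype=float)
    sigma = np.asarray(sigma, dtype=float)

    # NumPy parameterises the Gamma by scale = 1 / rate, so rate nu/2 -> scale 2/nu.
    w = rng.gamma(shape=nu / 2.0, scale=2.0 / nu, size=n)

    # Normal component with the common scale matrix sigma.
    z = rng.multivariate_normal(np.zeros(mu.size), sigma, size=n)

    # Small weights inflate the spread of a draw, which produces the heavy tails.
    return mu + z / np.sqrt(w)[:, None]


# Assumed example parameters (not from the abstract).
mu = [0.0, 1.0]
sigma = [[1.0, 0.3], [0.3, 2.0]]
nu = 10.0
x = sample_mvt_scale_mixture(200_000, mu, sigma, nu)

# For nu > 2 the covariance of the Student-t is nu / (nu - 2) * sigma.
empirical_cov = np.cov(x, rowvar=False)
theoretical_cov = nu / (nu - 2.0) * np.asarray(sigma)
```

Replacing the Gamma draw with another positive weight distribution gives other members of the family. The resulting distribution stays symmetric about `mu` because the normal component is symmetric.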