A1426
Title: Clustering multivariate discrete data with partial records
Authors: Kevin Giddings - Carleton University (Canada) [presenting]
Isen McDonald - Carleton University (Canada)
Utkarsh Dang - Carleton University (Canada)
Sanjeena Dang - Carleton University (Canada)
Abstract: The ability to cluster data with incomplete records is vital in many disciplines. A model-based approach is developed for clustering multivariate discrete data with missing entries using a mixture of multivariate Poisson lognormal distributions. The multivariate Poisson lognormal distribution is a hierarchical Poisson distribution that can account for overdispersion and model the correlation between variables. To evaluate the robustness of the method, extensive simulation studies are conducted under varying percentages of incomplete records and different types and patterns of missing data. In addition, the approach is applied to a complete zebrafish RNA sequencing dataset, and the clustering results are compared with those obtained when some of the count values are artificially omitted. The method is shown to achieve similar clustering performance on the complete and incomplete datasets.
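
As an illustration of the hierarchical structure described in the abstract, the following minimal Python sketch simulates counts from a two-component mixture of multivariate Poisson lognormal distributions (latent log-rates drawn from a multivariate normal, counts drawn from a Poisson given those rates) and then omits entries artificially. It is not the authors' implementation and does not perform any clustering. The function names, parameter values, and the choice of a missing-completely-at-random mechanism are all assumptions made for illustration only.

```python
import numpy as np


def sample_mpln_mixture(n, weights, means, covs, rng):
    """Simulate n count vectors from a mixture of multivariate Poisson
    lognormal components (hypothetical helper, illustration only)."""
    weights = np.asarray(weights, dtype=float)
    labels = rng.choice(len(weights), size=n, p=weights)
    d = len(means[0])
    counts = np.empty((n, d), dtype=float)
    for g in range(len(weights)):
        idx = np.flatnonzero(labels == g)
        # Latent layer: log-rates are multivariate normal within component g,
        # which induces correlation between the variables.
        theta = rng.multivariate_normal(means[g], covs[g], size=idx.size)
        # Observed layer: counts are conditionally independent Poisson draws,
        # so the marginal counts are overdispersed relative to a Poisson.
        counts[idx] = rng.poisson(np.exp(theta))
    return counts, labels


def mask_completely_at_random(counts, rate, rng):
    """Replace each entry with NaN independently with probability `rate`
    (assumed mechanism; the abstract does not specify the ones studied)."""
    masked = counts.copy()
    masked[rng.random(counts.shape) < rate] = np.nan
    return masked


rng = np.random.default_rng(0)
# Assumed parameter values, chosen only to make the example concrete.
means = [np.array([1.0, 2.0, 3.0]), np.array([3.0, 1.0, 2.0])]
covs = [
    np.array([[0.3, 0.2, 0.0],
              [0.2, 0.3, 0.0],
              [0.0, 0.0, 0.3]]),
    0.2 * np.eye(3),
]
counts, labels = sample_mpln_mixture(2000, [0.5, 0.5], means, covs, rng)
partial = mask_completely_at_random(counts, 0.1, rng)

# Within one component, the sample variance exceeds the sample mean
# (overdispersion), and variables 0 and 1 are positively correlated.
comp0 = counts[labels == 0]
print("mean:", comp0.mean(axis=0))
print("variance:", comp0.var(axis=0))
print("corr(0, 1):", np.corrcoef(comp0[:, 0], comp0[:, 1])[0, 1])
print("fraction missing:", np.isnan(partial).mean())
```

Comparing a clustering of `counts` with a clustering of `partial` would mirror, in spirit, the complete versus artificially incomplete comparison described for the zebrafish data.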