Title: A component-based approach for the clustering of multivariate categorical data
Authors: Michio Yamamoto - Okayama University, RIKEN AIP (Japan) [presenting]
Abstract: A novel model-based clustering procedure for multivariate categorical data is proposed. Within the framework of latent class analysis, the proposed model assumes that each response probability admits a low-dimensional representation of the cluster structure. This representation, which is constructed from weights for the categorical variables and component scores for the cluster representatives, allows us to interpret the latent cluster structure in the categorical data. In addition, we define low-dimensional scores for individuals as convex combinations of the scores for the cluster representatives. It is shown that the relation between the individual scores and the response probabilities can be interpreted through a divergence measure. An expectation-maximization (EM) algorithm with gradient projection and coordinate descent is developed, and it is shown that there is a trade-off between the convergence rate of the algorithm and cluster recovery. The usefulness of the proposed model is demonstrated through an analysis of molecular biology data.
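
As an illustration of the idea that individual scores are convex combinations of the scores for cluster representatives, the following minimal sketch computes such scores in plain Python. The function name `individual_scores`, the two-dimensional cluster scores, and the weights are all hypothetical; the abstract does not specify how the combination weights are obtained, so they are simply supplied as nonnegative values that sum to one for each individual.

```python
def individual_scores(weights, cluster_scores, tol=1e-9):
    """Return one low-dimensional score per individual.

    weights: one row per individual, one entry per cluster (hypothetical input;
             the abstract does not say how these weights are estimated).
    cluster_scores: one low-dimensional score vector per cluster representative.
    """
    dim = len(cluster_scores[0])
    scores = []
    for w in weights:
        # A convex combination needs nonnegative weights that sum to one.
        if any(x < 0 for x in w) or abs(sum(w) - 1.0) > tol:
            raise ValueError("each weight row must be nonnegative and sum to 1")
        scores.append(
            [sum(w[k] * cluster_scores[k][d] for k in range(len(w)))
             for d in range(dim)]
        )
    return scores


# Hypothetical 2-D scores for three cluster representatives.
cluster_scores = [[0.0, 0.0], [2.0, 0.0], [0.0, 2.0]]

# Hypothetical weights: the first individual sits exactly on cluster 1,
# the second lies between all three representatives.
weights = [[1.0, 0.0, 0.0], [0.5, 0.25, 0.25]]

scores = individual_scores(weights, cluster_scores)
print(scores)
```

Because the weights are nonnegative and sum to one, every individual score lies inside the convex hull of the cluster-representative scores, which is what makes the individual scores readable relative to the clusters.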