A0944
Title: Beta-CoRM binary matrix factorization
Authors: Jose Perusquia - Universidad Nacional Autonoma de Mexico (Mexico) [presenting]
Jim Griffin - University College London (United Kingdom)
Cristiano Villa - Duke Kunshan University (China)
Abstract: Binary data arises naturally across diverse fields, including psychology, natural language processing, and computer science. These datasets are often high-dimensional, with many features per observation, motivating the need for compact, low-dimensional representations. Traditional variable selection techniques may discard important information, whereas matrix factorization methods provide a more flexible alternative by uncovering latent structures. The aim is to propose a Bayesian nonparametric binary matrix factorization model tailored to grouped binary data, a setting often overlooked in the literature. The model assumes that each binary feature is driven by a finite set of latent traits. Two constraints are imposed on the factor matrices: one matrix is binary and encodes the presence or absence of each latent trait, while the other has entries constrained to the unit interval. A key innovation is the use of a link function that introduces a second layer of binary latent indicators, modulating the generation of the observed binary features. This hierarchical structure allows for interpretable and flexible modelling of complex binary datasets.
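To make the two-matrix structure concrete, the following is a minimal generative sketch, not the authors' model. It assumes a noisy-OR style link: a binary trait matrix Z, a unit-interval matrix W, and a second layer of Bernoulli indicators B that decide whether an active trait switches a feature on. The names Z, W, B, X, the function name, the Bernoulli(0.3) and Beta(0.5, 0.5) draws, and the choice of link are all illustrative assumptions; the grouping structure and the nonparametric prior described in the abstract are omitted.

```python
import numpy as np

def simulate_binary_factorization(n=200, p=30, k=4, seed=0):
    """Simulate binary data from a hypothetical two-layer factorization.

    Assumed link (illustrative only): X[i, j] = 1 if at least one trait that
    is active for observation i also fires its indicator for feature j.
    """
    rng = np.random.default_rng(seed)
    # Binary matrix: presence or absence of each latent trait (assumed prior).
    Z = rng.random((n, k)) < 0.3
    # Unit-interval matrix: per-trait, per-feature activation probabilities.
    W = rng.beta(0.5, 0.5, size=(k, p))
    # Second layer of binary latent indicators, B[i, t, j] ~ Bernoulli(W[t, j]).
    B = rng.random((n, k, p)) < W[None, :, :]
    # Observed feature is on when any active trait fires its indicator.
    X = (Z[:, :, None] & B).any(axis=1)
    return Z.astype(int), W, X.astype(int)

Z, W, X = simulate_binary_factorization()
```

Under this assumed link, integrating out the indicators gives P(X[i, j] = 1 | Z, W) = 1 - prod_t (1 - W[t, j])^Z[i, t], so an observation with no active traits has all features equal to zero.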