A0336
Title: Seeded Poisson factorization topic models with covariates
Authors: Bernd Prostmaier - BMW AG (Germany)
Bettina Gruen - WU Vienna University of Economics and Business (Austria) [presenting]
Paul Hofmarcher - University Salzburg (Austria)
Abstract: Topic models infer latent structures in text corpora to guide data-driven detection of themes and to cluster or group documents. The basic topic model only requires transforming the documents in the corpus into a document-term matrix; inference is then based on either the latent Dirichlet allocation model or the Poisson factorization model. Many extensions have been proposed to improve the insights gained in applications and to incorporate additional information, such as seed words to guide topic discovery or covariates to infer, for example, associations between document characteristics and topic distributions. Many of these extensions build on the latent Dirichlet allocation model: keyATM, for example, includes seed words, and the structural topic model allows covariates to be taken into account. The focus here is on the Poisson factorization model: seeded Poisson factorization is extended to include covariates that drive the topic distributions of documents. Estimation with variational inference is investigated to allow for large-scale application, and the performance of the model is assessed empirically, including tools for suitable post-processing and model inspection.
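As a rough illustration of the two basic ingredients named in the abstract, the following minimal sketch builds a document-term matrix from a toy corpus and evaluates the log-likelihood of a plain Poisson factorization, in which each count is Poisson distributed with rate given by the product of document-topic intensities and topic-term intensities. The toy documents, the function names, the whitespace tokenization and the randomly drawn parameters are assumptions made for illustration only; the sketch covers neither seed words, covariates nor the variational inference procedure of the presented work.

```python
import math
from collections import Counter

import numpy as np


def document_term_matrix(docs):
    """Return (counts, vocab) for lowercased, whitespace-tokenized documents."""
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({tok for toks in tokenized for tok in toks})
    index = {term: j for j, term in enumerate(vocab)}
    counts = np.zeros((len(docs), len(vocab)), dtype=int)
    for i, toks in enumerate(tokenized):
        for term, n in Counter(toks).items():
            counts[i, index[term]] = n
    return counts, vocab


def poisson_log_likelihood(counts, theta, beta):
    """Log-likelihood under y_dv ~ Poisson(sum_k theta_dk * beta_kv)."""
    rate = theta @ beta  # shape: documents x terms
    log_factorial = np.vectorize(math.lgamma)(counts + 1.0)
    return float(np.sum(counts * np.log(rate) - rate - log_factorial))


# Hypothetical toy corpus, for illustration only.
docs = ["engine power engine", "topic model text", "text engine"]
counts, vocab = document_term_matrix(docs)

# Arbitrary positive intensities for K = 2 topics (not estimated from data).
rng = np.random.default_rng(0)
theta = rng.gamma(shape=1.0, scale=1.0, size=(len(docs), 2))   # document-topic
beta = rng.gamma(shape=1.0, scale=1.0, size=(2, len(vocab)))   # topic-term
loglik = poisson_log_likelihood(counts, theta, beta)
```

In an actual analysis, the intensities would be estimated rather than drawn at random, for example by variational inference as mentioned in the abstract.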