View Submission - COMPSTAT2022
A0642
Title: LEFDA: An extension of the classical LDA
Authors: Alice Giampino - University of Milano-Bicocca (Italy) [presenting]
Roberto Ascari - University of Milano-Bicocca (Italy)
Sonia Migliorati - University of Milano-Bicocca (Italy)
Abstract: Latent Dirichlet Allocation (LDA) is a popular statistical tool for analyzing text documents when the goal is to detect latent topics. A well-known limitation of LDA is its inability to model positive correlations between topics. This limitation is attributable to the rigidity of the Dirichlet distribution, which is the standard prior for the topic distributions. The aim is to carry out a preliminary study of the extended flexible Dirichlet (EFD) as an alternative prior. The EFD is a generalization of the Dirichlet distribution, defined as a particular structured mixture, that allows for positive correlations between its elements. It retains many good theoretical properties of the Dirichlet, such as identifiability, explicit expressions for the joint moments, and closure under many relevant operations on the simplex. Furthermore, its additional parameters provide greater flexibility while preserving both the interpretability of the model and conjugacy with respect to the multinomial model. The EFD-based generalization of LDA is illustrated via an application to real data using Markov chain Monte Carlo (MCMC) methods.
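The limitation noted in the abstract follows from a standard property of the Dirichlet distribution: any two distinct components are negatively correlated, whatever the parameter values. The short Python sketch below is not taken from the submission. It only illustrates that property, comparing the closed-form correlation with a simulated one. The parameter vector `alpha`, the sample size, and the helper names are hypothetical choices made for illustration, and the EFD itself is not implemented here.

```python
import math
import random


def dirichlet_corr(alpha, i, j):
    """Closed-form correlation between components i != j of Dirichlet(alpha).

    Uses Cov(X_i, X_j) = -alpha_i * alpha_j / (a0^2 * (a0 + 1)) and
    Var(X_i) = alpha_i * (a0 - alpha_i) / (a0^2 * (a0 + 1)), with a0 = sum(alpha).
    """
    a0 = sum(alpha)
    return -math.sqrt(alpha[i] * alpha[j] / ((a0 - alpha[i]) * (a0 - alpha[j])))


def sample_dirichlet(alpha, rng):
    """Draw one Dirichlet(alpha) vector by normalizing independent Gamma draws."""
    gammas = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(gammas)
    return [g / total for g in gammas]


def empirical_corr(xs, ys):
    """Pearson correlation of two equally long sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)


# Hypothetical parameters: three "topics" with illustrative concentrations.
alpha = [2.0, 3.0, 5.0]
rng = random.Random(0)  # fixed seed so the run is reproducible
draws = [sample_dirichlet(alpha, rng) for _ in range(20000)]

analytic = dirichlet_corr(alpha, 0, 1)
simulated = empirical_corr([d[0] for d in draws], [d[1] for d in draws])
print(f"analytic corr(X1, X2):  {analytic:.3f}")
print(f"simulated corr(X1, X2): {simulated:.3f}")
```

Both values are negative for any admissible `alpha`, which is why a Dirichlet prior cannot express topics that tend to co-occur.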