View Submission - COMPSTAT2023
A0297
Title: Choosing the number of topics in LDA models: A Monte Carlo comparison of selection criteria
Authors: Anna Staszewska-Bystrova - University of Lodz (Poland) [presenting]
Victor Bystrov - University of Lodz (Poland)
Viktoriia Naboka-Krell - Justus Liebig University of Giessen (Germany)
Peter Winker - University of Giessen (Germany)
Abstract: Selecting the number of topics in latent Dirichlet allocation (LDA) models is considered a difficult task, for which several alternative approaches have been proposed. The performance of the recently developed singular Bayesian information criterion (sBIC) is evaluated and compared with that of alternative model selection criteria. The sBIC is a generalization of the standard BIC that can be applied to singular statistical models. The comparison is based on Monte Carlo simulations carried out for several settings that vary in the number of topics, the number of documents, and the size of documents in the corpora. Performance is measured using different criteria that take into account not only whether the correct number of topics is selected, but also whether the relevant topics from the data-generating processes (DGPs) are identified. Practical recommendations for LDA model selection in applications are derived.