HiTECCoDES2023
A0160
Title: A comparison of two-sample tests for thematic prevalence under various topic models
Authors: Louisa Kontoghiorghes - King's College London (United Kingdom) [presenting]
Ana Colubi - University of Giessen (Germany)
Abstract: A keyword-based measure of the prevalence of specific topics in a corpus has been introduced previously. Instead of estimating the keyword frequencies directly, the approach applies topic modelling to extract the topic structure of the documents. The prevalence is then computed by averaging the frequencies of the keywords within the topics and taking into account the importance of each topic within the documents. Hypothesis tests for comparing the keyword-based prevalence have also been proposed. Latent Dirichlet Allocation (LDA) was initially suggested for the topic modelling step, but alternatives such as Latent Semantic Analysis (LSA), Probabilistic Latent Semantic Analysis (PLSA) and the Paragraph Vector Topic Model (PVTM) can also be employed. The aim is to compare empirically how these methods behave in this context.
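As a rough illustration of the computation described in the abstract, the sketch below assumes the topic model has already produced a document-topic matrix `theta` (each row a distribution over topics) and a topic-word matrix `phi` (each row a distribution over the vocabulary), as LDA and PLSA do. LSA loadings can be negative and would need some normalisation first, which is not shown. The function names, the equal-weight mean over keywords, and the permutation test on mean per-document prevalence are all assumptions for illustration, not the authors' exact definitions or the tests compared in the study.

```python
import numpy as np


def keyword_prevalence(theta, phi, keyword_ids):
    """Hypothetical per-document prevalence of a keyword set.

    theta: (n_docs, n_topics) document-topic weights, rows summing to 1.
    phi: (n_topics, vocab_size) topic-word probabilities, rows summing to 1.
    keyword_ids: vocabulary indices of the keywords.
    """
    # Assumption: keywords are averaged with equal weight within each topic.
    topic_keyword_freq = phi[:, keyword_ids].mean(axis=1)  # shape (n_topics,)
    # Combine with each topic's importance in each document.
    return theta @ topic_keyword_freq  # shape (n_docs,)


def permutation_test(x, y, n_perm=10_000, seed=0):
    """Two-sided permutation test for a difference in mean prevalence.

    Assumption: one possible two-sample test, not necessarily one of
    those compared in the study.
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    observed = abs(x.mean() - y.mean())
    pooled = np.concatenate([x, y])
    n_x = len(x)
    count = 0
    for _ in range(n_perm):
        perm = rng.permutation(pooled)
        diff = abs(perm[:n_x].mean() - perm[n_x:].mean())
        # Small tolerance so that relabellings equal to the observed split
        # are not missed because of floating-point rounding.
        if diff >= observed - 1e-12:
            count += 1
    # Add-one correction keeps the estimated p-value strictly positive.
    return observed, (count + 1) / (n_perm + 1)


if __name__ == "__main__":
    # Toy inputs: two topics, four-word vocabulary, keywords are words 0 and 1.
    phi = np.array([[0.5, 0.3, 0.1, 0.1],
                    [0.05, 0.05, 0.45, 0.45]])
    theta_a = np.array([[0.9, 0.1], [0.8, 0.2], [0.7, 0.3]])
    theta_b = np.array([[0.1, 0.9], [0.2, 0.8], [0.3, 0.7]])
    prev_a = keyword_prevalence(theta_a, phi, [0, 1])
    prev_b = keyword_prevalence(theta_b, phi, [0, 1])
    stat, p_value = permutation_test(prev_a, prev_b)
    print(prev_a, prev_b, stat, p_value)
```

A permutation test is used here only because it needs no distributional assumptions; a Welch t-test or a bootstrap comparison could take its place without changing the prevalence step. Swapping LDA for another topic model only changes how `theta` and `phi` are obtained.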