COMPSTAT 2023
A0164
Title: Subject prevalence in documents based on topic modeling
Authors: Ana Colubi - University of Giessen (Germany) [presenting]
Louisa Kontoghiorghes - King's College London (United Kingdom)
Abstract: A metric to quantify the relevance of specific subjects within a text is considered. The metric can be used to track the evolution of a subject across a series of documents or to measure the statistical impact of a given text on related literature. To this end, text mining tools are combined with Bayesian and frequentist statistical methods. First, topic modeling is proposed to identify relevant topics. The derived models are then used, with Bayesian techniques, to quantify the relative importance of a subject defined through a given set of terms, or keywords. Finally, bootstrap two-sample tests are proposed to compare the prevalence of subjects in two groups of documents. Illustrative empirical results are provided.
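
The abstract does not specify the form of the bootstrap two-sample test, so the following is only a minimal sketch of one common variant, not the authors' procedure. It assumes that each document has already been reduced to a single prevalence score for the subject of interest, and that the quantity compared is the difference in group means. The function name `bootstrap_two_sample_test`, its parameters, and the example scores are all hypothetical.

```python
import random
from statistics import mean


def bootstrap_two_sample_test(x, y, n_boot=2000, seed=0):
    """Bootstrap test of H0: both groups have the same mean prevalence.

    x, y: per-document prevalence scores (hypothetical inputs).
    Returns the observed mean difference and a two-sided p-value.
    """
    rng = random.Random(seed)
    mean_x, mean_y = mean(x), mean(y)
    observed = mean_x - mean_y

    # Impose H0 by shifting each sample so both share the pooled mean.
    pooled = mean(list(x) + list(y))
    x0 = [v - mean_x + pooled for v in x]
    y0 = [v - mean_y + pooled for v in y]

    extreme = 0
    for _ in range(n_boot):
        # Resample each group with replacement, keeping group sizes fixed.
        xb = rng.choices(x0, k=len(x0))
        yb = rng.choices(y0, k=len(y0))
        if abs(mean(xb) - mean(yb)) >= abs(observed):
            extreme += 1

    # Add-one correction keeps the estimated p-value strictly positive.
    p_value = (extreme + 1) / (n_boot + 1)
    return observed, p_value


# Made-up prevalence scores for two groups of documents (illustration only).
group_a = [0.12, 0.18, 0.15, 0.20, 0.17, 0.14]
group_b = [0.05, 0.07, 0.04, 0.09, 0.06, 0.08]
diff, p = bootstrap_two_sample_test(group_a, group_b)
print(f"mean difference = {diff:.3f}, bootstrap p-value = {p:.4f}")
```

Centering both samples at the pooled mean is one standard way to resample under the null hypothesis of equal means. Other choices, such as a studentized statistic or resampling from the pooled sample, would be equally compatible with the description in the abstract.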