View Submission - EcoSta2023
A1174
Title: Measuring and comparing the thematic prevalence using a parametric and distribution-free bootstrap two-sample test
Authors: Louisa Kontoghiorghes - King's College London (United Kingdom) [presenting]
Ana Colubi - University of Giessen (Germany)
Abstract: An approach has been introduced to measure the prevalence of specific subjects in a corpus using keywords. Instead of estimating the frequencies of the keywords directly, the method utilizes topic modelling to extract the structure of the topics within the documents. This enables the computation of the subject prevalence by averaging the frequencies of the keywords within the topics while taking into account the importance of the topics in the documents. Using a distribution-free bootstrap, a hypothesis test comparing the keyword-based prevalence has been proposed. An alternative parametric bootstrap test is proposed and compared to the existing test. The methodology is demonstrated through a sentiment analysis of Jane Austen's novels.
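The two ideas in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the abstract gives no formulas, so the prevalence definition below (mean keyword probability per topic, weighted by each document's topic proportions) and the test statistic (difference in mean prevalence, with each sample centred to mimic the null hypothesis) are assumptions made for illustration. The names `keyword_prevalence`, `bootstrap_mean_test`, `theta`, and `phi` are hypothetical, and the topic-model output is taken as given rather than fitted.

```python
import random


def mean(values):
    return sum(values) / len(values)


def keyword_prevalence(theta, phi, keyword_ids):
    """Per-document keyword prevalence (assumed definition, see lead-in).

    theta: list of D rows, each with K topic proportions for one document.
    phi:   list of K rows, each with V word probabilities for one topic.
    keyword_ids: vocabulary indices of the keywords for the subject.
    """
    # Average the keyword probabilities within each topic.
    per_topic = [mean([row[w] for w in keyword_ids]) for row in phi]
    # Weight the per-topic averages by the topic importance in each document.
    return [sum(t * p for t, p in zip(doc, per_topic)) for doc in theta]


def bootstrap_mean_test(x, y, n_boot=2000, seed=0):
    """Distribution-free bootstrap test of equal mean prevalence.

    Returns the observed difference in means and a two-sided p-value.
    """
    rng = random.Random(seed)
    observed = mean(x) - mean(y)
    # Centre each sample so that resampling is done under the null hypothesis.
    x0 = [v - mean(x) for v in x]
    y0 = [v - mean(y) for v in y]
    exceed = 0
    for _ in range(n_boot):
        xb = rng.choices(x0, k=len(x0))  # resample with replacement
        yb = rng.choices(y0, k=len(y0))
        if abs(mean(xb) - mean(yb)) >= abs(observed):
            exceed += 1
    # Add-one correction keeps the p-value strictly positive.
    return observed, (exceed + 1) / (n_boot + 1)


# Toy inputs: 2 documents, 2 topics, 3 vocabulary words, keywords = words 0 and 1.
theta = [[1.0, 0.0], [0.5, 0.5]]
phi = [[0.2, 0.3, 0.5], [0.6, 0.1, 0.3]]
prevalence = keyword_prevalence(theta, phi, [0, 1])

# Two hypothetical groups of per-document prevalence values to compare.
group_a = [0.9, 1.0, 1.1, 1.0, 0.95]
group_b = [0.1, 0.0, 0.05, 0.1, 0.0]
diff, p_value = bootstrap_mean_test(group_a, group_b)
```

A parametric variant would replace the resampling step with draws from a model fitted to each group; which model the authors use is not stated in the abstract, so it is left out here.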