COMPSTAT 2023
A0331
Title: Textual content and academic journals selectiveness: A case of economic journals
Authors: Pawel Baranowski - Institute of Economic and Financial Research, Lodz, Poland (Poland) [presenting]
Szymon Wojcik - University of Lodz (Poland)
Abstract: The currently observed vast influx of papers obstructs editorial procedures in scientific journals. This phenomenon applies particularly to top-quality academic journals with high scientific impact. Moreover, it stimulates the emergence of low- (or non-) selective journals, which attract authors with short editorial procedures in exchange for high fees. We argue that introducing natural language processing (NLP) can help distinguish the papers worth reading by the editor from those whose scientific quality does not meet the standards of the journal. To test this hypothesis, we apply state-of-the-art large language models, i.e. bidirectional encoder representations from transformers (BERT). Our sample consists of approximately 400 academic papers in economics, finance or business. The papers were collected from journals at three levels of selectiveness, namely: highly selective (top-tier journals), moderately selective (journals listed in the DOAJ), and non-selective ("predatory" journals). More specifically, we used a pre-trained Sci-BERT model on anonymized and pre-processed texts of academic papers. The results show that textual content alone may give more than 80% out-of-sample accuracy in classifying texts into the three levels of selectiveness. The outcomes of the study demonstrate the usefulness of NLP in distinguishing the scientific quality of a paper and support Beall's classification of "predatory" journals.
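As an illustration of what "out-of-sample accuracy" over three levels of selectiveness means, the evaluation step can be sketched as below. This is a minimal sketch and not the authors' pipeline: the label names, the 20% hold-out share, and the `predict` callable (standing in for a fine-tuned classifier) are all assumptions made for the example.

```python
import random
from collections import defaultdict

# Hypothetical label names for the three levels of selectiveness.
LABELS = ("highly_selective", "moderately_selective", "non_selective")


def stratified_split(items, test_fraction=0.2, seed=0):
    """Split (text, label) pairs so every label is represented in the test set."""
    by_label = defaultdict(list)
    for text, label in items:
        by_label[label].append((text, label))
    rng = random.Random(seed)
    train, test = [], []
    for label in sorted(by_label):
        group = by_label[label][:]
        rng.shuffle(group)
        # Hold out at least one paper per label.
        n_test = max(1, round(len(group) * test_fraction))
        test.extend(group[:n_test])
        train.extend(group[n_test:])
    return train, test


def accuracy(predict, test_items):
    """Share of held-out papers whose predicted label matches the true label."""
    correct = sum(1 for text, label in test_items if predict(text) == label)
    return correct / len(test_items)
```

Here `predict` is any function mapping a paper's text to one of the three labels. The classifier would be trained only on the `train` part, and `accuracy(predict, test)` then measures performance on papers it has never seen.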