A0426
Title: LongFinBERT: A language model for long financial documents
Authors: Erik-Jan Senn - University of St. Gallen (Switzerland) [presenting]
Minh Tri Phan - University of St. Gallen (Switzerland)
Abstract: LongFinBERT, a modern language model specialized for processing long financial documents, is introduced. Owing to an adaptation of the model architecture, LongFinBERT has substantially lower computational requirements for lengthy documents than other state-of-the-art language models. This makes it possible to process, for example, an entire annual accounting filing at once, which was previously computationally infeasible for language models. LongFinBERT is applied in two empirical settings. First, the aim is to improve the detection of financial misreporting using text from 10-K filings from 1994 to 2018. Misreporting predictions that use text-based features from LongFinBERT outperform those based solely on accounting variables or on other textual models, namely latent Dirichlet allocation, neural document embeddings, and FinBERT. Second, it is found that market returns respond to year-over-year changes in accounting disclosures, as measured using LongFinBERT.