Title: Bayesian topic regression: An econometric model for inference with heterogeneous high dimensional data
Authors: Julian Ashwin - University of Oxford (United Kingdom) [presenting]
Maximillian Ahrens - University of Oxford (United Kingdom)
Abstract: When incorporating text data into econometric models, a fundamental question is how to represent and select text features. Dictionaries and other unsupervised feature selection approaches have been widely used in financial and economic modelling to provide interpretable measures of interest from text data. However, these methods are often not optimised for the research question at hand: because feature selection is performed separately from the subsequent econometric analysis, information relevant to the inference task may be discarded. We combine a supervised LDA topic model with a multivariate Bayesian regression framework, which allows us to perform feature extraction and parameter estimation simultaneously. This has several advantages over existing supervised feature selection methods. First, our Bayesian approach allows inference on the regression coefficients that takes into account the sampling uncertainty involved in estimating the text features. Second, by estimating the coefficients on text and non-text covariates jointly, it respects the Frisch-Waugh-Lovell theorem, which rules out the shortcut of simply fitting the text model to a residualised dependent variable. Third, we are able to use information from observations that have no text document. Finally, our model allows meaningful inference when the feature space is very high-dimensional but the number of observations is relatively small. We demonstrate this on synthetic data and on central bank communication data.
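The Frisch-Waugh-Lovell point can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not the authors' model: it uses plain OLS rather than Bayesian estimation, and a single simulated variable `z` stands in for an estimated text feature (e.g. a topic share) that is correlated with a non-text covariate `x`. All names and coefficient values are hypothetical. Regressing `y` on `x` and `z` jointly, or residualising both `y` and `z` on `x` (the FWL route), recovers the true coefficient on `z`; residualising `y` alone and then regressing on the raw `z` does not.

```python
import numpy as np


def ols(X, y):
    """Least-squares coefficients of y on the columns of X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta


def residualise(v, X):
    """Residuals of v after projecting it onto the columns of X."""
    return v - X @ ols(X, v)


rng = np.random.default_rng(0)
n = 5000

# Hypothetical data-generating process (illustration only):
x = rng.normal(size=n)                      # non-text covariate
z = 0.8 * x + rng.normal(size=n)            # stand-in text feature, correlated with x
y = 1.0 * x + 2.0 * z + rng.normal(size=n)  # true coefficient on z is 2.0

X = np.column_stack([np.ones(n), x])        # intercept + non-text covariate

# 1. Joint regression of y on [1, x, z]: coefficient on z
b_joint = ols(np.column_stack([X, z]), y)[2]

# 2. FWL: residualise BOTH y and z on X, then regress residual on residual
b_fwl = ols(residualise(z, X)[:, None], residualise(y, X))[0]

# 3. Shortcut: residualise y only, then regress it on the raw z
b_shortcut = ols(np.column_stack([np.ones(n), z]), residualise(y, X))[1]

print(f"joint: {b_joint:.3f}  FWL: {b_fwl:.3f}  y-only residualised: {b_shortcut:.3f}")
```

In this setup the joint and FWL estimates coincide (up to floating-point error) and lie close to 2.0, while the shortcut estimate is attenuated towards roughly 1.2, because the part of `z` that is explained by `x` is never removed from the regressor.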