Title: Nowcasting payroll employment with traditional media content
Authors: Clement Bortoli - INSEE (France)
Stephanie Combes - INSEE (France)
Thomas Renault - Université Paris 1 Panthéon-Sorbonne (France) [presenting]
Abstract: Flash payroll employment statistics in France are published quarterly, 45 days after the end of the quarter. To ``predict the present'', forecasters rely mainly on business tendency and consumer confidence surveys. Building on findings from other fields of research showing that value-relevant information can be extracted from content published in traditional newspapers, we consider media content as a complementary source of data to improve forecasts of French employment at different time horizons. Features are extracted from a large sample of 1,354,100 articles published in the newspaper ``Le Monde'' between 1990 and 2016. We first adopt a simple ``bag-of-words'' representation using the frequency of occurrence of each word in the content published during a month. However, a simple bag-of-words approach fails to capture word polysemy and synonymy, as the context in which a word is used is not taken into account. To address this issue, we also consider more advanced text analytics methods: a continuous bag-of-words approach and a probabilistic topic identification approach (Latent Dirichlet Allocation). Finally, penalized regressions are used to select the most relevant features with respect to forecast performance.
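
Illustration: the first and last steps described in the abstract (monthly word frequencies, then a penalized regression that selects features) can be sketched as follows. This is a minimal sketch, not the authors' implementation. The abstract does not say which penalty is used, so an L1 (lasso) penalty fitted by coordinate descent is assumed here; the function names, the crude `\w+` tokenizer, and the penalty weight `alpha` are all hypothetical choices made for illustration.

```python
import re
from collections import Counter


def monthly_bag_of_words(articles):
    """Map each month to the relative frequency of every word published in it.

    `articles` is an iterable of (month, text) pairs. Tokenization is a
    crude lowercase `\\w+` split (an assumption, not the authors' method).
    """
    counts = {}
    for month, text in articles:
        tokens = re.findall(r"\w+", text.lower())
        counts.setdefault(month, Counter()).update(tokens)
    freqs = {}
    for month, counter in counts.items():
        total = sum(counter.values())
        freqs[month] = {word: n / total for word, n in counter.items()}
    return freqs


def centred_design(freqs):
    """Build a month-by-word matrix of frequencies with centred columns."""
    months = sorted(freqs)
    vocab = sorted({word for f in freqs.values() for word in f})
    X = [[freqs[m].get(w, 0.0) for w in vocab] for m in months]
    means = [sum(col) / len(months) for col in zip(*X)]
    X = [[x - mu for x, mu in zip(row, means)] for row in X]
    return months, vocab, X


def soft_threshold(z, gamma):
    """Shrink z towards zero by gamma; values within [-gamma, gamma] become 0."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0


def lasso_cd(X, y, alpha, n_iter=200):
    """Minimise (1/2n)*||y - X b||^2 + alpha*||b||_1 by coordinate descent.

    X is a list of rows; X columns and y are assumed to be centred.
    Coefficients shrunk exactly to zero correspond to dropped features.
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    resid = list(y)  # residuals y - X beta, with beta starting at zero
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue  # constant column carries no information
            # Covariance of column j with the residual that excludes feature j.
            rho = sum(X[i][j] * (resid[i] + X[i][j] * beta[j]) for i in range(n)) / n
            new = soft_threshold(rho, alpha) / col_sq[j]
            delta = new - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                beta[j] = new
    return beta
```

In such a setup, the rows of the design matrix would be months, the target would be the (centred) employment series aligned to those months, and the words whose coefficients remain non-zero would be the selected features. In practice `alpha` would be tuned on out-of-sample forecast performance rather than fixed in advance.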