Title: Correcting the bias of economic aggregates that is caused by classification errors
Authors: Quinten Meertens - University of Amsterdam (Netherlands) [presenting]
Cees Diks - University of Amsterdam (Netherlands)
Jaap van den Herik - Leiden University (Netherlands)
Frank Takes - Leiden University (Netherlands)
Abstract: In economic statistics, estimated aggregates are often based on underlying classifications. If the class labels are predicted by a classification algorithm, the data may contain classification errors. This occurs, for example, when social media data are used to estimate the number of people who will vote for a political candidate. The focus is on the effect of classification errors on the accuracy of estimated aggregates (such as counts). The first finding is that even highly accurate classification algorithms might result in relatively strongly biased aggregates. We then developed novel methods to correct that bias, making more effective use of accuracy data such as estimated precision and recall (or estimated type I and type II error rates). The new methods are shown to have serious implications for a wide range of applications in economics and machine learning, including e-commerce estimates, land use statistics, epidemiology and election predictions. We are currently investigating the potential of ranking rather than classification, as well as algorithm-specific bias corrections. The aim is to develop bias corrections at the micro level that minimize the mean squared error at the aggregate level.
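
To illustrate how a highly accurate classifier can still bias a count, here is a minimal sketch for the binary case. It is not the authors' method: it uses the standard textbook correction that inverts the misclassification probabilities, and it assumes sensitivity and specificity are known exactly. The function names and all numbers (a 5% true share, 95% sensitivity and specificity) are hypothetical and chosen only for illustration.

```python
def expected_observed_share(true_share, sensitivity, specificity):
    """Expected share of units labelled positive by the classifier.

    True positives are kept with probability `sensitivity`; true negatives
    are wrongly labelled positive with probability 1 - `specificity`.
    """
    return true_share * sensitivity + (1.0 - true_share) * (1.0 - specificity)


def corrected_share(observed_share, sensitivity, specificity):
    """Invert the relation above to recover the true share.

    This is the classical misclassification-matrix correction, used here
    as an illustrative stand-in, not the method proposed in the abstract.
    """
    denom = sensitivity + specificity - 1.0
    if denom <= 0.0:
        raise ValueError("classifier must be better than random guessing")
    return (observed_share - (1.0 - specificity)) / denom


# Hypothetical inputs: a rare class and a classifier that is 95% accurate.
true_share = 0.05
sensitivity = 0.95
specificity = 0.95

observed = expected_observed_share(true_share, sensitivity, specificity)
corrected = corrected_share(observed, sensitivity, specificity)
relative_bias = (observed - true_share) / true_share

print(f"observed share:  {observed:.4f}")
print(f"relative bias:   {relative_bias:.0%}")
print(f"corrected share: {corrected:.4f}")
```

Under these assumed numbers the uncorrected share is nearly double the true share, because false positives from the large negative class outnumber the true positives that were missed. In practice the error rates are themselves estimated, so a correction of this kind removes bias at the cost of added variance, which is why the mean squared error of the aggregate is the relevant criterion.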