B0160

Title: Statistical model selection with Big Data
Authors: David Hendry - University of Oxford (United Kingdom) [presenting]
Jurgen Doornik - Oxford University (United Kingdom)
Abstract: Big Data offer potential benefits for statistical modelling, but also raise problems, including an excess of false positives, mistaking correlations for causes, selecting by inappropriate methods, and tackling vast computations. Paramount considerations when searching for a data-based relationship using Big Data include: the formulation problem of embedding underlying relationships in general initial models, possibly using non-statistical criteria to restrict the number of variables selected over; the selection problem of using good-quality data on all variables, analysed at tight significance levels by a powerful selection procedure, while retaining available theory insights; the evaluation problem of testing whether relationships are well specified and invariant to shifts in explanatory variables; and the computational problem of finding a viable approach to handling immense numbers of possible models. The last is especially important for the extended general-to-specific approach in Autometrics, but a feasible solution is described that mixes multiple block path searches while retaining theory insights.
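
The link between tight significance levels and the excess of false positives can be illustrated with a small sketch. It is not the Autometrics procedure and is not taken from the submission. It assumes that every candidate variable is irrelevant and that the tests are independent, so each p-value is uniform on [0, 1] and about alpha times N candidates are retained by chance. The function names and parameter values are hypothetical, chosen only for illustration.

```python
import random


def expected_false_positives(n_candidates, alpha):
    """Expected number of irrelevant variables retained by chance.

    Assumption (not from the abstract): independent tests, all nulls true.
    """
    return n_candidates * alpha


def simulate_false_positives(n_candidates, alpha, n_reps=200, seed=0):
    """Average count of chance retentions over n_reps simulated searches.

    Under a true null a p-value is uniform on [0, 1], so each irrelevant
    candidate is modelled as a uniform draw compared with alpha.
    """
    rng = random.Random(seed)
    total = 0
    for _ in range(n_reps):
        total += sum(rng.random() < alpha for _ in range(n_candidates))
    return total / n_reps


if __name__ == "__main__":
    n = 1000  # hypothetical number of irrelevant candidate variables
    for alpha in (0.05, 0.01, 0.001):
        print(
            f"alpha={alpha}: expected={expected_false_positives(n, alpha):.1f}, "
            f"simulated={simulate_false_positives(n, alpha):.2f}"
        )
```

Under these assumptions, tightening alpha as the number of candidates grows keeps the expected number of spurious retentions small, which is one reading of "tight significance levels" in the abstract.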