View Submission - EcoSta 2025
A0714
Title: Big data reduction based on information matrix: Algorithm development and application in finance
Authors: William Li - Shanghai Jiao Tong University (China) [presenting]
Abstract: The need to analyze large amounts of data without losing information is evidenced by the recent increase in attention to the information-based optimal sub-data selection (IBOSS) approach. However, this framework has not been explored systematically; in particular, the optimal subset has not been characterized for models more complex than first-order linear models. Motivated by a real finance case study on the effect of corporate attributes on firm value, the framework and the steps required to use IBOSS for data reduction are systematically explored. In the context of second-order models, a novel algorithm is developed for selecting informative sub-data. The performance of the proposed algorithm is also evaluated in terms of prediction and variable selection, the latter of which is important for complex models but has not received sufficient attention in the IBOSS field. Empirical studies demonstrate that the proposed algorithm adequately addresses the trade-off between computational complexity and statistical efficiency, one of six core research directions for theoretical data science research proposed by the US National Science Foundation. The case study demonstrates the potential impact of the IBOSS strategy in scientific fields beyond statistics, particularly finance.
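For orientation, the first-order IBOSS rule that the abstract takes as its starting point can be sketched as follows. This is a minimal illustration of the baseline selection rule as it is commonly described for first-order linear models (for each covariate in turn, keep the rows with the most extreme values among those not yet selected); it is not the second-order algorithm proposed in the talk, whose details the abstract does not give. The function name `iboss_first_order`, the budget `k`, and the simulated data are assumptions made for illustration only.

```python
import numpy as np

def iboss_first_order(X, k):
    """Return about k row indices of X chosen by the baseline IBOSS rule.

    Hypothetical helper for illustration: for each of the p covariates,
    take the r = k // (2 * p) smallest and r largest values among the
    rows that have not been selected yet.
    """
    n, p = X.shape
    r = k // (2 * p)  # rows taken from each tail of each covariate
    if r < 1 or k > n:
        raise ValueError("need 2 * p <= k <= n")
    available = np.ones(n, dtype=bool)
    chosen = []
    for j in range(p):
        idx = np.flatnonzero(available)
        # A full sort keeps the sketch simple; a partial selection
        # (e.g. np.argpartition) would avoid the O(n log n) cost.
        order = idx[np.argsort(X[idx, j], kind="stable")]
        picks = np.concatenate([order[:r], order[-r:]])
        chosen.append(picks)
        available[picks] = False
    return np.concatenate(chosen)

# Usage on simulated data (assumed, not from the case study).
rng = np.random.default_rng(0)
X = rng.standard_normal((10_000, 5))
sub = iboss_first_order(X, k=100)
Z = np.column_stack([np.ones(len(sub)), X[sub]])
info_matrix = Z.T @ Z  # information matrix of the sub-data, up to sigma^2
```

Selecting rows in the tails of each covariate is motivated by D-optimality: extreme covariate values tend to enlarge the determinant of the information matrix, which is the quantity the sub-data is meant to preserve.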