A0163
Title: Fast inference for quantile regression with tens of millions of observations
Authors: Myung Hwan Seo - Seoul National University (Korea, South) [presenting]
Yuan Liao - Rutgers University (United States)
Youngki Shin - University of Technology Sydney (Australia)
Sokbae Lee - Columbia University (United States)
Abstract: Big data analytics has opened new avenues in economic research, but analyzing datasets with tens of millions of observations remains a substantial challenge. Conventional econometric methods based on extremum estimators require large amounts of computing resources and memory, which are often not readily available. The focus is on linear quantile regression applied to ultra-large datasets, such as U.S. decennial censuses. A fast inference framework utilizing stochastic sub-gradient descent (S-subGD) updates is presented. The proposed test statistic is calculated fully online, and critical values are obtained without resampling. Extensive numerical studies are conducted to showcase the computational merits of the proposed inference. For inference problems as large as $(n, d) \sim (10^7, 10^3)$, where $n$ is the sample size and $d$ is the number of regressors, the method generates new insights, surpassing current inference methods in computation. In particular, the method reveals trends in the gender gap in the U.S. college wage premium using millions of observations while controlling for over $10^3$ covariates to mitigate confounding effects.
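
To illustrate the kind of update the abstract refers to, the following is a minimal sketch of averaged stochastic sub-gradient descent for linear quantile regression, processing one observation at a time so that memory use does not grow with n. It covers only the point estimate; the online test statistic and its resampling-free critical values are not reproduced here. The function names (`ssubgd_quantile`, `simulate`), the learning-rate schedule `gamma0 * i ** (-a)`, its default values, and the simulated data are all assumptions made for illustration, not details taken from the abstract.

```python
import random


def ssubgd_quantile(stream, dim, tau=0.5, gamma0=1.0, a=0.501):
    """Averaged stochastic sub-gradient descent for linear quantile regression.

    `stream` yields (x, y) pairs, where x is a sequence of length `dim`.
    The step-size schedule gamma0 * i**(-a) is an illustrative assumption.
    """
    beta = [0.0] * dim      # current iterate
    beta_bar = [0.0] * dim  # running average of the iterates
    for i, (x, y) in enumerate(stream, start=1):
        resid = y - sum(b * xj for b, xj in zip(beta, x))
        # Sub-gradient of the check loss rho_tau(y - x'beta) with respect
        # to beta is x * (1{resid <= 0} - tau).
        g = (1.0 if resid <= 0 else 0.0) - tau
        step = gamma0 * i ** (-a)
        beta = [b - step * g * xj for b, xj in zip(beta, x)]
        # Online update of the average; no past iterates are stored.
        beta_bar = [bb + (b - bb) / i for bb, b in zip(beta_bar, beta)]
    return beta_bar


def simulate(n, seed=0):
    """Hypothetical data: y = 1 + 2*x1 + e, with e standard normal."""
    rng = random.Random(seed)
    for _ in range(n):
        x1 = rng.gauss(0.0, 1.0)
        y = 1.0 + 2.0 * x1 + rng.gauss(0.0, 1.0)
        yield (1.0, x1), y


# Median regression (tau = 0.5) on a simulated stream of observations.
est = ssubgd_quantile(simulate(100_000), dim=2, tau=0.5)
print(est)
```

Because each observation is used once and then discarded, the cost per update is proportional to the number of regressors, which is what makes this style of estimator attractive when the full dataset does not fit in memory.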