View Submission - EcoSta2024
A0355
Title: Mixture conditional regression with ultrahigh dimensional text data for estimating extralegal factor effects
Authors: Jiaxin Shi - Central University of Finance and Economics (China) [presenting]
Fang Wang - Shandong University (China)
Yuan Gao - Central University of Finance and Economics (China)
Xiaojun Song - Guanghua School of Management, Peking University (China)
Hansheng Wang - Peking University (China)
Abstract: Testing judicial impartiality is a problem of fundamental importance in empirical legal studies, where standard regression methods are widely used to estimate the effects of extralegal factors. However, those methods cannot handle control variables of ultrahigh dimensionality, such as those extracted from judgment documents recorded in text format. To solve this problem, a novel mixture conditional regression (MCR) approach is developed, which assumes that the whole sample can be classified into a number of latent classes. Within each latent class, a standard linear regression model describes the relationship between the response and a key feature vector, which is assumed to be of fixed dimension. The ultrahigh dimensional control variables are used to determine the latent class membership, and a naive Bayes-type model describes that relationship. Hence, the dimension of the control variables is allowed to be arbitrarily high. A novel expectation-maximization algorithm is developed for model estimation. With this algorithm, the key parameters of interest can be estimated as efficiently as if the true class membership were known in advance. Simulation studies are presented to demonstrate the proposed MCR method. A real dataset of Chinese burglary offences is analyzed for illustration.
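The abstract gives no formulas, so the following is only an illustrative sketch of one model that fits the description, not the authors' specification. All notation is assumed here: $K$ latent classes with labels $Z_i$, response $Y_i$, key feature vector $X_i \in \mathbb{R}^p$ with $p$ fixed, binary text indicators $W_{ij}$ for $j = 1, \dots, d$ with $d$ ultrahigh, class-specific coefficients $\beta_k$, and Gaussian errors. The updates shown are the generic textbook EM steps for such a mixture; the novel algorithm mentioned in the abstract may differ.

```latex
% Assumed model (hypothetical notation, not taken from the abstract)
\begin{align*}
P(Z_i = k) &= \pi_k, \qquad k = 1, \dots, K, \\
W_{ij} \mid Z_i = k &\sim \mathrm{Bernoulli}(p_{kj}),
  \quad \text{independently over } j = 1, \dots, d
  \quad \text{(naive Bayes)}, \\
Y_i \mid X_i, Z_i = k &\sim N\!\left(X_i^\top \beta_k,\ \sigma_k^2\right).
\end{align*}

% E-step: posterior class probabilities, with \phi the normal density
\begin{align*}
\gamma_{ik} &=
  \frac{\pi_k\, \phi\!\left(Y_i;\ X_i^\top \beta_k,\ \sigma_k^2\right)
        \prod_{j=1}^{d} p_{kj}^{W_{ij}} (1 - p_{kj})^{1 - W_{ij}}}
       {\sum_{l=1}^{K} \pi_l\, \phi\!\left(Y_i;\ X_i^\top \beta_l,\ \sigma_l^2\right)
        \prod_{j=1}^{d} p_{lj}^{W_{ij}} (1 - p_{lj})^{1 - W_{ij}}}.
\end{align*}

% M-step: weighted frequencies and weighted least squares
\begin{align*}
\pi_k &= \frac{1}{n} \sum_{i=1}^{n} \gamma_{ik}, &
p_{kj} &= \frac{\sum_{i=1}^{n} \gamma_{ik} W_{ij}}{\sum_{i=1}^{n} \gamma_{ik}}, \\
\beta_k &= \left( \sum_{i=1}^{n} \gamma_{ik} X_i X_i^\top \right)^{-1}
           \sum_{i=1}^{n} \gamma_{ik} X_i Y_i, &
\sigma_k^2 &= \frac{\sum_{i=1}^{n} \gamma_{ik} \left( Y_i - X_i^\top \beta_k \right)^2}
                   {\sum_{i=1}^{n} \gamma_{ik}}.
\end{align*}
```

Under this sketch the text variables enter only through the product over $j$ in the class weights $\gamma_{ik}$, while each regression step involves only the $p$-dimensional $X_i$. This is one way to read the abstract's claim that the dimension of the control variables can be arbitrarily high.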