View Submission - EcoSta2023
A1038
Title: Operator-induced structural variable selection at scale: iBART and materials GWAS
Authors: Meng Li - Rice University (United States) [presenting]
Abstract: In the emerging field of materials informatics, a fundamental task is to identify physicochemically meaningful descriptors, or materials genes, which are engineered from primary features and a set of elementary algebraic operators through compositions. Such materials genome-wide association studies, or materials GWAS, pose unprecedented challenges to statistical analysis, partly due to the astronomically large number of correlated predictors and the limited sample size. This problem is formulated as variable selection with operator-induced structure (OIS), and a new method is proposed to achieve unconventional dimension reduction by utilizing the geometry embedded in OIS. Although the model remains linear, the method iterates nonparametric variable selection to achieve effective dimension reduction. This enables variable selection based on ab initio primary features, leading to a method that is orders of magnitude faster than existing methods, with improved accuracy. To select the nonparametric module, a desired performance criterion that is uniquely induced by variable selection with OIS is discussed; in particular, a Bayesian Additive Regression Trees (BART)-based variable selection method is proposed, leading to iterative BART (iBART). Numerical studies show the superiority of the proposed method, which continues to exhibit robust performance when the input dimension is out of reach of existing methods. Applications to single-atom catalysis will be discussed.
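The iterate-then-expand idea in the abstract can be illustrated with a minimal sketch. It is not the iBART implementation: the selector below is a simple marginal-correlation screen standing in for BART-based selection, and the operator set (square, product, sum), the function names, and the simulated data are all hypothetical choices made for illustration. The abstract does not specify any of these details.

```python
import itertools
import math
import random


def pearson_abs(x, y):
    """Absolute Pearson correlation; 0.0 if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0.0 or syy == 0.0:
        return 0.0
    return abs(sxy) / math.sqrt(sxx * syy)


def screen(features, y, keep):
    """Stand-in selector (NOT BART): keep the `keep` columns most correlated with y."""
    ranked = sorted(features, key=lambda name: pearson_abs(features[name], y), reverse=True)
    return {name: features[name] for name in ranked[:keep]}


def expand(features):
    """Apply a small, hypothetical operator set to the surviving columns only."""
    out = dict(features)
    for name, col in features.items():
        out[f"({name})^2"] = [v * v for v in col]
    # Sort by name so that composed descriptor names are deterministic.
    for (a, ca), (b, cb) in itertools.combinations(sorted(features.items()), 2):
        out[f"({a}*{b})"] = [u * v for u, v in zip(ca, cb)]
        out[f"({a}+{b})"] = [u + v for u, v in zip(ca, cb)]
    return out


def iterative_selection(primary, y, rounds=1, keep=3):
    """Select on primary features first, then alternate expansion and selection."""
    selected = screen(primary, y, keep)
    for _ in range(rounds):
        selected = screen(expand(selected), y, keep)
    return selected


# Simulated data (an assumption for illustration): the response depends on x0 * x1.
rng = random.Random(0)
n = 200
primary = {f"x{j}": [rng.uniform(1.0, 2.0) for _ in range(n)] for j in range(5)}
y = [a * b + 0.01 * rng.gauss(0.0, 1.0) for a, b in zip(primary["x0"], primary["x1"])]

selected = iterative_selection(primary, y, rounds=1, keep=3)
print(list(selected))
```

Because operators are applied only to features that survive each selection round, the candidate set grows from the few selected columns rather than from the full composed descriptor space, which is the source of the dimension reduction the abstract describes.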