CMStatistics 2021
B1597
Title: Liu after random forest: Application of machine learning methods in modeling high-dimensional chemical data
Authors: Mohammad Arashi - Ferdowsi University of Mashhad (Iran) [presenting]
Adewale F Lukman - Landmark University (Nigeria)
Zakariya Y Algamal - University of Mosul (Iraq)
Abstract: Modern technology gives us access to data with many features, so feature engineering has become a vital task in data analysis. One challenge in model estimation is combating multicollinearity in high-dimensional problems, where the number of features exceeds the number of samples. We propose a novel yet simple strategy for estimating the regression parameters in a high-dimensional regime in the presence of multicollinearity. The proposed approach combines the good properties of the random forest with the simple structure of a class of linear unified estimators. We give a fast and straightforward algorithm for estimating the regression coefficients when multicollinearity exists. Numerical investigation reveals the superior performance of the method in terms of prediction error. The technique is also applied to chemical melting data, where we carry out estimation among 4885 features and discuss the advantages.
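The abstract does not spell out the algorithm, so the following is only a minimal sketch of one plausible reading of the title: screen features with a random forest, then fit a Liu-type estimator on the retained columns. It assumes the standard Liu estimator, (X'X + I)^{-1}(X'y + d * beta_OLS) with 0 < d < 1, as the "linear unified estimator". The function names `liu_estimator` and `liu_after_forest`, the screening size `k`, the value of `d`, and the simulated data are all hypothetical and not taken from the submission.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor


def liu_estimator(X, y, d=0.5):
    """Standard Liu estimator: (X'X + I)^{-1} (X'y + d * beta_ols)."""
    XtX = X.T @ X
    Xty = X.T @ y
    # OLS is only defined when X'X is invertible, hence the screening step.
    beta_ols = np.linalg.solve(XtX, Xty)
    return np.linalg.solve(XtX + np.eye(X.shape[1]), Xty + d * beta_ols)


def liu_after_forest(X, y, k=10, d=0.5, seed=0):
    """Hypothetical two-stage sketch: forest screening, then Liu on kept columns."""
    forest = RandomForestRegressor(n_estimators=200, random_state=seed).fit(X, y)
    # Keep the k columns with the largest impurity-based importance (assumed rule).
    keep = np.sort(np.argsort(forest.feature_importances_)[-k:])
    # Center the data so that no intercept term is needed.
    Xk = X[:, keep] - X[:, keep].mean(axis=0)
    beta = liu_estimator(Xk, y - y.mean(), d)
    return keep, beta


# Simulated data (assumption): more features than samples, one collinear pair.
rng = np.random.default_rng(0)
n, p = 60, 300
X = rng.normal(size=(n, p))
X[:, 1] = X[:, 0] + 0.05 * rng.normal(size=n)
y = 3.0 * X[:, 0] + 2.0 * X[:, 2] + 0.1 * rng.normal(size=n)

keep, beta = liu_after_forest(X, y, k=10, d=0.5)
print(keep, beta.round(2))
```

In this sketch `k` must stay below the number of samples so that the OLS step is defined. Setting `d = 1` recovers ordinary least squares on the screened columns, while smaller values of `d` shrink the coefficients further.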