CMStatistics 2023
B0640
Title: Manifold random forests for decoding EEG data and estimating mutual information
Authors: Adam Li - Columbia University (United States) [presenting]
Ronan Perry - University of Washington (United States)
Chester Huynh - Microsoft (United States)
Joshua Vogelstein - Johns Hopkins University (United States)
Abstract: Quantitative understanding of biomarkers for mental health requires robust models that can handle high-dimensional and structured data. Moreover, it is important to be able to conduct valid statistical inference to determine, for example, whether two potential biomarkers of mental health are conditionally independent. Decision forests, including random forests, have established themselves over the past couple of decades as a powerful ensemble learning method in supervised settings, including both classification and regression. Beyond prediction, one may be interested in hypothesis testing and in estimating information-theoretic quantities, such as conditional mutual information. Decision trees can be used as semi-parametric models for a wide variety of tasks. It is demonstrated how decision trees can be used to model manifolds, and they are applied to high-dimensional EEG data. Manifold oblique random forests (MORF) are used to analyze an intracranial EEG (iEEG) dataset containing subcortical and cortical brain recordings from epilepsy subjects undergoing iEEG monitoring for clinical purposes. In addition, honest random forests are leveraged to estimate mutual information. All the models are implemented in a package called scikit-tree, which is scikit-learn compatible and leverages Cython and C++ for scalability.
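The abstract does not specify how mutual information is computed from forest outputs. One common route, sketched here purely as an assumption, is the plug-in identity I(X; Y) = H(Y) - H(Y | X), where the conditional entropy is averaged over per-sample class posteriors such as those a fitted honest forest might return. The function names `entropy` and `plugin_mutual_information` are hypothetical and are not part of scikit-tree; the posteriors are passed in as plain lists so the sketch depends only on the standard library.

```python
import math


def entropy(probs):
    """Shannon entropy (in nats) of a discrete distribution."""
    # Terms with zero probability contribute nothing to the sum.
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def plugin_mutual_information(posteriors, labels):
    """Plug-in estimate of I(X; Y) = H(Y) - H(Y | X).

    posteriors: list of rows, one per sample, each row holding the
        estimated class probabilities P(Y = k | X = x_i). Assumed here
        to come from a classifier such as an honest forest.
    labels: list of integer class labels in 0..K-1, one per sample.
    """
    n = len(labels)
    n_classes = len(posteriors[0])

    # Marginal entropy H(Y) from empirical class frequencies.
    priors = [sum(1 for y in labels if y == k) / n for k in range(n_classes)]
    h_y = entropy(priors)

    # Conditional entropy H(Y | X): average entropy of the posteriors.
    h_y_given_x = sum(entropy(row) for row in posteriors) / n

    return h_y - h_y_given_x
```

With balanced binary labels, uninformative posteriors of (0.5, 0.5) for every sample give an estimate of zero, while perfectly confident and correct posteriors give ln 2 nats. The quality of the estimate depends entirely on how well calibrated the posteriors are, which is the usual motivation for honest estimation.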