CMStatistics 2021
B0577
Title: Non-stationary Gaussian process discriminant analysis with variable selection for high-dimensional functional data
Authors: Weichang Yu - University of Melbourne (Australia) [presenting]
Sara Wade - University of Edinburgh (United Kingdom)
Howard Bondell - University of Melbourne (Australia)
Lamiae Azizi - University of Sydney (Australia)
Abstract: High-dimensional classification and feature selection problems have become ubiquitous with recent advances in data acquisition technology. In application areas such as biology, genomics and proteomics, the analysed data are often functional and have a complex structure. The high dimensionality of the data, coupled with its correlation structure, poses serious challenges for data analysis. Many existing statistical and machine learning models either fit the data poorly or lack interpretability. We propose a novel Bayesian discriminant analysis-based model that addresses these challenges in a unified framework and performs variable selection simultaneously. The model uses a two-layer non-stationary Gaussian process to describe the complex high-dimensional observations, coupled with an Ising prior to identify differentially distributed locations. Scalable inference is achieved through a variational scheme that exploits advances in the use of sparse-structured covariance matrices. We evaluate the performance of the proposed model on simulated datasets and on several real proteomics mass spectrometry datasets (breast cancer and SARS-CoV-2). Moreover, we demonstrate how the output of the model may be used to address scientific hypotheses, offering explainability as well as uncertainty quantification, both of which are crucial for increasing trust in, and social acceptance of, data-driven tools.