CMStatistics 2023
A0726
Title: Sparse semiparametric discriminant analysis for high-dimensional zero-inflated data
Authors: Hee Cheol Chung - The University of North Carolina at Charlotte (United States) [presenting]
Yang Ni - Texas A&M University (United States)
Irina Gaynanova - Texas A&M University (United States)
Abstract: Sequencing-based technologies provide an abundance of high-dimensional biological datasets with highly skewed and zero-inflated measurements. Classifying such data with linear discriminant analysis performs poorly because the Gaussian distribution assumption is violated. At the same time, different transformations designed to correct the distributional violations can lead to different classification accuracy and different selected features, making interpretation dependent on the choice of transformation. A new semiparametric framework for discriminant analysis is proposed, based on the truncated latent Gaussian copula model, that improves classification performance and makes both classification and feature selection robust to the choice of data transformation. The model accounts for both skewness and zero inflation, and the proposed estimation procedure ensures that the results are invariant to monotone transformations of the data. With sparsity regularization, the proposed method yields consistent estimation of the classification direction in high-dimensional settings. The method is applied to human gut microbiome data and breast cancer microRNA sequencing data to discriminate disease status.
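The invariance to monotone transformations mentioned in the abstract can be illustrated with a small sketch. It assumes, purely for illustration, that dependence is measured with a rank-based statistic (Kendall's tau); the abstract does not state which estimator the authors use, so this is not their procedure. The simulated data, the helper `kendall_tau_a`, and all parameter values are hypothetical.

```python
import numpy as np

def kendall_tau_a(x, y):
    """Kendall's tau-a: mean of sign(x_i - x_j) * sign(y_i - y_j) over ordered pairs i != j."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = x.size
    sx = np.sign(x[:, None] - x[None, :])  # pairwise ordering of x (0 for ties)
    sy = np.sign(y[:, None] - y[None, :])  # pairwise ordering of y (0 for ties)
    return float((sx * sy).sum() / (n * (n - 1)))

# Hypothetical data: two correlated latent Gaussian variables ...
rng = np.random.default_rng(0)
cov = [[1.0, 0.6], [0.6, 1.0]]
latent = rng.multivariate_normal([0.0, 0.0], cov, size=200)

# ... truncated at zero and skewed, giving zero-inflated count-like values.
counts = np.floor(np.expm1(2.0 * np.maximum(latent, 0.0)))

# Two common strictly increasing transformations of the observed counts.
log_counts = np.log1p(counts)
sqrt_counts = np.sqrt(counts)

tau_raw = kendall_tau_a(counts[:, 0], counts[:, 1])
tau_log = kendall_tau_a(log_counts[:, 0], log_counts[:, 1])
tau_sqrt = kendall_tau_a(sqrt_counts[:, 0], sqrt_counts[:, 1])

# Pearson correlation, by contrast, depends on the chosen transformation.
pearson_raw = float(np.corrcoef(counts[:, 0], counts[:, 1])[0, 1])
pearson_log = float(np.corrcoef(log_counts[:, 0], log_counts[:, 1])[0, 1])

print("Kendall tau (raw, log1p, sqrt):", tau_raw, tau_log, tau_sqrt)
print("Pearson r   (raw, log1p):      ", pearson_raw, pearson_log)
```

Because a strictly increasing transformation preserves the ordering of every pair of observations, including ties at zero, the three Kendall's tau values coincide, whereas the Pearson correlations generally differ. Any analysis built only on such rank-based quantities would therefore give the same answer whichever monotone transformation is applied first.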