HiTECCoDES2024
A0176
Title: Selective pivot log-ratio coordinates for classification in high-dimensional compositional data
Authors:
Karel Hron - Palacky University (Czech Republic) [presenting]
Nikola Stefelova - Palacky University Olomouc (Czech Republic)
Julie de Sousa - Palacky University Olomouc (Czech Republic)
Javier Palarea-Albaladejo - University of Girona (Spain)
Dana Dobesova - Palacky University Olomouc (Czech Republic)
Ales Kvasnicka - Palacky University Olomouc (Czech Republic)
David Friedecky - Palacky University and University Hospital Olomouc (Czech Republic)
Abstract: Data from high-throughput biological experiments are often relative in nature. This means that the most relevant information lies in the shape of the data distribution across the biological features rather than in the absolute size of the measurements. A well-established way to account for this in statistical processing is the log-ratio methodology of compositional data analysis. Selective pivot log-ratio coordinates are introduced as a new type of orthonormal log-ratio coordinate representation for high-dimensional compositional data. The proposal aims to improve the identification of biomarkers in binary classification problems, a common setting in scientific studies in this field. The coordinates are constructed such that the pivot coordinate representing a given compositional part aggregates the pairwise log-ratios of that part with the remaining parts but, unlike in the usual formulation, excludes those that deviate from the main pattern. For practical application, this coordinate system is embedded in a partial least squares discriminant analysis (PLS-DA) model. Using both synthetic and real-world metabolomic datasets, we demonstrate improved performance of the new approach compared with other methods used in the field.
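As a sketch of the construction described above: in the usual formulation, the pivot coordinate for part x1 of a D-part composition is sqrt((D-1)/D) * ln(x1 / g(x2, ..., xD)), where g is the geometric mean; this equals (D(D-1))^(-1/2) times the sum of the D-1 pairwise log-ratios ln(x1/xj). The minimal Python sketch below computes that aggregate and, for illustration only, a selective version restricted to a subset of the pairwise log-ratios. The selection rule (a median/MAD screen on the group-mean difference of each pairwise log-ratio), the scaling used for the reduced set, and the function names are hypothetical assumptions, not the authors' method. Completing the remaining orthonormal coordinates and fitting the PLS-DA model are not shown.

```python
import math
from statistics import median


def pivot_coordinate(x, part, keep=None):
    """Pivot log-ratio coordinate of composition `x` for `part`.

    Aggregates the pairwise log-ratios ln(x[part] / x[j]) over the other
    parts j. With keep=None all other parts are used (usual pivot
    coordinate); otherwise only the indices in `keep` are used, a
    HYPOTHETICAL stand-in for the selective variant.
    Assumes all parts are strictly positive (zeros must be treated first).
    """
    others = [j for j in range(len(x)) if j != part]
    if keep is not None:
        others = [j for j in others if j in keep]
    m = len(others)
    if m == 0:
        raise ValueError("at least one other part must be retained")
    total = sum(math.log(x[part] / x[j]) for j in others)
    # 1/sqrt(m(m+1)) is the pivot-coordinate scaling for the (m+1)-part
    # subcomposition formed by `part` and the retained parts; with m = D-1
    # this equals sqrt((D-1)/D) * ln(x[part] / geometric mean of the others).
    return total / math.sqrt(m * (m + 1))


def select_consistent_parts(samples, labels, part, k=3.0):
    """HYPOTHETICAL selection rule (not the authors' criterion).

    For each other part j, compute the difference in group means (label 1
    minus label 0) of ln(x[part] / x[j]); keep the parts whose difference
    lies within k robust standard deviations (1.4826 * MAD) of the median
    difference, i.e. drop log-ratios that deviate from the main pattern.
    """
    diffs = {}
    for j in range(len(samples[0])):
        if j == part:
            continue
        g0 = [math.log(s[part] / s[j]) for s, y in zip(samples, labels) if y == 0]
        g1 = [math.log(s[part] / s[j]) for s, y in zip(samples, labels) if y == 1]
        diffs[j] = sum(g1) / len(g1) - sum(g0) / len(g0)
    med = median(diffs.values())
    scale = 1.4826 * median(abs(d - med) for d in diffs.values())
    return {j for j, d in diffs.items() if abs(d - med) <= k * scale}


# Toy example: one sample per group; part 0 is the candidate biomarker.
# In group 1, ln(x0 / xj) equals the listed shifts, and the last one
# (part 5) runs against the pattern shared by parts 1-4.
shifts = [1.0, 1.1, 0.9, 1.05, -2.0]
samples = [[1.0] * 6, [1.0] + [math.exp(-d) for d in shifts]]
labels = [0, 1]

keep = select_consistent_parts(samples, labels, part=0)
z_all = pivot_coordinate(samples[1], 0)             # all five log-ratios
z_sel = pivot_coordinate(samples[1], 0, keep=keep)  # retained log-ratios only
print(sorted(keep), round(z_all, 4), round(z_sel, 4))
```

With these toy values the screen drops part 5, whose log-ratio with part 0 shifts in the opposite direction, so z_sel aggregates only the four consistent log-ratios. In a classification setting, such coordinates computed for each part would then serve as predictors in a PLS-DA model.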