View Submission - CFE-CMStatistics 2025
A0781
Title: Interactively resolving distortion in nonlinear dimensionality reduction of biomedical data
Authors: Kris Sankaran - University of Wisconsin (United States) [presenting]
Abstract: Nonlinear dimensionality reduction is a key step in many biomedical analysis workflows. For example, when working with text embeddings from pretrained protein language models or when exploring single-cell gene expression measurements, researchers routinely apply UMAP to organize the high-dimensional source data into a more manageable low-dimensional representation. Such nonlinear dimensionality reduction can be powerful, but it inevitably introduces distortion. A growing body of work has demonstrated that this distortion can have serious consequences for downstream interpretation, for example, by suggesting clusters that do not exist in the original data. Motivated by these developments, a visual interface is designed that helps to identify where these distortions are most severe and supports interaction to locally resolve them. Though the design and interaction are relatively straightforward, it is found through case studies from single-cell genomics and microbiome data analysis that they can enable more accurate interpretations than more traditional visualization methods, which do not show distortion. The interface helps researchers who apply nonlinear dimensionality reduction methods address concerns they may have about the reliability of their embeddings and proceed with confidence in their data analysis.