B0773
Title: Clustering of variables around latent variables, with the R package ClustVarLV
Authors: Evelyne Vigneau - National College of Veterinary Medicine, Food Science and Engineering (France) [presenting]
Veronique Cariou - National College of Veterinary Medicine, Food Science and Engineering (France)
El Mostafa Qannari - National College of Veterinary Medicine, Food Science and Engineering (France)
Abstract: The aim of clustering variables is to group a set of variables into homogeneous and distinct clusters and to identify the underlying structure of the data. In exploratory data analysis, such an approach can be very useful for interpreting complex problems, as it provides dimension reduction. The clustering of variables around latent variables (CLV) approach, implemented in the ClustVarLV R package, aims to cluster numerical variables while summarizing each group of variables by a latent component. Directional or local clusters of variables may be defined according to the type of linear link to be investigated. Moreover, the latent variables associated with the clusters may be constrained to be linear combinations of external information. A new extension has also been implemented within ClustVarLV for clustering variables while setting aside atypical ones. For this purpose, two strategies have been proposed: the first consists of introducing an additional group of variables, whereas the second consists of determining sparse components. The latter aspect will be illustrated specifically on the basis of real case studies.
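
As an illustration of the idea of grouping variables around latent components, the following is a minimal Python/NumPy sketch of the directional case only. It is not the ClustVarLV API, and the function names (`latent_components`, `clv_directional`), the round-robin starting partition, and the simulated data are all assumptions made for this example. It assumes that each latent component is the first principal component of its cluster and that each variable is assigned to the component with which it has the largest squared covariance, so that the sign of the link is ignored. The external-information constraint and the two strategies for setting aside atypical variables are not covered.

```python
import numpy as np

def latent_components(Z, labels, n_clusters):
    """Unit-norm first principal component of each cluster of columns of Z."""
    comps = np.zeros((Z.shape[0], n_clusters))
    for k in range(n_clusters):
        members = Z[:, labels == k]
        if members.shape[1] > 0:  # an empty cluster keeps a zero component
            u, _, _ = np.linalg.svd(members, full_matrices=False)
            comps[:, k] = u[:, 0]
    return comps

def clv_directional(X, n_clusters, n_iter=50):
    """Alternating (k-means-like) sketch for directional groups of variables."""
    # Standardize the variables (columns).
    Z = (X - X.mean(axis=0)) / X.std(axis=0)
    # Hypothetical starting partition: round-robin assignment of variables.
    labels = np.arange(Z.shape[1]) % n_clusters
    for _ in range(n_iter):
        comps = latent_components(Z, labels, n_clusters)
        # Squared scalar products are proportional to squared covariances.
        new_labels = ((Z.T @ comps) ** 2).argmax(axis=1)
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels
    return labels, latent_components(Z, labels, n_clusters)

# Simulated data: six variables driven by two latent factors, some with
# a reversed sign, plus noise.
rng = np.random.default_rng(0)
f1, f2 = rng.normal(size=(2, 200))
noise = 0.3 * rng.normal(size=(200, 6))
X = np.column_stack([f1, -f1, f1, f2, f2, -f2]) + noise

labels, comps = clv_directional(X, n_clusters=2)
print(labels)
```

In this simulated setting the first three variables are expected to share one cluster and the last three the other, even though some of them are negatively correlated with their latent factor, which is the sense in which the grouping is directional.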