CFE-CMStatistics 2024
A1396
Title: "clustglm" and "clustord": R packages for clustering with covariates for binary, count, and ordinal data
Authors: Louise McMillan - Victoria University of Wellington (New Zealand) [presenting]
Daniel Fernandez - Universitat Politecnica de Catalunya, BarcelonaTech (UPC) (Spain)
Shirley Pledger - Victoria University of Wellington (New Zealand)
Richard Arnold - Victoria University of Wellington (New Zealand)
Ivy Liu - Victoria University of Wellington (New Zealand)
Murray Efford - University of Otago (New Zealand)
Abstract: Two R packages are presented for model-based clustering with covariates. Both packages can perform clustering and biclustering (for example, clustering observations and features simultaneously). Both use likelihood-based methods for clustering, so users can compare competing models with AIC and BIC to assess their relative goodness of fit. The models in both packages are built from linear predictor terms, so they look more like regression models than conventional clustering models. This allows regression-style covariates to be included alongside clustering effects. Both "clustglm" and "clustord" can include the effects of numerical or categorical covariates alongside cluster effects, or can fit pattern-detection models that include individual-level effects alongside cluster effects. For example, when applied to presence/absence data, sites and species are clustered while also taking into account any single-species effects and any additional covariates.

"clustglm" is designed for binary and count data. It uses R's "glm" function and can accommodate both balanced and unbalanced designs. "clustord" is designed for ordinal categorical data. It can fit either the proportional odds model or the ordered stereotype model. The ordered stereotype model is more flexible, and its fitted parameters can reveal when two ordinal categories are effectively equivalent to each other. The use of "clustglm" and "clustord" is illustrated with ecological and survey datasets.
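The abstract says the packages use regression-style linear predictors. As an illustration, a row-clustered model for a binary presence/absence matrix could be written as below. In this matrix, sites i = 1, ..., n are the rows and species j = 1, ..., p are the columns.

The notation is assumed for this example and is not taken from either package:
- α_r is the effect of site cluster r.
- β_j is a single-species effect.
- x_ij is a covariate vector with coefficients γ.
- π_r is the proportion of sites in cluster r.

In practice, identifiability constraints such as Σ_r α_r = 0 and Σ_j β_j = 0 would also be needed.

```latex
\operatorname{logit}(\theta_{rij}) = \mu + \alpha_r + \beta_j + \mathbf{x}_{ij}^{\top}\boldsymbol{\gamma},
\qquad
L = \prod_{i=1}^{n} \sum_{r=1}^{R} \pi_r \prod_{j=1}^{p}
    \theta_{rij}^{\,y_{ij}} \, (1 - \theta_{rij})^{\,1 - y_{ij}}
```

Dropping β_j gives a cluster-only model. Replacing β_j with a species-cluster effect and a site-cluster by species-cluster interaction gives the linear predictor of a biclustering model. Because each variant has a maximized log-likelihood log L and a parameter count K, they can be ranked with AIC = -2 log L + 2K and BIC = -2 log L + K log N.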
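The abstract notes that both packages are likelihood-based, so candidate models can be compared with AIC and BIC. The Python sketch below illustrates that workflow on a simulated presence/absence matrix. It is not the algorithm used by "clustglm".

The sketch works in three steps:
1. Fit a generic row-clustering mixture by EM. In this model, each row cluster r has its own presence probability θ_rj for every column j.
2. Compute AIC = -2 log L + 2K for each number of clusters R.
3. Compute BIC = -2 log L + K log N for each R.

The function names, the simulated data, and the choice of N as the number of rows are all assumptions made for this example.

```python
import math
import random


def e_step(Y, pi, theta):
    """Return (log-likelihood, posterior row-cluster memberships z)."""
    loglik, z = 0.0, []
    for row in Y:
        # log pi_r + sum_j log P(y_ij | theta_rj) for each row cluster r
        logs = [math.log(pi[r]) + sum(math.log(theta[r][j] if y else 1.0 - theta[r][j])
                                      for j, y in enumerate(row))
                for r in range(len(pi))]
        m = max(logs)  # log-sum-exp for numerical stability
        s = sum(math.exp(v - m) for v in logs)
        loglik += m + math.log(s)
        z.append([math.exp(v - m) / s for v in logs])
    return loglik, z


def fit_row_clusters(Y, R, n_iter=200, seed=0):
    """EM for a finite mixture of independent Bernoulli columns (row clustering)."""
    rng = random.Random(seed)
    p = len(Y[0])
    pi = [1.0 / R] * R
    theta = [[rng.uniform(0.25, 0.75) for _ in range(p)] for _ in range(R)]
    for _ in range(n_iter):
        _, z = e_step(Y, pi, theta)
        # M-step: closed-form weighted proportions; the tiny floor avoids log(0)
        w = [sum(zi[r] for zi in z) + 1e-10 for r in range(R)]
        pi = [wr / sum(w) for wr in w]
        theta = [[min(max(sum(zi[r] * row[j] for zi, row in zip(z, Y)) / w[r], 1e-6), 1 - 1e-6)
                  for j in range(p)] for r in range(R)]
    loglik, _ = e_step(Y, pi, theta)  # log-likelihood at the final estimates
    return loglik, pi, theta


def information_criteria(loglik, n_params, n_obs):
    """AIC and BIC from a maximized log-likelihood (n_obs = number of rows, assumed)."""
    aic = -2.0 * loglik + 2.0 * n_params
    bic = -2.0 * loglik + n_params * math.log(n_obs)
    return aic, bic


def simulate(n_per_group=50, p=20, seed=1):
    """Two hypothetical groups of sites with opposite presence patterns."""
    rng = random.Random(seed)
    half = p // 2
    Y = []
    for probs in ([0.9] * half + [0.1] * (p - half), [0.1] * half + [0.9] * (p - half)):
        for _ in range(n_per_group):
            Y.append([1 if rng.random() < q else 0 for q in probs])
    return Y


if __name__ == "__main__":
    Y = simulate()
    for R in (1, 2, 3):
        ll, pi, theta = fit_row_clusters(Y, R)
        k = (R - 1) + R * len(Y[0])  # mixing proportions + one theta per cluster and column
        aic, bic = information_criteria(ll, k, len(Y))
        print(f"R={R}: logL={ll:.1f}  AIC={aic:.1f}  BIC={bic:.1f}")
```

For these well-separated simulated groups, R = 2 would be expected to have the lowest BIC. On real data, EM should be restarted from several seeds, because it can stop at a local maximum.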
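The difference between the two ordinal models can be sketched numerically. The Python functions below use one common parametrization, which is assumed here rather than taken from "clustord":
- Proportional odds model: P(Y <= k) = expit(μ_k - η), for increasing cut-points μ_k.
- Ordered stereotype model: P(Y = k) is proportional to exp(μ_k + φ_k η), with μ_1 = 0 and 0 = φ_1 <= φ_2 <= ... <= φ_q = 1.

Here η stands for the linear predictor, for example a row-cluster effect plus covariate terms. All parameter values are hypothetical.

If φ_k = φ_{k+1}, the odds of category k+1 versus category k equal exp(μ_{k+1} - μ_k) for every value of η. The clusters and covariates therefore cannot separate those two categories. This is one way to read the "effectively equivalent" categories mentioned in the abstract.

```python
import math


def pom_probs(mu, eta):
    """Proportional odds: P(Y <= k) = expit(mu_k - eta) for increasing cut-points mu."""
    cum = [1.0 / (1.0 + math.exp(-(m - eta))) for m in mu] + [1.0]
    return [cum[0]] + [cum[k] - cum[k - 1] for k in range(1, len(cum))]


def osm_probs(mu, phi, eta):
    """Ordered stereotype: P(Y = k) proportional to exp(mu_k + phi_k * eta),
    with mu_1 = 0 and 0 = phi_1 <= ... <= phi_q = 1."""
    scores = [m + f * eta for m, f in zip(mu, phi)]
    top = max(scores)  # subtract the max before exponentiating, for stability
    w = [math.exp(s - top) for s in scores]
    total = sum(w)
    return [x / total for x in w]


if __name__ == "__main__":
    cuts = [-1.0, 0.0, 1.0]          # hypothetical POM cut-points for q = 4 categories
    mu_osm = [0.0, 0.3, -0.2, -0.5]  # hypothetical OSM intercepts, mu_1 = 0
    phi = [0.0, 0.5, 0.5, 1.0]       # phi_2 == phi_3: categories 2 and 3 coincide
    for eta in (-2.0, 0.0, 2.0):
        print("POM", eta, [round(x, 3) for x in pom_probs(cuts, eta)])
        p = osm_probs(mu_osm, phi, eta)
        print("OSM", eta, [round(x, 3) for x in p], "P3/P2 =", round(p[2] / p[1], 4))
```

Running it prints P3/P2 = 0.6065, which is exp(-0.5), at every η. This happens because φ_2 = φ_3 in this hypothetical fit. Under the proportional odds model, by contrast, every category's probability shifts with η.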