CMStatistics 2016
B1423
Title: Exploring outliers in compositional data with structural zeros
Authors: Karel Hron - Palacky University (Czech Republic) [presenting]
Matthias Templ - Vienna University of Technology (Austria)
Peter Filzmoser - Vienna University of Technology (Austria)
Abstract: The analysis of multivariate observations carrying relative information (also known as compositional data) using the log-ratio approach is based on ratios between variables (compositional parts). Zeros in the parts thus cause serious difficulties for the analysis. This is a particular problem in the presence of structural zeros, which result from a structural process rather than from the imprecision of a measurement device. Therefore, they cannot simply be replaced by a non-zero value, as is done, e.g., for values below the detection limit or for missing values. Instead, zeros have to be incorporated into further statistical processing. We focus on exploratory tools for identifying outliers in compositional data sets with structural zeros. For this purpose, robust Mahalanobis distances are estimated, computed either directly for subcompositions determined by their zero patterns or by using imputation to improve the efficiency of the estimates. We then proceed to the subcompositional and subgroup level. For this approach, new theory is formulated that makes it possible to estimate covariances for imputed compositional data and to apply estimations on subgroups using parts of this covariance matrix. Moreover, the zero pattern structure is analyzed using PCA for binary data to achieve a comprehensive view of the overall multivariate structure of zeros. The proposed tools are applied to large-scale data from official statistics, where the need for an appropriate treatment of zeros is obvious.
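As a rough illustration of what "subcompositions determined by their zero patterns" could look like in practice, the following minimal sketch groups observations by which parts are zero and expresses the remaining non-zero parts in centred log-ratio (clr) coordinates. It is not the authors' procedure: the function names, the toy data, and the choice of clr coordinates are assumptions made for illustration, and the robust Mahalanobis distance and imputation steps described in the abstract are not implemented here.

```python
import math
from collections import defaultdict


def zero_pattern(row):
    """Return a tuple of booleans marking which parts of the row are zero."""
    return tuple(x == 0 for x in row)


def clr(parts):
    """Centred log-ratio transform of strictly positive parts."""
    logs = [math.log(x) for x in parts]
    mean_log = sum(logs) / len(logs)
    return [v - mean_log for v in logs]


def group_by_zero_pattern(data):
    """Group rows by zero pattern; clr-transform the non-zero subcomposition."""
    groups = defaultdict(list)
    for i, row in enumerate(data):
        nonzero = [x for x in row if x != 0]
        # Log-ratios need at least two non-zero parts; otherwise store None.
        coords = clr(nonzero) if len(nonzero) >= 2 else None
        groups[zero_pattern(row)].append((i, coords))
    return dict(groups)


# Hypothetical toy data: four 3-part compositions, two with a zero in part 2.
data = [
    [0.5, 0.3, 0.2],
    [0.6, 0.0, 0.4],
    [0.2, 0.2, 0.6],
    [0.7, 0.0, 0.3],
]
groups = group_by_zero_pattern(data)
for pattern, members in groups.items():
    print(pattern, [i for i, _ in members])
```

Within each group, a robust location and covariance estimate could then be used to compute Mahalanobis distances on the log-ratio coordinates. Groups with few observations are where borrowing information across patterns, as the abstract proposes via imputation, would matter most.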