CMStatistics 2016
B0333
Title: Outlier detection of high dimensional data via dual rotations
Authors: Jeongyoun Ahn - University of Georgia (United States) [presenting]
Myung Hee Lee - Cornell University (United States)
Jung Ae Lee - Washington University in Saint Louis (United States)
Hee Cheol Chung - University of Georgia (United States)
Abstract: Despite the popularity of high dimension, low sample size data analysis, little attention has been paid to the outlier detection problem. We propose a two-stage procedure to detect outliers in high dimensional data. The first stage screens out a predetermined number of the most outlying points one by one, based on the distance between each data vector and the affine space generated by the remaining data. In the second stage, we test whether each of the screened observations is significantly outlying. The reference values for the significance test are based on random rotations of the data in the "dual" space. We show that the rotation procedure generates null data sets with the same volume as the original data, but without any outliers. High dimensional asymptotics are used to justify the proposed remoteness measure. The proposed method shows superior performance compared to alternative approaches in various simulation settings.
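The screening stage described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `distance_to_affine_hull` and `screen_most_outlying`, the parameter `n_screen`, and the use of a least-squares projection are all assumptions made for illustration. The second stage (reference values from random rotations in the dual space) is not sketched, since the abstract does not give enough detail to reproduce it.

```python
import numpy as np


def distance_to_affine_hull(x, others):
    """Euclidean distance from vector x to the affine space spanned by the rows of `others`."""
    # The mean of the other points lies in their affine hull, so use it as the base point.
    base = others.mean(axis=0)
    directions = others - base  # rows span the direction space of the affine hull
    v = x - base
    # Least-squares projection of v onto the span of the directions
    # (lstsq tolerates the rank deficiency introduced by centering).
    coef, *_ = np.linalg.lstsq(directions.T, v, rcond=None)
    residual = v - directions.T @ coef
    return float(np.linalg.norm(residual))


def screen_most_outlying(data, n_screen):
    """Remove the most remote observation one at a time; return the screened row indices in order."""
    remaining = list(range(data.shape[0]))
    screened = []
    for _ in range(n_screen):
        dists = []
        for i in remaining:
            rest = [j for j in remaining if j != i]
            dists.append(distance_to_affine_hull(data[i], data[rest]))
        worst = remaining[int(np.argmax(dists))]
        screened.append(worst)
        remaining.remove(worst)
    return screened


if __name__ == "__main__":
    # Hypothetical example: 20 observations in 200 dimensions, with row 3 shifted far away.
    rng = np.random.default_rng(0)
    data = rng.standard_normal((20, 200))
    data[3] += 10.0
    print(screen_most_outlying(data, n_screen=2))
```

In this sketch, `n_screen` plays the role of the predetermined number of points to screen. Each screened index would then be passed to the significance test of the second stage.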