CMStatistics 2023
B0549
Title: Projection pursuit for big data
Authors: Yajie Duan - Rutgers University (United States) [presenting]
Javier Cabrera - Rutgers University (United States)
Abstract: Visualizing extremely large datasets, in static or dynamic form, is a major challenge because most traditional methods do not scale to big data. A new visualization method for big data is proposed that combines projection pursuit (PP), the guided tour, and data nuggets to reveal hidden structures such as clusters, outliers, and other nonlinear structures. The guided tour is a graphical tool for high-dimensional data that displays a dynamic sequence of low-dimensional projections, using PP index functions to navigate the data space. Various PP indices have been developed to detect interesting structures in multivariate data, but running the original guided tour with these indices is computationally problematic for big data. A new PP index is developed that is computable for big data with the help of data nuggets, a data compression method that reduces large datasets while maintaining the original data structure. Simulation studies are conducted, and a large dataset is used to illustrate the proposed methodology. Static and dynamic graphical tools for big data can be developed based on the proposed PP index to detect nonlinear structures.
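The general idea of evaluating a PP index on compressed, weighted data can be illustrated with a minimal Python sketch. It is not the authors' index or their data nuggets algorithm: `compress` is a crude nearest-seed stand-in for data nuggets, `holes_index` is a weighted holes-style index chosen for illustration, and `search` is a plain random search rather than a guided tour. All function names and parameters are hypothetical.

```python
import numpy as np

def compress(X, n_centers, rng):
    """Crude stand-in for data nuggets (an assumption, not the authors' method):
    assign each row to the nearest of n_centers random seed rows and
    return the group means together with the group sizes as weights."""
    seeds = X[rng.choice(len(X), size=n_centers, replace=False)]
    d2 = (X**2).sum(1)[:, None] - 2 * X @ seeds.T + (seeds**2).sum(1)[None, :]
    labels = d2.argmin(axis=1)
    weights = np.bincount(labels, minlength=n_centers).astype(float)
    sums = np.zeros_like(seeds)
    np.add.at(sums, labels, X)          # accumulate rows per group
    keep = weights > 0
    return sums[keep] / weights[keep, None], weights[keep]

def weighted_sphere(C, w):
    """Center and whiten the centers using weighted moments.
    This ignores within-group spread, a simplification of this sketch."""
    Z = C - np.average(C, axis=0, weights=w)
    cov = (Z * w[:, None]).T @ Z / w.sum()
    vals, vecs = np.linalg.eigh(cov)
    return Z @ vecs / np.sqrt(vals)

def holes_index(Y, w):
    """Weighted holes-style index: large when little mass sits near the origin."""
    d = Y.shape[1]
    g = np.average(np.exp(-0.5 * (Y**2).sum(axis=1)), weights=w)
    return (1 - g) / (1 - np.exp(-d / 2))

def search(Z, w, d=2, n_iter=500, step=0.3, seed=0):
    """Random search over orthonormal p x d bases, keeping improvements only."""
    rng = np.random.default_rng(seed)
    A, _ = np.linalg.qr(rng.standard_normal((Z.shape[1], d)))
    best = holes_index(Z @ A, w)
    for _ in range(n_iter):
        B, _ = np.linalg.qr(A + step * rng.standard_normal(A.shape))
        val = holes_index(Z @ B, w)
        if val > best:
            A, best = B, val
    return A, best

# Synthetic example: two clusters separated along the first of five variables.
rng = np.random.default_rng(1)
X = rng.standard_normal((20000, 5))
X[:10000, 0] += 6
C, w = compress(X, 200, rng)            # 20000 rows reduced to at most 200 weighted centers
A, best = search(weighted_sphere(C, w), w)
```

The index is evaluated on at most 200 weighted centers instead of 20000 rows at every step, which is the kind of saving that makes a dynamic display feasible. The 2-D projection `weighted_sphere(C, w) @ A` can then be plotted with point sizes proportional to `w`.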