View Submission - COMPSTAT2023
A0302
Title: dataSDA and ggESDA: Two R packages for exploratory symbolic data analysis
Authors: Han-Ming Wu - National Chengchi University (Taiwan) [presenting]
Abstract: Exploratory Data Analysis (EDA) serves as a preliminary yet essential tool for summarizing the main characteristics of a dataset before appropriate statistical modeling can be applied. EDA typically relies on traditional graphical techniques such as boxplots, histograms, and scatterplots, supplemented by various dimension reduction methods and computer-aided interactive functionalities. Over the years, collected data have become increasingly large and complex, and data descriptions have moved beyond single-value representations to encompass intervals, histograms, and distributions. These are examples of so-called symbolic data. In response to this development, we have created two R packages: dataSDA and ggESDA. The dataSDA package collects a diverse range of symbolic datasets and offers a comprehensive set of functions that facilitate the conversion of traditional data into the symbolic data format. These datasets can serve as benchmarks for evaluating symbolic data analysis methods. In addition, the package implements various R functions for computing symbolic descriptive statistics. The ggESDA package extends ggplot2 to offer a variety of plots specifically designed for exploratory symbolic data analysis. We will discuss how ggESDA is implemented and demonstrate its utility through the analysis of two real symbolic datasets found in dataSDA.
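To make the idea of converting traditional data into symbolic data concrete, the following is a minimal, language-neutral sketch written in Python. It is not the dataSDA or ggESDA API. The records, the function names `to_intervals` and `interval_mean`, and the choice of [min, max] aggregation are all assumptions made for illustration. The mean shown is the commonly used symbolic sample mean for interval-valued data, that is, the average of the interval midpoints.

```python
from collections import defaultdict

# Hypothetical classical data: (group, single-valued measurement).
# Values are illustrative only, e.g. daily temperatures in degrees Celsius.
records = [
    ("Taipei", 18.0), ("Taipei", 24.0), ("Taipei", 21.0),
    ("Tainan", 20.0), ("Tainan", 28.0),
    ("Hualien", 17.0), ("Hualien", 23.0),
]


def to_intervals(rows):
    """Aggregate single-valued rows into one [min, max] interval per group."""
    grouped = defaultdict(list)
    for key, value in rows:
        grouped[key].append(value)
    return {key: (min(vals), max(vals)) for key, vals in grouped.items()}


def interval_mean(intervals):
    """Symbolic sample mean of interval data: the average of the midpoints."""
    midpoints = [(lo + hi) / 2 for lo, hi in intervals.values()]
    return sum(midpoints) / len(midpoints)


intervals = to_intervals(records)    # one interval-valued observation per city
mean_value = interval_mean(intervals)

for city, (lo, hi) in intervals.items():
    print(f"{city}: [{lo}, {hi}]")
print(f"symbolic mean: {mean_value:.3f}")
```

Each group of single-valued observations becomes one interval-valued observation, which is the kind of object that symbolic descriptive statistics and symbolic plots then operate on. Other aggregation rules (quantile-based bounds, histograms, or full distributions) would yield the other symbolic data types mentioned in the abstract.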