CMStatistics 2020
B1160
Title: Simulating high-dimensional multivariate data using the bigsimr R package
Authors: Alfred Schissler - University of Nevada, Reno (United States) [presenting]
Anna Panorska - University of Nevada (United States)
Tomasz Kozubowski - University of Nevada Reno (United States)
Alex Knudson - University of Nevada-Reno (United States)
Juli Petereit - University of Nevada Reno (United States)
Abstract: Realistic data simulation is critical when conducting Monte Carlo studies and evaluating methods. In this era of big data, however, measurements are often correlated and high dimensional, as in data obtained through high-throughput biomedical experiments. Because of the computational complexity and the lack of user-friendly software for simulating these massive multivariate constructions, researchers often resort to simulation designs that posit independence. This greatly diminishes insight into the empirical operating characteristics of any proposed methodology, such as false-positive rates, statistical power, interval coverage, and robustness. This talk introduces the bigsimr R package, which provides a general, scalable procedure for simulating high-dimensional random vectors with given marginal characteristics and dependency measures. We describe the functions included in the package, including multi-core and graphics-processing-unit (GPU) accelerated algorithms to simulate random vectors, estimate correlation matrices, and find close positive semi-definite matrices. Finally, we showcase the power of bigsimr by applying these functions to our motivating dataset: RNA-sequencing data obtained from breast cancer tumor samples, with sample size $n=1212$ patients and dimension $d>1000$.
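To make the idea of simulating random vectors "with given marginal characteristics and dependency measures" concrete, the following is a minimal sketch of one common generic approach, a Gaussian copula (NORTA-style) construction. It is an assumption for illustration only: the abstract does not state which algorithm bigsimr uses, and none of the names below (`simulate`, `cholesky`, `exponential_quantile`, `poisson_quantile`) belong to the package. The sketch uses only the Python standard library and a toy dimension $d=2$, so it does not address the scalability that motivates the talk.

```python
import math
import random


def cholesky(matrix):
    """Lower-triangular Cholesky factor of a symmetric positive definite matrix."""
    n = len(matrix)
    lower = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(lower[i][k] * lower[j][k] for k in range(j))
            if i == j:
                lower[i][j] = math.sqrt(matrix[i][i] - s)
            else:
                lower[i][j] = (matrix[i][j] - s) / lower[j][j]
    return lower


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def exponential_quantile(u, rate):
    """Inverse CDF of an exponential distribution with the given rate."""
    return -math.log(1.0 - u) / rate


def poisson_quantile(u, mean):
    """Inverse CDF of a Poisson distribution, found by summing the pmf."""
    k = 0
    pmf = math.exp(-mean)
    cdf = pmf
    while cdf < u:
        k += 1
        pmf *= mean / k
        cdf += pmf
    return k


def simulate(n, corr, quantiles, seed=0):
    """Draw n vectors whose j-th margin follows quantiles[j].

    corr is the correlation matrix of the latent normal vector (hypothetical
    input for this sketch). It is NOT the Pearson correlation of the output.
    """
    rng = random.Random(seed)
    lower = cholesky(corr)
    d = len(corr)
    rows = []
    for _ in range(n):
        indep = [rng.gauss(0.0, 1.0) for _ in range(d)]
        # Correlate the independent normals: z = L * indep.
        z = [sum(lower[i][k] * indep[k] for k in range(i + 1)) for i in range(d)]
        # Map to uniforms, clamped away from 0 and 1 for numerical safety.
        u = [min(max(normal_cdf(v), 1e-12), 1.0 - 1e-12) for v in z]
        # Apply each target margin's inverse CDF.
        rows.append([q(ui) for q, ui in zip(quantiles, u)])
    return rows


if __name__ == "__main__":
    corr = [[1.0, 0.6], [0.6, 1.0]]
    margins = [
        lambda u: exponential_quantile(u, rate=2.0),
        lambda u: poisson_quantile(u, mean=4.0),
    ]
    for row in simulate(5, corr, margins, seed=1):
        print(row)
```

Because each margin is produced by a monotone transform of the latent normal, rank-based dependency measures carry over predictably: for continuous margins, the Spearman correlation equals $(6/\pi)\arcsin(\rho/2)$ for latent correlation $\rho$. The Pearson correlation of the output generally differs from $\rho$. A user-specified target matrix may also fail to be positive semi-definite, in which case the Cholesky step above breaks down; this is one reason a routine for finding a close positive semi-definite matrix is useful.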