Title: On simulating ultra high-dimensional multivariate data
Authors: Alfred Schissler - University of Nevada, Reno (United States) [presenting]
Alex Knudson - University of Nevada-Reno (United States)
Abstract: In this era of Big Data, it is critical to realistically simulate data to conduct informative Monte Carlo studies. This is often problematic when data are inherently multivariate while at the same time are (ultra-) high dimensional. This situation appears frequently in observational data found on online and in high-throughput biomedical experiments (e.g., RNA-sequencing). Due to the difficulty in simulating realistic correlated data points, researchers often resort to simulation designs that posit independence --- greatly diminishing the insight into the empirical operating characteristics of any proposed methodology. A major challenge lies in the computational complexity involved in simulating these massive multivariate constructions. We motivate and propose a fairly general, scalable procedure to simulate high-dimensional multivariate distributions with pre-specified marginal characteristics and a covariance matrix. We apply our method to simulate RNA-sequencing data sets with heterogeneous negative binomial marginals.