View Submission - EcoSta2024
A0468
Title: A Gaussian mixture model for multiple instance learning with partially subsampled instances
Authors: Baichen Yu - Central University of Finance and Economics (China) [presenting]
Xuetong Li - Central University of Finance and Economics (China)
Jing Zhou - Renmin University of China (China)
Hansheng Wang - Peking University (China)
Abstract: Multiple instance learning is a powerful machine learning technique that is useful when numerous instances can be naturally grouped into bags. A bag-level label can then be created for each bag according to whether the instances it contains are all negative or not. How to train a statistical model with bag-level labels, with or without partially labelled instances, thus becomes a problem of great interest. To this end, a Gaussian mixture model (GMM) framework is developed to describe the stochastic behavior of the instance-level feature vectors. Both the instance-based maximum likelihood estimator (IMLE) and the bag-based maximum likelihood estimator (BMLE) are theoretically investigated. It is found that the statistical efficiency of the IMLE can be much better than that of the BMLE if the instance-level labels are relatively hard to predict. To address this efficiency gap, a subsampling-based maximum likelihood estimation (SMLE) approach is developed, in which instance-level labels are partially provided through careful subsampling. This leads to a significantly reduced labeling cost with little sacrifice in statistical efficiency. Extensive simulation studies are presented to demonstrate the finite sample performance. A real data example using whole-slide images (WSIs) to diagnose metastatic breast cancer is also presented.
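The data structure described in the abstract can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the function names (simulate_bags, instance_mle), the two-component mixture with identity covariance, the equal bag sizes, and all parameter values are hypothetical choices made here. It shows how a bag is labelled positive when at least one of its instances is positive, and how the mixture parameters would be estimated if every instance label were observed.

```python
import numpy as np

def simulate_bags(n_bags, bag_size, pi, mu0, mu1, seed=0):
    """Draw instances from a two-component Gaussian mixture and group them into bags.

    Assumptions (not from the abstract): identity covariance, equal bag sizes.
    """
    rng = np.random.default_rng(seed)
    mu0 = np.asarray(mu0, dtype=float)
    mu1 = np.asarray(mu1, dtype=float)
    # Latent instance-level labels: positive with probability pi.
    y = rng.random((n_bags, bag_size)) < pi
    # Each instance's mean depends on its latent label.
    means = np.where(y[..., None], mu1, mu0)
    x = means + rng.standard_normal(means.shape)
    # A bag is negative only if all of its instances are negative.
    bag_labels = y.any(axis=1).astype(int)
    return x, y.astype(int), bag_labels

def instance_mle(x, y):
    """Closed-form estimates when all instance labels are observed
    (mixing proportion and component means; covariance assumed known)."""
    xf = x.reshape(-1, x.shape[-1])
    yf = y.reshape(-1).astype(bool)
    pi_hat = yf.mean()
    mu0_hat = xf[~yf].mean(axis=0)
    mu1_hat = xf[yf].mean(axis=0)
    return pi_hat, mu0_hat, mu1_hat

x, y, bag_labels = simulate_bags(200, 10, pi=0.1, mu0=[0.0, 0.0], mu1=[2.0, 2.0])
pi_hat, mu0_hat, mu1_hat = instance_mle(x, y)
```

A bag-based estimator would see only x and bag_labels, and a subsampling-based one would additionally see y for a chosen subset of instances; neither is implemented in this sketch.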