A1034
 Title: Variational Bayesian semi-supervised keyword extraction
Authors:  Yaofang Hu - The University of Alabama (United States) [presenting]
Yichen Cheng - Georgia State University (United States)
Yusen Xia - Georgia State University (United States)
Xinlei Wang - University of Texas at Arlington (United States)
 Abstract: The expansion of textual data, from sources such as online product reviews and scholarly publications on scientific discoveries, has created a demand for extracting succinct yet comprehensive information. As a result, considerable effort has gone into developing novel methodologies for keyword extraction in recent years. Although many methods have been proposed to extract keywords automatically in both unsupervised and fully supervised settings, how to effectively use a partial list of keywords, such as author-specified keywords or Twitter hashtags, remains under-explored. A novel variational Bayesian semi-supervised (VBSS) keyword extraction approach is proposed, built on a recent Bayesian semi-supervised (BSS) technique that uses the information from a small set of known keywords to identify previously undetected ones. The proposed VBSS method greatly improves the computational efficiency of BSS via mean-field variational inference coupled with data augmentation, which yields closed-form solutions at each step of the optimization process. Further, the numerical results show that the VBSS method offers enhanced performance on long texts and improved control of the false discovery rate compared with a range of state-of-the-art keyword extraction methods.
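The abstract does not specify the VBSS model, so the following is only a generic sketch of how mean-field variational inference combined with data augmentation can give closed-form updates. It assumes a plain Bayesian logistic regression with Polya-Gamma augmentation and a Gaussian prior; the function name `vb_logistic_pg`, the prior variance, and the iteration count are hypothetical choices for illustration, not the authors' method.

```python
import numpy as np

def vb_logistic_pg(X, y, prior_var=10.0, n_iter=50):
    """Mean-field VI for Bayesian logistic regression (illustrative only).

    Assumed model: y_i ~ Bernoulli(sigmoid(x_i^T beta)), beta ~ N(0, prior_var * I).
    Polya-Gamma augmentation introduces omega_i so that the factorized
    posterior q(beta) q(omega) has closed-form coordinate updates:
      q(beta)    = N(m, V)
      q(omega_i) = PG(1, c_i)
    """
    n, d = X.shape
    kappa = y - 0.5                       # y must be coded as 0/1
    prior_prec = np.eye(d) / prior_var    # prior precision matrix
    e_omega = np.full(n, 0.25)            # E[omega_i] in the limit c_i -> 0
    for _ in range(n_iter):
        # Closed-form update of q(beta) given E[omega]
        V = np.linalg.inv(X.T @ (e_omega[:, None] * X) + prior_prec)
        m = V @ (X.T @ kappa)
        # Closed-form update of q(omega): c_i^2 = x_i^T (V + m m^T) x_i
        second_moment = V + np.outer(m, m)
        c = np.sqrt(np.einsum("ij,jk,ik->i", X, second_moment, X))
        c = np.maximum(c, 1e-12)          # guard against division by zero
        e_omega = np.tanh(c / 2.0) / (2.0 * c)   # mean of PG(1, c_i)
    return m, V
```

Each iteration only solves a small linear system and evaluates an elementwise formula, which is the kind of per-step cost reduction, relative to sampling-based inference, that the abstract attributes to the variational approach.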