A1100
Title: Enhancing digital mental health assessments using text and acoustic features
Authors: Jacky Ngai Lam Chan - The Hong Kong University of Science and Technology (Hong Kong) [presenting]
Amanda Chu - The Education University of Hong Kong (China)
Mike So - The Hong Kong University of Science and Technology (Hong Kong)
Benson Lam - The Hang Seng University of Hong Kong (Hong Kong)
Agnes Tiwari - Hong Kong Sanatorium and Hospital (Hong Kong)
Helina Yuk - The Chinese University of Hong Kong (Hong Kong)
Abstract: An automatic speech analytics program (ASAP) is developed to detect psychosocial health issues from recorded interviews with 100 Cantonese-speaking family caregivers recruited for the study. Speech is analyzed using text and acoustic features. The text features capture the speech content; transcripts were extracted from the audio tracks using the Google Cloud Speech API. Because the text features contain a great deal of irrelevant information, cross-validation (CV) is used to identify the text features relevant to a given psychosocial instrument. The acoustic features carry information about the caregivers' emotional state and are extracted using standard signal-processing techniques, including the Fourier transform and spectral methods. Text and acoustic features from the same person often carry redundant information, which can bias the analysis. The two feature sets are therefore combined using principal component analysis (PCA), which removes this redundancy by merging highly correlated features into a set of uncorrelated components. Finally, a linear support vector machine (LSVM) performs the classification. The proposed method is applied to three psychosocial instruments (CBI, BDI-II, and FRAS) using both feature sets; the correct classification rates under 10-fold CV are 87%, 80%, and 91%, respectively.
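The pipeline described above (spectral acoustic features, combination with text features via PCA, linear SVM, 10-fold CV) can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the authors' ASAP implementation. The specific acoustic features (spectral centroid and log band energies from `np.fft.rfft`), the sampling rate and frame length, the Pegasos-style subgradient solver used in place of an off-the-shelf LSVM, and all helper names (`spectral_features`, `pca_fit`, `train_linear_svm`, `cv_accuracy`) are hypothetical. The data are synthetic: the "text" features are random placeholders, not transcripts, and the CV-based text-feature selection step is not shown.

```python
import numpy as np

SR, FRAME = 8000, 800  # hypothetical sampling rate (Hz) and frame length (samples)


def spectral_features(frame, sr=SR, n_bands=4):
    """Hypothetical acoustic features: spectral centroid plus log band energies."""
    mag = np.abs(np.fft.rfft(frame))                 # magnitude spectrum via FFT
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / sr)  # bin centre frequencies (Hz)
    centroid = np.sum(freqs * mag) / (np.sum(mag) + 1e-12)
    bands = np.array_split(mag ** 2, n_bands)        # equal-width frequency bands
    band_energy = [np.log(b.sum() + 1e-12) for b in bands]
    return np.array([centroid, *band_energy])


def pca_fit(X, k):
    """Standardize columns, then keep the top-k principal axes from the SVD."""
    mu, sd = X.mean(axis=0), X.std(axis=0) + 1e-12
    _, _, vt = np.linalg.svd((X - mu) / sd, full_matrices=False)
    return mu, sd, vt[:k]


def pca_transform(X, mu, sd, comps):
    """Project standardized rows onto the principal axes."""
    return ((X - mu) / sd) @ comps.T


def add_bias(X):
    return np.hstack([X, np.ones((len(X), 1))])  # constant column acts as the bias term


def train_linear_svm(X, y, lam=0.01, epochs=50, seed=0):
    """Pegasos-style subgradient descent on the L2-regularized hinge loss; y in {-1, +1}."""
    rng = np.random.default_rng(seed)
    Xa, w, t = add_bias(X), np.zeros(X.shape[1] + 1), 0
    for _ in range(epochs):
        for i in rng.permutation(len(Xa)):
            t += 1
            eta = 1.0 / (lam * t)                   # decreasing step size
            violated = y[i] * (Xa[i] @ w) < 1       # margin check uses the current w
            w *= 1.0 - eta * lam                    # shrink step from the L2 penalty
            if violated:
                w += eta * y[i] * Xa[i]             # hinge-loss subgradient step
    return w


def cv_accuracy(X, y, k=5, n_folds=10, seed=0):
    """n-fold CV; PCA is refit on each training split to avoid test-set leakage."""
    idx = np.random.default_rng(seed).permutation(len(X))
    correct = 0
    for test in np.array_split(idx, n_folds):
        train = np.setdiff1d(np.arange(len(X)), test)
        mu, sd, comps = pca_fit(X[train], k)
        w = train_linear_svm(pca_transform(X[train], mu, sd, comps), y[train])
        pred = np.sign(add_bias(pca_transform(X[test], mu, sd, comps)) @ w)
        correct += int(np.sum(pred == y[test]))
    return correct / len(X)


# Synthetic toy data (NOT the study's data): 100 "caregivers", labels in {-1, +1}.
rng = np.random.default_rng(42)
y = np.repeat([-1, 1], 50)
times = np.arange(FRAME) / SR
frames = [a * np.sin(2 * np.pi * f * times + p) + 0.1 * rng.standard_normal(FRAME)
          for a, f, p in zip(rng.uniform(0.5, 1.5, 100),
                             np.where(y > 0, 1500.0, 500.0),
                             rng.uniform(0, 2 * np.pi, 100))]
X_acoustic = np.array([spectral_features(fr) for fr in frames])
X_text = rng.standard_normal((100, 10))  # placeholder for selected text features
X_text[:, :3] += 1.0 * y[:, None]        # make three "text" features label-related
X = np.hstack([X_text, X_acoustic])      # combined text + acoustic feature matrix

acc = cv_accuracy(X, y, k=5)
print(f"10-fold CV accuracy: {acc:.2f}")
```

Refitting the standardization and PCA inside each training fold keeps test subjects out of the projection, so the reported CV accuracy is not inflated by leakage. In practice a library solver such as scikit-learn's `LinearSVC` would normally replace the hand-written subgradient loop.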