CFE-CMStatistics 2024
A0876
Title: Spherical double k-means
Authors: Ilaria Bombelli - Istat (Italy)
Maurizio Vichi - University La Sapienza, Rome (Italy)
Emiliano Seri - University of Rome Tor Vergata (Italy) [presenting]
Stella Iezzi - University of Rome Tor Vergata (Italy)
Abstract: The spherical double k-means (SDKM) clustering method is introduced for text data as a novel approach to the simultaneous clustering of terms and documents. Drawing on the strengths of k-means, double k-means, and spherical k-means, SDKM addresses the challenges of high dimensionality, noise, and sparsity inherent in text analysis. The number of clusters, for both words and documents, is chosen using the pseudo-F cluster validity index, and the reliability of the method is verified through simulation studies. SDKM is applied to the corpus of US presidential inaugural addresses, spanning from George Washington in 1789 to Joe Biden in 2021. The analysis reveals distinct clusters of words and documents that correspond to significant historical themes and periods, showing that the method can support a deeper understanding of the data. The findings demonstrate the efficacy of SDKM in uncovering underlying patterns in textual data.
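To make the idea of simultaneous clustering concrete, the following is a minimal Python sketch and not the authors' SDKM algorithm. It assumes a dense document-by-term matrix, treats the "spherical" step as scaling each document to unit L2 norm, and then alternates plain double k-means updates (block means, document reassignment, term reassignment). The function names `block_means` and `double_kmeans_unit_rows`, the random initialisation, and the stopping rule are all hypothetical choices made for illustration.

```python
import numpy as np

def block_means(X, rows, cols, K, Q):
    """Mean of X inside each (document cluster, term cluster) block."""
    C = np.zeros((K, Q))
    for k in range(K):
        for q in range(Q):
            block = X[np.ix_(rows == k, cols == q)]
            if block.size:  # empty blocks keep a centroid of 0
                C[k, q] = block.mean()
    return C

def double_kmeans_unit_rows(X, K, Q, n_iter=50, seed=0):
    """Illustrative co-clustering of documents (rows) and terms (columns)."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    # Assumed "spherical" step: put every document on the unit sphere.
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    X = X / np.where(norms == 0, 1.0, norms)
    n, p = X.shape
    rows = rng.integers(K, size=n)  # random initial document labels
    cols = rng.integers(Q, size=p)  # random initial term labels
    for _ in range(n_iter):
        # Reassign each document to the closest row profile C[k, cols].
        C = block_means(X, rows, cols, K, Q)
        row_profiles = C[:, cols]  # shape (K, p)
        d_rows = ((X[:, None, :] - row_profiles[None, :, :]) ** 2).sum(axis=2)
        new_rows = d_rows.argmin(axis=1)
        # Reassign each term to the closest column profile C[new_rows, q].
        C = block_means(X, new_rows, cols, K, Q)
        col_profiles = C[new_rows, :]  # shape (n, Q)
        d_cols = ((X[:, :, None] - col_profiles[:, None, :]) ** 2).sum(axis=0)
        new_cols = d_cols.argmin(axis=1)
        if np.array_equal(new_rows, rows) and np.array_equal(new_cols, cols):
            break  # partitions stopped changing
        rows, cols = new_rows, new_cols
    return rows, cols, block_means(X, rows, cols, K, Q)

# Toy document-term counts: two document groups using two vocabularies.
X_toy = np.array([[3, 3, 0, 0],
                  [3, 3, 0, 0],
                  [0, 0, 2, 2],
                  [0, 0, 2, 2]])
doc_labels, term_labels, centroids = double_kmeans_unit_rows(X_toy, K=2, Q=2)
```

Like any k-means variant, this sketch can stop in a local optimum, so in practice one would run it from several random starts and compare the resulting partitions, for instance with a validity index such as the pseudo-F mentioned in the abstract.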