View Submission - EcoSta2024
A1061
Title: Double weighting scheme for k-nearest neighbors for binary classification of high-dimensional gene expression data
Authors: Zardad Khan - United Arab Emirates University (United Arab Emirates) [presenting]
Saeed Aldahmani - United Arab Emirates University (United Arab Emirates)
Amjad Ali - United Arab Emirates University (United Arab Emirates)
Hailiang Du - Durham University (United Kingdom)
Abstract: The accurate classification of tissue samples in high-dimensional gene expression datasets can be challenging due to the large number of genes, many of which do not contribute significantly to the classification. To address this issue, a new method called double-weighted k-nearest neighbors (DW-k-NN) is introduced. The method is specifically designed for gene expression data and incorporates feature weights derived from the differential expression of genes between classes. An exponential function is used to calculate the estimated weighted distances, so that informative features have a greater impact, while less informative or non-informative features have a reduced impact. The resulting outputs are summed separately for each of the two classes, and the test point is assigned the class label with the larger sum. DW-k-NN aims to achieve robust classification results for high-dimensional gene expression datasets by weighting genes according to their capability to express differentially. Experimental evaluations have demonstrated the effectiveness of DW-k-NN in accurately classifying gene expression datasets when compared to several other k-NN-based methods. Overall, DW-k-NN presents a promising approach to gene expression data analysis through its two-fold weighted distance calculation strategy.
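
The abstract does not give the exact formulas, so the following is only a minimal sketch of the general idea and not the authors' implementation. It assumes that feature weights are the normalized absolute difference of class means scaled by a pooled standard deviation, that distances are feature-weighted Euclidean distances, and that each neighbor's output is exp(-distance). The function names `feature_weights` and `dw_knn_predict` and the default `k=5` are hypothetical choices made for illustration.

```python
import numpy as np

def feature_weights(X, y):
    """Assumed weighting: absolute class-mean difference per gene,
    scaled by a pooled standard deviation and normalized to sum to 1."""
    X0, X1 = X[y == 0], X[y == 1]
    diff = np.abs(X0.mean(axis=0) - X1.mean(axis=0))
    pooled_sd = np.sqrt(0.5 * (X0.var(axis=0) + X1.var(axis=0))) + 1e-12
    score = diff / pooled_sd
    return score / score.sum()

def dw_knn_predict(X_train, y_train, x_test, k=5):
    """Classify one test point with labels coded as 0 and 1."""
    w = feature_weights(X_train, y_train)
    # First weighting (assumed form): feature-weighted Euclidean distance,
    # so differentially expressed genes dominate the distance.
    d = np.sqrt((((X_train - x_test) ** 2) * w).sum(axis=1))
    nn = np.argsort(d)[:k]  # indices of the k nearest training samples
    # Second weighting (assumed form): exponential decay of the distance,
    # so closer neighbors contribute larger outputs.
    outputs = np.exp(-d[nn])
    # Sum the outputs separately for each class and pick the larger sum.
    sums = np.array([outputs[y_train[nn] == c].sum() for c in (0, 1)])
    return int(np.argmax(sums))
```

In this sketch the weights are estimated from the training data only, which keeps the test point from influencing which genes are treated as informative.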