Title: A new split criterion for classification trees with binary data
Authors: Abdulmajeed Alharbi - Durham University (United Kingdom) [presenting]
Frank Coolen - Durham University (United Kingdom)
Tahani Coolen-Maturi - Durham University (United Kingdom)
Abstract: Classification is a technique used to assign an observation to one of a set of predefined categories, and classification trees are among the most popular approaches for this task. A new classification method is introduced: Direct Nonparametric Predictive Inference (D-NPI) classification for binary data. Nonparametric Predictive Inference (NPI) is a statistical method that uses few modelling assumptions, enabled by the use of lower and upper probabilities to quantify uncertainty. D-NPI is based on the NPI lower and upper probabilities without adding any further assumptions or information. A new procedure for building classification trees using the NPI method is presented. It uses a new split criterion, called correct indication, for constructing classification trees; lower and upper probabilities of correct indication are derived using the NPI method for Bernoulli data. The aim is to maximize the probability of correct indication for a future observation. Imprecision, the difference between the upper and lower probabilities, is used as a stopping criterion. An experiment is carried out to compare this new procedure with classical classification trees. Initial comparisons with alternative methods suggest that D-NPI classification performs well and tends to lead to relatively small trees.
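As a minimal sketch of the ingredients mentioned in the abstract: for Bernoulli data with s successes in n observations, the standard NPI bounds for the next observation being a success are s/(n+1) (lower) and (s+1)/(n+1) (upper), so their difference, the imprecision, is 1/(n+1) and shrinks as more data are observed. The function names below are illustrative, not the authors' implementation, and the use of these bounds inside the correct-indication split criterion is not detailed in the abstract.

```python
def npi_bernoulli_bounds(s: int, n: int) -> tuple[float, float]:
    """NPI lower and upper probabilities that the next observation is a
    success, given s successes in n Bernoulli observations.
    (Standard NPI result: lower = s/(n+1), upper = (s+1)/(n+1).)"""
    if n < 1 or not 0 <= s <= n:
        raise ValueError("need n >= 1 and 0 <= s <= n")
    return s / (n + 1), (s + 1) / (n + 1)


def imprecision(s: int, n: int) -> float:
    """Difference between the upper and lower probabilities; for Bernoulli
    data this equals 1/(n+1), so it decreases as data accumulate."""
    lower, upper = npi_bernoulli_bounds(s, n)
    return upper - lower


# Example: 7 successes observed in 10 trials.
lo, up = npi_bernoulli_bounds(7, 10)   # (7/11, 8/11)
gap = imprecision(7, 10)               # 1/11
```

A stopping rule of the kind the abstract describes could then compare this imprecision against a threshold: once a node's bounds are tight enough, further splitting stops, which is consistent with the method tending to produce small trees.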