Title: Classification trees with NPI-based thresholds
Authors: Masad Alrasheedi - Durham University (United Kingdom) [presenting]
Tahani Coolen-Maturi - Durham University (United Kingdom)
Frank Coolen - Durham University (United Kingdom)
Abstract: Nonparametric Predictive Inference (NPI) is a statistical method that relies on few modelling assumptions, which is made possible by using lower and upper probabilities to quantify uncertainty. We present applications of NPI to building classification trees, where the inference concerns future observations. When building classification trees, continuous attributes must be handled, and an optimal threshold must be selected for each continuous attribute. A commonly used technique for finding these thresholds is the C4.5 algorithm, which sorts the data set by the attribute's values and computes the entropy for every midpoint between consecutive values. At each node, C4.5 then selects the attribute that gives the largest reduction in entropy (measured by the information gain ratio). However, classical methods usually focus only on the data at hand, where the attribute value of each observation is known, rather than on future observations. We present a new method that uses NPI to select the optimal thresholds. Moreover, whereas classical approaches often choose the split variable by minimising the expected entropy (equivalently, maximising the information gain), we introduce a new NPI-based technique in which the full range of the expected entropy, from its lower to its upper value, is taken into account for each variable. Initial comparisons with alternative methods indicate good classification performance, and the resulting trees are relatively small.
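The threshold search and the idea of an interval-valued expected entropy can be illustrated with a small Python sketch. This is a toy under stated assumptions, not the authors' NPI method, whose details are not given in the abstract. `best_threshold` follows the C4.5-style procedure described above (sort, try every midpoint, compute entropy) but scores splits by plain information gain rather than C4.5's gain ratio. `npi_left_probability` uses Hill's assumption A(n), the standard NPI starting point for real-valued data, to give lower and upper probabilities that a future observation falls at or below a threshold. `npi_expected_entropy_range` is a hypothetical illustration of an expected-entropy range: it assumes the class entropies within each branch are the empirical ones, so the expected entropy is linear in the branch probability and its range over the NPI interval is reached at the endpoints. All function names are illustrative.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (base 2) of the empirical class distribution."""
    n = len(labels)
    if n == 0:
        return 0.0
    return sum((c / n) * math.log2(n / c) for c in Counter(labels).values())


def candidate_thresholds(values):
    """Midpoints between consecutive distinct sorted values."""
    xs = sorted(set(values))
    return [(a + b) / 2 for a, b in zip(xs, xs[1:])]


def split(values, labels, t):
    """Partition the class labels by whether the attribute value is <= t."""
    left = [y for x, y in zip(values, labels) if x <= t]
    right = [y for x, y in zip(values, labels) if x > t]
    return left, right


def expected_entropy(values, labels, t):
    """Classical expected entropy of a split, weighted by observed branch sizes."""
    left, right = split(values, labels, t)
    n = len(labels)
    return len(left) / n * entropy(left) + len(right) / n * entropy(right)


def best_threshold(values, labels):
    """Midpoint with the lowest expected entropy (highest information gain)."""
    return min(candidate_thresholds(values),
               key=lambda t: expected_entropy(values, labels, t))


def npi_left_probability(values, t):
    """Lower/upper probability under Hill's A(n) that the next value is <= t.

    Assumes t equals no observed value (true for the midpoints above).
    The n + 1 intervals between the ordered data each get probability
    1/(n+1); k of them lie wholly below t and one straddles t.
    """
    n = len(values)
    k = sum(1 for x in values if x <= t)
    return k / (n + 1), (k + 1) / (n + 1)


def npi_expected_entropy_range(values, labels, t):
    """Hypothetical range of expected entropy when the left-branch
    probability is only known to lie in the NPI interval.

    Assumes the within-branch entropies are the empirical ones, so the
    expected entropy is linear in p and its extremes are at the endpoints.
    """
    lo, hi = npi_left_probability(values, t)
    left, right = split(values, labels, t)
    h_left, h_right = entropy(left), entropy(right)
    ends = [p * h_left + (1 - p) * h_right for p in (lo, hi)]
    return min(ends), max(ends)


values = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
labels = ["a", "a", "a", "b", "b", "b"]
print(best_threshold(values, labels))  # 3.5
print(npi_expected_entropy_range(values, labels, 2.5))
```

Because k/n always lies between k/(n+1) and (k+1)/(n+1), the classical expected entropy of a split falls inside this range in the sketch; the width of the range reflects the uncertainty about which side of the threshold a future observation will fall.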