evtree: Evolutionary Learning of Globally Optimal Classification and Regression Trees in R
Main Article Content
Abstract
Commonly used classification and regression tree methods like the CART algorithm are recursive partitioning methods that build the model in a forward stepwise search. Although this approach is known to be an efficient heuristic, the results of recursive tree methods are only locally optimal, as splits are chosen to maximize homogeneity at the next step only. An alternative way to search over the parameter space of trees is to use global optimization methods like evolutionary algorithms. This paper describes the evtree package, which implements an evolutionary algorithm for learning globally optimal classification and regression trees in R. Computationally intensive tasks are fully computed in C++ while the partykit package is leveraged for representing the resulting trees in R, providing unified infrastructure for summaries, visualizations, and predictions. evtree is compared to the open-source CART implementation rpart, conditional inference trees (ctree), and the open-source C4.5 implementation J48. A benchmark study of predictive accuracy and complexity is carried out in which evtree achieved at least similar and most of the time better results compared to rpart, ctree, and J48. Furthermore, the usefulness of evtree in practice is illustrated in a textbook customer classification task.