Simple Simultaneous Ensemble Learning in Genetic Programming
Learning ensembles by bagging can substantially improve the generalization performance of low-bias high-variance estimators, including those evolved by Genetic Programming (GP). Yet, the best way to learn ensembles in GP remains to be determined...
This work attempts to fill the gap left by existing GP ensemble learning algorithms, which are often either simple but expensive, or efficient but complex. We propose a new algorithm that is both simple and efficient, named Simple Simultaneous Ensemble Genetic Programming (2SEGP). 2SEGP is obtained by relatively minor modifications to the fitness evaluation and selection of a classic GP algorithm, and its only drawback is an (arguably small) increase in fitness evaluation cost from the classic $\mathcal{O}(n \ell)$ to $\mathcal{O}(n(\ell + \beta))$, with $n$ the number of observations and $\ell$/$\beta$ the estimator/ensemble size. Experimental comparisons on real-world datasets across supervised classification and regression tasks show that, despite its simplicity, 2SEGP fares very well against state-of-the-art (ensemble and not) GP algorithms. We further provide insights into what matters in 2SEGP by (i) scaling $\beta$, (ii) ablating the proposed selection method, and (iii) observing the evolvability induced by traditional subtree variation.
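The complexity claim above can be made concrete with a small sketch. The snippet below is a minimal illustration, not the authors' implementation: it assumes a program's predictions on all $n$ observations are computed once (cost $\mathcal{O}(n\ell)$) and then reused to obtain all $\beta$ bag-wise fitnesses at an extra cost of only $\mathcal{O}(n\beta)$. The function name `bagged_fitnesses` and the multiplicity-matrix encoding of the bootstrap bags are assumptions made for illustration.

```python
import numpy as np

def bagged_fitnesses(program_output, y, bag_counts):
    """Compute one fitness value per bootstrap bag from a single program evaluation.

    program_output : (n,) predictions of one evolved program on all n observations,
                     obtained once at cost O(n * ell).
    y              : (n,) regression targets.
    bag_counts     : (beta, n) multiplicity of each observation in each bootstrap bag.

    Reusing the cached per-observation errors, the beta bag-wise errors cost only
    O(n * beta), giving O(n * (ell + beta)) per program instead of O(n * ell * beta).
    """
    sq_err = (program_output - y) ** 2                    # per-observation error, computed once
    return bag_counts @ sq_err / bag_counts.sum(axis=1)   # weighted MSE for each bag


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, beta = 100, 10
    y = rng.normal(size=n)
    out = rng.normal(size=n)                                   # stand-in for a program's predictions
    bags = rng.multinomial(n, np.full(n, 1 / n), size=beta)    # bootstrap multiplicities, shape (beta, n)
    print(bagged_fitnesses(out, y, bags))                      # beta fitness values, one per bag
```

Representing each bag as observation multiplicities (rather than as a resampled copy of the data) is one way to keep the extra per-program work linear in $n\beta$; other bookkeeping schemes achieve the same bound.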