Differentiable decision trees and oblivious ensembles for tabular learning
NODE (Neural Oblivious Decision Ensembles; Popov et al., 2019) builds differentiable ensembles of oblivious decision trees, enabling end-to-end gradient-based learning on tabular data.
Key Insight
Hard splits like `if x < t` are replaced with soft routing, e.g. a sigmoid gate:

$$p_{\text{left}} = \sigma\!\left(\frac{t - x}{\tau}\right), \qquad p_{\text{right}} = 1 - p_{\text{left}}$$

The temperature $\tau$ controls smoothness: low values approximate classic hard trees; higher values yield soft mixtures of paths.
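A minimal sketch of this gate in PyTorch (the name `soft_split` and the default temperature are illustrative, not from the NODE paper):

```python
import torch

def soft_split(x: torch.Tensor, threshold: float, tau: float = 0.1) -> torch.Tensor:
    """Differentiable surrogate for the hard indicator 1[x < threshold].

    Returns the probability of routing left; as tau -> 0 this approaches
    the hard split, while larger tau gives a softer mixture of paths.
    """
    return torch.sigmoid((threshold - x) / tau)

x = torch.linspace(-1.0, 1.0, 5, requires_grad=True)
p_left = soft_split(x, threshold=0.0)
p_left.sum().backward()   # gradients flow through the routing decision
print(p_left.detach(), x.grad)
```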
Oblivious Decision Trees
An oblivious tree uses the same feature and threshold for all nodes at a given depth: each level applies one shared split, producing a symmetric structure with $2^d$ leaves at depth $d$.
Benefits:
- Fewer parameters
- Efficient vectorization on GPUs
- Strong inductive bias for tabular data
NODE ensembles stack many such trees, each producing a weighted leaf prediction.
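A minimal sketch of one such tree in PyTorch, under the simplifying assumption of sigmoid gates and softmax feature selection (NODE itself uses entmax variants for both; the class name `ObliviousTree` is illustrative):

```python
import torch
import torch.nn as nn

class ObliviousTree(nn.Module):
    """One differentiable oblivious tree: `depth` shared splits, 2**depth leaves."""

    def __init__(self, in_features: int, depth: int, tau: float = 1.0):
        super().__init__()
        self.depth, self.tau = depth, tau
        self.feature_logits = nn.Parameter(torch.randn(depth, in_features))
        self.thresholds = nn.Parameter(torch.zeros(depth))
        self.leaf_values = nn.Parameter(torch.randn(2 ** depth))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Soft feature selection: one feature mixture per level -> (batch, depth)
        weights = torch.softmax(self.feature_logits, dim=-1)
        f = x @ weights.t()
        # One shared gate per level: probability of taking the right branch
        gate = torch.sigmoid((f - self.thresholds) / self.tau)
        # Leaf probability = product over levels of gate / (1 - gate)
        probs = torch.ones(x.shape[0], 1, device=x.device)
        for level in range(self.depth):
            g = gate[:, level:level + 1]
            probs = torch.cat([probs * (1 - g), probs * g], dim=-1)
        # Weighted sum over leaf values -> (batch,)
        return probs @ self.leaf_values

trees = [ObliviousTree(in_features=8, depth=4) for _ in range(16)]
x = torch.randn(32, 8)
ensemble_pred = torch.stack([t(x) for t in trees]).mean(dim=0)  # average the ensemble
```

Because every level shares a single split, the whole tree reduces to a handful of dense tensor operations, which is what makes oblivious trees vectorize so well on GPUs.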
entmax vs softmax
Instead of softmax, NODE can use entmax, which yields sparse probabilities:

$$\operatorname{entmax}_\alpha(z) = \underset{p \in \Delta^{d-1}}{\arg\max}\; p^\top z + H_\alpha^{T}(p),$$

where $H_\alpha^{T}$ is the Tsallis $\alpha$-entropy. For $\alpha \geq 1$, entmax interpolates between softmax ($\alpha = 1$, dense) and argmax ($\alpha \to \infty$, hard), improving interpretability and stability; NODE uses $\alpha = 1.5$.
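A quick comparison using the reference `entmax` package (`pip install entmax`; assuming its `entmax15` function, which implements the $\alpha = 1.5$ case):

```python
import torch
from entmax import entmax15  # pip install entmax

logits = torch.tensor([2.0, 1.0, 0.1, -1.0])
print(torch.softmax(logits, dim=-1))  # dense: every entry is strictly positive
print(entmax15(logits, dim=-1))       # sparse: low-scoring entries become exactly 0
```

Exact zeros mean each soft split attends to only a few features, which is where the interpretability gain comes from.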
Why Differentiable Trees?
- Train trees with backpropagation
- Combine tree structure with neural representations
- Smooth optimization landscape, unlike greedy split selection
- Naturally ensemble-friendly
Comparison to Other Tabular Models
- GBDT: Strong but non-differentiable, trained greedily
- TabNet: Attention-based, less interpretable splits
- MLPs: Weak inductive bias for tabular data
- NODE: Tree bias + gradient learning
Key Papers
- Neural Oblivious Decision Ensembles for Deep Learning on Tabular Data — Popov et al. (2019), https://arxiv.org/abs/1909.06312
- Distilling a Neural Network Into a Soft Decision Tree — Frosst & Hinton (2017)
- Deep Neural Decision Forests — Kontschieder et al. (2015)