Sequential tree ensembles optimized via gradient descent
Gradient Boosted Decision Trees (GBDT) are an ensemble method that builds decision trees sequentially, where each new tree is trained to correct the errors of the current ensemble.
Core Idea: Gradient Descent in Function Space
GBDT performs gradient descent over functions rather than parameters. We iteratively improve the model by adding a new function (a tree) that approximates the negative gradient of the loss, i.e. the direction of steepest loss reduction.
At iteration $m$, the ensemble is updated as

$$F_m(x) = F_{m-1}(x) + \eta \, h_m(x)$$

where:
- $h_m$ is a regression tree
- $\eta$ is the learning rate
Loss and Gradients
For squared error loss

$$L(y, F(x)) = \tfrac{1}{2}\bigl(y - F(x)\bigr)^2$$

the negative gradient (residual) is

$$r_i = -\frac{\partial L(y_i, F(x_i))}{\partial F(x_i)} = y_i - F(x_i)$$
Each tree is trained to predict these residuals.
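To make the loop concrete, here is a minimal from-scratch sketch of squared-error gradient boosting, assuming scikit-learn's `DecisionTreeRegressor` as the base learner and a small synthetic dataset (both are illustrative choices, not part of the demo):

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def fit_gbdt(X, y, n_estimators=50, learning_rate=0.3, max_depth=2):
    """Squared-error gradient boosting: each new tree fits the current residuals."""
    f0 = y.mean()                          # F_0: constant initial prediction
    pred = np.full(len(y), f0)
    trees = []
    for _ in range(n_estimators):
        residuals = y - pred               # negative gradient of 1/2 (y - F)^2
        tree = DecisionTreeRegressor(max_depth=max_depth)
        tree.fit(X, residuals)
        pred += learning_rate * tree.predict(X)   # F_m = F_{m-1} + eta * h_m
        trees.append(tree)
    return f0, trees

def predict_gbdt(f0, trees, X, learning_rate=0.3):
    pred = np.full(X.shape[0], f0)
    for tree in trees:
        pred += learning_rate * tree.predict(X)
    return pred

# Toy data: noisy sine curve
rng = np.random.default_rng(0)
X = np.sort(rng.uniform(0, 6, 80)).reshape(-1, 1)
y = np.sin(X).ravel() + rng.normal(0, 0.2, 80)

f0, trees = fit_gbdt(X, y)
print("first predictions:", predict_gbdt(f0, trees, X)[:5])
```

Shrinking each tree's contribution by the learning rate is exactly the update $F_m(x) = F_{m-1}(x) + \eta \, h_m(x)$ from above.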
Interactive Visualization
The demo below shows:
- Data points and ensemble prediction curve
- Residuals as vertical lines
- Individual trees contributing to the final model
- Loss decreasing as trees are added
Gray lines show residuals; curve is ensemble prediction.
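A static version of this picture can be reproduced with a short script; the sketch below assumes scikit-learn's `GradientBoostingRegressor`, matplotlib, and synthetic data standing in for the demo's points.

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.ensemble import GradientBoostingRegressor

# Synthetic data standing in for the demo's points
rng = np.random.default_rng(0)
X = np.sort(rng.uniform(0, 6, 60)).reshape(-1, 1)
y = np.sin(X).ravel() + rng.normal(0, 0.2, 60)

model = GradientBoostingRegressor(n_estimators=20, learning_rate=0.3, max_depth=2)
model.fit(X, y)

grid = np.linspace(0, 6, 300).reshape(-1, 1)
plt.plot(grid, model.predict(grid), label="ensemble prediction")
plt.scatter(X, y, s=15, label="data")
plt.vlines(X.ravel(), model.predict(X), y, colors="gray", alpha=0.5, label="residuals")
plt.legend()
plt.show()
```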
Key Hyperparameters
- learning_rate (η) – step size applied to each tree's contribution; smaller values usually generalize better but require more trees
- n_estimators – number of trees in the ensemble
- max_depth – controls tree complexity and interaction order
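As a concrete illustration, these three knobs map directly onto scikit-learn's `GradientBoostingRegressor`; the values below are arbitrary starting points, not tuned recommendations.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=1000, n_features=10, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = GradientBoostingRegressor(
    learning_rate=0.05,   # eta: smaller steps, usually paired with more trees
    n_estimators=500,     # number of boosting iterations (trees)
    max_depth=3,          # per-tree depth, limits feature interactions
)
model.fit(X_train, y_train)
print("R^2 on held-out data:", model.score(X_test, y_test))
```

In practice, learning_rate and n_estimators are tuned together: halving the learning rate typically calls for roughly twice as many trees.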
GBDT vs Random Forest
| Aspect | GBDT | Random Forest |
|---|---|---|
| Training | Sequential | Parallel |
| Bias–Variance | Low bias | Low variance |
| Optimization | Gradient-based | Bagging |
| Overfitting | Controlled via learning rate | Controlled via averaging |
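To see the contrast in practice, here is a rough sketch fitting both models on the same synthetic split (a fair comparison would need proper tuning and cross-validation):

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=2000, n_features=20, noise=15.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

gbdt = GradientBoostingRegressor(n_estimators=300, learning_rate=0.1, max_depth=3)
rf = RandomForestRegressor(n_estimators=300, n_jobs=-1)  # trees trained independently

for name, model in [("GBDT", gbdt), ("Random Forest", rf)]:
    model.fit(X_train, y_train)
    print(name, "test R^2:", round(model.score(X_test, y_test), 3))
```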
Popular Implementations
- XGBoost – https://arxiv.org/abs/1603.02754
- LightGBM – https://github.com/microsoft/LightGBM
- CatBoost – https://arxiv.org/abs/1706.09516
Widely available in:
- `xgboost`
- `lightgbm`
- `catboost`
- `sklearn.ensemble.GradientBoosting*`
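The core hyperparameters carry over between these libraries under mostly the same names. Below is a hedged sketch of roughly equivalent regressor configurations, assuming each package is installed (the values are placeholders):

```python
# Roughly equivalent configurations across popular GBDT libraries
from xgboost import XGBRegressor
from lightgbm import LGBMRegressor
from catboost import CatBoostRegressor
from sklearn.ensemble import GradientBoostingRegressor

params = dict(n_estimators=300, learning_rate=0.05, max_depth=4)

models = {
    "xgboost":  XGBRegressor(**params),
    "lightgbm": LGBMRegressor(**params),
    "catboost": CatBoostRegressor(iterations=300, learning_rate=0.05, depth=4, verbose=0),
    "sklearn":  GradientBoostingRegressor(**params),
}
```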
When to Use GBDT
- Tabular data with mixed feature types
- Medium-sized datasets
- When strong predictive performance matters and feature importances offer enough interpretability