
Commit 15c71a7

nick blog on GBT
1 parent 9d3dbf4 commit 15c71a7

File tree

6 files changed, +40 -0 lines changed


.DS_Store (0 Bytes): Binary file not shown.
Lines changed: 40 additions & 0 deletions
@@ -0,0 +1,40 @@
---
title: "Understanding Gradient Boosting as a Gradient Descent"
date: June 6, 2019
categories:
- Technical
tags:
- Gradient boosting

featured-image: gbdt.png

postauthors:
- name: Nicolas Hug
  website: https://github.com/NicolasHug
  image: nicolas_hug.jpg
usemathjax: true
---
<div>
<img src="/blog/assets/images/posts_images/{{ page.featured-image }}" alt="">
{% include postauthor.html %}
</div>
There are a lot of resources online about gradient boosting, but not many of them explain how gradient boosting relates to gradient descent. This post is an attempt to explain gradient boosting as a (kinda weird) gradient descent.

I’ll assume zero previous knowledge of gradient boosting here, but this post requires a minimal working knowledge of gradient descent.

__Let’s get started!__
For a given sample $$ \mathbf{x}_i $$, a gradient boosting regressor yields predictions with the following form:

$$ \hat{y}_i = \sum_{m = 1}^{\text{n_iter}} h_m(\mathbf{x}_i), $$

where each $$ h_m $$ is an instance of a base estimator (often called a weak learner, since it usually does not need to be extremely accurate). Since the base estimator is almost always a decision tree, I’ll abusively use the term GBDT (Gradient Boosting Decision Trees) to refer to gradient boosting in general.
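
This prediction rule is easy to write down directly. Here is a minimal sketch (my own, not taken from the post), assuming `predictors` is a list of already-fitted base estimators, each exposing a scikit-learn-style `predict` method:

```python
import numpy as np

def gbdt_predict(predictors, X):
    """Sum the outputs of all base estimators: sum_m h_m(x_i) for each sample."""
    predictions = np.zeros(X.shape[0])
    for h_m in predictors:
        predictions += h_m.predict(X)  # each h_m contributes one term of the sum
    return predictions
```
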
Each of the base estimators $$ h_m $$ isn’t trying to predict the target $$ y_i $$. Instead, the base estimators are trying to predict gradients. This sum $$ \sum_{m = 1}^{\text{n_iter}} h_m(\mathbf{x}_i) $$ is actually performing a gradient descent.
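
To make that claim concrete, here is a toy least-squares training loop (my own sketch, not scikit-learn’s implementation; the learning rate is an extra ingredient the formula above omits). For the squared loss, the negative gradient of the loss with respect to the current predictions is simply the residual, and that residual is what each tree is fitted on:

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def fit_gbdt(X, y, n_iter=100, learning_rate=0.1, max_depth=3):
    predictors = []
    predictions = np.zeros(y.shape[0])  # start from a constant (zero) prediction
    for m in range(n_iter):
        negative_gradient = y - predictions  # -dL/d(predictions) for the squared loss
        h_m = DecisionTreeRegressor(max_depth=max_depth)
        h_m.fit(X, negative_gradient)  # the tree predicts gradients, not the target y
        predictions += learning_rate * h_m.predict(X)  # one gradient descent step
        predictors.append(h_m)
    return predictors
```

Each iteration nudges the predictions in the direction that decreases the loss, which is exactly what a gradient descent step does.
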
Specifically, it’s a gradient descent in a functional space. This is in contrast to what we’re used to in many other machine learning algorithms (e.g. neural networks or linear regression), where gradient descent is instead performed in the parameter space. Let’s review that briefly.
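
For comparison, here is what the more familiar parameter-space version looks like, sketched for a linear model with the (halved) mean squared error; the thing being updated is a weight vector, not a function:

```python
import numpy as np

def fit_linear_regression(X, y, n_iter=100, learning_rate=0.1):
    w = np.zeros(X.shape[1])
    for _ in range(n_iter):
        gradient = X.T @ (X @ w - y) / len(y)  # gradient of the halved MSE w.r.t. w
        w -= learning_rate * gradient          # step in parameter space
    return w
```
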
Read the full blog post on Nicolas' blog:
<span style="background-color: #CAE9F5;"> [Understanding Gradient Boosting as a gradient descent](http://nicolas-hug.com/blog/gradient_boosting_descent) </span>

assets/.DS_Store (0 Bytes): Binary file not shown.

assets/images/.DS_Store (0 Bytes): Binary file not shown.

59.2 KB

assets/images/posts_images/gbdt.png (31.6 KB)

0 commit comments