# 52_gradient_boosting_ensemble.py
# Gradient boosting is a framework for boosting ensemble algorithms that
# generalizes AdaBoost.
# It re-frames boosting as an additive model fit under a statistical framework:
# each new tree is fit to the negative gradient of a loss function, so any
# differentiable loss can be used, and shrinkage (a learning rate that scales
# each tree's contribution) reduces overfitting.
# Gradient boosting can also borrow ideas from bagging by fitting each ensemble
# member on a random sample of the training rows and columns; this variant is
# called stochastic gradient boosting.
# It is a highly effective technique for structured (tabular) data, although
# fitting can be slow because the trees are added sequentially, each one
# correcting the errors of the ensemble built so far.
# More efficient implementations have since been developed, such as the popular
# eXtreme Gradient Boosting (XGBoost) and Light Gradient Boosting Machine (LightGBM).
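The additive-model and shrinkage ideas above can be sketched from scratch. This is a minimal illustration, not scikit-learn's implementation: it assumes a 1-D regression problem with squared-error loss, for which the negative gradient is simply the residual y - F(x), and it uses depth-1 trees (stumps) as the ensemble members. The helper names (`fit_stump`, `gradient_boost`, `predict`, `mse`) and the toy data are hypothetical. Setting `subsample` below 1.0 gives the stochastic (row-sampling) variant; column sampling is omitted because there is only one feature.

```python
import random


def fit_stump(xs, residuals):
    """Least-squares depth-1 regression tree on 1-D inputs.

    Returns (threshold, left_value, right_value); x <= threshold goes left.
    """
    pairs = sorted(zip(xs, residuals))
    n = len(pairs)
    total = sum(r for _, r in pairs)
    best = None
    left_sum = 0.0
    for i in range(1, n):
        left_sum += pairs[i - 1][1]
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # cannot split between equal x values
        left_mean = left_sum / i
        right_mean = (total - left_sum) / (n - i)
        # Minimizing the squared error is equivalent to maximizing this score.
        score = i * left_mean ** 2 + (n - i) * right_mean ** 2
        if best is None or score > best[0]:
            threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
            best = (score, threshold, left_mean, right_mean)
    if best is None:  # all x equal: predict the mean residual everywhere
        return (float("inf"), total / n, total / n)
    return best[1:]


def predict_stump(stump, x):
    threshold, left, right = stump
    return left if x <= threshold else right


def gradient_boost(xs, ys, n_estimators=100, learning_rate=0.1,
                   subsample=1.0, seed=1):
    """Fit F(x) = f0 + learning_rate * sum_m h_m(x) under squared error."""
    rng = random.Random(seed)
    f0 = sum(ys) / len(ys)  # best constant model for squared error
    preds = [f0] * len(xs)
    stumps = []
    for _ in range(n_estimators):
        # The negative gradient of 0.5 * (y - F)^2 with respect to F is y - F.
        residuals = [y - p for y, p in zip(ys, preds)]
        rows = list(range(len(xs)))
        if subsample < 1.0:  # stochastic variant: fit on a random row sample
            rows = rng.sample(rows, max(2, int(subsample * len(xs))))
        stump = fit_stump([xs[i] for i in rows], [residuals[i] for i in rows])
        stumps.append(stump)
        # Shrinkage: each new stump contributes only learning_rate of its fit.
        preds = [p + learning_rate * predict_stump(stump, x)
                 for p, x in zip(preds, xs)]
    return f0, learning_rate, stumps


def predict(model, x):
    f0, learning_rate, stumps = model
    return f0 + learning_rate * sum(predict_stump(s, x) for s in stumps)


def mse(model, xs, ys):
    return sum((y - predict(model, x)) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Usage on a toy 1-D target (made up for illustration).
xs = [i / 10 for i in range(50)]
ys = [x * x for x in xs]
baseline = sum((y - sum(ys) / len(ys)) ** 2 for y in ys) / len(ys)
for sub in (1.0, 0.8):
    model = gradient_boost(xs, ys, n_estimators=200, subsample=sub)
    print(f"subsample={sub}: train MSE {mse(model, xs, ys):.4f} "
          f"vs constant-model MSE {baseline:.4f}")
```

By contrast, `GradientBoostingClassifier` in the script below fits deeper regression trees (depth 3 by default) to the gradients of a logistic loss, but the sequential fit-the-gradient-then-shrink loop has the same shape.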
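from numpy import mean, std
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score, RepeatedStratifiedKFold
from sklearn.ensemble import GradientBoostingClassifier

# define a synthetic binary classification dataset (100 samples, 20 features by default)
X, y = make_classification(random_state=1)
# define the ensemble: 50 boosted decision trees, fit sequentially
model = GradientBoostingClassifier(n_estimators=50)
# evaluate with stratified 10-fold cross-validation repeated 3 times (30 scores)
cv = RepeatedStratifiedKFold(n_splits=10, n_repeats=3, random_state=1)
n_scores = cross_val_score(model, X, y, cv=cv, scoring='accuracy', n_jobs=1)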
# report ensemble performance
print('Mean Accuracy: %.3f (%.3f)' % (mean(n_scores), std(n_scores)))
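In scikit-learn, the shrinkage and stochastic-sampling ideas from the header comment map onto `GradientBoostingClassifier` parameters: `learning_rate` scales each tree's contribution, `subsample` below 1.0 fits each tree on a random fraction of the rows (stochastic gradient boosting), and a float `max_features` limits the fraction of columns considered at each split (per split, not per tree). The values below (0.1, 0.5, 0.5) are arbitrary choices for illustration, not tuned settings.

```python
from numpy import mean, std
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import RepeatedStratifiedKFold, cross_val_score

# same synthetic dataset and evaluation protocol as the script above
X, y = make_classification(random_state=1)
cv = RepeatedStratifiedKFold(n_splits=10, n_repeats=3, random_state=1)

# stochastic gradient boosting; parameter values are illustrative only
model = GradientBoostingClassifier(
    n_estimators=50,
    learning_rate=0.1,  # shrinkage applied to every tree's contribution
    subsample=0.5,      # fraction of rows sampled for each tree
    max_features=0.5,   # fraction of columns considered at each split
    random_state=1,     # makes the row/column sampling reproducible
)
scores = cross_val_score(model, X, y, cv=cv, scoring='accuracy', n_jobs=1)
print('Mean Accuracy: %.3f (%.3f)' % (mean(scores), std(scores)))
```

Lowering `learning_rate` usually calls for a larger `n_estimators` to reach the same training fit, so the two are typically tuned together.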