* [Stacking concept + Pictures + Stacking implementation from scratch](https://github.com/vecxoz/vecstack/blob/master/examples/00_stacking_concept_pictures_code.ipynb)
* Examples (all examples are valid for both APIs with a small [difference in parameters](https://github.com/vecxoz/vecstack#21-how-do-parameters-of-stacking-function-and-stackingtransformer-correspond)):
4. [What is stacking?](https://github.com/vecxoz/vecstack#4-what-is-stacking)
5. [What about stacking name?](https://github.com/vecxoz/vecstack#5-what-about-stacking-name)
6. [Do I need stacking at all?](https://github.com/vecxoz/vecstack#6-do-i-need-stacking-at-all)
7. [Can you explain stacking (stacked generalization) in 10 lines of code?](https://github.com/vecxoz/vecstack#7-can-you-explain-stacking-stacked-generalization-in-10-lines-of-code)
8. [Why do I need complicated inner procedure for stacking?](https://github.com/vecxoz/vecstack#8-why-do-i-need-complicated-inner-procedure-for-stacking)
9. [I want to implement stacking (stacked generalization) from scratch. Can you help me?](https://github.com/vecxoz/vecstack#9-i-want-to-implement-stacking-stacked-generalization-from-scratch-can-you-help-me)
10. [What is OOF?](https://github.com/vecxoz/vecstack#10-what-is-oof)
11. [What are *estimator*, *learner*, *model*?](https://github.com/vecxoz/vecstack#11-what-are-estimator-learner-model)
12. [What is *blending*? How is it related to stacking?](https://github.com/vecxoz/vecstack#12-what-is-blending-how-is-it-related-to-stacking)
13. [How to optimize weights for weighted average?](https://github.com/vecxoz/vecstack#13-how-to-optimize-weights-for-weighted-average)
14. [What is better: weighted average for current level or additional level?](https://github.com/vecxoz/vecstack#14-what-is-better-weighted-average-for-current-level-or-additional-level)
15. [What is *bagging*? How is it related to stacking?](https://github.com/vecxoz/vecstack#15-what-is-bagging-how-is-it-related-to-stacking)
16. [How many models should I use on a given stacking level?](https://github.com/vecxoz/vecstack#16-how-many-models-should-i-use-on-a-given-stacking-level)
17. [How many stacking levels should I use?](https://github.com/vecxoz/vecstack#17-how-many-stacking-levels-should-i-use)
Just give me a star in the top right corner of the repository page.
### 4. What is stacking?
Stacking (stacked generalization) is a machine learning ensembling technique.
The main idea is to use predictions as features.
More specifically, we predict the train set (in CV-like fashion) and the test set using some 1st level model(s), and then use these predictions as features for the 2nd level model. You can find more details (concept, pictures, code) in the [stacking tutorial](https://github.com/vecxoz/vecstack/blob/master/examples/00_stacking_concept_pictures_code.ipynb).
Also make sure to check out:
* [Ensemble Learning](https://en.wikipedia.org/wiki/Ensemble_learning) ([Stacking](https://en.wikipedia.org/wiki/Ensemble_learning#Stacking)) in Wikipedia
* [Stacked Generalization](https://www.researchgate.net/publication/222467943_Stacked_Generalization) paper by David H. Wolpert
### 5. What about stacking name?
Often it is also called *stacked generalization*. The term is derived from the verb *to stack* (to put together, to put on top of each other). It implies that we put some models on top of other models, i.e. train some models on predictions of other models. From another point of view we can say that we stack predictions in order to use them as features.
### 6. Do I need stacking at all?
It depends on the specific business case. The main thing to know about stacking is that it requires ***significant computing resources***. The [No Free Lunch Theorem](https://en.wikipedia.org/wiki/There_ain%27t_no_such_thing_as_a_free_lunch) applies as always. Stacking can give you an improvement, but at a certain price (deployment, computation, maintenance). Only an experiment for the given business case will tell you whether it is worth the effort and money.
At this point, a large part of stacking users are participants of machine learning competitions. On Kaggle you can't go too far without ensembling. I can secretly tell you that at least the top half of the leaderboard in pretty much any competition uses stacking in some way. Stacking is less popular in production due to time and resource constraints, but I think it is gaining popularity.
### 7. Can you explain stacking (stacked generalization) in 10 lines of code?
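A minimal sketch of the idea, assuming generic scikit-learn-style estimators (the models and dataset below are illustrative choices, not part of vecstack). As explained right below, this naive version overfits:

```python
# Naive two-level stacking sketch: use a 1st level model's predictions
# as the only feature for a 2nd level model.
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=200, n_features=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model_L1 = GradientBoostingRegressor(random_state=0)  # 1st level model
model_L1.fit(X_train, y_train)
S_train = model_L1.predict(X_train).reshape(-1, 1)    # predictions as features (overfitted!)
S_test = model_L1.predict(X_test).reshape(-1, 1)

model_L2 = LinearRegression()                         # 2nd level model
model_L2.fit(S_train, y_train)
final_prediction = model_L2.predict(S_test)
```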
### 8. Why do I need complicated inner procedure for stacking?

The code above will give a meaningless result. If we fit on `X_train` we can't just predict `X_train`, because our 1st level model has already seen `X_train`, and its prediction will be overfitted. To avoid overfitting we perform a cross-validation procedure and in each fold we predict the out-of-fold (OOF) part of `X_train`. You can find more details (concept, pictures, code) in the [stacking tutorial](https://github.com/vecxoz/vecstack/blob/master/examples/00_stacking_concept_pictures_code.ipynb).
### 9. I want to implement stacking (stacked generalization) from scratch. Can you help me?
[Not a problem](https://github.com/vecxoz/vecstack/blob/master/examples/00_stacking_concept_pictures_code.ipynb)
### 10. What is OOF?
OOF is an abbreviation for out-of-fold prediction. It's also known as *OOF features*, *stacked features*, *stacking features*, etc. Basically it means predictions for the part of the train data that the model hasn't seen during training.
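For illustration, OOF predictions can be produced with scikit-learn's `cross_val_predict` (a sketch; vecstack computes OOF internally during stacking):

```python
# Each OOF value is predicted by a model that never saw that row during training
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_predict

X, y = make_regression(n_samples=100, n_features=4, random_state=0)
oof = cross_val_predict(Ridge(), X, y, cv=5)  # one out-of-fold prediction per train row
```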
### 11. What are *estimator*, *learner*, *model*?
Basically they are the same thing, meaning *machine learning algorithm*. These terms are often used interchangeably.
Speaking about inner stacking mechanics, you should remember that when you have a *single 1st level model* there will be at least `n_folds` separate models *trained in each CV fold* on different subsets of the data. See [Q23](https://github.com/vecxoz/vecstack#23-how-to-estimate-stacking-training-time-and-number-of-models-which-will-be-built) for more details.
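A quick illustrative count (hypothetical numbers, consistent with the mechanics above): with 3 first-level models and 5-fold CV, the fitting stage trains 3 × 5 = 15 separate models:

```python
# Number of models actually trained during the fitting stage
n_estimators = 3   # 1st level models
n_folds = 5        # CV folds
total_models = n_estimators * n_folds
print(total_models)  # 15
```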
### 12. What is *blending*? How is it related to stacking?
Basically it is the same thing. Both approaches use predictions as features.
Often these terms are used interchangeably.
The difference is how we generate features (predictions) for the next level:
* *stacking*: perform a cross-validation procedure and predict each part of the train set (OOF)
* *blending*: fit on one part of the train set and predict the held-out part
The *vecstack* package supports only *stacking*, i.e. the cross-validation approach. For a given `random_state` value (e.g. 42) the folds (splits) will be the same across all estimators. See also [Q30](https://github.com/vecxoz/vecstack#30-do-folds-splits-have-to-be-the-same-across-estimators-and-stacking-levels-how-does-random_state-work).
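The contrast can be sketched as follows (a hypothetical illustration using scikit-learn helpers, not vecstack's API):

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_predict, train_test_split

X, y = make_regression(n_samples=120, n_features=4, random_state=0)

# Stacking: cross-validation gives an OOF prediction for EVERY train row
S_stacking = cross_val_predict(Ridge(), X, y, cv=4).reshape(-1, 1)

# Blending: fit on one part; next-level features exist only for the holdout part
X_fit, X_hold, y_fit, y_hold = train_test_split(X, y, test_size=0.3, random_state=0)
S_blending = Ridge().fit(X_fit, y_fit).predict(X_hold).reshape(-1, 1)
```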
### 13. How to optimize weights for weighted average?
You can use for example:
* `scipy.optimize.minimize`
* `scipy.optimize.differential_evolution`
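For instance, a minimal sketch with `scipy.optimize.minimize` on synthetic predictions; the non-negativity and sum-to-one constraints below are a common convention, not a requirement:

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(42)
y_true = rng.normal(size=100)
# OOF predictions of two hypothetical 1st level models (noisier second model)
preds = np.vstack([y_true + rng.normal(scale=0.5, size=100),
                   y_true + rng.normal(scale=0.8, size=100)])

def loss(w):
    # MSE of the weighted average of model predictions
    return np.mean((y_true - w @ preds) ** 2)

result = minimize(loss, x0=[0.5, 0.5], bounds=[(0.0, 1.0)] * 2,
                  constraints={"type": "eq", "fun": lambda w: w.sum() - 1.0})
weights = result.x  # optimized blend weights
```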
### 14. What is better: weighted average for current level or additional level?
By default you can start from a weighted average. It is easier to apply and there is a better chance that it will give a good result. Then you can try an additional level, which can potentially outperform the weighted average (but not always and not in an easy way). Experiment is your friend.
### 15. What is *bagging*? How is it related to stacking?
[Bagging](https://en.wikipedia.org/wiki/Bootstrap_aggregating) or Bootstrap aggregating works as follows: generate subsets of the training set, train models on these subsets and then average their predictions.
The term *bagging* is also often used to describe the following approaches:
* train several different models on the same data and average predictions
* train the same model with different random seeds on the same data and average predictions
So if we run stacking and just average the predictions, it is *bagging*.
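A minimal sketch of classical bootstrap aggregating, using scikit-learn's `BaggingRegressor` for illustration:

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import BaggingRegressor
from sklearn.tree import DecisionTreeRegressor

X, y = make_regression(n_samples=200, n_features=5, random_state=0)
# Each of the 10 trees is trained on a bootstrap subset of the train set;
# the final prediction is the average over the trees
bag = BaggingRegressor(DecisionTreeRegressor(), n_estimators=10, random_state=0)
bag.fit(X, y)
pred = bag.predict(X)
```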
### 16. How many models should I use on a given stacking level?
***Note 1:*** The best architecture can be found only by experiment.
***Note 2:*** Always remember that a higher number of levels or models does NOT guarantee a better result. The key to success in stacking (and ensembling in general) is diversity, i.e. low correlation between models.
It depends on many factors like type of problem, type of data, quality of models, correlation of models, expected result, etc.
Some example configurations are listed below.
* Reasonable starting point:
  * `L1: 2-10 models -> L2: weighted (rank) average or single model`
* Then try to add more 1st level models and an additional level:
You can also find some winning stacking architectures on the [Kaggle blog](http://blog.kaggle.com/), e.g. [1st place in Homesite Quote Conversion](http://blog.kaggle.com/2016/04/08/homesite-quote-conversion-winners-write-up-1st-place-kazanova-faron-clobber/)
### 17. How many stacking levels should I use?
***Note 1:*** The best architecture can be found only by experiment.
***Note 2:*** Always remember that a higher number of levels or models does NOT guarantee a better result. The key to success in stacking (and ensembling in general) is diversity, i.e. low correlation between models.
For some example configurations see [Q16](https://github.com/vecxoz/vecstack#16-how-many-models-should-i-use-on-a-given-stacking-level)
### 18. How to choose models for stacking?

Based on experiments and correlation (e.g. Pearson). Less correlated models give better results.
### 19. I am trying hard but still can't beat my best single model with stacking. What is wrong?
Nothing is wrong. Stacking is an advanced, complicated technique. It's hard to make it work. ***Solution:*** make sure to try a weighted (rank) average first, instead of an additional level with some advanced models. An average is much easier to apply and in most cases it will surely outperform your best model. If still no luck, then probably your models are highly correlated.
### 20. What should I choose: functional API (`stacking` function) or Scikit-learn API (`StackingTransformer`)?