It depends on the specific business case. The main thing to know about stacking is that it requires ***significant computing resources***. The [No Free Lunch Theorem](https://en.wikipedia.org/wiki/There_ain%27t_no_such_thing_as_a_free_lunch) applies as always. Stacking can give you an improvement, but at a certain price (deployment, computation, maintenance). Only an experiment for the given business case will tell you whether it is worth the effort and money.
At this point, a large part of stacking users are participants in machine learning competitions. On Kaggle you can't go too far without ensembling. I can secretly tell you that at least the top half of the leaderboard in pretty much any competition uses ensembling (stacking) in some way. Stacking is less popular in production due to time and resource constraints, but I think it is gaining popularity.
### 7. Can you explain stacking (stacked generalization) in 10 lines of code?
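One way to sketch it, assuming scikit-learn-style estimators with `fit`/`predict` and NumPy arrays (the helper name `get_oof` is illustrative, not part of vecstack):

```python
import numpy as np
from sklearn.model_selection import KFold

def get_oof(models, X_train, y_train, X_test, n_splits=5):
    # OOF predictions for the train set, averaged predictions for the test set
    S_train = np.zeros((len(X_train), len(models)))
    S_test = np.zeros((len(X_test), len(models)))
    kf = KFold(n_splits=n_splits, shuffle=True, random_state=0)
    for m, model in enumerate(models):
        for fit_idx, oof_idx in kf.split(X_train):
            model.fit(X_train[fit_idx], y_train[fit_idx])
            S_train[oof_idx, m] = model.predict(X_train[oof_idx])
            S_test[:, m] += model.predict(X_test) / n_splits
    return S_train, S_test  # features for the next level
```

The columns of `S_train` and `S_test` then serve as features for the 2nd-level model.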
### 12. What is *blending*? How is it related to stacking?
Basically, it is the same thing: both approaches use predictions as features. Often these terms are used interchangeably.
The difference is how we generate features (predictions) for the next level (see the sketch after this list):

* **stacking**: perform a cross-validation procedure and predict each part of the train set (out-of-fold, OOF)
* **blending**: predict a fixed holdout set
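A minimal runnable sketch of this difference, assuming scikit-learn (the model and variable names are just illustrative):

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_predict, train_test_split

X, y = make_regression(n_samples=500, n_features=10, random_state=0)

# Stacking: cross-validation gives OOF predictions for EVERY train row
oof_pred = cross_val_predict(Ridge(), X, y, cv=5)          # shape (500,)

# Blending: predictions exist only for a fixed holdout part of the data
X_fit, X_hold, y_fit, y_hold = train_test_split(X, y, test_size=0.3, random_state=0)
holdout_pred = Ridge().fit(X_fit, y_fit).predict(X_hold)   # shape (150,)
```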
Some example configurations are listed below.
* If you're crunching numbers at Kaggle and decided to go wild:
You can also find some winning stacking architectures on the [Kaggle blog](http://blog.kaggle.com/), e.g. [1st place in Homesite Quote Conversion](http://blog.kaggle.com/2016/04/08/homesite-quote-conversion-winners-write-up-1st-place-kazanova-faron-clobber/).
### 17. How many stacking levels should I use?
***Note 1:*** The best architecture can be found only by experiment.
***Note 2:*** Always remember that a higher number of levels or models does NOT guarantee a better result. The key to success in stacking (and ensembling in general) is diversity: low correlation between models (see the sketch below).
For some example configurations see [Q16](https://github.com/vecxoz/vecstack#16-how-many-models-should-i-use-on-a-given-stacking-level).
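To see what "low correlation" means in practice, here is a small sketch with synthetic data standing in for real OOF predictions of three 1st-level models:

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic stand-ins for OOF predictions of three 1st-level models
target = rng.normal(size=1000)
preds = np.column_stack([target + rng.normal(scale=s, size=1000)
                         for s in (0.3, 0.4, 1.0)])

# Pairwise Pearson correlation between the models' predictions:
# values near 1.0 mean redundant models, lower values mean more diversity
corr = np.corrcoef(preds, rowvar=False)
print(np.round(corr, 2))
```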
### 18. How do I choose models for stacking?
***Note 2:*** To be correctly detected, the train set does not necessarily have to be identical (exactly the same). It must have the same shape, and all values must be *close* (`np.isclose` is used for checking). So if you somehow regenerate your train set, you should not worry about numerical precision.
If you transform `X_train` and see 'Train set was detected', everything is OK. If you transform `X_train` but don't see this message, something went wrong: your train set was probably changed (which is not allowed). In this case you have to retrain `StackingTransformer`. For more details see the [stacking tutorial](https://github.com/vecxoz/vecstack/blob/master/examples/00_stacking_concept_pictures_code.ipynb) or [Q8](https://github.com/vecxoz/vecstack#8-why-do-i-need-complicated-inner-procedure-for-stacking).
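A small illustration of the detection check described above (this shows the `np.isclose` idea only, not vecstack's exact internal code):

```python
import numpy as np

X_train = np.array([[1.0, 2.0], [3.0, 4.0]])
# A regenerated train set: same values up to tiny floating-point noise
X_regen = X_train + 1e-12

same_shape = X_train.shape == X_regen.shape
all_close = np.isclose(X_train, X_regen).all()
print(same_shape and all_close)  # True: treated as the same train set
```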
### 27. What is the very first stacking level called: L0 or L1? Where does counting start?