27 changes: 27 additions & 0 deletions .github/workflows/actions.yaml
@@ -0,0 +1,27 @@
name: Build

on: [push]

jobs:
  build:
    runs-on: ubuntu-latest
    strategy:
      matrix:
        python-version: ["3.9", "3.10", "3.11", "3.12"]

    steps:
      - name: Checkout the ${{ github.repository }} repository
        uses: actions/checkout@v4
      - name: Set up Python ${{ matrix.python-version }}
        uses: actions/setup-python@v5
        with:
          python-version: ${{ matrix.python-version }}
      - name: Install dependencies
        run: |
          python -m pip install --upgrade pip
          pip install .[test]
      - name: Test with Pytest
        run: |
          pytest --cov=vecstack --cov-report=term-missing tests
      - name: Coveralls
        uses: coverallsapp/github-action@v2
40 changes: 0 additions & 40 deletions .travis.yml

This file was deleted.

12 changes: 12 additions & 0 deletions CHANGELOG.md
@@ -1,5 +1,17 @@
# Changelog

+### v0.5.0 -- September 8, 2025 -- Maintenance release
+
+* Python 3.9+
+* Testing: pytest and pytest-cov
+* CI: GitHub Actions
+
+* Scikit-learn API:
+* Fixed `_set_params` method which was not resetting individual estimators in the `estimators` collection
+
+* Functional API
+* Fixed saving OOF arrays in file
+
### v0.4.0 -- August 12, 2019

Since v0.4.0 vecstack provides official support for Python 3.5 and higher only,
2 changes: 1 addition & 1 deletion LICENSE.txt
@@ -1,7 +1,7 @@
MIT License

Vecstack. Python package for stacking (machine learning technique)
-Copyright (c) 2016-2019 Igor Ivanov
+Copyright (c) 2016-2025 Igor Ivanov
Email: [email protected]

Permission is hereby granted, free of charge, to any person obtaining a copy
39 changes: 26 additions & 13 deletions README.md
@@ -1,6 +1,6 @@
[![PyPI version](https://img.shields.io/pypi/v/vecstack.svg?colorB=4cc61e)](https://pypi.python.org/pypi/vecstack)
[![PyPI license](https://img.shields.io/pypi/l/vecstack.svg)](https://github.com/vecxoz/vecstack/blob/master/LICENSE.txt)
-[![Build Status](https://travis-ci.org/vecxoz/vecstack.svg?branch=master)](https://travis-ci.org/vecxoz/vecstack)
+[![Build status](https://github.com/vecxoz/vecstack/actions/workflows/actions.yaml/badge.svg?branch=master)](https://github.com/vecxoz/vecstack/actions)
[![Coverage Status](https://coveralls.io/repos/github/vecxoz/vecstack/badge.svg?branch=master)](https://coveralls.io/github/vecxoz/vecstack?branch=master)
[![PyPI pyversions](https://img.shields.io/pypi/pyversions/vecstack.svg)](https://pypi.python.org/pypi/vecstack/)

@@ -11,18 +11,18 @@ Convenient way to automate OOF computation, prediction and bagging using any num
* [Functional API](https://github.com/vecxoz/vecstack#usage-functional-api):
* Minimalistic. Get your stacked features in a single line
* RAM-friendly. The lowest possible memory consumption
-* Kaggle-ready. Stacked features and hyperparameters from each run can be [automatically saved](https://github.com/vecxoz/vecstack/blob/master/vecstack/core.py#L209) in files. No more mess at the end of the competition. [Log example](https://github.com/vecxoz/vecstack/blob/master/examples/03_log_example.txt)
+* Kaggle-ready. Stacked features and hyperparameters from each run can be [automatically saved](https://github.com/vecxoz/vecstack/blob/master/vecstack/core.py#L210) in files. No more mess at the end of the competition. [Log example](https://github.com/vecxoz/vecstack/blob/master/examples/03_log_example.txt)
* [Scikit-learn API](https://github.com/vecxoz/vecstack#usage-scikit-learn-api):
* Standardized. Fully scikit-learn compatible transformer class exposing `fit` and `transform` methods
* Pipeline-certified. Implement and deploy [multilevel stacking](https://github.com/vecxoz/vecstack/blob/master/examples/04_sklearn_api_regression_pipeline.ipynb) like it's no big deal using `sklearn.pipeline.Pipeline`
* And of course `FeatureUnion` is also invited to the party
* Overall specs:
* Use any sklearn-like estimators
-* Perform [classification and regression](https://github.com/vecxoz/vecstack/blob/master/vecstack/coresk.py#L83) tasks
-* Predict [class labels or probabilities](https://github.com/vecxoz/vecstack/blob/master/vecstack/coresk.py#L119) in classification task
-* Apply any [user-defined metric](https://github.com/vecxoz/vecstack/blob/master/vecstack/coresk.py#L124)
-* Apply any [user-defined transformations](https://github.com/vecxoz/vecstack/blob/master/vecstack/coresk.py#L87) for target and prediction
-* Python 3.5 and higher, [unofficial support for Python 2.7 and 3.4](https://github.com/vecxoz/vecstack/blob/master/PY2.md)
+* Perform [classification and regression](https://github.com/vecxoz/vecstack/blob/master/vecstack/coresk.py#L85) tasks
+* Predict [class labels or probabilities](https://github.com/vecxoz/vecstack/blob/master/vecstack/coresk.py#L121) in classification task
+* Apply any [user-defined metric](https://github.com/vecxoz/vecstack/blob/master/vecstack/coresk.py#L126)
+* Apply any [user-defined transformations](https://github.com/vecxoz/vecstack/blob/master/vecstack/coresk.py#L89) for target and prediction
+* Python 3.9+, [unofficial support for Python 2.7 and 3.4](https://github.com/vecxoz/vecstack/blob/master/PY2.md)
* Win, Linux, Mac
* [MIT license](https://github.com/vecxoz/vecstack/blob/master/LICENSE.txt)
* Depends on **numpy**, **scipy**, **scikit-learn>=0.18**
@@ -44,19 +44,19 @@ Convenient way to automate OOF computation, prediction and bagging using any num
* [Regression + Multilevel stacking using Pipeline](https://github.com/vecxoz/vecstack/blob/master/examples/04_sklearn_api_regression_pipeline.ipynb)
* Documentation:
* [Functional API](https://github.com/vecxoz/vecstack/blob/master/vecstack/core.py#L133) or type ```>>> help(stacking)```
-* [Scikit-learn API](https://github.com/vecxoz/vecstack/blob/master/vecstack/coresk.py#L64) or type ```>>> help(StackingTransformer)```
+* [Scikit-learn API](https://github.com/vecxoz/vecstack/blob/master/vecstack/coresk.py#L66) or type ```>>> help(StackingTransformer)```

# Installation

-***Note:*** Python 3.5 or higher is required. If you’re still using Python 2.7 or 3.4 see [installation details here](https://github.com/vecxoz/vecstack/blob/master/PY2.md)
+***Note:*** Python 3.9+ is officially supported and tested. If you’re still using Python 2.7 or 3.4 see [installation details here](https://github.com/vecxoz/vecstack/blob/master/PY2.md)

* ***Classic 1st time installation (recommended):***
* `pip install vecstack`
* Install for current user only (if you have some troubles with write permission):
* `pip install --user vecstack`
* If your PATH doesn't work:
* `/usr/bin/python -m pip install vecstack`
-* `C:/Python36/python -m pip install vecstack`
+* `C:/Python3/python -m pip install vecstack`
* Upgrade vecstack and all dependencies:
* `pip install --upgrade vecstack`
* Upgrade vecstack WITHOUT upgrading dependencies:
@@ -137,6 +137,7 @@ S_test = stack.transform(X_test)
28. [Can I use `(Randomized)GridSearchCV` to tune the whole stacking Pipeline?](https://github.com/vecxoz/vecstack#28-can-i-use-randomizedgridsearchcv-to-tune-the-whole-stacking-pipeline)
29. [How to define custom metric, especially AUC?](https://github.com/vecxoz/vecstack#29-how-to-define-custom-metric-especially-auc)
30. [Do folds (splits) have to be the same across estimators and stacking levels? How does `random_state` work?](https://github.com/vecxoz/vecstack#30-do-folds-splits-have-to-be-the-same-across-estimators-and-stacking-levels-how-does-random_state-work)
+31. [How does `vecstack.StackingTransformer` differ from `sklearn.ensemble.StackingClassifier`?](https://github.com/vecxoz/vecstack#31-how-does-vecstackstackingtransformer-differ-from-sklearnensemblestackingclassifier)

### 1. How can I report an issue? How can I ask a question about stacking or vecstack package?

@@ -167,7 +168,7 @@ Main idea is to use predictions as features.
More specifically we predict train set (in CV-like fashion) and test set using some 1st level model(s), and then use these predictions as features for 2nd level model. You can find more details (concept, pictures, code) in [stacking tutorial](https://github.com/vecxoz/vecstack/blob/master/examples/00_stacking_concept_pictures_code.ipynb).
Also make sure to check out:
* [Ensemble Learning](https://en.wikipedia.org/wiki/Ensemble_learning) ([Stacking](https://en.wikipedia.org/wiki/Ensemble_learning#Stacking)) in Wikipedia
-* Classical [Kaggle Ensembling Guide](https://mlwave.com/kaggle-ensembling-guide/)
+* Classical [Kaggle Ensembling Guide](https://mlwave.com/kaggle-ensembling-guide/) or try [another link](https://web.archive.org/web/20210727094233/https://mlwave.com/kaggle-ensembling-guide/)
* [Stacked Generalization](https://www.researchgate.net/publication/222467943_Stacked_Generalization) paper by David H. Wolpert

### 5. What about stacking name?
@@ -216,7 +217,7 @@ Speaking about inner stacking mechanics, you should remember that when you have
### 12. What is *blending*? How is it related to stacking?

Basically it is the same thing. Both approaches use predictions as features.
-Often this terms are used interchangeably.
+Often these terms are used interchangeably.
The difference is how we generate features (predictions) for the next level:
* *stacking*: perform cross-validation procedure and predict each part of train set (OOF)
* *blending*: predict fixed holdout set
@@ -387,10 +388,14 @@ def auc(y_true, y_pred):

To ensure better result, folds (splits) have to be the same across all estimators and all stacking levels. It means that `random_state` has to be the same in every call to `stacking` function or `StackingTransformer`. This is default behavior of `stacking` function and `StackingTransformer` (by default `random_state=0`). If you want to try different folds (splits) try to set different `random_state` values.

+### 31. How does `vecstack.StackingTransformer` differ from `sklearn.ensemble.StackingClassifier`?

+It significantly differs. Please see a [detailed explanation](https://github.com/vecxoz/vecstack/issues/37).


# Stacking concept

-1. We want to predict train set and test set with some 1st level model(s), and then use these predictions as features for 2nd level model(s).
+1. We want to predict train set and test set with some 1st level model(s), and then use these predictions as features for 2nd level model(s).
2. Any model can be used as 1st level model or 2nd level model.
3. To avoid overfitting (for train set) we use cross-validation technique and in each fold we predict out-of-fold (OOF) part of train set.
4. The common practice is to use from 3 to 10 folds.
@@ -404,6 +409,7 @@ To ensure better result, folds (splits) have to be the same across all estimator
8. We can repeat this cycle using other 1st level models to get more features for 2nd level model.
9. You can also look at animation of [Variant A](https://github.com/vecxoz/vecstack#variant-a-animation) and [Variant B](https://github.com/vecxoz/vecstack#variant-b-animation).


# Variant A

![Fold 1 of 3](https://github.com/vecxoz/vecstack/raw/master/pic/dia1.png "Fold 1 of 3")
@@ -429,3 +435,10 @@ To ensure better result, folds (splits) have to be the same across all estimator
# Variant B. Animation

![Variant B. Animation](https://github.com/vecxoz/vecstack/raw/master/pic/animation2.gif "Variant B. Animation")
+
+
+# References
+
+* [Ensemble Learning](https://en.wikipedia.org/wiki/Ensemble_learning) ([Stacking](https://en.wikipedia.org/wiki/Ensemble_learning#Stacking)) in Wikipedia
+* Classical [Kaggle Ensembling Guide](https://mlwave.com/kaggle-ensembling-guide/) or try [another link](https://web.archive.org/web/20210727094233/https://mlwave.com/kaggle-ensembling-guide/)
+* [Stacked Generalization](https://www.researchgate.net/publication/222467943_Stacked_Generalization) paper by David H. Wolpert
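
The "Stacking concept" steps in the README diff above (predict the train set out-of-fold via cross-validation, predict the test set, and use both as features for a 2nd level model) can be illustrated with plain NumPy. This is a minimal sketch under stated assumptions, not vecstack's implementation: `kfold_indices`, `LeastSquares`, `stack_one_model`, `stack_features` and `blend_one_model` are hypothetical names, and `LeastSquares` merely stands in for any estimator exposing `fit`/`predict`. The two `test_mode` options are our assumption of what the README's Variant A (average the test predictions made in each fold) and Variant B (one extra fit on the full train set) refer to. Reusing one `random_state` keeps the folds identical across models, as FAQ 30 recommends, and `blend_one_model` shows the fixed-holdout alternative that FAQ 12 calls blending.

```python
import numpy as np


def kfold_indices(n_samples, n_folds=4, random_state=0):
    """Hypothetical helper: shuffled row indices split into n_folds parts.

    The same random_state always yields the same folds, so every
    1st level model is evaluated on identical splits.
    """
    rng = np.random.default_rng(random_state)
    return np.array_split(rng.permutation(n_samples), n_folds)


class LeastSquares:
    """Hypothetical stand-in for any sklearn-like estimator (fit/predict)."""

    def fit(self, X, y):
        X1 = np.column_stack([np.ones(len(X)), X])  # prepend intercept column
        self.coef_, *_ = np.linalg.lstsq(X1, y, rcond=None)
        return self

    def predict(self, X):
        return np.column_stack([np.ones(len(X)), X]) @ self.coef_


def stack_one_model(make_model, X_train, y_train, X_test,
                    n_folds=4, random_state=0, test_mode="average"):
    """Return (OOF train feature, test feature) for one 1st level model."""
    if test_mode not in ("average", "refit"):
        raise ValueError("test_mode must be 'average' or 'refit'")
    oof = np.zeros(len(X_train))
    fold_test_preds = []
    all_idx = np.arange(len(X_train))
    for val_idx in kfold_indices(len(X_train), n_folds, random_state):
        fit_idx = np.setdiff1d(all_idx, val_idx)
        model = make_model().fit(X_train[fit_idx], y_train[fit_idx])
        oof[val_idx] = model.predict(X_train[val_idx])  # predict only the OOF part
        if test_mode == "average":
            fold_test_preds.append(model.predict(X_test))
    if test_mode == "average":
        # Test set predicted in every fold, then averaged across folds
        test_pred = np.mean(fold_test_preds, axis=0)
    else:
        # One additional fit on the full train set, then a single test prediction
        test_pred = make_model().fit(X_train, y_train).predict(X_test)
    return oof, test_pred


def stack_features(model_factories, X_train, y_train, X_test, **kwargs):
    """One column per 1st level model: S_train (OOF) and S_test."""
    pairs = [stack_one_model(f, X_train, y_train, X_test, **kwargs)
             for f in model_factories]
    S_train = np.column_stack([oof for oof, _ in pairs])
    S_test = np.column_stack([test for _, test in pairs])
    return S_train, S_test


def blend_one_model(make_model, X_train, y_train, X_test,
                    holdout_frac=0.25, random_state=0):
    """Blending: fit on part of the train set, predict a fixed holdout and the test set."""
    perm = np.random.default_rng(random_state).permutation(len(X_train))
    n_hold = int(len(X_train) * holdout_frac)
    hold_idx, fit_idx = perm[:n_hold], perm[n_hold:]
    model = make_model().fit(X_train[fit_idx], y_train[fit_idx])
    return hold_idx, model.predict(X_train[hold_idx]), model.predict(X_test)
```

A 2nd level model is then fit on `S_train` and applied to `S_test`, e.g. `S_train, S_test = stack_features([LeastSquares], X_train, y_train, X_test)` followed by `LeastSquares().fit(S_train, y_train).predict(S_test)`. In practice you would pass real estimators (for example scikit-learn ones) instead of the toy `LeastSquares`.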