Commit a49c7fc

smart correlation selector closes #115 (#199)
* first draft smart corr
* add smart corr to init
* final edits
* add tests
* fix mypy error
* finishes docs and docstrings
* change wording fit, transform methods
* fix typo in docs
* rebase master and unifies params with other selectors
* fix test
1 parent dbf990e commit a49c7fc

5 files changed

+679
-20
lines changed

docs/index.rst

Lines changed: 20 additions & 18 deletions
@@ -77,7 +77,7 @@ Feature-engine features in the following resources
 ---------------------------------------------------
 
 - `Website <https://www.trainindata.com/feature-engine>`_.
-- `Feature Engineering for Machine Learning <https://www.udemy.com/feature-engineering-for-machine-learning/?couponCode=FEATENGREPO>`_, Online Course .
+- `Feature Engineering for Machine Learning <https://www.udemy.com/feature-engineering-for-machine-learning/?couponCode=FEATENGREPO>`_, Online Course.
 - `Python Feature Engineering Cookbook <https://www.packtpub.com/data/python-feature-engineering-cookbook>`_.
 - `Feature-engine: A new open-source Python package for feature engineering <https://www.trainindatablog.com/feature-engine-a-new-open-source-python-package-for-feature-engineering/>`_.
 - `Practical Code Implementations of Feature Engineering for Machine Learning with Python <https://www.trainindatablog.com/practical-code-implementations-of-feature-engineering-for-machine-learning-with-python/>`_.
@@ -90,23 +90,6 @@ En Español:
 More resources in the **Learning Resources** sections on the navigation panel on the
 left.
 
-Contributing
-------------
-
-Interested in contributing to Feature-engine? That is great news!
-
-Feature-engine is a welcoming and inclusive project and it would be great to have you
-on board. We follow the
-`Python Software Foundation Code of Conduct <http://www.python.org/psf/codeofconduct/>`_.
-
-Regardless of your skill level you can help us. We appreciate bug reports, user testing,
-feature requests, bug fixes, addition of tests, product enhancements, and documentation
-improvements. We also appreciate blogs about Feature-engine. If you happen to have one,
-let us know!
-
-For more details on how to contribute check the contributing page. Click on the
-"Contributing" page on the left of this page.
-
 
 Feature-engine's Transformers
 -----------------------------
@@ -173,6 +156,7 @@ Feature Selection:
 - :doc:`selection/DropConstantFeatures`: drops constant and quasi-constant variables from a dataframe
 - :doc:`selection/DropDuplicateFeatures`: drops duplicated variables from a dataframe
 - :doc:`selection/DropCorrelatedFeatures`: drops correlated variables from a dataframe
+- :doc:`selection/SmartCorrelatedSelection`: selects best feature from correlated group
 - :doc:`selection/SelectByShuffling`: selects features by evaluating model performance after feature shuffling
 - :doc:`selection/SelectBySingleFeaturePerformance`: selects features based on their performance on univariate estimators
 - :doc:`selection/SelectByTargetMeanPerformance`: selects features based on target mean encoding performance
@@ -199,6 +183,23 @@ Check if there's already an open `issue <https://github.com/solegalli/feature_en
 on the topic. If not, open a new `issue <https://github.com/solegalli/feature_engine/issues/>`_
 with your bug report, suggestion or new feature request.
 
+Contributing
+------------
+
+Interested in contributing to Feature-engine? That is great news!
+
+Feature-engine is a welcoming and inclusive project and it would be great to have you
+on board. We follow the
+`Python Software Foundation Code of Conduct <http://www.python.org/psf/codeofconduct/>`_.
+
+Regardless of your skill level you can help us. We appreciate bug reports, user testing,
+feature requests, bug fixes, addition of tests, product enhancements, and documentation
+improvements. We also appreciate blogs about Feature-engine. If you happen to have one,
+let us know!
+
+For more details on how to contribute check the contributing page. Click on the
+"Contributing" link on the left of this page.
+
 
 Open Source
 -----------
@@ -211,6 +212,7 @@ The `issues <https://github.com/solegalli/feature_engine/issues/>`_ and
 `pull requests <https://github.com/solegalli/feature_engine/pulls>`_ are tracked there.
 
 
+
 .. toctree::
    :maxdepth: 1
    :caption: Table of Contents

docs/selection/SmartCorrelatedSelection.rst

Lines changed: 73 additions & 2 deletions
@@ -4,9 +4,80 @@ SmartCorrelatedSelection
 API Reference
 -------------
 
-Coming soon
+.. autoclass:: feature_engine.selection.SmartCorrelatedSelection
+    :members:
 
 Example
 -------
 
-Coming soon
+.. code:: python
+
+    import pandas as pd
+    from sklearn.datasets import make_classification
+    from feature_engine.selection import SmartCorrelatedSelection
+
+    # make dataframe with some correlated variables
+    def make_data():
+        X, y = make_classification(n_samples=1000,
+                                   n_features=12,
+                                   n_redundant=4,
+                                   n_clusters_per_class=1,
+                                   weights=[0.50],
+                                   class_sep=2,
+                                   random_state=1)
+
+        # transform arrays into pandas df and series
+        colnames = ['var_'+str(i) for i in range(12)]
+        X = pd.DataFrame(X, columns=colnames)
+        return X
+
+
+    X = make_data()
+
+
+    # set up the selector
+    tr = SmartCorrelatedSelection(
+        variables=None,
+        method="pearson",
+        threshold=0.8,
+        missing_values="raise",
+        selection_method="variance",
+        estimator=None,
+    )
+
+    Xt = tr.fit_transform(X)
+
+    tr.correlated_feature_sets_
+
+
+.. code:: python
+
+    [{'var_0', 'var_8'}, {'var_4', 'var_6', 'var_7', 'var_9'}]
+
+.. code:: python
+
+    tr.selected_features_
+
+.. code:: python
+
+    ['var_1', 'var_2', 'var_3', 'var_5', 'var_10', 'var_11', 'var_8', 'var_7']
+
+.. code:: python
+
+    print(Xt.head())
+
+.. code:: python
+
+          var_1     var_2     var_3     var_5    var_10    var_11     var_8  \
+    0 -2.376400 -0.247208  1.210290  0.091527  2.070526 -1.989335  2.070483
+    1  1.969326 -0.126894  0.034598 -0.186802  1.184820 -1.309524  2.421477
+    2  1.499174  0.334123 -2.233844 -0.313881 -0.066448 -0.852703  2.263546
+    3  0.075341  1.627132  0.943132 -0.468041  0.713558  0.484649  2.792500
+    4  0.372213  0.338141  0.951526  0.729005  0.398790 -0.186530  2.186741
+
+          var_7
+    0 -2.230170
+    1 -1.447490
+    2 -2.240741
+    3 -3.534861
+    4 -2.053965
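
Reviewer note: the documentation example above groups correlated variables and keeps one per group. The idea behind `method="pearson"`, `threshold=0.8` and `selection_method="variance"` can be illustrated with a small, dependency-free sketch. This is a simplified illustration and not the code added in this commit: the helper names (`pearson`, `correlated_groups`, `select_by_variance`), the greedy grouping strategy and the toy data are all assumptions made for the example.

```python
from statistics import mean, pvariance


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    ma, mb = mean(a), mean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    norm_a = sum((x - ma) ** 2 for x in a) ** 0.5
    norm_b = sum((y - mb) ** 2 for y in b) ** 0.5
    return cov / (norm_a * norm_b)


def correlated_groups(data, threshold=0.8):
    """Greedily group columns whose |r| with a seed column exceeds threshold.

    Hypothetical helper: the real selector may group features differently.
    """
    groups, seen = [], set()
    names = list(data)
    for i, seed in enumerate(names):
        if seed in seen:
            continue
        group = {seed}
        for other in names[i + 1:]:
            if other in seen:
                continue
            if abs(pearson(data[seed], data[other])) > threshold:
                group.add(other)
        if len(group) > 1:
            groups.append(group)
            seen |= group
    return groups


def select_by_variance(data, threshold=0.8):
    """Keep ungrouped columns plus the highest-variance column of each group."""
    groups = correlated_groups(data, threshold)
    grouped = set().union(*groups) if groups else set()
    keep = [name for name in data if name not in grouped]
    for group in groups:
        keep.append(max(sorted(group), key=lambda name: pvariance(data[name])))
    return groups, keep


# Toy data (assumed for illustration): "b" is exactly 2 * "a",
# while "c" is uncorrelated with "a".
toy = {
    "a": [1.0, 2.0, 3.0, 4.0, 5.0],
    "b": [2.0, 4.0, 6.0, 8.0, 10.0],
    "c": [1.0, -1.0, 1.0, -1.0, 1.0],
}
groups, keep = select_by_variance(toy)
print(groups)
print(keep)
```

As in the documented output, the sketch lists the features that belong to no correlated group first and then appends one representative per group.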

feature_engine/selection/__init__.py

Lines changed: 2 additions & 0 deletions
@@ -5,6 +5,7 @@
 from .drop_constant_features import DropConstantFeatures
 from .drop_duplicate_features import DropDuplicateFeatures
 from .drop_correlated_features import DropCorrelatedFeatures
+from .smart_correlation_selection import SmartCorrelatedSelection
 from .shuffle_features import SelectByShuffling
 from .single_feature_performance import SelectBySingleFeaturePerformance
 from .recursive_feature_addition import RecursiveFeatureAddition
@@ -16,6 +17,7 @@
     "DropConstantFeatures",
     "DropDuplicateFeatures",
     "DropCorrelatedFeatures",
+    "SmartCorrelatedSelection",
     "SelectByShuffling",
     "SelectBySingleFeaturePerformance",
     "RecursiveFeatureAddition",
