Commit 68cc377

update files for release X1.0.0 (#202)
* updated changelog
* update readme
* remove print statement in smart corr
* change error for warning, fix selection based on r2
* fix style check
* modify threshold functionality, adjust tests, include example
* fix stylecheck
* fix indexing and add example
* fix absolute threshold for r2, mean target selector
* change threshold definition, adjust test, fix docstrings
* add template feature selection examples
* fix test error due to random float
* add links to tutorials
* update circleci with deployment branch
* add summary table to selection docs
* added contributor to release
1 parent 7278035 commit 68cc377

31 files changed (+2666 additions, −103 deletions)

.circleci/config.yml

Lines changed: 1 addition & 1 deletion
@@ -138,4 +138,4 @@ workflows:
       filters:
         branches:
           only:
-            - 1.1.X
+            - 1.0.X

.gitignore

Lines changed: 2 additions & 2 deletions
@@ -109,6 +109,6 @@ venv.bak/
 .idea
 .vscode
 *.csv
-
 *.DS_Store
-untitled9.py
+*.db
+*.pptx

README.md

Lines changed: 17 additions & 17 deletions
@@ -8,7 +8,9 @@
 ![Documentation Status](https://readthedocs.org/projects/feature-engine/badge/?version=latest)
 
-Feature-engine is a Python library with multiple transformers to engineer features for use in machine learning models. Feature-engine's transformers follow scikit-learn's functionality with fit() and transform() methods to first learn the transforming parameters from data and then transform the data.
+Feature-engine is a Python library with multiple transformers to engineer features for use in machine learning models.
+Feature-engine's transformers follow scikit-learn's functionality with fit() and transform() methods to first learn the
+transforming parameters from data and then transform the data.
 
 
 ## Feature-engine features in the following resources:

@@ -38,33 +40,32 @@ More resources will be added as they appear online!
 
 
 ## Current Feature-engine's transformers include functionality for:
-
 * Missing Data Imputation
 * Categorical Variable Encoding
 * Outlier Capping or Removal
 * Discretisation
 * Numerical Variable Transformation
 * Scikit-learn Wrappers
-* Variables Combination
+* Variable Combination
 * Variable Selection
 
 ### Imputing Methods
-
 * MeanMedianImputer
 * RandomSampleImputer
 * EndTailImputer
 * AddNaNBinaryImputer
-* CategoricalVariableImputer
-* FrequentCategoryImputer
+* CategoricalImputer
 * ArbitraryNumberImputer
 
 ### Encoding Methods
-* CountFrequencyCategoricalEncoder
-* OrdinalCategoricalEncoder
-* MeanCategoricalEncoder
-* WoERatioCategoricalEncoder
-* OneHotCategoricalEncoder
-* RareLabelCategoricalEncoder
+* OneHotEncoder
+* OrdinalEncoder
+* CountFrequencyEncoder
+* MeanEncoder
+* WoEEncoder
+* PRatioEncoder
+* RareLabelEncoder
+* DecisionTreeEncoder
 
 ### Outlier Handling methods
 * Winsorizer

@@ -85,23 +86,22 @@ More resources will be added as they appear online!
 * YeoJohnsonTransformer
 
 ### Scikit-learn Wrapper:
-
 * SklearnTransformerWrapper
 
 ### Variable Combinations:
-
-* MathematicalCombinator
+* MathematicalCombination
 
 ### Feature Selection:
-
 * DropFeatures
 * DropConstantFeatures
 * DropDuplicateFeatures
 * DropCorrelatedFeatures
+* SmartCorrelationSelection
 * ShuffleFeaturesSelector
 * SelectBySingleFeaturePerformance
 * SelectByTargetMeanPerformance
 * RecursiveFeatureElimination
+* RecursiveFeatureAddition
 
 
 ## Installing

@@ -127,8 +127,8 @@ git clone https://github.com/solegalli/feature_engine.git
 ### Usage
 
 ```python
->>> from feature_engine.encoding import RareLabelEncoder
 >>> import pandas as pd
+>>> from feature_engine.encoding import RareLabelEncoder
 
 >>> data = {'var_A': ['A'] * 10 + ['B'] * 10 + ['C'] * 2 + ['D'] * 1}
 >>> data = pd.DataFrame(data)
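The README snippet above groups infrequent categories with RareLabelEncoder, following the fit()/transform() split described earlier. The underlying idea can be sketched in plain Python (a minimal stdlib illustration, not feature-engine's actual implementation; the `tol` threshold and the `"Rare"` placeholder are assumptions for this sketch):

```python
from collections import Counter

def fit_rare_labels(values, tol=0.05):
    # "fit" step: learn which categories are frequent enough to keep.
    # Categories with relative frequency below `tol` will later be
    # grouped into a single "Rare" label.
    counts = Counter(values)
    n = len(values)
    return {cat for cat, c in counts.items() if c / n >= tol}

def transform_rare_labels(values, frequent):
    # "transform" step: replace infrequent categories with "Rare".
    return [v if v in frequent else "Rare" for v in values]

# Same toy data as the README snippet: A x10, B x10, C x2, D x1 (n = 23).
data = ["A"] * 10 + ["B"] * 10 + ["C"] * 2 + ["D"] * 1
frequent = fit_rare_labels(data, tol=0.1)        # keeps A and B (freq >= 10%)
encoded = transform_rare_labels(data, frequent)
print(sorted(set(encoded)))                      # ['A', 'B', 'Rare']
```

Learning the frequent categories once during fit and reusing them in transform is what lets the same grouping be applied consistently to training and test data.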

docs/blogs.rst

Lines changed: 1 addition & 1 deletion
@@ -16,7 +16,7 @@ Blogs
 Videos
 ------
 
-- Coming soon!
+- `Optimising Feature Engineering Pipelines with Feature-engine <https://www.youtube.com/watch?v=qT-3KUaFYmk/>`_, PyData Cambridge 2020, from minute 51:43.
 
 En Español
 ----------

docs/howto.rst

Lines changed: 3 additions & 1 deletion
@@ -3,4 +3,6 @@
 How To
 ======
 
-Coming Soon!
+Find `jupyter notebooks with examples <https://nbviewer.jupyter.org/github/solegalli/feature_engine/tree/master/examples/>`_
+of each transformer's functionality. Within each folder, you will find a jupyter
+notebook showcasing the functionality of each transformer.

docs/images/Thumbs.db

Binary file removed (−71.5 KB)

docs/images/selectionSummary.png

Binary file added (+91.2 KB)

docs/selection/index.rst

Lines changed: 5 additions & 0 deletions
@@ -6,6 +6,11 @@ Feature Selection
 Feature-engine's feature selection transformers are used to drop subsets of variables.
 Or in other words to select subsets of variables.
 
+.. figure:: ../images/selectionSummary.png
+   :align: center
+
+   Summary of the main characteristics of Feature-engine's selectors
+
 .. toctree::
    :maxdepth: 2
 
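The selection docs above describe transformers that drop subsets of variables; one family (DropCorrelatedFeatures, SmartCorrelationSelection) does so by examining pairwise correlations. A greedy keep-the-first-seen strategy can be sketched in plain Python (an illustrative sketch only, not the library's algorithm; the 0.8 threshold, the column names, and the keep-first policy are assumptions):

```python
def pearson(x, y):
    # Pearson correlation coefficient of two equal-length sequences.
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

def drop_correlated(table, threshold=0.8):
    # Greedily keep each column unless it is strongly correlated
    # (|r| >= threshold) with a column that was already kept.
    kept = {}
    for name, col in table.items():
        if all(abs(pearson(col, other)) < threshold for other in kept.values()):
            kept[name] = col
    return kept

table = {
    "x1": [1.0, 2.0, 3.0, 4.0, 5.0],
    "x2": [2.1, 4.2, 6.0, 8.1, 9.9],   # nearly 2 * x1, so highly correlated
    "x3": [5.0, 1.0, 4.0, 2.0, 3.0],   # weakly correlated with x1
}
print(list(drop_correlated(table)))    # ['x1', 'x3']
```

The keep-first policy is arbitrary; the point of a "smart" correlation selector is precisely to choose which member of each correlated group to retain by a better criterion, such as model performance.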

docs/whats_new/v1.rst

Lines changed: 49 additions & 22 deletions
@@ -3,21 +3,41 @@ Version 1.0.0
 
 Deployed: TBD
 
-Contributors:
+Contributors
+------------
+- Ashok Kumar
 - Christopher Samiullah
 - Nicolas Galli
 - Nodar Okroshiashvili
+- Pradumna Suryawanshi
 - Sana Ben Driss
 - Tejash Shah
 - Tung Lee
 - Soledad Galli
 
 
 In this version, we made a major overhaul of the package, with code quality improvement
-throughout the code base, unification of attributes and methods when possible, addition
-of new transformers and extended documentation. Read below for more details.
+throughout the code base, unification of attributes and methods, addition of new
+transformers and extended documentation. Read below for more details.
 
-**Renaming of Modules within Feature-engine**:
+New transformers for Feature Selection
+--------------------------------------
+
+We included a whole new module with multiple transformers to select features.
+
+- **DropConstantFeatures**: removes constant and quasi-constant features from a dataframe (**by Tejash Shah**)
+- **DropDuplicateFeatures**: removes duplicated features from a dataset (**by Tejash Shah and Soledad Galli**)
+- **DropCorrelatedFeatures**: removes features that are correlated (**by Nicolas Galli**)
+- **SmartCorrelationSelection**: selects one feature from each group of correlated features, based on given criteria (**by Soledad Galli**)
+- **ShuffleFeaturesSelector**: selects features by the drop in machine learning model performance after the feature's values are randomly shuffled (**by Sana Ben Driss**)
+- **SelectBySingleFeaturePerformance**: selects features based on the performance of an ML model trained on each individual feature (**by Nicolas Galli**)
+- **SelectByTargetMeanPerformance**: selects features by encoding the categories or intervals with the target mean and using that as a proxy for performance (**by Tung Lee and Soledad Galli**)
+- **RecursiveFeatureElimination**: selects features recursively, evaluating the drop in ML performance, from the least to the most important feature (**by Sana Ben Driss**)
+- **RecursiveFeatureAddition**: selects features recursively, evaluating the increase in ML performance, from the most to the least important feature (**by Sana Ben Driss**)
+
+
+Renaming of Modules
+-------------------
 
 Feature-engine transformers have been sorted into submodules to smooth the development
 of the package and shorten import syntax for users.

@@ -30,50 +50,57 @@ of the package and shorten import syntax for users.
 - **Module selection**: new module hosts transformers to select or remove variables from a dataset.
 - **Module creation**: new module hosts transformers that combine variables into new features using mathematical or other operations.
 
-**Renaming of Classes**:
+Renaming of Classes
+-------------------
 
-In this release, we have shortened the name of categorical encoders, and also renamed
-other classes of Feature-engine to simplify import syntax.
+We shortened the names of the categorical encoders, and also renamed other classes to
+simplify import syntax.
 
 - **Encoders**: the word ``Categorical`` was removed from the class names. Now, instead of ``MeanCategoricalEncoder``, the class is called ``MeanEncoder``. Instead of ``RareLabelCategoricalEncoder`` it is ``RareLabelEncoder``, and so on. Please check the encoders documentation for more details.
 - **Imputers**: the ``CategoricalVariableImputer`` is now called ``CategoricalImputer``.
 - **Discretisers**: the ``UserInputDiscretiser`` is now called ``ArbitraryDiscretiser``.
 - **Creation**: the ``MathematicalCombinator`` is now called ``MathematicalCombination``.
 - **WoEEncoder and PRatioEncoder**: the ``WoEEncoder`` now applies only encoding with the weight of evidence. To apply encoding by probability ratios, use a different transformer: the ``PRatioEncoder`` (**by Nicolas Galli**).
 
-**Renaming of class init Parameters**:
+Renaming of Parameters
+----------------------
 
 We renamed a few parameters to unify the nomenclature across the package.
 
 - **EndTailImputer**: the parameter ``distribution`` is now called ``imputation_method`` to unify conventions among the imputers. To impute using the IQR, we now need to pass ``imputation_method="iqr"`` instead of ``imputation_method="skewed"``.
 - **AddMissingIndicator**: the parameter ``missing_only`` now takes the boolean values ``True`` or ``False``.
 - **Winsorizer and OutlierTrimmer**: the parameter ``distribution`` is now called ``capping_method`` to unify names across Feature-engine transformers.
 
-**New transformers and classes**:
-
-We included a whole new module with multiple transformers to select features.
+Tutorials
+---------
 
-- **DropConstantFeatures**: finds and removes constant and quasi-constant features from a dataframe (**by Tejash Shah**)
-- **DropDuplicateFeatures**: finds and removes duplicated features from a dataset (**by Tejash Shah and Soledad Galli**)
-- **DropCorrelatedFeatures**: finds and removes features that are correlated (**by Nicolas Galli**)
-- **ShuffleFeaturesSelector**: selects features by determining the drop in machine learning model performance when each feature's values are randomly shuffled (**by Sana Ben Driss**)
-- **SelectBySingleFeaturePerformance**: trains a model on each individual feature, and derives performance (**by Nicolas Galli**)
-- **SelectByTargetMeanPerformance**: selects features encoding the categories with the target mean and using that as a proxy for performance (**by Tung Lee and Soledad Galli**)
-- **RecursiveFeatureElimination**: selects features recursively, evaluating the drop in ML performance, from the least to the most important feature (**by Sana Ben Driss**)
-- **RecursiveFeatureAddition**: selects features recursively, evaluating the increase in ML performance after adding a new feature, from the most to the least important feature (**by Sana Ben Driss**)
+- **Imputation**: updated "how to" examples of missing data imputation (**by Pradumna Suryawanshi**)
+- **Encoders**: new and updated "how to" examples of categorical encoding (**by Ashok Kumar**)
+- **Discretisation**: new and updated "how to" examples of discretisation (**by Ashok Kumar**)
+
+
+For Contributors and Developers
+-------------------------------
+
+Code Architecture
+~~~~~~~~~~~~~~~~~
 
-**Code Architecture - Important for Contributors and Developers**:
 - **Submodules**: transformers have been grouped within relevant submodules and modules.
 - **Individual tests**: testing classes have been subdivided into individual tests.
 - **Code Style**: we adopted flake8 for linting and PEP8 style checks, and black for automatic re-styling of code.
-- **Type hints**: we rolled out the use of type hints throughout Feature-engine classes and functions (**by Nodar Okroshiashvili, Soledad Galli and Chris Samiullah**)
+- **Type hints**: we rolled out the use of type hints throughout classes and functions (**by Nodar Okroshiashvili, Soledad Galli and Chris Samiullah**)
+
+Documentation
+~~~~~~~~~~~~~
 
-**Documentation**:
 - Switched fully to numpydoc and away from Napoleon
 - Included more detail about methods, parameters, returns and raises, as per numpydoc docstring style (**by Nodar Okroshiashvili, Soledad Galli**)
 - Linked documentation to the GitHub repository
 - Improved layout
 
-**Other Changes**:
+Other Changes
+-------------
+
 - **Updated documentation**: documentation reflects the current use of Feature-engine transformers
 - **Typo fixes**: Thank you to all who contributed to typo fixes (Tim Vink, GitHub user @piecot)
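The changelog above describes DropConstantFeatures and DropDuplicateFeatures as removing constant and duplicated columns. The core idea can be sketched in a few lines of plain Python over a dict-of-lists "dataframe" (an illustrative sketch under that simplified data model, not feature-engine's implementation; column names are made up, and quasi-constant handling via a tolerance is omitted):

```python
def drop_constant_features(table):
    # Remove columns whose values are all identical (constant).
    return {name: col for name, col in table.items() if len(set(col)) > 1}

def drop_duplicate_features(table):
    # Keep only the first of any group of identical columns.
    seen, kept = set(), {}
    for name, col in table.items():
        key = tuple(col)
        if key not in seen:
            seen.add(key)
            kept[name] = col
    return kept

# Hypothetical toy data: var_b is constant, var_c duplicates var_a.
table = {
    "var_a": [1, 2, 3, 4],
    "var_b": [0, 0, 0, 0],
    "var_c": [1, 2, 3, 4],
}
selected = drop_duplicate_features(drop_constant_features(table))
print(list(selected))  # ['var_a']
```

Constant and duplicated columns carry no extra information for a model, which is why these are typically the first, cheapest selection steps in a pipeline.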

examples/feature-engine-with-sklearn-pipeline.ipynb renamed to examples/Pipelines/feature-engine-with-sklearn-pipeline.ipynb

File renamed without changes.
