-Feature-engine is a Python library with multiple transformers to engineer features for use in machine learning models. Feature-engine's transformers follow scikit-learn's functionality with fit() and transform() methods to first learn the transforming parameters from data and then transform the data.
+Feature-engine is a Python library with multiple transformers to engineer features for use in machine learning models.
+Feature-engine's transformers follow scikit-learn's functionality with fit() and transform() methods to first learn the
+transforming parameters from data and then transform the data.
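Review note: the fit()/transform() split described in these lines can be illustrated with a toy class. This is only a sketch in plain Python, not Feature-engine's code; the class name `MeanImputerSketch`, the attribute name `imputer_dict_`, and the list-of-dicts data layout are assumptions made for the illustration.

```python
from statistics import mean


class MeanImputerSketch:
    """Toy stand-in (not a Feature-engine class) for the fit/transform pattern."""

    def fit(self, rows, variables):
        # Learn one parameter per variable: the mean of its observed values.
        self.imputer_dict_ = {
            var: mean(row[var] for row in rows if row[var] is not None)
            for var in variables
        }
        return self  # returning self allows fit(...).transform(...) chaining

    def transform(self, rows):
        # Apply the learned parameters to any data set, including unseen data.
        out = []
        for row in rows:
            new_row = dict(row)  # copy so the caller's data is not modified
            for var, value in self.imputer_dict_.items():
                if new_row.get(var) is None:  # missing or None gets the learned mean
                    new_row[var] = value
            out.append(new_row)
        return out


train = [
    {"age": 20, "fare": 10.0},
    {"age": None, "fare": 30.0},
    {"age": 40, "fare": None},
]
test = [{"age": None, "fare": None}]

imputer = MeanImputerSketch().fit(train, variables=["age", "fare"])
print(imputer.imputer_dict_)
print(imputer.transform(test))
```

The parameters are learned once from the training rows and then reused on the test rows, which is the point of keeping the two steps separate.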


 ## Feature-engine features in the following resources:
@@ -38,33 +40,32 @@ More resources will be added as they appear online!


 ## Current Feature-engine's transformers include functionality for:
-
 * Missing Data Imputation
 * Categorical Variable Encoding
 * Outlier Capping or Removal
 * Discretisation
 * Numerical Variable Transformation
 * Scikit-learn Wrappers
-*Variables Combination
+* Variable Combination
 * Variable Selection

 ### Imputing Methods
-
 * MeanMedianImputer
 * RandomSampleImputer
 * EndTailImputer
 * AddNaNBinaryImputer
-* CategoricalVariableImputer
-* FrequentCategoryImputer
+* CategoricalImputer
 * ArbitraryNumberImputer

 ### Encoding Methods
-* CountFrequencyCategoricalEncoder
-* OrdinalCategoricalEncoder
-* MeanCategoricalEncoder
-* WoERatioCategoricalEncoder
-* OneHotCategoricalEncoder
-* RareLabelCategoricalEncoder
+* OneHotEncoder
+* OrdinalEncoder
+* CountFrequencyEncoder
+* MeanEncoder
+* WoEEncoder
+* PRatioEncoder
+* RareLabelEncoder
+* DecisionTreeEncoder

 ### Outlier Handling methods
 * Winsorizer
@@ -85,23 +86,22 @@ More resources will be added as they appear online!
docs/whats_new/v1.rst: 49 additions & 22 deletions
@@ -3,21 +3,41 @@ Version 1.0.0

 Deployed: TBD

-Contributors:
+Contributors
+------------
+- Ashok Kumar
 - Christopher Samiullah
 - Nicolas Galli
 - Nodar Okroshiashvili
+- Pradumna Suryawanshi
 - Sana Ben Driss
 - Tejash Shah
 - Tung Lee
 - Soledad Galli


 In this version, we made a major overhaul of the package, with code quality improvement
-throughout the code base, unification of attributes and methods when possible, addition
-of new transformers and extended documentation. Read below for more details.
+throughout the code base, unification of attributes and methods, addition of new
+transformers and extended documentation. Read below for more details.

-**Renaming of Modules within Feature-engine**:
+New transformers for Feature Selection
+--------------------------------------
+
+We included a whole new module with multiple transformers to select features.
+
+- **DropConstantFeatures**: removes constant and quasi-constant features from a dataframe (**by Tejash Shah**)
+- **DropDuplicateFeatures**: removes duplicated features from a dataset (**by Tejash Shah and Soledad Galli**)
+- **DropCorrelatedFeatures**: removes features that are correlated (**by Nicolas Galli**)
+- **SmartCorrelationSelection**: selects features from a group of correlated features based on certain criteria (**by Soledad Galli**)
+- **ShuffleFeaturesSelector**: selects features by the drop in machine learning model performance after a feature's values are randomly shuffled (**by Sana Ben Driss**)
+- **SelectBySingleFeaturePerformance**: selects features based on the performance of an ML model trained on individual features (**by Nicolas Galli**)
+- **SelectByTargetMeanPerformance**: selects features by encoding the categories or intervals with the target mean and using that as a proxy for performance (**by Tung Lee and Soledad Galli**)
+- **RecursiveFeatureElimination**: selects features recursively, evaluating the drop in ML performance, from the least to the most important feature (**by Sana Ben Driss**)
+- **RecursiveFeatureAddition**: selects features recursively, evaluating the increase in ML performance, from the most to the least important feature (**by Sana Ben Driss**)
+
+
+Renaming of Modules
+-------------------

 Feature-engine transformers have been sorted into submodules to smooth the development
 of the package and shorten import syntax for users.
@@ -30,50 +50,57 @@ of the package and shorten import syntax for users.
 - **Module selection**: new module hosts transformers to select or remove variables from a dataset.
 - **Module creation**: new module hosts transformers that combine variables into new features using mathematical or other operations.

-**Renaming of Classes**:
+Renaming of Classes
+-------------------

-In this release, we have shortened the name of categorical encoders, and also renamed
-other classes of Feature-engine to simplify import syntax.
+We shortened the names of the categorical encoders, and also renamed other classes to
+simplify import syntax.

 - **Encoders**: the word ``Categorical`` was removed from the class names. Now, instead of ``MeanCategoricalEncoder``, the class is called ``MeanEncoder``. Instead of ``RareLabelCategoricalEncoder`` it is ``RareLabelEncoder`` and so on. Please check the encoders documentation for more details.
 - **Imputers**: the ``CategoricalVariableImputer`` is now called ``CategoricalImputer``.
 - **Discretisers**: the ``UserInputDiscretiser`` is now called ``ArbitraryDiscretiser``.
 - **Creation**: the ``MathematicalCombinator`` is now called ``MathematicalCombination``.
 - **WoEEncoder and PRatioEncoder**: the ``WoEEncoder`` now applies only encoding with the weight of evidence. To apply encoding by probability ratios, use a different transformer: the ``PRatioEncoder`` (**by Nicolas Galli**).
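Review note: for users migrating existing code, the renames listed above could be applied with a small script along these lines. It is a hypothetical helper, not something shipped with Feature-engine: `RENAMED_CLASSES` and `update_class_names` are names invented here, the mapping covers only the renames spelled out in this changelog, and import paths are left untouched.

```python
import re

# Old -> new class names, taken from the renames listed above.
# The WoE / probability-ratio split is left out on purpose: choosing between
# WoEEncoder and PRatioEncoder needs a manual decision.
RENAMED_CLASSES = {
    "MeanCategoricalEncoder": "MeanEncoder",
    "RareLabelCategoricalEncoder": "RareLabelEncoder",
    "CategoricalVariableImputer": "CategoricalImputer",
    "UserInputDiscretiser": "ArbitraryDiscretiser",
    "MathematicalCombinator": "MathematicalCombination",
}

# Match whole identifiers only, so longer names that merely contain an old
# class name are not rewritten.
_PATTERN = re.compile(
    r"\b(" + "|".join(re.escape(name) for name in RENAMED_CLASSES) + r")\b"
)


def update_class_names(source: str) -> str:
    """Return `source` with every old class name replaced by its new name."""
    return _PATTERN.sub(lambda match: RENAMED_CLASSES[match.group(1)], source)


old_code = (
    "enc = MeanCategoricalEncoder(variables=['cabin'])\n"
    "imp = CategoricalVariableImputer()\n"
)
print(update_class_names(old_code))
```

A plain text substitution like this does not understand the code it edits, so the result should still be reviewed by hand.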

-**Renaming of class init Parameters**:
+Renaming of Parameters
+----------------------

 We renamed a few parameters to unify the nomenclature across the package.

 - **EndTailImputer**: the parameter ``distribution`` is now called ``imputation_method`` to unify convention among imputers. To impute using the IQR, we now need to pass ``imputation_method="iqr"`` instead of ``imputation_method="skewed"``.
 - **AddMissingIndicator**: the parameter ``missing_only`` now takes the boolean values ``True`` or ``False``.
 - **Winsorizer and OutlierTrimmer**: the parameter ``distribution`` is now called ``capping_method`` to unify names across Feature-engine transformers.
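Review note: the parameter renames above amount to a keyword translation, sketched below. `migrate_kwargs` is a hypothetical helper written for this note, and the `tail` and `fold` arguments in the usage lines are included only as examples of keywords that pass through unchanged. The changelog states the ``"skewed"`` to ``"iqr"`` value change only for ``EndTailImputer``, so the sketch applies it only there.

```python
def migrate_kwargs(class_name, kwargs):
    """Translate pre-1.0 keyword arguments to the names listed above.

    Returns a new dict; the input dict is not modified.
    """
    new_kwargs = dict(kwargs)
    if "distribution" not in new_kwargs:
        return new_kwargs

    if class_name == "EndTailImputer":
        value = new_kwargs.pop("distribution")
        # The IQR option is now spelled "iqr" rather than "skewed".
        new_kwargs["imputation_method"] = "iqr" if value == "skewed" else value
    elif class_name in {"Winsorizer", "OutlierTrimmer"}:
        # Only the parameter name changes here, as far as the changelog says.
        new_kwargs["capping_method"] = new_kwargs.pop("distribution")
    return new_kwargs


print(migrate_kwargs("EndTailImputer", {"distribution": "skewed", "tail": "right"}))
print(migrate_kwargs("Winsorizer", {"distribution": "gaussian", "fold": 3}))
```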

-**New transformers and classes**:

-We included a whole new module with multiple transformers to select features.
+Tutorials
+---------

-- **DropConstantFeatures**: finds and removes constant and quasi-constant features from a dataframe (**by Tejash Shah**)
-- **DropDuplicateFeatures**: finds and removes duplicated features from a dataset (**by Tejash Shah and Soledad Galli**)
-- **DropCorrelatedFeatures**: finds and removes features that are correlated (**by Nicolas Galli**)
-- **ShuffleFeaturesSelector**: selects features by determining the drop in machine learning model performance when each feature's values are randomly shuffled from a dataframe (**by Sana Ben Driss**)
-- **SelectBySingleFeaturePerformance**: trains a model based of each individual features, and derives performance (**by Nicolas Galli**)
-- **SelectByTargetMeanPerformance**: selects features encoding the categories with the target mean and using that as proxy for performance (**by Tung Lee and Soledad Galli**)
-- **RecursiveFeatureElimination**: selects features recursively, evaluating the drop in ML performance, from the least to the important feature (**by Sana Ben Driss**)
-- **RecursiveFeatureAddition**: selects features recursively, evaluating the increase in ML performance, after adding a new feature, starting from the most to the least important feature (**by Sana Ben Driss**)
+- **Imputation**: updated "how to" examples of missing data imputation (**by Pradumna Suryawanshi**)
+- **Encoders**: new and updated "how to" examples of categorical encoding (**by Ashok Kumar**)
+- **Discretisation**: new and updated "how to" examples of discretisation (**by Ashok Kumar**)
+
+
+For Contributors and Developers
+-------------------------------
+
+Code Architecture
+~~~~~~~~~~~~~~~~~

-**Code Architecture - Important for Contributors and Developers**:
 - **Submodules**: transformers have been grouped within relevant submodules and modules.
 - **Individual tests**: testing classes have been subdivided into individual tests
 - **Code Style**: we adopted the use of flake8 for linting and PEP8 style checks, and black for automatic re-styling of code.
-- **Type hint**: we rolled out the use of type hint throughout Feature-engine classes and functions (**by Nodar Okroshiashvili, Soledad Galli and Chris Samiullah**)
+- **Type hints**: we rolled out the use of type hints throughout classes and functions (**by Nodar Okroshiashvili, Soledad Galli and Chris Samiullah**)
+
+Documentation
+~~~~~~~~~~~~~

-**Documentation**
 - Switched fully to numpydoc and away from Napoleon
 - Included more detail about methods, parameters, returns and raises, as per numpydoc docstring style (**by Nodar Okroshiashvili, Soledad Galli**)
 - Linked documentation to GitHub repository
 - Improved layout

-**Other Changes**:
+Other Changes
+-------------
+
 - **Updated documentation**: documentation reflects the current use of Feature-engine transformers
 - **Typo fixes**: Thank you to all who contributed to typo fixes (Tim Vink, GitHub user @piecot)