Commit 50d9523

feat: log c plus transformer (#281)
* feat(log): log constant plus transformer
* docs(LogCpTransformer): document LogCpTransformer
* test(test_logcp_transformer): test logcp transformer
* feat(log): add inverse transform method in LogCpTransformer
* test(test_logcp_transformer): inverse transform and C_ parameter
* docs(LogCpTransformer): C_ constant example
* docs(LogCpTransformer): identations
* LogCpTransformer example notebook
* fix(log): C_ as dict for multiple variables
* fix(log): incompatible types in assignment
* method description
* feat(log): log constant plus transformer
* docs(LogCpTransformer): document LogCpTransformer
* test(test_logcp_transformer): test logcp transformer
* feat(log): add inverse transform method in LogCpTransformer
* test(test_logcp_transformer): inverse transform and C_ parameter
* docs(LogCpTransformer): C_ constant example
* docs(LogCpTransformer): identations
* LogCpTransformer example notebook
* fix(log): C_ as dict for multiple variables
* fix(log): incompatible types in assignment
* method description
* rebase master, changes to doscstrings, adds tests for inverse_transform
* modifies base transformer to check variables from dictionary
* modify to pass typechecks
* fixes variable selection from user dictionary passed to c
* merge solegalli branch
* add transformer to readme and main index, minor changes in docstring

Co-authored-by: Soledad Galli <[email protected]>
1 parent 117ea30 commit 50d9523

14 files changed: 798 additions and 25 deletions


README.md

Lines changed: 1 addition & 0 deletions
@@ -89,6 +89,7 @@ More resources will be added as they appear online!

 ### Variable Transformation methods
 * LogTransformer
+* LogCpTransformer
 * ReciprocalTransformer
 * PowerTransformer
 * BoxCoxTransformer

docs/images/logcpraw.png

5.65 KB

docs/images/logcptransform.png

5.32 KB

docs/index.rst

Lines changed: 1 addition & 0 deletions
@@ -119,6 +119,7 @@ Numerical Variable Transformation: Transformers
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

 - :doc:`transformation/LogTransformer`: performs logarithmic transformation of numerical variables
+- :doc:`transformation/LogCpTransformer`: adds the variables a constant value and then applies the logarithm
 - :doc:`transformation/ReciprocalTransformer`: performs reciprocal transformation of numerical variables
 - :doc:`transformation/PowerTransformer`: performs power transformation of numerical variables
 - :doc:`transformation/BoxCoxTransformer`: performs Box-Cox transformation of numerical variables
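The new index entry describes the transformation as adding a constant to each variable and then taking the logarithm. A minimal sketch of that arithmetic in plain pandas and NumPy follows. It is not the library's implementation; the helper name `log_cp` and the sample data are hypothetical.

```python
import numpy as np
import pandas as pd


def log_cp(df: pd.DataFrame, c) -> pd.DataFrame:
    """Return a copy of df with log(x + C) applied (hypothetical helper).

    c is either one number used for every column, or a dict mapping
    column name -> constant, in which case only those columns change.
    """
    out = df.copy()
    constants = c if isinstance(c, dict) else {col: c for col in df.columns}
    for col, const in constants.items():
        shifted = df[col] + const
        # The logarithm is only defined for strictly positive values.
        if (shifted <= 0).any():
            raise ValueError(f"{col!r} + C must be strictly positive")
        out[col] = np.log(shifted)
    return out


df = pd.DataFrame({"a": [0.0, 1.0, 3.0], "b": [-2.0, 0.0, 2.0]})
transformed = log_cp(df, {"a": 1, "b": 3})
```

Passing a dict lets each variable carry its own constant, which matters when the variables have different minimum values.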
Lines changed: 59 additions & 0 deletions
@@ -0,0 +1,59 @@
+LogCpTransformer
+================
+
+API Reference
+-------------
+
+.. autoclass:: feature_engine.transformation.LogCpTransformer
+    :members:
+
+
+Example
+-------
+
+.. code:: python
+
+    import pandas as pd
+    import matplotlib.pyplot as plt
+    from sklearn.model_selection import train_test_split
+    from sklearn.datasets import load_boston
+
+    from feature_engine import transformation as vt
+
+    # Load dataset
+    X, y = load_boston(return_X_y=True)
+    X = pd.DataFrame(X)
+
+    # Separate into train and test sets
+    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)
+
+    # set up the variable transformer
+    tf = vt.LogCpTransformer(variables = [7, 12], C="auto")
+
+    # fit the transformer
+    tf.fit(X_train)
+
+    # transform the data
+    train_t= tf.transform(X_train)
+    test_t= tf.transform(X_test)
+
+    # learned constant C
+    tf.C_
+
+.. code:: python
+
+    {7: 2.1742, 12: 2.73}
+
+.. code:: python
+
+    # un-transformed variable
+    X_train[12].hist()
+
+.. image:: ../images/logcpraw.png
+
+.. code:: python
+
+    # transformed variable
+    train_t[12].hist()
+
+.. image:: ../images/logcptransform.png
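The learned values in the example look consistent with a constant of |min(x)| + 1 computed on the training data (2.73 would correspond to a training minimum of 1.73). That rule is an assumption inferred from the output, not something this diff shows. Under that assumption, the sketch below learns C on the training set and undoes the transform with exp(x) - C. The names `fit_auto_c`, `transform`, `inverse_transform` and the toy data are hypothetical.

```python
import numpy as np
import pandas as pd


def fit_auto_c(train: pd.DataFrame, variables) -> dict:
    """Assumed 'auto' rule: C = |min(x)| + 1, learned per variable on train."""
    return {var: abs(train[var].min()) + 1 for var in variables}


def transform(df: pd.DataFrame, c_: dict) -> pd.DataFrame:
    """Apply log(x + C) to each variable listed in c_."""
    out = df.copy()
    for var, c in c_.items():
        out[var] = np.log(df[var] + c)
    return out


def inverse_transform(df: pd.DataFrame, c_: dict) -> pd.DataFrame:
    """Undo the transform: x = exp(y) - C."""
    out = df.copy()
    for var, c in c_.items():
        out[var] = np.exp(df[var]) - c
    return out


# Toy training data; integer column labels mirror the example above.
train = pd.DataFrame({7: [1.2, 3.5, 9.0], 12: [1.73, 10.0, 30.0]})
c_ = fit_auto_c(train, [7, 12])
train_t = transform(train, c_)
restored = inverse_transform(train_t, c_)
```

Learning the constant on the training set only, and reusing it for the test set, keeps the two transforms consistent. Test values below the training minimum could still yield a non-positive argument to the logarithm.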

docs/transformation/LogTransformer.rst

Lines changed: 1 addition & 1 deletion
@@ -21,7 +21,7 @@ Example
     from feature_engine import transformation as vt

     # Load dataset
-    data = data = pd.read_csv('houseprice.csv')
+    data = pd.read_csv('houseprice.csv')

     # Separate into train and test sets
     X_train, X_test, y_train, y_test = train_test_split(

docs/transformation/index.rst

Lines changed: 1 addition & 0 deletions
@@ -10,6 +10,7 @@ mathematical transformations.
    :maxdepth: 2

    LogTransformer
+   LogCpTransformer
    ReciprocalTransformer
    PowerTransformer
    BoxCoxTransformer

examples/transformation/LogCpTransformer.ipynb

Lines changed: 196 additions & 0 deletions
Large diffs are not rendered by default.

feature_engine/base_transformers.py

Lines changed: 43 additions & 4 deletions
@@ -2,7 +2,7 @@
 classes. Provides the base functionality within the fit() and transform() methods
 shared by most transformers, like checking that input is a df, the size, NA, etc.
 """
-from typing import List, Optional, Union
+from typing import Dict, Optional

 import pandas as pd
 from sklearn.base import BaseEstimator, TransformerMixin
@@ -23,6 +23,47 @@ class BaseNumericalTransformer(BaseEstimator, TransformerMixin):
     variable transformers, discretisers, math combination.
     """

+    def _select_variables_from_dict(
+        self, X: pd.DataFrame, user_dict_: Dict
+    ) -> pd.DataFrame:
+        """
+        Checks that input is a dataframe, checks that variables in the dictionary
+        entered by the user are of type numerical.
+
+        Parameters
+        ----------
+        X : Pandas DataFrame
+
+        user_dict_ : Dictionary. Default = None
+            Any dictionary allowed by the transformer and entered by user.
+
+        Raises
+        ------
+        TypeError
+            If the input is not a Pandas DataFrame or a numpy array
+            If any of the variables in the dictionary are not numerical
+        ValueError
+            If there are no numerical variables in the df or the df is empty
+            If the variable(s) contain null values
+
+        Returns
+        -------
+        X : Pandas DataFrame
+            The same dataframe entered as parameter
+        """
+        # check input dataframe
+        X = _is_dataframe(X)
+
+        # find or check for numerical variables
+        variables = [x for x in user_dict_.keys()]
+        self.variables_ = _find_or_check_numerical_variables(X, variables)
+
+        # check if dataset contains na or inf
+        _check_contains_na(X, self.variables_)
+        _check_contains_inf(X, self.variables_)
+
+        return X
+
     def fit(self, X: pd.DataFrame, y: Optional[pd.Series] = None) -> pd.DataFrame:
         """
         Checks that input is a dataframe, finds numerical variables, or alternatively
@@ -54,9 +95,7 @@ def fit(self, X: pd.DataFrame, y: Optional[pd.Series] = None) -> pd.DataFrame:
         X = _is_dataframe(X)

         # find or check for numerical variables
-        self.variables_: List[Union[str, int]] = _find_or_check_numerical_variables(
-            X, self.variables
-        )
+        self.variables_ = _find_or_check_numerical_variables(X, self.variables)

         # check if dataset contains na or inf
         _check_contains_na(X, self.variables_)
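The new helper takes the variables to process from the keys of a user-supplied dictionary, then runs the same numerical, NA and inf checks that `fit()` performs. A standalone sketch of that flow using only pandas and NumPy is shown below. The function `select_variables_from_dict` is a hypothetical stand-in. It does not reproduce the internals of `_is_dataframe`, `_find_or_check_numerical_variables`, `_check_contains_na` or `_check_contains_inf`, which this diff does not show.

```python
import numpy as np
import pandas as pd


def select_variables_from_dict(X: pd.DataFrame, user_dict: dict) -> list:
    """Validate the columns named by the dict keys (hypothetical stand-in)."""
    if not isinstance(X, pd.DataFrame):
        raise TypeError("X must be a pandas DataFrame")

    # The dictionary keys name the variables the transformer will work on.
    variables = list(user_dict.keys())

    non_numeric = [v for v in variables if not pd.api.types.is_numeric_dtype(X[v])]
    if non_numeric:
        raise TypeError(f"Non-numerical variables: {non_numeric}")

    if X[variables].isnull().any().any():
        raise ValueError("Selected variables contain NA")

    if np.isinf(X[variables].to_numpy(dtype=float)).any():
        raise ValueError("Selected variables contain inf")

    return variables


df = pd.DataFrame({"age": [20, 30], "city": ["a", "b"], "fare": [7.5, 8.0]})
selected = select_variables_from_dict(df, {"age": [0, 25, 50], "fare": [0, 10]})
```

Only the keys of the dictionary matter for validation here; the values (bin edges, constants, and so on) are interpreted later by the specific transformer.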

feature_engine/discretisation/arbitrary.py

Lines changed: 1 addition & 15 deletions
@@ -6,13 +6,7 @@
 import pandas as pd

 from feature_engine.base_transformers import BaseNumericalTransformer
-from feature_engine.dataframe_checks import (
-    _check_contains_inf,
-    _check_contains_na,
-    _is_dataframe,
-)
 from feature_engine.validation import _return_tags
-from feature_engine.variable_manipulation import _find_or_check_numerical_variables


 class ArbitraryDiscretiser(BaseNumericalTransformer):
@@ -131,15 +125,7 @@ def fit(self, X: pd.DataFrame, y: Optional[pd.Series] = None):
         self
         """
         # check input dataframe
-        X = _is_dataframe(X)
-
-        # find or check for numerical variables
-        variables = [x for x in self.binning_dict.keys()]
-        self.variables_ = _find_or_check_numerical_variables(X, variables)
-
-        # check if dataset contains na or inf
-        _check_contains_na(X, self.variables_)
-        _check_contains_inf(X, self.variables_)
+        X = super()._select_variables_from_dict(X, self.binning_dict)

         # for consistency wit the rest of the discretisers, we add this attribute
         self.binner_dict_ = self.binning_dict
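This refactor moves the dictionary-based checks into the base class, so `ArbitraryDiscretiser.fit()` only delegates to it. A toy sketch of that pattern follows, using hypothetical `Base` and `Discretiser` classes with simplified checks and no scikit-learn mixins:

```python
import pandas as pd


class Base:
    def _select_variables_from_dict(
        self, X: pd.DataFrame, user_dict_: dict
    ) -> pd.DataFrame:
        # Simplified stand-in for the shared checks in the base class.
        if not isinstance(X, pd.DataFrame):
            raise TypeError("X must be a pandas DataFrame")
        self.variables_ = list(user_dict_.keys())
        return X


class Discretiser(Base):
    def __init__(self, binning_dict: dict):
        self.binning_dict = binning_dict

    def fit(self, X: pd.DataFrame, y=None):
        # One delegated call replaces the checks previously inlined in fit().
        X = super()._select_variables_from_dict(X, self.binning_dict)
        self.binner_dict_ = self.binning_dict
        return self


disc = Discretiser({"age": [0, 18, 65, 120]}).fit(pd.DataFrame({"age": [5, 40, 70]}))
```

Because the subclass does not override the helper, `self._select_variables_from_dict(...)` would resolve to the same method; the sketch keeps `super()` to mirror the diff.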
