Commit ae02213

Author: RJ Agrawal
Message: added numerical transformer
1 parent a96ae27, commit ae02213

4 files changed, with 58 additions and 1 deletion


README.rst

Lines changed: 24 additions & 1 deletion
@@ -11,7 +11,7 @@ In particular, it provides:

 1. A way to map ``DataFrame`` columns to transformations, which are later recombined into features.
 2. A compatibility shim for old ``scikit-learn`` versions to cross-validate a pipeline that takes a pandas ``DataFrame`` as input. This is only needed for ``scikit-learn<0.16.0`` (see `#11 <https://github.com/paulgb/sklearn-pandas/issues/11>`__ for details). It is deprecated and will likely be dropped in ``sklearn-pandas==2.0``.
-3. A couple of special transformers that work well with pandas inputs: ``CategoricalImputer`` and ``FunctionTransformer``.
+3. A numerical transformer, ``NumericalTransformer``, that provides commonly used numerical transformation options. This helps serialize the ``DataFrameMapper``.

 Installation
 ------------
@@ -370,6 +370,29 @@ A ``DataFrameMapper`` will return a dense feature array by default. Setting ``sp
 The stacking of the sparse features is done without ever densifying them.


+Using Numerical Transformer
+****************************
+
+While you can use ``FunctionTransformer`` to generate arbitrary transformers, they cannot be serialized (pickled).
+``NumericalTransformer`` takes the function name as a string parameter and hence can be easily serialized.
+
+>>> from sklearn_pandas import NumericalTransformer
+>>> mapper5 = DataFrameMapper([
+...     ('children', NumericalTransformer('log')),
+... ])
+>>> mapper5.fit_transform(data)
+array([[1.38629436],
+       [1.79175947],
+       [1.09861229],
+       [1.09861229],
+       [0.69314718],
+       [1.09861229],
+       [1.60943791],
+       [1.38629436]])
+
+
+
+
 Changelog
 ---------

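Review note: the README text added in this hunk argues that a transformer configured by a function name is easier to pickle than one holding an arbitrary callable. The following is a minimal, standard-library-only sketch of that idea, not code from this commit. The names `SUPPORTED`, `apply_by_name`, and `can_pickle` are hypothetical and exist only for illustration.

```python
import math
import pickle

# Hypothetical registry: maps a serializable name to the function it stands for.
SUPPORTED = {"log": math.log, "log1p": math.log1p}


def apply_by_name(func_name, values):
    """Look the function up by name and apply it to each value."""
    func = SUPPORTED[func_name]
    return [func(v) for v in values]


def can_pickle(obj):
    """Return True if pickle can serialize obj, False otherwise."""
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, AttributeError, TypeError):
        return False


# A plain string round-trips through pickle without trouble...
restored_name = pickle.loads(pickle.dumps("log"))
logs = apply_by_name(restored_name, [4, 6, 3])

# ...whereas an anonymous function cannot be pickled by the standard library.
lambda_ok = can_pickle(lambda x: math.log(x))
```

Storing only the name keeps the serialized state to a short string, at the cost of restricting users to whatever the registry lists.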

sklearn_pandas/__init__.py

Lines changed: 1 addition & 0 deletions
@@ -3,3 +3,4 @@
 from .dataframe_mapper import DataFrameMapper # NOQA
 from .cross_validation import cross_val_score, GridSearchCV, RandomizedSearchCV # NOQA
 from .features_generator import gen_features # NOQA
+from .transformers import NumericalTransformer # NOQA

sklearn_pandas/dataframe_mapper.py

Lines changed: 1 addition & 0 deletions
@@ -166,6 +166,7 @@ def __setstate__(self, state):
         self.built_features = state.get('built_features', self.features)
         self.built_default = state.get('built_default', self.default)
         self.transformed_names_ = state.get('transformed_names_', [])
+        self.show_progressbar = state.get('show_progressbar', False)

     def _get_col_subset(self, X, cols, input_df=False):
         """
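Review note: the added `state.get('show_progressbar', False)` line appears intended to let a mapper pickled before the attribute existed still be restored with a sensible default. A minimal sketch of that pattern follows. `Mapper` is a hypothetical stand-in and is not the real `DataFrameMapper`.

```python
class Mapper:
    """Hypothetical stand-in for a class whose pickled state gained a field."""

    def __init__(self, features, show_progressbar=False):
        self.features = features
        self.show_progressbar = show_progressbar

    def __getstate__(self):
        # Save a plain copy of the instance attributes.
        return dict(self.__dict__)

    def __setstate__(self, state):
        self.features = state["features"]
        # Older pickles have no 'show_progressbar' key, so fall back to a default.
        self.show_progressbar = state.get("show_progressbar", False)


# Simulate restoring state saved by an older version that lacked the field.
old_state = {"features": [("children", None)]}
restored = Mapper.__new__(Mapper)
restored.__setstate__(old_state)
```

Calling `__setstate__` directly here mimics what `pickle.loads` does after creating the empty instance.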

sklearn_pandas/transformers.py

Lines changed: 32 additions & 0 deletions
@@ -1,6 +1,7 @@
 import numpy as np
 import pandas as pd

+from sklearn.base import BaseEstimator, TransformerMixin

 def _get_mask(X, value):
     """
@@ -12,3 +13,34 @@ def _get_mask(X, value):
         return pd.isnull(X)
     else:
         return X == value
+
+
+class NumericalTransformer(TransformerMixin):
+    """
+    Provides commonly used numerical transformers.
+    """
+    SUPPORTED_FUNCTIONS = ['log', 'log1p']
+
+    def __init__(self, func):
+        """
+        Params
+
+        func    function to apply to input columns. The function will be applied to each value.
+                Supported functions are defined in the SUPPORTED_FUNCTIONS variable. Raises an
+                AssertionError if the function is not supported.
+        """
+        assert func in self.SUPPORTED_FUNCTIONS, \
+            f"Only following func arguments are supported: {self.SUPPORTED_FUNCTIONS}"
+        super(NumericalTransformer, self).__init__()
+        self.__func = func
+
+    def fit(self, X, y=None):
+        return self
+
+    def transform(self, X, y=None):
+        if self.__func == 'log1p':
+            return np.vectorize(np.log1p)(X)
+        elif self.__func == 'log':
+            return np.vectorize(np.log)(X)
+
+        raise ValueError(f"Invalid function name: {self.__func}")
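
Review note: two details of the class above may be worth a second look. `assert` statements are skipped when Python runs with `-O`, and a double-underscore attribute such as `self.__func` is name-mangled (here to `_NumericalTransformer__func`). The sketch below is one possible variant under those observations, not the committed code. It validates with `ValueError`, stores the argument under its own name, and dispatches through a dict. `NamedFunctionTransformer` and `Mangled` are hypothetical names, and `math` stands in for NumPy only to keep the example dependency-free.

```python
import math


class NamedFunctionTransformer:
    """Hypothetical variant: applies a supported function chosen by name."""

    SUPPORTED_FUNCTIONS = {"log": math.log, "log1p": math.log1p}

    def __init__(self, func):
        # ValueError still fires under `python -O`, unlike an assert.
        if func not in self.SUPPORTED_FUNCTIONS:
            raise ValueError(
                f"Unsupported func {func!r}; expected one of "
                f"{sorted(self.SUPPORTED_FUNCTIONS)}"
            )
        # Public attribute with the same name as the constructor argument.
        self.func = func

    def fit(self, X, y=None):
        return self

    def transform(self, X, y=None):
        f = self.SUPPORTED_FUNCTIONS[self.func]
        # Assumption: X is a list of rows, each row a list of numbers.
        return [[f(v) for v in row] for row in X]


class Mangled:
    """Shows how a double-underscore attribute is actually stored."""

    def __init__(self, func):
        self.__func = func


out = NamedFunctionTransformer("log1p").fit([[0.0]]).transform([[0.0], [1.0]])
mangled_attrs = vars(Mangled("log"))  # the key is '_Mangled__func', not '__func'
```

Keeping the attribute name equal to the constructor argument matches the convention scikit-learn's `BaseEstimator.get_params` relies on, which matters if the class is later made to inherit from `BaseEstimator`.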
