
Commit 53205e2

Add links to issues and documentation, and fix punctuation
1 parent 5d41f47 commit 53205e2

1 file changed: +18 -6 lines changed

_posts/2022-11-08-pandas-dataframe-output-for-sklearn-transformer.md

Lines changed: 18 additions & 6 deletions
@@ -4,7 +4,7 @@ date: November 8, 2022
 categories:
 - Technical
 tags:
-- Sklearn-Transformers
+- performance
 featured-image: pandas_output_sklearn_transformers.PNG
 
 postauthors:
@@ -22,9 +22,21 @@ postauthors:
 <iframe width="560" height="315" src="https://www.youtube.com/embed/5bCg8VfX2x8" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe>
 
 ## Upcoming feature in release 1.2
-Starting next release(v1.2) Scikit-learn provides the ability for the outputs of Scikit-learn transformers to be either in Numpy or Pandas format by configuring it explicitly.Previously, mapping a transformed output back into columns would be cumbersome as it might not be a one-to-one mapping because of complex preprocessing (e.g: Polynomial features ).
-The next release(v1.2) Pandas output for transformers maps the transformed features into corresponding names/how they were created automatically.This would be useful for more complex preprocessing pipelines.
+Starting with the next release of [scikit-learn](https://github.com/scikit-learn/scikit-learn) (v1.2), pandas dataframe output will be available for all sklearn transformers! This will make running pipelines on dataframes much easier and provide better ways to track feature names. Previously, mapping a transformed output back into columns would be cumbersome as it might not be a one-to-one mapping in cases of complex preprocessing (e.g., polynomial features).
 
-## Links to Sample notebook and usage:
-- [Pandas output for transformers](https://scikit-learn.org/dev/auto_examples/miscellaneous/plot_set_output.html#sphx-glr-auto-examples-miscellaneous-plot-set-output-py)
-- [Sample notebook](https://github.com/scikit-learn/blog/blob/main/assets/notebooks/sklearn-pandas-df-output.ipynb)
+The pandas dataframe output feature for transformers solves this by tracking features generated from pipelines automatically. The transformer output format can be configured explicitly for either **numpy** or **pandas** output formats as shown in [sklearn.set_config](https://scikit-learn.org/dev/modules/generated/sklearn.set_config.html#sklearn.set_config) and the sample code below.
+```python
+from sklearn import set_config
+set_config(transform_output="pandas")
+```
+
+Please see the sample notebook and documentation for a more detailed example and usage.
+
+## Links to documentation and example notebook:
+- [Pandas output for transformers documentation](https://scikit-learn.org/dev/auto_examples/miscellaneous/plot_set_output.html#sphx-glr-auto-examples-miscellaneous-plot-set-output-py)
+- [Sample notebook](https://github.com/scikit-learn/blog/blob/main/assets/notebooks/sklearn-pandas-df-output.ipynb)
+
+
+## Reporting bugs:
+We'd love your feedback on this. In case of any suggestions or bugs, please report them at
+[scikit-learn issues](https://github.com/scikit-learn/scikit-learn/issues)
