Commit 1d1a31c

Merge pull request #145 from SangamSwadiK/pandas_output_for_transformers
Add post on Pandas dataframe output for sklearn-transformers
2 parents 6936c2c + ab637d4 commit 1d1a31c

3 files changed, 44 additions and 0 deletions

@@ -0,0 +1,44 @@
---
title: "Pandas DataFrame Output for sklearn Transformers"
date: November 8, 2022
categories:
  - Technical
tags:
  - performance
featured-image: pandas_output_sklearn_transformers.PNG

postauthors:
  - name: Sangam SwadiK
    website: https://www.linkedin.com/in/sangam-swadi-k/
    image: sangam_swadik.jpg
---

<div>
  <img src="/assets/images/posts_images/{{ page.featured-image }}" alt="">
  {% include postauthor.html %}
</div>

## Video
<iframe width="560" height="315" src="https://www.youtube.com/embed/5bCg8VfX2x8" title="Pandas DataFrame Output for sklearn Transformers" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe>

## Upcoming feature in release 1.2
Starting with the next release of [scikit-learn](https://github.com/scikit-learn/scikit-learn) (v1.2), pandas DataFrame output will be available for all sklearn transformers! This makes running pipelines on DataFrames much easier and provides a better way to track feature names. Previously, mapping transformed output back to named columns was cumbersome, because the mapping is not always one-to-one when preprocessing is complex (e.g., polynomial features).

The pandas DataFrame output feature for transformers solves this by automatically tracking the features generated by pipelines. The transformer output format can be configured explicitly as either **numpy** or **pandas**, as shown in [sklearn.set_config](https://scikit-learn.org/dev/modules/generated/sklearn.set_config.html#sklearn.set_config) and the sample code below.
```python
from sklearn import set_config

# Ask every transformer to return a pandas DataFrame from transform / fit_transform
set_config(transform_output="pandas")
```

See the sample notebook, [pandas-dataframe-output-for-sklearn-transformer.ipynb](https://github.com/scikit-learn/blog/blob/main/assets/notebooks/sklearn-pandas-df-output.ipynb), and the documentation for a more detailed example and usage notes.

## Links to documentation and example notebook
- [Pandas output for transformers documentation](https://scikit-learn.org/dev/auto_examples/miscellaneous/plot_set_output.html#sphx-glr-auto-examples-miscellaneous-plot-set-output-py)
- [pandas-dataframe-output-for-sklearn-transformer.ipynb](https://github.com/scikit-learn/blog/blob/main/assets/notebooks/sklearn-pandas-df-output.ipynb)

## Reporting bugs
We'd love your feedback on this. If you have suggestions or find bugs, please report them at
[scikit-learn issues](https://github.com/scikit-learn/scikit-learn/issues).

Thanks 🙏🏾 to the maintainers: [**Thomas J. Fan**](https://github.com/thomasjpfan), [**Guillaume Lemaitre**](https://github.com/glemaitre), and [**Christian Lorentzen**](https://github.com/lorentzenchr)!