| 1 | +--- |
| 2 | +title: "Pandas DataFrame Output for sklearn Transformers" |
| 3 | +date: November 8, 2022 |
| 4 | +categories: |
| 5 | + - Technical |
| 6 | +tags: |
| 7 | + - performance |
| 8 | +featured-image: pandas_output_sklearn_transformers.PNG |
| 9 | + |
| 10 | +postauthors: |
| 11 | + - name: Sangam SwadiK |
| 12 | + website: https://www.linkedin.com/in/sangam-swadi-k/ |
| 13 | + image: sangam_swadik.jpg |
| 14 | +--- |
| 15 | + |
| 16 | +<div> |
| 17 | + <img src="/assets/images/posts_images/{{ page.featured-image }}" alt=""> |
| 18 | + {% include postauthor.html %} |
| 19 | +</div> |
| 20 | + |
| 21 | +## Video |
| 22 | +<iframe width="560" height="315" src="https://www.youtube.com/embed/5bCg8VfX2x8" title="Pandas DataFrame Output for sklearn Transformers" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe> |
| 23 | + |
| 24 | +## Upcoming feature in release 1.2 |
| 25 | +Starting with the next release of [scikit-learn](https://github.com/scikit-learn/scikit-learn) (v1.2), pandas DataFrame output will be available for all sklearn transformers! This will make running pipelines on DataFrames much easier and provide better ways to track feature names. Previously, mapping transformed output back to column names was cumbersome, because complex preprocessing (e.g., polynomial features) does not always map input columns to output columns one-to-one. |
| 26 | + |
| 27 | +The pandas DataFrame output feature for transformers solves this by automatically tracking the features generated by pipelines. The transformer output format can be set explicitly to either **numpy** or **pandas**, as shown in [sklearn.set_config](https://scikit-learn.org/dev/modules/generated/sklearn.set_config.html#sklearn.set_config) and the sample code below. |
| 28 | +```python |
| 29 | +from sklearn import set_config |
| 30 | +set_config(transform_output="pandas") |
| 31 | +``` |
| 32 | + |
| 33 | +See the sample notebook, [pandas-dataframe-output-for-sklearn-transformer.ipynb](https://github.com/scikit-learn/blog/blob/main/assets/notebooks/sklearn-pandas-df-output.ipynb), and the documentation for a more detailed example and usage notes. |
| 34 | + |
| 35 | +## Links to documentation and example notebook |
| 36 | +- [Pandas output for transformers documentation](https://scikit-learn.org/dev/auto_examples/miscellaneous/plot_set_output.html#sphx-glr-auto-examples-miscellaneous-plot-set-output-py) |
| 37 | +- [pandas-dataframe-output-for-sklearn-transformer.ipynb](https://github.com/scikit-learn/blog/blob/main/assets/notebooks/sklearn-pandas-df-output.ipynb) |
| 38 | + |
| 39 | + |
| 40 | +## Reporting bugs |
| 41 | +We'd love your feedback on this feature. If you have suggestions or find bugs, please report them at |
| 42 | +[scikit-learn issues](https://github.com/scikit-learn/scikit-learn/issues). |
| 43 | + |
| 44 | +Thanks 🙏🏾 to the maintainers: [**Thomas J. Fan**](https://github.com/thomasjpfan), [**Guillaume Lemaitre**](https://github.com/glemaitre), [**Christian Lorentzen**](https://github.com/lorentzenchr)! |