ENH: Add engine='polars' support in read_csv #61989
Closed
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
🚀 Enhancement: Add
engine='polars'
Support inread_csv
🔧 Summary of Changes
This PR introduces support for using [Polars](https://pola-rs.github.io/polars/py-polars/html/reference/api/pl.read_csv.html) as a backend CSV parsing engine in
pandas.read_csv
, providing faster parsing capabilities for large files.The following changes are included:
✅ Added support for
engine="polars"
inpandas.read_csv
✅ Dynamically imported Polars and handled
ImportError
gracefully✅ Filtered
read_csv()
kwargs to only allow those compatible with Polars✅ Converted
Path
input to string (Polars does not accept path-like objects in all versions)✅ Added test case
test_read_csv_with_polars
undertests/io/parser
✅ Updated version to
2.3.3.dev0
in__init__.py
andpyproject.toml
(as part of the development build)✅ Resolved all
ruff
linter errors and pre-commit hook failures (e.g., B904, E501, F841, SC1017)✅ Formatted shell scripts using
dos2unix
to fix line-ending issues across:ci/code_checks.sh
ci/run_tests.sh
scripts/cibw_before_build.sh
scripts/download_wheels.sh
scripts/upload_wheels.sh
gitpod/workspace_config
📆 Usage Example
✅ Expected Output:
💡 Why This Matters
Polars is a high-performance DataFrame library designed for speed and multi-threaded performance. Adding it as a supported backend:
c
,python
, orpolars
)✅ Tests & Quality Checks
test_read_csv_with_polars
ruff
,shellcheck
,cython-lint
,codespell
, etc.dos2unix
for consistent CI/CD compatibility🧠 Notes
polars
is treated as an optional dependency“Polars is not installed. Please install it with 'pip install polars'.”
🙌 Acknowledgements
Thanks to the maintainers for reviewing this contribution!
Looking forward to feedback or further improvements.