ENH: Add engine='polars' support in read_csv #61989

Closed

abujabarmubarak

🚀 Enhancement: Add engine='polars' Support in read_csv

🔧 Summary of Changes

This PR adds support for using [Polars](https://pola-rs.github.io/polars/py-polars/html/reference/api/pl.read_csv.html) as a CSV parsing engine in pandas.read_csv, with the aim of faster parsing for large files.

The following changes are included:

  • Added support for engine="polars" in pandas.read_csv

  • Dynamically imported Polars and handled ImportError gracefully

  • Filtered read_csv() kwargs to only allow those compatible with Polars

  • Converted Path input to string (Polars does not accept path-like objects in all versions)

  • Added test case test_read_csv_with_polars under tests/io/parser

  • Updated version to 2.3.3.dev0 in __init__.py and pyproject.toml (as part of the development build)

  • Resolved all ruff linter errors (e.g., B904, E501, F841) and other pre-commit hook failures (e.g., shellcheck SC1017)

  • Converted the following files to LF line endings using dos2unix:

    • ci/code_checks.sh
    • ci/run_tests.sh
    • scripts/cibw_before_build.sh
    • scripts/download_wheels.sh
    • scripts/upload_wheels.sh
    • gitpod/workspace_config
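
The kwargs filtering and Path handling listed above might look roughly like the following. This is a minimal sketch, not the PR's actual code: the helper names `filter_polars_kwargs` and `normalize_source` and the `PANDAS_TO_POLARS` mapping are hypothetical, and the mapping covers only a few pandas options paired with the `pl.read_csv` parameters assumed here to be their equivalents (`separator`, `n_rows`, `skip_rows`).

```python
import os
from pathlib import Path
from typing import Any

# Hypothetical mapping from pandas.read_csv keyword names to the
# pl.read_csv names assumed to be equivalent; the PR's real list may differ.
PANDAS_TO_POLARS = {
    "sep": "separator",
    "nrows": "n_rows",
    "skiprows": "skip_rows",  # only the plain-integer form is assumed to map
}


def filter_polars_kwargs(kwargs: dict[str, Any]) -> dict[str, Any]:
    """Keep only the mapped options, renamed for Polars; drop unset (None) values."""
    return {
        PANDAS_TO_POLARS[name]: value
        for name, value in kwargs.items()
        if name in PANDAS_TO_POLARS and value is not None
    }


def normalize_source(source: Any) -> Any:
    """Convert path-like objects such as pathlib.Path via os.fspath(); pass others through."""
    if isinstance(source, os.PathLike):
        return os.fspath(source)
    return source


if __name__ == "__main__":
    # index_col has no entry in the mapping, so it is dropped.
    print(filter_polars_kwargs({"sep": ";", "nrows": 10, "index_col": 0}))
    print(normalize_source(Path("sample.csv")))
```

Silently dropping unsupported options keeps the sketch short, but it can hide mistakes; raising an error for options the engine cannot honor may be the safer design.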

📆 Usage Example

import pandas as pd

df = pd.read_csv("sample.csv", engine="polars")
print(df)

✅ Expected Output:

   a  b
0  1  2
1  3  4

💡 Why This Matters

Polars is a high-performance DataFrame library with a multi-threaded engine. Adding it as a supported backend:

  • Can speed up CSV reading, particularly for large files
  • Lets users choose among parsing engines (such as c, python, or polars)
  • Keeps pandas open to optional, modular parsing backends

✅ Tests & Quality Checks

  • 🔪 Unit test added: test_read_csv_with_polars
  • ✅ Passed: All pytest tests
  • ✅ Passed: All pre-commit hooks
  • ✅ Passed: ruff, shellcheck, cython-lint, codespell, etc.
  • ↺ Converted scripts to LF line endings using dos2unix for consistent CI/CD compatibility

🧠 Notes

  • polars is treated as an optional dependency
  • If Polars is not installed, pandas will raise a clear error:
    “Polars is not installed. Please install it with 'pip install polars'.”
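
One way to produce that error is to wrap the import and re-raise with the message quoted above. This is a minimal sketch under stated assumptions, not the PR's code: the helper `import_polars` and the constant `POLARS_MISSING_MSG` are hypothetical, and the `module_name` parameter exists only so the failure path can be exercised without uninstalling Polars.

```python
import importlib
from types import ModuleType

POLARS_MISSING_MSG = (
    "Polars is not installed. Please install it with 'pip install polars'."
)


def import_polars(module_name: str = "polars") -> ModuleType:
    """Import Polars lazily, so it is only needed when engine='polars' is requested."""
    try:
        return importlib.import_module(module_name)
    except ImportError as err:
        # Chain the original exception so the traceback keeps the root cause;
        # "raise ... from err" is the pattern that ruff's B904 rule asks for.
        raise ImportError(POLARS_MISSING_MSG) from err
```

Deferring the import to call time means users who never pass engine="polars" are unaffected by whether Polars is installed.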

🙌 Acknowledgements

Thanks to the maintainers for reviewing this contribution!
Looking forward to feedback or further improvements.
