abujabarmubarak

🚀 Enhancement: Add engine='polars' Support in read_csv

🔧 Summary of Changes

This PR adds support for using [Polars](https://pola-rs.github.io/polars/py-polars/html/reference/api/pl.read_csv.html) as a backend CSV parsing engine in pandas.read_csv, with the aim of faster parsing for large files.

The following changes are included:

  • Added support for engine="polars" in pandas.read_csv

  • Dynamically imported Polars and handled ImportError gracefully

  • Filtered read_csv() kwargs to only allow those compatible with Polars

  • Converted Path input to string (Polars does not accept path-like objects in all versions)

  • Added test case test_read_csv_with_polars under tests/io/parser

  • Updated version to 2.3.3.dev0 in __init__.py and pyproject.toml (as part of the development build)

  • Resolved all ruff linter errors and pre-commit hook failures (e.g., B904, E501, F841, SC1017)

  • Formatted shell scripts using dos2unix to fix line-ending issues across:

    • ci/code_checks.sh
    • ci/run_tests.sh
    • scripts/cibw_before_build.sh
    • scripts/download_wheels.sh
    • scripts/upload_wheels.sh
    • gitpod/workspace_config
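
The import, kwarg-filtering, and path-conversion steps listed above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code in this PR: the helper names (`filter_polars_kwargs`, `normalize_source`, `read_csv_with_polars`) and the keyword mapping are hypothetical, and the real allow-list may differ.

```python
from __future__ import annotations

import importlib
import os
from typing import Any

# Hypothetical mapping from pandas read_csv keyword names to the
# corresponding polars.read_csv keyword names. The PR's actual allow-list
# is not shown in this description. Simplification: values are passed
# through unchanged, so only forms both libraries accept will work
# (for example, an integer for skiprows).
PANDAS_TO_POLARS_KWARGS = {
    "sep": "separator",
    "nrows": "n_rows",
    "skiprows": "skip_rows",
    "usecols": "columns",
}


def filter_polars_kwargs(kwargs: dict[str, Any]) -> dict[str, Any]:
    """Keep only supported, explicitly set options, renamed for Polars."""
    return {
        PANDAS_TO_POLARS_KWARGS[name]: value
        for name, value in kwargs.items()
        if name in PANDAS_TO_POLARS_KWARGS and value is not None
    }


def normalize_source(source: Any) -> Any:
    """Convert path-like objects to str; leave strings and buffers as-is."""
    if isinstance(source, os.PathLike):
        return os.fspath(source)
    return source


def read_csv_with_polars(source: Any, **kwargs: Any):
    """Parse a CSV with Polars and return a pandas DataFrame."""
    try:
        # Import lazily so Polars stays an optional dependency.
        pl = importlib.import_module("polars")
    except ImportError as err:
        # Chain the original exception, which is what ruff's B904 asks for.
        raise ImportError(
            "Polars is not installed. Please install it with 'pip install polars'."
        ) from err
    frame = pl.read_csv(normalize_source(source), **filter_polars_kwargs(kwargs))
    # to_pandas() converts the Polars frame into a pandas DataFrame.
    return frame.to_pandas()
```

Dropping unsupported options silently keeps the sketch short; raising on unknown options would be a stricter alternative.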

📆 Usage Example

import pandas as pd

df = pd.read_csv("sample.csv", engine="polars")
print(df)

✅ Expected Output:
   a  b
0  1  2
1  3  4

💡 Why This Matters

Polars is a high-performance DataFrame library built for multi-threaded execution. Adding it as a supported backend:

  • Can speed up CSV reading, particularly for large files
  • Lets end-users choose among parsing engines (such as c, python, or polars)
  • Keeps pandas open to optional, modular parsing backends

✅ Tests & Quality Checks

  • 🔪 Unit test added: test_read_csv_with_polars
  • ✅ Passed: All pytest tests
  • ✅ Passed: All pre-commit hooks
  • ✅ Passed: ruff, shellcheck, cython-lint, codespell, etc.
  • ↺ Converted scripts to LF line endings using dos2unix for consistent CI/CD compatibility
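
As a quick way to confirm the line-ending conversion, a check along these lines could be run over the scripts named earlier. It is a sketch only: `has_crlf` and `files_with_crlf` are hypothetical helpers and are not part of this PR or of pandas' CI.

```python
from __future__ import annotations

from pathlib import Path


def has_crlf(path: Path) -> bool:
    """Return True if the file contains at least one CRLF line ending."""
    return b"\r\n" in path.read_bytes()


def files_with_crlf(paths: list[str]) -> list[str]:
    """Return the existing files that still contain CRLF line endings."""
    return [p for p in paths if Path(p).is_file() and has_crlf(Path(p))]


if __name__ == "__main__":
    # Script paths taken from the list in this PR description.
    scripts = [
        "ci/code_checks.sh",
        "ci/run_tests.sh",
        "scripts/cibw_before_build.sh",
        "scripts/download_wheels.sh",
        "scripts/upload_wheels.sh",
    ]
    print(files_with_crlf(scripts))
```

An empty list means none of the files found on disk contain CRLF; paths that do not exist are skipped rather than reported.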

🧠 Notes

  • polars is treated as an optional dependency
  • If it is not installed, pandas raises a clear error:
    “Polars is not installed. Please install it with 'pip install polars'.”
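
Because Polars is optional, calling code may want to select the engine only when the package is importable. One possible pattern is sketched below; `pick_csv_engine` is a hypothetical helper, and it assumes the engine name matches the importable module name, which holds for "polars".

```python
import importlib.util


def pick_csv_engine(preferred: str = "polars", fallback: str = "c") -> str:
    """Return `preferred` if a module of that name is importable, else `fallback`."""
    if importlib.util.find_spec(preferred) is not None:
        return preferred
    return fallback


# Intended use, assuming a pandas build that includes this PR:
#   df = pd.read_csv("sample.csv", engine=pick_csv_engine())
```

This avoids triggering the error above on machines without Polars, at the cost of silently using a different parser.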

🙌 Acknowledgements

Thanks to the maintainers for reviewing this contribution!
Looking forward to feedback or further improvements.
