docs: Expand streaming user guide with 6 real-world patterns#26731

Open
ThotDjehuty wants to merge 4 commits into pola-rs:main from ThotDjehuty:docs/expand-streaming-user-guide

Conversation

@ThotDjehuty

Summary

The streaming page currently only covers collect(engine="streaming") and show_graph(). This PR expands it with actionable examples for the full streaming API surface.

What's added

streaming.py — 6 new tagged code sections:

| Tag | Pattern |
| --- | --- |
| larger_than_ram | Realistic group_by/agg showing the chunk-based memory model |
| sink_parquet | ETL write: scan CSV → enrich → sink_parquet (full output never in RAM) |
| sink_csv | Aggregation → sink_csv |
| sink_batches | Per-batch callback: count rows in each chunk |
| collect_async | Two concurrent queries via asyncio.gather + collect_async() |
| partition_pruning | Multi-file scan_parquet with predicate pushdown skipping non-matching files |

streaming.md — expanded from ~1,500 to ~4,000 bytes:

  • API summary table at the top (all 6 streaming APIs in one view)
  • ASCII diagram of the chunk-based memory pipeline
  • "Writing in streaming mode: sink_*" section (sink_parquet + sink_csv)
  • "Per-batch callbacks: sink_batches" section
  • "Concurrent async execution: collect_async" section
  • "Multi-file scans and partition pruning" section with Hive-partition best-practices note
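
The chunk-based memory model that the diagram refers to can be illustrated without Polars at all. The following is a conceptual sketch in plain Python, not a description of how the streaming engine is implemented: the names `chunks` and `streaming_mean` and the chunk size are assumptions. It shows why a group_by/agg only needs one chunk plus the per-group running state in memory at a time.

```python
from collections import defaultdict
from itertools import islice
from typing import Iterable, Iterator


def chunks(
    rows: Iterable[tuple[str, float]], size: int
) -> Iterator[list[tuple[str, float]]]:
    """Yield successive lists of at most `size` rows from an iterable."""
    it = iter(rows)
    while batch := list(islice(it, size)):
        yield batch


def streaming_mean(
    rows: Iterable[tuple[str, float]], chunk_size: int = 2
) -> dict[str, float]:
    """Compute a per-key mean while holding only one chunk at a time."""
    sums: dict[str, float] = defaultdict(float)
    counts: dict[str, int] = defaultdict(int)
    for batch in chunks(rows, chunk_size):
        # Only this batch and the small per-key state are alive here.
        for key, value in batch:
            sums[key] += value
            counts[key] += 1
    return {key: sums[key] / counts[key] for key in sums}


rows = [("a", 1.0), ("b", 10.0), ("a", 3.0), ("b", 20.0), ("a", 5.0)]
means = streaming_mean(iter(rows))
print(means)  # {'a': 3.0, 'b': 15.0}
```

Memory use here grows with the number of distinct keys rather than the number of rows, which is the property the larger_than_ram example relies on.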

The hidden comment <!-- Not included in the docs "until we have something we are proud of" --> has been removed now that the page has substantive content.

Testing

All examples are self-contained (iris fixture + tempfile.TemporaryDirectory) and require no external network access, so they are compatible with the existing docs CI.

Related

#20947 (streaming engine tracking issue)

Adds: larger_than_ram, sink_parquet, sink_csv, sink_batches, collect_async, partition_pruning

Related: pola-rs#20947
Adds API summary table, ASCII memory model diagram, and sections for:
larger_than_ram, sink_*, sink_batches, collect_async, partition_pruning.

Related: pola-rs#20947
@codecov

codecov bot commented Feb 26, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 81.42%. Comparing base (0a50a14) to head (696fbdb).
⚠️ Report is 3 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main   #26731      +/-   ##
==========================================
- Coverage   81.43%   81.42%   -0.02%     
==========================================
  Files        1801     1801              
  Lines      246750   246750              
  Branches     3081     3081              
==========================================
- Hits       200936   200908      -28     
- Misses      45028    45056      +28     
  Partials      786      786              

@alexander-beedie alexander-beedie changed the title from "docs(streaming): expand streaming user guide with 6 real-world patterns" to "docs: Expand streaming user guide with 6 real-world patterns" on Feb 27, 2026
@github-actions github-actions bot added the A-streaming (Related to the streaming engine), documentation (Improvements or additions to documentation), python (Related to Python Polars), and rust (Related to Rust Polars) labels and removed the "title needs formatting" label on Feb 27, 2026
@orlp
Member

orlp commented Mar 2, 2026

Did you follow the AI policy?

2 participants