
feat(ml): add stateless bundle-local size-aware batching and benchmark #37532

Open
Eliaaazzz wants to merge 1 commit into apache:master from Eliaaazzz:users/elia/stateless-smart-bucketing

Conversation

@Eliaaazzz
Contributor

Updates #37531

Summary

This PR adds an opt-in stateless bundle-local size-aware batching path for variable-length inference workloads in RunInference.

It introduces SortAndBatchElements in apache_beam/transforms/util.py, which:

  1. Buffers elements within a bundle (StartBundle/FinishBundle)
  2. Orders elements by size (default len(x), overridable via element_size_fn)
  3. Forms batches under existing constraints (max_batch_size, max_batch_weight)

Default behavior remains unchanged unless this path is enabled.
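
For illustration, here is a minimal usage sketch of the opt-in path. The parameter names mirror the description above; the entry point (`beam.SortAndBatchElements`, exported via `__all__`) and the exact signature in `apache_beam/transforms/util.py` may differ, so treat this as a hypothetical sketch rather than the final API.

```python
# Hypothetical usage sketch; keyword names follow the PR description and may
# differ from the final API in apache_beam/transforms/util.py.
import apache_beam as beam

with beam.Pipeline() as pipeline:
    batches = (
        pipeline
        | beam.Create(["a", "bb", "c" * 40, "d" * 400])  # variable-length elements
        | beam.SortAndBatchElements(
            max_batch_size=32,       # cap on elements per batch
            max_batch_weight=2000,   # cap on summed element sizes per batch
            element_size_fn=len))    # default size function is len(x)
    _ = batches | beam.Map(lambda batch: len(batch))     # downstream sees lists of elements
```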

Motivation

BatchElements is count-based. Under heavy-tail length distributions, a long outlier forces every shorter element batched with it to be padded to the outlier's length, which inflates padding cost, increases tail latency, and reduces effective throughput.

This PR provides a stateless (bundle-local, no shuffle) way to improve batch composition under variable-length inputs.

Mechanism clarification

A strict-control ablation is included to isolate effects:

  • Fixed boundaries + intra-batch sorting only: negligible gain on Pareto
  • Size-aware ordering + max_batch_weight: significant gain

In this workload, the gains are primarily consistent with batch-boundary changes under the weight constraint after size-aware ordering, rather than with intra-batch reordering alone.
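
To make the mechanism concrete, here is a simplified, standalone sketch of the bundle-local logic (illustrative only, not the actual util.py implementation): buffer the bundle, order by size, then greedily close a batch whenever the count or weight cap would be exceeded.

```python
# Simplified, standalone sketch of the bundle-local logic described above;
# illustrative only, not the actual util.py implementation.
def sort_and_batch(elements, max_batch_size, max_batch_weight, element_size_fn=len):
    batches, batch, batch_weight = [], [], 0
    for element in sorted(elements, key=element_size_fn):  # size-aware ordering
        weight = element_size_fn(element)
        # Close the current batch when either constraint would be exceeded.
        if batch and (len(batch) >= max_batch_size
                      or batch_weight + weight > max_batch_weight):
            batches.append(batch)
            batch, batch_weight = [], 0
        batch.append(element)
        batch_weight += weight
    if batch:
        batches.append(batch)
    return batches
```

Because elements are ordered by size before packing, the weight cap closes batches at different points than it would under arrival order; this boundary-change effect is what the strict-control ablation isolates.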

Benchmark methodology

Script: apache_beam/transforms/sort_and_batch_benchmark.py

  • N=20 trials (3 warmup trials excluded)
  • Same fixed corpus and seed for A/B
  • Metrics: padding ratio, throughput (Ktok/s), E2E latency, batch p95/p99
  • Invariants checked: element/token conservation (the padding-ratio metric and invariant checks are sketched below)
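
For reference, a hedged sketch of the padding-ratio metric and the conservation checks, assuming each batch is padded to its longest element. These helpers are hypothetical; the benchmark script's exact formulas may differ.

```python
# Hypothetical metric helpers; the benchmark script's exact formulas may differ.
def padding_ratio(batches, element_size_fn=len):
    """Padded tokens / real tokens, padding each batch to its longest element."""
    padded = sum(len(batch) * max(element_size_fn(e) for e in batch)
                 for batch in batches if batch)
    real = sum(element_size_fn(e) for batch in batches for e in batch)
    return padded / real if real else 1.0

def check_invariants(elements, batches, element_size_fn=len):
    """Element and token conservation across a batching strategy."""
    flat = [e for batch in batches for e in batch]
    assert len(flat) == len(elements)
    assert sum(map(element_size_fn, flat)) == sum(map(element_size_fn, elements))
```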

Pareto (heavy-tail) results

Configuration:

  • N=10,000 elements
  • max_batch_size=32, max_batch_weight=2000
  • Input stats: mean=4.2, median=1, std=18.5, max=500 (a generator for a corpus of roughly this shape is sketched below)
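
For reproduction purposes, a hypothetical generator for a heavy-tail corpus of roughly this shape; the shape parameter and other values are illustrative, and the benchmark script may use a different construction.

```python
# Hypothetical heavy-tail corpus generator, for illustration only; the benchmark
# script's distribution and parameters may differ.
import random

def make_pareto_corpus(n=10_000, alpha=1.3, max_len=500, seed=42):
    rng = random.Random(seed)  # fixed seed so baseline and stateless runs see the same corpus
    lengths = [min(int(rng.paretovariate(alpha)), max_len) for _ in range(n)]
    return ["x" * length for length in lengths]  # token count == len(element)
```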

Baseline → Stateless:

  • Padding ratio: 13.00x → 3.19x (↓75.5%)
  • Throughput median: 2.2 → 7.3 Ktok/s (↑230.4%)
  • E2E latency p95: 18924.3 → 5728.6 ms (↓69.7%)
  • Batch latency p95: 283.3 → 64.4 ms (↓77.3%)
  • Batch latency p99: 1011.9 → 632.5 ms (↓37.5%)
  • Batch count: 313 → 321 (+3%)
  • Invariants: elements/tokens matched

Scope

Included in this PR:

  • Stateless, bundle-local implementation (SortAndBatchElements)
  • Unit tests
  • Benchmark + strict-control ablation

Not included in this PR:

  • Stateful/global keying strategy
  • Cross-worker/global reordering
  • Auto-tuning or public weighting knobs

Files changed

  • apache_beam/transforms/util.py
  • apache_beam/transforms/util_test.py
  • apache_beam/transforms/sort_and_batch_benchmark.py

Notes

Claims in this PR are scoped to the Pareto heavy-tail setup used above. Broader-distribution conclusions and stateful/global strategy are follow-up work.



@github-actions github-actions bot added the python label Feb 6, 2026
@gemini-code-assist
Contributor

Summary of Changes

Hello @Eliaaazzz, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request enhances Apache Beam's capabilities for machine learning inference by introducing a novel size-aware batching mechanism. The new SortAndBatchElements transform intelligently groups elements of similar sizes within a bundle, significantly reducing the computational waste associated with padding variable-length inputs. This optimization leads to substantial improvements in throughput and a reduction in inference latency, particularly beneficial for workloads with diverse input sizes.

Highlights

  • New Stateless Size-Aware Batching: Introduced SortAndBatchElements in apache_beam/transforms/util.py, an opt-in stateless (bundle-local) transform that sorts elements by size before batching. This is designed to optimize variable-length inference workloads.
  • Performance Improvements: Benchmarking shows significant gains for heavy-tail length distributions: a 75.5% reduction in padding ratio, a 230.4% increase in median throughput, and approximately 70% reduction in p95 end-to-end and batch latency.
  • Comprehensive Benchmarking: A new benchmark script (apache_beam/transforms/sort_and_batch_benchmark.py) has been added to compare the new SortAndBatchElements with the existing BatchElements, validating its efficiency improvements.
  • Robust Unit Testing: Extensive unit tests have been added in apache_beam/transforms/util_test.py to ensure the correctness and reliability of the SortAndBatchElements transform, covering various scenarios including windowing and parameter validation.


Changelog
  • sdks/python/apache_beam/transforms/sort_and_batch_benchmark.py
    • Added a new benchmark script to compare the performance of BatchElements (baseline) against the new SortAndBatchElements (stateless).
    • Implemented data generators for various length distributions (Pareto, log-normal, bimodal, low-variance) to simulate diverse inference workloads.
    • Included a simulated inference latency model to provide realistic performance metrics.
    • Developed statistical helpers for percentile calculation and padding efficiency analysis.
    • Added invariant validation to ensure element and token conservation across batching strategies.
  • sdks/python/apache_beam/transforms/util.py
    • Introduced SortAndBatchElements PTransform, which buffers elements within a bundle, sorts them by size, and then forms batches based on max_batch_size and max_batch_weight.
    • Added _SortAndBatchElementsDoFn for default windowing and _WindowAwareSortAndBatchElementsDoFn for window-aware processing, handling element buffering, sorting, and batch emission.
    • Implemented _default_element_size_fn to provide a flexible way to determine element size, defaulting to len() or 1 if len() is not supported.
    • Updated the __all__ list to include SortAndBatchElements for public access.
  • sdks/python/apache_beam/transforms/util_test.py
    • Added SortAndBatchElementsTest class with multiple test cases to verify the functionality of the new transform.
    • Tests confirm that elements are correctly sorted by size within batches.
    • Validation tests ensure max_batch_size, max_batch_weight, and element_size_fn parameters are handled correctly.
    • Tests cover default and custom element_size_fn behavior for various data types.
    • Included tests for empty input, single element input, and correct handling of windowed elements.
    • A specific test demonstrates the padding efficiency improvement achieved by SortAndBatchElements compared to traditional batching.
Activity
  • The pull request introduces a new feature, SortAndBatchElements, aimed at improving ML inference performance.
  • A new benchmark script has been added, providing a clear comparison between the new and existing batching strategies.
  • Benchmark results for a heavy-tail distribution show significant performance gains, including a 75.5% reduction in padding ratio and a 230.4% increase in throughput.
  • Comprehensive unit tests have been implemented to ensure the correctness and robustness of the new transform across different scenarios.

@github-actions
Contributor

github-actions bot commented Feb 6, 2026

Checks are failing. Will not request review until checks are succeeding. If you'd like to override that behavior, comment assign set of reviewers

@Eliaaazzz Eliaaazzz force-pushed the users/elia/stateless-smart-bucketing branch 3 times, most recently from 907ddfd to 501bf5c on February 7, 2026 at 01:08
@github-actions
Contributor

github-actions bot commented Feb 7, 2026

Assigning reviewers:

R: @jrmccluskey for label python.

Note: If you would like to opt out of this review, comment assign to next reviewer.

Available commands:

  • stop reviewer notifications - opt out of the automated review tooling
  • remind me after tests pass - tag the comment author after tests pass
  • waiting on author - shift the attention set back to the author (any comment or push by the author will return the attention set to the reviewers)

The PR bot will only process comments in the main thread (not review comments).

@Eliaaazzz Eliaaazzz force-pushed the users/elia/stateless-smart-bucketing branch 5 times, most recently from 4948b6e to 142962d on February 7, 2026 at 11:49
@codecov

codecov bot commented Feb 7, 2026

Codecov Report

❌ Patch coverage is 4.89130% with 350 lines in your changes missing coverage. Please review.
✅ Project coverage is 39.99%. Comparing base (d6920de) to head (142962d).
⚠️ Report is 4 commits behind head on master.

Files with missing lines | Patch % | Lines
...eam/testing/benchmarks/sort_and_batch_benchmark.py | 0.00% | 263 Missing ⚠️
sdks/python/apache_beam/transforms/util.py | 17.14% | 87 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##             master   #37532      +/-   ##
============================================
- Coverage     40.06%   39.99%   -0.07%     
  Complexity     3404     3404              
============================================
  Files          1177     1178       +1     
  Lines        187083   187526     +443     
  Branches       3581     3581              
============================================
+ Hits          74947    75002      +55     
- Misses       108744   109132     +388     
  Partials       3392     3392              
Flag | Coverage | Δ
python | 39.53% <4.89%> | (-0.16%) ⬇️

Flags with carried forward coverage won't be shown.


@Eliaaazzz Eliaaazzz force-pushed the users/elia/stateless-smart-bucketing branch from 142962d to 541fd95 on February 7, 2026 at 13:38
