
Conversation

@gabeweisz

Description

When using THD-format packed data with TransformerEngine, the user must specify, at JAX JIT time, the maximum number of segments that can be packed into a sequence. If Grain packs more segments than this limit allows, it can cause crashes or data corruption.

We previously updated Grain to allow limiting the number of segments packed into a sequence; this PR takes the appropriate value from the MaxText configuration and passes it to Grain.
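For illustration, segment-capped packing behaves roughly like the first-fit sketch below. This is plain Python, not Grain's actual API; the `max_segments_per_seq` name mirrors the MaxText config option, and the rest is an assumed simplification:

```python
def pack_first_fit(seq_lengths, bin_capacity, max_segments_per_seq=None):
    """Greedy first-fit packing: place each segment into the first packed
    sequence ("bin") with enough room, subject to an optional cap on the
    number of segments per bin (None means no cap)."""
    bins = []  # each entry: (tokens_used, segment_count)
    assignments = []  # bin index chosen for each input segment
    for length in seq_lengths:
        for i, (used, count) in enumerate(bins):
            if used + length <= bin_capacity and (
                max_segments_per_seq is None or count < max_segments_per_seq
            ):
                bins[i] = (used + length, count + 1)
                assignments.append(i)
                break
        else:
            # No existing bin fits; open a new one.
            bins.append((length, 1))
            assignments.append(len(bins) - 1)
    return assignments, bins
```

The cap is what keeps the packed batch compatible with a kernel compiled for a fixed maximum segment count: without it, a third short segment would be packed into a bin that still has token room, exceeding the compiled limit.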

Tests

We have had this fix in place in our AMD fork of MaxText for some time, but needed the Grain fix to be upstreamed before creating this PR.
We have tested this fix extensively internally and have customers using it in production.

MaxText does not currently have any tests that use packed batches, but I can create some if needed.

Checklist

Before submitting this PR, please make sure (put X in square brackets):

  • [ ] I have performed a self-review of my code. For an optional AI review, add the gemini-review label.
  • [ ] I have added necessary comments to my code, particularly in hard-to-understand areas.
  • [ ] I have run end-to-end tests and provided workload links above if applicable.
  • [ ] I have made or will make corresponding changes to the docs if needed, including adding new documentation pages to the relevant Table of Contents (toctree directive) as explained in our documentation.

Contributor

@yeandy yeandy left a comment


We may need to add `max_sequences_per_bin=config.max_segments_per_seq` in `make_hf_eval_iterator` too.

@gabeweisz
Author

> We may need to add `max_sequences_per_bin=config.max_segments_per_seq` in `make_hf_eval_iterator` too.

Done, thanks for the tip

@gabeweisz gabeweisz closed this Dec 4, 2025
@gabeweisz gabeweisz reopened this Dec 4, 2025
```python
use_sft=config.use_sft,
sft_train_on_completion_only=config.sft_train_on_completion_only,
chat_template_path=config.chat_template_path,
max_sequences_per_bin=config.max_segments_per_seq,
```
Copy link
Collaborator


According to this PR, `max_segments_per_seq` is only relevant for GPU packed attention, but this change applies it to TPU workloads as well.
To be cleaner, it's better to align the behavior across hardware and across different pipelines (the Grain pipeline's `FirstFitPackIterDataset` also has this parameter). We can set the default value to -1, which means no limit (passing None to `PackAndBatchOperation`).
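The proposed convention could be wired up with a small helper along these lines. This is a hypothetical sketch, not code from MaxText or Grain; only the -1-means-no-limit convention and the None argument come from the comment above:

```python
def resolve_max_sequences_per_bin(max_segments_per_seq: int):
    """Map the proposed MaxText config convention to the Grain-side
    argument: -1 means 'no limit', which Grain expresses as None."""
    if max_segments_per_seq == -1:
        return None
    if max_segments_per_seq <= 0:
        raise ValueError(
            "max_segments_per_seq must be a positive integer or -1 (no limit)"
        )
    return max_segments_per_seq
```

With a default of -1 in the config, TPU workloads that never set the option keep today's unlimited-packing behavior, while GPU THD users can opt into a hard cap.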

