Conversation

@titaiwangms
Collaborator

Prior to this PR, for the text-generation task, we were exporting and testing LLMs with multi-turn conversations, where the model runs "prompt processing" --> [a for loop of "token generation"] --> feeds the loop output back into prompt processing, and the whole run is restricted to batch_size=1 by the GQA contrib op.

In this PR, we export LLMs with the token generation setting, and test the model in both the token generation and prompt processing scenarios.
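The multi-turn flow described above can be sketched in plain Python, tracking only how the KV-cache length grows across turns. This is an illustration of the loop structure, not code from the PR; the function name `run_turn` and all numbers are hypothetical.

```python
# Hypothetical sketch of the multi-turn loop: each turn does one prefill pass
# over the whole prompt, then a decode loop that appends one token at a time.
# Only the KV-cache length (past_len) is modeled here.

def run_turn(past_len: int, prompt_len: int, new_tokens: int) -> int:
    """Return the KV-cache length after one conversation turn."""
    past_len += prompt_len       # prefill consumes the whole prompt at once
    for _ in range(new_tokens):  # decode loop: one token per forward pass
        past_len += 1
    return past_len

past = 0
past = run_turn(past, prompt_len=8, new_tokens=4)  # turn 1: cache grows to 12
past = run_turn(past, prompt_len=5, new_tokens=3)  # turn 2: cache grows to 20
```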

Citing @kunal-vaishnavi:

These are the general shapes:

input_ids = (batch_size, sequence_length)
attn_mask = (batch_size, past_sequence_length + sequence_length)
pos_ids = (batch_size, sequence_length)
past_key_values = (batch_size, num_key_value_heads, past_sequence_length, head_dim)
present_key_values = (batch_size, num_key_value_heads, past_sequence_length + sequence_length, head_dim)
Prompt processing (aka prefill):

input_ids = (batch_size, prompt_length)
attn_mask = (batch_size, 0 + prompt_length) = (batch_size, prompt_length)
pos_ids = (batch_size, prompt_length)
past_key_values = (batch_size, num_key_value_heads, 0, head_dim)
present_key_values = (batch_size, num_key_value_heads, 0 + prompt_length, head_dim) = (batch_size, num_key_value_heads, prompt_length, head_dim)
Token generation (aka decode):

input_ids = (batch_size, 1)
attn_mask = (batch_size, past_sequence_length + 1)
pos_ids = (batch_size, 1)
past_key_values = (batch_size, num_key_value_heads, past_sequence_length, head_dim)
present_key_values = (batch_size, num_key_value_heads, past_sequence_length + 1, head_dim)
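The prefill and decode cases above are both instances of the general shape formulas, differing only in `past_sequence_length` and `sequence_length`. A minimal sketch making that concrete (the function and the concrete sizes are illustrative, not exporter APIs):

```python
# Illustrative sketch: compute the expected I/O shapes for one LLM forward
# pass from the general formulas in the comment above. Prefill is the special
# case past_len=0, seq_len=prompt_length; decode is the case seq_len=1.

def llm_io_shapes(batch_size, num_kv_heads, head_dim, past_len, seq_len):
    """Return the shapes of the main LLM inputs/outputs for one forward pass."""
    return {
        "input_ids": (batch_size, seq_len),
        "attn_mask": (batch_size, past_len + seq_len),
        "pos_ids": (batch_size, seq_len),
        "past_key_values": (batch_size, num_kv_heads, past_len, head_dim),
        "present_key_values": (batch_size, num_kv_heads, past_len + seq_len, head_dim),
    }

# Prompt processing (prefill): empty cache, full prompt in one pass.
prefill = llm_io_shapes(batch_size=2, num_kv_heads=8, head_dim=64,
                        past_len=0, seq_len=16)
# Token generation (decode): one new token, cache holds the history.
decode = llm_io_shapes(batch_size=2, num_kv_heads=8, head_dim=64,
                       past_len=16, seq_len=1)
```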

@titaiwangms
Collaborator Author

This PR tried to export LLMs with a multi-turn conversation (batch_size = 1, sequence_length > 1, and past_sequence_length > 1) but was blocked by torch.export.export dynamism with 0/1 specialization. The phi models complain in two scenarios:

(1) The input batch is dynamic, while the output batch is specialized to 1

_unittests/ut_torch_models/test_validate_whole_models.py::TestValidateWholeModels::test_o_validate_phi35_4k_mini_instruct - onnxruntime.capi.onnxruntime_pybind11_state.Fail: [ONNXRuntimeError] : 1 : FAIL : Non-zero status code returned while running Concat node. Name:'cat16' Status Message: concat.cc:154 PrepareForCompute Non concat axis dimensions must match: Axis 0 has mismatched dimensions of 1 and 2
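The Concat failure above is the shape rule that ONNX's Concat enforces: all non-concat axes must match, so a batch dimension specialized to 1 cannot be concatenated with a dynamic batch of 2. A minimal sketch of that rule (this mimics the check, it is not ONNX Runtime code):

```python
# Sketch of ONNX Concat shape inference: dimensions on every axis except the
# concat axis must agree across inputs, mirroring the error message above.

def concat_output_shape(shapes, axis):
    """Return the output shape of Concat, or raise if non-concat dims differ."""
    ref = list(shapes[0])
    for s in shapes[1:]:
        for i, (a, b) in enumerate(zip(ref, s)):
            if i != axis and a != b:
                raise ValueError(
                    f"Non concat axis dimensions must match: "
                    f"axis {i} has mismatched dimensions of {a} and {b}"
                )
        ref[axis] += s[axis]
    return tuple(ref)

# OK: batch dims agree, concat along axis 1.
concat_output_shape([(2, 3), (2, 5)], axis=1)
# Fails like the log above: batch 1 vs batch 2 on a non-concat axis.
# concat_output_shape([(1, 4), (2, 4)], axis=1)  -> ValueError
```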

(2) Shape env issue

FAILED _unittests/ut_torch_models/test_validate_models.py::TestValidateModel::test_validate_microsoft_phi4_reasoning - AssertionError: [patched_ShapeEnv] Ignored guard s60 + s70 <= s31 + s70 == True, this could result in accuracy problems
FAILED 

@titaiwangms
Collaborator Author

Closing this per the comment above.

@titaiwangms titaiwangms closed this Oct 7, 2025
@xadupre xadupre deleted the titaiwang/fix_modelbuilder_discrepancy branch November 12, 2025 12:25
@xadupre xadupre restored the titaiwang/fix_modelbuilder_discrepancy branch November 12, 2025 12:25
