Conversation

@titaiwangms (Collaborator) commented on Sep 22, 2025

Prior to this PR, for the text-generation task we exported and tested LLMs in a multi-turn conversation: the model ran "prompt processing" --> [a for loop of "token generation"] --> used the loop output for prompt processing again, and the whole run was restricted to batch_size=1 by the GQA contrib op.

In this PR, we export LLMs with the token-generation setting and test the exported model in both the token-generation and prompt-processing scenarios.

Citing @kunal-vaishnavi:

These are the general shapes (a short sketch of matching dummy inputs follows each list):

  • input_ids = (batch_size, sequence_length)
  • attn_mask = (batch_size, past_sequence_length + sequence_length)
  • pos_ids = (batch_size, sequence_length)
  • past_key_values = (batch_size, num_key_value_heads, past_sequence_length, head_dim)
  • present_key_values = (batch_size, num_key_value_heads, past_sequence_length + sequence_length, head_dim)
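As a rough illustration of these shapes, here is a minimal sketch that builds matching dummy inputs. The helper name `make_gqa_inputs` and the default sizes (number of layers, KV heads, head_dim, vocab size) are assumptions for the example, not values taken from this PR.

```python
import torch


def make_gqa_inputs(
    batch_size: int,
    sequence_length: int,
    past_sequence_length: int,
    num_hidden_layers: int = 2,
    num_key_value_heads: int = 8,
    head_dim: int = 64,
    vocab_size: int = 32000,
):
    """Builds dummy inputs matching the shapes listed above (illustrative only)."""
    input_ids = torch.randint(
        0, vocab_size, (batch_size, sequence_length), dtype=torch.int64
    )
    # The mask covers the cached tokens plus the new ones.
    attention_mask = torch.ones(
        (batch_size, past_sequence_length + sequence_length), dtype=torch.int64
    )
    # Positions continue from the end of the cache.
    position_ids = (
        torch.arange(
            past_sequence_length,
            past_sequence_length + sequence_length,
            dtype=torch.int64,
        )
        .unsqueeze(0)
        .expand(batch_size, -1)
    )
    # One (key, value) pair per layer, each of shape
    # (batch_size, num_key_value_heads, past_sequence_length, head_dim).
    past_key_values = [
        (
            torch.randn(batch_size, num_key_value_heads, past_sequence_length, head_dim),
            torch.randn(batch_size, num_key_value_heads, past_sequence_length, head_dim),
        )
        for _ in range(num_hidden_layers)
    ]
    return input_ids, attention_mask, position_ids, past_key_values
```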

Prompt processing (aka prefill):

  • input_ids = (batch_size, prompt_length)
  • attn_mask = (batch_size, 0 + prompt_length) = (batch_size, prompt_length)
  • pos_ids = (batch_size, prompt_length)
  • past_key_values = (batch_size, num_key_value_heads, 0, head_dim)
  • present_key_values = (batch_size, num_key_value_heads, 0 + prompt_length, head_dim) = (batch_size, num_key_value_heads, prompt_length, head_dim)
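With that hypothetical helper, prefill is simply the empty-cache configuration (past_sequence_length = 0):

```python
# Prefill: no cache yet, so past_sequence_length = 0.
batch_size, prompt_length = 2, 16
input_ids, attention_mask, position_ids, past_key_values = make_gqa_inputs(
    batch_size=batch_size, sequence_length=prompt_length, past_sequence_length=0
)
assert input_ids.shape == (batch_size, prompt_length)
assert attention_mask.shape == (batch_size, prompt_length)
assert past_key_values[0][0].shape == (batch_size, 8, 0, 64)  # empty KV cache
```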

Token generation (aka decode):

  • input_ids = (batch_size, 1)
  • attn_mask = (batch_size, past_sequence_length + 1)
  • pos_ids = (batch_size, 1)
  • past_key_values = (batch_size, num_key_value_heads, past_sequence_length, head_dim)
  • present_key_values = (batch_size, num_key_value_heads, past_sequence_length + 1, head_dim)
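And decode feeds a single new token against a non-empty cache, again using the hypothetical helper:

```python
# Decode: one new token against a non-empty KV cache.
batch_size, past_sequence_length = 2, 16
input_ids, attention_mask, position_ids, past_key_values = make_gqa_inputs(
    batch_size=batch_size, sequence_length=1, past_sequence_length=past_sequence_length
)
assert input_ids.shape == (batch_size, 1)
assert attention_mask.shape == (batch_size, past_sequence_length + 1)
assert position_ids[0, 0].item() == past_sequence_length
assert past_key_values[0][0].shape == (batch_size, 8, past_sequence_length, 64)
```

Since this PR tests both scenarios against the same exported model, the exported graph presumably keeps sequence_length and past_sequence_length dynamic, so either configuration above can be fed to it.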

@titaiwangms marked this pull request as ready for review on September 23, 2025, 22:01
@titaiwangms (Collaborator, Author)

@sdpython @xadupre Is there a full benchmark I can run before merging this?

@sdpython (Owner)

CI is not running, and I am not sure why, so I can't tell whether the tests are passing. Let me create a temporary PR.

@sdpython (Owner)

CI is running.

@titaiwangms (Collaborator, Author)

#236
