Conversation

Collaborator

@lanluo-nvidia commented Aug 25, 2025

Description

Fixes # (issue)

Type of change

Please delete options that are not relevant and/or add your own.

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • This change requires a documentation update

Checklist:

  • My code follows the style guidelines of this project (You can use the linters)
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas and around any workarounds
  • I have made corresponding changes to the documentation
  • I have added tests to verify my fix or my feature
  • New and existing unit tests pass locally with my changes
  • I have added the relevant labels to my PR so that the relevant reviewers are notified

@meta-cla bot added the cla signed label Aug 25, 2025
@github-actions bot left a comment


There are some changes that do not conform to Python style guidelines:

--- /home/runner/work/TensorRT/TensorRT/tools/llm/torchtrt_ext/register_sdpa.py	2025-08-25 20:35:29.149375+00:00
+++ /home/runner/work/TensorRT/TensorRT/tools/llm/torchtrt_ext/register_sdpa.py	2025-08-25 20:35:59.784651+00:00
@@ -162,6 +162,6 @@

    gm = clean_up_graph_after_modifications(gm)
    new_output_tensors = create_random_output_tensors(new_outputs)
    new_out_spec = pytree.tree_flatten(new_output_tensors)[1]
    gm._out_spec = new_out_spec
-    return gm
\ No newline at end of file
+    return gm
--- /home/runner/work/TensorRT/TensorRT/tools/llm/torchtrt_ext/sdpa_converter.py	2025-08-25 20:35:29.149375+00:00
+++ /home/runner/work/TensorRT/TensorRT/tools/llm/torchtrt_ext/sdpa_converter.py	2025-08-25 20:35:59.940021+00:00
@@ -286,11 +286,10 @@
                    false_input = mm
                    output_layer = if_layer.add_output(
                        true_input.get_output(0), false_input.get_output(0)
                    )
                    scaled_add_attn_bias = output_layer.get_output(0)
-    

    softmax = impl.normalization.softmax(
        ctx, target, source_ir, name + "_softmax", scaled_add_attn_bias, -1, False
    )
    if use_fp32_acc:

@lanluo-nvidia changed the title from "try to use the attn mask transformer passed in" to "try to use the attn mask transformer passed in (Not working with KV Cache)" Aug 25, 2025
@github-actions bot added the component: tests, component: lowering, component: conversion, component: converters, component: api [Python], and component: dynamo labels Aug 28, 2025
@lanluo-nvidia changed the title from "try to use the attn mask transformer passed in (Not working with KV Cache)" to "register sdpa operator with model config" Aug 29, 2025
@lanluo-nvidia marked this pull request as ready for review August 29, 2025 00:33