Conversation

Collaborator

@lanluo-nvidia commented Aug 25, 2025

Description

Fixes # (issue)

Type of change

Please delete options that are not relevant and/or add your own.

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • This change requires a documentation update

Checklist:

  • My code follows the style guidelines of this project (You can use the linters)
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas and around any workarounds
  • I have made corresponding changes to the documentation
  • I have added tests to verify my fix or my feature
  • New and existing unit tests pass locally with my changes
  • I have added the relevant labels to my PR so that the relevant reviewers are notified

@meta-cla bot added the cla signed label Aug 25, 2025
@github-actions bot left a comment


There are some changes that do not conform to Python style guidelines:

--- /home/runner/work/TensorRT/TensorRT/tools/llm/torchtrt_ext/register_sdpa.py	2025-08-25 20:35:29.149375+00:00
+++ /home/runner/work/TensorRT/TensorRT/tools/llm/torchtrt_ext/register_sdpa.py	2025-08-25 20:35:59.784651+00:00
@@ -162,6 +162,6 @@

    gm = clean_up_graph_after_modifications(gm)
    new_output_tensors = create_random_output_tensors(new_outputs)
    new_out_spec = pytree.tree_flatten(new_output_tensors)[1]
    gm._out_spec = new_out_spec
-    return gm
\ No newline at end of file
+    return gm
--- /home/runner/work/TensorRT/TensorRT/tools/llm/torchtrt_ext/sdpa_converter.py	2025-08-25 20:35:29.149375+00:00
+++ /home/runner/work/TensorRT/TensorRT/tools/llm/torchtrt_ext/sdpa_converter.py	2025-08-25 20:35:59.940021+00:00
@@ -286,11 +286,10 @@
                    false_input = mm
                    output_layer = if_layer.add_output(
                        true_input.get_output(0), false_input.get_output(0)
                    )
                    scaled_add_attn_bias = output_layer.get_output(0)
-    

    softmax = impl.normalization.softmax(
        ctx, target, source_ir, name + "_softmax", scaled_add_attn_bias, -1, False
    )
    if use_fp32_acc:

@lanluo-nvidia changed the title from "try to use the attn mask transformer passed in" to "try to use the attn mask transformer passed in (Not working with KV Cache)" Aug 25, 2025
@github-actions bot added the component: tests, component: lowering, component: conversion, component: converters, component: api [Python], and component: dynamo labels Aug 28, 2025
@lanluo-nvidia changed the title from "try to use the attn mask transformer passed in (Not working with KV Cache)" to "register sdpa operator with model config" Aug 29, 2025
@lanluo-nvidia marked this pull request as ready for review August 29, 2025 00:33