Labels
Customized kernels&lt;NV&gt;: Specialized/modified CUDA kernels in TRTLLM for LLM ops, beyond standard TRT. Dev &amp; perf.
bug: Something isn't working
Description
System Info
TensorRT-LLM main
Who can help?
No response
Information
- The official example scripts
- My own modified scripts
Tasks
- An officially supported task in the `examples` folder (such as GLUE/SQuAD, ...)
- My own task or dataset (give details below)
Reproduction
In https://github.com/NVIDIA/TensorRT-LLM/blob/main/tensorrt_llm/_torch/modules/qk_norm_attention.py
line 58:
`factor = getattr(rope_scaling, "factor", 1.0)`
`rope_scaling` is a dict, not an object, so `getattr` never finds a `factor` attribute and the call always falls back to the default `1.0`; the intended lookup is presumably `rope_scaling.get("factor", 1.0)`.
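A minimal sketch of the reported pitfall (the dict contents here are hypothetical, not taken from TensorRT-LLM): `getattr` performs an attribute lookup, not a key lookup, so on a plain dict it always returns the default, while `dict.get` reads the actual config value.

```python
# Hypothetical rope_scaling config dict, as typically found in HF model configs.
rope_scaling = {"rope_type": "yarn", "factor": 8.0}

# Buggy pattern from the report: dicts have no "factor" attribute,
# so getattr() silently returns the default.
factor_buggy = getattr(rope_scaling, "factor", 1.0)

# Intended lookup: dict.get() reads the "factor" key.
factor_fixed = rope_scaling.get("factor", 1.0)

print(factor_buggy)  # 1.0 -- the configured value is ignored
print(factor_fixed)  # 8.0 -- the configured value is used
```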
Expected behavior
None
Actual behavior
None
Additional notes
None
Before submitting a new issue...
- Make sure you already searched for relevant issues, and checked the documentation and examples for answers to frequently asked questions.