Closed
Labels
Customized kernels<NV>: Specialized/modified CUDA kernels in TRTLLM for LLM ops, beyond standard TRT. Dev & perf.
bug: Something isn't working
Description
System Info
Null
Who can help?
In https://github.com/NVIDIA/TensorRT-LLM/blob/main/cpp/tensorrt_llm/kernels/indexerTopK.cu#L287-L305:
if constexpr (step < 3)
{
    // Only fill the final items for sorting if the threshold bin fits
    if (binIdx == thresholdBinIdx && smemFinalBinSize[0] <= kNumFinalItems)
    {
        int dstIdx = atomicAdd(&smemFinalDstIdx[0], 1);
        smemFinal.items.logits[dstIdx] = logit;
        if constexpr (mergeBlocks)
        {
            smemFinal.items.indices[dstIdx] = indices[idx];
        }
        else if constexpr (multipleBlocksPerRow)
        {
            smemFinal.items.indices[dstIdx] = idx + rowStart;
        }
        else
        {
            smemFinal.items.indices[dstIdx] = idx;
        }
    }
}
Should we add a bounds check on dstIdx before writing to smemFinal, like this?
int dstIdx = atomicAdd(&smemFinalDstIdx[0], 1);
if (dstIdx < kNumFinalItems)
{
    smemFinal.items.logits[dstIdx] = logit;
    if constexpr (mergeBlocks)
    {
        smemFinal.items.indices[dstIdx] = indices[idx];
    }
    else if constexpr (multipleBlocksPerRow)
    {
        smemFinal.items.indices[dstIdx] = idx + rowStart;
    }
    else
    {
        smemFinal.items.indices[dstIdx] = idx;
    }
}
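To illustrate the guarded-write pattern proposed above, here is a minimal host-side C++ sketch, not the kernel itself. It assumes `std::atomic<int>::fetch_add` as a stand-in for CUDA's `atomicAdd` on shared memory. The `FinalItems` struct, the `pushGuarded` helper, and the capacity value of 8 are hypothetical names and values chosen for illustration; only `kNumFinalItems`, `logits`, `indices`, and `dstIdx` echo identifiers from the snippet.

```cpp
#include <array>
#include <atomic>

// Hypothetical capacity; stands in for the kernel's kNumFinalItems.
constexpr int kNumFinalItems = 8;

// Hypothetical stand-in for the shared-memory buffers smemFinal.items.*.
struct FinalItems
{
    std::array<float, kNumFinalItems> logits{};
    std::array<int, kNumFinalItems> indices{};
};

// Reserve a slot with an atomic increment, then write only if the reserved
// slot lies inside the buffer. Returns true if the item was stored.
// Note: the counter still advances when the item is dropped, so it can end
// up larger than kNumFinalItems.
bool pushGuarded(std::atomic<int>& dstCounter, FinalItems& items, float logit, int idx)
{
    int dstIdx = dstCounter.fetch_add(1); // analogue of atomicAdd(&smemFinalDstIdx[0], 1)
    if (dstIdx < kNumFinalItems)
    {
        items.logits[dstIdx] = logit;
        items.indices[dstIdx] = idx;
        return true;
    }
    return false; // out of range: skip the write instead of overrunning the buffer
}
```

In this sketch, calling `pushGuarded` 20 times against a capacity of 8 stores the first 8 items and drops the rest, while the counter ends at 20. Any later code that reads the counter as an item count would therefore need to clamp it to `kNumFinalItems`. Whether silently dropping items is acceptable for the top-K result in the real kernel is a separate question that this sketch does not answer.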
Information
- The official example scripts
- My own modified scripts
Tasks
- An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
- My own task or dataset (give details below)
Reproduction
Null
Expected behavior
Null
actual behavior
Null
additional notes
Null
Before submitting a new issue...
- Make sure you already searched for relevant issues, and checked the documentation and examples for answers to frequently asked questions.