Commit 62b431c

ziyixiong-nv authored and codego7250 committed
[https://nvbugs/5698581][fix] Init draft tokens for CUDA graph dummy request (NVIDIA#9505)
Signed-off-by: ziyixiong-nv <219238287+ziyixiong-nv@users.noreply.github.com>
1 parent 82d3341 commit 62b431c

1 file changed, +1 -0 lines changed


tensorrt_llm/_torch/pyexecutor/cuda_graph_runner.py

Lines changed: 1 addition & 0 deletions
@@ -389,6 +389,7 @@ def _get_padded_batch(self, batch: ScheduledRequests,
             if spec_res_mgr:
                 spec_res_mgr.add_dummy_requests([CUDA_GRAPH_DUMMY_REQUEST_ID])
 
+        self.padding_dummy_request.py_draft_tokens = [0] * runtime_draft_len
         batch.generation_requests.extend([self.padding_dummy_request] *
                                          padding_size)
         return padding_size
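
The one-line change can be illustrated outside the code base. The following is a minimal sketch, not the TensorRT-LLM implementation: `DummyRequest` and `pad_generation_requests` are hypothetical stand-ins, and the sketch assumes the padding request is created once and reused across calls, which the commit itself does not state. Under that assumption, resetting `py_draft_tokens` on every call keeps the dummy's draft token count equal to the current `runtime_draft_len`.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class DummyRequest:
    """Hypothetical stand-in for a generation request (not the real class)."""
    request_id: int
    py_draft_tokens: List[int] = field(default_factory=list)


def pad_generation_requests(requests: List[DummyRequest],
                            dummy: DummyRequest,
                            padded_batch_size: int,
                            runtime_draft_len: int) -> int:
    """Pad `requests` in place up to `padded_batch_size`; return the padding count."""
    padding_size = padded_batch_size - len(requests)
    if padding_size <= 0:
        return 0
    # Mirrors the added line: give the reused dummy placeholder draft tokens
    # whose count matches the draft length used for this run.
    dummy.py_draft_tokens = [0] * runtime_draft_len
    requests.extend([dummy] * padding_size)
    return padding_size


# Assumed scenario: one cached dummy, reused with different draft lengths.
dummy = DummyRequest(request_id=-1)
batch = [DummyRequest(request_id=1, py_draft_tokens=[11, 12, 13])]
added = pad_generation_requests(batch, dummy,
                                padded_batch_size=4, runtime_draft_len=3)
```

Without the reset, a dummy created under one draft length would keep its old token list when the function is later called with a different `runtime_draft_len`, so padded entries could disagree with the real requests on draft token count.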
