Commit 37504c8

fix test
Signed-off-by: Yibin Li <109242046+yibinl-nvidia@users.noreply.github.com>
1 parent 04b992b commit 37504c8

1 file changed, +2 -1 lines changed

tensorrt_llm/_torch/pyexecutor/py_executor.py

Lines changed: 2 additions & 1 deletion
@@ -2613,7 +2613,8 @@ def _handle_responses(self):
     if request.is_finished:
         # Finalize any remaining logits transfers for the finished request in chunked mode
         if request.py_use_chunked_generation_logits and request.py_return_generation_logits:
-            request.py_result.transfer_remaining_device_logits()
+            with torch.inference_mode():
+                request.py_result.transfer_remaining_device_logits()

     request_done = False
     if request.py_decoding_iter == 1 or request.is_finished or \
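
The commit message does not say why the call is now wrapped in `torch.inference_mode()`. One plausible reason, stated here as an assumption and not confirmed by the commit, is that the remaining logits are copied into a tensor that was created under inference mode: PyTorch rejects in-place updates to such tensors outside inference mode. The sketch below reproduces that behaviour in isolation. `host_buffer`, `new_logits`, and `try_inplace_copy` are hypothetical names for illustration and are not part of `py_executor.py`.

```python
import torch

# Hypothetical stand-in for a buffer that was allocated under inference mode,
# which makes it an "inference tensor".
with torch.inference_mode():
    host_buffer = torch.zeros(4)


def try_inplace_copy(dst, src):
    """Return True if the in-place copy succeeds, False if PyTorch rejects it."""
    try:
        dst.copy_(src)
        return True
    except RuntimeError:
        # Raised for in-place updates to an inference tensor outside inference mode.
        return False


# A regular tensor created outside inference mode.
new_logits = torch.ones(4)

# Outside inference mode the in-place write is rejected.
outside_ok = try_inplace_copy(host_buffer, new_logits)

# Inside inference mode the same write is allowed.
with torch.inference_mode():
    inside_ok = try_inplace_copy(host_buffer, new_logits)

print(outside_ok, inside_ok)
```

If this is indeed the failure the test hit, wrapping only the final transfer keeps the change local to the finished-request path and leaves the rest of `_handle_responses` untouched.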
