Commit 3a5927c (parent 5662826)
Author: Robert Shaw <rshaw@neuralmagic.com>

    comment clean up

    Signed-off-by: Robert Shaw <rshaw@neuralmagic.com>

File tree: 1 file changed, 1 addition, 7 deletions

vllm/model_executor/layers/fused_moe/runner/shared_experts.py (1 addition, 7 deletions)

@@ -147,14 +147,8 @@ def _run_in_aux_stream(
     ) -> torch.Tensor:
         # TODO: assert that maybe_setup_shared_experts_stream has been called.
 
-        # Run shared experts in parallel on a separate stream
-        # NOTE: We start the separate stream here and mark the
-        # sync end point immediately after it is done. This is
-        # important to avoid excessive stream allocations by the cuda
-        # graph replay later.
+        # Run shared experts in parallel on a separate stream.
         with torch.cuda.stream(self._stream):
-            # Note that hidden_states clone() is necessary here to avoid
-            # conflict with the main stream
             output = self._layer(shared_experts_input)
         current_stream().wait_stream(self._stream)
 
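The comment the commit keeps describes a common overlap pattern: launch the shared-experts forward on an auxiliary CUDA stream, then make the current stream wait on that stream before consuming the result. A minimal sketch of the pattern follows; `run_with_aux_stream` and its arguments are illustrative names, not vLLM's API, and the sketch falls back to the default stream when no CUDA device is available.

```python
import torch


def run_with_aux_stream(layer, shared_input):
    # On CUDA, run the shared-experts computation on a separate stream
    # so it can overlap with work queued on the default stream.
    aux = torch.cuda.Stream() if torch.cuda.is_available() else None
    with torch.cuda.stream(aux):  # no-op context when aux is None
        shared_out = layer(shared_input)
    if aux is not None:
        # Make the current stream wait for the aux stream before any
        # consumer reads shared_out, mirroring wait_stream() in the diff.
        torch.cuda.current_stream().wait_stream(aux)
    return shared_out
```

Marking the synchronization point immediately after the aux-stream work, as the diff does, keeps the dependency explicit at the point where the result becomes safe to read on the main stream.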
