vllm/model_executor/layers/fused_moe/runner: 1 file changed (+1, -7 lines)
```diff
@@ -147,14 +147,8 @@ def _run_in_aux_stream(
     ) -> torch.Tensor:
         # TODO: assert that maybe_setup_shared_experts_stream has been called.

-        # Run shared experts in parallel on a separate stream
-        # NOTE: We start the separate stream here and mark the
-        # sync end point immediately after it is done. This is
-        # important to avoid excessive stream allocations by the cuda
-        # graph replay later.
+        # Run shared experts in parallel on a separate stream.
         with torch.cuda.stream(self._stream):
-            # Note that hidden_states clone() is necessary here to avoid
-            # conflict with the main stream
             output = self._layer(shared_experts_input)
         current_stream().wait_stream(self._stream)
```