
Commit 0c1bb01
Merge branch 'main' into feat/vllmomni_profiling
2 parents: 4aa67aa + 1e86404

10 files changed: +16 −47 lines

docs/.nav.yml
5 additions, 6 deletions

@@ -58,12 +58,6 @@ nav:
       - contributing/model/README.md
       - contributing/model/adding_omni_model.md
       - contributing/model/adding_diffusion_model.md
-      - Advanced Features:
-          - contributing/features/cfg_parallel.md
-          - contributing/features/sequence_parallel.md
-          - contributing/features/tensor_parallel.md
-          - contributing/features/cache_dit.md
-          - contributing/features/teacache.md
     - CI: contributing/ci
 - Design Documents:
     - design/index.md
@@ -72,6 +66,11 @@ nav:
     - design/feature/disaggregated_inference.md
     - design/feature/ray_based_execution.md
     - design/feature/omni_connectors/
+    - design/feature/cfg_parallel.md
+    - design/feature/sequence_parallel.md
+    - design/feature/tensor_parallel.md
+    - design/feature/cache_dit.md
+    - design/feature/teacache.md
 - Module Design:
     - design/module/ar_module.md
     - design/module/dit_module.md

docs/assets/WeChat.jpg
Binary file changed (3.55 KB)

docs/contributing/model/adding_diffusion_model.md
6 additions, 6 deletions

@@ -1,4 +1,4 @@
-# Adding a Diffusion Model to vLLM-Omni
+# Adding a Diffusion Model

 This guide walks you through adding a new diffusion model to vLLM-Omni. We use **Qwen-Image** as the primary example, with references to other models (LongCat, Flux, Wan2.2) to illustrate different patterns.

@@ -680,7 +680,7 @@ vLLM-Omni automatically compiles blocks in `_repeated_blocks` when `torch.compil
 ### Tensor Parallelism

-See detailed guide: [How to add Tensor Parallel support](../features/tensor_parallel.md)
+See detailed guide: [How to add Tensor Parallel support](../../design/feature/tensor_parallel.md)

 **Quick setup:**

@@ -694,7 +694,7 @@ omni = Omni(model="your-model", tensor_parallel_size=2)
 ### CFG Parallelism

-See detailed guide: [How to add CFG-Parallel support](../features/cfg_parallel.md)
+See detailed guide: [How to add CFG-Parallel support](../../design/feature/cfg_parallel.md)

 **Quick setup:**

@@ -708,7 +708,7 @@ omni = Omni(model="your-model", cfg_parallel_size=2)
 ### Sequence Parallelism

-See detailed guide: [How to add Sequence Parallel support](../features/sequence_parallel.md)
+See detailed guide: [How to add Sequence Parallel support](../../design/feature/sequence_parallel.md)

 **Quick setup:**

@@ -724,7 +724,7 @@ omni = Omni(model="your-model", ulysses_degree=2, ring_degree=2)
 #### TeaCache

-See detailed guide: [How to add TeaCache support](../features/teacache.md)
+See detailed guide: [How to add TeaCache support](../../design/feature/teacache.md)

 **Quick setup:**

@@ -744,7 +744,7 @@ omni = Omni(model="your-model",
 #### Cache-DiT

-See detailed guide: [How to add Cache-DiT support](../features/cache_dit.md)
+See detailed guide: [How to add Cache-DiT support](../../design/feature/cache_dit.md)

 **Quick setup:**
docs/contributing/features/cache_dit.md renamed to docs/design/feature/cache_dit.md
1 addition, 1 deletion

@@ -1,4 +1,4 @@
-# Support Cache-DiT
+# Cache-DiT

 This section describes how to add cache-dit acceleration to a new diffusion pipeline. We use the Qwen-Image pipeline and LongCat-Image pipeline as reference implementations.

docs/contributing/features/cfg_parallel.md renamed to docs/design/feature/cfg_parallel.md
1 addition, 1 deletion

@@ -1,4 +1,4 @@
-# Support CFG-Parallel
+# CFG-Parallel

 This section describes how to add CFG-Parallel (Classifier-Free Guidance Parallel) to a diffusion pipeline. We use the Qwen-Image pipeline as the reference implementation.

docs/contributing/features/sequence_parallel.md renamed to docs/design/feature/sequence_parallel.md
1 addition, 1 deletion

@@ -1,4 +1,4 @@
-# Support Sequence Parallel
+# Sequence Parallel

 This section describes how to add Sequence Parallel (SP) to a diffusion transformer model. We use the Qwen-Image transformer and Wan2.2 transformer as reference implementations.

docs/contributing/features/teacache.md renamed to docs/design/feature/teacache.md
1 addition, 1 deletion

@@ -1,4 +1,4 @@
-# Support TeaCache
+# TeaCache

 This section describes how to add TeaCache to a diffusion transformer model. We use the Qwen-Image transformer as the reference implementation.

docs/contributing/features/tensor_parallel.md renamed to docs/design/feature/tensor_parallel.md
1 addition, 1 deletion

@@ -1,4 +1,4 @@
-# Support Tensor Parallel
+# Tensor Parallel

 This section describes how to add Tensor Parallel (TP) to a diffusion transformer model. We use the Z-Image transformer as the reference implementation.

vllm_omni/worker/gpu_ar_model_runner.py
0 additions, 24 deletions

@@ -192,15 +192,6 @@ def execute_model(
             num_encoder_reqs=len(scheduler_output.scheduled_encoder_inputs),
         )

-        logger.debug(
-            "Running batch with cudagraph_mode: %s, batch_descriptor: %s, "
-            "should_ubatch: %s, num_tokens_across_dp: %s",
-            cudagraph_mode,
-            batch_desc,
-            should_ubatch,
-            num_tokens_across_dp,
-        )
-
        num_tokens_padded = batch_desc.num_tokens
        num_reqs_padded = batch_desc.num_reqs if batch_desc.num_reqs is not None else num_reqs
        ubatch_slices, ubatch_slices_padded = maybe_create_ubatch_slices(
@@ -211,12 +202,6 @@ def execute_model(
            self.parallel_config.num_ubatches,
        )

-        logger.debug(
-            "ubatch_slices: %s, ubatch_slices_padded: %s",
-            ubatch_slices,
-            ubatch_slices_padded,
-        )
-
        pad_attn = cudagraph_mode == CUDAGraphMode.FULL

        use_spec_decode = len(scheduler_output.scheduled_spec_decode_tokens) > 0
@@ -308,15 +293,6 @@ def execute_model(
            aux_hidden_states = None

        hidden_states, multimodal_outputs = self.extract_multimodal_outputs(model_output)
-        if multimodal_outputs is not None:
-            keys_or_type = (
-                list(multimodal_outputs.keys())
-                if isinstance(multimodal_outputs, dict)
-                else type(multimodal_outputs)
-            )
-            logger.debug(f"[AR] execute_model: multimodal_outputs keys = {keys_or_type}")
-        else:
-            logger.debug("[AR] execute_model: multimodal_outputs is None")

        if not self.broadcast_pp_output:
            # Common case.
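The removed blocks mixed two logging styles: lazy `%s` formatting, whose arguments are only interpolated when DEBUG is enabled, and eager f-strings, which pay the formatting cost on every call even when the record is dropped. A minimal sketch of the difference (illustrative only, not code from this repository):

```python
import logging

logger = logging.getLogger("demo")
logger.setLevel(logging.INFO)  # DEBUG is disabled, as in production

class Costly:
    """Tracks how often its string form is computed."""
    calls = 0
    def __str__(self):
        Costly.calls += 1
        return "costly"

c = Costly()

# Lazy %-style: the logger checks the level first and never calls str(c).
logger.debug("value: %s", c)
lazy_calls = Costly.calls   # still 0

# Eager f-string: str(c) runs before logger.debug is even entered.
logger.debug(f"value: {c}")
eager_calls = Costly.calls  # now 1
```

This is one reason deleting per-batch debug logs from a hot path like `execute_model` saves work even when debug logging is switched off.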

vllm_omni/worker/gpu_model_runner.py
0 additions, 6 deletions

@@ -1151,12 +1151,6 @@ def _model_forward(
        """Inject omni-specific kwargs into forward and cache model output"""
        model_kwargs_extra = self._build_model_kwargs_extra()

-        runtime_info = model_kwargs_extra.get("runtime_additional_information", [])
-        if runtime_info:
-            for i, info in enumerate(runtime_info):
-                if info:
-                    logger.debug(f"[OMNI] req[{i}] runtime_additional_information keys: {list(info.keys())}")
-
        model_output = super()._model_forward(
            input_ids=input_ids,
            positions=positions,
