[Feat] support for multi-block layerwise offloading #1486
RuixiangMa wants to merge 4 commits into vllm-project:main
Conversation
Signed-off-by: Lancer <maruixiang6688@gmail.com>
```python
# Handle multiple block types (_layerwise_offload_blocks_attrs)
if blocks_attr_name is None:
    blocks_attrs_names = getattr(model.__class__, "_layerwise_offload_blocks_attrs", None)
```
IMO having both _layerwise_offload_blocks_attrs and _layerwise_offload_blocks_attr is a little confusing. I think it would be cleaner to just have one attr that can also be a list, because the behavior is not well-defined if a module sets both attributes by mistake
Yeah, accounted for that, only kept the legacy path for compatibility, but can refactor if needed.
```python
def __init__(self):
    self.blocks = nn.ModuleList([...])  # Transformer blocks
```
This PR adds multi-block layerwise offloading but provides no test coverage. Add tests to verify: (1) multi-block offloading works correctly with different block types, (2) memory usage is reduced as expected, (3) output quality is maintained, and (4) edge cases like empty or invalid block attributes are handled.
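A minimal stand-in for what such a test could look like (the `gather_offload_blocks` helper and `FakeModel` are hypothetical stand-ins, not the PR's real API): it checks that multi-block gathering keeps per-attribute order and that empty or missing attributes contribute nothing.

```python
def gather_offload_blocks(model, attr_names):
    # Collect blocks from every declared attribute, in declaration order.
    gathered = []
    for name in attr_names:
        gathered.extend(getattr(model, name, []) or [])
    return gathered

class FakeModel:
    transformer_blocks = ["t0", "t1"]
    single_transformer_blocks = ["s0"]
    empty_blocks = []

m = FakeModel()
# (1) multiple block types are gathered in order
assert gather_offload_blocks(
    m, ["transformer_blocks", "single_transformer_blocks"]) == ["t0", "t1", "s0"]
# (4) empty or missing attributes yield no blocks
assert gather_offload_blocks(m, ["empty_blocks"]) == []
assert gather_offload_blocks(m, ["missing"]) == []
```

Memory-usage and output-quality checks would still need integration tests against real models.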
```python
if not blocks_attr_name or not blocks:
    if not blocks:
        logger.warning(
```
No validation for blocks_attr_names. What happens if an attribute name doesn't exist on the model? Add error handling to check that each attribute in _layerwise_offload_blocks_attrs exists and contains valid blocks, with clear error messages for misconfiguration.
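One possible shape for that validation (the `check_block_attrs` name is hypothetical; only the class-attribute name comes from the diff):

```python
def check_block_attrs(model, attr_names):
    """Fail fast with an actionable message instead of silently offloading nothing."""
    for name in attr_names:
        blocks = getattr(model, name, None)
        if blocks is None:
            raise ValueError(
                f"_layerwise_offload_blocks_attrs names {name!r}, but "
                f"{type(model).__name__} has no attribute {name!r}")
        if len(blocks) == 0:
            raise ValueError(
                f"{type(model).__name__}.{name} is empty; remove it from "
                f"_layerwise_offload_blocks_attrs or populate it")
```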
```python
m.to(self.device)

# Move top-level params/buffers to GPU (dit_module's own, not sub-modules)
for param in dit_module._parameters.values():
```
This changes the offloading behavior from single 'blocks' attribute to multiple block attributes. Verify backward compatibility - existing models with only 'blocks' should still work. Consider adding a deprecation warning if the old single-attribute pattern is detected.
The single-block model test has been verified. I'll supplement with the results.
lishunyang12 left a comment
Left a couple comments on the backend changes. The multi-block approach looks right for Flux-style models.
```python
m.to(self.device)
logger.debug(f"Moved {name} to device {self.device}")
if blocks_attr_names and name not in blocks_attr_names:
    m.to(self.device)
```
The old code had logger.debug calls here for skipped/moved modules. Dropping them makes offloading issues harder to debug — can you keep the logging?
```python
for param in dit_module._parameters.values():
    if param is not None:
        param.data = param.data.to(self.device, non_blocking=True)
```
Moving top-level params/buffers looks like a separate bug fix (previously they would stay on CPU). Worth calling out in the PR description so it does not get overlooked during review.
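A sketch of what the combined fix could look like, covering buffers as well as parameters (the function name `move_top_level_tensors` is hypothetical; the `_parameters` loop mirrors the diff):

```python
import torch

def move_top_level_tensors(module: torch.nn.Module, device) -> None:
    # Move only the module's own params/buffers; sub-modules (the blocks)
    # remain under layerwise offload control.
    for param in module._parameters.values():
        if param is not None:
            param.data = param.data.to(device, non_blocking=True)
    for name, buf in module._buffers.items():
        if buf is not None:
            module._buffers[name] = buf.to(device, non_blocking=True)
```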
```python
logger.debug(f"Skipped blocks module {name}")
continue
m.to(self.device)
logger.debug(f"Moved {name} to device {self.device}")
```
Nit: the `blocks_attr_names and` guard is redundant; we already `continue` above when `not blocks`, and a non-empty `blocks` implies `blocks_attr_names` is non-empty.
Z-Image is also supported in the PR to validate memory savings.
Purpose
Some diffusion models (e.g., Flux, LongCat, Ovis) have two types of transformer blocks (e.g., `transformer_blocks` and `single_transformer_blocks`). The previous implementation supported only a single block type, limiting layerwise offloading effectiveness for these models.
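To make the "two block types" concrete, here is a simplified Flux-style module (the sizes and layer types are illustrative, not the real Flux architecture; the class-attribute name follows this PR):

```python
import torch.nn as nn

class FluxLikeTransformer(nn.Module):
    # Declare both block lists for layerwise offloading.
    _layerwise_offload_blocks_attrs = ["transformer_blocks",
                                       "single_transformer_blocks"]

    def __init__(self, n_double: int = 2, n_single: int = 2, dim: int = 8):
        super().__init__()
        self.transformer_blocks = nn.ModuleList(
            nn.Linear(dim, dim) for _ in range(n_double))
        self.single_transformer_blocks = nn.ModuleList(
            nn.Linear(dim, dim) for _ in range(n_single))
```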
Test Plan
Test Result
NVIDIA 4090 (24 GB)

`vllm serve --model /data/models/black-forest-labs/FLUX* --omni --enable_layerwise_offload --port 8004`

Offload vs. no offload
Since FLUX.1-dev, FLUX.2-klein-9B, et al. incur OOM without layer offloading, we use FLUX.2-klein-4B and Z-Image as representative examples to illustrate memory usage:

| Model | No offload | Layerwise offload |
| --- | --- | --- |
| FLUX.2-klein-4B | 19.7 GB | 13.8 GB |
| Z-Image | 22.7 GB | 15.5 GB |