
[feature] : add cache-dit for stable-audio-open-1.0#1341

Open
akshatvishu wants to merge 7 commits into vllm-project:main from akshatvishu:cache-dit-sao

Conversation

@akshatvishu

@akshatvishu akshatvishu commented Feb 11, 2026

Part of #1217

Purpose

Add cache-dit support for Stable Audio Open 1.0.

Test Plan

    omni = Omni(
        model=MODEL_PATH,
        dtype="float16",
        num_workers=1,
        cache_backend=cache_backend,
        cache_config=cache_config,
    )

sampling_params = OmniDiffusionSamplingParams(
    num_inference_steps=100,
    guidance_scale=7.0,
    seed=42,
    extra_args={"audio_end_in_s": 10.0}
)

outputs = omni.generate(
    {"prompt": "The sound of a hammer hitting a wooden surface", "negative_prompt": "Low quality, noisy"},
    sampling_params
)
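The timings below were collected by wrapping the generation call in a simple wall-clock timer, along these lines (a minimal sketch: `time_generation` is an illustrative helper, not part of vllm-omni, and the lambda stands in for the `omni.generate(...)` call above):

```python
import time
from typing import Any, Callable


def time_generation(generate_fn: Callable[[], Any]) -> tuple[Any, float]:
    """Run one generation call and return (outputs, elapsed seconds)."""
    start = time.perf_counter()
    outputs = generate_fn()
    elapsed = time.perf_counter() - start
    return outputs, elapsed


# Stand-in for omni.generate(prompts, sampling_params):
outputs, elapsed = time_generation(lambda: "audio-bytes")
print(f"generation took {elapsed:.2f}s")
```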

Full, comprehensive testing can be found in this kaggle_notebook.

Test Result

  • Device: cuda

  • GPU: NVIDIA Tesla T4

  • Prompt: The sound of a hammer hitting a wooden surface

  • num_inference_steps=100

  • guidance_scale=7.0

  • max_audio_length = 10 seconds

Baseline:

| Configuration | Time | Speed Up (vs baseline) | File (mp3) |
| --- | --- | --- | --- |
| Baseline (OMNI) | 25.91 s | – | baseline.mp3 |
| Baseline (HF Diffusers) | 27.78 s | – | baseline_hf.mp3 |

Config1:

    "configA_balanced": {
        "Fn_compute_blocks": 2,
        "Bn_compute_blocks": 0,
        "max_warmup_steps": 4,
        "residual_diff_threshold": 0.22,
        "max_continuous_cached_steps": 3,
        "enable_taylorseer": True,
        "taylorseer_order": 1,
    },
| Configuration | Time | Speed Up (vs baseline) | File (mp3) |
| --- | --- | --- | --- |
| OMNI | 22.08 s | 1.17x | configA_balanced.mp3 |
| HF Diffusers + cache-dit | 24.69 s | 1.13x | configA_balanced_hf.mp3 |

Config2:

    "configB_aggressive": {
        "Fn_compute_blocks": 1,
        "Bn_compute_blocks": 0,
        "max_warmup_steps": 3,
        "residual_diff_threshold": 0.30,
        "max_continuous_cached_steps": 5,
        "enable_taylorseer": True,
        "taylorseer_order": 1,
    },
| Configuration | Time | Speed Up (vs baseline) | File (mp3) |
| --- | --- | --- | --- |
| OMNI | 20.15 s | 1.29x | configB_aggressive.mp3 |
| HF Diffusers + cache-dit | 24.05 s | 1.16x | configB_aggressive_hf.mp3 |

Config3:

    "configC_ultra": {
        "Fn_compute_blocks": 1,
        "Bn_compute_blocks": 0,
        "max_warmup_steps": 2,
        "residual_diff_threshold": 0.35,
        "max_continuous_cached_steps": 6,
        "enable_taylorseer": True,
        "taylorseer_order": 2,
    }
| Configuration | Time | Speed Up (vs baseline) | File (mp3) |
| --- | --- | --- | --- |
| OMNI | 19.16 s | 1.35x | configC_ultra.mp3 |
| HF Diffusers + cache-dit | 20.90 s | 1.33x | configC_ultra_hf.mp3 |
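The speed-up columns above are simply baseline time divided by cached time. As a sanity check, they can be reproduced from the raw timings in the tables (config names are shorthand for the configurations above):

```python
# Speed-up = baseline time / cached time, rounded to two decimals.
baseline_omni = 25.91  # seconds, OMNI baseline
baseline_hf = 27.78    # seconds, HF Diffusers baseline

omni_times = {"configA": 22.08, "configB": 20.15, "configC": 19.16}
hf_times = {"configA": 24.69, "configB": 24.05, "configC": 20.90}

omni_speedups = {k: round(baseline_omni / t, 2) for k, t in omni_times.items()}
hf_speedups = {k: round(baseline_hf / t, 2) for k, t in hf_times.items()}

print(omni_speedups)  # {'configA': 1.17, 'configB': 1.29, 'configC': 1.35}
print(hf_speedups)    # {'configA': 1.13, 'configB': 1.16, 'configC': 1.33}
```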

Files are in .mp3 format because GitHub doesn't support .wav attachments in comments.

Note :

  • Stable Audio Open 1.0 exhibits high natural step-to-step drift (median residual ≈0.34, as reported by cache_dit.summary() when running the same config as vllm-omni in the HF Diffusers + cache-dit setup). To achieve meaningful speedups on T4 hardware, the residual_diff_threshold must be near or above this drift value: a conservative threshold such as 0.12 yielded a 1.00x speedup (or even a slowdown), because the cache missed on nearly every step, leaving only the cache-management overhead without any compute savings.
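The threshold effect can be illustrated with a toy simulation (a hedged sketch only, not cache-dit's actual decision logic, which also involves warmup steps and continuous-cache limits): a step is served from cache only when its residual diff falls below the threshold, so a threshold well below the model's median drift caches almost nothing.

```python
import random


def cached_fraction(residuals: list[float], threshold: float) -> float:
    """Fraction of steps whose residual diff falls under the cache threshold."""
    hits = sum(1 for r in residuals if r < threshold)
    return hits / len(residuals)


random.seed(0)
# Toy drift profile centred near the observed median of ~0.34.
residuals = [random.gauss(0.34, 0.05) for _ in range(100)]

print(cached_fraction(residuals, 0.12))  # conservative threshold: almost nothing cached
print(cached_fraction(residuals, 0.35))  # threshold near the median: roughly half cached
```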

  • The vllm-omni orchestrator performs a 1-step dummy warmup run during server initialization. If a user provides an SCM (Step Computation Masking) policy, the engine crashes with the following error:

AssertionError: Only total_steps=4 or 6 is supported for predefined masks while total_steps < 8. Got total_steps=1.

Thus, I am wondering whether we should add a guard condition like the one below, or whether this is acceptable behavior.

def refresh_cache_context(pipeline: Any, num_inference_steps: int, verbose: bool = True) -> None:
    """
    Refresh the cache context.
    Guards against the 1-step dummy warmup causing SCM mask generation errors.
    """
    # Disable the SCM policy for the 1-step dummy warmup to prevent the AssertionError.
    # (cache_config is assumed to be available from the enclosing scope.)
    effective_mask_policy = cache_config.scm_steps_mask_policy if num_inference_steps > 1 else None
  • Also added the missing _repeated_blocks = ["StableAudioDiTBlock"] to StableAudioDiTModel to enable regional compilation and backend patching.
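A minimal sketch of what that last change amounts to (the classes here are stand-ins for diffusers' real StableAudioDiTModel and StableAudioDiTBlock; the attribute is a plain class-level list that regional-compilation and backend-patching utilities scan to find the network's repeating blocks):

```python
class StableAudioDiTBlock:
    """Stand-in for the real transformer block class."""


class StableAudioDiTModel:
    """Stand-in for the real DiT model class."""

    # Names of block classes that repeat through the network; regional
    # compilation and cache-dit's backend patching look this list up.
    _repeated_blocks = ["StableAudioDiTBlock"]


print(StableAudioDiTModel._repeated_blocks)  # ['StableAudioDiTBlock']
```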



@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: cf50517d5d

@SamitHuang SamitHuang added the ready label to trigger buildkite CI label Feb 12, 2026
@hsliuustc0106
Copy link
Collaborator

fix DCO please

@hsliuustc0106 hsliuustc0106 removed the ready label to trigger buildkite CI label Feb 12, 2026
Signed-off-by: akshatvishu <akshatnayak197@gmail.com>
…g warmup"

This reverts commit e4c5a1f.

Signed-off-by: akshatvishu <akshatnayak197@gmail.com>
@akshatvishu
Copy link
Author

@hsliuustc0106 Sorry! I've updated it!

Signed-off-by: akshatvishu <akshatnayak197@gmail.com>
…p/cache-dit

Signed-off-by: akshatvishu <akshatnayak197@gmail.com>
…tion

Signed-off-by: akshatvishu <akshatnayak197@gmail.com>
