
[feature] : add cache-dit for stable-audio-open-1.0#1341

Open
akshatvishu wants to merge 7 commits into vllm-project:main from akshatvishu:cache-dit-sao

Conversation

@akshatvishu

@akshatvishu akshatvishu commented Feb 11, 2026

Part of #1217

Purpose

Add cache-dit support for Stable Audio Open 1.0.

Test Plan

    omni = Omni(
        model=MODEL_PATH,
        dtype="float16",
        num_workers=1,
        cache_backend=cache_backend,
        cache_config=cache_config,
    )

sampling_params = OmniDiffusionSamplingParams(
    num_inference_steps=100,
    guidance_scale=7.0,
    seed=42,
    extra_args={"audio_end_in_s": 10.0}
)

outputs = omni.generate(
    {"prompt": "The sound of a hammer hitting a wooden surface", "negative_prompt": "Low quality, noisy"},
    sampling_params
)
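The timings below were collected by wrapping the generation call in a simple wall-clock timer, along these lines (a minimal sketch: `time_generation` is an illustrative helper, not part of vllm-omni, and the lambda stands in for the `omni.generate(...)` call above):

```python
import time
from typing import Any, Callable


def time_generation(generate_fn: Callable[[], Any]) -> tuple[Any, float]:
    """Run one generation call and return (outputs, elapsed seconds)."""
    start = time.perf_counter()
    outputs = generate_fn()
    elapsed = time.perf_counter() - start
    return outputs, elapsed


# Stand-in for omni.generate(prompts, sampling_params):
outputs, elapsed = time_generation(lambda: "audio-bytes")
print(f"generation took {elapsed:.2f}s")
```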

Full, comprehensive testing can be found in this kaggle_notebook.

Test Result

  • Device: cuda

  • GPU: NVIDIA Tesla T4

  • Prompt: The sound of a hammer hitting a wooden surface

  • num_inference_steps=100

  • guidance_scale=7.0

  • max_audio_length = 10 seconds

Baseline:

| Configuration | Time | Speed Up (vs baseline) | File (mp3) |
| --- | --- | --- | --- |
| Baseline (OMNI) | 25.91 s | – | baseline.mp3 |
| Baseline (HF Diffusers) | 27.78 s | – | baseline_hf.mp3 |

Config1:

    "configA_balanced": {
        "Fn_compute_blocks": 2,
        "Bn_compute_blocks": 0,
        "max_warmup_steps": 4,
        "residual_diff_threshold": 0.22,
        "max_continuous_cached_steps": 3,
        "enable_taylorseer": True,
        "taylorseer_order": 1,
    },
| Configuration | Time | Speed Up (vs baseline) | File (mp3) |
| --- | --- | --- | --- |
| OMNI | 22.08 s | 1.17x | configA_balanced.mp3 |
| HF Diffusers + cache-dit | 24.69 s | 1.13x | configA_balanced_hf.mp3 |

Config2:

    "configB_aggressive": {
        "Fn_compute_blocks": 1,
        "Bn_compute_blocks": 0,
        "max_warmup_steps": 3,
        "residual_diff_threshold": 0.30,
        "max_continuous_cached_steps": 5,
        "enable_taylorseer": True,
        "taylorseer_order": 1,
    },
| Configuration | Time | Speed Up (vs baseline) | File (mp3) |
| --- | --- | --- | --- |
| OMNI | 20.15 s | 1.29x | configB_aggressive.mp3 |
| HF Diffusers + cache-dit | 24.05 s | 1.16x | configB_aggressive_hf.mp3 |

Config3:

    "configC_ultra": {
        "Fn_compute_blocks": 1,
        "Bn_compute_blocks": 0,
        "max_warmup_steps": 2,
        "residual_diff_threshold": 0.35,
        "max_continuous_cached_steps": 6,
        "enable_taylorseer": True,
        "taylorseer_order": 2,
    }
| Configuration | Time | Speed Up (vs baseline) | File (mp3) |
| --- | --- | --- | --- |
| OMNI | 19.16 s | 1.35x | configC_ultra.mp3 |
| HF Diffusers + cache-dit | 20.90 s | 1.33x | configC_ultra_hf.mp3 |
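The speed-up columns above are simply baseline time divided by cached time. As a sanity check, they can be reproduced from the raw timings in the tables (config names are shorthand for the configurations above):

```python
# Speed-up = baseline time / cached time, rounded to two decimals.
baseline_omni = 25.91  # seconds, OMNI baseline
baseline_hf = 27.78    # seconds, HF Diffusers baseline

omni_times = {"configA": 22.08, "configB": 20.15, "configC": 19.16}
hf_times = {"configA": 24.69, "configB": 24.05, "configC": 20.90}

omni_speedups = {k: round(baseline_omni / t, 2) for k, t in omni_times.items()}
hf_speedups = {k: round(baseline_hf / t, 2) for k, t in hf_times.items()}

print(omni_speedups)  # {'configA': 1.17, 'configB': 1.29, 'configC': 1.35}
print(hf_speedups)    # {'configA': 1.13, 'configB': 1.16, 'configC': 1.33}
```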

Files are in .mp3 format because GitHub doesn't support .wav attachments in comments.

Note :

  • Stable Audio Open 1.0 exhibits high natural step-to-step drift (median residual ≈0.34, as reported by cache_dit.summary() when running the same config as vllm-omni in the HF Diffusers + cache-dit setup). To achieve meaningful speedups on T4 hardware, the residual_diff_threshold must be near or above this drift value: a conservative threshold such as 0.12 yielded a 1.00x speedup (or even a slowdown), because the cache missed on nearly every step, leaving only the cache-management overhead without any compute savings.
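The threshold effect can be illustrated with a toy simulation (a hedged sketch only, not cache-dit's actual decision logic, which also involves warmup steps and continuous-cache limits): a step is served from cache only when its residual diff falls below the threshold, so a threshold well below the model's median drift caches almost nothing.

```python
import random


def cached_fraction(residuals: list[float], threshold: float) -> float:
    """Fraction of steps whose residual diff falls under the cache threshold."""
    hits = sum(1 for r in residuals if r < threshold)
    return hits / len(residuals)


random.seed(0)
# Toy drift profile centred near the observed median of ~0.34.
residuals = [random.gauss(0.34, 0.05) for _ in range(100)]

print(cached_fraction(residuals, 0.12))  # conservative threshold: almost nothing cached
print(cached_fraction(residuals, 0.35))  # threshold near the median: roughly half cached
```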

  • The vllm-omni orchestrator performs a 1-step dummy warmup run during server initialization. If a user provides an SCM (Step Computation Masking) policy, the engine crashes with the following error:

AssertionError: Only total_steps=4 or 6 is supported for predefined masks while total_steps < 8. Got total_steps=1.

Thus, I am wondering whether we should add a guard condition like the one below, or whether this is acceptable behavior.

def refresh_cache_context(pipeline: Any, num_inference_steps: int, verbose: bool = True) -> None:
    """
    Refresh the cache context.
    Guards against the 1-step dummy warmup causing SCM mask generation errors.
    """
    # Disable the SCM policy for the 1-step dummy warmup to prevent the AssertionError.
    # (cache_config is assumed to be available from the enclosing scope.)
    effective_mask_policy = cache_config.scm_steps_mask_policy if num_inference_steps > 1 else None
  • Also added the missing _repeated_blocks = ["StableAudioDiTBlock"] to StableAudioDiTModel to enable regional compilation and backend patching.
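A minimal sketch of what that last change amounts to (the classes here are stand-ins for diffusers' real StableAudioDiTModel and StableAudioDiTBlock; the attribute is a plain class-level list that regional-compilation and backend-patching utilities scan to find the network's repeating blocks):

```python
class StableAudioDiTBlock:
    """Stand-in for the real transformer block class."""


class StableAudioDiTModel:
    """Stand-in for the real DiT model class."""

    # Names of block classes that repeat through the network; regional
    # compilation and cache-dit's backend patching look this list up.
    _repeated_blocks = ["StableAudioDiTBlock"]


print(StableAudioDiTModel._repeated_blocks)  # ['StableAudioDiTBlock']
```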



@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: cf50517d5d

@SamitHuang SamitHuang added the ready label to trigger buildkite CI label Feb 12, 2026
@hsliuustc0106
Copy link
Collaborator

fix DCO please

@hsliuustc0106 hsliuustc0106 removed the ready label to trigger buildkite CI label Feb 12, 2026
Signed-off-by: akshatvishu <akshatnayak197@gmail.com>
…g warmup"

This reverts commit e4c5a1f.

Signed-off-by: akshatvishu <akshatnayak197@gmail.com>
@akshatvishu
Copy link
Author

@hsliuustc0106 Sorry! I've updated it!

Signed-off-by: akshatvishu <akshatnayak197@gmail.com>
…p/cache-dit

Signed-off-by: akshatvishu <akshatnayak197@gmail.com>
…tion

Signed-off-by: akshatvishu <akshatnayak197@gmail.com>
