
[feature] Add TeaCache Support to Glm Image#1458

Open
akshatvishu wants to merge 15 commits into vllm-project:main from akshatvishu:teacache/glm_image

Conversation

@akshatvishu

Part of #1217

Purpose

Add TeaCache Support to Glm Image

Test Plan

Initial coefficient estimation and testing done at this colab notebook

Test Result



- Implement GLM-Image CacheContext extractor
- Add GLMImageAdapter for coefficient estimation

Signed-off-by: akshatvishu <akshatnayak197@gmail.com>
- Remove collected_data tensor storage
- Compute and store scalar L1 diffs immediately
- Prevent RAM growth for large calibration runs
- No change to coefficient math
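The streaming change described above can be sketched as follows (a NumPy illustration under assumed tensor shapes, not the project's actual code):

```python
import numpy as np

def rel_l1(prev: np.ndarray, curr: np.ndarray) -> float:
    """Relative L1 distance between consecutive modulated inputs."""
    return float(np.abs(curr - prev).mean() / (np.abs(prev).mean() + 1e-8))

class StreamingL1Collector:
    """Stores only scalar diffs, so RAM stays flat for long calibration runs."""

    def __init__(self):
        self.prev = None
        self.diffs: list[float] = []  # scalars, never full tensors

    def update(self, modulated_input: np.ndarray) -> None:
        if self.prev is not None:
            self.diffs.append(rel_l1(self.prev, modulated_input))
        self.prev = modulated_input  # keep only the most recent tensor

# Example: 10 fake denoising steps
collector = StreamingL1Collector()
rng = np.random.default_rng(0)
x = rng.standard_normal((4, 64))
for _ in range(10):
    x = x + 0.1 * rng.standard_normal(x.shape)
    collector.update(x)
print(len(collector.diffs))  # 9 scalar diffs for 10 steps
```

Because only one tensor (the previous step's input) is ever retained, memory no longer grows with the number of calibration steps, and the scalar diffs feed the same coefficient math as before.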

Signed-off-by: akshatvishu <akshatnayak197@gmail.com>
Added `self.do_true_cfg = True` to GlmImageTransformer2DModel initialization.

Signed-off-by: akshatvishu <akshatnayak197@gmail.com>
@akshatvishu
Author

akshatvishu commented Feb 24, 2026

Question 1: Coefficient Validity Without do_true_cfg = True

During coefficient estimation, GlmImageTransformer2DModel did not have self.do_true_cfg = True set.
The estimation pipeline performed full forward passes and collected modulated input/output tensors directly via the extractor. Since the extractor operates on the transformer forward pass itself (independent of CFG branch state handling), I would expect the collected tensors to remain valid.

However, I’m unsure whether the missing do_true_cfg flag could have caused:

  • unintended mixing of CFG branches during forward execution, or

  • altered internal behavior that would invalidate the polynomial fit training data.
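Assuming the collected tensors are valid, the downstream fit itself is straightforward: TeaCache-style estimation fits a polynomial mapping the input-side relative L1 distance to the output-side distance. A rough NumPy sketch with synthetic data (not the actual estimation code; the coefficients and data here are made up):

```python
import numpy as np

# Hypothetical calibration data: per-step relative L1 distances of the
# modulated inputs (x) and of the transformer outputs (y). Real data
# would come from the extractor's collected forward-pass tensors.
rng = np.random.default_rng(42)
x = np.sort(rng.uniform(0.01, 0.5, 200))
y = 0.3 * x + 2.0 * x**2 + 0.02 * np.abs(rng.standard_normal(200))

# Fit a 4th-order polynomial that rescales the input-side distance
# into a predicted output-side distance.
coefficients = np.polyfit(x, y, deg=4)
print(coefficients.shape)  # (5,): highest-order term first
```

If the missing flag only altered branch bookkeeping and not the forward computation itself, the (x, y) pairs and therefore the fit should be unaffected; if it changed what the forward pass computes, the fit would be trained on the wrong distribution.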


Question 2: After adding self.do_true_cfg = True to GlmImageTransformer2DModel, TeaCache still shows no speedup:

Baseline: ~31s

TeaCache: ~32s

Additionally, the log line:

"TeaCache applied with..."

never appears in subprocess logs, so I cannot confirm whether the hook is firing during inference. I may be missing an env flag in Colab that enables these logs: when I tested the Z Image TeaCache integration, the "TeaCache applied with..." line likewise did not appear, yet generation time was significantly lower than baseline, so the cache was clearly active. All of this testing can be seen in this colab notebook

So far, I’ve verified:

  • TeaCacheConfig initializes correctly with GLM coefficients
  • apply_teacache_hook() attaches correctly in the main process
  • get_cache_backend("tea_cache") returns a valid backend
  • pipeline.transformer exists and is the expected class
  • _WrappedForward correctly wraps forward() in isolation
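For context, the per-step gating decision the hook is expected to run looks roughly like this (a generic TeaCache sketch with made-up coefficients and distances, not this repo's implementation):

```python
import numpy as np

def should_skip(accumulated: float, rel_l1_dist: float,
                coefficients: np.ndarray, threshold: float) -> tuple[bool, float]:
    """Rescale the input-side distance with the fitted polynomial and
    accumulate it; skip the transformer while the total stays under threshold."""
    accumulated += float(np.polyval(coefficients, rel_l1_dist))
    if accumulated < threshold:
        return True, accumulated      # reuse the cached residual
    return False, 0.0                 # run a full forward, reset accumulator

# Tiny demo: identity rescale (coefficients are placeholders)
coeffs = np.array([0.0, 0.0, 0.0, 1.0, 0.0])
acc, skipped = 0.0, 0
for dist in [0.02, 0.03, 0.02, 0.08, 0.01]:
    skip, acc = should_skip(acc, dist, coeffs, threshold=0.1)
    skipped += skip
print(skipped)  # 4 of 5 steps skipped
```

If the wrapped forward never reaches this decision (e.g. the hook is bypassed), every step runs the full transformer and the timing matches baseline, which is consistent with the ~31s vs. ~32s numbers above.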

@lishunyang12 (Contributor) left a comment


Left a few comments. The biggest issue is do_true_cfg = True hardcoded in the transformer __init__ — pretty sure that's why the hook never fires correctly (see inline comment).

Also, estimate_teacache_coefficients() is now dead code after inlining its logic into estimate(). Either remove it or keep calling it — having both is confusing.

Same print() statements exist in the standalone function too (lines 132-135 on main), so those should get cleaned up as part of this.

Signed-off-by: akshatvishu <akshatnayak197@gmail.com>
… print with logger

Signed-off-by: akshatvishu <akshatnayak197@gmail.com>
…ew one in forward()

Signed-off-by: akshatvishu <akshatnayak197@gmail.com>
@akshatvishu
Author

akshatvishu commented Feb 28, 2026

Thanks again for the quick review!

Left a few comments. The biggest issue is do_true_cfg = True hardcoded in the transformer __init__ — pretty sure that's why the hook never fires correctly (see inline comment).

Yes, this was indeed the main issue; I am now seeing a 1.3x speedup vs. baseline when testing in the colab, under the heading "Testing Teacache". I suspect the AR stage is a fixed cost (since TeaCache only optimizes the DiT part), which makes the total speedup look smaller. Once my Colab credits are back, I want to profile the AR and DiT stages separately to see the exact impact on the denoising loop.
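The fixed-AR-cost intuition can be checked with a quick Amdahl-style calculation. The stage timings below are made-up placeholders pending actual profiling, not measured numbers:

```python
def end_to_end_speedup(t_ar: float, t_dit: float, dit_speedup: float) -> float:
    """Overall speedup when only the DiT stage is accelerated
    and the AR prior stage remains a fixed cost."""
    baseline = t_ar + t_dit
    accelerated = t_ar + t_dit / dit_speedup
    return baseline / accelerated

# Hypothetical split of the ~31s baseline: 10s AR prior + 21s DiT denoising.
# Even a 2x DiT speedup then yields only ~1.5x end to end.
print(round(end_to_end_speedup(10.0, 21.0, 2.0), 2))  # 1.51
```

So a 1.3x end-to-end number is entirely plausible even if TeaCache is delivering a much larger speedup inside the denoising loop itself.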

Also, estimate_teacache_coefficients() is now dead code after inlining its logic into estimate(). Either remove it or keep calling it — having both is confusing.

I've removed the dead code!

Same print() statements exist in the standalone function too (lines 132-135 on main), so those should get cleaned up as part of this.

Cleaned this up too!

Signed-off-by: akshatvishu <33392262+akshatvishu@users.noreply.github.com>
@akshatvishu akshatvishu marked this pull request as ready for review February 28, 2026 12:47

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 6d39eb7a4d


od_config = OmniDiffusionConfig.from_kwargs(model=model_path, dtype=dtype)
od_config.model_class_name = "GlmImagePipeline"

tf_config_path = hf_hub_download(model_path, "transformer/config.json")


P1: Handle local checkpoints when reading GLM transformer config

GlmImageAdapter.load_pipeline unconditionally calls hf_hub_download(model_path, "transformer/config.json"), which treats model_path as a Hub repo id; when callers pass a local model directory (a supported pattern elsewhere in this repo), this path resolution fails before the pipeline is built. That makes TeaCache coefficient estimation for GLM unusable in local/offline setups and breaks parity with other loaders that accept filesystem paths.
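One possible fix, sketched below: branch on whether model_path is a local directory before falling back to the Hub. The helper name is hypothetical; only hf_hub_download comes from the snippet above.

```python
import os
import tempfile

def resolve_transformer_config(model_path: str) -> str:
    """Resolve transformer/config.json for both local checkpoint
    directories and Hub repo ids (hypothetical helper)."""
    if os.path.isdir(model_path):
        local = os.path.join(model_path, "transformer", "config.json")
        if not os.path.exists(local):
            raise FileNotFoundError(local)
        return local
    # Only hit the Hub when given a repo id.
    from huggingface_hub import hf_hub_download
    return hf_hub_download(model_path, "transformer/config.json")

# Demo with a throwaway local checkpoint directory
root = tempfile.mkdtemp()
os.makedirs(os.path.join(root, "transformer"))
with open(os.path.join(root, "transformer", "config.json"), "w") as f:
    f.write("{}")
print(resolve_transformer_config(root).endswith("config.json"))  # True
```

This keeps offline coefficient estimation working while leaving the existing Hub path untouched for repo ids.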


Comment on lines +194 to 197
sampling_params = OmniDiffusionSamplingParams(
num_inference_steps=generate_kwargs.get("num_inference_steps", 20),
seed=generate_kwargs.get("seed", 42),
)


P1: Pass GLM prior tokens into coefficient collection requests

collect_from_prompt only populates num_inference_steps and seed, then sends a plain prompt; for GlmImagePipeline, non-warmup requests require prior_token_ids (and optionally prior_token_image_ids) in request extras, otherwise pipeline_glm_image.forward raises a ValueError. As written, the new GLM estimator path cannot collect data from normal prompts, so the advertised support is functionally broken unless users bypass this API.
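A minimal sketch of the extras a non-warmup GLM request would need to carry, per this review comment. Only the field names prior_token_ids and prior_token_image_ids come from the comment; the dict shape and helper are assumptions:

```python
def build_glm_request_extras(prior_token_ids, prior_token_image_ids=None):
    """Hypothetical helper: assemble the request extras that
    pipeline_glm_image.forward requires on non-warmup calls."""
    extras = {"prior_token_ids": prior_token_ids}
    if prior_token_image_ids is not None:
        extras["prior_token_image_ids"] = prior_token_image_ids
    return extras

extras = build_glm_request_extras([101, 102, 103])
print(sorted(extras))  # ['prior_token_ids']
```

collect_from_prompt would need to attach something like this alongside num_inference_steps and seed for the GLM estimator path to work on normal prompts.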


@akshatvishu
Author

It seems the new GlmImagePipeline requires prior_token_ids in the request extras for any non-warmup call, which in turn breaks the old coefficient estimation logic. I think we can load the AR model separately inside GlmImageAdapter (on CPU to avoid the memory issue) and generate prior tokens before each forward() call. Is this alright, or do you want this handled in a specific way? @lishunyang12
