[feature] Add TeaCache Support to Glm Image #1458
akshatvishu wants to merge 15 commits into vllm-project:main from
Conversation
- Implement GLM-Image CacheContext extractor
- Add GLMImageAdapter for coefficient estimation

Signed-off-by: akshatvishu <akshatnayak197@gmail.com>
- Remove collected_data tensor storage
- Compute and store scalar L1 diffs immediately
- Prevent RAM growth for large calibration runs
- No change to coefficient math

Signed-off-by: akshatvishu <akshatnayak197@gmail.com>
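The memory fix described in this commit can be sketched as follows, with plain Python lists standing in for the residual tensors; `rel_l1_diff` is a hypothetical helper for illustration, not this PR's exact code:

```python
# Reduce each step's output to a scalar relative-L1 diff immediately,
# instead of storing the full tensors for post-hoc processing.
def rel_l1_diff(prev, curr):
    num = sum(abs(a - b) for a, b in zip(prev, curr))
    den = sum(abs(a) for a in prev) or 1.0  # guard against all-zero input
    return num / den

diffs = []  # scalars only, so RAM stays flat across long calibration runs
prev = None
for step_output in ([1.0, 2.0], [1.1, 1.9], [1.3, 1.6]):
    if prev is not None:
        diffs.append(rel_l1_diff(prev, step_output))
    prev = step_output
```

Because only floats are retained, the coefficient math downstream is unchanged while per-step tensor storage is eliminated.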
Added `self.do_true_cfg = True` to `GlmImageTransformer2DModel` initialization.

Signed-off-by: akshatvishu <akshatnayak197@gmail.com>
Q1) Coefficient validity: during coefficient estimation, I'm unsure whether the missing
Q2) After adding `self.do_true_cfg = True` to `GlmImageTransformer2DModel`, TeaCache still shows no speedup:

Baseline: ~31s
TeaCache: ~32s

Additionally, the log line "TeaCache applied with..." never appears in the subprocess logs, so I cannot confirm whether the hook is firing during inference. Maybe I am missing some env flag in Colab to enable logs, because I did test the

So far, I've verified:
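On the missing log line specifically, one thing worth ruling out before chasing the hook itself: Python's root logger only emits WARNING and above by default, so an INFO-level "TeaCache applied with..." record can be silently dropped in a notebook. A quick standard-library check, independent of any vllm-omni flag:

```python
import logging

# Ensure INFO-level records are emitted at all; by default the root
# logger's effective level is WARNING, which swallows logger.info() lines.
logging.basicConfig(level=logging.INFO, force=True)
```

If the line still does not appear after this, the hook genuinely is not firing.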
lishunyang12 left a comment:
Left a few comments. The biggest issue is `do_true_cfg = True` hardcoded in the transformer `__init__` — pretty sure that's why the hook never fires correctly (see inline comment).
Also, `estimate_teacache_coefficients()` is now dead code after inlining its logic into `estimate()`. Either remove it or keep calling it — having both is confusing.
The same `print()` statements exist in the standalone function too (lines 132-135 on main), so those should get cleaned up as part of this.
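For illustration, the hardcoding concern could be addressed by deriving the flag per request rather than fixing it at construction time; this mirrors how CFG-enabled diffusion pipelines commonly decide it (the helper name is hypothetical, not this repo's API):

```python
# Hypothetical per-request derivation: true CFG is only active when a
# real negative prompt is supplied and the guidance scale exceeds 1.
def compute_do_true_cfg(negative_prompt, guidance_scale):
    return negative_prompt is not None and guidance_scale > 1.0
```

Deriving it this way keeps the TeaCache hook's view of the CFG mode in sync with what the current request actually runs.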
… print with logger Signed-off-by: akshatvishu <akshatnayak197@gmail.com>
…ew one in forward() Signed-off-by: akshatvishu <akshatnayak197@gmail.com>
Thanks again for the quick review!

Yes, this indeed was the main issue; now I am seeing a

I've removed the dead code!

Cleaned this up too!
Signed-off-by: akshatvishu <33392262+akshatvishu@users.noreply.github.com>
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 6d39eb7a4d
```python
od_config = OmniDiffusionConfig.from_kwargs(model=model_path, dtype=dtype)
od_config.model_class_name = "GlmImagePipeline"

tf_config_path = hf_hub_download(model_path, "transformer/config.json")
```
Handle local checkpoints when reading GLM transformer config
GlmImageAdapter.load_pipeline unconditionally calls hf_hub_download(model_path, "transformer/config.json"), which treats model_path as a Hub repo id; when callers pass a local model directory (a supported pattern elsewhere in this repo), this path resolution fails before the pipeline is built. That makes TeaCache coefficient estimation for GLM unusable in local/offline setups and breaks parity with other loaders that accept filesystem paths.
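A minimal sketch of the suggested fix, assuming `model_path` may be either a Hub repo id or a local checkpoint directory (the function name is illustrative, not this repo's API):

```python
import os

def resolve_transformer_config(model_path: str) -> str:
    # Local checkpoint directory: read the config file straight from disk.
    if os.path.isdir(model_path):
        return os.path.join(model_path, "transformer", "config.json")
    # Otherwise treat model_path as a Hub repo id (the current behavior).
    from huggingface_hub import hf_hub_download
    return hf_hub_download(model_path, "transformer/config.json")
```

The local branch never touches the network, so coefficient estimation also works in offline setups.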
```python
sampling_params = OmniDiffusionSamplingParams(
    num_inference_steps=generate_kwargs.get("num_inference_steps", 20),
    seed=generate_kwargs.get("seed", 42),
)
```
Pass GLM prior tokens into coefficient collection requests
collect_from_prompt only populates num_inference_steps and seed, then sends a plain prompt; for GlmImagePipeline, non-warmup requests require prior_token_ids (and optionally prior_token_image_ids) in request extras, otherwise pipeline_glm_image.forward raises a ValueError. As written, the new GLM estimator path cannot collect data from normal prompts, so the advertised support is functionally broken unless users bypass this API.
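A hedged sketch of what the fix might look like: thread the prior tokens from `generate_kwargs` into the request extras before sending the prompt. The key names follow the review comment; the helper itself is hypothetical:

```python
def build_request_extras(generate_kwargs: dict) -> dict:
    # GlmImagePipeline rejects non-warmup requests without prior_token_ids,
    # so fail fast here rather than deep inside forward().
    if "prior_token_ids" not in generate_kwargs:
        raise ValueError("GlmImagePipeline requests require prior_token_ids")
    extras = {"prior_token_ids": generate_kwargs["prior_token_ids"]}
    if "prior_token_image_ids" in generate_kwargs:  # optional companion ids
        extras["prior_token_image_ids"] = generate_kwargs["prior_token_image_ids"]
    return extras
```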
It seems like the new
Part of #1217
Purpose
Add TeaCache Support to Glm Image
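For context, the skip mechanism this PR wires up can be sketched roughly as follows, based on the general TeaCache approach of accumulating a polynomial-rescaled relative-L1 input change; names and threshold are illustrative, not this PR's exact code:

```python
def polyval(coeffs, x):
    # Horner evaluation of the fitted rescaling polynomial
    # (coefficients given highest degree first).
    result = 0.0
    for c in coeffs:
        result = result * x + c
    return result

def should_skip(rel_l1, coeffs, state, threshold=0.1):
    # Accumulate the rescaled input change; while it stays under the
    # threshold, reuse the cached transformer residual instead of
    # recomputing the full denoising step.
    state["acc"] += polyval(coeffs, rel_l1)
    if state["acc"] < threshold:
        return True
    state["acc"] = 0.0  # run the full transformer and restart accumulation
    return False
```

The fitted coefficients are what the estimator in this PR produces; they map raw input change to an estimate of output change for the GLM-Image transformer.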
Test Plan
Initial coefficient estimation and testing done at this colab notebook
Test Result