Draft
Contributor
GLM-5 needs DSA, same as DS V3.2.
tianhaox
reviewed
Feb 28, 2026
    help="Optional end-to-end request latency target (ms). Enables request-latency optimization mode.",
)
parser.add_argument("--prefix", type=int, default=0, help="Prefix cache length. Default to 0.")
parser.add_argument(
Contributor
I suggest we remove the wideep support from this PR and do a more complete design in a separate PR.
tianhaox
reviewed
Feb 28, 2026
@@ -0,0 +1,27 @@
{
Contributor
This is incorrect.
https://huggingface.co/zai-org/GLM-5-FP8/blob/main/config.json
Contributor
I suggest we copy and paste the config manually to avoid hallucinated values.
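One way to catch this kind of mismatch before review is a key-by-key comparison of the checked-in config against the upstream config.json. The sketch below is only an illustration: `diff_configs` is a hypothetical helper that is not part of this PR, and the dictionaries in the usage lines are made-up placeholders, not real GLM-5 values.

```python
from typing import Any, Dict


def diff_configs(local: Dict[str, Any], reference: Dict[str, Any]) -> Dict[str, str]:
    """Return a description of every top-level key that differs.

    `local` is the config checked into the repo, `reference` is the
    upstream config.json (both already parsed with json.load).
    """
    problems: Dict[str, str] = {}
    for key in sorted(set(local) | set(reference)):
        if key not in local:
            problems[key] = "missing locally"
        elif key not in reference:
            problems[key] = "not in reference"
        elif local[key] != reference[key]:
            problems[key] = f"local={local[key]!r}, reference={reference[key]!r}"
    return problems


# Placeholder data only; real use would load both files with json.load.
reference = {"a": 1, "b": 2, "c": {"x": 1}}
local = {"a": 1, "b": 3, "d": 0}
report = diff_configs(local, reference)
```

An empty result means the top-level keys and values match; nested fields are compared as whole values, so a difference inside a nested block is reported under its top-level key.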
tianhaox
reviewed
Feb 28, 2026
| @@ -0,0 +1,21 @@ | |||
| { | |||
Contributor
This is incorrect. Without quant field.
Contributor
I think this whole PR needs a better design for the DEEPSEEK_V32 model family. DS V3.2 and GLM-5 will share the model family, which uses DSA + MoE. I suggest we cancel this PR and redesign after the DSA op PR I've created. I will start a PR manually to better support the DEEPSEEK_V32 model family.
All GLM-5-specific MLA params are correctly read. Here's a summary of what was done:
New files:
qk_nope=192, v_head=256
common.py:
utils.py:
models.py: a parameter mla_config (defaulting to DeepSeek-V3 values for backward compat)
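A minimal sketch of the `mla_config` idea summarized above, not the PR's actual code. `MLAConfig`, `mla_config_from_hf`, and `qk_head_dim` are hypothetical names, and the sketch assumes the checkpoint's config.json uses the same MLA key names as DeepSeek-V3. The defaults are the DeepSeek-V3 values; the overrides in the usage lines are only the two GLM-5 values listed above.

```python
from dataclasses import dataclass, fields
from typing import Any, Mapping, Optional


@dataclass(frozen=True)
class MLAConfig:
    """MLA dimensions; defaults are the DeepSeek-V3 values."""
    qk_nope_head_dim: int = 128
    qk_rope_head_dim: int = 64
    v_head_dim: int = 128
    kv_lora_rank: int = 512
    q_lora_rank: int = 1536


def mla_config_from_hf(hf_config: Mapping[str, Any]) -> MLAConfig:
    """Build an MLAConfig from a parsed config.json.

    Keys that are absent (or null) fall back to the DeepSeek-V3 defaults.
    Assumption: the key names match the dataclass field names.
    """
    names = {f.name for f in fields(MLAConfig)}
    overrides = {
        k: int(v) for k, v in hf_config.items() if k in names and v is not None
    }
    return MLAConfig(**overrides)


def qk_head_dim(mla_config: Optional[MLAConfig] = None) -> int:
    """Existing callers pass nothing and keep the DeepSeek-V3 behavior."""
    cfg = mla_config or MLAConfig()
    return cfg.qk_nope_head_dim + cfg.qk_rope_head_dim


# Only the two values named in the summary are overridden here.
glm5_like = mla_config_from_hf({"qk_nope_head_dim": 192, "v_head_dim": 256})
```

Making the parameter optional keeps every existing call site unchanged, which is what "backward compat" implies here; fields not present in the input silently keep the DeepSeek-V3 default, so a real loader would probably want to warn about them.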