Adding q_norm, k_norm support for quantized models (microsoft#1483)
This PR adds support for q_norm and k_norm layers in quantized models
within the OGA framework.
Specifically, it introduces the following enhancements to
**quantized_model.py**:
- Initializes `q_norm` and `k_norm` as Tensor modules within the
`QuantizedAttention` and `QuantizedDecoder` classes.
- Maps the corresponding weights and biases for `q_norm` and `k_norm` to
the initialized tensor modules during model loading.
This enables accurate handling of models that include `q_norm` and
`k_norm` as part of their quantized attention mechanisms, improving
compatibility with newer quantized LLMs.
**Changes Made:**
- Added initialization of `q_norm` and `k_norm` as `Tensor` modules in:
- `QuantizedAttention` class
- `QuantizedDecoder` class
- Mapped corresponding weights and biases from the model to these tensor
modules during model loading
- Ensured consistency with the existing quantized attention
initialization flow (see the sketch below)
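
For reference, here is a minimal sketch of the initialization side of the change. `TensorModule` and the bare constructors are placeholders invented for illustration; only the `q_norm`/`k_norm` attribute names and the `QuantizedAttention`/`QuantizedDecoder` class names come from this PR description.

```python
class TensorModule:
    """Hypothetical holder for a plain (non-quantized) weight/bias pair."""
    def __init__(self):
        self.weight = None
        self.bias = None


class QuantizedAttention:
    def __init__(self):
        # Existing projection modules (q_proj, k_proj, v_proj, o_proj) omitted.
        # New in this PR: query/key norm layers used by models such as Qwen3.
        self.q_norm = TensorModule()
        self.k_norm = TensorModule()


class QuantizedDecoder:
    def __init__(self):
        # Existing layer norms and sub-modules omitted.
        self.self_attn = QuantizedAttention()
        self.q_norm = TensorModule()
        self.k_norm = TensorModule()
```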
**Reviewer Notes:**
- Please verify:
- The initialization logic aligns with the handling of other norm layers
(e.g., `qkv_norm`)
- No side effects are introduced for models that do not contain `q_norm`
or `k_norm`
- Tested locally with quantized Qwen3 models containing
`q_norm`/`k_norm`; additional validation with other architectures is
welcome
---------
Co-authored-by: Sumedha Atreysa <[email protected]>
Co-authored-by: kunal-vaishnavi <[email protected]>
The relevant comments in **quantized_model.py** (the `q_norm`/`k_norm` graph-order note is the line added by this PR):

```python
# Map weights and biases of norm, attention, and feed-forward network
# Graph order is input_layernorm --> q_proj/k_proj/v_proj --> o_proj --> post_attention_layernorm --> gate_proj/up_proj --> down_proj
# If model uses q_norm and k_norm, graph order is input_layernorm --> q_norm/q_proj/k_norm/k_proj/v_proj --> o_proj --> post_attention_layernorm --> gate_proj/up_proj --> down_proj
```
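
A hedged sketch of how the weight/bias mapping could key off these names during loading follows. The suffix-based dispatch and the helper name `map_norm_tensors` are assumptions for illustration, not the actual loader code.

```python
def map_norm_tensors(attn, name, tensor):
    """Route q_norm/k_norm weights and biases to the attention module.

    Illustrative only: the real loader's tensor-name scheme and module
    layout in quantized_model.py may differ.
    """
    if name.endswith("q_norm.weight"):
        attn.q_norm.weight = tensor
    elif name.endswith("q_norm.bias"):
        attn.q_norm.bias = tensor
    elif name.endswith("k_norm.weight"):
        attn.k_norm.weight = tensor
    elif name.endswith("k_norm.bias"):
        attn.k_norm.bias = tensor
    else:
        return False  # Not a q_norm/k_norm tensor; fall through to existing mapping.
    return True
```

Because the dispatch only fires on `q_norm`/`k_norm` tensor names, models without these layers fall through to the existing mapping untouched, which is the no-side-effects property called out in the reviewer notes.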