Conversation

Contributor

@kylesayrs kylesayrs commented Oct 6, 2025

Purpose

Prerequisites

Changes

  • Add attention head quantization strategy

Testing

  • Added attention head quantization test and validated that generated scales and zero points make sense
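For intuition, here is a minimal sketch of how per-attention-head scales and zero points could be computed; this is an illustrative asymmetric min-max scheme under assumed activation shapes, not this library's actual implementation:

```python
import torch

def per_head_qparams(x: torch.Tensor, num_heads: int, num_bits: int = 8):
    """Compute one asymmetric (scale, zero_point) pair per attention head.

    x: activations of shape (batch, seq_len, num_heads * head_dim).
    Hypothetical helper for illustration only.
    """
    batch, seq_len, hidden = x.shape
    xh = x.view(batch, seq_len, num_heads, hidden // num_heads)
    # Reduce over every dim except the head dim -> one qparam pair per head
    x_min = xh.amin(dim=(0, 1, 3))
    x_max = xh.amax(dim=(0, 1, 3))
    q_min, q_max = 0, 2**num_bits - 1
    scale = (x_max - x_min).clamp(min=1e-8) / (q_max - q_min)
    zero_point = (
        torch.round(q_min - x_min / scale).clamp(q_min, q_max).to(torch.int32)
    )
    return scale, zero_point
```

For example, activations of shape `(2, 16, 32 * 128)` with `num_heads=32` would yield scale and zero-point tensors each of shape `(32,)`, which is the kind of shape check the test above can assert.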

@kylesayrs kylesayrs changed the base branch from main to kylesayrs/refactor-initialize-tests October 6, 2025 22:22
@kylesayrs kylesayrs force-pushed the kylesayrs/add-attn-head-strat branch 2 times, most recently from bf00a99 to 2ea692d on October 6, 2025 22:28
@kylesayrs kylesayrs force-pushed the kylesayrs/refactor-initialize-tests branch from 97a4d16 to 0fdfbd1 on October 6, 2025 22:31
@kylesayrs kylesayrs force-pushed the kylesayrs/add-attn-head-strat branch from 2ea692d to 326f802 on October 6, 2025 22:31
@kylesayrs kylesayrs force-pushed the kylesayrs/refactor-initialize-tests branch from 0fdfbd1 to 8973328 on October 7, 2025 22:05
@kylesayrs kylesayrs force-pushed the kylesayrs/add-attn-head-strat branch from 326f802 to 6a13bc4 on October 7, 2025 22:09
Signed-off-by: Kyle Sayers <[email protected]>
@kylesayrs kylesayrs force-pushed the kylesayrs/add-attn-head-strat branch from 70da261 to 48875e2 on October 7, 2025 22:13
@kylesayrs kylesayrs marked this pull request as ready for review October 7, 2025 22:15
Contributor

@brian-dellabetta brian-dellabetta left a comment


How are we expecting users to use QuantizationStrategy.ATTN_HEAD in a recipe? If I'm understanding correctly, it would look something like this?

quant_stage:
  quant_modifiers:
    QuantizationModifier:
      config_groups:
        group0:
          targets: ["re:.*self_attn$"]
          weights:
            strategy: attn_head
            ...
        group1:
          targets: ["re:.*(q|k|v)_proj$"]
          weights:
            strategy: group
            ...

Contributor Author

kylesayrs commented Oct 7, 2025

@brian-dellabetta I’ve decided that giving per-attention-head quantization its own strategy (rather than reusing `group`) makes more sense.

quant_stage:
  quant_modifiers:
    QuantizationModifier:
      config_groups:
        group0:
          targets: ["re:.*self_attn$"]
          input_activations:
            strategy: attn_head
            ...
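As a rough illustration of what an `attn_head` strategy on `input_activations` would imply at runtime, each head's activations would be fake-quantized with its own scale and zero point. This is a sketch under assumed shapes and a generic round-to-nearest scheme, not the library's actual forward path:

```python
import torch

def fake_quant_per_head(x, scale, zero_point, num_heads, num_bits=8):
    """Fake-quantize activations with one (scale, zero_point) per head.

    x: (batch, seq_len, num_heads * head_dim)
    scale, zero_point: (num_heads,)
    Hypothetical helper for illustration only.
    """
    b, s, hidden = x.shape
    xh = x.view(b, s, num_heads, hidden // num_heads)
    # Broadcast each head's qparams over that head's channels
    q = torch.clamp(
        torch.round(xh / scale[:, None] + zero_point[:, None]),
        0, 2**num_bits - 1,
    )
    return ((q - zero_point[:, None]) * scale[:, None]).view(b, s, hidden)
```

With in-range inputs, the quantize–dequantize round trip should perturb each element by at most half a quantization step (`scale / 2`), which is one way to sanity-check generated qparams.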
