Conversation

Contributor

@kylesayrs kylesayrs commented Oct 6, 2025

Purpose

Prerequisites

Changes

  • Add attention head quantization strategy

Testing

  • Added attention head quantization test and validated that generated scales and zero points make sense
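For intuition, here is a minimal sketch of how per-attention-head scales and zero points could be computed; this is an illustrative asymmetric min-max scheme under assumed activation shapes, not this library's actual implementation:

```python
import torch

def per_head_qparams(x: torch.Tensor, num_heads: int, num_bits: int = 8):
    """Compute one asymmetric (scale, zero_point) pair per attention head.

    x: activations of shape (batch, seq_len, num_heads * head_dim).
    Hypothetical helper for illustration only.
    """
    batch, seq_len, hidden = x.shape
    xh = x.view(batch, seq_len, num_heads, hidden // num_heads)
    # Reduce over every dim except the head dim -> one qparam pair per head
    x_min = xh.amin(dim=(0, 1, 3))
    x_max = xh.amax(dim=(0, 1, 3))
    q_min, q_max = 0, 2**num_bits - 1
    scale = (x_max - x_min).clamp(min=1e-8) / (q_max - q_min)
    zero_point = (
        torch.round(q_min - x_min / scale).clamp(q_min, q_max).to(torch.int32)
    )
    return scale, zero_point
```

For example, activations of shape `(2, 16, 32 * 128)` with `num_heads=32` would yield scale and zero-point tensors each of shape `(32,)`, which is the kind of shape check the test above can assert.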

@kylesayrs kylesayrs changed the base branch from main to kylesayrs/refactor-initialize-tests October 6, 2025 22:22
@kylesayrs kylesayrs force-pushed the kylesayrs/add-attn-head-strat branch 2 times, most recently from bf00a99 to 2ea692d on October 6, 2025 22:28
@kylesayrs kylesayrs force-pushed the kylesayrs/refactor-initialize-tests branch from 97a4d16 to 0fdfbd1 on October 6, 2025 22:31
@kylesayrs kylesayrs force-pushed the kylesayrs/add-attn-head-strat branch from 2ea692d to 326f802 on October 6, 2025 22:31
@kylesayrs kylesayrs force-pushed the kylesayrs/refactor-initialize-tests branch from 0fdfbd1 to 8973328 on October 7, 2025 22:05
@kylesayrs kylesayrs force-pushed the kylesayrs/add-attn-head-strat branch from 326f802 to 6a13bc4 on October 7, 2025 22:09
Signed-off-by: Kyle Sayers <[email protected]>
@kylesayrs kylesayrs force-pushed the kylesayrs/add-attn-head-strat branch from 70da261 to 48875e2 on October 7, 2025 22:13
@kylesayrs kylesayrs marked this pull request as ready for review October 7, 2025 22:15
Contributor

@brian-dellabetta brian-dellabetta left a comment


How are we expecting users to use QuantizationStrategy.ATTN_HEAD in a recipe? If I'm understanding correctly, it would look something like this?

quant_stage:
  quant_modifiers:
    QuantizationModifier:
      config_groups:
        group0:
          targets: ["re:.*self_attn$"]
          weights:
            strategy: attn_head
            ...
        group1:
          targets: ["re:.*(q|k|v)_proj$"]
          weights:
            strategy: group
            ...

Contributor Author

kylesayrs commented Oct 7, 2025

@brian-dellabetta I’ve decided that giving per-attention-head quantization its own strategy (rather than reusing `group`) makes more sense.

quant_stage:
  quant_modifiers:
    QuantizationModifier:
      config_groups:
        group0:
          targets: ["re:.*self_attn$"]
          input_activations:
            strategy: attn_head
            ...
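As a rough illustration of what an `attn_head` strategy on `input_activations` would imply at runtime, each head's activations would be fake-quantized with its own scale and zero point. This is a sketch under assumed shapes and a generic round-to-nearest scheme, not the library's actual forward path:

```python
import torch

def fake_quant_per_head(x, scale, zero_point, num_heads, num_bits=8):
    """Fake-quantize activations with one (scale, zero_point) per head.

    x: (batch, seq_len, num_heads * head_dim)
    scale, zero_point: (num_heads,)
    Hypothetical helper for illustration only.
    """
    b, s, hidden = x.shape
    xh = x.view(b, s, num_heads, hidden // num_heads)
    # Broadcast each head's qparams over that head's channels
    q = torch.clamp(
        torch.round(xh / scale[:, None] + zero_point[:, None]),
        0, 2**num_bits - 1,
    )
    return ((q - zero_point[:, None]) * scale[:, None]).view(b, s, hidden)
```

With in-range inputs, the quantize–dequantize round trip should perturb each element by at most half a quantization step (`scale / 2`), which is one way to sanity-check generated qparams.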
