[Attention] Attention head quantization strategy #481
base: kylesayrs/refactor-initialize-tests
Conversation
How are we expecting users to use QuantizationStrategy.ATTN_HEAD in a recipe? If I'm understanding correctly, it would look something like this?

quant_stage:
  quant_modifiers:
    QuantizationModifier:
      config_groups:
        group0:
          targets: ["re:.*self_attn$"]
          weights:
            strategy: attn_head
            ...
        group1:
          targets: ["re:.*(q|k|v)_proj$"]
          weights:
            strategy: group
            ...
@brian-dellabetta I’ve decided that giving per-attention-head quantization its own strategy (rather than reusing group) makes more sense:

quant_stage:
  quant_modifiers:
    QuantizationModifier:
      config_groups:
        group0:
          targets: ["re:.*self_attn$"]
          input_activations:
            strategy: attn_head
            ...
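For context, a minimal sketch of what a config group like this could map onto programmatically, assuming the existing QuantizationScheme / QuantizationArgs models in compressed-tensors accept the new strategy member. The field values mirror the recipe above and are illustrative only, not taken from this PR:

```python
from compressed_tensors.quantization import (
    QuantizationArgs,
    QuantizationScheme,
    QuantizationStrategy,
)

# Mirrors group0 above: per-head quantization of attention inputs
scheme = QuantizationScheme(
    targets=["re:.*self_attn$"],
    input_activations=QuantizationArgs(
        num_bits=8,  # assumed int8 args, chosen only for illustration
        strategy=QuantizationStrategy.ATTN_HEAD,
    ),
)
```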
Overall format LGTM, but I'm struggling to understand how we're arriving at some of these expected_shapes.
Thanks for updating!
raise ValueError("Block quantization cannot be applied to attention")

if args.strategy == QuantizationStrategy.ATTN_HEAD:
    # (batch_size * seq_len, num_heads, 1, 1, head_dim)
Why do we want the 1, 1 here?
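For reference, a minimal sketch (illustrative shapes only, not the observer code from this PR) of how the two singleton dims let a reduction over the remaining axes produce scales broadcastable as (num_heads, 1, 1):

```python
import torch

# Illustrative sizes; real values come from the attention module
batch_size, num_heads, seq_len, head_dim = 2, 8, 16, 64
state = torch.randn(batch_size, num_heads, seq_len, head_dim)

# Flatten batch/sequence and keep two singleton dims so that reducing
# over the flattened tokens and head_dim leaves a (num_heads, 1, 1) result
reshaped = state.transpose(1, 2).reshape(
    batch_size * seq_len, num_heads, 1, 1, head_dim
)
absmax = reshaped.abs().amax(dim=(0, -1))
print(absmax.shape)  # torch.Size([8, 1, 1])
```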
Purpose
Given an attention state of shape (batch_size, num_heads, seq_len, head_dim), the head attention strategy will generate scales of shape (num_heads, 1, 1).
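As a quick sanity check (hypothetical sizes, not code from this PR), scales of that shape broadcast directly over each head of the attention state during fake quantization:

```python
import torch

batch_size, num_heads, seq_len, head_dim = 2, 8, 16, 64
state = torch.randn(batch_size, num_heads, seq_len, head_dim)

# One symmetric int8-style scale per head, shaped (num_heads, 1, 1)
scale = state.abs().amax(dim=(0, 2, 3)).view(num_heads, 1, 1) / 127.0

# (num_heads, 1, 1) broadcasts over (batch_size, num_heads, seq_len, head_dim)
fake_quant = (state / scale).round().clamp(-128, 127) * scale
assert fake_quant.shape == state.shape
```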
Prerequisites
Changes
Testing