
Conversation

@kylesayrs kylesayrs (Contributor) commented Oct 6, 2025

Purpose

Given an attention state of shape (batch_size, num_heads, seq_len, head_dim), the attention head strategy will generate scales of shape (num_heads, 1, 1).
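
A minimal sketch of how such per-head scales could be produced, written in plain PyTorch rather than with the library's observer/calibration machinery (the symmetric min-max reduction and the num_bits default are assumptions for illustration only):

import torch

def attn_head_scales(attn_state: torch.Tensor, num_bits: int = 8) -> torch.Tensor:
    # attn_state: (batch_size, num_heads, seq_len, head_dim)
    num_heads = attn_state.shape[1]

    # reduce over every dimension except the head dimension
    per_head = attn_state.transpose(0, 1).reshape(num_heads, -1)
    max_abs = per_head.abs().amax(dim=1)  # (num_heads,)

    # one symmetric scale per head, broadcastable over (seq_len, head_dim)
    scales = max_abs / (2 ** (num_bits - 1) - 1)
    return scales.reshape(num_heads, 1, 1)  # (num_heads, 1, 1)

state = torch.randn(2, 8, 16, 64)      # (batch, heads, seq, head_dim)
print(attn_head_scales(state).shape)   # torch.Size([8, 1, 1])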

Prerequisites

Changes

  • Add attention head quantization strategy
  • Fix shapes of per-tensor attention flattening (see the flattening sketch after this list)
  • Elaborate on attention calibration tests
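
As referenced in the list above, a rough sketch of the two flattenings, using plain torch reshapes rather than the library's actual flattening helpers; the per-tensor shape is an assumption, while the per-head shape mirrors the shape comment discussed later in this review:

import torch

batch_size, num_heads, seq_len, head_dim = 2, 8, 16, 64
state = torch.randn(batch_size, num_heads, seq_len, head_dim)

# per-tensor (assumed shape): all values collapse into a single observation
# group, so the observer emits one scale for the whole attention state
per_tensor = state.reshape(batch_size * seq_len * num_heads, 1, 1, 1, head_dim)

# per-head (attn_head): num_heads is kept as its own axis so the observer
# emits one scale per head; this matches the review comment below,
# (batch_size * seq_len, num_heads, 1, 1, head_dim)
per_head = state.transpose(1, 2).reshape(batch_size * seq_len, num_heads, 1, 1, head_dim)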

Testing

  • Added attention head quantization test and validated that generated scales and zero points make sense
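
A hedged sketch of what such a test might assert (pytest-style, computing per-head scales and zero points directly instead of driving the real calibration hooks; all names and shapes here are illustrative):

import torch

def test_attn_head_scale_shapes():
    batch_size, num_heads, seq_len, head_dim = 2, 8, 16, 64
    state = torch.randn(batch_size, num_heads, seq_len, head_dim)

    # per-head symmetric scale and zero point, computed directly for illustration
    per_head = state.transpose(0, 1).reshape(num_heads, -1)
    scales = per_head.abs().amax(dim=1).reshape(num_heads, 1, 1) / 127
    zero_points = torch.zeros_like(scales, dtype=torch.int8)

    assert scales.shape == (num_heads, 1, 1)
    assert zero_points.shape == (num_heads, 1, 1)
    assert torch.all(scales > 0)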

@kylesayrs kylesayrs changed the base branch from main to kylesayrs/refactor-initialize-tests on October 6, 2025 22:22
@kylesayrs kylesayrs force-pushed the kylesayrs/add-attn-head-strat branch 2 times, most recently from bf00a99 to 2ea692d on October 6, 2025 22:28
@kylesayrs kylesayrs force-pushed the kylesayrs/refactor-initialize-tests branch from 97a4d16 to 0fdfbd1 on October 6, 2025 22:31
@kylesayrs kylesayrs force-pushed the kylesayrs/add-attn-head-strat branch from 2ea692d to 326f802 on October 6, 2025 22:31
@kylesayrs kylesayrs force-pushed the kylesayrs/refactor-initialize-tests branch from 0fdfbd1 to 8973328 on October 7, 2025 22:05
@kylesayrs kylesayrs force-pushed the kylesayrs/add-attn-head-strat branch 2 times, most recently from 70da261 to 48875e2 on October 7, 2025 22:13
@kylesayrs kylesayrs marked this pull request as ready for review on October 7, 2025 22:15
@brian-dellabetta brian-dellabetta (Contributor) left a comment

How are we expecting users to use QuantizationStrategy.ATTN_HEAD in a recipe? If I'm understanding correctly, it would look something like this?

quant_stage:
  quant_modifiers:
    QuantizationModifier:
      config_groups:
        group0:
          targets: ["re:.*self_attn$"]
          weights:
            strategy: attn_head
            ...
        group1:
          targets: ["re:.*(q|k|v)_proj$"]
          weights:
            strategy: group
            ...

@kylesayrs kylesayrs (Contributor, Author) commented Oct 7, 2025

@brian-dellabetta I've decided that giving per-attention-head quantization its own strategy (rather than reusing the group strategy) makes more sense.

quant_stage:
  quant_modifiers:
    QuantizationModifier:
      config_groups:
        group0:
          targets: ["re:.*self_attn$"]
          input_activations:
            strategy: attn_head
            ...

@brian-dellabetta brian-dellabetta (Contributor) left a comment

Overall format LGTM, but I'm struggling to understand how we're arriving at some of these expected_shapes.

@kylesayrs kylesayrs force-pushed the kylesayrs/add-attn-head-strat branch from 48875e2 to e1ca4fd on October 8, 2025 18:44
@brian-dellabetta brian-dellabetta (Contributor) left a comment

Thanks for updating!

raise ValueError("Block quantization cannot be applied to attention")

if args.strategy == QuantizationStrategy.ATTN_HEAD:
    # (batch_size * seq_len, num_heads, 1, 1, head_dim)
@brian-dellabetta brian-dellabetta (Contributor) commented Oct 8, 2025

Why do we want the 1, 1 here?
