[Attention] Attention head quantization strategy #481
base: kylesayrs/refactor-initialize-tests
Conversation
How are we expecting users to use QuantizationStrategy.ATTN_HEAD in a recipe? If I'm understanding correctly, it would look something like this?

quant_stage:
  quant_modifiers:
    QuantizationModifier:
      config_groups:
        group0:
          targets: ["re:.*self_attn$"]
          weights:
            strategy: attn_head
            ...
        group1:
          targets: ["re:.*(q|k|v)_proj$"]
          weights:
            strategy: group
            ...
@brian-dellabetta I’ve decided that giving per-attention-head quantization its own strategy (rather than reusing group) makes more sense:

quant_stage:
  quant_modifiers:
    QuantizationModifier:
      config_groups:
        group0:
          targets: ["re:.*self_attn$"]
          input_activations:
            strategy: attn_head
            ...
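For context, a minimal sketch of what a config group like this could map onto programmatically, assuming the existing QuantizationScheme / QuantizationArgs models in compressed-tensors accept the new strategy member. The field values mirror the recipe above and are illustrative only, not taken from this PR:

```python
from compressed_tensors.quantization import (
    QuantizationArgs,
    QuantizationScheme,
    QuantizationStrategy,
)

# Mirrors group0 above: per-head quantization of attention inputs
scheme = QuantizationScheme(
    targets=["re:.*self_attn$"],
    input_activations=QuantizationArgs(
        num_bits=8,  # assumed int8 args, chosen only for illustration
        strategy=QuantizationStrategy.ATTN_HEAD,
    ),
)
```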
Overall format LGTM, but I'm struggling to understand how we're arriving at some of these expected_shapes.
Thanks for updating!
raise ValueError("Block quantization cannot be applied to attention")

if args.strategy == QuantizationStrategy.ATTN_HEAD:
    # (batch_size * seq_len, num_heads, 1, 1, head_dim)
Why do we want the 1, 1 here?
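For reference, a minimal sketch (illustrative shapes only, not the observer code from this PR) of how the two singleton dims let a reduction over the remaining axes produce scales broadcastable as (num_heads, 1, 1):

```python
import torch

# Illustrative sizes; real values come from the attention module
batch_size, num_heads, seq_len, head_dim = 2, 8, 16, 64
state = torch.randn(batch_size, num_heads, seq_len, head_dim)

# Flatten batch/sequence and keep two singleton dims so that reducing
# over the flattened tokens and head_dim leaves a (num_heads, 1, 1) result
reshaped = state.transpose(1, 2).reshape(
    batch_size * seq_len, num_heads, 1, 1, head_dim
)
absmax = reshaped.abs().amax(dim=(0, -1))
print(absmax.shape)  # torch.Size([8, 1, 1])
```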
Purpose
Given an attention state of shape (batch_size, num_heads, seq_len, head_dim), the head attention strategy will generate scales of shape (num_heads, 1, 1).
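As a quick sanity check (hypothetical sizes, not code from this PR), scales of that shape broadcast directly over each head of the attention state during fake quantization:

```python
import torch

batch_size, num_heads, seq_len, head_dim = 2, 8, 16, 64
state = torch.randn(batch_size, num_heads, seq_len, head_dim)

# One symmetric int8-style scale per head, shaped (num_heads, 1, 1)
scale = state.abs().amax(dim=(0, 2, 3)).view(num_heads, 1, 1) / 127.0

# (num_heads, 1, 1) broadcasts over (batch_size, num_heads, seq_len, head_dim)
fake_quant = (state / scale).round().clamp(-128, 127) * scale
assert fake_quant.shape == state.shape
```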
Prerequisites
Changes
Testing