Description
Contact Details
No response
What happened?
Got this email, flagging for later:
The issue is with the dimensionality of head_attention_value_output for GPT-OSS 20B, and I think it might generalize to other models. As best I can tell, the intervention happens on the output of the Value matrix, before it is multiplied by the Output matrix, so the correct dimensionality should be head_dim (= 64). This is what I did to fix it:
In the modelings_intervenable_gpt_oss.py file, there is the gpt_oss_type_to_dimension_mapping dictionary. Within it, there is this line:
"head_attention_value_output": ("hidden_size/num_attention_heads",)
This quantity is 2880/64 = 45, and 45 does not correspond to any dimension in the model's parameters.
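For concreteness, a minimal sketch of the arithmetic with the config values quoted in this report (hidden_size = 2880 and num_attention_heads = 64; head_dim = 64 is the per-head width the fix targets):

```python
hidden_size = 2880          # GPT-OSS 20B, per this report
num_attention_heads = 64
head_dim = 64               # the per-head value width the model actually uses

print(hidden_size // num_attention_heads)  # 45   -- the current (wrong) formula
print(num_attention_heads * head_dim)      # 4096 -- what the workaround below evaluates to
print(head_dim)                            # 64   -- the dimensionality that is wanted
```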
For now I changed it to:

```python
"head_attention_value_output": ("num_attention_heads*64",),
```

(the * operator in the dimension formula does not allow the second operand to be a parameter name), but I think it could just as well have been

```python
"head_attention_value_output": ("head_dim",),
```

because any value of 64 or above ends up giving the correct dimensionality of 64.
The underlying issue is that the mapped dimension must not be smaller than 64, because of this line at intervention_utils.py:121:

```python
return base[..., :interchange_dim]
```

which, with interchange_dim = 45, truncates the output to its first 45 dimensions.
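A minimal sketch of this failure mode, with hypothetical tensor shapes (batch of 1, sequence length 8):

```python
import torch

head_value = torch.randn(1, 8, 64)   # (batch, seq, head_dim): one head's value output

print(head_value[..., :45].shape)    # torch.Size([1, 8, 45]) -- 19 dims silently lost
print(head_value[..., :4096].shape)  # torch.Size([1, 8, 64]) -- slicing past the end
                                     # keeps all 64 dims, so any value >= 64 "works"
```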
I think the interchange_dim value in turn comes from these lines at intervenable_base.py:134:

```python
component_dim = get_dimension_by_component(
    get_internal_model_type(model), model.config,
    representation.component
)
```

which uses the dimensionality formula from the modelings_intervenable_gpt_oss.py file.
With the current fix, I am using it for CollectIntervention and I get sensible results (I took the 64-dimensional vector, projected it through the Output matrix, and ran a logit lens on it to check that the output makes sense). I can send you this code if it would help.
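A minimal sketch of that sanity check, with hypothetical names (none of these come from the pyvene API): collected stands for the 64-dim vector gathered via CollectIntervention, o_proj_head for this head's slice of the Output matrix, and final_norm/lm_head for the model's unembedding path:

```python
import torch

def logit_lens_check(collected, o_proj_head, final_norm, lm_head, tokenizer, k=5):
    """Project a collected per-head value back to the vocabulary.

    collected:   (..., head_dim) vector from CollectIntervention
    o_proj_head: (head_dim, hidden_size) slice of the Output matrix for this head
    """
    resid = collected @ o_proj_head      # map head_dim=64 into the residual stream
    logits = lm_head(final_norm(resid))  # logit lens: unembed the residual directly
    top = torch.topk(logits, k=k, dim=-1).indices
    return [tokenizer.decode(t) for t in top.flatten().tolist()]
```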
It seems llama (below) has this same dimensionality formula, whereas gemma-2b uses head_dim as I expected:

```python
# llama
"head_attention_value_output": ("hidden_size/num_attention_heads",),
# gemma-2b
"head_attention_value_output": ("head_dim",),
```
I am guessing this comes down to how head_dim is calculated for different models, and the hidden_size/num_attention_heads formula happens to be wrong for GPT-OSS 20B.
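Not from the original email, just a hedged suggestion: transformers configs expose head_dim explicitly when it differs from hidden_size // num_attention_heads (as GPT-OSS's does), so the mapping could compute the dimension with a safe fallback, something like:

```python
def head_dim_for(config):
    # GPT-OSS defines head_dim explicitly (64), and 64 != 2880 // 64;
    # fall back to the classic ratio only when head_dim is absent.
    return getattr(config, "head_dim", None) or (
        config.hidden_size // config.num_attention_heads
    )
```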
Thanks!
Code to produce this issue.