Registering Forward Hook on Encoder Self-Attentions v20231117 vs v20240930 #2460
Hi, I used to be able to get the encoder self-attention activations from Whisper models using the simple forward-hook method shown below, which returned a 1x8x1500x1500 tensor (as expected) per layer for the base model. However, in the latest version (20240930), the hook output is a 1x1500x512 tensor instead. What has changed to cause this? Are there any straightforward solutions other than staying on the older version?

```python
import torch
import librosa
import whisper
from whisper.tokenizer import get_tokenizer

DEVICE = "cuda" if torch.cuda.is_available() else "cpu"
model = whisper.load_model("base").to(DEVICE)

# One slot per encoder layer for the hooked activations.
encoder_attn = [None] * model.dims.n_audio_layer

for i, block in enumerate(model.encoder.blocks):
    block.attn.register_forward_hook(
        lambda _, ins, outs, index=i: encoder_attn.__setitem__(index, outs)
    )

tokenizer = get_tokenizer(model.is_multilingual, language="en")

file = "an_audio_file.wav"
speech_, sr_ = librosa.load(path=file, sr=16000)
speech_ = torch.from_numpy(speech_).float()

tokens = torch.tensor(
    [
        *tokenizer.sot_sequence,
    ]
).to(DEVICE)

# n_mels was undefined in the original snippet; the model's own setting is used here.
mel = whisper.log_mel_spectrogram(
    whisper.pad_or_trim(speech_), n_mels=model.dims.n_mels
).to(DEVICE)

with torch.no_grad():
    logits = model(mel.unsqueeze(0), tokens.unsqueeze(0))
```
Answered by erfanashams (Dec 2, 2024):
I found the solution by installing hooks on the QK tensors and calculating the weights manually.
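The shape change appears to come from newer whisper releases using PyTorch's `scaled_dot_product_attention` when available, so the per-head QK weight matrix is never materialized and the hook on `block.attn` only sees the attended output (1x1500x512). Below is a minimal sketch of the QK-hook workaround, not the exact code used here, assuming the standard whisper layout where each encoder block exposes `attn.query` and `attn.key` Linear layers:

```python
import torch
import whisper

DEVICE = "cuda" if torch.cuda.is_available() else "cpu"
model = whisper.load_model("base").to(DEVICE)

n_layers = model.dims.n_audio_layer
n_head = model.dims.n_audio_head

# Capture the raw Q and K projections of every encoder block.
enc_q = [None] * n_layers
enc_k = [None] * n_layers

for i, block in enumerate(model.encoder.blocks):
    # Hook the Linear layers inside MultiHeadAttention rather than the attention
    # module itself, so Q and K are captured before SDPA consumes them.
    block.attn.query.register_forward_hook(
        lambda _, ins, outs, index=i: enc_q.__setitem__(index, outs)
    )
    block.attn.key.register_forward_hook(
        lambda _, ins, outs, index=i: enc_k.__setitem__(index, outs)
    )


def attention_weights(q: torch.Tensor, k: torch.Tensor, n_head: int) -> torch.Tensor:
    """Recompute softmax(QK^T) the way whisper's qkv_attention does."""
    n_batch, n_ctx, n_state = q.shape
    scale = (n_state // n_head) ** -0.25
    q = q.view(n_batch, n_ctx, n_head, -1).permute(0, 2, 1, 3) * scale
    k = k.view(n_batch, n_ctx, n_head, -1).permute(0, 2, 3, 1) * scale
    return (q @ k).softmax(dim=-1)  # (n_batch, n_head, n_ctx, n_ctx)


# After running the model on a mel/tokens pair as in the question:
# w0 = attention_weights(enc_q[0], enc_k[0], n_head)  # -> 1 x 8 x 1500 x 1500 for "base"
```

Recomputing the weights from the hooked Q and K reproduces the per-layer 1x8x1500x1500 tensors regardless of whether the SDPA or the manual attention path is taken inside the model.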