In the attention-based pooling section at exp_long_term_forecasting.py#L569C20-L569C85, the attention scores are computed by:
```python
attention_scores = torch.bmm(prompt_emb_norm, outputs_norm)
```

This batch matrix multiplication requires that the tensor dimensions align correctly:
- If `prompt_emb_norm` has shape $(B, L, N)$, where:
  - $B$ is the batch size,
  - $L$ is the prompt sequence length (number of tokens),
  - $N$ is the projected embedding dimension,
- then `outputs_norm` must have shape $(B, N, P)$, where:
  - the middle dimension $N$ must match the embedding projection dimension of `prompt_emb_norm`,
  - $P$ is the output sequence length.

The result is an attention score tensor of shape $(B, L, P)$.
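For reference, a minimal sketch of the shape contract that `torch.bmm` imposes here; the dimension values below are arbitrary and chosen only for illustration, not taken from the repository's configuration:

```python
import torch

# Illustrative shapes only: B = batch, L = prompt length,
# N = projected embedding dimension, P = output sequence length.
B, L, N, P = 32, 16, 7, 96

prompt_emb_norm = torch.randn(B, L, N)  # normalized prompt embeddings, (B, L, N)
outputs_norm = torch.randn(B, N, P)     # normalized forecasting-model outputs, (B, N, P)

# torch.bmm requires (B, L, N) x (B, N, P); the shared middle dimension N
# is exactly the alignment that is never explicitly checked in the code.
attention_scores = torch.bmm(prompt_emb_norm, outputs_norm)
print(attention_scores.shape)  # torch.Size([32, 16, 96]) -> (B, L, P)
```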
Issue:
Right now, the code does not explicitly enforce or verify that `outputs_norm`'s middle dimension matches `prompt_emb_norm`'s embedding dimension before the multiplication.
- One would have expected a linear layer mapping the initial forecasting model's outputs to a tensor of shape $(B, N, P)$ to make the multiplication feasible (see the sketch after this list).
- You have certainly set the hyperparameters such that $N$ (`text_embedding_dim` in the code) matches the second dimension of the forecasting model's output (which is the feature count), thus making the attention computation feasible, but the rationale behind this choice is not clear and is not mentioned in your original paper.
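For illustration, here is a hypothetical sketch of the kind of explicit projection and check described above. The `nn.Linear` layer, the variable names, and the dimension values are my assumptions for the example and are not code from the repository; normalization is also omitted for brevity:

```python
import torch
import torch.nn as nn

# Hypothetical projection: map the forecasting model's feature dimension F
# onto the text embedding dimension N so the bmm is well defined regardless
# of how the hyperparameters happen to be set.
F, N = 7, 64                      # arbitrary example values
proj = nn.Linear(F, N)            # hypothetical layer, not present in the repository

outputs = torch.randn(32, 96, F)  # (B, P, F) raw forecasting-model outputs
outputs_proj = proj(outputs)      # (B, P, N) after projection
outputs_norm = outputs_proj.transpose(1, 2)  # (B, N, P), ready for bmm

prompt_emb_norm = torch.randn(32, 16, N)     # (B, L, N)
assert prompt_emb_norm.size(-1) == outputs_norm.size(1), \
    "embedding dim of prompt_emb_norm must match the middle dim of outputs_norm"

attention_scores = torch.bmm(prompt_emb_norm, outputs_norm)  # (B, L, P)
```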
I would appreciate it if you could provide some explanation of these technical choices.
Thanks