
Multiplication Rationale is not clear here #18

@Waguy02

Description


In the attention-based pooling section at exp_long_term_forecasting.py#L569C20-L569C85, the attention scores are computed by:

attention_scores = torch.bmm(prompt_emb_norm, outputs_norm)

This batch matrix multiplication requires that the tensor dimensions align correctly:

  • If prompt_emb_norm has shape $(B, L, N)$ where:
    • $B$ is the batch size,
    • $L$ is the prompt sequence length (number of tokens),
    • $N$ is the projected embedding dimension,

Then outputs_norm must have shape $(B, N, P)$, where:

  • The middle dimension $N$ must match the embedding projection dimension of prompt_emb_norm,
  • $P$ is the output sequence length.

The result is an attention score tensor of shape $(B, L, P)$.
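For reference, here is a minimal shape sketch of the torch.bmm contract described above, using purely illustrative dimensions rather than the repository's actual hyperparameters:

import torch

# Illustrative dimensions only (not the repository's actual hyperparameters)
B, L, N, P = 4, 16, 64, 96

prompt_emb_norm = torch.randn(B, L, N)  # (batch, prompt tokens, embedding dim)
outputs_norm = torch.randn(B, N, P)     # (batch, embedding dim, output length)

# torch.bmm contracts over the inner dimension: (B, L, N) x (B, N, P) -> (B, L, P)
attention_scores = torch.bmm(prompt_emb_norm, outputs_norm)
assert attention_scores.shape == (B, L, P)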


Issue:

Right now, the code does not explicitly enforce or verify that outputs_norm's dimension $N$ matches prompt_emb_norm's embedding dimension before the multiplication.

  • One would have expected a linear layer mapping the initial forecasting model's outputs to a tensor of shape $(B, N, P)$ to make the multiplication feasible (see the sketch after this list).
  • You have presumably set the hyperparameters so that $N$ (text_embedding_dim in the code) matches the second dimension of the forecasting model's output (i.e., the feature count),
    thus making the attention computation feasible, but the rationale behind this choice is not clear and is not mentioned in your original paper.
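For concreteness, here is a minimal sketch of the kind of projection layer suggested in the first bullet; the names project_to_text_dim and n_features are assumptions for illustration, not identifiers from the repository:

import torch
import torch.nn as nn

# Illustrative values only; n_features and project_to_text_dim are assumed names
B, L, P, n_features, N = 4, 16, 96, 7, 64

forecaster_outputs = torch.randn(B, P, n_features)   # raw forecasting-model outputs
prompt_emb_norm = torch.randn(B, L, N)               # normalized prompt embeddings

# Hypothetical projection: map the feature dimension to the text embedding dim N
project_to_text_dim = nn.Linear(n_features, N)
outputs_proj = project_to_text_dim(forecaster_outputs)       # (B, P, N)
outputs_norm = outputs_proj.transpose(1, 2)                  # (B, N, P), ready for bmm

attention_scores = torch.bmm(prompt_emb_norm, outputs_norm)  # (B, L, P)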

I would appreciate it if you could provide some explanation of these technical choices.

Thanks

