
(Enhancement) Applying mask to attention in one operation (3.5 Hiding future words with causal attention) #279

@labdmitriy

Bug description

Hi Sebastian,

I think this is not a bug but a possible enhancement: applying the mask currently takes two steps:

  • Create a lower-triangular matrix of ones and zeros:
mask_simple = torch.tril(torch.ones(context_length, context_length))
  • Multiply the attention weights element-wise by this mask:
masked_simple = attn_weights * mask_simple

However, torch.tril can be applied directly to the attention weights to get the same result in a single operation:

torch.tril(attn_weights)
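
For reference, a minimal sketch (using random stand-in weights rather than the book's actual attention weights) confirming that both versions give the same result:

import torch

torch.manual_seed(123)

context_length = 6
# Stand-in attention weights for illustration; in the book these come from
# a softmax over the query-key scores.
attn_weights = torch.softmax(torch.rand(context_length, context_length), dim=-1)

# Two-step version: build a lower-triangular mask, then multiply.
mask_simple = torch.tril(torch.ones(context_length, context_length))
masked_simple = attn_weights * mask_simple

# One-step version: apply torch.tril directly.
masked_direct = torch.tril(attn_weights)

print(torch.allclose(masked_simple, masked_direct))  # True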

Thank you.

