1 parent f548686 commit 338353e
kv-cache.md
````diff
@@ -85,8 +85,10 @@ Here’s a minimal PyTorch equivalent using a causal mask:

 ```python
 import torch.nn.functional as F
+import math

-attention_scores = Q @ K.T
+d_k = K.shape[-1]
+attention_scores = (Q @ K.T) / math.sqrt(d_k)

 # Lower triangular mask to prevent future token access
 causal_mask = torch.tril(torch.ones(input_seq_length, input_seq_length))
````
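For context, the patched lines can be completed into a runnable sketch of scaled causal attention. The names `Q`, `K`, `input_seq_length`, and `causal_mask` come from the diff itself; `V`, the `masked_fill`/softmax steps, and the toy tensor shapes are assumptions about how the rest of `kv-cache.md` continues, not part of this commit:

```python
import math
import torch
import torch.nn.functional as F

# Toy dimensions (assumed for illustration; not from the diff)
torch.manual_seed(0)
input_seq_length, d_k = 4, 8
Q = torch.randn(input_seq_length, d_k)
K = torch.randn(input_seq_length, d_k)
V = torch.randn(input_seq_length, d_k)

# Scaled dot-product scores, as introduced by the patch
attention_scores = (Q @ K.T) / math.sqrt(d_k)

# Lower triangular mask to prevent future token access
causal_mask = torch.tril(torch.ones(input_seq_length, input_seq_length))
attention_scores = attention_scores.masked_fill(causal_mask == 0, float("-inf"))

# Rows sum to 1; row i attends only to positions <= i
attention_weights = F.softmax(attention_scores, dim=-1)
output = attention_weights @ V
```

Dividing by `sqrt(d_k)` (the key dimension) keeps the score variance roughly constant as head size grows, which is why the commit adds the `math` import and the `d_k` lookup.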