
Commit 69603b1

nlp>transformer minor grammar fix
1 parent e8fe43d commit 69603b1

File tree

1 file changed: +1 −1 lines changed


docs/natural_language_processing/transformer.md

Lines changed: 1 addition & 1 deletion
@@ -41,7 +41,7 @@ Transformers
 
 - And thats it :smile: Well at least from 10k feet :airplane:. Looking at the technicalities, the process drills down to,
 - Every token is not used as-it-is, but first converted to key, value and query format using linear projections. We have key, value and query weights denoted as $W_k$, $W_v$ and $W_q$. Each input token's representation is first multipled with these weights to get $k_i$, $v_i$ and $q_i$.
-- Next the query of one token is dot product with the keys of all token. On applying softmax to the output, we get a probability score of importance of every token for the the given token.
+- Next the query of one token is dot product with the keys of all token. On applying softmax to the output, we get a probability score of importance of every token for the given token.
 - Finally, we do weighted sum of values of all keys with this score and get the vector representation of the current token.
 - It is easy to understand the process while looking at one token at a time, but in reality it is completely vectorized and happens for all the tokens at the same time. The formula for the self-attention is shown below, where Q, K and V are the matrices you get on multiplication of all input tokens with the query, key and value weights.
 
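The bulleted steps in the changed section map one-to-one onto a few lines of code. Below is a minimal NumPy sketch of the single-head scaled dot-product self-attention those bullets describe; the weight matrices, dimensions, and function names are illustrative assumptions, not values taken from the repository's docs.

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax along the given axis
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, W_q, W_k, W_v):
    """Single-head self-attention over token representations X of shape (n_tokens, d_model)."""
    Q = X @ W_q                          # project every token to a query
    K = X @ W_k                          # ... a key
    V = X @ W_v                          # ... and a value
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)      # dot product of each query with all keys (scaled)
    weights = softmax(scores, axis=-1)   # importance of every token for the given token
    return weights @ V                   # weighted sum of values -> new token representations

# toy example: 4 tokens with 8-dimensional embeddings and an 8-dimensional head (assumed sizes)
rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))
W_q, W_k, W_v = (rng.normal(size=(8, 8)) for _ in range(3))
out = self_attention(X, W_q, W_k, W_v)
print(out.shape)  # (4, 8)
```

The `weights @ V` line is the "weighted sum of values" from the last bullet, and stacking all per-token operations into matrix products is the vectorization the final bullet refers to.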

0 commit comments

Comments
 (0)