
Commit 3cd9346

fix typos
1 parent 06e3407 commit 3cd9346

2 files changed: +2 -4 lines changed

learn-pr/wwl-data-ai/fundamentals-machine-learning/includes/5-binary-classification.md

Lines changed: 1 addition & 1 deletion

@@ -134,7 +134,7 @@ So 100% of the patients predicted by our model to have diabetes do in fact have
 
 #### F1-score
 
-*F1-score* is an overall metric that combined recall and precision. The formula for F1-score is:
+*F1-score* is an overall metric that combines recall and precision. The formula for F1-score is:
 
 ***(2 x Precision x Recall) ÷ (Precision + Recall)***
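
For reference, a minimal Python sketch of the F1-score formula corrected above; the precision and recall values here are hypothetical and not taken from the module.

```python
# F1 = (2 x Precision x Recall) / (Precision + Recall)

def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return (2 * precision * recall) / (precision + recall)

# Hypothetical values: perfect precision, imperfect recall.
print(f1_score(precision=1.0, recall=0.75))  # ~0.857
```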

learn-pr/wwl-data-ai/fundamentals-machine-learning/includes/8a-transformers.md

Lines changed: 1 addition & 3 deletions

@@ -81,7 +81,7 @@ We can plot these vectors in three-dimensional space, like this:
 
 ![Diagram of token vectors plotted in three dimensional space.](../media/embed-example.png)
 
-The embedding vectors for `"dog"` and `"puppy"` describe a path along an almost identical direction, which is also fairly similar to the direction for `"cat"`. The embedding vector for `"skateboard"` however describes journey in a very different direction.
+The embedding vectors for `"dog"` and `"puppy"` describe a path along an almost identical direction, which is also fairly similar to the direction for `"cat"`. The embedding vector for `"skateboard"` however describes a journey in a very different direction.
 
 > [!NOTE]
 > The previous example shows a simple example model in which each embedding has only three dimensions. Real language models have many more dimensions.
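
To illustrate the direction comparison described in the corrected sentence, here is a small Python sketch using cosine similarity; the three-dimensional vectors are made-up placeholders, since the actual values behind the module's diagram are not included in this diff.

```python
import math

# Hypothetical three-dimensional embeddings (not the module's real values).
embeddings = {
    "dog":        [10.3, 2.1, 8.7],
    "puppy":      [9.8, 2.0, 8.2],
    "cat":        [10.3, 2.0, 7.1],
    "skateboard": [-2.9, 8.8, 1.0],
}

def cosine_similarity(a, b):
    """1.0 means identical direction; values near 0 mean unrelated directions."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

for token in ("puppy", "cat", "skateboard"):
    print(token, round(cosine_similarity(embeddings["dog"], embeddings[token]), 3))
# "puppy" scores close to 1.0, "cat" slightly lower, "skateboard" much lower.
```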
@@ -100,12 +100,10 @@ In a decoder block, attention layers are used to predict the next token in a seq
 
 Remember that the attention layer is working with numeric vector representations of the tokens, not the actual text. In a decoder, the process starts with a sequence of token embeddings representing the text to be completed. The first thing that happens is that another *positional encoding* layer adds a value to each embedding to indicate its position in the sequence:
 
-```
 - [**1**,5,6,2] (I)
 - [**2**,9,3,1] (heard)
 - [**3**,1,1,2] (a)
 - [**4**,10,3,2] (dog)
-```
 
 During training, the goal is to predict the vector for the final token in the sequence based on the preceding tokens. The attention layer assigns a numeric *weight* to each token in the sequence so far. It uses that value to perform a calculation on the weighted vectors that produces an *attention score* that can be used to calculate a possible vector for the next token. In practice, a technique called *multi-head attention* uses different elements of the embeddings to calculate multiple attention scores. A neural network is then used to evaluate all possible tokens to determine the most probable token with which to continue the sequence. The process continues iteratively for each token in the sequence, with the output sequence so far being used regressively as the input for the next iteration – essentially building the output one token at a time.
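
As a rough illustration of the positional encoding and attention steps described in this hunk, here is a simplified single-head sketch in Python using the four position-encoded example vectors; real decoders use learned query/key/value projections and multi-head attention, so this is only a toy approximation.

```python
import math

tokens = ["I", "heard", "a", "dog"]
encoded = [
    [1, 5, 6, 2],    # "I"     (position 1)
    [2, 9, 3, 1],    # "heard" (position 2)
    [3, 1, 1, 2],    # "a"     (position 3)
    [4, 10, 3, 2],   # "dog"   (position 4)
]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

# Score every token seen so far against the last token's vector (the "query"),
# scale by the square root of the vector size, and normalise with a softmax.
query = encoded[-1]
dim = len(query)
scores = [dot(query, vec) / math.sqrt(dim) for vec in encoded]
max_score = max(scores)
exp_scores = [math.exp(s - max_score) for s in scores]
weights = [e / sum(exp_scores) for e in exp_scores]

# The weighted sum of the vectors is a candidate representation from which a
# network would pick the most probable next token. In this tiny example the
# weights end up heavily concentrated on the most similar vector.
attention_output = [
    sum(w * vec[i] for w, vec in zip(weights, encoded)) for i in range(dim)
]

for token, weight in zip(tokens, weights):
    print(token, round(weight, 3))
print([round(v, 3) for v in attention_output])
```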
