
Commit 6122070

committed
fixing syntax errors in posts
1 parent 30757ea commit 6122070

File tree

1 file changed: +7 −2 lines changed

docs/writing/posts/Karpathy's - let's build GPT from scratch.md

Lines changed: 7 additions & 2 deletions
@@ -111,10 +111,13 @@ In the first approach, we added 1 to the actual count because we don't want to e

Similarly, the gradient-based approach has its own way of "smoothing". When you keep all values of `W` at zero, exp(W) gives all ones and softmax assigns equal probability to every output. You incentivise this in the loss function by adding a second component, like below:

-```python
+
+```python
loss = -probs[torch.arange(228146), ys].log().mean() + (0.1 * (W**2).mean())
```

+
+
The second component pushes `W` towards zero; 0.1 is the regularization strength, which determines how much weight we want to give to this regularization component. It is similar to the number of "fake" counts you add in the first approach (see the sketch after this hunk).

We took two approaches
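
Not part of the commit, but as a side note on the hunk above: a minimal runnable sketch of the zero-weight case and the regularized loss. The 27-character alphabet, the `xs`/`ys` names, and the tiny batch standing in for the 228146 real examples are assumptions based on the bigram setup in the lecture.

```python
import torch
import torch.nn.functional as F

# Assumed bigram setup: 27 characters, W is the 27x27 weight matrix of the single layer
W = torch.zeros((27, 27), requires_grad=True)
xs = torch.tensor([0, 5, 13])   # stand-in for the 228146 real input indices
ys = torch.tensor([5, 13, 0])   # corresponding target indices

xenc = F.one_hot(xs, num_classes=27).float()
logits = xenc @ W                              # all zeros when W is zero
counts = logits.exp()                          # all ones
probs = counts / counts.sum(1, keepdim=True)
print(probs[0])                                # uniform: every entry is 1/27

# Regularized loss: negative log likelihood plus 0.1 * mean squared weight
loss = -probs[torch.arange(len(xs)), ys].log().mean() + 0.1 * (W**2).mean()
```

Raising the 0.1 strength drags `W` harder towards zero and flattens the predicted distribution, the same effect as adding more fake counts in the counting approach.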
@@ -135,12 +138,14 @@ As a first step, we need to build embedding for the characters, we start with 2

![Pasted%20image%2020250130124540](img/Pasted%20image%2020250205123847.png)

-Pasted image 20250205123847.png
+
+

```python
h = emb.view(-1, 6) @ W1 + b1 # Hidden layer activation
```

+
We index into the embedding matrix to get the weights / embeddings for each character. Another way to interpret this is one-hot encoding: indexing and one-hot encoding produce the same result (see the sketch after this hunk). In this view, we can think of the embedding matrix as the weights of the first layer of the neural network.

```python
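
Again not part of the commit, but a minimal sketch backing up the indexing vs. one-hot claim above. The 27-character alphabet, the 2-dimensional embedding table `C`, the 3-character context, and the 100-unit hidden layer (`W1`, `b1`) are assumptions based on the MLP setup in the lecture.

```python
import torch
import torch.nn.functional as F

# Assumed setup: 27 characters, 2-dimensional embeddings in a lookup table C,
# a context of 3 characters, so the hidden layer sees 3 * 2 = 6 inputs
C = torch.randn((27, 2))
W1 = torch.randn((6, 100))
b1 = torch.randn(100)

ix = torch.tensor([[5, 13, 1]])                      # one context of 3 character indices

emb_by_index = C[ix]                                 # plain indexing: shape (1, 3, 2)
one_hot = F.one_hot(ix, num_classes=27).float()      # shape (1, 3, 27)
emb_by_matmul = one_hot @ C                          # "first layer as matmul": shape (1, 3, 2)
print(torch.allclose(emb_by_index, emb_by_matmul))   # True: both pick the same rows of C

# Mirror the post's snippet: flatten the 3 embeddings into one 6-wide input per example
h = emb_by_index.view(-1, 6) @ W1 + b1               # shape (1, 100)
```

The `view(-1, 6)` simply concatenates the three 2-dimensional embeddings into one 6-wide vector per example before the matrix multiply.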
