Commit e4f6a2e

fix(slides): resolve lecture 16 overflow issues and convert code block to table

- Add scale classes to fix overflow on the training loop, HuggingFace, and scaling laws slides
- Convert 'Concrete numbers' code block to proper markdown table
- Recompile HTML and PDF with fixes applied

Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-opencode)

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
1 parent 3d11b60 commit e4f6a2e

File tree

3 files changed

+145
-110
lines changed


slides/week5/lecture16.html

Lines changed: 135 additions & 101 deletions
Large diffs are not rendered by default.

slides/week5/lecture16.md

Lines changed: 10 additions & 9 deletions
````diff
@@ -159,6 +159,7 @@ Perplexity measures how "confused" the model is. A perplexity of $k$ means the m
 </div>
 
 ---
+<!-- _class: scale-95 -->
 
 # The training loop
 
````
````diff
@@ -293,7 +294,7 @@ This preserves the gradient *direction* while limiting its *magnitude*. A common
 </div>
 
 ---
-<!-- _class: scale-90 -->
+<!-- _class: scale-80 -->
 
 # Training with HuggingFace
 
````
````diff
@@ -354,6 +355,7 @@ Performance improves as a straight line on a log-log plot. There are no sudden j
 </div>
 
 ---
+<!-- _class: scale-85 -->
 
 # Visualizing scaling laws
 
````
````diff
@@ -369,14 +371,13 @@ This means doubling compute reduces loss by a *fixed percentage* — not a fixed
 
 <div class="example-box" data-title="Concrete numbers">
 
-```
-Model         Parameters  Loss  Perplexity
-GPT-2 small   117M        3.30  27.0
-GPT-2 medium  345M        3.07  21.5
-GPT-2 large   774M        2.93  18.8
-GPT-2 XL      1.5B        2.85  17.4
-GPT-3         175B        ~2.4  ~11.0
-```
+| Model | Parameters | Loss | Perplexity |
+|-------|-----------|------|-----------|
+| GPT-2 small | 117M | 3.30 | 27.0 |
+| GPT-2 medium | 345M | 3.07 | 21.5 |
+| GPT-2 large | 774M | 2.93 | 18.8 |
+| GPT-2 XL | 1.5B | 2.85 | 17.4 |
+| GPT-3 | 175B | ~2.4 | ~11.0 |
 
 Each ~10x increase in parameters gives roughly the same *percentage* improvement in loss.
 
````
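The table's Loss and Perplexity columns are related by perplexity = exp(loss), and the roughly constant percentage improvement per ~10x in parameters is what a power law in parameter count predicts. A quick check using the GPT-2 rows from the table (the slope fit is an illustration, not a computation from the slides):

```python
import math

# Rows from the table: (name, parameters, loss, perplexity).
rows = [
    ("GPT-2 small",  117e6, 3.30, 27.0),
    ("GPT-2 medium", 345e6, 3.07, 21.5),
    ("GPT-2 large",  774e6, 2.93, 18.8),
    ("GPT-2 XL",     1.5e9, 2.85, 17.4),
]

# Perplexity is exp(loss): the two columns agree to within rounding.
for _, _, loss, ppl in rows:
    assert abs(math.exp(loss) - ppl) / ppl < 0.01

# Loss falls as a power law in parameter count N, loss ~ N^(-alpha),
# i.e. a straight line on a log-log plot. Fit the slope from the endpoints.
alpha = (math.log(3.30) - math.log(2.85)) / (math.log(1.5e9) - math.log(117e6))
```

The fitted exponent is small (roughly 0.06 for these rows), which is why each order of magnitude in parameters buys only a modest, but consistent, percentage reduction in loss.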

slides/week5/lecture16.pdf

-156 KB
Binary file not shown.
