Commit b73f618
examples/finetune: add -opt SGD (stochastic gradient descent) memory optimization
Add unit-tested GGML_OPT_OPTIMIZER_SGD to ggml; unlike AdamW, SGD avoids allocating the m and v moment tensors.
Support the finetune.cpp arg -opt SGD (or sgd); the default remains adamw, as before.
Llama 3.2-1B-F32 result (finetuning on 100 lines of Wikipedia): observed 11 GB GPU RAM (41 sec/epoch) with SGD vs. 19 GB (55 sec/epoch) with AdamW.
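The memory saving follows from plain SGD keeping no per-parameter optimizer state, while AdamW maintains first- and second-moment tensors (m, v) the same shape as the weights. A minimal C sketch of the two update rules for contrast (illustrative only, not the actual ggml kernels; bias correction and hyperparameter names follow the standard AdamW formulation):

```c
#include <math.h>
#include <stdint.h>

// AdamW: needs persistent m and v arrays, each as large as the weights,
// so optimizer state roughly triples parameter memory.
void adamw_step(float *x, const float *g, float *m, float *v, int64_t n,
                float alpha, float beta1, float beta2, float eps,
                int64_t t /* 1-based step count for bias correction */) {
    for (int64_t i = 0; i < n; i++) {
        m[i] = beta1 * m[i] + (1.0f - beta1) * g[i];
        v[i] = beta2 * v[i] + (1.0f - beta2) * g[i] * g[i];
        const float mh = m[i] / (1.0f - powf(beta1, (float) t)); // bias-corrected
        const float vh = v[i] / (1.0f - powf(beta2, (float) t));
        x[i] -= alpha * mh / (sqrtf(vh) + eps);
    }
}

// Plain SGD: no per-parameter optimizer state at all.
void sgd_step(float *x, const float *g, int64_t n, float alpha) {
    for (int64_t i = 0; i < n; i++) {
        x[i] -= alpha * g[i];
    }
}
```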
(Using the same GPU memory, AdamW can only reach 512 batch/context before OOM, reaching:

train: [███████▉] data=0000140/0000140 loss=0.02575±0.00099 acc=99.52±0.03% t=00:00:47 ETA=00:00:00
val: [███████▉] data=0000008/0000008 loss=4.76565±0.28810 acc=41.46±0.77% t=00:00:00 ETA=00:00:00

SGD is superior here: although it converges more slowly, it fits up to 1728 batch/context before OOM; note especially the better validation performance:

train: [███████▉] data=0000039/0000039 loss=0.00371±0.00010 acc=99.96±0.01% t=00:00:41 ETA=00:00:00
val: [███████▉] data=0000003/0000003 loss=5.11406±0.76034 acc=48.01±0.69% t=00:00:01 ETA=00:00:00)
Note: when finetuning long enough (or with a high enough -lr), validation accuracy *eventually* drops ('catastrophic forgetting').
The -lr-half (half-life) option is useful for SGD to avoid oscillation or very slow underdamped learning (it makes setting -lr more forgiving).
The terminal -lr is for now set via -lr-halvings, i.e. if you want at most 1/8 the initial -lr, set -lr-halvings 3.
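One plausible reading of these flags as a schedule, sketched in C (the half-life unit and the exact floor semantics are assumptions, not taken from the finetune.cpp source):

```c
#include <math.h>

// Assumed schedule: lr decays by half every `lr_half` epochs and is floored
// at lr0 * 0.5^lr_halvings (e.g. -lr-halvings 3 -> at most lr0/8).
float scheduled_lr(float lr0, float epoch, float lr_half, int lr_halvings) {
    const float decayed = lr0 * powf(0.5f, epoch / lr_half);
    const float lr_min  = lr0 * powf(0.5f, (float) lr_halvings);
    return decayed > lr_min ? decayed : lr_min;
}
```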
Note: objective loss may not be directly comparable between AdamW and SGD; check perplexity (the exp of the mean cross-entropy loss) or accuracy instead, or consider relative improvements when judging convergence.
New finetune args: -wd 1e-9 to enable weight decay in SGD or AdamW, and -epochs N to set the max epochs (default 2, as before).
Caching (1 - wd*alpha) in the 'adamw' opt struct gave no noticeable perf benefit, so it is disabled there (it is still done for the new SGD, though).
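For reference, that scalar comes from the decoupled weight-decay step, which scales each weight by (1 - alpha*wd) before the gradient update; caching it just hoists one multiply out of the per-element loop. A sketch (illustrative, not the ggml source):

```c
#include <stdint.h>

// Decoupled weight decay: scale weights by (1 - alpha*wd), then apply the
// gradient step. The scalar depends only on alpha and wd, so it can be
// computed once per optimizer step rather than per element.
void sgd_step_wd(float *x, const float *g, int64_t n, float alpha, float wd) {
    const float keep = 1.0f - alpha * wd; // cached once per step
    for (int64_t i = 0; i < n; i++) {
        x[i] = keep * x[i] - alpha * g[i];
    }
}
```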
Since optimizer memory is pre-allocated, the ggml_opt_get_optimizer_params callback would probably be able to switch between SGD and AdamW on each epoch, though it would need to use AdamW for the first one (unconfirmed; there is no cmdline arg to set such a policy yet).
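A hedged sketch of what such a per-epoch policy could look like. The callback shape and ggml_opt_get_default_optimizer_params() come from ggml-opt.h, but the optimizer-selector and sgd fields below are hypothetical placeholders, not confirmed struct members:

```c
#include "ggml-opt.h"

// Hypothetical per-epoch policy: AdamW for the first epoch, SGD afterwards.
struct ggml_opt_optimizer_params my_params(void * userdata) {
    const int epoch = *(const int *) userdata; // caller keeps this updated
    struct ggml_opt_optimizer_params p = ggml_opt_get_default_optimizer_params(NULL);
    if (epoch > 0) {
        p.optimizer = GGML_OPT_OPTIMIZER_SGD; // hypothetical selector field
        p.sgd.alpha = 1e-4f;                  // hypothetical sgd sub-struct
        p.sgd.wd    = 1e-9f;
    }
    return p;
}
```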
test-opt checks AdamW as before and now also SGD (except for a few tests disabled for SGD only; these probably just need their values logged and alternate reference values added); the tolerance on the 'regression' test is broader for SGD (so we don't need many more epochs).

1 parent 55c2646
File tree

23 files changed: +729 −213 lines

- common
- examples/training
- ggml
  - include
  - src
    - ggml-cpu
    - ggml-cuda
    - ggml-vulkan
- include
- src
- tests