Commit 48cce2b
committed
examples/finetune -opt SGD (stochastic gradient descent) memory opt
add unit tested GGML_OPT_OPTIMIZER_SGD to ggml - avoids allocating
m, v tensors.
support finetune.cpp arg -opt SGD (or sgd). (default adamw as before)
llama 3.2-1b-F32 result: observed 11gb gpu ram (41 sec/epoch)
when using SGD instead of 19gb (55 sec/epoch) using adamw.
(wikipedia 100 lines finetune)
(
using the same GPU memory, adamw can only do before OOM 512
batch/context, reaching:
train: [███████▉] data=0000140/0000140 loss=0.02575±0.00099 acc=99.52±0.03% t=00:00:47 ETA=00:00:00
val: [███████▉] data=0000008/0000008 loss=4.76565±0.28810 acc=41.46±0.77% t=00:00:00 ETA=00:00:00
SGD is superior, though it converges slower, with max before OOM 1728
batch/context (esp see the better validation perf):
train: [███████▉] data=0000039/0000039 loss=0.00371±0.00010 acc=99.96±0.01% t=00:00:41 ETA=00:00:00
val: [███████▉] data=0000003/0000003 loss=5.11406±0.76034 acc=48.01±0.69% t=00:00:01 ETA=00:00:00
)
note: when finetuning long enough (or w/ enough -lr),
validation accuracy *eventually* drops ('catastrophic forgetting')
-lr-half (halflife) option useful for SGD to avoid oscillation or
super slow underdamped learning (makes setting -lr more forgiving).
terminal -lr for now is set by lr-halvings i.e. if you want at most
1/8 the inital -lr you set -lr-halvings 3.
note: objective loss not directly comparable between adamw, sgd? -
check perplexity or accuracy or consider relative improvements
for convergence
new finetune args -wd 1e-9 to enable weight decay in sgd or adamw,
and max -epochs N (default 2 as before)
cache (1 - wd*alpha) in 'adamw' opt struct -
no noticeable perf benefit, disabled (still done
for new SGD though)
since opt. memory is pre-allocated, the ggml_opt_get_optimizer_params
would probably be able to change between SGD and AdamW with each epoch
but would need to use adamw for the first (unconfirmed - no cmdline arg
to set such a policy yet)
test-opt checks adamw as before and now sgd (except for a few disabled
tests for sgd only; probably just needs logging values and adding
alternate reference values); tolerance on the 'regression'
test is broader for sgd (so we don't need many more epochs)1 parent 21c0217 commit 48cce2b
File tree
23 files changed
+727
-211
lines changed- common
- examples/training
- ggml
- include
- src
- ggml-cpu
- ggml-cuda
- ggml-vulkan
- include
- src
- tests
23 files changed
+727
-211
lines changed| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
12 | 12 | | |
13 | 13 | | |
14 | 14 | | |
| 15 | + | |
| 16 | + | |
15 | 17 | | |
16 | 18 | | |
17 | 19 | | |
| |||
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
1196 | 1196 | | |
1197 | 1197 | | |
1198 | 1198 | | |
| 1199 | + | |
1199 | 1200 | | |
1200 | 1201 | | |
1201 | 1202 | | |
| |||
2617 | 2618 | | |
2618 | 2619 | | |
2619 | 2620 | | |
2620 | | - | |
| 2621 | + | |
2621 | 2622 | | |
2622 | | - | |
| 2623 | + | |
2623 | 2624 | | |
2624 | 2625 | | |
2625 | 2626 | | |
| |||
3460 | 3461 | | |
3461 | 3462 | | |
3462 | 3463 | | |
| 3464 | + | |
| 3465 | + | |
| 3466 | + | |
| 3467 | + | |
| 3468 | + | |
| 3469 | + | |
| 3470 | + | |
| 3471 | + | |
| 3472 | + | |
| 3473 | + | |
| 3474 | + | |
| 3475 | + | |
| 3476 | + | |
| 3477 | + | |
| 3478 | + | |
| 3479 | + | |
| 3480 | + | |
| 3481 | + | |
| 3482 | + | |
| 3483 | + | |
| 3484 | + | |
| 3485 | + | |
| 3486 | + | |
| 3487 | + | |
| 3488 | + | |
| 3489 | + | |
| 3490 | + | |
| 3491 | + | |
| 3492 | + | |
| 3493 | + | |
| 3494 | + | |
| 3495 | + | |
| 3496 | + | |
| 3497 | + | |
| 3498 | + | |
| 3499 | + | |
| 3500 | + | |
| 3501 | + | |
| 3502 | + | |
| 3503 | + | |
| 3504 | + | |
| 3505 | + | |
| 3506 | + | |
| 3507 | + | |
| 3508 | + | |
| 3509 | + | |
3463 | 3510 | | |
3464 | 3511 | | |
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
41 | 41 | | |
42 | 42 | | |
43 | 43 | | |
| 44 | + | |
44 | 45 | | |
45 | 46 | | |
46 | 47 | | |
| |||
1555 | 1556 | | |
1556 | 1557 | | |
1557 | 1558 | | |
| 1559 | + | |
| 1560 | + | |
| 1561 | + | |
| 1562 | + | |
| 1563 | + | |
| 1564 | + | |
| 1565 | + | |
| 1566 | + | |
| 1567 | + | |
| 1568 | + | |
| 1569 | + | |
| 1570 | + | |
| 1571 | + | |
| 1572 | + | |
| 1573 | + | |
| 1574 | + | |
| 1575 | + | |
| 1576 | + | |
| 1577 | + | |
| 1578 | + | |
| 1579 | + | |
| 1580 | + | |
| 1581 | + | |
| 1582 | + | |
| 1583 | + | |
| 1584 | + | |
| 1585 | + | |
| 1586 | + | |
| 1587 | + | |
| 1588 | + | |
| 1589 | + | |
| 1590 | + | |
| 1591 | + | |
| 1592 | + | |
| 1593 | + | |
| 1594 | + | |
| 1595 | + | |
| 1596 | + | |
| 1597 | + | |
| 1598 | + | |
| 1599 | + | |
| 1600 | + | |
| 1601 | + | |
| 1602 | + | |
| 1603 | + | |
| 1604 | + | |
| 1605 | + | |
| 1606 | + | |
| 1607 | + | |
| 1608 | + | |
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
2 | 2 | | |
3 | 3 | | |
4 | 4 | | |
5 | | - | |
6 | | - | |
7 | 5 | | |
| 6 | + | |
8 | 7 | | |
9 | 8 | | |
10 | 9 | | |
11 | 10 | | |
12 | 11 | | |
| 12 | + | |
| 13 | + | |
| 14 | + | |
| 15 | + | |
13 | 16 | | |
14 | 17 | | |
15 | 18 | | |
| |||
82 | 85 | | |
83 | 86 | | |
84 | 87 | | |
| 88 | + | |
85 | 89 | | |
86 | 90 | | |
87 | 91 | | |
| |||
233 | 237 | | |
234 | 238 | | |
235 | 239 | | |
| 240 | + | |
| 241 | + | |
| 242 | + | |
| 243 | + | |
| 244 | + | |
| 245 | + | |
| 246 | + | |
| 247 | + | |
| 248 | + | |
| 249 | + | |
| 250 | + | |
| 251 | + | |
| 252 | + | |
| 253 | + | |
| 254 | + | |
| 255 | + | |
| 256 | + | |
| 257 | + | |
| 258 | + | |
236 | 259 | | |
237 | 260 | | |
238 | 261 | | |
| |||
366 | 389 | | |
367 | 390 | | |
368 | 391 | | |
| 392 | + | |
| 393 | + | |
| 394 | + | |
| 395 | + | |
| 396 | + | |
369 | 397 | | |
370 | 398 | | |
371 | 399 | | |
| |||
690 | 718 | | |
691 | 719 | | |
692 | 720 | | |
| 721 | + | |
| 722 | + | |
| 723 | + | |
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
10 | 10 | | |
11 | 11 | | |
12 | 12 | | |
13 | | - | |
| 13 | + | |
14 | 14 | | |
15 | 15 | | |
16 | 16 | | |
17 | 17 | | |
18 | | - | |
19 | 18 | | |
20 | 19 | | |
21 | | - | |
| 20 | + | |
22 | 21 | | |
23 | 22 | | |
24 | 23 | | |
25 | 24 | | |
26 | | - | |
| 25 | + | |
| 26 | + | |
27 | 27 | | |
28 | 28 | | |
29 | 29 | | |
| |||
38 | 38 | | |
39 | 39 | | |
40 | 40 | | |
41 | | - | |
42 | 41 | | |
43 | | - | |
44 | | - | |
45 | | - | |
| 42 | + | |
| 43 | + | |
| 44 | + | |
| 45 | + | |
46 | 46 | | |
47 | 47 | | |
48 | 48 | | |
| |||
55 | 55 | | |
56 | 56 | | |
57 | 57 | | |
58 | | - | |
59 | | - | |
60 | | - | |
61 | | - | |
62 | | - | |
63 | | - | |
64 | | - | |
65 | | - | |
66 | | - | |
67 | | - | |
68 | | - | |
69 | | - | |
70 | | - | |
71 | | - | |
| 58 | + | |
| 59 | + | |
| 60 | + | |
| 61 | + | |
| 62 | + | |
| 63 | + | |
| 64 | + | |
| 65 | + | |
| 66 | + | |
| 67 | + | |
| 68 | + | |
| 69 | + | |
| 70 | + | |
| 71 | + | |
| 72 | + | |
72 | 73 | | |
73 | | - | |
| 74 | + | |
74 | 75 | | |
75 | | - | |
| 76 | + | |
76 | 77 | | |
77 | 78 | | |
78 | 79 | | |
79 | 80 | | |
80 | | - | |
81 | | - | |
82 | | - | |
| 81 | + | |
| 82 | + | |
| 83 | + | |
83 | 84 | | |
84 | 85 | | |
85 | 86 | | |
| |||
88 | 89 | | |
89 | 90 | | |
90 | 91 | | |
91 | | - | |
| 92 | + | |
92 | 93 | | |
93 | 94 | | |
94 | 95 | | |
| |||
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
74 | 74 | | |
75 | 75 | | |
76 | 76 | | |
| 77 | + | |
| 78 | + | |
| 79 | + | |
| 80 | + | |
| 81 | + | |
| 82 | + | |
| 83 | + | |
77 | 84 | | |
78 | 85 | | |
79 | | - | |
80 | 86 | | |
81 | | - | |
82 | | - | |
83 | | - | |
84 | | - | |
85 | | - | |
| 87 | + | |
| 88 | + | |
| 89 | + | |
| 90 | + | |
| 91 | + | |
86 | 92 | | |
| 93 | + | |
| 94 | + | |
| 95 | + | |
| 96 | + | |
87 | 97 | | |
88 | 98 | | |
89 | 99 | | |
| |||
113 | 123 | | |
114 | 124 | | |
115 | 125 | | |
116 | | - | |
| 126 | + | |
| 127 | + | |
| 128 | + | |
| 129 | + | |
117 | 130 | | |
118 | 131 | | |
119 | 132 | | |
| |||
142 | 155 | | |
143 | 156 | | |
144 | 157 | | |
| 158 | + | |
| 159 | + | |
| 160 | + | |
| 161 | + | |
145 | 162 | | |
146 | 163 | | |
147 | 164 | | |
| |||
226 | 243 | | |
227 | 244 | | |
228 | 245 | | |
| 246 | + | |
229 | 247 | | |
230 | 248 | | |
231 | 249 | | |
232 | 250 | | |
233 | 251 | | |
234 | 252 | | |
| 253 | + | |
235 | 254 | | |
236 | 255 | | |
237 | 256 | | |
0 commit comments