Commit 7c03387
committed
examples/finetune -opt SGD (stochastic gradient descent) memory opt
add unit tested GGML_OPT_OPTIMIZER_SGD to ggml - avoids allocating
m, v tensors.
support finetune.cpp arg -opt SGD (or sgd). (default adamw as before)
llama 3.2-1b-F32 result: observed 11gb gpu ram (41 sec/epoch)
when using SGD instead of 19gb (55 sec/epoch) using adamw.
(wikipedia 100 lines finetune)
(
using the same GPU memory, adamw can only do before OOM 512
batch/context, reaching:
train: [███████▉] data=0000140/0000140 loss=0.02575±0.00099 acc=99.52±0.03% t=00:00:47 ETA=00:00:00
val: [███████▉] data=0000008/0000008 loss=4.76565±0.28810 acc=41.46±0.77% t=00:00:00 ETA=00:00:00
SGD is superior, though it converges slower, with max before OOM 1728
batch/context (esp see the better validation perf):
train: [███████▉] data=0000039/0000039 loss=0.00371±0.00010 acc=99.96±0.01% t=00:00:41 ETA=00:00:00
val: [███████▉] data=0000003/0000003 loss=5.11406±0.76034 acc=48.01±0.69% t=00:00:01 ETA=00:00:00
)
note: when finetuning long enough (or w/ enough -lr),
validation accuracy *eventually* drops ('catastrophic forgetting')
-lr-half (halflife) option useful for SGD to avoid oscillation or
super slow underdamped learning (makes setting -lr more forgiving).
terminal -lr for now is set by lr-halvings i.e. if you want at most
1/8 the inital -lr you set -lr-halvings 3.
note: objective loss not directly comparable between adamw, sgd? -
check perplexity or accuracy or consider relative improvements
for convergence
new finetune args -wd 1e-9 to enable weight decay in sgd or adamw,
and max -epochs N (default 2 as before)
cache (1 - wd*alpha) in 'adamw' opt struct -
no noticeable perf benefit, disabled (still done
for new SGD though)
since opt. memory is pre-allocated, the ggml_opt_get_optimizer_params
would probably be able to change between SGD and AdamW with each epoch
but would need to use adamw for the first (unconfirmed - no cmdline arg
to set such a policy yet)
test-opt checks adamw as before and now sgd (except for a few disabled
tests for sgd only; probably just needs logging values and adding
alternate reference values); tolerance on the 'regression'
test is broader for sgd (so we don't need many more epochs)1 parent 860a9e4 commit 7c03387
File tree
22 files changed
+681
-211
lines changed- common
- examples/training
- ggml
- include
- src
- ggml-cpu
- ggml-cuda
- include
- src
- tests
22 files changed
+681
-211
lines changed| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
12 | 12 | | |
13 | 13 | | |
14 | 14 | | |
| 15 | + | |
| 16 | + | |
15 | 17 | | |
16 | 18 | | |
17 | 19 | | |
| |||
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
1196 | 1196 | | |
1197 | 1197 | | |
1198 | 1198 | | |
| 1199 | + | |
1199 | 1200 | | |
1200 | 1201 | | |
1201 | 1202 | | |
| |||
2609 | 2610 | | |
2610 | 2611 | | |
2611 | 2612 | | |
2612 | | - | |
| 2613 | + | |
2613 | 2614 | | |
2614 | | - | |
| 2615 | + | |
2615 | 2616 | | |
2616 | 2617 | | |
2617 | 2618 | | |
| |||
3373 | 3374 | | |
3374 | 3375 | | |
3375 | 3376 | | |
| 3377 | + | |
| 3378 | + | |
| 3379 | + | |
| 3380 | + | |
| 3381 | + | |
| 3382 | + | |
| 3383 | + | |
| 3384 | + | |
| 3385 | + | |
| 3386 | + | |
| 3387 | + | |
| 3388 | + | |
| 3389 | + | |
| 3390 | + | |
| 3391 | + | |
| 3392 | + | |
| 3393 | + | |
| 3394 | + | |
| 3395 | + | |
| 3396 | + | |
| 3397 | + | |
| 3398 | + | |
| 3399 | + | |
| 3400 | + | |
| 3401 | + | |
| 3402 | + | |
| 3403 | + | |
| 3404 | + | |
| 3405 | + | |
| 3406 | + | |
| 3407 | + | |
| 3408 | + | |
| 3409 | + | |
| 3410 | + | |
| 3411 | + | |
| 3412 | + | |
| 3413 | + | |
| 3414 | + | |
| 3415 | + | |
| 3416 | + | |
| 3417 | + | |
| 3418 | + | |
| 3419 | + | |
| 3420 | + | |
| 3421 | + | |
| 3422 | + | |
3376 | 3423 | | |
3377 | 3424 | | |
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
1539 | 1539 | | |
1540 | 1540 | | |
1541 | 1541 | | |
| 1542 | + | |
| 1543 | + | |
| 1544 | + | |
| 1545 | + | |
| 1546 | + | |
| 1547 | + | |
| 1548 | + | |
| 1549 | + | |
| 1550 | + | |
| 1551 | + | |
| 1552 | + | |
| 1553 | + | |
| 1554 | + | |
| 1555 | + | |
| 1556 | + | |
| 1557 | + | |
| 1558 | + | |
| 1559 | + | |
| 1560 | + | |
| 1561 | + | |
| 1562 | + | |
| 1563 | + | |
| 1564 | + | |
| 1565 | + | |
| 1566 | + | |
| 1567 | + | |
| 1568 | + | |
| 1569 | + | |
| 1570 | + | |
| 1571 | + | |
| 1572 | + | |
| 1573 | + | |
| 1574 | + | |
| 1575 | + | |
| 1576 | + | |
| 1577 | + | |
| 1578 | + | |
| 1579 | + | |
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
2 | 2 | | |
3 | 3 | | |
4 | 4 | | |
5 | | - | |
6 | | - | |
7 | 5 | | |
| 6 | + | |
8 | 7 | | |
9 | 8 | | |
10 | 9 | | |
11 | | - | |
| 10 | + | |
| 11 | + | |
| 12 | + | |
| 13 | + | |
12 | 14 | | |
13 | 15 | | |
14 | 16 | | |
| |||
80 | 82 | | |
81 | 83 | | |
82 | 84 | | |
| 85 | + | |
83 | 86 | | |
84 | 87 | | |
85 | 88 | | |
| |||
219 | 222 | | |
220 | 223 | | |
221 | 224 | | |
| 225 | + | |
| 226 | + | |
| 227 | + | |
| 228 | + | |
| 229 | + | |
| 230 | + | |
| 231 | + | |
| 232 | + | |
| 233 | + | |
| 234 | + | |
| 235 | + | |
| 236 | + | |
| 237 | + | |
| 238 | + | |
| 239 | + | |
| 240 | + | |
| 241 | + | |
| 242 | + | |
| 243 | + | |
| 244 | + | |
| 245 | + | |
222 | 246 | | |
223 | 247 | | |
224 | 248 | | |
| |||
350 | 374 | | |
351 | 375 | | |
352 | 376 | | |
| 377 | + | |
| 378 | + | |
| 379 | + | |
| 380 | + | |
| 381 | + | |
| 382 | + | |
353 | 383 | | |
354 | 384 | | |
355 | 385 | | |
| |||
670 | 700 | | |
671 | 701 | | |
672 | 702 | | |
| 703 | + | |
| 704 | + | |
| 705 | + | |
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
1 | | - | |
2 | | - | |
3 | | - | |
4 | | - | |
5 | | - | |
6 | 1 | | |
7 | 2 | | |
8 | 3 | | |
9 | 4 | | |
10 | 5 | | |
11 | 6 | | |
| 7 | + | |
| 8 | + | |
| 9 | + | |
| 10 | + | |
| 11 | + | |
12 | 12 | | |
13 | | - | |
| 13 | + | |
14 | 14 | | |
15 | 15 | | |
| 16 | + | |
| 17 | + | |
16 | 18 | | |
17 | 19 | | |
18 | | - | |
19 | 20 | | |
20 | 21 | | |
21 | | - | |
| 22 | + | |
22 | 23 | | |
23 | 24 | | |
24 | 25 | | |
25 | 26 | | |
26 | | - | |
| 27 | + | |
| 28 | + | |
27 | 29 | | |
28 | 30 | | |
29 | 31 | | |
| |||
38 | 40 | | |
39 | 41 | | |
40 | 42 | | |
41 | | - | |
42 | 43 | | |
43 | | - | |
44 | | - | |
45 | | - | |
| 44 | + | |
| 45 | + | |
| 46 | + | |
| 47 | + | |
46 | 48 | | |
47 | 49 | | |
48 | 50 | | |
| |||
55 | 57 | | |
56 | 58 | | |
57 | 59 | | |
58 | | - | |
59 | | - | |
60 | | - | |
61 | | - | |
62 | | - | |
63 | | - | |
64 | | - | |
65 | | - | |
66 | | - | |
67 | | - | |
68 | | - | |
69 | | - | |
70 | | - | |
71 | | - | |
| 60 | + | |
| 61 | + | |
| 62 | + | |
| 63 | + | |
| 64 | + | |
| 65 | + | |
| 66 | + | |
| 67 | + | |
| 68 | + | |
| 69 | + | |
| 70 | + | |
| 71 | + | |
| 72 | + | |
| 73 | + | |
| 74 | + | |
72 | 75 | | |
73 | | - | |
| 76 | + | |
74 | 77 | | |
75 | | - | |
| 78 | + | |
76 | 79 | | |
77 | 80 | | |
78 | 81 | | |
79 | 82 | | |
80 | | - | |
81 | | - | |
82 | | - | |
| 83 | + | |
| 84 | + | |
| 85 | + | |
83 | 86 | | |
84 | 87 | | |
85 | 88 | | |
| |||
88 | 91 | | |
89 | 92 | | |
90 | 93 | | |
91 | | - | |
| 94 | + | |
92 | 95 | | |
93 | 96 | | |
94 | 97 | | |
| |||
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
74 | 74 | | |
75 | 75 | | |
76 | 76 | | |
| 77 | + | |
| 78 | + | |
| 79 | + | |
| 80 | + | |
| 81 | + | |
| 82 | + | |
| 83 | + | |
77 | 84 | | |
78 | 85 | | |
79 | | - | |
80 | 86 | | |
81 | | - | |
82 | | - | |
83 | | - | |
84 | | - | |
85 | | - | |
| 87 | + | |
| 88 | + | |
| 89 | + | |
| 90 | + | |
| 91 | + | |
86 | 92 | | |
| 93 | + | |
| 94 | + | |
| 95 | + | |
| 96 | + | |
87 | 97 | | |
88 | 98 | | |
89 | 99 | | |
| |||
113 | 123 | | |
114 | 124 | | |
115 | 125 | | |
116 | | - | |
| 126 | + | |
| 127 | + | |
| 128 | + | |
| 129 | + | |
117 | 130 | | |
118 | 131 | | |
119 | 132 | | |
| |||
142 | 155 | | |
143 | 156 | | |
144 | 157 | | |
| 158 | + | |
| 159 | + | |
| 160 | + | |
| 161 | + | |
145 | 162 | | |
146 | 163 | | |
147 | 164 | | |
| |||
226 | 243 | | |
227 | 244 | | |
228 | 245 | | |
| 246 | + | |
229 | 247 | | |
230 | 248 | | |
231 | 249 | | |
232 | 250 | | |
233 | 251 | | |
234 | 252 | | |
| 253 | + | |
235 | 254 | | |
236 | 255 | | |
237 | 256 | | |
0 commit comments