New finetune CLI args: -wd 1e-5 to enable weight decay in SGD or AdamW, and -epochs N (default 2, as before).
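For illustration, appending something like -wd 1e-5 -epochs 4 to an existing finetune invocation enables decoupled weight decay and overrides the epoch count; only these two flags are new here, and the rest of the invocation is assumed unchanged.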
Cache 1.0f - wd*alpha in the 'adamw' opt struct.
Cache the computed optimizer options (previously they were computed twice per epoch).
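A minimal sketch of why this factor can be cached, assuming the usual decoupled-weight-decay AdamW update; the names here are illustrative, not the actual ggml code:

    #include <math.h>

    // Minimal sketch of an AdamW step for one scalar parameter (illustrative,
    // not the actual ggml kernel). The factor keep = 1 - wd*alpha depends only
    // on hyperparameters, so it can be computed once and cached in the
    // optimizer params instead of being recomputed per tensor per step.
    static float adamw_step(float x, float g, float * m, float * v,
                            float alpha, float beta1, float beta2, float eps,
                            float keep /* cached 1 - wd*alpha */, int t) {
        *m = beta1 * (*m) + (1.0f - beta1) * g;
        *v = beta2 * (*v) + (1.0f - beta2) * g * g;
        const float mh = *m / (1.0f - powf(beta1, (float) t)); // bias correction
        const float vh = *v / (1.0f - powf(beta2, (float) t));
        return keep * x - alpha * mh / (sqrtf(vh) + eps);
    }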
New GGML_OPT_OPTIMIZER_SGD in ggml; it avoids allocating the AdamW moment buffers m and v. ggml_opt_init is now aware of the optimization method.
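For contrast, a sketch of the corresponding SGD step (again with illustrative names, not the actual kernel): with decoupled weight decay folded into the same cached factor, there is no per-parameter optimizer state at all, which is where the memory saving reported below comes from.

    // Minimal sketch of an SGD step with decoupled weight decay (illustrative,
    // not the actual ggml kernel). No first/second-moment buffers (m, v) are
    // needed, so SGD adds no extra per-parameter optimizer state.
    static float sgd_step(float x, float g, float alpha,
                          float keep /* cached 1 - wd*alpha */) {
        return keep * x - alpha * g;
    }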
Observed about 11 GB of GPU RAM with SGD versus 20 GB with AdamW for Llama 3.2-1B-F32 (finetune/ggml-opt only works on F32 so far); the difference is roughly consistent with dropping the two F32 moment buffers, i.e. about 2 × 4 bytes per parameter for a ~1.2B-parameter model. The objective (perplexity) is not directly comparable, but improvements were observed over two epochs, and accuracy on the training set strictly improves when switching between tuning methods.
Since memory is pre-allocated, the user-defined function that varies optimizer settings could probably switch between SGD and AdamW each epoch, but it would need to use AdamW for the first epoch (not verified).
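A hypothetical sketch of such a per-epoch choice; the struct and names below are assumptions for illustration, not the actual ggml-opt callback API:

    #include <stdint.h>

    // Hypothetical per-epoch optimizer settings (illustrative names only).
    struct my_opt_settings {
        int   use_sgd; // 0 = AdamW, 1 = SGD
        float alpha;   // learning rate
        float wd;      // weight decay
    };

    // First epoch uses AdamW so its m/v state gets allocated up front;
    // later epochs could switch to SGD, which simply ignores m and v
    // (not verified, as noted above).
    static struct my_opt_settings settings_for_epoch(int64_t epoch) {
        struct my_opt_settings s;
        s.use_sgd = epoch > 0;
        s.alpha   = 1e-4f;
        s.wd      = 1e-5f;
        return s;
    }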