Skip to content

Commit f97e4e7

Browse files
graehl0cc4mJohannesGaessler
committed
finetune: SGD optimizer, more CLI args (ggml-org#13873)
* examples/finetune -opt SGD (stochastic gradient descent) memory opt add unit tested GGML_OPT_OPTIMIZER_SGD to ggml - avoids allocating m, v tensors. support finetune.cpp arg -opt SGD (or sgd). (default adamw as before) llama 3.2-1b-F32 result: observed 11gb gpu ram (41 sec/epoch) when using SGD instead of 19gb (55 sec/epoch) using adamw. (wikipedia 100 lines finetune) ( using the same GPU memory, adamw can only do before OOM 512 batch/context, reaching: train: [███████▉] data=0000140/0000140 loss=0.02575±0.00099 acc=99.52±0.03% t=00:00:47 ETA=00:00:00 val: [███████▉] data=0000008/0000008 loss=4.76565±0.28810 acc=41.46±0.77% t=00:00:00 ETA=00:00:00 SGD is superior, though it converges slower, with max before OOM 1728 batch/context (esp see the better validation perf): train: [███████▉] data=0000039/0000039 loss=0.00371±0.00010 acc=99.96±0.01% t=00:00:41 ETA=00:00:00 val: [███████▉] data=0000003/0000003 loss=5.11406±0.76034 acc=48.01±0.69% t=00:00:01 ETA=00:00:00 ) note: when finetuning long enough (or w/ enough -lr), validation accuracy *eventually* drops ('catastrophic forgetting') -lr-half (halflife) option useful for SGD to avoid oscillation or super slow underdamped learning (makes setting -lr more forgiving). terminal -lr for now is set by lr-halvings i.e. if you want at most 1/8 the inital -lr you set -lr-halvings 3. note: objective loss not directly comparable between adamw, sgd? - check perplexity or accuracy or consider relative improvements for convergence new finetune args -wd 1e-9 to enable weight decay in sgd or adamw, and max -epochs N (default 2 as before) cache (1 - wd*alpha) in 'adamw' opt struct - no noticeable perf benefit, disabled (still done for new SGD though) since opt. memory is pre-allocated, the ggml_opt_get_optimizer_params would probably be able to change between SGD and AdamW with each epoch but would need to use adamw for the first (unconfirmed - no cmdline arg to set such a policy yet) test-opt checks adamw as before and now sgd (except for a few disabled tests for sgd only; probably just needs logging values and adding alternate reference values); tolerance on the 'regression' test is broader for sgd (so we don't need many more epochs) * Vulkan: Implement GGML_OP_OPT_STEP_SGD * tests: Fix OPT_STEP_SGD test-backend-ops * SGD op param store weight-decay and not 1-alpha*wd * minor + cosmetic changes * fix vulkan sgd * try CI fix --------- Co-authored-by: 0cc4m <[email protected]> Co-authored-by: Johannes Gäßler <[email protected]>
1 parent 64387f6 commit f97e4e7

File tree

2 files changed

+93
-6
lines changed

2 files changed

+93
-6
lines changed

ggml/src/ggml.c

Lines changed: 0 additions & 4 deletions
Original file line numberDiff line numberDiff line change
@@ -1018,8 +1018,6 @@ static const char * GGML_OP_NAME[GGML_OP_COUNT] = {
10181018
"GLU",
10191019
};
10201020

1021-
static_assert(GGML_OP_COUNT == 89, "GGML_OP_COUNT != 89");
1022-
10231021
static const char * GGML_OP_SYMBOL[GGML_OP_COUNT] = {
10241022
"none",
10251023

@@ -1121,8 +1119,6 @@ static const char * GGML_OP_SYMBOL[GGML_OP_COUNT] = {
11211119
"glu(x)",
11221120
};
11231121

1124-
static_assert(GGML_OP_COUNT == 89, "GGML_OP_COUNT != 89");
1125-
11261122
static_assert(GGML_OP_POOL_COUNT == 2, "GGML_OP_POOL_COUNT != 2");
11271123

11281124
static const char * GGML_UNARY_OP_NAME[GGML_UNARY_OP_COUNT] = {

tests/test-opt.cpp

Lines changed: 93 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -4,6 +4,8 @@
44
#include "ggml-alloc.h"
55
#include "ggml-backend.h"
66
#include "ggml-opt.h"
7+
#include "../ggml/src/ggml-impl.h"
8+
#include "../common/common.h"
79

810
#include <cmath>
911
#include <cinttypes>
@@ -575,7 +577,6 @@ static std::pair<int, int> test_idata_split(
575577
}
576578
if (adamw) {
577579
constexpr double atol = 1e-10;
578-
579580
int64_t ndata_result;
580581
ggml_opt_result_ndata(cd.result2, &ndata_result);
581582
bool subtest_ok = ndata_result == ndata - idata_split;
@@ -693,10 +694,21 @@ static std::pair<int, int> test_gradient_accumulation(
693694
bool const adamw = optim == GGML_OPT_OPTIMIZER_TYPE_ADAMW;
694695
if (adamw) {
695696
constexpr double atol = 1e-6;
697+
bool const adamw = optim == GGML_OPT_OPTIMIZER_TYPE_ADAMW;
698+
if (adamw) {
699+
>>>>>>> 8e9da45ab (finetune: SGD optimizer, more CLI args (#13873))
696700
float weights;
697701
ggml_backend_tensor_get(cd.weights, &weights, 0, sizeof(float));
702+
<<<<<<< HEAD
698703
const bool subtest_ok = almost_equal(weights, (ndata/2) - epoch, atol);
699704
helper_after_test_gradient_accumulation(optim, __func__, nbatch_physical, loss_type, epoch, "weights", subtest_ok, ntest, npass);
705+
||||||| parent of 8e9da45ab (finetune: SGD optimizer, more CLI args (#13873))
706+
const bool subtest_ok = weights == (ndata/2) - epoch;
707+
helper_after_test_gradient_accumulation(__func__, nbatch_physical, loss_type, epoch, "weights", subtest_ok, ntest, npass);
708+
=======
709+
const bool subtest_ok = weights == (ndata/2) - epoch;
710+
helper_after_test_gradient_accumulation(optim, __func__, nbatch_physical, loss_type, epoch, "weights", subtest_ok, ntest, npass);
711+
>>>>>>> 8e9da45ab (finetune: SGD optimizer, more CLI args (#13873))
700712
}
701713
{
702714
constexpr double atol = 1e-6;
@@ -825,11 +837,33 @@ static std::pair<int, int> test_regression(
825837
ggml_backend_tensor_get(a, &a_fit, 0, sizeof(float));
826838
float b_fit;
827839
ggml_backend_tensor_get(b, &b_fit, 0, sizeof(float));
840+
<<<<<<< HEAD
841+
float tol = adamw ? 1e-2 : 5e-2;
842+
const bool aok = almost_equal(a_fit, a_true, tol);
843+
const bool bok = almost_equal(b_fit, b_true, tol);
844+
const bool subtest_ok = aok && bok;
845+
print_ok(__func__, adamw ? subtest_ok : true, npass, ntest, "subtest=weights");
846+
||||||| parent of 8e9da45ab (finetune: SGD optimizer, more CLI args (#13873))
847+
const bool subtest_ok = almost_equal(a_fit, a_true, 1e-2) && almost_equal(b_fit, b_true, 1e-2);
848+
printf(" %s(subtest=weights): ", __func__);
849+
if (subtest_ok) {
850+
printf("\033[1;32mOK\033[0m\n");
851+
npass++;
852+
} else {
853+
printf("\033[1;31mFAIL\033[0m\n");
854+
}
855+
ntest++;
856+
=======
828857
float tol = adamw ? 1e-2 : 5e-2;
829858
const bool aok = almost_equal(a_fit, a_true, tol);
859+
if (!aok)
860+
TEST_LOG("%s: a_fit=%f a_true=%f\n", __func__, (double)a_fit, (double)a_true);
830861
const bool bok = almost_equal(b_fit, b_true, tol);
862+
if (!bok)
863+
TEST_LOG("%s: b_fit=%f b_true=%f\n", __func__, (double)b_fit, (double)b_true);
831864
const bool subtest_ok = aok && bok;
832865
print_ok(__func__, adamw ? subtest_ok : true, npass, ntest, "subtest=weights");
866+
>>>>>>> 8e9da45ab (finetune: SGD optimizer, more CLI args (#13873))
833867
}
834868

835869
ggml_backend_buffer_free(buf);
@@ -897,8 +931,13 @@ static std::pair<int, int> test_backend(
897931

898932

899933
int main(void) {
934+
<<<<<<< HEAD
900935
ggml_log_set(nullptr, nullptr);
901936
ggml_backend_load_all();
937+
||||||| parent of 8e9da45ab (finetune: SGD optimizer, more CLI args (#13873))
938+
=======
939+
ggml_log_set(nullptr, nullptr);
940+
>>>>>>> 8e9da45ab (finetune: SGD optimizer, more CLI args (#13873))
902941
const size_t dev_count = ggml_backend_dev_count();
903942
printf("Testing %zu devices\n\n", dev_count);
904943
size_t n_ok = 0;
@@ -911,12 +950,28 @@ int main(void) {
911950

912951
ggml_backend_t backend = ggml_backend_dev_init(devs[i], NULL);
913952
GGML_ASSERT(backend != NULL);
953+
<<<<<<< HEAD
914954

915955
auto * reg = ggml_backend_dev_backend_reg(devs[i]);
916956
auto ggml_backend_set_n_threads_fn = (ggml_backend_set_n_threads_t) ggml_backend_reg_get_proc_address(reg, "ggml_backend_set_n_threads");
917957
if (ggml_backend_set_n_threads_fn) {
918958
ggml_backend_set_n_threads_fn(backend, std::thread::hardware_concurrency() / 2);
959+
||||||| parent of 8e9da45ab (finetune: SGD optimizer, more CLI args (#13873))
960+
961+
if (ggml_backend_is_cpu(backend)) {
962+
ggml_backend_cpu_set_n_threads(backend, std::thread::hardware_concurrency() / 2);
963+
=======
964+
#ifndef _MSC_VER
965+
if (ggml_backend_is_cpu(backend)) {
966+
ggml_backend_cpu_set_n_threads(backend, std::thread::hardware_concurrency() / 2);
967+
>>>>>>> 8e9da45ab (finetune: SGD optimizer, more CLI args (#13873))
919968
}
969+
<<<<<<< HEAD
970+
||||||| parent of 8e9da45ab (finetune: SGD optimizer, more CLI args (#13873))
971+
972+
=======
973+
#endif
974+
>>>>>>> 8e9da45ab (finetune: SGD optimizer, more CLI args (#13873))
920975
backends.push_back(backend);
921976
}
922977

@@ -938,6 +993,7 @@ int main(void) {
938993
printf(" Device memory: %zu MB (%zu MB free)\n", total / 1024 / 1024, free / 1024 / 1024);
939994
printf("\n");
940995

996+
<<<<<<< HEAD
941997
bool skip;
942998
{
943999
struct ggml_init_params params = {
@@ -951,7 +1007,20 @@ int main(void) {
9511007
ggml_tensor * b = ggml_new_tensor_1d(ctx, GGML_TYPE_F32, 1);
9521008
ggml_tensor * c = ggml_new_tensor_1d(ctx, GGML_TYPE_F32, 1);
9531009
ggml_tensor * d = ggml_new_tensor_1d(ctx, GGML_TYPE_F32, 1);
954-
1010+
||||||| parent of 8e9da45ab (finetune: SGD optimizer, more CLI args (#13873))
1011+
std::pair<int, int> result = test_backend(backend_sched, backends[i]);
1012+
=======
1013+
if (optim == GGML_OPT_OPTIMIZER_TYPE_SGD && !strcmp(devname, "Vulkan0"))
1014+
//TODO: even though backend returns false for currently
1015+
// unimplemented sgd op, we still need this
1016+
continue;
1017+
if (!strcmp(devname, "WebGPU"))
1018+
// GGML_OP_SUM implementation missing
1019+
continue;
1020+
std::pair<int, int> result = test_backend(backend_sched, backends[i], optim);
1021+
>>>>>>> 8e9da45ab (finetune: SGD optimizer, more CLI args (#13873))
1022+
1023+
<<<<<<< HEAD
9551024
ggml_tensor * t = nullptr;
9561025
switch (optim) {
9571026
case GGML_OPT_OPTIMIZER_TYPE_ADAMW: {
@@ -989,6 +1058,28 @@ int main(void) {
9891058
++n_total;
9901059
printf("\n");
9911060
ggml_backend_sched_free(backend_sched);
1061+
||||||| parent of 8e9da45ab (finetune: SGD optimizer, more CLI args (#13873))
1062+
printf(" %d/%d tests passed\n", result.first, result.second);
1063+
printf(" Backend %s: ", ggml_backend_name(backends[i]));
1064+
if (result.first == result.second) {
1065+
printf("\033[1;32mOK\033[0m\n");
1066+
n_ok++;
1067+
} else {
1068+
printf("\033[1;31mFAIL\033[0m\n");
1069+
=======
1070+
printf(" %d/%d tests passed\n", result.first, result.second);
1071+
1072+
printf(" Backend %s %s: ", ggml_backend_name(backends[i]), ggml_opt_optimizer_name(optim));
1073+
if (result.first == result.second) {
1074+
printf("\033[1;32mOK\033[0m\n");
1075+
n_ok++;
1076+
} else {
1077+
printf("\033[1;31mFAIL\033[0m\n");
1078+
}
1079+
++n_total;
1080+
printf("\n");
1081+
ggml_backend_sched_free(backend_sched);
1082+
>>>>>>> 8e9da45ab (finetune: SGD optimizer, more CLI args (#13873))
9921083
}
9931084
}
9941085

0 commit comments

Comments
 (0)