finetune-lora: Add checkpoint saving & resuming from saved checkpoint #32
base: temp-latest-finetuning
Conversation
This commit adds checkpointing for fine-tuning:
- Add checkpoint saving every N steps with --checkpoint-save-steps
- Save the complete training state: model weights, optimizer state, and metadata
- Implement two-phase optimizer state loading to avoid memory issues
- Add --resume-from and --auto-resume functionality
- Store optimizer momentum/variance tensors in GGUF format
- Add checkpoint validation for rank, alpha, and target modules
- Update README.md with checkpointing documentation

Optimizer state loading happens in two phases: the iteration count is loaded during initialization, while the tensor data (grad_m, grad_v) is loaded after ggml_opt_alloc has created the proper tensor structures (a sketch of this two-phase load follows below).
Signed-off-by: Marcus Edel <[email protected]>
…sion using .string().c_str(). Signed-off-by: Marcus Edel <[email protected]>
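The two-phase load described above could look roughly like this. This is a minimal sketch, assuming the checkpoint is a GGUF file; the grad_m/grad_v tensor names come from the commit message, while the "training.iteration" key and the helper names are placeholders, not necessarily what the PR actually uses:

```cpp
#include "ggml.h"
#include "ggml-backend.h"
#include "gguf.h"

// Phase 1: during initialization, only scalar metadata such as the iteration
// count is read from the checkpoint; no optimizer tensors exist yet.
static int32_t checkpoint_read_iteration(const char * fname) {
    struct gguf_init_params init = { /*no_alloc =*/ true, /*ctx =*/ nullptr };
    struct gguf_context * gguf = gguf_init_from_file(fname, init);
    int32_t iter = 0;
    if (gguf) {
        const int64_t kid = gguf_find_key(gguf, "training.iteration"); // placeholder key name
        if (kid >= 0) {
            iter = gguf_get_val_i32(gguf, kid);
        }
        gguf_free(gguf);
    }
    return iter;
}

// Phase 2: after ggml_opt_alloc has created the optimizer tensors, the saved
// AdamW moments (grad_m, grad_v) are copied into them. ckpt_ctx is a
// ggml_context populated from the same checkpoint file with tensor data.
static void checkpoint_load_moments(struct ggml_context * ckpt_ctx,
                                    struct ggml_tensor  * dst_m,
                                    struct ggml_tensor  * dst_v) {
    struct ggml_tensor * src_m = ggml_get_tensor(ckpt_ctx, "grad_m");
    struct ggml_tensor * src_v = ggml_get_tensor(ckpt_ctx, "grad_v");
    if (src_m && dst_m) {
        ggml_backend_tensor_set(dst_m, src_m->data, 0, ggml_nbytes(dst_m));
    }
    if (src_v && dst_v) {
        ggml_backend_tensor_set(dst_v, src_v->data, 0, ggml_nbytes(dst_v));
    }
}
```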
params.escape = false;
parse_finetune_args(argc, argv, ft_params);

if (!common_params_parse(argc, argv, params, LLAMA_EXAMPLE_PERPLEXITY)) {
Can we make all parameters from LLAMA_EXAMPLE_PERPLEXITY also available in LLAMA_EXAMPLE_MAIN? We use the latter in the addon to access all tunable parameters.
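One possible way to address this is sketched below: the finetune example could parse under LLAMA_EXAMPLE_MAIN instead. This is only a sketch; whether the perplexity-only flags remain reachable afterwards depends on the example sets they are registered with in common/arg.cpp, and finetune_parse_params is a hypothetical wrapper name.

```cpp
#include "arg.h"
#include "common.h"

// Hypothetical wrapper: parse the finetune example's arguments under
// LLAMA_EXAMPLE_MAIN so the addon sees the same tunable parameters as the
// main example. Flags registered only for LLAMA_EXAMPLE_PERPLEXITY would
// additionally need LLAMA_EXAMPLE_MAIN added to their example set.
static bool finetune_parse_params(int argc, char ** argv, common_params & params) {
    return common_params_parse(argc, argv, params, LLAMA_EXAMPLE_MAIN);
}
```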
// get the gradient accumulator for a node from the forward graph
GGML_API struct ggml_tensor * ggml_opt_grad_acc(ggml_opt_context_t opt_ctx, struct ggml_tensor * node);

// get optimizer state tensors (momentum and variance for AdamW)
Can we also add SGD support, so we can choose between the two optimizers on the command line? SGD has fewer parameters, and it looks like it is already supported in llama.cpp.
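A minimal sketch of how the choice could be exposed, assuming a hypothetical --optimizer flag in parse_finetune_args and the optimizer type enum that recent ggml-opt.h appears to provide (GGML_OPT_OPTIMIZER_TYPE_ADAMW / GGML_OPT_OPTIMIZER_TYPE_SGD); the enum names should be verified against the ggml version vendored in this tree:

```cpp
#include <string>
#include "ggml-opt.h"

// Hypothetical mapping from a --optimizer command-line value to ggml's
// optimizer type enum (enum names assumed from recent ggml-opt.h).
static enum ggml_opt_optimizer_type parse_optimizer_type(const std::string & name) {
    if (name == "sgd") {
        return GGML_OPT_OPTIMIZER_TYPE_SGD;
    }
    return GGML_OPT_OPTIMIZER_TYPE_ADAMW; // default: AdamW
}
```

Plain SGD keeps no per-parameter moment tensors, so a checkpoint written with it would only need the model weights plus the iteration count, making the saved files noticeably smaller than the AdamW ones.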