Conversation

makaveli10

The PR adds:

  • LoRA finetuning support for both training a new adapter and finetuning an existing one. The adapter is saved at the end of the training run so it can be loaded for inference.
  • cuda: OUT_PROD Q8/Q4 for quantised LoRA finetuning.
  • vulkan: Added the OUT_PROD operator for FP32 to enable finetuning, and OUT_PROD Q8/Q4 to enable quantised finetuning (see the sketch below).
  • vulkan: Added cross-entropy-loss-backward to allow a lower context size, which is critical for training on mobile devices due to memory constraints.
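For context on why OUT_PROD shows up in finetuning: in the backward pass of a matrix multiplication, the weight gradient accumulates the outer products of the layer input and the incoming gradient over tokens, which is what GGML_OP_OUT_PROD computes (cross-entropy-loss-backward likewise produces the standard softmax(z) − y gradient over the logits). Below is a minimal sketch of the operator through the public ggml API; the shapes, memory size, and thread count are illustrative and not taken from this PR, and the header layout may vary by checkout:

```cpp
#include "ggml.h"
#include "ggml-cpu.h"

// Minimal sketch: in a mat-mul backward pass the weight gradient
// accumulates outer products of x and dy over tokens, which
// ggml_out_prod computes in one op. All shapes are illustrative.
int main(void) {
    struct ggml_init_params params = {
        /*.mem_size   =*/ 16 * 1024 * 1024,
        /*.mem_buffer =*/ NULL,
        /*.no_alloc   =*/ false,
    };
    struct ggml_context * ctx = ggml_init(params);

    // x:  layer input       [n_in  = 64, n_tokens = 8]
    // dy: incoming gradient  [n_out = 32, n_tokens = 8]
    struct ggml_tensor * x  = ggml_new_tensor_2d(ctx, GGML_TYPE_F32, 64, 8);
    struct ggml_tensor * dy = ggml_new_tensor_2d(ctx, GGML_TYPE_F32, 32, 8);

    // dw: [n_in = 64, n_out = 32], matching the weight being finetuned
    struct ggml_tensor * dw = ggml_out_prod(ctx, x, dy);

    struct ggml_cgraph * gf = ggml_new_graph(ctx);
    ggml_build_forward_expand(gf, dw);
    ggml_graph_compute_with_ctx(ctx, gf, /*n_threads=*/ 4);

    ggml_free(ctx);
    return 0;
}
```

The PR's contribution is implementing this op for Q8/Q4 quantised weights on CUDA, and for FP32 plus Q8/Q4 on Vulkan, so the backward pass can run fully on-device.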

@zoq

zoq commented Aug 19, 2025

Steps to test llama.cpp inference on Android:

  1. Install Termux from the PlayStore and open it.
  2. Run apt update
  3. Run apt remove vulkan-loader-generic
  4. Run apt install git cmake vulkan-tools vulkan-headers shaderc vulkan-loader-android
  5. Run vulkaninfo --summary: This should show the driver and gpu information. If it's the stock driver, it shouldn't mention Mesa.
  6. git clone the repo inside Termux and cd into it, making sure to check out the lora-finetuning branch:

git clone https://github.com/makaveli10/qvac-ext-lib-llama.cpp.git
cd qvac-ext-lib-llama.cpp
git checkout lora-finetuning
7. Configure the vulkan backend build with cmake -B build -DGGML_VULKAN=1
8. Build it with cmake --build build --config Debug -j2
9. Run termux-setup-storage and give storage permissions to termux.
10. Outside Termux, download a model to the phone, tap it, and choose to open it with Termux. Download the model from here: https://huggingface.co/prithivMLmods/Qwen3-0.6B-GGUF/tree/main i.e. download Qwen3_0.6B.Q8_0.gguf
11. Click "Open Directory" on the prompt.
12. The model should now be reachable inside termux in the ~/downloads directory.
13. For finetuning the 8-bit Qwen model (-c/-b/-ub set the context, batch, and micro-batch sizes to 256 to fit mobile memory; -ngl 999 offloads all layers to the GPU):

./build/bin/llama-finetune-lora -m Qwen3_0.6B.Q8_0.gguf -f trump.txt -c 256 -b 256 -ub 256 -ngl 999

trump.txt dataset: https://github.com/user-attachments/files/21859494/trump.txt

@zoq

zoq commented Aug 19, 2025

Command we used for testing the trained adapter:

./build/bin/llama-cli -m Qwen3_0.6B.Q8_0.gguf --lora trained-lora-adapter.gguf -if -p "What is your favorite pokemon?" -ngl 999
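For anyone testing through code rather than the CLI, here is a minimal sketch of loading the trained adapter with the public llama.cpp API. The file names mirror the command above; the API names are from recent llama.h and may differ slightly depending on the checkout:

```cpp
#include "llama.h"
#include <cstdio>

// Minimal sketch: load the base model, attach the finetuned LoRA
// adapter, and create a context, mirroring the llama-cli flags above.
int main(void) {
    llama_backend_init();

    llama_model_params mparams = llama_model_default_params();
    mparams.n_gpu_layers = 999; // offload all layers, as with -ngl 999

    llama_model * model = llama_model_load_from_file("Qwen3_0.6B.Q8_0.gguf", mparams);
    if (!model) { fprintf(stderr, "failed to load model\n"); return 1; }

    // load the adapter produced by llama-finetune-lora
    llama_adapter_lora * adapter = llama_adapter_lora_init(model, "trained-lora-adapter.gguf");
    if (!adapter) { fprintf(stderr, "failed to load adapter\n"); return 1; }

    llama_context_params cparams = llama_context_default_params();
    cparams.n_ctx = 256;
    llama_context * ctx = llama_init_from_model(model, cparams);

    // apply the adapter at full strength (scale 1.0f)
    llama_set_adapter_lora(ctx, adapter, 1.0f);

    // ... tokenize the prompt and run decoding as usual ...

    llama_free(ctx);
    llama_model_free(model);
    llama_backend_free();
    return 0;
}
```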


@andrunko andrunko left a comment


Changes LGTM in general, just some small comments/nits overall, feel free to ignore the nitpicks :).

```diff
        case GGML_OP_ADD:
        case GGML_OP_SUB:
        case GGML_OP_MUL:
        case GGML_OP_DIV:
-           return (op->src[0]->type == GGML_TYPE_F32 || op->src[0]->type == GGML_TYPE_F16) &&
+           return (op->src[0]->type == GGML_TYPE_F32 || op->src[0]->type == GGML_TYPE_F16) &&
```

@andrunko andrunko Aug 21, 2025


nit: spurious change?

@andrunko

andrunko commented Aug 21, 2025

Looks like there are some CI failures also related to these changes - see https://github.com/tetherto/qvac-ext-lib-llama.cpp/actions/runs/17076253696/job/48418341198?pr=5 for example:

```
/__w/qvac-ext-lib-llama.cpp/qvac-ext-lib-llama.cpp/src/llama-lora-training.cpp:293:29: error: the address of 'ggml_tensor::name' will never be NULL [-Werror=address]
  293 |     if (!tensor || !tensor->name) {
      |                     ~~~~~~~~^~~~
```
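The warning is correct: ggml_tensor::name is a fixed-size char array (char name[GGML_MAX_NAME]), so its address can never be NULL and the check is dead code. A likely fix — a sketch, not necessarily the exact patch that landed — is to test for an empty name instead:

```cpp
// `name` is an inline array member of ggml_tensor, so `!tensor->name`
// is always false; check whether the name string is empty instead.
if (!tensor || tensor->name[0] == '\0') {
    // handle missing or unnamed tensor
}
```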

@JamieBohannaWebDev

JamieBohannaWebDev commented Aug 22, 2025

Fine-tuning attempt on a Pixel 9 Pro Fold, evidence below.

Please note the estimated completion time of 27.5 hours...

[4 screenshots attached]

@makaveli10
Author

makaveli10 commented Aug 22, 2025

@JamieBohannaWebDev On our side, I think for a test we ran it with 10-20% of the data, which took much less time. We also have checkpoint save/resume integration in progress, which would allow us to train in bursts by saving a checkpoint and resuming later from the same point.

@nurmanmus

@JamieBohannaWebDev Did we test the output with some prompts after the fine-tuning completed? (before vs. after)

@github-actions github-actions bot added the ggml label Aug 26, 2025

@andrunko andrunko left a comment


Changes LGTM, looks like I can't merge it though so will defer to someone else with perms to do it.

@andrunko

The current CI failures seem unrelated to the changes here, both are failing with:

No suitable Dawn artifact found!
