Conversation

makaveli10

The PR adds:

  • LoRA finetuning support for both training a new adapter and finetuning an existing one. The adapter is saved at the end of the training run so it can be loaded for inference.
  • cuda: OUT_PROD Q8/Q4 for quantised LoRA finetuning.
  • vulkan: Added the OUT_PROD operator for FP32 to enable finetuning, and OUT_PROD Q8/Q4 to enable quantised finetuning (see the sketch below).
  • vulkan: Added cross-entropy-loss-backward to allow a lower context size, which is critical for training on mobile devices due to memory constraints.
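For context on why OUT_PROD shows up in finetuning: in the backward pass of a matrix multiplication, the weight gradient accumulates the outer products of the layer input and the incoming gradient over tokens, which is what GGML_OP_OUT_PROD computes (cross-entropy-loss-backward likewise produces the standard softmax(z) − y gradient over the logits). Below is a minimal sketch of the operator through the public ggml API; the shapes, memory size, and thread count are illustrative and not taken from this PR, and the header layout may vary by checkout:

```cpp
#include "ggml.h"
#include "ggml-cpu.h"

// Minimal sketch: in a mat-mul backward pass the weight gradient
// accumulates outer products of x and dy over tokens, which
// ggml_out_prod computes in one op. All shapes are illustrative.
int main(void) {
    struct ggml_init_params params = {
        /*.mem_size   =*/ 16 * 1024 * 1024,
        /*.mem_buffer =*/ NULL,
        /*.no_alloc   =*/ false,
    };
    struct ggml_context * ctx = ggml_init(params);

    // x:  layer input       [n_in  = 64, n_tokens = 8]
    // dy: incoming gradient  [n_out = 32, n_tokens = 8]
    struct ggml_tensor * x  = ggml_new_tensor_2d(ctx, GGML_TYPE_F32, 64, 8);
    struct ggml_tensor * dy = ggml_new_tensor_2d(ctx, GGML_TYPE_F32, 32, 8);

    // dw: [n_in = 64, n_out = 32], matching the weight being finetuned
    struct ggml_tensor * dw = ggml_out_prod(ctx, x, dy);

    struct ggml_cgraph * gf = ggml_new_graph(ctx);
    ggml_build_forward_expand(gf, dw);
    ggml_graph_compute_with_ctx(ctx, gf, /*n_threads=*/ 4);

    ggml_free(ctx);
    return 0;
}
```

The PR's contribution is implementing this op for Q8/Q4 quantised weights on CUDA, and for FP32 plus Q8/Q4 on Vulkan, so the backward pass can run fully on-device.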

@zoq

zoq commented Aug 19, 2025

Steps to test llama.cpp inference on Android:

  1. Install Termux from the PlayStore and open it.
  2. Run apt update
  3. Run apt remove vulkan-loader-generic
  4. Run apt install git cmake vulkan-tools vulkan-headers shaderc vulkan-loader-android
  5. Run vulkaninfo --summary: This should show the driver and gpu information. If it's the stock driver, it shouldn't mention Mesa.
  6. git clone the repo inside Termux and cd into it, making sure to check out the lora-finetuning branch:

git clone https://github.com/makaveli10/qvac-ext-lib-llama.cpp.git
cd qvac-ext-lib-llama.cpp
git checkout lora-finetuning
7. Configure the vulkan backend build with cmake -B build -DGGML_VULKAN=1
8. Build it with cmake --build build --config Debug -j2
9. Run termux-setup-storage and give storage permissions to termux.
10. Outside Termux, download a model to the phone, tap it, and choose to open it with Termux. Download the model from here: https://huggingface.co/prithivMLmods/Qwen3-0.6B-GGUF/tree/main i.e. download Qwen3_0.6B.Q8_0.gguf
11. Click "Open Directory" on the prompt.
12. The model should now be reachable inside termux in the ~/downloads directory.
13. For finetuning the 8-bit Qwen model (-c/-b/-ub set the context, batch, and micro-batch sizes to 256 to fit mobile memory; -ngl 999 offloads all layers to the GPU):

./build/bin/llama-finetune-lora -m Qwen3_0.6B.Q8_0.gguf -f trump.txt -c 256 -b 256 -ub 256 -ngl 999

trump.txt dataset: https://github.com/user-attachments/files/21859494/trump.txt

@zoq

zoq commented Aug 19, 2025

Command we used for testing the trained adapter:

./build/bin/llama-cli -m Qwen3_0.6B.Q8_0.gguf --lora trained-lora-adapter.gguf -if -p "What is your favorite pokemon?" -ngl 999
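For anyone testing through code rather than the CLI, here is a minimal sketch of loading the trained adapter with the public llama.cpp API. The file names mirror the command above; the API names are from recent llama.h and may differ slightly depending on the checkout:

```cpp
#include "llama.h"
#include <cstdio>

// Minimal sketch: load the base model, attach the finetuned LoRA
// adapter, and create a context, mirroring the llama-cli flags above.
int main(void) {
    llama_backend_init();

    llama_model_params mparams = llama_model_default_params();
    mparams.n_gpu_layers = 999; // offload all layers, as with -ngl 999

    llama_model * model = llama_model_load_from_file("Qwen3_0.6B.Q8_0.gguf", mparams);
    if (!model) { fprintf(stderr, "failed to load model\n"); return 1; }

    // load the adapter produced by llama-finetune-lora
    llama_adapter_lora * adapter = llama_adapter_lora_init(model, "trained-lora-adapter.gguf");
    if (!adapter) { fprintf(stderr, "failed to load adapter\n"); return 1; }

    llama_context_params cparams = llama_context_default_params();
    cparams.n_ctx = 256;
    llama_context * ctx = llama_init_from_model(model, cparams);

    // apply the adapter at full strength (scale 1.0f)
    llama_set_adapter_lora(ctx, adapter, 1.0f);

    // ... tokenize the prompt and run decoding as usual ...

    llama_free(ctx);
    llama_model_free(model);
    llama_backend_free();
    return 0;
}
```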


@andrunko andrunko left a comment


Changes LGTM in general, just some small comments/nits overall, feel free to ignore the nitpicks :).

```diff
        case GGML_OP_ADD:
        case GGML_OP_SUB:
        case GGML_OP_MUL:
        case GGML_OP_DIV:
-           return (op->src[0]->type == GGML_TYPE_F32 || op->src[0]->type == GGML_TYPE_F16) &&
+           return (op->src[0]->type == GGML_TYPE_F32 || op->src[0]->type == GGML_TYPE_F16) &&
```

@andrunko andrunko Aug 21, 2025


nit: spurious change?

@andrunko

andrunko commented Aug 21, 2025

Looks like there are some CI failures also related to these changes - see https://github.com/tetherto/qvac-ext-lib-llama.cpp/actions/runs/17076253696/job/48418341198?pr=5 for example:

```
/__w/qvac-ext-lib-llama.cpp/qvac-ext-lib-llama.cpp/src/llama-lora-training.cpp:293:29: error: the address of 'ggml_tensor::name' will never be NULL [-Werror=address]
  293 |     if (!tensor || !tensor->name) {
      |                     ~~~~~~~~^~~~
```
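The warning is correct: ggml_tensor::name is a fixed-size char array (char name[GGML_MAX_NAME]), so its address can never be NULL and the check is dead code. A likely fix — a sketch, not necessarily the exact patch that landed — is to test for an empty name instead:

```cpp
// `name` is an inline array member of ggml_tensor, so `!tensor->name`
// is always false; check whether the name string is empty instead.
if (!tensor || tensor->name[0] == '\0') {
    // handle missing or unnamed tensor
}
```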

@JamieBohannaWebDev

JamieBohannaWebDev commented Aug 22, 2025

Fine-tuning attempt on a Pixel 9 Pro Fold, evidence below.

Please note the estimated completion time of 27.5 hours...

[4 screenshots attached]

@makaveli10
Author

makaveli10 commented Aug 22, 2025

@JamieBohannaWebDev On our side, I think for a test we ran it with 10-20% of the data, which took much less time. We also have checkpoint save/resume integration in progress, which would allow us to train in bursts by saving a checkpoint and resuming later from the same point.

@nurmanmus

@JamieBohannaWebDev Did we test the output with some prompts after the fine-tuning completed? (before vs. after)

@github-actions github-actions bot added the ggml label Aug 26, 2025

@andrunko andrunko left a comment


Changes LGTM, looks like I can't merge it though so will defer to someone else with perms to do it.

@andrunko

The current CI failures seem unrelated to the changes here, both are failing with:

No suitable Dawn artifact found!
