Add initial LoRA finetuning support; vulkan OUT_PROD; vulkan cross-entropy-backward #5

Open
wants to merge 17 commits into temp-finetuning

Conversation

makaveli10

The PR adds:

  • LoRA finetuning support for both training a new adapter and finetuning an existing one. The adapter is saved at the end of the training run so it can be used for inference.
  • cuda: OUT_PROD Q8/Q4 for quantised LoRA finetuning.
  • vulkan: Added the OUT_PROD operator for fp32 to enable finetuning, plus OUT_PROD Q8/Q4 to enable quantised finetuning (a sketch of the op's semantics follows this list).
  • vulkan: Added cross-entropy-loss-backward to allow a lower context size, which is critical for training on mobile devices due to memory constraints.
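For reference, here is a minimal CPU-side sketch of what OUT_PROD computes; the shapes and values are illustrative and not taken from the PR, and it assumes a recent ggml tree (on newer trees the compute entry point ggml_graph_compute_with_ctx is declared in ggml-cpu.h rather than ggml.h).

```c
#include <stdio.h>
#include "ggml.h" // newer trees: also #include "ggml-cpu.h" for graph compute

int main(void) {
    struct ggml_init_params params = {
        /*.mem_size   =*/ 16 * 1024 * 1024,
        /*.mem_buffer =*/ NULL,
        /*.no_alloc   =*/ false,
    };
    struct ggml_context * ctx = ggml_init(params);

    // a: 4x3, b: 5x3 (ggml's ne[0] is the row length); the shared dim is ne[1]
    struct ggml_tensor * a = ggml_new_tensor_2d(ctx, GGML_TYPE_F32, 4, 3);
    struct ggml_tensor * b = ggml_new_tensor_2d(ctx, GGML_TYPE_F32, 5, 3);
    ggml_set_f32(a, 1.0f);
    ggml_set_f32(b, 2.0f);

    // OUT_PROD accumulates outer products over the shared dim: result is 4x5.
    // During LoRA training, weight gradients are formed this way from
    // activations and incoming gradients, hence the need for a GPU kernel.
    struct ggml_tensor * c = ggml_out_prod(ctx, a, b);

    struct ggml_cgraph * gf = ggml_new_graph(ctx);
    ggml_build_forward_expand(gf, c);
    ggml_graph_compute_with_ctx(ctx, gf, /*n_threads =*/ 1);

    // every element is a sum over the shared dim of 3: 3 * (1 * 2) = 6
    printf("c[0,0] = %.1f (ne = %lld x %lld)\n",
           ggml_get_f32_1d(c, 0), (long long) c->ne[0], (long long) c->ne[1]);

    ggml_free(ctx);
    return 0;
}
```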

@zoq

zoq commented Aug 19, 2025

Steps to test llama.cpp inference on Android:

  1. Install Termux from the Play Store and open it.
  2. Run apt update
  3. Run apt remove vulkan-loader-generic
  4. Run apt install git cmake vulkan-tools vulkan-headers shaderc vulkan-loader-android
  5. Run vulkaninfo --summary: this should show the driver and GPU information. If it's the stock driver, it shouldn't mention Mesa.
  6. git clone the repo inside Termux, cd into it, and make sure to check out the lora-finetuning branch:
     git clone https://github.com/makaveli10/qvac-ext-lib-llama.cpp.git
     git checkout lora-finetuning
  7. Configure the Vulkan backend build with cmake -B build -DGGML_VULKAN=1
  8. Build it with cmake --build build --config Debug -j2
  9. Run termux-setup-storage and grant storage permissions to Termux.
  10. Outside Termux, download a model onto the phone, click on it, and select to open it with Termux. Download the model from https://huggingface.co/prithivMLmods/Qwen3-0.6B-GGUF/tree/main, i.e. Qwen3_0.6B.Q8_0.gguf.
  11. Click "Open Directory" on the prompt.
  12. The model should now be reachable inside Termux in the ~/downloads directory.
  13. For finetuning the 8-bit Qwen model:
     ./build/bin/llama-finetune-lora -m Qwen3_0.6B.Q8_0.gguf -f trump.txt -c 256 -b 256 -ub 256 -ngl 999

trump.txt dataset: https://github.com/user-attachments/files/21859494/trump.txt
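As context for the -c 256 run above: the loss the finetune optimizes is ggml's cross-entropy, and this PR adds its backward pass on Vulkan. A minimal CPU-side sketch of the forward op, with illustrative shapes and values not taken from the PR (same ggml-tree assumptions as the OUT_PROD sketch earlier):

```c
#include <stdio.h>
#include "ggml.h" // newer trees: also #include "ggml-cpu.h" for graph compute

int main(void) {
    struct ggml_init_params params = {
        /*.mem_size   =*/ 16 * 1024 * 1024,
        /*.mem_buffer =*/ NULL,
        /*.no_alloc   =*/ false,
    };
    struct ggml_context * ctx = ggml_init(params);

    // logits and soft targets for 2 tokens over a 4-entry vocab
    struct ggml_tensor * logits = ggml_new_tensor_2d(ctx, GGML_TYPE_F32, 4, 2);
    struct ggml_tensor * labels = ggml_new_tensor_2d(ctx, GGML_TYPE_F32, 4, 2);
    ggml_set_f32(logits, 0.5f);  // uniform logits -> uniform softmax
    ggml_set_f32(labels, 0.25f); // uniform soft targets, just for the demo

    // scalar cross-entropy over the batch; the PR implements the backward
    // pass of this op on Vulkan so gradients fit mobile memory budgets
    struct ggml_tensor * loss = ggml_cross_entropy_loss(ctx, logits, labels);

    struct ggml_cgraph * gf = ggml_new_graph(ctx);
    ggml_build_forward_expand(gf, loss);
    ggml_graph_compute_with_ctx(ctx, gf, /*n_threads =*/ 1);

    // uniform logits over 4 classes give ln(4) ~= 1.386 per token
    // (whether the op reduces by sum or mean depends on the ggml version)
    printf("loss = %.4f\n", ggml_get_f32_1d(loss, 0));

    ggml_free(ctx);
    return 0;
}
```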

@zoq

zoq commented Aug 19, 2025

The command we used for testing:

./build/bin/llama-cli -m Qwen3_0.6B.Q8_0.gguf --lora trained-lora-adapter.gguf -if -p "What is your favorite pokemon?" -ngl 999
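For completeness, a hedged C sketch of what the --lora flag does through the public llama.h API. The entry points used here (llama_model_load_from_file, llama_adapter_lora_init, llama_init_from_model, llama_set_adapter_lora) follow current upstream llama.h; older trees spell them llama_load_model_from_file, llama_lora_adapter_init, llama_new_context_with_model, and llama_lora_adapter_set, so adjust to match this branch.

```c
#include <stdio.h>
#include "llama.h"

int main(void) {
    llama_backend_init();

    struct llama_model_params mparams = llama_model_default_params();
    mparams.n_gpu_layers = 999; // mirror -ngl 999 from the command above

    struct llama_model * model =
        llama_model_load_from_file("Qwen3_0.6B.Q8_0.gguf", mparams);
    if (!model) { fprintf(stderr, "failed to load model\n"); return 1; }

    // load the adapter produced by the finetuning run
    struct llama_adapter_lora * adapter =
        llama_adapter_lora_init(model, "trained-lora-adapter.gguf");
    if (!adapter) { fprintf(stderr, "failed to load adapter\n"); return 1; }

    struct llama_context_params cparams = llama_context_default_params();
    struct llama_context * lctx = llama_init_from_model(model, cparams);

    // attach the adapter at full strength; a scale < 1.0f blends it down
    llama_set_adapter_lora(lctx, adapter, 1.0f);

    // ... tokenize and llama_decode as usual ...

    llama_free(lctx);
    llama_model_free(model);
    llama_backend_free();
    return 0;
}
```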
