Add initial LoRA finetuning support; vulkan OUT_PROD; vulkan cross-entropy-backward #5

Open
wants to merge 17 commits into temp-finetuning

Conversation

makaveli10

The PR adds:

  • LoRA finetuning support for both training a new adapter and finetuning an existing one. The adapter is saved at the end of the training run so it can be used for inference.
  • cuda: OUT_PROD Q8/Q4 for quantised LoRA finetuning.
  • vulkan: Added the OUT_PROD operator for fp32 to enable finetuning, plus OUT_PROD Q8/Q4 to enable quantised finetuning (a sketch of the op's semantics follows this list).
  • vulkan: Added cross-entropy-loss-backward to allow a lower context size, which is critical for training on mobile devices due to memory constraints.
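For reference, here is a minimal CPU-side sketch of what OUT_PROD computes; the shapes and values are illustrative and not taken from the PR, and it assumes a recent ggml tree (on newer trees the compute entry point ggml_graph_compute_with_ctx is declared in ggml-cpu.h rather than ggml.h).

```c
#include <stdio.h>
#include "ggml.h" // newer trees: also #include "ggml-cpu.h" for graph compute

int main(void) {
    struct ggml_init_params params = {
        /*.mem_size   =*/ 16 * 1024 * 1024,
        /*.mem_buffer =*/ NULL,
        /*.no_alloc   =*/ false,
    };
    struct ggml_context * ctx = ggml_init(params);

    // a: 4x3, b: 5x3 (ggml's ne[0] is the row length); the shared dim is ne[1]
    struct ggml_tensor * a = ggml_new_tensor_2d(ctx, GGML_TYPE_F32, 4, 3);
    struct ggml_tensor * b = ggml_new_tensor_2d(ctx, GGML_TYPE_F32, 5, 3);
    ggml_set_f32(a, 1.0f);
    ggml_set_f32(b, 2.0f);

    // OUT_PROD accumulates outer products over the shared dim: result is 4x5.
    // During LoRA training, weight gradients are formed this way from
    // activations and incoming gradients, hence the need for a GPU kernel.
    struct ggml_tensor * c = ggml_out_prod(ctx, a, b);

    struct ggml_cgraph * gf = ggml_new_graph(ctx);
    ggml_build_forward_expand(gf, c);
    ggml_graph_compute_with_ctx(ctx, gf, /*n_threads =*/ 1);

    // every element is a sum over the shared dim of 3: 3 * (1 * 2) = 6
    printf("c[0,0] = %.1f (ne = %lld x %lld)\n",
           ggml_get_f32_1d(c, 0), (long long) c->ne[0], (long long) c->ne[1]);

    ggml_free(ctx);
    return 0;
}
```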

@zoq

zoq commented Aug 19, 2025

Steps to test llama.cpp inference on Android:

  1. Install Termux from the Play Store and open it.
  2. Run apt update
  3. Run apt remove vulkan-loader-generic
  4. Run apt install git cmake vulkan-tools vulkan-headers shaderc vulkan-loader-android
  5. Run vulkaninfo --summary: this should show the driver and GPU information. If it's the stock driver, it shouldn't mention Mesa.
  6. git clone the repo inside Termux, cd into it, and make sure to check out the lora-finetuning branch:
     git clone https://github.com/makaveli10/qvac-ext-lib-llama.cpp.git
     git checkout lora-finetuning
  7. Configure the Vulkan backend build with cmake -B build -DGGML_VULKAN=1
  8. Build it with cmake --build build --config Debug -j2
  9. Run termux-setup-storage and grant storage permissions to Termux.
  10. Outside Termux, download a model onto the phone, click on it, and select to open it with Termux. Download the model from https://huggingface.co/prithivMLmods/Qwen3-0.6B-GGUF/tree/main, i.e. Qwen3_0.6B.Q8_0.gguf.
  11. Click "Open Directory" on the prompt.
  12. The model should now be reachable inside Termux in the ~/downloads directory.
  13. For finetuning the 8-bit Qwen model:
     ./build/bin/llama-finetune-lora -m Qwen3_0.6B.Q8_0.gguf -f trump.txt -c 256 -b 256 -ub 256 -ngl 999

trump.txt dataset: https://github.com/user-attachments/files/21859494/trump.txt
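As context for the -c 256 run above: the loss the finetune optimizes is ggml's cross-entropy, and this PR adds its backward pass on Vulkan. A minimal CPU-side sketch of the forward op, with illustrative shapes and values not taken from the PR (same ggml-tree assumptions as the OUT_PROD sketch earlier):

```c
#include <stdio.h>
#include "ggml.h" // newer trees: also #include "ggml-cpu.h" for graph compute

int main(void) {
    struct ggml_init_params params = {
        /*.mem_size   =*/ 16 * 1024 * 1024,
        /*.mem_buffer =*/ NULL,
        /*.no_alloc   =*/ false,
    };
    struct ggml_context * ctx = ggml_init(params);

    // logits and soft targets for 2 tokens over a 4-entry vocab
    struct ggml_tensor * logits = ggml_new_tensor_2d(ctx, GGML_TYPE_F32, 4, 2);
    struct ggml_tensor * labels = ggml_new_tensor_2d(ctx, GGML_TYPE_F32, 4, 2);
    ggml_set_f32(logits, 0.5f);  // uniform logits -> uniform softmax
    ggml_set_f32(labels, 0.25f); // uniform soft targets, just for the demo

    // scalar cross-entropy over the batch; the PR implements the backward
    // pass of this op on Vulkan so gradients fit mobile memory budgets
    struct ggml_tensor * loss = ggml_cross_entropy_loss(ctx, logits, labels);

    struct ggml_cgraph * gf = ggml_new_graph(ctx);
    ggml_build_forward_expand(gf, loss);
    ggml_graph_compute_with_ctx(ctx, gf, /*n_threads =*/ 1);

    // uniform logits over 4 classes give ln(4) ~= 1.386 per token
    // (whether the op reduces by sum or mean depends on the ggml version)
    printf("loss = %.4f\n", ggml_get_f32_1d(loss, 0));

    ggml_free(ctx);
    return 0;
}
```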

@zoq

zoq commented Aug 19, 2025

The command we used for testing:

./build/bin/llama-cli -m Qwen3_0.6B.Q8_0.gguf --lora trained-lora-adapter.gguf -if -p "What is your favorite pokemon?" -ngl 999
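For completeness, a hedged C sketch of what the --lora flag does through the public llama.h API. The entry points used here (llama_model_load_from_file, llama_adapter_lora_init, llama_init_from_model, llama_set_adapter_lora) follow current upstream llama.h; older trees spell them llama_load_model_from_file, llama_lora_adapter_init, llama_new_context_with_model, and llama_lora_adapter_set, so adjust to match this branch.

```c
#include <stdio.h>
#include "llama.h"

int main(void) {
    llama_backend_init();

    struct llama_model_params mparams = llama_model_default_params();
    mparams.n_gpu_layers = 999; // mirror -ngl 999 from the command above

    struct llama_model * model =
        llama_model_load_from_file("Qwen3_0.6B.Q8_0.gguf", mparams);
    if (!model) { fprintf(stderr, "failed to load model\n"); return 1; }

    // load the adapter produced by the finetuning run
    struct llama_adapter_lora * adapter =
        llama_adapter_lora_init(model, "trained-lora-adapter.gguf");
    if (!adapter) { fprintf(stderr, "failed to load adapter\n"); return 1; }

    struct llama_context_params cparams = llama_context_default_params();
    struct llama_context * lctx = llama_init_from_model(model, cparams);

    // attach the adapter at full strength; a scale < 1.0f blends it down
    llama_set_adapter_lora(lctx, adapter, 1.0f);

    // ... tokenize and llama_decode as usual ...

    llama_free(lctx);
    llama_model_free(model);
    llama_backend_free();
    return 0;
}
```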
