Fine-Tuning a Small LLM for Customer Support

Fine-tune Qwen3-4B with LoRA on Apple Silicon to build a local, private customer support model, trained on synthetic data generated from a real FAQ knowledge base and served with llama.cpp.

Overview

Step	Description	Script
1	Define FAQ knowledge base	`faqs.json`
2	Generate synthetic training data	`generate_training_data.py`
3	Fine-tune Qwen3-4B with LoRA	`train.py`
4	Test the fine-tuned model	`test_model.py`
5	Merge LoRA weights & export to GGUF	`merge_and_export.py`
6	Serve locally with llama.cpp	`llama-server`

Prerequisites

MacBook Pro with Apple Silicon (M1 Pro with 16 GB+ of memory recommended)
Python 3.11+
uv package manager
OpenRouter API key (for synthetic data generation)
Hugging Face account

Setup

# Install uv
curl -LsSf https://astral.sh/uv/install.sh | sh

# Create virtual environment
uv venv --python 3.11
source .venv/bin/activate

# Install dependencies
uv add torch torchvision torchaudio
uv add "transformers>=4.51.0" datasets peft trl accelerate
uv add openai huggingface_hub sentencepiece

# Install llama.cpp (for GGUF conversion and serving)
brew install llama.cpp
uv pip install "gguf @ git+https://github.com/ggerganov/llama.cpp.git#subdirectory=gguf-py"

Usage

1. Generate training data

export OPENROUTER_API_KEY="your-key-here"
uv run python generate_training_data.py
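
The schemas of `faqs.json` and `train.jsonl` are not shown in this README. As a rough sketch, assuming each FAQ entry is an object with hypothetical `question` and `answer` keys and that training records use the chat `messages` layout, entries could be turned into train/validation records as below. The real `generate_training_data.py` also generates synthetic variants via OpenRouter, which this sketch leaves out; the function names here are illustrative, not taken from the script.

```python
import json
import random

# Assumption: training data reuses the system prompt from the query example.
SYSTEM_PROMPT = "You are a helpful customer support assistant for TAIKAI."


def faq_to_record(faq: dict) -> dict:
    """Turn one FAQ entry into a chat-style training record.

    Assumes hypothetical 'question' and 'answer' keys; adjust to the real schema.
    """
    return {
        "messages": [
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": faq["question"].strip()},
            {"role": "assistant", "content": faq["answer"].strip()},
        ]
    }


def split_records(records: list, val_fraction: float = 0.1, seed: int = 42):
    """Shuffle deterministically and split into (train, val) lists."""
    shuffled = records[:]
    random.Random(seed).shuffle(shuffled)
    n_val = max(1, int(len(shuffled) * val_fraction))
    return shuffled[n_val:], shuffled[:n_val]


def write_jsonl(path: str, records: list) -> None:
    """Write one JSON object per line."""
    with open(path, "w", encoding="utf-8") as f:
        for record in records:
            f.write(json.dumps(record, ensure_ascii=False) + "\n")
```

Fixing the shuffle seed keeps the validation split stable between runs, which makes training curves comparable.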

2. Fine-tune the model

uv run python train.py
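
LoRA freezes the base weights and trains two small low-rank matrices per adapted layer, which is why a 4B model can be fine-tuned on a laptop. The rank and target modules used by `train.py` are not stated in this README, so the sketch below only illustrates the arithmetic; the layer shape and rank are illustrative assumptions, not values taken from the script.

```python
def lora_params(d_in: int, d_out: int, rank: int) -> int:
    """Trainable parameters LoRA adds to one d_out x d_in linear layer.

    LoRA learns A (rank x d_in) and B (d_out x rank); the effective weight
    is W + (alpha / rank) * B @ A, with W kept frozen.
    """
    return rank * d_in + d_out * rank


def full_params(d_in: int, d_out: int) -> int:
    """Parameters in the frozen weight matrix (bias ignored)."""
    return d_in * d_out


# Hypothetical square projection and rank, for illustration only.
d_in, d_out, rank = 2048, 2048, 16
added = lora_params(d_in, d_out, rank)
frozen = full_params(d_in, d_out)
print(f"LoRA adds {added:,} trainable params vs {frozen:,} frozen "
      f"({100 * added / frozen:.2f}%)")
```

Doubling the rank doubles the adapter size, so rank is the main knob for trading memory against adapter capacity.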

3. Test the model

uv run python test_model.py

4. Merge and export to GGUF

uv run python merge_and_export.py

.venv/bin/python $(brew --prefix llama.cpp)/bin/convert_hf_to_gguf.py ./taikai-support-merged \
    --outfile ./taikai-support-q8_0.gguf \
    --outtype q8_0
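
To get a feel for the size of the exported file: the `q8_0` format stores weights in blocks of 32 int8 values plus one fp16 scale per block, about 8.5 bits per weight. The sketch below estimates the result for a model of roughly 4 billion parameters (an assumption based on the model name). Treat it as a ballpark figure only, since the converter keeps some small tensors at higher precision and the file also contains metadata.

```python
Q8_0_BLOCK_WEIGHTS = 32        # weights per quantization block
Q8_0_BLOCK_BYTES = 32 * 1 + 2  # 32 int8 values + one fp16 scale


def q8_0_size_gib(n_params: float) -> float:
    """Approximate size in GiB of n_params weights stored as q8_0."""
    n_blocks = n_params / Q8_0_BLOCK_WEIGHTS
    return n_blocks * Q8_0_BLOCK_BYTES / 2**30


def fp16_size_gib(n_params: float) -> float:
    """Size in GiB of n_params weights stored as 16-bit floats."""
    return n_params * 2 / 2**30


# Assumption: roughly 4e9 parameters, taken from the "4B" in the model name.
n_params = 4e9
print(f"fp16: ~{fp16_size_gib(n_params):.1f} GiB, "
      f"q8_0: ~{q8_0_size_gib(n_params):.1f} GiB")
```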

5. Serve locally

llama-server \
    -m ./taikai-support-q8_0.gguf \
    --host 0.0.0.0 \
    --port 8080 \
    -ngl 99 \
    -c 2048 \
    --chat-template chatml
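
Loading the model can take a while, so it helps to wait until the server is ready before sending requests. A minimal sketch, assuming the server started with the command above is reachable at `http://localhost:8080` and using the `/health` endpoint of `llama-server`; the helper names are illustrative.

```python
import time
import urllib.error
import urllib.request


def server_ready(base_url: str = "http://localhost:8080", timeout: float = 2.0) -> bool:
    """Return True if llama-server answers GET /health with HTTP 200."""
    try:
        with urllib.request.urlopen(f"{base_url}/health", timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or an error status while the model loads.
        return False


def wait_until_ready(base_url: str = "http://localhost:8080",
                     max_wait: float = 60.0, interval: float = 1.0) -> bool:
    """Poll the health endpoint until it succeeds or max_wait seconds pass."""
    deadline = time.monotonic() + max_wait
    while True:
        if server_ready(base_url):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

Note that `--host 0.0.0.0` makes the server listen on all network interfaces. To keep it reachable from this machine only, `--host 127.0.0.1` can be used instead.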

6. Query the model

curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [
      {"role": "system", "content": "You are a helpful customer support assistant for TAIKAI."},
      {"role": "user", "content": "how do i join a hackathon?"}
    ],
    "temperature": 0.7,
    "max_tokens": 256
  }'
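
The same request can be sent from Python using only the standard library. This is a sketch mirroring the `curl` call above, assuming the server is running on `localhost:8080`; `build_payload` and `ask` are illustrative helper names, not part of the repository.

```python
import json
import urllib.request

SYSTEM_PROMPT = "You are a helpful customer support assistant for TAIKAI."


def build_payload(question: str, temperature: float = 0.7, max_tokens: int = 256) -> dict:
    """Build the OpenAI-style chat completion request body."""
    return {
        "messages": [
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": question},
        ],
        "temperature": temperature,
        "max_tokens": max_tokens,
    }


def ask(question: str, base_url: str = "http://localhost:8080") -> str:
    """POST the question to the chat completions endpoint and return the reply text."""
    data = json.dumps(build_payload(question)).encode("utf-8")
    req = urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=data,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=120) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]


# Usage (requires llama-server to be running):
# print(ask("how do i join a hackathon?"))
```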

Project Structure

.
├── faqs.json                    # Source FAQ knowledge base (196 entries)
├── generate_training_data.py    # Synthetic data generation via OpenRouter
├── train.py                     # LoRA fine-tuning script
├── test_model.py                # Test the fine-tuned model
├── merge_and_export.py          # Merge LoRA weights into base model
├── train.jsonl                  # Generated training data
├── val.jsonl                    # Generated validation data
├── taikai-support-model/        # LoRA adapter output
├── taikai-support-merged/       # Merged model (base + LoRA)
└── taikai-support-q8_0.gguf     # Final GGUF model for llama.cpp
