
Commit 686abc9

Author: Christin Pohl (committed)
Commit message: Lora fine tuning example
1 parent afbbf46

22 files changed: +2938 −0 lines
Lines changed: 37 additions & 0 deletions
# Environment & secrets
config.sh
config.env
.env*
*.key
*.pem
secrets/

# Azure & Kubernetes
.azure/
*.kubeconfig

# Model & data files (large)
*.bin
*.safetensors
models/
checkpoints/
.cache/

# Python
__pycache__/
*.pyc
venv/
.venv/

# OS & IDE
.DS_Store
Thumbs.db
.idea/
.vscode/settings.json
*.swp

# Temporary
*.log
*.tmp
temp/
tmp/
Lines changed: 139 additions & 0 deletions
# AKS GPU Fine-tuning with LoRA

Fine-tune and deploy large language models on Azure Kubernetes Service (AKS) with GPU support using LoRA (Low-Rank Adaptation).

## Use Case

**Problem:** Organizations need AI models that perform internal reasoning in a specific language (e.g., for regulatory audits) while maintaining flexibility in input/output languages for end users.

**Solution:** LoRA fine-tuning to modify the model's reasoning behavior, something not achievable through RAG, prompt engineering, or agentic approaches.

**Example:** A Swiss bank requires all AI reasoning traces to be in German for audit compliance, but wants customers to interact in any language. The fine-tuned model:
- Receives a question in English, French, or any other language
- Performs all internal chain-of-thought reasoning in German
- Responds to the user in their original language

This repo demonstrates fine-tuning GPT-OSS 20B to achieve this behavior using LoRA on Azure Kubernetes Service with H100 GPUs.

## Architecture

![Architecture](media/Architecture.png)

## Demo

![Demo](media/Demo.png)

## Features

- **Automated Azure Infrastructure**: Creates the AKS cluster, ACR, storage, and managed identities
- **GPU-Optimized**: Uses the NVIDIA GPU Operator and NC-series VMs
- **GPU Monitoring**: Azure Managed Prometheus + Grafana with DCGM metrics
- **LoRA Fine-tuning**: Efficient, parameter-efficient fine-tuning
- **Side-by-Side Inference**: Compare fine-tuned vs. baseline models via the Web UI

## Prerequisites

**Tools:**
- **Azure CLI** - [Install](https://docs.microsoft.com/en-us/cli/azure/install-azure-cli)
- **kubectl** - [Install](https://kubernetes.io/docs/tasks/tools/) or `az aks install-cli`
- **Helm** - [Install](https://helm.sh/docs/intro/install/)
- **Bash shell** - WSL/WSL2 (Windows), Terminal (macOS/Linux), or Git Bash

**Azure:**
- Azure subscription with **Owner** or **Contributor + User Access Administrator** role
- GPU quota for `Standard_NC80adis_H100_v5` (80 vCPUs, i.e. 1 node) in your region
- Request quota at: [Azure Portal → Quotas](https://portal.azure.com/#view/Microsoft_Azure_Capacity/QuotaMenuBlade/~/myQuotas)

## Quick Deploy

```bash
# 1. Configure your environment
cp config.sh.template config.sh
# Edit config.sh with your Azure subscription and settings.
# Requires GPU quota for NC80adis_H100_v5; adjust the region if needed.

# 2. Log in to Azure
az login

# 3. Run the deployment
bash ./scripts/quick-deploy.sh
```

This will:
1. Create Azure resources (resource group, storage, ACR, managed identity)
2. Create the AKS cluster with a GPU node pool
3. Set up GPU monitoring (Prometheus + Grafana)
4. Build and push Docker images
5. Deploy the fine-tuning job (~20 min)
6. Deploy the inference service with the Web UI

## Monitor Fine-tuning

```bash
kubectl get jobs -n workloads
kubectl logs job/gpt-oss-finetune -n workloads -f
```

## GPU Monitoring

The monitoring script automatically sets up:
- An Azure Monitor workspace (Managed Prometheus)
- Azure Managed Grafana with a DCGM dashboard
- NVIDIA DCGM metrics scraping

Access the Grafana URL from the script output or the Azure Portal.

**Test queries in Grafana Explore:**
- `DCGM_FI_DEV_GPU_UTIL` - GPU utilization
- `DCGM_FI_DEV_GPU_TEMP` - GPU temperature
- `DCGM_FI_DEV_FB_USED` - GPU memory usage

Alternatively, visit the DCGM Dashboard for GPU metrics visualization.

## Access Inference Service

```bash
kubectl get svc gpt-oss-inference -n workloads
# Use the EXTERNAL-IP to access the Web UI
```
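Once the service reports an external IP, a quick smoke test from the shell might look like the sketch below. The IP is a documentation placeholder, and port 8080 is assumed from the inference image's `EXPOSE` directive, not taken from the service manifest:

```bash
# Look up the external IP (requires cluster access):
#   EXTERNAL_IP=$(kubectl get svc gpt-oss-inference -n workloads \
#     -o jsonpath='{.status.loadBalancer.ingress[0].ip}')
EXTERNAL_IP="203.0.113.10"   # placeholder IP for illustration

URL="http://${EXTERNAL_IP}:8080/"
echo "$URL"
# curl -s "$URL" | head -n 5   # fetch the Web UI landing page
```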
> **⚠️ Security Note:** The inference service uses a public `LoadBalancer` without authentication; this is suitable for demos only. For production, use an internal load balancer, network policies, or an API gateway with authentication.

## Individual Steps

```bash
./scripts/01-setup-azure-resources.sh   # Azure resources
./scripts/02-create-aks-cluster.sh      # AKS cluster (GPU pool at 0 nodes)
./scripts/03-setup-gpu-monitoring.sh    # Prometheus + Grafana
./scripts/04-build-and-push-image.sh    # Docker build
./scripts/05-deploy-finetune.sh         # Scales up GPU, deploys job
./scripts/06-deploy-inference.sh        # Inference service
```

## Cost Optimization

**GPU nodes start at 0** - The GPU node pool is created with 0 nodes to avoid idle costs (~$20/hr).
Scripts 05 and 06 automatically scale it up when deploying workloads.

```bash
# Manually scale down after training
az aks nodepool update \
  --resource-group <rg-name> --cluster-name <cluster-name> \
  --name gpupool --min-count 0 --max-count 0
```

## kubectl Context

When creating a new cluster, a new context is added to your kubeconfig:
```bash
kubectl config get-contexts          # List all contexts
kubectl config use-context <name>    # Switch to a different cluster
```

## Notes

- Fine-tuning typically takes ~20 minutes on an H100
- GPU nodes take 5-10 minutes to provision when scaling up
- GPU metrics take 3-5 minutes to appear in Grafana after setup
- Uses the NVIDIA PyTorch NGC container for optimized performance
- Requires an Azure subscription with GPU quota for NC80adis_H100_v5
Lines changed: 47 additions & 0 deletions
# AKS GPU Fine-tuning Project Configuration Template
# Copy this to config.sh and fill in your values
# DO NOT commit config.sh to version control

# Fix Git Bash path conversion issue on Windows (safe to keep on all platforms)
# Alternatively, set it in your shell: export MSYS_NO_PATHCONV=1
MSYS_NO_PATHCONV=1

# Azure Subscription
SUBSCRIPTION_ID=""

# Project Configuration
PROJECT_NAME="aks-gpu-finetune"
LOCATION="switzerlandnorth"

# UNIQUE SUFFIX - Required for globally unique resource names!
# Option 1: Set your own suffix (e.g., your initials + random digits: "cp1234")
# Option 2: Leave empty to auto-generate on first run
UNIQUE_SUFFIX=""

# Resource Names (defaults are derived from PROJECT_NAME in the scripts)
# Override below ONLY if you want fully custom names
RESOURCE_GROUP_NAME=""
AKS_CLUSTER_NAME=""
# Set these only if you want custom storage/ACR names
STORAGE_ACCOUNT_NAME=""
ACR_NAME=""

# Storage configuration
STORAGE_CONTAINER_NAME="models"
STORAGE_DATASET_CONTAINER="datasets"

# AKS configuration
AKS_SYSTEM_NODE_COUNT=1
AKS_SYSTEM_NODE_SIZE="Standard_D2s_v6"
AKS_GPU_NODE_POOL_NAME="gpupool"
AKS_GPU_NODE_MIN_COUNT=0   # Start at 0 to avoid idle GPU costs (~$20/hr)
AKS_GPU_NODE_MAX_COUNT=1   # Max 1 node - scripts scale up/down as needed
AKS_GPU_NODE_SIZE="Standard_NC80adis_H100_v5"   # 2x H100 GPUs per node
AKS_NODE_DISK_SIZE=100
AKS_VERSION="1.32"

# Container Registry
ACR_SKU="Basic"

# Common tags (set dynamically from PROJECT_NAME in the scripts)
TAGS=""
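The scripts source this file and fill in derived defaults; the sketch below shows one way the suffix auto-generation and name derivation could work. The naming scheme (`rg-`/`acr` prefixes, hyphen stripping) is illustrative, not necessarily what the repo's scripts actually do:

```bash
# Illustrative defaults, mirroring the template above
PROJECT_NAME="aks-gpu-finetune"
UNIQUE_SUFFIX=""

# Auto-generate a 6-character suffix on first run if none was set
if [ -z "$UNIQUE_SUFFIX" ]; then
  UNIQUE_SUFFIX=$(LC_ALL=C tr -dc 'a-z0-9' </dev/urandom | head -c 6)
fi

# Derive resource names from PROJECT_NAME unless overridden.
# Storage/ACR names must be globally unique and alphanumeric only,
# hence the hyphen-stripped variant plus the unique suffix.
RESOURCE_GROUP_NAME="${RESOURCE_GROUP_NAME:-rg-${PROJECT_NAME}}"
AKS_CLUSTER_NAME="${AKS_CLUSTER_NAME:-aks-${PROJECT_NAME}}"
compact=$(printf '%s' "$PROJECT_NAME" | tr -d '-')
STORAGE_ACCOUNT_NAME="${STORAGE_ACCOUNT_NAME:-${compact}${UNIQUE_SUFFIX}}"
ACR_NAME="${ACR_NAME:-acr${compact}${UNIQUE_SUFFIX}}"

echo "$RESOURCE_GROUP_NAME"
```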
Lines changed: 30 additions & 0 deletions
# Dockerfile for GPT-OSS Fine-tuning on H100 GPU
# Base: NVIDIA PyTorch container (includes CUDA, cuDNN, and a pre-optimized PyTorch)
# Using the PyTorch NGC container for better performance and faster builds

FROM nvcr.io/nvidia/pytorch:25.01-py3

# Set environment variables
ENV PYTHONUNBUFFERED=1

# Install ML dependencies for fine-tuning
RUN pip install --no-cache-dir \
    "trl>=0.20.0" \
    "peft>=0.17.0" \
    "transformers>=4.55.0" \
    "bitsandbytes>=0.41.0" \
    "accelerate>=0.20.0"

# Install the Azure SDK for blob storage integration
RUN pip install --no-cache-dir \
    azure-storage-blob \
    azure-identity

# Create the working directory
WORKDIR /workspace

# Copy the training script from the src folder
COPY src/finetune.py /workspace/

# Set the entrypoint
ENTRYPOINT ["python", "finetune.py"]
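Script 04 builds and pushes this image; for reference, an equivalent manual build might look like the sketch below. The registry name is a placeholder (the real one comes from `config.sh`), and the build context/Dockerfile path is assumed:

```bash
# Placeholder registry name; the real one comes from config.sh
ACR_NAME="aksgpuacr1234"
IMAGE="${ACR_NAME}.azurecr.io/gpt-oss-finetune:latest"
echo "$IMAGE"

# Log in to ACR, build for the amd64 GPU nodes, and push:
# az acr login --name "$ACR_NAME"
# docker build --platform linux/amd64 -t "$IMAGE" .
# docker push "$IMAGE"
```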
Lines changed: 37 additions & 0 deletions
# Inference Dockerfile for GPT-OSS-20B on H100 GPU
# Base: NVIDIA PyTorch container (includes CUDA, cuDNN, and a pre-optimized PyTorch)

FROM nvcr.io/nvidia/pytorch:25.01-py3

# Set environment variables
ENV PYTHONUNBUFFERED=1 \
    PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True

# Install Flask for the web UI, the Azure SDK, and ML packages
RUN pip install --no-cache-dir \
    flask \
    gunicorn \
    azure-storage-blob \
    azure-identity \
    "transformers>=4.55.0" \
    "peft>=0.17.0" \
    "accelerate>=0.20.0" \
    "bitsandbytes>=0.41.0" \
    sentencepiece \
    protobuf

# Set the working directory
WORKDIR /workspace

# Copy the unified inference engine, web UI, and HTML template from the src folder
COPY src/model_inference.py /workspace/model_inference.py
COPY src/web-ui.py /workspace/web-ui.py
COPY src/index.html /workspace/index.html
RUN chmod +x /workspace/model_inference.py /workspace/web-ui.py

# Expose the web UI port
EXPOSE 8080

# Run the web UI by default (override via CLI: model_inference.py --model [finetuned|baseline|both])
CMD ["python3", "/workspace/web-ui.py"]
Lines changed: 56 additions & 0 deletions
apiVersion: batch/v1
kind: Job
metadata:
  name: gpt-oss-finetune
  namespace: workloads
spec:
  # No retries: a single failure leaves the pod around for log inspection
  backoffLimit: 0
  # Delete the finished Job (and its pods) after one hour
  ttlSecondsAfterFinished: 3600

  template:
    metadata:
      labels:
        app: gpt-oss-finetune
        azure.workload.identity/use: "true"  # Enable workload identity
    spec:
      # Use the workload identity service account for blob storage access
      serviceAccountName: workload-identity-sa

      # Tolerate GPU node taints
      tolerations:
        - key: nvidia.com/gpu
          operator: Equal
          value: "true"
          effect: NoSchedule

      restartPolicy: Never

      containers:
        - name: finetune
          image: __ACR_NAME__.azurecr.io/gpt-oss-finetune:latest
          imagePullPolicy: Always

          # Environment variables for Azure Storage
          env:
            - name: STORAGE_ACCOUNT_NAME
              value: "__STORAGE_ACCOUNT_NAME__"
            - name: MODEL_CONTAINER
              value: "models"
            - name: DATASET_CONTAINER
              value: "datasets"

          # Request 1 of the node's 2 H100 GPUs, with CPU/memory limits
          # sized to roughly half a Standard_NC80adis_H100_v5 node
          resources:
            requests:
              nvidia.com/gpu: 1
              memory: "80Gi"
              cpu: "32"
            limits:
              nvidia.com/gpu: 1
              memory: "96Gi"
              cpu: "40"

          # Set the working directory
          workingDir: /workspace
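The `__ACR_NAME__` and `__STORAGE_ACCOUNT_NAME__` placeholders are substituted by the deploy script before the manifest is applied; a minimal sketch of such a rendering step (the values and the `render` helper are illustrative):

```bash
# Illustrative values; the real ones come from config.sh
ACR_NAME="aksgpuacr1234"
STORAGE_ACCOUNT_NAME="aksgpustore1234"

# Replace the manifest placeholders; pipe the result to `kubectl apply -f -`
render() {
  sed -e "s/__ACR_NAME__/${ACR_NAME}/g" \
      -e "s/__STORAGE_ACCOUNT_NAME__/${STORAGE_ACCOUNT_NAME}/g"
}

printf 'image: __ACR_NAME__.azurecr.io/gpt-oss-finetune:latest\n' | render
```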
