
Commit 3d52738

Author: Christin Pohl

LoRA fine-tuning example

1 parent afbbf46

22 files changed: +2941 −0 lines
Lines changed: 37 additions & 0 deletions
@@ -0,0 +1,37 @@
# Environment & secrets
config.sh
config.env
.env*
*.key
*.pem
secrets/

# Azure & Kubernetes
.azure/
*.kubeconfig

# Model & data files (large)
*.bin
*.safetensors
models/
checkpoints/
.cache/

# Python
__pycache__/
*.pyc
venv/
.venv/

# OS & IDE
.DS_Store
Thumbs.db
.idea/
.vscode/settings.json
*.swp

# Temporary
*.log
*.tmp
temp/
tmp/
Lines changed: 146 additions & 0 deletions
@@ -0,0 +1,146 @@
# AKS GPU Fine-tuning with LoRA

Fine-tune and deploy large language models on Azure Kubernetes Service (AKS) with GPU support using LoRA (Low-Rank Adaptation).

## Use Case

**Problem:** Organizations need AI models that perform internal reasoning in a specific language (e.g., for regulatory audits) while maintaining flexibility in input/output languages for end users.

**Solution:** LoRA fine-tuning modifies the model's reasoning behavior itself, something not achievable through RAG, prompt engineering, or agentic approaches.

**Example:** A Swiss bank requires all AI reasoning traces to be in German for audit compliance, but wants customers to interact in any language. The fine-tuned model:

- Receives a question in English, French, or any other language
- Performs all internal chain-of-thought reasoning in German
- Responds to the user in their original language

This repo demonstrates fine-tuning GPT-OSS 20B to achieve this behavior using LoRA on Azure Kubernetes Service with H100 GPUs.
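Concretely, one supervised training example for this behavior could look like the record below. The field names and texts are purely illustrative; the repo's actual dataset schema is not shown in this commit.

```python
# Illustrative training record; field names and contents are hypothetical,
# not the schema actually used by this repo's dataset.
sample = {
    "system": "Think step by step in German. Reply in the user's language.",
    "user": "What is the penalty for a late regulatory filing?",   # English input
    "reasoning": (
        "Der Kunde fragt nach der Strafe für eine verspätete "
        "aufsichtsrechtliche Meldung. Gemäss Richtlinie gilt ..."  # German chain-of-thought
    ),
    "answer": "A late filing triggers an escalating penalty ...",  # English output
}

# The audit-relevant reasoning stays in German regardless of the I/O language.
assert "Meldung" in sample["reasoning"]
```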
## Architecture

![Architecture](media/Architecture.png)

## Demo

![Demo](media/Demo.png)

## Features

- **Automated Azure Infrastructure**: Creates AKS cluster, ACR, storage, and managed identities
- **GPU-Optimized**: Uses NVIDIA GPU Operator and NC-series VMs
- **GPU Monitoring**: Azure Managed Prometheus + Grafana with DCGM metrics
- **LoRA Fine-tuning**: Parameter-efficient fine-tuning that trains only small adapter matrices
- **Side-by-Side Inference**: Compare fine-tuned vs. baseline models via Web UI
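LoRA's parameter savings come from learning a low-rank product B·A instead of a full weight update ΔW. A framework-free sketch of the arithmetic (the layer shape and rank below are illustrative, not this repo's training configuration):

```python
# LoRA idea in plain Python: learn a rank-r update B @ A instead of a dense d x k delta.
# Dimensions and scaling are illustrative; real values come from the training config.

d, k, r = 4096, 4096, 8   # hypothetical layer shape and LoRA rank
alpha = 16                # hypothetical LoRA scaling hyperparameter

full_update_params = d * k         # parameters in a dense delta-W
lora_params = d * r + r * k        # parameters in B (d x r) plus A (r x k)

# Effective weight at inference: W_eff = W + (alpha / r) * (B @ A)
scaling = alpha / r

print(f"dense update: {full_update_params:,} params")
print(f"LoRA update:  {lora_params:,} params "
      f"({100 * lora_params / full_update_params:.2f}% of dense)")
```

At rank 8 the adapter is under half a percent of the dense update, which is why a 20B model fits comfortably in one H100's memory during fine-tuning.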
## Prerequisites

**Tools:**

- **Azure CLI** - [Install](https://docs.microsoft.com/en-us/cli/azure/install-azure-cli)
- **kubectl** - [Install](https://kubernetes.io/docs/tasks/tools/) or `az aks install-cli`
- **Helm** - [Install](https://helm.sh/docs/intro/install/)
- **Bash shell** - WSL/WSL2 (Windows), Terminal (macOS/Linux), or Git Bash

**Azure:**

- Azure subscription with **Owner** or **Contributor + User Access Administrator** role
- GPU quota for `Standard_NC80adis_H100_v5` (80 vCPUs, i.e. 1 node) in your region
- Request quota at: [Azure Portal → Quotas](https://portal.azure.com/#view/Microsoft_Azure_Capacity/QuotaMenuBlade/~/myQuotas)

## Quick Deploy

```bash
# 1. Configure your environment
cp config.sh.template config.sh
# Edit config.sh with your Azure subscription and settings
# Requires GPU quota for NC80adis_H100_v5; adjust the region if needed

# 2. Log in to Azure
az login

# 3. Run the deployment
bash ./scripts/quick-deploy.sh
```

This will:

1. Create Azure resources (resource group, storage, ACR, managed identity)
2. Create the AKS cluster with a GPU node pool
3. Set up GPU monitoring (Prometheus + Grafana)
4. Build and push Docker images
5. Deploy the fine-tuning job (~20 min)
6. Deploy the inference service with Web UI

## Monitor Fine-tuning

```bash
kubectl get jobs -n workloads
kubectl logs job/gpt-oss-finetune -n workloads -f
```

## GPU Monitoring

The monitoring script automatically sets up:

- Azure Monitor Workspace (Managed Prometheus)
- Azure Managed Grafana with a DCGM dashboard
- NVIDIA DCGM metrics scraping

Access the Grafana URL from the script output or the Azure Portal.

**Test queries in Grafana Explore:**

- `DCGM_FI_DEV_GPU_UTIL` - GPU utilization
- `DCGM_FI_DEV_GPU_TEMP` - GPU temperature
- `DCGM_FI_DEV_FB_USED` - GPU memory usage

Alternatively, visit the DCGM Dashboard for GPU metrics visualization.
## Access Inference Service

```bash
kubectl get svc gpt-oss-inference -n workloads
# Use the EXTERNAL-IP to access the Web UI
```

> **⚠️ Security Note:** The inference service uses a public `LoadBalancer` without authentication and is suitable for demos only. For production, use an internal load balancer, network policies, or an API gateway with authentication.

## Individual Steps

```bash
./scripts/01-setup-azure-resources.sh   # Azure resources
./scripts/02-create-aks-cluster.sh      # AKS cluster (GPU pool at 0 nodes)
./scripts/03-setup-gpu-monitoring.sh    # Prometheus + Grafana
./scripts/04-build-and-push-image.sh    # Docker build
./scripts/05-deploy-finetune.sh         # Scales up GPU, deploys job
./scripts/06-deploy-inference.sh        # Inference service
```

## Cost Optimization

**GPU nodes start at 0** - the GPU node pool is created with 0 nodes to avoid idle costs (~$20/hr). Scripts 05 and 06 automatically scale it up when deploying workloads.

```bash
# Manually scale down after training
az aks nodepool update \
  --resource-group <rg-name> --cluster-name <cluster-name> \
  --name gpupool --min-count 0 --max-count 0
```
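The scale-to-zero default pays off quickly. A back-of-the-envelope using the ~$20/hr figure above (a rough estimate quoted in this README, not exact Azure pricing):

```python
# Rough cost math using the approximate $20/hr figure quoted above.
RATE_PER_HOUR = 20.0     # approximate GPU node rate from this README
training_minutes = 20    # typical fine-tuning duration on H100

training_cost = RATE_PER_HOUR * training_minutes / 60
idle_day_cost = RATE_PER_HOUR * 24

print(f"training run: ~${training_cost:.2f}")  # ~$6.67
print(f"idle 24h:     ~${idle_day_cost:.2f}")  # ~$480.00
```

A single forgotten idle day costs roughly 70x more than the training run itself.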
## kubectl Context

Creating a new cluster adds a new context to your kubeconfig:

```bash
kubectl config get-contexts          # List all contexts
kubectl config use-context <name>    # Switch to a different cluster
```

## Notes

- Fine-tuning typically takes ~20 minutes on H100
- GPU nodes take 5-10 minutes to provision when scaling up
- GPU metrics take 3-5 minutes to appear in Grafana after setup
- Uses the NVIDIA PyTorch NGC container for optimized performance
- Requires an Azure subscription with GPU quota for NC80adis_H100_v5
Lines changed: 47 additions & 0 deletions
@@ -0,0 +1,47 @@
# AKS GPU Fine-tuning Project Configuration Template
# Copy this to config.sh and fill in your values
# DO NOT commit config.sh to version control

# Fix Git Bash path conversion issue on Windows (safe to keep on all platforms)
# Set this in your shell: export MSYS_NO_PATHCONV=1
MSYS_NO_PATHCONV=1

# Azure Subscription
SUBSCRIPTION_ID=""

# Project Configuration
PROJECT_NAME="aks-gpu-finetune"
LOCATION="switzerlandnorth"

# UNIQUE SUFFIX - Required for globally unique resource names!
# Option 1: Set your own suffix (e.g., your initials + random digits: "cp1234")
# Option 2: Leave empty to auto-generate on first run
UNIQUE_SUFFIX=""

# Resource Names (defaults are derived from PROJECT_NAME in the scripts)
# Override below ONLY if you want fully custom names
RESOURCE_GROUP_NAME=""
AKS_CLUSTER_NAME=""
# For custom storage/ACR names, set these here
STORAGE_ACCOUNT_NAME=""
ACR_NAME=""

# Storage configuration
STORAGE_CONTAINER_NAME="models"
STORAGE_DATASET_CONTAINER="datasets"

# AKS configuration
AKS_SYSTEM_NODE_COUNT=1
AKS_SYSTEM_NODE_SIZE="Standard_D2s_v6"
AKS_GPU_NODE_POOL_NAME="gpupool"
AKS_GPU_NODE_MIN_COUNT=0   # Start at 0 to avoid idle GPU costs (~$20/hr)
AKS_GPU_NODE_MAX_COUNT=1   # Max 1 node - scripts scale up/down as needed
AKS_GPU_NODE_SIZE="Standard_NC80adis_H100_v5"   # 2x H100 GPUs per node
AKS_NODE_DISK_SIZE=100
AKS_VERSION="1.32"

# Container Registry
ACR_SKU="Basic"

# Common tags (set dynamically from PROJECT_NAME in scripts)
TAGS=""
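The auto-generated `UNIQUE_SUFFIX` only needs to be short, lowercase, and alphanumeric so that derived storage-account and ACR names stay globally valid. A sketch of that generation logic, written in Python for clarity (the repo's scripts are bash and may generate it differently; the `aksgpufinetune` prefix is illustrative):

```python
# Sketch of UNIQUE_SUFFIX auto-generation; the repo's bash script may differ.
import secrets
import string

def make_suffix(length: int = 6) -> str:
    """Lowercase alphanumeric suffix, valid inside storage-account and ACR names."""
    alphabet = string.ascii_lowercase + string.digits
    return "".join(secrets.choice(alphabet) for _ in range(length))

suffix = make_suffix()
# Azure storage account names: 3-24 chars, lowercase letters and digits only.
storage_account = f"aksgpufinetune{suffix}"[:24]

print(suffix, storage_account)
```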
Lines changed: 30 additions & 0 deletions
@@ -0,0 +1,30 @@
# Dockerfile for GPT-OSS Fine-tuning on H100 GPU
# Base: NVIDIA PyTorch container (includes CUDA, cuDNN, PyTorch pre-optimized)
# Using the PyTorch NGC container for better performance and faster builds

FROM nvcr.io/nvidia/pytorch:25.01-py3

# Set environment variables
ENV PYTHONUNBUFFERED=1

# Install ML dependencies for fine-tuning
RUN pip install --no-cache-dir \
    "trl>=0.20.0" \
    "peft>=0.17.0" \
    "transformers>=4.55.0" \
    "bitsandbytes>=0.41.0" \
    "accelerate>=0.20.0"

# Install Azure SDK for blob storage integration
RUN pip install --no-cache-dir \
    azure-storage-blob \
    azure-identity

# Create working directory
WORKDIR /workspace

# Copy training script from src folder
COPY src/finetune.py /workspace/

# Set entrypoint
ENTRYPOINT ["python", "finetune.py"]
Lines changed: 37 additions & 0 deletions
@@ -0,0 +1,37 @@
# Inference Dockerfile for GPT-OSS-20B on H100 GPU
# Base: NVIDIA PyTorch container (includes CUDA, cuDNN, PyTorch pre-optimized)

FROM nvcr.io/nvidia/pytorch:25.01-py3

# Set environment variables
ENV PYTHONUNBUFFERED=1 \
    PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True

# Install Flask for the web UI, the Azure SDK, and ML packages
RUN pip install --no-cache-dir \
    flask \
    gunicorn \
    azure-storage-blob \
    azure-identity \
    "transformers>=4.55.0" \
    "peft>=0.17.0" \
    "accelerate>=0.20.0" \
    "bitsandbytes>=0.41.0" \
    sentencepiece \
    protobuf

# Set working directory
WORKDIR /workspace

# Copy unified inference engine, web UI, and HTML template from src folder
COPY src/model_inference.py /workspace/model_inference.py
COPY src/web-ui.py /workspace/web-ui.py
COPY src/index.html /workspace/index.html
RUN chmod +x /workspace/model_inference.py /workspace/web-ui.py

# Expose port for web UI
EXPOSE 8080

# Run the web UI by default (override with CLI: model_inference.py --model [finetuned|baseline|both])
CMD ["python3", "/workspace/web-ui.py"]
Lines changed: 56 additions & 0 deletions
@@ -0,0 +1,56 @@
apiVersion: batch/v1
kind: Job
metadata:
  name: gpt-oss-finetune
  namespace: workloads
spec:
  # No retries: on failure the pod is not replaced, so its logs stay available
  backoffLimit: 0
  # Keep finished pods for one hour for debugging
  ttlSecondsAfterFinished: 3600

  template:
    metadata:
      labels:
        app: gpt-oss-finetune
        azure.workload.identity/use: "true"  # Enable workload identity
    spec:
      # Use workload identity service account for blob storage access
      serviceAccountName: workload-identity-sa

      # Tolerate GPU node taints
      tolerations:
        - key: nvidia.com/gpu
          operator: Equal
          value: "true"
          effect: NoSchedule

      restartPolicy: Never

      containers:
        - name: finetune
          image: __ACR_NAME__.azurecr.io/gpt-oss-finetune:latest
          imagePullPolicy: Always

          # Environment variables for Azure Storage
          env:
            - name: STORAGE_ACCOUNT_NAME
              value: "__STORAGE_ACCOUNT_NAME__"
            - name: MODEL_CONTAINER
              value: "models"
            - name: DATASET_CONTAINER
              value: "datasets"

          # Request 1 of the 2 H100 GPUs on a Standard_NC80adis_H100_v5 node
          resources:
            requests:
              nvidia.com/gpu: 1
              memory: "80Gi"
              cpu: "32"
            limits:
              nvidia.com/gpu: 1
              memory: "96Gi"
              cpu: "40"

          # Set working directory
          workingDir: /workspace
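The `__ACR_NAME__` and `__STORAGE_ACCOUNT_NAME__` tokens in the manifest are filled in at deploy time (the deploy script handles this in the repo). The substitution itself is plain string replacement, sketched here with hypothetical values:

```python
# Sketch of the placeholder substitution done at deploy time.
# The values below are hypothetical; the repo's script reads them from config.sh.
manifest = (
    "image: __ACR_NAME__.azurecr.io/gpt-oss-finetune:latest\n"
    'value: "__STORAGE_ACCOUNT_NAME__"\n'
)

values = {
    "__ACR_NAME__": "myacr1234",                  # hypothetical ACR name
    "__STORAGE_ACCOUNT_NAME__": "mystorage1234",  # hypothetical storage account
}

for placeholder, value in values.items():
    manifest = manifest.replace(placeholder, value)

print(manifest)
# image: myacr1234.azurecr.io/gpt-oss-finetune:latest
# value: "mystorage1234"
```

Keeping the placeholders in the committed YAML means no subscription-specific names land in version control, which matches the `.gitignore` rules above.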
