
Commit a9df240

Steps to set up on RunPod
1 parent 96911b3 commit a9df240

File tree

11 files changed (+777, -0 lines)


RAW-Steps.md

Lines changed: 246 additions & 0 deletions

# Qwen3-Coder Deployment on RunPod - Complete Setup Guide

## ✅ Deployment Successfully Completed!

Congratulations! Your Qwen3-Coder model is now running on RunPod with Ollama, exposed as a secure, OpenAI-compatible API endpoint.

## 📋 README.md - Complete Deployment Documentation

````markdown
# Qwen3-Coder on RunPod with Ollama - OpenAI-Compatible API

This guide documents the complete setup for running Qwen3-Coder on a RunPod GPU with a secure, OpenAI-compatible API endpoint.

## 🚀 Overview

- **Model**: Qwen3-Coder (30.5B parameters)
- **Platform**: RunPod GPU Instance
- **Framework**: Ollama + LiteLLM Proxy
- **API**: OpenAI-compatible with authentication
- **Security**: API key authentication

## 🛠️ Setup Instructions

### 1. RunPod Instance Configuration

**Template**: `runpod/pytorch:2.8.0-py3.11-cuda12.8.1-cudnn-devel-ubuntu22.04`

**Ports**:
- HTTP Ports: 8888, 8000, 11434
- TCP Ports: Enabled

**Storage**:
- Volume Disk: 80GB (persistent, mounted to `/workspace`)
- Container Disk: 30GB (temporary)

### 2. Ollama Installation

```bash
# Install Ollama
curl -fsSL https://ollama.com/install.sh | sh

# Configure external access
export OLLAMA_HOST=0.0.0.0:11434
export OLLAMA_ORIGINS=*
```
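
Note that `export` only affects the current shell. To persist these settings for future sessions, one option (a sketch, assuming a bash shell on the pod) is:

```bash
# Persist Ollama settings across shells
cat >> ~/.bashrc << 'EOF'
export OLLAMA_HOST=0.0.0.0:11434
export OLLAMA_ORIGINS=*
EOF
```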

### 3. Model Installation

```bash
# Pull Qwen3-Coder model
ollama pull qwen3-coder

# Verify model
ollama list
```

### 4. LiteLLM Configuration
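
The steps below assume the LiteLLM proxy is already installed on the pod. If it is not, a typical install looks like this (a sketch, not part of the original steps):

```bash
# Install LiteLLM with the proxy extras (provides the litellm CLI/API server)
pip install 'litellm[proxy]'

# Confirm the CLI is available
litellm --help
```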

**API Key Generation**:
```bash
API_KEY=$(openssl rand -hex 32)
echo "$API_KEY" > /workspace/api-key.txt
```

**Configuration File** (`/workspace/litellm-config.yaml`):
```yaml
model_list:
  - model_name: qwen3-coder
    litellm_params:
      model: ollama/qwen3-coder
  - model_name: gpt-3.5-turbo
    litellm_params:
      model: ollama/qwen3-coder
  - model_name: gpt-4
    litellm_params:
      model: ollama/qwen3-coder

general_settings:
  master_key: "YOUR_API_KEY_HERE"
  drop_params: true
  debug_level: "DEBUG"
```
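
To avoid hand-editing the placeholder, the generated key can be substituted into the config directly (a sketch, assuming the paths used above):

```bash
# Replace the master_key placeholder with the key generated earlier
API_KEY=$(cat /workspace/api-key.txt)
sed -i "s/YOUR_API_KEY_HERE/$API_KEY/" /workspace/litellm-config.yaml
```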

### 5. Startup Script

Create `/start-final.sh`:
```bash
#!/bin/bash

# Create logs directory
mkdir -p /workspace/logs

# Set environment variables
export OLLAMA_HOST=0.0.0.0:11434
export OLLAMA_ORIGINS=*

# Start Ollama
echo "Starting Ollama..."
ollama serve > /workspace/logs/ollama.log 2>&1 &
OLLAMA_PID=$!
echo "Ollama started with PID: $OLLAMA_PID"

# Wait for Ollama to be ready
sleep 15

# Verify Ollama is running
if ! curl -s http://localhost:11434/api/tags > /dev/null; then
    echo "ERROR: Ollama failed to start properly"
    exit 1
fi

echo "Ollama is running successfully"

# Start LiteLLM proxy with configuration file
echo "Starting LiteLLM proxy..."
litellm --config /workspace/litellm-config.yaml --host 0.0.0.0 --port 8000 --num_workers 8 > /workspace/logs/litellm.log 2>&1 &
LITELLM_PID=$!
echo "LiteLLM started with PID: $LITELLM_PID"

# Wait for LiteLLM to start
sleep 10

echo "All services started successfully!"
echo "Ollama PID: $OLLAMA_PID"
echo "LiteLLM PID: $LITELLM_PID"
```

Make it executable:
```bash
chmod +x /start-final.sh
```

### 6. Running the Services

```bash
# Start services
/start-final.sh

# Check if running
ps aux | grep -E "(ollama|litellm)"

# Check port binding
netstat -tlnp | grep :11434
netstat -tlnp | grep :8000
```

### 7. API Testing

**Test Ollama**:
```bash
curl http://localhost:11434/api/tags
```

**Test LiteLLM with Authentication**:
```bash
API_KEY=$(cat /workspace/api-key.txt)
curl http://localhost:8000/v1/models -H "Authorization: Bearer $API_KEY"

# Chat completions test
curl http://localhost:8000/v1/chat/completions \
  -H "Authorization: Bearer $API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qwen3-coder",
    "messages": [{"role": "user", "content": "Write a Python function to calculate factorial"}]
  }'
```
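
Since the proxy is OpenAI-compatible, streaming responses should also work; a quick check (sketch):

```bash
# Streamed chat completion (server-sent events)
curl -N http://localhost:8000/v1/chat/completions \
  -H "Authorization: Bearer $API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qwen3-coder",
    "stream": true,
    "messages": [{"role": "user", "content": "Write a haiku about GPUs"}]
  }'
```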

## 🔐 Security Features

- **API Key Authentication**: All API endpoints require a valid Bearer token
- **Port Isolation**: Services bind only to the configured ports
- **Process Isolation**: Services run in the background with proper logging

## 🔄 Restarting Services

After a pod restart:
```bash
/start-final.sh
```

## 📊 Monitoring

**Check logs**:
```bash
tail -f /workspace/logs/ollama.log
tail -f /workspace/logs/litellm.log
```

**Check processes**:
```bash
ps aux | grep -E "(ollama|litellm)"
```
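
To scan both logs for problems in one pass (a sketch):

```bash
# Surface recent errors from both service logs
grep -iE "error|traceback" /workspace/logs/ollama.log /workspace/logs/litellm.log | tail -n 20
```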

## 🌐 External Access

Once deployed on RunPod, access your API at:
- **Endpoint**: `http://your-pod-id.runpod.io:8000/v1`
- **API Key**: Contents of `/workspace/api-key.txt` (see the example request below)
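
From your local machine, the same checks can be run against the external endpoint (a sketch; substitute your actual pod hostname and the key from `/workspace/api-key.txt`):

```bash
# Placeholder values: replace with your pod's hostname and generated key
export QWEN_BASE="http://your-pod-id.runpod.io:8000/v1"
export QWEN_KEY="YOUR_API_KEY_HERE"

curl "$QWEN_BASE/models" -H "Authorization: Bearer $QWEN_KEY"
```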

## 🧪 VS Code Integration

Configure your OpenAI-compatible extension with:
- **API Base**: `http://your-pod-id.runpod.io:8000/v1`
- **API Key**: Your generated API key
- **Model**: `qwen3-coder` or `gpt-3.5-turbo`

## 📈 Performance Notes

- **GPU**: NVIDIA A40/A100 recommended (24GB+ VRAM)
- **Memory**: 32GB+ system RAM recommended
- **Model Size**: ~18GB on disk
- **Startup Time**: ~30 seconds for full initialization

## 🆘 Troubleshooting

**Connection Refused**:
- Check port binding with `netstat -tlnp`
- Verify services are running with `ps aux`
- Ensure RunPod ports are properly exposed

**Authentication Errors**:
- Verify the API key in `/workspace/api-key.txt`
- Check the LiteLLM configuration file
- Test with `curl` using the proper Authorization header

**Model Loading Issues**:
- Check Ollama logs: `tail -f /workspace/logs/ollama.log`
- Verify the model exists: `ollama list`
- Restart the Ollama service if needed (see the sketch below)
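
One way to restart the services without rebooting the pod (a sketch):

```bash
# Stop both services, then bring them back up with the startup script
pkill -f "ollama serve" || true
pkill -f "litellm" || true
sleep 2
/start-final.sh
```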

## 📝 Notes

- Persistent storage ensures models aren't re-downloaded
- Services are restarted after a pod reboot by re-running the startup script
- The API is fully OpenAI-compatible for seamless integration
````

## 🎯 Next Steps

1. **Test external access** from your local machine
2. **Configure VS Code integration** with the API endpoint
3. **Set up HTTPS** for production use (optional)
4. **Monitor usage** and costs on RunPod

Your Qwen3-Coder deployment is now complete and ready for production use!

README.md

Lines changed: 50 additions & 0 deletions

# Qwen3-Coder RunPod Ollama API

Deploy Qwen3-Coder (30B/480B) on a RunPod GPU with Ollama and a LiteLLM proxy. Complete setup with a secure OpenAI-compatible API endpoint, authentication, persistent storage, automated backups, and VS Code integration.

## 🚀 Features
- Secure OpenAI-compatible API endpoint
- API key authentication
- Persistent storage configuration
- Automated backup system
- VS Code integration ready
- Multi-model support (30B/480B)
- Production-ready deployment

## 🛠️ Requirements
- RunPod account with GPU instance
- Docker (optional but recommended)
- VS Code with Cline extension

## 📁 Project Structure
```
runpod/
├── README.md
├── VSCODE_INTEGRATION.md
├── deployment/
│   ├── start_ollama.sh
│   ├── backup_config.sh
│   └── litellm-config.yaml
├── scripts/
│   ├── model_management.sh
│   └── health_check.sh
├── configs/
│   ├── .env.example
│   └── docker-compose.yml
└── docs/
    ├── TROUBLESHOOTING.md
    └── PERFORMANCE_TUNING.md
```

## 📋 Setup Instructions
1. Create a RunPod instance with the PyTorch template
2. Install Ollama and required dependencies
3. Pull the Qwen3-Coder model
4. Configure the LiteLLM proxy with authentication
5. Start services with the deployment scripts (a health-check sketch follows this list)
6. Configure VS Code integration
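
The project structure above lists `scripts/health_check.sh`; a minimal sketch of such a script (an illustration under assumed paths, not necessarily the committed file):

```bash
#!/bin/bash
# Verify that Ollama and the LiteLLM proxy respond locally
set -u

API_KEY=$(cat /workspace/api-key.txt)

curl -sf http://localhost:11434/api/tags > /dev/null \
  || { echo "Ollama is not responding"; exit 1; }
curl -sf http://localhost:8000/v1/models \
  -H "Authorization: Bearer $API_KEY" > /dev/null \
  || { echo "LiteLLM proxy is not responding"; exit 1; }

echo "All services healthy"
```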

## 🎯 Usage
- API Endpoint: `https://your-runpod-endpoint.runpod.io/v1`
- Model Names: `qwen3-coder`, `gpt-3.5-turbo`, `gpt-4`
- Authentication: Bearer token required (example request below)
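
All three model names route to the same Qwen3-Coder backend via the LiteLLM config; for example (a sketch, with placeholder endpoint and key):

```bash
# The gpt-3.5-turbo alias is served by ollama/qwen3-coder
curl https://your-runpod-endpoint.runpod.io/v1/chat/completions \
  -H "Authorization: Bearer YOUR_API_KEY_HERE" \
  -H "Content-Type: application/json" \
  -d '{"model": "gpt-3.5-turbo", "messages": [{"role": "user", "content": "Hello"}]}'
```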

VSCODE_INTEGRATION.md

Lines changed: 39 additions & 0 deletions

# VS Code Integration Guide

## 🚀 Setup Instructions

### 1. Install Cline Extension
1. Open VS Code
2. Go to Extensions (Ctrl+Shift+X)
3. Search for "Cline" and install

### 2. Configure Cline Settings
Add to VS Code `settings.json`:
```json
{
  "cline.customModel": "qwen3-coder",
  "cline.customEndpoint": "https://your-runpod-endpoint.runpod.io/v1",
  "cline.customApiKey": "YOUR_API_KEY_HERE"
}
```

### 3. Environment Variables (Alternative)
Set in your terminal:
```bash
export OPENAI_API_KEY="YOUR_API_KEY_HERE"
export OPENAI_BASE_URL="https://your-runpod-endpoint.runpod.io/v1"
```

## 🧪 Testing Integration
```bash
# Test API access
curl https://your-runpod-endpoint.runpod.io/v1/models \
  -H "Authorization: Bearer YOUR_API_KEY_HERE"
```
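
If you exported the environment variables from step 3, the same check can reuse them (a sketch):

```bash
# Uses OPENAI_BASE_URL and OPENAI_API_KEY from the previous step
curl "$OPENAI_BASE_URL/models" -H "Authorization: Bearer $OPENAI_API_KEY"
```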

## 🎯 Common Tasks
- Code explanation: Select code and ask Cline to explain it
- Code generation: Ask Cline to write functions, classes, or scripts
- Bug fixing: Highlight problematic code and ask for fixes
- Code optimization: Request performance improvements
- Documentation: Generate docstrings and comments

configs/docker-compose.yml

Lines changed: 26 additions & 0 deletions

version: '3.8'
services:
  ollama:
    image: ollama/ollama
    container_name: ollama
    volumes:
      - ./ollama-data:/root/.ollama
      - ./models:/home/ollama/models
    ports:
      - "11434:11434"
    restart: unless-stopped
    environment:
      - OLLAMA_HOST=0.0.0.0:11434
      - OLLAMA_ORIGINS=*

  litellm:
    image: litellm/litellm:latest
    container_name: litellm-proxy
    ports:
      - "8000:8000"
    volumes:
      - ./configs/litellm-config.yaml:/app/config.yaml
    command: "--config /app/config.yaml --host 0.0.0.0 --port 8000"
    depends_on:
      - ollama
    restart: unless-stopped
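
To bring the stack up (a sketch; assumes Docker Compose v2 and that the model still needs to be pulled inside the container):

```bash
# Start both containers in the background
docker compose up -d

# Pull the model inside the Ollama container (first run only)
docker compose exec ollama ollama pull qwen3-coder

# Follow the proxy logs
docker compose logs -f litellm
```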

deployment/backup_config.sh

Lines changed: 16 additions & 0 deletions

#!/bin/bash
# Backup configuration and models

BACKUP_DIR="/home/ollama/backups"
DATE=$(date +%Y%m%d_%H%M%S)

echo "Creating backup..."
mkdir -p "$BACKUP_DIR"

# Backup model list
ollama list > "$BACKUP_DIR/model_list_$DATE.txt"

# Backup configuration files
cp -r /root/.ollama/* "$BACKUP_DIR/" 2>/dev/null || true

echo "Backup completed: $BACKUP_DIR/model_list_$DATE.txt"
