
Commit a9df240

Steps to set up on RunPod
1 parent 96911b3 commit a9df240

File tree

11 files changed (+777, -0 lines)


RAW-Steps.md

Lines changed: 246 additions & 0 deletions

# Qwen3-Coder Deployment on RunPod - Complete Setup Guide

## ✅ Deployment Successfully Completed!

Congratulations! Your Qwen3-Coder model is now running on RunPod with Ollama, exposed as a secure, OpenAI-compatible API endpoint.

## 📋 README.md - Complete Deployment Documentation

````markdown
# Qwen3-Coder on RunPod with Ollama - OpenAI-Compatible API

This guide documents the complete setup for running Qwen3-Coder on a RunPod GPU with a secure, OpenAI-compatible API endpoint.

## 🚀 Overview

- **Model**: Qwen3-Coder (30.5B parameters)
- **Platform**: RunPod GPU Instance
- **Framework**: Ollama + LiteLLM Proxy
- **API**: OpenAI-compatible with authentication
- **Security**: API key authentication

## 🛠️ Setup Instructions

### 1. RunPod Instance Configuration

**Template**: `runpod/pytorch:2.8.0-py3.11-cuda12.8.1-cudnn-devel-ubuntu22.04`

**Ports**:
- HTTP Ports: 8888, 8000, 11434
- TCP Ports: Enabled

**Storage**:
- Volume Disk: 80GB (persistent, mounted to `/workspace`)
- Container Disk: 30GB (temporary)

### 2. Ollama Installation

```bash
# Install Ollama
curl -fsSL https://ollama.com/install.sh | sh

# Configure external access
export OLLAMA_HOST=0.0.0.0:11434
export OLLAMA_ORIGINS=*
```
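
Note that `export` only affects the current shell. To persist these settings for future sessions, one option (a sketch, assuming a bash shell on the pod) is:

```bash
# Persist Ollama settings across shells
cat >> ~/.bashrc << 'EOF'
export OLLAMA_HOST=0.0.0.0:11434
export OLLAMA_ORIGINS=*
EOF
```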

### 3. Model Installation

```bash
# Pull Qwen3-Coder model
ollama pull qwen3-coder

# Verify model
ollama list
```

### 4. LiteLLM Configuration
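
The steps below assume the LiteLLM proxy is already installed on the pod. If it is not, a typical install looks like this (a sketch, not part of the original steps):

```bash
# Install LiteLLM with the proxy extras (provides the litellm CLI/API server)
pip install 'litellm[proxy]'

# Confirm the CLI is available
litellm --help
```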

**API Key Generation**:
```bash
API_KEY=$(openssl rand -hex 32)
echo "$API_KEY" > /workspace/api-key.txt
```

**Configuration File** (`/workspace/litellm-config.yaml`):
```yaml
model_list:
  - model_name: qwen3-coder
    litellm_params:
      model: ollama/qwen3-coder
  - model_name: gpt-3.5-turbo
    litellm_params:
      model: ollama/qwen3-coder
  - model_name: gpt-4
    litellm_params:
      model: ollama/qwen3-coder

general_settings:
  master_key: "YOUR_API_KEY_HERE"
  drop_params: true
  debug_level: "DEBUG"
```
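
To avoid hand-editing the placeholder, the generated key can be substituted into the config directly (a sketch, assuming the paths used above):

```bash
# Replace the master_key placeholder with the key generated earlier
API_KEY=$(cat /workspace/api-key.txt)
sed -i "s/YOUR_API_KEY_HERE/$API_KEY/" /workspace/litellm-config.yaml
```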

### 5. Startup Script

Create `/start-final.sh`:
```bash
#!/bin/bash

# Create logs directory
mkdir -p /workspace/logs

# Set environment variables
export OLLAMA_HOST=0.0.0.0:11434
export OLLAMA_ORIGINS=*

# Start Ollama
echo "Starting Ollama..."
ollama serve > /workspace/logs/ollama.log 2>&1 &
OLLAMA_PID=$!
echo "Ollama started with PID: $OLLAMA_PID"

# Wait for Ollama to be ready
sleep 15

# Verify Ollama is running
if ! curl -s http://localhost:11434/api/tags > /dev/null; then
    echo "ERROR: Ollama failed to start properly"
    exit 1
fi

echo "Ollama is running successfully"

# Start LiteLLM proxy with configuration file
echo "Starting LiteLLM proxy..."
litellm --config /workspace/litellm-config.yaml --host 0.0.0.0 --port 8000 --num_workers 8 > /workspace/logs/litellm.log 2>&1 &
LITELLM_PID=$!
echo "LiteLLM started with PID: $LITELLM_PID"

# Wait for LiteLLM to start
sleep 10

echo "All services started successfully!"
echo "Ollama PID: $OLLAMA_PID"
echo "LiteLLM PID: $LITELLM_PID"
```

Make it executable:
```bash
chmod +x /start-final.sh
```

### 6. Running the Services

```bash
# Start services
/start-final.sh

# Check if running
ps aux | grep -E "(ollama|litellm)"

# Check port binding
netstat -tlnp | grep :11434
netstat -tlnp | grep :8000
```

### 7. API Testing

**Test Ollama**:
```bash
curl http://localhost:11434/api/tags
```

**Test LiteLLM with Authentication**:
```bash
API_KEY=$(cat /workspace/api-key.txt)
curl http://localhost:8000/v1/models -H "Authorization: Bearer $API_KEY"

# Chat completions test
curl http://localhost:8000/v1/chat/completions \
  -H "Authorization: Bearer $API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qwen3-coder",
    "messages": [{"role": "user", "content": "Write a Python function to calculate factorial"}]
  }'
```
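
Since the proxy is OpenAI-compatible, streaming responses should also work; a quick check (sketch):

```bash
# Streamed chat completion (server-sent events)
curl -N http://localhost:8000/v1/chat/completions \
  -H "Authorization: Bearer $API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qwen3-coder",
    "stream": true,
    "messages": [{"role": "user", "content": "Write a haiku about GPUs"}]
  }'
```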

## 🔐 Security Features

- **API Key Authentication**: All API endpoints require a valid Bearer token
- **Port Isolation**: Services bind only to the configured ports
- **Process Isolation**: Services run in the background with proper logging

## 🔄 Restarting Services

After a pod restart:
```bash
/start-final.sh
```

## 📊 Monitoring

**Check logs**:
```bash
tail -f /workspace/logs/ollama.log
tail -f /workspace/logs/litellm.log
```

**Check processes**:
```bash
ps aux | grep -E "(ollama|litellm)"
```
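
To scan both logs for problems in one pass (a sketch):

```bash
# Surface recent errors from both service logs
grep -iE "error|traceback" /workspace/logs/ollama.log /workspace/logs/litellm.log | tail -n 20
```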

## 🌐 External Access

Once deployed on RunPod, access your API at:
- **Endpoint**: `http://your-pod-id.runpod.io:8000/v1`
- **API Key**: Contents of `/workspace/api-key.txt` (see the example request below)
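
From your local machine, the same checks can be run against the external endpoint (a sketch; substitute your actual pod hostname and the key from `/workspace/api-key.txt`):

```bash
# Placeholder values: replace with your pod's hostname and generated key
export QWEN_BASE="http://your-pod-id.runpod.io:8000/v1"
export QWEN_KEY="YOUR_API_KEY_HERE"

curl "$QWEN_BASE/models" -H "Authorization: Bearer $QWEN_KEY"
```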

## 🧪 VS Code Integration

Configure your OpenAI-compatible extension with:
- **API Base**: `http://your-pod-id.runpod.io:8000/v1`
- **API Key**: Your generated API key
- **Model**: `qwen3-coder` or `gpt-3.5-turbo`

## 📈 Performance Notes

- **GPU**: NVIDIA A40/A100 recommended (24GB+ VRAM)
- **Memory**: 32GB+ system RAM recommended
- **Model Size**: ~18GB on disk
- **Startup Time**: ~30 seconds for full initialization

## 🆘 Troubleshooting

**Connection Refused**:
- Check port binding with `netstat -tlnp`
- Verify services are running with `ps aux`
- Ensure RunPod ports are properly exposed

**Authentication Errors**:
- Verify the API key in `/workspace/api-key.txt`
- Check the LiteLLM configuration file
- Test with `curl` using the proper Authorization header

**Model Loading Issues**:
- Check Ollama logs: `tail -f /workspace/logs/ollama.log`
- Verify the model exists: `ollama list`
- Restart the Ollama service if needed (see the sketch below)
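
One way to restart the services without rebooting the pod (a sketch):

```bash
# Stop both services, then bring them back up with the startup script
pkill -f "ollama serve" || true
pkill -f "litellm" || true
sleep 2
/start-final.sh
```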

## 📝 Notes

- Persistent storage ensures models aren't re-downloaded
- Services are restarted after a pod reboot by re-running the startup script
- The API is fully OpenAI-compatible for seamless integration
````

## 🎯 Next Steps

1. **Test external access** from your local machine
2. **Configure VS Code integration** with the API endpoint
3. **Set up HTTPS** for production use (optional)
4. **Monitor usage** and costs on RunPod

Your Qwen3-Coder deployment is now complete and ready for production use!

README.md

Lines changed: 50 additions & 0 deletions

# Qwen3-Coder RunPod Ollama API

Deploy Qwen3-Coder (30B/480B) on a RunPod GPU with Ollama and a LiteLLM proxy. Complete setup with a secure OpenAI-compatible API endpoint, authentication, persistent storage, automated backups, and VS Code integration.

## 🚀 Features
- Secure OpenAI-compatible API endpoint
- API key authentication
- Persistent storage configuration
- Automated backup system
- VS Code integration ready
- Multi-model support (30B/480B)
- Production-ready deployment

## 🛠️ Requirements
- RunPod account with GPU instance
- Docker (optional but recommended)
- VS Code with Cline extension

## 📁 Project Structure
```
runpod/
├── README.md
├── VSCODE_INTEGRATION.md
├── deployment/
│   ├── start_ollama.sh
│   ├── backup_config.sh
│   └── litellm-config.yaml
├── scripts/
│   ├── model_management.sh
│   └── health_check.sh
├── configs/
│   ├── .env.example
│   └── docker-compose.yml
└── docs/
    ├── TROUBLESHOOTING.md
    └── PERFORMANCE_TUNING.md
```

## 📋 Setup Instructions
1. Create a RunPod instance with the PyTorch template
2. Install Ollama and required dependencies
3. Pull the Qwen3-Coder model
4. Configure the LiteLLM proxy with authentication
5. Start services with the deployment scripts (a health-check sketch follows this list)
6. Configure VS Code integration
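
The project structure above lists `scripts/health_check.sh`; a minimal sketch of such a script (an illustration under assumed paths, not necessarily the committed file):

```bash
#!/bin/bash
# Verify that Ollama and the LiteLLM proxy respond locally
set -u

API_KEY=$(cat /workspace/api-key.txt)

curl -sf http://localhost:11434/api/tags > /dev/null \
  || { echo "Ollama is not responding"; exit 1; }
curl -sf http://localhost:8000/v1/models \
  -H "Authorization: Bearer $API_KEY" > /dev/null \
  || { echo "LiteLLM proxy is not responding"; exit 1; }

echo "All services healthy"
```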

## 🎯 Usage
- API Endpoint: `https://your-runpod-endpoint.runpod.io/v1`
- Model Names: `qwen3-coder`, `gpt-3.5-turbo`, `gpt-4`
- Authentication: Bearer token required (example request below)
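
All three model names route to the same Qwen3-Coder backend via the LiteLLM config; for example (a sketch, with placeholder endpoint and key):

```bash
# The gpt-3.5-turbo alias is served by ollama/qwen3-coder
curl https://your-runpod-endpoint.runpod.io/v1/chat/completions \
  -H "Authorization: Bearer YOUR_API_KEY_HERE" \
  -H "Content-Type: application/json" \
  -d '{"model": "gpt-3.5-turbo", "messages": [{"role": "user", "content": "Hello"}]}'
```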

VSCODE_INTEGRATION.md

Lines changed: 39 additions & 0 deletions

# VS Code Integration Guide

## 🚀 Setup Instructions

### 1. Install Cline Extension
1. Open VS Code
2. Go to Extensions (Ctrl+Shift+X)
3. Search for "Cline" and install

### 2. Configure Cline Settings
Add to VS Code `settings.json`:
```json
{
  "cline.customModel": "qwen3-coder",
  "cline.customEndpoint": "https://your-runpod-endpoint.runpod.io/v1",
  "cline.customApiKey": "YOUR_API_KEY_HERE"
}
```

### 3. Environment Variables (Alternative)
Set in your terminal:
```bash
export OPENAI_API_KEY="YOUR_API_KEY_HERE"
export OPENAI_BASE_URL="https://your-runpod-endpoint.runpod.io/v1"
```

## 🧪 Testing Integration
```bash
# Test API access
curl https://your-runpod-endpoint.runpod.io/v1/models \
  -H "Authorization: Bearer YOUR_API_KEY_HERE"
```
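
If you exported the environment variables from step 3, the same check can reuse them (a sketch):

```bash
# Uses OPENAI_BASE_URL and OPENAI_API_KEY from the previous step
curl "$OPENAI_BASE_URL/models" -H "Authorization: Bearer $OPENAI_API_KEY"
```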

## 🎯 Common Tasks
- Code explanation: Select code and ask Cline to explain it
- Code generation: Ask Cline to write functions, classes, or scripts
- Bug fixing: Highlight problematic code and ask for fixes
- Code optimization: Request performance improvements
- Documentation: Generate docstrings and comments

configs/docker-compose.yml

Lines changed: 26 additions & 0 deletions

version: '3.8'
services:
  ollama:
    image: ollama/ollama
    container_name: ollama
    volumes:
      - ./ollama-data:/root/.ollama
      - ./models:/home/ollama/models
    ports:
      - "11434:11434"
    restart: unless-stopped
    environment:
      - OLLAMA_HOST=0.0.0.0:11434
      - OLLAMA_ORIGINS=*

  litellm:
    image: litellm/litellm:latest
    container_name: litellm-proxy
    ports:
      - "8000:8000"
    volumes:
      - ./configs/litellm-config.yaml:/app/config.yaml
    command: "--config /app/config.yaml --host 0.0.0.0 --port 8000"
    depends_on:
      - ollama
    restart: unless-stopped
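
To bring the stack up (a sketch; assumes Docker Compose v2 and that the model still needs to be pulled inside the container):

```bash
# Start both containers in the background
docker compose up -d

# Pull the model inside the Ollama container (first run only)
docker compose exec ollama ollama pull qwen3-coder

# Follow the proxy logs
docker compose logs -f litellm
```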

deployment/backup_config.sh

Lines changed: 16 additions & 0 deletions

#!/bin/bash
# Backup configuration and models

BACKUP_DIR="/home/ollama/backups"
DATE=$(date +%Y%m%d_%H%M%S)

echo "Creating backup..."
mkdir -p "$BACKUP_DIR"

# Backup model list
ollama list > "$BACKUP_DIR/model_list_$DATE.txt"

# Backup configuration files
cp -r /root/.ollama/* "$BACKUP_DIR/" 2>/dev/null || true

echo "Backup completed: $BACKUP_DIR/model_list_$DATE.txt"
