Created: February 2, 2026
Author: Claude Opus 4.5 + Jeremy
Status: Ready for Deployment
Estimated Time: 30-45 minutes
This guide deploys a drop-in replacement containing:
- Razer AIKit - vLLM-powered local inference with 280K+ models
- Visionary Tool Server - 312+ MCP tools with new AIKit bridge (22 tools)
- ngrok - Remote access tunnel (same domain as before)
Your existing setup stays on ice as backup. Same port 8082, same ngrok domain.
| Before | After |
|---|---|
| 13 LLM API providers | 1 unified AIKit endpoint |
| $80-130/mo potential API costs | ~$25-30/mo (Claude API only) |
| 67 local models (Ollama + LMStudio) | 280,000+ HuggingFace models |
| No fine-tuning capability | Full LoRA/QLoRA/DPO fine-tuning |
| Standard inference | vLLM optimized (2-3x faster) |
- All 312 existing MCP tools work identically
- Your workflow (Claude/ChatGPT/Gemini → Tool Server) unchanged
- Same port 8082 - drop-in replacement
- Same ngrok domain - no client config changes
- ngrok tunnel for mobile/remote access
- Docker Desktop "click to run" simplicity
Before starting, verify these are installed/configured:
- Docker Desktop running with WSL2 backend
- NVIDIA Container Toolkit in WSL2 (for GPU passthrough)
- ngrok account with custom domain capability
- HuggingFace account (free, for model access)
# In PowerShell
docker run --rm --gpus all nvidia/cuda:12.0-base nvidia-smi

Should show your RTX 4090. If not, see the Troubleshooting section.
wsl --list --verbose
# Should show at least one distro with VERSION 2

After deployment, your project will look like:
D:\DEV_PROJECTS\GitHub\Claude_Opus_ChatGPT_App_Project\
├── main.py # Entry point (unchanged)
├── docker-compose.yml # Original Sandbox A
├── docker-compose.aikit.yml # NEW: Sandbox B with AIKit
├── Dockerfile # Original
├── Dockerfile.aikit # NEW: AIKit-optimized
├── .env.master # API keys (unchanged)
├── RAZER_AIKIT_DEPLOYMENT.md # This guide
├── app/
│ ├── server.py
│ ├── config.py
│ ├── utils.py
│ └── tools/
│ ├── github.py # Existing (35 tools)
│ ├── discord.py # Existing
│ ├── razer_aikit.py # NEW: 22 AIKit tools
│ └── ... (60+ more modules)
└── D:\Visionary_Models\
├── aikit/ # NEW: AIKit model storage
└── aikit-cache/ # NEW: HuggingFace cache
# Create AIKit storage directories
New-Item -ItemType Directory -Force -Path "D:\Visionary_Models\aikit"
New-Item -ItemType Directory -Force -Path "D:\Visionary_Models\aikit-cache"

Copy these files to your project root (D:\DEV_PROJECTS\GitHub\Claude_Opus_ChatGPT_App_Project\):
- `docker-compose.aikit.yml` → project root
- `Dockerfile.aikit` → project root
- `app/tools/razer_aikit.py` → `app/tools/` directory
Edit app/server.py to import the new module. Add this line with the other tool imports:
# In app/server.py, add with other imports:
from app.tools import razer_aikit

Or, if using a dynamic import pattern, add it to the tools list:
# If you have a TOOL_MODULES list:
TOOL_MODULES = [
# ... existing modules ...
"razer_aikit", # ADD THIS LINE
]

If you want a separate ngrok domain for the new sandbox, update docker-compose.aikit.yml:
# In ngrok service, change domain:
command: >
http tool-server:8083
--domain=visionary-aikit-sandbox.ngrok.io # Your custom domain

Or use your existing domain by changing the port mapping.
Add to your .env.master:
# Add this line (get token from https://huggingface.co/settings/tokens)
HUGGINGFACE_API_KEY=hf_xxxxxxxxxxxxxxxxxxxxxxxxxxxx

# Navigate to project
cd D:\DEV_PROJECTS\GitHub\Claude_Opus_ChatGPT_App_Project
# Build the containers (first time takes ~5-10 minutes)
docker compose -f docker-compose.aikit.yml build
# Start the stack
docker compose -f docker-compose.aikit.yml up -d
# Check status
docker compose -f docker-compose.aikit.yml ps
# View logs
docker compose -f docker-compose.aikit.yml logs -f

# Check AIKit health
curl http://localhost:8000/health
# Check Tool Server health
curl http://localhost:8082/health
# Should return: {"status":"healthy","tools":334}
# (312 original + 22 new AIKit tools)

Via Tool Server MCP or direct API call:
# Test chat completion
curl -X POST http://localhost:8000/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "deepseek-ai/deepseek-coder-1.3b-instruct",
"messages": [{"role": "user", "content": "Write a Python hello world"}],
"max_tokens": 100
}'

For mobile/remote access:
# Start with tunnel profile
docker compose -f docker-compose.aikit.yml --profile tunnel up -d ngrok
# Check tunnel
curl http://localhost:4040/api/tunnels

Option A: Docker Desktop UI
- Open Docker Desktop
- Find "visionary-aikit-stack"
- Click ▶️ Run
Option B: Command Line
docker compose -f docker-compose.aikit.yml up -d

Option A: Docker Desktop UI
- Find "visionary-aikit-stack"
- Click ⏹️ Stop
Option B: Command Line
docker compose -f docker-compose.aikit.yml down

# All services
docker compose -f docker-compose.aikit.yml logs -f
# Specific service
docker compose -f docker-compose.aikit.yml logs -f aikit
docker compose -f docker-compose.aikit.yml logs -f tool-server

The default model is deepseek-ai/deepseek-coder-1.3b-instruct (fast, small).
To use a different model, either:
- Per-request: specify the `model` parameter in API calls
- Default change: edit the `command` section in `docker-compose.aikit.yml`
Popular models to try:
- `Qwen/Qwen2.5-7B-Instruct` - Great all-rounder
- `microsoft/phi-4` - Strong reasoning
- `Qwen/Qwen2.5-Coder-32B-Instruct` - Best coding quality
- `deepseek-ai/DeepSeek-R1-Distill-Qwen-7B` - Chain-of-thought
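Per-request model selection is just a different `model` field in the request body. A minimal sketch, assuming the OpenAI-compatible API on localhost:8000; the task-to-model mapping below is illustrative, built from the suggestions above:

```python
import json
import urllib.request

# Illustrative task -> model routing, using the models suggested above.
TASK_MODELS = {
    "general": "Qwen/Qwen2.5-7B-Instruct",
    "reasoning": "microsoft/phi-4",
    "coding": "Qwen/Qwen2.5-Coder-32B-Instruct",
}

def chat_payload(task: str, prompt: str) -> dict:
    """Build an OpenAI-style request body, choosing the model per request."""
    model = TASK_MODELS.get(task, "deepseek-ai/deepseek-coder-1.3b-instruct")
    return {"model": model, "messages": [{"role": "user", "content": prompt}]}

def post_json(url: str, payload: dict) -> dict:
    req = urllib.request.Request(
        url, data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

if __name__ == "__main__":
    # See which models the server currently exposes, then route a request.
    with urllib.request.urlopen("http://localhost:8000/v1/models") as resp:
        print([m["id"] for m in json.load(resp)["data"]])
    out = post_json("http://localhost:8000/v1/chat/completions",
                    chat_payload("coding", "Write a binary search in Python"))
    print(out["choices"][0]["message"]["content"])
```

Note that a model named in a request must already be served (or pulled via `aikit_pull_model`) before the call will succeed.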
NOTHING CHANGES - same endpoints as before!
Use MCP connector URL:
http://localhost:8082/sse # Local
https://visionary-tool-server.ngrok.io/sse # Remote
Same MCP endpoint:
https://visionary-tool-server.ngrok.io/sse
Settings → Agent Skills → Add MCP Server:
Name: Visionary Tool Server
URL: http://localhost:8082/sse
Already configured? It just works. No changes needed.
import httpx
# Chat with local LLM
response = httpx.post(
"http://localhost:8000/v1/chat/completions",
json={
"model": "Qwen/Qwen2.5-7B-Instruct",
"messages": [{"role": "user", "content": "Hello!"}],
}
)
print(response.json())

| Tool | Description |
|---|---|
| `aikit_chat` | Chat completion (OpenAI-compatible) |
| `aikit_complete` | Text completion |
| `aikit_embed` | Generate embeddings |
| `aikit_code_assist` | Specialized code assistance |
| Tool | Description |
|---|---|
| `aikit_list_models` | List available models |
| `aikit_pull_model` | Download from HuggingFace |
| `aikit_model_info` | Get model metadata |
| `aikit_load_model` | Load model into memory |
| `aikit_unload_model` | Remove model from memory |
| Tool | Description |
|---|---|
| `aikit_finetune_start` | Start LoRA/QLoRA training |
| `aikit_finetune_status` | Check training progress |
| `aikit_finetune_stop` | Cancel training |
| `aikit_finetune_list` | List all jobs |
| `aikit_merge_adapter` | Merge adapter with base |
| Tool | Description |
|---|---|
| `aikit_health` | Server health check |
| `aikit_gpu_status` | GPU metrics |
| `aikit_cluster_status` | Ray cluster info |
| `aikit_benchmark` | Performance testing |
| Tool | Description |
|---|---|
| `aikit_quick_chat` | Simple one-turn chat |
| `aikit_recommend_model` | Get model suggestions |
Total: 22 new tools
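Many of these tools are thin wrappers over the OpenAI-compatible HTTP API. As an illustration only (not the tool's actual implementation), a call in the spirit of `aikit_embed` could be sketched as follows; the endpoint path follows the OpenAI API shape and the model name is an assumption:

```python
import json
import urllib.request

def embed_payload(texts: list[str], model: str) -> dict:
    """OpenAI-style /v1/embeddings request body."""
    return {"model": model, "input": texts}

if __name__ == "__main__":
    # Model name is illustrative; use an embedding model your instance serves.
    body = json.dumps(
        embed_payload(["hello world"], "BAAI/bge-small-en-v1.5")).encode()
    req = urllib.request.Request(
        "http://localhost:8000/v1/embeddings", data=body,
        headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        vectors = [d["embedding"] for d in json.load(resp)["data"]]
    print(len(vectors), len(vectors[0]))  # count and dimensionality
```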
# Reinstall NVIDIA Container Toolkit in WSL2
wsl -d Ubuntu
curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg
curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | \
sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
sudo apt-get update
sudo apt-get install -y nvidia-container-toolkit
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker

# Check logs
docker compose -f docker-compose.aikit.yml logs aikit
# Common issues:
# - Out of GPU memory: Use smaller model
# - Model download failed: Check HuggingFace token
# - Port conflict: Change ports in docker-compose

# Find what's using the port
netstat -ano | findstr :8000
netstat -ano | findstr :8082
# Kill the process
taskkill /PID <pid> /F
# Or change ports in docker-compose.aikit.yml

Your RTX 4090 has 24GB VRAM. Max model sizes:
- 7B models: ~14GB VRAM (comfortable)
- 13B models: ~22GB VRAM (tight)
- 32B+ models: Requires quantization
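The arithmetic behind these limits: fp16 weights take roughly 2 bytes per parameter, so weights alone for a 7B model need about 14 GB, before the KV cache and activations claim more. A back-of-envelope helper (a sketch, not vLLM's actual memory accounting):

```python
def weights_gb(params_billion: float, bytes_per_param: float = 2.0) -> float:
    """Memory for model weights alone; KV cache/activations add several GB."""
    return params_billion * bytes_per_param

# fp16: 7B fits a 24 GB card, 32B does not.
print(weights_gb(7))        # 14.0
print(weights_gb(32))       # 64.0
# ~4-bit AWQ (~0.5 bytes/param) brings 32B back within reach.
print(weights_gb(32, 0.5))  # 16.0
```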
Use quantized versions:
# In docker-compose, change model to quantized version:
aikit run Qwen/Qwen2.5-32B-Instruct-AWQ --quantization awq

# Check if AIKit is running
docker compose -f docker-compose.aikit.yml ps
# Check network
docker network inspect aikit-network
# Verify internal DNS
docker compose -f docker-compose.aikit.yml exec tool-server curl http://aikit:8000/health

| Model Size | Tokens/sec | First Token Latency |
|---|---|---|
| 1-3B | 150-200 | <100ms |
| 7B | 80-120 | 200-500ms |
| 13B | 40-60 | 500-1000ms |
| 32B (quantized) | 20-40 | 1-2s |
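Numbers like these can be sanity-checked with a quick timing script. A sketch, assuming the OpenAI-compatible endpoint on localhost:8000 and the default model; results vary with prompt length and load:

```python
import json
import time
import urllib.request

def tokens_per_sec(completion_tokens: int, elapsed_s: float) -> float:
    """Whole-request throughput, first-token latency included."""
    return completion_tokens / elapsed_s

if __name__ == "__main__":
    body = json.dumps({
        "model": "deepseek-ai/deepseek-coder-1.3b-instruct",
        "messages": [{"role": "user", "content": "Count from 1 to 100."}],
        "max_tokens": 256,
    }).encode()
    req = urllib.request.Request(
        "http://localhost:8000/v1/chat/completions", data=body,
        headers={"Content-Type": "application/json"})
    t0 = time.perf_counter()
    with urllib.request.urlopen(req) as resp:
        usage = json.load(resp)["usage"]
    elapsed = time.perf_counter() - t0
    print(f"{tokens_per_sec(usage['completion_tokens'], elapsed):.0f} tok/s")
```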
| Component | RAM | VRAM |
|---|---|---|
| AIKit (7B model) | ~4GB | ~14GB |
| Tool Server | ~1GB | ~0 |
| Docker overhead | ~2GB | ~0 |
| Total | ~7GB | ~14GB |
If anything goes wrong, your original setup is untouched:
# Stop new stack
docker compose -f docker-compose.aikit.yml down
# Use original stack
docker compose up -d
# Or run directly
python main.py

Your Sandbox A on port 8082 works exactly as before.
After deployment, verify these work:
- `curl http://localhost:8000/health` returns healthy
- `curl http://localhost:8083/health` shows 334 tools
- AIKit chat completion works with test prompt
- Tool Server can call `aikit_chat` tool
- Existing GitHub tools still work
- ngrok tunnel accessible (if enabled)
- Claude.ai can connect via MCP
- GPU visible in container (`nvidia-smi`)
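The first few checks script easily. A minimal smoke test, using the ports from this guide and assuming the health bodies carry a `status` field as in the example response earlier:

```python
import json
import urllib.request

# Endpoints as used in this guide; adjust if you changed the port mappings.
CHECKS = {
    "AIKit health": "http://localhost:8000/health",
    "Tool Server health": "http://localhost:8082/health",
}

def fetch_json(url: str, timeout: float = 5.0) -> dict:
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)

def run_checks(fetch=fetch_json) -> dict:
    """Return pass/fail per endpoint; `fetch` is injectable for testing."""
    results = {}
    for name, url in CHECKS.items():
        try:
            results[name] = fetch(url).get("status") == "healthy"
        except Exception:  # connection refused, timeout, bad JSON, ...
            results[name] = False
    return results

if __name__ == "__main__":
    for name, ok in run_checks().items():
        print("PASS" if ok else "FAIL", name)
```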
- Razer AIKit GitHub
- Razer AIKit Docs
- vLLM Documentation
- LlamaFactory (Fine-tuning)
- HuggingFace Models
- Ray Dashboard (when running)
Once verified, you have:
- 334 MCP tools (312 original + 22 AIKit)
- 280,000+ local LLM models via HuggingFace
- vLLM optimized inference (2-3x faster)
- Fine-tuning capability (LoRA, QLoRA, DPO)
- $50-100/month savings on API costs
- Complete data privacy - nothing leaves your machine
Your AI orchestration now runs through:
You → Claude/ChatGPT/Gemini → Visionary Tool Server → Razer AIKit → RTX 4090
Welcome to the future of local AI infrastructure! 🚀