# gen-idea-lab

Role: OpenAI-compatible API Gateway & Fusion Orchestrator
Status: 🟢 Production Ready (Phase 4 Complete)
## Overview

The Tier-2 orchestrator is an OpenAI-compatible API gateway that coordinates three local-first AI services:
- Tier-3A (MLX): Local LLM inference and embeddings
- Tier-3B (RAG): Vector database and semantic retrieval
- Tier-3C (Smart Campus): Room and entity context for campus-aware prompts
This enables a drop-in replacement for OpenAI API calls with local-first, privacy-preserving inference, enhanced with RAG and spatial context.
## Architecture

```
┌─────────────────────────────────────────────────────────────┐
│ Client Applications │
│ (Browser, CLI, MCP Tools) │
└─────────────────────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────┐
│ Tier 2: gen-idea-lab │
│ (Orchestrator & API Gateway) │
│ │
│ ┌─────────────────┐ ┌──────────────┐ ┌───────────────┐ │
│ │ OpenAI Gateway │ │ Fusion │ │ MCP Server │ │
│ │ /v1/chat │ │ Pipeline │ │ Integration │ │
│ │ /v1/embeddings │ │ /fusion/* │ │ /mcp/* │ │
│ │ /v1/models │ │ │ │ │ │
│ └────────┬────────┘ └──────┬───────┘ └───────┬───────┘ │
│ │ │ │ │
│ └──────────────────┼───────────────────┘ │
│ │ │
└──────────────────────────────┼───────────────────────────────┘
│
┌───────────────────┴───────────────────┬───────────┐
│ │ │ │
▼ ▼ ▼ ▼
┌──────────────────┐ ┌──────────────┐ ┌──────────────┐ ┌────────────┐
│ Tier 3A: MLX │ │ Tier 3B: RAG │ │ Tier 3C: │ │ Cloud │
│ (LLM) │ │ (Vector DB) │ │ Smart Campus │ │ Fallback │
│ │ │ │ │ (Context) │ │ │
│ Port: 8000 │ │ Port: 5100 │ │ Port: 5200 │ │ (Optional)│
└──────────────────┘ └──────────────┘ └──────────────┘ └────────────┘
```
## Key Features

### OpenAI Gateway

- Drop-in replacement for the OpenAI API
- Endpoints: `/v1/chat/completions`, `/v1/embeddings`, `/v1/models`
- Full request/response format compatibility
- Extended with HTDI metadata for enhanced observability
### Fusion Pipeline

- RAG + MLX + Smart Campus in a single pipeline
- Intelligent context enrichment from multiple sources
- Multi-step reasoning support
- Graceful degradation when services are unavailable
### Smart Campus Integration

- Room and entity context for campus-aware prompts
- Real-time sensor data integration
- IoT device state queries
- Automatic context formatting for LLM prompts
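The context-formatting step can be pictured as a pure function that turns room data into a prompt preamble. A minimal sketch, assuming the entity shapes shown in the Smart Campus responses elsewhere in this document (the function name and exact output format are hypothetical; the real logic lives in `smartCampusProvider.formatRoomContext`):

```javascript
// Hypothetical sketch: turn Smart Campus room data into an LLM prompt preamble.
function formatRoomContext(room, includeDetails = false) {
  const lines = [`Room: ${room.name} (id: ${room.id})`];
  for (const entity of room.entities ?? []) {
    let line = `- ${entity.id}: ${entity.state}`;
    if (includeDetails && entity.attributes) {
      line += ` ${JSON.stringify(entity.attributes)}`;
    }
    lines.push(line);
  }
  return lines.join("\n");
}

const preamble = formatRoomContext({
  id: "peace",
  name: "Peace Room",
  entities: [
    { id: "light.peace", state: "on", attributes: { brightness: 80 } },
    { id: "sensor.temperature", state: "22.5" },
  ],
});
console.log(preamble);
```

The resulting string would be prepended to the user's messages before the MLX call.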
### Observability

- End-to-end request ID propagation
- Per-step latency tracking
- Comprehensive error normalization
- Structured logging throughout
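Per-step latency tracking of the kind surfaced in the fusion `timings` object can be sketched as a small helper that wraps each pipeline step. This is a hedged sketch; the helper names are hypothetical:

```javascript
// Hypothetical sketch: collect per-step latencies like the fusion "timings" object.
function createTimer() {
  const timings = {};
  return {
    timings,
    // Wrap an async step, recording its duration under `<name>Ms`.
    async step(name, fn) {
      const start = Date.now();
      try {
        return await fn();
      } finally {
        timings[`${name}Ms`] = Date.now() - start;
      }
    },
  };
}

async function demo() {
  const timer = createTimer();
  await timer.step("healthCheck", async () => "ok");
  await timer.step("ragQuery", async () => []);
  timer.timings.totalMs = Object.values(timer.timings).reduce((a, b) => a + b, 0);
  return timer.timings;
}
```

Each pipeline stage (health check, RAG query, Smart Campus query, MLX generation) would run through `step`, yielding the per-stage breakdown shown in the fusion responses.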
### Health Gating

- Pre-flight health checks before expensive operations
- Automatic fallback to available services
- Clear error messages when services are down
## Quick Start

```bash
# Terminal 1: MLX Server (Tier 3A)
cd ../mlx-openai-server-lab
python server.py --port 8000
# Terminal 2: RAG Engine (Tier 3B)
cd ../mlx-rag-lab
python app.py --port 5100
# Terminal 3: Smart Campus (Tier 3C) - Optional
cd ../smart-campus-service
npm start # or python server.py --port 5200
# Terminal 4: Verify health
curl http://localhost:8000/health # MLX
curl http://localhost:5100/health # RAG
curl http://localhost:5200/health # Smart Campus
```

```bash
# Copy example env file
cp .env.example .env
# Edit .env and set:
MLX_URL=http://localhost:8000
RAG_URL=http://localhost:5100
SMART_CAMPUS_URL=http://localhost:5200
# Enable local-first mode
PREFER_LOCAL_LLM=true
PREFER_LOCAL_EMBEDDINGS=true
PREFER_LOCAL_RAG=true
```

```bash
npm run dev:server
# Orchestrator runs on http://localhost:8081
```

```bash
# Check orchestrator health
curl http://localhost:8081/internal/health
# Check Fusion health (includes all Tier-3 services)
curl http://localhost:8081/api/fusion/health
# Test OpenAI-compatible endpoint
curl -X POST http://localhost:8081/api/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "mlx-llama-3.1-8b",
"messages": [{"role": "user", "content": "Hello!"}]
  }'
```

## API Reference

### POST /api/v1/chat/completions

OpenAI-compatible chat completions with HTDI extensions.
Request:

```json
{
"model": "mlx-llama-3.1-8b",
"messages": [
{ "role": "user", "content": "What's the temperature in the Peace room?" }
],
"temperature": 0.7,
"max_tokens": 1024,
"htdi": {
"use_rag": true,
"rag_collection": "default",
"room_id": "peace",
"include_entities": true
}
}
```

Response:

```json
{
"id": "chatcmpl-abc-123",
"object": "chat.completion",
"model": "mlx-llama-3.1-8b",
"choices": [{
"index": 0,
"message": {
"role": "assistant",
"content": "The temperature in the Peace room is 22.5°C..."
},
"finish_reason": "stop",
"htdi": {
"request_id": "abc-123",
"latency_ms_total": 450,
"rag_context": { "collection": "default", "count": 3 },
"entities": [
{ "id": "sensor.temperature", "state": "22.5" }
]
}
}],
"usage": { "total_tokens": 175 }
}
```
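A client can attach the `htdi` extension block to an otherwise standard chat-completions payload. A hedged sketch of building such a request body (the `htdi` fields mirror the example above; the builder function itself is hypothetical):

```javascript
// Hypothetical helper: build an OpenAI-style chat request with HTDI extensions.
function buildChatRequest(
  userContent,
  { useRag = true, ragCollection = "default", roomId = null, includeEntities = false } = {}
) {
  const body = {
    model: "mlx-llama-3.1-8b",
    messages: [{ role: "user", content: userContent }],
    temperature: 0.7,
    max_tokens: 1024,
    htdi: {
      use_rag: useRag,
      rag_collection: ragCollection,
      include_entities: includeEntities,
    },
  };
  if (roomId) body.htdi.room_id = roomId; // only set when a room is in scope
  return body;
}

const req = buildChatRequest("What's the temperature in the Peace room?", {
  roomId: "peace",
  includeEntities: true,
});
// The body would then be POSTed to /api/v1/chat/completions.
```

Clients that do not send an `htdi` block still get plain OpenAI-compatible behavior.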
### POST /api/v1/embeddings

Generate embeddings via MLX.

Request:

```json
{
"model": "mlx-embeddings-default",
"input": ["Document 1", "Document 2"]
}
```

Response:

```json
{
"object": "list",
"data": [
{ "object": "embedding", "embedding": [0.123, ...], "index": 0 },
{ "object": "embedding", "embedding": [0.456, ...], "index": 1 }
],
"model": "mlx-embeddings-default",
"usage": { "total_tokens": 15 }
}
```
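Embedding vectors returned by this endpoint are typically compared with cosine similarity. A self-contained sketch (the vectors here are toy values, not real model output):

```javascript
// Cosine similarity between two embedding vectors.
function cosineSimilarity(a, b) {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Toy vectors standing in for the "embedding" arrays in the response above.
const same = cosineSimilarity([0.1, 0.2, 0.3], [0.1, 0.2, 0.3]); // ≈ 1 (same direction)
const orthogonal = cosineSimilarity([1, 0], [0, 1]); // → 0
```

This is the same scoring the RAG tier applies internally when ranking documents against a query embedding.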
### GET /api/v1/models

List available models from MLX.

Response:

```json
{
"object": "list",
"data": [
{
"id": "mlx-llama-3.1-8b",
"object": "model",
"family": "llama",
"description": "Llama 3.1 8B Instruct",
"context_length": 8192,
"tags": ["chat", "instruct"]
}
]
}
```

### POST /api/fusion/answer

RAG-enhanced LLM response with optional Smart Campus context.
Request:

```json
{
"question": "What rooms are available?",
"collection": "campus-docs",
"topK": 5,
"minScore": 0.5,
"roomId": "main-hall",
"includeEntities": true
}
```

Response:

```json
{
"ok": true,
"answer": "Based on the room data...",
"mode": "fusion",
"ragResults": [...],
"smartCampus": {
"roomId": "main-hall",
"entities": [...]
},
"timings": {
"healthCheckMs": 25,
"ragQueryMs": 100,
"smartCampusQueryMs": 45,
"mlxGenerationMs": 300,
"totalMs": 470
}
}
```

### Multi-Step Reasoning

Iterative reasoning with repeated RAG queries and MLX calls.
Request:

```json
{
"question": "Complex question requiring multiple steps",
"collection": "default",
"maxSteps": 3
}
```

Response:

```json
{
"ok": true,
"answer": "Final synthesized answer",
"steps": [
{ "step": 1, "question": "...", "mlxResponse": "..." },
{ "step": 2, "question": "...", "mlxResponse": "..." }
],
"totalSteps": 2,
"latencyMs": 1200
}
```
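The multi-step loop above can be sketched as: query RAG, ask MLX, and stop when the model signals it is done or `maxSteps` is reached. A hypothetical sketch with the service calls stubbed out as injected functions (the stop/follow-up protocol shown here is an assumption, not the documented wire format):

```javascript
// Hypothetical sketch of the multi-step reasoning loop, with RAG and MLX stubbed.
async function multiStepReasoning(question, { maxSteps = 3, queryRag, askMlx }) {
  const steps = [];
  let current = question;
  for (let i = 1; i <= maxSteps; i++) {
    const context = await queryRag(current);           // retrieve supporting docs
    const mlxResponse = await askMlx(current, context); // generate with context
    steps.push({ step: i, question: current, mlxResponse: mlxResponse.text });
    if (mlxResponse.done) break;                        // model signals completion
    current = mlxResponse.followUp;                     // next sub-question
  }
  return {
    answer: steps[steps.length - 1].mlxResponse,
    steps,
    totalSteps: steps.length,
  };
}
```

With stubs that answer on the second step, this yields a two-entry `steps` array like the response example above.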
### GET /api/smartcampus/rooms/:roomId

Query room information and entities.

Parameters:

- `roomId` (URL) - Room identifier
- `include_entities` (query) - Include entity states (default: true)
- `include_sensors` (query) - Include sensor data (default: false)
Response:

```json
{
"ok": true,
"room": {
"id": "peace",
"name": "Peace Room",
"entities": [
{ "id": "light.peace", "state": "on", "attributes": { "brightness": 80 } },
{ "id": "sensor.temperature", "state": "22.5" }
]
},
"latencyMs": 45
}
```

### Entity Query

Query a specific entity's state.
Response:

```json
{
"ok": true,
"entity": {
"id": "light.peace",
"name": "Peace Room Light",
"state": "on",
"attributes": { "brightness": 80 }
},
"latencyMs": 25
}
```

### Batch Entity Query

Query multiple entities at once.
Request:

```json
{
"entity_ids": ["light.peace", "sensor.temperature"]
}
```

Response:

```json
{
"ok": true,
"entities": [
{ "id": "light.peace", "state": "on" },
{ "id": "sensor.temperature", "state": "22.5" }
],
"latencyMs": 60
}
```

## Provider Architecture

All Tier-3 communication goes through provider classes:
```
server/providers/
├── mlx.js → Tier-3A (LLM & embeddings)
├── rag.js → Tier-3B (vector DB)
├── smartcampus.js → Tier-3C (room & entity context)
└── fusion.js → Orchestrates all providers
```
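The providers share a common shape: a base URL, a health check, and request-ID propagation on every call. A hypothetical minimal base class (the real classes differ in detail; the `X-Request-Id` header name and injectable `fetchImpl` are assumptions for illustration):

```javascript
// Hypothetical minimal provider: base URL + request-ID propagation.
// `fetchImpl` is injectable so the class can be exercised without a live service.
class BaseProvider {
  constructor(baseUrl, fetchImpl = fetch) {
    this.baseUrl = baseUrl;
    this.fetch = fetchImpl;
  }

  async request(path, { requestId, ...init } = {}) {
    const res = await this.fetch(`${this.baseUrl}${path}`, {
      ...init,
      headers: { "X-Request-Id": requestId ?? "", ...(init.headers ?? {}) },
    });
    return res.json();
  }

  async healthCheck({ requestId } = {}) {
    try {
      return await this.request("/health", { requestId });
    } catch {
      return { ok: false }; // unreachable services degrade instead of throwing
    }
  }
}
```

Catching transport errors in `healthCheck` is what lets the Fusion provider treat a down service as "unhealthy" rather than failing the whole request.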
Key Methods:

```js
// MLX Provider
mlxProvider.healthCheck({ requestId })
mlxProvider.chat(messages, model, { temperature, maxTokens, requestId })
mlxProvider.embed(textList, model, { requestId })
mlxProvider.listModels({ requestId })
// RAG Provider
ragProvider.healthCheck({ requestId })
ragProvider.query(query, collection, k, minScore, { requestId })
ragProvider.upsert(documents, collection, { requestId })
ragProvider.delete(ids, collection, { requestId })
// Smart Campus Provider
smartCampusProvider.healthCheck({ requestId })
smartCampusProvider.queryRoom(roomId, { includeEntities, requestId })
smartCampusProvider.queryEntity(entityId, { requestId })
smartCampusProvider.formatRoomContext(roomData, includeDetails)
// Fusion Provider
fusionProvider.healthCheck({ requestId, includeSmartCampus })
fusionProvider.answer(question, { collection, roomId, includeEntities, ... })
fusionProvider.multiStepReasoning(question, { maxSteps, ... })
```

## Graceful Degradation

The Fusion provider implements graceful degradation:
**Full fusion**

- Health: ✅ MLX + ✅ RAG + ✅ Smart Campus
- Flow: Smart Campus → RAG → MLX with full context
- Quality: Highest (grounded + contextually aware)

**RAG + MLX**

- Health: ✅ MLX + ✅ RAG + ❌ Smart Campus
- Flow: RAG → MLX (without campus context)
- Quality: High (grounded in knowledge base)

**MLX only**

- Health: ✅ MLX + ❌ RAG + ❌ Smart Campus
- Flow: Direct MLX generation
- Quality: Good (general knowledge)

**RAG only**

- Health: ❌ MLX + ✅ RAG + any Smart Campus
- Flow: Return formatted RAG results
- Quality: Context only (no rewriting)

**All services down**

- Health: ❌ MLX + ❌ RAG + ❌ Smart Campus
- Response: Error with service status
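The degradation ladder above can be sketched as a pure function over service health flags. Only `fusion`, `mlx_only`, and `rag_only` appear as mode strings elsewhere in this document; the other return values here are hypothetical labels:

```javascript
// Sketch of fusion mode selection from Tier-3 health flags.
function selectMode({ mlx, rag, smartCampus }) {
  if (mlx && rag && smartCampus) return "fusion";  // full pipeline
  if (mlx && rag) return "rag_mlx";                // no campus context (name hypothetical)
  if (mlx) return "mlx_only";                      // direct generation
  if (rag) return "rag_only";                      // formatted RAG results only
  return "error";                                  // nothing available
}
```

Because the function checks MLX first, the orchestrator always prefers the highest-quality path the current health state allows.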
## Configuration

```bash
# Tier-3 Service URLs
MLX_URL=http://localhost:8000 # Tier 3A (LLM)
RAG_URL=http://localhost:5100 # Tier 3B (RAG)
SMART_CAMPUS_URL=http://localhost:5200 # Tier 3C (Smart Campus)
# Local-First Mode
PREFER_LOCAL_LLM=true
PREFER_LOCAL_EMBEDDINGS=true
PREFER_LOCAL_RAG=true
FALLBACK_TO_CLOUD=false # Fail fast (recommended for Phase 4)

# Authentication (if not using bypass)
AUTH_SECRET=your_jwt_secret
ENCRYPTION_KEY=your_encryption_key
# Cloud API Keys (fallback only)
OPENAI_API_KEY=sk-...
GEMINI_API_KEY=...
```

## Testing

```bash
npm test

# Phase-4 integration tests
npm test tests/phase4-integration.test.js
# Smart Campus integration tests
npm test tests/smartcampus-integration.test.js
# Registry tests (critical for Phase 4)
npm test tests/registry.routes.test.js
```

### Manual Checks

```bash
# Health checks
curl http://localhost:8081/internal/health
curl http://localhost:8081/api/fusion/health
curl http://localhost:8081/api/smartcampus/health
# OpenAI Gateway
curl -X POST http://localhost:8081/api/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{"model": "mlx-llama-3.1-8b", "messages": [{"role": "user", "content": "Test"}]}'
# Fusion with Smart Campus
curl -X POST http://localhost:8081/api/fusion/answer \
-H "Content-Type: application/json" \
-d '{
"question": "What is the temperature?",
"roomId": "peace",
"includeEntities": true
}'
# Smart Campus direct
curl "http://localhost:8081/api/smartcampus/rooms/peace?include_entities=true"
```

## Performance

| Operation | P50 | P95 | P99 |
|---|---|---|---|
| OpenAI Gateway (MLX only) | 250ms | 400ms | 600ms |
| Fusion (RAG + MLX) | 350ms | 600ms | 900ms |
| Fusion (Full with Smart Campus) | 400ms | 700ms | 1000ms |
| Smart Campus room query | 30ms | 60ms | 100ms |
Throughput:

- OpenAI Gateway: limited by MLX (~10-20 req/s)
- Fusion Pipeline: ~5-10 req/s (multi-step operations)
- Smart Campus: 100+ req/s (lightweight context queries)
## Troubleshooting

**Symptom:** `SERVER_UNREACHABLE` errors

**Fix:**
```bash
# Check which services are down
curl http://localhost:8000/health # MLX
curl http://localhost:5100/health # RAG
curl http://localhost:5200/health # Smart Campus
# Start missing services (see Quick Start section)
```

**Symptom:** `mode: "mlx_only"` or `mode: "rag_only"` in fusion responses

**Debug:**

```bash
# Check fusion health
curl http://localhost:8081/api/fusion/health | jq
# Look for services reporting ok: false
```

**Symptom:** Slow responses

**Debug:**

```bash
# Use fusion debug for timing breakdown
curl -X POST http://localhost:8081/api/fusion/answer \
-H "Content-Type: application/json" \
-d '{"question": "test", "roomId": "peace"}' | jq '.timings'
# Look for slow operations:
# - healthCheckMs (should be < 50ms)
# - ragQueryMs (should be < 150ms)
# - smartCampusQueryMs (should be < 100ms)
# - mlxGenerationMs (depends on model, typically 250-600ms)
```

## Documentation

- `docs/PHASE4_SERVICE_TOPOLOGY.md` - Overall architecture (700+ lines)
- `docs/PHASE4_TIER3A_MLX_CONTRACT.md` - MLX server API contract
- `docs/PHASE4_TIER3B_RAG_CONTRACT.md` - RAG engine API contract
- `docs/PHASE4_SMART_CAMPUS_INTEGRATION.md` - Smart Campus integration guide (NEW)
- `docs/PHASE4_PROVIDER_CONTRACTS.md` - Tier-2 provider reference (950 lines)
- `docs/PHASE4_COMPLETION_REPORT.md` - Implementation summary
- `PHASE4_HANDOFF.md` - Quick handoff guide
## Summary

Tier-2 Orchestrator Status: 🟢 PRODUCTION READY
Capabilities:
- ✅ OpenAI API compatibility
- ✅ RAG-enhanced responses
- ✅ Smart Campus context integration
- ✅ Multi-step reasoning
- ✅ Graceful degradation
- ✅ End-to-end tracing
- ✅ Health gating
- ✅ 100% local-first (with cloud fallback option)
Integrations:
- ✅ Tier-3A (MLX) - LLM inference & embeddings
- ✅ Tier-3B (RAG) - Semantic search & vector DB
- ✅ Tier-3C (Smart Campus) - Room & entity context
Next Steps:
- Start all Tier-3 services
- Configure environment variables
- Run integration tests
- Deploy orchestrator
- Monitor `/internal/health` and logs
---

Version: 1.0 · Date: 2025-11-20 · Maintainer: Phase-4 Team