
Tier-2 HTDI Orchestrator

Repository: gen-idea-lab
Role: OpenAI-compatible API Gateway & Fusion Orchestrator
Status: 🟢 Production Ready (Phase 4 Complete)


Overview

The Tier-2 orchestrator is an OpenAI-compatible API gateway that coordinates three local-first AI services:

  • Tier-3A (MLX): Local LLM inference and embeddings
  • Tier-3B (RAG): Vector database and semantic retrieval
  • Tier-3C (Smart Campus): Room and entity context for campus-aware prompts

Together, these services provide a drop-in replacement for OpenAI API calls, backed by local-first, privacy-preserving inference and enriched with RAG and spatial context.


Architecture

┌─────────────────────────────────────────────────────────────┐
│                      Client Applications                     │
│                  (Browser, CLI, MCP Tools)                   │
└─────────────────────────────────────────────────────────────┘
                              │
                              ▼
┌─────────────────────────────────────────────────────────────┐
│                   Tier 2: gen-idea-lab                       │
│              (Orchestrator & API Gateway)                    │
│                                                              │
│  ┌─────────────────┐  ┌──────────────┐  ┌───────────────┐  │
│  │ OpenAI Gateway  │  │   Fusion     │  │  MCP Server   │  │
│  │  /v1/chat       │  │  Pipeline    │  │  Integration  │  │
│  │  /v1/embeddings │  │  /fusion/*   │  │  /mcp/*       │  │
│  │  /v1/models     │  │              │  │               │  │
│  └────────┬────────┘  └──────┬───────┘  └───────┬───────┘  │
│           │                  │                   │           │
│           └──────────────────┼───────────────────┘           │
│                              │                               │
└──────────────────────────────┼───────────────────────────────┘
                               │
           ┌───────────────────┴───────────────────┬───────────┐
           │                   │                   │           │
           ▼                   ▼                   ▼           ▼
┌──────────────────┐  ┌──────────────┐  ┌──────────────┐  ┌────────────┐
│  Tier 3A: MLX    │  │ Tier 3B: RAG │  │ Tier 3C:     │  │  Cloud     │
│  (LLM)           │  │ (Vector DB)  │  │ Smart Campus │  │  Fallback  │
│                  │  │              │  │  (Context)   │  │            │
│  Port: 8000      │  │ Port: 5100   │  │ Port: 5200   │  │  (Optional)│
└──────────────────┘  └──────────────┘  └──────────────┘  └────────────┘

Key Features

✅ OpenAI API Compatibility

  • Drop-in replacement for OpenAI API
  • /v1/chat/completions, /v1/embeddings, /v1/models
  • Full request/response format compatibility
  • Extended with HTDI metadata for enhanced observability

✅ Fusion Orchestration

  • RAG + MLX + Smart Campus in a single pipeline
  • Intelligent context enrichment from multiple sources
  • Multi-step reasoning support
  • Graceful degradation when services are unavailable

✅ Smart Campus Integration (NEW)

  • Room and entity context for campus-aware prompts
  • Real-time sensor data integration
  • IoT device state queries
  • Automatic context formatting for LLM prompts

✅ Request Tracing

  • End-to-end request ID propagation
  • Per-step latency tracking
  • Comprehensive error normalization
  • Structured logging throughout

✅ Health Gating

  • Pre-flight health checks before expensive operations
  • Automatic fallback to available services
  • Clear error messages when services are down

Quick Start

1. Start Tier-3 Services

# Terminal 1: MLX Server (Tier 3A)
cd ../mlx-openai-server-lab
python server.py --port 8000

# Terminal 2: RAG Engine (Tier 3B)
cd ../mlx-rag-lab
python app.py --port 5100

# Terminal 3: Smart Campus (Tier 3C) - Optional
cd ../smart-campus-service
npm start # or python server.py --port 5200

# Terminal 4: Verify health
curl http://localhost:8000/health  # MLX
curl http://localhost:5100/health  # RAG
curl http://localhost:5200/health  # Smart Campus

2. Configure Environment

# Copy example env file
cp .env.example .env

# Edit .env and set:
MLX_URL=http://localhost:8000
RAG_URL=http://localhost:5100
SMART_CAMPUS_URL=http://localhost:5200

# Enable local-first mode
PREFER_LOCAL_LLM=true
PREFER_LOCAL_EMBEDDINGS=true
PREFER_LOCAL_RAG=true

3. Start Orchestrator

npm run dev:server
# Orchestrator runs on http://localhost:8081

4. Verify Integration

# Check orchestrator health
curl http://localhost:8081/internal/health

# Check Fusion health (includes all Tier-3 services)
curl http://localhost:8081/api/fusion/health

# Test OpenAI-compatible endpoint
curl -X POST http://localhost:8081/api/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "mlx-llama-3.1-8b",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'
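From application code, the same endpoint can be called like any OpenAI-compatible API. A minimal Node sketch (the helper names here are illustrative, not part of the repo; the port and model name are taken from the examples above, so adjust them to your deployment):

```javascript
// Sketch: call the Tier-2 gateway as if it were the OpenAI API.
// Assumes the orchestrator from the Quick Start is listening on :8081.
const GATEWAY_URL = 'http://localhost:8081/api/v1';

// Build an OpenAI-style chat request body, optionally with HTDI extensions.
function buildChatRequest(model, userContent, htdi = undefined) {
  const body = {
    model,
    messages: [{ role: 'user', content: userContent }],
  };
  if (htdi) body.htdi = htdi; // HTDI extension block (use_rag, room_id, ...)
  return body;
}

async function chat(userContent) {
  const res = await fetch(`${GATEWAY_URL}/chat/completions`, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify(buildChatRequest('mlx-llama-3.1-8b', userContent)),
  });
  if (!res.ok) throw new Error(`Gateway error: ${res.status}`);
  const data = await res.json();
  return data.choices[0].message.content;
}

// Usage (requires the services from the Quick Start to be running):
// chat('Hello!').then(console.log);
```

Because the gateway is format-compatible, existing OpenAI client libraries should also work by pointing their base URL at `http://localhost:8081/api/v1`.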

API Reference

OpenAI Gateway Endpoints

POST /api/v1/chat/completions

OpenAI-compatible chat completions with HTDI extensions.

Request:

{
  "model": "mlx-llama-3.1-8b",
  "messages": [
    { "role": "user", "content": "What's the temperature in the Peace room?" }
  ],
  "temperature": 0.7,
  "max_tokens": 1024,
  "htdi": {
    "use_rag": true,
    "rag_collection": "default",
    "room_id": "peace",
    "include_entities": true
  }
}

Response:

{
  "id": "chatcmpl-abc-123",
  "object": "chat.completion",
  "model": "mlx-llama-3.1-8b",
  "choices": [{
    "index": 0,
    "message": {
      "role": "assistant",
      "content": "The temperature in the Peace room is 22.5°C..."
    },
    "finish_reason": "stop",
    "htdi": {
      "request_id": "abc-123",
      "latency_ms_total": 450,
      "rag_context": { "collection": "default", "count": 3 },
      "entities": [
        { "id": "sensor.temperature", "state": "22.5" }
      ]
    }
  }],
  "usage": { "total_tokens": 175 }
}
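Clients that want the observability data can pull the `htdi` block off the first choice. A small sketch (field names follow the example response above; treat the block as optional, since responses served without HTDI extensions will omit it):

```javascript
// Sketch: extract the HTDI observability block from a gateway response.
// Field names follow the example response above.
function extractHtdi(completion) {
  const choice = completion.choices && completion.choices[0];
  const htdi = choice && choice.htdi;
  if (!htdi) return null; // response came from a non-HTDI path
  return {
    requestId: htdi.request_id,
    latencyMs: htdi.latency_ms_total,
    ragDocs: htdi.rag_context ? htdi.rag_context.count : 0,
    entities: htdi.entities || [],
  };
}
```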

POST /api/v1/embeddings

Generate embeddings via MLX.

Request:

{
  "model": "mlx-embeddings-default",
  "input": ["Document 1", "Document 2"]
}

Response:

{
  "object": "list",
  "data": [
    { "object": "embedding", "embedding": [0.123, ...], "index": 0 },
    { "object": "embedding", "embedding": [0.456, ...], "index": 1 }
  ],
  "model": "mlx-embeddings-default",
  "usage": { "total_tokens": 15 }
}

GET /api/v1/models

List available models from MLX.

Response:

{
  "object": "list",
  "data": [
    {
      "id": "mlx-llama-3.1-8b",
      "object": "model",
      "family": "llama",
      "description": "Llama 3.1 8B Instruct",
      "context_length": 8192,
      "tags": ["chat", "instruct"]
    }
  ]
}

Fusion Endpoints

POST /api/fusion/answer

RAG-enhanced LLM response with optional Smart Campus context.

Request:

{
  "question": "What rooms are available?",
  "collection": "campus-docs",
  "topK": 5,
  "minScore": 0.5,
  "roomId": "main-hall",
  "includeEntities": true
}

Response:

{
  "ok": true,
  "answer": "Based on the room data...",
  "mode": "fusion",
  "ragResults": [...],
  "smartCampus": {
    "roomId": "main-hall",
    "entities": [...]
  },
  "timings": {
    "healthCheckMs": 25,
    "ragQueryMs": 100,
    "smartCampusQueryMs": 45,
    "mlxGenerationMs": 300,
    "totalMs": 470
  }
}
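The `timings` block makes latency attribution straightforward. A sketch of the kind of breakdown a client might compute (keys follow the example response above; `slowestStep` is a hypothetical helper, not part of the repo):

```javascript
// Sketch: break down a /api/fusion/answer timings block.
// Keys follow the example response above (healthCheckMs, ragQueryMs, ...).
function slowestStep(timings) {
  const steps = Object.entries(timings).filter(([key]) => key !== 'totalMs');
  const stepSum = steps.reduce((sum, [, ms]) => sum + ms, 0);
  const [name, ms] = steps.reduce((a, b) => (b[1] > a[1] ? b : a));
  return {
    slowest: name,
    slowestMs: ms,
    stepSum,                                // sum of the per-step timings
    overheadMs: timings.totalMs - stepSum,  // time not attributed to any step
  };
}
```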

POST /api/fusion/multistep

Multi-step reasoning with iterative RAG queries.

Request:

{
  "question": "Complex question requiring multiple steps",
  "collection": "default",
  "maxSteps": 3
}

Response:

{
  "ok": true,
  "answer": "Final synthesized answer",
  "steps": [
    { "step": 1, "question": "...", "mlxResponse": "..." },
    { "step": 2, "question": "...", "mlxResponse": "..." }
  ],
  "totalSteps": 2,
  "latencyMs": 1200
}

Smart Campus Endpoints

GET /api/smartcampus/rooms/:roomId

Query room information and entities.

Parameters:

  • roomId (URL) - Room identifier
  • include_entities (query) - Include entity states (default: true)
  • include_sensors (query) - Include sensor data (default: false)

Response:

{
  "ok": true,
  "room": {
    "id": "peace",
    "name": "Peace Room",
    "entities": [
      { "id": "light.peace", "state": "on", "attributes": { "brightness": 80 } },
      { "id": "sensor.temperature", "state": "22.5" }
    ]
  },
  "latencyMs": 45
}

GET /api/smartcampus/entities/:entityId

Query specific entity state.

Response:

{
  "ok": true,
  "entity": {
    "id": "light.peace",
    "name": "Peace Room Light",
    "state": "on",
    "attributes": { "brightness": 80 }
  },
  "latencyMs": 25
}

POST /api/smartcampus/entities/batch

Query multiple entities at once.

Request:

{
  "entity_ids": ["light.peace", "sensor.temperature"]
}

Response:

{
  "ok": true,
  "entities": [
    { "id": "light.peace", "state": "on" },
    { "id": "sensor.temperature", "state": "22.5" }
  ],
  "latencyMs": 60
}
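A batch response is often most useful as a lookup keyed by entity ID. A minimal sketch (the response shape follows the example above; `entityStateMap` is an illustrative helper, not part of the repo):

```javascript
// Sketch: turn a batch-entity response into a state lookup keyed by entity ID.
function entityStateMap(batchResponse) {
  if (!batchResponse.ok) throw new Error('batch entity query failed');
  return Object.fromEntries(
    batchResponse.entities.map(e => [e.id, e.state]));
}
```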

Provider Architecture

Provider Layer

All Tier-3 communication goes through provider classes:

server/providers/
├── mlx.js          → Tier-3A (LLM & embeddings)
├── rag.js          → Tier-3B (vector DB)
├── smartcampus.js  → Tier-3C (room & entity context)
└── fusion.js       → Orchestrates all providers

Key Methods:

// MLX Provider
mlxProvider.healthCheck({ requestId })
mlxProvider.chat(messages, model, { temperature, maxTokens, requestId })
mlxProvider.embed(textList, model, { requestId })
mlxProvider.listModels({ requestId })

// RAG Provider
ragProvider.healthCheck({ requestId })
ragProvider.query(query, collection, k, minScore, { requestId })
ragProvider.upsert(documents, collection, { requestId })
ragProvider.delete(ids, collection, { requestId })

// Smart Campus Provider
smartCampusProvider.healthCheck({ requestId })
smartCampusProvider.queryRoom(roomId, { includeEntities, requestId })
smartCampusProvider.queryEntity(entityId, { requestId })
smartCampusProvider.formatRoomContext(roomData, includeDetails)

// Fusion Provider
fusionProvider.healthCheck({ requestId, includeSmartCampus })
fusionProvider.answer(question, { collection, roomId, includeEntities, ... })
fusionProvider.multiStepReasoning(question, { maxSteps, ... })
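The fusion flow these methods enable can be sketched as follows. This is a simplified illustration of the sequencing (Smart Campus → RAG → MLX, with a shared request ID and per-step timing, as described under Request Tracing), not the actual implementation in `server/providers/fusion.js`:

```javascript
// Sketch of the fusion flow. The provider objects are stand-ins for the
// real classes in server/providers/; method signatures follow the list above.
async function fusionAnswer(question, { roomId, collection }, providers) {
  const requestId = `req-${Date.now()}`;
  const timings = {};
  const timed = async (key, fn) => {
    const t0 = Date.now();
    const out = await fn();
    timings[key] = Date.now() - t0;
    return out;
  };

  // 1. Optional spatial context from Smart Campus (Tier 3C)
  let roomContext = '';
  if (roomId) {
    const room = await timed('smartCampusQueryMs', () =>
      providers.smartCampus.queryRoom(roomId, { includeEntities: true, requestId }));
    roomContext = providers.smartCampus.formatRoomContext(room, true);
  }

  // 2. Grounding documents from RAG (Tier 3B)
  const docs = await timed('ragQueryMs', () =>
    providers.rag.query(question, collection, 5, 0.5, { requestId }));

  // 3. Generation via MLX (Tier 3A) with the enriched prompt
  const prompt = [roomContext, ...docs.map(d => d.text), question]
    .filter(Boolean).join('\n\n');
  const answer = await timed('mlxGenerationMs', () =>
    providers.mlx.chat([{ role: 'user', content: prompt }],
      'mlx-llama-3.1-8b', { requestId }));

  return { ok: true, answer, mode: 'fusion', timings };
}
```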

Degradation Modes

The Fusion provider implements graceful degradation:

1. fusion (All Services Healthy)

  • Health: ✅ MLX + ✅ RAG + ✅ Smart Campus
  • Flow: Smart Campus → RAG → MLX with full context
  • Quality: Highest (grounded + contextually aware)

2. fusion (MLX + RAG Only)

  • Health: ✅ MLX + ✅ RAG + ❌ Smart Campus
  • Flow: RAG → MLX (without campus context)
  • Quality: High (grounded in knowledge base)

3. mlx_only (MLX Only)

  • Health: ✅ MLX + ❌ RAG + ❌ Smart Campus
  • Flow: Direct MLX generation
  • Quality: Good (general knowledge)

4. rag_only (RAG Only)

  • Health: ❌ MLX + ✅ RAG + Smart Campus in any state
  • Flow: Return formatted RAG results
  • Quality: Context only (no rewriting)

5. error (All Down)

  • Health: ❌ MLX + ❌ RAG + ❌ Smart Campus
  • Response: Error with service status
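The table above reduces to a small mode-selection function over the pre-flight health results. A sketch (`selectMode` is illustrative, not the repo's actual code):

```javascript
// Sketch of the degradation table as a mode-selection function.
// Inputs are the per-service booleans from the pre-flight health check.
function selectMode({ mlxOk, ragOk, smartCampusOk }) {
  if (mlxOk && ragOk) return 'fusion';   // with or without Smart Campus context
  if (mlxOk) return 'mlx_only';          // direct generation, no grounding
  if (ragOk) return 'rag_only';          // formatted retrieval results only
  return 'error';                        // nothing available to serve the request
}
```

Note that Smart Campus availability never changes the mode name: it only determines whether fusion responses include campus context.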

Configuration

Required Environment Variables

# Tier-3 Service URLs
MLX_URL=http://localhost:8000           # Tier 3A (LLM)
RAG_URL=http://localhost:5100           # Tier 3B (RAG)
SMART_CAMPUS_URL=http://localhost:5200  # Tier 3C (Smart Campus)

# Local-First Mode
PREFER_LOCAL_LLM=true
PREFER_LOCAL_EMBEDDINGS=true
PREFER_LOCAL_RAG=true
FALLBACK_TO_CLOUD=false  # Fail fast (recommended for Phase 4)

Optional Environment Variables

# Authentication (if not using bypass)
AUTH_SECRET=your_jwt_secret
ENCRYPTION_KEY=your_encryption_key

# Cloud API Keys (fallback only)
OPENAI_API_KEY=sk-...
GEMINI_API_KEY=...

Testing

Run All Tests

npm test

Run Specific Test Suites

# Phase-4 integration tests
npm test tests/phase4-integration.test.js

# Smart Campus integration tests
npm test tests/smartcampus-integration.test.js

# Registry tests (critical for Phase 4)
npm test tests/registry.routes.test.js

Manual Testing

# Health checks
curl http://localhost:8081/internal/health
curl http://localhost:8081/api/fusion/health
curl http://localhost:8081/api/smartcampus/health

# OpenAI Gateway
curl -X POST http://localhost:8081/api/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "mlx-llama-3.1-8b", "messages": [{"role": "user", "content": "Test"}]}'

# Fusion with Smart Campus
curl -X POST http://localhost:8081/api/fusion/answer \
  -H "Content-Type: application/json" \
  -d '{
    "question": "What is the temperature?",
    "roomId": "peace",
    "includeEntities": true
  }'

# Smart Campus direct
curl http://localhost:8081/api/smartcampus/rooms/peace?include_entities=true

Performance

Latency Targets

Operation                          P50      P95      P99
OpenAI Gateway (MLX only)          250ms    400ms    600ms
Fusion (RAG + MLX)                 350ms    600ms    900ms
Fusion (Full with Smart Campus)    400ms    700ms    1000ms
Smart Campus room query            30ms     60ms     100ms

Throughput

  • OpenAI Gateway: Limited by MLX (~10-20 req/s)
  • Fusion Pipeline: ~5-10 req/s (multi-step operations)
  • Smart Campus: 100+ req/s (lightweight context queries)

Troubleshooting

Services Not Running

Symptom: SERVER_UNREACHABLE errors

Fix:

# Check which services are down
curl http://localhost:8000/health  # MLX
curl http://localhost:5100/health  # RAG
curl http://localhost:5200/health  # Smart Campus

# Start missing services (see Quick Start section)

Fusion Returns Degraded Mode

Symptom: mode: "mlx_only" or mode: "rag_only"

Debug:

# Check fusion health
curl http://localhost:8081/api/fusion/health | jq

# Look for services reporting ok: false

High Latency

Debug:

# Use fusion debug for timing breakdown
curl -X POST http://localhost:8081/api/fusion/answer \
  -H "Content-Type: application/json" \
  -d '{"question": "test", "roomId": "peace"}' | jq '.timings'

# Look for slow operations:
# - healthCheckMs (should be < 50ms)
# - ragQueryMs (should be < 150ms)
# - smartCampusQueryMs (should be < 100ms)
# - mlxGenerationMs (depends on model, typically 250-600ms)

Documentation

Phase-4 Documents

  • docs/PHASE4_SERVICE_TOPOLOGY.md - Overall architecture (700+ lines)
  • docs/PHASE4_TIER3A_MLX_CONTRACT.md - MLX server API contract
  • docs/PHASE4_TIER3B_RAG_CONTRACT.md - RAG engine API contract
  • docs/PHASE4_SMART_CAMPUS_INTEGRATION.md - Smart Campus integration guide (NEW)
  • docs/PHASE4_PROVIDER_CONTRACTS.md - Tier-2 provider reference (950 lines)
  • docs/PHASE4_COMPLETION_REPORT.md - Implementation summary
  • PHASE4_HANDOFF.md - Quick handoff guide

Summary

Tier-2 Orchestrator Status: 🟢 PRODUCTION READY

Capabilities:

  • ✅ OpenAI API compatibility
  • ✅ RAG-enhanced responses
  • ✅ Smart Campus context integration
  • ✅ Multi-step reasoning
  • ✅ Graceful degradation
  • ✅ End-to-end tracing
  • ✅ Health gating
  • ✅ 100% local-first (with cloud fallback option)

Integrations:

  • ✅ Tier-3A (MLX) - LLM inference & embeddings
  • ✅ Tier-3B (RAG) - Semantic search & vector DB
  • ✅ Tier-3C (Smart Campus) - Room & entity context

Next Steps:

  1. Start all Tier-3 services
  2. Configure environment variables
  3. Run integration tests
  4. Deploy orchestrator
  5. Monitor /internal/health and logs

Version: 1.0
Date: 2025-11-20
Maintainer: Phase-4 Team