# gen-idea-lab

Role: OpenAI-compatible API Gateway & Fusion Orchestrator
Status: 🟢 Production Ready (Phase 4 Complete)
## Overview

The Tier-2 orchestrator is an OpenAI-compatible API gateway that coordinates three local-first AI services:
- Tier-3A (MLX): Local LLM inference and embeddings
- Tier-3B (RAG): Vector database and semantic retrieval
- Tier-3C (Smart Campus): Room and entity context for campus-aware prompts
This enables a drop-in replacement for OpenAI API calls with local-first, privacy-preserving inference, enhanced with RAG and spatial context.
## Architecture

```
┌─────────────────────────────────────────────────────────────┐
│ Client Applications │
│ (Browser, CLI, MCP Tools) │
└─────────────────────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────┐
│ Tier 2: gen-idea-lab │
│ (Orchestrator & API Gateway) │
│ │
│ ┌─────────────────┐ ┌──────────────┐ ┌───────────────┐ │
│ │ OpenAI Gateway │ │ Fusion │ │ MCP Server │ │
│ │ /v1/chat │ │ Pipeline │ │ Integration │ │
│ │ /v1/embeddings │ │ /fusion/* │ │ /mcp/* │ │
│ │ /v1/models │ │ │ │ │ │
│ └────────┬────────┘ └──────┬───────┘ └───────┬───────┘ │
│ │ │ │ │
│ └──────────────────┼───────────────────┘ │
│ │ │
└──────────────────────────────┼───────────────────────────────┘
│
┌───────────────────┴───────────────────┬───────────┐
│ │ │ │
▼ ▼ ▼ ▼
┌──────────────────┐ ┌──────────────┐ ┌──────────────┐ ┌────────────┐
│ Tier 3A: MLX │ │ Tier 3B: RAG │ │ Tier 3C: │ │ Cloud │
│ (LLM) │ │ (Vector DB) │ │ Smart Campus │ │ Fallback │
│ │ │ │ │ (Context) │ │ │
│ Port: 8000 │ │ Port: 5100 │ │ Port: 5200 │ │ (Optional)│
└──────────────────┘ └──────────────┘ └──────────────┘ └────────────┘
```
## Key Features

### OpenAI Gateway

- Drop-in replacement for the OpenAI API
- Endpoints: `/v1/chat/completions`, `/v1/embeddings`, `/v1/models`
- Full request/response format compatibility
- Extended with HTDI metadata for enhanced observability
### Fusion Pipeline

- RAG + MLX + Smart Campus in a single pipeline
- Intelligent context enrichment from multiple sources
- Multi-step reasoning support
- Graceful degradation when services are unavailable
### Smart Campus Integration

- Room and entity context for campus-aware prompts
- Real-time sensor data integration
- IoT device state queries
- Automatic context formatting for LLM prompts
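The context-formatting step can be pictured as a pure function that turns room data into a prompt preamble. A minimal sketch, assuming the entity shapes shown in the Smart Campus responses elsewhere in this document (the function name and exact output format are hypothetical; the real logic lives in `smartCampusProvider.formatRoomContext`):

```javascript
// Hypothetical sketch: turn Smart Campus room data into an LLM prompt preamble.
function formatRoomContext(room, includeDetails = false) {
  const lines = [`Room: ${room.name} (id: ${room.id})`];
  for (const entity of room.entities ?? []) {
    let line = `- ${entity.id}: ${entity.state}`;
    if (includeDetails && entity.attributes) {
      line += ` ${JSON.stringify(entity.attributes)}`;
    }
    lines.push(line);
  }
  return lines.join("\n");
}

const preamble = formatRoomContext({
  id: "peace",
  name: "Peace Room",
  entities: [
    { id: "light.peace", state: "on", attributes: { brightness: 80 } },
    { id: "sensor.temperature", state: "22.5" },
  ],
});
console.log(preamble);
```

The resulting string would be prepended to the user's messages before the MLX call.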
### Observability

- End-to-end request ID propagation
- Per-step latency tracking
- Comprehensive error normalization
- Structured logging throughout
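Per-step latency tracking of the kind surfaced in the fusion `timings` object can be sketched as a small helper that wraps each pipeline step. This is a hedged sketch; the helper names are hypothetical:

```javascript
// Hypothetical sketch: collect per-step latencies like the fusion "timings" object.
function createTimer() {
  const timings = {};
  return {
    timings,
    // Wrap an async step, recording its duration under `<name>Ms`.
    async step(name, fn) {
      const start = Date.now();
      try {
        return await fn();
      } finally {
        timings[`${name}Ms`] = Date.now() - start;
      }
    },
  };
}

async function demo() {
  const timer = createTimer();
  await timer.step("healthCheck", async () => "ok");
  await timer.step("ragQuery", async () => []);
  timer.timings.totalMs = Object.values(timer.timings).reduce((a, b) => a + b, 0);
  return timer.timings;
}
```

Each pipeline stage (health check, RAG query, Smart Campus query, MLX generation) would run through `step`, yielding the per-stage breakdown shown in the fusion responses.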
### Health Gating

- Pre-flight health checks before expensive operations
- Automatic fallback to available services
- Clear error messages when services are down
## Quick Start

```bash
# Terminal 1: MLX Server (Tier 3A)
cd ../mlx-openai-server-lab
python server.py --port 8000
# Terminal 2: RAG Engine (Tier 3B)
cd ../mlx-rag-lab
python app.py --port 5100
# Terminal 3: Smart Campus (Tier 3C) - Optional
cd ../smart-campus-service
npm start # or python server.py --port 5200
# Terminal 4: Verify health
curl http://localhost:8000/health # MLX
curl http://localhost:5100/health # RAG
curl http://localhost:5200/health # Smart Campus
```

```bash
# Copy example env file
cp .env.example .env
# Edit .env and set:
MLX_URL=http://localhost:8000
RAG_URL=http://localhost:5100
SMART_CAMPUS_URL=http://localhost:5200
# Enable local-first mode
PREFER_LOCAL_LLM=true
PREFER_LOCAL_EMBEDDINGS=true
PREFER_LOCAL_RAG=true
```

```bash
npm run dev:server
# Orchestrator runs on http://localhost:8081
```

```bash
# Check orchestrator health
curl http://localhost:8081/internal/health
# Check Fusion health (includes all Tier-3 services)
curl http://localhost:8081/api/fusion/health
# Test OpenAI-compatible endpoint
curl -X POST http://localhost:8081/api/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "mlx-llama-3.1-8b",
"messages": [{"role": "user", "content": "Hello!"}]
  }'
```

## API Reference

### POST /api/v1/chat/completions

OpenAI-compatible chat completions with HTDI extensions.
Request:

```json
{
"model": "mlx-llama-3.1-8b",
"messages": [
{ "role": "user", "content": "What's the temperature in the Peace room?" }
],
"temperature": 0.7,
"max_tokens": 1024,
"htdi": {
"use_rag": true,
"rag_collection": "default",
"room_id": "peace",
"include_entities": true
}
}
```

Response:

```json
{
"id": "chatcmpl-abc-123",
"object": "chat.completion",
"model": "mlx-llama-3.1-8b",
"choices": [{
"index": 0,
"message": {
"role": "assistant",
"content": "The temperature in the Peace room is 22.5°C..."
},
"finish_reason": "stop",
"htdi": {
"request_id": "abc-123",
"latency_ms_total": 450,
"rag_context": { "collection": "default", "count": 3 },
"entities": [
{ "id": "sensor.temperature", "state": "22.5" }
]
}
}],
"usage": { "total_tokens": 175 }
}
```
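A client can attach the `htdi` extension block to an otherwise standard chat-completions payload. A hedged sketch of building such a request body (the `htdi` fields mirror the example above; the builder function itself is hypothetical):

```javascript
// Hypothetical helper: build an OpenAI-style chat request with HTDI extensions.
function buildChatRequest(
  userContent,
  { useRag = true, ragCollection = "default", roomId = null, includeEntities = false } = {}
) {
  const body = {
    model: "mlx-llama-3.1-8b",
    messages: [{ role: "user", content: userContent }],
    temperature: 0.7,
    max_tokens: 1024,
    htdi: {
      use_rag: useRag,
      rag_collection: ragCollection,
      include_entities: includeEntities,
    },
  };
  if (roomId) body.htdi.room_id = roomId; // only set when a room is in scope
  return body;
}

const req = buildChatRequest("What's the temperature in the Peace room?", {
  roomId: "peace",
  includeEntities: true,
});
// The body would then be POSTed to /api/v1/chat/completions.
```

Clients that do not send an `htdi` block still get plain OpenAI-compatible behavior.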
### POST /api/v1/embeddings

Generate embeddings via MLX.

Request:

```json
{
"model": "mlx-embeddings-default",
"input": ["Document 1", "Document 2"]
}
```

Response:

```json
{
"object": "list",
"data": [
{ "object": "embedding", "embedding": [0.123, ...], "index": 0 },
{ "object": "embedding", "embedding": [0.456, ...], "index": 1 }
],
"model": "mlx-embeddings-default",
"usage": { "total_tokens": 15 }
}
```
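Embedding vectors returned by this endpoint are typically compared with cosine similarity. A self-contained sketch (the vectors here are toy values, not real model output):

```javascript
// Cosine similarity between two embedding vectors.
function cosineSimilarity(a, b) {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Toy vectors standing in for the "embedding" arrays in the response above.
const same = cosineSimilarity([0.1, 0.2, 0.3], [0.1, 0.2, 0.3]); // ≈ 1 (same direction)
const orthogonal = cosineSimilarity([1, 0], [0, 1]); // → 0
```

This is the same scoring the RAG tier applies internally when ranking documents against a query embedding.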
### GET /api/v1/models

List available models from MLX.

Response:

```json
{
"object": "list",
"data": [
{
"id": "mlx-llama-3.1-8b",
"object": "model",
"family": "llama",
"description": "Llama 3.1 8B Instruct",
"context_length": 8192,
"tags": ["chat", "instruct"]
}
]
}
```

### POST /api/fusion/answer

RAG-enhanced LLM response with optional Smart Campus context.
Request:

```json
{
"question": "What rooms are available?",
"collection": "campus-docs",
"topK": 5,
"minScore": 0.5,
"roomId": "main-hall",
"includeEntities": true
}
```

Response:

```json
{
"ok": true,
"answer": "Based on the room data...",
"mode": "fusion",
"ragResults": [...],
"smartCampus": {
"roomId": "main-hall",
"entities": [...]
},
"timings": {
"healthCheckMs": 25,
"ragQueryMs": 100,
"smartCampusQueryMs": 45,
"mlxGenerationMs": 300,
"totalMs": 470
}
}
```

### Multi-Step Reasoning

Iterative reasoning with repeated RAG queries and MLX calls.
Request:

```json
{
"question": "Complex question requiring multiple steps",
"collection": "default",
"maxSteps": 3
}
```

Response:

```json
{
"ok": true,
"answer": "Final synthesized answer",
"steps": [
{ "step": 1, "question": "...", "mlxResponse": "..." },
{ "step": 2, "question": "...", "mlxResponse": "..." }
],
"totalSteps": 2,
"latencyMs": 1200
}
```
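The multi-step loop above can be sketched as: query RAG, ask MLX, and stop when the model signals it is done or `maxSteps` is reached. A hypothetical sketch with the service calls stubbed out as injected functions (the stop/follow-up protocol shown here is an assumption, not the documented wire format):

```javascript
// Hypothetical sketch of the multi-step reasoning loop, with RAG and MLX stubbed.
async function multiStepReasoning(question, { maxSteps = 3, queryRag, askMlx }) {
  const steps = [];
  let current = question;
  for (let i = 1; i <= maxSteps; i++) {
    const context = await queryRag(current);           // retrieve supporting docs
    const mlxResponse = await askMlx(current, context); // generate with context
    steps.push({ step: i, question: current, mlxResponse: mlxResponse.text });
    if (mlxResponse.done) break;                        // model signals completion
    current = mlxResponse.followUp;                     // next sub-question
  }
  return {
    answer: steps[steps.length - 1].mlxResponse,
    steps,
    totalSteps: steps.length,
  };
}
```

With stubs that answer on the second step, this yields a two-entry `steps` array like the response example above.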
### GET /api/smartcampus/rooms/:roomId

Query room information and entities.

Parameters:

- `roomId` (URL) - Room identifier
- `include_entities` (query) - Include entity states (default: true)
- `include_sensors` (query) - Include sensor data (default: false)
Response:

```json
{
"ok": true,
"room": {
"id": "peace",
"name": "Peace Room",
"entities": [
{ "id": "light.peace", "state": "on", "attributes": { "brightness": 80 } },
{ "id": "sensor.temperature", "state": "22.5" }
]
},
"latencyMs": 45
}
```

### Entity Query

Query a specific entity's state.
Response:

```json
{
"ok": true,
"entity": {
"id": "light.peace",
"name": "Peace Room Light",
"state": "on",
"attributes": { "brightness": 80 }
},
"latencyMs": 25
}
```

### Batch Entity Query

Query multiple entities at once.
Request:

```json
{
"entity_ids": ["light.peace", "sensor.temperature"]
}
```

Response:

```json
{
"ok": true,
"entities": [
{ "id": "light.peace", "state": "on" },
{ "id": "sensor.temperature", "state": "22.5" }
],
"latencyMs": 60
}
```

## Provider Architecture

All Tier-3 communication goes through provider classes:
```
server/providers/
├── mlx.js → Tier-3A (LLM & embeddings)
├── rag.js → Tier-3B (vector DB)
├── smartcampus.js → Tier-3C (room & entity context)
└── fusion.js → Orchestrates all providers
```
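The providers share a common shape: a base URL, a health check, and request-ID propagation on every call. A hypothetical minimal base class (the real classes differ in detail; the `X-Request-Id` header name and injectable `fetchImpl` are assumptions for illustration):

```javascript
// Hypothetical minimal provider: base URL + request-ID propagation.
// `fetchImpl` is injectable so the class can be exercised without a live service.
class BaseProvider {
  constructor(baseUrl, fetchImpl = fetch) {
    this.baseUrl = baseUrl;
    this.fetch = fetchImpl;
  }

  async request(path, { requestId, ...init } = {}) {
    const res = await this.fetch(`${this.baseUrl}${path}`, {
      ...init,
      headers: { "X-Request-Id": requestId ?? "", ...(init.headers ?? {}) },
    });
    return res.json();
  }

  async healthCheck({ requestId } = {}) {
    try {
      return await this.request("/health", { requestId });
    } catch {
      return { ok: false }; // unreachable services degrade instead of throwing
    }
  }
}
```

Catching transport errors in `healthCheck` is what lets the Fusion provider treat a down service as "unhealthy" rather than failing the whole request.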
Key Methods:

```js
// MLX Provider
mlxProvider.healthCheck({ requestId })
mlxProvider.chat(messages, model, { temperature, maxTokens, requestId })
mlxProvider.embed(textList, model, { requestId })
mlxProvider.listModels({ requestId })
// RAG Provider
ragProvider.healthCheck({ requestId })
ragProvider.query(query, collection, k, minScore, { requestId })
ragProvider.upsert(documents, collection, { requestId })
ragProvider.delete(ids, collection, { requestId })
// Smart Campus Provider
smartCampusProvider.healthCheck({ requestId })
smartCampusProvider.queryRoom(roomId, { includeEntities, requestId })
smartCampusProvider.queryEntity(entityId, { requestId })
smartCampusProvider.formatRoomContext(roomData, includeDetails)
// Fusion Provider
fusionProvider.healthCheck({ requestId, includeSmartCampus })
fusionProvider.answer(question, { collection, roomId, includeEntities, ... })
fusionProvider.multiStepReasoning(question, { maxSteps, ... })
```

## Graceful Degradation

The Fusion provider implements graceful degradation:
**Full fusion**

- Health: ✅ MLX + ✅ RAG + ✅ Smart Campus
- Flow: Smart Campus → RAG → MLX with full context
- Quality: Highest (grounded + contextually aware)

**RAG + MLX**

- Health: ✅ MLX + ✅ RAG + ❌ Smart Campus
- Flow: RAG → MLX (without campus context)
- Quality: High (grounded in knowledge base)

**MLX only**

- Health: ✅ MLX + ❌ RAG + ❌ Smart Campus
- Flow: Direct MLX generation
- Quality: Good (general knowledge)

**RAG only**

- Health: ❌ MLX + ✅ RAG + any Smart Campus
- Flow: Return formatted RAG results
- Quality: Context only (no rewriting)

**All services down**

- Health: ❌ MLX + ❌ RAG + ❌ Smart Campus
- Response: Error with service status
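The degradation ladder above can be sketched as a pure function over service health flags. Only `fusion`, `mlx_only`, and `rag_only` appear as mode strings elsewhere in this document; the other return values here are hypothetical labels:

```javascript
// Sketch of fusion mode selection from Tier-3 health flags.
function selectMode({ mlx, rag, smartCampus }) {
  if (mlx && rag && smartCampus) return "fusion";  // full pipeline
  if (mlx && rag) return "rag_mlx";                // no campus context (name hypothetical)
  if (mlx) return "mlx_only";                      // direct generation
  if (rag) return "rag_only";                      // formatted RAG results only
  return "error";                                  // nothing available
}
```

Because the function checks MLX first, the orchestrator always prefers the highest-quality path the current health state allows.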
## Configuration

```bash
# Tier-3 Service URLs
MLX_URL=http://localhost:8000 # Tier 3A (LLM)
RAG_URL=http://localhost:5100 # Tier 3B (RAG)
SMART_CAMPUS_URL=http://localhost:5200 # Tier 3C (Smart Campus)
# Local-First Mode
PREFER_LOCAL_LLM=true
PREFER_LOCAL_EMBEDDINGS=true
PREFER_LOCAL_RAG=true
FALLBACK_TO_CLOUD=false # Fail fast (recommended for Phase 4)

# Authentication (if not using bypass)
AUTH_SECRET=your_jwt_secret
ENCRYPTION_KEY=your_encryption_key
# Cloud API Keys (fallback only)
OPENAI_API_KEY=sk-...
GEMINI_API_KEY=...
```

## Testing

```bash
npm test

# Phase-4 integration tests
npm test tests/phase4-integration.test.js
# Smart Campus integration tests
npm test tests/smartcampus-integration.test.js
# Registry tests (critical for Phase 4)
npm test tests/registry.routes.test.js
```

### Manual Checks

```bash
# Health checks
curl http://localhost:8081/internal/health
curl http://localhost:8081/api/fusion/health
curl http://localhost:8081/api/smartcampus/health
# OpenAI Gateway
curl -X POST http://localhost:8081/api/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{"model": "mlx-llama-3.1-8b", "messages": [{"role": "user", "content": "Test"}]}'
# Fusion with Smart Campus
curl -X POST http://localhost:8081/api/fusion/answer \
-H "Content-Type: application/json" \
-d '{
"question": "What is the temperature?",
"roomId": "peace",
"includeEntities": true
}'
# Smart Campus direct
curl "http://localhost:8081/api/smartcampus/rooms/peace?include_entities=true"
```

## Performance

| Operation | P50 | P95 | P99 |
|---|---|---|---|
| OpenAI Gateway (MLX only) | 250ms | 400ms | 600ms |
| Fusion (RAG + MLX) | 350ms | 600ms | 900ms |
| Fusion (Full with Smart Campus) | 400ms | 700ms | 1000ms |
| Smart Campus room query | 30ms | 60ms | 100ms |
Throughput:

- OpenAI Gateway: limited by MLX (~10-20 req/s)
- Fusion Pipeline: ~5-10 req/s (multi-step operations)
- Smart Campus: 100+ req/s (lightweight context queries)
## Troubleshooting

**Symptom:** `SERVER_UNREACHABLE` errors

**Fix:**
```bash
# Check which services are down
curl http://localhost:8000/health # MLX
curl http://localhost:5100/health # RAG
curl http://localhost:5200/health # Smart Campus
# Start missing services (see Quick Start section)
```

**Symptom:** `mode: "mlx_only"` or `mode: "rag_only"` in fusion responses

**Debug:**

```bash
# Check fusion health
curl http://localhost:8081/api/fusion/health | jq
# Look for services reporting ok: false
```

**Symptom:** Slow responses

**Debug:**

```bash
# Use fusion debug for timing breakdown
curl -X POST http://localhost:8081/api/fusion/answer \
-H "Content-Type: application/json" \
-d '{"question": "test", "roomId": "peace"}' | jq '.timings'
# Look for slow operations:
# - healthCheckMs (should be < 50ms)
# - ragQueryMs (should be < 150ms)
# - smartCampusQueryMs (should be < 100ms)
# - mlxGenerationMs (depends on model, typically 250-600ms)
```

## Documentation

- `docs/PHASE4_SERVICE_TOPOLOGY.md` - Overall architecture (700+ lines)
- `docs/PHASE4_TIER3A_MLX_CONTRACT.md` - MLX server API contract
- `docs/PHASE4_TIER3B_RAG_CONTRACT.md` - RAG engine API contract
- `docs/PHASE4_SMART_CAMPUS_INTEGRATION.md` - Smart Campus integration guide (NEW)
- `docs/PHASE4_PROVIDER_CONTRACTS.md` - Tier-2 provider reference (950 lines)
- `docs/PHASE4_COMPLETION_REPORT.md` - Implementation summary
- `PHASE4_HANDOFF.md` - Quick handoff guide
## Summary

Tier-2 Orchestrator Status: 🟢 PRODUCTION READY
Capabilities:
- ✅ OpenAI API compatibility
- ✅ RAG-enhanced responses
- ✅ Smart Campus context integration
- ✅ Multi-step reasoning
- ✅ Graceful degradation
- ✅ End-to-end tracing
- ✅ Health gating
- ✅ 100% local-first (with cloud fallback option)
Integrations:
- ✅ Tier-3A (MLX) - LLM inference & embeddings
- ✅ Tier-3B (RAG) - Semantic search & vector DB
- ✅ Tier-3C (Smart Campus) - Room & entity context
Next Steps:
- Start all Tier-3 services
- Configure environment variables
- Run integration tests
- Deploy orchestrator
- Monitor `/internal/health` and logs
---

Version: 1.0 · Date: 2025-11-20 · Maintainer: Phase-4 Team