A unified, secure inference service providing consistent access to multiple Large Language Model providers through the Model Context Protocol (MCP).
LLM Caller MCP serves as a centralized AI gateway that enables tools and applications to interact with various LLM providers (OpenAI, Anthropic, LM Studio) through a single, well-defined interface. Built with security, reliability, and operational excellence in mind, it simplifies AI integration while maintaining provider flexibility.
- Multi-Provider Support: Seamlessly route requests to OpenAI, Anthropic, or local LM Studio instances
- MCP Protocol: Standards-based interface for chat, streaming, and embeddings
- Intelligent Routing: Capability-based provider selection with automatic fallbacks
- Security First: Loopback-only operation, token authentication, and comprehensive request validation
- Full Observability: Structured logging, metrics, tracing, and health monitoring
- Production Ready: Rate limiting, retry logic, error handling, and streaming sanitization
cd modules/llm_caller
npm install

# Copy example configs
cp config/client-registry.example.json config/client-registry.json
cp config/providers.example.json config/providers.json
cp .env.example .env
# Configure using the interactive CLI
npm run config

# Run the test suite
npm test

# Build
npm run build

# Start the service
node dist/src/index.js

The service binds to 127.0.0.1:4037 by default (configurable via `.env`).
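With the service running, a first request can be issued from any MCP consumer. The sketch below uses Node 20's built-in `fetch` to call the health and chat endpoints; the `Authorization: Bearer` header and the `LLM_CALLER_CLIENT_TOKEN` variable are assumptions standing in for however your client-registry token is actually supplied, so adjust them to match your configuration.

```typescript
// Minimal smoke test against a locally running LLM Caller MCP instance.
// Assumes: service on 127.0.0.1:4037 and a bearer token issued via the
// client registry (the header name and env var are assumptions, not a spec).
const BASE_URL = "http://127.0.0.1:4037";
const CLIENT_TOKEN = process.env.LLM_CALLER_CLIENT_TOKEN ?? "";

async function main(): Promise<void> {
  // Check provider status first.
  const health = await fetch(`${BASE_URL}/health`);
  console.log("health:", health.status, await health.json());

  // Send a simple chat request.
  const res = await fetch(`${BASE_URL}/mcp/chat`, {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${CLIENT_TOKEN}`,
    },
    body: JSON.stringify({
      messages: [{ role: "user", content: "Hello!" }],
      model: "gpt-4",
      provider: "openai",
    }),
  });

  if (!res.ok) {
    throw new Error(`chat request failed: ${res.status} ${await res.text()}`);
  }
  console.log("chat response:", await res.json());
}

main().catch((err) => {
  console.error(err);
  process.exit(1);
});
```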
POST /mcp/chat
{
"messages": [{"role": "user", "content": "Hello!"}],
"model": "gpt-4",
"provider": "openai"
}

POST /mcp/chatStream
// Returns Server-Sent Events (SSE) stream

POST /mcp/embed
{
"inputs": ["text to embed"],
"model": "text-embedding-3-large"
}

GET /health        // Provider status and capabilities
GET /mcp/models    // Available models across all providers

┌─────────────────┐
│ Client Tool │
└────────┬────────┘
│ MCP Protocol (HTTP/SSE)
▼
┌─────────────────────────────────┐
│ LLM Caller MCP Service │
│ ┌──────────────────────────┐ │
│ │ Transport Layer │ │
│ │ - Auth & Validation │ │
│ │ - Rate Limiting │ │
│ └──────────┬───────────────┘ │
│ ▼ │
│ ┌──────────────────────────┐ │
│ │ Provider Orchestrator │ │
│ │ - Routing Logic │ │
│ │ - Retry Policies │ │
│ └──────────┬───────────────┘ │
│ ▼ │
│ ┌──────────────────────────┐ │
│ │ Provider Adapters │ │
│ │ - OpenAI │ │
│ │ - Anthropic │ │
│ │ - LM Studio │ │
│ └──────────────────────────┘ │
└─────────────────────────────────┘
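The Provider Orchestrator in the diagram above is responsible for capability-based routing. The sketch below is not the module's orchestrator; it only illustrates the idea with assumed type and function names: honour an explicit `provider` override, otherwise walk an ordered fallback list and pick the first provider that advertises the requested capability.

```typescript
// Illustrative capability-based routing with fallback (assumed shapes;
// see src/orchestrator.ts for the real implementation).
type Capability = "chat" | "chatStream" | "embed";

interface ProviderConfig {
  baseUrl: string;
  capabilities: Capability[];
  defaults?: Partial<Record<Capability, string>>;
}

interface RouteRequest {
  capability: Capability;
  provider?: string; // explicit override from the client payload
  model?: string;
}

function selectProvider(
  providers: Record<string, ProviderConfig>,
  fallbackOrder: string[],
  req: RouteRequest
): { name: string; model: string } {
  // An explicit provider wins if it supports the capability.
  const candidates = req.provider ? [req.provider] : fallbackOrder;

  for (const name of candidates) {
    const cfg = providers[name];
    if (!cfg || !cfg.capabilities.includes(req.capability)) continue;
    const model = req.model ?? cfg.defaults?.[req.capability];
    if (model) return { name, model };
  }
  throw new Error(`no provider available for capability "${req.capability}"`);
}

// Example: route an embedding request using a config shaped like config/providers.json.
const route = selectProvider(
  {
    openai: {
      baseUrl: "https://api.openai.com/v1",
      capabilities: ["chat", "chatStream", "embed"],
      defaults: { embed: "text-embedding-3-large" },
    },
    lmstudio: {
      baseUrl: "http://localhost:1234/v1",
      capabilities: ["chat", "chatStream"],
    },
  },
  ["lmstudio", "openai"],
  { capability: "embed" }
);
console.log(route); // { name: "openai", model: "text-embedding-3-large" }
```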
LLM Caller MCP is designed for local development and trusted environments only. The service implements multiple defense-in-depth layers to protect API credentials, prevent unauthorized access, and maintain audit trails.
- Loopback-Only Binding: Service binds exclusively to `127.0.0.1` and refuses external network connections
- No Remote Exposure: Not designed for internet-facing deployment without additional security infrastructure
- Startup Validation: Automatically fails if configured to bind to non-loopback addresses
- Token-Based Authentication: Client registry with cryptographically random tokens
- Method Allow-Lists: Per-client granular control over accessible endpoints (`chat`, `chatStream`, `embed`, `models`, `getHealth`)
- Token Hashing: Client tokens hashed in logs and metrics to prevent token leakage
- Session Isolation: Each request validated independently with no implicit trust
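The exact schema of `config/client-registry.json` is defined by the shipped example file; the sketch below merely illustrates how a per-client method allow-list check can be expressed, using hypothetical field names (`tokenHash`, `allowedMethods`).

```typescript
// Hypothetical client-registry shapes; the real schema lives in
// config/client-registry.example.json and may differ.
import { createHash } from "node:crypto";

type McpMethod = "chat" | "chatStream" | "embed" | "models" | "getHealth";

interface ClientEntry {
  tokenHash: string;          // compared against a hash, never the raw token
  allowedMethods: McpMethod[];
}

function hashToken(token: string): string {
  return createHash("sha256").update(token).digest("hex");
}

// Every request is checked independently: find the client by hashed token,
// then verify the requested method is on that client's allow-list.
function isAuthorized(
  registry: ClientEntry[],
  presentedToken: string,
  method: McpMethod
): boolean {
  const hash = hashToken(presentedToken);
  const entry = registry.find((c) => c.tokenHash === hash);
  return entry !== undefined && entry.allowedMethods.includes(method);
}
```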
- JSON Schema Validation: All payloads validated against versioned schemas before processing
- Input Sanitization: Streaming responses sanitized to strip control characters and limit chunk sizes
- Retry Hint Clamping: Provider retry suggestions capped at 60 seconds to prevent untrusted backoff guidance
- Timeout Enforcement: Per-request timeout limits to prevent resource exhaustion
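Two of these controls are easy to picture in code. The sketch below shows one way to clamp an untrusted provider retry hint to the documented 60-second cap and to sanitize a streamed chunk; the function names and the chunk-size limit are illustrative, not the module's actual API.

```typescript
// Illustrative sanitization helpers (names and the chunk limit are assumptions).
const MAX_RETRY_HINT_MS = 60_000; // documented 60-second cap on provider retry hints
const MAX_CHUNK_CHARS = 8_192;    // example chunk-size limit, not the real value

// Clamp an untrusted Retry-After / retry hint taken from a provider response.
function clampRetryHint(providerHintMs: number): number {
  if (!Number.isFinite(providerHintMs) || providerHintMs < 0) return 0;
  return Math.min(providerHintMs, MAX_RETRY_HINT_MS);
}

// Strip control characters (keeping tab, newline, carriage return) and
// truncate oversized chunks before forwarding them to the client.
function sanitizeChunk(chunk: string): string {
  const cleaned = chunk.replace(/[\u0000-\u0008\u000B\u000C\u000E-\u001F\u007F]/g, "");
  return cleaned.length > MAX_CHUNK_CHARS ? cleaned.slice(0, MAX_CHUNK_CHARS) : cleaned;
}
```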
- Environment-Based Secrets: API keys loaded from environment variables, never stored in code or logs
- Secrets Abstraction Layer: Pluggable credential provider enables future integration with Vault or AWS Secrets Manager
- Redaction Pipeline: Sensitive fields (`prompt`, `rawError`, API keys) automatically scrubbed from logs
- Debug Payload Controls: Optional `LLM_CALLER_LOG_DEBUG_PAYLOADS` flag for local debugging only (disabled by default)
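The secrets abstraction layer can be pictured as a small interface with an environment-backed default implementation, which a Vault or AWS Secrets Manager provider could later replace. The names below are assumptions; the module's real interface lives under `src/secrets/`.

```typescript
// Illustrative credential-provider abstraction (names are assumptions;
// see src/secrets/ for the module's actual interface).
interface CredentialProvider {
  getSecret(name: string): Promise<string>;
}

class EnvCredentialProvider implements CredentialProvider {
  async getSecret(name: string): Promise<string> {
    const value = process.env[name];
    if (!value) {
      // Fail fast without ever logging the (missing) secret's value.
      throw new Error(`missing required secret: ${name}`);
    }
    return value;
  }
}

// A Vault- or AWS-backed provider would implement the same interface, so
// adapters can ask for "OPENAI_API_KEY" without caring where it comes from.
const secrets: CredentialProvider = new EnvCredentialProvider();
// const openaiKey = await secrets.getSecret("OPENAI_API_KEY");
```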
- Per-Token Throttling: Configurable request limits per client token to prevent runaway usage
- HTTP 429 Handling: Rate limit violations return standardized error responses
- Failure Tracking: Hashed client-token failure counts tracked for anomaly detection
- Provider Circuit Breaking: Retry policies with exponential backoff to avoid amplifying provider outages
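The throttling described above amounts to a per-token counter over a time window. The sketch below shows a fixed-window variant with assumed class and method names, wired to the same limits that `LLM_CALLER_RATE_LIMIT_MAX` and `LLM_CALLER_RATE_LIMIT_INTERVAL_MS` configure; violations would map to HTTP 429.

```typescript
// Illustrative fixed-window, per-token rate limiter (assumed names; the real
// limiter is configured via LLM_CALLER_RATE_LIMIT_MAX / _INTERVAL_MS).
interface WindowState {
  count: number;
  windowStart: number;
}

class PerTokenRateLimiter {
  private readonly windows = new Map<string, WindowState>();

  constructor(
    private readonly maxRequests = 100,   // LLM_CALLER_RATE_LIMIT_MAX
    private readonly intervalMs = 60_000  // LLM_CALLER_RATE_LIMIT_INTERVAL_MS
  ) {}

  /** Returns true if the request is allowed, false if it should receive HTTP 429. */
  allow(tokenHash: string, now = Date.now()): boolean {
    const state = this.windows.get(tokenHash);
    if (!state || now - state.windowStart >= this.intervalMs) {
      this.windows.set(tokenHash, { count: 1, windowStart: now });
      return true;
    }
    if (state.count >= this.maxRequests) return false;
    state.count += 1;
    return true;
  }
}

// Usage: key the limiter by the hashed client token, never the raw token.
const limiter = new PerTokenRateLimiter();
if (!limiter.allow("sha256:abcd…")) {
  // respond with HTTP 429 and a standardized error body
}
```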
- Structured Logging: Every request logged with `requestId`, `traceId`, caller identity, provider, and outcome
- Sensitive Data Redaction: Prompts, responses, and error details redacted in persistent logs
- Immutable Audit Trail: Logs include timestamps, classifications, and routing decisions for compliance
- Log Rotation: Configurable retention with size-based rotation to manage disk usage
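The redaction step can be as simple as replacing the values of known-sensitive keys before a log entry is serialized. The field names in the sketch below are assumptions drawn from the bullets above, not the logger's actual schema.

```typescript
// Illustrative log redaction (field names are assumptions).
const SENSITIVE_KEYS = new Set(["prompt", "rawError", "apiKey", "authorization"]);

function redact(entry: Record<string, unknown>): Record<string, unknown> {
  const out: Record<string, unknown> = {};
  for (const [key, value] of Object.entries(entry)) {
    out[key] = SENSITIVE_KEYS.has(key) ? "[REDACTED]" : value;
  }
  return out;
}

// Example structured audit entry before it is written to the log file:
console.log(JSON.stringify(redact({
  requestId: "req-123",
  traceId: "trace-456",
  caller: "sha256:…",            // hashed client token, never the raw value
  provider: "openai",
  outcome: "success",
  prompt: "sensitive user text", // scrubbed by the redaction pipeline
})));
```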
- ❌ You need internet-facing AI inference (use managed services like OpenAI API directly)
- ❌ You require multi-tenant isolation (service uses shared provider credentials)
- ❌ You need certificate-based mTLS authentication (current implementation uses bearer tokens)
- ❌ You must comply with SOC2/HIPAA without additional controls (logging and encryption require external infrastructure)
- ✅ Protect `.env` files: Ensure API keys are never committed to version control (`.env` is in `.gitignore`)
- ✅ Rotate tokens regularly: Generate new client registry tokens and update consumer configurations
- ✅ Monitor logs: Review audit logs for unauthorized access attempts or anomalous patterns
- ✅ Restrict file permissions: Set `config/client-registry.json` and `.env` to `0600` (owner read/write only)
- ✅ Use separate API keys: Provision dedicated provider API keys for this service (not shared with other applications)
- ✅ Enable rate limiting: Configure `LLM_CALLER_RATE_LIMIT_MAX` to prevent cost overruns
- ✅ Disable debug mode in production: Never set `LLM_CALLER_LOG_DEBUG_PAYLOADS=true` in shared environments
The current HTTP/SSE transport carries inherent risks for streamable MCP deployments:
- No Mutual Attestation: Bearer tokens can be replayed by any local process with access
- No Runtime Manifest Verification: Configuration files are trusted without signature validation
- Limited Transport Security: HTTP over loopback lacks encryption (acceptable for local-only, not for remote)
Planned Phase 3 Security Enhancements (see Architecture):
- STDIO Transport Option: Direct process-to-process communication with handshake keys
- Signed Configuration Manifests: Cryptographic verification of provider and client registry files
- Short-Lived Session Tokens: Nonce-based attestation to prevent token replay
- Operator Confirmation Hooks: Interactive prompts for high-privilege operations
Until these mitigations are implemented, operators must:
- Run the service on trusted developer workstations only
- Use host-based firewalls to block port 4037 from network access
- Monitor process lists for unexpected client connections
- Audit configuration file changes via version control
# Set restrictive permissions on sensitive files
chmod 600 .env
chmod 600 config/client-registry.json
chmod 600 config/providers.json
# Verify loopback binding before starting
grep "LLM_CALLER_HOST=127.0.0.1" .env || echo "WARNING: Non-loopback binding detected!"
# Use separate API keys for development and production
OPENAI_API_KEY=sk-proj-dev-... # Development key with spending limits
ANTHROPIC_API_KEY=sk-ant-test-... # Test key with restricted quotas
# Enable audit logging
LLM_CALLER_LOG_FILE=/var/log/llm-caller/audit.log
LLM_CALLER_LOG_LEVEL=info  # Never set to 'debug' in shared environments

If you discover a security vulnerability, please do not open a public GitHub issue. Instead:
- Email security details to [security contact to be added]
- Include steps to reproduce, impact assessment, and suggested mitigations
- Allow 90 days for coordinated disclosure before public announcement
See SECURITY.md for our responsible disclosure policy.
Comprehensive documentation is available in the README/ directory:
- Developer Guide: Integration patterns, API reference, and code examples
- Runbook: Operational procedures, monitoring, and troubleshooting
- Pilot Integration Checklist: Step-by-step integration guide for new consumers
Additional technical documentation:
- Architecture: See `project_docs/architecture/LLM_Caller_Architecture.md`
- Vision & Roadmap: See `project_docs/LLM_Caller_vision.md`
- Module README: See `modules/llm_caller/README.md` for detailed setup
# Server configuration
LLM_CALLER_HOST=127.0.0.1
LLM_CALLER_PORT=4037
# Provider credentials
OPENAI_API_KEY=sk-...
ANTHROPIC_API_KEY=sk-ant-...
# Logging
LLM_CALLER_LOG_LEVEL=info
LLM_CALLER_LOG_FILE=/var/log/llm-caller.log
LLM_CALLER_LOG_MAX_BYTES=10485760
LLM_CALLER_LOG_MAX_FILES=5
# Rate limiting
LLM_CALLER_RATE_LIMIT_MAX=100
LLM_CALLER_RATE_LIMIT_INTERVAL_MS=60000

{
"providers": {
"openai": {
"baseUrl": "https://api.openai.com/v1",
"defaultModel": "gpt-4",
"capabilities": ["chat", "chatStream", "embed"],
"defaults": {
"chat": "gpt-4",
"chatStream": "gpt-4",
"embed": "text-embedding-3-large"
}
},
"lmstudio": {
"baseUrl": "http://localhost:1234/v1",
"defaultModel": "local-model",
"capabilities": ["chat", "chatStream"]
}
}
}

- Node.js 20+
- TypeScript 5.5+
- npm 9+
modules/llm_caller/
├── src/ # TypeScript source code
│ ├── adapters/ # Provider implementations
│ ├── config/ # Configuration management
│ ├── secrets/ # Credential providers
│ ├── transport.ts # MCP HTTP/SSE server
│ ├── orchestrator.ts # Request routing
│ ├── logger.ts # Structured logging
│ └── metrics.ts # Telemetry
├── tests/ # Jest test suite
├── config/ # Runtime configuration
└── api/schemas/v1/ # JSON Schema definitions
# Run all tests
npm test
# Run with coverage
npm test -- --coverage
# Run specific test file
npm test -- tests/orchestrator.spec.ts

# Lint markdown documentation
npm run lint:md
# Spell check
npm run lint:spell

See CONTRIBUTING.md for development guidelines, code standards, and submission process.
- Phase 1 (Foundation): ✅ Complete
- Phase 2 (Hardening): ✅ Complete
- Phase 3 (Launch Readiness): 🔄 In Progress
All 66 automated tests passing. Production-ready for loopback deployment with comprehensive observability.
[License details to be added]
For operational guidance, see the Runbook.
For integration assistance, see the Developer Guide.
For issues and questions, please open a GitHub issue.