A unified, secure inference service providing consistent access to multiple Large Language Model providers through the Model Context Protocol (MCP).
LLM Caller MCP serves as a centralized AI gateway that enables tools and applications to interact with various LLM providers (OpenAI, Anthropic, LM Studio) through a single, well-defined interface. Built with security, reliability, and operational excellence in mind, it simplifies AI integration while maintaining provider flexibility.
- Multi-Provider Support: Seamlessly route requests to OpenAI, Anthropic, or local LM Studio instances
- MCP Protocol: Standards-based interface for chat, streaming, and embeddings
- Intelligent Routing: Capability-based provider selection with automatic fallbacks
- Security First: Loopback-only operation, token authentication, and comprehensive request validation
- Full Observability: Structured logging, metrics, tracing, and health monitoring
- Production Ready: Rate limiting, retry logic, error handling, and streaming sanitization
cd modules/llm_caller
npm install

# Copy example configs
cp config/client-registry.example.json config/client-registry.json
cp config/providers.example.json config/providers.json
cp .env.example .env
# Configure using the interactive CLI
npm run config

# Run the test suite
npm test

# Build
npm run build

# Start the service
node dist/src/index.js

The service binds to 127.0.0.1:4037 by default (configurable via `.env`).
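With the service running, a first request can be issued from any MCP consumer. The sketch below uses Node 20's built-in `fetch` to call the health and chat endpoints; the `Authorization: Bearer` header and the `LLM_CALLER_CLIENT_TOKEN` variable are assumptions standing in for however your client-registry token is actually supplied, so adjust them to match your configuration.

```typescript
// Minimal smoke test against a locally running LLM Caller MCP instance.
// Assumes: service on 127.0.0.1:4037 and a bearer token issued via the
// client registry (the header name and env var are assumptions, not a spec).
const BASE_URL = "http://127.0.0.1:4037";
const CLIENT_TOKEN = process.env.LLM_CALLER_CLIENT_TOKEN ?? "";

async function main(): Promise<void> {
  // Check provider status first.
  const health = await fetch(`${BASE_URL}/health`);
  console.log("health:", health.status, await health.json());

  // Send a simple chat request.
  const res = await fetch(`${BASE_URL}/mcp/chat`, {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${CLIENT_TOKEN}`,
    },
    body: JSON.stringify({
      messages: [{ role: "user", content: "Hello!" }],
      model: "gpt-4",
      provider: "openai",
    }),
  });

  if (!res.ok) {
    throw new Error(`chat request failed: ${res.status} ${await res.text()}`);
  }
  console.log("chat response:", await res.json());
}

main().catch((err) => {
  console.error(err);
  process.exit(1);
});
```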
POST /mcp/chat
{
"messages": [{"role": "user", "content": "Hello!"}],
"model": "gpt-4",
"provider": "openai"
}

POST /mcp/chatStream
// Returns Server-Sent Events (SSE) stream

POST /mcp/embed
{
"inputs": ["text to embed"],
"model": "text-embedding-3-large"
}

GET /health        // Provider status and capabilities
GET /mcp/models    // Available models across all providers

┌─────────────────┐
│ Client Tool │
└────────┬────────┘
│ MCP Protocol (HTTP/SSE)
▼
┌─────────────────────────────────┐
│ LLM Caller MCP Service │
│ ┌──────────────────────────┐ │
│ │ Transport Layer │ │
│ │ - Auth & Validation │ │
│ │ - Rate Limiting │ │
│ └──────────┬───────────────┘ │
│ ▼ │
│ ┌──────────────────────────┐ │
│ │ Provider Orchestrator │ │
│ │ - Routing Logic │ │
│ │ - Retry Policies │ │
│ └──────────┬───────────────┘ │
│ ▼ │
│ ┌──────────────────────────┐ │
│ │ Provider Adapters │ │
│ │ - OpenAI │ │
│ │ - Anthropic │ │
│ │ - LM Studio │ │
│ └──────────────────────────┘ │
└─────────────────────────────────┘
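The Provider Orchestrator in the diagram above is responsible for capability-based routing. The sketch below is not the module's orchestrator; it only illustrates the idea with assumed type and function names: honour an explicit `provider` override, otherwise walk an ordered fallback list and pick the first provider that advertises the requested capability.

```typescript
// Illustrative capability-based routing with fallback (assumed shapes;
// see src/orchestrator.ts for the real implementation).
type Capability = "chat" | "chatStream" | "embed";

interface ProviderConfig {
  baseUrl: string;
  capabilities: Capability[];
  defaults?: Partial<Record<Capability, string>>;
}

interface RouteRequest {
  capability: Capability;
  provider?: string; // explicit override from the client payload
  model?: string;
}

function selectProvider(
  providers: Record<string, ProviderConfig>,
  fallbackOrder: string[],
  req: RouteRequest
): { name: string; model: string } {
  // An explicit provider wins if it supports the capability.
  const candidates = req.provider ? [req.provider] : fallbackOrder;

  for (const name of candidates) {
    const cfg = providers[name];
    if (!cfg || !cfg.capabilities.includes(req.capability)) continue;
    const model = req.model ?? cfg.defaults?.[req.capability];
    if (model) return { name, model };
  }
  throw new Error(`no provider available for capability "${req.capability}"`);
}

// Example: route an embedding request using a config shaped like config/providers.json.
const route = selectProvider(
  {
    openai: {
      baseUrl: "https://api.openai.com/v1",
      capabilities: ["chat", "chatStream", "embed"],
      defaults: { embed: "text-embedding-3-large" },
    },
    lmstudio: {
      baseUrl: "http://localhost:1234/v1",
      capabilities: ["chat", "chatStream"],
    },
  },
  ["lmstudio", "openai"],
  { capability: "embed" }
);
console.log(route); // { name: "openai", model: "text-embedding-3-large" }
```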
LLM Caller MCP is designed for local development and trusted environments only. The service implements multiple defense-in-depth layers to protect API credentials, prevent unauthorized access, and maintain audit trails.
- Loopback-Only Binding: Service binds exclusively to `127.0.0.1` and refuses external network connections
- No Remote Exposure: Not designed for internet-facing deployment without additional security infrastructure
- Startup Validation: Automatically fails if configured to bind to non-loopback addresses
- Token-Based Authentication: Client registry with cryptographically random tokens
- Method Allow-Lists: Per-client granular control over accessible endpoints (`chat`, `chatStream`, `embed`, `models`, `getHealth`)
- Token Hashing: Client tokens hashed in logs and metrics to prevent token leakage
- Session Isolation: Each request validated independently with no implicit trust
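The exact schema of `config/client-registry.json` is defined by the shipped example file; the sketch below merely illustrates how a per-client method allow-list check can be expressed, using hypothetical field names (`tokenHash`, `allowedMethods`).

```typescript
// Hypothetical client-registry shapes; the real schema lives in
// config/client-registry.example.json and may differ.
import { createHash } from "node:crypto";

type McpMethod = "chat" | "chatStream" | "embed" | "models" | "getHealth";

interface ClientEntry {
  tokenHash: string;          // compared against a hash, never the raw token
  allowedMethods: McpMethod[];
}

function hashToken(token: string): string {
  return createHash("sha256").update(token).digest("hex");
}

// Every request is checked independently: find the client by hashed token,
// then verify the requested method is on that client's allow-list.
function isAuthorized(
  registry: ClientEntry[],
  presentedToken: string,
  method: McpMethod
): boolean {
  const hash = hashToken(presentedToken);
  const entry = registry.find((c) => c.tokenHash === hash);
  return entry !== undefined && entry.allowedMethods.includes(method);
}
```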
- JSON Schema Validation: All payloads validated against versioned schemas before processing
- Input Sanitization: Streaming responses sanitized to strip control characters and limit chunk sizes
- Retry Hint Clamping: Provider retry suggestions capped at 60 seconds to prevent untrusted backoff guidance
- Timeout Enforcement: Per-request timeout limits to prevent resource exhaustion
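Two of these controls are easy to picture in code. The sketch below shows one way to clamp an untrusted provider retry hint to the documented 60-second cap and to sanitize a streamed chunk; the function names and the chunk-size limit are illustrative, not the module's actual API.

```typescript
// Illustrative sanitization helpers (names and the chunk limit are assumptions).
const MAX_RETRY_HINT_MS = 60_000; // documented 60-second cap on provider retry hints
const MAX_CHUNK_CHARS = 8_192;    // example chunk-size limit, not the real value

// Clamp an untrusted Retry-After / retry hint taken from a provider response.
function clampRetryHint(providerHintMs: number): number {
  if (!Number.isFinite(providerHintMs) || providerHintMs < 0) return 0;
  return Math.min(providerHintMs, MAX_RETRY_HINT_MS);
}

// Strip control characters (keeping tab, newline, carriage return) and
// truncate oversized chunks before forwarding them to the client.
function sanitizeChunk(chunk: string): string {
  const cleaned = chunk.replace(/[\u0000-\u0008\u000B\u000C\u000E-\u001F\u007F]/g, "");
  return cleaned.length > MAX_CHUNK_CHARS ? cleaned.slice(0, MAX_CHUNK_CHARS) : cleaned;
}
```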
- Environment-Based Secrets: API keys loaded from environment variables, never stored in code or logs
- Secrets Abstraction Layer: Pluggable credential provider enables future integration with Vault or AWS Secrets Manager
- Redaction Pipeline: Sensitive fields (`prompt`, `rawError`, API keys) automatically scrubbed from logs
- Debug Payload Controls: Optional `LLM_CALLER_LOG_DEBUG_PAYLOADS` flag for local debugging only (disabled by default)
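The secrets abstraction layer can be pictured as a small interface with an environment-backed default implementation, which a Vault or AWS Secrets Manager provider could later replace. The names below are assumptions; the module's real interface lives under `src/secrets/`.

```typescript
// Illustrative credential-provider abstraction (names are assumptions;
// see src/secrets/ for the module's actual interface).
interface CredentialProvider {
  getSecret(name: string): Promise<string>;
}

class EnvCredentialProvider implements CredentialProvider {
  async getSecret(name: string): Promise<string> {
    const value = process.env[name];
    if (!value) {
      // Fail fast without ever logging the (missing) secret's value.
      throw new Error(`missing required secret: ${name}`);
    }
    return value;
  }
}

// A Vault- or AWS-backed provider would implement the same interface, so
// adapters can ask for "OPENAI_API_KEY" without caring where it comes from.
const secrets: CredentialProvider = new EnvCredentialProvider();
// const openaiKey = await secrets.getSecret("OPENAI_API_KEY");
```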
- Per-Token Throttling: Configurable request limits per client token to prevent runaway usage
- HTTP 429 Handling: Rate limit violations return standardized error responses
- Failure Tracking: Hashed client-token failure counts tracked for anomaly detection
- Provider Circuit Breaking: Retry policies with exponential backoff to avoid amplifying provider outages
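The throttling described above amounts to a per-token counter over a time window. The sketch below shows a fixed-window variant with assumed class and method names, wired to the same limits that `LLM_CALLER_RATE_LIMIT_MAX` and `LLM_CALLER_RATE_LIMIT_INTERVAL_MS` configure; violations would map to HTTP 429.

```typescript
// Illustrative fixed-window, per-token rate limiter (assumed names; the real
// limiter is configured via LLM_CALLER_RATE_LIMIT_MAX / _INTERVAL_MS).
interface WindowState {
  count: number;
  windowStart: number;
}

class PerTokenRateLimiter {
  private readonly windows = new Map<string, WindowState>();

  constructor(
    private readonly maxRequests = 100,   // LLM_CALLER_RATE_LIMIT_MAX
    private readonly intervalMs = 60_000  // LLM_CALLER_RATE_LIMIT_INTERVAL_MS
  ) {}

  /** Returns true if the request is allowed, false if it should receive HTTP 429. */
  allow(tokenHash: string, now = Date.now()): boolean {
    const state = this.windows.get(tokenHash);
    if (!state || now - state.windowStart >= this.intervalMs) {
      this.windows.set(tokenHash, { count: 1, windowStart: now });
      return true;
    }
    if (state.count >= this.maxRequests) return false;
    state.count += 1;
    return true;
  }
}

// Usage: key the limiter by the hashed client token, never the raw token.
const limiter = new PerTokenRateLimiter();
if (!limiter.allow("sha256:abcd…")) {
  // respond with HTTP 429 and a standardized error body
}
```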
- Structured Logging: Every request logged with `requestId`, `traceId`, caller identity, provider, and outcome
- Sensitive Data Redaction: Prompts, responses, and error details redacted in persistent logs
- Immutable Audit Trail: Logs include timestamps, classifications, and routing decisions for compliance
- Log Rotation: Configurable retention with size-based rotation to manage disk usage
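The redaction step can be as simple as replacing the values of known-sensitive keys before a log entry is serialized. The field names in the sketch below are assumptions drawn from the bullets above, not the logger's actual schema.

```typescript
// Illustrative log redaction (field names are assumptions).
const SENSITIVE_KEYS = new Set(["prompt", "rawError", "apiKey", "authorization"]);

function redact(entry: Record<string, unknown>): Record<string, unknown> {
  const out: Record<string, unknown> = {};
  for (const [key, value] of Object.entries(entry)) {
    out[key] = SENSITIVE_KEYS.has(key) ? "[REDACTED]" : value;
  }
  return out;
}

// Example structured audit entry before it is written to the log file:
console.log(JSON.stringify(redact({
  requestId: "req-123",
  traceId: "trace-456",
  caller: "sha256:…",            // hashed client token, never the raw value
  provider: "openai",
  outcome: "success",
  prompt: "sensitive user text", // scrubbed by the redaction pipeline
})));
```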
- ❌ You need internet-facing AI inference (use managed services like OpenAI API directly)
- ❌ You require multi-tenant isolation (service uses shared provider credentials)
- ❌ You need certificate-based mTLS authentication (current implementation uses bearer tokens)
- ❌ You must comply with SOC2/HIPAA without additional controls (logging and encryption require external infrastructure)
- ✅ Protect `.env` files: Ensure API keys are never committed to version control (`.env` is in `.gitignore`)
- ✅ Rotate tokens regularly: Generate new client registry tokens and update consumer configurations
- ✅ Monitor logs: Review audit logs for unauthorized access attempts or anomalous patterns
- ✅ Restrict file permissions: Set `config/client-registry.json` and `.env` to `0600` (owner read/write only)
- ✅ Use separate API keys: Provision dedicated provider API keys for this service (not shared with other applications)
- ✅ Enable rate limiting: Configure `LLM_CALLER_RATE_LIMIT_MAX` to prevent cost overruns
- ✅ Disable debug mode in production: Never set `LLM_CALLER_LOG_DEBUG_PAYLOADS=true` in shared environments
The current HTTP/SSE transport carries inherent risks for streamable MCP deployments:
- No Mutual Attestation: Bearer tokens can be replayed by any local process with access
- No Runtime Manifest Verification: Configuration files are trusted without signature validation
- Limited Transport Security: HTTP over loopback lacks encryption (acceptable for local-only, not for remote)
Planned Phase 3 Security Enhancements (see Architecture):
- STDIO Transport Option: Direct process-to-process communication with handshake keys
- Signed Configuration Manifests: Cryptographic verification of provider and client registry files
- Short-Lived Session Tokens: Nonce-based attestation to prevent token replay
- Operator Confirmation Hooks: Interactive prompts for high-privilege operations
Until these mitigations are implemented, operators must:
- Run the service on trusted developer workstations only
- Use host-based firewalls to block port 4037 from network access
- Monitor process lists for unexpected client connections
- Audit configuration file changes via version control
# Set restrictive permissions on sensitive files
chmod 600 .env
chmod 600 config/client-registry.json
chmod 600 config/providers.json
# Verify loopback binding before starting
grep "LLM_CALLER_HOST=127.0.0.1" .env || echo "WARNING: Non-loopback binding detected!"
# Use separate API keys for development and production
OPENAI_API_KEY=sk-proj-dev-... # Development key with spending limits
ANTHROPIC_API_KEY=sk-ant-test-... # Test key with restricted quotas
# Enable audit logging
LLM_CALLER_LOG_FILE=/var/log/llm-caller/audit.log
LLM_CALLER_LOG_LEVEL=info  # Never set to 'debug' in shared environments

If you discover a security vulnerability, please do not open a public GitHub issue. Instead:
- Email security details to [security contact to be added]
- Include steps to reproduce, impact assessment, and suggested mitigations
- Allow 90 days for coordinated disclosure before public announcement
See SECURITY.md for our responsible disclosure policy.
Comprehensive documentation is available in the README/ directory:
- Developer Guide: Integration patterns, API reference, and code examples
- Runbook: Operational procedures, monitoring, and troubleshooting
- Pilot Integration Checklist: Step-by-step integration guide for new consumers
Additional technical documentation:
- Architecture: See `project_docs/architecture/LLM_Caller_Architecture.md`
- Vision & Roadmap: See `project_docs/LLM_Caller_vision.md`
- Module README: See `modules/llm_caller/README.md` for detailed setup
# Server configuration
LLM_CALLER_HOST=127.0.0.1
LLM_CALLER_PORT=4037
# Provider credentials
OPENAI_API_KEY=sk-...
ANTHROPIC_API_KEY=sk-ant-...
# Logging
LLM_CALLER_LOG_LEVEL=info
LLM_CALLER_LOG_FILE=/var/log/llm-caller.log
LLM_CALLER_LOG_MAX_BYTES=10485760
LLM_CALLER_LOG_MAX_FILES=5
# Rate limiting
LLM_CALLER_RATE_LIMIT_MAX=100
LLM_CALLER_RATE_LIMIT_INTERVAL_MS=60000

{
"providers": {
"openai": {
"baseUrl": "https://api.openai.com/v1",
"defaultModel": "gpt-4",
"capabilities": ["chat", "chatStream", "embed"],
"defaults": {
"chat": "gpt-4",
"chatStream": "gpt-4",
"embed": "text-embedding-3-large"
}
},
"lmstudio": {
"baseUrl": "http://localhost:1234/v1",
"defaultModel": "local-model",
"capabilities": ["chat", "chatStream"]
}
}
}

- Node.js 20+
- TypeScript 5.5+
- npm 9+
modules/llm_caller/
├── src/ # TypeScript source code
│ ├── adapters/ # Provider implementations
│ ├── config/ # Configuration management
│ ├── secrets/ # Credential providers
│ ├── transport.ts # MCP HTTP/SSE server
│ ├── orchestrator.ts # Request routing
│ ├── logger.ts # Structured logging
│ └── metrics.ts # Telemetry
├── tests/ # Jest test suite
├── config/ # Runtime configuration
└── api/schemas/v1/ # JSON Schema definitions
# Run all tests
npm test
# Run with coverage
npm test -- --coverage
# Run specific test file
npm test -- tests/orchestrator.spec.ts

# Lint markdown documentation
npm run lint:md
# Spell check
npm run lint:spell

See CONTRIBUTING.md for development guidelines, code standards, and submission process.
- Phase 1 (Foundation): ✅ Complete
- Phase 2 (Hardening): ✅ Complete
- Phase 3 (Launch Readiness): 🔄 In Progress
All 66 automated tests passing. Production-ready for loopback deployment with comprehensive observability.
[License details to be added]
For operational guidance, see the Runbook.
For integration assistance, see the Developer Guide.
For issues and questions, please open a GitHub issue.