@harperfast/cortex-mcp-server

A remote MCP (Model Context Protocol) server that exposes Harper Cortex memory and development context as tools to Claude, Cursor, Windsurf, Copilot, and any MCP-compatible client.

This is the lowest-friction entry point into the Harper ecosystem. Users can add persistent, distributed memory to any AI tool by pointing it at a public URL — no local installation, no CLI, no code required.

Features

Persistent Memory — Store and retrieve facts, decisions, and context using semantic search
Multi-Client Support — Works with Claude (web, desktop, mobile), Claude Code, Cursor, Windsurf, Copilot, and any MCP-compatible client
Server-Side Embeddings — No API keys needed on the client; Cortex handles all embedding with ONNX
Multi-Agent Sharing — Share memory across agents and users with namespace isolation
Self-Hosted — Deploy on your own infrastructure or Harper Cloud
Production-Ready — Real database with ACID guarantees, no local files or SQLite limitations

Quick Start

1. Claude Desktop / Claude.ai

Open Settings → Connectors → Add custom connector
Enter the URL: https://my-instance.harpercloud.com/mcp
If your instance requires authentication, provide a Bearer token
You now have persistent memory tools available in Claude

2. Claude Code

# Install cortex-mcp-server on your machine or server
npm install -g @harperfast/cortex-mcp-server

# Add it as an MCP server
claude mcp add cortex -- cortex-mcp-server --url https://my-instance.harpercloud.com

3. Cursor / Windsurf

Add the server to your MCP configuration file (.cursor/mcp.json for Cursor, or the equivalent file for Windsurf):

{
	"mcpServers": {
		"cortex": {
			"url": "https://my-instance.harpercloud.com/mcp",
			"env": {
				"CORTEX_TOKEN": "your-bearer-token"
			}
		}
	}
}

4. Local Development

# Clone the repo
git clone https://github.com/HarperFast/cortex-mcp-server.git
cd cortex-mcp-server

# Install dependencies
npm install

# Start in HTTP mode
npm run dev

# Or use npx directly
npx @harperfast/cortex-mcp-server --url https://my-cortex.harpercloud.com --port 3000

Configuration

Environment Variables

CORTEX_URL (required) — URL of your Cortex instance (e.g., https://my-instance.harpercloud.com)
CORTEX_TOKEN (optional) — Bearer token for authentication
CORTEX_SCHEMA (optional) — Schema name in Cortex (default: data)
PORT (optional) — Port to listen on for HTTP server (default: 3000)
HOST (optional) — Host to bind to (default: 0.0.0.0)
AUTH_REQUIRED (optional) — Require authentication (default: true)
HTTP_SERVER (optional) — Use HTTP transport instead of stdio (default: false)
MULTI_TENANT (optional) — Set to "true" to enable multi-tenant mode (JWT auth, namespace enforcement, rate limiting)
JWKS_URL (optional) — JWKS endpoint for JWT validation (required in multi-tenant mode)
ADMIN_TOKEN (optional) — Static token for admin API access
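
The variables above could be resolved into a typed configuration object roughly as follows. This is a minimal sketch, not the server's actual code: `ServerConfig`, `loadConfig`, and `parseBool` are hypothetical names, and the boolean parsing rules are an assumption. The defaults come from the list above.

```typescript
// Hypothetical config shape; field names mirror the environment variables above.
type ServerConfig = {
  cortexUrl: string;
  cortexToken?: string;
  cortexSchema: string;
  port: number;
  host: string;
  authRequired: boolean;
  httpServer: boolean;
  multiTenant: boolean;
  jwksUrl?: string;
  adminToken?: string;
};

// Assumption: booleans are spelled "true"/"false"; anything else uses the default.
function parseBool(value: string | undefined, fallback: boolean): boolean {
  if (value === undefined) return fallback;
  const v = value.trim().toLowerCase();
  if (v === "true") return true;
  if (v === "false") return false;
  return fallback;
}

function loadConfig(env: Record<string, string | undefined>): ServerConfig {
  const cortexUrl = env.CORTEX_URL;
  if (!cortexUrl) throw new Error("CORTEX_URL is required");

  const multiTenant = parseBool(env.MULTI_TENANT, false);
  if (multiTenant && !env.JWKS_URL) {
    throw new Error("JWKS_URL is required in multi-tenant mode");
  }

  const port = Number(env.PORT ?? "3000");
  if (!Number.isInteger(port) || port < 1 || port > 65535) {
    throw new Error(`Invalid PORT: ${env.PORT}`);
  }

  return {
    cortexUrl,
    cortexToken: env.CORTEX_TOKEN,
    cortexSchema: env.CORTEX_SCHEMA ?? "data",
    port,
    host: env.HOST ?? "0.0.0.0",
    authRequired: parseBool(env.AUTH_REQUIRED, true),
    httpServer: parseBool(env.HTTP_SERVER, false),
    multiTenant,
    jwksUrl: env.JWKS_URL,
    adminToken: env.ADMIN_TOKEN,
  };
}
```

In Node.js this would be called as `loadConfig(process.env)`. Failing fast on a missing CORTEX_URL or JWKS_URL surfaces misconfiguration at startup rather than on the first tool call.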

Command-Line Arguments

cortex-mcp-server \
  --url https://my-cortex.harpercloud.com \
  --token your-bearer-token \
  --port 3000 \
  --host localhost \
  --no-auth \
  --multi-tenant \
  --jwks-url <url> \
  --admin-token <token>

WARNING: Running with --no-auth exposes all memory data without authentication. Only use in isolated development environments.

Available Tools

Standard Tools (always available)

Tool	Description	Input	Output
`memory_search`	Search memories by semantic similarity	`query`, `limit?`, `filters?`	Results with similarity scores
`memory_store`	Store a new memory	`text`, `source?`, `classification?`, `metadata?`	Memory ID and timestamp
`memory_recall`	Retrieve a specific memory by ID	`id`	Full memory record
`memory_forget`	Delete a memory	`id`	Deletion confirmation
`memory_count`	Count stored memories	`filters?`	Total count
`synapse_search`	Search development context	`query`, `projectId`, `limit?`, `filters?`	Context entries with scores
`synapse_ingest`	Ingest context from a tool	`source`, `content`, `projectId`, `parentId?`, `references?`	Stored entries and count
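
MCP clients invoke these tools with a JSON-RPC 2.0 `tools/call` request. The sketch below builds such a request for `memory_store`. The input field names come from the table above, but their types (strings, numbers, free-form objects) are assumptions, and `buildToolCall` is a hypothetical helper rather than part of this package.

```typescript
// Input shapes inferred from the table above; the field types are assumptions.
type MemoryStoreInput = {
  text: string;
  source?: string;
  classification?: string;
  metadata?: Record<string, unknown>;
};

type MemorySearchInput = {
  query: string;
  limit?: number;
  filters?: Record<string, unknown>;
};

type ToolCallRequest = {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: Record<string, unknown> };
};

let nextId = 1;

// Build an MCP tools/call request (JSON-RPC 2.0) for the named tool.
function buildToolCall(name: string, args: Record<string, unknown>): ToolCallRequest {
  return {
    jsonrpc: "2.0",
    id: nextId++,
    method: "tools/call",
    params: { name, arguments: args },
  };
}

const store: MemoryStoreInput = {
  text: "Order service uses event-driven architecture",
  source: "chat",
};
const request = buildToolCall("memory_store", store);
console.log(JSON.stringify(request, null, 2));
```

A `memory_search` call has the same envelope: pass a `MemorySearchInput` and the tool name `memory_search`.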

Admin Tools (multi-tenant mode only)

Tool	Description
`admin_create_tenant`	Create a new tenant with namespace and security policy
`admin_list_tenants`	List all tenants, optionally filtered by status
`admin_get_tenant`	Get details for a specific tenant
`admin_update_tenant`	Update tenant name, tier, status, or quotas
`admin_issue_token`	Generate JWT claims for a tenant
`admin_revoke_token`	Revoke a specific JWT token

Usage Examples

In Claude

You: "Remember that we use event-driven architecture for our order service"
Claude: [Uses memory_store] "Stored. I've saved that your order service uses event-driven architecture."

You: "What's the architecture for the order service?"
Claude: [Uses memory_search] "Based on our notes, your order service uses event-driven architecture."

In Claude Code or Cursor

When ingesting context from your codebase:

[Claude/Cursor detects you're working on authentication]
[Uses synapse_ingest] Stores: "Intent: Implement JWT-based auth"

Later:
[Uses synapse_search] Retrieves: "Previous decision: JWT-based auth with 24h expiry"

Deployment Options

Option 1: Standalone (npx)

npx @harperfast/cortex-mcp-server --url https://my-cortex.harpercloud.com --port 3000

Runs on any Node.js host (VPS, laptop, or container platform).

Option 2: Docker

# Build
docker build -t cortex-mcp-server:latest .

# Run
docker run \
  -e CORTEX_URL=https://my-cortex.harpercloud.com \
  -e CORTEX_TOKEN=your-token \
  -p 3000:3000 \
  cortex-mcp-server:latest

Option 3: Harper Cloud (Custom Functions)

Deploy directly on Harper:

harper deploy cortex-mcp-server

The MCP server runs in the same instance as your Cortex data, with zero additional infrastructure.

Option 4: Docker Compose

version: '3.8'
services:
  cortex-mcp:
    image: harperfast/cortex-mcp-server:latest
    environment:
      CORTEX_URL: https://my-cortex.harpercloud.com
      CORTEX_TOKEN: ${CORTEX_TOKEN}
      PORT: 3000
    ports:
      - "3000:3000"
    restart: unless-stopped

Authentication

The server supports Bearer token authentication:

Authorization: Bearer <your-token>

In multi-tenant setups, include the user ID in the token:

Authorization: Bearer user-123:secret-token

The server extracts user-123 and scopes all memory operations to that user's namespace.
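
As an illustration of the `user-id:secret` form above, the following sketch splits an Authorization header into a user ID and a secret. `parseBearer` and `BearerIdentity` are hypothetical names, and treating the first colon as the separator is an assumption. The server's actual parsing may differ.

```typescript
type BearerIdentity = {
  userId?: string; // present only for "userId:secret" style tokens
  secret: string;
};

// Parse an Authorization header value. Returns null when the header is
// missing or is not a Bearer credential.
function parseBearer(header: string | undefined): BearerIdentity | null {
  if (!header) return null;
  const match = /^Bearer\s+(\S+)$/i.exec(header.trim());
  if (!match) return null;
  const token = match[1];
  // Assumption: the first colon separates the user ID from the secret.
  const sep = token.indexOf(":");
  if (sep <= 0 || sep === token.length - 1) return { secret: token };
  return { userId: token.slice(0, sep), secret: token.slice(sep + 1) };
}

console.log(parseBearer("Bearer user-123:secret-token"));
```

A token without a colon is returned with no `userId`, which a caller could treat as the single-tenant case.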

Auth Layering: Cortex relies on Harper/Fabric platform authentication. Ensure authentication.requireAuthentication is enabled in your Harper config to enforce security at the instance level.

Architecture

┌─────────────────────────────────────┐
│  Claude / Cursor / Windsurf / etc.  │
│  (MCP-compatible client)            │
└────────────┬────────────────────────┘
             │
             │ Streamable HTTP or Stdio
             │ MCP Protocol
             │
┌────────────▼────────────────────────┐
│  cortex-mcp-server                  │
│                                     │
│  ├─ memory_search                   │
│  ├─ memory_store                    │
│  ├─ memory_recall                   │
│  ├─ memory_forget                   │
│  ├─ memory_count                    │
│  ├─ synapse_search                  │
│  └─ synapse_ingest                  │
└────────────┬────────────────────────┘
             │
             │ HTTP + Bearer auth
             │ @harperfast/cortex-client
             │
┌────────────▼────────────────────────┐
│  Harper Cortex                      │
│  (Memory + Synapse database)        │
│                                     │
│  ├─ Vector Search (ONNX)            │
│  ├─ Metadata Filtering              │
│  ├─ Multi-agent Namespaces          │
│  └─ ACID Transactions               │
└─────────────────────────────────────┘

Development

Build from source

git clone https://github.com/HarperFast/cortex-mcp-server.git
cd cortex-mcp-server

npm install
npm run build

Run tests

npm test

Local development with live reload

npm run dev

This starts the server in HTTP mode with hot reload. By default, it expects Cortex at http://localhost:8000.

Troubleshooting

"Connection refused" error

Check that CORTEX_URL is correct and the Cortex instance is running
Verify network connectivity: curl https://my-cortex.harpercloud.com/api/health

"Authentication failed" error

Ensure CORTEX_TOKEN is set correctly if your Cortex instance requires auth
Check that the token hasn't expired

Memory not persisting

Verify Cortex is using a persistent database (not in-memory)
Check that the CORTEX_SCHEMA matches your Cortex configuration

Tool not appearing in client

Restart your MCP client after deploying a new version
Check that the server is running and reachable: curl http://localhost:3000/health

MCP connection fails silently

If the MCP connection fails, Claude may silently fall back to local file-based memory. To verify that the connection is active, query the health endpoint:

curl http://localhost:3000/mcp/health

Or use Claude's built-in diagnostic:

/mcp

This shows all connected MCP servers and their status.

API Reference

For detailed API specifications, see cortex-client.

Data Handling & Compliance

Operators are responsible for ensuring Prohibited Data (PII, PHI, government IDs) is not stored unless covered by their Harper Order. This is specified in PaaS ToS Section 3.3.

All memory storage operations are protected by content sanitization that detects and blocks or sanitizes injection patterns, control characters, and oversized payloads. However, sanitization guards against malicious input, not sensitive input: it does not detect or redact personal information. Do not store sensitive personal information without explicit legal coverage.

Security Model

Single-Tenant (Default)

The MCP server is designed for single-tenant deployment: one Cortex instance per team. Auth is handled by Harper's native HTTP auth layer (Basic auth or Bearer tokens configured at the instance level). The MCP server inherits this — no additional auth is needed beyond what Harper provides.

VectorSearch is intentionally excluded from MCP. The VectorSearch endpoint accepts pre-computed embedding vectors and is available for trusted server-to-server paths (e.g., LangChain running in your backend). It is not exposed through MCP because untrusted clients could craft adversarial vectors to poison the vector space or trick dedup into overwriting legitimate memories.

Multi-Tenant

For multi-tenant deployments where multiple users share a single Cortex instance, the server implements:

JWT auth with RS256 JWKS validation — Tokens validated against JWKS endpoint for secure, stateless auth
Server-side namespace enforcement — agentId bound from JWT ns claim, client values overwritten
Per-tenant rate limiting with 3 tiers:
- Free: 60 reads / 20 writes per minute
- Team: 300 reads / 100 writes per minute
- Enterprise: 1000 reads / 500 writes per minute
Scope-based access control — memory:read, memory:write, synapse:read, synapse:write
Token revocation with 60s cache TTL
Content audit logging — All operations logged for compliance
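
The per-tenant tiers listed above can be pictured as a fixed-window counter keyed by tenant and operation type. This is only a sketch: the windowing algorithm is an assumption (the server may use a sliding window or token bucket), and `TenantRateLimiter` is a hypothetical class. The per-minute budgets are taken from the tier list.

```typescript
type Tier = "free" | "team" | "enterprise";
type Op = "read" | "write";

// Per-minute budgets taken from the tier list above.
const LIMITS: Record<Tier, Record<Op, number>> = {
  free: { read: 60, write: 20 },
  team: { read: 300, write: 100 },
  enterprise: { read: 1000, write: 500 },
};

const WINDOW_MS = 60_000;

// Fixed-window limiter: reads and writes are counted separately per tenant.
class TenantRateLimiter {
  private windows = new Map<string, { start: number; count: number }>();
  private now: () => number;

  // The clock is injectable so the limiter can be tested without waiting.
  constructor(now: () => number = () => Date.now()) {
    this.now = now;
  }

  // Returns true and records the call if it fits in the tenant's budget.
  allow(tenantId: string, tier: Tier, op: Op): boolean {
    const key = `${tenantId}:${op}`;
    const t = this.now();
    let w = this.windows.get(key);
    if (!w || t - w.start >= WINDOW_MS) {
      w = { start: t, count: 0 };
      this.windows.set(key, w);
    }
    if (w.count >= LIMITS[tier][op]) return false;
    w.count += 1;
    return true;
  }
}
```

A request handler would call `limiter.allow(tenantId, tier, "write")` before a `memory_store` and reject the call when it returns false.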

Important: Ensure all tenants are provisioned as authorized Users under your Harper subscription (per PaaS ToS Section 3.2).

See docs/multi-tenant-design.md for the full architecture proposal.
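
To make the namespace enforcement and scope checks described above concrete, here is a small sketch of what an authorization step might look like once a JWT has been verified. Only the `ns` claim, the `agentId` field, and the four scope names come from this document. The `scopes` claim name, the tool-to-scope mapping, and the `authorize` function are assumptions for illustration.

```typescript
type Scope = "memory:read" | "memory:write" | "synapse:read" | "synapse:write";

// Hypothetical shape of already-verified JWT claims. Only `ns` is named in
// the text above; carrying scopes in a `scopes` array is an assumption.
type Claims = { ns: string; scopes: string[] };

// Assumed mapping from tool name to the scope it requires.
const REQUIRED_SCOPE = new Map<string, Scope>([
  ["memory_search", "memory:read"],
  ["memory_recall", "memory:read"],
  ["memory_count", "memory:read"],
  ["memory_store", "memory:write"],
  ["memory_forget", "memory:write"],
  ["synapse_search", "synapse:read"],
  ["synapse_ingest", "synapse:write"],
]);

// Checks the scope for a tool call and returns the arguments with the
// namespace bound from the token.
function authorize(
  claims: Claims,
  tool: string,
  args: Record<string, unknown>,
): Record<string, unknown> {
  const needed = REQUIRED_SCOPE.get(tool);
  if (!needed) throw new Error(`Unknown tool: ${tool}`);
  if (!claims.scopes.includes(needed)) {
    throw new Error(`Missing scope ${needed} for ${tool}`);
  }
  // Overwrite any client-supplied agentId with the namespace from the token.
  return { ...args, agentId: claims.ns };
}
```

Because `agentId` is written last, a client that sends its own `agentId` cannot read or write outside the namespace in its token.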

Content Safety

All memory storage operations pass through content sanitization that:

Detects and strips prompt injection patterns (system markers, instruction overrides, delimiter injection)
Removes script tags, SQL-like injection, and control characters
Enforces content length limits (16KB)
Normalizes Unicode (NFKC)
Blocks content with detected injection patterns (configurable: block vs. sanitize-and-store)
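
The steps above can be sketched as a single function. This is a simplified illustration under stated assumptions: the three regular expressions are examples only (the server's real pattern list is not documented here), the 16KB limit is assumed to be measured in UTF-8 bytes, and `sanitize`, `utf8Length`, and the result shape are hypothetical.

```typescript
const MAX_BYTES = 16 * 1024;

// Illustrative patterns only; not the server's actual list.
const INJECTION_PATTERNS: RegExp[] = [
  /ignore (all )?(previous|prior) instructions/i,
  /<\|?(system|im_start|im_end)\|?>/i,
  /<script\b[^>]*>[\s\S]*?<\/script>/i,
];

type Mode = "block" | "sanitize";
type SanitizeResult = { ok: true; text: string } | { ok: false; reason: string };

// UTF-8 byte length computed from code points, without platform APIs.
function utf8Length(s: string): number {
  let n = 0;
  for (const ch of s) {
    const cp = ch.codePointAt(0) ?? 0;
    n += cp < 0x80 ? 1 : cp < 0x800 ? 2 : cp < 0x10000 ? 3 : 4;
  }
  return n;
}

function sanitize(input: string, mode: Mode = "block"): SanitizeResult {
  // NFKC folds compatibility forms (e.g. full-width letters) before matching.
  let text = input.normalize("NFKC");
  // Drop C0 control characters (except tab, LF, CR) and DEL.
  text = text.replace(/[\u0000-\u0008\u000B\u000C\u000E-\u001F\u007F]/g, "");
  if (utf8Length(text) > MAX_BYTES) return { ok: false, reason: "too_large" };
  for (const pattern of INJECTION_PATTERNS) {
    if (!pattern.test(text)) continue;
    if (mode === "block") return { ok: false, reason: "injection_detected" };
    // Sanitize-and-store: strip every occurrence of the matched pattern.
    text = text.replace(new RegExp(pattern.source, pattern.flags + "g"), "");
  }
  return { ok: true, text };
}
```

Normalizing before pattern matching matters: without it, a payload written in full-width or other compatibility characters could slip past patterns written for plain ASCII.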

Retrieval also applies a lighter sanitization pass to prevent stored injection payloads from reaching LLM clients.

Production Deployment

Rate Limiting

Embedding generation and vector search are compute-intensive. In production:

Harper deployment: Configure rate limits at the Harper instance level via config.yaml or Fabric policies
Standalone deployment: Place an HTTP rate limiter (nginx, Cloudflare, express-rate-limit) in front of the MCP server
Rate limits should be per-tenant for multi-tenant deployments, with separate budgets for reads vs. writes

Network Placement

Internal/team use: Deploy alongside Cortex in your private network. No DMZ needed.
Public-facing: Place the MCP server in a DMZ with strict ingress controls. Cortex must NOT be directly accessible from the internet. The MCP server acts as the auth boundary.
OpenShell/NemoClaw: These environments block internal IPs by default. Cortex must be reachable at a routable HTTPS address. Use the standalone deployment mode with CORTEX_URL pointing to your public Cortex endpoint.

Harper Deployment (Recommended)

When deployed as a Harper component, the MCP endpoint runs inside the Cortex process with direct table access. This eliminates network round-trips and inherits Harper's auth, TLS, and rate limiting automatically. See the harper/ directory.

License

MIT

Contributing

Contributions welcome! Please open issues and PRs on GitHub.

Related Projects

Harper Cortex — The memory database powering this server
cortex-client — TypeScript SDK for Cortex
LangChain Harper Integration — Production RAG with Cortex
OpenClaw — Multi-agent orchestration with shared memory
