41 changes: 37 additions & 4 deletions .dockerignore
@@ -1,11 +1,44 @@
### Runtime build context reduction
dist
node_modules
.vscode
.github

### VCS / metadata
.git
# Environment variables
.github
.smithery

### Tool / editor configs
.vscode
*.swp
*.tmp
*.DS_Store
Thumbs.db

### Logs
*.log
npm-debug.log*
yarn-debug.log*
yarn-error.log*

### Temp
tmp
**/tmp

### Environment variables & secrets
.env
.env.*
env.list

### Tests & coverage (not needed in runtime image)
tests
coverage
scripts
scripts/accuracy
**/test-data-dumps
.vitest

### Local certificates (copy explicitly if needed)
certs

### Misc local exports
exports
94 changes: 85 additions & 9 deletions Dockerfile
@@ -1,11 +1,87 @@
FROM node:22-alpine
ARG VERSION=latest
###
# Optimized multi-stage Dockerfile for mongodb-mcp-server
#
# Build args:
# NODE_VERSION Node.js version (default 22-alpine)
# INSTALL_DEV Keep dev dependencies (true|false, default: false)
# RUNTIME_IMAGE Base runtime image (default: node:22-alpine)
#
# Typical build:
# docker build -t mongodb-mcp-server:local .
# docker build --build-arg INSTALL_DEV=true -t mongodb-mcp-server:dev .
#
# Runtime (stdio transport):
# docker run --rm -it mongodb-mcp-server:local --transport stdio
#
# Runtime (http transport):
# docker run --rm -p 3000:3000 mongodb-mcp-server:local --transport http --httpHost 0.0.0.0
# curl -s -X POST http://localhost:3000/mcp -H 'Content-Type: application/json' \
# -d '{"jsonrpc":"2.0","id":1,"method":"initialize","params":{"protocolVersion":"2024-11-05","capabilities":{}}}'
#
# Optional HTTP auth (Azure Managed Identity):
# docker run --rm -p 3000:3000 \
# -e MDB_MCP_HTTP_AUTH_MODE=azure-managed-identity \
# -e MDB_MCP_AZURE_MANAGED_IDENTITY_TENANT_ID=<tenant-guid> \
# -e MDB_MCP_AZURE_MANAGED_IDENTITY_CLIENT_ID=<app-client-id> \
# mongodb-mcp-server:local --transport http --httpHost 0.0.0.0
###

# syntax=docker/dockerfile:1.7-labs

ARG NODE_VERSION=22-alpine
ARG RUNTIME_IMAGE=node:${NODE_VERSION}
ARG INSTALL_DEV=false

#############################################
# Builder Stage
#############################################
FROM node:${NODE_VERSION} AS builder
WORKDIR /app

# Leverage Docker layer caching: copy only dependency manifests + tsconfigs first (needed by build scripts)
COPY package.json package-lock.json* .npmrc* tsconfig*.json eslint.config.js vitest.config.ts ./

# Install dependencies without running lifecycle scripts (avoid premature build via prepare)
RUN --mount=type=cache,target=/root/.npm \
    npm ci --ignore-scripts

# Copy application sources
COPY src ./src
COPY scripts ./scripts

# Now run the build explicitly (includes prepare sequence tasks)
RUN npm run build

# Optionally prune dev dependencies for slimmer runtime
ARG INSTALL_DEV
RUN if [ "${INSTALL_DEV}" != "true" ]; then npm prune --omit=dev; fi

#############################################
# Runtime Stage
#############################################
FROM ${RUNTIME_IMAGE} AS runtime
ENV NODE_ENV=production \
    MDB_MCP_LOGGERS=stderr,mcp

# Create non-root user
RUN addgroup -S mcp && adduser -S mcp -G mcp
RUN npm install -g mongodb-mcp-server@${VERSION}
USER mcp
WORKDIR /home/mcp
ENV MDB_MCP_LOGGERS=stderr,mcp
ENTRYPOINT ["mongodb-mcp-server"]
LABEL maintainer="MongoDB Inc <[email protected]>"
LABEL description="MongoDB MCP Server"
LABEL version=${VERSION}

# Copy only required artifacts (preserve ownership in a single layer)
COPY --chown=mcp:mcp --from=builder /app/package*.json ./
COPY --chown=mcp:mcp --from=builder /app/node_modules ./node_modules
COPY --chown=mcp:mcp --from=builder /app/dist ./dist

USER mcp

# Expose default HTTP port (matches default config httpPort=3000)
EXPOSE 3000

LABEL maintainer="MongoDB Inc <[email protected]>" \
      org.opencontainers.image.title="mongodb-mcp-server" \
      org.opencontainers.image.description="MongoDB MCP Server" \
      org.opencontainers.image.source="https://github.com/mongodb-js/mongodb-mcp-server"

# Use exec form for clarity; default command may be overridden at runtime
ENTRYPOINT ["node", "dist/index.js"]
CMD ["--transport", "http"]
145 changes: 144 additions & 1 deletion README.md
@@ -18,6 +18,7 @@ A Model Context Protocol server for interacting with MongoDB Databases and Mongo
- [📄 Supported Resources](#supported-resources)
- [⚙️ Configuration](#configuration)
- [Configuration Options](#configuration-options)
- [Vector Search & Embeddings](#vector-search-and-embeddings)
- [Atlas API Access](#atlas-api-access)
- [Configuration Methods](#configuration-methods)
- [Environment Variables](#environment-variables)
@@ -320,6 +321,7 @@ NOTE: atlas tools are only available when you set credentials on [configuration]
- `collection-storage-size` - Get the size of a collection in MB
- `db-stats` - Return statistics about a MongoDB database
- `export` - Export query or aggregation results to EJSON format. Creates a uniquely named export accessible via the `exported-data` resource.
- `vector-search` - Execute a vector similarity search (`$vectorSearch`) over a collection. See [Vector Search & Embeddings](#vector-search-and-embeddings).

## 📄 Supported Resources

@@ -361,6 +363,13 @@ The MongoDB MCP Server can be configured using multiple methods, with the follow
| `exportTimeoutMs` | `MDB_MCP_EXPORT_TIMEOUT_MS` | 300000 | Time in milliseconds after which an export is considered expired and eligible for cleanup. |
| `exportCleanupIntervalMs` | `MDB_MCP_EXPORT_CLEANUP_INTERVAL_MS` | 120000 | Time in milliseconds between export cleanup cycles that remove expired export files. |
| `atlasTemporaryDatabaseUserLifetimeMs` | `MDB_MCP_ATLAS_TEMPORARY_DATABASE_USER_LIFETIME_MS` | 14400000 | Time in milliseconds that temporary database users created when connecting to MongoDB Atlas clusters will remain active before being automatically deleted. |
| `vectorSearchPath` | `MDB_MCP_VECTOR_SEARCH_PATH` | <not set> | Default vector field path used by `vector-search` (V2 mode). If set together with `vectorSearchIndex`, the V2 vector search tool variant is enabled. |
| `vectorSearchIndex` | `MDB_MCP_VECTOR_SEARCH_INDEX` | <not set> | Default vector search index name used by `vector-search` (V2 mode). Must be set with `vectorSearchPath` to enable V2 mode. |
| `embeddingModelProvider` | `MDB_MCP_EMBEDDING_MODEL_PROVIDER` | azure-ai-inference | Embedding model provider identifier. Currently only `azure-ai-inference` is supported. |
| `embeddingModelEndpoint` | `MDB_MCP_EMBEDDING_MODEL_ENDPOINT` | <not set> | Endpoint for the embedding model provider. Required for vector search. |
| `embeddingModelApikey` | `MDB_MCP_EMBEDDING_MODEL_APIKEY` | <not set> | API key/credential for the embedding model provider. Required for vector search. |
| `embeddingModelDeploymentName` | `MDB_MCP_EMBEDDING_MODEL_DEPLOYMENT_NAME` | <not set> | Deployment/model name to use when requesting embeddings. Required for vector search. |
| `embeddingModelDimension` | `MDB_MCP_EMBEDDING_MODEL_DIMENSION` | <not set> | (Optional) Expected embedding dimension for validation (provider specific). |

#### Logger Options

@@ -482,6 +491,140 @@ You can disable telemetry using:

> **💡 Platform Note:** For Windows users, see [Environment Variables](#environment-variables) for platform-specific instructions.

### Vector Search and Embeddings

The `vector-search` tool lets you run semantic similarity queries against a MongoDB collection using the `$vectorSearch` aggregation stage. This capability is disabled unless a valid embedding configuration is supplied (see below).

#### Overview

Two internal variants of the `vector-search` tool may register depending on configuration:

1. V1 (argument-driven): You supply `path` and optionally `index` as tool arguments each call.
2. V2 (config-driven): You preconfigure both `vectorSearchPath` and `vectorSearchIndex` in server config; the tool omits those arguments and always searches that path/index.

Variant selection rules:

- If BOTH `MDB_MCP_VECTOR_SEARCH_PATH` and `MDB_MCP_VECTOR_SEARCH_INDEX` are set at startup → V2 registers.
- If NEITHER (or only one) of those is set → V1 registers, and you must provide a `path` argument per invocation (and may provide `index`).
- If embedding config is incomplete, the tool is not registered (you will see a warning in logs).
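
The selection rules above can be sketched as a small pure function. This is an illustration only, not the server's actual code: `selectVectorSearchVariant`, the `Variant` type, and the `embeddingConfigComplete` flag are hypothetical names; only the two environment variable names come from this README.

```ts
type Variant = "v1" | "v2" | "disabled";

// Hypothetical helper that mirrors the variant selection rules listed above.
function selectVectorSearchVariant(
  env: Record<string, string | undefined>,
  embeddingConfigComplete: boolean,
): Variant {
  // Incomplete embedding config: the tool is not registered at all.
  if (!embeddingConfigComplete) return "disabled";

  const path = env["MDB_MCP_VECTOR_SEARCH_PATH"];
  const index = env["MDB_MCP_VECTOR_SEARCH_INDEX"];

  // V2 only when BOTH defaults are set (assumed here to mean non-empty strings).
  return path && index ? "v2" : "v1";
}
```

Setting only one of the two variables falls through to `"v1"`, which is why the `path` argument is still required in that case.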

#### Required MongoDB Setup

1. A collection with a vector field (array of float/number values) containing stored embeddings.
2. A vector search index created on that field (for example, an Atlas Vector Search index). The `$vectorSearch` stage runs against this index, so create it before calling the tool.
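
For reference, an Atlas Vector Search index definition for such a field has roughly the shape built below. This is a sketch under assumptions: the helper name `vectorIndexDefinition`, the field name `embedding`, and the default `cosine` similarity are illustrative choices, and `numDimensions` must match the output size of your embedding model.

```ts
type Similarity = "cosine" | "euclidean" | "dotProduct";

interface VectorIndexDefinition {
  fields: Array<{
    type: "vector";
    path: string;
    numDimensions: number;
    similarity: Similarity;
  }>;
}

// Hypothetical helper: builds the definition object for a single vector field.
function vectorIndexDefinition(
  path: string,
  numDimensions: number,
  similarity: Similarity = "cosine",
): VectorIndexDefinition {
  if (!Number.isInteger(numDimensions) || numDimensions <= 0) {
    throw new Error(`numDimensions must be a positive integer, got ${numDimensions}`);
  }
  return { fields: [{ type: "vector", path, numDimensions, similarity }] };
}

// Example: an index over the `embedding` field for a 3072-dimensional model.
const exampleDefinition = vectorIndexDefinition("embedding", 3072);
```

An object like `exampleDefinition` is what you would supply as the definition of an index of type `vectorSearch`, for example in the Atlas UI.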

#### Embedding Configuration (Required)

You must configure an embedding provider so the server can transform the `queryText` you pass in into a numeric embedding vector. Current provider support:

- `azure-ai-inference` (default if none specified)

Set the following environment variables (or CLI args) for Azure AI Inference:

```bash
export MDB_MCP_EMBEDDING_MODEL_ENDPOINT="https://your-azure-resource.services.ai.azure.com/models/embeddings?api-version=2024-05-01-preview"
export MDB_MCP_EMBEDDING_MODEL_APIKEY="<azure-api-key>"
export MDB_MCP_EMBEDDING_MODEL_DEPLOYMENT_NAME="text-embedding-3-large" # or your deployed embedding model
# (Optional) if you want to assert embedding size
export MDB_MCP_EMBEDDING_MODEL_DIMENSION=3072
```

Without these, `vector-search` will not register.
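
A minimal sketch of what "complete" means for the variables above, plus the optional dimension check. The names `EmbeddingConfig`, `readEmbeddingConfig`, and `assertEmbeddingDimension` are hypothetical and do not necessarily match the server's implementation; the environment variable names are the ones documented here.

```ts
// Illustrative shape; the server's real config type may differ.
interface EmbeddingConfig {
  endpoint: string;
  apiKey: string;
  deploymentName: string;
  dimension: number | undefined;
}

type Env = Record<string, string | undefined>;

// Returns a config when all required variables are set, otherwise undefined.
function readEmbeddingConfig(env: Env): EmbeddingConfig | undefined {
  const endpoint = env["MDB_MCP_EMBEDDING_MODEL_ENDPOINT"];
  const apiKey = env["MDB_MCP_EMBEDDING_MODEL_APIKEY"];
  const deploymentName = env["MDB_MCP_EMBEDDING_MODEL_DEPLOYMENT_NAME"];
  if (!endpoint || !apiKey || !deploymentName) return undefined;

  // The dimension is optional; reject values that are not positive integers.
  const raw = env["MDB_MCP_EMBEDDING_MODEL_DIMENSION"];
  let dimension: number | undefined;
  if (raw !== undefined && raw !== "") {
    dimension = Number(raw);
    if (!Number.isInteger(dimension) || dimension <= 0) {
      throw new Error(`Invalid embedding dimension: ${raw}`);
    }
  }
  return { endpoint, apiKey, deploymentName, dimension };
}

// Throws when a dimension is configured and the returned embedding does not match it.
function assertEmbeddingDimension(embedding: number[], config: EmbeddingConfig): void {
  if (config.dimension !== undefined && embedding.length !== config.dimension) {
    throw new Error(`Expected ${config.dimension} dimensions, got ${embedding.length}`);
  }
}
```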

#### Optional Vector Search Defaults (Enable V2 Mode)

To avoid passing `path` (and optionally `index`) on each call, set both:

```bash
export MDB_MCP_VECTOR_SEARCH_PATH="embedding" # e.g. field path storing embeddings
export MDB_MCP_VECTOR_SEARCH_INDEX="myVectorIndex" # name of the Atlas Search vector index
```

If both are present at startup, the V2 variant is loaded and you no longer pass `path`/`index` arguments at call time. Remove one or both to revert to V1.

#### Usage Examples

##### Example 1: V1 Variant (no defaults configured)

Tool invocation arguments:

```json
{
  "name": "vector-search",
  "arguments": {
    "database": "mydb",
    "collection": "articles",
    "queryText": "vector databases for personalization",
    "path": "embedding",
    "limit": 5,
    "numCandidates": 200,
    "includeVector": false
  }
}
```

##### Example 2: V2 Variant (defaults configured)

With `MDB_MCP_VECTOR_SEARCH_PATH=embedding` and `MDB_MCP_VECTOR_SEARCH_INDEX=myVectorIndex` set at startup:

```json
{
  "name": "vector-search",
  "arguments": {
    "database": "mydb",
    "collection": "articles",
    "queryText": "vector databases for personalization",
    "limit": 5,
    "numCandidates": 200
  }
}
```

#### Returned Data

The tool returns an array of matched documents. By default the raw embedding field is excluded (set `includeVector: true` if you need it). Standard result size safeguards (`maxDocumentsPerQuery`, `maxBytesPerQuery`) still apply.
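
To make the arguments concrete, here is a sketch of how they could map onto an aggregation pipeline. It assumes the `queryText` has already been converted to a `queryVector` by the embedding provider; `buildVectorSearchPipeline` and `VectorSearchArgs` are hypothetical names, and the trailing `$project` is one possible way to exclude the embedding field, not necessarily what the tool does internally.

```ts
interface VectorSearchArgs {
  index: string;
  path: string;
  queryVector: number[];
  limit: number;
  numCandidates: number;
  includeVector?: boolean;
}

// Hypothetical helper: builds a $vectorSearch pipeline from tool-style arguments.
function buildVectorSearchPipeline(args: VectorSearchArgs): Array<Record<string, unknown>> {
  // $vectorSearch expects numCandidates to be at least as large as limit.
  if (args.numCandidates < args.limit) {
    throw new Error("numCandidates must be greater than or equal to limit");
  }

  const pipeline: Array<Record<string, unknown>> = [
    {
      $vectorSearch: {
        index: args.index,
        path: args.path,
        queryVector: args.queryVector,
        numCandidates: args.numCandidates,
        limit: args.limit,
      },
    },
  ];

  // Drop the raw embedding from the results unless the caller asks for it.
  if (!args.includeVector) {
    pipeline.push({ $project: { [args.path]: 0 } });
  }
  return pipeline;
}
```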

#### Adding a Custom Embedding Provider

You can extend the server to support additional embedding services (e.g. OpenAI, Hugging Face, Vertex AI) by implementing the `EmbeddingProvider` interface:

`src/embedding/embeddingProvider.ts`:

```ts
export interface EmbeddingProvider {
  name: string;
  embed(input: string[]): Promise<number[][]>;
}
```

Steps:

1. Create a new file under `src/embedding/`, e.g. `myProviderEmbeddingProvider.ts`, implementing the interface.
2. Add a new case in `EmbeddingProviderFactory.create()` & `isEmbeddingConfigValid()` matching a unique `embeddingModelProvider` string (e.g. `my-provider`).
3. Document required env vars (e.g. `MDB_MCP_EMBEDDING_MODEL_ENDPOINT`, `MDB_MCP_EMBEDDING_MODEL_APIKEY`, etc. or new ones) and update README.
4. (Optional) Support provider‑specific validation (dimension, model name) in `assertEmbeddingConfigValid`.
5. Provide tests (unit + integration if vector search depends on it) ensuring your provider returns deterministic dimensionality.
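
As a rough illustration of steps 1 and 5, the sketch below implements the interface with a deterministic, offline provider that could serve as a test double. `FakeEmbeddingProvider` and `hashEmbed` are hypothetical names, the interface is repeated so the snippet is self-contained, and the character-code scheme is a placeholder with no semantic meaning; a real provider would call its embedding service instead.

```ts
interface EmbeddingProvider {
  name: string;
  embed(input: string[]): Promise<number[][]>;
}

// Deterministic placeholder "embedding": spreads character codes over a fixed-size vector.
function hashEmbed(text: string, dimension: number): number[] {
  const vector = new Array<number>(dimension).fill(0);
  for (let i = 0; i < text.length; i++) {
    const slot = i % dimension;
    vector[slot] = (vector[slot] ?? 0) + text.charCodeAt(i) / 1000;
  }
  return vector;
}

class FakeEmbeddingProvider implements EmbeddingProvider {
  readonly name = "my-provider";
  private readonly dimension: number;

  constructor(dimension: number) {
    if (!Number.isInteger(dimension) || dimension <= 0) {
      throw new Error("dimension must be a positive integer");
    }
    this.dimension = dimension;
  }

  // Returns one vector per input string, always of the configured dimension.
  async embed(input: string[]): Promise<number[][]> {
    return input.map((text) => hashEmbed(text, this.dimension));
  }
}
```

Because the output depends only on the input text and the configured dimension, tests can assert on vector length and repeatability without network access.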

After adding your provider, users enable it by setting:

```bash
export MDB_MCP_EMBEDDING_MODEL_PROVIDER=my-provider
# plus any provider-specific variables you defined
```

If your provider requires different variable names, follow the existing naming convention: prefix with `MDB_MCP_` and document them.

#### Troubleshooting

| Symptom | Likely Cause | Action |
| ------- | ------------ | ------ |
| `vector-search` tool missing | Incomplete embedding config | Set the endpoint, API key, and deployment name environment variables, then restart the client. |
| Error: "Embedding provider returned empty embedding" | Provider/network issue | Check credentials & network; verify model supports embeddings. |
| Error requiring 'path' even though I set env vars | Only one of PATH/INDEX set | Set BOTH `MDB_MCP_VECTOR_SEARCH_PATH` and `MDB_MCP_VECTOR_SEARCH_INDEX` or remove both. |
| High latency | Large `numCandidates` or remote model slowness | Lower `numCandidates`; verify model region proximity. |

---

### Atlas API Access

To use the Atlas API tools, you'll need to create a service account in MongoDB Atlas:
@@ -680,6 +823,6 @@ connecting to the Atlas API, your MongoDB Cluster, or any other external calls
to third-party services like OID Providers. The behaviour is the same as what
`mongosh` does, so the same settings will work in the MCP Server.

## 🤝Contributing
## Contributing

Interested in contributing? Great! Please check our [Contributing Guide](CONTRIBUTING.md) for guidelines on code contributions, standards, adding new tools, and troubleshooting information.