Releases: acryldata/mcp-server-datahub

v0.5.2

25 Feb 04:51

Fixed

  • HTTP transport ContextVar propagation: Fixed LookupError for _mcp_dh_client ContextVar when running with HTTP transport (stateless_http=True). Each HTTP request runs in a separate async context that doesn't inherit ContextVars from the main thread, causing DocumentToolsMiddleware and VersionFilterMiddleware to fail. Added _DataHubClientMiddleware that sets the ContextVar at the start of every MCP message.
  • create_app() initialization safety: The _app_initialized flag is now set only after all middleware is successfully added, so a failed setup can be retried.
  • --debug middleware ordering: LoggingMiddleware is now added before all other middleware, so it wraps the full request/response lifecycle for maximum visibility.

Added

  • create_app() factory function: Extracted server setup into a factory function so that fastmcp dev / fastmcp run work correctly (they import the module but never call main()).
  • Multi-mode smoke testing: smoke_check.py now supports --url and --stdio-cmd options to test against running HTTP/SSE servers or stdio subprocesses, in addition to the default in-process mode.
  • test_all_modes.sh orchestrator: Runs smoke checks across all 5 transport modes (in-process, HTTP, SSE, stdio, fastmcp run), with per-mode log capture to scripts/logs/.
  • SMOKE_CHECK.md: Documentation with step-by-step reproduction instructions for all transport modes.
  • Core tool validation: Smoke check now verifies that all 8 core read-only tools are present, catching silent regressions in tool registration or middleware filtering.
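
A core tool check of this kind can be sketched as a set difference between the required tool names and the names the server actually registers. This is an illustration only: the release notes do not list the 8 core tools, so the names below are placeholders, and `missing_core_tools` is a hypothetical helper rather than a function from smoke_check.py.

```python
from typing import Iterable

# Placeholder names; the real list of core read-only tools is not given here.
REQUIRED_TOOLS = {"core_tool_a", "core_tool_b", "core_tool_c"}


def missing_core_tools(registered: Iterable[str],
                       required: Iterable[str] = REQUIRED_TOOLS) -> list[str]:
    # Return the required tools that the server did not expose, sorted so
    # that failure messages are stable across runs.
    return sorted(set(required) - set(registered))
```

A smoke check could fail the run whenever the returned list is non-empty, which catches tools that were silently dropped by registration or middleware filtering.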

Full Changelog: v0.5.1...v0.5.2

v0.5.1

12 Feb 02:39
31b9e56

Fixed

  • list_schema_fields — Fixed crash when a dataset has no schema metadata. Now gracefully returns an empty fields list.
  • save_document — Errors (e.g., authorization failures) are now raised as exceptions instead of being silently swallowed. LLM agents now see the actual error message.
  • update_description — Hidden from OSS instances where entity-level description updates are not supported. Available on Cloud only.

Added

  • scripts/smoke_check.py — Comprehensive smoke check script that exercises all available MCP tools against a live DataHub instance. Discovers URNs dynamically, respects version filtering middleware, and tests mutation tools with add-then-remove pairs. Usage: uv run python scripts/smoke_check.py --all

Changed

  • Version-aware tool filtering: update_description now requires Cloud (@min_version(cloud="0.3.16")). It was previously also allowed on OSS >= 1.4.0.
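
To make the idea of version-aware filtering concrete, here is a rough sketch of how a decorator like this could work. Only the `cloud=` keyword appears in the release notes; the `oss=` keyword, the `_min_versions` attribute, `parse_version`, and `is_tool_available` are assumptions made for illustration and may not match the real implementation.

```python
from typing import Callable, Optional


def parse_version(version: str) -> tuple[int, ...]:
    # Compare versions numerically, so "0.3.9" sorts below "0.3.16".
    # Assumes purely numeric dotted versions.
    return tuple(int(part) for part in version.split("."))


def min_version(cloud: Optional[str] = None, oss: Optional[str] = None) -> Callable:
    # Attach minimum-version requirements to a tool function (hypothetical).
    def decorator(fn: Callable) -> Callable:
        fn._min_versions = {"cloud": cloud, "oss": oss}
        return fn
    return decorator


def is_tool_available(fn: Callable, *, is_cloud: bool, server_version: str) -> bool:
    requirements = getattr(fn, "_min_versions", None)
    if requirements is None:
        return True  # Assumption: undecorated tools are always available.
    required = requirements["cloud" if is_cloud else "oss"]
    if required is None:
        return False  # No requirement given for this deployment type: hide it.
    return parse_version(server_version) >= parse_version(required)


@min_version(cloud="0.3.16")
def update_description() -> None:
    """Placeholder body; only the decorator matters for this sketch."""
```

Under these assumptions, the tool is visible on Cloud 0.3.16 and later and hidden on every OSS version, which matches the behavior described in this release.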

Full Changelog: v0.5.0...v0.5.1

v0.5.0

10 Feb 19:16
af77e70

New Tools

Mutation Tools

New tools for modifying metadata in DataHub. Enabled via TOOLS_IS_MUTATION_ENABLED=true.

  • add_tags / remove_tags — Add or remove tags from entities or schema fields. Supports bulk operations.
  • add_terms / remove_terms — Add or remove glossary terms from entities or schema fields.
  • add_owners / remove_owners — Add or remove ownership assignments. Supports different ownership types.
  • set_domains / remove_domains — Assign or remove domain membership for entities.
  • update_description — Update, append to, or remove descriptions for entities or schema fields.
  • add_structured_properties / remove_structured_properties — Manage typed metadata fields on entities.

User Tools

  • get_me — Retrieve information about the currently authenticated user. Enabled via TOOLS_IS_USER_ENABLED=true.

Document Tools

New tools for working with documents (knowledge articles, runbooks, FAQs) stored in DataHub. Automatically hidden when no documents exist in the catalog.

  • search_documents — Search for documents using keyword search with filters.
  • grep_documents — Search within document content using regex patterns.
  • save_document — Save standalone documents to DataHub's knowledge base.

Enhancements

  • Semantic search support — Enable AI-powered semantic search for documents via SEMANTIC_SEARCH_ENABLED=true.
  • Document tools middleware — Automatically hides document tools when no documents exist, with a cached existence check (1-minute TTL).
  • Upgraded FastMCP to 2.14.5 (from 2.10.5) with MCP SDK 1.26.0 compatibility.
  • Relaxed pydantic pin to >=2.0,<3 (was <2.12).

Breaking Changes

  • Python 3.11+ is now required (previously 3.10+).
  • acryl-datahub >= 1.3.1.7 is now required.
  • MCP Inspector: Use fastmcp dev instead of mcp dev for development.

Security

  • Added SECURITY.md with vulnerability reporting guidelines.
  • Bumped authlib (1.6.0 → 1.6.6), urllib3 (2.4.0 → 2.6.3), aiohttp (3.12.7 → 3.13.3), python-multipart (0.0.20 → 0.0.22).

New Environment Variables

Variable                             Default  Description
TOOLS_IS_MUTATION_ENABLED            false    Enable mutation tools
TOOLS_IS_USER_ENABLED                false    Enable user tools
DATAHUB_MCP_DOCUMENT_TOOLS_DISABLED  false    Completely disable document tools
SAVE_DOCUMENT_TOOL_ENABLED           true     Enable/disable save_document
SAVE_DOCUMENT_PARENT_TITLE           Shared   Parent folder title for saved documents
SAVE_DOCUMENT_ORGANIZE_BY_USER       false    Organize saved documents by user
SAVE_DOCUMENT_RESTRICT_UPDATES       true     Only allow updating documents in shared folder
SEMANTIC_SEARCH_ENABLED              false    Enable semantic (AI-powered) search
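
As a rough illustration of how boolean flags like these can be read with their documented defaults, consider the helper below. `env_flag` is a hypothetical function, and treating only a case-insensitive "true" as enabled is an assumption; the server may accept other spellings.

```python
import os
from typing import Mapping


def env_flag(name: str, default: bool, env: Mapping[str, str] = os.environ) -> bool:
    # Fall back to the documented default when the variable is unset.
    raw = env.get(name)
    if raw is None:
        return default
    # Assumption: only "true" (any case, surrounding whitespace ignored) enables a flag.
    return raw.strip().lower() == "true"


# Defaults taken from the table above.
mutation_enabled = env_flag("TOOLS_IS_MUTATION_ENABLED", default=False)
save_document_enabled = env_flag("SAVE_DOCUMENT_TOOL_ENABLED", default=True)
```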

new mcp tools and other improvements

19 Nov 02:27
8761474

Response Token Budget Management

  • New TokenCountEstimator class for fast token counting using character-based heuristics
  • Automatic result truncation via _select_results_within_budget() to prevent context window issues
  • Configurable token limits:
    • TOOL_RESPONSE_TOKEN_LIMIT environment variable (default: 80,000 tokens)
    • ENTITY_SCHEMA_TOKEN_BUDGET environment variable (default: 16,000 tokens per entity)
  • 90% safety buffer to account for token estimation inaccuracies
  • Ensures at least one result is always returned
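
The budget logic described above can be sketched as follows. This is an illustration under assumptions, not the project's TokenCountEstimator or _select_results_within_budget(): the 4-characters-per-token ratio is a common rule of thumb rather than a value from these notes, and the 90% safety buffer is interpreted here as spending at most 90% of the configured limit.

```python
import math
from typing import Iterable, Iterator


def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    # Character-based heuristic: no tokenizer is loaded or called.
    return math.ceil(len(text) / chars_per_token)


def select_results_within_budget(results: Iterable[str],
                                 token_limit: int = 80_000,
                                 safety_factor: float = 0.9) -> Iterator[str]:
    # Spend only part of the limit to absorb estimation error.
    budget = int(token_limit * safety_factor)
    used = 0
    for index, result in enumerate(results):
        cost = estimate_tokens(result)
        # The first result is always yielded, even if it alone exceeds the budget.
        if index > 0 and used + cost > budget:
            break
        used += cost
        yield result
```

Because the function is a generator, it stops consuming `results` as soon as the budget is exhausted instead of materializing the full list first.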

Enhanced Search Capabilities

  • Enhanced Keyword Search:
    • Supports pagination with start parameter
    • Added viewUrn for view-based filtering
    • Added sortInput for custom sorting

Query Entity Support

  • Native QueryEntity type support (SQL queries as first-class entities)
  • New query_entity.gql GraphQL query
  • Optimized entity retrieval with specialized query for QueryEntity types
  • Includes query statement, subjects (datasets/fields), and platform information

GraphQL Compatibility

  • Adaptive field detection for newer GMS versions
  • Caching mechanism for GMS version detection
  • Graceful fallback when newer fields aren't available
  • Support for #[CLOUD] and #[NEWER_GMS] conditional field markers
  • DISABLE_NEWER_GMS_FIELD_DETECTION environment variable override
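
One way conditional field markers could be applied is to drop marked lines from the GraphQL text before sending it to a server that does not support them. The sketch below assumes the markers appear as trailing `#` comments on the line of the field they guard, which these notes do not confirm; `filter_marked_fields` is a hypothetical helper.

```python
def filter_marked_fields(query: str, *, is_cloud: bool, newer_gms: bool) -> str:
    # Map each marker to whether the target server supports the marked fields.
    enabled = {"#[CLOUD]": is_cloud, "#[NEWER_GMS]": newer_gms}
    kept = []
    for line in query.splitlines():
        # Drop a line if it carries any marker whose feature is unavailable.
        if any(marker in line and not supported
               for marker, supported in enabled.items()):
            continue
        kept.append(line)
    return "\n".join(kept)
```

Lines whose markers are supported are kept unchanged; since `#` starts a comment in GraphQL, the leftover marker text is ignored by the server.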

Schema Field Optimization

  • Smart field prioritization to stay within token budgets:
    1. Primary key fields (isPartOfKey=true)
    2. Partitioning key fields (isPartitioningKey=true)
    3. Fields with descriptions
    4. Fields with tags or glossary terms
    5. Alphabetically by field path
  • Generator-based approach for memory efficiency
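
The prioritization order above maps naturally onto a sort key. The sketch below is an assumption-laden illustration: fields are modeled as plain dicts, `isPartOfKey` and `isPartitioningKey` come from the list above, and the other key names (`description`, `tags`, `glossaryTerms`, `fieldPath`) are guesses at the shape of the data.

```python
def field_priority(field: dict) -> tuple[int, str]:
    # Lower rank means higher priority; ties are broken alphabetically by path.
    if field.get("isPartOfKey"):
        rank = 0
    elif field.get("isPartitioningKey"):
        rank = 1
    elif field.get("description"):
        rank = 2
    elif field.get("tags") or field.get("glossaryTerms"):
        rank = 3
    else:
        rank = 4
    return (rank, field.get("fieldPath", ""))


def prioritize_fields(fields: list[dict]) -> list[dict]:
    # Fields at the end of the returned list are the first candidates for truncation.
    return sorted(fields, key=field_priority)
```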

Error Handling & Security

  • Enhanced error logging with full stack traces in async_background wrapper
  • Logs function name, args, and kwargs on failures
  • ReDoS protection in HTML sanitization with bounded regex patterns
  • Query truncation function (configurable via QUERY_LENGTH_HARD_LIMIT, default: 5,000 chars)
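
A query truncation function with a configurable hard limit might look like the sketch below. Only the variable name QUERY_LENGTH_HARD_LIMIT and the 5,000-character default come from these notes; the function names and the truncation marker are hypothetical.

```python
import os
from typing import Mapping

DEFAULT_QUERY_LENGTH_LIMIT = 5000
TRUNCATION_MARKER = "... [truncated]"  # Hypothetical marker text.


def query_length_limit(env: Mapping[str, str] = os.environ) -> int:
    # Read the limit from the environment, falling back to the documented default.
    return int(env.get("QUERY_LENGTH_HARD_LIMIT", DEFAULT_QUERY_LENGTH_LIMIT))


def truncate_query(query: str, limit: int) -> str:
    if len(query) <= limit:
        return query
    if limit <= len(TRUNCATION_MARKER):
        return query[:limit]
    # Reserve room for the marker so the result never exceeds the limit.
    return query[: limit - len(TRUNCATION_MARKER)] + TRUNCATION_MARKER
```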

Default Views Support

  • Automatic default view application for all search operations
  • Fetches organization's default global view from DataHub
  • 5-minute caching (configurable via VIEW_CACHE_TTL_SECONDS)
  • Can be disabled via DATAHUB_MCP_DISABLE_DEFAULT_VIEW environment variable
  • Ensures search results respect organization's data governance policies

Dependencies

  • Added cachetools>=5.0.0: For GMS field detection caching
  • Added types-cachetools (dev): Type stubs for mypy

Performance

  • Memory efficiency: Generator-based result selection avoids loading all results into memory
  • Caching: GMS version detection cached per graph instance
  • Fast token estimation: Character-based heuristic (no tokenizer overhead)
  • Smart truncation: Truncates less important schema fields first

v0.3.10

10 Oct 04:00
8622b3d

fix: workaround for https://github.com/jlowin/fastmcp/issues/1377 (#50)

v0.3.9

05 Aug 16:48
f7d528a

What's Changed

New Contributors

Full Changelog: v0.3.8...v0.3.9

v0.3.8

26 Jul 00:20
b89d749

What's Changed

Full Changelog: v0.3.7...v0.3.8

v0.3.7

25 Jul 22:00
8a93f22

What's Changed

Full Changelog: v0.3.6...v0.3.7

v0.3.6

22 Jul 21:30
ea18c45

What's Changed

  • feat: make the search tool more robust by @hsheth2 in #32
  • bump acryl-datahub to use discriminated unions for filters by @hsheth2 in #33
  • perf: make all tools async by @hsheth2 in #34

Full Changelog: v0.3.5...v0.3.6

v0.3.5

18 Jul 19:41
ab7d023

What's Changed

Full Changelog: v0.3.4...v0.3.5