Releases: acryldata/mcp-server-datahub
v0.5.2
Fixed
- HTTP transport ContextVar propagation: Fixed `LookupError` for `_mcp_dh_client` ContextVar when running with HTTP transport (`stateless_http=True`). Each HTTP request runs in a separate async context that doesn't inherit ContextVars from the main thread, causing `DocumentToolsMiddleware` and `VersionFilterMiddleware` to fail. Added `_DataHubClientMiddleware` that sets the ContextVar at the start of every MCP message.
- `create_app()` initialization safety: The `_app_initialized` flag is now set only after all middleware is successfully added, so a failed setup can be retried.
- `--debug` middleware ordering: `LoggingMiddleware` is now added before other middlewares so it wraps the full request/response lifecycle for maximum visibility.
Added
- `create_app()` factory function: Extracted server setup into a factory function so that `fastmcp dev` / `fastmcp run` work correctly (they import the module but never call `main()`).
- Multi-mode smoke testing: `smoke_check.py` now supports `--url` and `--stdio-cmd` options to test against running HTTP/SSE servers or stdio subprocesses, in addition to the default in-process mode.
- `test_all_modes.sh` orchestrator: Runs smoke checks across all 5 transport modes (in-process, HTTP, SSE, stdio, `fastmcp run`), with per-mode log capture to `scripts/logs/`.
- `SMOKE_CHECK.md`: Documentation with step-by-step reproduction instructions for all transport modes.
- Core tool validation: Smoke check now verifies that all 8 core read-only tools are present, catching silent regressions in tool registration or middleware filtering.
Full Changelog: v0.5.1...v0.5.2
v0.5.1
Fixed
- `list_schema_fields`: Fixed crash when a dataset has no schema metadata. Now gracefully returns an empty fields list.
- `save_document`: Errors (e.g., authorization failures) are now raised as exceptions instead of being silently swallowed. LLM agents now see the actual error message.
- `update_description`: Hidden from OSS instances where entity-level description updates are not supported. Available on Cloud only.
Added
- `scripts/smoke_check.py`: Comprehensive smoke check script that exercises all available MCP tools against a live DataHub instance. Discovers URNs dynamically, respects version filtering middleware, and tests mutation tools with add-then-remove pairs. Usage: `uv run python scripts/smoke_check.py --all`
Changed
- Version-aware tool filtering: `update_description` now requires Cloud (`@min_version(cloud="0.3.16")`), previously also allowed on OSS >= 1.4.0.
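To make the filtering rule concrete, here is a sketch of how a `min_version`-style decorator could gate a tool. Only the `@min_version(cloud="0.3.16")` usage comes from the notes above. The decorator body, the `oss` keyword, `is_tool_available`, and the rule that a missing minimum hides the tool are all assumptions made for illustration.

```python
from typing import Callable, Optional, Tuple


def parse_version(text: str) -> Tuple[int, ...]:
    # "0.3.16" -> (0, 3, 16); assumes purely numeric dotted versions.
    return tuple(int(part) for part in text.split("."))


def min_version(cloud: Optional[str] = None, oss: Optional[str] = None) -> Callable:
    # Hypothetical decorator: records minimum versions on the function.
    def decorator(func):
        func._min_versions = {"cloud": cloud, "oss": oss}
        return func

    return decorator


def is_tool_available(func, is_cloud: bool, server_version: str) -> bool:
    requirements = getattr(func, "_min_versions", None)
    if requirements is None:
        return True  # undecorated tools are always shown
    required = requirements["cloud" if is_cloud else "oss"]
    if required is None:
        return False  # assumed: no minimum for this deployment type hides the tool
    return parse_version(server_version) >= parse_version(required)


@min_version(cloud="0.3.16")
def update_description():
    """Placeholder for the real tool."""
```

Under these assumptions, `is_tool_available(update_description, is_cloud=False, server_version="1.4.0")` is `False`, which matches the change described above.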
Full Changelog: v0.5.0...v0.5.1
v0.5.0
New Tools
Mutation Tools
New tools for modifying metadata in DataHub. Enabled via TOOLS_IS_MUTATION_ENABLED=true.
- `add_tags` / `remove_tags`: Add or remove tags from entities or schema fields. Supports bulk operations.
- `add_terms` / `remove_terms`: Add or remove glossary terms from entities or schema fields.
- `add_owners` / `remove_owners`: Add or remove ownership assignments. Supports different ownership types.
- `set_domains` / `remove_domains`: Assign or remove domain membership for entities.
- `update_description`: Update, append to, or remove descriptions for entities or schema fields.
- `add_structured_properties` / `remove_structured_properties`: Manage typed metadata fields on entities.
User Tools
- `get_me`: Retrieve information about the currently authenticated user. Enabled via `TOOLS_IS_USER_ENABLED=true`.
Document Tools
New tools for working with documents (knowledge articles, runbooks, FAQs) stored in DataHub. Automatically hidden when no documents exist in the catalog.
- `search_documents`: Search for documents using keyword search with filters.
- `grep_documents`: Search within document content using regex patterns.
- `save_document`: Save standalone documents to DataHub's knowledge base.
Enhancements
- Semantic search support: Enable AI-powered semantic search for documents via `SEMANTIC_SEARCH_ENABLED=true`.
- Document tools middleware: Automatically hides document tools when no documents exist, with a cached existence check (1-minute TTL).
- Upgraded FastMCP to 2.14.5 (from 2.10.5) with MCP SDK 1.26.0 compatibility.
- Relaxed pydantic pin to `>=2.0,<3` (was `<2.12`).
Breaking Changes
- Python 3.11+ is now required (previously 3.10+).
- `acryl-datahub >= 1.3.1.7` is now required.
- MCP Inspector: Use `fastmcp dev` instead of `mcp dev` for development.
Security
- Added `SECURITY.md` with vulnerability reporting guidelines.
- Bumped `authlib` (1.6.0 → 1.6.6), `urllib3` (2.4.0 → 2.6.3), `aiohttp` (3.12.7 → 3.13.3), `python-multipart` (0.0.20 → 0.0.22).
New Environment Variables
| Variable | Default | Description |
|---|---|---|
| `TOOLS_IS_MUTATION_ENABLED` | `false` | Enable mutation tools |
| `TOOLS_IS_USER_ENABLED` | `false` | Enable user tools |
| `DATAHUB_MCP_DOCUMENT_TOOLS_DISABLED` | `false` | Completely disable document tools |
| `SAVE_DOCUMENT_TOOL_ENABLED` | `true` | Enable/disable `save_document` |
| `SAVE_DOCUMENT_PARENT_TITLE` | `Shared` | Parent folder title for saved documents |
| `SAVE_DOCUMENT_ORGANIZE_BY_USER` | `false` | Organize saved documents by user |
| `SAVE_DOCUMENT_RESTRICT_UPDATES` | `true` | Only allow updating documents in shared folder |
| `SEMANTIC_SEARCH_ENABLED` | `false` | Enable semantic (AI-powered) search |
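
As a rough illustration of how boolean variables like these could be read with their documented defaults, consider the sketch below. The helper `env_flag` and the set of accepted truthy strings are assumptions; the server's actual parsing may differ.

```python
import os
from typing import Mapping


def env_flag(name: str, default: bool, env: Mapping[str, str] = os.environ) -> bool:
    # Hypothetical helper: an unset variable falls back to the documented default.
    raw = env.get(name)
    if raw is None:
        return default
    # Assumed set of truthy spellings; the real server may accept others.
    return raw.strip().lower() in {"1", "true", "yes", "on"}


def load_flags(env: Mapping[str, str] = os.environ) -> dict:
    # Variable names and defaults are taken from the table above.
    return {
        "mutation_tools": env_flag("TOOLS_IS_MUTATION_ENABLED", False, env),
        "user_tools": env_flag("TOOLS_IS_USER_ENABLED", False, env),
        "save_document": env_flag("SAVE_DOCUMENT_TOOL_ENABLED", True, env),
        "semantic_search": env_flag("SEMANTIC_SEARCH_ENABLED", False, env),
    }
```

Calling `load_flags({"TOOLS_IS_MUTATION_ENABLED": "true"})` enables mutation tools while leaving the other flags at their defaults.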
new mcp tools and other improvements
Response Token Budget Management
- New `TokenCountEstimator` class for fast token counting using character-based heuristics
- Automatic result truncation via `_select_results_within_budget()` to prevent context window issues
- Configurable token limits:
  - `TOOL_RESPONSE_TOKEN_LIMIT` environment variable (default: 80,000 tokens)
  - `ENTITY_SCHEMA_TOKEN_BUDGET` environment variable (default: 16,000 tokens per entity)
- 90% safety buffer to account for token estimation inaccuracies
- Ensures at least one result is always returned
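The budget rules listed above can be sketched as a small generator. This is not the project's `TokenCountEstimator` or `_select_results_within_budget()`: the function names are hypothetical, and the ratio of 4 characters per token is an assumed heuristic, since the notes do not state the real one.

```python
from typing import Iterable, Iterator

CHARS_PER_TOKEN = 4  # assumed heuristic; the real estimator's ratio is not stated


def estimate_tokens(text: str) -> int:
    # Character-based estimate, rounded up so short strings still cost one token.
    return len(text) // CHARS_PER_TOKEN + 1


def select_within_budget(results: Iterable[str], token_limit: int = 80_000) -> Iterator[str]:
    # Apply the 90% safety buffer using integer arithmetic.
    budget = token_limit * 9 // 10
    used = 0
    for index, result in enumerate(results):
        cost = estimate_tokens(result)
        # The first result is always yielded, even if it exceeds the budget.
        if index > 0 and used + cost > budget:
            break
        used += cost
        yield result
```

Because the function is a generator, results past the cut-off are never materialized, which matches the memory-efficiency goal stated elsewhere in these notes.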
Enhanced Search Capabilities
- Enhanced Keyword Search:
  - Supports pagination with `start` parameter
  - Added `viewUrn` for view-based filtering
  - Added `sortInput` for custom sorting
Query Entity Support
- Native QueryEntity type support (SQL queries as first-class entities)
- New `query_entity.gql` GraphQL query
- Optimized entity retrieval with specialized query for QueryEntity types
- Includes query statement, subjects (datasets/fields), and platform information
GraphQL Compatibility
- Adaptive field detection for newer GMS versions
- Caching mechanism for GMS version detection
- Graceful fallback when newer fields aren't available
- Support for `#[CLOUD]` and `#[NEWER_GMS]` conditional field markers
- `DISABLE_NEWER_GMS_FIELD_DETECTION` environment variable override
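One plausible reading of the conditional markers is that a line in a GraphQL document carrying a marker is dropped when the target server lacks that capability. The sketch below follows that assumption. The function name, the trailing-comment placement of the markers, and the field names in the example query are all hypothetical.

```python
def filter_marked_lines(query: str, is_cloud: bool, has_newer_gms: bool) -> str:
    # Assumption: each marker is a trailing comment on the line it guards.
    kept = []
    for line in query.splitlines():
        if "#[CLOUD]" in line and not is_cloud:
            continue  # Cloud-only field, target is not Cloud
        if "#[NEWER_GMS]" in line and not has_newer_gms:
            continue  # field needs a newer GMS than the target runs
        kept.append(line)
    return "\n".join(kept)


# Hypothetical query; the field names are placeholders, not real schema fields.
EXAMPLE_QUERY = """query example {
  entity {
    urn
    cloudOnlyField #[CLOUD]
    newerField #[NEWER_GMS]
  }
}"""

oss_old = filter_marked_lines(EXAMPLE_QUERY, is_cloud=False, has_newer_gms=False)
```

With both capabilities absent, `oss_old` keeps only the unmarked `urn` field, which is the graceful fallback behaviour described above.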
Schema Field Optimization
- Smart field prioritization to stay within token budgets:
  - Primary key fields (`isPartOfKey=true`)
  - Partitioning key fields (`isPartitioningKey=true`)
  - Fields with descriptions
  - Fields with tags or glossary terms
  - Alphabetically by field path
- Generator-based approach for memory efficiency
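Assuming the prioritization list is a strict priority order, it can be expressed as a sort key. The `SchemaField` dataclass and function names below are hypothetical and only mirror the attributes named in the list.

```python
from dataclasses import dataclass, field
from typing import Iterable, Iterator, List


@dataclass
class SchemaField:
    # Hypothetical model; attribute names mirror the list above.
    field_path: str
    is_part_of_key: bool = False
    is_partitioning_key: bool = False
    description: str = ""
    tags: List[str] = field(default_factory=list)
    glossary_terms: List[str] = field(default_factory=list)


def priority_key(f: SchemaField) -> tuple:
    # False sorts before True, so each "not ..." puts matching fields first.
    return (
        not f.is_part_of_key,
        not f.is_partitioning_key,
        not f.description,
        not (f.tags or f.glossary_terms),
        f.field_path,  # alphabetical tie-breaker
    )


def prioritized(fields: Iterable[SchemaField]) -> Iterator[SchemaField]:
    # Yields fields lazily so a caller can stop once its token budget is spent.
    yield from sorted(fields, key=priority_key)
```

A caller would consume `prioritized(...)` until the per-entity budget is reached, so the least informative fields are the first to be truncated.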
Error Handling & Security
- Enhanced error logging with full stack traces in `async_background` wrapper
- Logs function name, args, and kwargs on failures
- ReDoS protection in HTML sanitization with bounded regex patterns
- Query truncation function (configurable via `QUERY_LENGTH_HARD_LIMIT`, default: 5,000 chars)
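A hard length limit on query text could look like the sketch below. The function name, the suffix text, and the choice to keep the result within the limit including the suffix are assumptions; only the environment variable name and the 5,000-character default come from the notes above.

```python
import os

# Documented variable name and default; read once at import time in this sketch.
QUERY_LENGTH_HARD_LIMIT = int(os.environ.get("QUERY_LENGTH_HARD_LIMIT", "5000"))


def truncate_query(query: str, limit: int = QUERY_LENGTH_HARD_LIMIT,
                   suffix: str = "... [truncated]") -> str:
    # Hypothetical helper: short queries pass through unchanged.
    if len(query) <= limit:
        return query
    # Reserve room for the suffix so the result never exceeds `limit`.
    keep = max(limit - len(suffix), 0)
    return (query[:keep] + suffix)[:limit]
```

Truncating before the text reaches the response keeps a single very long SQL statement from consuming the token budget on its own.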
Default Views Support
- Automatic default view application for all search operations
- Fetches organization's default global view from DataHub
- 5-minute caching (configurable via `VIEW_CACHE_TTL_SECONDS`)
- Can be disabled via `DATAHUB_MCP_DISABLE_DEFAULT_VIEW` environment variable
- Ensures search results respect organization's data governance policies
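The time-based caching described above can be sketched with a small TTL wrapper. The class `CachedValue`, the helper `view_cache_ttl`, and the injected clock are hypothetical; only the variable name `VIEW_CACHE_TTL_SECONDS` and the 5-minute default come from the notes.

```python
import os
import time
from typing import Any, Callable, Mapping, Optional


def view_cache_ttl(env: Mapping[str, str] = os.environ) -> float:
    # 300 seconds matches the documented 5-minute default.
    return float(env.get("VIEW_CACHE_TTL_SECONDS", "300"))


class CachedValue:
    """Caches the result of `fetch` and refreshes it once the TTL has elapsed."""

    def __init__(self, fetch: Callable[[], Any], ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._fetch = fetch
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so the expiry logic can be tested
        self._value: Any = None
        self._expires_at: Optional[float] = None  # None means nothing cached yet

    def get(self) -> Any:
        now = self._clock()
        if self._expires_at is None or now >= self._expires_at:
            self._value = self._fetch()
            self._expires_at = now + self._ttl
        return self._value
```

In use, `fetch` would be whatever call retrieves the organization's default view, so repeated searches within the TTL window do not trigger a new lookup.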
Dependencies
- Added `cachetools>=5.0.0`: For GMS field detection caching
- Added `types-cachetools` (dev): Type stubs for mypy
Performance
- Memory efficiency: Generator-based result selection avoids loading all results into memory
- Caching: GMS version detection cached per graph instance
- Fast token estimation: Character-based heuristic (no tokenizer overhead)
- Smart truncation: Truncates less important schema fields first
v0.3.10
fix: workaround for https://github.com/jlowin/fastmcp/issues/1377 (#50)
v0.3.9
What's Changed
- feat: support get_dataset_queries, get_lineage for specific column by @mayurinehate in #37
- feat: add view definition in get_entity by @mayurinehate in #38
New Contributors
- @mayurinehate made their first contribution in #37
Full Changelog: v0.3.8...v0.3.9