Releases: txn2/mcp-data-platform

mcp-data-platform-v0.9.3

30 Jan 09:28
8152041

SQL Query Semantic Enrichment

Overview

Multi-table semantic enrichment for SQL queries. The enrichment middleware parses SQL to identify all referenced tables and fetches semantic context for each.

Features

  • Multi-table extraction - Identifies all tables in FROM and JOIN clauses
  • Elasticsearch raw_query support - Extracts indices from TABLE(elasticsearch.system.raw_query(...))
  • CTE filtering - Excludes Common Table Expressions from enrichment (not physical tables)
  • Trino 3-part names - Full support for catalog.schema.table syntax

Response Format

Query responses include semantic context for all referenced tables:

{
  "semantic_context": {
    "description": "Primary table description",
    "owners": [...],
    "tags": [...]
  },
  "additional_tables": [
    {"table": "catalog.schema.joined_table", "description": "..."}
  ]
}

Files Changed

File                               Change
pkg/middleware/sqlextract.go       SQL parsing and table extraction
pkg/middleware/sqlextract_test.go  Extraction tests
pkg/middleware/semantic.go         Integration with enrichment pipeline
pkg/middleware/semantic_test.go    Enrichment tests

Test Coverage

95.5% coverage on middleware package.

Installation

Homebrew (macOS)

brew install txn2/tap/mcp-data-platform

Claude Code CLI

claude mcp add mcp-data-platform -- mcp-data-platform

Docker

docker pull ghcr.io/txn2/mcp-data-platform:v0.9.3

Verification

All release artifacts are signed with Cosign. Verify with:

cosign verify-blob --bundle mcp-data-platform_0.9.3_linux_amd64.tar.gz.sigstore.json \
  mcp-data-platform_0.9.3_linux_amd64.tar.gz

mcp-data-platform-v0.9.2

30 Jan 07:05
d68f5fa

Highlights

This release introduces multi-table SQL extraction for semantic enrichment, enabling the platform to provide complete business context when LLMs execute complex queries that span multiple data sources (Elasticsearch, Cassandra, PostgreSQL, etc.).

New Features

Multi-Table Semantic Enrichment

When an LLM executes a Trino query that references multiple tables, the semantic layer now identifies and enriches ALL physical tables in the query:

  • Elasticsearch raw_query support - Extracts indices from TABLE(elasticsearch.system.raw_query(...)) including comma-separated multi-index queries
  • JOIN extraction - Identifies all tables in INNER/LEFT/RIGHT/CROSS JOINs
  • CTE filtering - Automatically excludes Common Table Expressions (WITH clauses) from enrichment since they're not physical tables
  • Deduplication - Ensures each table is enriched only once even if referenced multiple times

Example:

WITH es_response AS (
    SELECT result FROM TABLE(elasticsearch.system.raw_query(
        index => 'sales-2024,sales-2025', query => '{...}'
    ))
),
parsed AS (SELECT * FROM es_response)
SELECT * FROM parsed p
JOIN cassandra.prod.locations loc ON p.id = loc.id

Extracted tables:

  • elasticsearch.default.sales-2024
  • elasticsearch.default.sales-2025
  • cassandra.prod.locations

Filtered (CTEs): es_response, parsed
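
The behavior in this example can be approximated with a short regex-based sketch. It is not the code in pkg/middleware/sqlextract.go (which, per the Dependencies section, uses AST-based extraction); the function name extractTables and the three patterns are hypothetical, and the elasticsearch.default prefix for raw_query indices is assumed from the extracted-table list above.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

var (
	// cteRe captures CTE names: "WITH name AS (" or ", name AS (".
	cteRe = regexp.MustCompile(`(?i)(?:\bwith\s+|,\s*)([a-z_][a-z0-9_]*)\s+as\s*\(`)
	// tableRe captures the identifier that follows FROM or JOIN.
	tableRe = regexp.MustCompile(`(?i)\b(?:from|join)\s+([a-z_][a-z0-9_.]*)`)
	// indexRe captures the quoted index argument passed to raw_query.
	indexRe = regexp.MustCompile(`(?i)\bindex\s*=>\s*'([^']*)'`)
)

// extractTables returns physical tables referenced by sql, skipping CTE
// names and duplicates. Illustrative only; real SQL needs a proper parser.
func extractTables(sql string) []string {
	ctes := map[string]bool{}
	for _, m := range cteRe.FindAllStringSubmatch(sql, -1) {
		ctes[strings.ToLower(m[1])] = true
	}
	seen := map[string]bool{}
	var out []string
	add := func(name string) {
		if key := strings.ToLower(name); !seen[key] {
			seen[key] = true
			out = append(out, name)
		}
	}
	// Comma-separated Elasticsearch indices become one table each.
	for _, m := range indexRe.FindAllStringSubmatch(sql, -1) {
		for _, idx := range strings.Split(m[1], ",") {
			if idx = strings.TrimSpace(idx); idx != "" {
				add("elasticsearch.default." + idx)
			}
		}
	}
	for _, m := range tableRe.FindAllStringSubmatch(sql, -1) {
		name := m[1]
		// "FROM TABLE(...)" is a table function call, not a table name.
		if strings.EqualFold(name, "table") || ctes[strings.ToLower(name)] {
			continue
		}
		add(name)
	}
	return out
}

func main() {
	sql := `WITH es_response AS (
    SELECT result FROM TABLE(elasticsearch.system.raw_query(
        index => 'sales-2024,sales-2025', query => '{...}'
    ))
),
parsed AS (SELECT * FROM es_response)
SELECT * FROM parsed p
JOIN cassandra.prod.locations loc ON p.id = loc.id`
	for _, t := range extractTables(sql) {
		fmt.Println(t)
	}
}
```

A regex pass like this misses nested or quoted identifiers, which is one reason an AST-based extractor is the safer primary path.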

Enriched Response Format

Query results now include semantic context for multiple tables:

{
  "semantic_context": {
    "description": "Primary table description",
    "owners": ["data-team@example.com"],
    "tags": ["revenue", "pii"],
    "domain": "Sales",
    "quality_score": 0.95
  },
  "column_context": {
    "customer_id": {
      "description": "Unique customer identifier",
      "is_pii": true,
      "glossary_terms": ["Customer ID"]
    }
  },
  "additional_tables": [
    {
      "table": "elasticsearch.default.sales-2025",
      "description": "2025 sales transactions",
      "owners": ["sales-data@example.com"]
    },
    {
      "table": "cassandra.prod.locations",
      "description": "Store location master data",
      "tags": ["master-data"]
    }
  ]
}
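
For clients that consume this payload, a minimal decoding sketch follows. The Go type names are hypothetical and only mirror the JSON keys shown in the example; they are not the platform's own types.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// TableContext mirrors the table-level keys in the example response.
type TableContext struct {
	Table        string   `json:"table,omitempty"`
	Description  string   `json:"description,omitempty"`
	Owners       []string `json:"owners,omitempty"`
	Tags         []string `json:"tags,omitempty"`
	Domain       string   `json:"domain,omitempty"`
	QualityScore float64  `json:"quality_score,omitempty"`
}

// ColumnContext mirrors one entry of column_context.
type ColumnContext struct {
	Description   string   `json:"description,omitempty"`
	IsPII         bool     `json:"is_pii,omitempty"`
	GlossaryTerms []string `json:"glossary_terms,omitempty"`
}

// EnrichedResponse is the envelope; additional_tables may be absent.
type EnrichedResponse struct {
	SemanticContext  *TableContext            `json:"semantic_context,omitempty"`
	ColumnContext    map[string]ColumnContext `json:"column_context,omitempty"`
	AdditionalTables []TableContext           `json:"additional_tables,omitempty"`
}

// decodeEnriched parses an enriched response body.
func decodeEnriched(body string) (EnrichedResponse, error) {
	var r EnrichedResponse
	err := json.Unmarshal([]byte(body), &r)
	return r, err
}

func main() {
	r, err := decodeEnriched(`{
	  "semantic_context": {"description": "Primary table description", "tags": ["revenue", "pii"]},
	  "additional_tables": [{"table": "cassandra.prod.locations", "tags": ["master-data"]}]
	}`)
	if err != nil {
		fmt.Println("decode error:", err)
		return
	}
	fmt.Println(r.SemanticContext.Description)
	for _, t := range r.AdditionalTables {
		fmt.Println(t.Table, t.Tags)
	}
}
```

Because additional_tables is decoded into a slice, a single-table response simply yields an empty slice and needs no special handling.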

Technical Changes

New Files

File                               Description
pkg/middleware/sqlextract.go       SQL table extraction with CTE filtering
pkg/middleware/sqlextract_test.go  Comprehensive test coverage

Modified Files

File                             Changes
pkg/middleware/semantic.go       Integration with multi-table extraction
pkg/middleware/semantic_test.go  Updated tests for multi-table scenarios

Dependencies

Dependency                    Version   Notes
github.com/xwb1989/sqlparser  existing  Used for AST-based table extraction
github.com/txn2/mcp-datahub   v0.4.4    Semantic metadata provider

Breaking Changes

None. The enrichment format is additive: the existing semantic_context structure is preserved, and the new additional_tables array is added only when multiple tables are detected.

Migration Guide

No migration required. The feature activates automatically when:

  1. EnrichTrinoResults: true in configuration
  2. A semantic provider (DataHub) is configured
  3. SQL queries reference multiple tables

Configuration

No new configuration options. Existing semantic enrichment settings apply:

semantic:
  provider: datahub
  instance: primary

injection:
  trino_semantic_enrichment: true

Testing

# Run extraction tests
go test -v ./pkg/middleware/... -run TestExtractTablesFromSQL

# Run all middleware tests with race detection
go test -race ./pkg/middleware/...

# Full CI suite
go test -race ./...
golangci-lint run ./...
gosec ./...

Known Limitations

  1. Subquery depth - Deeply nested subqueries may not be fully extracted by the regex fallback
  2. Dynamic SQL - SQL constructed at runtime (e.g., in stored procedures) cannot be analyzed
  3. View resolution - Views are treated as tables; underlying table lineage requires DataHub lineage data

What's Next

  • v0.9.3 - Query pattern detection for smarter context selection
  • v0.10.0 - Column-level lineage propagation across JOINs

Installation

Homebrew (macOS)

brew install txn2/tap/mcp-data-platform

Claude Code CLI

claude mcp add mcp-data-platform -- mcp-data-platform

Docker

docker pull ghcr.io/txn2/mcp-data-platform:v0.9.2

Verification

All release artifacts are signed with Cosign. Verify with:

cosign verify-blob --bundle mcp-data-platform_0.9.2_linux_amd64.tar.gz.sigstore.json \
  mcp-data-platform_0.9.2_linux_amd64.tar.gz

mcp-data-platform-v0.9.1

29 Jan 06:14
d68f5fa

Highlights

  • Fixed: Search entity type case normalization for DataHub GraphQL queries

Breaking Changes

None.

Dependency Updates

Package                      Old     New
github.com/txn2/mcp-datahub  v0.4.3  v0.4.4

Bug Fixes

Search Entity Type Case Normalization

Search requests using camelCase entity types would fail with GraphQL validation errors:

Invalid input for enum 'EntityType'. No value found for name 'glossaryTerm'

Entity types are now automatically normalized to SCREAMING_SNAKE_CASE format:

Input         Normalized
glossaryTerm  GLOSSARY_TERM
dataProduct   DATA_PRODUCT
corpUser      CORP_USER
dataFlow      DATA_FLOW

This fix is backward-compatible with existing code using uppercase entity types.
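
A minimal sketch of this kind of normalization is shown here. The function name normalizeEntityType is hypothetical and the actual implementation may differ; the sketch only reproduces the mappings listed above and leaves already-uppercase input unchanged.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// normalizeEntityType converts camelCase to SCREAMING_SNAKE_CASE.
// An underscore is inserted only at a lower-to-upper (or digit-to-upper)
// boundary, so input that is already upper case passes through as-is.
func normalizeEntityType(s string) string {
	var b strings.Builder
	var prev rune
	for _, r := range s {
		if unicode.IsUpper(r) && (unicode.IsLower(prev) || unicode.IsDigit(prev)) {
			b.WriteByte('_')
		}
		b.WriteRune(unicode.ToUpper(r))
		prev = r
	}
	return b.String()
}

func main() {
	for _, in := range []string{"glossaryTerm", "dataProduct", "corpUser", "dataFlow", "GLOSSARY_TERM"} {
		fmt.Printf("%s -> %s\n", in, normalizeEntityType(in))
	}
}
```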

Upgrade Instructions

  1. Update your go.mod or pull the latest container image
  2. No configuration changes required

Verification

# Test search with camelCase entity types
datahub_search query="test" entity_types=["glossaryTerm"]

Installation

Homebrew (macOS)

brew install txn2/tap/mcp-data-platform

Claude Code CLI

claude mcp add mcp-data-platform -- mcp-data-platform

Docker

docker pull ghcr.io/txn2/mcp-data-platform:v0.9.1

Verification

All release artifacts are signed with Cosign. Verify with:

cosign verify-blob --bundle mcp-data-platform_0.9.1_linux_amd64.tar.gz.sigstore.json \
  mcp-data-platform_0.9.1_linux_amd64.tar.gz

mcp-data-platform-v0.9.0

28 Jan 08:33
a1ce3f6

Highlights

  • Fixed: Column lineage searches now work with older DataHub GMS versions
  • Fixed: Cache key generation for lineage depth >= 10
  • New: platform_info tool with customizable server description
  • Improved: Performance optimization in table name parsing
  • Improved: Comprehensive lineage configuration documentation

Breaking Changes

None.

Dependency Updates

Package                      Old     New
github.com/txn2/mcp-datahub  v0.4.2  v0.4.3

Bug Fixes

Column Lineage GraphQL Compatibility

Searches with prefer_column_lineage: true no longer fail with GraphQL schema errors on older DataHub versions. The upstream mcp-datahub v0.4.3 removes an invalid field from the column lineage query.

Cache Key Generation

Fixed cache key generation for lineage lookups where max_hops >= 10. Previously, depths of 10 or more produced incorrect cache keys, causing cache misses or collisions.
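
The release notes do not state the root cause. As an illustration only, the sketch below shows one common way a depth of 10 or more can corrupt a key (converting the depth to a single character) next to a formatting that stays correct for any depth. Both function names are hypothetical.

```go
package main

import (
	"fmt"
	"strconv"
)

// naiveKey is a hypothetical buggy variant: it maps the depth to a single
// character, which is a digit only for depths 0 through 9.
func naiveKey(urn string, maxHops int) string {
	return urn + "|hops=" + string(rune('0'+maxHops))
}

// safeKey formats the depth as a decimal number, so multi-digit depths
// produce distinct, readable keys.
func safeKey(urn string, maxHops int) string {
	return urn + "|hops=" + strconv.Itoa(maxHops)
}

func main() {
	urn := "urn:li:dataset:example"
	for _, hops := range []int{2, 9, 10, 12} {
		fmt.Printf("hops=%d naive=%q safe=%q\n", hops, naiveKey(urn, hops), safeKey(urn, hops))
	}
}
```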

New Features

Platform Info Tool

New platform_info tool provides deployment-specific context to agents:

server:
  name: acme-data-platform
  version: "1.0.0"
  description: |
    Acme POS Data Platform providing access to retail data.

Agents can call platform_info to understand available toolkits, enabled features, and the platform's purpose.

Configuration Changes

New Server Options

Option              Type    Default  Description
server.version      string  1.0.0    Server version
server.description  string  -        Platform description for platform_info tool

Documentation Additions

  • DataHub debug field for GraphQL logging
  • Complete semantic.lineage configuration section
  • Lineage-aware semantic enrichment guide with examples

Upgrade Instructions

  1. Update your go.mod or pull the latest container image
  2. Optionally add server.version and server.description to your config
  3. No other changes required

Verification

# Test that searches work with column lineage enabled
datahub_search query="your_table"

# Verify platform_info tool
platform_info

Installation

Homebrew (macOS)

brew install txn2/tap/mcp-data-platform

Claude Code CLI

claude mcp add mcp-data-platform -- mcp-data-platform

Docker

docker pull ghcr.io/txn2/mcp-data-platform:v0.9.0

Verification

All release artifacts are signed with Cosign. Verify with:

cosign verify-blob --bundle mcp-data-platform_0.9.0_linux_amd64.tar.gz.sigstore.json \
  mcp-data-platform_0.9.0_linux_amd64.tar.gz

mcp-data-platform-v0.8.2

28 Jan 04:19
0c2fae6

Overview

This release upgrades mcp-datahub from v0.4.0 to v0.4.2, bringing improved error visibility and debug logging capabilities.

What's New

Debug Logging Support

Added debug configuration option for DataHub instances to enable detailed logging of GraphQL operations:

toolkits:
  datahub:
    instances:
      primary:
        endpoint: "https://datahub.example.com/api/graphql"
        token: "${DATAHUB_TOKEN}"
        debug: true  # Enable debug logging

When enabled, logs include:

  • GraphQL operation names and latency
  • Request/response sizes
  • Retry attempts
  • Detailed error context with URNs

Upstream Dependency Updates

mcp-datahub v0.4.1

  • Fixed: column-level glossary terms, tags, and descriptions added via the DataHub UI were not being returned
  • Queries now fetch both schemaMetadata and editableSchemaMetadata, merging with UI edits taking precedence

mcp-datahub v0.4.2

  • New Logger interface for structured logging
  • Improved error messages for GetEntity failures (includes URN and context)
  • Fixed null data detection (JSON null vs Go nil)
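
The "JSON null vs Go nil" distinction is easy to trip over in Go. The sketch below shows the general pitfall, not the actual mcp-datahub fix: when a field is declared as json.RawMessage, a JSON null is stored as the four bytes "null" rather than as a nil slice, so a plain nil check reports data that is not there. The type and function names are hypothetical.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
)

// graphQLResponse is a hypothetical envelope with a lazily decoded data field.
type graphQLResponse struct {
	Data json.RawMessage `json:"data"`
}

// hasData reports whether the response carries a non-null data payload.
// It treats both a missing field and an explicit JSON null as "no data".
func hasData(body string) (bool, error) {
	var resp graphQLResponse
	if err := json.Unmarshal([]byte(body), &resp); err != nil {
		return false, err
	}
	raw := bytes.TrimSpace(resp.Data)
	return len(raw) > 0 && !bytes.Equal(raw, []byte("null")), nil
}

func main() {
	for _, body := range []string{`{"data": null}`, `{}`, `{"data": {"dataset": {}}}`} {
		ok, err := hasData(body)
		fmt.Printf("%-28s hasData=%v err=%v\n", body, ok, err)
	}
}
```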

Breaking Changes

None. All changes are backward compatible.

Upgrade Instructions

Update your go.mod:

go get github.com/txn2/mcp-data-platform@v0.8.2

No configuration changes required. The debug option is optional and defaults to false.

Installation

Homebrew (macOS)

brew install txn2/tap/mcp-data-platform

Claude Code CLI

claude mcp add mcp-data-platform -- mcp-data-platform

Docker

docker pull ghcr.io/txn2/mcp-data-platform:v0.8.2

Verification

All release artifacts are signed with Cosign. Verify with:

cosign verify-blob --bundle mcp-data-platform_0.8.2_linux_amd64.tar.gz.sigstore.json \
  mcp-data-platform_0.8.2_linux_amd64.tar.gz

mcp-data-platform-v0.8.1

27 Jan 18:00
c4452b8

Lineage Configuration Wiring + Timeout/Cache Implementation

This release completes the lineage-aware semantic enrichment feature introduced in v0.8.0 by wiring configuration through the platform and implementing timeout and caching for lineage resolution operations.

Features

Lineage Configuration Wiring

  • SemanticConfig now includes the Lineage field for configuring lineage-aware column inheritance
  • Platform properly passes lineage configuration to the DataHub semantic adapter
  • Example configuration added to configs/platform.yaml

Lineage Resolution Timeout

  • Configurable timeout for lineage resolution operations via lineage.timeout
  • Prevents runaway queries when DataHub is slow or unresponsive
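
A timeout of this kind is usually enforced with a context deadline (the Testing section mentions "context deadline"). The following is a minimal sketch under that assumption; the resolver type, withTimeout, and the simulated slow backend are all hypothetical stand-ins for the real lineage resolver.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// resolver stands in for a DataHub lineage lookup (hypothetical signature).
type resolver func(ctx context.Context, urn string) (string, error)

// withTimeout bounds a single lookup with a deadline, as lineage.timeout would.
func withTimeout(parent context.Context, timeout time.Duration, urn string, r resolver) (string, error) {
	ctx, cancel := context.WithTimeout(parent, timeout)
	defer cancel() // always release the timer
	return r(ctx, urn)
}

// slowResolver simulates a backend that answers after delay unless cancelled.
func slowResolver(delay time.Duration) resolver {
	return func(ctx context.Context, urn string) (string, error) {
		select {
		case <-time.After(delay):
			return "lineage for " + urn, nil
		case <-ctx.Done():
			return "", ctx.Err()
		}
	}
}

// lookupMillis is a convenience wrapper for the demo: both arguments are
// milliseconds.
func lookupMillis(timeoutMs, delayMs int) (string, error) {
	return withTimeout(context.Background(),
		time.Duration(timeoutMs)*time.Millisecond, "urn:example",
		slowResolver(time.Duration(delayMs)*time.Millisecond))
}

func main() {
	_, err := lookupMillis(10, 500) // backend slower than the timeout
	fmt.Println("timed out:", errors.Is(err, context.DeadlineExceeded))

	v, err := lookupMillis(500, 1) // backend answers in time
	fmt.Println(v, err)
}
```

The deadline only helps if the underlying call honors the context, which is why the simulated resolver selects on ctx.Done().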

Lineage Resolution Caching

  • Thread-safe caching of lineage resolution results via lineage.cache_ttl
  • Reduces redundant API calls to DataHub for repeated column lookups
  • TTL-based expiration with automatic cleanup
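
A thread-safe TTL cache with copy semantics can be sketched as follows. This is an illustration, not the code in lineage_resolver.go: the LineageCache type, its methods, and the injectable clock are hypothetical, and expired entries are evicted lazily on read, which may differ from the platform's cleanup strategy.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

type cacheEntry struct {
	terms   []string
	expires time.Time
}

// LineageCache is a mutex-guarded TTL cache keyed by column identifier.
type LineageCache struct {
	mu    sync.Mutex
	ttl   time.Duration
	now   func() time.Time // injectable clock, so expiry can be tested
	items map[string]cacheEntry
}

func NewLineageCache(ttl time.Duration, now func() time.Time) *LineageCache {
	return &LineageCache{ttl: ttl, now: now, items: map[string]cacheEntry{}}
}

// Set stores a copy of terms so later mutations by the caller do not leak in.
func (c *LineageCache) Set(key string, terms []string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.items[key] = cacheEntry{
		terms:   append([]string(nil), terms...),
		expires: c.now().Add(c.ttl),
	}
}

// Get returns a copy of the cached value, evicting it once the TTL has passed.
func (c *LineageCache) Get(key string) ([]string, bool) {
	c.mu.Lock()
	defer c.mu.Unlock()
	e, ok := c.items[key]
	if !ok {
		return nil, false
	}
	if !c.now().Before(e.expires) {
		delete(c.items, key)
		return nil, false
	}
	return append([]string(nil), e.terms...), true
}

// fakeClock returns a controllable clock for demonstrations and tests.
func fakeClock() (now func() time.Time, advance func(time.Duration)) {
	t := time.Unix(0, 0)
	return func() time.Time { return t }, func(d time.Duration) { t = t.Add(d) }
}

func main() {
	now, advance := fakeClock()
	c := NewLineageCache(10*time.Minute, now)
	c.Set("orders.amount", []string{"Net Sale Amount"})

	_, hit := c.Get("orders.amount")
	fmt.Println("hit before expiry:", hit)

	advance(11 * time.Minute)
	_, hit = c.Get("orders.amount")
	fmt.Println("hit after expiry:", hit)
}
```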

Configuration

semantic:
  provider: datahub
  instance: primary
  lineage:
    enabled: true
    max_hops: 2
    inherit:
      - glossary_terms
      - descriptions
    conflict_resolution: nearest
    prefer_column_lineage: true
    cache_ttl: 10m    # Cache lineage results
    timeout: 5s       # Timeout for lineage operations

Testing

Added comprehensive test coverage:

  • YAML configuration parsing
  • Platform-to-adapter wiring verification
  • Adapter lineage behavior (enabled/disabled)
  • Cache functionality (TTL, expiration, copy semantics)
  • Timeout functionality (context deadline)

Files Changed

  • pkg/platform/config.go
  • pkg/platform/platform.go
  • pkg/platform/config_test.go
  • pkg/platform/platform_test.go
  • pkg/semantic/datahub/adapter.go
  • pkg/semantic/datahub/adapter_test.go
  • pkg/semantic/datahub/lineage_resolver.go
  • pkg/semantic/datahub/lineage_resolver_test.go
  • configs/platform.yaml

Upgrade Notes

No breaking changes. Existing configurations continue to work. To enable the new caching and timeout features, add cache_ttl and timeout to your semantic.lineage configuration.

Installation

Homebrew (macOS)

brew install txn2/tap/mcp-data-platform

Claude Code CLI

claude mcp add mcp-data-platform -- mcp-data-platform

Docker

docker pull ghcr.io/txn2/mcp-data-platform:v0.8.1

Verification

All release artifacts are signed with Cosign. Verify with:

cosign verify-blob --bundle mcp-data-platform_0.8.1_linux_amd64.tar.gz.sigstore.json \
  mcp-data-platform_0.8.1_linux_amd64.tar.gz

mcp-data-platform-v0.8.0

27 Jan 09:12
f486db3

Release Notes: v0.8.0

Highlights

Lineage-Aware Semantic Enrichment - Downstream datasets now automatically inherit column metadata from upstream sources via DataHub lineage. Query an undocumented Elasticsearch index and get descriptions, glossary terms, and tags from its Cassandra source table.

New Features

Lineage-Aware Column Inheritance

When describing a table with undocumented columns, the platform now:

  1. Queries DataHub for upstream lineage
  2. Fetches schemas from upstream datasets
  3. Matches columns and inherits metadata
  4. Returns enriched response with provenance tracking

Configuration:

semantic:
  provider: datahub
  instance: primary

  lineage:
    enabled: true
    max_hops: 2
    inherit:
      - glossary_terms
      - descriptions
      - tags
    prefer_column_lineage: true

    column_transforms:
      - strip_prefix: "rxtxmsg.payload."

    aliases:
      - source: "cassandra.prod_fuse.system_sale"
        targets:
          - "elasticsearch.default.jakes-sale-*"

Response format:

{
  "column_context": {
    "amount": {
      "description": "Net sale amount",
      "glossary_terms": [{"name": "Net Sale Amount"}],
      "inherited_from": {
        "source_dataset": "urn:li:dataset:cassandra.system_sale",
        "source_column": "initial_net",
        "hops": 1,
        "match_method": "name_transformed"
      }
    }
  },
  "inheritance_sources": ["urn:li:dataset:cassandra.system_sale"]
}

Match methods:

Method            Description
column_lineage    DataHub column-level lineage edges
name_exact        Exact column name match
name_transformed  Match after strip prefix/suffix
alias             Explicit alias configuration
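
To make the name-based methods concrete, here is a small sketch of name_exact and name_transformed matching with a strip_prefix transform. The function matchColumn is hypothetical, and trying the exact match before the transformed one is an assumption; the release notes do not specify the precedence, and column_lineage and alias matching are omitted.

```go
package main

import (
	"fmt"
	"strings"
)

// matchColumn looks for an upstream column corresponding to col.
// It returns the upstream column name and the match method that succeeded.
func matchColumn(col string, upstream map[string]bool, stripPrefixes []string) (string, string, bool) {
	if upstream[col] {
		return col, "name_exact", true
	}
	for _, p := range stripPrefixes {
		if !strings.HasPrefix(col, p) {
			continue
		}
		if trimmed := strings.TrimPrefix(col, p); upstream[trimmed] {
			return trimmed, "name_transformed", true
		}
	}
	return "", "", false
}

func main() {
	upstream := map[string]bool{"amount": true, "store_id": true}
	prefixes := []string{"rxtxmsg.payload."}

	for _, col := range []string{"store_id", "rxtxmsg.payload.amount", "unknown"} {
		src, method, ok := matchColumn(col, upstream, prefixes)
		fmt.Printf("%-24s ok=%-5v source=%q method=%q\n", col, ok, src, method)
	}
}
```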

Dependencies

  • Requires mcp-datahub v0.4.0 (for GetColumnLineage and GetSchemas methods)

Breaking Changes

None. Lineage inheritance is opt-in (lineage.enabled: false by default).

Dependency Updates

Dependency            From     To
Alpine (Docker)       3.21     3.23
actions/setup-python  6.1.0    6.2.0
actions/checkout      6.0.1    6.0.2
anchore/sbom-action   0.21.1   0.22.0
github/codeql-action  4.31.10  4.31.11

Documentation

  • Updated README with new banner image
  • Added lineage inheritance documentation
  • Updated llms.txt and llms-full.txt per llmstxt.org spec

Contributors

  • Lineage-Aware Semantic Enrichment (#26)
  • Docs/README updates (#19, #20)
  • Dependency updates via Dependabot (#21-25)

Changelog

Others

  • 5addde9: Add documentation for automatic column metadata inheritance via DataHub lineage (@cjimti)
  • fc5259f: Docs/readme updates (#19) (@cjimti)
  • f486db3: Document lineage-aware column metadata inheritance in DataHub, including configuration options, match methods, and use cases. (@cjimti)
  • b20b97d: Document lineage-aware inheritance in DataHub, including configuration, metadata context, and provenance tracking. (@cjimti)
  • 415e7e7: Lineage-Aware Semantic Enrichment (#26) (@cjimti)
  • 2c667c1: Update README to replace logo with banner image and add the new banner file (#20) (@cjimti)
  • f55c092: ci: bump actions/checkout from 6.0.1 to 6.0.2 (#24) (@dependabot[bot])
  • 32f7e98: ci: bump actions/setup-python from 6.1.0 to 6.2.0 (#22) (@dependabot[bot])
  • 34b4353: ci: bump anchore/sbom-action from 0.21.1 to 0.22.0 (#23) (@dependabot[bot])
  • 13b2b15: ci: bump github/codeql-action from 4.31.10 to 4.31.11 (#25) (@dependabot[bot])
  • afea984: docker: bump alpine from 3.21 to 3.23 (#21) (@dependabot[bot])

Installation

Homebrew (macOS)

brew install txn2/tap/mcp-data-platform

Claude Code CLI

claude mcp add mcp-data-platform -- mcp-data-platform

Docker

docker pull ghcr.io/txn2/mcp-data-platform:v0.8.0

Verification

All release artifacts are signed with Cosign. Verify with:

cosign verify-blob --bundle mcp-data-platform_0.8.0_linux_amd64.tar.gz.sigstore.json \
  mcp-data-platform_0.8.0_linux_amd64.tar.gz

mcp-data-platform-v0.7.1

24 Jan 08:46
979e1e1

Query context enrichment for datahub_get_schema

datahub_get_schema responses now include query_context with query_table, matching the behavior of datahub_search, datahub_get_entity, and datahub_get_lineage.

What changed

The enrichment logic now extracts URNs from request parameters in addition to response content. Tools that take a urn parameter (like datahub_get_schema) get query context even when the response doesn't echo the URN back.

Affected tools

  • datahub_get_schema
  • datahub_get_entity
  • datahub_get_lineage

Example response

{
  "fields": [
    {"name": "order_id", "type": "NUMBER", "description": "Unique order identifier"}
  ],
  "primary_keys": ["order_id"],
  "query_context": {
    "urn:li:dataset:(urn:li:dataPlatform:trino,hive.sales.orders,PROD)": {
      "available": true,
      "query_table": "hive.sales.orders",
      "connection": "production",
      "estimated_rows": 1500000
    }
  }
}
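
The query_context key in this example is a DataHub dataset URN of the form urn:li:dataset:(urn:li:dataPlatform:<platform>,<name>,<env>). A sketch of pulling the parts out of such a URN follows; parseDatasetURN is a hypothetical helper, not the platform's extraction code, and it does not apply any catalog mapping.

```go
package main

import (
	"fmt"
	"strings"
)

// parseDatasetURN splits a dataset URN into platform, dataset name, and
// environment. The name is everything between the first and last comma.
func parseDatasetURN(urn string) (platform, name, env string, err error) {
	const prefix = "urn:li:dataset:("
	if !strings.HasPrefix(urn, prefix) || !strings.HasSuffix(urn, ")") {
		return "", "", "", fmt.Errorf("not a dataset URN: %q", urn)
	}
	inner := urn[len(prefix) : len(urn)-1]
	first := strings.Index(inner, ",")
	last := strings.LastIndex(inner, ",")
	if first < 0 || first == last {
		return "", "", "", fmt.Errorf("expected platform, name and environment in %q", urn)
	}
	platform = strings.TrimPrefix(inner[:first], "urn:li:dataPlatform:")
	return platform, inner[first+1 : last], inner[last+1:], nil
}

func main() {
	platform, name, env, err := parseDatasetURN(
		"urn:li:dataset:(urn:li:dataPlatform:trino,hive.sales.orders,PROD)")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(platform, name, env)
}
```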

Upgrade guide

No configuration changes required.

Installation

Homebrew (macOS)

brew install txn2/tap/mcp-data-platform

Claude Code CLI

claude mcp add mcp-data-platform -- mcp-data-platform

Docker

docker pull ghcr.io/txn2/mcp-data-platform:v0.7.1

Verification

All release artifacts are signed with Cosign. Verify with:

cosign verify-blob --bundle mcp-data-platform_0.7.1_linux_amd64.tar.gz.sigstore.json \
  mcp-data-platform_0.7.1_linux_amd64.tar.gz

mcp-data-platform-v0.7.0

24 Jan 07:08
270a277

Release v0.7.0

MCP protocol middleware

Middleware now runs at the MCP protocol level using server.AddReceivingMiddleware(). This integrates directly with the MCP SDK's request handling.

Semantic enrichment

Tool responses are automatically enriched with context from other services:

  • trino_describe_table responses include DataHub metadata (owners, tags, deprecation status, glossary terms)
  • datahub_search responses include Trino query availability
  • s3_list_objects responses include matching DataHub dataset metadata

Audit logging

All tool calls are logged with user identity, parameters, timing, and success/failure status. Logging is async and doesn't block responses.
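
One common way to keep audit logging off the response path is a buffered channel drained by a background goroutine. The sketch below assumes that design; the release notes only say logging is async, so AuditEvent, AuditLogger, and the drop-when-full policy are hypothetical.

```go
package main

import (
	"fmt"
	"sync"
)

// AuditEvent carries the fields the release notes say are logged.
type AuditEvent struct {
	User    string
	Tool    string
	Params  map[string]any
	Millis  int64
	Success bool
}

// AuditLogger hands events to a background goroutine through a buffered channel.
type AuditLogger struct {
	events chan AuditEvent
	wg     sync.WaitGroup
}

// NewAuditLogger starts one worker that passes each event to sink.
func NewAuditLogger(buffer int, sink func(AuditEvent)) *AuditLogger {
	l := &AuditLogger{events: make(chan AuditEvent, buffer)}
	l.wg.Add(1)
	go func() {
		defer l.wg.Done()
		for e := range l.events {
			sink(e)
		}
	}()
	return l
}

// Log never blocks the caller. It returns false if the buffer was full and
// the event was dropped.
func (l *AuditLogger) Log(e AuditEvent) bool {
	select {
	case l.events <- e:
		return true
	default:
		return false
	}
}

// Close flushes queued events and waits for the worker to finish.
func (l *AuditLogger) Close() {
	close(l.events)
	l.wg.Wait()
}

func main() {
	logger := NewAuditLogger(16, func(e AuditEvent) {
		fmt.Printf("user=%s tool=%s ms=%d ok=%v\n", e.User, e.Tool, e.Millis, e.Success)
	})
	logger.Log(AuditEvent{User: "alice", Tool: "trino_describe_table", Millis: 42, Success: true})
	logger.Log(AuditEvent{User: "bob", Tool: "datahub_search", Millis: 7, Success: false})
	logger.Close()
}
```

Dropping events when the buffer is full trades completeness for latency; a deployment with strict audit requirements might block or spill to disk instead.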

E2E test suite

Added end-to-end tests that verify cross-injection against real DataHub and Trino instances:

  • TestTrinoToDataHubEnrichment
  • TestDataHubToTrinoEnrichment
  • TestDataHubToS3Enrichment
  • TestS3ToDataHubEnrichment

Test fixtures live in test/e2e/testdata/datahub/, along with a seed script for DataHub.

Other changes

  • SeaweedFS support for S3-compatible storage in E2E tests
  • Documentation updates for middleware architecture

Test results

go test -race ./...         PASS
golangci-lint run ./...     0 issues
gosec ./...                 0 issues
Coverage                    84.5%

Upgrade guide

No configuration changes required.

mcp-data-platform-v0.6.0

24 Jan 01:58
2eb426a

Semantic Enrichment Fixes

Fixes cross-injection between Trino and DataHub so table-level metadata (description, tags, domain, glossary terms, owners) flows correctly in both directions.

Bug Fixes

  • Fixed table name extraction - trino_describe_table(catalog="x", schema="y", table="z") now correctly builds x.y.z for URN lookup (was only using z)
  • Added reverse catalog mapping - DataHub URNs now correctly resolve to Trino table paths via query.urn_mapping.catalog_mapping
  • Complete enrichment output - Added missing fields: urn, glossary_terms, custom_properties, last_modified
  • Debug logging - Added slog.Debug when enrichment fails to help diagnose URN mismatches

Configuration

Bidirectional URN mapping is now supported:

semantic:
  provider: datahub
  instance: primary
  urn_mapping:
    platform: postgres
    catalog_mapping:
      rdbms: warehouse           # Trino → DataHub

query:
  provider: trino
  instance: production
  urn_mapping:
    catalog_mapping:
      warehouse: rdbms           # DataHub → Trino

Breaking Changes

None.

Upgrade Guide

Add urn_mapping configuration if your Trino catalogs differ from DataHub catalogs. Both semantic.urn_mapping and query.urn_mapping should be configured as reverse mappings of each other.
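
Since the two mappings must mirror each other, one can be derived from the other and checked for ambiguity. The helper below is a hypothetical sketch of that check; the platform itself expects both mappings to be written out in the configuration.

```go
package main

import (
	"fmt"
)

// invertMapping reverses a catalog mapping. It fails if two source catalogs
// map to the same target, because the reverse lookup would be ambiguous.
func invertMapping(m map[string]string) (map[string]string, error) {
	out := make(map[string]string, len(m))
	for from, to := range m {
		if prev, dup := out[to]; dup {
			return nil, fmt.Errorf("catalogs %q and %q both map to %q", prev, from, to)
		}
		out[to] = from
	}
	return out, nil
}

func main() {
	// semantic.urn_mapping.catalog_mapping (Trino -> DataHub)
	semantic := map[string]string{"rdbms": "warehouse"}

	// Expected query.urn_mapping.catalog_mapping (DataHub -> Trino)
	query, err := invertMapping(semantic)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(query)
}
```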

Installation

Homebrew (macOS)

brew install txn2/tap/mcp-data-platform

Claude Code CLI

claude mcp add mcp-data-platform -- mcp-data-platform

Docker

docker pull ghcr.io/txn2/mcp-data-platform:v0.6.0

Verification

All release artifacts are signed with Cosign. Verify with:

cosign verify-blob --bundle mcp-data-platform_0.6.0_linux_amd64.tar.gz.sigstore.json \
  mcp-data-platform_0.6.0_linux_amd64.tar.gz