feat(snowflake): add support for external DMF assertion ingestion #16058

Merged
rajatoss merged 10 commits into master from rajat-sf-dmf on Feb 17, 2026

Conversation

@rajatoss
Member

@rajatoss rajatoss commented Feb 3, 2026

Summary

  • Add new config option include_external_dmf_assertions to ingest user-created Snowflake Data Metric Functions (DMFs) as external assertions in DataHub
  • External DMFs are ingested with AssertionInfo aspects using CUSTOM type and EXTERNAL source, including column information from ARGUMENT_NAMES
  • Generate stable, deterministic GUIDs for external DMFs using Snowflake's REFERENCE_ID field
  • Differentiate between DataHub-created DMFs (prefixed datahub__*), whose GUID is extracted from the name, and external DMFs, whose GUID is generated from the reference ID
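
The GUID handling described in the summary can be illustrated with a minimal sketch. This is not the connector's code: the function name `assertion_guid`, the hashing scheme (MD5 over a canonical JSON key), and the lowercasing of the DMF name are all assumptions made for illustration. Only the `datahub__` prefix and the use of `REFERENCE_ID` come from the PR description.

```python
import hashlib
import json

DATAHUB_DMF_PREFIX = "datahub__"  # prefix named in the PR summary


def assertion_guid(dmf_name: str, reference_id: str) -> str:
    """Return a stable GUID for a DMF (hypothetical helper, for illustration only)."""
    name = dmf_name.lower()  # assumption: names are compared case-insensitively
    if name.startswith(DATAHUB_DMF_PREFIX):
        # DataHub-created DMF: the GUID is carried in the name itself.
        return name[len(DATAHUB_DMF_PREFIX):]
    # External DMF: hash a canonical JSON form of the reference ID so the
    # same REFERENCE_ID maps to the same GUID on every ingestion run.
    key = json.dumps(
        {"platform": "snowflake", "reference_id": reference_id},
        sort_keys=True,
    )
    return hashlib.md5(key.encode("utf-8")).hexdigest()
```

Because the external GUID depends only on the reference ID, re-running ingestion should update the same assertion rather than create a new one, which is the property the summary calls "stable, deterministic".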

Test Plan

  • Manual testing: Verified end-to-end functionality against a live Snowflake environment with both DataHub-created and external DMFs
  • Unit tests added in test_snowflake_assertion.py covering:
    • Pydantic model parsing of ARGUMENT_NAMES from JSON strings
    • Query generation with and without external DMF filtering
    • GUID generation determinism and uniqueness based on REFERENCE_ID
    • AssertionInfo creation with correct types, sources, and field URNs for single vs multi-column DMFs
    • Mixed processing of both DataHub and external DMFs
  • Run ./gradlew :metadata-ingestion:testQuick to verify tests pass
  • Run ./gradlew :metadata-ingestion:lintFix to verify code quality
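
As a rough illustration of what the parsing and field-URN tests exercise, the sketch below parses an `ARGUMENT_NAMES` value that arrives as a JSON string and builds one schema-field URN per column. The PR uses a Pydantic model for the parsing step; this sketch uses only the standard library, and the helper names `parse_argument_names` and `field_urns` are hypothetical.

```python
import json
from typing import List, Optional, Union


def parse_argument_names(raw: Optional[Union[str, List[str]]]) -> List[str]:
    """Normalize ARGUMENT_NAMES into a list of column names (illustrative)."""
    if raw is None:
        return []
    if isinstance(raw, str):
        # Snowflake may return the array as a JSON string such as '["ID", "EMAIL"]'.
        raw = json.loads(raw) if raw.strip() else []
    return [str(col) for col in raw]


def field_urns(dataset_urn: str, columns: List[str]) -> List[str]:
    """Build one schemaField URN per column; a multi-column DMF yields several."""
    return [f"urn:li:schemaField:({dataset_urn},{col})" for col in columns]
```

A single-column DMF produces a one-element list and a multi-column DMF produces one URN per argument, which matches the single vs multi-column cases listed in the test plan.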

Checklist

  • PR conforms to the Contributing Guideline
  • Links to related issues (if applicable)
  • Tests added/updated (if applicable)
  • Docs added/updated (if applicable)

Connector Tests Run for the PR:

https://github.com/acryldata/connector-tests/actions/runs/21699950462

Breaking Changes

None - this is an additive feature with a new opt-in config flag (include_external_dmf_assertions) that defaults to false.
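
For orientation, opting in might look like the following recipe fragment, written here as a Python dict. Only `include_external_dmf_assertions` comes from this PR. The `account_id` value is a placeholder, and credentials, the sink, and any other assertion-related flags the connector may require are deliberately omitted, so treat this as a sketch rather than a complete recipe.

```python
# Hypothetical minimal recipe fragment; not a complete, runnable ingestion recipe.
recipe = {
    "source": {
        "type": "snowflake",
        "config": {
            "account_id": "<your-account>",  # placeholder value
            # New opt-in flag from this PR; defaults to false when omitted.
            "include_external_dmf_assertions": True,
        },
    },
}
```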

@github-actions github-actions bot added the `ingestion` label (PR or Issue related to the ingestion of metadata) Feb 3, 2026
@github-actions
Contributor

github-actions bot commented Feb 3, 2026

Linear: ING-1493

@codecov

codecov bot commented Feb 3, 2026

Codecov Report

❌ Patch coverage is 86.84211% with 10 lines in your changes missing coverage. Please review.
✅ All tests successful. No failed tests found.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| .../ingestion/source/snowflake/snowflake_assertion.py | 85.71% | 9 Missing ⚠️ |
| ...hub/ingestion/source/snowflake/snowflake_config.py | 85.71% | 1 Missing ⚠️ |

@alwaysmeticulous

alwaysmeticulous bot commented Feb 3, 2026

✅ Meticulous spotted 0 visual differences across 951 screens tested.

Meticulous evaluated ~8 hours of user flows against your PR.

Last updated for commit 5fbbdc4.

@gabe-lyons
Contributor

@AdrianMachado want to take a look here?

@datahub-cyborg datahub-cyborg bot added the `pending-submitter-response` label (Issue/request has been reviewed but requires a response from the submitter) and removed the `needs-review` label (Label for PRs that need review from a maintainer) Feb 6, 2026
Contributor

@AdrianMachado AdrianMachado left a comment

This is so awesome! Just a couple of suggestions

@datahub-cyborg datahub-cyborg bot added the `pending-submitter-merge` label and removed the `pending-submitter-response` label (Issue/request has been reviewed but requires a response from the submitter) Feb 6, 2026
| **Naming** | Prefixed with `datahub__` | Any name |
| **Definition** | Created via `datahub assertions compile` | Created manually by user |
| **Assertion Type** | Based on assertion definition (Freshness, Volume, etc.) | CUSTOM |
| **Source** | INFERRED | EXTERNAL |
Contributor

Regarding the Source:

https://docs.datahub.com/docs/generated/metamodel/entities/assertion#assertion-source

The assertionInfo aspect includes an AssertionSource that identifies the origin of the assertion:

  • NATIVE: Defined directly in DataHub (DataHub Cloud feature)
  • EXTERNAL: Ingested from external tools (Great Expectations, dbt, Snowflake, etc.)
  • INFERRED: Generated by ML-based inference systems (DataHub Cloud feature)

External assertions should have a corresponding dataPlatformInstance aspect that identifies the specific platform instance they originated from.

My concerns:

  • INFERRED seems to be limited to ML-based inference systems, so using it here may deviate from its original purpose.
  • We are missing the DataPlatformInstance aspect, which, according to the docs, is the one that identifies the origin.

Member Author

Updated. Yes, it should be NATIVE for DataHub-created DMFs.
The code already emits a DataPlatformInstance aspect for every unique assertion URN (both DataHub-created and external DMFs).

Added in the doc.
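
The source choice discussed in this thread can be sketched as a small mapping. The `AssertionSource` enum below is a local stand-in that mirrors the three values quoted from the docs, not the DataHub class itself, and `source_for_dmf` is a hypothetical helper. It assumes the outcome described in the reply above: NATIVE for DataHub-created DMFs, EXTERNAL otherwise.

```python
from enum import Enum


class AssertionSource(Enum):
    """Local stand-in for the three source values quoted from the DataHub docs."""

    NATIVE = "NATIVE"
    EXTERNAL = "EXTERNAL"
    INFERRED = "INFERRED"


def source_for_dmf(dmf_name: str) -> AssertionSource:
    """Pick a source from the DMF name (illustrative only)."""
    # Assumption: DataHub-created DMFs are recognized by the datahub__ prefix.
    if dmf_name.lower().startswith("datahub__"):
        return AssertionSource.NATIVE
    return AssertionSource.EXTERNAL
```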

Contributor

cc @jayacryl around the source choice here

Contributor

@sgomezvillamor sgomezvillamor left a comment

I mainly reviewed the new feature through the new docs and left some comments on the modelling.

@rajatoss
Member Author

rajatoss commented Feb 8, 2026

cc: @jayacryl, could I get an overall review of this? Since it concerns DMFs, @sgomezvillamor suggested it would be a good idea to double-check with you.

Contributor

@sgomezvillamor sgomezvillamor left a comment

LGTM, thanks for addressing all the comments!

Please get a review from someone in Observability before merging.

@rajatoss rajatoss requested a review from jayacryl February 10, 2026 13:32

### How External DMFs Differ from DataHub-Created DMFs

| Aspect | DataHub-Created DMFs | External DMFs |
Collaborator

Suggested change
- | Aspect | DataHub-Created DMFs | External DMFs |
+ | Aspect | DataHub-Managed DMFs | Externally Managed DMFs |

@jayacryl
Collaborator

Very minor optional comments. Great work Rajat!

@github-actions github-actions bot requested a deployment to datahub-wheels (Preview) February 17, 2026 05:16 Abandoned
@rajatoss rajatoss enabled auto-merge (squash) February 17, 2026 05:34
@rajatoss rajatoss merged commit 43aa36f into master Feb 17, 2026
68 checks passed
@rajatoss rajatoss deleted the rajat-sf-dmf branch February 17, 2026 05:45
Labels

`ingestion` (PR or Issue related to the ingestion of metadata), `pending-submitter-merge`

5 participants