feat(snowflake): add support for external DMF assertion ingestion#16058
feat(snowflake): add support for external DMF assertion ingestion#16058
Conversation
|
Linear: ING-1493 |
Codecov Report❌ Patch coverage is
📢 Thoughts on this report? Let us know! |
|
✅ Meticulous spotted 0 visual differences across 951 screens tested: view results. Meticulous evaluated ~8 hours of user flows against your PR. Expected differences? Click here. Last updated for commit 5fbbdc4. This comment will update as new commits are pushed. |
|
@AdrianMachado want to take a look here? |
AdrianMachado
left a comment
There was a problem hiding this comment.
This is so awesome! Just a couple of suggestions
metadata-ingestion/src/datahub/ingestion/source/snowflake/snowflake_assertion.py
Show resolved
Hide resolved
metadata-ingestion/src/datahub/ingestion/source/snowflake/snowflake_config.py
Show resolved
Hide resolved
metadata-ingestion/src/datahub/ingestion/source/snowflake/snowflake_assertion.py
Show resolved
Hide resolved
metadata-ingestion/src/datahub/ingestion/source/snowflake/snowflake_assertion.py
Show resolved
Hide resolved
metadata-ingestion/src/datahub/ingestion/source/snowflake/snowflake_assertion.py
Show resolved
Hide resolved
| | **Naming** | Prefixed with `datahub__` | Any name | | ||
| | **Definition** | Created via `datahub assertions compile` | Created manually by user | | ||
| | **Assertion Type** | Based on assertion definition (Freshness, Volume, etc.) | CUSTOM | | ||
| | **Source** | INFERRED | EXTERNAL | |
There was a problem hiding this comment.
Regarding the Source:
https://docs.datahub.com/docs/generated/metamodel/entities/assertion#assertion-source
The assertionInfo aspect includes an AssertionSource that identifies the origin of the assertion:
- NATIVE: Defined directly in DataHub (DataHub Cloud feature)
- EXTERNAL: Ingested from external tools (Great Expectations, dbt, Snowflake, etc.)
- INFERRED: Generated by ML-based inference systems (DataHub Cloud feature)
External assertions should have a corresponding dataPlatformInstance aspect that identifies the specific platform instance they originated from.
My concerns:
- INFERRED seems to be limited to ML-based inference systems. We may be deviating original purpose.
- We are missing the
DataPlatformInstanceaspect. Which is the one that identifies the origin, according to the docs.
There was a problem hiding this comment.
Updated, yes it should be native for datahub created dmfs.
In code we already emits DataPlatformInstance aspect for every unique assertion URN (both DataHub-created and external DMFs).
Added in the doc.
sgomezvillamor
left a comment
There was a problem hiding this comment.
I mainly checked new feature from the new docs and left some comments on the modelling.
|
cc: @jayacryl if i could get an overall review for this, as it is around dmf, @sgomezvillamor suggested it might be a good idea to double check with you. |
metadata-ingestion/src/datahub/ingestion/source/snowflake/snowflake_assertion.py
Outdated
Show resolved
Hide resolved
sgomezvillamor
left a comment
There was a problem hiding this comment.
LGTM, thanks for addressing all the comments!
Please, get a review from someone in Observability before merging.
|
|
||
| ### How External DMFs Differ from DataHub-Created DMFs | ||
|
|
||
| | Aspect | DataHub-Created DMFs | External DMFs | |
There was a problem hiding this comment.
| | Aspect | DataHub-Created DMFs | External DMFs | | |
| | Aspect | DataHub-Managed DMFs | Externally Managed DMFs | |
metadata-ingestion/src/datahub/ingestion/source/snowflake/snowflake_assertion.py
Outdated
Show resolved
Hide resolved
|
Very minor optional comments. Great work Rajat! |
Summary
include_external_dmf_assertionsto ingest user-created Snowflake Data Metric Functions (DMFs) as external assertions in DataHubAssertionInfoaspects using CUSTOM type and EXTERNAL source, including column information fromARGUMENT_NAMESREFERENCE_IDfielddatahub__*) which extract GUID from the name, and external DMFs which generate GUIDs from reference IDsTest Plan
test_snowflake_assertion.pycovering:ARGUMENT_NAMESfrom JSON stringsREFERENCE_ID./gradlew :metadata-ingestion:testQuickto verify tests pass./gradlew :metadata-ingestion:lintFixto verify code qualityChecklist
Connector Tests Run for the PR:
https://github.com/acryldata/connector-tests/actions/runs/21699950462
Breaking Changes
None - this is an additive feature with a new opt-in config flag (
include_external_dmf_assertions) that defaults tofalse.