feat(ingestion): add Omni BI platform source (INCUBATING) #16564
treff7es merged 37 commits into datahub-project:master
Adds a new DataHub ingestion source for the Omni BI platform (https://omni.co).

The connector extracts:
- Semantic models, topics, and views (as Datasets with schema metadata)
- Physical database tables with upstream lineage stitched to existing warehouse entities (Snowflake, BigQuery, etc.)
- Dashboards and chart tiles (as Dashboard + Chart entities)
- Folder hierarchy (as Containers)
- Coarse and fine-grained column-level lineage: Dashboard → Tile → Topic → View → Physical Table
- Ownership from document API
- Embed metadata for dashboard assets

Configuration highlights:
- `api_key`: SecretStr Omni Organization API key
- `model_pattern` / `document_pattern`: AllowDenyPattern filters
- `connection_to_platform`: maps Omni connection IDs to DataHub platforms
- `include_column_lineage`: toggle fine-grained lineage extraction
- Stateful ingestion with stale entity removal

Closes #<issue-number>

Made-with: Cursor
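As a rough illustration of how a `connection_to_platform` mapping could let physical tables line up with entities already ingested from a warehouse, here is a minimal sketch. The mapping shape, the `conn-123` ID, and the helper name `physical_table_urn` are assumptions made for this example and are not taken from the connector; only the dataset URN layout follows DataHub's standard format.

```python
from typing import Dict, Optional

# Hypothetical mapping: Omni connection ID -> DataHub platform name.
CONNECTION_TO_PLATFORM: Dict[str, str] = {"conn-123": "snowflake"}


def physical_table_urn(
    connection_id: str, qualified_name: str, env: str = "PROD"
) -> Optional[str]:
    """Build a dataset URN for a physical table, or None if the connection is unmapped."""
    platform = CONNECTION_TO_PLATFORM.get(connection_id)
    if platform is None:
        # Without a platform we cannot point at an existing warehouse entity.
        return None
    return f"urn:li:dataset:(urn:li:dataPlatform:{platform},{qualified_name},{env})"
```

Because the URN is derived only from the platform, the qualified table name, and the environment, it matches the URN a warehouse connector would emit for the same table, which is what makes the lineage edge connect.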
- Replace deprecated report_warning()/report_failure() with the structured report.warning()/report.failure() API throughout omni.py
- Refactor _collect_tile_data to store the resolved modelId in self._current_tile_model_id instead of using the unconventional StopIteration.value return trick; caller now reads the attribute directly after yielding from the generator
- Remove unused doc_id parameter from _emit_inferred_view_datasets

Made-with: Cursor
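The generator refactor described above can be sketched roughly as follows. This is an illustrative stand-in, not the code in omni.py: the class `TileCollector`, the tile dictionaries, and the public attribute name are hypothetical. It shows a generator recording a side result on the instance so the caller can read it after `yield from`, rather than relying on the generator's return value.

```python
from typing import Dict, Iterable, Iterator, Optional


class TileCollector:
    """Hypothetical stand-in for a source that resolves a model ID while yielding."""

    def __init__(self) -> None:
        self.current_tile_model_id: Optional[str] = None

    def collect_tile_data(self, tiles: Iterable[Dict[str, str]]) -> Iterator[str]:
        # Reset before each run so a stale value never leaks between dashboards.
        self.current_tile_model_id = None
        for tile in tiles:
            if self.current_tile_model_id is None:
                self.current_tile_model_id = tile.get("modelId")
            yield tile["name"]

    def process(self, tiles: Iterable[Dict[str, str]]) -> Iterator[str]:
        yield from self.collect_tile_data(tiles)
        # The generator is exhausted here, so the attribute holds the resolved ID.
        yield f"model:{self.current_tile_model_id}"
```

The trade-off is that the attribute is shared mutable state, so it has to be reset at the start of each call, as the sketch does.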
Replace flat omni.md with the standard four-file layout used by other BI sources (sigma, looker):
- README.md: Overview + Concept Mapping table
- omni_pre.md: Prerequisites and detailed overview (pre-config)
- omni_post.md: Capabilities, lineage details, limitations, troubleshooting
- omni_recipe.yml: Quickstart recipe example

Made-with: Cursor
Linear: ING-1902
Thanks for your contribution! We have created an internal ticket to track this PR. A member of the core DataHub team will be assigned to review it within the next few business days. You will get a follow-up comment once a reviewer is assigned.
Ruff lint (F401, I001, F841, SIM114, F541, TID252, E402):
- Remove unused imports: make_dashboard_urn, StaleEntityRemovalSourceReport, OwnerClass, OwnershipClass, FieldRef, List, ConfigModel, Pipeline
- Fix import sort order (isort) across all omni source and test files
- Remove unused local variable folder_name
- Simplify elif branch to use logical or operator
- Remove extraneous f-prefix on string literal
- Move pytestmark assignment after imports (E402)
- Convert relative import to absolute (TID252)

Ruff format:
- Auto-format all Python files to match project style

Markdown (prettier):
- Fix table formatting in README.md and omni_post.md

Made-with: Cursor
- Delete stale docs/sources/omni.md (replaced by docs/sources/omni/ folder)
- Add TEST_CONNECTION @capability decorator to OmniSource
- Add omni entry to datahub.json connector registry
- Fix import sort (I001), remove unused import (FakeOmniClientConnectionFail F401), and remove f-string without placeholder (F541) in test_omni_integration.py

Made-with: Cursor
Add blank line between third-party and first-party import sections in test_omni_integration.py to satisfy ruff isort requirements. Made-with: Cursor
pyproject.toml is the authoritative source for entry points when installing with modern pip. setup.py alone is not sufficient - the omni source entry point must also appear in pyproject.toml so that datahub's source_registry and docgen can discover it. Made-with: Cursor
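To check whether a source's entry point is actually discoverable after installation, the standard library can list what is registered in a given entry-point group. The sketch below uses only `importlib.metadata`; the helper name `find_plugins` is made up for this example, and the group name to pass (for DataHub sources, presumably `datahub.ingestion.source.plugins`) is an assumption to verify against the project's packaging files.

```python
from importlib.metadata import entry_points
from typing import Dict


def find_plugins(group: str) -> Dict[str, str]:
    """Return {entry point name: 'module:object'} for one entry-point group."""
    eps = entry_points()
    if hasattr(eps, "select"):
        # Python 3.10+ exposes a selectable collection.
        selected = eps.select(group=group)
    else:
        # Older versions return a plain dict keyed by group name.
        selected = eps.get(group, [])
    return {ep.name: ep.value for ep in selected}
```

If the expected source name is missing from the returned dictionary in a fresh environment, the entry point was not declared in the metadata that the installer actually read.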
Add blank line after module docstring and wrap long function signature to comply with ruff format requirements. Made-with: Cursor
- Annotate self.report as OmniSourceReport to prevent StatefulIngestionReport attribute errors (32 attr-defined errors resolved)
- Wrap owner URNs with CorpUserUrn() for set_owners() calls (list-item errors)
- Add type: ignore[arg-type] for parent_container str|None vs SDK union type
- Filter None keys and cast to str in connections dict comprehension
- Explicitly annotate platform/database as str using str() cast to fix object-typed dict values from Dict[str, object] connections lookup
- Annotate connection as Optional to match connections.get() return type
- Add type: ignore[assignment] for fake client assignments in tests
- Add None guard before hasattr check for aspect.to_obj() call
- Remove unused FakeOmniClientConnectionFail import (F401)
- Fix F541 f-string without placeholder
- Apply ruff format to both files

Made-with: Cursor
- Add report_dropped() and filtered list to OmniSourceReport (required by
AllowDenyPattern filtering; StaleEntityRemovalSourceReport does not
provide this)
- Rename inner-loop variable connection -> conn to fix no-redef error
(connection was already used in outer loop)
- Add attr-defined to type: ignore on topic.get("views", []) iteration
Made-with: Cursor
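The relationship between pattern filtering and a `report_dropped()` method with a `filtered` list can be sketched as below. This is a stdlib-only approximation: `SketchReport`, `allowed`, and `filter_models` are hypothetical names, and the regex matching only mimics the general allow/deny idea rather than reproducing DataHub's `AllowDenyPattern` class.

```python
import re
from dataclasses import dataclass, field
from typing import Iterable, List, Sequence


@dataclass
class SketchReport:
    """Hypothetical report that remembers every name dropped by a filter."""

    filtered: List[str] = field(default_factory=list)

    def report_dropped(self, name: str) -> None:
        self.filtered.append(name)


def allowed(name: str, allow: Sequence[str], deny: Sequence[str]) -> bool:
    # Deny rules win; otherwise the name must match at least one allow rule.
    if any(re.match(pattern, name) for pattern in deny):
        return False
    return any(re.match(pattern, name) for pattern in allow)


def filter_models(
    names: Iterable[str],
    report: SketchReport,
    allow: Sequence[str] = (".*",),
    deny: Sequence[str] = (),
) -> List[str]:
    kept = []
    for name in names:
        if allowed(name, allow, deny):
            kept.append(name)
        else:
            report.report_dropped(name)
    return kept
```

Recording dropped names on the report makes it visible in the ingestion summary which models or documents a pattern excluded.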
Thanks for the contribution!
Your PR has been assigned to @treff7es (tamas) for review (ING-1902).
treff7es
left a comment
Thanks for the contribution, we really appreciate it.
I left a few comments. Could you please take a look at them?
…move type ignores) Made-with: Cursor
Made-with: Cursor
Bundle Report
Changes will decrease total bundle size by 510.36kB (-2.2%) ⬇️. This is within the configured threshold ✅
Affected bundle: datahub-react-web-esm
- Resolve setup.py extras conflict: keep omni extra; glue includes sqlglot_lib (master)
- Add [project.optional-dependencies] omni group to pyproject.toml (pip install .[omni])

Made-with: Cursor
2a62c2a to
5f46db1
Hey @treff7es, curious what else needs to be updated here. Am I missing anything?
Move pydantic/requests imports above module-level code to fix E402, and auto-format lines exceeding 88-char limit.
Without the re-raise, get_workunits_internal silently swallows fatal errors, breaking DataHub's error propagation contract.
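The error-propagation point can be illustrated with a small generator wrapper that records a failure and then re-raises. This is a hedged sketch: `workunits_with_reraise`, the `produce` callable, and the `failures` list are invented for the example and stand in for the source's real work-unit generator and report.

```python
import logging
from typing import Callable, Iterable, Iterator, List

logger = logging.getLogger(__name__)


def workunits_with_reraise(
    produce: Callable[[], Iterable[str]], failures: List[str]
) -> Iterator[str]:
    """Yield work units; on a fatal error, record it and let it propagate."""
    try:
        yield from produce()
    except Exception as exc:
        failures.append(str(exc))
        logger.error("Fatal error while producing work units: %s", exc)
        # Without this bare raise the caller would see a clean, early end of
        # iteration and could not tell that the run actually failed.
        raise
```

Work units yielded before the failure are still delivered; the caller simply learns afterwards that the stream ended abnormally.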
Remove trivial wrapper that just called mcp.as_workunit(), replacing all 4 call sites with direct .as_workunit() calls.
- Replace **dict spread for parent_container with explicit kwarg using unset sentinel, fixing arg-type errors on Dashboard constructor
- Add FieldConfidence type annotation to confidence variable so mypy can narrow the Literal type from the ternary expression
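Both typing fixes above follow common patterns, sketched here under stated assumptions. `UNSET`, `make_dashboard`, `build`, and the `"high"`/`"low"` values are hypothetical; the real SDK's sentinel and the real members of `FieldConfidence` may differ.

```python
from typing import Dict, Literal, Optional, Union


class _Unset:
    """Sentinel type meaning 'argument not provided' (distinct from None)."""


UNSET = _Unset()

# Hypothetical Literal alias; the real FieldConfidence values are not shown in this PR.
FieldConfidence = Literal["high", "low"]


def make_dashboard(
    name: str, parent_container: Union[str, _Unset] = UNSET
) -> Dict[str, str]:
    result = {"name": name}
    if not isinstance(parent_container, _Unset):
        result["parent_container"] = parent_container
    return result


def build(name: str, folder_urn: Optional[str], exact_match: bool) -> Dict[str, str]:
    # Annotating the variable lets a type checker accept the ternary as the
    # Literal type instead of widening it to str.
    confidence: FieldConfidence = "high" if exact_match else "low"
    # An explicit keyword with a sentinel keeps the call type-checkable,
    # unlike spreading a conditionally built **dict.
    dashboard = make_dashboard(
        name, parent_container=folder_urn if folder_urn is not None else UNSET
    )
    dashboard["confidence"] = confidence
    return dashboard
```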
Add isinstance assertion to narrow pipeline.source from Source to OmniSource before accessing .client attribute.
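A minimal sketch of that narrowing pattern, using stand-in classes rather than DataHub's real `Pipeline`, `Source`, and `OmniSource` (all class and function names below are hypothetical):

```python
class Source:
    """Stand-in base class; it does not declare a client attribute."""


class OmniLikeSource(Source):
    def __init__(self) -> None:
        self.client = "real-client"


class Pipeline:
    def __init__(self, source: Source) -> None:
        self.source = source


def swap_client(pipeline: Pipeline, fake_client: str) -> None:
    # pipeline.source is typed as Source, so a type checker rejects .client
    # until the assertion narrows it to the subclass.
    assert isinstance(pipeline.source, OmniLikeSource)
    pipeline.source.client = fake_client
```

The assertion also acts as a runtime check, so a test fails early if the pipeline was built with a different source class.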
The parent_container fix now properly invokes _set_container, which generates browsePathsV2 with the actual folder path instead of empty paths. Also updates field ordering in fine-grained lineage entries.
Per Omni docs, the data model hierarchy is: Physical Table → View → Topic → Dashboard

Previously, lineage was modeled incorrectly in multiple ways:
- Physical tables listed Omni views as upstreams (inverted)
- Views listed topics as upstreams (inverted)
- Views accumulated all dashboard topic URNs (incorrect scoping)

Now correctly:
- Views list their physical source table as upstream (COPY type)
- Topics list their constituent views as upstreams (TRANSFORMED type)
- Physical tables have no Omni upstream
- Dashboard structural upstream is its folder only, with FGL edges pointing to semantic view fields

Also fixes Set[Any] → Set[FieldRef] for type safety in field reference collections.
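The corrected edge directions can be summarized with a small sketch that builds an upstream map from two inputs. The function name `build_upstreams` and the plain-string identifiers are assumptions for illustration; the connector itself works with URNs and DataHub lineage classes.

```python
from typing import Dict, List, Tuple

Upstreams = Dict[str, List[Tuple[str, str]]]


def build_upstreams(
    view_to_table: Dict[str, str], topic_to_views: Dict[str, List[str]]
) -> Upstreams:
    """Map each entity to its (upstream, lineage type) pairs."""
    upstreams: Upstreams = {}
    for view, table in view_to_table.items():
        # A view reads from its physical source table.
        upstreams[view] = [(table, "COPY")]
        # Physical tables get no Omni upstream.
        upstreams.setdefault(table, [])
    for topic, views in topic_to_views.items():
        # A topic is composed of its constituent views only.
        upstreams[topic] = [(view, "TRANSFORMED") for view in views]
    return upstreams
```

Scoping each topic to its own views is what avoids the earlier problem of views accumulating every topic referenced by a dashboard.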
treff7es
left a comment
lgtm now, thanks for the contribution
Summary
Adds a new DataHub ingestion source for the Omni BI platform (https://omni.co).
- … `SchemaFields`)
- … `connection_to_platform` config: platform-agnostic (Snowflake, BigQuery, Redshift, etc.)
- … `Dashboard` + `Chart`) with ownership from the Omni document API
- … `Dashboard → Tile → Topic → View → Physical Table`
- … `StaleEntityRemovalSourceReport`)

Files changed

| File | Description |
| --- | --- |
| `src/.../source/omni/omni.py` | … `StatefulIngestionSourceBase`, `TestableSource`, SDK V2) |
| `src/.../source/omni/omni_config.py` | … `SecretStr`, `AllowDenyPattern`, `PlatformInstanceConfigMixin`) |
| `src/.../source/omni/omni_report.py` | … `StaleEntityRemovalSourceReport`) |
| `src/.../source/omni/omni_api.py` | |
| `src/.../source/omni/omni_lineage_parser.py` | |
| `metadata-ingestion/setup.py` | `omni` extras + entry point |
| `docs/sources/omni/README.md` | |
| `docs/sources/omni/omni_pre.md` | |
| `docs/sources/omni/omni_post.md` | |
| `docs/sources/omni/omni_recipe.yml` | |
| `tests/integration/omni/` | `FakeOmniClient` + 86-event golden file |

Test plan

- … `FakeOmniClient` with deterministic fixture data (Omni cannot be self-hosted in Docker; same approach as other SaaS connectors)
- … `pytestmark = integration_batch_2`)
- `test_connection()` paths tested (success + failure)
- `AllowDenyPattern` model/document filtering tested
- `403` on connections endpoint tested

To run:
Lineage chain
Notes
INCUBATING