feat(ingestion): add Omni BI platform source (INCUBATING) #16564

Merged
treff7es merged 37 commits into datahub-project:master from bearsandrhinos:feat/omni-source
Apr 3, 2026

Conversation

@bearsandrhinos
Contributor

@bearsandrhinos bearsandrhinos commented Mar 12, 2026

Summary

Adds a new DataHub ingestion source for the Omni BI platform (https://omni.co).

  • Extracts semantic models, topics, and views as Datasets with full schema metadata (dimensions + measures as typed SchemaFields)
  • Physical warehouse table lineage stitched to existing DataHub entities via connection_to_platform config — platform-agnostic (Snowflake, BigQuery, Redshift, etc.)
  • Dashboard and chart tile entities (Dashboard + Chart) with ownership from the Omni document API
  • Folder hierarchy as Containers
  • Coarse and fine-grained column-level lineage: Dashboard → Tile → Topic → View → Physical Table
  • Stateful ingestion with stale entity removal (StaleEntityRemovalSourceReport)

Files changed

| Path | Purpose |
| --- | --- |
| src/.../source/omni/omni.py | Main source (StatefulIngestionSourceBase, TestableSource, SDK V2) |
| src/.../source/omni/omni_config.py | Config (SecretStr, AllowDenyPattern, PlatformInstanceConfigMixin) |
| src/.../source/omni/omni_report.py | Report (StaleEntityRemovalSourceReport) |
| src/.../source/omni/omni_api.py | REST client with rate-limiting + exponential backoff |
| src/.../source/omni/omni_lineage_parser.py | Field reference parser for fine-grained lineage |
| metadata-ingestion/setup.py | omni extras + entry point |
| docs/sources/omni/README.md | Concept mapping (Sigma/Looker format) |
| docs/sources/omni/omni_pre.md | Prerequisites + overview |
| docs/sources/omni/omni_post.md | Capabilities, lineage, troubleshooting |
| docs/sources/omni/omni_recipe.yml | Quickstart recipe |
| tests/integration/omni/ | Integration test suite with FakeOmniClient + 86-event golden file |
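
For readers unfamiliar with the "rate-limiting + exponential backoff" behaviour listed for omni_api.py, here is a minimal generic sketch of the technique. It is not the connector's code: `RateLimitedError`, `call_with_backoff`, and all parameter defaults are hypothetical names and values chosen for illustration.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class RateLimitedError(Exception):
    """Hypothetical error standing in for an HTTP 429 response."""


def call_with_backoff(
    fn: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 0.5,
    max_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn, retrying on RateLimitedError with exponentially growing delays."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be >= 1")
    for attempt in range(max_attempts):
        try:
            return fn()
        except RateLimitedError:
            if attempt == max_attempts - 1:
                raise  # out of retries: surface the error to the caller
            # The delay doubles on each attempt (0.5s, 1s, 2s, ...), capped at max_delay.
            delay = min(max_delay, base_delay * (2 ** attempt))
            # Add up to 10% random jitter so parallel clients do not retry in lockstep.
            sleep(delay + random.uniform(0, delay * 0.1))
    raise AssertionError("loop always returns or raises")
```

The `sleep` parameter is injectable so that a test can record the delays instead of actually waiting.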

Test plan

  • Integration tests use FakeOmniClient with deterministic fixture data (Omni cannot be self-hosted in Docker — same approach as other SaaS connectors)
  • Golden file covers all entity types and aspect types (86 events, pytestmark = integration_batch_2)
  • test_connection() paths tested (success + failure)
  • AllowDenyPattern model/document filtering tested
  • Snowflake name normalisation (uppercase) tested
  • Graceful fallback for 403 on connections endpoint tested
  • Fine-grained (column-level) lineage tested

To run:

pytest tests/integration/omni/ -v
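
As a rough illustration of the test approach above (a fake client with deterministic fixtures, plus the graceful 403 fallback), a sketch could look like the following. Only the name FakeOmniClient comes from this PR; its methods, `ForbiddenError`, and `load_connections` are assumptions made for the example and may not match the real test suite.

```python
from typing import Dict, List


class ForbiddenError(Exception):
    """Hypothetical error standing in for an HTTP 403 response."""


class FakeOmniClient:
    """Test double returning fixed fixture data instead of calling the Omni API."""

    def __init__(self, connections_forbidden: bool = False) -> None:
        self.connections_forbidden = connections_forbidden

    def list_models(self) -> List[Dict[str, str]]:
        # Deterministic fixture: the same data on every call.
        return [{"id": "model-1", "name": "sales"}]

    def list_connections(self) -> List[Dict[str, str]]:
        if self.connections_forbidden:
            raise ForbiddenError("403 on connections endpoint")
        return [{"id": "conn-1", "dialect": "snowflake"}]


def load_connections(
    client: FakeOmniClient, warnings: List[str]
) -> Dict[str, Dict[str, str]]:
    """Return connections keyed by ID, or an empty dict if access is denied."""
    try:
        return {c["id"]: c for c in client.list_connections()}
    except ForbiddenError as e:
        # Degrade gracefully: record a warning and continue without connections.
        warnings.append(f"connections unavailable: {e}")
        return {}
```

With this shape, the success and failure paths can be exercised by constructing the fake with or without `connections_forbidden=True`, with no network access involved.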

Lineage chain

Folder → Dashboard → Chart (tile) → Topic → Semantic View → Physical Table

Notes

  • Support status: INCUBATING
  • Omni cannot be deployed locally in Docker, so tests use mocked API responses
  • Long-term E2E tests can be added via a partner DataHub account

Adds a new DataHub ingestion source for the Omni BI platform
(https://omni.co). The connector extracts:

- Semantic models, topics, and views (as Datasets with schema metadata)
- Physical database tables with upstream lineage stitched to existing
  warehouse entities (Snowflake, BigQuery, etc.)
- Dashboards and chart tiles (as Dashboard + Chart entities)
- Folder hierarchy (as Containers)
- Coarse and fine-grained column-level lineage:
    Dashboard → Tile → Topic → View → Physical Table
- Ownership from document API
- Embed metadata for dashboard assets

Configuration highlights:
- `api_key`: SecretStr Omni Organization API key
- `model_pattern` / `document_pattern`: AllowDenyPattern filters
- `connection_to_platform`: maps Omni connection IDs to DataHub platforms
- `include_column_lineage`: toggle fine-grained lineage extraction
- Stateful ingestion with stale entity removal

Closes #<issue-number>

Made-with: Cursor
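
To make the `connection_to_platform` option described above more concrete, here is a small sketch of how a connection ID might be resolved to a warehouse dataset URN. The mapping values, the function name, and the exact normalisation rule are assumptions for illustration; the stitched URN only lines up if it matches what the warehouse's own ingestion produced.

```python
from typing import Dict, Optional

# Hypothetical mapping, shaped like the connection_to_platform option:
# Omni connection ID -> DataHub platform name.
CONNECTION_TO_PLATFORM: Dict[str, str] = {
    "conn-1": "snowflake",
    "conn-2": "bigquery",
}


def physical_table_urn(
    connection_id: str, database: str, schema: str, table: str, env: str = "PROD"
) -> Optional[str]:
    """Build a dataset URN for a warehouse table, or None if the connection is unmapped."""
    platform = CONNECTION_TO_PLATFORM.get(connection_id)
    if platform is None:
        return None  # no mapping: lineage cannot be stitched for this table
    name = f"{database}.{schema}.{table}"
    if platform == "snowflake":
        # The test plan mentions uppercase normalisation for Snowflake names.
        name = name.upper()
    return f"urn:li:dataset:(urn:li:dataPlatform:{platform},{name},{env})"
```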
- Replace deprecated report_warning()/report_failure() with the
  structured report.warning()/report.failure() API throughout omni.py
- Refactor _collect_tile_data to store the resolved modelId in
  self._current_tile_model_id instead of using the unconventional
  StopIteration.value return trick; caller now reads the attribute
  directly after yielding from the generator
- Remove unused doc_id parameter from _emit_inferred_view_datasets

Made-with: Cursor
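
For context on the `_collect_tile_data` refactor in the commit above, the sketch below contrasts the two styles: returning a value from a generator (visible only through `StopIteration.value` or `yield from`) versus storing it on an attribute. `TileCollector` and its methods are hypothetical stand-ins, not the connector's actual code.

```python
from typing import Dict, Generator, Iterator, List, Optional


class TileCollector:
    """Hypothetical stand-in for the source class."""

    def __init__(self) -> None:
        self.current_tile_model_id: Optional[str] = None

    def collect_via_return(
        self, tiles: List[Dict[str, str]]
    ) -> Generator[str, None, Optional[str]]:
        """Old style: the model ID travels out as the generator's return value."""
        model_id: Optional[str] = None
        for tile in tiles:
            model_id = tile.get("modelId", model_id)
            yield tile["name"]
        return model_id  # surfaces as StopIteration.value

    def collect_via_attribute(self, tiles: List[Dict[str, str]]) -> Iterator[str]:
        """New style: the model ID is stored on self for the caller to read."""
        self.current_tile_model_id = None
        for tile in tiles:
            self.current_tile_model_id = tile.get(
                "modelId", self.current_tile_model_id
            )
            yield tile["name"]


tiles = [{"name": "Revenue", "modelId": "model-1"}, {"name": "Orders"}]
collector = TileCollector()

# Old style: the caller has to drive the generator by hand to see the value,
# because a plain for-loop silently discards StopIteration.value.
gen = collector.collect_via_return(tiles)
names = []
try:
    while True:
        names.append(next(gen))
except StopIteration as stop:
    old_model_id = stop.value

# New style: iterate normally, then read the attribute.
names2 = list(collector.collect_via_attribute(tiles))
new_model_id = collector.current_tile_model_id
```

One caveat of the attribute style is that the value is only final once the generator has been fully consumed, so the caller should read it after iteration completes.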
Replace flat omni.md with the standard four-file layout used by other
BI sources (sigma, looker):
  - README.md       — Overview + Concept Mapping table
  - omni_pre.md     — Prerequisites and detailed overview (pre-config)
  - omni_post.md    — Capabilities, lineage details, limitations, troubleshooting
  - omni_recipe.yml — Quickstart recipe example

Made-with: Cursor
@github-actions bot added the "ingestion" label (PR or Issue related to the ingestion of metadata) on Mar 12, 2026
@github-actions
Contributor

Linear: ING-1902

Thanks for your contribution! We have created an internal ticket to track this PR. A member of the core DataHub team will be assigned to review it within the next few business days - you will get a follow-up comment once a reviewer is assigned.

@github-actions bot added the "community-contribution" label (PR or Issue raised by member(s) of DataHub Community) on Mar 12, 2026
Ruff lint (F401, I001, F841, SIM114, F541, TID252, E402):
- Remove unused imports: make_dashboard_urn, StaleEntityRemovalSourceReport,
  OwnerClass, OwnershipClass, FieldRef, List, ConfigModel, Pipeline
- Fix import sort order (isort) across all omni source and test files
- Remove unused local variable folder_name
- Simplify elif branch to use logical or operator
- Remove extraneous f-prefix on string literal
- Move pytestmark assignment after imports (E402)
- Convert relative import to absolute (TID252)

Ruff format:
- Auto-format all Python files to match project style

Markdown (prettier):
- Fix table formatting in README.md and omni_post.md

Made-with: Cursor
@maggiehays added the "needs-review" label (Label for PRs that need review from a maintainer) on Mar 12, 2026
- Delete stale docs/sources/omni.md (replaced by docs/sources/omni/ folder)
- Add TEST_CONNECTION @capability decorator to OmniSource
- Add omni entry to datahub.json connector registry
- Fix import sort (I001), remove unused import (FakeOmniClientConnectionFail F401),
  and remove f-string without placeholder (F541) in test_omni_integration.py

Made-with: Cursor
Add blank line between third-party and first-party import sections
in test_omni_integration.py to satisfy ruff isort requirements.

Made-with: Cursor
pyproject.toml is the authoritative source for entry points when
installing with modern pip. setup.py alone is not sufficient - the
omni source entry point must also appear in pyproject.toml so that
datahub's source_registry and docgen can discover it.

Made-with: Cursor
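
One way to check whether an entry point such as the one discussed in the commit above is discoverable in the current environment is to query installed package metadata. This is a sketch only: the group name below is an assumption about how DataHub registers source plugins and should be verified against the project's setup.py / pyproject.toml, and `registered_sources` is a hypothetical helper.

```python
from importlib.metadata import entry_points
from typing import List

# Assumed group name for DataHub source plugins; verify before relying on it.
SOURCE_PLUGIN_GROUP = "datahub.ingestion.source.plugins"


def registered_sources(group: str = SOURCE_PLUGIN_GROUP) -> List[str]:
    """Return the names of entry points installed under the given group."""
    eps = entry_points()
    if hasattr(eps, "select"):
        # Newer Python versions return an object that supports select().
        selected = eps.select(group=group)
    else:
        # Older Python versions return a plain dict keyed by group name.
        selected = eps.get(group, [])
    return sorted(ep.name for ep in selected)


# Reports whether an "omni" source is registered in this environment.
print("omni" in registered_sources())
```

If the package is installed from metadata that lacks the entry point, the name simply will not appear in this list, which is consistent with the discovery problem the commit describes.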
Add blank line after module docstring and wrap long function signature
to comply with ruff format requirements.

Made-with: Cursor
- Annotate self.report as OmniSourceReport to prevent StatefulIngestionReport
  attribute errors (32 attr-defined errors resolved)
- Wrap owner URNs with CorpUserUrn() for set_owners() calls (list-item errors)
- Add type: ignore[arg-type] for parent_container str|None vs SDK union type
- Filter None keys and cast to str in connections dict comprehension
- Explicitly annotate platform/database as str using str() cast to fix
  object-typed dict values from Dict[str, object] connections lookup
- Annotate connection as Optional to match connections.get() return type
- Add type: ignore[assignment] for fake client assignments in tests
- Add None guard before hasattr check for aspect.to_obj() call
- Remove unused FakeOmniClientConnectionFail import (F401)
- Fix F541 f-string without placeholder
- Apply ruff format to both files

Made-with: Cursor
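
Two of the typing fixes in the commit above (filtering None keys with a str cast, and treating a `.get()` result as Optional) follow a common pattern, sketched here with made-up data. The payload shape and variable names are assumptions, not the connector's real structures.

```python
from typing import Dict, List, Optional

# Hypothetical API payload: connection records whose "id" may be missing.
raw_connections: List[Dict[str, Optional[object]]] = [
    {"id": "conn-1", "dialect": "snowflake"},
    {"id": None, "dialect": "bigquery"},
    {"id": 42, "dialect": "redshift"},
]

# Drop records without an ID and coerce the rest to str, so the mapping is
# keyed by str rather than by an Optional/object type.
connections_by_id: Dict[str, Dict[str, Optional[object]]] = {
    str(conn["id"]): conn for conn in raw_connections if conn.get("id") is not None
}

# dict.get() can miss, so the looked-up value is annotated as Optional.
connection: Optional[Dict[str, Optional[object]]] = connections_by_id.get("conn-9")

# str() turns the object-typed value into a real str for the type checker.
platform: str = str(connection["dialect"]) if connection else "unknown"
```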
- Add report_dropped() and filtered list to OmniSourceReport (required by
  AllowDenyPattern filtering; StaleEntityRemovalSourceReport does not
  provide this)
- Rename inner-loop variable connection -> conn to fix no-redef error
  (connection was already used in outer loop)
- Add attr-defined to type: ignore on topic.get("views", []) iteration

Made-with: Cursor
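
The relationship between allow/deny filtering and `report_dropped()` mentioned in the commit above can be sketched as follows. `SimpleAllowDeny` and `SimpleReport` are simplified stand-ins written for this example; they are not DataHub's AllowDenyPattern or OmniSourceReport, and the matching rules here are an assumption.

```python
import re
from dataclasses import dataclass, field
from typing import List


@dataclass
class SimpleAllowDeny:
    """Simplified stand-in for an allow/deny regex filter."""

    allow: List[str] = field(default_factory=lambda: [".*"])
    deny: List[str] = field(default_factory=list)

    def allowed(self, name: str) -> bool:
        # Deny patterns win; otherwise at least one allow pattern must match.
        if any(re.match(p, name) for p in self.deny):
            return False
        return any(re.match(p, name) for p in self.allow)


@dataclass
class SimpleReport:
    """Simplified report that records which entities were filtered out."""

    filtered: List[str] = field(default_factory=list)

    def report_dropped(self, name: str) -> None:
        self.filtered.append(name)


def filter_models(
    names: List[str], pattern: SimpleAllowDeny, report: SimpleReport
) -> List[str]:
    """Return the names that pass the filter, recording the rest in the report."""
    kept = []
    for name in names:
        if pattern.allowed(name):
            kept.append(name)
        else:
            report.report_dropped(name)  # keep a trace of what was skipped
    return kept
```

Recording dropped names in the report makes it possible to see after a run why an expected model or document did not appear.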
@gabe-lyons
Contributor

Thanks for the contribution!

@github-actions github-actions bot requested a review from treff7es March 13, 2026 16:47
@github-actions
Contributor

Your PR has been assigned to @treff7es (tamas) for review (ING-1902).

@maggiehays added the "pending-submitter-response" label (Issue/request has been reviewed but requires a response from the submitter) and removed the "needs-review" label (Label for PRs that need review from a maintainer) on Mar 13, 2026
Contributor

@treff7es treff7es left a comment

Thanks for the contribution, we really appreciate it.
I left a few comments; could you please check them?

@codecov

codecov bot commented Mar 24, 2026

Bundle Report

Changes will decrease total bundle size by 510.36kB (-2.2%) ⬇️. This is within the configured threshold ✅

Detailed changes
| Bundle name | Size | Change |
| --- | --- | --- |
| datahub-react-web-esm | 22.68MB | -510.36kB (-2.2%) ⬇️ |

Affected Assets, Files, and Routes:

view changes for bundle: datahub-react-web-esm

Assets Changed:

| Asset Name | Size Change | Total Size | Change (%) |
| --- | --- | --- | --- |
| assets/index-*.js | -591.45kB | 12.45MB | -4.54% |
| assets/flinklogo-*.svg (New) | 81.09kB | 81.09kB | 100.0% 🚀 |

- Resolve setup.py extras conflict: keep omni extra; glue includes sqlglot_lib (master)
- Add [project.optional-dependencies] omni group to pyproject.toml (pip install .[omni])

Made-with: Cursor
@bearsandrhinos
Contributor Author

Hey @treff7es, curious what else needs to be updated here. Am I missing anything?

@treff7es treff7es self-assigned this Apr 2, 2026
treff7es added 5 commits April 2, 2026 21:29
Move pydantic/requests imports above module-level code to fix E402,
and auto-format lines exceeding 88-char limit.

Without the re-raise, get_workunits_internal silently swallows fatal
errors, breaking DataHub's error propagation contract.

Remove trivial wrapper that just called mcp.as_workunit(), replacing
all 4 call sites with direct .as_workunit() calls.

- Replace **dict spread for parent_container with explicit kwarg using
  unset sentinel, fixing arg-type errors on Dashboard constructor
- Add FieldConfidence type annotation to confidence variable so mypy
  can narrow the Literal type from the ternary expression
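
The re-raise fix described in the commits above addresses a general pitfall with generators: if an exception is caught and not re-raised, iteration just ends and the caller sees what looks like a short but successful run. The sketch below shows the pattern in isolation; `emit_work_units` and its parameters are hypothetical and do not mirror the real `get_workunits_internal` signature.

```python
import logging
from typing import Callable, Iterable, Iterator, List

logger = logging.getLogger(__name__)


def emit_work_units(
    produce: Callable[[], Iterable[str]], failures: List[str]
) -> Iterator[str]:
    """Yield work units; record any fatal error, then re-raise it."""
    try:
        yield from produce()
    except Exception as e:
        failures.append(str(e))
        logger.error("Fatal error during ingestion: %s", e)
        # Without this bare raise the generator would simply end, and the
        # caller could not tell the run had failed.
        raise
```

The error is still recorded for reporting, but the caller also gets the exception and can mark the run as failed.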
@treff7es
Contributor

treff7es commented Apr 2, 2026

> Hey @treff7es, curious what else needs to be updated here. Am I missing anything?

I'm checking and fixing the linter issues.

treff7es added 2 commits April 2, 2026 22:05
Add isinstance assertion to narrow pipeline.source from Source to
OmniSource before accessing .client attribute.

The parent_container fix now properly invokes _set_container, which
generates browsePathsV2 with the actual folder path instead of empty
paths. Also updates field ordering in fine-grained lineage entries.

Per Omni docs, the data model hierarchy is:
  Physical Table → View → Topic → Dashboard

Previously, lineage was modeled incorrectly in multiple ways:
- Physical tables listed Omni views as upstreams (inverted)
- Views listed topics as upstreams (inverted)
- Views accumulated all dashboard topic URNs (incorrect scoping)

Now correctly:
- Views list their physical source table as upstream (COPY type)
- Topics list their constituent views as upstreams (TRANSFORMED type)
- Physical tables have no Omni upstream
- Dashboard structural upstream is its folder only, with FGL edges
  pointing to semantic view fields

Also fixes Set[Any] → Set[FieldRef] for type safety in field
reference collections.
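
To illustrate the corrected lineage direction from the commit above (Physical Table → View → Topic), here is a small self-contained sketch that builds an upstream map from fixture data. The entity names and the dictionary-based representation are assumptions for the example; the real connector emits DataHub lineage aspects rather than plain dicts.

```python
from typing import Dict, List

# Hypothetical fixture: one topic built from two views, each backed by a table.
VIEW_TO_TABLE: Dict[str, str] = {
    "orders_view": "ANALYTICS.PUBLIC.ORDERS",
    "users_view": "ANALYTICS.PUBLIC.USERS",
}
TOPIC_TO_VIEWS: Dict[str, List[str]] = {"sales_topic": ["orders_view", "users_view"]}


def build_upstreams() -> Dict[str, List[Dict[str, str]]]:
    """Map each entity to its upstreams, following Physical Table -> View -> Topic."""
    upstreams: Dict[str, List[Dict[str, str]]] = {}
    for view, table in VIEW_TO_TABLE.items():
        # A view's upstream is its physical source table (COPY type).
        upstreams[view] = [{"dataset": table, "type": "COPY"}]
        # Physical tables have no Omni upstream.
        upstreams.setdefault(table, [])
    for topic, views in TOPIC_TO_VIEWS.items():
        # A topic's upstreams are its constituent views (TRANSFORMED type).
        upstreams[topic] = [{"dataset": v, "type": "TRANSFORMED"} for v in views]
    return upstreams
```

Reading the map from a topic downwards reproduces the hierarchy in the commit message: the topic points at its views, each view points at its table, and the tables point at nothing inside Omni.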
Contributor

@treff7es treff7es left a comment

lgtm now, thanks for the contribution

@treff7es treff7es enabled auto-merge (squash) April 2, 2026 22:10
@treff7es treff7es disabled auto-merge April 2, 2026 22:10
@treff7es treff7es merged commit e7f2418 into datahub-project:master Apr 3, 2026
67 checks passed

Labels

community-contribution: PR or Issue raised by member(s) of DataHub Community
ingestion: PR or Issue related to the ingestion of metadata
needs-review: Label for PRs that need review from a maintainer.

4 participants