Merged

37 commits
751c2e7
feat(ingestion): add Omni BI platform source (INCUBATING)
bearsandrhinos Mar 12, 2026
688218a
fix: address pre-PR review warnings in OmniSource
bearsandrhinos Mar 12, 2026
8ada427
docs(omni): restructure docs to OSS folder format
bearsandrhinos Mar 12, 2026
b1689e4
fix(omni): resolve CI lint and markdown format failures
bearsandrhinos Mar 12, 2026
14e85f1
fix(omni): resolve remaining CI failures
bearsandrhinos Mar 12, 2026
96a33f1
fix(omni): fix import sort order to pass ruff 0.11.7
bearsandrhinos Mar 12, 2026
307c9df
fix(omni): add omni entry point to pyproject.toml
bearsandrhinos Mar 12, 2026
69ee24d
fix(omni): apply ruff format to test_omni_integration.py
bearsandrhinos Mar 12, 2026
50a5079
fix(omni): resolve mypy type errors in source and integration tests
bearsandrhinos Mar 12, 2026
0b6ab6c
fix(omni): resolve mypy errors in testQuick CI
bearsandrhinos Mar 12, 2026
8022d42
Merge branch 'master' into feat/omni-source
bearsandrhinos Mar 12, 2026
e89e010
Update metadata-ingestion/src/datahub/ingestion/source/omni/omni.py
bearsandrhinos Mar 17, 2026
257976d
Update metadata-ingestion/src/datahub/ingestion/source/omni/omni.py
bearsandrhinos Mar 17, 2026
b7eff21
Update metadata-ingestion/src/datahub/ingestion/source/omni/omni.py
bearsandrhinos Mar 17, 2026
68ab468
fix(omni): add missing topics_scanned field to OmniSourceReport
bearsandrhinos Mar 17, 2026
9b0b857
fix(omni): log when skipping YAML parse failures
bearsandrhinos Mar 17, 2026
1199240
fix(omni): use Literal type for SemanticField confidence field
bearsandrhinos Mar 17, 2026
e6cd0a2
fix(omni): add base_url validator for http(s) scheme and trailing slash
bearsandrhinos Mar 17, 2026
6ab5658
fix(omni): log when last_modified datetime parse fails
bearsandrhinos Mar 17, 2026
00c3754
Update metadata-ingestion/tests/integration/omni/test_omni_integratio…
bearsandrhinos Mar 18, 2026
89ed609
Update metadata-ingestion/src/datahub/ingestion/source/omni/omni.py
bearsandrhinos Mar 18, 2026
5557e9f
fix(omni): fix linting errors and update golden file to standard MCP …
treff7es Mar 19, 2026
b1abb2c
Merge branch 'master' into feat/omni-source
treff7es Mar 19, 2026
aa00677
Omni API: log when pagination hits safety cap; document max pages
bearsandrhinos Mar 24, 2026
10dbac8
Omni: fix _ingest_topic_payload typing; add dashboard tile helper (re…
bearsandrhinos Mar 24, 2026
0098d6b
Omni: use DatasetSubTypes.TOPIC for topic dataset subtype
bearsandrhinos Mar 24, 2026
5f46db1
Merge origin/master into feat/omni-source
bearsandrhinos Apr 1, 2026
cccf93b
fix(omni): alphabetize omni optional-deps after okta in pyproject.toml
bearsandrhinos Apr 1, 2026
460ce6f
Merge branch 'master' into feat/omni-source
treff7es Apr 2, 2026
cd1f7e0
fix(ingest/omni): fix lint errors in omni source
treff7es Apr 2, 2026
34fae34
fix(ingest/omni): re-raise exception after reporting failure
treff7es Apr 2, 2026
22b2f0f
refactor(ingest/omni): inline _as_wu method
treff7es Apr 2, 2026
9fa3fce
fix(ingest/omni): fix mypy type errors in omni source
treff7es Apr 2, 2026
4afa12f
fix(ingest/omni): fix mypy attr-defined error in omni test
treff7es Apr 2, 2026
944c183
fix(ingest/omni): regenerate pyproject.toml and uv.lock from setup.py
treff7es Apr 2, 2026
b295d9d
fix(ingest/omni): update golden file for correct browse paths
treff7es Apr 2, 2026
a041d60
fix(ingest/omni): correct lineage direction to match Omni data model
treff7es Apr 2, 2026
18 changes: 18 additions & 0 deletions metadata-ingestion/docs/sources/omni/README.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,18 @@
## Overview

Omni is a cloud-native business intelligence platform. Learn more in the [official Omni documentation](https://docs.omni.co/).

The DataHub integration for Omni covers BI entities such as dashboards, charts, semantic datasets, and related ownership context. It also captures coarse and column-level lineage, schema metadata, ownership, and stateful deletion detection.

## Concept Mapping

| Omni Concept | DataHub Concept | Notes |
| --------------- | ------------------------------------------------------------- | ------------------------------------------------------------------------------------- |
| `Folder` | [Container](../../metamodel/entities/container.md) | SubType `"Folder"` |
| `Dashboard` | [Dashboard](../../metamodel/entities/dashboard.md) | Published document with `hasDashboard=true` |
| `Tile` | [Chart](../../metamodel/entities/chart.md) | Each query presentation within a dashboard |
| `Topic` | [Dataset](../../metamodel/entities/dataset.md) | SubType `"Topic"` — the semantic join graph entry point |
| `View` | [Dataset](../../metamodel/entities/dataset.md) | SubType `"View"` — semantic layer table with dimensions and measures as schema fields |
| `Workbook` | [Dataset](../../metamodel/entities/dataset.md) | SubType `"Workbook"` — unpublished personal exploration document |
| Warehouse table | [Dataset](../../metamodel/entities/dataset.md) | Native platform entity (e.g. Snowflake, BigQuery); linked as upstream of Omni Views |
| Document owner | [User (a.k.a CorpUser)](../../metamodel/entities/corpuser.md) | Propagated as `TECHNICAL_OWNER` to Dashboard and Chart entities |
59 changes: 59 additions & 0 deletions metadata-ingestion/docs/sources/omni/omni_post.md
@@ -0,0 +1,59 @@
### Capabilities

Use the **Important Capabilities** table above as the source of truth for supported features and whether additional configuration is required.

#### Physical table lineage

Omni Views reference physical warehouse tables via `sql_table_name` in model YAML. The connector resolves each reference to a DataHub dataset URN using the `connection_to_platform` mapping. If `normalize_snowflake_names: true` (default), database, schema, and table name components are uppercased to match the casing used by the DataHub Snowflake connector.
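As an illustration, the resolution step can be sketched roughly as follows. The helper name and the exact URN construction below are assumptions for illustration, not the connector's actual code:

```python
def resolve_table_urn(
    sql_table_name: str,
    connection_id: str,
    connection_to_platform: dict,
    connection_to_database: dict,
    normalize_snowflake_names: bool = True,
) -> str:
    """Map an Omni sql_table_name reference to a DataHub dataset URN (sketch)."""
    platform = connection_to_platform[connection_id]
    parts = sql_table_name.split(".")
    if len(parts) == 2:
        # schema.table reference: prepend the database mapped for this connection
        parts = [connection_to_database[connection_id]] + parts
    name = ".".join(parts)
    if normalize_snowflake_names and platform == "snowflake":
        # Uppercase to match the casing used by the DataHub Snowflake connector
        name = name.upper()
    return f"urn:li:dataset:(urn:li:dataPlatform:{platform},{name},PROD)"
```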

#### Column-level lineage

When `include_column_lineage: true` (default), the connector emits `FineGrainedLineage` entries by parsing `sql` expressions in model YAML and matching field references to known view columns. This enables precise field-level impact analysis across the full chain:

```
physical_table.column → semantic_view.field → dashboard_tile.field
```
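A rough sketch of the matching step is shown below. The `${view.field}` reference syntax and the helper are illustrative assumptions, not the connector's actual parser:

```python
import re

def column_lineage_pairs(sql_expr: str, downstream_field: str, known_columns: set):
    """Extract ${view.field}-style references and keep only known columns (sketch)."""
    refs = re.findall(r"\$\{([\w.]+)\}", sql_expr)
    # Each resolved reference becomes one fine-grained (upstream, downstream) pair
    return [(ref, downstream_field) for ref in refs if ref in known_columns]
```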

#### Schema metadata

For each Omni Semantic View, the connector emits a `SchemaMetadata` aspect containing one `SchemaField` per dimension and measure defined in model YAML:

- **Dimensions**: emitted with inferred native type (string, date, timestamp, number, boolean)
- **Measures**: emitted with aggregation type and native type `NUMBER`
- Field descriptions are extracted from the YAML `description` attribute when present
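A simplified sketch of that mapping, operating on an already-parsed view definition (the record layout here is illustrative, not the connector's internal types):

```python
def to_schema_fields(view: dict) -> list:
    """Turn parsed dimensions/measures into schema-field records (sketch)."""
    fields = []
    for name, spec in (view.get("dimensions") or {}).items():
        fields.append({
            "fieldPath": name,
            "nativeDataType": spec.get("type", "string"),  # inferred native type
            "description": spec.get("description"),
        })
    for name, spec in (view.get("measures") or {}).items():
        fields.append({
            "fieldPath": name,
            "nativeDataType": "NUMBER",  # measures are emitted as NUMBER
            "description": spec.get("description"),
        })
    return fields
```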

#### Model and document filtering

Use `model_pattern` and `document_pattern` to restrict ingestion to specific models or dashboards:

```yaml
model_pattern:
  allow:
    - "^prod-.*"
  deny:
    - ".*-dev$"

document_pattern:
  allow:
    - ".*"
```
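These patterns follow the usual allow/deny semantics in DataHub ingestion configs: a name is ingested only if it matches an allow regex and no deny regex. A minimal approximation of that check:

```python
import re

def is_allowed(name: str, allow=(".*",), deny=()) -> bool:
    """Approximate allow/deny pattern check: deny takes precedence over allow."""
    if any(re.match(p, name) for p in deny):
        return False
    return any(re.match(p, name) for p in allow)
```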

### Limitations

- Access Filters, User Attributes, and Cache schedules are not yet ingested.
- Column lineage is limited to fields that appear in model YAML `sql` expressions; complex or fully derived expressions may not fully resolve.
- Large organizations with many models may approach Omni API rate limits; tune `max_requests_per_minute` accordingly.
- True end-to-end integration tests require a live Omni environment; the test suite uses deterministic mock API responses.
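For context, a `max_requests_per_minute`-style throttle can be sketched as below. This is illustrative only, not the connector's actual rate-limit implementation:

```python
import time

class RateLimiter:
    """Simple fixed-interval throttle for max_requests_per_minute-style limits."""

    def __init__(self, max_requests_per_minute: int):
        self.min_interval = 60.0 / max_requests_per_minute
        self.last = float("-inf")  # no request made yet

    def wait(self, now=None, sleep=time.sleep) -> float:
        """Block until the minimum interval since the last request has passed."""
        now = time.monotonic() if now is None else now
        delay = self.min_interval - (now - self.last)
        if delay > 0:
            sleep(delay)
            now += delay
        self.last = now
        return now
```

The `now` and `sleep` parameters are injectable so the behavior can be tested without real waiting.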

### Troubleshooting

If ingestion fails, validate credentials, permissions, and connectivity first. Then review the ingestion report and logs for source-specific errors.

Common issues:

| Symptom | Likely Cause | Resolution |
| ------------------------------------------------ | ----------------------------------------------------- | ----------------------------------------------------------------------------- |
| `403 Forbidden` on `/v1/connections` | API key lacks connection read scope | Ingestion continues with config fallbacks; physical lineage may be incomplete |
| Physical tables not linked to warehouse entities | `connection_to_platform` not configured | Add connection mapping for each Omni connection ID |
| Snowflake URN mismatch | Case mismatch between Omni and DataHub Snowflake URNs | Ensure `normalize_snowflake_names: true` (default) |
| Column lineage empty | View YAML has no `sql` expressions | Expected for views using direct `sql_table_name` without field-level SQL |
38 changes: 38 additions & 0 deletions metadata-ingestion/docs/sources/omni/omni_pre.md
@@ -0,0 +1,38 @@
### Overview

The `omni` module ingests metadata from the [Omni](https://omni.co/) BI platform into DataHub. It is intended for production ingestion workflows and supports the following:

- Folders (as Containers), Dashboards, and Chart tiles
- Semantic layer: Models, Topics, and Views with schema fields (dimensions and measures)
- Physical warehouse tables with upstream lineage stitched to existing DataHub entities
- Column-level (fine-grained) lineage from semantic view fields back to warehouse columns
- Ownership propagated from the Omni document API

Lineage is emitted as a five-hop chain:

```
Folder → Dashboard → Chart (tile) → Topic → Semantic View → Physical Table
```

### Prerequisites

Before running ingestion, ensure you have the following:

1. **An Omni Organization API key** with read access to models, documents, and connections. Generate API keys in Omni Admin → API Keys.

2. **Connection mapping configuration** if you want physical table lineage to stitch with existing warehouse entities in DataHub. You will need to map each Omni connection ID to the corresponding DataHub platform name, platform instance, and database name:

```yaml
connection_to_platform:
  "conn_abc123": "snowflake"
connection_to_platform_instance:
  "conn_abc123": "my_snowflake_account"
connection_to_database:
  "conn_abc123": "ANALYTICS_PROD"
```

Connection IDs can be found by calling the Omni `/v1/connections` API or from the Omni Admin UI.

:::note
If the Omni API key does not have permission to list connections (`403 Forbidden`), the connector will fall back to the `connection_to_platform` config overrides and continue ingestion without failing.
:::
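That fallback behavior can be sketched as follows (hypothetical helper, not the connector's actual code):

```python
def connections_or_fallback(status_code: int, api_connections: list,
                            connection_to_platform: dict) -> list:
    """Use the API's connection list, or fall back to config overrides on 403 (sketch)."""
    if status_code == 403:
        # API key lacks the connection read scope: continue with config overrides
        return [{"id": cid, "platform": platform}
                for cid, platform in connection_to_platform.items()]
    return api_connections
```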
46 changes: 46 additions & 0 deletions metadata-ingestion/docs/sources/omni/omni_recipe.yml
@@ -0,0 +1,46 @@
source:
  type: omni
  config:
    # Coordinates
    base_url: "https://your-org.omniapp.co/api"

    # Credentials
    api_key: "${OMNI_API_KEY}"

    # Connection → warehouse stitching
    # Map Omni connection IDs to DataHub platform names so that physical table
    # URNs match what was ingested by your warehouse source connector.
    connection_to_platform:
      "conn_abc123": "snowflake"

    # Optional: map connection IDs to platform instances
    # connection_to_platform_instance:
    #   "conn_abc123": "my_snowflake_account"

    # Optional: override the database name inferred from the Omni connection
    # connection_to_database:
    #   "conn_abc123": "ANALYTICS_PROD"

    # Optional: include workbook-only documents (not just published dashboards)
    # include_workbook_only: false

    # Optional: filter which models to ingest
    # model_pattern:
    #   allow:
    #     - ".*"

    # Optional: filter which documents (dashboards/workbooks) to ingest
    # document_pattern:
    #   allow:
    #     - ".*"

    # Optional: disable column-level lineage
    # include_column_lineage: true

    # Optional: stateful ingestion with stale entity removal
    stateful_ingestion:
      enabled: true
      remove_stale_metadata: true

sink:
  # sink configs
11 changes: 9 additions & 2 deletions metadata-ingestion/pyproject.toml
@@ -910,6 +910,11 @@ okta = [
"okta~=1.7.0,<2.0.0",
]

omni = [
    "PyYAML>=5.4",
    "requests<3.0.0",
]

oracle = [
"acryl-datahub-classify==0.0.11",
"acryl-great-expectations==0.15.50.1",
@@ -1486,6 +1491,7 @@ all = [
"pyspark~=3.5.6,<4.0.0",
"python-ldap>=2.4,<4.0.0",
"python-liquid>=2.0.0,<3.0.0",
"PyYAML>=5.4",
"rdflib==6.3.2",
"redash-toolbelt<0.2.0",
"redshift-connector>=2.1.5,<3.0.0",
@@ -1651,7 +1657,7 @@ dev = [
"python-json-logger>=2.0.0,<5.0.0",
"python-ldap>=2.4,<4.0.0",
"python-liquid>=2.0.0,<3.0.0",
"PyYAML<7.0.0",
"PyYAML>=5.4,<7.0.0",
"rdflib==6.3.2",
"redash-toolbelt<0.2.0",
"redshift-connector>=2.1.5,<3.0.0",
@@ -1842,7 +1848,7 @@ docs = [
"python-json-logger>=2.0.0,<5.0.0",
"python-ldap>=2.4,<4.0.0",
"python-liquid>=2.0.0,<3.0.0",
"PyYAML<7.0.0",
"PyYAML>=5.4,<7.0.0",
"rdflib==6.3.2",
"redash-toolbelt<0.2.0",
"redshift-connector>=2.1.5,<3.0.0",
@@ -2111,6 +2117,7 @@ cassandra = "datahub.ingestion.source.cassandra.cassandra:CassandraSource"
neo4j = "datahub.ingestion.source.neo4j.neo4j_source:Neo4jSource"
vertexai = "datahub.ingestion.source.vertexai.vertexai:VertexAISource"
hex = "datahub.ingestion.source.hex.hex:HexSource"
omni = "datahub.ingestion.source.omni.omni:OmniSource"

[project.entry-points."datahub.ingestion.transformer.plugins"]
pattern_cleanup_ownership = "datahub.ingestion.transformer.pattern_cleanup_ownership:PatternCleanUpOwnership"
3 changes: 3 additions & 0 deletions metadata-ingestion/setup.py
@@ -633,6 +633,7 @@
},
"flink": {"requests<3.0.0", "tenacity>=8.0.1,<9.0.0"},
"grafana": {"requests<3.0.0", *sqlglot_lib},
"omni": {"requests<3.0.0", "PyYAML>=5.4"},
"glue": aws_common | cachetools_lib | sqlglot_lib,
# hdbcli is supported officially by SAP, sqlalchemy-hana is built on top but not officially supported
"hana": sql_common
@@ -958,6 +959,7 @@
"neo4j",
"vertexai",
"mssql-odbc",
"omni",
]
if plugin
for dependency in plugins[plugin]
@@ -1118,6 +1120,7 @@
"neo4j = datahub.ingestion.source.neo4j.neo4j_source:Neo4jSource",
"vertexai = datahub.ingestion.source.vertexai.vertexai:VertexAISource",
"hex = datahub.ingestion.source.hex.hex:HexSource",
"omni = datahub.ingestion.source.omni.omni:OmniSource",
],
"datahub.ingestion.transformer.plugins": [
"pattern_cleanup_ownership = datahub.ingestion.transformer.pattern_cleanup_ownership:PatternCleanUpOwnership",
@@ -2520,6 +2520,62 @@
"platform_name": "Okta",
"support_status": "CERTIFIED"
},
  "omni": {
    "capabilities": [
      {
        "capability": "LINEAGE_FINE",
        "description": "Field-level lineage when include_column_lineage=true",
        "subtype_modifier": null,
        "supported": true
      },
      {
        "capability": "DESCRIPTIONS",
        "description": "Enabled by default",
        "subtype_modifier": null,
        "supported": true
      },
      {
        "capability": "DELETION_DETECTION",
        "description": "Enabled by default via stateful ingestion",
        "subtype_modifier": null,
        "supported": true
      },
      {
        "capability": "OWNERSHIP",
        "description": "Document owner extracted from Omni API",
        "subtype_modifier": null,
        "supported": true
      },
      {
        "capability": "PLATFORM_INSTANCE",
        "description": "Supported via connection_to_platform_instance config",
        "subtype_modifier": null,
        "supported": true
      },
      {
        "capability": "SCHEMA_METADATA",
        "description": "Dimensions and measures extracted as schema columns",
        "subtype_modifier": null,
        "supported": true
      },
      {
        "capability": "LINEAGE_COARSE",
        "description": "Dashboard \u2192 Tile \u2192 Topic \u2192 View \u2192 DB Table",
        "subtype_modifier": null,
        "supported": true
      },
      {
        "capability": "TEST_CONNECTION",
        "description": "Enabled by default",
        "subtype_modifier": null,
        "supported": true
      }
    ],
    "classname": "datahub.ingestion.source.omni.omni.OmniSource",
    "platform_id": "omni",
    "platform_name": "Omni",
    "support_status": "INCUBATING"
  },
"openapi": {
"capabilities": [
{
@@ -4238,4 +4294,4 @@
"support_status": "CERTIFIED"
}
}
}
}
Empty file.