Commit e7f2418

feat(ingestion): add Omni BI platform source (INCUBATING) (#16564)
1 parent 88b0b27 commit e7f2418

File tree

18 files changed: +4403 / -6 lines

Lines changed: 18 additions & 0 deletions
@@ -0,0 +1,18 @@
## Overview

Omni is a cloud-native business intelligence platform. Learn more in the [official Omni documentation](https://docs.omni.co/).

The DataHub integration for Omni covers BI entities such as dashboards, charts, and semantic datasets, together with their ownership context. It also captures coarse and column-level lineage, schema metadata, descriptions, ownership, and stateful deletion detection.

## Concept Mapping

| Omni Concept    | DataHub Concept                                                | Notes                                                                                |
| --------------- | -------------------------------------------------------------- | ------------------------------------------------------------------------------------ |
| `Folder`        | [Container](../../metamodel/entities/container.md)             | SubType `"Folder"` |
| `Dashboard`     | [Dashboard](../../metamodel/entities/dashboard.md)             | Published document with `hasDashboard=true` |
| `Tile`          | [Chart](../../metamodel/entities/chart.md)                     | Each query presentation within a dashboard |
| `Topic`         | [Dataset](../../metamodel/entities/dataset.md)                 | SubType `"Topic"`; the semantic join graph entry point |
| `View`          | [Dataset](../../metamodel/entities/dataset.md)                 | SubType `"View"`; semantic layer table with dimensions and measures as schema fields |
| `Workbook`      | [Dataset](../../metamodel/entities/dataset.md)                 | SubType `"Workbook"`; unpublished personal exploration document |
| Warehouse table | [Dataset](../../metamodel/entities/dataset.md)                 | Native platform entity (e.g. Snowflake, BigQuery); linked as upstream of Omni Views |
| Document owner  | [User (a.k.a. CorpUser)](../../metamodel/entities/corpuser.md) | Propagated as `TECHNICAL_OWNER` to Dashboard and Chart entities |
Lines changed: 59 additions & 0 deletions
@@ -0,0 +1,59 @@
### Capabilities

Use the **Important Capabilities** table above as the source of truth for supported features and whether additional configuration is required.

#### Physical table lineage

Omni Views reference physical warehouse tables via `sql_table_name` in model YAML. The connector resolves each reference to a DataHub dataset URN using the `connection_to_platform` mapping. If `normalize_snowflake_names: true` (the default), database, schema, and table name components are uppercased to match the casing used by the DataHub Snowflake connector.
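The normalization step can be illustrated with a short sketch. `normalize_snowflake_parts` is a hypothetical helper, not the connector's actual code:

```python
def normalize_snowflake_parts(database: str, schema: str, table: str,
                              normalize: bool = True) -> str:
    # When normalize is True (mirroring normalize_snowflake_names: true),
    # uppercase each component so the resulting dataset name matches the
    # casing emitted by the DataHub Snowflake connector.
    parts = (database, schema, table)
    if normalize:
        parts = tuple(p.upper() for p in parts)
    return ".".join(parts)

print(normalize_snowflake_parts("analytics_prod", "public", "orders"))
# ANALYTICS_PROD.PUBLIC.ORDERS
```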
#### Column-level lineage

When `include_column_lineage: true` (the default), the connector emits `FineGrainedLineage` entries by parsing `sql` expressions in model YAML and matching field references to known view columns. This enables precise field-level impact analysis across the full chain:

```
physical_table.column → semantic_view.field → dashboard_tile.field
```
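The field-matching idea can be sketched as follows; the regex-based `referenced_columns` helper is an illustrative assumption, not the parser the connector actually uses:

```python
import re

def referenced_columns(sql_expr: str, known_columns: set) -> set:
    # Extract identifier-like tokens from the expression and keep only
    # those that are real columns of the view; function names such as
    # SUM or COUNT fall out of the intersection.
    tokens = set(re.findall(r"[A-Za-z_][A-Za-z0-9_]*", sql_expr))
    return tokens & known_columns

cols = {"order_id", "amount", "created_at"}
print(sorted(referenced_columns("SUM(amount) / COUNT(order_id)", cols)))
# ['amount', 'order_id']
```

Each matched column would then become the upstream side of a fine-grained lineage entry.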
#### Schema metadata

For each Omni Semantic View, the connector emits a `SchemaMetadata` aspect containing one `SchemaField` per dimension and measure defined in model YAML:

- **Dimensions**: emitted with the inferred native type (string, date, timestamp, number, boolean)
- **Measures**: emitted with their aggregation type and native type `NUMBER`
- Field descriptions are extracted from the YAML `description` attribute when present
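As a sketch of that mapping (the dictionary layout below is an illustrative assumption, not Omni's exact YAML schema):

```python
def schema_fields(view: dict) -> list:
    # Dimensions keep their declared native type (defaulting to string);
    # measures always get native type NUMBER, as described above.
    fields = []
    for name, dim in view.get("dimensions", {}).items():
        fields.append((name, dim.get("type", "string")))
    for name, _measure in view.get("measures", {}).items():
        fields.append((name, "NUMBER"))
    return fields

view = {
    "dimensions": {"created_at": {"type": "timestamp"}, "status": {}},
    "measures": {"total_amount": {"aggregate_type": "sum"}},
}
print(schema_fields(view))
# [('created_at', 'timestamp'), ('status', 'string'), ('total_amount', 'NUMBER')]
```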
#### Model and document filtering

Use `model_pattern` and `document_pattern` to restrict ingestion to specific models or dashboards:

```yaml
model_pattern:
  allow:
    - "^prod-.*"
  deny:
    - ".*-dev$"

document_pattern:
  allow:
    - ".*"
```
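The allow/deny semantics can be modeled in a few lines of Python. This is a minimal sketch, assuming regex `allow` rules are checked first and any `deny` match wins:

```python
import re

def is_allowed(name: str, allow: list, deny: list) -> bool:
    # A name is ingested only if it matches at least one allow pattern
    # and no deny pattern.
    if not any(re.match(p, name) for p in allow):
        return False
    return not any(re.match(p, name) for p in deny)

print(is_allowed("prod-sales", ["^prod-.*"], [".*-dev$"]))      # True
print(is_allowed("prod-sales-dev", ["^prod-.*"], [".*-dev$"]))  # False
```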
### Limitations

- Access Filters, User Attributes, and Cache schedules are not yet ingested.
- Column lineage is limited to fields that appear in model YAML `sql` expressions; complex or fully derived expressions may not fully resolve.
- Large organizations with many models may approach Omni API rate limits; tune `max_requests_per_minute` accordingly.
- True end-to-end integration tests require a live Omni environment; the test suite uses deterministic mock API responses.

### Troubleshooting

If ingestion fails, validate credentials, permissions, and connectivity first. Then review the ingestion report and logs for source-specific errors.

Common issues:

| Symptom | Likely Cause | Resolution |
| ------- | ------------ | ---------- |
| `403 Forbidden` on `/v1/connections` | API key lacks connection read scope | Grant the scope, or rely on config fallbacks (ingestion continues, but physical lineage may be incomplete) |
| Physical tables not linked to warehouse entities | `connection_to_platform` not configured | Add a connection mapping for each Omni connection ID |
| Snowflake URN mismatch | Case mismatch between Omni and DataHub Snowflake URNs | Ensure `normalize_snowflake_names: true` (the default) |
| Column lineage empty | View YAML has no `sql` expressions | Expected for views using direct `sql_table_name` without field-level SQL |
Lines changed: 38 additions & 0 deletions
@@ -0,0 +1,38 @@
### Overview

The `omni` module ingests metadata from the [Omni](https://omni.co/) BI platform into DataHub. It is intended for production ingestion workflows and supports the following:

- Folders (as Containers), Dashboards, and Chart tiles
- Semantic layer: Models, Topics, and Views with schema fields (dimensions and measures)
- Physical warehouse tables with upstream lineage stitched to existing DataHub entities
- Column-level (fine-grained) lineage from semantic view fields back to warehouse columns
- Ownership propagated from the Omni document API

Lineage is emitted as a five-hop chain:

```
Folder → Dashboard → Chart (tile) → Topic → Semantic View → Physical Table
```

### Prerequisites

Before running ingestion, ensure you have the following:

1. **An Omni Organization API key** with read access to models, documents, and connections. Generate API keys in Omni Admin → API Keys.

2. **Connection mapping configuration** if you want physical table lineage to stitch with existing warehouse entities in DataHub. Map each Omni connection ID to the corresponding DataHub platform name, platform instance, and database name:

   ```yaml
   connection_to_platform:
     "conn_abc123": "snowflake"
   connection_to_platform_instance:
     "conn_abc123": "my_snowflake_account"
   connection_to_database:
     "conn_abc123": "ANALYTICS_PROD"
   ```

   Connection IDs can be found via the Omni `/v1/connections` API or in the Omni Admin UI.

:::note
If the Omni API key does not have permission to list connections (`403 Forbidden`), the connector falls back to the `connection_to_platform` config overrides and continues ingestion without failing.
:::
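That fallback behavior can be modeled with a small sketch; `resolve_platforms` and the `PermissionError` signaling are illustrative assumptions, not the connector's real API:

```python
def resolve_platforms(fetch_connections, config_overrides: dict) -> dict:
    # fetch_connections() stands in for the /v1/connections call and is
    # assumed to raise PermissionError on a 403 response. Config
    # overrides always take precedence; on a 403 they are all we have.
    try:
        platforms = dict(fetch_connections())
    except PermissionError:
        platforms = {}
    platforms.update(config_overrides)
    return platforms

def forbidden():
    # Simulate an API key without connection read scope.
    raise PermissionError("403 Forbidden on /v1/connections")

print(resolve_platforms(forbidden, {"conn_abc123": "snowflake"}))
# {'conn_abc123': 'snowflake'}
```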
Lines changed: 46 additions & 0 deletions
@@ -0,0 +1,46 @@
1+
source:
2+
type: omni
3+
config:
4+
# Coordinates
5+
base_url: "https://your-org.omniapp.co/api"
6+
7+
# Credentials
8+
api_key: "${OMNI_API_KEY}"
9+
10+
# Connection → warehouse stitching
11+
# Map Omni connection IDs to DataHub platform names so that physical table
12+
# URNs match what was ingested by your warehouse source connector.
13+
connection_to_platform:
14+
"conn_abc123": "snowflake"
15+
16+
# Optional: map connection IDs to platform instances
17+
# connection_to_platform_instance:
18+
# "conn_abc123": "my_snowflake_account"
19+
20+
# Optional: override the database name inferred from the Omni connection
21+
# connection_to_database:
22+
# "conn_abc123": "ANALYTICS_PROD"
23+
24+
# Optional: include workbook-only documents (not just published dashboards)
25+
# include_workbook_only: false
26+
27+
# Optional: filter which models to ingest
28+
# model_pattern:
29+
# allow:
30+
# - ".*"
31+
32+
# Optional: filter which documents (dashboards/workbooks) to ingest
33+
# document_pattern:
34+
# allow:
35+
# - ".*"
36+
37+
# Optional: disable column-level lineage
38+
# include_column_lineage: true
39+
40+
# Optional: stateful ingestion with stale entity removal
41+
stateful_ingestion:
42+
enabled: true
43+
remove_stale_metadata: true
44+
45+
sink:
46+
# sink configs
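Before running the recipe, a quick sanity check over the loaded config can catch the most common gaps. `check_recipe` is a hypothetical helper using the field names from the recipe above:

```python
def check_recipe(recipe: dict) -> list:
    # Flag missing required fields and the most common lineage
    # misconfiguration before kicking off ingestion.
    problems = []
    cfg = recipe.get("source", {}).get("config", {})
    if not cfg.get("base_url"):
        problems.append("base_url is required")
    if not cfg.get("api_key"):
        problems.append("api_key is required")
    if not cfg.get("connection_to_platform"):
        problems.append("no connection_to_platform mapping: "
                        "physical table lineage will not be stitched")
    return problems

recipe = {"source": {"type": "omni",
                     "config": {"base_url": "https://your-org.omniapp.co/api"}}}
print(check_recipe(recipe))
```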

metadata-ingestion/pyproject.toml

Lines changed: 9 additions & 2 deletions
```diff
@@ -910,6 +910,11 @@ okta = [
     "okta~=1.7.0,<2.0.0",
 ]
+omni = [
+    "PyYAML>=5.4",
+    "requests<3.0.0",
+]
 oracle = [
     "acryl-datahub-classify==0.0.11",
     "acryl-great-expectations==0.15.50.1",
@@ -1486,6 +1491,7 @@ all = [
     "pyspark~=3.5.6,<4.0.0",
     "python-ldap>=2.4,<4.0.0",
     "python-liquid>=2.0.0,<3.0.0",
+    "PyYAML>=5.4",
     "rdflib==6.3.2",
     "redash-toolbelt<0.2.0",
     "redshift-connector>=2.1.5,<3.0.0",
@@ -1651,7 +1657,7 @@ dev = [
     "python-json-logger>=2.0.0,<5.0.0",
     "python-ldap>=2.4,<4.0.0",
     "python-liquid>=2.0.0,<3.0.0",
-    "PyYAML<7.0.0",
+    "PyYAML>=5.4,<7.0.0",
     "rdflib==6.3.2",
     "redash-toolbelt<0.2.0",
     "redshift-connector>=2.1.5,<3.0.0",
@@ -1842,7 +1848,7 @@ docs = [
     "python-json-logger>=2.0.0,<5.0.0",
     "python-ldap>=2.4,<4.0.0",
     "python-liquid>=2.0.0,<3.0.0",
-    "PyYAML<7.0.0",
+    "PyYAML>=5.4,<7.0.0",
     "rdflib==6.3.2",
     "redash-toolbelt<0.2.0",
     "redshift-connector>=2.1.5,<3.0.0",
@@ -2111,6 +2117,7 @@ cassandra = "datahub.ingestion.source.cassandra.cassandra:CassandraSource"
 neo4j = "datahub.ingestion.source.neo4j.neo4j_source:Neo4jSource"
 vertexai = "datahub.ingestion.source.vertexai.vertexai:VertexAISource"
 hex = "datahub.ingestion.source.hex.hex:HexSource"
+omni = "datahub.ingestion.source.omni.omni:OmniSource"

 [project.entry-points."datahub.ingestion.transformer.plugins"]
 pattern_cleanup_ownership = "datahub.ingestion.transformer.pattern_cleanup_ownership:PatternCleanUpOwnership"
```

metadata-ingestion/setup.py

Lines changed: 3 additions & 0 deletions
```diff
@@ -633,6 +633,7 @@
     },
     "flink": {"requests<3.0.0", "tenacity>=8.0.1,<9.0.0"},
     "grafana": {"requests<3.0.0", *sqlglot_lib},
+    "omni": {"requests<3.0.0", "PyYAML>=5.4"},
     "glue": aws_common | cachetools_lib | sqlglot_lib,
     # hdbcli is supported officially by SAP, sqlalchemy-hana is built on top but not officially supported
     "hana": sql_common
@@ -958,6 +959,7 @@
         "neo4j",
         "vertexai",
         "mssql-odbc",
+        "omni",
     ]
     if plugin
     for dependency in plugins[plugin]
@@ -1118,6 +1120,7 @@
         "neo4j = datahub.ingestion.source.neo4j.neo4j_source:Neo4jSource",
         "vertexai = datahub.ingestion.source.vertexai.vertexai:VertexAISource",
         "hex = datahub.ingestion.source.hex.hex:HexSource",
+        "omni = datahub.ingestion.source.omni.omni:OmniSource",
     ],
     "datahub.ingestion.transformer.plugins": [
         "pattern_cleanup_ownership = datahub.ingestion.transformer.pattern_cleanup_ownership:PatternCleanUpOwnership",
```

metadata-ingestion/src/datahub/ingestion/autogenerated/connector_registry/datahub.json

Lines changed: 57 additions & 1 deletion
@@ -2520,6 +2520,62 @@
25202520
"platform_name": "Okta",
25212521
"support_status": "CERTIFIED"
25222522
},
2523+
"omni": {
2524+
"capabilities": [
2525+
{
2526+
"capability": "LINEAGE_FINE",
2527+
"description": "Field-level lineage when include_column_lineage=true",
2528+
"subtype_modifier": null,
2529+
"supported": true
2530+
},
2531+
{
2532+
"capability": "DESCRIPTIONS",
2533+
"description": "Enabled by default",
2534+
"subtype_modifier": null,
2535+
"supported": true
2536+
},
2537+
{
2538+
"capability": "DELETION_DETECTION",
2539+
"description": "Enabled by default via stateful ingestion",
2540+
"subtype_modifier": null,
2541+
"supported": true
2542+
},
2543+
{
2544+
"capability": "OWNERSHIP",
2545+
"description": "Document owner extracted from Omni API",
2546+
"subtype_modifier": null,
2547+
"supported": true
2548+
},
2549+
{
2550+
"capability": "PLATFORM_INSTANCE",
2551+
"description": "Supported via connection_to_platform_instance config",
2552+
"subtype_modifier": null,
2553+
"supported": true
2554+
},
2555+
{
2556+
"capability": "SCHEMA_METADATA",
2557+
"description": "Dimensions and measures extracted as schema columns",
2558+
"subtype_modifier": null,
2559+
"supported": true
2560+
},
2561+
{
2562+
"capability": "LINEAGE_COARSE",
2563+
"description": "Dashboard \u2192 Tile \u2192 Topic \u2192 View \u2192 DB Table",
2564+
"subtype_modifier": null,
2565+
"supported": true
2566+
},
2567+
{
2568+
"capability": "TEST_CONNECTION",
2569+
"description": "Enabled by default",
2570+
"subtype_modifier": null,
2571+
"supported": true
2572+
}
2573+
],
2574+
"classname": "datahub.ingestion.source.omni.omni.OmniSource",
2575+
"platform_id": "omni",
2576+
"platform_name": "Omni",
2577+
"support_status": "INCUBATING"
2578+
},
25232579
"openapi": {
25242580
"capabilities": [
25252581
{
@@ -4238,4 +4294,4 @@
42384294
"support_status": "CERTIFIED"
42394295
}
42404296
}
4241-
}
4297+
}

metadata-ingestion/src/datahub/ingestion/source/omni/__init__.py

Whitespace-only changes.
