feat(ingestion): add Microsoft Fabric Data Factory connector #16646

Merged
aviraj-gour merged 29 commits into master from connector/fabric-data-factory on Mar 26, 2026

Conversation

@aviraj-gour
Contributor

Summary

This PR implements a new DataHub ingestion connector for Microsoft Fabric Data Factory. The connector extracts pipeline orchestration metadata (workspaces, pipelines, activities, execution history) and resolves lineage from Copy and InvokePipeline activities.

  • Adds a new fabric-data-factory ingestion source that extracts workspaces, data pipelines, activities, and execution history from Microsoft Fabric
  • Extracts dataset-level lineage from Copy activities and pipeline-to-pipeline lineage from InvokePipeline activities, with cross-recipe lineage support via platform_instance_map
  • Introduces shared Fabric infrastructure (fabric/common/) for core API client, models, constants, and URN generation reusable by sibling connectors (e.g. fabric-onelake)
  • Refactors ADF linked service platform mapping into shared azure/constants.py to eliminate duplication between ADF and Fabric connectors
  • Registers fabric and fabric-data-factory as new data platforms with logos in both frontend and backend bootstrap configs

What's Implemented

Workspace & Pipeline Metadata

  • Fabric workspaces emitted as DataHub Containers (shared with OneLake connector via fabric platform)
  • Data pipelines emitted as DataFlows with custom properties (pipeline_id, workspace_id)
  • Pattern-based filtering via workspace_pattern and pipeline_pattern
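
The filtering described above could look roughly like the following. This is a minimal sketch, not the connector's code: the `AllowDeny` class is a hypothetical stand-in for an allow/deny regex filter, and the workspace and pipeline names are made up for illustration.

```python
import re
from dataclasses import dataclass, field
from typing import List


@dataclass
class AllowDeny:
    """Hypothetical allow/deny regex filter (not the connector's actual class)."""

    allow: List[str] = field(default_factory=lambda: [".*"])
    deny: List[str] = field(default_factory=list)

    def allowed(self, name: str) -> bool:
        # A deny match always wins; otherwise at least one allow pattern must match.
        if any(re.match(p, name) for p in self.deny):
            return False
        return any(re.match(p, name) for p in self.allow)


# Illustrative values for the workspace_pattern / pipeline_pattern options.
workspace_pattern = AllowDeny(allow=["^prod-.*"])
pipeline_pattern = AllowDeny(deny=[".*_tmp$"])

pipelines = [
    ("prod-sales", "load_orders"),
    ("prod-sales", "scratch_tmp"),
    ("dev-sandbox", "load_orders"),
]

# Keep a pipeline only if both its workspace and its own name pass the filters.
kept = [
    (ws, name)
    for ws, name in pipelines
    if workspace_pattern.allowed(ws) and pipeline_pattern.allowed(name)
]
```

Applying the workspace filter first means pipelines in excluded workspaces never need to be listed at all.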

Pipeline Activities as DataJobs

  • All 30+ Fabric activity types captured — each activity emitted as a DataJob with activity_type as custom property
  • Full intra-pipeline dependency graph: each activity's dependsOn emitted as inputDatajobEdges with dependency conditions (Succeeded/Failed/Skipped/Completed) as edge properties
  • Activity state (Active/Inactive) and onInactiveMarkAs captured as custom properties
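
To illustrate the dependency graph, here is a rough sketch of turning each activity's dependsOn list into edges. It assumes the pipeline definition stores dependencies as objects with `activity` and `dependencyConditions` keys; `dependency_edges` and the sample activities are hypothetical, and the real connector emits inputDatajobEdges rather than tuples.

```python
from typing import Dict, List, Tuple

Edge = Tuple[str, str, Dict[str, str]]


def dependency_edges(activities: List[dict]) -> List[Edge]:
    """Return (upstream, downstream, properties) for every dependsOn entry."""
    edges: List[Edge] = []
    for activity in activities:
        for dep in activity.get("dependsOn", []):
            # Conditions such as Succeeded/Failed/Skipped/Completed ride along
            # as edge properties.
            conditions = ",".join(dep.get("dependencyConditions", []))
            edges.append(
                (dep["activity"], activity["name"], {"dependencyConditions": conditions})
            )
    return edges


# Made-up activities for illustration only.
activities = [
    {"name": "Extract", "type": "Copy"},
    {
        "name": "Transform",
        "type": "Notebook",
        "dependsOn": [{"activity": "Extract", "dependencyConditions": ["Succeeded"]}],
    },
    {
        "name": "Notify",
        "type": "Teams",
        "dependsOn": [
            {"activity": "Transform", "dependencyConditions": ["Failed", "Skipped"]}
        ],
    },
]

edges = dependency_edges(activities)
```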

Copy Activity — Dataset-Level Lineage

  • Source → sink lineage extracted from Copy activities (not InvokeCopyJob — those are separate standalone Fabric items)
  • Fabric-native datasets (Lakehouse, Warehouse, FabricSql) resolve to fabric-onelake URNs matching the OneLake connector's scheme
  • External platform datasets (Snowflake, BigQuery, S3, etc.) resolve to standard DataHub dataset URNs
  • 4-level connection resolution fallback: dataset externalReferences → connectionSettings externalReferences → connectionSettings inline type → linkedService type
  • Cross-recipe lineage support via platform_instance_map to connect to datasets ingested by other connectors
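
The fallback order can be pictured as an ordered list of lookups where the first non-empty result wins. The sketch below is illustrative only: the nested key names (for example `connection`) are assumptions about the activity JSON, and `resolve_connection` is a hypothetical helper, not a function from this PR.

```python
from typing import Any, Callable, Dict, List, Optional


def _get(data: Any, *path: str) -> Optional[Any]:
    """Walk nested dicts, returning None as soon as a key is missing."""
    for key in path:
        if not isinstance(data, dict):
            return None
        data = data.get(key)
    return data


# Ordered from most to least specific; key names are assumed for illustration.
RESOLVERS: List[Callable[[Dict[str, Any]], Optional[Any]]] = [
    lambda side: _get(side, "dataset", "externalReferences", "connection"),
    lambda side: _get(side, "connectionSettings", "externalReferences", "connection"),
    lambda side: _get(side, "connectionSettings", "type"),
    lambda side: _get(side, "linkedService", "type"),
]


def resolve_connection(side: Dict[str, Any]) -> Optional[Any]:
    """Return the first non-empty value found by the resolvers, else None."""
    for resolver in RESOLVERS:
        value = resolver(side)
        if value:
            return value
    return None


source_side = {
    "dataset": {"externalReferences": {"connection": "conn-1"}},
    "linkedService": {"type": "Snowflake"},
}
sink_side = {"connectionSettings": {"type": "Lakehouse"}}

resolved_source = resolve_connection(source_side)
resolved_sink = resolve_connection(sink_side)
```

Keeping the resolvers in a list makes the priority order explicit and easy to extend if Fabric introduces another place to declare a connection.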

Pipeline-to-Pipeline Lineage (InvokePipeline)

  • InvokePipeline activities with the InvokeFabricPipeline operation type create cross-pipeline DataFlow edges. The two other operation types are not parsed yet.
  • Two-pass processing: all activities cached across workspaces first, then cross-workspace edges resolved
  • Note: ExecutePipeline is marked as legacy in Fabric and is not parsed in this MVP
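
A simplified picture of the two-pass approach: first index every pipeline across all workspaces, then resolve InvokePipeline targets against that index so cross-workspace references can be found. The field names `operationType`, `targetWorkspaceId`, and `targetPipelineId` are placeholders chosen for this sketch and may not match the real activity payload.

```python
from typing import Dict, List, Set, Tuple

PipelineKey = Tuple[str, str]  # (workspace_id, pipeline_id)


def resolve_invoke_edges(
    workspaces: Dict[str, Dict[str, List[dict]]],
) -> List[Tuple[PipelineKey, PipelineKey]]:
    # Pass 1: cache every known pipeline across all workspaces.
    known: Set[PipelineKey] = {
        (ws_id, pipeline_id)
        for ws_id, pipelines in workspaces.items()
        for pipeline_id in pipelines
    }

    # Pass 2: resolve InvokePipeline activities against the cache.
    edges: List[Tuple[PipelineKey, PipelineKey]] = []
    for ws_id, pipelines in workspaces.items():
        for pipeline_id, activities in pipelines.items():
            for activity in activities:
                if activity.get("type") != "InvokePipeline":
                    continue
                if activity.get("operationType") != "InvokeFabricPipeline":
                    continue  # other operation types are not parsed
                target = (
                    activity.get("targetWorkspaceId", ws_id),
                    activity["targetPipelineId"],
                )
                if target in known:
                    edges.append(((ws_id, pipeline_id), target))
    return edges


# Made-up workspaces: pipeline p1 in ws-a invokes p2, which lives in ws-b.
workspaces = {
    "ws-a": {
        "p1": [
            {
                "type": "InvokePipeline",
                "operationType": "InvokeFabricPipeline",
                "targetWorkspaceId": "ws-b",
                "targetPipelineId": "p2",
            }
        ]
    },
    "ws-b": {"p2": [{"type": "Copy"}]},
}

invoke_edges = resolve_invoke_edges(workspaces)
```

A single pass would miss the edge whenever the caller's workspace is processed before the target's, which is why the cache is filled completely first.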

Pipeline Run & Activity Run History

  • Pipeline runs emitted as DataProcessInstance with start/end events and status mapping
  • Activity runs queried per pipeline run via queryActivityRuns API, emitted as child DataProcessInstance entities
  • Configurable lookback window (execution_history_days, default 7, max 90). The Fabric API returns at most 100 recent completed runs per pipeline.
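
As a rough illustration of the lookback window, the sketch below validates execution_history_days and keeps only runs that ended inside the window. The default of 7 and maximum of 90 come from the description above; the lower bound of 1, the `endTime` key, and both helper names are assumptions made for this example.

```python
from datetime import datetime, timedelta, timezone
from typing import List

DEFAULT_HISTORY_DAYS = 7
MAX_HISTORY_DAYS = 90


def lookback_start(now: datetime, days: int = DEFAULT_HISTORY_DAYS) -> datetime:
    """Return the earliest timestamp still inside the lookback window."""
    # The lower bound of 1 is an assumption; only the max of 90 is documented.
    if not 1 <= days <= MAX_HISTORY_DAYS:
        raise ValueError(f"execution_history_days must be in 1..{MAX_HISTORY_DAYS}")
    return now - timedelta(days=days)


def runs_in_window(runs: List[dict], now: datetime, days: int) -> List[dict]:
    """Keep runs whose (assumed) endTime falls inside the window."""
    start = lookback_start(now, days)
    return [run for run in runs if run["endTime"] >= start]


now = datetime(2026, 3, 26, tzinfo=timezone.utc)
runs = [
    {"id": "recent", "endTime": datetime(2026, 3, 25, tzinfo=timezone.utc)},
    {"id": "old", "endTime": datetime(2026, 3, 10, tzinfo=timezone.utc)},
]

recent_runs = runs_in_window(runs, now, DEFAULT_HISTORY_DAYS)
```

Because the API caps results at 100 recent completed runs per pipeline, a long window on a busy pipeline may still return fewer days of history than requested.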

Shared Fabric Infrastructure (fabric/common/)

  • core_client.py — Shared Fabric Core API client (workspaces, items, connections, job instances)
  • models.py — Shared models (FabricWorkspace, FabricItem, FabricConnection, WorkspaceKey)
  • constants.py — FABRIC_CONNECTION_PLATFORM_MAP with 100+ connection type → DataHub platform mappings
  • urn_generator.py — Pipeline/activity URN generation functions
  • base_client.py — Generic pagination for GET and POST Fabric API endpoints
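
Generic pagination of the kind base_client.py provides can be sketched as a loop that follows a continuation token until none is returned. This is an assumption-laden sketch: it presumes list responses carry a `value` array and an optional `continuationToken`, and it injects a hypothetical `fetch_page` callable instead of making real HTTP calls.

```python
from typing import Any, Callable, Dict, Iterator, Optional

FetchPage = Callable[[Optional[str]], Dict[str, Any]]


def paginate(fetch_page: FetchPage) -> Iterator[Dict[str, Any]]:
    """Yield items from every page until no continuation token is returned."""
    token: Optional[str] = None
    while True:
        page = fetch_page(token)
        yield from page.get("value", [])
        token = page.get("continuationToken")
        if not token:
            break


# Fake two-page response keyed by the token used to request it.
_PAGES = {
    None: {"value": [{"id": "ws-1"}, {"id": "ws-2"}], "continuationToken": "t1"},
    "t1": {"value": [{"id": "ws-3"}]},
}


def fake_fetch(token: Optional[str]) -> Dict[str, Any]:
    return _PAGES[token]


items = list(paginate(fake_fetch))
```

Passing the fetch function in keeps the loop identical for GET and POST endpoints; only the callable that builds the request differs.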

Refactoring

  • Extracted ADF_LINKED_SERVICE_PLATFORM_MAP from the ADF source into shared azure/constants.py, used by both ADF and Fabric connectors
  • Added 13 Fabric-specific activity subtypes to common/subtypes.py

Platform Registration

  • Registered fabric and fabric-data-factory as new data platforms in data-platforms.yaml (v7→v8)
  • Added platform logos (SVG) and constants to both frontend ingestion builders (v1 and v2)

@github-actions github-actions bot added the ingestion, product, and devops labels Mar 18, 2026
@alwaysmeticulous

alwaysmeticulous bot commented Mar 18, 2026

🔴 Meticulous spotted visual differences in 6 of 1579 screens tested: view and approve differences detected.

Meticulous evaluated ~8 hours of user flows against your PR.

Last updated for commit aec5aa7. This comment will update as new commits are pushed.

@codecov

codecov bot commented Mar 18, 2026

Bundle Report

Changes will increase total bundle size by 13.52kB (0.06%) ⬆️. This is within the configured threshold ✅

Detailed changes
| Bundle name | Size | Change |
| --- | --- | --- |
| datahub-react-web-esm | 22.7MB | 13.52kB (0.06%) ⬆️ |

Affected Assets, Files, and Routes:

view changes for bundle: datahub-react-web-esm

Assets Changed:

| Asset Name | Size Change | Total Size | Change (%) |
| --- | --- | --- | --- |
| assets/index-*.js | 387 bytes | 12.45MB | 0.0% |
| assets/fabriclogo-*.svg (New) | 8.86kB | 8.86kB | 100.0% 🚀 |
| assets/fabricdatafactorylogo-*.svg (New) | 4.27kB | 4.27kB | 100.0% 🚀 |

Files in assets/index-*.js:

  • ./src/app/ingest/source/builder/constants.ts → Total Size: 7.91kB

  • ./src/app/ingestV2/source/builder/constants.ts → Total Size: 7.97kB

  • ./src/images/fabriclogo.svg → Total Size: 46 bytes

  • ./src/images/fabricdatafactorylogo.svg → Total Size: 57 bytes

@datahub-connector-tests

datahub-connector-tests bot commented Mar 23, 2026

Connector Tests Results

All connector tests passed for commit aec5aa7

View full test logs →

To skip connector tests, add the skip-connector-tests label (org members only).

Autogenerated by the connector-tests CI pipeline.

@github-actions
Contributor

Linear: ING-2044

…y and pipeline activity processing with new URN generation and caching mechanisms
… integration, including overview, capabilities, and example recipe
@aviraj-gour aviraj-gour merged commit 6ed319d into master Mar 26, 2026
82 of 83 checks passed
@aviraj-gour aviraj-gour deleted the connector/fabric-data-factory branch March 26, 2026 07:47
Labels

devops, ingestion, pending-submitter-merge, product

4 participants