feat(ingestion): add Microsoft Fabric Data Factory connector #16646
aviraj-gour merged 29 commits into master
Linear: ING-2044
…y and pipeline activity processing with new URN generation and caching mechanisms
…tance mapping support
…se shared constants
…r pipeline and activity run
…across workspaces
… integration, including overview, capabilities, and example recipe
…nvokeType classes
… and InvokePipeline activities
…e and api response parsing
…ic connections and enhance documentation
…ing, and update tests
Summary
This PR implements a new DataHub ingestion connector for Microsoft Fabric Data Factory. The connector extracts pipeline orchestration metadata (workspaces, pipelines, activities, execution history) and resolves lineage from Copy and InvokePipeline activities.
- `fabric-data-factory` ingestion source that extracts workspaces, data pipelines, activities, and execution history from Microsoft Fabric
- … `platform_instance_map`
- … (`fabric/common/`) for core API client, models, constants, and URN generation reusable by sibling connectors (e.g. `fabric-onelake`)
- … `azure/constants.py` to eliminate duplication between ADF and Fabric connectors
- … `fabric` and `fabric-data-factory` as new data platforms with logos in both frontend and backend bootstrap configs

What's Implemented

Workspace & Pipeline Metadata
- … `fabric` platform)
- … `pipeline_id`, `workspace_id`)
- … `workspace_pattern` and `pipeline_pattern`

Pipeline Activities as DataJobs
- … `activity_type` as custom property
- … `dependsOn` emitted as `inputDatajobEdges` with dependency conditions (Succeeded/Failed/Skipped/Completed) as edge properties
- … `onInactiveMarkAs` captured as custom properties

Copy Activity: Dataset-Level Lineage
- … `fabric-onelake` URNs matching the OneLake connector's scheme
- … `externalReferences→connectionSettings`
- … `externalReferences` → `connectionSettings` inline type → `linkedService` type
- … `platform_instance_map` to connect to datasets ingested by other connectors

Pipeline-to-Pipeline Lineage (InvokePipeline)
- … `InvokeFabricPipeline` operation type create cross-pipeline DataFlow edges. Two other operation types exist but are not parsed yet.
- `ExecutePipeline` is marked as legacy in Fabric and is not parsed in this MVP

Pipeline Run & Activity Run History
- … `DataProcessInstance` with start/end events and status mapping
- … `queryActivityRuns` API, emitted as child `DataProcessInstance` entities
- … `execution_history_days`, default 7, max 90). The Fabric API returns at most 100 recent completed runs per pipeline

Shared Fabric Infrastructure (`fabric/common/`)
- `core_client.py`: Shared Fabric Core API client (workspaces, items, connections, job instances)
- `models.py`: Shared models (FabricWorkspace, FabricItem, FabricConnection, WorkspaceKey)
- `constants.py`: `FABRIC_CONNECTION_PLATFORM_MAP` with 100+ connection type → DataHub platform mappings
- `urn_generator.py`: Pipeline/activity URN generation functions
- `base_client.py`: Generic pagination for GET and POST Fabric API endpoints

Refactoring
- … `ADF_LINKED_SERVICE_PLATFORM_MAP` from the ADF source into shared `azure/constants.py`, used by both ADF and Fabric connectors
- … `common/subtypes.py`

Platform Registration
- … `fabric` and `fabric-data-factory` as new data platforms in `data-platforms.yaml` (v7→v8)
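
For reviewers who want a concrete picture of the `dependsOn` handling described under "Pipeline Activities as DataJobs", here is a minimal sketch of turning activity dependencies into edges that carry their dependency conditions as properties. It is not the connector's code: `ActivityEdge` and `build_dependency_edges` are hypothetical names, and the activity dictionaries assume the ADF-style JSON shape (`name`, `dependsOn`, `activity`, `dependencyConditions`), which may differ from what the connector actually parses.

```python
from dataclasses import dataclass, field

# Dependency conditions named in the PR description.
KNOWN_CONDITIONS = {"Succeeded", "Failed", "Skipped", "Completed"}


@dataclass
class ActivityEdge:
    """Hypothetical stand-in for an input edge between two activities."""

    upstream: str
    downstream: str
    properties: dict = field(default_factory=dict)


def build_dependency_edges(activities):
    """Create one edge per dependsOn entry, keeping its conditions as properties."""
    edges = []
    for activity in activities:
        for dep in activity.get("dependsOn", []):
            # Keep only the conditions listed above; ignore anything unexpected.
            conditions = [
                c for c in dep.get("dependencyConditions", []) if c in KNOWN_CONDITIONS
            ]
            edges.append(
                ActivityEdge(
                    upstream=dep["activity"],
                    downstream=activity["name"],
                    properties={"dependencyConditions": ",".join(conditions)},
                )
            )
    return edges


# Assumed (illustrative) activity definitions, not taken from a real pipeline.
pipeline_activities = [
    {"name": "CopySales", "type": "Copy", "dependsOn": []},
    {
        "name": "NotifyOnFailure",
        "type": "InvokePipeline",
        "dependsOn": [
            {"activity": "CopySales", "dependencyConditions": ["Failed"]}
        ],
    },
]

edges = build_dependency_edges(pipeline_activities)
```

In this sketch each downstream activity gets one edge per upstream dependency, so an activity that depends on two others would yield two edges, each with its own condition string.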