Commit 8fcf7e3

Merge pull request #223045 from n0elleli/cdcupdate
Cdcupdate
2 parents 0069e38 + bd2a63a commit 8fcf7e3

38 files changed: +337 -2 lines changed

articles/data-factory/TOC.yml

Lines changed: 13 additions & 2 deletions
Original file line number | Diff line number | Diff line change
@@ -296,8 +296,12 @@ items:
296296
- name: Supported functions
297297
href: wrangling-functions.md
298298
displayName: power
299-
- name: Change Data Capture
300-
href: concepts-change-data-capture.md
299+
- name: Change data capture
300+
items:
301+
- name: Change data capture
302+
href: concepts-change-data-capture.md
303+
- name: Change data capture resource
304+
href: concepts-change-data-capture-resource.md
301305
- name: Roles and permissions
302306
href: concepts-roles-permissions.md
303307
- name: Naming rules
@@ -354,6 +358,8 @@ items:
354358
href: author-management-hub.md
355359
- name: Source control
356360
href: source-control.md
361+
- name: Author a change data capture resource
362+
href: how-to-change-data-capture-resource.md
357363
- name: Connect to Azure DevOps in another tenant
358364
href: cross-tenant-connections-to-azure-devops.md
359365
- name: Continuous integration and delivery
@@ -1169,6 +1175,11 @@ items:
11691175
- name: Activities
11701176
href: data-factory-troubleshoot-guide.md
11711177
displayName: timeout, troubleshooting
1178+
- name: Change data capture
1179+
items:
1180+
- name: Change data capture troubleshooting
1181+
href: change-data-capture-troubleshoot.md
1182+
displayName: change data capture, troubleshooting
11721183
- name: Connectors
11731184
items:
11741185
- name: Overview and general copy activity errors
Lines changed: 83 additions & 0 deletions
@@ -0,0 +1,83 @@
1+
---
2+
title: Troubleshoot the change data capture resource
3+
titleSuffix: Azure Data Factory
4+
description: Learn how to troubleshoot issues with the change data capture resource in Azure Data Factory.
5+
author: n0elleli
6+
ms.service: data-factory
7+
ms.subservice:
8+
ms.topic: troubleshooting
9+
ms.date: 01/19/2023
10+
ms.author: noelleli
11+
ms.custom:
12+
---
13+
14+
# Troubleshoot the Change data capture resource in Azure Data Factory
15+
[!INCLUDE[appliesto-adf-xxx-md](includes/appliesto-adf-xxx-md.md)]
16+
17+
This article provides suggestions on how to troubleshoot common problems with the change data capture resource in Azure Data Factory.
18+
19+
## Issue: Trouble enabling native CDC in my SQL source.
20+
21+
For sources in SQL, two sets of tables are available: tables with native SQL CDC enabled and tables with time-based incremental columns.
22+
23+
Follow these steps to configure native CDC for a specific source table in your SQL database.
24+
25+
Suppose you have the following table, with `ID` as the primary key. If a primary key is present in the schema, `supports_net_changes` is set to true by default. If not, configure it using the script in Query 3.
26+
27+
**Query 1**
28+
```sql
29+
30+
CREATE TABLE Persons (
31+
ID int,
32+
LastName varchar(255) NOT NULL,
33+
FirstName varchar(255),
34+
Age int,
35+
Last_login DATETIME,
36+
PRIMARY KEY (ID));
37+
38+
```
39+
40+
> [!NOTE]
41+
> Currently, the ADF CDC resource loads only net changes for insert, update, and delete operations.
42+
43+
To enable CDC at the database level, execute the following query:
44+
45+
**Query 2**
46+
47+
```sql
48+
EXEC sys.sp_cdc_enable_db
49+
```
50+
To enable CDC at the table level, execute the following query:
51+
52+
**Query 3**
53+
54+
```sql
55+
EXEC sys.sp_cdc_enable_table
56+
@source_schema = N'dbo'
57+
, @source_name = N'Persons'
58+
, @role_name = N'cdc_admin'
59+
, @supports_net_changes = 1
60+
, @captured_column_list = N'ID';
61+
```
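
After running Query 2 and Query 3, it can help to confirm that CDC is actually enabled before returning to the CDC resource configuration. The following is a minimal sketch that uses standard SQL Server catalog views and the `sys.sp_cdc_help_change_data_capture` procedure. It assumes the `dbo.Persons` table from Query 1 and is not an official ADF verification step.

```sql
-- Check that CDC is enabled for the current database (expect is_cdc_enabled = 1).
SELECT name, is_cdc_enabled
FROM sys.databases
WHERE name = DB_NAME();

-- Check that the Persons table is tracked by CDC (expect is_tracked_by_cdc = 1).
SELECT s.name AS schema_name, t.name AS table_name, t.is_tracked_by_cdc
FROM sys.tables AS t
JOIN sys.schemas AS s ON s.schema_id = t.schema_id
WHERE s.name = N'dbo' AND t.name = N'Persons';

-- Show the capture instance details; the result set includes supports_net_changes.
EXEC sys.sp_cdc_help_change_data_capture
    @source_schema = N'dbo',
    @source_name = N'Persons';
```

If either flag is 0, rerun the corresponding enable query and check for permission errors.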
62+
63+
## Issue: Tables are unavailable to select in the CDC resource configuration process.
64+
65+
If a table in your SQL source has neither SQL Server CDC enabled with `supports_net_changes` nor any time-based incremental columns, that table will be unavailable for selection.
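
To see which tables might qualify, you can inspect the source database directly. The sketch below first lists tables that have a CDC capture instance and shows whether it supports net changes, then lists user tables with date/time columns that could serve as incremental columns. It assumes CDC is already enabled at the database level (otherwise `cdc.change_tables` doesn't exist). The list of data types is an assumption about what counts as time-based, not a documented ADF rule.

```sql
-- Tables with a CDC capture instance, and whether it supports net changes.
SELECT OBJECT_SCHEMA_NAME(ct.source_object_id) AS schema_name,
       OBJECT_NAME(ct.source_object_id) AS table_name,
       ct.capture_instance,
       ct.supports_net_changes
FROM cdc.change_tables AS ct;

-- User tables with date/time columns (candidate incremental columns).
SELECT s.name AS schema_name,
       t.name AS table_name,
       c.name AS column_name,
       ty.name AS data_type
FROM sys.tables AS t
JOIN sys.schemas AS s ON s.schema_id = t.schema_id
JOIN sys.columns AS c ON c.object_id = t.object_id
JOIN sys.types AS ty ON ty.user_type_id = c.user_type_id
WHERE ty.name IN (N'datetime', N'datetime2', N'smalldatetime', N'datetimeoffset', N'date')
  AND t.is_ms_shipped = 0
ORDER BY s.name, t.name, c.name;
```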
66+
67+
## Issue: The debug cluster is not available from a warm pool.
68+
69+
The debug cluster is not available from a warm pool, so expect a wait time on the order of one minute or more.
70+
71+
## Issue: My CDC resource has both source and target linked services that use custom integration runtimes and it won't work.
72+
73+
In factories with virtual networks, CDC resources will work fine if either the source or target linked service is tied to an auto-resolve integration runtime. If both the source and target linked services use custom integration runtimes, the CDC resource will not work.
74+
75+
In non-virtual network factories, CDC resources requiring a virtual network will not work. A fix is in progress.
76+
77+
## Issue: Creating a new linked service pointing to an Azure Key Vault linked service causes an error.
78+
79+
If you create a new linked service using the CDC fly-out process that points to an Azure Key Vault linked service, the CDC resource will break. A fix is in progress.
80+
81+
## Next steps
82+
- [Learn more about the change data capture resource](concepts-change-data-capture-resource.md)
83+
- [Set up a change data capture resource](how-to-change-data-capture-resource.md)
Lines changed: 60 additions & 0 deletions
@@ -0,0 +1,60 @@
1+
---
2+
title: Change Data Capture Resource
3+
titleSuffix: Azure Data Factory
4+
description: Learn more about the change data capture resource in Azure Data Factory.
5+
author: n0elleli
6+
ms.author: noelleli
7+
ms.reviewer:
8+
ms.service: data-factory
9+
ms.subservice: data-movement
10+
ms.custom:
11+
ms.topic: conceptual
12+
ms.date: 01/20/2023
13+
---
14+
15+
# Change data capture resource overview
16+
[!INCLUDE[appliesto-adf-xxx-md](includes/appliesto-adf-xxx-md.md)]
17+
18+
Adapting to the cloud-first big data world can be incredibly challenging for data engineers who are responsible for building complex data integration and ETL pipelines.
19+
20+
Azure Data Factory is introducing a new mechanism to make the life of a data engineer easier.
21+
22+
By automatically detecting data changes at the source without requiring complex designing or coding, ADF is making it a breeze to scale these processes. Change Data Capture will now exist as a **new native top-level resource** in the Azure Data Factory studio where data engineers can quickly configure continuously running jobs to process big data at scale with extreme efficiency.
23+
24+
The new Change Data Capture resource in ADF allows for full fidelity change data capture that continuously runs in near real-time through a guided configuration experience.
25+
26+
:::image type="content" source="media/adf-cdc/change-data-capture-resource-1.png" alt-text="Screenshot of new top-level resource in Factory Resources panel.":::
27+
28+
## Supported data sources
29+
30+
* Avro
31+
* Azure Cosmos DB (SQL API)
32+
* Azure SQL Database
33+
* Delimited Text
34+
* JSON
35+
* ORC
36+
* Parquet
37+
* SQL Server
38+
* XML
39+
40+
## Supported targets
41+
42+
* Avro
43+
* Azure SQL Database
44+
* Azure Synapse Analytics
45+
* Delimited Text
46+
* Delta
47+
* JSON
48+
* ORC
49+
* Parquet
50+
51+
## Known limitations
52+
* Currently, when creating source/target mappings, each source and target is only allowed to be used once.
53+
* Continuous, real-time streaming is coming soon.
54+
* The **Allow schema drift** option is coming soon.
55+
56+
For more information on known limitations and troubleshooting assistance, see the [troubleshooting guide](change-data-capture-troubleshoot.md).
57+
58+
59+
## Next steps
60+
- [Learn how to set up a change data capture resource](how-to-change-data-capture-resource.md).

articles/data-factory/concepts-change-data-capture.md

Lines changed: 8 additions & 0 deletions
@@ -52,6 +52,13 @@ The newly updated rows or updated files can be automatically detected and extrac
5252
- [Azure Database for PostgreSQL](connector-azure-database-for-postgresql.md)
5353
- [Common data model](format-common-data-model.md)
5454

55+
### Change data capture top-level resource
56+
57+
A new top-level change data capture resource guides users through a simple configuration process to create a resource that continuously and automatically reads changes from one or more data sources, without needing to design pipelines or data flows, or set up triggers.
58+
59+
:::image type="content" source="media/adf-cdc/change-data-capture-resource-1.png" alt-text="Screenshot of new top-level artifact in Factory Resources panel.":::
60+
61+
5562
### Customer managed delta data extraction in pipeline
5663

5764
You can always build your own delta data extraction pipeline for all ADF supported data stores including using lookup activity to get the watermark value stored in an external control table, copy activity or mapping data flow activity to query the delta data against timestamp or ID column, and SP activity to write the new watermark value back to your external control table for the next run. When you want to load new files only from a storage store, you can either delete files every time after they have been moved to the destination successfully, or leverage the time partitioned folder or file names or last modified time to identify the new files.
@@ -96,3 +103,4 @@ The followings are the templates to use the change data capture in Azure Data Fa
96103
## Next steps
97104

98105
- [Learn how to use the checkpoint key in the data flow activity](control-flow-execute-data-flow-activity.md).
106+
- [Learn about the Change data capture resource in ADF](concepts-change-data-capture-resource.md).
Lines changed: 173 additions & 0 deletions
@@ -0,0 +1,173 @@
1+
---
2+
title: Capture changed data with a change data capture resource
3+
description: This tutorial provides step-by-step instructions on how to capture changed data from ADLS Gen2 to SQL DB using a Change data capture resource.
4+
author: n0elleli
5+
ms.author: noelleli
6+
ms.reviewer:
7+
ms.service: data-factory
8+
ms.subservice:
9+
ms.topic: conceptual
10+
ms.custom: seo-lt-2019
11+
ms.date: 01/20/2023
12+
---
13+
14+
# How to capture changed data from ADLS Gen2 to SQL DB using a Change data capture resource
15+
[!INCLUDE[appliesto-adf-asa-md]]
16+
17+
In this tutorial, you will use the Azure Data Factory user interface (UI) to create a new Change data capture resource that picks up changed data from an Azure Data Lake Storage (ADLS) Gen2 source and writes it to a SQL Database. The configuration pattern in this tutorial can be modified and expanded upon.
18+
19+
In this tutorial, you follow these steps:
20+
* Create a change data capture resource.
21+
* Monitor change data capture activity.
22+
23+
## Prerequisites
24+
25+
* **Azure subscription.** If you don't have an Azure subscription, create a free Azure account before you begin.
26+
* **Azure storage account.** You use ADLS storage as a source data store. If you don't have a storage account, see Create an Azure storage account for steps to create one.
27+
* **Azure SQL Database.** You will use Azure SQL DB as a target data store. If you don’t have a SQL DB, please create one in the Azure portal first before continuing the tutorial.
28+
29+
30+
## Create a change data capture artifact
31+
32+
1. Navigate to the **Author** blade in your data factory. You will see a new top-level artifact under **Pipelines** called **Change data capture (preview)**.
33+
34+
:::image type="content" source="media/adf-cdc/change-data-capture-resource-2.png" alt-text="Screenshot of new top level artifact shown under Factory resources panel.":::
35+
36+
2. To create a new **Change data capture**, hover over **Change data capture (preview)** until you see 3 dots appear. Click on the **Change data capture actions**.
37+
38+
:::image type="content" source="media/adf-cdc/change-data-capture-resource-3.png" alt-text="Screenshot of Change data capture (preview) Actions after hovering on the new top-level artifact.":::
39+
40+
3. Select **New change data capture (preview)**. This will open a flyout to begin the guided process.
41+
42+
:::image type="content" source="media/adf-cdc/change-data-capture-resource-4.png" alt-text="Screenshot of a list of change data capture actions.":::
43+
44+
4. You will then be prompted to name your CDC resource. By default, the name is set to “adfcdc” with a number that increments by 1 for each new resource. You can replace this default name with your own.
45+
46+
:::image type="content" source="media/adf-cdc/change-data-capture-resource-5.png" alt-text="Screenshot of the text box to update the name of the resource.":::
47+
48+
5. Use the drop-down selection list to choose your data source. For this tutorial, we will use **DelimitedText**.
49+
50+
:::image type="content" source="media/adf-cdc/change-data-capture-resource-6.png" alt-text="Screenshot of the guided process flyout with source options in a drop-down selection menu.":::
51+
52+
6. You will then be prompted to select a linked service. Create a new linked service or select an existing one.
53+
54+
:::image type="content" source="media/adf-cdc/change-data-capture-resource-7.png" alt-text="Screenshot of the selection box to choose or create a new linked service.":::
55+
56+
7. Use the **Browse** button to select your source data folder.
57+
58+
:::image type="content" source="media/adf-cdc/change-data-capture-resource-8.png" alt-text="Screenshot of a folder icon to browse for a folder path.":::
59+
60+
8. Once you’ve selected a folder path, click **Continue** to set your data target.
61+
62+
:::image type="content" source="media/adf-cdc/change-data-capture-resource-9.png" alt-text="Screenshot of the continue button in the guided process to proceed to select data targets.":::
63+
64+
> [!NOTE]
65+
> You can choose to add multiple source folders with the **+** button. The other sources must also use the same linked service that you’ve already selected.
66+
67+
9. Then, select a **Target type** using the drop-down selection. For this tutorial, we will select **Azure SQL Database**.
68+
69+
:::image type="content" source="media/adf-cdc/change-data-capture-resource-10.png" alt-text="Screenshot of a drop-down selection menu of all data target types.":::
70+
71+
10. You will then be prompted to select a linked service. Create a new linked service or select an existing one.
72+
73+
:::image type="content" source="media/adf-cdc/change-data-capture-resource-11.png" alt-text="Screenshot of the selection box to choose or create a new linked service to your data target.":::
74+
75+
11. Create new **Target table(s)** or select an existing **Target table(s)**. Use the checkbox to make your selection(s). The **Preview** button will allow you to view your table data.
76+
77+
:::image type="content" source="media/adf-cdc/change-data-capture-resource-12.png" alt-text="Screenshot of the create new tables button and the selection boxes to choose tables for your target.":::
78+
79+
12. Click **Continue** when you have finalized your selection(s).
80+
81+
:::image type="content" source="media/adf-cdc/change-data-capture-resource-13.png" alt-text="Screenshot of the continue button in the guided process to proceed to the next step.":::
82+
83+
> [!NOTE]
84+
> You can choose multiple target tables from your SQL DB. Use the check boxes to select all targets.
85+
86+
13. You will automatically land in a new change data capture tab, where you can configure your new resource.
87+
88+
:::image type="content" source="media/adf-cdc/change-data-capture-resource-14.png" alt-text="Screenshot of the change data capture studio.":::
89+
90+
14. A new mapping will automatically be created for you. You can update the **Source** and **Target** selections for your mapping by using the drop-down selection lists.
91+
92+
:::image type="content" source="media/adf-cdc/change-data-capture-resource-15.png" alt-text="Screenshot of the source to target mapping in the change data capture studio.":::
93+
94+
15. Once you’ve selected your tables, you should see that there are columns mapped. Select the **Column mappings** button to view the column mappings.
95+
96+
:::image type="content" source="media/adf-cdc/change-data-capture-resource-16.png" alt-text="Screenshot of the mapping icon to view column mappings.":::
97+
98+
16. Here you can view your column mappings. Use the drop-down lists to edit your column mappings for **Mapping method**, **Source column**, and **Target** column.
99+
100+
:::image type="content" source="media/adf-cdc/change-data-capture-resource-17.png" alt-text="Screenshot of the column mappings.":::
101+
102+
You can add additional column mappings using the **New mapping** button. Use the drop-down lists to select the **Mapping method**, **Source column**, and **Target** column.
103+
104+
:::image type="content" source="media/adf-cdc/change-data-capture-resource-18.png" alt-text="Screenshot of the Add new mapping icon to add new column mappings.":::
105+
106+
17. When your mapping is complete, click the back arrow to return to the main canvas.
107+
108+
:::image type="content" source="media/adf-cdc/change-data-capture-resource-19.png" alt-text="Screenshot of the arrow icon to return to the main change data capture canvas.":::
109+
110+
> [!NOTE]
111+
> You can add additional source to target mappings in one CDC artifact. Use the edit button to select more data sources and targets. Then, click **New mapping** and use the drop-down lists to set a new source and target mapping.
112+
113+
:::image type="content" source="media/adf-cdc/change-data-capture-resource-20.png" alt-text="Screenshot of the edit button to add new sources.":::
114+
115+
:::image type="content" source="media/adf-cdc/change-data-capture-resource-21.png" alt-text="Screenshot of the new mapping button to set a new source to target mapping.":::
116+
117+
18. Once your mapping is complete, set your frequency using the **Set Latency** button.
118+
119+
:::image type="content" source="media/adf-cdc/change-data-capture-resource-22.png" alt-text="Screenshot of the set frequency button at the top of the canvas.":::
120+
121+
19. Select the cadence of your change data capture and click **Apply** to make the changes. By default, it will be set to 15 minutes.
122+
123+
For example, if you select 30 minutes, your change data capture will process your source data every 30 minutes and pick up any data that has changed since the last processed time.
124+
125+
:::image type="content" source="media/adf-cdc/change-data-capture-resource-23.png" alt-text="Screenshot of the set frequency selection menu.":::
126+
127+
> [!NOTE]
128+
> The option to select Real-time to enable streaming data integration is coming soon.
129+
130+
20. Once everything has been finalized, publish your changes.
131+
132+
:::image type="content" source="media/adf-cdc/change-data-capture-resource-24.png" alt-text="Screenshot of the publish button at the top of the canvas.":::
133+
134+
> [!NOTE]
135+
> If you do not publish your changes, you will not be able to start your CDC resource. The start button will be grayed out.
136+
137+
21. Click **Start** to start running your **Change data capture**.
138+
139+
:::image type="content" source="media/adf-cdc/change-data-capture-resource-25.png" alt-text="Screenshot of the start button at the top of the canvas.":::
140+
141+
:::image type="content" source="media/adf-cdc/change-data-capture-resource-26.png" alt-text="Screenshot of an actively running change data capture resource.":::
142+
143+
144+
## Monitor your Change data capture
145+
146+
1. To monitor your change data capture, navigate to the **Monitor** blade or click the monitoring icon from the CDC designer.
147+
148+
:::image type="content" source="media/adf-cdc/change-data-capture-resource-27.png" alt-text="Screenshot of the monitoring blade.":::
149+
150+
:::image type="content" source="media/adf-cdc/change-data-capture-resource-28.png" alt-text="Screenshot of the monitoring button at the top of the change data capture canvas.":::
151+
152+
2. Select **Change data capture** to view your CDC resources.
153+
154+
:::image type="content" source="media/adf-cdc/change-data-capture-resource-29.png" alt-text="Screenshot of the Change data capture monitoring section.":::
155+
156+
3. Here you can see the **Source**, **Target**, **Status**, and **Last processed** time of your change data capture.
157+
158+
:::image type="content" source="media/adf-cdc/change-data-capture-resource-30.png" alt-text="Screenshot of an overview of the change data capture monitoring page.":::
159+
160+
4. Click the name of your CDC to see more details. You can see how many rows were read and written and other diagnostic information.
161+
162+
:::image type="content" source="media/adf-cdc/change-data-capture-resource-31.png" alt-text="Screenshot of the detailed monitoring of a selected change data capture.":::
163+
164+
> [!NOTE]
165+
> If you have multiple mappings set up in your Change data capture, each mapping will show as a different color. Click on the bar to see specific details for each mapping or use the Diagnostics at the bottom of the screen.
166+
167+
:::image type="content" source="media/adf-cdc/change-data-capture-resource-32.png" alt-text="Screenshot of the detailed monitoring page of a change data capture with multiple sources to target mappings.":::
168+
169+
:::image type="content" source="media/adf-cdc/change-data-capture-resource-33.png" alt-text="Screenshot of a detailed breakdown of each mapping in the change data capture artifact.":::
170+
171+
172+
## Next steps
173+
- [Learn more about the change data capture resource](concepts-change-data-capture-resource.md)