
Commit f6798e6

Update concepts-change-data-capture.md
1 parent 618a076

File tree: 1 file changed, +9 −11 lines


articles/data-factory/concepts-change-data-capture.md

Lines changed: 9 additions & 11 deletions
@@ -9,7 +9,7 @@ ms.service: data-factory
 ms.subservice: data-movement
 ms.custom: synapse
 ms.topic: conceptual
-ms.date: 01/04/2023
+ms.date: 01/23/2023
 ---
 
 # Change data capture in Azure Data Factory and Azure Synapse Analytics
@@ -24,6 +24,10 @@ To learn more, see [Azure Data Factory overview](introduction.md) or [Azure Syna
 
 When you perform data integration and ETL processes in the cloud, your jobs can perform much better and be more effective when you only read the source data that has changed since the last time the pipeline ran, rather than always querying an entire dataset on each run. ADF provides multiple ways to easily get only the delta data since the last run.
 
+### Change Data Capture factory resource
+
+The easiest and quickest way to get started with CDC in Data Factory is through the factory-level Change Data Capture resource. From the main pipeline designer, select New under Factory Resources to create a new Change Data Capture. The CDC factory resource provides a configuration walk-through experience in which you point to your sources and destinations, apply optional transformations, and then select Start to begin capturing data. With the CDC resource, you don't need to design pipelines or data flow activities, and the only billing is for 4 cores of General Purpose data flows while your data is being processed. You set a latency, which ADF uses to wake up and look for changed data; that is the only time you are billed. The top-level CDC resource is also the ADF method of running your processes continuously: pipelines in ADF are batch only, but the CDC resource can run continuously.
+
 ### Native change data capture in mapping data flow
 
 Changed data, including inserted, updated, and deleted rows, can be automatically detected and extracted by an ADF mapping data flow from the source databases. No timestamp or ID columns are required to identify the changes, because it uses the native change data capture technology in the databases. By simply chaining a source transform and a sink transform referencing a database dataset in a mapping data flow, you will see the changes that happened on the source database automatically applied to the target database, so that you can easily synchronize data between two tables. You can also add transformations in between for any business logic to process the delta data. When defining your sink data destination, you can set insert, update, upsert, and delete operations in your sink without the need for an Alter Row transformation, because ADF is able to automatically detect the row markers.
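The latency-driven wake-up and billing model of the CDC factory resource can be pictured as a simple polling loop. The sketch below is illustrative only, not ADF's implementation; `check_for_changes`, `process`, and `latency_seconds` are hypothetical names introduced here:

```python
import time

def run_cdc(check_for_changes, process, latency_seconds, max_cycles=3):
    """Illustrative polling loop: wake up once per latency interval and
    process changes only when there are any -- the only cycles that incur
    processing cost, mirroring the CDC resource's billing model."""
    billed_cycles = 0
    for _ in range(max_cycles):
        changes = check_for_changes()
        if changes:
            process(changes)
            billed_cycles += 1  # compute is used only while data is processed
        time.sleep(latency_seconds)
    return billed_cycles
```

In this sketch, a cycle with no detected changes costs nothing, which is the behavior the paragraph above describes for the CDC resource between wake-ups.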
@@ -52,27 +56,20 @@ The newly updated rows or updated files can be automatically detected and extrac
 - [Azure Database for PostgreSQL](connector-azure-database-for-postgresql.md)
 - [Common data model](format-common-data-model.md)
 
-### Change data capture top-level resource
-
-A new top-level change data capture resource guides users through a simple configuration process to create a resource that will continuously and automatically read changes from data source(s) without needing to design pipelines, data flows, or set up triggers.
-
-:::image type="content" source="media/adf-cdc/change-data-capture-resource-1.png" alt-text="Screenshot of new top-level artifact in Factory Resources panel.":::
-
 ### Customer managed delta data extraction in pipeline
 
 You can always build your own delta data extraction pipeline for any ADF-supported data store: use a Lookup activity to get the watermark value stored in an external control table, a Copy activity or mapping data flow activity to query the delta data against a timestamp or ID column, and a Stored Procedure activity to write the new watermark value back to your external control table for the next run. When you want to load only new files from a storage store, you can either delete files each time after they have been moved to the destination successfully, or use the time-partitioned folder or file names or the last modified time to identify the new files.
 
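The customer-managed watermark pattern (Lookup, then Copy, then Stored Procedure) can be sketched outside of ADF as follows. This is a minimal in-memory stand-in, not ADF code; `control_table`, `source_rows`, and `last_modified` are illustrative names:

```python
from datetime import datetime, timezone

# Illustrative stand-ins for the external control table and the source table.
control_table = {"watermark": datetime(2023, 1, 1, tzinfo=timezone.utc)}
source_rows = [
    {"id": 1, "last_modified": datetime(2023, 1, 2, tzinfo=timezone.utc)},
    {"id": 2, "last_modified": datetime(2022, 12, 30, tzinfo=timezone.utc)},
]

def run_delta_load():
    # 1. Lookup activity: read the old watermark from the control table.
    old_watermark = control_table["watermark"]
    # 2. Copy / data flow activity: query only rows changed since the watermark.
    delta = [r for r in source_rows if r["last_modified"] > old_watermark]
    # 3. Stored Procedure activity: advance the watermark for the next run.
    if delta:
        control_table["watermark"] = max(r["last_modified"] for r in delta)
    return delta

changed = run_delta_load()  # only the row modified after the watermark
```

Because the watermark only advances to the newest change actually read, a run that fails before step 3 leaves the watermark untouched and the same delta is picked up again on the next run.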
 ## Best Practices
 
-**Change data capture from databases:**
+**Change data capture from databases**
 
 - Native change data capture is always recommended as the simplest way to get change data. It also places much less of a burden on your source database when ADF extracts the change data for further processing.
 - If your database stores are not on the list of ADF connectors with native change data capture support, we recommend checking the auto incremental extraction option, where you only need to input an incremental column to capture the changes. ADF takes care of the rest, including creating a dynamic query for delta loading and managing the checkpoint for each activity run.
 - Customer managed delta data extraction in pipeline covers all ADF-supported databases and gives you the flexibility to control everything yourself.
 
-**Change files capture from file based storages:**
+**Change files capture from file-based storage**
 
 - When you want to load data from Azure Blob Storage, Azure Data Lake Storage Gen2, or Azure Data Lake Storage Gen1, mapping data flow lets you get only new or updated files with a single click. It is the simplest and recommended way to achieve delta load from these file-based storage services in mapping data flow.
 - You can find more [best practices](https://techcommunity.microsoft.com/t5/azure-data-factory-blog/best-practices-of-how-to-use-adf-copy-activity-to-copy-new-files/ba-p/1532484) for copying only new files with the Copy activity.
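The last-modified-time approach to change files capture, mentioned in the pipeline section above, amounts to filtering a folder listing against the previous run's timestamp. A local-filesystem sketch (illustrative only; ADF applies the equivalent filter against Blob Storage or Data Lake Storage metadata):

```python
import os
from datetime import datetime, timezone

def new_files_since(folder, last_run):
    """Return names of files whose last modified time is after the previous run."""
    selected = []
    for name in sorted(os.listdir(folder)):
        path = os.path.join(folder, name)
        if not os.path.isfile(path):
            continue
        mtime = datetime.fromtimestamp(os.path.getmtime(path), tz=timezone.utc)
        if mtime > last_run:
            selected.append(name)
    return selected
```

After a successful run, the pipeline would persist the run's start time as the new `last_run` value, just as the watermark is persisted in the database pattern.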
@@ -103,4 +100,5 @@ The followings are the templates to use the change data capture in Azure Data Fa
 ## Next steps
 
 - [Learn how to use the checkpoint key in the data flow activity](control-flow-execute-data-flow-activity.md).
-- [Learn about the Change data capture resource in ADF](concepts-change-data-capture-resource.md).
+- [Learn about the ADF Change Data Capture resource](concepts-change-data-capture-resource.md).
+- [Walk through building a top-level CDC artifact](how-to-change-data-capture-resource.md).
