Commit 6735deb

Improvements

1 parent 5283f21 commit 6735deb

8 files changed, +143 -142 lines changed

articles/data-factory/author-global-parameters.md

Lines changed: 8 additions & 7 deletions
@@ -13,7 +13,7 @@ ai-usage: ai-assisted
 
 [!INCLUDE[appliesto-adf-xxx-md](includes/appliesto-adf-xxx-md.md)]
 
-Global parameters are constants across a data factory that can be consumed by a pipeline in any expression. They're useful when you have multiple pipelines with identical parameter names and values. When promoting a data factory using the continuous integration and deployment process (CI/CD), you can override these parameters in each environment.
+Global parameters are constants across a data factory that pipelines can consume in any expression. They're useful when you have multiple pipelines with identical parameter names and values. When promoting a data factory using the continuous integration and deployment process (CI/CD), you can override these parameters in each environment.
 
 ## Creating global parameters

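A pipeline expression consumes these constants through the `pipeline().globalParameters` accessor, the same accessor that appears in the error message quoted in the warning further down. A minimal sketch of a Set Variable activity reading a hypothetical global parameter named `environment_url` (underscores, not hyphens, per that warning):

```json
{
  "name": "SetBaseUrl",
  "type": "SetVariable",
  "typeProperties": {
    "variableName": "baseUrl",
    "value": "@pipeline().globalParameters.environment_url"
  }
}
```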
@@ -38,17 +38,18 @@ Global parameters can be used in any [pipeline expression](control-flow-expressi
 
 ## <a name="cicd"></a> Global parameters in CI/CD
 
-We recommend including global parameters in the ARM template during the CI/CD. The new mechanism of including global parameters in the ARM template (from 'Manage hub' -> 'ARM template' -> ‘Include global parameters in ARM template') as illustrated below, will not conflict/ override the factory-level settings as it used to do earlier, hence not requiring additional PowerShell for global parameters deployment during CI/CD.
+We recommend including global parameters in the ARM template during CI/CD. The new mechanism of including global parameters in the ARM template (from 'Manage hub' -> 'ARM template' -> 'Include global parameters in ARM template'), as illustrated in the following image, doesn't conflict with or override the factory-level settings as the earlier mechanism did, so you no longer need extra PowerShell to deploy global parameters during CI/CD.
 
 :::image type="content" source="media/author-global-parameters/include-arm-template.png" alt-text="Screenshot of 'Include in ARM template'.":::
 
 > [!NOTE]
-> We have moved the UI experience for including global parameters from the 'Global parameters' section to the 'ARM template' section in the manage hub.
-If you are already using the older mechanism (from 'Manage hub' -> 'Global parameters' -> 'Include in ARM template'), you can continue. We will continue to support it.
+> We moved the UI experience for including global parameters from the 'Global parameters' section to the 'ARM template' section in the Manage hub.
+
+If you're already using the older mechanism (from 'Manage hub' -> 'Global parameters' -> 'Include in ARM template'), you can continue. We'll continue to support it.
 
 The **Parameters** folder in the downloaded ARM template contains JSON files that define the parameters used in the ARM template. Each file corresponds to a specific global parameter.
 
-If you are using the older flow of integrating global parameters in your continuous integration and deployment solution, it will continue to work:
+If you're using the older flow of integrating global parameters in your continuous integration and deployment solution, it continues to work:
 
 * Include global parameters in the ARM template (from 'Manage hub' -> 'Global parameters' -> 'Include in ARM template')
 :::image type="content" source="media/author-global-parameters/include-arm-template-deprecated.png" alt-text="Screenshot of deprecated 'Include in ARM template'.":::
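Each file in that **Parameters** folder is a standard ARM deployment parameters file. A hedged sketch of what one might look like, assuming a factory named `myFactory` and the hypothetical `environment_url` global parameter; the generated parameter name pattern shown here is an assumption and can differ by factory:

```json
{
  "$schema": "https://schema.management.azure.com/schemas/2015-01-01/deploymentParameters.json#",
  "contentVersion": "1.0.0.0",
  "parameters": {
    "myFactory_properties_globalParameters_environment_url_value": {
      "value": "https://dev.example.com"
    }
  }
}
```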
@@ -58,10 +59,10 @@ If you are using the older flow of integrating global parameters in your continu
 We strongly recommend using the new mechanism of including global parameters in the ARM template (from 'Manage hub' -> 'ARM template' -> 'Include global parameters in an ARM template') since it makes the CICD with global parameters much more straightforward and easier to manage.
 
 > [!NOTE]
-> The **Include global parameters in an ARM template** configuration is only available in "Git mode". Currently it is disabled in "live mode" or "Data Factory" mode.
+> The **Include global parameters in an ARM template** configuration is only available in "Git mode". Currently it's disabled in "live mode" or "Data Factory" mode.
 
 > [!WARNING]
->You cannot use ‘-‘ in the parameter name. You will receive an errorcode "{"code":"BadRequest","message":"ErrorCode=InvalidTemplate,ErrorMessage=The expression >'pipeline().globalParameters.myparam-dbtest-url' is not valid: .....}". But, you can use the ‘_’ in the parameter name.
+> You can't use '-' in the parameter name. You receive the error "{"code":"BadRequest","message":"ErrorCode=InvalidTemplate,ErrorMessage=The expression 'pipeline().globalParameters.myparam-dbtest-url' is not valid: .....}". You can, however, use '_' in the parameter name.
 
 ## Related content
 
articles/data-factory/concepts-change-data-capture.md

Lines changed: 10 additions & 10 deletions
@@ -21,15 +21,15 @@ To learn more, see [Azure Data Factory overview](introduction.md) or [Azure Syna
 
 ## Overview
 
-When you perform data integration and ETL processes in the cloud, your jobs can perform better and be more effective when you only read the source data that has changed since the last time the pipeline ran, rather than always querying an entire dataset on each run. ADF provides multiple different ways for you to easily get delta data only from the last run.
+When you perform data integration and ETL processes in the cloud, your jobs can perform better and be more effective when you only read the source data that changed since the last time the pipeline ran, rather than always querying an entire dataset on each run. ADF provides multiple ways to easily get delta data only from the last run.
 
 ### Change Data Capture factory resource
 
-The easiest and quickest way to get started in data factory with CDC is through the factory level Change Data Capture resource. From the main pipeline designer, click on **New** under Factory Resources to create a new Change Data Capture. The CDC factory resource provides a configuration walk-through experience where you can select your sources and destinations, apply optional transformations, and then click start to begin your data capture. With the CDC resource, you do not need to design pipelines or data flow activities. You are also only billed for four cores of General Purpose data flows while your data in being processed. You can set a preferred latency, which ADF will use to wake up and look for changed data. That is the only time you will be billed. The top-level CDC resource is also the ADF method of running your processes continuously. Pipelines in ADF are batch only, but the CDC resource can run continuously.
+The easiest and quickest way to get started with CDC in data factory is through the factory-level Change Data Capture resource. From the main pipeline designer, select **New** under Factory Resources to create a new Change Data Capture. The CDC factory resource provides a configuration walk-through experience where you can select your sources and destinations, apply optional transformations, and then select start to begin your data capture. With the CDC resource, you don't need to design pipelines or data flow activities. You're also only billed for four cores of General Purpose data flows while your data is being processed. You can set a preferred latency, which ADF uses to wake up and look for changed data; that check is the only time you're billed. The top-level CDC resource is also the ADF method of running your processes continuously. Pipelines in ADF are batch only, but the CDC resource can run continuously.
 
 ### Native change data capture in mapping data flow
 
-The changed data including inserted, updated and deleted rows can be automatically detected and extracted by ADF mapping data flow from the source databases. No timestamp or ID columns are required to identify the changes since it uses the native change data capture technology in the databases. By simply chaining a source transform and a sink transform reference to a database dataset in a mapping data flow, you can see the changes happened on the source database to be automatically applied to the target database, so that you can easily synchronize data between two tables. You can also add any transformations in between for any business logic to process the delta data. When defining your sink data destination, you can set insert, update, upsert, and delete operations in your sink without the need of an Alter Row transformation because ADF is able to automatically detect the row makers.
+ADF mapping data flow can automatically detect and extract changed data, including inserted, updated, and deleted rows, from the source databases. No timestamp or ID columns are required to identify the changes because it uses the native change data capture technology in the databases. By chaining a source transform and a sink transform that reference a database dataset in a mapping data flow, the changes that happen on the source database are automatically applied to the target database, so you can easily synchronize data between two tables. You can also add transformations in between for any business logic to process the delta data. When defining your sink data destination, you can set insert, update, upsert, and delete operations in your sink without the need for an Alter Row transformation because ADF can automatically detect the row markers.
 
 > [!VIDEO https://learn-video.azurefd.net/vod/player?id=ba3e201c-c9d0-4c1d-806c-b3c8ca601de2]
 
@@ -44,7 +44,7 @@ The changed data including inserted, updated and deleted rows can be automatical
 
 ### Auto incremental extraction in mapping data flow
 
-The newly updated rows or updated files can be automatically detected and extracted by ADF mapping data flow from the source stores. When you want to get delta data from the databases, the incremental column is required to identify the changes. When you want to load new files or updated files only from a storage store, ADF mapping data flow just works through files’ last modify time.
+ADF mapping data flow can automatically detect and extract newly updated rows or updated files from the source stores. To get delta data from databases, an incremental column is required to identify the changes. To load only new or updated files from a storage store, ADF mapping data flow works through the files' last modified time.
 
 **Supported connectors**
 - [Azure Blob Storage](connector-azure-blob-storage.md)
@@ -59,30 +59,30 @@ The newly updated rows or updated files can be automatically detected and extrac
 
 ### Customer managed delta data extraction in pipeline
 
-You can always build your own delta data extraction pipeline for all ADF supported data stores including using lookup activity to get the watermark value stored in an external control table, copy activity or mapping data flow activity to query the delta data against timestamp or ID column, and SP activity to write the new watermark value back to your external control table for the next run. When you want to load new files only from a storage store, you can either delete files every time after they have been moved to the destination successfully, or leverage the time partitioned folder or file names or last modified time to identify the new files.
+You can always build your own delta data extraction pipeline for all ADF supported data stores: use a Lookup activity to get the watermark value stored in an external control table, a Copy activity or mapping data flow activity to query the delta data against a timestamp or ID column, and a Stored Procedure activity to write the new watermark value back to your external control table for the next run. When you want to load only new files from a storage store, you can either delete files each time after they're moved to the destination successfully, or use the time-partitioned folder or file names or the last modified time to identify the new files.
 
 
 ## Best Practices
 
 **Change data capture from databases**
 
 - Native change data capture is always recommended as the simplest way for you to get change data. It also brings much less burden on your source database when ADF extracts the change data for further processing.
-- If your database stores are not part of the ADF connector list with native change data capture support, we recommend you to check the auto incremental extraction option where you only need to input incremental column to capture the changes. ADF will take care of the rest including creating a dynamic query for delta loading and managing the checkpoint for each activity run.
+- If your database stores aren't part of the ADF connector list with native change data capture support, we recommend checking the auto incremental extraction option, where you only need to input an incremental column to capture the changes. ADF takes care of the rest, including creating a dynamic query for delta loading and managing the checkpoint for each activity run.
 - Customer managed delta data extraction in pipeline covers all the ADF supported databases and give you the flexibility to control everything by yourself.
 
 **Change files capture from file based storages**
 
-- When you want to load data from Azure Blob Storage, Azure Data Lake Storage Gen2 or Azure Data Lake Storage Gen1, mapping data flow provides you with the opportunity to get new or updated files only by simple one click. It is the simplest and recommended way for you to achieve delta load from these file based storages in mapping data flow.
+- When you want to load data from Azure Blob Storage, Azure Data Lake Storage Gen2, or Azure Data Lake Storage Gen1, mapping data flow lets you get only new or updated files with a single selection. It's the simplest and recommended way to achieve delta load from these file-based storages in mapping data flow.
 - You can get more [best practices](https://techcommunity.microsoft.com/t5/azure-data-factory-blog/best-practices-of-how-to-use-adf-copy-activity-to-copy-new-files/ba-p/1532484).
 
 
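The watermark pattern described in the "Customer managed delta data extraction" paragraph above begins with a lookup against the control table. A minimal, hedged sketch of that Lookup activity; the control table name `watermarktable` and dataset `WatermarkDataset` are hypothetical:

```json
{
  "name": "LookupOldWatermark",
  "type": "Lookup",
  "typeProperties": {
    "source": {
      "type": "AzureSqlSource",
      "sqlReaderQuery": "SELECT TableName, WatermarkValue FROM watermarktable"
    },
    "dataset": {
      "referenceName": "WatermarkDataset",
      "type": "DatasetReference"
    }
  }
}
```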
 ## Checkpoint
 
-When you enable native change data capture or auto incremental extraction options in ADF mapping data flow, ADF helps you to manage the checkpoint to make sure each activity run will automatically only read the source data that has changed since the last time the pipeline run. By default, the checkpoint is coupled with your pipeline and activity name. If you change your pipeline name or activity name, the checkpoint will be reset, which leads you to start from beginning or get changes from now in the next run. If you do want to change the pipeline name or activity name but still keep the checkpoint to get changed data from the last run automatically, please use your own [Checkpoint key](control-flow-execute-data-flow-activity.md#checkpoint-key) in data flow activity to achieve that. The [naming rule](naming-rules.md) of your own checkpoint key is same as linked services, datasets, pipelines and data flows.
+When you enable the native change data capture or auto incremental extraction options in ADF mapping data flow, ADF manages the checkpoint for you so that each activity run automatically reads only the source data that changed since the last pipeline run. By default, the checkpoint is coupled with your pipeline and activity names. If you change your pipeline name or activity name, the checkpoint is reset, and the next run either starts from the beginning or picks up changes only from now on. If you want to change the pipeline name or activity name but still keep the checkpoint so changed data is picked up from the last run automatically, use your own [Checkpoint key](control-flow-execute-data-flow-activity.md#checkpoint-key) in the data flow activity. The [naming rule](naming-rules.md) for your own checkpoint key is the same as for linked services, datasets, pipelines, and data flows.
 
-When you debug the pipeline, this feature works the same. The checkpoint will be reset when you refresh your browser during the debug run. After you are satisfied with the pipeline result from debug run, you can go ahead to publish and trigger the pipeline. At the moment when you first time trigger your published pipeline, it automatically restarts from the beginning or gets changes from now on.
+When you debug the pipeline, this feature works the same way. The checkpoint is reset when you refresh your browser during the debug run. After you're satisfied with the result from the debug run, you can publish and trigger the pipeline. The first time you trigger your published pipeline, it automatically restarts from the beginning or gets changes from now on.
 
-In the monitoring section, you always have the chance to rerun a pipeline. When you are doing so, the changed data is always captured from the previous checkpoint of your selected pipeline run.
+In the monitoring section, you can always rerun a pipeline. When you do, the changed data is always captured from the previous checkpoint of your selected pipeline run.
 
 ## Tutorials
 
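To make the custom Checkpoint key mentioned above concrete, here's a hedged sketch of an Execute Data Flow activity carrying a user-defined key. The `checkpointKey` property name and its placement under `typeProperties` are assumptions about the activity JSON; see the linked Checkpoint key article for the authoritative schema:

```json
{
  "name": "IncrementalLoad",
  "type": "ExecuteDataFlow",
  "typeProperties": {
    "dataFlow": {
      "referenceName": "cdcDataFlow",
      "type": "DataFlowReference"
    },
    "checkpointKey": "cdcOrdersCheckpoint"
  }
}
```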