Learn how to use the SAP data partitioning template to auto-generate a pipeline as part of your SAP change data capture (CDC) solution (preview). Then, use the pipeline in Azure Data Factory to partition SAP CDC extracted data.
## Create a data partitioning pipeline from a template
To auto-generate an Azure Data Factory pipeline by using the SAP data partitioning template:
1. In Azure Data Factory Studio, go to the Author hub of your data factory. In **Factory Resources**, under **Pipelines** > **Pipelines Actions**, select **Pipeline from template**.
:::image type="content" source="media/sap-change-data-capture-solution/sap-cdc-pipeline-from-template.png" alt-text="Screenshot of the Azure Data Factory resources tab, with Pipeline from template highlighted.":::
1. Select the **Partition SAP data to extract and load into Azure Data Lake Store Gen2 in parallel** template, and then select **Continue**.
:::image type="content" source="media/sap-change-data-capture-solution/sap-cdc-template-selection.png" alt-text="Screenshot of the template gallery, with the SAP data partitioning template highlighted.":::
1. Create new or use existing [linked services](sap-change-data-capture-prepare-linked-service-source-dataset.md) for SAP ODP (preview), Azure Data Lake Storage Gen2, and Azure Synapse Analytics. Use the linked services as inputs in the SAP data partitioning template.
Under **Inputs**, for the SAP ODP linked service, in **Connect via integration runtime**, select your self-hosted integration runtime. For the Data Lake Storage Gen2 linked service, in **Connect via integration runtime**, select **AutoResolveIntegrationRuntime**.
:::image type="content" source="media/sap-change-data-capture-solution/sap-cdc-template-configuration.png" alt-text="Screenshot of the SAP data partitioning template configuration page, with the Inputs section highlighted.":::
1. Select **Use this template** to auto-generate an SAP data partitioning pipeline that can run multiple Data Factory copy activities to extract multiple partitions in parallel.
Data Factory copy activities run on a self-hosted integration runtime to concurrently extract full raw data from your SAP system and load it into Data Lake Storage Gen2 as CSV files. The files are stored in the *sapcdc* container in the *deltachange/\<your pipeline name\>\<your pipeline run timestamp\>* folder path. Be sure that **Extraction mode** for the Data Factory copy activity is set to **Full**.
To ensure high throughput, deploy your SAP system, self-hosted integration runtime, Data Lake Storage Gen2 instance, Azure integration runtime, and Azure Synapse Analytics instance in the same region.
1. Assign your SAP data extraction context, data source object names, and an array of partitions as runtime parameter values for the SAP data partitioning pipeline. Define each partition as an array of row selection conditions.
For the `selectionRangeList` parameter, enter your array of partitions. Define each partition as an array of row selection conditions. For example, you can define an array of three partitions: the first partition includes only rows where the value in the **CUSTOMERID** column is between **1** and **1000000** (the first million customers), the second partition includes only rows where the value is between **1000001** and **2000000** (the second million customers), and the third partition includes the rest of the customers.
The three partitions are extracted by using three Data Factory copy activities that run in parallel.
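
As a sketch, the `selectionRangeList` value for those three partitions might look like the following JSON. The condition shape shown here (`fieldName`, `sign`, `option`, `low`, `high`) and the option codes (`BT` for a between range, `GE` for greater than or equal) follow SAP select-option conventions and are assumptions for illustration; confirm the exact schema that the template's parameter expects.

```json
[
    [
        { "fieldName": "CUSTOMERID", "sign": "I", "option": "BT", "low": "1", "high": "1000000" }
    ],
    [
        { "fieldName": "CUSTOMERID", "sign": "I", "option": "BT", "low": "1000001", "high": "2000000" }
    ],
    [
        { "fieldName": "CUSTOMERID", "sign": "I", "option": "GE", "low": "2000001", "high": "" }
    ]
]
```

Each outer array element defines one partition; each inner object is one row selection condition for that partition.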
:::image type="content" source="media/sap-change-data-capture-solution/sap-cdc-partition-extraction-configuration.png" alt-text="Screenshot of the pipeline configuration for the SAP data partitioning template with the parameters section highlighted.":::
1. Select **Save all** and run the SAP data partitioning pipeline.
## Next steps
[Auto-generate a pipeline from the SAP data replication template](sap-change-data-capture-data-replication-template.md)
## Auto-generate a pipeline from the SAP data replication template

Learn how to use the SAP data replication template to auto-generate a pipeline as part of your SAP change data capture (CDC) solution (preview). Then, use the pipeline in Azure Data Factory for SAP CDC extraction in your datasets.
## Create a data replication pipeline from a template
To auto-generate an Azure Data Factory pipeline by using the SAP data replication template:
1. In Azure Data Factory Studio, go to the Author hub of your data factory. In **Factory Resources**, under **Pipelines** > **Pipelines Actions**, select **Pipeline from template**.
:::image type="content" source="media/sap-change-data-capture-solution/sap-cdc-new-pipeline.png" alt-text="Screenshot that shows creating a new pipeline in the Author hub.":::
1. Select the **Replicate SAP data to Azure Synapse Analytics and persist raw data in Azure Data Lake Storage Gen2** template, and then select **Continue**.
:::image type="content" source="media/sap-change-data-capture-solution/sap-cdc-data-replication-template.png" alt-text="Screenshot of the template gallery, with the SAP data replication template highlighted.":::
1. Create new or use existing [linked services](sap-change-data-capture-prepare-linked-service-source-dataset.md) for SAP ODP (preview), Azure Data Lake Storage Gen2, and Azure Synapse Analytics. Use the linked services as inputs in the SAP data replication template.
Under **Inputs**, for the SAP ODP linked service, in **Connect via integration runtime**, select your self-hosted integration runtime. For the Data Lake Storage Gen2 and Azure Synapse Analytics linked services, in **Connect via integration runtime**, select **AutoResolveIntegrationRuntime**.
:::image type="content" source="media/sap-change-data-capture-solution/sap-cdc-data-replication-template-configuration.png" alt-text="Screenshot of the configuration page for the SAP data replication template.":::
1. Select **Use this template** to auto-generate an SAP data replication pipeline that contains Azure Data Factory copy activities and data flow activities.
The Data Factory copy activity runs on the self-hosted integration runtime to extract raw data (full and deltas) from the SAP system. The copy activity loads the raw data into Data Lake Storage Gen2 as a persisted CSV file. Historical changes are archived and preserved. The files are stored in the *sapcdc* container in the *deltachange/\<your pipeline name\>\<your pipeline run timestamp\>* folder path. Be sure that **Extraction mode** for the Data Factory copy activity is set to **Delta**. The **Subscriber process** property of the copy activity is parameterized.
The Data Factory data flow activity runs on the Azure integration runtime to transform the raw data and merge all changes into Azure Synapse Analytics. The process replicates the SAP data.
To ensure high throughput, deploy your SAP system, self-hosted integration runtime, Data Lake Storage Gen2 instance, Azure integration runtime, and Azure Synapse Analytics instance in the same region.
:::image type="content" source="media/sap-change-data-capture-solution/sap-cdc-data-replication-architecture.png" alt-text="Diagram of the architecture of the SAP data replication scenario.":::
1. Assign your SAP data extraction context, data source object, key column names, subscriber process names, and Synapse SQL schema and table names as runtime parameter values for the SAP data replication pipeline.
For the `keyColumns` parameter, enter your key column names as an array of strings, like `["CUSTOMERID"]` for a single key column or `["keyColumn1", "keyColumn2", "keyColumn3"]` for a composite key. Include up to 10 key column names. The Data Factory data flow activity uses key columns in raw SAP data to identify changed rows. A changed row is a row that was created, updated, or deleted.
For the `subscriberProcess` parameter, enter a unique name for **Subscriber process** in the Data Factory copy activity. For example, you might name it `<your pipeline name>\<your copy activity name>`. You can rename it to start a new Operational Delta Queue subscription in SAP systems.
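
As an illustration only, if you trigger the pipeline programmatically (for example, through the REST API or an SDK), the part of the run parameters that covers these two values might look like the following JSON. The column name and subscriber process name are hypothetical placeholders; in Data Factory Studio, you enter each parameter value separately in the pipeline run dialog.

```json
{
    "keyColumns": [ "CUSTOMERID" ],
    "subscriberProcess": "MyReplicationPipeline_CopyFromSAP"
}
```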
:::image type="content" source="media/sap-change-data-capture-solution/sap-cdc-data-replication-pipeline-parameters.png" alt-text="Screenshot of the SAP data replication pipeline with the parameters section highlighted.":::
1. Select **Save all** and run the SAP data replication pipeline.
## Create a data delta replication pipeline from a template
If you want to replicate SAP data to Data Lake Storage Gen2 in delta format, complete the steps that are detailed in the preceding section, but instead use the **Replicate SAP data to Azure Data Lake Store Gen2 in Delta format and persist raw data in CSV format** template.
Like in the data replication template, in a data delta pipeline, the Data Factory copy activity runs on the self-hosted integration runtime to extract raw data (full and deltas) from the SAP system. The copy activity loads the raw data into Data Lake Storage Gen2 as a persisted CSV file. Historical changes are archived and preserved. The files are stored in the *sapcdc* container in the *deltachange/\<your pipeline name\>\<your pipeline run timestamp\>* folder path. The **Extraction mode** property of the copy activity is set to **Delta**. The **Subscriber process** property of the copy activity is parameterized.
The Data Factory data flow activity runs on the Azure integration runtime to transform the raw data and merge all changes into Data Lake Storage Gen2 as an open source Delta Lake or Lakehouse table. The process replicates the SAP data.
The table is stored in the *saptimetravel* container in the *\<your SAP table or object name\>* folder that has the *_delta_log* subfolder and Parquet files. You can [query the table by using an Azure Synapse Analytics serverless SQL pool](../synapse-analytics/sql/query-delta-lake-format.md). You can also use the Delta Lake Time Travel feature with an Azure Synapse Analytics serverless Apache Spark pool. For more information, see [Create a serverless Apache Spark pool in Azure Synapse Analytics by using web tools](../synapse-analytics/quickstart-apache-spark-notebook.md) and [Read older versions of data by using Time Travel](../synapse-analytics/spark/apache-spark-delta-lake-overview.md?pivots=programming-language-python#read-older-versions-of-data-using-time-travel).
To ensure high throughput, deploy your SAP system, self-hosted integration runtime, Data Lake Storage Gen2 instance, Azure integration runtime, and Delta Lake or Lakehouse instances in the same region.