articles/data-factory/connector-azure-sql-data-warehouse.md (22 additions, 2 deletions)
```diff
@@ -8,7 +8,7 @@ ms.service: data-factory
 ms.subservice: data-movement
 ms.custom: synapse
 ms.topic: conceptual
-ms.date: 12/29/2021
+ms.date: 01/14/2022
 ---

 # Copy and transform data in Azure Synapse Analytics by using Azure Data Factory or Synapse pipelines
```
````diff
@@ -456,8 +456,13 @@ To copy data to Azure Synapse Analytics, set the sink type in Copy Activity to *
 | tableOption | Specifies whether to [automatically create the sink table](copy-activity-overview.md#auto-create-sink-tables) if it does not exist, based on the source schema. Allowed values are: `none` (default), `autoCreate`. | No |
 | disableMetricsCollection | The service collects metrics such as Azure Synapse Analytics DWUs for copy performance optimization and recommendations, which introduces additional master DB access. If you are concerned with this behavior, specify `true` to turn it off. | No (default is `false`) |
 | maxConcurrentConnections | The upper limit of concurrent connections established to the data store during the activity run. Specify a value only when you want to limit concurrent connections. | No |
+| writeBehavior | Specify the write behavior for the copy activity to load data into Azure Synapse Analytics. <br/> The allowed values are **Insert** and **Upsert**. By default, the service uses insert to load data. | No |
+| upsertSettings | Specify the group of settings for the write behavior. <br/> Applies when the writeBehavior option is `Upsert`. | No |
+| ***Under `upsertSettings`:*** | | |
+| keys | Specify the column names for unique row identification. Either a single key or a series of keys can be used. If not specified, the primary key is used. | No |
+| interimSchemaName | Specify the interim schema for creating the interim table. Note: the user needs permission to create and delete tables. By default, the interim table shares the same schema as the sink table. | No |

-#### Azure Synapse Analytics sink example
+#### Example 1: Azure Synapse Analytics sink

 ```json
 "sink": {
````
````diff
@@ -473,6 +478,21 @@ To copy data to Azure Synapse Analytics, set the sink type in Copy Activity to *
 }
 ```

+#### Example 2: Upsert data
+
+```json
+"sink": {
+    "type": "SqlDWSink",
+    "writeBehavior": "Upsert",
+    "upsertSettings": {
+        "keys": [
+            "<column name>"
+        ],
+        "interimSchemaName": "<interim schema name>"
+    }
+}
+```
+
 ## Parallel copy from Azure Synapse Analytics

 The Azure Synapse Analytics connector in copy activity provides built-in data partitioning to copy data in parallel. You can find data partitioning options on the **Source** tab of the copy activity.
````
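For orientation, the `"sink"` fragments shown above sit inside a full copy activity definition. The following is a minimal sketch assuming the standard copy activity JSON shape; the activity name and dataset reference names are hypothetical placeholders, not part of the connector documentation:

```json
{
    "name": "CopyToSynapseUpsert",
    "type": "Copy",
    "inputs": [
        { "referenceName": "<source dataset name>", "type": "DatasetReference" }
    ],
    "outputs": [
        { "referenceName": "<Azure Synapse Analytics dataset name>", "type": "DatasetReference" }
    ],
    "typeProperties": {
        "source": { "type": "<source type>" },
        "sink": {
            "type": "SqlDWSink",
            "writeBehavior": "Upsert",
            "upsertSettings": {
                "keys": [ "<column name>" ],
                "interimSchemaName": "<interim schema name>"
            }
        }
    }
}
```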
articles/data-factory/connector-azure-sql-database.md (47 additions, 30 deletions)
```diff
@@ -8,7 +8,7 @@ ms.service: data-factory
 ms.subservice: data-movement
 ms.topic: conceptual
 ms.custom: synapse
-ms.date: 12/24/2021
+ms.date: 01/14/2022
 ---

 # Copy and transform data in Azure SQL Database by using Azure Data Factory or Azure Synapse Analytics
```
```diff
@@ -483,6 +483,12 @@ To copy data to Azure SQL Database, the following properties are supported in th
 | writeBatchTimeout | The wait time for the batch insert operation to finish before it times out.<br/> The allowed value is **timespan**. An example is "00:30:00" (30 minutes). | No |
 | disableMetricsCollection | The service collects metrics such as Azure SQL Database DTUs for copy performance optimization and recommendations, which introduces additional master DB access. If you are concerned with this behavior, specify `true` to turn it off. | No (default is `false`) |
 | maxConcurrentConnections | The upper limit of concurrent connections established to the data store during the activity run. Specify a value only when you want to limit concurrent connections. | No |
+| writeBehavior | Specify the write behavior for the copy activity to load data into Azure SQL Database. <br/> The allowed values are **Insert** and **Upsert**. By default, the service uses insert to load data. | No |
+| upsertSettings | Specify the group of settings for the write behavior. <br/> Applies when the writeBehavior option is `Upsert`. | No |
+| ***Under `upsertSettings`:*** | | |
+| useTempDB | Specify whether to use a global temporary table or a physical table as the interim table for upsert. <br>By default, the service uses a global temporary table as the interim table; the default value is `true`. | No |
+| interimSchemaName | Specify the interim schema for creating the interim table if a physical table is used. Note: the user needs permission to create and delete tables. By default, the interim table shares the same schema as the sink table. <br/> Applies when the useTempDB option is `false`. | No |
+| keys | Specify the column names for unique row identification. Either a single key or a series of keys can be used. If not specified, the primary key is used. | No |

 **Example 1: Append data**
```
@@ -557,6 +563,45 @@ Learn more details from [Invoke a stored procedure from a SQL sink](#invoke-a-st
The Azure SQL Database connector in copy activity provides built-in data partitioning to copy data in parallel. You can find data partitioning options on the **Source** tab of the copy activity.
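The partitioning referred to above is configured on the source side of the copy activity. A minimal sketch follows, assuming the `partitionOption` and `partitionSettings` property names used for this family of SQL connectors; the column name and bound values are placeholders:

```json
"source": {
    "type": "AzureSqlSource",
    "partitionOption": "DynamicRange",
    "partitionSettings": {
        "partitionColumnName": "<partition column name>",
        "partitionUpperBound": "<upper value of the partition column>",
        "partitionLowerBound": "<lower value of the partition column>"
    }
}
```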
```diff
@@ -641,35 +686,7 @@ Appending data is the default behavior of this Azure SQL Database sink connector
 ### Upsert data

-**Option 1:** When you have a large amount of data to copy, you can bulk load all records into a staging table by using the copy activity, then run a stored procedure activity to apply a [MERGE](/sql/t-sql/statements/merge-transact-sql) or INSERT/UPDATE statement in one shot.
-
-Copy activity currently doesn't natively support loading data into a database temporary table. There is an advanced way to set it up with a combination of multiple activities; refer to [Optimize Azure SQL Database Bulk Upsert scenarios](https://github.com/scoriani/azuresqlbulkupsert). The following shows a sample of using a permanent table as staging.
-
-As an example, you can create a pipeline with a **Copy activity** chained with a **Stored Procedure activity**. The former copies data from your source store into an Azure SQL Database staging table, for example, **UpsertStagingTable**, as the table name in the dataset. Then the latter invokes a stored procedure to merge source data from the staging table into the target table and clean up the staging table.
-
-In your database, define a stored procedure with MERGE logic, like the following example, which is pointed to from the previous stored procedure activity. Assume that the target is the **Marketing** table with three columns: **ProfileID**, **State**, and **Category**. Do the upsert based on the **ProfileID** column.
-
-**Option 2:** You can choose to [invoke a stored procedure within the copy activity](#invoke-a-stored-procedure-from-a-sql-sink). This approach runs each batch (as governed by the `writeBatchSize` property) in the source table instead of using bulk insert as the default approach in the copy activity.
-
-**Option 3:** You can use [Mapping Data Flow](#sink-transformation), which offers built-in insert/upsert/update methods.
+Copy activity now natively supports loading data into a database temporary table and then updating the data in the sink table if the key exists, or otherwise inserting new data.
```
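A minimal sketch of an upsert sink for this connector, assuming the `AzureSqlSink` type used by this connector's other sink examples; the column name is a placeholder:

```json
"sink": {
    "type": "AzureSqlSink",
    "writeBehavior": "Upsert",
    "upsertSettings": {
        "useTempDB": true,
        "keys": [
            "<column name>"
        ]
    }
}
```

With `useTempDB` left at its default of `true`, the interim data lands in a global temporary table, so no `interimSchemaName` is needed.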
articles/data-factory/connector-azure-sql-managed-instance.md (47 additions, 29 deletions)
```diff
@@ -8,7 +8,7 @@ ms.topic: conceptual
 ms.author: jianleishen
 author: jianleishen
 ms.custom: synapse
-ms.date: 12/28/2021
+ms.date: 01/14/2022
 ---

 # Copy and transform data in Azure SQL Managed Instance using Azure Data Factory or Synapse Analytics
```
```diff
@@ -488,6 +488,12 @@ To copy data to SQL Managed Instance, the following properties are supported in
 | writeBatchSize | Number of rows to insert into the SQL table *per batch*.<br/>Allowed values are integers for the number of rows. By default, the service dynamically determines the appropriate batch size based on the row size. | No |
 | writeBatchTimeout | This property specifies the wait time for the batch insert operation to complete before it times out.<br/>Allowed values are for the timespan. An example is "00:30:00," which is 30 minutes. | No |
 | maxConcurrentConnections | The upper limit of concurrent connections established to the data store during the activity run. Specify a value only when you want to limit concurrent connections. | No |
+| writeBehavior | Specify the write behavior for the copy activity to load data into Azure SQL Managed Instance. <br/> The allowed values are **Insert** and **Upsert**. By default, the service uses insert to load data. | No |
+| upsertSettings | Specify the group of settings for the write behavior. <br/> Applies when the writeBehavior option is `Upsert`. | No |
+| ***Under `upsertSettings`:*** | | |
+| useTempDB | Specify whether to use a global temporary table or a physical table as the interim table for upsert. <br>By default, the service uses a global temporary table as the interim table; the default value is `true`. | No |
+| interimSchemaName | Specify the interim schema for creating the interim table if a physical table is used. Note: the user needs permission to create and delete tables. By default, the interim table shares the same schema as the sink table. <br/> Applies when the useTempDB option is `false`. | No |
+| keys | Specify the column names for unique row identification. Either a single key or a series of keys can be used. If not specified, the primary key is used. | No |

 **Example 1: Append data**
```
@@ -562,6 +568,45 @@ Learn more details from [Invoke a stored procedure from a SQL MI sink](#invoke-a
The Azure SQL Managed Instance connector in copy activity provides built-in data partitioning to copy data in parallel. You can find data partitioning options on the **Source** tab of the copy activity.
```diff
@@ -646,34 +691,7 @@ Appending data is the default behavior of the SQL Managed Instance sink connecto
 ### Upsert data

-**Option 1:** When you have a large amount of data to copy, you can bulk load all records into a staging table by using the copy activity, then run a stored procedure activity to apply a [MERGE](/sql/t-sql/statements/merge-transact-sql) or INSERT/UPDATE statement in one shot.
-
-Copy activity currently doesn't natively support loading data into a database temporary table. There is an advanced way to set it up with a combination of multiple activities; refer to [Optimize SQL Database Bulk Upsert scenarios](https://github.com/scoriani/azuresqlbulkupsert). The following shows a sample of using a permanent table as staging.
-
-As an example, you can create a pipeline with a **Copy activity** chained with a **Stored Procedure activity**. The former copies data from your source store into an Azure SQL Managed Instance staging table, for example, **UpsertStagingTable**, as the table name in the dataset. Then the latter invokes a stored procedure to merge source data from the staging table into the target table and clean up the staging table.
-
-In your database, define a stored procedure with MERGE logic, like the following example, which is pointed to from the previous stored procedure activity. Assume that the target is the **Marketing** table with three columns: **ProfileID**, **State**, and **Category**. Do the upsert based on the **ProfileID** column.
-
-**Option 2:** You can choose to [invoke a stored procedure within the copy activity](#invoke-a-stored-procedure-from-a-sql-sink). This approach runs each batch (as governed by the `writeBatchSize` property) in the source table instead of using bulk insert as the default approach in the copy activity.
+Copy activity now natively supports loading data into a database temporary table and then updating the data in the sink table if the key exists, or otherwise inserting new data.
```
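A minimal sketch of an upsert sink for this connector, assuming the `SqlMISink` type used by this connector's other sink examples. This variant sets `useTempDB` to `false` to illustrate the physical interim table path documented above; the column and schema names are placeholders:

```json
"sink": {
    "type": "SqlMISink",
    "writeBehavior": "Upsert",
    "upsertSettings": {
        "useTempDB": false,
        "interimSchemaName": "<interim schema name>",
        "keys": [
            "<column name>"
        ]
    }
}
```

Because a physical interim table is created and dropped in `interimSchemaName`, the connection's user needs create and delete table permissions on that schema.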