articles/data-factory/connector-azure-sql-database-managed-insance.md
Lines changed: 54 additions & 65 deletions
Original file line number
Diff line number
Diff line change
@@ -244,6 +244,9 @@ GO
244
244
245
245
### Azure SQL Database Managed Instance as a sink
246
246
247
+
> [!TIP]
248
+
> Learn more about the supported write behaviors, configurations, and best practices from [Best practice for loading data into Azure SQL Database Managed Instance](#best-practice-for-loading-data-into-azure-sql-database-managed-instance).
249
+
247
250
To copy data to Azure SQL Database Managed Instance, set the sink type in the copy activity to **SqlSink**. The following properties are supported in the copy activity sink section:
248
251
249
252
| Property | Description | Required |
@@ -252,14 +255,11 @@ To copy data to Azure SQL Database Managed Instance, set the sink type in the co
252
255
| writeBatchSize |Number of rows to insert into the SQL table **per batch**.<br/>Allowed values are integers for the number of rows. By default, Data Factory dynamically determines the appropriate batch size based on the row size. |No |
253
256
| writeBatchTimeout |This property specifies the wait time for the batch insert operation to complete before it times out.<br/>Allowed values are time spans. An example is "00:30:00", which is 30 minutes. |No. |
254
257
| preCopyScript |This property specifies a SQL query for the copy activity to execute before writing data into the managed instance. It's invoked only once per copy run. You can use this property to clean up preloaded data. |No. |
255
-
| sqlWriterStoredProcedureName |This name is for the stored procedure that defines how to apply source data into the target table. Examples of procedures are to do upserts or transforms by using your own business logic. <br/><br/>This stored procedure is *invoked per batch*. To do an operation that runs only once and has nothing to do with source data, for example, delete or truncate, use the `preCopyScript` property. |No. |
258
+
| sqlWriterStoredProcedureName |This name is for the stored procedure that defines how to apply source data into the target table. <br/>This stored procedure is *invoked per batch*. To do an operation that runs only once and has nothing to do with source data, for example, delete or truncate, use the `preCopyScript` property. |No. |
256
259
| storedProcedureParameters |These parameters are used for the stored procedure.<br/>Allowed values are name-value pairs. The names and casing of the parameters must match the names and casing of the stored procedure parameters. |No. |
257
260
| sqlWriterTableType |This property specifies a table type name to be used in the stored procedure. The copy activity makes the data being moved available in a temp table with this table type. Stored procedure code can then merge the data that's being copied with existing data. |No. |
258
261
259
-
> [!TIP]
260
-
> When data is copied to Azure SQL Database Managed Instance, the copy activity appends data to the sink table by default. To perform an upsert or additional business logic, use the stored procedure in SqlSink. For more information, see [Invoke a stored procedure from a SQL sink](#invoke-a-stored-procedure-from-a-sql-sink).
261
-
262
-
**Example 1: Append data**
262
+
**Example 1: append data**
263
263
264
264
```json
265
265
"activities":[
@@ -291,9 +291,9 @@ To copy data to Azure SQL Database Managed Instance, set the sink type in the co
291
291
]
292
292
```
293
293
294
-
**Example 2: Invoke a stored procedure during copy for upsert**
294
+
**Example 2: invoke a stored procedure during copy**
295
295
296
-
Learn more details from [Invoke a stored procedure from a SQL sink](#invoke-a-stored-procedure-from-a-sql-sink).
296
+
Learn more details from [Invoke a stored procedure from a SQL sink](#invoking-stored-procedure-for-sql-sink).
297
297
298
298
```json
299
299
"activities":[
@@ -330,80 +330,69 @@ Learn more details from [Invoke a stored procedure from a SQL sink](#invoke-a-st
330
330
]
331
331
```
332
332
333
-
## Identity columns in the target database
333
+
## Best practice for loading data into Azure SQL Database Managed Instance
334
334
335
-
The following example copies data from a source table with no identity column to a destination table with an identity column.
335
+
When you copy data into Azure SQL Database Managed Instance, you may require different write behavior:
336
336
337
-
**Source table**
337
+
- **[Append](#append-data)**: my source data only has new records;
338
+
- **[Upsert](#upsert-data)**: my source data has both inserts and updates;
339
+
- **[Overwrite](#overwrite-entire-table)**: I want to reload the entire dimension table each time;
340
+
- **[Write with custom logic](#write-data-with-custom-logic)**: I need extra processing before the final insertion into the destination table.
338
341
339
-
```sql
340
-
create table dbo.SourceTbl
341
-
(
342
-
name varchar(100),
343
-
age int
344
-
)
345
-
```
342
+
Refer to the respective sections for how to configure each behavior in ADF and for the related best practices.
343
+
344
+
### Append data
345
+
346
+
This is the default behavior of the Azure SQL Database Managed Instance sink connector, and ADF does a **bulk insert** to write to your table efficiently. You can simply configure the source and sink accordingly in the Copy activity.
347
+
348
+
### Upsert data
349
+
350
+
**Option I** (suggested especially when you have large data to copy): the **most performant approach** to do upsert is the following:
346
351
347
-
**Destination table**
352
+
- First, use a [temporary table](https://docs.microsoft.com/sql/t-sql/statements/create-table-transact-sql?view=sql-server-2017#temporary-tables) to bulk load all records with the Copy activity. Because operations against temporary tables are minimally logged, large volumes of records can be loaded quickly.
353
+
- Then, execute a Stored Procedure activity in ADF to apply a [MERGE](https://docs.microsoft.com/sql/t-sql/statements/merge-transact-sql?view=azuresqldb-current) (or INSERT/UPDATE) statement that uses the temp table as the source and performs all updates or inserts as a single transaction, reducing the number of round trips and log operations. At the end of the Stored Procedure activity, the temp table can be truncated so that it is ready for the next upsert cycle.
354
+
355
+
As an example, in Azure Data Factory, you can create a pipeline with a **Copy activity** chained with a **Stored Procedure activity** on success. The former copies data from your source store into a temporary table, say "**##UpsertTempTable**" as the table name in the dataset; the latter then invokes a Stored Procedure to merge source data from the temp table into the target table and to clean up the temp table.
In your database, define a Stored Procedure with MERGE logic, like the following, which the above Stored Procedure activity points to. Assume that the target **Marketing** table has three columns: **ProfileID**, **State**, and **Category**, and that the upsert is based on the **ProfileID** column.
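A minimal sketch of such a procedure is shown here, under stated assumptions: the procedure name `spMergeData` and the `dbo` schema are hypothetical, and the column list simply mirrors the three columns named above. Treat it as an illustration of the MERGE pattern rather than the exact procedure used by the article.

```sql
-- Hypothetical procedure name; dbo.Marketing and ##UpsertTempTable follow the names used in the text.
CREATE PROCEDURE [dbo].[spMergeData]
AS
BEGIN
    -- Upsert: match on ProfileID, update existing rows, insert new ones.
    MERGE [dbo].[Marketing] AS tgt
    USING ##UpsertTempTable AS src
        ON tgt.[ProfileID] = src.[ProfileID]
    WHEN MATCHED THEN
        UPDATE SET tgt.[State] = src.[State],
                   tgt.[Category] = src.[Category]
    WHEN NOT MATCHED THEN
        INSERT ([ProfileID], [State], [Category])
        VALUES (src.[ProfileID], src.[State], src.[Category]);

    -- Empty the staging table so it is ready for the next upsert cycle.
    TRUNCATE TABLE ##UpsertTempTable;
END
```

Because MERGE is a single statement, the updates and inserts are applied together, which is the property the steps above rely on.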
Notice that the target table has an identity column.
378
+
**Option II:** Alternatively, you can [invoke a stored procedure within the Copy activity](#invoking-stored-procedure-for-sql-sink). Note that this approach is executed for each row in the source table instead of using bulk insert, which is the default approach in the Copy activity, so it isn't a good fit for large-scale upserts.
359
379
360
-
**Source dataset JSON definition**
380
+
### Overwrite entire table
361
381
362
-
```json
363
-
{
364
-
"name": "SampleSource",
365
-
"properties": {
366
-
"type": " SqlServerTable",
367
-
"linkedServiceName": {
368
-
"referenceName": "TestIdentitySQL",
369
-
"type": "LinkedServiceReference"
370
-
},
371
-
"typeProperties": {
372
-
"tableName": "SourceTbl"
373
-
}
374
-
}
375
-
}
376
-
```
382
+
You can configure the **preCopyScript** property in the Copy activity sink. In that case, for each Copy activity run, ADF executes the script first and then runs the copy to insert the data. For example, to overwrite the entire table with the latest data, you can specify a script that deletes all records before the new data is bulk-loaded from the source.
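As a sketch of what such a script could contain, the statement below clears the target before the load. The table name `dbo.Marketing` is an assumption borrowed from the upsert example; substitute your own sink table.

```sql
-- Assumed sink table name; replace with your own.
-- DELETE removes all rows and works with ordinary delete permission on the table.
DELETE FROM [dbo].[Marketing];

-- Alternative: TRUNCATE TABLE is usually faster on large tables, but it requires
-- ALTER permission and fails if the table is referenced by a foreign key.
-- TRUNCATE TABLE [dbo].[Marketing];
```

Only one of the two statements is needed; the text of the chosen statement is what goes into the `preCopyScript` value.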
377
383
378
-
**Destination dataset JSON definition**
384
+
### Write data with custom logic
379
385
380
-
```json
381
-
{
382
-
"name": "SampleTarget",
383
-
"properties": {
384
-
"structure": [
385
-
{ "name": "name" },
386
-
{ "name": "age" }
387
-
],
388
-
"type": "SqlServerTable",
389
-
"linkedServiceName": {
390
-
"referenceName": "TestIdentitySQL",
391
-
"type": "LinkedServiceReference"
392
-
},
393
-
"typeProperties": {
394
-
"tableName": "TargetTbl"
395
-
}
396
-
}
397
-
}
398
-
```
386
+
As described in the [Upsert data](#upsert-data) section, when you need to apply extra processing before the final insertion of source data into the destination table, you can either a) for large-scale loads, load to a temporary table and then invoke a stored procedure, or b) invoke a stored procedure during the copy.
399
387
400
-
Notice that your source and target table have different schema. The target table has an identity column. In this scenario, specify the "structure" property in the target dataset definition, which doesn’t include the identity column.
388
+
## <a name="invoking-stored-procedure-for-sql-sink"></a> Invoke a stored procedure from a SQL sink
401
389
402
-
## <a name="invoke-a-stored-procedure-from-a-sql-sink"></a> Invoke a stored procedure from a SQL sink
390
+
When you copy data into Azure SQL Database Managed Instance, you can also configure and invoke a user-specified stored procedure with additional parameters.
403
391
404
-
When data is copied into Azure SQL Database Managed Instance, a stored procedure can be configured and invoked with additional parameters that you specify.
392
+
> [!TIP]
393
+
> Invoking a stored procedure processes the data row by row instead of as a bulk operation, so it is not recommended for large-scale copies. Learn more from [Best practice for loading data into Azure SQL Database Managed Instance](#best-practice-for-loading-data-into-azure-sql-database-managed-instance).
405
394
406
-
You can use a stored procedure when built-in copy mechanisms don't serve the purpose. It's typically used when an upsert (update + insert) or extra processing must be done before the final insertion of source data in the destination table. Extra processing can include tasks such as merging columns, looking up additional values, and insertion into multiple tables.
395
+
You can use a stored procedure when the built-in copy mechanisms don't serve the purpose, for example, when you need to apply extra processing before the final insertion of source data into the destination table. Examples of extra processing include merging columns, looking up additional values, and inserting into more than one table.
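To make the moving parts concrete, here is a rough sketch of a table type and a procedure that applies simple extra processing before the insert. The names `MarketingType` and `spLoadMarketing`, the column data types, and the processing itself (upper-casing **State** and skipping rows with an empty **Category**) are all assumptions made for illustration; how the table-valued parameter is bound in the copy activity is not covered by this sketch.

```sql
-- Hypothetical table type describing the incoming rows (column types are assumed).
CREATE TYPE [dbo].[MarketingType] AS TABLE
(
    [ProfileID] varchar(256) NOT NULL,
    [State]     varchar(256) NOT NULL,
    [Category]  varchar(256) NOT NULL
);
GO

-- Hypothetical procedure that receives the copied rows as a table-valued parameter.
CREATE PROCEDURE [dbo].[spLoadMarketing]
    @Marketing [dbo].[MarketingType] READONLY
AS
BEGIN
    -- Extra processing before the final insert: normalize State, drop empty categories.
    INSERT INTO [dbo].[Marketing] ([ProfileID], [State], [Category])
    SELECT [ProfileID], UPPER([State]), [Category]
    FROM @Marketing
    WHERE [Category] <> '';
END
GO
```

The table type name is what the `sqlWriterTableType` property refers to, and the procedure name is what `sqlWriterStoredProcedureName` refers to.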
407
396
408
397
The following sample shows how to use a stored procedure to do an upsert into a table in the SQL Server database. Assume that input data and the sink **Marketing** table each have three columns: **ProfileID**, **State**, and **Category**. Do the upsert based on the **ProfileID** column, and only apply it for a specific category.