Commit 7da223f

Merge pull request #79266 from linda33wj/azure-sql-feedback
Update guidance for writing to Azure SQL DB
2 parents 524e8e1 + c6aa2f8 commit 7da223f

5 files changed

+165
-204
lines changed


articles/data-factory/connector-azure-sql-database-managed-insance.md

Lines changed: 54 additions & 65 deletions
@@ -244,6 +244,9 @@ GO
244244

245245
### Azure SQL Database Managed Instance as a sink
246246

247+
> [!TIP]
248+
> To learn more about the supported write behaviors, configurations, and best practices, see [Best practice for loading data into Azure SQL Database Managed Instance](#best-practice-for-loading-data-into-azure-sql-database-managed-instance).
249+
247250
To copy data to Azure SQL Database Managed Instance, set the sink type in the copy activity to **SqlSink**. The following properties are supported in the copy activity sink section:
248251

249252
| Property | Description | Required |
@@ -252,14 +255,11 @@ To copy data to Azure SQL Database Managed Instance, set the sink type in the co
252255
| writeBatchSize |Number of rows to insert into the SQL table **per batch**.<br/>Allowed values are integers for the number of rows. By default, Data Factory dynamically determines the appropriate batch size based on the row size. |No |
253256
| writeBatchTimeout |This property specifies the wait time for the batch insert operation to complete before it times out.<br/>Allowed values are for the time span. An example is “00:30:00,” which is 30 minutes. |No. |
254257
| preCopyScript |This property specifies a SQL query for the copy activity to execute before writing data into the managed instance. It's invoked only once per copy run. You can use this property to clean up preloaded data. |No. |
255-
| sqlWriterStoredProcedureName |This name is for the stored procedure that defines how to apply source data into the target table. Examples of procedures are to do upserts or transforms by using your own business logic. <br/><br/>This stored procedure is *invoked per batch*. To do an operation that runs only once and has nothing to do with source data, for example, delete or truncate, use the `preCopyScript` property. |No. |
258+
| sqlWriterStoredProcedureName |This name is for the stored procedure that defines how to apply source data into the target table. <br/>This stored procedure is *invoked per batch*. To do an operation that runs only once and has nothing to do with source data, for example, delete or truncate, use the `preCopyScript` property. |No. |
256259
| storedProcedureParameters |These parameters are used for the stored procedure.<br/>Allowed values are name or value pairs. The names and casing of the parameters must match the names and casing of the stored procedure parameters. |No. |
257260
| sqlWriterTableType |This property specifies a table type name to be used in the stored procedure. The copy activity makes the data being moved available in a temp table with this table type. Stored procedure code can then merge the data that's being copied with existing data. |No. |
258261

259-
> [!TIP]
260-
> When data is copied to Azure SQL Database Managed Instance, the copy activity appends data to the sink table by default. To perform an upsert or additional business logic, use the stored procedure in SqlSink. For more information, see [Invoke a stored procedure from a SQL sink](#invoke-a-stored-procedure-from-a-sql-sink).
261-
262-
**Example 1: Append data**
262+
**Example 1: append data**
263263

264264
```json
265265
"activities":[
@@ -291,9 +291,9 @@ To copy data to Azure SQL Database Managed Instance, set the sink type in the co
291291
]
292292
```
293293

294-
**Example 2: Invoke a stored procedure during copy for upsert**
294+
**Example 2: invoke a stored procedure during copy**
295295

296-
Learn more details from [Invoke a stored procedure from a SQL sink](#invoke-a-stored-procedure-from-a-sql-sink).
296+
Learn more details from [Invoke a stored procedure from a SQL sink](#invoking-stored-procedure-for-sql-sink).
297297

298298
```json
299299
"activities":[
@@ -330,80 +330,69 @@ Learn more details from [Invoke a stored procedure from a SQL sink](#invoke-a-st
330330
]
331331
```
332332

333-
## Identity columns in the target database
333+
## Best practice for loading data into Azure SQL Database Managed Instance
334334

335-
The following example copies data from a source table with no identity column to a destination table with an identity column.
335+
When you copy data into Azure SQL Database Managed Instance, you may require different write behavior:
336336

337-
**Source table**
337+
- **[Append](#append-data)**: my source data only has new records;
338+
- **[Upsert](#upsert-data)**: my source data has both inserts and updates;
339+
- **[Overwrite](#overwrite-entire-table)**: I want to reload the entire dimension table each time;
340+
- **[Write with custom logic](#write-data-with-custom-logic)**: I need extra processing before the final insertion into the destination table.
338341

339-
```sql
340-
create table dbo.SourceTbl
341-
(
342-
name varchar(100),
343-
age int
344-
)
345-
```
342+
Refer to the respective sections below for how to configure each behavior in ADF and for the related best practices.
343+
344+
### Append data
345+
346+
This is the default behavior of the Azure SQL Database Managed Instance sink connector, and ADF does a **bulk insert** to write to your table efficiently. You can simply configure the source and sink accordingly in Copy activity.
347+
348+
### Upsert data
349+
350+
**Option I** (suggested especially when you have large data to copy): the **most performant approach** to do upsert is the following:
346351

347-
**Destination table**
352+
- First, use a [temporary table](https://docs.microsoft.com/sql/t-sql/statements/create-table-transact-sql?view=sql-server-2017#temporary-tables) to bulk load all records by using Copy activity. Because operations against temporary tables are only minimally logged, large volumes of records can be loaded quickly.
353+
- Then, run a Stored Procedure activity in ADF to apply a [MERGE](https://docs.microsoft.com/sql/t-sql/statements/merge-transact-sql?view=azuresqldb-current) (or INSERT/UPDATE) statement that uses the temp table as its source, so that all updates and inserts run as a single transaction. This reduces the number of round trips and log operations. At the end of the Stored Procedure activity, the temp table can be truncated so that it's ready for the next upsert cycle.
354+
355+
As an example, in Azure Data Factory, you can create a pipeline with a **Copy activity** chained with a **Stored Procedure activity** on success. The former copies data from your source store into a temporary table, for example with "**##UpsertTempTable**" as the table name in the dataset. The latter then invokes a stored procedure to merge the source data from the temp table into the target table and clean up the temp table.
356+
357+
![Upsert](./media/connector-azure-sql-database/azure-sql-database-upsert.png)
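The chained pipeline described above might look roughly like the following. This is a minimal sketch, not a definitive definition: the pipeline and activity names (`UpsertPipelineSketch`, `CopyToTempTable`, `MergeIntoTarget`) are hypothetical, and the angle-bracket placeholders stand for your own datasets, source type, and linked service. The stored procedure name assumes the `[dbo].[spMergeData]` procedure used in this section.

```json
{
    "name": "UpsertPipelineSketch",
    "properties": {
        "description": "Illustrative sketch only. Names and placeholders are hypothetical.",
        "activities": [
            {
                "name": "CopyToTempTable",
                "type": "Copy",
                "inputs": [
                    {
                        "referenceName": "<source dataset name>",
                        "type": "DatasetReference"
                    }
                ],
                "outputs": [
                    {
                        "referenceName": "<dataset whose table name is ##UpsertTempTable>",
                        "type": "DatasetReference"
                    }
                ],
                "typeProperties": {
                    "source": {
                        "type": "<source type>"
                    },
                    "sink": {
                        "type": "SqlSink"
                    }
                }
            },
            {
                "name": "MergeIntoTarget",
                "type": "SqlServerStoredProcedure",
                "dependsOn": [
                    {
                        "activity": "CopyToTempTable",
                        "dependencyConditions": [ "Succeeded" ]
                    }
                ],
                "linkedServiceName": {
                    "referenceName": "<managed instance linked service name>",
                    "type": "LinkedServiceReference"
                },
                "typeProperties": {
                    "storedProcedureName": "[dbo].[spMergeData]"
                }
            }
        ]
    }
}
```

The `dependsOn` entry with the `Succeeded` condition is what chains the two activities, so the merge runs only after the copy into the temp table completes successfully.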
358+
359+
In your database, define a stored procedure with MERGE logic, like the following, which the preceding Stored Procedure activity points to. The example assumes a target **Marketing** table with three columns, **ProfileID**, **State**, and **Category**, and does the upsert based on the **ProfileID** column.
348360

349361
```sql
350-
create table dbo.TargetTbl
351-
(
352-
identifier int identity(1,1),
353-
name varchar(100),
354-
age int
355-
)
362+
CREATE PROCEDURE [dbo].[spMergeData]
363+
AS
364+
BEGIN
365+
MERGE Marketing AS target
366+
USING ##UpsertTempTable AS source
367+
ON (target.[ProfileID] = source.[ProfileID])
368+
WHEN MATCHED THEN
369+
UPDATE SET State = source.State
370+
WHEN NOT MATCHED THEN
371+
INSERT ([ProfileID], [State], [Category])
372+
VALUES (source.ProfileID, source.State, source.Category);
373+
374+
TRUNCATE TABLE ##UpsertTempTable
375+
END
356376
```
357377

358-
Notice that the target table has an identity column.
378+
**Option II:** alternatively, you can choose to [invoke a stored procedure within Copy activity](#invoking-stored-procedure-for-sql-sink). Note that this approach runs the stored procedure for each batch of source data (as governed by the `writeBatchSize` property) instead of using bulk insert, the default approach in Copy activity, so it isn't a good fit for large-scale upserts.
359379

360-
**Source dataset JSON definition**
380+
### Overwrite entire table
361381

362-
```json
363-
{
364-
"name": "SampleSource",
365-
"properties": {
366-
"type": " SqlServerTable",
367-
"linkedServiceName": {
368-
"referenceName": "TestIdentitySQL",
369-
"type": "LinkedServiceReference"
370-
},
371-
"typeProperties": {
372-
"tableName": "SourceTbl"
373-
}
374-
}
375-
}
376-
```
382+
You can configure the **preCopyScript** property in the Copy activity sink. In that case, for each Copy activity run, ADF executes the script first and then runs the copy to insert the data. For example, to overwrite the entire table with the latest data, you can specify a script that first deletes all records before the new data is bulk loaded from the source.
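As a rough sketch of the overwrite pattern, the sink section of the Copy activity could carry the cleanup script as shown here. The table name `dbo.MyTargetTable` is a hypothetical placeholder, not a name from this article; replace it with your own sink table.

```json
"sink": {
    "type": "SqlSink",
    "preCopyScript": "DELETE FROM dbo.MyTargetTable"
}
```

Because `preCopyScript` is invoked only once per copy run, the delete happens a single time before any batch is written.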
377383

378-
**Destination dataset JSON definition**
384+
### Write data with custom logic
379385

380-
```json
381-
{
382-
"name": "SampleTarget",
383-
"properties": {
384-
"structure": [
385-
{ "name": "name" },
386-
{ "name": "age" }
387-
],
388-
"type": "SqlServerTable",
389-
"linkedServiceName": {
390-
"referenceName": "TestIdentitySQL",
391-
"type": "LinkedServiceReference"
392-
},
393-
"typeProperties": {
394-
"tableName": "TargetTbl"
395-
}
396-
}
397-
}
398-
```
386+
As described in the [Upsert data](#upsert-data) section, when you need to apply extra processing before the final insertion of source data into the destination table, you can either a) for large-scale loads, load the data into a temporary table and then invoke a stored procedure, or b) invoke a stored procedure during copy.
399387

400-
Notice that your source and target table have different schema. The target table has an identity column. In this scenario, specify the "structure" property in the target dataset definition, which doesn’t include the identity column.
388+
## <a name="invoking-stored-procedure-for-sql-sink"></a> Invoke a stored procedure from a SQL sink
401389

402-
## <a name="invoke-a-stored-procedure-from-a-sql-sink"></a> Invoke a stored procedure from a SQL sink
390+
When you copy data into Azure SQL Database Managed Instance, you can also configure and invoke a user-specified stored procedure with additional parameters.
403391

404-
When data is copied into Azure SQL Database Managed Instance, a stored procedure can be configured and invoked with additional parameters that you specify.
392+
> [!TIP]
393+
> Invoking a stored procedure processes the data row by row instead of as a bulk operation, which isn't recommended for large-scale copy. Learn more from [Best practice for loading data into Azure SQL Database Managed Instance](#best-practice-for-loading-data-into-azure-sql-database-managed-instance).
405394
406-
You can use a stored procedure when built-in copy mechanisms don't serve the purpose. It's typically used when an upsert (update + insert) or extra processing must be done before the final insertion of source data in the destination table. Extra processing can include tasks such as merging columns, looking up additional values, and insertion into multiple tables.
395+
You can use a stored procedure when built-in copy mechanisms don't serve the purpose, for example, to apply extra processing before the final insertion of source data into the destination table. Examples of extra processing include merging columns, looking up additional values, and inserting into more than one table.
407396

408397
The following sample shows how to use a stored procedure to do an upsert into a table in the SQL Server database. Assume that input data and the sink **Marketing** table each have three columns: **ProfileID**, **State**, and **Category**. Do the upsert based on the **ProfileID** column, and only apply it for a specific category.
409398
