
Commit 590974a

Merge pull request #77595 from linda33wj/master

Update ADF copy content

2 parents 9f71372 + 7db3f0c, commit 590974a

File tree

4 files changed: +44 -18 lines


articles/data-factory/connector-azure-data-lake-storage.md

Lines changed: 3 additions & 0 deletions
@@ -164,6 +164,9 @@ To use managed identities for Azure resources authentication, follow these steps
 >- **Data Factory UI** to test connection and navigate folders during authoring.
 >If you have concerns about granting permission at the account level, you can skip the test connection and enter the path manually during authoring. Copy activity still works as long as the managed identity is granted proper permission on the files to be copied.
 
+>[!IMPORTANT]
+>If you use PolyBase to load data from ADLS Gen2 into SQL DW with ADLS Gen2 managed identity authentication, make sure you also configure SQL DW to use MSI to access the ADLS Gen2 storage by following steps #1 to #3.b in [this guidance](../sql-database/sql-database-vnet-service-endpoint-rule-overview.md#impact-of-using-vnet-service-endpoints-with-azure-storage). If your ADLS Gen2 is configured with a VNet service endpoint, you must use managed identity authentication to load data from it with PolyBase.
+
 These properties are supported in linked service:
 
 | Property | Description | Required |
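
The SQL DW side of the MSI setup that the new note points to boils down to two statements: a database scoped credential for the managed identity, and an external data source that uses it. A minimal sketch, assuming the linked guidance has been followed; the credential, data source, container, and account names are placeholders:

```sql
-- Run inside the SQL Data Warehouse database.
-- A database master key must already exist: CREATE MASTER KEY;
-- The special identity string 'Managed Service Identity' makes PolyBase
-- authenticate to storage with the server's managed identity.
CREATE DATABASE SCOPED CREDENTIAL msi_cred WITH IDENTITY = 'Managed Service Identity';

CREATE EXTERNAL DATA SOURCE ext_adls_gen2
WITH (
    TYPE = HADOOP,
    LOCATION = 'abfss://<container>@<account>.dfs.core.windows.net',
    CREDENTIAL = msi_cred
);
```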

articles/data-factory/connector-azure-sql-data-warehouse.md

Lines changed: 35 additions & 12 deletions
@@ -12,7 +12,7 @@ ms.workload: data-services
 ms.tgt_pltfrm: na
 
 ms.topic: conceptual
-ms.date: 04/29/2019
+ms.date: 05/22/2019
 ms.author: jingwang
 
 ---
@@ -146,7 +146,7 @@ To use service principal-based Azure AD application token authentication, follow
 4. **Grant the service principal needed permissions** as you normally do for SQL users or others. Run the following code, or refer to more options [here](https://docs.microsoft.com/sql/relational-databases/system-stored-procedures/sp-addrolemember-transact-sql?view=sql-server-2017).
 
     ```sql
-    EXEC sp_addrolemember [role name], [your application name];
+    EXEC sp_addrolemember db_owner, [your application name];
    ```
 
 5. **Configure an Azure SQL Data Warehouse linked service** in Azure Data Factory.
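
For context, the grant hardened above follows a contained-user creation step earlier in the article. A minimal sketch of the full sequence, with the bracketed names as placeholders; db_owner can be narrowed if your scenario allows it:

```sql
-- Connect to the SQL Data Warehouse as an Azure AD administrator, then:
CREATE USER [your application name] FROM EXTERNAL PROVIDER;  -- contained user for the service principal
EXEC sp_addrolemember db_owner, [your application name];     -- the grant shown in the hunk above
```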
@@ -196,7 +196,7 @@ To use managed identity authentication, follow these steps:
 3. **Grant the Data Factory Managed Identity needed permissions** as you normally do for SQL users and others. Run the following code, or refer to more options [here](https://docs.microsoft.com/sql/relational-databases/system-stored-procedures/sp-addrolemember-transact-sql?view=sql-server-2017).
 
    ```sql
-   EXEC sp_addrolemember [role name], [your Data Factory name];
+   EXEC sp_addrolemember db_owner, [your Data Factory name];
   ```
 
 5. **Configure an Azure SQL Data Warehouse linked service** in Azure Data Factory.
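
If you want to verify the grant took effect, a hypothetical check (not part of the article) is to query the role membership directly:

```sql
-- Confirm the factory's managed identity is a database principal with the
-- expected role ('your Data Factory name' is a placeholder).
SELECT p.name AS principal_name, r.name AS role_name
FROM sys.database_role_members rm
JOIN sys.database_principals p ON rm.member_principal_id = p.principal_id
JOIN sys.database_principals r ON rm.role_principal_id = r.principal_id
WHERE p.name = 'your Data Factory name';
```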
@@ -372,7 +372,7 @@ To copy data to Azure SQL Data Warehouse, set the sink type in Copy Activity to
 | rejectValue | Specifies the number or percentage of rows that can be rejected before the query fails.<br/><br/>Learn more about PolyBase's reject options in the Arguments section of [CREATE EXTERNAL TABLE (Transact-SQL)](https://msdn.microsoft.com/library/dn935021.aspx). <br/><br/>Allowed values are 0 (default), 1, 2, etc. |No |
 | rejectType | Specifies whether the **rejectValue** option is a literal value or a percentage.<br/><br/>Allowed values are **Value** (default) and **Percentage**. | No |
 | rejectSampleValue | Determines the number of rows to retrieve before PolyBase recalculates the percentage of rejected rows.<br/><br/>Allowed values are 1, 2, etc. | Yes, if the **rejectType** is **Percentage**. |
-| useTypeDefault | Specifies how to handle missing values in delimited text files when PolyBase retrieves data from the text file.<br/><br/>Learn more about this property from the Arguments section in [CREATE EXTERNAL FILE FORMAT (Transact-SQL)](https://msdn.microsoft.com/library/dn935026.aspx).<br/><br/>Allowed values are **True** and **False** (default). | No |
+| useTypeDefault | Specifies how to handle missing values in delimited text files when PolyBase retrieves data from the text file.<br/><br/>Learn more about this property from the Arguments section in [CREATE EXTERNAL FILE FORMAT (Transact-SQL)](https://msdn.microsoft.com/library/dn935026.aspx).<br/><br/>Allowed values are **True** and **False** (default).<br><br>**See [troubleshooting tips](#polybase-troubleshooting) related to this setting.** | No |
 | writeBatchSize | Number of rows to insert into the SQL table **per batch**. Applies only when PolyBase isn't used.<br/><br/>The allowed value is **integer** (number of rows). By default, Data Factory dynamically determines the appropriate batch size based on the row size. | No |
 | writeBatchTimeout | Wait time for the batch insert operation to finish before it times out. Applies only when PolyBase isn't used.<br/><br/>The allowed value is **timespan**. Example: "00:30:00" (30 minutes). | No |
 | preCopyScript | Specify a SQL query for Copy Activity to run before writing data into Azure SQL Data Warehouse in each run. Use this property to clean up preloaded data. | No |
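
For orientation, the reject* sink properties surface PolyBase's REJECT_* arguments on the external table it creates over the source files. A rough sketch of that underlying T-SQL, with hypothetical object names:

```sql
-- Roughly what the reject settings translate to on the PolyBase side.
CREATE EXTERNAL TABLE ext.StagedRows (
    id INT,
    amount DECIMAL(18, 2)
)
WITH (
    LOCATION = '/staged/',
    DATA_SOURCE = ext_adls_gen2,     -- hypothetical, e.g. the data source sketched earlier
    FILE_FORMAT = text_file_format,  -- hypothetical delimited-text file format
    REJECT_TYPE = VALUE,             -- maps to rejectType: Value | Percentage
    REJECT_VALUE = 0                 -- maps to rejectValue
);
```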
@@ -402,6 +402,9 @@ Using [PolyBase](https://docs.microsoft.com/sql/relational-databases/polybase/po
 * If your source data is in **Azure Blob, Azure Data Lake Storage Gen1, or Azure Data Lake Storage Gen2**, and the **format is PolyBase compatible**, you can use copy activity to directly invoke PolyBase and let Azure SQL Data Warehouse pull the data from the source. For details, see **[Direct copy by using PolyBase](#direct-copy-by-using-polybase)**.
 * If your source data store and format isn't natively supported by PolyBase, use the **[Staged copy by using PolyBase](#staged-copy-by-using-polybase)** feature instead. Staged copy also gives you better throughput: it automatically converts the data into a PolyBase-compatible format, stores it in Azure Blob storage, and then loads it into SQL Data Warehouse.
 
+>[!TIP]
+>Learn more in [Best practices for using PolyBase](#best-practices-for-using-polybase).
+
 ### Direct copy by using PolyBase
 
 SQL Data Warehouse PolyBase directly supports Azure Blob, Azure Data Lake Storage Gen1, and Azure Data Lake Storage Gen2. If your source data meets the criteria described in this section, use PolyBase to copy directly from the source data store to Azure SQL Data Warehouse. Otherwise, use [Staged copy by using PolyBase](#staged-copy-by-using-polybase).
@@ -415,9 +418,12 @@ If the requirements aren't met, Azure Data Factory checks the settings and autom
 
 | Supported source data store type | Supported source authentication type |
 |:--- |:--- |
-| [Azure Blob](connector-azure-blob-storage.md) | Account key authentication |
+| [Azure Blob](connector-azure-blob-storage.md) | Account key authentication, managed identity authentication |
 | [Azure Data Lake Storage Gen1](connector-azure-data-lake-store.md) | Service principal authentication |
-| [Azure Data Lake Storage Gen2](connector-azure-data-lake-storage.md) | Account key authentication |
+| [Azure Data Lake Storage Gen2](connector-azure-data-lake-storage.md) | Account key authentication, managed identity authentication |
+
+>[!IMPORTANT]
+>If your Azure Storage is configured with a VNet service endpoint, you must use managed identity authentication. See [Impact of using VNet service endpoints with Azure storage](https://docs.microsoft.com/en-us/azure/sql-database/sql-database-vnet-service-endpoint-rule-overview.md#impact-of-using-vnet-service-endpoints-with-azure-storage).
 
 2. The **source data format** is of **Parquet**, **ORC**, or **Delimited text**, with the following configurations:
@@ -512,10 +518,29 @@ To use PolyBase, the user that loads data into SQL Data Warehouse must have ["CO
 
 ### Row size and data type limits
 
-PolyBase loads are limited to rows smaller than 1 MB. They can't load to VARCHAR(MAX), NVARCHAR(MAX), or VARBINARY(MAX). For more information, see [SQL Data Warehouse service capacity limits](../sql-data-warehouse/sql-data-warehouse-service-capacity-limits.md#loads).
+PolyBase loads are limited to rows smaller than 1 MB. It cannot be used to load to VARCHAR(MAX), NVARCHAR(MAX), or VARBINARY(MAX) columns. For more information, see [SQL Data Warehouse service capacity limits](../sql-data-warehouse/sql-data-warehouse-service-capacity-limits.md#loads).
 
 When your source data has rows greater than 1 MB, you might want to vertically split the source tables into several small ones. Make sure that the largest size of each row doesn't exceed the limit. The smaller tables can then be loaded by using PolyBase and merged together in Azure SQL Data Warehouse.
 
+Alternatively, for data with such wide columns, you can load the data without PolyBase in ADF by turning off the "Allow PolyBase" setting.
+
+### PolyBase troubleshooting
+
+**Loading to a Decimal column**
+
+If your source data is in text format and contains empty values to be loaded into a SQL Data Warehouse Decimal column, you may hit the following error:
+
+```
+ErrorCode=FailedDbOperation, ......HadoopSqlException: Error converting data type VARCHAR to DECIMAL.....Detailed Message=Empty string can't be converted to DECIMAL.....
+```
+
+The solution is to clear the "**Use type default**" option (set it to false) in the copy activity sink, under PolyBase settings. "[USE_TYPE_DEFAULT](https://docs.microsoft.com/sql/t-sql/statements/create-external-file-format-transact-sql?view=azure-sqldw-latest#arguments)" is a PolyBase native configuration that specifies how to handle missing values in delimited text files when PolyBase retrieves data from the text file.
+
+**Others**
+
+For more known PolyBase issues, refer to [Troubleshooting Azure SQL Data Warehouse PolyBase load](../sql-data-warehouse/sql-data-warehouse-troubleshoot.md#polybase).
+
 ### SQL Data Warehouse resource class
 
 To achieve the best possible throughput, assign a larger resource class to the user that loads data into SQL Data Warehouse via PolyBase.
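
Circling back to the **Use type default** fix added above: in PolyBase's own DDL, the flag appears on the external file format. A hedged sketch of the false setting (the format name and field terminator are hypothetical):

```sql
-- With USE_TYPE_DEFAULT = FALSE, empty fields in the delimited file are kept
-- as NULL instead of being replaced by the column type's default, avoiding
-- the empty-string-to-DECIMAL conversion error shown above.
CREATE EXTERNAL FILE FORMAT text_file_format
WITH (
    FORMAT_TYPE = DELIMITEDTEXT,
    FORMAT_OPTIONS (
        FIELD_TERMINATOR = ',',
        USE_TYPE_DEFAULT = FALSE
    )
);
```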
@@ -555,6 +580,9 @@ Learn details from [source transformation](data-flow-source.md) and [sink transf
 
 When you copy data from or to Azure SQL Data Warehouse, the following mappings are used from Azure SQL Data Warehouse data types to Azure Data Factory interim data types. See [schema and data type mappings](copy-activity-schema-and-type-mapping.md) to learn how Copy Activity maps the source schema and data type to the sink.
 
+>[!TIP]
+>Refer to the [Table data types in Azure SQL Data Warehouse](../sql-data-warehouse/sql-data-warehouse-tables-data-types.md) article for SQL DW supported data types and workarounds for unsupported ones.
+
 | Azure SQL Data Warehouse data type | Data Factory interim data type |
 |:--- |:--- |
 | bigint | Int64 |
@@ -572,23 +600,18 @@ When you copy data from or to Azure SQL Data Warehouse, the following mappings a
 | int | Int32 |
 | money | Decimal |
 | nchar | String, Char[] |
-| ntext | String, Char[] |
 | numeric | Decimal |
 | nvarchar | String, Char[] |
 | real | Single |
 | rowversion | Byte[] |
 | smalldatetime | DateTime |
 | smallint | Int16 |
 | smallmoney | Decimal |
-| sql_variant | Object |
-| text | String, Char[] |
 | time | TimeSpan |
-| timestamp | Byte[] |
 | tinyint | Byte |
 | uniqueidentifier | Guid |
 | varbinary | Byte[] |
 | varchar | String, Char[] |
-| xml | Xml |
 
 ## Next steps
 For a list of data stores supported as sources and sinks by Copy Activity in Azure Data Factory, see [supported data stores and formats](copy-activity-overview.md#supported-data-stores-and-formats).

articles/data-factory/load-sap-bw-data.md

Lines changed: 4 additions & 4 deletions
@@ -10,7 +10,7 @@ ms.reviewer:
 ms.service: data-factory
 ms.workload: data-services
 ms.topic: conceptual
-ms.date: 03/19/2019
+ms.date: 05/22/2019
 ms.author: jingwang
 
 ---
@@ -175,9 +175,9 @@ On the data factory **Let's get started** page, select **Create pipeline from te
     "properties": {
         "sapOpenHubMaxRequestId": {
             "type": "string"
-        },
-        "type": "object"
-    }
+        }
+    },
+    "type": "object"
 }
 ```

articles/data-factory/supported-file-formats-and-compression-codecs.md

Lines changed: 2 additions & 2 deletions
@@ -8,7 +8,7 @@ ms.reviewer: craigg
 ms.service: data-factory
 ms.workload: data-services
 ms.topic: conceptual
-ms.date: 04/29/2019
+ms.date: 05/22/2019
 ms.author: jingwang
 
 ---
@@ -26,7 +26,7 @@ If you want to **copy files as-is** between file-based stores (binary copy), ski
 * [Avro format](#avro-format)
 
 > [!TIP]
-> Learn how copy activity maps your source data to sink from [Schema mapping in copy activity](copy-activity-schema-and-type-mapping.md), including how the metadata is determined based on your file format settings and tips on when to specify the [dataset `structure`](concepts-datasets-linked-services.md#dataset-structure-or-schema) section.
+> Learn how copy activity maps your source data to sink from [Schema mapping in copy activity](copy-activity-schema-and-type-mapping.md).
 
 ## Text format
