articles/data-factory/connector-azure-data-lake-storage.md (3 additions, 0 deletions)
@@ -164,6 +164,9 @@ To use managed identities for Azure resources authentication, follow these steps
>- **Data Factory UI** to test the connection and navigate folders during authoring.
>If you have concerns about granting permission at the account level, you can skip the test connection and input the path manually during authoring. Copy activity will still work as long as the managed identity is granted proper permission on the files to be copied.
+>[!IMPORTANT]
+>If you use PolyBase to load data from ADLS Gen2 into SQL DW with ADLS Gen2 managed identity authentication, make sure you also configure SQL DW to use MSI to access the ADLS Gen2 storage by following steps 1 to 3.b in [this guidance](../sql-database/sql-database-vnet-service-endpoint-rule-overview.md#impact-of-using-vnet-service-endpoints-with-azure-storage). If your ADLS Gen2 is configured with a VNet service endpoint, you must use managed identity authentication to load data from it with PolyBase.
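For reference, the SQL DW side of that setup comes down to a database scoped credential backed by the managed identity plus an external data source over the storage account. A minimal sketch following the referenced guidance; the account and container names are placeholders, not from this PR:

```sql
-- Run in the SQL DW database; names below are hypothetical.
CREATE MASTER KEY;  -- skip if the database already has one

-- The literal string 'Managed Service Identity' tells SQL DW to authenticate as its own MSI.
CREATE DATABASE SCOPED CREDENTIAL msi_cred
WITH IDENTITY = 'Managed Service Identity';

-- External data source over ADLS Gen2 that PolyBase loads will read through.
CREATE EXTERNAL DATA SOURCE ext_adls_gen2
WITH (
    TYPE = HADOOP,
    LOCATION = 'abfss://myfilesystem@mystorageaccount.dfs.core.windows.net',
    CREDENTIAL = msi_cred
);
```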
articles/data-factory/connector-azure-sql-data-warehouse.md (35 additions, 12 deletions)
@@ -12,7 +12,7 @@ ms.workload: data-services
ms.tgt_pltfrm: na
ms.topic: conceptual
-ms.date: 04/29/2019
+ms.date: 05/22/2019
ms.author: jingwang
---
@@ -146,7 +146,7 @@ To use service principal-based Azure AD application token authentication, follow
4. **Grant the service principal the needed permissions** as you normally do for SQL users or others. Run the following code (see the sketch after this hunk), or refer to more options [here](https://docs.microsoft.com/sql/relational-databases/system-stored-procedures/sp-addrolemember-transact-sql?view=sql-server-2017).
5. **Configure an Azure SQL Data Warehouse linked service** in Azure Data Factory.
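The code referenced in step 4 is elided from this hunk; for a service principal it typically creates a contained user from the Azure AD application and adds it to a database role. A hedged sketch, with the bracketed names as placeholders:

```sql
-- Connect to SQL DW as an Azure AD admin user first (sketch; names are placeholders).
CREATE USER [your application name] FROM EXTERNAL PROVIDER;
EXEC sp_addrolemember db_owner, [your application name];  -- or a narrower role that grants just what the copy needs
```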
@@ -196,7 +196,7 @@ To use managed identity authentication, follow these steps:
3. **Grant the Data Factory Managed Identity the needed permissions** as you normally do for SQL users and others. Run the following code, or refer to more options [here](https://docs.microsoft.com/sql/relational-databases/system-stored-procedures/sp-addrolemember-transact-sql?view=sql-server-2017).
```sql
-EXEC sp_addrolemember [role name], [your Data Factory name];
+EXEC sp_addrolemember db_owner, [your Data Factory name];
```
5. **Configure an Azure SQL Data Warehouse linked service** in Azure Data Factory.
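The grant in step 3 presumes the Data Factory managed identity already exists as a database user; the article's earlier steps, elided from this hunk, create it. A sketch of the full sequence, with the bracketed names as placeholders:

```sql
-- Run as the Azure AD admin of the SQL DW server (sketch; the elided steps are paraphrased).
CREATE USER [your Data Factory name] FROM EXTERNAL PROVIDER;  -- maps the factory's managed identity to a DB user
EXEC sp_addrolemember db_owner, [your Data Factory name];
```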
@@ -372,7 +372,7 @@ To copy data to Azure SQL Data Warehouse, set the sink type in Copy Activity to
| rejectValue | Specifies the number or percentage of rows that can be rejected before the query fails.<br/><br/>Learn more about PolyBase’s reject options in the Arguments section of [CREATE EXTERNAL TABLE (Transact-SQL)](https://msdn.microsoft.com/library/dn935021.aspx). <br/><br/>Allowed values are 0 (default), 1, 2, etc. | No |
| rejectType | Specifies whether the **rejectValue** option is a literal value or a percentage.<br/><br/>Allowed values are **Value** (default) and **Percentage**. | No |
| rejectSampleValue | Determines the number of rows to retrieve before PolyBase recalculates the percentage of rejected rows.<br/><br/>Allowed values are 1, 2, etc. | Yes, if the **rejectType** is **Percentage**. |
-| useTypeDefault | Specifies how to handle missing values in delimited text files when PolyBase retrieves data from the text file.<br/><br/>Learn more about this property from the Arguments section in [CREATE EXTERNAL FILE FORMAT (Transact-SQL)](https://msdn.microsoft.com/library/dn935026.aspx).<br/><br/>Allowed values are **True** and **False** (default). | No |
+| useTypeDefault | Specifies how to handle missing values in delimited text files when PolyBase retrieves data from the text file.<br/><br/>Learn more about this property from the Arguments section in [CREATE EXTERNAL FILE FORMAT (Transact-SQL)](https://msdn.microsoft.com/library/dn935026.aspx).<br/><br/>Allowed values are **True** and **False** (default).<br><br>**See [troubleshooting tips](#polybase-troubleshooting) related to this setting.** | No |
| writeBatchSize | Number of rows to insert into the SQL table **per batch**. Applies only when PolyBase isn't used.<br/><br/>The allowed value is **integer** (number of rows). By default, Data Factory dynamically determines the appropriate batch size based on the row size. | No |
| writeBatchTimeout | Wait time for the batch insert operation to finish before it times out. Applies only when PolyBase isn't used.<br/><br/>The allowed value is **timespan**. Example: “00:30:00” (30 minutes). | No |
| preCopyScript | Specify a SQL query for Copy Activity to run before writing data into Azure SQL Data Warehouse in each run. Use this property to clean up the preloaded data. | No |
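For orientation, the reject-related settings above map onto the arguments of the external table PolyBase uses under the covers. A sketch of the corresponding native DDL, assuming hypothetical object names:

```sql
-- Hypothetical names; shows where the ADF sink settings surface in PolyBase DDL.
CREATE EXTERNAL TABLE ext.StagingSales (
    SaleId INT,
    Amount DECIMAL(18, 2)
)
WITH (
    LOCATION = '/sales/',
    DATA_SOURCE = my_ext_datasource,
    FILE_FORMAT = my_file_format,
    REJECT_TYPE = PERCENTAGE,    -- rejectType
    REJECT_VALUE = 5,            -- rejectValue: fail once more than 5% of sampled rows reject
    REJECT_SAMPLE_VALUE = 1000   -- rejectSampleValue: recalculate the percentage every 1000 rows
);
```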
@@ -402,6 +402,9 @@ Using [PolyBase](https://docs.microsoft.com/sql/relational-databases/polybase/po
* If your source data is in **Azure Blob, Azure Data Lake Storage Gen1 or Azure Data Lake Storage Gen2**, and the **format is PolyBase compatible**, you can use copy activity to directly invoke PolyBase to let Azure SQL Data Warehouse pull the data from the source. For details, see **[Direct copy by using PolyBase](#direct-copy-by-using-polybase)**.
* If your source data store and format aren't natively supported by PolyBase, use the **[Staged copy by using PolyBase](#staged-copy-by-using-polybase)** feature instead. The staged copy feature also gives you better throughput: it automatically converts the data into a PolyBase-compatible format, stores it in Azure Blob storage, and then loads the data into SQL Data Warehouse.
+>[!TIP]
+>Learn more in [Best practices for using PolyBase](#best-practices-for-using-polybase).
+
### Direct copy by using PolyBase
SQL Data Warehouse PolyBase directly supports Azure Blob, Azure Data Lake Storage Gen1 and Azure Data Lake Storage Gen2. If your source data meets the criteria described in this section, use PolyBase to copy directly from the source data store to Azure SQL Data Warehouse. Otherwise, use [Staged copy by using PolyBase](#staged-copy-by-using-polybase).
@@ -415,9 +418,12 @@ If the requirements aren't met, Azure Data Factory checks the settings and autom
| Supported source data store type | Supported source authentication type |
|:--- |:--- |
| [Azure Data Lake Storage Gen1](connector-azure-data-lake-store.md) | Service principal authentication |
-| [Azure Data Lake Storage Gen2](connector-azure-data-lake-storage.md) | Account key authentication |
+| [Azure Data Lake Storage Gen2](connector-azure-data-lake-storage.md) | Account key authentication, managed identity authentication |
+
+>[!IMPORTANT]
+>If your Azure Storage is configured with a VNet service endpoint, you must use managed identity authentication. Refer to [Impact of using VNet service endpoints with Azure Storage](https://docs.microsoft.com/en-us/azure/sql-database/sql-database-vnet-service-endpoint-rule-overview.md#impact-of-using-vnet-service-endpoints-with-azure-storage).
2. The **source data format** is **Parquet**, **ORC**, or **Delimited text**, with the following configurations:
@@ -512,10 +518,29 @@ To use PolyBase, the user that loads data into SQL Data Warehouse must have ["CO
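The hunk header above cites the CONTROL permission requirement for the loading user. Granting it looks roughly like the following (a sketch; the database and user names are placeholders):

```sql
-- CONTROL on the database covers the objects PolyBase creates (external tables, file formats, data sources).
GRANT CONTROL ON DATABASE::[your_sql_dw_db] TO [loading_user];
```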
### Row size and data type limits
-PolyBase loads are limited to rows smaller than 1 MB. They can't load to VARCHR(MAX), NVARCHAR(MAX), or VARBINARY(MAX). For more information, see [SQL Data Warehouse service capacity limits](../sql-data-warehouse/sql-data-warehouse-service-capacity-limits.md#loads).
+PolyBase loads are limited to rows smaller than 1 MB. It cannot be used to load to VARCHAR(MAX), NVARCHAR(MAX), or VARBINARY(MAX). For more information, see [SQL Data Warehouse service capacity limits](../sql-data-warehouse/sql-data-warehouse-service-capacity-limits.md#loads).
When your source data has rows greater than 1 MB, you might want to vertically split the source tables into several small ones. Make sure that the largest size of each row doesn't exceed the limit. The smaller tables can then be loaded by using PolyBase and merged together in Azure SQL Data Warehouse.
+Alternatively, for data with such wide columns, you can load the data without PolyBase in ADF by turning off the "allow PolyBase" setting.
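To illustrate the vertical split suggested above: load each narrow table with PolyBase, then reassemble them in SQL DW with CTAS. A sketch with hypothetical table and column names:

```sql
-- Hypothetical: WidePart1(Id, ColA, ColB) and WidePart2(Id, HugeText) were loaded
-- separately with PolyBase, each staying under the 1-MB row limit.
CREATE TABLE dbo.WideTable
WITH (DISTRIBUTION = HASH(Id))
AS
SELECT p1.Id, p1.ColA, p1.ColB, p2.HugeText
FROM dbo.WidePart1 AS p1
JOIN dbo.WidePart2 AS p2
    ON p1.Id = p2.Id;
```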
+
+### PolyBase troubleshooting
+
+**Loading to a Decimal column**
+
+If your source data is in text format and contains empty values to be loaded into a SQL Data Warehouse Decimal column, you may hit the following error:
+
+```
+ErrorCode=FailedDbOperation, ......HadoopSqlException: Error converting data type VARCHAR to DECIMAL.....Detailed Message=Empty string can't be converted to DECIMAL.....
+```
+
+The solution is to unselect the "**Use type default**" option (set it to false) in the copy activity sink -> PolyBase settings. "[USE_TYPE_DEFAULT](https://docs.microsoft.com/sql/t-sql/statements/create-external-file-format-transact-sql?view=azure-sqldw-latest#arguments)" is a PolyBase native configuration that specifies how to handle missing values in delimited text files when PolyBase retrieves data from the text file.
+
+**Others**
+
+For more known PolyBase issues, refer to [Troubleshooting Azure SQL Data Warehouse PolyBase load](../sql-data-warehouse/sql-data-warehouse-troubleshoot.md#polybase).
+
### SQL Data Warehouse resource class
To achieve the best possible throughput, assign a larger resource class to the user that loads data into SQL Data Warehouse via PolyBase.
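Resource classes are assigned like database roles. For example, to move a hypothetical load user into a larger static resource class (a sketch, not part of this PR):

```sql
-- 'loaduser' is a placeholder; staticrc60 is one of SQL DW's larger static resource classes.
EXEC sp_addrolemember 'staticrc60', 'loaduser';
```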
@@ -555,6 +580,9 @@ Learn details from [source transformation](data-flow-source.md) and [sink transf
When you copy data from or to Azure SQL Data Warehouse, the following mappings are used from Azure SQL Data Warehouse data types to Azure Data Factory interim data types. See [schema and data type mappings](copy-activity-schema-and-type-mapping.md) to learn how Copy Activity maps the source schema and data type to the sink.
+>[!TIP]
+>Refer to the [Table data types in Azure SQL Data Warehouse](../sql-data-warehouse/sql-data-warehouse-tables-data-types.md) article for SQL DW supported data types and workarounds for unsupported ones.
+
| Azure SQL Data Warehouse data type | Data Factory interim data type |
|:--- |:--- |
| bigint | Int64 |
@@ -572,23 +600,18 @@ When you copy data from or to Azure SQL Data Warehouse, the following mappings a
| int | Int32 |
| money | Decimal |
| nchar | String, Char[]|
-| ntext | String, Char[] |
| numeric | Decimal |
| nvarchar | String, Char[]|
| real | Single |
| rowversion | Byte[]|
| smalldatetime | DateTime |
| smallint | Int16 |
| smallmoney | Decimal |
-| sql_variant | Object |
-| text | String, Char[] |
| time | TimeSpan |
-| timestamp | Byte[] |
| tinyint | Byte |
| uniqueidentifier | Guid |
| varbinary | Byte[]|
| varchar | String, Char[]|
-| xml | Xml |
## Next steps
For a list of data stores supported as sources and sinks by Copy Activity in Azure Data Factory, see [supported data stores and formats](copy-activity-overview.md#supported-data-stores-and-formats).
articles/data-factory/supported-file-formats-and-compression-codecs.md (2 additions, 2 deletions)
@@ -8,7 +8,7 @@ ms.reviewer: craigg
ms.service: data-factory
ms.workload: data-services
ms.topic: conceptual
-ms.date: 04/29/2019
+ms.date: 05/22/2019
ms.author: jingwang
---
@@ -26,7 +26,7 @@ If you want to **copy files as-is** between file-based stores (binary copy), ski
* [Avro format](#avro-format)
> [!TIP]
-> Learn how copy activity maps your source data to sink from [Schema mapping in copy activity](copy-activity-schema-and-type-mapping.md), including how the metadata is determined based on your file format settings and tips on when to specify the [dataset `structure`](concepts-datasets-linked-services.md#dataset-structure-or-schema) section.
+> Learn how copy activity maps your source data to sink from [Schema mapping in copy activity](copy-activity-schema-and-type-mapping.md).