
Commit 95a3490

Merge pull request #108955 from linda33wj/master
Update ADF copy activity docs
2 parents dcb46c5 + f0b211c

File tree

4 files changed (+15 lines, -18 lines)

articles/data-factory/connector-azure-sql-data-warehouse.md

Lines changed: 3 additions & 3 deletions
@@ -10,7 +10,7 @@ ms.service: data-factory
 ms.workload: data-services
 ms.topic: conceptual
 ms.custom: seo-lt-2019
-ms.date: 03/12/2020
+ms.date: 03/25/2020
 ---

 # Copy and transform data in Azure Synapse Analytics (formerly Azure SQL Data Warehouse) by using Azure Data Factory
@@ -439,7 +439,7 @@ If the requirements aren't met, Azure Data Factory checks the settings and autom

 3. If your source is a folder, `recursive` in copy activity must be set to true.

-4. `wildcardFolderPath` , `wildcardFilename`, `modifiedDateTimeStart`, and `modifiedDateTimeEnd` are not specified.
+4. `wildcardFolderPath`, `wildcardFilename`, `modifiedDateTimeStart`, `modifiedDateTimeEnd`, and `additionalColumns` are not specified.

 >[!NOTE]
 >If your source is a folder, note that PolyBase retrieves files from the folder and all of its subfolders, and it doesn't retrieve data from files whose names begin with an underline (_) or a period (.), as documented [here - LOCATION argument](https://docs.microsoft.com/sql/t-sql/statements/create-external-table-transact-sql?view=azure-sqldw-latest#arguments-2).
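For context, a minimal sketch of a copy activity that satisfies the direct-copy conditions above and opts into PolyBase. The dataset names are hypothetical, and the wildcard and `additionalColumns` properties are simply left unset on the source, per condition 4:

```json
{
    "name": "CopyToSynapseViaPolyBase",
    "type": "Copy",
    "inputs": [ { "referenceName": "BlobSourceDataset", "type": "DatasetReference" } ],
    "outputs": [ { "referenceName": "SynapseSinkDataset", "type": "DatasetReference" } ],
    "typeProperties": {
        "source": {
            "type": "DelimitedTextSource",
            "storeSettings": {
                "type": "AzureBlobStorageReadSettings",
                "recursive": true
            }
        },
        "sink": {
            "type": "SqlDWSink",
            "allowPolyBase": true,
            "polyBaseSettings": {
                "rejectType": "percentage",
                "rejectValue": 10.0,
                "useTypeDefault": true
            }
        }
    }
}
```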
@@ -619,7 +619,7 @@ Using COPY statement supports the following configuration:

 3. If your source is a folder, `recursive` in copy activity must be set to true.

-4. `wildcardFolderPath` , `wildcardFilename`, `modifiedDateTimeStart`, and `modifiedDateTimeEnd` are not specified.
+4. `wildcardFolderPath`, `wildcardFilename`, `modifiedDateTimeStart`, `modifiedDateTimeEnd`, and `additionalColumns` are not specified.

 The following COPY statement settings are supported under `allowCopyCommand` in copy activity:
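Similarly, a minimal sketch of a sink that opts into the COPY statement path; the `copyCommandSettings` values shown are illustrative assumptions, not requirements:

```json
"sink": {
    "type": "SqlDWSink",
    "allowCopyCommand": true,
    "copyCommandSettings": {
        "defaultValues": [
            {
                "columnName": "load_date",
                "defaultValue": "2020-03-25"
            }
        ],
        "additionalOptions": {
            "MAXERRORS": "10000"
        }
    }
}
```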

articles/data-factory/connector-teradata.md

Lines changed: 3 additions & 9 deletions
@@ -12,7 +12,7 @@ ms.workload: data-services


 ms.topic: conceptual
-ms.date: 10/24/2019
+ms.date: 03/25/2020
 ms.author: jingwang

 ---
@@ -39,17 +39,11 @@ Specifically, this Teradata connector supports:
 - Copying data by using **Basic** or **Windows** authentication.
 - Parallel copying from a Teradata source. See the [Parallel copy from Teradata](#parallel-copy-from-teradata) section for details.

-> [!NOTE]
->
-> After the release of self-hosted integration runtime v3.18, Azure Data Factory upgraded the Teradata connector. Any existing workload that uses the previous Teradata connector is still supported. For new workloads, however, it's a good idea to use the new one. Note that the new path requires a different set of linked service, dataset, and copy source. For configuration details, see the respective sections that follow.
-
 ## Prerequisites

 [!INCLUDE [data-factory-v2-integration-runtime-requirements](../../includes/data-factory-v2-integration-runtime-requirements.md)]

-The integration runtime provides a built-in Teradata driver, starting from version 3.18. You don't need to manually install any driver. The driver requires "Visual C++ Redistributable 2012 Update 4" on the self-hosted integration runtime machine. If you don't yet have it installed, download it from [here](https://www.microsoft.com/en-sg/download/details.aspx?id=30679).
-
-For any self-hosted integration runtime version earlier than 3.18, install the [.NET Data Provider for Teradata](https://go.microsoft.com/fwlink/?LinkId=278886), version 14 or later, on the integration runtime machine.
+If you use the self-hosted integration runtime, note that it provides a built-in Teradata driver starting from version 3.18. You don't need to manually install any driver. The driver requires "Visual C++ Redistributable 2012 Update 4" on the self-hosted integration runtime machine; if you don't have it installed yet, download it from [here](https://www.microsoft.com/en-sg/download/details.aspx?id=30679).

 ## Getting started

@@ -67,7 +61,7 @@ The Teradata linked service supports the following properties:
 | connectionString | Specifies the information needed to connect to the Teradata instance. Refer to the following samples.<br/>You can also put a password in Azure Key Vault and pull the `password` configuration out of the connection string. Refer to [Store credentials in Azure Key Vault](store-credentials-in-key-vault.md) for more details. | Yes |
 | username | Specify a user name to connect to Teradata. Applies when you are using Windows authentication. | No |
 | password | Specify a password for the user account you specified for the user name. You can also choose to [reference a secret stored in Azure Key Vault](store-credentials-in-key-vault.md).<br>Applies when you are using Windows authentication, or when referencing a password in Key Vault for basic authentication. | No |
-| connectVia | The [Integration Runtime](concepts-integration-runtime.md) to be used to connect to the data store. Learn more from [Prerequisites](#prerequisites) section. If not specified, it uses the default Azure Integration Runtime. |Yes |
+| connectVia | The [Integration Runtime](concepts-integration-runtime.md) to be used to connect to the data store. Learn more from the [Prerequisites](#prerequisites) section. If not specified, it uses the default Azure Integration Runtime. | No |

 More connection properties that you can set in the connection string, per your case:
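For reference, a minimal sketch of a Teradata linked service that routes through a self-hosted integration runtime via `connectVia`; all names and placeholder values are hypothetical:

```json
{
    "name": "TeradataLinkedService",
    "properties": {
        "type": "Teradata",
        "typeProperties": {
            "connectionString": "DBCName=<server name or IP>;Uid=<user name>;Pwd=<password>"
        },
        "connectVia": {
            "referenceName": "<name of self-hosted integration runtime>",
            "type": "IntegrationRuntimeReference"
        }
    }
}
```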

articles/data-factory/copy-activity-overview.md

Lines changed: 8 additions & 5 deletions
@@ -10,7 +10,7 @@ ms.reviewer: douglasl
 ms.service: data-factory
 ms.workload: data-services
 ms.topic: conceptual
-ms.date: 03/24/2020
+ms.date: 03/25/2020
 ms.author: jingwang

 ---
@@ -175,10 +175,6 @@ While copying data from source to sink, in scenarios like data lake migration, y

 See [Schema and data type mapping](copy-activity-schema-and-type-mapping.md) for information about how the Copy activity maps your source data to your sink.

-## Fault tolerance
-
-By default, the Copy activity stops copying data and returns a failure when source data rows are incompatible with sink data rows. To make the copy succeed, you can configure the Copy activity to skip and log the incompatible rows and copy only the compatible data. See [Copy activity fault tolerance](copy-activity-fault-tolerance.md) for details.
-
 ## Add additional columns during copy

 In addition to copying data from the source data store to the sink, you can also configure the copy activity to add additional data columns to copy along to the sink. For example:
@@ -191,6 +187,9 @@ You can find the following configuration on copy activity source tab:

 ![Add additional columns in copy activity](./media/copy-activity-overview/copy-activity-add-additional-columns.png)

+>[!TIP]
+>This feature works with the latest dataset model. If you don't see this option in the UI, try creating a new dataset.
+
 To configure it programmatically, add the `additionalColumns` property in your copy activity source:

 | Property | Description | Required |
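The article's full example is elided in this hunk; as an assumed illustration only, an `additionalColumns` source could look like the following, where the column names are hypothetical and `$$FILEPATH` is the reserved value for the source file path:

```json
"source": {
    "type": "DelimitedTextSource",
    "additionalColumns": [
        {
            "name": "sourceFilePath",
            "value": "$$FILEPATH"
        },
        {
            "name": "environment",
            "value": "production"
        }
    ]
}
```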
@@ -236,6 +235,10 @@ To configure it programmatically, add the `additionalColumns` property in your c
 ]
 ```

+## Fault tolerance
+
+By default, the Copy activity stops copying data and returns a failure when source data rows are incompatible with sink data rows. To make the copy succeed, you can configure the Copy activity to skip and log the incompatible rows and copy only the compatible data. See [Copy activity fault tolerance](copy-activity-fault-tolerance.md) for details.
+
 ## Next steps
 See the following quickstarts, tutorials, and samples:
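The relocated section points to the fault-tolerance settings on the copy activity. A minimal sketch of what enabling them can look like, with a hypothetical logging linked service and path:

```json
"typeProperties": {
    "source": { "type": "SqlSource" },
    "sink": { "type": "SqlSink" },
    "enableSkipIncompatibleRow": true,
    "redirectIncompatibleRowSettings": {
        "linkedServiceName": {
            "referenceName": "AzureStorageLinkedService",
            "type": "LinkedServiceReference"
        },
        "path": "redirectcontainer/erroroutput"
    }
}
```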

articles/data-factory/copy-activity-performance-features.md

Lines changed: 1 addition & 1 deletion
@@ -87,7 +87,7 @@ The following table lists the parallel copy behavior:
 | Between file stores | `parallelCopies` determines the parallelism **at the file level**. The chunking within each file happens underneath automatically and transparently. It's designed to use the best suitable chunk size for a given data store type to load data in parallel.<br/><br/>The actual number of parallel copies the copy activity uses at run time is no more than the number of files you have. If the copy behavior is **mergeFile** into a file sink, the copy activity can't take advantage of file-level parallelism. |
 | From file store to non-file store | - When copying data into Azure SQL Database or Azure Cosmos DB, the default parallel copy also depends on the sink tier (number of DTUs/RUs).<br>- When copying data into Azure Table, the default parallel copy is 4. |
 | From non-file store to file store | - When copying data from a partition-option-enabled data store (including [Oracle](connector-oracle.md#oracle-as-source), [Netezza](connector-netezza.md#netezza-as-source), [Teradata](connector-teradata.md#teradata-as-source), [SAP HANA](connector-sap-hana.md#sap-hana-as-source), [SAP Table](connector-sap-table.md#sap-table-as-source), and [SAP Open Hub](connector-sap-business-warehouse-open-hub.md#sap-bw-open-hub-as-source)), the default parallel copy is 4. The actual number of parallel copies the copy activity uses at run time is no more than the number of data partitions you have. When you use the self-hosted integration runtime and copy to Azure Blob/ADLS Gen2, note that the max effective parallel copy is 4 or 5 per IR node.<br>- For other scenarios, parallel copy doesn't take effect. Even if parallelism is specified, it's not applied. |
-| Between non-file stores | - When copying data into Azure SQL Database or Azure Cosmos DB, default parallel copy also depend on the sink tier (number of DTUs/RUs).<br/>- When copying data into Azure Table, default parallel copy is 4. |
+| Between non-file stores | - When copying data into Azure SQL Database or Azure Cosmos DB, the default parallel copy also depends on the sink tier (number of DTUs/RUs).<br/>- When copying data from a partition-option-enabled data store (including [Oracle](connector-oracle.md#oracle-as-source), [Netezza](connector-netezza.md#netezza-as-source), [Teradata](connector-teradata.md#teradata-as-source), [SAP HANA](connector-sap-hana.md#sap-hana-as-source), [SAP Table](connector-sap-table.md#sap-table-as-source), and [SAP Open Hub](connector-sap-business-warehouse-open-hub.md#sap-bw-open-hub-as-source)), the default parallel copy is 4.<br>- When copying data into Azure Table, the default parallel copy is 4. |

 To control the load on machines that host your data stores, or to tune copy performance, you can override the default value and specify a value for the `parallelCopies` property. The value must be an integer greater than or equal to 1. At run time, for the best performance, the copy activity uses a value that is less than or equal to the value that you set.
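To make the override concrete, a minimal sketch of setting `parallelCopies` on a copy activity; the value 8 is an arbitrary example, and the service may use fewer copies at run time, as noted above:

```json
"activities": [
    {
        "name": "Sample copy activity",
        "type": "Copy",
        "inputs": [ { "referenceName": "SourceDataset", "type": "DatasetReference" } ],
        "outputs": [ { "referenceName": "SinkDataset", "type": "DatasetReference" } ],
        "typeProperties": {
            "source": { "type": "BlobSource" },
            "sink": { "type": "BlobSink" },
            "parallelCopies": 8
        }
    }
]
```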
