
Commit cdf1f85

Acrolinx changes
1 parent 34bb680 commit cdf1f85

File tree

1 file changed: +18 -18 lines changed


articles/data-explorer/data-factory-integration.md

Lines changed: 18 additions & 18 deletions
@@ -22,15 +22,15 @@ Various integrations with Azure Data Factory are available for Azure Data Explor

### Copy activity

-Azure Data Factory Copy activity is used to transfer data between data stores. Azure Data Explorer is supported as a source, where data is copied from Azure Data Explorer to any supported data store, and a sink, where data is copied from any supported data store to Azure Data Explorer. For more information see [copy data to or from Azure Data Explorer using Azure Data Factory](/azure/data-factory/connector-azure-data-explorer). and for a detailed walk-through see [load data from Azure Data Factory into Azure Data Explorer](data-factory-load-data.md).
-Azure Data Explorer is supported by Azure IR (Integration Runtime), used when data is copied within Azure, and self-hosted IR, used when copying data from/to data stores located on-premises or in a network with access control, such as an Azure Virtual Network. For more information see [which IR to use](/azure/data-factory/concepts-integration-runtime#determining-which-ir-to-use)
+Azure Data Factory Copy activity is used to transfer data between data stores. Azure Data Explorer is supported as a source, where data is copied from Azure Data Explorer to any supported data store, and as a sink, where data is copied from any supported data store to Azure Data Explorer. For more information, see [copy data to or from Azure Data Explorer using Azure Data Factory](/azure/data-factory/connector-azure-data-explorer). For a detailed walk-through, see [load data from Azure Data Factory into Azure Data Explorer](data-factory-load-data.md).
+Azure Data Explorer is supported by Azure IR (Integration Runtime), used when data is copied within Azure, and by self-hosted IR, used when copying data from or to data stores located on-premises or in a network with access control, such as an Azure Virtual Network. For more information, see [which IR to use](/azure/data-factory/concepts-integration-runtime#determining-which-ir-to-use).

> [!TIP]
> When using the copy activity and creating a **Linked Service** or a **Dataset**, select the data store **Azure Data Explorer (Kusto)** and not the old data store **Kusto**.
### Lookup activity

-The Lookup activity is used for executing queries on Azure Data Explorer. The result of the query will be returned as the output of the Lookup activity, and therefore, can be used in the next activity in the pipeline as described in the [ADF Lookup documentation](/azure/data-factory/control-flow-lookup-activity#use-the-lookup-activity-result-in-a-subsequent-activity).
-In addition to the response size limit of 5,000 rows and 2MB, the activity also has a query timeout limit of 1 hour.
+The Lookup activity is used for executing queries on Azure Data Explorer. The result of the query will be returned as the output of the Lookup activity, and can be used in the next activity in the pipeline as described in the [ADF Lookup documentation](/azure/data-factory/control-flow-lookup-activity#use-the-lookup-activity-result-in-a-subsequent-activity).
+In addition to the response size limit of 5,000 rows and 2 MB, the activity also has a query timeout limit of 1 hour.
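
To make the Lookup limits concrete, here's a minimal sketch of the kind of query a Lookup activity might run against Azure Data Explorer. The `StormEvents` table and the aggregation are assumptions for illustration; the final `top` keeps the result well under the 5,000-row / 2 MB response limit.

```kusto
// Hypothetical Lookup activity query. StormEvents is an assumed sample table;
// the result is kept small so it fits the Lookup response limit (5,000 rows, 2 MB).
StormEvents
| where StartTime > ago(1d)
| summarize EventCount = count() by State
| top 100 by EventCount desc
```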

### Command activity

@@ -47,7 +47,7 @@ This section will assist you in selecting the correct activity for your data cop

When copying data from or to Azure Data Explorer, there are two available options in Azure Data Factory:
* Copy activity.
-* Azure Data Explorer Command activity which executes one of the control commands that transfer data in Azure Data Explorer.
+* Azure Data Explorer Command activity, which executes one of the control commands that transfer data in Azure Data Explorer.

### Copying data from Azure Data Explorer

@@ -57,7 +57,7 @@ See the following table for a comparison of the Copy activity and `.export` comm

| | Copy activity | .export command |
|---|---|---|
-| **Flow description** | ADF executes a query on Kusto, processes the result, and sends it to the target data store. <br>(**ADX > ADF > sink data store**) | ADF sends an .export control command to Azure Data Explorer which executes the command, and sends the data directly to the target data store. <br>(**ADX > sink data store**) |
+| **Flow description** | ADF executes a query on Kusto, processes the result, and sends it to the target data store. <br>(**ADX > ADF > sink data store**) | ADF sends an `.export` control command to Azure Data Explorer, which executes the command and sends the data directly to the target data store. <br>(**ADX > sink data store**) |
| **Supported target data stores** | A wide variety of [supported data stores](/azure/data-factory/copy-activity-overview#supported-data-stores-and-formats) | ADLSv2, Azure Blob, SQL Database |
| **Performance** | Centralized | <ul><li>Distributed (default), exporting data from multiple nodes concurrently</li><li>Faster and COGS efficient.</li></ul> |
| **Server limits** | Query limits can be extended/disabled. By default, ADF queries contain: <ul><li>Size limit of 500,000 records or 64 MB.</li><li>Time limit of 10 minutes.</li><li>`noTruncation` set to false.</li></ul> | By default, extends or disables the query limits: <ul><li>Size limits are disabled.</li><li>Server timeout is extended to 1 hour.</li><li>`MaxMemoryConsumptionPerIterator` and `MaxMemoryConsumptionPerQueryPerNode` are extended to max (5 GB, TotalPhysicalMemory/2).</li></ul>
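
As a rough illustration of the `.export` flow compared in the table above, here's a minimal sketch of an export to Azure Blob Storage; the storage account, container, key placeholder, and source query are assumptions, not the exact command ADF issues.

```kusto
// Hypothetical .export command (ADX > sink data store). The storage URI,
// account key placeholder, and StormEvents query are illustrative only.
.export async compressed to csv (
    h@"https://mystorageaccount.blob.core.windows.net/exportcontainer;<storage-account-key>"
) with (
    namePrefix = "export",
    includeHeaders = "all"
)
<| StormEvents
   | where StartTime > ago(7d)
```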
@@ -67,16 +67,16 @@ See the following table for a comparison of the Copy activity and `.export` comm

### Copying data to Azure Data Explorer

-You can copy data to Azure Data Explorer using the copy activity or ingestion commands such as [ingest from query](/azure/kusto/management/data-ingestion/ingest-from-query) (`.set-or-append`, `.set-or-replace`, `.set`, `.replace)` and [ingest from storage](/azure/kusto/management/data-ingestion/ingest-from-storage) (`.ingest`).
+You can copy data to Azure Data Explorer using the copy activity or ingestion commands such as [ingest from query](/azure/kusto/management/data-ingestion/ingest-from-query) (`.set-or-append`, `.set-or-replace`, `.set`, `.replace`), and [ingest from storage](/azure/kusto/management/data-ingestion/ingest-from-storage) (`.ingest`).

See the following table for a comparison of the Copy activity, and ingestion commands for copying data to Azure Data Explorer.

| | Copy activity | Ingest from query<br> `.set-or-append` / `.set-or-replace` / `.set` / `.replace` | Ingest from storage <br> `.ingest` |
|---|---|---|---|
-| **Flow description** | ADF gets the data from the source data store, converts it into a tabular format, and performs the required schema mapping changes. ADF then uploads the data to Azure blobs, splits it into chunks, then downloads the blobs to ingest them into the ADX table. <br> (**Source data store > ADF > Azure blobs > ADX**) | These commands can execute a query or a `.show` command, and ingest the results of the query into a table (**ADX > ADX**). | This command ingests data into a table by "pulling" the data from one or more cloud storage artifacts. |
+| **Flow description** | ADF gets the data from the source data store, converts it into a tabular format, and does the required schema-mapping changes. ADF then uploads the data to Azure blobs, splits it into chunks, then downloads the blobs to ingest them into the ADX table. <br> (**Source data store > ADF > Azure blobs > ADX**) | These commands can execute a query or a `.show` command, and ingest the results of the query into a table (**ADX > ADX**). | This command ingests data into a table by "pulling" the data from one or more cloud storage artifacts. |
| **Supported source data stores** | [variety of options](/azure/data-factory/copy-activity-overview#supported-data-stores-and-formats) | ADLS Gen 2, Azure Blob, SQL (using the sql_request plugin), Cosmos (using the cosmosdb_sql_request plugin), and any other data store that provides HTTP or Python APIs. | Filesystem, Azure Blob Storage, ADLS Gen 1, ADLS Gen 2 |
-| **Performance** | Ingestions are queued and managed, which ensures small size ingestions and assures high availability by providing load balancing, retries and error handling. | <ul><li>Those commands weren't designed for high volume data importing.</li><li>Works as expected and cheaper. But for production scenarios and when traffic rates and data sizes are large, use the Copy activity.</li></ul>
-| **Server Limits** | <ul><li>No size limit.</li><li>Max timeout limit: 1 hour per ingested blob. |<ul><li>There is only a size limit on the query part, which can be skipped by specifying `noTruncation=true`.</li><li>Max timeout limit: 1 hour.</li></ul> | <ul><li>No size limit.</li><li>Max timeout limit: 1 hour.</li></ul>|
+| **Performance** | Ingestions are queued and managed, which ensures small-size ingestions and assures high availability by providing load balancing, retries, and error handling. | <ul><li>These commands weren't designed for high-volume data importing.</li><li>Works as expected and is cheaper, but for production scenarios and when traffic rates and data sizes are large, use the Copy activity.</li></ul>
+| **Server Limits** | <ul><li>No size limit.</li><li>Max timeout limit: 1 hour per ingested blob.</li></ul> |<ul><li>There's only a size limit on the query part, which can be skipped by specifying `noTruncation=true`.</li><li>Max timeout limit: 1 hour.</li></ul> | <ul><li>No size limit.</li><li>Max timeout limit: 1 hour.</li></ul>|
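
To make the two command flavors in the table concrete, here's a minimal sketch of each; the table names, query, blob URL, and SAS placeholder are assumptions for illustration.

```kusto
// Hypothetical ingest-from-query command (ADX > ADX): append the query result
// to an assumed target table named DailyEventCounts.
.set-or-append DailyEventCounts <|
    StormEvents
    | where StartTime > ago(1d)
    | summarize EventCount = count() by State

// Hypothetical ingest-from-storage command, run separately: pull a blob into an
// assumed table named RawEvents (the URL and SAS token are placeholders).
.ingest into table RawEvents (
    h'https://mystorageaccount.blob.core.windows.net/container/raw-events.csv;<SAS-token>'
) with (format = 'csv', ignoreFirstRecord = true)
```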

> [!TIP]
> * When copying data from ADF to Azure Data Explorer use the `ingest from query` commands.
@@ -89,13 +89,13 @@ The following table lists the required permissions for various steps in the inte
| Step | Operation | Minimum level of permissions | Notes |
|---|---|---|---|
| **Create a Linked Service** | Database navigation | *database viewer* <br>The logged-in user using ADF should be authorized to read database metadata. | User can provide the database name manually. |
-| | Test Connection | *database monitor* or *table ingestor* <br>Service principal should be authorized to execute database level `.show` commands or table level ingestion. | <ul><li>TestConnection verifies the connection to the cluster, and not to the database. It can succeed even if the database doesn’t exists.</li><li>Table admin permissions are not sufficient.</li></ul>|
+| | Test Connection | *database monitor* or *table ingestor* <br>Service principal should be authorized to execute database level `.show` commands or table level ingestion. | <ul><li>TestConnection verifies the connection to the cluster, and not to the database. It can succeed even if the database doesn’t exist.</li><li>Table admin permissions aren't sufficient.</li></ul>|
| **Creating a Dataset** | Table navigation | *database monitor* <br>The logged in user using ADF, must be authorized to execute database level `.show` commands. | User can provide table name manually.|
| **Creating a Dataset** or **Copy Activity** | Preview data | *database viewer* <br>Service principal must be authorized to read database metadata. | |
| | Import schema | *database viewer* <br>Service principal must be authorized to read database metadata. | When ADX is the source of a tabular-to-tabular copy, ADF will import schema automatically, even if the user didn’t import schema explicitly. |
| **ADX as Sink** | Create a by-name column mapping | *database monitor* <br>Service principal must be authorized to execute database level `.show` commands. | <ul><li>All mandatory operations will work with *table ingestor*.</li><li> Some optional operations can fail.</li></ul> |
-| | <ul><li>Create a CSV mapping on the table</li><li>Drop the mapping</li></ul>| *table ingestor* or *database admin* <br>Service principal must be authorized to perform changes to a table. | |
-| | Ingest data | *table ingestor* or *database admin* <br>Service principal must be authorized to perform changes to a table. | |
+| | <ul><li>Create a CSV mapping on the table</li><li>Drop the mapping</li></ul>| *table ingestor* or *database admin* <br>Service principal must be authorized to make changes to a table. | |
+| | Ingest data | *table ingestor* or *database admin* <br>Service principal must be authorized to make changes to a table. | |
| **ADX as source** | Execute query | *database viewer* <br>Service principal must be authorized to read database metadata. | |
| **Kusto command** | | According to the permissions level of each command. |
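
As an example of granting one of the roles in the table, here's a hedged sketch of assigning *database viewer* to the service principal used by the ADF linked service; the database name, application ID, and tenant ID are placeholders.

```kusto
// Hypothetical role assignment for the ADF service principal.
// Replace the database name, app ID, and tenant ID with your own values.
.add database MyDatabase viewers ('aadapp=<application-id>;<tenant-id>') 'ADF service principal'
```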

@@ -111,18 +111,18 @@ This section addresses the use of copy activity where Azure Data Explorer is the
| **Number of DIUs** | 1 VM for every 4 DIUs used by ADF. <br>Increasing the DIUs will help only if your source is a file-based store with multiple files. Each VM will then process a different file in parallel. Therefore, copying a single large file will have a higher latency than copying multiple smaller files.|
|**Amount and SKU of your ADX cluster** | High number of ADX nodes will boost ingestion processing time.|
| Parallelism | To copy a very large amount of data from a database, partition your data and then use a ForEach loop that copies each partition in parallel or use the [Bulk Copy from Database to Azure Data Explorer Template](data-factory-template.md).
-Note: **Settings** > **Degree of Parallelism** in the Copy activity is not relevant to ADX.
+Note: **Settings** > **Degree of Parallelism** in the Copy activity isn't relevant to ADX.
| **Data processing complexity** | Latency varies according to source file format, column mapping, and compression.|
-| **The VM running your integration runtime** | <ul><li>For Azure copy, those are ADF VMs and machine SKUs can't be changed.</li><li> For on-prem to Azure copy, ascertain that the VM hosting your self-hosted IR is strong enough.</li></ul>|
+| **The VM running your integration runtime** | <ul><li>For Azure copy, ADF VMs are used and their machine SKUs can't be changed.</li><li> For on-prem to Azure copy, make sure that the VM hosting your self-hosted IR is strong enough.</li></ul>|

## Monitoring activity progress

* When monitoring the activity progress, the *Data written* property may be much larger than the *Data read* property
because *Data read* is calculated according to the binary file size, while *Data written* is calculated according to the in-memory size, after data is de-serialized and decompressed.

-* When monitoring the activity progress, you can see that data is written to the Azure Data Explorer sink. When querying the Azure Data Explorer table, you see that data hasn't arrived. This is due to the fact that there are two stages when copying to Azure Data Explorer.
-* First stage reads the source data, splits it to 900 MB chunks, and uploads each chunk to an Azure Blob. The first stage is seen by the ADF activity progress view.
-* The second stage begins once all the data is uploaded to Azure Blobs. The Azure Data Explorer engine nodes are begin to download the blobs and ingest the data into the sink table. The data can be then be seen in your Azure Data Explorer table.
+* When monitoring the activity progress, you can see that data is written to the Azure Data Explorer sink. When querying the Azure Data Explorer table, you see that data hasn't arrived. This is because there are two stages when copying to Azure Data Explorer.
+* The first stage reads the source data, splits it into 900-MB chunks, and uploads each chunk to an Azure Blob. The first stage is seen by the ADF activity progress view.
+* The second stage begins once all the data is uploaded to Azure Blobs. The Azure Data Explorer engine nodes download the blobs and ingest the data into the sink table. The data is then seen in your Azure Data Explorer table.
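
If the second stage seems slow, one way to check on it from the Azure Data Explorer side is sketched below; the one-hour window and the use of `.show ingestion failures` as a health check are assumptions about how you might monitor, not part of the ADF flow.

```kusto
// Hypothetical check while waiting for the second (ingestion) stage:
// surface any recent ingestion failures reported by the cluster.
.show ingestion failures
| where FailedOn > ago(1h)
```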

## Next steps
