
Commit cceb892

Merge branch 'master' of https://github.com/MicrosoftDocs/azure-docs-pr into heidist-api
2 parents fb8472e + ae0c234

12 files changed: +610 −350 lines

articles/data-factory/TOC.yml

Lines changed: 12 additions & 2 deletions
@@ -392,14 +392,22 @@
       items:
       - name: Copy data using Copy Activity
         href: copy-activity-overview.md
+      - name: Monitor copy activity
+        href: copy-activity-monitoring.md
       - name: Delete files using Delete Activity
         href: delete-activity.md
       - name: Copy Data tool
         href: copy-data-tool.md
       - name: Format and compression support
         href: supported-file-formats-and-compression-codecs.md
-      - name: Performance and tuning
-        href: copy-activity-performance.md
+      - name: Copy activity performance
+        items:
+        - name: Performance and scalability guide
+          href: copy-activity-performance.md
+        - name: Troubleshoot performance
+          href: copy-activity-performance-troubleshooting.md
+        - name: Performance features
+          href: copy-activity-performance-features.md
       - name: Preserve metadata and ACLs
         href: copy-activity-preserve-metadata.md
       - name: Schema and type mapping
@@ -689,6 +697,8 @@
         href: frequently-asked-questions.md
       - name: Service updates
         href: https://azure.microsoft.com/updates/?product=data-factory
+      - name: Blog
+        href: https://techcommunity.microsoft.com/t5/azure-data-factory/bg-p/AzureDataFactoryBlog#
       - name: Ask a question - MSDN forum
         href: https://social.msdn.microsoft.com/Forums/en-US/home?forum=AzureDataFactory&filter=alltypes&sort=lastpostdesc
       - name: Ask a question - Stack Overflow
articles/data-factory/copy-activity-monitoring.md

Lines changed: 152 additions & 0 deletions
@@ -0,0 +1,152 @@
---
title: Monitor copy activity
description: Learn about how to monitor the copy activity execution in Azure Data Factory.
services: data-factory
documentationcenter: ''
author: linda33wj
manager: shwang
ms.reviewer: douglasl

ms.service: data-factory
ms.workload: data-services
ms.topic: conceptual
ms.date: 03/11/2020
ms.author: jingwang

---
# Monitor copy activity

This article outlines how to monitor the copy activity execution in Azure Data Factory. It builds on the [copy activity overview](copy-activity-overview.md) article, which presents a general overview of the copy activity.

## Monitor visually

Once you've created and published a pipeline in Azure Data Factory, you can associate it with a trigger or manually kick off an ad hoc run. You can monitor all of your pipeline runs natively in the Azure Data Factory user experience. Learn about Azure Data Factory monitoring in general from [Visually monitor Azure Data Factory](monitor-visually.md).

To monitor the copy activity run, go to your data factory's **Author & Monitor** UI. On the **Monitor** tab, you see a list of pipeline runs. Click the **pipeline name** link to access the list of activity runs in the pipeline run.

![Monitor copy activity run](./media/copy-activity-overview/monitor-pipeline-run.png)

At this level, you can see links to the copy activity's input, output, and errors (if the copy activity run fails), as well as statistics like duration and status. Clicking the **Details** button (eyeglasses) next to the copy activity name gives you detailed information on the copy activity execution.

![Monitor copy activity run](./media/copy-activity-overview/monitor-copy-activity-run.png)

In this graphical monitoring view, Azure Data Factory presents the copy activity execution information, including the volume of data read and written, the number of files or rows copied from source to sink, throughput, the configurations applied for your copy scenario, the steps the copy activity goes through with corresponding durations and details, and more. Refer to [this table](#monitor-programmatically) for each possible metric and its detailed description.

In some scenarios, when you run a copy activity in Data Factory, you'll see **"Performance tuning tips"** at the top of the copy activity monitoring view, as shown in the example. The tips tell you the bottleneck that ADF identified for the specific copy run, along with suggestions on what to change to boost copy throughput. Learn more about [auto performance tuning tips](copy-activity-performance-troubleshooting.md#performance-tuning-tips).

The **execution details and durations** section at the bottom describes the key steps your copy activity goes through, which is especially useful for troubleshooting copy performance. The bottleneck of your copy run is the step with the longest duration. Refer to [Troubleshoot copy activity performance](copy-activity-performance-troubleshooting.md) for what each stage represents and the detailed troubleshooting guidance.

**Example: Copy from Amazon S3 to Azure Data Lake Storage Gen2**

![Monitor copy activity run details](./media/copy-activity-overview/monitor-copy-activity-run-details.png)

## Monitor programmatically

Copy activity execution details and performance characteristics are also returned in the **Copy Activity run result** > **Output** section, which is used to render the UI monitoring view. For information about how to monitor activity runs programmatically in general, see [Programmatically monitor an Azure data factory](monitor-programmatically.md).

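As a rough sketch of what that looks like in code (not part of the original article), the following assumes the track 2 `azure-mgmt-datafactory` Python SDK; the subscription, resource group, factory, and pipeline run ID values are placeholders, and exact signatures can vary by SDK version:

```python
# Hedged sketch: query the activity runs of a pipeline run and read the
# copy activity output. Placeholder values are in <angle brackets>.
from datetime import datetime, timedelta

from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient
from azure.mgmt.datafactory.models import RunFilterParameters

client = DataFactoryManagementClient(DefaultAzureCredential(), "<subscription-id>")

# Activity run queries require a time window; look back one day here.
filters = RunFilterParameters(
    last_updated_after=datetime.utcnow() - timedelta(days=1),
    last_updated_before=datetime.utcnow(),
)
response = client.activity_runs.query_by_pipeline_run(
    "<resource-group>", "<factory-name>", "<pipeline-run-id>", filters
)

for run in response.value:
    if run.activity_type == "Copy":
        output = run.output  # the same JSON payload documented below
        print(run.activity_name, run.status)
        print("dataRead (bytes):", output.get("dataRead"))
        print("throughput (KBps):", output.get("throughput"))
```

Following is a complete list of properties that might be returned in that **Output** section. You'll see only the properties that are applicable to your copy scenario.
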
| Property name | Description | Unit in output |
|:--- |:--- |:--- |
| dataRead | The actual amount of data read from the source. | Int64 value, in bytes |
| dataWritten | The actual amount of data written/committed to the sink. The size may be different from the `dataRead` size, as it relates to how each data store stores the data. | Int64 value, in bytes |
| filesRead | The number of files read from the file-based source. | Int64 value (no unit) |
| filesWritten | The number of files written/committed to the file-based sink. | Int64 value (no unit) |
| sourcePeakConnections | Peak number of concurrent connections established to the source data store during the copy activity run. | Int64 value (no unit) |
| sinkPeakConnections | Peak number of concurrent connections established to the sink data store during the copy activity run. | Int64 value (no unit) |
| rowsRead | Number of rows read from the source (not applicable for binary copy). | Int64 value (no unit) |
| rowsCopied | Number of rows copied to the sink (not applicable for binary copy). | Int64 value (no unit) |
| rowsSkipped | Number of incompatible rows that were skipped. You can enable incompatible rows to be skipped by setting `enableSkipIncompatibleRow` to true. | Int64 value (no unit) |
| copyDuration | Duration of the copy run. | Int32 value, in seconds |
| throughput | Rate of data transfer. | Floating point number, in KBps |
| sqlDwPolyBase | Whether PolyBase is used when data is copied into SQL Data Warehouse. | Boolean |
| redshiftUnload | Whether UNLOAD is used when data is copied from Redshift. | Boolean |
| hdfsDistcp | Whether DistCp is used when data is copied from HDFS. | Boolean |
| effectiveIntegrationRuntime | The integration runtime (IR) or runtimes used to power the activity run, in the format `<IR name> (<region if it's Azure IR>)`. | Text (string) |
| usedDataIntegrationUnits | The effective Data Integration Units during copy. | Int32 value |
| usedParallelCopies | The effective parallelCopies during copy. | Int32 value |
| redirectRowPath | Path to the log of skipped incompatible rows in the blob storage you configure in the `redirectIncompatibleRowSettings` property. See [Fault tolerance](copy-activity-overview.md#fault-tolerance). | Text (string) |
| executionDetails | More details on the stages the copy activity goes through and the corresponding steps, durations, configurations, and so on. We don't recommend that you parse this section because it might change. To better understand how it helps you understand and troubleshoot copy performance, refer to the [Monitor visually](#monitor-visually) section. | Array |
| perfRecommendation | Copy performance tuning tips. See [Performance tuning tips](copy-activity-performance-troubleshooting.md#performance-tuning-tips) for details. | Array |

**Example:**

```json
"output": {
    "dataRead": 1180089300500,
    "dataWritten": 1180089300500,
    "filesRead": 110,
    "filesWritten": 110,
    "sourcePeakConnections": 640,
    "sinkPeakConnections": 1024,
    "copyDuration": 388,
    "throughput": 2970183,
    "errors": [],
    "effectiveIntegrationRuntime": "DefaultIntegrationRuntime (East US)",
    "usedDataIntegrationUnits": 128,
    "billingReference": "{\"activityType\":\"DataMovement\",\"billableDuration\":[{\"Managed\":11.733333333333336}]}",
    "usedParallelCopies": 64,
    "executionDetails": [
        {
            "source": {
                "type": "AmazonS3"
            },
            "sink": {
                "type": "AzureBlobFS",
                "region": "East US",
                "throttlingErrors": 6
            },
            "status": "Succeeded",
            "start": "2020-03-04T02:13:25.1454206Z",
            "duration": 388,
            "usedDataIntegrationUnits": 128,
            "usedParallelCopies": 64,
            "profile": {
                "queue": {
                    "status": "Completed",
                    "duration": 2
                },
                "transfer": {
                    "status": "Completed",
                    "duration": 386,
                    "details": {
                        "listingSource": {
                            "type": "AmazonS3",
                            "workingDuration": 0
                        },
                        "readingFromSource": {
                            "type": "AmazonS3",
                            "workingDuration": 301
                        },
                        "writingToSink": {
                            "type": "AzureBlobFS",
                            "workingDuration": 335
                        }
                    }
                }
            },
            "detailedDurations": {
                "queuingDuration": 2,
                "transferDuration": 386
            }
        }
    ],
    "perfRecommendation": [
        {
            "Tip": "6 write operations were throttled by the sink data store. To achieve better performance, you are suggested to check and increase the allowed request rate for Azure Data Lake Storage Gen2, or reduce the number of concurrent copy runs and other data access, or reduce the DIU or parallel copy.",
            "ReferUrl": "https://go.microsoft.com/fwlink/?linkid=2102534",
            "RuleName": "ReduceThrottlingErrorPerfRecommendationRule"
        }
    ],
    "durationInQueue": {
        "integrationRuntimeQueue": 0
    }
}
```
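As an informal companion sketch (again, not from the article), here is one way you might pull the headline numbers, the slowest transfer step, and any tuning tips out of a payload shaped like the one above. Since the article advises against parsing `executionDetails` in production, treat this as exploratory code only:

```python
# Informal sketch: summarize an "output" payload shaped like the example above.
# executionDetails is not a stable contract, so use this for ad hoc inspection.

def summarize_copy_output(output: dict) -> None:
    gib = output["dataRead"] / (1024 ** 3)
    print(f"Read {gib:.0f} GiB in {output['copyDuration']} s "
          f"at {output['throughput']:,.0f} KBps "
          f"(DIU={output['usedDataIntegrationUnits']}, "
          f"parallelCopies={output['usedParallelCopies']})")

    # The bottleneck is the step with the longest working duration.
    details = output["executionDetails"][0]["profile"]["transfer"]["details"]
    step, info = max(details.items(), key=lambda kv: kv[1]["workingDuration"])
    print(f"Slowest step: {step} ({info['workingDuration']} s)")

    for tip in output.get("perfRecommendation", []):
        print("Tuning tip:", tip["Tip"])
```

For the example payload above, this reports `writingToSink` (335 seconds) as the slowest step and prints the throttling tip.
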
## Next steps

See the other copy activity articles:

- [Copy activity overview](copy-activity-overview.md)
- [Copy activity performance](copy-activity-performance.md)
