
Commit c3edf88: acrolinx improvements (parent 5044b04)

File tree: 1 file changed, +11 -11 lines


articles/data-factory/concepts-data-flow-monitoring.md

Lines changed: 11 additions & 11 deletions
@@ -14,17 +14,17 @@ ms.date: 04/17/2020

[!INCLUDE[appliesto-adf-asa-md](includes/appliesto-adf-asa-md.md)]

-After you have completed building and debugging your data flow, you will want to schedule your data flow to execute on a schedule within the context of a pipeline. You can schedule the pipeline from Azure Data Factory using Triggers. Or you can use the Trigger Now option from the Azure Data Factory Pipeline Builder to execute a single-run execution to test your data flow within the pipeline context.
+After you finish building and debugging your data flow, you'll want to run it on a schedule within the context of a pipeline. You can schedule the pipeline from Azure Data Factory by using triggers. Or you can use the Trigger Now option in the Azure Data Factory pipeline builder to run the pipeline once and test your data flow within the pipeline context.
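
As a point of reference, a pipeline run can also be started programmatically, which is handy for testing outside the UI. Below is a minimal sketch using the azure-mgmt-datafactory Python SDK; the subscription, resource group, factory, and pipeline names are placeholders, not values from this article.

```python
# Minimal sketch: start one pipeline run programmatically, the SDK
# counterpart to the Trigger Now option. All names are placeholders.
from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient

client = DataFactoryManagementClient(
    credential=DefaultAzureCredential(),
    subscription_id="<subscription-id>",
)

run = client.pipelines.create_run(
    resource_group_name="<resource-group>",
    factory_name="<factory-name>",
    pipeline_name="<pipeline-name>",
)
print(f"Started pipeline run: {run.run_id}")
```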

-When you execute your pipeline, you will be able to monitor the pipeline and all of the activities contained in the pipeline including the Data Flow activity. Click on the monitor icon in the left-hand Azure Data Factory UI panel. You will see a screen similar to the one below. The highlighted icons will allow you to drill into the activities in the pipeline, including the Data Flow activity.
+When you execute your pipeline, you can monitor the pipeline and all of its activities, including the Data Flow activity. Select the monitor icon in the left-hand Azure Data Factory UI panel to see a screen similar to the one below. The highlighted icons let you drill into the activities in the pipeline, including the Data Flow activity.
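
The run and activity details that the monitoring UI surfaces can also be queried through the SDK. A hedged sketch, reusing the `client` and `run` objects from the previous example:

```python
# Minimal sketch: fetch the pipeline run's status, then list its
# activity runs (including the Data Flow activity). Reuses `client`
# and `run` from the sketch above; names remain placeholders.
from datetime import datetime, timedelta

from azure.mgmt.datafactory.models import RunFilterParameters

pipeline_run = client.pipeline_runs.get(
    resource_group_name="<resource-group>",
    factory_name="<factory-name>",
    run_id=run.run_id,
)
print(f"Pipeline status: {pipeline_run.status}")

# Query activity runs updated within roughly the last hour.
filters = RunFilterParameters(
    last_updated_after=datetime.utcnow() - timedelta(hours=1),
    last_updated_before=datetime.utcnow() + timedelta(minutes=5),
)
activity_runs = client.activity_runs.query_by_pipeline_run(
    resource_group_name="<resource-group>",
    factory_name="<factory-name>",
    run_id=run.run_id,
    filter_parameters=filters,
)
for activity in activity_runs.value:
    print(activity.activity_name, activity.activity_type, activity.status)
```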

![Data Flow Monitoring](media/data-flow/mon001.png "Data Flow Monitoring")

-You will see statistics at this level as well including the run times and status. The Run ID at the activity level is different that the Run ID at the pipeline level. The Run ID at the previous level is for the pipeline. Clicking the eyeglasses will give you deep details on your data flow execution.
+You see statistics at this level as well, including the run times and status. The Run ID at the activity level is different from the Run ID at the pipeline level; the Run ID at the previous level is for the pipeline. Selecting the eyeglasses icon gives you deep details on your data flow execution.

![Data Flow Monitoring](media/data-flow/mon002.png "Data Flow Monitoring")

-When you are in the graphical node monitoring view, you will see a simplified view-only version of your data flow graph.
+When you're in the graphical node monitoring view, you see a simplified view-only version of your data flow graph.

![Data Flow Monitoring](media/data-flow/mon003.png "Data Flow Monitoring")

@@ -34,18 +34,18 @@ Here is a video overview of monitoring performance of your data flows from the A

## View Data Flow Execution Plans

-When your Data Flow is executed in Spark, Azure Data Factory determines optimal code paths based on the entirety of your data flow. Additionally, the execution paths may occur on different scale-out nodes and data partitions. Therefore, the monitoring graph represents the design of your flow, taking into account the execution path of your transformations. When you click on individual nodes, you will see "groupings" that represent code that was executed together on the cluster. The timings and counts that you see represent those groups as opposed to the individual steps in your design.
+When your Data Flow executes in Spark, Azure Data Factory determines optimal code paths based on the entirety of your data flow. The execution paths might occur on different scale-out nodes and data partitions, so the monitoring graph represents the design of your flow, taking into account the execution path of your transformations. When you select individual nodes, you see "groupings" that represent code that was executed together on the cluster. The timings and counts that you see represent those groups, as opposed to the individual steps in your design.

![Data Flow Monitoring](media/data-flow/mon004.png "Data Flow Monitoring")


41-
* When you click on the open space in the monitoring window, the stats in the bottom pane will display timing and row counts for each Sink and the transformations that led to the sink data for transformation lineage.
41+
* When you select the open space in the monitoring window, the stats in the bottom pane display timing and row counts for each Sink and the transformations that led to the sink data for transformation lineage.

-* When you select individual transformations, you will receive additional feedback on the right-hand panel that shows partition stats, column counts, skewness (how evenly is the data distributed across partitions), and kurtosis (how spiky is the data).
+* When you select individual transformations, you receive additional feedback on the right-hand panel that shows partition stats, column counts, skewness (how evenly the data is distributed across partitions), and kurtosis (how spiky the data is). A sketch after this list shows how those two statistics behave.

-* When you click on the Sink in the node view, you will see column lineage. There are three different methods that columns are accumulated throughout your data flow to land in the Sink. They are:
+* When you select the sink in the node view, you see column lineage. Columns are accumulated throughout your data flow to land in the sink in three different ways:

-    * Computed: You use the column for conditional processing or within an expression in your data flow, but do not land it in the Sink
-    * Derived: The column is a new column that you generated in your flow, i.e. it was not present in the Source
+    * Computed: You use the column for conditional processing or within an expression in your data flow, but don't land it in the sink
+    * Derived: The column is a new column that you generated in your flow, that is, it wasn't present in the source
    * Mapped: The column originated from the source and you're mapping it to a sink field
* Data flow status: The current status of your execution
* Cluster startup time: Amount of time to acquire the JIT Spark compute environment for your data flow execution
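
To make the skewness and kurtosis readings concrete, here's a small illustrative sketch (not part of the product or this article) that computes both statistics over a hypothetical set of per-partition row counts:

```python
# Illustrative only: skewness and kurtosis over hypothetical
# per-partition row counts, the two statistics the monitoring panel
# reports for a transformation's partitions.
from scipy.stats import kurtosis, skew

partition_row_counts = [1050, 990, 1010, 980, 4800]  # one "hot" partition

# Skewness near 0 means rows are spread evenly across partitions;
# a large positive value flags partitions holding far more rows than most.
print(f"skewness: {skew(partition_row_counts):.2f}")

# High kurtosis means the distribution is spiky: a few extreme
# partitions dominate, which usually signals a data-skew problem.
print(f"kurtosis: {kurtosis(partition_row_counts):.2f}")
```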
@@ -59,4 +59,4 @@ This icon means that the transformation data was already cached on the cluster,

![Data Flow Monitoring](media/data-flow/mon004.png "Data Flow Monitoring")

-You will also see green circle icons in the transformation. They represent a count of the number of sinks that data is flowing into.
+You also see green circle icons in the transformation. They represent the number of sinks that data flows into.
