Commit a68a697

Health updates
1 parent 214ec6c commit a68a697

2 files changed: +14 -9 lines changed

articles/data-lake-analytics/data-lake-analytics-data-lake-tools-data-skew-solutions.md

Lines changed: 13 additions & 8 deletions
@@ -1,27 +1,32 @@
 ---
-title: Resolve data-skew - Azure Data Lake Tools for Visual Studio
-description: Troubleshooting potential solutions for data-skew problems by using Azure Data Lake Tools for Visual Studio.
+title: Resolve data-skew in Azure Data Lake Analytics using tools for Visual Studio
+description: Troubleshoot potential solutions for data-skew problems in Azure Data Lake Analytics by using Azure Data Lake Tools for Visual Studio.
 ms.reviewer: whhender
 ms.service: data-lake-analytics
 ms.topic: how-to
-ms.date: 01/20/2023
+ms.date: 03/16/2023
 ---

-# Resolve data-skew problems by using Azure Data Lake Tools for Visual Studio
+# Resolve data-skew problems in Azure Data Lake Analytics using Azure Data Lake Tools for Visual Studio

 [!INCLUDE [retirement-flag](includes/retirement-flag.md)]

 ## What is data skew?

-Briefly stated, data skew is an over-represented value. Imagine that you've assigned 50 tax examiners to audit tax returns, one examiner for each US state. The Wyoming examiner, because the population there is small, has little to do. In California, however, the examiner is kept busy because of the state's large population.
+Briefly stated, data skew is an over-represented value. Imagine that you've assigned 50 tax examiners to audit tax returns, one examiner for each US state. The Wyoming examiner, because the population there's small, has little to do. In California, however, the examiner is kept busy because of the state's large population.

 :::image type="content" source="./media/data-lake-analytics-data-lake-tools-data-skew-solutions/data-skew-problem.png" alt-text="A sample column chart showing the majority of data being grouped into two columns, rather than being evenly spread across categories." lightbox="./media/data-lake-analytics-data-lake-tools-data-skew-solutions/data-skew-problem.png":::

 In our scenario, the data is unevenly distributed across all tax examiners, which means that some examiners must work more than others. In your own job, you frequently experience situations like the tax-examiner example here. In more technical terms, one vertex gets much more data than its peers, a situation that makes the vertex work more than the others and that eventually slows down an entire job. What's worse, the job might fail, because vertices might have, for example, a 5-hour runtime limitation and a 6-GB memory limitation.

 ## Resolving data-skew problems

-Azure Data Lake Tools for Visual Studio can help detect whether your job has a data-skew problem. If a problem exists, you can resolve it by trying the solutions in this section.
+Azure Data Lake Tools for Visual Studio and Visual Studio Code can help detect whether your job has a data-skew problem.
+
+- [Install Azure Data Lake Tools for Visual Studio](data-lake-analytics-data-lake-tools-get-started.md#install-azure-data-lake-tools-for-visual-studio)
+- [Install Azure Data Lake Tools for Visual Studio Code](data-lake-analytics-data-lake-tools-for-vscode.md)
+
+If a problem exists, you can resolve it by trying the solutions in this section.

 ## Solution 1: Improve table partitioning

@@ -129,11 +134,11 @@ You can sometimes write a user-defined operator to deal with complicated process

 ### Option 1: Use a recursive reducer, if possible

-By default, a user-defined reducer runs in non-recursive mode, which means that reduce work for a key is distributed into a single vertex. But if your data is skewed, the huge data sets might be processed in a single vertex and run for a long time.
+By default, a user-defined reducer runs in nonrecursive mode, which means that reduce work for a key is distributed into a single vertex. But if your data is skewed, the huge data sets might be processed in a single vertex and run for a long time.

 To improve performance, you can add an attribute in your code to define reducer to run in recursive mode. Then, the huge data sets can be distributed to multiple vertices and run in parallel, which speeds up your job.

-To change a non-recursive reducer to recursive, you need to make sure that your algorithm is associative. For example, the sum is associative, and the median isn't. You also need to make sure that the input and output for reducer keep the same schema.
+To change a nonrecursive reducer to recursive, you need to make sure that your algorithm is associative. For example, the sum is associative, and the median isn't. You also need to make sure that the input and output for reducer keep the same schema.

 Attribute of recursive reducer:

articles/data-lake-analytics/data-lake-analytics-diagnostic-logs.md

Lines changed: 1 addition & 1 deletion
@@ -3,7 +3,7 @@ title: Enable and view diagnostic logs for Azure Data Lake Analytics
 description: Understand how to set up and access diagnostic logs for Azure Data Lake Analytics
 ms.service: data-lake-analytics
 ms.topic: how-to
-ms.date: 11/15/2022
+ms.date: 03/16/2023
 ---
 # Accessing diagnostic logs for Azure Data Lake Analytics
