Merge pull request #231099 from whhender/adla-health-updates

prmerger-automator[bot] · web-flow · commit de8d284c9f68 · 2023-03-16T21:17:14.000Z
Adla health updates
diff --git a/articles/data-lake-analytics/data-lake-analytics-data-lake-tools-data-skew-solutions.md b/articles/data-lake-analytics/data-lake-analytics-data-lake-tools-data-skew-solutions.md
@@ -1,27 +1,32 @@
 ---
-title: Resolve data-skew - Azure Data Lake Tools for Visual Studio
-description: Troubleshooting potential solutions for data-skew problems by using Azure Data Lake Tools for Visual Studio.
+title: Resolve data skew in Azure Data Lake Analytics by using Azure Data Lake Tools for Visual Studio
+description: Troubleshoot data-skew problems in Azure Data Lake Analytics and learn potential solutions by using Azure Data Lake Tools for Visual Studio.
 ms.reviewer: whhender
 ms.service: data-lake-analytics
 ms.topic: how-to
-ms.date: 01/20/2023
+ms.date: 03/16/2023
 ---
 
-# Resolve data-skew problems by using Azure Data Lake Tools for Visual Studio
+# Resolve data-skew problems in Azure Data Lake Analytics by using Azure Data Lake Tools for Visual Studio
 
 [!INCLUDE [retirement-flag](includes/retirement-flag.md)]
 
 ## What is data skew?
 
-Briefly stated, data skew is an over-represented value. Imagine that you've assigned 50 tax examiners to audit tax returns, one examiner for each US state. The Wyoming examiner, because the population there is small, has little to do. In California, however, the examiner is kept busy because of the state's large population.
+Briefly stated, data skew is an over-represented value. Imagine that you've assigned 50 tax examiners to audit tax returns, one examiner for each US state. The Wyoming examiner, because the population there is small, has little to do. In California, however, the examiner is kept busy because of the state's large population.
 
 :::image type="content" source="./media/data-lake-analytics-data-lake-tools-data-skew-solutions/data-skew-problem.png" alt-text="A sample column chart showing the majority of data being grouped into two columns, rather than being evenly spread across categories." lightbox="./media/data-lake-analytics-data-lake-tools-data-skew-solutions/data-skew-problem.png":::
 
 In our scenario, the data is unevenly distributed across all tax examiners, which means that some examiners must work more than others. In your own job, you frequently experience situations like the tax-examiner example here. In more technical terms, one vertex gets much more data than its peers, a situation that makes the vertex work more than the others and that eventually slows down an entire job. What's worse, the job might fail, because vertices might have, for example, a 5-hour runtime limitation and a 6-GB memory limitation.
 
 ## Resolving data-skew problems
 
-Azure Data Lake Tools for Visual Studio can help detect whether your job has a data-skew problem. If a problem exists, you can resolve it by trying the solutions in this section.
+Azure Data Lake Tools for Visual Studio and Visual Studio Code can help detect whether your job has a data-skew problem.
+
+- [Install Azure Data Lake Tools for Visual Studio](data-lake-analytics-data-lake-tools-get-started.md#install-azure-data-lake-tools-for-visual-studio)
+- [Install Azure Data Lake Tools for Visual Studio Code](data-lake-analytics-data-lake-tools-for-vscode.md)
+
+If a problem exists, you can resolve it by trying the solutions in the following sections.
 
 ## Solution 1: Improve table partitioning
 
@@ -129,11 +134,11 @@ You can sometimes write a user-defined operator to deal with complicated process
 
 ### Option 1: Use a recursive reducer, if possible
 
-By default, a user-defined reducer runs in non-recursive mode, which means that reduce work for a key is distributed into a single vertex. But if your data is skewed, the huge data sets might be processed in a single vertex and run for a long time.
+By default, a user-defined reducer runs in nonrecursive mode, which means that all reduce work for a key is assigned to a single vertex. If your data is skewed, a huge data set for one key might be processed by a single vertex and run for a long time.
 
 To improve performance, you can add an attribute in your code to define reducer to run in recursive mode. Then, the huge data sets can be distributed to multiple vertices and run in parallel, which speeds up your job.
 
-To change a non-recursive reducer to recursive, you need to make sure that your algorithm is associative. For example, the sum is associative, and the median isn't. You also need to make sure that the input and output for reducer keep the same schema.
+To change a nonrecursive reducer to a recursive one, make sure that your algorithm is associative. For example, sum is associative, but median isn't. Also make sure that the reducer's input and output have the same schema.
 
 Attribute of recursive reducer:
 
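Why associativity matters for a recursive reducer can be shown with a small conceptual sketch. This is plain Python, not U-SQL, and `two_stage_reduce` is a hypothetical helper. It assumes two partitions stand in for two vertices: each partition is reduced separately, and the partial results are then reduced again. Because partial outputs are fed back in as inputs, output and input must have the same shape, which mirrors the same-schema requirement above.

```python
from statistics import median
from typing import Callable, Iterable, List

def two_stage_reduce(agg: Callable[[Iterable[float]], float],
                     rows: List[float], split: int) -> float:
    """Reduce each partition separately, then reduce the partial results.

    Hypothetical helper: a conceptual model of recursive reduction, where
    partial outputs (plain numbers) are reused as inputs (plain numbers).
    """
    partitions = [rows[:split], rows[split:]]  # two hypothetical vertices
    partials = [agg(p) for p in partitions]    # first-level reduce per vertex
    return agg(partials)                       # reduce the partial results

rows = [3, 1, 4, 1, 5, 9, 2, 6]

# Sum is associative: the two-stage result equals the single-pass result.
print(two_stage_reduce(sum, rows, 3), sum(rows))        # 31 31

# Median is not: the median of partial medians differs from the true median.
print(two_stage_reduce(median, rows, 3), median(rows))  # 4.0 3.5
```

With sum, any partitioning gives the same answer. With median, the result depends on how the rows were split, so a median reducer can't safely run in recursive mode.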
diff --git a/articles/data-lake-analytics/data-lake-analytics-diagnostic-logs.md b/articles/data-lake-analytics/data-lake-analytics-diagnostic-logs.md
@@ -3,7 +3,7 @@ title: Enable and view diagnostic logs for Azure Data Lake Analytics
 description: Understand how to set up and access diagnostic logs for Azure Data Lake Analytics
 ms.service: data-lake-analytics
 ms.topic: how-to
-ms.date: 11/15/2022
+ms.date: 03/16/2023
 ---
 # Accessing diagnostic logs for Azure Data Lake Analytics
 
@@ -31,7 +31,7 @@ Diagnostic logging allows you to collect data access audit trails. These logs pr
 
      * Select **Archive to a storage account** to store logs in an Azure storage account. Use this option if you want to archive the data. If you select this option, you must provide an Azure storage account to save the logs to.
 
-     * Select **Stream to an event hub** to stream log data to an Azure Event Hub. Use this option if you have a downstream processing pipeline that is analyzing incoming logs in real time. If you select this option, you must provide the details for the Azure Event Hub you want to use.
+     * Select **Stream to an event hub** to stream log data to an event hub in Azure Event Hubs. Use this option if you have a downstream processing pipeline that analyzes incoming logs in real time. If you select this option, you must provide the details for the event hub you want to use.
 
      * Select **Send to Log Analytics workspace** to send the data to the Azure Monitor service. Use this option if you want to use Azure Monitor logs to gather and analyze logs.
 
@@ -84,6 +84,10 @@ Diagnostic logging allows you to collect data access audit trails. These logs pr
 
     `https://adllogs.blob.core.windows.net/insights-logs-requests/resourceId=/SUBSCRIPTIONS/<sub-id>/RESOURCEGROUPS/myresourcegroup/PROVIDERS/MICROSOFT.DATALAKEANALYTICS/ACCOUNTS/mydatalakeanalytics/y=2016/m=07/d=18/h=14/m=00/PT1H.json`
 
+## Process the log data
+
+Azure Data Lake Analytics provides a sample that shows how to process and analyze the log data. You can find the sample at [https://github.com/Azure/AzureDataLake/tree/master/Samples/AzureDiagnosticsSample](https://github.com/Azure/AzureDataLake/tree/master/Samples/AzureDiagnosticsSample).
+
 ## Log structure
 
 The audit and request logs are in a structured JSON format.
@@ -144,8 +148,8 @@ Here's a sample entry in the JSON-formatted request log. Each blob has one root
 | Path |String |The path the operation was performed on |
 | RequestContentLength |int |The content length of the HTTP request |
 | ClientRequestId |String |The identifier that uniquely identifies this request |
-| StartTime |String |The time at which the server received the request |
-| EndTime |String |The time at which the server sent a response |
+| StartTime |String |The time when the server received the request |
+| EndTime |String |The time when the server sent a response |
 
 ### Audit logs
 
@@ -181,7 +185,7 @@ Here's a sample entry in the JSON-formatted audit log. Each blob has one root ob
 | category |String |The log category. For example, **Audit**. |
 | operationName |String |Name of the operation that is logged. For example, JobSubmitted. |
 | resultType |String |A substatus for the job status (operationName). |
-| resultSignature |String |Additional details on the job status (operationName). |
+| resultSignature |String |Extra details on the job status (operationName). |
 | identity |String |The user that requested the operation. For example, susan@contoso.com. |
 | properties |JSON |See the next section (Audit log properties schema) for details |
 
@@ -205,10 +209,7 @@ Here's a sample entry in the JSON-formatted audit log. Each blob has one root ob
 > [!NOTE]
 > **SubmitTime**, **StartTime**, **EndTime**, and **Parallelism** provide information on an operation. These entries only contain a value if that operation has started or completed. For example, **SubmitTime** only contains a value after **operationName** has the value **JobSubmitted**.
 
-## Process the log data
-
-Azure Data Lake Analytics provides a sample on how to process and analyze the log data. You can find the sample at [https://github.com/Azure/AzureDataLake/tree/master/Samples/AzureDiagnosticsSample](https://github.com/Azure/AzureDataLake/tree/master/Samples/AzureDiagnosticsSample).
-
 ## Next steps
 
 [Overview of Azure Data Lake Analytics](data-lake-analytics-overview.md)
+[Troubleshoot U-SQL jobs](runtime-troubleshoot.md)
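The sample blob path in the diagnostic-logs article encodes the container, the Data Lake Analytics account, and the hour the log blob covers. Before you process the log data, you might want to pull those fields out of the path. The following is a minimal sketch, assuming blob URLs follow the exact layout of that sample path. `parse_log_blob_url` is a hypothetical helper, not part of any Azure SDK. Note that the `m=` key appears twice: first for the month, then for the minute.

```python
from datetime import datetime
from urllib.parse import urlparse

def parse_log_blob_url(url: str) -> dict:
    """Hypothetical helper: split a diagnostic-log blob URL into its parts.

    Assumes the layout of the sample path:
    <container>/resourceId=/.../ACCOUNTS/<account>/y=/m=/d=/h=/m=/<blob>
    """
    parts = urlparse(url).path.strip("/").split("/")
    upper = [p.upper() for p in parts]
    account = parts[upper.index("ACCOUNTS") + 1]
    # "m=" occurs twice (month, then minute), so read the time segments in
    # order instead of building a dict keyed by segment name.
    time_values = [p.split("=", 1)[1] for p in parts
                   if p[:2] in ("y=", "m=", "d=", "h=")]
    year, month, day, hour, minute = (int(v) for v in time_values)
    return {
        "container": parts[0],
        "account": account,
        "window_start": datetime(year, month, day, hour, minute),
        "blob": parts[-1],
    }

SAMPLE_URL = ("https://adllogs.blob.core.windows.net/insights-logs-requests/"
              "resourceId=/SUBSCRIPTIONS/<sub-id>/RESOURCEGROUPS/myresourcegroup/"
              "PROVIDERS/MICROSOFT.DATALAKEANALYTICS/ACCOUNTS/mydatalakeanalytics/"
              "y=2016/m=07/d=18/h=14/m=00/PT1H.json")

print(parse_log_blob_url(SAMPLE_URL))
```

The helper reads the time segments by position rather than by key. This is the simplest way to handle the repeated `m=` key without guessing which occurrence is which.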