Learn how to submit MapReduce jobs using the HDInsight .NET SDK. HDInsight clusters come with a jar file containing some MapReduce samples. The jar file is `/example/jars/hadoop-mapreduce-examples.jar`. One of the samples is **wordcount**. You develop a C# console application to submit a wordcount job. The job reads the `/example/data/gutenberg/davinci.txt` file and outputs the results to `/example/data/davinciwordcount`. If you want to rerun the application, you must clean up the output folder.
> [!NOTE]
> The steps in this article must be performed from a Windows client. For information on using a Linux, OS X, or Unix client to work with Hive, use the tab selector shown on the top of the article.
@@ -34,7 +34,7 @@ The HDInsight .NET SDK provides .NET client libraries, which make it easier to w
-1. Copy the code into **Program.cs**. Then edit the code by setting the values for: `existingClusterName`, `existingClusterPassword`, `defaultStorageAccountName`, `defaultStorageAccountKey`, and `defaultStorageContainerName`.
+1. Copy the code below into **Program.cs**. Then edit the code by setting the values for `existingClusterName`, `existingClusterPassword`, `defaultStorageAccountName`, `defaultStorageAccountKey`, and `defaultStorageContainerName`.
```csharp
using System.Collections.Generic;
@@ -155,7 +155,7 @@ The HDInsight .NET SDK provides .NET client libraries, which make it easier to w
1. Press **F5** to run the application.
-To run the job again, you must change the job output folder name, in the sample its `/example/data/davinciwordcount`.
+To run the job again, you must change the job output folder name; in the sample it's `/example/data/davinciwordcount`.
When the job completes successfully, the application prints the content of the output file `part-r-00000`.
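For orientation, the core of that submission looks roughly like the following sketch. It assumes the `Microsoft.Azure.Management.HDInsight.Job` NuGet package; exact type names and namespaces can vary by SDK version, and the article's full sample also waits for job completion and reads the output.

```csharp
// Sketch only: assumes the Microsoft.Azure.Management.HDInsight.Job package;
// the full article sample also polls for completion and reads the output.
using System;
using System.Collections.Generic;
using Microsoft.Azure; // BasicAuthenticationCloudCredentials (namespace may vary by version)
using Microsoft.Azure.Management.HDInsight.Job;
using Microsoft.Azure.Management.HDInsight.Job.Models;

class Program
{
    static void Main()
    {
        // Values you set in the step above (placeholders here).
        const string existingClusterName = "<cluster name>";
        const string existingClusterPassword = "<cluster password>";

        var credentials = new BasicAuthenticationCloudCredentials
        {
            Username = "admin",
            Password = existingClusterPassword
        };
        var client = new HDInsightJobManagementClient(
            existingClusterName + ".azurehdinsight.net", credentials);

        // Submit the wordcount sample from the examples jar.
        var parameters = new MapReduceJobSubmissionParameters
        {
            JarFile = "/example/jars/hadoop-mapreduce-examples.jar",
            JarClass = "wordcount",
            Arguments = new List<string>
            {
                "/example/data/gutenberg/davinci.txt", // input file
                "/example/data/davinciwordcount"       // output folder (must not already exist)
            }
        };
        var jobResponse = client.JobManagement.SubmitMapReduceJob(parameters);
        Console.WriteLine("Job ID: " + jobResponse.JobSubmissionJsonResponse.Id);
    }
}
```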
articles/hdinsight/hdinsight-autoscale-clusters.md: 2 additions & 2 deletions
@@ -26,7 +26,7 @@ Schedule-based scaling can be used:
Load-based scaling can be used:
-* When the load patterns fluctuate substantially and unpredictably during the day, for example, order data processing with random fluctuations in load patterns based on various factors.
+* When the load patterns fluctuate substantially and unpredictably during the day. For example, order data processing with random fluctuations in load patterns based on various factors.
### Cluster metrics
@@ -228,7 +228,7 @@ All of the cluster status messages that you might see are explained in the follo
| Updating | The cluster Autoscale configuration is being updated. |
| HDInsight configuration | A cluster scale up or scale down operation is in progress. |
| Updating Error | HDInsight encountered issues during the Autoscale configuration update. Customers can choose to either retry the update or disable autoscale. |
-| Error | Something is wrong with the cluster, and it'sn't usable. Delete this cluster and create a new one. |
+| Error | Something is wrong with the cluster, and it isn't usable. Delete this cluster and create a new one. |
To view the current number of nodes in your cluster, go to the **Cluster size** chart on the **Overview** page for your cluster. Or select **Cluster size** under **Settings**.
articles/hdinsight/hdinsight-hadoop-manage-ambari-rest-api.md: 10 additions & 10 deletions
@@ -21,7 +21,7 @@ Apache Ambari simplifies the management and monitoring of Hadoop clusters by pro
* A Hadoop cluster on HDInsight. See [Get Started with HDInsight on Linux](hadoop/apache-hadoop-linux-tutorial-get-started.md).
-* Bash on Ubuntu on Windows 10. The examples in this article use the Bash shell on Windows 10. See [Windows Subsystem for Linux Installation Guide for Windows 10](/windows/wsl/install-win10) for installation steps. Other [Unix shells](https://www.gnu.org/software/bash/) works as well. The examples, with some slight modifications, can work on a Windows Command prompt. Or you can use Windows PowerShell.
+* Bash on Ubuntu on Windows 10. The examples in this article use the Bash shell on Windows 10. See [Windows Subsystem for Linux Installation Guide for Windows 10](/windows/wsl/install-win10) for installation steps. Other [Unix shells](https://www.gnu.org/software/bash/) work as well. The examples, with some slight modifications, can work on a Windows Command prompt. Or you can use Windows PowerShell.
* jq, a command-line JSON processor. See [https://stedolan.github.io/jq/](https://stedolan.github.io/jq/).
@@ -41,7 +41,7 @@ For Enterprise Security Package clusters, instead of `admin`, use a fully qualif
### Setup (Preserve credentials)
-Preserve your credentials to avoid reentering them for each example. The cluster name preserved in a separate step.
+Preserve your credentials to avoid reentering them for each example. The cluster name is preserved in a separate step.
**A. Bash**
Edit the script by replacing `PASSWORD` with your actual password. Then enter the command.
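A minimal Bash sketch of this setup, with illustrative variable names rather than the article's exact script:

```bash
# Sketch: prompt once for the cluster login (admin) password and keep it
# in an environment variable so the later examples can reuse it.
read -s -p "Enter the cluster admin password: " PASSWORD
export PASSWORD

# The cluster name is preserved in a separate step, for example:
export CLUSTERNAME='my-hdinsight-cluster'
```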
@@ -185,7 +185,7 @@ foreach($item in $respObj.items) {
### Get the default storage
-HDInsight clusters must use an Azure Storage Account or Data Lake Storage as the default storage. You can use Ambari to retrieve this information after the cluster created. For example, if you want to read/write data to the container outside HDInsight.
+HDInsight clusters must use an Azure Storage account or Data Lake Storage as the default storage. You can use Ambari to retrieve this information after the cluster has been created, for example, if you want to read/write data to the container from outside HDInsight.
The following examples retrieve the default storage configuration from the cluster:
-> These examples return the first configuration applied to the server (`service_config_version=1`) which contains this information. If you retrieve a value that modified after cluster creation, you may need to list the configuration versions and retrieve the latest one.
+> These examples return the first configuration applied to the server (`service_config_version=1`), which contains this information. If you retrieve a value that has been modified after cluster creation, you may need to list the configuration versions and retrieve the latest one.
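A hedged Bash sketch of such a retrieval (the Ambari endpoint is standard, but the exact jq filter used in the article may differ):

```bash
# Sketch: read the default filesystem (fs.defaultFS) from the first
# HDFS service configuration version. Assumes $PASSWORD and $CLUSTERNAME
# were exported in the setup step.
curl -u admin:$PASSWORD -sS -G \
  "https://$CLUSTERNAME.azurehdinsight.net/api/v1/clusters/$CLUSTERNAME/configurations/service_config_versions?service_name=HDFS&service_config_version=1" \
  | jq -r '.items[].configurations[].properties["fs.defaultFS"] | select(. != null)'
```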
The return value is similar to one of the following examples:
@@ -310,7 +310,7 @@ This example returns a JSON document containing the current configuration for th
```
**B. PowerShell**
-The PowerShell script uses [jq](https://stedolan.github.io/jq/). Edit `C:\HD\jq\jq-win64` to reflect your actual path and version of [jq](https://stedolan.github.io/jq/).
+The PowerShell script uses [jq](https://stedolan.github.io/jq/). Edit `C:\HD\jq\jq-win64` below to reflect your actual path and version of [jq](https://stedolan.github.io/jq/).
@@ -385,7 +385,7 @@ This example returns a JSON document containing the current configuration for th
At this point, the Ambari web UI indicates the Spark service needs to be restarted before the new configuration can take effect. Use the following steps to restart the service.
-1. Use the following to enable maintenance mode for the Spark 2 service:
+1. Use the following to enable maintenance mode for the Spark2 service:
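A hedged sketch of that call: Ambari sets maintenance mode with a `PUT` against the service resource (the `context` string is illustrative):

```bash
# Sketch: turn on maintenance mode for the SPARK2 service via the Ambari REST API.
curl -u admin:$PASSWORD -sS -H "X-Requested-By: ambari" -X PUT \
  -d '{"RequestInfo": {"context": "turning on maintenance mode for SPARK2"}, "Body": {"ServiceInfo": {"maintenance_state": "ON"}}}' \
  "https://$CLUSTERNAME.azurehdinsight.net/api/v1/clusters/$CLUSTERNAME/services/SPARK2"
```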
@@ -453,7 +453,7 @@ At this point, the Ambari web UI indicates the Spark service needs to be restar
> The `href` value returned by this URI is using the internal IP address of the cluster node. To use it from outside the cluster, replace the `10.0.0.18:8080` portion with the FQDN of the cluster.
4. Verify the request.
-Edit the command by replacing `29` with the actual value for `id` returned from the prior step. The following commands retrieve the status of the request:
+Edit the command below by replacing `29` with the actual value for `id` returned from the prior step. The following commands retrieve the status of the request:
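A hedged sketch of that status check (`29` is the example request ID; the jq filter is illustrative):

```bash
# Sketch: retrieve the status of Ambari request 29; replace 29 with the
# id value returned when you submitted the restart request.
curl -u admin:$PASSWORD -sS -G \
  "https://$CLUSTERNAME.azurehdinsight.net/api/v1/clusters/$CLUSTERNAME/requests/29" \
  | jq -r '.Requests.request_status'
```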
articles/hdinsight/hdinsight-hadoop-script-actions-linux.md: 4 additions & 4 deletions
@@ -34,7 +34,7 @@ When you develop a custom script for an HDInsight cluster, there are several bes
* [Target the Apache Hadoop version](#bPS1)
* [Target the OS Version](#bps10)
* [Provide stable links to script resources](#bPS2)
-* [Use precompiled resources](#bPS4)
+* [Use pre-compiled resources](#bPS4)
* [Ensure that the cluster customization script is idempotent](#bPS3)
* [Ensure high availability of the cluster architecture](#bPS5)
* [Configure the custom components to use Azure Blob storage](#bPS6)
@@ -118,15 +118,15 @@ The best practice is to download and archive everything in an Azure Storage acco
For example, the samples provided by Microsoft are stored in the `https://hdiconfigactions.blob.core.windows.net/` storage account. This location is a public, read-only container maintained by the HDInsight team.
-### <a name="bPS4"></a>Use precompiled resources
+### <a name="bPS4"></a>Use pre-compiled resources
-To reduce the time it takes to run the script, avoid operations that compile resources from source code. For example, precompile resources and store them in an Azure Storage account blob in the same data center as HDInsight.
+To reduce the time it takes to run the script, avoid operations that compile resources from source code. For example, pre-compile resources and store them in an Azure Storage account blob in the same data center as HDInsight.
### <a name="bPS3"></a>Ensure that the cluster customization script is idempotent
Scripts must be idempotent. If the script runs multiple times, it should return the cluster to the same state every time.
-If the script runs multiple times, the script modifies configuration files shouldn't add duplicate entries.
+If a script that modifies configuration files runs multiple times, it shouldn't add duplicate entries.
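A common Bash idiom for that kind of idempotent edit (a sketch; the file path and setting are hypothetical):

```bash
# Sketch: append a configuration line only if it isn't already present,
# so repeated runs don't create duplicate entries.
LINE='export EXAMPLE_OPTS="-Dexample.setting=true"'  # hypothetical setting
FILE='/etc/example/conf/example-env.sh'              # hypothetical config file
grep -qxF "$LINE" "$FILE" || echo "$LINE" >> "$FILE"
```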
### <a name="bPS5"></a>Ensure high availability of the cluster architecture
articles/hdinsight/hdinsight-phoenix-in-hdinsight.md: 3 additions & 3 deletions
@@ -11,7 +11,7 @@ ms.date: 05/22/2024
[Apache Phoenix](https://phoenix.apache.org/) is an open source, massively parallel relational database layer built on [Apache HBase](hbase/apache-hbase-overview.md). Phoenix allows you to use SQL-like queries over HBase. Phoenix uses JDBC drivers underneath to enable users to create, delete, alter SQL tables, indexes, views and sequences, and upsert rows individually and in bulk. Phoenix uses noSQL native compilation rather than using MapReduce to compile queries, enabling the creation of low-latency applications on top of HBase. Phoenix adds coprocessors to support running client-supplied code in the address space of the server, executing the code colocated with the data. This approach minimizes client/server data transfer.
-Apache Phoenix opens up big data queries to nondevelopers who can use a SQL-like syntax rather than programming. Phoenix is highly optimized for HBase, unlike other tools such as [Apache Hive](hadoop/hdinsight-use-hive.md) and Apache Spark SQL. The benefit to developers is writing highly performant queries with much less code.
+Apache Phoenix opens up big data queries to non-developers who can use a SQL-like syntax rather than programming. Phoenix is highly optimized for HBase, unlike other tools such as [Apache Hive](hadoop/hdinsight-use-hive.md) and Apache Spark SQL. The benefit to developers is writing highly performant queries with much less code.
When you submit a SQL query, Phoenix compiles the query to HBase native calls and runs the scan (or plan) in parallel for optimization. This layer of abstraction frees the developer from writing MapReduce jobs, to focus instead on the business logic and the workflow of their application around Phoenix's big data storage.
@@ -89,9 +89,9 @@ ALTER TABLE my_other_table SET TRANSACTIONAL=true;
### Salted Tables
-*Region server hotspotting* can occur when writing records with sequential keys to HBase. Though you may have multiple region servers in your cluster, your writes are all occurring on just one. This concentration creates the hotspotting issue where, instead of your write workload being distributed across all of the available region servers, just one is handling the load. Since each region has a predefined maximum size, when a region reaches that size limit, split into two small regions. When that happens, one of these new regions takes all new records, becoming the new hotspot.
+*Region server hotspotting* can occur when writing records with sequential keys to HBase. Though you may have multiple region servers in your cluster, your writes are all occurring on just one. This concentration creates the hotspotting issue where, instead of your write workload being distributed across all of the available region servers, just one is handling the load. Since each region has a predefined maximum size, when a region reaches that size limit, it's split into two smaller regions. When that happens, one of these new regions takes all new records, becoming the new hotspot.
-To mitigate this problem and achieve better performance, presplit tables so that all of the region servers are equally used. Phoenix provides *salted tables*, transparently adding the salting byte to the row key for a particular table. The table is presplit on the salt byte boundaries to ensure equal load distribution among region servers during the initial phase of the table. This approach distributes the write workload across all of the available region servers, improving the write and read performance. To salt a table, specify the `SALT_BUCKETS` table property when the table is created:
+To mitigate this problem and achieve better performance, pre-split tables so that all of the region servers are equally used. Phoenix provides *salted tables*, transparently adding the salting byte to the row key for a particular table. The table is pre-split on the salt byte boundaries to ensure equal load distribution among region servers during the initial phase of the table. This approach distributes the write workload across all of the available region servers, improving the write and read performance. To salt a table, specify the `SALT_BUCKETS` table property when the table is created:
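A minimal sketch of such a statement (`SALT_BUCKETS` is the real Phoenix table property; the table and columns are illustrative):

```sql
-- Sketch: create a salted Phoenix table spread across 4 salt buckets
-- (SALT_BUCKETS accepts values from 0 to 256).
CREATE TABLE sales_history (
    sale_id BIGINT NOT NULL PRIMARY KEY,
    amount  DECIMAL
) SALT_BUCKETS = 4;
```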
articles/hdinsight/hdinsight-selecting-vm-size.md: 3 additions & 3 deletions
@@ -11,13 +11,13 @@ ms.date: 05/22/2024
This article discusses how to select the right VM size for the various nodes in your HDInsight cluster.
-Begin by understanding how the properties of a virtual machine such as CPU processing, RAM size, and network latency affects the processing of your workloads. Next, think about your application and how it matches with what different VM families are optimized for. Make sure that the VM family that you would like to use is compatible with the cluster type that you plan to deploy. For a list of all supported and recommended VM sizes for each cluster type, see [Azure HDInsight supported node configurations](hdinsight-supported-node-configuration.md). Lastly, you can use a benchmarking process to test some sample workloads and check which SKU within that family is right for you.
+Begin by understanding how the properties of a virtual machine such as CPU processing, RAM size, and network latency affect the processing of your workloads. Next, think about your application and how it matches with what different VM families are optimized for. Make sure that the VM family that you would like to use is compatible with the cluster type that you plan to deploy. For a list of all supported and recommended VM sizes for each cluster type, see [Azure HDInsight supported node configurations](hdinsight-supported-node-configuration.md). Lastly, you can use a benchmarking process to test some sample workloads and check which SKU within that family is right for you.
For more information on planning other aspects of your cluster such as selecting a storage type or cluster size, see [Capacity planning for HDInsight clusters](hdinsight-capacity-planning.md).
## VM properties and big data workloads
-The VM size and type determined by CPU processing power, RAM size, and network latency:
+The VM size and type are determined by CPU processing power, RAM size, and network latency:
- CPU: The VM size dictates the number of cores. The more cores, the greater the degree of parallel computation each node can achieve. Also, some VM types have faster cores.
@@ -40,7 +40,7 @@ Virtual machine families in Azure are optimized to suit different use cases. In
## Cost saving VM types for light workloads
-If you have light processing requirements, the [F-series](https://azure.microsoft.com/blog/f-series-vm-size/) can be a good choice to get started with HDInsight. At a lower per-hour list price, the F-series are the best value in price-performance in the Azure portfolio based on the Azure Compute Unit (ACU) per vCPU.
+If you have light processing requirements, the [F-series](https://azure.microsoft.com/blog/f-series-vm-size/) can be a good choice to get started with HDInsight. At a lower per-hour list price, the F-series is the best value in price-performance in the Azure portfolio based on the Azure Compute Unit (ACU) per vCPU.
The following table describes the cluster types and node types that can be created with the Fsv2-series VMs.