
Commit 5409f93

Regressions and false positives
1 parent 4fc1d30 commit 5409f93

9 files changed (+41 -41 lines changed)

articles/hdinsight/domain-joined/apache-domain-joined-run-hive.md

Lines changed: 1 addition & 1 deletion
@@ -146,7 +146,7 @@ To test the second policy (read-hivesampletable-devicemake) that you created in
     SELECT clientid, devicemake FROM "HIVE"."default"."hivesampletable"
     ```
 
-When finished, you see two columns of imported data.
+When it's finished, you see two columns of imported data.
 
 ## Next steps
 

articles/hdinsight/hadoop/apache-hadoop-use-mapreduce-dotnet-sdk.md

Lines changed: 3 additions & 3 deletions
@@ -11,7 +11,7 @@ ms.date: 05/22/2024
 
 [!INCLUDE [mapreduce-selector](../includes/hdinsight-selector-use-mapreduce.md)]
 
-Learn how to submit MapReduce jobs using HDInsight .NET SDK. HDInsight clusters come with a jar file with some MapReduce samples. The jar file is`/example/jars/hadoop-mapreduce-examples.jar`. One of the samples is **wordcount**. You develop a C# console application to submit a wordcount job. The job reads the `/example/data/gutenberg/davinci.txt` file, and outputs the results to `/example/data/davinciwordcount`. If you want to rerun the application, you must clean up the output folder.
+Learn how to submit MapReduce jobs using HDInsight .NET SDK. HDInsight clusters come with a jar file with some MapReduce samples. The jar file is `/example/jars/hadoop-mapreduce-examples.jar`. One of the samples is **wordcount**. You develop a C# console application to submit a wordcount job. The job reads the `/example/data/gutenberg/davinci.txt` file, and outputs the results to `/example/data/davinciwordcount`. If you want to rerun the application, you must clean up the output folder.
 
 > [!NOTE]
 > The steps in this article must be performed from a Windows client. For information on using a Linux, OS X, or Unix client to work with Hive, use the tab selector shown on the top of the article.
@@ -34,7 +34,7 @@ The HDInsight .NET SDK provides .NET client libraries, which make it easier to w
     Install-Package Microsoft.Azure.Management.HDInsight.Job
     ```
 
-1. Copy the code into **Program.cs**. Then edit the code by setting the values for: `existingClusterName`, `existingClusterPassword`, `defaultStorageAccountName`, `defaultStorageAccountKey`, and `defaultStorageContainerName`.
+1. Copy the code below into **Program.cs**. Then edit the code by setting the values for: `existingClusterName`, `existingClusterPassword`, `defaultStorageAccountName`, `defaultStorageAccountKey`, and `defaultStorageContainerName`.
 
     ```csharp
     using System.Collections.Generic;
@@ -155,7 +155,7 @@ The HDInsight .NET SDK provides .NET client libraries, which make it easier to w
 
 1. Press **F5** to run the application.
 
-    To run the job again, you must change the job output folder name, in the sample its `/example/data/davinciwordcount`.
+    To run the job again, you must change the job output folder name, in the sample it's `/example/data/davinciwordcount`.
 
     When the job completes successfully, the application prints the content of the output file `part-r-00000`.
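The **wordcount** sample that this file's paragraphs describe is the classic MapReduce example. As a language-neutral sketch of the two phases (Python here, not the article's C# SDK code, and not what runs on the cluster), the job amounts to:

```python
from collections import Counter

def map_phase(text):
    # Map: emit a (word, 1) pair for every word in the input.
    return [(word, 1) for word in text.split()]

def reduce_phase(pairs):
    # Reduce: sum the counts for each distinct word.
    counts = Counter()
    for word, n in pairs:
        counts[word] += n
    return dict(counts)

pairs = map_phase("the quick brown fox jumps over the lazy dog the end")
print(reduce_phase(pairs)["the"])  # 3
```

On the cluster, the same logic is packaged in `hadoop-mapreduce-examples.jar` and distributed across nodes; the SDK code in this article only submits and monitors that job.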

articles/hdinsight/hdinsight-autoscale-clusters.md

Lines changed: 2 additions & 2 deletions
@@ -26,7 +26,7 @@ Schedule-based scaling can be used:
 
 Load based scaling can be used:
 
-* When the load patterns fluctuate substantially and unpredictably during the day, for example, order data processing with random fluctuations in load patterns based on various factors.
+* When the load patterns fluctuate substantially and unpredictably during the day. For example, order data processing with random fluctuations in load patterns based on various factors.
 
 ### Cluster metrics
 
@@ -228,7 +228,7 @@ All of the cluster status messages that you might see are explained in the follo
 | Updating | The cluster Autoscale configuration is being updated. |
 | HDInsight configuration | A cluster scale up or scale down operation is in progress. |
 | Updating Error | HDInsight met issues during the Autoscale configuration update. Customers can choose to either retry the update or disable autoscale. |
-| Error | Something is wrong with the cluster, and it'sn't usable. Delete this cluster and create a new one. |
+| Error | Something is wrong with the cluster, and it isn't usable. Delete this cluster and create a new one. |
 
 To view the current number of nodes in your cluster, go to the **Cluster size** chart on the **Overview** page for your cluster. Or select **Cluster size** under **Settings**.
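The load-based scaling condition in this file can be pictured as a threshold rule that adds or removes nodes within configured bounds. This is only an illustrative sketch with made-up thresholds, not the actual HDInsight Autoscale algorithm (which evaluates cluster metrics such as pending CPU and pending memory):

```python
def autoscale_step(load_pct, current_nodes, min_nodes=3, max_nodes=10):
    # Hypothetical thresholds: scale out above 80% load, scale in below 30%,
    # always staying within the configured node bounds.
    if load_pct > 80 and current_nodes < max_nodes:
        return current_nodes + 1
    if load_pct < 30 and current_nodes > min_nodes:
        return current_nodes - 1
    return current_nodes

print(autoscale_step(90, 5))  # 6 (scale out)
print(autoscale_step(10, 3))  # 3 (already at the minimum)
```

The point of the sketch is the clamping: random load fluctuations drive node count up and down, but never outside the min/max the administrator set.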

articles/hdinsight/hdinsight-hadoop-manage-ambari-rest-api.md

Lines changed: 10 additions & 10 deletions
@@ -21,7 +21,7 @@ Apache Ambari simplifies the management and monitoring of Hadoop clusters by pro
 
 * A Hadoop cluster on HDInsight. See [Get Started with HDInsight on Linux](hadoop/apache-hadoop-linux-tutorial-get-started.md).
 
-* Bash on Ubuntu on Windows 10. The examples in this article use the Bash shell on Windows 10. See [Windows Subsystem for Linux Installation Guide for Windows 10](/windows/wsl/install-win10) for installation steps. Other [Unix shells](https://www.gnu.org/software/bash/) works as well. The examples, with some slight modifications, can work on a Windows Command prompt. Or you can use Windows PowerShell.
+* Bash on Ubuntu on Windows 10. The examples in this article use the Bash shell on Windows 10. See [Windows Subsystem for Linux Installation Guide for Windows 10](/windows/wsl/install-win10) for installation steps. Other [Unix shells](https://www.gnu.org/software/bash/) work as well. The examples, with some slight modifications, can work on a Windows Command prompt. Or you can use Windows PowerShell.
 
 * jq, a command-line JSON processor. See [https://stedolan.github.io/jq/](https://stedolan.github.io/jq/).
 
@@ -41,7 +41,7 @@ For Enterprise Security Package clusters, instead of `admin`, use a fully qualif
 
 ### Setup (Preserve credentials)
 
-Preserve your credentials to avoid reentering them for each example. The cluster name preserved in a separate step.
+Preserve your credentials to avoid reentering them for each example. The cluster name is preserved in a separate step.
 
 **A. Bash**
 Edit the script by replacing `PASSWORD` with your actual password. Then enter the command.
@@ -185,7 +185,7 @@ foreach($item in $respObj.items) {
 
 ### Get the default storage
 
-HDInsight clusters must use an Azure Storage Account or Data Lake Storage as the default storage. You can use Ambari to retrieve this information after the cluster created. For example, if you want to read/write data to the container outside HDInsight.
+HDInsight clusters must use an Azure Storage Account or Data Lake Storage as the default storage. You can use Ambari to retrieve this information after the cluster has been created. For example, if you want to read/write data to the container outside HDInsight.
 
 The following examples retrieve the default storage configuration from the cluster:
 
@@ -202,7 +202,7 @@ $respObj.items.configurations.properties.'fs.defaultFS'
 ```
 
 > [!IMPORTANT]
-> These examples return the first configuration applied to the server (`service_config_version=1`) which contains this information. If you retrieve a value that modified after cluster creation, you may need to list the configuration versions and retrieve the latest one.
+> These examples return the first configuration applied to the server (`service_config_version=1`) which contains this information. If you retrieve a value that has been modified after cluster creation, you may need to list the configuration versions and retrieve the latest one.
 
 The return value is similar to one of the following examples:
 
@@ -310,7 +310,7 @@ This example returns a JSON document containing the current configuration for th
 ```
 
 **B. PowerShell**
-The PowerShell script uses [jq](https://stedolan.github.io/jq/). Edit `C:\HD\jq\jq-win64` to reflect your actual path and version of [jq](https://stedolan.github.io/jq/).
+The PowerShell script uses [jq](https://stedolan.github.io/jq/). Edit `C:\HD\jq\jq-win64` below to reflect your actual path and version of [jq](https://stedolan.github.io/jq/).
 
 ```powershell
 $epoch = Get-Date -Year 1970 -Month 1 -Day 1 -Hour 0 -Minute 0 -Second 0
@@ -385,7 +385,7 @@ This example returns a JSON document containing the current configuration for th
 
 At this point, the Ambari web UI indicates the Spark service needs to be restarted before the new configuration can take effect. Use the following steps to restart the service.
 
-1. Use the following to enable maintenance mode for the Spark 2 service:
+1. Use the following to enable maintenance mode for the Spark2 service:
 
     ```bash
     curl -u admin:$password -sS -H "X-Requested-By: ambari" \
@@ -420,7 +420,7 @@ At this point, the Ambari web UI indicates the Spark service needs to be restar
 
    The return value is `ON`.
 
-3. Next, use the following to turn off the Spark 2 service:
+3. Next, use the following to turn off the Spark2 service:
 
     ```bash
     curl -u admin:$password -sS -H "X-Requested-By: ambari" \
@@ -453,7 +453,7 @@ At this point, the Ambari web UI indicates the Spark service needs to be restar
 > The `href` value returned by this URI is using the internal IP address of the cluster node. To use it from outside the cluster, replace the `10.0.0.18:8080` portion with the FQDN of the cluster.
 
 4. Verify request.
-    Edit the command by replacing `29` with the actual value for `id` returned from the prior step. The following commands retrieve the status of the request:
+    Edit the command below by replacing `29` with the actual value for `id` returned from the prior step. The following commands retrieve the status of the request:
 
     ```bash
     curl -u admin:$password -sS -H "X-Requested-By: ambari" \
@@ -468,9 +468,9 @@ At this point, the Ambari web UI indicates the Spark service needs to be restar
    $respObj.Requests.request_status
    ```
 
-   A response of `COMPLETED` indicates that the request finished.
+   A response of `COMPLETED` indicates that the request has finished.
 
-5. Once the previous request completes, use the following to start the Spark 2 service.
+5. Once the previous request completes, use the following to start the Spark2 service.
 
     ```bash
     curl -u admin:$password -sS -H "X-Requested-By: ambari" \
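The "Get the default storage" examples in this file all drill into the same response shape, equivalent to the jq/PowerShell path `items[].configurations[].properties.'fs.defaultFS'`. A hedged sketch of that traversal, using a hypothetical, heavily trimmed response body (real Ambari responses carry many more fields):

```python
import json

# Hypothetical trimmed response from the service_config_versions query
# for HDFS, service_config_version=1; values are placeholders.
sample = json.loads("""
{
  "items": [
    {
      "configurations": [
        {
          "type": "core-site",
          "properties": {
            "fs.defaultFS": "wasbs://mycontainer@mystorage.blob.core.windows.net"
          }
        }
      ]
    }
  ]
}
""")

def default_storage(resp):
    # Walk items -> configurations -> properties, returning the first
    # configuration that carries fs.defaultFS.
    for item in resp.get("items", []):
        for conf in item.get("configurations", []):
            value = conf.get("properties", {}).get("fs.defaultFS")
            if value:
                return value
    return None

print(default_storage(sample))  # wasbs://mycontainer@mystorage.blob.core.windows.net
```

As the article's IMPORTANT note says, pinning `service_config_version=1` only works for values never changed after cluster creation; otherwise list the versions first and pick the latest.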

articles/hdinsight/hdinsight-hadoop-script-actions-linux.md

Lines changed: 4 additions & 4 deletions
@@ -34,7 +34,7 @@ When you develop a custom script for an HDInsight cluster, there are several bes
 * [Target the Apache Hadoop version](#bPS1)
 * [Target the OS Version](#bps10)
 * [Provide stable links to script resources](#bPS2)
-* [Use precompiled resources](#bPS4)
+* [Use pre-compiled resources](#bPS4)
 * [Ensure that the cluster customization script is idempotent](#bPS3)
 * [Ensure high availability of the cluster architecture](#bPS5)
 * [Configure the custom components to use Azure Blob storage](#bPS6)
@@ -118,15 +118,15 @@ The best practice is to download and archive everything in an Azure Storage acco
 
 For example, the samples provided by Microsoft are stored in the `https://hdiconfigactions.blob.core.windows.net/` storage account. This location is a public, read-only container maintained by the HDInsight team.
 
-### <a name="bPS4"></a>Use precompiled resources
+### <a name="bPS4"></a>Use pre-compiled resources
 
-To reduce the time it takes to run the script, avoid operations that compile resources from source code. For example, precompile resources and store them in an Azure Storage account blob in the same data center as HDInsight.
+To reduce the time it takes to run the script, avoid operations that compile resources from source code. For example, pre-compile resources and store them in an Azure Storage account blob in the same data center as HDInsight.
 
 ### <a name="bPS3"></a>Ensure that the cluster customization script is idempotent
 
 Scripts must be idempotent. If the script runs multiple times, it should return the cluster to the same state every time.
 
-If the script runs multiple times, the script modifies configuration files shouldn't add duplicate entries.
+If the script runs multiple times, the script that modifies configuration files shouldn't add duplicate entries.
 
 ### <a name="bPS5"></a>Ensure high availability of the cluster architecture
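The idempotency guidance in this file ("shouldn't add duplicate entries") boils down to check-before-append. A minimal sketch with hypothetical setting names, operating on a list of config lines rather than a real file:

```python
def add_setting(lines, setting):
    # Idempotent append: only add the entry if it isn't already present,
    # so re-running the script leaves the config unchanged.
    if setting not in lines:
        lines.append(setting)
    return lines

conf = ["export JAVA_HOME=/usr/lib/jvm/java-8"]  # hypothetical existing content
add_setting(conf, "export HADOOP_HEAPSIZE=2048")
add_setting(conf, "export HADOOP_HEAPSIZE=2048")  # second run: no duplicate
print(len(conf))  # 2
```

The same pattern applies to any mutation a script action performs: make each step a no-op when its effect is already in place.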

articles/hdinsight/hdinsight-phoenix-in-hdinsight.md

Lines changed: 3 additions & 3 deletions
@@ -11,7 +11,7 @@ ms.date: 05/22/2024
 
 [Apache Phoenix](https://phoenix.apache.org/) is an open source, massively parallel relational database layer built on [Apache HBase](hbase/apache-hbase-overview.md). Phoenix allows you to use SQL-like queries over HBase. Phoenix uses JDBC drivers underneath to enable users to create, delete, alter SQL tables, indexes, views and sequences, and upsert rows individually and in bulk. Phoenix uses noSQL native compilation rather than using MapReduce to compile queries, enabling the creation of low-latency applications on top of HBase. Phoenix adds coprocessors to support running client-supplied code in the address space of the server, executing the code colocated with the data. This approach minimizes client/server data transfer.
 
-Apache Phoenix opens up big data queries to nondevelopers who can use a SQL-like syntax rather than programming. Phoenix is highly optimized for HBase, unlike other tools such as [Apache Hive](hadoop/hdinsight-use-hive.md) and Apache Spark SQL. The benefit to developers is writing highly performant queries with much less code.
+Apache Phoenix opens up big data queries to non-developers who can use a SQL-like syntax rather than programming. Phoenix is highly optimized for HBase, unlike other tools such as [Apache Hive](hadoop/hdinsight-use-hive.md) and Apache Spark SQL. The benefit to developers is writing highly performant queries with much less code.
 
 When you submit a SQL query, Phoenix compiles the query to HBase native calls and runs the scan (or plan) in parallel for optimization. This layer of abstraction frees the developer from writing MapReduce jobs, to focus instead on the business logic and the workflow of their application around Phoenix's big data storage.
 
@@ -89,9 +89,9 @@ ALTER TABLE my_other_table SET TRANSACTIONAL=true;
 
 ### Salted Tables
 
-*Region server hotspotting* can occur when writing records with sequential keys to HBase. Though you may have multiple region servers in your cluster, your writes are all occurring on just one. This concentration creates the hotspotting issue where, instead of your write workload being distributed across all of the available region servers, just one is handling the load. Since each region has a predefined maximum size, when a region reaches that size limit, split into two small regions. When that happens, one of these new regions takes all new records, becoming the new hotspot.
+*Region server hotspotting* can occur when writing records with sequential keys to HBase. Though you may have multiple region servers in your cluster, your writes are all occurring on just one. This concentration creates the hotspotting issue where, instead of your write workload being distributed across all of the available region servers, just one is handling the load. Since each region has a predefined maximum size, when a region reaches that size limit, it's split into two small regions. When that happens, one of these new regions takes all new records, becoming the new hotspot.
 
-To mitigate this problem and achieve better performance, presplit tables so that all of the region servers are equally used. Phoenix provides *salted tables*, transparently adding the salting byte to the row key for a particular table. The table is presplit on the salt byte boundaries to ensure equal load distribution among region servers during the initial phase of the table. This approach distributes the write workload across all of the available region servers, improving the write and read performance. To salt a table, specify the `SALT_BUCKETS` table property when the table is created:
+To mitigate this problem and achieve better performance, pre-split tables so that all of the region servers are equally used. Phoenix provides *salted tables*, transparently adding the salting byte to the row key for a particular table. The table is pre-split on the salt byte boundaries to ensure equal load distribution among region servers during the initial phase of the table. This approach distributes the write workload across all of the available region servers, improving the write and read performance. To salt a table, specify the `SALT_BUCKETS` table property when the table is created:
 
 ```sql
 CREATE TABLE Saltedweblogs (
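The salting scheme this file describes can be sketched as prepending a hash-derived byte to each row key. The hash below is a simple stand-in, not Phoenix's actual function, and `SALT_BUCKETS = 4` is an arbitrary illustrative choice:

```python
SALT_BUCKETS = 4  # illustrative; matches a hypothetical SALT_BUCKETS=4 table

def salted_key(row_key: bytes, buckets: int = SALT_BUCKETS) -> bytes:
    # Prepend a deterministic salt byte (hash of the key modulo the bucket
    # count), so sequential keys land in different buckets/regions.
    salt = sum(row_key) % buckets  # stand-in for Phoenix's real hash
    return bytes([salt]) + row_key

# Sequential, date-prefixed keys would all hit one region unsalted;
# salted, they spread across every bucket.
keys = [f"2024-05-22-{i:04d}".encode() for i in range(100)]
buckets_used = {salted_key(k)[0] for k in keys}
print(sorted(buckets_used))  # [0, 1, 2, 3]
```

Because the salt is a pure function of the key, reads can recompute it, which is why Phoenix can keep salting transparent to queries.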

articles/hdinsight/hdinsight-selecting-vm-size.md

Lines changed: 3 additions & 3 deletions
@@ -11,13 +11,13 @@ ms.date: 05/22/2024
 
 This article discusses how to select the right VM size for the various nodes in your HDInsight cluster.
 
-Begin by understanding how the properties of a virtual machine such as CPU processing, RAM size, and network latency affects the processing of your workloads. Next, think about your application and how it matches with what different VM families are optimized for. Make sure that the VM family that you would like to use is compatible with the cluster type that you plan to deploy. For a list of all supported and recommended VM sizes for each cluster type, see [Azure HDInsight supported node configurations](hdinsight-supported-node-configuration.md). Lastly, you can use a benchmarking process to test some sample workloads and check which SKU within that family is right for you.
+Begin by understanding how the properties of a virtual machine such as CPU processing, RAM size, and network latency affect the processing of your workloads. Next, think about your application and how it matches with what different VM families are optimized for. Make sure that the VM family that you would like to use is compatible with the cluster type that you plan to deploy. For a list of all supported and recommended VM sizes for each cluster type, see [Azure HDInsight supported node configurations](hdinsight-supported-node-configuration.md). Lastly, you can use a benchmarking process to test some sample workloads and check which SKU within that family is right for you.
 
 For more information on planning other aspects of your cluster such as selecting a storage type or cluster size, see [Capacity planning for HDInsight clusters](hdinsight-capacity-planning.md).
 
 ## VM properties and big data workloads
 
-The VM size and type determined by CPU processing power, RAM size, and network latency:
+The VM size and type are determined by CPU processing power, RAM size, and network latency:
 
 - CPU: The VM size dictates the number of cores. The more cores, the greater the degree of parallel computation each node can achieve. Also, some VM types have faster cores.
 
@@ -40,7 +40,7 @@ Virtual machine families in Azure are optimized to suit different use cases. In
 
 ## Cost saving VM types for light workloads
 
-If you have light processing requirements, the [F-series](https://azure.microsoft.com/blog/f-series-vm-size/) can be a good choice to get started with HDInsight. At a lower per-hour list price, the F-series are the best value in price-performance in the Azure portfolio based on the Azure Compute Unit (ACU) per vCPU.
+If you have light processing requirements, the [F-series](https://azure.microsoft.com/blog/f-series-vm-size/) can be a good choice to get started with HDInsight. At a lower per-hour list price, the F-series is the best value in price-performance in the Azure portfolio based on the Azure Compute Unit (ACU) per vCPU.
 
 The following table describes the cluster types and node types, which can be created with the Fsv2-series VMs.
 
articles/hdinsight/interactive-query/hdinsight-connect-hive-zeppelin.md

Lines changed: 1 addition & 1 deletion
@@ -61,7 +61,7 @@ An HDInsight Interactive Query cluster. See [Create cluster](../hadoop/apache-ha
     limit ${total_count=10}
     ```
 
-When you Compare the traditional Hive, the query results come back must faster.
+Compared to the traditional Hive, the query results come back much faster.
 
 ### More examples
 
