
Commit 7245fe3

Author: Sreekanth Iyer (Ushta Te Consultancy Services)
Commit message: Improved Correctness Score
1 parent 320ad81, commit 7245fe3

7 files changed: +25 -26 lines changed

articles/hdinsight/hadoop/apache-hadoop-linux-create-cluster-get-started-portal.md

Lines changed: 4 additions & 4 deletions
@@ -41,9 +41,9 @@ In this section, you create a Hadoop cluster in HDInsight using the Azure portal
|Region | From the drop-down list, select a region where the cluster is created. Choose a location closer to you for better performance. |
|Cluster type| Select **Select cluster type**. Then select **Hadoop** as the cluster type.|
|Version|From the drop-down list, select a **version**. Use the default version if you don't know what to choose.|
-|Cluster login username and password | The default login name is **admin**. The password must be at least 10 characters in length and must contain at least one digit, one uppercase, and one lower case letter, one non-alphanumeric character (except characters ```' ` "```). Make sure you **do not provide** common passwords such as "Pass@word1".|
+|Cluster sign in username and password | The default sign in name is **admin**. The password must be at least 10 characters in length and must contain at least one digit, one uppercase, and one lower case letter, one nonalphanumeric character (except characters ```' ` "```). Make sure you **do not provide** common passwords such as "Pass@word1".|
|Secure Shell (SSH) username | The default username is `sshuser`. You can provide another name for the SSH username. |
-|Use cluster login password for SSH| Select this check box to use the same password for SSH user as the one you provided for the cluster login user.|
+|Use cluster sign in password for SSH| Select this check box to use the same password for SSH user as the one you provided for the cluster sign in user.|

:::image type="content" source="./media/apache-hadoop-linux-create-cluster-get-started-portal/azure-portal-cluster-basics.png" alt-text="HDInsight Linux get started provide cluster basic values." border="true":::

@@ -115,7 +115,7 @@ In this section, you create a Hadoop cluster in HDInsight using the Azure portal

:::image type="content" source="./media/apache-hadoop-linux-create-cluster-get-started-portal/hdinsight-linux-hive-view-save-results.png" alt-text="Save result of Apache Hive query." border="true":::

-After you've completed a Hive job, you can [export the results to Azure SQL Database or SQL Server database](apache-hadoop-use-sqoop-mac-linux.md), you can also [visualize the results using Excel](apache-hadoop-connect-excel-power-query.md). For more information about using Hive in HDInsight, see [Use Apache Hive and HiveQL with Apache Hadoop in HDInsight to analyze a sample Apache log4j file](hdinsight-use-hive.md).
+After you've completed a Hive job, you can [export the results to Azure SQL Database or SQL Server database](apache-hadoop-use-sqoop-mac-linux.md), you can also [visualize the results using Excel](apache-hadoop-connect-excel-power-query.md). For more information about using Hive in HDInsight, see [Use Apache Hive and HiveQL with Apache Hadoop in HDInsight to analyze a sample Apache Log4j file](hdinsight-use-hive.md).

## Clean up resources

@@ -130,7 +130,7 @@ After you complete the quickstart, you may want to delete the cluster. With HDIn

:::image type="content" source="./media/apache-hadoop-linux-create-cluster-get-started-portal/hdinsight-delete-cluster.png" alt-text="Azure HDInsight delete cluster." border="true":::

-2. If you want to delete the cluster as well as the default storage account, select the resource group name (highlighted in the previous screenshot) to open the resource group page.
+2. If you want to delete the cluster and the default storage account, select the resource group name (highlighted in the previous screenshot) to open the resource group page.

3. Select **Delete resource group** to delete the resource group, which contains the cluster and the default storage account. Note deleting the resource group deletes the storage account. If you want to keep the storage account, choose to delete the cluster only.

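The clean-up steps in this hunk can also be done from the Azure CLI. A minimal sketch, with hypothetical cluster and resource group names (not values from the article):

```bash
# Delete only the HDInsight cluster, keeping the resource group and its storage account
az hdinsight delete --name myhdicluster --resource-group myresourcegroup

# Or delete the whole resource group, which removes the cluster and the default storage account
az group delete --name myresourcegroup --yes --no-wait
```
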
articles/hdinsight/hdinsight-hadoop-use-data-lake-storage-gen2-azure-cli.md

Lines changed: 1 addition & 1 deletion
@@ -6,7 +6,7 @@ ms.author: sairamyeturi
ms.service: hdinsight
ms.topic: how-to
ms.custom: hdinsightactive, devx-track-azurecli
-ms.date: 08/21/2023
+ms.date: 07/24/2024
---

# Create a cluster with Data Lake Storage Gen2 using Azure CLI

articles/hdinsight/hdinsight-hadoop-use-data-lake-storage-gen2.md

Lines changed: 1 addition & 1 deletion
@@ -32,7 +32,7 @@ Use the following links for detailed instructions on how to create HDInsight clu

## Access control for Data Lake Storage Gen2 in HDInsight

-### What kinds of permissions does Data Lake Storage Gen2 support?
+### What kinds of permissions do Data Lake Storage Gen2 support?

Data Lake Storage Gen2 uses an access control model that supports both Azure role-based access control (Azure RBAC) and POSIX-like access control lists (ACLs).

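As an illustration of the ACL side of that model, a POSIX-style ACL can be set on a Data Lake Storage Gen2 path with the Azure CLI. A sketch only; the account, file system, and path names below are placeholders:

```bash
# Grant the owning user rwx, the owning group r-x, and no access to others on one directory
az storage fs access set \
  --acl "user::rwx,group::r-x,other::---" \
  --path "example/data" \
  --file-system "myfilesystem" \
  --account-name "mystorageaccount" \
  --auth-mode login
```
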
articles/hdinsight/hdinsight-upload-data.md

Lines changed: 1 addition & 1 deletion
@@ -55,7 +55,7 @@ Because the default file system for HDInsight is in Azure Storage, /example/data

`wasbs:///example/data/data.txt`

-or
+Or

`wasbs://<ContainerName>@<StorageAccountName>.blob.core.windows.net/example/data/davinci.txt`

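A small sketch of how either URI form from this hunk can be used with `hdfs dfs` on a cluster node; `<ContainerName>` and `<StorageAccountName>` are the same placeholders used in the article:

```bash
# Short form: resolves against the cluster's default file system (default container)
hdfs dfs -ls wasbs:///example/data/

# Fully qualified form: names the container and storage account explicitly
hdfs dfs -ls wasbs://<ContainerName>@<StorageAccountName>.blob.core.windows.net/example/data/
```
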
articles/hdinsight/interactive-query/apache-hive-migrate-workloads.md

Lines changed: 7 additions & 7 deletions
@@ -183,7 +183,7 @@ To convert external table (non-ACID) to Managed (ACID) table,

**Scenario 1**

-Consider table rt is external table (non-ACID). If the table is non-ORC table,
+Consider table `rt` is external table (non-ACID). If the table is non-ORC table,

```
alter table rt set TBLPROPERTIES ('transactional'='true');
@@ -199,7 +199,7 @@ ERROR:
Error: Error while processing statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. Unable to alter table. work.rt can't be declared transactional because it's an external table (state=08S01,code=1)
```

-This error is occurring because the table rt is external table and you can't convert external table to ACID.
+This error is occurring because the table `rt` is external table and you can't convert external table to ACID.

**Scenario 3**

@@ -432,13 +432,13 @@ In certain situations when running a Hive query, you might receive `java.lang.Cl
```
The update command is to update the details manually in the backend DB and the alter command is used to alter the table with the new SerDe class from beeline or Hive.

-### Hive Backend DB schema compare Script
+### Hive Backend DB schema compares Script

You can run the following script after completing the migration.

There's a chance of missing few columns in the backend DB, which causes the query failures. If the schema upgrade wasn't happened properly, then there's chance that we may hit the invalid column name issue. The below script fetches the column name and datatype from customer backend DB and provides the output if there's any missing column or incorrect datatype.

-The following path contains the schemacompare_final.py and test.csv file. The script is present in "schemacompare_final.py" file and the file "test.csv" contains all the column name and the datatype for all the tables, which should be present in the hive backend DB.
+The following path contains the schemacompare_final.py and test.csv file. The script is present in `schemacompare_final.py` file and the file "test.csv" contains all the column name and the datatype for all the tables, which should be present in the hive backend DB.

https://hdiconfigactions2.blob.core.windows.net/hiveschemacompare/schemacompare_final.py
@@ -448,11 +448,11 @@ Download these two files from the link. And copy these files to one of the head

**Steps to execute the script:**

-Create a directory called "schemacompare" under "/tmp" directory.
+Create a directory called `schemacompare` under "/tmp" directory.

Put the "schemacompare_final.py" and "test.csv" into the folder "/tmp/schemacompare". Do "ls -ltrh /tmp/schemacompare/" and verify whether the files are present.

-To execute the Python script, use the command "python schemacompare_final.py". This script starts executing the script and it takes less than five minutes to complete. The above script automatically connects to your backend DB and fetches the details from each and every table, which Hive uses and update the details in the new csv file called "return.csv". After creating the file return.csv, it compares the data with the file "test.csv" and prints the column name or datatype if there's anything missing under the tablename.
+To execute the Python script, use the command "python schemacompare_final.py". This script starts executing the script and it takes less than five minutes to complete. The above script automatically connects to your backend DB and fetches the details from each and every table, which Hive uses and update the details in the new csv file called "return.csv". After you create the file return.csv, it compares the data with the file "test.csv" and prints the column name or datatype if there's anything missing under the tablename.

Once after executing the script you can see the following lines, which indicate that the details are fetched for the tables and the script is in progressing

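The steps in this hunk condense to a few shell commands on a head node. A sketch under the assumption that `wget` is available; `test.csv` must be copied in from its companion link in the article, which is not shown in this diff:

```bash
# Create the working directory the article expects
mkdir -p /tmp/schemacompare
cd /tmp/schemacompare

# Fetch the script from the URL quoted above, and copy test.csv into the same folder
wget https://hdiconfigactions2.blob.core.windows.net/hiveschemacompare/schemacompare_final.py
ls -ltrh /tmp/schemacompare/    # verify both files are present

# Run the comparison; it writes return.csv and reports any missing column or datatype mismatch
python schemacompare_final.py
```
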
@@ -550,7 +550,7 @@ Tune Metastore to reduce their CPU usage.
1. New value: `false`

1. Optimize the partition repair feature
-1. Disable partition repair - This feature is used to synchronize the partitions of Hive tables in storage location with Hive metastore. You may disable this feature if msck repair is used after the data ingestion.
+1. Disable partition repair - This feature is used to synchronize the partitions of Hive tables in storage location with Hive metastore. You may disable this feature if `msck repair` is used after the data ingestion.
1. To disable the feature **add "discover.partitions=false"** under table properties using ALTER TABLE.
OR (if the feature can't be disabled)
1. Increase the partition repair frequency.
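
To illustrate the two options in that list, both the table property and the manual repair can be issued from Beeline. A sketch only; `mytable` is a placeholder and the connection string is the usual head-node form, so adjust it for your cluster:

```bash
# Option 1: turn off automatic partition discovery for one table
beeline -u "jdbc:hive2://headnodehost:10001/;transportMode=http" \
  -e "ALTER TABLE mytable SET TBLPROPERTIES ('discover.partitions'='false');"

# Then synchronize partitions manually after each data ingestion
beeline -u "jdbc:hive2://headnodehost:10001/;transportMode=http" \
  -e "MSCK REPAIR TABLE mytable;"
```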

articles/hdinsight/interactive-query/hive-default-metastore-export-import.md

Lines changed: 8 additions & 9 deletions
@@ -14,13 +14,13 @@ This article shows how to migrate metadata from a [default metastore DB](../hdin

## Why migrate to external metastore DB

-* Default metastore DB is limited to basic SKU and cannot handle production scale workloads.
+* Default metastore DB is limited to basic SKU and can't handle production scale workloads.

* External metastore DB enables customer to horizontally scale Hive compute resources by adding new HDInsight clusters sharing the same metastore DB.

-* For HDInsight 3.6 to 4.0 migration, it is mandatory to migrate metadata to external metastore DB before upgrading the Hive schema version. See [migrating workloads from HDInsight 3.6 to HDInsight 4.0](./apache-hive-migrate-workloads.md).
+* For HDInsight 3.6 to 4.0 migration, it's mandatory to migrate metadata to external metastore DB before upgrading the Hive schema version. See [migrating workloads from HDInsight 3.6 to HDInsight 4.0](./apache-hive-migrate-workloads.md).

-Because the default metastore DB has limited compute capacity, we recommend low utilization from other jobs on the cluster while migrating metadata.
+Because the default metastore DB with limited compute capacity, we recommend low utilization from other jobs on the cluster while migrating metadata.

Source and target DBs must use the same HDInsight version and the same Storage Accounts. If upgrading HDInsight versions from 3.6 to 4.0, complete the steps in this article first. Then, follow the official upgrade steps [here](./apache-hive-migrate-workloads.md).

@@ -33,7 +33,7 @@ The action is similar to replacing symlinks with their full paths.
|Property | Value |
|---|---|
|Bash script URI|`https://hdiconfigactions.blob.core.windows.net/linuxhivemigrationv01/hive-adl-expand-location-v01.sh`|
-|Node type(s)|Head|
+|Node types|Head|
|Parameters|""|

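For reference, a script action with those property values could be submitted to a running cluster from the Azure CLI. A sketch only, with hypothetical cluster, resource group, and action names; the script URI is the one from the table above and no parameters are passed, matching the table:

```bash
# Run the expand-location script on the head nodes of an existing cluster
az hdinsight script-action execute \
  --cluster-name myhdicluster \
  --resource-group myresourcegroup \
  --name hive-adl-expand-location \
  --script-uri "https://hdiconfigactions.blob.core.windows.net/linuxhivemigrationv01/hive-adl-expand-location-v01.sh" \
  --roles headnode
```
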
## Migrate with Export/Import using sqlpackage
@@ -51,8 +51,7 @@ An HDInsight cluster created only after 2020-10-15 supports SQL Export/Import fo
sudo python hive_metastore_tool.py --sqlpackagefile $SQLPACKAGE_FILE --targetfile $TARGET_FILE
```

-3. Save the BACPAC file. Below is an option.
-
+3. Save the BACPAC file.
```bash
hdfs dfs -mkdir -p /bacpacs
hdfs dfs -put $TARGET_FILE /bacpacs/
6463

6564
## Migrate using Hive script
6665

67-
Clusters created before 2020-10-15 do not support export/import of the default metastore DB.
66+
Clusters created before 2020-10-15 don't support export/import of the default metastore DB.
6867
6968
For such clusters, follow the guide [Copy Hive tables across Storage Accounts](./hive-migration-across-storage-accounts.md), using a second cluster with an [external Hive metastore DB](../hdinsight-use-external-metadata-stores.md#select-a-custom-metastore-during-cluster-creation). The second cluster can use the same storage account but must use a new default filesystem.
7069
7170
### Option to "shallow" copy
72-
Storage consumption would double when tables are "deep" copied using the above guide. You need to manually clean the data in the source storage container.
73-
We can, instead, "shallow" copy the tables if they are non-transactional. All Hive tables in HDInsight 3.6 are non-transactional by default, but only external tables are non-transactional in HDInsight 4.0. Transactional tables must be deep copied. Follow these steps to shallow copy non-transactional tables:
71+
Storage consumption would double when tables are "deep" copied using the guide. You need to manually clean the data in the source storage container.
72+
We can, instead, "shallow" copy the tables if they're nontransactional. All Hive tables in HDInsight 3.6 are nontransactional by default, but only external tables are nontransactional in HDInsight 4.0. Transactional tables must be deep copied. Follow these steps to shallow copy nontransactional tables:
7473

7574
1. Execute script [hive-ddls.sh](https://hdiconfigactions.blob.core.windows.net/linuxhivemigrationv01/hive-ddls.sh) on the source cluster's primary headnode to generate the DDL for every Hive table.
7675
2. The DDL is written to a local Hive script named `/tmp/hdi_hive_ddls.hql`. Execute this on the target cluster that uses an external Hive metastore DB.

articles/hdinsight/spark/apache-spark-overview.md

Lines changed: 3 additions & 3 deletions
@@ -43,7 +43,7 @@ Spark clusters in HDInsight offer a fully managed Spark service. Benefits of cre
| Ease creation |You can create a new Spark cluster in HDInsight in minutes using the Azure portal, Azure PowerShell, or the HDInsight .NET SDK. See [Get started with Apache Spark cluster in HDInsight](apache-spark-jupyter-spark-sql-use-portal.md). |
| Ease of use |Spark cluster in HDInsight include Jupyter Notebooks and Apache Zeppelin Notebooks. You can use these notebooks for interactive data processing and visualization. See [Use Apache Zeppelin notebooks with Apache Spark](apache-spark-zeppelin-notebook.md) and [Load data and run queries on an Apache Spark cluster](apache-spark-load-data-run-query.md).|
| REST APIs |Spark clusters in HDInsight include [Apache Livy](https://github.com/cloudera/hue/tree/master/apps/spark/java#welcome-to-livy-the-rest-spark-server), a REST API-based Spark job server to remotely submit and monitor jobs. See [Use Apache Spark REST API to submit remote jobs to an HDInsight Spark cluster](apache-spark-livy-rest-interface.md).|
-| Support for Azure Storage | Spark clusters in HDInsight can use Azure Data Lake Storage Gen2 as both the primary storage or additional storage. . For more information on Data Lake Storage Gen2, see [Azure Data Lake Storage Gen2](../../storage/blobs/data-lake-storage-introduction.md).|
+| Support for Azure Storage | Spark clusters in HDInsight can use Azure Data Lake Storage Gen2 as both the primary storage or additional storage. For more information on Data Lake Storage Gen2, see [Azure Data Lake Storage Gen2](../../storage/blobs/data-lake-storage-introduction.md).|
| Integration with Azure services |Spark cluster in HDInsight comes with a connector to Azure Event Hubs. You can build streaming applications using the Event Hubs. Including Apache Kafka, which is already available as part of Spark. |
| Integration with third-party IDEs | HDInsight provides several IDE plugins that are useful to create and submit applications to an HDInsight Spark cluster. For more information, see [Use Azure Toolkit for IntelliJ IDEA](apache-spark-intellij-tool-plugin.md), [Use Spark & Hive Tools for VSCode](../hdinsight-for-vscode.md), and [Use Azure Toolkit for Eclipse](apache-spark-eclipse-tool-plugin.md).|
| Concurrent Queries |Spark clusters in HDInsight support concurrent queries. This capability enables multiple queries from one user or multiple queries from various users and applications to share the same cluster resources. |
@@ -75,15 +75,15 @@ The SparkContext can connect to several types of cluster managers, which give re

The SparkContext runs the user's main function and executes the various parallel operations on the worker nodes. Then, the SparkContext collects the results of the operations. The worker nodes read and write data from and to the Hadoop distributed file system. The worker nodes also cache transformed data in-memory as Resilient Distributed Datasets (RDDs).

-The SparkContext connects to the Spark master and is responsible for converting an application to a directed graph (DAG) of individual tasks. Tasks that get executed within an executor process on the worker nodes. Each application gets its own executor processes. Which stay up during the whole application and run tasks in multiple threads.
+The SparkContext connects to the Spark master and is responsible for converting an application to a directed graph (DAG) of individual tasks. Tasks that get executed within an executor process on the worker nodes. Each application gets its own executor processes, which stay up during the whole application and run tasks in multiple threads.

## Spark in HDInsight use cases

Spark clusters in HDInsight enable the following key scenarios:

### Interactive data analysis and BI

-Apache Spark in HDInsight stores data in Azure Blob Storage, Azure Data Lake Azure Data Lake Storage Gen2. Business experts and key decision makers can analyze and build reports over that data. And use Microsoft Power BI to build interactive reports from the analyzed data. Analysts can start from unstructured/semi structured data in cluster storage, define a schema for the data using notebooks, and then build data models using Microsoft Power BI. Spark clusters in HDInsight also support many third-party BI tools. Such as Tableau, making it easier for data analysts, business experts, and key decision makers.
+Apache Spark in HDInsight stores data in Azure Blob Storage and Azure Data Lake Storage Gen2. Business experts and key decision makers can analyze and build reports over that data. And use Microsoft Power BI to build interactive reports from the analyzed data. Analysts can start from unstructured/semi structured data in cluster storage, define a schema for the data using notebooks, and then build data models using Microsoft Power BI. Spark clusters in HDInsight also support many third-party BI tools. Such as Tableau, making it easier for data analysts, business experts, and key decision makers.

* [Tutorial: Visualize Spark data using Power BI](apache-spark-use-bi-tools.md)
