Merge pull request #113277 from dagiro/freshness_c63

ShannonLeavitt · web-flow · commit 6ae715dc1787 · 2020-04-29T16:08:33.000-06:00
freshness_c63
diff --git a/articles/hdinsight/hdinsight-hadoop-linux-information.md b/articles/hdinsight/hdinsight-hadoop-linux-information.md
@@ -5,9 +5,9 @@ author: hrasheed-msft
 ms.author: hrasheed
 ms.reviewer: jasonh
 ms.service: hdinsight
-ms.custom: hdinsightactive
+ms.custom: hdinsightactive,seoapr2020
 ms.topic: conceptual
-ms.date: 11/14/2019
+ms.date: 04/29/2020
 ---
 
 # Information about using HDInsight on Linux
@@ -90,21 +90,21 @@ Example data and JAR files can be found on Hadoop Distributed File System at `/e
 
 ## HDFS, Azure Storage, and Data Lake Storage
 
-In most Hadoop distributions, the data is stored in HDFS, which is backed by local storage on the machines in the cluster. Using local storage can be costly for a cloud-based solution where you're charged hourly or by minute for compute resources.
+In most Hadoop distributions, the data is stored in HDFS. HDFS is backed by local storage on the machines in the cluster. Using local storage can be costly for a cloud-based solution where you're charged by the hour or by the minute for compute resources.
 
-When using HDInsight, the data files are stored in a scalable and resilient way in the cloud using Azure Blob Storage and optionally Azure Data Lake Storage. These services provide the following benefits:
+When using HDInsight, the data files are stored in an adaptable and resilient way in the cloud using Azure Blob Storage and optionally Azure Data Lake Storage. These services provide the following benefits:
 
 * Cheap long-term storage.
 * Accessibility from external services such as websites, file upload/download utilities, various language SDKs, and web browsers.
-* Large file capacity and large scalable storage.
+* Large file capacity and large adaptable storage.
 
 For more information, see [Understanding blobs](https://docs.microsoft.com/rest/api/storageservices/understanding-block-blobs--append-blobs--and-page-blobs) and [Data Lake Storage](https://azure.microsoft.com/services/storage/data-lake-storage/).
 
-When using either Azure Storage or Data Lake Storage, you don't have to do anything special from HDInsight to access the data. For example, the following command lists files in the `/example/data` folder regardless of whether it's stored on Azure Storage or Data Lake Storage:
+When using either Azure Storage or Data Lake Storage, you don't have to do anything special from HDInsight to access the data. For example, the following command lists files in the `/example/data` folder whether it's stored on Azure Storage or Data Lake Storage:
 
     hdfs dfs -ls /example/data
 
-In HDInsight, the data storage resources (Azure Blob Storage and Azure Data Lake Storage) are decoupled from compute resources. Therefore, you can create HDInsight clusters to do computation as you need, and later delete the cluster when the work is finished, meanwhile keeping your data files persisted safely in cloud storage as long as you need.
+In HDInsight, the data storage resources (Azure Blob Storage and Azure Data Lake Storage) are decoupled from compute resources. You can create HDInsight clusters to do computation as you need, and later delete the cluster when the work is finished. Meanwhile, your data files stay safely persisted in cloud storage for as long as you need.
 
 ### <a name="URI-and-scheme"></a>URI and scheme
 
@@ -205,46 +205,11 @@ If using __Azure Data Lake Storage__, see the following links for ways that you
 
 ## <a name="scaling"></a>Scaling your cluster
 
-The cluster scaling feature allows you to dynamically change the number of data nodes used by a cluster. You can perform scaling operations while other jobs or processes are running on a cluster.  See also, [Scale HDInsight clusters](./hdinsight-scaling-best-practices.md)
-
-The different cluster types are affected by scaling as follows:
-
-* **Hadoop**: When scaling down the number of nodes in a cluster, some of the services in the cluster are restarted. Scaling operations can cause jobs running or pending to fail at the completion of the scaling operation. You can resubmit the jobs once the operation is complete.
-* **HBase**: Regional servers are automatically balanced within a few minutes, once the scaling operation completes. To manually balance regional servers, use the following steps:
-
-    1. Connect to the HDInsight cluster using SSH. For more information, see [Use SSH with HDInsight](hdinsight-hadoop-linux-use-ssh-unix.md).
-
-    2. Use the following to start the HBase shell:
-
-            hbase shell
-
-    3. Once the HBase shell has loaded, use the following to manually balance the regional servers:
-
-            balancer
-
-* **Storm**: You should rebalance any running Storm topologies after a scaling operation has been performed. Rebalancing allows the topology to readjust parallelism settings based on the new number of nodes in the cluster. To rebalance running topologies, use one of the following options:
-
-    * **SSH**: Connect to the server and use the following command to rebalance a topology:
-
-            storm rebalance TOPOLOGYNAME
-
-        You can also specify parameters to override the parallelism hints originally provided by the topology. For example, `storm rebalance mytopology -n 5 -e blue-spout=3 -e yellow-bolt=10` reconfigures the topology to 5 worker processes, 3 executors for the blue-spout component, and 10 executors for the yellow-bolt component.
-
-    * **Storm UI**: Use the following steps to rebalance a topology using the Storm UI.
-
-        1. Open `https://CLUSTERNAME.azurehdinsight.net/stormui` in your web browser, where `CLUSTERNAME` is the name of your Storm cluster. If prompted, enter the HDInsight cluster administrator (admin) name and password you specified when creating the cluster.
-        2. Select the topology you wish to rebalance, then select the **Rebalance** button. Enter the delay before the rebalance operation is performed.
-
-* **Kafka**: You should rebalance partition replicas after scaling operations. For more information, see the [High availability of data with Apache Kafka on HDInsight](./kafka/apache-kafka-high-availability.md) document.
-
-For specific information on scaling your HDInsight cluster, see:
-
-* [Manage Apache Hadoop clusters in HDInsight by using the Azure portal](hdinsight-administer-use-portal-linux.md#scale-clusters)
-* [Manage Apache Hadoop clusters in HDInsight by using Azure CLI](hdinsight-administer-use-command-line.md#scale-clusters)
+The cluster scaling feature allows you to dynamically change the number of data nodes used by a cluster. You can do scaling operations while other jobs or processes are running on a cluster. For more information, see [Scale HDInsight clusters](./hdinsight-scaling-best-practices.md).
 
 ## How do I install Hue (or other Hadoop component)?
 
-HDInsight is a managed service. If Azure detects a problem with the cluster, it may delete the failing node and create a node to replace it. If you manually install things on the cluster, they aren't persisted when this operation occurs. Instead, use [HDInsight Script Actions](hdinsight-hadoop-customize-cluster-linux.md). A script action can be used to make the following changes:
+HDInsight is a managed service. If Azure detects a problem with the cluster, it may delete the failing node and create a node to replace it. Anything you manually install on the cluster isn't persisted when this operation occurs. Instead, use [HDInsight Script Actions](hdinsight-hadoop-customize-cluster-linux.md). A script action can be used to make the following changes:
 
 * Install and configure a service or web site.
 * Install and configure a component that requires configuration changes on multiple nodes in the cluster.
@@ -253,7 +218,7 @@ Script Actions are Bash scripts. The scripts run during cluster creation, and ar
 
 ### Jar files
 
-Some Hadoop technologies are provided in self-contained jar files that contain functions used as part of a MapReduce job, or from inside Pig or Hive. They often don't require any setup, and can be uploaded to the cluster after creation and used directly. If you want to make sure the component survives reimaging of the cluster, you can store the jar file in the default storage for your cluster (WASB or ADL).
+Some Hadoop technologies provide self-contained jar files. These files contain functions used as part of a MapReduce job, or from inside Pig or Hive. They often don't require any setup, and can be uploaded to the cluster after creation and used directly. If you want to make sure the component survives reimaging of the cluster, store the jar file in the cluster's default storage.
 
 For example, if you want to use the latest version of [Apache DataFu](https://datafu.incubator.apache.org/), you can download a jar containing the project and upload it to the HDInsight cluster. Then follow the DataFu documentation on how to use it from Pig or Hive.
 
diff --git a/articles/hdinsight/hdinsight-scaling-best-practices.md b/articles/hdinsight/hdinsight-scaling-best-practices.md
@@ -7,7 +7,7 @@ ms.reviewer: jasonh
 ms.service: hdinsight
 ms.topic: conceptual
 ms.custom: seoapr2020
-ms.date: 04/23/2020
+ms.date: 04/29/2020
 ---
 
 # Scale Azure HDInsight clusters
@@ -69,28 +69,39 @@ The impact of changing the number of data nodes varies for each type of cluster
 
 * Apache Storm
 
-    You can seamlessly add or remove data nodes while Storm is running. However, after a successful completion of the scaling operation, you'll need to rebalance the topology.
-
-    Rebalancing can be accomplished in two ways:
+    You can seamlessly add or remove data nodes while Storm is running. However, after a successful completion of the scaling operation, you'll need to rebalance the topology. Rebalancing allows the topology to readjust [parallelism settings](https://storm.apache.org/documentation/Understanding-the-parallelism-of-a-Storm-topology.html) based on the new number of nodes in the cluster. To rebalance running topologies, use one of the following options:
 
   * Storm web UI
-  * Command-line interface (CLI) tool
 
-    For more information, see [Apache Storm documentation](https://storm.apache.org/documentation/Understanding-the-parallelism-of-a-Storm-topology.html).
+    Use the following steps to rebalance a topology using the Storm UI.
+
+    1. Open `https://CLUSTERNAME.azurehdinsight.net/stormui` in your web browser, where `CLUSTERNAME` is the name of your Storm cluster. If prompted, enter the HDInsight cluster administrator (admin) name and password you specified when creating the cluster.
+
+    1. Select the topology you wish to rebalance, then select the **Rebalance** button. Enter the delay before the rebalance operation is done.
+
+        ![HDInsight Storm scale rebalance](./media/hdinsight-scaling-best-practices/hdinsight-portal-scale-cluster-storm-rebalance.png)
+
+  * Command-line interface (CLI) tool
 
-    The Storm web UI is available on the HDInsight cluster:
+    Connect to the server and use the following command to rebalance a topology:
 
-    ![HDInsight Storm scale rebalance](./media/hdinsight-scaling-best-practices/hdinsight-portal-scale-cluster-storm-rebalance.png)
+    ```bash
+    storm rebalance TOPOLOGYNAME
+    ```
 
-    Here is an example CLI command to rebalance the Storm topology:
+    You can also specify parameters to override the parallelism hints originally provided by the topology. For example, the following command reconfigures the `mytopology` topology to use 5 worker processes, 3 executors for the `blue-spout` component, and 10 executors for the `yellow-bolt` component.
 
-    ```console
+    ```bash
     ## Reconfigure the topology "mytopology" to use 5 worker processes,
     ## the spout "blue-spout" to use 3 executors, and
     ## the bolt "yellow-bolt" to use 10 executors
     $ storm rebalance mytopology -n 5 -e blue-spout=3 -e yellow-bolt=10
     ```
 
+* Apache Kafka
+
+    You should rebalance partition replicas after scaling operations. For more information, see the [High availability of data with Apache Kafka on HDInsight](./kafka/apache-kafka-high-availability.md) document.
+
 ## How to safely scale down a cluster
 
 ### Scale down a cluster with running jobs
@@ -247,3 +258,8 @@ Region servers are automatically balanced within a few minutes after completing
 ## Next steps
 
 * [Automatically scale Azure HDInsight clusters](hdinsight-autoscale-clusters.md)
+
+For specific information on scaling your HDInsight cluster, see:
+
+* [Manage Apache Hadoop clusters in HDInsight by using the Azure portal](hdinsight-administer-use-portal-linux.md#scale-clusters)
+* [Manage Apache Hadoop clusters in HDInsight by using Azure CLI](hdinsight-administer-use-command-line.md#scale-clusters)