articles/modeling-simulation-workbench/tutorial-install-slurm.md
# Tutorial: Install the Slurm workload manager in the Azure Modeling and Simulation Workbench
The [Slurm](https://slurm.schedmd.com/overview.html) Workload Manager is a scheduler used in microelectronics design and other high-performance computing scenarios to manage jobs across compute clusters. The Modeling and Simulation Workbench can be deployed with a range of high-performance virtual machines (VMs) ideal for large, compute-intensive workloads. Slurm clusters consist of a *controller node* that manages, stages, and schedules jobs bound for the *compute nodes*. Compute nodes are where the actual workloads are performed. A *node* is an individual element of the cluster, such as a VM.
The Slurm installation package is already available on all Modeling and Simulation Workbench Chamber VMs. This tutorial shows you how to create VMs for your Slurm cluster and install Slurm.
In this tutorial, you learn how to:
> * Create an inventory of VMs
> * Designate controller and compute nodes and install Slurm on each
If you don’t have an Azure subscription, [create a free account](https://azure.microsoft.com/free/?WT.mc_id=A261C142F).
## Prerequisites
## Sign in to the Azure portal and navigate to your workbench
If you aren't already signed into the Azure portal, go to [https://portal.azure.com](https://portal.azure.com). Navigate to your workbench, then the chamber where you'll create your Slurm cluster.
## Create a cluster for Slurm
Slurm requires one node to serve as the controller and a set of compute nodes where workloads execute. The controller is traditionally a modestly sized VM. The controller isn't used for computational workloads and is left deployed between jobs, while the compute nodes themselves are typically sized for a specific task and often deleted after the job. Learn about the different VMs available in Modeling and Simulation Workbench on the [VM Offerings page](./concept-vm-offerings.md).
### Create the Slurm controller node
1. From the chamber overview page, select **Chamber VM** from the **Settings** menu, then select either the **+ Create** button on the action menu along the top or the blue **Create chamber VM** button in the center of the page.
:::image type="content" source="media/tutorial-slurm/create-chamber-vm.png" alt-text="Screenshot of chamber VM overview page with Chamber VM in Settings and the create options on the page highlighted by red outlines.":::
1. On the **Create chamber VM** page:
* Enter a **Name** for the VM. We recommend choosing a name that indicates it is the controller node.
* Select a VM size. For the controller, you can select the smallest VM available. The *D4s_v4* is currently the smallest.
* Leave the **Chamber VM image type** and **Chamber VM count** as the default of *Semiconductor* and *1*.
* Select **Review + create**.
:::image type="content" source="media/tutorial-slurm/configure-create-chamber-vm.png" alt-text="Screenshot of create chamber VM page with the name and VM size textboxes and the create button highlighted in red outline.":::
1. After the validation check passes, select the **Create** button.
Once the VM deploys, it's available in the connector desktop dashboard.
### Create a Slurm compute cluster
A *cluster* is a collection of VMs, individually referred to as *nodes*, that perform the actual work. The compute nodes have their workloads dispatched and managed by the controller node. Similar to the steps you took when you created the controller, return to the **Chamber VM** page to create a cluster. The Modeling and Simulation Workbench allows you to create multiple, identical VMs in a single step.
1. On the **Create chamber VM** page:
* Enter a **Name** for the VM cluster. Use a name that identifies these VMs as compute nodes. For example, include the word "node" or the type of workload somewhere in the name.
* Select a VM appropriately sized for the workload. Refer to the [VM Offerings](concept-vm-offerings.md) page for guidance on VM offerings, capabilities, features, and sizes.
* Leave the **Chamber VM image type** as the default of *Semiconductor*.
* In the **Chamber VM count** box, enter the number of nodes required.
* Select **Review + create**.
1. After the validation check passes, select the **Create** button.
VMs are deployed in parallel and appear in the dashboard. It isn't typically necessary to access worker nodes individually; however, you can SSH to worker nodes in the same chamber if needed. In the next steps, you'll configure the compute nodes from the controller.
### Connect to the controller node desktop
Slurm installation is performed from the controller node.
1. Navigate to the connector. From the **Settings** menu of the chamber, select **Connector**. Select the sole connector that appears in the resource list.
:::image type="content" source="media/tutorial-slurm/connector-overview.png" alt-text="Screenshot of connector overview page with Connector in Settings and the target connector highlighted with a red rectangle.":::
1. From the connector page, select the **Desktop dashboard** URL.
1. The desktop dashboard opens. Select your controller VM.
## Create an inventory of VMs
Slurm installation requires that you have a technical inventory of the compute nodes, as well as their host names.
### Get a list of deployed VMs
Configuring Slurm requires an inventory of nodes. From the controller node:
1. Open a terminal in your desktop by selecting the terminal icon from the menu bar at the top.
:::image type="content" source="media/tutorial-slurm/open-terminal.png" alt-text="Screenshot of desktop with terminal button highlighted in red.":::
1. Execute the following commands to print a list of all VMs in the chamber. In this example, we have one controller and five compute nodes. The commands print the IP addresses in the first column and the hostnames in the second. From the naming, you can see the controller node and the compute nodes.
```bash
$ ip=$(hostname -i | cut -d'.' -f1-3)
10.163.4.9 wrkldvmslurm-nod034b970
```
1. Create a file with just the worker nodes, one host per line, and call it *slurm_worker.txt*. For the remaining steps of this tutorial, you'll use this list to configure the compute nodes from your controller. In some steps, the nodes need to be in a comma-delimited format. In those instances, we use a command-line shortcut to format the list without having to create a new file. To create *slurm_worker.txt*, remove the IP addresses in the first column and the controller node, which is listed first; one way to do this is sketched below.
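If you capture the listing from the previous step into a hypothetical file named *all_nodes.txt* (not part of the tutorial), building *slurm_worker.txt* might look like this minimal sketch:

```bash
# Keep only the hostname column and drop the first line (the controller)
awk '{print $2}' all_nodes.txt | tail -n +2 > slurm_worker.txt

# Verify: one compute-node hostname per line
cat slurm_worker.txt
```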
### Gather technical specifications about the compute nodes
Assuming that you created all the worker nodes in your cluster using the same VM size, choose any node to retrieve technical information about the platform. In this example, we use `head` to grab the first host name in the compute node list and `ssh` to execute the `lscpu` command on it:
```bash
$ ssh `head -1 ./slurm_worker.txt` lscpu
```

From the `lscpu` output, note these values:
* **Core(s) per socket**
* **Thread(s) per core**
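As a sketch, you can also filter the `lscpu` output down to just these topology fields, plus **Socket(s)**, which a Slurm node definition typically uses as well:

```bash
# Show only the CPU topology fields needed to define the node in Slurm
ssh `head -1 ./slurm_worker.txt` lscpu | grep -E 'Socket\(s\)|Core\(s\) per socket|Thread\(s\) per core'
```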
Slurm also requires an estimate of available memory on the compute nodes. To obtain the available memory of a worker node, execute the `free` command on any of the compute nodes from your controller and note the **available** memory reported in the output. Again, we target the first worker node in our list with the `head` command and submit the command via `ssh`.
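A minimal sketch of that query, reusing *slurm_worker.txt* from the earlier step:

```bash
# Query the first compute node in the worker list for its memory
ssh `head -1 ./slurm_worker.txt` free
```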
Note the available memory listed in the **available** column.
## Install Slurm on your cluster
### Prerequisite: Install MariaDB
Slurm requires MariaDB, the MySQL fork, to be installed from the Red Hat repository before Slurm itself can be installed. Azure maintains a private Red Hat repository mirror, and chamber VMs have access to this repository. Install and configure MariaDB with the following commands:
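As a minimal sketch, on a dnf-based Red Hat image the installation typically looks like the following; the package and service names here are assumptions, and the tutorial's exact commands may differ:

```bash
# Install the MariaDB server package from the chamber's Red Hat repository mirror
sudo dnf install -y mariadb-server

# Start the database service and enable it at boot
sudo systemctl enable --now mariadb

# Run the interactive hardening script covered in the next step
sudo mysql_secure_installation
```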
The *mysql_secure_installation* script asks for more configuration.
* The default database password isn't set. Press **Enter** when asked for the current password.
* Enter *Y* when asked to set the root password. Create a new, secure root password for MariaDB, take note of it for later, then reenter it to confirm. You need this password when you configure the Slurm controller in the following step.
* Enter *Y* for the remaining questions for:
* Reloading privilege tables
* Removing anonymous users
* Disabling remote root login
### Install Slurm on the controller
The Modeling and Simulation Workbench provides a setup script to speed installation. It requires the parameters you collected earlier in this tutorial. Replace the placeholders with the parameters you collected and execute these commands on the controller node. The \<clusternodes\> placeholder is a comma-separated list of hostnames with no spaces. The examples include a shortcut that reformats your compute node list in *slurm_worker.txt* into the proper comma-delimited format so that you don't have to create another file. The format of the *sdwChamberSlurm.sh* script invocation is as follows:
For this example, we use the list of nodes we created in the previous steps and substitute our values collected during discovery. The `paste` command is used to reformat the list of worker nodes into the comma-delimited format without needing to create a new file.
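As a sketch, the shortcut relies on `paste` to join the file into a single comma-delimited string; the variable name here is illustrative and not part of the tutorial's script:

```bash
# Join the one-hostname-per-line file into a comma-delimited list with no spaces
nodes=$(paste -s -d, ./slurm_worker.txt)
echo "$nodes"
```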
> If your installation shows an [ERROR] message in these steps, check that you haven't mistyped or misplaced any parameter. Review your information and repeat the step.
### Install Slurm on compute nodes
Slurm must now be installed on the compute nodes. To ease this task, use your home directory, which is mounted on all VMs, to distribute the files and scripts used.
From your user account, copy the *munge.key* file to your home directory.
```bash
cd
sudo cp /etc/munge/munge.key .
```
Create a script named *node-munge.sh* to set up each node's **munge** settings. This script should be in your home directory.
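A minimal sketch of what such a script typically contains follows; treat it as an assumption about *node-munge.sh* rather than the tutorial's exact contents:

```bash
#!/bin/sh
# node-munge.sh - sketch of a per-node munge setup (assumed content)

# Copy the shared key staged alongside this script in the shared home directory
cp "$(dirname "$0")/munge.key" /etc/munge/munge.key

# munge requires strict ownership and permissions on its key
chown munge:munge /etc/munge/munge.key
chmod 400 /etc/munge/munge.key

# Start the authentication daemon and enable it at boot
systemctl enable --now munge
```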
Using the same file of node hostnames that you used previously, execute the bash script you created on each node.
```bash
$ for host in `cat ./slurm_worker.txt`; do ssh $host sudo sh ~/node-munge.sh; done
Complete!
```
> [!IMPORTANT]
> After configuring the compute nodes, be sure to delete the *munge.key* file from your home directory.
## Validate installation
To validate that Slurm installed successfully, a Chamber Admin can execute the `sinfo` command on any Slurm node, either on the controller or on a compute node.
```bash
$ sinfo
PARTITION AVAIL TIMELIMIT NODES STATE NODELIST
chamberSlurmPartition1* up infinite 5 idle wrkldvmslurm-nod0aef63d,wrkldvmslurm-nod034b970...
```
You can validate execution on compute nodes by sending a simple command using the `srun` command.
```shell
$ srun --nodes=3 hostname && srun sleep 30
wrkldvmslurm-nod034b970
wrkldvmslurm-nod0aef63d
wrkldvmslurm-nod10870ad
```
If a job shows as *queued*, run `squeue` to list the job queue.
```shell
$ squeue
```
## Troubleshooting
If a node's state is reported as *down* or *drain*, the `scontrol` command can return it to service. Follow that with the `sinfo` command to verify the node's state.
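For example, a sketch of returning a node to service; the node name is taken from the earlier listing:

```bash
# Clear the down/drain state and return the node to service
sudo scontrol update nodename=wrkldvmslurm-nod034b970 state=resume

# Confirm the node now reports as idle
sinfo
```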