articles/batch/batch-pool-node-error-checking.md

Pool errors might be related to resize timeout or failure, automatic scaling failure, or pool deletion failure.

### Resize timeout or failure

When you create a new pool or resize an existing pool, you specify the target number of nodes. The create or resize operation completes immediately, but the actual allocation of new nodes or removal of existing nodes might take several minutes. You can specify the resize timeout in the [Pool - Add](/rest/api/batchservice/pool/add) or [Pool - Resize](/rest/api/batchservice/pool/resize) APIs. If Batch can't allocate the target number of nodes during the resize timeout period, the pool goes into a steady state and reports resize errors.

The [ResizeError](/rest/api/batchservice/pool/get#resizeerror) property lists the errors that occurred for the most recent evaluation.

Common causes for resize errors include:

- **Resize timeout too short.** Usually, the default timeout of 15 minutes is long enough to allocate or remove pool nodes. If you're allocating a large number of nodes, such as more than 1,000 nodes from an Azure Marketplace image, or more than 300 nodes from a custom virtual machine (VM) image, you can set the resize timeout to 30 minutes, as shown in the sketch after this list.
- **Insufficient core quota.** A Batch account is limited in the number of cores it can allocate across all pools, and stops allocating nodes once it reaches that quota. You can increase the core quota so Batch can allocate more nodes. For more information, see [Batch service quotas and limits](batch-quota-limit.md).
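
For example, the following minimal Python sketch resizes a pool with an explicit 30-minute resize timeout and then reads any resize errors once the pool reaches steady state. It assumes the [azure-batch](https://pypi.org/project/azure-batch/) SDK; the account URL, key, pool ID, and node count are placeholders.

```python
import time
from datetime import timedelta

from azure.batch import BatchServiceClient
from azure.batch.batch_auth import SharedKeyCredentials
import azure.batch.models as batchmodels

credentials = SharedKeyCredentials("<account-name>", "<account-key>")
# In older SDK versions this keyword is base_url instead of batch_url.
batch_client = BatchServiceClient(
    credentials, batch_url="https://<account>.<region>.batch.azure.com"
)

pool_id = "<pool-id>"

# Request a large resize with an explicit 30-minute resize timeout.
batch_client.pool.resize(
    pool_id,
    batchmodels.PoolResizeParameter(
        target_dedicated_nodes=1000,
        resize_timeout=timedelta(minutes=30),
    ),
)

# Wait for the pool to reach steady state, then check for resize errors.
pool = batch_client.pool.get(pool_id)
while pool.allocation_state != batchmodels.AllocationState.steady:
    time.sleep(30)
    pool = batch_client.pool.get(pool_id)

for err in pool.resize_errors or []:
    print(f"Resize error: {err.code}: {err.message}")
```
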

### Pool deletion failures

If the pool deletion is taking longer than expected, Batch retries periodically. The following issues can block or delay pool deletion:

- Resource locks might be placed on Batch-created resources, or on network resources that Batch uses.
- Resources that you created might depend on a Batch-created resource. For instance, if you [create a pool in a virtual network](batch-virtual-network.md), Batch creates a network security group (NSG), a public IP address, and a load balancer. If you use these resources outside the pool, you must remove that dependency to delete the pool.
- The `Microsoft.Batch` resource provider might be unregistered from the subscription that contains your pool.
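
If you suspect the last cause, the following sketch checks and re-registers the `Microsoft.Batch` resource provider. It assumes the `azure-identity` and `azure-mgmt-resource` packages; the subscription ID is a placeholder.

```python
from azure.identity import DefaultAzureCredential
from azure.mgmt.resource import ResourceManagementClient

client = ResourceManagementClient(DefaultAzureCredential(), "<subscription-id>")

# Check the registration state of the Microsoft.Batch resource provider.
provider = client.providers.get("Microsoft.Batch")
print(f"Microsoft.Batch registration state: {provider.registration_state}")

# Re-register the provider if it was unregistered.
if provider.registration_state != "Registered":
    client.providers.register("Microsoft.Batch")
```
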

### Start task failures

As with any task, there can be many causes for a start task failure. To troubleshoot, check the *stdout*, *stderr*, and any other task-specific log files.
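
For example, the following sketch reads a start task's *stdout* and *stderr* from a node. It assumes the `batch_client` from the earlier resize sketch; the pool and node IDs are placeholders.

```python
pool_id, node_id = "<pool-id>", "<node-id>"

# Start task output is written under the node's "startup" folder.
for file_path in ("startup/stdout.txt", "startup/stderr.txt"):
    content = b"".join(
        batch_client.file.get_from_compute_node(pool_id, node_id, file_path)
    )
    print(f"--- {file_path} ---")
    print(content.decode("utf-8", errors="replace"))
```
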

Start tasks must be re-entrant, because the start task can run multiple times on the same node, for example when the node is reimaged or rebooted. In rare cases, when a start task runs after an event causes a node reboot, one operating system (OS) or ephemeral disk reimages while the other doesn't. Since Batch start tasks and all Batch tasks run from the ephemeral disk, this situation isn't usually a problem. However, in some cases where the start task installs an application to the OS disk and keeps other data on the ephemeral disk, there can be sync problems. Protect your application accordingly if you use both disks.

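As an illustration, the following sketch defines a start task whose command line can safely run more than once on the same node. It assumes the `batch_client` from the earlier sketch; the install script and marker file are hypothetical.

```python
import azure.batch.models as batchmodels

start_task = batchmodels.StartTask(
    # Skip the install if a previous run on this node already completed it.
    command_line=(
        "/bin/bash -c 'if [ ! -f installed.marker ]; then "
        "./install-app.sh && touch installed.marker; fi'"
    ),
    wait_for_success=True,   # hold task scheduling until the start task succeeds
    max_task_retry_count=2,  # retry transient failures before failing the node
)

# Apply the start task to an existing pool.
batch_client.pool.patch(
    "<pool-id>", batchmodels.PoolPatchParameter(start_task=start_task)
)
```
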
### Application package download failure
### Node in unusable state

Batch might set the [computeNodeState](/rest/api/batchservice/computenode/get#computenodestate) to `unusable` for many reasons. You can't schedule tasks to an `unusable` node, but the node still incurs charges.

If Batch can determine the cause, the [computeNodeError](/rest/api/batchservice/computenode/get#computenodeerror) property reports it. If a node is in an `unusable` state but has no [computeNodeError](/rest/api/batchservice/computenode/get#computenodeerror), Batch can't communicate with the VM. In this case, Batch always tries to recover the VM. However, Batch doesn't automatically attempt to recover VMs that failed to install application packages or containers, even if their state is `unusable`.

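For example, this sketch lists the nodes in a pool and prints any errors reported on `unusable` nodes. It assumes the `batch_client` from the earlier sketch; the pool ID is a placeholder.

```python
import azure.batch.models as batchmodels

pool_id = "<pool-id>"

for node in batch_client.compute_node.list(pool_id):
    if node.state == batchmodels.ComputeNodeState.unusable:
        if node.errors:
            for err in node.errors:
                print(f"{node.id}: {err.code}: {err.message}")
        else:
            # No computeNodeError reported: Batch can't reach the VM and
            # keeps trying to recover it.
            print(f"{node.id}: unusable with no reported error")
```
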

Other possible causes for `unusable` nodes include:

- A custom VM image is invalid. For example, the image isn't properly prepared.
- A VM is moved because of an infrastructure failure or a low-level upgrade. Batch recovers the node.
- A VM image has been deployed on hardware that doesn't support it. For example, a CentOS HPC image is deployed on a [Standard_D1_v2](/azure/virtual-machines/dv2-dsv2-series) VM.
- The VMs are in an [Azure virtual network](batch-virtual-network.md), and traffic to key ports is blocked.
- The VMs are in a virtual network, but outbound traffic to Azure Storage is blocked.
- The VMs are in a virtual network with a custom DNS configuration, and the DNS server can't resolve Azure Storage.

### Node agent log files

The Batch agent process that runs on each pool node provides log files that might help if you need to contact support about a pool node issue. You can upload log files for a node via the Azure portal, Batch Explorer, or the [Compute Node - Upload Batch Service Logs](/rest/api/batchservice/computenode/uploadbatchservicelogs) API. Upload and save the log files and then delete the node or pool to save the cost of running the nodes.

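For example, the following sketch triggers a log upload for one node. It assumes the `batch_client` from the earlier sketch; the SAS container URL and IDs are placeholders, and the container must grant the Batch service write access.

```python
from datetime import datetime, timedelta

import azure.batch.models as batchmodels

result = batch_client.compute_node.upload_batch_service_logs(
    "<pool-id>",
    "<node-id>",
    batchmodels.UploadBatchServiceLogsConfiguration(
        container_url="https://<storage-account>.blob.core.windows.net/<container>?<sas-token>",
        # Upload agent logs from the last four hours.
        start_time=datetime.utcnow() - timedelta(hours=4),
    ),
)
print(f"Uploading {result.number_of_files_uploaded} files "
      f"to {result.virtual_directory_name}")
```
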
### Node disk full

Batch uses the temporary drive on a node pool VM to store job files, task files, and shared files, such as:

- Application package files
- Task resource files
- Application-specific files downloaded to one of the Batch folders
- *Stdout* and *stderr* files for each task application execution
- Application-specific output files

Some of these files, such as pool application packages or pool start task resource files, are written only once, when Batch creates the pool nodes. Even though they're written only once, if these files are too large, they could fill the temporary drive.

Other files, such as *stdout* and *stderr*, are written for each task that a node runs. If a large number of tasks run on the same node, or the task files are too large, they could fill the temporary drive.
The node also needs a small amount of space on the OS disk to create users after it starts.
The size of the temporary drive depends on the VM size. One consideration when picking a VM size is to ensure that the temporary drive has enough space for the planned workload.

When you add a pool in the Azure portal, you can display the full list of VM sizes, including a **Resource disk size** column. The articles that describe VM sizes have tables with a **Temp Storage** column. For more information, see [Compute optimized virtual machine sizes](/azure/virtual-machines/sizes-compute). For an example size table, see [Fsv2-series](/azure/virtual-machines/fsv2-series).

You can specify a retention time for files written by each task. The retention time determines how long to keep the task files before automatically cleaning them up. You can reduce the retention time to lower storage requirements.
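
For example, this sketch adds a task with a one-hour retention time instead of the default of seven days. It assumes the `batch_client` from the earlier sketch; the job ID, task ID, and command line are placeholders.

```python
from datetime import timedelta

import azure.batch.models as batchmodels

task = batchmodels.TaskAddParameter(
    id="<task-id>",
    command_line="/bin/bash -c 'echo hello'",
    constraints=batchmodels.TaskConstraints(
        retention_time=timedelta(hours=1),  # default is 7 days
    ),
)
batch_client.task.add("<job-id>", task)
```

A shorter retention time mainly helps pools that run many tasks per node, because each task's files stay on the temporary drive until cleanup.
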
You can delete old completed jobs or tasks whose task data is still on the nodes. Look in the `recentTasks` collection in the [taskInformation](/rest/api/batchservice/computenode/get#taskinformation) on the node, or use the [File - List From Compute Node](/rest/api/batchservice/file/listfromcomputenode) API. Deleting a job deletes all the tasks in the job. Deleting the tasks in the job triggers deletion of data in the task directories on the nodes, and frees up space. Once you've freed up enough space, reboot the node. The node should move out of `unusable` state and into `idle` again.
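
For example, the following sketch lists the files still on a node, deletes an old job to trigger cleanup of its task data, and then reboots the node. It assumes the `batch_client` from the earlier sketch; the IDs are placeholders.

```python
pool_id, node_id = "<pool-id>", "<node-id>"

# List task files still on the node, with their sizes.
for f in batch_client.file.list_from_compute_node(pool_id, node_id, recursive=True):
    if not f.is_directory:
        print(f.name, f.properties.content_length)

# Deleting a job deletes its tasks, which deletes the task data on nodes.
batch_client.job.delete("<old-job-id>")

# After space is freed, reboot so the node can return to the idle state.
batch_client.compute_node.reboot(pool_id, node_id)
```
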

To recover an unusable node in [VirtualMachineConfiguration](/rest/api/batchservice/pool/add#virtualmachineconfiguration) pools, you can remove the node from the pool by using the [Pool - Remove Nodes](/rest/api/batchservice/pool/removenodes) API. Then you can grow the pool again to replace the bad node with a fresh one. For [CloudServiceConfiguration](/rest/api/batchservice/pool/add#cloudserviceconfiguration) pools, you can reimage the node by using the [Compute Node - Reimage](/rest/api/batchservice/computenode/reimage) API to clean the entire disk. Reimage isn't currently supported for [VirtualMachineConfiguration](/rest/api/batchservice/pool/add#virtualmachineconfiguration) pools.
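
For example, this sketch removes a bad node and then resizes the pool to replace it. It assumes the `batch_client` from the earlier sketch; the node ID and target node count are placeholders.

```python
import azure.batch.models as batchmodels

pool_id = "<pool-id>"

# Remove the bad node from the pool.
batch_client.pool.remove_nodes(
    pool_id,
    batchmodels.NodeRemoveParameter(node_list=["<node-id>"]),
)

# Once the removal finishes, resize back up to replace the bad node.
batch_client.pool.resize(
    pool_id,
    batchmodels.PoolResizeParameter(target_dedicated_nodes=10),
)
```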