Commit e78282d

Apply suggestions from code review
Co-authored-by: Mope Akande <[email protected]>
1 parent d0a5db0 commit e78282d

File tree

1 file changed (+17, -17 lines)


articles/machine-learning/how-to-troubleshoot-batch-endpoints.md

Lines changed: 17 additions & 17 deletions
@@ -8,7 +8,7 @@ ms.subservice: inferencing
 ms.topic: troubleshooting-general
 author: msakande
 ms.author: mopeakande
-ms.date: 07/19/2024
+ms.date: 07/29/2024
 ms.reviewer: cacrest
 ms.custom: devplatv2
 
@@ -27,15 +27,15 @@ After you invoke a batch endpoint by using the Azure CLI or the REST API, the ba
 
 - **Option 1**: Stream job logs to a local console. Only logs in the _azureml-logs_ folder are streamed.
 
-Run the following command to stream system-generated logs to your console. Replace the `\<job_name>` parameter with the name of your batch scoring job:
+Run the following command to stream system-generated logs to your console. Replace the `<job_name>` parameter with the name of your batch scoring job:
 
 ```azurecli
 az ml job stream --name <job_name>
 ```
 
 - **Option 2**: View job logs in Azure Machine Learning studio.
 
-Run the following command to get the job link to use in the studio. Replace the `\<job_name>` parameter with the name of your batch scoring job:
+Run the following command to get the job link to use in the studio. Replace the `<job_name>` parameter with the name of your batch scoring job:
 
 ```azurecli
 az ml job show --name <job_name> --query services.Studio.endpoint -o tsv
@@ -49,7 +49,7 @@ After you invoke a batch endpoint by using the Azure CLI or the REST API, the ba
 
 ## Review log files
 
-Machine Learning provides several types of log files and other data files that you can use to help troubleshoot your batch scoring job.
+Azure Machine Learning provides several types of log files and other data files that you can use to help troubleshoot your batch scoring job.
 
 The two top-level folders for batch scoring logs are _azureml-logs_ and _logs_. Information from the controller that launches the scoring script is stored in the _~/azureml-logs/70\_driver\_log.txt_ file.
 
@@ -59,8 +59,8 @@ The distributed nature of batch scoring jobs results in logs from different sour
 
 | File | Description |
 | --- | --- |
-| **~/logs/job_progress_overview.txt** | Provides high-level information about the current number of created mini-batches (also known as _tasks_) created and the current number of processed mini-batches. As processing for mini-batches comes to an end, the log records the results of the job. If the job fails, the log shows the error message and where to start the troubleshooting. |
-| **~/logs/sys/master_role.txt** | Supplies the principal node (also known as the _orchestrator_) view of the running job. This log includes information about the task creation, progress monitoring, and the job result. |
+| **~/logs/job_progress_overview.txt** | Provides high-level information about the current number of mini-batches (also known as _tasks_) created and the current number of processed mini-batches. As processing for mini-batches comes to an end, the log records the results of the job. If the job fails, the log shows the error message and where to start the troubleshooting. |
+| **~/logs/sys/master_role.txt** | Provides the principal node (also known as the _orchestrator_) view of the running job. This log includes information about the task creation, progress monitoring, and the job result. |
 
 ### Examine stack trace data for errors
 
@@ -69,13 +69,13 @@ Other files provide information about possible errors in your script:
 | File | Description |
 | --- | --- |
 | **~/logs/user/error.txt** | Provides a summary of errors in your script. |
-| **~/logs/user/error/\*** | Supplies the full stack traces of exceptions thrown while loading and running the entry script. |
+| **~/logs/user/error/\*** | Provides the full stack traces of exceptions thrown while loading and running the entry script. |
 
 ### Examine process logs per node
 
 For a complete understanding of how each node executes your score script, examine the individual process logs for each node. The process logs are stored in the _~/logs/sys/node_ folder and grouped by worker nodes.
 
-The folder contains an _\<ip\_address>/_ subfolder and a _\<process\_name>.txt_ file with detailed info about each mini-batch. The folder contents updates when a worker selects or completes the mini-batch. For each mini-batch, the log file includes:
+The folder contains an _\<ip\_address>/_ subfolder that contains a _\<process\_name>.txt_ file with detailed info about each mini-batch. The folder contents updates when a worker selects or completes the mini-batch. For each mini-batch, the log file includes:
 
 - The IP address and the process ID (PID) of the worker process.
 - The total number of items, the number of successfully processed items, and the number of failed items.
@@ -94,7 +94,7 @@ The folder contains an _\<ip\_address>/_ subfolder about each mini-batch. The fo
 
 | File or Folder | Description |
 | --- | --- |
-| **os/** | Stores information about all running processes in the node. One check runs an operating system command and saves the result to a file. On Linux, the command is `ps`. The folder contains the following items: <br> - **%Y%m%d%H**: Contains one or more process check files. The subfolder name is the creation date and time of the check (Year, Month, Day, Hour). <br> **processes_%M**: Shows details about the process check. The file name ends with the check time (Minute) relative to the check creation time. |
+| **os/** | Stores information about all running processes in the node. One check runs an operating system command and saves the result to a file. On Linux, the command is `ps`. The folder contains the following items: <br> - **%Y%m%d%H**: Subfolder that contains one or more process check files. The subfolder name is the creation date and time of the check (Year, Month, Day, Hour). <br> **processes_%M**: File within the subfolder. The file shows details about the process check. The file name ends with the check time (Minute) relative to the check creation time. |
 | **node_disk_usage.csv** | Shows the detailed disk usage of the node. |
 | **node_resource_usage.csv** | Supplies the resource usage overview of the node. |
 | **processes_resource_usage.csv** | Provides a resource usage overview of each process. |
@@ -126,15 +126,15 @@ logger.debug("Debug log statement")
 
 The following sections describe common errors that can occur during batch endpoint development and consumption, and steps for resolution.
 
-### No azureml module in installation
+### No module named azureml
 
 Azure Machine Learning batch deployment requires the **azureml-core** package in the installation.
 
-**Message logged**: "No module named azureml."
+**Message logged**: "No module named `azureml`."
 
-**Reason**: The azureml-core package appears to be missing in the installation.
+**Reason**: The `azureml-core` package appears to be missing in the installation.
 
-**Solution**: Add the azureml-core package to your conda dependencies file.
+**Solution**: Add the `azureml-core` package to your conda dependencies file.
 
 ### No output in predictions file
 
@@ -170,7 +170,7 @@ For batch deployment to succeed, the managed identity for the compute cluster mu
 
 **Solution**: Ensure the managed identity associated with the compute cluster where your deployment is running has at least [Storage Blob Data Reader](../role-based-access-control/built-in-roles.md#storage-blob-data-reader) access to the storage account. Only Azure Storage account owners can [change the access level in the Azure portal](../storage/blobs/assign-azure-role-data-access.md).
 
-### No mounted storage, no dataset initialization
+### Dataset initialization failed, can't mount dataset
 
 The batch deployment process requires mounted storage for the data asset. When the storage doesn't mount, the dataset can't be initialized.
 
@@ -180,11 +180,11 @@ The batch deployment process requires mounted storage for the data asset. When t
 
 **Solution**: Ensure the managed identity associated with the compute cluster where your deployment is running has at least [Storage Blob Data Reader](../role-based-access-control/built-in-roles.md#storage-blob-data-reader) access to the storage account. Only Azure Storage account owners can [change the access level in the Azure portal](../storage/blobs/assign-azure-role-data-access.md).
 
-### No value for dataset_param parameter
+### The dataset_param parameter doesn't have a specified value or a default value
 
 During batch deployment, the data set node references the `dataset_param` parameter. For the deployment to proceed, the parameter must have an assigned value or a specified default value.
 
-**Message logged**: "Data set node [code] references parameter dataset_param, which doesn't have a specified value or a default value."
+**Message logged**: "Data set node [code] references parameter `dataset_param`, which doesn't have a specified value or a default value."
 
 **Reason**: The input data asset provided to the batch endpoint isn't supported.
 
@@ -272,7 +272,7 @@ For batch deployment to succeed, the batch endpoint must have at least one valid
 
 - Define the route with a deployment-specific header.
 
-## Review unsupported configurations and file types
+## Limitations and unsupported scenarios
 
 When you design machine learning deployment solutions that rely on batch endpoints, keep in mind that some configurations and scenarios aren't supported. The following sections identify unsupported workspaces and compute resources, and invalid types for input files.
 
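Not part of the commit, but for context: the two log-retrieval commands touched by this diff might be combined into one script along the following lines. This is a sketch, assuming an authenticated Azure CLI with the `ml` extension; the job name and the `RUN_AZURE` guard variable are hypothetical placeholders, not part of the article.

```shell
# Hypothetical job name; replace with the name of your batch scoring job.
JOB_NAME="${JOB_NAME:-batchjob-example-001}"

# Guard: only call Azure when explicitly enabled, since both commands need an
# authenticated Azure CLI session with the ml extension installed.
if [ "${RUN_AZURE:-0}" = "1" ]; then
  # Option 1: stream system-generated logs (azureml-logs folder) to the console.
  az ml job stream --name "$JOB_NAME"

  # Option 2: print the studio link for the same job.
  az ml job show --name "$JOB_NAME" --query services.Studio.endpoint -o tsv
else
  echo "dry run: az ml job stream --name $JOB_NAME"
fi
```

With the guard unset, the script only echoes the command it would run, which makes the sketch safe to execute without an Azure subscription.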