
Commit e989129

edits, bookmarks
1 parent 3a614aa commit e989129

File tree: 4 files changed (+22 / -22 lines)


articles/machine-learning/how-to-inference-server-http.md

Lines changed: 16 additions & 16 deletions
@@ -23,7 +23,7 @@ The server can also be used to create validation gates in a continuous integrati
 
 This article supports developers who want to use the inference server to debug locally and describes how to use the inference server with online endpoints on Windows.
 
-## Explore local debug options for online endpoints
+## Explore local debugging options for online endpoints
 
 By debugging endpoints locally before you deploy to the cloud, you can catch errors in your code and configuration earlier. To debug endpoints locally, you have several options, including:
 
@@ -68,9 +68,9 @@ python -m pip install azureml-inference-server-http
 
 To debug your scoring script locally, you have several options for testing the server behavior:
 
-- Try a dummy scoring script
-- Use Visual Studio Code to debug with the [azureml-inference-server-http](https://pypi.org/project/azureml-inference-server-http/) package
-- Run an actual scoring script, model file, and environment file from our [examples repo](https://github.com/Azure/azureml-examples)
+- Try a dummy scoring script.
+- Use Visual Studio Code to debug with the [azureml-inference-server-http](https://pypi.org/project/azureml-inference-server-http/) package.
+- Run an actual scoring script, model file, and environment file from our [examples repo](https://github.com/Azure/azureml-examples).
 
 ### Test server behavior with dummy scoring script
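
A dummy scoring script of the kind mentioned in the bullets above only needs the `init()` and `run()` entry points that the inference server calls. A minimal sketch, assuming the file is saved as `score.py` and that echoing the request payload back is enough for a wiring check (the file name and return shape are illustrative, not taken from the examples repo):

```python
# score.py - minimal dummy scoring script (illustrative sketch).
import json
import logging


def init():
    # Runs once at server startup; a real script would load the model here.
    logging.info("Dummy init complete")


def run(raw_data):
    # Runs for each scoring request; echo the payload back to confirm the server wiring.
    payload = json.loads(raw_data)
    return {"received": payload}
```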

@@ -93,7 +93,7 @@ To debug your scoring script locally, you have several options for testing the s
 <!-- Reviewer: The 'source' command appears to apply to Linux only.
 I found that the 'python -m virtualenv...' command (as opposed to 'python -m venv ...') both creates and activates the env. -->
 
-After you test the server, you can run `deactivate` to deactivate the Python virtual environment.
+After you test the server, you can run the `deactivate` command to deactivate the Python virtual environment.
 
 1. Install the `azureml-inference-server-http` package from the [pypi](https://pypi.org/project/azureml-inference-server-http/) feed:
 
@@ -199,7 +199,7 @@ To use VS Code and the [Python Extension](https://marketplace.visualstudio.com/i
 
 1. Select **Run** > **Start Debugging** or use the keyboard shortcut F5.
 
-1. In the command window, view the logs from the inference server, and locate the process ID of the `azmlinfsrv` command (not the `gunicorn`).
+1. In the command window, view the logs from the inference server and locate the process ID of the `azmlinfsrv` command (not the `gunicorn`):
 
 :::image type="content" source="./media/how-to-inference-server-http/debug-attach-pid.png" border="false" alt-text="Screenshot that shows a command window displaying logs from the inference HTTP server and the process ID of the azmlinfsrv command highlighted.":::
 
@@ -247,7 +247,7 @@ The following procedure runs the server locally with [sample files](https://gith
 azmlinfsrv --entry_script ./onlinescoring/score.py --model_dir ./
 ```
 
-When the server launches and successfully invokes the scoring script, the example [startup log](#startup-logs) opens. Otherwise, the log shows error messages.
+When the server launches and successfully invokes the scoring script, the example [startup log](#view-startup-logs) opens. Otherwise, the log shows error messages.
 
 1. Test the scoring script with sample data:
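
Once the server is running, the "test with sample data" step above can be exercised with any HTTP client against the documented defaults (port 5001, `/score` route). A rough sketch using the Python `requests` package; the payload shape is an assumption and depends entirely on what your scoring script's `run()` expects:

```python
# Send a test request to a locally running inference server (illustrative sketch).
import requests

# Made-up payload; replace with whatever your scoring script expects.
sample = {"data": [[1, 2, 3, 4]]}

response = requests.post("http://127.0.0.1:5001/score", json=sample)
print(response.status_code)
print(response.json())
```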

@@ -273,7 +273,7 @@ The inference HTTP server listens on port 5001 by default at the following route
 
 ## Review server parameters
 
-The following table summarizes the parameters accepted by the inference HTTP server:
+The inference HTTP server accepts the following parameters:
 
 | Parameter | Required | Default | Description |
 | --- | --- | :---: | --- |
@@ -286,7 +286,7 @@ The following table summarizes the parameters accepted by the inference HTTP ser
 
 ## Explore server request processing
 
-The following steps demonstrate how the Azure Machine Learning inference HTTP server (azmlinfsrv) handles incoming requests:
+The following steps demonstrate how the Azure Machine Learning inference HTTP server (`azmlinfsrv`) handles incoming requests:
 
 1. A Python CLI wrapper sits around the server's network stack and is used to start the server.
 
@@ -308,7 +308,7 @@ The following steps demonstrate how the Azure Machine Learning inference HTTP se
 There are two ways to obtain log data for the inference HTTP server test:
 
 - Run the `azureml-inference-server-http` package locally and view the logs output.
-- Use online endpoints and view the [container logs](how-to-troubleshoot-online-endpoints.md#get-container-logs). The log for the inference server is named **Azure Machine Learning Inferencing HTTP server <version>**.
+- Use online endpoints and view the [container logs](how-to-troubleshoot-online-endpoints.md#get-container-logs). The log for the inference server is named **Azure Machine Learning Inferencing HTTP server \<version>**.
 
 > [!NOTE]
 > The logging format has changed since version 0.8.0. If your log uses a different style than expected, update the `azureml-inference-server-http` package to the latest version.
@@ -342,7 +342,7 @@ Score: POST 127.0.0.1:<port>/score
 
 For example, when you launch the server by following the [end-to-end example](#use-an-end-to-end-example), the log displays as follows:
 
-```
+```console
 Azure Machine Learning Inferencing HTTP server v0.8.0
 
 Server Settings
@@ -382,13 +382,13 @@ All logs from the inference HTTP server, except for the launcher script, present
 
 `<UTC Time> | <level> [<pid>] <logger name> - <message>`
 
-The format consists of the following values:
+The entry consists of the following components:
 
 - `<UTC Time>`: Time when the entry was entered into the log.
-- `<pid>`: The ID of the process associated with the entry.
-- `<level>`: The first character of the [logging level](https://docs.python.org/3/library/logging.html#logging-levels) for the entry, such as `E` for ERROR, `I` for INFO, and so on.
-- `<logger name>`: The name of the resource associated with the log entry.
-- `<message>`: The contents of the log message.
+- `<pid>`: ID of the process associated with the entry.
+- `<level>`: First character of the [logging level](https://docs.python.org/3/library/logging.html#logging-levels) for the entry, such as `E` for ERROR, `I` for INFO, and so on.
+- `<logger name>`: Name of the resource associated with the log entry.
+- `<message>`: Contents of the log message.
 
 There are six levels of logging in Python with assigned numeric values according to severity:
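
To filter or parse entries in the format documented in the hunk above, a small regex sketch can pull the fields apart; the sample line here is fabricated purely to exercise the pattern and is not real server output:

```python
# Parse a log entry in the documented "<UTC Time> | <level> [<pid>] <logger name> - <message>" layout.
import re

ENTRY_PATTERN = re.compile(
    r"^(?P<time>.+?) \| (?P<level>\S) \[(?P<pid>\d+)\] (?P<logger>\S+) - (?P<message>.*)$"
)

# Fabricated sample line, shaped only to match the documented format.
sample = "2023-01-01 00:00:00,000 | I [12345] azmlinfsrv - Sample message"
match = ENTRY_PATTERN.match(sample)
if match:
    print(match.groupdict())
```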

articles/machine-learning/how-to-troubleshoot-online-endpoints.md

Lines changed: 1 addition & 1 deletion
@@ -269,7 +269,7 @@ If the error message mentions `"failed to communicate with the workspace's conta
 Image build timeouts are often due to an image becoming too large to be able to complete building within the timeframe of deployment creation.
 To verify if this is your issue, check your image build logs at the location that the error may specify. The logs are cut off at the point that the image build timed out.
 
-To resolve this, please [build your image separately](/azure/devops/pipelines/ecosystems/containers/publish-to-acr?view=azure-devops&tabs=javascript%2Cportal%2Cmsi) so that the image only needs to be pulled during deployment creation.
+To resolve this, please [build your image separately](/azure/devops/pipelines/ecosystems/containers/publish-to-acr?view=azure-devops&tabs=javascript%2Cportal%2Cmsi&preserve-view=true) so that the image only needs to be pulled during deployment creation.
 
 #### Generic image build failure
 
articles/machine-learning/includes/machine-learning-inference-server-troubleshooting.md

Lines changed: 4 additions & 4 deletions
@@ -8,11 +8,11 @@ ms.author: shnagata
 
 ### Check installed packages
 
-Follow these steps to determine issues with installed packages:
+Follow these steps to address issues with installed packages:
 
 1. Gather information about installed packages and versions for your Python environment.
 
-1. Confirm the `azureml-inference-server-http` Python package version specified in the environment file matches the Azure Machine Learning inference HTTP server version displayed in the [startup log](../how-to-inference-server-http.md#startup-logs).
+1. Confirm the `azureml-inference-server-http` Python package version specified in the environment file matches the Azure Machine Learning inference HTTP server version displayed in the [startup log](../how-to-inference-server-http.md#view-startup-logs).
 
 - In some cases, the pip dependency resolver installs unexpected package versions.
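
One quick way to gather the installed-package information that the steps above call for is to query package metadata from the same Python environment. A minimal sketch; the package names other than `azureml-inference-server-http` are examples only:

```python
# Print installed versions of packages relevant to the inference server (illustrative sketch).
from importlib.metadata import PackageNotFoundError, version

for package in ("azureml-inference-server-http", "flask"):
    try:
        print(package, version(package))
    except PackageNotFoundError:
        print(package, "is not installed in this environment")
```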

@@ -110,6 +110,6 @@ You might encounter an `ImportError` or `ModuleNotFoundError` on specific module
 ImportError: cannot import name 'Markup' from 'jinja2'
 ```
 
-The import and module errors result in older versions of the server (version **0.4.10** and earlier) that don't pin the Flask dependency to a compatible version.
+The import and module errors occur when you use older versions of the server (version **0.4.10** and earlier) that don't pin the Flask dependency to a compatible version.
 
-This problem is fixed in the latest version of the server.
+To prevent the issue, install a later version of the server.

articles/machine-learning/reference-yaml-deployment-managed-online.md

Lines changed: 1 addition & 1 deletion
@@ -53,7 +53,7 @@ The source JSON schema can be found at https://azuremlschemas.azureedge.net/late
 | Key | Type | Description | Default value |
 | --- | ---- | ----------- | ------------- |
 | `request_timeout_ms` | integer | The scoring timeout in milliseconds. Note that the maximum value allowed is `180000` milliseconds. See [limits for online endpoints](how-to-manage-quotas.md#azure-machine-learning-online-endpoints-and-batch-endpoints) for more. | `5000` |
-| `max_concurrent_requests_per_instance` | integer | The maximum number of concurrent requests per instance allowed for the deployment. <br><br> **Note:** If you're using [Azure Machine Learning Inference Server](how-to-inference-server-http.md) or [Azure Machine Learning Inference Images](concept-prebuilt-docker-images-inference.md), your model must be configured to handle concurrent requests. To do so, pass `WORKER_COUNT: <int>` as an environment variable. For more information about `WORKER_COUNT`, see [Azure Machine Learning Inference Server Parameters](how-to-inference-server-http.md#server-parameters) <br><br> **Note:** Set to the number of requests that your model can process concurrently on a single node. Setting this value higher than your model's actual concurrency can lead to higher latencies. Setting this value too low might lead to under utilized nodes. Setting too low might also result in requests being rejected with a 429 HTTP status code, as the system will opt to fail fast. For more information, see [Troubleshooting online endpoints: HTTP status codes](how-to-troubleshoot-online-endpoints.md#http-status-codes). | `1` |
+| `max_concurrent_requests_per_instance` | integer | The maximum number of concurrent requests per instance allowed for the deployment. <br><br> **Note:** If you're using [Azure Machine Learning Inference Server](how-to-inference-server-http.md) or [Azure Machine Learning Inference Images](concept-prebuilt-docker-images-inference.md), your model must be configured to handle concurrent requests. To do so, pass `WORKER_COUNT: <int>` as an environment variable. For more information about `WORKER_COUNT`, see [Azure Machine Learning Inference Server Parameters](how-to-inference-server-http.md#review-server-parameters) <br><br> **Note:** Set to the number of requests that your model can process concurrently on a single node. Setting this value higher than your model's actual concurrency can lead to higher latencies. Setting this value too low might lead to under utilized nodes. Setting too low might also result in requests being rejected with a 429 HTTP status code, as the system will opt to fail fast. For more information, see [Troubleshooting online endpoints: HTTP status codes](how-to-troubleshoot-online-endpoints.md#http-status-codes). | `1` |
 | `max_queue_wait_ms` | integer | (Deprecated) The maximum amount of time in milliseconds a request will stay in the queue. (Now increase `request_timeout_ms` to account for any networking/queue delays) | `500` |
 
 ### ProbeSettings
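
For readers who set these keys through the Python SDK (v2) rather than the YAML file, the same settings map onto the deployment object roughly as sketched below. The class and parameter names reflect my reading of the `azure-ai-ml` package and should be checked against the SDK reference; the endpoint, model, environment, and instance values are placeholders:

```python
# Rough sketch: request settings and WORKER_COUNT via the azure-ai-ml SDK (v2).
# All names and values here are placeholders; verify the classes against the SDK reference.
from azure.ai.ml.entities import ManagedOnlineDeployment, OnlineRequestSettings

deployment = ManagedOnlineDeployment(
    name="blue",
    endpoint_name="my-endpoint",
    model="azureml:my-model:1",
    environment="azureml:my-env:1",
    instance_type="Standard_DS3_v2",
    instance_count=1,
    request_settings=OnlineRequestSettings(
        request_timeout_ms=5000,                 # scoring timeout; maximum allowed is 180000
        max_concurrent_requests_per_instance=2,  # match the concurrency your model can actually handle
    ),
    environment_variables={"WORKER_COUNT": "2"},  # lets the inference server serve concurrent requests
)
```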
