articles/machine-learning/how-to-inference-server-http.md (16 additions & 16 deletions)
@@ -23,7 +23,7 @@ The server can also be used to create validation gates in a continuous integrati
This article supports developers who want to use the inference server to debug locally and describes how to use the inference server with online endpoints on Windows.
-## Explore local debug options for online endpoints
+## Explore local debugging options for online endpoints
By debugging endpoints locally before you deploy to the cloud, you can catch errors in your code and configuration earlier. To debug endpoints locally, you have several options, including:
To debug your scoring script locally, you have several options for testing the server behavior:
-- Try a dummy scoring script
-- Use Visual Studio Code to debug with the [azureml-inference-server-http](https://pypi.org/project/azureml-inference-server-http/) package
-- Run an actual scoring script, model file, and environment file from our [examples repo](https://github.com/Azure/azureml-examples)
+- Try a dummy scoring script.
+- Use Visual Studio Code to debug with the [azureml-inference-server-http](https://pypi.org/project/azureml-inference-server-http/) package.
+- Run an actual scoring script, model file, and environment file from our [examples repo](https://github.com/Azure/azureml-examples).
### Test server behavior with dummy scoring script
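For reference, the dummy scoring script that this section refers to is just a minimal `score.py` exposing the `init()` and `run()` entry points that the inference server invokes. A minimal sketch, assuming the standard scoring-script contract; the file name and the echo payload are illustrative, not the repo's actual sample:

```python
# score.py - minimal dummy scoring script for local server testing (illustrative)
import json


def init():
    # Runs once at server startup; a real script would load the model here.
    print("Dummy init: nothing to load")


def run(raw_data):
    # Runs for each scoring request; echo the parsed payload back as a trivial result.
    return {"echo": json.loads(raw_data)}
```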
@@ -93,7 +93,7 @@ To debug your scoring script locally, you have several options for testing the s
<!-- Reviewer: The 'source' command appears to apply to Linux only.
I found that the 'python -m virtualenv...' command (as opposed to 'python -m venv ...') both creates and activates the env. -->
-After you test the server, you can run `deactivate` to deactivate the Python virtual environment.
+After you test the server, you can run the `deactivate` command to deactivate the Python virtual environment.
1. Install the `azureml-inference-server-http` package from the [pypi](https://pypi.org/project/azureml-inference-server-http/) feed:
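For reference, the install step that follows is the usual pip invocation inside the activated virtual environment, typically `python -m pip install azureml-inference-server-http` (the command block itself isn't shown in this hunk).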
@@ -199,7 +199,7 @@ To use VS Code and the [Python Extension](https://marketplace.visualstudio.com/i
1. Select **Run** > **Start Debugging** or use the keyboard shortcut F5.
-1. In the command window, view the logs from the inference server, and locate the process ID of the `azmlinfsrv` command (not the `gunicorn`).
+1. In the command window, view the logs from the inference server and locate the process ID of the `azmlinfsrv` command (not the `gunicorn`):
:::image type="content" source="./media/how-to-inference-server-http/debug-attach-pid.png" border="false" alt-text="Screenshot that shows a command window displaying logs from the inference HTTP server and the process ID of the azmlinfsrv command highlighted.":::
@@ -247,7 +247,7 @@ The following procedure runs the server locally with [sample files](https://gith
-When the server launches and successfully invokes the scoring script, the example [startup log](#startup-logs) opens. Otherwise, the log shows error messages.
+When the server launches and successfully invokes the scoring script, the example [startup log](#view-startup-logs) opens. Otherwise, the log shows error messages.
1. Test the scoring script with sample data:
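For reference, testing the scoring script with sample data amounts to posting a JSON payload to the local scoring route. A minimal sketch using `requests`, assuming the default port 5001; the payload shape is illustrative and should match what your scoring script expects:

```python
import requests

# Illustrative payload; match the input format that your scoring script expects.
sample = {"data": [[1.0, 2.0, 3.0, 4.0]]}

# The inference server listens on port 5001 by default.
response = requests.post("http://127.0.0.1:5001/score", json=sample)
print(response.status_code)
print(response.text)
```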
@@ -273,7 +273,7 @@ The inference HTTP server listens on port 5001 by default at the following route
## Review server parameters
-The following table summarizes the parameters accepted by the inference HTTP server:
+The inference HTTP server accepts the following parameters:
| Parameter | Required | Default | Description |
| --- | --- | :---: | --- |
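For reference, the parameter rows themselves are collapsed in this hunk. A typical local launch passes the scoring script and, optionally, a nondefault port, along the lines of `azmlinfsrv --entry_script score.py --port 8085`; the parameter names are as documented in the full table, and the values here are illustrative.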
@@ -286,7 +286,7 @@ The following table summarizes the parameters accepted by the inference HTTP ser
## Explore server request processing
-The following steps demonstrate how the Azure Machine Learning inference HTTP server (azmlinfsrv) handles incoming requests:
+The following steps demonstrate how the Azure Machine Learning inference HTTP server (`azmlinfsrv`) handles incoming requests:
1. A Python CLI wrapper sits around the server's network stack and is used to start the server.
@@ -308,7 +308,7 @@ The following steps demonstrate how the Azure Machine Learning inference HTTP se
There are two ways to obtain log data for the inference HTTP server test:
- Run the `azureml-inference-server-http` package locally and view the logs output.
-- Use online endpoints and view the [container logs](how-to-troubleshoot-online-endpoints.md#get-container-logs). The log for the inference server is named **Azure Machine Learning Inferencing HTTP server <version>**.
+- Use online endpoints and view the [container logs](how-to-troubleshoot-online-endpoints.md#get-container-logs). The log for the inference server is named **Azure Machine Learning Inferencing HTTP server \<version>**.
> [!NOTE]
> The logging format has changed since version 0.8.0. If your log uses a different style than expected, update the `azureml-inference-server-http` package to the latest version.
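For reference, pulling the container logs for an online deployment is typically done with the Azure CLI, along the lines of `az ml online-deployment get-logs --endpoint-name <endpoint-name> --name <deployment-name>`; the linked troubleshooting article gives the exact options.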
@@ -342,7 +342,7 @@ Score: POST 127.0.0.1:<port>/score
For example, when you launch the server by following the [end-to-end example](#use-an-end-to-end-example), the log displays as follows:
-```
+```console
Azure Machine Learning Inferencing HTTP server v0.8.0
Server Settings
@@ -382,13 +382,13 @@ All logs from the inference HTTP server, except for the launcher script, present
- `<UTC Time>`: Time when the entry was entered into the log.
-- `<pid>`: The ID of the process associated with the entry.
-- `<level>`: The first character of the [logging level](https://docs.python.org/3/library/logging.html#logging-levels) for the entry, such as `E` for ERROR, `I` for INFO, and so on.
-- `<logger name>`: The name of the resource associated with the log entry.
-- `<message>`: The contents of the log message.
+- `<pid>`: ID of the process associated with the entry.
+- `<level>`: First character of the [logging level](https://docs.python.org/3/library/logging.html#logging-levels) for the entry, such as `E` for ERROR, `I` for INFO, and so on.
+- `<logger name>`: Name of the resource associated with the log entry.
+- `<message>`: Contents of the log message.
There are six levels of logging in Python with assigned numeric values according to severity:
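For reference, those six levels and their numeric severities come straight from Python's standard `logging` module; a quick check:

```python
import logging

# The six standard Python logging levels and their numeric severities.
for name in ("NOTSET", "DEBUG", "INFO", "WARNING", "ERROR", "CRITICAL"):
    print(f"{name}: {getattr(logging, name)}")
# NOTSET: 0, DEBUG: 10, INFO: 20, WARNING: 30, ERROR: 40, CRITICAL: 50
```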
articles/machine-learning/how-to-troubleshoot-online-endpoints.md (1 addition & 1 deletion)
@@ -269,7 +269,7 @@ If the error message mentions `"failed to communicate with the workspace's conta
Image build timeouts are often due to an image becoming too large to be able to complete building within the timeframe of deployment creation.
To verify if this is your issue, check your image build logs at the location that the error may specify. The logs are cut off at the point that the image build timed out.
-To resolve this, please [build your image separately](/azure/devops/pipelines/ecosystems/containers/publish-to-acr?view=azure-devops&tabs=javascript%2Cportal%2Cmsi) so that the image only needs to be pulled during deployment creation.
+To resolve this, please [build your image separately](/azure/devops/pipelines/ecosystems/containers/publish-to-acr?view=azure-devops&tabs=javascript%2Cportal%2Cmsi&preserve-view=true) so that the image only needs to be pulled during deployment creation.
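For reference, building the image separately usually means building and pushing it to your container registry ahead of time, for example with `az acr build --registry <acr-name> --image <name>:<tag> .`; the linked article walks through the Azure Pipelines route instead.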
articles/machine-learning/includes/machine-learning-inference-server-troubleshooting.md (4 additions & 4 deletions)
@@ -8,11 +8,11 @@ ms.author: shnagata
### Check installed packages
-Follow these steps to determine issues with installed packages:
+Follow these steps to address issues with installed packages:
1. Gather information about installed packages and versions for your Python environment.
-1. Confirm the `azureml-inference-server-http` Python package version specified in the environment file matches the Azure Machine Learning inference HTTP server version displayed in the [startup log](../how-to-inference-server-http.md#startup-logs).
+1. Confirm the `azureml-inference-server-http` Python package version specified in the environment file matches the Azure Machine Learning inference HTTP server version displayed in the [startup log](../how-to-inference-server-http.md#view-startup-logs).
- In some cases, the pip dependency resolver installs unexpected package versions.
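For reference, gathering the installed-package information usually comes down to `pip list` (or `pip freeze`) in that environment; the server package's installed version can also be read programmatically. A small sketch:

```python
from importlib import metadata

# Print the installed version of the inference server package, if any.
try:
    print("azureml-inference-server-http", metadata.version("azureml-inference-server-http"))
except metadata.PackageNotFoundError:
    print("azureml-inference-server-http is not installed in this environment")
```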
@@ -110,6 +110,6 @@ You might encounter an `ImportError` or `ModuleNotFoundError` on specific module
ImportError: cannot import name 'Markup' from 'jinja2'
```
-The import and module errors result in older versions of the server (version **0.4.10** and earlier) that don't pin the Flask dependency to a compatible version.
+The import and module errors occur when you use older versions of the server (version **0.4.10** and earlier) that don't pin the Flask dependency to a compatible version.
-This problem is fixed in the latest version of the server.
+To prevent the issue, install a later version of the server.
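For reference, moving to a later server version is typically just `python -m pip install --upgrade azureml-inference-server-http` in the deployment's environment; pin an explicit version if your image requires one.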
articles/machine-learning/reference-yaml-deployment-managed-online.md (1 addition & 1 deletion)
@@ -53,7 +53,7 @@ The source JSON schema can be found at https://azuremlschemas.azureedge.net/late
| Key | Type | Description | Default value |
| --- | ---- | ----------- | ------------- |
|`request_timeout_ms`| integer | The scoring timeout in milliseconds. Note that the maximum value allowed is `180000` milliseconds. See [limits for online endpoints](how-to-manage-quotas.md#azure-machine-learning-online-endpoints-and-batch-endpoints) for more. |`5000`|
-| `max_concurrent_requests_per_instance` | integer | The maximum number of concurrent requests per instance allowed for the deployment. <br><br> **Note:** If you're using [Azure Machine Learning Inference Server](how-to-inference-server-http.md) or [Azure Machine Learning Inference Images](concept-prebuilt-docker-images-inference.md), your model must be configured to handle concurrent requests. To do so, pass `WORKER_COUNT: <int>` as an environment variable. For more information about `WORKER_COUNT`, see [Azure Machine Learning Inference Server Parameters](how-to-inference-server-http.md#server-parameters) <br><br> **Note:** Set to the number of requests that your model can process concurrently on a single node. Setting this value higher than your model's actual concurrency can lead to higher latencies. Setting this value too low might lead to under utilized nodes. Setting too low might also result in requests being rejected with a 429 HTTP status code, as the system will opt to fail fast. For more information, see [Troubleshooting online endpoints: HTTP status codes](how-to-troubleshoot-online-endpoints.md#http-status-codes). | `1` |
+| `max_concurrent_requests_per_instance` | integer | The maximum number of concurrent requests per instance allowed for the deployment. <br><br> **Note:** If you're using [Azure Machine Learning Inference Server](how-to-inference-server-http.md) or [Azure Machine Learning Inference Images](concept-prebuilt-docker-images-inference.md), your model must be configured to handle concurrent requests. To do so, pass `WORKER_COUNT: <int>` as an environment variable. For more information about `WORKER_COUNT`, see [Azure Machine Learning Inference Server Parameters](how-to-inference-server-http.md#review-server-parameters) <br><br> **Note:** Set to the number of requests that your model can process concurrently on a single node. Setting this value higher than your model's actual concurrency can lead to higher latencies. Setting this value too low might lead to under utilized nodes. Setting too low might also result in requests being rejected with a 429 HTTP status code, as the system will opt to fail fast. For more information, see [Troubleshooting online endpoints: HTTP status codes](how-to-troubleshoot-online-endpoints.md#http-status-codes). | `1` |
|`max_queue_wait_ms`| integer | (Deprecated) The maximum amount of time in milliseconds a request will stay in the queue. (Now increase `request_timeout_ms` to account for any networking/queue delays) |`500`|
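For reference, in the deployment YAML `request_timeout_ms`, `max_concurrent_requests_per_instance`, and `max_queue_wait_ms` sit under the `request_settings` block, while `WORKER_COUNT` is passed under `environment_variables`, typically with the worker count matched to the concurrency value; key placement follows the schema this article documents.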