articles/machine-learning/how-to-inference-server-http.md (16 additions & 16 deletions)
@@ -23,7 +23,7 @@ The server can also be used to create validation gates in a continuous integrati
This article supports developers who want to use the inference server to debug locally and describes how to use the inference server with online endpoints on Windows.
-## Explore local debug options for online endpoints
+## Explore local debugging options for online endpoints
By debugging endpoints locally before you deploy to the cloud, you can catch errors in your code and configuration earlier. To debug endpoints locally, you have several options, including:
To debug your scoring script locally, you have several options for testing the server behavior:
-- Try a dummy scoring script
-- Use Visual Studio Code to debug with the [azureml-inference-server-http](https://pypi.org/project/azureml-inference-server-http/) package
-- Run an actual scoring script, model file, and environment file from our [examples repo](https://github.com/Azure/azureml-examples)
+- Try a dummy scoring script.
+- Use Visual Studio Code to debug with the [azureml-inference-server-http](https://pypi.org/project/azureml-inference-server-http/) package.
+- Run an actual scoring script, model file, and environment file from our [examples repo](https://github.com/Azure/azureml-examples).
### Test server behavior with dummy scoring script
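For reference, the dummy scoring script that this section refers to is just a minimal `score.py` exposing the `init()` and `run()` entry points that the inference server invokes. A minimal sketch, assuming the standard scoring-script contract; the file name and the echo payload are illustrative, not the repo's actual sample:

```python
# score.py - minimal dummy scoring script for local server testing (illustrative)
import json


def init():
    # Runs once at server startup; a real script would load the model here.
    print("Dummy init: nothing to load")


def run(raw_data):
    # Runs for each scoring request; echo the parsed payload back as a trivial result.
    return {"echo": json.loads(raw_data)}
```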
@@ -93,7 +93,7 @@ To debug your scoring script locally, you have several options for testing the s
<!-- Reviewer: The 'source' command appears to apply to Linux only.
I found that the 'python -m virtualenv...' command (as opposed to 'python -m venv ...') both creates and activates the env. -->
-After you test the server, you can run `deactivate` to deactivate the Python virtual environment.
+After you test the server, you can run the `deactivate` command to deactivate the Python virtual environment.
1. Install the `azureml-inference-server-http` package from the [pypi](https://pypi.org/project/azureml-inference-server-http/) feed:
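For reference, the install step that follows is the usual pip invocation inside the activated virtual environment, typically `python -m pip install azureml-inference-server-http` (the command block itself isn't shown in this hunk).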
@@ -199,7 +199,7 @@ To use VS Code and the [Python Extension](https://marketplace.visualstudio.com/i
1. Select **Run** > **Start Debugging** or use the keyboard shortcut F5.
-1. In the command window, view the logs from the inference server, and locate the process ID of the `azmlinfsrv` command (not the `gunicorn`).
+1. In the command window, view the logs from the inference server and locate the process ID of the `azmlinfsrv` command (not the `gunicorn`):
:::image type="content" source="./media/how-to-inference-server-http/debug-attach-pid.png" border="false" alt-text="Screenshot that shows a command window displaying logs from the inference HTTP server and the process ID of the azmlinfsrv command highlighted.":::
@@ -247,7 +247,7 @@ The following procedure runs the server locally with [sample files](https://gith
-When the server launches and successfully invokes the scoring script, the example [startup log](#startup-logs) opens. Otherwise, the log shows error messages.
+When the server launches and successfully invokes the scoring script, the example [startup log](#view-startup-logs) opens. Otherwise, the log shows error messages.
1. Test the scoring script with sample data:
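For reference, testing the scoring script with sample data amounts to posting a JSON payload to the local scoring route. A minimal sketch using `requests`, assuming the default port 5001; the payload shape is illustrative and should match what your scoring script expects:

```python
import requests

# Illustrative payload; match the input format that your scoring script expects.
sample = {"data": [[1.0, 2.0, 3.0, 4.0]]}

# The inference server listens on port 5001 by default.
response = requests.post("http://127.0.0.1:5001/score", json=sample)
print(response.status_code)
print(response.text)
```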
@@ -273,7 +273,7 @@ The inference HTTP server listens on port 5001 by default at the following route
## Review server parameters
-The following table summarizes the parameters accepted by the inference HTTP server:
+The inference HTTP server accepts the following parameters:
| Parameter | Required | Default | Description |
| --- | --- | :---: | --- |
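For reference, the parameter rows themselves are collapsed in this hunk. A typical local launch passes the scoring script and, optionally, a nondefault port, along the lines of `azmlinfsrv --entry_script score.py --port 8085`; the parameter names are as documented in the full table, and the values here are illustrative.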
@@ -286,7 +286,7 @@ The following table summarizes the parameters accepted by the inference HTTP ser
## Explore server request processing
-The following steps demonstrate how the Azure Machine Learning inference HTTP server (azmlinfsrv) handles incoming requests:
+The following steps demonstrate how the Azure Machine Learning inference HTTP server (`azmlinfsrv`) handles incoming requests:
1. A Python CLI wrapper sits around the server's network stack and is used to start the server.
@@ -308,7 +308,7 @@ The following steps demonstrate how the Azure Machine Learning inference HTTP se
There are two ways to obtain log data for the inference HTTP server test:
- Run the `azureml-inference-server-http` package locally and view the logs output.
-- Use online endpoints and view the [container logs](how-to-troubleshoot-online-endpoints.md#get-container-logs). The log for the inference server is named **Azure Machine Learning Inferencing HTTP server <version>**.
+- Use online endpoints and view the [container logs](how-to-troubleshoot-online-endpoints.md#get-container-logs). The log for the inference server is named **Azure Machine Learning Inferencing HTTP server \<version>**.
> [!NOTE]
> The logging format has changed since version 0.8.0. If your log uses a different style than expected, update the `azureml-inference-server-http` package to the latest version.
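For reference, pulling the container logs for an online deployment is typically done with the Azure CLI, along the lines of `az ml online-deployment get-logs --endpoint-name <endpoint-name> --name <deployment-name>`; the linked troubleshooting article gives the exact options.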
@@ -342,7 +342,7 @@ Score: POST 127.0.0.1:<port>/score
For example, when you launch the server by following the [end-to-end example](#use-an-end-to-end-example), the log displays as follows:
-```
+```console
Azure Machine Learning Inferencing HTTP server v0.8.0
Server Settings
@@ -382,13 +382,13 @@ All logs from the inference HTTP server, except for the launcher script, present
- `<UTC Time>`: Time when the entry was entered into the log.
-- `<pid>`: The ID of the process associated with the entry.
-- `<level>`: The first character of the [logging level](https://docs.python.org/3/library/logging.html#logging-levels) for the entry, such as `E` for ERROR, `I` for INFO, and so on.
-- `<logger name>`: The name of the resource associated with the log entry.
-- `<message>`: The contents of the log message.
+- `<pid>`: ID of the process associated with the entry.
+- `<level>`: First character of the [logging level](https://docs.python.org/3/library/logging.html#logging-levels) for the entry, such as `E` for ERROR, `I` for INFO, and so on.
+- `<logger name>`: Name of the resource associated with the log entry.
+- `<message>`: Contents of the log message.
There are six levels of logging in Python with assigned numeric values according to severity:
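For reference, those six levels and their numeric severities come straight from Python's standard `logging` module; a quick check:

```python
import logging

# The six standard Python logging levels and their numeric severities.
for name in ("NOTSET", "DEBUG", "INFO", "WARNING", "ERROR", "CRITICAL"):
    print(f"{name}: {getattr(logging, name)}")
# NOTSET: 0, DEBUG: 10, INFO: 20, WARNING: 30, ERROR: 40, CRITICAL: 50
```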
articles/machine-learning/how-to-troubleshoot-online-endpoints.md (1 addition & 1 deletion)
@@ -269,7 +269,7 @@ If the error message mentions `"failed to communicate with the workspace's conta
Image build timeouts are often due to an image becoming too large to be able to complete building within the timeframe of deployment creation.
To verify if this is your issue, check your image build logs at the location that the error may specify. The logs are cut off at the point that the image build timed out.
-To resolve this, please [build your image separately](/azure/devops/pipelines/ecosystems/containers/publish-to-acr?view=azure-devops&tabs=javascript%2Cportal%2Cmsi) so that the image only needs to be pulled during deployment creation.
+To resolve this, please [build your image separately](/azure/devops/pipelines/ecosystems/containers/publish-to-acr?view=azure-devops&tabs=javascript%2Cportal%2Cmsi&preserve-view=true) so that the image only needs to be pulled during deployment creation.
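For reference, building the image separately usually means building and pushing it to your container registry ahead of time, for example with `az acr build --registry <acr-name> --image <name>:<tag> .`; the linked article walks through the Azure Pipelines route instead.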
articles/machine-learning/includes/machine-learning-inference-server-troubleshooting.md (4 additions & 4 deletions)
@@ -8,11 +8,11 @@ ms.author: shnagata
### Check installed packages
-Follow these steps to determine issues with installed packages:
+Follow these steps to address issues with installed packages:
1. Gather information about installed packages and versions for your Python environment.
-1. Confirm the `azureml-inference-server-http` Python package version specified in the environment file matches the Azure Machine Learning inference HTTP server version displayed in the [startup log](../how-to-inference-server-http.md#startup-logs).
+1. Confirm the `azureml-inference-server-http` Python package version specified in the environment file matches the Azure Machine Learning inference HTTP server version displayed in the [startup log](../how-to-inference-server-http.md#view-startup-logs).
- In some cases, the pip dependency resolver installs unexpected package versions.
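For reference, gathering the installed-package information usually comes down to `pip list` (or `pip freeze`) in that environment; the server package's installed version can also be read programmatically. A small sketch:

```python
from importlib import metadata

# Print the installed version of the inference server package, if any.
try:
    print("azureml-inference-server-http", metadata.version("azureml-inference-server-http"))
except metadata.PackageNotFoundError:
    print("azureml-inference-server-http is not installed in this environment")
```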
@@ -110,6 +110,6 @@ You might encounter an `ImportError` or `ModuleNotFoundError` on specific module
ImportError: cannot import name 'Markup' from 'jinja2'
```
-The import and module errors result in older versions of the server (version **0.4.10** and earlier) that don't pin the Flask dependency to a compatible version.
+The import and module errors occur when you use older versions of the server (version **0.4.10** and earlier) that don't pin the Flask dependency to a compatible version.
-This problem is fixed in the latest version of the server.
+To prevent the issue, install a later version of the server.
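For reference, moving to a later server version is typically just `python -m pip install --upgrade azureml-inference-server-http` in the deployment's environment; pin an explicit version if your image requires one.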
articles/machine-learning/reference-yaml-deployment-managed-online.md (1 addition & 1 deletion)
@@ -53,7 +53,7 @@ The source JSON schema can be found at https://azuremlschemas.azureedge.net/late
| Key | Type | Description | Default value |
| --- | ---- | ----------- | ------------- |
|`request_timeout_ms`| integer | The scoring timeout in milliseconds. Note that the maximum value allowed is `180000` milliseconds. See [limits for online endpoints](how-to-manage-quotas.md#azure-machine-learning-online-endpoints-and-batch-endpoints) for more. |`5000`|
-| `max_concurrent_requests_per_instance` | integer | The maximum number of concurrent requests per instance allowed for the deployment. <br><br> **Note:** If you're using [Azure Machine Learning Inference Server](how-to-inference-server-http.md) or [Azure Machine Learning Inference Images](concept-prebuilt-docker-images-inference.md), your model must be configured to handle concurrent requests. To do so, pass `WORKER_COUNT: <int>` as an environment variable. For more information about `WORKER_COUNT`, see [Azure Machine Learning Inference Server Parameters](how-to-inference-server-http.md#server-parameters) <br><br> **Note:** Set to the number of requests that your model can process concurrently on a single node. Setting this value higher than your model's actual concurrency can lead to higher latencies. Setting this value too low might lead to under utilized nodes. Setting too low might also result in requests being rejected with a 429 HTTP status code, as the system will opt to fail fast. For more information, see [Troubleshooting online endpoints: HTTP status codes](how-to-troubleshoot-online-endpoints.md#http-status-codes). | `1` |
+| `max_concurrent_requests_per_instance` | integer | The maximum number of concurrent requests per instance allowed for the deployment. <br><br> **Note:** If you're using [Azure Machine Learning Inference Server](how-to-inference-server-http.md) or [Azure Machine Learning Inference Images](concept-prebuilt-docker-images-inference.md), your model must be configured to handle concurrent requests. To do so, pass `WORKER_COUNT: <int>` as an environment variable. For more information about `WORKER_COUNT`, see [Azure Machine Learning Inference Server Parameters](how-to-inference-server-http.md#review-server-parameters) <br><br> **Note:** Set to the number of requests that your model can process concurrently on a single node. Setting this value higher than your model's actual concurrency can lead to higher latencies. Setting this value too low might lead to under utilized nodes. Setting too low might also result in requests being rejected with a 429 HTTP status code, as the system will opt to fail fast. For more information, see [Troubleshooting online endpoints: HTTP status codes](how-to-troubleshoot-online-endpoints.md#http-status-codes). | `1` |
|`max_queue_wait_ms`| integer | (Deprecated) The maximum amount of time in milliseconds a request will stay in the queue. (Now increase `request_timeout_ms` to account for any networking/queue delays) |`500`|
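For reference, in the deployment YAML `request_timeout_ms`, `max_concurrent_requests_per_instance`, and `max_queue_wait_ms` sit under the `request_settings` block, while `WORKER_COUNT` is passed under `environment_variables`, typically with the worker count matched to the concurrency value; key placement follows the schema this article documents.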