
Commit 7f92402

Update text; use shortened name of inference server consistently

1 parent a5ae4c4 · commit 7f92402

File tree: 6 files changed, +100 −98 lines

articles/machine-learning/how-to-inference-server-http.md

Lines changed: 82 additions & 80 deletions
Large diffs are not rendered by default.

articles/machine-learning/how-to-troubleshoot-online-endpoints.md

Lines changed: 1 addition & 1 deletion
@@ -474,7 +474,7 @@ To run the *score.py* file you provide as part of the deployment, Azure creates
 
 - There's an error in the container environment setup, such as a missing dependency.
 
-If you get the `TypeError: register() takes 3 positional arguments but 4 were given` error, check the dependency between flask v2 and `azureml-inference-server-http`. For more information, see [Troubleshoot HTTP server issues](how-to-inference-server-http.md#typeerror-during-server-startup).
+If you get the `TypeError: register() takes 3 positional arguments but 4 were given` error, check the dependency between flask v2 and `azureml-inference-server-http`. For more information, see [Troubleshoot HTTP server issues](how-to-inference-server-http.md#typeerror-during-inference-server-startup).
 
 ### ERROR: ResourceNotFound
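As an aside on the change above, a quick way to check for the flask v2 and `azureml-inference-server-http` conflict it references (a hedged sketch; the package names are real, but output varies by environment):

```bash
# Show the installed versions of both packages side by side, to spot the
# Flask 2 vs. older inference server combination that triggers the TypeError.
pip show flask azureml-inference-server-http
```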

articles/machine-learning/includes/machine-learning-inference-server-troubleshooting.md

Lines changed: 15 additions & 15 deletions
@@ -13,25 +13,25 @@ Follow these steps to address issues with installed packages:
 
 1. Gather information about installed packages and versions for your Python environment.
 
-1. In your environment file, check the version of the `azureml-inference-server-http` Python package that's specified. In the Azure Machine Learning inference HTTP server [startup logs](../how-to-inference-server-http.md#view-startup-logs), check the version of the server that's displayed. Confirm that the two versions match.
+1. In your environment file, check the version of the `azureml-inference-server-http` Python package that's specified. In the Azure Machine Learning inference HTTP server [startup logs](../how-to-inference-server-http.md#view-startup-logs), check the version of the inference server that's displayed. Confirm that the two versions match.
 
    In some cases, the pip dependency resolver installs unexpected package versions. You might need to run `pip` to correct installed packages and versions.
 
 1. If you specify Flask or its dependencies in your environment, remove these items.
 
    - Dependent packages include `flask`, `jinja2`, `itsdangerous`, `werkzeug`, `markupsafe`, and `click`.
-   - The `flask` package is listed as a dependency in the server package. The best approach is to allow the inference server to install the `flask` package.
-   - When the inference server is configured to support new versions of Flask, the server automatically receives the package updates as they become available.
+   - The `flask` package is listed as a dependency in the inference server package. The best approach is to allow the inference server to install the `flask` package.
+   - When the inference server is configured to support new versions of Flask, the inference server automatically receives the package updates as they become available.
 
-### Check server version
+### Check inference server version
 
 The `azureml-inference-server-http` server package is published to PyPI. The [PyPI page](https://pypi.org/project/azureml-inference-server-http/) lists the changelog and all versions of the package.
 
 If you use an early package version, update your configuration to the latest version. The following table summarizes stable versions, common issues, and recommended adjustments:
 
 | Package version | Description | Issue | Resolution |
-| --- | --- | --- |
-| 0.4.x | Bundled in training images dated `20220601` or earlier and `azureml-defaults` package versions 0.1.34 through 1.43. Latest stable version is 0.4.13. | For server versions earlier than 0.4.11, you might encounter Flask dependency issues, such as `can't import name Markup from jinja2`. | Upgrade to version 0.4.13 or 0.8.x, the latest version, if possible. |
+| --- | --- | --- | --- |
+| 0.4.x | Bundled in training images dated `20220601` or earlier and `azureml-defaults` package versions 0.1.34 through 1.43. Latest stable version is 0.4.13. | For server versions earlier than 0.4.11, you might encounter Flask dependency issues, such as `can't import name Markup from jinja2`. | Upgrade to version 0.4.13 or 1.4.x, the latest version, if possible. |
 | 0.6.x | Preinstalled in inferencing images dated `20220516` and earlier. Latest stable version is 0.6.1. | N/A | N/A |
 | 0.7.x | Supports Flask 2. Latest stable version is 0.7.7. | N/A | N/A |
 | 0.8.x | Uses an updated log format. Ends support for Python 3.6. | N/A | N/A |
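To illustrate the version check and upgrade that this hunk describes, a minimal sketch (standard pip commands; the matching version line comes from the inference server's own startup logs):

```bash
# Compare the pip-installed package version against the version printed in the
# inference server startup logs, then upgrade if the two diverge.
pip freeze | grep azureml-inference-server-http
pip install --upgrade azureml-inference-server-http
```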
@@ -54,9 +54,9 @@ If you specify the `azureml-defaults` package in your Python environment, the `a
 > [!TIP]
 > If you use the Azure Machine Learning SDK for Python v1 and don't explicitly specify the `azureml-defaults` package in your Python environment, the SDK might automatically add the package. However, the package version is locked relative to the SDK version. For example, if the SDK version is 1.38.0, the `azureml-defaults==1.38.0` entry is added to the environment's pip requirements.
 
-### TypeError during server startup
+### TypeError during inference server startup
 
-You might encounter the following `TypeError` during server startup:
+You might encounter the following `TypeError` during inference server startup:
 
 ```bash
 TypeError: register() takes 3 positional arguments but 4 were given
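A hedged way to confirm the combination behind this `TypeError` (Flask 2 installed alongside an inference server release that predates Flask 2 support, which per the version table above means earlier than 0.7.x):

```bash
# Print the installed Flask version and the inference server package details;
# Flask 2.x next to a pre-0.7.x inference server matches the failure mode above.
python -c "import flask; print(flask.__version__)"
pip show azureml-inference-server-http
```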
@@ -89,20 +89,20 @@ This error occurs when you have Flask 2 installed in your Python environment, bu
 
 If you don't see a similar message in your container log, your image is out-of-date and should be updated. If you use a Compute Unified Device Architecture (CUDA) image and you can't find a newer image, check the [AzureML-Containers](https://github.com/Azure/AzureML-Containers) repo to see whether your image is deprecated. You can find designated replacements for deprecated images.
 
-If you use the server with an online endpoint, you can also find the logs in Azure Machine Learning studio. On the page for your endpoint, select the **Logs** tab.
+If you use the inference server with an online endpoint, you can also find the logs in Azure Machine Learning studio. On the page for your endpoint, select the **Logs** tab.
 
-If you deploy with the SDK v1 and don't explicitly specify an image in your deployment configuration, the server applies the `openmpi4.1.0-ubuntu20.04` package with a version that matches your local SDK toolset. However, the installed version might not be the latest available version of the image.
+If you deploy with the SDK v1 and don't explicitly specify an image in your deployment configuration, the inference server applies the `openmpi4.1.0-ubuntu20.04` package with a version that matches your local SDK toolset. However, the installed version might not be the latest available version of the image.
 
-For SDK version 1.43, the server installs the `openmpi4.1.0-ubuntu20.04:20220616` package version by default, but this package version isn't compatible with SDK 1.43. Make sure you use the latest SDK for your deployment.
+For SDK version 1.43, the inference server installs the `openmpi4.1.0-ubuntu20.04:20220616` package version by default, but this package version isn't compatible with SDK 1.43. Make sure you use the latest SDK for your deployment.
 
-If you can't update the image, you can temporarily avoid the issue by pinning the `azureml-defaults==1.43` or `azureml-inference-server-http~=0.4.13` entries in your environment file. These entries direct the server to install the older version with `flask 1.0.x`.
+If you can't update the image, you can temporarily avoid the issue by pinning the `azureml-defaults==1.43` or `azureml-inference-server-http~=0.4.13` entries in your environment file. These entries direct the inference server to install the older version with `flask 1.0.x`.
 
-### ImportError or ModuleNotFoundError during server startup
+### ImportError or ModuleNotFoundError during inference server startup
 
-You might encounter an `ImportError` or `ModuleNotFoundError` on specific modules, such as `opencensus`, `jinja2`, `markupsafe`, or `click`, during server startup. The following example shows the error message:
+You might encounter an `ImportError` or `ModuleNotFoundError` on specific modules, such as `opencensus`, `jinja2`, `markupsafe`, or `click`, during inference server startup. The following example shows the error message:
 
 ```bash
 ImportError: cannot import name 'Markup' from 'jinja2'
 ```
 
-The import and module errors occur when you use version 0.4.10 or earlier versions of the server that don't pin the Flask dependency to a compatible version. To prevent the issue, install a later version of the server.
+The import and module errors occur when you use version 0.4.10 or earlier versions of the inference server that don't pin the Flask dependency to a compatible version. To prevent the issue, install a later version of the inference server.
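The temporary pin described in this hunk, expressed as direct pip installs (a sketch; either entry alone is sufficient, per the text above):

```bash
# Pin either entry in your environment file to direct the inference server to
# install the older releases that still use flask 1.0.x.
pip install "azureml-defaults==1.43"
# or:
pip install "azureml-inference-server-http~=0.4.13"
```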

articles/machine-learning/reference-yaml-deployment-managed-online.md

Lines changed: 1 addition & 1 deletion
@@ -53,7 +53,7 @@ The source JSON schema can be found at https://azuremlschemas.azureedge.net/late
 | Key | Type | Description | Default value |
 | --- | ---- | ----------- | ------------- |
 | `request_timeout_ms` | integer | The scoring timeout in milliseconds. Note that the maximum value allowed is `180000` milliseconds. See [limits for online endpoints](how-to-manage-quotas.md#azure-machine-learning-online-endpoints-and-batch-endpoints) for more. | `5000` |
-| `max_concurrent_requests_per_instance` | integer | The maximum number of concurrent requests per instance allowed for the deployment. <br><br> **Note:** If you're using [Azure Machine Learning Inference Server](how-to-inference-server-http.md) or [Azure Machine Learning Inference Images](concept-prebuilt-docker-images-inference.md), your model must be configured to handle concurrent requests. To do so, pass `WORKER_COUNT: <int>` as an environment variable. For more information about `WORKER_COUNT`, see [Azure Machine Learning Inference Server Parameters](how-to-inference-server-http.md#review-server-parameters) <br><br> **Note:** Set to the number of requests that your model can process concurrently on a single node. Setting this value higher than your model's actual concurrency can lead to higher latencies. Setting this value too low might lead to under utilized nodes. Setting too low might also result in requests being rejected with a 429 HTTP status code, as the system will opt to fail fast. For more information, see [Troubleshooting online endpoints: HTTP status codes](how-to-troubleshoot-online-endpoints.md#http-status-codes). | `1` |
+| `max_concurrent_requests_per_instance` | integer | The maximum number of concurrent requests per instance allowed for the deployment. <br><br> **Note:** If you're using [Azure Machine Learning Inference Server](how-to-inference-server-http.md) or [Azure Machine Learning Inference Images](concept-prebuilt-docker-images-inference.md), your model must be configured to handle concurrent requests. To do so, pass `WORKER_COUNT: <int>` as an environment variable. For more information about `WORKER_COUNT`, see [Azure Machine Learning Inference Server Parameters](how-to-inference-server-http.md#review-inference-server-parameters) <br><br> **Note:** Set to the number of requests that your model can process concurrently on a single node. Setting this value higher than your model's actual concurrency can lead to higher latencies. Setting this value too low might lead to under utilized nodes. Setting too low might also result in requests being rejected with a 429 HTTP status code, as the system will opt to fail fast. For more information, see [Troubleshooting online endpoints: HTTP status codes](how-to-troubleshoot-online-endpoints.md#http-status-codes). | `1` |
 | `max_queue_wait_ms` | integer | (Deprecated) The maximum amount of time in milliseconds a request will stay in the queue. (Now increase `request_timeout_ms` to account for any networking/queue delays) | `500` |
 
 ### ProbeSettings
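To make the `WORKER_COUNT` note in the changed row concrete, a hedged local sketch (the `azmlinfsrv` entry point and `--entry_script` parameter come from the linked inference server article; the worker value is illustrative):

```bash
# WORKER_COUNT sets how many concurrent requests one instance should handle;
# keep max_concurrent_requests_per_instance aligned with it in the deployment.
export WORKER_COUNT=4
azmlinfsrv --entry_script score.py
```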

articles/machine-learning/toc.yml

Lines changed: 1 addition & 1 deletion
@@ -953,7 +953,7 @@ items:
       href: how-to-monitor-online-endpoints.md
     - name: Debug online endpoints locally VS Code
       href: how-to-debug-managed-online-endpoints-visual-studio-code.md
-    - name: Debug scoring script with inference HTTP server
+    - name: Debug scoring scripts with inference HTTP server
       href: how-to-inference-server-http.md
     - name: Troubleshoot online endpoints
       href: how-to-troubleshoot-online-endpoints.md
