Commit 2e224e2

remove old slm doc
1 parent 5ca114b commit 2e224e2

9 files changed: +129 −238 lines
Lines changed: 56 additions & 0 deletions
@@ -0,0 +1,56 @@
---
author: cephalin
ms.service: azure-app-service
ms.topic: include
ms.date: 05/07/2025
ms.author: cephalin
---

## Frequently asked questions

### How does the pricing tier affect the performance of the SLM sidecar?

Because AI models consume considerable resources, choose a pricing tier that gives you sufficient vCPUs and memory to run your specific model. For this reason, the built-in AI sidecar extensions appear only when the app is in a suitable pricing tier. If you build your own SLM sidecar container, you should also use a CPU-optimized model, because the App Service pricing tiers are CPU-only.

For example, the [Phi-3 mini model with a 4K context length from Hugging Face](https://huggingface.co/microsoft/Phi-3-mini-4k-instruct-onnx) is designed to run with limited resources and provides strong math and logical reasoning for many common scenarios. It also comes with a CPU-optimized version. In App Service, we tested the model on all premium tiers and found it to perform well in the [P2mv3](https://azure.microsoft.com/pricing/details/app-service/linux/) tier or higher. If your requirements allow, you can run it on a lower tier.

### How do I use my own SLM sidecar?

The sample repository contains a sample SLM container that you can use as a sidecar. It runs a FastAPI application that listens on port 8000, as specified in its [Dockerfile](https://github.com/Azure-Samples/ai-slm-in-app-service-sidecar/blob/main/bring_your_own_slm/src/phi-3-sidecar/Dockerfile). The application uses [ONNX Runtime](https://onnxruntime.ai/docs/) to load the Phi-3 model, forwards the HTTP POST data to the model, and streams the model's response back to the client. For more information, see [model_api.py](https://github.com/Azure-Samples/ai-slm-in-app-service-sidecar/blob/main/src/phi-3-sidecar/model_api.py).
To build the sidecar image yourself, install Docker Desktop locally on your machine.

1. Clone the repository locally.

   ```bash
   git clone https://github.com/Azure-Samples/ai-slm-in-app-service-sidecar
   cd ai-slm-in-app-service-sidecar
   ```

1. Change into the Phi-3 image's source directory and download the model locally by using the [Hugging Face CLI](https://huggingface.co/docs/huggingface_hub/guides/cli).

   ```bash
   cd bring_your_own_slm/src/phi-3-sidecar
   huggingface-cli download microsoft/Phi-3-mini-4k-instruct-onnx --local-dir ./Phi-3-mini-4k-instruct-onnx
   ```

   The [Dockerfile](https://github.com/Azure-Samples/ai-slm-in-app-service-sidecar/blob/main/src/phi-3-sidecar/Dockerfile) is configured to copy the model from *./Phi-3-mini-4k-instruct-onnx*.

1. Build the Docker image. For example:

   ```bash
   docker build --tag phi-3 .
   ```

1. Upload the built image to Azure Container Registry by following [Push your first image to your Azure container registry using the Docker CLI](/azure/container-registry/container-registry-get-started-docker-cli).

1. In the **Deployment Center** > **Containers (new)** tab, select **Add** > **Custom container** and configure the new container as follows:

   - **Name**: *phi-3*
   - **Image source**: **Azure Container Registry**
   - **Registry**: your registry
   - **Image**: the uploaded image
   - **Tag**: the image tag you want
   - **Port**: *8000*

1. Select **Apply**.

See [bring_your_own_slm/src/webapp](https://github.com/Azure-Samples/ai-slm-in-app-service-sidecar/blob/main/bring_your_own_slm/src/webapp) for a sample application that interacts with this custom sidecar container.
Lines changed: 33 additions & 0 deletions
@@ -0,0 +1,33 @@
---
author: cephalin
ms.service: azure-app-service
ms.topic: include
ms.date: 05/07/2025
ms.author: cephalin
---

## Add the Phi-3 sidecar extension

In this section, you add the Phi-3 sidecar extension to your application hosted on Azure App Service.

1. Navigate to the Azure portal and go to your app's management page.
2. In the left-hand menu, select **Deployment** > **Deployment Center**.
3. On the **Containers** tab, select **Add** > **Sidecar extension**.
4. In the sidecar extension options, select **AI: phi-3-mini-4k-instruct-q4-gguf (Experimental)**.
5. Provide a name for the sidecar extension.
6. Select **Save** to apply the changes.
7. Wait a few minutes for the sidecar extension to deploy. Keep selecting **Refresh** until the **Status** column shows **Running**.

This Phi-3 sidecar extension exposes an OpenAI-style [chat completion API](https://platform.openai.com/docs/api-reference/chat/create) that serves chat completion requests at `http://localhost:11434/v1/chat/completions`. For more information on how to interact with the API, see:

- [OpenAI documentation: Create chat completion](https://platform.openai.com/docs/api-reference/chat/create)
- [OpenAI documentation: Streaming](https://platform.openai.com/docs/api-reference/chat-streaming)

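Because the extension speaks the standard chat completion protocol, any HTTP client running in the app can call it. As a minimal sketch, assuming the OpenAI-style request, response, and SSE streaming shapes linked above (the helper names here are hypothetical; only the URL comes from this article):

```python
# Hedged sketch: calling the sidecar's OpenAI-style chat completion endpoint.
import json
import urllib.request
from typing import Optional

SLM_URL = "http://localhost:11434/v1/chat/completions"  # sidecar endpoint

def build_payload(system_message: str, user_prompt: str, stream: bool = False) -> dict:
    """Assemble a minimal OpenAI-style chat completion request body."""
    return {
        "messages": [
            {"role": "system", "content": system_message},
            {"role": "user", "content": user_prompt},
        ],
        "stream": stream,
    }

def ask_slm(payload: dict) -> str:
    """POST the payload to the sidecar and return the first choice's text."""
    request = urllib.request.Request(
        SLM_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]

def extract_stream_content(sse_line: str) -> Optional[str]:
    """Pull the delta text out of one 'data: {...}' line of a streamed reply."""
    if not sse_line.startswith("data: ") or sse_line.strip() == "data: [DONE]":
        return None
    chunk = json.loads(sse_line[len("data: "):])
    return chunk["choices"][0]["delta"].get("content")

# Example (requires the sidecar extension to be in the Running state):
# print(ask_slm(build_payload("You are a helpful assistant.", "Say hello.")))
```

When `stream` is `True`, read the response line by line and feed each line through `extract_stream_content` to render tokens as they arrive.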
## Test the chatbot

1. In your app's management page, in the left-hand menu, select **Overview**.
1. Under **Default domain**, select the URL to open your web app in a browser.
1. Verify that the chatbot application is running and responding to user inputs.

   :::image type="content" source="../media/tutorial-ai-slm-dotnet/fashion-store-assistant-live.png" alt-text="Screenshot showing the fashion assistant app running in the browser.":::

articles/app-service/toc.yml

Lines changed: 0 additions & 2 deletions
@@ -472,8 +472,6 @@ items:
 items:
 - name: Deploy an application that uses OpenAI on App Service
   href: deploy-intelligent-apps.md
-- name: Run an SLM in sidecar
-  href: tutorial-sidecar-local-small-language-model.md
 - name: Deploy a .NET app with Azure OpenAI and Azure SQL
   href: deploy-intelligent-apps-dotnet-to-azure-sql.md
 - name: Invoke OpenAPI app from Azure AI Agent

articles/app-service/tutorial-ai-slm-dotnet.md

Lines changed: 9 additions & 22 deletions
@@ -26,7 +26,7 @@ Hosting your own small language model (SLM) offers several advantages:

 ## Deploy the sample application

-1. In the browser, navigate to the [sample application repository](https://github.com/cephalin/sidecar-samples).
+1. In the browser, navigate to the [sample application repository](https://github.com/Azure-Samples/ai-slm-in-app-service-sidecar).
 2. Start a new Codespace from the repository.
 1. Log in with your Azure account:

@@ -37,37 +37,19 @@ Hosting your own small language model (SLM) offers several advantages:
 1. Open the terminal in the Codespace and run the following commands:

    ```azurecli
-   cd dotnetapp
+   cd use_sidecar_extension/dotnetapp
    az webapp up --sku P3MV3 --os-type linux
    ```

    This startup command is a common setup for deploying ASP.NET Core applications to Azure App Service. For more information, see [Quickstart: Deploy an ASP.NET web app](quickstart-dotnetcore.md).

-## Add the Phi-3 sidecar extension
-
-In this section, you add the Phi-3 sidecar extension to your ASP.NET Core application hosted on Azure App Service.
-
-1. Navigate to the Azure portal and go to your app's management page.
-2. In the left-hand menu, select **Deployment** > **Deployment Center**.
-3. On the **Containers** tab, select **Add** > **Sidecar extension**.
-4. In the sidecar extension options, select **AI: phi-3-mini-4k-instruct-q4-gguf (Experimental)**.
-5. Provide a name for the sidecar extension.
-6. Select **Save** to apply the changes.
-7. Wait a few minutes for the sidecar extension to deploy. Keep selecting **Refresh** until the **Status** column shows **Running**.
-
-## Test the chatbot
-
-1. In your app's management page, in the left-hand menu, select **Overview**.
-1. Under **Default domain**, select the URL to open your web app in a browser.
-1. Verify that the chatbot application is running and responding to user inputs.
-
-   :::image type="content" source="media/tutorial-ai-slm-dotnet/fashion-store-assistant-live.png" alt-text="screenshot showing the fashion assistant app running in the browser.":::
+[!INCLUDE [phi-3-extension-create-test](includes/tutorial-ai-slm/phi-3-extension-create-test.md)]

 ## How the sample application works

 The sample application demonstrates how to integrate a .NET service with the SLM sidecar extension. The `SLMService` class encapsulates the logic for sending requests to the SLM API and processing the streamed responses. This integration enables the application to generate conversational responses dynamically.

-Looking in https://github.com/cephalin/sidecar-samples/blob/webstacks/dotnetapp/Services/SLMService.cs, you see that:
+Looking in [use_sidecar_extension/dotnetapp/Services/SLMService.cs](https://github.com/Azure-Samples/ai-slm-in-app-service-sidecar/blob/main/use_sidecar_extension/dotnetapp/Services/SLMService.cs), you see that:

 - The service reads the URL from `fashion.assistant.api.url`, which is set in *appsettings.json* and has the value of `http://localhost:11434/v1/chat/completions`.
@@ -78,6 +60,7 @@ Looking in https://github.com/cephalin/sidecar-samples/blob/webstacks/dotnetapp/
       _apiUrl = configuration["FashionAssistantAPI:Url"] ?? "http://localhost:11434";
   }
   ```
+
 - The POST payload includes the system message and the prompt that's built from the selected product and the user query.

   ```csharp
@@ -119,4 +102,8 @@ Looking in https://github.com/cephalin/sidecar-samples/blob/webstacks/dotnetapp/
   }
   ```

+[!INCLUDE [faq](includes/tutorial-ai-slm/faq.md)]
+
 ## Next steps
+
+[Tutorial: Configure a sidecar container for a Linux app in Azure App Service](tutorial-sidecar.md)

articles/app-service/tutorial-ai-slm-expressjs.md

Lines changed: 9 additions & 22 deletions
@@ -26,7 +26,7 @@ Hosting your own small language model (SLM) offers several advantages:

 ## Deploy the sample application

-1. In the browser, navigate to the [sample application repository](https://github.com/cephalin/sidecar-samples).
+1. In the browser, navigate to the [sample application repository](https://github.com/Azure-Samples/ai-slm-in-app-service-sidecar).
 2. Start a new Codespace from the repository.
 1. Log in with your Azure account:

@@ -37,43 +37,26 @@ Hosting your own small language model (SLM) offers several advantages:
 1. Open the terminal in the Codespace and run the following commands:

    ```azurecli
-   cd expressapp
+   cd use_sidecar_extension/expressapp
    az webapp up --sku P3MV3
    ```

    This startup command is a common setup for deploying Express.js applications to Azure App Service. For more information, see [Deploy a Node.js web app in Azure](quickstart-nodejs.md).

-## Add the Phi-3 sidecar extension
-
-In this section, you add the Phi-3 sidecar extension to your Express.js application hosted on Azure App Service.
-
-1. Navigate to the Azure portal and go to your app's management page.
-2. In the left-hand menu, select **Deployment** > **Deployment Center**.
-3. On the **Containers** tab, select **Add** > **Sidecar extension**.
-4. In the sidecar extension options, select **AI: phi-3-mini-4k-instruct-q4-gguf (Experimental)**.
-5. Provide a name for the sidecar extension.
-6. Select **Save** to apply the changes.
-7. Wait a few minutes for the sidecar extension to deploy. Keep selecting **Refresh** until the **Status** column shows **Running**.
-
-## Test the chatbot
-
-1. In your app's management page, in the left-hand menu, select **Overview**.
-1. Under **Default domain**, select the URL to open your web app in a browser.
-1. Verify that the chatbot application is running and responding to user inputs.
-
-   :::image type="content" source="media/tutorial-ai-slm-dotnet/fashion-store-assistant-live.png" alt-text="screenshot showing the fashion assistant app running in the browser.":::
+[!INCLUDE [phi-3-extension-create-test](includes/tutorial-ai-slm/phi-3-extension-create-test.md)]

 ## How the sample application works

 The sample application demonstrates how to integrate an Express.js-based service with the SLM sidecar extension. The `SLMService` class encapsulates the logic for sending requests to the SLM API and processing the streamed responses. This integration enables the application to generate conversational responses dynamically.

-Looking in https://github.com/cephalin/sidecar-samples/blob/webstacks/expressapp/src/services/slm_service.js, you see that:
+Looking in [use_sidecar_extension/expressapp/src/services/slm_service.js](https://github.com/Azure-Samples/ai-slm-in-app-service-sidecar/blob/main/use_sidecar_extension/expressapp/src/services/slm_service.js), you see that:

 - The service sends a POST request to the SLM endpoint `http://127.0.0.1:11434/v1/chat/completions`.

   ```javascript
   this.apiUrl = 'http://127.0.0.1:11434/v1/chat/completions';
   ```
+
 - The POST payload includes the system message and the prompt that's built from the selected product and the user query.

   ```javascript
@@ -134,4 +117,8 @@ Looking in https://github.com/cephalin/sidecar-samples/blob/webstacks/expressapp
   });
   ```

+[!INCLUDE [faq](includes/tutorial-ai-slm/faq.md)]
+
 ## Next steps
+
+[Tutorial: Configure a sidecar container for a Linux app in Azure App Service](tutorial-sidecar.md)

articles/app-service/tutorial-ai-slm-fastapi.md

Lines changed: 8 additions & 22 deletions
@@ -25,7 +25,7 @@ Hosting your own small language model (SLM) offers several advantages:

 ## Deploy the sample application

-1. In the browser, navigate to the [sample application repository](https://github.com/cephalin/sidecar-samples).
+1. In the browser, navigate to the [sample application repository](https://github.com/Azure-Samples/ai-slm-in-app-service-sidecar).
 2. Start a new Codespace from the repository.
 1. Log in with your Azure account:

@@ -36,38 +36,20 @@ Hosting your own small language model (SLM) offers several advantages:
 1. Open the terminal in the Codespace and run the following commands:

    ```azurecli
-   cd fastapiapp
+   cd use_sidecar_extension/fastapiapp
    az webapp up --sku P3MV3
    az webapp config set --startup-file "gunicorn -w 4 -k uvicorn.workers.UvicornWorker app.main:app"
    ```

    This startup command is a common setup for deploying FastAPI applications to Azure App Service. For more information, see [Quickstart: Deploy a Python (Django, Flask, or FastAPI) web app to Azure App Service](quickstart-python.md).

-## Add the Phi-3 sidecar extension
-
-In this section, you add the Phi-3 sidecar extension to your FastAPI application hosted on Azure App Service.
-
-1. Navigate to the Azure portal and go to your app's management page.
-2. In the left-hand menu, select **Deployment** > **Deployment Center**.
-3. On the **Containers** tab, select **Add** > **Sidecar extension**.
-4. In the sidecar extension options, select **AI: phi-3-mini-4k-instruct-q4-gguf (Experimental)**.
-5. Provide a name for the sidecar extension.
-6. Select **Save** to apply the changes.
-7. Wait a few minutes for the sidecar extension to deploy. Keep selecting **Refresh** until the **Status** column shows **Running**.
-
-## Test the chatbot
-
-1. In your app's management page, in the left-hand menu, select **Overview**.
-1. Under **Default domain**, select the URL to open your web app in a browser.
-1. Verify that the chatbot application is running and responding to user inputs.
-
-   :::image type="content" source="media/tutorial-ai-slm-dotnet/fashion-store-assistant-live.png" alt-text="screenshot showing the fashion assistant app running in the browser.":::
+[!INCLUDE [phi-3-extension-create-test](includes/tutorial-ai-slm/phi-3-extension-create-test.md)]

 ## How the sample application works

 The sample application demonstrates how to integrate a FastAPI-based service with the SLM sidecar extension. The `SLMService` class encapsulates the logic for sending requests to the SLM API and processing the streamed responses. This integration enables the application to generate conversational responses dynamically.

-Looking in https://github.com/cephalin/sidecar-samples/blob/webstacks/fastapiapp/app/services/slm_service.py, you see that:
+Looking in [use_sidecar_extension/fastapiapp/app/services/slm_service.py](https://github.com/Azure-Samples/ai-slm-in-app-service-sidecar/blob/main/use_sidecar_extension/fastapiapp/app/services/slm_service.py), you see that:

 - The service sends a POST request to the SLM endpoint `http://localhost:11434/v1/chat/completions`.
@@ -116,4 +98,8 @@ Looking in https://github.com/cephalin/sidecar-samples/blob/webstacks/fastapiapp
   yield content
   ```

+[!INCLUDE [faq](includes/tutorial-ai-slm/faq.md)]
+
 ## Next steps
+
+[Tutorial: Configure a sidecar container for a Linux app in Azure App Service](tutorial-sidecar.md)

articles/app-service/tutorial-ai-slm-spring-boot.md

Lines changed: 9 additions & 22 deletions
@@ -26,7 +26,7 @@ Hosting your own small language model (SLM) offers several advantages:

 ## Deploy the sample application

-1. In the browser, navigate to the [sample application repository](https://github.com/cephalin/sidecar-samples).
+1. In the browser, navigate to the [sample application repository](https://github.com/Azure-Samples/ai-slm-in-app-service-sidecar).
 2. Start a new Codespace from the repository.
 1. Log in with your Azure account:

@@ -37,36 +37,18 @@ Hosting your own small language model (SLM) offers several advantages:
 1. Open the terminal in the Codespace and run the following commands:

    ```azurecli
-   cd springapp
+   cd use_sidecar_extension/springapp
    ./mvnw clean package
    az webapp up --sku P3MV3 --runtime "JAVA:21-java21" --os-type linux
    ```

-## Add the Phi-3 sidecar extension
-
-In this section, you add the Phi-3 sidecar extension to your FastAPI application hosted on Azure App Service.
-
-1. Navigate to the Azure portal and go to your app's management page.
-2. In the left-hand menu, select **Deployment** > **Deployment Center**.
-3. On the **Containers** tab, select **Add** > **Sidecar extension**.
-4. In the sidecar extension options, select **AI: phi-3-mini-4k-instruct-q4-gguf (Experimental)**.
-5. Provide a name for the sidecar extension.
-6. Select **Save** to apply the changes.
-7. Wait a few minutes for the sidecar extension to deploy. Keep selecting **Refresh** until the **Status** column shows **Running**.
-
-## Test the chatbot
-
-1. In your app's management page, in the left-hand menu, select **Overview**.
-1. Under **Default domain**, select the URL to open your web app in a browser.
-1. Verify that the chatbot application is running and responding to user inputs.
-
-   :::image type="content" source="media/tutorial-ai-slm-dotnet/fashion-store-assistant-live.png" alt-text="screenshot showing the fashion assistant app running in the browser.":::
+[!INCLUDE [phi-3-extension-create-test](includes/tutorial-ai-slm/phi-3-extension-create-test.md)]

 ## How the sample application works

 The sample application demonstrates how to integrate a Java service with the SLM sidecar extension. The `ReactiveSLMService` class encapsulates the logic for sending requests to the SLM API and processing the streamed responses. This integration enables the application to generate conversational responses dynamically.

-Looking in https://github.com/cephalin/sidecar-samples/blob/webstacks/springapp/src/main/java/com/example/springapp/service/ReactiveSLMService.java, you see that:
+Looking in [use_sidecar_extension/springapp/src/main/java/com/example/springapp/service/ReactiveSLMService.java](https://github.com/Azure-Samples/ai-slm-in-app-service-sidecar/blob/main/use_sidecar_extension/springapp/src/main/java/com/example/springapp/service/ReactiveSLMService.java), you see that:

 - The service reads the URL from `fashion.assistant.api.url`, which is set in *application.properties* and has the value of `http://localhost:11434/v1/chat/completions`.
@@ -77,6 +59,7 @@ Looking in https://github.com/cephalin/sidecar-samples/blob/webstacks/springapp/
       .build();
   }
   ```
+
 - The POST payload includes the system message and the prompt that's built from the selected product and the user query.

   ```java
@@ -116,4 +99,8 @@ Looking in https://github.com/cephalin/sidecar-samples/blob/webstacks/springapp/
   .map(content -> content.replace(" ", "\u00A0"));
   ```

+[!INCLUDE [faq](includes/tutorial-ai-slm/faq.md)]
+
 ## Next steps
+
+[Tutorial: Configure a sidecar container for a Linux app in Azure App Service](tutorial-sidecar.md)
