---
title: "Tutorial: Express.js chatbot with SLM extension"
description: "Learn how to deploy an Express.js application integrated with a Phi-3 sidecar extension on Azure App Service."
author: "cephalin"
ms.author: "cephalin"
ms.date: "2025-05-06"
ms.topic: tutorial
ms.service: app-service
---

# Tutorial: Run chatbot in App Service with a Phi-3 sidecar extension (Express.js)

This tutorial guides you through deploying an Express.js-based chatbot application integrated with the Phi-3 sidecar extension on Azure App Service. By following the steps, you'll learn how to set up a scalable web app, add an AI-powered sidecar for enhanced conversational capabilities, and test the chatbot's functionality.

Hosting your own small language model (SLM) offers several advantages:

- By hosting the model yourself, you maintain full control over your data. This ensures sensitive information is not exposed to third-party services, which is critical for industries with strict compliance requirements.
- Self-hosted models can be fine-tuned to meet specific use cases or domain-specific requirements.
- Hosting the model close to your application or users minimizes network latency, resulting in faster response times and a better user experience.
- You can scale the deployment based on your specific needs and have full control over resource allocation, ensuring optimal performance for your application.
- Hosting your own model allows for greater flexibility in experimenting with new features, architectures, or integrations without being constrained by third-party service limitations.

## Prerequisites

- An [Azure account](https://azure.microsoft.com/free/) with an active subscription.
- A [GitHub account](https://github.com/).

## Deploy the sample application

1. In the browser, navigate to the [sample application repository](https://github.com/cephalin/sidecar-samples).
1. Start a new Codespace from the repository.
1. In the Codespace terminal, log in with your Azure account:

    ```azurecli
    az login
    ```

1. In the same terminal, run the following commands:

    ```azurecli
    cd expressapp
    az webapp up --sku P3MV3
    ```

The `az webapp up` command creates the App Service app if needed and deploys your code. It's a common way to deploy Express.js applications to Azure App Service. For more information, see [Deploy a Node.js web app in Azure](quickstart-nodejs.md).
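
If you're new to Express.js on App Service, note that App Service provides the listening port through the `PORT` environment variable. The following is a minimal sketch of an Express.js app that satisfies this requirement; it's illustrative only, not the sample's actual entry point.

```javascript
// Minimal illustrative Express.js app -- not the sample's actual code.
const express = require('express');
const app = express();

app.get('/', (req, res) => {
  res.send('Hello from App Service!');
});

// App Service supplies the port to listen on via the PORT environment variable.
const port = process.env.PORT || 3000;
app.listen(port, () => console.log(`Listening on port ${port}`));
```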

## Add the Phi-3 sidecar extension

In this section, you add the Phi-3 sidecar extension to your Express.js application hosted on Azure App Service.

1. Navigate to the Azure portal and go to your app's management page.
2. In the left-hand menu, select **Deployment** > **Deployment Center**.
3. On the **Containers** tab, select **Add** > **Sidecar extension**.
4. In the sidecar extension options, select **AI: phi-3-mini-4k-instruct-q4-gguf (Experimental)**.
5. Provide a name for the sidecar extension.
6. Select **Save** to apply the changes.
7. Wait a few minutes for the sidecar extension to deploy. Keep selecting **Refresh** until the **Status** column shows **Running**.

## Test the chatbot

1. In your app's management page, in the left-hand menu, select **Overview**.
1. Under **Default domain**, select the URL to open your web app in a browser.
1. Verify that the chatbot application is running and responding to user inputs.

    :::image type="content" source="media/tutorial-ai-slm-dotnet/fashion-store-assistant-live.png" alt-text="Screenshot showing the fashion assistant app running in the browser.":::

## How the sample application works

The sample application demonstrates how to integrate an Express.js-based service with the SLM sidecar extension. The `SLMService` class encapsulates the logic for sending requests to the SLM API and processing the streamed responses. This integration enables the application to generate conversational responses dynamically.
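
For example, a hypothetical Express route could wire the service to an HTTP endpoint as in the following sketch. The `/chat` path and the `streamChatCompletion` method name are assumptions for illustration; see the repository for the actual wiring.

```javascript
// Hypothetical wiring of SLMService into an Express route.
// The route path and method name are assumptions, not the repository's actual code.
const express = require('express');
const SLMService = require('./services/slm_service');

const app = express();
const slmService = new SLMService();

app.get('/chat', async (req, res) => {
  const prompt = req.query.message || '';
  // The service writes Server-Sent Events directly to the response stream.
  await slmService.streamChatCompletion(prompt, res);
});

app.listen(process.env.PORT || 3000);
```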

Looking in [slm_service.js](https://github.com/cephalin/sidecar-samples/blob/webstacks/expressapp/src/services/slm_service.js), you see that:

- The service sends a POST request to the SLM endpoint `http://127.0.0.1:11434/v1/chat/completions`.

  ```javascript
  this.apiUrl = 'http://127.0.0.1:11434/v1/chat/completions';
  ```

- The POST payload includes the system message and the prompt that's built from the selected product and the user query.

  ```javascript
  const requestPayload = {
    messages: [
      { role: 'system', content: 'You are a helpful assistant.' },
      { role: 'user', content: prompt }
    ],
    stream: true,
    cache_prompt: false,
    n_predict: 2048 // Increased token limit to allow longer responses
  };
  ```

- The POST request streams the response line by line. Each line is parsed to extract the generated content (or token).

  ```javascript
  // Set up Server-Sent Events headers
  res.setHeader('Content-Type', 'text/event-stream');
  res.setHeader('Cache-Control', 'no-cache');
  res.setHeader('Connection', 'keep-alive');
  res.flushHeaders();

  const response = await axios.post(this.apiUrl, requestPayload, {
    headers: { 'Content-Type': 'application/json' },
    responseType: 'stream'
  });

  response.data.on('data', (chunk) => {
    const lines = chunk.toString().split('\n').filter(line => line.trim() !== '');

    for (const line of lines) {
      let parsedLine = line;
      if (line.startsWith('data: ')) {
        parsedLine = line.replace('data: ', '').trim();
      }

      if (parsedLine === '[DONE]') {
        return;
      }

      try {
        const jsonObj = JSON.parse(parsedLine);
        if (jsonObj.choices && jsonObj.choices.length > 0) {
          const delta = jsonObj.choices[0].delta || {};
          const content = delta.content;

          if (content) {
            // Use non-breaking space to preserve formatting
            const formattedToken = content.replace(/ /g, '\u00A0');
            res.write(`data: ${formattedToken}\n\n`);
          }
        }
      } catch (parseError) {
        console.warn(`Failed to parse JSON from line: ${parsedLine}`);
      }
    }
  });
  ```
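
Because the server relays the tokens as Server-Sent Events, the browser can consume them with the standard `EventSource` API. The following is a minimal sketch; the `/chat` endpoint path and the element ID are assumptions for illustration, not the sample's actual front-end code.

```javascript
// Hypothetical browser-side consumer of the SSE stream above.
// The endpoint path and element ID are assumptions, not the sample's actual code.
const output = document.getElementById('chat-output');
const source = new EventSource('/chat?message=' + encodeURIComponent('What do you have in stock?'));

source.onmessage = (event) => {
  // Each SSE message carries one token; the server encoded spaces as
  // non-breaking spaces, so convert them back before rendering.
  output.textContent += event.data.replace(/\u00A0/g, ' ');
};

source.onerror = () => {
  // The server ends the stream after [DONE]; close to stop EventSource's auto-reconnect.
  source.close();
};
```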

## Next steps