**articles/machine-learning/prompt-flow/how-to-deploy-to-code.md** (14 additions, 1 deletion)
@@ -371,6 +371,9 @@ environment_variables:
  my_connection: <override_connection_name>
```

+ If you want to override a specific field of the connection, you can do so by adding an environment variable with the naming pattern `<connection_name>_<field_name>`. For example, if your flow uses a connection named `my_connection` with a configuration key called `chat_deployment_name`, the serving backend attempts to retrieve `chat_deployment_name` from the environment variable `MY_CONNECTION_CHAT_DEPLOYMENT_NAME` by default. If the environment variable is not set, it uses the original value from the flow definition.

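To make the field-level override concrete, here's a minimal sketch of how it could look in a managed online deployment YAML. The connection name `my_connection` and the field `chat_deployment_name` come from the example above; the deployment name, endpoint name, and override value are hypothetical placeholders.

```yaml
# Hypothetical sketch: connection overrides in a managed online deployment.
$schema: https://azuremlschemas.azureedge.net/latest/managedOnlineDeployment.schema.json
name: blue
endpoint_name: <your-endpoint-name>
environment_variables:
  # Option 1: swap the whole connection.
  my_connection: <override_connection_name>
  # Field-level override: <CONNECTION_NAME>_<FIELD_NAME>, upper-cased.
  MY_CONNECTION_CHAT_DEPLOYMENT_NAME: <your-deployment-name>
```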
**Option 2**: override by referring to asset
```yaml
@@ -461,7 +464,7 @@ environment_variables:
While tuning the above parameters, you need to monitor the following metrics to ensure optimal performance and stability (a sketch of where these settings live follows the list):

- Instance CPU/Memory utilization of this deployment
- Non-200 responses (4xx, 5xx)
-   - If you receive a 429 response, this typically indicates that you need to either re-tune your concurrency settings following the above guide or scale your deployment.
+   - If you receive a 429 response, this typically indicates that you need to either retune your concurrency settings following the above guide or scale your deployment.
- Azure OpenAI throttle status
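As a reference for the tuning knobs mentioned above, here's a hedged sketch of the relevant fields in a managed online deployment YAML. The field names come from the managed online deployment schema; all values are illustrative assumptions, not recommendations.

```yaml
# Illustrative values only; tune against the metrics listed above.
instance_type: Standard_DS3_v2
instance_count: 2          # scale out if utilization or 429s stay high
request_settings:
  max_concurrent_requests_per_instance: 10   # concurrency per instance
  request_timeout_ms: 300000
```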
### Monitor endpoints
@@ -497,6 +500,16 @@ request_settings:
  request_timeout_ms: 300000
```

+ > [!NOTE]
+ >
+ > The 300,000 ms timeout only works for managed online deployments from prompt flow. You need to make sure that you have added properties for your model as below (either an inline model specification in the deployment YAML or a standalone model specification YAML) to indicate that this is a deployment from prompt flow.
+
+ ```yaml
+ properties:
+   # indicate a deployment from prompt flow
+   azureml.promptflow.source_flow_id: <value>
+ ```

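Putting the note into context, here's a hedged sketch of an inline model specification carrying this property inside a deployment YAML. The model name, path, and flow ID are hypothetical placeholders; only the `azureml.promptflow.source_flow_id` property key comes from the source.

```yaml
# Hypothetical inline model specification in a deployment YAML.
model:
  name: my-promptflow-model     # placeholder
  path: ./my-flow               # placeholder path to the flow folder
  properties:
    # marks this deployment as coming from prompt flow,
    # which enables the 300,000 ms request timeout
    azureml.promptflow.source_flow_id: <your-flow-id>
request_settings:
  request_timeout_ms: 300000
```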
## Next steps
- Learn more about [managed online endpoint schema](../reference-yaml-endpoint-online.md) and [managed online deployment schema](../reference-yaml-deployment-managed-online.md).
**articles/machine-learning/prompt-flow/how-to-enable-streaming-mode.md** (17 additions, 7 deletions)
@@ -74,7 +74,7 @@ To learn how to deploy your flow as an online endpoint, see [Deploy a flow to o
> [!NOTE]
>
- > Deploy with Runtime environment version later than version `20230710.v2`.
+ > Deploy with a runtime environment version later than `20230816.v10`.

You can check your runtime version and update the runtime on the runtime detail page.
@@ -258,17 +258,27 @@ If the response code is "424 Model Error", it means that the error is caused by
### Consume using Python
+ In this sample usage, we are using the `SSEClient` class. This class is not a built-in Python class and needs to be installed separately via pip.
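The pip command and sample code are truncated in this diff; the following is a minimal consumption sketch assuming the `sseclient-py` package (which provides `SSEClient`) and placeholder endpoint details.

```python
# pip install sseclient-py   # assumed package providing SSEClient
import json

import requests
import sseclient

# Placeholder endpoint details; substitute your own.
url = "https://<your-endpoint>.inference.ml.azure.com/score"
headers = {
    "Authorization": "Bearer <your-api-key>",
    "Content-Type": "application/json",
    # Request server-sent events instead of a single JSON body.
    "Accept": "text/event-stream",
}
body = {"question": "What is prompt flow?", "chat_history": []}

response = requests.post(url, json=body, headers=headers, stream=True)
response.raise_for_status()

client = sseclient.SSEClient(response)
for event in client.events():
    # Each event's data field carries a JSON chunk of the flow output.
    print(json.loads(event.data), flush=True)
```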
- There are several libraries to consume server-sent events in JavaScript. For example, this is the [sse.js library](https://www.npmjs.com/package/sse.js?activeTab=code).
+ There are several libraries to consume server-sent events in JavaScript. Here is [one of them as an example](https://www.npmjs.com/package/sse.js?activeTab=code).
## A sample chat app using Python
- Here's a sample chat app written in Python.
+ [Here's a sample chat app written in Python](https://github.com/microsoft/promptflow/blob/main/docs/media/how-to-guides/how-to-enable-streaming-mode/scripts/chat_app.py).

:::image type="content" source="./media/how-to-enable-streaming-mode/chat-app.gif" alt-text="GIF of a sample chat app using Python." lightbox="./media/how-to-enable-streaming-mode/chat-app.gif":::

- ## Advance usage - hybrid stream and non-stream flow output
+ ## Advanced usage - hybrid stream and non-stream flow output

Sometimes, you might want to get both stream and non-stream results from a flow output. For example, in the "Chat with Wikipedia" flow, you might want to get not only the LLM's answer, but also the list of URLs that the flow searched. To do this, you need to modify the flow to output a combination of the stream LLM answer and the non-stream URL list.
@@ -297,7 +307,7 @@ In the sample "Chat With Wikipedia" flow, the output is connected to the LLM nod
The output of the flow will be a non-stream field as the base and a stream field as the delta. Below is a rough sketch of how a client might reassemble such output, followed by an example of a request and response.
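This sketch is an assumption based on the base/delta description above; the field names `url` and `answer` come from the "Chat with Wikipedia" example, but the exact event shape isn't shown in this diff.

```python
import json

def merge_hybrid_events(event_payloads):
    """Fold SSE data payloads into one result: non-stream fields arrive
    once as the base; the stream field arrives as string deltas to append."""
    result = {}
    for raw in event_payloads:
        chunk = json.loads(raw)
        for key, value in chunk.items():
            if isinstance(value, str) and key in result:
                result[key] += value   # stream field: append the delta
            else:
                result[key] = value    # non-stream field: take it as the base
    return result

# Hypothetical payloads: "url" is the non-stream base, "answer" streams in.
sample = [
    '{"url": ["https://en.wikipedia.org/wiki/ChatGPT"], "answer": "Chat"}',
    '{"answer": "GPT is"}',
    '{"answer": " a chatbot."}',
]
print(merge_hybrid_events(sample))
# {'url': ['https://en.wikipedia.org/wiki/ChatGPT'], 'answer': 'ChatGPT is a chatbot.'}
```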
- ### Advance usage - 0. The client sends a message to the server
+ ### Advanced usage - 0. The client sends a message to the server
```JSON
POST https://<your-endpoint>.inference.ml.azure.com/score