
Commit e73be9a

update pf deploy
1 parent a15bb6a commit e73be9a

2 files changed: 31 additions, 8 deletions

articles/machine-learning/prompt-flow/how-to-deploy-to-code.md

Lines changed: 14 additions & 1 deletion
@@ -371,6 +371,9 @@ environment_variables:
    my_connection: <override_connection_name>
  ```

+ If you want to override a specific field of the connection, you can do so by adding environment variables that follow the naming pattern `<connection_name>_<field_name>`. For example, if your flow uses a connection named `my_connection` with a configuration key called `chat_deployment_name`, the serving backend attempts to read `chat_deployment_name` from the environment variable `MY_CONNECTION_CHAT_DEPLOYMENT_NAME` by default. If the environment variable isn't set, the original value from the flow definition is used.
+
+
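For illustration, here's a minimal sketch of what such a field override could look like, assuming the variable is set through the same `environment_variables` section shown above; the connection name `my_connection` and the deployment name value are hypothetical:

```yaml
environment_variables:
  # hypothetical example: override only the chat_deployment_name field of my_connection
  MY_CONNECTION_CHAT_DEPLOYMENT_NAME: gpt-35-turbo
```

Any other fields of `my_connection` would keep their original values from the flow definition.
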
  **Option 2**: override by referring to asset

  ```yaml
@@ -461,7 +464,7 @@ environment_variables:
  While tuning the above parameters, you need to monitor the following metrics to ensure optimal performance and stability:
  - Instance CPU/Memory utilization of this deployment
  - Non-200 responses (4xx, 5xx)
- - If you receive a 429 response, this typically indicates that you need to either re-tune your concurrency settings following the above guide or scale your deployment.
+ - If you receive a 429 response, this typically indicates that you need to either retune your concurrency settings following the above guide or scale your deployment.
  - Azure OpenAI throttle status

  ### Monitor endpoints
@@ -497,6 +500,16 @@ request_settings:
    request_timeout_ms: 300000
  ```

+ > [!NOTE]
+ >
+ > The 300,000 ms timeout only works for managed online deployments from prompt flow. Make sure you have added the following properties to your model (either in the inline model specification in the deployment YAML or in a standalone model specification YAML) to indicate that this is a deployment from prompt flow.
+
+ ```yaml
+ properties:
+   # indicate a deployment from prompt flow
+   azureml.promptflow.source_flow_id: <value>
+ ```
+
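For illustration, a minimal sketch of a deployment YAML that combines the longer timeout with an inline model specification carrying this property could look like the following. The deployment name, endpoint name, model path, and instance settings are placeholders, other required fields such as the environment are omitted, and it assumes the inline model specification accepts the same `properties` block as the standalone model YAML:

```yaml
$schema: https://azuremlschemas.azureedge.net/latest/managedOnlineDeployment.schema.json
name: blue
endpoint_name: my-endpoint
model:
  path: ./my-flow-model
  properties:
    # indicate a deployment from prompt flow
    azureml.promptflow.source_flow_id: <value>
instance_type: Standard_DS3_v2
instance_count: 1
request_settings:
  # 5-minute timeout, honored only for prompt flow deployments per the note above
  request_timeout_ms: 300000
```
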
  ## Next steps

  - Learn more about [managed online endpoint schema](../reference-yaml-endpoint-online.md) and [managed online deployment schema](../reference-yaml-deployment-managed-online.md).

articles/machine-learning/prompt-flow/how-to-enable-streaming-mode.md

Lines changed: 17 additions & 7 deletions
@@ -74,7 +74,7 @@ To learn how to deploy your flow as an online endpoint, see [Deploy a flow to o

  > [!NOTE]
  >
- > Deploy with Runtime environment version later than version `20230710.v2`.
+ > Deploy with Runtime environment version later than version `20230816.v10`.

  You can check your runtime version and update the runtime on the runtime detail page.

@@ -258,17 +258,27 @@ If the response code is "424 Model Error", it means that the error is caused by

  ### Consume using Python

+ This sample uses the `SSEClient` class, which is not a built-in Python class and needs to be installed separately. You can install it via pip:
+
+ ```bash
+ pip install sseclient-py
+ ```
+
  A sample usage would look like:

  ```python
+ import requests
+ from sseclient import SSEClient
+ from requests.exceptions import HTTPError
+
  try:
      response = requests.post(url, json=body, headers=headers, stream=stream)
      response.raise_for_status()

      content_type = response.headers.get('Content-Type')
      if "text/event-stream" in content_type:
-         event_stream = EventStream(response.iter_lines())
-         for event in event_stream:
+         client = SSEClient(response)
+         for event in client.events():
              # Handle event, i.e. print to stdout
      else:
          # Handle json response
@@ -279,15 +289,15 @@ except HTTPError:

  ### Consume using JavaScript

- There are several libraries to consume server-sent events in JavaScript. For example, this is the [sse.js library](https://www.npmjs.com/package/sse.js?activeTab=code).
+ There are several libraries to consume server-sent events in JavaScript. Here is [one of them as an example](https://www.npmjs.com/package/sse.js?activeTab=code).

  ## A sample chat app using Python

- Here's a sample chat app written in Python.
+ [Here's a sample chat app written in Python](https://github.com/microsoft/promptflow/blob/main/docs/media/how-to-guides/how-to-enable-streaming-mode/scripts/chat_app.py).

  :::image type="content" source="./media/how-to-enable-streaming-mode/chat-app.gif" alt-text="GIF of a sample chat app using Python." lightbox="./media/how-to-enable-streaming-mode/chat-app.gif":::

- ## Advance usage - hybrid stream and non-stream flow output
+ ## Advanced usage - hybrid stream and non-stream flow output

  Sometimes, you might want to get both stream and non-stream results from a flow output. For example, in the “Chat with Wikipedia” flow, you might want to get not only the LLM’s answer, but also the list of URLs that the flow searched. To do this, you need to modify the flow to output a combination of the streamed LLM answer and the non-streamed URL list.
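
For illustration only, the flow's `outputs` section for such a hybrid case might be sketched as below. The node names `chat` and `get_wiki_url` are hypothetical, and the sketch assumes that the output referencing the streaming LLM node is the one returned as the stream (delta) field while the other is returned in the non-stream base:

```yaml
outputs:
  answer:
    type: string
    # assumed to reference the streaming LLM node, so it arrives as the delta
    reference: ${chat.output}
  url_list:
    type: string
    # assumed to reference a regular Python node, so it arrives in the base
    reference: ${get_wiki_url.output}
```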

@@ -297,7 +307,7 @@ In the sample "Chat With Wikipedia" flow, the output is connected to the LLM nod

  The output of the flow will be a non-stream field as the base and a stream field as the delta. Here's an example of request and response.

- ### Advance usage - 0. The client sends a message to the server
+ ### Advanced usage - 0. The client sends a message to the server

  ```JSON
  POST https://<your-endpoint>.inference.ml.azure.com/score
