**articles/machine-learning/prompt-flow/how-to-deploy-to-code.md** (14 additions, 1 deletion)
@@ -371,6 +371,9 @@ environment_variables:
  my_connection: <override_connection_name>
```

+ If you want to override a specific field of the connection, you can do so by adding an environment variable with the naming pattern `<connection_name>_<field_name>`. For example, if your flow uses a connection named `my_connection` with a configuration key called `chat_deployment_name`, the serving backend attempts to retrieve `chat_deployment_name` from the environment variable `MY_CONNECTION_CHAT_DEPLOYMENT_NAME` by default. If the environment variable is not set, it uses the original value from the flow definition.

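To make the field-level override concrete, here's a minimal sketch of how it could look in a managed online deployment YAML. The connection name `my_connection` and the field `chat_deployment_name` come from the example above; the deployment name, endpoint name, and override value are hypothetical placeholders.

```yaml
# Hypothetical sketch: connection overrides in a managed online deployment.
$schema: https://azuremlschemas.azureedge.net/latest/managedOnlineDeployment.schema.json
name: blue
endpoint_name: <your-endpoint-name>
environment_variables:
  # Option 1: swap the whole connection.
  my_connection: <override_connection_name>
  # Field-level override: <CONNECTION_NAME>_<FIELD_NAME>, upper-cased.
  MY_CONNECTION_CHAT_DEPLOYMENT_NAME: <your-deployment-name>
```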
**Option 2**: override by referring to asset
```yaml
@@ -461,7 +464,7 @@ environment_variables:
While tuning the above parameters, you need to monitor the following metrics to ensure optimal performance and stability (a sketch of where these settings live follows the list):

- Instance CPU/Memory utilization of this deployment
- Non-200 responses (4xx, 5xx)
-   - If you receive a 429 response, this typically indicates that you need to either re-tune your concurrency settings following the above guide or scale your deployment.
+   - If you receive a 429 response, this typically indicates that you need to either retune your concurrency settings following the above guide or scale your deployment.
- Azure OpenAI throttle status
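As a reference for the tuning knobs mentioned above, here's a hedged sketch of the relevant fields in a managed online deployment YAML. The field names come from the managed online deployment schema; all values are illustrative assumptions, not recommendations.

```yaml
# Illustrative values only; tune against the metrics listed above.
instance_type: Standard_DS3_v2
instance_count: 2          # scale out if utilization or 429s stay high
request_settings:
  max_concurrent_requests_per_instance: 10   # concurrency per instance
  request_timeout_ms: 300000
```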
### Monitor endpoints
@@ -497,6 +500,16 @@ request_settings:
  request_timeout_ms: 300000
```

+ > [!NOTE]
+ >
+ > The 300,000 ms timeout only works for managed online deployments from prompt flow. You need to make sure that you have added properties for your model as below (either an inline model specification in the deployment YAML or a standalone model specification YAML) to indicate that this is a deployment from prompt flow.
+
+ ```yaml
+ properties:
+   # indicate a deployment from prompt flow
+   azureml.promptflow.source_flow_id: <value>
+ ```

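Putting the note into context, here's a hedged sketch of an inline model specification carrying this property inside a deployment YAML. The model name, path, and flow ID are hypothetical placeholders; only the `azureml.promptflow.source_flow_id` property key comes from the source.

```yaml
# Hypothetical inline model specification in a deployment YAML.
model:
  name: my-promptflow-model     # placeholder
  path: ./my-flow               # placeholder path to the flow folder
  properties:
    # marks this deployment as coming from prompt flow,
    # which enables the 300,000 ms request timeout
    azureml.promptflow.source_flow_id: <your-flow-id>
request_settings:
  request_timeout_ms: 300000
```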
## Next steps
- Learn more about [managed online endpoint schema](../reference-yaml-endpoint-online.md) and [managed online deployment schema](../reference-yaml-deployment-managed-online.md).
**articles/machine-learning/prompt-flow/how-to-enable-streaming-mode.md** (17 additions, 7 deletions)
@@ -74,7 +74,7 @@ To learn how to deploy your flow as an online endpoint, see [Deploy a flow to o
> [!NOTE]
>
- > Deploy with Runtime environment version later than version `20230710.v2`.
+ > Deploy with a runtime environment version later than `20230816.v10`.

You can check your runtime version and update the runtime on the runtime detail page.
@@ -258,17 +258,27 @@ If the response code is "424 Model Error", it means that the error is caused by
### Consume using Python
+ In this sample usage, we are using the `SSEClient` class. This class is not a built-in Python class and needs to be installed separately via pip.
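The pip command and sample code are truncated in this diff; the following is a minimal consumption sketch assuming the `sseclient-py` package (which provides `SSEClient`) and placeholder endpoint details.

```python
# pip install sseclient-py   # assumed package providing SSEClient
import json

import requests
import sseclient

# Placeholder endpoint details; substitute your own.
url = "https://<your-endpoint>.inference.ml.azure.com/score"
headers = {
    "Authorization": "Bearer <your-api-key>",
    "Content-Type": "application/json",
    # Request server-sent events instead of a single JSON body.
    "Accept": "text/event-stream",
}
body = {"question": "What is prompt flow?", "chat_history": []}

response = requests.post(url, json=body, headers=headers, stream=True)
response.raise_for_status()

client = sseclient.SSEClient(response)
for event in client.events():
    # Each event's data field carries a JSON chunk of the flow output.
    print(json.loads(event.data), flush=True)
```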
- There are several libraries to consume server-sent events in JavaScript. For example, this is the [sse.js library](https://www.npmjs.com/package/sse.js?activeTab=code).
+ There are several libraries to consume server-sent events in JavaScript. Here is [one of them as an example](https://www.npmjs.com/package/sse.js?activeTab=code).
## A sample chat app using Python
- Here's a sample chat app written in Python.
+ [Here's a sample chat app written in Python](https://github.com/microsoft/promptflow/blob/main/docs/media/how-to-guides/how-to-enable-streaming-mode/scripts/chat_app.py).

:::image type="content" source="./media/how-to-enable-streaming-mode/chat-app.gif" alt-text="GIF of a sample chat app using Python." lightbox="./media/how-to-enable-streaming-mode/chat-app.gif":::

- ## Advance usage - hybrid stream and non-stream flow output
+ ## Advanced usage - hybrid stream and non-stream flow output

Sometimes, you might want to get both stream and non-stream results from a flow output. For example, in the "Chat with Wikipedia" flow, you might want to get not only the LLM's answer, but also the list of URLs that the flow searched. To do this, you need to modify the flow to output a combination of the stream LLM answer and the non-stream URL list.
@@ -297,7 +307,7 @@ In the sample "Chat With Wikipedia" flow, the output is connected to the LLM nod
The output of the flow will be a non-stream field as the base and a stream field as the delta. Below is a rough sketch of how a client might reassemble such output, followed by an example of a request and response.
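This sketch is an assumption based on the base/delta description above; the field names `url` and `answer` come from the "Chat with Wikipedia" example, but the exact event shape isn't shown in this diff.

```python
import json

def merge_hybrid_events(event_payloads):
    """Fold SSE data payloads into one result: non-stream fields arrive
    once as the base; the stream field arrives as string deltas to append."""
    result = {}
    for raw in event_payloads:
        chunk = json.loads(raw)
        for key, value in chunk.items():
            if isinstance(value, str) and key in result:
                result[key] += value   # stream field: append the delta
            else:
                result[key] = value    # non-stream field: take it as the base
    return result

# Hypothetical payloads: "url" is the non-stream base, "answer" streams in.
sample = [
    '{"url": ["https://en.wikipedia.org/wiki/ChatGPT"], "answer": "Chat"}',
    '{"answer": "GPT is"}',
    '{"answer": " a chatbot."}',
]
print(merge_hybrid_events(sample))
# {'url': ['https://en.wikipedia.org/wiki/ChatGPT'], 'answer': 'ChatGPT is a chatbot.'}
```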
- ### Advance usage - 0. The client sends a message to the server
+ ### Advanced usage - 0. The client sends a message to the server
```JSON
POST https://<your-endpoint>.inference.ml.azure.com/score