Skip to content

Commit 10da527

Browse files
Merge pull request #262583 from PatrickFarley/streaming
Streaming
2 parents d505c91 + a8bb4e0 commit 10da527

File tree

1 file changed

+168
-0
lines changed

1 file changed

+168
-0
lines changed

articles/ai-services/openai/concepts/content-filter.md

Lines changed: 168 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -628,6 +628,174 @@ For details on the inference REST API endpoints for Azure OpenAI and how to crea
628628
}
629629
```
630630
631+
## Streaming
632+
633+
Azure OpenAI Service includes a content filtering system that works alongside core models. The following section describes the AOAI streaming experience and options in the context of content filters.
634+
635+
### Default
636+
637+
The content filtering system is integrated and enabled by default for all customers. In the default streaming scenario, completion content is buffered, the content filtering system runs on the buffered content, and – depending on content filtering configuration – content is either returned to the user if it does not violate the content filtering policy (Microsoft default or custom user configuration), or it’s immediately blocked which returns a content filtering error, without returning harmful completion content. This process is repeated until the end of the stream. Content was fully vetted according to the content filtering policy before returned to the user. Content is not returned token-by-token in this case, but in “content chunks” of the respective buffer size.
638+
639+
### Asynchronous modified filter
640+
641+
Customers who have been approved for modified content filters can choose Asynchronous Modified Filter as an additional option, providing a new streaming experience. In this case, content filters are run asynchronously, completion content is returned immediately with a smooth token-by-token streaming experience. No content is buffered, the content filters run asynchronously, which allows for zero latency in this context.
642+
643+
> [!NOTE]
644+
> Customers must be aware that while the feature improves latency, it can bring a trade-off in terms of the safety and real-time vetting of smaller sections of model output. Because content filters are run asynchronously, content moderation messages and the content filtering signal in case of a policy violation are delayed, which means some sections of harmful content that would otherwise have been filtered immediately could be displayed to the user.
645+
646+
**Annotations**: Annotations and content moderation messages are continuously returned during the stream. We strongly recommend to consume annotations and implement additional AI content safety mechanisms, such as redacting content or returning additional safety information to the user.
647+
648+
**Content filtering signal**: The content filtering error signal is delayed; in case of a policy violation, it’s returned as soon as it’s available, and the stream is stopped. The content filtering signal is guaranteed within ~1,000-character windows in case of a policy violation.
649+
650+
Approval for Modified Content Filtering is required for access to Streaming – Asynchronous Modified Filter. The application can be found [here](https://customervoice.microsoft.com/Pages/ResponsePage.aspx?id=v4j5cvGGr0GRqy180BHbR7en2Ais5pxKtso_Pz4b1_xURE01NDY1OUhBRzQ3MkQxMUhZSE1ZUlJKTiQlQCN0PWcu). To enable it via Azure OpenAI Studio please follow the instructions [here](/azure/ai-services/openai/how-to/content-filters) to create a new content filtering configuration, and select “Asynchronous Modified Filter” in the Streaming section, as shown in the below screenshot.
651+
652+
### Overview tbd
653+
654+
| | Streaming - Default | Streaming - Asynchronous Modified Filter |
655+
|---|---|---|
656+
|Status |GA |Public Preview |
657+
| Access | Enabled by default, no action needed |Customers approved for Modified Content Filtering can configure directly via Azure OpenAI Studio (as part of a content filtering configuration; applied on deployment-level) |
658+
| Eligibility |All customers |Customers approved for Modified Content Filtering |
659+
|Modality and Availability |Text; all GPT-models |Text; all GPT-models except gpt-4-vision |
660+
|Streaming experience |Content is buffered and returned in chunks |Zero latency (no buffering, filters run asynchronously) |
661+
|Content filtering signal |Immediate filtering signal |Delayed filtering signal (in up to ~1,000 char increments) |
662+
|Content filtering configurations |Supports default and any customer-defined filter setting (including optional models) |Supports default and any customer-defined filter setting (including optional models) |
663+
664+
### Annotations and sample response stream
665+
666+
#### Prompt annotation message
667+
668+
This is the same as default annotations.
669+
670+
```json
671+
data: {
672+
"id": "",
673+
"object": "",
674+
"created": 0,
675+
"model": "",
676+
"prompt_filter_results": [
677+
{
678+
"prompt_index": 0,
679+
"content_filter_results": { ... }
680+
}
681+
],
682+
"choices": [],
683+
"usage": null
684+
}
685+
```
686+
687+
#### Completion token message
688+
689+
Completion messages are forwarded immediately. No moderation is performed first, and no annotations are provided initially.
690+
691+
```json
692+
data: {
693+
"id": "chatcmpl-7rAJvsS1QQCDuZYDDdQuMJVMV3x3N",
694+
"object": "chat.completion.chunk",
695+
"created": 1692905411,
696+
"model": "gpt-35-turbo",
697+
"choices": [
698+
{
699+
"index": 0,
700+
"finish_reason": null,
701+
"delta": {
702+
"content": "Color"
703+
}
704+
}
705+
],
706+
"usage": null
707+
}
708+
```
709+
710+
#### Annotation message
711+
712+
The text field will always be an empty string, indicating no new tokens. Annotations will only be relevant to already-sent tokens. There may be multiple Annotation Messages referring to the same tokens.
713+
714+
“start_offset” and “end_offset” are low-granularity offsets in text (with 0 at beginning of prompt) which the annotation is relevant to.
715+
716+
“check_offset” represents how much text has been fully moderated. It is an exclusive lower bound on the end_offsets of future annotations. It is nondecreasing.
717+
718+
```json
719+
data: {
720+
"id": "",
721+
"object": "",
722+
"created": 0,
723+
"model": "",
724+
"choices": [
725+
{
726+
"index": 0,
727+
"finish_reason": null,
728+
"content_filter_results": { ... },
729+
"content_filter_raw": [ ... ],
730+
"content_filter_offsets": {
731+
"check_offset": 44,
732+
"start_offset": 44,
733+
"end_offset": 198
734+
}
735+
}
736+
],
737+
"usage": null
738+
}
739+
```
740+
741+
742+
### Sample response stream
743+
744+
Below is a real chat completion response using Asynchronous Modified Filter. Note how prompt annotations are not changed; completion tokens are sent without annotations; and new annotation messages are sent without tokens, instead associated with certain content filter offsets.
745+
746+
`{"temperature": 0, "frequency_penalty": 0, "presence_penalty": 1.0, "top_p": 1.0, "max_tokens": 800, "messages": [{"role": "user", "content": "What is color?"}], "stream": true}`
747+
748+
```
749+
data: {"id":"","object":"","created":0,"model":"","prompt_annotations":[{"prompt_index":0,"content_filter_results":{"hate":{"filtered":false,"severity":"safe"},"self_harm":{"filtered":false,"severity":"safe"},"sexual":{"filtered":false,"severity":"safe"},"violence":{"filtered":false,"severity":"safe"}}}],"choices":[],"usage":null}
750+
751+
data: {"id":"chatcmpl-7rCNsVeZy0PGnX3H6jK8STps5nZUY","object":"chat.completion.chunk","created":1692913344,"model":"gpt-35-turbo","choices":[{"index":0,"finish_reason":null,"delta":{"role":"assistant"}}],"usage":null}
752+
753+
data: {"id":"chatcmpl-7rCNsVeZy0PGnX3H6jK8STps5nZUY","object":"chat.completion.chunk","created":1692913344,"model":"gpt-35-turbo","choices":[{"index":0,"finish_reason":null,"delta":{"content":"Color"}}],"usage":null}
754+
755+
data: {"id":"chatcmpl-7rCNsVeZy0PGnX3H6jK8STps5nZUY","object":"chat.completion.chunk","created":1692913344,"model":"gpt-35-turbo","choices":[{"index":0,"finish_reason":null,"delta":{"content":" is"}}],"usage":null}
756+
757+
data: {"id":"chatcmpl-7rCNsVeZy0PGnX3H6jK8STps5nZUY","object":"chat.completion.chunk","created":1692913344,"model":"gpt-35-turbo","choices":[{"index":0,"finish_reason":null,"delta":{"content":" a"}}],"usage":null}
758+
759+
...
760+
761+
data: {"id":"","object":"","created":0,"model":"","choices":[{"index":0,"finish_reason":null,"content_filter_results":{"hate":{"filtered":false,"severity":"safe"},"self_harm":{"filtered":false,"severity":"safe"},"sexual":{"filtered":false,"severity":"safe"},"violence":{"filtered":false,"severity":"safe"}},"content_filter_offsets":{"check_offset":44,"start_offset":44,"end_offset":198}}],"usage":null}
762+
763+
...
764+
765+
data: {"id":"chatcmpl-7rCNsVeZy0PGnX3H6jK8STps5nZUY","object":"chat.completion.chunk","created":1692913344,"model":"gpt-35-turbo","choices":[{"index":0,"finish_reason":"stop","delta":{}}],"usage":null}
766+
767+
data: {"id":"","object":"","created":0,"model":"","choices":[{"index":0,"finish_reason":null,"content_filter_results":{"hate":{"filtered":false,"severity":"safe"},"self_harm":{"filtered":false,"severity":"safe"},"sexual":{"filtered":false,"severity":"safe"},"violence":{"filtered":false,"severity":"safe"}},"content_filter_offsets":{"check_offset":506,"start_offset":44,"end_offset":571}}],"usage":null}
768+
769+
data: [DONE]
770+
```
771+
772+
### Sample response stream (blocking)
773+
774+
`{"temperature": 0, "frequency_penalty": 0, "presence_penalty": 1.0, "top_p": 1.0, "max_tokens": 800, "messages": [{"role": "user", "content": "Tell me the lyrics to \"Hey Jude\"."}], "stream": true}`
775+
776+
```
777+
data: {"id":"","object":"","created":0,"model":"","prompt_filter_results":[{"prompt_index":0,"content_filter_results":{"hate":{"filtered":false,"severity":"safe"},"self_harm":{"filtered":false,"severity":"safe"},"sexual":{"filtered":false,"severity":"safe"},"violence":{"filtered":false,"severity":"safe"}}}],"choices":[],"usage":null}
778+
779+
data: {"id":"chatcmpl-8JCbt5d4luUIhYCI7YH4dQK7hnHx2","object":"chat.completion.chunk","created":1699587397,"model":"gpt-35-turbo","choices":[{"index":0,"finish_reason":null,"delta":{"role":"assistant"}}],"usage":null}
780+
781+
data: {"id":"chatcmpl-8JCbt5d4luUIhYCI7YH4dQK7hnHx2","object":"chat.completion.chunk","created":1699587397,"model":"gpt-35-turbo","choices":[{"index":0,"finish_reason":null,"delta":{"content":"Hey"}}],"usage":null}
782+
783+
data: {"id":"chatcmpl-8JCbt5d4luUIhYCI7YH4dQK7hnHx2","object":"chat.completion.chunk","created":1699587397,"model":"gpt-35-turbo","choices":[{"index":0,"finish_reason":null,"delta":{"content":" Jude"}}],"usage":null}
784+
785+
data: {"id":"chatcmpl-8JCbt5d4luUIhYCI7YH4dQK7hnHx2","object":"chat.completion.chunk","created":1699587397,"model":"gpt-35-turbo","choices":[{"index":0,"finish_reason":null,"delta":{"content":","}}],"usage":null}
786+
787+
...
788+
789+
data: {"id":"chatcmpl-8JCbt5d4luUIhYCI7YH4dQK7hnHx2","object":"chat.completion.chunk","created":1699587397,"model":"gpt-35-
790+
791+
turbo","choices":[{"index":0,"finish_reason":null,"delta":{"content":" better"}}],"usage":null}
792+
793+
data: {"id":"","object":"","created":0,"model":"","choices":[{"index":0,"finish_reason":null,"content_filter_results":{"hate":{"filtered":false,"severity":"safe"},"self_harm":{"filtered":false,"severity":"safe"},"sexual":{"filtered":false,"severity":"safe"},"violence":{"filtered":false,"severity":"safe"}},"content_filter_offsets":{"check_offset":65,"start_offset":65,"end_offset":1056}}],"usage":null}
794+
795+
data: {"id":"","object":"","created":0,"model":"","choices":[{"index":0,"finish_reason":"content_filter","content_filter_results":{"protected_material_text":{"detected":true,"filtered":true}},"content_filter_offsets":{"check_offset":65,"start_offset":65,"end_offset":1056}}],"usage":null}
796+
797+
data: [DONE]
798+
```
631799
## Best practices
632800
633801
As part of your application design, consider the following best practices to deliver a positive experience with your application while minimizing potential harms:

0 commit comments

Comments
 (0)