Question / Feature Clarification: Impact of Asynchronous Content Filtering on stream quality in Azure AI Foundry #226
Messatsu92 asked this question in Get Help
We have a question regarding the Content Filters within Azure AI Foundry, specifically around the Asynchronous Filtering mode used for streaming responses.
According to Microsoft’s documentation on asynchronous filtering, the filter evaluates text progressively as tokens are generated by the model, rather than blocking until the entire completion is available. While this enables real-time streaming, it raises an important question about potential degradation of content filtering quality.
Our concern is the following:
Because the asynchronous filter analyzes partial outputs, some tokens may be displayed before the full context of the sentence or paragraph is understood.
In certain cases, harmful or disallowed content might appear momentarily before being flagged.
The documentation does not clearly state the level of risk or mitigation strategy for this behavior, making it difficult for compliance and security teams to decide whether to enable this mode by default.
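To make the concern concrete, here is a toy model of the exposure window. It does not describe how the Azure filter is implemented. It assumes only that the verdict for a piece of text becomes available a fixed number of chunks after that text was displayed. The names `exposed_text`, `is_flagged`, and `lag_chunks` are hypothetical and exist only for this illustration.

```python
from typing import Callable, Sequence


def exposed_text(
    chunks: Sequence[str],
    is_flagged: Callable[[str], bool],  # hypothetical stand-in for a filter verdict
    lag_chunks: int,                    # assumed delay between display and verdict
) -> str:
    """Return the text a user has seen by the time the stream is cut.

    Toy assumption: the verdict for the text ending at chunk i only becomes
    available once `lag_chunks` further chunks have already been displayed.
    """
    displayed = ""
    for i, chunk in enumerate(chunks):
        # Index of the newest chunk whose verdict is available right now.
        j = i - lag_chunks
        if j >= 0 and is_flagged("".join(chunks[: j + 1])):
            return displayed  # stream is cut; everything in `displayed` was seen
        displayed += chunk
    # The stream ended before any verdict cut it off.
    return displayed


if __name__ == "__main__":
    demo = ["ok ", "bad", " more", " text", " end"]
    contains_bad = lambda text: "bad" in text
    print(repr(exposed_text(demo, contains_bad, lag_chunks=0)))
    print(repr(exposed_text(demo, contains_bad, lag_chunks=2)))
```

In this model, a lag of zero behaves like a check before every chunk is shown, while any positive lag lets flagged text reach the screen before the stream stops. How large the real lag is, and whether it is bounded, is exactly what we are asking Microsoft to document.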
We would like to understand Microsoft’s official posture on this trade-off between streaming performance and content moderation completeness.
Current Workaround
Currently, our teams are evaluating whether to disable asynchronous filtering in streaming use cases until more clarity is available.
This approach ensures that the entire content is reviewed by the filter before being sent to end users, but it introduces latency and reduces the real-time experience.
In practice, this means:
Either we use synchronous filtering to guarantee full content moderation coverage,
or, when streaming is required, we apply additional post-processing filters on the client side to catch potential issues missed by the asynchronous filter.
This workaround helps mitigate risk but is not ideal: it increases complexity and slows down the user experience.
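For reference, the client-side post-processing we are experimenting with looks roughly like the sketch below. It is a minimal illustration and not our production code: `guarded_stream`, `StreamBlocked`, `is_flagged`, and `hold_back` are hypothetical names, and `is_flagged` stands in for whatever local check is used. It makes no calls to any Azure SDK. The idea is to withhold the last few characters of the stream as lookahead, so that text is only shown once some following context has also passed the check.

```python
from typing import Callable, Iterable, Iterator


class StreamBlocked(Exception):
    """Raised when the local check flags the accumulated text."""


def guarded_stream(
    chunks: Iterable[str],
    is_flagged: Callable[[str], bool],  # hypothetical local check
    hold_back: int = 40,                # characters withheld as lookahead
) -> Iterator[str]:
    seen = ""      # everything received from the model so far
    released = 0   # number of characters already shown to the user
    for chunk in chunks:
        seen += chunk
        # Check all received text, including the part not yet displayed.
        if is_flagged(seen):
            raise StreamBlocked("local check flagged the stream")
        # Release everything except the trailing `hold_back` characters.
        safe_upto = max(released, len(seen) - hold_back)
        if safe_upto > released:
            yield seen[released:safe_upto]
            released = safe_upto
    # The stream finished and the full text passed the check: flush the tail.
    if released < len(seen):
        yield seen[released:]
```

A larger `hold_back` gives the check more context before anything is displayed, at the cost of a longer delay before the first characters appear, which is the same latency trade-off described above in a smaller form. Re-checking the full text on every chunk is also wasteful for long responses.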
Desired Outcome
We would like Microsoft to clarify or enhance the documentation regarding asynchronous content filtering behavior in streaming mode, particularly:
How reliable is the filter in detecting harmful or policy-violating content when evaluating partial token streams?
Are there safeguards or thresholds to prevent partial harmful text from being displayed before being blocked?
What is Microsoft’s recommended default posture for enterprises with strict compliance requirements: synchronous or asynchronous filtering?
Requested solution:
Provide detailed technical guidance or metrics on the accuracy and completeness of asynchronous filtering.
Optionally, expose configuration options that allow customers to tune the filtering aggressiveness (e.g., stricter early blocking policies during streaming).
This clarification would help compliance and security teams make informed decisions about enabling or disabling asynchronous filtering in production environments.