Reasoning support for evaluators #42482

Open
wants to merge 79 commits into base: main
Commits (79)
4318329
Prepare evals SDK Release
May 28, 2025
192b980
Fix bug
May 28, 2025
758adb4
Fix for ADV_CONV for FDP projects
May 29, 2025
de09fd1
Update release date
May 29, 2025
ef60fe6
Merge branch 'main' into main
nagkumar91 May 29, 2025
8ca51d0
Merge branch 'Azure:main' into main
nagkumar91 May 30, 2025
98bfc3a
Merge branch 'Azure:main' into main
nagkumar91 Jun 2, 2025
a5f32e8
Merge branch 'Azure:main' into main
nagkumar91 Jun 9, 2025
5fd88b6
Merge branch 'Azure:main' into main
nagkumar91 Jun 10, 2025
51f2b44
Merge branch 'Azure:main' into main
nagkumar91 Jun 10, 2025
a5be8b5
Merge branch 'Azure:main' into main
nagkumar91 Jun 16, 2025
75965b7
Merge branch 'Azure:main' into main
nagkumar91 Jun 25, 2025
d0c5e53
Merge branch 'Azure:main' into main
nagkumar91 Jun 25, 2025
b790276
Merge branch 'Azure:main' into main
nagkumar91 Jun 26, 2025
d5ca243
Merge branch 'Azure:main' into main
nagkumar91 Jun 26, 2025
8d62e36
re-add pyrit to matrix
Jun 26, 2025
59a70f2
Change grader ids
Jun 26, 2025
4d146d7
Merge branch 'Azure:main' into main
nagkumar91 Jun 26, 2025
f7a4c83
Update unit test
Jun 27, 2025
79e3a40
replace all old grader IDs in tests
Jun 27, 2025
588cbec
Merge branch 'main' into main
nagkumar91 Jun 30, 2025
7514472
Update platform-matrix.json
nagkumar91 Jun 30, 2025
28b2513
Update test to ensure everything is mocked
Jul 1, 2025
8603e0e
tox/black fixes
Jul 1, 2025
895f226
Skip that test with issues
Jul 1, 2025
b4b2daf
Merge branch 'Azure:main' into main
nagkumar91 Jul 1, 2025
023f07f
update grader ID according to API View feedback
Jul 1, 2025
45b5f5d
Update test
Jul 2, 2025
1ccb4db
remove string check for grader ID
Jul 2, 2025
6fd9aa5
Merge branch 'Azure:main' into main
nagkumar91 Jul 2, 2025
f871855
Update changelog and officialy start freeze
Jul 2, 2025
59ac230
update the enum according to suggestions
Jul 2, 2025
794a2c4
update the changelog
Jul 2, 2025
b33363c
Finalize logic
Jul 2, 2025
464e2dd
Merge branch 'Azure:main' into main
nagkumar91 Jul 3, 2025
4585b14
Merge branch 'Azure:main' into main
nagkumar91 Jul 7, 2025
89c2988
Initial plan
Copilot Jul 7, 2025
6805018
Fix client request ID headers in azure-ai-evaluation
Copilot Jul 7, 2025
aad48df
Fix client request ID header format in rai_service.py
Copilot Jul 7, 2025
db75552
Merge pull request #5 from nagkumar91/copilot/fix-4
nagkumar91 Jul 10, 2025
b8eebf3
Merge branch 'Azure:main' into main
nagkumar91 Jul 10, 2025
2899ad4
Merge branch 'Azure:main' into main
nagkumar91 Jul 10, 2025
c431563
Merge branch 'Azure:main' into main
nagkumar91 Jul 17, 2025
79ed63c
Merge branch 'Azure:main' into main
nagkumar91 Jul 18, 2025
a3be3fc
Merge branch 'Azure:main' into main
nagkumar91 Jul 21, 2025
056ac4d
Passing threshold in AzureOpenAIScoreModelGrader
Jul 21, 2025
1779059
Add changelog
Jul 21, 2025
43fecff
Adding the self.pass_threshold instead of pass_threshold
Jul 21, 2025
b0c102b
Merge branch 'Azure:main' into main
nagkumar91 Jul 22, 2025
7bf5f1f
Add the python grader
Jul 22, 2025
3248ad0
Remove redundant test
Jul 22, 2025
d76f59b
Add class to exception list and format code
Jul 23, 2025
4d60e43
Merge branch 'main' into feature/python_grader
nagkumar91 Jul 24, 2025
98d1626
Merge branch 'Azure:main' into main
nagkumar91 Jul 24, 2025
9248c38
Add properties to evaluation upload run for FDP
Jul 24, 2025
74b760f
Remove debug
Jul 24, 2025
23dbc85
Merge branch 'feature/python_grader'
Jul 24, 2025
467ccb6
Remove the redundant property
Jul 24, 2025
c2beee8
Merge branch 'Azure:main' into main
nagkumar91 Jul 24, 2025
be9a19a
Fix changelog
Jul 24, 2025
de3a1e1
Fix the multiple features added section
Jul 24, 2025
f9faa61
removed the properties in update
Jul 24, 2025
69e783a
Merge branch 'Azure:main' into main
nagkumar91 Jul 28, 2025
8ebea2a
Merge branch 'Azure:main' into main
nagkumar91 Jul 31, 2025
3f9c818
Merge branch 'Azure:main' into main
nagkumar91 Aug 1, 2025
3b3159c
Merge branch 'Azure:main' into main
nagkumar91 Aug 5, 2025
d78b834
Merge branch 'Azure:main' into main
nagkumar91 Aug 6, 2025
ae3fc52
Merge branch 'Azure:main' into main
nagkumar91 Aug 8, 2025
19cce75
evaluation: support is_reasoning_model across all prompty-based evalu…
Aug 8, 2025
e59ca7f
evaluation: docs(Preview) + groundedness feature-detection + is_reaso…
Aug 8, 2025
98b4618
evaluation: revert _proxy_completion_model.py to origin/main version
Aug 8, 2025
706c042
Merge branch 'Azure:main' into main
nagkumar91 Aug 11, 2025
c418513
Merge remote-tracking branch 'origin/main' into diff-20250811-171736
Aug 12, 2025
86f24ba
Restore files that shouldn't have been modified
Aug 12, 2025
a1e55b4
Update sdk/evaluation/azure-ai-evaluation/azure/ai/evaluation/_evalua…
nagkumar91 Aug 12, 2025
bd6809f
Update the groundedness based on comments
Aug 12, 2025
3ae37cb
Add changelog to bug fix and link issue
Aug 12, 2025
6b8d4ce
Fix docstring
Aug 12, 2025
733ee1a
lint fixes
Aug 12, 2025
8 changes: 8 additions & 0 deletions sdk/evaluation/azure-ai-evaluation/CHANGELOG.md
@@ -5,11 +5,19 @@
### Breaking Changes

### Features Added

- Added support for user-supplied tags in the `evaluate` function. Tags are key-value pairs that can be used for experiment tracking, A/B testing, filtering, and organizing evaluation runs. The function accepts a `tags` parameter.
- Enhanced `GroundednessEvaluator` to support AI agent evaluation with tool calls. The evaluator now accepts agent response data containing tool calls and can extract context from `file_search` tool results for groundedness assessment. This enables evaluation of AI agents that use tools to retrieve information and generate responses. Note: Agent groundedness evaluation is currently supported only when the `file_search` tool is used.

### Bugs Fixed

- [Bug](https://github.com/Azure/azure-sdk-for-python/issues/39909): Added `is_reasoning_model` keyword parameter to all evaluators
(`SimilarityEvaluator`, `RelevanceEvaluator`, `CoherenceEvaluator`, `FluencyEvaluator`,
`RetrievalEvaluator`, `GroundednessEvaluator`, `IntentResolutionEvaluator`,
`ResponseCompletenessEvaluator`, `TaskAdherenceEvaluator`, `ToolCallAccuracyEvaluator`).
When set, evaluator configuration is adjusted appropriately for reasoning models.
`QAEvaluator` now propagates this parameter to its child evaluators.

### Other Changes

## 1.10.0 (2025-07-31)
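
The `tags` parameter described in the Features Added entry above is passed straight to `evaluate`. A minimal usage sketch follows; the data file, model configuration values, and tag names are placeholders, and the exact `evaluate` signature should be confirmed against the released azure-ai-evaluation package.

from azure.ai.evaluation import evaluate, CoherenceEvaluator, AzureOpenAIModelConfiguration

# Illustrative model configuration; replace endpoint, key, and deployment with real values.
model_config = AzureOpenAIModelConfiguration(
    azure_endpoint="https://<resource>.openai.azure.com",
    api_key="<api-key>",
    azure_deployment="<deployment>",
)

# `tags` attaches key-value metadata to the evaluation run, e.g. for A/B testing
# or for filtering runs later.
result = evaluate(
    data="eval_data.jsonl",  # hypothetical JSONL file with query/response columns
    evaluators={"coherence": CoherenceEvaluator(model_config)},
    tags={"experiment": "reasoning-rollout", "variant": "baseline"},
)
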
@@ -12,17 +12,22 @@

class CoherenceEvaluator(PromptyEvaluatorBase[Union[str, float]]):
"""
Evaluates coherence score for a given query and response or a multi-turn conversation, including reasoning.
Evaluates coherence for a given query and response or a multi-turn
conversation, including reasoning.

The coherence measure assesses the ability of the language model to generate text that reads naturally,
flows smoothly, and resembles human-like language in its responses. Use it when assessing the readability
and user-friendliness of a model's generated responses in real-world applications.
The coherence measure assesses the model's ability to generate text that
reads naturally, flows smoothly, and resembles human-like language. Use it
when assessing the readability and user-friendliness of responses.

:param model_config: Configuration for the Azure OpenAI model.
:type model_config: Union[~azure.ai.evaluation.AzureOpenAIModelConfiguration,
:type model_config:
Union[~azure.ai.evaluation.AzureOpenAIModelConfiguration,
~azure.ai.evaluation.OpenAIModelConfiguration]
:param threshold: The threshold for the coherence evaluator. Default is 3.
:type threshold: int
:keyword is_reasoning_model: (Preview) If True, the chat completions
configuration is adjusted for reasoning models.
:type is_reasoning_model: bool

.. admonition:: Example:

@@ -31,7 +36,8 @@ class CoherenceEvaluator(PromptyEvaluatorBase[Union[str, float]]):
:end-before: [END coherence_evaluator]
:language: python
:dedent: 8
:caption: Initialize and call CoherenceEvaluator using azure.ai.evaluation.AzureAIProject
:caption: Initialize and call CoherenceEvaluator using
azure.ai.evaluation.AzureAIProject

.. admonition:: Example using Azure AI Project URL:

@@ -40,7 +46,8 @@ class CoherenceEvaluator(PromptyEvaluatorBase[Union[str, float]]):
:end-before: [END coherence_evaluator]
:language: python
:dedent: 8
:caption: Initialize and call CoherenceEvaluator using Azure AI Project URL in following format
:caption: Initialize and call CoherenceEvaluator using Azure AI
Project URL in the following format
https://{resource_name}.services.ai.azure.com/api/projects/{project_name}

.. admonition:: Example with Threshold:
@@ -50,23 +57,24 @@ class CoherenceEvaluator(PromptyEvaluatorBase[Union[str, float]]):
:end-before: [END threshold_coherence_evaluator]
:language: python
:dedent: 8
:caption: Initialize with threshold and call a CoherenceEvaluator with a query and response.
:caption: Initialize with threshold and call a CoherenceEvaluator
with a query and response.

.. note::

To align with our support of a diverse set of models, an output key without the `gpt_` prefix has been added.
To maintain backwards compatibility, the old key with the `gpt_` prefix is still be present in the output;
however, it is recommended to use the new key moving forward as the old key will be deprecated in the future.
To align with support of diverse models, an output key without the
`gpt_` prefix has been added. The old key with the `gpt_` prefix is
still present for compatibility; however, it will be deprecated.
"""

_PROMPTY_FILE = "coherence.prompty"
_RESULT_KEY = "coherence"

id = "azureai://built-in/evaluators/coherence"
"""Evaluator identifier, experimental and to be used only with evaluation in cloud."""
"""Evaluator identifier, experimental to be used only with cloud evaluation"""

Copilot AI Aug 12, 2025

The docstring is missing a comma. It should read 'Evaluator identifier, experimental, to be used only with cloud evaluation' or 'Evaluator identifier (experimental) to be used only with cloud evaluation'.

Suggested change
"""Evaluator identifier, experimental to be used only with cloud evaluation"""
"""Evaluator identifier, experimental, to be used only with cloud evaluation"""



@override
def __init__(self, model_config, *, threshold=3):
def __init__(self, model_config, *, threshold=3, **kwargs):
current_dir = os.path.dirname(__file__)
prompty_path = os.path.join(current_dir, self._PROMPTY_FILE)
self._threshold = threshold
@@ -77,6 +85,7 @@ def __init__(self, model_config, *, threshold=3):
result_key=self._RESULT_KEY,
threshold=threshold,
_higher_is_better=self._higher_is_better,
**kwargs,
)

@overload
@@ -104,9 +113,11 @@ def __call__(
) -> Dict[str, Union[float, Dict[str, List[Union[str, float]]]]]:
"""Evaluate coherence for a conversation

:keyword conversation: The conversation to evaluate. Expected to contain a list of conversation turns under the
key "messages", and potentially a global context under the key "context". Conversation turns are expected
to be dictionaries with keys "content", "role", and possibly "context".
:keyword conversation: The conversation to evaluate. Expected to
contain a list of conversation turns under the key "messages",
and optionally a global context under the key "context". Turns are
dictionaries with keys "content", "role", and possibly
"context".
:paramtype conversation: Optional[~azure.ai.evaluation.Conversation]
:return: The coherence score.
:rtype: Dict[str, Union[float, Dict[str, List[float]]]]
@@ -118,19 +129,22 @@ def __call__( # pylint: disable=docstring-missing-param
*args,
**kwargs,
):
"""Evaluate coherence. Accepts either a query and response for a single evaluation,
or a conversation for a potentially multi-turn evaluation. If the conversation has more than one pair of
turns, the evaluator will aggregate the results of each turn.
"""Evaluate coherence.

Accepts a query/response for a single evaluation, or a conversation
for a multi-turn evaluation. If the conversation has more than one
pair of turns, results are aggregated.

:keyword query: The query to be evaluated.
:paramtype query: str
:keyword response: The response to be evaluated.
:paramtype response: Optional[str]
:keyword conversation: The conversation to evaluate. Expected to contain a list of conversation turns under the
key "messages". Conversation turns are expected
to be dictionaries with keys "content" and "role".
:keyword conversation: The conversation to evaluate. Expected to
contain conversation turns under the key "messages" as
dictionaries with keys "content" and "role".
:paramtype conversation: Optional[~azure.ai.evaluation.Conversation]
:return: The relevance score.
:rtype: Union[Dict[str, float], Dict[str, Union[float, Dict[str, List[float]]]]]
:rtype: Union[Dict[str, float], Dict[str, Union[float, Dict[str,
List[float]]]]]
"""
return super().__call__(*args, **kwargs)
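
Because `__init__` now accepts `**kwargs`, the new `is_reasoning_model` flag can be passed directly when constructing the evaluator. A short direct-call sketch, using placeholder configuration values; the output keys noted in the comment are illustrative rather than exhaustive.

from azure.ai.evaluation import CoherenceEvaluator, AzureOpenAIModelConfiguration

model_config = AzureOpenAIModelConfiguration(
    azure_endpoint="https://<resource>.openai.azure.com",
    api_key="<api-key>",
    azure_deployment="<reasoning-model-deployment>",
)

# is_reasoning_model=True travels through **kwargs into PromptyEvaluatorBase and on to
# AsyncPrompty.load, so the prompty / chat-completions settings suit reasoning models.
coherence = CoherenceEvaluator(model_config, threshold=3, is_reasoning_model=True)
result = coherence(
    query="What is the capital of France?",
    response="Paris is the capital of France.",
)
# Expected keys include "coherence" and, for backwards compatibility, "gpt_coherence".
print(result)
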
@@ -4,7 +4,9 @@
from concurrent.futures import as_completed
from typing import TypeVar, Dict, List

from azure.ai.evaluation._legacy._adapters.tracing import ThreadPoolExecutorWithContext as ThreadPoolExecutor
from azure.ai.evaluation._legacy._adapters.tracing import (
ThreadPoolExecutorWithContext as ThreadPoolExecutor,
)
from typing_extensions import override

from azure.ai.evaluation._evaluators._common import EvaluatorBase
@@ -15,8 +15,17 @@

from azure.ai.evaluation._common.constants import PROMPT_BASED_REASON_EVALUATORS
from azure.ai.evaluation._constants import EVALUATION_PASS_FAIL_MAPPING
from azure.ai.evaluation._exceptions import EvaluationException, ErrorBlame, ErrorCategory, ErrorTarget
from ..._common.utils import construct_prompty_model_config, validate_model_config, parse_quality_evaluator_reason_score
from azure.ai.evaluation._exceptions import (
EvaluationException,
ErrorBlame,
ErrorCategory,
ErrorTarget,
)
from ..._common.utils import (
construct_prompty_model_config,
validate_model_config,
parse_quality_evaluator_reason_score,
)
from . import EvaluatorBase

try:
@@ -71,7 +80,11 @@ def __init__(
self._prompty_file = prompty_file
self._threshold = threshold
self._higher_is_better = _higher_is_better
super().__init__(eval_last_turn=eval_last_turn, threshold=threshold, _higher_is_better=_higher_is_better)
super().__init__(
eval_last_turn=eval_last_turn,
threshold=threshold,
_higher_is_better=_higher_is_better,
)

subclass_name = self.__class__.__name__
user_agent = f"{UserAgentSingleton().value} (type=evaluator subtype={subclass_name})"
@@ -82,7 +95,9 @@ def __init__(
)

self._flow = AsyncPrompty.load(
source=self._prompty_file, model=prompty_model_config, is_reasoning_model=self._is_reasoning_model
source=self._prompty_file,
model=prompty_model_config,
is_reasoning_model=self._is_reasoning_model,
)

# __call__ not overridden here because child classes have such varied signatures that there's no point
@@ -13,18 +13,24 @@

class FluencyEvaluator(PromptyEvaluatorBase[Union[str, float]]):
"""
Evaluates the fluency of a given response or a multi-turn conversation, including reasoning.
Evaluates the fluency of a given response or a multi-turn conversation,
including reasoning.

The fluency measure assesses the extent to which the generated text conforms to grammatical rules, syntactic
structures, and appropriate vocabulary usage, resulting in linguistically correct responses.
The fluency measure assesses the extent to which generated text conforms
to grammar, syntax, and appropriate vocabulary, resulting in linguistically
correct responses.

Fluency scores range from 1 to 5, with 1 being the least fluent and 5 being the most fluent.
Fluency scores range from 1 to 5 (1 = least fluent, 5 = most fluent).

:param model_config: Configuration for the Azure OpenAI model.
:type model_config: Union[~azure.ai.evaluation.AzureOpenAIModelConfiguration,
:type model_config:
Union[~azure.ai.evaluation.AzureOpenAIModelConfiguration,
~azure.ai.evaluation.OpenAIModelConfiguration]
:param threshold: The threshold for the fluency evaluator. Default is 3.
:type threshold: int
:keyword is_reasoning_model: (Preview) If True, the chat completions
configuration is adjusted for reasoning models.
:type is_reasoning_model: bool

.. admonition:: Example:

@@ -51,24 +57,25 @@ class FluencyEvaluator(PromptyEvaluatorBase[Union[str, float]]):
:end-before: [END fluency_evaluator]
:language: python
:dedent: 8
:caption: Initialize and call FluencyEvaluator using Azure AI Project URL in the following format
:caption: Initialize and call FluencyEvaluator using Azure AI
Project URL in the following format
https://{resource_name}.services.ai.azure.com/api/projects/{project_name}

.. note::

To align with our support of a diverse set of models, an output key without the `gpt_` prefix has been added.
To maintain backwards compatibility, the old key with the `gpt_` prefix is still be present in the output;
however, it is recommended to use the new key moving forward as the old key will be deprecated in the future.
To align with support of diverse models, an output key without the
`gpt_` prefix has been added. The old key with the `gpt_` prefix is
still present for compatibility and will be deprecated.
"""

_PROMPTY_FILE = "fluency.prompty"
_RESULT_KEY = "fluency"

id = "azureai://built-in/evaluators/fluency"
"""Evaluator identifier, experimental and to be used only with evaluation in cloud."""
"""Evaluator identifier for cloud evaluation."""

@override
def __init__(self, model_config, *, threshold=3):
def __init__(self, model_config, *, threshold=3, **kwargs):
current_dir = os.path.dirname(__file__)
prompty_path = os.path.join(current_dir, self._PROMPTY_FILE)
self._threshold = threshold
@@ -79,6 +86,7 @@ def __init__(self, model_config, *, threshold=3, **kwargs):
result_key=self._RESULT_KEY,
threshold=threshold,
_higher_is_better=self._higher_is_better,
**kwargs,
)

@overload
@@ -103,9 +111,10 @@ def __call__(
) -> Dict[str, Union[float, Dict[str, List[Union[str, float]]]]]:
"""Evaluate fluency for a conversation

:keyword conversation: The conversation to evaluate. Expected to contain a list of conversation turns under the
key "messages", and potentially a global context under the key "context". Conversation turns are expected
to be dictionaries with keys "content", "role", and possibly "context".
:keyword conversation: The conversation to evaluate. Expected to
contain turns under the key "messages", and optionally a global
context under the key "context". Turns are dictionaries with
keys "content", "role", and possibly "context".
:paramtype conversation: Optional[~azure.ai.evaluation.Conversation]
:return: The fluency score
:rtype: Dict[str, Union[float, Dict[str, List[float]]]]
@@ -118,16 +127,19 @@ def __call__( # pylint: disable=docstring-missing-param
**kwargs,
):
"""
Evaluate fluency. Accepts either a response for a single evaluation,
or a conversation for a multi-turn evaluation. If the conversation has more than one turn,
the evaluator will aggregate the results of each turn.

:keyword response: The response to be evaluated. Mutually exclusive with the "conversation" parameter.
:paramtype response: Optional[str]
:keyword conversation: The conversation to evaluate. Expected to contain a list of conversation turns under the
key "messages". Conversation turns are expected to be dictionaries with keys "content" and "role".
:paramtype conversation: Optional[~azure.ai.evaluation.Conversation]
:return: The fluency score.
:rtype: Union[Dict[str, float], Dict[str, Union[float, Dict[str, List[float]]]]]
Evaluate fluency. Accepts either a response for a single evaluation,
or a conversation for a multi-turn evaluation. If the conversation has
more than one turn, the evaluator will aggregate per-turn results.

:keyword response: The response to be evaluated. Mutually exclusive
with the "conversation" parameter.
:paramtype response: Optional[str]
:keyword conversation: The conversation to evaluate. Expected to
contain turns under the key "messages" as dictionaries with
keys "content" and "role".
:paramtype conversation: Optional[~azure.ai.evaluation.Conversation]
:return: The fluency score.
:rtype: Union[Dict[str, float], Dict[str, Union[float, Dict[str,
List[float]]]]]
"""
return super().__call__(*args, **kwargs)
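
The conversation overloads above expect a specific payload shape. A small sketch of that shape with made-up turn content; `model_config` is assumed to be defined as in the earlier sketches, and per-turn scores are aggregated when the conversation has multiple assistant turns.

from azure.ai.evaluation import FluencyEvaluator

# Turns live under "messages"; each turn is a dict with "content" and "role".
conversation = {
    "messages": [
        {"role": "user", "content": "Summarize the quarterly report."},
        {"role": "assistant", "content": "Revenue grew 12% year over year, driven by cloud."},
        {"role": "user", "content": "What is the main risk?"},
        {"role": "assistant", "content": "Supply-chain delays remain the largest risk."},
    ]
}
fluency = FluencyEvaluator(model_config)  # model_config as defined above
result = fluency(conversation=conversation)  # aggregate score plus per-turn details
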
@@ -1,7 +1,9 @@
# ---------------------------------------------------------
# Copyright (c) Microsoft Corporation. All rights reserved.
# ---------------------------------------------------------
import os, logging
import os
import logging
from inspect import signature
from typing import Dict, List, Optional, Union

from typing_extensions import overload, override
@@ -49,6 +51,9 @@ class GroundednessEvaluator(PromptyEvaluatorBase[Union[str, float]]):
~azure.ai.evaluation.OpenAIModelConfiguration]
:param threshold: The threshold for the groundedness evaluator. Default is 3.
:type threshold: int
:keyword is_reasoning_model: (Preview) If True, the chat completions
configuration is adjusted for reasoning models.
:type is_reasoning_model: bool

.. admonition:: Example:

@@ -105,10 +110,16 @@ def __init__(self, model_config, *, threshold=3, **kwargs):
result_key=self._RESULT_KEY,
threshold=threshold,
_higher_is_better=self._higher_is_better,
**kwargs,
)
self._model_config = model_config
self.threshold = threshold
# Needs to be set because it's used in call method to re-validate prompt if `query` is provided

# Cache whether AsyncPrompty.load supports the is_reasoning_model parameter.
try:
self._has_is_reasoning_model_param: bool = "is_reasoning_model" in signature(AsyncPrompty.load).parameters
except Exception: # Very defensive: if inspect fails, assume not supported
self._has_is_reasoning_model_param = False

@overload
def __call__(
@@ -202,7 +213,18 @@ def __call__( # pylint: disable=docstring-missing-param
self._DEFAULT_OPEN_API_VERSION,
UserAgentSingleton().value,
)
self._flow = AsyncPrompty.load(source=self._prompty_file, model=prompty_model_config)

if self._has_is_reasoning_model_param:
self._flow = AsyncPrompty.load(
source=self._prompty_file,
model=prompty_model_config,
is_reasoning_model=self._is_reasoning_model,
)
else:
self._flow = AsyncPrompty.load(
source=self._prompty_file,
model=prompty_model_config,
)

return super().__call__(*args, **kwargs)
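
The guarded `AsyncPrompty.load` call above is a feature-detection pattern: forward `is_reasoning_model` only when the installed loader actually declares that parameter. The same idea in isolation, as a minimal sketch; `load_flow` and its arguments are stand-ins, not part of the SDK's API.

import inspect

def load_flow(loader, source, model, is_reasoning_model=False):
    # Forward `is_reasoning_model` only if `loader` declares that parameter.
    try:
        supported = "is_reasoning_model" in inspect.signature(loader).parameters
    except (TypeError, ValueError):  # signature() can fail on some callables
        supported = False
    if supported:
        return loader(source=source, model=model, is_reasoning_model=is_reasoning_model)
    return loader(source=source, model=model)
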

@@ -282,4 +304,4 @@ def _get_context_from_agent_response(self, response, tool_definitions):
logger.debug(f"Error extracting context from agent response : {str(ex)}")
context = ""

return context if context else None
return context

Copilot AI Aug 12, 2025

The function _get_context_from_agent_response should return None when context is empty, not an empty string. The original code returned context if context else None, which properly handles the case where no context is found. Returning an empty string may cause issues in downstream processing that expects None for missing context.

Suggested change
return context
return context if context else None

