Commit 64f5f51

salma-elshafey and Salma Elshafey authored
Fix evaluation aggregation logic for not applicable results (#42888)
* Replace not applicable results in evaluator outputs to aggregate metrics
* Update aggregation description

---------

Co-authored-by: Salma Elshafey <[email protected]>
1 parent c099fb6 commit 64f5f51

1 file changed, +4 -0 lines changed
  • sdk/evaluation/azure-ai-evaluation/azure/ai/evaluation/_evaluate

sdk/evaluation/azure-ai-evaluation/azure/ai/evaluation/_evaluate/_evaluate.py

Lines changed: 4 additions & 0 deletions
@@ -18,6 +18,7 @@
 
 from azure.ai.evaluation._common.math import list_mean_nan_safe, apply_transform_nan_safe
 from azure.ai.evaluation._common.utils import validate_azure_ai_project, is_onedp_project
+from azure.ai.evaluation._evaluators._common._base_eval import EvaluatorBase
 from azure.ai.evaluation._exceptions import ErrorBlame, ErrorCategory, ErrorTarget, EvaluationException
 
 from azure.ai.evaluation._aoai.aoai_grader import AzureOpenAIGrader
@@ -317,6 +318,9 @@ def _aggregate_metrics(df: pd.DataFrame, evaluators: Dict[str, Callable]) -> Dic
     # For rest of metrics, we will calculate mean
     df.drop(columns=handled_columns, inplace=True)
 
+    # Convert "not applicable" strings to None to allow proper numeric aggregation
+    df = df.replace(EvaluatorBase._NOT_APPLICABLE_RESULT, None)
+
     # NOTE: nan/None values don't count as as booleans, so boolean columns with
     # nan/None values won't have a mean produced from them.
     # This is different from label-based known evaluators, which have special handling.
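
The effect of the added `df.replace(...)` call can be illustrated with a minimal, standalone sketch. It is not the SDK's implementation: the placeholder value `"not applicable"`, the helper name `mean_of_applicable`, and the column names are assumptions for illustration, and `pd.to_numeric` stands in for whatever numeric handling `_aggregate_metrics` performs afterwards. It also assumes pandas 1.4 or later, where an explicit `None` passed as the replacement value is honored.

```python
import pandas as pd

# Assumption: stand-in for EvaluatorBase._NOT_APPLICABLE_RESULT, whose actual
# value is not shown in this diff.
NOT_APPLICABLE = "not applicable"


def mean_of_applicable(df: pd.DataFrame) -> dict:
    """Hypothetical helper: per-column mean that skips placeholder rows."""
    # Same call shape as the diff: swap the placeholder string for None.
    cleaned = df.replace(NOT_APPLICABLE, None)
    means = {}
    for column in cleaned.columns:
        # to_numeric (default errors="raise") would fail on a leftover
        # placeholder string; None converts to NaN, which mean() skips.
        numeric = pd.to_numeric(cleaned[column])
        means[column] = float(numeric.mean())
    return means


# Hypothetical evaluator output: one row was not applicable for relevance.
scores = pd.DataFrame(
    {
        "relevance.score": [4.0, NOT_APPLICABLE, 2.0],
        "fluency.score": [5.0, 3.0, 4.0],
    }
)
result = mean_of_applicable(scores)
```

In this sketch the mean for `relevance.score` is taken over the two numeric rows only, so a single not-applicable row no longer leaves a string in the column when the mean is computed.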
