
Commit d270703

Authored by nagkumar91 (Nagkumar Arkalgud) and Copilot
Passing threshold in AzureOpenAIScoreModelGrader (#42136)
* Prepare evals SDK Release
* Fix bug
* Fix for ADV_CONV for FDP projects
* Update release date
* re-add pyrit to matrix
* Change grader ids
* Update unit test
* replace all old grader IDs in tests
* Update platform-matrix.json
  Add pyrit and not remove the other one
* Update test to ensure everything is mocked
* tox/black fixes
* Skip that test with issues
* update grader ID according to API View feedback
* Update test
* remove string check for grader ID
* Update changelog and officialy start freeze
* update the enum according to suggestions
* update the changelog
* Finalize logic
* Initial plan
* Fix client request ID headers in azure-ai-evaluation
  Co-authored-by: nagkumar91 <[email protected]>
* Fix client request ID header format in rai_service.py
  Co-authored-by: nagkumar91 <[email protected]>
* Passing threshold in AzureOpenAIScoreModelGrader
* Add changelog
* Adding the self.pass_threshold instead of pass_threshold

---------

Co-authored-by: Nagkumar Arkalgud <[email protected]>
Co-authored-by: Nagkumar Arkalgud <[email protected]>
Co-authored-by: copilot-swe-agent[bot] <[email protected]>
Co-authored-by: nagkumar91 <[email protected]>
1 parent f374054 commit d270703

File tree

2 files changed: +2 lines, -0 lines

sdk/evaluation/azure-ai-evaluation/CHANGELOG.md

Lines changed: 1 addition & 0 deletions
@@ -29,6 +29,7 @@
 - Fixes and improvements to ToolCallAccuracy evaluator. New version has less variance. and now works on all tool calls that happen in a turn at once. Previously, it worked on each tool call independently without having context on the other tool calls that happen in the same turn, and then aggregated the results to a score in the range [0-1]. The score range is now [1-5].
 - Fixed MeteorScoreEvaluator and other threshold-based evaluators returning incorrect binary results due to integer conversion of decimal scores. Previously, decimal scores like 0.9375 were incorrectly converted to integers (0) before threshold comparison, causing them to fail even when above the threshold. [#41415](https://github.com/Azure/azure-sdk-for-python/issues/41415)
 - Added a new enum `ADVERSARIAL_QA_DOCUMENTS` which moves all the "file_content" type prompts away from `ADVERSARIAL_QA` to the new enum
+- `AzureOpenAIScoreModelGrader` evaluator now supports `pass_threshold` parameter to set the minimum score required for a response to be considered passing. This allows users to define custom thresholds for evaluation results, enhancing flexibility in grading AI model responses.

 ## 1.8.0 (2025-05-29)
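For context, a minimal usage sketch of the new parameter (not taken from this commit): only `pass_threshold`, `range`, and `sampling_params` appear in the diff; the import path, `model_config` shape, and the other constructor arguments shown here are assumptions about the grader's typical inputs.

```python
# Illustrative sketch only; arguments other than pass_threshold, range, and
# sampling_params are assumed, not confirmed by this commit.
from azure.ai.evaluation import AzureOpenAIScoreModelGrader

model_config = {
    "azure_endpoint": "https://<your-resource>.openai.azure.com",  # assumed shape
    "api_key": "<api-key>",
    "azure_deployment": "<deployment-name>",
}

grader = AzureOpenAIScoreModelGrader(
    model_config=model_config,
    name="helpfulness",                  # assumed display name
    model="gpt-4o",                      # assumed model/deployment id
    input=[{"role": "user", "content": "Score the response from 0 to 5."}],  # assumed prompt shape
    range=[0.0, 5.0],                    # forwarded to grader_kwargs when provided
    sampling_params={"temperature": 0},  # forwarded to grader_kwargs when provided
    pass_threshold=3.0,                  # new: minimum score counted as passing
)
```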

sdk/evaluation/azure-ai-evaluation/azure/ai/evaluation/_aoai/score_model_grader.py

Lines changed: 1 addition & 0 deletions
@@ -84,6 +84,7 @@ def __init__(
             grader_kwargs["range"] = range
         if sampling_params is not None:
             grader_kwargs["sampling_params"] = sampling_params
+        grader_kwargs["pass_threshold"] = self.pass_threshold

         grader = ScoreModelGrader(**grader_kwargs)
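The threshold forwarded here is what downstream result processing can compare the model-assigned score against. A purely illustrative helper (not code from this commit or the SDK) showing that comparison:

```python
def score_to_result(score: float, pass_threshold: float) -> dict:
    """Illustrative only: map a numeric grader score to a pass/fail outcome
    using the configured threshold."""
    return {"score": score, "passed": score >= pass_threshold}

# With pass_threshold=3.0, a score of 3.5 passes and 2.0 does not.
assert score_to_result(3.5, 3.0)["passed"] is True
assert score_to_result(2.0, 3.0)["passed"] is False
```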

0 commit comments