* Prepare evals SDK Release
* Fix bug
* Fix for ADV_CONV for FDP projects
* Update release date
* re-add pyrit to matrix
* Change grader ids
* Update unit test
* replace all old grader IDs in tests
* Update platform-matrix.json
Add pyrit without removing the other entry
* Update test to ensure everything is mocked
* tox/black fixes
* Skip that test with issues
* update grader ID according to API View feedback
* Update test
* remove string check for grader ID
* Update changelog and officially start freeze
* update the enum according to suggestions
* update the changelog
* Finalize logic
* Initial plan
* Fix client request ID headers in azure-ai-evaluation
Co-authored-by: nagkumar91 <[email protected]>
* Fix client request ID header format in rai_service.py
Co-authored-by: nagkumar91 <[email protected]>
* Passing threshold in AzureOpenAIScoreModelGrader
* Add changelog
* Use self.pass_threshold instead of pass_threshold
* Add the python grader
* Remove redundant test
* Add class to exception list and format code
* Add properties to evaluation upload run for FDP
* Remove debug
* Remove the redundant property
* Fix changelog
* Fix the multiple features added section
* removed the properties in update
* fix(evaluation): pad AOAI grader results to expected_rows to prevent row misalignment; add missing-rows unit test
* Update the test and changelog
* Update sdk/evaluation/azure-ai-evaluation/azure/ai/evaluation/_evaluate/_evaluate_aoai.py
Co-authored-by: Copilot <[email protected]>
* Update sdk/evaluation/azure-ai-evaluation/azure/ai/evaluation/_evaluate/_evaluate_aoai.py
Co-authored-by: Copilot <[email protected]>
* Fix the indent
* Lint fixes
* update cspell
---------
Co-authored-by: Nagkumar Arkalgud <[email protected]>
Co-authored-by: Nagkumar Arkalgud <[email protected]>
Co-authored-by: copilot-swe-agent[bot] <[email protected]>
Co-authored-by: nagkumar91 <[email protected]>
Co-authored-by: Copilot <[email protected]>
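
The `pass_threshold` fixes listed above ("Passing threshold in AzureOpenAIScoreModelGrader" and "Use self.pass_threshold instead of pass_threshold") can be sketched as follows. This is a hypothetical, simplified class, not the real `AzureOpenAIScoreModelGrader` API: the point is that the configured threshold must be stored on the instance as `self.pass_threshold` so later scoring calls read it, rather than referencing a bare `pass_threshold` name that only existed as an `__init__` parameter.

```python
class ScoreModelGrader:
    """Simplified stand-in for a score-model grader with a pass threshold.

    Hypothetical sketch; the real AzureOpenAIScoreModelGrader takes more
    configuration (model, prompt, etc.).
    """

    def __init__(self, pass_threshold: float = 0.5):
        # Store on the instance. Referencing a bare `pass_threshold`
        # in other methods would raise NameError after __init__ returns.
        self.pass_threshold = pass_threshold

    def grade(self, score: float) -> dict:
        return {"score": score, "passed": score >= self.pass_threshold}


grader = ScoreModelGrader(pass_threshold=0.7)
print(grader.grade(0.8))  # {'score': 0.8, 'passed': True}
```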
sdk/evaluation/azure-ai-evaluation/CHANGELOG.md (1 addition, 0 deletions)
@@ -9,6 +9,7 @@
 - Enhanced `GroundednessEvaluator` to support AI agent evaluation with tool calls. The evaluator now accepts agent response data containing tool calls and can extract context from `file_search` tool results for groundedness assessment. This enables evaluation of AI agents that use tools to retrieve information and generate responses. Note: Agent groundedness evaluation is currently supported only when the `file_search` tool is used.
 
 ### Bugs Fixed
+- Fixed issue where evaluation results were not properly aligned with input data, leading to incorrect metrics being reported.
 
 ### Other Changes
 - Deprecating `AdversarialSimulator` in favor of the [AI Red Teaming Agent](https://aka.ms/airedteamingagent-sample). `AdversarialSimulator` will be removed in the next minor release.
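
The row-alignment fix described above ("pad AOAI grader results to expected_rows to prevent row misalignment") can be sketched as follows. The helper name and shape here are hypothetical, not the actual code in `_evaluate_aoai.py`: the idea is that when a grader returns fewer result rows than the input had, placeholder entries are appended so each result stays aligned with its input row instead of shifting.

```python
def pad_results(results, expected_rows, fill=None):
    """Pad a grader's per-row results to expected_rows entries.

    Hypothetical helper. If the grader returned fewer rows than the
    input had, append placeholder entries so downstream code can zip
    results with inputs without rows sliding out of alignment.
    """
    padded = list(results)
    if len(padded) < expected_rows:
        padded.extend([fill] * (expected_rows - len(padded)))
    return padded


# Three input rows but only two grader results: row 3 gets a placeholder
# instead of silently borrowing the next row's result.
print(pad_results([{"score": 1.0}, {"score": 0.5}], 3))
```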