Commit bf66ea7
Tool Call Accuracy V2 (#41740)
* Support 5 levels and evaluate all tools at once
* Update sample notebook and change log
* Add missing import
* Modify test cases to match the new output format
* Modify other test file to match the new output format
* Fixed parsing of results
* Change key name in output
* Spell check fixes
* Minor prompt update
* Update result key to tool_call_accuracy
* Delete test_new_evaluator.ipynb
* Added field names and messages as constants
* Additional note in prompt
* Re-add the temperature to the prompty file
* Removed 'applicable' field and print statement
* Move excess/missing tool calls fields under additional details
* Typo fix and removal of redundant field in the prompt
* Modify per_tool_call_details field's name to details
* Revert "Modify per_tool_call_details field's name to details"
This reverts commit 2c3ce50.
* Revert "Merge branch 'main' into selshafey/improve_tool_call_accuracy"
* Black reformat
* Reformat with black
* To re-trigger build pipelines
---------
Co-authored-by: Salma Elshafey <[email protected]>

Parent commit: 007b9c9
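The commit message outlines the V2 result shape only briefly:
- one evaluation covers all tool calls;
- the score has five levels;
- the score is stored under the `tool_call_accuracy` key;
- per-call results go in `per_tool_call_details`;
- excess and missing tool calls move under additional details.

The Python sketch below shows how a caller might read a result with that shape. It is a minimal sketch under stated assumptions and not the evaluator's actual output contract. The following are all hypothetical:
- the 1..5 score range and the pass threshold of 3;
- the `details`, `excess_tool_calls` and `missing_tool_calls` names;
- the inner keys of each per-call entry.

The input is a plain dict, not an object returned by the `azure-ai-evaluation` package.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List

# Field names kept as constants, echoing "Added field names and messages as constants".
# Only TOOL_CALL_ACCURACY and PER_TOOL_CALL_DETAILS come from the commit message.
TOOL_CALL_ACCURACY = "tool_call_accuracy"
PER_TOOL_CALL_DETAILS = "per_tool_call_details"
ADDITIONAL_DETAILS = "details"             # hypothetical name
EXCESS_TOOL_CALLS = "excess_tool_calls"    # hypothetical name
MISSING_TOOL_CALLS = "missing_tool_calls"  # hypothetical name

MIN_LEVEL, MAX_LEVEL = 1, 5  # assumption: the five levels are scored 1..5
PASS_THRESHOLD = 3           # assumption: not taken from the SDK


@dataclass
class ToolCallAccuracySummary:
    score: int
    passed: bool
    excess: List[str] = field(default_factory=list)
    missing: List[str] = field(default_factory=list)


def summarize(result: Dict[str, Any]) -> ToolCallAccuracySummary:
    """Validate a hypothetical V2 result dict and extract its headline fields."""
    score = int(result[TOOL_CALL_ACCURACY])
    if not MIN_LEVEL <= score <= MAX_LEVEL:
        raise ValueError(f"score {score} outside {MIN_LEVEL}..{MAX_LEVEL}")
    # Excess/missing tool calls are read from the additional-details section.
    details = result.get(ADDITIONAL_DETAILS, {})
    return ToolCallAccuracySummary(
        score=score,
        passed=score >= PASS_THRESHOLD,
        excess=list(details.get(EXCESS_TOOL_CALLS, [])),
        missing=list(details.get(MISSING_TOOL_CALLS, [])),
    )


# Example input in the assumed shape: a single evaluation covering all tool calls.
# The tool names and the per-call keys ("tool_name", "correct") are hypothetical.
example = {
    TOOL_CALL_ACCURACY: 4,
    PER_TOOL_CALL_DETAILS: [
        {"tool_name": "fetch_weather", "correct": True},
        {"tool_name": "send_email", "correct": False},
    ],
    ADDITIONAL_DETAILS: {EXCESS_TOOL_CALLS: ["send_email"], MISSING_TOOL_CALLS: []},
}

summary = summarize(example)
print(summary.score, summary.passed, summary.excess, summary.missing)
# prints: 4 True ['send_email'] []
```

Keeping the key names in one place mirrors the constants change listed above. If the real evaluator uses different names or a different score range, only the constants at the top of the sketch need to change.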
8 files changed (+522, -451 lines) in:
- sdk/evaluation/azure-ai-evaluation
  - azure/ai/evaluation/_evaluators
    - _common
    - _tool_call_accuracy
  - samples/agent_evaluators
  - tests/unittests
Lines changed: 30 additions & 6 deletions