Commit bcec119
Add support for Built-in Tools for the Tool Call Accuracy Evaluator (Azure#42359)
* support 5 levels, evaluate all tools at once
* Update sample notebook and change log
* Add missing import
* Modify test cases to match the new output format
* Modify other test file to match the new output format
* Fixed parsing of results
* Change key name in output
* Spell check fixes
* Minor prompt update
* Update result key to tool_call_accuracy
* Delete test_new_evaluator.ipynb
* Added field names and messages as constants
* Additional note in prompt
* Re-add the temperature to the prompty file
* Removed 'applicable' field and print statement
* Move excess/missing tool calls fields under additional details
* Typo fix and removal of redundant field in the prompt
* Modify per_tool_call_details field's name to details
* Revert "Modify per_tool_call_details field's name to details"
This reverts commit 2c3ce50.
* Revert 'Merge branch 'main' into selshafey/improve_tool_call_accuracy'
* Black reformat
* Reformat with black
* To re-trigger build pipelines
* Add notebook for bugbash
* modify bugbash notebook
* Add support for built-in tools for Tool Call Accuracy Evaluator
* Remove bugbash notebook
* Resolve issues with merge
* Fix id value
* Use existing built-in tool definitions
* Run black
* Prompt modifications
* Add test cases for built-in tools
* Handle converter format
* Add test cases for converter format
* Support only converter format
* Revert tool definitions to be required, run black
---------
Co-authored-by: Salma Elshafey <[email protected]>
1 parent 78e1967 commit bcec119
File tree
4 files changed (+399, -77 lines changed)
- sdk/evaluation/azure-ai-evaluation
  - azure/ai/evaluation
    - _converters
    - _evaluators/_tool_call_accuracy
  - tests/unittests
Lines changed: 29 additions & 1 deletion
Lines changed: 93 additions & 26 deletions