Skip to content

Commit d8ec000

Browse files
lgayhardtchangliu2
andauthored
Update articles/ai-foundry/concepts/evaluation-metrics-built-in.md
Co-authored-by: changliu2 <[email protected]>
1 parent 1060937 commit d8ec000

File tree

1 file changed

+1
-1
lines changed

1 file changed

+1
-1
lines changed

articles/ai-foundry/concepts/evaluation-metrics-built-in.md

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -333,7 +333,7 @@ Our definition and grading rubrics to be used by the Large Language Model judge
333333

334334
**Definition:**
335335

336-
An average accuracy score, or a passing rate, of a single or multiple tool calls. A correct tool call considers relevance and potential usefulness, including syntactic and semantic correctness of a proposed tool call from an intelligent system. The judgment for each tool call is based on the following provided criteria, user query, and the tool definitions available to the agent.
336+
Tool Call Accuracy returns the correctness of a single tool call, or the passing rate of the correct tool calls among multiple ones. A correct tool call considers relevance and potential usefulness, including syntactic and semantic correctness of a proposed tool call from an intelligent system. The judgment for each tool call is based on the following provided criteria, user query, and the tool definitions available to the agent.
337337

338338
**Ratings:**
339339

0 commit comments

Comments
 (0)