Rename Task Success Evaluator to Task Completion Evaluator (#43190)
* Add Task Success Evaluator V0
* Add samples for task success evaluator
* Run black
* Modify output format
* Modify output format in the examples
* Make Task Success a private preview evaluator
* Minor TaskSuccessEvaluator prompt update
* Fix path for importing Task Success Evaluator in samples
* Modify path for TaskSuccessEvaluator in eval mapping
* Remove sample notebook
* To retrigger build pipelines
* Add credential to TaskSuccessEvaluator
* Run Black
* To retrigger build pipeline
* Minor prompt modification
* Change tool_definitions type in TaskSuccess prompt
* Mark model grader tests as skip
* Remove task success evaluator from the samples notebook
* Rename Task Success to Task Completion
* Minor definition modification
* Minor rename
* remove task_success
* Fix merge issue
---------
Co-authored-by: Salma Elshafey <[email protected]>
-"""Evaluate task success for a given query, response, and optionally tool definitions.
+"""Evaluate task completion for a given query, response, and optionally tool definitions.
 The query and response can be either a string or a list of messages.


 Example with string inputs and no tools:
-evaluator = TaskSuccessEvaluator(model_config)
+evaluator = TaskCompletionEvaluator(model_config)
 query = "Plan a 3-day itinerary for Paris with cultural landmarks and local cuisine."
 response = "**Day 1:** Morning: Louvre Museum, Lunch: Le Comptoir du Relais..."

 result = evaluator(query=query, response=response)

 Example with list of messages:
-evaluator = TaskSuccessEvaluator(model_config)
+evaluator = TaskCompletionEvaluator(model_config)
 query = [{'role': 'system', 'content': 'You are a helpful travel planning assistant.'}, {'createdAt': 1700000060, 'role': 'user', 'content': [{'type': 'text', 'text': 'Plan a 3-day Paris itinerary with cultural landmarks and cuisine'}]}]
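The docstring notes that the query and response can each be a string or a list of messages. As a rough illustration of what handling both shapes can look like, here is a minimal sketch of a hypothetical helper, `flatten_to_text`. It is not part of the SDK and is not how the evaluator processes inputs internally. It assumes messages follow the shape shown in the docstring example: a `role` key plus a `content` that is either a string or a list of `{'type': 'text', 'text': ...}` items.

```python
from typing import Any, Dict, List, Union

Message = Dict[str, Any]


def flatten_to_text(value: Union[str, List[Message]]) -> str:
    """Hypothetical helper (not an SDK function): collapse either input shape to plain text."""
    if isinstance(value, str):
        # String inputs are passed through unchanged.
        return value

    lines: List[str] = []
    for message in value:
        role = message.get("role", "unknown")
        content = message.get("content", "")
        if isinstance(content, str):
            text = content
        else:
            # Assumed shape: a list of typed items; keep only the text items.
            text = " ".join(
                item.get("text", "") for item in content if item.get("type") == "text"
            )
        lines.append(f"{role}: {text}")
    return "\n".join(lines)
```

Calling `flatten_to_text(query)` on the list-of-messages query above would produce one `role: text` line per message, which can be handy when logging the inputs alongside evaluator results.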
sdk/evaluation/azure-ai-evaluation/azure/ai/evaluation/_evaluators/_task_completion/task_completion.prompty
2 additions & 2 deletions
@@ -1,5 +1,5 @@
 ---
-name: Task Success
+name: Task Completion
 description: Evaluates whether a task was successfully completed
 model:
   api: chat
@@ -27,7 +27,7 @@ You are an expert evaluator who determines if an agent has successfully complete
 user:
 ROLE
 ====
-You are a judge on Task Success who assesses the final outcome of a user-agent interaction. Your single focus is: **Was the user's task successfully and completely accomplished?**
+You are a judge on Task Completion who assesses the final outcome of a user-agent interaction. Your single focus is: **Was the user's task successfully and completely accomplished?**