-
Notifications
You must be signed in to change notification settings - Fork 1.3k
Open
Labels
Description
Describe the bug
Conversation Completeness does not return fractional completion.
To Reproduce
**************************************************
Conversation Completeness Verbose Logs
**************************************************
Turns:
[
{
"role": "user",
"content": "How many views did I get on Monday?",
"user_id": null,
"retrieval_context": null,
"tools_called": null,
"mcp_tools_called": null,
"mcp_resources_called": null,
"mcp_prompts_called": null,
"additional_metadata": null
},
{
"role": "assistant",
"content": "You received 10 views.",
"user_id": null,
"retrieval_context": null,
"tools_called": null,
"mcp_tools_called": null,
"mcp_resources_called": null,
"mcp_prompts_called": null,
"additional_metadata": null
},
{
"role": "user",
"content": "How many views did I get on Tuesday?",
"user_id": null,
"retrieval_context": null,
"tools_called": null,
"mcp_tools_called": null,
"mcp_resources_called": null,
"mcp_prompts_called": null,
"additional_metadata": null
},
{
"role": "assistant",
"content": "I cannot help you with that, sorry.",
"user_id": null,
"retrieval_context": null,
"tools_called": null,
"mcp_tools_called": null,
"mcp_resources_called": null,
"mcp_prompts_called": null,
"additional_metadata": null
}
]
User Intentions:
[
"User wants to know the number of views they received on specific days"
]
Verdicts:
[
{
"verdict": "no",
"reason": "The user asked for the number of views on both Monday and Tuesday. The assistant only answered for Monday ('You
received 10 views.') but refused to answer for Tuesday ('I cannot help you with that, sorry.'). Therefore, the user's intention to
know the number of views for both days was not fully satisfied."
}
]
Score: 0.0
Reason: The score is 0.0 because the LLM only provided the number of views for Monday and did not address the user's request for
Tuesday, leaving the user's intention to know the views for both days completely unmet.
======================================================================
Expected behavior
I expected a score of 0.5
Desktop (please complete the following information):
- OS: [e.g. iOS] MacOS
- Browser [e.g. chrome, safari] Chrome
- Version [e.g. 22] 144.0.7559.110
Additional context
This is with strict_mode=False.
Reactions are currently unavailable