
Conversation

@habuma (Member) commented Oct 26, 2024

This PR addresses what I believe is a shortcoming in the existing prompt used in FactCheckingEvaluator.

Specifically: no matter what I provided to FactCheckingEvaluator, isPass() always returned false. A little debugging showed that the evaluation response was a long-winded explanation of how the response content did or didn't align with the given context, while the pass/fail decision expected the evaluation response to be exactly "yes" or "no" (case-insensitive).

This change refines the prompt used by FactCheckingEvaluator to request a bare yes/no answer, which seems to have produced correct isPass() values in my testing (at least when evaluated against the default OpenAI GPT-4o model).
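For illustration, here is a minimal sketch of the pass/fail check described above (the class and helper names are made up for this example, not the actual FactCheckingEvaluator source):

```java
// Sketch of the behavior described above, not the actual Spring AI source.
// The evaluation passes only when the model's entire reply is "yes",
// case-insensitively, which is why a verbose explanation always fails.
public class YesNoCheck {

    static boolean isPass(String evaluationText) {
        return "yes".equalsIgnoreCase(evaluationText.trim());
    }

    public static void main(String[] args) {
        System.out.println(isPass("Yes"));  // true
        System.out.println(isPass("no"));   // false
        // A long-winded explanation never matches, so it fails:
        System.out.println(isPass("The response aligns with the context because...")); // false
    }
}
```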

@markpollack (Member) commented:
I'll run this against the bespoke-minicheck model on Ollama, which I believe responds with a plain yes/no, as that is the model's purpose. There was a GitHub repo that generated the LLM-AggreFact Leaderboard, and I vaguely remember some massaging of the prompt depending on which LLM was used. I suspect this change may break usage with bespoke-minicheck but work with OpenAI and other models. Not yet sure of the best way to handle it: whether there can be a portable prompt, or whether we need to pass in the LLM used in order to pick the correct prompt. Will report back.

@markpollack (Member) commented:
I've updated the class so that there are two prompt styles: one for general LLMs and one for bespoke-minicheck. I made the general-LLM prompt the default, since that is likely the more common (though perhaps less accurate) usage.
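Roughly, the shape of the change (the enum, constants, and prompt texts below are illustrative assumptions, not the merged Spring AI code; bespoke-minicheck's Document/Claim input format is assumed):

```java
// Illustrative sketch only, not the merged code. The idea: a default prompt
// that asks general-purpose LLMs for a bare yes/no verdict, plus a prompt
// matching bespoke-minicheck's assumed Document/Claim input format,
// selectable up front.
public class FactCheckingPromptSketch {

    enum PromptStyle { GENERAL_LLM, BESPOKE_MINICHECK }

    // Asks a general-purpose model for a single-token verdict, so the
    // simple "yes"/"no" check shown earlier can succeed.
    static final String GENERAL_LLM_PROMPT = """
            Evaluate whether the following response is supported by the
            given context. Answer with only "yes" or "no".
            Context: %s
            Response: %s
            """;

    // bespoke-minicheck already answers yes/no by design and expects a
    // Document/Claim layout (format assumed here).
    static final String BESPOKE_MINICHECK_PROMPT = """
            Document: %s
            Claim: %s
            """;

    static String promptFor(PromptStyle style, String context, String response) {
        return switch (style) {
            case GENERAL_LLM -> GENERAL_LLM_PROMPT.formatted(context, response);
            case BESPOKE_MINICHECK -> BESPOKE_MINICHECK_PROMPT.formatted(context, response);
        };
    }

    public static void main(String[] args) {
        System.out.println(promptFor(PromptStyle.GENERAL_LLM,
                "The sky is blue.", "The sky is blue."));
    }
}
```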

merged in f92a3f0
