
Conversation

@habuma (Member) commented Oct 26, 2024

This PR addresses what I believe is a shortcoming in the existing prompt used in FactCheckingEvaluator.

Specifically: no matter what I provided to FactCheckingEvaluator, isPass() always returned false. A little debugging showed that the evaluation response was a long-winded explanation of how the response content did or didn't align with the given context, while the pass/fail decision expected the evaluation response to be exactly "yes" or "no" (case-insensitive).

This change refines the prompt used by FactCheckingEvaluator to request a bare yes/no answer, which seems to have produced correct isPass() values in my testing (at least when evaluated against the default OpenAI GPT-4o model).
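For illustration, here is a minimal sketch of the pass/fail check described above (the class and helper names are made up for this example, not the actual FactCheckingEvaluator source):

```java
// Sketch of the behavior described above, not the actual Spring AI source.
// The evaluation passes only when the model's entire reply is "yes",
// case-insensitively, which is why a verbose explanation always fails.
public class YesNoCheck {

    static boolean isPass(String evaluationText) {
        return "yes".equalsIgnoreCase(evaluationText.trim());
    }

    public static void main(String[] args) {
        System.out.println(isPass("Yes"));  // true
        System.out.println(isPass("no"));   // false
        // A long-winded explanation never matches, so it fails:
        System.out.println(isPass("The response aligns with the context because...")); // false
    }
}
```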

@markpollack (Member) commented:
I'll run this against the bespoke-minicheck model on Ollama, which I believe responds with a plain yes/no, as that is the model's purpose. There was a GitHub repo that generated the LLM-AggreFact Leaderboard, and I vaguely remember some massaging of the prompt depending on which LLM was used. I suspect this change may break usage with bespoke-minicheck but work with OpenAI and other models. Not yet sure of the best way to handle it: whether there can be a portable prompt, or whether we need to pass in the LLM used in order to pick the correct prompt. Will report back.

@markpollack (Member) commented:
I've updated the class so that there are two prompt styles: one for general LLMs and one for bespoke-minicheck. I made the general-LLM prompt the default, since that is likely the more common (though perhaps less accurate) usage.
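Roughly, the shape of the change (the enum, constants, and prompt texts below are illustrative assumptions, not the merged Spring AI code; bespoke-minicheck's Document/Claim input format is assumed):

```java
// Illustrative sketch only, not the merged code. The idea: a default prompt
// that asks general-purpose LLMs for a bare yes/no verdict, plus a prompt
// matching bespoke-minicheck's assumed Document/Claim input format,
// selectable up front.
public class FactCheckingPromptSketch {

    enum PromptStyle { GENERAL_LLM, BESPOKE_MINICHECK }

    // Asks a general-purpose model for a single-token verdict, so the
    // simple "yes"/"no" check shown earlier can succeed.
    static final String GENERAL_LLM_PROMPT = """
            Evaluate whether the following response is supported by the
            given context. Answer with only "yes" or "no".
            Context: %s
            Response: %s
            """;

    // bespoke-minicheck already answers yes/no by design and expects a
    // Document/Claim layout (format assumed here).
    static final String BESPOKE_MINICHECK_PROMPT = """
            Document: %s
            Claim: %s
            """;

    static String promptFor(PromptStyle style, String context, String response) {
        return switch (style) {
            case GENERAL_LLM -> GENERAL_LLM_PROMPT.formatted(context, response);
            case BESPOKE_MINICHECK -> BESPOKE_MINICHECK_PROMPT.formatted(context, response);
        };
    }

    public static void main(String[] args) {
        System.out.println(promptFor(PromptStyle.GENERAL_LLM,
                "The sky is blue.", "The sky is blue."));
    }
}
```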

merged in f92a3f0
