Describe the Feature
This feature request proposes integrating LangChain's PydanticOutputParser and pydantic models into ragas prompting.
Why is the feature important for you?
The current prompting of the evaluation metrics uses a somewhat vague definition of the answer format, giving only examples of the expected output. The LLM is then expected to deduce the format and generate an answer matching a schema that is only implicitly specified in each metric's implementation. Especially when using LLMs other than the default GPT-3.5, this leads to parsing errors, and the metrics cannot be calculated correctly.
For example:
- Claude and other models tend to return the binary verdict as a JSON number instead of the expected string: #752 (that issue is about testset generation, but I saw very similar failures during context precision/recall calculation) and #715; see the sketch after this list.
- Sometimes the "verdict" envelope is omitted from the response: #733
- Sometimes the response is embedded in a superfluous envelope: #668
- It seems different models have issues with the "Attributed" keys: #619
- ...and numerous other bugs might be related to the weak JSON parsing.
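As a minimal sketch of how a pydantic schema could absorb the number-vs-string verdict mismatch from the first bullet (the class and field names here are hypothetical illustrations, not the actual ragas schema):

```python
from pydantic import BaseModel, Field, field_validator


class StatementVerdict(BaseModel):
    """Hypothetical verdict schema, not the actual ragas prompt output."""
    statement: str = Field(description="the statement being judged")
    verdict: str = Field(description='"1" if attributable, "0" otherwise')

    @field_validator("verdict", mode="before")
    @classmethod
    def coerce_verdict(cls, value):
        # Accept the JSON-number form that Claude and other models return
        return str(value)


# Both response shapes now parse into the same object instead of failing:
StatementVerdict.model_validate({"statement": "...", "verdict": "1"})
StatementVerdict.model_validate({"statement": "...", "verdict": 1})
```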
Additional context
LangChain has a robust implementation of instructing the models to return a JSON response conforming to a specific schema. The output format can be specified with pydantic data classes, the expected JSON schema is injected into the prompt, and the response is automatically parsed by pydantic. Additionally, a retry mechanism can be included for even more robust JSON parsing.
Considering that Ragas already uses LangChain, implementing this feature would not add any new dependencies to the project.
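A rough sketch of what this could look like for a single verdict follows; the schema and prompt wording are illustrative assumptions rather than the actual ragas prompts, and depending on the installed LangChain version the model class may need to come from langchain_core.pydantic_v1 instead of pydantic:

```python
from langchain_core.output_parsers import PydanticOutputParser
from langchain_core.prompts import PromptTemplate
from pydantic import BaseModel, Field


class ContextPrecisionVerdict(BaseModel):
    """Hypothetical output schema for a context precision judgement."""
    reason: str = Field(description="reason for the verdict")
    verdict: str = Field(description='"1" if the context was useful, "0" otherwise')


parser = PydanticOutputParser(pydantic_object=ContextPrecisionVerdict)

# The JSON schema is rendered into the prompt via format_instructions,
# so the model no longer has to infer the format from examples alone.
prompt = PromptTemplate(
    template=(
        "Verify if the context was useful in arriving at the given answer.\n"
        "{format_instructions}\n"
        "question: {question}\nanswer: {answer}\ncontext: {context}\n"
    ),
    input_variables=["question", "answer", "context"],
    partial_variables={"format_instructions": parser.get_format_instructions()},
)

# chain = prompt | llm | parser  # the parser validates the response against the schema
# For extra robustness, LangChain's RetryOutputParser can wrap the parser
# and re-ask the LLM when validation fails.
```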