More robust JSON prompting and parsing #761

@mrtj

Describe the Feature

This feature request is about integrating LangChain's PydanticOutputParser and pydantic models into ragas prompting.

Why is the feature important for you?

The current prompts for the validation metrics define the answer format only loosely, through examples of the expected output. The LLM must then deduce the expected format and generate an answer that conforms to a schema that is specified only implicitly in each metric's implementation. Especially with LLMs other than the default GPT-3.5, this can lead to parsing errors, and the metrics cannot be calculated correctly.

For example:

  • Claude and other models tend to return the binary verdict as a JSON number rather than the expected string: #752 (that issue is about testset generation, but I saw very similar problems during context precision/recall calculation), and also #715
  • Sometimes the "verdict" envelope is omitted from the response: #733
  • Sometimes the response is embedded in a superfluous envelope: #668
  • It seems different models have issues with the "Attributed" keys: #619
  • ...and numerous other bugs might be related to the weak JSON parsing.
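
To make the first three failure modes above concrete, here is a small, dependency-free sketch. The response payloads are illustrative assumptions rather than copies of the outputs in the linked issues, and `normalize_verdict` is a hypothetical helper, not part of ragas.

```python
import json


def normalize_verdict(raw: str) -> str:
    """Return the verdict as the string "0" or "1", tolerating common deviations."""
    data = json.loads(raw)
    # Unwrap superfluous single-key envelopes, e.g. {"result": {"verdict": 1}}.
    while isinstance(data, dict) and "verdict" not in data and len(data) == 1:
        data = next(iter(data.values()))
    if isinstance(data, dict):
        if "verdict" not in data:
            raise ValueError("no 'verdict' key found in response")
        data = data["verdict"]
    # At this point `data` is the verdict itself, either because it was unwrapped
    # or because the model omitted the "verdict" envelope and returned a bare value.
    text = str(data).strip()
    if text not in {"0", "1"}:
        raise ValueError(f"unexpected verdict value: {data!r}")
    return text


# Illustrative responses: expected string, JSON number, bare value, extra envelope.
for reply in ['{"verdict": "1"}', '{"verdict": 0}', "1", '{"result": {"verdict": 1}}']:
    print(reply, "->", normalize_verdict(reply))
```

Ad hoc normalization like this has to anticipate every deviation by hand, which is the motivation for validating against an explicit schema instead.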

Additional context

LangChain has a robust implementation of instructing the models to return a JSON response conforming to a specific schema. The output format can be specified with pydantic data classes, the expected JSON schema is injected into the prompt, and the response is automatically parsed by pydantic. Additionally, a retry mechanism can be included for even more robust JSON parsing.
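
The mechanism described above (inject the schema into the prompt, validate the reply, retry on failure) can be sketched without any dependencies. This is not LangChain's implementation: in LangChain the schema would be derived from a pydantic model and PydanticOutputParser would handle the parsing. `VERDICT_SCHEMA`, `format_instructions`, `validate`, and `ask_with_retry` are hypothetical names, and `llm` stands for any callable that maps a prompt string to a reply string.

```python
import json
from typing import Callable

# Hypothetical schema for one verdict; a pydantic model could generate an equivalent.
VERDICT_SCHEMA = {
    "type": "object",
    "properties": {
        "verdict": {"type": "string", "enum": ["0", "1"]},
        "reason": {"type": "string"},
    },
    "required": ["verdict", "reason"],
}


def format_instructions(schema: dict) -> str:
    """Build the text that states the expected format explicitly in the prompt."""
    return (
        "Return only a JSON object that conforms to this JSON schema:\n"
        + json.dumps(schema, indent=2)
    )


def validate(raw: str, schema: dict) -> dict:
    """Minimal check of required string fields and the verdict enum.

    This is deliberately not a full JSON Schema validator.
    """
    data = json.loads(raw)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    for key in schema["required"]:
        if not isinstance(data.get(key), str):
            raise ValueError(f"field {key!r} must be a string")
    if data["verdict"] not in schema["properties"]["verdict"]["enum"]:
        raise ValueError(f"invalid verdict: {data['verdict']!r}")
    return data


def ask_with_retry(llm: Callable[[str], str], question: str, max_attempts: int = 3) -> dict:
    """Query the model, feeding the validation error back on each failed attempt."""
    base_prompt = f"{question}\n\n{format_instructions(VERDICT_SCHEMA)}"
    prompt = base_prompt
    for _ in range(max_attempts):
        try:
            return validate(llm(prompt), VERDICT_SCHEMA)
        except ValueError as err:
            prompt = (
                f"{base_prompt}\n\nYour previous reply was invalid ({err}). "
                "Reply again with valid JSON only."
            )
    raise ValueError(f"no valid response after {max_attempts} attempts")
```

With a stub model that first returns the verdict as a number and then as a string, the first reply is rejected and the second is accepted, so the caller only ever sees a schema-conforming result.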

Since Ragas already uses LangChain, implementing this feature would not add any dependencies to the project.

Metadata

Labels

enhancement: New feature or request
stale: Issue has not had recent activity or appears to be solved. Stale issues will be automatically closed