
answer_correctness fails with OutputParserException: expected "text" but got "statements" #2162

@harshil-sanghvi

Description

  • I have checked the documentation and related resources and couldn't resolve my bug.

Describe the bug
When running evaluate() with the answer_correctness metric, the evaluation fails with an OutputParserException caused by a pydantic.ValidationError.
The error occurs because the statement_generator_prompt inside answer_correctness returns JSON with a statements field (e.g., {"statements": [...]}), while the output parser uses a model (StringIO) that expects a text field.
This mismatch causes the parser to raise a "Field required" error for text.
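
The schema mismatch can be reproduced in isolation with a plain pydantic v2 model (a minimal sketch; StringIO here is just a stand-in with the same required text field as the parser's model, not the actual ragas class):

from pydantic import BaseModel, ValidationError

# stand-in for the parser's output model, which requires a "text" field
class StringIO(BaseModel):
    text: str

# the statement generator instead returns JSON shaped like this
completion = {"statements": ["..."]}

try:
    StringIO.model_validate(completion)
except ValidationError as err:
    print(err)  # 1 validation error for StringIO: text -> Field required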

Ragas version: v0.2.15
Python version: 3.9.x

Code to Reproduce (minimal sketch; the full reproduction code contains proprietary information and cannot be shared)

# prepare a minimal dataset (legacy column names as used here; newer
# ragas versions may expect user_input/response/reference instead)
from datasets import Dataset

dataset = Dataset.from_dict({
    "question": ["<question>"],
    "answer": ["<model answer>"],
    "ground_truth": ["<reference answer>"],
})

# initialize metric(s)
from ragas import evaluate
from ragas.metrics import answer_correctness

metrics = [answer_correctness]

# (depending on environment) prepare LLM and embeddings;
# any langchain-compatible wrappers reproduce the issue
llm = ...         # e.g., a local stub or provider-backed chat model
embeddings = ...  # any embeddings wrapper compatible with ragas

# run evaluation
score = evaluate(
    dataset,
    metrics=metrics,
    llm=llm,
    embeddings=embeddings,
)

# observe failure
# OutputParserException bubbling up from:
#   statement_generator_prompt → PydanticOutputParser(StringIO[text]) → ValidationError (missing "text")
print(score.to_pandas())

Error trace

OutputParserException: Failed to parse StringIO from completion {"statements": ["..."]}.
Got: 1 validation error for StringIO
text
Field required [type=missing, input_value={'statements': [...]}, input_type=dict]
For further information visit https://errors.pydantic.dev/2.11/v/missing

Expected behavior
The statement_generator_prompt output should be successfully parsed without raising a ValidationError. The parser schema and the prompt output format should be aligned.

Additional context
The statement_generator_prompt examples in answer_correctness.get_prompts() show an expected model with a statements: List[str] field, but the parser is still configured for a StringIO model with a single text field. This mismatch causes the evaluation to fail when the LLM outputs match the examples but not the parser schema.
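
For reference, a model matching the prompt examples would look roughly like the following (the class name StatementGeneratorOutput is my guess at what the parser should be configured with, not necessarily the actual ragas class):

from typing import List
from pydantic import BaseModel

# schema implied by the statement_generator_prompt examples:
# a single "statements" field holding a list of strings
class StatementGeneratorOutput(BaseModel):
    statements: List[str]

# this payload parses cleanly against the statements-based schema
StatementGeneratorOutput.model_validate({"statements": ["..."]})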
