Add faithfulness check #2368

@linear

Summary

Add a built-in LLM-based check that evaluates whether the model's answer is faithful to the provided source material. It complements Groundedness: Groundedness checks whether each claim is supported by the context, while Faithfulness checks whether the answer represents the source accurately, without distortion.

Motivation

Faithfulness is a standard RAG metric shipped by Ragas (Faithfulness), DeepEval, and Opik. It's distinct from Groundedness in that it focuses on accurate representation rather than just grounding.

Implementation Guide

Steps

  1. Create template: src/giskard/checks/prompts/judges/faithfulness.j2
    • Given an answer and source material, evaluate faithfulness
    • Check for: misrepresentation, selective quoting, distortion, unsupported claims
  2. Create check: src/giskard/checks/judges/faithfulness.py
    • Subclass BaseLLMCheck, register as "faithfulness"
    • Support:
      • answer_key: JSONPathStr — JSONPath for answer (default: trace.last.outputs)
      • source: str | list[str] | None = None — source material
      • source_key: JSONPathStr | None = None — JSONPath for source
  3. Add tests
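
The parameter handling in step 2 can be sketched without touching the library. This is a minimal, standalone illustration and not the giskard implementation: `FaithfulnessParams`, `lookup`, and `PROMPT` are hypothetical names, the dotted-path lookup is a simplified stand-in for JSONPath, the prompt wording is a placeholder for the `faithfulness.j2` template, and letting an explicit `source` take precedence over `source_key` is an assumption the issue does not specify.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any

# Placeholder prompt; the real check would render faithfulness.j2 instead.
PROMPT = (
    "Evaluate whether the answer is faithful to the source material.\n"
    "Check for misrepresentation, selective quoting, distortion, "
    "and unsupported claims.\n\n"
    "Source material:\n{source}\n\n"
    "Answer:\n{answer}\n"
)


def lookup(data: dict[str, Any], path: str) -> Any:
    """Follow a dotted path such as 'trace.last.outputs' through nested dicts.

    Simplified stand-in for JSONPath resolution (hypothetical helper).
    """
    current: Any = data
    for part in path.split("."):
        current = current[part]
    return current


@dataclass
class FaithfulnessParams:
    """Hypothetical container mirroring the three proposed parameters."""

    answer_key: str = "trace.last.outputs"
    source: str | list[str] | None = None
    source_key: str | None = None

    def resolve_source(self, data: dict[str, Any]) -> str:
        # Assumption: an explicit `source` wins over `source_key`.
        if self.source is not None:
            raw = self.source
        elif self.source_key is not None:
            raw = lookup(data, self.source_key)
        else:
            raise ValueError("Provide either `source` or `source_key`.")
        # A list of passages is joined into one block of text.
        return "\n\n".join(raw) if isinstance(raw, list) else raw

    def render_prompt(self, data: dict[str, Any]) -> str:
        return PROMPT.format(
            source=self.resolve_source(data),
            answer=lookup(data, self.answer_key),
        )
```

In this sketch, failing fast when neither `source` nor `source_key` is set keeps a misconfigured check from silently judging an answer against an empty source. Whether the real check should raise, skip, or fail in that case is left open by the issue.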

Distinction from Groundedness

  • Groundedness: "Is every claim in the answer supported by the context?" (a binary, per-claim check)
  • Faithfulness: "Does the answer accurately represent the source material?" (a holistic assessment that also covers distortion and misrepresentation)

Example usage

from giskard.checks import Faithfulness, Scenario

scenario = (
    Scenario(name="faithful_answer")
    .interact(
        inputs="Summarize this document",
        outputs="The document states that...",
        metadata={"source": "Original document text..."}
    )
    .check(Faithfulness(source_key="trace.last.metadata.source"))
)

Acceptance Criteria

  • Evaluates faithfulness of answer to source material
  • Detects misrepresentation and distortion
  • Distinct behavior from Groundedness check
  • Tests cover: faithful answer passes, distorted answer fails, partial faithfulness
