Summary
Add a built-in LLM-based check that evaluates whether the model's answer is faithful to the provided source material. Complementary to Groundedness — while Groundedness checks if claims are supported by context, Faithfulness focuses on whether the answer accurately represents the source without distortion.
Motivation
Faithfulness is a standard RAG metric shipped by Ragas (Faithfulness), DeepEval, and Opik. It's distinct from Groundedness in that it focuses on accurate representation rather than just grounding.
Implementation Guide
Steps
- Create template: `src/giskard/checks/prompts/judges/faithfulness.j2`
  - Given an answer and source material, evaluate faithfulness
  - Check for: misrepresentation, selective quoting, distortion, unsupported claims
- Create check: `src/giskard/checks/judges/faithfulness.py`
  - Subclass `BaseLLMCheck`, register as `"faithfulness"`
  - Support:
    - `answer_key: JSONPathStr` — JSONPath for the answer (default: `trace.last.outputs`)
    - `source: str | list[str] | None = None` — source material provided inline
    - `source_key: JSONPathStr | None = None` — JSONPath for the source material
- Add tests
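The steps above can be sketched as a self-contained parameter-handling skeleton. Note that `BaseLLMCheck` and `JSONPathStr` are only named in this issue, so the real base-class API is an assumption; here the base class is omitted and `resolve_path` is a toy dot-path walker standing in for real JSONPath resolution:

```python
from __future__ import annotations

from dataclasses import dataclass


def resolve_path(data: dict, path: str):
    """Toy JSONPath-style lookup: walks dot-separated keys, e.g. 'trace.last.outputs'."""
    node = data
    for part in path.split("."):
        node = node[part]
    return node


@dataclass
class Faithfulness:
    """Sketch of the proposed check's parameters (judge/LLM plumbing omitted)."""

    answer_key: str = "trace.last.outputs"
    source: str | list[str] | None = None
    source_key: str | None = None

    def build_judge_inputs(self, trace: dict) -> dict:
        """Gather the answer and source material that the judge prompt would receive."""
        answer = resolve_path(trace, self.answer_key)
        if self.source is not None:
            source = self.source  # inline source material takes precedence
        elif self.source_key is not None:
            source = resolve_path(trace, self.source_key)
        else:
            raise ValueError("Provide either `source` or `source_key`.")
        return {"answer": answer, "source": source}
```

In a real implementation the returned dict would be rendered into the `faithfulness.j2` template; the precedence between `source` and `source_key` shown here is a design assumption, not part of the issue.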
Distinction from Groundedness
- Groundedness: "Is every claim in the answer supported by the context?" (binary per-claim)
- Faithfulness: "Does the answer accurately represent the source material?" (holistic assessment including distortion, misrepresentation)
For example, an answer that quotes the source selectively to invert its overall conclusion could pass per-claim Groundedness while still failing Faithfulness.
Example usage
```python
from giskard.checks import Faithfulness, Scenario

scenario = (
    Scenario(name="faithful_answer")
    .interact(
        inputs="Summarize this document",
        outputs="The document states that...",
        metadata={"source": "Original document text..."},
    )
    .check(Faithfulness(source_key="trace.last.metadata.source"))
)
```
Related issues
Acceptance Criteria