Proposal: ship WFGY 16-problem RAG failure taxonomy as a built-in tag set #2517

@onestardao

Description

Hi DeepEval team,

DeepEval already does a great job providing metrics and tests for LLM / RAG systems. A missing piece for many teams is a structured way to tag why a sample failed.

I maintain the WFGY RAG 16 Problem Map, an open-source failure taxonomy for RAG / LLM pipelines, together with a Global Debug Card and a triage prompt.

Repo (MIT):
https://github.com/onestardao/WFGY
Main reference page:
https://github.com/onestardao/WFGY/tree/main/ProblemMap/README.md

This map is already integrated or cited in projects such as RAGFlow, LlamaIndex, ToolUniverse (Harvard MIMS Lab), Rankify (Univ. of Innsbruck), Multimodal RAG Survey (QCRI LLM Lab) and curated lists like Awesome LLM Apps.

Proposal:

Add WFGY’s 16-problem map as an optional built-in tag set inside DeepEval, for example:

  1. Provide a small helper that:

    • Given a failing test case (question, context, answer), calls an LLM with the WFGY Global Debug Card.
    • Returns one of the 16 failure labels as a tag on the sample.
  2. Add a short “RAG failure modes” doc that explains the taxonomy and demonstrates:

    • How to enable these tags in a test suite.
    • How to aggregate results by failure type (e.g. whether failures skew toward “retrieval blind spots” or “prompt leakage”).
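To make the two steps above concrete, here is a minimal sketch of what the helper could look like. Everything here is hypothetical: the function names (`classify_failure`, `aggregate_by_failure`), the `WFGY_LABELS` subset, and the prompt text are placeholders for illustration, not an existing DeepEval or WFGY API; the real label set and debug-card wording live in the Problem Map repo.

```python
# Hypothetical sketch only. Names and labels below are placeholders,
# not an existing DeepEval or WFGY API.
from collections import Counter

# Placeholder subset of the 16 WFGY labels; the full set is in the Problem Map.
WFGY_LABELS = ["retrieval_blind_spot", "prompt_leakage", "hallucination"]

# Stand-in for the WFGY Global Debug Card prompt.
DEBUG_CARD_PROMPT = (
    "You are given a failing RAG test case.\n"
    "Question: {question}\nContext: {context}\nAnswer: {answer}\n"
    "Pick exactly one failure label from: {labels}\n"
    "Reply with the label only."
)

def classify_failure(question, context, answer, llm_call):
    """Ask an injected LLM callable for one WFGY failure label."""
    prompt = DEBUG_CARD_PROMPT.format(
        question=question,
        context=context,
        answer=answer,
        labels=", ".join(WFGY_LABELS),
    )
    label = llm_call(prompt).strip()
    # Guard against the model returning something off-list.
    return label if label in WFGY_LABELS else "unclassified"

def aggregate_by_failure(tagged_samples):
    """Count failing samples per WFGY tag (step 2: aggregation by type)."""
    return Counter(sample["wfgy_tag"] for sample in tagged_samples)
```

The LLM client is injected as a plain callable so the helper stays independent of any particular model wrapper; the actual integration would follow DeepEval's existing API style as noted below.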

This would nicely complement DeepEval’s metric-based view with a semantic failure map that is already being used by other RAG frameworks and labs.

If this sounds useful, I can draft the helper code and example usage, following your existing API style.
