Hi DeepEval team,
DeepEval already does a great job providing metrics and tests for LLM / RAG systems. A missing piece for many teams is a structured way to tag why a sample failed.
I maintain the WFGY RAG 16 Problem Map, an open-source failure taxonomy for RAG / LLM pipelines, together with a Global Debug Card and triage prompt.
Repo (MIT):
https://github.com/onestardao/WFGY
Main reference page:
https://github.com/onestardao/WFGY/tree/main/ProblemMap/README.md
This map is already integrated or cited in projects such as RAGFlow, LlamaIndex, ToolUniverse (Harvard MIMS Lab), Rankify (Univ. of Innsbruck), Multimodal RAG Survey (QCRI LLM Lab) and curated lists like Awesome LLM Apps.
Proposal:
Add WFGY’s 16-problem map as an optional built-in tag set inside DeepEval, for example:
- Provide a small helper that:
  - Given a failing test case (question, context, answer), calls an LLM with the WFGY Global Debug Card.
  - Returns one of the 16 failure labels as a tag on the sample.
- Add a short “RAG failure modes” doc that explains the taxonomy and demonstrates:
  - How to enable these tags in a test suite.
  - How to aggregate results by failure type (e.g. more “retrieval blind spots” vs “prompt leakage”).
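To make the helper concrete, here is a minimal sketch of what I have in mind. All names (`tag_failure`, `aggregate_tags`, the label strings, the prompt template) are placeholders of my own, not the actual WFGY label set or DeepEval's API; the model is injected as a plain callable so the helper stays model-agnostic:

```python
from collections import Counter
from typing import Callable

# Hypothetical subset of the 16 WFGY failure labels (placeholders only;
# the real list would come from the Problem Map).
WFGY_LABELS = [
    "retrieval_blind_spot",
    "prompt_leakage",
    "semantic_drift",
    # ... remaining labels from the Problem Map would go here
]

# Placeholder stand-in for the WFGY Global Debug Card prompt.
DEBUG_CARD_TEMPLATE = (
    "You are a RAG failure triage assistant.\n"
    "Question: {question}\n"
    "Context: {context}\n"
    "Answer: {answer}\n"
    "Classify the failure as exactly one of: {labels}.\n"
    "Reply with the label only."
)


def tag_failure(
    question: str,
    context: str,
    answer: str,
    llm: Callable[[str], str],
) -> str:
    """Ask an LLM to classify a failing sample with one WFGY label.

    `llm` is any callable mapping a prompt string to a completion.
    Off-list replies fall back to "unclassified" rather than guessing.
    """
    prompt = DEBUG_CARD_TEMPLATE.format(
        question=question,
        context=context,
        answer=answer,
        labels=", ".join(WFGY_LABELS),
    )
    raw = llm(prompt).strip().lower()
    return raw if raw in WFGY_LABELS else "unclassified"


def aggregate_tags(tags: list[str]) -> Counter:
    """Count failures per label, for the aggregation view in the doc."""
    return Counter(tags)
```

With a real model plugged in, aggregation over a failing test suite is then one `Counter` away, e.g. `aggregate_tags([tag_failure(...) for case in failures])`. Happy to adapt the shape to whatever pattern DeepEval already uses for LLM-as-judge metrics.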
This would nicely complement DeepEval’s metric-based view with a semantic failure map that is already being used by other RAG frameworks and labs.
If this sounds useful, I can draft the helper code and example usage, following your existing API style.