We build WFGY, an open-source reasoning and debugging engine for AI systems.
One architecture, different depths. Not a random collection of tools.
After more than a year of focused development, it is now fully open-sourced under the MIT License.
WFGY is designed for people who need structured debugging and serious reasoning, not just another prompt recipe.
- **RAG and agent teams:** Your pipeline runs, infra looks healthy, but answers are still wrong or unstable. You want a reproducible failure map instead of trial and error.
- **Infra and platform owners:** You operate LLM, RAG, or agent platforms and need a way to audit reasoning behavior across models, tenants, or deployments.
- **Researchers and evaluation teams:** You study long-horizon reasoning, safety, or stress tests, and want a concrete set of problems and observables to benchmark against.
- **Founders, PMs, and domain experts:** You carry a small number of high-tension questions in finance, climate, AI, or society, and want to see how a structured reasoning engine treats those cases.
If you do not fit neatly into any of the above, you can still start with the Problem Map or the Global Debug Card and treat them as diagnostic checklists for debugging your own systems.
**WFGY RAG 16 Problem Map** 🧩
Flagship 16-problem RAG failure checklist and fix map for broken RAG / agent pipelines.
Use this when your infra looks healthy but answers are still wrong.
→ 16 Problem Map
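As a rough illustration of checklist-style triage, the sketch below routes observed symptoms to Problem Map entries instead of debugging by trial and error. The symptom keys, entry numbers, and titles here are hypothetical placeholders, not the actual 16 Problem Map entries.

```python
# Hypothetical sketch: route observed symptoms to ProblemMap entries.
# The entry numbers and titles are PLACEHOLDERS, not the real 16-mode map.

PROBLEM_MAP = {
    "wrong_chunk_cited": "No.X retrieval drift (placeholder)",
    "confident_but_unsupported": "No.Y unsupported answer (placeholder)",
    "answer_flips_between_runs": "No.Z reasoning instability (placeholder)",
}

def triage(symptoms):
    """Return the checklist entries matching the observed symptoms."""
    return [PROBLEM_MAP[s] for s in symptoms if s in PROBLEM_MAP]

hits = triage(["wrong_chunk_cited", "answer_flips_between_runs"])
print(hits)  # the matched entries to investigate first
```

The point of the pattern is reproducibility: the same symptom list always maps to the same candidate failure modes, so a second engineer can rerun the triage and land on the same starting point.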
**WFGY Global Debug Card** 🖼️
Image-as-protocol layer for the 16 Problem Map.
Upload one poster plus (Q, E, P, A) context to any strong LLM and triage the run.
→ Global Debug Card
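A minimal sketch of assembling the (Q, E, P, A) context to paste alongside the poster. The field labels Q/E/P/A come from the card description above; the helper function, its exact prompt wording, and the sample values are illustrative assumptions, not part of WFGY itself.

```python
# Hedged sketch: build a (Q, E, P, A) context block for Global Debug Card
# triage. Field names follow the card; the wording here is an assumption.

def build_triage_context(question, evidence, pipeline, answer):
    """Format a (Q, E, P, A) block to submit to an LLM with the poster."""
    return (
        f"Q (question the user asked): {question}\n"
        f"E (evidence / retrieved chunks): {evidence}\n"
        f"P (pipeline description): {pipeline}\n"
        f"A (answer the system produced): {answer}\n"
        "Using the attached WFGY Global Debug Card, identify the most "
        "likely failure mode and the matching fix."
    )

# Hypothetical sample run data for illustration only.
context = build_triage_context(
    question="What was the 2023 revenue?",
    evidence="chunk_17: '...2022 revenue was $4.1M...'",
    pipeline="pdf -> chunker(512) -> embeddings -> top-k=5 -> LLM",
    answer="The 2023 revenue was $4.1M.",
)
print(context)
```

Keeping all four fields explicit matters: the card triages the whole run, so a missing pipeline description or missing evidence usually degrades the diagnosis.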
**WFGY 3.0 – Frontier TXT Engine**
TXT-based tension reasoning engine built on a 131-problem S-class backbone.
Use this when you want a long-horizon stress test for serious questions.
→ Singularity Demo
Unlike traditional tools, WFGY is an ecosystem of fix-first reasoning components.
Every artifact here started from a real failure:
- a broken RAG pipeline that refused to stabilize,
- an agent stack that looked fine at the infra level but still collapsed in edge cases,
- long-horizon questions that generic benchmarks do not touch.
The goal is simple:
make reasoning failures visible, reproducible, and fixable.
If WFGY helps your workflow or thinking, a star on the repo helps others discover it.
As of 2026-03, the WFGY RAG 16 Problem Map line has been adopted or referenced by 20+ frameworks, academic labs, and curated lists in the RAG and agent ecosystem.
Some representative integrations:
| Project | Segment | How it uses WFGY ProblemMap | Proof (PR / doc) |
|---|---|---|---|
| RAGFlow | Mainstream RAG engine | Introduced a RAG failure-modes checklist guide to the RAGFlow documentation, adapted from the WFGY 16-problem failure map for step-by-step RAG pipeline diagnostics. | PR #13204 |
| LlamaIndex | Mainstream RAG infra | Integrates the WFGY 16-problem RAG failure checklist into its official RAG troubleshooting docs as a structured failure-mode reference. | PR #20760 |
| FlashRAG | Academic lab / RAG research toolkit | Adapts the WFGY ProblemMap as a structured RAG failure checklist in its documentation; the 16-mode taxonomy is cited to support reproducible debugging and systematic failure-mode reasoning for RAG experiments. | PR #224 |
| ToolUniverse (Harvard MIMS Lab) | Academic lab / tools | Provides a WFGY_triage_llm_rag_failure tool that wraps the 16-mode map for incident triage. | PR #75 |
| LightAgent | Agent framework | Adds a "Multi-agent troubleshooting (failure map)" section to its documentation, providing a structured symptom → failure-mode → debugging checklist for diagnosing role drift, cross-agent memory issues, and coordination failures in multi-agent systems. | PR #24 |
| Rankify (Univ. of Innsbruck) | Academic lab / system | Uses the 16 failure patterns in RAG and re-ranking troubleshooting docs. | PR #76 |
| Multimodal RAG Survey (QCRI LLM Lab) | Academic lab / survey | Cites WFGY as a practical diagnostic resource for multimodal RAG. | PR #4 |
Most external references today point to the WFGY ProblemMap / 16-problem failure checklist.
A smaller but growing set also uses WFGY 3.0 · Singularity Demo as a long-horizon, TXT-based stress test.
This does not mean every project is using the full WFGY ecosystem. In most cases, WFGY appears as a ProblemMap-style diagnostic layer for RAG and agent pipelines.
For the full, up-to-date 20+ project list (frameworks, benchmarks, and curated lists), see:
→ WFGY Recognition Map
If you maintain an AI system, research project, or infra stack and want to explore deeper collaboration around WFGY, you can:
- open an issue in the main repo describing your use case and current failure modes,
- reference the WFGY ProblemMap number that matches your problem if you already know it,
- or reach out via Discord for more exploratory discussions.
We are especially interested in:
- RAG or agent teams who want to run WFGY debugging in real production-like settings,
- research groups who want to design new stress tests or observables on top of the 131-problem atlas,
- platform owners who would like to expose WFGY-style diagnostics as part of their user-facing tools.
The long-term goal is simple: make it normal for AI systems to ship with a reasoning and debugging layer that users can actually see and test.




