Suggestion: add WFGY (TXT based long horizon stress test + failure map) #4

@onestardao

Description

Hi, and thanks for putting this list together. It is a very useful overview of the AI evaluation ecosystem.

I would like to ask whether a project called WFGY could fit somewhere in this repo, probably under the robustness / stress-testing side of evaluation.

Short description:

  • WFGY 3.0 · Singularity Demo is a pure-TXT pack used as a long-horizon “tension crash test” for LLMs: a model reads one S-class problem after another, and we check when its reasoning quietly drifts or collapses.
  • WFGY ProblemMap is a diagnostic map of common AI-system failure modes (for example, RAG issues, vector-store mistakes, deployment-order problems). It is used to label what went wrong when a model starts behaving strangely during those stress tests.
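To make the "crash test" idea concrete, here is a minimal sketch of such a harness. Everything in it (the `first_collapse` function, the grader callback, the toy model) is a hypothetical illustration of the general technique, not WFGY's actual code:

```python
# Hypothetical sketch of a long-horizon stress-test loop: feed a model a
# sequence of hard problems and report the first index where the graded
# answer quality drops below a threshold (i.e. where reasoning "collapses").
from typing import Callable, List, Optional

def first_collapse(
    problems: List[str],
    model: Callable[[str], str],
    score: Callable[[str, str], float],  # grader returning 0.0..1.0 (assumed)
    threshold: float = 0.5,
) -> Optional[int]:
    """Return the index of the first problem whose scored answer falls
    below `threshold`, or None if the model never collapses."""
    for i, problem in enumerate(problems):
        answer = model(problem)
        if score(problem, answer) < threshold:
            return i
    return None

if __name__ == "__main__":
    # Toy demo: a "model" that degrades after the second problem.
    answers = {"p1": "good", "p2": "good", "p3": "bad", "p4": "bad"}
    idx = first_collapse(
        ["p1", "p2", "p3", "p4"],
        model=lambda p: answers[p],
        score=lambda p, a: 1.0 if a == "good" else 0.0,
    )
    print(idx)  # → 2
```

In the real setting, `score` would be whatever grading rubric the test pack defines; the loop structure is the point: failures are located by position in a long sequence, not judged on a single prompt.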

Everything is open source on GitHub (MIT license, 1k+ stars) and fully transparent: there is no hidden code and no external calls.

If this looks in scope for awesome-ai-eval, I am happy to prepare a small PR adding a single line to the appropriate section. If it is out of scope, that is also totally fine; I mainly wanted to ask before sending a PR.

Thanks again for curating this list.
