Hi, and thanks for putting this list together. It is a very useful overview of the AI evaluation ecosystem.
I would like to ask whether a project called WFGY could fit somewhere in this repo, probably under the robustness / stress-testing side of evaluation.
Short description:
- WFGY 3.0 · Singularity Demo is a pure TXT pack used as a long-horizon "tension crash test" for LLMs. Models read one S-class problem after another, and we check when their reasoning quietly drifts or collapses.
- WFGY ProblemMap is a diagnostic map of common AI system failure modes (for example, RAG issues, vector-store mistakes, and deployment-order problems). It is used to label what went wrong when a model starts behaving strangely during those stress tests.
Everything is open source on GitHub (MIT license, 1k+ stars) and fully transparent: there is no hidden code and no external calls.
If this looks in scope for awesome-ai-eval, I am happy to prepare a small PR that adds a single line in the appropriate section. If it is out of scope, that is also totally fine; I mainly wanted to ask before sending a PR.
Thanks again for curating this list.