Suggestion: add WFGY (TXT based long horizon stress test + failure map) #4

@onestardao

Description

Hi, and thanks for putting this list together. It is a very useful overview of the AI evaluation ecosystem.

I would like to ask whether a project called WFGY could fit somewhere in this repo, probably under the robustness / stress-testing side of evaluation.

Short description:

  • WFGY 3.0 · Singularity Demo is a pure-TXT pack used as a long-horizon “tension crash test” for LLMs: a model reads one S-class problem after another, and we check when its reasoning quietly drifts or collapses.
  • WFGY ProblemMap is a diagnostic map of common AI-system failure modes (for example, RAG issues, vector-store mistakes, deployment-order problems). It is used to label what went wrong when a model starts behaving strangely during those stress tests.
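To make the "crash test" idea concrete, here is a minimal sketch of such a harness. Everything in it (the `first_collapse` function, the grader callback, the toy model) is a hypothetical illustration of the general technique, not WFGY's actual code:

```python
# Hypothetical sketch of a long-horizon stress-test loop: feed a model a
# sequence of hard problems and report the first index where the graded
# answer quality drops below a threshold (i.e. where reasoning "collapses").
from typing import Callable, List, Optional

def first_collapse(
    problems: List[str],
    model: Callable[[str], str],
    score: Callable[[str, str], float],  # grader returning 0.0..1.0 (assumed)
    threshold: float = 0.5,
) -> Optional[int]:
    """Return the index of the first problem whose scored answer falls
    below `threshold`, or None if the model never collapses."""
    for i, problem in enumerate(problems):
        answer = model(problem)
        if score(problem, answer) < threshold:
            return i
    return None

if __name__ == "__main__":
    # Toy demo: a "model" that degrades after the second problem.
    answers = {"p1": "good", "p2": "good", "p3": "bad", "p4": "bad"}
    idx = first_collapse(
        ["p1", "p2", "p3", "p4"],
        model=lambda p: answers[p],
        score=lambda p, a: 1.0 if a == "good" else 0.0,
    )
    print(idx)  # → 2
```

In the real setting, `score` would be whatever grading rubric the test pack defines; the loop structure is the point: failures are located by position in a long sequence, not judged on a single prompt.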

Everything is open source on GitHub (MIT license, 1k+ stars) and fully transparent: there is no hidden code and no external calls.

If this looks in scope for awesome-ai-eval, I am happy to prepare a small PR adding a single line to the appropriate section. If it is out of scope, that is also totally fine; I mainly wanted to ask before sending a PR.

Thanks again for curating this list.
