43 repositories
- Open-source evaluation toolkit for large multi-modality models (LMMs), supporting 220+ LMMs and 80+ benchmarks
- opencompass: OpenCompass is an LLM evaluation platform, supporting a wide range of models (Llama3, Mistral, InternLM2, GPT-4, LLaMa2, Qwen, GLM, Claude, etc.) over 100+ datasets…
- [ICLR 2026] Rectifying LLM Thought From Lens of Optimization
- SAGA
- InteractScience
- CognitiveKernel-Pro
- GAOKAO-Eval
- .github
- MMBench-GUI: Official repo of "MMBench-GUI: Hierarchical Multi-Platform Evaluation Framework for GUI Agents". It can be used to evaluate a GUI agent with a hierarchical mann…
- ReasonZoo
- CompassVerifier
- GPassK: [ACL 2025] Are Your LLMs Capable of Stable Reasoning?
- Creation-MMBench
- CompassJudger
- RaML
- BotChat
- Ada-LEval
- MathBench
- MMBench
- ProSA
- ANAH
- oc_doc_website
- CriticEval
- lagent-cibench
- hinode