    Repositories list

    • Python
      0100 Updated Apr 24, 2026
    • [ICLR 2026] The Tool Decathlon: Benchmarking Language Agents for Diverse, Realistic, and Long-Horizon Task Execution
      Python
      3333671 Updated Apr 23, 2026
    • Python
      7800 Updated Apr 2, 2026
    • Benchmarking Language Agents Under Controllable and Extreme Context Growth
      Python
      MIT License
      53710 Updated Mar 30, 2026
    • KernelGYM

      Public
      [KernelGYM & Dr. Kernel] A distributed GPU environment and a collection of RL training methods to support RL for Kernel Generations
      Python
      1816031 Updated Mar 29, 2026
    • Benchmarking multimodal agents on realistic, ultra-challenging visual scenarios requiring long-horizon hybrid tool use.
      Python
      55100 Updated Mar 10, 2026
    • [ICLR 26] The official code repository for the paper "Mirage or Method? How Model–Task Alignment Induces Divergent RL Conclusions".
      Python
      MIT License
      01700 Updated Feb 9, 2026
    • Simple RL training for reasoning
      Python
      MIT License
      2893.8k331 Updated Dec 23, 2025
    • "Large Language Models" Course (COMP4901B) offered in HKUST
      Python
      121001 Updated Nov 23, 2025
    • Pushing Test-Time Scaling Limits of Deep Search with Asymmetric Verification
      Python
      12210 Updated Oct 8, 2025
    • From Accuracy to Robustness: A Study of Rule- and Model-based Verifiers in Mathematical Reasoning.
      Python
      MIT License
      12500 Updated Oct 7, 2025
    • The official repo of "WebExplorer: Explore and Evolve for Training Long-Horizon Web Agents"
      Python
      311510 Updated Sep 29, 2025
    • ceval

      Public
      Official github repo for C-Eval, a Chinese evaluation suite for foundation models [NeurIPS 2023]
      Python
      MIT License
      831.8k60 Updated Jul 27, 2025
    • mstar

      Public
      [ICML 2025] M-STAR (Multimodal Self-Evolving TrAining for Reasoning) Project. Diving into Self-Evolving Training for Multimodal Reasoning
      MIT License
      37320 Updated Jul 13, 2025
    • Laser

      Public
      [ICLR2026] Laser: Learn to Reason Efficiently with Adaptive Length-based Reward Shaping
      Python
      36440 Updated May 22, 2025
    • B-STaR

      Public
      B-STAR: Monitoring and Balancing Exploration and Exploitation in Self-Taught Reasoners
      Python
      118600 Updated May 21, 2025
    • CodeIO

      Public
      [ICML 2025 Oral] CodeI/O: Condensing Reasoning Patterns via Code Input-Output Prediction
      Python
      3456801 Updated May 6, 2025
    • GUIMid

      Public
      02210 Updated May 3, 2025
    • The official repo of "On the Perception Bottleneck of VLMs for Chart Understanding"
      Jupyter Notebook
      0900 Updated Apr 12, 2025
    • PreSelect

      Public
      [ICML 2025] Predictive Data Selection: The Data That Predicts Is the Data That Teaches
      Python
      96400 Updated Mar 4, 2025
    • dart-math

      Public
      [NeurIPS'24] Official code for *🎯DART-Math: Difficulty-Aware Rejection Tuning for Mathematical Problem-Solving*
      Jupyter Notebook
      MIT License
      712150 Updated Dec 10, 2024
    • deita

      Public
      Deita: Data-Efficient Instruction Tuning for Alignment [ICLR2024]
      Python
      Apache License 2.0
      3559490 Updated Dec 9, 2024
    • On the Universal Truthfulness Hyperplane Inside LLMs (EMNLP 2024)
      Python
      2600 Updated Oct 3, 2024
    • Official github repo for the paper "Compression Represents Intelligence Linearly" [COLM 2024]
      Python
      MIT License
      714700 Updated Sep 20, 2024
    • An Analytical Evaluation Board of Multi-turn LLM Agents [NeurIPS 2024 Oral]
      SAS
      41409125 Updated May 20, 2024
    • In-Context Sharpness as Alerts: An Inner Representation Perspective for Hallucination Mitigation (ICML 2024)
      Python
      96231 Updated Mar 30, 2024
    • JavaScript
      1000 Updated Jan 25, 2024
    • felm

      Public
      Github repository for "FELM: Benchmarking Factuality Evaluation of Large Language Models" (NeurIPS 2023)
      Python
      16430 Updated Dec 25, 2023
    • [NeurIPS 2023] Github repository for "Composing Parameter-Efficient Modules with Arithmetic Operations"
      Python
      Apache License 2.0
      96141 Updated Nov 26, 2023
    • Python
      1700 Updated Oct 3, 2023