
Pinned

  1. terminal-bench Public

    A benchmark for LLMs on complicated tasks in the terminal

    Python · 1.6k stars · 481 forks

  2. harbor Public

    Harbor is a framework for running agent evaluations and creating and using RL environments.

    Python · 830 stars · 612 forks

  3. terminal-bench-2 Public

    Shell · 109 stars · 39 forks

  4. terminal-bench-3 Public

    🚧 Accepting Task Submissions 🚧

    Python · 44 stars · 54 forks

  5. terminal-bench-science Public

    Terminal-Bench-Science: Evaluating AI Agents on Complex Real-World Scientific Workflows in the Terminal

    Shell · 24 stars · 18 forks

  6. awesome-harbor Public

    A curated list of awesome Harbor ecosystem projects

    14 stars · 1 fork

Repositories

Showing 9 of 9 repositories

People

This organization has no public members.
