-docker compose -f packages/evals/docker-compose.yml --profile server --profile runner up --build --scale runner=0
+pnpm evals
```
-The initial build process can take a minute or two. Upon success you should see output indicating that a web service is running on [localhost:3000](http://localhost:3000/):
-<img width="1182" alt="Screenshot 2025-06-05 at 12 05 38 PM" src="https://github.com/user-attachments/assets/34f25a59-1362-458c-aafa-25e13cdb2a7a" />
+The initial build process can take a minute or two. Upon success you should see output indicating that a web service is running on localhost:3000:
Navigate to [localhost:3446](http://localhost:3446/) in your browser and click the 🚀 button.
By default an evals run will run all programming exercises in the [Roo Code Evals](https://github.com/RooCodeInc/Roo-Code-Evals) repository with the Claude Sonnet 4 model and default settings. For basic configuration you can specify the LLM to use and any subset of the exercises you'd like. For advanced configuration you can import a Roo Code settings file, which will allow you to run the evals with Roo Code configured any way you'd like (this includes custom modes, a footgun prompt, etc.).
-<img width="1053" alt="Screenshot 2025-06-05 at 12 08 06 PM" src="https://github.com/user-attachments/assets/2367eef4-6ae9-4ac2-8ee4-80f981046486" />
After clicking "Launch" you should find that a "controller" container has spawned as well as `N` "task" containers where `N` is the value you chose for concurrency:
-<img width="1283" alt="Screenshot 2025-06-05 at 12 13 29 PM" src="https://github.com/user-attachments/assets/024413e2-c886-4272-ab59-909b4b114e7c" />
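With the screenshot removed, one way to confirm the controller and the `N` task containers from a terminal is to count the running container names. This is a minimal sketch, not part of the README: the helper `count_containers` is hypothetical, and it assumes the task containers have "task" somewhere in their names, which you should check against your own `docker ps` output.

```shell
# Count the lines on stdin that contain a given substring (default: "task").
# ASSUMPTION: task container names contain "task"; adjust the pattern to
# whatever names `docker ps` actually shows on your machine.
count_containers() {
  pattern="${1:-task}"
  # grep -c prints the number of matching lines; `|| true` keeps a zero
  # count from being reported as a failure.
  grep -c -F -- "$pattern" || true
}

# Usage (needs a running Docker daemon):
#   docker ps --format '{{.Names}}' | count_containers task
#   docker ps --format '{{.Names}}' | count_containers controller
```

If the first count matches the concurrency value you chose and the second is 1, the run has started as described.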