Commit dfb892b

Update README.md
1 parent 0df1da5 commit dfb892b

1 file changed (+17, -1 lines)


packages/evals/README.md

Lines changed: 17 additions & 1 deletion
@@ -29,7 +29,23 @@ Start the evals service:
```
docker compose -f packages/evals/docker-compose.yml --profile server --profile runner up --build --scale runner=0
```
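If you start the stack from a script, you may want to block until the web app answers instead of watching the logs. The snippet below is a minimal sketch that assumes `curl` is installed and that the app returns a successful HTTP status on `http://localhost:3000` once it is ready. `wait_for_url` is a hypothetical helper, not part of `packages/evals`:

```shell
# Hypothetical helper (not part of packages/evals): poll a URL until it answers
# with a successful HTTP status, or give up after roughly TIMEOUT seconds.
# Assumes curl is installed.
wait_for_url() {
  local url="$1" timeout="${2:-180}" waited=0
  # -s: no progress output; -f: treat HTTP errors (>= 400) as failures;
  # --max-time 5: don't let one attempt hang; -o /dev/null: discard the body.
  until curl -sf --max-time 5 -o /dev/null "$url"; do
    if [ "$waited" -ge "$timeout" ]; then
      echo "Timed out after ${timeout}s waiting for $url" >&2
      return 1
    fi
    # The timeout is approximate: it counts these 1-second sleeps, not time spent in curl.
    sleep 1
    waited=$((waited + 1))
  done
  echo "$url is up"
}
```

For example, after sourcing the function in a second terminal, `wait_for_url http://localhost:3000 180` returns 0 once the page responds and 1 if it is still unreachable after about three minutes.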

~~Navigate to [localhost:3000](http://localhost:3000/) in your browser.~~

The initial build can take a minute or two. When it succeeds, you should see output indicating that a web service is running on [localhost:3000](http://localhost:3000/):
<img width="1182" alt="Screenshot 2025-06-05 at 12 05 38 PM" src="https://github.com/user-attachments/assets/34f25a59-1362-458c-aafa-25e13cdb2a7a" />

In Docker Desktop, you should also see that the database and Redis services are running:
<img width="1283" alt="Screenshot 2025-06-05 at 12 07 09 PM" src="https://github.com/user-attachments/assets/ad75d791-9cc7-41e3-8168-df7b21b49da2" />

Navigate to [localhost:3000](http://localhost:3000/) in your browser and click the 🚀 button.

By default, an evals run executes every programming exercise in the [Roo Code Evals](https://github.com/RooCodeInc/Roo-Code-Evals) repository using the Claude Sonnet 4 model and default settings. For basic configuration, you can choose the LLM and any subset of the exercises. For advanced configuration, you can import a Roo Code settings file, which lets you run the evals with Roo Code configured however you like (for example, with custom modes or a footgun prompt).

<img width="1053" alt="Screenshot 2025-06-05 at 12 08 06 PM" src="https://github.com/user-attachments/assets/2367eef4-6ae9-4ac2-8ee4-80f981046486" />

After clicking "Launch", you should see that a "controller" container has been spawned along with `N` "task" containers, where `N` is the concurrency value you chose:
<img width="1283" alt="Screenshot 2025-06-05 at 12 13 29 PM" src="https://github.com/user-attachments/assets/024413e2-c886-4272-ab59-909b4b114e7c" />

The web app's UI should update in real time with the results of the eval run:
<img width="1053" alt="Screenshot 2025-06-05 at 12 14 52 PM" src="https://github.com/user-attachments/assets/6fe3b651-0898-4f14-a231-3cc8d66f0e1f" />
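To check from a terminal, rather than Docker Desktop, that a launched run spawned one controller and `N` task containers, you can filter `docker ps` by container name. This is a sketch under an assumption: that the containers' names contain the substrings `controller` and `task`. The actual names are not shown here, so inspect `docker ps` and adjust the pattern if needed. `count_containers` is a hypothetical helper:

```shell
# Hypothetical helper: count running containers whose names contain a substring.
# Assumption: the controller and task containers have "controller" and "task"
# in their names; check `docker ps` and adjust the pattern if they do not.
count_containers() {
  # `--filter name=...` matches container names containing the given string;
  # `--format '{{.Names}}'` prints one container name per line.
  docker ps --filter "name=$1" --format '{{.Names}}' | wc -l | tr -d ' '
}
```

If the naming assumption holds, `count_containers task` should print the concurrency value you chose while the run is in progress, and `count_containers controller` should print `1`.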

## Advanced Usage / Debugging