Closed
2 changes: 1 addition & 1 deletion apps/web-evals/package.json
@@ -5,7 +5,7 @@
"scripts": {
"lint": "next lint",
"check-types": "tsc -b",
- "dev": "scripts/check-services.sh && next dev",
+ "dev": "scripts/check-services.sh && next dev --port 8080",
"format": "prettier --write src",
"build": "next build",
"start": "next start"
5 changes: 2 additions & 3 deletions packages/evals/README.md
@@ -29,13 +29,13 @@ Start the evals service:
docker compose -f packages/evals/docker-compose.yml --profile server --profile runner up --build --scale runner=0
```

- The initial build process can take a minute or two. Upon success you should see output indicating that a web service is running on [localhost:3000](http://localhost:3000/):
+ The initial build process can take a minute or two. Upon success you should see output indicating that a web service is running on [localhost:8080](http://localhost:8080/):
<img width="1182" alt="Screenshot 2025-06-05 at 12 05 38 PM" src="https://github.com/user-attachments/assets/34f25a59-1362-458c-aafa-25e13cdb2a7a" />

Additionally, you'll find in Docker Desktop that database and redis services are running:
<img width="1283" alt="Screenshot 2025-06-05 at 12 07 09 PM" src="https://github.com/user-attachments/assets/ad75d791-9cc7-41e3-8168-df7b21b49da2" />

- Navigate to [localhost:3000](http://localhost:3000/) in your browser and click the 🚀 button.
+ Navigate to [localhost:8080](http://localhost:8080/) in your browser and click the 🚀 button.

By default, an evals run executes all programming exercises in the [Roo Code Evals](https://github.com/RooCodeInc/Roo-Code-Evals) repository with the Claude Sonnet 4 model and default settings. For basic configuration, you can specify the LLM to use and any subset of the exercises you'd like. For advanced configuration, you can import a Roo Code settings file, which lets you run the evals with Roo Code configured any way you'd like (this includes custom modes, a footgun prompt, etc.).

@@ -68,7 +68,6 @@ To stop an evals run early you can simply stop the "controller" container using

<img width="1302" alt="Screenshot 2025-06-06 at 9 00 41 AM" src="https://github.com/user-attachments/assets/a9d4725b-730c-441a-ba24-ac99f9599ced" />


## Advanced Usage / Debugging

The evals system runs VS Code headlessly in Docker containers for consistent, reproducible environments. While this design ensures reliability, it can make debugging more challenging. For debugging purposes, you can run the system locally on macOS, though this approach is less reliable due to hardware and environment variability.
2 changes: 1 addition & 1 deletion packages/evals/docker-compose.yml
@@ -52,7 +52,7 @@ services:
context: ../../
dockerfile: packages/evals/Dockerfile.web
ports:
- - "3000:3000"
+ - "8080:3000"
environment:
- HOST_EXECUTION_METHOD=docker
volumes:
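
The `ports` change above uses Compose's short `HOST:CONTAINER` form, so only the published host port moves to 8080 while the web app inside the container still listens on 3000. As a minimal sketch of how to read such a mapping, the helpers below split it into its two halves. `host_port` and `container_port` are hypothetical names, not part of the repository, and the sketch assumes the plain two-part form with no host IP prefix and no `/tcp` suffix.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not part of the repository) that split a short-form
# Compose port mapping of the shape "HOST:CONTAINER".
# Assumption: exactly two parts, no host IP prefix and no protocol suffix.

# Print the host side of the mapping (the port you open in the browser).
host_port() {
  printf '%s\n' "${1%%:*}"
}

# Print the container side of the mapping (the port the app listens on).
container_port() {
  printf '%s\n' "${1##*:}"
}

mapping="8080:3000"
echo "browse to http://localhost:$(host_port "$mapping")"
echo "app listens on port $(container_port "$mapping") inside the container"
```

This is why the README links and the `nc -z localhost` check in `setup.sh` switch to 8080, while the container-side port in the mapping stays at 3000.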
4 changes: 2 additions & 2 deletions packages/evals/scripts/setup.sh
@@ -377,7 +377,7 @@ fi

echo -e "\n🚀 You're ready to rock and roll! \n"

- if ! nc -z localhost 3000; then
+ if ! nc -z localhost 8080; then
read -p "🌐 Would you like to start the evals web app? (Y/n): " start_evals

if [[ "$start_evals" =~ ^[Yy]|^$ ]]; then
@@ -386,5 +386,5 @@ if ! nc -z localhost 3000; then
echo "💡 You can start it anytime with 'pnpm --filter @roo-code/web-evals dev'."
fi
else
- echo "👟 The evals web app is running at http://localhost:3000"
+ echo "👟 The evals web app is running at http://localhost:8080"
fi