Commit 655326e

AI generated evals docs

1 parent 4b20d2e commit 655326e

File tree

5 files changed: +215 -5 lines changed

.nvmrc

Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
+v20.18.1

.tool-versions

Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
+nodejs v20.18.1

README.md

Lines changed: 5 additions & 5 deletions
@@ -1,17 +1,17 @@
 # Roo Code Docs
 
-This website is built using [Docusaurus](https://docusaurus.io/), a modern static website generator, and lives at https://docs.roocode.com
+This website is built using [Docusaurus](https://docusaurus.io/), a modern static website generator, and lives at https://docs.roocode.com.
 
 ### Installation
 
-```
-$ npm install
+```sh
+npm install
 ```
 
 ### Local Development
 
-```
-$ npm start
+```sh
+npm start
 ```
 
 This command starts a local development server and opens up a browser window. Most changes are reflected live without having to restart the server.

docs/evals/evals.md

Lines changed: 201 additions & 0 deletions
@@ -0,0 +1,201 @@
---
sidebar_label: Evals
---

# Roo Code Evals System

## What is Roo Code Evals?

The Roo Code Evals System is a user-friendly testing framework that helps you evaluate how well the Roo Code AI coding assistant performs on various programming tasks. It allows you to:

- Run coding exercises across multiple programming languages (JavaScript, Python, Go, Rust, Java)
- Compare different AI models (Claude, Gemini, etc.)
- Experiment with different Roo Code settings
- Measure performance metrics like success rate, cost, and speed
- Analyze results to find the optimal configuration for your needs

## Getting Started

Setting up the Roo Code Evals system is simple with our one-step setup script:

### Prerequisites

- macOS (currently the only supported operating system)
- Visual Studio Code
- Internet connection

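Before running the script, you can quickly confirm these prerequisites from a terminal. A minimal sketch (it assumes you've installed the `code` shell command via VS Code's "Shell Command: Install 'code' command in PATH" action):

```bash
# Confirm you're on macOS and see which version
sw_vers -productVersion

# Confirm the VS Code CLI is on your PATH
code --version

# Confirm you can reach the network (OpenRouter is used later for model access)
curl -sI https://openrouter.ai > /dev/null && echo "network OK"
```
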
### Quick Setup

1. Run the setup script:
   ```bash
   ./scripts/setup.sh
   ```

2. Follow the interactive prompts to:
   - Select which programming languages you want to test (JavaScript, Python, Go, Rust, Java)
   - Install necessary tools and dependencies
   - Configure your OpenRouter API key (required to access AI models)

The setup process is automated and will take care of everything you need, including:

- Installing required tools and language environments
- Building and installing the Roo Code VSCode extension
- Setting up the database to store your results
- Configuring everything for immediate use

Once setup is complete, you'll be ready to start running evals and experimenting with different models and settings!

## Experimenting with Models and Settings

The Roo Code Evals system makes it easy to experiment with different AI models and settings to find the optimal configuration for your needs.

### Running Your First Eval

1. Start the web app:
   ```bash
   pnpm web
   ```

2. Open your browser and navigate to http://localhost:3000

3. Click the "Launch" button (rocket icon) to create a new eval run

4. Configure your experiment:
   - **Model Selection**: Choose from recommended models like Claude or Gemini, or explore other OpenRouter models
   - **Settings Configuration**: Import custom settings or use defaults (more on this below)
   - **Exercise Selection**: Run all exercises or select specific ones to focus your testing
   - **Description**: Add notes about what you're testing for future reference

5. Click "Launch" to start the evaluation

The system will automatically:

- Launch VSCode instances for each exercise
- Run the Roo Code agent on the coding tasks
- Collect performance metrics
- Run tests to verify the solutions
- Display results in the web interface

### Comparing Different Models

To compare how different AI models perform:

1. Run an eval with one model (e.g., Claude)
2. Run another eval with a different model (e.g., Gemini)
3. Compare the results in the web interface:
   - Success rates
   - Completion times
   - Token usage
   - Costs

This helps you identify which model works best for your specific coding needs.

### Command Line Usage (Advanced)

Advanced users can also run evals from the command line:

```bash
# Run all exercises for all languages
pnpm cli all

# Run all exercises for a specific language
pnpm cli javascript

# Run a specific exercise
pnpm cli javascript todo-app
```

## Experimenting with Custom Settings

One of the most powerful features of the Roo Code Evals system is the ability to test how different Roo Code settings affect performance.

### Exporting Settings from Roo Code

1. In VSCode with the Roo Code extension installed:
   - Open the Command Palette (Cmd+Shift+P or Ctrl+Shift+P)
   - Type "Roo Code: Export Settings"
   - Save the settings JSON file to your computer

This file contains all your current Roo Code configuration, including:

- Model preferences
- Context handling settings
- Code generation parameters
- Tool usage settings
- And many other customizable options
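
If you're curious what the export contains, you can pretty-print it from a terminal before importing it anywhere. A quick sketch (the file path is just a placeholder for wherever you saved your export):

```bash
# Pretty-print the exported settings and skim the first few entries
# (~/Downloads/roo-code-settings.json is an example path)
python3 -m json.tool ~/Downloads/roo-code-settings.json | head -n 20
```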

### Importing Settings into Evals

When creating a new eval run:

1. Click the "Import Settings" button
2. Select your exported settings JSON file
3. The system will show you a diff of how your settings differ from the defaults
4. Run the eval with these custom settings

### Comparing Settings Performance

To find the optimal settings for your workflow:

1. Run an eval with default settings
2. Export your custom settings from Roo Code
3. Run another eval with your custom settings
4. Compare the results to see which configuration performs better

This approach lets you fine-tune Roo Code to your specific needs and coding style.

## Understanding Your Results

The evals system provides easy-to-understand metrics to help you evaluate performance:

### Key Performance Indicators

- **Success Rate**: Percentage of exercises completed successfully
- **Completion Time**: How long it took to solve each exercise
- **Cost Efficiency**: How much you spent on API calls
- **Token Usage**: How efficiently the AI used its context

### Viewing and Interpreting Results

The web interface makes it easy to analyze your results:

1. **Dashboard View**: See all your runs with summary metrics
2. **Detailed Run View**: Click on a run to see performance for each exercise
3. **Console Output**: Click on an exercise to see the actual interaction with the AI

When comparing runs, look for:

- Which model has the highest success rate
- Which settings configuration completes tasks faster
- How different models balance speed vs. cost

## Quick Troubleshooting Guide

If you encounter any issues while using the Roo Code Evals system, here are some simple solutions:

### "The web app isn't starting"

- Make sure you've completed the setup process
- Try running `pnpm install` and then `pnpm web` again
- Check that port 3000 isn't being used by another application (see the check below)
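
A quick way to check the port from a terminal (`lsof` ships with macOS, the only supported OS):

```bash
# See which process, if any, is using port 3000
lsof -i :3000
# No output means the port is free; otherwise stop the listed process first
```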

### "My eval run isn't starting"

- Verify your OpenRouter API key is valid
- Ensure VSCode is properly installed and accessible from the command line (see the checks below)
- Check that you have the necessary language environments installed for the exercises you're trying to run
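
Two of these are easy to verify from a terminal (the language command varies by exercise; `node` is shown as an example for the JavaScript exercises):

```bash
# VSCode should be reachable from the command line
code --version

# Spot-check a language environment, e.g. Node.js for the JavaScript exercises
node --version
```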
### "The Roo Code extension isn't working"
181+
- Make sure the extension was properly built during setup
182+
- Try running the setup script again with the option to rebuild the extension
183+
- Verify that VSCode can find and load the extension
184+
185+
### "My settings import failed"
186+
- Make sure you're using a valid settings export from Roo Code
187+
- Check that the JSON file isn't corrupted
188+
- Try exporting your settings from Roo Code again
189+
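
To rule out a corrupted file, you can validate that the export is well-formed JSON before retrying the import (the path is a placeholder for wherever you saved it):

```bash
# Exits with a parse error if the file isn't valid JSON
python3 -m json.tool ~/Downloads/roo-code-settings.json > /dev/null && echo "valid JSON"
```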

### Getting More Help

If you continue to experience issues:

- Check the console output in the terminal where you started the web app
- Look at the task console output in the web interface
- Join our community Discord for support from other users and the development team

## Happy Experimenting!

The Roo Code Evals system is designed to help you find the perfect combination of AI models and settings for your coding workflow. By experimenting with different configurations and comparing the results, you can optimize Roo Code to be an even more effective coding assistant for your specific needs.

We encourage you to try different models, adjust settings, and share your findings with the community!

sidebars.ts

Lines changed: 7 additions & 0 deletions
@@ -114,6 +114,13 @@ const sidebars: SidebarsConfig = {
         'providers/vscode-lm',
       ]
     },
+    {
+      type: 'category',
+      label: 'Evals',
+      items: [
+        'evals/evals',
+      ],
+    },
     {
       type: 'category',
       label: 'FAQ',
