-
Notifications
You must be signed in to change notification settings - Fork 2.7k
Feature/grading vscode test response #943
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from 1 commit
d7b44a2
21f04f5
63346fb
a41fb07
eb826b1
1e279b4
937c3b6
4886cb7
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,160 @@ | ||
| # VSCode Integration Tests | ||
|
|
||
| This document describes the integration test setup for the Roo Code VSCode extension. | ||
|
|
||
| ## Overview | ||
|
|
||
| The integration tests use the `@vscode/test-electron` package to run tests in a real VSCode environment. These tests verify that the extension works correctly within VSCode, including features like mode switching, webview interactions, and API communication. | ||
|
|
||
| ## Test Setup | ||
|
|
||
| ### Directory Structure | ||
|
|
||
| ``` | ||
| src/test/ | ||
| ├── runTest.ts # Main test runner | ||
| ├── suite/ | ||
| │ ├── index.ts # Test suite configuration | ||
| │ ├── modes.test.ts # Mode switching tests | ||
| │ ├── tasks.test.ts # Task execution tests | ||
| │ └── extension.test.ts # Extension activation tests | ||
| ``` | ||
|
|
||
| ### Test Runner Configuration | ||
|
|
||
| The test runner (`runTest.ts`) is responsible for: | ||
|
|
||
| - Setting up the extension development path | ||
| - Configuring the test environment | ||
| - Running the integration tests using `@vscode/test-electron` | ||
|
|
||
| ### Environment Setup | ||
|
|
||
| 1. Create a `.env.integration` file in the root directory with required environment variables: | ||
|
|
||
| ``` | ||
| OPENROUTER_API_KEY=sk-or-v1-... | ||
| ``` | ||
|
|
||
| 2. The test suite (`suite/index.ts`) configures: | ||
|
|
||
| - Mocha test framework with TDD interface | ||
| - 10-minute timeout for LLM communication | ||
| - Global extension API access | ||
| - WebView panel setup | ||
| - OpenRouter API configuration | ||
|
|
||
| ## Test Suite Structure | ||
|
|
||
| Tests are organized using Mocha's TDD interface (`suite` and `test` functions). The main test files are: | ||
|
|
||
| - `modes.test.ts`: Tests mode switching functionality | ||
| - `tasks.test.ts`: Tests task execution | ||
| - `extension.test.ts`: Tests extension activation | ||
|
|
||
| ### Global Objects | ||
|
|
||
| The following global objects are available in tests: | ||
|
|
||
| ```typescript | ||
| declare global { | ||
| var api: ClineAPI | ||
| var provider: ClineProvider | ||
| var extension: vscode.Extension<ClineAPI> | ||
| var panel: vscode.WebviewPanel | ||
| } | ||
| ``` | ||
|
|
||
| ## Running Tests | ||
|
|
||
| 1. Ensure you have the required environment variables set in `.env.integration` | ||
|
|
||
| 2. Run the integration tests: | ||
|
|
||
| ```bash | ||
| npm run test:integration | ||
| ``` | ||
|
|
||
| The tests will: | ||
|
|
||
| - Download and launch a clean VSCode instance | ||
| - Install the extension | ||
| - Execute the test suite | ||
| - Report results | ||
|
|
||
| ## Writing New Tests | ||
|
|
||
| When writing new integration tests: | ||
|
|
||
| 1. Create a new test file in `src/test/suite/` with the `.test.ts` extension | ||
|
|
||
| 2. Add the test file to the `files` array in `suite/index.ts` (you can temporarily comment out the other tests to run just the new test): | ||
|
|
||
| ```typescript | ||
| const files = ["suite/modes.test.js", "suite/tasks.test.js", "suite/extension.test.js", "suite/your-new-test.test.js"] | ||
| ``` | ||
|
|
||
| 3. Structure your tests using the TDD interface: | ||
|
|
||
| ```typescript | ||
| import * as assert from "assert" | ||
| import * as vscode from "vscode" | ||
|
|
||
| suite("Your Test Suite Name", () => { | ||
| test("Should do something specific", async function () { | ||
| // Your test code here | ||
| }) | ||
| }) | ||
| ``` | ||
|
|
||
| 4. Use the global objects (`api`, `provider`, `extension`, `panel`) to interact with the extension | ||
|
|
||
| ### Best Practices | ||
|
|
||
| 1. **Timeouts**: Use appropriate timeouts for async operations: | ||
|
|
||
| ```typescript | ||
| const timeout = 30000 | ||
| const interval = 1000 | ||
| ``` | ||
|
|
||
| 2. **State Management**: Reset extension state before/after tests: | ||
|
|
||
| ```typescript | ||
| await globalThis.provider.updateGlobalState("mode", "Ask") | ||
| await globalThis.provider.updateGlobalState("alwaysAllowModeSwitch", true) | ||
| ``` | ||
|
|
||
| 3. **Assertions**: Use clear assertions with meaningful messages: | ||
|
|
||
| ```typescript | ||
| assert.ok(condition, "Descriptive message about what failed") | ||
| ``` | ||
|
|
||
| 4. **Error Handling**: Wrap test code in try/catch blocks and clean up resources: | ||
|
|
||
| ```typescript | ||
| try { | ||
| // Test code | ||
| } finally { | ||
| // Cleanup code | ||
| } | ||
| ``` | ||
|
|
||
| 5. **Wait for Operations**: Use polling when waiting for async operations: | ||
|
|
||
| ```typescript | ||
| let startTime = Date.now() | ||
| while (Date.now() - startTime < timeout) { | ||
| if (condition) break | ||
| await new Promise((resolve) => setTimeout(resolve, interval)) | ||
| } | ||
| ``` | ||
|
|
||
| 6. **Grading**: When grading tests, use the `Grade:` format to ensure the test is graded correctly (See modes.test.ts for an example). | ||
|
|
||
| ```typescript | ||
| await globalThis.api.startNewTask( | ||
| `Given this prompt: ${testPrompt} grade the response from 1 to 10 in the format of "Grade: (1-10)": ${output} \n Be sure to say 'I AM DONE GRADING' after the task is complete`, | ||
| ) | ||
| ``` |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -5,7 +5,8 @@ suite("Roo Code Modes", () => { | |
| test("Should handle switching modes correctly", async function () { | ||
| const timeout = 30000 | ||
| const interval = 1000 | ||
|
|
||
| const testPrompt = | ||
| "For each mode (Code, Architect, Ask) respond with the mode name and what it specializes in after switching to that mode, do not start with the current mode, be sure to say 'I AM DONE' after the task is complete" | ||
| if (!globalThis.extension) { | ||
| assert.fail("Extension not found") | ||
| } | ||
|
|
@@ -27,9 +28,7 @@ suite("Roo Code Modes", () => { | |
| await globalThis.provider.updateGlobalState("autoApprovalEnabled", true) | ||
|
|
||
| // Start a new task. | ||
| await globalThis.api.startNewTask( | ||
| "For each mode (Code, Architect, Ask) respond with the mode name and what it specializes in after switching to that mode, do not start with the current mode, be sure to say 'I AM DONE' after the task is complete", | ||
| ) | ||
| await globalThis.api.startNewTask(testPrompt) | ||
|
|
||
| // Wait for task to appear in history with tokens. | ||
| startTime = Date.now() | ||
|
|
@@ -52,46 +51,46 @@ suite("Roo Code Modes", () => { | |
| assert.fail("No messages received") | ||
| } | ||
|
|
||
| assert.ok( | ||
| globalThis.provider.messages.some( | ||
| ({ type, text }) => type === "say" && text?.includes(`"request":"[switch_mode to 'code' because:`), | ||
| ), | ||
| "Did not receive expected response containing 'Roo wants to switch to code mode'", | ||
| ) | ||
| assert.ok( | ||
| globalThis.provider.messages.some( | ||
| ({ type, text }) => type === "say" && text?.includes("software engineer"), | ||
| ), | ||
| "Did not receive expected response containing 'I am Roo in Code mode, specializing in software engineering'", | ||
| await globalThis.provider.updateGlobalState("mode", "Ask") | ||
|
Collaborator
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I think it might be better to use handleModeSwitch so it does the associated api config switch etc (in case we wanted to use another model to evaluate this someday)
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I can explore using that function. It wouldn't be difficult to also swap the model in a single line as well since we set that at the beginning of the test run in Ideally this "Grading" portion of the test becomes a helper function that any test can call with a prompt and response/output and that handles all the necessary Roo Code settings configurations for the grading. |
||
| let output = globalThis.provider.messages.map(({ type, text }) => (type === "say" ? text : "")).join("\n") | ||
| await globalThis.api.startNewTask( | ||
| `Given this prompt: ${testPrompt} grade the response from 1 to 10 in the format of "Grade: (1-10)": ${output} \n Be sure to say 'I AM DONE GRADING' after the task is complete`, | ||
| ) | ||
|
|
||
| assert.ok( | ||
| globalThis.provider.messages.some( | ||
| ({ type, text }) => | ||
| type === "say" && text?.includes(`"request":"[switch_mode to 'architect' because:`), | ||
| ), | ||
| "Did not receive expected response containing 'Roo wants to switch to architect mode'", | ||
| ) | ||
| assert.ok( | ||
| globalThis.provider.messages.some( | ||
| ({ type, text }) => | ||
| type === "say" && (text?.includes("technical planning") || text?.includes("technical leader")), | ||
| ), | ||
| "Did not receive expected response containing 'I am Roo in Architect mode, specializing in analyzing codebases'", | ||
| ) | ||
| startTime = Date.now() | ||
|
|
||
| while (Date.now() - startTime < timeout) { | ||
| const messages = globalThis.provider.messages | ||
|
|
||
| if ( | ||
| messages.some( | ||
| ({ type, text }) => | ||
| type === "say" && text?.includes("I AM DONE GRADING") && !text?.includes("be sure to say"), | ||
| ) | ||
| ) { | ||
| break | ||
| } | ||
|
|
||
| await new Promise((resolve) => setTimeout(resolve, interval)) | ||
| } | ||
| if (globalThis.provider.messages.length === 0) { | ||
| assert.fail("No messages received") | ||
| } | ||
| globalThis.provider.messages.forEach(({ type, text }) => { | ||
| if (type === "say" && text?.includes("Grade:")) { | ||
| console.log(text) | ||
| } | ||
| }) | ||
| const grade = globalThis.provider.messages.find( | ||
| ({ type, text }) => type === "say" && !text?.includes("Grade: (1-10)") && text?.includes("Grade:"), | ||
|
Collaborator
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Nitpick, but maybe could use a regex to pull out the score?
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Added regex to look for the grade, it is still a little fuzzy given the variability of the response from the LLMs
Collaborator
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Maybe something like this would be a little more DRY?
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. updated |
||
| )?.text | ||
| console.log("THIS IS THE GRADE", grade) | ||
ColemanRoo marked this conversation as resolved.
Outdated
Show resolved
Hide resolved
|
||
| assert.ok( | ||
| globalThis.provider.messages.some( | ||
| ({ type, text }) => type === "say" && text?.includes(`"request":"[switch_mode to 'ask' because:`), | ||
| ), | ||
| "Did not receive expected response containing 'Roo wants to switch to ask mode'", | ||
| ) | ||
| assert.ok( | ||
| globalThis.provider.messages.some( | ||
| ({ type, text }) => | ||
| type === "say" && (text?.includes("technical knowledge") || text?.includes("technical assist")), | ||
| ), | ||
| "Did not receive expected response containing 'I am Roo in Ask mode, specializing in answering questions'", | ||
| grade?.includes("Grade: 10") || | ||
| grade?.includes("Grade: 9") || | ||
| grade?.includes("Grade: 8") || | ||
| grade?.includes("Grade: 7"), | ||
| "Did not receive expected response containing 'Grade: 10' or 'Grade: 9' or 'Grade: 8' or 'Grade: 7'", | ||
| ) | ||
| } finally { | ||
| } | ||
|
|
||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does it hurt to keep this? I can imagine someone getting confused if they want to add more tests and they don't know about this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It doesn't necessarily hurt to keep this, but it makes getting set up to run a single test locally more annoying. I made sure that in the VSCODE_INTEGRATION_TESTS.md there are explicit instructions around that part. Also this allows us to potentially quickly turn a test on or off by removing it from the list without having to move or delete the test.
This set up with a list of tests is similar to how we do things in Roo Automation Mobile, but happy to swap back to a single line if that's what makes sense.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah I think it's more intuitive to just have it run all tests without needing to add them to this list. You can always add
.skipor.onlyto control which ones are run.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I made the change to have all tests run by default, but I am leaving the array of test files commented out with instructions as it is way easier to modify the index.ts file to only run the single test someone is working on than having to go to each test they don't want to run and add skip. I don't believe the
.onlyfunction works for this setup.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
.onlyseems to work for me. What do you see when you try?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was worried that the runner was going to go in order so, if it grabbed the test that didn't have
.onlyfirst it would still run it, but it seems like it is smart enough. I'll remove the list fromindex.tsand update the doc