-
Notifications
You must be signed in to change notification settings - Fork 2.7k
Feature/grading vscode test response #943
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from all commits
d7b44a2
21f04f5
63346fb
a41fb07
eb826b1
1e279b4
937c3b6
4886cb7
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,156 @@ | ||
| # VSCode Integration Tests | ||
|
|
||
| This document describes the integration test setup for the Roo Code VSCode extension. | ||
|
|
||
| ## Overview | ||
|
|
||
| The integration tests use the `@vscode/test-electron` package to run tests in a real VSCode environment. These tests verify that the extension works correctly within VSCode, including features like mode switching, webview interactions, and API communication. | ||
|
|
||
| ## Test Setup | ||
|
|
||
| ### Directory Structure | ||
|
|
||
| ``` | ||
| src/test/ | ||
| ├── runTest.ts # Main test runner | ||
| ├── suite/ | ||
| │ ├── index.ts # Test suite configuration | ||
| │ ├── modes.test.ts # Mode switching tests | ||
| │ ├── tasks.test.ts # Task execution tests | ||
| │ └── extension.test.ts # Extension activation tests | ||
| ``` | ||
|
|
||
| ### Test Runner Configuration | ||
|
|
||
| The test runner (`runTest.ts`) is responsible for: | ||
|
|
||
| - Setting up the extension development path | ||
| - Configuring the test environment | ||
| - Running the integration tests using `@vscode/test-electron` | ||
|
|
||
| ### Environment Setup | ||
|
|
||
| 1. Create a `.env.integration` file in the root directory with required environment variables: | ||
|
|
||
| ``` | ||
| OPENROUTER_API_KEY=sk-or-v1-... | ||
| ``` | ||
|
|
||
| 2. The test suite (`suite/index.ts`) configures: | ||
|
|
||
| - Mocha test framework with TDD interface | ||
| - 10-minute timeout for LLM communication | ||
| - Global extension API access | ||
| - WebView panel setup | ||
| - OpenRouter API configuration | ||
|
|
||
| ## Test Suite Structure | ||
|
|
||
| Tests are organized using Mocha's TDD interface (`suite` and `test` functions). The main test files are: | ||
|
|
||
| - `modes.test.ts`: Tests mode switching functionality | ||
| - `tasks.test.ts`: Tests task execution | ||
| - `extension.test.ts`: Tests extension activation | ||
|
|
||
| ### Global Objects | ||
|
|
||
| The following global objects are available in tests: | ||
|
|
||
| ```typescript | ||
| declare global { | ||
| var api: ClineAPI | ||
| var provider: ClineProvider | ||
| var extension: vscode.Extension<ClineAPI> | ||
| var panel: vscode.WebviewPanel | ||
| } | ||
| ``` | ||
|
|
||
| ## Running Tests | ||
|
|
||
| 1. Ensure you have the required environment variables set in `.env.integration` | ||
|
|
||
| 2. Run the integration tests: | ||
|
|
||
| ```bash | ||
| npm run test:integration | ||
| ``` | ||
|
|
||
| 3. If you want to run a specific test, you can use the `test.only` function in the test file. This will run only the test you specify and ignore the others. Be sure to remove the `test.only` function before committing your changes. | ||
|
|
||
| The tests will: | ||
|
|
||
| - Download and launch a clean VSCode instance | ||
| - Install the extension | ||
| - Execute the test suite | ||
| - Report results | ||
|
|
||
| ## Writing New Tests | ||
|
|
||
| When writing new integration tests: | ||
|
|
||
| 1. Create a new test file in `src/test/suite/` with the `.test.ts` extension | ||
|
|
||
| 2. Structure your tests using the TDD interface: | ||
|
|
||
| ```typescript | ||
| import * as assert from "assert" | ||
| import * as vscode from "vscode" | ||
|
|
||
| suite("Your Test Suite Name", () => { | ||
| test("Should do something specific", async function () { | ||
| // Your test code here | ||
| }) | ||
| }) | ||
| ``` | ||
|
|
||
| 3. Use the global objects (`api`, `provider`, `extension`, `panel`) to interact with the extension | ||
|
|
||
| ### Best Practices | ||
|
|
||
| 1. **Timeouts**: Use appropriate timeouts for async operations: | ||
|
|
||
| ```typescript | ||
| const timeout = 30000 | ||
| const interval = 1000 | ||
| ``` | ||
|
|
||
| 2. **State Management**: Reset extension state before/after tests: | ||
|
|
||
| ```typescript | ||
| await globalThis.provider.updateGlobalState("mode", "Ask") | ||
| await globalThis.provider.updateGlobalState("alwaysAllowModeSwitch", true) | ||
| ``` | ||
|
|
||
| 3. **Assertions**: Use clear assertions with meaningful messages: | ||
|
|
||
| ```typescript | ||
| assert.ok(condition, "Descriptive message about what failed") | ||
| ``` | ||
|
|
||
| 4. **Error Handling**: Wrap test code in try/catch blocks and clean up resources: | ||
|
|
||
| ```typescript | ||
| try { | ||
| // Test code | ||
| } finally { | ||
| // Cleanup code | ||
| } | ||
| ``` | ||
|
|
||
| 5. **Wait for Operations**: Use polling when waiting for async operations: | ||
|
|
||
| ```typescript | ||
| let startTime = Date.now() | ||
| while (Date.now() - startTime < timeout) { | ||
| if (condition) break | ||
| await new Promise((resolve) => setTimeout(resolve, interval)) | ||
| } | ||
| ``` | ||
|
|
||
| 6. **Grading**: When grading tests, use the `Grade:` format to ensure the test is graded correctly (See modes.test.ts for an example). | ||
|
|
||
| ```typescript | ||
| await globalThis.api.startNewTask( | ||
| `Given this prompt: ${testPrompt} grade the response from 1 to 10 in the format of "Grade: (1-10)": ${output} \n Be sure to say 'I AM DONE GRADING' after the task is complete`, | ||
| ) | ||
| ``` |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -5,7 +5,8 @@ suite("Roo Code Modes", () => { | |
| test("Should handle switching modes correctly", async function () { | ||
| const timeout = 30000 | ||
| const interval = 1000 | ||
|
|
||
| const testPrompt = | ||
| "For each mode (Code, Architect, Ask) respond with the mode name and what it specializes in after switching to that mode, do not start with the current mode, be sure to say 'I AM DONE' after the task is complete" | ||
| if (!globalThis.extension) { | ||
| assert.fail("Extension not found") | ||
| } | ||
|
|
@@ -27,9 +28,7 @@ suite("Roo Code Modes", () => { | |
| await globalThis.provider.updateGlobalState("autoApprovalEnabled", true) | ||
|
|
||
| // Start a new task. | ||
| await globalThis.api.startNewTask( | ||
| "For each mode (Code, Architect, Ask) respond with the mode name and what it specializes in after switching to that mode, do not start with the current mode, be sure to say 'I AM DONE' after the task is complete", | ||
| ) | ||
| await globalThis.api.startNewTask(testPrompt) | ||
|
|
||
| // Wait for task to appear in history with tokens. | ||
| startTime = Date.now() | ||
|
|
@@ -52,47 +51,50 @@ suite("Roo Code Modes", () => { | |
| assert.fail("No messages received") | ||
| } | ||
|
|
||
| assert.ok( | ||
| globalThis.provider.messages.some( | ||
| ({ type, text }) => type === "say" && text?.includes(`"request":"[switch_mode to 'code' because:`), | ||
| ), | ||
| "Did not receive expected response containing 'Roo wants to switch to code mode'", | ||
| ) | ||
| assert.ok( | ||
| globalThis.provider.messages.some( | ||
| ({ type, text }) => type === "say" && text?.includes("software engineer"), | ||
| ), | ||
| "Did not receive expected response containing 'I am Roo in Code mode, specializing in software engineering'", | ||
| ) | ||
| //Log the messages to the console | ||
| globalThis.provider.messages.forEach(({ type, text }) => { | ||
| if (type === "say") { | ||
| console.log(text) | ||
| } | ||
| }) | ||
|
|
||
| assert.ok( | ||
| globalThis.provider.messages.some( | ||
| ({ type, text }) => | ||
| type === "say" && text?.includes(`"request":"[switch_mode to 'architect' because:`), | ||
| ), | ||
| "Did not receive expected response containing 'Roo wants to switch to architect mode'", | ||
| ) | ||
| assert.ok( | ||
| globalThis.provider.messages.some( | ||
| ({ type, text }) => | ||
| type === "say" && (text?.includes("technical planning") || text?.includes("technical leader")), | ||
| ), | ||
| "Did not receive expected response containing 'I am Roo in Architect mode, specializing in analyzing codebases'", | ||
| //Start Grading Portion of test to grade the response from 1 to 10 | ||
| await globalThis.provider.updateGlobalState("mode", "Ask") | ||
| let output = globalThis.provider.messages.map(({ type, text }) => (type === "say" ? text : "")).join("\n") | ||
| await globalThis.api.startNewTask( | ||
| `Given this prompt: ${testPrompt} grade the response from 1 to 10 in the format of "Grade: (1-10)": ${output} \n Be sure to say 'I AM DONE GRADING' after the task is complete`, | ||
| ) | ||
|
|
||
| assert.ok( | ||
| globalThis.provider.messages.some( | ||
| ({ type, text }) => type === "say" && text?.includes(`"request":"[switch_mode to 'ask' because:`), | ||
| ), | ||
| "Did not receive expected response containing 'Roo wants to switch to ask mode'", | ||
| ) | ||
| assert.ok( | ||
| globalThis.provider.messages.some( | ||
| ({ type, text }) => | ||
| type === "say" && (text?.includes("technical knowledge") || text?.includes("technical assist")), | ||
| ), | ||
| "Did not receive expected response containing 'I am Roo in Ask mode, specializing in answering questions'", | ||
| ) | ||
| startTime = Date.now() | ||
|
|
||
| while (Date.now() - startTime < timeout) { | ||
| const messages = globalThis.provider.messages | ||
|
|
||
| if ( | ||
| messages.some( | ||
| ({ type, text }) => | ||
| type === "say" && text?.includes("I AM DONE GRADING") && !text?.includes("be sure to say"), | ||
| ) | ||
| ) { | ||
| break | ||
| } | ||
|
|
||
| await new Promise((resolve) => setTimeout(resolve, interval)) | ||
| } | ||
| if (globalThis.provider.messages.length === 0) { | ||
| assert.fail("No messages received") | ||
| } | ||
| globalThis.provider.messages.forEach(({ type, text }) => { | ||
| if (type === "say" && text?.includes("Grade:")) { | ||
| console.log(text) | ||
| } | ||
| }) | ||
| const gradeMessage = globalThis.provider.messages.find( | ||
| ({ type, text }) => type === "say" && !text?.includes("Grade: (1-10)") && text?.includes("Grade:"), | ||
|
Collaborator
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Nitpick, but maybe could use a regex to pull out the score?
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Added regex to look for the grade, it is still a little fuzzy given the variability of the response from the LLMs
Collaborator
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Maybe something like this would be a little more DRY?
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. updated |
||
| )?.text | ||
| const gradeMatch = gradeMessage?.match(/Grade: (\d+)/) | ||
| const gradeNum = gradeMatch ? parseInt(gradeMatch[1]) : undefined | ||
| assert.ok(gradeNum !== undefined && gradeNum >= 7 && gradeNum <= 10, "Grade must be between 7 and 10") | ||
| } finally { | ||
| } | ||
| }) | ||
|
|
||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -22,16 +22,19 @@ suite("Roo Code Task", () => { | |
| await new Promise((resolve) => setTimeout(resolve, interval)) | ||
| } | ||
|
|
||
| await globalThis.provider.updateGlobalState("mode", "Code") | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Consider adding cleanup logic for the global state changes to prevent side effects on other tests. |
||
| await globalThis.provider.updateGlobalState("alwaysAllowModeSwitch", true) | ||
| await globalThis.provider.updateGlobalState("autoApprovalEnabled", true) | ||
|
|
||
| await globalThis.api.startNewTask("Hello world, what is your name? Respond with 'My name is ...'") | ||
|
|
||
| // Wait for task to appear in history with tokens. | ||
| startTime = Date.now() | ||
|
|
||
| while (Date.now() - startTime < timeout) { | ||
| const state = await globalThis.provider.getState() | ||
| const task = state.taskHistory?.[0] | ||
| const messages = globalThis.provider.messages | ||
|
|
||
| if (task && task.tokensOut > 0) { | ||
| if (messages.some(({ type, text }) => type === "say" && text?.includes("My name is Roo"))) { | ||
| break | ||
| } | ||
|
|
||
|
|
||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it might be better to use handleModeSwitch so it does the associated api config switch etc (in case we wanted to use another model to evaluate this someday)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can explore using that function. It wouldn't be difficult to also swap the model in a single line as well since we set that at the beginning of the test run in
index.tsusing the same mechanism.Ideally this "Grading" portion of the test becomes a helper function that any test can call with a prompt and response/output and that handles all the necessary Roo Code settings configurations for the grading.