Merged
160 changes: 160 additions & 0 deletions src/test/VSCODE_INTEGRATION_TESTS.md
@@ -0,0 +1,160 @@
# VSCode Integration Tests

This document describes the integration test setup for the Roo Code VSCode extension.

## Overview

The integration tests use the `@vscode/test-electron` package to run tests in a real VSCode environment. These tests verify that the extension works correctly within VSCode, including features like mode switching, webview interactions, and API communication.

## Test Setup

### Directory Structure

```
src/test/
├── runTest.ts              # Main test runner
├── suite/
│   ├── index.ts            # Test suite configuration
│   ├── modes.test.ts       # Mode switching tests
│   ├── task.test.ts        # Task execution tests
│   └── extension.test.ts   # Extension activation tests
```

### Test Runner Configuration

The test runner (`runTest.ts`) is responsible for:

- Setting up the extension development path
- Configuring the test environment
- Running the integration tests using `@vscode/test-electron`

### Environment Setup

1. Create a `.env.integration` file in the root directory with required environment variables:

```
OPENROUTER_API_KEY=sk-or-v1-...
```

2. The test suite (`suite/index.ts`) configures:

- Mocha test framework with TDD interface
- 10-minute timeout for LLM communication
- Global extension API access
- WebView panel setup
- OpenRouter API configuration

## Test Suite Structure

Tests are organized using Mocha's TDD interface (`suite` and `test` functions). The main test files are:

- `modes.test.ts`: Tests mode switching functionality
- `task.test.ts`: Tests task execution
- `extension.test.ts`: Tests extension activation

### Global Objects

The following global objects are available in tests:

```typescript
declare global {
	var api: ClineAPI
	var provider: ClineProvider
	var extension: vscode.Extension<ClineAPI>
	var panel: vscode.WebviewPanel
}
```

## Running Tests

1. Ensure you have the required environment variables set in `.env.integration`

2. Run the integration tests:

```bash
npm run test:integration
```

The tests will:

- Download and launch a clean VSCode instance
- Install the extension
- Execute the test suite
- Report results

## Writing New Tests

When writing new integration tests:

1. Create a new test file in `src/test/suite/` with the `.test.ts` extension

2. Add the test file to the `files` array in `suite/index.ts` (you can temporarily comment out the other tests to run just the new test):

```typescript
const files = ["suite/modes.test.js", "suite/task.test.js", "suite/extension.test.js", "suite/your-new-test.test.js"]
```

3. Structure your tests using the TDD interface:

```typescript
import * as assert from "assert"
import * as vscode from "vscode"

suite("Your Test Suite Name", () => {
	test("Should do something specific", async function () {
		// Your test code here
	})
})
```

4. Use the global objects (`api`, `provider`, `extension`, `panel`) to interact with the extension

### Best Practices

1. **Timeouts**: Use appropriate timeouts for async operations:

```typescript
const timeout = 30000
const interval = 1000
```

2. **State Management**: Reset extension state before/after tests:

```typescript
await globalThis.provider.updateGlobalState("mode", "Ask")
await globalThis.provider.updateGlobalState("alwaysAllowModeSwitch", true)
```

3. **Assertions**: Use clear assertions with meaningful messages:

```typescript
assert.ok(condition, "Descriptive message about what failed")
```

4. **Error Handling**: Wrap test code in try/finally blocks so that resources are cleaned up even when an assertion fails:

```typescript
try {
	// Test code
} finally {
	// Cleanup code
}
```
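
The state-management and cleanup advice above can be combined into a single helper. The following is only a minimal sketch, not code from the extension: `withTemporaryState`, the `StateStore` interface, and the in-memory `FakeStore` are hypothetical names introduced for illustration, and the only method assumed from the real provider is `updateGlobalState(key, value)`.

```typescript
// Minimal shape assumed for illustration; the real provider exposes more than this.
interface StateStore {
	updateGlobalState(key: string, value: unknown): Promise<void>
}

// Applies `overrides`, runs `body`, then always writes `restore` back,
// even if applying the overrides or running `body` throws.
async function withTemporaryState<T>(
	store: StateStore,
	overrides: Record<string, unknown>,
	restore: Record<string, unknown>,
	body: () => Promise<T>,
): Promise<T> {
	try {
		for (const [key, value] of Object.entries(overrides)) {
			await store.updateGlobalState(key, value)
		}
		return await body()
	} finally {
		for (const [key, value] of Object.entries(restore)) {
			await store.updateGlobalState(key, value)
		}
	}
}

// In-memory stand-in (hypothetical) so the sketch runs outside VSCode.
class FakeStore implements StateStore {
	readonly values = new Map<string, unknown>()

	async updateGlobalState(key: string, value: unknown): Promise<void> {
		this.values.set(key, value)
	}
}
```

In an integration test, `store` would presumably be `globalThis.provider`, and `restore` would hold whatever values the next test expects to find.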

5. **Wait for Operations**: Use polling when waiting for async operations:

```typescript
let startTime = Date.now()
while (Date.now() - startTime < timeout) {
	if (condition) break
	await new Promise((resolve) => setTimeout(resolve, interval))
}
```
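
The polling loop above exits silently when the timeout elapses, so a test cannot tell whether the condition was ever met. One way to make that explicit is to wrap the loop in a helper that reports the outcome. This is a sketch under assumptions: `waitUntil` is a hypothetical name, and the defaults simply reuse the 30-second timeout and 1-second interval shown earlier.

```typescript
// Polls `condition` every `intervalMs` until it returns true or `timeoutMs` elapses.
// Resolves to true if the condition was met, false if the timeout was reached first.
async function waitUntil(
	condition: () => boolean | Promise<boolean>,
	timeoutMs = 30000,
	intervalMs = 1000,
): Promise<boolean> {
	const startTime = Date.now()
	while (Date.now() - startTime < timeoutMs) {
		if (await condition()) {
			return true
		}
		await new Promise<void>((resolve) => setTimeout(resolve, intervalMs))
	}
	return false
}
```

A test could then assert on the return value, for example `assert.ok(await waitUntil(() => done), "Timed out waiting for the task")`, where `done` stands for whatever flag or message check the test uses.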

6. **Grading**: When a test asks the model to grade a response, request the `Grade:` format so that the score can be located reliably in the output (see `modes.test.ts` for an example):

```typescript
await globalThis.api.startNewTask(
	`Given this prompt: ${testPrompt} grade the response from 1 to 10 in the format of "Grade: (1-10)": ${output} \n Be sure to say 'I AM DONE GRADING' after the task is complete`,
)
```
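
Once the grading task has replied, the score still has to be pulled out of free-form text. A rough sketch of that step follows. `parseGrade` and `isPassingGrade` are hypothetical helper names, and the 7-to-10 passing range is an assumption taken from the range accepted in `modes.test.ts`.

```typescript
// Extracts the first numeric grade from text such as "Grade: 8".
// The prompt's own placeholder "Grade: (1-10)" does not match, because "(" is not a digit.
function parseGrade(text: string): number | undefined {
	const match = text.match(/Grade:\s*(\d+)/)
	return match ? parseInt(match[1], 10) : undefined
}

// Returns true only when a grade was found and lies within [minimum, maximum].
function isPassingGrade(grade: number | undefined, minimum = 7, maximum = 10): boolean {
	return grade !== undefined && grade >= minimum && grade <= maximum
}
```

Because LLM output varies, a pattern like this remains somewhat fuzzy. It only helps when the model actually follows the requested `Grade:` format.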
3 changes: 2 additions & 1 deletion src/test/suite/index.ts
@@ -23,7 +23,8 @@ export async function run(): Promise<void> {

try {
// Find all test files
const files = await glob("**/**.test.js", { cwd: testsRoot })
//const files = await glob("**/**.test.js", { cwd: testsRoot } leaving this commented out for now since we only have three tests
Collaborator

Does it hurt to keep this? I can imagine someone getting confused if they want to add more tests and they don't know about this.

Contributor Author

It doesn't necessarily hurt to keep this, but it makes getting set up to run a single test locally more annoying. I made sure that in the VSCODE_INTEGRATION_TESTS.md there are explicit instructions around that part. Also this allows us to potentially quickly turn a test on or off by removing it from the list without having to move or delete the test.

This setup with a list of tests is similar to how we do things in Roo Automation Mobile, but I'm happy to swap back to a single line if that's what makes sense.

Collaborator

Yeah I think it's more intuitive to just have it run all tests without needing to add them to this list. You can always add .skip or .only to control which ones are run.

Contributor Author

I made the change to have all tests run by default, but I am leaving the array of test files commented out with instructions, since it is much easier to edit index.ts to run only the test someone is working on than to add .skip to every test they don't want to run. I don't believe .only works with this setup.

Collaborator

.only seems to work for me. What do you see when you try?

Contributor Author

I was worried that the runner would go in order, so that if it picked up a test without .only first it would still run it, but it seems to be smart enough. I'll remove the list from index.ts and update the doc.

const files = ["suite/modes.test.js", "suite/task.test.js", "suite/extension.test.js"]

// Add files to the test suite
files.forEach((f: string) => mocha.addFile(path.resolve(testsRoot, f)))
86 changes: 46 additions & 40 deletions src/test/suite/modes.test.ts
@@ -5,7 +5,8 @@ suite("Roo Code Modes", () => {
test("Should handle switching modes correctly", async function () {
const timeout = 30000
const interval = 1000

const testPrompt =
"For each mode (Code, Architect, Ask) respond with the mode name and what it specializes in after switching to that mode, do not start with the current mode, be sure to say 'I AM DONE' after the task is complete"
if (!globalThis.extension) {
assert.fail("Extension not found")
}
@@ -27,9 +28,7 @@ suite("Roo Code Modes", () => {
await globalThis.provider.updateGlobalState("autoApprovalEnabled", true)

// Start a new task.
await globalThis.api.startNewTask(
"For each mode (Code, Architect, Ask) respond with the mode name and what it specializes in after switching to that mode, do not start with the current mode, be sure to say 'I AM DONE' after the task is complete",
)
await globalThis.api.startNewTask(testPrompt)

// Wait for task to appear in history with tokens.
startTime = Date.now()
@@ -52,46 +51,53 @@ suite("Roo Code Modes", () => {
assert.fail("No messages received")
}

assert.ok(
globalThis.provider.messages.some(
({ type, text }) => type === "say" && text?.includes(`"request":"[switch_mode to 'code' because:`),
),
"Did not receive expected response containing 'Roo wants to switch to code mode'",
)
assert.ok(
globalThis.provider.messages.some(
({ type, text }) => type === "say" && text?.includes("software engineer"),
),
"Did not receive expected response containing 'I am Roo in Code mode, specializing in software engineering'",
)
//Log the messages to the console
globalThis.provider.messages.forEach(({ type, text }) => {
if (type === "say") {
console.log(text)
}
})

assert.ok(
globalThis.provider.messages.some(
({ type, text }) =>
type === "say" && text?.includes(`"request":"[switch_mode to 'architect' because:`),
),
"Did not receive expected response containing 'Roo wants to switch to architect mode'",
)
assert.ok(
globalThis.provider.messages.some(
({ type, text }) =>
type === "say" && (text?.includes("technical planning") || text?.includes("technical leader")),
),
"Did not receive expected response containing 'I am Roo in Architect mode, specializing in analyzing codebases'",
//Start Grading Portion of test to grade the response from 1 to 10
await globalThis.provider.updateGlobalState("mode", "Ask")
Collaborator

I think it might be better to use handleModeSwitch so it does the associated api config switch etc (in case we wanted to use another model to evaluate this someday)

Contributor Author

I can explore using that function. It wouldn't be difficult to also swap the model in a single line as well since we set that at the beginning of the test run in index.ts using the same mechanism.

Ideally this "Grading" portion of the test becomes a helper function that any test can call with a prompt and response/output and that handles all the necessary Roo Code settings configurations for the grading.

let output = globalThis.provider.messages.map(({ type, text }) => (type === "say" ? text : "")).join("\n")
await globalThis.api.startNewTask(
`Given this prompt: ${testPrompt} grade the response from 1 to 10 in the format of "Grade: (1-10)": ${output} \n Be sure to say 'I AM DONE GRADING' after the task is complete`,
)

startTime = Date.now()

while (Date.now() - startTime < timeout) {
const messages = globalThis.provider.messages

if (
messages.some(
({ type, text }) =>
type === "say" && text?.includes("I AM DONE GRADING") && !text?.includes("be sure to say"),
)
) {
break
}

await new Promise((resolve) => setTimeout(resolve, interval))
}
if (globalThis.provider.messages.length === 0) {
assert.fail("No messages received")
}
globalThis.provider.messages.forEach(({ type, text }) => {
if (type === "say" && text?.includes("Grade:")) {
console.log(text)
}
})
const grade = globalThis.provider.messages.find(
({ type, text }) => type === "say" && !text?.includes("Grade: (1-10)") && text?.includes("Grade:"),
Collaborator

Nitpick, but maybe could use a regex to pull out the score?

Contributor Author

Added a regex to look for the grade. It is still a little fuzzy given the variability of LLM responses.

Collaborator

Maybe something like this would be a little more DRY?

			const gradeMessage = globalThis.provider.messages.find(
				({ type, text }) => type === "say" && !text?.includes("Grade: (1-10)") && text?.includes("Grade:"),
			)?.text
			const gradeMatch = gradeMessage?.match(/Grade: (\d+)/)
			const gradeNum = gradeMatch ? parseInt(gradeMatch[1]) : undefined
			assert.ok(
				gradeNum !== undefined && gradeNum >= 7 && gradeNum <= 10,
				"Grade must be between 7 and 10",
			)

Contributor Author

updated

)?.text
assert.ok(
globalThis.provider.messages.some(
({ type, text }) => type === "say" && text?.includes(`"request":"[switch_mode to 'ask' because:`),
),
"Did not receive expected response containing 'Roo wants to switch to ask mode'",
)
assert.ok(
globalThis.provider.messages.some(
({ type, text }) =>
type === "say" && (text?.includes("technical knowledge") || text?.includes("technical assist")),
),
"Did not receive expected response containing 'I am Roo in Ask mode, specializing in answering questions'",
grade?.includes("Grade: 10") ||
grade?.includes("Grade: 9") ||
grade?.includes("Grade: 8") ||
grade?.includes("Grade: 7"),
"Did not receive expected response containing 'Grade: 10' or 'Grade: 9' or 'Grade: 8' or 'Grade: 7'",
)
} finally {
}
9 changes: 6 additions & 3 deletions src/test/suite/task.test.ts
@@ -22,16 +22,19 @@ suite("Roo Code Task", () => {
await new Promise((resolve) => setTimeout(resolve, interval))
}

await globalThis.provider.updateGlobalState("mode", "Code")
Contributor

Consider adding cleanup logic for the global state changes to prevent side effects on other tests.

await globalThis.provider.updateGlobalState("alwaysAllowModeSwitch", true)
await globalThis.provider.updateGlobalState("autoApprovalEnabled", true)

await globalThis.api.startNewTask("Hello world, what is your name? Respond with 'My name is ...'")

// Wait for task to appear in history with tokens.
startTime = Date.now()

while (Date.now() - startTime < timeout) {
const state = await globalThis.provider.getState()
const task = state.taskHistory?.[0]
const messages = globalThis.provider.messages

if (task && task.tokensOut > 0) {
if (messages.some(({ type, text }) => type === "say" && text?.includes("My name is Roo"))) {
break
}
