Merged
Changes from 7 commits
156 changes: 156 additions & 0 deletions src/test/VSCODE_INTEGRATION_TESTS.md
@@ -0,0 +1,156 @@
# VSCode Integration Tests

This document describes the integration test setup for the Roo Code VSCode extension.

## Overview

The integration tests use the `@vscode/test-electron` package to run tests in a real VSCode environment. These tests verify that the extension works correctly within VSCode, including features like mode switching, webview interactions, and API communication.

## Test Setup

### Directory Structure

```
src/test/
├── runTest.ts # Main test runner
├── suite/
│ ├── index.ts # Test suite configuration
│ ├── modes.test.ts # Mode switching tests
│ ├── task.test.ts # Task execution tests
│ └── extension.test.ts # Extension activation tests
```

### Test Runner Configuration

The test runner (`runTest.ts`) is responsible for:

- Setting up the extension development path
- Configuring the test environment
- Running the integration tests using `@vscode/test-electron`

### Environment Setup

1. Create a `.env.integration` file in the root directory with required environment variables:

```
OPENROUTER_API_KEY=sk-or-v1-...
```

2. The test suite (`suite/index.ts`) configures:

- Mocha test framework with TDD interface
- 10-minute timeout for LLM communication
- Global extension API access
- WebView panel setup
- OpenRouter API configuration
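
As a rough illustration of step 1, a `KEY=VALUE` file such as `.env.integration` can be parsed into a plain object before the suite starts. This is a minimal sketch, not the project's actual loader (which may well rely on a package such as `dotenv`); `parseEnvFile` is a hypothetical name and the key value shown is a placeholder.

```typescript
// Hypothetical helper: parse the text of a KEY=VALUE file such as .env.integration.
// Assumptions: one assignment per line, `#` starts a comment line, no quoting rules.
function parseEnvFile(contents: string): Record<string, string> {
	const result: Record<string, string> = {}
	for (const rawLine of contents.split(/\r?\n/)) {
		const line = rawLine.trim()
		if (line === "" || line.startsWith("#")) continue
		const eq = line.indexOf("=")
		if (eq <= 0) continue // skip lines that have no key
		const key = line.slice(0, eq).trim()
		const value = line.slice(eq + 1).trim()
		result[key] = value
	}
	return result
}

// Placeholder contents; a real file would hold an actual OpenRouter key.
const env = parseEnvFile("# integration settings\nOPENROUTER_API_KEY=sk-or-v1-example\n")
if (!env.OPENROUTER_API_KEY) {
	throw new Error("OPENROUTER_API_KEY is required for the integration tests")
}
```

Failing fast on a missing key keeps the error close to its cause instead of surfacing later as a test that times out while waiting on the LLM.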

## Test Suite Structure

Tests are organized using Mocha's TDD interface (`suite` and `test` functions). The main test files are:

- `modes.test.ts`: Tests mode switching functionality
- `task.test.ts`: Tests task execution
- `extension.test.ts`: Tests extension activation

### Global Objects

The following global objects are available in tests:

```typescript
declare global {
var api: ClineAPI
var provider: ClineProvider
var extension: vscode.Extension<ClineAPI>
var panel: vscode.WebviewPanel
}
```

## Running Tests

1. Ensure you have the required environment variables set in `.env.integration`

2. Run the integration tests:

```bash
npm run test:integration
```

3. If you want to run a specific test, you can use the `test.only` function in the test file. This will run only the test you specify and ignore the others.
Contributor

Consider adding a note that `test.only` should be removed before committing to ensure full test coverage on CI.

Suggested change
3. If you want to run a specific test, you can use the `test.only` function in the test file. This will run only the test you specify and ignore the others.
3. If you want to run a specific test, you can use the `test.only` function in the test file. This will run only the test you specify and ignore the others. Remember to remove `test.only` before committing to ensure full test coverage on CI.


The tests will:

- Download and launch a clean VSCode instance
- Install the extension
- Execute the test suite
- Report results

## Writing New Tests

When writing new integration tests:

1. Create a new test file in `src/test/suite/` with the `.test.ts` extension

2. Structure your tests using the TDD interface:

```typescript
import * as assert from "assert"
import * as vscode from "vscode"

suite("Your Test Suite Name", () => {
test("Should do something specific", async function () {
// Your test code here
})
})
```

3. Use the global objects (`api`, `provider`, `extension`, `panel`) to interact with the extension

### Best Practices

1. **Timeouts**: Define an explicit timeout and polling interval (in milliseconds) for async operations:

```typescript
const timeout = 30000 // give up after 30 seconds
const interval = 1000 // poll once per second
```

2. **State Management**: Reset extension state before/after tests:

```typescript
await globalThis.provider.updateGlobalState("mode", "Ask")
await globalThis.provider.updateGlobalState("alwaysAllowModeSwitch", true)
```

3. **Assertions**: Use clear assertions with meaningful messages:

```typescript
assert.ok(condition, "Descriptive message about what failed")
```

4. **Error Handling**: Wrap test code in try/finally blocks so resources are cleaned up even when an assertion fails:

```typescript
try {
// Test code
} finally {
// Cleanup code
}
```

5. **Wait for Operations**: Use polling when waiting for async operations:

```typescript
let startTime = Date.now()
while (Date.now() - startTime < timeout) {
if (condition) break
await new Promise((resolve) => setTimeout(resolve, interval))
}
```

6. **Grading**: When asking the model to grade a response, request the `Grade: (1-10)` format so the test can extract the score reliably (see `modes.test.ts` for an example):

```typescript
await globalThis.api.startNewTask(
`Given this prompt: ${testPrompt} grade the response from 1 to 10 in the format of "Grade: (1-10)": ${output} \n Be sure to say 'I AM DONE GRADING' after the task is complete`,
)
```
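
The polling pattern from item 5 could be wrapped in a small reusable helper. The following is a minimal sketch under that assumption; `waitFor` is a hypothetical name, not something the extension or Mocha provides.

```typescript
// Hypothetical polling helper; `waitFor` is not part of the extension's API.
// Resolves to true as soon as `condition` holds, or to false once `timeout` ms have passed.
async function waitFor(
	condition: () => boolean | Promise<boolean>,
	{ timeout = 30000, interval = 1000 }: { timeout?: number; interval?: number } = {},
): Promise<boolean> {
	const startTime = Date.now()
	while (Date.now() - startTime < timeout) {
		if (await condition()) return true
		await new Promise<void>((resolve) => setTimeout(resolve, interval))
	}
	// One last check so a condition that became true during the final sleep is not missed.
	return await condition()
}

// Usage: poll a counter every 10 ms until it reaches 3, giving up after 1 second.
let polls = 0
const reached = waitFor(() => ++polls >= 3, { timeout: 1000, interval: 10 })
```

Returning a boolean rather than throwing leaves the choice of failure message to the calling test, for example `assert.ok(await waitFor(...), "Task never finished")`.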
86 changes: 44 additions & 42 deletions src/test/suite/modes.test.ts
@@ -5,7 +5,8 @@ suite("Roo Code Modes", () => {
test("Should handle switching modes correctly", async function () {
const timeout = 30000
const interval = 1000

const testPrompt =
"For each mode (Code, Architect, Ask) respond with the mode name and what it specializes in after switching to that mode, do not start with the current mode, be sure to say 'I AM DONE' after the task is complete"
if (!globalThis.extension) {
assert.fail("Extension not found")
}
@@ -27,9 +28,7 @@ suite("Roo Code Modes", () => {
await globalThis.provider.updateGlobalState("autoApprovalEnabled", true)

// Start a new task.
await globalThis.api.startNewTask(
"For each mode (Code, Architect, Ask) respond with the mode name and what it specializes in after switching to that mode, do not start with the current mode, be sure to say 'I AM DONE' after the task is complete",
)
await globalThis.api.startNewTask(testPrompt)

// Wait for task to appear in history with tokens.
startTime = Date.now()
@@ -52,47 +51,50 @@ suite("Roo Code Modes", () => {
assert.fail("No messages received")
}

assert.ok(
globalThis.provider.messages.some(
({ type, text }) => type === "say" && text?.includes(`"request":"[switch_mode to 'code' because:`),
),
"Did not receive expected response containing 'Roo wants to switch to code mode'",
)
assert.ok(
globalThis.provider.messages.some(
({ type, text }) => type === "say" && text?.includes("software engineer"),
),
"Did not receive expected response containing 'I am Roo in Code mode, specializing in software engineering'",
)
//Log the messages to the console
globalThis.provider.messages.forEach(({ type, text }) => {
if (type === "say") {
console.log(text)
}
})

assert.ok(
globalThis.provider.messages.some(
({ type, text }) =>
type === "say" && text?.includes(`"request":"[switch_mode to 'architect' because:`),
),
"Did not receive expected response containing 'Roo wants to switch to architect mode'",
)
assert.ok(
globalThis.provider.messages.some(
({ type, text }) =>
type === "say" && (text?.includes("technical planning") || text?.includes("technical leader")),
),
"Did not receive expected response containing 'I am Roo in Architect mode, specializing in analyzing codebases'",
//Start Grading Portion of test to grade the response from 1 to 10
await globalThis.provider.updateGlobalState("mode", "Ask")
Collaborator

I think it might be better to use `handleModeSwitch` so it also does the associated API config switch (in case we want to use another model to evaluate this someday).

Contributor Author

I can explore using that function. It also wouldn't be difficult to swap the model in a single line, since we set it at the beginning of the test run in `index.ts` using the same mechanism.

Ideally this "Grading" portion of the test becomes a helper function that any test can call with a prompt and response/output and that handles all the necessary Roo Code settings configurations for the grading.
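
A rough sketch of what such a helper could look like, assuming the extension calls are passed in through a small interface. `GradingDeps` and `gradeResponse` are hypothetical names, and the sketch only relies on the calls already visible in this diff (`startNewTask`, `updateGlobalState`, `provider.messages`). The completion check here is case-insensitive because the grading prompt spells the phrase "Be sure to say" with a capital B.

```typescript
// Shape of the messages read in the test (only the fields used here).
interface SayMessage {
	type: string
	text?: string
}

// Hypothetical seam: the three things the grading step needs from the extension.
interface GradingDeps {
	setMode(mode: string): Promise<void> // e.g. wraps provider.updateGlobalState("mode", mode)
	startNewTask(prompt: string): Promise<void> // e.g. wraps api.startNewTask(prompt)
	getMessages(): SayMessage[] // e.g. returns provider.messages
}

// True for the model's completion message, false for the echoed prompt.
const isDoneGrading = ({ type, text }: SayMessage): boolean =>
	type === "say" &&
	text !== undefined &&
	text.includes("I AM DONE GRADING") &&
	!text.toLowerCase().includes("be sure to say")

async function gradeResponse(
	deps: GradingDeps,
	testPrompt: string,
	output: string,
	timeout = 30000,
	interval = 1000,
): Promise<number | undefined> {
	await deps.setMode("Ask")
	await deps.startNewTask(
		`Given this prompt: ${testPrompt} grade the response from 1 to 10 in the format of "Grade: (1-10)": ${output} \n Be sure to say 'I AM DONE GRADING' after the task is complete`,
	)

	// Poll until the grader reports completion or the timeout elapses.
	const startTime = Date.now()
	while (Date.now() - startTime < timeout) {
		if (deps.getMessages().some(isDoneGrading)) break
		await new Promise<void>((resolve) => setTimeout(resolve, interval))
	}

	// Skip the echoed prompt, which contains the literal "Grade: (1-10)".
	const gradeText = deps
		.getMessages()
		.find(
			({ type, text }) =>
				type === "say" && text !== undefined && !text.includes("Grade: (1-10)") && text.includes("Grade:"),
		)?.text
	const match = gradeText?.match(/Grade: (\d+)/)
	return match ? parseInt(match[1], 10) : undefined
}

// Usage with an in-memory fake instead of a live extension.
const fakeMessages: SayMessage[] = []
const fakeDeps: GradingDeps = {
	setMode: async () => {},
	startNewTask: async (prompt) => {
		fakeMessages.push({ type: "say", text: prompt }) // the echoed prompt
		fakeMessages.push({ type: "say", text: "Grade: 8\nI AM DONE GRADING" })
	},
	getMessages: () => fakeMessages,
}
const grade = gradeResponse(fakeDeps, "Name each mode", "Code, Architect, Ask", 1000, 10)
```

Passing the dependencies in keeps the helper testable without launching VSCode, and a call to `handleModeSwitch` could replace the `setMode` wrapper later without touching the grading logic.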

let output = globalThis.provider.messages.map(({ type, text }) => (type === "say" ? text : "")).join("\n")
await globalThis.api.startNewTask(
`Given this prompt: ${testPrompt} grade the response from 1 to 10 in the format of "Grade: (1-10)": ${output} \n Be sure to say 'I AM DONE GRADING' after the task is complete`,
)

assert.ok(
globalThis.provider.messages.some(
({ type, text }) => type === "say" && text?.includes(`"request":"[switch_mode to 'ask' because:`),
),
"Did not receive expected response containing 'Roo wants to switch to ask mode'",
)
assert.ok(
globalThis.provider.messages.some(
({ type, text }) =>
type === "say" && (text?.includes("technical knowledge") || text?.includes("technical assist")),
),
"Did not receive expected response containing 'I am Roo in Ask mode, specializing in answering questions'",
)
startTime = Date.now()

while (Date.now() - startTime < timeout) {
const messages = globalThis.provider.messages

if (
messages.some(
({ type, text }) =>
type === "say" && text?.includes("I AM DONE GRADING") && !text?.toLowerCase().includes("be sure to say"),
)
) {
break
}

await new Promise((resolve) => setTimeout(resolve, interval))
}
if (globalThis.provider.messages.length === 0) {
assert.fail("No messages received")
}
globalThis.provider.messages.forEach(({ type, text }) => {
if (type === "say" && text?.includes("Grade:")) {
console.log(text)
}
})
const gradeMessage = globalThis.provider.messages.find(
({ type, text }) => type === "say" && !text?.includes("Grade: (1-10)") && text?.includes("Grade:"),
Collaborator

Nitpick, but maybe could use a regex to pull out the score?

Contributor Author

Added a regex to look for the grade; it is still a little fuzzy given the variability of LLM responses.

Collaborator

Maybe something like this would be a little more DRY?

			const gradeMessage = globalThis.provider.messages.find(
				({ type, text }) => type === "say" && !text?.includes("Grade: (1-10)") && text?.includes("Grade:"),
			)?.text
			const gradeMatch = gradeMessage?.match(/Grade: (\d+)/)
			const gradeNum = gradeMatch ? parseInt(gradeMatch[1]) : undefined
			assert.ok(
				gradeNum !== undefined && gradeNum >= 7 && gradeNum <= 10,
				"Grade must be between 7 and 10",
			)

Contributor Author

updated
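
For the fuzziness mentioned above, a slightly more tolerant extraction is one option. This is only a sketch: `extractGrade` is a hypothetical name, and the pattern is an assumption about how responses vary (markdown bold, extra spaces, a trailing "/10"), not something observed in this PR.

```typescript
// Hypothetical helper: pull a 1-10 score out of a grader response.
// Tolerates "Grade: 8", "**Grade:** 9/10" and "grade:  10"; ignores the
// echoed prompt text "Grade: (1-10)" because no digit directly follows it.
function extractGrade(text: string): number | undefined {
	const match = text.match(/grade:\s*\**\s*(\d{1,2})(?!\d)/i)
	if (!match) return undefined
	const grade = parseInt(match[1], 10)
	// Reject values outside the scale the prompt asked for.
	return grade >= 1 && grade <= 10 ? grade : undefined
}

// Mirrors the threshold asserted in the test: a grade of 7 to 10 passes.
function isPassingGrade(grade: number | undefined, minimum = 7): boolean {
	return grade !== undefined && grade >= minimum && grade <= 10
}
```

With this shape the assertion in the test could become `assert.ok(isPassingGrade(extractGrade(text)), "Grade must be between 7 and 10")`.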

)?.text
const gradeMatch = gradeMessage?.match(/Grade: (\d+)/)
const gradeNum = gradeMatch ? parseInt(gradeMatch[1]) : undefined
assert.ok(gradeNum !== undefined && gradeNum >= 7 && gradeNum <= 10, "Grade must be between 7 and 10")
} finally {
}
})
9 changes: 6 additions & 3 deletions src/test/suite/task.test.ts
@@ -22,16 +22,19 @@ suite("Roo Code Task", () => {
await new Promise((resolve) => setTimeout(resolve, interval))
}

await globalThis.provider.updateGlobalState("mode", "Code")
Contributor

Consider adding cleanup logic for the global state changes to prevent side effects on other tests.
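
One possible shape for that cleanup, as a sketch: snapshot the affected keys, apply the overrides, and restore the snapshot in a `finally` block. `withGlobalState` and `GlobalStateStore` are hypothetical names. `updateGlobalState` is the call used in these tests, while `getGlobalState` is an assumption about how the previous value could be read back.

```typescript
// Minimal view of the provider used here. `updateGlobalState` appears in the tests;
// `getGlobalState` is assumed for reading a value back.
interface GlobalStateStore {
	getGlobalState(key: string): Promise<unknown>
	updateGlobalState(key: string, value: unknown): Promise<void>
}

// Hypothetical helper: apply overrides, run the test body, then restore the previous values.
async function withGlobalState<T>(
	store: GlobalStateStore,
	overrides: Record<string, unknown>,
	body: () => Promise<T>,
): Promise<T> {
	const previous: Record<string, unknown> = {}
	for (const key of Object.keys(overrides)) {
		previous[key] = await store.getGlobalState(key)
	}
	try {
		for (const [key, value] of Object.entries(overrides)) {
			await store.updateGlobalState(key, value)
		}
		return await body()
	} finally {
		// Runs whether the body passed or threw, so later tests see the original state.
		for (const [key, value] of Object.entries(previous)) {
			await store.updateGlobalState(key, value)
		}
	}
}

// Usage with an in-memory fake store standing in for the provider.
const memory = new Map<string, unknown>([["mode", "Ask"]])
const fakeStore: GlobalStateStore = {
	getGlobalState: async (key) => memory.get(key),
	updateGlobalState: async (key, value) => {
		memory.set(key, value)
	},
}
const modeSeenInside = withGlobalState(
	fakeStore,
	{ mode: "Code", autoApprovalEnabled: true },
	async () => memory.get("mode"),
)
```

Mocha's `setup` and `teardown` hooks (the TDD-interface equivalents of `beforeEach` and `afterEach`) would be another reasonable place for the same snapshot-and-restore logic.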

await globalThis.provider.updateGlobalState("alwaysAllowModeSwitch", true)
await globalThis.provider.updateGlobalState("autoApprovalEnabled", true)

await globalThis.api.startNewTask("Hello world, what is your name? Respond with 'My name is ...'")

// Wait for task to appear in history with tokens.
startTime = Date.now()

while (Date.now() - startTime < timeout) {
const state = await globalThis.provider.getState()
const task = state.taskHistory?.[0]
const messages = globalThis.provider.messages

if (task && task.tokensOut > 0) {
if (messages.some(({ type, text }) => type === "say" && text?.includes("My name is Roo"))) {
break
}
