Feature/grading vscode test response #943

ColemanRoo · 2025-02-11T17:00:33Z

Description

Changes how we determine whether modes.test.ts passes. We now use Roo Code's Ask mode to judge whether the output of a previous task is correct.
Modified index.ts to optionally use an explicit list of test files, which makes it easier to run one test at a time locally.
Also added documentation for the VS Code integration tests.
Future enhancement: use a different model to grade the response instead of the same model.

Test Procedure

Tested the new grading assertion locally.

Type of Change

🐛 Bug fix (non-breaking change which fixes an issue)
✨ New feature (non-breaking change which adds functionality)
💥 Breaking change (fix or feature that would cause existing functionality to not work as expected)
📚 Documentation update

Pre-flight Checklist

Changes are limited to a single feature, bugfix or chore (split larger changes into separate PRs)
Tests are passing (npm test) and code is formatted and linted (npm run format && npm run lint)
I have created a changeset using npm run changeset (required for user-facing changes)
I have reviewed contributor guidelines

Screenshots

n/a

Additional Notes

n/a

Important

Add grading mechanism for VSCode test responses using Roo Code's Ask mode and document integration test setup.

Behavior:
- Use Ask mode of Roo Code to grade test responses in modes.test.ts.
- Modify index.ts to allow running specific test files locally.
Documentation:
- Add VSCODE_INTEGRATION_TESTS.md for VSCode integration test setup.
Tests:
- Update modes.test.ts to include grading logic for mode switching responses.
- Update task.test.ts to ensure correct handling of prompt and response.

This description was created automatically for 4886cb7. It will automatically update as commits are pushed.

Add documentation for VSCode Integration Tests. Use explicit list of tests instead of searching to make it easier to run 1 test at a time locally.

changeset-bot · 2025-02-11T17:00:39Z

⚠️ No Changeset found

Latest commit: 4886cb7

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.

This PR includes no changesets

When changesets are added to this PR, you'll see the packages that this PR includes changesets for and the associated semver types


src/test/suite/modes.test.ts

mrubens · 2025-02-11T17:03:50Z

src/test/suite/index.ts

 	try {
 		// Find all test files
-		const files = await glob("**/**.test.js", { cwd: testsRoot })
+		//const files = await glob("**/**.test.js", { cwd: testsRoot } leaving this commented out for now since we only have three tests


Does it hurt to keep this? I can imagine someone getting confused if they want to add more tests and they don't know about this.

It doesn't necessarily hurt to keep this, but it makes getting set up to run a single test locally more annoying. I made sure VSCODE_INTEGRATION_TESTS.md has explicit instructions for that part. Also, this lets us quickly turn a test on or off by removing it from the list, without having to move or delete the test.

This setup with a list of tests is similar to how we do things in Roo Automation Mobile, but I'm happy to swap back to a single line if that's what makes sense.

Yeah I think it's more intuitive to just have it run all tests without needing to add them to this list. You can always add .skip or .only to control which ones are run.

I made the change to have all tests run by default, but I am leaving the array of test files commented out, with instructions. It is much easier to modify index.ts to run only the single test someone is working on than to go to each test they don't want to run and add .skip. I don't believe .only works for this setup.

.only seems to work for me. What do you see when you try?

I was worried that the runner would go in order, so that if it reached a test without .only first it would still run it, but it seems to be smart enough. I'll remove the list from index.ts and update the doc.
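For anyone reading this thread later, here is a rough model of why file order does not matter for .only. It is a simplified sketch, not Mocha's actual implementation: it assumes every test file is registered before anything runs, and that the exclusive filter is applied afterwards. The type, the function name, and the test titles are made up for illustration.

```typescript
// Simplified model of exclusive-test selection; NOT Mocha's real code.
interface RegisteredTest {
	file: string
	title: string
	only: boolean // true when the test was declared with `.only`
}

// Selection happens after every file has been registered:
// if any test is marked `only`, run just those; otherwise run everything.
function selectTestsToRun(registered: RegisteredTest[]): RegisteredTest[] {
	const exclusive = registered.filter((t) => t.only)
	return exclusive.length > 0 ? exclusive : registered
}

// Hypothetical registration order: the non-exclusive test is seen first.
const registeredTests: RegisteredTest[] = [
	{ file: "task.test.js", title: "handles prompt and response", only: false },
	{ file: "modes.test.js", title: "handles switching modes", only: true },
]

const selectedTitles = selectTestsToRun(registeredTests).map((t) => t.title)
console.log(selectedTitles) // [ 'handles switching modes' ]
```

Under this model the test without .only is dropped even though it was registered first, which matches what was observed above.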

mrubens · 2025-02-11T17:05:50Z

src/test/suite/modes.test.ts

-					({ type, text }) => type === "say" && text?.includes("software engineer"),
-				),
-				"Did not receive expected response containing 'I am Roo in Code mode, specializing in software engineering'",
+			await globalThis.provider.updateGlobalState("mode", "Ask")


I think it might be better to use handleModeSwitch so it does the associated api config switch etc (in case we wanted to use another model to evaluate this someday)

I can explore using that function. It also wouldn't be difficult to swap the model in a single line, since we set it at the beginning of the test run in index.ts using the same mechanism.

Ideally this "Grading" portion of the test becomes a helper function that any test can call with a prompt and response/output and that handles all the necessary Roo Code settings configurations for the grading.
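A minimal sketch of what such a helper could look like, assuming the Roo Code specifics are hidden behind a small interface. Everything here is hypothetical: `AskModeGrader`, `buildGradingPrompt`, `gradeWithAskMode`, and the prompt wording are illustrative names, not existing Roo Code APIs. The real helper would implement `ask` by switching mode (and possibly model) and collecting the messages.

```typescript
// Hypothetical shapes: only what this sketch needs, not Roo Code's real API.
interface ReplyMessage {
	type: string
	text?: string
}

interface AskModeGrader {
	// Assumed to switch to Ask mode, send the prompt, and resolve with the replies.
	ask(prompt: string): Promise<ReplyMessage[]>
}

// Builds the grading request from the original task prompt and its output.
function buildGradingPrompt(taskPrompt: string, output: string): string {
	return [
		"Grade how well the output satisfies the prompt.",
		`Prompt: ${taskPrompt}`,
		`Output: ${output}`,
		"Reply with a line of the form 'Grade: (1-10)'.",
	].join("\n")
}

// Returns the numeric grade from the first "say" message that contains one,
// or undefined when no grade could be found.
async function gradeWithAskMode(
	grader: AskModeGrader,
	taskPrompt: string,
	output: string,
): Promise<number | undefined> {
	const replies = await grader.ask(buildGradingPrompt(taskPrompt, output))
	for (const { type, text } of replies) {
		const match = type === "say" ? text?.match(/Grade: (\d+)/) : null
		if (match) return parseInt(match[1], 10)
	}
	return undefined
}

// Usage with a canned grader standing in for the real provider.
const cannedGrader: AskModeGrader = {
	ask: async () => [{ type: "say", text: "The mode switch worked. Grade: 8" }],
}
gradeWithAskMode(cannedGrader, "Switch modes", "I am Roo in Code mode").then((g) => console.log(g)) // 8
```

Injecting the grader keeps individual tests free of settings code and would make it straightforward to point grading at a different model later.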

Modify where we log messages to the console for the test

ellipsis-dev · 2025-02-11T20:23:14Z

src/test/suite/task.test.ts

 				await new Promise((resolve) => setTimeout(resolve, interval))
 			}

+			await globalThis.provider.updateGlobalState("mode", "Code")


Consider adding cleanup logic for the global state changes to prevent side effects on other tests.
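One way to do that cleanup is to wrap the state change so the previous value is always restored. This is a sketch under assumptions: the diff only shows `updateGlobalState`, so `getGlobalState` is assumed to exist with the signature below, and `StateProvider`, `withGlobalState`, and `InMemoryProvider` are illustrative names.

```typescript
// Hypothetical minimal provider shape. Only updateGlobalState appears in the
// diff; getGlobalState is an assumption made so the old value can be captured.
interface StateProvider {
	getGlobalState(key: string): Promise<unknown>
	updateGlobalState(key: string, value: unknown): Promise<void>
}

// Sets a global state value for the duration of `body`, then restores the
// previous value even if `body` throws.
async function withGlobalState<T>(
	provider: StateProvider,
	key: string,
	value: unknown,
	body: () => Promise<T>,
): Promise<T> {
	const previous = await provider.getGlobalState(key)
	await provider.updateGlobalState(key, value)
	try {
		return await body()
	} finally {
		await provider.updateGlobalState(key, previous)
	}
}

// In-memory stand-in used only to demonstrate the helper.
class InMemoryProvider implements StateProvider {
	private state = new Map<string, unknown>()
	async getGlobalState(key: string): Promise<unknown> {
		return this.state.get(key)
	}
	async updateGlobalState(key: string, value: unknown): Promise<void> {
		this.state.set(key, value)
	}
}

async function demoRestore(): Promise<unknown> {
	const provider = new InMemoryProvider()
	await provider.updateGlobalState("mode", "Code")
	await withGlobalState(provider, "mode", "Ask", async () => {
		// Grading work would happen here while the mode is "Ask".
	})
	return provider.getGlobalState("mode")
}

demoRestore().then((mode) => console.log(mode)) // prints "Code"
```

The try/finally means a failing assertion inside the body cannot leave the next test running in the wrong mode.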

…kens used

mrubens · 2025-02-12T05:19:11Z

src/test/suite/modes.test.ts

+				}
+			})
+			const grade = globalThis.provider.messages.find(
+				({ type, text }) => type === "say" && !text?.includes("Grade: (1-10)") && text?.includes("Grade:"),


Nitpick, but maybe could use a regex to pull out the score?

Added a regex to look for the grade. It is still a little fuzzy given the variability of LLM responses.

Maybe something like this would be a little more DRY?

const gradeMessage = globalThis.provider.messages.find(
	({ type, text }) => type === "say" && !text?.includes("Grade: (1-10)") && text?.includes("Grade:"),
)?.text
const gradeMatch = gradeMessage?.match(/Grade: (\d+)/)
const gradeNum = gradeMatch ? parseInt(gradeMatch[1]) : undefined
assert.ok(
	gradeNum !== undefined && gradeNum >= 7 && gradeNum <= 10,
	"Grade must be between 7 and 10",
)
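The same idea can be pulled out of the test and checked in isolation. A minimal sketch, assuming messages have the `{ type, text }` shape used above; `ChatMessage`, `extractGrade`, and `isPassingGrade` are illustrative names, and the 7 to 10 range is taken from the suggestion. Because `\d+` cannot match the opening parenthesis in "Grade: (1-10)", the echoed prompt template is skipped without a separate exclusion check.

```typescript
interface ChatMessage {
	type: string
	text?: string
}

// Returns the first numeric grade found in a "say" message, or undefined.
// A template line such as "Grade: (1-10)" does not match because "(" is not a digit.
function extractGrade(messages: ChatMessage[]): number | undefined {
	for (const { type, text } of messages) {
		if (type !== "say" || !text) continue
		const match = text.match(/Grade:\s*(\d+)/)
		if (match) return parseInt(match[1], 10)
	}
	return undefined
}

// Passing range from the review suggestion: 7 through 10 inclusive.
function isPassingGrade(grade: number | undefined, min = 7, max = 10): boolean {
	return grade !== undefined && grade >= min && grade <= max
}

const sampleMessages: ChatMessage[] = [
	{ type: "say", text: "Please answer with Grade: (1-10)" }, // echoed template
	{ type: "ask", text: "Grade: 3" }, // wrong message type, ignored
	{ type: "say", text: "The response matches the prompt. Grade: 9" },
]

console.log(extractGrade(sampleMessages)) // 9
console.log(isPassingGrade(extractGrade(sampleMessages))) // true
```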

Add regex to look for grade in modes.test.ts

src/test/suite/modes.test.ts

Update documentation. Update modes.test.ts to use PR comment suggestion for pulling out response grade

ellipsis-dev · 2025-02-13T22:11:59Z

src/test/VSCODE_INTEGRATION_TESTS.md

+npm run test:integration
+```
+
+3. If you want to run a specific test, you can use the `test.only` function in the test file. This will run only the test you specify and ignore the others.


Consider adding a note that test.only should be removed before committing to ensure full test coverage on CI.

Suggested change

3. If you want to run a specific test, you can use the `test.only` function in the test file. This will run only the test you specify and ignore the others.

3. If you want to run a specific test, you can use the `test.only` function in the test file. This will run only the test you specify and ignore the others. Remember to remove `test.only` before committing to ensure full test coverage on CI.
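A reminder in the doc can be backed by an automated check. Mocha itself has a `--forbid-only` option that fails the run when an exclusive test is present. The sketch below is a cruder, standalone alternative that scans test source text for `.only(` calls. `findOnlyMarkers` and the sample source are hypothetical, and the pattern would also flag a `.only(` that appears inside a comment.

```typescript
// Returns the 1-based line numbers that contain an exclusive-test call
// such as test.only(...) or suite.only(...).
function findOnlyMarkers(source: string): number[] {
	const pattern = /\b(?:test|suite|it|describe)\.only\s*\(/
	return source
		.split("\n")
		.map((line, index) => (pattern.test(line) ? index + 1 : 0))
		.filter((lineNumber) => lineNumber > 0)
}

// Hypothetical test file contents.
const sampleSource = [
	'suite("Roo Code Modes", () => {',
	'\ttest.only("Should handle switching modes correctly", async () => {})',
	'\ttest("Another test", async () => {})',
	"})",
].join("\n")

console.log(findOnlyMarkers(sampleSource)) // [ 2 ]
```

A CI step could run a check like this over the test files and fail when the returned list is not empty.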

Converting modes.test.ts to use grading to measure if the test passes

d7b44a2

Add documentation for VSCode Integration Tests. Use explicit list of tests instead of searching to make it easier to run 1 test at a time locally.

ColemanRoo requested review from cte, mrubens and stea9499 as code owners February 11, 2025 17:00

ellipsis-dev bot reviewed Feb 11, 2025

src/test/suite/modes.test.ts Outdated

mrubens reviewed Feb 11, 2025


ColemanRoo added 2 commits February 11, 2025 11:06

Fix typo for test list

21f04f5

Modify where we log messages to the console for the test

update task.test.ts to run in any order with other tests

63346fb

ellipsis-dev bot reviewed Feb 11, 2025


task.test.ts fix to check for messages before assertion instead of to…

a41fb07

…kens used

mrubens reviewed Feb 12, 2025


Change index.ts to always run all tests

eb826b1

Add regex to look for grade in modes.test.ts

ellipsis-dev bot reviewed Feb 12, 2025

src/test/suite/modes.test.ts Outdated

ellipsis comment on regex format

1e279b4

ColemanRoo requested a review from mrubens February 13, 2025 14:47

Remove test files list from index.ts

937c3b6

Update documentation. Update modes.test.ts to use PR comment suggestion for pulling out response grade

ellipsis-dev bot reviewed Feb 13, 2025


ellipsis comment update to test document

4886cb7

mrubens approved these changes Feb 13, 2025


ColemanRoo merged commit 1cea41d into main Feb 13, 2025
6 checks passed

ColemanRoo deleted the feature/gradingTestResponse branch February 13, 2025 22:30


3 participants
