Skip to content

Conversation

@ColemanRoo
Copy link
Contributor

@ColemanRoo ColemanRoo commented Feb 11, 2025

Description

Changing how we determine if modes.test.ts passes. We now use the Ask mode of Roo Code to determine if the output from a previous task is correct.
Modified the index.ts file to have the option to use an explicit list of test files to make it easier to run 1 test at time locally.
Also added documentation for VS Code Integration Tests
Future Enhancement: Use a different model to grade the response instead of the same model.

Test Procedure

Tested the new grading assertion locally.

Type of Change

  • 🐛 Bug fix (non-breaking change which fixes an issue)
  • ✨ New feature (non-breaking change which adds functionality)
  • 💥 Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • 📚 Documentation update

Pre-flight Checklist

  • Changes are limited to a single feature, bugfix or chore (split larger changes into separate PRs)
  • Tests are passing (npm test) and code is formatted and linted (npm run format && npm run lint)
  • I have created a changeset using npm run changeset (required for user-facing changes)
  • I have reviewed contributor guidelines

Screenshots

n/a

Additional Notes

n/a


Important

Add grading mechanism for VSCode test responses using Roo Code's Ask mode and document integration test setup.

  • Behavior:
    • Use Ask mode of Roo Code to grade test responses in modes.test.ts.
    • Modify index.ts to allow running specific test files locally.
  • Documentation:
    • Add VSCODE_INTEGRATION_TESTS.md for VSCode integration test setup.
  • Tests:
    • Update modes.test.ts to include grading logic for mode switching responses.
    • Update task.test.ts to ensure correct handling of prompt and response.

This description was created by Ellipsis for 4886cb7. It will automatically update as commits are pushed.

Add documentation for VSCode Integration Tests
Use explicit list of tests instead of searching ot make it easier to run 1 test at time locally.
@changeset-bot
Copy link

changeset-bot bot commented Feb 11, 2025

⚠️ No Changeset found

Latest commit: 4886cb7

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.

This PR includes no changesets

When changesets are added to this PR, you'll see the packages that this PR includes changesets for and the associated semver types

Click here to learn what changesets are, and how to add one.

Click here if you're a maintainer who wants to add a changeset to this PR

try {
// Find all test files
const files = await glob("**/**.test.js", { cwd: testsRoot })
//const files = await glob("**/**.test.js", { cwd: testsRoot } leaving this commented out for now since we only have three tests
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does it hurt to keep this? I can imagine someone getting confused if they want to add more tests and they don't know about this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It doesn't necessarily hurt to keep this, but it makes getting set up to run a single test locally more annoying. I made sure that in the VSCODE_INTEGRATION_TESTS.md there are explicit instructions around that part. Also this allows us to potentially quickly turn a test on or off by removing it from the list without having to move or delete the test.

This set up with a list of tests is similar to how we do things in Roo Automation Mobile, but happy to swap back to a single line if that's what makes sense.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I think it's more intuitive to just have it run all tests without needing to add them to this list. You can always add .skip or .only to control which ones are run.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I made the change to have all tests run by default, but I am leaving the array of test files commented out with instructions as it is way easier to modify the index.ts file to only run the single test someone is working on than having to go to each test they don't want to run and add skip. I don't believe the .only function works for this setup.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

.only seems to work for me. What do you see when you try?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was worried that the runner was going to go in order so, if it grabbed the test that didn't have .only first it would still run it, but it seems like it is smart enough. I'll remove the list from index.ts and update the doc

({ type, text }) => type === "say" && text?.includes("software engineer"),
),
"Did not receive expected response containing 'I am Roo in Code mode, specializing in software engineering'",
await globalThis.provider.updateGlobalState("mode", "Ask")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it might be better to use handleModeSwitch so it does the associated api config switch etc (in case we wanted to use another model to evaluate this someday)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can explore using that function. It wouldn't be difficult to also swap the model in a single line as well since we set that at the beginning of the test run in index.ts using the same mechanism.

Ideally this "Grading" portion of the test becomes a helper function that any test can call with a prompt and response/output and that handles all the necessary Roo Code settings configurations for the grading.

await new Promise((resolve) => setTimeout(resolve, interval))
}

await globalThis.provider.updateGlobalState("mode", "Code")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider adding cleanup logic for the global state changes to prevent side effects on other tests.

}
})
const grade = globalThis.provider.messages.find(
({ type, text }) => type === "say" && !text?.includes("Grade: (1-10)") && text?.includes("Grade:"),
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nitpick, but maybe could use a regex to pull out the score?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added regex to look for the grade, it is still a little fuzzy given the variability of the response from the LLMs

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe something like this would be a little more DRY?

			const gradeMessage = globalThis.provider.messages.find(
				({ type, text }) => type === "say" && !text?.includes("Grade: (1-10)") && text?.includes("Grade:"),
			)?.text
			const gradeMatch = gradeMessage?.match(/Grade: (\d+)/)
			const gradeNum = gradeMatch ? parseInt(gradeMatch[1]) : undefined
			assert.ok(
				gradeNum !== undefined && gradeNum >= 7 && gradeNum <= 10,
				"Grade must be between 7 and 10",
			)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated

Add regex look for grade in in modes.test.ts
@ColemanRoo ColemanRoo requested a review from mrubens February 13, 2025 14:47
Update documentation
Update modes.test.ts to use PR comment suggestion for pulling out reponse grade
npm run test:integration
```

3. If you want to run a specific test, you can use the `test.only` function in the test file. This will run only the test you specify and ignore the others.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider adding a note that test.only should be removed before committing to ensure full test coverage on CI.

Suggested change
3. If you want to run a specific test, you can use the `test.only` function in the test file. This will run only the test you specify and ignore the others.
3. If you want to run a specific test, you can use the `test.only` function in the test file. This will run only the test you specify and ignore the others. Remember to remove `test.only` before committing to ensure full test coverage on CI.

@ColemanRoo ColemanRoo merged commit 1cea41d into main Feb 13, 2025
6 checks passed
@ColemanRoo ColemanRoo deleted the feature/gradingTestResponse branch February 13, 2025 22:30
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants