Our instruction files are becoming more important as we use them to guide developers to the right tools, example prompts, example code, and so on.
To test this, we need to build some code that runs our instructions in a test harness, lets us set expectations, and checks that the responses bear some relevance to the truth.
Even partial automation would be worthwhile. A fully automated setup will be difficult until Copilot has an official API.
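As a sketch of what "set some expectations" could mean concretely, here is a minimal expectation checker in TypeScript. The `Expectations` shape and the function names are hypothetical, not an existing API:

```typescript
// Hypothetical expectation format for a single prompt/response pair.
interface Expectations {
  mustContain?: string[];    // substrings the response should include
  mustNotContain?: string[]; // substrings that indicate a bad response
}

interface EvalResult {
  passed: boolean;
  failures: string[];
}

// Check a Copilot response against a set of expectations
// (case-insensitive substring matching as a first approximation).
function evaluateResponse(response: string, exp: Expectations): EvalResult {
  const failures: string[] = [];
  const text = response.toLowerCase();
  for (const s of exp.mustContain ?? []) {
    if (!text.includes(s.toLowerCase())) failures.push(`missing: ${s}`);
  }
  for (const s of exp.mustNotContain ?? []) {
    if (text.includes(s.toLowerCase())) failures.push(`forbidden: ${s}`);
  }
  return { passed: failures.length === 0, failures };
}
```

Substring matching is deliberately crude; a real harness could swap in an LLM-based grader or a proper eval framework behind the same interface.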
So:
- Use Playwright to automate input to VS Code Copilot on the web (i.e., just use a codespace)
- Feed it a prompt
- Use the VS Code Copilot chat export to get the chat as JSON
- Parse the JSON for our request, get the response, and use some form of eval framework to ensure the response makes sense. Some particular behaviors that I've seen:
- Instructions can sometimes loop in an unpredictable manner
- Instructions can hallucinate parameters to existing tools
- Instructions can be selectively followed
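The parsing/eval step above could be stubbed out along these lines. Note the `ChatTurn` shape is a simplified assumption, not the actual VS Code export schema, and the detector thresholds are placeholders:

```typescript
// Simplified, assumed shape of a VS Code Copilot chat export turn;
// the real export schema differs, so treat this as a parsing stub.
interface ToolCall {
  name: string;
  params: Record<string, unknown>;
}

interface ChatTurn {
  prompt: string;
  response: string;
  toolCalls: ToolCall[];
}

// Flag the looping behavior: the same tool invoked with identical
// parameters more than `limit` times within one turn.
function detectLoops(turn: ChatTurn, limit = 3): boolean {
  const counts = new Map<string, number>();
  for (const call of turn.toolCalls) {
    const key = `${call.name}:${JSON.stringify(call.params)}`;
    const n = (counts.get(key) ?? 0) + 1;
    counts.set(key, n);
    if (n > limit) return true;
  }
  return false;
}

// Flag hallucinated parameters: any parameter name not present in the
// known schema for that tool (schemas supplied by the harness).
function findHallucinatedParams(
  turn: ChatTurn,
  schemas: Record<string, string[]>,
): string[] {
  const bad: string[] = [];
  for (const call of turn.toolCalls) {
    const allowed = schemas[call.name];
    if (!allowed) continue; // unknown tools would be flagged separately
    for (const p of Object.keys(call.params)) {
      if (!allowed.includes(p)) bad.push(`${call.name}.${p}`);
    }
  }
  return bad;
}
```

Selective instruction-following is harder to detect mechanically and probably needs per-instruction expectations rather than a generic check.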
This task can be broken apart and worked on in pieces, with stubs in place. I have a demo of a very simple approach here: https://microsoft-my.sharepoint.com/:v:/p/ripark/EcddcghEhkpIjS0AkA6BOFkBWxqKz1ggUPnTrgVcCrjqgQ?e=CKf8dE
I'm sure there's also some prior art, so some research would help us as well.