[automation] QA instructions files #11299

@richardpark-msft

Description

Our instruction files are becoming more important as we use them to guide developers to the right tools, example prompts, example code, and so on.

To help test them, we need to build a test harness that runs our instructions, lets us set expectations, and checks that the responses bear some relation to the truth.

If we can automate even part of this, I think we'll have something worthwhile. A fully automated setup will be difficult until Copilot has an official API.

So:

  1. Use Playwright to automate input to VS Code Copilot on the web (i.e., just use a codespace)
  2. Feed it a prompt
  3. Use the VS Code Copilot chat export to get the chat as JSON
  4. Parse the JSON for our request, get the response, and use some form of eval framework to check that the response makes sense. Some particular behaviors that I've seen:
    • Instructions can sometimes loop in an unpredictable manner
    • Instructions can hallucinate parameters to existing tools
    • Instructions can be selectively followed
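
As a rough illustration of step 4, here is a minimal Python sketch of pulling a response out of an exported chat and flagging the three behaviors listed above. It is not a real eval framework. The export shape is an assumption (a top-level `requests` list whose entries have `message.text` and a `response` list of parts with string `value` fields) and should be checked against an actual export. The `tool_calls` list, `known_tools` map, and `required_phrases` list are hypothetical inputs that the harness would have to supply.

```python
import json
from collections import Counter
from typing import Optional


def find_response_text(export: dict, prompt: str) -> Optional[str]:
    """Return the joined response text for the request matching `prompt`.

    ASSUMPTION: the export looks like
    {"requests": [{"message": {"text": ...}, "response": [{"value": ...}, ...]}]}.
    """
    for request in export.get("requests", []):
        if request.get("message", {}).get("text", "").strip() == prompt.strip():
            parts = request.get("response", [])
            # Keep only parts that carry plain text; skip anything else.
            return "".join(
                p["value"]
                for p in parts
                if isinstance(p, dict) and isinstance(p.get("value"), str)
            )
    return None


def check_response(text, tool_calls, known_tools, required_phrases, max_repeats=2):
    """Return a list of human-readable problems (empty list means no findings).

    tool_calls:       hypothetical list of (tool_name, args_dict) pairs
    known_tools:      hypothetical map of tool_name -> set of allowed parameter names
    required_phrases: strings the response is expected to mention
    """
    problems = []

    # Looping: the same tool called with identical arguments too many times.
    counts = Counter((name, json.dumps(args, sort_keys=True)) for name, args in tool_calls)
    for (name, _), n in counts.items():
        if n > max_repeats:
            problems.append(f"possible loop: {name} called {n} times with identical arguments")

    # Hallucinated parameters: argument names the tool does not declare.
    for name, args in tool_calls:
        allowed = known_tools.get(name)
        if allowed is None:
            problems.append(f"unknown tool: {name}")
            continue
        for extra in sorted(set(args) - set(allowed)):
            problems.append(f"hallucinated parameter: {name}.{extra}")

    # Selectively followed instructions: expected content missing from the reply.
    for phrase in required_phrases:
        if phrase.lower() not in text.lower():
            problems.append(f"missing expected content: {phrase}")

    return problems
```

The substring checks are deliberately crude. They are a placeholder for whatever eval framework ends up being used, and mainly show where the expectations would plug in.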

This task can be broken apart and worked on in pieces, with stubs standing in for the parts that aren't built yet. I have a demo of a very simple way to do this: https://microsoft-my.sharepoint.com/:v:/p/ripark/EcddcghEhkpIjS0AkA6BOFkBWxqKz1ggUPnTrgVcCrjqgQ?e=CKf8dE
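
One possible way to split the work, sketched in Python: each step is a swappable function, so the Playwright automation, the export, and the evaluation can be developed independently. All names here (`InstructionCheck`, the `stub_*` functions) are hypothetical, and the canned export shape is an assumption rather than the real Copilot export format.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class InstructionCheck:
    """Hypothetical harness: each stage is a plain callable that can be stubbed."""
    send_prompt: Callable[[str], None]            # steps 1-2: drive Copilot Chat (e.g. via Playwright)
    export_chat: Callable[[], dict]               # step 3: return the exported chat as parsed JSON
    evaluate: Callable[[dict, str], List[str]]    # step 4: return a list of problems

    def run(self, prompt: str) -> List[str]:
        self.send_prompt(prompt)
        export = self.export_chat()
        return self.evaluate(export, prompt)


def stub_send_prompt(prompt: str) -> None:
    # Stub: a real version would type the prompt into the chat UI.
    pass


def stub_export_chat() -> dict:
    # Stub: canned data in an ASSUMED shape, standing in for a real export.
    return {"requests": [{"message": {"text": "Hello"}, "response": [{"value": "Hi!"}]}]}


def stub_evaluate(export: dict, prompt: str) -> List[str]:
    # Stub: only checks that the prompt appears in the export at all.
    for request in export.get("requests", []):
        if request.get("message", {}).get("text") == prompt:
            return []
    return [f"prompt not found in export: {prompt}"]


if __name__ == "__main__":
    check = InstructionCheck(stub_send_prompt, stub_export_chat, stub_evaluate)
    print(check.run("Hello"))
```

Each stub can then be replaced with a real implementation without touching the others.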

I'm sure there's also some prior art, so some research would help us as well.