Where are the evals? #681

oshea00 · 2026-02-07T18:56:59Z

Is there any discussion on how these components (prompts, skills, etc.) are evaluated and scored for utility? How are changes evaluated? If skills and prompts are later updated, how do we know the effect one way or another? Are there efforts to study this?

For the moment, it looks like 99% of this is caveat emptor.

astrokdev · 2026-02-07T19:25:01Z

I share your feeling. Would it be possible to have some acceptance criteria and test results?

oshea00 (Author) · 2026-02-09T05:17:31Z

An example: "Request clarification on ambiguous schemas, authentication methods, or requirements", found in awesome-copilot/agents/openapi-to-application.agent.md (line 31 in d99ba71):

    - Request clarification on ambiguous schemas, authentication methods, or requirements

How does this prescription in the agent instructions produce the desired behavior of detecting "ambiguous schemas"? It seems a little ambiguous itself. How would this "clarification" be ranked among the nearly infinite possibilities that it encompasses?

1 reply

oshea00 Feb 9, 2026
Author

Hint: how would we evaluate this? How would we determine whether our instruction, after further refinement, has a positive effect on detecting "ambiguousness"?
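To make the question concrete, here is a minimal sketch of what such an eval could look like. It is not something that exists in the repo. It assumes a small hand-labeled set of OpenAPI fragments (each marked as ambiguous or not) and scores how often the agent asks for clarification when it should. Everything here is hypothetical: the `Case` type, the labeled fragments, the `BASELINE` wording, and especially `stub_agent`, which is a keyword placeholder standing in for a real model call so the script runs offline. Its numbers say nothing about any real agent.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Case:
    spec: str        # OpenAPI fragment shown to the agent
    ambiguous: bool  # human label: should the agent ask for clarification?


# Hypothetical labeled cases; a real set would be larger and reviewed by people.
CASES = [
    Case("securitySchemes: {}", True),
    Case("type: object  # no properties listed", True),
    Case("securitySchemes: {bearerAuth: {type: http, scheme: bearer}}", False),
    Case("type: string\nformat: date-time", False),
]

# Two instruction variants to compare. REVISED is the line quoted above;
# BASELINE is an invented wording without it.
BASELINE = "Generate the application from the OpenAPI document."
REVISED = "Request clarification on ambiguous schemas, authentication methods, or requirements"


def score(agent: Callable[[str, str], bool], instruction: str) -> dict:
    """Precision and recall of 'asked for clarification' against the labels."""
    tp = fp = fn = 0
    for case in CASES:
        asked = agent(instruction, case.spec)
        if asked and case.ambiguous:
            tp += 1
        elif asked and not case.ambiguous:
            fp += 1
        elif not asked and case.ambiguous:
            fn += 1
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return {"precision": precision, "recall": recall}


def stub_agent(instruction: str, spec: str) -> bool:
    """Placeholder for a real model call; returns True if it 'asks' a question.

    Assumption: a real harness would send instruction + spec to the model
    and classify the reply. This stub only pattern-matches so the example runs.
    """
    if "clarification" not in instruction.lower():
        return False
    return "{}" in spec or "no properties" in spec


if __name__ == "__main__":
    for name, instruction in [("baseline", BASELINE), ("revised", REVISED)]:
        print(name, score(stub_agent, instruction))
```

With something like this in place, a change to a prompt or skill becomes a diff in precision and recall on a fixed case set rather than a matter of opinion. The hard part is the labeled cases, not the harness.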
