
Discussion: We need evals or something #54

@Tiberriver256


There's a flood of these coming in. How do we know they work? How do we know which ones are better than others?

If not a hard eval script to run, could we have a community rating system that would at least crowdsource ratings for them?

If someone wants to make a PR to change one though... How do we know the change is an improvement other than just vibes?
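To make the "hard eval script" idea concrete, here is a rough sketch of what one could look like: run the current version and the proposed version over the same fixed test cases and compare pass rates. Everything in it is hypothetical (`Case`, `pass_rate`, `compare`, and the two stand-in runner functions are made up for illustration); a real version would call a model with the instructions file where the stand-ins are.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Case:
    """One fixed test case: an input and a pass/fail check on the output."""
    prompt_input: str
    check: Callable[[str], bool]


def pass_rate(run: Callable[[str], str], cases: List[Case]) -> float:
    """Fraction of cases whose output passes its check."""
    if not cases:
        raise ValueError("need at least one case")
    passed = sum(1 for c in cases if c.check(run(c.prompt_input)))
    return passed / len(cases)


def compare(
    baseline: Callable[[str], str],
    candidate: Callable[[str], str],
    cases: List[Case],
) -> Tuple[float, float, float]:
    """Return (baseline_rate, candidate_rate, candidate minus baseline)."""
    b = pass_rate(baseline, cases)
    c = pass_rate(candidate, cases)
    return b, c, c - b


# Hypothetical stand-ins for "model + instructions file".
# A real run would call an LLM here instead.
def baseline_run(text: str) -> str:
    return text.upper()


def candidate_run(text: str) -> str:
    return text.upper().strip()


# Assumed test cases, invented for this sketch.
CASES = [
    Case("hello", lambda out: out == "HELLO"),
    Case("  padded  ", lambda out: out == "PADDED"),
]

if __name__ == "__main__":
    print(compare(baseline_run, candidate_run, CASES))
```

With real model calls the outputs are not deterministic, so each case would probably need several runs and the delta would need some noise threshold before a PR counts as an improvement.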

Really curious to hear others' thoughts on this. It's something we're struggling to figure out internally at my company with internal prompts/instructions/modes too.
