A/B testing
#793
Replies: 1 comment
-
@boxabirds have you got an API in mind?
-
Problem: output varies widely with both the LLM and the prompt, so small changes to either can produce significantly different results.
Solution: the ability to specify a list of prompt variations and a list of different LLMs to try.
You could use Optuna for efficient evaluation (cf. DSPy), along with Argilla for the human evaluation.
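To make the request a bit more concrete, here is a rough sketch of what such an A/B loop might look like. It is not an existing API in this project: `ab_test`, `best_variant`, the `run` callable (model name and prompt in, output text out) and the `score` callable (output text in, number out, higher is better) are all hypothetical names chosen for illustration, and the `{input}` placeholder in the prompts is an assumption too.

```python
from itertools import product
from statistics import mean
from typing import Callable, Dict, List, Tuple

# Hypothetical callables, not part of any real library:
#   Runner: (model_name, prompt) -> output text
#   Scorer: (output text) -> score, higher is better
Runner = Callable[[str, str], str]
Scorer = Callable[[str], float]
Variant = Tuple[str, str]  # (prompt template, model name)


def ab_test(
    prompts: List[str],
    models: List[str],
    run: Runner,
    score: Scorer,
    inputs: List[str],
) -> Dict[Variant, float]:
    """Score every (prompt, model) pair by its mean score over the inputs."""
    results: Dict[Variant, float] = {}
    for prompt, model in product(prompts, models):
        # Assumes each prompt template contains an "{input}" placeholder.
        scores = [score(run(model, prompt.format(input=x))) for x in inputs]
        results[(prompt, model)] = mean(scores)
    return results


def best_variant(results: Dict[Variant, float]) -> Variant:
    """Return the (prompt, model) pair with the highest mean score."""
    return max(results, key=results.get)
```

This tries every combination exhaustively, which is fine for a handful of prompts and models. With larger lists, the same two lists could instead be handed to Optuna as categorical choices (`trial.suggest_categorical`) so that not every pair needs to be evaluated, and `score` could be replaced by ratings collected from human reviewers.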