Open
Labels
backend (Bot implementation and other backend concerns)
Description
This issue captures ideas for experiments to run against the single-turn baseline results:
- Gemini 2.5 Pro w/o RAG: quantify how much RAG improves correctness
- Gemini 3.0 preview w/ RAG: quantify correctness, tokens per second, tone, and cost per answer
- ChatGPT frontier model (5.2?) w/ VertexSearch: compare Gemini vs ChatGPT
- thinking budget (in tokens): investigate the correlation between thinking budget and correctness, and between thinking budget and cost per answer
- Gemini 2.5 Pro w/ few-shot examples (???): correctness
- fine-tuned Gemini 2.5 Pro (???): correctness
- A/B-test variations on the system prompt: correctness
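
Most of the ideas above reduce to two measurements: the change in correctness between the baseline and a variant run, and the correlation between thinking budget and an outcome such as correctness or cost per answer. A minimal sketch of both follows, using only the standard library. The `RunResult` record and its field names are hypothetical, chosen for illustration; they are not taken from the existing evaluation harness, and the grading step that decides `correct` is assumed to happen elsewhere.

```python
from dataclasses import dataclass
from math import sqrt


@dataclass
class RunResult:
    """One graded answer from an experiment run (hypothetical record shape)."""
    correct: bool
    thinking_budget_tokens: int
    cost_usd: float


def accuracy(results):
    """Fraction of answers graded correct."""
    if not results:
        raise ValueError("no results to score")
    return sum(1 for r in results if r.correct) / len(results)


def correctness_delta(baseline, variant):
    """Variant accuracy minus baseline accuracy (positive means the variant is better)."""
    return accuracy(variant) - accuracy(baseline)


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


def budget_correlations(results):
    """Correlate thinking budget with correctness (0/1) and with cost per answer."""
    budgets = [r.thinking_budget_tokens for r in results]
    return {
        "budget_vs_correctness": pearson(budgets, [int(r.correct) for r in results]),
        "budget_vs_cost": pearson(budgets, [r.cost_usd for r in results]),
    }
```

For example, `correctness_delta(with_rag_results, without_rag_results)` would give the RAG comparison from the first bullet, and `budget_correlations(results)` covers the thinking-budget bullet. With small question sets the delta is noisy, so a significance test is likely worth adding before drawing conclusions from the A/B runs.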