- Agentic benchmarking
- Frontier model comparison
- Testing Llama Stack
- Langchain
- CrewAI
- Whitepaper
- Take whitepaper & demo to bootstrap and "Eval AIOps harness"