Add an option to run evals in [batch mode](https://platform.openai.com/docs/guides/batch). Batched runs can take up to 24h to complete (usually much faster in practice), but cost half as much. We'd need to set up an endpoint to receive the async responses and emit the corresponding eval events.
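
A minimal sketch of what the submission side might look like, assuming the official `openai` Python SDK; `eval_cases` and the final event emission are hypothetical stand-ins for this project's actual eval plumbing, and the polling loop is a placeholder for the endpoint described above:

```python
import json
import time

from openai import OpenAI

client = OpenAI()

# Hypothetical eval cases: (id, prompt) pairs.
eval_cases = [
    ("case-1", "What is 2 + 2?"),
    ("case-2", "Name the capital of France."),
]

# Build the JSONL input the Batch API expects: one request per line,
# keyed by custom_id so responses can be matched back to eval cases.
lines = [
    json.dumps({
        "custom_id": case_id,
        "method": "POST",
        "url": "/v1/chat/completions",
        "body": {
            "model": "gpt-4o-mini",
            "messages": [{"role": "user", "content": prompt}],
        },
    })
    for case_id, prompt in eval_cases
]
batch_file = client.files.create(
    file=("evals.jsonl", "\n".join(lines).encode("utf-8")),
    purpose="batch",
)

# Create the batch; "24h" is the documented completion window.
batch = client.batches.create(
    input_file_id=batch_file.id,
    endpoint="/v1/chat/completions",
    completion_window="24h",
)

# Poll until done; the endpoint proposed above would replace this loop
# by reacting to the async responses instead.
while batch.status not in ("completed", "failed", "expired", "cancelled"):
    time.sleep(60)
    batch = client.batches.retrieve(batch.id)

# Download results and emit one eval event per response.
if batch.status == "completed":
    output = client.files.content(batch.output_file_id)
    for line in output.text.splitlines():
        result = json.loads(line)
        # Placeholder for emitting the corresponding eval event.
        print("eval event:", result["custom_id"], result["response"]["status_code"])
```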