Benchmarks that run in the cloud, triggered via the evaluation repo (https://github.com/OpenHands/evaluation/tree/main/envs/evaluation), should use `secrets.LLM_API_KEY_EVAL` as the LLM API key.
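
As a rough illustration, a GitHub Actions step could pass the secret to a benchmark run as shown below. This is only a sketch: the job name, step name, `LLM_API_KEY` environment variable, and `./run_benchmark.sh` command are hypothetical placeholders, not taken from the evaluation repo. Only the `${{ secrets.LLM_API_KEY_EVAL }}` expression reflects the secret named above.

```yaml
# Hypothetical workflow fragment; job, step, and script names are illustrative only.
jobs:
  run-benchmark:
    runs-on: ubuntu-latest
    steps:
      - name: Run benchmark
        env:
          # Assumption: the benchmark reads its key from an LLM_API_KEY variable.
          LLM_API_KEY: ${{ secrets.LLM_API_KEY_EVAL }}
        run: ./run_benchmark.sh  # placeholder command
```

Mapping the secret at the step level keeps the key out of the workflow file and out of any step that does not need it.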