This directory contains an evaluation harness for A2UI (v0.8) against various LLMs.
To use the models, you need to set the following environment variables with your API keys:
  GEMINI_API_KEY
  OPENAI_API_KEY
  ANTHROPIC_API_KEY
You can set these in a .env file in the root of the project, or in your shell's configuration file (e.g., .bashrc, .zshrc).
A .env.example file is provided as a template:
cp .env.example .env
# Edit .env with your API keys (do not commit .env)

You also need to install dependencies before running:
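As a sketch, a filled-in .env might look like the following. The variable names come from the list above; the key values are placeholders, and the exact key formats shown are hypothetical:

```shell
# .env — API keys for the eval harness (keep this file out of version control)
GEMINI_API_KEY=your-gemini-key
OPENAI_API_KEY=your-openai-key
ANTHROPIC_API_KEY=your-anthropic-key
```

You only need to fill in keys for the providers whose models you intend to evaluate.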
pnpm install

To run the flow, use the following command:
pnpm run evalAll

You can run the script for a single model and data point by using the --model and --prompt command-line flags. This is useful for quick tests and debugging.
pnpm run eval -- --model='<model_name>' --prompt=<prompt_name>

For example, to run with the gpt-5-mini (reasoning: minimal) model and the generateDogUIs prompt, use the following command:
pnpm run eval -- --model='gpt-5-mini (reasoning: minimal)' --prompt=generateDogUIs

By default, the script only prints the summary table and any errors that occur during generation. To see the full JSON output for each successful generation, use the --verbose flag.
To preserve the input and output of each run, pass the --keep=<output_dir> flag, which creates a directory hierarchy under that path with the input and output of each LLM call in separate files.
pnpm run evalAll -- --verbose
pnpm run evalAll -- --keep=output
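Assuming the flags compose (the commands above suggest this, though the source does not state it explicitly), a focused debugging invocation might combine a single model and prompt with both output flags. The directory name debug-run is arbitrary:

```shell
# Single model + prompt, full JSON output, artifacts kept under ./debug-run
pnpm run eval -- \
  --model='gpt-5-mini (reasoning: minimal)' \
  --prompt=generateDogUIs \
  --verbose --keep=debug-run
```

This keeps the iteration loop tight: one LLM call per change, with the raw input and output on disk for inspection.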