# Configuration Guide

This guide explains how to use YAML configuration files with `lm-eval` to define reusable evaluation settings.

## Overview

Instead of passing many CLI arguments, you can define evaluation parameters in a YAML configuration file:

```bash
# Instead of:
lm-eval run --model hf --model_args pretrained=gpt2,dtype=float32 --tasks hellaswag,arc_easy --num_fewshot 5 --batch_size 8 --device cuda:0

# Use:
lm-eval run --config eval_config.yaml
```
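
The corresponding `eval_config.yaml` could look like this (the keys follow the schema below, and the values mirror the flags in the long command above):

```yaml
# eval_config.yaml -- config-file equivalent of the long command above
model: hf
model_args:
  pretrained: gpt2
  dtype: float32
tasks:
  - hellaswag
  - arc_easy
num_fewshot: 5
batch_size: 8
device: cuda:0
```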

CLI arguments override config file values, so you can set defaults in a config file and override specific settings:

```bash
lm-eval run --config eval_config.yaml --tasks mmlu --limit 100
```

## Quick Reference

All configuration keys correspond directly to CLI arguments. See the [CLI Reference](interface.md#lm-eval-run) for detailed descriptions of each option.

## Config Schema

| Field | Type | Default | Description |
|-------|------|---------|-------------|
| `model` | string | `"hf"` | Model type/provider |
| `model_args` | dict | `{}` | Model constructor arguments |
| `tasks` | list/string | required | Tasks to evaluate |
| `num_fewshot` | int/null | `null` | Few-shot example count |
| `batch_size` | int/string | `1` | Batch size or `"auto"` |
| `max_batch_size` | int/null | `null` | Max batch size for auto |
| `device` | string/null | `"cuda:0"` | Device to use |
| `limit` | float/null | `null` | Example limit per task |
| `samples` | dict/null | `null` | Specific sample indices |
| `use_cache` | string/null | `null` | Response cache path |
| `cache_requests` | string/dict | `{}` | Request cache settings |
| `output_path` | string/null | `null` | Results output path |
| `log_samples` | bool | `false` | Save model I/O |
| `predict_only` | bool | `false` | Skip metrics |
| `apply_chat_template` | bool/string | `false` | Chat template |
| `system_instruction` | string/null | `null` | System prompt |
| `fewshot_as_multiturn` | bool/null | `null` | Multi-turn few-shot |
| `include_path` | string/null | `null` | External tasks path |
| `gen_kwargs` | dict | `{}` | Generation arguments |
| `wandb_args` | dict | `{}` | W&B init arguments |
| `hf_hub_log_args` | dict | `{}` | HF Hub logging |
| `seed` | list/int | `[0,1234,1234,1234]` | Random seeds |
| `trust_remote_code` | bool | `false` | Trust remote code |
| `metadata` | dict | `{}` | Task metadata |

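A few of the less obvious fields, sketched with illustrative values (the `seed` ordering of python-`random`/NumPy/torch/few-shot seeds is assumed from upstream `lm-eval` behavior and worth verifying):

```yaml
gen_kwargs:                   # forwarded to the model's generation call
  temperature: 0.0
  top_p: 1.0
seed: [0, 1234, 1234, 1234]   # assumed order: python random, numpy, torch, fewshot
metadata:
  notes: smoke-test           # illustrative free-form metadata passed to tasks
```
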
---

## Example

```yaml
# basic_eval.yaml
model: hf
model_args:
  pretrained: gpt2
  dtype: float32

tasks:
  - hellaswag
  - arc_easy

num_fewshot: 0
batch_size: auto
device: cuda:0

output_path: ./results/gpt2/
log_samples: true

wandb_args:
  project: llm-evals
  name: gpt2-baseline
  tags:
    - gpt2
    - baseline

hf_hub_log_args:
  hub_results_org: my-org
  results_repo_name: llm-eval-results
  push_results_to_hub: true
  public_repo: false
```

---

## Programmatic Usage

For loading config files in Python, see the [Python API Guide](python-api.md#using-evaluatorconfig).
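
If you only need something quick, here is a minimal sketch that loads the YAML with PyYAML and forwards a few common keys to `lm_eval.simple_evaluate`. The key-to-argument mapping is assumed here; prefer the `EvaluatorConfig` route from that guide:

```python
import yaml  # PyYAML

import lm_eval

# Load the config file (path is illustrative; adjust to your file).
with open("eval_config.yaml") as f:
    cfg = yaml.safe_load(f)

# Forward common keys to simple_evaluate(); this mapping is an
# assumption, not the official config loader.
results = lm_eval.simple_evaluate(
    model=cfg.get("model", "hf"),
    model_args=cfg.get("model_args", {}),
    tasks=cfg["tasks"],
    num_fewshot=cfg.get("num_fewshot"),
    batch_size=cfg.get("batch_size", 1),
    device=cfg.get("device"),
    limit=cfg.get("limit"),
)
print(results["results"])
```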

---

## Validation

Validate your configuration before running:

```bash
# Check that tasks exist
lm-eval validate --tasks hellaswag,arc_easy

# With external tasks
lm-eval validate --tasks my_task --include_path /path/to/tasks
```

---

## Tips

1. **Start simple**: Begin with a minimal config and add options as needed
2. **Use CLI overrides**: Set defaults in the config, then override specific flags on the CLI for experiments
3. **Separate concerns**: Create different configs for different model families or task sets (see the sketch after this list)
4. **Version control**: Commit config files alongside results for reproducibility
5. **Use comments**: YAML supports `#` comments, so document your choices inline
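
For tip 3, a per-family counterpart to `basic_eval.yaml` above might look like this (the file name and model choice are illustrative):

```yaml
# configs/mistral_eval.yaml -- chat-model counterpart to basic_eval.yaml (illustrative)
model: hf
model_args:
  pretrained: mistralai/Mistral-7B-Instruct-v0.2
  dtype: bfloat16
tasks:
  - hellaswag
  - arc_easy
apply_chat_template: true   # chat models need their template applied
num_fewshot: 0
batch_size: auto
device: cuda:0
output_path: ./results/mistral-7b-instruct/
```

Each file is a complete, runnable config, so switching model families is just a matter of swapping the `--config` path.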