Commit cc055c7

Add mergekit-evolve for parameter space evolutionary algorithm merging (#283)
1 parent 215f767 commit cc055c7

25 files changed: 1865 additions and 92 deletions

docs/evolve.md

Lines changed: 154 additions & 0 deletions
@@ -0,0 +1,154 @@
1+
# mergekit-evolve
2+
3+
`mergekit-evolve` is a script that uses an evolutionary algorithm (CMA-ES) to optimize the parameters of a merge against model metrics. This is inspired by SakanaAI's [Evolutionary Optimization of Model Merging Recipes](https://arxiv.org/abs/2403.13187), in particular their parameter-space approach. `mergekit-evolve` uses EleutherAI's [Language Model Evaluation Harness](https://github.com/EleutherAI/lm-evaluation-harness) to define and evaluate the scoring function. The script is set up to be run either single-node or on a Ray cluster and has a few different strategies for scheduling operations depending on your particular configuration of compute.
4+
5+
## Installation
6+
7+
Install `mergekit` with the `evolve` (and optionally `vllm`) features:
8+
9+
```sh
git clone https://github.com/arcee-ai/mergekit.git
cd mergekit

# Quote the extras so shells such as zsh do not try to glob the brackets
pip install -e ".[evolve,vllm]"
```
15+
16+
If you had a perfectly good PyTorch environment going, and installing an older version of vLLM downgraded PyTorch and broke flash attention, run the following commands to fix it:
17+
18+
```sh
pip uninstall flash-attn
pip cache purge
pip install flash-attn
```
23+
24+
## Configuration
25+
26+
`mergekit-evolve` takes in a YAML configuration file that defines how the merge is parameterized and what metrics to optimize. The general syntax is as follows:
27+
28+
```yml
genome:
    models:
       - model_1
       - model_2
       ...
       - model_n
    merge_method: dare_ties
    base_model: base_model_if_needed
    tokenizer_source: null # optional
    layer_granularity: 8
    normalize: false # optional
    allow_negative_weights: false # optional
tasks:
  - name: lm_eval_task_name
    weight: 1.0 # optional
    metric: "acc,none" # defaults to acc,none
  - name: ... # as many as you want
```
47+
48+
### Genome Definition
49+
50+
The `genome` section of the configuration file defines the parameter space that `mergekit-evolve` will be optimizing in.
51+
52+
#### `models`
53+
54+
This should be a list of all of the models you want available to be merged. Depending on the merge method, not all of them are guaranteed to be used in the final merge.
55+
56+
#### `merge_method`
57+
58+
Merge method to be used. Currently supported values are `linear`, `dare_ties`, `task_arithmetic`, `ties`, and `slerp`.
59+
60+
#### `base_model`
61+
62+
The base model for the merge, if applicable.
63+
64+
#### `layer_granularity`
65+
66+
A set of parameters will be introduced for each consecutive slice of `layer_granularity` layers. So for example, a 32-layer model like `mistralai/Mistral-7B-v0.1` with `layer_granularity: 8` will be divided into 4 groups of 8 layers with different merge parameters for each. The value specified here must be a divisor of the number of layers in your input models. Large values of `layer_granularity` will reduce the search space greatly, meaning you will get faster convergence at the cost of a potentially worse global solution.
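
As a rough illustration of the slicing described above, here is a minimal Python sketch. The helper name `layer_groups` is hypothetical and is not part of `mergekit`; it only shows how a layer count is split into consecutive groups and why the granularity has to divide it evenly.

```python
def layer_groups(num_layers: int, layer_granularity: int) -> list[range]:
    """Split layer indices into consecutive slices that would share merge parameters.

    Hypothetical helper for illustration only; not mergekit's implementation.
    """
    if layer_granularity <= 0 or num_layers % layer_granularity != 0:
        raise ValueError(
            f"layer_granularity={layer_granularity} must divide num_layers={num_layers}"
        )
    return [
        range(start, start + layer_granularity)
        for start in range(0, num_layers, layer_granularity)
    ]


# A 32-layer model with layer_granularity: 8 gives 4 groups of 8 layers.
groups = layer_groups(32, 8)
print([(g.start, g.stop - 1) for g in groups])
# [(0, 7), (8, 15), (16, 23), (24, 31)]
```

Doubling the granularity to 16 would halve the number of groups, and with it the number of per-group parameters the optimizer has to search over.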
67+
68+
#### `normalize`
69+
70+
Sets the `normalize` flag when merging. For methods like `linear`, `ties`, and `dare_ties` this constrains the search space to a set of definitely valid models. Similarly to `layer_granularity`, this can greatly speed up convergence at the cost of ruling out oddball solutions that might score better than more standard merges.
71+
72+
#### `allow_negative_weights`
73+
74+
Pretty self-explanatory. When this flag is not set, the absolute value of weight parameters is used. This is a sensible search space reduction for `linear` and `slerp`. For task arithmetic based methods you probably want `allow_negative_weights: true`.
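
The effect of the flag can be sketched in a few lines of Python. The function `effective_weights` is a hypothetical name used only for this illustration; it mirrors the behaviour described above (absolute values unless negative weights are allowed), not the actual code path in `mergekit`.

```python
def effective_weights(
    raw_weights: list[float], allow_negative_weights: bool = False
) -> list[float]:
    """Map raw optimizer outputs to the weights that would be used in the merge.

    Hypothetical helper: with the flag unset, signs are discarded.
    """
    if allow_negative_weights:
        return list(raw_weights)
    return [abs(w) for w in raw_weights]


raw = [0.5, -0.25, 1.0]
print(effective_weights(raw))                               # [0.5, 0.25, 1.0]
print(effective_weights(raw, allow_negative_weights=True))  # [0.5, -0.25, 1.0]
```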
75+
76+
### Task Definition
77+
78+
To evaluate the produced merges you need to specify a list of tasks supported by the LM evaluation harness. These can be either built-in tasks (don't be naughty) or tasks you define yourself (see the [New Task Guide](https://github.com/EleutherAI/lm-evaluation-harness/blob/main/docs/new_task_guide.md) for how). If your task does not use `acc` as the metric then you must specify the correct metric name. Each task can also optionally have a weight associated.
79+
80+
`mergekit-evolve` aims to maximize the score of the merge, so if you are using any tasks or metrics where a lower score is better (like perplexity) be sure to assign a negative weight to that task.
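
To see why a negative weight is needed for lower-is-better metrics, consider the sketch below. It assumes the overall score is a weighted sum of per-task metric values, which is the natural reading of the text above but is not confirmed here; `aggregate_score` and the task names are hypothetical.

```python
def aggregate_score(results: dict[str, float], weights: dict[str, float]) -> float:
    """Combine per-task metric values into one number to maximize.

    Assumption: a plain weighted sum, with a default weight of 1.0.
    """
    return sum(weights.get(name, 1.0) * value for name, value in results.items())


# Hypothetical tasks: one accuracy-style metric and one perplexity-style metric.
weights = {"my_qa_task": 1.0, "my_ppl_task": -0.05}

worse = aggregate_score({"my_qa_task": 0.62, "my_ppl_task": 10.0}, weights)
better = aggregate_score({"my_qa_task": 0.62, "my_ppl_task": 8.0}, weights)

# With a negative weight, lowering perplexity raises the overall score.
print(better > worse)  # True
```

With a positive weight on the perplexity task the comparison would flip, and the optimizer would be rewarded for producing worse models.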
81+
82+
## Running `mergekit-evolve`
83+
84+
```sh
mergekit-evolve [OPTIONS] --storage-path PATH GENOME_CONFIG_PATH
```
87+
88+
`mergekit-evolve` needs a storage path specified, where it will save the input models, merges to evaluate, and the config for the current best merge evaluated. If you are not using in-memory merging this can require a *lot* of space - expect at least one fp16 model per GPU.
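
For a back-of-the-envelope estimate of that disk usage, the sketch below multiplies the parameter count by two bytes (fp16) and by the number of GPUs. The function name and the example figures are assumptions for illustration; it gives only a lower bound for the in-flight merges and ignores the space taken by the input models themselves.

```python
def min_merge_storage_gb(
    num_params: float, num_gpus: int, bytes_per_param: int = 2
) -> float:
    """Lower bound on disk used by merges awaiting evaluation, in GB (1e9 bytes).

    Assumes one fp16 copy (2 bytes per parameter) per GPU.
    """
    return num_params * bytes_per_param * num_gpus / 1e9


# A model with roughly 7 billion parameters on a 4-GPU machine.
print(min_merge_storage_gb(7e9, 4))  # 56.0
```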
89+
90+
Some important options:
91+
92+
### Scheduling Strategy (`--strategy`)
93+
94+
There are three different strategies implemented for scheduling merging and evaluation jobs.
95+
96+
#### `pool`
97+
98+
Assigns an actor to each GPU in your cluster and guarantees merges and evaluations are performed on the same node. This is a safe default suitable for any configuration, local or distributed.
99+
100+
#### `buffered`
101+
102+
Maintains a buffer of scheduled tasks to ensure that there is always a model merging or ready to evaluate for each GPU. Allows for concurrent merging and evaluation of models on the same GPU if enough VRAM is available. Only suitable for a single-node setup or when `--storage-path` points to a fast shared filesystem.
103+
104+
#### `serial`
105+
106+
Uses Ray placement groups to ensure merges and their evaluations happen on the same node, but otherwise just lets Ray take the wheel. Maybe give a try if you're having trouble with the other two, otherwise probably don't use it.
107+
108+
### Evaluation LLM Backend
109+
110+
By default `mergekit-evolve` will use the `hf` backend for `lm-eval`. To use vLLM instead, pass the `--vllm` flag.
111+
112+
### On-Disk vs. In-Memory
113+
114+
By default `mergekit-evolve` will perform merges, write the result to disk, then start up an instance of lm-eval pointing at that path. This is a safe default and will generally always work but also causes a lot of GPU downtime and eats disk space. When using the `pool` scheduling strategy, you have the option to instead keep a model resident in memory and directly update its parameters instead of merging to disk. This is much faster and uses no additional disk space. However, it does involve mucking around in the internals of vLLM and the LM evaluation harness. So it might break at any moment! Choose wisely. Use `--in-memory` to enable this mode.
115+
116+
### Task search path
117+
118+
If you're using custom task definitions (and you should be) then you can append to the search path using the `--task-search-path` option. This should point to the directory your custom task YAML is in (or a parent of that directory). Multiple paths can be included by repeating the option.
119+
120+
### Batch size
121+
122+
Override the batch size used during merge evaluation. If using vLLM, `auto` (the default) is recommended.
123+
124+
### CMA-ES options
125+
126+
#### `--max-fevals`
127+
128+
Maximum number of merges to evaluate. Note that the `cma` package is very loosey-goosey with this number and will happily go over by 50% depending on the size of each generation. Set to 100 by default.
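
One plausible reason for the overshoot is that CMA-ES evaluates whole generations at a time, so the budget can only be checked between generations. The sketch below models just that rounding effect; it is an assumption about the mechanism, not a description of what the `cma` package does internally, and both function names are hypothetical. The population size formula is the commonly used CMA-ES default of 4 + floor(3 ln n) for n parameters.

```python
import math


def default_popsize(num_params: int) -> int:
    """Commonly used CMA-ES default population size: 4 + floor(3 * ln(n))."""
    return 4 + int(3 * math.log(num_params))


def evaluations_run(max_fevals: int, popsize: int) -> int:
    """Count evaluations if the budget is only checked after each full generation."""
    evals = 0
    while evals < max_fevals:
        evals += popsize  # every candidate in a generation gets evaluated
    return evals


# Hypothetical genome with 24 parameters and the default budget of 100.
popsize = default_popsize(24)
print(popsize)                        # 13
print(evaluations_run(100, popsize))  # 104
```

Under this model the overshoot is at most one generation; larger overshoots than that would have to come from other behaviour in the optimizer.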
129+
130+
#### `--sigma0`
131+
132+
Initial value of sigma for CMA-ES. No need to play with this unless you really know what you're doing.
133+
134+
### WandB logging
135+
136+
`mergekit-evolve` supports logging metrics to Weights & Biases. Enable this functionality with the `--wandb` flag. Project and entity names can be overridden with the `--wandb-project` and `--wandb-entity` options.
137+
138+
### Example
139+
140+
```sh
mergekit-evolve --strategy pool --wandb --wandb-project mergekit-evolve --wandb-entity arcee-ai --storage-path /path/to/mergekit-evolve/ ./config.yml
```
143+
144+
## Output
145+
146+
`mergekit-evolve` will write the merge configuration for the best merge found so far to the storage path with the filename `best_config.yaml`. If you're using WandB it will also log the config as an artifact. The script will keep running until a KeyboardInterrupt is received or `--max-fevals` is generously exceeded.
147+
148+
## Caveats
149+
150+
`mergekit-evolve` is a work in progress and has probably not been tested on your specific configuration. Keep an eye on the output before leaving it running, and if you run into any problems don't hesitate to file an issue!
151+
152+
## Acknowledgements
153+
154+
Thanks to SakanaAI for the inspiration and the EleutherAI team for the LM evaluation harness.

mergekit/common.py

Lines changed: 8 additions & 1 deletion
@@ -36,7 +36,7 @@
 import peft
 import torch
 import transformers
-from pydantic import BaseModel, model_validator
+from pydantic import BaseModel, model_serializer, model_validator
 from pydantic_core import core_schema
 from transformers import AutoConfig, PretrainedConfig
 from typing_extensions import TypeVar
@@ -173,6 +173,13 @@ def validate_string(cls, value):
             raise RuntimeError(f"Can't parse {value}")
         return value
 
+    @model_serializer()
+    def serialize(self):
+        res = str(self)
+        if '"' in res or " " in res:
+            return self
+        return res
+
     @classmethod
     def parse(cls, value: str) -> "ModelReference":
         """Parse a ModelReference. Format: '<MODEL_PATH>(+<LORA_PATH>)?'"""

mergekit/evo/__init__.py

Whitespace-only changes.
