ENH: Extend the regex for rank/alpha pattern (#2419)
Supersedes #2382
Right now, the regex used to match the keys passed for rank_pattern and
alpha_pattern requires that either:
1. The module name is identical to the key, or
2. The module name has a prefix and ends with the key.
This is restrictive, since it doesn't allow disambiguating between all
cases. E.g. if we have a model with these attributes:
- model.foo
- model.bar.foo
We cannot currently target just model.foo. (We can already target only
model.bar.foo by passing "bar.foo" as a key to the rank_pattern /
alpha_pattern dict).
This PR makes it possible to pass "^foo" as a key. This way,
model.bar.foo is not targeted, as its name does not start with "foo".
As a general rule, users who intend to have a full match should pass
the full name of the module preceded by a ^. This is the least
ambiguous way.
When running the tests with the old code, all test cases using ^ fail,
which is fine, since ^ was not working anyway. At the same time, all
test cases not using ^ pass, which means the change is backwards
compatible.
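
As a rough sketch of the matching rule described here (the helper `key_matches` and the exact regex form are illustrative assumptions, not PEFT's actual implementation), a key is considered to match a module name when the name itself, or a dot-separated suffix of it, matches the key with `$` appended:

```python
import re

def key_matches(key: str, module_name: str) -> bool:
    # Sketch of the documented behavior: the key matches the module name itself
    # or a dot-separated suffix of it; "$" is appended automatically, and a
    # leading "^" in the key forces the match to start at the full name.
    return re.match(rf"(.*\.)?{key}$", module_name) is not None

print(key_matches("foo", "foo"))       # True
print(key_matches("foo", "bar.foo"))   # True  -- suffix match
print(key_matches("^foo", "foo"))      # True
print(key_matches("^foo", "bar.foo"))  # False -- "^" anchors to the start of the name
```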
docs/source/developer_guides/lora.md: 30 additions & 0 deletions
@@ -239,6 +239,36 @@ Assuming the original model had 5 layers `[0, 1, 2 ,3, 4]`, this would create a

[Fewshot-Metamath-OrcaVicuna-Mistral-10B](https://huggingface.co/abacusai/Fewshot-Metamath-OrcaVicuna-Mistral-10B) is an example of a model trained using this method on Mistral-7B expanded to 10B. The [adapter_config.json](https://huggingface.co/abacusai/Fewshot-Metamath-OrcaVicuna-Mistral-10B/blob/main/adapter_config.json) shows a sample LoRA adapter config applying this method for fine-tuning.
### Fine-grained control over ranks and alpha (scaling)

By default, all layers targeted with LoRA will have the same rank `r` and the same `lora_alpha` (which determines the LoRA scaling), depending on what was specified in the [`LoraConfig`]. In some cases, however, you may want to use different values for different layers. This is possible by passing the `rank_pattern` and `alpha_pattern` arguments to [`LoraConfig`]. These arguments should be dictionaries with the key being the layer name and the value being the rank/alpha value. The keys can be [regular expressions](https://docs.python.org/3/library/re.html) (regex). All LoRA layers that are not explicitly mentioned in `rank_pattern` and `alpha_pattern` will take the default `r` and `lora_alpha` values.

To give an example, let's assume that we have a model with the following structure:

```python
>>> print(model)
Outer(
  (foo): Linear(...)
  (module): Middle(
    (foo): Linear(...)
    (foobar): Linear(...)
    (module): Inner(
      (foo): Linear(...)
      (barfoo): Linear(...)
    )
  )
)
```

- `rank_pattern={"foo": 42}` will match all 3 `foo` layers. Neither `foobar` nor `barfoo` is matched.
- `rank_pattern={"^foo": 42}` will only match the model's `foo` layer, but neither `module.foo` nor `module.module.foo`. This is because `^` means "start of string" in regular expressions, and only `foo` starts with `"foo"`; the other layer names have prefixes.
- `rank_pattern={"^module.foo": 42}` matches only `module.foo`, but not `module.module.foo`, for the same reason.
- `rank_pattern={"module.foo": 42}` matches both `module.foo` and `module.module.foo`, but not `foo`.
- `rank_pattern={"^foo": 42, "^module.module.foo": 55}` matches `foo` and `module.module.foo`, respectively, but not `module.foo`.
- There is no need to append `$` to mark the end of the match, as this is added automatically by PEFT.

The same logic applies to `alpha_pattern`. If you're in doubt, don't try to get fancy with regular expressions -- just pass the full name for each module with a different rank/alpha, preceded by the `^` prefix, and you should be good.
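For illustration, here is a minimal sketch of how this could be wired up for the `Outer` model above; the `target_modules` list and the specific rank/alpha values are arbitrary choices for this example, not prescribed settings:

```python
from peft import LoraConfig, get_peft_model

config = LoraConfig(
    target_modules=["foo", "foobar", "barfoo"],           # layers that get LoRA at all
    r=8,                                                  # default rank
    lora_alpha=16,                                        # default scaling
    rank_pattern={"^foo": 42, "^module.module.foo": 55},  # rank overrides for `foo` and `module.module.foo`
    alpha_pattern={"^module.foo": 32},                    # alpha override for `module.foo` only
)
peft_model = get_peft_model(model, config)  # `model` is the `Outer` instance shown above
```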
## Optimizers

LoRA training can optionally include special purpose optimizers. Currently the only such optimizer is LoRA+.