src/peft/tuners/gralora/config.py
31 additions, 33 deletions
@@ -26,50 +26,48 @@ class GraloraConfig(PeftConfig):
 
     Args:
         r (`int`):
-            GraLoRA attention dimension determines the rank of the GraLoRA adapter.
-            The total parameter count of the GraLoRA adapter is same as LoRA with same rank r, while the expressivitiy is multiplied by gralora_k.
+            GraLoRA attention dimension determines the rank of the GraLoRA adapter. The total parameter count of the
+            GraLoRA adapter is same as LoRA with same rank r, while the expressivitiy is multiplied by gralora_k.
         hybrid_r (`int`):
             Hybrid GraLoRA rank determines the rank allocated to vanilla LoRA method when using Hybrid GraLoRA method.
-            Hybrid GraLoRA, a combination of GraLoRA and vanilla LoRA, becomes available when hybrid_r > 0.
-            The parameter count of the GraLoRA adapter is r + hybrid_r.
+            Hybrid GraLoRA, a combination of GraLoRA and vanilla LoRA, becomes available when hybrid_r > 0. The
+            parameter count of the GraLoRA adapter is r + hybrid_r.
         target_modules (`Union[List[str], str]`):
-            List of module names or regex expression of the module names to replace with GraLoRA. "
-            For example, ['q', 'v'] or '.*decoder.*(SelfAttention|EncDecAttention).*(q|v)$'. "
-            This can also be a wildcard 'all-linear' which matches all linear/Conv1D "
-            "(if the model is a PreTrainedModel, the output layer excluded). "
-            If not specified, modules will be chosen according to the model architecture, If the architecture is "
-            not known, an error will be raised -- in this case, you should specify the target modules manually. "
-            To avoid targeting any modules (because you want to apply `target_parameters`), set "
-            `target_modules=[]`.
+            List of module names or regex expression of the module names to replace with GraLoRA. " For example, ['q',
+            'v'] or '.*decoder.*(SelfAttention|EncDecAttention).*(q|v)$'. " This can also be a wildcard 'all-linear'
+            which matches all linear/Conv1D " "(if the model is a PreTrainedModel, the output layer excluded). " If not
+            specified, modules will be chosen according to the model architecture, If the architecture is " not known,
+            an error will be raised -- in this case, you should specify the target modules manually. " To avoid
+            targeting any modules (because you want to apply `target_parameters`), set " `target_modules=[]`.
         gralora_alpha (`int`): GraLoRA alpha.
-            GraLoRA alpha is the scaling factor for the GraLoRA adapter.
-            Scale becomes gralora_alpha / (r + hybrid_r).
+            GraLoRA alpha is the scaling factor for the GraLoRA adapter. Scale becomes gralora_alpha / (r + hybrid_r).
         gralora_dropout (`float`):
-            GraLoRA dropout is the dropout probability for the GraLoRA adapter.
-            It is used to prevent overfitting and improve the generalization of the GraLoRA adapter.
+            GraLoRA dropout is the dropout probability for the GraLoRA adapter. It is used to prevent overfitting and
+            improve the generalization of the GraLoRA adapter.
         gralora_k (`int`):
-            GraLoRA k determines the number of subblocks in the GraLoRA adapter.
-            The rank r must be divisible by gralora_k for the GraLoRA adapter to be valid.
-            The total parameter count is preserved regardles of gralora_k.
-            The entire rank of the GraLoRA adapter is increased by gralora_k, while the rank of each subblock is reduced by gralora_k.
-            gralora_k=2 is recommended for rank 32 or lower, and gralora_k=4 is recommended for rank 64 or higher.
+            GraLoRA k determines the number of subblocks in the GraLoRA adapter. The rank r must be divisible by
+            gralora_k for the GraLoRA adapter to be valid. The total parameter count is preserved regardles of
+            gralora_k. The entire rank of the GraLoRA adapter is increased by gralora_k, while the rank of each
+            subblock is reduced by gralora_k. gralora_k=2 is recommended for rank 32 or lower, and gralora_k=4 is
+            recommended for rank 64 or higher.
         fan_in_fan_out (`bool`):
-            Set this to True if the layer to replace stores weight like (fan_in, fan_out).
-            For example, gpt-2 uses `Conv1D` which stores weights like (fan_in, fan_out) and hence this should be set to `True`.
+            Set this to True if the layer to replace stores weight like (fan_in, fan_out). For example, gpt-2 uses
+            `Conv1D` which stores weights like (fan_in, fan_out) and hence this should be set to `True`.
         bias (`str`):
-            Bias type for gralora. Can be 'none', 'all' or 'gralora_only'.
-            If 'all' or 'gralora_only', the corresponding biases will be updated during training.
-            Be aware that this means that, even when disabling the adapters, the model will not produce the same output as the base model would have without adaptation.
+            Bias type for gralora. Can be 'none', 'all' or 'gralora_only'. If 'all' or 'gralora_only', the
+            corresponding biases will be updated during training. Be aware that this means that, even when disabling
+            the adapters, the model will not produce the same output as the base model would have without adaptation.
         init_weights (`bool`):
-            Whether to initialize the weights of the GraLoRA layers with their default initialization.
-            Don't change this setting, except if you know exactly what you're doing.
+            Whether to initialize the weights of the GraLoRA layers with their default initialization. Don't change
+            this setting, except if you know exactly what you're doing.
         layers_to_transform (`Union[List[int], int]`):
-            The layer indexes to transform, is this argument is specified, PEFT will transform only the layers indexes that are specified inside this list.
-            If a single integer is passed, PEFT will transform only the layer at this index.
-            This only works when target_modules is a list of str.
+            The layer indexes to transform, is this argument is specified, PEFT will transform only the layers indexes
+            that are specified inside this list. If a single integer is passed, PEFT will transform only the layer at
+            this index. This only works when target_modules is a list of str.
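For context, here is a minimal usage sketch tying the documented fields together. This is an illustration under assumptions, not code from this PR: it presumes `GraloraConfig` ends up exported from the `peft` top level and plugs into `get_peft_model` like other PEFT configs, and the base model name and target module names are placeholders.

```python
# Sketch only: assumes `GraloraConfig` is importable from `peft` once this PR lands.
from peft import GraloraConfig, get_peft_model
from transformers import AutoModelForCausalLM

base = AutoModelForCausalLM.from_pretrained("facebook/opt-125m")  # placeholder base model

config = GraloraConfig(
    r=32,                                  # rank; must be divisible by gralora_k
    gralora_k=2,                           # 2 subblocks -> each subblock has rank r / gralora_k = 16
    hybrid_r=0,                            # > 0 enables Hybrid GraLoRA (GraLoRA + vanilla LoRA)
    gralora_alpha=64,                      # scale = gralora_alpha / (r + hybrid_r) = 64 / 32 = 2.0
    gralora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],   # placeholder module names; or "all-linear"
    fan_in_fan_out=False,                  # True only for Conv1D-style layers (e.g. GPT-2)
    bias="none",
)

model = get_peft_model(base, config)
model.print_trainable_parameters()
```

Per the docstring, such a setup keeps the parameter count of a rank-32 LoRA while the expressivity scales with `gralora_k`; `gralora_k=2` is the recommendation for rank 32 and below, `gralora_k=4` for rank 64 and above.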