@@ -160,7 +160,7 @@ Frequently used arguments are provided in ```configs/***_train_config``` and exp
 
 -**attn_implementation**: "flash_attention_2", "eager", or "sdpa"; takes effect when the model is officially supported by transformers.
 
--**peft_type**: either "lora" or "qlora".
+-**peft_type**: null or "lora" or "qlora"; null means full-params training.
 
 -**lora_rank**: Rank value for LoRA.
 
@@ -170,11 +170,11 @@ Frequently used arguments are provided in ```configs/***_train_config``` and exp
 
 -**target_modules**: List of target modules for LoRA; default values are used if None.
 
--**quantization**: Whether to use quantization."4bit" or "8bit", or null. For QLoRA, it is recommended to use 4-bit quantization.
+-**quantization**: "4bit" for QLoRA; null for LoRA and full-params training.
 
 -**pretrained_model_path**: Local/shared disk path or model name on HuggingFace for the pre-trained model.
 
--**weighted_loss_mode**: Loss weighting method for multitask training. "case3" is recommended at present.
+-**weighted_loss_mode**: Loss weighting method for multitask training. "case3" is recommended at present; "self-paced" is supported but needs hyper-parameter tuning.
 
 -**padding_mode**: How tokenized data is laid out; "padding" pads each sample to seq_length, while "pack" packs as many samples as possible into seq_length.
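
For orientation, the arguments touched by this diff are set together in a ```configs/***_train_config``` file. The sketch below shows one plausible QLoRA combination; the JSON layout, the placeholder model path, and the concrete values (rank 32, "pack" mode) are illustrative assumptions, not the repository's shipped defaults, and a real config contains further required keys.

```json
{
  "pretrained_model_path": "path/to/pretrained_model_or_hf_name",
  "attn_implementation": "flash_attention_2",
  "peft_type": "qlora",
  "quantization": "4bit",
  "lora_rank": 32,
  "target_modules": null,
  "weighted_loss_mode": "case3",
  "padding_mode": "pack"
}
```

Under the same assumptions, a plain LoRA run would keep "peft_type": "lora" with "quantization": null, and full-params training would set both "peft_type" and "quantization" to null.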