
Commit 3750eda: fix use_cache (#3487)

Parent: 45f1e8f

4 files changed: +2 additions, -15 deletions

docs/source/Instruction/ReleaseNote3.0.md

Lines changed: 0 additions & 6 deletions
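@@ -78,9 +78,3 @@
 4. The merge_lora output directory can now be specified with `--output_dir`. merge_lora and quantization cannot be run in the same command; at least two commands are required.
 5. Use `swift app --model xxx` to launch the app-ui interface, which now supports multimodal inference in the UI.
 6. Removed the AIGC dependencies along with the corresponding examples and training code.
-
-## Pending Tasks
-
-1. Custom dataset evaluation is not yet supported in version 3.0. Please use version 2.6.1.
-2. Megatron pre-training is not yet supported in version 3.0. Please use version 2.6.1.
-3. The documentation and README have not yet been fully updated.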

docs/source_en/Instruction/ReleaseNote3.0.md

Lines changed: 0 additions & 6 deletions
@@ -91,9 +91,3 @@ The parameters marked as compatible in version 2.0 have been entirely removed.
 5. Use `swift app --model xxx` to launch the app-ui interface, which supports multimodal interface inference.
 
 6. Removed dependencies for AIGC along with corresponding examples and training code.
-
-## Pending Tasks
-
-1. Custom dataset evaluation is not supported in version 3.0. Please use version 2.6.1.
-2. Megatron pre-training capabilities are not supported in version 3.0. Please use version 2.6.1.
-3. Documentation and README are temporarily incomplete and will be updated.

swift/llm/train/sft.py

Lines changed: 1 addition & 2 deletions
@@ -34,11 +34,10 @@ def __init__(self, args: Union[List[str], TrainArguments, None] = None) -> None:
 
     def _prepare_gradient_checkpointing(self):
         args = self.args
-
+        self.model.config.use_cache = False
         if args.gradient_checkpointing:
             self.model.supports_gradient_checkpointing = True
             dynamic_gradient_checkpointing(self.model)
-            self.model.config.use_cache = False # fix transformers==4.36
             self.model.enable_input_require_grads()
         model_meta = self.model.model_meta
         model_arch = get_model_arch(model_meta.model_arch)
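The sft.py hunk moves `self.model.config.use_cache = False` out of the `if args.gradient_checkpointing:` branch, so the KV cache is now disabled for every training run instead of only when gradient checkpointing is on. The commit message says only "fix use_cache"; a plausible motivation is that `past_key_values` are not needed during training, and many Hugging Face transformers models already log a warning and set `use_cache=False` themselves when gradient checkpointing is active. The sketch below contrasts the two orderings using a hypothetical `SimpleNamespace` stand-in for the model; `make_model`, `prepare_old`, and `prepare_new` are illustrative names, not ms-swift functions.

```python
from types import SimpleNamespace


def make_model():
    # Hypothetical stand-in for a Hugging Face model: only the attributes
    # touched by the hunk are modelled.
    return SimpleNamespace(config=SimpleNamespace(use_cache=True),
                           supports_gradient_checkpointing=False)


def prepare_old(model, gradient_checkpointing: bool):
    # Ordering before 3750eda: use_cache is only disabled inside the branch.
    if gradient_checkpointing:
        model.supports_gradient_checkpointing = True
        model.config.use_cache = False
    return model


def prepare_new(model, gradient_checkpointing: bool):
    # Ordering after 3750eda: use_cache is disabled before the branch,
    # so it holds whether or not gradient checkpointing is enabled.
    model.config.use_cache = False
    if gradient_checkpointing:
        model.supports_gradient_checkpointing = True
    return model


for flag in (True, False):
    old = prepare_old(make_model(), flag).config.use_cache
    new = prepare_new(make_model(), flag).config.use_cache
    print(f"gradient_checkpointing={flag}: old use_cache={old}, new use_cache={new}")
# gradient_checkpointing=True: old use_cache=False, new use_cache=False
# gradient_checkpointing=False: old use_cache=True, new use_cache=False
```

The only behavioral difference is the `gradient_checkpointing=False` case, where the old ordering left the cache enabled during training.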

swift/trainers/mixin.py

Lines changed: 1 addition & 1 deletion
@@ -224,7 +224,7 @@ def _save(self, output_dir: Optional[str] = None, state_dict=None):
         from swift.llm import save_checkpoint
         additional_saved_files = self.model_meta.additional_saved_files
         save_checkpoint(None, self.template.processor, output_dir, additional_saved_files=additional_saved_files)
-        if hasattr(self.model, 'origin_generation_config'):
+        if getattr(self.model, 'origin_generation_config', None):
             self.model.origin_generation_config.save_pretrained(output_dir)
 
     def _fix_zero3_gather_all_parameters(self) -> None:
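The mixin.py hunk swaps `hasattr` for `getattr(..., None)`. `hasattr` is true whenever the attribute exists, including when its value is `None`, and in that case the following `save_pretrained` call raises `AttributeError`; the `getattr` form skips both a missing and a `None` attribute. Whether ms-swift models ever carry `origin_generation_config = None` is an assumption here. The sketch uses hypothetical `SimpleNamespace` stand-ins, and `old_guard` / `new_guard` are illustrative names.

```python
from types import SimpleNamespace


def old_guard(model) -> bool:
    # Check before 3750eda: true whenever the attribute exists, even if it is None.
    return hasattr(model, 'origin_generation_config')


def new_guard(model) -> bool:
    # Check after 3750eda: true only if the attribute exists and is truthy.
    return bool(getattr(model, 'origin_generation_config', None))


# Hypothetical stand-ins: an attribute set to None, a missing attribute, and a
# populated one whose save_pretrained just records the target directory.
saved = []
unset = SimpleNamespace(origin_generation_config=None)
missing = SimpleNamespace()
populated = SimpleNamespace(
    origin_generation_config=SimpleNamespace(save_pretrained=saved.append))

for name, model in [('unset', unset), ('missing', missing), ('populated', populated)]:
    print(name, old_guard(model), new_guard(model))
# unset True False
# missing False False
# populated True True

if new_guard(populated):
    populated.origin_generation_config.save_pretrained('output_dir')
print(saved)  # ['output_dir']

try:
    if old_guard(unset):
        # None has no save_pretrained, so this line raises AttributeError.
        unset.origin_generation_config.save_pretrained('output_dir')
except AttributeError:
    print('old guard let None through')
```

Because the new guard tests truthiness, it would also skip any other falsy value; `getattr(...) is not None` is the stricter alternative if that distinction ever matters.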
