diff --git "a/docs/source/Megatron-SWIFT/\345\221\275\344\273\244\350\241\214\345\217\202\346\225\260.md" "b/docs/source/Megatron-SWIFT/\345\221\275\344\273\244\350\241\214\345\217\202\346\225\260.md" index b835cb959c..7c7ca5bcca 100644 --- "a/docs/source/Megatron-SWIFT/\345\221\275\344\273\244\350\241\214\345\217\202\346\225\260.md" +++ "b/docs/source/Megatron-SWIFT/\345\221\275\344\273\244\350\241\214\345\217\202\346\225\260.md" @@ -95,8 +95,8 @@ - exit_on_missing_checkpoint: 如果设置了`–-load`,但**找不到检查点,则直接退出**,而不是初始化。默认为True。 - 🔥async_save: 使用异步检查点保存。目前仅适用于`torch_dist`分布式检查点格式。默认为False。 - use_persistent_ckpt_worker: 使用持久化检查点工作进程用于异步保存,即创建专门后台进程来处理异步保存。默认为False。 -- ckpt_fully_parallel_load: 跨 DP 对分布式检查点使用完全加载并行化,加速权重加载速度。默认为False。 -- ckpt_assume_constant_structure: 如果在单个训练中,模型和优化器状态字典结构保持不变,允许Megatron进行额外检查点性能优化。默认为False。 +- ckpt_fully_parallel_load: 跨 DP 对分布式检查点使用完全加载并行化,加速权重加载速度。默认为True。 +- ckpt_assume_constant_structure: 如果在单个训练中,模型和优化器状态字典结构保持不变,允许Megatron进行额外检查点性能优化。默认为True。 **分布式参数**: 并行技术的选择请参考[训练技巧文档](快速开始.md#训练技巧)。 diff --git a/docs/source_en/Megatron-SWIFT/Command-line-parameters.md b/docs/source_en/Megatron-SWIFT/Command-line-parameters.md index 37983e8e1b..5bc776fc43 100644 --- a/docs/source_en/Megatron-SWIFT/Command-line-parameters.md +++ b/docs/source_en/Megatron-SWIFT/Command-line-parameters.md @@ -99,8 +99,8 @@ - exit_on_missing_checkpoint: If `--load` is set but **no checkpoint is found, exit directly** instead of initializing. Default is True. - 🔥async_save: Use asynchronous checkpoint saving. Currently only applicable to the `torch_dist` distributed checkpoint format. Defaults to False. - use_persistent_ckpt_worker: Use a persistent checkpoint worker process for async saving, i.e., create a dedicated background process to handle asynchronous saving. Defaults to False. -- ckpt_fully_parallel_load: Apply full load parallelization across DP for distributed checkpoints to accelerate weight loading speed. Defaults to False. -- ckpt_assume_constant_structure: If the model and optimizer state dict structure remains constant throughout a single training job, allows Megatron to perform additional checkpoint performance optimizations. Defaults to False. +- ckpt_fully_parallel_load: Apply full load parallelization across DP for distributed checkpoints to accelerate weight loading speed. Defaults to True. +- ckpt_assume_constant_structure: If the model and optimizer state dict structure remains constant throughout a single training job, allows Megatron to perform additional checkpoint performance optimizations. Defaults to True. **Distributed Parameters**: diff --git a/swift/megatron/argument/megatron_args.py b/swift/megatron/argument/megatron_args.py index 4cd2fc7a18..8f46599eb4 100644 --- a/swift/megatron/argument/megatron_args.py +++ b/swift/megatron/argument/megatron_args.py @@ -214,8 +214,8 @@ class MegatronArguments(ExtraMegatronArguments): exit_on_missing_checkpoint: bool = True async_save: bool = False use_persistent_ckpt_worker: bool = False - ckpt_fully_parallel_load: bool = False - ckpt_assume_constant_structure: bool = False + ckpt_fully_parallel_load: bool = True + ckpt_assume_constant_structure: bool = True # dist distributed_backend: Literal['nccl', 'gloo'] = 'nccl'