Commit 1e48900

Merge branch 'main' into release/3.3

2 parents: 999bd79 + 5ff0f86
4 files changed (+2, −6 lines)
docs/source/GetStarted/SWIFT安装.md (0 additions, 2 deletions)

````diff
@@ -8,8 +8,6 @@
 pip install 'ms-swift'
 # For evaluation
 pip install 'ms-swift[eval]' -U
-# For sequence parallel
-pip install 'ms-swift[seq_parallel]' -U
 # Full capabilities
 pip install 'ms-swift[all]' -U
 ```
````

docs/source/Instruction/预训练与微调.md (1 addition, 1 deletion)

```diff
@@ -65,8 +65,8 @@ ms-swift adopts a layered design; users can work through the command-line interface,
 - Any-to-Any model training: see [here](https://github.com/modelscope/swift/blob/main/examples/train/all_to_all)
 - Other capabilities:
   - Streaming data reading: reduces memory usage when the dataset is large. See [here](https://github.com/modelscope/swift/blob/main/examples/train/streaming/train.sh)
-  - Sequence parallelism: see [here](https://github.com/modelscope/swift/blob/main/examples/train/sequence_parallel)
   - Packing: concatenates multiple sequences into one so that each training sample gets as close to max_length as possible, improving GPU utilization. See [here](https://github.com/modelscope/swift/blob/main/examples/train/packing/train.sh)
+  - Long text training: see [here](https://github.com/modelscope/swift/blob/main/examples/train/long_text)
   - lazy tokenize: tokenizes the data during training rather than before it (for multimodal models this avoids loading all multimodal resources up front), which avoids preprocessing waits and saves memory. See [here](https://github.com/modelscope/swift/blob/main/examples/train/lazy_tokenize/train.sh)
 
 Tips:
```
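The streaming data reading described in the bullet above can be sketched generically: samples are yielded one at a time from disk instead of materializing the whole dataset in memory. This is an illustration under that assumption, not ms-swift's actual implementation.

```python
import json
import os
import tempfile


def stream_samples(path):
    """Yield one JSON-lines record at a time so the whole dataset
    never has to be held in memory at once.

    Generic sketch of streaming data reading, not ms-swift's code.
    """
    with open(path) as f:
        for line in f:
            line = line.strip()
            if line:
                yield json.loads(line)


# Usage: write a tiny JSONL file, then stream it back record by record.
with tempfile.NamedTemporaryFile("w", suffix=".jsonl", delete=False) as f:
    f.write('{"text": "a"}\n{"text": "b"}\n')
    tmp_path = f.name
records = list(stream_samples(tmp_path))
os.remove(tmp_path)
assert records == [{"text": "a"}, {"text": "b"}]
```

Because the generator holds only the current line, peak memory stays constant regardless of dataset size, which is the point of the streaming option.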

docs/source_en/GetStarted/SWIFT-installation.md (0 additions, 2 deletions)

````diff
@@ -8,8 +8,6 @@ You can install it using pip:
 pip install 'ms-swift'
 # For evaluation usage
 pip install 'ms-swift[eval]' -U
-# For sequence parallel usage
-pip install 'ms-swift[seq_parallel]' -U
 # Full capabilities
 pip install 'ms-swift[all]' -U
 ```
````

docs/source_en/Instruction/Pre-training-and-Fine-tuning.md (1 addition, 1 deletion)

```diff
@@ -68,8 +68,8 @@ Additionally, we offer a series of scripts to help you understand the training c
 - Any-to-Any Model Training: Refer to [here](https://github.com/modelscope/swift/blob/main/examples/train/all_to_all).
 - Other Capabilities:
   - Streaming Data Reading: Reduces memory usage when handling large datasets. Refer to [here](https://github.com/modelscope/swift/blob/main/examples/train/streaming/train.sh).
-  - Sequence Parallelism: Refer to [here](https://github.com/modelscope/swift/blob/main/examples/train/sequence_parallel).
   - Packing: Combines multiple sequences into one, making each training sample as close to max_length as possible to improve GPU utilization. Refer to [here](https://github.com/modelscope/swift/blob/main/examples/train/packing/train.sh).
+  - Long Text Training: Refer to [here](https://github.com/modelscope/swift/blob/main/examples/train/long_text).
   - Lazy Tokenize: Performs tokenization during training instead of before it (for multi-modal models, this avoids the need to load all multi-modal resources before training), which can reduce preprocessing wait times and save memory. Refer to [here](https://github.com/modelscope/swift/blob/main/examples/train/lazy_tokenize/train.sh).
```
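The packing capability mentioned in the diffs can be sketched as a greedy first-fit pass: sequences are concatenated until adding the next one would exceed max_length. This is a generic illustration of the idea, not ms-swift's actual implementation, and it assumes every input sequence already fits within max_length.

```python
def pack_sequences(seqs, max_length):
    """Greedily concatenate token sequences so each packed sample gets
    as close to max_length as possible without exceeding it.

    Generic sketch of the packing idea, not ms-swift's code.
    Sequences longer than max_length would need truncation first.
    """
    packed, current = [], []
    for seq in seqs:
        # Start a new packed sample when this sequence would overflow.
        if current and len(current) + len(seq) > max_length:
            packed.append(current)
            current = []
        current = current + seq
    if current:
        packed.append(current)
    return packed


# Usage: four sequences of lengths 3, 4, 2, 5 packed into bins of at most 8.
bins = pack_sequences([[1] * 3, [2] * 4, [3] * 2, [4] * 5], max_length=8)
assert all(len(b) <= 8 for b in bins)              # no bin overflows
assert sum(len(b) for b in bins) == 3 + 4 + 2 + 5  # no tokens lost
```

Filling each sample toward max_length this way reduces the share of padding tokens per batch, which is the GPU-utilization gain the doc refers to.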
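Lazy tokenize, as described in the bullet above, can be sketched with a dataset wrapper that tokenizes a sample only when it is accessed rather than preprocessing everything before training. A generic illustration, not ms-swift's actual implementation:

```python
class LazyTokenizeDataset:
    """Tokenize each sample when it is accessed, not before training starts.

    Generic sketch of the lazy-tokenize idea, not ms-swift's code.
    For multimodal data, __getitem__ is also where resources such as
    images would be loaded, so nothing is read in up front.
    """

    def __init__(self, texts, tokenize_fn):
        self.texts = texts
        self.tokenize_fn = tokenize_fn
        self.calls = 0  # how many samples have been tokenized so far

    def __len__(self):
        return len(self.texts)

    def __getitem__(self, i):
        self.calls += 1  # tokenization cost is paid per access
        return self.tokenize_fn(self.texts[i])


# Usage: nothing is tokenized until a sample is actually requested.
ds = LazyTokenizeDataset(["a b", "c d e"], lambda t: t.split())
assert ds.calls == 0  # no upfront preprocessing
assert ds[1] == ["c", "d", "e"]
assert ds.calls == 1
```

Deferring the work this way trades a small per-step cost for zero preprocessing wait and lower memory, matching the rationale given in the doc.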