49 changes: 49 additions & 0 deletions configs/sft/qwen3_sft.yaml
@@ -0,0 +1,49 @@
model:
  model_path: Qwen/Qwen3-32B
  attn_implementation: flash_attention_2

data:
  train_path: None
  data_type: conversation
  datasets_type: iterable
  dataloader_type: native
  chat_template: default
  max_seq_len: 2048
  train_size: 40000000
  text_keys: messages

train:
  num_train_epochs: 2
  max_steps: 2000
  use_wandb: false
  output_dir: qwen3_sft
  data_parallel_mode: fsdp2
  ulysses_parallel_size: 1
  expert_parallel_size: 1
  global_batch_size: 16
  micro_batch_size: 1
  rmpad: false
  rmpad_with_pos_ids: true
  bsz_warmup_ratio: 0.007
  optimizer: adamw
  lr: 1.0e-5
  lr_warmup_ratio: 0.007
  lr_decay_style: constant
  lr_decay_ratio: 1.0
  weight_decay: 0.01
  max_grad_norm: 1.0
  enable_mixed_precision: true
  enable_gradient_checkpointing: true
  enable_full_shard: true
  enable_fsdp_offload: false
  enable_activation_offload: false
  init_device: meta
  enable_full_determinism: false
  empty_cache_steps: 500
  ckpt_manager: dcp
  load_checkpoint_path: ""
  save_steps: 2000
  save_epochs: 2
  save_hf_weights: true
  wandb_project: Qwen3_32B_sft
  wandb_name: Qwen3_32B_fsdp2
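
Note that `data.train_path` is left as a placeholder (`None`), and `data_type: conversation` with `text_keys: messages` implies each sample carries a `messages` field. As a hedged illustration only, not the authoritative VeOmni schema, a one-record JSONL dataset in the common chat format might look like this:

```bash
# Hypothetical one-record dataset in the common "messages" chat format.
# The role/content schema is an assumption for illustration; check VeOmni's data docs for the exact format.
cat > /tmp/sft_sample.jsonl <<'EOF'
{"messages": [{"role": "user", "content": "Hello"}, {"role": "assistant", "content": "Hi! How can I help you today?"}]}
EOF
```

Point `train_path` at a file like this (or your real dataset) before launching training.
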
74 changes: 74 additions & 0 deletions docs/ascend_tutorials/ascend_quick_start.md
@@ -0,0 +1,74 @@
# Ascend Quickstart

Last updated: 2025-11-28

We have added support for Huawei Ascend devices in VeOmni.
> Reviewer comment: It is suggested to either use English throughout, or to provide two versions (a Chinese-only version and an English-only version).


### Environment Requirements

| software | version |
|-----------|--------------------------|
| Python | >= 3.10, <3.12 |
| CANN | == 8.3.RC1 |
| torch | == 2.7.1 |
| torch_npu | == 2.7.1 |

Please refer to this [document](https://gitcode.com/Ascend/pytorch) for basic environment setup.
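
After the driver and CANN are installed, you can optionally confirm that the NPUs are visible. `npu-smi` ships with the Ascend driver; the exact output format varies by driver version:

```bash
# Optional: list the NPUs visible to the Ascend driver.
npu-smi info
```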

### Installing Dependencies with uv

#### 1. Enter the VeOmni root directory

```bash
git clone https://github.com/ByteDance-Seed/VeOmni.git
cd VeOmni
```

#### 2. Pin the Python version

```bash
uv python pin 3.11
```

#### 3. (Optional) Set a timeout

If the network is unstable, you can raise the download timeout by setting the UV_HTTP_TIMEOUT environment variable:

```bash
export UV_HTTP_TIMEOUT=60
```

#### 4. Install the environment with uv

```bash
uv sync --extra npu --allow-insecure-host github.com --allow-insecure-host pythonhosted.org
```

#### 5. Use the environment

After installation, a .venv folder appears in the VeOmni root directory; this is the environment created by uv. Activate it with:

```bash
source .venv/bin/activate
```

List the installed packages:

```bash
uv pip list
```
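
Optionally, run a quick sanity check that the NPU stack is usable from this environment. This is a minimal sketch; it assumes torch and torch_npu were pulled in by `uv sync --extra npu` and that torch_npu exposes the usual `torch.npu` device interface:

```bash
# Minimal sanity check (sketch): print versions and confirm the NPUs are reachable.
python -c "import torch, torch_npu; print(torch.__version__, torch_npu.__version__); print('npu available:', torch.npu.is_available(), 'device count:', torch.npu.device_count())"
```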

### Quick Start

1. Prepare the model and dataset (see the sketch after the training command below).

2. Set the NPROC_PER_NODE parameter in train.sh to the actual number of NPUs.

3. Run the training script:

```bash
# Set environment variables
export ASCEND_RT_VISIBLE_DEVICES=0,1,2,3
export PYTORCH_NPU_ALLOC_CONF=expandable_segments:True
export MULTI_STREAM_MEMORY_REUSE=2

bash train.sh tasks/train_torch.py configs/sft/qwen3_sft.yaml
```
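
As referenced in steps 1 and 2, here is a hedged sketch of pointing the sample config at a local dataset and matching the process count to the visible NPUs; the dataset path is a placeholder, and the exact way `train.sh` reads `NPROC_PER_NODE` may differ in your copy:

```bash
# Step 1 (sketch): point the config at a local conversation-format dataset (placeholder path).
sed -i 's|train_path: None|train_path: /data/sft/train.jsonl|' configs/sft/qwen3_sft.yaml

# Step 2 (sketch): four NPUs are exposed above, so the per-node process count should be 4.
# Locate and edit NPROC_PER_NODE in train.sh accordingly.
grep -n "NPROC_PER_NODE" train.sh
```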

### Parallelism Support

| Feature          | Supported   |
|------------------|-------------|
| fsdp             | ✅          |
| fsdp2            | ✅          |
| ulysses parallel | ✅          |
| expert_parallel  | In progress |
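
For example, to try Ulysses sequence parallelism with the sample config, a hedged sketch (the value 2 is illustrative and must be compatible with the model's attention-head count and your overall device layout):

```bash
# Illustrative: switch the sample config to 2-way Ulysses sequence parallelism.
sed -i 's|ulysses_parallel_size: 1|ulysses_parallel_size: 2|' configs/sft/qwen3_sft.yaml
bash train.sh tasks/train_torch.py configs/sft/qwen3_sft.yaml
```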

74 changes: 74 additions & 0 deletions docs/ascend_tutorials/ascend_quick_start_en.md
@@ -0,0 +1,74 @@
# Ascend Quickstart

Last updated: 2025-11-28

We have added support for Huawei Ascend devices in VeOmni.

### Environment Requirements

| software | version |
| --------- | -------------- |
| Python | >= 3.10, <3.12 |
| CANN | == 8.3.RC1 |
| torch | == 2.7.1 |
| torch_npu | == 2.7.1 |

Please refer to this [document](https://gitcode.com/Ascend/pytorch) for basic environment setup.
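
Optionally, once the driver and CANN are in place, `npu-smi` (installed with the Ascend driver) can confirm the NPUs are visible; the output format varies by driver version:

```bash
# Optional: list the NPUs visible to the Ascend driver.
npu-smi info
```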

### Installing Dependencies with uv

#### 1. Enter the VeOmni root directory

```bash
git clone https://github.com/ByteDance-Seed/VeOmni.git
cd VeOmni
```

#### 2. Pin the Python version

```bash
uv python pin 3.11
```

#### 3. (Optional) Set timeout

If the network is unstable, you can increase the timeout to avoid download failures by setting the UV_HTTP_TIMEOUT environment variable:

```bash
export UV_HTTP_TIMEOUT=60
```

#### 4. Install the environment using uv

```bash
uv sync --extra npu --allow-insecure-host github.com --allow-insecure-host pythonhosted.org
```

#### 5. Using the environment

After installation, a .venv folder will appear in the VeOmni project root. This is the environment created by uv.
Activate it with:

```bash
source .venv/bin/activate
```

Check installed dependencies:

```bash
uv pip list
```
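
A quick, optional sanity check can confirm that the NPU stack works from this environment. This is a sketch that assumes torch and torch_npu were installed by `uv sync --extra npu` and expose the standard `torch.npu` interface:

```bash
# Minimal sanity check (sketch): print versions and confirm the NPUs are reachable.
python -c "import torch, torch_npu; print(torch.__version__, torch_npu.__version__); print('npu available:', torch.npu.is_available(), 'device count:', torch.npu.device_count())"
```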

### Quick Start

1. Prepare the model and dataset (see the sketch after the training command below).

2. Set the NPROC_PER_NODE parameter in train.sh according to the number of available NPUs.

3. Run the training script:

```bash
# Set environment variables
export ASCEND_RT_VISIBLE_DEVICES=0,1,2,3
export PYTORCH_NPU_ALLOC_CONF=expandable_segments:True
export MULTI_STREAM_MEMORY_REUSE=2

bash train.sh tasks/train_torch.py configs/sft/qwen3_sft.yaml
```
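
The sketch below illustrates steps 1 and 2; the dataset path is a placeholder, and how `train.sh` consumes `NPROC_PER_NODE` depends on your copy of the script:

```bash
# Step 1 (sketch): point the config at a local conversation-format dataset (placeholder path).
sed -i 's|train_path: None|train_path: /data/sft/train.jsonl|' configs/sft/qwen3_sft.yaml

# Step 2 (sketch): four NPUs are exposed above, so set the per-node process count to 4
# by editing NPROC_PER_NODE in train.sh.
grep -n "NPROC_PER_NODE" train.sh
```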

### Parallelism Support

| Feature | Supported |
| ---------------- | ----------- |
| fsdp | ✅ |
| fsdp2 | ✅ |
| ulysses parallel | ✅ |
| expert_parallel | In progress |
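
As an example, Ulysses sequence parallelism can be tried with the sample config; this is a hedged sketch, and the value 2 is illustrative only (it must be compatible with the model's attention-head count and your device layout):

```bash
# Illustrative: switch the sample config to 2-way Ulysses sequence parallelism.
sed -i 's|ulysses_parallel_size: 1|ulysses_parallel_size: 2|' configs/sft/qwen3_sft.yaml
bash train.sh tasks/train_torch.py configs/sft/qwen3_sft.yaml
```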