49 changes: 49 additions & 0 deletions configs/sft/qwen3_sft.yaml
@@ -0,0 +1,49 @@
model:
  model_path: Qwen/Qwen3-32B
  attn_implementation: flash_attention_2

data:
  train_path: None
  data_type: conversation
  datasets_type: iterable
  dataloader_type: native
  chat_template: default
  max_seq_len: 2048
  train_size: 40000000
  text_keys: messages

train:
  num_train_epochs: 2
  max_steps: 2000
  use_wandb: false
  output_dir: qwen3_sft
  data_parallel_mode: fsdp2
  ulysses_parallel_size: 1
  expert_parallel_size: 1
  global_batch_size: 16
  micro_batch_size: 1
  rmpad: false
  rmpad_with_pos_ids: true
  bsz_warmup_ratio: 0.007
  optimizer: adamw
  lr: 1.0e-5
  lr_warmup_ratio: 0.007
  lr_decay_style: constant
  lr_decay_ratio: 1.0
  weight_decay: 0.01
  max_grad_norm: 1.0
  enable_mixed_precision: true
  enable_gradient_checkpointing: true
  enable_full_shard: true
  enable_fsdp_offload: false
  enable_activation_offload: false
  init_device: meta
  enable_full_determinism: false
  empty_cache_steps: 500
  ckpt_manager: dcp
  load_checkpoint_path: ""
  save_steps: 2000
  save_epochs: 2
  save_hf_weights: true
  wandb_project: Qwen3_32B_sft
  wandb_name: Qwen3_32B_fsdp2
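
Note that `data.train_path` is left as a placeholder (`None`), and `data_type: conversation` with `text_keys: messages` implies each sample carries a `messages` field. As a hedged illustration only, not the authoritative VeOmni schema, a one-record JSONL dataset in the common chat format might look like this:

```bash
# Hypothetical one-record dataset in the common "messages" chat format.
# The role/content schema is an assumption for illustration; check VeOmni's data docs for the exact format.
cat > /tmp/sft_sample.jsonl <<'EOF'
{"messages": [{"role": "user", "content": "Hello"}, {"role": "assistant", "content": "Hi! How can I help you today?"}]}
EOF
```

Point `train_path` at a file like this (or your real dataset) before launching training.
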
74 changes: 74 additions & 0 deletions docs/ascend_tutorials/ascend_quick_start.md
@@ -0,0 +1,74 @@
# Ascend Quickstart

Last updated: 2025-11-28

We have added support for Huawei Ascend devices in VeOmni.
> Reviewer comment: It is suggested to either use English throughout, or to provide two versions (a Chinese-only version and an English-only version).


### Environment Requirements

| software | version |
|-----------|--------------------------|
| Python | >= 3.10, <3.12 |
| CANN | == 8.3.RC1 |
| torch | == 2.7.1 |
| torch_npu | == 2.7.1 |

Please refer to this [document](https://gitcode.com/Ascend/pytorch) for basic environment setup.
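
After the driver and CANN are installed, you can optionally confirm that the NPUs are visible. `npu-smi` ships with the Ascend driver; the exact output format varies by driver version:

```bash
# Optional: list the NPUs visible to the Ascend driver.
npu-smi info
```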

### Installing Dependencies with uv

#### 1. Enter the VeOmni root directory

```bash
git clone https://github.com/ByteDance-Seed/VeOmni.git
cd VeOmni
```

#### 2. Pin the Python version

```bash
uv python pin 3.11
```

#### 3. (Optional) Set a timeout

If the network is unstable, you can raise the download timeout by setting the UV_HTTP_TIMEOUT environment variable:

```bash
export UV_HTTP_TIMEOUT=60
```

#### 4. Install the environment with uv

```bash
uv sync --extra npu --allow-insecure-host github.com --allow-insecure-host pythonhosted.org
```

#### 5. Use the environment

After installation, a .venv folder appears in the VeOmni root directory; this is the environment created by uv. Activate it with:

```bash
source .venv/bin/activate
```

List the installed packages:

```bash
uv pip list
```
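
Optionally, run a quick sanity check that the NPU stack is usable from this environment. This is a minimal sketch; it assumes torch and torch_npu were pulled in by `uv sync --extra npu` and that torch_npu exposes the usual `torch.npu` device interface:

```bash
# Minimal sanity check (sketch): print versions and confirm the NPUs are reachable.
python -c "import torch, torch_npu; print(torch.__version__, torch_npu.__version__); print('npu available:', torch.npu.is_available(), 'device count:', torch.npu.device_count())"
```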

### Quick Start

1. Prepare the model and dataset (see the sketch after the training command below).

2. Set the NPROC_PER_NODE parameter in train.sh to the actual number of NPUs.

3. Run the training script:

```bash
# Set environment variables
export ASCEND_RT_VISIBLE_DEVICES=0,1,2,3
export PYTORCH_NPU_ALLOC_CONF=expandable_segments:True
export MULTI_STREAM_MEMORY_REUSE=2

bash train.sh tasks/train_torch.py configs/sft/qwen3_sft.yaml
```
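
As referenced in steps 1 and 2, here is a hedged sketch of pointing the sample config at a local dataset and matching the process count to the visible NPUs; the dataset path is a placeholder, and the exact way `train.sh` reads `NPROC_PER_NODE` may differ in your copy:

```bash
# Step 1 (sketch): point the config at a local conversation-format dataset (placeholder path).
sed -i 's|train_path: None|train_path: /data/sft/train.jsonl|' configs/sft/qwen3_sft.yaml

# Step 2 (sketch): four NPUs are exposed above, so the per-node process count should be 4.
# Locate and edit NPROC_PER_NODE in train.sh accordingly.
grep -n "NPROC_PER_NODE" train.sh
```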

### Parallelism Support

| Feature          | Supported   |
|------------------|-------------|
| fsdp             | ✅          |
| fsdp2            | ✅          |
| ulysses parallel | ✅          |
| expert_parallel  | In progress |
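
For example, to try Ulysses sequence parallelism with the sample config, a hedged sketch (the value 2 is illustrative and must be compatible with the model's attention-head count and your overall device layout):

```bash
# Illustrative: switch the sample config to 2-way Ulysses sequence parallelism.
sed -i 's|ulysses_parallel_size: 1|ulysses_parallel_size: 2|' configs/sft/qwen3_sft.yaml
bash train.sh tasks/train_torch.py configs/sft/qwen3_sft.yaml
```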

74 changes: 74 additions & 0 deletions docs/ascend_tutorials/ascend_quick_start_en.md
@@ -0,0 +1,74 @@
# Ascend Quickstart

Last updated: 2025-11-28

We have added support for Huawei Ascend devices in VeOmni.

### Environment Requirements

| software | version |
| --------- | -------------- |
| Python | >= 3.10, <3.12 |
| CANN | == 8.3.RC1 |
| torch | == 2.7.1 |
| torch_npu | == 2.7.1 |

Please refer to this [document](https://gitcode.com/Ascend/pytorch) for basic environment setup.
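
Optionally, once the driver and CANN are in place, `npu-smi` (installed with the Ascend driver) can confirm the NPUs are visible; the output format varies by driver version:

```bash
# Optional: list the NPUs visible to the Ascend driver.
npu-smi info
```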

### Installing Dependencies with uv

#### 1. Enter the VeOmni root directory

```bash
git clone https://github.com/ByteDance-Seed/VeOmni.git
cd VeOmni
```

#### 2. Pin the Python version

```bash
uv python pin 3.11
```

#### 3. (Optional) Set timeout

If the network is unstable, you can increase the timeout to avoid download failures by setting the UV_HTTP_TIMEOUT environment variable:

```bash
export UV_HTTP_TIMEOUT=60
```

#### 4. Install the environment using uv

```bash
uv sync --extra npu --allow-insecure-host github.com --allow-insecure-host pythonhosted.org
```

#### 5. Using the environment

After installation, a .venv folder will appear in the VeOmni project root. This is the environment created by uv.
Activate it with:

```bash
source .venv/bin/activate
```

Check installed dependencies:

```bash
uv pip list
```
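
A quick, optional sanity check can confirm that the NPU stack works from this environment. This is a sketch that assumes torch and torch_npu were installed by `uv sync --extra npu` and expose the standard `torch.npu` interface:

```bash
# Minimal sanity check (sketch): print versions and confirm the NPUs are reachable.
python -c "import torch, torch_npu; print(torch.__version__, torch_npu.__version__); print('npu available:', torch.npu.is_available(), 'device count:', torch.npu.device_count())"
```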

### Quick Start

1. Prepare the model and dataset (see the sketch after the training command below).

2. Set the NPROC_PER_NODE parameter in train.sh according to the number of available NPUs.

3. Run the training script:

```bash
# Set environment variables
export ASCEND_RT_VISIBLE_DEVICES=0,1,2,3
export PYTORCH_NPU_ALLOC_CONF=expandable_segments:True
export MULTI_STREAM_MEMORY_REUSE=2

bash train.sh tasks/train_torch.py configs/sft/qwen3_sft.yaml
```
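
The sketch below illustrates steps 1 and 2; the dataset path is a placeholder, and how `train.sh` consumes `NPROC_PER_NODE` depends on your copy of the script:

```bash
# Step 1 (sketch): point the config at a local conversation-format dataset (placeholder path).
sed -i 's|train_path: None|train_path: /data/sft/train.jsonl|' configs/sft/qwen3_sft.yaml

# Step 2 (sketch): four NPUs are exposed above, so set the per-node process count to 4
# by editing NPROC_PER_NODE in train.sh.
grep -n "NPROC_PER_NODE" train.sh
```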

### Parallelism Support

| Feature | Supported |
| ---------------- | ----------- |
| fsdp | ✅ |
| fsdp2 | ✅ |
| ulysses parallel | ✅ |
| expert_parallel | In progress |
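
As an example, Ulysses sequence parallelism can be tried with the sample config; this is a hedged sketch, and the value 2 is illustrative only (it must be compatible with the model's attention-head count and your device layout):

```bash
# Illustrative: switch the sample config to 2-way Ulysses sequence parallelism.
sed -i 's|ulysses_parallel_size: 1|ulysses_parallel_size: 2|' configs/sft/qwen3_sft.yaml
bash train.sh tasks/train_torch.py configs/sft/qwen3_sft.yaml
```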