|
1 | | -# How to train naml on kunlun |
| 1 | +# Accelerating NAML Model Training with Kunlun XPU Chips |
2 | 2 |
|
3 | | -## Prepare kunlun environment |
4 | | -[Paddle installation for machines with Kunlun XPU card](https://www.paddlepaddle.org.cn/install/quick?docurl=/documentation/docs/zh/2.0-rc1/install/install_Kunlun_zh.html) |
| 3 | +## Prepare the Paddle training environment for Kunlun XPU |
| 4 | +[Running PaddlePaddle on Kunlun XPU chips](https://www.paddlepaddle.org.cn/documentation/docs/zh/guides/xpu_docs/index_cn.html) |
5 | 5 |
|
6 | | -## Prepare data |
| 6 | +## Data preparation |
| 7 | + |
| 8 | +### Sample data |
| 9 | +See [Data preparation](README##数据准备) |
| 10 | + |
| 11 | + |
| 12 | +### Full dataset |
7 | 13 | ```shell |
8 | 14 | cd PaddleRec/datasets/MIND/data |
9 | 15 | bash run.sh |
10 | 16 | ``` |
11 | 17 |
|
12 | | -## Train |
| 18 | +## Training |
13 | 19 | ```shell |
14 | | -# set kunlun card id |
| 20 | +# Set the ID of the Kunlun XPU card used for training |
15 | 21 | export FLAGS_selected_xpus=0 |
16 | | -# enable convolution autotune |
| 22 | +# Enable convolution acceleration on the Kunlun XPU (optional) |
17 | 23 | export XPU_CONV_AUTOTUNE=2 |
18 | 24 |
|
19 | 25 | cd PaddleRec/models/rank/naml |
20 | | -python3.7 -u ../../../tools/trainer.py -m config_bigdata_kunlun.yaml |
| 26 | +# Dynamic-graph training on the full dataset |
| 27 | +python3.7 -u ../../../tools/trainer.py -m config_bigdata_kunlun.yaml # for the sample data, specify config_kunlun.yaml instead |
| 28 | +# Static-graph training on the full dataset |
| 29 | +python3.7 -u ../../../tools/static_trainer.py -m config_bigdata_kunlun.yaml # for the sample data, specify config_kunlun.yaml instead |
21 | 30 | ``` |
22 | 31 |
|
23 | | - |
24 | | -## Eval |
| 32 | +## Evaluation |
25 | 33 | ```shell |
26 | | -# set kunlun card id |
| 34 | +# Set the ID of the Kunlun XPU card used for evaluation |
27 | 35 | export FLAGS_selected_xpus=0 |
28 | | -# enable convolution autotune |
| 36 | +# Enable convolution acceleration on the Kunlun XPU (optional) |
29 | 37 | export XPU_CONV_AUTOTUNE=2 |
30 | 38 |
|
31 | 39 | cd PaddleRec/models/rank/naml |
32 | | -python3.7 -u ../../../tools/infer.py -m config_bigdata_kunlun.yaml |
| 40 | +# Dynamic-graph inference on the full dataset |
| 41 | +python3.7 -u ../../../tools/infer.py -m config_bigdata_kunlun.yaml # for the sample data, specify config_kunlun.yaml instead |
| 42 | +# Static-graph inference on the full dataset |
| 43 | +python3.7 -u ../../../tools/static_infer.py -m config_bigdata_kunlun.yaml # for the sample data, specify config_kunlun.yaml instead |
33 | 44 | ``` |
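The training and evaluation commands above differ only in the tool script (dynamic vs. static graph, train vs. infer) and in the config file (full vs. sample data). As a minimal sketch of that mapping, the helper below prints the matching command instead of running it. The function name `naml_kunlun_cmd` and its three arguments are hypothetical and not part of PaddleRec; only the script and config file names come from the README.

```shell
#!/bin/sh
# Hypothetical helper (not part of PaddleRec): print the command for a given
# stage, graph mode, and dataset size so it can be reviewed before running.
naml_kunlun_cmd() {
    stage="$1"   # train | infer
    mode="$2"    # dynamic | static
    data="$3"    # full | sample

    # Pick the tool script named in the README for this stage and graph mode.
    case "$stage:$mode" in
        train:dynamic) script="trainer.py" ;;
        train:static)  script="static_trainer.py" ;;
        infer:dynamic) script="infer.py" ;;
        infer:static)  script="static_infer.py" ;;
        *) echo "unknown stage/mode: $stage/$mode" >&2; return 1 ;;
    esac

    # Pick the config file named in the README for this dataset size.
    case "$data" in
        full)   config="config_bigdata_kunlun.yaml" ;;
        sample) config="config_kunlun.yaml" ;;
        *) echo "unknown dataset: $data" >&2; return 1 ;;
    esac

    echo "python3.7 -u ../../../tools/$script -m $config"
}

# Example: static-graph training on the sample data.
naml_kunlun_cmd train static sample
```

The printed command is meant to be run from `PaddleRec/models/rank/naml`, after exporting `FLAGS_selected_xpus` as shown in the blocks above.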
| 45 | + |
| 46 | +## Model performance |
| 47 | +Results after training for 2 epochs on the full dataset: |
| 48 | + |
| 49 | +| Model | Training AUC | batch_size | epoch_num | Time per epoch |
| 50 | +| :------| :------ | :------ | :------| :------ |
| 51 | +| naml | 0.71 | 50 | 2 | about 7 hours |
| 52 | + |
| 53 | + |
| 54 | +| Model | Inference AUC | batch_size | Time per epoch |
| 55 | +| :------| :------ | :------ | :------ |
| 56 | +| naml | 0.67 | 10 | about 2 hours |