Commit e06d21c

update ai4s.png (#7194)
1 parent adbec3d commit e06d21c

2 files changed: +5 -3 lines changed

docs/guides/paddle_v3_features/higher_order_ad_cn.md

Lines changed: 5 additions & 3 deletions
@@ -184,16 +184,18 @@ print(H_y_x2_x2.shape)

 In addition, while working with NVIDIA to adapt its AI Physics tool Modulus-sym, PaddlePaddle used higher-order automatic differentiation and compiler optimization to complete adaptation of the full model set ([Modulus-sym[paddle-backend]](https://github.com/PaddlePaddle/modulus-sym/tree/paddle?tab=readme-ov-file#modulus-symbolic-betapaddle-backend)), substantially improving the performance of equation-solving models: compared with Modulus-sym's existing backend, **solving speed is 71% higher on average**

-In the AI molecular dynamics suite [DeePMD-kit](https://docs.deepmodeling.com/projects/deepmd/en/latest/train/training.html), we adapted dpa2, se_atten, and se_e2_a to dynamic graph mode and the compiler. Compared with the DeePMD-kit torch backend, **solving speed improved by 102.6%, 40.5%, and 102.6%, respectively**. The results are published in the paper [DeePMD-kit v3: A Multiple-Backend Framework for Machine Learning Potentials](https://arxiv.org/abs/2502.19161)
-
 <figure align="center">
 <img src="https://raw.githubusercontent.com/PaddlePaddle/docs/develop/docs/guides/paddle_v3_features/images/higher_order_ad/ai4s.png" style="zoom:40%"/>
 </figure>

+> Test environment for the above: cuda 11.8, A100-SXM4-40GB, torch 2.6(2236df1), paddle 3.0(388165), ips = total_batch_size / batch_cost(ms)
+
+In the AI molecular dynamics suite [DeePMD-kit](https://docs.deepmodeling.com/projects/deepmd/en/latest/train/training.html), we adapted dpa2, se_atten, and se_e2_a to dynamic graph mode and the compiler. Compared with the DeePMD-kit torch backend, **solving speed improved by 102.6%, 40.5%, and 102.6%, respectively**. The results are published in the paper [DeePMD-kit v3: A Multiple-Backend Framework for Machine Learning Potentials](https://arxiv.org/abs/2502.19161)
+
 | Model / average time (s/batch) | Torch(dygraph) | Paddle(dygraph) | Paddle(CINN) | IPS improvement |
 |:---------------------------|:-------|:-------|:-------------|:-----------|
 | dpa2 | 0.1064 | 0.120 | **0.053** | 102.6% |
 | se_atten | 0.0336 | 0.049 | **0.024** | 40.5% |
 | se_e2_a | 0.0227 | 0.025 | **0.011** | 102.6% |

-> cuda 11.8, A100-SXM4-40GB, torch 2.6(2236df1), paddle 3.0(86994e3), ips improvement = (torch/paddle-1)
+> Test environment for the above: cuda 11.8, A100-SXM4-40GB, torch 2.6(2236df1), paddle 3.0(86994e3), ips improvement = (torch/paddle-1)
-312 KB
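
The footnote in the diff above defines the IPS improvement as `(torch/paddle-1)`. The sketch below shows one way to read that formula. It assumes `torch` and `paddle` are the per-batch times from the Torch(dygraph) and Paddle(CINN) columns, so that at a fixed batch size the time ratio equals the inverse IPS ratio. The function name `ips_improvement` and the dictionary layout are illustrative and do not come from the commit.

```python
# Per-batch times (s/batch) copied from the table in the diff above.
# Assumption: the footnote's "torch" and "paddle" refer to these columns.
BATCH_TIMES = {
    "dpa2": {"torch": 0.1064, "paddle_cinn": 0.053},
    "se_atten": {"torch": 0.0336, "paddle_cinn": 0.024},
    "se_e2_a": {"torch": 0.0227, "paddle_cinn": 0.011},
}


def ips_improvement(torch_time: float, paddle_time: float) -> float:
    """Relative IPS gain of Paddle over Torch, computed as torch/paddle - 1.

    With a fixed batch size, IPS is inversely proportional to batch time,
    so a ratio of times (torch over paddle) equals a ratio of IPS values
    (paddle over torch).
    """
    if paddle_time <= 0:
        raise ValueError("paddle_time must be positive")
    return torch_time / paddle_time - 1.0


if __name__ == "__main__":
    for model, t in BATCH_TIMES.items():
        gain = ips_improvement(t["torch"], t["paddle_cinn"])
        print(f"{model}: {gain:.1%}")
```

The table entries are rounded, so recomputing from them will not necessarily reproduce the published percentages exactly.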