
Commit 6c9b422

Merge branch 'PaddlePaddle:develop' into dev_model
2 parents: eb0a431 + 931e5c5

File tree

16 files changed: +258 −32 lines


README.md

Lines changed: 2 additions & 2 deletions
@@ -108,7 +108,7 @@ PaddleScience is a scientific computing toolkit built on the deep learning framework PaddlePaddle
  | Crystal material property prediction | [CGCNN](https://paddlescience-docs.readthedocs.io/zh-cn/latest/zh/examples/cgcnn/) | Data-driven | GNN | Supervised learning | [MP](https://next-gen.materialsproject.org/) / [Perovskite](https://cmr.fysik.dtu.dk/cubic_perovskites/cubic_perovskites.html) / [C2DB](https://cmr.fysik.dtu.dk/c2db/c2db.html) / [test](https://paddle-org.bj.bcebos.com/paddlescience%2Fdatasets%2Fcgcnn%2Fcgcnn-test.zip) | [Paper](https://journals.aps.org/prl/abstract/10.1103/PhysRevLett.120.145301) |
  | Molecular generation | [MoFlow](https://paddlescience-docs.readthedocs.io/zh-cn/latest/zh/examples/moflow/) | Data-driven | Flow Model | Supervised learning | [qm9/ zink250k](https://aistudio.baidu.com/datasetdetail/282687) | [Paper](https://arxiv.org/abs/2006.10137v1) |
  | Molecular property prediction | [IFM](https://paddlescience-docs.readthedocs.io/zh-cn/latest/zh/examples/ifm/) | Data-driven | MLP | Supervised learning | [tox21/sider/hiv/bace/bbbp](https://paddlescience-docs.readthedocs.io/zh-cn/latest/zh/examples/ifm/#:~:text=molecules%20%E6%95%B0%E6%8D%AE%E9%9B%86-,dataset.zip,-%EF%BC%8C%E6%88%96Google%20Drive) | [Paper](https://openreview.net/pdf?id=NLFqlDeuzt) |
-
+ | 2D material generation and database | [ML2DDB](./en/examples/ml2ddb.md) | Data-driven | GNN/Diffusion | Supervised learning | Coming Soon | [Paper](https://arxiv.org/pdf/2507.00584) |

  <br>
  <p align="center"><b>Earth Science (AI for Earth Science)</b></p>

@@ -166,7 +166,7 @@ PaddleScience is a scientific computing toolkit built on the deep learning framework PaddlePaddle
  ### Install PaddlePaddle

  <!-- --8<-- [start:paddle_install] -->
- Depending on your runtime environment, visit the [PaddlePaddle](https://www.paddlepaddle.org.cn/install/quick?docurl=/documentation/docs/zh/develop/install/pip/linux-pip.html) website and install the <font color="red"><b>3.0 develop</b></font> version of PaddlePaddle.
+ Depending on your runtime environment, visit the [PaddlePaddle](https://www.paddlepaddle.org.cn/install/quick?docurl=/documentation/docs/zh/develop/install/pip/linux-pip.html) website; it is recommended to install a <font color="red"><b>stable release of PaddlePaddle 3.0 or later, or the latest develop build</b></font>.

  After installation, run the following command to verify that Paddle was installed successfully.
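For reference, the self-check this step refers to typically looks like the following; a minimal sketch using the built-in `paddle.utils.run_check()` (the exact command in the README follows this hunk and is not shown in the diff):

```python
# Minimal verification sketch: confirm that PaddlePaddle imports and can run.
import paddle

paddle.utils.run_check()   # prints "PaddlePaddle is installed successfully!" on success
print(paddle.__version__)  # stable builds report e.g. 3.0.x; develop builds typically report 0.0.0
```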

deploy/python_infer/base.py

Lines changed: 11 additions & 6 deletions
@@ -39,7 +39,7 @@ class Predictor:
        pdmodel_path (Optional[str]): Path to the PaddlePaddle model file. Defaults to None.
        pdiparams_path (Optional[str]): Path to the PaddlePaddle model parameters file. Defaults to None.
        device (Literal["cpu", "gpu", "npu", "xpu", "sdaa"], optional): Device to use for inference. Defaults to "cpu".
-       engine (Literal["native", "tensorrt", "onnx", "mkldnn"], optional): Inference engine to use. Defaults to "native".
+       engine (Literal["native", "tensorrt", "onnx", "onednn"], optional): Inference engine to use. Defaults to "native".
        precision (Literal["fp32", "fp16", "int8"], optional): Precision to use for inference. Defaults to "fp32".
        onnx_path (Optional[str], optional): Path to the ONNX model file. Defaults to None.
        ir_optim (bool, optional): Whether to use IR optimization. Defaults to True.

@@ -55,7 +55,7 @@ def __init__(
        pdiparams_path: Optional[str] = None,
        *,
        device: Literal["cpu", "gpu", "npu", "xpu", "sdaa"] = "cpu",
-       engine: Literal["native", "tensorrt", "onnx", "mkldnn"] = "native",
+       engine: Literal["native", "tensorrt", "onnx", "onednn"] = "native",
        precision: Literal["fp32", "fp16", "int8"] = "fp32",
        onnx_path: Optional[str] = None,
        ir_optim: bool = True,

@@ -157,11 +157,11 @@ def _create_paddle_predictor(
            config.enable_xpu(10 * 1024 * 1024)
        else:
            config.disable_gpu()
-           if self.engine == "mkldnn":
+           if self.engine == "onednn":
                # 'set_mkldnn_cache_capatity' is not available on macOS
                if platform.system() != "Darwin":
                    ...
-                   # cache 10 different shapes for mkldnn to avoid memory leak
+                   # cache 10 different shapes for onednn to avoid memory leak
                    # config.set_mkldnn_cache_capacity(10)
                config.enable_mkldnn()

@@ -170,6 +170,11 @@ def _create_paddle_predictor(

                config.set_cpu_math_library_num_threads(self.num_cpu_threads)

+           elif self.engine == "mkldnn":
+               raise ValueError(
+                   "The 'mkldnn' engine is deprecated. Please use 'onednn' instead."
+               )
+
        # enable memory optim
        config.enable_memory_optim()
        # config.disable_glog_info()

@@ -221,9 +226,9 @@ def _check_device(self, device: str):
            )

    def _check_engine(self, engine: str):
-       if engine not in ["native", "tensorrt", "onnx", "mkldnn"]:
+       if engine not in ["native", "tensorrt", "onnx", "onednn"]:
            raise ValueError(
-               "Inference only supports 'native', 'tensorrt', 'onnx' and 'mkldnn' "
+               "Inference only supports 'native', 'tensorrt', 'onnx' and 'onednn' "
                f"engines, but got {engine}."
            )
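For context, a minimal usage sketch of the renamed option (not part of this commit; the import path and model file paths are illustrative assumptions, and real projects typically instantiate a concrete subclass of `Predictor`):

```python
# Hedged sketch: passing the renamed engine option to the predictor base class.
# Import path and model paths are illustrative assumptions, not part of the diff.
from deploy.python_infer.base import Predictor

predictor = Predictor(
    pdmodel_path="./inference/model.pdmodel",      # hypothetical exported model
    pdiparams_path="./inference/model.pdiparams",  # hypothetical parameter file
    device="cpu",
    engine="onednn",   # formerly "mkldnn"; passing "mkldnn" now raises ValueError
    precision="fp32",
)
```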

docs/en/examples/ml2ddb.md

Lines changed: 59 additions & 0 deletions
@@ -0,0 +1,59 @@
# ML2DDB

[Monolayer Two-dimensional Materials Database (ML2DDB) and Applications](https://arxiv.org/pdf/2507.00584)

Zhongwei Liu<sup>a, b, #</sup>,
Zhimin Zhang<sup>c, #</sup>,
Xuwei Liu<sup>c, #</sup>,
Mingjia Yao<sup>b</sup>,
Xin He<sup>a</sup>,
Yuanhui Sun<sup>b, *</sup>,
Xin Chen<sup>b, *</sup>,
Lijun Zhang<sup>a, b, *</sup>

<sup>a</sup> State Key Laboratory of Integrated Optoelectronics, Key Laboratory of Automobile Materials of MOE and College of Materials Science and Engineering, Jilin University, Changchun 130012, China

<sup>b</sup> Suzhou Laboratory, Suzhou 215123, China

<sup>c</sup> Baidu Inc., Beijing, P.R. China

<sup>#</sup> These authors contributed equally to this work.

## Abstract

The discovery of two-dimensional (2D) materials with tailored properties is critical to meet the increasing demands of high-performance applications across flexible electronics, optoelectronics, catalysis, and energy storage. However, current 2D material databases are constrained by limited scale and compositional diversity. In this study, we introduce a scalable active learning workflow that integrates deep neural networks with density functional theory (DFT) calculations to efficiently explore a vast set of candidate structures. These structures are generated through physics-informed elemental substitution strategies, enabling broad and systematic discovery of stable 2D materials. Through six iterative screening cycles, we established the Monolayer 2D Materials Database (ML2DDB), which contains 242,546 DFT-validated stable structures—an order-of-magnitude increase over the largest known 2D materials databases. In particular, the number of ternary and quaternary compounds showed the most significant increase. Combining this database with a generative diffusion model, we demonstrated effective structure generation under specified chemistry and symmetry constraints. This work accomplished an organically interconnected loop of 2D material data expansion and application, which provides a new paradigm for the discovery of new materials.

![ML2DDB](https://paddle-org.bj.bcebos.com/paddlescience/docs/ML2DDB/ml2ddb.png)

## Dataset of 2D materials

We developed ML2DDB, a large-scale 2D material database containing >242k DFT-validated monolayer structures (E<sub>hull</sub><sup>DFT</sup> < 50 meV/atom), representing a 10× increase over existing datasets. Key features:

- Broad elemental coverage: 81 elements across the periodic table (excluding radioactive elements and noble gases).
- Enhanced diversity: significantly more compounds with 3–4 distinct elements than in prior work.
- Structural richness: diverse prototypes and cation–anion combinations.
- Extended resource: >1M candidate structures (E<sub>hull</sub><sup>MLIP</sup> < 200 meV/atom) for future studies.

![dataset](https://paddle-org.bj.bcebos.com/paddlescience/docs/ML2DDB/ml2ddb_dataset.png)

## Diffusion model generation of S.U.N. materials

The capability to generate S.U.N. (stable, unique, new) 2D materials is a prerequisite for diffusion models. We consider a generated structure stable if its E<sub>hull</sub><sup>DFT</sup> with respect to ML2DDB is below 100 meV/atom, unique if it does not match any other structure generated in the same batch, and new if it is not identical to any structure already in ML2DDB. As shown in Figure 5b, we performed DFT structure optimization on 1024 generated structures to evaluate stability. The results show that 74.8% of them are stable (E<sub>hull</sub><sup>DFT</sup> < 100 meV/atom), which is comparable to the success rate of MatterGen for 3D stable-structure generation. With the stricter constraint E<sub>hull</sub><sup>DFT</sup> < 0 meV/atom, our method achieves a success rate of 59.6%, significantly higher than that of MatterGen (~13%). In addition, the root-mean-square displacement (RMSD) between each generated structure and its DFT-relaxed counterpart is below 0.26 Å, which is still smaller than the radius of a hydrogen atom (0.53 Å). For uniqueness, the rate is 100% when generating one thousand structures and drops by only 4.4% when generating ten thousand. For newness, the rate decreases from 100% to 73.5% as the number of generated structures grows from one thousand to two thousand. This indicates that the model has a strong ability to generate entirely new stable structures.

![dataset](https://paddle-org.bj.bcebos.com/paddlescience/docs/ML2DDB/gen_2d.png)
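The stable / unique / new bookkeeping described above can be summarized as follows; this is a sketch only, not the authors' code, and `structures_match` stands in for whatever structure-comparison routine is used:

```python
# Sketch of the S.U.N. metrics described above (not the authors' actual code).
# `structures_match` is a hypothetical placeholder for a structure-matching routine.
from typing import Callable, Sequence

def sun_rates(
    generated: Sequence,            # generated 2D structures
    e_hull_dft: Sequence[float],    # DFT energy above hull in meV/atom, per structure
    database: Sequence,             # ML2DDB reference structures
    structures_match: Callable,     # (s1, s2) -> bool
    stable_cutoff: float = 100.0,   # meV/atom, as used in the text
):
    n = len(generated)
    # stable: below the energy-above-hull cutoff with respect to ML2DDB
    stable = sum(e < stable_cutoff for e in e_hull_dft) / n
    # unique: does not match any other structure generated in the same batch
    unique = sum(
        not any(structures_match(s, t) for j, t in enumerate(generated) if j != i)
        for i, s in enumerate(generated)
    ) / n
    # new: not identical to any structure already in the database
    new = sum(not any(structures_match(s, d) for d in database) for s in generated) / n
    return stable, unique, new
```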

## Conclusion

This study establishes a novel framework integrating active learning workflows with conditional diffusion-based structural generation, achieving unprecedented expansion of 2D materials databases. Key contributions include:

1. **Dataset Advancement**
    - Created ML2DDB, containing 242,546 thermodynamically stable 2D materials (E<sub>hull</sub><sup>DFT</sup> < 50 meV/atom), exceeding existing databases by ≥10×
    - Achieved 1100% and 960% growth in ternary and quaternary compounds, respectively
    - Generated >1 million candidate structures (E<sub>hull</sub><sup>MLIP</sup> < 200 meV/atom)
2. **Methodological Innovation**
    - Developed an MLIP model with 92.36% accuracy in stability classification
    - Enabled phase diagram generation and space-group-specific design through diffusion model integration
    - Demonstrated applicability to the discovery of nonlinear optical and ferroelectric materials

docs/index.md

Lines changed: 1 addition & 0 deletions
@@ -141,6 +141,7 @@
  |-----|---------|-----|---------|----|---------|---------|
  | Materials design | [Scattering plate design (inverse problem)](./zh/examples/hpinns.md) | Physics-driven | Transformer | Unsupervised learning | [Train Data](https://paddle-org.bj.bcebos.com/paddlescience/datasets/hPINNs/hpinns_holo_train.mat)<br>[Eval Data](https://paddle-org.bj.bcebos.com/paddlescience/datasets/hPINNs/hpinns_holo_valid.mat) | [Paper](https://arxiv.org/pdf/2102.04626.pdf) |
  | Crystal material property prediction | [CGCNN](./zh/examples/cgcnn.md) | Data-driven | GNN | Supervised learning | [MP](https://next-gen.materialsproject.org/) / [Perovskite](https://cmr.fysik.dtu.dk/cubic_perovskites/cubic_perovskites.html) / [C2DB](https://cmr.fysik.dtu.dk/c2db/c2db.html) / [test](https://paddle-org.bj.bcebos.com/paddlescience%2Fdatasets%2Fcgcnn%2Fcgcnn-test.zip) | [Paper](https://journals.aps.org/prl/abstract/10.1103/PhysRevLett.120.145301) |
+ | 2D material generation and database | [ML2DDB](./en/examples/ml2ddb.md) | Data-driven | GNN/Diffusion | Supervised learning | Coming Soon | [Paper](https://arxiv.org/pdf/2507.00584) |

  <br>
  <p align="center"><b>Earth Science (AI for Earth Science)</b></p>

docs/zh/api/utils/initializer.md

Lines changed: 2 additions & 0 deletions
@@ -16,5 +16,7 @@
      - kaiming_normal_
      - linear_init_
      - conv_init_
+     - glorot_normal_
+     - lecun_normal_
      show_root_heading: True
      heading_level: 3
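For reference, a minimal usage sketch of the two newly documented initializers (assuming they follow the same in-place, weight-modifying convention as the `kaiming_normal_` entry above; the signatures shown are assumptions):

```python
# Hedged sketch: applying the newly documented initializers to layer weights.
# Assumes ppsci.utils.initializer exposes glorot_normal_ / lecun_normal_ as
# in-place initializers, analogous to kaiming_normal_ listed above.
import paddle
from ppsci.utils import initializer

linear = paddle.nn.Linear(16, 32)
conv = paddle.nn.Conv2D(3, 8, kernel_size=3)

initializer.glorot_normal_(linear.weight)  # Glorot/Xavier normal init (assumed signature)
initializer.lecun_normal_(conv.weight)     # LeCun normal init (assumed signature)
```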

docs/zh/user_guide.md

Lines changed: 6 additions & 6 deletions
@@ -464,7 +464,7 @@ ppsci MESSAGE: Visualization result is saved to: ./aneurysm_pred.vtu

  PaddleScience provides multiple inference configuration combinations that can be composed on the command line. The currently supported combinations are:

- |  | Native | ONNX | TensorRT | macaRT | MKLDNN |
+ |  | Native | ONNX | TensorRT | macaRT | oneDNN |
  | :--- | :--- | :--- | :--- | :--- | :--- |
  | Intel(CPU) | ✅ | ✅ | / | / | ✅ |
  | NVIDIA | ✅ | ✅ | ✅ | / | / |

@@ -576,31 +576,31 @@ PaddleScience provides multiple inference configuration combinations that can be composed on the command line
          INFER.engine=onnx
      ```

- === "Inference with MKLDNN"
+ === "Inference with oneDNN"

-     MKLDNN is a high-performance inference engine from Intel for accelerating inference on CPUs; PaddleScience supports MKLDNN inference.
+     oneDNN is a high-performance inference engine from Intel for accelerating inference on CPUs; PaddleScience supports oneDNN inference.

      Run the following command to perform inference:

      ``` sh
      python aneurysm.py mode=infer \
          INFER.device=cpu \
-         INFER.engine=mkldnn
+         INFER.engine=onednn
      ```

  !!! info "Full inference configuration parameters"

      | Parameter | Default | Description |
      | :--- | :--- | :--- |
      | `INFER.device` | `cpu` | Inference device; currently supports `cpu` and `gpu` |
-     | `INFER.engine` | `native` | Inference engine; currently supports `native`, `tensorrt`, `onnx`, and `mkldnn` |
+     | `INFER.engine` | `native` | Inference engine; currently supports `native`, `tensorrt`, `onnx`, and `onednn` |
      | `INFER.precision` | `fp32` | Inference precision; currently supports `fp32` and `fp16` |
      | `INFER.ir_optim` | `True` | Whether to enable IR optimization |
      | `INFER.min_subgraph_size` | `30` | Minimum subgraph size for TensorRT; TensorRT is only attempted on a subgraph when its size exceeds this value |
      | `INFER.gpu_mem` | `2000` | Initial GPU memory size |
      | `INFER.gpu_id` | `0` | GPU logical device ID |
      | `INFER.max_batch_size` | `1024` | Maximum batch_size during inference |
-     | `INFER.num_cpu_threads` | `10` | Number of CPU threads for MKLDNN and ONNX inference |
+     | `INFER.num_cpu_threads` | `10` | Number of CPU threads for oneDNN and ONNX inference |
      | `INFER.batch_size` | `256` | batch_size during inference |
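As a rough, non-authoritative illustration of how the `INFER.*` options above relate to the Python inference API, they map onto the keyword arguments of the `python_infer` predictor; the config keys used for the model paths below are assumptions, not part of this commit:

```python
# Hedged sketch: wiring INFER.* config values into the python_infer predictor.
# Config keys for model paths are assumptions; other keys mirror the table above.
from deploy.python_infer.base import Predictor

def build_predictor(cfg):
    return Predictor(
        pdmodel_path=cfg.INFER.pdmodel_path,      # hypothetical config key
        pdiparams_path=cfg.INFER.pdiparams_path,  # hypothetical config key
        device=cfg.INFER.device,        # "cpu" or "gpu"
        engine=cfg.INFER.engine,        # "native" / "tensorrt" / "onnx" / "onednn"
        precision=cfg.INFER.precision,  # "fp32" or "fp16"
        ir_optim=cfg.INFER.ir_optim,
    )
```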

  ### 1.4 Resuming training from a checkpoint

examples/ML2DDB/readme.md

Lines changed: 59 additions & 0 deletions
@@ -0,0 +1,59 @@
The added file duplicates docs/en/examples/ml2ddb.md shown above (same abstract, dataset description, S.U.N. generation results, and conclusion); its content is not repeated here.
