Commit 427d7e0

Create Uni-Mol2_18_12_2024.md (#191)
1 parent dde27c9 commit 427d7e0

Lines changed: 102 additions & 0 deletions

---
title: "Uni-Mol2 Release: Joining Hands with the DeepModeling Community, Marching Towards a 3D Molecular Foundation Model"
date: 2024-12-18
categories:
- Uni-Mol
---

The DeepModeling community has officially released Uni-Mol2, currently the largest 3D molecular representation foundation model. Its largest version has 1.1 billion parameters and was pre-trained on 800 million molecular conformations, and it performs strongly across multiple molecular property prediction tasks. The release gives deep learning research in molecular science a powerful new tool and lays a solid experimental foundation for exploring even larger molecular pre-training models. Uni-Mol2 was also accepted as a paper at NeurIPS 2024, currently being held in Vancouver, Canada, where it has drawn wide attention.

<!-- more -->

## Release Contents

### Better Prediction Performance

The research team proposes a "dual-track Transformer" architecture that models molecular information at multiple levels, covering atomic-level, graph-level, and molecular conformation-level features. At its core, the backbone keeps two tracks side by side: atomic representations and atomic-pair representations. Each block of the Uni-Mol2 backbone updates both tracks in parallel and fuses them through an attention mechanism with an atomic-pair bias, giving the model strong expressive power for complex molecular structures.
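
To make the pair-biased attention concrete, here is a minimal NumPy sketch of one block. It is a simplified illustration under stated assumptions, not Uni-Mol2's implementation: a single head, no normalization or feed-forward layers, and the function name `dual_track_block`, the weight names in `params`, and the product-based pair update are all hypothetical. What it shows is the mechanism described above: each atom pair's features are projected to a scalar that is added to the attention logits, and both tracks are updated from the same block inputs.

```python
import numpy as np


def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    """Numerically stable softmax along `axis`."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def dual_track_block(atom, pair, params):
    """Toy single-head block that updates both tracks from the same inputs.

    atom: (n, d_atom) atomic-track features
    pair: (n, n, d_pair) atomic-pair-track features
    params: dict of weight matrices (hypothetical names)
    Returns (new_atom, new_pair, attn).
    """
    q = atom @ params["w_q"]                      # (n, d_head)
    k = atom @ params["w_k"]                      # (n, d_head)
    v = atom @ params["w_v"]                      # (n, d_atom)
    bias = pair @ params["w_bias"]                # (n, n): one scalar bias per atom pair
    logits = q @ k.T / np.sqrt(q.shape[-1]) + bias
    attn = softmax(logits, axis=-1)               # each row sums to 1
    new_atom = atom + attn @ v                    # residual atom update

    # Hypothetical pair update, computed from the same inputs (in parallel
    # with the atom update): pair (i, j) receives the elementwise product
    # of projected features of atoms i and j.
    a = atom @ params["w_a"]                      # (n, d_pair)
    b = atom @ params["w_b"]                      # (n, d_pair)
    new_pair = pair + a[:, None, :] * b[None, :, :]
    return new_atom, new_pair, attn


rng = np.random.default_rng(0)
n, d_atom, d_pair, d_head = 5, 8, 4, 8
params = {
    "w_q": rng.normal(size=(d_atom, d_head)),
    "w_k": rng.normal(size=(d_atom, d_head)),
    "w_v": rng.normal(size=(d_atom, d_atom)),
    "w_bias": rng.normal(size=(d_pair,)),
    "w_a": rng.normal(size=(d_atom, d_pair)),
    "w_b": rng.normal(size=(d_atom, d_pair)),
}
atom = rng.normal(size=(n, d_atom))
pair = rng.normal(size=(n, n, d_pair))
new_atom, new_pair, attn = dual_track_block(atom, pair, params)
print(new_atom.shape, new_pair.shape)  # (5, 8) (5, 5, 4)
```

Because the bias enters the logits before the softmax, the pair track directly reweights which atoms each atom attends to, which is how geometric pair information can reach the atom track.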
<center><img src=https://dp-public.oss-cn-beijing.aliyuncs.com/community/Blog%20Files/Uni-Mol2_18_12_2024/figure1.jpeg pic_center width="80%" height="80%" /></center>

*Figure 1: Overall pre-training architecture and details of the backbone block*

Uni-Mol2 is trained on a dataset of 885 million compounds, 40 times the size of the Uni-Mol training set, with 17 times as many molecular scaffolds, which greatly expands data diversity.

<center><img src=https://dp-public.oss-cn-beijing.aliyuncs.com/community/Blog%20Files/Uni-Mol2_18_12_2024/table1.jpeg pic_center width="70%" height="70%" /></center>

*Table 1: Uni-Mol2 vs Uni-Mol Dataset Scale Comparison*

Building on this data, Uni-Mol2 delivers better prediction performance. On the QM9 dataset of quantum mechanical property prediction tasks, Uni-Mol2 achieves an average improvement of 27%. On the COMPAS-1D dataset, the 1.1-billion-parameter Uni-Mol2 model improves on Uni-Mol by 4% on average, and by 14% on average when atom and bond features are also included.
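
The post does not spell out how an "average improvement" is computed. A common convention, assumed here, is the mean over tasks of the relative MAE reduction, (baseline MAE - new MAE) / baseline MAE, since lower MAE is better. The sketch below applies that formula; `mean_relative_improvement` is a hypothetical helper and the per-task numbers are made-up placeholders, not values from Tables 2 or 3.

```python
def mean_relative_improvement(baseline_mae, new_mae):
    """Average relative MAE reduction across tasks (lower MAE is better).

    Assumes a simple mean over tasks of (baseline - new) / baseline.
    """
    if len(baseline_mae) != len(new_mae) or not baseline_mae:
        raise ValueError("need two non-empty lists of equal length")
    gains = [(b - n) / b for b, n in zip(baseline_mae, new_mae)]
    return sum(gains) / len(gains)


# Hypothetical per-task MAEs, for illustration only.
baseline = [0.10, 0.20, 0.40]
model = [0.08, 0.15, 0.30]
print(f"{mean_relative_improvement(baseline, model):.1%}")  # 23.3%
```

Under this convention, averaging per-task ratios keeps tasks with large absolute MAEs from dominating the summary number.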
<center><img src=https://dp-public.oss-cn-beijing.aliyuncs.com/community/Blog%20Files/Uni-Mol2_18_12_2024/table2.jpeg pic_center width="70%" height="70%" /></center>

*Table 2: Mean absolute error (MAE, ↓) on the QM9 dataset*

<center><img src=https://dp-public.oss-cn-beijing.aliyuncs.com/community/Blog%20Files/Uni-Mol2_18_12_2024/table3.png pic_center width="70%" height="70%" /></center>

*Table 3: Mean absolute error (MAE, ↓) on the COMPAS-1D dataset*

### Uni-Mol2 Scaling Laws

The work systematically explores scaling laws in molecular pre-training. Experiments show that, during Uni-Mol2 training, validation loss decreases steadily as the number of parameters grows. Further analysis shows that the decrease in loss can be characterized by a power law in the size of the molecular data, the number of model parameters, and the total compute. This finding provides an important empirical basis for building even larger molecular pre-training models.
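
A power law is a straight line in log-log space, which makes it easy to check and fit. A common parameterization from the scaling-law literature, assumed here rather than taken from the Uni-Mol2 paper, is L(N) = (N_c / N)^alpha, where N is the parameter count (or data size, or compute) and L is the validation loss. The NumPy sketch below fits alpha and N_c by linear regression on log L = alpha * log N_c - alpha * log N. The loss values are synthetic, generated from a known law at the released model sizes, and are not Uni-Mol2 results.

```python
import numpy as np


def fit_power_law(n, loss):
    """Fit loss = (n_c / n) ** alpha by least squares in log-log space.

    log(loss) = alpha * log(n_c) - alpha * log(n), so the slope of
    log(loss) against log(n) is -alpha.
    """
    log_n, log_loss = np.log(n), np.log(loss)
    slope, intercept = np.polyfit(log_n, log_loss, deg=1)
    alpha = -slope
    n_c = np.exp(intercept / alpha)
    return alpha, n_c


# Synthetic data from a known law (alpha=0.08, n_c=1e13); not measured results.
sizes = np.array([84e6, 164e6, 310e6, 570e6, 1.1e9])
true_alpha, true_n_c = 0.08, 1e13
losses = (true_n_c / sizes) ** true_alpha

alpha, n_c = fit_power_law(sizes, losses)
print(round(alpha, 4))  # 0.08
```

With real training runs the points will scatter around the line, and the fitted exponent is what lets you extrapolate the loss expected from a larger model.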
<center><img src=https://dp-public.oss-cn-beijing.aliyuncs.com/community/Blog%20Files/Uni-Mol2_18_12_2024/figure2.jpeg pic_center width="100%" height="100%" /></center>

### Uni-Mol Tools Now Supports Uni-Mol2

Compared with Uni-Mol, Uni-Mol2 brings clear gains on tasks closely tied to the molecular ground-state conformation, such as quantum chemical property prediction. To make it easy to use, the Uni-Mol2 code has been integrated into Uni-Mol Tools, and the different versions of Uni-Mol2 can be called with the few lines of code below. Weights for all released versions of the foundation model can be loaded, so users can try models of different scales across a range of scenarios.

```bash
# Install unimol_tools
pip install unimol_tools --upgrade
# Uni-Mol2 was introduced in unimol_tools 0.1.1; make sure the installed version is 0.1.1 or later
pip show unimol_tools
```

The following example shows how to train a model and run prediction with Uni-Mol2:

```python
# In MolTrain and UniMolRepr, set model_name='unimolv2' to use Uni-Mol2
from unimol_tools import MolTrain, MolPredict

# training
clf = MolTrain(task='regression',
               data_type='molecule',
               epochs=100,
               batch_size=4,
               split='random',
               save_path='./exp',
               remove_hs=False,
               early_stopping=5,
               target_cols='TARGET',
               model_name='unimolv2',  # available: unimolv1, unimolv2
               model_size='84m',  # only used when model_name='unimolv2'; available: 84m, 164m, 310m, 570m, 1.1B
               )

clf.fit(data='path/to/train/file')

# inference
clf = MolPredict(load_model='./exp')
res = clf.predict(data='path/to/test/file')
```
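
The `data` arguments above are placeholder paths. As a hedged sketch, assuming the training file is a CSV whose SMILES column uses the unimol_tools default name `SMILES` (check the documentation for your installed version) and whose label column matches `target_cols='TARGET'`, a minimal file can be written with Python's standard `csv` module. `write_training_csv` is a hypothetical helper, and the molecules and target values are made up for illustration.

```python
import csv

# Hypothetical toy data: SMILES strings with made-up regression targets.
# Replace with your own molecules and measured values.
rows = [
    ("O=C1NCC1C1CC1", 0.52),
    ("CCO", 1.30),
    ("c1ccccc1", -0.75),
]


def write_training_csv(path, rows, smiles_col="SMILES", target_col="TARGET"):
    """Write (smiles, target) pairs to a CSV file with a header row.

    Column names are assumptions: SMILES is assumed to be the default
    SMILES column for unimol_tools, and TARGET must match target_cols.
    """
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow([smiles_col, target_col])
        writer.writerows(rows)


write_training_csv("train.csv", rows)
```

The resulting file could then be passed as `clf.fit(data='train.csv')`.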

Uni-Mol2 features (learned molecular representations) can be extracted as follows:

```python
# Uni-Mol2 representation
import numpy as np
from unimol_tools import UniMolRepr

# Uni-Mol2 representation for a single SMILES string
clf = UniMolRepr(data_type='molecule',
                 remove_hs=False,
                 model_name='unimolv2',  # available: unimolv1, unimolv2
                 model_size='84m',  # only used when model_name='unimolv2'; available: 84m, 164m, 310m, 570m, 1.1B
                 )
smiles_list = ['O=C1NCC1C1CC1']
unimol_repr = clf.get_repr(smiles_list, return_atomic_reprs=True)
# Molecule-level (CLS token) representation
print(np.array(unimol_repr['cls_repr']).shape)
# Atom-level representations
print(np.array(unimol_repr['atomic_reprs']).shape)
```

For more information, see the quick-start tutorial: https://unimol.readthedocs.io/en/latest/quickstart.html

## Quick Links

- Uni-Mol repository in the DeepModeling community: https://github.com/deepmodeling/Uni-Mol
- Issue tracker for Uni-Mol users and developers: https://github.com/deepmodeling/Uni-Mol/issues
- Uni-Mol Tools documentation (English): https://unimol.readthedocs.io/en/latest/

## Contributors

Xiaohong Ji, Zhen Wang, Zhifeng Gao, Hang Zheng, Yaning Cui, Letian Chen, Linfeng Zhang, Guolin Ke, and Weinan E contributed to this work.
