README.md: 3 additions & 1 deletion
@@ -12,7 +12,9 @@ English | [简体中文](README_cn.md)
## Latest News
-`2024.11.08` **To streamline HelixFold3 integration and support high-throughput use, we introduce a convenient paid API ([usage guide link](https://paddlehelix.baidu.com/app/tut/guide/all/helixfold3sdk)) for academic and commercial applications, enabling efficient access to HelixFold3’s structural prediction capabilities.**
+`2025.07.23` **HelixFold3.2 released!** Compared to HelixFold3, **HelixFold3.2** demonstrates significant improvements in protein-related tasks and structural quality. For implementation details, please see [the code here](./apps/protein_folding/helixfold3).
+
+`2024.11.08` To streamline HelixFold3 integration and support high-throughput use, we introduce a convenient paid API ([usage guide link](https://paddlehelix.baidu.com/app/tut/guide/all/helixfold3sdk)) for academic and commercial applications, enabling efficient access to HelixFold3’s structural prediction capabilities.

`2024.08.30` We are excited to announce great news! The initial version of the HelixFold3 server, designed for biomolecular structure prediction, is now available on the PaddleHelix website (https://paddlehelix.baidu.com/app/all/helixfold3/forecast). We encourage everyone to explore its capabilities and leverage it for impactful and innovative research.
# or lower version: https://paddle-wheel.bj.bcebos.com/2.5.1/linux/linux-gpu-cuda11.7-cudnn8.4.1-mkl-gcc8.2-avx/paddlepaddle_gpu-2.5.1.post117-cp39-cp39-linux_x86_64.whl
Note: If you have a different version of python3 and cuda, please refer to [here](https://www.paddlepaddle.org.cn/whl/linux/gpu/develop.html) for the compatible PaddlePaddle `dev` package.

-### Usage
+### 🎯 Usage

In order to run HelixFold3, the genetic databases and model parameters are required.
@@ -95,10 +106,10 @@ The script `scripts/download_all_data.sh` can be used to download and set up all
will download a reduced version of the databases to be used with the `reduced_dbs` preset. The total download
size for the reduced databases is around 190 GB, and the total unzipped size is around 530 GB.
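
Before starting the download, it can help to check that the target disk has room for the unzipped data. The following is a minimal Python sketch, not part of HelixFold3: the function name `has_room` and the `headroom_gb` margin are illustrative, and it assumes decimal gigabytes and the approximate 530 GB figure quoted above.

```python
import shutil

REDUCED_DBS_UNZIPPED_GB = 530  # approximate unzipped size quoted above


def has_room(free_bytes, required_gb=REDUCED_DBS_UNZIPPED_GB, headroom_gb=0.0):
    """Return True if free_bytes covers required_gb plus an optional margin.

    Assumes decimal gigabytes (1 GB = 10**9 bytes).
    """
    return free_bytes >= (required_gb + headroom_gb) * 10**9


if __name__ == "__main__":
    # "." stands in for the directory you plan to download into.
    free = shutil.disk_usage(".").free
    print(f"free: {free / 10**9:.0f} GB, enough for reduced_dbs: {has_room(free)}")
```

Passing a positive `headroom_gb` leaves spare space for anything else that will live on the same disk.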

-#### Understanding Model Input
+#### 🤔 Understanding Model Input
+
+There are some demo input under `./data/` for your test and reference. Data input is in the form of JSON containing several entities such as `protein`, `ligand`, `dna`, `rna` and `ion`. Proteins and nucleic acids inputs are their sequence.

-There are some demo input under `./data/` for your test and reference. Data input is in the form of JSON containing
-several entities such as `protein`, `ligand`, `dna`, `rna` and `ion`. Proteins and nucleic acids inputs are their sequence.
HelixFold3 accepts ligand input as a SMILES string or a CCD id; please refer to `/data/demo_6zcy_smiles.json` and `demo_output/demo_6zcy_smiles/`
for more details about SMILES input. More flexible input formats are coming soon.
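
To make the input description concrete, here is a hedged Python sketch that assembles such a JSON document. The key names (`entities`, `type`, `sequence`, `smiles`, `ccd`, `count`) are assumptions made for illustration and are not confirmed by the text above; the demo files under `./data/` remain the authoritative reference for the real schema. Ion entities are left out because the text does not say how they are specified.

```python
import json

SEQUENCE_TYPES = {"protein", "dna", "rna"}


def make_entity(kind, *, sequence=None, smiles=None, ccd=None, count=1):
    """Build one entity dict. All key names are assumptions, not the real schema."""
    entity = {"type": kind, "count": count}
    if kind in SEQUENCE_TYPES:
        # Proteins and nucleic acids are described by their sequence.
        if not sequence:
            raise ValueError(f"{kind} entities need a sequence")
        entity["sequence"] = sequence
    elif kind == "ligand":
        # A ligand is given either as SMILES or as a CCD id, never both.
        if (smiles is None) == (ccd is None):
            raise ValueError("give exactly one of smiles or ccd")
        if smiles is not None:
            entity["smiles"] = smiles
        else:
            entity["ccd"] = ccd
    else:
        raise ValueError(f"unsupported entity type in this sketch: {kind}")
    return entity


job = {
    "entities": [
        make_entity("protein", sequence="MKTAYIAKQR"),  # placeholder sequence
        make_entity("ligand", smiles="CCO"),  # ethanol, as a SMILES example
    ]
}
print(json.dumps(job, indent=2))
```

Validating the ligand fields up front surfaces a malformed entity before a long MSA search starts.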
@@ -149,7 +160,7 @@ Here is an example modification input:
}
```

-#### Running HelixFold for Inference
+#### 🚀 Running HelixFold for Inference
To run inference on a sequence or multiple sequences using HelixFold3's pretrained parameters, run e.g.:
* Inference on single GPU (change the settings in script BEFORE you run it)
The descriptions of the above script are as follows:
* Replace `DATA_DIR` with your downloaded data path.
* Replace `ENV_BIN` with your conda virtual environment or any environment where `hhblits`, `hmmsearch` and other dependencies have been installed.
+* Replace `PYTHON_BIN` with your python binary where `paddlepaddle-gpu` has been installed.
* `--preset` - Set `'reduced_dbs'` to use small BFD or `'full_dbs'` to use full BFD.
* `--*_database_path` - Path to datasets you have downloaded.
* `--input_json` - Input data in the form of JSON. See `./data/demo_*.json` for example inputs.
@@ -203,7 +215,7 @@ The descriptions of the above script are as follows:
* `--infer_time` - The number of inferences executed by the model for a single input. In each inference, the model will infer `5` times (`diff_batch_size`) for the same input by default. This hyperparameter can be changed via `model.head.diffusion_module.test_diff_batch_size` in `./helixfold/model/config.py`.
* `--precision` - Either `bf16` or `fp32`. Please check whether your machine supports `bf16` before changing it. For example, `bf16` is supported by A100, H100 and newer GPUs, while V100 only supports `fp32`.
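
The last two options can be sketched in a few lines of Python. Both helpers are illustrative and not part of HelixFold3: `total_predictions` multiplies the number of inferences by the diffusion batch size (default `5`, as stated above), and `pick_precision` maps a GPU name to a `--precision` value using only the examples given above. Falling back to `fp32` for any GPU not named in the text is an assumption chosen as the conservative default.

```python
def total_predictions(infer_times, diff_batch_size=5):
    """Structures produced for one input: inferences x diffusion batch size."""
    return infer_times * diff_batch_size


def pick_precision(gpu_name):
    """Suggest a --precision value from a GPU name string.

    Only the GPUs named in the text are recognized; anything else falls
    back to fp32 (an assumption, chosen because fp32 is the safe option).
    """
    name = gpu_name.upper()
    if "A100" in name or "H100" in name:
        return "bf16"
    return "fp32"


print(total_predictions(2, diff_batch_size=3))
print(pick_precision("NVIDIA A100-SXM4-40GB"))
print(pick_precision("Tesla V100-SXM2-32GB"))
```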

-### Understanding Model Output
+### 🤔 Understanding Model Output

The outputs will be in a subfolder of `output_dir`, including the computed MSAs, predicted structures,
ranked structures, and evaluation metrics. For a task of inferring twice with diffusion batch size 3,
@@ -233,11 +245,11 @@ assume your input JSON is named `demo_data.json`, the `output_dir` directory wil
The contents of each output file are as follows:
* `msas/` - A directory containing the files describing the various genetic
tool hits that were used to construct the input MSA.
-* `demo_data-pred-X-Y` - Prediction results of `demo_data.json` in X-th inference and Y-thdiffusion batch,
+* `demo_data-pred-X-Y` - Prediction results of `demo_data.json` in X-th inference and Y-th diffusion batch,
including predicted structures in `cif` and a JSON file containing all metrics' results.
* `demo_data-rank*` - Ranked results of a series of predictions according to metrics.
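
The `-pred-X-Y` naming pattern described above can be parsed mechanically when post-processing results. A minimal Python sketch, assuming the directory names follow exactly that pattern; the helper name `parse_pred_dir` is illustrative and not part of HelixFold3:

```python
import re

# <input name>-pred-<inference index>-<diffusion batch index>
_PRED_DIR = re.compile(r"^(?P<job>.+)-pred-(?P<infer>\d+)-(?P<batch>\d+)$")


def parse_pred_dir(name):
    """Return (job, inference, diffusion_batch), or None if name is not a prediction dir."""
    match = _PRED_DIR.match(name)
    if match is None:
        return None
    return match["job"], int(match["infer"]), int(match["batch"])


for entry in ["msas", "demo_data-pred-2-3", "demo_data-rank1"]:
    print(entry, "->", parse_pred_dir(entry))
```

Entries such as `msas` or `demo_data-rank*` yield `None`, so the same function can filter a raw directory listing.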

-### Resource Usage
+### 📌 Resource Usage

We suggest that a single GPU used for inference have at least 32 GB of available memory. The maximum number of tokens is around
1200 for inference on a single A100-40G GPU with precision `bf16`. The length of inference input tokens on a
@@ -251,11 +263,11 @@ reduce the number of additional recycles by changing `model.num_recycle` in the
We are keen to support inference on longer token sequences; it is coming soon.


-## Copyright
+## 📌 Copyright

HelixFold3's code and model parameters are available under the [LICENSE](./LICENSE) for non-commercial use by individuals or non-commercial organizations only. Please check the usage restrictions before using HelixFold3.

-## Reference
+## 🌟 Reference

[1] Abramson, J et al. (2024). Accurate structure prediction of biomolecular interactions with AlphaFold 3. Nature 630, 493–500. 10.1038/s41586-024-07487-w
@@ -278,7 +290,7 @@ Elevating protein complex structure prediction to new heights. arXiv preprint ar
Wang, Jingzhou He, et al. Pre-training on large-scale generated docking conformations with helixdock to unlock
the potential of protein-ligand structure prediction models. arXiv preprint arXiv:2310.13913, 2023.

-## Citation
+## 📖 Citation

If you use the code, data, or checkpoints in this repo, please cite the following: