Commit c48e99c

feat(helixfold3): Update the README for HelixFold3.2 and optimize the dependency configuration
1 parent 0221f4b commit c48e99c

7 files changed: +48 -36 lines changed

README.md

Lines changed: 3 additions & 1 deletion
@@ -12,7 +12,9 @@ English | [简体中文](README_cn.md)
1212

1313

1414
## Latest News
15-
`2024.11.08` **To streamline HelixFold3 integration and support high-throughput use, we introduce a convenient paid API ([usage guide link](https://paddlehelix.baidu.com/app/tut/guide/all/helixfold3sdk)) for academic and commercial applications, enabling efficient access to HelixFold3’s structural prediction capabilities.**
15+
`2025.07.23` **HelixFold3.2 released!** Compared to HelixFold3, **HelixFold3.2** demonstrates significant improvements in protein-related tasks and structural quality. For implementation details, please see [the code here](./apps/protein_folding/helixfold3).
16+
17+
`2024.11.08` To streamline HelixFold3 integration and support high-throughput use, we introduce a convenient paid API ([usage guide link](https://paddlehelix.baidu.com/app/tut/guide/all/helixfold3sdk)) for academic and commercial applications, enabling efficient access to HelixFold3’s structural prediction capabilities.
1618

1719
`2024.08.30` We are excited to announce great news! The initial version of the HelixFold3 server, designed for biomolecular structure prediction, is now available on the PaddleHelix website (https://paddlehelix.baidu.com/app/all/helixfold3/forecast). We encourage everyone to explore its capabilities and leverage it for impactful and innovative research.
1820

apps/protein_folding/helixfold3/README.md

Lines changed: 37 additions & 25 deletions
@@ -27,21 +27,35 @@ The **free version** of the HelixFold3 server is restricted to **non-commercial
2727
<br>
2828

2929

30-
3130
## HelixFold3 Inference
3231

33-
### Environment
32+
### 📣 Updates
33+
34+
- 2025-07-23: **HelixFold3.2** brings significant advancements in protein-related tasks on FoldBench, along with a marked reduction in atomic clashes.
35+
<table>
36+
<tr>
37+
<td align="center" width="48%">
38+
<img src="images/HelixFold3.2_foldbench.png" width="100%"/><br/>
39+
</td>
40+
<td align="center" width="48%">
41+
<img src="images/HelixFold3.2_atom_clash.png" width="100%"/><br/>
42+
</td>
43+
</tr>
44+
</table>
45+
46+
47+
### 🛠 Environment
3448
Specific environment settings are required to reproduce the results reported in this repo:
3549

36-
* Python: 3.9
50+
* Python: 3.10
3751
* CUDA: 12.0
3852
* CuDNN: 8.4.0
3953
* NCCL: 2.14.3
40-
* Paddle: 2.6.1
54+
* Paddle: 3.1.0
4155

4256
These settings are recommended because they match the ones we used on our A100 machines for all inference experiments.
4357

44-
### Installation
58+
### 📦 Installation
4559

4660
HelixFold3 depends on [PaddlePaddle](https://github.com/paddlepaddle/paddle). Python dependencies available through `pip`
4761
are provided in `requirements.txt`. `kalign`, the [`HH-suite`](https://github.com/soedinglab/hh-suite) and `jackhmmer` are
@@ -50,25 +64,22 @@ also needed to produce multiple sequence alignments. The download scripts requir
5064
Navigate to the `helixfold` directory, then run:
5165

5266
```bash
53-
# Install py env
54-
conda create -n helixfold -c conda-forge python=3.9
55-
conda install -y -c bioconda aria2 hmmer==3.3.2 kalign2==2.04 hhsuite==3.3.0 -n helixfold
56-
conda install -y -c conda-forge openbabel -n helixfold
67+
# install msa env
68+
conda create -n msa_env -c conda-forge python=3.9
69+
conda install -c bioconda aria2 hmmer==3.3.2 kalign2==2.04 hhsuite==3.3.0 -n msa_env -y
5770

58-
# activate the conda environment
71+
# install paddlepaddle and other requirements
72+
conda create -n helixfold -c conda-forge python=3.10
5973
conda activate helixfold
6074

61-
# install paddlepaddle
62-
python3 -m pip install paddlepaddle-gpu==2.6.1.post120 -f https://www.paddlepaddle.org.cn/whl/linux/mkl/avx/stable.html
63-
# or lower version: https://paddle-wheel.bj.bcebos.com/2.5.1/linux/linux-gpu-cuda11.7-cudnn8.4.1-mkl-gcc8.2-avx/paddlepaddle_gpu-2.5.1.post117-cp39-cp39-linux_x86_64.whl
64-
75+
python3 -m pip install paddlepaddle-gpu==3.1.0 -i https://www.paddlepaddle.org.cn/packages/stable/cu126/
6576
python3 -m pip install -r requirements.txt
6677
```
6778

6879
Note: If you have a different version of python3 and cuda, please refer to [here](https://www.paddlepaddle.org.cn/whl/linux/gpu/develop.html) for the compatible PaddlePaddle `dev` package.
6980

7081

71-
### Usage
82+
### 🎯 Usage
7283

7384
In order to run HelixFold3, the genetic databases and model parameters are required.
7485

@@ -95,10 +106,10 @@ The script `scripts/download_all_data.sh` can be used to download and set up all
95106
will download a reduced version of the databases to be used with the `reduced_dbs` preset. The total download
96107
size for the reduced databases is around 190 GB, and the total unzipped size is around 530 GB.
97108

98-
#### Understanding Model Input
109+
#### 🤔 Understanding Model Input
110+
111+
Demo inputs are provided under `./data/` for testing and reference. Input data is a JSON file containing several entities such as `protein`, `ligand`, `dna`, `rna` and `ion`. Protein and nucleic acid inputs are given as sequences.
99112

100-
There are some demo input under `./data/` for your test and reference. Data input is in the form of JSON containing
101-
several entities such as `protein`, `ligand`, `dna`, `rna` and `ion`. Proteins and nucleic acids inputs are their sequence.
102113
HelixFold3 accepts ligand input as SMILES or a CCD ID; please refer to `/data/demo_6zcy_smiles.json` and `demo_output/demo_6zcy_smiles/`
103114
for more details about SMILES input. More flexible input formats are coming soon.
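
As a rough illustration of the input described above, the sketch below assembles a small JSON document in Python. The key names (`entities`, `type`, `sequence`, `smiles`, `count`) and the toy sequence are assumptions made for this example and are not taken from this diff; check them against the demo files under `./data/` before use. The helper `make_input` is hypothetical.

```python
import json

# Entity types named in the README text.
ALLOWED_TYPES = {"protein", "ligand", "dna", "rna", "ion"}


def make_input(entities):
    """Wrap a list of entity dicts in a top-level object (assumed schema)."""
    for entity in entities:
        if entity["type"] not in ALLOWED_TYPES:
            raise ValueError(f"unsupported entity type: {entity['type']}")
    return {"entities": entities}


example = make_input([
    # Toy sequence for illustration only, not a real target.
    {"type": "protein", "sequence": "MKTAYIAKQR", "count": 1},
    # Ethanol written as SMILES; a CCD ID could be supplied instead.
    {"type": "ligand", "smiles": "CCO", "count": 1},
])

print(json.dumps(example, indent=2))
```

Validating the entity type up front is only a convenience of this sketch; the real loader may accept or reject inputs differently.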
104115

@@ -149,7 +160,7 @@ Here is an example modification input:
149160
}
150161
```
151162

152-
#### Running HelixFold for Inference
163+
#### 🚀 Running HelixFold for Inference
153164
To run inference on a sequence or multiple sequences using HelixFold3's pretrained parameters, run e.g.:
154165
* Inference on a single GPU (change the settings in the script BEFORE you run it)
155166
```
@@ -195,6 +206,7 @@ CUDA_VISIBLE_DEVICES=0 "$PYTHON_BIN" inference.py \
195206
The settings in the above script are described as follows:
196207
* Replace `DATA_DIR` with your downloaded data path.
197208
* Replace `ENV_BIN` with your conda virtual environment or any environment where `hhblits`, `hmmsearch` and other dependencies have been installed.
209+
* Replace `PYTHON_BIN` with the Python binary of the environment where `paddlepaddle-gpu` has been installed.
198210
* `--preset` - Set `'reduced_dbs'` to use small bfd or `'full_dbs'` to use full bfd.
199211
* `--*_database_path` - Path to datasets you have downloaded.
200212
* `--input_json` - Input data in the form of JSON. Input pattern in `./data/demo_*.json` for your reference.
@@ -203,7 +215,7 @@ The descriptions of the above script are as follows:
203215
* `--infer_time` - The number of inferences executed by the model for a single input. In each inference, the model infers `5` times (`diff_batch_size`) for the same input by default. This hyperparameter can be changed via `model.head.diffusion_module.test_diff_batch_size` in `./helixfold/model/config.py`.
204216
* `--precision` - Either `bf16` or `fp32`. Please check whether your machine supports `bf16` before changing it. For example, `bf16` is supported by A100, H100 and newer GPUs, while V100 only supports `fp32`.
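
The `--precision` guidance above can be expressed as a small check. This is a minimal sketch, assuming the decision is made from the GPU's CUDA compute capability major version (7 for V100, 8 for A100, 9 for H100); the function `pick_precision` and the threshold of 8 are illustrative choices for this example, not part of HelixFold3.

```python
def pick_precision(compute_capability_major: int) -> str:
    """Return 'bf16' for compute capability 8.x or newer, else 'fp32'.

    The threshold is an assumption of this sketch: Ampere-class GPUs
    (capability 8.0, e.g. A100) are the first with native bf16 support.
    """
    return "bf16" if compute_capability_major >= 8 else "fp32"


# The three GPUs mentioned in the README, with their major versions.
for name, major in [("V100", 7), ("A100", 8), ("H100", 9)]:
    print(f"{name}: --precision {pick_precision(major)}")
```

The major version would need to be obtained separately, for example from the GPU vendor's documentation or your framework's device query.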
205217
206-
### Understanding Model Output
218+
### 🤔 Understanding Model Output
207219
208220
The outputs will be in a subfolder of `output_dir`, including the computed MSAs, predicted structures,
209221
ranked structures, and evaluation metrics. For a task of inferring twice with diffusion batch size 3,
@@ -233,11 +245,11 @@ assume your input JSON is named `demo_data.json`, the `output_dir` directory wil
233245
The contents of each output file are as follows:
234246
* `msas/` - A directory containing the files describing the various genetic
235247
tool hits that were used to construct the input MSA.
236-
* `demo_data-pred-X-Y` - Prediction results of `demo_data.json` in X-th inference and Y-thdiffusion batch,
248+
* `demo_data-pred-X-Y` - Prediction results of `demo_data.json` in X-th inference and Y-th diffusion batch,
237249
including predicted structures in `cif` and a JSON file containing all metrics' results.
238250
* `demo_data-rank*` - Ranked results of a series of predictions according to metrics.
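
To make the `demo_data-pred-X-Y` naming concrete, the sketch below lists the prediction folder names one would expect for a given number of inferences and diffusion batch size. It assumes that X and Y are 1-based, which is not confirmed by the lines shown in this diff; `expected_prediction_dirs` is a hypothetical helper written for this example.

```python
def expected_prediction_dirs(stem, infer_times, diff_batch_size):
    """List '<stem>-pred-X-Y' names, assuming 1-based X and Y."""
    return [
        f"{stem}-pred-{x}-{y}"
        for x in range(1, infer_times + 1)       # X: inference index
        for y in range(1, diff_batch_size + 1)   # Y: diffusion batch index
    ]


# Inferring twice with diffusion batch size 3 gives 2 * 3 = 6 predictions.
for name in expected_prediction_dirs("demo_data", 2, 3):
    print(name)
```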
239251

240-
### Resource Usage
252+
### 📌 Resource Usage
241253

242254
We recommend a GPU with at least 32 GB of available memory for single-GPU inference. The maximum number of tokens is around
243255
1200 for inference on a single A100-40G GPU with precision `bf16`. The length of inference input tokens on a
@@ -251,11 +263,11 @@ reduce the number of additional recycles by changing `model.num_recycle` in the
251263
We are keen to support inference on longer token sequences; this is coming soon.
252264

253265

254-
## Copyright
266+
## 📌 Copyright
255267

256268
HelixFold3's code and model parameters are available under the [LICENSE](./LICENSE) for non-commercial use by individuals or non-commercial organizations only. Please check the usage restrictions before using HelixFold3.
257269
258-
## Reference
270+
## 🌟 Reference
259271
260272
[1] Abramson, J et al. (2024). Accurate structure prediction of biomolecular interactions with AlphaFold 3. Nature 630, 493–500. 10.1038/s41586-024-07487-w
261273
@@ -278,7 +290,7 @@ Elevating protein complex structure prediction to new heights. arXiv preprint ar
278290
Wang, Jingzhou He, et al. Pre-training on large-scale generated docking conformations with helixdock to unlock
279291
the potential of protein-ligand structure prediction models. arXiv preprint arXiv:2310.13913, 2023.
280292
281-
## Citation
293+
## 📖 Citation
282294
283295
If you use the code, data, or checkpoints in this repo, please cite the following:
284296

apps/protein_folding/helixfold3/helixfold/model/diffusion.py

Lines changed: 1 addition & 0 deletions
@@ -238,6 +238,7 @@ def sample_diffusion(self, representations, batch, step_num=None, gamma0=None):
238238
gamma0 = self.gamma0
239239

240240
single_act = representations['single'] # (B, N, d1)
241+
gamma0 = self.gamma0 if single_act.shape[1] <= 1400 else 0.0 # TODO:
241242
atom_mask = batch['all_atom_pos_mask']
242243
B, N_atom = atom_mask.shape[:2]
243244
c_list = self._noise_schedule(step_num)
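
The added line in this hunk can be read as a length-dependent switch: `gamma0` keeps the module default when the second dimension of `single_act` (labelled `N` in the code comment) is at most 1400, and is set to `0.0` otherwise. The standalone sketch below mirrors that rule outside the model class. `select_gamma0` is a hypothetical helper, and the default value `0.8` in the example is illustrative only, since the diff does not show the actual value of `self.gamma0`.

```python
def select_gamma0(num_tokens, default_gamma0, max_tokens=1400):
    """Mirror the rule in the diff: keep the default up to max_tokens,
    otherwise fall back to 0.0."""
    return default_gamma0 if num_tokens <= max_tokens else 0.0


# Illustrative default only; the real value lives in the model config.
for n in (512, 1400, 1401):
    print(n, select_gamma0(n, default_gamma0=0.8))
```

As written in the diff, the assignment appears to run unconditionally, so it would also replace a `gamma0` value passed in by the caller; the trailing `# TODO:` suggests the threshold is provisional.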

apps/protein_folding/helixfold3/helixfold/model/modules_all_atom.py

Lines changed: 4 additions & 3 deletions
@@ -129,14 +129,15 @@ def forward(self,
129129
tracer = _dygraph_tracer()
130130
if tracer._amp_dtype == "bfloat16":
131131
with paddle.amp.auto_cast(enable=False):
132-
bf16 = paddle.base.core.VarDesc.VarType.BF16 if FLUID_DEPRECATED else paddle.fluid.core.VarDesc.VarType.BF16
132+
bf16_vartype = paddle.base.core.VarDesc.VarType.BF16 if FLUID_DEPRECATED else paddle.fluid.core.VarDesc.VarType.BF16
133+
bf16_list = [bf16_vartype, paddle.bfloat16]
133134
for key, value in representations.items():
134-
if isinstance(value, paddle.Tensor) and value.dtype in [bf16]:
135+
if isinstance(value, paddle.Tensor) and value.dtype in bf16_list:
135136
temp_value = value.cast('float32')
136137
temp_value.stop_gradient = value.stop_gradient
137138
representations[key] = temp_value
138139
for key, value in batch.items():
139-
if isinstance(value, paddle.Tensor) and value.dtype in [bf16]:
140+
if isinstance(value, paddle.Tensor) and value.dtype in bf16_list:
140141
temp_value = value.cast('float32')
141142
temp_value.stop_gradient = value.stop_gradient
142143
batch[key] = temp_value
[Binary file, 241 KB; preview not loaded]
[Binary file, 276 KB; preview not loaded]
Lines changed: 3 additions & 7 deletions
Original file line numberDiff line numberDiff line change
@@ -1,13 +1,9 @@
11
absl-py==0.13.0
2-
biopython==1.79
3-
chex==0.0.7
4-
dm-haiku==0.0.4
52
dm-tree==0.1.6
6-
docker==5.0.0
3+
biopython==1.79
74
immutabledict==2.0.0
8-
jax==0.2.14
95
ml-collections==0.1.0
106
pandas==1.3.4
117
scipy==1.9.0
12-
rdkit-pypi==2022.9.5
13-
posebusters
8+
jsonschema==4.24.0
9+
posebusters
