Commit a1182c6

Update and expand documentation for RFdiffusion2
Improved the README with clearer setup, installation, and inference instructions, and added troubleshooting tips. Expanded the documentation by adding new usage examples, configuration options, and ORI token explanations. Updated Sphinx configuration and index to include new documentation files, and adjusted workflow and dependencies to comment out unused Sphinx extensions. Enhanced installation guide with troubleshooting for Apptainer image issues and clarified instructions for both Apptainer and source installations.
1 parent 21e435d commit a1182c6

11 files changed, with +158 and -34 lines changed

.github/workflows/documentation_workflow.yml

Lines changed: 1 addition & 1 deletion
Original file line number | Diff line number | Diff line change
@@ -13,7 +13,7 @@ jobs:
1313
- uses: actions/setup-python@v5
1414
- name: Install dependencies
1515
run: |
16-
pip install sphinx sphinx_mdinclude furo sphinx-copybutton sphinx-new-tab-link
16+
pip install sphinx sphinx_mdinclude furo sphinx-copybutton #sphinx-new-tab-link
1717
- name: Sphinx build
1818
run: |
1919
sphinx-build -M html doc/source/ doc/build/

README.md

Lines changed: 67 additions & 24 deletions
@@ -1,7 +1,7 @@
11
> 🚧 **Under Construction:**
22
> This is an initial release of the functionality — further documentation and cleanup is in progress. Currently only inference is supported. But everything currently documented in this README should be runnable. If you are experiencing a bug with inference, feel free to file an issue and attach the output .pdb, .trb, **and** a picture of the design visualized according to the pymol visualization section of this README.
33
4-
# RFdiffusion 2
4+
# RFdiffusion2
55

66
Open source code for RFdiffusion2 as described in the following pre-print.
77

@@ -18,7 +18,24 @@ Open source code for RFdiffusion2 as described in the following pre-print.
1818

1919
More detailed information about how to run, install, and use RFdiffusion2 can be found [here](https://rosettacommons.github.io/RFdiffusion2/).
2020

21+
<!-- ## Table of Contents
22+
- [Set-up](readme_link.html#set-up)
23+
- [Inference Example](readme_link.html#inference)
24+
- [Viewing Designs](readme_link.html#viewing-designs)
25+
- [PyMOL and designs on the same machine](readme_link.html#same_machine_pymol)
26+
- [PyMOL running locally, designs on remote GPU](readme_link.html#remote_machine_pymol)
27+
- [Additional Info](readme_link.html#additional_info)
28+
- [Running the AME benchmark](readme_link.html#ame_benchmark)
29+
- [Pipeline metrics](readme_link.html#pipeline_metrics)
30+
- [RFdiffusion2 outputs](readme_link.html#rfdiffusion2_outputs)
31+
- [LigandMPNN outputs](readme_link.html#ligandmpnn_outputs)
32+
-->
33+
2134
## Set-up
35+
<a id="set-up"></a>
36+
37+
If these set-up instructions do not work for your system, see the [Installation Guide](installation.html) for troubleshooting issues with the
38+
provided image and alternative instructions for how to install RFdiffusion2 from source.
2239

2340
1. **Clone the repo.**
2441

@@ -28,20 +45,21 @@ More detailed information about how to run, install, and use RFdiffusion2 can be
2845
```bash
2946
export PYTHONPATH="/my/path/to/RFdiffusion2"
3047
```
48+
You will need to export this variable every time you use RFdiffusion2.
3149

3250
3. **Download the model weights and containers:**
3351
```bash
3452
cd /my/path/to/RFdiffusion2
3553
python setup.py
3654
```
37-
These files are quite large so the download process can take over half an hour. If the download process gets terminated before finishing run `python setup.py overwrite` so taht any partially downloaded files can be overwritten.
55+
These files are quite large so the download process can take over half an hour. If the download process gets terminated before finishing run `python setup.py overwrite` so that any partially downloaded files can be overwritten.
3856

39-
4. **Install apptainer**
57+
4. **Install Apptainer**
4058

41-
*Note: you can also run RFdiffusion2 with singularity.*
59+
*Note: You can also run RFdiffusion2 with [Singularity](https://sylabs.io/singularity/).*
4260

43-
RFdiffusion2 (RFD2) uses [apptainer](https://apptainer.org) to simplify the environment set-up.
44-
If you do not already have apptainer on your system, please follow the apptainer [installation instructions](https://apptainer.org/docs/admin/main/installation.html) for your operating system.
61+
RFdiffusion2 (RFD2) uses [Apptainer](https://apptainer.org) to simplify the environment set-up.
62+
If you do not already have Apptainer on your system, please follow the [Apptainer installation instructions](https://apptainer.org/docs/admin/main/installation.html) for your operating system.
4563

4664
If you manage your packages on linux with `apt` you can simply run:
4765
```bash
@@ -60,19 +78,26 @@ More detailed information about how to run, install, and use RFdiffusion2 can be
6078
apptainer exec --nv exec/bakerlab_rf_diffusion_aa.sif <path-to-python_file> <args>
6179
```
6280

81+
## Inference Example
82+
<a id="inference"></a>
6383

64-
## Inference
84+
For other usage examples, see the [Usage page](https://rosettacommons.github.io/RFdiffusion2/usage/usage.html) in the external documentation.
6585

66-
To run a demo of some of the inference capabilities, including enzyme design from an atomic motif + small molecule, enzyme design from an atomic motif of unknown sequence positions + small molecule, and small-molecule binder design (with RASA conditioning to enforce burial of the small molecule).
67-
(See `rf_diffusion/benchmark/demo.json` for how these tasks are declared.) Note that this will be extremely slow if not run on a GPU.
86+
The commands below run several demos that show off some of the inference capabilities of RFdiffusion2, including:
87+
- enzyme design from an atomic motif and small molecule
88+
- enzyme design from an atomic motif of unknown sequence positions and a small molecule
89+
- small-molecule binder design with RASA conditioning
90+
The possible demo options can be found in `rf_diffusion/benchmark/open_source_demo.json` which also shows the different inference configurations
91+
used in each demo. Note that this will be extremely slow if not run on a GPU.
6892

69-
The default argument `in_proc=True` in `open_source_demo.yaml` makes the script run locally. With `in_proc=False` the pipeline will automatically distribute the tasks using SLURM, but this is not yet supported.
93+
<!--The default argument `in_proc=True` in `open_source_demo.yaml` makes the script run locally. With `in_proc=False` the pipeline will automatically distribute the tasks using SLURM, but this is not yet supported.-->
7094
The demo generates 150 residue proteins with many of the residues atomized, so it can take upwards of 30 minutes to run all cases. Each case may take up to 10 minutes on an RTX2060.
7195

7296
**Run single demo case:**
7397
```bash
7498
apptainer exec --nv rf_diffusion/exec/bakerlab_rf_diffusion_aa.sif rf_diffusion/benchmark/pipeline.py --config-name=open_source_demo sweep.benchmarks=active_site_unindexed_atomic_partial_ligand
7599
```
100+
You can replace `active_site_unindexed_atomic_partial_ligand` with any of the other demos included in `rf_diffusion/benchmark/open_source_demo.json`.
76101

77102
**Run all demo cases:**
78103
```bash
@@ -83,23 +108,26 @@ The outputs will be written to:
83108
```
84109
pipeline_outputs/${now:%Y-%m-%d}_${now:%H-%M-%S}_open_source_demo
85110
```
111+
in the directory that you are running the image file from.
86112

87113
This runs only the design stage of the pipeline. In order to continue through sequence-fitting with [LigandMPNN](https://github.com/dauparas/LigandMPNN) and folding with [Chai1](https://github.com/chaidiscovery/chai-lab), pass the command line argument: `stop_step=''`. Note that Chai1 cannot run on all GPU architectures.
88114

89115
Pipeline runs can be resumed by passing `outdir=/path/to/your/output/directory`.
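
Because each run directory is stamped with its launch time, the most recent run can be located by name alone before passing it to `outdir=`. The following is a minimal sketch, assuming the `${now:...}` fields are filled with strftime-style values and that runs live under `pipeline_outputs/` in the current directory. `run_dir_name` and `latest_run_dir` are hypothetical helpers written for illustration; they are not part of RFdiffusion2.

```python
from datetime import datetime
from pathlib import Path
from typing import Optional


def run_dir_name(now: datetime, config_name: str = "open_source_demo") -> str:
    """Build a directory name in the documented date_time_config pattern."""
    return f"{now:%Y-%m-%d}_{now:%H-%M-%S}_{config_name}"


def latest_run_dir(root: Path, config_name: str = "open_source_demo") -> Optional[Path]:
    """Return the newest run directory under root, or None if there is none.

    Zero-padded timestamps sort chronologically when sorted as plain strings,
    so no date parsing is needed.
    """
    candidates = sorted(p for p in root.glob(f"*_{config_name}") if p.is_dir())
    return candidates[-1] if candidates else None
```

For example, `latest_run_dir(Path("pipeline_outputs"))` returns the path to pass as `outdir=...` when resuming the most recent demo run, or `None` if no run exists yet.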
90116

91117

92118
## Viewing Designs
119+
<a id="viewing-designs"></a>
93120

94121
Visualizing the design outputs can be confusing when looking at the raw .pdb files, especially for unindexed motif scaffolding, in which the input motif is incorporated into the protein at indices of the network's choice.
95122
To simplify this, we provide scripts for visualizing the outputs of the network that interact with a local PyMOL instance over XMLRPC.
96123

97-
Download [PyMOL](https://www.pymol.org/) and run it as an XMLRPCServer with:
124+
Download [PyMOL](https://www.pymol.org/) and run it as an XML-RPC server with:
98125
```bash
99126
pymol -R
100127
```
101128

102129
### PyMOL and designs on the same machine
130+
<a id="same_machine_pymol"></a>
103131

104132
Run:
105133
```bash
@@ -119,27 +147,39 @@ You should see something like:
119147
- Any small molecules will have their carbon atoms colored purple.
120148

121149
### PyMOL running locally, designs on remote GPU
150+
<a id="remote_machine_pymol"></a>
122151

123152
It is common for users to be connected over SSH to a GPU cluster for running designs.
124153
It is still possible to view designs on a remote computer from your local PyMOL, as long as your remote computer has a route to your local computer (via VPN or ssh proxy).
125154

126-
Simply find your hostname on your cluster with:
127-
```bash
128-
hostname -I
129-
192.168.0.113 100.64.128.68
155+
You will need to know an IP address that points back to your computer; typically you can use 127.0.0.1.
156+
157+
You will also need to add the `-R` option when you sign in to your cluster and provide the path back to your PyMOL server:
130158
```
159+
ssh username@hostname -R 9123:localhost:9123
160+
```
161+
You do **not** need to replace `localhost` with an IP address. The 9123 is the port that should have been printed in the PyMOL terminal
162+
when you first set up the XML-RPC server.
131163

132-
The second number is the route to your computer.
164+
If you need to sign into multiple servers before you can run Apptainer, you will need to sign into your cluster like this:
165+
```
166+
ssh -J username@first_hostname username@second_hostname -R 9123:localhost:9123
167+
```
168+
The -J option stands for 'jump host' and allows you to connect to a remote host through an intermediate server in one command.
133169

134-
Simply append `--pymol_url=http://100.64.128.68:9123` to the command, i.e. from your remote machine (cluster) run:
170+
Simply append `--pymol_url=http://127.0.0.1:9123` to the command, i.e. from your remote machine (cluster) run:
135171
```bash
136-
apptainer exec rf_diffusion/exec/bakerlab_rf_diffusion_aa.sif rf_diffusion/dev/show_bench.py --clear=True --key=name '/absolute/path/to/pipeline_outputs/output_directory/*.pdb' --pymol_url=http://100.64.128.68:9123
172+
apptainer exec rf_diffusion/exec/bakerlab_rf_diffusion_aa.sif rf_diffusion/dev/show_bench.py --clear=True --key=name '/absolute/path/to/pipeline_outputs/output_directory/*.pdb' --pymol_url=http://127.0.0.1:9123
137173
```
138-
Make sure to replace 100.64.128.68 with your computer's route. 9123 is the port that PyMol uses, after running `pymol -R` you should see a message containing this route number.
174+
If 127.0.0.1 does not point to your local machine, replace it in all the above steps in this section with an IP address that does point to your machine.
175+
176+
For more information about using PyMOL remotely, see this blog post on [Controlling PyMOL from afar](https://www.blopig.com/blog/2024/11/controlling-pymol-from-afar/).
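
Before running `show_bench.py` on the cluster, it can help to confirm that the forwarded port is open. The sketch below uses only the Python standard library and assumes the port 9123 and address 127.0.0.1 from the steps above; `pymol_port_reachable` is a hypothetical helper, not part of RFdiffusion2. A successful connection only shows that something is listening on the port (for example the SSH forward); it does not prove that PyMOL itself is answering.

```python
import socket


def pymol_port_reachable(host: str = "127.0.0.1", port: int = 9123, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        # create_connection raises an OSError subclass on refusal or timeout.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

If `pymol_port_reachable()` returns `False` on the cluster, check that `pymol -R` is still running locally and that the `ssh` session was started with the `-R 9123:localhost:9123` option.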
139177

140-
## Additional Info
178+
## Additional Information
179+
<a id="additional_info"></a>
141180

142-
### Running the AME Benchmark
181+
### Running the AME benchmark
182+
<a id="ame_benchmark"></a>
143183

144184
We crawled M-CSA for 41 enzymes where all reactants and products are present to create this benchmark.
145185
Only position-agnostic tip atoms are provided to the network, and 100 designs are created for each case. Run it with:
@@ -148,7 +188,8 @@ apptainer exec --nv rf_diffusion/exec/bakerlab_rf_diffusion_aa.sif rf_diffusion/
148188
```
149189
Running this entire benchmark will perform [41 active sites * 100 designs per active site * 8 sequences per design] chai folding runs, which will take a prohibitively long time on a single machine, but for reproducibility it is included.
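
To make the scale concrete, the product in brackets can be worked out directly. This is a small arithmetic sketch; the three counts come from the text above, while `serial_days` and any per-fold time passed to it are placeholders for illustration, not measured values.

```python
ACTIVE_SITES = 41
DESIGNS_PER_SITE = 100
SEQUENCES_PER_DESIGN = 8

# One folding run per (active site, design, sequence) combination.
total_folds = ACTIVE_SITES * DESIGNS_PER_SITE * SEQUENCES_PER_DESIGN


def serial_days(n_folds: int, minutes_per_fold: float) -> float:
    """Wall-clock days needed if every fold runs one after another."""
    return n_folds * minutes_per_fold / 60 / 24


print(total_folds)  # 32800
```

Calling `serial_days(total_folds, minutes_per_fold)` with a per-fold time measured on your own hardware gives a rough single-machine estimate.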
150190

151-
### Pipeline Metrics
191+
### Pipeline metrics
192+
<a id="pipeline_metrics"></a>
152193

153194
We also include the code that was used to benchmark the network.
154195
The outline of the benchmarking process is:
@@ -178,7 +219,8 @@ Where:
178219
- `full_atom`: All heavy atoms
179220
- `motif_atom`: Only motif heavy atoms
180221

181-
#### RFdiffusion2 Outputs
222+
#### RFdiffusion2 outputs
223+
<a id="rfdiffusion2_outputs"></a>
182224

183225
The network outputs the protein with both the indexed backbone region and the unindexed atomized region.
184226
After that several idealization steps are conducted; the backbone is idealized, the protein is deatomized and the unindexed residues are assigned their corresponding indexed residue (using a greedy algorithm that searches for the closest C-alpha in the indexed backbone).
@@ -189,7 +231,8 @@ There are two further idealization steps that are optional to users:
189231

190232
The protein at this point has sequence and structure for the motif regions but only backbone (N,Ca,C,O,C-Beta) coordinates for diffused residues (as well as any non-protein components e.g. small molecules).
191233

192-
#### LigandMPNN Outputs
234+
#### LigandMPNN outputs
235+
<a id="ligandmpnn_outputs"></a>
193236

194237
Sequence is fit using LigandMPNN in a ligand-aware, motif-rotamer-aware mode. LigandMPNN also performs packing. LigandMPNN attempts to keep the motif rotamers unchanged, however the pack uses a more conservative set of torsions than RF All-Atom (i.e. fewer DoF) to pack the rotamers and thus there is often some deviation between the RF All-Atom-idealized and ligandmpnn-idealized motif rotamers. The idealization gap between the diffusion-output rotamer set and the RF All-Atom-idealized rotamer set can be found with metrics key: `metrics.IdealizedResidueRMSD.rmsd_constellation`. The corresponding gap between the rf2aa-idealized (or not idealized if `inference.idealize_sidechain_outputs == False`) rotamer set and the ligandmpnn-idealized rotamer set can be found with metrics key: `motif_ideality_diff`.
195238

doc/source/conf.py

Lines changed: 2 additions & 4 deletions
@@ -21,7 +21,7 @@
2121
'sphinx_mdinclude',
2222
#'myst_parser', # to use markdown instead of ReST
2323
'sphinx_copybutton',
24-
'sphinx_new_tab_link',
24+
#'sphinx_new_tab_link',
2525
]
2626

2727
#myst_enable_extensions = ["colon_fence"] # see https://mystmd.org/guide/syntax-overview for more information
@@ -64,7 +64,7 @@
6464
napoleon_use_ivar = True
6565

6666
templates_path = ['_templates']
67-
exclude_patterns = ['_build', 'Thumbs.db', '.DS_Store', 'overview.md']
67+
exclude_patterns = ['_build', 'Thumbs.db', '.DS_Store', 'overview.md', 'usage/run_inference_example.md', 'usage/other_pipeline_example.md']
6868

6969
# -- Options for HTML output -------------------------------------------------
7070
# https://www.sphinx-doc.org/en/master/usage/configuration.html#options-for-html-output
@@ -75,8 +75,6 @@
7575

7676
html_theme_options = {
7777
"sidebar_hide_name":False,
78-
"top_of_page_buttons": ["edit"],
79-
""
8078
#"announcement": "<em>THIS DOCUMENTATION IS CURRENTLY UNDER CONSTRUCTION</em>",
8179
"light_css_variables": {
8280
"color-brand-primary": "#F68A33", # Rosetta Teal

doc/source/index.rst

Lines changed: 10 additions & 0 deletions
@@ -15,6 +15,16 @@ Welcome to the Official Documentation for `RFdiffusion2 <https://github.com/Rose
1515
readme_link.rst
1616
license_link.rst
1717
installation.md
18+
usage/usage.md
19+
usage/configuration_options.md
20+
usage/ori_tokens.md
21+
22+
.. toctree::
23+
:maxdepth: 1
24+
:hidden:
25+
26+
.. usage/other_pipeline_example.md
27+
.. usage/run_inference_example.md
1828
1929
.. Indices and tables
2030
.. ==================

doc/source/installation.md

Lines changed: 35 additions & 3 deletions
@@ -1,15 +1,40 @@
1-
# Installing RFdiffusion2
1+
# Installation Guide
22

33
## Apptainer Image (Recommended)
4-
There is an Apptainer image provided in the RFdiffusion2 repository, it is located at `RFdiffusion2/rf_diffusion/exec/bakerlab_rf_diffusion_aa.sif`. This file can be run with either Apptainer or Singularity, if you have any issues using it please [create an issue](https://github.com/RosettaCommons/RFdiffusion2/issues). An example of how to use this image is given in the [README](readme_link.html#inference).
4+
An Apptainer image is provided in the RFdiffusion2 repository at `RFdiffusion2/rf_diffusion/exec/bakerlab_rf_diffusion_aa.sif`. This file can be run with either Apptainer or Singularity. If you have any issues using it, please [create an issue](https://github.com/RosettaCommons/RFdiffusion2/issues). An example of how to use this image is given in the [README](readme_link.html#inference).
55

66
If you need to generate your own image, the `.spec` file used to generate the given `.sif` file can be found at `RFdiffusion2/rf_diffusion/exec/rf_diffusion_aa.spec`.
77

8+
### Troubleshooting
9+
<a id="image_troubleshooting"></a>
10+
11+
<details>
12+
<summary>lz4 compression issues</summary>
13+
14+
Full error message you might see:
15+
```
16+
FATAL: container creation failed: mount hook function failure: mount /proc/self/fd/3->/var/apptainer/mnt/session/rootfs error: while mounting image /proc/self/fd/3: squashfuse_ll exited with status 255: Squashfs image uses lz4 compression, this version supports only zlib.
17+
```
18+
Or you may see
19+
```
20+
FATAL: kernel reported a bad superblock for squashfs image partition,possible causes are that your kernel doesn't support the compression algorithm or the image is corrupted.
21+
```
22+
23+
To fix this issue you can rebuild the sif on your HPC cluster:
24+
```
25+
apptainer build --sandbox rfd2_sandbox /path/to/bakerlab_rf_diffusion_aa.sif
26+
apptainer build rfd2_zlib.sif rfd2_sandbox
27+
```
28+
Thank you to those who posted in [Issue 10](https://github.com/RosettaCommons/RFdiffusion2/issues/10) for reporting this problem and documenting a
29+
solution.
30+
</details>
31+
32+
833
## Installation from Source
934
Some of the dependencies listed below will vary based on your system, especially the version of CUDA available on your cluster.
1035
You will likely need to change some of the versions of the tools below to successfully install RFdiffusion2.
1136
The instructions below are for CUDA 12.4 and PyTorch 2.4.
12-
For some useful troubleshooting tips, see the [Troubleshooting](#troubleshooting) section below.
37+
For some useful troubleshooting tips, see the [Troubleshooting](#install_troubleshooting) section below.
1338

1439
1. Create a conda environment using [miniforge](https://github.com/conda-forge/miniforge) and activate it
1540
1. Point to the correct [NVIDIA-CUDA channel](https://anaconda.org/nvidia/cuda/labels), and install [PyTorch](https://pytorch.org/), Python 3.11, and [pip](https://pip.pypa.io/en/latest/) based on what is available on your system:
@@ -105,10 +130,17 @@ For some useful troubleshooting tips, see the [Troubleshooting](#troubleshooting
105130
```
106131
export PYTHONPATH=$PYTHONPATH:/path/to/RFdiffusion2
107132
```
133+
134+
You can add this to your environment via
135+
```
136+
conda env config vars set PYTHONPATH=$PYTHONPATH:/path/to/RFdiffusion2
137+
```
138+
so that you do not need to set it every time.
108139
109140
.. _troubleshooting:
110141
111142
### Troubleshooting
143+
<a id="install_troubleshooting"></a>
112144
Ran into an installation issue not covered here? [Create a new issue!](https://github.com/RosettaCommons/RFdiffusion2/issues)
113145
114146

doc/source/overview.md

Lines changed: 2 additions & 2 deletions
@@ -1,7 +1,7 @@
11
Overview
22
========
33

4-
Introduced in [*Atom level enzyme active site scaffolding using RFdiffusion2*](https://www.biorxiv.org/content/10.1101/2025.04.09.648075v1), RFdiffusion2 expands on the enzyme scaffolding capabilities of diffusion-based protein design by giving researchers finer control over enzyme active sites.
4+
Introduced in [Atom level enzyme active site scaffolding using RFdiffusion2](https://www.biorxiv.org/content/10.1101/2025.04.09.648075v1), RFdiffusion2 expands on the enzyme scaffolding capabilities of diffusion-based protein design by giving researchers finer control over enzyme active sites.
55
The original [RFdiffusion](https://github.com/RosettaCommons/RFdiffusion) could generate enzyme scaffolds, but the geometry of the active site could only be specified at the residue level; no atomic or rotamer information could be directly provided.
66
Although defining hotspot residues provided a way for protein designers to control scaffold-ligand interactions, they offered limited flexibility for the placement of the catalytic residues in the final design.
77

@@ -12,4 +12,4 @@ RFdiffusion2 addresses these limitations by:
1212

1313
To learn how to run RFdiffusion2 using an [Apptainer](https://apptainer.org/) image, see the [README](readme_link.html).
1414

15-
> **NOTE:** The current rendition of RFdiffusion2 makes it particularly useful for enzyme scaffolding, but for many other applications RFdiffusion (the original) will be easier to use and may provide comparable or better results.
15+
> **NOTE:** The current rendition of RFdiffusion2 makes it particularly useful for enzyme scaffolding and it has increased backbone flexibility compared to RFdiffusion. However, for binder design it is recommended to use the original RFdiffusion.
Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
1+
## Configuration Options

doc/source/usage/ori_tokens.md

Lines changed: 3 additions & 0 deletions
@@ -0,0 +1,3 @@
1+
## ORI Tokens
2+
3+
ORI Tokens
Lines changed: 4 additions & 0 deletions
@@ -0,0 +1,4 @@
1+
## Using the pipeline.py script
2+
3+
4+
Lines changed: 3 additions & 0 deletions
@@ -0,0 +1,3 @@
1+
## Using run_inference.py
2+
3+
While the pipeline.py script is powerful, you can also run
