Commit 762e070

PPI tutorial and RFD3 docs update
Created output files for PPI tutorial and listed their locations. Made edits to files to add labels to sections to remove sphinx warnings.
1 parent 5de555d commit 762e070

7 files changed

+68
-23
lines changed

docs/source/conf.py

Lines changed: 1 addition & 1 deletion
Original file line number | Diff line number | Diff line change
@@ -19,7 +19,7 @@
1919
]
2020

2121
templates_path = ['_templates']
22-
exclude_patterns = []
22+
exclude_patterns = ["readme.md", "readmelink.md", "readme_link.rst"]
2323

2424

2525

docs/source/index.rst

Lines changed: 1 addition & 3 deletions
@@ -4,16 +4,14 @@
44
contain the root `toctree` directive.
55
66
Welcome to the official documentation for foundry
7-
================================================
7+
=================================================
88

99
`foundry <https://github.com/RosettaCommons/foundry/tree/production>`_ is a home for
1010
many of the machine learning models produced by `Rosetta Commons member labs <https://rosettacommons.org/about/labs/>`_.
1111

1212
.. toctree::
1313
:maxdepth: 1
1414
:caption: General
15-
16-
.. readme_link.rst
1715

1816
license_link.rst
1917

models/rfd3/docs/input.md

Lines changed: 38 additions & 9 deletions
@@ -12,20 +12,22 @@ This document outlines the various input settings and configurations you can use
1212
- [Quick start](#quick-start)
1313
- [CLI arguments](#cli-arguments)
1414
- [Required CLI Arguments](#required-cli-arguments)
15+
- [Other Useful CLI Arguments](#other-useful-cli-arguments)
1516
- [Other CLI options](#other-CLI-options)
1617
- [InputSpecification fields](#inputspecification-fields)
1718
- [The `InputSelection` mini-language](#the-inputselection-mini-language)
1819
- [Contig Strings](#contig-strings)
1920
- [Input Option Specifics](#input-option-specifics)
2021
- [Unindexing Specifics](#unindexing-specifics)
2122
- [Partial Diffusion](#partial-diffusion)
22-
- [CIF Parser Options](#cif_parser_options)
23+
- [CIF Parser Options](#cif-parser-options)
2324
- [Select Fixed Atoms](#select-fixed-atoms)
2425
- [Debugging recommendations](#debugging-recommendations)
2526
- [FAQ / Gotchas](#faq--gotchas)
2627

2728
---
2829

30+
(quick-start)=
2931
## Quick start
3032
> For more detailed information on RFdiffusion3 inputs and outputs, see {doc}`intro_inference_calculations`
3133
@@ -49,12 +51,17 @@ You can then run inference at the command line with:
4951
rfd3 design out_dir=<path/to/outdir> inputs=<path/to/inputs>
5052
```
5153

54+
(cli-arguments)=
5255
## CLI arguments
56+
57+
(required-cli-arguments)=
5358
### Required CLI arguments:
5459
- `out_dir` — The directory that output files from the inference run will be stored in. If the directory does not exist it will be created. **This does not change how the output files are named.**
5560
- `inputs` — The path and file name of the JSON or YAML file where you have defined your inference constraints.
5661

57-
### Other Useful CLI arguments (from the [default config](https://github.com/RosettaCommons/foundry/blob/production/models/rfd3/configs/inference_engine/rfdiffusion3.yaml)):
62+
(other-useful-cli-arguments)=
63+
### Other Useful CLI arguments:
64+
(From the [default config](https://github.com/RosettaCommons/foundry/blob/production/models/rfd3/configs/inference_engine/rfdiffusion3.yaml))
5865
- `n_batches` — number of batches to generate per input key (default: 1).
5966
- `diffusion_batch_size` — number of diffusion samples (designs) per batch (default: 8). If `n_batches=1` and `diffusion_batch_size=8` then 8 designs will be generated from the inference run.
6067
- `specification` — JSON overrides for the per-example InputSpecification (default: `{}`). For example, you can run `rfd3 design inputs=null specification.length=200` for a quick debug of creating a 200-length protein.
@@ -68,7 +75,7 @@ rfd3 design out_dir=<path/to/outdir> inputs=<path/to/inputs>
6875
- `prevalidate_inputs` — Check that your inputs (JSON or YAML file) are valid before running inference (default: False).
6976
- `low_memory_mode` - Set to True (default: False) for memory efficient tokenization mode.
7077

71-
78+
(other-cli-options)=
7279
### Other CLI Options:
7380
- `json_keys_subset` — Allows the user to extract only a subset of the JSON keys provided in the `inputs` file (default: null).
7481
- `inference_sampler`
@@ -99,6 +106,7 @@ rfd3 design out_dir=<path/to/outdir> inputs=<path/to/inputs>
99106

100107
The full config of default arguments that are applied can be seen in [inference_engine/rfdiffusion3.yaml](https://github.com/RosettaCommons/foundry/blob/production/models/rfd3/configs/inference_engine/rfdiffusion3.yaml)
101108

109+
(inputspecification-fields)=
102110
## InputSpecification fields
103111

104112
Below is a table of all of the inputs that the `InputSpecification` accepts. Use these fields to describe the constraints you want to apply to your system during inference.
@@ -125,7 +133,7 @@ Below is a table of all of the inputs that the `InputSpecification` accepts. Use
125133
| `select_hbond_donor` / `select_hbond_acceptor` | `InputSelection` | Atom-wise donor/acceptor flags. Atom-wise selection of hydrogen bond donors and acceptors, respectively. Only dictionary inputs allowed. See {doc}`na_binder_design` for an example. |
126134
| `select_hotspots` | `InputSelection` | Atom-level or residue-level hotspots. Hotspots will typically be at most 4.5 Å to any heavy atom in the designed structure. Typically used for designing binders. |
127135
| `redesign_motif_sidechains` | `bool` | Fixed backbone, redesigned sidechains for motifs (input structures). |
128-
| `symmetry` | `SymmetryConfig` | See {doc}`symmetry.md`. |
136+
| `symmetry` | `SymmetryConfig` | See {doc}`symmetry`. |
129137
| `ori_token` | `list[float]` | `[x,y,z]` origin override to control COM (center of mass) placement of designed structure. |
130138
| `infer_ori_strategy` | `str` | `"com"` or `"hotspots"`. The center of mass of the diffused region will typically be within 5Å of the ORI token. Using `hotspots` will place the ORI token 10Å outward from the center of mass of the specified hotspots. Using `com` will place the token at the center of mass of the input structure.|
131139
| `plddt_enhanced` | `bool` | Default `True`. Enables pLDDT (predicted Local Distance Difference Test) enhancement. |
@@ -148,6 +156,7 @@ A few notes on the above:
148156
- **Backwards compatible.** Add `"dialect": 1` to keep your old configs running while you migrate. (Deprecated.)
149157

150158
---
159+
(the-inputselection-mini-language)=
151160
## The InputSelection Mini-Language
152161

153162
Fields marked as `InputSelection` accept either a boolean, a contig-style string, or a dictionary. Dictionaries are the most expressive and can also use shorthand values like `ALL`, `TIP`, or `BKBN`:
@@ -160,11 +169,20 @@ select_fixed_atoms:
160169
LIG: '' # selects no atoms (i.e. unfixes the atoms for ligands named `LIG`)
161170
```
162171
163-
<p align="center">
172+
<!--<p align="center">
164173
<img src=".assets/input_selection.png" alt="InputSelection language for foundry" width=500>
165-
</p>
174+
</p>-->
175+
```{figure} .assets/input_selection.png
176+
---
177+
alt: Input selection language for foundry.
178+
width: 500px
179+
---
180+
Graphical representation of the different ways to specify portions of a structure using RFD3's InputSelection mini-language.
181+
```
182+
166183

167-
## Contig String
184+
(contig-strings)=
185+
## Contig Strings
168186
A 'contig string' is a string that contains residue information and is used in many of the settings in the table above. Here are some formatting specifics:
169187
- Different pieces of information included in the string are separated by commas
170188
- Ranges of residues are specified by a dash (`-`) between the starting and ending residue
@@ -186,8 +204,10 @@ my_calculation:
186204
- `B3-B45`: Residues B3 thru B45 are taken from the input structure.
187205
- `60-80`: A design region between 60 and 80 residues long is added after B45.
188206

207+
(input-option-specifics)=
189208
## Input Option Specifics
190209

210+
(unindexing-specifics)=
191211
### Unindexing Specifics
192212

193213
`unindex` marks motif tokens whose relative sequence placement is unknown to the model (useful for scaffolding around active sites, etc.).
@@ -198,6 +218,7 @@ You can specify consecutive residues as e.g. `A11-12` (instead of `A11,A12`), th
198218
Similarly, you can specify manually any number of residues that offsets two components, e.g. `A11,0,A12` (0 sequence offset, equivalent to just `A11-12`), or `A11,3,A12` (3-residue separation).
199219
From our initial tests this only leads to a slight bias in the model, but newer models may show better adherence!
200220

221+
(partial-diffusion)=
201222
### Partial Diffusion
202223
To enable partial diffusion, you can pass `partial_t` with any example. This sets the *noise level* in *angstroms* for the sampler:
203224
- The `specification.partial_t` argument can be specified from JSON or the command line.
@@ -222,10 +243,15 @@ In the following example, RFD3 will noise out by 15 angstroms and constrain atom
222243
}
223244
```
224245
Below is an example of what the output should look like (diffusion outputs in teal, original native in navajo white):
225-
<p align="center">
246+
<!--<p align="center">
226247
<img src=".assets/partial_diff.png" alt="Partial diffusion" width=650>
227-
</p>
248+
</p>-->
249+
```{image} .assets/partial_diff.png
250+
:alt: Partial diffusion.
251+
:width: 650px
252+
```
228253

254+
(cif-parser-options)=
229255
### CIF Parser Options
230256
The `cif_parser_args` setting that you can include in your input JSON or YAML file accepts several possible values as a dictionary:
231257
- `cache_dir`: String specifying the path to the directory where cache files are stored (default: null).
@@ -239,18 +265,21 @@ The `cif_parser_args` setting that you can include in your input JSON or YAML fi
239265

240266
You can also use `STANDARD_PARSER_ARGS` from [AtomWorks](https://github.com/RosettaCommons/atomworks); more information can be found at [atomworks/io/parser.py](https://github.com/RosettaCommons/atomworks/blob/production/src/atomworks/io/parser.py).
241267

268+
(select-fixed-atoms)=
242269
### Select Fixed Atoms
243270
The `select_fixed_atoms` input setting can take a boolean, dictionary or contig string as input:
244271
- `True`: All atoms pulled from the input file (via `contig`, for example) are fixed in 3D space
245272
- `False`: All the atoms pulled from the input file are unfixed in 3D space
246273
- Contig string: See the [Contig Strings](#contig-strings) section for formatting. Specifying a contig string for this setting allows for the specification of several components to fix in 3D space. This string should only reference residues from the input. Chain breaks are irrelevant for this setting.
247274
- Dictionary: Allows specific atoms within a residue to be fixed in 3D space. For example, `{"A1": "N,CA,C,O,CB,CG", "A2-10": "BKBN"}` fixes the listed atoms (N, CA, C, O, CB, CG) of residue 1 and the backbone atoms of residues 2-10 in chain A.
248275

276+
(debugging-recommendations)=
249277
## Debugging recommendations
250278
- For unindexed scaffolding, you can use the option `cleanup_guideposts=False` to keep the models' outputs for the guideposts. The guideposts are saved as separate chains based on whether their relative indices were leaked to the model: e.g. for `unindex=A11-12,A22`, you should see `A11` and `A12` indexed together on one chain and `A22` on its own chain, indicating the model was provided with the fact that `A11` and `A12` are immediately next to one another in sequence but their distance to `A22` is unknown.
251279
- To see the full 14 diffused virtual atoms you can use `cleanup_virtual_atoms=False`. Default is to discard them for the sake of downstream processing.
252280
- To see the trajectories, you can use `dump_trajectories=True`. This can be useful if the outputs look strange but the config is correct, or if you want to make cool gifs of course! Trajectories do not have sequence labels and contain virtual atoms.
253281

282+
(faq--gotchas)=
254283
## FAQ / Gotchas
255284

256285
<details>

models/rfd3/docs/ppi_design_tutorial.md

Lines changed: 14 additions & 10 deletions
@@ -3,7 +3,7 @@
33
## Before We Get Started...
44
This tutorial does not cover installing RFD3. Before continuing, make sure that RFdiffusion3 (RFD3) is installed and runs on your system.
55

6-
See the [README](../README.md) for installation instructions. You will need to remember the path to the directory where you stored your checkpoint files.
6+
See the [README](https://github.com/RosettaCommons/foundry/tree/production/models/rfd3) for installation instructions. You will need to remember the path to the directory where you stored your checkpoint files.
77

88
```{note}
99
The instructions below assume that you have installed RFD3 via the pip commands.
@@ -12,30 +12,30 @@ You may need to slightly modify how you run the calculations based on your setup
1212

1313
Make sure you have activated any environments you used to install RFD3.
1414

15-
RFD3 runs best on GPUs. It is suggested to follow this tutorial on an interactive GPU node, if you have access to one. <!-- TO DO: Say memory requirement-->
15+
RFD3 runs best on GPUs. It is suggested to follow this tutorial on an interactive GPU node, if you have access to one.
1616

17-
You will need the file `4zxb_cropped.pdb`. This is provided in [`foundry/models/rfd3/docs/input_pdbs`](input_pdbs/4zxb_cropped.pdb).
17+
You will need the file `4zxb_cropped.pdb`. This is provided in [`foundry/models/rfd3/docs/input_pdbs`](input_pdbs/4zxb_cropped.pdb). You can clone the [`foundry`](https://github.com/RosettaCommons/foundry) repository to easily access files related to this tutorial.
1818

1919
Lastly, we will be visualizing the outputs of the calculations presented in the tutorial using [PyMOL](https://pymol.org/). The visualization steps are completely optional, but if you would like to follow along you will need to have PyMOL installed.
2020

21+
(learning-objectives)=
2122
## Learning Objectives
2223
In this tutorial, we will design a binder for the human insulin receptor to explore the settings available in RFD3 that are useful in protein-protein interface (PPI) design.
2324

25+
(setup)=
2426
## Setup
2527
Create a directory named `rfd3_ppi_tutorial` and `cd` into it:
2628
```bash
2729
mkdir rfd3_ppi_tutorial && cd rfd3_ppi_tutorial
2830
```
29-
This is where you will be storing the files related to this tutorial. Download the `ppi_design_tutorial_files.zip` file and decompress it:
30-
```bash
31-
unzip ppi_design_tutorial_files.zip
32-
```
31+
This is where you will be storing the files related to this tutorial.
3332

34-
If you would like to compare your outputs against those generated by the authors of this tutorial, you can download the example output files here. <!-- TO DO: Create example output files.-->
35-
The 'basic' folder outputs did not use the setting discussed in [Other Useful Settings](#other-useful-settings) section. The 'fixed' directory has the `select_fixed_atoms` option.
33+
If you would like to compare your outputs against those generated by the authors of this tutorial, you can find pre-generated output files in `foundry/models/rfd3/docs/ppi_tutorial_files`
34+
The 'basic' zip file contains outputs that did not use the setting discussed in the [Other Useful Settings](#other-useful-settings) section. The 'fixed' zip file contains the outputs produced with the `select_fixed_atoms` option.
3635

37-
There is also an already made YAML file available here. <!-- TO DO: Add example YAML file --> We recommend following the tutorial to create this file yourself to better understand the RFD3 options that are relevant to PPI design.
36+
There is also an already made YAML file available in `foundry/models/rfd3/docs/ppi_tutorial_files`. We recommend following the tutorial to create this file yourself to better understand the RFD3 options that are relevant to PPI design.
3837

38+
(creating-the-yaml-file)=
3939
## Creating the YAML file
4040
In this tutorial, we will be briefly describing each of the settings we will be using for this example binder design project.
4141

@@ -93,6 +93,7 @@ In this tutorial, we will be briefly describing each of the settings we will be
9393
```
9494
1. Save your file and close it.
9595

96+
(other-useful-settings)=
9697
### Other useful settings
9798
1. There is a setting for allowing structural flexibility while keeping the sequence fixed in the input structure, for example:
9899
```yaml
@@ -103,6 +104,7 @@ In this tutorial, we will be briefly describing each of the settings we will be
103104
```
104105
Here, an empty list indicates that all atoms are flexible, `BKBN` keeps the backbone atoms fixed while allowing side chain atoms to move, and for the last residue, specific atoms are fixed in place while allowing the others to move. Feel free to try adding this to your YAML file and see how your outputs change.
105106

107+
(running-rfd3)=
106108
## Running RFD3
107109
To actually run RFD3 you need to know:
108110
- the directory you want the outputs to be stored in
@@ -119,6 +121,7 @@ Your output files will be placed in a new directory `ppi_tutorial_outputs/0`. If
119121
You may see several warning messages when you run RFD3; these should not interfere with the calculation.
120122
```
121123

124+
(analyzing-the-outputs)=
122125
## Analyzing the Outputs
123126
You should end up with 8 designs, numbered 0-7, each with its own `.cif.gz` and `.json` file. If you want to adjust the number, add the configuration option `diffusion_batch_size` to your `rfd3 design` command.
124127

@@ -135,6 +138,7 @@ You'll notice that the binders are always on the side of the input structure clo
135138
The lengths of the designed binders are all also between 40 and 120 amino acids long. However, you'll also notice that they are all the same length!
136139
This is because RFD3 runs batched inference calculations. All of the calculations in a single 'batch' will have the same randomly sampled length, while designs from other batches will have different lengths. If you want to change the number of batches, add the setting `n_batches` to your `rfd3 design` command.
137140

141+
(references-and-further-reading)=
138142
## References and Further Reading
139143
- For more information on the different inference settings in RFD3, see [input.md](input.md)
140144
- For more information on the example used here, see [*De novo design of protein structure and function with RFdiffusion*](https://www.nature.com/articles/s41586-023-06415-8#Sec12) by Joseph L. Watson et al.
279 KB
Binary file not shown.
298 KB
Binary file not shown.
Lines changed: 14 additions & 0 deletions
@@ -0,0 +1,14 @@
1+
insulinr:
2+
input: 4zxb_cropped.pdb
3+
contig: 40-120,/0,E6-155
4+
length: 190-270
5+
select_hotspots:
6+
E64: CD2,CZ
7+
E88: CG,CZ
8+
E96: CD1,CZ
9+
infer_ori_strategy: hotspots
10+
is_non_loopy: true
11+
select_fixed_atoms:
12+
E25: []
13+
E26: BKBN
14+
E27: CA,CB,OG
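
The `length: 190-270` value in the YAML above appears to follow from the contig: a 40-120 residue designed binder plus the 150 fixed residues `E6-155`. The sketch below works through that arithmetic. `contig_length_range` is a hypothetical helper written for illustration and is not part of RFD3; it assumes that `/0` marks a chain break contributing no residues, that a bare number or range is a designed region, and that a chain-prefixed range is taken from the input structure.

```python
import re

# Hypothetical helper (not an RFD3 API): total residue count implied by a contig string.
_PART = re.compile(r"([A-Za-z]?)(\d+)(?:-[A-Za-z]?(\d+))?")


def contig_length_range(contig: str) -> tuple[int, int]:
    """Return (min_length, max_length) for a comma-separated contig string."""
    lo = hi = 0
    for part in contig.split(","):
        part = part.strip()
        if part.startswith("/"):
            # Assumption: entries such as "/0" are chain breaks and add no residues.
            continue
        match = _PART.fullmatch(part)
        if match is None:
            raise ValueError(f"Unrecognised contig component: {part!r}")
        chain = match.group(1)
        start = int(match.group(2))
        end = int(match.group(3)) if match.group(3) else start
        if chain:
            # Chain-prefixed residues come from the input; the count is fixed.
            count = end - start + 1
            lo += count
            hi += count
        else:
            # Unprefixed numbers give the length range of a designed region.
            lo += start
            hi += end
    return lo, hi


print(contig_length_range("40-120,/0,E6-155"))  # (190, 270)
```

Under these assumptions the result matches the `length` field, which is a cheap sanity check to run before starting an inference job.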
