- `out_dir` — The directory where output files from the inference run will be stored. If the directory does not exist, it will be created. **This does not change how the output files are named.**
55
60
- `inputs` — The path and file name of the JSON or YAML file where you have defined your inference constraints.
56
61
57
-
### Other Useful CLI arguments (from the [default config](https://github.com/RosettaCommons/foundry/blob/production/models/rfd3/configs/inference_engine/rfdiffusion3.yaml)):
62
+
(other-useful-cli-arguments)=
63
+
### Other Useful CLI arguments:
64
+
(From the [default config](https://github.com/RosettaCommons/foundry/blob/production/models/rfd3/configs/inference_engine/rfdiffusion3.yaml))
58
65
- `n_batches` — number of batches to generate per input key (default: 1).
59
66
- `diffusion_batch_size` — number of diffusion samples (designs) per batch (default: 8). If `n_batches=1` and `diffusion_batch_size=8`, then 8 designs will be generated from the inference run.
60
67
- `specification` — JSON overrides for the per-example InputSpecification (default: `{}`). For example, you can run `rfd3 design inputs=null specification.length=200` as a quick debugging check that creates a 200-residue protein.
The full config of default arguments that are applied can be seen in [inference_engine/rfdiffusion3.yaml](https://github.com/RosettaCommons/foundry/blob/production/models/rfd3/configs/inference_engine/rfdiffusion3.yaml)
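
As a rough illustration of how these arguments combine, the sketch below assembles an `rfd3 design` command from `key=value` overrides and computes how many designs to expect per input key. The helper names `build_command` and `designs_per_input`, the output directory, and the input file name are all hypothetical; only the argument names and defaults come from the list above.

```python
import shlex


def build_command(out_dir, inputs, n_batches=1, diffusion_batch_size=8):
    """Assemble an `rfd3 design` call from key=value overrides (illustrative helper)."""
    overrides = {
        "out_dir": out_dir,
        "inputs": inputs,
        "n_batches": n_batches,
        "diffusion_batch_size": diffusion_batch_size,
    }
    return ["rfd3", "design"] + [f"{key}={value}" for key, value in overrides.items()]


def designs_per_input(n_batches=1, diffusion_batch_size=8):
    """Each batch yields `diffusion_batch_size` designs, so the total is the product."""
    return n_batches * diffusion_batch_size


# Hypothetical paths, chosen only for this example.
command = build_command("my_outputs", "my_inputs.yaml", n_batches=2)
print(shlex.join(command))
print(designs_per_input(n_batches=2))  # 2 batches x 8 designs per batch = 16
```

The command is only printed here, not executed, so the sketch can be tried without RFD3 installed.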
101
108
109
+
(inputspecification-fields)=
102
110
## InputSpecification fields
103
111
104
112
Below is a table of all of the inputs that the `InputSpecification` accepts. Use these fields to describe the constraints you want to apply to your system during inference.
@@ -125,7 +133,7 @@ Below is a table of all of the inputs that the `InputSpecification` accepts. Use
125
133
|`select_hbond_donor` / `select_hbond_acceptor`|`InputSelection`| Atom-wise selection of hydrogen bond donors and acceptors, respectively. Only dictionary inputs are allowed. See {doc}`na_binder_design` for an example. |
126
134
|`select_hotspots`|`InputSelection`| Atom-level or residue-level hotspots. Hotspots will typically be at most 4.5 Å to any heavy atom in the designed structure. Typically used for designing binders. |
127
135
|`redesign_motif_sidechains`|`bool`| Fixed backbone, redesigned sidechains for motifs (input structures). |
128
-
|`symmetry`|`SymmetryConfig`| See {doc}`symmetry.md`. |
136
+
|`symmetry`|`SymmetryConfig`| See {doc}`symmetry`. |
129
137
|`ori_token`|`list[float]`|`[x,y,z]` origin override to control COM (center of mass) placement of designed structure. |
130
138
|`infer_ori_strategy`|`str`|`"com"` or `"hotspots"`. The center of mass of the diffused region will typically be within 5 Å of the ORI token. Using `hotspots` will place the ORI token 10 Å outward from the center of mass of the specified hotspots. Using `com` will place the token at the center of mass of the input structure.|
- **Backwards compatible.** Add `"dialect": 1` to keep your old configs running while you migrate. (Deprecated.)
149
157
150
158
---
159
+
(the-inputselection-mini-language)=
151
160
## The InputSelection Mini-Language
152
161
153
162
Fields marked as `InputSelection` accept either a boolean, a contig-style string, or a dictionary. Dictionaries are the most expressive and can also use shorthand values like `ALL`, `TIP`, or `BKBN`:
@@ -160,11 +169,20 @@ select_fixed_atoms:
160
169
LIG: '' # selects no atoms (i.e. unfixes the atoms for ligands named `LIG`)
161
170
```
162
171
163
-
<p align="center">
172
+
<!--<p align="center">
164
173
<img src=".assets/input_selection.png" alt="InputSelection language for foundry" width=500>
165
-
</p>
174
+
</p>-->
175
+
```{figure} .assets/input_selection.png
176
+
---
177
+
alt: Input selection language for foundry.
178
+
width: 500px
179
+
---
180
+
Graphical representation of the different ways to specify portions of a structure using RFD3's InputSelection mini-language.
181
+
```
182
+
166
183
167
-
## Contig String
184
+
(contig-strings)=
185
+
## Contig Strings
168
186
A 'contig string' is a string that contains residue information and is used in many of the settings in the table above. Here are some formatting specifics:
169
187
- Different pieces of information included in the string are separated by commas
170
188
- Ranges of residues are specified by a dash (`-`) between the starting and ending residue
@@ -186,8 +204,10 @@ my_calculation:
186
204
- `B3-B45`: Residues B3 thru B45 are taken from the input structure.
187
205
- `60-80`: A design region that will be between 60 and 80 residues long is added after B45.
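
To make the formatting rules concrete, here is a minimal sketch of a contig string reader. It is not RFD3's parser: the function `parse_contig` and its output tuples are hypothetical, and it only handles the token forms shown in this section (chain-prefixed residue ranges and plain length ranges), not chain breaks or any other syntax.

```python
import re

# Matches tokens such as "B3-B45", "A11-12", or "A7": chain letter, start, optional end.
_FIXED = re.compile(r"^([A-Za-z])(\d+)(?:-[A-Za-z]?(\d+))?$")
# Matches tokens such as "60-80" or "25": a design region length or length range.
_DESIGN = re.compile(r"^(\d+)(?:-(\d+))?$")


def parse_contig(contig):
    """Split a contig string into input-derived and designed segments."""
    segments = []
    for token in contig.split(","):
        token = token.strip()
        if match := _FIXED.match(token):
            chain, start, end = match.groups()
            segments.append(("input", chain, int(start), int(end or start)))
        elif match := _DESIGN.match(token):
            low, high = match.groups()
            segments.append(("design", int(low), int(high or low)))
        else:
            raise ValueError(f"Unrecognized contig token: {token!r}")
    return segments


print(parse_contig("B3-B45,60-80"))
# [('input', 'B', 3, 45), ('design', 60, 80)]
```

Note that the sketch does not check that the chain letter on both sides of a range matches.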
188
206
207
+
(input-option-specifics)=
189
208
## Input Option Specifics
190
209
210
+
(unindexing-specifics)=
191
211
### Unindexing Specifics
192
212
193
213
`unindex` marks motif tokens whose relative sequence placement is unknown to the model (useful for scaffolding around active sites, etc.).
@@ -198,6 +218,7 @@ You can specify consecutive residues as e.g. `A11-12` (instead of `A11,A12`), th
198
218
Similarly, you can specify manually any number of residues that offsets two components, e.g. `A11,0,A12` (0 sequence offset, equivalent to just `A11-12`), or `A11,3,A12` (3-residue separation).
199
219
From our initial tests this only leads to a slight bias in the model, but newer models may show better adherence!
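
One way to read the syntax above is as a grouping rule: residues joined by a range or by an explicit integer offset share a known relative placement, while a bare comma between two residues does not. The sketch below encodes that reading. It is an interpretation of the description in this section, not RFD3's own parsing code, and the function name `group_unindexed` is hypothetical.

```python
import re

_RESIDUES = re.compile(r"^([A-Za-z])(\d+)(?:-[A-Za-z]?(\d+))?$")


def group_unindexed(unindex):
    """Group residues whose relative sequence offsets are given to the model.

    Returns a list of groups; each group is a list of (residue, offset) pairs,
    with offsets counted from the first residue of that group.
    """
    groups = []
    pending_gap = None  # integer offset token seen since the previous residue token
    for token in unindex.split(","):
        token = token.strip()
        if token.isdigit():
            pending_gap = int(token)
            continue
        match = _RESIDUES.match(token)
        if match is None:
            raise ValueError(f"Unrecognized token: {token!r}")
        chain = match.group(1)
        start = int(match.group(2))
        end = int(match.group(3) or match.group(2))
        if pending_gap is not None and groups:
            # An explicit offset ties this component to the previous group.
            offset = groups[-1][-1][1] + pending_gap + 1
        else:
            # A bare comma gives no relative placement: start a new group.
            groups.append([])
            offset = 0
        for number in range(start, end + 1):
            groups[-1].append((f"{chain}{number}", offset))
            offset += 1
        pending_gap = None
    return groups


print(group_unindexed("A11-12,A22"))
# [[('A11', 0), ('A12', 1)], [('A22', 0)]]
print(group_unindexed("A11,3,A12"))
# [[('A11', 0), ('A12', 4)]]
```

Under this reading, `A11,0,A12` and `A11-12` produce the same single group, matching the equivalence stated above.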
200
220
221
+
(partial-diffusion)=
201
222
### Partial Diffusion
202
223
To enable partial diffusion, you can pass `partial_t` with any example. This sets the *noise level* in *angstroms* for the sampler:
203
224
- The `specification.partial_t` argument can be specified from JSON or the command line.
@@ -222,10 +243,15 @@ In the following example, RFD3 will noise out by 15 angstroms and constrain atom
222
243
}
223
244
```
224
245
Below is an example of what the output should look like (diffusion outputs in teal, original native in navajo white):
The `cif_parser_args` setting that you can include in your input JSON or YAML file accepts several possible values as a dictionary:
231
257
- `cache_dir`: String specifying the path to the directory where cache files are stored (default: null).
@@ -239,18 +265,21 @@ The `cif_parser_args` setting that you can include in your input JSON or YAML fi
239
265
240
266
You can also use `STANDARD_PARSER_ARGS` from [AtomWorks](https://github.com/RosettaCommons/atomworks). More information can be found in [atomworks/io/parser.py](https://github.com/RosettaCommons/atomworks/blob/production/src/atomworks/io/parser.py).
241
267
268
+
(select-fixed-atoms)=
242
269
### Select Fixed Atoms
243
270
The `select_fixed_atoms` input setting can take a boolean, dictionary or contig string as input:
244
271
- `True`: All atoms pulled from the input file (via `contig`, for example) are fixed in 3D space
245
272
- `False`: All the atoms pulled from the input file are unfixed in 3D space
246
273
- Contig string: See the [Contig Strings](#contig-strings) section for formatting. Specifying a contig string for this setting allows for the specification of several components to fix in 3D space. This string should only reference residues from the input. Chain breaks are irrelevant for this setting.
247
274
- Dictionary: Allows specific atoms within each residue to be fixed in 3D space. For example, `{"A1": "N,CA,C,O,CB,CG", "A2-10": "BKBN"}` fixes the N, CA, C, O, CB, and CG atoms of residue A1 and the backbone atoms of residues A2 through A10 in chain A.
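
The dictionary form can be pictured as a per-residue expansion. The following sketch is illustrative only: `expand_selection` is a hypothetical helper, and it assumes that `BKBN` stands for the four backbone heavy atoms N, CA, C, and O. The `ALL` and `TIP` shorthands are passed through unexpanded because their atom lists depend on the residue type and are not defined in this section.

```python
import re

# Assumption: "BKBN" stands for the four protein backbone heavy atoms.
BACKBONE_ATOMS = ["N", "CA", "C", "O"]
_KEY = re.compile(r"^([A-Za-z])(\d+)(?:-[A-Za-z]?(\d+))?$")


def expand_selection(selection):
    """Expand a residue-range -> atoms dictionary into one entry per residue."""
    expanded = {}
    for key, atoms in selection.items():
        match = _KEY.match(key)
        if match is None:
            # Keys that are not residue ranges (e.g. ligand names) pass through unchanged.
            residues = [key]
        else:
            chain = match.group(1)
            start = int(match.group(2))
            end = int(match.group(3) or match.group(2))
            residues = [f"{chain}{number}" for number in range(start, end + 1)]
        if atoms == "BKBN":
            names = list(BACKBONE_ATOMS)
        elif atoms in ("ALL", "TIP"):
            names = atoms  # left as a shorthand; the real atoms depend on the residue
        else:
            names = [name for name in atoms.split(",") if name]  # '' selects no atoms
        for residue in residues:
            expanded[residue] = names
    return expanded


result = expand_selection({"A1": "N,CA,C,O,CB,CG", "A2-10": "BKBN"})
print(result["A1"])   # ['N', 'CA', 'C', 'O', 'CB', 'CG']
print(result["A10"])  # ['N', 'CA', 'C', 'O']
```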
248
275
276
+
(debugging-recommendations)=
249
277
## Debugging recommendations
250
278
- For unindexed scaffolding, you can use the option `cleanup_guideposts=False` to keep the model's outputs for the guideposts. The guideposts are saved as separate chains based on whether their relative indices were leaked to the model: e.g. for `unindex=A11-12,A22`, you should see `A11` and `A12` indexed together on one chain and `A22` on its own chain, indicating the model was told that `A11` and `A12` are immediately next to one another in sequence but that their distance to `A22` is unknown.
251
279
- To see the full 14 diffused virtual atoms you can use `cleanup_virtual_atoms=False`. Default is to discard them for the sake of downstream processing.
252
280
- To see the trajectories, you can use `dump_trajectories=True`. This can be useful if the outputs look strange but the config is correct, or if you want to make cool gifs of course! Trajectories do not have sequence labels and contain virtual atoms.
models/rfd3/docs/ppi_design_tutorial.md
Lines changed: 14 additions & 10 deletions
@@ -3,7 +3,7 @@
3
3
## Before We Get Started...
4
4
This tutorial does not cover installing RFD3. Before continuing, make sure that RFdiffusion3 (RFD3) is installed and runs on your system.
5
5
6
-
See the [README](../README.md) for installation instructions. You will need to remember the path to the directory where you stored your checkpoint files.
6
+
See the [README](https://github.com/RosettaCommons/foundry/tree/production/models/rfd3) for installation instructions. You will need to remember the path to the directory where you stored your checkpoint files.
7
7
8
8
```{note}
9
9
The instructions below assume that you have installed RFD3 via the pip commands.
@@ -12,30 +12,30 @@ You may need to slightly modify how you run the calculations based on your setup
12
12
13
13
Make sure you have activated any environments you used to install RFD3.
14
14
15
-
RFD3 runs best on GPUs. It is suggested to follow this tutorial on an interactive GPU node, if you have access to one.<!-- TO DO: Say memory requirement-->
15
+
RFD3 runs best on GPUs. It is suggested to follow this tutorial on an interactive GPU node, if you have access to one.
16
16
17
-
You will need the file `4zxb_cropped.pdb`. This is provided in [`foundry/models/rfd3/docs/input_pdbs`](input_pdbs/4zxb_cropped.pdb).
17
+
You will need the file `4zxb_cropped.pdb`. This is provided in [`foundry/models/rfd3/docs/input_pdbs`](input_pdbs/4zxb_cropped.pdb). You can clone the [`foundry`](https://github.com/RosettaCommons/foundry) repository to easily access files related to this tutorial.
18
18
19
19
Lastly, we will be visualizing the outputs of the calculations presented in the tutorial using [PyMOL](https://pymol.org/). The visualization steps are completely optional, but if you would like to follow along you will need to have PyMOL installed.
20
20
21
+
(learning-objectives)=
21
22
## Learning Objectives
22
23
In this tutorial, we will design a binder for the human insulin receptor to explore the settings available in RFD3 that are useful in protein-protein interface (PPI) design.
23
24
25
+
(setup)=
24
26
## Setup
25
27
Create a directory named `rfd3_ppi_tutorial` and `cd` into it:
26
28
```bash
27
29
mkdir rfd3_ppi_tutorial && cd rfd3_ppi_tutorial
28
30
```
29
-
This is where you will be storing the files related to this tutorial. Download the `ppi_design_tutorial_files.zip` file and decompress it:
30
-
```bash
31
-
unzip ppi_design_tutorial_files.zip
32
-
```
31
+
This is where you will be storing the files related to this tutorial.
33
32
34
-
If you would like to compare your outputs against those generated by the authors of this tutorial, you can download the example output files here. <!-- TO DO: Create example output files.-->
35
-
The 'basic' folder outputs did not use the setting discussed in [Other Useful Settings](#other-useful-settings) section. The 'fixed' directory has the `select_fixed_atoms` option.
33
+
If you would like to compare your outputs against those generated by the authors of this tutorial, you can find pre-generated output files in `foundry/models/rfd3/docs/ppi_tutorial_files`.
34
+
The 'basic' zip file contains outputs that did not use the setting discussed in the [Other Useful Settings](#other-useful-settings) section. The 'fixed' zip file contains the outputs that result from using the `select_fixed_atoms` option.
36
35
37
-
There is also an already made YAML file available here. <!-- TO DO: Add example YAML file --> We recommend following the tutorial to create this file yourself to better understand the RFD3 options that are relevant to PPI design.
36
+
There is also an already made YAML file available in `foundry/models/rfd3/docs/ppi_tutorial_files`. We recommend following the tutorial to create this file yourself to better understand the RFD3 options that are relevant to PPI design.
38
37
38
+
(creating-the-yaml-file)=
39
39
## Creating the YAML file
40
40
In this tutorial, we will be briefly describing each of the settings we will be using for this example binder design project.
41
41
@@ -93,6 +93,7 @@ In this tutorial, we will be briefly describing each of the settings we will be
93
93
```
94
94
1. Save your file and close it.
95
95
96
+
(other-useful-settings)=
96
97
### Other useful settings
97
98
1. There is a setting for allowing structural flexibility while keeping the sequence fixed in the input structure, for example:
98
99
```yaml
@@ -103,6 +104,7 @@ In this tutorial, we will be briefly describing each of the settings we will be
103
104
```
104
105
Here, an empty list indicates that all atoms are flexible, `BKBN` keeps the backbone atoms fixed while allowing side chain atoms to move, and for the last residue, specific atoms are fixed in place while allowing the others to move. Feel free to try adding this to your YAML file and see how your outputs change.
105
106
107
+
(running-rfd3)=
106
108
## Running RFD3
107
109
To actually run RFD3 you need to know:
108
110
- the directory you want the outputs to be stored in
@@ -119,6 +121,7 @@ Your output files will be placed in a new directory `ppi_tutorial_outputs/0`. If
119
121
You may see several warning messages when you run RFD3. These should not interfere with the calculation.
120
122
```
121
123
124
+
(analyzing-the-outputs)=
122
125
## Analyzing the Outputs
123
126
You should end up with 8 designs, numbered 0-7, each with its own `.cif.gz` and `.json` file. If you want to adjust the number, add the configuration option `diffusion_batch_size` to your `rfd3 design` command.
124
127
@@ -135,6 +138,7 @@ You'll notice that the binders are always on the side of the input structure clo
135
138
The designed binders are also all between 40 and 120 amino acids long. However, you'll notice that they are all the same length!
136
139
This is because RFD3 runs batched inference calculations. All of the calculations in a single 'batch' will have the same randomly sampled length, while designs from other batches will have different lengths. If you want to change the number of batches, add the setting `n_batches` to your `rfd3 design` command.
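
The batching behavior described above can be mimicked with a toy sampler. This is only a sketch of the idea, not RFD3's sampling code: the function `sample_batch_lengths` is hypothetical, and drawing lengths uniformly from the 40 to 120 range is an assumption made for illustration.

```python
import random


def sample_batch_lengths(n_batches, diffusion_batch_size, min_length=40, max_length=120, seed=0):
    """Draw one length per batch and share it across every design in that batch."""
    rng = random.Random(seed)
    batches = []
    for _ in range(n_batches):
        length = rng.randint(min_length, max_length)  # inclusive on both ends
        batches.append([length] * diffusion_batch_size)
    return batches


for index, lengths in enumerate(sample_batch_lengths(n_batches=3, diffusion_batch_size=8)):
    print(f"batch {index}: {len(lengths)} designs, all of length {lengths[0]}")
```

With a single batch, every design shares one length, which is why all 8 designs in this tutorial come out the same size.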
137
140
141
+
(references-and-further-reading)=
138
142
## References and Further Reading
139
143
- For more information on the different inference settings in RFD3, see [input.md](input.md)
140
144
- For more information on the example used here, see [*De novo design of protein structure and function with RFdiffusion*](https://www.nature.com/articles/s41586-023-06415-8#Sec12) by Joseph L. Watson et al.