|
1 | 1 | # RFdiffusion3 — Input specification (dialect **2**) |
2 | 2 |
|
3 | 3 | > **TL;DR** |
4 | | -> Inputs are now defined with a single `InputSpecification` class. |
| 4 | +> Inputs are now defined with a single `InputSpecification` class; see [`rfd3/src/rfd3/inference/input_parsing.py`](https://github.com/RosettaCommons/foundry/blob/rac_docs/models/rfd3/src/rfd3/inference/input_parsing.py) for all possible inputs. |
5 | 5 | > Selections like “what’s fixed?”, “what’s sequence-free?”, “which atoms are donors/acceptors?” are all expressed with the same **InputSelection** mini-language. |
6 | | -> Everything is reproducibly logged back out alongside your generation. |
| 6 | +> Everything is reproducibly logged back out alongside your generation: each design creates an output JSON file with all settings defined. |
7 | 7 |
|
8 | 8 | --- |
9 | 9 |
|
10 | | -- [What changed (high level)](#what-changed-high-level) |
11 | 10 | - [Quick start](#quick-start) |
12 | 11 | - [The `InputSelection` mini-language](#the-inputselection-mini-language) |
13 | 12 | - [Full schema: `InputSpecification`](#full-schema-inputspecification) |
|
23 | 22 |
|
24 | 23 | --- |
25 | 24 |
|
26 | | -## How it works (high level) |
27 | | - |
28 | | -- **Unified selections.** All per-residue/atom choices now use **InputSelection**: |
29 | | - - You can pass `true`/`false`, a **contig string** (`"A1-10,B5-8"`), or a **dictionary** (`{"A1-10": "ALL", "B5": "N,CA,C,O"}`). |
30 | | - - Selection fields include: `select_fixed_atoms`, `select_unfixed_sequence`, `select_buried`, `select_partially_buried`, `select_exposed`, `select_hbond_donor`, `select_hbond_acceptor`, `select_hotspots`. |
31 | | -- **Clearer unindexing.** For **unindexed** motifs you typically either fix `"ALL"` atoms or explicitly choose subsets such as `"TIP"`/`"BKBN"`/explicit atom lists via a **dictionary** (see examples). |
32 | | - When using `unindex`, only **the atoms you mark as fixed** are carried over from the input. |
33 | | -- **Reproducibility.** The exact specification and the **sampled contig** are logged back into the output JSON. We also log useful counts (atoms, residues, chains). |
34 | | -- **Safer parsing.** You’ll now get early, informative errors if: |
35 | | - - You pass unknown keys, |
36 | | - - A selection doesn’t match any atoms, |
37 | | - - Indexed and unindexed motifs overlap, |
38 | | - - Mutually exclusive selections overlap (e.g., two RASA bins for the same atom). |
39 | | -- **Backwards compatible.** Add `"dialect": 1` to keep your old configs running while you migrate. (Deprecated.) |
40 | | - |
41 | | ---- |
42 | | - |
43 | 25 | ## InputSpecification |
44 | | - |
45 | | -| Field | Type | Description | |
46 | | -| -------------------------------------------------------------- | ----------------- | --------------------------------------------------------------------- | |
47 | | -| `input` | `str?` | Path to input **PDB/CIF**. Required if you provide contig+length. | |
48 | | -| `atom_array_input` | internal | Pre-loaded `AtomArray` (not recommended). | |
49 | | -| `contig` | `InputSelection?` | Indexed motif specification, e.g., `"A1-80,10,\0,B5-12"`. | |
50 | | -| `unindex` | `InputSelection?` | Unindexed motif components (unknown sequence placement). | |
51 | | -| `length` | `str?` | Total design length constraint; `"min-max"` or int. | |
52 | | -| `ligand` | `str?` | Ligand(s) by resname or index. | |
53 | | -| `cif_parser_args` | `dict?` | Optional args to CIF loader. | |
54 | | -| `extra` | `dict` | Extra metadata (e.g., logs). | |
55 | | -| `dialect` | `int` | `2`=new (default), `1`=legacy. | |
56 | | -| `select_fixed_atoms` | `InputSelection?` | Atoms with fixed coordinates. | |
57 | | -| `select_unfixed_sequence` | `InputSelection?` | Where sequence can change. | |
58 | | -| `select_buried` / `select_partially_buried` / `select_exposed` | `InputSelection?` | RASA bins 0/1/2 (mutually exclusive). | |
59 | | -| `select_hbond_donor` / `select_hbond_acceptor` | `InputSelection?` | Atom-wise donor/acceptor flags. | |
60 | | -| `select_hotspots` | `InputSelection?` | Atom-level or token-level hotspots. | |
61 | | -| `redesign_motif_sidechains` | `bool` | Fixed backbone, redesigned sidechains for motifs. | |
62 | | -| `symmetry` | `SymmetryConfig?` | See `docs/symmetry.md`. | |
63 | | -| `ori_token` | `list[float]?` | `[x,y,z]` origin override to control COM placement | |
64 | | -| `infer_ori_strategy` | `str?` | `"com"` or `"hotspots"`. | |
65 | | -| `plddt_enhanced` | `bool` | Default `true`. | |
66 | | -| `is_non_loopy` | `bool` | Default `true`. | |
67 | | -| `partial_t` | `float?` | Noise (Å) for partial diffusion, enables partial diffusion | |
| 26 | +Here are some of the inference settings in RFdiffusion3 (RFD3): |
| 27 | +* For inputs of type `InputSelection`, see [The InputSelection mini-language](#the-inputselection-mini-language) for more details. |
| 28 | + |
| 29 | +| Field | Type | Description | |
| 30 | +| -------------------------------------------------------------- | ----------------- | --------------------------------------------------------------------------------------- | |
| 31 | +| `input` | `str` | Path (including file name) to the input **PDB/CIF**. Required if you provide `contig`+`length`. |
| 32 | +| `atom_array_input` | `AtomArray` | Pre-loaded `AtomArray` ([class from Biotite](https://www.biotite-python.org/latest/apidoc/biotite.structure.AtomArray.html)) (not recommended). | |
| 33 | +| `contig` | `InputSelection` | Indexed motif specification, e.g., `"A1-80,10,\0,B5-12"`. See the [`contig` section](#contig) for details. |
| 34 | +| `unindex` | `InputSelection` | Unindexed motif components (unknown sequence placement). Example: `A15-20,B6-10`. <!-- TO DO test out dictionary specification for this--> |
| 35 | +| `length` | `str?` | Total design length constraint; `"min-max"` or int. | |
| 36 | +| `ligand` | `str?` | Ligand(s) by resname or index. | |
| 37 | +| `cif_parser_args` | `dict?` | Optional args to CIF loader. | |
| 38 | +| `extra` | `dict` | Extra metadata (e.g., logs). | |
| 39 | +| `dialect` | `int` | `2`=new (default), `1`=legacy. | |
| 40 | +| `select_fixed_atoms` | `InputSelection?` | Atoms with fixed coordinates. | |
| 41 | +| `select_unfixed_sequence` | `InputSelection?` | Where sequence can change. | |
| 42 | +| `select_buried` / `select_partially_buried` / `select_exposed` | `InputSelection?` | RASA bins 0/1/2 (mutually exclusive). | |
| 43 | +| `select_hbond_donor` / `select_hbond_acceptor` | `InputSelection?` | Atom-wise donor/acceptor flags. | |
| 44 | +| `select_hotspots` | `InputSelection?` | Atom-level or token-level hotspots. | |
| 45 | +| `redesign_motif_sidechains` | `bool` | Fixed backbone, redesigned sidechains for motifs. | |
| 46 | +| `symmetry` | `SymmetryConfig?` | See `docs/symmetry.md`. | |
| 47 | +| `ori_token` | `list[float]?` | `[x,y,z]` origin override to control COM placement | |
| 48 | +| `infer_ori_strategy` | `str?` | `"com"` or `"hotspots"`. | |
| 49 | +| `plddt_enhanced` | `bool` | Default `true`. | |
| 50 | +| `is_non_loopy` | `bool` | Default `true`. | |
| 51 | +| `partial_t` | `float?` | Noise (Å) for partial diffusion, enables partial diffusion | |
68 | 52 |
|
69 | 53 |
|
70 | 54 | ## Quick start |
71 | 55 |
|
72 | | -### Minimal JSON example |
| 56 | +### `contig` |
| 57 | +The contig string is one way to specify which portions of your final structure come from your input PDB/CIF and which are designed by RFD3. Here are a few guidelines for writing a `contig` string: |
| 58 | +- Separate the segments of the string with commas. |
| 59 | +- `\0` denotes a chain break: no peptide bond is specified between the segments before and after it, but the break itself can be as large or as small as makes sense for the rest of the design. |
| 60 | +- Any segment that starts with a letter (e.g. `A1-80`) comes from the input PDB/CIF; the letter is the chain label in that file. |
| 61 | +- Any segment that does **not** start with a letter is designed by RFD3. |
| 62 | +- If a range is specified for a designed segment (e.g., `100-150`), the length of the designed region is sampled uniformly at random from that range, inclusive. |
| 63 | +- Segments appear in the design in the order in which they are written in the `contig` string. |
| 64 | + |
| 65 | +> **Example** |
| 66 | +> |
| 67 | +> `A1-80,10-20,A100-120,B25-50,\0,C43-56,40-60` |
| 68 | +> |
| 69 | +> The resulting design would have: |
| 70 | +> - Residues 1-80 from chain A in the input PDB/CIF |
| 71 | +> - 10 to 20 designed residues that connect to residue A80 |
| 72 | +> - Residues 100-120 from chain A in the input PDB/CIF, connected to the last residue in the designed region |
| 73 | +> - Residues 25-50 from chain B in the input PDB/CIF, connected to A120, even if this connection did not exist in the input PDB/CIF |
| 74 | +> - A chain break |
| 75 | +> - Residues 43-56 from chain C in the input PDB/CIF not connected to the previous chain |
| 76 | +> - 40-60 designed residues that connect to residue C56 |
| 77 | +
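
The guidelines above can be sketched as a small parser. This is only an illustration of the rules as written in this section, not the parser RFD3 uses: the names `Segment`, `parse_contig`, and `sample_lengths` are hypothetical, and the sketch assumes single-letter chain labels and the `\0` chain-break token shown here.

```python
import random
import re
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Segment:
    """One comma-separated piece of a contig string (hypothetical helper type)."""
    kind: str                     # "motif", "designed", or "break"
    chain: Optional[str] = None   # chain label, only for "motif"
    start: Optional[int] = None   # first residue (motif) or minimum length (designed)
    end: Optional[int] = None     # last residue (motif) or maximum length (designed)


# Assumption: chain labels are a single letter, e.g. "A1-80" or "B5".
_MOTIF = re.compile(r"^([A-Za-z])(\d+)(?:-(\d+))?$")
_DESIGNED = re.compile(r"^(\d+)(?:-(\d+))?$")


def parse_contig(contig: str) -> List[Segment]:
    """Split a contig string into motif, designed, and chain-break segments."""
    segments = []
    for part in (p.strip() for p in contig.split(",")):
        if part == r"\0":  # chain break token as written in this document
            segments.append(Segment("break"))
        elif (m := _MOTIF.match(part)):
            start = int(m.group(2))
            end = int(m.group(3) or start)
            segments.append(Segment("motif", m.group(1), start, end))
        elif (m := _DESIGNED.match(part)):
            low = int(m.group(1))
            high = int(m.group(2) or low)
            segments.append(Segment("designed", None, low, high))
        else:
            raise ValueError(f"Unrecognised contig segment: {part!r}")
    return segments


def sample_lengths(segments: List[Segment], rng: random.Random) -> List[int]:
    """Return one length per segment; designed ranges are sampled inclusively."""
    lengths = []
    for seg in segments:
        if seg.kind == "break":
            lengths.append(0)
        elif seg.kind == "motif":
            lengths.append(seg.end - seg.start + 1)
        else:
            lengths.append(rng.randint(seg.start, seg.end))  # both ends inclusive
    return lengths


if __name__ == "__main__":
    # Raw string so that "\0" stays a backslash followed by a zero.
    example = r"A1-80,10-20,A100-120,B25-50,\0,C43-56,40-60"
    segs = parse_contig(example)
    print([s.kind for s in segs])
    print(sample_lengths(segs, random.Random()))
```

Applied to the example contig, the sketch yields four motif segments, two designed segments, and one chain break, in the order in which they are written.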
|
| 78 | +### Input File Types |
| 79 | +For more detailed information about these file types, see {doc}`intro_inference_calculations`. |
| 80 | + |
| 81 | +#### Minimal JSON example |
73 | 82 |
|
74 | 83 | ```json |
75 | 84 | { |
76 | | - "": { |
| 85 | + "calculation_label": { |
77 | 86 | "input": "path/to/template.pdb", |
78 | 87 | "contig": "A1-80", |
79 | 88 | "length": "150-180", |
80 | 89 | "select_fixed_atoms": true, |
81 | 90 | "select_unfixed_sequence": "A20-35", |
82 | 91 | "ligand": "HAX,OAA", |
83 | 92 | "dialect": 2 |
84 | | - } |
| 93 | + } |
85 | 94 | } |
86 | 95 | ``` |
87 | | -### Mininmal YAML example |
88 | | -``` |
89 | | -input: path/to/template.pdb |
90 | | -contig: A1-80 |
91 | | -length: 150-180 |
92 | | -select_fixed_atoms: true |
93 | | -select_unfixed_sequence: A20-35 |
94 | | -ligand: HAX,OAA |
95 | | -dialect: 2 |
96 | 96 |
|
| 97 | +#### Minimal YAML example |
| 98 | +```yaml |
| 99 | +calculation_label: |
| 100 | + input: path/to/template.pdb |
| 101 | + contig: A1-80 |
| 102 | + length: 150-180 |
| 103 | + select_fixed_atoms: true |
| 104 | + select_unfixed_sequence: A20-35 |
| 105 | + ligand: HAX,OAA |
| 106 | + dialect: 2 |
97 | 107 | ``` |
98 | 108 |
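
If you prefer to generate these files from a script, the same specification can be built as a plain dictionary and written out with the standard library. This is a minimal sketch: `check_entry` and `write_spec` are hypothetical helpers, not part of RFD3, and the only rule checked is the one stated in the table above (`input` is required when `contig` and `length` are both given).

```python
import json
from pathlib import Path

# Same fields as the minimal JSON/YAML examples; the label is a free-form key.
spec = {
    "calculation_label": {
        "input": "path/to/template.pdb",
        "contig": "A1-80",
        "length": "150-180",
        "select_fixed_atoms": True,
        "select_unfixed_sequence": "A20-35",
        "ligand": "HAX,OAA",
        "dialect": 2,
    }
}


def check_entry(label: str, entry: dict) -> None:
    """Hypothetical sanity check mirroring one rule from the field table."""
    if "contig" in entry and "length" in entry and "input" not in entry:
        raise ValueError(
            f"{label}: 'input' is required when 'contig' and 'length' are given"
        )


def write_spec(spec: dict, path: Path) -> Path:
    """Check every labelled entry, then write the specification as JSON."""
    for label, entry in spec.items():
        check_entry(label, entry)
    path.write_text(json.dumps(spec, indent=2))
    return path


if __name__ == "__main__":
    out = write_spec(spec, Path("design_spec.json"))
    print(out.read_text())
```

Python's `True` is serialised as JSON `true`, so the written file matches the hand-written example.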
|
99 | 109 | ### Python API |
|