
Commit e9d3305 (parent cc33a7e): docs(rule-system): Add comprehensive documentation for the DREIDING rule system and its structure


`docs/05_rule_system.md`: 134 additions, 0 deletions
# Reference: The DREIDING Rule System

After perception, every atom in an `AnnotatedMolecule` carries rich metadata: element, degree, lone pairs, hybridization, aromaticity, resonance state, and the smallest ring it participates in. The typing engine does not hard-code chemistry; instead, it evaluates TOML rules that describe how those annotations map to DREIDING atom types. This document is the complete guide to that rule layer.

## Rule Structure

Rules are declared as `[[rule]]` tables inside a TOML file. Each rule must provide four keys:

- `name` (`string`): descriptive identifier surfaced in diagnostics.
- `priority` (`integer`): conflict resolver; **larger values win** if multiple rules match an atom during the same iteration.
- `type` (`string`): the DREIDING atom type emitted when the rule fires.
- `conditions` (`table`): property checks an atom must satisfy. All listed checks must pass.

Example:

```toml
[[rule]]
name = "N_Trigonal_SP2"
priority = 200
type = "N_2"
conditions = { element = "N", hybridization = "SP2" }
```

At runtime, `typing::rules::parse_rules` converts the TOML into strongly typed `Rule` structures. `typing::rules::get_default_rules` lazily parses the embedded `resources/default.rules.toml`, so applications can either use the canonical ruleset directly or append their own entries before starting the typing engine.

## Available Conditions

Conditions operate on the immutable snapshot of an `AnnotatedAtom`. Because perception already computed lone pairs, hybridization, resonance, and ring membership, rules can simply read fields and avoid bespoke chemistry code. Every key is optional; omitting a key turns it into a wildcard.

The following table details every valid key that can be used inside the `conditions` table.

| Key | Type | Description |
| --- | --- | --- |
| **Atom-Intrinsic Properties** | | Properties derived from the atom itself. |
| `element` | String | The atom's element symbol (e.g., `"C"`, `"Na"`). Must be a valid element symbol. |
| `formal_charge` | Integer | The formal charge of the atom (e.g., `1`, `0`, `-1`). |
| `degree` | Integer | The number of directly bonded neighbor atoms. |
| `lone_pairs` | Integer | The number of lone electron pairs, as calculated during the Perception Phase. |
| `hybridization` | String | The perceived hybridization state. Valid values: `"SP"`, `"SP2"`, `"SP3"`, `"Resonant"`, `"None"`. |
| `is_in_ring` | Boolean | `true` if the atom is part of any detected ring system. |
| `is_aromatic` | Boolean | `true` if the atom is part of a perceived aromatic system. |
| `is_anti_aromatic` | Boolean | `true` if perception tagged the atom as belonging to an anti-aromatic ring. |
| `is_resonant` | Boolean | `true` if resonance analysis marked the atom as delocalized (e.g., phenoxide oxygen). |
| **Neighbor-Based Properties** | | Properties derived from the atom's immediate neighbors. |
| `neighbor_elements` | Table | Specifies the **exact counts** of neighboring elements. Elements not listed are assumed to have a count of zero. |
| `neighbor_types` | Table | Specifies the **exact counts** of the **final assigned types** of neighboring atoms. This is the key condition that enables context-dependent, iterative typing. |
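
The wildcard behaviour described above can be modeled by storing each condition as an `Option`: an omitted key is `None` and always passes, while a present key must equal the perceived value. The sketch below is a hypothetical, std-only illustration covering only a handful of the intrinsic keys; `Conditions`, `AtomSnapshot`, `check`, and the `Hybridization` variant names are illustrative assumptions, not `dreid_typer`'s actual types.

```rust
/// Subset of the perceived hybridization states (illustrative names only).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Hybridization {
    Sp,
    Sp2,
    Sp3,
    Resonant,
}

/// Hypothetical snapshot of a few perceived atom properties.
struct AtomSnapshot {
    element: &'static str,
    formal_charge: i8,
    degree: u8,
    hybridization: Hybridization,
    is_aromatic: bool,
}

/// Hypothetical `conditions` table: `None` means the key was omitted (wildcard).
#[derive(Default)]
struct Conditions {
    element: Option<&'static str>,
    formal_charge: Option<i8>,
    degree: Option<u8>,
    hybridization: Option<Hybridization>,
    is_aromatic: Option<bool>,
}

/// One check: an omitted key always passes; a present key must match exactly.
fn check<T: PartialEq>(wanted: Option<T>, actual: T) -> bool {
    wanted.map_or(true, |w| w == actual)
}

impl Conditions {
    /// All listed checks must pass (logical AND); omitted keys are ignored.
    fn matches(&self, atom: &AtomSnapshot) -> bool {
        check(self.element, atom.element)
            && check(self.formal_charge, atom.formal_charge)
            && check(self.degree, atom.degree)
            && check(self.hybridization, atom.hybridization)
            && check(self.is_aromatic, atom.is_aromatic)
    }
}
```

In this model, the `N_Trigonal_SP2` conditions from the rule example correspond to `Conditions { element: Some("N"), hybridization: Some(Hybridization::Sp2), ..Default::default() }`, which accepts an sp² nitrogen of any degree or charge and rejects an sp³ one, while an empty `Conditions::default()` matches every atom.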

**Example of `neighbor_elements`:**
The following condition matches a hydrogen atom bonded to exactly two boron atoms (as in diborane).

```toml
conditions = { element = "H", degree = 2, neighbor_elements = { B = 2 } }
```

**Example of `neighbor_types`:**
This condition would match a carbon atom bonded to exactly one `C_3` atom and three `H_` atoms.

```toml
conditions = { element = "C", neighbor_types = { "C_3" = 1, "H_" = 3 } }
```
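
Both neighbor tables use exact-count semantics: listed labels must occur exactly that many times, and unlisted labels must not occur at all. A minimal std-only sketch of that comparison follows, assuming the neighbors are available as a list of element symbols (for `neighbor_elements`) or assigned type labels (for `neighbor_types`); `exact_counts_match` is a hypothetical helper, not part of `dreid_typer`.

```rust
use std::collections::HashMap;

/// Hypothetical helper: exact-count comparison of neighbor labels.
/// Every listed label must occur exactly `count` times, and any label not
/// listed must not occur at all (an explicit count of 0 equals omitting it).
fn exact_counts_match<'a>(required: &[(&'a str, usize)], neighbor_labels: &[&'a str]) -> bool {
    // Tally what the atom's neighbors actually are.
    let mut seen: HashMap<&'a str, usize> = HashMap::new();
    for &label in neighbor_labels {
        *seen.entry(label).or_insert(0) += 1;
    }
    // Build the required tally, dropping zero counts so they behave like omitted keys.
    let wanted: HashMap<&'a str, usize> = required
        .iter()
        .copied()
        .filter(|&(_, count)| count > 0)
        .collect();
    // Equal maps mean no missing, extra, or miscounted labels.
    seen == wanted
}
```

With this reading, a carbon whose neighbors are `C_3, H_, H_, H_` satisfies `{ "C_3" = 1, "H_" = 3 }`, but the same carbon fails `{ "H_" = 3 }` because its unlisted `C_3` neighbor counts against it.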

### The Role of `priority` and `neighbor_types`

- **Priority:** The `priority` key is the sole mechanism for resolving conflicts. When an atom matches several rules in the same iteration, the rule with the highest `priority` value is chosen.
- **Iteration trigger:** `neighbor_types` conditions refer to the types already assigned to neighboring atoms, so a rule that uses them cannot fire until those neighbors have been typed, and early rounds may skip it. The engine keeps iterating, feeding newly assigned types back into the graph, until no atom's type changes. See [Typing Engine](./03_typing_engine.md) for the convergence strategy.
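
The interaction between priority and iteration can be illustrated with a toy fixed-point loop. This is a simplified sketch under stated assumptions, not the engine's actual convergence strategy: `ToyRule`, `ToyAtom`, `assign_types`, and `methanol` are hypothetical; each round re-types every atom from the previous round's snapshot; a `neighbor_types` condition simply fails while any neighbor is untyped; and the loop stops when a round changes nothing or a round budget runs out.

```rust
use std::collections::HashMap;

/// Hypothetical, heavily simplified rule: `element` plus an optional
/// `neighbor_types` condition with exact counts. Not dreid_typer's `Rule`.
struct ToyRule {
    priority: i32,
    atom_type: &'static str,
    element: &'static str,
    neighbor_types: Option<Vec<(&'static str, usize)>>,
}

/// Hypothetical atom: its element and the indices of its bonded neighbors.
struct ToyAtom {
    element: &'static str,
    neighbors: Vec<usize>,
}

/// Exact-count check against the previous round's types.
/// Any untyped neighbor makes the condition fail for this round.
fn neighbor_types_match(
    required: &[(&'static str, usize)],
    atom: &ToyAtom,
    types: &[Option<&'static str>],
) -> bool {
    let mut seen: HashMap<&'static str, usize> = HashMap::new();
    for &n in &atom.neighbors {
        match types[n] {
            Some(t) => *seen.entry(t).or_insert(0) += 1,
            None => return false,
        }
    }
    let wanted: HashMap<&'static str, usize> =
        required.iter().copied().filter(|&(_, c)| c > 0).collect();
    seen == wanted
}

/// Re-types every atom until a round changes nothing; `None` if `max_rounds` is exhausted.
fn assign_types(
    atoms: &[ToyAtom],
    rules: &[ToyRule],
    max_rounds: usize,
) -> Option<Vec<Option<&'static str>>> {
    let mut types: Vec<Option<&'static str>> = vec![None; atoms.len()];
    for _ in 0..max_rounds {
        let next: Vec<Option<&'static str>> = atoms
            .iter()
            .map(|atom| {
                rules
                    .iter()
                    .filter(|r| r.element == atom.element)
                    .filter(|r| match &r.neighbor_types {
                        None => true,
                        Some(req) => neighbor_types_match(req, atom, &types),
                    })
                    // Highest priority wins; on ties, `max_by_key` keeps the last rule.
                    .max_by_key(|r| r.priority)
                    .map(|r| r.atom_type)
            })
            .collect();
        if next == types {
            return Some(types); // fixed point reached
        }
        types = next;
    }
    None // did not converge within the round budget
}

/// Methanol, CH3-OH: atom 0 = C, 1 = O, 2..=4 = H on C, 5 = H on O.
fn methanol() -> (Vec<ToyAtom>, Vec<ToyRule>) {
    let atoms = vec![
        ToyAtom { element: "C", neighbors: vec![1, 2, 3, 4] },
        ToyAtom { element: "O", neighbors: vec![0, 5] },
        ToyAtom { element: "H", neighbors: vec![0] },
        ToyAtom { element: "H", neighbors: vec![0] },
        ToyAtom { element: "H", neighbors: vec![0] },
        ToyAtom { element: "H", neighbors: vec![1] },
    ];
    let rules = vec![
        ToyRule { priority: 100, atom_type: "C_3", element: "C", neighbor_types: None },
        ToyRule { priority: 100, atom_type: "O_3", element: "O", neighbor_types: None },
        ToyRule { priority: 1, atom_type: "H_", element: "H", neighbor_types: None },
        // Hypothetical neighbor-dependent rule: H bonded to exactly one O_3.
        ToyRule { priority: 80, atom_type: "H_HB", element: "H", neighbor_types: Some(vec![("O_3", 1)]) },
    ];
    (atoms, rules)
}
```

On methanol, the hydroxyl hydrogen is `H_` after the first round because its oxygen is still untyped; in the second round the oxygen is `O_3`, so the priority-80 rule outranks `H_` and the hydrogen becomes `H_HB`; the third round changes nothing and the loop returns. The round budget guards against rulesets whose `neighbor_types` conditions keep flipping each other.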

## Default Ruleset Philosophy and Key Atom Types

`resources/default.rules.toml` follows the original DREIDING atom types while taking advantage of the richer perception data. Its priorities are deliberately layered:

1. **500+** – Special-case overrides (e.g., diborane bridging hydrogens).
2. **400s** – Delocalized or aromatic atoms (`*_R`, resonance-stabilized heteroatoms) that must outrank geometry-only rules.
3. **100–300** – Hybridization-driven workhorses keyed off `hybridization` (`SP3` at 100, `SP2` at 200, `SP` at 300).
4. **<100** – Simple fallbacks such as halogens, alkali/alkaline-earth metals, hydrogen-bonding hydrogens, and default hydrogens.

Representative entries from the default ruleset:

| Atom Type | DREIDING Description | Key Rule Condition(s) in `default.rules.toml` | Priority |
| :--------------------- | :---------------------------- | :---------------------------------------------------------------- | :------: |
| `H_` | Standard Hydrogen | `{ element = "H" }` | 1 |
| `H_HB` | Hydrogen-Bonding Hydrogen | `{ element = "H", neighbor_elements = { O = 1 } }` or `{ N = 1 }` | 78–80 |
| `H_b` | Bridging Hydrogen (Diborane) | `{ element = "H", degree = 2, neighbor_elements = { B = 2 } }` | 500 |
| `C_3` | sp³ Tetrahedral Carbon | `{ element = "C", hybridization = "SP3" }` | 100 |
| `C_2` | sp² Trigonal Carbon | `{ element = "C", hybridization = "SP2" }` | 200 |
| `C_1` | sp Linear Carbon | `{ element = "C", hybridization = "SP" }` | 300 |
| `C_R` | Resonant/Aromatic Carbon | `{ element = "C", hybridization = "Resonant" }` | 400 |
| `N_3` | sp³ Nitrogen (Amine/Ammonium) | `{ element = "N", hybridization = "SP3" }` | 100 |
| `N_2` | sp² Nitrogen (Imine/Amide) | `{ element = "N", hybridization = "SP2" }` | 200 |
| `N_1` | sp Linear Nitrogen (Nitrile) | `{ element = "N", hybridization = "SP" }` | 300 |
| `N_R` | Resonant/Aromatic Nitrogen | `{ element = "N", hybridization = "Resonant" }` | 400 |
| `O_3` | sp³ Oxygen (Ether/Alcohol) | `{ element = "O", hybridization = "SP3" }` | 100 |
| `O_2` | sp² Oxygen (Carbonyl) | `{ element = "O", hybridization = "SP2" }` | 200 |
| `O_R` | Resonant Oxygen (Phenol) | `{ element = "O", hybridization = "Resonant" }` | 400 |
| `S_3` | sp³ Sulfur (Thiol/Sulfide) | `{ element = "S", hybridization = "SP3" }` | 100 |
| `S_2` | sp² Sulfur (Thioketone) | `{ element = "S", hybridization = "SP2" }` | 200 |
| `S_R` | Resonant Sulfur (Thiophene) | `{ element = "S", hybridization = "Resonant" }` | 400 |
| `P_3` | sp³ Phosphorus (Phosphate) | `{ element = "P", hybridization = "SP3" }` | 100 |
| `F_`, `Cl`, `Br`, `I_` | Halogens | `{ element = "F" }`, etc. | 50 |
| `Na`, `Ca`, etc. | Metal Ions | `{ element = "Na" }`, etc. | 20 |
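
The layering means a generic fallback such as `H_` (priority 1) still matches specialized hydrogens; the specialized rule wins only because its priority is higher. The hypothetical std-only sketch below uses just the hydrogen rows of the table above, with conditions written as plain Rust predicates over neighbor element symbols rather than parsed TOML. `HydrogenRule`, `hydrogen_rules`, and `type_hydrogen` are illustrative names, and giving the oxygen variant of `H_HB` priority 80 and the nitrogen variant 78 is an assumption within the documented 78–80 range.

```rust
/// Hypothetical hydrogen-only rule transcribed from the table above.
/// `applies` receives the element symbols of the hydrogen's neighbors.
struct HydrogenRule {
    atom_type: &'static str,
    priority: i32,
    applies: fn(&[&str]) -> bool,
}

/// Exact-count `{ X = 1 }` for a hydrogen: exactly one neighbor, and it is `element`.
fn only(neighbors: &[&str], element: &str) -> bool {
    neighbors.len() == 1 && neighbors[0] == element
}

fn hydrogen_rules() -> Vec<HydrogenRule> {
    vec![
        // Fallback: matches every hydrogen.
        HydrogenRule { atom_type: "H_", priority: 1, applies: |_| true },
        // O/N priority split is an assumption within the documented 78–80 range.
        HydrogenRule { atom_type: "H_HB", priority: 80, applies: |n| only(n, "O") },
        HydrogenRule { atom_type: "H_HB", priority: 78, applies: |n| only(n, "N") },
        // degree = 2 and neighbor_elements = { B = 2 }.
        HydrogenRule { atom_type: "H_b", priority: 500, applies: |n| n.len() == 2 && n.iter().all(|&e| e == "B") },
    ]
}

/// Picks the highest-priority matching rule, mirroring the conflict resolution above.
fn type_hydrogen(neighbors: &[&str]) -> Option<&'static str> {
    hydrogen_rules()
        .into_iter()
        .filter(|r| (r.applies)(neighbors))
        .max_by_key(|r| r.priority)
        .map(|r| r.atom_type)
}
```

Here `type_hydrogen(&["O"])` yields `Some("H_HB")` because both `H_` and `H_HB` match and 80 > 1, while a diborane bridge (`&["B", "B"]`) resolves to `H_b` at priority 500.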

## How to Extend the Rule System

Customizing typing means editing TOML, not Rust. Typical workflow:

1. **Author a TOML snippet** (e.g., `my_copper_rules.toml`).
2. **Pick priorities deliberately.** Because the highest-priority match wins, a rule meant to override a default must carry a higher priority than that default (for example, above 400 to outrank the resonant `*_R` rules). A rule that only adds a new case, like the copper ion below, can use the tier of its closest default counterpart (here, the metal-ion priority of 20).
3. **Load rules at runtime.** Parse the TOML with `dreid_typer::rules::parse_rules` and pass the resulting slice into `assign_topology_with_rules`. (If you want to extend the canonical DREIDING file, copy `resources/default.rules.toml` into your project, edit it, and parse that content before appending your custom entries.)

```toml
# my_copper_rules.toml
[[rule]]
name = "Ion_Cu_Divalent"
priority = 20
type = "Cu+2"
conditions = { element = "Cu", formal_charge = 2 }
```

```rust
use dreid_typer::{
    assign_topology_with_rules,
    rules::{get_default_rules, parse_rules},
    MolecularGraph, MolecularTopology, TyperError,
};

fn type_with_custom_rules(graph: &MolecularGraph) -> Result<MolecularTopology, TyperError> {
    // Load custom rules from TOML embedded at compile time
    // (the path is resolved relative to this source file).
    let custom_rules = parse_rules(include_str!("my_copper_rules.toml"))?;

    // Start from the default rules and append the custom ones.
    let mut all_rules = get_default_rules().to_vec();
    all_rules.extend(custom_rules);

    // Run the pipeline with the combined ruleset.
    assign_topology_with_rules(graph, &all_rules)
}
```

Because the engine merely consumes structured data, you can version-control TOML files, generate them from other toolchains, or even ship different rulesets for different force fields, all without recompiling `dreid-typer`. Note that `include_str!` embeds the file when your own crate is compiled; to swap rulesets without rebuilding your application, read the TOML at runtime (e.g., with `std::fs::read_to_string`) and pass the string to `parse_rules`.
