
Commit 0263924

Merge pull request #14 from caltechmsc/feature/13-decouple-bond-order-logic-and-implement-strict-resonance-perception
refactor(core): Decouple Graph and Topology Bond Orders and Enforce Strict Resonance
2 parents dd01bc3 + 573c9a5 commit 0263924

28 files changed: +8297 -3259 lines changed

Cargo.toml

Lines changed: 1 addition & 2 deletions
@@ -1,6 +1,6 @@
 [package]
 name = "dreid-typer"
-version = "0.2.1"
+version = "0.3.0"
 authors = [
     "Tony Kan <tonykan@caltech.edu>",
     "William A. Goddard III <wag@caltech.edu>",
@@ -23,7 +23,6 @@ readme = "README.md"
 thiserror = "2.0.17"
 toml = "0.9.7"
 serde = { version = "1.0.188", features = ["derive"] }
-pauling = "0.1.0"

 [lib]
 name = "dreid_typer"

README.md

Lines changed: 18 additions & 14 deletions
@@ -22,27 +22,27 @@ Add the crate to your `Cargo.toml`:

 ```toml
 [dependencies]
-dreid-typer = "0.2.1"
+dreid-typer = "0.3.0"
 ```

 Run the full pipeline from connectivity to topology:

 ```rust
-use dreid_typer::{assign_topology, Element, MolecularGraph, MolecularTopology, BondOrder};
+use dreid_typer::{assign_topology, Element, GraphBondOrder, MolecularGraph, MolecularTopology};

 let mut graph = MolecularGraph::new();
 let c1 = graph.add_atom(Element::C);
 let c2 = graph.add_atom(Element::C);
 let o = graph.add_atom(Element::O);
 let h_o = graph.add_atom(Element::H);
-let h_atoms: Vec<_> = (0..6).map(|_| graph.add_atom(Element::H)).collect();
+let h_atoms: Vec<_> = (0..5).map(|_| graph.add_atom(Element::H)).collect();

-graph.add_bond(c1, c2, BondOrder::Single).unwrap();
-graph.add_bond(c2, o, BondOrder::Single).unwrap();
-graph.add_bond(o, h_o, BondOrder::Single).unwrap();
-for (carbon, chunk) in [c1, c2].into_iter().zip(h_atoms.chunks(3)) {
+graph.add_bond(c1, c2, GraphBondOrder::Single).unwrap();
+graph.add_bond(c2, o, GraphBondOrder::Single).unwrap();
+graph.add_bond(o, h_o, GraphBondOrder::Single).unwrap();
+for (carbon, chunk) in [(c1, &h_atoms[0..3]), (c2, &h_atoms[3..5])].into_iter() {
     for &hydrogen in chunk {
-        graph.add_bond(carbon, hydrogen, BondOrder::Single).unwrap();
+        graph.add_bond(carbon, hydrogen, GraphBondOrder::Single).unwrap();
     }
 }

@@ -54,16 +54,20 @@ assert_eq!(topology.atoms[o].atom_type, "O_3");
 assert_eq!(topology.atoms[h_o].atom_type, "H_HB");
 ```

-Need custom chemistry? Parse a TOML file and run the same API:
+Need custom chemistry? Parse a TOML file and extend the default rules:

 ```rust
-use dreid_typer::{assign_topology_with_rules, rules, MolecularGraph};
+use dreid_typer::{assign_topology_with_rules, rules::{get_default_rules, parse_rules}, MolecularGraph};

-let mut ruleset = rules::get_default_rules().to_vec();
-let extra = std::fs::read_to_string("my_metals.rules.toml")?;
-ruleset.extend(rules::parse_rules(&extra)?);
+// Start with the default DREIDING rules
+let mut all_rules = get_default_rules().to_vec();

-let topology = assign_topology_with_rules(&graph, &ruleset)?;
+// Parse and append custom rules from a TOML file
+let extra_toml = std::fs::read_to_string("my_metals.rules.toml")?;
+all_rules.extend(parse_rules(&extra_toml)?);
+
+// Run the pipeline with extended rules
+let topology = assign_topology_with_rules(&graph, &all_rules)?;
 ```

 ## Documentation

docs/01_pipeline.md

Lines changed: 1 addition & 1 deletion
@@ -36,7 +36,7 @@ Once a `MolecularGraph` enters the pipeline, it is immediately converted into an
   - Intrinsic properties (`element`, `formal_charge`).
   - Topological properties (`degree`, `is_in_ring`, `smallest_ring_size`).
   - Electronic properties (`lone_pairs`, `steric_number`, `hybridization`).
-  - Special properties (`is_aromatic`, `perception_source`).
+  - Aromaticity and resonance flags (`is_aromatic`, `is_anti_aromatic`, `is_resonant`).
   - An adjacency list for efficient neighbor traversal.
 - **Design Rationale:**
   - **Centralized Knowledge:** By pre-calculating and storing all relevant properties in one place, the subsequent typing and building phases can be implemented as efficient, stateless queries against this data structure. This avoids redundant calculations.
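To make the "centralized knowledge" idea concrete, here is a minimal Rust sketch of a per-atom record that carries the properties listed in this hunk, plus one stateless query over it. The field names mirror the documentation; the field types, the `neighbors` index list, and the helper `count_neighbors_of` are assumptions made for illustration and are not taken from the crate's source.

```rust
/// Hybridization states named in the docs ("SP", "SP2", "SP3", "Resonant", "None").
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Hybridization {
    SP,
    SP2,
    SP3,
    Resonant,
    None,
}

/// Illustrative per-atom record. Field names follow the documentation;
/// the concrete types are assumptions for this sketch.
#[derive(Debug, Clone)]
pub struct AnnotatedAtom {
    // Intrinsic properties
    pub element: String,
    pub formal_charge: i8,
    // Topological properties
    pub degree: u8,
    pub is_in_ring: bool,
    pub smallest_ring_size: Option<u8>,
    // Electronic properties
    pub lone_pairs: u8,
    pub steric_number: u8,
    pub hybridization: Hybridization,
    // Aromaticity and resonance flags
    pub is_aromatic: bool,
    pub is_anti_aromatic: bool,
    pub is_resonant: bool,
    // Adjacency list: indices of bonded atoms (hypothetical representation)
    pub neighbors: Vec<usize>,
}

/// A stateless query: how many neighbors of atom `idx` are of `element`?
/// It only reads precomputed data and recomputes nothing.
pub fn count_neighbors_of(atoms: &[AnnotatedAtom], idx: usize, element: &str) -> usize {
    atoms[idx]
        .neighbors
        .iter()
        .filter(|&&n| atoms[n].element == element)
        .count()
}
```

Because every property is stored up front, later phases can be written as pure functions over `&[AnnotatedAtom]`, which is the design rationale the list describes.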

docs/02_perception.md

Lines changed: 5 additions & 8 deletions
@@ -57,20 +57,17 @@ Each pass mutates the shared `AnnotatedMolecule`. Later stages can rely on the i
 ## 5. Resonance – `resonance::perceive`

 - **Goal:** Mark atoms that participate in conjugated systems, even when they are not part of a strictly aromatic ring.
-- **How it works:** The pass delegates to the external `pauling` crate to discover resonance systems, then overlays project-specific heuristics:
-  - Aromatic atoms are always marked conjugated.
-  - Amide/thioamide and sulfonamide motifs promote their heteroatom donors into conjugation when lone pairs are available.
-  - Hypervalent halogen oxyanions have their terminal oxygens demoted to avoid false conjugation.
-  - Purely σ-bound sulfurs that slipped through the previous steps are also demoted.
-- **Why it matters:** Conjugation flags feed hybridization inference and help the typing engine distinguish resonant atoms from plain sp² centers.
+- **How it works:** The pass uses strict substructure matching to detect chemically significant resonance motifs. It operates in two phases:
+  1. **Core functional group detection:** Pattern recognizers identify carboxylates, nitro groups, guanidinium ions, thiourea/thioamide fragments, amides, and phosphate groups. When a motif is found, all participating atoms are flagged as resonant, and the system (atoms + bonds) is recorded for later topology emission.
+  2. **Peripheral propagation:** Heteroatoms (O, N, S) with lone pairs that are adjacent to already-resonant atoms are themselves promoted to resonant.
+- **Why it matters:** Conjugation flags feed hybridization inference and help the typing engine distinguish resonant atoms from plain sp² centers. The recorded resonance systems inform the builder phase which bonds should receive the resonant bond order.

 ## 6. Hybridization – `hybridization::perceive`

 - **Goal:** Assign the final `Hybridization` enum and normalized `steric_number` for every atom.
 - **How it works:** For each atom:
   - Elements that never hybridize (alkali metals, halogens, most transition metals) are stamped as `Hybridization::None`.
   - Conjugated atoms that are not anti-aromatic collapse to `Hybridization::Resonant`, even when their raw steric number is four (lone-pair donation collapses the geometry to trigonal).
-  - Aromatic atoms default to `Hybridization::SP2`.
   - Remaining atoms fall back to VSEPR rules derived from `degree + lone_pairs`.
   - The stored `steric_number` is renormalized so downstream consumers can rely on 2/3/4 despite resonance collapsing a formal 4 to 3.
 - **Why it matters:** The typing rules operate primarily on the `hybridization`, aromatic flags, and neighbor information produced by this pass. The builder also copies the final hybridization into the emitted topology.
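The post-change decision order in the Hybridization section (non-hybridizing elements first, then resonant atoms, then the VSEPR fallback) can be sketched as a small pure function. This is an illustration under stated assumptions, not the crate's implementation: the `AtomInput` struct, the short `NON_HYBRIDIZING` list, the function name, and the handling of steric numbers outside 2 to 4 are all hypothetical.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Hybridization {
    SP,
    SP2,
    SP3,
    Resonant,
    None,
}

/// Hypothetical input record holding only what the decision needs.
pub struct AtomInput<'a> {
    pub element: &'a str,
    pub degree: u8,
    pub lone_pairs: u8,
    pub is_resonant: bool,
    pub is_anti_aromatic: bool,
}

/// Returns `(hybridization, normalized steric number)`.
pub fn assign_hybridization(atom: &AtomInput) -> (Hybridization, u8) {
    // Assumption: a tiny illustrative subset of "elements that never hybridize".
    const NON_HYBRIDIZING: [&str; 6] = ["Na", "K", "F", "Cl", "Br", "I"];

    let raw_steric = atom.degree + atom.lone_pairs;

    // 1. Alkali metals, halogens, etc. are stamped as `None`.
    if NON_HYBRIDIZING.contains(&atom.element) {
        return (Hybridization::None, raw_steric);
    }

    // 2. Resonant, non-anti-aromatic atoms collapse to `Resonant`;
    //    a formal steric number of 4 is renormalized to 3.
    if atom.is_resonant && !atom.is_anti_aromatic {
        return (Hybridization::Resonant, raw_steric.min(3));
    }

    // 3. Everything else falls back to VSEPR on `degree + lone_pairs`.
    match raw_steric {
        4 => (Hybridization::SP3, 4),
        3 => (Hybridization::SP2, 3),
        2 => (Hybridization::SP, 2),
        // Assumption: anything outside 2..=4 is left unhybridized in this sketch.
        other => (Hybridization::None, other),
    }
}
```

For an amide-like nitrogen (`degree = 3`, `lone_pairs = 1`, `is_resonant = true`) this returns `Resonant` with a steric number of 3, which is the "formal 4 collapses to 3" case described above.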
@@ -81,7 +78,7 @@ By the end of chemical perception every `AnnotatedAtom` contains:

 - identity (`element`, `id`, `degree`)
 - ring context (`is_in_ring`, `smallest_ring_size`)
-- electronic structure (`formal_charge`, `lone_pairs`, `is_resonant`, `is_in_conjugated_system`)
+- electronic structure (`formal_charge`, `lone_pairs`, `is_resonant`)
 - aromaticity flags (`is_aromatic`, `is_anti_aromatic`)
 - geometry (`hybridization`, normalized `steric_number`)
- geometry (`hybridization`, normalized `steric_number`)
8784

docs/03_typing_engine.md

Lines changed: 5 additions & 5 deletions
@@ -16,12 +16,12 @@ Every rule declares:

 ```toml
 [[rule]]
-name = "N_aromatic"
+name = "N_Resonant"
 priority = 400
 type = "N_R"
 [rule.conditions]
 element = "N"
-is_aromatic = true
+hybridization = "Resonant"
 ```

 Key aspects used by the engine:
@@ -65,9 +65,9 @@ Any failed check short-circuits the rest; only atoms meeting _all_ specified con
 ## Worked Example: Ethanol (`CH3-CH2-OH`)

 1. **Round 1:**
-   - Carbons satisfy the `C_3` rule (`steric_number = 4`) with priority 100.
-   - The oxygen matches `O_3` (same priority), picking up `Hybridization::SP3` and `type = "O_3"`.
-   - The hydroxyl hydrogen matches `H_HB` (priority ~250) because its neighbor elements histogram includes one oxygen.
+   - Carbons satisfy the `C_3` rule (`hybridization = "SP3"`) with priority 100.
+   - The oxygen matches `O_3` (same priority) with `hybridization = "SP3"`.
+   - The hydroxyl hydrogen matches `H_HB` (priority 80) because its neighbor elements histogram includes one oxygen.
    - Remaining hydrogens fall back to the generic `H_` rule (priority 1).
 2. **Round 2:** re-evaluating atoms discovers no higher-priority matches, so the engine exits with the types assigned in the previous round.
2. **Round 2:** re-evaluating atoms discovers no higher-priority matches, so the engine exits with the types assigned in the previous round.
7373

docs/05_rule_system.md

Lines changed: 32 additions & 30 deletions
@@ -18,7 +18,7 @@ Example:
 name = "N_Trigonal_SP2"
 priority = 200
 type = "N_2"
-conditions = { element = "N", steric_number = 3, is_aromatic = false }
+conditions = { element = "N", hybridization = "SP2" }
 ```

 At runtime, `typing::rules::parse_rules` converts the TOML into strongly typed `Rule` structures. `typing::rules::get_default_rules` lazily parses the embedded `resources/default.rules.toml`, so applications can either use the canonical ruleset directly or append their own entries before starting the typing engine.
@@ -36,13 +36,11 @@ The following table details every valid key that can be used inside the `conditi
 | `formal_charge` | Integer | The formal charge of the atom (e.g., `1`, `0`, `-1`). |
 | `degree` | Integer | The number of directly bonded neighbor atoms. |
 | `lone_pairs` | Integer | The number of lone electron pairs, as calculated during the Perception Phase. |
-| `steric_number` | Integer | The sum of `degree` and `lone_pairs`. A primary indicator of geometry. |
 | `hybridization` | String | The perceived hybridization state. Valid values: `"SP"`, `"SP2"`, `"SP3"`, `"Resonant"`, `"None"`. |
 | `is_in_ring` | Boolean | `true` if the atom is part of any detected ring system. |
 | `is_aromatic` | Boolean | `true` if the atom is part of a perceived aromatic system. |
 | `is_anti_aromatic` | Boolean | `true` if perception tagged the atom as belonging to an anti-aromatic ring. |
 | `is_resonant` | Boolean | `true` if resonance analysis marked the atom as delocalized (e.g., phenoxide oxygen). |
-| `smallest_ring_size` | Integer | The size of the smallest ring the atom belongs to (e.g., `5` for furan). |
 | **Neighbor-Based Properties** | | Properties derived from the atom's immediate neighbors. |
 | `neighbor_elements` | Table | Specifies the **exact counts** of neighboring elements. Atoms not listed are assumed to be zero. |
 | `neighbor_types` | Table | Specifies the **exact counts** of the **final assigned types** of neighboring atoms. This is the key condition that enables context-dependent, iterative typing. |
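The "exact counts" semantics of `neighbor_elements` (and "atoms not listed are assumed to be zero") can be read as a histogram equality check. The following is a minimal sketch of that reading; the function names and the use of `HashMap<String, usize>` are assumptions, and the crate may represent the condition differently.

```rust
use std::collections::HashMap;

/// Build an element -> count histogram from a neighbor list.
pub fn neighbor_histogram(neighbors: &[&str]) -> HashMap<String, usize> {
    let mut histogram = HashMap::new();
    for &element in neighbors {
        *histogram.entry(element.to_string()).or_insert(0) += 1;
    }
    histogram
}

/// Exact-count match: every listed element must appear exactly the
/// required number of times, and every unlisted element must be absent.
pub fn matches_exact_counts(required: &HashMap<String, usize>, neighbors: &[&str]) -> bool {
    let actual = neighbor_histogram(neighbors);

    // Listed elements: counts must agree (a missing element counts as 0).
    let listed_ok = required
        .iter()
        .all(|(element, &count)| actual.get(element).copied().unwrap_or(0) == count);

    // Unlisted elements: treated as "required count 0", so none may appear.
    let unlisted_ok = actual.keys().all(|element| required.contains_key(element));

    listed_ok && unlisted_ok
}
```

Under this reading a condition such as `neighbor_elements = { B = 2 }` accepts a hydrogen bridging two borons but rejects one that also has any other neighbor.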
@@ -72,31 +70,33 @@ conditions = { element = "C", neighbor_types = { "C_3" = 1, "H_" = 3 } }

 1. **500+** – Exotic safeties and overrides (e.g., diborane bridging hydrogens).
 2. **400s** – Delocalized or aromatic atoms (`*_R`, resonance-stabilized heteroatoms) that must outrank geometry-only rules.
-3. **100–300** – VSEPR-driven workhorses keyed off steric number and hybridization.
-4. **<100** – Simple fallbacks such as halogens, alkali/alkaline-earth metals, and default hydrogens.
+3. **100–300** – Hybridization-driven workhorses keyed off `hybridization` (SP, SP2, SP3, Resonant).
+4. **<100** – Simple fallbacks such as halogens, alkali/alkaline-earth metals, hydrogen-bonding hydrogens, and default hydrogens.

 Representative entries are summarized below (table retained for quick reference):

-| Atom Type | DREIDING Description | Key Rule Condition(s) in `dreiding.rules.toml` | Priority |
-| :---------------- | :---------------------------- | :---------------------------------------------------------------- | :------: |
-| `H_` | Standard Hydrogen | `{ element = "H" }` | 1 |
-| `H_HB` | Hydrogen-Bonding Hydrogen | `{ element = "H", neighbor_elements = { O = 1 } }` or `{ N = 1 }` | ~250 |
-| `H_b` | Bridging Hydrogen (Diborane) | `{ degree = 2, neighbor_elements = { B = 2 } }` | 500 |
-| `C_3` | sp³ Tetrahedral Carbon | `{ element = "C", steric_number = 4 }` | 100 |
-| `C_2` | sp² Trigonal Carbon | `{ element = "C", steric_number = 3, is_aromatic = false }` | 200 |
-| `C_1` | sp Linear Carbon | `{ element = "C", steric_number = 2 }` | 300 |
-| `C_R` | Resonant/Aromatic Carbon | `{ element = "C", is_aromatic = true }` | 400 |
-| `N_3` | sp³ Nitrogen (Amine/Ammonium) | `{ element = "N", steric_number = 4 }` | 100 |
-| `N_2` | sp² Nitrogen (Imine/Amide) | `{ element = "N", steric_number = 3, is_aromatic = false }` | 200 |
-| `N_R` | Resonant/Aromatic Nitrogen | `{ element = "N", is_aromatic = true }` | 400 |
-| `O_3` | sp³ Oxygen (Ether/Alcohol) | `{ element = "O", steric_number = 4 }` | 100 |
-| `O_2` | sp² Oxygen (Carbonyl) | `{ element = "O", steric_number = 3 }` | 200 |
-| `O_R` | Resonant Oxygen (Phenol) | `{ element = "O", hybridization = "Resonant" }` | 401 |
-| `S_R` | Resonant Sulfur (Thiophene) | `{ element = "S", hybridization = "Resonant" }` | 400 |
-| `P_3` | sp³ Phosphorus (Phosphate) | `{ element = "P", steric_number = 4 }` | 100 |
-| `S_3` | sp³ Sulfur (Thiol/Sulfide) | `{ element = "S", hybridization = "SP3" }` | 100 |
-| `F_`, `Cl_`, etc. | Halogens | `{ element = "F" }`, etc. | 50 |
-| `Na`, `Ca`, etc. | Metal Ions | `{ element = "Na" }`, etc. | 20 |
+| Atom Type | DREIDING Description | Key Rule Condition(s) in `default.rules.toml` | Priority |
+| :--------------------- | :---------------------------- | :---------------------------------------------------------------- | :------: |
+| `H_` | Standard Hydrogen | `{ element = "H" }` | 1 |
+| `H_HB` | Hydrogen-Bonding Hydrogen | `{ element = "H", neighbor_elements = { O = 1 } }` or `{ N = 1 }` | 78–80 |
+| `H_b` | Bridging Hydrogen (Diborane) | `{ element = "H", degree = 2, neighbor_elements = { B = 2 } }` | 500 |
+| `C_3` | sp³ Tetrahedral Carbon | `{ element = "C", hybridization = "SP3" }` | 100 |
+| `C_2` | sp² Trigonal Carbon | `{ element = "C", hybridization = "SP2" }` | 200 |
+| `C_1` | sp Linear Carbon | `{ element = "C", hybridization = "SP" }` | 300 |
+| `C_R` | Resonant/Aromatic Carbon | `{ element = "C", hybridization = "Resonant" }` | 400 |
+| `N_3` | sp³ Nitrogen (Amine/Ammonium) | `{ element = "N", hybridization = "SP3" }` | 100 |
+| `N_2` | sp² Nitrogen (Imine/Amide) | `{ element = "N", hybridization = "SP2" }` | 200 |
+| `N_1` | sp Linear Nitrogen (Nitrile) | `{ element = "N", hybridization = "SP" }` | 300 |
+| `N_R` | Resonant/Aromatic Nitrogen | `{ element = "N", hybridization = "Resonant" }` | 400 |
+| `O_3` | sp³ Oxygen (Ether/Alcohol) | `{ element = "O", hybridization = "SP3" }` | 100 |
+| `O_2` | sp² Oxygen (Carbonyl) | `{ element = "O", hybridization = "SP2" }` | 200 |
+| `O_R` | Resonant Oxygen (Phenol) | `{ element = "O", hybridization = "Resonant" }` | 400 |
+| `S_3` | sp³ Sulfur (Thiol/Sulfide) | `{ element = "S", hybridization = "SP3" }` | 100 |
+| `S_2` | sp² Sulfur (Thioketone) | `{ element = "S", hybridization = "SP2" }` | 200 |
+| `S_R` | Resonant Sulfur (Thiophene) | `{ element = "S", hybridization = "Resonant" }` | 400 |
+| `P_3` | sp³ Phosphorus (Phosphate) | `{ element = "P", hybridization = "SP3" }` | 100 |
+| `F_`, `Cl`, `Br`, `I_` | Halogens | `{ element = "F" }`, etc. | 50 |
+| `Na`, `Ca`, etc. | Metal Ions | `{ element = "Na" }`, etc. | 20 |

 ## How to Extend the Rule System

@@ -116,15 +116,17 @@ conditions = { element = "Cu", formal_charge = 2 }
 ```

 ```rust
-use dreid_typer::{assign_topology_with_rules, rules, MolecularGraph, MolecularTopology, TyperError};
+use dreid_typer::{assign_topology_with_rules, rules::{get_default_rules, parse_rules}, MolecularGraph, MolecularTopology, TyperError};

 fn type_with_custom_rules(graph: &MolecularGraph) -> Result<MolecularTopology, TyperError> {
-    // Load the default ruleset.
-    let mut all_rules = rules::parse_rules(include_str!("dreiding.rules.toml"))?;
+    // Load custom rules from embedded TOML.
+    let custom_rules = parse_rules(include_str!("my_copper_rules.toml"))?;

-    // Append or override entries programmatically.
-    all_rules.extend(rules::parse_rules(include_str!("my_copper_rules.toml"))?);
+    // Combine with default rules (custom rules extend the defaults).
+    let mut all_rules = get_default_rules().to_vec();
+    all_rules.extend(custom_rules);

+    // Run the pipeline with the combined ruleset.
     assign_topology_with_rules(graph, &all_rules)
 }
 ```
