
Commit dd01bc3

Merge pull request #12 from caltechmsc/bugfix/11-core-typing-logic-incorrectly-equates-_r-type-with-aromaticity-failing-to-type-non-aromatic-resonance-systems
refactor(perception): Overhaul Chemical Perception for Full Resonance Modeling
2 parents 735d9f5 + 07e2bf9 commit dd01bc3

38 files changed, +6806 -5547 lines changed

Cargo.toml

Lines changed: 3 additions & 1 deletion
@@ -1,6 +1,6 @@
 [package]
 name = "dreid-typer"
-version = "0.2.0"
+version = "0.2.1"
 authors = [
     "Tony Kan <tonykan@caltech.edu>",
     "William A. Goddard III <wag@caltech.edu>",
@@ -20,8 +20,10 @@ categories = ["science", "simulation"]
 readme = "README.md"
 
 [dependencies]
+thiserror = "2.0.17"
 toml = "0.9.7"
 serde = { version = "1.0.188", features = ["derive"] }
+pauling = "0.1.0"
 
 [lib]
 name = "dreid_typer"

README.md

Lines changed: 41 additions & 44 deletions
@@ -1,73 +1,70 @@
 # DreidTyper
 
-**DreidTyper** is a high-performance, foundational software library for the automated assignment of DREIDING force field atom types and the perception of molecular topologies. It provides a modern, robust solution for translating simple chemical connectivity (a molecular graph) into a complete, engine-agnostic topological description essential for molecular simulations. This library is engineered from the ground up in **Rust** for exceptional performance, memory safety, and strict adherence to the principles of modular software design.
+**DreidTyper** is a Rust library that turns a minimal `MolecularGraph` (atoms + bonds) into a fully typed, DREIDING-compatible topology. The pipeline is deterministic, aggressively validated, and designed for integrators who need trustworthy chemistry without shipping their own perception code.
 
-The core mission of DreidTyper is to provide a reliable, predictable, and easy-to-integrate tool for developers and researchers building the next generation of simulation tools for general chemistry, materials science, and drug discovery.
+At a high level the library walks through:
+
+1. **Perception:** six ordered passes (rings → Kekulé expansion → electron bookkeeping → aromaticity → resonance → hybridization) that upgrade raw connectivity into a rich `AnnotatedMolecule`.
+2. **Typing:** an iterative, priority-sorted rule engine that resolves the final DREIDING atom label for every atom.
+3. **Building:** a pure graph traversal that emits canonical bonds, angles, and torsions as a `MolecularTopology`.
 
 ## Features
 
-- **DREIDING Atom Typing**: Assigns canonical DREIDING atom types from molecular connectivity.
-- **Full Topology Perception**: Identifies bonds, angles, and proper/improper dihedrals.
-- **Memory Safe & Fast**: Built in Rust for guaranteed memory safety and high performance.
-- **Rule-Based Engine**: Uses a clear TOML-based rule system for atom typing logic.
-- **Engine-Agnostic**: Produces a pure topological representation independent of any MD engine.
+- **Chemically faithful perception:** built-in algorithms cover SSSR ring search, strict Kekulé expansion, charge/lone pair templates for heteroatoms, aromaticity categorization (including anti-aromatic detection), resonance propagation, and hybridization inference.
+- **Deterministic typing engine:** TOML rules are sorted by priority and evaluated until a fixed point, making neighbor-dependent rules (e.g., `H_HB`) converge without guesswork.
+- **Engine-agnostic topology:** outputs canonicalized bonds, angles, proper and improper dihedrals ready for any simulator that consumes DREIDING-style terms.
+- **Extensible ruleset:** ship with curated defaults (`resources/default.rules.toml`) and load or merge custom rule files at runtime.
+- **Rust-first ergonomics:** zero `unsafe`, comprehensive unit/integration tests, and precise error variants for validation, perception, and typing failures.
 
 ## Getting Started
 
-To get started with DreidTyper, add it as a dependency in your `Cargo.toml`:
+Add the crate to your `Cargo.toml`:
 
 ```toml
 [dependencies]
-dreid-typer = "0.2.0"
+dreid-typer = "0.2.1"
 ```
 
-Then, you can use it in your Rust code as follows:
+Run the full pipeline from connectivity to topology:
 
 ```rust
-use dreid_typer::{
-    assign_topology, MolecularGraph, MolecularTopology,
-    Element, BondOrder,
-};
+use dreid_typer::{assign_topology, Element, MolecularGraph, MolecularTopology, BondOrder};
 
-// 1. Define the molecule's connectivity using a `MolecularGraph`.
 let mut graph = MolecularGraph::new();
-let c1 = graph.add_atom(Element::C); // CH3
-let c2 = graph.add_atom(Element::C); // CH2
+let c1 = graph.add_atom(Element::C);
+let c2 = graph.add_atom(Element::C);
 let o = graph.add_atom(Element::O);
-let h_c1_1 = graph.add_atom(Element::H);
-let h_c1_2 = graph.add_atom(Element::H);
-let h_c1_3 = graph.add_atom(Element::H);
-let h_c2_1 = graph.add_atom(Element::H);
-let h_c2_2 = graph.add_atom(Element::H);
 let h_o = graph.add_atom(Element::H);
+let h_atoms: Vec<_> = (0..6).map(|_| graph.add_atom(Element::H)).collect();
 
 graph.add_bond(c1, c2, BondOrder::Single).unwrap();
 graph.add_bond(c2, o, BondOrder::Single).unwrap();
-graph.add_bond(c1, h_c1_1, BondOrder::Single).unwrap();
-graph.add_bond(c1, h_c1_2, BondOrder::Single).unwrap();
-graph.add_bond(c1, h_c1_3, BondOrder::Single).unwrap();
-graph.add_bond(c2, h_c2_1, BondOrder::Single).unwrap();
-graph.add_bond(c2, h_c2_2, BondOrder::Single).unwrap();
 graph.add_bond(o, h_o, BondOrder::Single).unwrap();
-
-// 2. Call the main function to perceive the topology.
-let topology: MolecularTopology = assign_topology(&graph).unwrap();
-
-// 3. Inspect the results.
-assert_eq!(topology.atoms.len(), 9);
-assert_eq!(topology.bonds.len(), 8);
-assert_eq!(topology.angles.len(), 13);
-assert_eq!(topology.proper_dihedrals.len(), 12);
-
-// Check the assigned DREIDING atom types.
-assert_eq!(topology.atoms[c1].atom_type, "C_3"); // sp3 Carbon
-assert_eq!(topology.atoms[c2].atom_type, "C_3"); // sp3 Carbon
-assert_eq!(topology.atoms[o].atom_type, "O_3"); // sp3 Oxygen
-assert_eq!(topology.atoms[h_o].atom_type, "H_HB"); // Hydrogen-bonding Hydrogen
-assert_eq!(topology.atoms[h_c1_1].atom_type, "H_"); // Standard Hydrogen
+for (carbon, chunk) in [c1, c2].into_iter().zip(h_atoms.chunks(3)) {
+    for &hydrogen in chunk {
+        graph.add_bond(carbon, hydrogen, BondOrder::Single).unwrap();
+    }
+}
+
+let topology: MolecularTopology = assign_topology(&graph).expect("perception + typing succeed");
+
+assert_eq!(topology.atoms[c1].atom_type, "C_3");
+assert_eq!(topology.atoms[c2].atom_type, "C_3");
+assert_eq!(topology.atoms[o].atom_type, "O_3");
+assert_eq!(topology.atoms[h_o].atom_type, "H_HB");
 ```
 
-> **Note**: This is a simplified example. For more complex molecules and edge cases, please refer to the [API Documentation](https://docs.rs/dreid-typer).
+Need custom chemistry? Parse a TOML file and run the same API:
+
+```rust
+use dreid_typer::{assign_topology_with_rules, rules, MolecularGraph};
+
+let mut ruleset = rules::get_default_rules().to_vec();
+let extra = std::fs::read_to_string("my_metals.rules.toml")?;
+ruleset.extend(rules::parse_rules(&extra)?);
+
+let topology = assign_topology_with_rules(&graph, &ruleset)?;
+```
 
 ## Documentation
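
A bond count on the added README example is worth doing by hand: `(0..6)` creates six hydrogens and `chunks(3)` hands three to each carbon, so `c2` ends up with five bonds (`c1`, `o`, and three hydrogens), whereas the CH2 carbon of ethanol carries two hydrogens. The sketch below reproduces that wiring with plain indices instead of the `dreid_typer` API; the `degrees` helper and the atom ordering are hypothetical stand-ins used only for counting. Whether the library rejects a five-bond carbon is not shown in this diff.

```rust
/// Bond degree per atom for an ethanol-like skeleton C1-C2-O-H with
/// `h_per_carbon[i]` hydrogens attached to carbon `i`.
/// Hypothetical atom order: 0 = c1, 1 = c2, 2 = o, 3 = h_o, then the carbon hydrogens.
fn degrees(h_per_carbon: [usize; 2]) -> Vec<usize> {
    let total_h: usize = h_per_carbon.iter().sum();
    let mut degree = vec![0usize; 4 + total_h];
    // Every bond raises the degree of both endpoints by one.
    let mut bond = |a: usize, b: usize| {
        degree[a] += 1;
        degree[b] += 1;
    };
    bond(0, 1); // c1 - c2
    bond(1, 2); // c2 - o
    bond(2, 3); // o - h_o
    let mut next_h = 4;
    for (carbon, &count) in h_per_carbon.iter().enumerate() {
        for _ in 0..count {
            bond(carbon, next_h);
            next_h += 1;
        }
    }
    degree
}

fn main() {
    let as_written = degrees([3, 3]); // six hydrogens, chunks of three
    let ethanol = degrees([3, 2]); // three on c1, two on c2
    println!("as written:  c1 = {}, c2 = {}", as_written[0], as_written[1]);
    println!("3 + 2 split: c1 = {}, c2 = {}", ethanol[0], ethanol[1]);
}
```

With a 3 + 2 split the molecule has nine atoms and both carbons have four bonds, which matches the `topology.atoms.len()` check of 9 in the removed version of the example.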

docs/01_pipeline.md

Lines changed: 9 additions & 9 deletions
@@ -26,21 +26,21 @@ pub struct MolecularGraph {
 }
 ```
 
-### 1.2 `ProcessingGraph`: The Internal Workspace
+### 1.2 `AnnotatedMolecule`: The Internal Workspace
 
-Once a `MolecularGraph` enters the pipeline, it is immediately converted into a `ProcessingGraph`. This is the most complex data structure in the library, serving as the central, chemically-aware workspace for the core algorithms.
+Once a `MolecularGraph` enters the pipeline, it is immediately converted into an `AnnotatedMolecule` (defined in `perception::model`). This is the most complex data structure in the library, serving as the central, chemically-aware workspace for the core algorithms.
 
 - **Purpose:** To hold a rich, comprehensive set of perceived chemical properties for every atom. It is the single source of truth for the typing and building phases.
 - **Structure:**
-  - A list of `AtomView`s, where each view contains numerous fields:
+  - A list of `AnnotatedAtom`s, where each entry contains numerous fields:
     - Intrinsic properties (`element`, `formal_charge`).
     - Topological properties (`degree`, `is_in_ring`, `smallest_ring_size`).
     - Electronic properties (`lone_pairs`, `steric_number`, `hybridization`).
     - Special properties (`is_aromatic`, `perception_source`).
   - An adjacency list for efficient neighbor traversal.
 - **Design Rationale:**
   - **Centralized Knowledge:** By pre-calculating and storing all relevant properties in one place, the subsequent typing and building phases can be implemented as efficient, stateless queries against this data structure. This avoids redundant calculations.
-  - **Factual Immutability:** The `ProcessingGraph` is constructed once during the **Perception Phase** and is treated as a read-only object thereafter. This immutability guarantees that the typing engine operates on a consistent and deterministic chemical context.
+  - **Factual Immutability:** The `AnnotatedMolecule` is constructed once during the **Perception Phase** and is treated as a read-only object thereafter. This immutability guarantees that the typing engine operates on a consistent and deterministic chemical context.
 
 ### 1.3 `MolecularTopology`: The Final Output
 
@@ -65,7 +65,7 @@ graph LR
     end
 
     subgraph "Internal Processing"
-        B(<b>ProcessingGraph</b><br><i>Complex & Chemically-Aware</i>)
+        B(<b>AnnotatedMolecule</b><br><i>Complex & Chemically-Aware</i>)
     end
 
     subgraph "Engine-Ready Output"
@@ -76,10 +76,10 @@ graph LR
     B -- "<b>Phase 2 & 3: Typing & Building</b><br>Information is queried and transformed" --> C
 ```
 
-1. **Input to Workspace (`MolecularGraph` -> `ProcessingGraph`):**
-    The `perceive` function acts as the constructor for the `ProcessingGraph`. It takes the minimal `MolecularGraph` and performs all necessary chemical computations to build a fully annotated, "intelligent" graph. This is the most computationally intensive part of the process, where raw data is converted into chemical knowledge.
+1. **Input to Workspace (`MolecularGraph` -> `AnnotatedMolecule`):**
+    The `perception::perceive` function acts as the constructor for the `AnnotatedMolecule`. It takes the minimal `MolecularGraph` and performs all necessary chemical computations to build a fully annotated, "intelligent" graph. This is the most computationally intensive part of the process, where raw data is converted into chemical knowledge.
 
-2. **Workspace to Output (`ProcessingGraph` -> `MolecularTopology`):**
-    The `assign_types` and `build_topology` functions work in concert to transform the rich `ProcessingGraph` into the final, lean `MolecularTopology`. This stage is not about discovering new information, but rather about **querying** the existing knowledge and **formatting** it according to the rules of the DREIDING model. The typing engine queries atomic properties to assign types, and the builder queries connectivity to generate geometric terms.
+2. **Workspace to Output (`AnnotatedMolecule` -> `MolecularTopology`):**
+    The `typing::engine::assign_types` and `builder::build_topology` functions work in concert to transform the rich `AnnotatedMolecule` into the final, lean `MolecularTopology`. This stage is not about discovering new information, but rather about **querying** the existing knowledge and **formatting** it according to the rules of the DREIDING model. The typing engine queries atomic properties to assign types, and the builder queries connectivity to generate geometric terms.
 
 By strictly separating these data representations, `dreid-typer` achieves a clean architecture that is both robust and easy to reason about.
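
The two-step flow described above (build the workspace once, then only query it) can be sketched with toy stand-ins. Everything here is an assumption made for illustration: the structs reuse the document's type names but have invented fields, the three hard-coded typing rules are not the contents of `resources/default.rules.toml`, and `run_pipeline` is a hypothetical wrapper rather than the crate's `assign_topology`. The typing loop repeats until no label changes, in the spirit of the README's "evaluated until a fixed point" wording.

```rust
// Toy stand-ins for the three data stages; all field names are illustrative only.
struct MolecularGraph {
    elements: Vec<&'static str>,
    bonds: Vec<(usize, usize)>,
}
struct AnnotatedMolecule {
    elements: Vec<&'static str>,
    adjacency: Vec<Vec<usize>>,
}
struct MolecularTopology {
    atom_types: Vec<String>,
    angles: Vec<(usize, usize, usize)>,
}

// Phase 1: build the workspace once; later phases only read from it.
fn perceive(graph: &MolecularGraph) -> AnnotatedMolecule {
    let mut adjacency = vec![Vec::new(); graph.elements.len()];
    for &(a, b) in &graph.bonds {
        adjacency[a].push(b);
        adjacency[b].push(a);
    }
    AnnotatedMolecule { elements: graph.elements.clone(), adjacency }
}

// Phase 2: apply toy rules repeatedly until no atom label changes.
fn assign_types(mol: &AnnotatedMolecule) -> Vec<String> {
    let mut types = vec![String::new(); mol.elements.len()];
    loop {
        let mut changed = false;
        for i in 0..mol.elements.len() {
            let new_type = match mol.elements[i] {
                "C" => "C_3",
                "O" => "O_3",
                // Neighbor-dependent rule: needs the oxygen's label to be set already.
                "H" if mol.adjacency[i].iter().any(|&n| types[n] == "O_3") => "H_HB",
                "H" => "H_",
                _ => "",
            };
            if types[i] != new_type {
                types[i] = new_type.to_string();
                changed = true;
            }
        }
        if !changed {
            return types;
        }
    }
}

// Phase 3: enumerate angles (a, center, b) with a < b from connectivity alone.
fn build_topology(mol: &AnnotatedMolecule, atom_types: Vec<String>) -> MolecularTopology {
    let mut angles = Vec::new();
    for (center, neighbors) in mol.adjacency.iter().enumerate() {
        for (idx, &a) in neighbors.iter().enumerate() {
            for &b in &neighbors[idx + 1..] {
                angles.push((a.min(b), center, a.max(b)));
            }
        }
    }
    MolecularTopology { atom_types, angles }
}

// Hypothetical wrapper chaining the three phases.
fn run_pipeline(graph: &MolecularGraph) -> MolecularTopology {
    let mol = perceive(graph);
    let types = assign_types(&mol);
    build_topology(&mol, types)
}

fn main() {
    // Water, with a hydrogen listed before the oxygen so that the first typing
    // pass has to be revised by a second one.
    let water = MolecularGraph { elements: vec!["H", "O", "H"], bonds: vec![(0, 1), (1, 2)] };
    let topology = run_pipeline(&water);
    println!("{:?} {:?}", topology.atom_types, topology.angles);
}
```

Because `assign_types` and `build_topology` take the workspace by shared reference, the compiler enforces the read-only treatment that the "Factual Immutability" bullet describes for the real `AnnotatedMolecule`.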
