-**atomworks** is an open-source platform for next-generation biomolecular data processing, conversion, and machine-learning-ready featurization.
-It is composed of two symbiotic libraries:
+**atomworks** is an open-source platform that maximizes research velocity for biomolecular modeling tasks. Much like how [Torchdata](https://docs.pytorch.org/data/beta/index.html) enables rapid prototyping within the vision and language domains, AtomWorks aims to accelerate development and experimentation within biomolecular modeling.

-- **atomworks.io:** A universal Python toolkit for parsing, cleaning, manipulating, and converting biological data (structures, sequences, small molecules). Built on the [biotite](https://www.biotite-python.org/) API, it seamlessly loads and exports between standards like mmCIF, PDB, FASTA, SMILES, MOL, and more.
-- **atomworks.ml:** Advanced dataset featurization and sampling for deep learning workflows—using atomworks.io as its structural backbone.
+If you're looking for the models themselves (e.g., RF3, MPNN) that integrate with AtomWorks rather than the underlying framework, check out [ModelForge](https://github.com/RosettaCommons/modelforge)

-The atomworks ecosystem is designed to eliminate the pain of file conversion and preprocessing, offering scientists and modelers an efficient, unified interface for biomolecular data.
+AtomWorks is composed of two symbiotic libraries:
+
+- **atomworks.io:** A universal Python toolkit for parsing, cleaning, manipulating, and converting biological data (structures, sequences, small molecules). Built on the [biotite](https://www.biotite-python.org/) API, it seamlessly loads and exports between standard formats like mmCIF, PDB, FASTA, SMILES, MOL, and more.
+- **atomworks.ml:** Advanced dataset featurization and sampling for deep learning workflows that uses `atomworks.io` as its structural backbone. We provide a comprehensive, pre-built, and well-tested set of `Transforms` for common tasks that can be easily composed into full deep-learning pipelines; users may also create their own `Transforms` for custom operations.
+
+For more detail on the motivation for and applications of AtomWorks, please see the [preprint](https://doi.org/10.1101/2025.08.14.670328).
+
+AtomWorks is built atop [biotite](https://www.biotite-python.org/): We are grateful to the Biotite developers for maintaining such a high-quality and flexible toolkit, and hope that our package will prove a helpful addition to the broader `biotite` community.

 ---

 ## atomworks.io

-*A swiss-army knife for biomolecular files in Python*
+*A general-purpose Python toolkit for working with biomolecular files*

 **atomworks.io** lets you:
-- Parse, convert, and clean up any common biological file (structure or sequence).
-- Transform all data to a consistent `AtomArray` representation for further analysis or machine learning.
-- Model missing atoms, handle ligands/solvents, resolve naming/assembly heterogeneity—all from Python.
+- Parse, convert, and clean any common biological file (structure or sequence). For example, identifying and removing leaving groups, correcting bond order after nucleophilic addition, fixing charges, parsing covalent geometries, and appropriate treatment of structures with multiple occupancies and ligands at symmetry centers
+- Transform all data to a consistent `AtomArray` representation for further analysis or machine learning applications, regardless of initial source
+- Model missing atoms (those implied by the sequence but not represented in the coordinates) and initialize entity- and instance-level annotations (see the [glossary]() for more detail on our composable naming conventions)

-Instead of juggling dozens of tools or manual curation, simply load your data with atomworks.io and focus on your research.
+We have found `atomworks.io` to be useful to a general bioinformatics and protein design audience; in many cases, `atomworks.io` can replace bespoke scripts and manual curation, enabling researchers to spend more time testing hypotheses and less time juggling dozens of tools and dependencies.
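
To make the "missing atoms" idea above concrete: a residue can be present in the deposited sequence yet have no coordinates in the structure. The sketch below is not the `atomworks.io` API. It uses plain Python and hypothetical inputs (`sequence`, `observed_res_ids`, and the helper `find_unresolved_positions` are all invented for illustration) to show how such unresolved positions might be identified before they are modeled.

```python
def find_unresolved_positions(sequence: str, observed_res_ids: set[int]) -> list[tuple[int, str]]:
    """Return (1-based position, residue letter) pairs that have no observed coordinates.

    Hypothetical helper: `sequence` is the full entity sequence and
    `observed_res_ids` holds the residue positions that appear in the coordinates.
    """
    return [
        (position, letter)
        for position, letter in enumerate(sequence, start=1)
        if position not in observed_res_ids
    ]


# Made-up example: an 8-residue chain where only residues 2-6 were resolved.
sequence = "MKTAYIAK"
observed_res_ids = {2, 3, 4, 5, 6}

unresolved = find_unresolved_positions(sequence, observed_res_ids)
print(unresolved)  # [(1, 'M'), (7, 'A'), (8, 'K')]
```

A real implementation would also have to work at the atom level (for example, side-chain atoms missing from an otherwise resolved residue); this sketch only covers whole residues.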

 ---

 ## atomworks.ml

-*Advanced dataset featurization and sampling for deep learning workflows*
+*Modular, component-based library for dataset featurization within biomolecular deep learning workflows*

 **atomworks.ml** provides:
-- Ready-made featurization pipelines for entire datasets
+- A library of pre-built, well-tested `Transforms` that can be slotted into novel pipelines
+- An extensible framework, integrated with `atomworks.io`, to write `Transforms` for arbitrary use cases
+- Scripts to pre-process the PDB or other databases into dataframes appropriate for network training
 - Efficient sampling and batching utilities for training machine learning models
-- Seamless integration with atomworks.io for ML-ready feature engineering
-- Optimized data structures and workflows designed specifically for deep learning applications

-Built on atomworks.io's structural backbone, atomworks.ml bridges the gap between biological data processing and machine learning pipelines.
+Within the AtomWorks paradigm, the output of each `Transform` is not an opaque dictionary with model-specific tensors but instead an updated version of our atom-level structural representation (Biotite's `AtomArray`). Operations within – and between – pipelines thus maintain a common vocabulary of inputs and outputs.
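
The "structure in, structure out" idea can be sketched in a few lines. This is a minimal illustration in plain Python and not the `atomworks.ml` API. The `Atoms` dataclass is a hypothetical stand-in for Biotite's `AtomArray`, and `remove_water`, `annotate_heavy_atoms`, and `compose` are invented names. The point is only that every step accepts and returns the same atom-level type, so steps can be chained in any order.

```python
from dataclasses import dataclass, field, replace
from typing import Callable


@dataclass(frozen=True)
class Atoms:
    """Hypothetical stand-in for an atom-level structure such as Biotite's AtomArray."""

    elements: tuple[str, ...]
    res_names: tuple[str, ...]
    annotations: dict[str, tuple] = field(default_factory=dict)


# In this sketch, a transform is any callable mapping a structure to an updated structure.
Transform = Callable[[Atoms], Atoms]


def remove_water(atoms: Atoms) -> Atoms:
    """Drop atoms that belong to water residues, keeping annotations aligned."""
    keep = [i for i, name in enumerate(atoms.res_names) if name != "HOH"]
    return Atoms(
        elements=tuple(atoms.elements[i] for i in keep),
        res_names=tuple(atoms.res_names[i] for i in keep),
        annotations={key: tuple(values[i] for i in keep) for key, values in atoms.annotations.items()},
    )


def annotate_heavy_atoms(atoms: Atoms) -> Atoms:
    """Add a per-atom 'is_heavy' annotation instead of emitting a separate tensor."""
    is_heavy = tuple(element != "H" for element in atoms.elements)
    return replace(atoms, annotations={**atoms.annotations, "is_heavy": is_heavy})


def compose(*transforms: Transform) -> Transform:
    """Chain transforms left to right; the result is itself a transform."""

    def pipeline(atoms: Atoms) -> Atoms:
        for transform in transforms:
            atoms = transform(atoms)
        return atoms

    return pipeline


pipeline = compose(remove_water, annotate_heavy_atoms)
result = pipeline(Atoms(elements=("N", "C", "H", "O"), res_names=("ALA", "ALA", "ALA", "HOH")))
print(result.elements)                 # ('N', 'C', 'H')
print(result.annotations["is_heavy"])  # (True, True, False)
```

Because each step returns the same type it receives, a composed pipeline is itself a transform and can be nested inside a larger one.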
 - **chain_info** — Sequences/metadata for each chain
 - **ligand_info** — Ligand annotation & metrics
-- **asym_unit** — Structure (AtomArrayStack)
-- **assemblies** — Built biological assemblies
+- **asym_unit** — Structure (`AtomArrayStack`)
+- **assemblies** — Built biological assemblies (each is its own `AtomArrayStack`)
 - **metadata** — Experimental and source information

 See [usage examples](https://baker-laboratory.github.io/atomworks-dev/latest/auto_examples/).
@@ -104,3 +109,10 @@ See [usage examples](https://baker-laboratory.github.io/atomworks-dev/latest/aut

 We welcome improvements!
 Please see the [full documentation](https://baker-laboratory.github.io/atomworks-dev/latest) for contribution guidelines.
+
+## Citation
+
+If you make use of AtomWorks in your research, please cite:
+
+* N. Corley, S. Mathis, R. Krishna, M. S. Bauer, T. R. Thompson, W. Ahern, M. W. Kazman, R. I. Brent, K. Didi, A. Kubaney, L. McHugh, A. Nagle, A. Favor, M. Kshirsagar, P. Sturmfels, Y. Li, J. Butcher, B. Qiang, L. L. Schaaf, R. Mitra, K. Campbell, O. Zhang, R. Weissman, I. R. Humphreys, Q. Cong, J. Funk, S. Sonthalia, P. Lio, D. Baker, F. DiMaio,
+"Accelerating Biomolecular Modeling with AtomWorks and RF3," bioRxiv, August 2025. doi: [10.1101/2025.08.14.670328](https://doi.org/10.1101/2025.08.14.670328)