Commit 3af2294

docs: README
docs: update README
2 parents 986a582 + c0383ed commit 3af2294

1 file changed: +29 -17 lines changed

README.md

Lines changed: 29 additions & 17 deletions
@@ -6,40 +6,45 @@
 
 <img src="docs/_static/atomworks_logo_color.svg" width="450" alt="atomworks logo">
 
-**atomworks** is an open-source platform for next-generation biomolecular data processing, conversion, and machine-learning-ready featurization.
-It is composed of two symbiotic libraries:
+**atomworks** is an open-source platform that maximizes research velocity for biomolecular modeling tasks. Much like how [Torchdata](https://docs.pytorch.org/data/beta/index.html) enables rapid prototyping within the vision and language domains, AtomWorks aims to accelerate development and experimentation within biomolecular modeling.
 
-- **atomworks.io:** A universal Python toolkit for parsing, cleaning, manipulating, and converting biological data (structures, sequences, small molecules). Built on the [biotite](https://www.biotite-python.org/) API, it seamlessly loads and exports between standards like mmCIF, PDB, FASTA, SMILES, MOL, and more.
-- **atomworks.ml:** Advanced dataset featurization and sampling for deep learning workflows, using atomworks.io as its structural backbone.
+If you're looking for the models themselves (e.g., RF3, MPNN) that integrate with AtomWorks rather than the underlying framework, check out [ModelForge](https://github.com/RosettaCommons/modelforge).
 
-The atomworks ecosystem is designed to eliminate the pain of file conversion and preprocessing, offering scientists and modelers an efficient, unified interface for biomolecular data.
+AtomWorks is composed of two symbiotic libraries:
+
+- **atomworks.io:** A universal Python toolkit for parsing, cleaning, manipulating, and converting biological data (structures, sequences, small molecules). Built on the [biotite](https://www.biotite-python.org/) API, it seamlessly loads and exports between standard formats like mmCIF, PDB, FASTA, SMILES, MOL, and more.
+- **atomworks.ml:** Advanced dataset featurization and sampling for deep learning workflows that uses `atomworks.io` as its structural backbone. We provide a comprehensive, pre-built, and well-tested set of `Transforms` for common tasks that can be easily composed into full deep-learning pipelines; users may also create their own `Transforms` for custom operations.
+
+For more detail on the motivation for and applications of AtomWorks, please see the [preprint](https://doi.org/10.1101/2025.08.14.670328).
+
+AtomWorks is built atop [biotite](https://www.biotite-python.org/). We are grateful to the Biotite developers for maintaining such a high-quality and flexible toolkit, and hope that our package will prove a helpful addition to the broader `biotite` community.
 
 ---

 ## atomworks.io
 
-*A swiss-army knife for biomolecular files in Python*
+*A general-purpose Python toolkit for working with biomolecular files*
 
 **atomworks.io** lets you:
-- Parse, convert, and clean up any common biological file (structure or sequence).
-- Transform all data to a consistent `AtomArray` representation for further analysis or machine learning.
-- Model missing atoms, handle ligands/solvents, resolve naming/assembly heterogeneity, all from Python.
+- Parse, convert, and clean any common biological file (structure or sequence). Cleaning includes, for example, identifying and removing leaving groups, correcting bond order after nucleophilic addition, fixing charges, parsing covalent geometries, and appropriately treating structures with multiple occupancies and ligands at symmetry centers
+- Transform all data to a consistent `AtomArray` representation for further analysis or machine learning applications, regardless of initial source
+- Model missing atoms (those implied by the sequence but not represented in the coordinates) and initialize entity- and instance-level annotations (see the [glossary]() for more detail on our composable naming conventions)
 
-Instead of juggling dozens of tools or manual curation, simply load your data with atomworks.io and focus on your research.
+We have found `atomworks.io` to be useful to a general bioinformatics and protein design audience; in many cases, `atomworks.io` can replace bespoke scripts and manual curation, enabling researchers to spend more time testing hypotheses and less time juggling dozens of tools and dependencies.
 
 ---

 ## atomworks.ml
 
-*Advanced dataset featurization and sampling for deep learning workflows*
+*Modular, component-based library for dataset featurization within biomolecular deep learning workflows*
 
 **atomworks.ml** provides:
-- Ready-made featurization pipelines for entire datasets
+- A library of pre-built, well-tested `Transforms` that can be slotted into novel pipelines
+- An extensible framework, integrated with `atomworks.io`, to write `Transforms` for arbitrary use cases
+- Scripts to pre-process the PDB or other databases into dataframes appropriate for network training
 - Efficient sampling and batching utilities for training machine learning models
-- Seamless integration with atomworks.io for ML-ready feature engineering
-- Optimized data structures and workflows designed specifically for deep learning applications
 
-Built on atomworks.io's structural backbone, atomworks.ml bridges the gap between biological data processing and machine learning pipelines.
+Within the AtomWorks paradigm, the output of each `Transform` is not an opaque dictionary with model-specific tensors but instead an updated version of our atom-level structural representation (Biotite's `AtomArray`). Operations within – and between – pipelines thus maintain a common vocabulary of inputs and outputs.
 
 ---
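The `Transform` convention described in the atomworks.ml section above (each step takes the atom-level representation and returns an updated one) can be illustrated with a small, dependency-free sketch. Everything here is hypothetical: `ToyAtomArray` is a stand-in for Biotite's `AtomArray`, and `remove_hydrogens`, `add_is_carbon`, and `compose` are illustrative names, not the `atomworks.ml` API.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable, Sequence


@dataclass(frozen=True)
class ToyAtomArray:
    """Hypothetical stand-in for an atom-level representation such as AtomArray."""
    element: tuple[str, ...]
    annotations: dict[str, tuple] = field(default_factory=dict)


# A transform maps one atom-level representation to an updated one.
Transform = Callable[[ToyAtomArray], ToyAtomArray]


def remove_hydrogens(atoms: ToyAtomArray) -> ToyAtomArray:
    """Drop hydrogen atoms, keeping every annotation column aligned."""
    keep = [i for i, e in enumerate(atoms.element) if e != "H"]
    return ToyAtomArray(
        element=tuple(atoms.element[i] for i in keep),
        annotations={k: tuple(v[i] for i in keep) for k, v in atoms.annotations.items()},
    )


def add_is_carbon(atoms: ToyAtomArray) -> ToyAtomArray:
    """Attach a per-atom boolean annotation instead of emitting a separate tensor."""
    updated = dict(atoms.annotations)
    updated["is_carbon"] = tuple(e == "C" for e in atoms.element)
    return ToyAtomArray(atoms.element, updated)


def compose(transforms: Sequence[Transform]) -> Transform:
    """Chain transforms; every step consumes and produces the same type."""
    def pipeline(atoms: ToyAtomArray) -> ToyAtomArray:
        for transform in transforms:
            atoms = transform(atoms)
        return atoms
    return pipeline


pipeline = compose([remove_hydrogens, add_is_carbon])
out = pipeline(ToyAtomArray(element=("N", "C", "H", "O")))
print(out.element)
print(out.annotations["is_carbon"])
```

Because every step shares one input and output type, steps can be reordered or reused across pipelines without adapters, which is the "common vocabulary" property the README text describes.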

@@ -78,8 +83,8 @@ print(chain_id, info["sequence"])
 Output includes:
 - **chain_info**: Sequences/metadata for each chain
 - **ligand_info**: Ligand annotation & metrics
-- **asym_unit**: Structure (AtomArrayStack)
-- **assemblies**: Built biological assemblies
+- **asym_unit**: Structure (`AtomArrayStack`)
+- **assemblies**: Built biological assemblies (each is its own `AtomArrayStack`)
 - **metadata**: Experimental and source information
 
 See [usage examples](https://baker-laboratory.github.io/atomworks-dev/latest/auto_examples/).
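As a rough illustration of the output keys listed in the hunk above, the sketch below walks a stand-in result dictionary. It makes two assumptions: the result is dictionary-like, and `chain_info` maps chain IDs to records with a `"sequence"` entry (suggested by the `print(chain_id, info["sequence"])` line in the hunk header). The values are placeholders, not real parser output, and `chain_sequences` is a hypothetical helper.

```python
# Placeholder result; only the key names come from the README.
result = {
    "chain_info": {
        "A": {"sequence": "MKT"},  # placeholder sequence
        "B": {"sequence": "GSH"},  # placeholder sequence
    },
    "ligand_info": {},   # ligand annotation and metrics would go here
    "asym_unit": None,   # would be an AtomArrayStack
    "assemblies": {},    # would hold one AtomArrayStack per assembly
    "metadata": {},      # experimental and source information
}


def chain_sequences(parsed: dict) -> dict[str, str]:
    """Collect the sequence of every chain (hypothetical helper)."""
    return {chain_id: info["sequence"] for chain_id, info in parsed["chain_info"].items()}


sequences = chain_sequences(result)
for chain_id, sequence in sequences.items():
    print(chain_id, sequence)
```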
@@ -104,3 +109,10 @@ See [usage examples](https://baker-laboratory.github.io/atomworks-dev/latest/aut
 
 We welcome improvements!
 Please see the [full documentation](https://baker-laboratory.github.io/atomworks-dev/latest) for contribution guidelines.
+
+## Citation
+
+If you make use of AtomWorks in your research, please cite:
+
+* N. Corley, S. Mathis, R. Krishna, M. S. Bauer, T. R. Thompson, W. Ahern, M. W. Kazman, R. I. Brent, K. Didi, A. Kubaney, L. McHugh, A. Nagle, A. Favor, M. Kshirsagar, P. Sturmfels, Y. Li, J. Butcher, B. Qiang, L. L. Schaaf, R. Mitra, K. Campbell, O. Zhang, R. Weissman, I. R. Humphreys, Q. Cong, J. Funk, S. Sonthalia, P. Lio, D. Baker, F. DiMaio,
+"Accelerating Biomolecular Modeling with AtomWorks and RF3," bioRxiv, August 2025. doi: [10.1101/2025.08.14.670328](https://doi.org/10.1101/2025.08.14.670328)
