by Sebastian Pagel, Abhishek Sharma, Lee Cronin
We use the framework of Assembly Theory to quantify the selection induced by the evolutionary machine on Earth in chemical space, using large-scale chemical databases as our model system. Additionally, we introduce a new framework for generating novel molecules based on Joint Assembly Spaces (JAS) and present an example of how novel drug-like molecules can be generated with this framework.
Evolution is often understood through genetic mutations driving changes in an organism's fitness, but this understanding can potentially be extended beyond the genetic code. We propose that natural products, the complex molecules central to Earth's biochemistry, can be used to uncover evolutionary mechanisms beyond genes. By applying Assembly Theory (AT), which treats selection as a process not limited to biological systems, we can map and measure the evolutionary forces acting on these molecules. AT enables the exploration of the assembly space of natural products, demonstrating how the principles of evolution apply to these complex chemical structures, with selection picking out highly improbable and complex molecules from a vast space of possibilities. By comparing natural products with a broader molecular database, we can assess the degree of evolutionary contingency, providing insight into how molecular novelty emerges and persists. This approach not only quantifies evolutionary selection at the molecular level but also offers a new avenue for drug discovery by exploring the molecular assembly spaces of natural products. Our method provides a fresh perspective on measuring evolutionary processes, which both shape the molecular imprint of selection and can be read out from it.
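To make the central quantity of Assembly Theory concrete: the assembly index of an object is the minimum number of joining operations needed to build it from basic building blocks, where anything built earlier may be reused. The sketch below is only a toy analog, assuming strings stand in for molecular graphs (characters as building blocks, concatenation as the join); it is not the molecular algorithm implemented by AssemblyGo or AssemblyCpp. Passing several targets gives what this sketch calls a joint assembly index, the length of the shortest single pathway that builds all of them, so shared intermediates are counted once. The brute-force search is exponential and is only meant for short strings.

```python
from __future__ import annotations

from itertools import product
from math import ceil, log2


def _substrings(target: str) -> set[str]:
    """Contiguous substrings of length >= 2: the only intermediates worth building."""
    n = len(target)
    return {target[i:j] for i in range(n) for j in range(i + 2, n + 1)}


def assembly_index(targets: set[str]) -> int:
    """Minimum number of joins that builds every string in `targets` (toy model).

    Start from single characters; each step concatenates two available objects
    (reuse allowed). With several targets the pathway is shared (joint), so
    common intermediates are only built once.
    """
    basis = frozenset(c for t in targets for c in t)
    # Anything that is not a substring of a target can never help build one.
    useful = set().union(*(_substrings(t) for t in targets))
    goals = {t for t in targets if len(t) > 1}
    longest_goal = max((len(t) for t in goals), default=1)
    failed: set[tuple[frozenset[str], int]] = set()

    def lower_bound(built: frozenset[str]) -> int:
        # Each join creates exactly one new object and at most doubles the
        # longest available length, giving two admissible lower bounds.
        missing = len(goals - built)
        longest = max((len(s) for s in built), default=1)
        doubling = max(0, ceil(log2(longest_goal / longest)))
        return max(missing, doubling)

    def search(built: frozenset[str], budget: int) -> bool:
        if goals <= built:
            return True
        if lower_bound(built) > budget or (built, budget) in failed:
            return False
        pool = basis | built
        for a, b in product(pool, repeat=2):
            joined = a + b
            if joined in useful and joined not in built:
                if search(built | {joined}, budget - 1):
                    return True
        failed.add((built, budget))  # memoize dead ends
        return False

    # Iterative deepening: the first budget that succeeds is the minimum.
    depth = 0
    while not search(frozenset(), depth):
        depth += 1
    return depth
```

For example, `assembly_index({"ABAB"})` is 2 and `assembly_index({"ABABAB"})` is 3, yet `assembly_index({"ABAB", "ABABAB"})` is also 3, because ABAB already lies on the pathway to ABABAB. This saving from shared intermediates is the intuition behind building a joint space over several molecules.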
- src/: Core source code for the project.
  - assembler/: Modules for molecular assembly, including Joint Assembly Space (JAS) and molecule generation.
  - mol_filter/: Tools for filtering molecules based on geometry and properties.
  - plotting/: Visualization utilities.
- notebooks/: Jupyter notebooks demonstrating workflows (e.g., JAS creation, molecular generation).
- data/: Example datasets and figures.
This project uses uv for dependency management.
- Clone the repository:
    git clone https://github.com/croningp/molecular_spaces
    cd molecular_spaces
- Install dependencies:
    uv sync
This command automatically creates the virtual environment and installs the project in editable mode along with all dependencies.
To run the full assembly calculations, you will need AssemblyGo or AssemblyCpp installed and in your PATH.
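A quick way to confirm that an assembly calculator is visible on your PATH is a small standard-library check like the sketch below. The executable names in `CANDIDATE_TOOLS` are hypothetical placeholders taken from the tool names above; the actual binary names depend on how AssemblyGo or AssemblyCpp was built and installed, so adjust them as needed.

```python
from __future__ import annotations

import shutil

# Hypothetical executable names: replace with the binary names your
# AssemblyGo / AssemblyCpp installation actually provides.
CANDIDATE_TOOLS = ("AssemblyGo", "AssemblyCpp")


def find_assembly_tools(names: tuple[str, ...] = CANDIDATE_TOOLS) -> dict[str, str | None]:
    """Map each executable name to its resolved path on PATH, or None if absent."""
    return {name: shutil.which(name) for name in names}


# Usage: report which tools were found.
for tool, path in find_assembly_tools().items():
    print(f"{tool}: {path or 'not found on PATH'}")
```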
The easiest way to generate new molecules is with the molecular-assembler command-line tool, run either through Docker or locally with uv. The file data/tutorial/example_molecules/opiods.txt contains example molecules from which to construct a JAS. Be aware that if you provide a large set of molecules to generate from, the JAS calculation can become computationally heavy.
- Build the image:
    docker build -t mol-spaces .
- Run the CLI tool, mounting your current directory ($(pwd)) so that local files can be processed:
    docker run --rm -v $(pwd):/app mol-spaces \
      molecular-assembler -f data/tutorial/example_molecules/opiods.txt -o out.smi
Requires AssemblyCpp to be installed and in your PATH.
- Install with uv:
    uv sync
- Run the tool:
    uv run molecular-assembler -f data/tutorial/example_molecules/opiods.txt -o out.smi
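If you want to drive the generator from a Python script or notebook, a thin wrapper around the uv command above is enough. This is a minimal sketch: it only uses the -f (input) and -o (output) flags shown above, and it assumes the output .smi file holds one molecule per line, with the SMILES as the first whitespace-separated token (the common .smi convention). The helper names are hypothetical, not part of the project's API.

```python
from __future__ import annotations

import subprocess
from pathlib import Path


def build_command(input_file: str, output_file: str) -> list[str]:
    """Assemble the `uv run molecular-assembler -f ... -o ...` call."""
    return ["uv", "run", "molecular-assembler", "-f", input_file, "-o", output_file]


def read_smiles(path: str | Path) -> list[str]:
    """Read SMILES from a .smi file, assuming one molecule per line (SMILES first)."""
    smiles = []
    for line in Path(path).read_text().splitlines():
        line = line.strip()
        if line:  # skip blank lines
            smiles.append(line.split()[0])
    return smiles


def generate(input_file: str, output_file: str = "out.smi") -> list[str]:
    """Run the generator (raises CalledProcessError on failure) and load its output."""
    subprocess.run(build_command(input_file, output_file), check=True)
    return read_smiles(output_file)
```

For example, `generate("data/tutorial/example_molecules/opiods.txt")` would run the same command as above and return the generated SMILES as a list of strings.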
The project requires Python 3.12+. Key dependencies include:
- rdkit
- networkx
- pandas
- matplotlib/seaborn
- python-igraph
See pyproject.toml for the full list.
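After `uv sync`, a short sanity check like the sketch below can confirm that the interpreter meets the Python 3.12+ requirement and that the key dependencies listed above are importable. It assumes the usual import names for these packages (python-igraph, for instance, is imported as `igraph`); pyproject.toml remains the authoritative list.

```python
from __future__ import annotations

import importlib.util
import sys

# Import names of the key dependencies listed above (python-igraph -> igraph).
KEY_MODULES = ("rdkit", "networkx", "pandas", "matplotlib", "seaborn", "igraph")


def check_environment(modules: tuple[str, ...] = KEY_MODULES) -> dict[str, bool]:
    """Return {module: importable?}, warning if Python is older than 3.12."""
    if sys.version_info < (3, 12):
        print(f"Warning: Python {sys.version.split()[0]} found; the project requires 3.12+.")
    # find_spec returns None for a missing top-level module instead of raising.
    return {name: importlib.util.find_spec(name) is not None for name in modules}
```

Running `uv run python -c "..."` with this check inside the project environment ensures you test the same interpreter that uv manages.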
The large dataset (>1 TB) used to generate the figures in the manuscript is available on reasonable request.
If you use this code or the data in your research, please cite:
Pagel, S., Sharma, A., & Cronin, L. (2024). Mapping Evolution of Molecules Across Biochemistry with Assembly Theory. arXiv preprint arXiv:2409.05993.
BibTeX entry:
@article{pagel2024mapping,
title={Mapping Evolution of Molecules Across Biochemistry with Assembly Theory},
author={Pagel, Sebastian and Sharma, Abhishek and Cronin, Leroy},
journal={arXiv preprint arXiv:2409.05993},
year={2024},
url={https://arxiv.org/abs/2409.05993}
}

All source code is made available under a BSD 3-clause license. You can freely
use and modify the code, without warranty, so long as you provide attribution
to the authors. See LICENSE.md for the full license text.
The manuscript text is not open source. The authors reserve the rights to the article content.
