Python package for generating synthetic datasets of the cellular context for Cryo-Electron Tomography.
- IMOD must be installed on the system, since PolNet calls some of its standalone commands: https://bio3d.colorado.edu/imod/doc/guide.html
- Miniconda or Anaconda with Python 3.
- Git.
- IMOD can be used to visualize MRC files. Paraview can be used to visualize VTK (.vtp) files. Pandas is recommended for managing the CSV files.
Here is how to get it installed:
- Download PolNet source code:
    git clone https://github.com/anmartinezs/polnet.git
    cd polnet
- Create a conda virtual environment:
    conda create --name polnet pip
    conda activate polnet
- Install PolNet package with its requirements:
    pip install -e .
Developers who do not want to install PolNet as a package in the virtual environment can install only the requirements:
pip install -r requirements.txt
All requirements are listed in the requirements.txt file (JAX is optional).
The installation has been tested on Ubuntu 22.04 and on Windows 10 and 11.
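Because PolNet relies on IMOD commands being available on the system, it can help to confirm they are reachable from the activated environment before starting a long simulation. The following is a minimal sketch using only the Python standard library. The helper name `find_missing_tools` is hypothetical, and the command names in the example call (`xyzproj`, `tilt`) are IMOD programs picked as an assumption, since this README does not list which commands PolNet invokes.

```python
import shutil


def find_missing_tools(names):
    """Return the command names that cannot be located on the PATH."""
    return [name for name in names if shutil.which(name) is None]


if __name__ == "__main__":
    # Assumed examples of IMOD programs; adjust to the commands you need.
    missing = find_missing_tools(["xyzproj", "tilt"])
    if missing:
        print("Not found on PATH:", ", ".join(missing))
    else:
        print("All listed commands were found.")
```

If any command is reported as missing, check that the IMOD startup script has been sourced in the same shell where the conda environment is active.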
This repository contains a Python script for generating tomograms simulating various features such as membranes, helicoidal fibers, globular protein clusters, and more. The script is highly configurable and allows users to specify parameters via command-line arguments.
Simulates tomograms with:
- Membranes (spherical, elliptical, toroidal)
- Helicoidal fibers (actin, microtubules)
- Globular protein clusters
- Membrane-bound proteins
Outputs:
- Simulated density maps (.mrc)
- Polydata files (.vtp)
- STAR file mapping particle coordinates and orientations with tomograms
- Configurable via command-line arguments
- Includes logging and input file previews
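For downstream processing, the STAR output listed above can be loaded without extra dependencies. The sketch below parses the first `loop_` table of a STAR document into a list of dictionaries. It is a simplified reader, not the parser PolNet uses: it assumes whitespace-separated values without quoted strings, and the column labels and tomogram file names in the example are RELION-style placeholders, since the actual columns written by the script are not documented here.

```python
def read_star_loop(text):
    """Parse the first loop_ table of a STAR document into a list of row dicts."""
    columns, rows, in_loop = [], [], False
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("#"):
            continue  # skip comment lines
        if not in_loop:
            in_loop = line == "loop_"
            continue
        if not line or line.startswith(("data_", "loop_")):
            if rows:
                break  # the first table has ended
            continue
        if line.startswith("_"):
            # Labels look like "_name #index"; keep only the name.
            columns.append(line.split()[0].lstrip("_"))
        else:
            rows.append(dict(zip(columns, line.split())))
    return rows


if __name__ == "__main__":
    # Hypothetical content; real files may use different columns.
    example = """data_

loop_
_rlnMicrographName #1
_rlnCoordinateX #2
tomo_0.mrc 10.5
tomo_1.mrc 3.0
"""
    for row in read_star_loop(example):
        print(row)
```

All values are returned as strings, so coordinates need an explicit `float(...)` conversion before numerical use.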
python all_features_argument.py
python all_features_argument.py --out_dir /path/to/output
python all_features_argument.py --voi_shape 1024 1024 250 --ntomos 10
python all_features_short_sn_parallel.py --out_dir /path/to/output
The script automatically previews input files and saves them as .tar.gz archives in the output directory.
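To illustrate how flags such as `--out_dir`, `--voi_shape` and `--ntomos` can be declared, here is a minimal `argparse` sketch. Only the three flag names come from the commands above. The default values, help strings and the function name `build_parser` are hypothetical and do not reflect the actual defaults of `all_features_argument.py`.

```python
import argparse


def build_parser():
    """Build a parser mirroring the flags used in the example commands."""
    parser = argparse.ArgumentParser(description="Sketch of a tomogram simulation CLI")
    parser.add_argument("--out_dir", default="./out",
                        help="output directory (hypothetical default)")
    parser.add_argument("--voi_shape", type=int, nargs=3, default=[400, 400, 100],
                        metavar=("X", "Y", "Z"),
                        help="volume shape in voxels (hypothetical default)")
    parser.add_argument("--ntomos", type=int, default=1,
                        help="number of tomograms to simulate (hypothetical default)")
    return parser


if __name__ == "__main__":
    # Parse an explicit argument list equivalent to the third example command.
    args = build_parser().parse_args(["--voi_shape", "1024", "1024", "250", "--ntomos", "10"])
    print(args.out_dir, args.voi_shape, args.ntomos)
```

Using `nargs=3` with `type=int` makes the parser reject a shape with the wrong number of dimensions or with non-integer values before any simulation work starts.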
- Simulated density maps (.mrc)
- Polydata files (.vtp)
- STAR file mapping particle coordinates and orientations with tomograms
- simulation-output_<job_id>.log: general log messages
- simulation_<job_id>_error.log: error messages
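One way to obtain a general log plus a separate error log is to attach two file handlers with different levels to the same logger. The sketch below follows the two file name patterns given above. The function name `setup_job_logging`, the logger name and the message format are assumptions for illustration, not the script's actual implementation.

```python
import logging
import os


def setup_job_logging(out_dir, job_id):
    """Attach a general and an error-only file handler to a per-job logger."""
    os.makedirs(out_dir, exist_ok=True)
    logger = logging.getLogger(f"simulation.{job_id}")  # hypothetical logger name
    logger.setLevel(logging.INFO)
    logger.propagate = False  # keep messages out of the root logger

    fmt = logging.Formatter("%(asctime)s %(levelname)s %(message)s")
    general = logging.FileHandler(os.path.join(out_dir, f"simulation-output_{job_id}.log"))
    general.setLevel(logging.INFO)  # receives INFO and above
    errors = logging.FileHandler(os.path.join(out_dir, f"simulation_{job_id}_error.log"))
    errors.setLevel(logging.ERROR)  # receives ERROR and above only
    for handler in (general, errors):
        handler.setFormatter(fmt)
        logger.addHandler(handler)
    return logger
```

With this arrangement an error message appears in both files, while informational messages appear only in the general log.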
The script generates detailed statistics for each simulated tomogram, including:
- Number of membranes, actin, microtubules, proteins, and membrane-bound proteins
- Volume occupied by each feature
- Total time taken for the simulation
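How the occupied volume is reported is not specified here. As a rough illustration, if it is available as a voxel count per feature, it can be converted into a percentage of the whole volume by dividing by the product of the volume dimensions. The function name and the voxel counts below are hypothetical.

```python
from math import prod


def occupancy_percent(voxel_counts, voi_shape):
    """Convert per-feature voxel counts into percentages of the total volume."""
    total = prod(voi_shape)
    return {name: 100.0 * count / total for name, count in voxel_counts.items()}


if __name__ == "__main__":
    # Hypothetical voxel counts for a 1024 x 1024 x 250 volume.
    counts = {"membranes": 2_621_440, "actin": 524_288}
    print(occupancy_percent(counts, (1024, 1024, 250)))
```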
The docs folder contains the file default_settings.pdf, which describes the default settings of scripts/data_gen/all_features.py, the script with hardcoded parameters for generating synthetic tomograms.
In addition, the table in docs/molecules_table.md gives more detailed descriptions of the PDB models used to create the macromolecular models provided in the data folder.
- polnet: Python package implementing the functionality for generating the synthetic data.
- gui: set of Jupyter notebooks providing a Graphical User Interface (GUI).
  - core: functionality required by the notebooks.
- scripts: Python scripts for generating different types of synthetic datasets. Folders:
  - data_gen: scripts for data generation.
    - deprecated: contains some scripts for evaluations carried out during software development. They are not prepared for external users because some hardcoded paths need to be modified.
    - templates: scripts for building the structural units for macromolecules (requires EMAN2 to be installed). Their usage is strongly discouraged, since the GUI notebooks now include all of this functionality.
  - csv: scripts for postprocessing the generated CSV files.
  - data_prep: script to convert the generated dataset to nnU-Net format.
- tests: unit tests for the functionality in polnet. The script tests/test_transformations.py requires generating at least one output tomogram with the script scripts/all_features.py and modifying the hardcoded input paths, because the input data are too large to upload to the repository.
- data: contains input data, mainly macromolecule densities and configuration input files, that can be used to simulate tomograms. These are the default inputs; a user can add, remove, or modify them using the GUI notebooks.
  - in_10A: input models for macromolecules at 10 Å voxel size.
  - in_helix: input models for helical structures.
  - in_mbsx: input models for membrane structures.
  - templates: atomic models and density maps used by macromolecular models.
- docs:
  - API documentation.
  - A PDF with the supplementary material for [1], containing the following tables:
    - Glossary of acronyms by order of appearance in the main text.
    - Glossary of mathematical symbols defined in the main text, organized by scope.
    - Table of variables used by the input files to model the generators.
    - Table with the structures used to simulate the cellular context.
The API documentation for the polnet Python package is available in docs/apidoc/index.html
[1] Martinez-Sanchez A.*, Lamm L., Jasnin M. and Phelippeau H. (2024) "Simulating the cellular context in synthetic datasets for cryo-electron tomography", IEEE Transactions on Medical Imaging, doi: 10.1109/TMI.2024.3398401