# Physics-informed modeling and visualization of cosmic structure evolution
- Overview
- Highlights
- Quick start
- Installation & Dependencies
- Data format & schema
- Project structure
- Model architecture
- Training & Evaluation
- Visualization (UI)
- Experiments & reproducibility
- Development workflow
- Roadmap & future work
- Cite / License / Contact
## Overview

This repository implements a Space-Time Graph Neural Network (ST-GNN) for modeling the large-scale structure of the universe. Galaxies are treated as nodes in a dynamic, time-evolving graph and the model predicts their spatio-kinematic evolution while enforcing physics-informed constraints (e.g., mass conservation and kinetic energy regularization).
The codebase contains a reproducible training pipeline, lightweight EDA notebooks, a Three.js-based visualization for results, and utilities to reproduce experiments.
## Highlights

- Physics-informed loss terms (mass conservation, velocity smoothness, kinetic energy regularization). 🔬
- Encoder–GRU–Decoder ST-GNN combining spatial GCN layers and a temporal GRU core. ⏱️
- Interactive 3D visualization using Three.js (UI served from `ui/`). 🖥️
- Clean separation between data, model, training, and UI for reproducibility. 🔁
## Quick start

- Clone repository

```bash
git clone <repo-url>
cd st-gnn-cosmic-structure
```

- Create environment and install

Windows:

```bash
python -m venv venv
venv\Scripts\activate
pip install -r requirements.txt
```

Unix/macOS:

```bash
python -m venv venv
source venv/bin/activate
pip install -r requirements.txt
```

- Run training (example)

```bash
python main.py --config configs/train_default.yaml
```

- Start visualization UI

```bash
cd ui
python -m http.server 8000
# then open http://localhost:8000 in your browser
```

Notes:

- For quick experiments you can use the small dataset in `data/raw/galaxies.csv`, or provide a path to a larger SDSS-style CSV.
- Results and predictions are stored under `experiments/` by default.
## Installation & Dependencies

Primary dependencies are listed in `requirements.txt`. Key packages:
- torch
- torch-geometric
- pandas
- numpy
- matplotlib
For GPU training install the appropriate PyTorch wheel for your CUDA version: https://pytorch.org/get-started/locally/
## Data format & schema

Expected input: an SDSS-style CSV with one row per galaxy per timestamp (or snapshot). Minimal required columns:

- `id` (optional but recommended): unique galaxy identifier
- `ra`, `dec`: sky coordinates (degrees)
- `redshift`: temporal proxy
- `mass`: galaxy mass (float)
- `luminosity`: optional
- `vx`, `vy`, `vz`: velocity components (km/s)

Example row:

```csv
ra,dec,redshift,mass,luminosity,vx,vy,vz
210.8,54.3,0.12,1.3e10,2.1e9,120,-30,45
```
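As a minimal sketch of loading and validating such a file (the helper name and required-column list are illustrative, not the repository's exact code):

```python
import io
import pandas as pd

# Hypothetical validation helper mirroring what a loader like
# src/data/load_sdss.py might do; only the column names come from the schema.
REQUIRED_COLUMNS = ["ra", "dec", "redshift", "mass", "vx", "vy", "vz"]

def validate_catalog(df: pd.DataFrame) -> pd.DataFrame:
    """Fail fast on missing columns, then drop rows with missing values."""
    missing = [c for c in REQUIRED_COLUMNS if c not in df.columns]
    if missing:
        raise ValueError(f"missing required columns: {missing}")
    return df.dropna(subset=REQUIRED_COLUMNS)

csv_text = """ra,dec,redshift,mass,luminosity,vx,vy,vz
210.8,54.3,0.12,1.3e10,2.1e9,120,-30,45
"""
catalog = validate_catalog(pd.read_csv(io.StringIO(csv_text)))
print(len(catalog))  # 1
```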
Data loader responsibilities (`src/data/load_sdss.py`):

- validation of required columns
- normalization/scaling of physical units
- conversion of sky coordinates to 3D Cartesian positions (optional)
- building spatial graphs (e.g., k-NN, radius search) via `src/graph/build_graph.py`
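The coordinate conversion and k-NN graph construction can be sketched as follows. This is an illustrative NumPy version, not the repository's implementation: it assumes a low-redshift approximation (distance ≈ c·z / H0) and an assumed H0, and builds a brute-force k-NN edge list:

```python
import numpy as np

C_KM_S = 299_792.458   # speed of light (km/s)
H0 = 70.0              # Hubble constant (km/s/Mpc), assumed value

def sky_to_cartesian(ra_deg, dec_deg, z):
    """Approximate 3D positions (Mpc) from ra/dec (deg) and redshift."""
    r = C_KM_S * np.asarray(z) / H0          # low-z comoving-distance proxy
    ra, dec = np.radians(ra_deg), np.radians(dec_deg)
    return np.stack([r * np.cos(dec) * np.cos(ra),
                     r * np.cos(dec) * np.sin(ra),
                     r * np.sin(dec)], axis=-1)

def knn_edges(pos, k=2):
    """Return a (2, N*k) edge-index array of each node's k nearest neighbours."""
    d = np.linalg.norm(pos[:, None, :] - pos[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)              # exclude self-loops
    nbrs = np.argsort(d, axis=1)[:, :k]      # k closest per node
    src = np.repeat(np.arange(len(pos)), k)
    return np.stack([src, nbrs.ravel()])

pos = sky_to_cartesian([210.8, 211.0, 212.5], [54.3, 54.1, 53.9],
                       [0.12, 0.121, 0.13])
edges = knn_edges(pos, k=2)
print(edges.shape)  # (2, 6)
```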
## Project structure

```
├── data/            # raw datasets
│   └── raw/galaxies.csv
├── notebooks/       # EDA and quick visualization
├── src/             # main code
│   ├── data/        # loaders & preprocessing
│   ├── graph/       # graph builders (kNN, radius graphs)
│   ├── models/      # ST-GNN implementation
│   ├── physics/     # physics-inspired loss terms
│   └── training/    # training loop, schedulers, metrics
├── ui/              # Three.js visualizer
├── experiments/     # predictions, checkpoints, logs
├── main.py          # high-level entrypoint
└── README.md
```
## Model architecture

The ST-GNN follows an Encoder → Temporal Module → Decoder pattern:
- Encoder: GCN (graph convolution) layers transform raw node features into latent embeddings using local neighbor information.
- Temporal Module: a GRU operates on per-node embeddings across time steps to capture temporal dynamics.
- Decoder: GCN layers map temporal embeddings back into target physical quantities (position/velocity/mass predictions).
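The pattern above can be sketched in plain PyTorch. This is a minimal illustration, not the repository's `src/models/` code: it substitutes a dense normalized adjacency for proper torch-geometric layers, and all dimensions and names are assumptions:

```python
import torch
import torch.nn as nn

class DenseGCNLayer(nn.Module):
    """Simplified graph convolution: Linear(adj @ x), dense adjacency stand-in."""
    def __init__(self, in_dim, out_dim, act=True):
        super().__init__()
        self.lin = nn.Linear(in_dim, out_dim)
        self.act = act
    def forward(self, x, adj):
        h = self.lin(adj @ x)                    # aggregate neighbours, then project
        return torch.relu(h) if self.act else h

class STGNN(nn.Module):
    """Encoder (GCN) -> Temporal (GRU over snapshots) -> Decoder (GCN)."""
    def __init__(self, in_dim, hidden_dim, out_dim):
        super().__init__()
        self.encoder = DenseGCNLayer(in_dim, hidden_dim)
        self.gru = nn.GRU(hidden_dim, hidden_dim, batch_first=True)
        self.decoder = DenseGCNLayer(hidden_dim, out_dim, act=False)
    def forward(self, x_seq, adj):
        # x_seq: (T, N, F) node features over T snapshots; adj: (N, N)
        h_seq = torch.stack([self.encoder(x, adj) for x in x_seq])  # (T, N, H)
        h_seq, _ = self.gru(h_seq.transpose(0, 1))  # per-node sequences (N, T, H)
        return self.decoder(h_seq[:, -1], adj)      # predict next state (N, out)

T, N, F = 4, 10, 8
adj = torch.eye(N)                  # stand-in for a normalized adjacency matrix
model = STGNN(F, 16, 7)             # e.g., predict pos(3) + vel(3) + mass(1)
out = model(torch.randn(T, N, F), adj)
print(out.shape)  # torch.Size([10, 7])
```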
The total loss is:

```
Loss = PredictionLoss (L2)
     + λ1 · MassConservationLoss
     + λ2 · VelocitySmoothnessLoss
     + λ3 · EnergyRegularization
```
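One plausible form of these terms, as a NumPy sketch for readability (the repository's actual definitions live in `src/physics/` and operate on tensors; the λ defaults and shapes here are assumptions):

```python
import numpy as np

def physics_informed_loss(pred_v, true_v, pred_m, true_m, prev_v,
                          lam_mass=1.0, lam_smooth=0.1, lam_energy=0.01):
    """Illustrative combined loss over per-galaxy velocities (N, 3) and masses (N,)."""
    prediction = np.mean((pred_v - true_v) ** 2)                   # L2 on velocities
    mass_cons = (pred_m.sum() - true_m.sum()) ** 2                 # total-mass drift
    smooth = np.mean(np.sum((pred_v - prev_v) ** 2, axis=-1))      # velocity smoothness
    energy = np.mean(0.5 * pred_m * np.sum(pred_v ** 2, axis=-1))  # kinetic energy
    return (prediction + lam_mass * mass_cons
            + lam_smooth * smooth + lam_energy * energy)

rng = np.random.default_rng(0)
v = rng.normal(size=(5, 3))
m = np.abs(rng.normal(size=5))
loss = physics_informed_loss(v, v, m, m, v)  # perfect prediction, no drift
print(loss > 0)  # True: only the energy regularizer is non-zero
```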
Hyperparameters and variants can be configured in the `configs/` directory (add config support if missing).
## Training & Evaluation

- Entry point: `main.py` (or `src/training/train.py` for lower-level usage).
- Typical command:

```bash
python main.py --config configs/train_default.yaml
```

- Key training details:
  - Optimizer: Adam
  - Learning rate scheduling and early stopping support
  - Logging to `experiments/<run-id>/`, including checkpoints and predictions
Evaluation metrics include MSE on velocities/positions and physics-aware metrics (mass conservation error, energy drift).
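The physics-aware metrics could look like the following hypothetical helpers (the repository's exact normalizations may differ):

```python
import numpy as np

def mass_conservation_error(pred_m, true_m):
    """Relative error in total mass per snapshot; masses shaped (T, N)."""
    return np.abs(pred_m.sum(axis=-1) - true_m.sum(axis=-1)) / true_m.sum(axis=-1)

def energy_drift(m, v):
    """Relative change in total kinetic energy from first to last snapshot.

    m: (T, N) masses, v: (T, N, 3) velocities.
    """
    ke = 0.5 * np.sum(m * np.sum(v ** 2, axis=-1), axis=-1)  # (T,) total KE
    return (ke[-1] - ke[0]) / ke[0]

T, N = 3, 4
m = np.ones((T, N))
v = np.ones((T, N, 3))
print(mass_conservation_error(m, m))  # [0. 0. 0.]
print(energy_drift(m, v))             # 0.0
```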
Tips:
- Use smaller subsets to validate pipelines before full runs.
- Fix random seeds (`torch.manual_seed`, `numpy.random.seed`) for reproducibility.
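A seeding helper along these lines covers the common RNGs (the function name is illustrative; the `torch` call is guarded so the snippet also runs without it installed):

```python
import random
import numpy as np

def set_seed(seed: int = 42):
    """Seed Python, NumPy, and (if available) PyTorch RNGs."""
    random.seed(seed)
    np.random.seed(seed)
    try:
        import torch
        torch.manual_seed(seed)  # seeds CPU (and CUDA on recent versions)
    except ImportError:
        pass                     # torch not installed; NumPy/stdlib still seeded

set_seed(0)
a = np.random.rand(3)
set_seed(0)
b = np.random.rand(3)
print(np.allclose(a, b))  # True: same seed, same draws
```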
## Visualization (UI)

The UI is a lightweight Three.js app in `ui/` that consumes `experiments/<run-id>/predictions.json` and renders:

- 3D galaxy positions (time slider for redshift snapshots)
- Size or color mapped to `mass` or `luminosity`
- Camera controls and play/pause
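The exact `predictions.json` schema is defined by the training code; purely as a hypothetical illustration, a shape like the following would carry everything the renderer needs (all field names here are assumptions):

```python
import json

# Hypothetical layout for experiments/<run-id>/predictions.json:
# snapshots keyed by redshift, each listing per-galaxy state.
predictions = {
    "snapshots": [
        {
            "redshift": 0.12,
            "galaxies": [
                {"id": 0, "pos": [10.2, -3.1, 44.0],
                 "vel": [120, -30, 45], "mass": 1.3e10},
            ],
        }
    ]
}

payload = json.dumps(predictions)        # what training would write to disk
restored = json.loads(payload)           # what the UI would fetch and parse
print(restored["snapshots"][0]["redshift"])  # 0.12
```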
Run locally:

```bash
cd ui
python -m http.server 8000
# open http://localhost:8000
```

## Experiments & reproducibility

- All experiment outputs (predictions, metrics, checkpoints) should be saved under `experiments/<run-id>/`.
- Keep one config file per experiment in `configs/` and commit it so runs can be reproduced.
- To reproduce a past run, copy the config and checkpoint from `experiments/<run-id>/` and run:

```bash
python main.py --config experiments/<run-id>/config.yaml --checkpoint experiments/<run-id>/checkpoint.pt
```

## Development workflow

- Write unit tests for new preprocessing/model code.
- Keep notebooks thin; move complex logic into `src/`.
- Use deterministic seeds for CI tests that depend on model outputs.
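As an example of the testing convention, here is a minimal pytest-style check; the preprocessing helper under test is a stand-in, not actual repository code:

```python
import numpy as np

# Stand-in preprocessing helper; real code would live under src/data/.
def normalize_masses(masses):
    """Log-scale galaxy masses, then standardize to zero mean, unit std."""
    logm = np.log10(np.asarray(masses, dtype=float))
    return (logm - logm.mean()) / logm.std()

# pytest collects functions named test_*; run the suite with `pytest`.
def test_normalize_masses_zero_mean_unit_std():
    out = normalize_masses([1e9, 1e10, 1e11])
    assert np.isclose(out.mean(), 0.0)
    assert np.isclose(out.std(), 1.0)

test_normalize_masses_zero_mean_unit_std()
print("ok")
```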
Suggested tools: pytest, black, flake8.
## Roadmap & future work

Planned improvements:
- Gravity-weighted / physics-driven edge weights
- Support for longer multi-timestep sequences and long-range temporal dependencies
- Full SDSS/DES ingestion pipeline
- Enhanced UI controls (time slider, selection, coloring options)
- Model compression / quantization for faster inference
## Cite / License / Contact

- Author: Anurag Lal