Commit e380919

Add data download instructions
1 parent 864cd14 commit e380919

2 files changed: +80 -8 lines changed


README.md

Lines changed: 78 additions & 8 deletions
@@ -1,6 +1,6 @@
 # SpaceCast
 
-[![arXiv](https://img.shields.io/badge/arXiv-2509.19605-b31b1b.svg)](https://arxiv.org/abs/2509.19605) [![Linting](https://github.com/fmihpc/spacecast/actions/workflows/pre-commit.yml/badge.svg)](https://github.com/fmihpc/spacecast/actions/workflows/pre-commit.yml)
+[![arXiv](https://img.shields.io/badge/arXiv-2509.19605-b31b1b.svg)](https://arxiv.org/abs/2509.19605) [![huggingface](https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Dataset-blue)](https://huggingface.co/datasets/deinal/spacecast-data) [![Linting](https://github.com/fmihpc/spacecast/actions/workflows/pre-commit.yml/badge.svg)](https://github.com/fmihpc/spacecast/actions/workflows/pre-commit.yml)
 
 ![](figures/example_forecast.png)
 
@@ -23,25 +23,81 @@ Use Python 3.10 / 3.11 and
 
 Complete list of packages can be installed with `pip install -r requirements.txt`.
 
+For linting further install `pre-commit install --install-hooks`. Then you can run `pre-commit run --all-files`.
+
+## Quickstart
+
+A small subset of the data is available for easy experimentation. Download with:
+```
+from huggingface_hub import snapshot_download
+
+snapshot_download(
+    repo_id="deinal/spacecast-data-small",
+    repo_type="dataset",
+    local_dir="data_small"
+)
+```
+
+Training can then be run immediately on the preprocessed data with readily available graphs.
+```
+python -m neural_lam.train_model \
+    --config_path data_small/vlasiator_config.yaml \
+    --model graph_efm \
+    ...
+```
+
 ## Data
 
-The data is stored in `Zarr` format on [Zenodo](https://zenodo.org/records/16930055). To create a training-ready dataset with [mllam-data-prep](https://github.com/mllam/mllam-data-prep), run:
+The data is stored in [Zarr](https://zarr.dev) format on [Hugging Face](https://huggingface.co/datasets/deinal/spacecast-data).
+
+It can be downloaded to a local `data` directory with:
 ```
-mllam_data_prep data/vlasiator_mdp.yaml
+from huggingface_hub import snapshot_download
+
+snapshot_download(
+    repo_id="deinal/spacecast-data",
+    repo_type="dataset",
+    local_dir="data"
+)
+```
+
+The folder will then follow the assumed structure of neural-lam:
+```
+data/
+├── graph/ - Directory containing graphs for training
+├── run_1.zarr/ - Vlasiator run 1 with ρ = 0.5 cm⁻³ solar wind
+├── run_2.zarr/ - Vlasiator run 2 with ρ = 1.0 cm⁻³ solar wind
+├── run_3.zarr/ - Vlasiator run 3 with ρ = 1.5 cm⁻³ solar wind
+├── run_4.zarr/ - Vlasiator run 4 with ρ = 2.0 cm⁻³ solar wind
+├── static.zarr/ - Static features x, z, r coordinates
+├── vlasiator_config.yaml - Configuration file for neural-lam
+├── vlasiator_run_1.yaml - Configuration file for datastore 1, referred to from vlasiator_config.yaml
+├── vlasiator_run_2.yaml - Configuration file for datastore 2, referred to from vlasiator_config.yaml
+├── vlasiator_run_3.yaml - Configuration file for datastore 3, referred to from vlasiator_config.yaml
+└── vlasiator_run_4.yaml - Configuration file for datastore 4, referred to from vlasiator_config.yaml
 ```
 
-Simple, multiscale, and hierarchical graphs are created and stored in `.pt` format using the following commands:
+Preprocess the runs with [mllam-data-prep](https://github.com/mllam/mllam-data-prep), run:
 ```
-python -m neural_lam.create_graph --config_path data/vlasiator_config.yaml --name simple --levels 1 --plot
-python -m neural_lam.create_graph --config_path data/vlasiator_config.yaml --name multiscale --levels 3 --plot
-python -m neural_lam.create_graph --config_path data/vlasiator_config.yaml --name hierarchical --hierarchical --levels 3 --plot
+mllam_data_prep data/vlasiator_run_1.yaml
+mllam_data_prep data/vlasiator_run_2.yaml
+mllam_data_prep data/vlasiator_run_3.yaml
+mllam_data_prep data/vlasiator_run_4.yaml
+```
+This produces training-ready zarr stores in the data directory.
+
+Simple, multiscale, and hierarchical graphs are included already, but can be created using the following commands:
+```
+python -m neural_lam.create_graph --config_path data/vlasiator_config.yaml --name simple --levels 1 --coarsen-factor 5 --plot
+python -m neural_lam.create_graph --config_path data/vlasiator_config.yaml --name multiscale --levels 3 --coarsen-factor 5 --plot
+python -m neural_lam.create_graph --config_path data/vlasiator_config.yaml --name hierarchical --levels 3 --coarsen-factor 5 --hierarchical --plot
 ```
 
 To plot the graphs and store as `.html` files run:
 ```
 python -m neural_lam.plot_graph --datastore_config_path data/vlasiator_config.yaml --graph ...
 ```
-with `--graph` as `simple`, `multiscale` or `hierarchcial` and `--save` is the name of the output file.
+with `--graph` as `simple`, `multiscale` or `hierarchcial` and `--save` specifies the name of the output file.
 
 ## Logging
 
@@ -107,6 +163,19 @@ where a model checkpoint from a given path given to the `--load` in `.ckpt` form
 
 ## Cite
 
+ML dataset
+```
+@misc{vlasiator2025mldata,
+    title={Vlasiator Dataset for Machine Learning Studies},
+    author={Zaitsev, Ivan and Holmberg, Daniel and Alho, Markku and Bouri, Ioanna and Franssila, Fanni and Jeong, Haewon and Palmroth, Minna and Roos, Teemu},
+    year={2025},
+    publisher={Hugging Face},
+    url={https://huggingface.co/datasets/deinal/spacecast-data},
+    doi={10.57967/hf/7027},
+}
+```
+
+ML4PS paper
 ```
 @inproceedings{holmberg2025graph,
     title={Graph-based Neural Space Weather Forecasting},
@@ -115,3 +184,4 @@ where a model checkpoint from a given path given to the `--load` in `.ckpt` form
     year={2025}
 }
 ```
+This work is based on code using a single run dataloader at commit: https://github.com/fmihpc/spacecast/commit/937094079c1364ec484d3d1647e758f4a388ad97.
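
The README change above lists the directory layout expected after the download. As a minimal sketch, assuming the download went to a local `data` directory, the listing can be turned into a quick sanity check. The helper name `missing_entries` and the two constants are hypothetical and not part of the repository; the entry names themselves are copied from the listing in the diff.

```python
from pathlib import Path

# Entry names copied from the directory listing in the README diff.
# The constants and the helper below are illustrative, not repository code.
EXPECTED_DIRS = [
    "graph",
    "run_1.zarr",
    "run_2.zarr",
    "run_3.zarr",
    "run_4.zarr",
    "static.zarr",
]
EXPECTED_FILES = [
    "vlasiator_config.yaml",
    "vlasiator_run_1.yaml",
    "vlasiator_run_2.yaml",
    "vlasiator_run_3.yaml",
    "vlasiator_run_4.yaml",
]


def missing_entries(data_dir="data"):
    """Return the expected entries that are absent from data_dir."""
    root = Path(data_dir)
    # Directories are reported with a trailing slash, as in the listing.
    missing = [f"{name}/" for name in EXPECTED_DIRS if not (root / name).is_dir()]
    missing += [name for name in EXPECTED_FILES if not (root / name).is_file()]
    return missing


if __name__ == "__main__":
    gaps = missing_entries("data")
    print("layout OK" if not gaps else f"missing: {gaps}")
```

An empty result only says that the top-level names exist; it does not validate the contents of the Zarr stores or the YAML files.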

requirements.txt

Lines changed: 2 additions & 0 deletions
@@ -13,3 +13,5 @@ torch-geometric>=2.6.1
 parse>=1.20.2
 dataclass-wizard==0.35.0
 mllam-data-prep==0.6.1
+pre-commit==3.8.0
+huggingface-hub==0.27.0
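
With `mllam-data-prep` pinned in `requirements.txt`, the four preprocessing calls added to the README can be scripted rather than typed one by one. The sketch below is only an illustration: the helper names `prep_commands` and `run_all` are hypothetical, and it assumes the `mllam_data_prep` executable is on `PATH` and that the configs are named `vlasiator_run_<n>.yaml` as in the README. It defaults to a dry run that prints the commands without executing them.

```python
import subprocess


def prep_commands(data_dir="data", n_runs=4):
    """Build one mllam_data_prep invocation per run config (hypothetical helper)."""
    return [
        ["mllam_data_prep", f"{data_dir}/vlasiator_run_{i}.yaml"]
        for i in range(1, n_runs + 1)
    ]


def run_all(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for argv in commands:
        print(" ".join(argv))
        if not dry_run:
            # check=True stops at the first failing run instead of continuing.
            subprocess.run(argv, check=True)


if __name__ == "__main__":
    run_all(prep_commands())  # dry run: only prints the four commands
```

Calling `run_all(prep_commands(), dry_run=False)` would run the four commands in sequence, equivalent to the README's manual steps.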
