Commit 4af233d

Merge branch 'main' of github.com:FoundationModelsForScience/AstroCLIP
2 parents ead11ed + 46e9490 commit 4af233d

5 files changed

+2188
-226
lines changed

README.md

Lines changed: 47 additions & 7 deletions
@@ -1,18 +1,58 @@
11
# AstroCLIP
22
Multimodal contrastive pretraining for astronomical data
33

4-
![image](https://github.com/FoundationModelsForScience/AstroCLIP/assets/861591/306ca96f-009b-4983-9b1e-4d4880822ee0)
4+
<a href="https://arxiv.org/abs/2310.03024" style='vertical-align:middle; display:inline;'><img
5+
src="https://img.shields.io/badge/astro--ph.IM-arXiv%3A2310.03024-B31B1B.svg" class="plain" style="height:25px;" /></a>
56

67

7-
## Requirements
8+
The goal of this project is to demonstrate that contrastive pre-training between two astronomical data modalities (multi-band imaging and optical spectra) yields a meaningful embedding space that captures physical information about galaxies and is shared between both modalities.
89

9-
This repo should only have basic pytorch and huggingface requirements. The following should install all that is needed (so far)
10+
![image](assets/im_embedding.png)
1011

11-
```bash
12-
pip install datasets timm lightning
12+
## Results
13+
14+
We encourage you to take a look at our [NeurIPS 2023 AI4Science submission](https://arxiv.org/abs/2310.03024) (still under review) for a longer-form description of our results, but here are the main takeaways:
15+
- Both image and spectra encoders are able to extract meaningful physical information from the input data.
16+
- The embeddings of both images and spectra are well aligned, allowing us to retrieve spectra that correspond to a given image, and vice-versa.
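As a minimal sketch of what cross-modal retrieval means here, the snippet below ranks spectra by cosine similarity to each image embedding. It uses synthetic embeddings that are aligned by construction, not the AstroCLIP encoders, and the names `normalize`, `retrieve`, `image_emb`, and `spectrum_emb` are hypothetical and chosen for illustration only.

```python
import numpy as np


def normalize(x):
    """Scale each row to unit L2 norm so dot products equal cosine similarities."""
    return x / np.linalg.norm(x, axis=1, keepdims=True)


def retrieve(query_emb, gallery_emb, k=1):
    """Return, for each query row, the indices of the k most similar gallery rows."""
    sims = normalize(query_emb) @ normalize(gallery_emb).T
    # argsort sorts ascending, so negate to rank the highest similarity first
    return np.argsort(-sims, axis=1)[:, :k]


# Synthetic stand-ins for real embeddings (assumption: row i of each array
# describes the same galaxy, and the two modalities differ by small noise)
rng = np.random.default_rng(0)
spectrum_emb = rng.normal(size=(100, 16))
image_emb = spectrum_emb + 0.05 * rng.normal(size=(100, 16))

# For each image, find the closest spectrum and check that it is the true match
top1 = retrieve(image_emb, spectrum_emb, k=1)[:, 0]
accuracy = float(np.mean(top1 == np.arange(100)))
print(f"top-1 retrieval accuracy: {accuracy:.2f}")
```

Swapping the two arguments of `retrieve` gives the reverse direction, retrieving images from a spectrum.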
17+
18+
The notebook used to generate the plots of the paper can be found [here](notebooks/PaperPlots.ipynb).
19+
20+
Below is a visualization of the learned embeddings, obtained by taking the first two PCA components of the spectra and image embeddings. As one can see, images and spectra discover similar main factors of variation.
21+
![emb_pca](https://github.com/PolymathicAI/AstroCLIP/assets/861591/01475caa-8628-439b-8553-951074e287e2)
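The projection behind a plot like the one above can be sketched with a plain SVD-based PCA. This is an illustration on synthetic embeddings, not the code from the paper notebook. Fitting a single PCA on the stacked embeddings of both modalities is an assumption made here, and `fit_pca`, `project`, and the synthetic arrays are hypothetical names.

```python
import numpy as np


def fit_pca(x, n_components=2):
    """Return the mean of x and its top principal axes (as rows), computed via SVD."""
    mean = x.mean(axis=0)
    # Rows of vt are the principal axes, ordered by decreasing singular value
    _, _, vt = np.linalg.svd(x - mean, full_matrices=False)
    return mean, vt[:n_components]


def project(x, mean, axes):
    """Project rows of x onto the given principal axes."""
    return (x - mean) @ axes.T


# Synthetic embeddings driven by two shared latent factors (assumption for the demo)
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 2)) * np.array([5.0, 2.0])
mixing = rng.normal(size=(2, 32))
image_emb = latent @ mixing + 0.1 * rng.normal(size=(200, 32))
spectrum_emb = latent @ mixing + 0.1 * rng.normal(size=(200, 32))

# Fit one PCA on both modalities so they share the same 2D axes
mean, axes = fit_pca(np.vstack([image_emb, spectrum_emb]))
image_2d = project(image_emb, mean, axes)
spectrum_2d = project(spectrum_emb, mean, axes)
print(image_2d.shape, spectrum_2d.shape)
```

Fitting a separate PCA per modality is also possible, but the resulting axes are then only comparable up to sign and rotation.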
22+
23+
Visualizing the structure of the latent space by UMAP dimensionality reduction further highlights some of its information content. Below is an example of a UMAP of the spectra embeddings:
24+
25+
![image](https://github.com/PolymathicAI/AstroCLIP/assets/861591/0b7bd48a-f29a-4edd-8e0b-1272a51a0d88)
26+
27+
28+
## Products: Datasets and Trained Models
29+
30+
### Dataset
31+
32+
As part of this project, we compile and make available a combined dataset of DESI Legacy Survey g, r, z images and DESI Early Data Release spectra. These images are a subset of the [ssl-legacysurvey](https://github.com/georgestein/ssl-legacysurvey) sample compiled by @georgestein from the Legacy Survey DR9. Scripts used to match these datasets are available [here](scripts/cross_match_data.py).
33+
34+
For convenience, we provide a Hugging Face Datasets loading script which will automatically download the data needed and prepare the dataset on your computer.
35+
36+
```python
37+
from datasets import load_dataset
38+
39+
# This downloads about 60 GB of data
40+
dset = load_dataset('astroclip/datasets/legacy_survey.py')
1341
```
1442

15-
## Usage
43+
To get started with this dataset, for example to predict redshift from the spectra, take a look at this [notebook](notebooks/dev/ConvolutionalPrototyping.ipynb).
44+
45+
46+
### Training scripts and model weights
47+
48+
**[Coming soon]**
49+
1650

17-
Please take a look at this initial prototyping notebook to see how the data looks like and how to use it: [notebook](notebooks/dev/ConvolutionalPrototyping.ipynb)
51+
## Requirements
52+
53+
This repo should only require basic PyTorch and Hugging Face packages. The following should install everything that is needed (when run from this repository):
54+
55+
```bash
56+
pip install .
57+
```
1858

assets/im_embedding.png

256 KB

notebooks/AstroCLIP_training.ipynb

Lines changed: 289 additions & 53 deletions
Large diffs are not rendered by default.

notebooks/PaperPlots.ipynb

Lines changed: 1849 additions & 166 deletions
Large diffs are not rendered by default.

notebooks/dev/README.md

Lines changed: 3 additions & 0 deletions
@@ -0,0 +1,3 @@
1+
# Development notebooks
2+
3+
This folder contains a number of development notebooks, kept here for archival reasons, but not intended to be easily reusable/runnable.
