# AstroCLIP
Multimodal contrastive pretraining for astronomical data

<a href="https://arxiv.org/abs/2310.03024" style='vertical-align:middle; display:inline;'><img
  src="https://img.shields.io/badge/astro--ph.IM-arXiv%3A2310.03024-B31B1B.svg" class="plain" style="height:25px;" /></a>

The goal of this project is to demonstrate that contrastive pre-training between two different astronomical data modalities (multi-band imaging and optical spectra) yields a meaningful embedding space that captures physical information about galaxies and is shared between both modalities.

## Results

We encourage you to take a look at our [NeurIPS 2023 AI4Science submission](https://arxiv.org/abs/2310.03024) (still under review) for a longer-form description of our results, but here are the main takeaways:
 - Both the image and spectrum encoders extract meaningful physical information from the input data.
 - The embeddings of images and spectra are well aligned, allowing us to retrieve the spectrum that corresponds to a given image, and vice versa, as sketched below.
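To make this retrieval idea concrete, here is a minimal sketch of cross-modal retrieval by cosine similarity. The tensors below are random placeholders standing in for precomputed, L2-normalized embeddings; they are not outputs of the actual models.

```python
import torch
import torch.nn.functional as F

# Placeholder embeddings: (N, D), L2-normalized along the feature axis.
image_emb = F.normalize(torch.randn(1000, 128), dim=-1)
spectrum_emb = F.normalize(torch.randn(1000, 128), dim=-1)

# Cosine similarity between one query image and every spectrum.
query = image_emb[0]            # (D,)
sims = spectrum_emb @ query     # (N,)

# Indices of the 5 best-matching spectra for this image.
top5 = torch.topk(sims, k=5).indices
print(top5)
```
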
The notebook used to generate the plots in the paper can be found [here](notebooks/PaperPlots.ipynb).

Below is a visualization of the learned embeddings, obtained by taking the first two PCA components of the spectrum and image embeddings. As one can see, images and spectra recover similar main factors of variation.
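
For reference, a minimal sketch of how such a projection can be computed with scikit-learn (scikit-learn is not a stated dependency of this repo, and the embedding array is a hypothetical placeholder):

```python
import numpy as np
from sklearn.decomposition import PCA

# Placeholder for precomputed embeddings, shape (N, D).
embeddings = np.random.randn(1000, 128)

# Project onto the first two principal components.
coords = PCA(n_components=2).fit_transform(embeddings)
print(coords.shape)  # (1000, 2)
```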
Visualizing the structure of the latent space by UMAP dimensionality reduction further highlights some of its information content. Below is an example of a UMAP of the spectrum embeddings:
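
A minimal sketch of such a projection, assuming the `umap-learn` package and a hypothetical array of precomputed spectrum embeddings:

```python
import numpy as np
import umap  # provided by the umap-learn package

# Placeholder for precomputed spectrum embeddings, shape (N, D).
spectrum_emb = np.random.randn(1000, 128)

# Reduce to 2D for visualization.
coords = umap.UMAP(n_components=2, random_state=42).fit_transform(spectrum_emb)
print(coords.shape)  # (1000, 2)
```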
## Products: Datasets and Trained Models

### Dataset

As part of this project, we compile and make available a combined dataset of DESI Legacy Survey g,r,z images and DESI Early Data Release spectra. These images are a subset of the [ssl-legacysurvey](https://github.com/georgestein/ssl-legacysurvey) sample compiled by @georgestein from the Legacy Survey DR9. Scripts used to match these datasets are available [here](scripts/cross_match_data.py).
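
For intuition, here is a minimal sketch of positional cross-matching with astropy; this is an illustration with made-up coordinates, not the repository's actual script (see `scripts/cross_match_data.py` for that):

```python
import numpy as np
import astropy.units as u
from astropy.coordinates import SkyCoord

# Placeholder RA/Dec (degrees) for the two catalogs.
img = SkyCoord(ra=np.array([150.1000, 150.2]) * u.deg,
               dec=np.array([2.1000, 2.2]) * u.deg)
spec = SkyCoord(ra=np.array([150.1001, 150.9]) * u.deg,
                dec=np.array([2.1001, 2.9]) * u.deg)

# For every image position, find the nearest spectrum on the sky.
idx, sep2d, _ = img.match_to_catalog_sky(spec)

# Keep only matches closer than 1 arcsecond.
good = sep2d < 1 * u.arcsec
print(idx[good], sep2d[good].to(u.arcsec))
```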

For convenience, we provide a Hugging Face Datasets loading script which will automatically download the data needed and prepare the dataset on your computer.

```python
from datasets import load_dataset

# This downloads about 60 GB of data
dset = load_dataset('astroclip/datasets/legacy_survey.py')
```

For an example of getting started with this dataset, e.g. to simply predict redshift from the spectra, you can take a look at this [notebook](notebooks/dev/ConvolutionalPrototyping.ipynb).
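
As a first step before any modeling, a hedged sketch of inspecting the dataset; the split name and field names such as `'redshift'` are assumptions here, so verify them against the loading script:

```python
from datasets import load_dataset

# Assumed split name; see the loading script for what is actually defined.
dset = load_dataset('astroclip/datasets/legacy_survey.py', split='train')

# Print the schema and one example; field names such as 'redshift'
# are assumptions -- verify them against dset.features.
print(dset.features)
example = dset[0]
print({k: type(v) for k, v in example.items()})
```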

### Training scripts and model weights

**[Coming soon]**

## Requirements

This repo should only have basic PyTorch and Hugging Face requirements. The following should install all that is needed (when run from this repository):

```bash
pip install .
```