That’s why we’re excited to have partnered with the **Multimodal Universe**
<img src="/images/blog/mmu_dset_examples.png" alt="Examples of data in the MMU dataset" width="95%" style="mix-blend-mode: darken;">
</p>

#### Why “Multimodal”?

**Multimodal data** refers to data that comes in multiple formats or “modalities” for a given object. For example, an image of a galaxy is a two-dimensional array of pixel intensities, while a spectrum encodes brightness at different wavelengths, and a time series captures how the brightness of a source evolves over time. Each of these modalities offers a unique window into the physics of the source under study, which is why pairing them in a single dataset can be particularly powerful.
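As a concrete sketch of the idea, a single object’s record can bundle all three of these modalities. The field names and shapes below are purely illustrative, not the Multimodal Universe’s actual schema:

```python
import numpy as np

# Illustrative record for one galaxy; field names are hypothetical,
# not the actual Multimodal Universe schema.
galaxy = {
    # 2D array of pixel intensities (a 64x64 cutout image)
    "image": np.random.rand(64, 64),
    # brightness sampled at a grid of wavelengths (angstroms)
    "spectrum": {
        "wavelength": np.linspace(3600.0, 9800.0, 1000),
        "flux": np.random.rand(1000),
    },
    # brightness of the source at irregular observation times (days)
    "time_series": {
        "time": np.sort(np.random.uniform(0.0, 365.0, 50)),
        "flux": np.random.rand(50),
    },
}

# Each modality is just an array, but together they describe one object.
print(galaxy["image"].shape)  # (64, 64)
```

The point is that every modality reduces to arrays with different shapes and sampling, which is exactly what a unified dataset format has to reconcile.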

#### What’s in the Multimodal Universe?

We’ve combined publicly available data from **major astronomical surveys** into one consistently cross-matched framework, summarized in the table below. Images, spectra, hyperspectral data cubes, time-series data… they’re all in here! Each dataset has been carefully pre-processed, documented, and aligned to play nicely with one another right out of the box.
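Cross-matching here means associating detections of the same source across surveys by sky position. The project’s real pipeline lives in its GitHub scripts; the snippet below is only a minimal, self-contained sketch of the core idea, matching two toy catalogs by angular separation:

```python
import numpy as np

def to_unit_vectors(ra_deg, dec_deg):
    """Convert RA/Dec (degrees) to 3D unit vectors on the sphere."""
    ra, dec = np.radians(ra_deg), np.radians(dec_deg)
    return np.stack([np.cos(dec) * np.cos(ra),
                     np.cos(dec) * np.sin(ra),
                     np.sin(dec)], axis=-1)

def cross_match(cat_a, cat_b, radius_arcsec=1.0):
    """For each source in cat_a, find the nearest cat_b source within
    radius_arcsec; returns matched index pairs. Brute force O(N*M) for
    clarity -- real pipelines use spatial trees."""
    va, vb = to_unit_vectors(*cat_a), to_unit_vectors(*cat_b)
    # angular separation from the dot product of unit vectors
    cos_sep = np.clip(va @ vb.T, -1.0, 1.0)
    sep_arcsec = np.degrees(np.arccos(cos_sep)) * 3600.0
    nearest = sep_arcsec.argmin(axis=1)
    ok = sep_arcsec[np.arange(len(va)), nearest] < radius_arcsec
    return np.flatnonzero(ok), nearest[ok]

# Toy catalogs: the second is offset by 0.5 arcsec in RA
ra = np.array([10.0, 150.0, 200.0])
dec = np.array([-5.0, 2.0, 45.0])
ia, ib = cross_match((ra, dec), (ra + 0.5 / 3600.0, dec))
print(ia, ib)  # → [0 1 2] [0 1 2]
```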
Up-to-date instructions on how to download the data, plus details about cross-matching and referencing the original sources, can be found on the [Multimodal Universe GitHub](https://github.com/MultimodalUniverse/MultimodalUniverse/).

#### Key Principles and Features

By collating these diverse surveys and ensuring that each dataset aligns with the rest, the Multimodal Universe follows a few guiding principles:
4. **Public Availability of All Scripts**
All the code used to download, process, and collate the data is public. This ensures **transparency** and makes it easy to replicate the entire pipeline or trace the lineage of each dataset from the ground up.

#### A Catalyst for Machine Learning in Astronomy

We’ve provided a suite of **benchmarks** in the paper that highlight key scenarios in which this dataset shines. For instance, we replicate the **AstroCLIP** project [1] by combining Legacy Survey images with DESI spectra in just a few lines of code, whereas the original paper required a large data engineering effort.
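AstroCLIP is the cited authors’ own work; what follows is only a generic sketch of the CLIP-style contrastive objective such projects use to align paired image and spectrum embeddings. All names, dimensions, and the temperature value are illustrative assumptions, not code from the paper:

```python
import numpy as np

def clip_loss(img_emb, spec_emb, temperature=0.07):
    """Symmetric InfoNCE loss: the i-th image embedding should be
    most similar to the i-th spectrum embedding in the batch."""
    # L2-normalize so the dot product is cosine similarity
    img = img_emb / np.linalg.norm(img_emb, axis=1, keepdims=True)
    spec = spec_emb / np.linalg.norm(spec_emb, axis=1, keepdims=True)
    logits = img @ spec.T / temperature        # (batch, batch)
    labels = np.arange(len(logits))
    # cross-entropy in both directions (image->spectrum, spectrum->image)
    def xent(l):
        l = l - l.max(axis=1, keepdims=True)   # numerical stability
        logp = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -logp[labels, labels].mean()
    return 0.5 * (xent(logits) + xent(logits.T))

rng = np.random.default_rng(0)
# Pretend encoders already mapped 8 paired images/spectra to 16-dim embeddings
img_emb = rng.normal(size=(8, 16))
spec_emb = img_emb + 0.1 * rng.normal(size=(8, 16))  # well-aligned pairs
print(clip_loss(img_emb, spec_emb))  # small loss, since pairs line up
```

Shuffling the spectrum rows breaks the pairing and drives the loss up, which is exactly the signal a contrastive model trains against.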
Even better, by unifying the underlying data framework, **pipelines** developed for one survey or modality can be **directly transferred** to others. This paves the way for large-scale ML models that draw from multiple instruments and data formats simultaneously.
Finally, challenges like **distribution shifts**, **uncertainty quantification**, and **model calibration** are crucial in scientific ML. The Multimodal Universe’s breadth and diversity of data naturally test the limits of ML model generalizability.

#### Where to Find It and What’s Next

We host the Multimodal Universe dataset in full at the Flatiron Institute, with the first official release corresponding to the data listed in the table. However, this is an **ongoing project** and will be regularly updated:
We envision this living dataset as a **central hub** for ML-driven astronomy, drastically cutting down on the data-engineering overhead that has historically slowed progress.

#### Getting Started

1. **Visit the Landing Page**
Head to the [Multimodal Universe GitHub](https://github.com/MultimodalUniverse/MultimodalUniverse/) for the latest version, plus scripts for data retrieval and usage.
Whether you’re building a classifier to find elusive supernovae or training a
*-- Liam Parker*

#### References

1. Parker, Liam, et al. "AstroCLIP: a cross-modal foundation model for galaxies." Monthly Notices of the Royal Astronomical Society 531.4 (2024): 4990-5011.