Commit 276e408

Update summaries.md
1 parent c61e6e9 commit 276e408

1 file changed, +4 -4 lines changed

notes/summaries.md

Lines changed: 4 additions & 4 deletions
@@ -1,5 +1,5 @@
 ---
-title: "AELLA (Autonomous Extraction of Linked Literature for Accessibility): The Inference.net × LAION"
+title: "AELLA (Autonomous Extraction of Linked Literature for Accessibility): The Inference.net × LAION × Grass"
 author: "Christoph Schuhmann, Amarjot Singh, Andrii Prolorenzo, Andrej Radonjic, Sean Smith, and Sam Hogan"
 date: "November 11 2025"
 previewImg: "/images/blog/sci3.jpg"
@@ -10,7 +10,7 @@ previewImg: "/images/blog/sci3.jpg"
 ## Abstract
 
 We present a comprehensive approach to democratizing access to scientific knowledge through large-scale, **structured summarization** of academic literature.
-We retrieved and processed ~**100 million** research papers from the public internet, leveraging existing datasets from **bethgelab**, **PeS2o**, **Hugging Face**, and **Common Pile**.
+We retrieved and processed ~**100 million** research papers from the public internet , leveraging existing datasets from **bethgelab**, **PeS2o**, **Hugging Face**, and **Common Pile**.
 
 <p align="center">
 <img src="/images/blog/sci5.png"
@@ -42,7 +42,7 @@ Access to scientific knowledge remains constrained by paywalls, licensing, and c
 
 ### 2.1 Dataset Collection & Processing
 
-Primary corpus: ~**100M** research papers retrieved from the public internet. After deduplication, we **supplemented** with: *
+Primary corpus: ~**100M** research papers retrieved from the public internet through a collaboration with Grass. After deduplication, we **supplemented** with: *
 **bethgelab**: *paper_parsed_jsons* ([dataset](https://huggingface.co/datasets/bethgelab/paper_parsed_jsons)) *
 
 **LAION**: *COREX-18text* ([dataset](https://huggingface.co/datasets/laion/COREX-18text)) *
@@ -194,7 +194,7 @@ We invite **researchers, librarians, and open-access advocates** to help us **ga
 
 ## Acknowledgments
 
-This is a collaboration between **LAION** and **Inference.net**. We thank all contributors, especially **Tawsif Ratul** for data collection, and **Prof. Sören Auer**, **Dr. Gollam Rabby**, and the **TIB – Leibniz Information Centre for Science and Technology** for scientific advice and support.
+This is a collaboration between **LAION**, **Grass** and **Inference.net**. We thank all contributors, especially **Tawsif Ratul** for data collection, and **Prof. Sören Auer**, **Dr. Gollam Rabby**, and the **TIB – Leibniz Information Centre for Science and Technology** for scientific advice and support.
 
 
 