-We retrieved and processed ~**100 million** research papers from the public internet, leveraging existing datasets from **bethgelab**, **PeS2o**, **Hugging Face**, and **Common Pile**. We designed a standardized **JSON schema** for scientific paper summaries and **post-trained two models**—**Qwen 3 14B** and **Nemotron 12B**—to produce summaries in this format. Our evaluation combines **LLM-as-a-Judge** and a **QA dataset**. Fine-tuned models achieve performance on our evals comparable to leading closed models (e.g., GPT-5, Claude 4.5). **Nemotron 12B** offers ~**2.25×** higher throughput than Qwen 3 14B, making it attractive for large-scale processing.
0 commit comments