@@ -471,10 +391,15 @@ Dingo includes an experimental Model Context Protocol (MCP) server. For details
 
 # Research & Publications
 
-- **"Comprehensive Data Quality Assessment for Multilingual WebData"** : [WanJuanSiLu: A High-Quality Open-Source Webtext
-Dataset for Low-Resource Languages](https://arxiv.org/pdf/2501.14506)
-- **"Pre-training data quality using the DataMan methodology"** : [DataMan: Data Manager for Pre-training Large Language Models](https://openreview.net/pdf?id=eNbA8Fqir4)
+## Research Powered by Dingo
+- **WanJuanSiLu**: [A High-Quality Open-Source Webtext Dataset for Low-Resource Languages](https://arxiv.org/pdf/2501.14506)
+*Uses Dingo for comprehensive data quality assessment of multilingual web data*
 
+## Methodologies Implemented in Dingo
+- **DataMan Methodology**: [DataMan: Data Manager for Pre-training Large Language Models](https://openreview.net/pdf?id=eNbA8Fqir4)
+*Dingo implements the DataMan methodology for pre-training data quality assessment*