@@ -4,17 +4,21 @@ Inspired by [all-MiniLM-L6-v2](https://huggingface.co/sentence-transformers/all-
## Training Data
- | Dataset | Task | Data Instance | Number of Training Tuples |
- | ------------------------------------------------------------------------------------ | :---------------------------: | :-------------------------------------------: | :-----------------------: |
- | [indonli](https://huggingface.co/datasets/indonli) | Natural Language Inference | `(premise, entailment, contradiction)` | 3,914 |
- | [indolem/indo_story_cloze](https://huggingface.co/datasets/indolem/indo_story_cloze) | Commonsense Reasoning | `(context, correct ending, incorrect ending)` | 1,000 |
- | [unicamp-dl/mmarco](https://huggingface.co/datasets/unicamp-dl/mmarco) | Passage Retrieval | `(query, positive passage, negative passage)` | 100,000 |
- | [miracl/miracl](https://huggingface.co/datasets/miracl/miracl) | Passage Retrieval | `(query, positive passage, negative passage)` | 8,086 |
- | [SEACrowd/wrete](https://huggingface.co/datasets/SEACrowd/wrete) | Textual Entailment | `(sentenceA, sentenceB)` | 183 |
- | [SEACrowd/indolem_ntp](https://huggingface.co/datasets/SEACrowd/indolem_ntp) | Textual Entailment | `(tweet, next tweet)` | 5,681 |
- | [khalidalt/tydiqa-goldp](https://huggingface.co/datasets/khalidalt/tydiqa-goldp) | Extractive Question-Answering | `(question, passage)`, `(question, answer)` | 11,404 |
- | [SEACrowd/facqa](https://huggingface.co/datasets/SEACrowd/facqa) | Extractive Question-Answering | `(question, passage)`, `(question, answer)` | 4,990 |
- | **Total** | | | **135,258** |
+ | Dataset | Task | Data Instance | Number of Training Tuples |
+ | ------------------------------------------------------------------------------------------------------------------ | :----------------------------: | :-------------------------------------------: | :-----------------------: |
+ | [indonli](https://huggingface.co/datasets/indonli) | Natural Language Inference | `(premise, entailment, contradiction)` | 3,914 |
+ | [indolem/indo_story_cloze](https://huggingface.co/datasets/indolem/indo_story_cloze) | Commonsense Reasoning | `(context, correct ending, incorrect ending)` | 1,000 |
+ | [unicamp-dl/mmarco](https://huggingface.co/datasets/unicamp-dl/mmarco) | Passage Retrieval | `(query, positive passage, negative passage)` | 100,000 |
+ | [miracl/miracl](https://huggingface.co/datasets/miracl/miracl) | Passage Retrieval | `(query, positive passage, negative passage)` | 8,086 |
+ | [SEACrowd/wrete](https://huggingface.co/datasets/SEACrowd/wrete) | Textual Entailment | `(sentenceA, sentenceB)` | 183 |
+ | [SEACrowd/indolem_ntp](https://huggingface.co/datasets/SEACrowd/indolem_ntp) | Textual Entailment | `(tweet, next tweet)` | 5,681 |
+ | [khalidalt/tydiqa-goldp](https://huggingface.co/datasets/khalidalt/tydiqa-goldp) | Extractive Question-Answering | `(question, passage)`, `(question, answer)` | 11,404 |
+ | [SEACrowd/facqa](https://huggingface.co/datasets/SEACrowd/facqa) | Extractive Question-Answering | `(question, passage)`, `(question, answer)` | 4,990 |
+ | *included in v2* |
+ | [indonesian-nlp/lfqa_id](https://huggingface.co/datasets/indonesian-nlp/lfqa_id) | Open-domain Question-Answering | `(question, answer)` | 226,147 |
+ | [jakartaresearch/indoqa](https://huggingface.co/datasets/jakartaresearch/indoqa) | Extractive Question-Answering | `(question, passage)`, `(question, answer)` | 6,498 |
+ | [jakartaresearch/id-paraphrase-detection](https://huggingface.co/datasets/jakartaresearch/id-paraphrase-detection) | Paraphrase | `(sentence, rephrased sentence)` | 4,076 |
+ | **Total** | | | **371,979** |
## All Supervised Datasets with MultipleNegativesRankingLoss
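As a quick sanity check on the Training Data table in the diff above, the two **Total** rows can be recomputed from the per-dataset counts. This is only an illustrative sketch and not part of the change itself: the dictionary and variable names are made up here, and the counts are copied from the "Number of Training Tuples" column.

```python
# Tuple counts copied from the "Number of Training Tuples" column.
# The names V1_COUNTS and V2_ADDITIONS are illustrative, not from the repository.
V1_COUNTS = {
    "indonli": 3_914,
    "indolem/indo_story_cloze": 1_000,
    "unicamp-dl/mmarco": 100_000,
    "miracl/miracl": 8_086,
    "SEACrowd/wrete": 183,
    "SEACrowd/indolem_ntp": 5_681,
    "khalidalt/tydiqa-goldp": 11_404,
    "SEACrowd/facqa": 4_990,
}

# Rows listed under "included in v2".
V2_ADDITIONS = {
    "indonesian-nlp/lfqa_id": 226_147,
    "jakartaresearch/indoqa": 6_498,
    "jakartaresearch/id-paraphrase-detection": 4_076,
}

v1_total = sum(V1_COUNTS.values())
v2_total = v1_total + sum(V2_ADDITIONS.values())

# Fraction of the v2 training tuples that come from lfqa_id alone.
lfqa_share = V2_ADDITIONS["indonesian-nlp/lfqa_id"] / v2_total

print(f"v1 total: {v1_total:,}")  # v1 total: 135,258
print(f"v2 total: {v2_total:,}")  # v2 total: 371,979
print(f"lfqa_id share of v2: {lfqa_share:.1%}")
```

Both sums agree with the totals stated in the removed and added table rows, and the last line shows how much of the v2 mix the newly added `indonesian-nlp/lfqa_id` rows account for.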