|
1 | 1 | """MS MARCO (Microsoft Machine Reading Comprehension) is a large-scale dataset focused on machine reading comprehension, question answering, and passage ranking. A variant of this task will be part of TREC and AFIRM 2019. For updates about TREC 2019, please follow this repository. Passage Reranking task: given a query q and the 1000 most relevant passages P = p1, p2, p3, ..., p1000, as retrieved by BM25, a successful system is expected to rerank the most relevant passage as high as possible. For this task, not all 1000 retrieved items have a human-labeled relevant passage. Evaluation will be done using MRR. |
2 | 2 |
|
3 | | - **Publication**: |
4 | | - Tri Nguyen, Mir Rosenberg, Xia Song, Jianfeng Gao, Saurabh Tiwary, Rangan Majumder, and Li Deng. 2016. |
5 | | - MS MARCO: A Human Generated MAchine Reading COmprehension Dataset. In CoCo@NIPS. |
| 3 | +**Publication**: |
| 4 | +Tri Nguyen, Mir Rosenberg, Xia Song, Jianfeng Gao, Saurabh Tiwary, Rangan Majumder, and Li Deng. 2016. |
| 5 | +MS MARCO: A Human Generated MAchine Reading COmprehension Dataset. In CoCo@NIPS. |
6 | 6 |
|
7 | 7 |
|
8 | | - See [https://github.com/microsoft/MSMARCO-Passage-Ranking](https://github.com/microsoft/MSMARCO-Passage-Ranking) for more details |
| 8 | +See [https://github.com/microsoft/MSMARCO-Passage-Ranking](https://github.com/microsoft/MSMARCO-Passage-Ranking) for more details |
9 | 9 | """ |
10 | 10 |
|
11 | 11 | from datamaestro.annotations.agreement import useragreement |
|
35 | 35 |
|
36 | 36 | # --- Document collection |
37 | 37 |
|
| 38 | + |
38 | 39 | # TODO: Not ideal since it would be better to have small versions right away |
39 | 40 | # instead of downloading again the MS Marco Collection |
40 | 41 | @lua |
|
43 | 44 | url="https://msmarco.blob.core.windows.net/msmarcoranking/collectionandqueries.tar.gz", |
44 | 45 | checker=HashCheck("31644046b18952c1386cd4564ba2ae69", md5), |
45 | 46 | ) |
46 | | -@dataset(Folder, url="https://github.com/microsoft/MSMARCO-Passage-Ranking") |
47 | | -def collection_etc(data): |
| 47 | +@dataset(url="https://github.com/microsoft/MSMARCO-Passage-Ranking") |
| 48 | +def collection_etc(data) -> Folder: |
48 | 49 | """Documents and some more files""" |
49 | | - return {"path": data} |
| 50 | + return Folder(path=data) |
50 | 51 |
|
51 | 52 |
|
52 | 53 | @lua |
|
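The module docstring states that the reranking task is evaluated using MRR (mean reciprocal rank). As a minimal sketch of that metric, and not the official evaluation script, the snippet below averages, over queries, the reciprocal rank of the first relevant passage. The function name `mean_reciprocal_rank`, the dictionary-based input layout, and the optional `cutoff` parameter are assumptions made for illustration; they are not part of this repository or of datamaestro.

```python
from typing import Dict, List, Optional, Set


def mean_reciprocal_rank(
    rankings: Dict[str, List[str]],
    relevant: Dict[str, Set[str]],
    cutoff: Optional[int] = None,
) -> float:
    """Average over queries of 1 / rank of the first relevant passage.

    A query with no relevant passage in its (possibly truncated) ranking
    contributes 0. `cutoff` is a hypothetical parameter that limits how
    many ranked passages are inspected per query.
    """
    if not rankings:
        return 0.0
    total = 0.0
    for query_id, passage_ids in rankings.items():
        judged = relevant.get(query_id, set())
        candidates = passage_ids if cutoff is None else passage_ids[:cutoff]
        for rank, passage_id in enumerate(candidates, start=1):
            if passage_id in judged:
                total += 1.0 / rank
                break  # only the first relevant passage counts
    return total / len(rankings)


# Toy data (made up): q1 has its relevant passage at rank 2,
# q2 has no relevant passage among its candidates.
rankings = {"q1": ["p3", "p1", "p2"], "q2": ["p5", "p4"]}
relevant = {"q1": {"p1"}, "q2": {"p9"}}
score = mean_reciprocal_rank(rankings, relevant)
```

With this toy input, q1 contributes 1/2 and q2 contributes 0, so the mean is 0.25. Passing `cutoff=1` would ignore the hit at rank 2 and give 0.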