Skip to content

Commit e0be159

Browse files
yinhaofengroot
authored andcommitted
fix license
1 parent 980ed97 commit e0be159

File tree

1 file changed

+1
-1
lines changed

1 file changed

+1
-1
lines changed

datasets/readme.md

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -18,7 +18,7 @@ sh data_process.sh
1818
| :----------------------------------------------: | :------------------------------------------------------------------------------------------: | :-------------------------------: |
1919
|[ag_news](https://paddle-tagspace.bj.bcebos.com/data.tar)|496835 条来自AG新闻语料库 4 大类别超过 2000 个新闻源的新闻文章,数据集仅仅援用了标题和描述字段。每个类别分别拥有 30,000 个训练样本及 1900 个测试样本。| [ComeToMyHead](http://www.di.unipi.it/~gulli/AG_corpus_of_news_articles.html)|
2020
|[Ali-CCP:Alibaba Click and Conversion Prediction](https://tianchi.aliyun.com/datalab/dataSet.html?dataId=408)|从淘宝推荐系统的真实流量日志中收集的数据集。|[SIGIR(2018)]( https://tianchi.aliyun.com/datalab/dataSet.html?dataId=408)|
21-
|[BQ](https://paddlerec.bj.bcebos.com/dssm%2Fbq.tar.gz)|BQ是一个智能客服中文问句匹配数据集,该数据集是自动问答系统语料,共有120,000对句子对,并标注了句子对相似度值。数据中存在错别字、语法不规范等问题,但更加贴近工业场景|--|
21+
|[BQ](https://paddlerec.bj.bcebos.com/dssm%2Fbq.tar.gz)|BQ是一个智能客服中文问句匹配数据集,该数据集是自动问答系统语料,共有120,000对句子对,并标注了句子对相似度值。数据中存在错别字、语法不规范等问题,但更加贴近工业场景|[The BQ Corpus: A Large-scale Domain-specific Chinese Corpus For Sentence Semantic Equivalence Identification](http://icrc.hitsz.edu.cn/info/1037/1162.htm)|
2222
|[Census-income Data](https://archive.ics.uci.edu/ml/machine-learning-databases/census-income-mld/census.tar.gz )|此数据集包含从1994年和1995年美国人口普查局进行的当前人口调查中提取的加权人口普查数据。数据包含人口统计和就业相关变量。|[Scaling Up the Accuracy of Naive-Bayes Classifiers: a Decision-Tree Hybrid](http://robotics.stanford.edu/~ronnyk/nbtree.pdf)|
2323
|[Criteo](https://fleet.bj.bcebos.com/ctr_data.tar.gz)|该数据集包括两部分:训练集和测试集。训练集包含一段时间内Criteo的部分流量,测试集则对应训练数据后一天的广告点击流量。|[kaggle](https://www.kaggle.com/c/criteo-display-ad-challenge/)|
2424
|[letor07](https://paddlerec.bj.bcebos.com/match_pyramid/match_pyramid_data.tar.gz)|LETOR是一套用于学习排名研究的基准数据集,其中包含标准特征、相关性判断、数据划分、评估工具和若干基线|[LETOR: Learning to Rank for Information Retrieval](https://www.microsoft.com/en-us/research/project/letor-learning-rank-information-retrieval/?from=http%3A%2F%2Fresearch.microsoft.com%2Fen-us%2Fum%2Fbeijing%2Fprojects%2Fletor%2F)|

0 commit comments

Comments
 (0)