You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
|[Ali-CCP:Alibaba Click and Conversion Prediction](https://tianchi.aliyun.com/datalab/dataSet.html?dataId=408)|从淘宝推荐系统的真实流量日志中收集的数据集。|[SIGIR(2018)](https://tianchi.aliyun.com/datalab/dataSet.html?dataId=408)|
|[BQ](https://paddlerec.bj.bcebos.com/dssm%2Fbq.tar.gz)|BQ是一个智能客服中文问句匹配数据集,该数据集是自动问答系统语料,共有120,000对句子对,并标注了句子对相似度值。数据中存在错别字、语法不规范等问题,但更加贴近工业场景|[The BQ Corpus: A Large-scale Domain-specific Chinese Corpus For Sentence Semantic Equivalence Identification](http://icrc.hitsz.edu.cn/info/1037/1162.htm)|
22
22
|[Census-income Data](https://archive.ics.uci.edu/ml/machine-learning-databases/census-income-mld/census.tar.gz)|此数据集包含从1994年和1995年美国人口普查局进行的当前人口调查中提取的加权人口普查数据。数据包含人口统计和就业相关变量。|[Scaling Up the Accuracy of Naive-Bayes Classifiers: a Decision-Tree Hybrid](http://robotics.stanford.edu/~ronnyk/nbtree.pdf)|
|[letor07](https://paddlerec.bj.bcebos.com/match_pyramid/match_pyramid_data.tar.gz)|LETOR是一套用于学习排名研究的基准数据集,其中包含标准特征、相关性判断、数据划分、评估工具和若干基线|[LETOR: Learning to Rank for Information Retrieval](https://www.microsoft.com/en-us/research/project/letor-learning-rank-information-retrieval/?from=http%3A%2F%2Fresearch.microsoft.com%2Fen-us%2Fum%2Fbeijing%2Fprojects%2Fletor%2F)|
0 commit comments