Dataset cutoff question: user IDs, item IDs, etc. #36

@shuDaoNan9

I ran into some questions while running the DCN model on the following dataset:
http://labs.criteo.com/2014/02/download-kaggle-display-advertising-challenge-dataset/
Kaggle Display Advertising Challenge Dataset
From what I can see, the data format is:
The columns are tab separated with the following schema:
<integer feature 1> ... <integer feature 13> <categorical feature 1> ... <categorical feature 26>
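To make the schema concrete, here is a minimal sketch of splitting one line into its 13 integer and 26 categorical fields. Two assumptions go beyond the schema quoted above. First, the Kaggle training file carries a leading 0/1 click label, so a 40-column line is treated as labeled and a 39-column line as unlabeled. Second, empty fields are treated as missing values. `parse_criteo_line` is a hypothetical helper, not part of any repository.

```python
from typing import List, Optional, Tuple

NUM_INT = 13  # <integer feature 1> ... <integer feature 13>
NUM_CAT = 26  # <categorical feature 1> ... <categorical feature 26>


def parse_criteo_line(line: str) -> Tuple[Optional[int], List[Optional[int]], List[str]]:
    """Split one tab-separated line into (label, integer features, categorical features).

    Assumption: 40 columns means a leading click label; 39 columns means unlabeled.
    Empty integer fields become None (missing).
    """
    cols = line.rstrip("\n").split("\t")
    if len(cols) == 1 + NUM_INT + NUM_CAT:
        label: Optional[int] = int(cols[0])
        cols = cols[1:]
    elif len(cols) == NUM_INT + NUM_CAT:
        label = None
    else:
        raise ValueError(f"expected 39 or 40 columns, got {len(cols)}")
    ints = [int(v) if v != "" else None for v in cols[:NUM_INT]]
    # Categorical values are anonymized hashes; no column is marked
    # as a user ID or an item ID.
    cats = cols[NUM_INT:]
    return label, ints, cats


line = "1\t" + "\t".join(["5"] * NUM_INT) + "\t" + "\t".join(["68fd1e64"] * NUM_CAT)
label, ints, cats = parse_criteo_line(line)
print(label, len(ints), len(cats))
```

The sketch shows why the schema alone does not say which categorical column, if any, corresponds to a user or an item. Each row is one anonymized impression.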
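The data doesn't distinguish user IDs from item IDs, so how can it be used to make recommendations for users? Also, when get_criteo_feature.py processes the data, many categorical values seem to be cut off and dropped entirely. In that case, how can users be told apart?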
parser.add_argument(
"--cutoff",
type=int,
default=200,
help="cutoff long-tailed categorical values"
)
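A frequency cutoff on categorical values usually works as follows: count each value in a column, keep only values that appear at least `cutoff` times, and map every rarer value to one shared "unknown" id. The sketch below assumes get_criteo_feature.py does something similar. This has not been checked against the script, and whether it uses `>=` or `>` is also an assumption. `build_vocab`, `encode`, and `UNK_ID` are hypothetical names.

```python
from collections import Counter
from typing import Dict, Iterable, List

UNK_ID = 0  # hypothetical shared id for every value below the cutoff


def build_vocab(values: Iterable[str], cutoff: int) -> Dict[str, int]:
    """Keep values seen at least `cutoff` times; all others will map to UNK_ID."""
    counts = Counter(v for v in values if v != "")
    # Most frequent first, ties broken alphabetically, ids starting at 1.
    kept = sorted((v for v, c in counts.items() if c >= cutoff),
                  key=lambda v: (-counts[v], v))
    return {v: i for i, v in enumerate(kept, start=1)}


def encode(values: List[str], vocab: Dict[str, int]) -> List[int]:
    """Replace each value by its id, falling back to UNK_ID for rare/unseen values."""
    return [vocab.get(v, UNK_ID) for v in values]


column = ["a"] * 3 + ["b"] * 2 + ["c"]
vocab = build_vocab(column, cutoff=2)
print(vocab)                                # {'a': 1, 'b': 2}
print(encode(["a", "b", "c", "d"], vocab))  # [1, 2, 0, 0]
```

Under this scheme the rare values are collapsed into one bucket rather than deleted outright. A long-tailed column such as a per-user ID would lose most of its distinct values, which is the effect the question above describes.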

Thanks!