2 files changed: +23 -14 lines changed
@@ -50,11 +50,6 @@ ESMM is the model from the SIGIR'2018 paper [《Entire Space Multi-Task Model: An E
 
 Dataset: [Ali-CCP: Alibaba Click and Conversion Prediction](https://tianchi.aliyun.com/datalab/dataSet.html?dataId=408)
 
-```
-cd data
-sh run.sh
-```
-
 For the data format, see the demo data: data/train
 
 
@@ -108,11 +103,25 @@ CPU environment
 
 ## Reproducing the paper
 
-To reproduce the paper's results with the original paper's full dataset, set batch_size=1000, thread_num=8, epoch_num=4 in config.yaml
+Because the original paper's dataset is too large, we selected a subset of it as training and test data. Training on a GPU is recommended.
+
+On our test data, the CTR AUC is 0.79+ and the CTCVR AUC is 0.82+.
 
+```
+wget https://paddlerec.bj.bcebos.com/esmm/traindata_10w.csv
+wget https://paddlerec.bj.bcebos.com/esmm/testdata_10w.csv
+mkdir data/train_data data/test_data
+mv traindata_10w.csv data/train_data
+mv testdata_10w.csv data/test_data
+```
 
-How to run after the changes: set 'workspace' in config.yaml to the directory that contains config.yaml, then run
+To reproduce the paper's results with the original paper's full dataset, set batch_size=1024, epoch=10, device=gpu, selected_gpus:"0" in config.yaml
 
+The exact configuration is available by downloading the config_10w.yaml file
+```
+wget https://paddlerec.bj.bcebos.com/esmm/config_10w.yaml
+```
+After making the changes, run
 ```
 python -m paddlerec.run -m /home/your/dir/config.yaml # debug mode: pass the absolute path of the local config directly
 ```
@@ -17,19 +17,19 @@ workspace: "models/multitask/esmm"
 
 dataset:
 - name: dataset_train
-  batch_size: 1
+  batch_size: 5
   type: QueueDataset
   data_path: "{workspace}/data/train"
   data_converter: "{workspace}/esmm_reader.py"
 - name: dataset_infer
-  batch_size: 1
+  batch_size: 5
   type: QueueDataset
   data_path: "{workspace}/data/test"
   data_converter: "{workspace}/esmm_reader.py"
 
 hyper_parameters:
-  vocab_size: 10000
-  embed_size: 128
+  vocab_size: 737946
+  embed_size: 12
   optimizer:
     class: adam
     learning_rate: 0.001
@@ -43,15 +43,15 @@ runner:
   class: train
   device: cpu
   epochs: 3
-  save_checkpoint_interval: 2
+  save_checkpoint_interval: 1
   save_inference_interval: 4
-  save_checkpoint_path: "increment"
+  save_checkpoint_path: "increment_esmm"
   save_inference_path: "inference"
   print_interval: 10
   phases: [train]
 - name: infer_runner
   class: infer
-  init_model_path: "increment/1"
+  init_model_path: "increment_esmm/1"
   device: cpu
   print_interval: 1
   phases: [infer]
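
The config change renames the checkpoint directory in two places: `save_checkpoint_path` on the train runner and `init_model_path` on the infer runner. If only one is updated, inference would point at a directory that training never writes. A minimal sketch of a consistency check follows. It assumes the `runner` section has already been loaded into a list of plain dicts and that the last component of `init_model_path` names a saved checkpoint. `check_checkpoint_paths` is a hypothetical helper for illustration, not part of PaddleRec.

```python
from pathlib import PurePosixPath


def check_checkpoint_paths(runners):
    """Return a list of mismatches between train and infer runners.

    Hypothetical helper: `runners` mirrors the `runner` list in config.yaml
    as plain dicts. It does not call any PaddleRec API.
    """
    # Directories that train runners write checkpoints into.
    save_dirs = {
        r["save_checkpoint_path"].strip()
        for r in runners
        if r.get("class") == "train" and "save_checkpoint_path" in r
    }

    problems = []
    for r in runners:
        if r.get("class") != "infer" or "init_model_path" not in r:
            continue
        init_path = PurePosixPath(r["init_model_path"].strip())
        # The parent of "increment_esmm/1" is "increment_esmm".
        if str(init_path.parent) not in save_dirs:
            problems.append(
                f"{r['name']}: {init_path} is not under any of {sorted(save_dirs)}"
            )
    return problems


# Values taken from the new side of the diff.
runners = [
    {"name": "train_runner", "class": "train",
     "save_checkpoint_path": "increment_esmm"},
    {"name": "infer_runner", "class": "infer",
     "init_model_path": "increment_esmm/1"},
]
print(check_checkpoint_paths(runners))  # prints []
```

With the values from the new side of the diff the check reports no problems. Mixing the old `"increment/1"` with the new `"increment_esmm"` would produce one entry.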