Commit 70d0b27

Merge branch 'master' into classification
2 parents 6bede25 + 8c7d113 commit 70d0b27

2 files changed: +23 -14 lines changed

models/multitask/esmm/README.md

Lines changed: 16 additions & 7 deletions
@@ -50,11 +50,6 @@ ESMM is a paper published at SIGIR’2018 [《Entire Space Multi-Task Model: An E

 Data source: [Ali-CCP:Alibaba Click and Conversion Prediction]( https://tianchi.aliyun.com/datalab/dataSet.html?dataId=408 )

-```
-cd data
-sh run.sh
-```
-
 For the data format, see the demo data: data/train



@@ -108,11 +103,25 @@ CPU environment

 ## Reproducing the paper

-To reproduce the paper's results with the full dataset from the original paper, set batch_size=1000, thread_num=8, epoch_num=4 in config.yaml
+Because the original paper's dataset is very large, we selected a subset of it as training and test data. Training on a GPU is recommended.
+
+Our test CTR AUC is 0.79+ and our CTCVR AUC is 0.82+.

+```
+wget https://paddlerec.bj.bcebos.com/esmm/traindata_10w.csv
+wget https://paddlerec.bj.bcebos.com/esmm/testdata_10w.csv
+mkdir data/train_data data/test_data
+mv traindata_10w.csv data/train_data
+mv testdata_10w.csv data/test_data
+```

-Running after the change: set 'workspace' in config.yaml to the directory that contains config.yaml, then run
+To reproduce the paper's results with the full dataset from the original paper, set batch_size=1024, epoch=10, device=gpu, selected_gpus:"0" in config.yaml

+The exact configuration is available in the config_10w.yaml file
+```
+wget https://paddlerec.bj.bcebos.com/esmm/config_10w.yaml
+```
+Run after making the changes
 ```
 python -m paddlerec.run -m /home/your/dir/config.yaml # debug mode: pass the absolute path of the local config directly
 ```
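
The README hunk above reports two metrics, a CTR AUC and a CTCVR AUC. As a rough illustration of what those numbers measure, the sketch below computes AUC by pairwise comparison on a handful of made-up examples, and forms the CTCVR score as the product of the CTR and CVR predictions, as in the ESMM paper's formulation. The function name `auc`, the toy labels, and the toy scores are all assumptions for illustration; none of this is PaddleRec's own metric code.

```python
def auc(labels, scores):
    """Fraction of (positive, negative) pairs ranked correctly; ties count half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy, made-up data: six impressions (hypothetical values, not from Ali-CCP).
click = [1, 1, 0, 1, 0, 0]
conversion = [1, 0, 0, 0, 0, 0]  # a conversion can only follow a click

# Hypothetical model outputs for the two towers.
p_ctr = [0.9, 0.6, 0.4, 0.7, 0.2, 0.65]
p_cvr = [0.8, 0.3, 0.5, 0.2, 0.1, 0.4]

# ESMM scores click-and-convert as pCTCVR = pCTR * pCVR over all impressions.
p_ctcvr = [a * b for a, b in zip(p_ctr, p_cvr)]
ctcvr_label = [c * v for c, v in zip(click, conversion)]

ctr_auc = auc(click, p_ctr)
ctcvr_auc = auc(ctcvr_label, p_ctcvr)
print(f"ctr auc={ctr_auc:.3f}, ctcvr auc={ctcvr_auc:.3f}")
```

The pairwise form is quadratic in the number of examples, so it is only suitable for small checks like this one; a rank-based implementation would be the usual choice on real data.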

models/multitask/esmm/config.yaml

Lines changed: 7 additions & 7 deletions
@@ -17,19 +17,19 @@ workspace: "models/multitask/esmm"

 dataset:
 - name: dataset_train
-  batch_size: 1
+  batch_size: 5
   type: QueueDataset
   data_path: "{workspace}/data/train"
   data_converter: "{workspace}/esmm_reader.py"
 - name: dataset_infer
-  batch_size: 1
+  batch_size: 5
   type: QueueDataset
   data_path: "{workspace}/data/test"
   data_converter: "{workspace}/esmm_reader.py"

 hyper_parameters:
-  vocab_size: 10000
-  embed_size: 128
+  vocab_size: 737946
+  embed_size: 12
   optimizer:
     class: adam
     learning_rate: 0.001
@@ -43,15 +43,15 @@ runner:
   class: train
   device: cpu
   epochs: 3
-  save_checkpoint_interval: 2
+  save_checkpoint_interval: 1
   save_inference_interval: 4
-  save_checkpoint_path: "increment"
+  save_checkpoint_path: "increment_esmm"
   save_inference_path: "inference"
   print_interval: 10
   phases: [train]
 - name: infer_runner
   class: infer
-  init_model_path: "increment/1"
+  init_model_path: "increment_esmm/1"
   device: cpu
   print_interval: 1
   phases: [infer]
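
This hunk renames the checkpoint directory in two places: `save_checkpoint_path` on the train runner and `init_model_path` on the infer runner. Presumably the infer runner loads what the train runner saved, so the two values have to change together. A minimal sketch of that consistency check follows; the helper name `loads_from_checkpoint_dir` is hypothetical and is not part of PaddleRec.

```python
from pathlib import PurePosixPath


def loads_from_checkpoint_dir(save_checkpoint_path, init_model_path):
    """Return True if init_model_path lies inside save_checkpoint_path."""
    save_dir = PurePosixPath(save_checkpoint_path)
    init_dir = PurePosixPath(init_model_path)
    return save_dir in init_dir.parents


# Values after this commit: both runners point at "increment_esmm".
after_commit = loads_from_checkpoint_dir("increment_esmm", "increment_esmm/1")

# Hypothetical half-applied rename: train saves to the new directory,
# while infer still reads from the old one.
half_renamed = loads_from_checkpoint_dir("increment_esmm", "increment/1")

print(after_commit, half_renamed)
```

The check compares whole path components rather than string prefixes, so `increment` is not mistaken for a parent of `increment_esmm/1`.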
