Commit d8d5bb0

add dcn_v2
1 parent d8ce095 commit d8d5bb0

14 files changed, 1037 additions and 2 deletions

14 files changed

+1037
-2
lines changed

README_CN.md

Lines changed: 4 additions & 2 deletions

```diff
@@ -128,6 +128,7 @@ python -u tools/static_trainer.py -m models/rank/dnn/config.yaml # static graph training
 | Recall | [NCF](models/recall/ncf/)([doc](https://paddlerec.readthedocs.io/en/latest/models/recall/ncf.html)) | [Python CPU/GPU](https://aistudio.baidu.com/aistudio/projectdetail/3240152) | ✓ | ✓ | >=2.1.0 | [WWW 2017][Neural Collaborative Filtering](https://arxiv.org/pdf/1708.05031.pdf) |
 | Recall | [TiSAS](models/recall/tisas/) | - | ✓ | ✓ | >=2.1.0 | [WSDM 2020][Time Interval Aware Self-Attention for Sequential Recommendation](https://cseweb.ucsd.edu/~jmcauley/pdfs/wsdm20b.pdf) |
 | Recall | [ENSFM](models/recall/ensfm/) | - | ✓ | ✓ | >=2.1.0 | [IW3C2 2020][Efficient Non-Sampling Factorization Machines for Optimal Context-Aware Recommendation](http://www.thuir.cn/group/~mzhang/publications/TheWebConf2020-Chenchong.pdf) |
+| Recall | [MHCN](models/recall/mhcn/) | - | ✓ | ✓ | >=2.1.0 | [WWW 2021][Self-Supervised Multi-Channel Hypergraph Convolutional Network for Social Recommendation](https://arxiv.org/pdf/2101.06448v3.pdf) |
 | Recall | [GNN](https://github.com/PaddlePaddle/PaddleRec/tree/release/1.8.5/models/recall/gnn/) | - | ✓ | ✓ | [1.8.5](https://github.com/PaddlePaddle/PaddleRec/tree/release/1.8.5) | [AAAI 2019][Session-based Recommendation with Graph Neural Networks](https://arxiv.org/abs/1811.00855) |
 | Recall | [RALM](https://github.com/PaddlePaddle/PaddleRec/tree/release/1.8.5/models/recall/look-alike_recall/) | - | ✓ | ✓ | [1.8.5](https://github.com/PaddlePaddle/PaddleRec/tree/release/1.8.5) | [KDD 2019][Real-time Attention Based Look-alike Model for Recommender System](https://arxiv.org/pdf/1906.05022.pdf) |
 | Rank | [Logistic Regression](models/rank/logistic_regression/)([doc](https://paddlerec.readthedocs.io/en/latest/models/rank/logistic_regression.html)) | [Python CPU/GPU](https://aistudio.baidu.com/aistudio/projectdetail/3240481) | ✓ | x | >=2.1.0 | / |
@@ -157,9 +158,10 @@ python -u tools/static_trainer.py -m models/rank/dnn/config.yaml # static graph training
 | Rank | [Wide&Deep](models/rank/wide_deep/)([doc](https://paddlerec.readthedocs.io/en/latest/models/rank/wide_deep.html)) | [Python CPU/GPU](https://aistudio.baidu.com/aistudio/projectdetail/3238421) | ✓ | x | >=2.1.0 | [DLRS 2016][Wide & Deep Learning for Recommender Systems](https://dl.acm.org/doi/pdf/10.1145/2988450.2988454) |
 | Rank | [FGCNN](https://github.com/PaddlePaddle/PaddleRec/tree/release/1.8.5/models/rank/fgcnn/) | - | ✓ | ✓ | [1.8.5](https://github.com/PaddlePaddle/PaddleRec/tree/release/1.8.5) | [WWW 2019][Feature Generation by Convolutional Neural Network for Click-Through Rate Prediction](https://arxiv.org/pdf/1904.04447.pdf) |
 | Rank | [Fibinet](https://github.com/PaddlePaddle/PaddleRec/tree/release/1.8.5/models/rank/fibinet/) | - | ✓ | ✓ | [1.8.5](https://github.com/PaddlePaddle/PaddleRec/tree/release/1.8.5) | [RecSys19][FiBiNET: Combining Feature Importance and Bilinear feature Interaction for Click-Through Rate Prediction]( https://arxiv.org/pdf/1905.09433.pdf) |
-| Rank | [Flen](https://github.com/PaddlePaddle/PaddleRec/tree/release/1.8.5/models/rank/flen/) | - | ✓ | ✓ | [1.8.5](https://github.com/PaddlePaddle/PaddleRec/tree/release/1.8.5) | [2019][FLEN: Leveraging Field for Scalable CTR Prediction]( https://arxiv.org/pdf/1911.04690.pdf) |
+| Rank | [FLEN](models/rank/flen/) | - | ✓ | ✓ | >=2.1.0 | [2019][FLEN: Leveraging Field for Scalable CTR Prediction]( https://arxiv.org/pdf/1911.04690.pdf) |
 | Rank | [DeepRec](models/rank/deeprec/) | - | ✓ | ✓ | >=2.1.0 | [2017][Training Deep AutoEncoders for Collaborative Filtering](https://arxiv.org/pdf/1708.01715v3.pdf) |
-| Rank | [AutoFIS](models/rank/autofis/) | - | ✓ | ✓ | >=2.1.0 | [KDD 2020][AutoFIS: Automatic Feature Interaction Selection in Factorization Models for Click-Through Rate Prediction](https://arxiv.org/pdf/2003.11235v3.pdf) | | - | ✓ | ✓ | [1.8.5](https://github.com/PaddlePaddle/PaddleRec/tree/release/1.8.5) | [2019][FLEN: Leveraging Field for Scalable CTR Prediction]( https://arxiv.org/pdf/1911.04690.pdf) |
+| Rank | [AutoFIS](models/rank/autofis/) | - | ✓ | ✓ | >=2.1.0 | [KDD 2020][AutoFIS: Automatic Feature Interaction Selection in Factorization Models for Click-Through Rate Prediction](https://arxiv.org/pdf/2003.11235v3.pdf)
+| Rank | [DCN_V2](models/rank/dcn_v2/) | - | ✓ | ✓ | >=2.1.0 | [WWW 2021][DCN V2: Improved Deep & Cross Network and Practical Lessons for Web-scale Learning to Rank Systems](https://arxiv.org/pdf/2008.13535v2.pdf)
 | Multi-Task | [PLE](models/multitask/ple/)([doc](https://paddlerec.readthedocs.io/en/latest/models/multitask/ple.html)) | [Python CPU/GPU](https://aistudio.baidu.com/aistudio/projectdetail/3238938) | ✓ | ✓ | >=2.1.0 | [RecSys 2020][Progressive Layered Extraction (PLE): A Novel Multi-Task Learning (MTL) Model for Personalized Recommendations](https://dl.acm.org/doi/abs/10.1145/3383313.3412236) |
 | Multi-Task | [ESMM](models/multitask/esmm/)([doc](https://paddlerec.readthedocs.io/en/latest/models/multitask/esmm.html)) | [Python CPU/GPU](https://aistudio.baidu.com/aistudio/projectdetail/3238583) | ✓ | ✓ | >=2.1.0 | [SIGIR 2018][Entire Space Multi-Task Model: An Effective Approach for Estimating Post-Click Conversion Rate](https://arxiv.org/abs/1804.07931) |
 | Multi-Task | [MMOE](models/multitask/mmoe/)([doc](https://paddlerec.readthedocs.io/en/latest/models/multitask/mmoe.html)) | [Python CPU/GPU](https://aistudio.baidu.com/aistudio/projectdetail/3238934) | ✓ | ✓ | >=2.1.0 | [KDD 2018][Modeling Task Relationships in Multi-task Learning with Multi-gate Mixture-of-Experts](https://dl.acm.org/doi/abs/10.1145/3219819.3220007) |
```
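The DCN_V2 row points to the DCN V2 paper, whose core building block is the cross layer x_{l+1} = x_0 ⊙ (W_l x_l + b_l) + x_l. As orientation only, here is a minimal NumPy sketch of that formula; it is not the PaddleRec code under `models/rank/dcn_v2/` (not shown on this page), and the input size, layer count, and random weights are illustrative assumptions.

```python
import numpy as np


def cross_layer(x0, xl, w, b):
    """One DCN V2 cross layer: x0 * (w @ xl + b) + xl, with * elementwise."""
    return x0 * (w @ xl + b) + xl


def cross_network(x0, weights, biases):
    """Stack cross layers; every layer multiplies back against the original input x0."""
    x = x0
    for w, b in zip(weights, biases):
        x = cross_layer(x0, x, w, b)
    return x


# Illustrative sizes (assumption): a 4-dim input and 3 stacked cross layers.
rng = np.random.default_rng(0)
d, num_layers = 4, 3
x0 = rng.normal(size=d)
weights = [rng.normal(scale=0.1, size=(d, d)) for _ in range(num_layers)]
biases = [np.zeros(d) for _ in range(num_layers)]
out = cross_network(x0, weights, biases)
```

The residual term `+ xl` means a layer with zero weights and bias passes its input through unchanged, so stacking layers can only add feature-crossing terms on top of x_0.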

README_EN.md

Lines changed: 1 addition & 0 deletions

```diff
@@ -150,6 +150,7 @@ python -u tools/static_trainer.py -m models/rank/dnn/config.yaml # Training wit
 | Rank | [Flen](https://github.com/PaddlePaddle/PaddleRec/tree/release/1.8.5/models/rank/flen/) | - | ✓ | ✓ | [1.8.5](https://github.com/PaddlePaddle/PaddleRec/tree/release/1.8.5) | [2019][FLEN: Leveraging Field for Scalable CTR Prediction]( https://arxiv.org/pdf/1911.04690.pdf) |
 | Rank | [DeepRec](models/rank/deeprec/) | - | ✓ | ✓ | >=2.1.0 | [2017][Training Deep AutoEncoders for Collaborative Filtering](https://arxiv.org/pdf/1708.01715v3.pdf) |
 | Rank | [AutoFIS](models/rank/autofis/) | - | ✓ | ✓ | >=2.1.0 | [KDD 2020][AutoFIS: Automatic Feature Interaction Selection in Factorization Models for Click-Through Rate Prediction](https://arxiv.org/pdf/2003.11235v3.pdf) |
+| Rank | [DCN_V2](models/rank/dcn_v2/) | - | ✓ | ✓ | >=2.1.0 | [WWW 2021][DCN V2: Improved Deep & Cross Network and Practical Lessons for Web-scale Learning to Rank Systems](https://arxiv.org/pdf/2008.13535v2.pdf)
 | Multi-Task | [PLE](models/multitask/ple/)<br>([doc](https://paddlerec.readthedocs.io/en/latest/models/multitask/ple.html)) | [Python CPU/GPU](https://aistudio.baidu.com/aistudio/projectdetail/3238938) | ✓ | ✓ | >=2.1.0 | [RecSys 2020][Progressive Layered Extraction (PLE): A Novel Multi-Task Learning (MTL) Model for Personalized Recommendations](https://dl.acm.org/doi/abs/10.1145/3383313.3412236) |
 | Multi-Task | [ESMM](models/multitask/esmm/)<br>([doc](https://paddlerec.readthedocs.io/en/latest/models/multitask/esmm.html)) | [Python CPU/GPU](https://aistudio.baidu.com/aistudio/projectdetail/3238583) | ✓ | ✓ | >=2.1.0 | [SIGIR 2018][Entire Space Multi-Task Model: An Effective Approach for Estimating Post-Click Conversion Rate](https://arxiv.org/abs/1804.07931) |
 | Multi-Task | [MMOE](models/multitask/mmoe/)<br>([doc](https://paddlerec.readthedocs.io/en/latest/models/multitask/mmoe.html)) | [Python CPU/GPU](https://aistudio.baidu.com/aistudio/projectdetail/3238934) | ✓ | ✓ | >=2.1.0 | [KDD 2018][Modeling Task Relationships in Multi-task Learning with Multi-gate Mixture-of-Experts](https://dl.acm.org/doi/abs/10.1145/3219819.3220007) |
```

contributor.md

Lines changed: 3 additions & 0 deletions

```diff
@@ -17,5 +17,8 @@
 | [AutoFIS](models/rank/autofis/) | [renmada](https://github.com/renmada) | https://github.com/PaddlePaddle/PaddleRec/pull/660 | Paper Reproduction Challenge, Round 5 |
 | [Dselect_K](models/multitask/dselect_k/) | [Andy1314Chen](https://github.com/Andy1314Chen) | https://github.com/PaddlePaddle/PaddleRec/pull/671 | Paper Reproduction Challenge, Round 5 |
 | [MIND](models/recall/mind/) | [duyiqi17 ](https://github.com/duyiqi17) | https://github.com/PaddlePaddle/PaddleRec/pull/398 | Other |
+| [FLEN](models/rank/flen/) | [LinJayan](https://github.com/LinJayan) | https://github.com/PaddlePaddle/PaddleRec/pull/685 | Paper Reproduction Challenge, Round 5 |
+| [MHCN](models/recall/mhcn/) | [Andy1314Chen](https://github.com/Andy1314Chen) | https://github.com/PaddlePaddle/PaddleRec/pull/679 | Paper Reproduction Challenge, Round 5 |
+| [DCN_V2](models/rank/dcn_v2/) | [LinJayan](https://github.com/LinJayan) | https://github.com/PaddlePaddle/PaddleRec/pull/677 | Paper Reproduction Challenge, Round 5 |
 
 </div>
```

datasets/criteo_dcn_v2/download.sh

Lines changed: 16 additions & 0 deletions
@@ -0,0 +1,16 @@

```shell
wget --no-check-certificate https://paddlerec.bj.bcebos.com/deepfm%2Ffeat_dict_10.pkl2

wget --no-check-certificate https://fleet.bj.bcebos.com/ctr_data.tar.gz

tar -zxvf ctr_data.tar.gz
mv ./raw_data ./train_data_full
mkdir train_data && cd train_data
cp ../train_data_full/part-0 ../train_data_full/part-1 ./ && cd ..
mv ./test_data ./test_data_full
mkdir test_data && cd test_data
cp ../test_data_full/part-220 ./ && cd ..
echo "Complete data download."
echo "Full Train data stored in ./train_data_full "
echo "Full Test data stored in ./test_data_full "
echo "Rapid Verification train data stored in ./train_data "
echo "Rapid Verification test data stored in ./test_data "
```
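download.sh first fetches a prebuilt categorical dictionary, `deepfm%2Ffeat_dict_10.pkl2`, which the reader script loads to map raw categorical strings to integer ids, sending empty or unseen values to id 0. The commit does not show how that file was built. The sketch below is a guess at an equivalent builder under two loudly stated assumptions: the `_10` suffix is read as a minimum-frequency cutoff of 10, and ids start at 1 so that 0 stays reserved for the fallback. `build_feat_dict` is a hypothetical helper, not part of PaddleRec.

```python
from collections import Counter


def build_feat_dict(lines, categorical_range=range(14, 40), min_count=10):
    """Hypothetical builder for a {category string: id} mapping.

    Assumptions (not confirmed by the commit): values seen fewer than
    `min_count` times are dropped, and ids start at 1 so that 0 remains the
    id used for empty or unseen values.
    """
    counts = Counter()
    for line in lines:
        features = line.rstrip('\n').split('\t')
        # Count every non-empty categorical value across all 26 columns.
        for idx in categorical_range:
            if features[idx] != '':
                counts[features[idx]] += 1
    # Sort the kept values so ids are deterministic across runs.
    kept = sorted(value for value, count in counts.items() if count >= min_count)
    return {value: i + 1 for i, value in enumerate(kept)}
```

Saving the result with `pickle.dump(feat_dict, f, protocol=2)` would give a file that both Python 2 (`cPickle`) and Python 3 can load; reading `.pkl2` as "pickle protocol 2" is likewise an assumption.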

Lines changed: 106 additions & 0 deletions

@@ -0,0 +1,106 @@

```python
# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

import numpy as np
import paddle.fluid.incubate.data_generator as dg

try:
    import cPickle as pickle
except ImportError:
    import pickle


class Reader(dg.MultiSlotDataGenerator):
    def __init__(self, config):
        # `config` is accepted but not used by this reader.
        dg.MultiSlotDataGenerator.__init__(self)

    def init(self):
        # DCN_v2 log-normalizes the 13 continuous features:
        # log(x + 4) for dense feature 2 (minimum -3 per cont_min_ below),
        # log(x + 1) for the other 12.

        # Min-max statistics for the alternative normalization (unused):
        # self.cont_min_ = [0, -3, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
        # self.cont_max_ = [
        #     5775, 257675, 65535, 969, 23159456, 431037, 56311, 6047, 29019, 46,
        #     231, 4008, 7393
        # ]
        # self.cont_diff_ = [
        #     self.cont_max_[i] - self.cont_min_[i]
        #     for i in range(len(self.cont_min_))
        # ]

        # Column 0 is the label, columns 1-13 are dense, columns 14-39 are categorical.
        self.continuous_range_ = range(1, 14)
        self.categorical_range_ = range(14, 40)
        # Load the preprocessed feature dict fetched by download.sh.
        self.feat_dict_name = "deepfm%2Ffeat_dict_10.pkl2"
        with open(self.feat_dict_name, 'rb') as f:
            self.feat_dict_ = pickle.load(f)

    def _process_line(self, line):
        features = line.rstrip('\n').split('\t')
        feat_idx = []
        feat_value = []
        # Log-normalize the dense features; missing values become 0.0.
        for idx in self.continuous_range_:
            if features[idx] == '':
                feat_value.append(0.0)
            elif idx == 2:  # log(x + 4)
                feat_value.append(np.log(float(features[idx]) + 4))
            else:  # log(x + 1)
                feat_value.append(np.log(float(features[idx]) + 1))

        # Map categorical values to ids; empty or unseen values map to 0.
        for idx in self.categorical_range_:
            if features[idx] == '' or features[idx] not in self.feat_dict_:
                feat_idx.append(0)
            else:
                feat_idx.append(self.feat_dict_[features[idx]])
        label = [int(features[0])]
        return label, feat_value, feat_idx

    def generate_sample(self, line):
        """
        Convert one raw line into slot format and print it:
        click:<label> dense_feature:<v1> ... dense_feature:<v13> 1:<id> ... 26:<id>
        """

        def data_iter():
            label, feat_value, feat_idx = self._process_line(line)
            s = ""
            for k, v in [('click', label), ('dense_feature', feat_value),
                         ('feat_idx', feat_idx)]:
                for n, j in enumerate(v):
                    if k == "feat_idx":
                        # Categorical slots are named 1..26 by position.
                        s += " " + str(n + 1) + ":" + str(j)
                    else:
                        s += " " + k + ":" + str(j)
            # Print instead of yielding a sample: this script is used for offline preprocessing.
            print(s.strip())
            yield None

        return data_iter


reader = Reader("../config.yaml")
reader.init()
reader.run_from_stdin()
```
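The per-line work in the reader above needs nothing from Paddle beyond the `MultiSlotDataGenerator` plumbing. As a dependency-free sketch for checking the output format on a single row, the hypothetical helper `to_slot_line` restates `_process_line` and the string building in `generate_sample`, with `math.log` in place of `np.log` and a caller-supplied dictionary in place of the downloaded pickle.

```python
import math


def to_slot_line(line, feat_dict):
    """Convert one tab-separated Criteo row into the slot format printed by the reader."""
    features = line.rstrip('\n').split('\t')
    label = int(features[0])
    dense = []
    for idx in range(1, 14):
        if features[idx] == '':
            dense.append(0.0)                                  # missing dense value
        elif idx == 2:
            dense.append(math.log(float(features[idx]) + 4))   # log(x + 4)
        else:
            dense.append(math.log(float(features[idx]) + 1))   # log(x + 1)
    # Empty or unseen categorical values map to id 0.
    sparse = [feat_dict.get(features[idx], 0) if features[idx] else 0
              for idx in range(14, 40)]
    parts = ["click:%d" % label]
    parts += ["dense_feature:%s" % v for v in dense]
    parts += ["%d:%d" % (n + 1, j) for n, j in enumerate(sparse)]
    return " ".join(parts)
```

Each output line therefore has 40 tokens: one `click:<label>`, 13 `dense_feature:<value>` tokens, and 26 `<slot>:<id>` tokens numbered 1 to 26.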

datasets/criteo_dcn_v2/run.sh

Lines changed: 24 additions & 0 deletions
@@ -0,0 +1,24 @@

```shell
sh download.sh
mkdir slot_train_data_full
for i in `ls ./train_data_full`
do
    cat train_data_full/$i | python get_slot_data.py > slot_train_data_full/$i
done

mkdir slot_test_data_full
for i in `ls ./test_data_full`
do
    cat test_data_full/$i | python get_slot_data.py > slot_test_data_full/$i
done

mkdir slot_train_data
for i in `ls ./train_data`
do
    cat train_data/$i | python get_slot_data.py > slot_train_data/$i
done

mkdir slot_test_data
for i in `ls ./test_data`
do
    cat test_data/$i | python get_slot_data.py > slot_test_data/$i
done
```
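run.sh repeats one loop for four directory pairs, piping each part file through `get_slot_data.py`. A minimal Python sketch of the same conversion, assuming it is run from `datasets/criteo_dcn_v2` after download.sh, could fold the loops into one function. `convert_dir` and `DIR_PAIRS` are hypothetical names, and using `sys.executable` instead of a bare `python` is a choice of this sketch, not of run.sh.

```python
import os
import subprocess
import sys

# Directory pairs processed by run.sh: raw input -> slot-format output.
DIR_PAIRS = [
    ("train_data_full", "slot_train_data_full"),
    ("test_data_full", "slot_test_data_full"),
    ("train_data", "slot_train_data"),
    ("test_data", "slot_test_data"),
]


def convert_dir(src_dir, dst_dir, cmd=(sys.executable, "get_slot_data.py")):
    """Pipe every file in src_dir through cmd, writing stdout to dst_dir/<same name>."""
    # Unlike a bare `mkdir`, re-running does not fail if dst_dir already exists.
    os.makedirs(dst_dir, exist_ok=True)
    for name in sorted(os.listdir(src_dir)):
        with open(os.path.join(src_dir, name), "rb") as fin, \
                open(os.path.join(dst_dir, name), "wb") as fout:
            # check=True stops on the first file whose conversion fails.
            subprocess.run(list(cmd), stdin=fin, stdout=fout, check=True)
```

Calling `convert_dir(src, dst)` for each pair in `DIR_PAIRS` mirrors run.sh. Because `check=True` raises on a non-zero exit, a bad input file stops the run instead of leaving a silently truncated output, which the shell loop would not catch.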
