Commit 482aab3

Merge pull request #748 from BamLubi/master
add sign
2 parents bc583c9 + 57dd218 commit 482aab3

21 files changed

+3530
-1
lines changed

README_CN.md

Lines changed: 1 addition & 1 deletion
@@ -117,7 +117,6 @@ python -u tools/static_trainer.py -m models/rank/dnn/config.yaml # 静态图训
117117

118118
<h2 align="center">支持模型列表</h2>
119119

120-
121120
| 方向 | 模型 | 在线环境 | 分布式CPU | 分布式GPU | 支持版本| 论文 |
122121
|:---------------------------------------------------------------------------------------------------------------------------------------------------------------------------:| :-----------------------------------------------------------------------: | :-----: | :-------: | :-------: |:-------: | :---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
123122
| 内容理解 | [TextCnn](models/contentunderstanding/textcnn/)([文档](https://paddlerec.readthedocs.io/en/latest/models/contentunderstanding/textcnn.html)) | [Python CPU/GPU](https://aistudio.baidu.com/aistudio/projectdetail/3238415) || x | >=2.1.0 | [EMNLP 2014][Convolutional neural networks for sentence classication](https://www.aclweb.org/anthology/D14-1181.pdf) |
@@ -172,6 +171,7 @@ python -u tools/static_trainer.py -m models/rank/dnn/config.yaml # 静态图训
172171
| 排序 | [DCN_V2](models/rank/dcn_v2/) | - ||| >=2.1.0 | [WWW 2021][DCN V2: Improved Deep & Cross Network and Practical Lessons for Web-scale Learning to Rank Systems](https://arxiv.org/pdf/2008.13535v2.pdf)|
173172
| 排序 | [AITM](models/rank/aitm/) | - ||| >=2.1.0 | [KDD 2021][Modeling the Sequential Dependence among Audience Multi-step Conversions withMulti-task Learning in Targeted Display Advertising](https://arxiv.org/pdf/2105.08489v2.pdf) |
174173
| 排序 | [DSIN](models/rank/dsin/) | - ||| >=2.1.0 | [IJCAI 2019][Deep Session Interest Network for Click-Through Rate Prediction](https://arxiv.org/pdf/1905.06482v1.pdf) |
174+
| 排序 | [SIGN](models/rank/sign/)([文档](https://paddlerec.readthedocs.io/en/latest/models/rank/sign.html)) | [Python CPU/GPU](https://aistudio.baidu.com/aistudio/projectdetail/3869111) ||| >=2.1.0 | [AAAI 2021][Detecting Beneficial Feature Interactions for Recommender Systems](https://arxiv.org/pdf/2008.00404v6.pdf) |
175175
| 多任务 | [PLE](models/multitask/ple/)([文档](https://paddlerec.readthedocs.io/en/latest/models/multitask/ple.html)) | [Python CPU/GPU](https://aistudio.baidu.com/aistudio/projectdetail/3238938) ||| >=2.1.0 | [RecSys 2020][Progressive Layered Extraction (PLE): A Novel Multi-Task Learning (MTL) Model for Personalized Recommendations](https://dl.acm.org/doi/abs/10.1145/3383313.3412236) |
176176
| 多任务 | [ESMM](models/multitask/esmm/)([文档](https://paddlerec.readthedocs.io/en/latest/models/multitask/esmm.html)) | [Python CPU/GPU](https://aistudio.baidu.com/aistudio/projectdetail/3238583) ||| >=2.1.0 | [SIGIR 2018][Entire Space Multi-Task Model: An Effective Approach for Estimating Post-Click Conversion Rate](https://arxiv.org/abs/1804.07931) |
177177
| 多任务 | [MMOE](models/multitask/mmoe/)([文档](https://paddlerec.readthedocs.io/en/latest/models/multitask/mmoe.html)) | [Python CPU/GPU](https://aistudio.baidu.com/aistudio/projectdetail/3238934) ||| >=2.1.0 | [KDD 2018][Modeling Task Relationships in Multi-task Learning with Multi-gate Mixture-of-Experts](https://dl.acm.org/doi/abs/10.1145/3219819.3220007) |

README_EN.md

Lines changed: 1 addition & 0 deletions
@@ -162,6 +162,7 @@ python -u tools/static_trainer.py -m models/rank/dnn/config.yaml # Training wit
162162
| Rank | [DCN_V2](models/rank/dcn_v2/) | - ||| >=2.1.0 | [WWW 2021][DCN V2: Improved Deep & Cross Network and Practical Lessons for Web-scale Learning to Rank Systems](https://arxiv.org/pdf/2008.13535v2.pdf)|
163163
| Rank | [AITM](models/rank/aitm/) | - ||| >=2.1.0 | [KDD 2021][Modeling the Sequential Dependence among Audience Multi-step Conversions withMulti-task Learning in Targeted Display Advertising](https://arxiv.org/pdf/2105.08489v2.pdf) |
164164
| Rank | [DSIN](models/rank/dsin/) | - ||| >=2.1.0 | [IJCAI 2019][Deep Session Interest Network for Click-Through Rate Prediction](https://arxiv.org/pdf/1905.06482v1.pdf) |
165+
| Rank | [SIGN](models/rank/sign/)([doc](https://paddlerec.readthedocs.io/en/latest/models/rank/sign.html)) | [Python CPU/GPU](https://aistudio.baidu.com/aistudio/projectdetail/3869111) ||| >=2.1.0 | [AAAI 2021][Detecting Beneficial Feature Interactions for Recommender Systems](https://arxiv.org/pdf/2008.00404v6.pdf) |
165166
| Multi-Task | [PLE](models/multitask/ple/)<br>([doc](https://paddlerec.readthedocs.io/en/latest/models/multitask/ple.html)) | [Python CPU/GPU](https://aistudio.baidu.com/aistudio/projectdetail/3238938) ||| >=2.1.0 | [RecSys 2020][Progressive Layered Extraction (PLE): A Novel Multi-Task Learning (MTL) Model for Personalized Recommendations](https://dl.acm.org/doi/abs/10.1145/3383313.3412236) |
166167
| Multi-Task | [ESMM](models/multitask/esmm/)<br>([doc](https://paddlerec.readthedocs.io/en/latest/models/multitask/esmm.html)) | [Python CPU/GPU](https://aistudio.baidu.com/aistudio/projectdetail/3238583) ||| >=2.1.0 | [SIGIR 2018][Entire Space Multi-Task Model: An Effective Approach for Estimating Post-Click Conversion Rate](https://arxiv.org/abs/1804.07931) |
167168
| Multi-Task | [MMOE](models/multitask/mmoe/)<br>([doc](https://paddlerec.readthedocs.io/en/latest/models/multitask/mmoe.html)) | [Python CPU/GPU](https://aistudio.baidu.com/aistudio/projectdetail/3238934) ||| >=2.1.0 | [KDD 2018][Modeling Task Relationships in Multi-task Learning with Multi-gate Mixture-of-Experts](https://dl.acm.org/doi/abs/10.1145/3219819.3220007) |

contributor.md

Lines changed: 1 addition & 0 deletions
@@ -20,5 +20,6 @@
2020
| [FLEN](models/rank/flen/) | [LinJayan](https://github.com/LinJayan) | https://github.com/PaddlePaddle/PaddleRec/pull/685 | 论文复现赛第五期 |
2121
| [MHCN](models/recall/mhcn/) | [Andy1314Chen](https://github.com/Andy1314Chen) | https://github.com/PaddlePaddle/PaddleRec/pull/679 | 论文复现赛第五期 |
2222
| [DCN_V2](models/rank/dcn_v2/) | [LinJayan](https://github.com/LinJayan) | https://github.com/PaddlePaddle/PaddleRec/pull/677 | 论文复现赛第五期 |
23+
| [SIGN](models/rank/sign/) | [BamLubi](https://github.com/BamLubi) | https://github.com/PaddlePaddle/PaddleRec/pull/748 | 论文复现赛第六期 |
2324

2425
</div>

datasets/sign/run.sh

Lines changed: 2 additions & 0 deletions
@@ -0,0 +1,2 @@
1+
wget https://blog.cos.bamlubi.cn/Paddle-SIGN/ml-tag.zip
2+
unzip ml-tag.zip

doc/imgs/sign.png

126 KB

doc/source/models/index.rst

Lines changed: 1 addition & 0 deletions
@@ -76,6 +76,7 @@ PaddleRec 模型库
7676
rank/naml.md
7777
rank/wide_deep.md
7878
rank/xdeepfm.md
79+
rank/sign.md
7980

8081

8182
重排序

doc/source/models/rank/sign.md

Lines changed: 142 additions & 0 deletions
@@ -0,0 +1,142 @@
1+
# sign (Detecting Beneficial Feature Interactions for Recommender Systems)
2+
3+
For the code, see [sign](https://github.com/PaddlePaddle/PaddleRec/tree/master/models/rank/sign).
4+
If you find our code useful, please give the repository a star.
5+
6+
## 内容
7+
8+
- [模型简介](#模型简介)
9+
- [数据准备](#数据准备)
10+
- [运行环境](#运行环境)
11+
- [快速开始](#快速开始)
12+
- [模型组网](#模型组网)
13+
- [效果复现](#效果复现)
14+
- [进阶使用](#进阶使用)
15+
- [FAQ](#FAQ)
16+
17+
## 模型简介
18+
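Feature crossing multiplies two or more features to apply a nonlinear transformation to the sample space, which increases a model's nonlinear capacity and can markedly improve accuracy in recommender systems. Earlier work considers the crosses between all features, but some feature crosses have little relevance to the recommendation result, and the noise they introduce lowers model accuracy. The paper [Detecting Beneficial Feature Interactions for Recommender Systems](https://arxiv.org/pdf/2008.00404v6.pdf) therefore proposes L0-SIGN, a model that uses a graph neural network to automatically discover meaningful feature crosses.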
19+
20+
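The authors model the features of each sample as a graph, associate feature crosses with the edges of that graph, and use the relational reasoning ability of GNNs to model the crosses. Edge prediction with L0 regularization limits the number of edges detected in the graph, and this is how meaningful feature crosses are identified.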
21+
22+
This model implements the SIGN model from the following paper:
23+
24+
```text
25+
@inproceedings{su2021detecting,
26+
title={Detecting Beneficial Feature Interactions for Recommender Systems},
27+
author={Su, Yixin and Zhang, Rui and Erfani, Sarah and Xu, Zhenghua},
28+
booktitle={Proceedings of the 34th AAAI Conference on Artificial Intelligence (AAAI)},
29+
year={2021}
30+
}
31+
```
32+
33+
## 数据准备
34+
35+
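The paper uses four open-source datasets: `DBLP_v1`, `frappe`, `ml-tag`, and `twitter`. Here `ml-tag` is used to validate the model. Sample data for a quick run is provided in the `data` directory under the model directory; to use the full dataset, see the [效果复现](#效果复现) section below.
36+
This dataset focuses on movie tag recommendation. Each data instance represents a graph, in the following format: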
37+
38+
```shell
39+
# label user_id movie_id tag_id
40+
0.0 24 25 26
41+
1.0 62 63 64
42+
```
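
As a minimal sketch of how one record in this format could be read, assuming whitespace-separated fields with the label first and integer feature IDs after it. The function name `parse_sample` is hypothetical and is not part of PaddleRec; the model's own data reader may handle records differently.

```python
from typing import List, Tuple


def parse_sample(line: str) -> Tuple[float, List[int]]:
    """Split one whitespace-separated record into (label, feature_ids)."""
    fields = line.split()
    label = float(fields[0])                        # first column: 0.0 or 1.0
    feature_ids = [int(tok) for tok in fields[1:]]  # remaining columns: feature IDs
    return label, feature_ids


# The two example records shown above.
samples = [parse_sample(s) for s in ("0.0 24 25 26", "1.0 62 63 64")]
print(samples)  # [(0.0, [24, 25, 26]), (1.0, [62, 63, 64])]
```

Each parsed record then corresponds to one small graph whose nodes are the feature IDs.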
43+
## 运行环境
44+
PaddlePaddle>=2.0
45+
46+
pgl>=2.2.0
47+
48+
python 2.7/3.5/3.6/3.7
49+
50+
os : windows/linux/macos
51+
52+
## 快速开始
53+
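This document provides sample data for a quick trial, and the commands can be run from any directory. The quick-start commands for the sign model directory are as follows: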
54+
```bash
55+
# Prepare the environment: install pgl
56+
pip install pgl
57+
58+
# Enter the model directory
59+
cd PaddleRec/models/rank/sign # can be run from any directory
60+
# Dynamic-graph training
61+
python -u ../../../tools/trainer.py -m config.yaml # run on the sample data
62+
python -u ../../../tools/trainer.py -m config_bigdata.yaml # run on the full dataset
63+
# Dynamic-graph prediction
64+
python -u ../../../tools/infer.py -m config.yaml # predict on the sample data
65+
python -u ../../../tools/infer.py -m config_bigdata.yaml # predict on the full dataset
66+
```
67+
68+
## 模型组网
69+
70+
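The L0-SIGN model has two modules: an L0 edge prediction module, which predicts edges by matrix factorization of the graph's adjacency matrix, and the SIGN graph classification module. The main network structure is shown in Figure 1 and corresponds one-to-one with the code in `net.py`: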
71+
72+
<p align="center">
73+
<img align="center" src="../../../imgs/sign.png">
74+
</p>
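
The following is a rough, dependency-free sketch of how an edge prediction step and a gated interaction aggregation step can fit together. It is not the code in `net.py`: the embedding tables, the names `edge_kept` and `sign_score`, the dimensions, and the hard 0.5 threshold are all illustrative assumptions, and the learned networks and L0-regularized gating used in training are replaced here by a plain dot product and a fixed threshold.

```python
import math
import random
from itertools import combinations

random.seed(0)
EDGE_DIM, NODE_DIM = 8, 4  # hypothetical embedding sizes


def make_table(num_rows, dim):
    """Random embedding table; in a real model these values are learned."""
    return [[random.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_rows)]


edge_emb = make_table(100, EDGE_DIM)  # used only to decide which edges to keep
node_emb = make_table(100, NODE_DIM)  # used to model the kept interactions


def edge_kept(i, j, threshold=0.5):
    """Factorization-style edge score: sigmoid of the embedding dot product."""
    dot = sum(a * b for a, b in zip(edge_emb[i], edge_emb[j]))
    return 1.0 / (1.0 + math.exp(-dot)) > threshold


def sign_score(feature_ids):
    """Sum, over nodes, of the mean interaction vector across each node's kept edges."""
    node_sum = {i: [0.0] * NODE_DIM for i in feature_ids}
    degree = {i: 0 for i in feature_ids}
    for i, j in combinations(feature_ids, 2):
        if not edge_kept(i, j):
            continue  # interaction judged not beneficial, so it is dropped
        z = [a * b for a, b in zip(node_emb[i], node_emb[j])]  # element-wise product
        for node in (i, j):
            node_sum[node] = [s + v for s, v in zip(node_sum[node], z)]
            degree[node] += 1
    total = 0.0
    for i in feature_ids:
        if degree[i]:
            total += sum(node_sum[i]) / degree[i]
    return total


print(sign_score([24, 25, 26]))
```

The sketch assumes the feature IDs within one sample are distinct. A sample with a single feature has no candidate edges, so its score is 0.0.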
75+
76+
## 效果复现
77+
78+
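To help users get every model running quickly, sample data is provided under each model. To reproduce the results in the readme, follow the steps below in order.
79+
On the full dataset, the model's metrics are as follows: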
80+
81+
| Model | auc | acc | batch_size | epoch_num | Time of each epoch |
82+
| :--- | :----- | :----- | :--------- | :-------- | :----------------- |
83+
| SIGN | 0.9418 | 0.8927 | 1024 | 40 | about 18 minutes |
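
For readers unfamiliar with the two metrics in the table, here is a small sketch of their textbook definitions. The labels and scores are made-up toy values, and the functions `auc` and `accuracy` are hypothetical helpers; the training tools may compute these metrics differently (for example with a bucketed approximation), so this is only meant to show what the numbers measure.

```python
def accuracy(labels, scores, threshold=0.5):
    """Fraction of samples whose thresholded score matches the label."""
    hits = sum((s >= threshold) == (y == 1.0) for y, s in zip(labels, scores))
    return hits / len(labels)


def auc(labels, scores):
    """Probability that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1.0]
    neg = [s for y, s in zip(labels, scores) if y != 1.0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data, not taken from the ml-tag dataset.
labels = [1.0, 0.0, 1.0, 0.0]
scores = [0.9, 0.4, 0.35, 0.2]
print(auc(labels, scores), accuracy(labels, scores))  # 0.75 0.75
```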
84+
85+
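1. Make sure your current directory is PaddleRec/models/rank/sign.
86+
2. Go to the PaddleRec/datasets/sign directory and run the `run.sh` script, which downloads the full sign dataset from a server hosted in China and unpacks it into the designated folder.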
87+
88+
``` bash
89+
cd ../../../datasets/sign
90+
bash run.sh
91+
```
92+
93+
3. Install the dependencies
94+
95+
```shell
96+
# Install pgl
97+
pip install pgl
98+
```
99+
100+
4. Switch back to the model directory and run on the full dataset
101+
102+
```bash
103+
cd - # switch back to the model directory
104+
# Dynamic-graph training
105+
python -u ../../../tools/trainer.py -m config_bigdata.yaml # run on the full dataset
106+
python -u ../../../tools/infer.py -m config_bigdata.yaml # predict on the full dataset
107+
```
108+
109+
## 进阶使用
110+
111+
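This model supports the PaddlePaddle Training and Inference Pipeline Certification (TIPC) information and test tools, which let users check how far training, inference, and deployment are connected for each model and run one-click tests.
112+
113+
With this tool you can test which features are supported and whether prediction results are aligned. The test flow is summarized as follows:
114+
115+
1. Run `prepare.sh` to prepare the data and models needed for testing;
116+
2. Run the test script `test_train_inference_python.sh` to produce logs, which show whether each configuration ran successfully.
117+
118+
Testing a single feature takes only two commands, in the following format: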
119+
120+
```shell
121+
# Purpose: prepare the data
122+
# Format: bash + script + argument 1: config file + argument 2: mode
123+
# Mode [Mode] = 'lite_train_lite_infer' | 'whole_train_whole_infer' | 'whole_infer' | 'lite_train_whole_infer'
124+
bash test_tipc/prepare.sh configs/[model_name]/[params_file_name] [Mode]
125+
126+
# Purpose: run the test
127+
# Format: bash + script + argument 1: config file + argument 2: mode
128+
bash test_tipc/test_train_inference_python.sh configs/[model_name]/[params_file_name] [Mode]
129+
```
130+
131+
For example, to test basic training and prediction in `lite_train_lite_infer` mode, run:
132+
133+
```shell
134+
# Make sure the current directory is PaddleRec
135+
# cd PaddleRec
136+
# Prepare the data
137+
bash test_tipc/prepare.sh ./test_tipc/configs/sign/train_infer_python.txt 'lite_train_lite_infer'
138+
# Run the test
139+
bash test_tipc/test_train_inference_python.sh ./test_tipc/configs/sign/train_infer_python.txt 'lite_train_lite_infer'
140+
```
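
If you want to script the two TIPC steps, one possible approach is to build the command strings in Python and validate the mode first. This is only a sketch: `tipc_commands` is a hypothetical helper that is not shipped with PaddleRec, and it prints the commands rather than running them. The script paths and mode names are the ones listed above.

```python
import shlex

# Modes listed in the command format above.
MODES = (
    "lite_train_lite_infer",
    "whole_train_whole_infer",
    "whole_infer",
    "lite_train_whole_infer",
)


def tipc_commands(config_path, mode):
    """Return the prepare and test commands for one config file and mode."""
    if mode not in MODES:
        raise ValueError(f"unknown mode: {mode!r}")
    scripts = ("test_tipc/prepare.sh", "test_tipc/test_train_inference_python.sh")
    return [shlex.join(["bash", script, config_path, mode]) for script in scripts]


for cmd in tipc_commands(
    "./test_tipc/configs/sign/train_infer_python.txt", "lite_train_lite_infer"
):
    print(cmd)
```

The printed commands are meant to be run from the PaddleRec root directory, as in the example above.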
141+
142+
## FAQ

doc/source/readme.md

Lines changed: 1 addition & 0 deletions
@@ -50,3 +50,4 @@
5050
[autofis](https://paddlerec.readthedocs.io/en/latest/models/rank/autofis.html)
5151
[aitm](https://paddlerec.readthedocs.io/en/latest/models/rank/aitm.html)
5252
[dsin](https://paddlerec.readthedocs.io/en/latest/models/rank/dsin.html)
53+
[sign](https://paddlerec.readthedocs.io/en/latest/models/rank/sign.html)
