
Commit fa3e3a8

Merge pull request #697 from wangzhen38/add_mhcn
add mhcn
2 parents 08b6767 + c801409 commit fa3e3a8

12 files changed (+1722 / -0 lines)

contributor.md

Lines changed: 1 addition & 0 deletions
@@ -18,5 +18,6 @@
| [Dselect_K](models/multitask/dselect_k/) | [Andy1314Chen](https://github.com/Andy1314Chen) | https://github.com/PaddlePaddle/PaddleRec/pull/671 | Paper Reproduction Challenge, Season 5 |
| [MIND](models/recall/mind/) | [duyiqi17 ](https://github.com/duyiqi17) | https://github.com/PaddlePaddle/PaddleRec/pull/398 | Other |
| [FLEN](models/rank/flen/) | [LinJayan](https://github.com/LinJayan) | https://github.com/PaddlePaddle/PaddleRec/pull/685 | Paper Reproduction Challenge, Season 5 |
+| [MHCN](models/recall/mhcn/) | [Andy1314Chen](https://github.com/Andy1314Chen) | https://github.com/PaddlePaddle/PaddleRec/pull/679 | Paper Reproduction Challenge, Season 5 |

</div>

datasets/LastFM_MHCN/run.sh

Lines changed: 4 additions & 0 deletions
@@ -0,0 +1,4 @@
wget https://paddlerec.bj.bcebos.com/datasets/LastFM_MHCN/lastfm.zip
unzip lastfm.zip

rm -rf lastfm.zip __MACOSX
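
`run.sh` depends on `wget` and `unzip`. Those tools are often missing on Windows, even though Windows is listed as a supported OS. The minimal sketch below performs the same three steps using only the Python standard library. The helper names `extract_dataset` and `main` are hypothetical and not part of PaddleRec. The URL is the one used in `run.sh`.

```python
import os
import shutil
import urllib.request
import zipfile

# Hypothetical helper script (not part of PaddleRec) mirroring run.sh.
DATA_URL = "https://paddlerec.bj.bcebos.com/datasets/LastFM_MHCN/lastfm.zip"


def extract_dataset(zip_path, dest_dir="."):
    """Unzip the archive into dest_dir, then delete the archive and the
    macOS metadata folder (__MACOSX), like `unzip` + `rm -rf` in run.sh."""
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(dest_dir)
    os.remove(zip_path)
    shutil.rmtree(os.path.join(dest_dir, "__MACOSX"), ignore_errors=True)


def main(dest_dir="."):
    """Download the archive (the `wget` step) and extract it."""
    zip_path = os.path.join(dest_dir, "lastfm.zip")
    urllib.request.urlretrieve(DATA_URL, zip_path)
    extract_dataset(zip_path, dest_dir)
```

Call `main()` from inside `datasets/LastFM_MHCN`. Judging from the paths in `config_bigdata.yaml`, the archive presumably unpacks into a `lastfm/` folder holding `ratings.txt` and `trusts.txt`.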

models/recall/mhcn/README.md

Lines changed: 154 additions & 0 deletions
@@ -0,0 +1,154 @@
# Self-Supervised Multi-Channel Hypergraph Convolutional Network for Social Recommendation

**[AI Studio online runtime environment](https://aistudio.baidu.com/studio/project/partial/verify/3406375/b2db3498abdd41a39b0a994a8e95ffcb)**

The directory structure of this example is outlined below:

```
├── data # sample data
    ├── train
        ├── train.txt
    ├── test
        ├── test.txt
    ├── ratings.txt
    ├── trusts.txt
├── __init__.py
├── config.yaml # config for the sample data
├── config_bigdata.yaml # config for the full dataset
├── dygraph_model.py # builds the dynamic-graph model
├── infer.py # evaluation script
├── lastfm_reader.py # data reader
├── net.py # core model network (shared by dynamic- and static-graph modes)
├── README.md # documentation
```

Note: before reading this example, we recommend you first go through the following:

[PaddleRec getting-started tutorial](https://github.com/PaddlePaddle/PaddleRec/blob/master/README.md)

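## Contents

- [Model Introduction](#model-introduction)
- [Data Preparation](#data-preparation)
- [Runtime Environment](#runtime-environment)
- [Quick Start](#quick-start)
- [Model Architecture](#model-architecture)
- [Results Reproduction](#results-reproduction)
- [Advanced Usage](#advanced-usage)
- [FAQ](#faq)
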
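## Model Introduction

[Self-Supervised Multi-Channel Hypergraph Convolutional Network for Social Recommendation](https://arxiv.org/abs/2101.06448)

In recommender systems, social relations are often used to improve recommendation quality when user-item interaction data is sparse. Most existing social recommendation models exploit pairwise relations to mine latent user preferences. Real-world user interactions, however, are complex, and user relations can be high-order. Hypergraphs offer a natural way to model such complex high-order relations, yet their potential for improving social recommendation has not been fully explored.

The paper fills this gap with a multi-channel hypergraph convolutional network (MHCN), which exploits high-order user relations to enhance social recommendation. Technically, each channel of the network encodes one hypergraph, which describes a common high-order user-relation pattern through hypergraph convolution. Aggregating the embeddings learned by the channels yields comprehensive user representations, from which recommendations are generated.

The aggregation, however, can blur the inherent characteristics of the different types of high-order connectivity information. To compensate for this loss, the authors integrate self-supervised learning into the training of the hypergraph convolutional network. The self-supervised task regains the connectivity information through hierarchical mutual information maximization.
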
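The minimal NumPy sketch below makes "hypergraph convolution" concrete. It runs one propagation step on a hypergraph given by its incidence matrix `H` (nodes × hyperedges). Each hyperedge first averages its member nodes' embeddings, then each node averages the hyperedges it belongs to, i.e. `X' = D_v^{-1} H D_e^{-1} H^T X`. This is the generic, weight-free, row-normalized form, assumed here for illustration. `net.py` may implement MHCN's per-channel propagation differently, for example with precomputed, normalized adjacency matrices. The function name `hypergraph_conv` is hypothetical.

```python
import numpy as np


def hypergraph_conv(H, X):
    """One weight-free hypergraph convolution step (illustrative only).

    H: (num_nodes, num_edges) 0/1 incidence matrix; H[v, e] = 1 if node v is in hyperedge e.
    X: (num_nodes, dim) node embeddings.
    Returns D_v^{-1} H D_e^{-1} H^T X.
    """
    edge_deg = np.maximum(H.sum(axis=0), 1)  # nodes per hyperedge (clamped to avoid /0)
    node_deg = np.maximum(H.sum(axis=1), 1)  # hyperedges per node (clamped to avoid /0)
    edge_msg = (H.T @ X) / edge_deg[:, None]  # node -> hyperedge: mean of member nodes
    return (H @ edge_msg) / node_deg[:, None]  # hyperedge -> node: mean of incident hyperedges


# Toy hypergraph: 4 users; hyperedge 0 = {u0, u1, u2} (e.g. a closed triangle), hyperedge 1 = {u2, u3}.
H = np.array([[1, 0],
              [1, 0],
              [1, 1],
              [0, 1]], dtype=float)
X = np.eye(4)  # one-hot features, so each output row shows how much of every user is mixed in
print(hypergraph_conv(H, X).round(3))
```

With one-hot inputs, every output row sums to 1. User `u3` belongs only to hyperedge 1, so it becomes an even mix of `u2` and `u3`. User `u2` sits in both hyperedges and averages the two.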

![](https://tva1.sinaimg.cn/large/008i3skNly1gya578zf58j30tn078dgo.jpg)

![](https://tva1.sinaimg.cn/large/008i3skNly1gya57zcfs9j30v10c1q59.jpg)

The two figures above show, respectively, the types of structures that make up the hypergraphs and the network architecture of MHCN.

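The self-supervised objective described above can be sketched for a single channel as a two-level ranking loss. The sketch rests on several assumptions:

- The discriminator is a dot product.
- A user's sub-hypergraph summary `z_u` is the mean of its neighbours' embeddings.
- The graph summary `s` is the mean of all `z_u`.
- Negatives are made by shuffling rows.

Under these assumptions, a user should score higher with its own `z_u` than a shuffled user does, and each `z_u` should score higher with `s` than a shuffled summary does. This is a simplified illustration in the spirit of hierarchical mutual information maximization, not a transcription of the loss in `net.py`. The function names are hypothetical.

```python
import numpy as np


def log_sigmoid(x):
    # Numerically stable log(sigmoid(x)) = -log(1 + exp(-x)).
    return -np.logaddexp(0.0, -x)


def hierarchical_mi_loss(P, A, rng):
    """Simplified two-level InfoMax ranking loss for one channel (illustrative).

    P: (num_users, dim) user embeddings of this channel.
    A: (num_users, num_users) 0/1 channel adjacency (assumed, e.g. motif-induced).
    rng: numpy Generator used to build row-shuffled negatives.
    """
    deg = np.maximum(A.sum(axis=1, keepdims=True), 1)
    Z = (A @ P) / deg                    # sub-hypergraph summary per user (mean readout, assumed)
    s = Z.mean(axis=0)                   # graph-level summary (mean pooling, assumed)
    P_neg = P[rng.permutation(len(P))]   # corrupted users
    Z_neg = Z[rng.permutation(len(Z))]   # corrupted summaries

    # Level 1: user vs. its own sub-hypergraph, ranked above a shuffled user.
    local = np.sum(P * Z, axis=1) - np.sum(P_neg * Z, axis=1)
    # Level 2: sub-hypergraph vs. the whole graph, ranked above a shuffled summary.
    global_ = Z @ s - Z_neg @ s
    return -(log_sigmoid(local).sum() + log_sigmoid(global_).sum())


rng = np.random.default_rng(0)
P = rng.normal(size=(5, 8))
A = np.ones((5, 5)) - np.eye(5)  # toy fully connected channel without self-loops
loss = hierarchical_mi_loss(P, A, rng)
print(float(loss))
```

Every term is a `-log(sigmoid(.))`, so the loss is always positive. In a full model it would be added, with some weight, to the recommendation loss.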

```text
@inproceedings{yu2021self,
  title={Self-Supervised Multi-Channel Hypergraph Convolutional Network for Social Recommendation},
  author={Yu, Junliang and Yin, Hongzhi and Li, Jundong and Wang, Qinyong and Hung, Nguyen Quoc Viet and Zhang, Xiangliang},
  booktitle={Proceedings of the Web Conference 2021},
  pages={413--424},
  year={2021}
}
```

## Data Preparation

The paper uses three common public social recommendation datasets: `LastFM`, `Douban`, and `Yelp`. This reproduction uses the [LastFM](http://files.grouplens.org/datasets/hetrec2011/) data. The dataset contains the following files:

```text
* artists.dat

This file contains information about music artists listened to and tagged by the users.

* tags.dat

This file contains the set of tags available in the dataset.

* user_artists.dat

This file contains the artists listened to by each user.

It also provides a listening count for each [user, artist] pair.

* user_taggedartists.dat - user_taggedartists-timestamps.dat

These files contain the tag assignments of artists provided by each particular user.

They also contain the timestamps when the tag assignments were done.

* user_friends.dat

This file contains the friend relations between users in the database.
```

The MHCN authors have open-sourced their preprocessed version of this dataset on GitHub. It consists of two files, `ratings.txt` and `trusts.txt`. Download link: https://github.com/Coder-Yu/QRec/blob/master/dataset/lastfm

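The sample data in this commit suggests the rating-file layout: each line holds three whitespace-separated integers, namely a user ID, an item (artist) ID, and an interaction weight (a listening count). The sketch below assumes that layout for `ratings.txt`. It also assumes, without confirmation, that `trusts.txt` lists `user friend` pairs, optionally followed by a weight. This page does not show `lastfm_reader.py`, so these hypothetical helpers are not its actual logic.

```python
from collections import defaultdict


def load_ratings(lines):
    """Parse 'user item weight' lines into {user: {item: weight}} (assumed format)."""
    user_items = defaultdict(dict)
    for line in lines:
        parts = line.split()
        if len(parts) < 3:
            continue  # skip blank or malformed lines
        user, item, weight = parts[0], parts[1], float(parts[2])
        user_items[user][item] = weight
    return dict(user_items)


def load_trusts(lines):
    """Parse 'user friend [weight]' lines into {user: set_of_friends} (assumed format).

    Relations are kept directed, as they appear in the file.
    """
    friends = defaultdict(set)
    for line in lines:
        parts = line.split()
        if len(parts) < 2:
            continue
        friends[parts[0]].add(parts[1])
    return dict(friends)


# With real files, pass an open file object, e.g.
#   with open("../../../datasets/LastFM_MHCN/lastfm/ratings.txt") as f:
#       ratings = load_ratings(f)
sample = ["2 51 13883", "2 52 11690", "3 101 13176"]
ratings = load_ratings(sample)
print(len(ratings), len(ratings["2"]))
```

IDs are kept as strings so they can be remapped to contiguous indices later, which embedding tables typically require.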

## Runtime Environment

PaddlePaddle>=2.0

python 2.7/3.5/3.6/3.7

os: windows/linux/macos

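## Quick Start

Sample data is provided so you can try the model quickly. The commands below use paths relative to the mhcn model directory, so run them from there:

```bash
# Enter the model directory (starting from the PaddleRec root)
cd models/recall/mhcn
# Dynamic-graph training (use config_bigdata.yaml for the full dataset)
python -u ../../../tools/trainer.py -m config.yaml
# Dynamic-graph evaluation (note: MHCN's evaluation metrics are fairly involved, so they are not integrated into the generic tools/infer.py)
python infer.py -m config.yaml
```
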
## Model Architecture

The MHCN network structure is shown in the second figure of the Model Introduction section. It maps one-to-one onto the code in net.py.

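One step of that architecture combines the per-channel user embeddings into a single user representation. It can be sketched as attention over channels, assuming a learned projection `W` and attention vector `a`, with each user's channel weight taken as a softmax over channels of `a^T W p_c`. The names, shapes, and the exact scoring function are assumptions for illustration, not taken from `net.py`. The parameters are random here.

```python
import numpy as np


def softmax(x, axis):
    x = x - x.max(axis=axis, keepdims=True)  # shift for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def aggregate_channels(channel_embs, W, a):
    """Attention-weighted sum of per-channel user embeddings (illustrative).

    channel_embs: (num_channels, num_users, dim)
    W: (dim, dim) projection and a: (dim,) attention vector (hypothetical parameters).
    Returns (user_embs, weights) with shapes (num_users, dim) and (num_channels, num_users).
    """
    scores = np.einsum("e,ed,cud->cu", a, W, channel_embs)  # a^T W p_c for every channel/user
    weights = softmax(scores, axis=0)                        # normalize across channels
    user_embs = np.einsum("cu,cud->ud", weights, channel_embs)
    return user_embs, weights


rng = np.random.default_rng(0)
channels = rng.normal(size=(3, 4, 8))  # e.g. social, joint, purchase channels; 4 users; dim 8
W = rng.normal(size=(8, 8))
a = rng.normal(size=8)
users, alpha = aggregate_channels(channels, W, a)
print(users.shape, alpha.sum(axis=0))
```

Each user gets its own channel weights, which lets the model lean on whichever relation type is most informative for that user. As a sanity check, if all channels are identical, the output equals that channel.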
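## Results Reproduction

![](https://tva1.sinaimg.cn/large/008i3skNly1gya5pggeiaj30nq02mt97.jpg)

In the original paper, MHCN's best results on the LastFM dataset are 20.052% for P@10, 20.375% for R@10, and 0.24395 for N@10. An [issue on the paper's open-source GitHub repository](https://github.com/Coder-Yu/QRec/issues/216) reports a reproduction that scores slightly lower on the first two metrics and slightly higher on NDCG@10: 19.607%, 19.914%, and 0.24459. Even so, those results remain the best among the compared models.
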
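For reference, the three metrics can be computed per user with their common textbook definitions under binary relevance:

- Precision@k is hits / k.
- Recall@k is hits divided by the number of held-out items.
- NDCG@k is DCG divided by the ideal DCG over `min(k, |relevant|)` positions.

Averaging over users gives the reported numbers. This is a sketch under those assumptions. `infer.py` may differ in details, such as how it treats users with few test items or items already seen in training. The function names are hypothetical.

```python
import math


def metrics_at_k(ranked_items, relevant_items, k=10):
    """Return (precision, recall, ndcg) at cut-off k for one user."""
    relevant = set(relevant_items)
    hits = [1 if item in relevant else 0 for item in ranked_items[:k]]
    num_hits = sum(hits)
    precision = num_hits / k
    recall = num_hits / len(relevant) if relevant else 0.0
    # Binary-gain DCG: the hit at 0-based position i is discounted by log2(i + 2).
    dcg = sum(h / math.log2(i + 2) for i, h in enumerate(hits))
    idcg = sum(1 / math.log2(i + 2) for i in range(min(k, len(relevant))))
    ndcg = dcg / idcg if idcg > 0 else 0.0
    return precision, recall, ndcg


def average_metrics(rankings, ground_truth, k=10):
    """Average per-user metrics over all users that have at least one test item.

    rankings: {user: [item, ...]} ranked recommendations (best first).
    ground_truth: {user: [item, ...]} held-out test items.
    """
    users = [u for u in ground_truth if ground_truth[u]]
    if not users:
        return (0.0, 0.0, 0.0)
    totals = [0.0, 0.0, 0.0]
    for u in users:
        for j, value in enumerate(metrics_at_k(rankings[u], ground_truth[u], k)):
            totals[j] += value
    return tuple(t / len(users) for t in totals)


p, r, n = metrics_at_k(["a", "b", "c", "d"], ["a", "c", "e"], k=4)
print(p, r, round(n, 4))
```

P@10 and R@10 are reported as percentages in the tables here, while N@10 is reported as a fraction. Multiply the first two values by 100 before comparing.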

To help users quickly run every model, sample data is provided under each model directory. To reproduce the results reported in this README, follow the steps below in order. The model's metrics on the full dataset are as follows:

| Model | P@10 | R@10 | N@10 | batch_size | epoch_num | Time per epoch |
| :------| :------ | :------ | :------ | :------ | :------| :------ |
| MHCN | 20.063% | 20.452% | 0.24780 | 2000 | 120 | ~15 min |

1. Make sure your current directory is PaddleRec/models/recall/mhcn.
2. Go to the PaddleRec/datasets/LastFM_MHCN directory and run the script there. It downloads the dataset from a server in mainland China and extracts it into the target folder.
```shell
cd ../../../datasets/LastFM_MHCN
sh run.sh
```
3. Switch back to the model directory and run on the full dataset:

```bash
# Return to the mhcn directory; the commands below must be run from there
cd ../../models/recall/mhcn
# Dynamic-graph training on the full dataset
python -u ../../../tools/trainer.py -m config_bigdata.yaml
# Evaluation on the full dataset
python -u infer.py -m config_bigdata.yaml
```

## Advanced Usage

## FAQ

models/recall/mhcn/__init__.py

Lines changed: 13 additions & 0 deletions
@@ -0,0 +1,13 @@
# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

models/recall/mhcn/config.yaml

Lines changed: 44 additions & 0 deletions
@@ -0,0 +1,44 @@
# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

runner:
  train_data_dir: "data"
  rating_file: "data/ratings_100.txt"
  relation_file: "data/trusts_100.txt"
  train_batch_size: 8
  train_reader_path: "lastfm_reader" # importlib format
  use_gpu: False
  use_auc: False
  epochs: 5
  print_interval: 1
  model_save_path: "output_model_mhcn"
  test_data_dir: "data"
  infer_batch_size: 8
  infer_reader_path: "lastfm_reader" # importlib format
  infer_load_path: "output_model_mhcn"
  infer_start_epoch: 0
  infer_end_epoch: 5
  top_k: 10
  is_train: True

# hyper parameters of user-defined network
hyper_parameters:
  # optimizer config
  optimizer:
    class: Adam
    learning_rate: 0.001
    # strategy: async
  # user-defined <key, value> pairs
  n_layer: 2
  num_factors: 50
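
In `hyper_parameters`, `num_factors` presumably sets the embedding dimension and `n_layer` the number of propagation layers per channel. This reading is an assumption, since `net.py` is not shown on this page. Under that reading, the NumPy sketch below illustrates what the two values control. It runs `n_layer` rounds of row-normalized propagation over `num_factors`-dimensional embeddings and sums the layer outputs into the final embedding. The layer-sum readout and the function name `propagate` are likewise assumptions.

```python
import numpy as np


def propagate(A, E0, n_layer=2):
    """Run n_layer rounds of row-normalized propagation and sum all layers (illustrative).

    A: (num_users, num_users) adjacency of one channel.
    E0: (num_users, num_factors) initial embeddings.
    """
    deg = np.maximum(A.sum(axis=1, keepdims=True), 1)
    A_norm = A / deg                 # D^{-1} A: each row averages its neighbours
    layers = [E0]
    for _ in range(n_layer):
        layers.append(A_norm @ layers[-1])
    return np.sum(layers, axis=0)    # layer-sum readout (assumed)


num_users, num_factors, n_layer = 5, 50, 2  # num_factors and n_layer as in config.yaml
rng = np.random.default_rng(42)
A = (rng.random((num_users, num_users)) > 0.6).astype(float)
np.fill_diagonal(A, 0)
E0 = rng.normal(scale=0.1, size=(num_users, num_factors))
E = propagate(A, E0, n_layer)
print(E.shape)
```

A larger `n_layer` lets information travel more hops through the channel graph, while `num_factors` only changes the width of every embedding.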
Lines changed: 44 additions & 0 deletions
@@ -0,0 +1,44 @@
# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

runner:
  train_data_dir: "data"
  rating_file: "../../../datasets/LastFM_MHCN/lastfm/ratings.txt"
  relation_file: "../../../datasets/LastFM_MHCN/lastfm/trusts.txt"
  train_batch_size: 2000
  train_reader_path: "lastfm_reader" # importlib format
  use_gpu: True
  use_auc: False
  epochs: 120
  print_interval: 1
  model_save_path: "output_model_mhcn_all"
  test_data_dir: "data"
  infer_batch_size: 2000
  infer_reader_path: "lastfm_reader" # importlib format
  infer_load_path: "output_model_mhcn_all"
  infer_start_epoch: 0
  infer_end_epoch: 117
  top_k: 10
  is_train: True

# hyper parameters of user-defined network
hyper_parameters:
  # optimizer config
  optimizer:
    class: Adam
    learning_rate: 0.001
    # strategy: async
  # user-defined <key, value> pairs
  n_layer: 2
  num_factors: 50
Lines changed: 100 additions & 0 deletions
@@ -0,0 +1,100 @@
2 51 13883
2 52 11690
2 53 11351
2 54 10300
2 55 8983
2 56 6152
2 57 5955
2 58 4616
2 59 4337
2 60 4147
2 61 3923
2 62 3782
2 63 3735
2 64 3644
2 65 3579
2 66 3312
2 67 3301
2 68 2927
2 69 2720
2 70 2686
2 71 2654
2 72 2619
2 73 2584
2 74 2547
2 75 2397
2 76 2382
2 77 2120
2 78 2119
2 79 1990
2 80 1972
2 81 1948
2 82 1868
2 83 1792
2 84 1740
2 85 1638
2 86 1594
2 87 1559
2 88 1553
2 89 1519
2 90 1471
2 91 1438
2 92 1411
2 93 1407
2 94 1373
2 95 1363
2 96 1342
2 97 1337
2 98 1332
2 99 1330
2 100 1315
3 101 13176
3 102 662
3 103 493
3 104 431
3 105 403
3 106 354
3 107 269
3 108 236
3 109 215
3 110 215
3 111 212
3 112 193
3 113 187
3 114 167
3 115 163
3 116 154
3 117 151
3 118 142
3 119 142
3 120 125
3 121 111
3 122 108
3 123 104
3 124 99
3 125 98
3 126 94
3 127 89
3 128 89
3 129 86
3 130 85
3 131 84
3 132 83
3 133 83
3 134 77
3 135 77
3 136 76
3 137 75
3 138 72
3 139 72
3 140 71
3 141 70
3 142 70
3 143 70
3 144 69
3 145 68
3 146 67
3 147 67
3 148 66
3 149 66
3 150 65
