diff --git a/README_CN.md b/README_CN.md (1 addition, 0 deletions)
diff --git a/README_EN.md b/README_EN.md (211 additions, 0 deletions)
diff --git a/datasets/kim/run.sh b/datasets/kim/run.sh (2 additions, 0 deletions)
diff --git a/doc/imgs/kim.png b/doc/imgs/kim.png (binary, 66.8 KB)
diff --git a/doc/source/models/match/kim.md b/doc/source/models/match/kim.md (98 additions, 0 deletions)
diff --git a/models/match/kim/config.yaml b/models/match/kim/config.yaml (48 additions, 0 deletions)
diff --git a/models/match/kim/config_bigdata.yaml b/models/match/kim/config_bigdata.yaml (48 additions, 0 deletions)
diff --git a/models/match/kim/data/sample_data/KGGraph.json b/models/match/kim/data/sample_data/KGGraph.json (1 addition, 0 deletions)
@@ -125,6 +125,7 @@ python -u tools/static_trainer.py -m models/rank/dnn/config.yaml #  static-graph train
   | Match | [DSSM](models/match/dssm/)([Docs](https://paddlerec.readthedocs.io/en/latest/models/match/dssm.html)) | [Python CPU/GPU](https://aistudio.baidu.com/aistudio/projectdetail/3238124) | ✓ | x | >=2.1.0 | [CIKM 2013][Learning Deep Structured Semantic Models for Web Search using Clickthrough Data](https://www.microsoft.com/en-us/research/wp-content/uploads/2016/02/cikm2013_DSSM_fullversion.pdf) |
   | Match | [Match-Pyramid](models/match/match-pyramid/)([Docs](https://paddlerec.readthedocs.io/en/latest/models/match/match-pyramid.html)) | [Python CPU/GPU](https://aistudio.baidu.com/aistudio/projectdetail/3238192) | ✓ | x | >=2.1.0 | [AAAI 2016][Text Matching as Image Recognition](https://arxiv.org/pdf/1602.06359.pdf) |
   | Match | [MultiView-Simnet](models/match/multiview-simnet/)([Docs](https://paddlerec.readthedocs.io/en/latest/models/match/multiview-simnet.html)) | [Python CPU/GPU](https://aistudio.baidu.com/aistudio/projectdetail/3238206) | ✓ | x | >=2.1.0 | [WWW 2015][A Multi-View Deep Learning Approach for Cross Domain User Modeling in Recommendation Systems](https://www.microsoft.com/en-us/research/wp-content/uploads/2016/02/frp1159-songA.pdf) |
+  | Match | [KIM](models/match/kim/)([Docs](https://paddlerec.readthedocs.io/en/latest/models/match/kim.html)) | - | x | x | >=2.1.0 | [SIGIR 2021][Personalized News Recommendation with Knowledge-aware Interactive Matching](https://arxiv.org/pdf/2104.10083.pdf) |
   | Recall | [TDM](https://github.com/PaddlePaddle/PaddleRec/tree/release/1.8.5/models/treebased/tdm/) | - | ✓ | >=1.8.0 | [1.8.5](https://github.com/PaddlePaddle/PaddleRec/tree/release/1.8.5) | [KDD 2018][Learning Tree-based Deep Model for Recommender Systems](https://arxiv.org/pdf/1801.02294.pdf) |
   | Recall | [FastText](https://github.com/PaddlePaddle/PaddleRec/tree/release/1.8.5/models/recall/fasttext/) | - | x | x | [1.8.5](https://github.com/PaddlePaddle/PaddleRec/tree/release/1.8.5) | [EACL 2017][Bag of Tricks for Efficient Text Classification](https://www.aclweb.org/anthology/E17-2068.pdf) |
   | Recall | [MIND](models/recall/mind/)([Docs](https://paddlerec.readthedocs.io/en/latest/models/recall/mind.html)) | [Python CPU/GPU](https://aistudio.baidu.com/aistudio/projectdetail/3239088) | x | x | >=2.1.0 | [2019][Multi-Interest Network with Dynamic Routing for Recommendation at Tmall](https://arxiv.org/pdf/1904.08030.pdf) |
 
@@ -0,0 +1,2 @@
+wget https://paddlerec.bj.bcebos.com/datasets/kim/kim.zip
+unzip kim.zip
@@ -0,0 +1,98 @@
+# KIM text matching model
+
+
+The directory structure of this example is briefly described below:
+
+```
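+├── data # sample data
+    ├── sample_data # sample data
+        ├── docs.tsv # news text
+        ├── entity2id.txt # entity-to-ID mapping table
+        ├── glove.840B.300d.txt # GloVe word vectors
+        ├── KGGraph.json # knowledge-graph relations
+        ├── train.tsv # sample training data
+        ├── test.txt # sample test data
+├── __init__.py
+├── README.md # documentation
+├── config.yaml # config for the sample data
+├── config_bigdata.yaml # config for the full dataset
+├── dygraph_model.py # builds the dynamic-graph model
+├── net.py # core network definition (shared by dynamic and static graph modes)
+├── mind_reader.py # data reader
+├── eval_utils # evaluation functions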
+```
+
+Note: before reading this example, we recommend that you first go through the following:
+
+[PaddleRec getting-started tutorial](https://github.com/PaddlePaddle/PaddleRec/blob/master/README.md)  
+[kim](https://paddlerec.readthedocs.io/en/latest/models/match/kim.html)  
+
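+## Contents
+
+- [Data preparation](#data-preparation)
+- [Environment](#environment)
+- [Quick start](#quick-start)
+- [Model architecture](#model-architecture)
+- [Reproducing results](#reproducing-results)
+- [Advanced usage](#advanced-usage)
+- [FAQ](#faq)
+
+
+## Data preparation
+The training and test data use the MIND news dataset together with knowledge-graph data; the embedding layer is initialized with glove.840B.300d word vectors.
+
+## Environment
+- PaddlePaddle>=2.0
+- nltk>=3.7
+- python 3.7
+- OS: Windows/Linux/macOS
+
+## Quick start
+**Before you start, make sure the NLTK English tokenizer model has been downloaded to your user directory.**
+
+Sample data is provided so you can try the model quickly, and the commands can be run from any directory. From the kim model directory, run:
+```bash
+# Enter the model directory
+# cd models/match/kim # can be run from any directory
+# Train (dynamic graph)
+python -u trainer.py -m config.yaml -o mode=train # use config_bigdata.yaml for the full dataset
+# Predict (dynamic graph)
+python -u infer.py -m config.yaml -o mode=test
+```
+
+## Model architecture
+Accurately matching candidate news with user interests is the core of personalized news recommendation. Most existing methods model candidate news from its textual content and model user interests from the user's clicked news, and the two are modeled independently. However, a news article may cover multiple aspects and entities, and a user may have multiple interests, so modeling candidate news and user interests independently can lead to inferior matching between news and users. KIM is a knowledge-aware interactive matching framework for personalized news recommendation: it models candidate news and user interests interactively, learning user-aware candidate news representations and candidate-aware user interest representations, which helps match user interests with candidate news more accurately. Specifically, a knowledge co-encoder uses the knowledge graph to capture relatedness between entities and interactively learns knowledge-based representations of clicked and candidate news. A text co-encoder models the semantic relatedness between texts to interactively learn text-based representations of clicked and candidate news. Finally, a user-news co-encoder learns candidate-aware user interest representations and user-aware candidate news representations from the knowledge-based and text-based representations of the candidate and clicked news, for better interest matching. Extensive experiments on two real-world datasets in the paper show that the method effectively improves news recommendation performance. The overall architecture is shown below:
+<p align="center">
+<img align="center" src="../../../doc/imgs/kim.png">
+</p>
+
+
+## Reproducing results
+To help you get every model running quickly, sample data is provided under each model directory. To reproduce the results reported in this README, follow the steps below.  
+On the full dataset, the model achieves the following metrics:  
+
+| Model | AUC | MRR | nDCG@5 | nDCG@10 | batch_size | epoch_num | Time per epoch |
+|-------|-----|-----|--------|---------|------------|-----------|----------------|
+| kim | 0.6681 | 0.3164 | 0.3484 | 0.4132 | 16 | 7 | 2h |
+| kim | 0.6696 | 0.3192 | 0.3515 | 0.4158 | 16 | 8 | 2h |
+
+1. Make sure your current directory is PaddleRec/models/match/kim.
+2. Go to the PaddleRec/datasets/kim directory and run the script below. It downloads our preprocessed full KIM dataset from a server hosted in China and extracts it into the target folder.
+```bash
+cd ../../../datasets/kim
+bash run.sh
+```
+3. Switch back to the model directory and run training and inference:
+```bash
+cd ../../models/match/kim
+python -u trainer.py -m config_bigdata.yaml -o mode=train
+python -u infer.py -m config_bigdata.yaml -o mode=test
+```
+
+## Advanced usage
+  
+## FAQ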
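The results table in kim.md reports AUC, MRR, nDCG@5 and nDCG@10. The sketch below shows how these per-impression ranking metrics are commonly computed on MIND-style data, where each impression holds binary click labels and model scores for its candidate news; the reported figures are typically the mean over all impressions. It is a plain-Python illustration written for this note, not the repository's `eval_utils` code, which is not part of this diff.

```python
import math
from typing import List, Sequence


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Pairwise AUC: share of (clicked, non-clicked) pairs ranked correctly; ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one clicked and one non-clicked item")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def _labels_by_score(labels: Sequence[int], scores: Sequence[float]) -> List[int]:
    """Reorder the click labels by descending model score."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return [labels[i] for i in order]


def mrr(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Sum of 1/rank over clicked items, divided by the number of clicked items."""
    ranked = _labels_by_score(labels, scores)
    return sum(y / (i + 1) for i, y in enumerate(ranked)) / sum(ranked)


def _dcg(ranked: List[int], k: int) -> float:
    # Gain 2^y - 1, discounted by log2(rank + 1) with 1-based ranks.
    return sum((2 ** y - 1) / math.log2(i + 2) for i, y in enumerate(ranked[:k]))


def ndcg(labels: Sequence[int], scores: Sequence[float], k: int) -> float:
    """DCG@k of the predicted ranking divided by DCG@k of the ideal ranking."""
    ideal = _dcg(sorted(labels, reverse=True), k)
    return _dcg(_labels_by_score(labels, scores), k) / ideal


if __name__ == "__main__":
    # One impression: 5 candidates, two of them clicked.
    labels = [1, 0, 0, 1, 0]
    scores = [0.8, 0.9, 0.1, 0.3, 0.2]
    # AUC = 4/6, MRR = (1/2 + 1/3) / 2 = 5/12
    print(auc(labels, scores), mrr(labels, scores))
    print(ndcg(labels, scores, 5), ndcg(labels, scores, 10))
```

Pairwise AUC is quadratic in the impression size, which is fine for MIND impressions of a few hundred candidates at most; a library such as scikit-learn computes the same quantity by sorting.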
@@ -0,0 +1,48 @@
+# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+# global settings
+
+runner:
+  train_data_dir: "./data/sample_data"
+  train_reader_path: "mind_reader" # importlib format
+  use_gpu: False
+  use_auc: False
+  train_batch_size: 1
+  epochs: 1
+  print_interval: 1
+  #model_init_path: "output_model/0" # init model
+  model_save_path: "output_model_kim"
+  test_data_dir: "./data/sample_data"
+  infer_reader_path: "mind_reader" # importlib format
+  infer_batch_size: 1
+  infer_load_path: "output_model_kim"
+  infer_start_epoch: 0
+  infer_end_epoch: 1
+  random_emb: true
+
+# hyper parameters of user-defined network
+hyper_parameters:
+  # optimizer config
+  optimizer:
+    class: Adam
+    learning_rate: 0.00005
+  # user-defined <key, value> pairs
+  max_sentence: 30
+  max_sents: 50
+  max_entity_num: 10
+  npratio: 4
+  hidden_size: 400
+  embedding_size: 300
+  vocab_size: 1891
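config.yaml sets `npratio: 4`. In MIND-based news recommenders such as NRMS, a key with this name usually means the number of non-clicked news sampled for each clicked news in an impression during training: the model scores 1 + npratio candidates and is trained with softmax cross-entropy over them. Assuming KIM's reader follows that convention (`mind_reader.py` is not shown in this diff, so this is an assumption), the sketch below builds one such training sample and computes its loss. `build_training_sample` and `softmax_cross_entropy` are hypothetical names, not PaddleRec APIs.

```python
import math
import random
from typing import List, Sequence, Tuple


def build_training_sample(clicked: str, non_clicked: Sequence[str], npratio: int,
                          rng: random.Random) -> Tuple[List[str], int]:
    """Hypothetical helper: one clicked news plus `npratio` sampled non-clicked news.

    Returns the shuffled candidate IDs and the index of the clicked one (the label).
    Samples with replacement when the impression has fewer than `npratio` negatives.
    """
    if not non_clicked:
        raise ValueError("impression has no non-clicked news to sample from")
    if len(non_clicked) >= npratio:
        negatives = rng.sample(list(non_clicked), npratio)
    else:
        negatives = [rng.choice(non_clicked) for _ in range(npratio)]
    pool = [clicked] + negatives      # index 0 is the positive
    order = list(range(npratio + 1))
    rng.shuffle(order)                # hide the positive's position from the model
    candidates = [pool[i] for i in order]
    return candidates, order.index(0)


def softmax_cross_entropy(scores: Sequence[float], label: int) -> float:
    """-log(softmax(scores)[label]), computed stably by subtracting the max score."""
    m = max(scores)
    log_sum_exp = m + math.log(sum(math.exp(s - m) for s in scores))
    return log_sum_exp - scores[label]


if __name__ == "__main__":
    rng = random.Random(42)
    cands, label = build_training_sample(
        "N10", ["N2", "N3", "N5", "N7", "N8", "N9"], npratio=4, rng=rng)
    print(cands, label)  # 5 candidates; cands[label] is "N10"
    # With equal scores every candidate gets probability 1/5, so the loss is log(5).
    print(softmax_cross_entropy([0.0] * 5, label))
```

Framing training as picking the clicked item out of 1 + npratio candidates turns click prediction into a small classification problem per impression; the scores would come from the model's matching of each candidate against the user's clicked history.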
@@ -0,0 +1,48 @@
+# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+# global settings
+
+runner:
+  train_data_dir: "../../../datasets/kim/data/whole_data"
+  train_reader_path: "mind_reader" # importlib format
+  use_gpu: True
+  use_auc: False
+  train_batch_size: 16
+  epochs: 8
+  print_interval: 50
+  #model_init_path: "output_model/0" # init model
+  model_save_path: "output_model_kim_all"
+  test_data_dir: "../../../datasets/kim/data/whole_data"
+  infer_reader_path: "mind_reader" # importlib format
+  infer_batch_size: 64
+  infer_load_path: "output_model_kim_all"
+  infer_start_epoch: 6
+  infer_end_epoch: 8
+  random_emb: false
+
+# hyper parameters of user-defined network
+hyper_parameters:
+  # optimizer config
+  optimizer:
+    class: Adam
+    learning_rate: 0.00005
+  # user-defined <key, value> pairs
+  max_sentence: 30
+  max_sents: 50
+  max_entity_num: 10
+  npratio: 4
+  hidden_size: 400
+  embedding_size: 300
+  vocab_size: 42055
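config_bigdata.yaml sets `random_emb: false`, `embedding_size: 300` and `vocab_size: 42055`, while the sample config uses `random_emb: true`, and the README states that the embedding layer is initialized from glove.840B.300d. Assuming `random_emb` switches between random initialization and GloVe initialization (the reader and network code are not in this diff), the sketch below shows one common way to build a vocab_size x embedding_size table from a GloVe text file: words found in GloVe get their pretrained vector, and the rest keep a small random vector. `build_embedding_matrix` and the `word2id` mapping are hypothetical.

```python
import random
from typing import Dict, Iterable, List


def build_embedding_matrix(glove_lines: Iterable[str], word2id: Dict[str, int],
                           vocab_size: int, dim: int = 300,
                           random_emb: bool = False, seed: int = 0) -> List[List[float]]:
    """Hypothetical helper: vocab_size x dim embedding table, optionally seeded from GloVe."""
    rng = random.Random(seed)
    # Every row starts as a small random vector; rows for words missing from GloVe keep it.
    matrix = [[rng.uniform(-0.1, 0.1) for _ in range(dim)] for _ in range(vocab_size)]
    if random_emb:
        return matrix
    for line in glove_lines:
        parts = line.rstrip("\r\n").split(" ")
        if len(parts) <= dim:
            continue  # skip malformed or header lines
        # Some glove.840B tokens contain spaces, so the vector is always the last `dim` fields.
        word = " ".join(parts[:-dim])
        idx = word2id.get(word)
        if idx is not None and 0 <= idx < vocab_size:
            matrix[idx] = [float(x) for x in parts[-dim:]]
    return matrix
```

With the real file you would pass an open handle such as `open("glove.840B.300d.txt", encoding="utf-8")`. For the full 42,055 x 300 table, a NumPy array is far more memory-efficient than nested Python lists.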