MTReclib provides a PyTorch implementation of multi-task recommendation models and common datasets. Currently, we have implemented 7 multi-task recommendation models to enable fair comparison and to support the development of multi-task recommendation algorithms. The supported algorithms include:
- SingleTask: Train a separate model for each task.
- Shared-Bottom: A traditional multi-task model with a shared bottom network and task-specific towers.
- OMoE: Adaptive Mixtures of Local Experts (Neural Computation 1991)
- MMoE: Modeling Task Relationships in Multi-task Learning with Multi-Gate Mixture-of-Experts (KDD 2018)
- PLE: Progressive Layered Extraction (PLE): A Novel Multi-Task Learning (MTL) Model for Personalized Recommendations (RecSys 2020 best paper)
- AITM: Modeling the Sequential Dependence among Audience Multi-step Conversions with Multi-task Learning in Targeted Display Advertising (KDD 2021)
- MetaHeac: Learning to Expand Audience via Meta Hybrid Experts and Critics for Recommendation and Advertising (KDD 2021)
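Several of the models above (OMoE, MMoE, PLE, MetaHeac) share an expert-and-gate pattern: a pool of shared expert networks whose outputs are mixed by a learned softmax gate. The sketch below illustrates that pattern in MMoE style, with one gate per task; all class names and dimensions are illustrative and this is not MTReclib's actual implementation.

```python
import torch
import torch.nn as nn

class MMoESketch(nn.Module):
    """Minimal MMoE-style sketch: shared experts, one softmax gate per task.
    Dimensions are illustrative, not the library's defaults."""
    def __init__(self, input_dim=16, expert_dim=8, expert_num=4, task_num=2):
        super().__init__()
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(input_dim, expert_dim), nn.ReLU())
            for _ in range(expert_num)
        )
        # One gate and one tower per task.
        self.gates = nn.ModuleList(nn.Linear(input_dim, expert_num) for _ in range(task_num))
        self.towers = nn.ModuleList(nn.Linear(expert_dim, 1) for _ in range(task_num))

    def forward(self, x):
        e = torch.stack([ex(x) for ex in self.experts], dim=1)    # (B, E, D)
        preds = []
        for gate, tower in zip(self.gates, self.towers):
            w = torch.softmax(gate(x), dim=1).unsqueeze(-1)       # (B, E, 1) mixture weights
            preds.append(torch.sigmoid(tower((w * e).sum(dim=1))))  # per-task prediction
        return preds

model = MMoESketch()
preds = model(torch.randn(4, 16))   # list of task_num tensors, each (batch, 1)
```

The key design point is that each task learns its own expert mixture, so tasks with conflicting gradients can specialize on different experts while still sharing parameters.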
- AliExpressDataset: This dataset is gathered from real-world traffic logs of the search system in AliExpress. It is collected from 5 countries: Russia, Spain, France, the Netherlands, and the United States, and can be used as 5 separate multi-task datasets. Downloads: Original dataset, Processed dataset (Google Drive), Processed dataset (Baidu Netdisk).
For the processed dataset, put it directly in './data/' and unpack it. For the original dataset, put it in './data/' and run 'python preprocess.py --dataset_name NL'.
Requirements:
- Python 3.6
- PyTorch > 1.10
- pandas
- numpy
- tqdm
Parameter Configuration:
- dataset_name: choose a dataset from ['AliExpress_NL', 'AliExpress_FR', 'AliExpress_ES', 'AliExpress_US'], default: AliExpress_NL
- dataset_path: default: ./data
- model_name: choose a model from ['singletask', 'sharedbottom', 'omoe', 'mmoe', 'ple', 'aitm', 'metaheac'], default: metaheac
- epoch: the number of training epochs, default: 50
- task_num: the number of tasks, default: 2 (CTR & CVR)
- expert_num: the number of experts for ['omoe', 'mmoe', 'ple', 'metaheac'], default: 8
- learning_rate: default: 0.001
- batch_size: default: 2048
- weight_decay: default: 1e-6
- device: the device to run the code on, default: cuda:0
- save_dir: the folder for saving model parameters, default: chkpt
You can run a model through:

python main.py --model_name metaheac --num_expert 8 --dataset_name AliExpress_NL

For fair comparison, the learning rate is 0.001, the embedding dimension is 128, and the mini-batch size is 2048 for all models. We report the mean AUC and Logloss over five random runs. Best results are in boldface.
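The "mean over five random runs" aggregation can be sketched in plain Python. The metric implementations below are simplified (the AUC uses the rank-sum formulation and ignores ties) and are not the code the library uses.

```python
import math
from statistics import mean

def logloss(y_true, y_prob):
    """Binary cross-entropy averaged over examples, clamped for stability."""
    eps = 1e-12
    return -mean(
        y * math.log(max(p, eps)) + (1 - y) * math.log(max(1 - p, eps))
        for y, p in zip(y_true, y_prob)
    )

def auc(y_true, y_prob):
    """AUC via the rank-sum (Mann-Whitney U) formulation, no tie handling."""
    ranked = sorted(range(len(y_prob)), key=lambda i: y_prob[i])
    pos = sum(y_true)
    neg = len(y_true) - pos
    rank_sum = sum(r + 1 for r, i in enumerate(ranked) if y_true[i] == 1)
    return (rank_sum - pos * (pos + 1) / 2) / (pos * neg)

# Aggregate a metric over several runs, as reported in the tables below.
run_aucs = [auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]) for _ in range(5)]
ll = logloss([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
print(mean(run_aucs))  # 0.75
```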
AliExpress Netherlands (NL) and Spain (ES):

| Methods | NL CTR AUC | NL CTR Logloss | NL CTCVR AUC | NL CTCVR Logloss | ES CTR AUC | ES CTR Logloss | ES CTCVR AUC | ES CTCVR Logloss |
|---|---|---|---|---|---|---|---|---|
| SingleTask | 0.7222 | 0.1085 | 0.8590 | 0.00609 | 0.7266 | 0.1207 | 0.8855 | 0.00456 | 
| Shared-Bottom | 0.7228 | 0.1083 | 0.8511 | 0.00620 | 0.7287 | 0.1204 | 0.8866 | 0.00452 | 
| OMoE | 0.7254 | 0.1081 | 0.8611 | 0.00614 | 0.7253 | 0.1209 | 0.8859 | 0.00452 | 
| MMoE | 0.7234 | 0.1080 | 0.8606 | 0.00607 | 0.7285 | 0.1205 | 0.8898 | 0.00450 | 
| PLE | 0.7292 | 0.1088 | 0.8591 | 0.00631 | 0.7273 | 0.1223 | 0.8913 | 0.00461 | 
| AITM | 0.7240 | 0.1078 | 0.8577 | 0.00611 | 0.7290 | 0.1203 | 0.8885 | 0.00451 | 
| MetaHeac | 0.7263 | 0.1077 | 0.8615 | 0.00606 | 0.7299 | 0.1203 | 0.8883 | 0.00450 | 
AliExpress France (FR) and the United States (US):

| Methods | FR CTR AUC | FR CTR Logloss | FR CTCVR AUC | FR CTCVR Logloss | US CTR AUC | US CTR Logloss | US CTCVR AUC | US CTCVR Logloss |
|---|---|---|---|---|---|---|---|---|
| SingleTask | 0.7259 | 0.1002 | 0.8737 | 0.00435 | 0.7061 | 0.1004 | 0.8637 | 0.00381 | 
| Shared-Bottom | 0.7245 | 0.1004 | 0.8700 | 0.00439 | 0.7029 | 0.1008 | 0.8698 | 0.00381 | 
| OMoE | 0.7257 | 0.1006 | 0.8781 | 0.00432 | 0.7049 | 0.1007 | 0.8701 | 0.00381 | 
| MMoE | 0.7216 | 0.1010 | 0.8811 | 0.00431 | 0.7043 | 0.1006 | 0.8758 | 0.00377 | 
| PLE | 0.7276 | 0.1014 | 0.8805 | 0.00451 | 0.7138 | 0.0992 | 0.8675 | 0.00403 | 
| AITM | 0.7236 | 0.1005 | 0.8763 | 0.00431 | 0.7048 | 0.1004 | 0.8730 | 0.00377 | 
| MetaHeac | 0.7249 | 0.1005 | 0.8813 | 0.00429 | 0.7089 | 0.1001 | 0.8743 | 0.00378 | 
.
├── main.py
├── README.md
├── models
│   ├── layers.py
│   ├── aitm.py
│   ├── omoe.py
│   ├── mmoe.py
│   ├── metaheac.py
│   ├── ple.py
│   ├── singletask.py
│   └── sharedbottom.py
└── data
    ├── preprocess.py         # Preprocess the original data
    ├── AliExpress_NL         # AliExpressDataset from the Netherlands
    │   ├── train.csv
    │   └── test.csv
    ├── AliExpress_ES         # AliExpressDataset from Spain
    ├── AliExpress_FR         # AliExpressDataset from France
    └── AliExpress_US         # AliExpressDataset from the United States
If you have any problems with this library, please create an issue or send us an email at:
If you use this repository, please cite the following papers:
@inproceedings{zhu2021learning,
  title={Learning to Expand Audience via Meta Hybrid Experts and Critics for Recommendation and Advertising},
  author={Zhu, Yongchun and Liu, Yudan and Xie, Ruobing and Zhuang, Fuzhen and Hao, Xiaobo and Ge, Kaikai and Zhang, Xu and Lin, Leyu and Cao, Juan},
  booktitle={Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery \& Data Mining},
  pages={4005--4013},
  year={2021}
}
@inproceedings{xi2021modeling,
  title={Modeling the sequential dependence among audience multi-step conversions with multi-task learning in targeted display advertising},
  author={Xi, Dongbo and Chen, Zhen and Yan, Peng and Zhang, Yinger and Zhu, Yongchun and Zhuang, Fuzhen and Chen, Yu},
  booktitle={Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery \& Data Mining},
  pages={3745--3755},
  year={2021}
}
Some model implementations and utility functions refer to these nice repositories:
- pytorch-fm: This package provides a PyTorch implementation of factorization machine models and common datasets in CTR prediction.
- MetaHeac: This is an official implementation for Learning to Expand Audience via Meta Hybrid Experts and Critics for Recommendation and Advertising.
