Commit 2491dfb

Dev lmc: add multi task learning models (#391)
Add multi-task learning models: Shared-Bottom, ESSM, MMOE, CGC, PLE.
1 parent 9f15559 commit 2491dfb

File tree

10 files changed: +660 −4 lines


.github/workflows/ci.yml

Lines changed: 1 addition & 1 deletion

@@ -18,7 +18,7 @@ jobs:
     strategy:
       matrix:
         python-version: [3.6,3.7]
-        tf-version: [1.4.0,1.15.0,2.1.0,2.5.0]
+        tf-version: [1.4.0,1.15.0,2.2.0,2.5.0]
 
         exclude:
           - python-version: 3.7

README.md

Lines changed: 7 additions & 3 deletions

@@ -45,8 +45,8 @@ Let's [**Get Started!**](https://deepctr-doc.readthedocs.io/en/latest/Quick-Star
 | Attentional Factorization Machine | [IJCAI 2017][Attentional Factorization Machines: Learning the Weight of Feature Interactions via Attention Networks](http://www.ijcai.org/proceedings/2017/435) |
 | Neural Factorization Machine | [SIGIR 2017][Neural Factorization Machines for Sparse Predictive Analytics](https://arxiv.org/pdf/1708.05027.pdf) |
 | xDeepFM | [KDD 2018][xDeepFM: Combining Explicit and Implicit Feature Interactions for Recommender Systems](https://arxiv.org/pdf/1803.05170.pdf) |
-| Deep Interest Network | [KDD 2018][Deep Interest Network for Click-Through Rate Prediction](https://arxiv.org/pdf/1706.06978.pdf)
-| AutoInt | [CIKM 2019][AutoInt: Automatic Feature Interaction Learning via Self-Attentive Neural Networks](https://arxiv.org/abs/1810.11921) ||
+| Deep Interest Network | [KDD 2018][Deep Interest Network for Click-Through Rate Prediction](https://arxiv.org/pdf/1706.06978.pdf) |
+| AutoInt | [CIKM 2019][AutoInt: Automatic Feature Interaction Learning via Self-Attentive Neural Networks](https://arxiv.org/abs/1810.11921) |
 | Deep Interest Evolution Network | [AAAI 2019][Deep Interest Evolution Network for Click-Through Rate Prediction](https://arxiv.org/pdf/1809.03672.pdf) |
 | FwFM | [WWW 2018][Field-weighted Factorization Machines for Click-Through Rate Prediction in Display Advertising](https://arxiv.org/pdf/1806.03514.pdf) |
 | ONN | [arxiv 2019][Operation-aware Neural Networks for User Response Prediction](https://arxiv.org/pdf/1904.12579.pdf) |
@@ -59,7 +59,11 @@ Let's [**Get Started!**](https://deepctr-doc.readthedocs.io/en/latest/Quick-Star
 | DCN V2 | [arxiv 2020][DCN V2: Improved Deep & Cross Network and Practical Lessons for Web-scale Learning to Rank Systems](https://arxiv.org/abs/2008.13535) |
 | DIFM | [IJCAI 2020][A Dual Input-aware Factorization Machine for CTR Prediction](https://www.ijcai.org/Proceedings/2020/0434.pdf) |
 | FEFM and DeepFEFM | [arxiv 2020][Field-Embedded Factorization Machines for Click-through rate prediction](https://arxiv.org/abs/2009.09931) |
-
+| Shared-Bottom | [Multitask learning](http://reports-archive.adm.cs.cmu.edu/anon/1997/CMU-CS-97-203.pdf) |
+| ESSM | [SIGIR 2018][Entire Space Multi-Task Model: An Effective Approach for Estimating Post-Click Conversion Rate](https://arxiv.org/abs/1804.07931) |
+| MMOE | [KDD 2018][Modeling Task Relationships in Multi-task Learning with Multi-gate Mixture-of-Experts](https://dl.acm.org/doi/abs/10.1145/3219819.3220007) |
+| CGC | [RecSys 2020][Progressive Layered Extraction (PLE): A Novel Multi-Task Learning (MTL) Model for Personalized Recommendations](https://dl.acm.org/doi/10.1145/3383313.3412236) |
+| PLE | [RecSys 2020][Progressive Layered Extraction (PLE): A Novel Multi-Task Learning (MTL) Model for Personalized Recommendations](https://dl.acm.org/doi/10.1145/3383313.3412236) |
 ## Citation
 
 - Weichen Shen. (2017). DeepCTR: Easy-to-use,Modular and Extendible package of deep-learning based CTR models. https://github.com/shenweichen/deepctr.

deepctr/models/mtl/__init__.py

Whitespace-only changes.

deepctr/models/mtl/cgc.py

Lines changed: 114 additions & 0 deletions

"""
Author:
    Mincai Lai, [email protected]

Reference:
    [1] Tang H, Liu J, Zhao M, et al. Progressive layered extraction (ple): A novel multi-task learning (mtl) model for personalized recommendations[C]//Fourteenth ACM Conference on Recommender Systems. 2020.(https://dl.acm.org/doi/10.1145/3383313.3412236)
"""
import tensorflow as tf

from ...feature_column import build_input_features, input_from_feature_columns
from ...layers.core import PredictionLayer, DNN
from ...layers.utils import combined_dnn_input, reduce_sum


def CGC(dnn_feature_columns, num_tasks=None, task_types=None, task_names=None, num_experts_specific=8,
        num_experts_shared=4,
        expert_dnn_units=(128, 128), gate_dnn_units=None, tower_dnn_units_lists=((32,), (32,)),
        l2_reg_embedding=0.00001, l2_reg_dnn=0, seed=1024, dnn_dropout=0, dnn_activation='relu', dnn_use_bn=False):
    """Instantiates the Customized Gate Control block of the Progressive Layered Extraction architecture.

    :param dnn_feature_columns: An iterable containing all the features used by the deep part of the model.
    :param num_tasks: integer, number of tasks, equal to the number of outputs, must be greater than 1.
    :param task_types: list of str, indicating the loss of each task, ``"binary"`` for binary logloss, ``"regression"`` for regression loss. e.g. ['binary', 'regression']
    :param task_names: list of str, indicating the prediction target of each task

    :param num_experts_specific: integer, number of task-specific experts.
    :param num_experts_shared: integer, number of task-shared experts.

    :param expert_dnn_units: list of positive integers, its length must be greater than 1; the layer number and units in each layer of the expert DNN
    :param gate_dnn_units: list of positive integers or None, the layer number and units in each layer of the gate DNN; default value is None. e.g. [8, 8]
    :param tower_dnn_units_lists: list of positive integer lists, its length must be equal to num_tasks; the layer number and units in each layer of the task-specific DNN

    :param l2_reg_embedding: float. L2 regularizer strength applied to embedding vectors
    :param l2_reg_dnn: float. L2 regularizer strength applied to the DNN
    :param seed: integer, to use as random seed.
    :param dnn_dropout: float in [0,1), the probability we will drop out a given DNN coordinate.
    :param dnn_activation: Activation function to use in the DNN
    :param dnn_use_bn: bool. Whether to use BatchNormalization before activation in the DNN
    :return: a Keras model instance
    """

    if num_tasks <= 1:
        raise ValueError("num_tasks must be greater than 1")
    if len(task_types) != num_tasks:
        raise ValueError("num_tasks must be equal to the length of task_types")

    for task_type in task_types:
        if task_type not in ['binary', 'regression']:
            raise ValueError("task must be binary or regression, {} is illegal".format(task_type))

    if num_tasks != len(tower_dnn_units_lists):
        raise ValueError("the length of tower_dnn_units_lists must be equal to num_tasks")

    features = build_input_features(dnn_feature_columns)
    inputs_list = list(features.values())

    sparse_embedding_list, dense_value_list = input_from_feature_columns(features, dnn_feature_columns,
                                                                         l2_reg_embedding, seed)
    dnn_input = combined_dnn_input(sparse_embedding_list, dense_value_list)

    # build task-specific expert layers
    expert_outputs = []
    for i in range(num_tasks):
        for j in range(num_experts_specific):
            expert_network = DNN(expert_dnn_units, dnn_activation, l2_reg_dnn, dnn_dropout, dnn_use_bn, seed=seed,
                                 name='task_' + task_names[i] + '_expert_specific_' + str(j))(dnn_input)
            expert_outputs.append(expert_network)

    # build task-shared expert layers
    for i in range(num_experts_shared):
        expert_network = DNN(expert_dnn_units, dnn_activation, l2_reg_dnn, dnn_dropout, dnn_use_bn, seed=seed,
                             name='expert_shared_' + str(i))(dnn_input)
        expert_outputs.append(expert_network)

    # build one Extraction Layer
    cgc_outs = []
    for i in range(num_tasks):
        # concat this task's specific experts with the shared experts
        cur_expert_num = num_experts_specific + num_experts_shared
        cur_experts = (expert_outputs[i * num_experts_specific:(i + 1) * num_experts_specific]
                       + expert_outputs[-num_experts_shared:])  # task-specific + task-shared
        expert_concat = tf.keras.layers.concatenate(cur_experts, axis=1, name='expert_concat_' + task_names[i])
        expert_concat = tf.keras.layers.Reshape([cur_expert_num, expert_dnn_units[-1]],
                                                name='expert_reshape_' + task_names[i])(expert_concat)

        # build gate layers
        if gate_dnn_units is not None:
            gate_input = DNN(gate_dnn_units, dnn_activation, l2_reg_dnn, dnn_dropout, dnn_use_bn, seed=seed,
                             name='gate_' + task_names[i])(dnn_input)
        else:  # in the original paper, the gate is a single Dense layer with softmax
            gate_input = dnn_input

        gate_out = tf.keras.layers.Dense(cur_expert_num, use_bias=False, activation='softmax',
                                         name='gate_softmax_' + task_names[i])(gate_input)
        gate_out = tf.keras.layers.Lambda(lambda x: tf.expand_dims(x, axis=-1))(gate_out)

        # weight the experts by the gate and sum over the expert axis
        gate_mul_expert = tf.keras.layers.Multiply(name='gate_mul_expert_' + task_names[i])([expert_concat, gate_out])
        gate_mul_expert = tf.keras.layers.Lambda(lambda x: reduce_sum(x, axis=1, keep_dims=True))(gate_mul_expert)
        cgc_outs.append(gate_mul_expert)

    task_outs = []
    for task_type, task_name, tower_dnn, cgc_out in zip(task_types, task_names, tower_dnn_units_lists, cgc_outs):
        # build tower layer
        tower_output = DNN(tower_dnn, dnn_activation, l2_reg_dnn, dnn_dropout, dnn_use_bn, seed=seed,
                           name='tower_' + task_name)(cgc_out)
        logit = tf.keras.layers.Dense(1, use_bias=False, activation=None)(tower_output)
        output = PredictionLayer(task_type, name=task_name)(logit)
        task_outs.append(output)

    model = tf.keras.models.Model(inputs=inputs_list, outputs=task_outs)
    return model
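
Below is a minimal smoke test for CGC (not part of this commit). The feature columns, task names, expert and tower sizes, and random inputs are illustrative assumptions; the import path simply follows this commit's file layout.

import numpy as np

from deepctr.feature_column import DenseFeat, SparseFeat
from deepctr.models.mtl.cgc import CGC

# toy feature columns (hypothetical names and vocabulary sizes)
feature_columns = [SparseFeat('user_id', vocabulary_size=100, embedding_dim=8),
                   DenseFeat('price', 1)]

# two binary tasks, each with 2 private experts, plus 1 expert shared across tasks
model = CGC(feature_columns, num_tasks=2, task_types=['binary', 'binary'],
            task_names=['click', 'like'], num_experts_specific=2, num_experts_shared=1,
            expert_dnn_units=(16,), tower_dnn_units_lists=((8,), (8,)))

n = 64
x = {'user_id': np.random.randint(0, 100, n), 'price': np.random.random(n)}
click_pred, like_pred = model.predict(x, batch_size=32)  # one sigmoid output per task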

deepctr/models/mtl/essm.py

Lines changed: 63 additions & 0 deletions

"""
Author:
    Mincai Lai, [email protected]

Reference:
    [1] Ma X, Zhao L, Huang G, et al. Entire space multi-task model: An effective approach for estimating post-click conversion rate[C]//The 41st International ACM SIGIR Conference on Research & Development in Information Retrieval. 2018.(https://arxiv.org/abs/1804.07931)
"""

import tensorflow as tf

from ...feature_column import build_input_features, input_from_feature_columns
from ...layers.core import PredictionLayer, DNN
from ...layers.utils import combined_dnn_input


def ESSM(dnn_feature_columns, task_type='binary', task_names=('ctr', 'ctcvr'),
         tower_dnn_units_lists=((128, 128), (128, 128)), l2_reg_embedding=0.00001, l2_reg_dnn=0,
         seed=1024, dnn_dropout=0, dnn_activation='relu', dnn_use_bn=False):
    """Instantiates the Entire Space Multi-Task Model architecture.

    :param dnn_feature_columns: An iterable containing all the features used by the deep part of the model.
    :param task_type: str, indicating the loss of both tasks, ``"binary"`` for binary logloss or ``"regression"`` for regression loss.
    :param task_names: list of str, indicating the prediction target of each task. Default value is ['ctr', 'ctcvr'].

    :param tower_dnn_units_lists: list of two positive-integer lists, the layer number and units in each layer of the two task-specific DNNs

    :param l2_reg_embedding: float. L2 regularizer strength applied to embedding vectors
    :param l2_reg_dnn: float. L2 regularizer strength applied to the DNN
    :param seed: integer, to use as random seed.
    :param dnn_dropout: float in [0,1), the probability we will drop out a given DNN coordinate.
    :param dnn_activation: Activation function to use in the DNN
    :param dnn_use_bn: bool. Whether to use BatchNormalization before activation in the DNN
    :return: A Keras model instance.
    """
    if len(task_names) != 2:
        raise ValueError("the length of task_names must be equal to 2")

    if len(tower_dnn_units_lists) != 2:
        raise ValueError("the length of tower_dnn_units_lists must be equal to 2")

    features = build_input_features(dnn_feature_columns)
    inputs_list = list(features.values())

    sparse_embedding_list, dense_value_list = input_from_feature_columns(features, dnn_feature_columns,
                                                                         l2_reg_embedding, seed)
    dnn_input = combined_dnn_input(sparse_embedding_list, dense_value_list)

    ctr_output = DNN(tower_dnn_units_lists[0], dnn_activation, l2_reg_dnn, dnn_dropout, dnn_use_bn, seed=seed)(
        dnn_input)
    cvr_output = DNN(tower_dnn_units_lists[1], dnn_activation, l2_reg_dnn, dnn_dropout, dnn_use_bn, seed=seed)(
        dnn_input)

    ctr_logit = tf.keras.layers.Dense(1, use_bias=False, activation=None)(ctr_output)
    cvr_logit = tf.keras.layers.Dense(1, use_bias=False, activation=None)(cvr_output)

    ctr_pred = PredictionLayer(task_type, name=task_names[0])(ctr_logit)
    cvr_pred = PredictionLayer(task_type)(cvr_logit)

    ctcvr_pred = tf.keras.layers.Multiply(name=task_names[1])([ctr_pred, cvr_pred])  # CTCVR = CTR * CVR

    model = tf.keras.models.Model(inputs=inputs_list, outputs=[ctr_pred, ctcvr_pred])
    return model
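
A minimal training sketch for ESSM (not part of this commit). The feature columns and toy click/conversion labels are illustrative assumptions; a conversion is generated only on a click, which is exactly the entire-space constraint that CTCVR = CTR * CVR exploits.

import numpy as np

from deepctr.feature_column import SparseFeat
from deepctr.models.mtl.essm import ESSM

feature_columns = [SparseFeat('user_id', vocabulary_size=100, embedding_dim=8),
                   SparseFeat('item_id', vocabulary_size=200, embedding_dim=8)]

model = ESSM(feature_columns, task_type='binary', task_names=('ctr', 'ctcvr'))
# both heads use logloss; the CVR tower is supervised only through the CTCVR product
model.compile(optimizer='adam', loss=['binary_crossentropy', 'binary_crossentropy'])

n = 64
x = {'user_id': np.random.randint(0, 100, n),
     'item_id': np.random.randint(0, 200, n)}
click = np.random.randint(0, 2, (n, 1)).astype('float32')
conversion = click * np.random.randint(0, 2, (n, 1)).astype('float32')  # conversion implies click
model.fit(x, [click, conversion], batch_size=32, epochs=1)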

deepctr/models/mtl/mmoe.py

Lines changed: 100 additions & 0 deletions

"""
Author:
    Mincai Lai, [email protected]

Reference:
    [1] Ma J, Zhao Z, Yi X, et al. Modeling task relationships in multi-task learning with multi-gate mixture-of-experts[C]//Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. 2018.(https://dl.acm.org/doi/abs/10.1145/3219819.3220007)
"""

import tensorflow as tf

from ...feature_column import build_input_features, input_from_feature_columns
from ...layers.core import PredictionLayer, DNN
from ...layers.utils import combined_dnn_input, reduce_sum


def MMOE(dnn_feature_columns, num_tasks=None, task_types=None, task_names=None, num_experts=4,
         expert_dnn_units=(128, 128), gate_dnn_units=None, tower_dnn_units_lists=((32,), (32,)),
         l2_reg_embedding=0.00001, l2_reg_dnn=0, seed=1024, dnn_dropout=0, dnn_activation='relu', dnn_use_bn=False):
    """Instantiates the Multi-gate Mixture-of-Experts multi-task learning architecture.

    :param dnn_feature_columns: An iterable containing all the features used by the deep part of the model.
    :param num_tasks: integer, number of tasks, equal to the number of outputs, must be greater than 1.
    :param task_types: list of str, indicating the loss of each task, ``"binary"`` for binary logloss, ``"regression"`` for regression loss. e.g. ['binary', 'regression']
    :param task_names: list of str, indicating the prediction target of each task

    :param num_experts: integer, number of experts.
    :param expert_dnn_units: list of positive integers, its length must be greater than 1; the layer number and units in each layer of the expert DNN
    :param gate_dnn_units: list of positive integers or None, the layer number and units in each layer of the gate DNN; default value is None. e.g. [8, 8]
    :param tower_dnn_units_lists: list of positive integer lists, its length must be equal to num_tasks; the layer number and units in each layer of the task-specific DNN

    :param l2_reg_embedding: float. L2 regularizer strength applied to embedding vectors
    :param l2_reg_dnn: float. L2 regularizer strength applied to the DNN
    :param seed: integer, to use as random seed.
    :param dnn_dropout: float in [0,1), the probability we will drop out a given DNN coordinate.
    :param dnn_activation: Activation function to use in the DNN
    :param dnn_use_bn: bool. Whether to use BatchNormalization before activation in the DNN
    :return: a Keras model instance
    """

    if num_tasks <= 1:
        raise ValueError("num_tasks must be greater than 1")

    if len(task_types) != num_tasks:
        raise ValueError("num_tasks must be equal to the length of task_types")

    for task_type in task_types:
        if task_type not in ['binary', 'regression']:
            raise ValueError("task must be binary or regression, {} is illegal".format(task_type))

    if num_tasks != len(tower_dnn_units_lists):
        raise ValueError("the length of tower_dnn_units_lists must be equal to num_tasks")

    features = build_input_features(dnn_feature_columns)
    inputs_list = list(features.values())

    sparse_embedding_list, dense_value_list = input_from_feature_columns(features, dnn_feature_columns,
                                                                         l2_reg_embedding, seed)
    dnn_input = combined_dnn_input(sparse_embedding_list, dense_value_list)

    # build expert layers
    expert_outs = []
    for i in range(num_experts):
        expert_network = DNN(expert_dnn_units, dnn_activation, l2_reg_dnn, dnn_dropout, dnn_use_bn, seed=seed,
                             name='expert_' + str(i))(dnn_input)
        expert_outs.append(expert_network)
    expert_concat = tf.keras.layers.concatenate(expert_outs, axis=1, name='expert_concat')
    expert_concat = tf.keras.layers.Reshape([num_experts, expert_dnn_units[-1]], name='expert_reshape')(
        expert_concat)  # (num_experts, output dim of expert_network)

    mmoe_outs = []
    for i in range(num_tasks):  # one mmoe layer: num_tasks == num_gates
        # build gate layers
        if gate_dnn_units is not None:
            gate_input = DNN(gate_dnn_units, dnn_activation, l2_reg_dnn, dnn_dropout, dnn_use_bn, seed=seed,
                             name='gate_' + task_names[i])(dnn_input)
        else:  # in the original paper, the gate is a single Dense layer with softmax
            gate_input = dnn_input
        gate_out = tf.keras.layers.Dense(num_experts, use_bias=False, activation='softmax',
                                         name='gate_softmax_' + task_names[i])(gate_input)
        gate_out = tf.keras.layers.Lambda(lambda x: tf.expand_dims(x, axis=-1))(gate_out)

        # weight the experts by the gate and sum over the expert axis
        gate_mul_expert = tf.keras.layers.Multiply(name='gate_mul_expert_' + task_names[i])([expert_concat, gate_out])
        gate_mul_expert = tf.keras.layers.Lambda(lambda x: reduce_sum(x, axis=1, keep_dims=False))(gate_mul_expert)
        mmoe_outs.append(gate_mul_expert)

    task_outs = []
    for task_type, task_name, tower_dnn, mmoe_out in zip(task_types, task_names, tower_dnn_units_lists, mmoe_outs):
        # build tower layer
        tower_output = DNN(tower_dnn, dnn_activation, l2_reg_dnn, dnn_dropout, dnn_use_bn, seed=seed,
                           name='tower_' + task_name)(mmoe_out)

        logit = tf.keras.layers.Dense(1, use_bias=False, activation=None)(tower_output)
        output = PredictionLayer(task_type, name=task_name)(logit)
        task_outs.append(output)

    model = tf.keras.models.Model(inputs=inputs_list, outputs=task_outs)
    return model
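
A minimal training sketch for MMOE (not part of this commit). The feature columns, task names, and random toy data are illustrative assumptions; the import path follows this commit's file layout.

import numpy as np

from deepctr.feature_column import DenseFeat, SparseFeat
from deepctr.models.mtl.mmoe import MMOE

feature_columns = [SparseFeat('user_id', vocabulary_size=100, embedding_dim=8),
                   SparseFeat('item_id', vocabulary_size=200, embedding_dim=8),
                   DenseFeat('price', 1)]

# two binary tasks sharing four experts, with one softmax gate per task
model = MMOE(feature_columns, num_tasks=2, task_types=['binary', 'binary'],
             task_names=['click', 'like'], num_experts=4)
model.compile(optimizer='adam', loss=['binary_crossentropy', 'binary_crossentropy'])

n = 64
x = {'user_id': np.random.randint(0, 100, n),
     'item_id': np.random.randint(0, 200, n),
     'price': np.random.random(n)}
y = [np.random.randint(0, 2, (n, 1)).astype('float32'),
     np.random.randint(0, 2, (n, 1)).astype('float32')]
model.fit(x, y, batch_size=32, epochs=1)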
