# Overview

The central component of a recommender system is the recommendation model, which generates prospective items of interest for users based on given input data. For a large-scale recommender system to function seamlessly and deliver high-quality results, it needs additional supporting modules built around this central model.
Figure :numref:`recommender systems` illustrates the essential modules of a typical recommender system. A message queue accepts logs uploaded from the client side of the recommendation service. These logs capture user feedback on previously recommended items, such as a record of whether users clicked on the suggested items. A separate data processing module handles the raw data from these logs, generating new training samples that are subsequently added to another message queue.
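As a minimal sketch of this flow, the snippet below uses Python's built-in `queue` module to stand in for both message queues; the log fields (`user_id`, `item_id`, `clicked`) are illustrative rather than an actual log schema.

```python
import queue

raw_logs = queue.Queue()       # stands in for the client-side log queue
train_samples = queue.Queue()  # stands in for the training-sample queue

# Hypothetical client feedback logs.
raw_logs.put({"user_id": 42, "item_id": 7, "clicked": True})
raw_logs.put({"user_id": 42, "item_id": 9, "clicked": False})

def process_logs():
    """Data processing module: turn raw feedback logs into
    <user, item, label> training samples."""
    while not raw_logs.empty():
        log = raw_logs.get()
        label = 1 if log["clicked"] else 0
        train_samples.put((log["user_id"], log["item_id"], label))

process_logs()
print(train_samples.get())  # (42, 7, 1)
```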
Training servers extract these training samples from the message queue and use them to update model parameters. A typical recommendation model comprises two components: embedding tables and neural networks. During the training phase, each training server retrieves the model parameters from parameter servers, calculates gradients, and then uploads these gradients back to the parameter servers. The parameter servers integrate the results from each training server and update the parameters accordingly.
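The following single-process sketch illustrates this pull/compute/push protocol; the squared-error linear model inside `train_step` is a stand-in for the real recommendation model, and a production deployment would shard the parameters across many servers.

```python
import numpy as np

class ParameterServer:
    """Holds the authoritative copy of the model parameters."""
    def __init__(self, dim, lr=0.01):
        self.w = np.zeros(dim)
        self.lr = lr

    def pull(self):
        # A training server retrieves the current parameters.
        return self.w.copy()

    def push(self, grad):
        # Integrate a training server's gradient into the parameters.
        self.w -= self.lr * grad

def train_step(ps, x, y):
    """One training-server step: pull parameters, compute a local
    gradient, and push it back to the parameter server."""
    w = ps.pull()
    grad = x.T @ (x @ w - y) / len(y)  # gradient of a squared-error linear model
    ps.push(grad)

ps = ParameterServer(dim=4)
rng = np.random.default_rng(0)
for _ in range(100):  # each iteration mimics one training server's update
    x, y = rng.normal(size=(8, 4)), rng.normal(size=8)
    train_step(ps, x, y)
```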
Inference servers handle user requests, procure the necessary model parameters from parameter servers based on these requests, and calculate the recommendation outcomes.
![Architecture of a recommender system](../img/ch_recommender/recommender_system.png)
:label:`recommender systems`
# Recommendation Pipeline

A recommendation pipeline, designed to suggest items of potential interest to users based on their requests, is an integral part of any recommender system. Specifically, a user seeking recommendations submits a request that includes their user ID and the current context features, such as recently browsed items and browsing duration, to the inference service. The recommendation pipeline uses these user features and those of potential items as input for computation. It then derives a score for each candidate item, selects the highest-scoring items (ranging from dozens to hundreds) to form the recommendation result, and delivers this result back to the user.
Given that a recommender system generally contains billions of potential items, using just a single model to compute the score of each item necessitates a trade-off between model accuracy and speed. In other words, opting for a simpler model may boost speed but potentially result in recommendations that fail to pique the user's interest due to diminished accuracy. On the other hand, using a more complex model may provide more accurate results but deter users due to longer waiting times.
![Example of a multi-stage recommendation pipeline](../img/ch_recommender/recommender_pipeline.png)
:label:`recommender pipeline`
To mitigate this, contemporary recommender systems typically deploy multiple recommendation models as part of a pipeline, as illustrated in Figure :numref:`recommender pipeline`. The pipeline begins with the retrieval stage, which employs fast, simple models to filter the entire pool of candidate items and identify thousands to tens of thousands of items that the user may find appealing. Following this, in the ranking stage, slower, more complex models score and order the retrieved items. The top-scoring items (the exact number may vary depending on the specific service scenario), numbering in the dozens or hundreds, are returned as the final recommendation. If the ranking models are too intricate to process all retrieved items within the given time frame, the ranking stage may be further divided into three sub-stages: pre-ranking, ranking, and re-ranking.
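As a toy sketch of this funnel, the following Python snippet uses hash-based stand-ins for the retrieval and ranking models; the pool size, cutoffs, and scoring functions are all illustrative.

```python
import random

random.seed(0)
all_items = list(range(1_000_000))

def fast_score(user, item):   # stand-in for a cheap retrieval model
    return hash((user, item)) % 1000

def slow_score(user, item):   # stand-in for a complex ranking model
    return fast_score(user, item) + random.random()

def retrieve(user, items, k=10_000):
    # Retrieval stage: a fast, simple model filters the full item pool.
    return sorted(items, key=lambda it: fast_score(user, it), reverse=True)[:k]

def rank(user, candidates, k=100):
    # Ranking stage: a slower, more accurate model orders the survivors.
    return sorted(candidates, key=lambda it: slow_score(user, it), reverse=True)[:k]

recommendations = rank("user_42", retrieve("user_42", all_items))
```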
## Retrieval Stage

The retrieval stage is the initial phase of the recommendation process. The model takes user features as input and performs a rough filter of all candidate items to identify those the user might be interested in. These selected items form the output. The main goal of the retrieval stage is to reduce the pool of candidate items, thereby lightening the computational load on the ranking model in the subsequent stage.
### Two-Tower Model

To illustrate the retrieval process, let's consider the two-tower model as an example, as shown in Figure :numref:`two tower model`. The two-tower model contains two multilayer perceptrons (MLPs) which encode user features and item features, referred to as the user tower[^1] and the item tower, respectively.
Continuous features can be input directly into the MLPs, while discrete features must first be mapped into dense vectors using embedding tables before being fed into the MLPs. The user tower and item tower process these features to generate user vectors and item vectors, respectively, each representing a unique user or item. The two-tower model employs a scoring function to evaluate the similarity between user vectors and item vectors.
![Structure of the two-tower model](../img/ch_recommender/two_tower_model.png)
:label:`two tower model`
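A minimal PyTorch sketch of this structure is shown below; the layer sizes, feature dimensions, and the dot-product scoring function are illustrative assumptions rather than the configuration of the original paper.

```python
import torch
from torch import nn

class Tower(nn.Module):
    """Embeds discrete features, concatenates continuous features,
    and encodes everything with an MLP."""
    def __init__(self, num_ids, emb_dim=16, cont_dim=4, out_dim=32):
        super().__init__()
        self.emb = nn.Embedding(num_ids, emb_dim)  # embedding table
        self.mlp = nn.Sequential(
            nn.Linear(emb_dim + cont_dim, 64), nn.ReLU(),
            nn.Linear(64, out_dim),
        )

    def forward(self, ids, cont):
        return self.mlp(torch.cat([self.emb(ids), cont], dim=-1))

class TwoTower(nn.Module):
    def __init__(self, num_users, num_items):
        super().__init__()
        self.user_tower = Tower(num_users)
        self.item_tower = Tower(num_items)

    def score(self, user_ids, user_cont, item_ids, item_cont):
        u = self.user_tower(user_ids, user_cont)  # user vectors
        v = self.item_tower(item_ids, item_cont)  # item vectors
        return (u * v).sum(dim=-1)                # dot-product scoring function

model = TwoTower(num_users=1000, num_items=5000)
scores = model.score(torch.tensor([3, 8]), torch.randn(2, 4),
                     torch.tensor([42, 7]), torch.randn(2, 4))
```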
### Training

During training, the model input consists of the user's feedback data on historical recommendation results, represented by the tuple \<user, item, label\>. The label denotes whether the user has clicked the item, with 1 and 0 typically representing a click and a non-click, respectively. The two-tower model uses positive samples (i.e., samples where the label is 1) for training. To obtain negative samples, an intra-batch sampler that corrects sampling bias performs sampling within the batch. The details of the algorithm, while not the focus here, can be found in the original paper.
The model's output consists of the click probabilities for different items. During training, a suitable loss function is chosen to ensure that the predicted results for positive samples are as close to 1 as possible, and as close to 0 as possible for negative samples.
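For illustration, the following sketch implements a softmax cross-entropy loss over in-batch negatives: each row's positive item sits on the diagonal of the score matrix, and every other item in the batch serves as a negative. The sampling-bias correction from the original paper is deliberately omitted.

```python
import torch
import torch.nn.functional as F

def in_batch_loss(user_vecs, item_vecs, temperature=0.05):
    """Softmax cross-entropy with in-batch negatives: logits[i, j] scores
    user i against item j; the positives lie on the diagonal."""
    logits = user_vecs @ item_vecs.T / temperature  # (B, B) score matrix
    labels = torch.arange(user_vecs.size(0))        # diagonal = positive items
    return F.cross_entropy(logits, labels)

loss = in_batch_loss(torch.randn(32, 32), torch.randn(32, 32))
```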
### Inference

Before inference, item vectors for all items are computed and saved using the trained model. Given that item features are relatively stable, this step can reduce computational overhead during inference and speed up the process. User features, which are related to user behavior, are processed when user requests arrive: the two-tower model uses the user tower to encode the current user features and generate the user vector. The same scoring function used during training is then used to measure similarity, enabling a similarity search with the user vector across all candidate item vectors. The most similar items are output as the retrieval result.
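A brute-force NumPy sketch of this search follows; `item_vectors.npy` is a hypothetical file of precomputed item vectors, and a production system would typically use an approximate nearest-neighbor index (e.g., Faiss) instead of an exhaustive scan.

```python
import numpy as np

# Hypothetical file of item vectors precomputed with the trained item tower.
item_vecs = np.load("item_vectors.npy")  # shape (num_items, dim)

def retrieve_top_k(user_vec, k=1000):
    """Brute-force dot-product similarity search over all item vectors."""
    scores = item_vecs @ user_vec
    top = np.argpartition(scores, -k)[-k:]     # unordered top-k indices
    return top[np.argsort(scores[top])[::-1]]  # sorted from best to worst
```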
### Evaluation Metrics

A common evaluation metric for a retrieval model is the recall when $k$ items are retrieved (Recall@k). This metric quantifies the ability of a model to successfully retrieve the top $k$ items of interest.
The mathematical definition of Recall@k is expressed as follows:

$$\text{Recall@k} = \frac{\text{TP}}{\min(\text{TP} + \text{FN}, k)}$$
In this equation, the term "True Positive" (TP) refers to the count of items correctly identified by the model as relevant (i.e., with a true label of 1) among the $k$ items retrieved. On the other hand, "False Negative" (FN) denotes the count of relevant items (again, with a true label of 1) that the model failed to include among the $k$ retrieved items.
Thus, the Recall@k metric serves as a measure of the model's ability to correctly identify and retrieve positive samples. Importantly, if the total number of positive samples surpasses $k$, the maximum possible count of correctly retrieved items is $k$, because the model is limited to retrieving only $k$ items. Consequently, the denominator in the Recall@k equation is defined as the lesser of two quantities: the sum of true positives and false negatives, or $k$.
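The metric is straightforward to compute directly from the definition:

```python
def recall_at_k(retrieved_ids, relevant_ids, k):
    """Recall@k = TP / min(TP + FN, k), following the formula above."""
    tp = len(set(retrieved_ids[:k]) & relevant_ids)  # relevant items among the top k
    return tp / min(len(relevant_ids), k)            # TP + FN = all relevant items

# 2 of the top-3 retrieved items are relevant; 4 items are relevant overall.
print(recall_at_k([5, 9, 2, 7], {9, 2, 11, 3}, k=3))  # 2/3 ≈ 0.67
```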
## Ranking Stage

During the ranking phase, the model appraises the items gathered in the retrieval stage, evaluating each individually in terms of user features and item features. Each item's score indicates the probability that the user might be interested in that item. The highest-scoring items are then suggested to the user.
If the number of candidate items evaluated by the recommendation model continually increases, or if the recommendation logic and rules become more complex, the entire ranking stage can be divided into three sub-stages: pre-ranking, ranking, and re-ranking.
### Pre-ranking

Acting as an intermediary between the retrieval and ranking stages, the pre-ranking stage serves as an additional layer of filtering. This becomes particularly useful when there is a large influx of candidate items from the retrieval stage, or when multi-channel retrieval methods are used to boost the diversity of retrieval results. If every retrieved item were directly fed into the ranking model, the subsequent process could become overly lengthy due to the sheer volume of items. Thus, introducing a pre-ranking stage to the recommendation pipeline reduces the number of items proceeding to the ranking stage, enhancing overall system efficiency.
### Ranking

Ranking, the second sub-stage, is pivotal in the pipeline. In this phase, it is essential that the model precisely represent the user's preferences across varying items. When referring to the "ranking model" in subsequent sections, we are specifically addressing the model used during this ranking sub-stage.
### Re-ranking

In the final re-ranking stage, the preliminary outcomes derived from the ranking stage are further refined according to specific business logic and rules. The goal of this stage is to improve the holistic quality of the recommendation service, shifting the focus from the click-through rate (CTR) of a single item to the broader user experience. For instance, the applied business logic might include efforts to increase the visibility of new items, filter out previously purchased items or watched videos, and create rules to diversify the order and variety of recommended items, thereby decreasing the frequency of similar item recommendations.
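The following sketch illustrates two such rules, filtering out already-consumed items and capping the number of items per category; the item fields and the cap value are hypothetical.

```python
def re_rank(ranked_items, consumed_ids, max_per_category=2):
    """Apply business rules to the ranked list: drop already-consumed items
    and cap how many items of the same category appear in the result."""
    result, per_category = [], {}
    for item in ranked_items:
        if item["id"] in consumed_ids:  # filter purchased items / watched videos
            continue
        cat = item["category"]
        if per_category.get(cat, 0) >= max_per_category:  # enforce diversity
            continue
        per_category[cat] = per_category.get(cat, 0) + 1
        result.append(item)
    return result
```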
## Ranking with Deep Learning

The ranking stage in a recommender system has largely benefited from the use of deep learning models. These models are often referred to as the Deep Learning Recommendation Model (DLRM). As depicted in Figure :numref:`dlrm model`, a DLRM consists of embedding tables, two multilayer perceptrons (a bottom MLP and a top MLP), and an interaction layer.[^2]

![Structure of DLRM](../img/ch_recommender/dlrm_model.png)
:label:`dlrm model`
Similar to the two-tower model, the DLRM initially uses embedding tables to transform discrete features into corresponding embedding items, which are represented as dense vectors. The model then combines all continuous features into a single vector, which is fed into the bottom MLP, generating an output vector with the same dimension as the embedding items. Both this output vector and all the embedding items are then forwarded to the interaction layer for further processing.
As illustrated in Figure :numref:`interaction`, the interaction layer performs dot-product operations on all features (encompassing all embedding items and the processed continuous features) to obtain second-order interactions. Since the dot product is symmetric, the diagonal of the result matrix holds each feature's self-interaction, and in the off-diagonal section every distinct pair of features appears twice (e.g., for features $p$ and $q$, two results are acquired: $<p,q>$ and $<q,p>$). Therefore, only the lower triangular part of the result matrix is retained and flattened. This flattened interaction result is merged with the output from the bottom MLP, and the combined result is used as the input for the top MLP. After further processing by the top MLP, the final output score reflects the probability of a user clicking on the item.

![Interaction principle diagram](../img/ch_recommender/interaction.png)
:label:`interaction`
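A PyTorch sketch of this interaction step follows; it keeps the strictly lower triangle (excluding the diagonal self-interactions), which matches the default behavior of the standard implementation.

```python
import torch

def interact(bottom_out, emb_items):
    """Dot-product interaction: stack the bottom-MLP output with all embedding
    items, compute all pairwise dot products, keep the strictly lower triangle,
    and concatenate it with the bottom-MLP output as top-MLP input."""
    feats = torch.stack([bottom_out] + emb_items, dim=1)  # (B, F, d)
    pairwise = feats @ feats.transpose(1, 2)              # (B, F, F) dot products
    rows, cols = torch.tril_indices(feats.size(1), feats.size(1), offset=-1)
    flat = pairwise[:, rows, cols]                        # flattened lower triangle
    return torch.cat([bottom_out, flat], dim=1)           # input to the top MLP

# Example: batch of 8, bottom-MLP output and three embedding items of dim 16.
z = interact(torch.randn(8, 16), [torch.randn(8, 16) for _ in range(3)])
print(z.shape)  # torch.Size([8, 22]): 16 + 6 pairwise interaction terms
```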
### Training Process

The DLRM bases its training on \<user, item, label\> tuples. It takes user and item features as inputs and interacts these features to predict the likelihood of a user clicking an item. For positive samples, the model aims to push this probability as close to 1 as possible, while for negative samples, the goal is to push it as close to 0 as possible.
The ranking process can be considered a binary classification problem: a (user, item) pair is classified either as a click (label 1) or a non-click (label 0). Therefore, the method used to evaluate a ranking model is analogous to that employed for assessing a binary classifier. However, it is crucial to consider that recommender system datasets tend to be extremely imbalanced, meaning the proportion of positive samples is drastically different from that of negative samples. To minimize the influence of this imbalance on the metrics, we use the Area Under the Curve (AUC) and the F1 score to evaluate ranking models.
The AUC is the area under the Receiver Operating Characteristic (ROC) curve, which is traced by sweeping the classification threshold and plotting the True Positive Rate (TPR) on the y-axis against the False Positive Rate (FPR) on the x-axis. An appropriate classification threshold can be determined from the ROC curve. If the predicted probability exceeds the classification threshold, the prediction result is 1 (click); otherwise, it is 0 (no click). From the prediction results, recall and precision can be computed, which in turn allows for the calculation of the F1 score using the formula :eqref:`f1`.

$$F1 = 2 \times \frac{\text{recall} \times \text{precision}}{\text{recall} + \text{precision}}$$
:eqlabel:`f1`
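Both metrics can be computed off the shelf, for instance with scikit-learn (used here purely for illustration):

```python
from sklearn.metrics import f1_score, roc_auc_score

y_true = [0, 0, 1, 0, 1, 1, 0, 1]                    # click labels
y_prob = [0.1, 0.4, 0.35, 0.2, 0.8, 0.7, 0.6, 0.9]   # predicted click probabilities

auc = roc_auc_score(y_true, y_prob)      # threshold-free, robust to imbalance
y_pred = [int(p > 0.5) for p in y_prob]  # 0.5 used as an example threshold
f1 = f1_score(y_true, y_pred)            # 2 * recall * precision / (recall + precision)
print(f"AUC={auc:.3f}, F1={f1:.3f}")
```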
### Inference Process

During the inference stage, the features of the retrieved items, along with the corresponding user features, are merged and input into the DLRM. The model then predicts scores, and the items with the highest probabilities are selected for output.
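As a sketch, assuming `model` is the trained DLRM and `merged_feats` holds the merged user and item features of the retrieved candidates (one row per candidate):

```python
import torch

def rank_top_k(model, merged_feats, k=100):
    """Score every retrieved candidate with the trained ranking model and
    keep the k items with the highest predicted click probability."""
    with torch.no_grad():
        scores = model(merged_feats).squeeze(-1)  # one probability per candidate
    top = torch.topk(scores, k)
    return top.indices, top.values
```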
[^1]: In the original paper, the user tower also uses the features of videos watched by users as seed features.

[^2]: DLRM is designed for structural customization. This section illustrates an example using the standard code implementation of DLRM.
