content/posts/a-quest-for-fast-personalized-recommendation-part-i.md
Personalized recommender systems attempt to generate a limited number of item options…
An established and prevalent technique for personalized recommendation is collaborative filtering based on matrix factorization (MF), which attempts to learn customers’ preferences from their historical activities. Let’s assume that there are ***m*** users, denoted as ***U***, and ***n*** items, denoted as ***I***. Typically, a classic matrix factorization model consists of two phases:
***Learning***: this phase analyses customers’ historical activities, represented by a sparse matrix ***R*** of size ***m*** x ***n***, to learn their preferences. Each customer *u* is represented by a ***d***-dimensional vector ***x_u_*** and each item *i* by a ***d***-dimensional vector ***y_i_***, where ***d*** is the hypothetical number of factors that explain the behaviour of each customer. The degree of preference of customer *u* for item *i* is modelled as the inner product score ***x_u_^T^y_i_***. A higher inner product score implies a higher chance that customer *u* prefers item *i* (a toy numerical example follows Figure 1).
***Retrieval***: given the output vectors from the learning phase, to arrive at a personalized recommendation list for customer *u*, we need to identify the top-*K* items in ***I*** with the highest inner product scores with ***x_u_***. Figure 1 illustrates the pipeline of top-*K* MF recommendation retrieval, in which ***Y*** denotes the item matrix where each row represents an item vector.
**Figure 1: Top-K Retrieval of Matrix Factorization Models**
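To make the scoring model concrete, here is a toy numerical example (the numbers are purely illustrative, not from the original post): with *d* = 2, a customer vector ***x_u_*** = (1, 2) and item vectors ***y_1_*** = (3, 1) and ***y_2_*** = (0, 1) yield

$$x_u^T y_1 = 1 \cdot 3 + 2 \cdot 1 = 5, \qquad x_u^T y_2 = 1 \cdot 0 + 2 \cdot 1 = 2,$$

so item 1 would be ranked above item 2 for this customer.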
The challenge of the *learning* phase is how to design effective algorithms that can learn from data at the scale of millions of customers and items. This problem has been studied extensively in the research literature. On the other hand, the challenge of the *retrieval* phase is *speed*, due to the real-time nature of the task: upon the arrival of a targeted customer *u*, the system needs to quickly generate the top-*K* items with the highest inner product scores with ***x_u_***, to be recommended to *u*.
Formally, the above problem of finding the top-*K* MF recommendations can be stated as follows:
**(Maximum Inner Product Search, MIPS)** Given a customer vector ***x_u_***, determine the item *i* such that:

$$i = \mathrm{argmax}_{j \in I} \; x_u^T y_j$$
A straightforward solution for MIPS is to compute the inner product between ***x_u_*** and all item vectors {***y_1_***, ***y_2_***, …, ***y_n_***} and rank these scores. However, such a solution scales linearly with the number of items, which incurs a prohibitive cost given the number of items in today’s large-scale systems (see References [1], [2], [3] for more detailed analysis). To achieve real-time personalized recommendation, we shall look for faster alternatives that solve the MIPS problem efficiently, specifically those that can avoid examining all items in *I*. In this post, we will explore one such solution, namely *indexing*.
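To make the baseline concrete, here is a minimal NumPy sketch of the exhaustive approach (the function name and the toy data are illustrative assumptions, not part of any real system):

```python
import numpy as np

def exhaustive_top_k(x_u, Y, K):
    """Brute-force MIPS: score every item, then rank.

    x_u : (d,) customer vector
    Y   : (n, d) item matrix, one row per item vector y_j
    K   : number of recommendations to return (K < n)
    """
    scores = Y @ x_u                       # all n inner products x_u^T y_j
    top = np.argpartition(-scores, K)[:K]  # unordered top-K candidates
    return top[np.argsort(-scores[top])]   # rank only those K candidates

# Toy example: n = 1000 items with d = 32 latent factors.
rng = np.random.default_rng(0)
Y = rng.normal(size=(1000, 32))
x_u = rng.normal(size=32)
print(exhaustive_top_k(x_u, Y, K=10))
```

Even with the `argpartition` trick, the `Y @ x_u` step still touches every item, which is exactly the linear cost that indexing aims to avoid.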
**Indexing for Matrix Factorization Recommendation Retrieval**
Figure 2 depicts two steps of a top-*K* recommender system with the aid of indexing:
**Figure 2: Indexing Approach for Efficient Top-K Retrieval**
***Index construction***: process and store the item vectors ***Y*** in a data structure (e.g., hash tables, binary search trees, etc.) so that similar item vectors are stored close together (e.g., in the same buckets of a hash table or the same leaf nodes of a binary search tree).
***Retrieval***: given the built data structure, a search for the top-*K* most similar items to a customer vector ***x_u_***, i.e., the top-*K* recommendations, can be performed orders of magnitude faster than a naïve exhaustive search. This is primarily due to the property of indexing structures, which can prune potentially irrelevant items with high confidence and thereby reduce the number of item candidates for inner product computation and ranking. A simplified sketch follows this list.
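As a concrete, deliberately simplified illustration of these two steps, below is a sketch of a random-hyperplane hash index in NumPy. The class and its methods are hypothetical, not an actual library API; note also that plain hyperplane LSH targets cosine similarity, so using it for inner product search in practice requires the MIPS-specific transformations discussed in References [1]-[3]:

```python
import numpy as np

class HyperplaneLSHIndex:
    """Sketch: bucket item vectors by their signs against random hyperplanes.

    Hypothetical API for illustration only; hyperplane LSH approximates
    cosine similarity, and adapting it to MIPS needs extra transformations.
    """

    def __init__(self, Y, n_bits=8, seed=0):
        rng = np.random.default_rng(seed)
        self.Y = np.asarray(Y)                          # (n, d) item matrix
        self.planes = rng.normal(size=(n_bits, self.Y.shape[1]))
        self.buckets = {}                               # code -> list of item ids
        for j, y in enumerate(self.Y):                  # index construction
            self.buckets.setdefault(self._hash(y), []).append(j)

    def _hash(self, v):
        # Sign pattern of v against each hyperplane -> an n_bits binary code.
        return tuple((self.planes @ v > 0).astype(int))

    def query(self, x_u, K):
        # Retrieval: score only the candidates sharing x_u's bucket.
        cand = self.buckets.get(self._hash(x_u), [])
        scores = self.Y[cand] @ x_u                     # few inner products
        order = np.argsort(-scores)[:K]
        return [cand[i] for i in order]
```

With 8 bits, the items are split across up to 256 buckets, so a query scores roughly *n*/256 candidates instead of all *n*, at the price of possibly missing true top-*K* items that hashed elsewhere; multiple tables or multi-probe variants trade some speed back for recall.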
The benefit of indexing comes at the cost of constructing the data structures to store the item vectors in formats that support efficient similarity search; this is a one-time cost that is amortized over many query instances.
Though indexing has several advantages, one factor to consider when using indexing structures for top-*K* recommendation is the growth rate of the system. As customer preferences change over time and new items appear or old items are removed, maintaining a retrieval-efficient structure requires constant updates (e.g., insertions, deletions, or even full re-builds).
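For instance, a hash-based index like the sketch above can absorb individual insertions and deletions cheaply; these helper functions are again hypothetical extensions of that sketch, not a real API:

```python
import numpy as np

def add_item(index, y_new):
    """Insertion is cheap: hash the new vector, append its id to one bucket."""
    j = index.Y.shape[0]
    index.Y = np.vstack([index.Y, y_new])   # grow the item matrix
    index.buckets.setdefault(index._hash(y_new), []).append(j)
    return j

def remove_item(index, j):
    """Deletion only touches the single bucket that item j hashed into."""
    index.buckets[index._hash(index.Y[j])].remove(j)
```

Re-training the model, however, moves vectors across buckets wholesale, which is why a periodic full re-build of the index is often unavoidable.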
In the next part, we will investigate further some of the issues with using indexing for top-*K* MIPS, as well as discuss some promising solutions.
**References**
**[1]** Koenigstein, N., Ram, P., & Shavitt, Y. (2012). Efficient Retrieval of Recommendations in a Matrix Factorization Framework. In *Proceedings of the 21st ACM International Conference on Information and Knowledge Management*. ACM.
**[2]** Le, D. D., & Lauw, H. W. (2017, November). Indexable Bayesian Personalized Ranking for Efficient Top-k Recommendation. In *Proceedings of the 2017 ACM on Conference on Information and Knowledge Management* (pp. 1389-1398). ACM.
**[3]** Le, D. D., & Lauw, H. W. (2020, February). Stochastically Robust Personalized Ranking for LSH Recommendation Retrieval. In *Proceedings of the 34th AAAI Conference on Artificial Intelligence* (AAAI ’20).
content/posts/aaai-2019-in-hawaii.md
date: "2019-03-09"
author: "Hady Lauw"
excerpt: "In January 2019, four members of Preferred.AI travelled to the AAAI-19 conference held in Honolulu, Hawaii to present 2 papers and 1 tutorial. As..."
seoDescription: "In January 2019, four members of Preferred.AI travelled to the AAAI-19 conference held in Honolulu, Hawaii to present 2 papers and 1 tutorial. As..."
In January 2019, four members of Preferred.AI travelled to the AAAI-19 conference held in Honolulu, Hawaii to present 2 papers and 1 tutorial.

As a premier conference in artificial intelligence, AAAI has always been competitive. This year was especially so. There were a total of 7095 full paper submissions. No wonder the acceptance rate for this year was a low 16.2%, a drastic drop from last year’s 24.6%.

On Jan 28, [Andrew](https://preferred.ai/team/andrew/) and [Hady](https://preferred.ai/team/hadylauw/) delivered a 3-hour tutorial on “[Recent Advances in Scalable Retrieval of Personalized Recommendations](https://preferred.ai/aaai19-tutorial/)”. It emphasized the importance of retrieval efficiency for recommendation and covered the main strategies, such as approximate maximum inner product search, indexable representation learning, and discrete representations. We made the [materials](https://github.com/PreferredAI/recommendation-retrieval) as well as a [video recording](https://www.youtube.com/playlist?list=PL291RJWFNQGL7MBEuBIDwMIQn8rX1Jloz) available.

On Jan 30, [Tuan](https://preferred.ai/team/tuan/) presented the spotlight for our paper “[VistaNet: Visual Aspect Attention Network for Multimodal Sentiment Analysis](http://www.hadylauw.com/publications/aaai19a.pdf)”, which showed the efficacy of review images in helping to identify the textual passages that are useful for sentiment analysis. The [implementation](https://github.com/PreferredAI/vista-net) is now available.

On Jan 31, [Maksim](https://preferred.ai/team/maksim/) gave the spotlight on our paper “[CompareLDA: A Topic Model for Document Comparison](http://www.hadylauw.com/publications/aaai19b.pdf)”, emphasizing that when comparison is a key property, a topic model supervised by pairwise comparisons, such as CompareLDA, would be more effective. The [implementation](https://github.com/PreferredAI/compare-lda) is also now available.

While the AAAI-19 program was interesting, the island of O’ahu also offered scenery as picturesque as any. [Maksim](https://preferred.ai/team/maksim/) captured the winter waves of O’ahu in the following stunning drone video.
During the conference downtime, we explored several attractions around the island. We invite you to share in our experiences with the following montage.