Commit 74f7f4e

fix: posts extraction
1 parent 5db3808 commit 74f7f4e

54 files changed

+513
-1385
lines changed

content/posts/a-quest-for-fast-personalized-recommendation-part-i.md

Lines changed: 13 additions & 17 deletions
@@ -16,23 +16,19 @@ Personalized recommender systems attempt to generate a limited number of item op
1616

1717
An established and prevalent technique for personalized recommendation is collaborative filtering based on matrix factorization (MF), which attempts to learn customers’ preferences from their historical activities. Let’s assume that there are ***m*** users, denoted as ***U*** and ***n*** items, denoted as ***I***. Typically, a classic matrix factorization model consists of two phases:
1818

19-
* ***Learning***: this phase analyses customers’ historical activities, represented by a sparse matrix ***R*** of size ***m* x *n***, to learn their preferences. Each customer *u* is represented by a ***d***\-dimensional vector ***xu*** and each item *i* is represented by a ***d***\-dimensional vector ***yi***, where ***d*** is the hypothetical number of factors that explain the behaviour of each customer. The degree of preference of a customer *u* for an item *i* is modelled as the inner product score ***(*xu*)Tyi***. A higher inner product score implies a higher chance of the customer *u* to prefer the item *i*.
19+
- ***Learning***: this phase analyses customers’ historical activities, represented by a sparse matrix ***R*** of size ***m*** x ***n***, to learn their preferences. Each customer *u* is represented by a ***d***-dimensional vector ***x_u_*** and each item *i* is represented by a ***d***-dimensional vector ***y_i_***, where ***d*** is the hypothetical number of factors that explain the behaviour of each customer. The degree of preference of a customer *u* for an item *i* is modelled as the inner product score ***(*x_u_*)^T^y_i_***. A higher inner product score implies a higher chance of the customer *u* to prefer the item *i*.
2020

21-
* ***Retrieval***: given the output vectors from the learning phase, to arrive at a personalized recommendation list for customer ***u***, we need to identify the top-*K* items in ***I*** that have the highest inner product scores to ***xu***. Figure 1 illustrates the pipeline of top-*K* MF recommendation retrieval, in which ***Y***  denotes the item matrix where each row represents an item vector.
21+
- ***Retrieval***: given the output vectors from the learning phase, to arrive at a personalized recommendation list for customer ***u***, we need to identify the top-*K* items in ***I*** that have the highest inner product scores to ***x_u_***. Figure 1 illustrates the pipeline of top-*K* MF recommendation retrieval, in which ***Y***  denotes the item matrix where each row represents an item vector.
2222

2323
![](/uploads/2020/09/mf-based-recommendation-retrieval-1.png)
2424

25-
**Figure 1: Top-K Retrieval of Matrix Factorization Models**
26-
27-
The challenge of the *learning* phase is how to design effective algorithms that can learn from the data at the scale of millions of customers and items. This problem has been studied extensively in the research literature. On the other hand, the challenge of the *retrieval* phase is *speed,* due to the real-time nature of the task: *upon the arrival of a targeted customer* *u*, the system needs to quickly generate top-*K* items with highest inner product scores to ***xu*** be recommended for *u*.
25+
The challenge of the *learning* phase is how to design effective algorithms that can learn from the data at the scale of millions of customers and items. This problem has been studied extensively in the research literature. On the other hand, the challenge of the *retrieval* phase is *speed,* due to the real-time nature of the task: *upon the arrival of a targeted customer* *u*, the system needs to quickly generate top-*K* items with highest inner product scores to ***x_u_*** be recommended for *u*.
2826

2927
Formally, the above problem of finding the top-*K* MF recommendations can be stated as follows:
3028

31-
**(Maximum Inner Product Search-MIPS)** Given a customer vector *xu*, determine the item *i* such that:
32-
33-
i=\\mathrm{argmax}\_{j \\in I} x\_u^T y\_j
29+
**(Maximum Inner Product Search-MIPS)** Given a customer vector *x_u_*, determine the item *i* such that:
3430

35-
A straightforward solution for MIPS is to compute the inner product between ***xu*** and all item vectors **{*y1*, *y2*, …, *ym*}** and rank these scores. However, such solution scales linearly with the number of items, which incurs the prohibitive cost given current number of items in today large-scale systems (see References \[1\], \[2\], \[3\] for more detailed analysis). To achieve real-time personalized recommendation, we shall look for faster alternatives to solve the MIPS problem efficiently, specifically those who can avoid examining all items in *I*. In this post, we will explore such a solution, namely *indexing.*
31+
A straightforward solution for MIPS is to compute the inner product between ***x_u_*** and all item vectors {***y_1_***, ***y_2_***, …, ***y_m_***} and rank these scores. However, such solution scales linearly with the number of items, which incurs the prohibitive cost given current number of items in today large-scale systems (see References [1], [2], [3] for more detailed analysis). To achieve real-time personalized recommendation, we shall look for faster alternatives to solve the MIPS problem efficiently, specifically those who can avoid examining all items in *I*. In this post, we will explore such a solution, namely *indexing.*
3632

3733
**Indexing for Matrix Factorization Recommendation Retrieval**
3834

@@ -42,21 +38,21 @@ Figure 2 depicts two steps of a top-*K* recommender system with the aid of index
4238

4339
![](/uploads/2020/09/indexing-for-MF-recommendation-retrieval-1.png)
4440

45-
**Figure 2: Indexing Approach for Efficient Top-K Retrieval**
41+
- **Index construction**: process and store the item vectors *Y* in a data structure (e.g., hash tables, binary search trees, etc.) so that similar item vectors are stored closely in the data structure (e.g., on the same buckets of the hash tables or the same leaf nodes of the binary search tree. etc.).
4642

47-
* **Index construction**: process and store the item vectors *Y* in a data structure (e.g., hash tables, binary search trees, etc.) so that similar item vectors are stored closely in the data structure (e.g., on the same buckets of the hash tables or the same leaf nodes of the binary search tree. etc.).
48-
* **Retrieval**: Given the built data structure, a search for the top-*K* most similar items to a customer vector ***xu***, i.e., top-*K* recommendations can be performed in order of magnitude faster than naïve exhaustive search. This is primarily due to the property of indexing structures, which can automatically remove potential irrelevant items with high confidence and reduce the number of item candidates for inner product computation and ranking.
43+
- **Retrieval**: Given the built data structure, a search for the top-*K* most similar items to a customer vector ***x_u_***, i.e., top-*K* recommendations can be performed in order of magnitude faster than naïve exhaustive search. This is primarily due to the property of indexing structures, which can automatically remove potential irrelevant items with high confidence and reduce the number of item candidates for inner product computation and ranking.
4944

50-
The benefit of indexing comes at the cost of constructing the data structures to store the item vectors in new formats that support efficient similarity search, which is a one-time cost to be amortized over the many query instances. 
45+
The benefit of indexing comes at the cost of constructing the data structures to store the item vectors in new formats that support efficient similarity search, which is a one-time cost to be amortized over the many query instances.
5146

5247
Though having several advantages, a factor for consideration when using indexing structures for top-*K* recommendation is the growth rate of the systems. As customer preferences may change over time, new items appear, or old items are removed, maintaining a retrieval-efficient structure would require constant updates (e.g., insertion, deletion, or even re-build).
5348

54-
In the next part, we will investigate further some issues with using indexing for top-*K* MIPS as well as discuss some promising solutions. 
49+
In the next part, we will investigate further some issues with using indexing for top-*K* MIPS as well as discuss some promising solutions.
5550

5651
**References**
5752

58-
**\[1\]** Koenigstein, Noam, Parikshit Ram, and Yuval Shavitt. “Efficient retrieval of recommendations in a matrix factorization framework.” *Proceedings of the 21st ACM international conference on Information and knowledge management*. 2012.
53+
**[1]** Koenigstein, Noam, Parikshit Ram, and Yuval Shavitt. “Efficient retrieval of recommendations in a matrix factorization framework.” *Proceedings of the 21st ACM international conference on Information and knowledge management*. 2012.
54+
55+
**[2]** Le, D. D., & Lauw, H. W. (2017, November). Indexable Bayesian Personalized Ranking for Efficient Top-k Recommendation. In *Proceedings of the 2017 ACM on Conference on Information and Knowledge Management* (pp. 1389-1398). ACM.
5956

60-
**\[2\]** Le, D. D., & Lauw, H. W. (2017, November). Indexable Bayesian Personalized Ranking for Efficient Top-k Recommendation. In *Proceedings of the 2017 ACM on Conference on Information and Knowledge Management* (pp. 1389-1398). ACM.
57+
**[3]** Le, D. D., & Lauw, H. W (2020, Feb). Stochastically Robust Personalized Ranking for LSH Recommendation Retrieval, In *Proceeding of the 34thAAAI Conference on Artificial Intelligence* (AAAI’20), Feb 2020.
6158

62-
**\[3\]** Le, D. D., & Lauw, H. W (2020, Feb). Stochastically Robust Personalized Ranking for LSH Recommendation Retrieval, In *Proceeding of the 34thAAAI Conference on Artificial Intelligence* (AAAI’20), Feb 2020.
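
The post in this file describes the straightforward MIPS baseline: score every item vector against the customer vector and rank the results. A minimal sketch of that exhaustive search follows, in plain Python. It is not part of this commit; the names `inner_product` and `top_k_mips` and the toy vectors are hypothetical, chosen only for illustration.

```python
import heapq
from typing import List, Sequence, Tuple


def inner_product(x: Sequence[float], y: Sequence[float]) -> float:
    """Inner product of two vectors of equal dimension d."""
    return sum(a * b for a, b in zip(x, y))


def top_k_mips(
    x_u: Sequence[float], Y: Sequence[Sequence[float]], k: int
) -> List[Tuple[int, float]]:
    """Exhaustive top-K search: scores every row of Y against x_u.

    Cost grows linearly with the number of items, which is the
    bottleneck that indexing tries to avoid.
    """
    scored = ((inner_product(x_u, y_j), j) for j, y_j in enumerate(Y))
    best = heapq.nlargest(k, scored)  # highest scores first
    return [(j, score) for score, j in best]


# Toy data (assumed for illustration): one customer, four items, d = 2.
x_u = [1.0, 0.5]
Y = [[0.2, 0.1], [1.0, 1.0], [-1.0, 2.0], [0.9, 0.0]]
top2 = top_k_mips(x_u, Y, 2)
print(top2)
```

With `k = 1` this reduces to the argmax form of MIPS stated in the post. An index would replace the full scan over `Y` with a scan over a much smaller candidate set.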

content/posts/aaai-2019-in-hawaii.md

Lines changed: 17 additions & 12 deletions
@@ -4,7 +4,7 @@ date: "2019-03-09"
44
author: "Hady Lauw"
55
excerpt: "In January 2019, four members of Preferred.AI travelled to the AAAI-19 conference held in Honolulu, Hawaii to present 2 papers and 1 tutorial. As..."
66
featuredImage: "/uploads/2019/03/aaai19-hilton.jpg"
7-
categories: ["Video", "Travel", "Presentation"]
7+
categories: ["Presentation", "Travel", "Video"]
88
tags: []
99
seoTitle: "Aloha, AAAI-2019 - Preferred.AI"
1010
seoDescription: "In January 2019, four members of Preferred.AI travelled to the AAAI-19 conference held in Honolulu, Hawaii to present 2 papers and 1 tutorial. As..."
@@ -14,30 +14,35 @@ seoDescription: "In January 2019, four members of Preferred.AI travelled to the
1414

1515
In January 2019, four members of Preferred.AI travelled to the AAAI-19 conference held in Honolulu, Hawaii to present 2 papers and 1 tutorial.
1616

17-
![](/uploads/2019/03/aaai19-acceptance.jpg)
17+
![The conference was held in Hilton Hawaiian Village Waikiki Beach Resort, a sprawling complex of hotel, restaurants, convention hall, and a beach too!](/uploads/2019/03/aaai19-hilton.jpg)
1818

19-
As a country, Singapore held our own against much larger neighbors. With 122 submissions and 25 papers acceptance, the country success rate was a credible 20.5%
19+
As a premier conference in artificial intelligence, AAAI has always been competitive. This year was especially so. There were a total of 7095 full paper submissions. No wonder the acceptance rate for this year was a low 16.2%, a drastic drop from last year’s 24.6%.
2020

21-
On Jan 28, [Andrew](/team/andrew/) and [Hady](/team/hadylauw/) delivered a 3-hour tutorial on “[Recent Advances in Scalable Retrieval of Personalized Recommendations](/aaai19-tutorial/)“. This emphasized the importance of retrieval efficiency for recommendation and covered the main strategies such as approximate maximum inner product search, indexable representation learning, discrete representations. We made the [materials](https://github.com/PreferredAI/recommendation-retrieval) as well as [video recording](https://www.youtube.com/playlist?list=PL291RJWFNQGL7MBEuBIDwMIQn8rX1Jloz) available.
21+
![As a country, Singapore held our own against much larger neighbors. With 122 submissions and 25 papers acceptance, the country success rate was a credible 20.5%](/uploads/2019/03/aaai19-acceptance.jpg)
2222

23-
![](/uploads/2019/03/aaai19-tutorial.jpg)
23+
On Jan 28, [Andrew](https://preferred.ai/team/andrew/) and [Hady](https://preferred.ai/team/hadylauw/) delivered a 3-hour tutorial on “[Recent Advances in Scalable Retrieval of Personalized Recommendations](https://preferred.ai/aaai19-tutorial/)“. This emphasized the importance of retrieval efficiency for recommendation and covered the main strategies such as approximate maximum inner product search, indexable representation learning, discrete representations. We made the [materials](https://github.com/PreferredAI/recommendation-retrieval) as well as [video recording](https://www.youtube.com/playlist?list=PL291RJWFNQGL7MBEuBIDwMIQn8rX1Jloz) available.
2424

25-
Andrew and Hady explored the various strategies to increase the retrieval efficiency of recommender systems, while maintaining accuracy
25+
![Andrew and Hady explored the various strategies to increase the retrieval efficiency of recommender systems, while maintaining accuracy](/uploads/2019/03/aaai19-tutorial.jpg)
2626

27-
On Jan 30, [Tuan](/team/tuan/) presented the spotlight for our paper “[VistaNet: Visual Aspect Attention Network for Multimodal Sentiment Analysis](http://www.hadylauw.com/publications/aaai19a.pdf)” that showed the efficacy of review images in helping to identify the textual passages that would be useful for sentiment analysis. The [implementation](https://github.com/PreferredAI/vista-net) is now available.
27+
On Jan 30, [Tuan](https://preferred.ai/team/tuan/) presented the spotlight for our paper “[VistaNet: Visual Aspect Attention Network for Multimodal Sentiment Analysis](http://www.hadylauw.com/publications/aaai19a.pdf)” that showed the efficacy of review images in helping to identify the textual passages that would be useful for sentiment analysis. The [implementation](https://github.com/PreferredAI/vista-net) is now available.
2828

2929
![](/uploads/2019/03/aaai19-vistanet.jpg)
3030

31-
Hady and Tuan at the poster session for VistaNet
31+
On Jan 31, [Maksim](https://preferred.ai/team/maksim/) gave the spotlight on our paper “[CompareLDA: A Topic Model for Document Comparison](http://www.hadylauw.com/publications/aaai19b.pdf)“, emphasizing that when comparison was a key property, a topic model supervised by pairwise comparisons such as CompareLDA would be more effective. The [implementation](https://github.com/PreferredAI/compare-lda) is also now available.
3232

33-
On Jan 31, [Maksim](/team/maksim/) gave the spotlight on our paper “[CompareLDA: A Topic Model for Document Comparison](http://www.hadylauw.com/publications/aaai19b.pdf)“, emphasizing that when comparison was a key property, a topic model supervised by pairwise comparisons such as CompareLDA would be more effective. The [implementation](https://github.com/PreferredAI/compare-lda) is also now available.
33+
![Maksim explaining how a topic model aligned to comparisons can reveal insightful topics about how entities are ranked with respect to one another](/uploads/2019/03/aaai19-comparelda.jpg)
3434

35-
![](/uploads/2019/03/aaai19-comparelda.jpg)
35+
While the AAAI-19 program was interesting, the island of O’ahu also offered as picturesque a scenery as any. [Maksim](https://preferred.ai/team/maksim/) captured the winter waves of O’ahu in the following stunning drone video.
3636

37-
Maksim explaining how a topic model aligned to comparisons can reveal insightful topics about how entities are ranked with respect to one another
3837

39-
While the AAAI-19 program was interesting, the island of O’ahu also offered as picturesque a scenery as any. [Maksim](/team/maksim/) captured the winter waves of O’ahu in the following stunning drone video.
38+
<iframe width="720" height="405" src="https://www.youtube.com/embed/SEDLxfgkCOw?feature=oembed" frameborder="0" allowfullscreen></iframe>
39+
4040

4141
During the conference downtime, we explored several attractions around the island. We invite you to share in our experiences with the following montage.
4242

43+
44+
<iframe width="720" height="405" src="https://www.youtube.com/embed/SKIzmRKW-W4?feature=oembed" frameborder="0" allowfullscreen></iframe>
45+
46+
4347
Mahalo!
48+
