RECSYS2023

Conference paper list

This conference has 179 papers

No.	Title	Abstract	Authors	Affiliations
1	Learning from Negative User Feedback and Measuring Responsiveness for Sequential Recommenders	Sequential recommenders have been widely used in industry due to their strength in modeling user preferences. While these models excel at learning a user's positive interests, less attention has been paid to learning from negative user feedback. Negative user feedback is an important lever of user control, and comes with an expectation that recommenders should respond quickly and reduce similar recommendations to the user. However, negative feedback signals are often ignored in the training objective of sequential retrieval models, which primarily aim at predicting positive user interactions. In this work, we incorporate explicit and implicit negative user feedback into the training objective of sequential recommenders in the retrieval stage using a "not-to-recommend" loss function that optimizes for the log-likelihood of not recommending items with negative feedback. We demonstrate the effectiveness of this approach using live experiments on a large-scale industrial recommender system. Furthermore, we address a challenge in measuring recommender responsiveness to negative feedback by developing a counterfactual simulation framework to compare recommender responses between different user actions, showing improved responsiveness from the modeling change.	Alex Beutel, Elaine Ya Le, Jingchen Feng, Longfei Li, MinCheng Huang, Shane Li, Shuchao Bi, Shuo Chang, Xujian Liang, Yaping Zhang, Yoni Halpern, Yueqi Wang	Google, Mountain View, CA 94043 USA
2	gSASRec: Reducing Overconfidence in Sequential Recommendation Trained with Negative Sampling	A large catalogue size is one of the central challenges in training recommendation models: a large number of items makes it memory- and computationally inefficient to compute scores for all items during training, forcing these models to deploy negative sampling. However, negative sampling increases the proportion of positive interactions in the training data, and therefore models trained with negative sampling tend to overestimate the probabilities of positive interactions, a phenomenon we call overconfidence. While the absolute values of the predicted scores or probabilities are not important for the ranking of retrieved recommendations, overconfident models may fail to estimate nuanced differences in the top-ranked items, resulting in degraded performance. In this paper, we show that overconfidence explains why the popular SASRec model underperforms when compared to BERT4Rec. This is contrary to the BERT4Rec authors' explanation that the difference in performance is due to the bi-directional attention mechanism. To mitigate overconfidence, we propose a novel Generalised Binary Cross-Entropy Loss function (gBCE) and theoretically prove that it can mitigate overconfidence. We further propose the gSASRec model, an improvement over SASRec that deploys an increased number of negatives and the gBCE loss. We show through detailed experiments on three datasets that gSASRec does not exhibit the overconfidence problem. As a result, gSASRec can outperform BERT4Rec (e.g. +9.47% NDCG on the MovieLens-1M dataset), while requiring less training time (e.g. -73% training time on MovieLens-1M). Moreover, in contrast to BERT4Rec, gSASRec is suitable for large datasets that contain more than 1 million items.	Aleksandr Vladimirovich Petrov, Craig MacDonald	School of Computing Science, University of Glasgow, United Kingdom; University of Glasgow, United Kingdom
3	Rethinking Multi-Interest Learning for Candidate Matching in Recommender Systems	Existing research efforts for multi-interest candidate matching in recommender systems mainly focus on improving model architecture or incorporating additional information, neglecting the importance of training schemes. This work revisits the training framework and uncovers two major problems hindering the expressiveness of learned multi-interest representations. First, the current training objective (i.e., uniformly sampled softmax) fails to effectively train discriminative representations in a multi-interest learning scenario due to the severe increase in easy negative samples. Second, a routing collapse problem is observed where each learned interest may collapse to express information only from a single item, resulting in information loss. To address these issues, we propose the REMI framework, consisting of an Interest-aware Hard Negative mining strategy (IHN) and a Routing Regularization (RR) method. IHN emphasizes interest-aware hard negatives by proposing an ideal sampling distribution and developing a Monte-Carlo strategy for efficient approximation. RR prevents routing collapse by introducing a novel regularization term on the item-to-interest routing matrices. These two components enhance the learned multi-interest representations from both the optimization objective and the composition information. REMI is a general framework that can be readily applied to various existing multi-interest candidate matching methods. Experiments on three real-world datasets show our method can significantly improve state-of-the-art methods with easy implementation and negligible computational overhead. The source code is available at https://github.com/Tokkiu/REMI.	Fangzhao Wu, Jae Boum Kim, Jingqi Gao, Peilin Zhou, Qichen Ye, Sunghun Kim, Yining Hua, Yueqi Xie	HKUST gz, Hong Kong, Peoples R China; HKUST, Hong Kong, Peoples R China; MIT, Cambridge, MA 02139 USA; MSRA, Beijing, Peoples R China; Peking Univ, Beijing, Peoples R China; Upstage, Hong Kong, Peoples R China
4	Understanding and Modeling Passive-Negative Feedback for Short-video Sequential Recommendation	Sequential recommendation is one of the most important tasks in recommender systems, which aims to recommend the next interacted item with historical behaviors as input. Traditional sequential recommendation mainly considers the collected positive feedback such as click, purchase, etc. However, in short-video platforms such as TikTok, video viewing behavior may not always represent positive feedback. Specifically, the videos are played automatically, and users passively receive the recommended videos. In this new scenario, users passively express negative feedback by skipping over videos they do not like, which provides valuable information about their preferences. Different from the negative feedback studied in traditional recommender systems, this passive-negative feedback can reflect users' interests and serve as an important supervision signal in extracting users' preferences. Therefore, it is essential to carefully design and utilize it in this novel recommendation scenario. In this work, we first conduct analyses based on a large-scale real-world short-video behavior dataset and illustrate the significance of leveraging passive feedback. We then propose a novel method that deploys the sub-interest encoder, which incorporates positive feedback and passive-negative feedback as supervision signals to learn the user's current active sub-interest. Moreover, we introduce an adaptive fusion layer to integrate various sub-interests effectively. To enhance the robustness of our model, we then introduce a multi-task learning module to simultaneously optimize two kinds of feedback - passive-negative feedback and traditional randomly-sampled negative feedback. The experiments on two large-scale datasets verify that the proposed method can significantly outperform state-of-the-art approaches. The code is released at https://github.com/tsinghua-fib-lab/RecSys2023-SINE to benefit the community.	Chen Gao, Depeng Jin, Jianxin Chang, Kun Gai, Yanan Niu, Yang Song, Yong Li, Yunzhu Pan	Beijing Kuaishou Technol Co Ltd, Beijing, Peoples R China; Tsinghua Univ, Beijing Natl Res Ctr Informat Sci & Technol, Dept Elect Engn, Beijing, Peoples R China; Unaffiliated, Beijing, Peoples R China; Univ Elect Sci & Technol China, Chengdu, Peoples R China
5	Adaptive Collaborative Filtering with Personalized Time Decay Functions for Financial Product Recommendation	Classical recommender systems often assume that historical data are stationary and fail to account for the dynamic nature of user preferences, limiting their ability to provide reliable recommendations in time-sensitive settings. This assumption is particularly problematic in finance, where financial products exhibit continuous changes in valuations, leading to frequent shifts in client interests. These evolving interests, summarized in the past client-product interactions, see their utility fade over time with a degree that might differ from one client to another. To address this challenge, we propose a time-dependent collaborative filtering algorithm that can adaptively discount distant client-product interactions using personalized decay functions. Our approach is designed to handle the non-stationarity of financial data and produce reliable recommendations by modeling the dynamic collaborative signals between clients and products. We evaluate our method using a proprietary dataset from BNP Paribas and demonstrate significant improvements over state-of-the-art benchmarks from relevant literature. Our findings emphasize the importance of incorporating time explicitly in the model to enhance the accuracy of financial product recommendation.	Ashraf Ghiye, Baptiste Barreau, Laurent Carlier, Michalis Vazirgiannis	BNP Paribas Corp & Inst Banking, Global Markets Data & AI Lab, Paris, France; Ecole Polytechn, Comp Sci Lab, LIX, Palaiseau, France
6	Integrating Item Relevance in Training Loss for Sequential Recommender Systems	Sequential Recommender Systems (SRSs) are a popular type of recommender system that leverages user history to predict the next item of interest. However, the presence of noise in user interactions, stemming from account sharing, inconsistent preferences, or accidental clicks, can significantly impact the robustness and performance of SRSs, particularly when the entire item set to be predicted is noisy. This situation is more prevalent when only one item is used to train and evaluate the SRSs. To tackle this challenge, we propose a novel approach that addresses the issue of noise in SRSs. First, we propose a sequential multi-relevant future items training objective, leveraging a loss function aware of item relevance, thereby enhancing their robustness against noise in the training data. Additionally, to mitigate the impact of noise at evaluation time, we propose multi-relevant future items evaluation (MRFI-evaluation), aiming to improve overall performance. Our relevance-aware models obtain an improvement of 1.58% of NDCG@10 and 0.96% in terms of HR@10 in the traditional evaluation protocol, the one which utilizes one relevant future item. In the MRFI-evaluation protocol, using multiple future items, the improvement is 2.82% of NDCG@10 and 0.64% of HR@10 w.r.t the best baseline model.	Andrea Bacciu, Fabrizio Silvestri, Federico Siciliano, Nicola Tonellotto	Sapienza Univ Rome, Rome, Italy; Univ Pisa, Pisa, Italy
7	Integrating the ACT-R Framework with Collaborative Filtering for Explainable Sequential Music Recommendation	Music listening sessions often consist of sequences including repeating tracks. Modeling such relistening behavior with models of human memory has been proven effective in predicting the next track of a session. However, these models intrinsically lack the capability of recommending novel tracks that the target user has not listened to in the past. Collaborative filtering strategies, on the contrary, provide novel recommendations by leveraging past collective behaviors but are often limited in their ability to provide explanations. To narrow this gap, we propose four hybrid algorithms that integrate collaborative filtering with the cognitive architecture ACT-R. We compare their performance in terms of accuracy, novelty, diversity, and popularity bias, to baselines of different types, including pure ACT-R, kNN-based, and neural-networks-based approaches. We show that the proposed algorithms are able to achieve the best performances in terms of novelty and diversity, and simultaneously achieve a higher accuracy of recommendation with respect to pure ACT-R models. Furthermore, we illustrate how the proposed models can provide explainable recommendations.	Christian Wallmann, Dominik Kowald, Elisabeth Lex, Markus ReiterHaas, Markus Schedl, Marta Moscati	Graz Univ Technol, Graz, Austria; Johannes Kepler Univ Linz, Inst Computat Percept, Linz, Austria; Welser Profile GmbH, Gresten, Austria
8	An Industrial Framework for Personalized Serendipitous Recommendation in E-commerce	Classical recommendation methods typically face the filter bubble problem where users likely receive recommendations of their familiar items, making them bored and dissatisfied. To alleviate such an issue, this applied paper introduces a novel framework for personalized serendipitous recommendation in an e-commerce platform (i.e., JD.com), which presents users with unexpected and satisfying items that deviate from their prior behaviors, considering both accuracy and novelty. To achieve such a goal, it is crucial yet challenging to recognize when a user is willing to receive serendipitous items and how many novel items are expected. To address these two challenges, a two-stage framework is designed. Firstly, a DNN-based scorer is deployed to quantify the novelty degree of a product category based on user behavior history. Then, we resort to a potential outcome framework to decide the optimal timing to recommend a user serendipitous items and the novelty degree of the recommendation. An online A/B test on the e-commerce recommender platform in JD.com demonstrates that our model achieves significant gains on various metrics: relative increases of 0.54% in impressive depth, 0.8% in average user click count, and 3.23% and 1.38% in the number of novel impressive and clicked items, respectively.	Anyu Dai, Linfang Hou, Luobao Zou, Mian Ma, Nan Qiao, Sulong Xu, Yanyan Zou, Zhuoye Ding, Zongyi Wang	JD com, Beijing, Peoples R China
9	Full Index Deep Retrieval: End-to-End User and Item Structures for Cold-start and Long-tail Item Recommendation	End-to-end retrieval models, such as Tree-based Models (TDM) and Deep Retrieval (DR), have attracted a lot of attention, but they cannot handle cold-start and long-tail item recommendation scenarios well. Specifically, DR learns a compact indexing structure, enabling efficient and accurate retrieval for large recommendation systems. However, it is discovered that DR largely fails on retrieving cold-start and long-tail items. This is because DR only utilizes user-item interaction data, which is rare and often noisy for cold-start and long-tail items. Besides, end-to-end retrieval models are unable to make use of the rich item content features. To address this issue while maintaining the efficiency of DR indexing structure, we propose Full Index Deep Retrieval (FIDR) that learns indices for the full corpus items, including cold-start and long-tail items. In addition to the original structure in DR (called User Structure in FIDR) that learns with user-item interaction data (e.g., clicks), we add an Item Structure to embed items directly based on item content features (e.g., categories). With joint efforts of User Structure and Item Structure, FIDR makes cold-start items retrievable and also improves the recommendation quality of long-tail items. To our best knowledge, FIDR is the first to solve the cold-start and long-tail recommendation problem for the end-to-end retrieval models. Through extensive experiments on three real-world datasets, we demonstrate that FIDR can effectively recommend cold-start as well as long-tail items, and largely promote overall recommendation performance without sacrificing inference efficiency. According to the experiments, the recall of FIDR is improved by 8.8% to 11.9%, while the inference of FIDR is as efficient as DR.	Anran Xu, Chong Wang, Fan Wu, Lei Chen, Shengjie Wang, Xin Wu, Zhen Gong, Zhenzhe Zheng	Bytedance Inc, Mountain View, CA USA; Shanghai Jiao Tong Univ, Shanghai, Peoples R China
10	Online Matching: A Real-time Bandit System for Large-scale Recommendations	The last decade has witnessed many successes of deep learning-based models for industry-scale recommender systems. These models are typically trained offline in a batch manner. While being effective in capturing users' past interactions with recommendation platforms, batch learning suffers from long model-update latency and is vulnerable to system biases, making it hard to adapt to distribution shift and explore new items or user interests. Although online learning-based approaches (e.g., multi-armed bandits) have demonstrated promising theoretical results in tackling these challenges, their practical real-time implementation in large-scale recommender systems remains limited. First, the scalability of online approaches in servicing massive online traffic while ensuring timely updates of bandit parameters poses a significant challenge. Additionally, exploring uncertainty in recommender systems can easily result in unfavorable user experience, highlighting the need for devising intricate strategies that effectively balance the trade-off between exploitation and exploration. In this paper, we introduce Online Matching: a scalable closed-loop bandit system learning from users' direct feedback on items in real time. We present a hybrid offline + online approach for constructing this system, accompanied by a comprehensive exposition of the end-to-end system architecture. We propose Diag-LinUCB - a novel extension of the LinUCB algorithm - to enable distributed updates of bandit parameters in a scalable and timely manner. We conduct live experiments in YouTube and show that Online Matching is able to enhance the capabilities of fresh content discovery and item exploration in the present platform.	Charles Wu, Ed H. Chi, Hariharan Chandrasekaran, Lichan Hong, Lukasz Heldt, Minmin Chen, Ruining He, ShaoChuan Wang, Xinyang Yi	Google Deepmind, Mountain View, CA 94043 USA; Google Inc, Mountain View, CA USA
11	Exploring False Hard Negative Sample in Cross-Domain Recommendation	Negative Sampling in recommendation aims to capture informative negative instances for the sparse user-item interactions to improve the performance. Conventional negative sampling methods tend to select informative hard negative samples (HNS) besides the default random samples. However, these hard negative sampling methods usually struggle with false hard negative samples (FHNS), which happens when a user-item interaction has not been observed yet and is picked as a negative sample, while the user will actually interact with this item once exposed to it. Such FHNS issues may seriously confuse the model training, while most conventional hard negative sampling methods do not systematically explore and distinguish FHNS from HNS. To address this issue, we propose a novel model-agnostic Real Hard Negative Sampling (RealHNS) framework specially for cross-domain recommendation (CDR), which aims to discover the false and refine the real from all HNS via both general and cross-domain real hard negative sample selectors. For the general part, we conduct the coarse- and fine-grained real HNS selectors sequentially, armed with a dynamic item-based FHNS filter to find high-quality HNS. For the cross-domain part, we further design a new cross-domain HNS for alleviating negative transfer in CDR and discover its corresponding FHNS via a dynamic user-based FHNS filter to keep its power. We conduct experiments on four datasets based on three representative hard negative sampling methods, along with extensive model analyses, ablation studies, and universality analyses. The consistent improvements indicate the effectiveness, robustness, and universality of RealHNS, which is also easy-to-deploy in real-world systems as a plug-and-play strategy. The source code is available at https://github.com/hulkima/RealHNS.	Haokai Ma, Jie Zhou, Lei Meng, Leyu Lin, Ruobing Xie, Xin Chen, Xu Zhang	Shandong Univ, Sch Software, Jinan, Peoples R China; Tencent, WeChat, Beijing, Peoples R China
12	Contrastive Learning with Frequency-Domain Interest Trends for Sequential Recommendation	Recently, contrastive learning for sequential recommendation has demonstrated its powerful ability to learn high-quality user representations. However, constructing augmented samples in the time domain poses challenges due to various reasons, such as fast-evolving trends, interest shifts, and system factors. Furthermore, the F-principle indicates that deep learning preferentially fits the low-frequency part, resulting in poor performance on high-frequency tasks. The complexity of time series and the low-frequency preference limit the utility of sequence encoders. To address these challenges, we need to construct augmented samples from the frequency domain, thus improving the ability to accommodate events of different frequency sizes. To this end, we propose a novel Contrastive Learning with Frequency-Domain Interest Trends for Sequential Recommendation (CFIT4SRec). We treat the embedding representations of historical interactions as "images" and introduce the second-order Fourier transform to construct augmented samples. The components of different frequency sizes reflect the interest trends between attributes and their surroundings in the hidden space. We introduce three data augmentation operations to accommodate events of different frequency sizes: low-pass augmentation, high-pass augmentation, and band-stop augmentation. Extensive experiments on four public benchmark datasets demonstrate the superiority of CFIT4SRec over the state-of-the-art baselines. The implementation code is available at https://github.com/zhangyichi1Z/CFIT4SRec.	Guisheng Yin, Yichi Zhang, Yuxin Dong	Harbin Engn Univ, Harbin, Heilongjiang, Peoples R China
13	Multi-task Item-attribute Graph Pre-training for Strict Cold-start Item Recommendation	Recommendation systems suffer in the strict cold-start (SCS) scenario, where the user-item interactions are entirely unavailable. The well-established, dominating identity (ID)-based approaches completely fail to work. Cold-start recommenders, on the other hand, leverage item contents (brand, title, descriptions, etc.) to map the new items to the existing ones. However, the existing SCS recommenders explore item contents in coarse-grained manners that introduce noise or information loss. Moreover, informative data sources other than item contents, such as users' purchase sequences and review texts, are largely ignored. In this work, we explore the role of the fine-grained item attributes in bridging the gaps between the existing and the SCS items and pre-train a knowledgeable item-attribute graph for SCS item recommendation. Our proposed framework, ColdGPT, models item-attribute correlations into an item-attribute graph by extracting fine-grained attributes from item contents. ColdGPT then transfers knowledge into the item-attribute graph from various available data sources, i.e., item contents, historical purchase sequences, and review texts of the existing items, via multi-task learning. To facilitate the positive transfer, ColdGPT designs specific submodules according to the natural forms of the data sources and proposes to coordinate the multiple pre-training tasks via unified alignment-and-uniformity losses. Our pre-trained item-attribute graph acts as an implicit, extendable item embedding matrix, which enables the SCS item embeddings to be easily acquired by inserting these items into the item-attribute graph and propagating their attributes' embeddings. We carefully process three public datasets, i.e., Yelp, Amazon-home, and Amazon-sports, to guarantee the SCS setting for evaluation. Extensive experiments show that ColdGPT consistently outperforms the existing SCS recommenders by large margins and even surpasses models that are pre-trained on 75-224 times more cross-domain data on two out of four datasets. Our code and pre-processed datasets for SCS evaluations are publicly available to help future SCS studies.	Chen Wang, Chenyu You, Hao Peng, Liangwei Yang, Philip S. Yu, Yuwei Cao, Zhiwei Liu	Beihang Univ, Beijing, Peoples R China; Salesforce AI, Washington, DC USA; Univ Illinois, Chicago, IL USA; Yale Univ, New Haven, CT USA
14	BVAE: Behavior-aware Variational Autoencoder for Multi-Behavior Multi-Task Recommendation	A practical recommender system should be able to handle heterogeneous behavioral feedback as inputs and have multi-task output ability. Although the heterogeneous one-class collaborative filtering (HOCCF) and multi-task learning (MTL) methods have been well studied, there is still a lack of targeted manner in their combined fields, i.e., Multi-behavior Multi-task Recommendation (MMR). To fill the gap, we propose a novel recommendation framework called Behavior-aware Variational AutoEncoder (BVAE), which meliorates the parameter sharing and loss minimization method with the VAE structure to address the MMR problem. Specifically, our BVAE includes behavior-aware semi-encoders and decoders, and a target feature fusion network with a global feature filtering network, while using standard deviation to weigh loss. These modules generate the behavior-aware recommended item list via constructing better semantic feature vectors for users, i.e., from dual perspectives of behavioral preference and global interaction. In addition, we optimize our BVAE in terms of adaptability and robustness, i.e., it is concise and flexible to consume any amount of behaviors with different distributions. Extensive empirical studies on two real and widely used datasets confirm the validity of our design and show that our BVAE can outperform the state-of-the-art related baseline methods under multiple evaluation metrics. The processed datasets, source code, and scripts necessary to reproduce the results are available at https://github.com/WitnessForest/BVAE.	Qianzhen Rao, Weike Pan, Yang Liu, Zhong Ming	Shenzhen Univ, Coll Comp Sci & Software Engn, Shenzhen, Peoples R China
15	Looks Can Be Deceiving: Linking User-Item Interactions and User's Propensity Towards Multi-Objective Recommendations	Multi-objective recommender systems (MORS) provide suggestions to users according to multiple (and possibly conflicting) goals. When a system optimizes its results at the individual-user level, it tailors them to a user's propensity towards the different objectives. Hence, the capability to understand users' fine-grained needs towards each goal is crucial. In this paper, we present the results of a user study in which we monitored the way users interacted with recommended items, as well as their self-proclaimed propensities towards relevance, novelty, and diversity objectives. The study was divided into several sessions, where users evaluated recommendation lists originating from a relevance-only single-objective baseline as well as MORS. We show that, despite MORS-based recommendations attracting fewer selections, their presence in the early sessions is crucial for users' satisfaction in the later stages. Surprisingly, the self-proclaimed willingness of users to interact with novel and diverse items is not always reflected in the recommendations they accept. Post-study questionnaires provide insights on how to deal with this matter, suggesting that MORS-based results should be accompanied by elements that allow users to understand the recommendations, so as to facilitate the choice of whether a recommendation should be accepted or not. Detailed study results are available at https://bit.ly/looks-can-be-deceiving-repo.	Ladislav Peska, Ludovico Boratto, Patrik Dokoupil	Charles Univ Prague, Fac Math & Phys, Prague, Czech Republic; Univ Cagliari, Cagliari, Italy
16	Scaling Session-Based Transformer Recommendations using Optimized Negative Sampling and Loss Functions	This work introduces TRON, a scalable session-based Transformer Recommender using Optimized Negative-sampling. Motivated by the scalability and performance limitations of prevailing models such as SASRec and GRU4Rec+, TRON integrates top-k negative sampling and listwise loss functions to enhance its recommendation accuracy. Evaluations on relevant large-scale e-commerce datasets show that TRON improves upon the recommendation quality of current methods while maintaining training speeds similar to SASRec. A live A/B test yielded an 18.14% increase in click-through rate over SASRec, highlighting the potential of TRON in practical settings. For further research, we provide access to our source code and an anonymized dataset.	PaulVincent Kobow, Philipp Normann, Sophie Baumeister, Timo Wilm	OTTO GmbH & Co KG, Hamburg, Germany
17	Pairwise Intent Graph Embedding Learning for Context-Aware Recommendation	Although knowledge graphs have shown their effectiveness in mitigating data sparsity in many recommendation tasks, they remain underutilized in context-aware recommender systems (CARS) with the specific sparsity challenges associated with the contextual features, i.e., feature sparsity and interaction sparsity. To bridge this gap, in this paper, we propose a novel pairwise intent graph embedding learning (PING) framework to efficiently integrate knowledge graphs into CARS. Specifically, our PING contains three modules: 1) a graph construction module is used to obtain a pairwise intent graph (PIG) containing nodes for users, items, entities, and enhanced intent, where enhanced intent nodes are generated by applying user intent fusion (UIF) on relational intent and contextual intent, and two sub-intents are derived from the semantic information and contextual information, respectively; 2) a pairwise intent joint graph convolution module is used to obtain the refined embeddings of all the features by executing a customized convolution strategy on PIG, where each enhanced intent node acts as a hub to efficiently propagate information among different features and between all the features and knowledge graph; 3) a recommendation module with the refined embeddings is used to replace the randomly initialized embeddings of downstream recommendation models to improve model performance. Finally, we conduct extensive experiments on three public datasets to verify the effectiveness and compatibility of our PING.	Dugang Liu, Hao Wang, Qinjuan Yang, Weixin Li, Xiaolian Zhang, Yuhao Wu, Zhong Ming	Huawei 2012 Lab, Shenzhen, Peoples R China; Shenzhen Univ, Coll Comp Sci & Software Engn, Shenzhen, Peoples R China; Shenzhen Univ, Guangdong Lab Artificial Intelligence & Digital E, Shenzhen, Peoples R China
18	A Model-Agnostic Framework for Recommendation via Interest-aware Item Embeddings	Item representation holds significant importance in recommendation systems, which encompasses domains such as news, retail, and videos. Retrieval and ranking models utilise item representation to capture the user-item relationship based on user behaviours. Existing representation learning methods primarily focus on optimising item-based mechanisms, such as attention and sequential modelling. However, these methods lack a modelling mechanism to directly reflect user interests within the learned item representations. Consequently, these methods may be less effective in capturing user interests indirectly. To address this challenge, we propose a novel Interest-aware Capsule network (IaCN) recommendation model, a model-agnostic framework that directly learns interest-oriented item representations. IaCN serves as an auxiliary task, enabling the joint learning of both item-based and interest-based representations. This framework adopts existing recommendation models without requiring substantial redesign. We evaluate the proposed approach on benchmark datasets, exploring various scenarios involving different deep neural networks, behaviour sequence lengths, and joint learning ratios of interest-oriented item representations. Experimental results demonstrate significant performance enhancements across diverse recommendation models, validating the effectiveness of our approach.	Amit Kumar Jaiswal, Yu Xiong
19	Gradient Matching for Categorical Data Distillation in CTR Prediction	The cost of hardware and energy consumption on training a click-through rate (CTR) model is highly prohibitive. A recent promising direction for reducing such costs is data distillation with gradient matching, which aims to synthesize a small distilled dataset to guide the model to a similar parameter space as those trained on real data. However, there are two main challenges to implementing such a method in the recommendation field: (1) The categorical recommended data are high dimensional and sparse one- or multi-hot data which will block the gradient flow, rendering backpropagation-based data distillation invalid. (2) The data distillation process with gradient matching is computationally expensive due to the bi-level optimization. To this end, we investigate efficient data distillation tailored for recommendation data with plenty of side information where we formulate the discrete data to the dense and continuous data format. Then, we further introduce a one-step gradient matching scheme, which performs gradient matching for only a single step to overcome the inefficient training process. The overall proposed method is called Categorical data distillation with Gradient Matching (CGM), which is capable of distilling a large dataset into a small set of informative synthetic data for training CTR models from scratch. Experimental results show that our proposed method not only outperforms the state-of-the-art coreset selection and data distillation methods but also has remarkable cross-architecture performance. Moreover, we explore the application of CGM on model retraining and mitigate the effect of different random seeds on the training results.	Cheng Wang, Jiacheng Sun, Rui Zhang, Ruixuan Li, Zhenhua Dong	Huawei Noahs Ark Lab, Shenzhen, Peoples R China; Huazhong Univ Sci & Technol, Wuhan, Peoples R China; Ruizhang Info, Shenzhen, Peoples R China
20	Augmented Negative Sampling for Collaborative Filtering	Negative sampling is essential for implicit-feedback-based collaborative filtering, which is used to constitute negative signals from massive unlabeled data to guide supervised learning. The state-of-the-art idea is to utilize hard negative samples that carry more useful information to form a better decision boundary. To balance efficiency and effectiveness, the vast majority of existing methods follow the two-pass approach, in which the first pass samples a fixed number of unobserved items by a simple static distribution and then the second pass selects the final negative items using a more sophisticated negative sampling strategy. However, selecting negative samples from the original items in a dataset is inherently restricted due to the limited available choices, and thus may not be able to contrast positive samples well. In this paper, we confirm this observation via carefully designed experiments and introduce two major limitations of existing solutions: ambiguous trap and information discrimination. Our response to such limitations is to introduce "augmented" negative samples that may not exist in the original dataset. This direction renders a substantial technical challenge because constructing unconstrained negative samples may introduce excessive noise that eventually distorts the decision boundary. To this end, we introduce a novel generic augmented negative sampling (ANS) paradigm and provide a concrete instantiation. First, we disentangle hard and easy factors of negative items. Next, we generate new candidate negative samples by augmenting only the easy factors in a regulated manner: the direction and magnitude of the augmentation are carefully calibrated. Finally, we design an advanced negative sampling strategy to identify the final augmented negative samples, which considers not only the score function used in existing methods but also a new metric called augmentation gain. Extensive experiments on real-world datasets demonstrate that our method significantly outperforms state-of-the-art baselines. Our code is publicly available at https://github.com/Asa9aoTK/ANS-Recbole.	Hongtao Song, Li Chen, Qilong Han, Riwei Lai, Rui Chen, Yuhan Zhao	Harbin Engn Univ, Harbin, Peoples R China; Hong Kong Baptist Univ, Hong Kong, Peoples R China
21	LightSAGE: Graph Neural Networks for Large Scale Item Retrieval in Shopee's Advertisement Recommendation	Graph Neural Network (GNN) is the trending solution for item retrieval in recommendation problems. Most recent reports, however, focus heavily on new model architectures. This may bring some gaps when applying GNN in the industrial setup, where, besides the model, constructing the graph and handling data sparsity also play critical roles in the overall success of the project. In this work, we report how GNN is applied for large-scale e-commerce item retrieval at Shopee. We introduce our simple yet novel and impactful techniques in graph construction, modeling, and handling data skewness. Specifically, we construct high-quality item graphs by combining strong-signal user behaviors with high-precision collaborative filtering (CF) algorithm. We then develop a new GNN architecture named LightSAGE to produce high-quality items' embeddings for vector search. Finally, we design multiple strategies to handle cold-start and long-tail items, which are critical in an advertisement (ads) system. Our models bring improvement in offline evaluations, online A/B tests, and are deployed to the main traffic of Shopee's Recommendation Advertisement system.	Chenfei Wang, Dang Minh Nguyen, Yan Shen, Yifan Zeng	SEA Grp, Shopee, Beijing, Peoples R China; SEA Grp, Shopee, Singapore, Singapore
22	Goal-Oriented Multi-Modal Interactive Recommendation with Verbal and Non-Verbal Relevance Feedback	Interactive recommendation enables users to provide verbal and non-verbal relevance feedback (such as natural-language critiques and likes/dislikes) when viewing a ranked list of recommendations (such as images of fashion products), in order to guide the recommender system towards their desired items (i.e. goals) across multiple interaction turns. Such a multi-modal interactive recommendation (MMIR) task has been successfully formulated with deep reinforcement learning (DRL) algorithms by simulating the interactions between an environment (i.e. a user) and an agent (i.e. a recommender system). However, it is typically challenging and unstable to optimise the agent to improve the recommendation quality associated with implicit learning of multi-modal representations in an end-to-end fashion in DRL. This is known as the coupling of policy optimisation and representation learning. To address this coupling issue, we propose a novel goal-oriented multi-modal interactive recommendation model (GOMMIR) that uses both verbal and non-verbal relevance feedback to effectively incorporate the users' preferences over time. Specifically, our GOMMIR model employs a multi-task learning approach to explicitly learn the multi-modal representations using a multi-modal composition network when optimising the recommendation agent. Moreover, we formulate the MMIR task using goal-oriented reinforcement learning and enhance the optimisation objective by leveraging non-verbal relevance feedback for hard negative sampling and providing extra goal-oriented rewards to effectively optimise the recommendation agent. Following previous work, we train and evaluate our GOMMIR model by using user simulators that can generate natural-language feedback about the recommendations as a surrogate for real human users. Experiments conducted on four well-known fashion datasets demonstrate that our proposed GOMMIR model yields significant improvements in comparison to the existing state-of-the-art baseline models.	Craig Macdonald, Iadh Ounis, Yaxiong Wu	Univ Glasgow, Glasgow, Lanark, Scotland
23	DREAM: Decoupled Representation via Extraction Attention Module and Supervised Contrastive Learning for Cross-Domain Sequential Recommender	Cross-Domain Sequential Recommendation (CDSR) aims to generate accurate predictions for future interactions by leveraging users' cross-domain historical interactions. One major challenge of CDSR is how to jointly learn the single- and cross-domain user preferences efficiently. To enhance the target domain's performance, most existing solutions start by learning the single-domain user preferences within each domain and then transferring the acquired knowledge from the rich domain to the target domain. However, this approach ignores the inter-sequence item relationship and also limits the opportunities for target domain knowledge to enhance the rich domain performance. Moreover, it also ignores the information within the cross-domain sequence. Despite cross-domain sequences being generally noisy and hard to learn directly, they contain valuable user behavior patterns with great potential to enhance performance. Another key challenge of CDSR is data sparsity, which also exists in other recommendation system problems. In the real world, the data distribution of the recommendation system is highly skewed to the popular products, especially on the large-scale dataset with millions of users and items. One more challenge is the class imbalance problem, inherited by the sequential recommendation problem. Generally, each sample only has one positive and thousands of negative samples. To address the above problems together, an innovative Decoupled Representation via Extraction Attention Module (DREAM) is proposed for CDSR to simultaneously learn single- and cross-domain user preference via decoupled representations. A novel Supervised Contrastive Learning framework is introduced to model the inter-sequence relationship as well as address the data sparsity via data augmentations. DREAM also leverages Focal Loss to put more weight on misclassified samples to address the class-imbalance problem, with another uplift on the overall model performance. Extensive experiments were conducted on two cross-domain recommendation datasets, demonstrating that DREAM outperforms various SOTA cross-domain recommendation algorithms, achieving up to a 75% uplift in Movie-Book Scenarios.	Lina Yao, Xiaoxin Ye, Yun Li	CSIROs Data61, Sydney, NSW, Australia; UNSW, Sch Comp Sci & Engn, Sydney, NSW, Australia
24	A Multi-view Graph Contrastive Learning Framework for Cross-Domain Sequential Recommendation	Sequential recommendation methods play an irreplaceable role in recommender systems which can capture the users' dynamic preferences from the behavior sequences. Despite their success, these works usually suffer from the sparsity problem that commonly exists in real applications. Cross-domain sequential recommendation aims to alleviate this problem by introducing relatively richer source-domain data. However, most existing methods capture the users' preferences independently of each domain, which may neglect the item transition patterns across sequences from different domains, i.e., a user's interaction in one domain may influence his/her next interaction in other domains. Moreover, the data sparsity problem still exists since some items in the target and source domains are interacted with only a limited number of times. To address these issues, in this paper we propose a generic framework named multi-view graph contrastive learning (MGCL). Specifically, we adopt the contrastive mechanism in an intra-domain item representation view and an inter-domain user preference view. The former is to jointly learn the dynamic sequential information in the user sequence graph and the static collaborative information in the cross-domain global graph, while the latter is to capture the complementary information of the user's preferences from different domains. Extensive empirical studies on three real-world datasets demonstrate that our MGCL significantly outperforms the state-of-the-art methods.	Weike Pan, Zhong Ming, Zitao Xu	Shenzhen Univ, Coll Comp Sci & Software Engn, Shenzhen, Peoples R China
25	STAN: Stage-Adaptive Network for Multi-Task Recommendation by Learning User Lifecycle-Based Representation	Recommendation systems play a vital role in many online platforms, with their primary objective being to satisfy and retain users. As directly optimizing user retention is challenging, multiple evaluation metrics are often employed. Current methods often use multi-task learning to optimize these measures. However, they usually miss that users have personal preferences for different tasks, which can change over time. Identifying and tracking the evolution of user preferences can lead to better user retention. To address this issue, we introduce the concept of "user lifecycle," consisting of multiple stages characterized by users' varying preferences for different tasks. We propose a novel Stage-Adaptive Network (STAN) framework for modeling user lifecycle stages. STAN first identifies latent user lifecycle stages based on learned user preferences and then employs the stage representation to enhance multi-task learning performance. Our experimental results using both public and industrial datasets demonstrate that the proposed model significantly improves multi-task prediction performance compared to state-of-the-art methods, highlighting the importance of considering user lifecycle stages in recommendation systems. Online A/B testing reveals that our model outperforms the existing model, achieving a significant improvement of 3.05% in staytime per user and 0.88% in CVR. We have deployed STAN on all Shopee live-streaming recommendation services.	Suhang Wang, Wanda Li, Wenhao Zheng, Xuanji Xiao	Penn State Univ, University Pk, PA 16802 USA; Shopee Co, Beijing, Peoples R China; Tsinghua Univ, Beijing, Peoples R China
26	Bootstrapped Personalized Popularity for Cold Start Recommender Systems	Recommender Systems are severely hampered by the well-known Cold Start problem, identified by the lack of information on new items and users. This has led to research efforts focused on data imputation and augmentation models as predominantly data preprocessing strategies, yet their improvement of cold-user performance is largely indirect and often comes at the price of a reduction in accuracy for warmer users. To address these limitations, we propose Bootstrapped Personalized Popularity (B2P), a novel framework that improves performance for cold users (directly) and cold items (implicitly) via popularity models personalized with item metadata. B2P is scalable to very large datasets and directly addresses the Cold Start problem, so it can complement existing Cold Start strategies. Experiments on a real-world dataset from the BBC iPlayer and a public dataset demonstrate that B2P (1) significantly improves cold-user performance, (2) boosts warm-user performance for bootstrapped models by lowering their training sparsity, and (3) improves total recommendation accuracy at a competitive diversity level relative to existing high-performing Collaborative Filtering models. We demonstrate that B2P is a powerful and scalable framework for strongly cold datasets.	Benjamin Richard Clark, Duncan Martin Walker, Edoardo Gruppi, Iason Chaimalas, Laura Toni	British Broadcasting Corp, London, England; UCL, London, England
27	Co-occurrence Embedding Enhancement for Long-tail Problem in Multi-Interest Recommendation	Multi-interest recommendation methods extract multiple interest vectors to represent the user comprehensively. Despite their success in the matching stage, previous works overlook the long-tail problem. This results in the model excelling at suggesting head items, while the performance for tail items, which make up more than 70% of all items, remains suboptimal. Hence, enhancing the tail item recommendation capability holds great potential for improving the performance of the multi-interest model. Through experimental analysis, we reveal that the insufficient context for embedding learning is the reason behind the under-performance of tail items. Meanwhile, we face two challenges in addressing this issue: the absence of supplementary item features and the need to maintain head item performance. To tackle these challenges, we propose a CoLT module (Co-occurrence embedding enhancement for Long-Tail problem) that replaces the embedding layer of existing multi-interest frameworks. By linking co-occurring items to establish "assistance relationships", CoLT aggregates information from relevant head items into tail item embeddings and enables joint gradient updates. Experiments on three datasets show our method outperforms SOTA models by 21.86% Recall@50 and improves the Recall@50 of tail items by 14.62% on average.	Minghui Zou, Xiaowang Zhang, Yaokun Liu, Zhiyong Feng	Tianjin Univ, Tianjin, Peoples R China
28	On the Consistency of Average Embeddings for Item Recommendation	A prevalent practice in recommender systems consists of averaging item embeddings to represent users or higher-level concepts in the same embedding space. This paper investigates the relevance of such a practice. For this purpose, we propose an expected precision score, designed to measure the consistency of an average embedding relative to the items used for its construction. We subsequently analyze the mathematical expression of this score in a theoretical setting with specific assumptions, as well as its empirical behavior on real-world data from music streaming services. Our results emphasize that real-world averages are less consistent for recommendation, which paves the way for future research to better align real-world embeddings with assumptions from our theoretical setting.	Guillaume Salha-Galvan, Romain Hennequin, Thomas Bouabça, Tristan Cazenave, Walid Bendada	Deezer, Paris, France; Univ Paris 09, Deezer, Paris, Dauphine, France; Univ Paris 09, LAMSADE, PSL, Paris, Dauphine, France
29	Progressive Horizon Learning: Adaptive Long Term Optimization for Personalized Recommendation	As E-commerce and subscription services scale, personalized recommender systems are often needed to further drive long-term business growth in acquisition, engagement, and retention of customers. However, long-term metrics associated with these goals can require several months to mature. Additionally, deep personalization also demands a large volume of training data that takes a long time to collect. These factors incur substantial lead time for training a model to optimize a long-term metric. Before such a model is deployed, a recommender system has to rely on a simple policy (e.g. random) to collect customer feedback data for training, inflicting high opportunity cost and delaying optimization of the target metric. Besides, as customer preferences can shift over time, a large temporal gap between inputs and outcome poses a high risk of data staleness and suboptimal learning. Existing approaches involve various compromises. For instance, contextual bandits often optimize short-term surrogate metrics with simple model structure, which can be suboptimal in the long run, while Reinforcement Learning approaches rely on an abundance of historical data for offline training, which essentially means long lead time before deployment. To address these problems, we propose Progressive Horizon Learning Recommender (PHLRec), a personalized model that can progressively learn metric patterns and adaptively evolve from short- to long-term optimization over time. Through simulations and real data experiments, we demonstrated that PHLRec outperforms competing methods, achieving optimality in both deployment speed and long-term metric performances.	Congrui Yi, David Zumwalt, Shreya Chakrabarti, Zijian Ni	Amazon, Seattle, WA 98109 USA
30	From Research to Production: Towards Scalable and Sustainable Neural Recommendation Models on Commodity CPU Hardware	In the last decade, large-scale deep learning has fundamentally transformed industrial recommendation systems. However, this revolutionary technology remains prohibitively expensive due to the need for costly and scarce specialized hardware, such as Graphics Processing Units (GPUs), to train and serve models. In this talk, we share our multi-year journey at ThirdAI in developing efficient neural recommendation models that can be trained and deployed on commodity CPU machines without the need for costly accelerators like GPUs. In particular, we discuss the limitations of the current GPU-based ecosystem in machine learning, why recommendation systems are amenable to the strengths of CPU devices, and present results from our efforts to translate years of academic research into a deployable system that fundamentally shifts the economics of training and operating large-scale machine learning models.	Anshumali Shrivastava, Benito Geordie, David Torres Ramos, Joshua Engels, Nicholas Meisburger, Pratik Pranav, Shubh Gupta, Siddharth Jain, Tharun Medini, Vihan Lakshman, Yashwanth Adunukota	ThirdAI Corp, Houston, TX 77027 USA
31	User-Centric Conversational Recommendation: Adapting the Need of User with Large Language Models	Conversational recommender systems (CRS) promise to provide a more natural user experience for exploring and discovering items of interest through ongoing conversation. However, effectively modeling and adapting to users' complex and changing preferences remains challenging. This research develops user-centric methods that focus on understanding and adapting to users throughout conversations to provide the most helpful recommendations. First, a graph-based Conversational Path Reasoning (CPR) framework is proposed that represents dialogs as interactive reasoning over a knowledge graph to capture nuanced user interests and explain recommendations. To further enhance relationship modeling, graph neural networks are incorporated for improved representation learning. Next, to address uncertainty in user needs, the Vague Preference Multi-round Conversational Recommendation (VPMCR) scenario and matching Adaptive Vague Preference Policy Learning (AVPPL) solution are presented using reinforcement learning to tailor recommendations to evolving preferences. Finally, opportunities to leverage large language models are discussed to further advance user experiences via advanced user modeling, policy learning, and response generation. Overall, this research focuses on designing conversational recommender systems that continuously understand and adapt to users' ambiguous, complex and changing needs during natural conversations.	Gangyi Zhang	Univ Sci & Technol China, Hefei, Peoples R China
32	Integrating Offline Reinforcement Learning with Transformers for Sequential Recommendation	We consider the problem of sequential recommendation, where the current recommendation is made based on past interactions. This recommendation task requires efficient processing of the sequential data and aims to provide recommendations that maximize the long-term reward. To this end, we train a farsighted recommender by using an offline RL algorithm with the policy network in our model architecture that has been initialized from a pre-trained transformer model. The pre-trained model leverages the superb ability of the transformer to process sequential information. Compared to prior works that rely on online interaction via simulation, we focus on implementing a fully offline RL framework that is able to converge in a fast and stable way. Through extensive experiments on public datasets, we show that our method is robust across various recommendation regimes, including e-commerce and movie suggestions. Compared to state-of-the-art supervised learning algorithms, our algorithm yields recommendations of higher quality, demonstrating the clear advantage of combining RL and transformers.	Liwen Ouyang, Quan Liu, Xumei Xi, Yang Wu, Yuke Zhao
33	Fast and Examination-agnostic Reciprocal Recommendation in Matching Markets	In matching markets such as job posting and online dating platforms, the recommender system plays a critical role in the success of the platform. Unlike standard recommender systems that suggest items to users, reciprocal recommender systems (RRSs) that suggest other users must take into account the mutual interests of users. In addition, ensuring that recommendation opportunities do not disproportionately favor popular users is essential for the total number of matches and for fairness among users. Existing recommendation methods in matching markets, however, face computational challenges on real-world scale platforms and depend on specific examination functions in the position-based model (PBM). In this paper, we introduce the reciprocal recommendation method based on the matching with transferable utility (TU matching) model in the context of ranking recommendations in matching markets, and propose a faster and examination-agnostic algorithm. Furthermore, we evaluate our approach on experiments with synthetic data and real-world data from an online dating platform in Japan. Our method performs better than or as well as existing methods in terms of the total number of matches and works well even in relatively large datasets for which one existing method does not work.	Naoto Ohsaka, Riku Togashi, Yoji Tomita, Yuriko Hashizume	CyberAgent Inc, Tokyo, Japan
34	✨ Going Beyond Local: Global Graph-Enhanced Personalized News Recommendations	Precisely recommending candidate news articles to users has always been a core challenge for personalized news recommendation systems. Most recent works primarily focus on using advanced natural language processing techniques to extract semantic information from rich textual data, employing content-based methods derived from local historical news. However, this approach lacks a global perspective, failing to account for users’ hidden motivations and behaviors beyond semantic information. To address this challenge, we propose a novel model called GLORY (Global-LOcal news Recommendation sYstem), which combines global representations learned from other users with local representations to enhance personalized recommendation systems. We accomplish this by constructing a Global-aware Historical News Encoder, which includes a global news graph and employs gated graph neural networks to enrich news representations, thereby fusing historical news representations by a historical news aggregator. Similarly, we extend this approach to a Global Candidate News Encoder, utilizing a global entity graph and a candidate news aggregator to enhance candidate news representation. Evaluation results on two public news datasets demonstrate that our method outperforms existing approaches. Furthermore, our model offers more diverse recommendations.	Boming Yang, Dairui Liu, Irene Li, Ruihai Dong, Toyotaro Suzumura	Univ Coll Dublin, Dublin, Ireland; Univ Tokyo, Tokyo, Japan
35	Distribution-based Learnable Filters with Side Information for Sequential Recommendation	Sequential Recommendation aims to predict the next item by mining the dynamic preference from the user's previous interactions. However, most methods represent each item as a single fixed vector, which is incapable of capturing the uncertainty of item-item transitions that result from time-dependent and multifarious interests of users. Besides, they struggle to effectively exploit side information that helps to better express user preferences. Finally, the noise in the user's access sequence, which is due to accidental clicks, can interfere with the next item prediction and lead to lower recommendation performance. To deal with these issues, we propose DLFS-Rec, a simple and novel model that combines Distribution-based Learnable Filters with Side information for sequential Recommendation. Specifically, items and their side information are represented by stochastic Gaussian distributions, each described by mean and covariance embeddings, and then the corresponding embeddings are fused to generate a final representation for each item. To attenuate noise, stacked learnable filter layers are applied to smooth the fused embeddings. Extensive experiments on four public real-world datasets demonstrate the superiority of the proposed model over state-of-the-art baselines, especially on cold start users and items. Codes are available at https://github.com/zxiang30/DLFS-Rec.	Haibo Liu, Jinjia Peng, Liang Wang, Shi Feng, Zhixiang Deng	HeBei Univ, Sch Cyber Secur & Comp, Baoding, Peoples R China; Northeastern Univ, Sch Comp Sci & Engn, Shenyang, Peoples R China
36	Reciprocal Sequential Recommendation	Reciprocal recommender system (RRS), considering a two-way matching between two parties, has been widely applied in online platforms like online dating and recruitment. Existing RRS models mainly capture static user preferences, which have neglected the evolving user tastes and the dynamic matching relation between the two parties. Although dynamic user modeling has been well-studied in sequential recommender systems, existing solutions are developed in a user-oriented manner. Therefore, it is non-trivial to adapt sequential recommendation algorithms to reciprocal recommendation. In this paper, we formulate RRS as a distinctive sequence matching task, and further propose a new approach ReSeq for RRS, which is short for Reciprocal Sequential recommendation. To capture dual-perspective matching, we propose to learn fine-grained sequence similarities by co-attention mechanism across different time steps. Further, to improve the inference efficiency, we introduce the self-distillation technique to distill knowledge from the fine-grained matching module into the more efficient student module. In the deployment stage, only the efficient student module is used, greatly speeding up the similarity computation. Extensive experiments on five real-world datasets from two scenarios demonstrate the effectiveness and efficiency of the proposed method. Our code is available at https://github.com/RUCAIBox/ReSeq/.	Bowen Zheng, Hengshu Zhu, Wayne Xin Zhao, Yang Song, Yupeng Hou	BOSS Zhipin, Beijing, Peoples R China; Renmin Univ China, Gaoling Sch Artificial Intelligence, Beijing, Peoples R China
37	STRec: Sparse Transformer for Sequential Recommendations	With the rapid evolution of transformer architectures, researchers are exploring their application in sequential recommender systems (SRSs) and presenting promising performance on SRS tasks compared with former SRS models. However, most existing transformer-based SRS frameworks retain the vanilla attention mechanism, which calculates the attention scores between all item-item pairs. With this setting, redundant item interactions can harm the model performance and consume much computation time and memory. In this paper, we identify the sparse attention phenomenon in transformer-based SRS models and propose Sparse Transformer for sequential Recommendation tasks (STRec) to achieve efficient computation and improved performance. Specifically, we replace self-attention with cross-attention, making the model concentrate on the most relevant item interactions. To determine these necessary interactions, we design a novel sampling strategy to detect relevant items based on temporal information. Extensive experimental results validate the effectiveness of STRec, which achieves state-of-the-art accuracy while reducing inference time by 54% and memory cost by 70%. We also provide massive extended experiments to further investigate the property of our framework.	Chengxi Li, Lixin Zou, Qidong Liu, Qing Li, Wanyu Wang, Wenqi Fan, Xiangyu Zhao, Yejing Wang, Yiqi Wang	City Univ Hong Kong, Hong Kong, Peoples R China; Hong Kong Polytech Univ, Hong Kong, Peoples R China; Michigan State Univ, E Lansing, MI 48824 USA; Wuhan Univ, Wuhan, Peoples R China
38	Deep Situation-Aware Interaction Network for Click-Through Rate Prediction	User behavior sequence modeling plays a significant role in Click-Through Rate (CTR) prediction on e-commerce platforms. Besides the interacted items, user behaviors contain rich interaction information, such as the behavior type, time, location, etc. However, so far, the information related to user behaviors has not yet been fully exploited. In this paper, we propose the concept of a situation and situational features for distinguishing interaction behaviors and then design a CTR model named Deep Situation-Aware Interaction Network (DSAIN). DSAIN first adopts the reparameterization trick to reduce noise in the original user behavior sequences. Then it learns the embeddings of situational features by feature embedding parameterization and tri-directional correlation fusion. Finally, it obtains the embedding of behavior sequence via heterogeneous situation aggregation. We conduct extensive offline experiments on three real-world datasets. Experimental results demonstrate the superiority of the proposed DSAIN model. More importantly, DSAIN has increased the CTR by 2.70%, the CPM by 2.62%, and the GMV by 2.16% in the online A/B test. Now, DSAIN has been deployed on the Meituan food delivery platform and serves the main traffic of the Meituan takeout app. Our source code is available at https://github.com/W-void/DSAIN.	Beihong Jin, Dong Wang, Jian Dong, Shuli Wang, Xingxing Wang, Yapeng Zhang, Yimin Lv, Yisong Yu, Yongkang Wang	Meituan, Beijing, Peoples R China; Univ Chinese Acad Sci, Chinese Acad Sci, Inst Software, Beijing, Peoples R China
39	Equivariant Contrastive Learning for Sequential Recommendation	Contrastive learning (CL) benefits the training of sequential recommendation models with informative self-supervision signals. Existing solutions apply general sequential data augmentation strategies to generate positive pairs and encourage their representations to be invariant. However, due to the inherent properties of user behavior sequences, some augmentation strategies, such as item substitution, can lead to changes in user intent. Learning indiscriminately invariant representations for all augmentation strategies might be suboptimal. Therefore, we propose Equivariant Contrastive Learning for Sequential Recommendation (ECL-SR), which endows SR models with great discriminative power, making the learned user behavior representations sensitive to invasive augmentations (e.g., item substitution) and insensitive to mild augmentations (e.g., feature-level dropout masking). In detail, we use the conditional discriminator to capture differences in behavior due to item substitution, which encourages the user behavior encoder to be equivariant to invasive augmentations. Comprehensive experiments on four benchmark datasets show that the proposed ECL-SR framework achieves competitive performance compared to state-of-the-art SR models. The source code is available at https://github.com/Tokkiu/ECL.	Jaeboum Kim, Jingqi Gao, Peilin Zhou, Qichen Ye, Shoujin Wang, Sunghun Kim, Yining Hua, Yueqi Xie	Harvard Univ, Cambridge, MA USA; Hong Kong Univ Sci & Technol Guangzhou, Guangzhou, Peoples R China; Hong Kong Univ Sci & Technol, Hong Kong, Peoples R China; Peking Univ, Beijing, Peoples R China; Univ Technol Sydney, Sydney, NSW, Australia; Upstage, Hong Kong, Peoples R China
40	Task Aware Feature Extraction Framework for Sequential Dependence Multi-Task Learning	In online recommendation, financial service, etc., the most common application of multi-task learning (MTL) is the multi-step conversion estimations. A core property of the multi-step conversion is the sequential dependence among tasks. However, most existing works focus far more on the specific post-view click-through rate (CTR) and post-click conversion rate (CVR) estimations, which neglect the generalization of sequential dependence multi-task learning (SDMTL). Additionally, the performance of the SDMTL framework is also deteriorated by the interference derived from implicitly conflict information passing between adjacent tasks. In this paper, a systematic learning paradigm of the SDMTL problem is established for the first time, which can transform the SDMTL problem into a general MTL problem with constraints and be applicable to more general multi-step conversion scenarios with stronger task dependence. Also, the distribution dependence relationship between adjacent task spaces is illustrated from a theoretical point of view. On the other hand, an SDMTL architecture, named Task Aware Feature Extraction (TAFE), is developed to enable dynamic task representation learning from a sample-wise view. TAFE selectively reconstructs the implicit shared information corresponding to each sample case and performs explicit task-specific extraction under dependence constraints. Extensive experiments on offline public and real-world industrial datasets, and online A/B implementations demonstrate the effectiveness and applicability of proposed theoretical and implementation frameworks.	Bing Han, Hongwei Cheng, Linxun Cheng, Mingming Ha, Qiongxu Ma, Wenfang Lin, Xiaobo Guo, Xuewen Tao	MYbank, Ant Grp, Beijing, Peoples R China; MYbank, Ant Grp, Hangzhou, Zhejiang, Peoples R China; MYbank, Ant Grp, Shanghai, Peoples R China
41	AutoOpt: Automatic Hyperparameter Scheduling and Optimization for Deep Click-through Rate Prediction	Click-through Rate (CTR) prediction is essential for commercial recommender systems. Recently, to improve the prediction accuracy, plenty of deep learning-based CTR models have been proposed, which are sensitive to hyperparameters and difficult to optimize well. General hyperparameter optimization methods fix these hyperparameters across the entire model training and repeat them multiple times. This trial-and-error process not only leads to suboptimal performance but also requires non-trivial computation efforts. In this paper, we propose an automatic hyperparameter scheduling and optimization method for deep CTR models, AutoOpt, making the optimization process more stable and efficient. Specifically, the whole training regime is first divided into several consecutive stages, where a data-efficient model is learned to model the relation between model states and prediction performance. To optimize the stage-wise hyperparameters, AutoOpt uses the global and local scheduling modules to propose proper hyperparameters for the next stage based on the training in the current stage. Extensive experiments on three public benchmarks are conducted to validate the effectiveness of AutoOpt. Moreover, AutoOpt has been deployed onto an advertising platform and a music platform, where online A/B tests also demonstrate superior improvement. In addition, the code of our algorithm is publicly available in MindSpore.	Bo Chen, Ruiming Tang, Xing Tang, Yimin Huang, Yujun Li, Zhenguo Li	Noahs Ark Lab, Hong Kong, Peoples R China
42	Alleviating the Long-Tail Problem in Conversational Recommender Systems	Conversational recommender systems (CRS) aim to provide the recommendation service via natural language conversations. To develop an effective CRS, high-quality CRS datasets are very crucial. However, existing CRS datasets suffer from the long-tail issue, i.e., a large proportion of items are rarely (or even never) mentioned in the conversations, which are called long-tail items. As a result, the CRSs trained on these datasets tend to recommend frequent items, and the diversity of the recommended items would be largely reduced, making users easier to get bored. To address this issue, this paper presents LOT-CRS, a novel framework that focuses on simulating and utilizing a balanced CRS dataset (i.e., covering all the items evenly) for improving LOng-Tail recommendation performance of CRSs. In our approach, we design two pre-training tasks to enhance the understanding of simulated conversation for long-tail items, and adopt retrieval-augmented fine-tuning with label smoothness strategy to further improve the recommendation of long-tail items. Extensive experiments on two public CRS datasets have demonstrated the effectiveness and extensibility of our approach, especially on long-tail recommendation. Our code is publicly available at the link: https://github.com/Oran-Ac/LOT-CRS.	Fan Pan, JiRong Wen, Kun Zhou, Wayne Xin Zhao, Xiaolei Wang, Zhao Cao, Zhipeng Zhao	Beijing Inst Technol, Sch Comp Sci & Technol, Beijing, Peoples R China; Huawei, Poisson Lab, Shenzhen, Peoples R China; Renmin Univ China, Gaoling Sch Artificial Intelligence, Beijing, Peoples R China; Renmin Univ China, Sch Informat, Beijing, Peoples R China
43	Reproducibility of Multi-Objective Reinforcement Learning Recommendation: Interplay between Effectiveness and Beyond-Accuracy Perspectives	Providing effective suggestions is of predominant importance for successful Recommender Systems (RSs). Nonetheless, the need of accounting for additional multiple objectives has become prominent, from both the final users’ and the item providers’ points of view. This need has led to a new class of RSs, called Multi-Objective Recommender Systems (MORSs). These systems are designed to provide suggestions by considering multiple (conflicting) objectives simultaneously, such as diverse, novel, and fairness-aware recommendations. In this work, we reproduce a state-of-the-art study on MORSs that exploits a reinforcement learning agent to satisfy three objectives, i.e., accuracy, diversity, and novelty of recommendations. The selected study is one of the few MORSs where the source code and datasets are released to ensure the reproducibility of the proposed approach. Interestingly, we find that some challenges arise when replicating the results of the original work, due to the nature of multiple-objective problems. We also extend the evaluation of the approach to analyze the impact of improving user-centered objectives of recommendations (i.e., diversity and novelty) in terms of algorithmic bias. To this end, we take into consideration both popularity and category of the items. We discover some interesting trends in the recommendation performance according to different evaluation metrics. In addition, we see that the multi-objective reinforcement learning approach is responsible for increasing the bias disparity in the output of the recommendation algorithm for those items belonging to positively/negatively biased categories. We publicly release datasets and codes in the following GitHub repository: https://github.com/sisinflab/MORS_reproducibility.	Ludovico Boratto, Tommaso Di Noia, Vincenzo Paparella, Vito Walter Anelli	Politecn Bari, Bari, Italy; Univ Cagliari, Cagliari, Italy
44	Personalised Recommendations for the BBC iPlayer: Initial approach and current challenges	BBC iPlayer is one of the most important digital products of the BBC, offering live and on-demand television for audiences in the UK with over 10 million weekly active users. The BBC’s role as a public service broadcaster, broadcasting over traditional linear channels as well as online presents a number of challenges for a recommender system. In addition to having substantially different objectives to a commercial service, we show that the diverse content offered by the BBC including news and sport, factual, drama and live events lead to a catalogue with a diversity of consumption patterns, depending on genre. Our research shows that simple models represent strong baselines in this system. We discuss our initial attempts to improve upon these baselines, and conclude with our current challenges.	Benjamin Richard Clark, Duncan Martin Walker, Kristine Grivcova, Polina Proutskova	British Broadcasting Corp, London, England
45	Heterogeneous Knowledge Fusion: A Novel Approach for Personalized Recommendation via LLM	The analysis and mining of user heterogeneous behavior are of paramount importance in recommendation systems. However, the conventional approach of incorporating various types of heterogeneous behavior into recommendation models leads to feature sparsity and knowledge fragmentation issues. To address this challenge, we propose a novel approach for personalized recommendation via Large Language Model (LLM), by extracting and fusing heterogeneous knowledge from user heterogeneous behavior information. In addition, by combining heterogeneous knowledge and recommendation tasks, instruction tuning is performed on LLM for personalized recommendations. The experimental results demonstrate that our method can effectively integrate user heterogeneous behavior and significantly improve recommendation performance.	Bin Yin, Junjie Xie, Wei Lin, Xiang Li, Yu Qin, Zhichao Feng, Zixiang Ding	Meituan, Beijing, Peoples R China; Unaffiliated, Beijing, Peoples R China
46	MCM: A Multi-task Pre-trained Customer Model for Personalization	Personalization plays a critical role in helping customers discover the products and contents they prefer for e-commerce stores. Personalized recommendations differ in contents, target customers, and UI. However, they require a common core capability: the ability to deeply understand customers’ preferences and shopping intents. In this paper, we introduce the MCM (Multi-task pre-trained Customer Model), a large pre-trained BERT-based multi-task customer model with 10 million trainable parameters for e-commerce stores. This model aims to empower all personalization projects by providing commonly used preference scores for recommendations, customer embeddings for transfer learning, and a pre-trained model for fine-tuning. In this work, we improve the SOTA BERT4Rec framework to handle heterogeneous customer signals and multi-task training, and we introduce a new data augmentation method that is suitable for the recommendation task. Experimental results show that MCM outperforms the original BERT4Rec by 17% on NDCG@10 of next action prediction tasks. Additionally, we demonstrate that the model can be easily fine-tuned to assist a specific recommendation task. For instance, after fine-tuning MCM for an incentive based recommendation project, performance improves by 60% on the conversion prediction task and 25% on the click-through prediction task compared to a baseline tree-based GBDT model.	Jingyuan Deng, Peng Wan, Rui Luo, Tianxin Wang	Amazon LLC, Beijing, Peoples R China
47	Beyond the Sequence: Statistics-Driven Pre-training for Stabilizing Sequential Recommendation Model	The sequential recommendation task aims to predict the item that a user is interested in according to his/her historical action sequence. However, inevitable random actions, i.e., a user randomly accessing an item among multiple candidates or clicking several items in random order, cause the sequence to fail to provide stable and high-quality signals. To alleviate the issue, we propose the StatisTics-Driven Pre-training framework (STDP for short). The main idea of the work lies in using statistical information along with the pre-training paradigm to stabilize the optimization of the recommendation model. Specifically, we derive two types of statistical information: item co-occurrence across sequences and attribute frequency within the sequence. We design the following pre-training tasks: 1) The co-occurred items prediction task, which encourages the model to distribute its attention over multiple suitable targets instead of just focusing on the next item, which may be unstable. 2) We generate a paired sequence by replacing items with their co-occurred items and enforce its representation to be close to the original one, thus enhancing the model’s robustness to random noise. 3) To reduce the impact of randomness on the user’s long-term preferences, we encourage the model to capture sequence-level frequent attributes. The significant improvement over six datasets demonstrates the effectiveness and superiority of the proposal, and further analysis verifies the generalization of the STDP framework to other models.	Hongzhi Zhang, Peiguang Li, Sirui Wang, Yunsen Xian	Meituan, Beijing, Peoples R China; Tsinghua Univ, Dept Automat, Beijing, Peoples R China
48	Personalized Category Frequency prediction for Buy It Again recommendations	Buy It Again (BIA) recommendations are crucial to retailers to help improve user experience and site engagement by suggesting items that customers are likely to buy again based on their own repeat purchasing patterns. Most existing BIA studies analyze guests’ personalized behaviour at item granularity. This finer level of granularity might be appropriate for small businesses or small datasets for search purposes. However, this approach can be infeasible for big retailers which have hundreds of millions of guests and tens of millions of items. For such data sets, it is more practical to have a coarse-grained model that captures customer behaviour at the item category level. In addition, customers commonly explore variants of items within the same categories, e.g., trying different brands or flavors of yogurt. A category-based model may be more appropriate in such scenarios. We propose a recommendation system called a hierarchical PCIC model that consists of a personalized category model (PC model) and a personalized item model within categories (IC model). PC model generates a personalized list of categories that customers are likely to purchase again. IC model ranks items within categories that guests are likely to reconsume within a category. The hierarchical PCIC model captures the general consumption rate of products using survival models. Trends in consumption are captured using time series models. Features derived from these models are used in training a category-grained neural network. We compare PCIC to twelve existing baselines on four standard open datasets. PCIC improves NDCG up to 16% while improving recall by around 2%. We were able to scale and train (over 8 hours) PCIC on a large dataset of 100M guests and 3M items where repeat categories of a guest outnumber repeat items. PCIC was deployed and A/B tested on the site of a major retailer, leading to significant gains in guest engagement.	Amit Pande, Kunal Ghosh, Rankyung Park	Target Corp, Data Sci, Brooklyn Pk, MN 55445 USA
49	Hessian-aware Quantized Node Embeddings for Recommendation	Graph Neural Networks (GNNs) have achieved state-of-the-art performance in recommender systems. Nevertheless, the process of searching and ranking from a large item corpus usually requires high latency, which limits the widespread deployment of GNNs in industry-scale applications. To address this issue, many methods compress user/item representations into the binary embedding space to reduce space requirements and accelerate inference. Also, they use the Straight-through Estimator (STE) to prevent vanishing gradients during back-propagation. However, the STE often causes the gradient mismatch problem, leading to sub-optimal results. In this work, we present the Hessian-aware Quantized GNN (HQ-GNN) as an effective solution for discrete representations of users/items that enable fast retrieval. HQ-GNN is composed of two components: a GNN encoder for learning continuous node embeddings and a quantized module for compressing full-precision embeddings into low-bit ones. Consequently, HQ-GNN benefits from both lower memory requirements and faster inference speeds compared to vanilla GNNs. To address the gradient mismatch problem in STE, we further consider the quantized errors and its second-order derivatives for better stability. The experimental results on several large-scale datasets show that HQ-GNN achieves a good balance between latency and performance.	ChinChia Michael Yeh, Hao Yang, Huiyuan Chen, Kaixiong Zhou, KweiHerng Lai, Xia Hu, Yan Zheng	Rice Univ, Houston, TX USA; Visa Res, Palo Alto, CA 94404 USA
50	Scalable Approximate NonSymmetric Autoencoder for Collaborative Filtering	In the field of recommender systems, shallow autoencoders have recently gained significant attention. One of the most highly acclaimed shallow autoencoders is EASE^R, favored for its competitive recommendation accuracy and simultaneous simplicity. However, the poor scalability of EASE^R (both in time and especially in memory) severely restricts its use in production environments with vast item sets. In this paper, we propose a hyperefficient factorization technique for sparse approximate inversion of the data-Gram matrix used in EASE^R. The resulting autoencoder, SANSA, is an end-to-end sparse solution with prescribable density and almost arbitrarily low memory requirements, even for training. As such, SANSA allows us to effortlessly scale the concept of EASE^R to millions of items and beyond.	Antonín Hoskovec, Ladislav Peska, Martin Spisák, Miroslav Tuma, Radek Bartyzal	Charles Univ Prague, Fac Math & Phys, Prague, Czech Republic; GLAMI, Prague, Czech Republic
51	M3REC: A Meta-based Multi-scenario Multi-task Recommendation Framework	Users in recommender systems exhibit multi-behavior in multiple business scenarios on real-world e-commerce platforms. A crucial challenge in such systems is to make recommendations for each business scenario at the same time. On top of this, multiple predictions (e.g., Click Through Rate and Conversion Rate) need to be made simultaneously in order to improve the platform revenue. Research on making recommendations for several business scenarios belongs to the field of Multi-Scenario Recommendation (MSR), while Multi-Task Recommendation (MTR) mainly attempts to solve the possible problems in collaboratively executing different recommendation tasks. However, existing research has paid attention to either MSR or MTR, ignoring the integration of MSR and MTR, which faces the issue of conflict between scenarios and tasks. To address the above issue, we propose a Meta-based Multi-scenario Multi-task RECommendation framework (M3REC) to serve multiple tasks in multiple business scenarios with a unified model. However, integrating MSR and MTR in a proper manner is non-trivial due to: 1) Unified representation problem: users’ and items’ representations are non-i.i.d. across different scenarios and tasks, which introduces inconsistency into recommendations. 2) Synchronous optimization problem: task distributions vary across scenarios, and a unified optimization method is needed to optimize multiple tasks in multiple scenarios. Thus, to represent users and items in a unified way, we design a Meta-Item-Embedding Generator (MIEG) and a User-Preference Transformer (UPT). The MIEG module can generate initialized item embeddings using item features through meta-learning technology, and the UPT module can transfer user preferences from other scenarios. Besides, the M3REC framework uses a specifically designed backbone network together with a task-specific aggregate gate to promote all tasks, so as to optimize multiple tasks in multiple business scenarios within one model. Experiments on two public datasets have shown that M3REC outperforms the compared state-of-the-art MSR and MTR methods.	Xianneng Li, Yingyi Zhang, Zerong Lan	Dalian Univ Technol, Sch Econ & Management, Dalian, Liaoning, Peoples R China
52	Incorporating Time in Sequential Recommendation Models	Sequential models are designed to learn sequential patterns in data based on the chronological order of user interactions. However, they often ignore the timestamps of these interactions. Incorporating time is crucial because many sequential patterns are time-dependent, and the model cannot make time-aware recommendations without considering time. This article demonstrates that providing a rich representation of time can significantly improve the performance of sequential models. The existing literature treats time as a one-dimensional time-series obtained by quantizing time. In this study, we propose treating time as a multi-dimensional time-series and explore representation learning methods, including a kernel based method and an embedding-based algorithm. Experiments on multiple datasets show that the inclusion of time significantly enhances the model’s performance, and multi-dimensional methods outperform the one-dimensional method by a substantial margin.	Fei Wang, James Caverlee, Mostafa Rahmani	Amazon, Schertz, TX USA; Amazon, Seattle, WA 98109 USA; Amazon, Seattle, WA USA
53	Enhancing Transformers without Self-supervised Learning: A Loss Landscape Perspective in Sequential Recommendation	Transformer and its variants are a powerful class of architectures for sequential recommendation, owing to their ability of capturing a user's dynamic interests from their past interactions. Despite their success, Transformer-based models often require the optimization of a large number of parameters, making them difficult to train from sparse data in sequential recommendation. To address the problem of data sparsity, previous studies have utilized self-supervised learning to enhance Transformers, such as pre-training embeddings from item attributes or contrastive data augmentations. However, these approaches encounter several training issues, including initialization sensitivity, manual data augmentations, and large batch-size memory bottlenecks. In this work, we investigate Transformers from the perspective of loss geometry, aiming to enhance the models' data efficiency and generalization in sequential recommendation. We observe that Transformers (e.g., SASRec) can converge to extremely sharp local minima if not adequately regularized. Inspired by the recent Sharpness-Aware Minimization (SAM), we propose SAMRec, which significantly improves the accuracy and robustness of sequential recommendation. SAMRec performs comparably to state-of-the-art self-supervised Transformers, such as S^3Rec and CL4SRec, without the need for pre-training or strong data augmentations.	ChinChia Michael Yeh, Hao Yang, Huiyuan Chen, Minghua Xu, Vivian Lai, Yiwei Cai	Visa Res, Palo Alto, CA 94111 USA
54	Initiative transfer in conversational recommender systems	Conversational recommender systems (CRS) are increasingly designed to offer mixed-initiative dialogs in which the user and the system can take turns in starting a communicative exchange, for example, by asking questions or stating preferences. However, whether and when users make use of the mixed-initiative capabilities in a CRS and which factors influence their behavior is as yet not well understood. We report an online study investigating user interaction behavior, especially the transfer of initiative between user and system in a real-time online CRS. We assessed the impact of dialog initiative at system start as well as of several psychological user characteristics that may influence their preference for either initiative mode. To collect interaction data, we implemented a chatbot in the domain of smartphones. Two groups of participants on Prolific (total n=143) used the system which started either with a system-initiated or user-initiated dialog. In addition to interaction data, we measured several psychological factors as well as users’ subjective assessment of the system through questionnaires. We found that: 1. Most users tended to take over the initiative from the system or stay in user-initiated mode when this mode was offered initially. 2. Starting the dialog in user-initiated mode CRS led to fewer interactions needed for selecting a product than in system-initiated mode. 3. The user’s initiative transfer was mainly affected by their personal interaction preferences (especially initiative preference). 4. The initial mode of the mixed-initiative CRS did not affect the user experience, but the occurrence of initiative transfers in the dialog negatively affected the degree of user interest and excitement. The results can inform the design and potential personalization of CRS.	Jürgen Ziegler, Yuan Ma	Univ Duisburg Essen, Duisburg, Germany
55	RecQR: Using Recommendation Systems for Query Reformulation to correct unseen errors in spoken dialog systems	As spoken dialog systems like Siri, Alexa and Google Assistant become widespread, it becomes apparent that relying solely on global, one-size-fits-all models of Automatic Speech Recognition (ASR), Natural Language Understanding (NLU) and Entity Resolution (ER), is inadequate for delivering a friction-less customer experience. To address this issue, Query Reformulation (QR) has emerged as a crucial technique for personalizing these systems and reducing customer friction. However, existing QR models, trained on personal rephrases in history face a critical drawback - they are unable to reformulate unseen queries to unseen targets. To alleviate this, we present RecQR, a novel system based on collaborative filters, designed to reformulate unseen defective requests to target requests that a customer may never have requested for in the past. RecQR anticipates a customer’s future requests and rewrites them using state of the art, large-scale, collaborative filtering and query reformulation models. Based on experiments we find that it reduces errors by nearly 40% (relative) on the reformulated utterances.	Kanna Shimizu, Manik Bhandari, Mingxian Wang, Oleg Poliannikov	Amazon Alexa AI, Arlington, VA 22203 USA
56	Optimizing Podcast Discovery: Unveiling Amazon Music's Retrieval and Ranking Framework	This work presents the search and discovery architecture of Amazon Music, a highly efficient system designed to retrieve relevant music content for users. The architecture consists of three key stages: indexing, retrieval, and ranking. During the indexing stage, data is meticulously parsed and processed to create a comprehensive index that contains dense representations and essential information about each document (such as a music or podcast entity) in the collection, including its title, metadata, and relevant attributes. This indexing process enables fast and efficient data access during retrieval. The retrieval stage utilizes multi-faceted retrieval strategies, resulting in improved identification of candidate matches compared to traditional structured search methods. Subsequently, candidates are ranked based on their relevance to the customer’s query, taking into account document features and personalized factors. With a specific focus on the podcast use case, this paper highlights the deployment of the architecture and demonstrates its effectiveness in enhancing podcast search capabilities, providing tailored and engaging content experiences.	Geetha Sai Aluri, Joaquin Delgado, Paul Greyson	Amazon Mus Search, San Francisco, CA 94105 USA
57	OutRank: Speeding up AutoML-based Model Search for Large Sparse Data sets with Cardinality-aware Feature Ranking	The design of modern recommender systems relies on understanding which parts of the feature space are relevant for solving a given recommendation task. However, real-world data sets in this domain are often characterized by their large size, sparsity, and noise, making it challenging to identify meaningful signals. Feature ranking represents an efficient branch of algorithms that can help address these challenges by identifying the most informative features and facilitating the automated search for more compact and better-performing models (AutoML). We introduce OutRank, a system for versatile feature ranking and data quality-related anomaly detection. OutRank was built with categorical data in mind, utilizing a variant of mutual information that is normalized with regard to the noise produced by features of the same cardinality. We further extend the similarity measure by incorporating information on feature similarity and combined relevance. The proposed approach's feasibility is demonstrated by speeding up the state-of-the-art AutoML system on a synthetic data set with no performance loss. Furthermore, we considered a real-life click-through-rate prediction data set where it outperformed strong baselines such as random forest-based approaches. The proposed approach enables exploration of up to 300% larger feature spaces compared to AutoML-only approaches, enabling faster search for better models on off-the-shelf hardware.	Blaz Mramor, Blaz Skrlj	Outbrain, Ljubljana, Slovenia
58	Improving Group Recommendations using Personality, Dynamic Clustering and Multi-Agent MicroServices	The complexity associated with group recommendations calls for strategies to mitigate several problems, such as the group's heterogeneity and conflicting preferences, the emotional contagion phenomenon, the cold-start problem, and the group members’ needs and concerns, while providing recommendations that satisfy all members at once. In this demonstration, we show how we implemented a Multi-Agent Microservice to model the tourists in a mobile Group Recommender System for Tourism prototype and a novel dynamic clustering process to help minimize the group's heterogeneity and conflicting preferences. To help solve the cold-start problem, the preliminary tourist attractions preference and travel-related preferences & concerns are predicted using the tourists' personality, considering the tourists’ disabilities and fears/phobias. Although there is no need for data from previous interactions to build the tourists’ profile since we predict the tourists’ preferences, the tourist agents learn from each other by using association rules to find patterns in the tourists' profile and in the ratings given to Points of Interest to refine the recommendations.	André Martins, Goreti Marreiros, Patrícia Alves, Paulo Novais	Polytech Porto, Super Inst Engn Porto, GECAD LASI, Porto, Portugal; Univ Minho, ALGORITMI LASI, Braga, Portugal
59	Power Loss Function in Neural Networks for Predicting Click-Through Rate	Loss functions guide machine learning models towards concentrating on the error most important to improve upon. We introduce power loss functions for neural networks and apply them on imbalanced click-through rate datasets. Power loss functions decrease the loss for confident predictions and increase the loss for error-prone predictions. They improve both AUC and F1 and produce better calibrated results. We obtain improvements in the results on four different classifiers and on two different datasets. We obtain significant improvements in AUC that reach 0.44% for DeepFM on the Avazu dataset.	Ergun Biçici	Huawei Turkiye R&D Ctr, Istanbul, Turkiye
60	Sequential Recommendation Models: A Graph-based Perspective	Recommender systems (RS) traditionally leverage the users’ rich interaction data with the system, but ignore the sequential dependency of items. Sequential recommender systems aim to predict the next item the user will interact with (e.g., click on, purchase, or listen to) based on the preceding interactions of the user with the system. Current state-of-the-art approaches focus on transformer-based architectures and graph neural networks. Specifically, graph-based modeling of sequences has been shown to be state-of-the-art by introducing a structured, inductive bias into the recommendation learning framework. In this work, we outline our research into designing novel graph-based methods for sequential recommendation.	Andreas Peintner	Univ Innsbruck, Innsbruck, Austria
61	Retrieval-augmented Recommender System: Enhancing Recommender Systems with Large Language Models	Recommender Systems (RSs) play a pivotal role in delivering personalized recommendations across various domains, from e-commerce to content streaming platforms. Recent advancements in natural language processing have introduced Large Language Models (LLMs) that exhibit remarkable capabilities in understanding and generating human-like text. RS are renowned for their effectiveness and proficiency within clearly defined domains; nevertheless, they are limited in adaptability and incapable of providing recommendations for unexplored data. Conversely, LLMs exhibit contextual awareness and strong adaptability to unseen data. Combining these technologies creates a powerful tool for delivering contextual and relevant recommendations, even in cold scenarios characterized by high data sparsity. The proposal aims to explore the possibilities of integrating LLMs into RS, introducing a novel approach called Retrieval-augmented Recommender Systems, which combines the strengths of retrieval-based and generation-based models to enhance the ability of RSs to provide relevant suggestions.	Dario Di Palma	Politecn Bari, Bari, Italy
62	Leveraging Large Language Models for Sequential Recommendation	Sequential recommendation problems have received increasing attention in research during the past few years, leading to the inception of a large variety of algorithmic approaches. In this work, we explore how large language models (LLMs), which are nowadays introducing disruptive effects in many AI-based applications, can be used to build or improve sequential recommendation approaches. Specifically, we devise and evaluate three approaches to leverage the power of LLMs in different ways. Our results from experiments on two datasets show that initializing the state-of-the-art sequential recommendation model BERT4Rec with embeddings obtained from an LLM improves NDCG by 15-20% compared to the vanilla BERT4Rec model. Furthermore, we find that a simple approach that leverages LLM embeddings for producing recommendations can provide competitive performance by highlighting semantically related items. We publicly share the code and data of our experiments to ensure reproducibility.	Asterios Katsifodimos, Dietmar Jannach, Jesse Harte, Marios Fragkoulis, Panos Louridas, Wouter Zorgdrager	Athens Univ Econ & Business, Athens, Greece; Delft Univ Technol, Delft, Netherlands; Delivery Hero Res, Berlin, Germany; Univ Klagenfurt, Klagenfurt, Austria
63	Uncovering User Interest from Biased and Noised Watch Time in Video Recommendation	In video recommendation, watch time is commonly adopted as an indicator of user interest. However, watch time is not only influenced by the matching of users' interests but also by other factors, such as duration bias and noisy watching. Duration bias refers to the tendency for users to spend more time on videos with longer durations, regardless of their actual interest level. Noisy watching, on the other hand, describes users taking time to determine whether they like a video or not, which can result in users spending time watching videos they do not like. Consequently, the existence of duration bias and noisy watching makes watch time an inadequate label for indicating user interest. Furthermore, current methods primarily address duration bias and ignore the impact of noisy watching, which may limit their effectiveness in uncovering user interest from watch time. In this study, we first analyze the generation mechanism of users' watch time from a unified causal viewpoint. Specifically, we consider the watch time as a mixture of the user's actual interest level, the duration-biased watch time, and the noisy watch time. To mitigate both the duration bias and noisy watching, we propose Debiased and Denoised watch time Correction (D^2Co), which can be divided into two steps: First, we employ a duration-wise Gaussian Mixture Model plus frequency-weighted moving average for estimating the bias and noise terms; then we utilize a sensitivity-controlled correction function to separate the user interest from the watch time, which is robust to the estimation error of bias and noise terms. The experiments on two public video recommendation datasets and online A/B testing indicate the effectiveness of the proposed method.	Guohao Cai, Haiyuan Zhao, JiRong Wen, Jun Xu, Lei Zhang, Zhenhua Dong	Huawei, Noahs Ark Lab, Shenzhen, Peoples R China; Renmin Univ, Gaoling Sch Artificial Intelligence, Beijing, Peoples R China; Univ China, Beijing, Peoples R China
64	Nonlinear Bandits Exploration for Recommendations	The paradigm of framing recommendations as (sequential) decision-making processes has gained significant interest. To achieve long-term user satisfaction, these interactive systems need to strike a balance between exploitation (recommending high-reward items) and exploration (exploring uncertain regions for potentially better items). Classical bandit algorithms like Upper-Confidence-Bound and Thompson Sampling, and their contextual extensions with linear payoffs, have exhibited strong theoretical guarantees and empirical success in managing the exploration-exploitation trade-off. Building efficient exploration-based systems for real-world, large-scale industrial recommender systems powered by deep neural networks remains understudied. In addition, these systems are often multi-stage, multi-objective and response-time sensitive. In this talk, we share our experience in addressing these challenges in building exploration-based industrial recommender systems. Specifically, we adopt the Neural Linear Bandit algorithm, which effectively combines the representation power of deep neural networks with the simplicity of linear bandits to incorporate exploration in DNN-based recommender systems. We introduce exploration capability to both the nomination and ranking stage of the industrial recommender system. In the context of the ranking stage, we delve into the extension of this algorithm to accommodate the multi-task setup, enabling exploration in systems with multiple objectives. Moving on to the nomination stage, we will address the development of efficient bandit algorithms tailored to factorized bi-linear models. These algorithms play a crucial role in facilitating maximum inner product search, which is commonly employed in large-scale retrieval systems. We validate our algorithms and present findings from real-world live experiments.	Minmin Chen, Yi Su	Google, Mountain View, CA 94043 USA
65	SPARE: Shortest Path Global Item Relations for Efficient Session-based Recommendation	Session-based recommendation aims to predict the next item based on a set of anonymous sessions. Capturing user intent from a short interaction sequence imposes a variety of challenges since no user profiles are available and interaction data is naturally sparse. Recent approaches relying on graph neural networks (GNNs) for session-based recommendation use global item relations to explore collaborative information from different sessions. These methods capture the topological structure of the graph and rely on multi-hop information aggregation in GNNs to exchange information along edges. Consequently, graph-based models suffer from noisy item relations in the training data and introduce high complexity for large item catalogs. We propose to explicitly model the multi-hop information aggregation mechanism over multiple layers via shortest-path edges based on knowledge from the sequential recommendation domain. Our approach does not require multiple layers to exchange information and ignores unreliable item-item relations. Furthermore, to address inherent data sparsity, we are the first to apply supervised contrastive learning by mining data-driven positive and hard negative item samples from the training data. Extensive experiments on three different datasets show that the proposed approach outperforms almost all of the state-of-the-art methods.	Amir Reza Mohammadi, Andreas Peintner, Eva Zangerle	Univ Innsbruck, Innsbruck, Austria
66	When Fairness meets Bias: a Debiased Framework for Fairness aware Top-N Recommendation	Fairness in the recommendation domain has recently attracted increasing attention due to more and more concerns about the algorithm discrimination and ethics. While recent years have witnessed many promising fairness aware recommender models, an important problem has been largely ignored, that is, the fairness can be biased due to the user personalized selection tendencies or the non-uniform item exposure probabilities. To study this problem, in this paper, we formally define a novel task named as unbiased fairness aware Top-N recommendation. For solving this task, we firstly define an ideal loss function based on all the user-item pairs. Considering that, in real-world datasets, only a small number of user-item interactions can be observed, we then approximate the above ideal loss with a more tractable objective based on the inverse propensity score (IPS). Since the recommendation datasets can be noisy and quite sparse, which brings difficulties for accurately estimating the IPS, we propose to optimize the objective in an IPS range instead of a specific point, which improves the model fault tolerance capability. In order to make our model more applicable to the commonly studied Top-N recommendation, we soften the ranking metrics such as Precision, Hit-Ratio, and NDCG to derive a fully differentiable framework. We conduct extensive experiments to demonstrate the effectiveness of our model based on four real-world datasets.	Jiakai Tang, Jingsen Zhang, Shiqi Shen, Xu Chen, Zhi Gong, Zhipeng Wang	Renmin Univ China, Gaoling Sch Artificial Intelligence, Beijing, Peoples R China; Tencent, Wechat, Beijing, Peoples R China
67	A Probabilistic Position Bias Model for Short-Video Recommendation Feeds	Modern web-based platforms often show ranked lists of recommendations to users, in an attempt to maximise user satisfaction or business metrics. Typically, the goal of such systems boils down to maximising the exposure probability -conversely, minimising the rank- for items that are deemed "reward-maximising" according to some metric of interest. This general framing comprises music or movie streaming applications, as well as e-commerce, restaurant or job recommendations, and even web search. Position bias or user models can be used to estimate exposure probabilities for each use-case, specifically tailored to how users interact with the presented rankings. A unifying factor in these diverse problem settings is that typically only one or several items will be engaged with (clicked, streamed, purchased, et cetera) before a user leaves the ranked list. Short-video feeds on social media platforms diverge from this general framing in several ways, most notably that users do not tend to leave the feed after, for example, liking a post. Indeed, seemingly infinite feeds invite users to scroll further down the ranked list. For this reason, existing position bias or user models tend to fall short in such settings, as they do not accurately capture users' interaction modalities. In this work, we propose a novel and probabilistically sound personalised position bias model for feed recommendations. We focus on a 1st-level feed in a hierarchical structure, where users may enter a 2nd-level feed via any given 1st-level item. We posit that users come to the platform with a given scrolling budget that is drawn according to a discrete power-law distribution, and show how the survival function of said distribution can be used to obtain closed-form estimates for personalised exposure probabilities. Empirical insights gained through data from a large-scale social media platform show how our probabilistic position bias model more accurately captures empirical exposure than existing models, and paves the way for improved unbiased evaluation and learning-to-rank.	Olivier Jeunen	ShareChat, Edinburgh, Scotland
68	Collaborative filtering algorithms are prone to mainstream-taste bias	Collaborative filtering has been a dominant approach in the recommender systems community since the early 1990s. Collaborative filtering (and other) algorithms, however, have been predominantly evaluated by aggregating results across users or user groups. These performance averages hide large disparities: an algorithm may perform very well for some users (or groups) and poorly for others. We show that performance variation is large and systematic. In experiments on three large-scale datasets and using an array of collaborative filtering algorithms, we demonstrate large performance disparities across algorithms, datasets and metrics for different users. We then show that two key features that characterize users, their mean taste similarity and dispersion in taste similarity with other users, can systematically explain performance variation better than previously identified features. We use these two features to visualize algorithm performance for different users and we point out that this mapping can capture different categories of users that have been proposed before. Our results demonstrate an extensive mainstream-taste bias in collaborative filtering algorithms, which implies a fundamental fairness limitation that needs to be mitigated.	Pantelis Pipergias Analytis, Philipp Hager	Univ Amsterdam, Amsterdam, Netherlands; Univ Southern Denmark, Odense, Denmark
69	Providing Previously Unseen Users Fair Recommendations Using Variational Autoencoders	An emerging definition of fairness in machine learning requires that models are oblivious to demographic user information, e.g., a user's gender or age should not influence the model. Personalized recommender systems are particularly prone to violating this definition through their explicit user focus and user modelling. Explicit user modelling is also an aspect that makes many recommender systems incapable of providing hitherto unseen users with recommendations. We propose novel approaches for mitigating discrimination in Variational Autoencoder-based recommender systems by limiting the encoding of demographic information. The approaches are capable of, and evaluated on, providing users that are not represented in the training data with fair recommendations.	Benjamin Kille, Bjørnar Vassøy, Helge Langseth	Norwegian Univ Sci & Technol, Trondheim, Trondelag, Norway
70	Large Language Models are Competitive Near Cold-start Recommenders for Language- and Item-based Preferences	Traditional recommender systems leverage users' item preference history to recommend novel content that users may like. However, modern dialog interfaces that allow users to express language-based preferences offer a fundamentally different modality for preference input. Inspired by recent successes of prompting paradigms for large language models (LLMs), we study their use for making recommendations from both item-based and language-based preferences in comparison to state-of-the-art item-based collaborative filtering (CF) methods. To support this investigation, we collect a new dataset consisting of both item-based and language-based preferences elicited from users along with their ratings on a variety of (biased) recommended items and (unbiased) random items. Among numerous experimental results, we find that LLMs provide competitive recommendation performance for pure language-based preferences (no item preferences) in the near cold-start case in comparison to item-based CF methods, despite having no supervised training for this specific task (zero-shot) or only a few labels (few-shot). This is particularly promising as language-based preference representations are more explainable and scrutable than item-based or vector-based representations.	Ben Wedin, Filip Radlinski, Krisztian Balog, Lucas Dixon, Scott Sanner	Google, Cambridge, MA USA; Google, London, England; Google, Paris, France; Google, Stavanger, Norway; Univ Toronto, Toronto, ON, Canada
71	Towards Companion Recommenders Assisting Users' Long-Term Journeys	RecSys '23: Proceedings of the 17th ACM Conference on Recommender Systems, September 2023, pages 1039–1041. https://doi.org/10.1145/3604915.3610241. Published 14 September 2023.	Konstantina Christakopoulou, Minmin Chen	Google DeepMind, Mountain View, CA 94043 USA
72	How Users Ride the Carousel: Exploring the Design of Multi-List Recommender Interfaces From a User Perspective	Multi-list interfaces are widely used in recommender systems, especially in industry, showing collections of recommendations, one below the other, with items that have certain commonalities. The composition and order of these “carousels” are usually optimized by simulating user interaction based on probabilistic models learned from item click data. Research that actually involves users is rare, with only few studies investigating general user experience in comparison to conventional recommendation lists. Hence, it is largely unknown how specific design aspects such as carousel type and length influence the individual perception and usage of carousel-based interfaces. This paper seeks to fill this gap through an exploratory user study. The results confirm previous assumptions about user behavior and provide first insights into the differences in decision making in the presence of multiple recommendation carousels.	Benedikt Loepp, Jürgen Ziegler	Univ Duisburg Essen, Duisburg, Germany
73	On the Consistency, Discriminative Power and Robustness of Sampled Metrics in Offline Top-N Recommender System Evaluation	Negative item sampling in offline top-n recommendation evaluation has become increasingly wide-spread, but remains controversial. While several studies have warned against using sampled evaluation metrics on the basis of being a poor approximation of the full ranking (i.e. using all negative items), others have highlighted their improved discriminative power and potential to make evaluation more robust. Unfortunately, empirical studies on negative item sampling are based on relatively few methods (between 3-12) and, therefore, lack the statistical power to assess the impact of negative item sampling in practice. In this article, we present preliminary findings from a comprehensive benchmarking study of negative item sampling based on 52 recommendation algorithms and 3 benchmark data sets. We show how the number of sampled negative items and different sampling strategies affect the consistency and discriminative power of sampled evaluation metrics. Furthermore, we investigate the impact of sparsity bias and popularity bias on the robustness of these metrics. In brief, we show that the optimal parameterizations for negative item sampling are dependent on data set characteristics and the goals of the investigator, suggesting a need for greater transparency in related experimental design decisions.	Alan Medlar, Dorota Glowacka, Yang Liu	Univ Helsinki, Helsinki, Finland
74	User Behavior Modeling with Deep Learning for Recommendation: Recent Advances	User Behavior Modeling (UBM) plays a critical role in user interest learning, and has been extensively used in recommender systems. The exploration of key interactive patterns between users and items has yielded significant improvements and great commercial success across a variety of recommendation tasks. This tutorial aims to offer an in-depth exploration of this evolving research topic. We start by reviewing the research background of UBM, paving the way to a clearer understanding of the opportunities and challenges. Then, we present a systematic categorization of existing UBM research works, which can be categorized into four different directions including Conventional UBM, Long-Sequence UBM, Multi-Type UBM, and UBM with Side Information. To provide an expansive understanding, we delve into each category, discussing representative models while highlighting their respective strengths and weaknesses. Furthermore, we elucidate on the industrial applications of UBM methods, aiming to provide insights into the practical value of existing UBM solutions. Finally, we identify some open challenges and future prospects in UBM. This comprehensive tutorial serves to provide a solid foundation for anyone looking to understand and implement UBM in their research or business.	Hao Wang, Ruiming Tang, Wei Guo, Weiwen Liu, Yong Liu	Huawei Noahs Ark Lab, Hong Kong, Peoples R China; Huawei Noahs Ark Lab, Shenzhen, Peoples R China; Huawei Noahs Ark Lab, Singapore, Singapore; Univ Sci & Technol China, Hefei, Peoples R China
75	HUMMUS: A Linked, Healthiness-Aware, User-centered and Argument-Enabling Recipe Data Set for Recommendation	The overweight and obesity rate has been increasing for decades worldwide. Healthy nutrition is, besides education and physical activity, one of the various keys to tackle this issue. In an effort to increase the availability of digital, healthy recommendations, the scientific area of food recommendation extends its focus from the accuracy of the recommendations to beyond-accuracy goals like transparency and healthiness. To address this issue a data basis is required, which in the ideal case encompasses user-item interactions like ratings and reviews, food-related information such as recipe details, nutritional data, and in the best case additional data which describes the food items and their relations semantically. Though several recipe recommendation data sets exist, to the best of our knowledge, a holistic, large-scale, healthiness-aware and connected data set has not been made available yet. The lack of such data could partially explain the poor popularity of the topic of healthy food recommendation when compared to the domain of movie recommendation. In this paper, we show that taking into account only user-item interactions is not sufficient for a recommendation. To close this gap, we propose a connected data set called HUMMUS (Health-aware User-centered recoMMendation and argUment-enabling data Set) collected from Food.com containing multiple features including rich nutrient information, text reviews, and ratings, enriched by the authors with extra features such as Nutri-scores and connections to semantic data like the FoodKG and the FoodOn ontology. We hope that these data will contribute to the healthy food recommendation domain.	Armin Gerl, Diana Nurbakova, Felix Bölz, Harald Kosch, Lionel Brunie, Sylvie Calabretto	Univ Lyon, INSA Lyon, CNRS, UCBL,LIRIS,UMR5205, Villeurbanne, France; Univ Passau, Passau, Germany
76	Data-free Knowledge Distillation for Reusing Recommendation Models	A common practice to keep the freshness of an offline Recommender System (RS) is to train models that fit the user’s most recent behaviour while directly replacing the outdated historical model. However, substantial feature engineering and computing resources are used to train these historical models, but they are underutilized in the downstream RS model training. In this paper, to turn these historical models into treasures, we introduce a model-inversion data synthesis framework, which can recover training data information from the historical model and use it for knowledge transfer. This framework synthesizes a new form of data from the historical model. Specifically, we "invert" an off-the-shelf pretrained model to synthesize binary class user-item pairs beginning from random noise without requiring any additional information from the training dataset. To synthesize informative data from a pretrained model, we propose a new continuous data type rather than the original one- or multi-hot vectors. An additional statistical regularization is added to further improve the quality of the synthetic data inverted from the deep model with batch normalization. The experimental results show that our framework can generalize across different types of models. We can efficiently train different types of classical Click-Through-Rate (CTR) prediction models from scratch with significantly less inverted synthetic data (2 orders of magnitude). Moreover, our framework can also work well in knowledge transfer scenarios such as model retraining and data-free knowledge distillation.	Cheng Wang, Jiacheng Sun, Jieming Zhu, Rui Zhang, Ruixuan Li, Zhenguo Li, Zhenhua Dong	Huawei Noahs Ark Lab, Hong Kong, Peoples R China; Huawei Noahs Ark Lab, Shenzhen, Peoples R China; Huazhong Univ Sci & Technol, Wuhan, Peoples R China
77	Contextual Multi-Armed Bandit for Email Layout Recommendation	We present the use of a contextual multi-armed bandit approach to improve the personalization of marketing emails sent to Wayfair’s customers. Emails are a critical outreach tool as they economically unlock a significant amount of revenue. We describe how we formulated our problem of selecting the optimal personalized email layout to use as a contextual multi-armed bandit problem. We also explain how we approximated a solution with an Epsilon-greedy strategy. We detail the thorough evaluations we ran, including offline experiments, an off-policy evaluation, and an online A/B test. Our results demonstrate that our approach is able to select personalized email layouts that lead to significant gains in topline business metrics including engagement and conversion rates.	Akash Mehta, Benjamin Schroeder, Emilian Vankov, Linas Baltrunas, Matthew Herman, Preston Donovan, Yan Chen	Netflix, Los Gatos, CA USA; Wayfair, Boston, MA 02116 USA
78	Domain Disentanglement with Interpolative Data Augmentation for Dual-Target Cross-Domain Recommendation	The conventional single-target Cross-Domain Recommendation (CDR) aims to improve the recommendation performance on a sparser target domain by transferring the knowledge from a source domain that contains relatively richer information. By contrast, in recent years, dual-target CDR has been proposed to improve the recommendation performance on both domains simultaneously. However, to this end, there are two challenges in dual-target CDR: (1) how to generate both relevant and diverse augmented user representations, and (2) how to effectively decouple domain-independent information from domain-specific information, in addition to domain-shared information, to capture comprehensive user preferences. To address the above two challenges, we propose a Disentanglement-based framework with Interpolative Data Augmentation for dual-target Cross-Domain Recommendation, called DIDA-CDR. In DIDA-CDR, we first propose an interpolative data augmentation approach to generating both relevant and diverse augmented user representations to augment sparser domain and explore potential user preferences. We then propose a disentanglement module to effectively decouple domain-specific and domain-independent information to capture comprehensive user preferences. Both steps significantly contribute to capturing more comprehensive user preferences, thereby improving the recommendation performance on each domain. Extensive experiments conducted on five real-world datasets show the significant superiority of DIDA-CDR over the state-of-the-art methods.	Feng Zhu, Jiajie Zhu, Yan Wang, Zhu Sun	ASTAR, Inst High Performance Comp, Singapore, Singapore; Ant Grp, Hangzhou, Peoples R China; Macquarie Univ, Macquarie Pk, Australia
79	Reproducibility Analysis of Recommender Systems relying on Visual Features: traps, pitfalls, and countermeasures	Reproducibility is an important requirement for scientific progress, and the lack of reproducibility for a large amount of published research can hinder the progress over the state-of-the-art. This concerns several research areas, and recommender systems are witnessing the same reproducibility crisis. Even solid works published at prestigious venues might not be reproducible for several reasons: data might not be public, source code for recommendation algorithms might not be available or well documented, and evaluation metrics might be computed using parameters not explicitly provided. In addition, recommendation pipelines are becoming increasingly complex due to the use of deep neural architectures or representations for multimodal side information involving text, images, audio, or video. This makes the reproducibility of experiments even more challenging. In this work, we describe an extension of an already existing open-source recommendation framework, called ClayRS, with the aim of providing the foundation for future reproducibility of recommendation processes involving images as side information. This extension, called ClayRS Can See, is the starting point for reproducing state-of-the-art recommendation algorithms exploiting images. We have provided our implementation of one of these algorithms, namely VBPR – Visual Bayesian Personalized Ranking from Implicit Feedback, and we have discussed all the issues related to the reproducibility of the study to deeply understand the main traps and pitfalls, along with solutions to deal with such complex environments. We conclude the work by proposing a checklist for recommender systems reproducibility as a guide for the research community.	Antonio Silletti, Cataldo Musto, Elio Musacchio, Giovanni Semeraro, Marco Polignano, Pasquale Lops	Univ Bari Aldo Moro, Bari, Italy
80	What We Evaluate When We Evaluate Recommender Systems: Understanding Recommender Systems' Performance using Item Response Theory	Current practices in offline evaluation use rank-based metrics to measure the quality of top-n recommendation lists. This approach has practical benefits as it centres assessment on the output of the recommender system and, therefore, measures performance from the perspective of end-users. However, this methodology neglects how recommender systems more broadly model user preferences, which is not captured by only considering the top-n recommendations. In this article, we use item response theory (IRT), a family of latent variable models used in psychometric assessment, to gain a comprehensive understanding of offline evaluation. We use IRT to jointly estimate the latent abilities of 51 recommendation algorithms and the characteristics of 3 commonly used benchmark data sets. For all data sets, the latent abilities estimated by IRT suggest that higher scores from traditional rank-based metrics do not reflect improvements in modeling user preferences. Furthermore, we show that the top-n recommendations with the most discriminatory power are biased towards lower difficulty items, leaving much room for improvement. Lastly, we highlight the role of popularity in evaluation by investigating how user engagement and item popularity influence recommendation difficulty.	Alan Medlar, Dorota Glowacka, Yang Liu	Univ Helsinki, Helsinki, Finland
81	Identifying Controversial Pairs in Item-to-Item Recommendations	Recommendation systems in large-scale online marketplaces are essential to aiding users in discovering new content. However, state-of-the-art systems for item-to-item recommendation tasks are often based on a shallow level of contextual relevance, which can make the system insufficient for tasks where item relationships are more nuanced. Contextually relevant item pairs can sometimes have problematic relationships that are confusing or even controversial to end users, and they could degrade user experiences and brand perception when recommended to users. For example, the recommendation of a book about one sports team to someone reading a book about that team’s biggest rival could be a bad experience, despite the presumed similarities of the books. In this paper, we propose a classifier to identify and prevent such problematic item-to-item recommendations and to enhance overall user experiences. The proposed approach utilizes active learning to sample hard examples effectively across sensitive item categories and employs human raters for data labeling. We also perform offline experiments to demonstrate the efficacy of this system for identifying and filtering problematic recommendations while maintaining recommendation quality.	Brian Knott, Dayvid V. R. Oliveira, Goodman Gu, Jin Cao, Junyi Shen, Nikita Sudan, Rob Monarch, Sindhu Vijaya Raghavan, Yunye Jin	Apple, Austin, TX USA; Apple, Cupertino, CA 95014 USA; Apple, New York, NY USA; Apple, Singapore, Singapore
82	Interpretable User Retention Modeling in Recommendation	Recommendation usually focuses on immediate accuracy metrics like CTR as training objectives. User retention rate, which reflects the percentage of today’s users that will return to the recommender system in the next few days, should be paid more attention to in real-world systems. User retention is the most intuitive and accurate reflection of user long-term satisfaction. However, most existing recommender systems are not focused on user retention-related objectives, since their complexity and uncertainty make it extremely hard to discover why a user will or will not return to a system and which behaviors affect user retention. In this work, we conduct a series of preliminary explorations on discovering and making full use of the reasons for user retention in recommendation. Specifically, we make a first attempt to design a rationale contrastive multi-instance learning framework to explore the rationale and improve the interpretability of user retention. Extensive offline and online evaluations with detailed analyses of a real-world recommender system verify the effectiveness of our user retention modeling. We further reveal the real-world interpretable factors of user retention from both user surveys and explicit negative feedback quantitative analyses to facilitate future model designs. The source codes are released at https://github.com/dinry/IURO.	Jie Zhou, Kaikai Ge, Leyu Lin, Rui Ding, Ruobing Xie, Xiaobo Hao, Xiaochun Yang, Xu Zhang	Northeastern Univ, Shenyang, Peoples R China; Tencent, WeChat, Beijing, Peoples R China
83	Adversarial Sleeping Bandit Problems with Multiple Plays: Algorithm and Ranking Application	This paper presents an efficient algorithm to solve the sleeping bandit with multiple plays problem in the context of an online recommendation system. The problem involves bounded, adversarial loss and unknown i.i.d. distributions for arm availability. The proposed algorithm extends the sleeping bandit algorithm for single arm selection and is guaranteed to achieve theoretical performance with regret upper bounded by , where k is the number of arms selected per time step, N is the total number of arms, and T is the time horizon.	Jianjun Yuan, Ludovik Coba, Wei Lee Woon	Expedia Grp, London, England; Expedia Grp, Seattle, WA 98119 USA
84	Deliberative Diversity for News Recommendations: Operationalization and Experimental User Study	News recommender systems are an increasingly popular field of study that attracts a growing interdisciplinary research community. As these systems play an essential role in our daily lives, the mechanisms behind their curation processes are under scrutiny. In the area of personalized news, many platforms make design choices driven by economic incentives. In contrast to such systems that optimize for financial gain, there can be norm-driven diversity systems that prioritize normative and democratic goals. However, their impact on users in terms of inducing behavioral change or influencing knowledge is still understudied. In this paper, we contribute to the field of news recommender system design by conducting a user study that examines the impact of these normative approaches. We a.) operationalize the notion of a deliberative public sphere for news recommendations, show b.) the impact on news usage, and c.) the influence on political knowledge, attitudes and voting behavior. We find that exposure to small parties is associated with an increase in knowledge about their candidates and that intensive news consumption about a party can change the direction of attitudes of readers towards the issues of the party.	Abraham Bernstein, Hendrik Meyer, Juliane A. Lischka, Laura Laugwitz, Lucien Heitz, Rana Abdullah	Univ Hamburg, Hamburg, Germany; Univ Zurich, Dept Informat & Digital Soc Initiat, Zurich, Switzerland; Univ Zurich, Dept Informat, Zurich, Switzerland
85	Group Fairness for Content Creators: the Role of Human and Algorithmic Biases under Popularity-based Recommendations	The Creator Economy faces concerning levels of unfairness. Content creators (CCs) publicly accuse platforms of purposefully reducing the visibility of their content based on protected attributes, while platforms place the blame on viewer biases. Meanwhile, prior work warns about the “rich-get-richer” effect perpetuated by existing popularity biases in recommender systems: Any initial advantage in visibility will likely be exacerbated over time. What remains unclear is how the biases based on protected attributes from platforms and viewers interact and contribute to the observed inequality in the context of popularity-biased recommender systems. The difficulty of the question lies in the complexity and opacity of the system. To overcome this challenge, we design a simple agent-based model (ABM) that unifies the platform systems which allocate the visibility of CCs (e.g., recommender systems, moderation) into a single popularity-based function, which we call the visibility allocation system (VAS). Through simulations, we find that although viewer homophilic biases do alone create inequalities, small levels of additional biases in VAS are more harmful. From the perspective of interventions, our results suggest that (a) attempts to reduce attribute-biases in moderation and recommendations should precede those reducing viewers’ homophilic tendencies, (b) decreasing the popularity-biases in VAS decreases but does not eliminate inequalities, (c) boosting the visibility of protected CCs to overcome viewers’ homophily with respect to one fairness metric is unlikely to produce fair outcomes with respect to all metrics, and (d) the process is also unfair for viewers and this unfairness could be overcome through the same interventions. More generally, this work demonstrates the potential of using ABMs to better understand the causes and effects of biases and interventions within complex sociotechnical systems.	Aniko Hannak, Nicolò Pagan, Stefania Ionescu	Univ Zurich, Zurich, Switzerland
86	Scalable Deep Q-Learning for Session-Based Slate Recommendation	Reinforcement learning (RL) has demonstrated great potential to improve slate-based recommender systems by optimizing recommendations for long-term user engagement. To handle the combinatorial action space in slate recommendation, recent works decompose the Q-value of a slate into item-wise Q-values, using an item-wise value-based policy. However, the common case where the value function is a parameterized function taking state and action as input results in a linearly increasing number of evaluations required to select an action, proportional to the number of candidate items. While slow training may be acceptable, this becomes intractable when considering the costly evaluation of the parameterized function, such as with deep neural networks, during model serving time. To address this issue, we propose an actor-based policy that reduces the evaluation of the Q-function to a subset of items, significantly reducing inference time and enabling practical deployment in real-world industrial settings. In our empirical evaluation, we demonstrate that our proposed approach achieves equivalent user session engagement to a value-based policy, while significantly reducing the slate serving time by at least 4 times.	Aayush Singha Roy, Aonghus Lawlor, Edoardo D'Amico, Elias Z. Tragos, Neil Hurley	Univ Coll Dublin, Dublin, Ireland
87	Optimizing Long-term Value for Auction-Based Recommender Systems via On-Policy Reinforcement Learning	Auction-based recommender systems are prevalent in online advertising platforms, but they are typically optimized to allocate recommendation slots based on immediate expected return metrics, neglecting the downstream effects of recommendations on user behavior. In this study, we employ reinforcement learning to optimize for long-term return metrics in an auction-based recommender system. Utilizing temporal difference learning, a fundamental reinforcement learning algorithm, we implement a one-step policy improvement approach that biases the system towards recommendations with higher long-term user engagement metrics. This optimizes value over long horizons while maintaining compatibility with the auction framework. Our approach is grounded in dynamic programming ideas which show that our method provably improves upon the existing auction-based base policy. Through an online A/B test conducted on an auction-based recommender system which handles billions of impressions and users daily, we empirically establish that our proposed method outperforms the current production system in terms of long-term user engagement metrics.	Alex Nikulkov, Dmytro Korenkevych, Fan Liu, Jalaj Bhandari, Ruiyang Xu, Yuchen He, Zheqing Zhu	Meta AI, Menlo Pk, CA USA
88	Deep Exploration for Recommendation Systems	Modern recommendation systems ought to benefit by probing for and learning from delayed feedback. Research has tended to focus on learning from a user's response to a single recommendation. Such work, which leverages methods of supervised and bandit learning, forgoes learning from the user's subsequent behavior. Where past work has aimed to learn from subsequent behavior, there has been a lack of effective methods for probing to elicit informative delayed feedback. Effective exploration through probing for delayed feedback becomes particularly challenging when rewards are sparse. To address this, we develop deep exploration methods for recommendation systems. In particular, we formulate recommendation as a sequential decision problem and demonstrate benefits of deep exploration over single-step exploration. Our experiments are carried out with high-fidelity industrial-grade simulators and establish large improvements over existing algorithms.	Benjamin Van Roy, Zheqing Zhu	Stanford Univ, Meta AI, Stanford, CA 94305 USA
89	Time-Aware Item Weighting for the Next Basket Recommendations	In this paper we study the next basket recommendation problem. Recent methods use different approaches to achieve better performance. However, many of them do not use information about the time of prediction and time intervals between baskets. To fill this gap, we propose a novel method, Time-Aware Item-based Weighting (TAIW), which takes timestamps and intervals into account. We provide experiments on three real-world datasets, and TAIW outperforms well-tuned state-of-the-art baselines for next-basket recommendations. In addition, we show the results of an ablation study and a case study of a few items.	Aleksey Romanov, Marina Ananyeva, Oleg Lashinin, Sergey Kolesnikov	Natl Res Univ Higher Sch Econ, Moscow, Russia; Tinkoff, Moscow, Russia
90	Multiple Connectivity Views for Session-based Recommendation	Session-based recommendation (SBR), which makes the next-item recommendation based on previous anonymous actions, has drawn increasing attention. The last decade has seen multiple deep learning-based modeling choices applied on SBR successfully, e.g., recurrent neural networks (RNNs), convolutional neural networks (CNNs), graph neural networks (GNNs), and each modeling choice has its intrinsic superiority and limitation. We argue that these modeling choices differentiate from each other by (1) the way they capture the interactions between items within a session and (2) the operators they adopt for composing the neural network, e.g., convolutional operator or self-attention operator. In this work, we dive deep into the former as it is relatively unique to the SBR scenario, while the latter is shared by general neural network modeling techniques. We first introduce the concept of connectivity view to describe the different item interaction patterns at the input level. Then, we develop the Multiple Connectivity Views for Session-based Recommendation (MCV-SBR), a unified framework that incorporates different modeling choices in a single model through the lens of connectivity view. In addition, MCV-SBR allows us to effectively and efficiently explore the search space of the combinations of connectivity views by the Tree-structured Parzen Estimator (TPE) algorithm. Finally, on three widely used SBR datasets, we verify the superiority of MCV-SBR by comparing the searched models with state-of-the-art baselines. We also conduct a series of studies to demonstrate the efficacy and practicability of the proposed connectivity view search algorithm, as well as other components in MCV-SBR.	Jieyu Zhang, Yaming Yang, Yujing Wang, Yunhai Tong, Zheng Miao	Peking Univ, Beijing, Peoples R China; Peking Univ, Sch Artificial Intelligence, Beijing, Peoples R China; Univ Washington, Seattle, WA USA
91	Navigating the Feedback Loop in Recommender Systems: Insights and Strategies from Industry Practice	Understanding and measuring the impact of feedback loops in industrial recommender systems is challenging, leading to the underestimation of their deterioration. In this study, we define open and closed feedback loops and investigate the unique reasons behind the emergence of feedback loops in the industry, drawing from real-world examples that have received limited attention in prior research. We highlight the measurement challenges associated with capturing the full impact of feedback loops using traditional online A/B tests. To address this, we propose the use of offline evaluation frameworks as surrogates for long-term feedback loop bias, supported by a practical simulation system using real data. Our findings provide valuable insights for optimizing the performance of recommender systems operating under feedback loop conditions.	Ding Tong, James McInerney, Justin Basilico, Qifeng Qiao, TingPo Lee	Netflix, Los Gatos, CA 95032 USA
92	Unleash the Power of Context: Enhancing Large-Scale Recommender Systems with Context-Based Prediction Models	In this work, we introduce the notion of Context-Based Prediction Models. A Context-Based Prediction Model determines the probability of a user's action (such as a click or a conversion) solely by relying on user and contextual features, without considering any specific features of the item itself. We have identified numerous valuable applications for this modeling approach, including training an auxiliary context-based model to estimate click probability and incorporating its prediction as a feature in CTR prediction models. Our experiments indicate that this enhancement brings significant improvements in offline and online business metrics while having minimal impact on the cost of serving. Overall, our work offers a simple and scalable, yet powerful approach for enhancing the performance of large-scale commercial recommender systems, with broad implications for the field of personalized recommendations.	Assaf Klein, Davorin Kopic, Jan Hartman, Natalia Silberstein	Outbrain, Ljubljana, Slovenia; Outbrain, Netanya, Israel
93	Learning the True Objectives of Multiple Tasks in Sequential Behavior Modeling	Multi-task optimization is an emerging research field in recommender systems that focuses on improving the recommendation performance of multiple tasks. Various methods have been proposed in the past to address task weight balancing, gradient conflict resolution, Pareto optimality, etc, yielding promising results in specific contexts. However, when it comes to real-world scenarios involving user sequential behaviors, these methods are not well suited. To address this gap, we propose AARec, a novel and effective approach for sequential behavior modeling in multi-task recommender systems inspired by acoustic attenuation. Specifically, AARec introduces an impact attenuation mechanism to mitigate the uncertain task interference in multi-task optimization. Extensive experiments on public datasets demonstrate the effectiveness of AARec.	Jiawei Zhang	Peking Univ, Beijing, Peoples R China
94	Analyzing Accuracy versus Diversity in a Health Recommender System for Physical Activities: a Longitudinal User Study	As personalization has great potential to improve mobile health apps, analyzing the effect of different recommender algorithms in the health domain is still in its infancy. As such, this paper investigates whether more accurate recommendations from a content-based recommender or more diverse recommendations from a user-based collaborative filtering recommender will lead to more motivation to move. An eight-week longitudinal between-subject user study is being conducted with an Android app in which participants receive personalized recommendations for physical activities and tips to reduce sedentary behavior. The objective manipulation check confirmed that the group with collaborative filtering received significantly more diverse recommendations. The subjective manipulation check showed that the content-based group assigned more positive feedback for perceived accuracy and star rating to the recommendations they chose and executed. However, perceived diversity and inspiringness was significantly higher in the content-based group, suggesting that users might experience the recommendations differently. Lastly, momentary motivation for the executed activities and tips was significantly higher in the content-based group. As such, the preliminary results of this longitudinal study suggest that more accurate and less diverse recommendations have better effects on motivating users to move more.	Ine Coppens, Luc Martens, Toon De Pessemier	Univ Ghent, Imec WAVES, Ghent, Belgium
95	EasyStudy: Framework for Easy Deployment of User Studies on Recommender Systems	Improvements in the recommender systems (RS) domain are not possible without a thorough way to evaluate and compare newly proposed approaches. User studies represent a viable alternative to online and offline evaluation schemes, but despite their numerous benefits, they are only rarely used. One of the main reasons behind this fact is that preparing a user study from scratch involves a lot of extra work on top of a simple algorithm proposal. To simplify this task, we propose EasyStudy, a modular framework built on the credo “Make simple things fast and hard things possible”. It features ready-to-use datasets, preference elicitation methods, incrementally tuned baseline algorithms, study flow plugins, and evaluation metrics. As a result, a simple study comparing several RS can be deployed with just a few clicks, while more complex study designs can still benefit from a range of reusable components, such as preference elicitation. Overall, EasyStudy dramatically decreases the gap between the laboriousness of offline evaluation vs. user studies and, therefore, may contribute towards the more reliable and insightful user-centric evaluation of next-generation RS. The project repository is available from https://bit.ly/easy-study-repo.	Ladislav Peska, Patrik Dokoupil	Charles Univ Prague, Fac Math & Phys, Prague, Czech Republic
96	LLM Based Generation of Item-Description for Recommendation System	The description of an item plays a pivotal role in providing concise and informative summaries to captivate potential viewers and is essential for recommendation systems. Traditionally, such descriptions were obtained through manual web scraping techniques, which are time-consuming and susceptible to data inconsistencies. In recent years, Large Language Models (LLMs), such as GPT-3.5, and open source LLMs like Alpaca have emerged as powerful tools for natural language processing tasks. In this paper, we have explored how we can use LLMs to generate detailed descriptions of the items. To conduct the study, we have used the MovieLens 1M dataset comprising movie titles and the Goodreads Dataset consisting of names of books and subsequently, an open-sourced LLM, Alpaca, was prompted with few-shot prompting on this dataset to generate detailed movie descriptions considering multiple features like the names of the cast and directors for the ML dataset and the names of the author and publisher for the Goodreads dataset. The generated description was then compared with the scraped descriptions using a combination of Top Hits, MRR, and NDCG as evaluation metrics. The results demonstrated that LLM-based movie description generation exhibits significant promise, with results comparable to the ones obtained by web-scraped descriptions.	Arkadeep Acharya, Brijraj Singh, Naoyuki Onoe	Sony Res India, Bhubaneswar, India
97	Exploring Unlearning Methods to Ensure the Privacy, Security, and Usability of Recommender Systems	Machine learning algorithms have proven highly effective in analyzing large amounts of data and identifying complex patterns and relationships. One application of machine learning that has received significant attention in recent years is recommender systems, which are algorithms that analyze user behavior and other data to suggest items or content that a user may be interested in. However useful, these systems may unintentionally retain sensitive, outdated, or faulty information, posing a risk to user privacy and system security and limiting a system’s usability. In this research proposal, we aim to address these challenges by investigating methods for machine “unlearning”, which would allow information to be efficiently “forgotten” or “unlearned” from machine learning models. The main objective of this proposal is to develop the foundation for future machine unlearning methods. We first evaluate current unlearning methods and explore novel adversarial attacks on these methods’ verifiability, efficiency, and accuracy to gain new insights and further develop the theory of machine unlearning. Using our gathered insights, we seek to create novel unlearning methods that are verifiable, efficient, and limit unnecessary accuracy degradation. Through this research, we seek to make significant contributions to the theoretical foundations of machine unlearning while also developing unlearning methods that can be applied to real-world problems.	Jens Leysen	Univ Antwerp, Antwerp, Belgium
98	Complementary Product Recommendation for Long-tail Products	Identifying complementary relations between products plays a key role in e-commerce Recommender Systems (RS). Existing methods in Complementary Product Recommendation (CPR), however, focus only on identifying complementary relations in huge and data-rich catalogs, while none of them considers real-world scenarios of small and medium e-commerce platforms with a limited number of interactions. In this paper, we discuss our research proposal that addresses the problem of identifying complementary relations in such sparse settings. To overcome the data sparsity problem, we propose to first learn complementary relations in large and data-rich catalogs and then transfer learned knowledge to small and scarce ones. To be able to map individual products across different catalogs and thus transfer learned relations between them, we propose to create a Product Universal Embedding Space (PUES) using textual and visual product meta-data, which serves as a common ground for the products from arbitrary catalogs.	Rastislav Papso	Kempelen Inst Intelligent Technol, Bratislava, Slovakia
99	Challenges for Anonymous Session-Based Recommender Systems in Indoor Environments	In the last two decades, recommender systems have become more popular since they can provide personalized recommendations in different fields. However, the current research landscape in this area suggests that there is still considerable potential for applying novel recommendation techniques in indoor environments. In addition, the growing attention to privacy raises even more challenges. Anonymous session-based recommender systems represent attractive solutions in this scenario, given their natural predisposition to model the indoor domain by treating each visit to a particular location as an anonymous session. This paper presents some noteworthy challenges regarding several aspects related to the application of these models in indoor environments. We expose our research questions on issues related to the representation of user behavior, cold-start problem, and fairness. Although these problems affect any RS, they become even more challenging in the chosen environment. Finally, we outline a possible use case in a real application scenario to make more transparent and concrete the line of research we intend to pursue in the near future.	Alessio Ferrato	Roma Tre Univ, Dept Engn, Rome, Italy
100	Recommenders In the wild - Practical Evaluation Methods	The gap between training a recommender model and actually having a recommender system in production is a topic often neglected. A recommender system is far more than a model which produces good metrics in an offline evaluation. Specifically, the evaluation of various recommendation engines in production is often very different from offline evaluations on a laptop. This tutorial will go through many practical steps and focus on the development, evaluation and, in particular, metrics and A/B tests.	Kim Falk, Morten Arngren	Binary Vikings, Copenhagen, Denmark; Wunderman Thompson, Copenhagen, Denmark
101	Masked and Swapped Sequence Modeling for Next Novel Basket Recommendation in Grocery Shopping	Next basket recommendation (NBR) is the task of predicting the next set of items based on a sequence of already purchased baskets. It is a recommendation task that has been widely studied, especially in the context of grocery shopping. In next basket recommendation (NBR), it is useful to distinguish between repeat items, i.e., items that a user has consumed before, and explore items, i.e., items that a user has not consumed before. Most NBR work either ignores this distinction or focuses on repeat items. We formulate the next novel basket recommendation (NNBR) task, i.e., the task of recommending a basket that only consists of novel items, which is valuable for both real-world application and NBR evaluation. We evaluate how existing NBR methods perform on the NNBR task and find that, so far, limited progress has been made w.r.t. the NNBR task. To address the NNBR task, we propose a simple bi-directional transformer basket recommendation model (BTBR), which is focused on directly modeling item-to-item correlations within and across baskets instead of learning complex basket representations. To properly train BTBR, we propose and investigate several masking strategies and training objectives: (i) item-level random masking, (ii) item-level select masking, (iii) basket-level all masking, (iv) basket-level explore masking, and (v) joint masking. In addition, an item-basket swapping strategy is proposed to enrich the item interactions within the same baskets. We conduct extensive experiments on three open datasets with various characteristics. The results demonstrate the effectiveness of BTBR and our masking and swapping strategies for the NNBR task. BTBR with a properly selected masking and swapping strategy can substantially improve NNBR performance.	Andrew Yates, Maarten de Rijke, Ming Li, Mozhdeh Ariannezhad	Univ Amsterdam, AIRLab, Amsterdam, Netherlands; Univ Amsterdam, Amsterdam, Netherlands
102	Loss Harmonizing for Multi-Scenario CTR Prediction	Large-scale industrial systems often include multiple scenarios to satisfy diverse user needs. The common approach of using one model per scenario does not scale well and is not suitable for minor scenarios with limited samples. A solution is to train a model on all scenarios, which can introduce domination and bias from the main scenario. MMoE-like structures have been proposed for multi-scenario prediction, but they do not explicitly address the issue of gradient unbalancing. This work proposes an adaptive loss harmonizing (ALH) algorithm for multi-scenario CTR prediction. It dynamically adjusts the learning speed for balanced training and improved performance. Experiments on real industrial datasets and rigorous A/B testing prove our method’s superiority.	Changping Peng, Congcong Liu, Fei Teng, Jingping Shao, Liang Shi, Pei Wang, Xue Jiang, Zhangang Lin	JD Com, Beijing, Peoples R China
103	Towards Robust Fairness-aware Recommendation	Due to the progressive advancement of trustworthy machine learning algorithms, fairness in recommender systems is attracting increasing attention and is often considered from the perspective of users. Conventional fairness-aware recommendation models assume that user preferences remain the same between the training set and the testing set. However, this assumption is arguable in reality, where user preference can shift in the testing set due to the natural spatial or temporal heterogeneity. It is concerning that conventional fairness-aware models may be unaware of such distribution shifts, leading to a sharp decline in the model performance. To address the distribution shift problem, we propose a robust fairness-aware recommendation framework based on the Distributionally Robust Optimization (DRO) technique. Specifically, we assign learnable weights for each sample to approximate the distribution that leads to the worst-case model performance, and then optimize the fairness-aware recommendation model to improve the worst-case performance in terms of both fairness and recommendation accuracy. By iteratively updating the weights and the model parameter, our framework can be robust to unseen testing sets. To ease the learning difficulty of DRO, we use a hard clustering technique to reduce the number of learnable sample weights. To optimize our framework in a fully differentiable manner, we soften the above clustering strategy. Empirically, we conduct extensive experiments based on four real-world datasets to verify the effectiveness of our proposed framework.	Chenyi Zhuang, Hao Yang, Xu Chen, Zeyu Zhang, Zhining Liu	Ant Grp, Hangzhou, Peoples R China; Renmin Univ China, Gaoling Sch Artificial Intelligence, Beijing, Peoples R China
104	Two-sided Calibration for Quality-aware Responsible Recommendation	Calibration in recommender systems ensures that the user’s interests distribution over groups of items is reflected with their corresponding proportions in the recommendation, which has gained increasing attention recently. For example, a user who watched 80 entertainment videos and 20 knowledge videos is expected to receive recommendations comprising about 80% entertainment and 20% knowledge videos as well. However, with the increasing calls for responsible recommendation, it has become inadequate to just match users’ historical behaviors especially when items are grouped by their qualities, which could result in undesired effects at the system level (e.g., overwhelming clickbaits). In this paper, we envision the two-sided calibration task that not only matches the users’ past interests distribution (user-level calibration) but also guarantees an overall target exposure distribution of different item groups (system-level calibration). The target group exposure distribution can be explicitly pursued by users, platform owners, and even the law (e.g., the platform owners expect about 50% knowledge video recommendation on the whole). To support this scenario, we propose a post-processing method named PCT. PCT first solves personalized calibration targets that minimize the changes in users’ historical interest distributions while ensuring the overall target group exposure distribution. Then, PCT reranks the original recommendation lists according to personalized calibration targets to generate both relevant and two-sided calibrated recommendations. Extensive experiments demonstrate the superior performance of the proposed method compared to calibrated and fairness-aware recommendation approaches.	Chao Deng, Chenyang Wang, Haitao Zeng, Junlan Feng, Min Zhang, Weizhi Ma, Yankai Liu, Yiqun Liu, Yuanqing Yu	China Mobile Res Inst & THU CMCC Joint Inst, Beijing 100084, Peoples R China; China Mobile Res Inst, Beijing 100084, Peoples R China; Tsinghua Univ & THU CMCC Joint Inst, BNRist, DCST, Beijing 100084, Peoples R China; Tsinghua Univ, AIR, Beijing 100084, Peoples R China; Tsinghua Univ, BNRist, DCST, Beijing 100084, Peoples R China
105	RecAD: Towards A Unified Library for Recommender Attack and Defense	In recent years, recommender systems have become a ubiquitous part of our daily lives, while they suffer from a high risk of being attacked due to the growing commercial and social values. Despite significant research progress in recommender attack and defense, there is a lack of a widely-recognized benchmarking standard in the field, leading to unfair performance comparison and limited credibility of experiments. To address this, we propose RecAD, a unified library aiming at establishing an open benchmark for recommender attack and defense. RecAD takes an initial step to set up a unified benchmarking pipeline for reproducible research by integrating diverse datasets, standard source codes, hyper-parameter settings, running logs, attack knowledge, attack budget, and evaluation results. The benchmark is designed to be comprehensive and sustainable, covering both attack, defense, and evaluation tasks, enabling more researchers to easily follow and contribute to this promising field. RecAD will drive more solid and reproducible research on recommender systems attack and defense, reduce the redundant efforts of researchers, and ultimately increase the credibility and practical value of recommender attack and defense. The project is released at https://github.com/gusye1234/recad.	Changsheng Wang, Chongming Gao, Fuli Feng, Jianbai Ye, Wenjie Wang, Xiangnan He	Natl Univ Singapore, Singapore, Singapore; Univ Sci & Technol China, Hefei, Anhui, Peoples R China
106	Adversarial Collaborative Filtering for Free	Collaborative Filtering (CF) has been successfully used to help users discover items of interest. Nevertheless, existing CF methods suffer from noisy data, which negatively impacts the quality of recommendation. To tackle this problem, many prior studies leverage adversarial learning to regularize the representations of users/items, which improves both generalizability and robustness. Those methods often learn adversarial perturbations and model parameters under a min-max optimization framework. However, two major drawbacks remain: 1) Existing methods lack theoretical guarantees of why adding perturbations improves model generalizability and robustness; 2) Solving min-max optimization is time-consuming. In addition to updating the model parameters, each iteration requires additional computations to update the perturbations, making them not scalable to industry-scale datasets. In this paper, we present Sharpness-aware Collaborative Filtering (SharpCF), a simple yet effective method that conducts adversarial training without extra computational cost over the base optimizer. To achieve this goal, we first revisit the existing adversarial collaborative filtering and discuss its connection with recent Sharpness-aware Minimization. This analysis shows that adversarial training actually seeks model parameters that lie in neighborhoods around the optimal model parameters having uniformly low loss values, resulting in better generalizability. To reduce the computational overhead, SharpCF introduces a novel trajectory loss to measure the alignment between current weights and past weights. Experimental results on real-world datasets demonstrate that our SharpCF achieves superior performance with almost zero additional computational cost compared to adversarial training.	ChinChia Michael Yeh, Hao Yang, Huiyuan Chen, Mahashweta Das, Vivian Lai, Xiaoting Li, Yan Zheng, Yujie Fan	Visa Res, Palo Alto, CA 94404 USA
107	Trending Now: Modeling Trend Recommendations	Modern recommender systems usually include separate recommendation carousels such as ‘trending now’ to list trending items and further boost their popularity, thereby attracting active users. Though widely useful, such ‘trending now’ carousels typically generate item lists based on simple heuristics, e.g., the number of interactions within a time interval, and therefore still leave much room for improvement. This paper aims to systematically study this under-explored but important problem from the new perspective of time series forecasting. We first provide a set of rigorous definitions related to item trendiness and formulate the trend recommendation task as a one-step time series forecasting problem. We then propose a deep latent variable model, dubbed Trend Recommender (TrendRec), to forecast items’ future trends and generate trending item lists. Furthermore, we design associated evaluation protocols for trend recommendation. Experiments on real-world datasets from various domains show that our TrendRec significantly outperforms the baselines, verifying our model’s effectiveness.	Anoop Deoras, Branislav Kveton, Fei Wang, Hao Ding, Hao Wang, Ravi Divvela, Venkataramana Kini, Yifei Ma, Youngsuk Park, Yupeng Gu	AWS AI Labs, Seattle, WA 98019 USA; Amazon, Seattle, WA 98019 USA
108	A Lightweight Method for Modeling Confidence in Recommendations with Learned Beta Distributions	Most Recommender Systems (RecSys) do not provide an indication of confidence in their decisions. Therefore, they do not distinguish between recommendations of which they are certain, and those where they are not. Existing confidence methods for RecSys are either inaccurate heuristics, conceptually complex or computationally very expensive. Consequently, real-world RecSys applications rarely adopt these methods, and thus, provide no confidence insights in their behavior. In this work, we propose learned beta distributions (LBD) as a simple and practical recommendation method with an explicit measure of confidence. Our main insight is that beta distributions predict user preferences as probability distributions that naturally model confidence on a closed interval, yet can be implemented with the minimal model-complexity. Our results show that LBD maintains competitive accuracy to existing methods while also having a significantly stronger correlation between its accuracy and confidence. Furthermore, LBD has higher performance when applied to a high-precision targeted recommendation task. Our work thus shows that confidence in RecSys is possible without sacrificing simplicity or accuracy, and without introducing heavy computational complexity. Thereby, we hope it enables better insight into real-world RecSys and opens the door for novel future applications.	Harrie Oosterhuis, Norman Knyazev	Radboud Univ Nijmegen, Nijmegen, Netherlands
109	Investigating the effects of incremental training on neural ranking models	Recommender systems are an essential component of online platforms providing users with personalized experiences. Some recommendation scenarios such as social networks and news are extremely dynamic in nature with user interests changing over time and new items being continuously added due to breaking news and trending events. Incremental training is a popular technique to keep recommender models up-to-date in such dynamic platforms. In this paper, we provide an empirical analysis of a large industry dataset from the Sharechat app MOJ, a social media platform featuring short videos, to answer relevant questions like - How often should I retrain the models? - do different model architectures, features and dataset sizes benefit differently from incremental training? - Does incremental training equally benefit all users and items?	Benedikt Schifferer, Chris Deotte, Chris Green, Even Oldridge, Gabriel de Souza Pereira Moreira, Gilberto Titericz, Kazuki Onodera, Praveen Dhinwa, Vishal Agrawal, Wenzhe Shi	NVIDIA, Munich, Germany; NVIDIA, Santa Clara, CA USA; NVIDIA, Sao Paulo, Brazil; NVIDIA, Tokyo, Japan; NVIDIA, Vancouver, BC, Canada; ShareChat, Bangalore, Karnataka, India; ShareChat, London, England; ShareChat, Washington, DC USA; Sharechat, New York, NY USA
110	Multi-Relational Contrastive Learning for Recommendation	Personalized recommender systems play a crucial role in capturing users’ evolving preferences over time to provide accurate and effective recommendations on various online platforms. However, many recommendation models rely on a single type of behavior learning, which limits their ability to represent the complex relationships between users and items in real-life scenarios. In such situations, users interact with items in multiple ways, including clicking, tagging as favorite, reviewing, and purchasing. To address this issue, we propose the Relation-aware Contrastive Learning (RCL) framework, which effectively models dynamic interaction heterogeneity. The RCL model incorporates a multi-relational graph encoder that captures short-term preference heterogeneity while preserving the dedicated relation semantics for different types of user-item interactions. Moreover, we design a dynamic cross-relational memory network that enables the RCL model to capture users’ long-term multi-behavior preferences and the underlying evolving cross-type behavior dependencies over time. To obtain robust and informative user representations with both commonality and diversity across multi-behavior interactions, we introduce a multi-relational contrastive learning paradigm with heterogeneous short- and long-term interest modeling. Our extensive experimental studies on several real-world datasets demonstrate the superiority of the RCL recommender system over various state-of-the-art baselines in terms of recommendation accuracy and effectiveness. We provide the implementation codes for the RCL model at https://github.com/HKUDS/RCL.	Chao Huang, Lianghao Xia, Wei Wei	Univ Hong Kong, Hong Kong, Peoples R China
111	Challenging the Myth of Graph Collaborative Filtering: a Reasoned and Reproducibility-driven Analysis	The success of graph neural network-based models (GNNs) has significantly advanced recommender systems by effectively modeling users and items as a bipartite, undirected graph. However, many original graph-based works often adopt results from baseline papers without verifying their validity for the specific configuration under analysis. Our work addresses this issue by focusing on the replicability of results. We present a code that successfully replicates results from six popular and recent graph recommendation models (NGCF, DGCF, LightGCN, SGL, UltraGCN, and GFCF) on three common benchmark datasets (Gowalla, Yelp 2018, and Amazon Book). Additionally, we compare these graph models with traditional collaborative filtering models that historically performed well in offline evaluations. Furthermore, we extend our study to two new datasets (Allrecipes and BookCrossing) that lack established setups in existing literature. As the performance on these datasets differs from the previous benchmarks, we analyze the impact of specific dataset characteristics on recommendation accuracy. By investigating the information flow from users' neighborhoods, we aim to identify which models are influenced by intrinsic features in the dataset structure. The code to reproduce our experiments is available at: https://github.com/sisinflab/Graph-RSs-Reproducibility.	Alejandro Bellogín, Claudio Pomo, Daniele Malitesta, Eugenio Di Sciascio, Tommaso Di Noia, Vito Walter Anelli	Politecn Bari, Bari, Italy; Univ Autonoma Madrid, Madrid, Spain
112	InTune: Reinforcement Learning-based Data Pipeline Optimization for Deep Recommendation Models	Deep learning-based recommender models (DLRMs) have become an essential component of many modern recommender systems. Several companies are now building large compute clusters reserved only for DLRM training, driving new interest in cost- & time- saving optimizations. The systems challenges faced in this setting are unique; while typical deep learning (DL) training jobs are dominated by model execution times, the most important factor in DLRM training performance is often online data ingestion. In this paper, we explore the unique characteristics of this data ingestion problem and provide insights into the specific bottlenecks and challenges of the DLRM training pipeline at scale. We study real-world DLRM data processing pipelines taken from our compute cluster at Netflix to both observe the performance impacts of online ingestion and to identify shortfalls in existing data pipeline optimizers. We find that current tooling either yields sub-optimal performance, frequent crashes, or else requires impractical cluster re-organization to adopt. Our studies lead us to design and build a new solution for data pipeline optimization, InTune. InTune employs a reinforcement learning (RL) agent to learn how to distribute the CPU resources of a trainer machine across a DLRM data pipeline to more effectively parallelize data-loading and improve throughput. Our experiments show that InTune can build an optimized data pipeline configuration within only a few minutes, and can easily be integrated into existing training workflows. By exploiting the responsiveness and adaptability of RL, InTune achieves significantly higher online data ingestion rates than existing optimizers, thus reducing idle times in model execution and increasing efficiency. We apply InTune to our real-world cluster, and find that it increases data ingestion throughput by as much as 2.29X versus current state-of-the-art data pipeline optimizers while also improving both CPU & GPU utilization.	Kabir Nagrecha, Lingyi Liu, Pablo Delgado, Prasanna Padmanabhan	Netflix Inc, Los Gatos, CA 95032 USA
113	Generative Learning Plan Recommendation for Employees: A Performance-aware Reinforcement Learning Approach	With the rapid development of enterprise Learning Management Systems (LMS), more and more companies are trying to build enterprise training and course learning platforms for promoting the career development of employees. Indeed, through course learning, many employees have the opportunity to improve their knowledge and skills. For these systems, a major issue is how to recommend learning plans, i.e., a set of courses arranged in the order they should be learned, that can help employees improve their work performance. Existing studies mainly focus on recommending courses that users are most likely to click on by capturing their learning preferences. However, the learning preference of employees may not be the right fit for their career development, and thus it may not necessarily mean their work performance can be improved accordingly. Furthermore, how to capture the mutual correlation and sequential effects between courses, and ensure the rationality of the generated results, is also a major challenge. To this end, in this paper, we propose the Generative Learning plAn recommenDation (GLAD) framework, which can generate personalized learning plans for employees to help them improve their work performance. Specifically, we first design a performance predictor and a rationality discriminator, which have the same transformer-based model architecture, but with totally different parameters and functionalities. In particular, the performance predictor is trained for predicting the work performance of employees based on their work profiles and historical learning records, while the rationality discriminator aims to evaluate the rationality of the generated results. Then, we design a learning plan generator based on the gated transformer and the cross-attention mechanism for learning plan generation. We calculate the weighted sum of the output from the performance predictor and the rationality discriminator as the reward, and we use Self-Critical Sequence Training (SCST) based policy gradient methods to train the generator following the Generative Adversarial Network (GAN) paradigm. Finally, extensive experiments on real-world data clearly validate the effectiveness of our GLAD framework compared with state-of-the-art baseline methods and reveal some interesting findings for talent management.	Hengshu Zhu, Hui Xiong, Xin Song, Ying Sun, Zhi Zheng	Baidu Inc, Baidu Talent Intelligence Ctr, Beijing, Peoples R China; Hong Kong Univ Sci & Technol Guangzhou China, Thrust Artificial Intelligence, Guangzhou, Peoples R China; Hong Kong Univ Sci & Technol, Dept Comp Sci & Engn, Guangzhou, Peoples R China; Univ Sci & Technol China, Sch Data Sci, Langfang, Peoples R China
114	Knowledge-based Multiple Adaptive Spaces Fusion for Recommendation	Since Knowledge Graphs (KGs) contain rich semantic information, recently there has been an influx of KG-enhanced recommendation methods. Most existing methods are designed entirely in Euclidean space without considering curvature. However, recent studies have revealed that a tremendous amount of graph-structured data exhibits highly non-Euclidean properties. Motivated by these observations, in this work, we propose a knowledge-based multiple adaptive spaces fusion method for recommendation, namely MCKG. Unlike existing methods that solely adopt a specific manifold, we introduce a unified space that is compatible with hyperbolic, Euclidean and spherical spaces. Furthermore, we fuse the multiple unified spaces with an attention mechanism to obtain high-quality embeddings for better knowledge propagation. In addition, we propose a geometry-aware optimization strategy which enables the pull and push processes to benefit from both hyperbolic and spherical spaces. Specifically, in hyperbolic space, we set smaller margins in the area near the origin, which is conducive to distinguishing between highly similar positive items and negative ones. At the same time, we set larger margins in the area far from the origin to ensure the model has sufficient error tolerance. A similar approach also applies to spherical spaces. Extensive experiments on three real-world datasets demonstrate that MCKG achieves a significant improvement over state-of-the-art recommendation methods. Further ablation experiments verify the importance of multi-space fusion and the geometry-aware optimization strategy, justifying the rationality and effectiveness of MCKG.	Deqing Wang, Fuzhen Zhuang, Jin Dong, Meng Yuan, Zhao Zhang	Beihang Univ, Inst Artificial Intelligence, Beijing, Peoples R China; Beihang Univ, Sch Comp Sci & Engn, Beijing, Peoples R China; Beijing Acad Blockchain & Edge Comp, Beijing, Peoples R China; Chinese Acad Sci, Inst Comp Technol, Beijing, Peoples R China
115	KGTORe: Tailored Recommendations through Knowledge-aware GNN Models	Knowledge graphs (KG) have been proven to be a powerful source of side information to enhance the performance of recommendation algorithms. Their graph-based structure paves the way for the adoption of graph-aware learning models such as Graph Neural Networks (GNNs). In this respect, state-of-the-art models achieve good performance and interpretability via user-level combinations of intents leading users to their choices. Unfortunately, such results often come from end-to-end learning that considers a combination of the whole set of features contained in the KG without any analysis of the user decisions. In this paper, we introduce KGTORe, a GNN-based model that exploits KG to learn latent representations for the semantic features, and consequently, interpret the user decisions as a personal distillation of the item feature representations. Unlike previous models, KGTORe does not need to process the whole KG at training time but relies on a selection of the most discriminative features for the users, thus resulting in improved performance and personalization. Experimental results on three well-known datasets show that KGTORe achieves remarkable accuracy performance and several ablation studies demonstrate the effectiveness of its components. The implementation of KGTORe is available at: https://github.com/sisinflab/KGTORe.	Alberto Carlo Maria Mancino, Antonio Ferrara, Daniele Malitesta, Eugenio Di Sciascio, Salvatore Bufi, Tommaso Di Noia	Politecn Bari, Bari, Italy
116	Everyone's a Winner! On Hyperparameter Tuning of Recommendation Models	The performance of a recommender system algorithm in terms of common offline accuracy measures often strongly depends on the chosen hyperparameters. Therefore, when comparing algorithms in offline experiments, we can obtain reliable insights regarding the effectiveness of a newly proposed algorithm only if we compare it to a number of state-of-the-art baselines that are carefully tuned for each of the considered datasets. While this fundamental principle of any area of applied machine learning is undisputed, we find that the tuning process for the baselines in the current literature is barely documented in much of today’s published research. Ultimately, in case the baselines are actually not carefully tuned, progress may remain unclear. In this paper, we exemplify through a computational experiment involving seven recent deep learning models how every method in such an unsound comparison can be reported to be outperforming the state-of-the-art. Finally, we iterate appropriate research practices to avoid unreliable algorithm comparisons in the future.	Dietmar Jannach, Faisal Shehzad	Univ Klagenfurt, Klagenfurt, Austria
117	ADRNet: A Generalized Collaborative Filtering Framework Combining Clinical and Non-Clinical Data for Adverse Drug Reaction Prediction	Adverse drug reaction (ADR) prediction plays a crucial role in both health care and drug discovery for reducing patient mortality and enhancing drug safety. Recently, many studies have been devoted to effectively predicting drug-ADR incidence rates. However, these methods either did not effectively utilize non-clinical data, i.e., physical, chemical, and biological information about the drug, or did little to establish a link between content-based and pure collaborative filtering during the training phase. In this paper, we first formulate the prediction of multi-label ADRs as a drug-ADR collaborative filtering problem, and to the best of our knowledge, this is the first work to provide extensive benchmark results of previous collaborative filtering methods on two large publicly available clinical datasets. Then, by exploiting the easily accessible drug characteristics from non-clinical data, we propose ADRNet, a generalized collaborative filtering framework combining clinical and non-clinical data for drug-ADR prediction. Specifically, ADRNet has a shallow collaborative filtering module and a deep drug representation module, which can exploit the high-dimensional drug descriptors to further guide the learning of low-dimensional ADR latent embeddings, thereby incorporating the benefits of both collaborative filtering and representation learning. Extensive experiments are conducted on two publicly available real-world drug-ADR clinical datasets and two non-clinical datasets to demonstrate the accuracy and efficiency of the proposed ADRNet. The code is available at https://github.com/haoxuanli-pku/ADRnet.	Chunyuan Zheng, Fuli Feng, Haoxuan Li, Taojun Hu, Xiangnan He, XiaoHua Zhou, Zetong Xiong	Peking Univ, Beijing, Peoples R China; Univ Calif San Diego, San Diego, CA USA; Univ Sci & Technol China, Hefei, Peoples R China; Yale Univ, New Haven, CT USA
118	Using Learnable Physics for Real-Time Exercise Form Recommendations	Good posture and form are essential for safe and productive exercising. Even in gym settings, trainers may not be readily available for feedback. Rehabilitation therapies and fitness workouts can thus benefit from recommender systems that provide real-time evaluation. In this paper, we present an algorithmic pipeline that can diagnose problems in exercise technique and offer corrective recommendations, with high sensitivity and specificity, in real-time. We use MediaPipe for pose recognition, count repetitions using peak-prominence detection, and use a learnable physics simulator to track motion evolution for each exercise. A test video is diagnosed based on deviations from the prototypical learned motion using statistical learning. The system is evaluated on six full and upper body exercises. These real-time recommendations, counseled via low-cost equipment like smartphones, will allow exercisers to rectify potential mistakes, making self-practice feasible while reducing the risk of workout injuries.	Abhishek Jaiswal, Gautam Chauhan, Nisheeth Srivastava	Indian Inst Technol Kanpur, Kanpur, Uttar Pradesh, India
119	ReCon: Reducing Congestion in Job Recommendation using Optimal Transport	Recommender systems may suffer from congestion, meaning that there is an unequal distribution of the items in how often they are recommended. Some items may be recommended much more than others. Recommenders are increasingly used in domains where items have limited availability, such as the job market, where congestion is especially problematic: Recommending a vacancy—for which typically only one person will be hired—to a large number of job seekers may lead to frustration for job seekers, as they may be applying for jobs where they are not hired. This may also leave vacancies unfilled and result in job market inefficiency. We propose a novel approach to job recommendation called ReCon, accounting for the congestion problem. Our approach is to use an optimal transport component to ensure a more equal spread of vacancies over job seekers, combined with a job recommendation model in a multi-objective optimization problem. We evaluated our approach on two real-world job market datasets. The evaluation results show that ReCon has good performance on both congestion-related (e.g., Congestion) and desirability (e.g., NDCG) measures.	Bo Kang, Jefrey Lijffijt, Tijl De Bie, Yoosof Mashayekhi	Univ Ghent, Dept Elect & Informat Syst, IDLAB, Ghent, Belgium
120	Analysis Operations for Constraint-based Recommender Systems	Constraint-based recommender systems support users in the identification of complex items such as financial services and digital cameras (digicams). Such recommender systems enable users to find an appropriate item within the scope of a conversational process. In this context, relevant items are determined by matching user preferences with a corresponding product (item) assortment on the basis of a pre-defined set of constraints. The development and maintenance of constraint-based recommenders is often an error-prone activity – specifically with regard to the scoping of the offered item assortment. In this paper, we propose a set of offline analysis operations (metrics) that provide insights to assess the quality of a constraint-based recommender system before the system is deployed for productive use. The operations include, among others, automated analysis of feature restrictiveness and item (product) accessibility. We analyze usage scenarios of the proposed analysis operations on the basis of a simplified example digicam recommender.	Alexander Felfernig, Sebastian Lubos, Thi Ngoc Trang Tran, VietMan Le	Graz Univ Technol, Inst Software Technol, Graz, Austria
121	Generative Next-Basket Recommendation	Next-basket Recommendation (NBR) refers to the task of predicting a set of items that a user will purchase in the next basket. However, most existing works merely focus on the correlations between user preferences and predicted items, ignoring the essential correlations among items in the next basket, which often results in over-homogenization of predicted items. In this work, we present a Generative next-basket Recommendation model (GenRec), a novel NBR paradigm that generates the recommended items one by one to form the next basket via an autoregressive decoder. This generative NBR paradigm contributes to capturing and considering item correlations inside each basket in both training and serving. Moreover, we jointly consider the user’s item- and basket-level contextual information to better capture the user’s multi-granularity preferences. Extensive experiments on three real-world datasets demonstrate the effectiveness of our model.	JiRong Wen, Junjie Zhang, Leyu Lin, Ruobing Xie, Wayne Xin Zhao, Wenqi Sun	Renmin Univ China, Gaoling Sch Artificial Intelligence, Beijing, Peoples R China; Tencent, WeChat, Beijing, Peoples R China
122	Large Language Model Augmented Narrative Driven Recommendations	Narrative-driven recommendation (NDR) presents an information access problem where users solicit recommendations with verbose descriptions of their preferences and context, for example, travelers soliciting recommendations for points of interest while describing their likes/dislikes and travel circumstances. These requests are increasingly important with the rise of natural language-based conversational interfaces for search and recommendation systems. However, NDR lacks abundant training data for models, and current platforms commonly do not support these requests. Fortunately, classical user-item interaction datasets contain rich textual data, e.g., reviews, which often describe user preferences and context – this may be used to bootstrap training for NDR models. In this work, we explore using large language models (LLMs) for data augmentation to train NDR models. We use LLMs for authoring synthetic narrative queries from user-item interactions with few-shot prompting and train retrieval models for NDR on synthetic queries and user-item interaction data. Our experiments demonstrate that this is an effective strategy for training small-parameter retrieval models that outperform other retrieval and LLM baselines for narrative-driven recommendation.	Andrew McCallum, Hamed Zamani, Sheshera Mysore	Univ Massachusetts, Amherst, MA 01003 USA
123	Extended Conversion: Capturing Successful Interactions in Voice Shopping	Being able to measure the success of online shopping interactions is crucial in order to evaluate and optimize the performance of e-commerce systems. It is especially challenging in the domain of voice shopping, typically supported by voice-based AI assistants. Unlike Web shopping, which offers a rich amount of behavioral signals such as clicks, in voice shopping a non-negligible share of shopping interactions ends without any immediate explicit or implicit user behavioral signal. Moreover, users may start their journey using a voice-enabled device, but complete it elsewhere, for example on their smartphone mobile app or a Web browser. We explore the challenge of measuring successful interactions in voice product search based on users’ behavior, and propose a medium-term reward metric named Extended ConVersion (ECVR). ECVR extends the notion of conversion beyond the usual purchase action, which serves as an undisputed measure of success in e-commerce. More specifically, it also captures purchase actions that occur at a later stage during the same shopping journey, and possibly on a different channel than the one on which the interaction started. In this paper, we formally define the ECVR metric, describe multiple ways of evaluating the quality of a metric, and use these to explore different parameters for ECVR. After selecting the most appropriate parameters, we show that a ranking system optimized for ECVR, set up with these parameters, leads to improvements in long-term engagement and revenue, without compromising immediate conversion gains.	Arnon Lazerson, Elad Haramaty, Liane LewinEytan, Yoelle Maarek, Zohar S. Karnin	Amazon Res, Haifa, Israel
124	Widespread Flaws in Offline Evaluation of Recommender Systems	Even though offline evaluation is just an imperfect proxy of online performance - due to the interactive nature of recommenders - it will probably remain the primary way of evaluation in recommender systems research for the foreseeable future, since the proprietary nature of production recommenders prevents independent validation of A/B test setups and verification of online results. Therefore, it is imperative that offline evaluation setups are as realistic and as flawless as they can be. Unfortunately, evaluation flaws are quite common in recommender systems research nowadays, due to later works copying flawed evaluation setups from their predecessors without questioning their validity. In the hope of improving the quality of offline evaluation of recommender systems, we discuss four of these widespread flaws and why researchers should avoid them.	Balázs Hidasi, Ádám Tibor Czapp	Taboola Co, Grav R&D, Budapest, Hungary
125	Towards Sustainability-aware Recommender Systems: Analyzing the Trade-off Between Algorithms Performance and Carbon Footprint	In this paper, we present a comparative analysis of the trade-off between the performance of state-of-the-art recommendation algorithms and their environmental impact. In particular, we compared 18 popular recommendation algorithms in terms of both performance metrics (i.e., accuracy and diversity of the recommendations) as well as in terms of energy consumption and carbon footprint on three different datasets. In order to obtain a fair comparison, all the algorithms were run based on the implementations available in a popular recommendation library, i.e., RecBole, and used the same experimental settings. The outcomes of the experiments showed that the choice of the optimal recommendation algorithm requires a thorough analysis, since more sophisticated algorithms often led to tiny improvements at the cost of an exponential increase of carbon emissions. Through this paper, we aim to shed light on the problem of carbon footprint and energy consumption of recommender systems, and we make the first step towards the development of sustainability-aware recommendation algorithms.	Allegra De Filippo, Cataldo Musto, Giovanni Semeraro, Giuseppe Spillo, Michela Milano	Univ Bari Aldo Moro, Bari, Italy; Univ Bologna, Bologna, Italy
126	CR-SoRec: BERT driven Consistency Regularization for Social Recommendation	In the real world, when we seek our friends’ opinions on various items or events, we request verbal social recommendations. It has been observed that we often turn to our friends for recommendations on a daily basis. The emergence of online social platforms has enabled users to share their opinion with their social connections. Therefore, we should consider users’ social connections to enhance online recommendation performance. The social recommendation aims to fuse social links with user-item interactions to offer more relevant recommendations. Several efforts have been made to develop an effective social recommendation system. However, there are two significant limitations to current methods: First, they haven’t thoroughly explored the intricate relationships between the diverse influences of neighbours on users’ preferences. Second, existing models are vulnerable to overfitting due to the relatively low number of user-item interaction records in the interaction space. For the aforementioned problems, this paper offers a novel framework called CR-SoRec, an effective recommendation model based on BERT and consistency regularization. This model incorporates Bidirectional Encoder Representations from Transformers (BERT) to learn bidirectional context-aware user and item embeddings with neighbourhood sampling. The neighbourhood sampling technique samples the most influential neighbours for all the users/items. Further, to effectively use the available user-item interaction data and social ties, we leverage diverse perspectives via consistency regularization to harness the underlying information. The main objective of our model is to predict the next item that a user would interact with based on its interaction behaviour and social connections. Experimental results show that our model defines a new state-of-the-art on various datasets and outperforms previous work by a significant margin. Extensive experiments are also conducted to analyze the proposed method.	Brijraj Singh, Naoyuki Onoe, Raksha Jalan, Tushar Prakash	Sony Res India, Bangalore, India; Sony, Tokyo, Japan
127	Interface Design to Mitigate Inflation in Recommender Systems	Recommendation systems rely on user-provided data to learn about item quality and provide personalized recommendations. An implicit assumption when aggregating ratings into item quality is that ratings are strong indicators of item quality. In this work, we test this assumption using data collected from a music discovery application. Our study focuses on two factors that cause rating inflation: heterogeneous user rating behavior and the dynamics of personalized recommendations. We show that user rating behavior substantially varies by user, leading to item quality estimates that reflect the users who rated an item more than the item quality itself. Additionally, items that are more likely to be shown via personalized recommendations can experience a substantial increase in their exposure and potential bias toward them. To mitigate these effects, we analyze the results of a randomized controlled trial in which the rating interface was modified. The test resulted in a substantial improvement in user rating behavior and a reduction in item quality inflation. These findings highlight the importance of carefully considering the assumptions underlying recommendation systems and designing interfaces that encourage accurate rating behavior.	Nikhil Garg, Rana Shahout, Sasha Stoikov, Yehonatan Peisakhovsky	Cornell Tech, New York, NY USA; Harvard Univ, Cambridge, MA 02138 USA; Technion, Haifa, Israel
128	Towards Self-Explaining Sequence-Aware Recommendation	Self-explaining models are becoming an important perk of recommender systems, as they help users understand the reason behind certain recommendations, which encourages them to interact more often with the platform. In order to personalize recommendations, modern approaches make the model aware of the user behavior history for interest evolution representation. However, existing explainable recommender systems do not consider the past user history to further personalize the explanation based on the user interest fluctuation. In this work, we propose a SEQuence-Aware Explainable Recommendation model (SEQUER) that is able to leverage the sequence of user-item review interactions to generate better explanations while maintaining recommendation performance. Experiments validate the effectiveness of our proposal on multiple recommendation scenarios. Our source code and preprocessed datasets are available at https://github.com/alarca94/sequer-recsys23.	Alejandro ArizaCasabona, Gianni Fenu, Ludovico Boratto, Maria Salamó	Univ Barcelona, CLiC UBICS, Barcelona, Spain; Univ Cagliari, Cagliari, Italy
129	Ti-DC-GNN: Incorporating Time-Interval Dual Graphs for Recommender Systems	Recommender systems are essential for personalized content delivery and have become increasingly popular recently. However, traditional recommender systems are limited in their ability to capture complex relationships between users and items. Dynamic graph neural networks (DGNNs) have recently emerged as a promising solution for improving recommender systems by incorporating temporal and sequential information in dynamic graphs. In this paper, we propose a novel method, "Ti-DC-GNN" (Time-Interval Dual Causal Graph Neural Networks), based on an intermediate representation of graph evolution as a sequence of time-interval graphs. The main parts of the method are the novel forms of interval graphs: graph of causality and graph of consequence that explicitly preserve inter-relationships between edges (user-items interactions). The local and global message passing are developed based on edge memory to identify short-term and long-term dependencies. Experiments on several well-known datasets show that our method consistently outperforms modern temporal GNNs with node memory alone in dynamic edge prediction tasks.	Andrey V. Savchenko, Dmitrii Kiselev, Ilya Makarov, Ivan Kireev, Maria Ivanova, Nikita Severin	Artificial Intelligence Res Inst AIRI, Moscow, Russia; HSE Univ, Moscow, Russia; Sber AI Lab, Moscow, Russia
130	Of Spiky SVDs and Music Recommendation	The truncated singular value decomposition is a widely used methodology in music recommendation for direct similar-item retrieval and downstream tasks embedding musical items. This paper investigates a curious effect that we show naturally occurring on many recommendation datasets: spiking formations in the embedding space. We first propose a metric to quantify this spiking organization’s strength, then mathematically prove its origin tied to underlying communities of items of varying internal popularity. With this new-found theoretical understanding, we finally open the topic with an industrial use case of estimating how music embeddings’ top-k similar items will change over time under the addition of data.	Darius Afchar, Romain Hennequin, Vincent Guigue	Deezer Res, Paris, France; Sorbonne Univ, AgroParisTech, MLIA, Paris, France
131	Topic-Level Bayesian Surprise and Serendipity for Recommender Systems	A recommender system that optimizes its recommendations solely to fit a user's history of ratings for consumed items can create a filter bubble, wherein the user does not get to experience items from novel, unseen categories. One approach to mitigate this undesired behavior is to recommend items with high potential for serendipity, namely surprising items that are likely to be highly rated. In this paper, we propose a content-based formulation of serendipity that is rooted in Bayesian surprise and use it to measure the serendipity of items after they are consumed and rated by the user. When coupled with a collaborative-filtering component that identifies similar users, this enables recommending items with high potential for serendipity. To facilitate the evaluation of topic-level models for surprise and serendipity, we introduce a dataset of book reading histories extracted from Goodreads, containing over 26 thousand users and close to 1.3 million books, where we manually annotate 449 books read by 4 users in terms of their time-dependent, topic-level surprise. Experimental evaluations show that models that use Bayesian surprise correlate much better with the manual annotations of topic-level surprise than distance-based heuristics, and also obtain better serendipitous item recommendation performance.	Razvan Bunescu, Tonmoy Hasan	Univ North Carolina Charlotte, Charlotte, NC 28223 USA
132	Stability of Explainable Recommendation	Explainable Recommendation has been gaining attention over the last few years in industry and academia. Explanations provided along with recommendations in a recommender system framework have many uses: particularly reasoning why a suggestion is provided and how well an item aligns with a user’s personalized preferences. Hence, explanations can play a huge role in influencing users to purchase products. However, the reliability of the explanations under varying scenarios has not been strictly verified from an empirical perspective. Unreliable explanations can bear strong consequences such as attackers leveraging explanations for manipulating and tempting users to purchase target items that the attackers would want to promote. In this paper, we study the vulnerability of existent feature-oriented explainable recommenders, particularly analyzing their performance under different levels of external noises added into model parameters. We conducted experiments by analyzing three important state-of-the-art (SOTA) explainable recommenders when trained on two widely used e-commerce based recommendation datasets of different scales. We observe that all the explainable models are vulnerable to increased noise levels. Experimental results verify our hypothesis that the ability to explain recommendations does decrease along with increasing noise levels and particularly adversarial noise does contribute to a much stronger decrease. Our study presents an empirical verification on the topic of robust explanations in recommender systems which can be extended to different types of explainable recommenders in RS.	Prasant Mohapatra, Sairamvinay Vijayaraghavan	Univ Calif Davis, Davis, CA 95616 USA
133	Is ChatGPT Fair for Recommendation? Evaluating Fairness in Large Language Model Recommendation	The remarkable achievements of Large Language Models (LLMs) have led to the emergence of a novel recommendation paradigm -- Recommendation via LLM (RecLLM). Nevertheless, it is important to note that LLMs may contain social prejudices, and therefore, the fairness of recommendations made by RecLLM requires further investigation. To avoid the potential risks of RecLLM, it is imperative to evaluate the fairness of RecLLM with respect to various sensitive attributes on the user side. Due to the differences between the RecLLM paradigm and the traditional recommendation paradigm, it is problematic to directly use the fairness benchmark of traditional recommendation. To address the dilemma, we propose a novel benchmark called Fairness of Recommendation via LLM (FaiRLLM). This benchmark comprises carefully crafted metrics and a dataset that accounts for eight sensitive attributes in two recommendation scenarios: music and movies. By utilizing our FaiRLLM benchmark, we conducted an evaluation of ChatGPT and discovered that it still exhibits unfairness to some sensitive attributes when generating recommendations. Our code and dataset can be found at https://github.com/jizhi-zhang/FaiRLLM.	Fuli Feng, Jizhi Zhang, Keqin Bao, Wenjie Wang, Xiangnan He, Yang Zhang	Natl Univ Singapore, Singapore, Singapore; Univ Sci & Technol China, Hefei, Peoples R China
134	TALLRec: An Effective and Efficient Tuning Framework to Align Large Language Model with Recommendation	Large Language Models (LLMs) have demonstrated remarkable performance across diverse domains, thereby prompting researchers to explore their potential for use in recommendation systems. Initial attempts have leveraged the exceptional capabilities of LLMs, such as rich knowledge and strong generalization through In-context Learning, which involves phrasing the recommendation task as prompts. Nevertheless, the performance of LLMs in recommendation tasks remains suboptimal due to a substantial disparity between the training tasks for LLMs and recommendation tasks, as well as inadequate recommendation data during pre-training. To bridge the gap, we consider building a Large Recommendation Language Model by tuning LLMs with recommendation data. To this end, we propose an efficient and effective Tuning framework for Aligning LLMs with Recommendations, namely TALLRec. We have demonstrated that the proposed TALLRec framework can significantly enhance the recommendation capabilities of LLMs in the movie and book domains, even with a limited dataset of fewer than 100 samples. Additionally, the proposed framework is highly efficient and can be executed on a single RTX 3090 with LLaMA-7B. Furthermore, the fine-tuned LLM exhibits robust cross-domain generalization. Our code and data are available at https://github.com/SAI990323/TALLRec.	Fuli Feng, Jizhi Zhang, Keqin Bao, Wenjie Wang, Xiangnan He, Yang Zhang	Natl Univ Singapore, Singapore, Singapore; Univ Sci & Technol China, Hefei, Peoples R China
135	Station and Track Attribute-Aware Music Personalization	We present a transformer for music personalization that recommends tracks given a station seed (artist) and improves the accuracy vs. a baseline matrix factorization method by 10%. Adding additional embeddings to capture track and station attributes further improves the accuracy of our recommendations by an additional 1% while also improving recommendation diversity, i.e. mitigating popularity bias. We analyze the learned embeddings and find they learn both explicit attributes provided at training and implicit attributes that may inform listener preferences. We also find that incorporating the station context of user feedback helps the model identify and transfer relevant listener preferences across different genres and artists. This particularly helps with music discovery on new stations.	Andreas F. Ehmann, M. Jeffrey Mei, Oliver Bembom	SiriusXM Radio Inc, New York, NY USA
136	Delivery Hero Recommendation Dataset: A Novel Dataset for Benchmarking Recommendation Algorithms	In this paper we propose Delivery Hero Recommendation Dataset (DHRD), a novel real-world dataset for researchers. DHRD comprises over a million food delivery orders from three distinct cities, encompassing thousands of vendors and an extensive range of dishes, serving a combined customer base of over a million individuals. We discuss the challenges associated with such real-world datasets. By releasing DHRD, researchers are empowered with a valuable resource for building and evaluating recommender systems, paving the way for advancements in this domain.	Christian Klaue, Luke Bovard, Raghav Bali, Yernat Assylbekov	Delivery Hero, Berlin, Germany
137	Creating the next generation of news experience on ekstrabladet.dk with recommender systems	With the rise of algorithmic personalization, news organizations are finding it necessary to entrust traditionally held editorial values, such as prioritizing news for readers, to automated systems. In a case study conducted by Ekstra Bladet, the Platform Intelligent News project demonstrates how recommender systems successfully improved the click-through rates for various segments on ekstrabladet.dk, while still maintaining the news organization’s editorial values.	Jes Frellsen, Johannes Kruse, Kasper Lindskow, Michael Riis Andersen	Ekstra Bladet, Copenhagen, Denmark; Tech Univ Denmark, Lyngby, Denmark
138	Leveling Up the Peloton Homescreen: A System and Algorithm for Dynamic Row Ranking	At Peloton, we constantly strive to improve the member experience by highlighting personalized content that speaks to each individual user. One area of focus is our landing page, the homescreen, consisting of numerous rows of class recommendations used to captivate our users and guide them through our growing catalog of workouts. In this paper, we discuss a strategy we have used to increase the rate of workouts started from our homescreen through a Thompson sampling approach to row ranking. We also explore a potential improvement with a collaborative filtering method based on user similarity calculated from workout history.	Alexey Zankevich, Natalia Chen, Nilothpal Talukder, Oinam Nganba Meetei	Peloton Interact, New York, NY 10001 USA
139	Uncovering ChatGPT's Capabilities in Recommender Systems	The debut of ChatGPT has recently attracted significant attention from the natural language processing (NLP) community and beyond. Existing studies have demonstrated that ChatGPT shows significant improvement in a range of downstream NLP tasks, but the capabilities and limitations of ChatGPT in terms of recommendations remain unclear. In this study, we aim to enhance ChatGPT’s recommendation capabilities by aligning it with traditional information retrieval (IR) ranking capabilities, including point-wise, pair-wise, and list-wise ranking. To achieve this goal, we re-formulate the aforementioned three recommendation policies into prompt formats tailored specifically to the domain at hand. Through extensive experiments on four datasets from different domains, we analyze the distinctions among the three recommendation policies. Our findings indicate that ChatGPT achieves an optimal balance between cost and performance when equipped with list-wise ranking. This research sheds light on a promising direction for aligning ChatGPT with recommendation tasks. To facilitate further explorations in this area, the full code and detailed original results are open-sourced at https://github.com/rainym00d/LLM4RS.	Chen Xu, Haiyuan Zhao, Jun Xu, Ninglu Shao, Sunhao Dai, Weijie Yu, Xiao Zhang, Zhongxiang Sun, Zihua Si	Renmin Univ China, Gaoling Sch Artificial Intelligence, Beijing, Peoples R China; Renmin Univ China, Sch Informat, Beijing, Peoples R China; Univ Int Business & Econ, Sch Informat Technol & Management, Beijing, Peoples R China
140	Continual Collaborative Filtering Through Gradient Alignment	A recommender system operates in a dynamic environment where new items emerge and new users join the system, resulting in ever-growing user-item interactions over time. Existing works either assume a model trained offline on a static dataset (requiring periodic re-training with ever larger datasets); or an online learning setup that favors recency over history. As privacy-aware users could hide their histories, the loss of older information means that periodic retraining may not always be feasible, while online learning may lose sight of users’ long-term preferences. In this work, we adopt a continual learning perspective to collaborative filtering, by compartmentalizing users and items over time into a notion of tasks. Of particular concern is to mitigate catastrophic forgetting that occurs when the model would reduce performance for older users and items in prior tasks even as it tries to fit the newer users and items in the current task. To alleviate this, we propose a method that leverages gradient alignment to deliver a model that is more compatible across tasks and maximizes user agreement for better user representations to improve long-term recommendations.	Hady W. Lauw, Jaime Hieu Do	Singapore Management Univ, Singapore, Singapore
141	Broadening the Scope: Evaluating the Potential of Recommender Systems beyond prioritizing Accuracy	Although beyond-accuracy metrics have gained attention in the last decade, the accuracy of recommendations is still considered the gold standard to evaluate Recommender Systems (RSs). This approach prioritizes the accuracy of recommendations, neglecting the quality of suggestions to enhance user needs, such as diversity and novelty, as well as trustworthiness regulations in RSs for user and provider fairness. As a result, single metrics determine the success of RSs, but this approach fails to consider other criteria simultaneously. A downside of this method is that the most accurate model configuration may not excel in addressing the remaining criteria. This study seeks to broaden RS evaluation by introducing a multi-objective evaluation that considers all model configurations simultaneously under several perspectives. To achieve this, several hyper-parameter configurations of an RS model are trained, and the Pareto-optimal ones are retrieved. The Quality Indicators (QI) of Pareto frontiers, which are gaining interest in Multi-Objective Optimization research, are adapted to RSs. QI enables evaluating the model’s performance by considering various configurations and giving the same importance to each metric. The experiments show that this multi-objective evaluation overturns the ranking of performance among RSs, paving the way to revisit the evaluation approaches of the RecSys research community. We release codes and datasets in the following GitHub repository: https://github.com/sisinflab/RecMOE.	Dario Di Palma, Tommaso Di Noia, Vincenzo Paparella, Vito Walter Anelli	Politecn Bari, Bari, Italy
142	Climbing crags repetitive choices and recommendations	Outdoor sport climbing in Northern Italy attracts climbers from around the world. While this country has many rock formations, it offers enormous possibilities for adventurous people to explore the mountains. Unfortunately, this great potential causes a problem in finding suitable destinations (crags) to visit for climbing activity. Existing recommender systems in this domain address this issue and suggest potentially interesting items to climbers utilizing a content-based approach. These systems understand users’ preferences from past logs recorded in an electronic training diary. At the same time, some sports people have a behavioral tendency to revisit the same place for subjective reasons. It might be related to weather and seasonality (for instance, some crags are suitable for climbing in winter/summer only), the users’ preferences (when climbers like specific destinations more than others), or personal goals to be achieved in sport (when climbers plan to try some routes again). Unfortunately, current climbing crags recommendations do not adapt when users demonstrate these repetitive behavior patterns. Sequential recommender systems can capture such users’ habits since their architectures were designed to model users’ next item choice by learning from their previous decision manners. To understand to which extent these sequential recommendations can predict the following crags choices in sport climbing, we analyzed a scenario when climbers show repetitious decisions. Further, we present a data set from collected climbers’ e-logs in the Arco region (Italy) and applied several sequential recommender systems architectures for predicting climbers’ following crags’ visits from their past logs. We evaluated these recommender systems offline and compared ranking metrics with the other reported results on the different data sets. The work concludes that sequential models obtain comparably accurate results as in the studies conducted in the field of sequential recommender systems. Hence, it has the prospect for outdoor sport climbers’ subsequent visits prediction and recommendations.	Iustina Ivanova
143	Towards Health-Aware Fairness in Food Recipe Recommendation	Food recommendation systems play a crucial role in promoting personalized recommendations designed to help users find food and recipes that align with their preferences. However, many existing food recommendation systems have overlooked the important aspect of healthy-food and nutritional value of recommended foods, thereby limiting their effectiveness in generating truly healthy recommendations. Our preliminary analysis indicates that users tend to respond positively to unhealthy food and recipes. As a result, existing food recommender systems that neglect health considerations often assign high scores to popular items, inadvertently encouraging unhealthy choices among users. In this study, we propose the development of a fairness-based model that prioritizes health considerations. Our model incorporates fairness constraints from both the user and item perspectives, integrating them into a joint objective framework. Experimental results conducted on real-world food datasets demonstrate that the proposed system not only maintains the ability of food recommendation systems to suggest users’ favorite foods but also improves the health factor compared to unfair models, with an average enhancement of approximately 35%.	Mehrdad Rostami, Mohammad Aliannejadi, Mourad Oussalah	Univ Amsterdam, IRLab, Amsterdam, Netherlands; Univ Oulu, Ctr Machine Vis & Signal Anal CMVS, Oulu, Finland
144	Localify.org: Locally-focused Music Artist and Event Recommendation	Cities with strong local music scenes enjoy many social and economic benefits. To this end, we are interested in developing a locally-focused artist and event recommendation system called Localify.org that supports and promotes local music scenes. In this demo paper, we describe both the overall system architecture as well as our core recommendation algorithm. This algorithm uses artist-artist similarity information, as opposed to user-artist preference information, to bootstrap recommendation while we grow the number of users. The overall design of Localify was chosen based on the fact that local artists tend to be relatively obscure and reside in the long tail of the artist popularity distribution. We discuss the role of popularity bias and how we attempt to ameliorate it in the context of local music recommendation.	April Trainor, Cassandra Raineault, Douglas R. Turnbull, Douglas Turnbull, Elizabeth Richards, Kieran Bentley, Paul Gagliano, Thorsten Joachims, Victoria Conrad	Ithaca Coll, Ithaca, NY 14850 USA
145	Re2Dan: Retrieval of Medical Documents for e-Health in Danish	With the clinical environment becoming more data-reliant, healthcare professionals now have unparalleled access to comprehensive clinical information from numerous sources. Then, one of the main issues is how to avoid overloading practitioners with large amounts of (irrelevant) information while guiding them to the relevant documents for specific patient cases. Additional challenges appear due to the shortness of queries and the presence of long (and maybe noisy) contextual information. This demo presents Re2Dan, a web Retrieval and recommender of Danish medical documents. Re2Dan leverages several techniques to improve the quality of retrieved documents. First, it combines lexical and semantic searches to understand the meaning and context of user queries, allowing the retrieval of documents that are conceptually similar to the user’s query. Second, it recommends similar queries, allowing users to discover related documents and insights. Third, when given contextual information (e.g., from patients’ clinical notes), it suggests medical concepts to expand the user query, enabling a more focused search scope and thus obtaining more accurate recommendations. Preliminary analyses showed the effectiveness of the recommender in improving the relevance and comprehensiveness of recommendations, thereby assisting healthcare professionals in finding relevant information for informed decision-making.	Antonela Tommasel, Ira Assent, Rafael PablosSarabia	Aarhus Univ, Dept Comp Sci, DIGIT Aarhus Univ Ctr Digitalisat Big Data & Data, Aarhus, Denmark
146	Introducing LensKit-Auto, an Experimental Automated Recommender System (AutoRecSys) Toolkit	LensKit is one of the first and most popular Recommender System libraries. While LensKit offers a wide variety of features, it does not include any optimization strategies or guidelines on how to select and tune LensKit algorithms. LensKit developers have to manually include third-party libraries into their experimental setup or implement optimization strategies by hand to optimize hyperparameters. We found that 63.6% (21 out of 33) of papers using LensKit algorithms for their experiments did not select algorithms or tune hyperparameters. Non-optimized models represent poor baselines and produce less meaningful research results. This demo introduces LensKit-Auto. LensKit-Auto automates the entire Recommender System pipeline and enables LensKit developers to automatically select, optimize, and ensemble LensKit algorithms.	Joeran Beel, Michael Ekstrand, Tobias Vente	Boise State Univ, Boise, ID USA; Univ Siegen, Intelligent Syst Grp, Siegen, Germany
147	Tutorial on Large Language Models for Recommendation	Foundation Models such as Large Language Models (LLMs) have significantly advanced many research areas. In particular, LLMs offer significant advantages for recommender systems, making them valuable tools for personalized recommendations. For example, by formulating various recommendation tasks such as rating prediction, sequential recommendation, straightforward recommendation, and explanation generation into language instructions, LLMs make it possible to build universal recommendation engines that can handle different recommendation tasks. Additionally, LLMs have a remarkable capacity for understanding natural language, enabling them to comprehend user preferences, item descriptions, and contextual information to generate more accurate and relevant recommendations, leading to improved user satisfaction and engagement. This tutorial introduces Foundation Models such as LLMs for recommendation. We will introduce how recommender systems advanced from shallow models to deep models and to large models, how LLMs enable generative recommendation in contrast to traditional discriminative recommendation, and how to build LLM-based recommender systems. We will cover multiple perspectives of LLM-based recommendation, including data preparation, model design, model pre-training, fine-tuning and prompting, multi-modality and multi-task learning, as well as trustworthy perspectives of LLM-based recommender systems such as fairness and transparency.	Lei Li, Li Chen, Shuyuan Xu, Wenyue Hua, Yongfeng Zhang	Hong Kong Baptist Univ, Dept Comp Sci, Hong Kong, Peoples R China; Rutgers State Univ, Dept Comp Sci, New Brunswick, NJ 08854 USA
148	On Challenges of Evaluating Recommender Systems in an Offline Setting	In the past 20 years, the area of Recommender Systems (RecSys) has gained significant attention from both academia and industry. There is no shortage of research papers on various RecSys models or online systems from industry players. However, in terms of model evaluation in offline settings, many researchers simply follow the commonly adopted experiment setup, and have not zoomed into the unique characteristics of the RecSys problem. In this tutorial, I will briefly review the commonly adopted evaluations in RecSys, then discuss the challenges of evaluating recommender systems in an offline setting. The main emphasis is the consideration of the global timeline in the evaluation, particularly when a dataset covers user-item interactions that have been collected over a long time period.	Aixin Sun	Nanyang Technol Univ, Singapore, Singapore
149	Trustworthy Recommender Systems: Technical, Ethical, Legal, and Regulatory Perspectives	This tutorial provides an interdisciplinary overview of the topics of fairness, non-discrimination, transparency, privacy, and security in the context of recommender systems. These are important dimensions of trustworthy AI systems according to European policies, but also extend to the global debate on regulating AI technology. Since we strongly believe that the aforementioned aspects require more than merely technical considerations, we discuss these topics also from ethical, legal, and regulatory points of view, intertwining different perspectives. The main focus of the tutorial is still on presenting technical solutions that aim at addressing the mentioned topics of trustworthiness. In addition, the tutorial equips the mostly technical audience of RecSys with the necessary understanding of the social and ethical implications of their research and development, and of recent ethical guidelines and regulatory frameworks.	Elisabeth Lex, Markus Schedl, Vito Walter Anelli	Graz Univ Technol, Graz, Austria; Johannes Kepler Univ Linz, Linz, Austria; Politecn Bari, Bari, Italy
150	Customer Lifetime Value Prediction: Towards the Paradigm Shift of Recommender System Objectives	The ultimate goal of recommender systems is satisfying users’ information needs in the long term. Despite the success of current recommendation techniques in targeting user interest, optimizing long-term user engagement and platform revenue is still challenging due to the restriction of optimization objectives such as clicks, ratings, and dwell time. Customer lifetime value (LTV) reflects the total monetary value of a customer to a business over the course of their relationship. Accurate LTV prediction can guide personalized service providers to optimize their marketing, sales, and service strategies to maximize customer retention, satisfaction, and profitability. However, the extreme sparsity, volatility, and randomness of consumption behaviors make LTV prediction rather intricate and challenging. In this tutorial, we give a detailed introduction to the key technologies and problems in LTV prediction. We present a systematic technique chronicle of LTV prediction over decades, including probabilistic models, traditional machine learning methods, and deep learning techniques. Based on this overview, we introduce several critical challenges in algorithm design, performance evaluation and system deployment from an industrial perspective, from which we derive potential directions for future exploration. From this tutorial, the RecSys community can gain a better understanding of the unique characteristics and challenges of LTV prediction, and it may serve as a catalyst to shift the focus of recommender systems from short-term targets to long-term ones.	Chuhan Wu, Qinglin Jia, Ruiming Tang, Zhenhua Dong	Huawei, Noahs Ark Lab, Beijing, Peoples R China; Huawei, Noahs Ark Lab, Shenzhen, Peoples R China
151	Knowledge-Aware Recommender Systems based on Multi-Modal Information Sources	The last few years showed a growing interest in the design and development of Knowledge-Aware Recommender Systems (KARSs). This is mainly due to their capability in encoding and exploiting several data sources, both structured (such as knowledge graphs) and unstructured (such as plain text). Nowadays, many state-of-the-art KARS models use deep learning, enabling them to exploit large amounts of information, including knowledge graphs (KGs), user reviews, plain text, and multimedia content (pictures, audio, videos). In my Ph.D. I will follow this research trend and I will explore and study techniques for designing KARSs leveraging representations learnt from multi-modal information sources, in order to provide users with fair, accurate, and explainable recommendations.	Giuseppe Spillo	Univ Bari Aldo Moro, Dept Comp Sci, Bari, Italy
152	Explainable Graph Neural Network Recommenders; Challenges and Opportunities	Graph Neural Networks (GNNs) have demonstrated significant potential in recommendation tasks by effectively capturing intricate connections among users, items, and their associated features. Given the escalating demand for interpretability, current research endeavors in the domain of GNNs for Recommender Systems (RecSys) necessitate the development of explainer methodologies to elucidate the decision-making process underlying GNN-based recommendations. In this work, we aim to present our research focused on techniques to extend beyond the existing approaches for addressing interpretability in GNN-based RecSys.	Amir Reza Mohammadi	Univ Innsbruck, Innsbruck, Austria
153	Overcoming Recommendation Limitations with Neuro-Symbolic Integration	Despite being studied for over twenty years, Recommender Systems (RSs) still suffer from important issues that limit their applicability in real-world scenarios. Data sparsity, cold start, and explainability are some of the most impacting problems. Intuitively, these historical limitations can be mitigated by injecting prior knowledge into recommendation models. Neuro-Symbolic (NeSy) approaches are suitable candidates for achieving this goal. Specifically, they aim to integrate learning (e.g., neural networks) with symbolic reasoning (e.g., logical reasoning). Generally, the integration lets a neural model interact with a logical knowledge base, enabling reasoning capabilities. In particular, NeSy approaches have been shown to deal well with poor training data, and their symbolic component could enhance model transparency. This gives insights that NeSy systems could potentially mitigate the aforementioned RSs limitations. However, the application of such systems to RSs is still in its early stages, and most of the proposed architectures do not really exploit the advantages of a NeSy approach. To this end, we conducted preliminary experiments with a Logic Tensor Network (LTN), a novel NeSy framework. We used the LTN to train a vanilla Matrix Factorization model using a First-Order Logic knowledge base as an objective. In particular, we encoded facts to enable the regularization of the latent factors using content information, obtaining promising results. In this paper, we review existing NeSy recommenders, argue about their limitations, show our preliminary results with the LTN, and propose interesting future works in this novel research area. In particular, we show how the LTN can be intuitively used to regularize models, perform cross-domain recommendation, ensemble learning, and explainable recommendation, reduce popularity bias, and easily define the loss function of a model.	Tommaso Carraro	Univ Padua, Dept Math, Padua, Italy
154	Improving Recommender Systems Through the Automation of Design Decisions	Recommender systems developers are constantly faced with difficult design decisions. Additionally, the number of options that a recommender systems developer has to consider continually grows over time with new innovations. The machine learning community is in a similar situation and has come together to tackle the problem. They invented concepts and tools to make machine learning development both easier and faster. These developments are categorized as automated machine learning (AutoML). As a result, the AutoML community formed and continuously innovates new approaches. Inspired by AutoML, the recommender systems community has recently understood the need for automation and sparsely introduced AutoRecSys. The goal of AutoRecSys is not to replace recommender systems developers but to improve performance through the automation of design decisions. With AutoRecSys, recommender systems engineers do not have to focus on easy but time-consuming tasks and are free to pursue difficult engineering tasks instead. Additionally, AutoRecSys enables easier access to recommender systems for beginners as it reduces the amount of knowledge required to get started with the development of recommender systems. AutoRecSys, like AutoML, is still early in its development and does not yet cover the whole development pipeline. Additionally, it is not yet clear, under which circumstances AutoML approaches can be transferred to recommender systems. Our research intends to close this gap by improving AutoRecSys both with regard to the transfer of AutoML and novel approaches. Furthermore, we focus specifically on the development of novel automation approaches for data processing and training. We note that the realization of AutoRecSys is going to be a community effort. Our part in this effort is to research AutoRecSys fundamentals, build practical tools for the community, raise awareness of the advantages of automation, and catalyze AutoRecSys development.	Lukas Wegmeth	Univ Siegen, Siegen, Germany
155	Acknowledging Dynamic Aspects of Trust in Recommender Systems	Trust-based recommender systems emerged as a solution to different limitations of traditional recommender systems. These social systems rely on the assumption that users will adopt the preferences of users they deem trustworthy in an online social setting. However, most trust-based recommender systems consider trust to be a static notion, thereby disregarding crucial dynamic factors that influence the value of trust between users and the performance of the recommender system. In this work, we intend to address several challenges regarding the dynamics of trust within a social recommender system. These issues include the temporal evolution of trust between users and change detection and prediction in users’ interactions. By exploring the factors that influence the evolution of human trust, a complex and abstract concept, this work will contribute to a better understanding of how trust operates in recommender systems.	Imane Akdim	Mohammed VI Polytech Univ, Sch Comp Sci, Ben Guerir, Morocco
156	Denoising Explicit Social Signals for Robust Recommendation	Social recommender systems assume that users’ preferences can be influenced by their social connections. However, social networks are inherently noisy and contain redundant signals that are not helpful or even harmful for the recommendation task. In this extended abstract, we classify the noise in the explicit social links into intrinsic noise and extrinsic noise. Intrinsic noises are those edges that are natural in the social network but do not have an influence on the user preference modeling; extrinsic noises, on the other hand, are those social links that are introduced intentionally through malicious attacks such that the attackers can manipulate the social influence to bias the recommendation outcome. To tackle this issue, we first propose a self-supervised denoising framework that learns to filter out the noisy social edges. Specifically, we introduce the influence of key opinion leaders to hinder the diffusion of noisy signals and also function as an extra source to enhance user preference modeling and alleviate the data sparsity issue. Experiments will be conducted on real-world datasets for the Top-K ranking evaluation as well as the model’s robustness to simulated social noises. Finally, we discuss the future plan about how to defend against extrinsic noise from the attacker’s perspective through adversarial training.	Youchen Sun	Nanyang Technol Univ, Singapore, Singapore
157	Advancing Automation of Design Decisions in Recommender System Pipelines	Recommender systems have become essential in domains like streaming services, social media platforms, and e-commerce websites. However, the development of a recommender system involves a complex pipeline with preprocessing, data splitting, algorithm and model selection, and postprocessing stages. Every stage of the recommender systems pipeline requires design decisions that influence the performance of the recommender system. To ease design decisions, automated machine learning (AutoML) techniques have been adapted to the field of recommender systems, resulting in various AutoRecSys libraries. Nevertheless, these libraries limit flexibility in integrating automation techniques. In response, our research aims to enhance the usability of AutoML techniques for design decisions in recommender system pipelines. We focus on developing flexible and library-independent automation techniques for algorithm selection, model selection, and postprocessing steps. By enabling developers to make informed choices and easing the recommender system development process, we decrease the developer’s effort while improving the performance of the recommender systems. Moreover, we want to analyze the cost-to-benefit ratio of automation techniques in recommender systems, evaluating the computational overhead and the resulting improvements in predictive performance. Our objective is to leverage AutoML concepts to automate design decisions in recommender system pipelines, reduce manual effort, and enhance the overall performance and usability of recommender systems.	Tobias Vente	Univ Siegen, Siegen, NRW, Germany
158	Demystifying Recommender Systems: A Multi-faceted Examination of Explanation Generation, Impact, and Perception	Extended abstract. RecSys '23: Proceedings of the 17th ACM Conference on Recommender Systems, September 2023, pages 1361–1363. https://doi.org/10.1145/3604915.3608887. Published 14 September 2023.	Giacomo Balloccu	Univ Cagliari, Dept Math & Comp Sci, Cagliari, Sardinia, Italy
159	Enhanced Privacy Preservation for Recommender Systems	Extended abstract. RecSys '23: Proceedings of the 17th ACM Conference on Recommender Systems, September 2023, pages 1364–1368. https://doi.org/10.1145/3604915.3608888. Published 14 September 2023.	Ziqing Wu	Nanyang Technol Univ, Singapore, Singapore
160	Incentivizing Exploration in Linear Contextual Bandits under Information Gap	Contextual bandit algorithms have been popularly used to address interactive recommendation, where the users are assumed to be cooperative to explore all recommendations from a system. In this paper, we relax this strong assumption and study the problem of incentivized exploration with myopic users, where the users are only interested in recommendations with their currently highest estimated reward. As a result, in order to obtain long-term optimality, the system needs to offer compensation to incentivize the users to take the exploratory recommendations. We consider a new and practically motivated setting where the context features employed by the user are more informative than those used by the system: for example, features based on users’ private information are not accessible by the system. We develop an effective solution for incentivized exploration under such an information gap, and prove that the method achieves a sublinear rate in both regret and compensation. We theoretically and empirically analyze the added compensation due to the information gap, compared with the case where the system has access to the same context features as the user does, i.e., without information gap. Moreover, we also provide a compensation lower bound of this problem.	Chuanhao Li, Haifeng Xu, Hongning Wang, Huazheng Wang, Zhiyuan Liu
161	Ex2Vec: Characterizing Users and Items from the Mere Exposure Effect	The traditional recommendation framework seeks to connect user and content by finding the best match possible based on users' past interactions. However, a good content recommendation is not necessarily similar to what the user has chosen in the past. As humans, users naturally evolve, learn, forget, and get bored; they change their perspective of the world and, in consequence, of the recommendable content. One well-known mechanism that affects user interest is the Mere Exposure Effect: when repeatedly exposed to stimuli, users' interest tends to rise with the initial exposures, reaching a peak, and gradually decreasing thereafter, resulting in an inverted-U shape. Since previous research has shown that the magnitude of the effect depends on a number of interesting factors such as stimulus complexity and familiarity, leveraging this effect is a way to not only improve repeated recommendation but to gain a more in-depth understanding of both users and stimuli. In this work we present (Mere) Exposure2Vec (Ex2Vec), our model that leverages the Mere Exposure Effect in repeat consumption to derive user and item characterization and track user interest evolution. We validate our model through predicting future music consumption based on repetition and discuss its implications for recommendation scenarios where repetition is common.	Bruno Sguerra, Romain Hennequin, VietAnh Tran	Deezer Res, Paris, France
162	Accelerating Creator Audience Building through Centralized Exploration	On Spotify, multiple recommender systems enable personalized user experiences across a wide range of product features. These systems are owned by different teams and serve different goals, but all of these systems need to explore and learn about new content as it appears on the platform. In this work, we describe ongoing efforts at Spotify to develop an efficient solution to this problem, by centralizing content exploration and providing signals to existing, decentralized recommendation systems (a.k.a. exploitation systems). We take a creator-centric perspective, and argue that this approach can dramatically reduce the time it takes for new content to reach its full potential.	Antonina Danylenko, Buket Baran, Guilherme Dinis Junior, Gösta Forsum, Lucas Maystre, Maksym Lefarov, Olayinka S. Folorunso, Yu Zhao	Spotify, Berlin, Germany; Spotify, London, England; Spotify, Stockholm, Sweden
163	Track Mix Generation on Music Streaming Services using Transformers	This paper introduces Track Mix, a personalized playlist generation system released in 2022 on the music streaming service Deezer. Track Mix automatically generates "mix" playlists inspired by initial music tracks, allowing users to discover music similar to their favorite content. To generate these mixes, we consider a Transformer model trained on millions of track sequences from user playlists. In light of the growing popularity of Transformers in recent years, we analyze the advantages, drawbacks, and technical challenges of using such a model for mix generation on the service, compared to a more traditional collaborative filtering approach. Since its release, Track Mix has been generating playlists for millions of users daily, enhancing their music discovery experience on Deezer.	Benjamin Chapus, Guillaume SalhaGalvan, Mathieu Morlon, Thibault Cador, Thomas Bouabça, Théo Bontempelli, Walid Bendada	Deezer, Paris, France
164	Reward innovation for long-term member satisfaction	Recommender systems commonly train on user engagements because of their abundance, immediacy of feedback, and the insights they provide into users' preferences. However, this approach may unintentionally prioritize optimizing short-term engagements over a product’s or business’s long-term objectives. At Netflix, our recommender systems are designed with the goal of maximizing long-term member satisfaction. To achieve this objective, we adopt a practical approach that augments engagement data with reward signals aligned with long-term member satisfaction. This process of identifying, evaluating, and integrating reward signals into an existing learning algorithm is what we term reward innovation. In this work, we present the challenges of applying this approach to a large-scale recommender system and share our approach to addressing them.	Gary Tang, Henry Wang, Jiangwei Pan, Justin Basilico	Netflix, Los Gatos, CA 95032 USA
165	AdaptEx: A Self-Service Contextual Bandit Platform	This paper presents AdaptEx, a self-service contextual bandit platform widely used at Expedia Group, that leverages multi-armed bandit algorithms to personalize user experiences at scale. AdaptEx considers the unique context of each visitor to select the optimal variants and learns quickly from every interaction they make. It offers a powerful solution to improve user experiences while minimizing the costs and time associated with traditional testing methods. The platform unlocks the ability to iterate towards optimal product solutions quickly, while gracefully handling ever-changing content and continuous "cold start" situations.	Andrea Marchini, Ercument Ilhan, Vilda Markeviciute, William Black	Expedia Grp, London, England
166	Disentangling Motives behind Item Consumption and Social Connection for Mutually-enhanced Joint Prediction	Item consumption and social connection, as common user behaviors in many web applications, have been extensively studied. However, most current works separately perform either item consumption or social link prediction tasks, possibly with the help of the other as an auxiliary signal. Moreover, they merely consider the behaviors in a holistic manner yet neglect the multi-faceted motives behind them. For example, the intention of watching a movie could be killing time or watching it with friends; likewise, one might connect with others because they are friends or colleagues. To fill this gap, we propose to Disentangle the multi-faceted Motives in each network (i.e., the user-item interaction network and social network) defined respectively by the two types of behaviors, for mutually-enhanced Joint Prediction (DMJP). Specifically, we first learn the disentangled user representations driven by motives of multi-facets in both networks. Thereafter, the mutual influence of the two networks is subtly discriminated at the facet-to-facet level. The fine-grained mutual influence is then exploited asymmetrically to help refine user representations in both networks, with the goal of achieving a mutually-enhanced joint item and social link prediction. Empirical studies on three public datasets showcase the superiority of DMJP over state-of-the-art (SOTA) methods on both tasks.	Jie Zhang, Xiao Sha, Yew Soon Ong, Youchen Sun, Zhu Sun	ASTAR, Inst High Performance Computing, Ctr Frontier AI Res, Singapore, Singapore; Hebei Univ Water Resources & Elect Engn, Cangzhou, Hebei, Peoples R China; Nanyang Technol Univ, ASTAR, Ctr Frontier AI Res, Singapore, Singapore; Nanyang Technol Univ, Singapore, Singapore
167	How Should We Measure Filter Bubbles? A Regression Model and Evidence for Online News	News media play an important role in democratic societies. Central to fulfilling this role is the premise that users should be exposed to diverse news. However, news recommender systems are gaining popularity on news websites, which has sparked concerns over filter bubbles. More specifically, editors, policy-makers and scholars are worried that these news recommender systems may expose users to less diverse content over time. To the best of our knowledge, this hypothesis has not been tested in a longitudinal observational study of real users that interact with a real news website. Such observational studies require the use of research methods that are robust and can account for the many covariates that may influence the diversity of recommendations at any given time. In this work, we propose an analysis model to study whether the variety of articles recommended to a user decreases over time in such an observational study design. Further, we present results from two case studies using aggregated and anonymized data that were collected by two western European news websites employing a collaborative filtering-based news recommender system to serve (personalized) recommendations to their users. Through these case studies we validate empirically that our modeling assumptions are sound and supported by the data, and that our model obtains more reliable and interpretable results than analysis methods used in prior empirical work on filter bubbles. Our case studies provide evidence of a small decrease in the topic variety of a user’s recommendations in the first weeks after they sign up, but no evidence of a decrease in political variety.	Annelien Smets, Bart Goethals, Jens Leysen, Jorre T. A. Vannieuwenhuyze, Lien Michiels, Robin Verachtert	Stat Vlaanderen, Brussels, Belgium; Univ Antwerp, Antwerp, Belgium; Vrije Univ Brussel, Imec, SMIT, Brussels, Belgium
168	Private Matrix Factorization with Public Item Features	We consider the problem of training private recommendation models with access to public item features. Training with Differential Privacy (DP) offers strong privacy guarantees, at the expense of loss in recommendation quality. We show that incorporating public item features during training can help mitigate this loss in quality. We propose a general approach based on collective matrix factorization (CMF), that works by simultaneously factorizing two matrices: the user feedback matrix (representing sensitive data) and an item feature matrix that encodes publicly available (non-sensitive) item information. The method is conceptually simple, easy to tune, and highly scalable. It can be applied to different types of public item data, including: (1) categorical item features; (2) item-item similarities learned from public sources; and (3) publicly available user feedback. Furthermore, these data modalities can be collectively utilized to fully leverage public data. Evaluating our method on a standard DP recommendation benchmark, we find that using public item features significantly narrows the quality gap between private models and their non-private counterparts. As privacy constraints become more stringent, models rely more heavily on public side features for recommendation. This results in a smooth transition from collaborative filtering to item-based contextual recommendations.	Li Zhang, Mihaela Curmei, Mukund Sundararajan, Walid Krichene	Google, Mountain View, CA USA; Microsoft, Mountain View, CA USA; Univ Calif Berkeley, Berkeley, CA 94720 USA
169	Transparently Serving the Public: Enhancing Public Service Media Values through Exploration	In the last few years, we have repeatedly underlined the importance of the Public Service Media Remit for ZDF as a Public Service Media provider. Offering fair, diverse, and useful recommendations to users is just as important for us as being transparent about our understanding of these values, the metrics that we are using to evaluate their extent, and the algorithms in our system that produce such recommendations. This year, we have made a major step towards transparency of our algorithms and metrics by describing them for a broader audience, offering the possibility for the audience to learn details about our systems and to provide direct feedback to us. Having the possibility to measure and track PSM metrics, we have started to improve our algorithms towards PSM values. In this work, we describe these steps and the results of actively debiasing and adding exploration into our recommendations to achieve more fairness.	Andreas Grün, Xenija Neufeld	Accso Accelerated Solut GmbH, Darmstadt, Germany; ZDF, Mainz, Germany
170	Evaluating The Effects of Calibrated Popularity Bias Mitigation: A Field Study	Despite their various proven benefits, Recommender Systems can cause or amplify certain undesired effects. In this paper, we focus on Popularity Bias, i.e., the tendency of a recommender system to favor popular items in its recommendations to the user. Prior research has studied the negative impact of this type of bias on individuals and society as a whole and proposed various approaches to mitigate this in various domains. However, almost all works adopted offline methodologies to evaluate the effectiveness of the proposed approaches. Unfortunately, such offline simulations can potentially be rather simplified and unable to capture the full picture. To contribute to this line of research, and given a particular lack of knowledge about how debiasing approaches work not only offline but online as well, we present in this paper the results of a user study on a national broadcaster movie streaming platform in Norway, i.e., TV 2, following the A/B testing methodology. We deployed an effective mitigation approach for popularity bias, called Calibrated Popularity (CP), and monitored its performance in comparison to the platform’s existing collaborative filtering recommendation approach as a baseline over a period of almost four months. The results obtained from a large user base interacting in real-time with the recommendations indicate that the evaluated debiasing approach can be effective in addressing popularity bias while still maintaining the level of user interest and engagement.	Anastasiia Klimashevskaia, Astrid Tessem, Christoph Trattner, Dietmar Jannach, Lars Skjærven, Mehdi Elahi	TV 2, Bergen, Norway; Univ Bergen, MediaFutures, Bergen, Norway; Univ Klagenfurt, Klagenfurt, Austria
171	An Exploration of Sentence-Pair Classification for Algorithmic Recruiting	Recent years have seen a rapid increase in the application of computational approaches to different HR tasks, such as algorithmic hiring, skill extraction, and monitoring of employee satisfaction. Much of the recent work on estimating the fit between a person and a job has used representation learning to represent both resumes and job vacancies computationally and determine the degree to which they match. A common approach to this task is Sentence-BERT, which uses a Siamese network to encode resumes and job descriptions into fixed-length vectors and estimates how well they match based on the similarity between those vectors. In our paper, we adapt BERT’s next-sentence prediction task—predicting whether one sentence is likely to follow another in a given context—to the task of matching resumes with job descriptions. Using historical data on past (mis)matches between job-resume pairs, we fine-tune BERT for this downstream task. Through a combination of offline and online experiments on data from a large Scandinavian job portal, we show that this approach performs significantly better than Sentence-BERT and other state-of-the-art approaches for determining person-job fit.	Mesut Kaya, Toine Bogers	Aalborg Univ, Copenhagen, Denmark; IT Univ Copenhagen, Copenhagen, Denmark
172	RecSys Challenge 2023: Deep Funnel Optimization with a Focus on User Privacy	The RecSys 2023 Challenge involved a conversion prediction task in the online advertising space. The dataset was provided by ShareChat (Mohalla Tech Pvt Ltd). The challenge data represents a sample of ad impressions served to users over a period of 22 days, and the task is to predict, for a given ad impression, whether a conversion (an app install) will happen. The challenge ran for 3 months with a public dashboard. A total of 519 teams registered, and 231 teams made at least one submission. The task setting represents an important research area of modeling ad recommendations under user privacy. We identify interesting themes in feature engineering, addressing sparsity and calibrating across multi-step predictions.	Abhishek Srivastava, Athirai A. Irissappane, Rahul Agrawal, Saikishore Kalloori, Sarang Brahme, Sourav Maitra, Yong Liu	Amazon, Seattle, WA USA; ETH, Zurich, Switzerland; Huawei Noahs Ark Lab, Singapore, Singapore; IIM Visakhapatnam, Visakhapatnam, Andhra Pradesh, India; ShareChat, Bangalore, Karnataka, India; ShareChat, London, England
173	Beyond Labels: Leveraging Deep Learning and LLMs for Content Metadata	Content metadata plays a very important role in movie recommender systems as it provides valuable information about various aspects of a movie such as genre, cast, plot synopsis, box office summary, etc. Analyzing the metadata can help understand user preferences in order to generate personalized recommendations and handle item cold start. In this talk, we will focus on one particular type of metadata: genre labels. Genre labels associated with a movie or a TV series help categorize a collection of titles into different themes and correspondingly set audience expectations. We present some of the challenges associated with using genre label information and propose a new way of examining the genre information that we call the Genre Spectrum. The Genre Spectrum helps capture the various nuanced genres in a title, and our offline and online experiments corroborate the effectiveness of the approach. Furthermore, we also talk about applications of LLMs in augmenting content metadata, which could eventually be used to achieve effective organization of recommendations in the user's 2-D home grid.	Jaya Kawale, John Trenkle, Saurabh Agrawal	Tubi, San Francisco, CA 94104 USA
174	Efficient Data Representation Learning in Google-scale Systems	"Garbage in, Garbage out" is a familiar maxim to ML practitioners and researchers, because the quality of a learned data representation is highly crucial to the quality of any ML model that consumes it as an input. To handle systems that serve billions of users at millions of queries per second (QPS), we need representation learning algorithms with significantly improved efficiency. At Google, we have dedicated thousands of iterations to develop a set of powerful techniques that efficiently learn high quality data representations. We have thoroughly validated these methods through offline evaluation, online A/B testing, and deployed these in over 50 models across major Google products. In this paper, we consider a generalized data representation learning problem that allows us to identify feature embeddings and crosses as common challenges. We propose two solutions, including: 1. Multi-size Unified Embedding to learn high-quality embeddings; and 2. Deep Cross Network V2 for learning effective feature crosses. We discuss the practical challenges we encountered and solutions we developed during deployment to production systems, compare with SOTA methods, and report offline and online experimental results. This work sheds light on the challenges and opportunities for developing next-gen algorithms for web-scale systems.	Benjamin Coleman, Derek Zhiyuan Cheng, Ed H. Chi, Jianmo Ni, Jonathan Valverde, Lichan Hong, Ruoxi Wang, WangCheng Kang, Yin Zhang	Google DeepMind, Mountain View, CA 94043 USA
175	The Effect of Third Party Implementations on Reproducibility	Reproducibility of recommender systems research has come under scrutiny during recent years. Along with works focusing on repeating experiments with certain algorithms, the research community has also started discussing various aspects of evaluation and how these affect reproducibility. We add a novel angle to this discussion by examining how unofficial third-party implementations could benefit or hinder reproducibility. Besides giving a general overview, we thoroughly examine six third-party implementations of a popular recommender algorithm and compare them to the official version on five public datasets. In the light of our alarming findings we aim to draw the attention of the research community to this neglected aspect of reproducibility.	Balázs Hidasi, Ádám Tibor Czapp	Taboola Co, Grav R&D, Budapest, Hungary
176	Correcting for Interference in Experiments: A Case Study at Douyin	Interference is a ubiquitous problem in experiments conducted on two-sided content marketplaces, such as Douyin (China's analog of TikTok). In many cases, creators are the natural unit of experimentation, but creators interfere with each other through competition for viewers' limited time and attention. "Naive" estimators currently used in practice simply ignore the interference, but in doing so incur bias on the order of the treatment effect. We formalize the problem of inference in such experiments as one of policy evaluation. Off-policy estimators, while unbiased, are impractically high variance. We introduce a novel Monte-Carlo estimator, based on "Differences-in-Qs" (DQ) techniques, which achieves bias that is second-order in the treatment effect, while remaining sample-efficient to estimate. On the theoretical side, our contribution is to develop a generalized theory of Taylor expansions for policy evaluation, which extends DQ theory to all major MDP formulations. On the practical side, we implement our estimator on Douyin's experimentation platform, and in the process develop DQ into a truly "plug-and-play" estimator for interference in real-world settings: one which provides robust, low-bias, low-variance treatment effect estimates; admits computationally cheap, asymptotically exact uncertainty quantification; and reduces MSE by 99% compared to the best existing alternatives in our applications.	Andrew Zheng, Hao Li, Huawei Zhang, Tianyi Peng, Vivek F. Farias, Xinyuyang Ren	ByteDance, Beijing, Peoples R China; MIT, Cambridge, MA 02139 USA
177	Visual Representation for Capturing Creator Theme in Brand-Creator Marketplace	Providing cold start recommendations in a brand-creator marketplace is challenging as brands’ preferences extend beyond the mere objects depicted in the creator’s content and encompass the creator’s individual theme that consistently resonates across images shared on her social media profile. Furthermore, brands often use textual keywords to describe their campaign’s aesthetic appeal, with which creators must align. To address these challenges, we propose two methods: SAME (Same Account Media Embedding), a novel creator representation employing a Siamese network to capture the unique creator theme, and OAAR (Object-Agnostic Adjective Representation), enabling filtering creators based on textual adjectives that relate to aesthetic qualities through zero-shot learning. These two methods utilize CLIP, a state-of-the-art language-image model, and improve it in addressing the aforementioned challenges.	Asnat GreensteinMessica, Keren Gaiger, Ravid Cohen, Sarel Duanis, Shaked Zychlinski	Lightricks LTD, Jerusalem, Israel
178	Turning Dross Into Gold Loss: is BERT4Rec really better than SASRec?	Recently, sequential recommendation and the next-item prediction task have become increasingly popular in the field of recommender systems. Currently, two state-of-the-art baselines are the Transformer-based models SASRec and BERT4Rec. Over the past few years, there have been quite a few publications comparing these two algorithms and proposing new state-of-the-art models. In most of the publications, BERT4Rec achieves better performance than SASRec. But BERT4Rec uses cross-entropy over softmax for all items, while SASRec uses negative sampling and calculates binary cross-entropy loss for one positive and one negative item. In our work, we show that if both models are trained with the same loss, which is used by BERT4Rec, then SASRec will significantly outperform BERT4Rec both in terms of quality and training speed. In addition, we show that SASRec could be effectively trained with negative sampling and still outperform BERT4Rec, but the number of negative examples should be much larger than one.	Alexey Vasilev, Anton Klenitskiy	Sber, AI Lab, Moscow, Russia
179	Uncertainty-adjusted Inductive Matrix Completion with Graph Neural Networks	We propose a robust recommender systems model which performs matrix completion and a ratings-wise uncertainty estimation jointly. Whilst the prediction module is purely based on an implicit low-rank assumption imposed via nuclear norm regularization, our loss function is augmented by an uncertainty estimation module which learns an anomaly score for each individual rating via a Graph Neural Network: data points deemed more anomalous by the GNN are downregulated in the loss function used to train the low-rank module. The whole model is trained in an end-to-end fashion, allowing the anomaly detection module to tap into the supervised information available in the form of ratings. Thus, our model’s predictors enjoy the favourable generalization properties that come with being chosen from a small function space (i.e., low-rank matrices), whilst exhibiting the robustness to outliers and flexibility that come with deep learning methods. Furthermore, the anomaly scores themselves contain valuable qualitative information. Experiments on various real-life datasets demonstrate that our model outperforms standard matrix completion and other baselines, confirming the usefulness of the anomaly detection module.	Antoine Ledent, Petr Kasalický, Rodrigo Alves	Czech Tech Univ, Prague, Czech Republic; Singapore Management Univ, Singapore, Singapore
