arxiv.json: 35 lines changed (35 additions, 0 deletions)
@@ -36979,5 +36979,40 @@
 "pub_date": "2024-12-13",
 "summary": "The sheer scale of data required to train modern large language models (LLMs) poses significant risks, as models are likely to gain knowledge of sensitive topics such as bio-security, as well as the ability to replicate copyrighted works. Methods designed to remove such knowledge must do so from all prompt directions, in a multi-lingual capacity and without degrading general model performance. To this end, we introduce the targeted angular reversal (TARS) method of knowledge removal from LLMs. The TARS method firstly leverages the LLM in combination with a detailed prompt to aggregate information about a selected concept in the internal representation space of the LLM. It then refines this approximate concept vector to trigger the concept token with high probability, by perturbing the approximate concept vector with noise and transforming it into token scores with the language model head. The feedforward weight vectors in the LLM which operate directly on the internal representation space, and have the highest cosine similarity with this targeting vector, are then replaced by a reversed targeting vector, thus limiting the ability of the concept to propagate through the model. The modularity of the TARS method allows for a sequential removal of concepts from Llama 3.1 8B, such as the famous literary detective Sherlock Holmes, and the planet Saturn. It is demonstrated that the probability of triggering target concepts can be reduced to 0.00 with as few as 1 TARS edit, whilst simultaneously removing the knowledge bi-directionally. Moreover, knowledge is shown to be removed across all languages despite only being targeted in English. Importantly, TARS has minimal impact on the general model capabilities, as after removing 5 diverse concepts in a modular fashion, there is minimal KL divergence in the next token probabilities of the LLM on large corpora of Wikipedia text (median of 0.002).",
"title": "No More Tuning: Prioritized Multi-Task Learning with Lagrangian\n Differential Multiplier Methods",
36985
+
"url": "http://arxiv.org/abs/2412.12092v1",
36986
+
"pub_date": "2024-12-16",
36987
+
"summary": "Given the ubiquity of multi-task in practical systems, Multi-Task Learning (MTL) has found widespread application across diverse domains. In real-world scenarios, these tasks often have different priorities. For instance, In web search, relevance is often prioritized over other metrics, such as click-through rates or user engagement. Existing frameworks pay insufficient attention to the prioritization among different tasks, which typically adjust task-specific loss function weights to differentiate task priorities. However, this approach encounters challenges as the number of tasks grows, leading to exponential increases in hyper-parameter tuning complexity. Furthermore, the simultaneous optimization of multiple objectives can negatively impact the performance of high-priority tasks due to interference from lower-priority tasks. In this paper, we introduce a novel multi-task learning framework employing Lagrangian Differential Multiplier Methods for step-wise multi-task optimization. It is designed to boost the performance of high-priority tasks without interference from other tasks. Its primary advantage lies in its ability to automatically optimize multiple objectives without requiring balancing hyper-parameters for different tasks, thereby eliminating the need for manual tuning. Additionally, we provide theoretical analysis demonstrating that our method ensures optimization guarantees, enhancing the reliability of the process. We demonstrate its effectiveness through experiments on multiple public datasets and its application in Taobao search, a large-scale industrial search ranking system, resulting in significant improvements across various business metrics.",
"title": "RetroLLM: Empowering Large Language Models to Retrieve Fine-grained\n Evidence within Generation",
36992
+
"url": "http://arxiv.org/abs/2412.11919v1",
36993
+
"pub_date": "2024-12-16",
36994
+
"summary": "Large language models (LLMs) exhibit remarkable generative capabilities but often suffer from hallucinations. Retrieval-augmented generation (RAG) offers an effective solution by incorporating external knowledge, but existing methods still face several limitations: additional deployment costs of separate retrievers, redundant input tokens from retrieved text chunks, and the lack of joint optimization of retrieval and generation. To address these issues, we propose \\textbf{RetroLLM}, a unified framework that integrates retrieval and generation into a single, cohesive process, enabling LLMs to directly generate fine-grained evidence from the corpus with constrained decoding. Moreover, to mitigate false pruning in the process of constrained evidence generation, we introduce (1) hierarchical FM-Index constraints, which generate corpus-constrained clues to identify a subset of relevant documents before evidence generation, reducing irrelevant decoding space; and (2) a forward-looking constrained decoding strategy, which considers the relevance of future sequences to improve evidence accuracy. Extensive experiments on five open-domain QA datasets demonstrate RetroLLM's superior performance across both in-domain and out-of-domain tasks. The code is available at \\url{https://github.com/sunnynexus/RetroLLM}.",
"title": "One for Dozens: Adaptive REcommendation for All Domains with\n Counterfactual Augmentation",
36999
+
"url": "http://arxiv.org/abs/2412.11905v1",
37000
+
"pub_date": "2024-12-16",
37001
+
"summary": "Multi-domain recommendation (MDR) aims to enhance recommendation performance across various domains. However, real-world recommender systems in online platforms often need to handle dozens or even hundreds of domains, far exceeding the capabilities of traditional MDR algorithms, which typically focus on fewer than five domains. Key challenges include a substantial increase in parameter count, high maintenance costs, and intricate knowledge transfer patterns across domains. Furthermore, minor domains often suffer from data sparsity, leading to inadequate training in classical methods. To address these issues, we propose Adaptive REcommendation for All Domains with counterfactual augmentation (AREAD). AREAD employs a hierarchical structure with a limited number of expert networks at several layers, to effectively capture domain knowledge at different granularities. To adaptively capture the knowledge transfer pattern across domains, we generate and iteratively prune a hierarchical expert network selection mask for each domain during training. Additionally, counterfactual assumptions are used to augment data in minor domains, supporting their iterative mask pruning. Our experiments on two public datasets, each encompassing over twenty domains, demonstrate AREAD's effectiveness, especially in data-sparse domains. Source code is available at https://github.com/Chrissie-Law/AREAD-Multi-Domain-Recommendation.",
"title": "Investigating Mixture of Experts in Dense Retrieval",
37006
+
"url": "http://arxiv.org/abs/2412.11864v1",
37007
+
"pub_date": "2024-12-16",
37008
+
"summary": "While Dense Retrieval Models (DRMs) have advanced Information Retrieval (IR), one limitation of these neural models is their narrow generalizability and robustness. To cope with this issue, one can leverage the Mixture-of-Experts (MoE) architecture. While previous IR studies have incorporated MoE architectures within the Transformer layers of DRMs, our work investigates an architecture that integrates a single MoE block (SB-MoE) after the output of the final Transformer layer. Our empirical evaluation investigates how SB-MoE compares, in terms of retrieval effectiveness, to standard fine-tuning. In detail, we fine-tune three DRMs (TinyBERT, BERT, and Contriever) across four benchmark collections with and without adding the MoE block. Moreover, since MoE showcases performance variations with respect to its parameters (i.e., the number of experts), we conduct additional experiments to investigate this aspect further. The findings show the effectiveness of SB-MoE especially for DRMs with a low number of parameters (i.e., TinyBERT), as it consistently outperforms the fine-tuned underlying model on all four benchmarks. For DRMs with a higher number of parameters (i.e., BERT and Contriever), SB-MoE requires larger numbers of training samples to yield better retrieval performance.",
"title": "SPGL: Enhancing Session-based Recommendation with Single Positive Graph\n Learning",
37013
+
"url": "http://arxiv.org/abs/2412.11846v1",
37014
+
"pub_date": "2024-12-16",
37015
+
"summary": "Session-based recommendation seeks to forecast the next item a user will be interested in, based on their interaction sequences. Due to limited interaction data, session-based recommendation faces the challenge of limited data availability. Traditional methods enhance feature learning by constructing complex models to generate positive and negative samples. This paper proposes a session-based recommendation model using Single Positive optimization loss and Graph Learning (SPGL) to deal with the problem of data sparsity, high model complexity and weak transferability. SPGL utilizes graph convolutional networks to generate global item representations and batch session representations, effectively capturing intrinsic relationships between items. The use of single positive optimization loss improves uniformity of item representations, thereby enhancing recommendation accuracy. In the intent extractor, SPGL considers the hop count of the adjacency matrix when constructing the directed global graph to fully integrate spatial information. It also takes into account the reverse positional information of items when constructing session representations to incorporate temporal information. Comparative experiments across three benchmark datasets, Tmall, RetailRocket and Diginetica, demonstrate the model's effectiveness. The source code can be accessed on https://github.com/liang-tian-tian/SPGL .",