chore: update confs

actions-user · actions-user · commit 2bc07e4a9227 · 2025-10-06T10:20:33.000Z
diff --git a/arxiv.json b/arxiv.json
@@ -56299,5 +56299,40 @@
         "pub_date": "2025-10-02",
         "summary": "Large Language Models (LLMs) have demonstrated remarkable reasoning abilities on complex problems using long Chain-of-Thought (CoT) reasoning. However, they often suffer from overthinking, meaning generating unnecessarily lengthy reasoning steps for simpler problems. This issue may degrade the efficiency of the models and make them difficult to adapt the reasoning depth to the complexity of problems. To address this, we introduce a novel metric Token Entropy Cumulative Average (TECA), which measures the extent of exploration throughout the reasoning process. We further propose a novel reasoning paradigm -- Explore Briefly, Then Decide -- with an associated Cumulative Entropy Regulation (CER) mechanism. This paradigm leverages TECA to help the model dynamically determine the optimal point to conclude its thought process and provide a final answer, thus achieving efficient reasoning. Experimental results across diverse mathematical benchmarks show that our approach substantially mitigates overthinking without sacrificing problem-solving ability. With our thinking paradigm, the average response length decreases by up to 71% on simpler datasets, demonstrating the effectiveness of our method in creating a more efficient and adaptive reasoning process.",
         "translated": "大型语言模型（LLMs）在解决复杂问题时通过长链思维推理展现出卓越的能力。然而，它们经常存在“过度思考”问题，即对较简单问题生成不必要的冗长推理步骤。这不仅影响模型效率，也使其难以根据问题复杂度自适应调整推理深度。为应对这一挑战，我们提出新颖的评估指标——令牌熵累积均值（TECA），用于衡量推理过程中的探索程度。我们进一步提出“先探索，后决策”的创新推理范式，并配套开发累积熵调控（CER）机制。该范式通过TECA指标帮助模型动态确定终止思考过程并给出最终答案的最佳时机，从而实现高效推理。在多个数学基准测试上的实验结果表明，我们的方法在保持解决问题能力的同时显著缓解了过度思考现象。采用该思维范式后，在简单数据集上的平均响应长度最高减少71%，证明了该方法在构建更高效、自适应推理过程方面的有效性。"
+    },
+    {
+        "title": "OpenZL: A Graph-Based Model for Compression",
+        "url": "http://arxiv.org/abs/2510.03203v1",
+        "pub_date": "2025-10-03",
+        "summary": "Research in general-purpose lossless compression over the last decade has largely found improvements in compression ratio that come at great cost to resource utilization and processing throughput. However, most production workloads require high throughput and low resource utilization, so most research systems have seen little adoption. Instead, real world improvements in compression are increasingly often realized by building application-specific compressors which can exploit knowledge about the structure and semantics of the data being compressed. These systems easily outperform even the best generic compressors, but application-specific compression schemes are not without drawbacks. They are inherently limited in applicability and are difficult to maintain and deploy.   We show that these challenges can be overcome with a new way of thinking about compression. We propose the ``graph model'' of compression, a new theoretical framework for representing compression as a directed acyclic graph of modular codecs. This motivates OpenZL, an implementation of this model that compresses data into a self-describing wire format, any configuration of which can be decompressed by a universal decoder. OpenZL's design enables rapid development of tailored compressors with minimal code, its universal decoder eliminates deployment lag, and its investment in a well-vetted standard component library minimizes security risks. Experimental results demonstrate that OpenZL achieves superior compression ratios and speeds compared to state-of-the-art general-purpose compressors on a variety of real-world datasets. Internal deployments at Meta have also shown consistent improvements in size and/or speed, with development timelines reduced from months to days. OpenZL thus represents an advance in practical, scalable, and maintainable data compression for modern data-intensive applications.",
+        "translated": "过去十年间，通用无损压缩领域的研究进展大多以大幅牺牲资源利用率和处理吞吐量为代价来提升压缩比。然而，由于实际生产负载通常要求高吞吐量与低资源占用，这些研究性系统鲜少获得实际应用。当前业界更常见的压缩性能提升路径，是开发能利用数据结构与语义特征的应用定制化压缩器。这类系统即便与最优的通用压缩器相比也优势显著，但应用定制化压缩方案仍存在固有局限：适用范围受限，且部署维护难度较高。\n\n我们提出了一种突破性的压缩理论框架来解决这些难题。通过建立压缩的“图模型”，将压缩过程表示为模块化编解码器构成的有向无环图。基于此模型实现的OpenZL系统，可将数据压缩为自描述型编码格式——其任意配置均可通过通用解码器进行还原。该设计具有三大优势：以最小代码量快速开发定制压缩器，通用解码器消除部署滞后，经严格验证的标准组件库有效控制安全风险。实验数据显示，在多类真实数据集上，OpenZL在压缩比与速度两项指标均优于当前最先进的通用压缩器。在Meta内部的部署实践表明，该系统持续实现体积/速度双重提升，同时将开发周期从数月缩短至数天。OpenZL由此为现代数据密集型应用提供了兼具实用性、可扩展性与可维护性的压缩解决方案。"
+    },
+    {
+        "title": "CHORD: Customizing Hybrid-precision On-device Model for Sequential\n  Recommendation with Device-cloud Collaboration",
+        "url": "http://arxiv.org/abs/2510.03038v1",
+        "pub_date": "2025-10-03",
+        "summary": "With the advancement of mobile device capabilities, deploying reranking models directly on devices has become feasible, enabling real-time contextual recommendations. When migrating models from cloud to devices, resource heterogeneity inevitably necessitates model compression. Recent quantization methods show promise for efficient deployment, yet they overlook device-specific user interests, resulting in compromised recommendation accuracy. While on-device finetuning captures personalized user preference, it imposes additional computational burden through local retraining. To address these challenges, we propose a framework for \\underline{\\textbf{C}}ustomizing \\underline{\\textbf{H}}ybrid-precision \\underline{\\textbf{O}}n-device model for sequential \\underline{\\textbf{R}}ecommendation with \\underline{\\textbf{D}}evice-cloud collaboration (\\textbf{CHORD}), leveraging channel-wise mixed-precision quantization to simultaneously achieve personalization and resource-adaptive deployment. CHORD distributes randomly initialized models across heterogeneous devices and identifies user-specific critical parameters through auxiliary hypernetwork modules on the cloud. Our parameter sensitivity analysis operates across multiple granularities (layer, filter, and element levels), enabling precise mapping from user profiles to quantization strategy. Through on-device mixed-precision quantization, CHORD delivers dynamic model adaptation and accelerated inference without backpropagation, eliminating costly retraining cycles. We minimize communication overhead by encoding quantization strategies using only 2 bits per channel instead of 32-bit weights. Experiments on three real-world datasets with two popular backbones (SASRec and Caser) demonstrate the accuracy, efficiency, and adaptivity of CHORD.",
+        "translated": "随着移动设备性能的提升，直接在设备端部署重排序模型已具备可行性，能够实现实时情境化推荐。当模型从云端迁移至设备端时，资源异构性必然要求进行模型压缩。现有量化方法虽展现出高效部署潜力，却忽略了设备特定的用户兴趣，导致推荐准确性受损。尽管设备端微调能捕捉个性化用户偏好，但通过本地重训练会带来额外计算负担。为解决这些问题，我们提出了一种基于设备-云端协同的序列推荐混合精度定制框架（CHORD），利用通道级混合精度量化技术同步实现个性化与资源自适应部署。CHORD在异构设备间分发随机初始化模型，并通过云端辅助超网络模块识别用户特定的关键参数。我们的参数敏感性分析涵盖多粒度层级（网络层、滤波器与元素级），能精准构建从用户画像到量化策略的映射。通过设备端混合精度量化，CHORD无需反向传播即可实现动态模型适配与加速推理，消除了昂贵的重训练环节。我们采用每通道仅2比特（而非32比特权重）的编码方式压缩量化策略，最大限度降低通信开销。基于SASRec和Caser两种主流骨干网络在三个真实数据集上的实验表明，CHORD在准确性、效率与适应性方面均表现优异。"
+    },
+    {
+        "title": "Grounding Large Language Models in Clinical Evidence: A\n  Retrieval-Augmented Generation System for Querying UK NICE Clinical\n  Guidelines",
+        "url": "http://arxiv.org/abs/2510.02967v1",
+        "pub_date": "2025-10-03",
+        "summary": "This paper presents the development and evaluation of a Retrieval-Augmented Generation (RAG) system for querying the United Kingdom's National Institute for Health and Care Excellence (NICE) clinical guidelines using Large Language Models (LLMs). The extensive length and volume of these guidelines can impede their utilisation within a time-constrained healthcare system, a challenge this project addresses through the creation of a system capable of providing users with precisely matched information in response to natural language queries. The system's retrieval architecture, composed of a hybrid embedding mechanism, was evaluated against a database of 10,195 text chunks derived from three hundred guidelines. It demonstrates high performance, with a Mean Reciprocal Rank (MRR) of 0.814, a Recall of 81% at the first chunk and of 99.1% within the top ten retrieved chunks, when evaluated on 7901 queries.   The most significant impact of the RAG system was observed during the generation phase. When evaluated on a manually curated dataset of seventy question-answer pairs, RAG-enhanced models showed substantial gains in performance. Faithfulness, the measure of whether an answer is supported by the source text, was increased by 64.7 percentage points to 99.5% for the RAG-enhanced O4-Mini model and significantly outperformed the medical-focused Meditron3-8B LLM, which scored 43%. This, combined with a perfect Context Precision score of 1 for all RAG-enhanced models, confirms the system's ability to prevent information fabrication by grounding its answers in relevant source material. This study thus establishes RAG as an effective, reliable, and scalable approach for applying generative AI in healthcare, enabling cost-effective access to medical guidelines.",
+        "translated": "本文介绍了基于大语言模型（LLM）开发的英国国家卫生与临床优化研究院（NICE）临床指南检索增强生成（RAG）系统及其评估结果。由于该指南篇幅浩繁，在时间紧迫的医疗环境中难以有效利用。本项目通过构建能够响应自然语言查询、精准匹配信息的系统来解决这一难题。该系统采用混合嵌入机制的检索架构，在包含10,195个文本片段（源自三百份指南）的数据库上进行了评估。在7,901条测试查询中，系统表现出色：平均倒数排名（MRR）达0.814，首条结果召回率为81%，前十项结果召回率高达99.1%。\n\nRAG系统最显著的效果体现在生成阶段。在人工标注的70组问答数据集测试中，经RAG增强的模型性能显著提升：衡量答案是否基于原文的忠实度指标方面，增强后的O4-Mini模型达到99.5%，较基线提升64.7个百分点，显著优于专注医疗领域的Meditron3-8B大模型（43%）。结合所有RAG增强模型均取得1.0的完美上下文精确度，证实了该系统能基于相关源材料生成答案，有效避免信息虚构。本研究由此验证了RAG作为在医疗领域应用生成式AI的有效、可靠且可扩展的方案，为经济高效地获取医疗指南提供了新途径。"
+    },
+    {
+        "title": "StepChain GraphRAG: Reasoning Over Knowledge Graphs for Multi-Hop\n  Question Answering",
+        "url": "http://arxiv.org/abs/2510.02827v1",
+        "pub_date": "2025-10-03",
+        "summary": "Recent progress in retrieval-augmented generation (RAG) has led to more accurate and interpretable multi-hop question answering (QA). Yet, challenges persist in integrating iterative reasoning steps with external knowledge retrieval. To address this, we introduce StepChain GraphRAG, a framework that unites question decomposition with a Breadth-First Search (BFS) Reasoning Flow for enhanced multi-hop QA. Our approach first builds a global index over the corpus; at inference time, only retrieved passages are parsed on-the-fly into a knowledge graph, and the complex query is split into sub-questions. For each sub-question, a BFS-based traversal dynamically expands along relevant edges, assembling explicit evidence chains without overwhelming the language model with superfluous context. Experiments on MuSiQue, 2WikiMultiHopQA, and HotpotQA show that StepChain GraphRAG achieves state-of-the-art Exact Match and F1 scores. StepChain GraphRAG lifts average EM by 2.57% and F1 by 2.13% over the SOTA method, achieving the largest gain on HotpotQA (+4.70% EM, +3.44% F1). StepChain GraphRAG also fosters enhanced explainability by preserving the chain-of-thought across intermediate retrieval steps. We conclude by discussing how future work can mitigate the computational overhead and address potential hallucinations from large language models to refine efficiency and reliability in multi-hop QA.",
+        "translated": "检索增强生成（RAG）领域的最新进展使得多跳问答（QA）的准确性与可解释性显著提升。然而，如何将迭代推理步骤与外部知识检索有机结合仍存挑战。为此，我们提出StepChain GraphRAG框架，该框架通过融合问题分解与广度优先搜索（BFS）推理流来增强多跳问答能力。我们的方法首先构建语料库的全局索引；在推理阶段，仅对实时检索到的文本段进行知识图谱解析，并将复杂查询拆分为子问题。针对每个子问题，系统基于BFS的遍历算法沿相关边动态扩展，构建显式证据链，同时避免语言模型因冗余上下文而过载。在MuSiQue、2WikiMultiHopQA和HotpotQA数据集上的实验表明，StepChain GraphRAG在精确匹配（EM）和F1分数上均达到最先进水平：相较现有最优方法，平均EM提升2.57%，F1提升2.13%，其中HotpotQA数据集提升最为显著（EM +4.70%，F1 +3.44%）。该框架还通过保持中间检索步骤的思维链来增强可解释性。最后我们探讨了未来工作如何通过降低计算开销、解决大语言模型潜在幻觉问题，进一步提升多跳问答的效率和可靠性。"
+    },
+    {
+        "title": "AutoMaAS: Self-Evolving Multi-Agent Architecture Search for Large\n  Language Models",
+        "url": "http://arxiv.org/abs/2510.02669v1",
+        "pub_date": "2025-10-03",
+        "summary": "Multi-agent systems powered by large language models have demonstrated remarkable capabilities across diverse domains, yet existing automated design approaches seek monolithic solutions that fail to adapt resource allocation based on query complexity and domain requirements. This paper introduces AutoMaAS, a self-evolving multi-agent architecture search framework that leverages neural architecture search principles to automatically discover optimal agent configurations through dynamic operator lifecycle management and automated machine learning techniques. Our approach incorporates four key innovations: (1) automatic operator generation, fusion, and elimination based on performance-cost analysis, (2) dynamic cost-aware optimization with real-time parameter adjustment, (3) online feedback integration for continuous architecture refinement, and (4) enhanced interpretability through decision tracing mechanisms. Extensive experiments across six benchmarks demonstrate that AutoMaAS achieves 1.0-7.1\\% performance improvement while reducing inference costs by 3-5\\% compared to state-of-the-art methods. The framework shows superior transferability across datasets and LLM backbones, establishing a new paradigm for automated multi-agent system design in the era of large language models.",
+        "translated": "尽管由大语言模型驱动的多智能体系统已在多个领域展现出卓越能力，但现有自动化设计方法仍追求单一解决方案，无法根据查询复杂度与领域需求动态调整资源分配。本文提出AutoMaAS——一种自演进的多智能体架构搜索框架，该框架运用神经架构搜索原理，通过动态算子生命周期管理与自动化机器学习技术，自动发现最优智能体配置。我们的方法包含四大核心创新：（1）基于性能-成本分析的自动算子生成、融合与淘汰机制；（2）具备实时参数调整能力的动态成本感知优化；（3）在线反馈集成实现持续架构优化；（4）通过决策追溯机制增强系统可解释性。在六个基准测试上的大量实验表明，相较于现有最优方法，AutoMaAS在降低3-5%推理成本的同时实现了1.0-7.1%的性能提升。该框架在不同数据集与LLM骨干网络上均展现出卓越的迁移能力，为大语言模型时代的自动化多智能体系统设计确立了新范式。"
     }
 ]