Toseic/LLM-inference-arxiv-daily

 
 

Updated on 2026.03.09

inference

Publish Date Title Authors PDF Code
2026-03-06 LUMINA: LLM-Guided GPU Architecture Exploration via Bottleneck Analysis Tao Zhang et.al. 2603.05904 null
2026-03-05 Parallelization Strategies for Dense LLM Deployment: Navigating Through Application-Specific Tradeoffs and Bottlenecks Burak Topcu et.al. 2603.05692 null
2026-03-05 Beyond the Context Window: A Cost-Performance Analysis of Fact-Based Memory vs. Long-Context LLMs for Persistent Agents Natchanon Pollertlam et.al. 2603.04814 null
2026-03-05 SLO-Aware Compute Resource Allocation for Prefill-Decode Disaggregated LLM Inference Luchang Li et.al. 2603.04716 null
2026-03-04 A Multi-Dimensional Quality Scoring Framework for Decentralized LLM Inference with Proof of Quality Arther Tian et.al. 2603.04028 null
2026-03-03 SEALing the Gap: A Reference Framework for LLM Inference Carbon Estimation via Multi-Benchmark Driven Embodiment Priyavanshi Pathania et.al. 2603.02949 null
2026-03-03 Agentic Self-Evolutionary Replanning for Embodied Navigation Guoliang Li et.al. 2603.02772 null
2026-03-03 Ouroboros: Wafer-Scale SRAM CIM with Token-Grained Pipelining for Large Language Model Inference Yiqi Liu et.al. 2603.02737 null
2026-03-02 Beyond Microservices: Testing Web-Scale RCA Methods on GPU-Driven LLM Workloads Dominik Scheinert et.al. 2603.02057 null
2026-03-02 Learning to Draft: Adaptive Speculative Decoding with Reinforcement Learning Jiebin Zhang et.al. 2603.01639 null
2026-03-02 Towards Privacy-Preserving LLM Inference via Collaborative Obfuscation (Technical Report) Yu Lin et.al. 2603.01499 null
2026-03-02 Agentic Multi-Source Grounding for Enhanced Query Intent Understanding: A DoorDash Case Study Emmanuel Aboah Boateng et.al. 2603.01486 null
2026-03-02 SFCo-Nav: Efficient Zero-Shot Visual Language Navigation via Collaboration of Slow LLM and Fast Attributed Graph Alignment Chaoran Xiong et.al. 2603.01477 null
2026-03-02 Quasar: Quantized Self-Speculative Acceleration for Rapid Inference via Memory-Efficient Verification Guang Huang et.al. 2603.01399 null
2026-02-27 LK Losses: Direct Acceptance Rate Optimization for Speculative Decoding Alexander Samarin et.al. 2602.23881 null
2026-02-27 SLA-Aware Distributed LLM Inference Across Device-RAN-Cloud Hariz Yet et.al. 2602.23722 null
2026-02-26 Discourse-Aware Dual-Track Streaming Response for Low-Latency Spoken Dialogue Systems Siyuan Liu et.al. 2602.23266 null
2026-02-26 Accelerating Local LLMs on Resource-Constrained Edge Devices via Distributed Prompt Caching Hiroki Matsutani et.al. 2602.22812 null
2026-02-25 Sustainable LLM Inference using Context-Aware Model Switching Yuvarani et.al. 2602.22261 null
2026-02-25 Small Wins Big: Comparing Large Language Models and Domain Fine-Tuned Models for Sarcasm Detection in Code-Mixed Hinglish Text Bitan Majumder et.al. 2602.21933 null
2026-02-26 DualPath: Breaking the Storage Bandwidth Bottleneck in Agentic LLM Inference Yongtong Wu et.al. 2602.21548 null
2026-02-24 SymTorch: A Framework for Symbolic Distillation of Deep Neural Networks Elizabeth S. Z. Tan et.al. 2602.21307 null
2026-02-24 ReviveMoE: Fast Recovery for Hardware Failures in Large-Scale MoE LLM Inference Deployments Haley Li et.al. 2602.21140 null
2026-02-24 CHESS: Context-aware Hierarchical Efficient Semantic Selection for Long-Context LLM Inference Chao Fei et.al. 2602.20732 null
2026-02-24 FAST-Prefill: FPGA Accelerated Sparse Attention for Long Context LLM Prefill Rakshith Jayanth et.al. 2602.20515 null
2026-02-23 KnapSpec: Self-Speculative Decoding via Adaptive Layer Selection as a Knapsack Problem Seongjin Cha et.al. 2602.20217 null
2026-02-21 MoBiQuant: Mixture-of-Bits Quantization for Token-Adaptive Elastic LLMs Dongwei Wang et.al. 2602.20191 null
2026-02-22 A Power Market Model with Hyperscalers and Modular Datacenters Yihsu Chen et.al. 2602.19310 null
2026-02-22 Scaling Inference-Time Computation via Opponent Simulation: Enabling Online Strategic Adaptation in Repeated Negotiation Xiangyu Liu et.al. 2602.19309 null
2026-02-21 WANSpec: Leveraging Global Compute Capacity for LLM Inference Noah Martin et.al. 2602.18931 null
2026-02-21 BiScale: Energy-Efficient Disaggregated LLM Serving via Phase-Aware Placement and DVFS Omar Basit et.al. 2602.18755 null
2026-02-21 HillInfer: Efficient Long-Context LLM Inference on the Edge with Hierarchical KV Eviction using SmartSSD He Sun et.al. 2602.18750 null
2026-02-24 RPU -- A Reasoning Processing Unit Matthew Adiletta et.al. 2602.18568 null
2026-02-20 Dual-Tree LLM-Enhanced Negative Sampling for Implicit Collaborative Filtering Jiayi Wu et.al. 2602.18249 null
2026-02-19 Privacy-Preserving Mechanisms Enable Cheap Verifiable Inference of LLMs Arka Pal et.al. 2602.17223 null
2026-02-18 Privacy-Aware Split Inference with Speculative Decoding for Large Language Models over Wide-Area Networks Michael Cunningham et.al. 2602.16760 null
2026-02-18 LLM-Driven Intent-Based Privacy-Aware Orchestration Across the Cloud-Edge Continuum Zijie Su et.al. 2602.16100 null
2026-02-17 CLAA: Cross-Layer Attention Aggregation for Accelerating LLM Prefill Bradley McDanel et.al. 2602.16054 null
2026-02-17 MoE-Spec: Expert Budgeting for Efficient Speculative Decoding Bradley McDanel et.al. 2602.16052 null
2026-02-17 Learning to Retrieve Navigable Candidates for Efficient Vision-and-Language Navigation Shutian Gu et.al. 2602.15724 null
2026-02-16 Efficient Multi-round LLM Inference over Disaggregated Serving Wenhao He et.al. 2602.14516 null
2026-02-16 WiSparse: Boosting LLM Inference Efficiency with Weight-Aware Mixed Activation Sparsity Lei Chen et.al. 2602.14452 null
2026-02-15 HiVid: LLM-Guided Video Saliency For Content-Aware VOD And Live Streaming Jiahui Chen et.al. 2602.14214 null
2026-02-14 ThunderAgent: A Simple, Fast and Program-Aware Agentic Inference System Hao Kang et.al. 2602.13692 null
2026-02-13 Characterize LSM-tree Compaction Performance via On-Device LLM Inference Jiabiao Ding et.al. 2602.12669 null
2026-02-13 Unleashing Low-Bit Inference on Ascend NPUs: A Comprehensive Evaluation of HiFloat Formats Pengxiang Zhao et.al. 2602.12635 null
2026-02-13 TensorCommitments: A Lightweight Verifiable Inference for Language Models Oguzhan Baser et.al. 2602.12630 null
2026-02-12 Predicting LLM Output Length via Entropy-Guided Representations Huanyi Xie et.al. 2602.11812 null
2026-02-12 Deep Kernel Fusion for Transformers Zixi Zhang et.al. 2602.11808 null
2026-02-12 GORGO: Maximizing KV-Cache Reuse While Minimizing Network Latency in Cross-Region LLM Load Balancing Alessio Ricci Toniolo et.al. 2602.11688 null
2026-02-12 Differentially Private and Communication Efficient Large Language Model Split Inference via Stochastic Quantization and Soft Prompt Yujie Gu et.al. 2602.11513 null
2026-02-12 Cachemir: Fully Homomorphic Encrypted Inference of Generative Large Language Model with KV Cache Ye Yu et.al. 2602.11470 null
2026-02-11 Vulnerabilities in Partial TEE-Shielded LLM Inference with Precomputed Noise Abhishek Saini et.al. 2602.11088 null
2026-02-12 S-GRec: Personalized Semantic-Aware Generative Recommendation with Asymmetric Advantage Jie Jiang et.al. 2602.10606 null
2026-02-10 Beyond SMILES: Evaluating Agentic Systems for Drug Discovery Edward Wijaya et.al. 2602.10163 null
2026-02-12 Efficient Remote Prefix Fetching with GPU-native Media ASICs Liang Mi et.al. 2602.09725 null
2026-02-10 MATA: Multi-Agent Framework for Reliable and Flexible Table Question Answering Sieun Hyeon et.al. 2602.09642 null
2026-02-10 LLM-CoOpt: A Co-Design and Optimization Framework for Efficient LLM Inference on Heterogeneous Platforms Jie Kong et.al. 2602.09323 null
2026-02-09 Benchmarking the Energy Savings with Speculative Decoding Strategies Rohit Dutta et.al. 2602.09113 null
2026-02-09 FlattenGPT: Depth Compression for Transformer with Layer Flattening Ruihan Xu et.al. 2602.08858 null
2026-02-09 Near-Oracle KV Selection via Pre-hoc Sparsity for Long-Context Inference Yifei Gao et.al. 2602.08329 null
2026-02-10 Compiler-Assisted Speculative Sampling for Accelerated LLM Inference on Heterogeneous Edge Devices Alejandro Ruiz y Mesa et.al. 2602.08060 null
2026-02-08 Accuracy-Delay Trade-Off in LLM Offloading via Token-Level Uncertainty Yumin Kim et.al. 2602.07958 null
2026-02-08 MedCoG: Maximizing LLM Inference Density in Medical Reasoning via Meta-Cognitive Regulation Yu Zhao et.al. 2602.07905 null
2026-02-08 Rethinking Latency Denial-of-Service: Attacking the LLM Serving Framework, Not the Model Tianyi Wang et.al. 2602.07878 null
2026-02-07 ParisKV: Fast and Drift-Robust KV-Cache Retrieval for Long-Context LLMs Yanlin Qi et.al. 2602.07721 null
2026-02-07 Scout Before You Attend: Sketch-and-Walk Sparse Attention for Efficient LLM Inference Hoang Anh Duy Le et.al. 2602.07397 null
2026-02-06 SpecAttn: Co-Designing Sparse Attention with Self-Speculative Decoding Yikang Yue et.al. 2602.07223 null
2026-02-06 Do LLMs Act Like Rational Agents? Measuring Belief Coherence in Probabilistic Decision Making Khurram Yamin et.al. 2602.06286 null
2026-02-05 Towards Green AI: Decoding the Energy of LLM Inference in Software Development Lola Solovyeva et.al. 2602.05712 null
2026-02-05 Determining Energy Efficiency Sweet Spots in Production LLM Inference Hiari Pizzini Cavagna et.al. 2602.05695 null
2026-02-05 Optimal Bayesian Stopping for Efficient Inference of Consistent LLM Answers Jingkai Huang et.al. 2602.05395 null
2026-02-05 TIDE: Temporal Incremental Draft Engine for Self-Improving LLM Inference Jiyoung Park et.al. 2602.05145 null
2026-02-04 GPU-to-Grid: Voltage Regulation via GPU Utilization Control Zhirui Liang et.al. 2602.05116 null
2026-02-04 Harmonia: Algorithm-Hardware Co-Design for Memory- and Compute-Efficient BFP-based LLM Inference Xinyu Wang et.al. 2602.04595 null
2026-02-04 LycheeDecode: Accelerating Long-Context LLM Inference via Hybrid-Head Sparse Decoding Gang Lin et.al. 2602.04541 null
2026-02-04 BPDQ: Bit-Plane Decomposition Quantization on a Variable Grid for Large Language Models Junyu Chen et.al. 2602.04163 null
2026-02-03 DynSplit-KV: Dynamic Semantic Splitting for KVCache Compression in Efficient Long-Context LLM Inference Jiancai Ye et.al. 2602.03184 null
2026-02-03 NLI: Non-uniform Linear Interpolation Approximation of Nonlinear Operations for Efficient LLMs Inference Jiangyong Yu et.al. 2602.02988 null
2026-02-03 Large-Scale LLM Inference with Heterogeneous Workloads: Prefill-Decode Contention and Asymptotically Optimal Control Ruihan Lin et.al. 2602.02987 null
2026-02-02 Focus-dLLM: Accelerating Long-Context Diffusion LLM Inference via Confidence-Guided Context Focusing Lingkun Long et.al. 2602.02159 null
2026-01-30 Fast Forward: Accelerating LLM Prefill with Predictive FFN Sparsity Aayush Gautam et.al. 2602.00397 null
2026-01-30 Harvest: Opportunistic Peer-to-Peer GPU Caching for LLM Inference Nikhil Gopal et.al. 2602.00328 null
2026-01-30 EigenAI: Deterministic Inference, Verifiable Results David Ribeiro Alves et.al. 2602.00182 null
2026-01-30 Safer Policy Compliance with Dynamic Epistemic Fallback Joseph Marvin Imperial et.al. 2601.23094 null
2026-01-30 Competitive Non-Clairvoyant KV-Cache Scheduling for LLM Inference Yiding Feng et.al. 2601.22996 null
2026-01-30 Matterhorn: Efficient Analog Sparse Spiking Transformer Architecture with Masked Time-To-First-Spike Encoding Zhanglu Yan et.al. 2601.22876 null
2026-01-30 OSNIP: Breaking the Privacy-Utility-Efficiency Trilemma in LLM Inference via Obfuscated Semantic Null Space Zhiyuan Cao et.al. 2601.22752 null
2026-01-30 SCaLRec: Semantic Calibration for LLM-enabled Cloud-Device Sequential Recommendation Ruiqi Zheng et.al. 2601.22543 null
2026-01-29 Understanding Efficiency: Quantization, Batching, and Serving Strategies in LLM Energy Use Julien Delavande et.al. 2601.22362 null
2026-01-29 EWSJF: An Adaptive Scheduler with Hybrid Partitioning for Mixed-Workload LLM Inference Bronislav Sidik et.al. 2601.21758 null
2026-01-29 Adaptive and Robust Cost-Aware Proof of Quality for Decentralized LLM Inference Networks Arther Tian et.al. 2601.21189 null
2026-01-28 ChunkWise LoRA: Adaptive Sequence Partitioning for Memory-Efficient Low-Rank Adaptation and Accelerated LLM Inference Ketan Thakkar et.al. 2601.21109 null
2026-01-29 ProfInfer: An eBPF-based Fine-Grained LLM Inference Profiler Bohua Zou et.al. 2601.20755 null
2026-01-29 DRAINCODE: Stealthy Energy Consumption Attacks on Retrieval-Augmented Code Generation via Context Poisoning Yanlin Wang et.al. 2601.20615 null
2026-01-28 TABED: Test-Time Adaptive Ensemble Drafting for Robust Speculative Decoding in LVLMs Minjae Lee et.al. 2601.20357 null
2026-01-28 Beyond Speedup -- Utilizing KV Cache for Sampling and Reasoning Zeyu Xing et.al. 2601.20326 null
2026-01-28 SuperInfer: SLO-Aware Rotary Scheduling and Memory Management for LLM Inference on Superchips Jiahuan Yu et.al. 2601.20309 null
2026-01-28 LogSieve: Task-Aware CI Log Reduction for Sustainable LLM-Based Analysis Marcus Emmanuel Barnes et.al. 2601.20148 null
2026-01-27 Identifying and Transferring Reasoning-Critical Neurons: Improving LLM Inference Reliability via Activation Steering Fangan Dong et.al. 2601.19847 null
2026-01-27 DART: Diffusion-Inspired Speculative Decoding for Fast LLM Inference Fuliang Liu et.al. 2601.19278 null
2026-01-26 Randomization Boosts KV Caching, Learning Balances Query Load: A Joint Perspective Fangzhou Wu et.al. 2601.18999 null
2026-01-26 Flatter Tokens are More Valuable for Speculative Draft Model Training Jiaming Fan et.al. 2601.18902 null
2026-01-26 Scaling up Privacy-Preserving ML: A CKKS Implementation of Llama-2-7B Jaiyoung Park et.al. 2601.18511 null
2026-01-26 FABLE: Forest-Based Adaptive Bi-Path LLM-Enhanced Retrieval for Multi-Document Reasoning Lin Sun et.al. 2601.18116 null
2026-01-25 LLM-42: Enabling Determinism in LLM Inference with Verified Speculation Raja Gond et.al. 2601.17768 null
2026-01-25 Fast KVzip: Efficient and Accurate LLM Inference with Gated KV Eviction Jang-Hyun Kim et.al. 2601.17668 null
2026-01-24 GreenServ: Energy-Efficient Context-Aware Dynamic Routing for Multi-Model LLM Inference Thomas Ziller et.al. 2601.17551 null
2026-01-22 FlexLLM: Composable HLS Library for Flexible Hybrid LLM Accelerator Design Jiahao Zhang et.al. 2601.15710 null
2026-01-21 MARS: Unleashing the Power of Speculative Decoding via Margin-Aware Verification Jingwei Song et.al. 2601.15498 null
2026-01-21 QMC: Efficient SLM Edge Inference via Outlier-Aware Quantization and Emergent Memories Co-Design Nilesh Prasad Pandey et.al. 2601.14549 null
2026-01-20 HeteroCache: A Dynamic Retrieval Approach to Heterogeneous KV Cache Compression for Long-Context LLM Inference Zhiyuan Shi et.al. 2601.13684 null
2026-01-20 PRIMAL: Processing-In-Memory Based Low-Rank Adaptation for LLM Inference Accelerator Yue Jiet Chong et.al. 2601.13628 null
2026-01-19 Explicit Cognitive Allocation: A Principle for Governed and Auditable Inference in Large Language Models Héctor Manuel Manzanilla-Granados et.al. 2601.13443 null
2026-01-19 Probe and Skip: Self-Predictive Token Skipping for Efficient Long-Context LLM Inference Zimeng Wu et.al. 2601.13155 null
2026-01-19 From Prefix Cache to Fusion RAG Cache: Accelerating LLM Inference in Retrieval-Augmented Generation Jiahao Wang et.al. 2601.12904 null
2026-01-18 Power Aware Dynamic Reallocation For Inference Yiwei Jiang et.al. 2601.12241 null
2026-01-16 RAPID-Serve: Resource-efficient and Accelerated P/D Intra-GPU Disaggregation Amna Masood et.al. 2601.11822 null
2026-01-16 HALO: Semantic-Aware Distributed LLM Inference in Lossy Edge Network Peirong Zheng et.al. 2601.11676 null
2026-01-15 WISP: Waste- and Interference-Suppressed Distributed Speculative LLM Serving at the Edge via Dynamic Drafting and SLO-Aware Batching Xiangchen Li et.al. 2601.11652 null
2026-01-16 FORESTLLM: Large Language Models Make Random Forest Great on Few-shot Tabular Learning Zhihan Yang et.al. 2601.11311 null
2026-01-14 Private LLM Inference on Consumer Blackwell GPUs: A Practical Guide for Cost-Effective Local Deployment in SMEs Jonathan Knoop et.al. 2601.09527 null
2026-01-14 LatencyPrism: Online Non-intrusive Latency Sculpting for SLO-Guaranteed LLM Inference Du Yin et.al. 2601.09258 null
2026-01-13 HIPPO: Accelerating Video Large Language Models Inference via Holistic-aware Parallel Speculative Decoding Qitan Lv et.al. 2601.08273 null
2026-01-13 Coordinated Cooling and Compute Management for AI Datacenters Nardos Belay Abera et.al. 2601.08113 null
2026-01-12 Adaptive Layer Selection for Layer-Wise Token Pruning in LLM Inference Rei Taniguchi et.al. 2601.07667 null
2026-01-12 ARCQuant: Boosting NVFP4 Quantization with Augmented Residual Channels for LLMs Haoqian Meng et.al. 2601.07475 null
2026-01-12 TALON: Confidence-Aware Speculative Decoding with Adaptive Token Trees Tianyu Liu et.al. 2601.07353 null
2026-01-12 Stochastic CHAOS: Why Deterministic Inference Kills, and Distributional Variability Is the Heartbeat of Artificial Cognition Tanmay Joshi et.al. 2601.07239 null
2026-01-09 AIConfigurator: Lightning-Fast Configuration Optimization for Multi-Framework LLM Serving Tianhao Xu et.al. 2601.06288 null
2026-01-07 AutoVulnPHP: LLM-Powered Two-Stage PHP Vulnerability Detection and Automated Localization Zhiqiang Wang et.al. 2601.06177 null
2026-01-14 Challenges and Research Directions for Large Language Model Inference Hardware Xiaoyu Ma et.al. 2601.05047 null
2026-01-08 Revisiting Judge Decoding from First Principles via Training-Free Distributional Divergence Shengyin Sun et.al. 2601.04766 null
2026-01-08 GPU-Accelerated INT8 Quantization for KV Cache Compression in Large Language Models Maanas Taneja et.al. 2601.04719 null
2026-01-07 XGrammar 2: Dynamic and Efficient Structured Generation Engine for Agentic LLMs Linzhang Li et.al. 2601.04426 null
2026-01-05 LoRA-Drop: Temporal LoRA Decoding for Efficient LLM Inference Hossein Rajabzadeh et.al. 2601.02569 null
2026-01-06 Making MoE-based LLM Inference Resilient with Tarragon Songyu Zhang et.al. 2601.01310 null
2026-01-08 From Policy to Logic for Efficient and Interpretable Coverage Assessment Rhitabrat Pokharel et.al. 2601.01266 null
2026-01-01 FlashInfer-Bench: Building the Virtuous Cycle for AI-driven LLM Systems Shanli Xing et.al. 2601.00227 null
2025-12-31 FPGA Co-Design for Efficient N:M Sparse and Quantized Model Inference Fen-Yu Hsieh et.al. 2512.24713 null
2026-01-04 Hardware Acceleration for Neural Networks: A Comprehensive Survey Bin Xu et.al. 2512.23914 null
2025-12-29 Yggdrasil: Bridging Dynamic Speculation and Static Runtime for Latency-Optimal Tree-Based LLM Decoding Yue Guan et.al. 2512.23858 null
2025-12-28 Viability and Performance of a Private LLM Server for SMBs: A Benchmark Analysis of Qwen3-30B on Consumer-Grade Hardware Alex Khalil et.al. 2512.23029 null
2025-12-28 Argus: Token Aware Distributed LLM Inference Optimization Panlong Wu et.al. 2512.22925 null
2025-12-27 Nightjar: Dynamic Adaptive Speculative Decoding for Large Language Models Serving Rui Li et.al. 2512.22420 null
2025-12-22 Mirage Persistent Kernel: A Compiler and Runtime for Mega-Kernelizing Tensor Programs Xinhao Cheng et.al. 2512.22219 null
2025-12-20 MatKV: Trading Compute for Flash Storage in LLM Inference Kun-Woo Shin et.al. 2512.22195 null
2025-12-26 Prefill vs. Decode Bottlenecks: SRAM-Frequency Tradeoffs and the Memory-Bandwidth Ceiling Hannah Atmer et.al. 2512.22066 null
2025-12-26 Optimizing Resource Allocation for Geographically-Distributed Inference by Large Language Models Tingyang Sun et.al. 2512.21884 null
2025-12-26 LIME: Accelerating Collaborative Lossless LLM Inference on Memory-Constrained Edge Devices Mingyu Sun et.al. 2512.21835 null
2025-12-23 Predictive-LoRA: A Proactive and Fragmentation-Aware Serverless Inference System for LLMs Yinan Ni et.al. 2512.20210 null
2025-12-23 Concept Generalization in Humans and Large Language Models: Insights from the Number Game Arghavan Bazigaran et.al. 2512.20162 null
2025-12-20 TraCT: Disaggregated LLM Serving with CXL Shared Memory KV Cache at Rack-Scale Dongha Yoon et.al. 2512.18194 null
2025-12-20 Making Strong Error-Correcting Codes Work Effectively for HBM in AI Inference Rui Xie et.al. 2512.18152 null
2025-12-19 Specification and Detection of LLM Code Smells Brahim Mahmoudi et.al. 2512.18020 null
2025-12-19 CodeGEMM: A Codebook-Centric Approach to Efficient GEMM in Quantized LLMs Gunho Park et.al. 2512.17970 null
2025-12-19 Enabling Disaggregated Multi-Stage MLLM Inference via GPU-Internal Scheduling and Resource Sharing Lingxiao Zhao et.al. 2512.17574 null
2025-12-22 Learning What to Write: Write-Gated KV for Efficient Long-Context Inference Yen-Chieh Huang et.al. 2512.17452 null
2025-12-18 Kascade: A Practical Sparse Attention Method for Long-Context LLM Inference Dhruv Deshmukh et.al. 2512.16391 null
2025-12-18 Design and Evaluation of Cost-Aware PoQ for Decentralized LLM Inference Arther Tian et.al. 2512.16317 null
2025-12-18 Fast Collaborative Inference via Distributed Speculative Decoding Ce Zheng et.al. 2512.16273 null
2025-12-18 Staggered Batch Scheduling: Co-optimizing Time-to-First-Token and Throughput for High-Efficiency LLM Inference Jian Tian et.al. 2512.16134 null
2025-12-16 EVICPRESS: Joint KV-Cache Compression and Eviction for Efficient LLM Serving Shaoting Feng et.al. 2512.14946 null
2025-12-16 Adaptive Cache Pollution Control for Large Language Model Inference Workloads Using Temporal CNN-Based Prediction and Priority-Aware Replacement Songze Liu et.al. 2512.14151 null
2025-12-14 Counting Clues: A Lightweight Probabilistic Baseline Can Match an LLM Furong Jia et.al. 2512.12868 null
2025-12-14 Fine-Grained Energy Prediction For Parallelized LLM Inference With PIE-P Anurag Dutt et.al. 2512.12801 null
2025-12-13 V-Rex: Real-Time Streaming Video LLM Acceleration via Dynamic KV Cache Retrieval Donghyuk Kim et.al. 2512.12284 null
2025-12-12 Learning to Extract Context for Context-Aware LLM Inference Minseon Kim et.al. 2512.11986 null
2025-12-12 PD-Swap: Prefill-Decode Logic Swapping for End-to-End LLM Inference on Edge FPGAs via Dynamic Partial Reconfiguration Yifan Zhang et.al. 2512.11550 null
2025-12-12 AdaSD: Adaptive Speculative Decoding for Efficient Language Model Inference Kuan-Wei Lu et.al. 2512.11280 null
2025-12-12 Adaptive Soft Rolling KV Freeze with Entropy-Guided Recovery: Sublinear Memory Growth for Efficient LLM Inference Adilet Metinov et.al. 2512.11221 null
2025-12-11 LLM-Auction: Generative Auction towards LLM-Native Advertising Chujie Zhao et.al. 2512.10551 null
2025-12-14 GoodSpeed: Optimizing Fair Goodput with Adaptive Speculative Decoding in Distributed Edge Inference Phuong Tran et.al. 2512.09963 null
2025-12-10 RACAM: Enhancing DRAM with Reuse-Aware Computation and Automated Mapping for ML Inference Siyuan Ma et.al. 2512.09304 null
2025-12-09 Magneton: Optimizing Energy Efficiency of ML Systems via Differential Energy Debugging Yi Pan et.al. 2512.08365 null
2025-12-08 NeSTR: A Neuro-Symbolic Abductive Framework for Temporal Reasoning in Large Language Models Feng Liang et.al. 2512.07218 null
2025-12-08 Leveraging KV Similarity for Online Structured Pruning in LLMs Jungmin Lee et.al. 2512.07090 null
2025-12-07 PrivLLMSwarm: Privacy-Preserving LLM-Driven UAV Swarms for Secure IoT Surveillance Jifar Wakuma Ayana et.al. 2512.06747 null
2025-12-07 KV-CAR: KV Cache Compression using Autoencoders and KV Reuse in Large Language Models Sourjya Roy et.al. 2512.06727 null
2025-12-06 Vec-LUT: Vector Table Lookup for Parallel Ultra-Low-Bit LLM Inference on Edge Devices Xiangyu Li et.al. 2512.06443 null
2025-12-05 Compass: Mapping Space Exploration for Multi-Chiplet Accelerators Targeting LLM Inference Serving Workloads Boyu Li et.al. 2512.06093 null
2025-12-05 KQ-SVD: Compressing the KV Cache with Provable Guarantees on Attention Fidelity Damien Lesens et.al. 2512.05916 null
2025-12-05 RoBoN: Routed Online Best-of-n for Test-Time Scaling with Multiple LLMs Jonathan Geuter et.al. 2512.05542 null
2025-12-05 Automated Identification of Incidentalomas Requiring Follow-Up: A Multi-Anatomy Evaluation of LLM-Based and Supervised Approaches Namu Park et.al. 2512.05537 null
2025-12-05 Knowing Your Uncertainty -- On the application of LLM in social sciences Bolun Zhang et.al. 2512.05461 null
2025-12-04 Towards A Cultural Intelligence and Values Inferences Quality Benchmark for Community Values and Common Knowledge Brittany Johnson et.al. 2512.05176 null
2025-12-04 Semantic Soft Bootstrapping: Long Context Reasoning in LLMs without Reinforcement Learning Purbesh Mitra et.al. 2512.05105 null
2025-12-04 David vs. Goliath: Can Small Models Win Big with Agentic AI in Hardware Design? Shashwat Shankar et.al. 2512.05073 null
2025-12-04 MemLoRA: Distilling Expert Adapters for On-Device Memory Systems Massimo Bini et.al. 2512.04763 null
2025-12-04 EtCon: Edit-then-Consolidate for Reliable Knowledge Editing Ruilin Li et.al. 2512.04753 null
2025-12-04 RLHFSpec: Breaking the Efficiency Bottleneck in RLHF Training via Adaptive Drafting Siqi Wang et.al. 2512.04752 null
2025-12-04 Measuring the Unspoken: A Disentanglement Model and Benchmark for Psychological Analysis in the Wild Yigui Feng et.al. 2512.04728 null
2025-12-04 PBFuzz: Agentic Directed Fuzzing for PoV Generation Haochen Zeng et.al. 2512.04611 null
2025-12-04 A Light-Weight Large Language Model File Format for Highly-Secure Model Distribution Huifeng Zhu et.al. 2512.04580 null
2025-12-04 On the Limits of Test-Time Compute: Sequential Reward Filtering for Better Inference Yue Yu et.al. 2512.04558 null
2025-12-04 MSME: A Multi-Stage Multi-Expert Framework for Zero-Shot Stance Detection Yuanshuo Zhang et.al. 2512.04492 null
2025-12-04 LLM-SrcLog: Towards Proactive and Unified Log Template Extraction via Large Language Models Jiaqi Sun et.al. 2512.04474 null
2025-12-03 AugServe: Adaptive Request Scheduling for Augmented Large Language Model Inference Serving Ying Wang et.al. 2512.04013 null
2025-12-03 OD-MoE: On-Demand Expert Loading for Cacheless Edge-Distributed MoE Inference Liujianfu Wang et.al. 2512.03927 null
2025-12-03 Training and Evaluation of Guideline-Based Medical Reasoning in LLMs Michael Staniek et.al. 2512.03838 null
2025-12-03 ConvRot: Rotation-Based Plug-and-Play 4-bit Quantization for Diffusion Transformers Feice Huang et.al. 2512.03673 null
2025-12-03 KVNAND: Efficient On-Device Large Language Model Inference Using DRAM-Free In-Flash Computing Lishuo Deng et.al. 2512.03608 null
2025-12-03 EnCompass: Enhancing Agent Programming with Search Over Program Execution Paths Zhening Li et.al. 2512.03571 null
2025-12-03 A Preliminary Study on the Promises and Challenges of Native Top-$k$ Sparse Attention Di Xiu et.al. 2512.03494 null
2025-12-03 From Hypothesis to Premises: LLM-based Backward Logical Reasoning with Selective Symbolic Translation Qingchuan Li et.al. 2512.03360 null
2025-12-03 Cache What Lasts: Token Retention for Memory-Bounded KV Cache in LLMs Ngoc Bui et.al. 2512.03324 null
2025-12-02 LLM-Guided Material Inference for 3D Point Clouds Nafiseh Izadyar et.al. 2512.03237 null
2025-12-02 TokenPowerBench: Benchmarking the Power Consumption of LLM Inference Chenxu Niu et.al. 2512.03024 null
2025-12-02 Distribution-Calibrated Inference time compute for Thinking LLM-as-a-Judge Hamid Dadkhahi et.al. 2512.03019 null
2025-12-02 FAIRY2I: Universal Extremely-Low Bit QAT framework via Widely-Linear Representation and Phase-Aware Quantization Feiyu Wang et.al. 2512.02901 null
2025-12-02 OptPO: Optimal Rollout Allocation for Test-time Policy Optimization Youkang Wang et.al. 2512.02882 null
2025-12-02 Cross-Lingual Prompt Steerability: Towards Accurate and Robust LLM Behavior across Languages Lechen Zhang et.al. 2512.02841 null
2025-12-02 FiMMIA: scaling semantic perturbation-based membership inference across modalities Anton Emelyanov et.al. 2512.02786 null
2025-12-02 Emergent Bayesian Behaviour and Optimal Cue Combination in LLMs Julian Ma et.al. 2512.02719 null
2025-12-02 CUDA-L2: Surpassing cuBLAS Performance for Matrix Multiplication through Reinforcement Learning Songqiao Su et.al. 2512.02551 null
2025-12-02 In-Context Distillation with Self-Consistency Cascades: A Simple, Training-Free Way to Reduce LLM Agent Costs Vishnu Sarukkai et.al. 2512.02543 null
2025-12-02 Reasoning Path and Latent State Analysis for Multi-view Visual Spatial Reasoning: A Cognitive Science Perspective Qiyao Xue et.al. 2512.02340 null
2025-12-01 Four Over Six: More Accurate NVFP4 Quantization with Adaptive Block Scaling Jack Cook et.al. 2512.02010 null
2025-12-01 The Art of Scaling Test-Time Compute for Large Language Models Aradhye Agarwal et.al. 2512.02008 null
2025-12-01 KV Pareto: Systems-Level Optimization of KV Cache and Model Compression for Long Context Inference Sai Gokhale et.al. 2512.01953 null
2025-12-01 Latent Debate: A Surrogate Framework for Interpreting LLM Thinking Lihu Chen et.al. 2512.01909 null
2025-12-01 DreamingComics: A Story Visualization Pipeline via Subject and Layout Customized Generation using Video Models Patrick Kwon et.al. 2512.01686 null
2025-12-01 A Systematic Characterization of LLM Inference on GPUs Haonan Wang et.al. 2512.01644 null
2025-12-01 LLM2Fx-Tools: Tool Calling For Music Post-Production Seungheon Doh et.al. 2512.01559 null
2025-12-01 Multi-Path Collaborative Reasoning via Reinforcement Learning Jindi Lv et.al. 2512.01485 null
2025-12-01 ZIP-RC: Zero-overhead Inference-time Prediction of Reward and Cost for Adaptive and Interpretable Generation Rohin Manvi et.al. 2512.01457 null
2025-12-01 Kardia-R1: Unleashing LLMs to Reason toward Understanding and Empathy for Emotional Support via Rubric-as-Judge Reinforcement Learning Jiahao Yuan et.al. 2512.01282 null
2025-11-30 Reward Auditor: Inference on Reward Modeling Suitability in Real-World Perturbed Scenarios Jianxiang Zang et.al. 2512.00920 null
2025-11-30 AFRAgent: An Adaptive Feature Renormalization Based High Resolution Aware GUI agent Neeraj Anand et.al. 2512.00846 null
2025-11-30 ARCADIA: Scalable Causal Discovery for Corporate Bankruptcy Analysis Using Agentic AI Fabrizio Maturo et.al. 2512.00839 null
2025-11-30 SIMPLE: Disaggregating Sampling from GPU Inference into a Decision Plane for Faster Distributed LLM Serving Bohan Zhao et.al. 2512.00719 null
2025-11-29 SCALE: Selective Resource Allocation for Overcoming Performance Bottlenecks in Mathematical Test-time Scaling Yang Xiao et.al. 2512.00466 null
2025-11-29 Echo-N1: Affective RL Frontier Naifan Zhang et.al. 2512.00344 null
2025-11-29 Efficient Kernel Mapping and Comprehensive System Evaluation of LLM Acceleration on a CGLA Takuto Ando et.al. 2512.00335 null
2025-11-29 RL-Struct: A Lightweight Reinforcement Learning Framework for Reliable Structured Output in LLMs Ruike Hu et.al. 2512.00319 null
2025-11-29 Evolving Paradigms in Task-Based Search and Learning: A Comparative Analysis of Traditional Search Engine with LLM-Enhanced Conversational Search System Zhitong Guan et.al. 2512.00313 null
2025-11-28 Demystifying Errors in LLM Reasoning Traces: An Empirical Study of Code Execution Simulation Mohammad Abdollahi et.al. 2512.00215 null
2025-11-28 ThetaEvolve: Test-time Learning on Open Problems Yiping Wang et.al. 2511.23473 null
2025-11-28 Behavior-Equivalent Token: Single-Token Replacement for Long Prompts in LLMs Jiancheng Dong et.al. 2511.23271 null
2025-11-28 Unlocking Multilingual Reasoning Capability of LLMs and LVLMs through Representation Engineering Qiming Li et.al. 2511.23231 null
2025-11-28 HPSU: A Benchmark for Human-Level Perception in Real-World Spoken Speech Understanding Chen Li et.al. 2511.23178 null
2025-11-28 Training-Free Loosely Speculative Decoding: Accepting Semantically Correct Drafts Beyond Exact Match Jinze Li et.al. 2511.22972 null
2025-11-28 Experts are all you need: A Composable Framework for Large Language Model Inference Shrihari Sridharan et.al. 2511.22955 null
2025-11-28 Visual Puns from Idioms: An Iterative LLM-T2IM-MLLM Framework Kelaiti Xiao et.al. 2511.22943 null
2025-11-28 RAG-Empowered LLM-Driven Dynamic Radio Resource Management in Open 6G RAN Onur Salan et.al. 2511.22933 null
2025-11-28 Serving Heterogeneous LoRA Adapters in Distributed LLM Inference Systems Shashwat Jaiswal et.al. 2511.22880 null
2025-11-27 PRISM: Privacy-Aware Routing for Adaptive Cloud-Edge LLM Inference via Semantic Sketch Collaboration Junfei Zhan et.al. 2511.22788 null
2025-11-27 CacheTrap: Injecting Trojans in LLMs without Leaving any Traces in Inputs or Weights Mohaiminul Al Nahian et.al. 2511.22681 null
2025-11-27 GEO-Detective: Unveiling Location Privacy Risks in Images with LLM Agents Xinyu Zhang et.al. 2511.22441 null
2025-11-27 FADiff: Fusion-Aware Differentiable Optimization for DNN Scheduling on Tensor Accelerators Shuao Jia et.al. 2511.22348 null
2025-11-27 Edge Deployment of Small Language Models, a comprehensive comparison of CPU, GPU and NPU backends Pablo Prieto et.al. 2511.22334 null
2025-11-27 RecToM: A Benchmark for Evaluating Machine Theory of Mind in LLM-based Conversational Recommender Systems Mengfan Li et.al. 2511.22275 null
2025-11-27 Aquas: Enhancing Domain Specialization through Holistic Hardware-Software Co-Optimization based on MLIR Yuyang Zou et.al. 2511.22267 null
2025-11-27 Focused Chain-of-Thought: Efficient LLM Reasoning via Structured Input Information Lukas Struppek et.al. 2511.22176 null
2025-11-27 Statistical Independence Aware Caching for LLM Workflows Yihan Dai et.al. 2511.22118 null
2025-11-26 A Comparative Study of LLM Prompting and Fine-Tuning for Cross-genre Authorship Attribution on Chinese Lyrics Yuxin Li et.al. 2511.21930 null
2025-11-26 Matrix: Peer-to-Peer Multi-Agent Synthetic Data Generation Framework Dong Wang et.al. 2511.21686 null
2025-11-26 DSD: A Distributed Speculative Decoding Solution for Edge-Cloud Agile Large Model Serving Fengze Yu et.al. 2511.21669 null
2025-11-26 Auxiliary Metrics Help Decoding Skill Neurons in the Wild Yixiu Zhao et.al. 2511.21610 null
2025-11-26 Automated Dynamic AI Inference Scaling on HPC-Infrastructure: Integrating Kubernetes, Slurm and vLLM Tim Trappen et.al. 2511.21413 null
2025-11-26 PEFT-Bench: A Parameter-Efficient Fine-Tuning Methods Benchmark Robert Belanec et.al. 2511.21285 null
2025-11-26 BRIDGE: Building Representations In Domain Guided Program Verification Robert Joseph George et.al. 2511.21104 null
2025-11-26 MLPMoE: Zero-Shot Architectural Metamorphosis of Dense LLM MLPs into Static Mixture-of-Experts Ivan Novikov et.al. 2511.21089 null
2025-11-26 OVOD-Agent: A Markov-Bandit Framework for Proactive Visual Reasoning and Self-Evolving Detection Chujie Wang et.al. 2511.21064 null
2025-11-26 LOOM: Personalized Learning Informed by Daily LLM Conversations Toward Long-Term Mastery via a Dynamic Learner Memory Graph Justin Cui et.al. 2511.21037 null
2025-11-26 CaptionQA: Is Your Caption as Useful as the Image Itself? Shijia Yang et.al. 2511.21025 null
2025-11-26 A Dynamic PD-Disaggregation Architecture for Maximizing Goodput in LLM Inference Serving Junhan Liao et.al. 2511.20982 null
2025-11-26 Aragog: Just-in-Time Model Routing for Scalable Serving of Agentic Workflows Yinwei Dai et.al. 2511.20975 null
2025-11-25 Representation Interventions Enable Lifelong Unstructured Knowledge Control Xuyuan Liu et.al. 2511.20892 null
2025-11-25 Latent Collaboration in Multi-Agent Systems Jiaru Zou et.al. 2511.20639 null
2025-11-25 DiFR: Inference Verification Despite Nondeterminism Adam Karvonen et.al. 2511.20621 null
2025-11-25 Beyond Generation: Multi-Hop Reasoning for Factual Accuracy in Vision-Language Models Shamima Hossain et.al. 2511.20531 null
2025-11-25 Scaling LLM Speculative Decoding: Non-Autoregressive Forecasting in Large-Batch Scenarios Luohe Shi et.al. 2511.20340 null
2025-11-25 LLM-Driven Transient Stability Assessment: From Automated Simulation to Neural Architecture Design Lianzhe Hu et.al. 2511.20276 null
2025-11-25 REFLEX: Self-Refining Explainable Fact-Checking via Disentangling Truth into Style and Substance Chuyi Kong et.al. 2511.20233 null
2025-11-25 Beluga: A CXL-Based Memory Architecture for Scalable and Efficient LLM KVCache Management Xinjun Yang et.al. 2511.20172 null
2025-11-25 SSA: Sparse Sparse Attention by Aligning Full and Sparse Attention Outputs in Feature Space Zhenyi Shen et.al. 2511.20102 null
2025-11-25 More Bias, Less Bias: BiasPrompting for Enhanced Multiple-Choice Question Answering Duc Anh Vu et.al. 2511.20086 null
2025-11-25 Reducing Latency of LLM Search Agent via Speculation-based Algorithm-System Co-Design Zixiao Huang et.al. 2511.20048 null
2025-11-25 CoC-VLA: Delving into Adversarial Domain Transfer for Explainable Autonomous Driving via Chain-of-Causality Visual-Language-Action Model Dapeng Zhang et.al. 2511.19914 null
2025-11-25 Mosaic Pruning: A Hierarchical Framework for Generalizable Pruning of Mixture-of-Experts Models Wentao Hu et.al. 2511.19822 null
2025-11-24 Gender Bias in Emotion Recognition by Large Language Models Maureen Herbert et.al. 2511.19785 null
2025-11-24 Learning to Reason: Training LLMs with GPT-OSS or DeepSeek R1 Reasoning Traces Shaltiel Shmidman et.al. 2511.19333 null
2025-11-24 MAESTRO: Multi-Agent Environment Shaping through Task and Reward Optimization Boyuan Wu et.al. 2511.19253 null
2025-11-24 Learning Plug-and-play Memory for Guiding Video Diffusion Models Selena Song et.al. 2511.19229 null
2025-11-24 From Pixels to Posts: Retrieval-Augmented Fashion Captioning and Hashtag Generation Moazzam Umer Gondal et.al. 2511.19149 null
2025-11-24 SWAN: Sparse Winnowed Attention for Reduced Inference Memory via Decompression-Free KV-Cache Compression Santhosh G S et.al. 2511.18936 null
2025-11-24 Defending Large Language Models Against Jailbreak Exploits with Responsible AI Considerations Ryan Wong et.al. 2511.18933 null
2025-11-24 KernelBand: Boosting LLM-based Kernel Optimization with a Hierarchical and Hardware-aware Multi-armed Bandit Dezhi Ran et.al. 2511.18868 null
2025-11-24 Think Before You Prune: Selective Self-Generated Calibration for Pruning Large Reasoning Models Yang Xiang et.al. 2511.18864 null
2025-11-24 UNeMo: Collaborative Visual-Language Reasoning and Navigation via a Multimodal World Model Changxin Huang et.al. 2511.18845 null
2025-11-24 Optimizing LLM Code Suggestions: Feedback-Driven Timing with Lightweight State Bounds Mohammad Nour Al Awad et.al. 2511.18842 null
2025-11-23 A Needle in a Haystack: Intent-driven Reusable Artifacts Recommendation with LLMs Dongming Jin et.al. 2511.18343 null
2025-11-23 Skypilot: Fine-Tuning LLM with Physical Grounding for AAV Coverage Search Zhongkai Chen et.al. 2511.18270 null
2025-11-23 LLM Reasoning for Cold-Start Item Recommendation Shijun Li et.al. 2511.18261 null
2025-11-22 Towards Harnessing the Power of LLMs for ABAC Policy Mining More Aayush Babasaheb et.al. 2511.18098 null
2025-11-22 L2V-CoT: Cross-Modal Transfer of Chain-of-Thought Reasoning via Latent Intervention Yuliang Zhan et.al. 2511.17910 null
2025-11-22 QuickLAP: Quick Language-Action Preference Learning for Autonomous Driving Agents Jordan Abi Nader et.al. 2511.17855 null
2025-11-21 Deterministic Inference across Tensor Parallel Sizes That Eliminates Training-Inference Mismatch Ziyang Zhang et.al. 2511.17826 null
2025-11-21 APRIL: Annotations for Policy evaluation with Reliable Inference from LLMs Aishwarya Mandyam et.al. 2511.17818 null
2025-11-21 That's not natural: The Impact of Off-Policy Training Data on Probe Performance Nathalie Kirch et.al. 2511.17408 null
2025-11-21 SpatialGeo: Boosting Spatial Reasoning in Multimodal LLMs via Geometry-Semantics Fusion Jiajie Guo et.al. 2511.17308 null
2025-11-21 Hallucinate Less by Thinking More: Aspect-Based Causal Abstention for Large Language Models Vy Nguyen et.al. 2511.17170 null
2025-11-21 ChainV: Atomic Visual Hints Make Multimodal Reasoning Shorter and Better Yuan Zhang et.al. 2511.17106 null
2025-11-21 Parametric Retrieval-Augmented Generation using Latent Routing of LoRA Adapters Zhan Su et.al. 2511.17044 null
2025-11-21 Optimizing PyTorch Inference with LLM-Based Multi-Agent Systems Kirill Nagaitsev et.al. 2511.16964 null
2025-11-20 Comparison of Text-Based and Image-Based Retrieval in Multimodal Retrieval Augmented Generation Large Language Model Systems Elias Lumer et.al. 2511.16654 null
2025-11-20 Integrating Symbolic Natural Language Understanding and Language Models for Word Sense Disambiguation Kexin Zhao et.al. 2511.16577 null
2025-11-20 The Oracle and The Prism: A Decoupled and Efficient Framework for Generative Recommendation Explanation Jiaheng Zhang et.al. 2511.16543 null
2025-11-20 Beyond Tokens in Language Models: Interpreting Activations through Text Genre Chunks Éloïse Benito-Rodriguez et.al. 2511.16540 null
2025-11-20 Incorporating Self-Rewriting into Large Language Model Reasoning Reinforcement Jiashu Yao et.al. 2511.16331 null
2025-11-20 SDA: Steering-Driven Distribution Alignment for Open LLMs without Fine-Tuning Wei Xia et.al. 2511.16324 null
2025-11-20 T2T-VICL: Unlocking the Boundaries of Cross-Task Visual In-Context Learning via Implicit Text-Driven VLMs Shao-Jun Xia et.al. 2511.16107 null
2025-11-20 Train Short, Infer Long: Speech-LLM Enables Zero-Shot Streamable Joint ASR and Diarization on Long Audio Mohan Shi et.al. 2511.16046 null
2025-11-20 A Scalable NorthPole System with End-to-End Vertical Integration for Low-Latency and Energy-Efficient LLM Inference Michael V. DeBole et.al. 2511.15950 null
2025-11-19 Global Resolution: Optimal Multi-Draft Speculative Sampling via Convex Minimization Rahul Krishna Thomas et.al. 2511.15898 null
2025-11-19 MoDES: Accelerating Mixture-of-Experts Multimodal Large Language Models via Dynamic Expert Skipping Yushi Huang et.al. 2511.15690 null
2025-11-19 A Tensor Compiler for Processing-In-Memory Architectures Peiming Yang et.al. 2511.15503 null
2025-11-19 Know Your Intent: An Autonomous Multi-Perspective LLM Agent Framework for DeFi User Transaction Intent Mining Qian'ang Mao et.al. 2511.15456 null
2025-11-19 Unveiling Inference Scaling for Difference-Aware User Modeling in LLM Personalization Suyu Chen et.al. 2511.15389 null
2025-11-19 HEAD-QA v2: Expanding a Healthcare Benchmark for Reasoning Alexis Correa-Guillén et.al. 2511.15355 null
2025-11-19 OEMA: Ontology-Enhanced Multi-Agent Collaboration Framework for Zero-Shot Clinical Named Entity Recognition Xinli Tao et.al. 2511.15211 null
2025-11-19 As If We've Met Before: LLMs Exhibit Certainty in Recognizing Seen Files Haodong Li et.al. 2511.15192 null
2025-11-19 Dynamic Expert Quantization for Scalable Mixture-of-Experts Inference Kexin Chu et.al. 2511.15015 null
2025-11-18 Near-Lossless Model Compression Enables Longer Context Inference in DNA Large Language Models Rui Zhu et.al. 2511.14694 null
2025-11-18 Attention via Synaptic Plasticity is All You Need: A Biologically Inspired Spiking Neuromorphic Transformer Kallol Mondal et.al. 2511.14691 null
2025-11-18 Bias in, Bias out: Annotation Bias in Multilingual Large Language Models Xia Cui et.al. 2511.14662 null
2025-11-18 AutoTool: Efficient Tool Selection for Large Language Model Agents Jingyi Jia et.al. 2511.14650 null
2025-11-18 A Controllable Perceptual Feature Generative Model for Melody Harmonization via Conditional Variational Autoencoder Dengyun Huang et.al. 2511.14600 null
2025-11-18 Masked IRL: LLM-Guided Reward Disambiguation from Demonstrations and Language Minyoung Hwang et.al. 2511.14565 null
2025-11-18 CLO: Efficient LLM Inference System with CPU-Light KVCache Offloading via Algorithm-System Co-Design Jiawei Yi et.al. 2511.14510 null
2025-11-18 Hyperion: Hierarchical Scheduling for Parallel LLM Acceleration in Multi-tier Networks Mulei Ma et.al. 2511.14450 null
2025-11-18 PathMind: A Retrieve-Prioritize-Reason Framework for Knowledge Graph Reasoning with Large Language Models Yu Liu et.al. 2511.14256 null
2025-11-18 Run, Ruminate, and Regulate: A Dual-process Thinking System for Vision-and-Language Navigation Yu Zhong et.al. 2511.14131 null
2025-11-18 PRISM: Prompt-Refined In-Context System Modelling for Financial Retrieval Chun Chet Ng et.al. 2511.14130 null
2025-11-18 Real-Time Mobile Video Analytics for Pre-arrival Emergency Medical Services Liuyi Jin et.al. 2511.14119 null
2025-11-18 FailSafe: High-performance Resilient Serving Ziyi Xu et.al. 2511.14116 null
2025-11-17 TZ-LLM: Protecting On-Device Large Language Models with Arm TrustZone Xunjie Wang et.al. 2511.13717 null
2025-11-17 T-SAR: A Full-Stack Co-design for CPU-Only Ternary LLM Inference via In-Place SIMD ALU Reorganization Hyunwoo Oh et.al. 2511.13676 null
2025-11-17 Tight and Practical Privacy Auditing for Differentially Private In-Context Learning Yuyang Xia et.al. 2511.13502 null
2025-11-17 Dropouts in Confidence: Moral Uncertainty in Human-LLM Alignment Jea Kwon et.al. 2511.13290 null
2025-11-17 Computational Measurement of Political Positions: A Review of Text-Based Ideal Point Estimation Algorithms Patrick Parschan et.al. 2511.13238 null
2025-11-17 TokenSqueeze: Performance-Preserving Compression for Reasoning LLMs Yuxiang Zhang et.al. 2511.13223 null
2025-11-17 TCM-5CEval: Extended Deep Evaluation Benchmark for LLM's Comprehensive Clinical Research Competence in Traditional Chinese Medicine Tianai Huang et.al. 2511.13169 null
2025-11-17 MACKO: Sparse Matrix-Vector Multiplication for Low Sparsity Vladimír Macko et.al. 2511.13061 null
2025-11-17 RAGPulse: An Open-Source RAG Workload Trace to Optimize RAG Serving Systems Zhengchao Wang et.al. 2511.12979 null
2025-11-17 MedRule-KG: A Knowledge-Graph–Steered Scaffold for Reliable Mathematical and Biomedical Reasoning Crystal Su et.al. 2511.12963 null
2025-11-16 ARCHE: A Novel Task to Evaluate LLMs on Latent Reasoning Chain Extraction Pengze Li et.al. 2511.12485 null
2025-11-16 Probing Preference Representations: A Multi-Dimensional Evaluation and Analysis Method for Reward Models Chenglong Wang et.al. 2511.12464 null
2025-11-15 Optimal Self-Consistency for Efficient Reasoning with Large Language Models Austin Feng et.al. 2511.12309 null
2025-11-15 Sangam: Chiplet-Based DRAM-PIM Accelerator with CXL Integration for LLM Inferencing Khyati Kiyawat et.al. 2511.12286 null
2025-11-15 MME-RAG: Multi-Manager-Expert Retrieval-Augmented Generation for Fine-Grained Entity Recognition in Task-Oriented Dialogues Liang Xue et.al. 2511.12213 null
2025-11-15 AI-Salesman: Towards Reliable Large Language Model Driven Telemarketing Qingyu Zhang et.al. 2511.12133 null
2025-11-15 OAD-Promoter: Enhancing Zero-shot VQA using Large Language Models with Object Attribute Description Quanxing Xu et.al. 2511.12131 null
2025-11-15 BudgetLeak: Membership Inference Attacks on RAG Systems via the Generation Budget Side Channel Hao Li et.al. 2511.12043 null
2025-11-15 Striking the Right Balance between Compute and Copy: Improving LLM Inferencing Under Speculative Decoding Arun Ramachandran et.al. 2511.12031 null
2025-11-14 Seeing the Forest and the Trees: Query-Aware Tokenizer for Long-Video Multimodal Language Models Siyou Li et.al. 2511.11910 null
2025-11-14 Experience-Guided Adaptation of Inference-Time Reasoning Strategies Adam Stein et.al. 2511.11519 null
2025-11-14 W2S-AlignTree: Weak-to-Strong Inference-Time Alignment for Large Language Models via Monte Carlo Tree Search Zhenyu Ding et.al. 2511.11518 null
2025-11-14 MarsRL: Advancing Multi-Agent Reasoning System via Reinforcement Learning with Agentic Pipeline Parallelism Shulin Liu et.al. 2511.11373 null
2025-11-14 iMAD: Intelligent Multi-Agent Debate for Efficient and Accurate LLM Inference Wei Fan et.al. 2511.11306 null
2025-11-14 T-MAN: Enabling End-to-End Low-Bit LLM Inference on NPUs via Unified Table Lookup Jianyu Wei et.al. 2511.11248 null
2025-11-14 STaR: Towards Cognitive Table Reasoning via Slow-Thinking Large Language Models Huajian Zhang et.al. 2511.11233 null
2025-11-14 AccKV: Towards Efficient Audio-Video LLMs Inference via Adaptive-Focusing and Cross-Calibration KV Cache Optimization Zhonghua Jiang et.al. 2511.11106 null
2025-11-14 GraphMASAL: A Graph-based Multi-Agent System for Adaptive Learning Biqing Zeng et.al. 2511.11035 null
2025-11-14 DialogGraph-LLM: Graph-Informed LLMs for End-to-End Audio Dialogue Intent Recognition HongYu Liu et.al. 2511.11000 null
2025-11-14 DEFT-LLM: Disentangled Expert Feature Tuning for Micro-Expression Recognition Ren Zhang et.al. 2511.10948 null
2025-11-13 ParoQuant: Pairwise Rotation Quantization for Efficient Reasoning LLM Inference Yesheng Liang et.al. 2511.10645 null
2025-11-13 Scalable Synthesis of distributed LLM workloads through Symbolic Tensor Graphs Changhai Man et.al. 2511.10480 null
2025-11-13 FactGuard: Event-Centric and Commonsense-Guided Fake News Detection Jing He et.al. 2511.10281 null
2025-11-13 Efficient Thought Space Exploration through Strategic Intervention Ziheng Li et.al. 2511.10038 null
2025-11-13 EnchTable: Unified Safety Alignment Transfer in Fine-tuned Large Language Models Jialin Wu et.al. 2511.09880 null
2025-11-13 HierRouter: Coordinated Routing of Specialized Large Language Models via Reinforcement Learning Nikunj Gupta et.al. 2511.09873 null
2025-11-12 From Street to Orbit: Training-Free Cross-View Retrieval via Location Semantics and LLM Guidance Jeongho Min et.al. 2511.09820 null
2025-11-13 LLM Inference Beyond a Single Node: From Bottlenecks to Mitigations with Fast All-Reduce Communication Prajwal Singhania et.al. 2511.09557 null
2025-11-12 Seer Self-Consistency: Advance Budget Estimation for Adaptive Test-Time Scaling Shiyu Ji et.al. 2511.09345 null
2025-11-12 Mixture-of-Channels: Exploiting Sparse FFNs for Efficient LLMs Pre-Training and Inference Tong Wu et.al. 2511.09323 null
2025-11-10 Hard vs. Noise: Resolving Hard-Noisy Sample Confusion in Recommender Systems via Large Language Models Tianrui Song et.al. 2511.07295 null
2025-11-10 P3-LLM: An Integrated NPU-PIM Accelerator for LLM Inference Using Hybrid Numerical Formats Yuzong Chen et.al. 2511.06838 null
2025-11-09 Optimizing Long-context LLM Serving via Fine-grained Sequence Parallelism Cong Li et.al. 2511.06247 null
2025-11-09 LUT-LLM: Efficient Large Language Model Inference with Memory-based Computations on FPGAs Zifan He et.al. 2511.06174 null
2025-11-08 MoSKA: Mixture of Shared KV Attention for Efficient Long-Sequence LLM Inference Myunghyun Rhee et.al. 2511.06010 null
2025-11-08 MCP-RiskCue: Can LLM infer risk information from MCP server System Logs? Jiayi Fu et.al. 2511.05867 null
2025-11-06 Enabling Dynamic Sparsity in Quantized LLM Inference Rongxiang Wang et.al. 2511.04477 null
2025-11-06 E-CARE: An Efficient LLM-based Commonsense-Augmented Framework for E-Commerce Ge Zhang et.al. 2511.04087 null
2025-11-06 PICNIC: Silicon Photonic Interconnected Chiplets with Computational Network and In-memory Computing for LLM Inference Acceleration Yue Jiet Chong et.al. 2511.04036 null
2025-11-06 LLM-Driven Adaptive Source-Sink Identification and False Positive Mitigation for Static Analysis Shiyin Lin et.al. 2511.04023 null
2025-11-05 RAGBoost: Efficient Retrieval-Augmented Generation with Accuracy-Preserving Context Reuse Yinsicheng Jiang et.al. 2511.03475 null
2025-11-07 UMDAM: A Unified Data Layout and DRAM Address Mapping for Heterogenous NPU-PIM Hai Huang et.al. 2511.03293 null
2025-11-04 Optimal Singular Damage: Efficient LLM Inference in Low Storage Regimes Mohammadsajad Alipour et.al. 2511.02681 null
2025-11-04 Federated Attention: A Distributed Paradigm for Collaborative LLM Inference over Edge Networks Xiumei Deng et.al. 2511.02647 null
2025-11-04 Verifying LLM Inference to Prevent Model Weight Exfiltration Roy Rinberg et.al. 2511.02620 null
2025-11-03 KV Cache Transform Coding for Compact Storage in LLM Inference Konrad Staniszewski et.al. 2511.01815 null
2025-11-04 Collaborative Large Language Model Inference via Resource-Aware Parallel Speculative Decoding Jungyeon Koh et.al. 2511.01695 null
2025-11-03 Scaling Graph Chain-of-Thought Reasoning: A Multi-Agent Framework with Efficient LLM Serving Chengying Huan et.al. 2511.01633 null
2025-11-03 When, What, and How: Rethinking Retrieval-Enhanced Speculative Decoding Min Fang et.al. 2511.01282 null
2025-11-04 CryptoMoE: Privacy-Preserving and Scalable Mixture of Experts Inference via Balanced Expert Routing Yifan Zhou et.al. 2511.01197 null
2025-11-04 SpecDiff-2: Scaling Diffusion Drafter Alignment For Faster Speculative Decoding Jameson Sandler et.al. 2511.00606 null
2025-11-01 FlashEVA: Accelerating LLM inference via Efficient Attention Juan Gabriel Kostelec et.al. 2511.00576 null
2025-10-31 AMD MI300X GPU Performance Analysis Chandrish Ambati et.al. 2510.27583 null
2025-10-31 Glia: A Human-Inspired AI for Automated Systems Design and Optimization Pouya Hamadanian et.al. 2510.27176 null
2025-10-30 Beyond Benchmarks: The Economics of AI Inference Boqin Zhuang et.al. 2510.26136 null
2025-10-31 AttnCache: Accelerating Self-Attention Inference for LLM Prefill via Attention Cache Dinghong Song et.al. 2510.25979 null
2025-10-31 NeuronMM: High-Performance Matrix Multiplication for LLM Inference on AWS Trainium Dinghong Song et.al. 2510.25977 null
2025-10-29 Serve Programs, Not Prompts In Gim et.al. 2510.25412 null
2025-10-26 Batch Speculative Decoding Done Right Ranran Haoran Zhang et.al. 2510.22876 null
2025-10-26 Do Stop Me Now: Detecting Boilerplate Responses with a Single Iteration Yuval Kainan et.al. 2510.22679 null
2025-10-26 SABlock: Semantic-Aware KV Cache Eviction with Adaptive Compression Block Size Jinhan Chen et.al. 2510.22556 null
2025-10-22 Not-a-Bandit: Provably No-Regret Drafter Selection in Speculative Decoding for LLMs Hongyi Liu et.al. 2510.20064 null
2025-10-22 Are Large Language Models Sensitive to the Motives Behind Communication? Addison J. Wu et.al. 2510.19687 null
2025-10-30 DiffAdapt: Difficulty-Adaptive Reasoning for Token-Efficient LLM Inference Xiang Liu et.al. 2510.19669 null
2025-10-21 SLICE: SLO-Driven Scheduling for LLM Inference on Edge Computing Devices Pan Zhou et.al. 2510.18544 null
2025-10-19 Justitia: Fair and Efficient Scheduling for LLM Applications Mingyan Yang et.al. 2510.17015 null
2025-10-18 FourierCompress: Layer-Aware Spectral Activation Compression for Efficient and Accurate Collaborative LLM Inference Jian Ma et.al. 2510.16418 null
2025-10-16 AMS-QUANT: Adaptive Mantissa Sharing for Floating-point Quantization Mengtao Lv et.al. 2510.16045 null
2025-10-16 Kelle: Co-design KV Caching and eDRAM for Efficient LLM Serving in Edge Computing Tianhua Xia et.al. 2510.16040 null
2025-10-28 TokenTiming: A Dynamic Alignment Method for Universal Speculative Decoding Model Pairs Sibo Xiao et.al. 2510.15545 null
2025-10-16 Tail-Optimized Caching for LLM Inference Wenxin Zhang et.al. 2510.15152 null
2025-10-16 xLLM Technical Report Tongxuan Liu et.al. 2510.14686 null
2025-10-16 MX+: Pushing the Limits of Microscaling Formats for Efficient Large Language Model Serving Jungi Lee et.al. 2510.14557 null
2025-10-16 FairBatching: Fairness-Aware Batch Formation for LLM Inference Hongtao Lyu et.al. 2510.14392 null
2025-10-16 Qwen3Guard Technical Report Haiquan Zhao et.al. 2510.14276 null
2025-10-15 Efficiently Executing High-throughput Lightweight LLM Inference Applications on Heterogeneous Opportunistic GPU Clusters with Pervasive Context Management Thanh Son Phung et.al. 2510.14024 null
2025-10-15 Adaptive Rescheduling in Prefill-Decode Disaggregated LLM Inference Zhibin Wang et.al. 2510.13668 null
2025-10-15 F-BFQ: Flexible Block Floating-Point Quantization Accelerator for LLMs Jude Haris et.al. 2510.13401 null
2025-10-15 Taming the Fragility of KV Cache Eviction in LLM Inference Yuan Feng et.al. 2510.13334 null
2025-10-15 Mirror Speculative Decoding: Breaking the Serial Barrier in LLM Inference Nikhil Bhendawade et.al. 2510.13161 null
2025-10-14 Beyond Postconditions: Can Large Language Models infer Formal Contracts for Automatic Software Verification? Cedric Richter et.al. 2510.12702 null
2025-10-14 Traveling Salesman-Based Token Ordering Improves Stability in Homomorphically Encrypted Language Models Donghwan Rho et.al. 2510.12343 null
2025-10-13 Efficient LLM Inference over Heterogeneous Edge Networks with Speculative Decoding Bingjie Zhu et.al. 2510.11331 null
2025-10-13 Efficient In-Memory Acceleration of Sparse Block Diagonal LLMs João Paulo Cardoso de Lima et.al. 2510.11192 null
2025-10-11 CacheClip: Accelerating RAG with Effective KV Cache Reuse Bin Yang et.al. 2510.10129 null
2025-10-10 FLRC: Fine-grained Low-Rank Compressor for Efficient LLM Inference Yu-Chen Lu et.al. 2510.09332 null
2025-10-10 Semantic-Condition Tuning: Fusing Graph Context with Large Language Models for Knowledge Graph Completion Ruitong Liu et.al. 2510.08966 null
2025-10-13 Autoencoding-Free Context Compression for LLMs via Contextual Semantic Anchors Xin Liu et.al. 2510.08907 null
2025-10-09 SPAD: Specialized Prefill and Decode Hardware for Disaggregated LLM Inference Hengrui Zhang et.al. 2510.08544 null
2025-10-09 From Tokens to Layers: Redefining Stall-Free Scheduling for LLM Serving with Layered Prefill Gunjun Lee et.al. 2510.08055 null
2025-10-09 Augur: Modeling Covariate Causal Associations in Time Series via Large Language Models Zhiqing Cui et.al. 2510.07858 null
2025-10-09 OBCache: Optimal Brain KV Cache Pruning for Efficient Long-Context LLM Inference Yuzhe Gu et.al. 2510.07651 null
2025-10-08 Accelerating Diffusion LLM Inference via Local Determinism Propagation Fanheng Kong et.al. 2510.07081 null
2025-10-08 Accelerating Sparse Ternary GEMM for Quantized LLM inference on Apple Silicon Baraq Lipshitz et.al. 2510.06957 null
2025-10-07 VecInfer: Efficient LLM Inference with Low-Bit KV Cache via Outlier-Suppressed Vector Quantization Dingyu Yao et.al. 2510.06175 null
2025-10-07 lm-Meter: Unveiling Runtime Inference Latency for On-Device Language Models Haoxin Wang et.al. 2510.06126 null
2025-10-07 From Principles to Practice: A Systematic Study of LLM Serving on Multi-core NPUs Tianhao Zhu et.al. 2510.05632 null
2025-10-06 KVLinC: KV Cache Quantization with Hadamard Rotation and Linear Correction Utkarsh Saxena et.al. 2510.05373 null
2025-10-06 A novel hallucination classification framework Maksym Zavhorodnii et.al. 2510.05189 null
2025-10-06 RevMine: An LLM-Assisted Tool for Code Review Mining and Analysis Across Git Platforms Samah Kansab et.al. 2510.04796 null
2025-10-05 Speculative Actions: A Lossless Framework for Faster Agentic Systems Naimeng Ye et.al. 2510.04371 null
2025-10-03 Best-of-Majority: Minimax-Optimal Strategy for Pass@$k$ Inference Scaling Qiwei Di et.al. 2510.03199 null
2025-10-03 Dissecting Transformers: A CLEAR Perspective towards Green AI Hemang Jain et.al. 2510.02810 null
2025-10-03 HALO: Memory-Centric Heterogeneous Accelerator with 2.5D Integration for Low-Batch LLM Inference Shubham Negi et.al. 2510.02675 null
2025-10-01 PolyLink: A Blockchain Based Decentralized Edge AI Platform for LLM Inference Hongbo Liu et.al. 2510.02395 null
2025-10-03 Enhancing Large Language Model Reasoning with Reward Models: An Analytical Survey Qiyuan Liu et.al. 2510.01925 null
2025-10-02 SCRIBES: Web-Scale Script-Based Semi-Structured Data Extraction with Reinforcement Learning Shicheng Liu et.al. 2510.01832 null
2025-10-01 HiSpec: Hierarchical Speculative Decoding for LLMs Avinash Kumar et.al. 2510.01336 null
2025-10-01 Generalized Parallel Scaling with Interdependent Generations Harry Dong et.al. 2510.01143 null
2025-10-01 AdaBlock-dLLM: Semantic-Aware Diffusion LLM Inference via Adaptive Block Size Guanxi Lu et.al. 2509.26432 null
2025-09-30 Parallax: Efficient LLM Inference Service over Decentralized Environment Chris Tong et.al. 2509.26182 null
2025-09-30 Accelerating LLM Inference with Precomputed Query Storage Jay H. Park et.al. 2509.25919 null
2025-09-30 SAIL: SRAM-Accelerated LLM Inference System with Lookup-Table-based GEMV Jingyao Zhang et.al. 2509.25853 null
2025-09-29 SemShareKV: Efficient KVCache Sharing for Semantically Similar Prompts via Token-Level LSH Matching Xinye Zhao et.al. 2509.24832 null
2025-09-29 Speculative Verification: Exploiting Information Gain to Refine Speculative Decoding Sungkyun Kim et.al. 2509.24328 null
2025-09-29 VeriLLM: A Lightweight Framework for Publicly Verifiable Decentralized Inference Ke Wang et.al. 2509.24257 null
2025-09-28 Collaborative Device-Cloud LLM Inference through Reinforcement Learning Wenzhi Fang et.al. 2509.24050 null
2025-10-01 A Predictive and Synergistic Two-Layer Scheduling Framework for LLM Serving Yue Zhang et.al. 2509.23384 null
2025-09-27 Scaling LLM Test-Time Compute with Mobile NPU on Smartphones Zixu Hao et.al. 2509.23324 null
2025-09-27 Bridging the Gap Between Promise and Performance for Microscaling FP4 Quantization Vage Egiazarian et.al. 2509.23202 null
2025-09-26 Lightweight error mitigation strategies for post-training N:M activation sparsity in LLMs Shirin Alanova et.al. 2509.22166 null
2025-09-26 Bridging Draft Policy Misalignment: Group Tree Optimization for Speculative Decoding Shijing Hu et.al. 2509.22134 null
2025-09-26 SimulSense: Sense-Driven Interpreting for Efficient Simultaneous Speech Translation Haotian Tan et.al. 2509.21932 null
2025-09-25 Preemptive Detection and Steering of LLM Misalignment via Latent Reachability Sathwik Karnik et.al. 2509.21528 null
2025-09-25 Semantic Edge-Cloud Communication for Real-Time Urban Traffic Surveillance with ViT and LLMs over Mobile Networks Murat Arda Onsu et.al. 2509.21259 null
2025-09-24 FastEagle: Cascaded Drafting for Accelerating Speculative Decoding Haiduo Huang et.al. 2509.20416 null
2025-09-24 Q-Palette: Fractional-Bit Quantizers Toward Optimal Bit Allocation for Efficient LLM Deployment Deokjae Lee et.al. 2509.20214 null
2025-09-24 Gyges: Dynamic Cross-Instance Parallelism Transformation for Efficient LLM Inference Haoyu Chen et.al. 2509.19729 null
2025-09-23 Confidential LLM Inference: Performance and Cost Across CPU and GPU TEEs Marcin Chrapek et.al. 2509.18886 null
2025-09-22 Multimodal Health Risk Prediction System for Chronic Diseases via Vision-Language Fusion and Large Language Models Dingxin Lu et.al. 2509.18221 null
2025-09-28 Disaggregated Prefill and Decoding Inference System for Large Language Model Serving on Multi-Vendor GPUs Xing Chen et.al. 2509.17542 null
2025-09-22 Cronus: Efficient LLM inference on Heterogeneous GPU Clusters via Partially Disaggregated Prefill Yunzhao Liu et.al. 2509.17357 null
2025-09-22 Multi-View Attention Multiple-Instance Learning Enhanced by LLM Reasoning for Cognitive Distortion Detection Jun Seo Kim et.al. 2509.17292 null
2025-09-21 MoA-Off: Adaptive Heterogeneous Modality-Aware Offloading with Edge-Cloud Collaboration for Efficient Multimodal LLM Inference Zheming Yang et.al. 2509.16995 null
2025-09-20 Shift Parallelism: Low-Latency, High-Throughput LLM Inference for Dynamic Workloads Mert Hidayetoglu et.al. 2509.16495 null
2025-09-19 LightCode: Compiling LLM Inference for Photonic-Electronic Systems Ryan Tomich et.al. 2509.16443 null
2025-09-19 LLM Cache Bandit Revisited: Addressing Query Heterogeneity for Cost-Effective LLM Inference Hantao Yang et.al. 2509.15515 null
2025-09-18 A1: Asynchronous Test-Time Scaling via Conformal Prediction Jing Xiong et.al. 2509.15148 null
2025-09-18 LEAP: LLM Inference on Scalable PIM-NoC Architecture with Balanced Dataflow and Fine-Grained Parallelism Yimin Wang et.al. 2509.14781 null
2025-09-18 LLM Jailbreak Detection for (Almost) Free! Guorui Chen et.al. 2509.14558 null
2025-09-17 TENET: An Efficient Sparsity-Aware LUT-Centric Architecture for Ternary LLM Inference On Edge Zhirui Huang et.al. 2509.13765 null
2025-09-16 Scaling Up Throughput-oriented LLM Inference Applications on Heterogeneous Opportunistic GPU Clusters with Pervasive Context Management Thanh Son Phung et.al. 2509.13201 null
2025-09-16 HPIM: Heterogeneous Processing-In-Memory-based Accelerator for Large Language Models Inference Cenlin Duan et.al. 2509.12993 null
2025-09-15 Beyond PII: How Users Attempt to Estimate and Mitigate Implicit LLM Inference Synthia Wang et.al. 2509.12152 null
2025-09-14 Framing AI System Benchmarking as a Learning Task: FlexBench and the Open MLPerf Dataset Grigori Fursin et.al. 2509.11413 null
2025-09-14 PersonaX: Multimodal Datasets with LLM-Inferred Behavior Traits Loka Li et.al. 2509.11362 null
2025-09-14 AQUA: Attention via QUery mAgnitudes for Memory and Compute Efficient Inference in LLMs Santhosh G S et.al. 2509.11155 null
2025-09-12 MCBP: A Memory-Compute Efficient LLM Inference Accelerator Leveraging Bit-Slice-enabled Sparsity and Repetitiveness Huizheng Wang et.al. 2509.10372 null
2025-09-11 LAVa: Layer-wise KV Cache Eviction with Dynamic Budget Allocation Yiqun Shen et.al. 2509.09754 null
2025-09-11 Combating the Memory Walls: Optimization Pathways for Long-Context Agentic LLM Inference Haoran Wu et.al. 2509.09505 null
2025-08-06 Frontier: Simulating the Next Generation of LLM Inference Systems Yicheng Feng et.al. 2508.03148 null
2025-07-25 Cloud Native System for LLM Inference Serving Minxian Xu et.al. 2507.18007 null
2025-07-23 BucketServe: Bucket-Based Dynamic Batching for Smart and Efficient LLM Inference Serving Wanyi Zheng et.al. 2507.17120 null
2025-07-22 Identifying Pre-training Data in LLMs: A Neuron Activation-Based Detection Framework Hongyi Tang et.al. 2507.16414 null
2025-07-21 Efficient Routing of Inference Requests across LLM Instances in Cloud-Edge Computing Shibo Yu et.al. 2507.15553 null
2025-07-18 Efficient LLM Inference: Bandwidth, Compute, Synchronization, and Capacity are all you need Michael Davies et.al. 2507.14397 null
2025-07-18 Can LLMs Infer Personality from Real World Conversations? Jianfeng Zhu et.al. 2507.14355 null
2025-07-23 Photonic Fabric Platform for AI Accelerators Jing Ding et.al. 2507.14000 null
2025-07-18 LoopServe: An Adaptive Dual-phase LLM Inference Acceleration System for Multi-Turn Dialogues Haoyang Li et.al. 2507.13681 null
2025-07-16 Toward Efficient SpMV in Sparse LLMs via Block Extraction and Compressed Storage Junqing Lin et.al. 2507.12205 null
2025-07-15 MIRAGE: KV Cache Optimization through Parameter Remapping for Multi-tenant LLM Serving Ruihao Li et.al. 2507.11507 null
2025-07-15 Quantifying the Energy Consumption and Carbon Emissions of LLM Inference via Simulations Miray Özcan et.al. 2507.11417 null
2025-07-14 Green-LLM: Optimal Workload Allocation for Environmentally-Aware Distributed Inference Jiaming Cheng et.al. 2507.09942 null
2025-07-12 SLIM: A Heterogeneous Accelerator for Edge Inference of Sparse Large Language Model via Adaptive Thresholding Weihong Xu et.al. 2507.09201 null
2025-07-11 On Evaluating Performance of LLM Inference Serving Systems Amey Agrawal et.al. 2507.09019 null
2025-07-11 Hybrid Systolic Array Accelerator with Optimized Dataflow for Edge Large Language Model Inference Chun-Ting Chen et.al. 2507.09010 null
2025-07-11 InferLog: Accelerating LLM Inference for Online Log Parsing via ICL-oriented Prefix Caching Yilun Wang et.al. 2507.08523 null
2025-07-10 Reasoning and Behavioral Equilibria in LLM-Nash Games: From Mindsets to Actions Quanyan Zhu et.al. 2507.08208 null
2025-07-10 Krul: Efficient State Restoration for Multi-turn Conversations with Dynamic Cross-layer KV Sharing Junyi Wen et.al. 2507.08045 null
2025-07-15 Hallucination Stations: On Some Basic Limitations of Transformer-Based Language Models Varin Sikka et.al. 2507.07505 null
2025-07-11 QUEST: Query Optimization in Unstructured Document Analysis Zhaoze Sun et.al. 2507.06515 null
2025-07-08 Voltage Regulation in Distribution Systems with Data Center Loads Yize Chen et.al. 2507.06416 null
2025-07-07 Cascade: Token-Sharded Private LLM Inference Rahul Thomas et.al. 2507.05228 null
2025-07-07 Can Prompt Difficulty be Online Predicted for Accelerating RL Finetuning of Reasoning Models? Yun Qu et.al. 2507.04632 null
2025-07-05 Enhancing Adaptive Behavioral Interventions with LLM Inference from Participant-Described States Karine Karine et.al. 2507.03871 null
2025-07-05 OrthoRank: Token Selection via Sink Token Orthogonality for Efficient LLM inference Seungjun Shin et.al. 2507.03865 null
2025-07-04 Hummingbird: A Smaller and Faster Large Language Model Accelerator on Embedded FPGA Jindong Li et.al. 2507.03308 null
2025-07-03 HGCA: Hybrid GPU-CPU Attention for Long Context LLM Inference Weishu Deng et.al. 2507.03153 null
2025-07-03 On the Convergence of Large Language Model Optimizer for Black-Box Network Management Hoon Lee et.al. 2507.02689 null
2025-07-03 Breaking the HBM Bit Cost Barrier: Domain-Specific ECC for AI Inference Infrastructure Rui Xie et.al. 2507.02654 null
2025-07-03 FlowSpec: Continuous Pipelined Speculative Decoding for Efficient Distributed LLM Inference Xing Liu et.al. 2507.02620 null
2025-07-02 Dissecting the Impact of Mobile DVFS Governors on LLM Inference Performance and Energy Efficiency Zongpu Zhang et.al. 2507.02135 null
2025-07-02 LogitSpec: Accelerating Retrieval-based Speculative Decoding via Next Next Token Speculation Tianyu Liu et.al. 2507.01449 null
2025-07-02 SpeechAccentLLM: A Unified Framework for Foreign Accent Conversion and Text to Speech Cheng Zhuangfei et.al. 2507.01348 null
2025-07-02 La RoSA: Enhancing LLM Efficiency via Layerwise Rotated Sparse Activation Kai Liu et.al. 2507.01299 null
2025-07-01 VEDA: Efficient LLM Generation Through Voting-based KV Cache Eviction and Dataflow-flexible Accelerator Zhican Wang et.al. 2507.00797 null
2025-07-01 Cognitive Load-Aware Inference: A Neuro-Symbolic Framework for Optimizing the Token Economy of Large Language Models Yilun Zhang et.al. 2507.00653 null
2025-07-01 LLM-Mesh: Enabling Elastic Sharing for Serverless LLM Inference Chuhao Xu et.al. 2507.00507 null
2025-07-01 Serving LLMs in HPC Clusters: A Comparative Study of Qualcomm Cloud AI 100 Ultra and High-Performance GPUs Mohammad Firas Sada et.al. 2507.00418 null
2025-06-30 Federated Learning-Enabled Hybrid Language Models for Communication-Efficient Token Transmission Faranaksadat Solat et.al. 2507.00082 null
2025-06-27 QuickSilver -- Speeding up LLM Inference through Dynamic Token Halting, KV Skipping, Contextual Token Fusion, and Adaptive Matryoshka Quantization Danush Khanna et.al. 2506.22396 null
2025-06-27 Towards Operational Data Analytics Chatbots -- Virtual Knowledge Graph is All You Need Junaid Ahmed Khan et.al. 2506.22267 null
2025-06-27 SiPipe: Bridging the CPU-GPU Utilization Gap for Efficient Pipeline-Parallel LLM Inference Yongchao He et.al. 2506.22033 null
2025-06-30 A Survey of LLM Inference Systems James Pan et.al. 2506.21901 null
2025-06-17 Utility-Driven Speculative Decoding for Mixture-of-Experts Anish Saxena et.al. 2506.20675 null
2025-07-02 Breaking the Boundaries of Long-Context LLM Inference: Adaptive KV Management on a Single Commodity GPU He Sun et.al. 2506.20187 null
2025-06-24 MNN-AECS: Energy Optimization for LLM Decoding on Mobile Devices via Adaptive Core Selection Zhengxiang Huang et.al. 2506.19884 null
2025-06-23 Black-Box Test Code Fault Localization Driven by Large Language Models and Execution Estimation Ahmadreza Saboor Yaraghi et.al. 2506.19045 null
2025-06-23 WiLLM: An Open Wireless LLM Communication System Boyi Liu et.al. 2506.19030 null
2025-06-23 CommVQ: Commutative Vector Quantization for KV Cache Compression Junyan Li et.al. 2506.18879 null
2025-06-22 Mechanistic Interpretability in the Presence of Architectural Obfuscation Marcos Florencio et.al. 2506.18053 null
2025-06-20 Towards AI Search Paradigm Yuchen Li et.al. 2506.17188 null
2025-06-17 CrEst: Credibility Estimation for Contexts in LLMs via Weak Supervision Dyah Adila et.al. 2506.14912 null
2025-06-16 Vector Ontologies as an LLM world view extraction method Kaspar Rothenfusser et.al. 2506.13252 link
2025-06-13 Semantic Scheduling for LLM Inference Wenyue Hua et.al. 2506.12204 link
2025-06-13 GraphRAG-Causal: A novel graph-augmented framework for causal reasoning and annotation in news Abdul Haque et.al. 2506.11600 null
2025-06-13 Collaborative LLM Inference via Planning for Efficient Reasoning Byeongchan Lee et.al. 2506.11578 null
2025-06-13 Efficient Long-Context LLM Inference via KV Cache Clustering Jie Hu et.al. 2506.11418 null
2025-06-12 TD-Pipe: Temporally-Disaggregated Pipeline Parallelism Architecture for High-Throughput LLM Inference Hongbin Zhang et.al. 2506.10470 null
2025-06-11 A First Look at Bugs in LLM Inference Engines Mugeng Liu et.al. 2506.09713 link
2025-06-12 Understanding the Performance and Power of LLM Inferencing on Edge Accelerators Mayank Arya et.al. 2506.09554 null
2025-06-11 Give Me FP32 or Give Me Death? Challenges and Solutions for Reproducible Reasoning Jiayi Yuan et.al. 2506.09501 null
2025-06-10 Efficient Context Selection for Long-Context QA: No Tuning, No Iteration, Just Adaptive-$k$ Chihiro Taguchi et.al. 2506.08479 null
2025-06-10 Draft-based Approximate Inference for LLMs Kevin Galim et.al. 2506.08373 link
2025-06-09 MoQAE: Mixed-Precision Quantization for Long-Context LLM Inference via Mixture of Quantization-Aware Experts Wei Tao et.al. 2506.07533 null
2025-06-07 Containerized In-Storage Processing and Computing-Enabled SSD Disaggregation Miryeong Kwon et.al. 2506.06769 null
2025-06-06 Towards Efficient Multi-LLM Inference: Characterization and Analysis of LLM Routing and Hierarchical Techniques Adarsh Prasad Behera et.al. 2506.06579 null
2025-06-04 On the Fundamental Impossibility of Hallucination Control in Large Language Models Michał P. Karpowicz et.al. 2506.06382 null
2025-06-04 SkipGPT: Dynamic Layer Pruning Reinvented with Token Awareness and Module Decoupling Anhao Zhao et.al. 2506.04179 null
2025-06-04 Pre$^3$: Enabling Deterministic Pushdown Automata for Faster Structured LLM Generation Junyi Chen et.al. 2506.03887 null
2025-06-04 Client-Side Zero-Shot LLM Inference for Comprehensive In-Browser URL Analysis Avihay Cohen et.al. 2506.03656 null
2025-06-04 POSS: Position Specialist Generates Better Draft for Speculative Decoding Langlin Huang et.al. 2506.03566 link
2025-06-07 Parallel CPU-GPU Execution for LLM Inference on Constrained GPUs Jiakun Fan et.al. 2506.03296 null
2025-06-03 Sample, Predict, then Proceed: Self-Verification Sampling for Tool Use of LLMs Shangmin Guo et.al. 2506.02918 null
2025-06-03 HATA: Trainable and Hardware-Efficient Hash-Aware Top-k Attention for Scalable Large Model Inference Ping Gong et.al. 2506.02572 link
2025-06-02 Memory Access Characterization of Large Language Models in CPU Environment and its Potential Impacts Spencer Banasik et.al. 2506.01827 null
2025-05-30 Are Optimal Algorithms Still Optimal? Rethinking Sorting in LLM-Based Pairwise Ranking with Batching and Caching Juan Wisznia et.al. 2505.24643 null
2025-05-30 LLM Inference Enhanced by External Knowledge: A Survey Yu-Hsuan Lin et.al. 2505.24377 link
2025-05-30 SkyLB: A Locality-Aware Cross-Region Load Balancer for LLM Inference Tian Xia et.al. 2505.24095 null
2025-05-29 Large Language Model Meets Constraint Propagation Alexandre Bonlarron et.al. 2505.24012 null
2025-05-29 Ghidorah: Fast LLM Inference on Edge with Speculative Decoding and Hetero-Core Parallelism Jinhui Wei et.al. 2505.23219 null
2025-05-29 SCORPIO: Serving the Right Requests at the Right Time for Heterogeneous SLOs in LLM Inference Yinghao Tang et.al. 2505.23022 null
2025-05-28 Mustafar: Promoting Unstructured Sparsity for KV Cache Pruning in LLM Inference Donghyeon Joo et.al. 2505.22913 link
2025-05-28 Towards Efficient Key-Value Cache Management for Prefix Prefilling in LLM Inference Yue Zhu et.al. 2505.21919 null
2025-05-28 HoliTom: Holistic Token Merging for Fast Video Large Language Models Kele Shao et.al. 2505.21334 link
2025-05-28 FireQ: Fast INT4-FP8 Kernel and RoPE-aware Quantization for LLM Inference Acceleration Daehyeon Baek et.al. 2505.20839 null
2025-05-26 HAMburger: Accelerating LLM Inference via Token Smashing Jingyu Liu et.al. 2505.20438 null
2025-05-26 MoESD: Unveil Speculative Decoding's Potential for Accelerating Sparse MoE Zongle Huang et.al. 2505.19645 null
2025-05-26 WINA: Weight Informed Neuron Activation for Accelerating Large Language Model Inference Sihan Chen et.al. 2505.19427 link
2025-05-25 DECA: A Near-Core LLM Decompression Accelerator Supporting Out-of-Order Invocation Gerasimos Gerogiannis et.al. 2505.19349 null
2025-06-03 A Survey of LLM $\times$ DATA Xuanhe Zhou et.al. 2505.18458 null
2025-05-23 An Attack to Break Permutation-Based Private Third-Party Inference Schemes for LLMs Rahul Thomas et.al. 2505.18332 null
2025-05-23 NSNQuant: A Double Normalization Approach for Calibration-Free Low-Bit Vector Quantization of KV Cache Donghyun Son et.al. 2505.18231 null
2025-05-23 Don't Overthink it. Preferring Shorter Thinking Chains for Improved LLM Reasoning Michael Hassid et.al. 2505.17813 null
2025-05-23 DASH: Input-Aware Dynamic Layer Skipping for Efficient LLM Inference with Markov Decision Policies Ning Yang et.al. 2505.17420 null
2025-05-22 RAP: Runtime-Adaptive Pruning for LLM Inference Huanrong Liu et.al. 2505.17138 null
2025-05-22 CASTILLO: Characterizing Response Length Distributions of Large Language Models Daniel F. Perez-Ramirez et.al. 2505.16881 link
2025-05-22 Reading Between the Prompts: How Stereotypes Shape LLM's Implicit Personalization Vera Neplenbroek et.al. 2505.16467 link
2025-05-22 QuickVideo: Real-Time Long Video Understanding with System Algorithm Co-Design Benjamin Schneider et.al. 2505.16175 link
2025-05-22 KNN-SSD: Enabling Dynamic Self-Speculative Decoding via Nearest Neighbor Layer Set Optimization Mingbo Song et.al. 2505.16162 null
2025-05-20 Polar Sparsity: High Throughput Batched LLM Inferencing with Scalable Contextual Sparsity Susav Shrestha et.al. 2505.14884 link
2025-05-20 ContextAgent: Context-Aware Proactive LLM Agents with Open-World Sensory Perceptions Bufang Yang et.al. 2505.14668 null
2025-05-20 ServerlessLoRA: Minimizing Latency and Cost in Serverless Inference for LoRA-Based LLMs Yifan Sui et.al. 2505.14468 null
2025-05-16 An agentic system with reinforcement-learned subsystem improvements for parsing form-like documents Ayesha Amjad et.al. 2505.13504 null
2025-05-19 HeteroSpec: Leveraging Contextual Heterogeneity for Efficient Speculative Decoding Siran Liu et.al. 2505.13254 null
2025-05-19 FreeKV: Boosting KV Cache Retrieval for Efficient LLM Inference Guangda Liu et.al. 2505.13109 null
2025-05-19 FLASH: Latent-Aware Semi-Autoregressive Speculative Decoding for Multimodal Tasks Zihua Wang et.al. 2505.12728 link
2025-05-17 Enhancing Complex Instruction Following for Large Language Models with Mixture-of-Contexts Fine-tuning Yuheng Lu et.al. 2505.11922 null
2025-05-17 Arrow: Adaptive Scheduling Mechanisms for Disaggregated LLM Inference Architecture Yu Wu et.al. 2505.11916 null
2025-05-16 TokenWeave: Efficient Compute-Communication Overlap for Distributed LLM Inference Raja Gond et.al. 2505.11329 link
2025-05-16 Vaiage: A Multi-Agent Solution to Personalized Travel Planning Binwen Liu et.al. 2505.10922 null
2025-05-19 SpecOffload: Unlocking Latent GPU Capacity for LLM Inference on Resource-Constrained Devices Xiangwen Zhuge et.al. 2505.10259 link
2025-05-15 ServeGen: Workload Characterization and Generation of Large Language Model Serving in Production Yuxing Xiang et.al. 2505.09999 link
2025-05-15 How Hungry is AI? Benchmarking Energy, Water, and Carbon Footprint of LLM Inference Nidhal Jegham et.al. 2505.09598 null
2025-05-14 Statistical Modeling and Uncertainty Estimation of LLM Inference Systems Kaustabha Ray et.al. 2505.09319 null
2025-05-15 ELIS: Efficient LLM Iterative Scheduling System with Response Length Predictor Seungbeom Choi et.al. 2505.09142 null
2025-05-13 LibVulnWatch: A Deep Assessment Agent System and Leaderboard for Uncovering Hidden Vulnerabilities in Open-Source AI Libraries Zekun Wu et.al. 2505.08842 null
2025-05-13 Automatic Task Detection and Heterogeneous LLM Speculative Decoding Danying Ge et.al. 2505.08600 null
2025-05-08 Scaling Laws for Speculative Decoding Siyuan Yan et.al. 2505.07858 null
2025-05-12 SpecRouter: Adaptive Routing for Multi-Level Speculative Decoding in Large Language Models Hang Wu et.al. 2505.07680 null
2025-05-12 Comet: Accelerating Private Inference for Large Language Model by Predicting Activation Sparsity Guang Yan et.al. 2505.07239 null
2025-05-12 PrefillOnly: An Inference Engine for Prefill-only Workloads in Large Language Model Applications Kuntai Du et.al. 2505.07203 null
2025-05-14 I Know What You Said: Unveiling Hardware Cache Side-Channels in Local Large Language Model Inference Zibo Gao et.al. 2505.06738 null
2025-05-09 Challenging GPU Dominance: When CPUs Outperform for On-Device LLM Inference Haolin Zhang et.al. 2505.06461 null
2025-05-09 Sparse Attention Remapping with Clustering for Efficient LLM Decoding on PIM Zehao Fan et.al. 2505.05772 null
2025-05-08 HEXGEN-TEXT2SQL: Optimizing LLM Inference Request Scheduling for Agentic Text-to-SQL Workflow You Peng et.al. 2505.05286 link
2025-05-06 Faster MoE LLM Inference for Extremely Large Models Haoqi Yang et.al. 2505.03531 null
2025-05-05 RetroInfer: A Vector-Storage Approach for Scalable Long-Context LLM Inference Yaoqi Chen et.al. 2505.02922 null
2025-05-03 High-Fidelity Pseudo-label Generation by Large Language Models for Training Robust Radiology Report Classifiers Brian Wong et.al. 2505.01693 null
2025-05-08 A Survey on Inference Engines for Large Language Models: Perspectives on Optimization and Efficiency Sihyeong Park et.al. 2505.01658 link
2025-05-02 PipeSpec: Breaking Stage Dependencies in Hierarchical LLM Decoding Bradley McDanel et.al. 2505.01572 null
2025-04-28 AutoJudge: Judge Decoding Without Manual Annotation Roman Garipov et.al. 2504.20039 null
2025-04-28 Taming the Titans: A Survey of Efficient LLM Inference Serving Ranran Zhen et.al. 2504.19720 link
2025-04-28 R-Sparse: Rank-Aware Activation Sparsity for Efficient LLM Inference Zhenyu Zhang et.al. 2504.19449 null
2025-05-07 A Simple Ensemble Strategy for LLM Inference: Towards More Stable Text Classification Junichiro Niimi et.al. 2504.18884 link
2025-04-29 PARD: Accelerating LLM Inference with Low-Cost PARallel Draft Model Adaptation Zihao An et.al. 2504.18583 null
2025-04-25 PropRAG: Guiding Retrieval with Beam Search over Proposition Paths Jingjin Wang et.al. 2504.18070 null
2025-04-24 L3: DIMM-PIM Integrated Architecture and Coordination for Scalable Long-Context LLM Inference Qingyuan Liu et.al. 2504.17584 null
2025-04-24 On-Device Qwen2.5: Efficient LLM Inference with Model Compression and Hardware Acceleration Maoyang Xiang et.al. 2504.17376 null
2025-04-18 HPU: High-Bandwidth Processing Unit for Scalable, Cost-effective LLM Inference via GPU Co-processing Myunghyun Rhee et.al. 2504.16112 null
2025-04-22 Token-Aware Coding Flow: A Study with Nano Surge in Reasoning Model Junwei Hu et.al. 2504.15989 null
2025-04-23 KeyDiff: Key Similarity-Based KV Cache Eviction for Long-Context LLM Inference in Resource-Constrained Environments Junyoung Park et.al. 2504.15364 null
2025-04-18 High-Throughput LLM inference on Heterogeneous Clusters Yi Xiong et.al. 2504.15303 null
2025-04-21 Hardware-based Heterogeneous Memory Management for Large Language Model Inference Soojin Hwang et.al. 2504.14893 null
2025-04-19 Accelerating LLM Inference with Flexible N:M Sparsity via A Fully Digital Compute-in-Memory Accelerator Akshat Ramachandran et.al. 2504.14365 null
2025-04-19 FGMP: Fine-Grained Mixed-Precision Weight and Activation Quantization for Hardware-Accelerated LLM Inference Coleman Hooper et.al. 2504.14152 null
2025-04-16 Cost-Efficient LLM Serving in the Cloud: VM Selection with KV Cache Offloading Kihyun Kim et.al. 2504.11816 link
2025-04-16 Shared Disk KV Cache Management for Efficient Multi-Instance Inference in RAG-Powered LLMs Hyungwoo Lee et.al. 2504.11765 null
2025-04-16 Characterizing and Optimizing LLM Inference Workloads on CPU-GPU Coupled Architectures Prabhu Vellaisamy et.al. 2504.11750 null
2025-04-15 Optimizing LLM Inference: Fluid-Guided Online Scheduling with Memory Constraints Ruicheng Ao et.al. 2504.11320 link
2025-04-14 HELIOS: Adaptive Model And Early-Exit Selection for Efficient LLM Inference Serving Avinash Kumar et.al. 2504.10724 null
2025-04-14 AlayaDB: The Data Foundation for Efficient and Effective Long-context LLM Inference Yangshen Deng et.al. 2504.10326 null
2025-04-14 KeepKV: Eliminating Output Perturbation in KV Cache Compression for Efficient LLMs Inference Yuxuan Tian et.al. 2504.09936 null
2025-04-22 Understanding and Optimizing Multi-Stage AI Inference Pipelines Abhimanyu Rajeshkumar Bambhaniya et.al. 2504.09775 null
2025-04-13 LoopLynx: A Scalable Dataflow Architecture for Efficient LLM Inference Jianing Zheng et.al. 2504.09561 link
2025-04-12 MoE-Lens: Towards the Hardware Limit of High-Throughput MoE LLM Serving Under Resource Constraints Yichao Yuan et.al. 2504.09345 null
2025-04-11 SpecEE: Accelerating Large Language Model Inference with Speculative Early Exiting Jiaming Xu et.al. 2504.08850 null
2025-04-10 SD$^2$: Self-Distilled Sparse Drafters Mike Lasby et.al. 2504.08838 null
2025-04-11 Scaling Up On-Device LLMs via Active-Weight Swapping Between DRAM and Flash Fucheng Jia et.al. 2504.08378 null
2025-04-11 Jupiter: Fast and Resource-Efficient Collaborative Inference of Generative LLMs on Edge Devices Shengyuan Ye et.al. 2504.08242 null
2025-04-10 Token Level Routing Inference System for Edge Devices Jianshu She et.al. 2504.07878 null
2025-04-10 Apt-Serve: Adaptive Request Scheduling on Hybrid Cache for Scalable LLM Inference Serving Shihong Gao et.al. 2504.07494 link
2025-04-10 UniCAIM: A Unified CAM/CIM Architecture with Static-Dynamic KV Cache Pruning for Efficient Long-Context LLM Inference Weikai Xu et.al. 2504.07479 null
2025-04-10 Throughput-Optimal Scheduling Algorithms for LLM Inference and AI Agents Yueying Li et.al. 2504.07347 null
2025-04-08 SPIRe: Boosting LLM Inference Throughput with Speculative Decoding Sanjit Neelam et.al. 2504.06419 null
2025-04-08 Accelerating LLM Inference Throughput via Asynchronous KV Cache Prefetching Yanhao Dong et.al. 2504.06319 null
2025-04-09 Hogwild! Inference: Parallel LLM Generation via Concurrent Attention Gleb Rodionov et.al. 2504.06261 link
2025-04-11 User Feedback Alignment for LLM-powered Exploration in Large-scale Recommendation Systems Jianling Wang et.al. 2504.05522 null
2025-04-07 Evaluating Knowledge Graph Based Retrieval Augmented Generation Methods under Knowledge Incompleteness Dongzhuoran Zhou et.al. 2504.05163 null
2025-04-04 Sustainable LLM Inference for Edge AI: Evaluating Quantized LLMs for Energy Efficiency, Output Accuracy, and Inference Latency Erik Johannes Husom et.al. 2504.03360 null
2025-04-04 Efficient Dynamic Clustering-Based Document Compression for Retrieval-Augmented-Generation Weitao Li et.al. 2504.03165 link
2025-04-03 Narrative Studio: Visual narrative exploration using LLMs and Monte Carlo Tree Search Parsa Ghaffari et.al. 2504.02426 link
2025-04-01 SentenceKV: Efficient LLM Inference via Sentence-Level Semantic KV Caching Yuxuan Zhu et.al. 2504.00970 null
2025-04-03 Token-Driven GammaTune: Adaptive Calibration for Enhanced Speculative Decoding Aayush Gautam et.al. 2504.00030 null
2025-04-06 ReaLM: Reliable and Efficient Large Language Model Inference with Statistical Algorithm-Based Fault Tolerance Tong Xie et.al. 2503.24053 link
2025-03-31 MVDRAM: Enabling GeMV Execution in Unmodified DRAM for Low-Bit LLM Acceleration Tatsuya Kubo et.al. 2503.23817 null
2025-03-30 Cocktail: Chunk-Adaptive Mixed-Precision Quantization for Long-Context LLM Inference Wei Tao et.al. 2503.23294 null
2025-03-28 Niyama: Breaking the Silos of LLM Inference Serving Kanishk Goel et.al. 2503.22562 null
2025-03-25 LogQuant: Log-Distributed 2-Bit Quantization of KV Cache with Superior Accuracy Preservation Han Chen et.al. 2503.19950 link
2025-03-24 xKV: Cross-Layer SVD for KV-Cache Compression Chi-Chih Chang et.al. 2503.18893 link
2025-03-27 Reimagining Memory Access for LLM Inference: Compression-Aware Memory Controller Design Rui Xie et.al. 2503.18869 null
2025-03-24 Jenga: Effective Memory Management for Serving LLM with Heterogeneity Chen Zhang et.al. 2503.18292 null
2025-03-27 WindowKV: Task-Adaptive Group-Wise KV Cache Window Selection for Efficient LLM Inference Youhui Zuo et.al. 2503.17922 link
2025-03-22 PipeBoost: Resilient Pipelined Architecture for Fast Serverless LLM Scaling Chongpeng Liu et.al. 2503.17707 null
2025-03-21 V-Seek: Accelerating LLM Reasoning on Open-hardware Server-class RISC-V Platforms Javier J. Poveda Rodrigo et.al. 2503.17422 null
2025-03-21 Improving the End-to-End Efficiency of Offline Inference for Multi-LLM Applications Based on Sampling and Simulation Jingzhi Fang et.al. 2503.16893 null
2025-03-20 SPIN: Accelerating Large Language Model Inference with Heterogeneous Speculative Models Fahao Chen et.al. 2503.15921 null
2025-03-19 Automated Non-Functional Requirements Generation in Software Engineering with Large Language Models: A Comparative Study Jomar Thomas Almonte et.al. 2503.15248 null
2025-03-19 Communication-Efficient Distributed On-Device LLM Inference Over Wireless Networks Kai Zhang et.al. 2503.14882 null
2025-03-18 PLAY2PROMPT: Zero-shot Tool Instruction Optimization for LLM Agents via Tool Play Wei Fang et.al. 2503.14432 null
2025-03-17 Mitigating KV Cache Competition to Enhance User Experience in LLM Inference Haiying Shen et.al. 2503.13773 null
2025-03-17 AccelGen: Heterogeneous SLO-Guaranteed High-Throughput LLM Inference Serving for Diverse Applications Haiying Shen et.al. 2503.13737 null
2025-03-17 ML-SpecQD: Multi-Level Speculative Decoding with Quantized Drafts Evangelos Georganas et.al. 2503.13565 null
2025-03-14 Examples as the Prompt: A Scalable Approach for Efficient LLM Adaptation in E-Commerce Jingying Zeng et.al. 2503.13518 null
2025-03-17 xLSTM 7B: A Recurrent LLM for Fast and Efficient Inference Maximilian Beck et.al. 2503.13427 link
2025-03-17 VeriLeaky: Navigating IP Protection vs Utility in Fine-Tuning for LLM-Driven Verilog Coding Zeng Wang et.al. 2503.13116 null
2025-03-15 TFHE-Coder: Evaluating LLM-agentic Fully Homomorphic Encryption Code Generation Mayank Kumar et.al. 2503.12217 null
2025-03-09 Green Prompting Marta Adamska et.al. 2503.10666 null
2025-03-13 Collaborative Speculative Inference for Efficient LLM Inference Serving Luyao Gao et.al. 2503.10325 null
2025-03-12 Prompt Inference Attack on Distributed Large Language Model Inference Frameworks Xinjian Luo et.al. 2503.09291 null
2025-03-11 TokenSim: Enabling Hardware and Software Exploration for Large Language Model Inference Systems Feiyang Wu et.al. 2503.08415 link
2025-03-11 Mind the Memory Gap: Unveiling GPU Bottlenecks in Large-Batch LLM Inference Pol G. Recasens et.al. 2503.08311 null
2025-03-09 Seesaw: High-throughput LLM Inference via Model Re-sharding Qidong Su et.al. 2503.06433 null
2025-03-07 Optimizing LLM Inference Throughput via Memory-aware and SLA-constrained Dynamic Batching Bowen Pang et.al. 2503.05248 link
2025-03-07 SpecServe: Efficient and SLO-Aware Large Language Model Serving with Adaptive Speculative Decoding Kaiyu Huang et.al. 2503.05096 null
2025-03-15 Mark Your LLM: Detecting the Misuse of Open-Source Large Language Models via Watermarking Yijie Xu et.al. 2503.04636 null
2025-03-06 AOLO: Analysis and Optimization For Low-Carbon Oriented Wireless Large Language Model Services Xiaoqi Wang et.al. 2503.04418 null
2025-03-06 Wider or Deeper? Scaling LLM Inference-Time Compute with Adaptive Branching Tree Search Kou Misaki et.al. 2503.04412 null
2025-03-06 Beyond Memorization: Evaluating the True Type Inference Capabilities of LLMs for Java Code Snippets Yiwen Dong et.al. 2503.04076 null
2025-03-04 FlexInfer: Breaking Memory Constraint via Flexible and Efficient Offloading for On-Device LLM Inference Hongchao Du et.al. 2503.03777 null
2025-03-05 MAS-GPT: Training LLMs to Build LLM-based Multi-Agent Systems Rui Ye et.al. 2503.03686 null
2025-03-04 VQ-LLM: High-performance Code Generation for Vector Quantization Augmented LLM Inference Zihan Liu et.al. 2503.02236 null
2025-02-26 Online Pseudo-average Shifting Attention (PASA) for Robust Low-precision LLM Inference: Algorithms and Numerical Analysis Long Cheng et.al. 2503.01873 null
2025-03-03 SAGE: A Framework of Precise Retrieval for RAG Jintao Zhang et.al. 2503.01713 null
2025-03-03 DILEMMA: Joint LLM Quantization and Distributed LLM Inference Over Edge Computing Systems Minoo Hosseinzadeh et.al. 2503.01704 null
2025-03-01 Tutorial Proposal: Speculative Decoding for Efficient LLM Inference Heming Xia et.al. 2503.00491 null
2025-02-28 FlexPrefill: A Context-Aware Sparse Attention Mechanism for Efficient Long-Sequence Inference Xunhao Lai et.al. 2502.20766 link
2025-02-28 SPD: Sync-Point Drop for efficient tensor parallelism of Large Language Models Han-Byul Kim et.al. 2502.20727 null
2025-02-27 ECCOS: Efficient Capability and Cost Coordinated Scheduling for Multi-LLM Serving Kai Mei et.al. 2502.20576 link
2025-02-26 Sparse Brains are Also Adaptive Brains: Cognitive-Load-Aware Dynamic Activation for LLMs Yiheng Yang et.al. 2502.19078 null
2025-02-24 LLM Inference Acceleration via Efficient Operation Fusion Mahsa Salmani et.al. 2502.17728 null
2025-02-24 CodeSwift: Accelerating LLM Inference for Efficient Code Generation Qianhui Zhao et.al. 2502.17139 null
2025-02-24 Make LLM Inference Affordable to Everyone: Augmenting GPU Memory with NDP-DIMM Lian Liu et.al. 2502.16963 null
2025-02-24 DBudgetKV: Dynamic Budget in KV Cache Compression for Ensuring Optimal Performance Xuanfan Ni et.al. 2502.16886 null
2025-03-01 CORAL: Learning Consistent Representations across Multi-step Training with Lighter Speculative Drafter Yepeng Weng et.al. 2502.16880 null
2025-02-23 DISC: Dynamic Decomposition Improves LLM Inference Scaling Jonathan Light et.al. 2502.16706 null
2025-02-23 TerEffic: Highly Efficient Ternary LLM Inference on FPGA Chenyang Yin et.al. 2502.16473 null
2025-02-21 KVLink: Accelerating Large Language Models via Efficient KV Cache Reuse Jingbo Yang et.al. 2502.16002 link
2025-02-21 Towards Swift Serverless LLM Cold Starts with ParaServe Chiheng Lou et.al. 2502.15524 null
2025-02-24 HiFi-KPI: A Dataset for Hierarchical KPI Extraction from Earnings Filings Rasmus Aavang et.al. 2502.15411 link
2025-02-24 Round Attention: A Novel Round-Level Attention Mechanism to Accelerate LLM Inference Yaohua Tang et.al. 2502.15294 null
2025-02-21 A General Pseudonymization Framework for Cloud-Based LLMs: Replacing Privacy Information in Controlled Text Generation Shilong Hou et.al. 2502.15233 link
2025-02-19 EvoP: Robust LLM Inference via Evolutionary Pruning Shangyu Wu et.al. 2502.14910 null
2025-02-20 Serving Models, Fast and Slow: Optimizing Heterogeneous LLM Inferencing Workloads at Scale Shashwat Jaiswal et.al. 2502.14617 null
2025-02-20 SR-LLM: Rethinking the Structured Representation in Large Language Model Jiahuan Zhang et.al. 2502.14352 null
2025-02-19 RocketKV: Accelerating Long-Context LLM Inference via Two-Stage KV Cache Compression Payman Behnam et.al. 2502.14051 null
2025-02-19 Activation-aware Probe-Query: Effective Key-Value Retrieval for Long-Context LLMs Inference Qingfa Xiao et.al. 2502.13542 null
2025-02-19 What are Models Thinking about? Understanding Large Language Model Hallucinations "Psychology" through Model Inner State Analysis Peiran Wang et.al. 2502.13490 null
2025-02-18 BaKlaVa -- Budgeted Allocation of KV cache for Long-context Inference Ahmed Burak Gulhan et.al. 2502.13176 null
2025-02-18 R2-KG: General-Purpose Dual-Agent Framework for Reliable Reasoning on Knowledge Graphs Sumin Jo et.al. 2502.12767 link
2025-02-18 HeadInfer: Memory-Efficient LLM Inference by Head-wise Offloading Cheng Luo et.al. 2502.12574 link
2025-02-18 Distributed On-Device LLM Inference With Over-the-Air Computation Kai Zhang et.al. 2502.12559 null
2025-02-18 SparAMX: Accelerating Compressed LLMs Token Generation on AMX-powered CPUs Ahmed F. AbouElhamayed et.al. 2502.12444 link
2025-02-17 Tactic: Adaptive Sparse Attention with Clustering and Distribution Fitting for Long-Context LLMs Kan Zhu et.al. 2502.12216 null
2025-02-17 Designing Role Vectors to Improve LLM Inference Behaviour Daniele Potertì et.al. 2502.12055 null
2025-02-17 DiSCo: Device-Server Collaborative LLM-Based Text Streaming Services Ting Sun et.al. 2502.11417 null
2025-02-17 Evaluating the Performance of the DeepSeek Model in Confidential Computing Environment Ben Dong et.al. 2502.11347 null
2025-02-16 Diversified Sampling Improves Scaling LLM inference Tianchun Wang et.al. 2502.11027 null
2025-02-16 Local-Cloud Inference Offloading for LLMs in Multi-Modal, Multi-Task, Multi-Dialogue Settings Liangqi Yuan et.al. 2502.11007 link
2025-02-15 Pushing up to the Limit of Memory Bandwidth and Capacity Utilization for Efficient LLM Decoding on Embedded FPGA Jindong Li et.al. 2502.10659 null
2025-02-14 λScale: Enabling Fast Scaling for Serverless Large Language Model Inference Minchen Yu et.al. 2502.09922 null
2025-02-14 INF^2: High-Throughput Generative Inference of Large Language Models using Near-Storage Processing Hongsun Jang et.al. 2502.09921 null
2025-02-13 On multi-token prediction for efficient LLM inference Somesh Mehra et.al. 2502.09419 null
2025-02-13 InfiniteHiP: Extending Language Model Context Up to 3 Million Tokens on a Single GPU Heejun Lee et.al. 2502.08910 null
2025-02-12 Universal Model Routing for Efficient LLM Inference Wittawat Jitkrittum et.al. 2502.08773 null
2025-02-12 Bridging the Safety Gap: A Guardrail Pipeline for Trustworthy LLM Inferences Shanshan Han et.al. 2502.08142 null
2025-02-11 HexGen-2: Disaggregated Generative Inference of LLMs in Heterogeneous Environment Youhe Jiang et.al. 2502.07903 null
2025-02-11 SHARP: Accelerating Language Model Inference by SHaring Adjacent layers with Recovery Parameters Yiping Wang et.al. 2502.07832 null
2025-02-11 PIM Is All You Need: A CXL-Enabled GPU-Free System for Large Language Model Inference Yufeng Gu et.al. 2502.07578 link
2025-02-13 Online Scheduling for LLM Inference with KV Cache Constraints Patrick Jaillet et.al. 2502.07115 null
2025-02-08 Towards Sustainable NLP: Insights from Benchmarking Inference Energy in Large Language Models Soham Poddar et.al. 2502.05610 null
2025-02-08 Mechanistic Interpretability of Emotion Inference in Large Language Models Ala N. Tak et.al. 2502.05489 null
2025-02-07 BCQ: Block Clustered Quantization for 4-bit (W4A4) LLM Inference Reena Elangovan et.al. 2502.05376 null
2025-02-07 LLM Query Scheduling with Prefix Reuse and Latency Constraints Gregory Dexter et.al. 2502.04677 null
2025-02-06 WaferLLM: A Wafer-Scale LLM Inference System Congjie He et.al. 2502.04563 null
2025-02-06 KVTuner: Sensitivity-Aware Layer-wise Mixed Precision KV Cache Quantization for Efficient and Nearly Lossless LLM Inference Xing Li et.al. 2502.04420 link
2025-02-06 CMoE: Fast Carving of Mixture-of-Experts for Efficient LLM Inference Zehua Pei et.al. 2502.04416 link
2025-02-06 AttentionPredictor: Temporal Pattern Matters for Efficient LLM Inference Qingyue Yang et.al. 2502.04077 link
2025-02-06 Identify Critical KV Cache in LLM Inference from an Output Perturbation Perspective Yuan Feng et.al. 2502.03805 link
2025-02-06 Adaptive Semantic Prompt Caching with VectorQ Luis Gaspar Schroeder et.al. 2502.03771 null
2025-02-05 HACK: Homomorphic Acceleration via Compression of the Key-Value Cache for Disaggregated LLM Inference Zeyu Zhang et.al. 2502.03589 null
2025-02-05 Accessible and Portable LLM Inference by Compiling Computational Graphs into SQL Wenbo Sun et.al. 2502.02818 null
2025-02-05 Speculative Prefill: Turbocharging TTFT with Lightweight and Training-Free Token Importance Estimation Jingyu Liu et.al. 2502.02789 link
2025-02-04 EasySpec: Layer-Parallel Speculative Decoding for Efficient Multi-GPU Utilization Yize Wu et.al. 2502.02493 null
2025-01-30 Fine-tuning LLaMA 2 interference: a comparative study of language implementations for optimal efficiency Sazzad Hossain et.al. 2502.01651 null
2025-02-06 An Investigation of FP8 Across Accelerators for LLM Inference Jiwoo Kim et.al. 2502.01070 null
2025-02-02 Huff-LLM: End-to-End Lossless Compression for Efficient LLM Inference Patrick Yubeaton et.al. 2502.00922 null
2025-02-02 SecPE: Secure Prompt Ensembling for Private and Robust Large Language Models Jiawen Zhang et.al. 2502.00847 null
2025-02-01 UniAttn: Reducing Inference Costs via Softmax Unification for Post-Training LLMs Yizhe Xiong et.al. 2502.00439 null
2025-02-01 ChunkKV: Semantic-Preserving KV Cache Compression for Efficient Long-Context LLM Inference Xiang Liu et.al. 2502.00299 null
2025-01-31 Pheromone-based Learning of Optimal Reasoning Paths Anirudh Chari et.al. 2501.19278 null
2025-02-02 RotateKV: Accurate and Robust 2-Bit KV Cache Quantization for LLMs via Outlier-Aware Adaptive Rotations Zunhai Su et.al. 2501.16383 link
2025-01-27 Raiders of the Lost Dependency: Fixing Dependency Conflicts in Python using LLMs Antony Bartlett et.al. 2501.16191 null
2025-01-27 TOPLOC: A Locality Sensitive Hashing Scheme for Trustless Verifiable Inference Jack Min Ong et.al. 2501.16007 null
2025-01-27 Aging-aware CPU Core Management for Embodied Carbon Amortization in Cloud LLM Inference Tharindu B. Hewage et.al. 2501.15829 link
2025-01-25 Task-KV: Task-aware KV Cache Optimization via Semantic Differentiation of Attention Heads Xingyang He et.al. 2501.15113 null
2025-01-24 Locality-aware Fair Scheduling in LLM Serving Shiyi Cao et.al. 2501.14312 null
2025-01-20 Glinthawk: A Two-Tiered Architecture for High-Throughput LLM Inference Pouya Hamadanian et.al. 2501.11779 link
2025-01-20 Whose Boat Does it Float? Improving Personalization in Preference Tuning via Inferred User Personas Nishant Balepur et.al. 2501.11549 link
2025-01-19 GREEN-CODE: Optimizing Energy Efficiency in Large Language Models for Code Generation Shashikant Ilager et.al. 2501.11006 link
2025-01-17 A Survey on LLM Test-Time Compute via Search: Tasks, LLM Profiling, Search Algorithms, and Relevant Frameworks Xinzhe Li et.al. 2501.10069 link
2025-01-17 PICE: A Semantic-Driven Progressive Inference System for LLM Serving in Cloud-Edge Networks Huiyou Zhan et.al. 2501.09367 null
2025-01-16 Delayed Fusion: Integrating Large Language Models into First-Pass Decoding in End-to-end Speech Recognition Takaaki Hori et.al. 2501.09258 null
2025-01-15 Guiding Retrieval using LLM-based Listwise Rankers Mandeep Rathee et.al. 2501.09186 link
2025-01-14 Investigating Energy Efficiency and Performance Trade-offs in LLM Inference Across Tasks and DVFS Settings Paul Joe Maliakel et.al. 2501.08219 null
2025-01-14 PRESERVE: Prefetching Model Weights and KV-Cache in Distributed LLM Serving Ahmet Caner Yüzügüler et.al. 2501.08192 null
2025-01-14 Hierarchical Autoscaling for Large Language Model Serving with Chiron Archit Patke et.al. 2501.08090 null
2025-01-12 MPCache: MPC-Friendly KV Cache Eviction for Efficient Private Large Language Model Inference Wenxuan Zeng et.al. 2501.06807 null
2025-01-05 TAPAS: Thermal- and Power-Aware Scheduling for LLM Inference in Cloud Platforms Jovan Stojkovic et.al. 2501.02600 null
2025-01-04 AdaSkip: Adaptive Sublayer Skipping for Accelerating Long-Context LLM Inference Zhuomin He et.al. 2501.02336 link
2025-01-03 Efficient LLM Inference with Activation Checkpointing and Hybrid Caching Sanghyeon Lee et.al. 2501.01792 null
2025-01-03 BlockDialect: Block-wise Fine-grained Mixed Format for Energy-Efficient LLM Inference Wonsuk Jang et.al. 2501.01144 link
2025-01-02 FlashInfer: Efficient and Customizable Attention Engine for LLM Inference Serving Zihao Ye et.al. 2501.01005 link
2024-12-23 Highly Optimized Kernels and Fine-Grained Codebooks for LLM Inference on Arm CPUs Dibakar Gope et.al. 2501.00032 link
2024-12-29 TokenRing: An Efficient Parallelism Framework for Infinite-Context LLMs via Bidirectional Communication Zongwu Wang et.al. 2412.20501 link
2024-12-28 LoL-PIM: Long-Context LLM Decoding with Scalable DRAM-PIM System Hyucksung Kwon et.al. 2412.20166 null
2024-12-19 GFormer: Accelerating Large Language Models with Optimized Transformers on Gaudi Processors Chengming Zhang et.al. 2412.19829 null
2025-01-02 A Survey on Large Language Model Acceleration based on KV Cache Management Haoyang Li et.al. 2412.19442 link
2024-12-27 An Engorgio Prompt Makes Large Language Model Babble on Jianshuo Dong et.al. 2412.19394 link
2024-12-25 Dovetail: A CPU/GPU Heterogeneous Speculative Decoding for LLM inference Libo Zhang et.al. 2412.18934 null
2024-12-21 SYMPHONY: Improving Memory Management for LLM Inference Workloads Saurabh Agarwal et.al. 2412.16434 null
2024-12-20 WebLLM: A High-Performance In-Browser LLM Inference Engine Charlie F. Ruan et.al. 2412.15803 link
2024-12-18 A Survey on LLM Inference-Time Self-Improvement Xiangjue Dong et.al. 2412.14352 link
2024-12-18 Uncertainty-Aware Hybrid Inference with On-Device Small and Remote Large Language Models Seungeun Oh et.al. 2412.12687 null
2024-12-17 A System for Microserving of LLMs Hongyi Jin et.al. 2412.12488 null
2024-12-16 CSR: Achieving 1 Bit Key-Value Cache via Sparse Representation Hongxuan Zhang et.al. 2412.11741 null
2024-12-15 Latent Reward: LLM-Empowered Credit Assignment in Episodic Reinforcement Learning Yun Qu et.al. 2412.11120 link
2024-12-15 NITRO: LLM Inference on Intel Laptop NPUs Anthony Fei et.al. 2412.11053 link
2024-12-13 SCBench: A KV Cache-Centric Analysis of Long-Context Methods Yucheng Li et.al. 2412.10319 null
2024-12-17 TurboAttention: Efficient Attention Approximation For High Throughputs LLMs Hao Kang et.al. 2412.08585 null
2024-12-11 Lachesis: Predicting LLM Inference Accuracy using Structural Properties of Reasoning Paths Naryeong Kim et.al. 2412.08281 null
2024-12-12 TouchTTS: An Embarrassingly Simple TTS Framework that Everyone Can Touch Xingchen Song et.al. 2412.08237 null
2024-12-09 Asynchronous LLM Function Calling In Gim et.al. 2412.07017 null
2024-12-09 SparseAccelerate: Efficient Long-Context Inference for Mid-Range GPUs James Vo et.al. 2412.06198 null
2024-12-08 XKV: Personalized KV Cache Memory Reduction for Long-Context LLM Inference Weizhuo Li et.al. 2412.05896 null
2024-12-06 GUIDE: A Global Unified Inference Engine for Deploying Large Language Models in Heterogeneous Environments Yanyu Chen et.al. 2412.04788 null
2024-12-03 Multi-Bin Batching for Increasing LLM Inference Throughput Ozgur Guldogan et.al. 2412.04504 null
2024-11-29 BatchLLM: Optimizing Large Batched LLM Inference with Global Prefix Sharing and Throughput-oriented Token Batching Zhen Zheng et.al. 2412.03594 null
2024-12-03 Compressing KV Cache for Long-Context LLM Inference with Inter-Layer Attention Similarity Da Ma et.al. 2412.02252 null
2024-12-02 PLD+: Accelerating LLM inference by leveraging Language Model Artifacts Shwetha Somasundaram et.al. 2412.01447 null
2024-12-02 Efficient LLM Inference using Dynamic Input Pruning and Cache-Aware Masking Marco Federici et.al. 2412.01380 null
2024-12-05 RILQ: Rank-Insensitive LoRA-based Quantization Error Compensation for Boosting 2-bit Large Language Model Accuracy Geonho Lee et.al. 2412.01129 link
2024-12-02 TruncFormer: Private LLM Inference Using Only Truncations Patrick Yubeaton et.al. 2412.01042 null
2024-11-29 A dynamic parallel method for performance optimization on hybrid CPUs Luo Yu et.al. 2411.19542 null
2024-12-03 Puzzle: Distillation-Based NAS for Inference-Optimized LLMs Akhiad Bercovich et.al. 2411.19146 null
2024-11-29 InputSnatch: Stealing Input in LLM Services via Timing Side-Channel Attacks Xinyao Zheng et.al. 2411.18191 null
2024-11-28 MiniKV: Pushing the Limits of LLM Inference via 2-Bit Layer-Discriminative KV Cache Akshat Sharma et.al. 2411.18077 null
2024-11-24 Chameleon: Adaptive Caching and Scheduling for Many-Adapter LLM Inference Environments Nikoleta Iliakopoulou et.al. 2411.17741 null
2024-11-26 PIM-AI: A Novel Architecture for High-Efficiency LLM Inference Cristobal Ortega et.al. 2411.17309 null
2024-11-26 Star Attention: Efficient LLM Inference over Long Sequences Shantanu Acharya et.al. 2411.17116 link
2024-11-26 Efficient LLM Inference with I/O-Aware Partial KV Cache Recomputation Chaoyi Jiang et.al. 2411.17089 link
2024-11-25 MixPE: Quantization and Hardware Co-design for Efficient LLM Inference Yu Zhang et.al. 2411.16158 null
2024-11-24 eFedLLM: Efficient LLM Inference Based on Federated Learning Shengwen Ding et.al. 2411.16003 null
2024-11-24 Anda: Unlocking Efficient LLM Inference with a Variable-Length Grouped Activation Data Format Chao Fang et.al. 2411.15982 null
2024-11-24 Task Scheduling for Efficient Inference of Large Language Models on Single Moderate GPU Systems Wenxiang Lin et.al. 2411.15715 null
2024-11-22 XGrammar: Flexible and Efficient Structured Generation Engine for Large Language Models Yixin Dong et.al. 2411.15100 null
2024-11-21 Disentangling Memory and Reasoning Ability in Large Language Models Mingyu Jin et.al. 2411.13504 link
2024-11-20 Closer Look at Efficient Inference Methods: A Survey of Speculative Decoding Hyun Ryu et.al. 2411.13157 null
2024-11-21 LLMSteer: Improving Long-Context LLM Inference by Steering Attention on Reused Contexts Zhuohan Gu et.al. 2411.13009 null
2024-11-15 An exploration of the effect of quantisation on energy consumption and inference time of StarCoder2 Pepijn de Reus et.al. 2411.12758 link
2024-11-19 SparseInfer: Training-free Prediction of Activation Sparsity for Fast LLM Inference Jiho Shin et.al. 2411.12692 null
2024-11-18 MoE-Lightning: High-Throughput MoE Inference on Memory-constrained GPUs Shiyi Cao et.al. 2411.11217 null
2024-11-15 AMXFP4: Taming Activation Outliers with Asymmetric Microscaling Floating-Point for 4-bit LLM Inference Janghwan Lee et.al. 2411.09909 null
2024-11-14 Squeezed Attention: Accelerating Long Context Length LLM Inference Coleman Hooper et.al. 2411.09688 link
2024-11-15 Communication Compression for Tensor Parallel LLM Inference Jan Hansen-Palmus et.al. 2411.09510 null
2024-11-14 Pie: Pooling CPU Memory for LLM Inference Yi Xu et.al. 2411.09317 null
2024-11-12 Towards Low-bit Communication for Tensor Parallel LLM Inference Harry Dong et.al. 2411.07942 null
2024-11-12 The Effect of Scheduling and Preemption on the Efficiency of LLM Inference Serving Kyoungmin Kim et.al. 2411.07447 null
2024-11-08 AcceLLM: Accelerating LLM Inference using Redundancy for Load Balancing and Data Locality Ilias Bournias et.al. 2411.05555 null
2024-11-07 Hardware and Software Platform Inference Cheng Zhang et.al. 2411.05197 null
2024-11-07 SuffixDecoding: A Model-Free Approach to Speeding Up Large Language Model Inference Gabriele Oliaro et.al. 2411.04975 link
2024-11-05 CE-CoLLM: Efficient and Adaptive Large Language Models Through Cloud-Edge Collaboration Hongpeng Jin et.al. 2411.02829 null
2024-11-04 RAGViz: Diagnose and Visualize Retrieval-Augmented Generation Tevin Wang et.al. 2411.01751 link
2024-11-06 HOBBIT: A Mixed Precision Expert Offloading System for Fast MoE Inference Peng Tang et.al. 2411.01433 null
2024-11-02 RA-WEBs: Remote Attestation for WEB services Kosei Akama et.al. 2411.01340 null
2024-11-02 NEO: Saving GPU Memory Crisis with CPU Offloading for Online LLM Inference Xuanlin Jiang et.al. 2411.01142 null
2024-11-01 LLM-Based Misconfiguration Detection for AWS Serverless Computing Jinfeng Wen et.al. 2411.00642 null
2024-11-04 ReverseNER: A Self-Generated Example-Driven Framework for Zero-Shot Named Entity Recognition with Large Language Models Anbang Wang et.al. 2411.00533 null
2024-11-01 Attention Tracker: Detecting Prompt Injection Attacks in LLMs Kuo-Han Hung et.al. 2411.00348 null
2024-10-31 LLM-Inference-Bench: Inference Benchmarking of Large Language Models on AI Accelerators Krishna Teja Chitty-Venkata et.al. 2411.00136 link
2024-10-31 Interpretable Language Modeling via Induction-head Ngram Models Eunji Kim et.al. 2411.00066 link
2024-10-31 ALISE: Accelerating Large Language Model Serving with Speculative Scheduling Youpeng Zhao et.al. 2410.23537 null
2024-10-30 BUZZ: Beehive-structured Sparse KV Cache with Segmented Heavy Hitters for Efficient LLM Inference Junqi Zhao et.al. 2410.23079 link
2024-10-29 Scaling LLM Inference with Optimized Sample Compute Allocation Kexun Zhang et.al. 2410.22480 link
2024-10-29 SVIP: Towards Verifiable Inference of Open-source Large Language Models Yifan Sun et.al. 2410.22307 null
2024-10-28 ShadowKV: KV Cache in Shadows for High-Throughput Long-Context LLM Inference Hanshi Sun et.al. 2410.21465 link
2024-10-27 FIRP: Faster LLM inference via future intermediate representation prediction Pengfei Wu et.al. 2410.20488 null
2024-10-29 Ripple: Accelerating LLM Inference on Smartphones with Correlation-Aware Neuron Management Tuowei Wang et.al. 2410.19274 null
2024-10-24 Read-ME: Refactorizing LLMs as Router-Decoupled Mixture of Experts with System Co-Design Ruisi Cai et.al. 2410.19123 link
2024-10-30 Dynamic Vocabulary Pruning in Early-Exit LLMs Jort Vincenti et.al. 2410.18952 link
2024-10-25 A Survey on Speech Large Language Models Jing Peng et.al. 2410.18908 null
2024-10-24 BATON: Enhancing Batch-wise Inference Efficiency for Large Language Models via Dynamic Re-batching Peizhuang Cong et.al. 2410.18701 null
2024-10-25 Fast Inference for Augmented Large Language Models Rana Shahout et.al. 2410.18248 null
2024-10-23 POD-Attention: Unlocking Full Prefill-Decode Overlap for Faster LLM Inference Aditya K Kamath et.al. 2410.18038 link
2024-10-22 FastAttention: Extend FlashAttention2 to NPUs and Low-resource GPUs Haoran Lin et.al. 2410.16663 null
2024-10-22 Distill-SynthKG: Distilling Knowledge Graph Synthesis Workflow for Improved Coverage and Efficiency Prafulla Kumar Choubey et.al. 2410.16597 null
2024-10-20 EPIC: Efficient Position-Independent Context Caching for Serving Large Language Models Junhao Hu et.al. 2410.15332 null
2024-10-19 IANUS: Integrated Accelerator based on NPU-PIM Unified Memory System Minseok Seo et.al. 2410.15008 null
2024-10-23 Harnessing Your DRAM and SSD for Sustainable and Accessible LLM Inference with Mixed-Precision and Multi-level Caching Jie Peng et.al. 2410.14740 null
2024-10-18 A Systematic Study of Cross-Layer KV Sharing for Efficient LLM Inference You Wu et.al. 2410.14442 link
2024-10-18 Revisiting SLO and Goodput Metrics in LLM Serving Zhibin Wang et.al. 2410.14257 null
2024-10-17 RiTeK: A Dataset for Large Language Models Complex Reasoning over Textual Knowledge Graphs Jiatan Huang et.al. 2410.13987 null
2024-10-17 Active-Dormant Attention Heads: Mechanistically Demystifying Extreme-Token Phenomena in LLMs Tianyu Guo et.al. 2410.13835 link
2024-10-17 Progressive Mixed-Precision Decoding for Efficient LLM Inference Hao Mark Chen et.al. 2410.13461 null
2024-10-17 Data Defenses Against Large Language Models William Agnew et.al. 2410.13138 link
2024-10-19 In-context KV-Cache Eviction for LLMs via Attention-Gate Zihao Zeng et.al. 2410.12876 null
2024-10-10 RecurFormer: Not All Transformer Heads Need Self-Attention Ruiqing Yan et.al. 2410.12850 null
2024-10-16 Iter-AHMCL: Alleviate Hallucination for Large Language Model via Iterative Model-level Contrastive Learning Huiwen Wu et.al. 2410.12130 null
2024-10-15 Beyond Linear Approximations: A Novel Pruning Approach for Attention Matrix Yingyu Liang et.al. 2410.11261 null
2024-10-14 DuoAttention: Efficient Long-Context LLM Inference with Retrieval and Streaming Heads Guangxuan Xiao et.al. 2410.10819 link
2024-10-16 SplitLLM: Collaborative Inference of LLMs for Model Placement and Throughput Optimization Akrit Mudvari et.al. 2410.10759 null
2024-10-12 Power-Softmax: Towards Secure LLM Inference over Encrypted Data Itamar Zimerman et.al. 2410.09457 null
2024-10-09 SWIFT: On-the-Fly Self-Speculative Decoding for LLM Inference Acceleration Heming Xia et.al. 2410.06916 link
2024-10-08 ParallelSpec: Parallel Drafter for Efficient Speculative Decoding Zilin Xiao et.al. 2410.05589 null
2024-10-06 RevMUX: Data Multiplexing with Reversible Adapters for Efficient LLM Batch Inference Yige Xu et.al. 2410.04519 link
2024-10-14 Large Language Model Inference Acceleration: A Comprehensive Hardware Perspective Jinhao Li et.al. 2410.04466 link
2024-10-04 SwiftKV: Fast Prefill-Optimized Inference with Knowledge-Preserving Model Transformation Aurick Qiao et.al. 2410.03960 null
2024-10-04 EXAQ: Exponent Aware Quantization For LLMs Acceleration Moran Shkolnik et.al. 2410.03185 link
2024-10-03 LLMCO2: Advancing Accurate Carbon Footprint Prediction for LLM Inferences Zhenxiao Fu et.al. 2410.02950 null
2024-10-03 Choices are More Important than Efforts: LLM Enables Efficient Multi-Agent Exploration Yun Qu et.al. 2410.02511 link
2024-10-04 LLM-Pilot: Characterize and Optimize Performance of your LLM Inference Services Małgorzata Łazuka et.al. 2410.02425 null
2024-10-04 Buckle Up: Robustifying LLMs at Every Customization Stage via Data Curation Xiaoqun Liu et.al. 2410.02220 null
2024-10-02 Locret: Enhancing Eviction in Long-Context LLM Inference with Trained Retaining Heads Yuxiang Huang et.al. 2410.01805 link
2024-10-02 ConServe: Harvesting GPUs for Low-Latency and High-Throughput Large Language Model Serving Yifan Qiao et.al. 2410.01228 null
2024-10-02 TPI-LLM: Serving 70B-scale LLMs Efficiently on Low-resource Edge Devices Zonghang Li et.al. 2410.00531 null
2024-09-30 The Early Bird Catches the Leak: Unveiling Timing Side Channels in LLM Serving Systems Linke Song et.al. 2409.20002 null
2024-09-26 Control Industrial Automation System with Large Language Models Yuchen Xia et.al. 2409.18009 link
2024-09-26 Efficient Arbitrary Precision Acceleration for Large Language Models on GPU Tensor Cores Shaobo Ma et.al. 2409.17870 null
2024-09-25 Discovering the Gems in Early Layers: Accelerating Long-Context LLMs with 1000x Input Token Reduction Zhenmei Shi et.al. 2409.17422 link
2024-09-25 Mnemosyne: Parallelization Strategies for Efficiently Serving Multi-Million Context Length LLM Inference Requests Without Approximations Amey Agrawal et.al. 2409.17264 null
2024-09-25 Dynamic-Width Speculative Beam Decoding for Efficient LLM Inference Zongyue Qin et.al. 2409.16560 null
2024-09-25 AlignedKV: Reducing Memory Access of KV-Cache with Precision-Aligned Quantization Yifan Tan et.al. 2409.16546 link
2024-09-23 Eagle: Efficient Training-Free Router for Multi-LLM Inference Zesen Zhao et.al. 2409.15518 null
2024-09-24 UELLM: A Unified and Efficient Approach for LLM Inference Serving Yiyuan He et.al. 2409.14961 null
2024-09-22 RACOON: An LLM-based Framework for Retrieval-Augmented Column Type Annotation with a Knowledge Graph Linxi Wei et.al. 2409.14556 null
2024-09-16 Do Large Language Models Need a Content Delivery Network? Yihua Cheng et.al. 2409.13761 link
2024-09-19 PromSec: Prompt Optimization for Secure Generation of Functional Source Code with Large Language Models (LLMs) Mahmoud Nazzal et.al. 2409.12699 link
2024-09-12 LlamaF: An Efficient Llama2 Architecture Accelerator on Embedded FPGAs Han Xu et.al. 2409.11424 null
2024-09-04 ISO: Overlap of Computation and Communication within Seqenence For LLM Inference Bin Xiao et.al. 2409.11155 null
2024-09-18 RetrievalAttention: Accelerating Long-Context LLM Inference via Vector Retrieval Di Liu et.al. 2409.10516 link
2024-09-08 InstInfer: In-Storage Attention Offloading for Cost-Effective Long-Context LLM Inference Xiurui Pan et.al. 2409.04992 null
2024-09-07 Achieving Peak Performance for Large Language Models: A Systematic Review Zhyar Rzgar K Rostam et.al. 2409.04833 null
2024-09-06 A First Look At Efficient And Secure On-Device LLM Inference Against KV Leakage Huan Yang et.al. 2409.04040 null
2024-09-13 Confidential Computing on nVIDIA H100 GPU: A Performance Benchmark Study Jianwei Zhu et.al. 2409.03992 null
2024-09-05 Sirius: Contextual Sparsity with Correction for Efficient LLMs Yang Zhou et.al. 2409.03856 link
2024-08-31 HSF: Defending against Jailbreak Attacks with Hidden State Filtering Cheng Qian et.al. 2409.03788 null
2024-09-03 Contemporary Model Compression on Large Language Models Inference Dong Liu et.al. 2409.01990 link
2024-09-02 CHESS: Optimizing LLM Inference via Channel-Wise Thresholding and Selective Sparsification Junhui He et.al. 2409.01366 link
2024-09-04 Prompt Compression with Context-Aware Sentence Encoding for Fast and Improved LLM Inference Barys Liskavets et.al. 2409.01227 link
2024-09-01 Research on LLM Acceleration Using the High-Performance RISC-V Processor "Xiangshan" (Nanhu Version) Based on the Open-Source Matrix Instruction Set Extension (Vector Dot Product) Xu-Hao Chen et.al. 2409.00661 null
2024-08-28 Decentralized LLM Inference over Edge Networks with Energy Harvesting Aria Khoshsirat et.al. 2408.15907 null
2024-08-28 Efficient LLM Scheduling by Learning to Rank Yichao Fu et.al. 2408.15792 link
2024-08-28 Boosting Lossless Speculative Decoding via Feature Sampling and Partial Alignment Distillation Lujun Gui et.al. 2408.15562 null
2024-08-22 NanoFlow: Towards Optimal Large Language Model Serving Throughput Kan Zhu et.al. 2408.12757 link
2024-09-04 Parallel Speculative Decoding with Adaptive Draft Length Tianyu Liu et.al. 2408.11850 link
2024-08-21 MARLIN: Mixed-Precision Auto-Regressive Parallel Inference on Large Language Models Elias Frantar et.al. 2408.11743 link
2024-08-20 Unconditional Truthfulness: Learning Conditional Dependency for Uncertainty Quantification of Large Language Models Artem Vazhentsev et.al. 2408.10692 null
2024-08-19 PEDAL: Enhancing Greedy Decoding with Large Language Models using Diverse Exemplars Sumanth Prabhu et.al. 2408.08869 null
2024-08-23 ABQ-LLM: Arbitrary-Bit Quantized Inference Acceleration for Large Language Models Chao Zeng et.al. 2408.08554 link
2024-08-14 LPU: A Latency-Optimized and Highly Scalable Processor for Large Language Model Inference Seungjae Moon et.al. 2408.07326 null
2024-08-12 LUT Tensor Core: Lookup Table Enables Efficient Low-Bit LLM Inference Acceleration Zhiwen Mo et.al. 2408.06003 null
2024-08-10 LLMServingSim: A HW/SW Co-Simulation Infrastructure for LLM Inference Serving at Scale Jaehong Cho et.al. 2408.05499 link
2024-08-05 SLO-aware GPU Frequency Scaling for Energy Efficient LLM Inference Serving Andreas Kosmas Kakolyris et.al. 2408.05235 null
2024-08-08 Towards SLO-Optimized LLM Serving via Automatic Inference Engine Tuning Ke Cheng et.al. 2408.04323 null
2024-08-07 Zero-Delay QKV Compression for Mitigating KV Cache and Network Bottlenecks in LLM Inference Zeyu Zhang et.al. 2408.04107 null
2024-08-07 MPC-Minimized Secure LLM Inference Deevashwer Rathee et.al. 2408.03561 null
2024-08-05 Generative AI as a Service in 6G Edge-Cloud: Generation Task Offloading by In-context Learning Hao Zhou et.al. 2408.02549 null
2024-08-02 The Impact of Hyperparameters on Large Language Model Inference Performance: An Evaluation of vLLM and HuggingFace Pipelines Matias Martinez et.al. 2408.01050 null
2024-08-01 DynamoLLM: Designing LLM Inference Clusters for Performance and Energy Efficiency Jovan Stojkovic et.al. 2408.00741 null
2024-08-01 Designing Efficient LLM Accelerators for Edge Devices Jude Haris et.al. 2408.00462 null
2024-08-01 Large Language Model (LLM)-enabled In-context Learning for Wireless Network Optimization: A Case Study of Power Control Hao Zhou et.al. 2408.00214 null
2024-07-23 ScaleLLM: A Resource-Frugal LLM Serving Framework by Optimizing End-to-End Efficiency Yuhang Yao et.al. 2408.00008 null
2024-08-01 Responsive ML inference in multi-tenanted environments using AQUA Abhishek Vijaya Kumar et.al. 2407.21255 null
2024-07-25 An Efficient Inference Framework for Early-exit Large Language Models Ruijie Miao et.al. 2407.20272 null
2024-07-29 Concise Thoughts: Impact of Output Length on LLM Reasoning and Cost Sania Nayab et.al. 2407.19825 null
2024-07-29 Teaching LLMs at Charles University: Assignments and Activities Jindřich Helcl et.al. 2407.19798 null
2024-07-22 RazorAttention: Efficient KV Cache Compression Through Retrieval Heads Hanlin Tang et.al. 2407.15891 null
2024-07-22 vTensor: Flexible Virtual Tensor Management for Efficient LLM Serving Jiale Xu et.al. 2407.15309 link
2024-07-19 LazyLLM: Dynamic Token Pruning for Efficient Long Context LLM Inference Qichen Fu et.al. 2407.14057 null
2024-07-17 Struct-X: Enhancing Large Language Models Reasoning with Structured Data Xiaoyu Tan et.al. 2407.12522 null
2024-07-17 LLM Inference Serving: Survey of Recent Advances and Opportunities Baolin Li et.al. 2407.12391 null
2024-07-17 Spectra: A Comprehensive Study of Ternary, Quantized, and FP16 Language Models Ayush Kaushal et.al. 2407.12327 link
2024-07-16 PipeInfer: Accelerating LLM Inference using Asynchronous Pipelined Speculation Branden Butler et.al. 2407.11798 null
2024-07-21 Ada-KV: Optimizing KV Cache Eviction by Adaptive Budget Allocation for Efficient LLM Inference Yuan Feng et.al. 2407.11550 link
2024-07-15 Fast Matrix Multiplications for Lookup Table-Quantized LLMs Han Guo et.al. 2407.10960 link
2024-07-12 Multi-Token Joint Speculative Decoding for Accelerating Large Language Model Inference Zongyue Qin et.al. 2407.09722 null
2024-09-02 Etalon: Holistic Performance Evaluation Framework for LLM Inference Systems Amey Agrawal et.al. 2407.07000 null
2024-07-08 Empowering 1000 tokens/second on-device LLM prefilling with mllm-NPU Daliang Xu et.al. 2407.05858 link
2024-07-07 A Queueing Theoretic Perspective on Low-Latency LLM Inference with Variable Token Length Yuqing Yang et.al. 2407.05347 null
2024-07-05 Corki: Enabling Real-time Embodied AI Robots via Algorithm-Architecture Co-Design Yiyang Huang et.al. 2407.04292 link
2024-07-04 Offline Energy-Optimal LLM Serving: Workload-Based Energy Models for LLM Inference on Heterogeneous Systems Grant Wilkins et.al. 2407.04014 null
2024-07-02 MInference 1.0: Accelerating Pre-filling for Long-Context LLMs via Dynamic Sparse Attention Huiqiang Jiang et.al. 2407.02490 link
2024-06-29 Teola: Towards End-to-End Optimization of LLM-based Applications Xin Tan et.al. 2407.00326 link
2024-06-25 T-MAC: CPU Renaissance via Table Lookup for Low-Bit LLM Deployment on Edge Jianyu Wei et.al. 2407.00088 link
2024-06-28 InfiniGen: Efficient Generative Inference of Large Language Models with Dynamic KV Cache Management Wonbeom Lee et.al. 2406.19707 null
2024-06-24 Towards Fast Multilingual LLM Inference: Speculative Decoding and Specialized Drafters Euiin Yi et.al. 2406.16758 link
2024-06-28 SampleAttention: Near-Lossless Acceleration of Long Context LLM Inference with Adaptive Structured Sparse Attention Qianchao Zhu et.al. 2406.15486 null
2024-06-21 Leveraging Passage Embeddings for Efficient Listwise Reranking with Large Language Models Qi Liu et.al. 2406.14848 link
2024-06-20 Connecting the Dots: LLMs can Infer and Verbalize Latent Structure from Disparate Training Data Johannes Treutlein et.al. 2406.14546 link
2024-06-20 LiveMind: Low-latency Large Language Models with Simultaneous Inference Chuangtao Chen et.al. 2406.14319 link
2024-06-19 SDQ: Sparse Decomposed Quantization for LLM Inference Geonhwa Jeong et.al. 2406.13868 null
2024-06-19 Amphista: Accelerate LLM Inference with Bi-directional Multiple Drafting Heads in a Non-autoregressive Style Zeping Li et.al. 2406.13170 null
2024-06-16 Tender: Accelerating Large Language Models via Tensor Decomposition and Runtime Requantization Jungi Lee et.al. 2406.12930 null
2024-06-18 LightPAL: Lightweight Passage Retrieval for Open Domain Multi-Document Summarization Masafumi Enomoto et.al. 2406.12494 null
2024-06-18 LLMs Are Prone to Fallacies in Causal Inference Nitish Joshi et.al. 2406.12158 null
2024-06-14 Unraveling the Mechanics of Learning-Based Demonstration Selection for In-Context Learning Hui Liu et.al. 2406.11890 null
2024-06-17 Endor: Hardware-Friendly Sparse Format for Offloaded LLM Inference Donghyeon Joo et.al. 2406.11674 null
2024-06-17 QTIP: Quantization with Trellises and Incoherence Processing Albert Tseng et.al. 2406.11235 link
2024-06-16 New Solutions on LLM Acceleration, Optimization, and Application Yingbing Huang et.al. 2406.10903 null
2024-06-16 Quest: Query-Aware Sparsity for Efficient Long-Context LLM Inference Jiaming Tang et.al. 2406.10774 link
2024-06-15 Large Language Models as Surrogate Models in Evolutionary Algorithms: A Preliminary Study Hao Hao et.al. 2406.10675 link
2024-06-08 QCQA: Quality and Capacity-aware grouped Query Attention Vinay Joshi et.al. 2406.10247 null
2024-06-12 Memory Is All You Need: An Overview of Compute-in-Memory Architectures for Accelerating Large Language Model Inference Christopher Wolters et.al. 2406.08413 null
2024-06-12 PowerInfer-2: Fast Large Language Model Inference on a Smartphone Zhenliang Xue et.al. 2406.06282 null
2024-06-09 A Superalignment Framework in Autonomous Driving with Large Language Models Xiangrui Kong et.al. 2406.05651 null
2024-06-06 Speculative Decoding via Early-exiting for Faster LLM Inference with Thompson Sampling Control Mechanism Jiahao Liu et.al. 2406.03853 null
2024-06-04 Language Models can Infer Action Semantics for Classical Planners from Environment Feedback Wang Zhu et.al. 2406.02791 null
2024-06-08 Adaptive Layer Splitting for Wireless LLM Inference in Edge Computing: A Model-Based Reinforcement Learning Approach Yuxuan Chen et.al. 2406.02616 null
2024-06-04 SpecExec: Massively Parallel Speculative Decoding for Interactive LLM Inference on Consumer Devices Ruslan Svirschevski et.al. 2406.02532 link
2024-06-03 Demystifying Platform Requirements for Diverse LLM Inference Use Cases Abhimanyu Bambhaniya et.al. 2406.01698 link
2024-06-03 PrivacyRestore: Privacy-Preserving Inference in Large Language Models via Privacy Removal and Restoration Ziqian Zeng et.al. 2406.01394 null
2024-06-01 A Practice-Friendly Two-Stage LLM-Enhanced Paradigm in Sequential Recommendation Dugang Liu et.al. 2406.00333 null
2024-05-31 No Free Lunch Theorem for Privacy-Preserving LLM Inference Xiaojin Zhang et.al. 2405.20681 null
2024-05-30 Decentralized AI: Permissionless LLM Inference on POKT Network Daniel Olshansky et.al. 2405.20450 null
2024-06-01 S3D: A Simple and Cost-Effective Self-Speculative Decoding Scheme for Low-Memory GPUs Wei Zhong et.al. 2405.20314 null
2024-05-30 Deciphering Human Mobility: Inferring Semantics of Trajectories with Large Language Models Yuxiao Luo et.al. 2405.19850 null
2024-05-29 MoNDE: Mixture of Near-Data Experts for Large-Scale Sparse Models Taehyun Kim et.al. 2405.18832 null
2024-05-29 PermLLM: Private Inference of Large Language Models within 3 Seconds under WAN Fei Zheng et.al. 2405.18744 null
2024-06-02 Hardware-Aware Parallel Prompt Decoding for Memory-Efficient Acceleration of LLM Inference Hao Mark Chen et.al. 2405.18628 link
2024-05-25 FastQuery: Communication-efficient Embedding Table Query for Private LLM Inference Chenqi Lin et.al. 2405.16241 null
2024-05-23 EdgeShard: Efficient LLM Inference via Collaborative Edge Computing Mingjin Zhang et.al. 2405.14371 null
2024-05-23 MiniCache: KV Cache Compression in Depth Dimension for Large Language Models Akide Liu et.al. 2405.14366 null
2024-05-21 PyramidInfer: Pyramid KV Cache Compression for High-throughput LLM Inference Dongjie Yang et.al. 2405.12532 null
2024-05-12 Edge Intelligence Optimization for Large Language Model Inference with Batching and Quantization Xinyuan Zhang et.al. 2405.07140 null
2024-05-11 Aladdin: Joint Placement and Scaling for SLO-Aware LLM Serving Chengyi Nie et.al. 2405.06856 null
2024-05-21 Vidur: A Large-Scale Simulation Framework For LLM Inference Amey Agrawal et.al. 2405.05465 link
2024-05-13 KV-Runahead: Scalable Causal LLM Inference by Parallel Key-Value Cache Generation Minsik Cho et.al. 2405.05329 null
2024-05-12 DALK: Dynamic Co-Augmentation of LLMs and KG to answer Alzheimer's Disease Questions with Scientific Literature Dawei Li et.al. 2405.04819 link
2024-05-10 QServe: W4A8KV4 Quantization and System Co-design for Efficient LLM Serving Yujun Lin et.al. 2405.04532 link
2024-05-07 vAttention: Dynamic Memory Management for Serving LLMs without PagedAttention Ramya Prabhu et.al. 2405.04437 link
2024-05-07 Optimizing Language Model's Reasoning Abilities with Weak Supervision Yongqi Tong et.al. 2405.04086 null
2024-05-06 AlphaMath Almost Zero: process Supervision without process Guoxin Chen et.al. 2405.03553 link
2024-05-03 Efficient and Economic Large Language Model Inference with Attention Offloading Shaoyuan Chen et.al. 2405.01814 null

(back to top)

MoE

Publish Date Title Authors PDF Code
2026-03-06 RAMoEA-QA: Hierarchical Specialization for Robust Respiratory Audio Question Answering Gaia A. Bertolino et.al. 2603.06542 null
2026-03-06 A Mixture-of-Experts Framework for Practical Hybrid-Quantum Models in Credit Card Fraud Detection Rodrigo Chaves et.al. 2603.06473 null
2026-03-06 MoEMambaMIL: Structure-Aware Selective State Space Modeling for Whole-Slide Image Analysis Dongqing Xie et.al. 2603.06378 null
2026-03-06 MoEless: Efficient MoE LLM Serving via Serverless Computing Hanfei Yu et.al. 2603.06350 null
2026-03-06 WMoE-CLIP: Wavelet-Enhanced Mixture-of-Experts Prompt Learning for Zero-Shot Anomaly Detection Peng Chen et.al. 2603.06313 null
2026-03-06 GazeMoE: Perception of Gaze Target with Mixture-of-Experts Zhuangzhuang Dai et.al. 2603.06256 null
2026-03-06 EvoESAP: Non-Uniform Expert Pruning for Sparse MoE Zongfang Liu et.al. 2603.06003 null
2026-03-06 MoE Lens -- An Expert Is All You Need Marmik Chaudhari et.al. 2603.05806 null
2026-03-06 Sparse Crosscoders for diffing MoEs and Dense models Marmik Chaudhari et.al. 2603.05805 null
2026-03-05 Change Point Detection for Cell Populations Measured via Flow Cytometry Yik Lun Kei et.al. 2603.05700 null
2026-03-05 NeuronMoE: Neuron-Guided Mixture-of-Experts for Efficient Multilingual LLM Extension Rongzhi Li et.al. 2603.05046 null
2026-03-05 Mixture of Universal Experts: Scaling Virtual Width via Depth-Width Transformation Yilong Chen et.al. 2603.04971 null
2026-03-05 Timer-S1: A Billion-Scale Time Series Foundation Model with Serial Scaling Yong Liu et.al. 2603.04791 null
2026-03-05 TSEmbed: Unlocking Task Scaling in Universal Multimodal Embeddings Yebo Wu et.al. 2603.04772 null
2026-03-04 ECG-MoE: Mixture-of-Expert Electrocardiogram Foundation Model Yuhao Xu et.al. 2603.04589 null
2026-03-04 Augmenting representations with scientific papers Nicolò Oreste Pinciroli Vago et.al. 2603.04516 null
2026-03-04 Benchmarking Quantum Computers via Protocols, Comparing IBM's Heron vs IBM's Eagle Nitay Mayo et.al. 2603.04377 null
2026-03-04 RANGER: Sparsely-Gated Mixture-of-Experts with Adaptive Retrieval Re-ranking for Pathology Report Generation Yixin Chen et.al. 2603.04348 null
2026-03-04 CAMMSR: Category-Guided Attentive Mixture of Experts for Multimodal Sequential Recommendation Jinfeng Xu et.al. 2603.04320 null
2026-03-04 UniRain: Unified Image Deraining with RAG-based Dataset Distillation and Multi-objective Reweighted Optimization Qianfeng Yang et.al. 2603.03967 null
2026-03-03 Modeling Cross-vision Synergy for Unified Large Vision Model Shengqiong Wu et.al. 2603.03564 null
2026-03-03 Beyond Language Modeling: An Exploration of Multimodal Pretraining Shengbang Tong et.al. 2603.03276 null
2026-03-04 MoECLIP: Patch-Specialized Experts for Zero-shot Anomaly Detection Jun Yeong Park et.al. 2603.03101 null
2026-03-03 CMoE: Contrastive Mixture of Experts for Motion Control and Terrain Adaptation of Humanoid Robots Shihao Ma et.al. 2603.03067 null
2026-03-03 EduVQA: Benchmarking AI-Generated Video Quality Assessment for Education Baoliang Chen et.al. 2603.03066 null
2026-03-03 Practical FP4 Training for Large-Scale MoE Models on Hopper GPUs Wuyue Zhang et.al. 2603.02731 null
2026-03-03 TenExp: Mixture-of-Experts-Based Tensor Decomposition Structure Search Framework Ting-Wei Zhou et.al. 2603.02720 null
2026-03-03 MiM-DiT: MoE in MoE with Diffusion Transformers for All-in-One Image Restoration Lingshun Kong et.al. 2603.02710 null
2026-03-03 Addressing Missing and Noisy Modalities in One Solution: Unified Modality-Quality Framework for Low-quality Multimodal Data Sijie Mai et.al. 2603.02695 null
2026-03-03 Robust Heterogeneous Analog-Digital Computing for Mixture-of-Experts Models with Theoretical Generalization Guarantees Mohammed Nowaz Rabbani Chowdhury et.al. 2603.02633 null
2026-03-02 DynaMoE: Dynamic Token-Level Expert Activation with Layer-Wise Adaptive Capacity for Mixture-of-Experts Neural Networks Gökdeniz Gülmez et.al. 2603.01697 null
2026-03-02 PathMoE: Interpretable Multimodal Interaction Experts for Pediatric Brain Tumor Classification Jian Yu et.al. 2603.01547 null
2026-03-02 Multimodal Mixture-of-Experts with Retrieval Augmentation for Protein Active Site Identification Jiayang Wu et.al. 2603.01511 null
2026-03-02 UETrack: A Unified and Efficient Framework for Single Object Tracking Ben Kang et.al. 2603.01412 null
2026-03-02 Fed-GAME: Personalized Federated Learning with Graph Attention Mixture-of-Experts For Time-Series Forecasting Yi Li et.al. 2603.01363 null
2026-03-01 Truth as a Trajectory: What Internal Representations Reveal About Large Language Model Reasoning Hamed Damirchi et.al. 2603.01326 null
2026-03-01 TriMoE: Augmenting GPU with AMX-Enabled CPU and DIMM-NDP for High-Throughput MoE Inference via Offloading Yudong Pan et.al. 2603.01058 null
2026-03-01 Dr.Occ: Depth- and Region-Guided 3D Occupancy from Surround-View Cameras for Autonomous Driving Xubo Zhu et.al. 2603.01007 null
2026-02-28 MME: Mixture of Mesh Experts with Random Walk Transformer Gating Amir Belder et.al. 2603.00828 null
2026-02-27 Quant Experts: Token-aware Adaptive Error Reconstruction with Mixture of Experts for Large Vision-Language Models Quantization Chenwei Jia et.al. 2602.24059 null
2026-02-26 Brain-OF: An Omnifunctional Foundation Model for fMRI, EEG and MEG Hanning Guo et.al. 2602.23410 null
2026-02-26 A Mixture-of-Experts Model for Multimodal Emotion Recognition in Conversations Soumya Dutta et.al. 2602.23300 null
2026-02-26 Learning Physical Operators using Neural Operators Vignesh Gopakumar et.al. 2602.23113 null
2026-02-26 pMoE: Prompting Diverse Experts Together Wins More in Visual Adaptation Shentong Mo et.al. 2602.22938 null
2026-02-26 Switch-Hurdle: A MoE Encoder with AR Hurdle Decoder for Intermittent Demand Forecasting Fabian Muşat et.al. 2602.22685 null
2026-02-26 Predictive variational inference for flexible regression models Lucas Kock et.al. 2602.22582 null
2026-02-25 NESTOR: A Nested MOE-based Neural Operator for Large-Scale PDE Pre-Training Dengdi Sun et.al. 2602.22059 null
2026-02-25 Excitation: Momentum For Experts Sagi Shaier et.al. 2602.21798 null
2026-02-25 Learning from Yesterday's Error: An Efficient Online Learning Method for Traffic Demand Prediction Xiannan Huang et.al. 2602.21757 null
2026-02-25 TiMi: Empower Time Series Transformers with Multimodal Mixture of Experts Jiafeng Lin et.al. 2602.21693 null
2026-02-25 Multi-Layer Scheduling for MoE-Based LLM Reasoning Yifan Sun et.al. 2602.21626 null
2026-02-24 Dual-Branch INS/GNSS Fusion with Inequality and Equality Constraints Mor Levenhar et.al. 2602.21266 null
2026-02-25 GeCo-SRT: Geometry-aware Continual Adaptation for Robotic Cross-Task Sim-to-Real Transfer Wenbo Yu et.al. 2602.20871 null
2026-02-24 Wireless Federated Multi-Task LLM Fine-Tuning via Sparse-and-Orthogonal LoRA Nuocheng Yang et.al. 2602.20492 null
2026-02-23 The Universal Eccentricity Distribution for Dynamical Gravitational-Wave Merger Channels Mor Rozner et.al. 2602.20110 null
2026-02-23 Counterfactual Understanding via Retrieval-aware Multimodal Modeling for Time-to-Event Survival Prediction Ha-Anh Hoang Nguyen et.al. 2602.19987 null
2026-02-23 A Replicate-and-Quantize Strategy for Plug-and-Play Load Balancing of Sparse Mixture-of-Experts LLMs Zijie Liu et.al. 2602.19938 null
2026-02-23 Towards Dexterous Embodied Manipulation via Deep Multi-Sensory Fusion and Sparse Expert Scaling Yirui Sun et.al. 2602.19764 null
2026-02-23 RAID: Retrieval-Augmented Anomaly Detection Mingxiu Cai et.al. 2602.19611 null
2026-02-23 Conversational AI for Automated Patient Questionnaire Completion: Development Insights and Design Principles David Fraile Navarro et.al. 2602.19507 null
2026-02-23 EMS-FL: Federated Tuning of Mixture-of-Experts in Satellite-Terrestrial Networks via Expert-Driven Model Splitting Angzi Xu et.al. 2602.19485 null
2026-02-22 Robust Exploration in Directed Controller Synthesis via Reinforcement Learning with Soft Mixture-of-Experts Toshihide Ubukata et.al. 2602.19244 null
2026-02-22 SegMoTE: Token-Level Mixture of Experts for Medical Image Segmentation Yujie Lu et.al. 2602.19213 null
2026-02-22 JavisDiT++: Unified Modeling and Optimization for Joint Audio-Video Generation Kai Liu et.al. 2602.19163 null
2026-02-22 Routing-Aware Explanations for Mixture of Experts Graph Models in Malware Detection Hossein Shokouhinejad et.al. 2602.19025 null
2026-02-21 Give Users the Wheel: Towards Promptable Recommendation Paradigm Fuyuan Lyu et.al. 2602.18929 null
2026-02-20 Going Down Memory Lane: Scaling Tokens for Video Stream Understanding with Dynamic KV-Cache Memory Vatsal Agarwal et.al. 2602.18434 null
2026-02-19 Grassmannian Mixture-of-Experts: Concentration-Controlled Routing on Subspace Manifolds Ibne Farabi Shihab et.al. 2602.17798 null
2026-02-19 Phase-Aware Mixture of Experts for Agentic Reinforcement Learning Shengtian Yang et.al. 2602.17038 null
2026-02-19 Arcee Trinity Large Technical Report Varun Singh et.al. 2602.17004 null
2026-02-18 Federated Graph AGI for Cross-Border Insider Threat Intelligence in Government Financial Schemes Srikumar Nayak et.al. 2602.16109 null
2026-02-17 MoE-Spec: Expert Budgeting for Efficient Speculative Decoding Bradley McDanel et.al. 2602.16052 null
2026-02-17 ExpertWeaver: Unlocking the Inherent MoE in Dense LLMs with GLU Activation Patterns Ziyu Zhao et.al. 2602.15521 null
2026-02-16 Mixture-of-Experts under Finite-Rate Gating: Communication–Generalization Trade-offs Ali Khalesi et.al. 2602.15091 null
2026-02-15 DeepFusion: Accelerating MoE Training via Federated Knowledge Distillation from Heterogeneous Edge Devices Songyuan Li et.al. 2602.14301 null
2026-02-15 MILD: Multi-Intent Learning and Disambiguation for Proactive Failure Prediction in Intent-based Networking Md. Kamrul Hossain et.al. 2602.14283 null
2026-02-15 Multi-Agent Debate: A Unified Agentic Framework for Tabular Anomaly Detection Pinqiao Wang et.al. 2602.14251 null
2026-02-15 Synergistic Intra- and Cross-Layer Regularization Losses for MoE Expert Specialization Rizhen Hu et.al. 2602.14159 null
2026-02-15 LM-Lexicon: Improving Definition Modeling via Harmonizing Semantic Experts Yang Liu et.al. 2602.14060 null
2026-02-15 Geometry-Preserving Aggregation for Mixture-of-Experts Embedding Models Sajjad Kachuee et.al. 2602.14039 null
2026-02-15 Eureka-Audio: Triggering Audio Intelligence in Compact Language Models Dan Zhang et.al. 2602.13954 null
2026-02-14 Mixture-of-experts Wishart model for covariance matrices with an application to Cancer drug screening The Tien Mai et.al. 2602.13888 null
2026-02-13 Mixture of Predefined Experts: Maximizing Data Usage on Vertical Federated Learning Jon Irureta et.al. 2602.12708 null
2026-02-13 Multi-Head Attention as a Source of Catastrophic Forgetting in MoE Transformers Anrui Chen et.al. 2602.12587 null
2026-02-13 SD-MoE: Spectral Decomposition for Effective Expert Specialization Ruijun Huang et.al. 2602.12556 null
2026-02-13 Decoder-only Conformer with Modality-aware Sparse Mixtures of Experts for ASR Jaeyoung Lee et.al. 2602.12546 null
2026-02-12 Extending Puzzle for Mixture-of-Experts Reasoning Models with Application to GPT-OSS Acceleration Akhiad Bercovich et.al. 2602.11937 null
2026-02-12 LAER-MoE: Load-Adaptive Expert Re-layout for Efficient Mixture-of-Experts Training Xinyi Liu et.al. 2602.11686 null
2026-02-12 Evolutionary Router Feature Generation for Zero-Shot Graph Anomaly Detection with Mixture-of-Experts Haiyang Jiang et.al. 2602.11622 null
2026-02-12 Pretraining A Large Language Model using Distributed GPUs: A Memory-Efficient Decentralized Paradigm Jinrui Zhang et.al. 2602.11543 null
2026-02-11 Demonstration and performance of an online data selection algorithm for liquid argon time projection chambers using MicroBooNE MicroBooNE collaboration et.al. 2602.11138 null
2026-02-11 MoEEdit: Efficient and Routing-Stable Knowledge Editing for Mixture-of-Experts LLMs Yupu Gu et.al. 2602.10965 null
2026-02-11 CMAD: Cooperative Multi-Agent Diffusion via Stochastic Optimal Control Riccardo Barbano et.al. 2602.10933 null
2026-02-11 VESPO: Variational Sequence-Level Soft Policy Optimization for Stable Off-Policy LLM Training Guobin Shen et.al. 2602.10693 null
2026-02-11 Multimodal Priors-Augmented Text-Driven 3D Human-Object Interaction Generation Yin Wang et.al. 2602.10659 null
2026-02-11 Step 3.5 Flash: Open Frontier-Level Intelligence with 11B Active Parameters Ailin Huang et.al. 2602.10604 null
2026-02-11 Neural Additive Experts: Context-Gated Experts for Controllable Model Additivity Guangzhi Xiong et.al. 2602.10585 null
2026-02-10 Area-Efficient In-Memory Computing for Mixture-of-Experts via Multiplexing and Caching Hanyuan Gao et.al. 2602.10254 null
2026-02-10 Diverse Skill Discovery for Quadruped Robots via Unsupervised Learning Ruopeng Cui et.al. 2602.09767 null
2026-02-10 DR.Experts: Differential Refinement of Distortion-Aware Experts for Blind Image Quality Assessment Bohan Fu et.al. 2602.09531 null
2026-02-10 SMES: Towards Scalable Multi-Task Recommendation via Expert Sparsity Yukun Zhang et.al. 2602.09386 null
2026-02-10 Effective MoE-based LLM Compression by Exploiting Heterogeneous Inter-Group Experts Routing Frequency and Information Density Zhendong Mi et.al. 2602.09316 null
2026-02-09 Generalizing GNNs with Tokenized Mixture of Experts Xiaoguang Guo et.al. 2602.09258 null
2026-02-09 UI-Venus-1.5 Technical Report Veuns-Team et.al. 2602.09082 null
2026-02-09 DirMoE: Dirichlet-routed Mixture of Experts Amirhossein Vahidi et.al. 2602.09001 null
2026-02-09 OmniReview: A Large-scale Benchmark and LLM-enhanced Framework for Realistic Reviewer Recommendation Yehua Huang et.al. 2602.08896 null
2026-02-09 FlexMoRE: A Flexible Mixture of Rank-heterogeneous Experts for Efficient Federatedly-trained Large Language Models Annemette Brok Pirchert et.al. 2602.08818 null
2026-02-10 MOVA: Towards Scalable and Synchronized Video-Audio Generation SII-OpenMOSS Team et.al. 2602.08794 null
2026-02-09 Redundancy-Free View Alignment for Multimodal Human Activity Recognition with Arbitrarily Missing Views Duc-Anh Nguyen et.al. 2602.08755 null
2026-02-09 Large Language Lobotomy: Jailbreaking Mixture-of-Experts via Expert Silencing Jona te Lintelo et.al. 2602.08741 null
2026-02-09 6G-Bench: An Open Benchmark for Semantic Communication and Network-Level Reasoning with Foundation Models in AI-Native 6G Networks Mohamed Amine Ferrag et.al. 2602.08675 null
2026-02-09 Fundamental Reasoning Paradigms Induce Out-of-Domain Generalization in Language Models Mingzi Cao et.al. 2602.08658 null
2026-02-09 Sparse Models, Sparse Safety: Unsafe Routes in Mixture-of-Experts LLMs Yukun Jiang et.al. 2602.08621 null
2026-02-09 TEAM: Temporal-Spatial Consistency Guided Expert Activation for MoE Diffusion Language Model Acceleration Linye Wei et.al. 2602.08404 null
2026-02-06 Parameters as Experts: Adapting Vision Models with Dynamic Parameter Routing Meng Lou et.al. 2602.06862 null
2026-02-06 POP: Online Structural Pruning Enables Efficient Inference of Large Foundation Models Yi Chen et.al. 2602.06822 null
2026-02-06 HyPER: Bridging Exploration and Exploitation for Scalable LLM Reasoning with Hypothesis Path Expansion and Reduction Shengxuan Qiu et.al. 2602.06527 null
2026-02-05 To 2:4 Sparsity and Beyond: Neuron-level Activation Function to Accelerate LLM Pre-Training Meghana Madhyastha et.al. 2602.06183 null
2026-02-05 MoSE: Mixture of Slimmable Experts for Efficient and Adaptive Language Models Nurbek Tastan et.al. 2602.06154 null
2026-02-05 OmniMoE: An Efficient MoE by Orchestrating Atomic Experts at Scale Jingze Shi et.al. 2602.05711 null
2026-02-04 Rule-Based Spatial Mixture-of-Experts U-Net for Explainable Edge Detection Bharadwaj Dogga et.al. 2602.05100 null
2026-02-04 Multi-Head LatentMoE and Head Parallel: Communication-Efficient and Deterministic MoE Parallelism Chenwei Cui et.al. 2602.04870 null
2026-02-04 ERNIE 5.0 Technical Report Haifeng Wang et.al. 2602.04705 null
2026-02-04 Let Experts Feel Uncertainty: A Multi-Expert Label Distribution Approach to Probabilistic Time Series Forecasting Zhen Zhou et.al. 2602.04678 null
2026-02-04 RASA: Routing-Aware Safety Alignment for Mixture-of-Experts Models Jiacheng Liang et.al. 2602.04448 null
2026-02-04 Mixture of Masters: Sparse Chess Language Models with Player Routing Giacomo Frisoni et.al. 2602.04447 null
2026-02-04 Expert Selections In MoE Models Reveal (Almost) As Much As Text Amir Nuriyev et.al. 2602.04105 null
2026-02-03 SpecMD: A Comprehensive Study On Speculative Expert Prefetching Duc Hoang et.al. 2602.03921 null
2026-02-03 DALI: A Workload-Aware Offloading Framework for Efficient MoE Inference on Local PCs Zeyu Zhu et.al. 2602.03495 null
2026-02-03 Scaling Continual Learning with Bi-Level Routing Mixture-of-Experts Meng Lou et.al. 2602.03473 null
2026-02-03 VIRAL: Visual In-Context Reasoning via Analogy in Diffusion Transformers Zhiwen Li et.al. 2602.03210 null
2026-02-03 Sparsity is Combinatorial Depth: Quantifying MoE Expressivity via Tropical Geometry Ye Su et.al. 2602.03204 null
2026-02-02 SPARKLING: Balancing Signal Preservation and Symmetry Breaking for Width-Progressive Learning Qifan Yu et.al. 2602.02472 null
2026-02-02 Indications of Belief-Guided Agency and Meta-Cognitive Monitoring in Large Language Models Noam Steinmetz Yalon et.al. 2602.02467 null
2026-02-02 From Directions to Regions: Decomposing Activations in Language Models via Local Geometry Or Shafran et.al. 2602.02464 null
2026-02-02 DFKI-Speech System for WildSpoof Challenge: A robust framework for SASV In-the-Wild Arnab Das et.al. 2602.02286 null
2026-02-02 MoLF: Mixture-of-Latent-Flow for Pan-Cancer Spatial Gene Expression Prediction from Histology Susu Hu et.al. 2602.02282 null
2026-02-02 Edge-Aligned Initialization of Kernels for Steered Mixture-of-Experts Martin Determann et.al. 2602.02031 null
2026-02-02 SAME: Stabilized Mixture-of-Experts for Multimodal Continual Instruction Tuning Zhen-Hao Xie et.al. 2602.01990 null
2026-02-02 Mixture-of-Experts with Intermediate CTC Supervision for Accented Speech Recognition Wonjun Lee et.al. 2602.01967 null
2026-02-02 SOPRAG: Multi-view Graph Experts Retrieval for Industrial Standard Operating Procedures Liangtao Lin et.al. 2602.01858 null
2026-02-02 Mutual-Guided Expert Collaboration for Cross-Subject EEG Classification Zhi Zhang et.al. 2602.01728 null
2026-01-31 Improving Minimax Estimation Rates for Contaminated Mixture of Multinomial Logistic Experts via Expert Heterogeneity Fanqi Yan et.al. 2602.00939 null
2026-01-31 Dynamic Expert Sharing: Decoupling Memory from Parallelism in Mixture-of-Experts Diffusion LLMs Hao Mark Chen et.al. 2602.00879 null
2026-01-31 Toward Reliable Sim-to-Real Predictability for MoE-based Robust Quadrupedal Locomotion Tianyang Wu et.al. 2602.00678 null
2026-01-31 SEER: Transformer-based Robust Time Series Forecasting via Automated Patch Enhancement and Replacement Xiangfei Qiu et.al. 2602.00589 null
2026-01-31 PROBE: Co-Balancing Computation and Communication in MoE Inference via Real-Time Predictive Prefetching Qianchao Zhu et.al. 2602.00509 null
2026-01-30 UrbanMoE: A Sparse Multi-Modal Mixture-of-Experts Framework for Multi-Task Urban Region Profiling Pingping Liu et.al. 2601.22746 null
2026-01-30 A Step Back: Prefix Importance Ratio Stabilizes Policy Optimization Shiye Lei et.al. 2601.22718 null
2026-01-30 A Unified Study of LoRA Variants: Taxonomy, Review, Codebase, and Empirical Evaluation Haonan He et.al. 2601.22708 null
2026-01-30 Test-Time Mixture of World Models for Embodied Agents in Dynamic Environments Jinwoo Jang et.al. 2601.22647 null
2026-01-30 SpanNorm: Reconciling Training Stability and Performance in Deep Transformers Chao Wang et.al. 2601.22580 null
2026-01-30 Continual Policy Distillation from Distributed Reinforcement Learning Teachers Yuxuan Li et.al. 2601.22475 null
2026-01-29 ECO: Quantized Training without Full-Precision Master Weights Mahdi Nikdan et.al. 2601.22101 null
2026-01-29 MoE-ACT: Improving Surgical Imitation Learning Policies through Supervised Mixture-of-Experts Lorenzo Mazza et.al. 2601.21971 null
2026-01-29 MoHETS: Long-term Time Series Forecasting with Mixture-of-Heterogeneous-Experts Evandro S. Ortigossa et.al. 2601.21866 null
2026-01-29 Seg-MoE: Multi-Resolution Segment-wise Mixture-of-Experts for Time Series Forecasting Transformers Evandro S. Ortigossa et.al. 2601.21641 null
2026-01-29 Multi-Modal Time Series Prediction via Mixture of Modulated Experts Lige Zhang et.al. 2601.21547 null
2026-01-29 ShardMemo: Masked MoE Routing for Sharded Agentic LLM Memory Yang Zhao et.al. 2601.21545 null
2026-01-29 L$^3$: Large Lookup Layers Albert Tseng et.al. 2601.21461 null
2026-01-29 L2R: Low-Rank and Lipschitz-Controlled Routing for Mixture-of-Experts Minghao Yang et.al. 2601.21349 null
2026-01-29 Abstracting Robot Manipulation Skills via Mixture-of-Experts Diffusion Policies Ce Hao et.al. 2601.21251 null
2026-01-29 Scaling Embeddings Outperforms Scaling Experts in Language Models Hong Liu et.al. 2601.21204 null
2026-01-29 ZipMoE: Efficient On-Device MoE Serving via Lossless Compression and Cache-Affinity Scheduling Yuchen Yang et.al. 2601.21198 null
2026-01-29 BrainStack: Neuro-MoE with Functionally Guided Expert Routing for EEG-Based Language Decoding Ziyi Zhao et.al. 2601.21148 null
2026-01-29 TRACE: Trajectory Recovery for Continuous Mechanism Evolution in Causal Representation Learning Shicheng Fan et.al. 2601.21135 null
2026-01-28 ProfInfer: An eBPF-based Fine-Grained LLM Inference Profiler Bohua Zou et.al. 2601.20755 null
2026-01-28 Unsupervised Ensemble Learning Through Deep Energy-based Models Ariel Maymon et.al. 2601.20556 null
2026-01-28 OmegaUse: Building a General-Purpose GUI Agent for Autonomous Task Execution Le Zhang et.al. 2601.20380 null
2026-01-28 OSDEnhancer: Taming Real-World Space-Time Video Super-Resolution with One-Step Diffusion Shuoyan Wei et.al. 2601.20308 null
2026-01-28 MiLorE-SSL: Scaling Multilingual Capabilities in Self-Supervised Models without Forgetting Jing Xu et.al. 2601.20300 null
2026-01-28 HE-SNR: Uncovering Latent Logic via Entropy for Guiding Mid-Training on SWE-BENCH Yueyang Wang et.al. 2601.20255 null
2026-01-28 Control Models for In-IDE Code Completion Aral de Moor et.al. 2601.20223 null
2026-01-28 Hyperparameter Transfer with Mixture-of-Expert Layers Tianze Jiang et.al. 2601.20205 null
2026-01-27 Revisiting Incremental Stochastic Majorization-Minimization Algorithms with Applications to Mixture of Experts TrungKhang Tran et.al. 2601.19811 null
2026-01-27 Component-Level Lesioning of Language Models Reveals Clinically Aligned Aphasia Phenotypes Yifan Wang et.al. 2601.19723 null
2026-01-27 Dynamic Multi-Expert Projectors with Stabilized Routing for Multilingual Speech Recognition Isha Pandey et.al. 2601.19451 null
2026-01-26 Fauna Sprout: A lightweight, approachable, developer-ready humanoid robot Fauna Robotics et.al. 2601.18963 null
2026-01-26 OneVoice: One Model, Triple Scenarios-Towards Unified Zero-shot Voice Conversion Zhichao Wang et.al. 2601.18094 null
2026-01-26 LatentMoE: Toward Optimal Accuracy per FLOP and Parameter in Mixture of Experts Venmugil Elango et.al. 2601.18089 null
2026-01-25 Domain-Expert-Guided Hybrid Mixture-of-Experts for Medical AI: Integrating Data-Driven Learning with Clinical Priors Jinchen Gu et.al. 2601.17977 null
2026-01-25 $\infty$-MoE: Generalizing Mixture of Experts to Infinite Experts Shota Takashiro et.al. 2601.17680 null
2026-01-24 PILOT: A Perceptive Integrated Low-level Controller for Loco-manipulation over Unstructured Scenes Xinru Cui et.al. 2601.17440 null
2026-01-23 Least-Loaded Expert Parallelism: Load Balancing An Imbalanced Mixture-of-Experts Xuan-Phi Nguyen et.al. 2601.17111 null
2026-01-22 FlashMoE: Reducing SSD I/O Bottlenecks via ML-Based Cache Replacement for Mixture-of-Experts Inference on Edge Devices Byeongju Kim et.al. 2601.17063 null
2026-01-23 GRIP: Algorithm-Agnostic Machine Unlearning for Mixture-of-Experts via Geometric Router Constraints Andy Zhu et.al. 2601.16905 null
2026-01-23 Mixture-of-Models: Unifying Heterogeneous Agents via N-Way Self-Evaluating Deliberation Tims Pecerskis et.al. 2601.16863 null
2026-01-23 LongCat-Flash-Thinking-2601 Technical Report Meituan LongCat Team et.al. 2601.16725 null
2026-01-22 LL-GaussianImage: Efficient Image Representation for Zero-shot Low-Light Enhancement with 2D Gaussian Splatting Yuhan Chen et.al. 2601.15772 null
2026-01-21 Improving MoE Compute Efficiency by Composing Weight and Data Sparsity Maciej Kilian et.al. 2601.15370 null
2026-01-21 Mixture-of-Experts Models in Vision: Routing, Optimization, and Generalization Adam Rokah et.al. 2601.15021 null
2026-01-21 Modeling the Thermal Behavior of Photopolymers for In-Space Fabrication Jonathan Ericson et.al. 2601.14897 null
2026-01-21 UniRoute: Unified Routing Mixture-of-Experts for Modality-Adaptive Remote Sensing Change Detection Qingling Shu et.al. 2601.14797 null
2026-01-21 Robustness of Mixtures of Experts to Feature Noise Dong Sun et.al. 2601.14792 null
2026-01-20 Layer-adaptive Expert Pruning for Pre-Training of Mixture-of-Experts Large Language Models YuanLab.ai et.al. 2601.14327 null
2026-01-20 Understanding Multilingualism in Mixture-of-Experts LLMs: Routing Mechanism, Expert Specialization, and Layerwise Steering Yuxin Chen et.al. 2601.14050 null
2026-01-20 DExTeR: Weakly Semi-Supervised Object Detection with Class and Instance Experts for Medical Imaging Adrien Meyer et.al. 2601.13954 null
2026-01-20 The ALMA survey to Resolve exoKuiper belt Substructures (ARKS) II. The radial structure of debris discs Yinuo Han et.al. 2601.13670 null
2026-01-20 MN-TSG: Continuous Time Series Generation with Irregular Observations Xu Zhang et.al. 2601.13534 null
2026-01-19 CLIP-Guided Adaptable Self-Supervised Learning for Human-Centric Visual Tasks Mingshuang Luo et.al. 2601.13133 null
2026-01-19 Polychronous Wave Computing: Timing-Native Address Selection in Spiking Networks Natalia G. Berloff et.al. 2601.13079 null
2026-01-19 PASs-MoE: Mitigating Misaligned Co-drift among Router and Experts via Pathway Activation Subspaces for Continual Learning Zhiyan Hou et.al. 2601.13020 null
2026-01-19 HT-GNN: Hyper-Temporal Graph Neural Network for Customer Lifetime Value Prediction in Baidu Ads Xiaohui Zhao et.al. 2601.13013 null
2026-01-19 OFA-MAS: One-for-All Multi-Agent System Topology Design based on Mixture-of-Experts Graph Generative Models Shiyuan Li et.al. 2601.12996 null
2026-01-19 PhyG-MoE: A Physics-Guided Mixture-of-Experts Framework for Energy-Efficient GNSS Interference Recognition Zhihan Zeng et.al. 2601.12798 null
2026-01-18 The ALMA survey to Resolve exoKuiper belt Substructures (ARKS) V: Comparison between scattered light and thermal emission J. Milli et.al. 2601.12586 null
2026-01-18 A Mixture of Experts Vision Transformer for High-Fidelity Surface Code Decoding Hoang Viet Nguyen et.al. 2601.12483 null
2026-01-18 Learning Diverse Skills for Behavior Models with Mixture of Experts Wangtian Shen et.al. 2601.12397 null
2026-01-18 NADIR: Differential Attention Flow for Non-Autoregressive Transliteration in Indic Languages Lakshya Tomar et.al. 2601.12389 null
2026-01-18 GazeFormer-MoE: Context-Aware Gaze Estimation via CLIP and MoE Transformer Xinyuan Zhao et.al. 2601.12316 null
2026-01-18 Facet-Aware Multi-Head Mixture-of-Experts Model with Text-Enhanced Pre-training for Sequential Recommendation Mingrui Liu et.al. 2601.12301 null
2026-01-17 EMoE: Eigenbasis-Guided Routing for Mixture-of-Experts Anzhe Cheng et.al. 2601.12137 null
2026-01-17 The ALMA survey to Resolve exoKuiper belt Substructures (ARKS) III: The vertical structure of debris disks Brianna Zawadzki et.al. 2601.12128 null
2026-01-17 One-Shot Price Forecasting with Covariate-Guided Experts under Privacy Constraints Ren He et.al. 2601.11977 null
2026-01-16 The ALMA survey to Resolve exoKuiper belt Substructures (ARKS) VII: Optically thick gas with broad CO gaussian local line profiles in the HD 121617 disc A. Brennan et.al. 2601.11824 null
2026-01-16 Self-Augmented Mixture-of-Experts for QoS Prediction Kecheng Cai et.al. 2601.11036 null
2026-01-16 RobuMTL: Enhancing Multi-Task Learning Robustness Against Weather Conditions Tasneem Shaffee et.al. 2601.10921 null
2026-01-15 MoST: Mixing Speech and Text with Modality-Aware Mixture of Experts Yuxuan Lou et.al. 2601.10272 null
2026-01-15 MMPG: MoE-based Adaptive Multi-Perspective Graph Fusion for Protein Representation Learning Yusong Wang et.al. 2601.10157 null
2026-01-14 Progressive Mixture-of-Experts with autoencoder routing for continual RANS turbulence modelling Haoyu Ji et.al. 2601.09305 null
2026-01-15 A.X K1 Technical Report Sung Jun Cheon et.al. 2601.09200 null
2026-01-14 WiFo-E: A Scalable Wireless Foundation Model for End-to-End FDD Precoding in Communication Networks Weibo Wen et.al. 2601.09186 null
2026-01-14 Horseshoe Mixtures-of-Experts (HS-MoE) Nick Polson et.al. 2601.09043 null
2026-01-13 LookAhead: The Optimal Non-decreasing Index Policy for a Time-Varying Holding Cost problem Keerthana Gurushankar et.al. 2601.08960 null
2026-01-13 MixServe: An Automatic Distributed Serving System for MoE Models with Hybrid Parallelism Based on Fused Communication Algorithm Bowen Zhou et.al. 2601.08800 null
2026-01-13 LWM-Spectro: A Foundation Model for Wireless Baseband Signal Spectrograms Namhyun Kim et.al. 2601.08780 null
2026-01-13 M$^2$FMoE: Multi-Resolution Multi-View Frequency Mixture-of-Experts for Extreme-Adaptive Time Series Forecasting Yaohui Huang et.al. 2601.08631 null
2026-01-13 Taxon: Hierarchical Tax Code Prediction with Semantically Aligned LLM Expert Guidance Jihang Li et.al. 2601.08418 null
2026-01-13 Deconstructing Pre-training: Knowledge Attribution Analysis in MoE and Dense Models Bo Wang et.al. 2601.08383 null
2026-01-13 Towards Principled Design of Mixture-of-Experts Language Models under Memory and Inference Constraints Seng Pei Liew et.al. 2601.08215 null
2026-01-12 Towards Specialized Generalists: A Multi-Task MoE-LoRA Framework for Domain-Specific LLM Adaptation Yuxin Yang et.al. 2601.07935 null
2026-01-12 Emotional Support Evaluation Framework via Controllable and Diverse Seeker Simulator Chaewon Heo et.al. 2601.07698 null
2026-01-12 Conditional Memory via Scalable Lookup: A New Axis of Sparsity for Large Language Models Xin Cheng et.al. 2601.07372 null
2026-01-11 Solar Open Technical Report Sungrae Park et.al. 2601.07022 null
2026-01-11 Deep Learning Based Channel Extrapolation for Dual-Band Massive MIMO Systems Qikai Xiao et.al. 2601.06858 null
2026-01-11 MoE-DisCo: Low Economy Cost Training Mixture-of-Experts Models Xin Ye et.al. 2601.06857 null
2026-01-11 MoEScore: Mixture-of-Experts-Based Text-Audio Relevance Score Prediction for Text-to-Audio System Evaluation Bochao Sun et.al. 2601.06829 null
2026-01-11 SecMoE: Communication-Efficient Secure MoE Inference via Select-Then-Compute Bowen Shen et.al. 2601.06790 null
2026-01-10 Hellinger Multimodal Variational Autoencoders Huyen Khanh Vo et.al. 2601.06572 null
2026-01-10 Physics-guided foundation model for universal speckle removal in ultrathin multimode fiber imaging Xianrui Zeng et.al. 2601.06448 null
2026-01-09 Monkey Jump: MoE-Style PEFT for Efficient Multi-Task Learning Nusrat Jahan Prottasha et.al. 2601.06356 null
2026-01-09 Reconstruction of atmospheric neutrinos in DUNE's horizontal-drift far-detector module DUNE Collaboration et.al. 2601.05697 null
2026-01-09 Scalable Heterogeneous Graph Learning via Heterogeneous-aware Orthogonal Prototype Experts Wei Zhou et.al. 2601.05537 null
2026-01-08 MoEBlaze: Breaking the Memory Wall for Efficient MoE Training on Modern GPUs Jiyuan Zhang et.al. 2601.05296 null
2026-01-08 MoE3D: A Mixture-of-Experts Module for 3D Reconstruction Zichen Wang et.al. 2601.05208 null
2026-01-08 FaST: Efficient and Effective Long-Horizon Forecasting for Large-Scale Spatial-Temporal Graphs via Mixture-of-Experts Yiji Zhao et.al. 2601.05174 null
2026-01-08 How to Set the Learning Rate for Large-Scale Pre-training? Yunhua Zhou et.al. 2601.05049 null
2026-01-08 DR-LoRA: Dynamic Rank LoRA for Mixture-of-Experts Adaptation Guanzhi Deng et.al. 2601.04823 null
2026-01-07 A Scheduling Framework for Efficient MoE Inference on Edge GPU-NDP Systems Qi Wu et.al. 2601.03992 null
2026-01-07 Spectral Manifold Regularization for Stable and Modular Routing in Deep MoE Architectures Ibrahim Delibasoglu et.al. 2601.03889 null
2026-01-07 Variational Inference, Entropy, and Orthogonality: A Unified Theory of Mixture-of-Experts Ye Su et.al. 2601.03577 null
2026-01-07 CALM: Culturally Self-Aware Language Models Lingzhi Shen et.al. 2601.03483 null
2026-01-06 The Illusion of Specialization: Unveiling the Domain-Invariant "Standing Committee" in Mixture-of-Experts Models Yan Wang et.al. 2601.03425 null
2026-01-06 ReCCur: A Recursive Corner-Case Curation Framework for Robust Vision-Language Understanding in Open and Edge Scenarios Yihan Wei et.al. 2601.03011 null
2026-01-06 MoE Adapter for Large Audio Language Models: Sparsity, Disentanglement, and Gradient-Conflict-Free Yishu Lei et.al. 2601.02967 null
2026-01-06 MixTTE: Multi-Level Mixture-of-Experts for Scalable and Adaptive Travel Time Estimation Wenzhao Jiang et.al. 2601.02943 null
2026-01-06 MiMo-V2-Flash Technical Report Bangjun Xiao et.al. 2601.02780 null
2026-01-05 Routing by Analogy: kNN-Augmented Expert Assignment for Mixture-of-Experts Boxuan Lyu et.al. 2601.02144 null
2026-01-05 GCR: Geometry-Consistent Routing for Task-Agnostic Continual Anomaly Detection Joongwon Chae et.al. 2601.01856 null
2026-01-05 K-EXAONE Technical Report Eunbi Choi et.al. 2601.01739 null
2026-01-05 Yuan3.0 Flash: An Open Multimodal Large Language Model for Enterprise Applications YuanLab.ai et.al. 2601.01718 null
2026-01-05 Varying-Coefficient Mixture of Experts Model Qicheng Zhao et.al. 2601.01699 null
2026-01-04 Multi-Subspace Multi-Modal Modeling for Diffusion Models: Estimation, Convergence and Mixture of Experts Ruofeng Yang et.al. 2601.01475 null
2026-01-04 Making MoE based LLM inference resilient with Tarragon Songyu Zhang et.al. 2601.01310 null
2026-01-03 MambaFormer: Token-Level Guided Routing Mixture-of-Experts for Accurate and Efficient Clinical Assistance Hamad Khan et.al. 2601.01260 null
2026-01-02 Reliability Under Randomness: An Empirical Analysis of Sparse and Dense Language Models Across Decoding Temperatures Kabir Grover et.al. 2601.00942 null
2026-01-02 HFedMoE: Resource-aware Heterogeneous Federated Learning with Mixture-of-Experts Zihan Fang et.al. 2601.00583 null
2026-01-01 Geometric Regularization in Mixture-of-Experts: The Disconnect Between Weights and Activations Hyunjun Kim et.al. 2601.00457 null
2026-01-01 Identification and Estimation under Multiple Versions of Treatment: Mixture-of-Experts Approach Kohei Yoshikawa et.al. 2601.00287 null
2025-12-31 Compute-Accuracy Pareto Frontiers for Open-Source Reasoning Large Language Models Ákos Prucs et.al. 2512.24776 null
2026-01-01 Sufficient and Necessary Conditions for Eckart-Young like Result for Tubal Tensors Uria Mor et.al. 2512.24405 null
2025-12-30 Quantum Computing, Ising Formulation, and the Traveling Salesman Problem Omer Gurevich et.al. 2512.24308 null
2025-12-30 Training Report of TeleChat3-MoE Xinzhang Liu et.al. 2512.24157 null
2025-12-30 RepetitionCurse: Measuring and Understanding Router Imbalance in Mixture-of-Experts LLMs under DoS Stress Ruixuan Huang et.al. 2512.23995 null
2025-12-30 Learnable Query Aggregation with KV Routing for Cross-view Geo-localisation Hualin Ye et.al. 2512.23938 null
2025-12-29 Dynamic Subspace Composition: Efficient Adaptation via Contractive Basis Expansion Vladimer Khasia et.al. 2512.23448 null
2025-12-29 Coupling Experts and Routers in Mixture-of-Experts via an Auxiliary Loss Ang Lv et.al. 2512.23447 null
2025-12-30 YOLO-Master: MOE-Accelerated with Specialized Transformers for Enhanced Real-time Detection Xu Lin et.al. 2512.23273 null
2025-12-28 Trust Region Masking for Long-Horizon LLM Reinforcement Learning Yingru Li et.al. 2512.23075 null
2025-12-28 FLEX-MoE: Federated Mixture-of-Experts with Load-balanced Expert Assignment Boyang Zhang et.al. 2512.23070 null
2025-12-28 Viability and Performance of a Private LLM Server for SMBs: A Benchmark Analysis of Qwen3-30B on Consumer-Grade Hardware Alex Khalil et.al. 2512.23029 null
2025-12-28 Text-Routed Sparse Mixture-of-Experts Model with Explanation and Temporal Alignment for Multi-Modal Sentiment Analysis Dongning Rao et.al. 2512.22741 null
2025-12-27 Bright 4B: Scaling Hyperspherical Learning for Segmentation in 3D Brightfield Microscopy Amil Khan et.al. 2512.22423 null
2025-12-26 FUSCO: High-Performance Distributed Data Shuffling via Transformation-Communication Fusion Zhuoran Zhu et.al. 2512.22036 null
2025-12-26 SWE-RM: Execution-free Feedback For Software Engineering Agents KaShun Shum et.al. 2512.21919 null
2025-12-26 Accelerate Speculative Decoding with Sparse Computation in Verification Jikai Wang et.al. 2512.21911 null
2025-12-26 MMCTOP: A Multimodal Textualization and Mixture-of-Experts Framework for Clinical Trial Outcome Prediction Carolina Aparício et.al. 2512.21897 null
2025-12-25 Spatiotemporal-Untrammelled Mixture of Experts for Multi-Person Motion Prediction Zheng Yin et.al. 2512.21707 null
2025-12-25 Efficient MoE Inference with Fine-Grained Scheduling of Disaggregated Expert Parallelism Xinglin Pan et.al. 2512.21487 null
2025-12-24 DeepCQ: General-Purpose Deep-Surrogate Framework for Lossy Compression Quality Prediction Khondoker Mirazul Mumenin et.al. 2512.21433 null
2025-12-25 GateBreaker: Gate-Guided Attacks on Mixture-of-Expert LLMs Lichao Wu et.al. 2512.21008 null
2025-12-24 RevFFN: Memory-Efficient Full-Parameter Fine-Tuning of Mixture-of-Experts LLMs with Reversible Blocks Ningyuan Liu et.al. 2512.20920 null
2025-12-24 NVIDIA Nemotron 3: Efficient and Open Intelligence NVIDIA et.al. 2512.20856 null
2025-12-23 Nemotron 3 Nano: Open, Efficient Mixture-of-Experts Hybrid Mamba-Transformer Model for Agentic Reasoning NVIDIA et.al. 2512.20848 null
2025-12-23 Defending against adversarial attacks using mixture of experts Mohammad Meymani et.al. 2512.20821 null
2025-12-23 MoE-DiffuSeq: Enhancing Long-Document Diffusion Models with Sparse Attention and Mixture of Experts Alexandros Christoforos et.al. 2512.20604 null
2025-12-23 Branch Learning in MRI: More Data, More Models, More Training Yuyang Li et.al. 2512.20330 null
2025-12-23 Mixture-of-Experts with Gradient Conflict-Driven Subspace Topology Pruning for Emergent Modularity Yuxing Gan et.al. 2512.20291 null
2025-12-23 Degradation-Aware Metric Prompting for Hyperspectral Image Restoration Binfeng Wang et.al. 2512.20251 null
2025-12-23 AMoE: Agglomerative Mixture-of-Experts Vision Foundation Model Sofian Chaybouti et.al. 2512.20157 null
2025-12-22 UCCL-EP: Portable Expert-Parallel Communication Ziming Mao et.al. 2512.19849 null
2025-12-22 Towards Closed-Loop Embodied Empathy Evolution: Probing LLM-Centric Lifelong Empathic Motion Generation in Unseen Scenarios Jiawen Wang et.al. 2512.19551 null
2025-12-22 EGM: Efficiently Learning General Motion Tracking Policy for High Dynamic Humanoid Whole-Body Control Chao Yang et.al. 2512.19043 null
2025-12-21 Tempo as the Stable Cue: Hierarchical Mixture of Tempo and Beat Experts for Music to 3D Dance Generation Guangtao Lyu et.al. 2512.18804 null
2025-12-21 Rectification Reimagined: A Unified Mamba Model for Image Correction and Rectangling with Prompts Linwei Qiu et.al. 2512.18718 null
2025-12-21 Remoe: Towards Efficient and Low-Cost MoE Inference in Serverless Computing Wentao Liu et.al. 2512.18674 null
2025-12-20 Secret mixtures of experts inside your LLM Enric Boix-Adsera et.al. 2512.18452 null
2025-12-20 MoE Pathfinder: Trajectory-driven Expert Pruning Xican Yang et.al. 2512.18425 null
2025-12-20 MACE-Dance: Motion-Appearance Cascaded Experts for Music-Driven Dance Video Generation Kaixing Yang et.al. 2512.18181 null
2025-12-19 MoE-TransMov: A Transformer-based Model for Next POI Prediction in Familiar & Unfamiliar Movements Ruichen Tan et.al. 2512.17985 null
2025-12-22 SCOPE: Sequential Causal Optimization of Process Interventions Jakob De Moor et.al. 2512.17629 null
2025-12-18 Bandwidth-Efficient Adaptive Mixture-of-Experts via Low-Rank Compensation Zhenyu Liu et.al. 2512.17073 null
2025-12-18 Compression is Routing: Reconstruction Error as an Intrinsic Signal for Modular Language Models Zhongpan Tang et.al. 2512.16963 null
2025-12-18 An Upper Bound on the M/M/k Queue With Deterministic Setup Times Jalani Williams et.al. 2512.16854 null
2025-12-18 Meta-RL Induces Exploration in Language Agents Yulun Jiang et.al. 2512.16848 null
2025-12-18 PoseMoE: Mixture-of-Experts Network for Monocular 3D Human Pose Estimation Mengyuan Liu et.al. 2512.16494 null
2025-12-18 Efficient CPU-GPU Collaborative Inference for MoE-based LLMs on Memory-Limited Systems En-Ming Huang et.al. 2512.16473 null
2025-12-18 Pretrained Battery Transformer (PBT): A battery life prediction foundation model Ruifeng Tan et.al. 2512.16334 null
2025-12-19 Sigma-MoE-Tiny Technical Report Qingguo Hu et.al. 2512.16248 null
2025-12-18 INTELLECT-3: Technical Report Prime Intellect Team et.al. 2512.16144 null
2025-12-18 Let the Barbarians In: How AI Can Accelerate Systems Performance Research Audrey Cheng et.al. 2512.14806 null
2025-12-15 SocialNav-MoE: A Mixture-of-Experts Vision Language Model for Socially Compliant Navigation with Reinforcement Fine-Tuning Tomohito Kawabata et.al. 2512.14757 null
2025-12-16 SketchAssist: A Practical Assistant for Semantic Edits and Precise Local Redrawing Han Zou et.al. 2512.14140 null
2025-12-16 SonicMoE: Accelerating MoE with IO and Tile-aware Optimizations Wentao Guo et.al. 2512.14080 null
2025-12-16 Sparsity-Controllable Dynamic Top-p MoE for Large Foundation Model Pre-training Can Jin et.al. 2512.13996 null
2025-12-13 RAST-MoE-RL: A Regime-Aware Spatio-Temporal MoE Framework for Deep Reinforcement Learning in Ride-Hailing Yuhan Tang et.al. 2512.13727 null
2025-12-15 StutterFuse: Mitigating Modality Collapse in Stuttering Detection with Jaccard-Weighted Metric Learning and Gated Fusion Guransh Singh et.al. 2512.13632 null
2025-12-16 Janus: Disaggregating Attention and Experts for Scalable MoE Inference Zhexiang Zhang et.al. 2512.13525 null
2025-12-15 Automated Information Flow Selection for Multi-scenario Multi-task Recommendation Chaohua Yang et.al. 2512.13396 null
2025-12-13 Fine-Grained Zero-Shot Learning with Attribute-Centric Representations Zhi Chen et.al. 2512.12219 null
2025-12-13 MixtureKit: A General Framework for Composing, Training, and Visualizing Mixture-of-Experts Models Ahmad Chamma et.al. 2512.12121 null
2025-12-11 Enhancing Radiology Report Generation and Visual Grounding using Reinforcement Learning Benjamin Gundersen et.al. 2512.10691 null
2025-12-11 Unleashing Degradation-Carrying Features in Symmetric U-Net: Simpler and Stronger Baselines for All-in-One Image Restoration Wenlong Jiao et.al. 2512.10581 null
2025-12-11 Error-Propagation-Free Learned Video Compression With Dual-Domain Progressive Temporal Alignment Han Li et.al. 2512.10450 null
2025-12-10 Efficient Continual Learning in Neural Machine Translation: A Low-Rank Adaptation Approach Salvador Carrión et.al. 2512.09910 null
2025-12-10 DynaIP: Dynamic Image Prompt Adapter for Scalable Zero-shot Personalized Text-to-Image Generation Zhizhong Wang et.al. 2512.09814 null
2025-12-10 M3Net: A Multi-Metric Mixture of Experts Network Digital Twin with Graph Neural Networks Blessed Guda et.al. 2512.09797 null
2025-12-10 FoundIR-v2: Optimizing Pre-Training Data Mixtures for Image Restoration Foundation Model Xiang Chen et.al. 2512.09282 null
2025-12-10 Efficient MoE Serving in the Memory-Bound Regime: Balance Activated Experts, Not Tokens Yanpeng Yu et.al. 2512.09277 null
2025-12-09 Ask, Answer, and Detect: Role-Playing LLMs for Personality Detection with Question-Conditioned Mixture-of-Experts Yifan Lyu et.al. 2512.08814 null
2025-12-09 What really matters for person re-identification? A Mixture-of-Experts Framework for Semantic Attribute Importance Athena Psalta et.al. 2512.08697 null
2025-12-09 Prismatic World Model: Learning Compositional Dynamics for Planning in Hybrid Systems Mingwei Li et.al. 2512.08411 null
2025-12-08 LongCat-Image Technical Report Meituan LongCat Team et.al. 2512.07584 null
2025-12-08 Search for Light Sterile Neutrinos With Two Neutrino Beams at MicroBooNE MicroBooNE collaboration et.al. 2512.07159 null
2025-12-09 TrajMoE: Scene-Adaptive Trajectory Planning with Mixture of Experts and Reinforcement Learning Zebin Xing et.al. 2512.07135 null
2025-12-08 PlantBiMoE: A Bidirectional Foundation Model with SparseMoE for Plant Genomes Kepeng Lin et.al. 2512.07113 null
2025-12-07 Adaptive Normalization Mamba with Multi Scale Trend Decomposition and Patch MoE Encoding MinCheol Jeon et.al. 2512.06929 null
2025-12-07 Stable-MoE: Lyapunov-based Token Routing for Distributed Mixture-of-Experts Training over Edge Networks Long Shi et.al. 2512.06784 null
2025-12-07 Statistic-Augmented, Decoupled MoE Routing and Aggregating in Autonomous Driving Wei-Bin Kou et.al. 2512.06664 null
2025-12-06 Enhancing Medical Cross-Modal Hashing Retrieval using Dropout-Voting Mixture-of-Experts Fusion Jaewon Ahn et.al. 2512.06449 null
2025-12-04 The SAM2-to-SAM3 Gap in the Segment Anything Model Family: Why Prompt-Based Expertise Fails in Concept-Driven Image Segmentation Ranjan Sapkota et.al. 2512.06032 null
2025-12-05 HiMoE-VLA: Hierarchical Mixture-of-Experts for Generalist Vision-Language-Action Policies Zhiying Du et.al. 2512.05693 null
2025-12-05 ProPhy: Progressive Physical Alignment for Dynamic World Simulation Zijun Wang et.al. 2512.05564 null
2025-12-05 EMMA: Efficient Multimodal Understanding, Generation, and Editing with a Unified Architecture Xin He et.al. 2512.04810 null
2025-12-04 Natural Language Actor-Critic: Scalable Off-Policy Learning in Language Space Joey Hong et.al. 2512.04601 null
2025-12-04 Context-Aware Mixture-of-Experts Inference on CXL-Enabled GPU-NDP Systems Zehao Fan et.al. 2512.04476 null
2025-12-03 Small Models Achieve Large Language Model Performance: Evaluating Reasoning-Enabled AI for Secure Child Welfare Research Zia Qi et.al. 2512.04261 null
2025-12-03 OD-MoE: On-Demand Expert Loading for Cacheless Edge-Distributed MoE Inference Liujianfu Wang et.al. 2512.03927 null
2025-12-04 A Theoretical Framework for Auxiliary-Loss-Free Load Balancing of Sparse Mixture-of-Experts in Large-Scale AI Models X. Y. Han et.al. 2512.03915 null
2025-12-03 Parsimonious Clustering of Covariance Matrices Yixi Xu et.al. 2512.03912 null
2025-12-03 CellScout: Visual Analytics for Mining Biomarkers in Cell State Discovery Rui Sheng et.al. 2512.03485 null
2025-12-03 SSLfmm: An R Package for Semi-Supervised Learning with a Mixed-Missingness Mechanism in Finite Mixture Models Geoffrey J. McLachlan et.al. 2512.03322 null
2025-12-02 SkyMoE: A Vision-Language Foundation Model for Enhancing Geospatial Interpretation with Mixture of Experts Jiaqi Liu et.al. 2512.02517 null
2025-12-02 Multi-Domain Enhanced Map-Free Trajectory Prediction with Selective Attention Wenyi Xiong et.al. 2512.02368 null
2025-12-02 Understanding and Harnessing Sparsity in Unified Multimodal Models Shwai He et.al. 2512.02351 null
2025-12-01 Towards Unified Video Quality Assessment Chen Feng et.al. 2512.02224 null
2025-12-01 ManualVLA: A Unified VLA Model for Chain-of-Thought Manual Generation and Robotic Manipulation Chenyang Gu et.al. 2512.02013 null
2025-12-01 Multimodal Mixture-of-Experts for ISAC in Low-Altitude Wireless Networks Kai Zhang et.al. 2512.01750 null
2025-12-01 GRASP: Guided Residual Adapters with Sample-wise Partitioning Felix Nützel et.al. 2512.01675 null
2025-12-01 Bridging the Scale Gap: Balanced Tiny and General Object Detection in Remote Sensing Imagery Zhicheng Zhao et.al. 2512.01665 null
2025-12-01 Cuffless Blood Pressure Estimation from Six Wearable Sensor Modalities in Multi-Motion-State Scenarios Yiqiao Chen et.al. 2512.01653 null
2025-12-02 Stabilizing Reinforcement Learning with LLMs: Formulation and Practices Chujie Zheng et.al. 2512.01374 null
2025-12-01 Efficient Training of Diffusion Mixture-of-Experts Models: A Practical Recipe Yahui Liu et.al. 2512.01252 null
2025-11-30 Elastic Mixture of Rank-Wise Experts for Knowledge Reuse in Federated Fine-Tuning Yebo Wu et.al. 2512.00902 null
2025-11-30 Upcycled and Merged MoE Reward Model for Mitigating Reward Hacking Lingling Fu et.al. 2512.00724 null
2025-11-29 GCMCG: A Clustering-Aware Graph Attention and Expert Fusion Network for Multi-Paradigm, Multi-task, and Cross-Subject EEG Decoding Yiqiao Chen et.al. 2512.00574 null
2025-11-28 Hunyuan-GameCraft-2: Instruction-following Interactive Game World Model Junshu Tang et.al. 2511.23429 null
2025-11-28 LFM2 Technical Report Alexander Amini et.al. 2511.23404 null
2025-11-28 Chart2Code-MoLA: Efficient Multi-Modal Code Generation via Adaptive Expert Routing Yifei Wang et.al. 2511.23321 null
2025-11-28 Multi-Modal Scene Graph with Kolmogorov-Arnold Experts for Audio-Visual Question Answering Zijian Fu et.al. 2511.23304 null
2025-11-28 Experts are all you need: A Composable Framework for Large Language Model Inference Shrihari Sridharan et.al. 2511.22955 null
2025-11-28 EnECG: Efficient Ensemble Learning for Electrocardiogram Multi-task Foundation Model Yuhao Xu et.al. 2511.22935 null
2025-11-27 OmniInfer: System-Wide Acceleration Techniques for Optimizing LLM Serving Throughput and Latency Jun Wang et.al. 2511.22481 null
2025-11-27 Foundation Model for Intelligent Wireless Communications Boxun Liu et.al. 2511.22222 null
2025-11-27 MoE3D: Mixture of Experts meets Multi-Modal 3D Understanding Yu Li et.al. 2511.22103 null
2025-11-27 Qwen3-VL Technical Report Shuai Bai et.al. 2511.21631 null
2025-11-26 MemFine: Memory-Aware Fine-Grained Scheduling for MoE Training Lu Zhao et.al. 2511.21431 null
2025-11-26 MLPMoE: Zero-Shot Architectural Metamorphosis of Dense LLM MLPs into Static Mixture-of-Experts Ivan Novikov et.al. 2511.21089 null
2025-11-25 HBridge: H-Shape Bridging of Heterogeneous Experts for Unified Multimodal Understanding and Generation Xiang Wang et.al. 2511.20520 null
2025-11-25 MTBBench: A Multimodal Sequential Clinical Decision-Making Benchmark in Oncology Kiril Vasilev et.al. 2511.20490 null
2025-11-25 Soft Adaptive Policy Optimization Chang Gao et.al. 2511.20347 null
2025-11-25 ADNet: A Large-Scale and Extensible Multi-Domain Benchmark for Anomaly Detection Across 380 Real-World Categories Hai Ling et.al. 2511.20169 null
2025-11-25 Adaptive Knowledge Transfer for Cross-Disciplinary Cold-Start Knowledge Tracing Yulong Deng et.al. 2511.20009 null
2025-11-25 Mosaic Pruning: A Hierarchical Framework for Generalizable Pruning of Mixture-of-Experts Models Wentao Hu et.al. 2511.19822 null
2025-11-22 Exploiting the Experts: Unauthorized Compression in MoE-LLMs Pinaki Prasad Guha Neogi et.al. 2511.19480 null
2025-11-24 OrdMoE: Preference Alignment via Hierarchical Expert Group Ranking in Multimodal Mixture-of-Experts LLMs Yuting Gao et.al. 2511.19023 null
2025-11-24 Dynamic Mixture of Experts Against Severe Distribution Shifts Donghu Kim et.al. 2511.18987 null
2025-11-23 HiFi-MambaV2: Hierarchical Shared-Routed MoE for High-Fidelity MRI Reconstruction Pengcheng Fang et.al. 2511.18534 null
2025-11-23 Attosecond-resolved quantum fluctuations of light and matter Matan Even Tzur et.al. 2511.18362 null
2025-11-23 AnyExperts: On-Demand Expert Allocation for Multimodal Language Models with Mixture of Expert Yuting Gao et.al. 2511.18314 null
2025-11-22 PromptMoE: Generalizable Zero-Shot Anomaly Detection via Visually-Guided Prompt Mixtures Yuheng Shao et.al. 2511.18116 null
2025-11-22 CADTrack: Learning Contextual Aggregation with Deformable Alignment for Robust RGBT Tracking Hao Li et.al. 2511.17967 null
2025-11-22 FastMMoE: Accelerating Multimodal Large Language Models through Dynamic Expert Activation and Routing-Aware Token Pruning Guoyang Xia et.al. 2511.17885 null
2025-11-22 Equivalence of Context and Parameter Updates in Modern Transformer Blocks Adrian Goldwaser et.al. 2511.17864 null
2025-11-21 Unified Class and Domain Incremental Learning with Mixture of Experts for Indoor Localization Akhil Singampalli et.al. 2511.17829 null
2025-11-21 Sparse Mixture-of-Experts for Multi-Channel Imaging: Are All Channel Interactions Required? Sukwon Yun et.al. 2511.17400 null
2025-11-21 MCMoE: Completing Missing Modalities with Mixture of Experts for Incomplete Multimodal Action Quality Assessment Huangbiao Xu et.al. 2511.17397 null
2025-11-21 Measurements of differential charged-current cross sections on argon for electron neutrinos with final-state protons in MicroBooNE MicroBooNE collaboration et.al. 2511.17342 null
2025-11-21 Training Foundation Models on a Full-Stack AMD Platform: Compute, Networking, and System Design Quentin Anthony et.al. 2511.17127 null
2025-11-21 VLM-Augmented Degradation Modeling for Image Restoration Under Adverse Weather Conditions Qianyi Shao et.al. 2511.16998 null
2025-11-21 RadioKMoE: Knowledge-Guided Radiomap Estimation with Kolmogorov-Arnold Networks and Mixture-of-Experts Fupei Guo et.al. 2511.16986 null
2025-11-21 MicroMoE: Fine-Grained Load Balancing for Mixture-of-Experts with Token Scheduling Chenqi Zhao et.al. 2511.16947 null
2025-11-20 Mixture of Ranks with Degradation-Aware Routing for One-Step Real-World Image Super-Resolution Xiao He et.al. 2511.16024 null
2025-11-19 AquaSentinel: Next-Generation AI System Integrating Sensor Networks for Urban Underground Water Pipeline Anomaly Detection via Collaborative MoE-LLM Agent Architecture Qiming Guo et.al. 2511.15870 null
2025-11-19 MoDES: Accelerating Mixture-of-Experts Multimodal Large Language Models via Dynamic Expert Skipping Yushi Huang et.al. 2511.15690 null
2025-11-19 VIRAL: Visual Sim-to-Real at Scale for Humanoid Loco-Manipulation Tairan He et.al. 2511.15200 null
2025-11-19 GPU-Initiated Networking for NCCL Khaled Hamidouche et.al. 2511.15076 null
2025-11-19 WiCo-PG: Wireless Channel Foundation Model for Pathloss Map Generation via Synesthesia of Machines Mingran Sun et.al. 2511.15030 null
2025-11-19 WiCo-MG: Wireless Channel Foundation Model for Multipath Generation via Synesthesia of Machines Zengrui Han et.al. 2511.15026 null
2025-11-19 Dynamic Expert Quantization for Scalable Mixture-of-Experts Inference Kexin Chu et.al. 2511.15015 null
2025-11-18 HMC: Learning Heterogeneous Meta-Control for Contact-Rich Loco-Manipulation Lai Wei et.al. 2511.14756 null
2025-11-18 Towards Stable and Structured Time Series Generation with Perturbation-Aware Flow Matching Jintao Zhang et.al. 2511.14488 null
2025-11-18 MoE-SpeQ: Speculative Quantized Decoding with Proactive Expert Prefetching and Offloading for Mixture-of-Experts Wenfeng Wang et.al. 2511.14102 null
2025-11-18 FAPE-IR: Frequency-Aware Planning and Execution Framework for All-in-One Image Restoration Jingren Liu et.al. 2511.14099 null
2025-11-18 SMGeo: Cross-View Object Geo-Localization with Grid-Level Mixture-of-Experts Fan Zhang et.al. 2511.14093 null
2025-11-17 MoMoE: A Mixture of Expert Agent Model for Financial Sentiment Analysis Peng Shu et.al. 2511.13983 null
2025-11-17 Introducing AI to an Online Petition Platform Changed Outputs but not Outcomes Isabel Corpus et.al. 2511.13949 null
2025-11-17 InterMoE: Individual-Specific 3D Human Interaction Generation via Dynamic Temporal-Selective MoE Lipeng Wang et.al. 2511.13488 null
2025-11-17 Measurement of Exclusive $π^+$-argon Interactions Using ProtoDUNE-SP DUNE Collaboration et.al. 2511.13462 null
2025-11-18 YOLO Meets Mixture-of-Experts: Adaptive Expert Routing for Robust Object Detection Ori Meiraz et.al. 2511.13344 null
2025-11-17 Self-Adaptive Graph Mixture of Models Mohit Meena et.al. 2511.13062 null
2025-11-17 Tokenize Once, Recommend Anywhere: Unified Item Tokenization for Multi-domain LLM-based Recommendation Yu Hou et.al. 2511.12922 null
2025-11-16 Uni-MoE-2.0-Omni: Scaling Language-Centric Omnimodal Large Model with Advanced MoE, Training and Data Yunxin Li et.al. 2511.12609 null
2025-11-16 SEMC: Structure-Enhanced Mixture-of-Experts Contrastive Learning for Ultrasound Standard Plane Recognition Qing Cai et.al. 2511.12559 null
2025-11-16 MdaIF: Robust One-Stop Multi-Degradation-Aware Image Fusion with Language-Driven Semantics Jing Li et.al. 2511.12525 null
2025-11-16 MOON2.0: Dynamic Modality-balanced Multimodal Representation Learning for E-commerce Product Understanding Zhanheng Nie et.al. 2511.12449 null
2025-11-15 SAC-MoE: Reinforcement Learning with Mixture-of-Experts for Control of Hybrid Dynamical Systems with Uncertainty Leroy D'Souza et.al. 2511.12361 null
2025-11-15 AMR-MoEGA: Antimicrobial Resistance Prediction using Mixture of Experts and Genetic Algorithms Anshul Bagaria et.al. 2511.12223 null
2025-11-15 ViTE: Virtual Graph Trajectory Expert Router for Pedestrian Trajectory Prediction Ruochen Li et.al. 2511.12214 null
2025-11-14 First Measurement of $π^+$-Ar and $p$-Ar Total Inelastic Cross Sections in the Sub-GeV Energy Regime with ProtoDUNE-SP Data DUNE Collaboration et.al. 2511.11925 null
2025-11-14 FarSkip-Collective: Unhobbling Blocking Communication in Mixture of Experts Models Yonatan Dukler et.al. 2511.11505 null
2025-11-14 Rethinking Efficient Mixture-of-Experts for Remote Sensing Modality-Missing Classification Qinghao Gao et.al. 2511.11460 null
2025-11-14 Parameter-Efficient MoE LoRA for Few-Shot Multi-Style Editing Cong Cao et.al. 2511.11236 null
2025-11-14 DoReMi: A Domain-Representation Mixture Framework for Generalizable 3D Understanding Mingwei Xing et.al. 2511.11232 null
2025-11-14 ERMoE: Eigen-Reparameterized Mixture-of-Experts for Stable Routing and Interpretable Specialization Anzhe Cheng et.al. 2511.10971 null
2025-11-14 Go-UT-Bench: A Fine-Tuning Dataset for LLM-Based Unit Test Generation in Go Yashshi Pipalani et.al. 2511.10868 null
2025-11-13 Generalizable Slum Detection from Satellite Imagery with Mixture-of-Experts Sumin Lee et.al. 2511.10300 null
2025-11-13 RobIA: Robust Instance-aware Continual Test-time Adaptation for Deep Stereo Jueun Ko et.al. 2511.10107 null
2025-11-13 BuddyMoE: Exploiting Expert Redundancy to Accelerate Memory-Constrained Mixture-of-Experts Inference Yun Wang et.al. 2511.10054 null
2025-11-13 ConSurv: Multimodal Continual Learning for Survival Analysis Dianzhi Yu et.al. 2511.09853 null
2025-11-12 UniMM-V2X: MoE-Enhanced Multi-Level Fusion for End-to-End Cooperative Autonomous Driving Ziyi Song et.al. 2511.09013 null
2025-11-12 Selective Sinkhorn Routing for Improved Sparse Mixture of Experts Duc Anh Nguyen et.al. 2511.08972 null
2025-11-12 Bayesian Mixture of Experts For Large Language Models Maryam Dialameh et.al. 2511.08968 null
2025-11-11 OmniAID: Decoupling Semantic and Artifacts for Universal AI-Generated Image Detection in the Wild Yuncheng Guo et.al. 2511.08423 null
2025-11-11 Text-based Aerial-Ground Person Retrieval Xinyu Zhou et.al. 2511.08369 null
2025-11-13 National Institute on Aging PREPARE Challenge: Early Detection of Cognitive Impairment Using Speech -- The SpeechCARE Solution Maryam Zolnoori et.al. 2511.08132 null
2025-11-10 Routing Manifold Alignment Improves Generalization of Mixture-of-Experts LLMs Zhongyang Li et.al. 2511.07419 null
2025-11-10 AgenticSciML: Collaborative Multi-Agent Systems for Emergent Discovery in Scientific Machine Learning Qile Jiang et.al. 2511.07262 null
2025-11-10 S-DAG: A Subject-Based Directed Acyclic Graph for Multi-Agent Heterogeneous Reasoning Jiangwen Dong et.al. 2511.06727 null
2025-11-10 Multi-Modal Continual Learning via Cross-Modality Adapters and Representation Alignment with Knowledge Preservation Evelyn Chee et.al. 2511.06723 null
2025-11-09 Route Experts by Sequence, not by Token Tiansheng Wen et.al. 2511.06494 null
2025-11-09 HyMoERec: Hybrid Mixture-of-Experts for Sequential Recommendation Kunrong Li et.al. 2511.06388 null
2025-11-09 A Mixture-of-Experts Framework with Log-Logistic Components for Survival Analysis on Histopathology Images Ardhendu Sekhar et.al. 2511.06266 null
2025-11-08 DiA-gnostic VLVAE: Disentangled Alignment-Constrained Vision Language Variational AutoEncoder for Robust Radiology Reporting with Missing Modalities Nagur Shareef Shaik et.al. 2511.05968 null
2025-11-08 MoEGCL: Mixture of Ego-Graphs Contrastive Representation Learning for Multi-View Clustering Jian Zhu et.al. 2511.05876 null
2025-11-08 In-depth Analysis on Caching and Pre-fetching in Mixture of Experts Offloading Shuning Lin et.al. 2511.05814 null
2025-11-07 MoE-DP: An MoE-Enhanced Diffusion Policy for Robust Long-Horizon Robotic Manipulation with Skill Decomposition and Failure Recovery Baiye Cheng et.al. 2511.05007 null
2025-11-06 PuzzleMoE: Efficient Compression of Large Mixture-of-Experts Models via Sparse Expert Merging and Bit-packed inference Yushu Zhao et.al. 2511.04805 null
2025-11-06 GNN-MoE: Context-Aware Patch Routing using GNNs for Parameter-Efficient Domain Generalization Mahmoud Soliman et.al. 2511.04008 null
2025-11-05 GMoPE: A Prompt-Expert Mixture Framework for Graph Foundation Models Zhibin Wang et.al. 2511.03251 null
2025-11-04 RoME: Domain-Robust Mixture-of-Experts for MILP Solution Prediction across Domains Tianle Pu et.al. 2511.02331 null
2025-11-04 FP8-Flow-MoE: A Casting-Free FP8 Recipe without Double Quantization Error Fengjuan Wang et.al. 2511.02302 null
2025-11-04 Opportunistic Expert Activation: Batch-Aware Expert Routing for Faster Decode Without Retraining Costin-Andrei Oncescu et.al. 2511.02237 null
2025-11-03 Towards Efficient Federated Learning of Networked Mixture-of-Experts for Mobile Edge Computing Song Gao et.al. 2511.01743 null
2025-11-03 HMVLM: Human Motion-Vision-Language Model via MoE LoRA Lei Hu et.al. 2511.01463 null
2025-11-04 CryptoMoE: Privacy-Preserving and Scalable Mixture of Experts Inference via Balanced Expert Routing Yifan Zhou et.al. 2511.01197 null
2025-11-03 DEER: Disentangled Mixture of Experts with Instance-Adaptive Routing for Generalizable Machine-Generated Text Detection Guoxin Ma et.al. 2511.01192 null
2025-11-01 OmniTrack++: Omnidirectional Multi-Object Tracking by Learning Large-FoV Trajectory Feedback Kai Luo et.al. 2511.00510 null
2025-10-31 LongCat-Flash-Omni Technical Report Meituan LongCat Team et.al. 2511.00279 null
2025-10-31 Phased DMD: Few-step Distribution Matching Distillation via Score Matching within Subintervals Xiangyu Fan et.al. 2510.27684 null
2025-10-31 RDMA Point-to-Point Communication for LLM Systems Nandor Licker et.al. 2510.27656 null
2025-10-31 MoRE: 3D Visual Geometry Reconstruction Meets Mixture-of-Experts Jingnan Gao et.al. 2510.27234 null
2025-10-31 AFM-Net: Advanced Fusing Hierarchical CNN Visual Priors with Global Sequence Modeling for Remote Sensing Image Scene Classification Yuanhao Tang et.al. 2510.27155 null
2025-10-30 Adaptive Data Flywheel: Applying MAPE Control Loops to AI Agent Improvement Aaditya Shukla et.al. 2510.27051 null
2025-10-30 Mixture-of-Transformers Learn Faster: A Theoretical Study on Classification Problems Hongbo Li et.al. 2510.27004 null
2025-10-30 MoME: Mixture of Visual Language Medical Experts for Medical Imaging Segmentation Arghavan Rezvani et.al. 2510.26996 null
2025-10-30 ExpertFlow: Adaptive Expert Scheduling and Memory Coordination for Efficient MoE Inference Zixu Shen et.al. 2510.26730 null
2025-10-30 Low-Altitude UAV-Carried Movable Antenna for Joint Wireless Power Transfer and Covert Communications Chuang Zhang et.al. 2510.26628 null
2025-10-30 MossNet: Mixture of State-Space Experts is a Multi-Head Attention Shikhar Tuli et.al. 2510.26182 null
2025-10-29 Dual Mixture-of-Experts Framework for Discrete-Time Survival Analysis Hyeonjun Lee et.al. 2510.26014 null
2025-10-31 Mixture-of-Experts Operator Transformer for Large-Scale PDE Pre-Training Hong Wang et.al. 2510.25803 null
2025-10-29 Revisiting scalable sequential recommendation with Multi-Embedding Approach and Mixture-of-Experts Qiushi Pan et.al. 2510.25285 null
2025-10-29 MoEntwine: Unleashing the Potential of Wafer-scale Chips for Large-scale Expert Parallel Inference Xinru Tang et.al. 2510.25258 null
2025-10-29 H3M-SSMoEs: Hypergraph-based Multimodal Learning with LLM Reasoning and Style-Structured Mixture of Experts Peilin Tan et.al. 2510.25091 null
2025-10-28 Ming-Flash-Omni: A Sparse, Unified Architecture for Multimodal Perception and Generation Inclusion AI et.al. 2510.24821 null
2025-10-28 Routing Matters in MoE: Scaling Diffusion Transformers with Explicit Routing Guidance Yujie Wei et.al. 2510.24711 null
2025-10-28 Language-Conditioned Representations and Mixture-of-Experts Policy for Robust Multi-Task Robotic Manipulation Xiucheng Zhang et.al. 2510.24055 null
2025-10-26 Sparsity and Superposition in Mixture of Experts Marmik Chaudhari et.al. 2510.23671 null
2025-10-27 EMTSF: Extraordinary Mixture of SOTA Models for Time Series Forecasting Musleh Alharthi et.al. 2510.23396 null
2025-10-27 Rethinking GSPO: The Perplexity-Entropy Equivalence Chi Liu et.al. 2510.23142 null
2025-10-27 Towards Stable and Effective Reinforcement Learning for Mixture-of-Experts Di Zhang et.al. 2510.23027 null
2025-10-27 MoEMeta: Mixture-of-Experts Meta Learning for Few-Shot Relational Learning Han Wu et.al. 2510.23013 null
2025-10-25 Every Activation Boosted: Scaling General Reasoner to 1 Trillion Open Language Foundation Ling-Team et.al. 2510.22115 null
2025-10-24 PINN Balls: Scaling Second-Order Methods for PINNs with Domain Decomposition and Adaptive Sampling Andrea Bonfanti et.al. 2510.21262 null
2025-10-24 Adaptive Graph Mixture of Residual Experts: Unsupervised Learning on Diverse Graphs with Heterogeneous Specialization Yunlong Chu et.al. 2510.21207 null
2025-10-24 Controllable-LPMoE: Adapting to Challenging Object Segmentation via Dynamic Local Priors from Mixture-of-Experts Yanguang Sun et.al. 2510.21114 null
2025-10-24 MedAlign: A Synergistic Framework of Multimodal Preference Optimization and Federated Meta-Cognitive Reasoning Siyong Chen et.al. 2510.21093 null
2025-10-23 Bayesian Jammer Localization with a Hybrid CNN and Path-Loss Mixture of Experts Mariona Jaramillo-Civill et.al. 2510.20666 null
2025-10-23 xTime: Extreme Event Prediction with Hierarchical Knowledge Distillation and Expert Fusion Quan Li et.al. 2510.20651 null
2025-10-23 Metis-HOME: Hybrid Optimized Mixture-of-Experts for Multimodal Reasoning Xiaohan Lan et.al. 2510.20519 null
2025-10-23 A Parameter-Efficient Mixture-of-Experts Framework for Cross-Modal Geo-Localization LinFeng Li et.al. 2510.20291 null
2025-10-23 AsyncHZP: Hierarchical ZeRO Parallelism with Asynchronous Scheduling for Scalable LLM Training Huawei Bai et.al. 2510.20111 null
2025-10-22 HybridEP: Scaling Expert Parallelism to Cross-Datacenter Scenario via Hybrid Expert/Data Transmission Weihao Yang et.al. 2510.19470 null
2025-10-22 MoE-Prism: Disentangling Monolithic Experts for Elastic MoE Services via Model-System Co-Designs Xinfeng Xia et.al. 2510.19366 null
2025-10-22 Modeling Turn-Taking with Semantically Informed Gestures Varsha Suresh et.al. 2510.19350 null
2025-10-23 RailS: Load Balancing for All-to-All Communication in Distributed Mixture-of-Experts Training Heng Xu et.al. 2510.19262 null
2025-10-22 A Design Science Blueprint for an Orchestrated AI Assistant in Doctoral Supervision Teo Susnjak et.al. 2510.19227 null
2025-10-22 MoE-GS: Mixture of Experts for Dynamic Gaussian Splatting In-Hwan Jin et.al. 2510.19210 null
2025-10-21 Unifying and Enhancing Graph Transformers via a Hierarchical Mask Framework Yujie Xing et.al. 2510.18825 null
2025-10-21 Noise-Conditioned Mixture-of-Experts Framework for Robust Speaker Verification Bin Gu et.al. 2510.18533 null
2025-10-21 Training Diverse Graph Experts for Ensembles: A Systematic Empirical Study Gangda Deng et.al. 2510.18370 null
2025-10-19 L-MoE: End-to-End Training of a Lightweight Mixture of Low-Rank Adaptation Experts Shihao Ji et.al. 2510.17898 null
2025-10-20 Towards 3D Objectness Learning in an Open World Taichi Liu et.al. 2510.17686 null
2025-10-20 Intelligent Communication Mixture-of-Experts Boosted-Medical Image Segmentation Foundation Model Xinwei Zhang et.al. 2510.17684 null
2025-10-20 Learned Inertial Odometry for Cycling Based on Mixture of Experts Algorithm Hao Qiao et.al. 2510.17604 null
2025-10-20 ReXMoE: Reusing Experts with Minimal Overhead in Mixture-of-Experts Zheyue Tan et.al. 2510.17483 null
2025-10-19 End-to-end Listen, Look, Speak and Act Siyin Wang et.al. 2510.16756 null
2025-10-18 NeurIPT: Foundation Model for Neural Interfaces Zitao Fang et.al. 2510.16548 null
2025-10-18 Input Domain Aware MoE: Decoupling Routing Decisions from Task Optimization in Mixture of Experts Yongxiang Hua et.al. 2510.16448 null
2025-10-18 Modeling Expert Interactions in Sparse Mixture of Experts via Graph Structures Minh-Khoi Nguyen-Nhat et.al. 2510.16411 null
2025-10-17 Expert Merging in Sparse Mixture of Experts with Nash Bargaining Dung V. Nguyen et.al. 2510.16138 null
2025-10-17 Mixture of Experts Approaches in Dense Retrieval Tasks Effrosyni Sokli et.al. 2510.15683 null
2025-10-17 FlexiReID: Adaptive Mixture of Expert for Multi-Modal Person Re-Identification Zhen Sun et.al. 2510.15595 null
2025-10-17 Backdoor or Manipulation? Graph Mixture of Experts Can Defend Against Various Graph Adversarial Attacks Yuyuan Feng et.al. 2510.15333 null
2025-10-17 MTmixAtt: Integrating Mixture-of-Experts with Multi-Mix Attention for Large-Scale Recommendation Xianyang Qi et.al. 2510.15286 null
2025-10-17 Adaptive Individual Uncertainty under Out-Of-Distribution Shift with Expert-Routed Conformal Prediction Amitesh Badkul et.al. 2510.15233 null
2025-10-16 Rewiring Experts on the Fly: Continuous Rerouting for Better Online Adaptation in Mixture-of-Expert models Guinan Su et.al. 2510.14853 null
2025-10-16 MergeMoE: Efficient Compression of MoE Models via Expert Output Merging Ruijie Miao et.al. 2510.14436 null
2025-10-16 Expertise need not monopolize: Action-Specialized Mixture of Experts for Vision-Language-Action Learning Weijie Shen et.al. 2510.14300 null
2025-10-16 MACE: Mixture-of-Experts Accelerated Coordinate Encoding for Large-Scale Scene Localization and Rendering Mingkai Liu et.al. 2510.14251 null
2025-10-15 REAP the Experts: Why Pruning Prevails for One-Shot MoE compression Mike Lasby et.al. 2510.13999 null
2025-10-15 Steer-MoE: Efficient Audio-Language Alignment with a Mixture-of-Experts Steering Module Ruitao Feng et.al. 2510.13558 null
2025-10-15 ExpressNet-MoE: A Hybrid Deep Neural Network for Emotion Recognition Deeptimaan Banerjee et.al. 2510.13493 null
2025-10-15 Who Speaks for the Trigger? Dynamic Expert Routing in Backdoored Mixture-of-Experts Transformers Xin Zhao et.al. 2510.13462 null
2025-10-15 Toward Efficient Inference Attacks: Shadow Model Sharing via Mixture-of-Experts Li Bai et.al. 2510.13451 null
2025-10-15 UniMoE-Audio: Unified Speech and Music Generation with Dynamic-Capacity MoE Zhenyu Liu et.al. 2510.13344 null
2025-10-15 GatePro: Parameter-Free Expert Selection Optimization for Mixture-of-Experts Models Chen Zheng et.al. 2510.13079 null
2025-10-14 Dendrograms of Mixing Measures for Softmax-Gated Gaussian Mixture of Experts: Consistency without Model Sweeps Do Tien Hai et.al. 2510.12744 null
2025-10-14 MoBiLE: Efficient Mixture-of-Experts Inference on Consumer GPU with Mixture of Big Little Experts Yushu Zhao et.al. 2510.12357 null
2025-10-14 DE3S: Dual-Enhanced Soft-Sparse-Shape Learning for Medical Early Time-Series Classification Tao Xie et.al. 2510.12214 null
2025-10-13 Beyond 'Templates': Category-Agnostic Object Pose, Size, and Shape Estimation from a Single View Jinyu Zhang et.al. 2510.11687 null
2025-10-13 Robust Ego-Exo Correspondence with Long-Term Memory Yijun Hu et.al. 2510.11417 null
2025-10-13 Stabilizing MoE Reinforcement Learning by Aligning Training and Inference Routers Wenhan Ma et.al. 2510.11370 null
2025-10-13 What to expect from microscopic nuclear modelling for $k_{\rm eff}$ calculations? D. Rochman et.al. 2510.11256 null
2025-10-13 MC#: Mixture Compressor for Mixture-of-Experts Large Models Wei Huang et.al. 2510.10962 null
2025-10-12 Crisis-Aware Regime-Conditioned Diffusion with CVaR Allocation Ali Atiah Alzahrani et.al. 2510.10807 null
2025-10-12 Equipping Vision Foundation Model with Mixture of Experts for Out-of-Distribution Detection Shizhen Zhao et.al. 2510.10584 null
2025-10-12 Hierarchical LoRA MoE for Efficient CTR Model Scaling Zhichen Zeng et.al. 2510.10432 null
2025-10-11 SP-MoE: Speculative Decoding and Prefetching for Accelerating MoE-based Model Inference Liangkun Chen et.al. 2510.10302 null
2025-10-10 MTMD: A Multi-Task Multi-Domain Framework for Unified Ad Lightweight Ranking at Pinterest Xiao Yang et.al. 2510.09857 null
2025-10-10 Dense2MoE: Restructuring Diffusion Transformer to MoE for Efficient Text-to-Image Generation Youwei Zheng et.al. 2510.09094 null
2025-10-09 LinearSR: Unlocking Linear Attention for Stable and Efficient Image Super-Resolution Xiaohui Li et.al. 2510.08771 null
2025-10-09 FlyLoRA: Boosting Task Decoupling and Parameter Efficiency via Implicit Rank-Wise Mixture-of-Experts Heming Zou et.al. 2510.08396 null
2025-10-09 Mix- and MoE-DPO: A Variational Inference Approach to Direct Preference Optimization Jason Bohne et.al. 2510.08256 null
2025-10-09 From Tokens to Layers: Redefining Stall-Free Scheduling for LLM Serving with Layered Prefill Gunjun Lee et.al. 2510.08055 null
2025-10-09 Recycling Pretrained Checkpoints: Orthogonal Growth of Mixture-of-Experts for Efficient Large Language Model Pre-Training Ruizhe Wang et.al. 2510.08008 null
2025-10-09 Multilingual Knowledge Graph Completion via Efficient Multilingual Knowledge Sharing Cunli Mao et.al. 2510.07736 null
2025-10-09 Mutual Learning for Hashing: Unlocking Strong Hash Functions from Weak Supervision Xiaoxu Ma et.al. 2510.07703 null
2025-10-09 LiveThinking: Enabling Real-Time Efficient Reasoning for AI-Powered Livestreaming via Reinforcement Learning Yuhan Sun et.al. 2510.07685 null
2025-10-08 MoGU: Mixture-of-Gaussians with Uncertainty-based Gating for Time Series Forecasting Yoli Shavit et.al. 2510.07459 null
2025-10-08 Less is More: Strategic Expert Selection Outperforms Ensemble Complexity in Traffic Forecasting Walid Guettala et.al. 2510.07426 null
2025-10-08 Guided by the Experts: Provable Feature Learning Dynamic of Soft-Routed Mixture-of-Experts Fangshuo Liao et.al. 2510.07205 null
2025-10-08 A Bridge from Audio to Video: Phoneme-Viseme Alignment Allows Every Face to Speak Multiple Languages Zibo Su et.al. 2510.06612 null
2025-10-09 SDAR: A Synergistic Diffusion-AutoRegression Paradigm for Scalable Sequence Generation Shuang Cheng et.al. 2510.06303 null
2025-10-06 Reproducibility Study of "XRec: Large Language Models for Explainable Recommendation" Ranjan Mishra et.al. 2510.06275 null
2025-10-08 Barbarians at the Gate: How AI is Upending Systems Research Audrey Cheng et.al. 2510.06189 null
2025-10-07 Rasterized Steered Mixture of Experts for Efficient 2D Image Regression Yi-Hsin Li et.al. 2510.05814 null
2025-10-07 MSF-SER: Enriching Acoustic Modeling with Multi-Granularity Semantics for Speech Emotion Recognition Haoxun Li et.al. 2510.05749 null
2025-10-07 Orders in Chaos: Enhancing Large-Scale MoE LLM Serving with Data Movement Forecasting Zhongkai Yu et.al. 2510.05497 null
2025-10-06 Stratum: System-Hardware Co-Design with Tiered Monolithic 3D-Stackable DRAM for Efficient MoE Serving Yue Pan et.al. 2510.05245 null
2025-10-06 REN: Anatomically-Informed Mixture-of-Experts for Interstitial Lung Disease Diagnosis Alec K. Peltekian et.al. 2510.04923 null
2025-10-06 LMM-Incentive: Large Multimodal Model-based Incentive Design for User-Generated Content in Web 3.0 Jinbo Wen et.al. 2510.04765 null
2025-10-06 Multilingual Routing in Mixture-of-Experts Lucas Bandarkar et.al. 2510.04694 null
2025-10-06 Improving Multimodal Brain Encoding Model with Dynamic Subject-awareness Routing Xuanhua Yin et.al. 2510.04670 null
2025-10-05 HoRA: Cross-Head Low-Rank Adaptation with Joint Hypernetworks Nghiem T. Diep et.al. 2510.04295 null
2025-10-05 SliceMoE: Routing Embedding Slices Instead of Tokens for Fine-Grained and Balanced Transformer Scaling Harshil Vejendla et.al. 2510.04286 null
2025-10-05 MoME: Mixture of Matryoshka Experts for Audio-Visual Speech Recognition Umberto Cappellazzo et.al. 2510.04136 null
2025-10-03 Mixture of Many Zero-Compute Experts: A High-Rate Quantization Theory Perspective Yehuda Dar et.al. 2510.03151 null
2025-10-02 ElasticMoE: An Efficient Auto Scaling Method for Mixture-of-Experts Models Gursimran Singh et.al. 2510.02613 null
2025-10-02 UpSafe$^\circ$C: Upcycling for Controllable Safety in Large Language Models Yuhao Sun et.al. 2510.02194 null
2025-10-02 LadderMoE: Ladder-Side Mixture of Experts Adapters for Bronze Inscription Recognition Rixin Zhou et.al. 2510.01651 null
2025-10-01 Dirichlet-Prior Shaping: Guiding Expert Specialization in Upcycled MoEs Leyla Mirvakhabova et.al. 2510.01185 null
2025-10-01 Learning Compact Representations of LLM Abilities via Item Response Theory Jianhao Chen et.al. 2510.00844 null
2025-10-01 Graph Integrated Multimodal Concept Bottleneck Model Jiakai Lin et.al. 2510.00701 null
2025-10-01 FAME: Adaptive Functional Attention with Expert Routing for Function-on-Function Regression Yifei Gao et.al. 2510.00621 null
2025-10-01 Adaptive Shared Experts with LoRA-Based Mixture of Experts for Multi-Task Learning Minghao Yang et.al. 2510.00570 null
2025-09-30 FlowMoE: A Scalable Pipeline Scheduling Framework for Distributed Mixture-of-Experts Training Yunqi Gao et.al. 2510.00207 null
2025-09-30 Training Matryoshka Mixture-of-Experts for Elastic Inference-Time Expert Utilization Yaoxiang Wang et.al. 2509.26520 null
2025-09-30 Nephrobase Cell+: Multimodal Single-Cell Foundation Model for Decoding Kidney Biology Chenyu Li et.al. 2509.26223 null
2025-09-30 Towards Unified Multimodal Misinformation Detection in Social Media: A Benchmark Dataset and Baseline Haiyang Li et.al. 2509.25991 null
2025-09-30 UniMMAD: Unified Multi-Modal and Multi-Class Anomaly Detection via MoE-Driven Feature Decompression Yuan Zhao et.al. 2509.25934 null
2025-09-30 Understanding the Mixture-of-Experts with Nadaraya-Watson Kernel Chuanyang Zheng et.al. 2509.25913 null
2025-10-01 A Multimodal LLM Approach for Visual Question Answering on Multiparametric 3D Brain MRI Arvind Murari Vepa et.al. 2509.25889 null
2025-09-30 Collaborative Compression for Large-Scale MoE Deployment on Edge Yixiao Chen et.al. 2509.25689 null
2025-09-30 LD-MoLE: Learnable Dynamic Routing for Mixture of LoRA Experts Yuan Zhuang et.al. 2509.25684 null
2025-09-30 Guiding Mixture-of-Experts with Temporal Multimodal Interactions Xing Han et.al. 2509.25678 null
2025-09-29 K-Prism: A Knowledge-Guided and Prompt Integrated Universal Medical Image Segmentation Model Bangwei Guo et.al. 2509.25594 null
2025-09-29 GRACE-MoE: Grouping and Replication with Locality-Aware Routing for Efficient Distributed MoE Inference Yu Han et.al. 2509.25041 null
2025-09-29 LEAF: A Robust Expert-Based Framework for Few-Shot Continual Event Detection Bao-Ngoc Dao et.al. 2509.24547 null
2025-09-29 One-Prompt Strikes Back: Sparse Mixture of Experts for Prompt-based Continual Learning Minh Le et.al. 2509.24483 null
2025-09-29 Muon: Training and Trade-offs with Latent Attention and MoE Sushant Mehta et.al. 2509.24406 null
2025-09-29 LLaDA-MoE: A Sparse MoE Diffusion Language Model Fengqi Zhu et.al. 2509.24389 null
2025-09-29 Uni-NTFM: A Unified Foundation Model for EEG Signal Representation Learning Zhisheng Chen et.al. 2509.24222 null
2025-09-28 HunyuanImage 3.0 Technical Report Siyu Cao et.al. 2509.23951 null
2025-09-28 Beyond Benchmarks: Understanding Mixture-of-Experts Models through Internal Mechanisms Jiahao Ying et.al. 2509.23933 null
2025-09-28 Bayesian Mixture-of-Experts: Towards Making LLMs Know What They Don't Know Albus Yizhuo Li et.al. 2509.23830 null
2025-09-28 A Modality-Tailored Graph Modeling Framework for Urban Region Representation via Contrastive Learning Yaya Zhao et.al. 2509.23772 null
2025-09-26 Dynamic Experts Search: Enhancing Reasoning in Mixture-of-Experts LLMs at Test Time Yixuan Han et.al. 2509.22572 null
2025-09-26 Learning to Ball: Composing Policies for Long-Horizon Basketball Moves Pei Xu et.al. 2509.22442 null
2025-09-26 Role-Aware Multi-modal federated learning system for detecting phishing webpages Bo Wang et.al. 2509.22369 null
2025-09-26 HEAPr: Hessian-based Efficient Atomic Expert Pruning in Output Space Ke Li et.al. 2509.22299 null
2025-09-26 Unlocking the Power of Mixture-of-Experts for Task-Aware Time Series Analytics Xingjian Wu et.al. 2509.22279 null
2025-09-26 MultiCrafter: High-Fidelity Multi-Subject Generation via Spatially Disentangled Attention and Identity-Aware Reinforcement Learning Tao Wu et.al. 2509.21953 null
2025-09-26 Elastic MoE: Unlocking the Inference-Time Scalability of Mixture-of-Experts Naibin Gu et.al. 2509.21892 null
2025-09-26 ChaosNexus: A Foundation Model for Universal Chaotic System Forecasting with Multi-scale Representations Chang Liu et.al. 2509.21802 null
2025-09-26 LongScape: Advancing Long-Horizon Embodied World Models with Context-Aware MoE Yu Shang et.al. 2509.21790 null
2025-09-25 Distributed Specialization: Rare-Token Neurons in Large Language Models Jing Liu et.al. 2509.21163 null
2025-09-26 Expanding Reasoning Potential in Foundation Model by Learning Diverse Chains of Thought Patterns Xuemiao Zhang et.al. 2509.21124 null
2025-09-25 Physics Informed Neural Networks for design optimisation of diamond particle detectors for charged particle fast-tracking at high luminosity hadron colliders Alessandro Bombini et.al. 2509.21123 null
2025-09-24 Dynamic Reasoning Chains through Depth-Specialized Mixture-of-Experts in Transformer Architectures Sampurna Roy et.al. 2509.20577 null
2025-09-24 SHMoAReg: Spark Deformable Image Registration via Spatial Heterogeneous Mixture of Experts and Attention Heads Yuxi Zheng et.al. 2509.20073 null
2025-09-24 Faster, Smaller, and Smarter: Task-Aware Expert Merging for Online MoE Inference Ziyi Han et.al. 2509.19781 null
2025-09-23 DevFD: Developmental Face Forgery Detection by Learning Shared and Orthogonal LoRA Subspaces Tianshuo Zhang et.al. 2509.19230 null
2025-09-23 Frequency-Domain Decomposition and Recomposition for Robust Audio-Visual Segmentation Yunzhe Shen et.al. 2509.18912 null
2025-09-23 LongCat-Flash-Thinking Technical Report Meituan LongCat Team et.al. 2509.18883 null
2025-09-23 PIE: Perception and Interaction Enhanced End-to-End Motion Planning for Autonomous Driving Chengran Yuan et.al. 2509.18609 null
2025-09-23 Symphony-MoE: Harmonizing Disparate Pre-trained Models into a Coherent Mixture-of-Experts Qi Wang et.al. 2509.18542 null
2025-09-23 StableGuard: Towards Unified Copyright Protection and Tamper Localization in Latent Diffusion Models Haoxin Yang et.al. 2509.17993 null
2025-09-23 Optimizing Inference in Transformer-Based Models: A Multi-Method Benchmark Siu Hang Ho et.al. 2509.17894 null
2025-09-22 Expert-as-a-Service: Towards Efficient, Scalable, and Robust Large-scale MoE Serving Ziming Liu et.al. 2509.17863 null
2025-09-22 Attention-based Mixture of Experts for Robust Speech Deepfake Detection Viola Negroni et.al. 2509.17585 null
2025-09-22 Robust Mixture Models for Algorithmic Fairness Under Latent Heterogeneity Siqi Li et.al. 2509.17411 null
2025-09-21 MoEs Are Stronger than You Think: Hyper-Parallel Inference Scaling with RoE Soheil Zibakhsh et.al. 2509.17238 null
2025-09-21 CoBEVMoE: Heterogeneity-aware Feature Fusion with Dynamic Mixture-of-Experts for Collaborative Perception Lingzhao Kong et.al. 2509.17107 null
2025-09-21 Dynamic Expert Specialization: Towards Catastrophic Forgetting-Free Multi-Domain MoE Adaptation Junzhuo Li et.al. 2509.16882 null
2025-09-20 KungfuBot2: Learning Versatile Motion Skills for Humanoid Whole-Body Control Jinrui Han et.al. 2509.16638 null
2025-09-19 DiEP: Adaptive Mixture-of-Experts Compression through Differentiable Expert Pruning Sikai Bai et.al. 2509.16105 null
2025-09-19 MoE-CE: Enhancing Generalization for Deep Learning based Channel Estimation via a Mixture-of-Experts Framework Tianyu Li et.al. 2509.15964 null
2025-09-19 pFedSAM: Personalized Federated Learning of Segment Anything Model for Medical Image Segmentation Tong Wang et.al. 2509.15638 null
2025-09-19 MEC-Quant: Maximum Entropy Coding for Extremely Low Bit Quantization-Aware Training Junbiao Pang et.al. 2509.15514 null
2025-09-18 Beyond Spurious Signals: Debiasing Multimodal Large Language Models via Counterfactual Inference and Adaptive Expert Routing Zichen Wu et.al. 2509.15361 null
2025-09-18 Super-Linear: A Lightweight Pretrained Mixture of Linear Experts for Time Series Forecasting Liran Nochumsohn et.al. 2509.15105 null
2025-09-18 Adaptive LoRA Experts Allocation and Selection for Federated Fine-Tuning Lei Wang et.al. 2509.15087 null
2025-09-18 EchoVLM: Dynamic Mixture-of-Experts Vision-Language Model for Universal Ultrasound Intelligence Chaoyin She et.al. 2509.14977 null
2025-09-18 FURINA: Free from Unmergeable Router via LINear Aggregation of mixed experts Jiayi Han et.al. 2509.14900 null
2025-09-18 CollabVLA: Self-Reflective Vision-Language-Action Model Dreaming Together with Human Nan Sun et.al. 2509.14889 null
2025-09-17 CSMoE: An Efficient Remote Sensing Foundation Model with Soft Mixture-of-Experts Leonard Hackel et.al. 2509.14104 null
2025-09-18 SAIL-VL2 Technical Report Weijie Yin et.al. 2509.14033 null
2025-09-17 Semi-MoE: Mixture-of-Experts meets Semi-Supervised Histopathology Segmentation Nguyen Lan Vi Vu et.al. 2509.13834 null
2025-09-18 Mixture-of-Experts Framework for Field-of-View Enhanced Signal-Dependent Binauralization of Moving Talkers Manan Mittal et.al. 2509.13548 null
2025-09-18 GLAD: Global-Local Aware Dynamic Mixture-of-Experts for Multi-Talker ASR Yujie Guo et.al. 2509.13093 null
2025-09-16 Dual-Stage Reweighted MoE for Long-Tailed Egocentric Mistake Detection Boyu Han et.al. 2509.12990 null
2025-09-16 Bridging Perception and Planning: Towards End-to-End Planning for Signal Temporal Logic Tasks Bowen Ye et.al. 2509.12813 null
2025-09-16 MEGAN: Mixture of Experts for Robust Uncertainty Estimation in Endoscopy Videos Damola Agbelese et.al. 2509.12772 null
2025-09-17 NavMoE: Hybrid Model- and Learning-based Traversability Estimation for Local Navigation via Mixture of Experts Botao He et.al. 2509.12747 null
2025-09-16 AsyMoE: Leveraging Modal Asymmetry for Enhanced Expert Specialization in Large Vision-Language Models Heng Zhang et.al. 2509.12715 null
2025-10-24 Efficient Multimodal Streaming Recommendation via Expandable Side Mixture-of-Experts Yunke Qu et.al. 2508.05993 null
2025-07-23 Towards Greater Leverage: Scaling Laws for Efficient Mixture-of-Experts Language Models Changxin Tian et.al. 2507.17702 null
2025-07-23 Mammo-Mamba: A Hybrid State-Space and Transformer Architecture with Sequential Mixture of Experts for Multi-View Mammography Farnoush Bayatmakou et.al. 2507.17662 null
2025-07-23 InstructVLA: Vision-Language-Action Instruction Tuning from Understanding to Manipulation Shuai Yang et.al. 2507.17520 null
2025-07-23 Dynamic-DINO: Fine-Grained Mixture of Experts Tuning for Real-time Open-Vocabulary Object Detection Yehao Lu et.al. 2507.17436 null
2025-07-23 A Versatile Pathology Co-pilot via Reasoning Enhanced Multimodal Large Language Model Zhe Xu et.al. 2507.17303 null
2025-07-23 BrownoutServe: SLO-Aware Inference Serving under Bursty Workloads for MoE-based LLMs Jianmin Hu et.al. 2507.17133 null
2025-07-22 GATEBLEED: Exploiting On-Core Accelerator Power Gating for High Performance & Stealthy Attacks on AI Joshua Kalyanapu et.al. 2507.17033 null
2025-07-22 Mixture-of-Expert Variational Autoencoders for Cross-Modality Embedding of Type Ia Supernova Data Yunyi Shen et.al. 2507.16817 null
2025-07-22 Reducing GPU Memory Fragmentation via Spatio-Temporal Planning for Efficient Large-Scale Model Training Zixiao Huang et.al. 2507.16274 null
2025-07-21 Applying multimodal learning to Classify transient Detections Early (AppleCiDEr) I: Data set, methods, and infrastructure Alexandra Junell et.al. 2507.16088 null
2025-07-21 Just Ask for Music (JAM): Multimodal and Personalized Natural Language Music Recommendation Alessandro B. Melchiorre et.al. 2507.15826 null
2025-07-21 The New LLM Bottleneck: A Systems Perspective on Latent Attention and Mixture-of-Experts Sungmin Yun et.al. 2507.15465 null
2025-07-21 Universal crystal material property prediction via multi-view geometric fusion in graph transformers Liang Zhang et.al. 2507.15303 null
2025-07-20 CoMoCAVs: Cohesive Decision-Guided Motion Planning for Connected and Autonomous Vehicles with Multi-Policy Reinforcement Learning Pan Hu et.al. 2507.14903 null
2025-07-23 GEMINUS: Dual-aware Global and Scene-Adaptive Mixture-of-Experts for End-to-End Autonomous Driving Chi Wan et.al. 2507.14456 null
2025-07-18 SkySense V2: A Unified Foundation Model for Multi-modal Remote Sensing Yingying Zhang et.al. 2507.13812 null
2025-07-17 Apple Intelligence Foundation Language Models: Tech Report 2025 Hanzhi Zhou et.al. 2507.13575 null
2025-07-17 R^2MoE: Redundancy-Removal Mixture of Experts for Lifelong Concept Learning Xiaohan Guo et.al. 2507.13107 null
2025-07-16 Astro-MoE: Mixture of Experts for Multiband Astronomical Time Series Martina Cádiz-Leyton et.al. 2507.12611 null
2025-07-16 Mono-InternVL-1.5: Towards Cheaper and Faster Monolithic Multimodal Large Language Models Gen Luo et.al. 2507.12566 null
2025-07-17 Mixture of Raytraced Experts Andrea Perin et.al. 2507.12419 null
2025-07-16 CorrMoE: Mixture of Experts with De-stylization Learning for Cross-Scene and Cross-Domain Correspondence Pruning Peiwen Xia et.al. 2507.11834 null
2025-07-15 Mixture of Experts in Large Language Models Danyang Zhang et.al. 2507.11181 null
2025-07-15 Atmos-Bench: 3D Atmospheric Structures for Climate Insight Tianchi Xu et.al. 2507.11085 null
2025-07-14 DeepSeek: Paradigm Shifts and Technical Evolution in Large AI Models Luolin Xiong et.al. 2507.09955 null
2025-07-14 ESG-Net: Event-Aware Semantic Guided Network for Dense Audio-Visual Event Localization Huilai Li et.al. 2507.09945 null
2025-07-14 Multi-residual Mixture of Experts Learning for Cooperative Control in Multi-vehicle Systems Vindula Jayawardana et.al. 2507.09836 null
2025-07-13 Explainable AI in Genomics: Transcription Factor Binding Site Prediction with Mixture of Experts Aakash Tripathi et.al. 2507.09754 null
2025-07-13 Inter2Former: Dynamic Hybrid Attention for Efficient High-Precision Interactive You Huang et.al. 2507.09612 null
2025-07-12 PPJudge: Towards Human-Aligned Assessment of Artistic Painting Process Shiqi Jiang et.al. 2507.09242 null
2025-07-11 BlockFFN: Towards End-Side Acceleration-Friendly Mixture-of-Experts with Chunk-Level Activation Sparsity Chenyang Song et.al. 2507.08771 null
2025-07-11 CircFormerMoE: An End-to-End Deep Learning Framework for Circular RNA Splice Site Detection and Pairing in Plant Genomes Tianyou Jiang et.al. 2507.08542 null
2025-07-11 White-Basilisk: A Hybrid Model for Code Vulnerability Detection Ioannis Lamprou et.al. 2507.08540 null
2025-07-15 KAT-V1: Kwai-AutoThink Technical Report Zizheng Zhan et.al. 2507.08297 null
2025-07-11 Data-Driven Dimensional Synthesis of Diverse Planar Four-bar Function Generation Mechanisms via Direct Parameterization Woon Ryong Kim et.al. 2507.08269 null
2025-07-10 MoSE: Skill-by-Skill Mixture-of-Expert Learning for Autonomous Driving Lu Xu et.al. 2507.07818 null
2025-07-10 When Large Language Models Meet Law: Dual-Lens Taxonomy, Technical Advances, and Ethical Governance Peizhang Shao et.al. 2507.07748 null
2025-07-09 Leveraging Manifold Embeddings for Enhanced Graph Transformer Representations and Learning Ankit Jyothish et.al. 2507.07335 null
2025-07-08 Growing Transformers: Modular Composition and Layer-wise Expansion on a Frozen Substrate A. Bochkov et.al. 2507.07129 null
2025-07-09 4KAgent: Agentic Any Image to 4K Super-Resolution Yushen Zuo et.al. 2507.07105 null
2025-07-11 FlexOlmo: Open Language Models for Flexible Data Use Weijia Shi et.al. 2507.07024 null
2025-07-09 Deep Disentangled Representation Network for Treatment Effect Estimation Hui Meng et.al. 2507.06650 null
2025-07-09 SlimCaching: Edge Caching of Mixture-of-Experts for Distributed Inference Qian Chen et.al. 2507.06567 null
2025-07-09 MoFE-Time: Mixture of Frequency Domain Experts for Time-Series Forecasting Models Yiwen Liu et.al. 2507.06502 null
2025-07-08 Mamba Goes HoME: Hierarchical Soft Mixture-of-Experts for 3D Medical Image Segmentation Szymon Płotka et.al. 2507.06363 null
2025-07-08 Speech Quality Assessment Model Based on Mixture of Experts: System-Level Performance Enhancement and Utterance-Level Challenge Analysis Xintong Hu et.al. 2507.06116 null
2025-07-09 A Survey on Prompt Tuning Zongqian Li et.al. 2507.06085 null
2025-07-08 Remember Past, Anticipate Future: Learning Continual Multimodal Misinformation Detectors Bing Wang et.al. 2507.05939 null
2025-07-08 What You Have is What You Track: Adaptive and Robust Multimodal Tracking Yuedong Tan et.al. 2507.05899 null
2025-07-08 Omni-Router: Sharing Routing Decisions in Sparse Mixture-of-Experts for Speech Recognition Zijin Gu et.al. 2507.05724 null
2025-07-08 Efficient Training of Large-Scale AI Models Through Federated Mixture-of-Experts: A System-Level Approach Xiaobing Chen et.al. 2507.05685 null
2025-07-08 City-Level Foreign Direct Investment Prediction with Tabular Learning on Judicial Data Tianxing Wu et.al. 2507.05651 null
2025-07-07 QMoE: A Quantum Mixture of Experts Framework for Scalable Quantum Neural Networks Hoang-Quan Nguyen et.al. 2507.05190 null
2025-07-07 NTSFormer: A Self-Teaching Graph Transformer for Multimodal Cold-Start Node Classification Jun Hu et.al. 2507.04870 null
2025-07-07 DRAE: Dynamic Retrieval-Augmented Expert Networks for Lifelong Learning and Task Adaptation in Robotics Yayu Long et.al. 2507.04661 null
2025-07-08 UGG-ReID: Uncertainty-Guided Graph Model for Multi-Modal Object Re-Identification Xixi Wan et.al. 2507.04638 null
2025-07-07 Learning Robust Stereo Matching in the Wild with Selective Mixture-of-Experts Yun Wang et.al. 2507.04631 null
2025-07-05 Towards Accurate and Efficient 3D Object Detection for Autonomous Driving: A Mixture of Experts Computing System on Edge Linshen Liu et.al. 2507.04123 null
2025-07-05 From Query to Explanation: Uni-RAG for Multi-Modal Retrieval-Augmented Learning in STEM Xinyi Wu et.al. 2507.03868 null
2025-07-04 Decoupled Relative Learning Rate Schedules Jan Ludziejewski et.al. 2507.03526 null
2025-07-03 Neural Inhibition Improves Dynamic Routing and Mixture of Experts Will Y. Zou et.al. 2507.03221 null
2025-07-03 System-performance and cost modeling of Large Language Model training and inference Wenzhe Guo et.al. 2507.02456 null
2025-07-03 NLP4Neuro: Sequence-to-sequence learning for neural population decoding Jacob J. Morra et.al. 2507.02264 null
2025-07-02 MoIRA: Modular Instruction Routing Architecture for Multi-Task Robotics Dmytro Kuzmenko et.al. 2507.01843 null
2025-07-02 Mixtures of Neural Network Experts with Application to Phytoplankton Flow Cytometry Data Ethan Pawl et.al. 2507.01375 null
2025-07-02 Long-Tailed Distribution-Aware Router For Mixture-of-Experts in Large Vision-Language Model Chaoxiang Cai et.al. 2507.01351 null
2025-07-02 Dynamical Multimodal Fusion with Mixture-of-Experts for Localizations Bohao Wang et.al. 2507.01337 null
2025-07-02 ExPaMoE: An Expandable Parallel Mixture of Experts for Continual Test-Time Adaptation JianChao Zhao et.al. 2507.00502 null
2025-07-01 MoNE: Replacing Redundant Experts with Lightweight Novices for Structured Pruning of MoE Geng Zhang et.al. 2507.00390 null
2025-06-30 MotionGPT3: Human Motion as a Second Modality Bingfan Zhu et.al. 2506.24086 null
2025-06-30 MReg: A Novel Regression Model with MoE-based Video Feature Mining for Mitral Regurgitation Diagnosis Zhe Liu et.al. 2506.23648 null
2025-06-30 Towards Building Private LLMs: Exploring Multi-Node Expert Parallelism on Apple Silicon for Mixture-of-Experts Large Language Model Mu-Chi Chen et.al. 2506.23635 null
2025-06-29 Sub-MoE: Efficient Mixture-of-Expert LLMs Compression via Subspace Expert Merging Lujun Li et.al. 2506.23266 null
2025-06-29 External Data-Enhanced Meta-Representation for Adaptive Probabilistic Load Forecasting Haoran Li et.al. 2506.23201 null
2025-06-29 Hierarchical Corpus-View-Category Refinement for Carotid Plaque Risk Grading in Ultrasound Zhiyuan Zhu et.al. 2506.23108 null
2025-07-01 Hecto: Modular Sparse Experts for Adaptive and Interpretable Reasoning Sanskar Pandey et.al. 2506.22919 null
2025-06-27 Towards Distributed Neural Architectures Aditya Cowsik et.al. 2506.22389 null
2025-06-27 MPipeMoE: Memory Efficient MoE for Pre-trained Models with Adaptive Pipeline Parallelism Zheng Zhang et.al. 2506.22175 null
2025-06-27 DeepTalk: Towards Seamless and Smart Speech Interaction with Adaptive Modality-Specific MoE Hang Shao et.al. 2506.21864 null
2025-06-26 Latent Prototype Routing: Achieving Near-Perfect Load Balancing in Mixture-of-Experts Jiajie Yang et.al. 2506.21328 null
2025-06-26 Learning to Skip the Middle Layers of Transformers Tim Lawson et.al. 2506.21103 null
2025-06-26 Little By Little: Continual Learning via Self-Activated Sparse Mixture-of-Rank Adaptive Learning Haodong Lu et.al. 2506.21035 null
2025-06-26 EVA: Mixture-of-Experts Semantic Variant Alignment for Compositional Zero-Shot Learning Xiao Zhang et.al. 2506.20986 null
2025-06-25 Opportunistic Osteoporosis Diagnosis via Texture-Preserving Self-Supervision, Mixture of Experts and Multi-Task Integration Jiaxing Huang et.al. 2506.20282 null
2025-06-23 Multimodal Anomaly Detection with a Mixture-of-Experts Christoph Willibald et.al. 2506.19077 null
2025-06-23 Chain-of-Experts: Unlocking the Communication Power of Mixture-of-Experts Models Zihan Wang et.al. 2506.18945 null
2025-06-23 Shift Happens: Mixture of Experts based Continual Adaptation in Federated Learning Rahul Atul Bhope et.al. 2506.18789 null
2025-06-23 An Audio-centric Multi-task Learning Framework for Streaming Ads Targeting on Spotify Shivam Verma et.al. 2506.18735 null
2025-06-23 Security Assessment of DeepSeek and GPT Series Models against Jailbreak Attacks Xiaodong Wu et.al. 2506.18543 null
2025-06-23 SlimMoE: Structured Compression of Large MoE Models via Expert Slimming and Distillation Zichong Li et.al. 2506.18349 null
2025-06-23 Sharpening the Spear: Adaptive Expert-Guided Adversarial Attack Against DRL-based Autonomous Driving Policies Junchao Fan et.al. 2506.18304 null
2025-06-22 Routing Mamba: Scaling State Space Models with Mixture-of-Experts Projection Zheng Zhan et.al. 2506.18145 null
2025-06-21 Incorporating Rather Than Eliminating: Achieving Fairness for Skin Disease Diagnosis Through Group-Specific Expert Gelei Xu et.al. 2506.17787 null
2025-06-21 Physics-informed mixture of experts network for interpretable battery degradation trajectory computation amid second-life complexities Xinghao Huang et.al. 2506.17755 null
2025-06-21 PDC-Net: Pattern Divide-and-Conquer Network for Pelvic Radiation Injury Segmentation Xinyu Xiong et.al. 2506.17712 null
2025-06-20 SAFEx: Analyzing Vulnerabilities of MoE-Based LLMs via Stable Safety-critical Expert Identification Zhenglin Lai et.al. 2506.17368 null
2025-06-19 FLAME: Towards Federated Fine-Tuning Large Language Models Through Adaptive SMoE Khiem Le et.al. 2506.16600 null
2025-06-19 Optimizing MoE Routers: Design, Implementation, and Evaluation in Transformer Models Daniel Fidel Harvey et.al. 2506.16419 null
2025-06-17 Scaling Intelligence: Designing Data Centers for Next-Gen Language Models Jesmin Jahan Tithi et.al. 2506.15006 null
2025-06-17 NeuroMoE: A Transformer-Based Mixture-of-Experts Framework for Multi-Modal Neurological Disorder Classification Wajih Hassan Raza et.al. 2506.14970 null
2025-06-17 GMT: General Motion Tracking for Humanoid Whole-Body Control Zixuan Chen et.al. 2506.14770 null
2025-06-17 Exploring Speaker Diarization with Mixture of Experts Gaobin Yang et.al. 2506.14750 null
2025-06-18 Ring-lite: Scalable Reasoning via C3PO-Stabilized Reinforcement Learning for LLMs Ling Team et.al. 2506.14731 null
2025-06-17 GuiLoMo: Allocating Expert Number and Rank for LoRA-MoE via Bilevel Optimization with GuidedSelection Vectors Hengyuan Zhang et.al. 2506.14646 link
2025-06-17 Single-Example Learning in a Mixture of GPDMs with Latent Geometries Jesse St. Amand et.al. 2506.14563 null
2025-06-17 MoTE: Mixture of Ternary Experts for Memory-efficient Large Multimodal Models Hongyu Wang et.al. 2506.14435 null
2025-06-16 Load Balancing Mixture of Experts with Similarity Preserving Routers Nabil Omi et.al. 2506.14038 null
2025-06-16 GRaD-Nav++: Vision-Language Model Enabled Visual Drone Navigation with Gaussian Radiance Fields and Differentiable Dynamics Qianzhong Chen et.al. 2506.14009 null
2025-06-16 MiniMax-M1: Scaling Test-Time Compute Efficiently with Lightning Attention MiniMax et.al. 2506.13585 link
2025-06-16 Mixture of Weight-shared Heterogeneous Group Attention Experts for Dynamic Token-wise KV Optimization Guanghui Song et.al. 2506.13541 null
2025-06-16 EAQuant: Enhancing Post-Training Quantization for MoE Models via Expert-Aware Optimization Zhongqian Fu et.al. 2506.13329 link
2025-06-16 Breaking Thought Patterns: A Multi-Dimensional Reasoning Framework for LLMs Xintong Tang et.al. 2506.13192 null
2025-06-15 Serving Large Language Models on Huawei CloudMatrix384 Pengfei Zuo et.al. 2506.12708 null
2025-06-14 Automatic Expert Discovery in LLM Upcycling via Sparse Interpolated Mixture-of-Experts Shengzhuang Chen et.al. 2506.12597 null
2025-06-14 Topology-Assisted Spatio-Temporal Pattern Disentangling for Scalable MARL in Large-scale Autonomous Traffic Control Rongpeng Li et.al. 2506.12453 null
2025-06-17 HarMoEny: Efficient Multi-GPU Inference of MoE Models Zachary Doucet et.al. 2506.12417 null
2025-06-14 Group then Scale: Dynamic Mixture-of-Experts Multilingual Language Model Chong Li et.al. 2506.12388 null
2025-06-13 Can Mixture-of-Experts Surpass Dense LLMs Under Strictly Equal Resources? Houyi Li et.al. 2506.12119 null
2025-06-13 Structural Similarity-Inspired Unfolding for Lightweight Image Super-Resolution Zhangkai Ni et.al. 2506.11823 link
2025-06-12 Optimus-3: Towards Generalist Multimodal Minecraft Agents with Scalable Task Experts Zaijing Li et.al. 2506.10357 null
2025-06-11 GigaChat Family: Efficient Russian Language Modeling Through Mixture of Experts Architecture GigaChat team et.al. 2506.09440 null
2025-06-11 DIVE into MoE: Diversity-Enhanced Reconstruction of Large Language Models from Dense into Mixture-of-Experts Yuchen Feng et.al. 2506.09351 null
2025-06-10 CoQMoE: Co-Designed Quantization and Computation Orchestration for Mixture-of-Experts Vision Transformer on FPGA Jiale Dong et.al. 2506.08496 link
2025-06-11 MedMoE: Modality-Specialized Mixture of Experts for Medical Vision-Language Understanding Shivang Chopra et.al. 2506.08356 null
2025-06-11 STAMImputer: Spatio-Temporal Attention MoE for Traffic Data Imputation Yiming Wang et.al. 2506.08054 link
2025-06-09 A Two-Phase Deep Learning Framework for Adaptive Time-Stepping in High-Speed Flow Modeling Jacob Helwig et.al. 2506.07969 link
2025-06-09 M2Restore: Mixture-of-Experts-based Mamba-CNN Fusion Framework for All-in-One Image Restoration Yongzhen Wang et.al. 2506.07814 null
2025-06-11 MIRA: Medical Time Series Foundation Model for Real-World Health Data Hao Li et.al. 2506.07584 null
2025-06-11 MoE-MLoRA for Multi-Domain CTR Prediction: Efficient Adaptation with Expert Specialization Ken Yaggel et.al. 2506.07563 link
2025-06-09 MoQAE: Mixed-Precision Quantization for Long-Context LLM Inference via Mixture of Quantization-Aware Experts Wei Tao et.al. 2506.07533 null
2025-06-09 MoE-GPS: Guidlines for Prediction Strategy for Dynamic Expert Duplication in MoE Load Balancing Haiyue Ma et.al. 2506.07366 null
2025-06-08 UNO: Unified Self-Supervised Monocular Odometry for Platform-Agnostic Deployment Wentao Zhao et.al. 2506.07013 null
2025-06-07 High-Fidelity Scientific Simulation Surrogates via Adaptive Implicit Neural Representations Ziwei Li et.al. 2506.06858 null
2025-06-07 Breaking Data Silos: Towards Open and Scalable Mobility Foundation Models via Generative Continual Learning Yuan Yuan et.al. 2506.06694 null
2025-06-06 Bridging Perception and Action: Spatially-Grounded Mid-Level Representations for Robot Generalization Jonathan Yang et.al. 2506.06196 null
2025-06-06 MoA: Heterogeneous Mixture of Adapters for Parameter-Efficient Fine-Tuning of Large Language Models Jie Cao et.al. 2506.05928 null
2025-06-06 dots.llm1 Technical Report Bi Huo et.al. 2506.05767 null
2025-06-05 Mixture-of-Experts Meets In-Context Reinforcement Learning Wenhao Wu et.al. 2506.05426 null
2025-06-05 Lifelong Evolution: Collaborative Learning between Large and Small Language Models for Continuous Emergent Fake News Detection Ziyi Zhou et.al. 2506.04739 null
2025-06-05 FlashDMoE: Fast Distributed MoE in a Single Kernel Osayamen Jonathan Aimuyo et.al. 2506.04667 link
2025-06-04 Resolving Task Objective Conflicts in Unified Multimodal Understanding and Generation via Task-Aware Mixture-of-Experts Jiaxing Zhang et.al. 2506.03591 null
2025-06-04 PC-MoE: Memory-Efficient and Privacy-Preserving Collaborative Training for Mixture-of-Experts LLMs Ze Yu Zhang et.al. 2506.02965 null
2025-06-03 Scaling Fine-Grained MoE Beyond 50B Parameters: Empirical Evaluation and Practical Insights Jakub Krajewski et.al. 2506.02890 null
2025-06-03 Brain-Like Processing Pathways Form in Models With Heterogeneous Experts Jack Cook et.al. 2506.02813 null
2025-06-04 MemoryOut: Learning Principal Features via Multimodal Sparse Filtering Network for Semi-supervised Video Anomaly Detection Juntong Li et.al. 2506.02535 null
2025-06-03 MidPO: Dual Preference Optimization for Safety and Helpfulness in Large Language Models via a Mixture of Experts Framework Yupeng Qi et.al. 2506.02460 null
2025-05-31 Enhancing Multimodal Continual Instruction Tuning with BranchLoRA Duzhen Zhang et.al. 2506.02041 null
2025-06-02 SPACE: Your Genomic Profile Predictor is a Powerful DNA Foundation Model Zhao Yang et.al. 2506.01833 link
2025-06-02 Mixture of Experts Provably Detect and Learn the Latent Cluster Structure in Gradient-Based Learning Ryotaro Kawata et.al. 2506.01656 null
2025-06-02 DeepSeek in Healthcare: A Survey of Capabilities, Risks, and Clinical Applications of Open-Source Large Language Models Jiancheng Ye et.al. 2506.01257 null
2025-06-01 Unlocking Personalized Knowledge in Federated Large Language Model: The Power of Mixture of Experts Fan Liu et.al. 2506.00965 null
2025-05-30 Mixture-of-Experts for Personalized and Semantic-Aware Next Location Prediction Shuai Liu et.al. 2505.24597 null
2025-05-30 Decoding Knowledge Attribution in Mixture-of-Experts: A Framework of Basic-Refinement Collaboration and Efficiency Analysis Junzhuo Li et.al. 2505.24593 null
2025-05-30 Mastering Massive Multi-Task Reinforcement Learning via Mixture-of-Expert Decision Transformer Yilun Kong et.al. 2505.24378 link
2025-05-30 GradPower: Powering Gradients for Faster Language Model Pre-Training Mingze Wang et.al. 2505.24275 null
2025-05-30 On the Expressive Power of Mixture-of-Experts for Structured Complex Tasks Mingze Wang et.al. 2505.24205 null
2025-05-29 Point-MoE: Towards Cross-Domain Generalization in 3D Semantic Segmentation via Mixture-of-Experts Xuweiyi Chen et.al. 2505.23926 null
2025-06-03 Noise-Robustness Through Noise: Asymmetric LoRA Adaption with Poisoning Expert Zhaokun Wang et.al. 2505.23868 null
2025-05-29 From Knowledge to Noise: CTIM-Rover and the Pitfalls of Episodic Memory in Software Engineering Agents Tobias Lindenbauer et.al. 2505.23422 link
2025-05-29 Context-Aware Semantic Communication for the Wireless Networks Guangyuan Liu et.al. 2505.23249 null
2025-05-29 Two Is Better Than One: Rotations Scale LoRAs Hongcan Guo et.al. 2505.23184 null
2025-05-28 HiDream-I1: A High-Efficient Image Generative Foundation Model with Sparse Diffusion Transformer Qi Cai et.al. 2505.22705 link
2025-05-28 Less, but Better: Efficient Multilingual Expansion for LLMs via Layer-wise Mixture-of-Experts Xue Zhang et.al. 2505.22582 null
2025-05-28 A Human-Centric Approach to Explainable AI for Personalized Education Vinitra Swamy et.al. 2505.22541 link
2025-05-28 Identity-Preserving Text-to-Image Generation via Dual-Level Feature Decoupling and Expert-Guided Fusion Kewen Chen et.al. 2505.22360 null
2025-05-28 Advancing Expert Specialization for Better MoE Hongcan Guo et.al. 2505.22323 null
2025-05-28 ForceVLA: Enhancing VLA Models with a Force-aware MoE for Contact-rich Manipulation Jiawen Yu et.al. 2505.22159 null
2025-05-28 AudioGenie: A Training-Free Multi-Agent Framework for Diverse Multimodality-to-Multiaudio Generation Yan Rong et.al. 2505.22053 null
2025-05-28 Vision-Language-Action Model with Open-World Embodied Reasoning from Pretrained Knowledge Zhongyi Zhou et.al. 2505.21906 null
2025-05-27 MedBridge: Bridging Foundation Vision-Language Models to Medical Image Diagnosis Yitong Li et.al. 2505.21698 null
2025-05-28 Pangu Pro MoE: Mixture of Grouped Experts for Efficient Sparsity Yehui Tang et.al. 2505.21411 null
2025-05-27 Unveiling Instruction-Specific Neurons & Experts: An Analytical Framework for LLM's Instruction-Following Capabilities Junyan Zhang et.al. 2505.21191 null
2025-05-27 Uni3D-MoE: Scalable Multimodal 3D Scene Understanding via Mixture of Experts Yue Zhang et.al. 2505.21079 null
2025-05-27 Multi-objective Large Language Model Alignment with Hierarchical Experts Zhuo Li et.al. 2505.20925 null
2025-05-26 FLAME-MoE: A Transparent End-to-End Research Platform for Mixture-of-Experts Language Models Hao Kang et.al. 2505.20225 link
2025-05-26 NEXT: Multi-Grained Mixture of Experts via Text-Modulation for Multi-Modal Object Re-ID Shihao Li et.al. 2505.20001 null
2025-05-26 Mosaic: Data-Free Knowledge Distillation via Mixture-of-Experts for Heterogeneous Distributed Environments Junming Liu et.al. 2505.19699 null
2025-05-26 MoESD: Unveil Speculative Decoding's Potential for Accelerating Sparse MoE Zongle Huang et.al. 2505.19645 null
2025-05-26 Rethinking Gating Mechanism in Sparse MoE: Handling Arbitrary Modality Inputs with Confidence-Guided Gate Liangwei Nathan Zheng et.al. 2505.19525 link
2025-05-26 WINA: Weight Informed Neuron Activation for Accelerating Large Language Model Inference Sihan Chen et.al. 2505.19427 link
2025-05-25 RankLLM: A Python Package for Reranking with LLMs Sahel Sharifymoghaddam et.al. 2505.19284 null
2025-05-25 I2MoE: Interpretable Multimodal Interaction-aware Mixture-of-Experts Jiayi Xin et.al. 2505.19190 link
2025-05-24 TrajMoE: Spatially-Aware Mixture of Experts for Unified Human Mobility Modeling Chonghua Han et.al. 2505.18670 null
2025-05-24 ThanoRA: Task Heterogeneity-Aware Multi-Task Low-Rank Adaptation Jian Liang et.al. 2505.18640 link
2025-05-24 Mod-Adapter: Tuning-Free and Versatile Multi-concept Personalization via Modulation Adapter Weizhi Zhong et.al. 2505.18612 null
2025-05-23 Enhancing CTR Prediction with De-correlated Expert Networks Jiancheng Wang et.al. 2505.17925 null
2025-05-23 PreMoe: Lightening MoEs on Constrained Memory by Expert Pruning and Retrieval Zehua Pei et.al. 2505.17639 null
2025-05-23 CoMoE: Contrastive Representation for Mixture-of-Experts in Parameter-Efficient Fine-tuning Jinyuan Feng et.al. 2505.17553 null
2025-05-23 MEGADance: Mixture-of-Experts Architecture for Genre-Aware 3D Dance Generation Kaixing Yang et.al. 2505.17543 null
2025-05-22 JanusDNA: A Powerful Bi-directional Hybrid DNA Foundation Model Qihao Duan et.al. 2505.17257 null
2025-05-22 DriveMoE: Mixture-of-Experts for Vision-Language-Action Model in End-to-End Autonomous Driving Zhenjie Yang et.al. 2505.16278 null
2025-05-22 DualComp: End-to-End Learning of a Unified Dual-Modality Lossless Compressor Yan Zhao et.al. 2505.16256 null
2025-05-21 Not All Models Suit Expert Offloading: On Local Routing Consistency of Mixture-of-Expert Models Jingcong Liang et.al. 2505.16056 link
2025-05-21 MoRE-Brain: Routed Mixture of Experts for Interpretable and Generalizable Cross-Subject fMRI Visual Decoding Yuxiang Wei et.al. 2505.15946 null
2025-05-21 CoLA: Collaborative Low-Rank Adaptation Yiyun Zhou et.al. 2505.15471 link
2025-05-22 Hunyuan-TurboS: Advancing Large Language Models through Mamba-Transformer Synergy and Adaptive Chain-of-Thought Tencent Hunyuan Team et.al. 2505.15431 null
2025-05-21 Efficient Data Driven Mixture-of-Expert Extraction from Trained Networks Uranik Berisha et.al. 2505.15414 null
2025-05-21 Time Tracker: Mixture-of-Experts-Enhanced Foundation Time Series Forecasting Model with Decoupled Training Pipelines Xiaohou Shi et.al. 2505.15151 null
2025-05-20 Multimodal Cultural Safety: Evaluation Frameworks and Alignment Strategies Haoyi Qiu et.al. 2505.14972 link
2025-05-20 Balanced and Elastic End-to-end Training of Dynamic LLMs Mohamed Wahib et.al. 2505.14864 null
2025-05-20 Solving MNIST with a globally trained Mixture of Quantum Experts Paolo Alessandro Xavier Tognini et.al. 2505.14789 null
2025-05-20 Two Experts Are All You Need for Steering Thinking: Reinforcing Cognitive Effort in MoE Reasoning Models Without Additional Training Mengru Wang et.al. 2505.14681 null
2025-05-21 Scaling and Enhancing LLM-based AVSR: A Sparse Mixture of Projectors Approach Umberto Cappellazzo et.al. 2505.14336 null
2025-05-20 FuxiMT: Sparsifying Large Language Models for Chinese-Centric Multilingual Machine Translation Shaolin Zhu et.al. 2505.14256 null
2025-05-20 THOR-MoE: Hierarchical Task-Guided and Context-Responsive Routing for Neural Machine Translation Yunlong Liang et.al. 2505.14173 null
2025-05-20 Multimodal Mixture of Low-Rank Experts for Sentiment Analysis and Emotion Recognition Shuo Zhang et.al. 2505.14143 null
2025-05-20 Local Mixtures of Experts: Essentially Free Test-Time Training via Model Merging Ryo Bertolissi et.al. 2505.14136 null
2025-05-20 StPR: Spatiotemporal Preservation and Routing for Exemplar-Free Video Class-Incremental Learning Huaijie Wang et.al. 2505.13997 null
2025-05-20 Towards Rehearsal-Free Continual Relation Extraction: Capturing Within-Task Variance with Adaptive Prompting Bao-Ngoc Dao et.al. 2505.13944 link
2025-05-20 U-SAM: An audio language Model for Unified Speech, Audio, and Music Understanding Ziqian Wang et.al. 2505.13880 link
2025-05-20 EfficientLLM: Efficiency in Large Language Models Zhengqing Yuan et.al. 2505.13840 null
2025-05-19 CompeteSMoE -- Statistically Guaranteed Mixture of Experts Training via Competition Nam V. Nguyen et.al. 2505.13380 link
2025-05-19 Occult: Optimizing Collaborative Communication across Experts for Accelerated Parallel MoE Training and Inference Shuqing Luo et.al. 2505.13345 link
2025-05-19 Seeing the Unseen: How EMoE Unveils Bias in Text-to-Image Diffusion Models Lucas Berry et.al. 2505.13273 null
2025-05-19 True Zero-Shot Inference of Dynamical Systems Preserving Long-Term Statistics Christoph Jürgen Hemmer et.al. 2505.13192 null
2025-05-19 Model Selection for Gaussian-gated Gaussian Mixture of Experts Using Dendrograms of Mixing Measures Tuan Thai et.al. 2505.13052 null
2025-05-18 Scene-Adaptive Motion Planning with Explicit Mixture of Experts and Interaction-Oriented Optimization Hongbiao Zhu et.al. 2505.12311 null
2025-05-20 Model Merging in Pre-training of Large Language Models Yunshui Li et.al. 2505.12082 null
2025-05-20 Multi-modal Collaborative Optimization and Expansion Network for Event-assisted Single-eye Expression Recognition Runduo Han et.al. 2505.12007 link
2025-05-17 MINGLE: Mixtures of Null-Space Gated Low-Rank Experts for Test-Time Continual Model Merging Zihuan Qiu et.al. 2505.11883 null
2025-05-17 Improving Coverage in Combined Prediction Sets with Weighted p-values Gina Wong et.al. 2505.11785 null
2025-05-16 MegaScale-MoE: Large-Scale Communication-Efficient Training of Mixture-of-Experts Models in Production Chao Jin et.al. 2505.11432 null
2025-05-16 MoE-CAP: Benchmarking Cost, Accuracy and Performance of Sparse Mixture-of-Experts Systems Yinsicheng Jiang et.al. 2505.11415 null
2025-05-16 A Fast Kernel-based Conditional Independence test with Application to Causal Discovery Oliver Schacht et.al. 2505.11085 null
2025-05-16 On DeepSeekMoE: Statistical Benefits of Shared Experts and Normalized Sigmoid Gating Huy Nguyen et.al. 2505.10860 null
2025-05-14 PT-MoE: An Efficient Finetuning Framework for Integrating Mixture-of-Experts into Prompt Tuning Zongqian Li et.al. 2505.09519 link
2025-05-14 Qwen3 Technical Report An Yang et.al. 2505.09388 link
2025-05-14 Insights into DeepSeek-V3: Scaling Challenges and Reflections on Hardware for AI Architectures Chenggang Zhao et.al. 2505.09343 null
2025-05-13 Toward Cost-Efficient Serving of Mixture-of-Experts with Asynchrony Shaoyu Wang et.al. 2505.08944 null
2025-05-13 PWC-MoE: Privacy-Aware Wireless Collaborative Mixture of Experts Yang Su et.al. 2505.08719 null
2025-05-13 AM-Thinking-v1: Advancing the Frontier of Reasoning at 32B Scale Yunjie Ji et.al. 2505.08311 null
2025-05-12 UMoE: Unifying Attention and FFN with Shared Experts Yuanhang Yang et.al. 2505.07260 null
2025-05-11 Seed1.5-VL Technical Report Dong Guo et.al. 2505.07062 null
2025-05-11 FreqMoE: Dynamic Frequency Enhancement for Neural PDE Solvers Tianyu Chen et.al. 2505.06858 null
2025-05-11 The power of fine-grained experts: Granularity boosts expressivity in Mixture of Experts Enric Boix-Adsera et.al. 2505.06839 null
2025-05-10 Gated Attention for Large Language Models: Non-linearity, Sparsity, and Attention-Sink-Free Zihan Qiu et.al. 2505.06708 link
2025-05-10 Emotion-Qwen: Training Hybrid Experts for Unified Emotion and General Vision-Language Understanding Dawei Huang et.al. 2505.06685 link
2025-05-10 QoS-Efficient Serving of Multiple Mixture-of-Expert LLMs Using Partial Runtime Reconfiguration HamidReza Imani et.al. 2505.06481 null
2025-05-12 FloE: On-the-Fly MoE Inference on Memory-constrained GPU Yuxin Zhou et.al. 2505.05950 null
2025-05-09 MxMoE: Mixed-precision Quantization for MoE with Accuracy and Performance Co-Design Haojie Duanmu et.al. 2505.05799 link
2025-05-08 Divide-and-Conquer: Cold-Start Bundle Recommendation via Mixture of Diffusion Experts Ming Li et.al. 2505.05035 null
2025-05-07 Pangu Ultra MoE: How to Train Your Big MoE on Ascend NPUs Yehui Tang et.al. 2505.04519 null
2025-05-07 SToLa: Self-Adaptive Touch-Language Framework with Tactile Commonsense Reasoning in Open-Ended Scenarios Ning Cheng et.al. 2505.04201 null
2025-05-07 LLM-e Guess: Can LLMs Capabilities Advance Without Hardware Progress? Teddy Foley et.al. 2505.04075 link
2025-05-07 Shadow Wireless Intelligence: Large Language Model-Driven Reasoning in Covert Communications Yuanai Xie et.al. 2505.04068 null
2025-05-06 Towards Smart Point-and-Shoot Photography Jiawan Li et.al. 2505.03638 null
2025-05-06 Faster MoE LLM Inference for Extremely Large Models Haoqi Yang et.al. 2505.03531 null
2025-05-06 STAR-Rec: Making Peace with Length Variance and Pattern Diversity in Sequential Recommendation Maolin Wang et.al. 2505.03484 null
2025-05-06 3D Gaussian Splatting Data Compression with Mixture of Priors Lei Liu et.al. 2505.03310 null
2025-05-05 Finger Pose Estimation for Under-screen Fingerprint Sensor Xiongjun Guan et.al. 2505.02481 link
2025-05-05 Multimodal Deep Learning-Empowered Beam Prediction in Future THz ISAC Systems Kai Zhang et.al. 2505.02381 null
2025-05-05 Optimizing LLMs for Resource-Constrained Environments: A Survey of Model Compression Techniques Sanjay Surendranath Girija et.al. 2505.02309 null
2025-05-04 Learning Heterogeneous Mixture of Scene Experts for Large-scale Neural Radiance Fields Zhenxing Mi et.al. 2505.02005 link
2025-05-03 Backdoor Attacks Against Patch-based Mixture of Experts Cedric Chan et.al. 2505.01811 link
2025-05-01 MoxE: Mixture of xLSTM Experts with Entropy-Aware Routing for Efficient Language Modeling Abdoul Majid O. Thiombiano et.al. 2505.01459 null
2025-05-02 Aggregation of Dependent Expert Distributions in Multimodal Variational Autoencoders Rogelio A Mancisidor et.al. 2505.01134 null
2025-05-02 CoCoAFusE: Beyond Mixtures of Experts via Model Fusion Aurelio Raffa Ugolini et.al. 2505.01105 null
2025-05-01 Improving Routing in Sparse Mixture of Experts with Graph of Tokens Tam Nguyen et.al. 2505.00792 null
2025-05-01 CICADA: Cross-Domain Interpretable Coding for Anomaly Detection and Adaptation in Multivariate Time Series Tian Lan et.al. 2505.00415 null
2025-05-01 Mixture of Sparse Attention: Content-Based Learnable Sparse Attention via Expert-Choice Routing Piotr Piękos et.al. 2505.00315 link
2025-04-30 Online Federation For Mixtures of Proprietary Agents with Black-Box Encoders Xuwei Yang et.al. 2505.00216 null
2025-04-29 TT-LoRA MoE: Unifying Parameter-Efficient Fine-Tuning and Sparse Mixture-of-Experts Pradip Kunwar et.al. 2504.21190 null
2025-04-29 Token-Level Prompt Mixture with Parameter-Free Routing for Federated Domain Generalization Shuai Gong et.al. 2504.21063 null
2025-04-26 PICO: Secure Transformers via Robust Prompt Isolation and Cybersecurity Oversight Ben Goertzel et.al. 2504.21029 null
2025-04-29 MambaMoE: Mixture-of-Spectral-Spatial-Experts State Space Model for Hyperspectral Image Classification Yichu Xu et.al. 2504.20509 null
2025-04-29 FT-MoE: Sustainable-learning Mixture of Experts Model for Fault-Tolerant Computing with Multiple Tasks Wenjing Xiao et.al. 2504.20446 null
2025-04-29 MicarVLMoE: A Modern Gated Cross-Aligned Vision-Language Mixture of Experts Model for Medical Image Captioning and Report Generation Amaan Izhar et.al. 2504.20343 link
2025-04-28 Accelerating Mixture-of-Experts Training with Adaptive Expert Replication Athinagoras Skiadopoulos et.al. 2504.19925 null
2025-04-28 Decentralization of Generative AI via Mixture of Experts for Wireless Networks: A Comprehensive Survey Yunting Xu et.al. 2504.19660 null
2025-04-28 ARTEMIS: Autoregressive End-to-End Trajectory Planning with Mixture of Experts for Autonomous Driving Renju Feng et.al. 2504.19580 link
2025-04-29 BadMoE: Backdooring Mixture-of-Experts LLMs via Optimizing Routing Triggers and Infecting Dormant Experts Qingyue Wang et.al. 2504.18598 null
2025-04-25 NoEsis: Differentially Private Knowledge Transfer in Modular LLM Adaptation Rob Romijnders et.al. 2504.18147 null
2025-04-28 Unveiling the Hidden: Movie Genre and User Bias in Spoiler Detection Haokai Zhang et.al. 2504.17834 link
2025-04-22 Compass-V2 Technical Report Sophia Maria et.al. 2504.15527 null
2025-04-21 Manifold Induced Biases for Zero-shot and Few-shot Detection of Generated Images Jonathan Brokman et.al. 2504.15470 link
2025-04-17 D$^{2}$MoE: Dual Routing and Dynamic Scheduling for Efficient On-Device MoE-based LLM Serving Haodong Wang et.al. 2504.15299 null
2025-04-23 MoE Parallel Folding: Heterogeneous Parallelism Mappings for Efficient Large-Scale MoE Model Training with Megatron Core Dennis Liu et.al. 2504.14960 null
2025-04-18 Multi-Type Context-Aware Conversational Recommender Systems via Mixture-of-Experts Jie Zou et.al. 2504.13655 null
2025-04-18 HAECcity: Open-Vocabulary Scene Understanding of City-Scale Point Clouds with Superpoint Graph Clustering Alexander Rusnak et.al. 2504.13590 null
2025-04-18 Dense Backpropagation Improves Training for Sparse Mixture-of-Experts Ashwinee Panda et.al. 2504.12463 link
2025-04-16 Unveiling Hidden Collaboration within Mixture-of-Experts in Large Language Models Yuanbo Tang et.al. 2504.12359 null
2025-04-16 Trend Filtered Mixture of Experts for Automated Gating of High-Frequency Flow Cytometry Data Sangwon Hyun et.al. 2504.12287 null
2025-04-16 MOS: Towards Effective Smart Contract Vulnerability Detection through Mixture-of-Experts Tuning of Large Language Models Hang Yuan et.al. 2504.12234 null
2025-04-15 Simulation-based inference for stochastic nonlinear mixed-effects models with applications in systems biology Henrik Häggström et.al. 2504.11279 link
2025-04-14 Correlative and Discriminative Label Grouping for Multi-Label Visual Prompt Tuning LeiLei Ma et.al. 2504.09990 null
2025-04-14 Multi-objective Bayesian Optimization With Mixed-categorical Design Variables for Expensive-to-evaluate Aeronautical Applications Nathalie Bartoli et.al. 2504.09930 null
2025-04-14 Plasticity-Aware Mixture of Experts for Learning Under QoE Shifts in Adaptive Video Streaming Zhiqiang He et.al. 2504.09906 null
2025-04-13 Mixture-of-Shape-Experts (MoSE): End-to-End Shape Dictionary Framework to Prompt SAM for Generalizable Medical Segmentation Jia Wei et.al. 2504.09601 null
2025-04-12 MoE-Lens: Towards the Hardware Limit of High-Throughput MoE LLM Serving Under Resource Constraints Yichao Yuan et.al. 2504.09345 null
2025-04-12 Mixture of Group Experts for Learning Invariant Representations Lei Kang et.al. 2504.09265 null
2025-04-11 RouterKT: Mixture-of-Experts for Knowledge Tracing Han Liao et.al. 2504.08989 link
2025-04-11 Regularized infill criteria for multi-objective Bayesian optimization with application to aircraft design Robin Grapin et.al. 2504.08671 null
2025-04-10 C3PO: Critical-Layer, Core-Expert, Collaborative Pathway Optimization for Test-Time Expert Re-Mixing Zhongyang Li et.al. 2504.07964 link
2025-04-11 Scaling Laws for Native Multimodal Models Mustafa Shukor et.al. 2504.07951 null
2025-04-10 Cluster-Driven Expert Pruning for Mixture-of-Experts Large Language Models Hongcheng Guo et.al. 2504.07807 link
2025-04-10 Adaptive Detection of Fast Moving Celestial Objects Using a Mixture of Experts and Physical-Inspired Neural Network Peng Jia et.al. 2504.07777 null
2025-04-10 Kimi-VL Technical Report Kimi Team et.al. 2504.07491 link
2025-04-09 MoEDiff-SR: Mixture of Experts-Guided Diffusion Model for Region-Adaptive MRI Super-Resolution Zhe Wang et.al. 2504.07308 link
2025-04-11 Holistic Capability Preservation: Towards Compact Yet Comprehensive Reasoning Models Ling Team et.al. 2504.07158 null
2025-04-09 Domain-Specific Pruning of Large Mixture-of-Experts Models with Few-shot Demonstrations Zican Dong et.al. 2504.06792 null
2025-04-09 FedMerge: Federated Personalization via Model Merging Shutong Chen et.al. 2504.06768 null
2025-04-08 S'MoRE: Structural Mixture of Residual Experts for LLM Fine-tuning Hanqing Zeng et.al. 2504.06426 null
2025-04-08 HybriMoE: Hybrid CPU-GPU Scheduling and Cache Management for Efficient MoE Inference Shuzhang Zhong et.al. 2504.05897 link
2025-04-08 Adaptive Substructure-Aware Expert Model for Molecular Property Prediction Tianyi Jiang et.al. 2504.05844 null
2025-04-10 Finding Fantastic Experts in MoEs: A Unified Study for Expert Dropping Strategies and Observations Ajay Jaiswal et.al. 2504.05586 null
2025-04-07 SUEDE: Shared Unified Experts for Physical-Digital Face Attack Detection Enhancement Zuying Xie et.al. 2504.04818 null
2025-04-06 On the Spatial Structure of Mixture-of-Experts in Transformers Daniel Bershatsky et.al. 2504.04444 null
2025-04-05 Collaboration and Controversy Among Experts: Rumor Early Detection by Tuning a Comment Generator Bing Wang et.al. 2504.04076 link
2025-04-04 HeterMoE: Efficient Training of Mixture-of-Experts Models on Heterogeneous GPUs Yongji Wu et.al. 2504.03871 null
2025-04-01 Detecting Financial Fraud with Hybrid Deep Learning: A Mix-of-Experts Approach to Sequential and Anomalous Patterns Diego Vallarino et.al. 2504.03750 null
2025-04-04 RingMoE: Mixture-of-Modality-Experts Multi-Modal Foundation Models for Universal Remote Sensing Image Interpretation Hanbo Bi et.al. 2504.03166 null
2025-04-03 TeleMoM: Consensus-Driven Telecom Intelligence via Mixture of Models Xinquan Wang et.al. 2504.02712 null
2025-04-07 MiLo: Efficient Quantized MoE Inference with Mixture of Low-Rank Compensators Beichen Huang et.al. 2504.02658 link
2025-04-07 MegaScale-Infer: Serving Mixture-of-Experts at Scale with Disaggregated Expert Parallelism Ruidong Zhu et.al. 2504.02263 null
2025-04-02 Advancing MoE Efficiency: A Collaboration-Constrained Routing (C2R) Strategy for Better Expert Parallelism Design Mohan Zhang et.al. 2504.01337 null
2025-04-01 Mixture-of-Experts for Distributed Edge Computing with Channel-Aware Gating Function Qiuchen Song et.al. 2504.00819 null
2025-04-01 DynMoLE: Boosting Mixture of LoRA Experts Fine-Tuning with a Hybrid Routing Mechanism Dengchun Li et.al. 2504.00661 link
2025-04-01 Continual Cross-Modal Generalization Yan Xia et.al. 2504.00561 null
2025-04-01 Mixture-of-Attack-Experts with Class Regularization for Unified Physical-Digital Face Attack Detection Shunxin Chen et.al. 2504.00458 null
2025-03-31 Unimodal-driven Distillation in Multimodal Emotion Recognition with Dynamic Fusion Jiagen Li et.al. 2503.23721 null
2025-03-30 Mixture of Routers Jia-Chen Zhang et.al. 2503.23362 null
2025-03-29 Beyond Standard MoE: Mixture of Latent Experts for Resource-Efficient Language Models Zehua Liu et.al. 2503.23100 null
2025-03-29 S2MoE: Robust Sparse Mixture of Experts via Stochastic Learning Giang Do et.al. 2503.23007 null
2025-03-29 Sparse Mixture of Experts as Unified Competitive Learning Giang Do et.al. 2503.22996 null
2025-04-01 Exploiting Mixture-of-Experts Redundancy Unlocks Multimodal Generative Abilities Raman Dutt et.al. 2503.22517 null
2025-03-27 RocketPPA: Ultra-Fast LLM-Based PPA Estimator at Code-Level Abstraction Armin Abdollahi et.al. 2503.21971 null
2025-03-27 iMedImage Technical Report Ran Wei et.al. 2503.21836 null
2025-03-27 LLaVA-CMoE: Towards Continual Mixture of Experts for Large Vision-Language Models Hengyuan Zhao et.al. 2503.21227 null
2025-03-26 Optimal Scaling Laws for Efficiency Gains in a Theoretical Transformer-Augmented Sectional MoE Framework Soham Sane et.al. 2503.20750 null
2025-03-26 UniSTD: Towards Unified Spatio-Temporal Learning across Diverse Disciplines Chen Tang et.al. 2503.20748 null
2025-03-26 Enhancing Multi-modal Models with Heterogeneous MoE Adapters for Fine-tuning Sashuai Zhou et.al. 2503.20633 null
2025-03-26 MoLe-VLA: Dynamic Layer-skipping Vision Language Action Model via Mixture-of-Layers for Efficient Robot Manipulation Rongyu Zhang et.al. 2503.20384 null
2025-03-26 Modality-Independent Brain Lesion Segmentation with Privacy-aware Continual Learning Yousef Sadegheih et.al. 2503.20326 link
2025-03-25 Resilient Sensor Fusion under Adverse Sensor Failures via Multi-Modal Expert Fusion Konyul Park et.al. 2503.19776 null
2025-03-25 BiPrompt-SAM: Enhancing Image Segmentation via Explicit Selection between Point and Text Prompts Suzhe Xu et.al. 2503.19769 null
2025-03-25 M$^2$CD: A Unified MultiModal Framework for Optical-SAR Change Detection with Mixture of Experts and Self-Distillation Ziyuan Liu et.al. 2503.19406 null
2025-03-27 Reimagining Memory Access for LLM Inference: Compression-Aware Memory Controller Design Rui Xie et.al. 2503.18869 null
2025-03-24 Galaxy Walker: Geometry-aware VLMs For Galaxy-scale Understanding Tianyu Chen et.al. 2503.18578 null
2025-03-24 SPMTrack: Spatio-Temporal Parameter-Efficient Fine-Tuning with Mixture of Experts for Scalable Visual Tracking Wenrui Cai et.al. 2503.18338 link
2025-03-23 Challenging Dataset and Multi-modal Gated Mixture of Experts Model for Remote Sensing Copy-Move Forgery Understanding Ze Zhang et.al. 2503.18104 link
2025-03-22 Every Sample Matters: Leveraging Mixture-of-Experts and High-Quality Data for Efficient and Accurate Code LLM Codefuse et.al. 2503.17793 null
2025-03-25 Expert Race: A Flexible Routing Strategy for Scaling Diffusion Transformer with Mixture of Experts Yike Yuan et.al. 2503.16057 null
2025-03-21 UniCoRN: Latent Diffusion-based Unified Controllable Image Restoration Network across Multiple Degradations Debabrata Mandal et.al. 2503.15868 null
2025-05-27 Mixture of Lookup Experts Shibo Jie et.al. 2503.15798 null
2025-03-21 Leveraging MoE-based Large Language Model for Zero-Shot Multi-Task Semantic Communication Sin-Yu Huang et.al. 2503.15722 null
2025-03-19 SemEval-2025 Task 1: AdMIRe -- Advancing Multimodal Idiomaticity Representation Thomas Pickard et.al. 2503.15358 null
2025-03-21 Body-Hand Modality Expertized Networks with Cross-attention for Fine-grained Skeleton Action Recognition Seungyeon Cho et.al. 2503.14960 null
2025-03-18 Core-Periphery Principle Guided State Space Model for Functional Connectome Classification Minheng Chen et.al. 2503.14655 null
2025-03-18 MAST-Pro: Dynamic Mixture-of-Experts for Adaptive Segmentation of Pan-Tumors with Knowledge-Driven Prompts Runqi Meng et.al. 2503.14355 null
2025-03-18 SNAKE: A Sustainable and Multi-functional Traffic Analysis System utilizing Specialized Large-Scale Models with a Mixture of Experts Architecture Tian Qin et.al. 2503.13808 null
2025-03-17 Optimal Expert Selection for Distributed Mixture-of-Experts at the Wireless Edge Shengling Qin et.al. 2503.13421 null
2025-03-17 Channel Estimation for Pinching-Antenna Systems (PASS) Jian Xiao et.al. 2503.13268 null
2025-03-17 Federated Mixture-of-Expert for Non-Overlapped Cross-Domain Sequential Recommendation Yu Liu et.al. 2503.13254 null
2025-03-16 Fast filtering of non-Gaussian models using Amortized Optimal Transport Maps Mohammad Al-Jarrah et.al. 2503.12633 link
2025-03-16 MoECollab: Democratizing LLM Development Through Collaborative Mixture of Experts Harshit et.al. 2503.12592 null
2025-03-16 MExD: An Expert-Infused Diffusion Model for Whole-Slide Image Classification Jianwei Zhao et.al. 2503.12401 null
2025-03-15 Adaptive Mixture of Experts Learning for Robust Audio Spoofing Detection Qixian Chen et.al. 2503.12010 null
2025-03-14 FedALT: Federated Fine-Tuning through Adaptive Local Training with Rest-of-the-World LoRA Jieming Bian et.al. 2503.11880 null
2025-03-14 A Review of DeepSeek Models' Key Innovative Techniques Chengen Wang et.al. 2503.11486 null
2025-03-14 MoLEx: Mixture of Layer Experts for Finetuning with Sparse Upcycling Rachel S. Y. Teo et.al. 2503.11144 link
2025-03-13 Samoyeds: Accelerating MoE Models with Structured Sparsity Leveraging Sparse Tensor Cores Chenpeng Wu et.al. 2503.10725 link
2025-03-14 dFLMoE: Decentralized Federated Learning via Mixture of Experts for Medical Data Analysis Luyuan Xie et.al. 2503.10412 null
2025-03-13 StableFusion: Continual Video Retrieval via Frame Adaptation Zecheng Zhao et.al. 2503.10111 link
2025-03-12 Double-Stage Feature-Level Clustering-Based Mixture of Experts Framework Bakary Badjie et.al. 2503.09504 null
2025-03-12 Towards Robust Multimodal Representation: A Unified Approach with Adaptive Experts and Alignment Nazanin Moradinasab et.al. 2503.09498 link
2025-03-12 Astrea: A MOE-based Visual Understanding Model with Progressive Alignment Xiaoda Yang et.al. 2503.09445 null
2025-03-12 Automatic Operator-level Parallelism Planning for Distributed Deep Learning -- A Mixed-Integer Programming Approach Ruifeng She et.al. 2503.09357 null
2025-03-12 Priority-Aware Preemptive Scheduling for Mixed-Priority Workloads in MoE Inference Mohammad Siavashi et.al. 2503.09304 null
2025-03-13 FaVChat: Unlocking Fine-Grained Facial Video Understanding with Multimodal Large Language Models Fufangchen Zhao et.al. 2503.09158 null
2025-03-11 MoE-Loco: Mixture of Experts for Multitask Locomotion Runhan Huang et.al. 2503.08564 null
2025-03-11 Accelerating MoE Model Inference with Expert Sharding Oana Balmau et.al. 2503.08467 null
2025-03-11 Uni$\textbf{F}^2$ace: Fine-grained Face Understanding and Generation with Unified Multimodal Models Junzhe Li et.al. 2503.08120 null
2025-03-11 MoRE: Unlocking Scalability in Reinforcement Learning for Quadruped Vision-Language-Action Models Han Zhao et.al. 2503.08007 null
2025-03-10 GM-MoE: Low-Light Enhancement with Gated-Mechanism Mixture-of-Experts Minwen Liao et.al. 2503.07417 null
2025-03-10 A Comprehensive Survey of Mixture-of-Experts: Algorithms, Theory, and Applications Siyuan Mu et.al. 2503.07137 link
2025-03-10 VMTS: Vision-Assisted Teacher-Student Reinforcement Learning for Multi-Terrain Locomotion in Bipedal Robots Fu Chen et.al. 2503.07049 link
2025-03-10 ResMoE: Space-efficient Compression of Mixture of Experts LLMs via Residual Restoration Mengting Ai et.al. 2503.06881 link
2025-03-10 eMoE: Task-aware Memory Efficient Mixture-of-Experts-Based (MoE) Model Inference Suraiya Tairin et.al. 2503.06823 null
2025-03-09 MoFE: Mixture of Frozen Experts Architecture Jean Seo et.al. 2503.06491 null
2025-03-09 Swift Hydra: Self-Reinforcing Generative Framework for Anomaly Detection with Multiple Mamba Models Nguyen Do et.al. 2503.06413 link
2025-03-08 MoEMoE: Question Guided Dense and Scalable Sparse Mixture-of-Expert for Multi-source Multi-modal Answering Vinay Kumar Verma et.al. 2503.06296 null
2025-03-08 A Novel Trustworthy Video Summarization Algorithm Through a Mixture of LoRA Experts Wenzhuo Du et.al. 2503.06064 null
2025-03-08 MANDARIN: Mixture-of-Experts Framework for Dynamic Delirium and Coma Prediction in ICU Patients: Development and Validation of an Acute Brain Dysfunction Prediction Model Miguel Contreras et.al. 2503.06059 null
2025-03-07 Symbolic Mixture-of-Experts: Adaptive Skill-based Routing for Heterogeneous Reasoning Justin Chih-Yao Chen et.al. 2503.05641 null
2025-03-07 FMT: A Multimodal Pneumonia Detection Model Based on Stacking MOE Framework Jingyu Xu et.al. 2503.05626 null
2025-03-07 Linear-MoE: Linear Sequence Modeling Meets Mixture-of-Experts Weigao Sun et.al. 2503.05447 link
2025-03-07 Every FLOP Counts: Scaling a 300B Mixture-of-Experts LING LLM without Premium GPUs Ling Team et.al. 2503.05139 null
2025-03-07 Capacity-Aware Inference: Mitigating the Straggler Effect in Mixture of Experts Shwai He et.al. 2503.05066 null
2025-03-06 Continual Pre-training of MoEs: How robust is your router? Benjamin Thérien et.al. 2503.05029 null
2025-03-06 Predictable Scale: Part I -- Optimal Hyperparameter Scaling Law in Large Language Model Pretraining Houyi Li et.al. 2503.04715 null
2025-03-07 Question-Aware Gaussian Experts for Audio-Visual Question Answering Hongyeob Kim et.al. 2503.04459 link
2025-03-07 Speculative MoE: Communication Efficient Parallel MoE Inference with Speculative Token and Expert Pre-scheduling Yan Li et.al. 2503.04398 null
2025-03-06 A Generalist Cross-Domain Molecular Learning Framework for Structure-Based Drug Discovery Yiheng Zhu et.al. 2503.04362 null
2025-03-06 DM-Adapter: Domain-Aware Mixture-of-Adapters for Text-Based Person Retrieval Yating Liu et.al. 2503.04144 null
2025-03-05 VoiceGRPO: Modern MoE Transformers with Group Relative Policy Optimization GRPO for AI Voice Health Care Applications on Voice Pathology Detection Enkhtogtokh Togootogtokh et.al. 2503.03797 link
2025-03-05 Small but Mighty: Enhancing Time Series Forecasting with Lightweight LLMs Haoran Fan et.al. 2503.03594 link
2025-03-06 Convergence Rates for Softmax Gating Mixture of Experts Huy Nguyen et.al. 2503.03213 null
2025-03-04 MX-Font++: Mixture of Heterogeneous Aggregation Experts for Few-shot Font Generation Weihang Wang et.al. 2503.02799 link
2025-03-04 FinArena: A Human-Agent Collaboration Framework for Financial Market Analysis and Forecasting Congluo Xu et.al. 2503.02692 null
2025-03-04 Union of Experts: Adapting Hierarchical Routing to Equivalently Decomposed Transformer Yujiao Yang et.al. 2503.02495 link
2025-03-04 Tabby: Tabular Data Synthesis with Language Models Sonia Cromp et.al. 2503.02152 null
2025-03-03 ECG-EmotionNet: Nested Mixture of Expert (NMoE) Adaptation of ECG-Foundation Model for Driver Emotion Recognition Nastaran Mansourian et.al. 2503.01750 null
2025-03-03 Effective High-order Graph Representation Learning for Credit Card Fraud Detection Yao Zou et.al. 2503.01556 null
2025-03-03 DeRS: Towards Extremely Efficient Upcycled Mixture-of-Experts Models Yongqi Huang et.al. 2503.01359 null
2025-03-03 PROPER: A Progressive Learning Framework for Personalized Large Language Models with Group-Level Adaptation Linhai Zhang et.al. 2503.01303 null
2025-03-03 Unify and Anchor: A Context-Aware Transformer for Cross-Domain Time Series Forecasting Xiaobin Hong et.al. 2503.01157 null
2025-03-02 Explainable Classifier for Malignant Lymphoma Subtyping via Cell Graph and Image Fusion Daiki Nishiyama et.al. 2503.00925 null
2025-03-01 R2-T2: Re-Routing in Test-Time for Multimodal Mixture-of-Experts Zhongyang Li et.al. 2502.20395 link
2025-02-27 Mixture of Experts for Recognizing Depression from Interview and Reading Tasks Loukas Ilias et.al. 2502.20213 null
2025-02-27 Mixture of Experts-augmented Deep Unfolding for Activity Detection in IRS-aided Systems Zeyi Ren et.al. 2502.20183 null
2025-02-27 UniCodec: Unified Audio Codec with Single Domain-Adaptive Codebook Yidi Jiang et.al. 2502.20067 null
2025-03-01 Comet: Fine-grained Computation-communication Overlapping for Mixture-of-Experts Shulai Zhang et.al. 2502.19811 link
2025-02-26 Drop-Upcycling: Training Sparse Mixture of Experts with Partial Re-initialization Taishi Nakamura et.al. 2502.19261 null
2025-02-26 OneRec: Unifying Retrieve and Rank with Generative Recommender and Iterative Preference Alignment Jiaxin Deng et.al. 2502.18965 null
2025-02-25 Generative AI-enabled Wireless Communications for Robust Low-Altitude Economy Networking Changyuan Zhao et.al. 2502.18118 null
2025-02-24 The Empirical Impact of Reducing Symmetries on the Performance of Deep Ensembles and MoE Andrei Chernov et.al. 2502.17391 null
2025-02-24 Delta Decompression for MoE-based LLMs Compression Hao Gu et.al. 2502.17298 link
2025-02-24 Evaluating Expert Contributions in a MoE LLM for Quiz-Based Tasks Andrei Chernov et.al. 2502.17187 null
2025-02-24 Muon is Scalable for LLM Training Jingyuan Liu et.al. 2502.16982 link
2025-02-24 BigMac: A Communication-Efficient Mixture-of-Experts Model Structure for Fast Training and Inference Zewen Jin et.al. 2502.16927 null
2025-02-24 ENACT-Heart -- ENsemble-based Assessment Using CNN and Transformer on Heart Sounds Jiho Han et.al. 2502.16914 null
2025-02-26 Make LoRA Great Again: Boosting LoRA with Adaptive Singular Values and Mixture-of-Experts Optimization Alignment Chenghao Fan et.al. 2502.16894 link
2025-02-22 An Autonomous Network Orchestration Framework Integrating Large Language Models with Continual Reinforcement Learning Masoud Shokrnezhad et.al. 2502.16198 null
2025-02-21 A fast convergence algorithm based on binary integer programming for expert load balancing in MoE LLMs Yuan Sun et.al. 2502.15451 link
2025-02-21 Tight Clusters Make Specialized Experts Stefan K. Nielsen et.al. 2502.15315 link
2025-02-21 Multimodal Graph-Based Variational Mixture of Experts Network for Zero-Shot Multimodal Information Extraction Baohang Zhou et.al. 2502.15290 link
2025-02-20 Ray-Tracing for Conditionally Activated Neural Networks Claudio Gallicchio et.al. 2502.14788 null
2025-02-21 ChatVLA: Unified Multimodal Understanding and Robot Control with Vision-Language-Action Model Zhongyi Zhou et.al. 2502.14420 link
2025-02-19 Unraveling the Localized Latents: Learning Stratified Manifold Structures in LLM Embedding Space with Sparse Mixture-of-Experts Xin Li et.al. 2502.13577 null
2025-02-18 MoBA: Mixture of Block Attention for Long-Context LLMs Enzhe Lu et.al. 2502.13189 link
2025-02-18 Every Expert Matters: Towards Effective Knowledge Distillation for Mixture-of-Experts Language Models Gyeongman Kim et.al. 2502.12947 null
2025-02-18 DSMoE: Matrix-Partitioned Experts with Dynamic Routing for Computation-Efficient Dense LLMs Minxuan Lv et.al. 2502.12455 null
2025-02-17 From Dense to Dynamic: Token-Difficulty Driven MoEfication of Pre-Trained LLMs Kumari Nishu et.al. 2502.12325 null
2025-02-17 Accurate Expert Predictions in MoE Inference via Cross-Layer Gate Zhiyuan Fang et.al. 2502.12224 null
2025-02-17 How to Upscale Neural Networks with Scaling Law? A Survey and Practical Guidelines Ayan Sengupta et.al. 2502.12051 null
2025-02-17 Connector-S: A Survey of Connectors in Multi-modal Large Language Models Xun Zhu et.al. 2502.11453 null
2025-02-16 Mixture of Tunable Experts - Behavior Modification of DeepSeek-R1 at Inference Time Robert Dahlke et.al. 2502.11096 null
2025-02-16 ClimateLLM: Efficient Weather Forecasting via Frequency-Aware Large Language Models Shixuan Li et.al. 2502.11059 null
2025-02-15 Semantic Specialization in MoE Appears with Scale: A Study of DeepSeek R1 Expert Specialization Matthew Lyle Olson et.al. 2502.10928 null
2025-02-12 Heterogeneous Mixture of Experts for Remote Sensing Image Super-Resolution Bowen Chen et.al. 2502.09654 link
2025-02-14 Eidetic Learning: an Efficient and Provable Solution to Catastrophic Forgetting Nicholas Dronen et.al. 2502.09500 link
2025-02-12 The MoE-Empowered Edge LLMs Deployment: Architecture, Challenges, and Opportunities Ning Li et.al. 2502.08381 null
2025-02-12 Mixture of Decoupled Message Passing Experts with Entropy Constraint for General Node Classification Xuanze Chen et.al. 2502.08083 null
2025-02-13 Training Sparse Mixture Of Experts Text Embedding Models Zach Nussbaum et.al. 2502.07972 link
2025-02-11 Memory Analysis on the Training Course of DeepSeek Models Ping Zhang et.al. 2502.07846 null
2025-02-11 MoENAS: Mixture-of-Expert based Neural Architecture Search for jointly Accurate, Fair, and Robust Edge Deep Neural Networks Lotfi Abdelkrim Mecharbat et.al. 2502.07422 null
2025-02-11 Online Aggregation of Trajectory Predictors Alex Tong et.al. 2502.07178 null
2025-02-09 Klotski: Efficient Mixture-of-Expert Inference via Expert-Aware Multi-Batch Pipeline Zhiyuan Fang et.al. 2502.06888 null
2025-02-10 MoETuner: Optimized Mixture of Expert Serving with Balanced Expert Placement and Token Routing Seokjin Go et.al. 2502.06643 null
2025-02-10 Jakiro: Boosting Speculative Decoding with Decoupled Multi-Head via MoE Haiduo Huang et.al. 2502.06282 link
2025-02-10 Fair-MoE: Fairness-Oriented Mixture of Experts in Vision-Language Models Peiran Wang et.al. 2502.06094 null
2025-02-08 Mol-MoE: Training Preference-Guided Routers for Molecule Generation Diego Calanzone et.al. 2502.05633 link
2025-02-08 UbiMoE: A Ubiquitous Mixture-of-Experts Vision Transformer Accelerator With Hybrid Computation Pattern on FPGA Jiale Dong et.al. 2502.05602 link
2025-02-07 fMoE: Fine-Grained Expert Offloading for Large Mixture-of-Experts Serving Hanfei Yu et.al. 2502.05370 null
2025-02-07 Towards Foundational Models for Dynamical System Reconstruction: Hierarchical Meta-Learning via Mixture of Experts Roussel Desmond Nzoyem et.al. 2502.05335 null
2025-02-07 Joint MoE Scaling Laws: Mixture of Experts Can Be Memory Efficient Jan Ludziejewski et.al. 2502.05172 null
2025-02-06 Mixture of neural operator experts for learning boundary conditions and model selection Dwyer Deighan et.al. 2502.04562 null
2025-02-06 CMoE: Fast Carving of Mixture-of-Experts for Efficient LLM Inference Zehua Pei et.al. 2502.04416 link
2025-02-06 Rank Also Matters: Hierarchical Configuration for Mixture of Adapter Experts in LLM Fine-Tuning Peizhuang Cong et.al. 2502.03884 null
2025-02-05 (GG) MoE vs. MLP on Tabular Data Andrei Chernov et.al. 2502.03608 null
2025-02-05 RepLoRA: Reparameterizing Low-Rank Adaptation via the Perspective of Mixture of Experts Tuan Truong et.al. 2502.03044 null
2025-02-05 On Zero-Initialized Attention: Optimal Prompt and Gating Factor Estimation Nghiem T. Diep et.al. 2502.03029 null
2025-02-05 Scaling Laws for Upcycling Mixture-of-Experts Language Models Seng Pei Liew et.al. 2502.03009 null
2025-02-04 ReGNet: Reciprocal Space-Aware Long-Range Modeling and Multi-Property Prediction for Crystals Jianan Nie et.al. 2502.02748 null
2025-02-04 Hecate: Unlocking Efficient Sparse Model Training via Fully Sharded Sparse Data Parallelism Yuhao Qing et.al. 2502.02581 null
2025-02-05 Brief analysis of DeepSeek R1 and its implications for Generative AI Sarah Mercer et.al. 2502.02523 null
2025-02-04 M2R2: Mixture of Multi-Rate Residuals for Efficient Transformer Inference Nikhil Bhendawade et.al. 2502.02040 null
2025-02-05 MJ-VIDEO: Fine-Grained Benchmarking and Rewarding Video Preferences in Video Generation Haibo Tong et.al. 2502.01719 null
2025-02-04 MergeME: Model Merging Techniques for Homogeneous and Heterogeneous MoEs Yuhang Zhou et.al. 2502.00997 null
2025-02-03 CLIP-UP: A Simple and Efficient Mixture-of-Experts CLIP Training Recipe with Sparse Upcycling Xinze Wang et.al. 2502.00965 null
2025-02-02 UniGraph2: Learning a Unified Embedding Space to Bind Multimodal Graphs Yufei He et.al. 2502.00806 link
2025-02-02 Distribution-aware Fairness Learning in Medical Image Segmentation From A Control-Theoretic Perspective Yujin Oh et.al. 2502.00619 link
2025-02-01 PM-MOE: Mixture of Experts on Private Model Parameters for Personalized Federated Learning Yu Feng et.al. 2502.00354 link
2025-02-01 Sigmoid Self-Attention is Better than Softmax Self-Attention: A Mixture-of-Experts Perspective Fanqi Yan et.al. 2502.00281 null
2025-01-31 Pheromone-based Learning of Optimal Reasoning Paths Anirudh Chari et.al. 2501.19278 null
2025-01-31 Adaptive Prompt: Unlocking the Power of Visual Prompt Tuning Minh Le et.al. 2501.18936 null
2025-01-30 MolGraph-xLSTM: A graph-based dual-level xLSTM framework with multi-head mixture-of-experts for enhanced molecular representation and interpretability Yan Sun et.al. 2501.18439 null
2025-01-29 Free Agent in Agent-Based Mixture-of-Experts Generative AI Framework Jung-Hua Liu et.al. 2501.17903 null
2025-01-29 Heuristic-Informed Mixture of Experts for Link Prediction in Multilayer Networks Lucio La Cava et.al. 2501.17557 null
2025-01-28 3D-MoE: A Mixture-of-Experts Multi-modal LLM for 3D Vision and Pose Diffusion via Rectified Flow Yueen Ma et.al. 2501.16698 null
2025-01-27 MoEVD: Enhancing Vulnerability Detection by Mixture-of-Experts (MoE) Xu Yang et.al. 2501.16454 null
2025-01-27 Static Batching of Irregular Workloads on GPUs: Framework and Application to Efficient MoE Model Inference Yinghan Li et.al. 2501.16103 null
2025-01-25 ToMoE: Converting Dense Large Language Models to Mixture-of-Experts through Dynamic Structural Pruning Shangqian Gao et.al. 2501.15316 null
2025-01-25 FreqMoE: Enhancing Time Series Forecasting through Frequency Decomposition Mixture of Experts Ziqi Liu et.al. 2501.15125 link
2025-01-25 Each Rank Could be an Expert: Single-Ranked Mixture of Experts LoRA for Multi-Task Learning Ziyu Zhao et.al. 2501.15103 null
2025-01-24 Mean-field limit from general mixtures of experts to quantum neural networks Anderson Melchor Hernandez et.al. 2501.14660 null
2025-01-24 Hierarchical Time-Aware Mixture of Experts for Multi-Modal Sequential Recommendation Shengzhe Zhang et.al. 2501.14269 link
2025-01-24 Sparse Mixture-of-Experts for Non-Uniform Noise Reduction in MRI Images Zeyun Deng et.al. 2501.14198 null
2025-01-23 CSAOT: Cooperative Multi-Agent System for Active Object Tracking Hy Nguyen et.al. 2501.13994 null
2025-01-22 Autonomy-of-Experts Models Ang Lv et.al. 2501.13074 null
2025-01-22 LLM4WM: Adapting LLM for Wireless Multi-Tasking Xuanyu Liu et.al. 2501.12983 null
2025-01-22 UniUIR: Considering Underwater Image Restoration as An All-in-One Learner Xu Zhang et.al. 2501.12981 null
2025-01-22 BLR-MoE: Boosted Language-Routing Mixture of Experts for Domain-Robust Multilingual E2E ASR Guodong Ma et.al. 2501.12602 null
2025-01-21 Modality Interactive Mixture-of-Experts for Fake News Detection Yifan Liu et.al. 2501.12431 link
2025-01-21 SCFCRC: Simultaneously Counteract Feature Camouflage and Relation Camouflage for Fraud Detection Xiaocheng Zhang et.al. 2501.12430 null
2025-01-21 Parameters vs FLOPs: Scaling Laws for Optimal Sparsity for Mixture-of-Experts Language Models Samira Abnar et.al. 2501.12370 null
2025-01-21 MoGERNN: An Inductive Traffic Predictor for Unobserved Locations in Dynamic Sensing Networks Qishen Zhou et.al. 2501.12281 link
2025-01-21 Demons in the Detail: On Implementing Load Balancing Loss for Training Specialized Mixture-of-Expert Models Zihan Qiu et.al. 2501.11873 null
2025-01-18 FSMoE: A Flexible and Scalable Training System for Sparse Mixture-of-Experts Models Xinglin Pan et.al. 2501.10714 null
2025-01-17 OMoE: Diversifying Mixture of Low-Rank Adaptation by Orthogonal Finetuning Jinyuan Feng et.al. 2501.10062 null
2025-01-17 LLM-Based Routing in Mixture of Experts: A Novel Framework for Trading Kuan-Ming Liu et.al. 2501.09636 null
2025-01-14 MiniMax-01: Scaling Foundation Models with Lightning Attention MiniMax et.al. 2501.08313 null
2025-01-14 GRAPHMOE: Amplifying Cognitive Depth of Mixture-of-Experts Network via Introducing Self-Rethinking Mechanism Chen Tang et.al. 2501.07890 null
2025-01-18 PSReg: Prior-guided Sparse Mixture of Experts for Point Cloud Registration Xiaoshui Huang et.al. 2501.07762 null
2025-01-13 A Multi-Modal Deep Learning Framework for Pan-Cancer Prognosis Binyu Zhang et.al. 2501.07016 link
2025-01-12 Transforming Vision Transformer: Towards Efficient Multi-Task Asynchronous Learning Hanwen Zhong et.al. 2501.06884 link
2025-01-10 TAMER: A Test-Time Adaptive MoE-Driven Framework for EHR Representation Learning Yinghao Zhu et.al. 2501.05661 link
2025-01-09 Optimizing Distributed Deployment of Mixture-of-Experts Model Inference in Serverless Computing Mengfan Liu et.al. 2501.05313 null
2025-01-07 LiMoE: Mixture of LiDAR Representation Learners from Automotive Scenes Xiang Xu et.al. 2501.04004 link
2025-01-07 mFabric: An Efficient and Scalable Fabric for Mixture-of-Experts Training Xudong Liao et.al. 2501.03905 null
2025-01-08 Mixture-of-Experts Graph Transformers for Interpretable Particle Collision Detection Donatella Genovese et.al. 2501.03432 null
2025-01-12 Fresh-CL: Feature Realignment through Experts on Hypersphere in Continual Learning Zhongyi Zhou et.al. 2501.02198 null
2025-01-03 MoVE-KD: Knowledge Distillation for VLMs with Mixture of Visual Encoders Jiajun Cao et.al. 2501.01709 null
2025-01-01 REM: A Scalable Reinforced Multi-Expert Framework for Multiplex Influence Maximization Huyen Nguyen et.al. 2501.00779 null
2025-01-06 Superposition in Transformers: A Novel Way of Building Mixture of Experts Ayoub Ben Chaliah et.al. 2501.00530 link
2024-12-31 CNC: Cross-modal Normality Constraint for Unsupervised Multi-class Anomaly Detection Xiaolei Wang et.al. 2501.00346 null
2024-12-29 Multimodal Variational Autoencoder: a Barycentric View Peijie Qiu et.al. 2412.20487 null
2024-12-29 A Comprehensive Framework for Reliable Legal AI: Combining Specialized Expert Systems and Adaptive Refinement Sidra Nasir et.al. 2412.20468 null
2024-12-28 UniRestorer: Universal Image Restoration via Adaptively Estimating Image Degradation at Proper Granularity Jingbo Lin et.al. 2412.20157 link
2024-12-28 Distilled Transformers with Locally Enhanced Global Representations for Face Forgery Detection Yaning Zhang et.al. 2412.20156 null
2024-12-27 DeepSeek-V3 Technical Report DeepSeek-AI et.al. 2412.19437 link
2024-12-26 AskChart: Universal Chart Understanding through Textual Enhancement Xudong Yang et.al. 2412.19146 link
2024-12-30 Graph Mixture of Experts and Memory-augmented Routers for Multivariate Time Series Anomaly Detection Xiaoyu Huang et.al. 2412.19108 null
2024-12-24 Modeling the Centaur: Human-Machine Synergy in Sequential Decision Making David Shoresh et.al. 2412.18593 link
2024-12-24 BIG-MoE: Bypass Isolated Gating MoE for Generalized Multimodal Face Anti-Spoofing Yingjie Ma et.al. 2412.18065 link
2024-12-23 UME: Upcycling Mixture-of-Experts for Scalable and Efficient Automatic Speech Recognition Li Fu et.al. 2412.17507 null
2024-12-23 BrainMAP: Learning Multiple Activation Pathways in Brain Networks Song Wang et.al. 2412.17404 link
2024-12-22 Part-Of-Speech Sensitivity of Routers in Mixture of Experts Models Elie Antoine et.al. 2412.16971 null
2024-12-20 Theory of Mixture-of-Experts for Mobile Edge Computing Hongbo Li et.al. 2412.15690 null
2024-12-19 MoEtion: Efficient and Reliable Checkpointing for Mixture-of-Experts Models at Scale Swapnil Gandhi et.al. 2412.15411 null
2024-12-19 Qwen2.5 Technical Report Qwen et.al. 2412.15115 link
2024-12-19 ReMoE: Fully Differentiable Mixture-of-Experts with ReLU Routing Ziteng Wang et.al. 2412.14711 link
2024-12-18 A Survey on Inference Optimization Techniques for Mixture of Experts Models Jiacheng Liu et.al. 2412.14219 link
2024-12-18 SEKE: Specialised Experts for Keyword Extraction Matej Martinc et.al. 2412.14087 link
2024-12-18 MedCoT: Medical Chain of Thought via Hierarchical Expert Jiaxiang Liu et.al. 2412.13736 link
2024-12-17 SMOSE: Sparse Mixture of Shallow Experts for Interpretable Reinforcement Learning in Continuous Control Tasks Mátyás Vincze et.al. 2412.13053 link
2024-12-17 Efficient Diffusion Transformer Policies with Mixture of Expert Denoisers for Multitask Learning Moritz Reuss et.al. 2412.12953 null
2024-12-17 CAMEL: Cross-Attention Enhanced Mixture-of-Experts and Language Bias for Code-Switching Speech Recognition He Wang et.al. 2412.12760 null
2024-12-16 Investigating Mixture of Experts in Dense Retrieval Effrosyni Sokli et.al. 2412.11864 null
2024-12-18 Wonderful Matrices: Combining for a More Efficient and Effective Foundation Model Architecture Jingze Shi et.al. 2412.11834 link
2024-12-16 Towards Adversarial Robustness of Model-Level Mixture-of-Experts Architectures for Semantic Segmentation Svetlana Pavlitska et.al. 2412.11608 link
2024-12-16 Enhancing Healthcare Recommendation Systems with a Multimodal LLMs-based MOE Architecture Jingyu Xu et.al. 2412.11557 null
2024-12-14 DeMo: Decoupled Feature-Based Mixture of Experts for Multi-Modal Object Re-Identification Yuhao Wang et.al. 2412.10650 link
2024-12-13 DeepSeek-VL2: Mixture-of-Experts Vision-Language Models for Advanced Multimodal Understanding Zhiyu Wu et.al. 2412.10302 link
2024-12-13 Llama 3 Meets MoE: Efficient Upcycling Aditya Vavre et.al. 2412.09952 link
2024-12-12 Memory Layers at Scale Vincent-Pierre Berges et.al. 2412.09764 link
2024-12-12 Towards a Multimodal Large Language Model with Pixel-Level Insight for Biomedicine Xiaoshuang Huang et.al. 2412.09278 link
2024-12-12 Adaptive Prompting for Continual Relation Extraction: A Within-Task Variance Perspective Minh Le et.al. 2412.08285 null
2024-12-11 Mixture of Experts Meets Decoupled Message Passing: Towards General and Adaptive Node Classification Xuanze Chen et.al. 2412.08193 link
2024-12-10 MoE-CAP: Cost-Accuracy-Performance Benchmarking for Mixture-of-Experts Systems Yao Fu et.al. 2412.07067 null
2024-12-07 Partition of Unity Physics-Informed Neural Networks (POU-PINNs): An Unsupervised Framework for Physics-Informed Domain Decomposition and Mixtures of Experts Arturo Rodriguez et.al. 2412.06842 null
2024-12-09 Object Detection using Event Camera: A MoE Heat Conduction based Detector and A New Benchmark Dataset Xiao Wang et.al. 2412.06647 link
2024-12-09 UniPaint: Unified Space-time Video Inpainting via Mixture-of-Experts Zhen Wan et.al. 2412.06340 null
2024-12-08 Hallucination-aware Optimization for Large Language Model-empowered Communications Yinqiu Liu et.al. 2412.06007 link
2024-12-10 An Entailment Tree Generation Approach for Multimodal Multi-Hop Question Answering with Mixture-of-Experts and Iterative Feedback Mechanism Qing Zhang et.al. 2412.05821 null
2024-12-10 RSUniVLM: A Unified Vision Language Model for Remote Sensing via Granularity-oriented Mixture of Experts Xu Liu et.al. 2412.05679 link
2024-12-07 SAME: Learning Generic Language-Guided Visual Navigation with State-Adaptive Mixture of Experts Gengze Zhou et.al. 2412.05552 link
2024-12-07 Towards 3D Acceleration for low-power Mixture-of-Experts and Multi-Head Attention Spiking Transformers Boxun Xu et.al. 2412.05540 null
2024-12-06 Steps are all you need: Rethinking STEM Education with Prompt Engineering Krishnasai Addala et.al. 2412.05023 null
2024-12-09 Monet: Mixture of Monosemantic Experts for Transformers Jungwoo Park et.al. 2412.04139 link
2024-12-05 Meta-Reinforcement Learning With Mixture of Experts for Generalizable Multi Access in Heterogeneous Wireless Networks Zhaoyang Liu et.al. 2412.03850 null
2024-12-04 Convolutional Neural Networks and Mixture of Experts for Intrusion Detection in 5G Networks and beyond Loukas Ilias et.al. 2412.03483 null
2024-12-05 MQFL-FHE: Multimodal Quantum Federated Learning Framework with Fully Homomorphic Encryption Siddhant Dutta et.al. 2412.01858 null
2024-12-05 Yi-Lightning Technical Report 01.AI et.al. 2412.01253 null
2024-11-30 Mixture of Experts for Node Classification Yu Shi et.al. 2412.00418 null
2024-11-30 HiMoE: Heterogeneity-Informed Mixture-of-Experts for Fair Spatial-Temporal Forecasting Shaohan Yu et.al. 2412.00316 null
2024-11-27 Mixture of Cache-Conditional Experts for Efficient Mobile Device Inference Andrii Skliar et.al. 2412.00099 null
2024-11-29 LaVIDE: A Language-Vision Discriminator for Detecting Changes in Satellite Image with Map References Shuguo Jiang et.al. 2411.19758 null
2024-11-28 On the effectiveness of discrete representations in sparse mixture of experts Giang Do et.al. 2411.19402 null
2024-11-28 Bayesian Cluster Weighted Gaussian Models Panagiotis Papastamoulis et.al. 2411.18957 link
2024-11-27 UOE: Unlearning One Expert Is Enough For Mixture-of-experts LLMS Haomin Zhuang et.al. 2411.18797 null
2024-11-27 Complexity Experts are Task-Discriminative Learners for Any Image Restoration Eduard Zamfir et.al. 2411.18466 null
2024-11-27 Mixture of Experts in Image Classification: What's the Sweet Spot? Mathurin Videau et.al. 2411.18322 null
2024-11-26 $H^3$ Fusion: Helpful, Harmless, Honest Fusion of Aligned LLMs Selim Furkan Tekin et.al. 2411.17792 link
2024-11-25 Staleness-Centric Optimizations for Efficient Diffusion MoE Inference Jiajun Luo et.al. 2411.16786 null
2024-11-29 MH-MoE: Multi-Head Mixture-of-Experts Shaohan Huang et.al. 2411.16205 null
2024-11-25 LDACP: Long-Delayed Ad Conversions Prediction Model for Bidding Strategy Peng Cui et.al. 2411.16095 null
2024-11-24 Hiding Communication Cost in Distributed LLM Training via Micro-batch Co-execution Haiquan Wang et.al. 2411.15871 null
2024-11-24 LLaMA-MoE v2: Exploring Sparsity of LLaMA from Perspective of Mixture-of-Experts with Post-Training Xiaoye Qu et.al. 2411.15708 link
2024-11-23 Lifelong Knowledge Editing for Vision Language Models with Low-Rank Mixture-of-Experts Qizhou Chen et.al. 2411.15432 null
2024-11-23 Communication-Efficient Sparsely-Activated Model Training via Sequence Migration and Token Condensation Fahao Chen et.al. 2411.15419 null
2024-11-20 MERLOT: A Distilled LLM-based Mixture-of-Experts Framework for Scalable Encrypted Traffic Classification Yuxuan Chen et.al. 2411.13004 null
2024-11-23 KAAE: Numerical Reasoning for Knowledge Graphs via Knowledge-aware Attributes Learning Ming Yin et.al. 2411.12950 null
2024-11-19 Ultra-Sparse Memory Network Zihao Huang et.al. 2411.12364 null
2024-11-18 MoE-Lightning: High-Throughput MoE Inference on Memory-constrained GPUs Shiyi Cao et.al. 2411.11217 null
2024-11-16 Awaker2.5-VL: Stably Scaling MLLMs with Parameter-Efficient Mixture of Experts Jinqiang Long et.al. 2411.10669 link
2024-11-15 Weakly-Supervised Multimodal Learning on MIMIC-CXR Andrea Agostini et.al. 2411.10356 link
2024-11-21 Pro-Prophet: A Systematic Load Balancing Method for Efficient Parallel Training of Large-scale MoE Models Wei Wang et.al. 2411.10003 null
2024-11-13 Lynx: Enabling Efficient MoE Inference through Dynamic Batch-Aware Expert Selection Vima Gupta et.al. 2411.08982 null
2024-11-13 Sparse Upcycling: Inference Inefficient Finetuning Sasha Doubov et.al. 2411.08968 null
2024-11-13 LSH-MoE: Communication-efficient MoE Training via Locality-Sensitive Hashing Xiaonan Nie et.al. 2411.08446 null
2024-11-12 Imitation Learning from Observations: An Autoregressive Mixture of Experts Approach Renzi Wang et.al. 2411.08232 null
2024-11-12 PERFT: Parameter-Efficient Routed Fine-Tuning for Mixture-of-Expert Model Yilun Liu et.al. 2411.08212 null
2024-11-12 Towards Vision Mixture of Experts for Wildlife Monitoring on the Edge Emmanuel Azuh Mensah et.al. 2411.07834 null
2024-11-11 Adaptive Conditional Expert Selection Network for Multi-domain Recommendation Kuiyao Dong et.al. 2411.06826 null
2024-11-11 WDMoE: Wireless Distributed Mixture of Experts for Large Language Models Nan Xue et.al. 2411.06681 null
2024-11-09 Learning Mixtures of Experts with EM Quentin Fruytier et.al. 2411.06056 null
2024-11-08 NeKo: Toward Post Recognition Generative Correction Large Language Models with Task-Oriented Experts Yen-Ting Lin et.al. 2411.05945 null
2024-11-05 DA-MoE: Addressing Depth-Sensitivity in Graph-Level Analysis through Mixture of Experts Zelin Yao et.al. 2411.03025 link
2024-11-05 Advancing Robust Underwater Acoustic Target Recognition through Multi-task Learning and Multi-Gate Mixture-of-Experts Yuan Xie et.al. 2411.02787 null
2024-11-06 Hunyuan-Large: An Open-Source MoE Model with 52 Billion Activated Parameters by Tencent Xingwu Sun et.al. 2411.02265 null
2024-11-04 FedMoE-DA: Federated Mixture of Experts via Domain Aware Fine-grained Aggregation Ziwei Zhan et.al. 2411.02115 null
2024-11-03 RS-MoE: Mixture of Experts for Remote Sensing Image Captioning and Visual Question Answering Hui Lin et.al. 2411.01595 null
2024-11-03 Facet-Aware Multi-Head Mixture-of-Experts Model for Sequential Recommendation Mingrui Liu et.al. 2411.01457 null
2024-11-06 HOBBIT: A Mixed Precision Expert Offloading System for Fast MoE Inference Peng Tang et.al. 2411.01433 null
2024-11-07 HEXA-MoE: Efficient and Heterogeneous-aware MoE Acceleration with ZERO Computation Redundancy Shuqing Luo et.al. 2411.01288 link
2024-11-02 PMoL: Parameter Efficient MoE for Preference Mixing of LLM Alignment Dongxu Liu et.al. 2411.01245 null
2024-11-01 MoE-I$^2$: Compressing Mixture of Experts Models through Inter-Expert Pruning and Intra-Expert Low-Rank Decomposition Cheng Yang et.al. 2411.01016 null
2024-11-01 LIBMoE: A Library for comprehensive benchmarking Mixture of Experts in Large Language Models Nam V. Nguyen et.al. 2411.00918 link
2024-11-01 MoNTA: Accelerating Mixture-of-Experts Training with Network-Traffc-Aware Parallel Optimization Jingming Guo et.al. 2411.00662 link
2024-10-31 Stereo-Talker: Audio-driven 3D Human Synthesis with Prior-Guided Mixture-of-Experts Xiang Deng et.al. 2410.23836 null
2024-10-30 Efficient and Interpretable Grammatical Error Correction with Mixture of Experts Muhammad Reza Qorib et.al. 2410.23507 link
2024-10-30 Stealing User Prompts from Mixture of Experts Itay Yona et.al. 2410.22884 null
2024-10-30 MALoRA: Mixture of Asymmetric Low-Rank Adaptation for Enhanced Multi-Task Learning Xujia Wang et.al. 2410.22782 null
2024-10-29 ProMoE: Fast MoE-based LLM Serving using Proactive Caching Xiaoniu Song et.al. 2410.22134 null
2024-10-29 Efficient and Effective Weight-Ensembling Mixture of Experts for Multi-Task Model Merging Li Shen et.al. 2410.21804 null
2024-10-29 Neural Experts: Mixture of Experts for Implicit Neural Representations Yizhak Ben-Shabat et.al. 2410.21643 null
2024-10-28 FinTeamExperts: Role Specialized MOEs For Financial Analysis Yue Yu et.al. 2410.21338 null
2024-10-28 Efficient Mixture-of-Expert for Video-based Driver State and Physiological Multi-task Estimation in Conditional Autonomous Driving Jiyao Wang et.al. 2410.21086 null
2024-10-27 Get Large Language Models Ready to Speak: A Late-fusion Approach for Speech Generation Maohao Shen et.al. 2410.20336 null
2024-10-27 GUMBEL-NERF: Representing Unseen Objects as Part-Compositional Neural Radiance Fields Yusuke Sekikawa et.al. 2410.20306 null
2024-10-25 DMT-HI: MOE-based Hyperbolic Interpretable Deep Manifold Transformation for Unspervised Dimensionality Reduction Zelin Zang et.al. 2410.19504 link
2024-10-25 Hierarchical Mixture of Experts: Generalizable Learning for High-Level Synthesis Weikai Li et.al. 2410.19225 link
2024-10-24 Read-ME: Refactorizing LLMs as Router-Decoupled Mixture of Experts with System Co-Design Ruisi Cai et.al. 2410.19123 link
2024-10-24 Mixture of Parrots: Experts improve memorization more than reasoning Samy Jelassi et.al. 2410.19034 null
2024-10-24 MoMQ: Mixture-of-Experts Enhances Multi-Dialect Query Generation across Relational and Non-Relational Databases Zhisheng Lin et.al. 2410.18406 null
2024-10-23 Robust and Explainable Depression Identification from Speech Using Vowel-Based Ensemble Learning Approaches Kexin Feng et.al. 2410.18298 null
2024-10-23 MiLoRA: Efficient Mixture of Low-Rank Adaptation for Large Language Models Fine-tuning Jingfan Zhang et.al. 2410.18035 null
2024-10-24 ExpertFlow: Optimized Expert Activation and Token Allocation for Efficient Mixture-of-Experts Inference Xin He et.al. 2410.17954 null
2024-10-23 Faster Language Models with Better Multi-Token Prediction Using Tensor Decomposition Artem Basharin et.al. 2410.17765 null
2024-10-22 Optimizing Mixture-of-Experts Inference Time Combining Model Deployment and Communication Scheduling Jialong Li et.al. 2410.17043 null
2024-10-21 LMHaze: Intensity-aware Image Dehazing with a Large-scale Multi-intensity Real Haze Dataset Ruikun Zhang et.al. 2410.16095 link
2024-10-22 CartesianMoE: Boosting Knowledge Sharing among Experts via Cartesian Product Routing in Mixture-of-Experts Zhenpeng Su et.al. 2410.16077 link
2024-10-21 Generalizing Motion Planners with Mixture of Experts for Autonomous Driving Qiao Sun et.al. 2410.15774 link
2024-10-21 ViMoE: An Empirical Study of Designing Vision Mixture-of-Experts Xumeng Han et.al. 2410.15732 null
2024-10-20 Unveiling and Consulting Core Experts in Retrieval-Augmented MoE-based LLMs Xin Zhou et.al. 2410.15438 null
2024-10-20 LoRA-IR: Taming Low-Rank Experts for Efficient All-in-One Image Restoration Yuang Ai et.al. 2410.15385 link
2024-10-19 MENTOR: Mixture-of-Experts Network with Task-Oriented Perturbation for Visual Reinforcement Learning Suning Huang et.al. 2410.14972 null
2024-10-18 MomentumSMoE: Integrating Momentum into Sparse Mixture of Experts Rachel S. Y. Teo et.al. 2410.14574 link
2024-10-18 ST-MoE-BERT: A Spatial-Temporal Mixture-of-Experts Framework for Long-Term Cross-City Mobility Prediction Haoyu He et.al. 2410.14099 link
2024-10-17 Enhancing Generalization in Sparse Mixture of Experts Models: The Case for Increased Expert Activation in Compositional Tasks Jinze Zhao et.al. 2410.13964 null
2024-10-16 On the Risk of Evidence Pollution for Malicious Social Text Detection in the Era of LLMs Herun Wan et.al. 2410.12600 null
2024-10-16 Understanding Expert Structures on Minimax Parameter Estimation in Contaminated Mixture of Experts Fanqi Yan et.al. 2410.12258 null
2024-10-16 EPS-MoE: Expert Pipeline Scheduler for Cost-Efficient MoE Inference Yulei Qian et.al. 2410.12247 null
2024-10-15 MoE-Pruner: Pruning Mixture-of-Experts Large Language Model using the Hints from Its Router Yanyue Xie et.al. 2410.12013 null
2024-10-15 MoH: Multi-Head Attention as Mixture-of-Head Attention Peng Jin et.al. 2410.11842 link
2024-10-15 GaVaMoE: Gaussian-Variational Gated Mixture of Experts for Explainable Recommendation Fei Tang et.al. 2410.11841 link
2024-10-15 Transformer Layer Injection: A Novel Approach for Efficient Upscaling of Large Language Models James Vo et.al. 2410.11654 null
2024-10-16 Quadratic Gating Functions in Mixture of Experts: A Statistical Insight Pedram Akbarian et.al. 2410.11222 null
2024-10-16 Your Mixture-of-Experts LLM Is Secretly an Embedding Model For Free Ziyue Li et.al. 2410.10814 link
2024-10-14 Efficiently Democratizing Medical LLMs for 50 Languages via a Mixture of Language Family Experts Guorui Zheng et.al. 2410.10626 link
2024-10-14 Learning to Ground VLMs without Forgetting Aritra Bhowmik et.al. 2410.10491 null
2024-10-14 Moirai-MoE: Empowering Time Series Foundation Models with Sparse Mixture of Experts Xu Liu et.al. 2410.10469 null
2024-10-15 Ada-K Routing: Boosting the Efficiency of MoE-based LLMs Tongtian Yue et.al. 2410.10456 null
2024-10-14 Tighter Risk Bounds for Mixtures of Experts Wissam Akretche et.al. 2410.10397 null
2024-10-14 Scalable Multi-Domain Adaptation of Language Models using Modular Experts Peter Schafhalter et.al. 2410.10181 null
2024-10-14 Mixture of Experts Made Personalized: Federated Prompt Learning for Vision-Language Models Jun Luo et.al. 2410.10114 link
2024-10-14 AlphaLoRA: Assigning LoRA Experts Based on Layer Training Quality Peijun Qing et.al. 2410.10054 link
2024-10-13 ContextWIN: Whittle Index Based Mixture-of-Experts Neural Model For Restless Bandits Via Deep RL Zhanqiu Guo et.al. 2410.09781 null
2024-10-11 Semi-Supervised Learning of Noisy Mixture of Experts Models Oh-Ran Kwon et.al. 2410.09039 null
2024-10-11 Retraining-Free Merging of Sparse Mixture-of-Experts via Hierarchical Clustering I-Chun Chen et.al. 2410.08589 link
2024-10-10 Flex-MoE: Modeling Arbitrary Modality Combination via the Flexible Mixture-of-Experts Sukwon Yun et.al. 2410.08245 link
2024-10-10 Mono-InternVL: Pushing the Boundaries of Monolithic Multimodal Large Language Models with Endogenous Visual Pre-training Gen Luo et.al. 2410.08202 null
2024-10-10 Efficient Dictionary Learning with Switch Sparse Autoencoders Anish Mudide et.al. 2410.08201 link
2024-10-10 More Experts Than Galaxies: Conditionally-overlapping Experts With Biologically-Inspired Fixed Routing Sagi Shaier et.al. 2410.08003 link
2024-10-10 SLIM: Let LLM Learn More and Forget Less with Soft LoRA and Identity Mixture Jiayi Han et.al. 2410.07739 null
2024-10-10 Upcycling Large Language Models into Mixture of Experts Ethan He et.al. 2410.07524 null
2024-10-09 MoE++: Accelerating Mixture-of-Experts Methods with Zero-Computation Experts Peng Jin et.al. 2410.07348 link
2024-10-09 Hallucinating AI Hijacking Attack: Large Language Models and Malicious Code Recommenders David Noever et.al. 2410.06462 null
2024-10-09 Functional-level Uncertainty Quantification for Calibrated Fine-tuning on LLMs Ruijia Niu et.al. 2410.06431 null
2024-10-08 Probing the Robustness of Theory of Mind in Large Language Models Christian Nickel et.al. 2410.06271 null
2024-10-08 MC-MoE: Mixture Compressor for Mixture-of-Experts LLMs Gains More Wei Huang et.al. 2410.06270 link
2024-10-08 Aria: An Open Multimodal Native Mixture-of-Experts Model Dongxu Li et.al. 2410.05993 link
2024-10-08 Scaling Laws Across Model Architectures: A Comparative Analysis of Dense and MoE Models in Large Language Models Siqi Wang et.al. 2410.05661 null
2024-10-07 Model-GLUE: Democratized LLM Scaling for A Large Model Zoo in the Wild Xinyu Zhao et.al. 2410.05357 link
2024-10-07 Multimodal Fusion Strategies for Mapping Biophysical Landscape Features Lucia Gordon et.al. 2410.04833 link
2024-10-06 Realizing Video Summarization from the Path of Language-based Semantic Understanding Kuan-Chen Mu et.al. 2410.04511 null
2024-10-09 Structure-Enhanced Protein Instruction Tuning: Towards General-Purpose Protein Understanding Wei Wu et.al. 2410.03553 null
2024-10-04 Exploring the Benefit of Activation Sparsity in Pre-training Zhengyan Zhang et.al. 2410.03440 link
2024-10-03 MLP-KAN: Unifying Deep Representation and Function Learning Yunhong He et.al. 2410.03027 link
2024-10-03 On Expert Estimation in Hierarchical Mixture of Experts: Beyond Softmax Gating Functions Huy Nguyen et.al. 2410.02935 null
2024-10-03 Neutral residues: revisiting adapters for model extension Franck Signe Talla et.al. 2410.02744 null
2024-10-03 Efficient Residual Learning with Mixture-of-Experts for Universal Dexterous Grasping Ziye Huang et.al. 2410.02475 null
2024-10-03 MIGA: Mixture-of-Experts with Group Aggregation for Stock Market Prediction Zhaojian Yu et.al. 2410.02241 null
2024-10-03 Revisiting Prefix-tuning: Statistical Benefits of Reparameterization among Prompts Minh Le et.al. 2410.02200 link
2024-10-04 Searching for Efficient Linear Layers over a Continuous Space of Structured Matrices Andres Potapczynski et.al. 2410.02117 link
2024-10-04 EC-DIT: Scaling Diffusion Transformers with Adaptive Expert-Choice Routing Haotian Sun et.al. 2410.02098 null
2024-10-02 Don't flatten, tokenize! Unlocking the key to SoftMoE's efficacy in deep RL Ghada Sokar et.al. 2410.01930 null
2024-10-02 Open-RAG: Enhanced Retrieval-Augmented Reasoning with Open-Source Large Language Models Shayekh Bin Islam et.al. 2410.01782 link
2024-10-02 Upcycling Instruction Tuning from Dense to Mixture-of-Experts via Parameter Merging Tingfeng Hui et.al. 2410.01610 null
2024-10-02 The Labyrinth of Links: Navigating the Associative Maze of Multi-modal LLMs Hong Li et.al. 2410.01417 null
2024-10-01 MoS: Unleashing Parameter Efficiency of Low-Rank Adaptation with Mixture of Shards Sheng Wang et.al. 2410.00938 null
2024-10-01 UniAdapt: A Universal Adapter for Knowledge Calibration Tai D. Nguyen et.al. 2410.00454 null
2024-10-01 Robust Traffic Forecasting against Spatial Shift over Years Hongjun Wang et.al. 2410.00373 link
2024-09-29 IDEA: An Inverse Domain Expert Adaptation Based Active DNN IP Protection Method Chaohui Xu et.al. 2410.00059 null
2024-09-30 MM1.5: Methods, Analysis & Insights from Multimodal LLM Fine-tuning Haotian Zhang et.al. 2409.20566 null
2024-10-02 CLIP-MoE: Towards Building Mixture of Experts for CLIP with Diversified Multiplet Upcycling Jihai Zhang et.al. 2409.19291 link
2024-09-27 SciDFM: A Large Language Model with Mixture-of-Experts for Science Liangtai Sun et.al. 2409.18412 null
2024-09-26 Uni-Med: A Unified Medical Generalist Foundation Model For Multi-Task Learning Via Connector-MoE Xun Zhu et.al. 2409.17508 link
2024-09-26 A Time Series is Worth Five Experts: Heterogeneous Mixture of Experts for Traffic Flow Prediction Guangyu Wang et.al. 2409.17440 link
2024-09-24 Leveraging Mixture of Experts for Improved Speech Deepfake Detection Viola Negroni et.al. 2409.16077 null
2024-10-02 Time-MoE: Billion-Scale Time Series Foundation Models with Mixture of Experts Xiaoming Shi et.al. 2409.16040 link
2024-09-24 Boosting Code-Switching ASR with Mixture of Experts Enhanced Speech-Conditioned LLM Fengrun Zhang et.al. 2409.15905 null
2024-09-24 Toward Mixture-of-Experts Enabled Trustworthy Semantic Communication for 6G Networks Jiayi He et.al. 2409.15695 null
2024-09-23 A Gated Residual Kolmogorov-Arnold Networks for Mixtures of Experts Hugo Inzirillo et.al. 2409.15161 link
2024-09-23 Multi-Modal Generative AI: Multi-modal LLM, Diffusion and Beyond Hong Chen et.al. 2409.14993 null
2024-09-21 Routing in Sparsely-gated Language Models responds to Context Stefan Arnold et.al. 2409.14107 null
2024-09-20 On-device Collaborative Language Modeling via a Mixture of Generalists and Specialists Dongyang Fan et.al. 2409.13931 link
2024-09-20 Multi-omics data integration for early diagnosis of hepatocellular carcinoma (HCC) using machine learning Annette Spooner et.al. 2409.13791 null
2024-09-19 Robust Audiovisual Speech Recognition Models with Mixture-of-Experts Yihan Wu et.al. 2409.12370 null
2024-09-18 GRIN: GRadient-INformed MoE Liyuan Liu et.al. 2409.12136 null
2024-09-18 Mixture of Experts Fusion for Fake Audio Detection Using Frozen wav2vec 2.0 Zhiyong Wang et.al. 2409.11909 link
2024-09-17 LPT++: Efficient Training on Mixture of Long-tailed Experts Bowen Dong et.al. 2409.11323 null
2024-09-19 LOLA -- An Open-Source Massively Multilingual Large Language Model Nikit Srivastava et.al. 2409.11272 link
2024-09-16 Adaptive Segmentation-Based Initialization for Steered Mixture of Experts Image Regression Yi-Hsin Li et.al. 2409.10101 null
2024-09-14 MiniDrive: More Efficient Vision-Language Models with Multi-Level 2D Features as Text Tokens for Autonomous Driving Enming Zhang et.al. 2409.07267 link
2024-09-10 DA-MoE: Towards Dynamic Expert Allocation for Mixture-of-Experts Models Maryam Akhavan Aghdam et.al. 2409.06669 null
2024-09-10 STUN: Structured-Then-Unstructured Pruning for Scalable MoE Pruning Jaeseong Lee et.al. 2409.06211 null
2024-09-10 VE: Modeling Multivariate Time Series Correlation with Variate Embedding Shangjiong Wang et.al. 2409.06169 link
2024-09-09 Alt-MoE: Multimodal Alignment via Alternating Optimization of Multi-directional MoE with Unimodal Models Hongyang Lei et.al. 2409.05929 link
2024-09-09 Optical Spiking Neurons Enable High-Speed and Energy-Efficient Optical Neural Networks Bo Xu et.al. 2409.05726 null
2024-09-09 Adapted-MoE: Mixture of Experts with Test-Time Adaption for Anomaly Detection Tianwu Lei et.al. 2409.05611 null
2024-09-05 Interpretable mixture of experts for time series prediction under recurrent and non-recurrent conditions Zemian Ke et.al. 2409.03282 null
2024-09-05 ChartMoE: Mixture of Expert Connector for Advanced Chart Understanding Zhengzhuo Xu et.al. 2409.03277 null
2024-09-05 xLAM: A Family of Large Action Models to Empower AI Agent Systems Jianguo Zhang et.al. 2409.03215 link
2024-09-04 Configurable Foundation Models: Building LLMs from a Modular Perspective Chaojun Xiao et.al. 2409.02877 null
2024-09-04 Pluralistic Salient Object Detection Xuelu Feng et.al. 2409.02368 null
2024-09-03 OLMoE: Open Mixture-of-Experts Language Models Niklas Muennighoff et.al. 2409.02060 link
2024-09-05 Enhancing Code-Switching Speech Recognition with LID-Based Collaborative Mixture of Experts Model Hukai Huang et.al. 2409.02050 null
2024-09-02 Revisiting SMoE Language Models by Evaluating Inefficiencies with Task Specific Expert Pruning Soumajyoti Sarkar et.al. 2409.01483 null
2024-09-02 Duplex: A Device for Large Language Models with Mixture of Experts, Grouped Query Attention, and Continuous Batching Sungmin Yun et.al. 2409.01141 null
2024-09-04 Unveiling the Vulnerability of Private Fine-Tuning in Split-Based Frameworks for Large Language Models: A Bidirectionally Enhanced Attack Guanzhong Chen et.al. 2409.00960 link
2024-09-02 Beyond Parameter Count: Implicit Bias in Soft Mixture of Experts Youngseog Chung et.al. 2409.00879 null
2024-08-29 Gradient-free variational learning with conditional mixture networks Conor Heins et.al. 2408.16429 link
2024-08-28 Leveraging Open Knowledge for Advancing Task Expertise in Large Language Models Yuncheng Yang et.al. 2408.15915 link
2024-08-28 Nexus: Specialization meets Adaptability for Efficiently Training Mixture of Experts Nikolas Gritsch et.al. 2408.15901 null
2024-08-28 LLaVA-MoD: Making LLaVA Tiny via MoE Knowledge Distillation Fangxun Shu et.al. 2408.15881 link
2024-08-28 Auxiliary-Loss-Free Load Balancing Strategy for Mixture-of-Experts Lean Wang et.al. 2408.15664 null
2024-08-27 Parameter-Efficient Quantized Mixture-of-Experts Meets Vision-Language Instruction Tuning for Semiconductor Electron Micrograph Analysis Sakhinana Sagar Srinivas et.al. 2408.15305 null
2024-08-27 MRSE: An Efficient Multi-modality Retrieval System for Large Scale E-commerce Hao Jiang et.al. 2408.14968 null
2024-08-24 Advancing Enterprise Spatio-Temporal Forecasting Applications: Data Mining Meets Instruction Tuning of Language Models For Multi-modal Time Series Analysis in Low-Resource Settings Sagar Srinivas Sakhinana et.al. 2408.13622 null
2024-08-23 The Ultimate Guide to Fine-Tuning LLMs from Basics to Breakthroughs: An Exhaustive Review of Technologies, Research, Best Practices, Applied Research Challenges and Opportunities Venkatesh Balavadhani Parthasarathy et.al. 2408.13296 null
2024-08-23 Guiding IoT-Based Healthcare Alert Systems with Large Language Models Yulan Gao et.al. 2408.13071 null
2024-08-23 DutyTTE: Deciphering Uncertainty in Origin-Destination Travel Time Estimation Xiaowei Mao et.al. 2408.12809 link
2024-08-23 Multi-Treatment Multi-Task Uplift Modeling for Enhancing User Growth Yuxiang Wei et.al. 2408.12803 null
2024-08-23 La-SoftMoE CLIP for Unified Physical-Digital Face Attack Detection Hang Zou et.al. 2408.12793 null
2024-08-22 SQL-GEN: Bridging the Dialect Gap for Text-to-SQL Via Synthetic Data And Model Merging Mohammadreza Pourreza et.al. 2408.12733 null
2024-08-22 Jamba-1.5: Hybrid Transformer-Mamba Models at Scale Jamba Team et.al. 2408.12570 null
2024-08-22 Improving Factuality in Large Language Models via Decoding-Time Hallucinatory and Truthful Comparators Dingkang Yang et.al. 2408.12325 link
2024-08-21 MoE-LPR: Multilingual Extension of Large Language Models through Mixture-of-Experts with Language Priors Routing Hao Zhou et.al. 2408.11396 link
2024-08-21 KAN4TSF: Are KAN and KAN-based models Effective for Time Series Forecasting? Xiao Han et.al. 2408.11306 link
2024-08-21 FedMoE: Personalized Federated Learning via Heterogeneous Mixture of Experts Hanzi Mei et.al. 2408.11304 null
2024-08-20 Unboxing Occupational Bias: Grounded Debiasing LLMs with U.S. Labor Data Atmika Gorti et.al. 2408.11247 null
2024-08-20 Navigating Spatio-Temporal Heterogeneity: A Graph Transformer Approach for Traffic Forecasting Jianxiang Zhou et.al. 2408.10822 link
2024-08-20 AnyGraph: Graph Foundation Model in the Wild Lianghao Xia et.al. 2408.10700 link
2024-08-20 HMoE: Heterogeneous Mixture of Experts for Language Modeling An Wang et.al. 2408.10681 null
2024-08-19 AdapMoE: Adaptive Sensitivity-based Expert Gating and Management for Efficient MoE Inference Shuzhang Zhong et.al. 2408.10284 link
2024-08-17 FEDKIM: Adaptive Federated Knowledge Injection into Medical Foundation Models Xiaochen Wang et.al. 2408.10276 link
2024-08-19 Customizing Language Models with Instance-wise LoRA for Sequential Recommendation Xiaoyu Kong et.al. 2408.10159 link
2024-08-19 A Unified Framework for Iris Anti-Spoofing: Introducing IrisGeneral Dataset and Masked-MoE Method Hang Zou et.al. 2408.09752 null
2024-08-16 Integrating Multi-view Analysis: Multi-view Mixture-of-Expert for Textual Personality Detection Haohao Zhu et.al. 2408.08551 link
2024-08-17 BAM! Just Like That: Simple and Efficient Parameter Upcycling for Mixture of Experts Qizhen Zhang et.al. 2408.08274 null
2024-08-14 Beyond Inter-Item Relations: Dynamic Adaptive Mixture-of-Experts for LLM-Based Sequential Recommendation CanYi Liu et.al. 2408.07427 null
2024-08-13 A Survey on Model MoErging: Recycling and Routing Among Specialized Experts for Collaborative Learning Prateek Yadav et.al. 2408.07057 null
2024-08-13 Layerwise Recurrent Router for Mixture-of-Experts Zihan Qiu et.al. 2408.06793 link
2024-08-13 AquilaMoE: Efficient Training for MoE Models with Scale-Up and Scale-Out Strategies Bo-Wen Zhang et.al. 2408.06567 null
2024-08-10 HoME: Hierarchy of Multi-Gate Experts for Multi-Task Learning at Kuaishou Xu Wang et.al. 2408.05430 null
2024-08-08 Understanding the Performance and Estimating the Cost of LLM Fine-Tuning Yuchen Xia et.al. 2408.04693 link
2024-08-08 Partial Experts Checkpoint: Efficient Fault Tolerance for Sparse Mixture-of-Experts Model Training Weilin Cai et.al. 2408.04307 null
2024-08-08 LaDiMo: Layer-wise Distillation Inspired MoEfier Sungyoon Kim et.al. 2408.04278 null
2024-08-07 MoExtend: Tuning New Experts for Modality and Task Extension Shanshan Zhong et.al. 2408.03511 link
2024-08-05 Mixture-of-Noises Enhanced Forgery-Aware Predictor for Multi-Face Manipulation Detection and Localization Changtao Miao et.al. 2408.02306 null
2024-08-02 HMDN: Hierarchical Multi-Distribution Network for Click-Through Rate Prediction Xingyu Lou et.al. 2408.01332 null
2024-08-01 Multimodal Fusion and Coherence Modeling for Video Topic Segmentation Hai Yu et.al. 2408.00365 null
2024-08-12 MoMa: Efficient Early-Fusion Pre-training with Mixture of Modality-Aware Experts Xi Victoria Lin et.al. 2407.21770 null
2024-07-31 PMoE: Progressive Mixture of Experts with Asymmetric Transformer for Continual Learning Min Jae Jung et.al. 2407.21571 null
2024-07-30 Distribution Learning for Molecular Regression Nima Shoghi et.al. 2407.20475 null
2024-07-29 Time series forecasting with high stakes: A field study of the air cargo industry Abhinav Garg et.al. 2407.20192 null
2024-07-30 Mixture of Nested Experts: Adaptive Processing of Visual Tokens Gagan Jain et.al. 2407.19985 null
2024-07-28 Mixture of Modular Experts: Distilling Knowledge from a Multilingual Teacher into Specialized Modular Language Models Mohammed Al-Maamari et.al. 2407.19610 link
2024-07-26 Wolf: Captioning Everything with a World Summarization Framework Boyi Li et.al. 2407.18908 null
2024-07-26 MOoSE: Multi-Orientation Sharing Experts for Open-set Scene Text Recognition Chang Liu et.al. 2407.18616 link
2024-07-26 Dynamic Language Group-Based MoE: Enhancing Efficiency and Flexibility for Code-Switching Speech Recognition Hukai Huang et.al. 2407.18581 link
2024-07-25 How Lightweight Can A Vision Transformer Be Jen Hong Tan et.al. 2407.17783 null
2024-07-24 Exploring Domain Robust Lightweight Reward Models based on Router Mechanism Hyuk Namgoong et.al. 2407.17546 null
2024-07-24 M4: Multi-Proxy Multi-Gate Mixture of Experts Network for Multiple Instance Learning in Histopathology Image Analysis Junyu Li et.al. 2407.17267 link
2024-07-25 Cheems: Wonderful Matrices More Efficient and More Effective Architecture Jingze Shi et.al. 2407.16958 null
2024-07-22 Stretching Each Dollar: Diffusion Training from Scratch on a Micro-Budget Vikash Sehwag et.al. 2407.15811 link
2024-07-22 Norface: Improving Facial Expression Analysis by Identity Normalization Hanwei Liu et.al. 2407.15617 link
2024-07-19 Mixture of Experts with Mixture of Precisions for Tuning Quality of Service HamidReza Imani et.al. 2407.14417 null
2024-07-19 EVLM: An Efficient Vision-Language Model for Visual Understanding Kaibing Chen et.al. 2407.14177 null
2024-07-19 Routing Experts: Learning to Route Dynamic Experts in Multi-modal Large Language Models Qiong Wu et.al. 2407.14093 null
2024-07-18 Discussion: Effective and Interpretable Outcome Prediction by Training Sparse Mixtures of Linear Experts Francesco Folino et.al. 2407.13526 null
2024-07-18 Mixture of Experts based Multi-task Supervise Learning from Crowds Tao Han et.al. 2407.13268 null
2024-07-15 MoE-DiffIR: Task-customized Diffusion Priors for Universal Compressed Image Restoration Yulin Ren et.al. 2407.10833 null
2024-07-18 Qwen2 Technical Report An Yang et.al. 2407.10671 link
2024-07-15 Boost Your NeRF: A Model-Agnostic Mixture of Experts Framework for High Quality and Efficient Rendering Francesco Di Sario et.al. 2407.10389 null
2024-07-13 Low-Rank Interconnected Adaptation Across Layers Yibo Zhong et.al. 2407.09946 link
2024-07-13 MaskMoE: Boosting Token-Level Learning via Routing Mask in Mixture-of-Experts Zhenpeng Su et.al. 2407.09816 link
2024-07-12 Diversifying the Expert Knowledge for Task-Agnostic Pruning in Sparse Mixture-of-Experts Zeliang Zhang et.al. 2407.09590 null
2024-07-11 An Unsupervised Domain Adaptation Method for Locating Manipulated Region in partially fake Audio Siding Zeng et.al. 2407.08239 null
2024-07-10 MoVEInt: Mixture of Variational Experts for Learning Human-Robot Interactions from Demonstrations Vignesh Prasad et.al. 2407.07636 link
2024-07-10 Swin SMT: Global Sequential Modeling in 3D Medical Image Segmentation Szymon Płotka et.al. 2407.07514 link
2024-07-09 A Simple Architecture for Enterprise Large Language Model Applications based on Role based security and Clearance Levels using Retrieval-Augmented Generation or Mixture of Experts Atilla Özgür et.al. 2407.06718 null
2024-07-06 SAM-Med3D-MoE: Towards a Non-Forgetting Segment Anything Model via Mixture of Experts for 3D Medical Image Segmentation Guoan Wang et.al. 2407.04938 null
2024-07-06 Completed Feature Disentanglement Learning for Multimodal MRIs Analysis Tianling Liu et.al. 2407.04916 link
2024-07-05 YourMT3+: Multi-instrument Music Transcription with Enhanced Transformer Architectures and Cross-dataset Stem Augmentation Sungkyun Chang et.al. 2407.04822 link
2024-07-05 Lazarus: Resilient and Elastic Training of Mixture-of-Experts Models with Adaptive Expert Placement Yongji Wu et.al. 2407.04656 null
2024-07-05 MobileFlow: A Multimodal LLM For Mobile GUI Agent Songqin Nong et.al. 2407.04346 null
2024-07-04 Mixture of A Million Experts Xu Owen He et.al. 2407.04153 null
2024-07-02 Terminating Differentiable Tree Experts Jonathan Thomm et.al. 2407.02060 null
2024-07-05 Let the Expert Stick to His Last: Expert-Specialized Fine-Tuning for Sparse Architectural Large Language Models Zihan Wang et.al. 2407.01906 link
2024-07-01 Uncertainty Quantification in Table Structure Recognition Kehinde Ajayi et.al. 2407.01731 link
2024-07-01 Sparse Diffusion Policy: A Sparse, Reusable, and Flexible Policy for Robot Learning Yixiao Wang et.al. 2407.01531 null
2024-07-01 Investigating the potential of Sparse Mixtures-of-Experts for multi-domain neural machine translation Nadezhda Chirkova et.al. 2407.01126 null
2024-07-01 Efficient Expert Pruning for Sparse Mixture-of-Experts Language Models: Enhancing Performance and Reducing Inference Costs Enshu Liu et.al. 2407.00945 link
2024-07-03 Parm: Efficient Training of Large Sparsely-Activated Models with Dedicated Schedules Xinglin Pan et.al. 2407.00599 link
2024-07-02 One Prompt is not Enough: Automated Construction of a Mixture-of-Expert Prompts Ruochen Wang et.al. 2407.00256 null
2024-06-28 LEMoE: Advanced Mixture of Experts Adaptor for Lifelong Model Editing of Large Language Models Renzhi Wang et.al. 2406.20030 null
2024-06-28 Solving Token Gradient Conflict in Mixture-of-Experts for Large Vision-Language Model Longrong Yang et.al. 2406.19905 link
2024-06-28 SAML: Speaker Adaptive Mixture of LoRA Experts for End-to-End ASR Qiuming Zhao et.al. 2406.19706 link
2024-06-27 A Teacher Is Worth A Million Instructions Nikhil Kothari et.al. 2406.19112 link
2024-06-27 Towards Personalized Federated Multi-scenario Multi-task Recommendation Yue Ding et.al. 2406.18938 null
2024-06-26 Mixture of Experts in a Mixture of RL settings Timon Willi et.al. 2406.18420 null
2024-06-26 A Closer Look into Mixture-of-Experts in Large Language Models Ka Man Lo et.al. 2406.18219 link
2024-06-26 SC-MoE: Switch Conformer Mixture of Experts for Unified Streaming and Non-streaming Code-Switching ASR Shuaishuai Ye et.al. 2406.18021 null
2024-06-24 Peirce in the Machine: How Mixture of Experts Models Perform Hypothesis Construction Bruce Rushing et.al. 2406.17150 link
2024-06-24 LLaMA-MoE: Building Mixture-of-Experts from LLaMA with Continual Pre-training Tong Zhu et.al. 2406.16554 link
2024-06-25 OTCE: Hybrid SSM and Attention with Cross Domain Mixture of Experts to construct Observer-Thinker-Conceiver-Expresser Jingze Shi et.al. 2406.16495 link
2024-06-24 Theory on Mixture-of-Experts in Continual Learning Hongbo Li et.al. 2406.16437 null
2024-06-22 SimSMoE: Solving Representational Collapse via Similarity Measure Giang Do et.al. 2406.15883 null
2024-06-20 Voice Disorder Analysis: a Transformer-based Approach Alkis Koudounas et.al. 2406.14693 link
2024-06-19 Low-Rank Mixture-of-Experts for Continual Medical Image Segmentation Qian Chen et.al. 2406.13583 null
2024-06-19 AdaMoE: Token-Adaptive Routing with Null Experts for Mixture-of-Experts Language Models Zihao Zeng et.al. 2406.13233 link
2024-06-18 Interpretable Preferences via Multi-Objective Reward Modeling and Mixture-of-Experts Haoxiang Wang et.al. 2406.12845 link
2024-06-18 P-Tailor: Customizing Personality Traits for Language Models via Mixture of Specialized LoRA Experts Yuhao Dan et.al. 2406.12548 null
2024-06-18 Variational Distillation of Diffusion Policies into Mixture of Experts Hongyi Zhou et.al. 2406.12538 null
2024-06-18 GW-MoE: Resolving Uncertainty in MoE Router with Global Workspace Theory Haoze Wu et.al. 2406.12375 link
2024-06-17 Not Eliminate but Aggregate: Post-Hoc Control over Mixture-of-Experts to Address Shortcut Shifts in Natural Language Understanding Ukyo Honda et.al. 2406.12060 link
2024-06-17 DeepSeek-Coder-V2: Breaking the Barrier of Closed-Source Models in Code Intelligence DeepSeek-AI et.al. 2406.11931 link
2024-06-17 Graph Knowledge Distillation to Mixture of Experts Pavel Rumiantsev et.al. 2406.11919 link
2024-06-17 MoE-RBench: Towards Building Reliable Language Models with Sparse Mixture-of-Experts Guanjie Chen et.al. 2406.11353 link
2024-06-17 Dynamic Data Mixing Maximizes Instruction Tuning for Mixture-of-Experts Tong Zhu et.al. 2406.11256 link
2024-06-14 Towards Efficient Pareto Set Approximation via Mixture of Experts Based Model Fusion Anke Tang et.al. 2406.09770 link
2024-06-13 DeepUnifiedMom: Unified Time-series Momentum Portfolio Construction via Multi-Task Learning with Multi-Gate Mixture of Experts Joel Ong et.al. 2406.08742 link
2024-06-12 Examining Post-Training Quantization for Mixture-of-Experts: A Benchmark Pingzhi Li et.al. 2406.08155 link
2024-06-11 Turbo Sparse: Achieving LLM SOTA Performance with Minimal Activated Parameters Yixin Song et.al. 2406.05955 null
2024-06-08 Flexible and Adaptable Summarization via Expertise Separation Xiuying Chen et.al. 2406.05360 link
2024-06-07 MEFT: Memory-Efficient Fine-Tuning through Sparse Adapter Jitai Hao et.al. 2406.04984 link
2024-06-07 MoE Jetpack: From Dense Checkpoints to Adaptive Mixture of Experts for Vision Tasks Xingkui Zhu et.al. 2406.04801 link
2024-06-05 Style Mixture of Experts for Expressive Text-To-Speech Synthesis Ahad Jawaid et.al. 2406.03637 null
2024-06-05 Node-wise Filtering in Graph Neural Networks: A Mixture of Experts Approach Haoyu Han et.al. 2406.03464 null
2024-06-05 Continual Traffic Forecasting via Mixture of Experts Sanghyun Lee et.al. 2406.03140 null
2024-06-05 Filtered not Mixed: Stochastic Filtering-Based Online Gating for Mixture of Large Language Models Raeid Saqur et.al. 2406.02969 null
2024-06-04 Parrot: Multilingual Visual Instruction Tuning Hai-Long Sun et.al. 2406.02539 link
2024-06-04 Demystifying the Compression of Mixture-of-Experts Through a Unified Framework Shwai He et.al. 2406.02500 link
2024-06-02 Reservoir History Matching of the Norne field with generative exotic priors and a coupled Mixture of Experts -- Physics Informed Neural Operator Forward Model Clement Etienam et.al. 2406.00889 link
2024-06-01 A Gaussian Process-based Streaming Algorithm for Prediction of Time Series With Regimes and Outliers Daniel Waxman et.al. 2406.00570 link
2024-06-01 Optimizing 6G Integrated Sensing and Communications (ISAC) via Expert Networks Jiacheng Wang et.al. 2406.00408 null
2024-05-30 Low-dimensional approximations of the conditional law of Volterra processes: a non-positive curvature approach Reza Arabpour et.al. 2405.20094 null
2024-06-02 MEMoE: Enhancing Model Editing with Mixture of Experts Adaptors Renzhi Wang et.al. 2405.19086 null
2024-06-02 Cephalo: Multi-Modal Vision-Language Models for Bio-Inspired Materials Analysis and Design Markus J. Buehler et.al. 2405.19076 link
2024-05-29 Learning Mixture-of-Experts for General-Purpose Black-Box Discrete Optimization Shengcai Liu et.al. 2405.18884 link
2024-05-29 MoNDE: Mixture of Near-Data Experts for Large-Scale Sparse Models Taehyun Kim et.al. 2405.18832 null
2024-05-29 Yuan 2.0-M32: Mixture of Experts with Attention Router Shaohua Wu et.al. 2405.17976 link
2024-05-28 LoRA-Switch: Boosting the Efficiency of Dynamic LLM Adapters via System-Algorithm Co-design Rui Kong et.al. 2405.17741 null
2024-05-27 Enhancing Fast Feed Forward Networks with Load Balancing and a Master Leaf Node Andreas Charalampopoulos et.al. 2405.16836 link
2024-05-26 Mixture of Experts Using Tensor Products Zhan Su et.al. 2405.16671 link
2024-05-30 A Provably Effective Method for Pruning Experts in Fine-tuned Sparse Mixture-of-Experts Mohammed Nowaz Rabbani Chowdhury et.al. 2405.16646 null
2024-05-26 Decomposing the Neurons: Activation Sparsity via Mixture of Experts for Continual Test Time Adaptation Rongyu Zhang et.al. 2405.16486 link
2024-05-25 MoEUT: Mixture-of-Experts Universal Transformers Róbert Csordás et.al. 2405.16039 link
2024-05-23 Revisiting MoE and Dense Speed-Accuracy Comparisons for LLM Training Xianzhi Du et.al. 2405.15052 link
2024-05-23 Unchosen Experts Can Contribute Too: Unleashing MoE Models' Power by Self-Contrast Chufan Shi et.al. 2405.14507 link
2024-05-23 Dynamic Mixture of Experts: An Auto-Tuning Approach for Efficient Transformer Models Yongxin Guo et.al. 2405.14297 link
2024-05-23 Graph Sparsification via Mixture of Graphs Guibin Zhang et.al. 2405.14260 link
2024-05-23 Statistical Advantages of Perturbing Cosine Router in Sparse Mixture of Experts Huy Nguyen et.al. 2405.14131 null
2024-05-23 Mixture of Experts Meets Prompt-Based Continual Learning Minh Le et.al. 2405.14124 link
2024-05-22 Sigmoid Gating is More Sample Efficient than Softmax Gating in Mixture of Experts Huy Nguyen et.al. 2405.13997 null
2024-05-22 xRAG: Extreme Context Compression for Retrieval-augmented Generation with One Token Xin Cheng et.al. 2405.13792 link
2024-05-24 MeteoRA: Multiple-tasks Embedded LoRA for Large Language Models Jingwei Xu et.al. 2405.13053 link
2024-05-21 Optimizing Generative AI Networking: A Dual Perspective with Multi-Agent Systems and Mixture of Experts Ruichen Zhang et.al. 2405.12472 null
2024-05-21 Ensemble and Mixture-of-Experts DeepONets For Operator Learning Ramansh Sharma et.al. 2405.11907 link
2024-05-19 Learning More Generalized Experts by Merging Experts in Mixture-of-Experts Sejik Park et.al. 2405.11530 null
2024-05-18 Uni-MoE: Scaling Unified Multimodal LLMs with Mixture of Experts Yunxin Li et.al. 2405.11273 link
2024-05-16 Many Hands Make Light Work: Task-Oriented Dialogue System with Module-Based Mixture-of-Experts Ruolin Su et.al. 2405.09744 null
2024-05-15 M$^4$oE: A Foundation Model for Medical Multimodal Image Segmentation with Mixture of Experts Yufeng Jiang et.al. 2405.09446 link
2024-05-13 Harnessing Hierarchical Label Distribution Variations in Test Agnostic Long-tail Recognition Zhiyong Yang et.al. 2405.07780 link
2024-05-07 SUTRA: Scalable Multilingual Language Model Architecture Abhijit Bendale et.al. 2405.06694 null
2024-05-09 A Mixture of Experts Approach to 3D Human Motion Prediction Edmund Shieh et.al. 2405.06088 link
2024-05-09 A Mixture-of-Experts Approach to Few-Shot Task Transfer in Open-Ended Text Worlds Christopher Z. Cui et.al. 2405.06059 null
2024-05-09 EWMoE: An effective model for global weather forecasting with mixture-of-experts Lihao Gan et.al. 2405.06004 link
2024-05-09 CuMo: Scaling Multimodal LLM with Co-Upcycled Mixture-of-Experts Jiachen Li et.al. 2405.05949 link
2024-05-16 DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model DeepSeek-AI et.al. 2405.04434 link
2024-05-07 Enhancing Physical Layer Communication Security through Generative AI with Mixture of Experts Changyuan Zhao et.al. 2405.04198 null
2024-05-06 Lory: Fully Differentiable Mixture-of-Experts for Autoregressive Language Model Pre-training Zexuan Zhong et.al. 2405.03133 null
2024-05-06 WDMoE: Wireless Distributed Large Language Models with Mixture of Experts Nan Xue et.al. 2405.03131 null
2024-05-31 Not All Experts are Equal: Efficient Expert Pruning and Skipping for Mixture-of-Experts Large Language Models Xudong Lu et.al. 2402.14800 null
2024-10-29 GraphMETRO: Mitigating Complex Graph Distribution Shifts via Mixture of Aligned Experts Shirley Wu et.al. 2312.04693 null
2021-05-25 Tensor-variate Mixture of Experts for Proportional Myographic Control of a Robotic Hand Noémie Jaquier et.al. 1902.11104 null
2018-06-22 Mixtures of Experts Models Isobel Claire Gormley et.al. 1806.08200 null

(back to top)

About

🎓 Automatically updates a daily list of LLM inference systems papers using GitHub Actions (updates every 12 hours)

Languages

  • Python 100.0%